[GitHub] flink pull request: [FLINK-1622][java-api][scala-api] add a partia...
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/466#issuecomment-78101926 Sorry, I completely blanked. Of course you still need the grouping; it is only the shuffle step that you don't need. So I suggest just better tests, using a combination of partitionByHash() and groupReducePartial(). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1605) Create a shaded Hadoop fat jar to resolve library version conflicts
[ https://issues.apache.org/jira/browse/FLINK-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356656#comment-14356656 ] ASF GitHub Bot commented on FLINK-1605: --- Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/454#discussion_r26199913

--- Diff: pom.xml ---
@@ -1159,6 +726,55 @@ under the License.
 				</execution>
 			</executions>
 		</plugin>
+		<!-- We use shading in all packages for relocating some classes, such as
+			Guava and ASM.
+			By doing so, users adding Flink as a dependency won't run into conflicts.
+			(For example users can use whatever guava version they want, because we don't
+			expose our guava dependency)
+		-->
+		<plugin>
+			<groupId>org.apache.maven.plugins</groupId>
+			<artifactId>maven-shade-plugin</artifactId>
+			<version>2.3</version>
+			<executions>
+				<execution>
+					<id>shade-flink</id>
+					<phase>package</phase>
+					<goals>
+						<goal>shade</goal>
+					</goals>
+					<configuration>
+						<shadeTestJar>true</shadeTestJar>
+						<shadedArtifactAttached>false</shadedArtifactAttached>
+						<createDependencyReducedPom>true</createDependencyReducedPom>
+						<dependencyReducedPomLocation>${project.basedir}/target/dependency-reduced-pom.xml</dependencyReducedPomLocation>
+						<artifactSet>
+							<includes>
+								<include>com.google.guava:*</include>
+								<include>org.ow2.asm:*</include>
+								<!-- <include>org.apache.flume:*</include> Required because flume depends on guava.
+									This only affects the flink-streaming connector which is not shipped through
+									flink-dist by default. So it's only about users adding flume as a dependency. -->
--- End diff --

Why is it commented out rather than removed?
Create a shaded Hadoop fat jar to resolve library version conflicts --- Key: FLINK-1605 URL: https://issues.apache.org/jira/browse/FLINK-1605 Project: Flink Issue Type: Improvement Components: Build System Affects Versions: 0.9 Reporter: Robert Metzger Assignee: Robert Metzger As per mailing list discussion: http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Create-a-shaded-Hadoop-fat-jar-to-resolve-library-version-conflicts-td3881.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: Kick off of Flink's machine learning library
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/479#issuecomment-78440416 Is there a JIRA associated with this PR? Once the issues with the documentation are resolved, I'd say it's good to merge.
[GitHub] flink pull request: Add support for building Flink with Scala 2.11
Github user aalexandrov commented on the pull request: https://github.com/apache/flink/pull/477#issuecomment-78262466

> Why are you using Scala 2.11.4? The mentioned bug seems to be fixed in 2.11.6.

2.11.6 will be out in April. Until then, the choice is between 2.11.5 (which is known to have issues) and 2.11.4.

> Why aren't we adding a _2.11 suffix to the Scala 2.11 Flink builds?

We can do this, and it certainly makes sense if you want to ship pre-builds of both versions. With the current setup, if you want to use Flink with 2.11 you have to build and install the Maven projects yourself. (I'm blindly following the Spark model here, let me know if you prefer the other option.)
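For illustration of the suffix option discussed above, a dependency on a suffixed build would use coordinates like the following (the artifactId and version here are hypothetical; this is the naming convention used by Spark and most Scala libraries):

```xml
<dependency>
	<groupId>org.apache.flink</groupId>
	<artifactId>flink-scala_2.11</artifactId>
	<version>0.9-SNAPSHOT</version>
</dependency>
```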
[GitHub] flink pull request: Kick off of Flink's machine learning library
Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/479#discussion_r26215888

--- Diff: docs/_includes/navbar.html ---
@@ -24,15 +24,15 @@
       We might be on an externally hosted documentation site. Please keep the
       site.FLINK_WEBSITE_URL below to ensure a link back to the Flink website.
       {% endcomment %}
-      <a href="{{ site.FLINK_WEBSITE_URL }}index.html" title="Home">
+      <a href="{{ site.FLINK_WEBSITE_URL }}/index.html" title="Home">
         <img class="hidden-xs hidden-sm img-responsive"
-             src="{{ site.baseurl }}img/logo.png" alt="Apache Flink Logo">
+             src="{{ site.baseurl }}/img/logo.png" alt="Apache Flink Logo">
       </a>
       <div class="row visible-xs">
         <div class="col-xs-3">
-          <a href="{{ site.baseurl }}index.html" title="Home">
+          <a href="{{ site.baseurl }}/index.html" title="Home">
--- End diff --

Yes, we can set it in the _config.yml. I would not change it now because it has some other implications. For example, local testing of the website will be more complicated because all links will point to the online version. We can open a separate issue.
[GitHub] flink pull request: [FLINK-1512] Add CsvReader for reading into PO...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/426#discussion_r26215940

--- Diff: flink-java/src/test/java/org/apache/flink/api/java/io/CsvInputFormatTest.java ---
@@ -684,4 +693,178 @@ private void testRemovingTrailingCR(String lineBreakerInFile, String lineBreaker
 	}
 }
+
+	@Test
+	public void testPojoType() throws Exception {
+		File tempFile = File.createTempFile("CsvReaderPojoType", "tmp");
+		tempFile.deleteOnExit();
+		tempFile.setWritable(true);
+
+		OutputStreamWriter wrt = new OutputStreamWriter(new FileOutputStream(tempFile));
+		wrt.write("123,AAA,3.123,BBB\n");
+		wrt.write("456,BBB,1.123,AAA\n");
+		wrt.close();
+
+		@SuppressWarnings("unchecked")
+		TypeInformation<PojoItem> typeInfo = (TypeInformation<PojoItem>) TypeExtractor.createTypeInfo(PojoItem.class);
+		CsvInputFormat<PojoItem> inputFormat = new CsvInputFormat<PojoItem>(new Path(tempFile.toURI().toString()), typeInfo);
+
+		Configuration parameters = new Configuration();
+		inputFormat.configure(parameters);
+
+		inputFormat.setDelimiter('\n');
+		inputFormat.setFieldDelimiter(',');
+
+		FileInputSplit[] splits = inputFormat.createInputSplits(1);
+
+		inputFormat.open(splits[0]);
+
+		PojoItem item = new PojoItem();
+		inputFormat.nextRecord(item);
+
+		assertEquals(123, item.field1);
+		assertEquals("AAA", item.field2);
+		assertEquals(Double.valueOf(3.123), item.field3);
+		assertEquals("BBB", item.field4);
+
+		inputFormat.nextRecord(item);
+
+		assertEquals(456, item.field1);
+		assertEquals("BBB", item.field2);
+		assertEquals(Double.valueOf(1.123), item.field3);
+		assertEquals("AAA", item.field4);
+	}
+
+	@Test
+	public void testPojoTypeWithPrivateField() throws Exception {
+		File tempFile = File.createTempFile("CsvReaderPojoType", "tmp");
+		tempFile.deleteOnExit();
+		tempFile.setWritable(true);
+
+		OutputStreamWriter wrt = new OutputStreamWriter(new FileOutputStream(tempFile));
+		wrt.write("123,AAA,3.123,BBB\n");
+		wrt.write("456,BBB,1.123,AAA\n");
+		wrt.close();
+
+		@SuppressWarnings("unchecked")
+		TypeInformation<PrivatePojoItem> typeInfo = (TypeInformation<PrivatePojoItem>) TypeExtractor.createTypeInfo(PrivatePojoItem.class);
+		CsvInputFormat<PrivatePojoItem> inputFormat = new CsvInputFormat<PrivatePojoItem>(new Path(tempFile.toURI().toString()), typeInfo);
+
+		Configuration parameters = new Configuration();
+		inputFormat.configure(parameters);
+
+		inputFormat.setDelimiter('\n');
+		inputFormat.setFieldDelimiter(',');
+
+		FileInputSplit[] splits = inputFormat.createInputSplits(1);
+
+		inputFormat.open(splits[0]);
+
+		PrivatePojoItem item = new PrivatePojoItem();
+		inputFormat.nextRecord(item);
+
+		assertEquals(123, item.field1);
+		assertEquals("AAA", item.field2);
+		assertEquals(Double.valueOf(3.123), item.field3);
+		assertEquals("BBB", item.field4);
+
+		inputFormat.nextRecord(item);
+
+		assertEquals(456, item.field1);
+		assertEquals("BBB", item.field2);
+		assertEquals(Double.valueOf(1.123), item.field3);
+		assertEquals("AAA", item.field4);
+	}
+
+	@Test
+	public void testPojoTypeWithMappingInformation() throws Exception {
+		File tempFile = File.createTempFile("CsvReaderPojoType", "tmp");
+		tempFile.deleteOnExit();
+		tempFile.setWritable(true);
+
+		OutputStreamWriter wrt = new OutputStreamWriter(new FileOutputStream(tempFile));
+		wrt.write("123,3.123,AAA,BBB\n");
+		wrt.write("456,1.123,BBB,AAA\n");
+		wrt.close();
+
+		@SuppressWarnings("unchecked")
+		TypeInformation<PojoItem> typeInfo = (TypeInformation<PojoItem>) TypeExtractor.createTypeInfo(PojoItem.class);
+		CsvInputFormat<PojoItem> inputFormat = new CsvInputFormat<PojoItem>(new Path(tempFile.toURI().toString()), typeInfo);
+		inputFormat.setFieldsMap(new String[]{"field1", "field3", "field2", "field4"});
+
+		Configuration parameters = new Configuration();
+		inputFormat.configure(parameters);
+
+		inputFormat.setDelimiter('\n');
+		inputFormat.setFieldDelimiter(',');
+
[jira] [Commented] (FLINK-1615) Introduces a new InputFormat for Tweets
[ https://issues.apache.org/jira/browse/FLINK-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358299#comment-14358299 ] ASF GitHub Bot commented on FLINK-1615: --- Github user Elbehery commented on a diff in the pull request: https://github.com/apache/flink/pull/442#discussion_r26284910

--- Diff: flink-contrib/src/main/java/org/apache/flink/contrib/tweetinputformat/io/SimpleTweetInputFormat.java ---
@@ -0,0 +1,67 @@
+package org.apache.flink.contrib.tweetinputformat.io;
+
+import org.apache.flink.api.common.io.DelimitedInputFormat;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.java.typeutils.GenericTypeInfo;
+import org.apache.flink.api.java.typeutils.ResultTypeQueryable;
+import org.apache.flink.contrib.tweetinputformat.model.tweet.Tweet;
+import org.codehaus.jackson.JsonParseException;
+import org.json.simple.parser.JSONParser;
+import org.json.simple.parser.ParseException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.ByteArrayInputStream;
+import java.io.IOException;
+import java.io.InputStreamReader;
+
+
+public class SimpleTweetInputFormat extends DelimitedInputFormat<Tweet> implements ResultTypeQueryable<Tweet> {
+
+	private static final Logger LOG = LoggerFactory.getLogger(SimpleTweetInputFormat.class);
+
+	@Override
+	public Tweet nextRecord(Tweet record) throws IOException {
+		Boolean result = false;
+
+		do {
+			try {
+				record.reset(0);
+				record = super.nextRecord(record);
+				result = true;
+
+			} catch (JsonParseException e) {
+				result = false;
+			}
+		} while (!result);
+
+		return record;
+	}
+
+	@Override
+	public Tweet readRecord(Tweet reuse, byte[] bytes, int offset, int numBytes) throws IOException {
+
+		InputStreamReader jsonReader = new InputStreamReader(new ByteArrayInputStream(bytes));
+		jsonReader.skip(offset);
+
+		JSONParser parser = new JSONParser();
--- End diff --

I know. I tried in the beginning to create them as instance fields in the class, but I received an error because the fields are not serializable. I tried to declare them as transient, but Flink threw a "Can not submit Job" exception. If you have suggestions, please let me know.

Introduces a new InputFormat for Tweets --- Key: FLINK-1615 URL: https://issues.apache.org/jira/browse/FLINK-1615 Project: Flink Issue Type: New Feature Components: flink-contrib Affects Versions: 0.8.1 Reporter: mustafa elbehery Priority: Minor An event-driven parser for Tweets into Java POJOs. It parses all the important parts of the tweet into Java objects. Tested on a cluster, and the performance is pretty good.
[GitHub] flink pull request: [FLINK-1615] [java api] SimpleTweetInputFormat
Github user Elbehery commented on a diff in the pull request: https://github.com/apache/flink/pull/442#discussion_r26284910

--- Diff: flink-contrib/src/main/java/org/apache/flink/contrib/tweetinputformat/io/SimpleTweetInputFormat.java ---
@@ -0,0 +1,67 @@
+package org.apache.flink.contrib.tweetinputformat.io;
+
+import org.apache.flink.api.common.io.DelimitedInputFormat;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.java.typeutils.GenericTypeInfo;
+import org.apache.flink.api.java.typeutils.ResultTypeQueryable;
+import org.apache.flink.contrib.tweetinputformat.model.tweet.Tweet;
+import org.codehaus.jackson.JsonParseException;
+import org.json.simple.parser.JSONParser;
+import org.json.simple.parser.ParseException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.ByteArrayInputStream;
+import java.io.IOException;
+import java.io.InputStreamReader;
+
+
+public class SimpleTweetInputFormat extends DelimitedInputFormat<Tweet> implements ResultTypeQueryable<Tweet> {
+
+	private static final Logger LOG = LoggerFactory.getLogger(SimpleTweetInputFormat.class);
+
+	@Override
+	public Tweet nextRecord(Tweet record) throws IOException {
+		Boolean result = false;
+
+		do {
+			try {
+				record.reset(0);
+				record = super.nextRecord(record);
+				result = true;
+
+			} catch (JsonParseException e) {
+				result = false;
+			}
+		} while (!result);
+
+		return record;
+	}
+
+	@Override
+	public Tweet readRecord(Tweet reuse, byte[] bytes, int offset, int numBytes) throws IOException {
+
+		InputStreamReader jsonReader = new InputStreamReader(new ByteArrayInputStream(bytes));
+		jsonReader.skip(offset);
+
+		JSONParser parser = new JSONParser();
--- End diff --

I know. I tried in the beginning to create them as instance fields in the class, but I received an error because the fields are not serializable. I tried to declare them as transient, but Flink threw a "Can not submit Job" exception. If you have suggestions, please let me know.
[jira] [Commented] (FLINK-1679) Document how degree of parallelism / parallelism / slots are connected to each other
[ https://issues.apache.org/jira/browse/FLINK-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358335#comment-14358335 ] Ufuk Celebi commented on FLINK-1679: I've discovered a minor inconsistency recently: in the execution environment, the setter is called {{setDegreeOfParallelism(int)}}, whereas for certain operators like FlatMap it is called {{setParallelism(int)}}. Thanks for bringing this up. I really think we should document this properly and prominently for the upcoming release. Let's have an offline chat about the user feedback you've received, and I can come up with an initial draft for the docs in the next week. Document how degree of parallelism / parallelism / slots are connected to each other --- Key: FLINK-1679 URL: https://issues.apache.org/jira/browse/FLINK-1679 Project: Flink Issue Type: Task Components: Documentation Affects Versions: 0.9 Reporter: Robert Metzger I see too many users being confused about properly setting up Flink with respect to parallelism.
[jira] [Created] (FLINK-1692) Add option to Flink YARN client to stop a detached YARN session.
Robert Metzger created FLINK-1692: - Summary: Add option to Flink YARN client to stop a detached YARN session. Key: FLINK-1692 URL: https://issues.apache.org/jira/browse/FLINK-1692 Project: Flink Issue Type: Bug Components: YARN Client Affects Versions: 0.9 Reporter: Robert Metzger
[jira] [Commented] (FLINK-1679) Document how degree of parallelism / parallelism / slots are connected to each other
[ https://issues.apache.org/jira/browse/FLINK-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358345#comment-14358345 ] Robert Metzger commented on FLINK-1679: --- I would suggest removing all occurrences of degreeOfParallelism in the system and replacing them with parallelism everywhere. The CLI frontend, for example, also calls it {{-p}}, not {{-dop}}. I would also suggest setting the parallelism by default to {{AUTOMAX}} in the CliFrontend. Document how degree of parallelism / parallelism / slots are connected to each other --- Key: FLINK-1679 URL: https://issues.apache.org/jira/browse/FLINK-1679 Project: Flink Issue Type: Task Components: Documentation Affects Versions: 0.9 Reporter: Robert Metzger Assignee: Ufuk Celebi I see too many users being confused about properly setting up Flink with respect to parallelism.
[jira] [Updated] (FLINK-1525) Provide utils to pass -D parameters to UDFs
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger updated FLINK-1525: -- Assignee: (was: Robert Metzger) Provide utils to pass -D parameters to UDFs Key: FLINK-1525 URL: https://issues.apache.org/jira/browse/FLINK-1525 Project: Flink Issue Type: Improvement Components: flink-contrib Reporter: Robert Metzger Labels: starter Hadoop users are used to setting job configuration through -D on the command line. Right now, Flink users have to manually parse command line arguments and pass them to their methods. It would be nice to provide a standard args parser which takes care of such things.
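To make the idea concrete, here is a minimal sketch of what such a utility could look like. The class and method names are invented for illustration and are not an actual Flink API; it simply collects Hadoop-style -Dkey=value arguments into a map that could then be handed to UDFs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a -D argument parser (not a real Flink class).
public class DParamParser {

    // Collect all arguments of the form -Dkey=value into a map.
    public static Map<String, String> parse(String[] args) {
        Map<String, String> params = new HashMap<>();
        for (String arg : args) {
            if (arg.startsWith("-D")) {
                String kv = arg.substring(2);
                int eq = kv.indexOf('=');
                if (eq > 0) {
                    params.put(kv.substring(0, eq), kv.substring(eq + 1));
                }
            }
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> p = parse(new String[] {"-Dnum.retries=3", "-Dmode=fast"});
        System.out.println(p.get("num.retries"));
        System.out.println(p.get("mode"));
    }
}
```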
[jira] [Assigned] (FLINK-1525) Provide utils to pass -D parameters to UDFs
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger reassigned FLINK-1525: - Assignee: Robert Metzger Provide utils to pass -D parameters to UDFs Key: FLINK-1525 URL: https://issues.apache.org/jira/browse/FLINK-1525 Project: Flink Issue Type: Improvement Components: flink-contrib Reporter: Robert Metzger Assignee: Robert Metzger Labels: starter Hadoop users are used to setting job configuration through -D on the command line. Right now, Flink users have to manually parse command line arguments and pass them to their methods. It would be nice to provide a standard args parser which takes care of such things.
[jira] [Assigned] (FLINK-1679) Document how degree of parallelism / parallelism / slots are connected to each other
[ https://issues.apache.org/jira/browse/FLINK-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi reassigned FLINK-1679: -- Assignee: Ufuk Celebi Document how degree of parallelism / parallelism / slots are connected to each other --- Key: FLINK-1679 URL: https://issues.apache.org/jira/browse/FLINK-1679 Project: Flink Issue Type: Task Components: Documentation Affects Versions: 0.9 Reporter: Robert Metzger Assignee: Ufuk Celebi I see too many users being confused about properly setting up Flink with respect to parallelism.
[GitHub] flink pull request: Add support for building Flink with Scala 2.11
Github user aalexandrov commented on the pull request: https://github.com/apache/flink/pull/477#issuecomment-78307840 Any remarks on my reply to the general comments from @rmetzger (Scala version, suffix)?
[GitHub] flink pull request: Kick off of Flink's machine learning library
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/479#issuecomment-78300593 Nice solution. Thanks.
[GitHub] flink pull request: Add support for building Flink with Scala 2.11
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/477#issuecomment-78317231 I'll ask on the ML.
[jira] [Commented] (FLINK-1615) Introduces a new InputFormat for Tweets
[ https://issues.apache.org/jira/browse/FLINK-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358379#comment-14358379 ] ASF GitHub Bot commented on FLINK-1615: --- Github user Elbehery commented on a diff in the pull request: https://github.com/apache/flink/pull/442#discussion_r26288077

--- Diff: flink-contrib/src/test/java/org/apache/flink/contrib/SimpleTweetInputFormatTest.java ---
@@ -0,0 +1,62 @@
+package org.apache.flink.contrib;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.contrib.tweetinputformat.io.SimpleTweetInputFormat;
+import org.apache.flink.contrib.tweetinputformat.model.tweet.Tweet;
+import org.apache.flink.contrib.tweetinputformat.model.tweet.entities.HashTags;
+import org.apache.flink.core.fs.FileInputSplit;
+import org.apache.flink.core.fs.Path;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.io.File;
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.util.Iterator;
+import java.util.List;
+
+
+public class SimpleTweetInputFormatTest {
+
+	private Tweet tweet;
+
+	private SimpleTweetInputFormat simpleTweetInputFormat;
+
+	private FileInputSplit fileInputSplit;
+
+	protected Configuration config;
+
+	protected File tempFile;
+
+
+	@Before
+	public void testSetUp() {
+
+		simpleTweetInputFormat = new SimpleTweetInputFormat();
+
+		File jsonFile = new File("../flink-contrib/src/main/resources/HashTagTweetSample.json");
+
+		fileInputSplit = new FileInputSplit(0, new Path(jsonFile.getPath()), 0, jsonFile.length(), new String[]{"localhost"});
+	}
+
+	@Test
+	public void testTweetInput() throws Exception {
+
+		simpleTweetInputFormat.open(fileInputSplit);
+		while (!simpleTweetInputFormat.reachedEnd()) {
--- End diff --

Changed. I had tried to use a switch instead of multiple if-else branches because it is more efficient, but Flink compiles with Java 1.6, which does not accept Strings in switches.

Introduces a new InputFormat for Tweets --- Key: FLINK-1615 URL: https://issues.apache.org/jira/browse/FLINK-1615 Project: Flink Issue Type: New Feature Components: flink-contrib Affects Versions: 0.8.1 Reporter: mustafa elbehery Priority: Minor An event-driven parser for Tweets into Java POJOs. It parses all the important parts of the tweet into Java objects. Tested on a cluster, and the performance is pretty good.
[jira] [Comment Edited] (FLINK-1537) GSoC project: Machine learning with Apache Flink
[ https://issues.apache.org/jira/browse/FLINK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357238#comment-14357238 ] Sachin Goel edited comment on FLINK-1537 at 3/11/15 5:24 PM: - Yes. I completely agree with the transformer-learner chain design methodology. The decision tree I'll write will provide an interface for first specifying the structure of the data, i.e., the tuple, as in the types, ranges, etc. and any other statistics possible to help with the learning. I do not see myself how it makes a difference to store the data column wise or row wise, although it might have some far-reaching consequences on how the learning process proceeds. In fact, this seems like a valid idea for a learning process which treats each coordinate one by one. It might help in providing all the values of one particular coordinate in one go and learn some statistics on it, which might help in a better learning process. In fact, in a decision tree implementation on big data, it becomes prudent to learn such a statistic to ensure only a reasonable number of splits on the data are considered. I will look into how this could be achieved with a row-style data representation. As for the deep learning framework, you are indeed right. I am not sure myself if anyone has yet evaluated the potential of a deep learning system on a distributed system. I will look into the H2O project's implementation related to this. As of yet, I'm still not sure if deep learning can be as fast on distributed systems as it is on GPUs. was (Author: sachingoel0101): Yes. I completely agree with the transformer-learner chain design methodology. The decision tree I'll write will provide an interface for first specifying the structure of the data, i.e., the tuple, as in the types, ranges, etc. and any other statistics possible to help with the learning.
I do not see myself how it makes a difference to store the data column wise or row wise, although it might have some far-reaching consequences on how the learning process proceeds. In fact, this seems like a valid idea for a learning process which treat each coordinate one by one. It might help in providing all the attributes of one particular coordinate in one go and learn some statistics on it, which might help in a better learning process. In fact, in a decision tree implementation on big data, it becomes prudent to learn such a statistic to ensure only a reasonable number of splits on the data are considered. I will look into how this could be achieved with a row-style data representation. As for the deep learning framework, you are indeed right. I am not sure myself if anyone has yet evaluated the potential of a deep learning system on a distributed system. I will look into the H2O project's implementation related to this. As of yet, I'm still not sure if deep learning can be as fast on distributed systems as it is on GPUs. GSoC project: Machine learning with Apache Flink Key: FLINK-1537 URL: https://issues.apache.org/jira/browse/FLINK-1537 Project: Flink Issue Type: New Feature Reporter: Till Rohrmann Priority: Minor Labels: gsoc2015, java, machine_learning, scala Currently, the Flink community is setting up the infrastructure for a machine learning library for Flink. The goal is to provide a set of highly optimized ML algorithms and to offer a high level linear algebra abstraction to easily do data pre- and post-processing. By defining a set of commonly used data structures on which the algorithms work it will be possible to define complex processing pipelines. The Mahout DSL constitutes a good fit to be used as the linear algebra language in Flink. It has to be evaluated which means have to be provided to allow an easy transition between the high level abstraction and the optimized algorithms. 
The machine learning library offers multiple starting points for a GSoC project. Amongst others, the following projects are conceivable.
* Extension of Flink's machine learning library by additional ML algorithms
** Stochastic gradient descent
** Distributed dual coordinate ascent
** SVM
** Gaussian mixture EM
** DecisionTrees
** ...
* Integration of Flink with the Mahout DSL to support a high level linear algebra abstraction
* Integration of H2O with Flink to benefit from H2O's sophisticated machine learning algorithms
* Implementation of a parameter server like distributed global state storage facility for Flink. This also includes the extension of Flink to support asynchronous iterations and update messages.
Own ideas for a possible contribution on the field of the machine learning library are highly welcome.
[GitHub] flink pull request: [FLINK-1615] [java api] SimpleTweetInputFormat
Github user Elbehery commented on a diff in the pull request: https://github.com/apache/flink/pull/442#discussion_r26288077

--- Diff: flink-contrib/src/test/java/org/apache/flink/contrib/SimpleTweetInputFormatTest.java ---
@@ -0,0 +1,62 @@
+package org.apache.flink.contrib;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.contrib.tweetinputformat.io.SimpleTweetInputFormat;
+import org.apache.flink.contrib.tweetinputformat.model.tweet.Tweet;
+import org.apache.flink.contrib.tweetinputformat.model.tweet.entities.HashTags;
+import org.apache.flink.core.fs.FileInputSplit;
+import org.apache.flink.core.fs.Path;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.io.File;
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.util.Iterator;
+import java.util.List;
+
+
+public class SimpleTweetInputFormatTest {
+
+	private Tweet tweet;
+
+	private SimpleTweetInputFormat simpleTweetInputFormat;
+
+	private FileInputSplit fileInputSplit;
+
+	protected Configuration config;
+
+	protected File tempFile;
+
+
+	@Before
+	public void testSetUp() {
+
+		simpleTweetInputFormat = new SimpleTweetInputFormat();
+
+		File jsonFile = new File("../flink-contrib/src/main/resources/HashTagTweetSample.json");
+
+		fileInputSplit = new FileInputSplit(0, new Path(jsonFile.getPath()), 0, jsonFile.length(), new String[]{"localhost"});
+	}
+
+	@Test
+	public void testTweetInput() throws Exception {
+
+		simpleTweetInputFormat.open(fileInputSplit);
+		while (!simpleTweetInputFormat.reachedEnd()) {
--- End diff --

Changed. I had tried to use a switch instead of multiple if-else branches because it is more efficient, but Flink compiles with Java 1.6, which does not accept Strings in switches.
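For context on the comment above: switch statements over String operands only became legal with Java 7, so code that must compile under Java 1.6 falls back to a chain of equals() checks. A minimal self-contained sketch (the field names below are invented for the example, not taken from the Tweet format):

```java
// Demonstrates the Java 1.6-compatible alternative to switching on a String.
public class StringDispatchDemo {

    // Dispatch on a String key using if-else, since Java 1.6 rejects
    // switch (field) { case "text": ... } with a String operand.
    static String describe(String field) {
        if ("text".equals(field)) {
            return "tweet body";
        } else if ("hashtags".equals(field)) {
            return "hashtag entities";
        } else {
            return "unknown field";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("text"));
        System.out.println(describe("hashtags"));
        System.out.println(describe("user"));
    }
}
```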
[GitHub] flink pull request: [FLINK-1615] [java api] SimpleTweetInputFormat
Github user Elbehery commented on a diff in the pull request: https://github.com/apache/flink/pull/442#discussion_r26285363 --- Diff: flink-contrib/src/main/java/org/apache/flink/contrib/tweetinputformat/io/SimpleTweetInputFormat.java ---
@@ -0,0 +1,67 @@
+package org.apache.flink.contrib.tweetinputformat.io;
+
+import org.apache.flink.api.common.io.DelimitedInputFormat;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.java.typeutils.GenericTypeInfo;
+import org.apache.flink.api.java.typeutils.ResultTypeQueryable;
+import org.apache.flink.contrib.tweetinputformat.model.tweet.Tweet;
+import org.codehaus.jackson.JsonParseException;
+import org.json.simple.parser.JSONParser;
+import org.json.simple.parser.ParseException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import java.io.ByteArrayInputStream;
+import java.io.IOException;
+import java.io.InputStreamReader;
+
+
+public class SimpleTweetInputFormat extends DelimitedInputFormat<Tweet> implements ResultTypeQueryable<Tweet> {
+
+	private static final Logger LOG = LoggerFactory.getLogger(SimpleTweetInputFormat.class);
+
+	@Override
+	public Tweet nextRecord(Tweet record) throws IOException {
+		Boolean result = false;
+
+		do {
+			try {
+				record.reset(0);
+				record = super.nextRecord(record);
+				result = true;
+
+			} catch (JsonParseException e) {
+				result = false;
+			}
+		} while (!result);
+
+		return record;
+	}
+
+	@Override
+	public Tweet readRecord(Tweet reuse, byte[] bytes, int offset, int numBytes) throws IOException {
+
+		InputStreamReader jsonReader = new InputStreamReader(new ByteArrayInputStream(bytes));
+		jsonReader.skip(offset);
+
+		JSONParser parser = new JSONParser();
+		TweetHandler handler = new TweetHandler();
+
+		try {
+			handler.reuse = reuse;
+			parser.parse(jsonReader, handler, false);
+		} catch (ParseException e) {
+			LOG.debug("Class " + SimpleTweetInputFormat.class + " " + e.getMessage());
--- End diff --
Edited
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
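The retry loop reviewed above (keep reading until a record parses, skipping malformed input) can be sketched standalone. The class and method names below are illustrative stand-ins, not the actual Flink or Jackson classes; integer parsing stands in for JSON parsing:

```java
import java.util.Arrays;
import java.util.Iterator;

public class SkipMalformedDemo {

    // Returns the first record that parses, skipping malformed input,
    // or null when the input is exhausted -- the same shape as the
    // do/while loop in SimpleTweetInputFormat.nextRecord().
    static Integer parseOrNull(Iterator<String> lines) {
        while (lines.hasNext()) {
            String raw = lines.next();
            try {
                return Integer.parseInt(raw); // stands in for JSON parsing
            } catch (NumberFormatException e) {
                // malformed record: swallow the exception and try the next one
            }
        }
        return null; // end of input
    }

    public static void main(String[] args) {
        Iterator<String> lines = Arrays.asList("not-json", "42", "7").iterator();
        System.out.println(parseOrNull(lines)); // prints 42
    }
}
```

One thing the sketch makes explicit is the exit condition: the loop has to terminate both on a successful parse and on exhausted input, not only on success.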
[GitHub] flink pull request: [FLINK-1615] [java api] SimpleTweetInputFormat
Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/442#discussion_r26285316 --- Diff: flink-contrib/src/main/java/org/apache/flink/contrib/tweetinputformat/io/SimpleTweetInputFormat.java ---
@@ -0,0 +1,67 @@
[...]
+		JSONParser parser = new JSONParser();
--- End diff --
Putting the parser into a `transient` field and initializing it in the `open()` method is the way to go. Can you give me the full stacktrace for the "Can not submit Job" exception?
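The `transient`-field pattern rmetzger suggests can be sketched in plain Java. `ParserHolder` and its `StringBuilder` "parser" are hypothetical stand-ins (the real code would hold a `JSONParser`), and the serialization round-trip simulates how Flink ships an input format to a TaskManager before calling `open()`:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Stand-in for an input format that holds a parser we don't want serialized.
class ParserHolder implements Serializable {

    // transient: skipped during serialization, so shipping the format
    // with the job submission cannot fail on the parser instance.
    private transient StringBuilder parser; // stand-in for JSONParser

    // In Flink, open() runs on the worker after deserialization --
    // the right place to (re)create non-serializable helpers.
    public void open() {
        parser = new StringBuilder();
    }

    public boolean parserReady() {
        return parser != null;
    }
}

public class TransientFieldDemo {
    public static void main(String[] args) throws Exception {
        ParserHolder format = new ParserHolder();
        format.open();

        // Simulate shipping the format to a TaskManager.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new ObjectOutputStream(bos).writeObject(format);
        ParserHolder shipped = (ParserHolder) new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray())).readObject();

        System.out.println(shipped.parserReady()); // prints false: transient field was dropped
        shipped.open();                            // re-initialize on the worker side
        System.out.println(shipped.parserReady()); // prints true
    }
}
```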
[GitHub] flink pull request: Kick off of Flink's machine learning library
Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/479#discussion_r26219828 --- Diff: docs/_layouts/default.html ---
@@ -23,16 +23,25 @@
 <meta http-equiv="X-UA-Compatible" content="IE=edge">
 <meta name="viewport" content="width=device-width, initial-scale=1">
 <title>Apache Flink: {{ page.title }}</title>
-<link rel="shortcut icon" href="{{ site.baseurl }}favicon.ico" type="image/x-icon">
-<link rel="icon" href="{{ site.baseurl }}favicon.ico" type="image/x-icon">
-<link rel="stylesheet" href="{{ site.baseurl }}css/bootstrap.css">
-<link rel="stylesheet" href="{{ site.baseurl }}css/bootstrap-lumen-custom.css">
-<link rel="stylesheet" href="{{ site.baseurl }}css/syntax.css">
-<link rel="stylesheet" href="{{ site.baseurl }}css/custom.css">
-<link href="{{ site.baseurl }}css/main/main.css" rel="stylesheet">
+<link rel="shortcut icon" href="{{ site.baseurl }}/favicon.ico" type="image/x-icon">
+<link rel="icon" href="{{ site.baseurl }}/favicon.ico" type="image/x-icon">
+<link rel="stylesheet" href="{{ site.baseurl }}/css/bootstrap.css">
+<link rel="stylesheet" href="{{ site.baseurl }}/css/bootstrap-lumen-custom.css">
+<link rel="stylesheet" href="{{ site.baseurl }}/css/syntax.css">
+<link rel="stylesheet" href="{{ site.baseurl }}/css/custom.css">
+<link href="{{ site.baseurl }}/css/main/main.css" rel="stylesheet">
--- End diff --
Local testing should work out of the box. We could set `{{ site.baseurl }}` to http://ci.apache.org/projects/flink/flink-docs-master/ and then change our build_docs script to serve the website for local testing with the `--baseurl http://127.0.01/` parameter. If you want, you can fix this here or we open a JIRA for that. http://jekyllrb.com/docs/configuration/#serve-command-options
[jira] [Commented] (FLINK-1690) ProcessFailureBatchRecoveryITCase.testTaskManagerProcessFailure spuriously fails on Travis
[ https://issues.apache.org/jira/browse/FLINK-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357144#comment-14357144 ] Stephan Ewen commented on FLINK-1690: - I retract my statement. After careful log defusing with [~uce], I think that things go as planned. The test actually has too limited a time budget. It takes almost 8 seconds on my machine, so I grant that Travis may not be able to complete it in 30 seconds. ProcessFailureBatchRecoveryITCase.testTaskManagerProcessFailure spuriously fails on Travis -- Key: FLINK-1690 URL: https://issues.apache.org/jira/browse/FLINK-1690 Project: Flink Issue Type: Bug Reporter: Till Rohrmann Priority: Minor I got the following error on Travis. {code} ProcessFailureBatchRecoveryITCase.testTaskManagerProcessFailure:244 The program did not finish in time {code} I think we have to increase the timeouts for this test case to make it reliably run on Travis. The log of the failed Travis build can be found [here|https://api.travis-ci.org/jobs/53952486/log.txt?deansi=true] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1615) Introduces a new InputFormat for Tweets
[ https://issues.apache.org/jira/browse/FLINK-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358317#comment-14358317 ] ASF GitHub Bot commented on FLINK-1615: --- Github user Elbehery commented on a diff in the pull request: https://github.com/apache/flink/pull/442#discussion_r26285363 --- Diff: flink-contrib/src/main/java/org/apache/flink/contrib/tweetinputformat/io/SimpleTweetInputFormat.java ---
[...]
--- End diff --
Edited
Introduces a new InputFormat for Tweets --- Key: FLINK-1615 URL: https://issues.apache.org/jira/browse/FLINK-1615 Project: Flink Issue Type: New Feature Components: flink-contrib Affects Versions: 0.8.1 Reporter: mustafa elbehery Priority: Minor An event-driven parser for Tweets into Java Pojos. It parses all the important parts of the tweet into Java objects. Tested on a cluster, and the performance is pretty good. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1615) Introduces a new InputFormat for Tweets
[ https://issues.apache.org/jira/browse/FLINK-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358316#comment-14358316 ] ASF GitHub Bot commented on FLINK-1615: --- Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/442#discussion_r26285316 --- Diff: flink-contrib/src/main/java/org/apache/flink/contrib/tweetinputformat/io/SimpleTweetInputFormat.java ---
[...]
+		JSONParser parser = new JSONParser();
--- End diff --
Putting the parser into a `transient` field and initializing it in the `open()` method is the way to go. Can you give me the full stacktrace for the "Can not submit Job" exception?
Introduces a new InputFormat for Tweets --- Key: FLINK-1615 URL: https://issues.apache.org/jira/browse/FLINK-1615 Project: Flink Issue Type: New Feature Components: flink-contrib Affects Versions: 0.8.1 Reporter: mustafa elbehery Priority: Minor An event-driven parser for Tweets into Java Pojos. It parses all the important parts of the tweet into Java objects. Tested on a cluster, and the performance is pretty good. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: Kick off of Flink's machine learning library
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/479#issuecomment-78441869 No, there is no JIRA issue for the machine learning library. The problem with the links should be fixed. On Mar 12, 2015 9:30 AM, Robert Metzger notificati...@github.com wrote: Is there a JIRA associated with this PR? Once the issues with the documentation are resolved, I'd say it's good to merge. — Reply to this email directly or view it on GitHub https://github.com/apache/flink/pull/479#issuecomment-78440416.
[jira] [Commented] (FLINK-1633) Add getTriplets() Gelly method
[ https://issues.apache.org/jira/browse/FLINK-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358493#comment-14358493 ] ASF GitHub Bot commented on FLINK-1633: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/452#discussion_r26293025 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/Graph.java --- @@ -326,6 +328,34 @@ public ExecutionEnvironment getContext() { } --- End diff -- some description of what this method does would be nice :-) Add getTriplets() Gelly method -- Key: FLINK-1633 URL: https://issues.apache.org/jira/browse/FLINK-1633 Project: Flink Issue Type: New Feature Components: Gelly Affects Versions: 0.9 Reporter: Vasia Kalavri Assignee: Andra Lungu Priority: Minor Labels: starter In some graph algorithms, it is required to access the graph edges together with the vertex values of the source and target vertices. For example, several graph weighting schemes compute some kind of similarity weights for edges, based on the attributes of the source and target vertices. This issue proposes adding a convenience Gelly method that generates a DataSet of <srcVertex, Edge, TrgVertex> triplets from the input graph. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1633) Add getTriplets() Gelly method
[ https://issues.apache.org/jira/browse/FLINK-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358505#comment-14358505 ] ASF GitHub Bot commented on FLINK-1633: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/452#discussion_r26293336 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/utils/Triplet.java --- @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.utils; --- End diff -- I would put this in `org.apache.flink.graph` instead. Add getTriplets() Gelly method -- Key: FLINK-1633 URL: https://issues.apache.org/jira/browse/FLINK-1633 Project: Flink Issue Type: New Feature Components: Gelly Affects Versions: 0.9 Reporter: Vasia Kalavri Assignee: Andra Lungu Priority: Minor Labels: starter In some graph algorithms, it is required to access the graph edges together with the vertex values of the source and target vertices. For example, several graph weighting schemes compute some kind of similarity weights for edges, based on the attributes of the source and target vertices. 
This issue proposes adding a convenience Gelly method that generates a DataSet of <srcVertex, Edge, TrgVertex> triplets from the input graph. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1633][gelly] Added getTriplets() method...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/452#discussion_r26293336 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/utils/Triplet.java ---
[...]
+package org.apache.flink.graph.utils;
--- End diff --
I would put this in `org.apache.flink.graph` instead.
[GitHub] flink pull request: [FLINK-1615] [java api] SimpleTweetInputFormat
Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/442#discussion_r26289003 --- Diff: flink-contrib/src/test/java/org/apache/flink/contrib/SimpleTweetInputFormatTest.java --- @@ -0,0 +1,76 @@
+package org.apache.flink.contrib;
--- End diff --
Missing license header.
[GitHub] flink pull request: [FLINK-1576] [gelly] improvements to the gelly...
Github user vasia commented on the pull request: https://github.com/apache/flink/pull/470#issuecomment-78459161 Hi! If no comments here, I'd like to merge this one :)
[jira] [Commented] (FLINK-1633) Add getTriplets() Gelly method
[ https://issues.apache.org/jira/browse/FLINK-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358499#comment-14358499 ] ASF GitHub Bot commented on FLINK-1633: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/452#discussion_r26293223 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/utils/Triplet.java --- @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.utils; + +import org.apache.flink.api.java.tuple.Tuple5; +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.Vertex; + +import java.io.Serializable; + +/** + * A wrapper around Tuple5 used to avoid duplicate vertex ids and to improve readability in + * the {@link org.apache.flink.graph.Graph#getTriplets()} method. 
+ *
+ * @param <K> the vertex key type
+ * @param <VV> the vertex value type
+ * @param <EV> the edge value type
+ */
+public class Triplet<K extends Comparable<K> & Serializable, VV extends Serializable, EV extends Serializable>
+	extends Tuple5<K, K, VV, VV, EV> {
+
+	public Triplet() {}
+
+	public Triplet(K srcId, K trgId, VV srcVal, VV trgVal, EV edgeVal) {
+		super(srcId, trgId, srcVal, trgVal, edgeVal);
--- End diff --
Wouldn't a (srcVertex, trgVertex, edge) constructor be more useful/intuitive for a user? We can keep this one too, but better add a javadoc :-)
Add getTriplets() Gelly method -- Key: FLINK-1633 URL: https://issues.apache.org/jira/browse/FLINK-1633 Project: Flink Issue Type: New Feature Components: Gelly Affects Versions: 0.9 Reporter: Vasia Kalavri Assignee: Andra Lungu Priority: Minor Labels: starter In some graph algorithms, it is required to access the graph edges together with the vertex values of the source and target vertices. For example, several graph weighting schemes compute some kind of similarity weights for edges, based on the attributes of the source and target vertices. This issue proposes adding a convenience Gelly method that generates a DataSet of <srcVertex, Edge, TrgVertex> triplets from the input graph. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
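The (srcVertex, trgVertex, edge) constructor suggested in that review could look roughly like this. `Vertex`, `Edge`, and `Triplet` below are minimal hypothetical stand-ins for the gelly classes (no `Tuple5` base, no `Serializable` bounds), just to show the delegation:

```java
// Hypothetical minimal stand-ins for gelly's Vertex and Edge.
class Vertex<K, VV> {
    final K id; final VV value;
    Vertex(K id, VV value) { this.id = id; this.value = value; }
}

class Edge<K, EV> {
    final K source; final K target; final EV value;
    Edge(K source, K target, EV value) { this.source = source; this.target = target; this.value = value; }
}

class Triplet<K, VV, EV> {
    final K srcId, trgId; final VV srcVal, trgVal; final EV edgeVal;

    // the existing field-wise constructor
    Triplet(K srcId, K trgId, VV srcVal, VV trgVal, EV edgeVal) {
        this.srcId = srcId; this.trgId = trgId;
        this.srcVal = srcVal; this.trgVal = trgVal; this.edgeVal = edgeVal;
    }

    // the suggested convenience constructor: no ids duplicated at the call site
    Triplet(Vertex<K, VV> src, Vertex<K, VV> trg, Edge<K, EV> edge) {
        this(src.id, trg.id, src.value, trg.value, edge.value);
    }
}

public class TripletDemo {
    public static void main(String[] args) {
        Vertex<Long, Double> v1 = new Vertex<>(1L, 0.5);
        Vertex<Long, Double> v2 = new Vertex<>(2L, 1.5);
        Edge<Long, Double> e = new Edge<>(1L, 2L, 0.1);

        Triplet<Long, Double, Double> t = new Triplet<>(v1, v2, e);
        System.out.println(t.srcId + "," + t.trgId + "," + t.edgeVal); // prints 1,2,0.1
    }
}
```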
[jira] [Commented] (FLINK-1633) Add getTriplets() Gelly method
[ https://issues.apache.org/jira/browse/FLINK-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358513#comment-14358513 ] ASF GitHub Bot commented on FLINK-1633: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/452#issuecomment-78461681 Thanks for implementing this @andralungu! I have added some inline comments. Also, we are definitely missing documentation and an example using the method. I would suggest something simple, like weighting an input graph by computing the euclidean distance of the src/trg vertex values for each edge and attaching this value as the edge weight. What do you think? Add getTriplets() Gelly method -- Key: FLINK-1633 URL: https://issues.apache.org/jira/browse/FLINK-1633 Project: Flink Issue Type: New Feature Components: Gelly Affects Versions: 0.9 Reporter: Vasia Kalavri Assignee: Andra Lungu Priority: Minor Labels: starter In some graph algorithms, it is required to access the graph edges together with the vertex values of the source and target vertices. For example, several graph weighting schemes compute some kind of similarity weights for edges, based on the attributes of the source and target vertices. This issue proposes adding a convenience Gelly method that generates a DataSet of <srcVertex, Edge, TrgVertex> triplets from the input graph. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
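The suggested example boils down to one computation per edge. A plain-Java sketch of just the distance step (the gelly plumbing over getTriplets() is elided, and the class name is made up for illustration):

```java
public class EuclideanEdgeWeight {

    // Euclidean distance between source and target vertex values;
    // in the proposed example, this would become the new edge weight.
    static double euclidean(double[] srcValue, double[] trgValue) {
        double sum = 0.0;
        for (int i = 0; i < srcValue.length; i++) {
            double d = srcValue[i] - trgValue[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        // Two vertex values; in gelly these would come from one triplet.
        double[] src = {0.0, 3.0};
        double[] trg = {4.0, 0.0};
        System.out.println(euclidean(src, trg)); // prints 5.0
    }
}
```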
[jira] [Resolved] (FLINK-1547) Disable automated ApplicationMaster restart
[ https://issues.apache.org/jira/browse/FLINK-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger resolved FLINK-1547. --- Resolution: Fixed Fix Version/s: 0.9 Resolved in http://git-wip-us.apache.org/repos/asf/flink/commit/13bb21b1. Disable automated ApplicationMaster restart --- Key: FLINK-1547 URL: https://issues.apache.org/jira/browse/FLINK-1547 Project: Flink Issue Type: Bug Components: YARN Client Reporter: Robert Metzger Assignee: Robert Metzger Fix For: 0.9 Currently, Flink on YARN is restarting the ApplicationMaster if it crashes. The other components don't support this (the frontend tries to reconnect). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-1629) Add option to start Flink on YARN in a detached mode
[ https://issues.apache.org/jira/browse/FLINK-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger resolved FLINK-1629. --- Resolution: Fixed Fix Version/s: 0.9 Resolved in http://git-wip-us.apache.org/repos/asf/flink/commit/13bb21b1 Add option to start Flink on YARN in a detached mode Key: FLINK-1629 URL: https://issues.apache.org/jira/browse/FLINK-1629 Project: Flink Issue Type: Improvement Components: YARN Client Reporter: Robert Metzger Assignee: Robert Metzger Fix For: 0.9 Right now, we expect the YARN command line interface to be connected with the Application Master all the time to control the yarn session or the job. For very long running sessions or jobs users want to just fire and forget a job/session to YARN. Stopping the session will still be possible using YARN's tools. Also, prior to detaching itself, the CLI frontend could print the required command to kill the session as a convenience. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1629) Add option to start Flink on YARN in a detached mode
[ https://issues.apache.org/jira/browse/FLINK-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358462#comment-14358462 ] ASF GitHub Bot commented on FLINK-1629: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/468 Add option to start Flink on YARN in a detached mode Key: FLINK-1629 URL: https://issues.apache.org/jira/browse/FLINK-1629 Project: Flink Issue Type: Improvement Components: YARN Client Reporter: Robert Metzger Assignee: Robert Metzger Right now, we expect the YARN command line interface to be connected with the Application Master all the time to control the yarn session or the job. For very long running sessions or jobs users want to just fire and forget a job/session to YARN. Stopping the session will still be possible using YARN's tools. Also, prior to detaching itself, the CLI frontend could print the required command to kill the session as a convenience. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-1630) Add option to YARN client to re-allocate failed containers
[ https://issues.apache.org/jira/browse/FLINK-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger resolved FLINK-1630. --- Resolution: Fixed Fix Version/s: 0.9 Resolved in http://git-wip-us.apache.org/repos/asf/flink/commit/13bb21b1. Add option to YARN client to re-allocate failed containers -- Key: FLINK-1630 URL: https://issues.apache.org/jira/browse/FLINK-1630 Project: Flink Issue Type: Improvement Components: YARN Client Affects Versions: 0.9 Reporter: Robert Metzger Assignee: Robert Metzger Fix For: 0.9 The current Flink YARN client tries to allocate only the initial number of containers. If a containers fail (in particular for long-running sessions) there is no way of re-allocating them. We should add a option to the ApplicationMaster to re-allocate missing/failed containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1576) Change the Gelly examples to be consistent with the other Flink examples
[ https://issues.apache.org/jira/browse/FLINK-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358485#comment-14358485 ] ASF GitHub Bot commented on FLINK-1576: --- Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/470#discussion_r26292649 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/GraphMetrics.java ---
@@ -120,4 +120,58 @@ public Double map(Tuple2<Long, Long> sumTuple) {
 	private static final class ProjectVertexId implements MapFunction<Tuple2<Long, Long>, Long> {
 		public Long map(Tuple2<Long, Long> value) { return value.f0; }
 	}
+
+	@Override
+	public String getDescription() {
+		return "Graph Metrics Example";
+	}
+
+	// **
+	// UTIL METHODS
+	// **
+
+	private static boolean fileOutput = false;
+
+	private static String edgesInputPath = null;
+
+	static final int NUM_VERTICES = 100;
+
+	static final long SEED = 9876;
+
+	private static boolean parseParameters(String[] args) {
+
+		if (args.length > 0) {
+			if (args.length != 1) {
+				System.err.println("Usage: LabelPropagation <vertex path> <edge path> <output path> <num iterations>");
--- End diff --
LabelPropagation should be GraphMetrics
Change the Gelly examples to be consistent with the other Flink examples Key: FLINK-1576 URL: https://issues.apache.org/jira/browse/FLINK-1576 Project: Flink Issue Type: Improvement Components: Gelly Affects Versions: 0.8.0 Reporter: Andra Lungu Assignee: Vasia Kalavri Labels: easyfix The current Gelly examples just work on default input data. If we look at the other Flink examples, e.g. Connected Components, they also allow input data to be read from a text file passed as a parameter to the main method. It would be nice to follow the same approach in our examples. A first step in that direction is the SSSP example. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1576] [gelly] improvements to the gelly...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/470#discussion_r26293641 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/GraphMetrics.java ---
[...]
--- End diff --
Thanks @uce ^^
[GitHub] flink pull request: [FLINK-1683] [jobmanager] Fix scheduling pref...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/476#issuecomment-78471123 Looks good, very important fix! +1 to merge
[jira] [Commented] (FLINK-1615) Introduces a new InputFormat for Tweets
[ https://issues.apache.org/jira/browse/FLINK-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358668#comment-14358668 ] ASF GitHub Bot commented on FLINK-1615: --- Github user Elbehery commented on the pull request: https://github.com/apache/flink/pull/442#issuecomment-78480464 @StephanEwen I have added the license. Introduces a new InputFormat for Tweets --- Key: FLINK-1615 URL: https://issues.apache.org/jira/browse/FLINK-1615 Project: Flink Issue Type: New Feature Components: flink-contrib Affects Versions: 0.8.1 Reporter: mustafa elbehery Priority: Minor An event-driven parser for Tweets into Java Pojos. It parses all the important parts of the tweet into Java objects. Tested on a cluster, and the performance is pretty good. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1350) Add blocking intermediate result partitions
[ https://issues.apache.org/jira/browse/FLINK-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358681#comment-14358681 ] ASF GitHub Bot commented on FLINK-1350: --- Github user uce commented on the pull request: https://github.com/apache/flink/pull/471#issuecomment-78481915 PS: one place where we wouldn't need the asynchronous implementations is local reads. There it should be perfectly fine to just synchronously read the spilled partitions. Add blocking intermediate result partitions --- Key: FLINK-1350 URL: https://issues.apache.org/jira/browse/FLINK-1350 Project: Flink Issue Type: Improvement Components: Distributed Runtime Reporter: Ufuk Celebi Assignee: Ufuk Celebi The current state of runtime support for intermediate results (see https://github.com/apache/incubator-flink/pull/254 and FLINK-986) only supports pipelined intermediate results (with back pressure), which are consumed as they are being produced. The next variant we need to support is blocking intermediate results (without back pressure), which are fully produced before being consumed. This is for example desirable in situations where we currently may run into deadlocks when running pipelined. I will start working on this on top of my pending pull request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1615] [java api] SimpleTweetInputFormat
Github user Elbehery commented on a diff in the pull request: https://github.com/apache/flink/pull/442#discussion_r26300742

--- Diff: flink-contrib/src/test/java/org/apache/flink/contrib/SimpleTweetInputFormatTest.java ---
@@ -0,0 +1,76 @@
+package org.apache.flink.contrib;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.contrib.tweetinputformat.io.SimpleTweetInputFormat;
+import org.apache.flink.contrib.tweetinputformat.model.tweet.Tweet;
+import org.apache.flink.contrib.tweetinputformat.model.tweet.entities.HashTags;
+import org.apache.flink.core.fs.FileInputSplit;
+import org.apache.flink.core.fs.Path;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+import java.io.File;
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+
+public class SimpleTweetInputFormatTest {
+
+    private Tweet tweet;
+
+    private SimpleTweetInputFormat simpleTweetInputFormat;
+
+    private FileInputSplit fileInputSplit;
+
+    protected Configuration config;
+
+    protected File tempFile;
+
+    @Before
+    public void testSetUp() {
+        simpleTweetInputFormat = new SimpleTweetInputFormat();
+        File jsonFile = new File("../flink-contrib/src/main/resources/HashTagTweetSample.json");
+        fileInputSplit = new FileInputSplit(0, new Path(jsonFile.getPath()), 0, jsonFile.length(), new String[]{"localhost"});
+    }
+
+    @Test
+    public void testTweetInput() throws Exception {
+        simpleTweetInputFormat.open(fileInputSplit);
+        List<String> result;
+
+        while (!simpleTweetInputFormat.reachedEnd()) {
+            tweet = new Tweet();
+            tweet = simpleTweetInputFormat.nextRecord(tweet);
+
+            if (tweet != null) {
--- End diff --

DONE
[GitHub] flink pull request: [FLINK-1615] [java api] SimpleTweetInputFormat
Github user Elbehery commented on a diff in the pull request: https://github.com/apache/flink/pull/442#discussion_r26299783

--- Diff: flink-contrib/src/test/java/org/apache/flink/contrib/SimpleTweetInputFormatTest.java (same quoted test as above) ---
+            if (tweet != null) {
--- End diff --

I think this behavior was intended. By checking the DelimitedInputFormat unit test I found this; I will edit my test case to fix the problem.

![selection_019](https://cloud.githubusercontent.com/assets/2375289/6618391/d3366786-c8c2-11e4-8854-2a38a8da4da3.png)
[GitHub] flink pull request: [FLINK-1615] [java api] SimpleTweetInputFormat
Github user Elbehery commented on the pull request: https://github.com/apache/flink/pull/442#issuecomment-78480294 @rmetzger This is the standard Tweet format as per Twitter; here you can find the [Twitter Official Documentation](https://dev.twitter.com/overview/api/tweets). My parser retrieves the whole tweet except the bounding-box object and the retweeted object.
[jira] [Commented] (FLINK-1615) Introduces a new InputFormat for Tweets
[ https://issues.apache.org/jira/browse/FLINK-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358648#comment-14358648 ] ASF GitHub Bot commented on FLINK-1615: --- Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/442#discussion_r26300021

--- Diff: flink-contrib/src/test/java/org/apache/flink/contrib/SimpleTweetInputFormatTest.java (same quoted test as above) ---
+            if (tweet != null) {
--- End diff --

Cool. Then I'd recommend adding a similar test to your test.

Introduces a new InputFormat for Tweets --- Key: FLINK-1615 URL: https://issues.apache.org/jira/browse/FLINK-1615 Project: Flink Issue Type: New Feature Components: flink-contrib Affects Versions: 0.8.1 Reporter: mustafa elbehery Priority: Minor An event-driven parser for Tweets into Java POJOs. It parses all the important parts of a tweet into Java objects. Tested on a cluster, and the performance is pretty good. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1350] [runtime] Add blocking result par...
Github user uce commented on the pull request: https://github.com/apache/flink/pull/471#issuecomment-78479167

The root cause of all asynchronous operations is that TCP connections are shared among multiple logical channels, which are handled by a fixed number of network I/O threads. With synchronous I/O operations, we would essentially block progress on all channels sharing that connection/thread.

> When do you issue the read requests to the reader (from disk)? Is that dependent on when the TCP channel is writable?

Yes, the network I/O thread has subpartitions queued for transfer and only queries them for data when the TCP channel is writable.

> When the read request is issued, before the response comes, is the subpartition de-registered from Netty and then re-registered once a buffer has returned from disk?

Exactly. If there is no buffer available, the read request is issued and the next available subpartition is tried. If none of the subpartitions has data available, the network I/O thread works on another TCP channel (this is done by Netty, which multiplexes all TCP channels over a fixed number of network I/O threads).

> Given many spilled partitions, which one is read from next? How is the buffer assignment realized? There is a lot of trickiness in there, because disk I/O performs well with longer sequential reads, but that may occupy many buffers that are missing for other reads into writable TCP channels.

Initially this depends on the order of partition requests, after that on the order of data availability. Regarding the buffers: trickiness, indeed. The current state with the buffers is kind of an intermediate solution, as we will issue zero-transfer reads in the future (requires minimal changes), where we essentially only trigger reads to gather offsets. The reads are then only affected by TCP channel writability. Currently, the reads are batched in sizes of two buffers (64k).

Regarding @tillrohrmann's changes: what was this exactly? Then I can verify that the changes are not undone.

In general (minus the question regarding Till's changes) I think this PR is good to merge. The tests are stable and passing. There will definitely be a need for refactorings and performance evaluations, but I think that is to be expected with such a big change.
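The registration dance described above (a subpartition is queued for transfer only while it has data ready; during a disk read it leaves the queue and the read callback re-enqueues it) can be illustrated with a toy model. This is a deliberately simplified, hypothetical sketch: all class and method names are invented, and the asynchronous disk read is completed inline rather than on a separate I/O thread. It is not Flink's actual runtime code.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy model of the subpartition transfer queue discussed in the comment above.
// Purely illustrative; names and structure are invented for this sketch.
public class TransferQueueSketch {

    public static class Subpartition {
        final String id;
        int inMemoryBuffers;   // buffers ready to send
        int spilledBuffers;    // buffers that must first be read from disk

        public Subpartition(String id, int inMemoryBuffers, int spilledBuffers) {
            this.id = id;
            this.inMemoryBuffers = inMemoryBuffers;
            this.spilledBuffers = spilledBuffers;
        }
    }

    private final Queue<Subpartition> transferQueue = new ArrayDeque<>();
    private final StringBuilder log = new StringBuilder();

    public void register(Subpartition p) {
        transferQueue.add(p);
    }

    // Invoked by the network I/O thread each time the TCP channel is writable.
    public void onChannelWritable() {
        Subpartition p = transferQueue.poll();
        if (p == null) {
            return; // nothing ready: the thread moves on to another TCP channel
        }
        if (p.inMemoryBuffers > 0) {
            p.inMemoryBuffers--;
            log.append("send:").append(p.id).append(';');
            register(p); // stays registered while data is available
        } else if (p.spilledBuffers > 0) {
            p.spilledBuffers--;
            log.append("read:").append(p.id).append(';');
            // De-registered at this point; in the real runtime the read completes
            // asynchronously. Here we complete it inline and re-register.
            p.inMemoryBuffers++;
            register(p);
        }
        // If both counts are zero the subpartition is exhausted and stays out.
    }

    public String trace() {
        return log.toString();
    }

    public static void main(String[] args) {
        TransferQueueSketch sketch = new TransferQueueSketch();
        sketch.register(new Subpartition("p1", 1, 1));
        for (int i = 0; i < 4; i++) {
            sketch.onChannelWritable();
        }
        System.out.println(sketch.trace());
    }
}
```

The point of the sketch is only the control flow: a send while a buffer is in memory, a de-register/read/re-register cycle while data is spilled, and silent removal once the subpartition is exhausted.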
[jira] [Commented] (FLINK-1695) Create machine learning library
[ https://issues.apache.org/jira/browse/FLINK-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359846#comment-14359846 ] ASF GitHub Bot commented on FLINK-1695: --- Github user junwucs commented on the pull request: https://github.com/apache/flink/pull/479#issuecomment-78766228 Hi @tillrohrmann, great job! I am interested in the ALS implementation and am preparing to use it in our work. Recently I worked on a branch of https://github.com/tillrohrmann/flink-perf and met some tricky unsolved bugs, such as org.apache.flink.api.common.functions.InvalidTypesException: Could not convert GenericArrayType to Class. So I found this new branch and hope your work can help me; thanks a lot! Create machine learning library --- Key: FLINK-1695 URL: https://issues.apache.org/jira/browse/FLINK-1695 Project: Flink Issue Type: Improvement Reporter: Till Rohrmann Assignee: Till Rohrmann Create the infrastructure for Flink's machine learning library. This includes the creation of the module structure and the implementation of basic types such as vectors and matrices. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: IntelliJ Setup Guide
GitHub user nltran opened a pull request: https://github.com/apache/flink/pull/480 IntelliJ Setup Guide Following my interactions with fhueske on the IRC channel, I summarized the steps to correctly set up IntelliJ with flink core. You can merge this pull request into a Git repository by running: $ git pull https://github.com/nltran/flink master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/480.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #480 commit f8f6b8bf52ae21d549cf44f323633ddeca6c9dbf Author: Nam-Luc Tran namluc.t...@euranova.eu Date: 2015-03-11T17:45:38Z added documentation on setup of intelliJ commit 170c6330d917c878a8c10e4f0b302a9e14b8aa09 Author: Nam-Luc Tran namluc.t...@euranova.eu Date: 2015-03-12T15:13:21Z Merge branch 'master' of https://github.com/nltran/flink
[GitHub] flink pull request: IntelliJ Setup Guide
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/480
[jira] [Commented] (FLINK-1388) POJO support for writeAsCsv
[ https://issues.apache.org/jira/browse/FLINK-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358863#comment-14358863 ] Adnan Khan commented on FLINK-1388: --- Also - why should one pass the names of all the fields that should be written out? If I have a reference to the POJO itself, I can always just grab the {{Field[]}} using Java's reflection API. I feel I've missed something. POJO support for writeAsCsv --- Key: FLINK-1388 URL: https://issues.apache.org/jira/browse/FLINK-1388 Project: Flink Issue Type: New Feature Components: Java API Reporter: Timo Walther Assignee: Adnan Khan Priority: Minor It would be great if one could simply write out POJOs in CSV format. {code} public class MyPojo { String a; int b; } {code} to: {code} # CSV file of org.apache.flink.MyPojo: String a, int b Hello World, 42 Hello World 2, 47 ... {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
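The reflection idea raised in the comment above can be sketched as follows. This is a hypothetical illustration, not Flink's actual writeAsCsv implementation: the class names are invented, and it relies on the common (but not JLS-guaranteed) behavior that getDeclaredFields() returns fields in declaration order.

```java
import java.lang.reflect.Field;

// Hypothetical sketch: derive CSV columns from a POJO's declared fields via
// reflection instead of passing field names explicitly. Not Flink's API.
public class PojoCsvSketch {

    // The POJO from the issue description.
    public static class MyPojo {
        String a;
        int b;

        public MyPojo(String a, int b) {
            this.a = a;
            this.b = b;
        }
    }

    // Turn one POJO instance into a CSV line by reflecting over its fields.
    // Assumes declaration-order field enumeration, which holds on common JVMs.
    public static String toCsvLine(Object pojo) {
        StringBuilder line = new StringBuilder();
        Field[] fields = pojo.getClass().getDeclaredFields();
        try {
            for (int i = 0; i < fields.length; i++) {
                fields[i].setAccessible(true); // also read non-public fields
                if (i > 0) {
                    line.append(',');
                }
                line.append(fields[i].get(pojo));
            }
        } catch (IllegalAccessException e) {
            throw new RuntimeException(e);
        }
        return line.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCsvLine(new MyPojo("Hello World", 42)));
    }
}
```

A real implementation would still need to handle escaping, null values, and nested POJOs, which is presumably part of why the field-name parameter exists.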
[GitHub] flink pull request: IntelliJ Setup Guide
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/480#issuecomment-78508276 Great setup guide for IntelliJ @nltran. I'll link it to the internals and merge it.
[GitHub] flink pull request: Kick off of Flink's machine learning library
Github user hsaputra commented on the pull request: https://github.com/apache/flink/pull/479#issuecomment-78541637 @tillrohrmann, could you file a JIRA for this one? It should help when we want to manage releases or merge between branches.
[GitHub] flink pull request: [FLINK-1450] Added fold operator for the Strea...
GitHub user akshaydixi opened a pull request: https://github.com/apache/flink/pull/481 [FLINK-1450] Added fold operator for the Streaming API Tried to follow the steps described in the JIRA issue comments: - Created a FoldFunction and a RichFoldFunction - Created a StreamFoldInvokable - Integrated it to the DataStream for both Java and Scala - Implemented StreamFoldTest based on StreamReduceTest You can merge this pull request into a Git repository by running: $ git pull https://github.com/akshaydixi/flink FLINK-1450 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/481.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #481 commit 9848f23a0a73228fb45346d69aee68091b135d3d Author: Akshay Dixit akshayd...@gmail.com Date: 2015-03-12T17:49:31Z [FLINK-1450] Added fold operator for the Streaming API
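The fold operator generalizes reduce by allowing an accumulator type different from the element type, like Scala's foldLeft. A minimal sketch of those semantics follows; the names here (FoldDemo, FoldFunction, fold) are illustrative and not necessarily the PR's actual API.

```java
import java.util.Arrays;
import java.util.List;

// Minimal sketch of fold semantics, analogous to Scala's foldLeft.
// Names are illustrative, not necessarily the Streaming API's actual classes.
public class FoldDemo {

    // O is the accumulator/output type, T the element type. They may differ,
    // which is exactly what plain reduce (T, T) -> T cannot express.
    public interface FoldFunction<O, T> {
        O fold(O accumulator, T value);
    }

    // Apply the folder to every element, threading the accumulator through.
    public static <O, T> O fold(O initial, List<T> elements, FoldFunction<O, T> folder) {
        O accumulator = initial;
        for (T value : elements) {
            accumulator = folder.fold(accumulator, value);
        }
        return accumulator;
    }

    public static void main(String[] args) {
        // Fold integers into a String accumulator: return type differs
        // from element type, so this cannot be written as a reduce.
        String concatenated = fold("", Arrays.asList(1, 2, 3), (acc, x) -> acc + x);
        System.out.println(concatenated);
    }
}
```

In a streaming setting the same accumulator would be threaded through arriving records instead of a list, but the typing argument is identical.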
[jira] [Commented] (FLINK-1450) Add Fold operator to the Streaming api
[ https://issues.apache.org/jira/browse/FLINK-1450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359069#comment-14359069 ] ASF GitHub Bot commented on FLINK-1450: --- GitHub user akshaydixi opened a pull request: https://github.com/apache/flink/pull/481 [FLINK-1450] Added fold operator for the Streaming API (pull request details quoted above) Add Fold operator to the Streaming api -- Key: FLINK-1450 URL: https://issues.apache.org/jira/browse/FLINK-1450 Project: Flink Issue Type: New Feature Components: Streaming Affects Versions: 0.9 Reporter: Gyula Fora Priority: Minor Labels: starter The streaming API currently doesn't support a fold operator. This operator would work like the foldLeft method in Scala. It would allow effective implementations in a lot of cases where the simple reduce is inappropriate due to different return types. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1677][gelly] Suppressed Sysout Printing...
Github user andralungu commented on the pull request: https://github.com/apache/flink/pull/475#issuecomment-78622670 Hi @uce, First of all, thanks! :) Your comments are entirely true, especially the one about MultipleProgramsTestBase. I spent some time figuring out how to unit test this behavior. I got advice from different people... hopefully this last version will be satisfactory. @vasia, once Travis passes, if nobody else has objections, maybe we can merge this :)
[jira] [Commented] (FLINK-1677) Properly Suppress Sysout Printing for the Degrees with exception test suite
[ https://issues.apache.org/jira/browse/FLINK-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359440#comment-14359440 ] ASF GitHub Bot commented on FLINK-1677: --- Github user andralungu commented on the pull request: https://github.com/apache/flink/pull/475#issuecomment-78622670 Hi @uce, First of all, thanks! :) Your comments are entirely true, especially the one about MultipleProgramsTestBase. I spent some time figuring out how to unit test this behavior. I got advice from different people... hopefully this last version will be satisfactory. @vasia, once Travis passes, if nobody else has objections, maybe we can merge this :) Properly Suppress Sysout Printing for the Degrees with exception test suite --- Key: FLINK-1677 URL: https://issues.apache.org/jira/browse/FLINK-1677 Project: Flink Issue Type: Bug Components: Gelly Affects Versions: 0.9 Reporter: Andra Lungu Assignee: Andra Lungu Priority: Minor Suppress sysout printing similar to: flink-clients/src/test/java/org/apache/flink/client/program/ExecutionPlanAfterExecutionTest.java Speed up the test suite by reusing a mini-cluster similar to: flink-tests/src/test/java/org/apache/flink/test/recovery/SimpleRecoveryITCase.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: Add support for building Flink with Scala 2.11
Github user aalexandrov commented on the pull request: https://github.com/apache/flink/pull/477#issuecomment-78502251 [Some of the tests failed](https://travis-ci.org/apache/flink/builds/53879571) but I'm not sure whether this is related to the changes or not. I get the following error in the three failed Travis jobs: ``` [ERROR] Failed to execute goal on project XXX: Could not resolve dependencies for project org.apache.flink.archetypetest:testArtifact:jar:0.1: Could not find artifact org.apache.flink:flink-shaded-include-yarn:jar:0.9-SNAPSHOT in sonatype-snapshots (https://oss.sonatype.org/content/repositories/snapshots/) ``` Could it be that the new artifact has not yet made it to Sonatype?
[jira] [Commented] (FLINK-1388) POJO support for writeAsCsv
[ https://issues.apache.org/jira/browse/FLINK-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358878#comment-14358878 ] Fabian Hueske commented on FLINK-1388: -- Sure you can do that. I thought it might be a good idea to give the option to select the fields to write out and their order. But we of course start with writing all fields. POJO support for writeAsCsv --- Key: FLINK-1388 URL: https://issues.apache.org/jira/browse/FLINK-1388 Project: Flink Issue Type: New Feature Components: Java API Reporter: Timo Walther Assignee: Adnan Khan Priority: Minor It would be great if one could simply write out POJOs in CSV format. {code} public class MyPojo { String a; int b; } {code} to: {code} # CSV file of org.apache.flink.MyPojo: String a, int b Hello World, 42 Hello World 2, 47 ... {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
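The "write all fields first" approach Fabian describes can be sketched with plain reflection; the helper name below is hypothetical and this is not Flink's actual writeAsCsv implementation:

```java
import java.lang.reflect.Field;
import java.util.Arrays;
import java.util.Comparator;
import java.util.StringJoiner;

public class PojoCsvSketch {

    // POJO mirroring the example from the issue.
    public static class MyPojo {
        String a;
        int b;
        public MyPojo(String a, int b) { this.a = a; this.b = b; }
    }

    // Hypothetical helper: writes all declared fields of a POJO as one CSV
    // line. Fields are sorted by name for a deterministic order, since
    // reflection does not guarantee declaration order.
    public static String toCsvLine(Object pojo) {
        Field[] fields = pojo.getClass().getDeclaredFields();
        Arrays.sort(fields, Comparator.comparing(Field::getName));
        StringJoiner joiner = new StringJoiner(",");
        for (Field field : fields) {
            field.setAccessible(true);
            try {
                joiner.add(String.valueOf(field.get(pojo)));
            } catch (IllegalAccessException e) {
                throw new RuntimeException(e);
            }
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCsvLine(new MyPojo("Hello World", 42))); // Hello World,42
    }
}
```

Field selection and ordering, as proposed in the comment, would then be a thin layer on top of this: filter and reorder the Field array before joining.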
[GitHub] flink pull request: [FLINK-1605] Bundle all hadoop dependencies an...
Github user hsaputra commented on the pull request: https://github.com/apache/flink/pull/454#issuecomment-78560058 How much time does this new shading add to the total compile? It used to take around 16-18 minutes for me using mvn clean install -DskipTests. I just merged today, and it has been more than 25 minutes and has not completed the build =(
[jira] [Commented] (FLINK-1605) Create a shaded Hadoop fat jar to resolve library version conflicts
[ https://issues.apache.org/jira/browse/FLINK-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359139#comment-14359139 ] ASF GitHub Bot commented on FLINK-1605: --- Github user hsaputra commented on the pull request: https://github.com/apache/flink/pull/454#issuecomment-78560058 How much time does this new shading add to the total compile? It used to take around 16-18 minutes for me using mvn clean install -DskipTests. I just merged today, and it has been more than 25 minutes and has not completed the build =( Create a shaded Hadoop fat jar to resolve library version conflicts --- Key: FLINK-1605 URL: https://issues.apache.org/jira/browse/FLINK-1605 Project: Flink Issue Type: Improvement Components: Build System Affects Versions: 0.9 Reporter: Robert Metzger Assignee: Robert Metzger As per mailing list discussion: http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Create-a-shaded-Hadoop-fat-jar-to-resolve-library-version-conflicts-td3881.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-1695) Create machine learning library
Till Rohrmann created FLINK-1695: Summary: Create machine learning library Key: FLINK-1695 URL: https://issues.apache.org/jira/browse/FLINK-1695 Project: Flink Issue Type: Improvement Reporter: Till Rohrmann Assignee: Till Rohrmann Create the infrastructure for Flink's machine learning library. This includes the creation of the module structure and the implementation of basic types such as vectors and matrices. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-1697) Add alternating least squares algorithm for matrix factorization to ML library
[ https://issues.apache.org/jira/browse/FLINK-1697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-1697: - Summary: Add alternating least squares algorithm for matrix factorization to ML library (was: Add alternating least squares algorithm to ML library) Add alternating least squares algorithm for matrix factorization to ML library -- Key: FLINK-1697 URL: https://issues.apache.org/jira/browse/FLINK-1697 Project: Flink Issue Type: Improvement Reporter: Till Rohrmann Assignee: Till Rohrmann Add alternating least squares algorithm to ML library -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-1697) Add alternating least squares algorithm for matrix factorization to ML library
[ https://issues.apache.org/jira/browse/FLINK-1697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-1697: - Description: Add alternating least squares algorithm for matrix factorization to ML library (was: Add alternating least squares algorithm to ML library) Add alternating least squares algorithm for matrix factorization to ML library -- Key: FLINK-1697 URL: https://issues.apache.org/jira/browse/FLINK-1697 Project: Flink Issue Type: Improvement Reporter: Till Rohrmann Assignee: Till Rohrmann Add alternating least squares algorithm for matrix factorization to ML library -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-1696) Add multiple linear regression to ML library
Till Rohrmann created FLINK-1696: Summary: Add multiple linear regression to ML library Key: FLINK-1696 URL: https://issues.apache.org/jira/browse/FLINK-1696 Project: Flink Issue Type: Improvement Reporter: Till Rohrmann Assignee: Till Rohrmann Add multiple linear regression to ML library. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
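For reference, the core of multiple linear regression can be sketched as batch gradient descent on squared error; this is a conceptual single-machine sketch for one feature, not the planned distributed Flink implementation:

```java
public class LinearRegressionSketch {

    // Fits y = w*x + b by batch gradient descent on mean squared error.
    public static double[] fit(double[] xs, double[] ys, double learningRate, int steps) {
        double w = 0.0, b = 0.0;
        int n = xs.length;
        for (int step = 0; step < steps; step++) {
            double gradW = 0.0, gradB = 0.0;
            for (int i = 0; i < n; i++) {
                double error = (w * xs[i] + b) - ys[i];
                gradW += error * xs[i];
                gradB += error;
            }
            w -= learningRate * gradW / n;
            b -= learningRate * gradB / n;
        }
        return new double[] { w, b };
    }

    public static void main(String[] args) {
        // Data generated from y = 2x + 1; the fit should approach w=2, b=1.
        double[] model = fit(new double[] {0, 1, 2, 3}, new double[] {1, 3, 5, 7}, 0.1, 2000);
        System.out.printf("w=%.3f b=%.3f%n", model[0], model[1]);
    }
}
```

A distributed version would compute the per-record gradient contributions in parallel and sum them per iteration.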
[jira] [Created] (FLINK-1697) Add alternating least squares algorithm to ML library
Till Rohrmann created FLINK-1697: Summary: Add alternating least squares algorithm to ML library Key: FLINK-1697 URL: https://issues.apache.org/jira/browse/FLINK-1697 Project: Flink Issue Type: Improvement Reporter: Till Rohrmann Assignee: Till Rohrmann Add alternating least squares algorithm to ML library -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-1698) Add polynomial base feature mapper to ML library
Till Rohrmann created FLINK-1698: Summary: Add polynomial base feature mapper to ML library Key: FLINK-1698 URL: https://issues.apache.org/jira/browse/FLINK-1698 Project: Flink Issue Type: Improvement Reporter: Till Rohrmann Assignee: Till Rohrmann Add feature mapper which maps a vector into the polynomial feature space. This can be used as a preprocessing step prior to applying a {{Learner}} of Flink's ML library. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
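The mapping into polynomial feature space described above can be illustrated for the one-dimensional case: x is expanded to [x, x^2, ..., x^degree] before a learner is applied. The method name is illustrative, not the actual ML-library API:

```java
import java.util.ArrayList;
import java.util.List;

public class PolynomialFeaturesSketch {

    // Maps a scalar feature x to [x, x^2, ..., x^degree], a preprocessing
    // step that lets a linear learner fit polynomial relationships.
    public static List<Double> mapToPolynomialSpace(double x, int degree) {
        List<Double> features = new ArrayList<>();
        double power = 1.0;
        for (int i = 1; i <= degree; i++) {
            power *= x;
            features.add(power);
        }
        return features;
    }

    public static void main(String[] args) {
        System.out.println(mapToPolynomialSpace(2.0, 3)); // [2.0, 4.0, 8.0]
    }
}
```

For multi-dimensional input vectors the expansion would additionally include cross terms, which grows combinatorially with the degree.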
[GitHub] flink pull request: [FLINK-1695] Kick off of Flink's machine learn...
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/479#issuecomment-78709707 I filed the corresponding JIRA issues. I'll update the commits with the respective JIRA tags once the PR is merged.
[jira] [Commented] (FLINK-1695) Create machine learning library
[ https://issues.apache.org/jira/browse/FLINK-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359673#comment-14359673 ] ASF GitHub Bot commented on FLINK-1695: --- Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/479#issuecomment-78709707 I filed the corresponding JIRA issues. I'll update the commits with the respective JIRA tags once the PR is merged. Create machine learning library --- Key: FLINK-1695 URL: https://issues.apache.org/jira/browse/FLINK-1695 Project: Flink Issue Type: Improvement Reporter: Till Rohrmann Assignee: Till Rohrmann Create the infrastructure for Flink's machine learning library. This includes the creation of the module structure and the implementation of basic types such as vectors and matrices. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1677) Properly Suppress Sysout Printing for the Degrees with exception test suite
[ https://issues.apache.org/jira/browse/FLINK-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358483#comment-14358483 ] ASF GitHub Bot commented on FLINK-1677: --- Github user uce commented on the pull request: https://github.com/apache/flink/pull/475#issuecomment-78459308 Thanks for the PR. I think it's a good fix, **but**: I noticed that `DegreesWithExceptionITCase.testGetDegreesInvalidEdgeSrcId` is sometimes failing (for example here: https://s3.amazonaws.com/archive.travis-ci.org/jobs/53985893/log.txt). I initially thought that it was a problem with my runtime changes in #471, but the problem is the following: You expect a certain exception msg (The edge src/trg id could not be found within the vertexIds) for the test to pass. But in the getDegrees() test there is a reducer following the coGroup (this one throws the expected exception), which can also fail with another exception, because of the cancelling before in the coGroup. If this other exception gets reported to the client, the test fails. I think it would be enough to test that the job fails, or even think about whether it makes sense to cover this in a unit test of the operator, because imo it all boils down to the behaviour of the degree-computing coGroup, which throws the expected `NoSuchElementException`. --- Other than that: - Initially I wondered whether you should extend MultipleProgramsTestBase as in other test cases, but I think that doesn't work with the exception checking. In the future, we should add some test utils to test failing high-level programs. - Typo in the commit msg: suppres**s**ed If you think it's OK to just test that the program is failing, go ahead and change it. After that it would be good to merge.
Properly Suppress Sysout Printing for the Degrees with exception test suite --- Key: FLINK-1677 URL: https://issues.apache.org/jira/browse/FLINK-1677 Project: Flink Issue Type: Bug Components: Gelly Affects Versions: 0.9 Reporter: Andra Lungu Assignee: Andra Lungu Priority: Minor Suppress sysout printing similar to: flink-clients/src/test/java/org/apache/flink/client/program/ExecutionPlanAfterExecutionTest.java Speed up the test suite by reusing a mini-cluster similar to: flink-tests/src/test/java/org/apache/flink/test/recovery/SimpleRecoveryITCase.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1576) Change the Gelly examples to be consistent with the other Flink examples
[ https://issues.apache.org/jira/browse/FLINK-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358479#comment-14358479 ] ASF GitHub Bot commented on FLINK-1576: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/470#issuecomment-78459161 Hi! If no comments here, I'd like to merge this one :) Change the Gelly examples to be consistent with the other Flink examples Key: FLINK-1576 URL: https://issues.apache.org/jira/browse/FLINK-1576 Project: Flink Issue Type: Improvement Components: Gelly Affects Versions: 0.8.0 Reporter: Andra Lungu Assignee: Vasia Kalavri Labels: easyfix The current Gelly examples just work on default input data. If we look at the other Flink examples, e.g. Connected Components, they also allow input data to be read from a text file passed as a parameter to the main method. It would be nice to follow the same approach in our examples. A first step in that direction is the SSSP example. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1677][gelly] Suppresed Sysout Printing
Github user uce commented on the pull request: https://github.com/apache/flink/pull/475#issuecomment-78459308 Thanks for the PR. I think it's a good fix, **but**: I noticed that `DegreesWithExceptionITCase.testGetDegreesInvalidEdgeSrcId` is sometimes failing (for example here: https://s3.amazonaws.com/archive.travis-ci.org/jobs/53985893/log.txt). I initially thought that it was a problem with my runtime changes in #471, but the problem is the following: You expect a certain exception msg (The edge src/trg id could not be found within the vertexIds) for the test to pass. But in the getDegrees() test there is a reducer following the coGroup (this one throws the expected exception), which can also fail with another exception, because of the cancelling before in the coGroup. If this other exception gets reported to the client, the test fails. I think it would be enough to test that the job fails, or even think about whether it makes sense to cover this in a unit test of the operator, because imo it all boils down to the behaviour of the degree-computing coGroup, which throws the expected `NoSuchElementException`. --- Other than that: - Initially I wondered whether you should extend MultipleProgramsTestBase as in other test cases, but I think that doesn't work with the exception checking. In the future, we should add some test utils to test failing high-level programs. - Typo in the commit msg: suppres**s**ed If you think it's OK to just test that the program is failing, go ahead and change it. After that it would be good to merge.
[jira] [Commented] (FLINK-1677) Properly Suppress Sysout Printing for the Degrees with exception test suite
[ https://issues.apache.org/jira/browse/FLINK-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358484#comment-14358484 ] ASF GitHub Bot commented on FLINK-1677: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/475#issuecomment-78459525 +1 for what @uce said :) Properly Suppress Sysout Printing for the Degrees with exception test suite --- Key: FLINK-1677 URL: https://issues.apache.org/jira/browse/FLINK-1677 Project: Flink Issue Type: Bug Components: Gelly Affects Versions: 0.9 Reporter: Andra Lungu Assignee: Andra Lungu Priority: Minor Suppress sysout printing similar to: flink-clients/src/test/java/org/apache/flink/client/program/ExecutionPlanAfterExecutionTest.java Speed up the test suite by reusing a mini-cluster similar to: flink-tests/src/test/java/org/apache/flink/test/recovery/SimpleRecoveryITCase.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1615) Introduces a new InputFormat for Tweets
[ https://issues.apache.org/jira/browse/FLINK-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358566#comment-14358566 ] ASF GitHub Bot commented on FLINK-1615: --- Github user Elbehery commented on a diff in the pull request: https://github.com/apache/flink/pull/442#discussion_r26295918 --- Diff: flink-contrib/src/test/java/org/apache/flink/contrib/SimpleTweetInputFormatTest.java --- @@ -0,0 +1,76 @@ +package org.apache.flink.contrib; + +import org.apache.flink.configuration.Configuration; +import org.apache.flink.contrib.tweetinputformat.io.SimpleTweetInputFormat; +import org.apache.flink.contrib.tweetinputformat.model.tweet.Tweet; +import org.apache.flink.contrib.tweetinputformat.model.tweet.entities.HashTags; +import org.apache.flink.core.fs.FileInputSplit; +import org.apache.flink.core.fs.Path; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; +import java.io.File; +import java.util.ArrayList; +import java.util.Iterator; +import java.util.List; + + +public class SimpleTweetInputFormatTest { + +private Tweet tweet; + +private SimpleTweetInputFormat simpleTweetInputFormat; + +private FileInputSplit fileInputSplit; + +protected Configuration config; + +protected File tempFile; + + +@Before +public void testSetUp() { + + +simpleTweetInputFormat = new SimpleTweetInputFormat(); + +File jsonFile = new File(../flink-contrib/src/main/resources/HashTagTweetSample.json); + +fileInputSplit = new FileInputSplit(0, new Path(jsonFile.getPath()), 0, jsonFile.length(), new String[] {localhost}); +} + +@Test +public void testTweetInput() throws Exception { + + +simpleTweetInputFormat.open(fileInputSplit); +ListString result; + +while (!simpleTweetInputFormat.reachedEnd()) { +tweet = new Tweet(); +tweet = simpleTweetInputFormat.nextRecord(tweet); + +if(tweet != null){ --- End diff -- I tried to do so, but the problem is that the reachedEnd() is updated inside nextRecord() in DelimitedInputFormat. 
I tried to add a condition so that nextRecord is not called when readPos == limit, but I could not because these fields are private. So I have to stop reading when reachedEnd is true, which in turn means that the object returned from nextRecord [ the tweet ] is null. If I add the required assert, all the tests fail. I do not know how to avoid this problem; do you have a suggestion? Introduces a new InputFormat for Tweets --- Key: FLINK-1615 URL: https://issues.apache.org/jira/browse/FLINK-1615 Project: Flink Issue Type: New Feature Components: flink-contrib Affects Versions: 0.8.1 Reporter: mustafa elbehery Priority: Minor An event-driven parser for Tweets into Java POJOs. It parses all the important parts of a tweet into Java objects. Tested on a cluster, and the performance is pretty good. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
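The interplay described here, where the end-of-input flag is only updated inside nextRecord so the final call returns null, can be sketched with a toy stand-in (the real DelimitedInputFormat is not reproduced; names are illustrative):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ReadLoopSketch {

    // Simplified stand-in for an input format whose reachedEnd() flag is
    // only updated inside nextRecord(), mimicking the behavior discussed
    // in the review thread above.
    public static class ToyFormat {
        private final Iterator<String> source;
        private boolean end;

        public ToyFormat(List<String> records) { this.source = records.iterator(); }

        public boolean reachedEnd() { return end; }

        // Returns null once the input is exhausted; callers must null-check.
        public String nextRecord(String reuse) {
            if (!source.hasNext()) {
                end = true;
                return null;
            }
            return source.next();
        }
    }

    public static List<String> readAll(ToyFormat format) {
        List<String> result = new ArrayList<>();
        while (!format.reachedEnd()) {
            String record = format.nextRecord(null);
            if (record != null) { // the final call yields null, hence the check
                result.add(record);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(readAll(new ToyFormat(List.of("a", "b")))); // [a, b]
    }
}
```

This shows why the null check cannot simply be turned into an assertion failure: the loop necessarily makes one final nextRecord call that returns null before reachedEnd becomes true.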
[jira] [Commented] (FLINK-1615) Introduces a new InputFormat for Tweets
[ https://issues.apache.org/jira/browse/FLINK-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358570#comment-14358570 ] ASF GitHub Bot commented on FLINK-1615: --- Github user Elbehery commented on a diff in the pull request: https://github.com/apache/flink/pull/442#discussion_r26296323 --- Diff: flink-contrib/src/main/java/org/apache/flink/contrib/tweetinputformat/io/SimpleTweetInputFormat.java --- @@ -0,0 +1,67 @@ +package org.apache.flink.contrib.tweetinputformat.io; + +import org.apache.flink.api.common.io.DelimitedInputFormat; +import org.apache.flink.api.common.typeinfo.TypeInformation; +import org.apache.flink.api.java.typeutils.GenericTypeInfo; +import org.apache.flink.api.java.typeutils.ResultTypeQueryable; +import org.apache.flink.contrib.tweetinputformat.model.tweet.Tweet; +import org.codehaus.jackson.JsonParseException; +import org.json.simple.parser.JSONParser; +import org.json.simple.parser.ParseException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; +import java.io.ByteArrayInputStream; +import java.io.IOException; +import java.io.InputStreamReader; + + +public class SimpleTweetInputFormat extends DelimitedInputFormatTweet implements ResultTypeQueryableTweet { + +private static final Logger LOG = LoggerFactory.getLogger(SimpleTweetInputFormat.class); + +@Override +public Tweet nextRecord(Tweet record) throws IOException { +Boolean result = false; + +do { +try { +record.reset(0); +record = super.nextRecord(record); +result = true; + +} catch (JsonParseException e) { +result = false; + +} +} while (!result); + +return record; +} + +@Override +public Tweet readRecord(Tweet reuse, byte[] bytes, int offset, int numBytes) throws IOException { + + +InputStreamReader jsonReader = new InputStreamReader(new ByteArrayInputStream(bytes)); +jsonReader.skip(offset); + +JSONParser parser = new JSONParser(); --- End diff -- It worked now, Thanks Introduces a new InputFormat for Tweets --- Key: FLINK-1615 
URL: https://issues.apache.org/jira/browse/FLINK-1615 Project: Flink Issue Type: New Feature Components: flink-contrib Affects Versions: 0.8.1 Reporter: mustafa elbehery Priority: Minor An event-driven parser for Tweets into Java POJOs. It parses all the important parts of a tweet into Java objects. Tested on a cluster, and the performance is pretty good. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1615] [java api] SimpleTweetInputFormat
Github user Elbehery commented on a diff in the pull request: https://github.com/apache/flink/pull/442#discussion_r26296323 --- Diff: flink-contrib/src/main/java/org/apache/flink/contrib/tweetinputformat/io/SimpleTweetInputFormat.java --- @@ -0,0 +1,67 @@ +package org.apache.flink.contrib.tweetinputformat.io; + +import org.apache.flink.api.common.io.DelimitedInputFormat; +import org.apache.flink.api.common.typeinfo.TypeInformation; +import org.apache.flink.api.java.typeutils.GenericTypeInfo; +import org.apache.flink.api.java.typeutils.ResultTypeQueryable; +import org.apache.flink.contrib.tweetinputformat.model.tweet.Tweet; +import org.codehaus.jackson.JsonParseException; +import org.json.simple.parser.JSONParser; +import org.json.simple.parser.ParseException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; +import java.io.ByteArrayInputStream; +import java.io.IOException; +import java.io.InputStreamReader; + + +public class SimpleTweetInputFormat extends DelimitedInputFormatTweet implements ResultTypeQueryableTweet { + +private static final Logger LOG = LoggerFactory.getLogger(SimpleTweetInputFormat.class); + +@Override +public Tweet nextRecord(Tweet record) throws IOException { +Boolean result = false; + +do { +try { +record.reset(0); +record = super.nextRecord(record); +result = true; + +} catch (JsonParseException e) { +result = false; + +} +} while (!result); + +return record; +} + +@Override +public Tweet readRecord(Tweet reuse, byte[] bytes, int offset, int numBytes) throws IOException { + + +InputStreamReader jsonReader = new InputStreamReader(new ByteArrayInputStream(bytes)); +jsonReader.skip(offset); + +JSONParser parser = new JSONParser(); --- End diff -- It worked now, Thanks --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
[GitHub] flink pull request: [FLINK-1615] [java api] SimpleTweetInputFormat
Github user Elbehery commented on a diff in the pull request: https://github.com/apache/flink/pull/442#discussion_r26296679 --- Diff: flink-contrib/src/test/java/org/apache/flink/contrib/SimpleTweetInputFormatTest.java --- @@ -0,0 +1,76 @@ +package org.apache.flink.contrib; --- End diff -- DONE
[jira] [Commented] (FLINK-1615) Introduces a new InputFormat for Tweets
[ https://issues.apache.org/jira/browse/FLINK-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358576#comment-14358576 ] ASF GitHub Bot commented on FLINK-1615: --- Github user Elbehery commented on a diff in the pull request: https://github.com/apache/flink/pull/442#discussion_r26296679 --- Diff: flink-contrib/src/test/java/org/apache/flink/contrib/SimpleTweetInputFormatTest.java --- @@ -0,0 +1,76 @@ +package org.apache.flink.contrib; --- End diff -- DONE Introduces a new InputFormat for Tweets --- Key: FLINK-1615 URL: https://issues.apache.org/jira/browse/FLINK-1615 Project: Flink Issue Type: New Feature Components: flink-contrib Affects Versions: 0.8.1 Reporter: mustafa elbehery Priority: Minor An event-driven parser for Tweets into Java POJOs. It parses all the important parts of a tweet into Java objects. Tested on a cluster, and the performance is pretty good. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1576) Change the Gelly examples to be consistent with the other Flink examples
[ https://issues.apache.org/jira/browse/FLINK-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358515#comment-14358515 ] ASF GitHub Bot commented on FLINK-1576: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/470#discussion_r26293641 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/GraphMetrics.java --- @@ -120,4 +120,58 @@ public Double map(Tuple2Long, Long sumTuple) { private static final class ProjectVertexId implements MapFunctionTuple2Long,Long, Long { public Long map(Tuple2Long, Long value) { return value.f0; } } + + @Override + public String getDescription() { + return Graph Metrics Example; + } + + // ** + // UTIL METHODS + // ** + + private static boolean fileOutput = false; + + private static String edgesInputPath = null; + + static final int NUM_VERTICES = 100; + + static final long SEED = 9876; + + private static boolean parseParameters(String[] args) { + + if(args.length 0) { + if(args.length != 1) { + System.err.println(Usage: LabelPropagation vertex path edge path output path num iterations); --- End diff -- Thanks @uce ^^ Change the Gelly examples to be consistent with the other Flink examples Key: FLINK-1576 URL: https://issues.apache.org/jira/browse/FLINK-1576 Project: Flink Issue Type: Improvement Components: Gelly Affects Versions: 0.8.0 Reporter: Andra Lungu Assignee: Vasia Kalavri Labels: easyfix The current Gelly examples just work on default input data. If we look at the other Flink examples, e.g. Connected Components, they also allow input data to be read from a text file passed as a parameter to the main method. It would be nice to follow the same approach in our examples. A first step in that direction is the SSSP example. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-1693) Deprecate the Spargel API
Vasia Kalavri created FLINK-1693: Summary: Deprecate the Spargel API Key: FLINK-1693 URL: https://issues.apache.org/jira/browse/FLINK-1693 Project: Flink Issue Type: Task Components: Spargel Affects Versions: 0.9 Reporter: Vasia Kalavri For the upcoming 0.9 release, we should mark all user-facing methods from the Spargel API as deprecated, with a warning that we are going to remove it at some point. We should also add a comment in the docs and point people to Gelly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-1694) Change the split between create/run of a vertex-centric iteration
[ https://issues.apache.org/jira/browse/FLINK-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri updated FLINK-1694: - Issue Type: Improvement (was: Task) Change the split between create/run of a vertex-centric iteration - Key: FLINK-1694 URL: https://issues.apache.org/jira/browse/FLINK-1694 Project: Flink Issue Type: Improvement Components: Gelly Reporter: Vasia Kalavri Currently, the vertex-centric API in Gelly looks like this: {code:java} Graph inputGraph = ... //create graph VertexCentricIteration iteration = inputGraph.createVertexCentricIteration(); ... // configure the iteration Graph newGraph = inputGraph.runVertexCentricIteration(iteration); {code} We have this create/run split in order to expose the iteration object and be able to call the public methods of VertexCentricIteration. However, this is not very nice and might lead to errors if create and run are mistakenly called on different graph objects. One suggestion is to change this to the following: {code:java} VertexCentricIteration iteration = inputGraph.createVertexCentricIteration(); ... // configure the iteration Graph newGraph = iteration.result(); {code} or to go with a single run call, where we add an IterationConfiguration object as a parameter and we don't expose the iteration object to the user at all: {code:java} IterationConfiguration parameters = ... Graph newGraph = inputGraph.runVertexCentricIteration(parameters); {code} and we can also have a simplified method where no configuration is passed. What do you think? Personally, I like the second option a bit more. -Vasia.
[jira] [Created] (FLINK-1694) Change the split between create/run of a vertex-centric iteration
Vasia Kalavri created FLINK-1694: Summary: Change the split between create/run of a vertex-centric iteration Key: FLINK-1694 URL: https://issues.apache.org/jira/browse/FLINK-1694 Project: Flink Issue Type: Task Components: Gelly Reporter: Vasia Kalavri Currently, the vertex-centric API in Gelly looks like this: {code:java} Graph inputGraph = ... //create graph VertexCentricIteration iteration = inputGraph.createVertexCentricIteration(); ... // configure the iteration Graph newGraph = inputGraph.runVertexCentricIteration(iteration); {code} We have this create/run split in order to expose the iteration object and be able to call the public methods of VertexCentricIteration. However, this is not very nice and might lead to errors if create and run are mistakenly called on different graph objects. One suggestion is to change this to the following: {code:java} VertexCentricIteration iteration = inputGraph.createVertexCentricIteration(); ... // configure the iteration Graph newGraph = iteration.result(); {code} or to go with a single run call, where we add an IterationConfiguration object as a parameter and we don't expose the iteration object to the user at all: {code:java} IterationConfiguration parameters = ... Graph newGraph = inputGraph.runVertexCentricIteration(parameters); {code} and we can also have a simplified method where no configuration is passed. What do you think? Personally, I like the second option a bit more. -Vasia. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
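The second option above, a single run call taking a configuration object, can be sketched as a plain-Java API shape; the types below are hypothetical stubs, not Gelly's actual classes:

```java
public class IterationConfigSketch {

    // Hypothetical configuration object from the second option in the issue;
    // field names are illustrative.
    public static class IterationConfiguration {
        public int maxIterations = 10;
        public String name = "vertex-centric iteration";
    }

    // Stub graph type showing only the shape of the proposed single-call API;
    // it does not reproduce any real Gelly behavior.
    public static class Graph {
        public Graph runVertexCentricIteration(IterationConfiguration config) {
            // A real implementation would configure the iteration from
            // 'config', run it, and return the resulting graph.
            return new Graph();
        }

        // Simplified variant with no configuration, as also suggested.
        public Graph runVertexCentricIteration() {
            return runVertexCentricIteration(new IterationConfiguration());
        }
    }

    public static void main(String[] args) {
        IterationConfiguration parameters = new IterationConfiguration();
        parameters.maxIterations = 30;
        Graph newGraph = new Graph().runVertexCentricIteration(parameters);
    }
}
```

The appeal of this shape is that create and run can no longer be called on different graph objects, since the iteration object is never exposed.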
[GitHub] flink pull request: [FLINK-1615] [java api] SimpleTweetInputFormat
Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/442#discussion_r26288947
--- Diff: flink-contrib/src/test/java/org/apache/flink/contrib/SimpleTweetInputFormatTest.java ---
@@ -0,0 +1,76 @@
+package org.apache.flink.contrib;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.contrib.tweetinputformat.io.SimpleTweetInputFormat;
+import org.apache.flink.contrib.tweetinputformat.model.tweet.Tweet;
+import org.apache.flink.contrib.tweetinputformat.model.tweet.entities.HashTags;
+import org.apache.flink.core.fs.FileInputSplit;
+import org.apache.flink.core.fs.Path;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.io.File;
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+
+public class SimpleTweetInputFormatTest {
+
+    private Tweet tweet;
+
+    private SimpleTweetInputFormat simpleTweetInputFormat;
+
+    private FileInputSplit fileInputSplit;
+
+    protected Configuration config;
+
+    protected File tempFile;
+
+    @Before
+    public void testSetUp() {
+
+        simpleTweetInputFormat = new SimpleTweetInputFormat();
+
+        File jsonFile = new File("../flink-contrib/src/main/resources/HashTagTweetSample.json");
+
+        fileInputSplit = new FileInputSplit(0, new Path(jsonFile.getPath()), 0, jsonFile.length(), new String[]{"localhost"});
+    }
+
+    @Test
+    public void testTweetInput() throws Exception {
+
+        simpleTweetInputFormat.open(fileInputSplit);
+        List<String> result;
+
+        while (!simpleTweetInputFormat.reachedEnd()) {
+            tweet = new Tweet();
+            tweet = simpleTweetInputFormat.nextRecord(tweet);
+
+            if (tweet != null) {
--- End diff --
Can you make the test fail if `tweet` is null?
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
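The reviewer's suggestion is to make the test fail instead of silently skipping null records; in a JUnit test this would amount to an `Assert.assertNotNull(tweet)` inside the read loop. The sketch below illustrates the same fail-fast pattern in plain Java, with `nextRecord()` as a stub standing in for the real `SimpleTweetInputFormat.nextRecord()` (no Flink or JUnit dependency):

```java
import java.util.ArrayList;
import java.util.List;

public class FailOnNullRecord {

    // Stub parser standing in for SimpleTweetInputFormat.nextRecord():
    // returns null for blank input lines.
    static String nextRecord(String line) {
        return line.isEmpty() ? null : line.toUpperCase();
    }

    static List<String> readAll(String[] lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            String tweet = nextRecord(line);
            if (tweet == null) {
                // the reviewer's point: surface the failure instead of skipping it
                throw new AssertionError("nextRecord() returned null for input: '" + line + "'");
            }
            result.add(tweet);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(readAll(new String[]{"hello", "world"})); // prints [HELLO, WORLD]
    }
}
```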
[jira] [Commented] (FLINK-1615) Introduces a new InputFormat for Tweets
[ https://issues.apache.org/jira/browse/FLINK-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358404#comment-14358404 ] ASF GitHub Bot commented on FLINK-1615: --- Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/442#discussion_r26288947 on flink-contrib/src/test/java/org/apache/flink/contrib/SimpleTweetInputFormatTest.java --- End diff -- Can you make the test fail if `tweet` is null?
Introduces a new InputFormat for Tweets
Key: FLINK-1615
URL: https://issues.apache.org/jira/browse/FLINK-1615
Project: Flink
Issue Type: New Feature
Components: flink-contrib
Affects Versions: 0.8.1
Reporter: mustafa elbehery
Priority: Minor

An event-driven parser for Tweets into Java POJOs. It parses all the important parts of a tweet into Java objects. Tested on a cluster, and the performance is pretty good.
[GitHub] flink pull request: [FLINK-1677][gelly] Suppresed Sysout Printing
Github user vasia commented on the pull request: https://github.com/apache/flink/pull/475#issuecomment-78459525 +1 for what @uce said :)
[GitHub] flink pull request: [FLINK-1633][gelly] Added getTriplets() method...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/452#discussion_r26293223
--- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/utils/Triplet.java ---
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.utils;
+
+import org.apache.flink.api.java.tuple.Tuple5;
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.Vertex;
+
+import java.io.Serializable;
+
+/**
+ * A wrapper around Tuple5 used to avoid duplicate vertex ids and to improve readability in
+ * the {@link org.apache.flink.graph.Graph#getTriplets()} method.
+ *
+ * @param <K> the vertex key type
+ * @param <VV> the vertex value type
+ * @param <EV> the edge value type
+ */
+public class Triplet<K extends Comparable<K> & Serializable, VV extends Serializable, EV extends Serializable>
+        extends Tuple5<K, K, VV, VV, EV> {
+
+    public Triplet() {}
+
+    public Triplet(K srcId, K trgId, VV srcVal, VV trgVal, EV edgeVal) {
+        super(srcId, trgId, srcVal, trgVal, edgeVal);
--- End diff --
Wouldn't a srcVertex, trgVertex, edge constructor be more useful/intuitive for a user? We can keep this one too, but better add a javadoc :-)
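The reviewer's suggested constructor could look like the following. `Vertex`, `Edge`, and `Triplet` here are minimal stand-ins modeled on Gelly's Tuple-based types, not the real classes; the sketch only shows how a vertex/edge-based constructor can delegate to the field-by-field one:

```java
public class TripletSketch {

    // Minimal stand-ins for Gelly's Vertex and Edge types.
    static class Vertex<K, VV> {
        final K id; final VV value;
        Vertex(K id, VV value) { this.id = id; this.value = value; }
    }

    static class Edge<K, EV> {
        final K src; final K trg; final EV value;
        Edge(K src, K trg, EV value) { this.src = src; this.trg = trg; this.value = value; }
    }

    static class Triplet<K, VV, EV> {
        final K srcId, trgId;
        final VV srcVal, trgVal;
        final EV edgeVal;

        // field-by-field constructor, as in the pull request
        Triplet(K srcId, K trgId, VV srcVal, VV trgVal, EV edgeVal) {
            this.srcId = srcId; this.trgId = trgId;
            this.srcVal = srcVal; this.trgVal = trgVal; this.edgeVal = edgeVal;
        }

        // suggested convenience constructor: build a triplet directly
        // from the two endpoint vertices and the connecting edge
        Triplet(Vertex<K, VV> src, Vertex<K, VV> trg, Edge<K, EV> edge) {
            this(src.id, trg.id, src.value, trg.value, edge.value);
        }
    }

    public static void main(String[] args) {
        Vertex<Long, Double> v1 = new Vertex<>(1L, 0.5);
        Vertex<Long, Double> v2 = new Vertex<>(2L, 0.7);
        Edge<Long, Double> e = new Edge<>(1L, 2L, 1.0);
        Triplet<Long, Double, Double> t = new Triplet<>(v1, v2, e);
        System.out.println(t.srcId + "," + t.trgId + "," + t.edgeVal); // prints 1,2,1.0
    }
}
```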
[jira] [Commented] (FLINK-1633) Add getTriplets() Gelly method
[ https://issues.apache.org/jira/browse/FLINK-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358497#comment-14358497 ] ASF GitHub Bot commented on FLINK-1633: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/452#discussion_r26293166
--- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/utils/Triplet.java ---
@@ -0,0 +1,55 @@
+/**
+ * A wrapper around Tuple5 used to avoid duplicate vertex ids and to improve readability in
--- End diff --
I would prefer a description about what a Triplet is conceptually and how it can be used by a user here, instead of explaining that it wraps a Tuple5 :))

Add getTriplets() Gelly method
Key: FLINK-1633
URL: https://issues.apache.org/jira/browse/FLINK-1633
Project: Flink
Issue Type: New Feature
Components: Gelly
Affects Versions: 0.9
Reporter: Vasia Kalavri
Assignee: Andra Lungu
Priority: Minor
Labels: starter

In some graph algorithms, it is required to access the graph edges together with the vertex values of the source and target vertices. For example, several graph weighting schemes compute some kind of similarity weights for edges, based on the attributes of the source and target vertices. This issue proposes adding a convenience Gelly method that generates a DataSet of <srcVertex, Edge, TrgVertex> triplets from the input graph.
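The weighting use case described in the issue can be sketched in plain Java. The `Triplet` record below is a minimal stand-in for Gelly's <srcVertex, Edge, trgVertex> triplet, and the similarity function is an arbitrary example of a weighting scheme, not a Gelly API:

```java
import java.util.ArrayList;
import java.util.List;

public class EdgeWeighting {

    // Minimal stand-in for a Gelly triplet: edge endpoints plus both vertex values.
    static class Triplet {
        final long srcId, trgId;
        final double srcVal, trgVal;
        double edgeVal;

        Triplet(long srcId, long trgId, double srcVal, double trgVal, double edgeVal) {
            this.srcId = srcId; this.trgId = trgId;
            this.srcVal = srcVal; this.trgVal = trgVal; this.edgeVal = edgeVal;
        }
    }

    // Example weighting scheme: weight an edge by the similarity of its
    // endpoint values (inverse of their absolute difference).
    static double similarityWeight(Triplet t) {
        return 1.0 / (1.0 + Math.abs(t.srcVal - t.trgVal));
    }

    public static void main(String[] args) {
        List<Triplet> triplets = new ArrayList<>();
        triplets.add(new Triplet(1, 2, 0.5, 0.5, 0.0));
        triplets.add(new Triplet(2, 3, 0.0, 1.0, 0.0));

        // re-weight every edge from its endpoints' values
        for (Triplet t : triplets) {
            t.edgeVal = similarityWeight(t);
        }
        System.out.println(triplets.get(0).edgeVal); // prints 1.0 (identical endpoint values)
    }
}
```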
[GitHub] flink pull request: [FLINK-1633][gelly] Added getTriplets() method...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/452#discussion_r26293166 on flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/utils/Triplet.java --- End diff -- I would prefer a description about what a Triplet is conceptually and how it can be used by a user here, instead of explaining that it wraps a Tuple5 :))
[GitHub] flink pull request: Remove -j and -a parameters which seemed no lo...
GitHub user hsaputra opened a pull request:

https://github.com/apache/flink/pull/482

Remove -j and -a parameters which seemed no longer valid in the doc example for YARN

From:
./bin/flink run -j ./examples/flink-java-examples-{{site.FLINK_VERSION_SHORT }}-WordCount.jar \
    -a 1 hdfs:////apache-license-v2.txt hdfs:///.../wordcount-result.txt
To:
./bin/flink run ./examples/flink-java-examples-{{site.FLINK_VERSION_SHORT }}-WordCount.jar \
    hdfs:////apache-license-v2.txt hdfs:///.../wordcount-result.txt

Looking at http://ci.apache.org/projects/flink/flink-docs-master/cli.html it seems like -j and -a are no longer valid.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/hsaputra/flink fix_doc_run_onyarn_params
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/482.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #482

commit 1ae0c5779b0c8b64feb9fd5f00b51a9b83cd4e0e
Author: Henry Saputra henry.sapu...@gmail.com
Date: 2015-03-12T23:18:03Z

Remove -j and -a parameters which seemed no longer valid in the doc example for submit job to Flink run in YARN.
From:
./bin/flink run -j ./examples/flink-java-examples-{{site.FLINK_VERSION_SHORT }}-WordCount.jar \
    -a 1 hdfs:////apache-license-v2.txt hdfs:///.../wordcount-result.txt
To:
./bin/flink run ./examples/flink-java-examples-{{site.FLINK_VERSION_SHORT }}-WordCount.jar \
    hdfs:////apache-license-v2.txt hdfs:///.../wordcount-result.txt
[GitHub] flink pull request: [FLINK-1594] [streaming] [wip] DataStream supp...
Github user gyfora commented on the pull request: https://github.com/apache/flink/pull/472#issuecomment-78431621 Ok, it seems like there is an issue with chained tasks, so let's not merge it until that is fixed. Gabor is working on it.