[GitHub] flink pull request: [FLINK-1622][java-api][scala-api] add a partia...

2015-03-12 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/466#issuecomment-78101926
  
Sorry, I completely blanked. Of course, you still need the grouping; it is
only the shuffle step you don't need.

So, I suggest just better tests, using a combination of partitionByHash()
and groupReducePartial().
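The suggested combination can be illustrated without Flink at all. Below is a plain-Java sketch (class and method names are made up for illustration, not Flink API): once records are hash-partitioned by key, all records with equal keys land in the same partition, so each partition can be group-reduced locally with no further shuffle.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch (no Flink dependency) of partitionByHash() followed by a
// partial group reduce: equal keys end up in the same partition, so each
// partition can be reduced locally without another shuffle step.
public class PartialGroupReduceSketch {

    // Hash-partition (key, value) records into numPartitions buckets by key.
    static List<Map<String, List<Integer>>> partitionByHash(
            String[] keys, int[] values, int numPartitions) {
        List<Map<String, List<Integer>>> partitions = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            partitions.add(new HashMap<>());
        }
        for (int i = 0; i < keys.length; i++) {
            Map<String, List<Integer>> partition =
                    partitions.get(Math.floorMod(keys[i].hashCode(), numPartitions));
            partition.computeIfAbsent(keys[i], k -> new ArrayList<>()).add(values[i]);
        }
        return partitions;
    }

    // Group-reduce each partition locally (here: sum per key). Because the data
    // is already partitioned by key, the partial results never overlap.
    static Map<String, Integer> groupReducePartial(
            List<Map<String, List<Integer>>> partitions) {
        Map<String, Integer> result = new HashMap<>();
        for (Map<String, List<Integer>> partition : partitions) {
            for (Map.Entry<String, List<Integer>> group : partition.entrySet()) {
                int sum = 0;
                for (int v : group.getValue()) {
                    sum += v;
                }
                result.put(group.getKey(), sum);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> sums = groupReducePartial(
                partitionByHash(new String[]{"a", "b", "a"}, new int[]{1, 2, 3}, 4));
        System.out.println(sums.get("a") + "," + sums.get("b")); // 4,2
    }
}
```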


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1605) Create a shaded Hadoop fat jar to resolve library version conflicts

2015-03-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356656#comment-14356656
 ] 

ASF GitHub Bot commented on FLINK-1605:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/454#discussion_r26199913
  
--- Diff: pom.xml ---
@@ -1159,6 +726,55 @@ under the License.
</execution>
</executions>
</plugin>
+
+   <!-- We use shading in all packages for relocating some classes, such as
+   Guava and ASM.
+   By doing so, users adding Flink as a dependency won't run into conflicts.
+   (For example users can use whatever guava version they want, because we don't
+   expose our guava dependency)
+   -->
+   <plugin>
+       <groupId>org.apache.maven.plugins</groupId>
+       <artifactId>maven-shade-plugin</artifactId>
+       <version>2.3</version>
+       <executions>
+           <execution>
+               <id>shade-flink</id>
+               <phase>package</phase>
+               <goals>
+                   <goal>shade</goal>
+               </goals>
+               <configuration>
+                   <shadeTestJar>true</shadeTestJar>
+                   <shadedArtifactAttached>false</shadedArtifactAttached>
+                   <createDependencyReducedPom>true</createDependencyReducedPom>
+                   <dependencyReducedPomLocation>${project.basedir}/target/dependency-reduced-pom.xml</dependencyReducedPomLocation>
+                   <artifactSet>
+                       <includes>
+                           <include>com.google.guava:*</include>
+                           <include>org.ow2.asm:*</include>
+                           <!-- <include>org.apache.flume:*</include> Required because flume depends on guava.
+                                This only affects the flink-streaming connector which is not shipped through
+                                flink-dist by default. So it's only about users adding flume as a dependency. -->
--- End diff --

Why is it commented out rather than removed?
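For context on the mechanism discussed above: what makes the shaded Guava invisible to downstream users is class relocation. A shade-plugin `<relocations>` section typically looks like the following illustrative fragment (the shaded package name here is an assumption, not taken from this diff):

```xml
<!-- Illustrative only: relocation rewrites the bytecode references so users
     can bring their own Guava version without conflicts. -->
<relocations>
    <relocation>
        <pattern>com.google.common</pattern>
        <shadedPattern>org.apache.flink.shaded.com.google.common</shadedPattern>
    </relocation>
</relocations>
```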


 Create a shaded Hadoop fat jar to resolve library version conflicts
 ---

 Key: FLINK-1605
 URL: https://issues.apache.org/jira/browse/FLINK-1605
 Project: Flink
  Issue Type: Improvement
  Components: Build System
Affects Versions: 0.9
Reporter: Robert Metzger
Assignee: Robert Metzger

 As per mailing list discussion: 
 http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Create-a-shaded-Hadoop-fat-jar-to-resolve-library-version-conflicts-td3881.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: Kick off of Flink's machine learning library

2015-03-12 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/479#issuecomment-78440416
  
Is there a JIRA associated with this PR?
Once the issues with the documentation are resolved, I'd say it's good to 
merge.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Add support for building Flink with Scala 2.11

2015-03-12 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/477#issuecomment-78262466
  
 Why are you using Scala 2.11.4? The mentioned bug seems to be fixed in 
2.11.6.

2.11.6 will be out in April, so for now the choice is between 2.11.5 (which 
is known to have issues) and 2.11.4.

 Why aren't we adding a _2.11 suffix to the Scala 2.11 Flink builds?

We can do this, and it certainly makes sense if you want to ship pre-built 
binaries of both versions. With the current setup, if you want to use Flink with 
2.11 you have to build and install the Maven projects yourself (I'm blindly 
following the Spark model here; let me know if you prefer the other option).
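For reference, the `_2.11` suffix convention mentioned here is usually realized by suffixing the artifactId, along the lines of the hypothetical fragment below (not taken from this PR's diff):

```xml
<!-- Hypothetical: Scala cross-build suffix convention as used by Spark -->
<artifactId>flink-scala_2.11</artifactId>
```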


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Kick off of Flink's machine learning library

2015-03-12 Thread mxm
Github user mxm commented on a diff in the pull request:

https://github.com/apache/flink/pull/479#discussion_r26215888
  
--- Diff: docs/_includes/navbar.html ---
@@ -24,15 +24,15 @@
 We might be on an externally hosted documentation site.
 Please keep the site.FLINK_WEBSITE_URL below to ensure a link back 
to the Flink website.
 {% endcomment %}
-   a href={{ site.FLINK_WEBSITE_URL }}index.html title=Home
+   a href={{ site.FLINK_WEBSITE_URL }}/index.html title=Home
  img class=hidden-xs hidden-sm img-responsive
-  src={{ site.baseurl }}img/logo.png alt=Apache Flink Logo
+  src={{ site.baseurl }}/img/logo.png alt=Apache Flink Logo
/a
div class=row visible-xs
  div class=col-xs-3
-   a href={{ site.baseurl }}index.html title=Home  
+   a href={{ site.baseurl }}/index.html title=Home
--- End diff --

Yes, we can set it in the _config.yml. I would not change it now because it 
has some other implications. For example, local testing of the website would 
become more complicated because all links would point to the online version. We 
can open a separate issue. 
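A minimal sketch of what setting it centrally could look like, assuming a standard Jekyll setup (the URL value is illustrative, not taken from this thread):

```yaml
# Hypothetical _config.yml sketch: a central baseurl would make
# "{{ site.baseurl }}/img/logo.png" resolve consistently, but local preview
# then needs an override such as `jekyll serve --baseurl ''`.
baseurl: "http://flink.apache.org/docs"
```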


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1512] Add CsvReader for reading into PO...

2015-03-12 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/426#discussion_r26215940
  
--- Diff: 
flink-java/src/test/java/org/apache/flink/api/java/io/CsvInputFormatTest.java 
---
@@ -684,4 +693,178 @@ private void testRemovingTrailingCR(String 
lineBreakerInFile, String lineBreaker
}
}
 
+   @Test
+   public void testPojoType() throws Exception {
+       File tempFile = File.createTempFile("CsvReaderPojoType", "tmp");
+       tempFile.deleteOnExit();
+       tempFile.setWritable(true);
+
+       OutputStreamWriter wrt = new OutputStreamWriter(new FileOutputStream(tempFile));
+       wrt.write("123,AAA,3.123,BBB\n");
+       wrt.write("456,BBB,1.123,AAA\n");
+       wrt.close();
+
+       @SuppressWarnings("unchecked")
+       TypeInformation<PojoItem> typeInfo = (TypeInformation<PojoItem>) TypeExtractor.createTypeInfo(PojoItem.class);
+       CsvInputFormat<PojoItem> inputFormat = new CsvInputFormat<PojoItem>(new Path(tempFile.toURI().toString()), typeInfo);
+
+       Configuration parameters = new Configuration();
+       inputFormat.configure(parameters);
+
+       inputFormat.setDelimiter('\n');
+       inputFormat.setFieldDelimiter(',');
+
+       FileInputSplit[] splits = inputFormat.createInputSplits(1);
+
+       inputFormat.open(splits[0]);
+
+       PojoItem item = new PojoItem();
+       inputFormat.nextRecord(item);
+
+       assertEquals(123, item.field1);
+       assertEquals("AAA", item.field2);
+       assertEquals(Double.valueOf(3.123), item.field3);
+       assertEquals("BBB", item.field4);
+
+       inputFormat.nextRecord(item);
+
+       assertEquals(456, item.field1);
+       assertEquals("BBB", item.field2);
+       assertEquals(Double.valueOf(1.123), item.field3);
+       assertEquals("AAA", item.field4);
+   }
+
+   @Test
+   public void testPojoTypeWithPrivateField() throws Exception {
+       File tempFile = File.createTempFile("CsvReaderPojoType", "tmp");
+       tempFile.deleteOnExit();
+       tempFile.setWritable(true);
+
+       OutputStreamWriter wrt = new OutputStreamWriter(new FileOutputStream(tempFile));
+       wrt.write("123,AAA,3.123,BBB\n");
+       wrt.write("456,BBB,1.123,AAA\n");
+       wrt.close();
+
+       @SuppressWarnings("unchecked")
+       TypeInformation<PrivatePojoItem> typeInfo = (TypeInformation<PrivatePojoItem>) TypeExtractor.createTypeInfo(PrivatePojoItem.class);
+       CsvInputFormat<PrivatePojoItem> inputFormat = new CsvInputFormat<PrivatePojoItem>(new Path(tempFile.toURI().toString()), typeInfo);
+
+       Configuration parameters = new Configuration();
+       inputFormat.configure(parameters);
+
+       inputFormat.setDelimiter('\n');
+       inputFormat.setFieldDelimiter(',');
+
+       FileInputSplit[] splits = inputFormat.createInputSplits(1);
+
+       inputFormat.open(splits[0]);
+
+       PrivatePojoItem item = new PrivatePojoItem();
+       inputFormat.nextRecord(item);
+
+       assertEquals(123, item.field1);
+       assertEquals("AAA", item.field2);
+       assertEquals(Double.valueOf(3.123), item.field3);
+       assertEquals("BBB", item.field4);
+
+       inputFormat.nextRecord(item);
+
+       assertEquals(456, item.field1);
+       assertEquals("BBB", item.field2);
+       assertEquals(Double.valueOf(1.123), item.field3);
+       assertEquals("AAA", item.field4);
+   }
+
+   @Test
+   public void testPojoTypeWithMappingInformation() throws Exception {
+       File tempFile = File.createTempFile("CsvReaderPojoType", "tmp");
+       tempFile.deleteOnExit();
+       tempFile.setWritable(true);
+
+       OutputStreamWriter wrt = new OutputStreamWriter(new FileOutputStream(tempFile));
+       wrt.write("123,3.123,AAA,BBB\n");
+       wrt.write("456,1.123,BBB,AAA\n");
+       wrt.close();
+
+       @SuppressWarnings("unchecked")
+       TypeInformation<PojoItem> typeInfo = (TypeInformation<PojoItem>) TypeExtractor.createTypeInfo(PojoItem.class);
+       CsvInputFormat<PojoItem> inputFormat = new CsvInputFormat<PojoItem>(new Path(tempFile.toURI().toString()), typeInfo);
+       inputFormat.setFieldsMap(new String[]{"field1", "field3", "field2", "field4"});
+
+       Configuration parameters = new Configuration();
+       inputFormat.configure(parameters);
+
+       inputFormat.setDelimiter('\n');
+       inputFormat.setFieldDelimiter(',');
+
  

[jira] [Commented] (FLINK-1615) Introduces a new InputFormat for Tweets

2015-03-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358299#comment-14358299
 ] 

ASF GitHub Bot commented on FLINK-1615:
---

Github user Elbehery commented on a diff in the pull request:

https://github.com/apache/flink/pull/442#discussion_r26284910
  
--- Diff: 
flink-contrib/src/main/java/org/apache/flink/contrib/tweetinputformat/io/SimpleTweetInputFormat.java
 ---
@@ -0,0 +1,67 @@
+package org.apache.flink.contrib.tweetinputformat.io;
+
+import org.apache.flink.api.common.io.DelimitedInputFormat;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.java.typeutils.GenericTypeInfo;
+import org.apache.flink.api.java.typeutils.ResultTypeQueryable;
+import org.apache.flink.contrib.tweetinputformat.model.tweet.Tweet;
+import org.codehaus.jackson.JsonParseException;
+import org.json.simple.parser.JSONParser;
+import org.json.simple.parser.ParseException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import java.io.ByteArrayInputStream;
+import java.io.IOException;
+import java.io.InputStreamReader;
+
+
+public class SimpleTweetInputFormat extends DelimitedInputFormat<Tweet> 
implements ResultTypeQueryable<Tweet> {
+
+private static final Logger LOG = 
LoggerFactory.getLogger(SimpleTweetInputFormat.class);
+
+@Override
+public Tweet nextRecord(Tweet record) throws IOException {
+Boolean result = false;
+
+do {
+try {
+record.reset(0);
+record = super.nextRecord(record);
+result = true;
+
+} catch (JsonParseException e) {
+result = false;
+
+}
+} while (!result);
+
+return record;
+}
+
+@Override
+public Tweet readRecord(Tweet reuse, byte[] bytes, int offset, int 
numBytes) throws IOException {
+
+
+InputStreamReader jsonReader = new InputStreamReader(new 
ByteArrayInputStream(bytes));
+jsonReader.skip(offset);
+
+JSONParser parser = new JSONParser();
--- End diff --

I know. I tried in the beginning to create them as instance fields of 
the class, but I received an error because the fields are not serializable. 
I tried to declare them as transient, but then Flink threw a "Can not submit 
Job" exception. If you have suggestions, please let me know.
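The usual workaround for non-serializable helpers like a JSON parser is to mark the field transient and re-create it lazily after deserialization (in Flink this is typically done in an open()-style lifecycle method, or via a null check as below). The names in this sketch are illustrative, not from the PR; a StringBuilder stands in for the non-serializable parser.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hedged sketch: transient field + lazy re-initialization, the common pattern
// for shipping a function with a non-serializable helper object.
public class LazyParserHolder implements Serializable {
    private static final long serialVersionUID = 1L;

    // Not serialized; rebuilt on first use after deserialization.
    private transient StringBuilder parser;

    private StringBuilder parser() {
        if (parser == null) {
            parser = new StringBuilder(); // stand-in for e.g. new JSONParser()
        }
        return parser;
    }

    public String parse(String input) {
        parser().setLength(0);
        return parser().append(input).toString();
    }

    // Serialize and deserialize, simulating shipping the object to a worker.
    public static LazyParserHolder roundTrip(LazyParserHolder h) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(h);
        oos.close();
        ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
        return (LazyParserHolder) in.readObject();
    }

    public static void main(String[] args) throws Exception {
        LazyParserHolder h = roundTrip(new LazyParserHolder());
        System.out.println(h.parse("ok")); // parser rebuilt after deserialization
    }
}
```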


 Introduces a new InputFormat for Tweets
 ---

 Key: FLINK-1615
 URL: https://issues.apache.org/jira/browse/FLINK-1615
 Project: Flink
  Issue Type: New Feature
  Components: flink-contrib
Affects Versions: 0.8.1
Reporter: mustafa elbehery
Priority: Minor

 An event-driven parser for Tweets into Java POJOs. 
 It parses all the important parts of the tweet into Java objects. 
 Tested on a cluster, and the performance is pretty good. 





[GitHub] flink pull request: [FLINK-1615] [java api] SimpleTweetInputFormat

2015-03-12 Thread Elbehery
Github user Elbehery commented on a diff in the pull request:

https://github.com/apache/flink/pull/442#discussion_r26284910
  
--- Diff: 
flink-contrib/src/main/java/org/apache/flink/contrib/tweetinputformat/io/SimpleTweetInputFormat.java
 ---
@@ -0,0 +1,67 @@
+package org.apache.flink.contrib.tweetinputformat.io;
+
+import org.apache.flink.api.common.io.DelimitedInputFormat;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.java.typeutils.GenericTypeInfo;
+import org.apache.flink.api.java.typeutils.ResultTypeQueryable;
+import org.apache.flink.contrib.tweetinputformat.model.tweet.Tweet;
+import org.codehaus.jackson.JsonParseException;
+import org.json.simple.parser.JSONParser;
+import org.json.simple.parser.ParseException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import java.io.ByteArrayInputStream;
+import java.io.IOException;
+import java.io.InputStreamReader;
+
+
+public class SimpleTweetInputFormat extends DelimitedInputFormat<Tweet> 
implements ResultTypeQueryable<Tweet> {
+
+private static final Logger LOG = 
LoggerFactory.getLogger(SimpleTweetInputFormat.class);
+
+@Override
+public Tweet nextRecord(Tweet record) throws IOException {
+Boolean result = false;
+
+do {
+try {
+record.reset(0);
+record = super.nextRecord(record);
+result = true;
+
+} catch (JsonParseException e) {
+result = false;
+
+}
+} while (!result);
+
+return record;
+}
+
+@Override
+public Tweet readRecord(Tweet reuse, byte[] bytes, int offset, int 
numBytes) throws IOException {
+
+
+InputStreamReader jsonReader = new InputStreamReader(new 
ByteArrayInputStream(bytes));
+jsonReader.skip(offset);
+
+JSONParser parser = new JSONParser();
--- End diff --

I know. I tried in the beginning to create them as instance fields of 
the class, but I received an error because the fields are not serializable. 
I tried to declare them as transient, but then Flink threw a "Can not submit 
Job" exception. If you have suggestions, please let me know.




[jira] [Commented] (FLINK-1679) Document how degree of parallelism / parallelism / slots are connected to each other

2015-03-12 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358335#comment-14358335
 ] 

Ufuk Celebi commented on FLINK-1679:


I've discovered a minor inconsistency recently: in the execution environment, 
the setter is called {{setDegreeOfParallelism(int)}}, whereas for certain 
operators like FlatMap it is called {{setParallelism(int)}}.



Thanks for bringing this up. I really think we should document this properly 
and prominently for the upcoming release. Let's have an offline chat about the 
user feedback you've received and I can come up with an initial draft for the 
docs in the next week.



 Document how degree of parallelism /  parallelism / slots are connected 
 to each other
 ---

 Key: FLINK-1679
 URL: https://issues.apache.org/jira/browse/FLINK-1679
 Project: Flink
  Issue Type: Task
  Components: Documentation
Affects Versions: 0.9
Reporter: Robert Metzger

 I see too many users being confused about properly setting up Flink with 
 respect to parallelism.





[jira] [Created] (FLINK-1692) Add option to Flink YARN client to stop a detached YARN session.

2015-03-12 Thread Robert Metzger (JIRA)
Robert Metzger created FLINK-1692:
-

 Summary: Add option to Flink YARN client to stop a detached YARN 
session.
 Key: FLINK-1692
 URL: https://issues.apache.org/jira/browse/FLINK-1692
 Project: Flink
  Issue Type: Bug
  Components: YARN Client
Affects Versions: 0.9
Reporter: Robert Metzger








[jira] [Commented] (FLINK-1679) Document how degree of parallelism / parallelism / slots are connected to each other

2015-03-12 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358345#comment-14358345
 ] 

Robert Metzger commented on FLINK-1679:
---

I would suggest removing all occurrences of degreeOfParallelism in the system 
and replacing them with parallelism everywhere.
The CLI frontend for example also calls it {{-p}}, not {{-dop}}.

I would also suggest to set the parallelism by default to {{AUTOMAX}} in the 
CliFrontend.

 Document how degree of parallelism /  parallelism / slots are connected 
 to each other
 ---

 Key: FLINK-1679
 URL: https://issues.apache.org/jira/browse/FLINK-1679
 Project: Flink
  Issue Type: Task
  Components: Documentation
Affects Versions: 0.9
Reporter: Robert Metzger
Assignee: Ufuk Celebi

 I see too many users being confused about properly setting up Flink with 
 respect to parallelism.





[jira] [Updated] (FLINK-1525) Provide utils to pass -D parameters to UDFs

2015-03-12 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-1525:
--
Assignee: (was: Robert Metzger)

 Provide utils to pass -D parameters to UDFs 
 

 Key: FLINK-1525
 URL: https://issues.apache.org/jira/browse/FLINK-1525
 Project: Flink
  Issue Type: Improvement
  Components: flink-contrib
Reporter: Robert Metzger
  Labels: starter

 Hadoop users are used to setting job configuration through -D on the 
 command line.
 Right now, Flink users have to manually parse command line arguments and pass 
 them to the methods.
 It would be nice to provide a standard args parser which takes care of 
 such stuff.
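A minimal sketch of what such a Hadoop-style `-Dkey=value` parser could look like; the class name, method, and behavior here are illustrative assumptions, not an actual Flink utility.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of a "-Dkey=value" command-line parser of the kind the issue
// asks for. Arguments without "=" are treated as flags with an empty value.
public class DashDParser {

    public static Map<String, String> parse(String[] args) {
        Map<String, String> config = new HashMap<>();
        for (String arg : args) {
            if (arg.startsWith("-D")) {
                String kv = arg.substring(2);
                int eq = kv.indexOf('=');
                if (eq > 0) {
                    config.put(kv.substring(0, eq), kv.substring(eq + 1));
                } else {
                    config.put(kv, ""); // flag without a value
                }
            }
        }
        return config;
    }

    public static void main(String[] args) {
        Map<String, String> c = parse(new String[]{"-Dinput=/data", "-Dverbose"});
        System.out.println(c.get("input")); // /data
    }
}
```

The parsed map could then be passed into UDFs as job configuration, sparing users the manual argument handling the issue describes.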





[jira] [Assigned] (FLINK-1525) Provide utils to pass -D parameters to UDFs

2015-03-12 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger reassigned FLINK-1525:
-

Assignee: Robert Metzger

 Provide utils to pass -D parameters to UDFs 
 

 Key: FLINK-1525
 URL: https://issues.apache.org/jira/browse/FLINK-1525
 Project: Flink
  Issue Type: Improvement
  Components: flink-contrib
Reporter: Robert Metzger
Assignee: Robert Metzger
  Labels: starter

 Hadoop users are used to setting job configuration through -D on the 
 command line.
 Right now, Flink users have to manually parse command line arguments and pass 
 them to the methods.
 It would be nice to provide a standard args parser which takes care of 
 such stuff.





[jira] [Assigned] (FLINK-1679) Document how degree of parallelism / parallelism / slots are connected to each other

2015-03-12 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi reassigned FLINK-1679:
--

Assignee: Ufuk Celebi

 Document how degree of parallelism /  parallelism / slots are connected 
 to each other
 ---

 Key: FLINK-1679
 URL: https://issues.apache.org/jira/browse/FLINK-1679
 Project: Flink
  Issue Type: Task
  Components: Documentation
Affects Versions: 0.9
Reporter: Robert Metzger
Assignee: Ufuk Celebi

 I see too many users being confused about properly setting up Flink with 
 respect to parallelism.





[GitHub] flink pull request: Add support for building Flink with Scala 2.11

2015-03-12 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/477#issuecomment-78307840
  
Any remarks on my reply to the general comments from @rmetzger (Scala 
version, suffix)?




[GitHub] flink pull request: Kick off of Flink's machine learning library

2015-03-12 Thread mxm
Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/479#issuecomment-78300593
  
Nice solution. Thanks.




[GitHub] flink pull request: Add support for building Flink with Scala 2.11

2015-03-12 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/477#issuecomment-78317231
  
I'll ask on the ML




[jira] [Commented] (FLINK-1615) Introduces a new InputFormat for Tweets

2015-03-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358379#comment-14358379
 ] 

ASF GitHub Bot commented on FLINK-1615:
---

Github user Elbehery commented on a diff in the pull request:

https://github.com/apache/flink/pull/442#discussion_r26288077
  
--- Diff: 
flink-contrib/src/test/java/org/apache/flink/contrib/SimpleTweetInputFormatTest.java
 ---
@@ -0,0 +1,62 @@
+package org.apache.flink.contrib;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.contrib.tweetinputformat.io.SimpleTweetInputFormat;
+import org.apache.flink.contrib.tweetinputformat.model.tweet.Tweet;
+import 
org.apache.flink.contrib.tweetinputformat.model.tweet.entities.HashTags;
+import org.apache.flink.core.fs.FileInputSplit;
+import org.apache.flink.core.fs.Path;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.io.File;
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.util.Iterator;
+import java.util.List;
+
+
+public class SimpleTweetInputFormatTest {
+
+private Tweet tweet;
+
+private SimpleTweetInputFormat simpleTweetInputFormat;
+
+private FileInputSplit fileInputSplit;
+
+protected Configuration config;
+
+protected File tempFile;
+
+
+@Before
+public void testSetUp() {
+
+
+simpleTweetInputFormat = new SimpleTweetInputFormat();
+
+File jsonFile = new File("../flink-contrib/src/main/resources/HashTagTweetSample.json");
+
+fileInputSplit = new FileInputSplit(0, new Path(jsonFile.getPath()), 0, jsonFile.length(), new String[] {"localhost"});
+}
+
+@Test
+public void testTweetInput() throws Exception {
+
+
+simpleTweetInputFormat.open(fileInputSplit);
+while (!simpleTweetInputFormat.reachedEnd()) {
--- End diff --

Changed. I had tried to use a switch instead of multiple if-else statements 
because it is more efficient, but Flink compiles with Java 1.6, which does not 
accept Strings in switch statements.
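To illustrate the constraint mentioned above: switch on String is only legal since Java 7, so Java 6-compatible code falls back to an if-else chain (or an enum/map dispatch). The field names in this sketch are made up.

```java
// Java 6-compatible dispatch on a String value: an if-else chain using
// equals(), the workaround for the missing String switch in Java 6.
public class StringDispatch {

    static int fieldIndex(String name) {
        if ("id".equals(name)) {
            return 0;
        } else if ("text".equals(name)) {
            return 1;
        } else if ("user".equals(name)) {
            return 2;
        }
        return -1; // unknown field
    }

    public static void main(String[] args) {
        System.out.println(fieldIndex("text")); // 1
    }
}
```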


 Introduces a new InputFormat for Tweets
 ---

 Key: FLINK-1615
 URL: https://issues.apache.org/jira/browse/FLINK-1615
 Project: Flink
  Issue Type: New Feature
  Components: flink-contrib
Affects Versions: 0.8.1
Reporter: mustafa elbehery
Priority: Minor

 An event-driven parser for Tweets into Java POJOs. 
 It parses all the important parts of the tweet into Java objects. 
 Tested on a cluster, and the performance is pretty good. 





[jira] [Comment Edited] (FLINK-1537) GSoC project: Machine learning with Apache Flink

2015-03-12 Thread Sachin Goel (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357238#comment-14357238
 ] 

Sachin Goel edited comment on FLINK-1537 at 3/11/15 5:24 PM:
-

Yes. I completely agree with the transformer-learner chain design methodology. 
The decision tree I'll write will provide an interface for first specifying the 
structure of the data, i.e., the tuple, as in the types, ranges, etc. and any 
other statistics possible to help with the learning.
I do not myself see how it makes a difference to store the data column-wise or 
row-wise, although it might have some far-reaching consequences for how the 
learning process proceeds. In fact, this seems like a valid idea for a learning 
process which treats each coordinate one by one. It might help to provide all 
the values of one particular coordinate in one go and learn some statistics on 
them, which might enable a better learning process. In fact, in a decision tree 
implementation on big data, it becomes prudent to learn such statistics to 
ensure only a reasonable number of splits on the data are considered. I will 
look into how this could be achieved with a row-wise data representation.
As for the deep learning framework, you are indeed right. I am not sure myself 
if anyone has yet evaluated the potential of a deep learning system on a 
distributed system. I will look into the H2O project's implementation related 
to this. As of yet, I'm still not sure if deep learning can be as fast on 
distributed systems as it is on GPUs.
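The per-coordinate statistics discussed above can be gathered from a row-wise representation in a single pass, as this small sketch shows (min/max per feature, the kind of statistic that limits candidate splits in a decision tree; the class and method names are illustrative):

```java
import java.util.Arrays;

// Sketch: computing per-coordinate statistics (min/max per feature) in one
// pass over row-wise data, without materializing a column-wise layout.
public class PerCoordinateStats {

    // Returns {min[], max[]} over all rows, one entry per coordinate.
    static double[][] minMax(double[][] rows) {
        int dims = rows[0].length;
        double[] min = new double[dims];
        double[] max = new double[dims];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);
        for (double[] row : rows) {
            for (int d = 0; d < dims; d++) {
                min[d] = Math.min(min[d], row[d]);
                max[d] = Math.max(max[d], row[d]);
            }
        }
        return new double[][]{min, max};
    }

    public static void main(String[] args) {
        double[][] stats = minMax(new double[][]{{1, 5}, {3, 2}, {2, 4}});
        System.out.println(stats[0][0] + " " + stats[1][1]); // 1.0 5.0
    }
}
```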


was (Author: sachingoel0101):
Yes. I completely agree with the transformer-learner chain design methodology. 
The decision tree I'll write will provide an interface for first specifying the 
structure of the data, i.e., the tuple, as in the types, ranges, etc. and any 
other statistics possible to help with the learning.
I do not see myself how it makes a difference to store the data column wise or 
row wise, although it might have some far-reaching consequences on how the 
learning process proceeds. In fact, this seems like a valid idea for a learning 
process which treat each coordinate one by one. It might help in providing all 
the attributes of one particular coordinate in one go and learn some statistics 
on it, which might help in a better learning process. In fact, in a decision 
tree implementation on big data, it becomes prudent to learn such a statistic 
to ensure only a reasonable number of splits on the data are considered. I will 
look into how this could be achieved with a row-style data representation.
As for the deep learning framework, you are indeed right. I am not sure myself 
if anyone has yet evaluated the potential of a deep learning system on a 
distributed system. I will look into the H2O project's implementation related 
to this. As of yet, I'm still not sure if deep learning can be as fast on 
distributed systems as it is on GPUs.

 GSoC project: Machine learning with Apache Flink
 

 Key: FLINK-1537
 URL: https://issues.apache.org/jira/browse/FLINK-1537
 Project: Flink
  Issue Type: New Feature
Reporter: Till Rohrmann
Priority: Minor
  Labels: gsoc2015, java, machine_learning, scala

 Currently, the Flink community is setting up the infrastructure for a machine 
 learning library for Flink. The goal is to provide a set of highly optimized 
 ML algorithms and to offer a high level linear algebra abstraction to easily 
 do data pre- and post-processing. By defining a set of commonly used data 
 structures on which the algorithms work it will be possible to define complex 
 processing pipelines. 
 The Mahout DSL constitutes a good fit to be used as the linear algebra 
 language in Flink. It has to be evaluated which means have to be provided to 
 allow an easy transition between the high level abstraction and the optimized 
 algorithms.
 The machine learning library offers multiple starting points for a GSoC 
 project. Amongst others, the following projects are conceivable.
 * Extension of Flink's machine learning library by additional ML algorithms
 ** Stochastic gradient descent
 ** Distributed dual coordinate ascent
 ** SVM
 ** Gaussian mixture EM
 ** DecisionTrees
 ** ...
 * Integration of Flink with the Mahout DSL to support a high level linear 
 algebra abstraction
 * Integration of H2O with Flink to benefit from H2O's sophisticated machine 
 learning algorithms
 * Implementation of a parameter server like distributed global state storage 
 facility for Flink. This also includes the extension of Flink to support 
 asynchronous iterations and update messages.
 Own ideas for possible contributions in the field of the machine learning 
 library are highly welcome.





[GitHub] flink pull request: [FLINK-1615] [java api] SimpleTweetInputFormat

2015-03-12 Thread Elbehery
Github user Elbehery commented on a diff in the pull request:

https://github.com/apache/flink/pull/442#discussion_r26288077
  
--- Diff: 
flink-contrib/src/test/java/org/apache/flink/contrib/SimpleTweetInputFormatTest.java
 ---
@@ -0,0 +1,62 @@
+package org.apache.flink.contrib;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.contrib.tweetinputformat.io.SimpleTweetInputFormat;
+import org.apache.flink.contrib.tweetinputformat.model.tweet.Tweet;
+import 
org.apache.flink.contrib.tweetinputformat.model.tweet.entities.HashTags;
+import org.apache.flink.core.fs.FileInputSplit;
+import org.apache.flink.core.fs.Path;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.io.File;
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.util.Iterator;
+import java.util.List;
+
+
+public class SimpleTweetInputFormatTest {
+
+private Tweet tweet;
+
+private SimpleTweetInputFormat simpleTweetInputFormat;
+
+private FileInputSplit fileInputSplit;
+
+protected Configuration config;
+
+protected File tempFile;
+
+
+@Before
+public void testSetUp() {
+
+
+simpleTweetInputFormat = new SimpleTweetInputFormat();
+
+File jsonFile = new File("../flink-contrib/src/main/resources/HashTagTweetSample.json");
+
+fileInputSplit = new FileInputSplit(0, new Path(jsonFile.getPath()), 0, jsonFile.length(), new String[] {"localhost"});
+}
+
+@Test
+public void testTweetInput() throws Exception {
+
+
+simpleTweetInputFormat.open(fileInputSplit);
+while (!simpleTweetInputFormat.reachedEnd()) {
--- End diff --

Changed. I tried to use a switch instead of multiple if-else blocks because 
it is more efficient, but Flink compiles against Java 1.6, which does not 
accept Strings in switches.
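For reference, the usual pre-Java-7 workaround is an if-else chain over `String.equals`. A minimal self-contained sketch (class and field names are hypothetical, not taken from the PR):

```java
// Hypothetical dispatch helper: maps a JSON field name to a numeric code
// with an if-else chain, since `switch` on String requires Java 7+.
public class FieldDispatch {

    public static int dispatch(String field) {
        if ("text".equals(field)) {
            return 0;
        } else if ("id".equals(field)) {
            return 1;
        } else if ("hashtags".equals(field)) {
            return 2;
        }
        return -1; // unknown field
    }

    public static void main(String[] args) {
        System.out.println(dispatch("hashtags")); // prints 2
    }
}
```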


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1615] [java api] SimpleTweetInputFormat

2015-03-12 Thread Elbehery
Github user Elbehery commented on a diff in the pull request:

https://github.com/apache/flink/pull/442#discussion_r26285363
  
--- Diff: 
flink-contrib/src/main/java/org/apache/flink/contrib/tweetinputformat/io/SimpleTweetInputFormat.java
 ---
@@ -0,0 +1,67 @@
+package org.apache.flink.contrib.tweetinputformat.io;
+
+import org.apache.flink.api.common.io.DelimitedInputFormat;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.java.typeutils.GenericTypeInfo;
+import org.apache.flink.api.java.typeutils.ResultTypeQueryable;
+import org.apache.flink.contrib.tweetinputformat.model.tweet.Tweet;
+import org.codehaus.jackson.JsonParseException;
+import org.json.simple.parser.JSONParser;
+import org.json.simple.parser.ParseException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import java.io.ByteArrayInputStream;
+import java.io.IOException;
+import java.io.InputStreamReader;
+
+
+public class SimpleTweetInputFormat extends DelimitedInputFormat<Tweet> 
implements ResultTypeQueryable<Tweet> {
+
+private static final Logger LOG = 
LoggerFactory.getLogger(SimpleTweetInputFormat.class);
+
+@Override
+public Tweet nextRecord(Tweet record) throws IOException {
+Boolean result = false;
+
+do {
+try {
+record.reset(0);
+record = super.nextRecord(record);
+result = true;
+
+} catch (JsonParseException e) {
+result = false;
+
+}
+} while (!result);
+
+return record;
+}
+
+@Override
+public Tweet readRecord(Tweet reuse, byte[] bytes, int offset, int 
numBytes) throws IOException {
+
+
+InputStreamReader jsonReader = new InputStreamReader(new 
ByteArrayInputStream(bytes));
+jsonReader.skip(offset);
+
+JSONParser parser = new JSONParser();
+TweetHandler handler = new TweetHandler();
+
+try {
+
+handler.reuse = reuse;
+parser.parse(jsonReader, handler, false);
+} catch (ParseException e) {
+
+LOG.debug("Class " + SimpleTweetInputFormat.class + " " 
+ e.getMessage());
--- End diff --

Edited




[GitHub] flink pull request: [FLINK-1615] [java api] SimpleTweetInputFormat

2015-03-12 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/442#discussion_r26285316
  
--- Diff: 
flink-contrib/src/main/java/org/apache/flink/contrib/tweetinputformat/io/SimpleTweetInputFormat.java
 ---
@@ -0,0 +1,67 @@
+package org.apache.flink.contrib.tweetinputformat.io;
+
+import org.apache.flink.api.common.io.DelimitedInputFormat;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.java.typeutils.GenericTypeInfo;
+import org.apache.flink.api.java.typeutils.ResultTypeQueryable;
+import org.apache.flink.contrib.tweetinputformat.model.tweet.Tweet;
+import org.codehaus.jackson.JsonParseException;
+import org.json.simple.parser.JSONParser;
+import org.json.simple.parser.ParseException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import java.io.ByteArrayInputStream;
+import java.io.IOException;
+import java.io.InputStreamReader;
+
+
+public class SimpleTweetInputFormat extends DelimitedInputFormat<Tweet> 
implements ResultTypeQueryable<Tweet> {
+
+private static final Logger LOG = 
LoggerFactory.getLogger(SimpleTweetInputFormat.class);
+
+@Override
+public Tweet nextRecord(Tweet record) throws IOException {
+Boolean result = false;
+
+do {
+try {
+record.reset(0);
+record = super.nextRecord(record);
+result = true;
+
+} catch (JsonParseException e) {
+result = false;
+
+}
+} while (!result);
+
+return record;
+}
+
+@Override
+public Tweet readRecord(Tweet reuse, byte[] bytes, int offset, int 
numBytes) throws IOException {
+
+
+InputStreamReader jsonReader = new InputStreamReader(new 
ByteArrayInputStream(bytes));
+jsonReader.skip(offset);
+
+JSONParser parser = new JSONParser();
--- End diff --

Putting the parser into a `transient` field and initializing it in the 
`open()` method is the way to go.

Can you give me the full stacktrace for the "Can not submit Job" exception?
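To illustrate why the `transient`-field-plus-`open()` combination works, here is a self-contained sketch (class names are stand-ins, not Flink's actual API): Java serialization skips transient fields, so the non-serializable parser never travels with the serialized format, and each task recreates it in `open()`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Stand-in for an input format holding a non-serializable parser.
class SketchFormat implements Serializable {
    transient StringBuilder parser; // skipped by Java serialization

    void open() {
        parser = new StringBuilder(); // recreated per task, worker-side
    }

    // Round-trip through Java serialization, as happens when a format
    // instance is shipped from the client to the task managers.
    static SketchFormat ship(SketchFormat f) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new ObjectOutputStream(bos).writeObject(f);
        return (SketchFormat) new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray())).readObject();
    }
}

public class TransientFieldDemo {
    public static void main(String[] args) throws Exception {
        SketchFormat local = new SketchFormat();
        local.open();
        SketchFormat shipped = SketchFormat.ship(local);
        System.out.println(shipped.parser == null); // true: not serialized
        shipped.open();
        System.out.println(shipped.parser != null); // true: re-initialized
    }
}
```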




[GitHub] flink pull request: Kick off of Flink's machine learning library

2015-03-12 Thread mxm
Github user mxm commented on a diff in the pull request:

https://github.com/apache/flink/pull/479#discussion_r26219828
  
--- Diff: docs/_layouts/default.html ---
@@ -23,16 +23,25 @@
 <meta http-equiv="X-UA-Compatible" content="IE=edge">
 <meta name="viewport" content="width=device-width, initial-scale=1">
 <title>Apache Flink: {{ page.title }}</title>
-<link rel="shortcut icon" href="{{ site.baseurl }}favicon.ico" 
type="image/x-icon">
-<link rel="icon" href="{{ site.baseurl }}favicon.ico" 
type="image/x-icon">
-<link rel="stylesheet" href="{{ site.baseurl }}css/bootstrap.css">
-<link rel="stylesheet" href="{{ site.baseurl 
}}css/bootstrap-lumen-custom.css">
-<link rel="stylesheet" href="{{ site.baseurl }}css/syntax.css">
-<link rel="stylesheet" href="{{ site.baseurl }}css/custom.css">
-<link href="{{ site.baseurl }}css/main/main.css" rel="stylesheet">
+<link rel="shortcut icon" href="{{ site.baseurl }}/favicon.ico" 
type="image/x-icon">
+<link rel="icon" href="{{ site.baseurl }}/favicon.ico" 
type="image/x-icon">
+<link rel="stylesheet" href="{{ site.baseurl }}/css/bootstrap.css">
+<link rel="stylesheet" href="{{ site.baseurl 
}}/css/bootstrap-lumen-custom.css">
+<link rel="stylesheet" href="{{ site.baseurl }}/css/syntax.css">
+<link rel="stylesheet" href="{{ site.baseurl }}/css/custom.css">
+<link href="{{ site.baseurl }}/css/main/main.css" rel="stylesheet">
--- End diff --

Local testing should work out of the box. We could set `{{site.baseurl}}` 
to http://ci.apache.org/projects/flink/flink-docs-master/ and then change our 
build_docs script to serve the website for local testing with the `--baseurl 
http://127.0.0.1/` parameter. If you want, you can fix this here, or we open a 
JIRA for that.

http://jekyllrb.com/docs/configuration/#serve-command-options
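The local-preview override could look roughly like this (a sketch under the assumption that `jekyll` is on the path and is run from the docs directory; flag names are from Jekyll's serve command):

```shell
# Serve the docs locally, overriding the production baseurl from _config.yml
# so that generated links resolve against the local preview server.
jekyll serve --watch --baseurl ""
```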





[jira] [Commented] (FLINK-1690) ProcessFailureBatchRecoveryITCase.testTaskManagerProcessFailure spuriously fails on Travis

2015-03-12 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357144#comment-14357144
 ] 

Stephan Ewen commented on FLINK-1690:
-

I retract my statement. After careful log defusing with [~uce], I think that 
things go as planned.

The test actually has too limited a time budget. It takes almost 8 seconds on 
my machine, so I grant that Travis may not be able to complete it in 30 
seconds.

 ProcessFailureBatchRecoveryITCase.testTaskManagerProcessFailure spuriously 
 fails on Travis
 --

 Key: FLINK-1690
 URL: https://issues.apache.org/jira/browse/FLINK-1690
 Project: Flink
  Issue Type: Bug
Reporter: Till Rohrmann
Priority: Minor

 I got the following error on Travis.
 {code}
 ProcessFailureBatchRecoveryITCase.testTaskManagerProcessFailure:244 The 
 program did not finish in time
 {code}
 I think we have to increase the timeouts for this test case to make it 
 reliably run on Travis.
 The log of the failed Travis build can be found 
 [here|https://api.travis-ci.org/jobs/53952486/log.txt?deansi=true]





[jira] [Commented] (FLINK-1615) Introduces a new InputFormat for Tweets

2015-03-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358317#comment-14358317
 ] 

ASF GitHub Bot commented on FLINK-1615:
---

Github user Elbehery commented on a diff in the pull request:

https://github.com/apache/flink/pull/442#discussion_r26285363
  
--- Diff: 
flink-contrib/src/main/java/org/apache/flink/contrib/tweetinputformat/io/SimpleTweetInputFormat.java
 ---
@@ -0,0 +1,67 @@
+package org.apache.flink.contrib.tweetinputformat.io;
+
+import org.apache.flink.api.common.io.DelimitedInputFormat;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.java.typeutils.GenericTypeInfo;
+import org.apache.flink.api.java.typeutils.ResultTypeQueryable;
+import org.apache.flink.contrib.tweetinputformat.model.tweet.Tweet;
+import org.codehaus.jackson.JsonParseException;
+import org.json.simple.parser.JSONParser;
+import org.json.simple.parser.ParseException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import java.io.ByteArrayInputStream;
+import java.io.IOException;
+import java.io.InputStreamReader;
+
+
+public class SimpleTweetInputFormat extends DelimitedInputFormat<Tweet> 
implements ResultTypeQueryable<Tweet> {
+
+private static final Logger LOG = 
LoggerFactory.getLogger(SimpleTweetInputFormat.class);
+
+@Override
+public Tweet nextRecord(Tweet record) throws IOException {
+Boolean result = false;
+
+do {
+try {
+record.reset(0);
+record = super.nextRecord(record);
+result = true;
+
+} catch (JsonParseException e) {
+result = false;
+
+}
+} while (!result);
+
+return record;
+}
+
+@Override
+public Tweet readRecord(Tweet reuse, byte[] bytes, int offset, int 
numBytes) throws IOException {
+
+
+InputStreamReader jsonReader = new InputStreamReader(new 
ByteArrayInputStream(bytes));
+jsonReader.skip(offset);
+
+JSONParser parser = new JSONParser();
+TweetHandler handler = new TweetHandler();
+
+try {
+
+handler.reuse = reuse;
+parser.parse(jsonReader, handler, false);
+} catch (ParseException e) {
+
+LOG.debug("Class " + SimpleTweetInputFormat.class + " " 
+ e.getMessage());
--- End diff --

Edited


 Introduces a new InputFormat for Tweets
 ---

 Key: FLINK-1615
 URL: https://issues.apache.org/jira/browse/FLINK-1615
 Project: Flink
  Issue Type: New Feature
  Components: flink-contrib
Affects Versions: 0.8.1
Reporter: mustafa elbehery
Priority: Minor

 An event-driven parser for Tweets into Java Pojos. 
 It parses all the important parts of the tweet into Java objects. 
 Tested on a cluster, and the performance is pretty good. 





[jira] [Commented] (FLINK-1615) Introduces a new InputFormat for Tweets

2015-03-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358316#comment-14358316
 ] 

ASF GitHub Bot commented on FLINK-1615:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/442#discussion_r26285316
  
--- Diff: 
flink-contrib/src/main/java/org/apache/flink/contrib/tweetinputformat/io/SimpleTweetInputFormat.java
 ---
@@ -0,0 +1,67 @@
+package org.apache.flink.contrib.tweetinputformat.io;
+
+import org.apache.flink.api.common.io.DelimitedInputFormat;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.java.typeutils.GenericTypeInfo;
+import org.apache.flink.api.java.typeutils.ResultTypeQueryable;
+import org.apache.flink.contrib.tweetinputformat.model.tweet.Tweet;
+import org.codehaus.jackson.JsonParseException;
+import org.json.simple.parser.JSONParser;
+import org.json.simple.parser.ParseException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import java.io.ByteArrayInputStream;
+import java.io.IOException;
+import java.io.InputStreamReader;
+
+
+public class SimpleTweetInputFormat extends DelimitedInputFormat<Tweet> 
implements ResultTypeQueryable<Tweet> {
+
+private static final Logger LOG = 
LoggerFactory.getLogger(SimpleTweetInputFormat.class);
+
+@Override
+public Tweet nextRecord(Tweet record) throws IOException {
+Boolean result = false;
+
+do {
+try {
+record.reset(0);
+record = super.nextRecord(record);
+result = true;
+
+} catch (JsonParseException e) {
+result = false;
+
+}
+} while (!result);
+
+return record;
+}
+
+@Override
+public Tweet readRecord(Tweet reuse, byte[] bytes, int offset, int 
numBytes) throws IOException {
+
+
+InputStreamReader jsonReader = new InputStreamReader(new 
ByteArrayInputStream(bytes));
+jsonReader.skip(offset);
+
+JSONParser parser = new JSONParser();
--- End diff --

Putting the parser into a `transient` field and initializing it in the 
`open()` method is the way to go.

Can you give me the full stacktrace for the "Can not submit Job" exception?







[GitHub] flink pull request: Kick off of Flink's machine learning library

2015-03-12 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/479#issuecomment-78441869
  
No, there is no JIRA issue for the machine learning library.

The problem with the links should be fixed.
On Mar 12, 2015 9:30 AM, Robert Metzger notificati...@github.com wrote:

 Is there a JIRA associated with this PR?
 Once the issues with the documentation are resolved, I'd say its good to
 merge.

 —
 Reply to this email directly or view it on GitHub
 https://github.com/apache/flink/pull/479#issuecomment-78440416.






[jira] [Commented] (FLINK-1633) Add getTriplets() Gelly method

2015-03-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358493#comment-14358493
 ] 

ASF GitHub Bot commented on FLINK-1633:
---

Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/452#discussion_r26293025
  
--- Diff: 
flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/Graph.java ---
@@ -326,6 +328,34 @@ public ExecutionEnvironment getContext() {
}
 
--- End diff --

some description of what this method does would be nice :-)


 Add getTriplets() Gelly method
 --

 Key: FLINK-1633
 URL: https://issues.apache.org/jira/browse/FLINK-1633
 Project: Flink
  Issue Type: New Feature
  Components: Gelly
Affects Versions: 0.9
Reporter: Vasia Kalavri
Assignee: Andra Lungu
Priority: Minor
  Labels: starter

 In some graph algorithms, it is required to access the graph edges together 
 with the vertex values of the source and target vertices. For example, 
 several graph weighting schemes compute some kind of similarity weights for 
 edges, based on the attributes of the source and target vertices. This issue 
 proposes adding a convenience Gelly method that generates a DataSet of 
 <srcVertex, Edge, TrgVertex> triplets from the input graph.





[jira] [Commented] (FLINK-1633) Add getTriplets() Gelly method

2015-03-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358505#comment-14358505
 ] 

ASF GitHub Bot commented on FLINK-1633:
---

Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/452#discussion_r26293336
  
--- Diff: 
flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/utils/Triplet.java
 ---
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.utils;
--- End diff --

I would put this in `org.apache.flink.graph` instead.







[GitHub] flink pull request: [FLINK-1633][gelly] Added getTriplets() method...

2015-03-12 Thread vasia
Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/452#discussion_r26293336
  
--- Diff: 
flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/utils/Triplet.java
 ---
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.utils;
--- End diff --

I would put this in `org.apache.flink.graph` instead.




[GitHub] flink pull request: [FLINK-1615] [java api] SimpleTweetInputFormat

2015-03-12 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/442#discussion_r26289003
  
--- Diff: 
flink-contrib/src/test/java/org/apache/flink/contrib/SimpleTweetInputFormatTest.java
 ---
@@ -0,0 +1,76 @@
+package org.apache.flink.contrib;
--- End diff --

Missing license header.




[GitHub] flink pull request: [FLINK-1576] [gelly] improvements to the gelly...

2015-03-12 Thread vasia
Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/470#issuecomment-78459161
  
Hi! If no comments here, I'd like to merge this one :)




[jira] [Commented] (FLINK-1633) Add getTriplets() Gelly method

2015-03-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358499#comment-14358499
 ] 

ASF GitHub Bot commented on FLINK-1633:
---

Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/452#discussion_r26293223
  
--- Diff: 
flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/utils/Triplet.java
 ---
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.utils;
+
+import org.apache.flink.api.java.tuple.Tuple5;
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.Vertex;
+
+import java.io.Serializable;
+
+/**
+ * A wrapper around Tuple5 used to avoid duplicate vertex ids and to 
improve readability in
+ * the {@link org.apache.flink.graph.Graph#getTriplets()} method.
+ *
+ * @param <K> the vertex key type
+ * @param <VV> the vertex value type
+ * @param <EV> the edge value type
+ */
+public class Triplet<K extends Comparable<K> & Serializable, VV extends 
Serializable, EV extends Serializable>
+	extends Tuple5<K, K, VV, VV, EV> {
+
+   public Triplet() {}
+
+   public Triplet(K srcId, K trgId, VV srcVal, VV trgVal, EV edgeVal) {
+   super(srcId, trgId, srcVal, trgVal, edgeVal);
--- End diff --

Wouldn't a <srcVertex, trgVertex, edge> constructor be more 
useful/intuitive for a user?
We can keep this one too, but better add a javadoc :-)
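A sketch of what such a convenience constructor could look like, using simplified stand-ins for Gelly's `Vertex` and `Edge` (the real `Triplet` extends `Tuple5`, omitted here to keep the example self-contained):

```java
// Simplified stand-ins for the Gelly types used by the Triplet sketch.
class Vertex<K, V> {
    final K id; final V value;
    Vertex(K id, V value) { this.id = id; this.value = value; }
}

class Edge<K, E> {
    final K source; final K target; final E value;
    Edge(K source, K target, E value) {
        this.source = source; this.target = target; this.value = value;
    }
}

public class Triplet<K, VV, EV> {
    final K srcId, trgId; final VV srcValue, trgValue; final EV edgeValue;

    // Field-wise constructor, as in the pull request.
    public Triplet(K srcId, K trgId, VV srcValue, VV trgValue, EV edgeValue) {
        this.srcId = srcId; this.trgId = trgId;
        this.srcValue = srcValue; this.trgValue = trgValue;
        this.edgeValue = edgeValue;
    }

    // Convenience constructor suggested in the review: build the triplet
    // directly from the two vertices and the connecting edge.
    public Triplet(Vertex<K, VV> src, Vertex<K, VV> trg, Edge<K, EV> edge) {
        this(src.id, trg.id, src.value, trg.value, edge.value);
    }
}
```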







[jira] [Commented] (FLINK-1633) Add getTriplets() Gelly method

2015-03-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358513#comment-14358513
 ] 

ASF GitHub Bot commented on FLINK-1633:
---

Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/452#issuecomment-78461681
  
Thanks for implementing this @andralungu!
I have added some inline comments.

Also, we are definitely missing documentation and an example using the 
method. 
I would suggest something simple, like weighting an input graph by 
computing the Euclidean distance of the src/trg vertex values for each edge and 
attaching this value as the edge weight.
What do you think?
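A sketch of the weighting step itself, leaving out the Gelly wiring via `getTriplets().map(...)`. Treating the vertex values as `double[]` feature vectors is an assumption for illustration, not something the issue specifies:

```java
public class EuclideanEdgeWeight {

    // Euclidean distance between the source and target vertex values;
    // this number would become the new edge weight.
    public static double weight(double[] srcValue, double[] trgValue) {
        double sum = 0.0;
        for (int i = 0; i < srcValue.length; i++) {
            double d = srcValue[i] - trgValue[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        System.out.println(weight(new double[]{0, 0}, new double[]{3, 4})); // prints 5.0
    }
}
```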







[jira] [Resolved] (FLINK-1547) Disable automated ApplicationMaster restart

2015-03-12 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger resolved FLINK-1547.
---
   Resolution: Fixed
Fix Version/s: 0.9

Resolved in http://git-wip-us.apache.org/repos/asf/flink/commit/13bb21b1.

 Disable automated ApplicationMaster restart
 ---

 Key: FLINK-1547
 URL: https://issues.apache.org/jira/browse/FLINK-1547
 Project: Flink
  Issue Type: Bug
  Components: YARN Client
Reporter: Robert Metzger
Assignee: Robert Metzger
 Fix For: 0.9


 Currently, Flink on YARN restarts the ApplicationMaster if it 
 crashes.
 The other components don't support this (the frontend tries to reconnect).





[jira] [Resolved] (FLINK-1629) Add option to start Flink on YARN in a detached mode

2015-03-12 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger resolved FLINK-1629.
---
   Resolution: Fixed
Fix Version/s: 0.9

Resolved in http://git-wip-us.apache.org/repos/asf/flink/commit/13bb21b1

 Add option to start Flink on YARN in a detached mode
 

 Key: FLINK-1629
 URL: https://issues.apache.org/jira/browse/FLINK-1629
 Project: Flink
  Issue Type: Improvement
  Components: YARN Client
Reporter: Robert Metzger
Assignee: Robert Metzger
 Fix For: 0.9


 Right now, we expect the YARN command line interface to be connected with the 
 Application Master all the time to control the yarn session or the job.
 For very long-running sessions or jobs, users want to just fire and forget a 
 job/session to YARN.
 Stopping the session will still be possible using YARN's tools.
 Also, prior to detaching itself, the CLI frontend could print the required 
 command to kill the session as a convenience.





[jira] [Commented] (FLINK-1629) Add option to start Flink on YARN in a detached mode

2015-03-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358462#comment-14358462
 ] 

ASF GitHub Bot commented on FLINK-1629:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/468







[jira] [Resolved] (FLINK-1630) Add option to YARN client to re-allocate failed containers

2015-03-12 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger resolved FLINK-1630.
---
   Resolution: Fixed
Fix Version/s: 0.9

Resolved in http://git-wip-us.apache.org/repos/asf/flink/commit/13bb21b1.

 Add option to YARN client to re-allocate failed containers
 --

 Key: FLINK-1630
 URL: https://issues.apache.org/jira/browse/FLINK-1630
 Project: Flink
  Issue Type: Improvement
  Components: YARN Client
Affects Versions: 0.9
Reporter: Robert Metzger
Assignee: Robert Metzger
 Fix For: 0.9


 The current Flink YARN client tries to allocate only the initial number of 
 containers.
 If containers fail (in particular for long-running sessions), there is no 
 way of re-allocating them.
 We should add an option to the ApplicationMaster to re-allocate missing/failed 
 containers.





[jira] [Commented] (FLINK-1576) Change the Gelly examples to be consistent with the other Flink examples

2015-03-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358485#comment-14358485
 ] 

ASF GitHub Bot commented on FLINK-1576:
---

Github user uce commented on a diff in the pull request:

https://github.com/apache/flink/pull/470#discussion_r26292649
  
--- Diff: 
flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/GraphMetrics.java
 ---
@@ -120,4 +120,58 @@ public Double map(Tuple2<Long, Long> sumTuple) {
private static final class ProjectVertexId implements 
MapFunction<Tuple2<Long, Long>, Long> {
public Long map(Tuple2<Long, Long> value) { return value.f0; }
}
+
+   @Override
+   public String getDescription() {
+   return Graph Metrics Example;
+   }
+
+   // 
**
+   // UTIL METHODS
+   // 
**
+
+   private static boolean fileOutput = false;
+
+   private static String edgesInputPath = null;
+
+   static final int NUM_VERTICES = 100;
+
+   static final long SEED = 9876;
+
+   private static boolean parseParameters(String[] args) {
+
+   if(args.length > 0) {
+   if(args.length != 1) {
+   System.err.println("Usage: LabelPropagation 
<vertex path> <edge path> <output path> <num iterations>");
--- End diff --

LabelPropagation should be GraphMetrics


 Change the Gelly examples to be consistent with the other Flink examples
 

 Key: FLINK-1576
 URL: https://issues.apache.org/jira/browse/FLINK-1576
 Project: Flink
  Issue Type: Improvement
  Components: Gelly
Affects Versions: 0.8.0
Reporter: Andra Lungu
Assignee: Vasia Kalavri
  Labels: easyfix

 The current Gelly examples just work on default input data. 
 If we look at the other Flink examples, e.g. Connected Components, they also 
 allow input data to be read from a text file passed as a parameter to the 
 main method. 
  
 It would be nice to follow the same approach in our examples. A first step in 
 that direction is the SSSP example. 





[GitHub] flink pull request: [FLINK-1576] [gelly] improvements to the gelly...

2015-03-12 Thread vasia
Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/470#discussion_r26293641
  
--- Diff: 
flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/GraphMetrics.java
 ---
@@ -120,4 +120,58 @@ public Double map(Tuple2<Long, Long> sumTuple) {
private static final class ProjectVertexId implements 
MapFunction<Tuple2<Long, Long>, Long> {
public Long map(Tuple2<Long, Long> value) { return value.f0; }
}
+
+   @Override
+   public String getDescription() {
+   return "Graph Metrics Example";
+   }
+
+   // 
**
+   // UTIL METHODS
+   // 
**
+
+   private static boolean fileOutput = false;
+
+   private static String edgesInputPath = null;
+
+   static final int NUM_VERTICES = 100;
+
+   static final long SEED = 9876;
+
+   private static boolean parseParameters(String[] args) {
+
+   if(args.length > 0) {
+   if(args.length != 1) {
+   System.err.println("Usage: LabelPropagation 
<vertex path> <edge path> <output path> <num iterations>");
--- End diff --

Thanks @uce ^^


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
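The `parseParameters()`/usage-string pattern under review can be sketched in a self-contained way. The usage line below names GraphMetrics, as the reviewers suggest; the class name, argument count, and messages are illustrative assumptions, not the PR's actual code.

```java
// Illustrative sketch only: mirrors the parseParameters() pattern from the
// review thread, with the usage string naming the right class (GraphMetrics,
// not LabelPropagation). Argument layout is assumed, not taken from the PR.
public class GraphMetricsUsage {

    static boolean parseParameters(String[] args) {
        if (args.length > 0) {
            if (args.length != 2) {
                // The review's point: the usage message must name the
                // example class it actually belongs to.
                System.err.println("Usage: GraphMetrics <edge path> <output path>");
                return false;
            }
            return true;   // file input: both paths provided
        }
        return true;       // no args: fall back to default generated data
    }

    public static void main(String[] args) {
        System.out.println(parseParameters(args));
    }
}
```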


[GitHub] flink pull request: [FLINK-1683] [jobmanager] Fix scheduling pref...

2015-03-12 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/476#issuecomment-78471123
  
Looks good, very important fix!

+1 to merge




[jira] [Commented] (FLINK-1615) Introduces a new InputFormat for Tweets

2015-03-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358668#comment-14358668
 ] 

ASF GitHub Bot commented on FLINK-1615:
---

Github user Elbehery commented on the pull request:

https://github.com/apache/flink/pull/442#issuecomment-78480464
  
@StephanEwen  I have added the license. 


 Introduces a new InputFormat for Tweets
 ---

 Key: FLINK-1615
 URL: https://issues.apache.org/jira/browse/FLINK-1615
 Project: Flink
  Issue Type: New Feature
  Components: flink-contrib
Affects Versions: 0.8.1
Reporter: mustafa elbehery
Priority: Minor

 An event-driven parser for Tweets into Java POJOs. 
 It parses all the important parts of the tweet into Java objects. 
 Tested on a cluster, and the performance is pretty good. 





[jira] [Commented] (FLINK-1350) Add blocking intermediate result partitions

2015-03-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358681#comment-14358681
 ] 

ASF GitHub Bot commented on FLINK-1350:
---

Github user uce commented on the pull request:

https://github.com/apache/flink/pull/471#issuecomment-78481915
  
PS: Where we wouldn't need the asynchronous implementations is for local 
reads. There it should be perfectly fine to just synchronously read the spilled 
partitions. 


 Add blocking intermediate result partitions
 ---

 Key: FLINK-1350
 URL: https://issues.apache.org/jira/browse/FLINK-1350
 Project: Flink
  Issue Type: Improvement
  Components: Distributed Runtime
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi

 The current state of runtime support for intermediate results (see 
 https://github.com/apache/incubator-flink/pull/254 and FLINK-986) only 
 supports pipelined intermediate results (with back pressure), which are 
 consumed as they are being produced.
 The next variant we need to support are blocking intermediate results 
 (without back pressure), which are fully produced before being consumed. This 
 is for example desirable in situations, where we currently may run into 
 deadlocks when running pipelined.
 I will start working on this on top of my pending pull request.







[GitHub] flink pull request: [FLINK-1615] [java api] SimpleTweetInputFormat

2015-03-12 Thread Elbehery
Github user Elbehery commented on a diff in the pull request:

https://github.com/apache/flink/pull/442#discussion_r26300742
  
--- Diff: 
flink-contrib/src/test/java/org/apache/flink/contrib/SimpleTweetInputFormatTest.java
 ---
@@ -0,0 +1,76 @@
+package org.apache.flink.contrib;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.contrib.tweetinputformat.io.SimpleTweetInputFormat;
+import org.apache.flink.contrib.tweetinputformat.model.tweet.Tweet;
+import 
org.apache.flink.contrib.tweetinputformat.model.tweet.entities.HashTags;
+import org.apache.flink.core.fs.FileInputSplit;
+import org.apache.flink.core.fs.Path;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+import java.io.File;
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+
+
+public class SimpleTweetInputFormatTest {
+
+private Tweet tweet;
+
+private SimpleTweetInputFormat simpleTweetInputFormat;
+
+private FileInputSplit fileInputSplit;
+
+protected Configuration config;
+
+protected File tempFile;
+
+
+@Before
+public void testSetUp() {
+
+
+simpleTweetInputFormat = new SimpleTweetInputFormat();
+
+File jsonFile = new  
File("../flink-contrib/src/main/resources/HashTagTweetSample.json");
+
+fileInputSplit = new FileInputSplit(0, new 
Path(jsonFile.getPath()), 0, jsonFile.length(), new String[] {"localhost"});
+}
+
+@Test
+public void testTweetInput() throws Exception {
+
+
+simpleTweetInputFormat.open(fileInputSplit);
+List<String> result;
+
+while (!simpleTweetInputFormat.reachedEnd()) {
+tweet = new Tweet();
+tweet = simpleTweetInputFormat.nextRecord(tweet);
+
+if(tweet != null){
--- End diff --

DONE




[GitHub] flink pull request: [FLINK-1615] [java api] SimpleTweetInputFormat

2015-03-12 Thread Elbehery
Github user Elbehery commented on a diff in the pull request:

https://github.com/apache/flink/pull/442#discussion_r26299783
  
--- Diff: 
flink-contrib/src/test/java/org/apache/flink/contrib/SimpleTweetInputFormatTest.java
 ---
@@ -0,0 +1,76 @@
+package org.apache.flink.contrib;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.contrib.tweetinputformat.io.SimpleTweetInputFormat;
+import org.apache.flink.contrib.tweetinputformat.model.tweet.Tweet;
+import 
org.apache.flink.contrib.tweetinputformat.model.tweet.entities.HashTags;
+import org.apache.flink.core.fs.FileInputSplit;
+import org.apache.flink.core.fs.Path;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+import java.io.File;
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+
+
+public class SimpleTweetInputFormatTest {
+
+private Tweet tweet;
+
+private SimpleTweetInputFormat simpleTweetInputFormat;
+
+private FileInputSplit fileInputSplit;
+
+protected Configuration config;
+
+protected File tempFile;
+
+
+@Before
+public void testSetUp() {
+
+
+simpleTweetInputFormat = new SimpleTweetInputFormat();
+
+File jsonFile = new  
File("../flink-contrib/src/main/resources/HashTagTweetSample.json");
+
+fileInputSplit = new FileInputSplit(0, new 
Path(jsonFile.getPath()), 0, jsonFile.length(), new String[] {"localhost"});
+}
+
+@Test
+public void testTweetInput() throws Exception {
+
+
+simpleTweetInputFormat.open(fileInputSplit);
+List<String> result;
+
+while (!simpleTweetInputFormat.reachedEnd()) {
+tweet = new Tweet();
+tweet = simpleTweetInputFormat.nextRecord(tweet);
+
+if(tweet != null){
--- End diff --

I think this behavior was intended. Checking the DelimitedInputFormat 
unit test, I found this; I will edit my test case to fix the problem.

![selection_019](https://cloud.githubusercontent.com/assets/2375289/6618391/d3366786-c8c2-11e4-8854-2a38a8da4da3.png)
 








[GitHub] flink pull request: [FLINK-1615] [java api] SimpleTweetInputFormat

2015-03-12 Thread Elbehery
Github user Elbehery commented on the pull request:

https://github.com/apache/flink/pull/442#issuecomment-78480294
  
@rmetzger  This is the standard Tweet format as per Twitter. Here You can 
find [Twitter Official 
Documentation](https://dev.twitter.com/overview/api/tweets). My parser is 
retrieving all the tweet except Bounding Box object, and retweeted object. 




[jira] [Commented] (FLINK-1615) Introduces a new InputFormat for Tweets

2015-03-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358648#comment-14358648
 ] 

ASF GitHub Bot commented on FLINK-1615:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/442#discussion_r26300021
  
--- Diff: 
flink-contrib/src/test/java/org/apache/flink/contrib/SimpleTweetInputFormatTest.java
 ---
@@ -0,0 +1,76 @@
+package org.apache.flink.contrib;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.contrib.tweetinputformat.io.SimpleTweetInputFormat;
+import org.apache.flink.contrib.tweetinputformat.model.tweet.Tweet;
+import 
org.apache.flink.contrib.tweetinputformat.model.tweet.entities.HashTags;
+import org.apache.flink.core.fs.FileInputSplit;
+import org.apache.flink.core.fs.Path;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+import java.io.File;
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+
+
+public class SimpleTweetInputFormatTest {
+
+private Tweet tweet;
+
+private SimpleTweetInputFormat simpleTweetInputFormat;
+
+private FileInputSplit fileInputSplit;
+
+protected Configuration config;
+
+protected File tempFile;
+
+
+@Before
+public void testSetUp() {
+
+
+simpleTweetInputFormat = new SimpleTweetInputFormat();
+
+File jsonFile = new  
File("../flink-contrib/src/main/resources/HashTagTweetSample.json");
+
+fileInputSplit = new FileInputSplit(0, new 
Path(jsonFile.getPath()), 0, jsonFile.length(), new String[] {"localhost"});
+}
+
+@Test
+public void testTweetInput() throws Exception {
+
+
+simpleTweetInputFormat.open(fileInputSplit);
+List<String> result;
+
+while (!simpleTweetInputFormat.reachedEnd()) {
+tweet = new Tweet();
+tweet = simpleTweetInputFormat.nextRecord(tweet);
+
+if(tweet != null){
--- End diff --

Cool. 
Then I'd recommend to add a similar test to your test.


 Introduces a new InputFormat for Tweets
 ---

 Key: FLINK-1615
 URL: https://issues.apache.org/jira/browse/FLINK-1615
 Project: Flink
  Issue Type: New Feature
  Components: flink-contrib
Affects Versions: 0.8.1
Reporter: mustafa elbehery
Priority: Minor

 An event-driven parser for Tweets into Java POJOs. 
 It parses all the important parts of the tweet into Java objects. 
 Tested on a cluster, and the performance is pretty good. 



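The "similar test" recommended in the review above can be sketched without any Flink dependency. This toy reader mimics the nextRecord()-returns-null contract discussed in the thread; the class and method names are made up for illustration and are not Flink's API.

```java
import java.util.Arrays;
import java.util.Iterator;

// Minimal sketch (plain Java, no Flink dependency) of the contract the
// review discusses: nextRecord() may return null for a record that does
// not parse, so the reading loop must skip nulls rather than fail.
public class NullSkippingReader {

    private final Iterator<String> lines;

    NullSkippingReader(String... input) {
        this.lines = Arrays.asList(input).iterator();
    }

    boolean reachedEnd() {
        return !lines.hasNext();
    }

    // Returns null for records that do not parse (here: blank lines),
    // mirroring the null-on-bad-record behavior described in the thread.
    String nextRecord() {
        String line = lines.next();
        return line.isEmpty() ? null : line;
    }

    public static void main(String[] args) {
        NullSkippingReader reader = new NullSkippingReader("a", "", "b");
        int parsed = 0;
        while (!reader.reachedEnd()) {
            if (reader.nextRecord() != null) {   // skip null records
                parsed++;
            }
        }
        System.out.println(parsed);   // 2
    }
}
```

A test in this style asserts both that valid records are counted and that null records are tolerated mid-stream.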




[GitHub] flink pull request: [FLINK-1350] [runtime] Add blocking result par...

2015-03-12 Thread uce
Github user uce commented on the pull request:

https://github.com/apache/flink/pull/471#issuecomment-78479167
  
The root cause of all asynchronous operations is that TCP connections are 
shared among multiple logical channels, which are handled by a fixed number of 
network I/O threads. In case of synchronous I/O operations, we would 
essentially block progress on all channels sharing that connection/thread.

> When do you issue the read requests to the reader (from disk)? Is that 
dependent on when the TCP channel is writable?

Yes, the network I/O thread has subpartitions queued for transfer and only 
queries them for data when the TCP channel is writable.

> When the read request is issued, before the response comes, is the 
subpartition de-registered from netty and then re-registered once a buffer has 
returned from disk?

Exactly. If there is no buffer available, the read request is issued and 
the next available subpartition is tried. If none of the subpartitions has data 
available, the network I/O thread works on another TCP channel (this is done by 
Netty, which multiplexes all TCP channels over a fixed amount of network I/O 
threads).

> Given many spilled partitions, which one is read from next? How is the 
buffer assignment realized? There is a lot of trickiness in there, because disk 
I/O performs well with longer sequential reads, but that may occupy many 
buffers that are missing for other reads into writable TCP channels.

Initially this depends on the order of partition requests; after that, on 
the order of data availability. 
Regarding the buffers: trickiness, indeed. The current state with the 
buffers is kind of an intermediate solution, as we will issue zero-transfer 
reads in the future (requiring minimal changes), where we essentially only 
trigger reads to gather offsets. The reads are then only affected by TCP 
channel writability. Currently, the reads are batched in sizes of two buffers 
(64k).



Regarding @tillrohrmann's changes: what was this exactly? Then I can verify 
that the changes are not undone.

In general (minus the question regarding Till's changes) I think this PR is 
good to merge. The tests are stable and passing. There will be definitely a 
need to do refactorings and performance evaluations, but I think that is to be 
expected with such a big change.


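The writability-driven scheduling described above can be modeled with a small toy example. There is no actual Netty API here; the Channel interface and all names are assumptions for illustration only.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy model (no Netty dependency) of the scheduling idea from the thread:
// spilled subpartition data is only pulled while the outbound TCP channel
// reports itself writable, so a back-pressured channel hands control back
// to the shared I/O thread instead of blocking it.
public class WritabilityGatedWriter {

    interface Channel {
        boolean isWritable();
        void write(int buffer);
    }

    // Drain queued buffers only while the channel stays writable.
    static int drain(Queue<Integer> spilled, Channel channel) {
        int written = 0;
        while (!spilled.isEmpty() && channel.isWritable()) {
            channel.write(spilled.poll());
            written++;
        }
        return written;
    }

    public static void main(String[] args) {
        Queue<Integer> spilled = new ArrayDeque<>();
        for (int i = 0; i < 5; i++) {
            spilled.add(i);
        }

        // A channel that accepts three buffers, then back-pressures.
        Channel channel = new Channel() {
            int capacity = 3;
            public boolean isWritable() { return capacity > 0; }
            public void write(int buffer) { capacity--; }
        };

        System.out.println(drain(spilled, channel));   // 3
        System.out.println(spilled.size());            // 2
    }
}
```

The remaining two buffers stay queued until the channel becomes writable again, which is the non-blocking behavior the comment attributes to Netty's multiplexed I/O threads.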






[jira] [Commented] (FLINK-1695) Create machine learning library

2015-03-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359846#comment-14359846
 ] 

ASF GitHub Bot commented on FLINK-1695:
---

Github user junwucs commented on the pull request:

https://github.com/apache/flink/pull/479#issuecomment-78766228
  
Hi @tillrohrmann, great job! I am interested in the ALS implementation 
and preparing to use it in our work. Recently I worked on a branch of 
https://github.com/tillrohrmann/flink-perf and ran into some tricky unsolved 
bugs, such as org.apache.flink.api.common.functions.InvalidTypesException: Could not 
convert GenericArrayType to Class. So I found this new branch and hope your 
work can help me, thanks a lot!


 Create machine learning library
 ---

 Key: FLINK-1695
 URL: https://issues.apache.org/jira/browse/FLINK-1695
 Project: Flink
  Issue Type: Improvement
Reporter: Till Rohrmann
Assignee: Till Rohrmann

 Create the infrastructure for Flink's machine learning library. This includes 
 the creation of the module structure and the implementation of basic types 
 such as vectors and matrices.











[GitHub] flink pull request: IntelliJ Setup Guide

2015-03-12 Thread nltran
GitHub user nltran opened a pull request:

https://github.com/apache/flink/pull/480

IntelliJ Setup Guide

Following my interactions with fhueske on the IRC channel, I summarized the 
steps to correctly set up IntelliJ with Flink core.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nltran/flink master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/480.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #480


commit f8f6b8bf52ae21d549cf44f323633ddeca6c9dbf
Author: Nam-Luc Tran namluc.t...@euranova.eu
Date:   2015-03-11T17:45:38Z

added documentation on setup of intelliJ

commit 170c6330d917c878a8c10e4f0b302a9e14b8aa09
Author: Nam-Luc Tran namluc.t...@euranova.eu
Date:   2015-03-12T15:13:21Z

Merge branch 'master' of https://github.com/nltran/flink






[GitHub] flink pull request: IntelliJ Setup Guide

2015-03-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/480




[jira] [Commented] (FLINK-1388) POJO support for writeAsCsv

2015-03-12 Thread Adnan Khan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358863#comment-14358863
 ] 

Adnan Khan commented on FLINK-1388:
---

Also - why should I pass the names of all the fields that should be written 
out? If I have a reference to the POJO itself, I can always just grab the 
{{Field[]}} array using Java's reflection API. I feel I've missed something.
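As a side note, the reflection idea reads roughly like this in plain Java; the class and method names below are illustrative only, not Flink API:

```java
import java.lang.reflect.Field;
import java.util.StringJoiner;

public class PojoCsvSketch {

    // Example POJO mirroring the one from the issue description.
    public static class MyPojo {
        public String a;
        public int b;
        public MyPojo(String a, int b) { this.a = a; this.b = b; }
    }

    // Build one CSV line by reflecting over the POJO's declared fields.
    // Note: getDeclaredFields() returns fields in no guaranteed order,
    // which is one reason an explicit field list can still be useful.
    public static String toCsvLine(Object pojo) throws IllegalAccessException {
        StringJoiner line = new StringJoiner(",");
        for (Field field : pojo.getClass().getDeclaredFields()) {
            field.setAccessible(true);
            line.add(String.valueOf(field.get(pojo)));
        }
        return line.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toCsvLine(new MyPojo("Hello World", 42)));
    }
}
```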

 POJO support for writeAsCsv
 ---

 Key: FLINK-1388
 URL: https://issues.apache.org/jira/browse/FLINK-1388
 Project: Flink
  Issue Type: New Feature
  Components: Java API
Reporter: Timo Walther
Assignee: Adnan Khan
Priority: Minor

 It would be great if one could simply write out POJOs in CSV format.
 {code}
 public class MyPojo {
String a;
int b;
 }
 {code}
 to:
 {code}
 # CSV file of org.apache.flink.MyPojo: String a, int b
 Hello World, 42
 Hello World 2, 47
 ...
 {code}





[GitHub] flink pull request: IntelliJ Setup Guide

2015-03-12 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/480#issuecomment-78508276
  
Great setup guide for IntelliJ @nltran. I'll link it to the internals and 
merge it.




[GitHub] flink pull request: Kick off of Flink's machine learning library

2015-03-12 Thread hsaputra
Github user hsaputra commented on the pull request:

https://github.com/apache/flink/pull/479#issuecomment-78541637
  
@tillrohrmann, could you file a JIRA for this one? It should help when we want 
to manage releases or merge between branches.




[GitHub] flink pull request: [FLINK-1450] Added fold operator for the Strea...

2015-03-12 Thread akshaydixi
GitHub user akshaydixi opened a pull request:

https://github.com/apache/flink/pull/481

[FLINK-1450] Added fold operator for the Streaming API

Tried to follow the steps described in the JIRA Issue comments:
- Created a FoldFunction and a RichFoldFunction
- Created a StreamFoldInvokable
- Integrated it to the DataStream for both Java and Scala
- Implemented StreamFoldTest based on StreamReduceTest

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/akshaydixi/flink FLINK-1450

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/481.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #481


commit 9848f23a0a73228fb45346d69aee68091b135d3d
Author: Akshay Dixit akshayd...@gmail.com
Date:   2015-03-12T17:49:31Z

[FLINK-1450] Added fold operator for the Streaming API






[jira] [Commented] (FLINK-1450) Add Fold operator to the Streaming api

2015-03-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359069#comment-14359069
 ] 

ASF GitHub Bot commented on FLINK-1450:
---

GitHub user akshaydixi opened a pull request:

https://github.com/apache/flink/pull/481

[FLINK-1450] Added fold operator for the Streaming API

Tried to follow the steps described in the JIRA Issue comments:
- Created a FoldFunction and a RichFoldFunction
- Created a StreamFoldInvokable
- Integrated it to the DataStream for both Java and Scala
- Implemented StreamFoldTest based on StreamReduceTest

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/akshaydixi/flink FLINK-1450

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/481.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #481


commit 9848f23a0a73228fb45346d69aee68091b135d3d
Author: Akshay Dixit akshayd...@gmail.com
Date:   2015-03-12T17:49:31Z

[FLINK-1450] Added fold operator for the Streaming API




 Add Fold operator to the Streaming api
 --

 Key: FLINK-1450
 URL: https://issues.apache.org/jira/browse/FLINK-1450
 Project: Flink
  Issue Type: New Feature
  Components: Streaming
Affects Versions: 0.9
Reporter: Gyula Fora
Priority: Minor
  Labels: starter

 The streaming API currently doesn't support a fold operator.
 This operator would work like the foldLeft method in Scala. This would allow 
 effective implementations in many cases where the simple reduce is 
 inappropriate due to different return types.
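The foldLeft semantics described above, where the accumulator type differs from the element type, can be sketched in plain Java; this is only an illustration of the semantics, not the Streaming API:

```java
import java.util.Arrays;
import java.util.function.BiFunction;

public class FoldSketch {

    // foldLeft: the accumulator type R may differ from the element type T,
    // which is what a plain reduce of shape (T, T) -> T cannot express.
    public static <T, R> R fold(R initial, Iterable<T> elements,
                                BiFunction<R, T, R> folder) {
        R acc = initial;
        for (T element : elements) {
            acc = folder.apply(acc, element);
        }
        return acc;
    }

    public static void main(String[] args) {
        // Fold integers into a String, a different return type than the input.
        String result = fold("start:", Arrays.asList(1, 2, 3), (acc, x) -> acc + x);
        System.out.println(result); // start:123
    }
}
```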





[GitHub] flink pull request: [FLINK-1677][gelly] Suppressed Sysout Printing...

2015-03-12 Thread andralungu
Github user andralungu commented on the pull request:

https://github.com/apache/flink/pull/475#issuecomment-78622670
  
Hi @uce , 

First of all thanks! :)
Your comments are entirely true, especially the one with 
MultipleProgramsTestBase. I spent some time to figure out how to unit test this 
behavior. I got advice from different people... hopefully this last version 
will be satisfactory. 

@vasia once travis passes, if nobody else has objections, maybe we can 
merge this :)





[jira] [Commented] (FLINK-1677) Properly Suppress Sysout Printing for the Degrees with exception test suite

2015-03-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359440#comment-14359440
 ] 

ASF GitHub Bot commented on FLINK-1677:
---

Github user andralungu commented on the pull request:

https://github.com/apache/flink/pull/475#issuecomment-78622670
  
Hi @uce , 

First of all thanks! :)
Your comments are entirely true, especially the one with 
MultipleProgramsTestBase. I spent some time to figure out how to unit test this 
behavior. I got advice from different people... hopefully this last version 
will be satisfactory. 

@vasia once travis passes, if nobody else has objections, maybe we can 
merge this :)



 Properly Suppress Sysout Printing for the Degrees with exception test suite
 ---

 Key: FLINK-1677
 URL: https://issues.apache.org/jira/browse/FLINK-1677
 Project: Flink
  Issue Type: Bug
  Components: Gelly
Affects Versions: 0.9
Reporter: Andra Lungu
Assignee: Andra Lungu
Priority: Minor

 Suppress systout printing similar to:
 flink-clients/src/test/java/org/apache/flink/client/program/ExecutionPlanAfterExecutionTest.java
 Speedup test suite by reusing a mini-cluster similar to:
 flink-tests/src/test/java/org/apache/flink/test/recovery/SimpleRecoveryITCase.java
  





[GitHub] flink pull request: Add support for building Flink with Scala 2.11

2015-03-12 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/477#issuecomment-78502251
  
[Some of the tests 
failed](https://travis-ci.org/apache/flink/builds/53879571) but I'm not sure 
whether this is related to the changes or not. I get the following error in the 
three failed Travis jobs:

```
 [ERROR] Failed to execute goal on project XXX: Could not resolve 
dependencies for project org.apache.flink.archetypetest:testArtifact:jar:0.1: 
Could not find artifact 
org.apache.flink:flink-shaded-include-yarn:jar:0.9-SNAPSHOT in 
sonatype-snapshots (https://oss.sonatype.org/content/repositories/snapshots/)
```

Could it be that the new artifact has not yet made it to Sonatype?




[jira] [Commented] (FLINK-1388) POJO support for writeAsCsv

2015-03-12 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358878#comment-14358878
 ] 

Fabian Hueske commented on FLINK-1388:
--

Sure, you can do that. I thought it might be a good idea to offer the option to 
select which fields to write out and in what order.
But of course we start with writing all fields.
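The field-selection idea described here could be sketched like this; the names are illustrative, not the actual writeAsCsv API:

```java
import java.lang.reflect.Field;
import java.util.StringJoiner;

public class SelectedFieldsCsv {

    // Example POJO from the issue description.
    public static class MyPojo {
        public String a;
        public int b;
        public MyPojo(String a, int b) { this.a = a; this.b = b; }
    }

    // Write only the named fields, in the caller-supplied order.
    public static String toCsvLine(Object pojo, String... fieldNames)
            throws ReflectiveOperationException {
        StringJoiner line = new StringJoiner(",");
        for (String name : fieldNames) {
            Field field = pojo.getClass().getDeclaredField(name);
            field.setAccessible(true);
            line.add(String.valueOf(field.get(pojo)));
        }
        return line.toString();
    }

    public static void main(String[] args) throws Exception {
        // Reordering the output is just a matter of the argument order.
        System.out.println(toCsvLine(new MyPojo("Hello World", 42), "b", "a"));
    }
}
```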

 POJO support for writeAsCsv
 ---

 Key: FLINK-1388
 URL: https://issues.apache.org/jira/browse/FLINK-1388
 Project: Flink
  Issue Type: New Feature
  Components: Java API
Reporter: Timo Walther
Assignee: Adnan Khan
Priority: Minor

 It would be great if one could simply write out POJOs in CSV format.
 {code}
 public class MyPojo {
String a;
int b;
 }
 {code}
 to:
 {code}
 # CSV file of org.apache.flink.MyPojo: String a, int b
 Hello World, 42
 Hello World 2, 47
 ...
 {code}





[GitHub] flink pull request: [FLINK-1605] Bundle all hadoop dependencies an...

2015-03-12 Thread hsaputra
Github user hsaputra commented on the pull request:

https://github.com/apache/flink/pull/454#issuecomment-78560058
  
How much time does this new shading add to the total compile? It used to be 
around 16-18 mins for me using mvn clean install -DskipTests.

I just merged today and it has been more than 25 mins without 
completing the build =(




[jira] [Commented] (FLINK-1605) Create a shaded Hadoop fat jar to resolve library version conflicts

2015-03-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359139#comment-14359139
 ] 

ASF GitHub Bot commented on FLINK-1605:
---

Github user hsaputra commented on the pull request:

https://github.com/apache/flink/pull/454#issuecomment-78560058
  
How much time does this new shading add to the total compile? It used to be 
around 16-18 mins for me using mvn clean install -DskipTests.

I just merged today and it has been more than 25 mins without 
completing the build =(


 Create a shaded Hadoop fat jar to resolve library version conflicts
 ---

 Key: FLINK-1605
 URL: https://issues.apache.org/jira/browse/FLINK-1605
 Project: Flink
  Issue Type: Improvement
  Components: Build System
Affects Versions: 0.9
Reporter: Robert Metzger
Assignee: Robert Metzger

 As per mailing list discussion: 
 http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Create-a-shaded-Hadoop-fat-jar-to-resolve-library-version-conflicts-td3881.html





[jira] [Created] (FLINK-1695) Create machine learning library

2015-03-12 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-1695:


 Summary: Create machine learning library
 Key: FLINK-1695
 URL: https://issues.apache.org/jira/browse/FLINK-1695
 Project: Flink
  Issue Type: Improvement
Reporter: Till Rohrmann
Assignee: Till Rohrmann


Create the infrastructure for Flink's machine learning library. This includes 
the creation of the module structure and the implementation of basic types such 
as vectors and matrices.





[jira] [Updated] (FLINK-1697) Add alternating least squares algorithm for matrix factorization to ML library

2015-03-12 Thread Till Rohrmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-1697:
-
Summary: Add alternating least squares algorithm for matrix factorization 
to ML library  (was: Add alternating least squares algorithm to ML library)

 Add alternating least squares algorithm for matrix factorization to ML library
 --

 Key: FLINK-1697
 URL: https://issues.apache.org/jira/browse/FLINK-1697
 Project: Flink
  Issue Type: Improvement
Reporter: Till Rohrmann
Assignee: Till Rohrmann

 Add alternating least squares algorithm to ML library





[jira] [Updated] (FLINK-1697) Add alternating least squares algorithm for matrix factorization to ML library

2015-03-12 Thread Till Rohrmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-1697:
-
Description: Add alternating least squares algorithm for matrix 
factorization to ML library  (was: Add alternating least squares algorithm to 
ML library)

 Add alternating least squares algorithm for matrix factorization to ML library
 --

 Key: FLINK-1697
 URL: https://issues.apache.org/jira/browse/FLINK-1697
 Project: Flink
  Issue Type: Improvement
Reporter: Till Rohrmann
Assignee: Till Rohrmann

 Add alternating least squares algorithm for matrix factorization to ML library





[jira] [Created] (FLINK-1696) Add multiple linear regression to ML library

2015-03-12 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-1696:


 Summary: Add multiple linear regression to ML library
 Key: FLINK-1696
 URL: https://issues.apache.org/jira/browse/FLINK-1696
 Project: Flink
  Issue Type: Improvement
Reporter: Till Rohrmann
Assignee: Till Rohrmann


Add multiple linear regression to ML library.





[jira] [Created] (FLINK-1697) Add alternating least squares algorithm to ML library

2015-03-12 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-1697:


 Summary: Add alternating least squares algorithm to ML library
 Key: FLINK-1697
 URL: https://issues.apache.org/jira/browse/FLINK-1697
 Project: Flink
  Issue Type: Improvement
Reporter: Till Rohrmann
Assignee: Till Rohrmann


Add alternating least squares algorithm to ML library





[jira] [Created] (FLINK-1698) Add polynomial base feature mapper to ML library

2015-03-12 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-1698:


 Summary: Add polynomial base feature mapper to ML library
 Key: FLINK-1698
 URL: https://issues.apache.org/jira/browse/FLINK-1698
 Project: Flink
  Issue Type: Improvement
Reporter: Till Rohrmann
Assignee: Till Rohrmann


Add a feature mapper which maps a vector into the polynomial feature space. This 
can be used as a preprocessing step prior to applying a {{Learner}} from Flink's 
ML library.
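A per-component polynomial mapping (cross terms omitted) might look like the following; this is only a sketch of the idea, not the eventual Flink ML API:

```java
public class PolynomialFeaturesSketch {

    // Map each component x of the input vector to (x, x^2, ..., x^degree),
    // concatenating the expansions of all components into one output vector.
    public static double[] mapToPolynomialSpace(double[] vector, int degree) {
        double[] result = new double[vector.length * degree];
        int i = 0;
        for (double x : vector) {
            double power = 1.0;
            for (int d = 1; d <= degree; d++) {
                power *= x;          // power == x^d after this step
                result[i++] = power;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[] mapped = mapToPolynomialSpace(new double[]{2.0, 3.0}, 3);
        System.out.println(java.util.Arrays.toString(mapped));
        // [2.0, 4.0, 8.0, 3.0, 9.0, 27.0]
    }
}
```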





[GitHub] flink pull request: [FLINK-1695] Kick off of Flink's machine learn...

2015-03-12 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/479#issuecomment-78709707
  
I filed the corresponding JIRA issues. I'll update the commits with the 
respective JIRA tags once the PR is merged.




[jira] [Commented] (FLINK-1695) Create machine learning library

2015-03-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359673#comment-14359673
 ] 

ASF GitHub Bot commented on FLINK-1695:
---

Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/479#issuecomment-78709707
  
I filed the corresponding JIRA issues. I'll update the commits with the 
respective JIRA tags once the PR is merged.


 Create machine learning library
 ---

 Key: FLINK-1695
 URL: https://issues.apache.org/jira/browse/FLINK-1695
 Project: Flink
  Issue Type: Improvement
Reporter: Till Rohrmann
Assignee: Till Rohrmann

 Create the infrastructure for Flink's machine learning library. This includes 
 the creation of the module structure and the implementation of basic types 
 such as vectors and matrices.





[jira] [Commented] (FLINK-1677) Properly Suppress Sysout Printing for the Degrees with exception test suite

2015-03-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358483#comment-14358483
 ] 

ASF GitHub Bot commented on FLINK-1677:
---

Github user uce commented on the pull request:

https://github.com/apache/flink/pull/475#issuecomment-78459308
  
Thanks for the PR. I think it's a good fix, **but**:

I noticed that 
`DegreesWithExceptionITCase.testGetDegreesInvalidEdgeSrcId` is sometimes failing 
(for example here: 
https://s3.amazonaws.com/archive.travis-ci.org/jobs/53985893/log.txt). I 
initially thought that it was a problem of my runtime changes in #471, but the 
problem is the following:

You expect a certain exception msg ("The edge src/trg id could not be found 
within the vertexIds") for the test to pass. But in the getDegrees() test there 
is a reducer following the coGroup (this one throws the expected exception), 
which can also fail with another exception, because of the cancelling before in 
the coGroup. If this other exception gets reported to the client, the test 
fails.

I think it would be enough to test that the job fails, or even think about 
whether it makes sense to cover this in a unit test of the operator, because 
imo it all boils down to the behaviour of the degree-computing coGroup, which 
throws the expected `NoSuchElementException`.

---

Other than that:
- Initially I wondered whether you should extend MultipleProgramsTestBase 
as in other test cases, but I think that doesn't work with the exception 
checking. In the future, we should add some test utils for testing failing 
high-level programs.
- Typo in the commit msg: suppres**s**ed

If you think it's OK to just test that the program is failing, go ahead and 
change it. After that it would be good to merge.



 Properly Suppress Sysout Printing for the Degrees with exception test suite
 ---

 Key: FLINK-1677
 URL: https://issues.apache.org/jira/browse/FLINK-1677
 Project: Flink
  Issue Type: Bug
  Components: Gelly
Affects Versions: 0.9
Reporter: Andra Lungu
Assignee: Andra Lungu
Priority: Minor

 Suppress systout printing similar to:
 flink-clients/src/test/java/org/apache/flink/client/program/ExecutionPlanAfterExecutionTest.java
 Speedup test suite by reusing a mini-cluster similar to:
 flink-tests/src/test/java/org/apache/flink/test/recovery/SimpleRecoveryITCase.java
  





[jira] [Commented] (FLINK-1576) Change the Gelly examples to be consistent with the other Flink examples

2015-03-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358479#comment-14358479
 ] 

ASF GitHub Bot commented on FLINK-1576:
---

Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/470#issuecomment-78459161
  
Hi! If there are no comments here, I'd like to merge this one :)


 Change the Gelly examples to be consistent with the other Flink examples
 

 Key: FLINK-1576
 URL: https://issues.apache.org/jira/browse/FLINK-1576
 Project: Flink
  Issue Type: Improvement
  Components: Gelly
Affects Versions: 0.8.0
Reporter: Andra Lungu
Assignee: Vasia Kalavri
  Labels: easyfix

 The current Gelly examples just work on default input data. 
 If we look at the other Flink examples, e.g. Connected Components, they also 
 allow input data to be read from a text file passed as a parameter to the 
 main method. 
  
 It would be nice to follow the same approach in our examples. A first step in 
 that direction is the SSSP example. 





[GitHub] flink pull request: [FLINK-1677][gelly] Suppresed Sysout Printing

2015-03-12 Thread uce
Github user uce commented on the pull request:

https://github.com/apache/flink/pull/475#issuecomment-78459308
  
Thanks for the PR. I think it's a good fix, **but**:

I noticed that 
`DegreesWithExceptionITCase.testGetDegreesInvalidEdgeSrcId` is sometimes failing 
(for example here: 
https://s3.amazonaws.com/archive.travis-ci.org/jobs/53985893/log.txt). I 
initially thought that it was a problem of my runtime changes in #471, but the 
problem is the following:

You expect a certain exception msg ("The edge src/trg id could not be found 
within the vertexIds") for the test to pass. But in the getDegrees() test there 
is a reducer following the coGroup (this one throws the expected exception), 
which can also fail with another exception, because of the cancelling before in 
the coGroup. If this other exception gets reported to the client, the test 
fails.

I think it would be enough to test that the job fails, or even think about 
whether it makes sense to cover this in a unit test of the operator, because 
imo it all boils down to the behaviour of the degree-computing coGroup, which 
throws the expected `NoSuchElementException`.

---

Other than that:
- Initially I wondered whether you should extend MultipleProgramsTestBase 
as in other test cases, but I think that doesn't work with the exception 
checking. In the future, we should add some test utils for testing failing 
high-level programs.
- Typo in the commit msg: suppres**s**ed

If you think it's OK to just test that the program is failing, go ahead and 
change it. After that it would be good to merge.





[jira] [Commented] (FLINK-1677) Properly Suppress Sysout Printing for the Degrees with exception test suite

2015-03-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358484#comment-14358484
 ] 

ASF GitHub Bot commented on FLINK-1677:
---

Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/475#issuecomment-78459525
  
+1 for what @uce said :)


 Properly Suppress Sysout Printing for the Degrees with exception test suite
 ---

 Key: FLINK-1677
 URL: https://issues.apache.org/jira/browse/FLINK-1677
 Project: Flink
  Issue Type: Bug
  Components: Gelly
Affects Versions: 0.9
Reporter: Andra Lungu
Assignee: Andra Lungu
Priority: Minor

 Suppress systout printing similar to:
 flink-clients/src/test/java/org/apache/flink/client/program/ExecutionPlanAfterExecutionTest.java
 Speedup test suite by reusing a mini-cluster similar to:
 flink-tests/src/test/java/org/apache/flink/test/recovery/SimpleRecoveryITCase.java
  





[jira] [Commented] (FLINK-1615) Introduces a new InputFormat for Tweets

2015-03-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358566#comment-14358566
 ] 

ASF GitHub Bot commented on FLINK-1615:
---

Github user Elbehery commented on a diff in the pull request:

https://github.com/apache/flink/pull/442#discussion_r26295918
  
--- Diff: 
flink-contrib/src/test/java/org/apache/flink/contrib/SimpleTweetInputFormatTest.java
 ---
@@ -0,0 +1,76 @@
+package org.apache.flink.contrib;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.contrib.tweetinputformat.io.SimpleTweetInputFormat;
+import org.apache.flink.contrib.tweetinputformat.model.tweet.Tweet;
+import 
org.apache.flink.contrib.tweetinputformat.model.tweet.entities.HashTags;
+import org.apache.flink.core.fs.FileInputSplit;
+import org.apache.flink.core.fs.Path;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+import java.io.File;
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+
+
+public class SimpleTweetInputFormatTest {
+
+private Tweet tweet;
+
+private SimpleTweetInputFormat simpleTweetInputFormat;
+
+private FileInputSplit fileInputSplit;
+
+protected Configuration config;
+
+protected File tempFile;
+
+
+@Before
+public void testSetUp() {
+
+
+simpleTweetInputFormat = new SimpleTweetInputFormat();
+
+File jsonFile = new 
File("../flink-contrib/src/main/resources/HashTagTweetSample.json");
+
+fileInputSplit = new FileInputSplit(0, new 
Path(jsonFile.getPath()), 0, jsonFile.length(), new String[] {"localhost"});
+}
+
+@Test
+public void testTweetInput() throws Exception {
+
+
+simpleTweetInputFormat.open(fileInputSplit);
+List<String> result;
+
+while (!simpleTweetInputFormat.reachedEnd()) {
+tweet = new Tweet();
+tweet = simpleTweetInputFormat.nextRecord(tweet);
+
+if(tweet != null){
--- End diff --

I tried to do so, but the problem is that reachedEnd() is updated 
inside nextRecord() in DelimitedInputFormat. I tried to add a condition to not 
call nextRecord() when readPos == limit, but I could not because these 
fields are private. 

So I have to stop reading when reachedEnd() is true, which in turn means 
that the object returned from nextRecord() [the tweet] is null. If I add the 
required assert, all the tests will fail. I do not know how to avoid this 
problem; do you have a suggestion?
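One possible workaround is to treat a null return as the end-of-input signal and assert only on non-null records. The FakeFormat below is a stand-in for illustration, not the real DelimitedInputFormat API:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class NullTerminatedReadSketch {

    // Stand-in for an input format whose nextRecord() returns null
    // once the end of the split has been reached.
    public static class FakeFormat {
        private final Iterator<String> records;
        public FakeFormat(List<String> records) { this.records = records.iterator(); }
        public String nextRecord() { return records.hasNext() ? records.next() : null; }
    }

    // Collect records until null signals end-of-input; assertions only
    // run against real records, so they never fire on the terminator.
    public static List<String> readAll(FakeFormat format) {
        List<String> result = new ArrayList<>();
        String record;
        while ((record = format.nextRecord()) != null) {
            result.add(record); // safe: record is known to be non-null here
        }
        return result;
    }

    public static void main(String[] args) {
        FakeFormat format = new FakeFormat(List.of("tweet1", "tweet2"));
        System.out.println(readAll(format)); // [tweet1, tweet2]
    }
}
```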


 Introduces a new InputFormat for Tweets
 ---

 Key: FLINK-1615
 URL: https://issues.apache.org/jira/browse/FLINK-1615
 Project: Flink
  Issue Type: New Feature
  Components: flink-contrib
Affects Versions: 0.8.1
Reporter: mustafa elbehery
Priority: Minor

 An event-driven parser for Tweets into Java POJOs. 
 It parses all the important parts of the tweet into Java objects. 
 Tested on a cluster and the performance is pretty good. 





[jira] [Commented] (FLINK-1615) Introduces a new InputFormat for Tweets

2015-03-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358570#comment-14358570
 ] 

ASF GitHub Bot commented on FLINK-1615:
---

Github user Elbehery commented on a diff in the pull request:

https://github.com/apache/flink/pull/442#discussion_r26296323
  
--- Diff: 
flink-contrib/src/main/java/org/apache/flink/contrib/tweetinputformat/io/SimpleTweetInputFormat.java
 ---
@@ -0,0 +1,67 @@
+package org.apache.flink.contrib.tweetinputformat.io;
+
+import org.apache.flink.api.common.io.DelimitedInputFormat;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.java.typeutils.GenericTypeInfo;
+import org.apache.flink.api.java.typeutils.ResultTypeQueryable;
+import org.apache.flink.contrib.tweetinputformat.model.tweet.Tweet;
+import org.codehaus.jackson.JsonParseException;
+import org.json.simple.parser.JSONParser;
+import org.json.simple.parser.ParseException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import java.io.ByteArrayInputStream;
+import java.io.IOException;
+import java.io.InputStreamReader;
+
+
+public class SimpleTweetInputFormat extends DelimitedInputFormat<Tweet> 
implements ResultTypeQueryable<Tweet> {
+
+private static final Logger LOG = 
LoggerFactory.getLogger(SimpleTweetInputFormat.class);
+
+@Override
+public Tweet nextRecord(Tweet record) throws IOException {
+Boolean result = false;
+
+do {
+try {
+record.reset(0);
+record = super.nextRecord(record);
+result = true;
+
+} catch (JsonParseException e) {
+result = false;
+
+}
+} while (!result);
+
+return record;
+}
+
+@Override
+public Tweet readRecord(Tweet reuse, byte[] bytes, int offset, int 
numBytes) throws IOException {
+
+
+InputStreamReader jsonReader = new InputStreamReader(new 
ByteArrayInputStream(bytes));
+jsonReader.skip(offset);
+
+JSONParser parser = new JSONParser();
--- End diff --

It worked now, Thanks


 Introduces a new InputFormat for Tweets
 ---

 Key: FLINK-1615
 URL: https://issues.apache.org/jira/browse/FLINK-1615
 Project: Flink
  Issue Type: New Feature
  Components: flink-contrib
Affects Versions: 0.8.1
Reporter: mustafa elbehery
Priority: Minor

 An event-driven parser for Tweets into Java POJOs. 
 It parses all the important parts of the tweet into Java objects. 
 Tested on a cluster and the performance is pretty good. 





[GitHub] flink pull request: [FLINK-1615] [java api] SimpleTweetInputFormat

2015-03-12 Thread Elbehery
Github user Elbehery commented on a diff in the pull request:

https://github.com/apache/flink/pull/442#discussion_r26296323
  
--- Diff: 
flink-contrib/src/main/java/org/apache/flink/contrib/tweetinputformat/io/SimpleTweetInputFormat.java
 ---
@@ -0,0 +1,67 @@
+package org.apache.flink.contrib.tweetinputformat.io;
+
+import org.apache.flink.api.common.io.DelimitedInputFormat;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.java.typeutils.GenericTypeInfo;
+import org.apache.flink.api.java.typeutils.ResultTypeQueryable;
+import org.apache.flink.contrib.tweetinputformat.model.tweet.Tweet;
+import org.codehaus.jackson.JsonParseException;
+import org.json.simple.parser.JSONParser;
+import org.json.simple.parser.ParseException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import java.io.ByteArrayInputStream;
+import java.io.IOException;
+import java.io.InputStreamReader;
+
+
+public class SimpleTweetInputFormat extends DelimitedInputFormat<Tweet> implements ResultTypeQueryable<Tweet> {
+
+private static final Logger LOG = 
LoggerFactory.getLogger(SimpleTweetInputFormat.class);
+
+@Override
+public Tweet nextRecord(Tweet record) throws IOException {
+Boolean result = false;
+
+do {
+try {
+record.reset(0);
+record = super.nextRecord(record);
+result = true;
+
+} catch (JsonParseException e) {
+result = false;
+
+}
+} while (!result);
+
+return record;
+}
+
+@Override
+public Tweet readRecord(Tweet reuse, byte[] bytes, int offset, int 
numBytes) throws IOException {
+
+
+InputStreamReader jsonReader = new InputStreamReader(new 
ByteArrayInputStream(bytes));
+jsonReader.skip(offset);
+
+JSONParser parser = new JSONParser();
--- End diff --

It worked now, Thanks


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
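The `nextRecord()` override in the diff above loops until a record parses cleanly, swallowing `JsonParseException` for malformed lines. The same skip-malformed-records pattern can be sketched in a self-contained way; `parseOrNull` and the integer "records" below are hypothetical stand-ins for the JSON parsing that `SimpleTweetInputFormat` performs, not Flink API:

```java
import java.util.Arrays;
import java.util.Iterator;

// Minimal sketch of the retry loop in nextRecord(): keep pulling records and
// ignore parse failures until one record parses or the input is exhausted.
public class SkipMalformedRecords {

    // Stands in for the JSON parse step; returns null on a malformed record.
    static Integer parseOrNull(String raw) {
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Mirrors the do/while in the diff: retry until a record parses cleanly.
    static Integer nextValidRecord(Iterator<String> source) {
        while (source.hasNext()) {
            Integer parsed = parseOrNull(source.next());
            if (parsed != null) {
                return parsed;
            }
        }
        return null; // end of input, like nextRecord() returning null
    }

    public static void main(String[] args) {
        Iterator<String> lines = Arrays.asList("not-a-number", "42", "7").iterator();
        System.out.println(nextValidRecord(lines)); // prints 42
    }
}
```

Returning `null` only at end-of-input keeps the input format's contract intact while bad records are silently dropped.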


[GitHub] flink pull request: [FLINK-1615] [java api] SimpleTweetInputFormat

2015-03-12 Thread Elbehery
Github user Elbehery commented on a diff in the pull request:

https://github.com/apache/flink/pull/442#discussion_r26296679
  
--- Diff: 
flink-contrib/src/test/java/org/apache/flink/contrib/SimpleTweetInputFormatTest.java
 ---
@@ -0,0 +1,76 @@
+package org.apache.flink.contrib;
--- End diff --

DONE




[jira] [Commented] (FLINK-1615) Introduces a new InputFormat for Tweets

2015-03-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358576#comment-14358576
 ] 

ASF GitHub Bot commented on FLINK-1615:
---

Github user Elbehery commented on a diff in the pull request:

https://github.com/apache/flink/pull/442#discussion_r26296679
  
--- Diff: 
flink-contrib/src/test/java/org/apache/flink/contrib/SimpleTweetInputFormatTest.java
 ---
@@ -0,0 +1,76 @@
+package org.apache.flink.contrib;
--- End diff --

DONE


 Introduces a new InputFormat for Tweets
 ---

 Key: FLINK-1615
 URL: https://issues.apache.org/jira/browse/FLINK-1615
 Project: Flink
  Issue Type: New Feature
  Components: flink-contrib
Affects Versions: 0.8.1
Reporter: mustafa elbehery
Priority: Minor

 An event-driven parser for Tweets into Java POJOs. 
 It parses all the important parts of the tweet into Java objects. 
 Tested on a cluster and the performance is pretty good. 





[jira] [Commented] (FLINK-1576) Change the Gelly examples to be consistent with the other Flink examples

2015-03-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358515#comment-14358515
 ] 

ASF GitHub Bot commented on FLINK-1576:
---

Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/470#discussion_r26293641
  
--- Diff: 
flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/GraphMetrics.java
 ---
@@ -120,4 +120,58 @@ public Double map(Tuple2<Long, Long> sumTuple) {
private static final class ProjectVertexId implements MapFunction<Tuple2<Long, Long>, Long> {
public Long map(Tuple2<Long, Long> value) { return value.f0; }
}
+
+   @Override
+   public String getDescription() {
+   return "Graph Metrics Example";
+   }
+
+   // 
**
+   // UTIL METHODS
+   // 
**
+
+   private static boolean fileOutput = false;
+
+   private static String edgesInputPath = null;
+
+   static final int NUM_VERTICES = 100;
+
+   static final long SEED = 9876;
+
+   private static boolean parseParameters(String[] args) {
+
+   if(args.length  0) {
+   if(args.length != 1) {
+   System.err.println("Usage: LabelPropagation <vertex path> <edge path> <output path> <num iterations>");
--- End diff --

Thanks @uce ^^


 Change the Gelly examples to be consistent with the other Flink examples
 

 Key: FLINK-1576
 URL: https://issues.apache.org/jira/browse/FLINK-1576
 Project: Flink
  Issue Type: Improvement
  Components: Gelly
Affects Versions: 0.8.0
Reporter: Andra Lungu
Assignee: Vasia Kalavri
  Labels: easyfix

 The current Gelly examples just work on default input data. 
 If we look at the other Flink examples, e.g. Connected Components, they also 
 allow input data to be read from a text file passed as a parameter to the 
 main method. 
  
 It would be nice to follow the same approach in our examples. A first step in 
 that direction is the SSSP example. 





[jira] [Created] (FLINK-1693) Deprecate the Spargel API

2015-03-12 Thread Vasia Kalavri (JIRA)
Vasia Kalavri created FLINK-1693:


 Summary: Deprecate the Spargel API
 Key: FLINK-1693
 URL: https://issues.apache.org/jira/browse/FLINK-1693
 Project: Flink
  Issue Type: Task
  Components: Spargel
Affects Versions: 0.9
Reporter: Vasia Kalavri


For the upcoming 0.9 release, we should mark all user-facing methods from the 
Spargel API as deprecated, with a warning that we are going to remove it at 
some point.

We should also add a comment in the docs and point people to Gelly.





[jira] [Updated] (FLINK-1694) Change the split between create/run of a vertex-centric iteration

2015-03-12 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri updated FLINK-1694:
-
Issue Type: Improvement  (was: Task)

 Change the split between create/run of a vertex-centric iteration
 -

 Key: FLINK-1694
 URL: https://issues.apache.org/jira/browse/FLINK-1694
 Project: Flink
  Issue Type: Improvement
  Components: Gelly
Reporter: Vasia Kalavri

 Currently, the vertex-centric API in Gelly looks like this:
 {code:java}
 Graph inputGraph = ... // create graph
 VertexCentricIteration iteration = inputGraph.createVertexCentricIteration();
 ... // configure the iteration
 Graph newGraph = inputGraph.runVertexCentricIteration(iteration);
 {code}
 We have this create/run split, in order to expose the iteration object and be 
 able to call the public methods of VertexCentricIteration.
 However, this is not very nice and might lead to errors if create and run are 
 mistakenly called on different graph objects.
 One suggestion is to change this to the following:
 {code:java}
 VertexCentricIteration iteration = inputGraph.createVertexCentricIteration();
 ... // configure the iteration
 Graph newGraph = iteration.result();
 {code}
 or to go with a single run call, where we add an IterationConfiguration 
 object as a parameter and we don't expose the iteration object to the user at 
 all:
 {code:java}
 IterationConfiguration parameters  = ...
 Graph newGraph = inputGraph.runVertexCentricIteration(parameters);
 {code}
 and we can also have a simplified method where no configuration is passed.
 What do you think?
 Personally, I like the second option a bit more.
 -Vasia.





[jira] [Created] (FLINK-1694) Change the split between create/run of a vertex-centric iteration

2015-03-12 Thread Vasia Kalavri (JIRA)
Vasia Kalavri created FLINK-1694:


 Summary: Change the split between create/run of a vertex-centric 
iteration
 Key: FLINK-1694
 URL: https://issues.apache.org/jira/browse/FLINK-1694
 Project: Flink
  Issue Type: Task
  Components: Gelly
Reporter: Vasia Kalavri


Currently, the vertex-centric API in Gelly looks like this:

{code:java}
Graph inputGraph = ... // create graph
VertexCentricIteration iteration = inputGraph.createVertexCentricIteration();
... // configure the iteration
Graph newGraph = inputGraph.runVertexCentricIteration(iteration);
{code}

We have this create/run split, in order to expose the iteration object and be 
able to call the public methods of VertexCentricIteration.

However, this is not very nice and might lead to errors if create and run are
mistakenly called on different graph objects.

One suggestion is to change this to the following:

{code:java}
VertexCentricIteration iteration = inputGraph.createVertexCentricIteration();
... // configure the iteration
Graph newGraph = iteration.result();
{code}

or to go with a single run call, where we add an IterationConfiguration object 
as a parameter and we don't expose the iteration object to the user at all:

{code:java}
IterationConfiguration parameters  = ...
Graph newGraph = inputGraph.runVertexCentricIteration(parameters);
{code}

and we can also have a simplified method where no configuration is passed.

What do you think?
Personally, I like the second option a bit more.

-Vasia.



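The second option discussed in FLINK-1694 above (a single run call that takes a configuration object, so the iteration object is never exposed) can be sketched as follows. All class names here are hypothetical illustrations, not actual Gelly API:

```java
// Sketch of a single-entry-point API: configuration travels as a value object,
// so create and run can never be called on different graph objects by mistake.
public class IterationApiSketch {

    static class IterationConfiguration {
        final int maxIterations;
        final String name;

        IterationConfiguration(int maxIterations, String name) {
            this.maxIterations = maxIterations;
            this.name = name;
        }
    }

    static class Graph {
        int lastRunIterations = 0; // records how the last run was configured

        // Single run call with explicit configuration; no iteration object leaks out.
        Graph runVertexCentricIteration(IterationConfiguration cfg) {
            lastRunIterations = cfg.maxIterations;
            return this; // the real API would return a new result graph
        }

        // Simplified variant with a default configuration, as the issue suggests.
        Graph runVertexCentricIteration() {
            return runVertexCentricIteration(new IterationConfiguration(10, "default"));
        }
    }

    public static void main(String[] args) {
        Graph g = new Graph();
        g.runVertexCentricIteration(new IterationConfiguration(30, "SSSP"));
        System.out.println(g.lastRunIterations); // prints 30
    }
}
```

Because the configuration is just a value object, it cannot be bound to the wrong graph, which is exactly the failure mode the create/run split allows.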


[GitHub] flink pull request: [FLINK-1615] [java api] SimpleTweetInputFormat

2015-03-12 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/442#discussion_r26288947
  
--- Diff: 
flink-contrib/src/test/java/org/apache/flink/contrib/SimpleTweetInputFormatTest.java
 ---
@@ -0,0 +1,76 @@
+package org.apache.flink.contrib;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.contrib.tweetinputformat.io.SimpleTweetInputFormat;
+import org.apache.flink.contrib.tweetinputformat.model.tweet.Tweet;
+import 
org.apache.flink.contrib.tweetinputformat.model.tweet.entities.HashTags;
+import org.apache.flink.core.fs.FileInputSplit;
+import org.apache.flink.core.fs.Path;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+import java.io.File;
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+
+
+public class SimpleTweetInputFormatTest {
+
+private Tweet tweet;
+
+private SimpleTweetInputFormat simpleTweetInputFormat;
+
+private FileInputSplit fileInputSplit;
+
+protected Configuration config;
+
+protected File tempFile;
+
+
+@Before
+public void testSetUp() {
+
+
+simpleTweetInputFormat = new SimpleTweetInputFormat();
+
+File jsonFile = new File("../flink-contrib/src/main/resources/HashTagTweetSample.json");
+
+fileInputSplit = new FileInputSplit(0, new Path(jsonFile.getPath()), 0, jsonFile.length(), new String[] {"localhost"});
+}
+
+@Test
+public void testTweetInput() throws Exception {
+
+
+simpleTweetInputFormat.open(fileInputSplit);
+List<String> result;
+
+while (!simpleTweetInputFormat.reachedEnd()) {
+tweet = new Tweet();
+tweet = simpleTweetInputFormat.nextRecord(tweet);
+
+if(tweet != null){
--- End diff --

Can you make the test fail if `tweet` is null?


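The change rmetzger asks for above (fail the test when a record comes back null instead of silently guarding with `if (tweet != null)`) boils down to an assert-not-null at the read site. A minimal, dependency-free sketch, where `assertNotNull` stands in for `org.junit.Assert.assertNotNull` and `nextRecord` for the input format call:

```java
// Sketch: fail fast on a null record instead of skipping it silently.
public class FailOnNullRecord {

    // Stand-in for org.junit.Assert.assertNotNull.
    static void assertNotNull(String message, Object value) {
        if (value == null) {
            throw new AssertionError(message);
        }
    }

    // Stand-in for simpleTweetInputFormat.nextRecord(tweet).
    static String nextRecord(String input) {
        return input;
    }

    public static void main(String[] args) {
        String tweet = nextRecord("some tweet");
        // Fail the test immediately rather than masking a broken input format:
        assertNotNull("nextRecord() returned null before reachedEnd()", tweet);
        System.out.println("ok"); // prints ok
    }
}
```

This way a regression in the input format surfaces as a test failure instead of a silently shorter result list.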


[jira] [Commented] (FLINK-1615) Introduces a new InputFormat for Tweets

2015-03-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358404#comment-14358404
 ] 

ASF GitHub Bot commented on FLINK-1615:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/442#discussion_r26288947
  
--- Diff: 
flink-contrib/src/test/java/org/apache/flink/contrib/SimpleTweetInputFormatTest.java
 ---
@@ -0,0 +1,76 @@
+package org.apache.flink.contrib;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.contrib.tweetinputformat.io.SimpleTweetInputFormat;
+import org.apache.flink.contrib.tweetinputformat.model.tweet.Tweet;
+import 
org.apache.flink.contrib.tweetinputformat.model.tweet.entities.HashTags;
+import org.apache.flink.core.fs.FileInputSplit;
+import org.apache.flink.core.fs.Path;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+import java.io.File;
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+
+
+public class SimpleTweetInputFormatTest {
+
+private Tweet tweet;
+
+private SimpleTweetInputFormat simpleTweetInputFormat;
+
+private FileInputSplit fileInputSplit;
+
+protected Configuration config;
+
+protected File tempFile;
+
+
+@Before
+public void testSetUp() {
+
+
+simpleTweetInputFormat = new SimpleTweetInputFormat();
+
+File jsonFile = new File("../flink-contrib/src/main/resources/HashTagTweetSample.json");
+
+fileInputSplit = new FileInputSplit(0, new Path(jsonFile.getPath()), 0, jsonFile.length(), new String[] {"localhost"});
+}
+
+@Test
+public void testTweetInput() throws Exception {
+
+
+simpleTweetInputFormat.open(fileInputSplit);
+List<String> result;
+
+while (!simpleTweetInputFormat.reachedEnd()) {
+tweet = new Tweet();
+tweet = simpleTweetInputFormat.nextRecord(tweet);
+
+if(tweet != null){
--- End diff --

Can you make the test fail if `tweet` is null?


 Introduces a new InputFormat for Tweets
 ---

 Key: FLINK-1615
 URL: https://issues.apache.org/jira/browse/FLINK-1615
 Project: Flink
  Issue Type: New Feature
  Components: flink-contrib
Affects Versions: 0.8.1
Reporter: mustafa elbehery
Priority: Minor

 An event-driven parser for Tweets into Java POJOs. 
 It parses all the important parts of the tweet into Java objects. 
 Tested on a cluster and the performance is pretty good. 





[GitHub] flink pull request: [FLINK-1677][gelly] Suppresed Sysout Printing

2015-03-12 Thread vasia
Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/475#issuecomment-78459525
  
+1 for what @uce said :)




[GitHub] flink pull request: [FLINK-1633][gelly] Added getTriplets() method...

2015-03-12 Thread vasia
Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/452#discussion_r26293223
  
--- Diff: 
flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/utils/Triplet.java
 ---
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.utils;
+
+import org.apache.flink.api.java.tuple.Tuple5;
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.Vertex;
+
+import java.io.Serializable;
+
+/**
+ * A wrapper around Tuple5 used to avoid duplicate vertex ids and to 
improve readability in
+ * the {@link org.apache.flink.graph.Graph#getTriplets()} method.
+ *
+ * @param <K> the vertex key type
+ * @param <VV> the vertex value type
+ * @param <EV> the edge value type
+ */
+public class Triplet<K extends Comparable<K> & Serializable, VV extends Serializable, EV extends Serializable>
+   extends Tuple5<K, K, VV, VV, EV> {
+
+   public Triplet() {}
+
+   public Triplet(K srcId, K trgId, VV srcVal, VV trgVal, EV edgeVal) {
+   super(srcId, trgId, srcVal, trgVal, edgeVal);
--- End diff --

Wouldn't a (srcVertex, trgVertex, edge) constructor be more 
useful/intuitive for a user?
We can keep this one too, but better add a javadoc :-)


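The convenience constructor vasia suggests above, building a triplet from whole vertex and edge objects instead of five loose fields, can be sketched as follows. `Vertex`, `Edge`, and `Triplet` here are minimal stand-ins for the Gelly classes, not the real API:

```java
// Sketch of a (srcVertex, trgVertex, edge) constructor delegating to the
// field-by-field constructor from the diff above.
public class TripletSketch {

    static class Vertex<K, VV> {
        final K id;
        final VV value;
        Vertex(K id, VV value) { this.id = id; this.value = value; }
    }

    static class Edge<K, EV> {
        final K source;
        final K target;
        final EV value;
        Edge(K source, K target, EV value) {
            this.source = source; this.target = target; this.value = value;
        }
    }

    static class Triplet<K, VV, EV> {
        final K srcId, trgId;
        final VV srcValue, trgValue;
        final EV edgeValue;

        // Field-by-field constructor, as in the diff.
        Triplet(K srcId, K trgId, VV srcValue, VV trgValue, EV edgeValue) {
            this.srcId = srcId; this.trgId = trgId;
            this.srcValue = srcValue; this.trgValue = trgValue;
            this.edgeValue = edgeValue;
        }

        // Suggested convenience constructor: more intuitive call sites.
        Triplet(Vertex<K, VV> src, Vertex<K, VV> trg, Edge<K, EV> edge) {
            this(src.id, trg.id, src.value, trg.value, edge.value);
        }
    }

    public static void main(String[] args) {
        Vertex<Long, String> a = new Vertex<>(1L, "a");
        Vertex<Long, String> b = new Vertex<>(2L, "b");
        Edge<Long, Double> e = new Edge<>(1L, 2L, 0.5);
        Triplet<Long, String, Double> t = new Triplet<>(a, b, e);
        System.out.println(t.srcId + " " + t.trgId + " " + t.edgeValue); // prints 1 2 0.5
    }
}
```

Keeping both constructors preserves the existing call sites while giving users the more readable object-based form.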


[jira] [Commented] (FLINK-1633) Add getTriplets() Gelly method

2015-03-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358497#comment-14358497
 ] 

ASF GitHub Bot commented on FLINK-1633:
---

Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/452#discussion_r26293166
  
--- Diff: 
flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/utils/Triplet.java
 ---
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.utils;
+
+import org.apache.flink.api.java.tuple.Tuple5;
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.Vertex;
+
+import java.io.Serializable;
+
+/**
+ * A wrapper around Tuple5 used to avoid duplicate vertex ids and to 
improve readability in
--- End diff --

I would prefer a description about what a Triplet is conceptually and how 
it can be used by user here, instead of explaining that it wraps a Tuple5 :)) 


 Add getTriplets() Gelly method
 --

 Key: FLINK-1633
 URL: https://issues.apache.org/jira/browse/FLINK-1633
 Project: Flink
  Issue Type: New Feature
  Components: Gelly
Affects Versions: 0.9
Reporter: Vasia Kalavri
Assignee: Andra Lungu
Priority: Minor
  Labels: starter

 In some graph algorithms, it is required to access the graph edges together 
 with the vertex values of the source and target vertices. For example, 
 several graph weighting schemes compute some kind of similarity weights for 
 edges, based on the attributes of the source and target vertices. This issue 
 proposes adding a convenience Gelly method that generates a DataSet of 
 srcVertex, Edge, TrgVertex triplets from the input graph.





[GitHub] flink pull request: [FLINK-1633][gelly] Added getTriplets() method...

2015-03-12 Thread vasia
Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/452#discussion_r26293166
  
--- Diff: 
flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/utils/Triplet.java
 ---
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.utils;
+
+import org.apache.flink.api.java.tuple.Tuple5;
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.Vertex;
+
+import java.io.Serializable;
+
+/**
+ * A wrapper around Tuple5 used to avoid duplicate vertex ids and to 
improve readability in
--- End diff --

I would prefer a description about what a Triplet is conceptually and how 
it can be used by user here, instead of explaining that it wraps a Tuple5 :)) 




[GitHub] flink pull request: Remove -j and -a parameters which seemed no lo...

2015-03-12 Thread hsaputra
GitHub user hsaputra opened a pull request:

https://github.com/apache/flink/pull/482

Remove -j and -a parameters which seemed no longer valid in the doc example 
for YARN

From:

./bin/flink run -j 
./examples/flink-java-examples-{{site.FLINK_VERSION_SHORT }}-WordCount.jar \
  -a 1 hdfs:////apache-license-v2.txt 
hdfs:///.../wordcount-result.txt

To:

./bin/flink run ./examples/flink-java-examples-{{site.FLINK_VERSION_SHORT 
}}-WordCount.jar \
   hdfs:////apache-license-v2.txt hdfs:///.../wordcount-result.txt

Looking at http://ci.apache.org/projects/flink/flink-docs-master/cli.html 
seemed like -j and -a are no longer valid?

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hsaputra/flink fix_doc_run_onyarn_params

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/482.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #482


commit 1ae0c5779b0c8b64feb9fd5f00b51a9b83cd4e0e
Author: Henry Saputra henry.sapu...@gmail.com
Date:   2015-03-12T23:18:03Z

Remove -j and -a parameters which seemed no longer valid in the doc example
for submit job to Flink run in YARN.

From:

./bin/flink run -j 
./examples/flink-java-examples-{{site.FLINK_VERSION_SHORT }}-WordCount.jar \
  -a 1 hdfs:////apache-license-v2.txt 
hdfs:///.../wordcount-result.txt

To:

./bin/flink run ./examples/flink-java-examples-{{site.FLINK_VERSION_SHORT 
}}-WordCount.jar \
   hdfs:////apache-license-v2.txt hdfs:///.../wordcount-result.txt






[GitHub] flink pull request: [FLINK-1594] [streaming] [wip] DataStream supp...

2015-03-12 Thread gyfora
Github user gyfora commented on the pull request:

https://github.com/apache/flink/pull/472#issuecomment-78431621
  
Ok, seems like there is an issue with chained tasks, so let's not merge it 
until that is fixed. Gabor is working on it.



