[jira] [Commented] (FLINK-3879) Native implementation of HITS algorithm

2016-05-07 Thread Greg Hogan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275169#comment-15275169
 ] 

Greg Hogan commented on FLINK-3879:
---

What are 2044's authority scores after two iterations?

> Native implementation of HITS algorithm
> ---
>
> Key: FLINK-3879
> URL: https://issues.apache.org/jira/browse/FLINK-3879
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Hyperlink-Induced Topic Search (HITS, also "hubs and authorities") is 
> presented in [0] and described in [1].
> "[HITS] is a very popular and effective algorithm to rank documents based on 
> the link information among a set of documents. The algorithm presumes that a 
> good hub is a document that points to many others, and a good authority is a 
> document that many documents point to." 
> [https://pdfs.semanticscholar.org/a8d7/c7a4c53a9102c4239356f9072ec62ca5e62f.pdf]
> This implementation differs from FLINK-2044 by providing for convergence, 
> outputting both hub and authority scores, and completing in half the number 
> of iterations.
> [0] http://www.cs.cornell.edu/home/kleinber/auth.pdf
> [1] https://en.wikipedia.org/wiki/HITS_algorithm



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3618) Rename abstract UDF classes in Scatter-Gather implementation

2016-05-07 Thread Martin Junghanns (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275179#comment-15275179
 ] 

Martin Junghanns commented on FLINK-3618:
-

[~greghogan] thanks for taking this. I'm currently a bit busy (building graph 
pattern matching on Flink).

> Rename abstract UDF classes in Scatter-Gather implementation
> 
>
> Key: FLINK-3618
> URL: https://issues.apache.org/jira/browse/FLINK-3618
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Affects Versions: 1.1.0, 1.0.1
>Reporter: Martin Junghanns
>Assignee: Greg Hogan
>Priority: Minor
>
> We now offer three Vertex-centric computing abstractions:
> * Pregel
> * Gather-Sum-Apply
> * Scatter-Gather
> Each of these abstractions provides abstract classes that need to be 
> implemented by the user:
> * Pregel: {{ComputeFunction}}
> * GSA: {{GatherFunction}}, {{SumFunction}}, {{ApplyFunction}}
> * Scatter-Gather: {{MessagingFunction}}, {{VertexUpdateFunction}}
> In Pregel and GSA, the names of those functions follow the name of the 
> abstraction or the name suggested in the corresponding papers. For 
> consistency of the API, I propose to rename {{MessageFunction}} to 
> {{ScatterFunction}} and {{VertexUpdateFunction}} to {{GatherFunction}}.
> Also for consistency, I would like to change the parameter order in 
> {{Graph.runScatterGatherIteration(VertexUpdateFunction f1, MessagingFunction 
> f2}} to  {{Graph.runScatterGatherIteration(ScatterFunction f1, GatherFunction 
> f2}} (like in {{Graph.runGatherSumApplyFunction(...)}})



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3879) Native implementation of HITS algorithm

2016-05-07 Thread GaoLun (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275240#comment-15275240
 ] 

GaoLun commented on FLINK-3879:
---

My: (1,0.847998304005088,0.0), (2,0.5299989400031799,0.5144957554275266), 
(3,0.0,0.8574929257125442)
Yours: (1,0.8479983040050879,0.0), (2,0.5299989400031799,0.5240974256643347), 
(3,0.0,0.8516583167045438)

> Native implementation of HITS algorithm
> ---
>
> Key: FLINK-3879
> URL: https://issues.apache.org/jira/browse/FLINK-3879
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Hyperlink-Induced Topic Search (HITS, also "hubs and authorities") is 
> presented in [0] and described in [1].
> "[HITS] is a very popular and effective algorithm to rank documents based on 
> the link information among a set of documents. The algorithm presumes that a 
> good hub is a document that points to many others, and a good authority is a 
> document that many documents point to." 
> [https://pdfs.semanticscholar.org/a8d7/c7a4c53a9102c4239356f9072ec62ca5e62f.pdf]
> This implementation differs from FLINK-2044 by providing for convergence, 
> outputting both hub and authority scores, and completing in half the number 
> of iterations.
> [0] http://www.cs.cornell.edu/home/kleinber/auth.pdf
> [1] https://en.wikipedia.org/wiki/HITS_algorithm



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3211) Add AWS Kinesis streaming connector

2016-05-07 Thread Tzu-Li (Gordon) Tai (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275136#comment-15275136
 ] 

Tzu-Li (Gordon) Tai commented on FLINK-3211:


[~rmetzger]
Perhaps there is one more possible solution: use only the AWS Java SDK for the 
connector to avoid the protobuf version clash issue?
The consumer only uses 1 class (UserRecords) from the KCL which we can 
replicate to the code with notes of its source from AWS, and it shouldn't be 
too hard to replace the KPL code in the producer either.

Might not be the cleanest approach though, what do you think? If all not works, 
I'm fine with publishing it as a dataArtisans GitHub project until cleaner 
integration into the Flink codebase is possible :)

> Add AWS Kinesis streaming connector
> ---
>
> Key: FLINK-3211
> URL: https://issues.apache.org/jira/browse/FLINK-3211
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming Connectors
>Affects Versions: 1.1.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> AWS Kinesis is a widely adopted message queue used by AWS users, much like a 
> cloud service version of Apache Kafka. Support for AWS Kinesis will be a 
> great addition to the handful of Flink's streaming connectors to external 
> systems and a great reach out to the AWS community.
> AWS supports two different ways to consume Kinesis data: with the low-level 
> AWS SDK [1], or with the high-level KCL (Kinesis Client Library) [2]. AWS SDK 
> can be used to consume Kinesis data, including stream read beginning from a 
> specific offset (or "record sequence number" in Kinesis terminology). On the 
> other hand, AWS officially recommends using KCL, which offers a higher-level 
> of abstraction that also comes with checkpointing and failure recovery by 
> using a KCL-managed AWS DynamoDB "leash table" as the checkpoint state 
> storage.
> However, KCL is essentially a stream processing library that wraps all the 
> partition-to-task (or "shard" in Kinesis terminology) determination and 
> checkpointing to allow the user to focus only on streaming application logic. 
> This leads to the understanding that we can not use the KCL to implement the 
> Kinesis streaming connector if we are aiming for a deep integration of Flink 
> with Kinesis that provides exactly once guarantees (KCL promises only 
> at-least-once). Therefore, AWS SDK will be the way to go for the 
> implementation of this feature.
> With the ability to read from specific offsets, and also the fact that 
> Kinesis and Kafka share a lot of similarities, the basic principles of the 
> implementation of Flink's Kinesis streaming connector will very much resemble 
> the Kafka connector. We can basically follow the outlines described in 
> [~StephanEwen]'s description [3] and [~rmetzger]'s Kafka connector 
> implementation [4]. A few tweaks due to some of Kinesis v.s. Kafka 
> differences is described as following:
> 1. While the Kafka connector can support reading from multiple topics, I 
> currently don't think this is a good idea for Kinesis streams (a Kinesis 
> Stream is logically equivalent to a Kafka topic). Kinesis streams can exist 
> in different AWS regions, and each Kinesis stream under the same AWS user 
> account may have completely independent access settings with different 
> authorization keys. Overall, a Kinesis stream feels like a much more 
> consolidated resource compared to Kafka topics. It would be great to hear 
> more thoughts on this part.
> 2. While Kafka has brokers that can hold multiple partitions, the only 
> partitioning abstraction for AWS Kinesis is "shards". Therefore, in contrast 
> to the Kafka connector having per broker connections where the connections 
> can handle multiple Kafka partitions, the Kinesis connector will only need to 
> have simple per shard connections.
> 3. Kinesis itself does not support committing offsets back to Kinesis. If we 
> were to implement this feature like the Kafka connector with Kafka / ZK to 
> sync outside view of progress, we probably could use ZK or DynamoDB like the 
> way KCL works. More thoughts on this part will be very helpful too.
> As for the Kinesis Sink, it should be possible to use the AWS KPL (Kinesis 
> Producer Library) [5]. However, for higher code consistency with the proposed 
> Kinesis Consumer, I think it will be better to stick with the AWS SDK for the 
> implementation. The implementation should be straight forward, being almost 
> if not completely the same as the Kafka sink.
> References:
> [1] 
> http://docs.aws.amazon.com/kinesis/latest/dev/developing-consumers-with-sdk.html
> [2] 
> http://docs.aws.amazon.com/kinesis/latest/dev/developing-consumers-with-kcl.html
> [3] 
> 

[jira] [Commented] (FLINK-3211) Add AWS Kinesis streaming connector

2016-05-07 Thread Tzu-Li (Gordon) Tai (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275172#comment-15275172
 ] 

Tzu-Li (Gordon) Tai commented on FLINK-3211:


If you want, I can start looking into it tomorrow. Have the next 3 days free to 
work on it :)

> Add AWS Kinesis streaming connector
> ---
>
> Key: FLINK-3211
> URL: https://issues.apache.org/jira/browse/FLINK-3211
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming Connectors
>Affects Versions: 1.1.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> AWS Kinesis is a widely adopted message queue used by AWS users, much like a 
> cloud service version of Apache Kafka. Support for AWS Kinesis will be a 
> great addition to the handful of Flink's streaming connectors to external 
> systems and a great reach out to the AWS community.
> AWS supports two different ways to consume Kinesis data: with the low-level 
> AWS SDK [1], or with the high-level KCL (Kinesis Client Library) [2]. AWS SDK 
> can be used to consume Kinesis data, including stream read beginning from a 
> specific offset (or "record sequence number" in Kinesis terminology). On the 
> other hand, AWS officially recommends using KCL, which offers a higher-level 
> of abstraction that also comes with checkpointing and failure recovery by 
> using a KCL-managed AWS DynamoDB "leash table" as the checkpoint state 
> storage.
> However, KCL is essentially a stream processing library that wraps all the 
> partition-to-task (or "shard" in Kinesis terminology) determination and 
> checkpointing to allow the user to focus only on streaming application logic. 
> This leads to the understanding that we can not use the KCL to implement the 
> Kinesis streaming connector if we are aiming for a deep integration of Flink 
> with Kinesis that provides exactly once guarantees (KCL promises only 
> at-least-once). Therefore, AWS SDK will be the way to go for the 
> implementation of this feature.
> With the ability to read from specific offsets, and also the fact that 
> Kinesis and Kafka share a lot of similarities, the basic principles of the 
> implementation of Flink's Kinesis streaming connector will very much resemble 
> the Kafka connector. We can basically follow the outlines described in 
> [~StephanEwen]'s description [3] and [~rmetzger]'s Kafka connector 
> implementation [4]. A few tweaks due to some of Kinesis v.s. Kafka 
> differences is described as following:
> 1. While the Kafka connector can support reading from multiple topics, I 
> currently don't think this is a good idea for Kinesis streams (a Kinesis 
> Stream is logically equivalent to a Kafka topic). Kinesis streams can exist 
> in different AWS regions, and each Kinesis stream under the same AWS user 
> account may have completely independent access settings with different 
> authorization keys. Overall, a Kinesis stream feels like a much more 
> consolidated resource compared to Kafka topics. It would be great to hear 
> more thoughts on this part.
> 2. While Kafka has brokers that can hold multiple partitions, the only 
> partitioning abstraction for AWS Kinesis is "shards". Therefore, in contrast 
> to the Kafka connector having per broker connections where the connections 
> can handle multiple Kafka partitions, the Kinesis connector will only need to 
> have simple per shard connections.
> 3. Kinesis itself does not support committing offsets back to Kinesis. If we 
> were to implement this feature like the Kafka connector with Kafka / ZK to 
> sync outside view of progress, we probably could use ZK or DynamoDB like the 
> way KCL works. More thoughts on this part will be very helpful too.
> As for the Kinesis Sink, it should be possible to use the AWS KPL (Kinesis 
> Producer Library) [5]. However, for higher code consistency with the proposed 
> Kinesis Consumer, I think it will be better to stick with the AWS SDK for the 
> implementation. The implementation should be straight forward, being almost 
> if not completely the same as the Kafka sink.
> References:
> [1] 
> http://docs.aws.amazon.com/kinesis/latest/dev/developing-consumers-with-sdk.html
> [2] 
> http://docs.aws.amazon.com/kinesis/latest/dev/developing-consumers-with-kcl.html
> [3] 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kinesis-Connector-td2872.html
> [4] http://data-artisans.com/kafka-flink-a-practical-how-to/
> [5] 
> http://docs.aws.amazon.com//kinesis/latest/dev/developing-producers-with-kpl.html#d0e4998



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3211) Add AWS Kinesis streaming connector

2016-05-07 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275162#comment-15275162
 ] 

Robert Metzger commented on FLINK-3211:
---

That's a good idea.
I think we need to figure out how hard it would be to implement a producer 
without using the producer library. 

> Add AWS Kinesis streaming connector
> ---
>
> Key: FLINK-3211
> URL: https://issues.apache.org/jira/browse/FLINK-3211
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming Connectors
>Affects Versions: 1.1.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> AWS Kinesis is a widely adopted message queue used by AWS users, much like a 
> cloud service version of Apache Kafka. Support for AWS Kinesis will be a 
> great addition to the handful of Flink's streaming connectors to external 
> systems and a great reach out to the AWS community.
> AWS supports two different ways to consume Kinesis data: with the low-level 
> AWS SDK [1], or with the high-level KCL (Kinesis Client Library) [2]. AWS SDK 
> can be used to consume Kinesis data, including stream read beginning from a 
> specific offset (or "record sequence number" in Kinesis terminology). On the 
> other hand, AWS officially recommends using KCL, which offers a higher-level 
> of abstraction that also comes with checkpointing and failure recovery by 
> using a KCL-managed AWS DynamoDB "leash table" as the checkpoint state 
> storage.
> However, KCL is essentially a stream processing library that wraps all the 
> partition-to-task (or "shard" in Kinesis terminology) determination and 
> checkpointing to allow the user to focus only on streaming application logic. 
> This leads to the understanding that we can not use the KCL to implement the 
> Kinesis streaming connector if we are aiming for a deep integration of Flink 
> with Kinesis that provides exactly once guarantees (KCL promises only 
> at-least-once). Therefore, AWS SDK will be the way to go for the 
> implementation of this feature.
> With the ability to read from specific offsets, and also the fact that 
> Kinesis and Kafka share a lot of similarities, the basic principles of the 
> implementation of Flink's Kinesis streaming connector will very much resemble 
> the Kafka connector. We can basically follow the outlines described in 
> [~StephanEwen]'s description [3] and [~rmetzger]'s Kafka connector 
> implementation [4]. A few tweaks due to some of Kinesis v.s. Kafka 
> differences is described as following:
> 1. While the Kafka connector can support reading from multiple topics, I 
> currently don't think this is a good idea for Kinesis streams (a Kinesis 
> Stream is logically equivalent to a Kafka topic). Kinesis streams can exist 
> in different AWS regions, and each Kinesis stream under the same AWS user 
> account may have completely independent access settings with different 
> authorization keys. Overall, a Kinesis stream feels like a much more 
> consolidated resource compared to Kafka topics. It would be great to hear 
> more thoughts on this part.
> 2. While Kafka has brokers that can hold multiple partitions, the only 
> partitioning abstraction for AWS Kinesis is "shards". Therefore, in contrast 
> to the Kafka connector having per broker connections where the connections 
> can handle multiple Kafka partitions, the Kinesis connector will only need to 
> have simple per shard connections.
> 3. Kinesis itself does not support committing offsets back to Kinesis. If we 
> were to implement this feature like the Kafka connector with Kafka / ZK to 
> sync outside view of progress, we probably could use ZK or DynamoDB like the 
> way KCL works. More thoughts on this part will be very helpful too.
> As for the Kinesis Sink, it should be possible to use the AWS KPL (Kinesis 
> Producer Library) [5]. However, for higher code consistency with the proposed 
> Kinesis Consumer, I think it will be better to stick with the AWS SDK for the 
> implementation. The implementation should be straight forward, being almost 
> if not completely the same as the Kafka sink.
> References:
> [1] 
> http://docs.aws.amazon.com/kinesis/latest/dev/developing-consumers-with-sdk.html
> [2] 
> http://docs.aws.amazon.com/kinesis/latest/dev/developing-consumers-with-kcl.html
> [3] 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kinesis-Connector-td2872.html
> [4] http://data-artisans.com/kafka-flink-a-practical-how-to/
> [5] 
> http://docs.aws.amazon.com//kinesis/latest/dev/developing-producers-with-kpl.html#d0e4998



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3879) Native implementation of HITS algorithm

2016-05-07 Thread GaoLun (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275242#comment-15275242
 ] 

GaoLun commented on FLINK-3879:
---

Hub values are same but authority values have a little difference.

> Native implementation of HITS algorithm
> ---
>
> Key: FLINK-3879
> URL: https://issues.apache.org/jira/browse/FLINK-3879
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Hyperlink-Induced Topic Search (HITS, also "hubs and authorities") is 
> presented in [0] and described in [1].
> "[HITS] is a very popular and effective algorithm to rank documents based on 
> the link information among a set of documents. The algorithm presumes that a 
> good hub is a document that points to many others, and a good authority is a 
> document that many documents point to." 
> [https://pdfs.semanticscholar.org/a8d7/c7a4c53a9102c4239356f9072ec62ca5e62f.pdf]
> This implementation differs from FLINK-2044 by providing for convergence, 
> outputting both hub and authority scores, and completing in half the number 
> of iterations.
> [0] http://www.cs.cornell.edu/home/kleinber/auth.pdf
> [1] https://en.wikipedia.org/wiki/HITS_algorithm



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System

2016-05-07 Thread zentol
Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-217636194
  
@ankitcha you could write your own reporter that simply wraps the reporters 
you want to use, simply forwarding the calls to all of them. this should 
actually be really easy.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3879) Native implementation of HITS algorithm

2016-05-07 Thread GaoLun (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275160#comment-15275160
 ] 

GaoLun commented on FLINK-3879:
---

Hi [~greghogan], the PR of FLINK-2044 has been updated and support returning 
both value now. And i have changed the normalization method from sum to square 
sum.
I wrote a simple test for your implementation to compare the result with mine, 
but i find the result is different.
For a simple graph: {{1->2, 1->3, 2->3}} with one iteration
result :
Mine:  {{(1,0.8320502943378436,0.0), (2,0.554700196225229,0.4472135954999579), 
(3,0.0,0.894427190159)}}
Yours: {{(1,0.8320502943378437,0.0), (2,0.5547001962252291,0.5144957554275265), 
(3,0.0,0.8574929257125441)}}
We can calculate the hub/authority value manually, the result should be:
{{(1, sqrt(9/13), 0.0), (2,sqrt(4/13), 1/sqrt(5)), (3, 0.0, 2/sqrt(5))}}
which is a little different with yours.

> Native implementation of HITS algorithm
> ---
>
> Key: FLINK-3879
> URL: https://issues.apache.org/jira/browse/FLINK-3879
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Hyperlink-Induced Topic Search (HITS, also "hubs and authorities") is 
> presented in [0] and described in [1].
> "[HITS] is a very popular and effective algorithm to rank documents based on 
> the link information among a set of documents. The algorithm presumes that a 
> good hub is a document that points to many others, and a good authority is a 
> document that many documents point to." 
> [https://pdfs.semanticscholar.org/a8d7/c7a4c53a9102c4239356f9072ec62ca5e62f.pdf]
> This implementation differs from FLINK-2044 by providing for convergence, 
> outputting both hub and authority scores, and completing in half the number 
> of iterations.
> [0] http://www.cs.cornell.edu/home/kleinber/auth.pdf
> [1] https://en.wikipedia.org/wiki/HITS_algorithm



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3881) Error in Java 8 Documentation Sample

2016-05-07 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275303#comment-15275303
 ] 

Fabian Hueske commented on FLINK-3881:
--

Hi [~mans2singh], thanks for creating this issue!
I assigned it to you and gave you contributor permissions for JIRA. You can now 
assign issues to yourself.

Cheers, Fabian

> Error in Java 8 Documentation Sample
> 
>
> Key: FLINK-3881
> URL: https://issues.apache.org/jira/browse/FLINK-3881
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.0.3
> Environment: All
>Reporter: Mans Singh
>Assignee: Mans Singh
>Priority: Minor
>  Labels: docuentation, java8, sample
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The java8 documentation 
> (https://ci.apache.org/projects/flink/flink-docs-master/apis/java8.html) has 
> samples, one of them included below:
> {code:java}
> DataSet input = env.fromElements(1, 2, 3);
> // collector type must be declared
> input.flatMap((Integer number, Collector out) -> {
> for(int i = 0; i < number; i++) {
> out.collect("a");
> }
> })
> // returns "a", "a", "aa", "a", "aa" , "aaa"
> .print();
> {code}
> I tried the sample and I think there are two issues with it (unless I have 
> missed anything):
> 1. The DataSet should be DataSet and not DataSet
> 2. There should probably be a StringBuffer that in the flatMap function that 
> is used to  append "a" in the for loop and output 
> (out.collect(buffer.toString()) it rather than just out.collect("a");.  
> Currently, this produces only "a" each time rather than   "a", "a", "aa", 
> "a", "aa" , "aaa" as shown the comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-3881) Error in Java 8 Documentation Sample

2016-05-07 Thread Fabian Hueske (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian Hueske updated FLINK-3881:
-
Assignee: Mans Singh

> Error in Java 8 Documentation Sample
> 
>
> Key: FLINK-3881
> URL: https://issues.apache.org/jira/browse/FLINK-3881
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.0.3
> Environment: All
>Reporter: Mans Singh
>Assignee: Mans Singh
>Priority: Minor
>  Labels: docuentation, java8, sample
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The java8 documentation 
> (https://ci.apache.org/projects/flink/flink-docs-master/apis/java8.html) has 
> samples, one of them included below:
> {code:java}
> DataSet input = env.fromElements(1, 2, 3);
> // collector type must be declared
> input.flatMap((Integer number, Collector out) -> {
> for(int i = 0; i < number; i++) {
> out.collect("a");
> }
> })
> // returns "a", "a", "aa", "a", "aa" , "aaa"
> .print();
> {code}
> I tried the sample and I think there are two issues with it (unless I have 
> missed anything):
> 1. The DataSet should be DataSet and not DataSet
> 2. There should probably be a StringBuffer that in the flatMap function that 
> is used to  append "a" in the for loop and output 
> (out.collect(buffer.toString()) it rather than just out.collect("a");.  
> Currently, this produces only "a" each time rather than   "a", "a", "aa", 
> "a", "aa" , "aaa" as shown the comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-3881) Error in Java 8 Documentation Sample

2016-05-07 Thread Mans Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mans Singh updated FLINK-3881:
--
Labels: documentation java8 sample  (was: docuentation java8 sample)

> Error in Java 8 Documentation Sample
> 
>
> Key: FLINK-3881
> URL: https://issues.apache.org/jira/browse/FLINK-3881
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.0.3
> Environment: All
>Reporter: Mans Singh
>Assignee: Mans Singh
>Priority: Minor
>  Labels: documentation, java8, sample
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The java8 documentation 
> (https://ci.apache.org/projects/flink/flink-docs-master/apis/java8.html) has 
> samples, one of them included below:
> {code:java}
> DataSet input = env.fromElements(1, 2, 3);
> // collector type must be declared
> input.flatMap((Integer number, Collector out) -> {
> for(int i = 0; i < number; i++) {
> out.collect("a");
> }
> })
> // returns "a", "a", "aa", "a", "aa" , "aaa"
> .print();
> {code}
> I tried the sample and I think there are two issues with it (unless I have 
> missed anything):
> 1. The DataSet should be DataSet and not DataSet
> 2. There should probably be a StringBuffer that in the flatMap function that 
> is used to  append "a" in the for loop and output 
> (out.collect(buffer.toString()) it rather than just out.collect("a");.  
> Currently, this produces only "a" each time rather than   "a", "a", "aa", 
> "a", "aa" , "aaa" as shown the comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3879) Native implementation of HITS algorithm

2016-05-07 Thread GaoLun (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275265#comment-15275265
 ] 

GaoLun commented on FLINK-3879:
---

yes, make sense.

> Native implementation of HITS algorithm
> ---
>
> Key: FLINK-3879
> URL: https://issues.apache.org/jira/browse/FLINK-3879
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Hyperlink-Induced Topic Search (HITS, also "hubs and authorities") is 
> presented in [0] and described in [1].
> "[HITS] is a very popular and effective algorithm to rank documents based on 
> the link information among a set of documents. The algorithm presumes that a 
> good hub is a document that points to many others, and a good authority is a 
> document that many documents point to." 
> [https://pdfs.semanticscholar.org/a8d7/c7a4c53a9102c4239356f9072ec62ca5e62f.pdf]
> This implementation differs from FLINK-2044 by providing for convergence, 
> outputting both hub and authority scores, and completing in half the number 
> of iterations.
> [0] http://www.cs.cornell.edu/home/kleinber/auth.pdf
> [1] https://en.wikipedia.org/wiki/HITS_algorithm



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3881) Error in Java 8 Documentation Sample

2016-05-07 Thread Mans Singh (JIRA)
Mans Singh created FLINK-3881:
-

 Summary: Error in Java 8 Documentation Sample
 Key: FLINK-3881
 URL: https://issues.apache.org/jira/browse/FLINK-3881
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.0.3
 Environment: All
Reporter: Mans Singh
Priority: Minor


The java8 documentation 
(https://ci.apache.org/projects/flink/flink-docs-master/apis/java8.html) has 
samples (one of them) included below:

DataSet input = env.fromElements(1, 2, 3);

// collector type must be declared
input.flatMap((Integer number, Collector out) -> {
for(int i = 0; i < number; i++) {
out.collect("a");
}
})
// returns "a", "a", "aa", "a", "aa" , "aaa"
.print();

I tried the sample and I think there are two issues with it (unless I have 
missed anything):
1. The DataSet should be DataSet and not DataSet
2. It should have a StringBuffer that appends "a" in the for loop and output 
(out.collect(buffer.toString()) it rather than just out.collect("a");.  
Currently, this produces only "a" each time rather than   "a", "a", "aa", "a", 
"aa" , "aaa" as shown the comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-3881) Error in Java 8 Documentation Sample

2016-05-07 Thread Mans Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mans Singh updated FLINK-3881:
--
Description: 
The java8 documentation 
(https://ci.apache.org/projects/flink/flink-docs-master/apis/java8.html) has 
samples, one of them included below:

{code:java}
DataSet input = env.fromElements(1, 2, 3);

// collector type must be declared
input.flatMap((Integer number, Collector out) -> {
for(int i = 0; i < number; i++) {
out.collect("a");
}
})
// returns "a", "a", "aa", "a", "aa" , "aaa"
.print();
{code}

I tried the sample and I think there are two issues with it (unless I have 
missed anything):
1. The DataSet should be DataSet and not DataSet
2. There should probably be a StringBuffer that in the flatMap function that is 
used to  append "a" in the for loop and output (out.collect(buffer.toString()) 
it rather than just out.collect("a");.  Currently, this produces only "a" each 
time rather than   "a", "a", "aa", "a", "aa" , "aaa" as shown the comments.

  was:
The java8 documentation 
(https://ci.apache.org/projects/flink/flink-docs-master/apis/java8.html) has 
samples (one of them) included below:

{code:java}
DataSet input = env.fromElements(1, 2, 3);

// collector type must be declared
input.flatMap((Integer number, Collector out) -> {
for(int i = 0; i < number; i++) {
out.collect("a");
}
})
// returns "a", "a", "aa", "a", "aa" , "aaa"
.print();
{code}

I tried the sample and I think there are two issues with it (unless I have 
missed anything):
1. The DataSet should be DataSet and not DataSet
2. There should probably be a StringBuffer that in the flatMap function that is 
used to  append "a" in the for loop and output (out.collect(buffer.toString()) 
it rather than just out.collect("a");.  Currently, this produces only "a" each 
time rather than   "a", "a", "aa", "a", "aa" , "aaa" as shown the comments.


> Error in Java 8 Documentation Sample
> 
>
> Key: FLINK-3881
> URL: https://issues.apache.org/jira/browse/FLINK-3881
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.0.3
> Environment: All
>Reporter: Mans Singh
>Priority: Minor
>  Labels: docuentation, java8, sample
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The java8 documentation 
> (https://ci.apache.org/projects/flink/flink-docs-master/apis/java8.html) has 
> samples, one of them included below:
> {code:java}
> DataSet input = env.fromElements(1, 2, 3);
> // collector type must be declared
> input.flatMap((Integer number, Collector out) -> {
> for(int i = 0; i < number; i++) {
> out.collect("a");
> }
> })
> // returns "a", "a", "aa", "a", "aa" , "aaa"
> .print();
> {code}
> I tried the sample and I think there are two issues with it (unless I have 
> missed anything):
> 1. The DataSet should be DataSet and not DataSet
> 2. There should probably be a StringBuffer that in the flatMap function that 
> is used to  append "a" in the for loop and output 
> (out.collect(buffer.toString()) it rather than just out.collect("a");.  
> Currently, this produces only "a" each time rather than   "a", "a", "aa", 
> "a", "aa" , "aaa" as shown the comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3881) Error in Java 8 Documentation Sample

2016-05-07 Thread Mans Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275350#comment-15275350
 ] 

Mans Singh commented on FLINK-3881:
---

Hey Folks:  

I've corrected this java8 documentation sample error.  Please let me know if 
there is any comment/suggestion. 

Thanks

Mans

> Error in Java 8 Documentation Sample
> 
>
> Key: FLINK-3881
> URL: https://issues.apache.org/jira/browse/FLINK-3881
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.0.3
> Environment: All
>Reporter: Mans Singh
>Assignee: Mans Singh
>Priority: Minor
>  Labels: docuentation, java8, sample
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The java8 documentation 
> (https://ci.apache.org/projects/flink/flink-docs-master/apis/java8.html) has 
> samples, one of them included below:
> {code:java}
> DataSet input = env.fromElements(1, 2, 3);
> // collector type must be declared
> input.flatMap((Integer number, Collector out) -> {
> for(int i = 0; i < number; i++) {
> out.collect("a");
> }
> })
> // returns "a", "a", "aa", "a", "aa" , "aaa"
> .print();
> {code}
> I tried the sample and I think there are two issues with it (unless I have 
> missed anything):
> 1. The DataSet should be DataSet and not DataSet
> 2. There should probably be a StringBuffer that in the flatMap function that 
> is used to  append "a" in the for loop and output 
> (out.collect(buffer.toString()) it rather than just out.collect("a");.  
> Currently, this produces only "a" each time rather than   "a", "a", "aa", 
> "a", "aa" , "aaa" as shown the comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3881] [docs] Java 8 Documetation Sample...

2016-05-07 Thread mans2singh
GitHub user mans2singh opened a pull request:

https://github.com/apache/flink/pull/1970

[FLINK-3881] [docs] Java 8 Documetation Sample Correction

Thanks for contributing to Apache Flink. Before you open your pull request, 
please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your 
pull request. For more information and/or questions please refer to the [How To 
Contribute guide](http://flink.apache.org/how-to-contribute.html).
In addition to going through the list, please provide a meaningful 
description of your changes.

- [ ] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira 
title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the 
JIRA id)

- [ ] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [ ] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mans2singh/flink FLINK-3881

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1970.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1970


commit 24636be11803ccc3efa85afcb3f48922b7f67ee8
Author: mans2singh 
Date:   2016-05-07T19:08:02Z

[FLINK-3881] [docs] Java 8 Documetation Sample Correction




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Issue Comment Deleted] (FLINK-3881) Error in Java 8 Documentation Sample

2016-05-07 Thread Mans Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mans Singh updated FLINK-3881:
--
Comment: was deleted

(was: Hey Folks:  

I've corrected this java8 documentation sample error.  Please let me know if 
there is any comment/suggestion. 

Thanks

Mans)

> Error in Java 8 Documentation Sample
> 
>
> Key: FLINK-3881
> URL: https://issues.apache.org/jira/browse/FLINK-3881
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.0.3
> Environment: All
>Reporter: Mans Singh
>Assignee: Mans Singh
>Priority: Minor
>  Labels: docuentation, java8, sample
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The java8 documentation 
> (https://ci.apache.org/projects/flink/flink-docs-master/apis/java8.html) has 
> samples, one of them included below:
> {code:java}
> DataSet input = env.fromElements(1, 2, 3);
> // collector type must be declared
> input.flatMap((Integer number, Collector out) -> {
> for(int i = 0; i < number; i++) {
> out.collect("a");
> }
> })
> // returns "a", "a", "aa", "a", "aa" , "aaa"
> .print();
> {code}
> I tried the sample and I think there are two issues with it (unless I have 
> missed anything):
> 1. The DataSet should be DataSet and not DataSet
> 2. There should probably be a StringBuffer that in the flatMap function that 
> is used to  append "a" in the for loop and output 
> (out.collect(buffer.toString()) it rather than just out.collect("a");.  
> Currently, this produces only "a" each time rather than   "a", "a", "aa", 
> "a", "aa" , "aaa" as shown the comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-3881) Error in Java 8 Documentation Sample

2016-05-07 Thread Mans Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mans Singh updated FLINK-3881:
--
Description: 
The java8 documentation 
(https://ci.apache.org/projects/flink/flink-docs-master/apis/java8.html) has 
samples (one of them) included below:

{code:java}
DataSet input = env.fromElements(1, 2, 3);

// collector type must be declared
input.flatMap((Integer number, Collector out) -> {
for(int i = 0; i < number; i++) {
out.collect("a");
}
})
// returns "a", "a", "aa", "a", "aa" , "aaa"
.print();
{code}

I tried the sample and I think there are two issues with it (unless I have 
missed anything):
1. The DataSet should be DataSet and not DataSet
2. There should probably be a StringBuffer that in the flatMap function that is 
used to  append "a" in the for loop and output (out.collect(buffer.toString()) 
it rather than just out.collect("a");.  Currently, this produces only "a" each 
time rather than   "a", "a", "aa", "a", "aa" , "aaa" as shown the comments.

  was:
The java8 documentation 
(https://ci.apache.org/projects/flink/flink-docs-master/apis/java8.html) has 
samples (one of them) included below:

DataSet input = env.fromElements(1, 2, 3);

// collector type must be declared
input.flatMap((Integer number, Collector out) -> {
for(int i = 0; i < number; i++) {
out.collect("a");
}
})
// returns "a", "a", "aa", "a", "aa" , "aaa"
.print();

I tried the sample and I think there are two issues with it (unless I have 
missed anything):
1. The DataSet should be DataSet and not DataSet
2. It should have a StringBuffer that appends "a" in the for loop and output 
(out.collect(buffer.toString()) it rather than just out.collect("a");.  
Currently, this produces only "a" each time rather than   "a", "a", "aa", "a", 
"aa" , "aaa" as shown the comments.


> Error in Java 8 Documentation Sample
> 
>
> Key: FLINK-3881
> URL: https://issues.apache.org/jira/browse/FLINK-3881
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.0.3
> Environment: All
>Reporter: Mans Singh
>Priority: Minor
>  Labels: docuentation, java8, sample
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The java8 documentation 
> (https://ci.apache.org/projects/flink/flink-docs-master/apis/java8.html) has 
> samples (one of them) included below:
> {code:java}
> DataSet input = env.fromElements(1, 2, 3);
> // collector type must be declared
> input.flatMap((Integer number, Collector out) -> {
> for(int i = 0; i < number; i++) {
> out.collect("a");
> }
> })
> // returns "a", "a", "aa", "a", "aa" , "aaa"
> .print();
> {code}
> I tried the sample and I think there are two issues with it (unless I have 
> missed anything):
> 1. The DataSet should be DataSet and not DataSet
> 2. There should probably be a StringBuffer that in the flatMap function that 
> is used to  append "a" in the for loop and output 
> (out.collect(buffer.toString()) it rather than just out.collect("a");.  
> Currently, this produces only "a" each time rather than   "a", "a", "aa", 
> "a", "aa" , "aaa" as shown the comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-05-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275389#comment-15275389
 ] 

ASF GitHub Bot commented on FLINK-1502:
---

Github user ankitcha commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-217669339
  
@zentol thanks for the suggestion. I think that can work, I will try that 
and confirm.

Thanks!


> Expose metrics to graphite, ganglia and JMX.
> 
>
> Key: FLINK-1502
> URL: https://issues.apache.org/jira/browse/FLINK-1502
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager, TaskManager
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Chesnay Schepler
>Priority: Minor
> Fix For: pre-apache
>
>
> The metrics library allows to expose collected metrics easily to other 
> systems such as graphite, ganglia or Java's JVM (VisualVM).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)