[jira] [Commented] (FLINK-2494) Fix StreamGraph getJobGraph bug

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661394#comment-14661394
 ] 

ASF GitHub Bot commented on FLINK-2494:
---

Github user ffbin commented on the pull request:

https://github.com/apache/flink/pull/998#issuecomment-128610658
  
enableCheckpointing() will call forceCheckpoint() and set forceCheckpoint to 
true. The old code (if (isIterative() && isCheckpointingEnabled() && 
!forceCheckpoint)) will therefore not throw an UnsupportedOperationException 
when checkpointing is enabled for an iterative job. So we should change it.
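
For reference, a sketch of the check being discussed (method and flag names are 
taken from the comments in this thread and may not match the actual Flink 
source exactly):

```java
// Old check: because enableCheckpointing() also sets forceCheckpoint to true,
// the !forceCheckpoint clause can never hold once checkpointing is enabled,
// so the exception is never thrown for iterative jobs.
if (isIterative() && isCheckpointingEnabled() && !forceCheckpoint) {
	throw new UnsupportedOperationException(
			"Checkpointing is currently not supported for iterative jobs.");
}

// Check as changed by this PR: throws only when checkpointing is forced.
if (isIterative() && isCheckpointingEnabled() && forceCheckpoint) {
	throw new UnsupportedOperationException(
			"Checkpointing is currently not supported for iterative jobs.");
}
```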


> Fix StreamGraph getJobGraph bug
> ---
>
> Key: FLINK-2494
> URL: https://issues.apache.org/jira/browse/FLINK-2494
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 0.8.1
>Reporter: fangfengbin
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2494 ]Fix StreamGraph getJobGraph bug

2015-08-06 Thread ffbin
Github user ffbin commented on the pull request:

https://github.com/apache/flink/pull/998#issuecomment-128610658
  
enableCheckpointing() will call forceCheckpoint() and set forceCheckpoint to 
true. The old code (if (isIterative() && isCheckpointingEnabled() && 
!forceCheckpoint)) will therefore not throw an UnsupportedOperationException 
when checkpointing is enabled for an iterative job. So we should change it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2494) Fix StreamGraph getJobGraph bug

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661372#comment-14661372
 ] 

ASF GitHub Bot commented on FLINK-2494:
---

Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/998#issuecomment-128605743
  
I would assume that forceCheckpoint is supposed to do exactly that: enforce 
checkpointing regardless of whether it is supported.

This change also means that if checkpointing is enabled, but not forced, 
the job will not hit an UnsupportedOperationException, which doesn't make any 
sense whatsoever.

-1
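
To make the objection concrete, a small self-contained sketch that evaluates 
both variants of the condition for each flag combination (flag names as used in 
the comments, not taken from the Flink source):

```java
public class CheckpointFlagTable {
	public static void main(String[] args) {
		boolean isIterative = true; // the case under discussion
		boolean[][] cases = { {false, false}, {true, false}, {true, true} };
		for (boolean[] c : cases) {
			boolean enabled = c[0], forced = c[1];
			boolean oldThrows = isIterative && enabled && !forced; // original check
			boolean prThrows = isIterative && enabled && forced;   // check after the PR
			System.out.printf("enabled=%-5s forced=%-5s oldThrows=%-5s prThrows=%-5s%n",
					enabled, forced, oldThrows, prThrows);
		}
	}
}
```

The enabled-but-not-forced row is the one the -1 refers to: the original check 
throws there, the changed check does not.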


> Fix StreamGraph getJobGraph bug
> ---
>
> Key: FLINK-2494
> URL: https://issues.apache.org/jira/browse/FLINK-2494
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 0.8.1
>Reporter: fangfengbin
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2494 ]Fix StreamGraph getJobGraph bug

2015-08-06 Thread zentol
Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/998#issuecomment-128605743
  
I would assume that forceCheckpoint is supposed to do exactly that: enforce 
checkpointing regardless of whether it is supported.

This change also means that if checkpointing is enabled, but not forced, 
the job will not hit an UnsupportedOperationException, which doesn't make any 
sense whatsoever.

-1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1901) Create sample operator for Dataset

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661258#comment-14661258
 ] 

ASF GitHub Bot commented on FLINK-1901:
---

Github user ChengXiangLi commented on the pull request:

https://github.com/apache/flink/pull/949#issuecomment-128580607
  
Hi @tillrohrmann, the current implementation of fixed-size sampling generates 
a fixed-size random sample for each partition rather than for the whole 
dataset, whereas users probably expect the latter most of the time. I'm 
researching how to randomly sample a fixed number of elements from a 
distributed data stream, so I think we can pause this PR review until I merge 
the previous fix.
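
One standard way to obtain a fixed-size uniform sample over the whole dataset, 
rather than per partition, is to attach a random key to every element, keep the 
k elements with the smallest keys within each partition, and then merge the 
per-partition candidates, again keeping the k smallest. A minimal single-JVM 
sketch of that idea (illustrative only, not the implementation in this PR):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Random;

public class WholeDatasetSampleSketch {

	/** Per-partition pass: keep the k elements with the smallest random keys. */
	static <T> List<SimpleEntry<Double, T>> samplePartition(Iterable<T> partition, int k, Random rnd) {
		// Max-heap on the random key, so the current worst candidate is evicted first.
		PriorityQueue<SimpleEntry<Double, T>> heap = new PriorityQueue<>(
				Math.max(1, k),
				Comparator.comparingDouble((SimpleEntry<Double, T> e) -> e.getKey()).reversed());
		for (T element : partition) {
			heap.add(new SimpleEntry<>(rnd.nextDouble(), element));
			if (heap.size() > k) {
				heap.poll();
			}
		}
		return new ArrayList<>(heap);
	}

	/** Merge pass: combine all partition candidates and keep the k smallest keys. */
	static <T> List<T> mergeCandidates(List<List<SimpleEntry<Double, T>>> perPartition, int k) {
		PriorityQueue<SimpleEntry<Double, T>> heap = new PriorityQueue<>(
				Math.max(1, k),
				Comparator.comparingDouble((SimpleEntry<Double, T> e) -> e.getKey()).reversed());
		for (List<SimpleEntry<Double, T>> candidates : perPartition) {
			for (SimpleEntry<Double, T> candidate : candidates) {
				heap.add(candidate);
				if (heap.size() > k) {
					heap.poll();
				}
			}
		}
		List<T> sample = new ArrayList<>(Math.max(1, k));
		for (SimpleEntry<Double, T> e : heap) {
			sample.add(e.getValue());
		}
		return sample;
	}
}
```

Because every element receives an independent uniform random key, the k 
globally smallest keys select a uniform random k-subset of the whole dataset 
(sampling without replacement).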


> Create sample operator for Dataset
> --
>
> Key: FLINK-1901
> URL: https://issues.apache.org/jira/browse/FLINK-1901
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Reporter: Theodore Vasiloudis
>Assignee: Chengxiang Li
>
> In order to be able to implement Stochastic Gradient Descent and a number of 
> other machine learning algorithms we need to have a way to take a random 
> sample from a Dataset.
> We need to be able to sample with or without replacement from the Dataset, 
> choose the relative size of the sample, and set a seed for reproducibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1901] [core] Create sample operator for...

2015-08-06 Thread ChengXiangLi
Github user ChengXiangLi commented on the pull request:

https://github.com/apache/flink/pull/949#issuecomment-128580607
  
Hi @tillrohrmann, the current implementation of fixed-size sampling generates 
a fixed-size random sample for each partition rather than for the whole 
dataset, whereas users probably expect the latter most of the time. I'm 
researching how to randomly sample a fixed number of elements from a 
distributed data stream, so I think we can pause this PR review until I merge 
the previous fix.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Fix StreamGraph getJobGraph bug

2015-08-06 Thread ffbin
GitHub user ffbin opened a pull request:

https://github.com/apache/flink/pull/998

Fix StreamGraph getJobGraph bug

When forceCheckpoint is true, checkpointing will be enabled for iterative 
jobs. But checkpointing is currently (temporarily) forbidden for iterative 
jobs, so if forceCheckpoint is true, an UnsupportedOperationException should 
be thrown. The old code has this logic reversed.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ffbin/flink FLINK-2494

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/998.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #998


commit 226354ccf3060e2d0c2ba4dd607bf83ce02735c1
Author: ffbin <869218...@qq.com>
Date:   2015-08-07T02:41:55Z

Fix StreamGraph getJobGraph bug




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (FLINK-2494) Fix StreamGraph getJobGraph bug

2015-08-06 Thread fangfengbin (JIRA)
fangfengbin created FLINK-2494:
--

 Summary: Fix StreamGraph getJobGraph bug
 Key: FLINK-2494
 URL: https://issues.apache.org/jira/browse/FLINK-2494
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Affects Versions: 0.8.1
Reporter: fangfengbin






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2240] Use BloomFilter to filter probe r...

2015-08-06 Thread ChengXiangLi
Github user ChengXiangLi commented on the pull request:

https://github.com/apache/flink/pull/888#issuecomment-128563750
  
Thanks for the review, @StephanEwen. I'm very interested in this project and 
would like to contribute more. @vasia, I think Stephan has already answered 
the question; the most important reason is that I want to reuse the memory 
occupied by the hash table buckets. Besides, since this is a 
performance-sensitive issue, I tried to make this Bloom filter as simple and 
efficient as I could. For example, the hash code of the join key is already 
generated and stored in the hybrid hash join, so I just reuse that hash code 
instead of generating it again from the join key value inside the Bloom filter. 
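
A sketch of the "reuse the hash code" point: the filter works directly on the 
already-computed 32-bit hash of the join key and derives its probe positions 
from it by double hashing, instead of re-hashing the key value. Class and 
method names here are illustrative, not the ones from this PR:

```java
/**
 * Minimal Bloom filter that takes a precomputed hash code and derives k probe
 * positions from it via double hashing (g_i(x) = h1(x) + i * h2(x)).
 */
public class HashReusingBloomFilter {

	private final long[] bits;
	private final int numBits;
	private final int numHashFunctions;

	public HashReusingBloomFilter(int numBits, int numHashFunctions) {
		this.bits = new long[(numBits + 63) / 64];
		this.numBits = numBits;
		this.numHashFunctions = numHashFunctions;
	}

	/** Build side: record a key by its already-computed hash code. */
	public void addHash(int hashCode) {
		int h1 = hashCode;
		int h2 = Integer.rotateLeft(hashCode, 16); // cheap second hash
		for (int i = 1; i <= numHashFunctions; i++) {
			int pos = Math.abs((h1 + i * h2) % numBits);
			bits[pos >> 6] |= 1L << (pos & 63);
		}
	}

	/** Probe side: false means the record cannot match and can be dropped. */
	public boolean testHash(int hashCode) {
		int h1 = hashCode;
		int h2 = Integer.rotateLeft(hashCode, 16);
		for (int i = 1; i <= numHashFunctions; i++) {
			int pos = Math.abs((h1 + i * h2) % numBits);
			if ((bits[pos >> 6] & (1L << (pos & 63))) == 0) {
				return false;
			}
		}
		return true;
	}
}
```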


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2240) Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661203#comment-14661203
 ] 

ASF GitHub Bot commented on FLINK-2240:
---

Github user ChengXiangLi commented on the pull request:

https://github.com/apache/flink/pull/888#issuecomment-128563750
  
Thanks for the review, @StephanEwen. I'm very interested in this project and 
would like to contribute more. @vasia, I think Stephan has already answered 
the question; the most important reason is that I want to reuse the memory 
occupied by the hash table buckets. Besides, since this is a 
performance-sensitive issue, I tried to make this Bloom filter as simple and 
efficient as I could. For example, the hash code of the join key is already 
generated and stored in the hybrid hash join, so I just reuse that hash code 
instead of generating it again from the join key value inside the Bloom filter. 


> Use BloomFilter to minimize probe side records which are spilled to disk in 
> Hybrid-Hash-Join
> 
>
> Key: FLINK-2240
> URL: https://issues.apache.org/jira/browse/FLINK-2240
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>Priority: Minor
> Fix For: 0.10
>
>
> In Hybrid-Hash-Join, when the small table does not fit into memory, part of 
> the small table data is spilled to disk, and the counterpart partition of the 
> big table data is spilled to disk in the probe phase as well. If we build a 
> BloomFilter while spilling the small table to disk during the build phase, 
> and use it to filter the big table records that tend to be spilled to disk, 
> this may greatly reduce the spilled big table file size and save the disk IO 
> cost for writing and further reading.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2240) Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661204#comment-14661204
 ] 

ASF GitHub Bot commented on FLINK-2240:
---

Github user ChengXiangLi closed the pull request at:

https://github.com/apache/flink/pull/888


> Use BloomFilter to minimize probe side records which are spilled to disk in 
> Hybrid-Hash-Join
> 
>
> Key: FLINK-2240
> URL: https://issues.apache.org/jira/browse/FLINK-2240
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>Priority: Minor
> Fix For: 0.10
>
>
> In Hybrid-Hash-Join, when the small table does not fit into memory, part of 
> the small table data is spilled to disk, and the counterpart partition of the 
> big table data is spilled to disk in the probe phase as well. If we build a 
> BloomFilter while spilling the small table to disk during the build phase, 
> and use it to filter the big table records that tend to be spilled to disk, 
> this may greatly reduce the spilled big table file size and save the disk IO 
> cost for writing and further reading.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2240] Use BloomFilter to filter probe r...

2015-08-06 Thread ChengXiangLi
Github user ChengXiangLi closed the pull request at:

https://github.com/apache/flink/pull/888


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2314) Make Streaming File Sources Persistent

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660983#comment-14660983
 ] 

ASF GitHub Bot commented on FLINK-2314:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/997#discussion_r36478648
  
--- Diff: 
flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/functions/source/FileSourceFunction.java
 ---
@@ -119,13 +124,20 @@ public void run(SourceContext<OUT> ctx) throws Exception {
	while (isRunning) {
		OUT nextElement = serializer.createInstance();
		nextElement = format.nextRecord(nextElement);
-		if (nextElement == null && splitIterator.hasNext()) {
+		if (nextElement == null && splitIterator.hasNext() ) {
--- End diff --

unnecessary space


> Make Streaming File Sources Persistent
> --
>
> Key: FLINK-2314
> URL: https://issues.apache.org/jira/browse/FLINK-2314
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 0.9
>Reporter: Stephan Ewen
>Assignee: Sheetal Parade
>  Labels: easyfix, starter
>
> Streaming file sources should participate in checkpointing. They should 
> track the bytes they have read from the file and checkpoint that position.
> One can look at the sequence-generating source function for an example of a 
> checkpointed source.
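
A sketch of the requested behavior, assuming the Checkpointed interface and the 
source checkpoint lock of the streaming API of that era (error handling and 
input splits omitted; class and field names are illustrative):

```java
import java.io.RandomAccessFile;

import org.apache.flink.streaming.api.checkpoint.Checkpointed;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

/** A file source that checkpoints the byte offset it has consumed so far. */
public class CheckpointedFileSource implements SourceFunction<String>, Checkpointed<Long> {

	private final String path;
	private volatile boolean isRunning = true;
	private volatile long offset; // bytes consumed so far; restored on recovery

	public CheckpointedFileSource(String path) {
		this.path = path;
	}

	@Override
	public void run(SourceContext<String> ctx) throws Exception {
		try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
			file.seek(offset); // resume where the last checkpoint left off
			String line;
			while (isRunning && (line = file.readLine()) != null) {
				// Emit the record and advance the offset atomically
				// with respect to checkpoints.
				synchronized (ctx.getCheckpointLock()) {
					ctx.collect(line);
					offset = file.getFilePointer();
				}
			}
		}
	}

	@Override
	public void cancel() {
		isRunning = false;
	}

	@Override
	public Long snapshotState(long checkpointId, long checkpointTimestamp) {
		return offset;
	}

	@Override
	public void restoreState(Long state) {
		this.offset = state;
	}
}
```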



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2314]

2015-08-06 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/997#discussion_r36478648
  
--- Diff: 
flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/functions/source/FileSourceFunction.java
 ---
@@ -119,13 +124,20 @@ public void run(SourceContext<OUT> ctx) throws Exception {
	while (isRunning) {
		OUT nextElement = serializer.createInstance();
		nextElement = format.nextRecord(nextElement);
-		if (nextElement == null && splitIterator.hasNext()) {
+		if (nextElement == null && splitIterator.hasNext() ) {
--- End diff --

unnecessary space


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2314) Make Streaming File Sources Persistent

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660981#comment-14660981
 ] 

ASF GitHub Bot commented on FLINK-2314:
---

Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/997#issuecomment-128539152
  
Thanks for your contribution. Here are some comments on your pull request.

Flink uses the tab character for indentation, but your test code 
(FileSourceFunctionTest) uses spaces instead of tabs. Please change the 
spaces to tabs.

You should also test with the `mvn verify` command before sending a pull request. 


> Make Streaming File Sources Persistent
> --
>
> Key: FLINK-2314
> URL: https://issues.apache.org/jira/browse/FLINK-2314
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 0.9
>Reporter: Stephan Ewen
>Assignee: Sheetal Parade
>  Labels: easyfix, starter
>
> Streaming File sources should participate in the checkpointing. They should 
> track the bytes they read from the file and checkpoint it.
> One can look at the sequence generating source function for an example of a 
> checkpointed source.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2314]

2015-08-06 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/997#issuecomment-128539152
  
Thanks for your contribution. Here are some comments on your pull request.

Flink uses the tab character for indentation, but your test code 
(FileSourceFunctionTest) uses spaces instead of tabs. Please change the 
spaces to tabs.

You should also test with the `mvn verify` command before sending a pull request. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-655) Add support for both single and set of broadcast values

2015-08-06 Thread Henry Saputra (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660977#comment-14660977
 ] 

Henry Saputra commented on FLINK-655:
-

Ah, mea culpa for not getting this done sooner.

I would love to make this happen, but as Fabian has mentioned, it will cause 
API changes.

I am OK with marking this as "won't fix" for now, since there is currently no 
request for this feature.

> Add support for both single and set of broadcast values
> ---
>
> Key: FLINK-655
> URL: https://issues.apache.org/jira/browse/FLINK-655
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API
>Reporter: Ufuk Celebi
>Assignee: Henry Saputra
>  Labels: breaking-api, github-import, starter
> Fix For: pre-apache
>
>
> To broadcast a data set you have to do the following:
> ```java
> lorem.map(new MyMapper()).withBroadcastSet(toBroadcast, "toBroadcastName")
> ```
> In the operator you call:
> ```java
> getRuntimeContext().getBroadcastVariable("toBroadcastName")
> ```
> I propose to make the two method names consistent, e.g.
>   - `withBroadcastVariable(DataSet, String)`, or
>   - `getBroadcastSet(String)`.
>  Imported from GitHub 
> Url: https://github.com/stratosphere/stratosphere/issues/655
> Created by: [uce|https://github.com/uce]
> Labels: enhancement, java api, user satisfaction, 
> Created at: Wed Apr 02 16:29:08 CEST 2014
> State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2398) Decouple StreamGraph Building from the API

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660792#comment-14660792
 ] 

ASF GitHub Bot commented on FLINK-2398:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/988#issuecomment-128507574
  
Yes, I think the automatic rebalance is good. My approach of throwing the 
exception was just the easiest way of dealing with the previously faulty 
behavior. I think people coming from Storm are also used to having operations 
that are executed even if there is no sink. So maybe we should keep that.


> Decouple StreamGraph Building from the API
> --
>
> Key: FLINK-2398
> URL: https://issues.apache.org/jira/browse/FLINK-2398
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> Currently, the building of the StreamGraph is very intertwined with the API 
> methods. DataStream knows about the StreamGraph and keeps track of splitting, 
> selected names, unions and so on. This leads to the problem that it is very 
> hard to understand how the StreamGraph is built, because the code that does 
> it is all over the place. This also makes it hard to extend/change parts of 
> the streaming system.
> I propose to introduce "Transformations". A transformation holds information 
> about one operation: the input streams, types, names, operator and so on. An 
> API method creates a transformation instead of fiddling with the StreamGraph 
> directly. A new component, the StreamGraphGenerator, creates a StreamGraph 
> from the tree of transformations that results from specifying a program with 
> the API methods. This would relieve DataStream from knowing about the 
> StreamGraph and make unions, splitting and selection visible transformations, 
> instead of being scattered across the different API classes as fields.
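
A minimal sketch of the proposed split (illustrative shapes, not the actual 
classes): API methods only build a tree of transformations, and a separate 
generator walks that tree once to populate the StreamGraph.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Describes one operation: its name and its input transformations. */
class Transformation {
	final String name;
	final List<Transformation> inputs = new ArrayList<>();

	Transformation(String name, Transformation... inputs) {
		this.name = name;
		for (Transformation input : inputs) {
			this.inputs.add(input);
		}
	}
}

/** Translates the transformation tree into graph nodes, inputs first. */
class StreamGraphGenerator {
	private final Set<Transformation> alreadyTransformed = new HashSet<>();

	void transform(Transformation t) {
		if (!alreadyTransformed.add(t)) {
			return; // shared subtrees are translated only once
		}
		for (Transformation input : t.inputs) {
			transform(input);
		}
		System.out.println("add StreamGraph node: " + t.name);
	}
}

class Example {
	public static void main(String[] args) {
		// What API calls like source().map().addSink() would build up:
		Transformation source = new Transformation("source");
		Transformation map = new Transformation("map", source);
		new StreamGraphGenerator().transform(new Transformation("sink", map));
	}
}
```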



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2398][api-breaking] Introduce StreamGra...

2015-08-06 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/988#issuecomment-128507574
  
Yes, I think the automatic rebalance is good. My approach of throwing the 
exception was just the easiest way of dealing with the previously faulty 
behavior. I think people coming from Storm are also used to having operations 
that are executed even if there is no sink. So maybe we should keep that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2314) Make Streaming File Sources Persistent

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660703#comment-14660703
 ] 

ASF GitHub Bot commented on FLINK-2314:
---

GitHub user sheetalparade opened a pull request:

https://github.com/apache/flink/pull/997

[FLINK-2314]

[FLINK-2314] - Added checkpointing feature into File Source


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sheetalparade/flink master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/997.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #997


commit c32d9bfada1f6b601c221185a08c8b1c5a454a5c
Author: Sheetal Parade 
Date:   2015-08-06T16:29:19Z

[FLINK-2314]

Added checkpointing feature into File Source

[FLINK-2314]

Added checkpointing feature into File Source




> Make Streaming File Sources Persistent
> --
>
> Key: FLINK-2314
> URL: https://issues.apache.org/jira/browse/FLINK-2314
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 0.9
>Reporter: Stephan Ewen
>Assignee: Sheetal Parade
>  Labels: easyfix, starter
>
> Streaming file sources should participate in checkpointing. They should 
> track the bytes they have read from the file and checkpoint that position.
> One can look at the sequence-generating source function for an example of a 
> checkpointed source.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2314]

2015-08-06 Thread sheetalparade
GitHub user sheetalparade opened a pull request:

https://github.com/apache/flink/pull/997

[FLINK-2314]

[FLINK-2314] - Added checkpointing feature into File Source


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sheetalparade/flink master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/997.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #997


commit c32d9bfada1f6b601c221185a08c8b1c5a454a5c
Author: Sheetal Parade 
Date:   2015-08-06T16:29:19Z

[FLINK-2314]

Added checkpointing feature into File Source

[FLINK-2314]

Added checkpointing feature into File Source




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1138) Allow users to specify methods instead of fields in key expressions

2015-08-06 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660700#comment-14660700
 ] 

Fabian Hueske commented on FLINK-1138:
--

+1 for "won't fix"

> Allow users to specify methods instead of fields in key expressions
> ---
>
> Key: FLINK-1138
> URL: https://issues.apache.org/jira/browse/FLINK-1138
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API
>Reporter: Robert Metzger
>Priority: Minor
>
> Currently, users can specify grouping fields only on the fields of a POJO.
> It would be nice to allow users also to name a method (such as 
> "getVertexId()") to be called.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-2493) Simplify names of example program JARs

2015-08-06 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-2493:
---

 Summary: Simplify names of example program JARs
 Key: FLINK-2493
 URL: https://issues.apache.org/jira/browse/FLINK-2493
 Project: Flink
  Issue Type: Improvement
  Components: Examples
Affects Versions: 0.10
Reporter: Stephan Ewen
Priority: Minor


I find the names of the example JARs a bit annoying.

Why not name the file {{examples/ConnectedComponents.jar}} rather than 
{{examples/flink-java-examples-0.10-SNAPSHOT-ConnectedComponents.jar}}?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1138) Allow users to specify methods instead of fields in key expressions

2015-08-06 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660547#comment-14660547
 ] 

Stephan Ewen commented on FLINK-1138:
-

Should we close this as "won't fix"? There have been no requests, it is 
fairly low priority, and very tricky...

> Allow users to specify methods instead of fields in key expressions
> ---
>
> Key: FLINK-1138
> URL: https://issues.apache.org/jira/browse/FLINK-1138
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API
>Reporter: Robert Metzger
>Priority: Minor
>
> Currently, users can specify grouping fields only on the fields of a POJO.
> It would be nice to allow users also to name a method (such as 
> "getVertexId()") to be called.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-1116) Packaged Scala Examples do not work due to missing test data

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1116.
-
Resolution: Invalid

Scala examples are not packaged any more (redundant with the Java examples).

> Packaged Scala Examples do not work due to missing test data
> 
>
> Key: FLINK-1116
> URL: https://issues.apache.org/jira/browse/FLINK-1116
> Project: Flink
>  Issue Type: Bug
>  Components: Scala API
>Reporter: Stephan Ewen
>Priority: Minor
>
> The example data classes are in the Java examples project. The Maven jar 
> plugin cannot include them in the jars of the Scala examples, causing the 
> examples to fail with a ClassNotFoundException when starting the example job.
> For now, I disabled the Scala example jars from being built, because they do 
> not work anyway.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-1083) WebInterface improvements

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-1083.
---

> WebInterface improvements
> -
>
> Key: FLINK-1083
> URL: https://issues.apache.org/jira/browse/FLINK-1083
> Project: Flink
>  Issue Type: Improvement
>  Components: JobManager
>Reporter: Jonathan Hasenburg
>
> New issue to summarize all things that should be done regarding the 
> webinterface.
> * rework the dashboard so that more than one job can be shown; if a 
> job is clicked you get to the details.
> * DONE: add history to the dashboard
> * Running jobs should get a view like jobs in the history.
> * DONE: rework the menu and try to add some links to the dashboard (like the 
> taskmanager section)
> * improve the way the JSONs are sent to the webinterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1135) Blog post with topic "Accessing Data Stored in Hive with Flink"

2015-08-06 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660516#comment-14660516
 ] 

Stephan Ewen commented on FLINK-1135:
-

Is this still going to happen?

> Blog post with topic "Accessing Data Stored in Hive with Flink"
> ---
>
> Key: FLINK-1135
> URL: https://issues.apache.org/jira/browse/FLINK-1135
> Project: Flink
>  Issue Type: Improvement
>  Components: Project Website
>Reporter: Timo Walther
>Assignee: Robert Metzger
>Priority: Minor
> Attachments: 2014-09-29-querying-hive.md
>
>
> Recently, I implemented a Flink job that accessed Hive. Maybe someone else is 
> going to try this. I created a blog post for the website to share my 
> experience.
> You'll find the blog post file attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-1116) Packaged Scala Examples do not work due to missing test data

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-1116.
---

> Packaged Scala Examples do not work due to missing test data
> 
>
> Key: FLINK-1116
> URL: https://issues.apache.org/jira/browse/FLINK-1116
> Project: Flink
>  Issue Type: Bug
>  Components: Scala API
>Reporter: Stephan Ewen
>Priority: Minor
>
> The example data classes are in the Java examples project. The Maven jar 
> plugin cannot include them in the jars of the Scala examples, causing the 
> examples to fail with a ClassNotFoundException when starting the example job.
> For now, I disabled the Scala example jars from being built, because they do 
> not work anyway.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-1082) WebClient improvements

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1082.
-
Resolution: Done

Remaining issues will not be fixed. Remainder is subsumed by new web frontend 
[FLINK-2357]

> WebClient improvements
> --
>
> Key: FLINK-1082
> URL: https://issues.apache.org/jira/browse/FLINK-1082
> Project: Flink
>  Issue Type: Improvement
>Reporter: Jonathan Hasenburg
>
> New issue to summarize all things that should be done regarding the webclient.
> * DONE: Setting the ship strategy of a broadcast variable from "broadcast" to 
> "broadcast variable"
> * DONE: Reduce the size of nodes by removing some information
> * DONE: Some nodes can be at the same time "next partial solution" and 
> "termination criterion", or "next workset" and "solution set delta". It 
> currently seems to highlight only one of them.
> * DONE: Recreate the generation of the JSON strings which are used to create 
> the graph. Change to an object-based layout to get away from the string-based 
> layout.
> * NOT POSSIBLE, because this creates a cycle, and it can happen that the 
> first node ends up at the right side of the graph -> confusing: feedback 
> arrow from the next partial solution to the partial solution
> * show the full graph at the very beginning
> * fix the bug in Chrome where the node boxes are too small
> * DONE: the graph should be redrawn when the window gets resized



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-1083) WebInterface improvements

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1083.
-
Resolution: Done

Remainder is subsumed by new web frontend [FLINK-2357]

> WebInterface improvements
> -
>
> Key: FLINK-1083
> URL: https://issues.apache.org/jira/browse/FLINK-1083
> Project: Flink
>  Issue Type: Improvement
>  Components: JobManager
>Reporter: Jonathan Hasenburg
>
> New issue to summarize all things that should be done regarding the 
> webinterface.
> * rework the dashboard so that more than one job can be shown; if a 
> job is clicked you get to the details.
> * DONE: add history to the dashboard
> * Running jobs should get a view like jobs in the history.
> * DONE: rework the menu and try to add some links to the dashboard (like the 
> taskmanager section)
> * improve the way the JSONs are sent to the webinterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-1082) WebClient improvements

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-1082.
---

> WebClient improvements
> --
>
> Key: FLINK-1082
> URL: https://issues.apache.org/jira/browse/FLINK-1082
> Project: Flink
>  Issue Type: Improvement
>Reporter: Jonathan Hasenburg
>
> New issue to summarize all things that should be done regarding the webclient.
> * DONE: Setting the ship strategy of a broadcast variable from "broadcast" to 
> "broadcast variable"
> * DONE: Reduce the size of nodes by removing some information
> * DONE: Some nodes can be at the same time "next partial solution" and 
> "termination criterion", or "next workset" and "solution set delta". It 
> currently seems to highlight only one of them.
> * DONE: Recreate the generation of the JSON strings which are used to create 
> the graph. Change to an object-based layout to get away from the string-based 
> layout.
> * NOT POSSIBLE, because this creates a cycle, and it can happen that the 
> first node ends up at the right side of the graph -> confusing: feedback 
> arrow from the next partial solution to the partial solution
> * show the full graph at the very beginning
> * fix the bug in Chrome where the node boxes are too small
> * DONE: the graph should be redrawn when the window gets resized



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1028) Certain exceptions are not visibly reported to the user

2015-08-06 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660491#comment-14660491
 ] 

Stephan Ewen commented on FLINK-1028:
-

Is this still valid? Can you give an example to reproduce this?

> Certain exceptions are not visibly reported to the user
> ---
>
> Key: FLINK-1028
> URL: https://issues.apache.org/jira/browse/FLINK-1028
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Runtime, TaskManager
>Reporter: Aljoscha Krettek
>
> When I throw a RuntimeException in a user code method I get the Exception in 
> my IDE or on the terminal (when using the CLI client). When I throw an 
> exception in the constructor of a TypeComparator the exception is only logged 
> but not presented to the user. The code continues and at some later point 
> fails with something that seems unrelated, making it very hard to find the 
> original cause.
> Does anyone know why it is like that? Is it like this on purpose?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-981) Support for generated Cloudera Hadoop configuration

2015-08-06 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger closed FLINK-981.

Resolution: Invalid

> Support for generated Cloudera Hadoop configuration 
> 
>
> Key: FLINK-981
> URL: https://issues.apache.org/jira/browse/FLINK-981
> Project: Flink
>  Issue Type: Bug
>  Components: Local Runtime, YARN Client
>Reporter: Robert Metzger
>
> Cloudera Hadoop generates configuration files that differ from the vanilla 
> upstream Hadoop configuration files.
> The HDFS and the YARN components both access configuration values from Hadoop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-981) Support for generated Cloudera Hadoop configuration

2015-08-06 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660492#comment-14660492
 ] 

Robert Metzger commented on FLINK-981:
--

I will close this issue for now. No user ever complained about it, and I used 
Flink on a Cloudera system 2-3 months ago.

> Support for generated Cloudera Hadoop configuration 
> 
>
> Key: FLINK-981
> URL: https://issues.apache.org/jira/browse/FLINK-981
> Project: Flink
>  Issue Type: Bug
>  Components: Local Runtime, YARN Client
>Reporter: Robert Metzger
>
> Cloudera Hadoop generates configuration files that differ from the vanilla 
> upstream Hadoop configuration files.
> The HDFS and the YARN components both access configuration values from Hadoop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-1013) ArithmeticException: / by zero in MutableHashTable

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-1013.
---

> ArithmeticException: / by zero in MutableHashTable
> --
>
> Key: FLINK-1013
> URL: https://issues.apache.org/jira/browse/FLINK-1013
> Project: Flink
>  Issue Type: Bug
>Reporter: Till Rohrmann
>Assignee: Stephan Ewen
> Fix For: 0.9
>
>
> I encountered a division by zero exception in the MutableHashTable. It 
> happened when I joined two datasets of relatively big records (approx. 40-50 
> MB each, I think). When joining them, the buildTableFromSpilledPartition 
> method of the MutableHashTable is called. If the available buffers are fewer 
> than the needed number of buffers, the mutable hash table will calculate the 
> bucket count
> {code}
> bucketCount = (int) (((long) totalBuffersAvailable) * RECORD_TABLE_BYTES / 
> (avgRecordLenPartition + RECORD_OVERHEAD_BYTES));
> {code}
> If the average record length is sufficiently large, then the bucket count 
> will be 0. Initializing the hash table with a bucket count of 0 will then 
> cause the division by 0 exception. I don't know whether this problem can be 
> mitigated, but it should at least throw a meaningful exception instead of 
> the ArithmeticException.
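
A concrete instance of the failure mode, with assumed constants for 
illustration, plus the kind of guard that would produce a meaningful error 
instead:

```java
// Assumed values, for illustration only.
final long RECORD_TABLE_BYTES = 8 * 1024;       // record-table bytes per buffer
final long RECORD_OVERHEAD_BYTES = 8;

long totalBuffersAvailable = 2;
long avgRecordLenPartition = 45L * 1024 * 1024; // ~45 MB records, as in the report

int bucketCount = (int) (totalBuffersAvailable * RECORD_TABLE_BYTES
		/ (avgRecordLenPartition + RECORD_OVERHEAD_BYTES));
// 2 * 8192 / 47185928 == 0, so any later "hash % bucketCount" divides by zero.

if (bucketCount <= 0) {
	throw new RuntimeException("Hash join: too little memory for average record "
			+ "length of " + avgRecordLenPartition + " bytes.");
}
```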



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-1013) ArithmeticException: / by zero in MutableHashTable

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1013.
-
   Resolution: Fixed
 Assignee: Stephan Ewen
Fix Version/s: 0.9

Fixed a while back already, by delegating the computation to an existing 
utility function that is also used to build the initial table.

> ArithmeticException: / by zero in MutableHashTable
> --
>
> Key: FLINK-1013
> URL: https://issues.apache.org/jira/browse/FLINK-1013
> Project: Flink
>  Issue Type: Bug
>Reporter: Till Rohrmann
>Assignee: Stephan Ewen
> Fix For: 0.9
>
>
> I encountered a division by zero exception in the MutableHashTable. It 
> happened when I joined two datasets of relatively big records (approx. 40-50 
> MB each, I think). When joining them, the buildTableFromSpilledPartition 
> method of the MutableHashTable is called. If the available buffers are fewer 
> than the needed number of buffers, the mutable hash table will calculate the 
> bucket count
> {code}
> bucketCount = (int) (((long) totalBuffersAvailable) * RECORD_TABLE_BYTES / 
> (avgRecordLenPartition + RECORD_OVERHEAD_BYTES));
> {code}
> If the average record length is sufficiently large, then the bucket count 
> will be 0. Initializing the hash table with a bucket count of 0 will then 
> cause the division by 0 exception. I don't know whether this problem can be 
> mitigated, but it should at least throw a meaningful exception instead of 
> the ArithmeticException.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-967) Make intermediate results a first-class citizen in the JobGraph

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-967.
--

> Make intermediate results a first-class citizen in the JobGraph
> ---
>
> Key: FLINK-967
> URL: https://issues.apache.org/jira/browse/FLINK-967
> Project: Flink
>  Issue Type: New Feature
>  Components: JobManager, TaskManager
>Affects Versions: 0.6-incubating
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
> Fix For: 0.9
>
>
> In order to add incremental plan rollout to the system, we need to make 
> intermediate results a first-class citizen in the job graph and scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-981) Support for generated Cloudera Hadoop configuration

2015-08-06 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660484#comment-14660484
 ] 

Stephan Ewen commented on FLINK-981:


Is this still valid, or is this fixed already by properly loading the Hadoop 
configuration?

> Support for generated Cloudera Hadoop configuration 
> 
>
> Key: FLINK-981
> URL: https://issues.apache.org/jira/browse/FLINK-981
> Project: Flink
>  Issue Type: Bug
>  Components: Local Runtime, YARN Client
>Reporter: Robert Metzger
>
> Cloudera Hadoop generates configuration files that differ from the vanilla 
> upstream Hadoop configuration files.
> The HDFS and the YARN components both access configuration values from Hadoop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-967) Make intermediate results a first-class citizen in the JobGraph

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-967.

Resolution: Fixed

Integrated in the JobGraph. Integration into API and sessions is pending.

> Make intermediate results a first-class citizen in the JobGraph
> ---
>
> Key: FLINK-967
> URL: https://issues.apache.org/jira/browse/FLINK-967
> Project: Flink
>  Issue Type: New Feature
>  Components: JobManager, TaskManager
>Affects Versions: 0.6-incubating
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
> Fix For: 0.9
>
>
> In order to add incremental plan rollout to the system, we need to make 
> intermediate results a first-class citizen in the job graph and scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-938) Change start-cluster.sh script so that users don't have to configure the JobManager address

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-938.
--

> Change start-cluster.sh script so that users don't have to configure the 
> JobManager address
> ---
>
> Key: FLINK-938
> URL: https://issues.apache.org/jira/browse/FLINK-938
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System
>Reporter: Robert Metzger
>Assignee: Mingliang Qi
>Priority: Minor
> Fix For: 0.9
>
>
> To improve the user experience, Flink should not require users to configure 
> the JobManager's address on a cluster.
> In combination with FLINK-934, this would allow running Flink with decent 
> performance on a cluster without setting a single configuration value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-938) Change start-cluster.sh script so that users don't have to configure the JobManager address

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-938.

Resolution: Won't Fix

After a long discussion, the decision was not to add this. With the addition of 
high-availability configurations, it would create too many, and too confusing, 
ways to configure or start the cluster.

> Change start-cluster.sh script so that users don't have to configure the 
> JobManager address
> ---
>
> Key: FLINK-938
> URL: https://issues.apache.org/jira/browse/FLINK-938
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System
>Reporter: Robert Metzger
>Assignee: Mingliang Qi
>Priority: Minor
> Fix For: 0.9
>
>
> To improve the user experience, Flink should not require users to configure 
> the JobManager's address on a cluster.
> In combination with FLINK-934, this would allow running Flink with decent 
> performance on a cluster without setting a single configuration value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-915) Introduce two in one progress bars

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-915.
--

> Introduce two in one progress bars
> --
>
> Key: FLINK-915
> URL: https://issues.apache.org/jira/browse/FLINK-915
> Project: Flink
>  Issue Type: Improvement
>Reporter: GitHub Import
>Priority: Trivial
>  Labels: github-import
> Attachments: pull-request-915-1281267458081589740.patch
>
>
> The two-in-one progress bars are approximations which are calculated from
> the job event information.
> Additionally: FINISHING tasks are still shown as running tasks.
>  Imported from GitHub 
> Url: https://github.com/stratosphere/stratosphere/pull/915
> Created by: [tobwiens|https://github.com/tobwiens]
> Labels: enhancement, 
> Created at: Fri Jun 06 17:02:53 CEST 2014
> State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-915) Introduce two in one progress bars

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-915.

   Resolution: Invalid
Fix Version/s: (was: pre-apache)

Outdated and subsumed by new Task State Model and new web frontend.

> Introduce two in one progress bars
> --
>
> Key: FLINK-915
> URL: https://issues.apache.org/jira/browse/FLINK-915
> Project: Flink
>  Issue Type: Improvement
>Reporter: GitHub Import
>Priority: Trivial
>  Labels: github-import
> Attachments: pull-request-915-1281267458081589740.patch
>
>
> The two-in-one progress bars are approximations which are calculated from
> the job event information.
> Additionally: FINISHING tasks are still shown as running tasks.
>  Imported from GitHub 
> Url: https://github.com/stratosphere/stratosphere/pull/915
> Created by: [tobwiens|https://github.com/tobwiens]
> Labels: enhancement, 
> Created at: Fri Jun 06 17:02:53 CEST 2014
> State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-871) Create a documentation distribution together with other release artifacts

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-871.
--

> Create a documentation distribution together with other release artifacts
> -
>
> Key: FLINK-871
> URL: https://issues.apache.org/jira/browse/FLINK-871
> Project: Flink
>  Issue Type: Improvement
>Reporter: GitHub Import
>Priority: Minor
>  Labels: github-import
>
> It would be good to have a documentation distribution together with the 
> other release artifacts. We can use Markdown files (.md) for documentation 
> and use the Maven md plugin for management. The same documentation can be 
> used for the web site etc. 
>  Imported from GitHub 
> Url: https://github.com/stratosphere/stratosphere/issues/871
> Created by: [danrk|https://github.com/danrk]
> Labels: documentation, enhancement, 
> Created at: Tue May 27 02:44:53 CEST 2014
> State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-871) Create a documentation distribution together with other release artifacts

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-871.

   Resolution: Fixed
Fix Version/s: (was: pre-apache)

Documentation is part of the source code, and the Apache CI infrastructure 
builds and updates snapshot documentation on a nightly basis.

> Create a documentation distribution together with other release artifacts
> -
>
> Key: FLINK-871
> URL: https://issues.apache.org/jira/browse/FLINK-871
> Project: Flink
>  Issue Type: Improvement
>Reporter: GitHub Import
>Priority: Minor
>  Labels: github-import
>
> It would be good to have a documentation distribution together with the 
> other release artifacts. We can use Markdown files (.md) for documentation 
> and use the Maven md plugin for management. The same documentation can be 
> used for the web site etc. 
>  Imported from GitHub 
> Url: https://github.com/stratosphere/stratosphere/issues/871
> Created by: [danrk|https://github.com/danrk]
> Labels: documentation, enhancement, 
> Created at: Tue May 27 02:44:53 CEST 2014
> State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-854) Web interface: show runtime of subtasks

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-854.
--

> Web interface: show runtime of subtasks
> ---
>
> Key: FLINK-854
> URL: https://issues.apache.org/jira/browse/FLINK-854
> Project: Flink
>  Issue Type: Improvement
>  Components: Webfrontend
>Reporter: Ufuk Celebi
>Priority: Minor
>  Labels: github-import
>
> When I click on the detailed view of a task, I see all subtasks as in the 
> screenshot below. I would also like to see the runtime per stage, e.g. I 
> want to know how long the yellow subtask was in the running state.
> ![screen shot 2014-05-24 at 18 09 
> 36|https://cloud.githubusercontent.com/assets/1756620/3074999/d0ebe1d6-e35d-11e3-9f18-26ac5ce96999.png]
>  Imported from GitHub 
> Url: https://github.com/stratosphere/stratosphere/issues/854
> Created by: [uce|https://github.com/uce]
> Labels: enhancement, gui, user satisfaction, 
> Milestone: Release 0.6 (unplanned)
> Created at: Sat May 24 18:10:25 CEST 2014
> State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-854) Web interface: show runtime of subtasks

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-854.

   Resolution: Won't Fix
Fix Version/s: (was: pre-apache)

Subsumed by effort for the new web interface [FLINK-2357]

> Web interface: show runtime of subtasks
> ---
>
> Key: FLINK-854
> URL: https://issues.apache.org/jira/browse/FLINK-854
> Project: Flink
>  Issue Type: Improvement
>  Components: Webfrontend
>Reporter: Ufuk Celebi
>Priority: Minor
>  Labels: github-import
>
> When I click on the detailed view of a task, I see all subtasks as in the 
> screenshot below. I would also like to see the runtime per stage, e.g. I 
> want to know how long the yellow subtask was in the running state.
> ![screen shot 2014-05-24 at 18 09 
> 36|https://cloud.githubusercontent.com/assets/1756620/3074999/d0ebe1d6-e35d-11e3-9f18-26ac5ce96999.png]
>  Imported from GitHub 
> Url: https://github.com/stratosphere/stratosphere/issues/854
> Created by: [uce|https://github.com/uce]
> Labels: enhancement, gui, user satisfaction, 
> Milestone: Release 0.6 (unplanned)
> Created at: Sat May 24 18:10:25 CEST 2014
> State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2452] [Gelly] adds a playcount threshol...

2015-08-06 Thread andralungu
Github user andralungu commented on the pull request:

https://github.com/apache/flink/pull/968#issuecomment-128456801
  
Nope, no comment. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (FLINK-848) Move combine() from GroupReduceFunction to Interface

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-848.

   Resolution: Fixed
Fix Version/s: (was: pre-apache)
   0.7.1-incubating

Has been moved to interfaces {{CombineFunction}} and {{GroupCombineFunction}}

> Move combine() from GroupReduceFunction to Interface
> 
>
> Key: FLINK-848
> URL: https://issues.apache.org/jira/browse/FLINK-848
> Project: Flink
>  Issue Type: Sub-task
>  Components: Java API
>Reporter: Fabian Hueske
>Assignee: Kostas Tzoumas
>  Labels: breaking-api, github-import
> Fix For: 0.7.1-incubating
>
>
> Currently, the combine method of the GroupReduceFunction allows returning 
> multiple values using a collector. However, most combiners do not need this 
> because they return only a single value. Furthermore, a single-value-returning 
> combiner can be executed using more efficient hash-based strategies.
> Hence, we propose to introduce a combine method for GroupReduce which returns 
> only a single value. In order to keep support for the rare cases where more 
> than one value needs to be returned, we want to keep the collector-based 
> combiner as well.
> To do so, we could remove the combine method from the abstract 
> GroupReduceFunction class and add two Combinable interfaces, one for a 
> single-value and one for a multi-value combiner.
> This would also make the Combinable annotation obsolete, as the optimizer can 
> check whether a GroupReduceFunction implements one of the Combinable 
> interfaces or not.
>  Imported from GitHub 
> Url: https://github.com/stratosphere/stratosphere/issues/848
> Created by: [fhueske|https://github.com/fhueske]
> Labels: core, enhancement, 
> Milestone: Release 0.6 (unplanned)
> Created at: Thu May 22 10:23:04 CEST 2014
> State: open
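
The shape of the two-interface design described above, roughly as it ended up 
in the codebase (signatures sketched from the description, so treat the details 
as approximate; Collector is a minimal stand-in for org.apache.flink.util.Collector):

```java
interface Collector<T> {
	void collect(T record);
}

// Single-value combiner: one result per group, enabling hash-based strategies.
interface CombineFunction<IN, OUT> {
	OUT combine(Iterable<IN> values) throws Exception;
}

// Collector-based combiner: kept for the rare multi-result cases.
interface GroupCombineFunction<IN, OUT> {
	void combine(Iterable<IN> values, Collector<OUT> out) throws Exception;
}

// The optimizer can now check "udf instanceof CombineFunction" instead of
// relying on a @Combinable annotation.
```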



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-848) Move combine() from GroupReduceFunction to Interface

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-848.
--

> Move combine() from GroupReduceFunction to Interface
> 
>
> Key: FLINK-848
> URL: https://issues.apache.org/jira/browse/FLINK-848
> Project: Flink
>  Issue Type: Sub-task
>  Components: Java API
>Reporter: Fabian Hueske
>Assignee: Kostas Tzoumas
>  Labels: breaking-api, github-import
> Fix For: 0.7.1-incubating
>
>
> Currently, the combine method of the GroupReduceFunction allows returning
> multiple values through a collector. However, most combiners do not need this
> because they return only a single value. Furthermore, a combiner that returns
> a single value can be executed using more efficient hash-based strategies.
> Hence, we propose to introduce a combine method for GroupReduce which returns 
> only a single value. In order to keep support for the rare cases where more 
> than one value needs to be returned, we want to keep the collector-combiner 
> as well.
> To do so, we could remove the combine method from the abstract 
> GroupReduceFunction class and add two Combinable interfaces, one for a 
> single-value and one for a multi-value combiner.
> This would also make the Combinable annotation obsolete as the optimizer can 
> check whether a GroupReduceFunction implements one of the Combinable 
> interfaces or not.
>  Imported from GitHub 
> Url: https://github.com/stratosphere/stratosphere/issues/848
> Created by: [fhueske|https://github.com/fhueske]
> Labels: core, enhancement, 
> Milestone: Release 0.6 (unplanned)
> Created at: Thu May 22 10:23:04 CEST 2014
> State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-701) Change new Java API functions to SAMs

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-701.
--

> Change new Java API functions to SAMs
> -
>
> Key: FLINK-701
> URL: https://issues.apache.org/jira/browse/FLINK-701
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API
>Reporter: GitHub Import
>Assignee: Kostas Tzoumas
>  Labels: github-import
> Fix For: 0.6-incubating
>
>
> In order to support a compact syntax with Java 8 Lambdas, we would need to 
> change the types of the functions to Single Abstract Method types (SAMs). 
> Only those can be implemented by Lambdas.
> That means that DataSet.map(MapFunction) would accept an interface
> MapFunction, not the abstract class that we use now. Many UDFs would no
> longer inherit from `AbstractFunction`. Inheritance from AbstractFunction
> would be optional, needed only if life cycle methods (open / close) and
> runtime contexts are required.
> This may have also implications on the type extraction, as the generic 
> parameters are in generic superinterfaces, rather than in generic 
> superclasses.
>  Imported from GitHub 
> Url: https://github.com/stratosphere/stratosphere/issues/701
> Created by: [StephanEwen|https://github.com/StephanEwen]
> Labels: enhancement, java api, user satisfaction, 
> Milestone: Release 0.6 (unplanned)
> Created at: Thu Apr 17 13:06:40 CEST 2014
> State: open
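
For reference, this is the kind of syntax the change enables with Java 8
lambdas (a minimal sketch; names are illustrative, and depending on the
compiler a type hint may be needed for lambdas):

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class LambdaMapExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<String> words = env.fromElements("flink", "lambda", "sam");

        // MapFunction as a SAM interface: a lambda can be passed directly.
        // returns() supplies the result type, since type erasure can prevent
        // extraction of generic parameters from lambdas.
        DataSet<String> upper = words
                .map(s -> s.toUpperCase())
                .returns(String.class);

        upper.print();
    }
}
```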



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2452) Add a playcount threshold to the MusicProfiles example

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660454#comment-14660454
 ] 

ASF GitHub Bot commented on FLINK-2452:
---

Github user andralungu commented on the pull request:

https://github.com/apache/flink/pull/968#issuecomment-128456801
  
Nope, no comment. 


> Add a playcount threshold to the MusicProfiles example
> --
>
> Key: FLINK-2452
> URL: https://issues.apache.org/jira/browse/FLINK-2452
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Affects Versions: 0.10
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
>Priority: Minor
>
> In the MusicProfiles example, when creating the user-user similarity graph, 
> an edge is created between any 2 users that have listened to the same song 
> (even if once). Depending on the input data, this might produce a projection 
> graph with many more edges than the original user-song graph.
> To make this computation more efficient, this issue proposes adding a 
> user-defined parameter that filters out songs that a user has listened to 
> only a few times. Essentially, it is a threshold for playcount, above which a 
> user is considered to like a song.
> For reference, with a threshold value of 30, the whole Last.fm dataset is 
> analyzed on my laptop in a few minutes, while no threshold results in a 
> runtime of several hours.
> There are many solutions to this problem, but since this is just an example 
> (not a library method), I think that keeping it simple is important.
> Thanks to [~andralungu] for spotting the inefficiency!
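
For illustration, the proposed threshold could look roughly like this in Gelly
(a sketch only; the String vertex ids and Integer playcount edge values are
assumptions, not the actual MusicProfiles code):

```java
import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.graph.Edge;
import org.apache.flink.graph.Graph;
import org.apache.flink.types.NullValue;

public class PlaycountFilterSketch {

    // Keep only user-song edges whose playcount exceeds the threshold, so the
    // user-user projection is built from "liked" songs only.
    public static Graph<String, NullValue, Integer> filterByPlaycount(
            Graph<String, NullValue, Integer> userSongGraph, final int threshold) {

        return userSongGraph.filterOnEdges(new FilterFunction<Edge<String, Integer>>() {
            @Override
            public boolean filter(Edge<String, Integer> edge) {
                return edge.getValue() > threshold;
            }
        });
    }
}
```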



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-841) Updating a non-existing key in solution set yields a NPE

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-841.

   Resolution: Fixed
 Assignee: Stephan Ewen
Fix Version/s: (was: pre-apache)
   0.8.2

Solution sets can now be extended with new keys.

> Updating a non-existing key in solution set yields a NPE
> 
>
> Key: FLINK-841
> URL: https://issues.apache.org/jira/browse/FLINK-841
> Project: Flink
>  Issue Type: Improvement
>Reporter: GitHub Import
>Assignee: Stephan Ewen
>  Labels: github-import
> Fix For: 0.8.2
>
>
> When running the Scala Connected Components example with some invalid input 
> data (an input edge contained a vertex that was not present in the vertex 
> input) I got a NPE.
> This NPE should be replaced by some meaningful error message.
>  Imported from GitHub 
> Url: https://github.com/stratosphere/stratosphere/issues/841
> Created by: [fhueske|https://github.com/fhueske]
> Labels: enhancement, runtime, user satisfaction, 
> Milestone: Release 0.5.1
> Created at: Tue May 20 22:37:13 CEST 2014
> State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-841) Updating a non-existing key in solution set yields a NPE

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-841.
--

> Updating a non-existing key in solution set yields a NPE
> 
>
> Key: FLINK-841
> URL: https://issues.apache.org/jira/browse/FLINK-841
> Project: Flink
>  Issue Type: Improvement
>Reporter: GitHub Import
>Assignee: Stephan Ewen
>  Labels: github-import
> Fix For: 0.8.2
>
>
> When running the Scala Connected Components example with some invalid input 
> data (an input edge contained a vertex that was not present in the vertex 
> input) I got a NPE.
> This NPE should be replaced by some meaningful error message.
>  Imported from GitHub 
> Url: https://github.com/stratosphere/stratosphere/issues/841
> Created by: [fhueske|https://github.com/fhueske]
> Labels: enhancement, runtime, user satisfaction, 
> Milestone: Release 0.5.1
> Created at: Tue May 20 22:37:13 CEST 2014
> State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2386) Implement Kafka connector using the new Kafka Consumer API

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660438#comment-14660438
 ] 

ASF GitHub Bot commented on FLINK-2386:
---

GitHub user rmetzger opened a pull request:

https://github.com/apache/flink/pull/996

[WIP][FLINK-2386] Add new Kafka Consumers

I'm opening a WIP pull request (against our rules) to get some feedback on 
my ongoing work.
Please note that I'm on vacation next week (until August 17).

**Why this rework?**

The current `PersistentKafkaSource` does not always provide exactly-once 
processing guarantees because we are using the high level Consumer API of Kafka.
We've chosen to use that API because it is handling all the corner cases 
such as leader election, leader failover and other low level stuff.
The problem is that the API does not allow us to
- commit offsets manually
- consistently (across restarts) assign partitions to Flink instances  

The Kafka community is aware of these issues and actively working on a new 
Consumer API. See 
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Consumer+Rewrite+Design
 and https://issues.apache.org/jira/browse/KAFKA-1326
The release of Kafka 0.8.3 is scheduled for October 2015 (see plan: 
https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan)

Therefore, I decided for the following approach:
Copy the code of the unreleased, new Kafka Consumer into the Flink consumer 
and use it.
The new API has all the bells and whistles we need (manual committing, 
per-partition subscriptions, nice APIs), but it is not completely backwards 
compatible.

- We can retrieve topic metadata with the new API from Kafka 0.8.1, 0.8.2 (and of course 0.8.3).
- We can retrieve data from Kafka 0.8.2 (and 0.8.3).
- We can only commit to Kafka 0.8.3.

Therefore, this pull request contains three different user facing classes 
`FlinkKafkaConsumer081`,  `FlinkKafkaConsumer082` and `FlinkKafkaConsumer083` 
for the different possible combinations.
For 0.8.1 we are using a hand-crafted implementation against the simple 
consumer API 
(https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example)
 so we had to do what we originally wanted to avoid.
I tried to make that implementation as robust and efficient as possible. 
I'm intentionally not handling any broker failures in the code. For these 
cases, I'm relying on Flink's fault tolerance mechanisms (which effectively 
means redeploying the Kafka sources against other online brokers)

For reviewing the pull request, there are only a few important classes to 
look at:
- FlinkKafkaConsumerBase
- IncludedFetcher
- LegacyFetcher (the one implementing the SimpleConsumer API)
I fixed a little bug in the stream graph generator. It was ignoring the 
"number of execution retries" when no checkpointing is enabled. 


Known issues:
- this pull request contains at least one failing test
- the KafkaConsumer contains at least one known, yet untested bug
- missing documentation

I will also open a pull request for using the new Producer API. It provides 
much better performance and usability.

Open questions:
- Do we really want to copy 20k+ lines of code into our code base (for now)?
If there are concerns about this, I could also manually implement the
missing pieces. It's probably 100 lines of code for getting the partition infos
for a topic, and we would use the Simple Consumer also for reading from 0.8.2.

- Do we want to use the packaging I'm suggesting here (an additional Maven
module for `flink-connector-kafka-083`)? We would need to introduce it anyway
when Kafka releases 0.8.3 because the dependencies are not compatible,
but it adds confusion for our users.
I will write more documentation for guidance.
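
To make the intended usage concrete, a hedged sketch of how one of these
consumers would be wired into a job (the constructor shape and the set of
required properties are assumptions based on this description, and
FlinkKafkaConsumer082 is the class proposed in this pull request):

```java
import java.util.Properties;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KafkaConsumerSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(5000); // exactly-once hinges on checkpointed offsets

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("zookeeper.connect", "localhost:2181");
        props.setProperty("group.id", "my-group");

        // Assumed (topic, deserialization schema, properties) constructor.
        DataStream<String> stream = env.addSource(
                new FlinkKafkaConsumer082<>("my-topic", new SimpleStringSchema(), props));

        stream.print();
        env.execute("Kafka consumer sketch");
    }
}
```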


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rmetzger/flink flink2386

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/996.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #996


commit 177e0bbc6bc613b67111ba038e0ded4fae8474f1
Author: Robert Metzger 
Date:   2015-07-20T19:39:46Z

wip

commit 70cb8f1ecb7df7a98313796609d2fa0dbade86bf
Author: Robert Metzger 
Date:   2015-07-21T15:21:45Z

[FLINK-2386] Add initial code for the new kafka connector, with everything 
unreleased copied from the kafka sources

commit a4a2847908a8c2f118b8667d7cb66693c065c38d
Author: Robert Metzger 
Date:   2015-07-21T17:58:13Z

wip

commit b02cde37c2120ff6f0fcf1c233391a1d8804e594
Author: Robert Metzger 
Date:   2015-07-22T15:29:58Z

wip

commit 54a05c39d150b016e0a089daedb3492d986b93bd

[GitHub] flink pull request: [WIP][FLINK-2386] Add new Kafka Consumers

2015-08-06 Thread rmetzger
GitHub user rmetzger opened a pull request:

https://github.com/apache/flink/pull/996

[WIP][FLINK-2386] Add new Kafka Consumers

I'm opening a WIP pull request (against our rules) to get some feedback on 
my ongoing work.
Please note that I'm on vacation next week (until August 17).

**Why this rework?**

The current `PersistentKafkaSource` does not always provide exactly-once 
processing guarantees because we are using the high level Consumer API of Kafka.
We've chosen to use that API because it is handling all the corner cases 
such as leader election, leader failover and other low level stuff.
The problem is that the API does not allow us to
- commit offsets manually
- consistently (across restarts) assign partitions to Flink instances  

The Kafka community is aware of these issues and actively working on a new 
Consumer API. See 
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Consumer+Rewrite+Design
 and https://issues.apache.org/jira/browse/KAFKA-1326
The release of Kafka 0.8.3 is scheduled for October 2015 (see plan: 
https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan)

Therefore, I decided for the following approach:
Copy the code of the unreleased, new Kafka Consumer into the Flink consumer 
and use it.
The new API has all the bells and whistles we need (manual committing, 
per-partition subscriptions, nice APIs), but it is not completely backwards 
compatible.

- We can retrieve topic metadata with the new API from Kafka 0.8.1, 0.8.2 (and of course 0.8.3).
- We can retrieve data from Kafka 0.8.2 (and 0.8.3).
- We can only commit to Kafka 0.8.3.

Therefore, this pull request contains three different user facing classes 
`FlinkKafkaConsumer081`,  `FlinkKafkaConsumer082` and `FlinkKafkaConsumer083` 
for the different possible combinations.
For 0.8.1 we are using a hand-crafted implementation against the simple 
consumer API 
(https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example)
 so we had to do what we originally wanted to avoid.
I tried to make that implementation as robust and efficient as possible. 
I'm intentionally not handling any broker failures in the code. For these 
cases, I'm relying on Flink's fault tolerance mechanisms (which effectively 
means redeploying the Kafka sources against other online brokers)

For reviewing the pull request, there are only a few important classes to 
look at:
- FlinkKafkaConsumerBase
- IncludedFetcher
- LegacyFetcher (the one implementing the SimpleConsumer API)
I fixed a little bug in the stream graph generator. It was ignoring the 
"number of execution retries" when no checkpointing is enabled. 


Known issues:
- this pull request contains at least one failing test
- the KafkaConsumer contains at least one known, yet untested bug
- missing documentation

I will also open a pull request for using the new Producer API. It provides 
much better performance and usability.

Open questions:
- Do we really want to copy 20k+ lines of code into our code base (for now)?
If there are concerns about this, I could also manually implement the
missing pieces. It's probably 100 lines of code for getting the partition infos
for a topic, and we would use the Simple Consumer also for reading from 0.8.2.

- Do we want to use the packaging I'm suggesting here (an additional Maven
module for `flink-connector-kafka-083`)? We would need to introduce it anyway
when Kafka releases 0.8.3 because the dependencies are not compatible,
but it adds confusion for our users.
I will write more documentation for guidance.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rmetzger/flink flink2386

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/996.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #996


commit 177e0bbc6bc613b67111ba038e0ded4fae8474f1
Author: Robert Metzger 
Date:   2015-07-20T19:39:46Z

wip

commit 70cb8f1ecb7df7a98313796609d2fa0dbade86bf
Author: Robert Metzger 
Date:   2015-07-21T15:21:45Z

[FLINK-2386] Add initial code for the new kafka connector, with everything 
unreleased copied from the kafka sources

commit a4a2847908a8c2f118b8667d7cb66693c065c38d
Author: Robert Metzger 
Date:   2015-07-21T17:58:13Z

wip

commit b02cde37c2120ff6f0fcf1c233391a1d8804e594
Author: Robert Metzger 
Date:   2015-07-22T15:29:58Z

wip

commit 54a05c39d150b016e0a089daedb3492d986b93bd
Author: Robert Metzger 
Date:   2015-07-22T19:56:41Z

wip

commit 393fd6766a5df4bf14ef0c13864b8a4abdb62bb4
Author: Robert Metzger 
Date:   2015-07-22T20:20:20Z

we are good for a test drive

commit 3d66332e61665df9bafa05d2644b

[jira] [Commented] (FLINK-655) Add support for both single and set of broadcast values

2015-08-06 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660395#comment-14660395
 ] 

Stephan Ewen commented on FLINK-655:


Will this still be done, or should we close it as "won't fix"?

> Add support for both single and set of broadcast values
> ---
>
> Key: FLINK-655
> URL: https://issues.apache.org/jira/browse/FLINK-655
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API
>Reporter: Ufuk Celebi
>Assignee: Henry Saputra
>  Labels: breaking-api, github-import, starter
> Fix For: pre-apache
>
>
> To broadcast a data set you have to do the following:
> ```java
> lorem.map(new MyMapper()).withBroadcastSet(toBroadcast, "toBroadcastName")
> ```
> In the operator you call:
> ```java
> getRuntimeContext().getBroadcastVariable("toBroadcastName")
> ```
> I propose to have both method names consistent, e.g.
>   - `withBroadcastVariable(DataSet, String)`, or
>   - `getBroadcastSet(String)`.
>  Imported from GitHub 
> Url: https://github.com/stratosphere/stratosphere/issues/655
> Created by: [uce|https://github.com/uce]
> Labels: enhancement, java api, user satisfaction, 
> Created at: Wed Apr 02 16:29:08 CEST 2014
> State: open
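
For context, a minimal runnable sketch of the current API pairing criticized
above (note the naming mismatch between `withBroadcastSet` and
`getBroadcastVariable`):

```java
import java.util.List;

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;

public class BroadcastNamingExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<String> lorem = env.fromElements("lorem", "ipsum");
        DataSet<Integer> toBroadcast = env.fromElements(1, 2, 3);

        lorem.map(new RichMapFunction<String, String>() {
            private List<Integer> bc;

            @Override
            public void open(Configuration parameters) {
                // Must match the name used in withBroadcastSet() below.
                bc = getRuntimeContext().getBroadcastVariable("toBroadcastName");
            }

            @Override
            public String map(String value) {
                return value + " (" + bc.size() + " broadcast values)";
            }
        }).withBroadcastSet(toBroadcast, "toBroadcastName")
          .print();
    }
}
```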



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-553) Add getGroupKey() method to group-at-time operators

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen updated FLINK-553:
---
Fix Version/s: (was: pre-apache)

> Add getGroupKey() method to group-at-time operators
> ---
>
> Key: FLINK-553
> URL: https://issues.apache.org/jira/browse/FLINK-553
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API, Scala API
>Reporter: GitHub Import
>  Labels: github-import
>
> Group-at-a-time operators (Reduce & CoGroup) work on multiple records in one 
> UDF call. Often these UDFs need to access the key that is common to all 
> records of a group.
> We could add a function to set the key of a group before the UDF is called 
> (``setGroupKey()``) and a function to get the key (``getGroupKey()``) that 
> can be called from the UDF.
> What do you think about this?
>  Imported from GitHub 
> Url: https://github.com/stratosphere/stratosphere/issues/553
> Created by: [fhueske|https://github.com/fhueske]
> Labels: enhancement, java api, scala api, user satisfaction, 
> Assignee: [aalexandrov|https://github.com/aalexandrov]
> Created at: Mon Mar 10 22:28:27 CET 2014
> State: open
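
For context, a minimal sketch of what group-at-a-time UDFs do today without
`getGroupKey()`: recover the key from the records of the group themselves.

```java
import org.apache.flink.api.common.functions.GroupReduceFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

// Every record in the group carries the same key, so the UDF reads it off
// the records; the proposed getGroupKey() would make this access explicit.
public class CountPerKey
        implements GroupReduceFunction<Tuple2<String, Integer>, Tuple2<String, Integer>> {

    @Override
    public void reduce(Iterable<Tuple2<String, Integer>> values,
                       Collector<Tuple2<String, Integer>> out) {
        String key = null;
        int count = 0;
        for (Tuple2<String, Integer> v : values) {
            key = v.f0; // identical for all records of the group
            count++;
        }
        out.collect(new Tuple2<>(key, count));
    }
}
```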



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-553) Add getGroupKey() method to group-at-time operators

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen updated FLINK-553:
---
Assignee: Fabian Hueske

> Add getGroupKey() method to group-at-time operators
> ---
>
> Key: FLINK-553
> URL: https://issues.apache.org/jira/browse/FLINK-553
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API, Scala API
>Reporter: GitHub Import
>Assignee: Fabian Hueske
>  Labels: github-import
>
> Group-at-a-time operators (Reduce & CoGroup) work on multiple records in one 
> UDF call. Often these UDFs need to access the key that is common to all 
> records of a group.
> We could add a function to set the key of a group before the UDF is called 
> (``setGroupKey()``) and a function to get the key (``getGroupKey()``) that 
> can be called from the UDF.
> What do you think about this?
>  Imported from GitHub 
> Url: https://github.com/stratosphere/stratosphere/issues/553
> Created by: [fhueske|https://github.com/fhueske]
> Labels: enhancement, java api, scala api, user satisfaction, 
> Assignee: [aalexandrov|https://github.com/aalexandrov]
> Created at: Mon Mar 10 22:28:27 CET 2014
> State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-553) Add getGroupKey() method to group-at-time operators

2015-08-06 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660388#comment-14660388
 ] 

Stephan Ewen commented on FLINK-553:


There is an almost ready solution by [~fhueske]

> Add getGroupKey() method to group-at-time operators
> ---
>
> Key: FLINK-553
> URL: https://issues.apache.org/jira/browse/FLINK-553
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API, Scala API
>Reporter: GitHub Import
>Assignee: Fabian Hueske
>  Labels: github-import
>
> Group-at-a-time operators (Reduce & CoGroup) work on multiple records in one 
> UDF call. Often these UDFs need to access the key that is common to all 
> records of a group.
> We could add a function to set the key of a group before the UDF is called 
> (``setGroupKey()``) and a function to get the key (``getGroupKey()``) that 
> can be called from the UDF.
> What do you think about this?
>  Imported from GitHub 
> Url: https://github.com/stratosphere/stratosphere/issues/553
> Created by: [fhueske|https://github.com/fhueske]
> Labels: enhancement, java api, scala api, user satisfaction, 
> Assignee: [aalexandrov|https://github.com/aalexandrov]
> Created at: Mon Mar 10 22:28:27 CET 2014
> State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-509) Move virtual machines building process / hosting to Amazon S3 / EC2

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-509.

   Resolution: Invalid
Fix Version/s: (was: pre-apache)

No virtual machine images are built by the Flink Apache infrastructure. Docker 
support has been added.

> Move virtual machines building process / hosting to Amazon S3 / EC2
> ---
>
> Key: FLINK-509
> URL: https://issues.apache.org/jira/browse/FLINK-509
> Project: Flink
>  Issue Type: Improvement
>Reporter: GitHub Import
>Priority: Minor
>  Labels: github-import
>
> The virtual machine images are currently hosted on a very unreliable
> server.
> I'd be happy if someone could come up with a more cost-efficient solution 
> than Amazon. But we need a reliable solution.
> * The hosting is unreliable.
> * The automated build process stopped for some reason.
> * There is no VM for the 0.4 release.
> * There is no (?) error reporting.
>  Imported from GitHub 
> Url: https://github.com/stratosphere/stratosphere/issues/509
> Created by: [rmetzger|https://github.com/rmetzger]
> Labels: bug, build system, website, 
> Milestone: Release 0.6 (unplanned)
> Created at: Wed Feb 26 10:49:33 CET 2014
> State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-509) Move virtual machines building process / hosting to Amazon S3 / EC2

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-509.
--

> Move virtual machines building process / hosting to Amazon S3 / EC2
> ---
>
> Key: FLINK-509
> URL: https://issues.apache.org/jira/browse/FLINK-509
> Project: Flink
>  Issue Type: Improvement
>Reporter: GitHub Import
>Priority: Minor
>  Labels: github-import
>
> The virtual machine images are currently hosted on a very unreliable
> server.
> I'd be happy if someone could come up with a more cost-efficient solution 
> than Amazon. But we need a reliable solution.
> * The hosting is unreliable.
> * The automated build process stopped for some reason.
> * There is no VM for the 0.4 release.
> * There is no (?) error reporting.
>  Imported from GitHub 
> Url: https://github.com/stratosphere/stratosphere/issues/509
> Created by: [rmetzger|https://github.com/rmetzger]
> Labels: bug, build system, website, 
> Milestone: Release 0.6 (unplanned)
> Created at: Wed Feb 26 10:49:33 CET 2014
> State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2444) Add tests for HadoopInputFormats

2015-08-06 Thread James Cao (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660371#comment-14660371
 ] 

James Cao commented on FLINK-2444:
--

Hi, what's the expected strategy for a sufficient test?
The Hive and Hadoop communities use the Hadoop minicluster for end-to-end unit
tests. I tried running a Flink word count job against the minicluster inside
the IDE; it takes about 5s (including provisioning the mini cluster and tearing
it down afterwards). Is this an acceptable running time?
I guess that with the minicluster we can get reasonably sufficient tests for
the HadoopInputFormats' wrapped formats for both the mapred- and
mapreduce-style APIs, and it's probably not very easy to set up a mock test
that simulates the Hadoop FS environment. The problem with the minicluster is
that it's only available in Hadoop 2, so it's not available in the hadoop1
profile.

I think the issue I am working on, [FLINK-1919] HCatOutputFormat, has a
similar problem. Do we want to run that test against a mini Hive server?
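
For concreteness, a sketch of the minicluster approach under the assumption
that the Hadoop 2 minicluster test dependency is on the classpath:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class MiniClusterSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Provision a one-datanode HDFS cluster for the test.
        MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).numDataNodes(1).build();
        try {
            FileSystem fs = cluster.getFileSystem();
            Path input = new Path("/test/input.txt");
            try (FSDataOutputStream out = fs.create(input)) {
                out.write("hello world\n".getBytes("UTF-8"));
            }
            // ...run a small Flink job that reads `input` through the wrapped
            // HadoopInputFormat and assert on the result...
        } finally {
            cluster.shutdown();
        }
    }
}
```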

> Add tests for HadoopInputFormats
> 
>
> Key: FLINK-2444
> URL: https://issues.apache.org/jira/browse/FLINK-2444
> Project: Flink
>  Issue Type: Test
>  Components: Hadoop Compatibility, Tests
>Affects Versions: 0.10, 0.9.0
>Reporter: Fabian Hueske
>  Labels: starter
>
> The HadoopInputFormats and HadoopInputFormatBase classes are not sufficiently 
> covered by unit tests.
> We need tests that ensure that the methods of the wrapped Hadoop InputFormats 
> are correctly called. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-266) Warn user if cluster did not come up as expected

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-266.
--

> Warn user if cluster did not come up as expected
> 
>
> Key: FLINK-266
> URL: https://issues.apache.org/jira/browse/FLINK-266
> Project: Flink
>  Issue Type: Improvement
>Reporter: GitHub Import
>  Labels: github-import
>
> While I did some work on a cluster, I was wondering why my job did not
> utilize all TaskManagers.
> It seems that I started my job too early (before all TaskManagers had
> registered with the JobManager) and therefore the compiler did not consider them.
> We should either make the `start-cluster.sh` script blocking (with a
> timeout), or pact-client.sh should report a warning if fewer TaskManagers
> than expected (the number in the `slaves` file) are up.
>  Imported from GitHub 
> Url: https://github.com/stratosphere/stratosphere/issues/266
> Created by: [rmetzger|https://github.com/rmetzger]
> Labels: enhancement, runtime, simple-issue, 
> Created at: Mon Nov 11 15:56:56 CET 2013
> State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-266) Warn user if cluster did not come up as expected

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-266.

   Resolution: Won't Fix
Fix Version/s: (was: pre-apache)

The deployment model is different now: the pre-flight phase and the optimizer
no longer connect to the cluster to gather the availability of resources.
Therefore, this issue is not really an issue any more ;-)

> Warn user if cluster did not come up as expected
> 
>
> Key: FLINK-266
> URL: https://issues.apache.org/jira/browse/FLINK-266
> Project: Flink
>  Issue Type: Improvement
>Reporter: GitHub Import
>  Labels: github-import
>
> While I did some work on a cluster, I was wondering why my job did not
> utilize all TaskManagers.
> It seems that I started my job too early (before all TaskManagers had
> registered with the JobManager) and therefore the compiler did not consider them.
> We should either make the `start-cluster.sh` script blocking (with a
> timeout), or pact-client.sh should report a warning if fewer TaskManagers
> than expected (the number in the `slaves` file) are up.
>  Imported from GitHub 
> Url: https://github.com/stratosphere/stratosphere/issues/266
> Created by: [rmetzger|https://github.com/rmetzger]
> Labels: enhancement, runtime, simple-issue, 
> Created at: Mon Nov 11 15:56:56 CET 2013
> State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-202) Workset Iterations: "No Match Found" Behaviour of Solution Set Join

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-202.
--

> Workset Iterations: "No Match Found" Behaviour of Solution Set Join
> ---
>
> Key: FLINK-202
> URL: https://issues.apache.org/jira/browse/FLINK-202
> Project: Flink
>  Issue Type: Improvement
>Reporter: GitHub Import
>  Labels: github-import
> Fix For: 0.8.0
>
>
> If I do a solution set match and there is no corresponding entry in the
> solution set index, a RuntimeException is thrown and the job fails. Therefore,
> the initial solution set must already contain every element that will be in
> the final solution set.
> I'm not sure if this is a real limitation, but I find it inconvenient. When I
> was playing around with the workset connected components, I couldn't just use
> parts of my test data, because it resulted in a solution set join where
> records couldn't be matched.
> I just wanted to think out loud here. Did anybody else find this
> inconvenient? What alternative should we provide?
>  Imported from GitHub 
> Url: https://github.com/stratosphere/stratosphere/issues/202
> Created by: [uce|https://github.com/uce]
> Labels: core, enhancement, 
> Created at: Wed Oct 23 23:24:29 CEST 2013
> State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-202) Workset Iterations: "No Match Found" Behaviour of Solution Set Join

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-202.

   Resolution: Fixed
Fix Version/s: (was: pre-apache)
   0.8.0

Solution set returns null on lookups of missing entries.
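
A minimal sketch of what a solution-set join UDF can look like under this
behaviour, assuming the solution-set side arrives as null when the key is
missing (types and names are illustrative, in the style of connected
components):

```java
import org.apache.flink.api.common.functions.FlatJoinFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

// Joins workset candidates (vertex id, component id) against the solution set.
public class UpdateIfSmaller
        implements FlatJoinFunction<Tuple2<Long, Long>, Tuple2<Long, Long>, Tuple2<Long, Long>> {

    @Override
    public void join(Tuple2<Long, Long> candidate, Tuple2<Long, Long> current,
                     Collector<Tuple2<Long, Long>> out) {
        // `current` is null when the key is not yet in the solution set:
        // emit the candidate so the new key gets inserted instead of failing.
        if (current == null || candidate.f1 < current.f1) {
            out.collect(candidate);
        }
    }
}
```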

> Workset Iterations: "No Match Found" Behaviour of Solution Set Join
> ---
>
> Key: FLINK-202
> URL: https://issues.apache.org/jira/browse/FLINK-202
> Project: Flink
>  Issue Type: Improvement
>Reporter: GitHub Import
>  Labels: github-import
> Fix For: 0.8.0
>
>
> If I do a solution set match and there is no corresponding entry in the
> solution set index, a RuntimeException is thrown and the job fails. Therefore,
> the initial solution set must already contain every element that will be in
> the final solution set.
> I'm not sure if this is a real limitation, but I find it inconvenient. When I
> was playing around with the workset connected components, I couldn't just use
> parts of my test data, because it resulted in a solution set join where
> records couldn't be matched.
> I just wanted to think out loud here. Did anybody else find this
> inconvenient? What alternative should we provide?
>  Imported from GitHub 
> Url: https://github.com/stratosphere/stratosphere/issues/202
> Created by: [uce|https://github.com/uce]
> Labels: core, enhancement, 
> Created at: Wed Oct 23 23:24:29 CEST 2013
> State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2240) Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660359#comment-14660359
 ] 

ASF GitHub Bot commented on FLINK-2240:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/888#issuecomment-128446267
  
The bloom filters are stored in subregions of Flink's memory segments, not
in any additional memory.

That is very nice (it occupies no extra memory), but it requires working
against Flink's memory segments rather than longs or byte arrays. Hence the
custom implementation.
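
To illustrate the constraint (this is a toy sketch, not Flink's actual
implementation), a bloom filter whose bits live inside a caller-provided
sub-region of a byte array, standing in for a sub-region of a memory segment;
a single hash function is used for brevity:

```java
public final class RegionBloomFilter {

    private final byte[] memory; // pre-allocated region, owned by the caller
    private final int offset;    // start of this filter's sub-region
    private final int numBits;

    public RegionBloomFilter(byte[] memory, int offset, int sizeInBytes) {
        this.memory = memory;
        this.offset = offset;
        this.numBits = sizeInBytes * 8;
    }

    public void add(int hash) {
        int bit = (hash & 0x7fffffff) % numBits;
        // Set the bit inside the caller's region; no extra allocation.
        memory[offset + (bit >>> 3)] |= (byte) (1 << (bit & 7));
    }

    public boolean mightContain(int hash) {
        int bit = (hash & 0x7fffffff) % numBits;
        return (memory[offset + (bit >>> 3)] & (1 << (bit & 7))) != 0;
    }

    public static void main(String[] args) {
        byte[] segment = new byte[1024];
        RegionBloomFilter filter = new RegionBloomFilter(segment, 512, 128);
        filter.add("hello".hashCode());
        System.out.println(filter.mightContain("hello".hashCode())); // true
        System.out.println(filter.mightContain("other".hashCode())); // likely false
    }
}
```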


> Use BloomFilter to minimize probe side records which are spilled to disk in 
> Hybrid-Hash-Join
> 
>
> Key: FLINK-2240
> URL: https://issues.apache.org/jira/browse/FLINK-2240
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>Priority: Minor
> Fix For: 0.10
>
>
> In the Hybrid-Hash-Join, when the small table does not fit into memory, part
> of the small table data is spilled to disk, and the counterpart partitions of
> the big table are spilled to disk in the probe phase as well. If we build a
> BloomFilter while spilling the small table to disk during the build phase, and
> use it to filter the big-table records that would otherwise be spilled, this
> may greatly reduce the spilled big-table file size and save the disk IO cost
> of writing and later reading them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2240] Use BloomFilter to filter probe r...

2015-08-06 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/888#issuecomment-128446267
  
The bloom filters are stored in subregions of Flink's memory segments, not
in any additional memory.

That is very nice (it occupies no extra memory), but it requires working
against Flink's memory segments rather than longs or byte arrays. Hence the
custom implementation.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2240] Use BloomFilter to filter probe r...

2015-08-06 Thread vasia
Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/888#issuecomment-128445757
  
Hi,
this looks great indeed!

Just out of curiosity, why did you write your own bloom filter 
implementation and not use a ready one, e.g. from guava? I'm wondering because 
in #923 we also want to use a bloom filter for an approximate algorithm 
implementation.

Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (FLINK-7) [GitHub] Enable Range Partitioner

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen updated FLINK-7:
-
Issue Type: Sub-task  (was: New Feature)
Parent: FLINK-598

> [GitHub] Enable Range Partitioner
> -
>
> Key: FLINK-7
> URL: https://issues.apache.org/jira/browse/FLINK-7
> Project: Flink
>  Issue Type: Sub-task
>Reporter: GitHub Import
>  Labels: github-import
> Fix For: pre-apache
>
>
> The range partitioner is currently disabled. We need to implement the 
> following aspects:
> 1) Distribution information, if available, must be propagated back together 
> with the ordering property.
> 2) A generic bucket lookup structure (currently specific to PactRecord).
> Tests to re-enable after fixing this issue:
>  - TeraSortITCase
>  - GlobalSortingITCase
>  - GlobalSortingMixedOrderITCase
>  Imported from GitHub 
> Url: https://github.com/stratosphere/stratosphere/issues/7
> Created by: [StephanEwen|https://github.com/StephanEwen]
> Labels: core, enhancement, optimizer, 
> Milestone: Release 0.4
> Assignee: [fhueske|https://github.com/fhueske]
> Created at: Fri Apr 26 13:48:24 CEST 2013
> State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2240) Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660355#comment-14660355
 ] 

ASF GitHub Bot commented on FLINK-2240:
---

Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/888#issuecomment-128445757
  
Hi,
this looks great indeed!

Just out of curiosity, why did you write your own bloom filter 
implementation and not use a ready one, e.g. from guava? I'm wondering because 
in #923 we also want to use a bloom filter for an approximate algorithm 
implementation.

Thanks!


> Use BloomFilter to minimize probe side records which are spilled to disk in 
> Hybrid-Hash-Join
> 
>
> Key: FLINK-2240
> URL: https://issues.apache.org/jira/browse/FLINK-2240
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>Priority: Minor
> Fix For: 0.10
>
>
> In the Hybrid-Hash-Join, when the small table does not fit into memory, part
> of the small table data is spilled to disk, and the counterpart partitions of
> the big table are spilled to disk in the probe phase as well. If we build a
> BloomFilter while spilling the small table to disk during the build phase, and
> use it to filter the big-table records that would otherwise be spilled, this
> may greatly reduce the spilled big-table file size and save the disk IO cost
> of writing and later reading them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-100) Pact API Proposal: Add keyless CoGroup (send all to a single group)

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-100.

   Resolution: Won't Fix
Fix Version/s: (was: pre-apache)

This has been open for a year, and no one asked for it or implemented it. I 
think this is fair to close.

> Pact API Proposal: Add keyless CoGroup (send all to a single group)
> ---
>
> Key: FLINK-100
> URL: https://issues.apache.org/jira/browse/FLINK-100
> Project: Flink
>  Issue Type: Improvement
>Reporter: GitHub Import
>Priority: Minor
>  Labels: github-import
>
> I propose to add a keyless version of CoGroup that groups both inputs in a 
> single group, analogous to the keyless Reducer version that was added in 
> https://github.com/dimalabs/ozone/pull/61
> ```
> CoGroupContract myCoGroup = CoGroupContract.builder(MyUdf.class)
> .input1(contractA)
> .input2(contractB)
> .build();
> ```
> I have a use case where I need to process the output of two contracts in a 
> single udf and I currently have to use the workaround to add a constant field 
> and use this as grouping key.
> Adding a keyless version would reduce the overhead (network traffic, 
> serialization and code-writing) and give the compiler additional knowledge 
> (The compiler knows that there will be only a single group and a single udf 
> call. If setAvgRecordsEmittedPerStubCall is set, it could infer the output 
> cardinality)
> Furthermore, I think this would be consistent, because CoGroup is like Reduce
> for multiple inputs.
>  Imported from GitHub 
> Url: https://github.com/stratosphere/stratosphere/issues/100
> Created by: [andrehacker|https://github.com/andrehacker]
> Labels: enhancement, 
> Created at: Sat Sep 14 23:15:59 CEST 2013
> State: open
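
For reference, the constant-key workaround described above, sketched against
the current DataSet API rather than the old Contract API (names are
illustrative):

```java
import org.apache.flink.api.common.functions.CoGroupFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.util.Collector;

public class KeylessCoGroupWorkaround {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<String> a = env.fromElements("a1", "a2");
        DataSet<String> b = env.fromElements("b1");

        // Constant key: every record of both inputs lands in a single group.
        a.coGroup(b)
         .where(new ConstantKey<String>())
         .equalTo(new ConstantKey<String>())
         .with(new ConcatAll())
         .print();
    }

    public static class ConstantKey<T> implements KeySelector<T, Integer> {
        @Override
        public Integer getKey(T value) {
            return 0;
        }
    }

    public static class ConcatAll implements CoGroupFunction<String, String, String> {
        @Override
        public void coGroup(Iterable<String> first, Iterable<String> second,
                            Collector<String> out) {
            StringBuilder sb = new StringBuilder();
            for (String s : first) sb.append(s).append(' ');
            for (String s : second) sb.append(s).append(' ');
            out.collect(sb.toString().trim());
        }
    }
}
```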



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-100) Pact API Proposal: Add keyless CoGroup (send all to a single group)

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-100.
--

> Pact API Proposal: Add keyless CoGroup (send all to a single group)
> ---
>
> Key: FLINK-100
> URL: https://issues.apache.org/jira/browse/FLINK-100
> Project: Flink
>  Issue Type: Improvement
>Reporter: GitHub Import
>Priority: Minor
>  Labels: github-import
>
> I propose to add a keyless version of CoGroup that groups both inputs in a 
> single group, analogous to the keyless Reducer version that was added in 
> https://github.com/dimalabs/ozone/pull/61
> ```
> CoGroupContract myCoGroup = CoGroupContract.builder(MyUdf.class)
> .input1(contractA)
> .input2(contractB)
> .build();
> ```
> I have a use case where I need to process the output of two contracts in a 
> single udf and I currently have to use the workaround to add a constant field 
> and use this as grouping key.
> Adding a keyless version would reduce the overhead (network traffic, 
> serialization and code-writing) and give the compiler additional knowledge 
> (The compiler knows that there will be only a single group and a single udf 
> call. If setAvgRecordsEmittedPerStubCall is set, it could infer the output 
> cardinality)
> Furthermore, I think this would be consistent, because CoGroup is like Reduce
> for multiple inputs.
>  Imported from GitHub 
> Url: https://github.com/stratosphere/stratosphere/issues/100
> Created by: [andrehacker|https://github.com/andrehacker]
> Labels: enhancement, 
> Created at: Sat Sep 14 23:15:59 CEST 2013
> State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-2240) Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-2240.
-
   Resolution: Implemented
Fix Version/s: 0.10

Implemented in 61dcae391cb3b45ba3aff47d4d9163889d2958a4

Thank you for the contribution!

> Use BloomFilter to minimize probe side records which are spilled to disk in 
> Hybrid-Hash-Join
> 
>
> Key: FLINK-2240
> URL: https://issues.apache.org/jira/browse/FLINK-2240
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>Priority: Minor
> Fix For: 0.10
>
>
> In the Hybrid-Hash-Join, when the small table does not fit into memory, part
> of the small table data is spilled to disk, and the counterpart partitions of
> the big table are spilled to disk in the probe phase as well. If we build a
> BloomFilter while spilling the small table to disk during the build phase, and
> use it to filter the big-table records that would otherwise be spilled, this
> may greatly reduce the spilled big-table file size and save the disk IO cost
> of writing and later reading them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-2240) Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-2240.
---

> Use BloomFilter to minimize probe side records which are spilled to disk in 
> Hybrid-Hash-Join
> 
>
> Key: FLINK-2240
> URL: https://issues.apache.org/jira/browse/FLINK-2240
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>Priority: Minor
> Fix For: 0.10
>
>
> In the Hybrid-Hash-Join, when the small table does not fit into memory, part
> of the small table data is spilled to disk, and the counterpart partitions of
> the big table are spilled to disk in the probe phase as well. If we build a
> BloomFilter while spilling the small table to disk during the build phase, and
> use it to filter the big-table records that would otherwise be spilled, this
> may greatly reduce the spilled big-table file size and save the disk IO cost
> of writing and later reading them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-2492) Rename remaining runtime classes from "match" to "join"

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-2492.
---

> Rename remaining runtime classes from "match" to "join"
> ---
>
> Key: FLINK-2492
> URL: https://issues.apache.org/jira/browse/FLINK-2492
> Project: Flink
>  Issue Type: Improvement
>  Components: Local Runtime
>Affects Versions: 0.10
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Minor
> Fix For: 0.10
>
>
> While working with the runtime join classes, I saw that many of them still 
> refer to the "join" as "match".
> Since all other parts now consistently refer to "join", we should adjust the 
> runtime classes as well. Makes it easier for new contributors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-2492) Rename remaining runtime classes from "match" to "join"

2015-08-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-2492.
-
Resolution: Fixed

Fixed via 685086a3dd9afcec2eec76485298bc7b3f031a3c

> Rename remaining runtime classes from "match" to "join"
> ---
>
> Key: FLINK-2492
> URL: https://issues.apache.org/jira/browse/FLINK-2492
> Project: Flink
>  Issue Type: Improvement
>  Components: Local Runtime
>Affects Versions: 0.10
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Minor
> Fix For: 0.10
>
>
> While working with the runtime join classes, I saw that many of them still 
> refer to the "join" as "match".
> Since all other parts now consistently refer to "join", we should adjust the 
> runtime classes as well. Makes it easier for new contributors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2240) Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660334#comment-14660334
 ] 

ASF GitHub Bot commented on FLINK-2240:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/888#issuecomment-128442757
  
Oh, I forgot to add the closing message to the commit, so the ASF bot did
not close the pull request. Can you close the pull request manually? (Only
you, as the owner, can do that.)


> Use BloomFilter to minimize probe side records which are spilled to disk in 
> Hybrid-Hash-Join
> 
>
> Key: FLINK-2240
> URL: https://issues.apache.org/jira/browse/FLINK-2240
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>Priority: Minor
>
> In the Hybrid-Hash-Join, when the small table does not fit into memory, part
> of the small table data is spilled to disk, and the counterpart partitions of
> the big table are spilled to disk in the probe phase as well. If we build a
> BloomFilter while spilling the small table to disk during the build phase, and
> use it to filter the big-table records that would otherwise be spilled, this
> may greatly reduce the spilled big-table file size and save the disk IO cost
> of writing and later reading them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2240] Use BloomFilter to filter probe r...

2015-08-06 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/888#issuecomment-128442757
  
Oh, I forgot to add the closing message to the commit, so the ASF bot did
not close the pull request. Can you close the pull request manually? (Only
you, as the owner, can do that.)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2240) Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660331#comment-14660331
 ] 

ASF GitHub Bot commented on FLINK-2240:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/888#issuecomment-128441993
  
This was a super cool contribution. A pretty sophisticated addition, super 
testing, high code quality.
I am very impressed!

I hope you will contribute more to Flink. Already saw that you opened 
another pull request, for a sampling operator. Happy that this is happening :-)

In the future, I can hopefully review and merge pull requests faster. Over
the past weeks, I did not get to code work as much as I wanted, and the list of
critical issues was long, so this pull request got delayed a bit.


> Use BloomFilter to minimize probe side records which are spilled to disk in 
> Hybrid-Hash-Join
> 
>
> Key: FLINK-2240
> URL: https://issues.apache.org/jira/browse/FLINK-2240
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>Priority: Minor
>
> In the Hybrid-Hash-Join, when the small table does not fit into memory, part
> of the small table data is spilled to disk, and the counterpart partitions of
> the big table are spilled to disk in the probe phase as well. If we build a
> BloomFilter while spilling the small table to disk during the build phase, and
> use it to filter the big-table records that would otherwise be spilled, this
> may greatly reduce the spilled big-table file size and save the disk IO cost
> of writing and later reading them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2240] Use BloomFilter to filter probe r...

2015-08-06 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/888#issuecomment-128441993
  
This was a super cool contribution. A pretty sophisticated addition, super 
testing, high code quality.
I am very impressed!

I hope you will contribute more to Flink. Already saw that you opened 
another pull request, for a sampling operator. Happy that this is happening :-)

In the future, I can hopefully review and merge pull requests faster. Over
the past weeks, I did not get to code work as much as I wanted, and the list of
critical issues was long, so this pull request got delayed a bit.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2240) Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660322#comment-14660322
 ] 

ASF GitHub Bot commented on FLINK-2240:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/888#issuecomment-128440090
  
Manually merged in 61dcae391cb3b45ba3aff47d4d9163889d2958a4

I added a commit on top to pass the flag to enable/disable bloom filters
through the runtime configuration. That is the basis for later allowing bloom
filters to be enabled on a per-job basis. Also, we want to get rid of the
`GlobalConfiguration` singleton pattern.


> Use BloomFilter to minimize probe side records which are spilled to disk in 
> Hybrid-Hash-Join
> 
>
> Key: FLINK-2240
> URL: https://issues.apache.org/jira/browse/FLINK-2240
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>Priority: Minor
>
> In Hybrid-Hash-Join, while small table does not fit into memory, part of the 
> small table data would be spilled to disk, and the counterpart partition of 
> big table data would be spilled to disk in probe phase as well. If we build a 
> BloomFilter while spill small table to disk during build phase, and use it to 
> filter the big table records which tend to be spilled to disk, this may 
> greatly  reduce the spilled big table file size, and saved the disk IO cost 
> for writing and further reading.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2240] Use BloomFilter to filter probe r...

2015-08-06 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/888#issuecomment-128440090
  
Manually merged in 61dcae391cb3b45ba3aff47d4d9163889d2958a4

I added a commit on top to pass the flag to enable/disable bloom filters
through the runtime configuration. That is the basis for later allowing bloom
filters to be enabled on a per-job basis. Also, we want to get rid of the
`GlobalConfiguration` singleton pattern.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2492] [runtime] Rename former 'match' c...

2015-08-06 Thread StephanEwen
Github user StephanEwen closed the pull request at:

https://github.com/apache/flink/pull/995


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2492) Rename remaining runtime classes from "match" to "join"

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660316#comment-14660316
 ] 

ASF GitHub Bot commented on FLINK-2492:
---

Github user StephanEwen closed the pull request at:

https://github.com/apache/flink/pull/995


> Rename remaining runtime classes from "match" to "join"
> ---
>
> Key: FLINK-2492
> URL: https://issues.apache.org/jira/browse/FLINK-2492
> Project: Flink
>  Issue Type: Improvement
>  Components: Local Runtime
>Affects Versions: 0.10
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Minor
> Fix For: 0.10
>
>
> While working with the runtime join classes, I saw that many of them still 
> refer to the "join" as "match".
> Since all other parts now consistently refer to "join", we should adjust the 
> runtime classes as well. Makes it easier for new contributors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2492) Rename remaining runtime classes from "match" to "join"

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660315#comment-14660315
 ] 

ASF GitHub Bot commented on FLINK-2492:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/995#issuecomment-128439329
  
Manually merged in 685086a3dd9afcec2eec76485298bc7b3f031a3c


> Rename remaining runtime classes from "match" to "join"
> ---
>
> Key: FLINK-2492
> URL: https://issues.apache.org/jira/browse/FLINK-2492
> Project: Flink
>  Issue Type: Improvement
>  Components: Local Runtime
>Affects Versions: 0.10
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Minor
> Fix For: 0.10
>
>
> While working with the runtime join classes, I saw that many of them still 
> refer to the "join" as a "match".
> Since all other parts now consistently use "join", we should adjust the 
> runtime classes as well. This makes it easier for new contributors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2492] [runtime] Rename former 'match' c...

2015-08-06 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/995#issuecomment-128439329
  
Manually merged in 685086a3dd9afcec2eec76485298bc7b3f031a3c


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2492) Rename remaining runtime classes from "match" to "join"

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660245#comment-14660245
 ] 

ASF GitHub Bot commented on FLINK-2492:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/995#issuecomment-128425652
  
+1 to merge, it's just a simple renaming.


> Rename remaining runtime classes from "match" to "join"
> ---
>
> Key: FLINK-2492
> URL: https://issues.apache.org/jira/browse/FLINK-2492
> Project: Flink
>  Issue Type: Improvement
>  Components: Local Runtime
>Affects Versions: 0.10
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Minor
> Fix For: 0.10
>
>
> While working with the runtime join classes, I saw that many of them still 
> refer to the "join" as a "match".
> Since all other parts now consistently use "join", we should adjust the 
> runtime classes as well. This makes it easier for new contributors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2492] [runtime] Rename former 'match' c...

2015-08-06 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/995#issuecomment-128425652
  
+1 to merge, it's just a simple renaming.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1962] Add Gelly Scala API

2015-08-06 Thread vasia
Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/808#issuecomment-128420592
  
Hi @PieterJanVanAeken! Thanks for the update. It seems something went wrong 
with your merge. Your last commit shows 1000+ files changed... Could you try to 
rebase instead?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1962) Add Gelly Scala API

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660221#comment-14660221
 ] 

ASF GitHub Bot commented on FLINK-1962:
---

Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/808#issuecomment-128420592
  
Hi @PieterJanVanAeken! Thanks for the update. It seems something went wrong 
with your merge. Your last commit shows 1000+ files changed... Could you try to 
rebase instead?


> Add Gelly Scala API
> ---
>
> Key: FLINK-1962
> URL: https://issues.apache.org/jira/browse/FLINK-1962
> Project: Flink
>  Issue Type: Task
>  Components: Gelly, Scala API
>Affects Versions: 0.9
>Reporter: Vasia Kalavri
>Assignee: PJ Van Aeken
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2452) Add a playcount threshold to the MusicProfiles example

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660179#comment-14660179
 ] 

ASF GitHub Bot commented on FLINK-2452:
---

Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/968#issuecomment-128413716
  
Thanks @fhueske! Any comments, @andralungu? Otherwise, I'll merge this :)


> Add a playcount threshold to the MusicProfiles example
> --
>
> Key: FLINK-2452
> URL: https://issues.apache.org/jira/browse/FLINK-2452
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Affects Versions: 0.10
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
>Priority: Minor
>
> In the MusicProfiles example, when creating the user-user similarity graph, 
> an edge is created between any two users who have listened to the same song 
> (even if only once). Depending on the input data, this might produce a projection 
> graph with many more edges than the original user-song graph.
> To make this computation more efficient, this issue proposes adding a 
> user-defined parameter that filters out songs that a user has listened to 
> only a few times. Essentially, it is a threshold for playcount, above which a 
> user is considered to like a song.
> For reference, with a threshold value of 30, the whole Last.fm dataset is 
> analyzed on my laptop in a few minutes, while no threshold results in a 
> runtime of several hours.
> There are many solutions to this problem, but since this is just an example 
> (not a library method), I think that keeping it simple is important.
> Thanks to [~andralungu] for spotting the inefficiency!
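
A hedged sketch of that filtering step using Gelly's `filterOnEdges` (the 
vertex and edge types and the helper class are assumptions for illustration; 
the example's actual types may differ):

```java
import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.graph.Edge;
import org.apache.flink.graph.Graph;
import org.apache.flink.types.NullValue;

class PlaycountThresholdSketch {

    /** Keeps only user->song edges whose playcount exceeds the threshold. */
    static Graph<String, NullValue, Integer> applyThreshold(
            Graph<String, NullValue, Integer> userSongGraph, final int threshold) {

        return userSongGraph.filterOnEdges(new FilterFunction<Edge<String, Integer>>() {
            @Override
            public boolean filter(Edge<String, Integer> edge) {
                // the playcount is stored as the edge value
                return edge.getValue() > threshold;
            }
        });
    }
}
```

Only the surviving edges then feed into the user-user similarity projection, 
which keeps the projection graph small.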



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2452] [Gelly] adds a playcount threshol...

2015-08-06 Thread vasia
Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/968#issuecomment-128413716
  
Thanks @fhueske! Any comments, @andralungu? Otherwise, I'll merge this :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2492] [runtime] Rename former 'match' c...

2015-08-06 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/995#issuecomment-128409801
  
This is cosmetic, to avoid confusion between old and new naming schemes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2492) Rename remaining runtime classes from "match" to "join"

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660162#comment-14660162
 ] 

ASF GitHub Bot commented on FLINK-2492:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/995#issuecomment-128409801
  
This is cosmetic, to avoid confusion between old and new naming schemes.


> Rename remaining runtime classes from "match" to "join"
> ---
>
> Key: FLINK-2492
> URL: https://issues.apache.org/jira/browse/FLINK-2492
> Project: Flink
>  Issue Type: Improvement
>  Components: Local Runtime
>Affects Versions: 0.10
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Minor
> Fix For: 0.10
>
>
> While working with the runtime join classes, I saw that many of them still 
> refer to the "join" as a "match".
> Since all other parts now consistently use "join", we should adjust the 
> runtime classes as well. This makes it easier for new contributors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2457) Integrate Tuple0

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660151#comment-14660151
 ] 

ASF GitHub Bot commented on FLINK-2457:
---

Github user mjsax commented on the pull request:

https://github.com/apache/flink/pull/983#issuecomment-128407712
  
Oops, that was not intended at all and does not make any sense. Sorry. I 
didn't realize that I had edited the wrong class, because Tuple0Serializer is 
a singleton, too... Just fixed it.
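
For context, this is the standard Java pattern under discussion, in a 
stripped-down illustrative form (not the actual Tuple0 or Tuple0Serializer 
code): `readResolve()` makes deserialization hand back the canonical instance 
instead of a fresh copy.

```java
import java.io.Serializable;

/** Illustrative serializable singleton; not the actual Tuple0Serializer. */
final class SingletonSketch implements Serializable {

    private static final long serialVersionUID = 1L;

    static final SingletonSketch INSTANCE = new SingletonSketch();

    private SingletonSketch() {
    }

    /**
     * Called by Java deserialization; returning the canonical instance
     * ensures deserialization never produces a second copy of the singleton.
     */
    private Object readResolve() {
        return INSTANCE;
    }
}
```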


> Integrate Tuple0
> 
>
> Key: FLINK-2457
> URL: https://issues.apache.org/jira/browse/FLINK-2457
> Project: Flink
>  Issue Type: Improvement
>Reporter: Matthias J. Sax
>Assignee: Matthias J. Sax
>Priority: Minor
>
> Tuple0 is not cleanly integrated:
>  - missing serialization/deserialization support in the runtime
>  - Tuple.getTupleClass(int arity) cannot handle arity zero, i.e., it cannot 
> create an instance of Tuple0
> Tuple0 is currently only used in the Python API, but will be integrated into 
> the Storm compatibility layer, too.
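
A hypothetical sketch of the second point, i.e. a tuple-class lookup that also 
covers arity zero (this is not Flink's actual `Tuple.getTupleClass` code, just 
an illustration of the missing case):

```java
import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.api.java.tuple.Tuple0;

class TupleClassLookupSketch {

    /** Resolves the tuple class for a given arity, including arity zero. */
    @SuppressWarnings("unchecked")
    static Class<? extends Tuple> getTupleClass(int arity) throws ClassNotFoundException {
        if (arity < 0 || arity > Tuple.MAX_ARITY) {
            throw new IllegalArgumentException("Unsupported arity: " + arity);
        }
        if (arity == 0) {
            return Tuple0.class; // the previously missing case
        }
        return (Class<? extends Tuple>)
                Class.forName("org.apache.flink.api.java.tuple.Tuple" + arity);
    }
}
```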



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2457] Integrate Tuple0

2015-08-06 Thread mjsax
Github user mjsax commented on the pull request:

https://github.com/apache/flink/pull/983#issuecomment-128407712
  
Oops, that was not intended at all and does not make any sense. Sorry. I 
didn't realize that I had edited the wrong class, because Tuple0Serializer is 
a singleton, too... Just fixed it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2457) Integrate Tuple0

2015-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660147#comment-14660147
 ] 

ASF GitHub Bot commented on FLINK-2457:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/983#issuecomment-128406575
  
Was it on purpose to add `readResolve()` to the serializer, rather than the 
tuple?


> Integrate Tuple0
> 
>
> Key: FLINK-2457
> URL: https://issues.apache.org/jira/browse/FLINK-2457
> Project: Flink
>  Issue Type: Improvement
>Reporter: Matthias J. Sax
>Assignee: Matthias J. Sax
>Priority: Minor
>
> Tuple0 is not cleanly integrated:
>  - missing serialization/deserialization support in the runtime
>  - Tuple.getTupleClass(int arity) cannot handle arity zero, i.e., it cannot 
> create an instance of Tuple0
> Tuple0 is currently only used in the Python API, but will be integrated into 
> the Storm compatibility layer, too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2457] Integrate Tuple0

2015-08-06 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/983#issuecomment-128406575
  
Was it on purpose to add `readResolve()` to the serializer, rather than the 
tuple?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

