[GitHub] flink pull request: [FLINK-2387] add streaming test case for live ...
GitHub user mxm opened a pull request: https://github.com/apache/flink/pull/926 [FLINK-2387] add streaming test case for live accumulators You can merge this pull request into a Git repository by running: $ git pull https://github.com/mxm/flink live-accumulators-streaming Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/926.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #926 commit e0668dc4a5e1bf23085c40d7abae4c8414c29707 Author: Maximilian Michels m...@apache.org Date: 2015-07-21T14:54:26Z [FLINK-2387] add streaming test case for live accumulators --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2387) Add test for live accumulators in Streaming
[ https://issues.apache.org/jira/browse/FLINK-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635295#comment-14635295 ] ASF GitHub Bot commented on FLINK-2387: --- GitHub user mxm opened a pull request: https://github.com/apache/flink/pull/926 [FLINK-2387] add streaming test case for live accumulators You can merge this pull request into a Git repository by running: $ git pull https://github.com/mxm/flink live-accumulators-streaming Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/926.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #926 commit e0668dc4a5e1bf23085c40d7abae4c8414c29707 Author: Maximilian Michels m...@apache.org Date: 2015-07-21T14:54:26Z [FLINK-2387] add streaming test case for live accumulators Add test for live accumulators in Streaming --- Key: FLINK-2387 URL: https://issues.apache.org/jira/browse/FLINK-2387 Project: Flink Issue Type: Test Components: Streaming Reporter: Maximilian Michels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2375] Add Approximate Adamic Adar Simil...
Github user vasia commented on the pull request: https://github.com/apache/flink/pull/923#issuecomment-123394302 Hi @shghatge! The comments I left on #892 apply here as well. The difference would be that the neighborhoods of each of the neighbors will be represented as a bloom filter. It would also be nice to make the bloom filter parameters (size, number of hash functions, hash function) configurable, so that the user can adjust the false positives and size based on their use-case. What do you think?
[jira] [Created] (FLINK-2385) Scala DataSet.distinct should have parenthesis
Stephan Ewen created FLINK-2385: --- Summary: Scala DataSet.distinct should have parenthesis Key: FLINK-2385 URL: https://issues.apache.org/jira/browse/FLINK-2385 Project: Flink Issue Type: Bug Components: Scala API Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.10 The method is not a side-effect free accessor, but defines heavy computation, even if it does not mutate the original data set. This is a somewhat API breaking change. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-2386) Implement Kafka connector using the new Kafka Consumer API
Robert Metzger created FLINK-2386: - Summary: Implement Kafka connector using the new Kafka Consumer API Key: FLINK-2386 URL: https://issues.apache.org/jira/browse/FLINK-2386 Project: Flink Issue Type: Improvement Components: Kafka Connector Reporter: Robert Metzger Once Kafka has released its new consumer API, we should provide a connector for that version. The release will probably be called 0.9 or 0.8.3. The connector will be mostly compatible with Kafka 0.8.2.x, except for committing offsets to the broker (the new connector expects a coordinator to be available on Kafka). To work around that, we can provide a configuration option to commit offsets to zookeeper (managed by flink code). For 0.9/0.8.3 it will be fully compatible. It will not be compatible with 0.8.1 because of mismatching Kafka messages. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-2382) Live Metric Reporting Does Not Work for Two-Input StreamTasks
[ https://issues.apache.org/jira/browse/FLINK-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels resolved FLINK-2382. --- Resolution: Fixed Fixed in 1d373a7. Live Metric Reporting Does Not Work for Two-Input StreamTasks - Key: FLINK-2382 URL: https://issues.apache.org/jira/browse/FLINK-2382 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.10 Reporter: Aljoscha Krettek Assignee: Maximilian Michels Also, there are no tests for the live metrics in streaming. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-2371) AccumulatorLiveITCase fails
[ https://issues.apache.org/jira/browse/FLINK-2371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels resolved FLINK-2371. --- Resolution: Fixed AccumulatorLiveITCase fails --- Key: FLINK-2371 URL: https://issues.apache.org/jira/browse/FLINK-2371 Project: Flink Issue Type: Bug Components: Tests Reporter: Matthias J. Sax Assignee: Maximilian Michels AccumulatorLiveITCase fails regularly (however, not in each run). The tests relies on timing (via sleep) which does not work well on Travis. See dev-list for more details: https://mail-archives.apache.org/mod_mbox/flink-dev/201507.mbox/browser AccumulatorLiveITCase.testProgram:106-access$1100:68-checkFlinkAccumulators:189 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-2387) Add test for live accumulators in Streaming
Maximilian Michels created FLINK-2387: - Summary: Add test for live accumulators in Streaming Key: FLINK-2387 URL: https://issues.apache.org/jira/browse/FLINK-2387 Project: Flink Issue Type: Test Components: Streaming Reporter: Maximilian Michels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2376] Raise the time limit in testFindC...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/920#issuecomment-123359910 Looks good, +1 to merge.
[GitHub] flink pull request: Collect(): Fixing the akka.framesize size limi...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/887#issuecomment-123361546 Now that #896 is in, this becomes mergeable...
[GitHub] flink pull request: FLINK-2018 Add ParameterUtil.fromGenericOption...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/720#issuecomment-123370928 Looks good. I will merge this...
[jira] [Commented] (FLINK-2375) Add Approximate Adamic Adar Similarity method using BloomFilters
[ https://issues.apache.org/jira/browse/FLINK-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635360#comment-14635360 ] ASF GitHub Bot commented on FLINK-2375: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/923#issuecomment-123394302 Hi @shghatge! The comments I left on #892 apply here as well. The difference would be that the neighborhoods of each of the neighbors will be represented as a bloom filter. It would also be nice to make the bloom filter parameters (size, number of hash functions, hash function) configurable, so that the user can adjust the false positives and size based on their use-case. What do you think? Add Approximate Adamic Adar Similarity method using BloomFilters Key: FLINK-2375 URL: https://issues.apache.org/jira/browse/FLINK-2375 Project: Flink Issue Type: Task Components: Gelly Reporter: Shivani Ghatge Assignee: Shivani Ghatge Priority: Minor Just as Jaccard, the Adamic-Adar algorithm measures the similarity between a set of nodes. However, instead of counting the common neighbors and dividing them by the total number of neighbors, the similarity is weighted according to the vertex degrees. In particular, it's equal to log(1/numberOfEdges). The Adamic-Adar algorithm can be broken into three steps: 1) For each vertex, compute the log of its inverse degree (with the formula above) and set it as the vertex value. 2) Each vertex will then send this new computed value, along with a list of neighbors, to the targets of its out-edges. 3) Weigh the edges with the Adamic-Adar index: the sum over n in CN of log(1/k_n), where CN is the set of all common neighbors of two vertices x, y and k_n is the degree of node n. See [2]. Using BloomFilters we increase the scalability of the algorithm. The values calculated for the edges will be approximate. 
Prerequisites: Full understanding of the Jaccard Similarity Measure algorithm Reading the associated literature: [1] http://social.cs.uiuc.edu/class/cs591kgk/friendsadamic.pdf [2] http://stackoverflow.com/questions/22565620/fast-algorithm-to-compute-adamic-adar -- This message was sent by Atlassian JIRA (v6.3.4#6332)
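The three steps described in the issue can be sketched in plain Java. The class and method names below are hypothetical, and the weights follow the log(1/k_n) formula stated in the issue; the Bloom-filter variant discussed in the comments would replace the exact neighbor-set lookup with an approximate membership test, trading false positives for memory.

```java
import java.util.Map;
import java.util.Set;

public class AdamicAdarSketch {
    // Step 1: vertex value = log(1 / degree), per the formula in the issue.
    static double vertexValue(int degree) {
        return Math.log(1.0 / degree);
    }

    // Step 3: edge weight for (x, y) = sum of vertex values over the common
    // neighbors of x and y. In the Bloom-filter variant, `neighborsY.contains(n)`
    // becomes an approximate membership test, so the weight is approximate.
    static double edgeWeight(Set<Integer> neighborsX, Set<Integer> neighborsY,
                             Map<Integer, Integer> degrees) {
        double sum = 0.0;
        for (int n : neighborsX) {
            if (neighborsY.contains(n)) {
                sum += vertexValue(degrees.get(n));
            }
        }
        return sum;
    }
}
```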
[GitHub] flink pull request: [FLINK-1943] [gelly] Added GSA compiler and tr...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/916#issuecomment-123362529 Super, +1 The test failures are unrelated. (One of the unstable tests is fixed already). Will merge this...
[jira] [Commented] (FLINK-1943) Add Gelly-GSA compiler and translation tests
[ https://issues.apache.org/jira/browse/FLINK-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635226#comment-14635226 ] ASF GitHub Bot commented on FLINK-1943: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/916#issuecomment-123362529 Super, +1 The test failures are unrelated. (One of the unstable tests is fixed already). Will merge this... Add Gelly-GSA compiler and translation tests Key: FLINK-1943 URL: https://issues.apache.org/jira/browse/FLINK-1943 Project: Flink Issue Type: Test Components: Gelly Affects Versions: 0.9 Reporter: Vasia Kalavri Assignee: Vasia Kalavri These should be similar to the corresponding Spargel tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2312) Random Splits
[ https://issues.apache.org/jira/browse/FLINK-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635328#comment-14635328 ] ASF GitHub Bot commented on FLINK-2312: --- Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/921#issuecomment-123390786 This leads to non-mutually exclusive splits. I tracked down the reason for this: the input data is parallelized differently while performing the splits for every fraction. This leads to an altogether different sequence of random numbers, hence the problem. @tillrohrmann, I use a seed value to initialize, as in the cross-validation PR. Is there any way I can fix the parallelization of the data, so that running the split for every fraction leads to exactly the same sequence of numbers? Persist would write the data to disk, which is something of an overhead, isn't it? Random Splits - Key: FLINK-2312 URL: https://issues.apache.org/jira/browse/FLINK-2312 Project: Flink Issue Type: Wish Components: Machine Learning Library Reporter: Maximilian Alber Assignee: pietro pinoli Priority: Minor In machine learning applications it is common to split data sets into, e.g., training and testing sets. To the best of my knowledge there is at the moment no nice way in Flink to split a data set randomly into several partitions according to some ratio. The desired semantics would be the same as Spark's RDD randomSplit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2312][utils] Randomly Splitting a Data ...
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/921#issuecomment-123390786 This leads to non-mutually exclusive splits. I tracked down the reason for this: the input data is parallelized differently while performing the splits for every fraction. This leads to an altogether different sequence of random numbers, hence the problem. @tillrohrmann, I use a seed value to initialize, as in the cross-validation PR. Is there any way I can fix the parallelization of the data, so that running the split for every fraction leads to exactly the same sequence of numbers? Persist would write the data to disk, which is something of an overhead, isn't it?
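One common way out of the order-dependence described in the comment is to derive each element's random draw from a hash of the global seed and the element itself, rather than from a sequential RNG, so the split decision does not depend on how the data happens to be parallelized. A minimal sketch with a hypothetical helper (not Flink's actual implementation):

```java
import java.util.Random;

public class RandomSplitSketch {
    // Decide whether `element` falls in the half-open fraction range [from, to).
    // The per-element RNG is seeded from (globalSeed, element), so the draw is
    // identical no matter how the DataSet is partitioned or in what order the
    // elements arrive.
    static boolean inSplit(long globalSeed, Object element, double from, double to) {
        long mixed = globalSeed ^ (element.hashCode() * 0x9E3779B97F4A7C15L);
        double draw = new Random(mixed).nextDouble();
        return draw >= from && draw < to;
    }
}
```

Because every split over a disjoint fraction range reuses the same per-element draw, the resulting splits are mutually exclusive by construction, without persisting the data to disk.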
[jira] [Commented] (FLINK-1901) Create sample operator for Dataset
[ https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635206#comment-14635206 ] Till Rohrmann commented on FLINK-1901: -- I think this solution is indeed a little bit too hacky. It would be very unintuitive for the user to have to broadcast the iteration {{DataSet}} to the sampling operator. Furthermore, this would incur unnecessary network I/O. I think we should try to solve this problem properly. This means that we have a single sampling operator which works inside and outside of iterations. This will also avoid code duplication. Create sample operator for Dataset -- Key: FLINK-1901 URL: https://issues.apache.org/jira/browse/FLINK-1901 Project: Flink Issue Type: Improvement Components: Core Reporter: Theodore Vasiloudis Assignee: Chengxiang Li In order to be able to implement Stochastic Gradient Descent and a number of other machine learning algorithms we need to have a way to take a random sample from a Dataset. We need to be able to sample with or without replacement from the Dataset, choose the relative size of the sample, and set a seed for reproducibility. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2386) Implement Kafka connector using the new Kafka Consumer API
[ https://issues.apache.org/jira/browse/FLINK-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635243#comment-14635243 ] Robert Metzger commented on FLINK-2386: --- My current work in progress code is located here: https://github.com/rmetzger/flink/tree/flink2386 It contains a full copy of the new kafka consumer code (because it is not yet released). I'll probably finish the changes once Kafka has released the new consumer. Implement Kafka connector using the new Kafka Consumer API -- Key: FLINK-2386 URL: https://issues.apache.org/jira/browse/FLINK-2386 Project: Flink Issue Type: Improvement Components: Kafka Connector Reporter: Robert Metzger Once Kafka has released its new consumer API, we should provide a connector for that version. The release will probably be called 0.9 or 0.8.3. The connector will be mostly compatible with Kafka 0.8.2.x, except for committing offsets to the broker (the new connector expects a coordinator to be available on Kafka). To work around that, we can provide a configuration option to commit offsets to zookeeper (managed by flink code). For 0.9/0.8.3 it will be fully compatible. It will not be compatible with 0.8.1 because of mismatching Kafka messages. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2018) Add ParameterUtil.fromGenericOptionsParser() for compatibility to Hadoop's argument parser
[ https://issues.apache.org/jira/browse/FLINK-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635276#comment-14635276 ] ASF GitHub Bot commented on FLINK-2018: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/720#issuecomment-123370928 Looks good. I will merge this... Add ParameterUtil.fromGenericOptionsParser() for compatibility to Hadoop's argument parser -- Key: FLINK-2018 URL: https://issues.apache.org/jira/browse/FLINK-2018 Project: Flink Issue Type: Improvement Reporter: Robert Metzger Priority: Minor Labels: starter In FLINK-1525 we've added the {{ParameterTool}}. For users used to Hadoop's {{GenericOptionsParser}} it would be great to provide a compatible parser. See: https://hadoop.apache.org/docs/r1.0.4/api/org/apache/hadoop/util/GenericOptionsParser.html {code} @Test public void testFromGenericOptionsParser() { ParameterUtil parameter = ParameterUtil.fromGenericOptionsParser(new String[]{"-D", "input=myinput", "-DexpectedCount=15"}); validate(parameter); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2322) Unclosed stream may leak resource
[ https://issues.apache.org/jira/browse/FLINK-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635537#comment-14635537 ] ASF GitHub Bot commented on FLINK-2322: --- GitHub user tedyu opened a pull request: https://github.com/apache/flink/pull/928 FLINK-2322 Unclosed stream may leak resource What size should I use for tab ? You can merge this pull request into a Git repository by running: $ git pull https://github.com/tedyu/flink master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/928.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #928 commit c4a16ce1910b7ac53ee63a4ac9c1d85299e16204 Author: tedyu yuzhih...@gmail.com Date: 2015-07-21T18:13:17Z FLINK-2322 Unclosed stream may leak resource Unclosed stream may leak resource - Key: FLINK-2322 URL: https://issues.apache.org/jira/browse/FLINK-2322 Project: Flink Issue Type: Bug Reporter: Ted Yu Labels: starter In UdfAnalyzerUtils.java : {code} ClassReader cr = new ClassReader(Thread.currentThread().getContextClassLoader() .getResourceAsStream(internalClassName.replace('.', '/') + .class)); {code} The stream returned by getResourceAsStream() should be closed upon exit of findMethodNode() In ParameterTool#fromPropertiesFile(): {code} props.load(new FileInputStream(propertiesFile)); {code} The FileInputStream should be closed before returning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
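For the second snippet in the issue, the usual fix looks like the sketch below (assuming Java 7+ and using a hypothetical class name): try-with-resources guarantees the stream is closed even when load() throws.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public class PropertiesLoader {
    // The FileInputStream is closed automatically when the try block exits,
    // whether load() succeeds or throws, so no file handle can leak.
    static Properties fromPropertiesFile(String path) throws IOException {
        Properties props = new Properties();
        try (InputStream in = new FileInputStream(path)) {
            props.load(in);
        }
        return props;
    }
}
```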
[jira] [Commented] (FLINK-2205) Confusing entries in JM Webfrontend Job Configuration section
[ https://issues.apache.org/jira/browse/FLINK-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635462#comment-14635462 ] ASF GitHub Bot commented on FLINK-2205: --- GitHub user ebautistabar opened a pull request: https://github.com/apache/flink/pull/927 [FLINK-2205] Fix confusing entries in JM UI Job Config. section Default display for 'Number of execution retries' is now 'deactivated' and for 'Job parallelism' is 'auto', as suggested in JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ebautistabar/flink change-job-config-display Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/927.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #927 commit ba7534ce56d5329a25d484d0e06aa36363781a65 Author: Enrique Bautista ebautista...@gmail.com Date: 2015-07-21T16:52:05Z [FLINK-2205] Fix confusing entries in JM UI Job Config. section Default display for 'Number of execution retries' is now 'deactivated' and for 'Job parallelism' is 'auto', as suggested in JIRA. Confusing entries in JM Webfrontend Job Configuration section - Key: FLINK-2205 URL: https://issues.apache.org/jira/browse/FLINK-2205 Project: Flink Issue Type: Bug Components: Webfrontend Affects Versions: 0.9 Reporter: Fabian Hueske Priority: Minor Labels: starter The Job Configuration section of the job history / analyze page of the JobManager webinterface contains two confusing entries: - {{Number of execution retries}} is actually the maximum number of retries and should be renamed accordingly. The default value is -1 and should be changed to deactivated (or 0). - {{Job parallelism}} which is -1 by default. A parallelism of -1 is not very meaningful. It would be better to show something like auto -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-2388) JobManager should try retrieving jobs from archive
Enrique Bautista Barahona created FLINK-2388: Summary: JobManager should try retrieving jobs from archive Key: FLINK-2388 URL: https://issues.apache.org/jira/browse/FLINK-2388 Project: Flink Issue Type: Task Components: JobManager Reporter: Enrique Bautista Barahona I was following the quickstart guide with the WordCount example and, when I entered the analyze page for the job, nothing came up. Apparently the JobManagerInfoServlet fails with a NullPointerException. I've been reading the code and I've seen the problem is in the processing of the RequestAccumulatorResultsStringified message in JobManager. There's a TODO where the accumulators should be retrieved from the archive. As I wanted to know more about Flink internals, I decided to try and fix it. I've later seen that there's currently ongoing work in that part of the code, so I guess maybe it's not needed, but if you want I could submit a PR. If you have already taken it into account and will solve it shortly, please feel free to close the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635396#comment-14635396 ] ASF GitHub Bot commented on FLINK-1520: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/847#issuecomment-123402206 I see your point @shghatge. However, I think naming just one method differently will be confusing.. If we're going to have custom method names, let's go with @andralungu's suggestion above and make sure we document these properly. I would prefer a bit shorter method names though. How about: 1). `keyType(K)` 2). `vertexTypes(K, VV)` 3). `edgeTypes(K, EV)` 4). `types(K, VV, EV)` ? Read edges and vertices from CSV files -- Key: FLINK-1520 URL: https://issues.apache.org/jira/browse/FLINK-1520 Project: Flink Issue Type: New Feature Components: Gelly Reporter: Vasia Kalavri Assignee: Shivani Ghatge Priority: Minor Labels: easyfix, newbie Add methods to create Vertex and Edge Datasets directly from CSV file inputs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2205] Fix confusing entries in JM UI Jo...
GitHub user ebautistabar opened a pull request: https://github.com/apache/flink/pull/927 [FLINK-2205] Fix confusing entries in JM UI Job Config. section Default display for 'Number of execution retries' is now 'deactivated' and for 'Job parallelism' is 'auto', as suggested in JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ebautistabar/flink change-job-config-display Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/927.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #927 commit ba7534ce56d5329a25d484d0e06aa36363781a65 Author: Enrique Bautista ebautista...@gmail.com Date: 2015-07-21T16:52:05Z [FLINK-2205] Fix confusing entries in JM UI Job Config. section Default display for 'Number of execution retries' is now 'deactivated' and for 'Job parallelism' is 'auto', as suggested in JIRA.
[jira] [Updated] (FLINK-2388) JobManager should try retrieving jobs from archive
[ https://issues.apache.org/jira/browse/FLINK-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrique Bautista Barahona updated FLINK-2388: - Description: I was following the quickstart guide with the WordCount example and, when I entered the analyze page for the job, nothing came up. Apparently the JobManagerInfoServlet fails with a NullPointerException. I've been reading the code and I've seen the problem is in the processing of the RequestAccumulatorResultsStringified message in JobManager. There's a TODO where the accumulators should be retrieved from the archive. As I wanted to know more about Flink internals, I decided to try and fix it. I've later seen that there's currently ongoing work in that part of the code, so I guess maybe it's not needed, but if you want I could submit a PR. If you have already taken it into account and will solve it shortly, please feel free to close the issue. was: I was following the quickstart guide with the WordCount example and, when I entered the analyze page for the job, nothing came up. Apparently the JobManagerInfoServlet fails with a NullPointerException. I've been reading the code and I've seen the problem is in the processing of the RequestAccumulatorResultsStringified message in JobManager. There's a TODO where the accumulators should be retrieved from the archive. As I wanted to know more about Flink internals, I decided to try and fix it. I've later seen that there's currently ongoing work in that part of the code, so I guess maybe it's not needed, but if you want I could submit a PR. If you have already taken it into account and will solve ti shortly, please feel free to close the issue. 
JobManager should try retrieving jobs from archive -- Key: FLINK-2388 URL: https://issues.apache.org/jira/browse/FLINK-2388 Project: Flink Issue Type: Task Components: JobManager Reporter: Enrique Bautista Barahona I was following the quickstart guide with the WordCount example and, when I entered the analyze page for the job, nothing came up. Apparently the JobManagerInfoServlet fails with a NullPointerException. I've been reading the code and I've seen the problem is in the processing of the RequestAccumulatorResultsStringified message in JobManager. There's a TODO where the accumulators should be retrieved from the archive. As I wanted to know more about Flink internals, I decided to try and fix it. I've later seen that there's currently ongoing work in that part of the code, so I guess maybe it's not needed, but if you want I could submit a PR. If you have already taken it into account and will solve it shortly, please feel free to close the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1901) Create sample operator for Dataset
[ https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636124#comment-14636124 ] Chengxiang Li commented on FLINK-1901: -- "Every point is sampled with probability 1/N" is one of the sampling cases (sampling with fraction, without replacement); there are three other kinds of sampling that are commonly used as well: sampling with fraction, with replacement; sampling with fixed size, without replacement; and sampling with fixed size, with replacement. We should support all of them when exposing a sampling operator to the user. Create sample operator for Dataset -- Key: FLINK-1901 URL: https://issues.apache.org/jira/browse/FLINK-1901 Project: Flink Issue Type: Improvement Components: Core Reporter: Theodore Vasiloudis Assignee: Chengxiang Li In order to be able to implement Stochastic Gradient Descent and a number of other machine learning algorithms we need to have a way to take a random sample from a Dataset. We need to be able to sample with or without replacement from the Dataset, choose the relative size of the sample, and set a seed for reproducibility. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
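The fraction-without-replacement case discussed in the comment above can be sketched as one independent Bernoulli trial per element (a hypothetical helper, not the actual Flink operator): no knowledge of the total input size is needed, and each element appears at most once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class BernoulliSampler {
    // Sampling with fraction, without replacement: each element is kept
    // independently with probability `fraction`. The sample size is therefore
    // random, with expectation fraction * inputSize.
    static <T> List<T> sample(Iterable<T> input, double fraction, long seed) {
        Random rnd = new Random(seed);
        List<T> sampled = new ArrayList<>();
        for (T element : input) {
            if (rnd.nextDouble() < fraction) {
                sampled.add(element);
            }
        }
        return sampled;
    }
}
```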
[jira] [Comment Edited] (FLINK-1901) Create sample operator for Dataset
[ https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634622#comment-14634622 ] Chengxiang Li edited comment on FLINK-1901 at 7/22/15 2:05 AM: --- To randomly choose a sample from a DataSet S, there exist two kinds of sample requirements: sampling with fraction (such as randomly choosing 5% of the items in S) and sampling with fixed size (such as randomly choosing 100 items from S). Besides, we do not know the size of S unless we take the extra cost to compute it through DataSet::count(). # Sampling with fraction #* With replacement: the expected sample size follows a [Poisson Distribution|https://en.wikipedia.org/wiki/Poisson_distribution] in this case, so Poisson Sampling can be used to choose the sample items. #* Without replacement: during sampling, we can treat each item in the iterator as a [Bernoulli Trial|https://en.wikipedia.org/wiki/Bernoulli_trial]. # Sampling with fixed size #* Use DataSet::count() to get the dataset size; with the fixed size, we can turn this into sampling with fraction. #* [Reservoir Sampling|https://en.wikipedia.org/wiki/Reservoir_sampling] is another commonly used algorithm to randomly choose a sample of k items from a list S containing n items, where n is either very large or unknown, and there are different reservoir sampling algorithms that support both sampling with replacement and sampling without replacement. was (Author: chengxiang li): To randomly choose a sample from a DataSet S, there exist two kinds of sample requirements: sampling with factor (such as randomly choosing 5% of the items in S) and sampling with fixed size (such as randomly choosing 100 items from S). Besides, we do not know the size of S unless we take the extra cost to compute it through DataSet::count(). 
# Sampling with factor #* With replacement: the expected sample size follow [Poisson Distribution|https://en.wikipedia.org/wiki/Poisson_distribution] in this case, so Poisson Sampling can be used to choose the sample items. #* Without replacement: during sampling, we can take the sample of each item in iterator as a [Bernoulli Trial|https://en.wikipedia.org/wiki/Bernoulli_trial]. # Sampling with fixed size #* Use DataSet::count() to get the dataset size, with the fixed size, we can turn this into sampling with factor. #* [Reservoir Sampling|https://en.wikipedia.org/wiki/Reservoir_sampling] is another commonly used algorithm to randomly choose a sample of k items from a list S containing n items, where n is either a very large or unknown number, and there are different reservoir sampling algorithms that support reservoir support both sampling with replacement and sampling without replacement. Create sample operator for Dataset -- Key: FLINK-1901 URL: https://issues.apache.org/jira/browse/FLINK-1901 Project: Flink Issue Type: Improvement Components: Core Reporter: Theodore Vasiloudis Assignee: Chengxiang Li In order to be able to implement Stochastic Gradient Descent and a number of other machine learning algorithms we need to have a way to take a random sample from a Dataset. We need to be able to sample with or without replacement from the Dataset, choose the relative size of the sample, and set a seed for reproducibility. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
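Two of the sampling cases enumerated above can be sketched in a few lines of plain Java (illustrative only; the class and method names are invented and this is not the Flink operator): a Bernoulli trial per element covers "sampling with fraction, without replacement", and Algorithm R reservoir sampling covers "sampling with fixed size, without replacement" when the input size is unknown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class SamplingSketch {

    // Sampling with fraction, without replacement: keep each element
    // with probability `fraction` (an independent Bernoulli trial per element).
    static List<Integer> bernoulliSample(List<Integer> input, double fraction, long seed) {
        Random rnd = new Random(seed);
        List<Integer> out = new ArrayList<>();
        for (Integer item : input) {
            if (rnd.nextDouble() < fraction) {
                out.add(item);
            }
        }
        return out;
    }

    // Sampling with fixed size k, without replacement, from an input of
    // unknown length (Algorithm R, reservoir sampling): after seeing i
    // elements, each of them is in the reservoir with probability k/i.
    static List<Integer> reservoirSample(Iterable<Integer> input, int k, long seed) {
        Random rnd = new Random(seed);
        List<Integer> reservoir = new ArrayList<>(k);
        int i = 0;
        for (Integer item : input) {
            if (i < k) {
                reservoir.add(item);
            } else {
                int j = rnd.nextInt(i + 1); // uniform in [0, i]
                if (j < k) {
                    reservoir.set(j, item);  // replace a random reservoir slot
                }
            }
            i++;
        }
        return reservoir;
    }
}
```

Note the contrast the comment draws: the Bernoulli variant returns a sample whose size is only n·fraction in expectation, while the reservoir variant always returns exactly min(k, n) items.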
[jira] [Commented] (FLINK-1901) Create sample operator for Dataset
[ https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636332#comment-14636332 ] Chengxiang Li commented on FLINK-1901: -- I wrote a simple example of the sampling operator [here|https://github.com/ChengXiangLi/flink/blob/FLINK-1901/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/wordcount/TestSample.java]; it works just as expected inside or outside of an iteration (for example, with 1000 items and a sample fraction of 0.5, after 3 iterations the output contains around 125 items). [~trohrm...@apache.org], I'm not sure whether I understood you correctly about sampling inside an iteration; it looks the same to me if that is the case in the example. Create sample operator for Dataset -- Key: FLINK-1901 URL: https://issues.apache.org/jira/browse/FLINK-1901 Project: Flink Issue Type: Improvement Components: Core Reporter: Theodore Vasiloudis Assignee: Chengxiang Li In order to be able to implement Stochastic Gradient Descent and a number of other machine learning algorithms we need to have a way to take a random sample from a Dataset. We need to be able to sample with or without replacement from the Dataset, choose the relative size of the sample, and set a seed for reproducibility. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
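The figure of "around 125 items" follows from applying the fraction three times: 1000 × 0.5³ = 125 in expectation. A self-contained sketch of that repeated Bernoulli sampling (a hypothetical helper, not the linked TestSample code) shows the same concentration around the expected size:

```java
import java.util.Random;

class IterativeSampling {
    // Apply Bernoulli sampling with the given fraction `iterations` times
    // and return the surviving element count (illustrative, not Flink code).
    static int sampleRepeatedly(int n, double fraction, int iterations, long seed) {
        Random rnd = new Random(seed);
        int count = n;
        for (int it = 0; it < iterations; it++) {
            int survivors = 0;
            for (int i = 0; i < count; i++) {
                if (rnd.nextDouble() < fraction) survivors++; // one trial per item
            }
            count = survivors;
        }
        return count;
    }
}
```

The expected output size is n·fraction^iterations, so the observed count fluctuates around 125 for n = 1000, fraction = 0.5, and 3 iterations.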
[jira] [Created] (FLINK-2391) Storm-compatibility: method FlinkTopologyBuilder.createTopology() throws java.lang.NullPointerException
Huang Wei created FLINK-2391: Summary: Storm-compatibility: method FlinkTopologyBuilder.createTopology() throws java.lang.NullPointerException Key: FLINK-2391 URL: https://issues.apache.org/jira/browse/FLINK-2391 Project: Flink Issue Type: Bug Components: flink-contrib Affects Versions: 0.10 Environment: Windows 7 32-bit; Linux Reporter: Huang Wei Fix For: 0.10 A NullPointerException is thrown at FlinkOutputFieldsDeclarer.java:160 (class FlinkOutputFieldsDeclarer). Code: fieldIndexes[i] = this.outputSchema.fieldIndex(groupingFields.get(i)); On this line, the variable this.outputSchema may be null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
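A defensive fix of the kind the report implies, failing fast with a clear message when the schema was never set, might look like the following simplified sketch (class, field, and method names are stand-ins modeled on the report, not the actual Flink sources):

```java
import java.util.Arrays;
import java.util.List;

class OutputFieldsGuardSketch {
    private final List<String> outputSchema; // may be null if no schema was ever declared

    OutputFieldsGuardSketch(List<String> outputSchema) {
        this.outputSchema = outputSchema;
    }

    // Resolve grouping fields to schema indexes, throwing a descriptive
    // IllegalStateException instead of an opaque NullPointerException
    // when no output schema was declared.
    int[] fieldIndexes(List<String> groupingFields) {
        if (this.outputSchema == null) {
            throw new IllegalStateException(
                "No output schema declared; cannot resolve grouping fields " + groupingFields);
        }
        int[] indexes = new int[groupingFields.size()];
        for (int i = 0; i < indexes.length; i++) {
            indexes[i] = this.outputSchema.indexOf(groupingFields.get(i));
        }
        return indexes;
    }
}
```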
[jira] [Closed] (FLINK-2362) distinct is missing in DataSet API documentation
[ https://issues.apache.org/jira/browse/FLINK-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-2362. --- distinct is missing in DataSet API documentation Key: FLINK-2362 URL: https://issues.apache.org/jira/browse/FLINK-2362 Project: Flink Issue Type: Bug Components: Documentation, Java API, Scala API Affects Versions: 0.9, 0.10 Reporter: Fabian Hueske Assignee: pietro pinoli Fix For: 0.10, 0.9.1 The DataSet transformation {{distinct}} is not described or listed in the documentation. It is not contained in the DataSet API programming guide (https://ci.apache.org/projects/flink/flink-docs-release-0.9/apis/programming_guide.html) and not in the DataSet Transformation (https://ci.apache.org/projects/flink/flink-docs-release-0.9/apis/dataset_transformations.html) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-2376) testFindConnectableAddress sometimes fails on Windows because of the time limit
[ https://issues.apache.org/jira/browse/FLINK-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-2376. - Resolution: Fixed Fix Version/s: 0.10 Fixed via bbbfd22077623633363d656d66a00028e7fea631 testFindConnectableAddress sometimes fails on Windows because of the time limit --- Key: FLINK-2376 URL: https://issues.apache.org/jira/browse/FLINK-2376 Project: Flink Issue Type: Bug Environment: Windows Reporter: Gabor Gevay Assignee: Gabor Gevay Priority: Minor Fix For: 0.10 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-2377) AbstractTestBase.deleteAllTempFiles sometimes fails on Windows
[ https://issues.apache.org/jira/browse/FLINK-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-2377. --- AbstractTestBase.deleteAllTempFiles sometimes fails on Windows -- Key: FLINK-2377 URL: https://issues.apache.org/jira/browse/FLINK-2377 Project: Flink Issue Type: Bug Components: Tests Environment: Windows Reporter: Gabor Gevay Assignee: Gabor Gevay Priority: Minor Fix For: 0.10 This is probably another file closing issue. (that is, Windows won't delete open files, as opposed to Linux) I have encountered two concrete tests so far where this actually appears: CsvOutputFormatITCase and CollectionSourceTest. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-2198) BlobManager tests fail on Windows
[ https://issues.apache.org/jira/browse/FLINK-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-2198. - Resolution: Fixed Assignee: Gabor Gevay Fix Version/s: 0.10 Fixed via 50344d70deb6348923f42a1ddaaf0e570708bc71 BlobManager tests fail on Windows - Key: FLINK-2198 URL: https://issues.apache.org/jira/browse/FLINK-2198 Project: Flink Issue Type: Bug Components: Build System, Tests Affects Versions: 0.9 Environment: Windows 8.1, Java 7, Maven 3.3.3 Reporter: Fabian Hueske Assignee: Gabor Gevay Fix For: 0.10 Building Flink on Windows using {{mvn clean install}} fails with the following error: {code} BlobUtilsTest.before:45 null BlobUtilsTest.before:45 null BlobServerDeleteTest.testDeleteFails:291 null BlobLibraryCacheManagerTest.testRegisterAndDownload:196 Could not remove write permissions from cache directory BlobServerPutTest.testPutBufferFails:224 null BlobServerPutTest.testPutNamedBufferFails:286 null JobManagerStartupTest.before:55 null JobManagerStartupTest.before:55 null DataSinkTaskTest.testFailingDataSinkTask:317 Temp output file has not been removed DataSinkTaskTest.testFailingSortingDataSinkTask:358 Temp output file has not been removed TaskManagerTest.testSubmitAndExecuteTask**:123 assertion failed: timeout (19998080696 nanoseconds) during expectMsgClass waiting for class org.apache.flink.runtime.messages.RegistrationMessages$RegisterTaskManager TaskManagerProcessReapingTest.testReapProcessOnFailure:133 TaskManager process did not launch the TaskManager properly. Failed to look up akka.tcp://flink@127.0.0.1:50673/user/taskmanager {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-2198) BlobManager tests fail on Windows
[ https://issues.apache.org/jira/browse/FLINK-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-2198. --- BlobManager tests fail on Windows - Key: FLINK-2198 URL: https://issues.apache.org/jira/browse/FLINK-2198 Project: Flink Issue Type: Bug Components: Build System, Tests Affects Versions: 0.9 Environment: Windows 8.1, Java 7, Maven 3.3.3 Reporter: Fabian Hueske Assignee: Gabor Gevay Fix For: 0.10 Building Flink on Windows using {{mvn clean install}} fails with the following error: {code} BlobUtilsTest.before:45 null BlobUtilsTest.before:45 null BlobServerDeleteTest.testDeleteFails:291 null BlobLibraryCacheManagerTest.testRegisterAndDownload:196 Could not remove write permissions from cache directory BlobServerPutTest.testPutBufferFails:224 null BlobServerPutTest.testPutNamedBufferFails:286 null JobManagerStartupTest.before:55 null JobManagerStartupTest.before:55 null DataSinkTaskTest.testFailingDataSinkTask:317 Temp output file has not been removed DataSinkTaskTest.testFailingSortingDataSinkTask:358 Temp output file has not been removed TaskManagerTest.testSubmitAndExecuteTask**:123 assertion failed: timeout (19998080696 nanoseconds) during expectMsgClass waiting for class org.apache.flink.runtime.messages.RegistrationMessages$RegisterTaskManager TaskManagerProcessReapingTest.testReapProcessOnFailure:133 TaskManager process did not launch the TaskManager properly. Failed to look up akka.tcp://flink@127.0.0.1:50673/user/taskmanager {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-2018) Add ParameterUtil.fromGenericOptionsParser() for compatibility to Hadoop's argument parser
[ https://issues.apache.org/jira/browse/FLINK-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-2018. --- Add ParameterUtil.fromGenericOptionsParser() for compatibility to Hadoop's argument parser -- Key: FLINK-2018 URL: https://issues.apache.org/jira/browse/FLINK-2018 Project: Flink Issue Type: Improvement Reporter: Robert Metzger Priority: Minor Labels: starter Fix For: 0.10 In FLINK-1525 we've added the {{ParameterTool}}. For users used to Hadoop's {{GenericOptionsParser}} it would be great to provide a compatible parser. See: https://hadoop.apache.org/docs/r1.0.4/api/org/apache/hadoop/util/GenericOptionsParser.html {code} @Test public void testFromGenericOptionsParser() { ParameterUtil parameter = ParameterUtil.fromGenericOptionsParser(new String[]{-D, input=myinput, -DexpectedCount=15}); validate(parameter); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
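A minimal sketch of what GenericOptionsParser-style argument handling involves, accepting both "-D key=value" (separate tokens) and "-Dkey=value" (a single token) and returning a plain map (this is an illustration only, not the proposed ParameterUtil API):

```java
import java.util.HashMap;
import java.util.Map;

class GenericOptionsSketch {
    // Parse Hadoop-style generic options: both "-D key=value" and
    // "-Dkey=value" set the entry key -> value in the result map.
    static Map<String, String> fromArgs(String[] args) {
        Map<String, String> result = new HashMap<>();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            String kv = null;
            if (arg.equals("-D") && i + 1 < args.length) {
                kv = args[++i];            // value is the next token
            } else if (arg.startsWith("-D")) {
                kv = arg.substring(2);     // value is fused into the token
            }
            if (kv != null) {
                int eq = kv.indexOf('=');
                if (eq > 0) {
                    result.put(kv.substring(0, eq), kv.substring(eq + 1));
                }
            }
        }
        return result;
    }
}
```

This mirrors the argument shape in the quoted test case: {-D, input=myinput, -DexpectedCount=15}.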
[jira] [Resolved] (FLINK-2018) Add ParameterUtil.fromGenericOptionsParser() for compatibility to Hadoop's argument parser
[ https://issues.apache.org/jira/browse/FLINK-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-2018. - Resolution: Fixed Fix Version/s: 0.10 Fixed via 148395bcd81a93bcb1473e4e93f267edb3b71c7e Add ParameterUtil.fromGenericOptionsParser() for compatibility to Hadoop's argument parser -- Key: FLINK-2018 URL: https://issues.apache.org/jira/browse/FLINK-2018 Project: Flink Issue Type: Improvement Reporter: Robert Metzger Priority: Minor Labels: starter Fix For: 0.10 In FLINK-1525 we've added the {{ParameterTool}}. For users used to Hadoop's {{GenericOptionsParser}} it would be great to provide a compatible parser. See: https://hadoop.apache.org/docs/r1.0.4/api/org/apache/hadoop/util/GenericOptionsParser.html {code} @Test public void testFromGenericOptionsParser() { ParameterUtil parameter = ParameterUtil.fromGenericOptionsParser(new String[]{-D, input=myinput, -DexpectedCount=15}); validate(parameter); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-2389) Add dashboard frontend architecture and build infrastructure
[ https://issues.apache.org/jira/browse/FLINK-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-2389. - Resolution: Fixed Fixed via d59cebd8c3f643dc6da88924300e78f65f26c640 and ea2b1b139eb221b2b1a81ee776a04e711c3f8af4 (caveat: wrong issue number accidentally committed) Add dashboard frontend architecture and build infrastructure --- Key: FLINK-2389 URL: https://issues.apache.org/jira/browse/FLINK-2389 Project: Flink Issue Type: Sub-task Components: Webfrontend Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.10 Add the build infrastructure and basic libraries for a modern and modular web dashboard. The dashboard is going to be built using angular.js. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-2389) Add dashboard frontend architecture and build infrastructure
Stephan Ewen created FLINK-2389: --- Summary: Add dashboard frontend architecture and build infrastructure Key: FLINK-2389 URL: https://issues.apache.org/jira/browse/FLINK-2389 Project: Flink Issue Type: Sub-task Components: Webfrontend Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.10 Add the build infrastructure and basic libraries for a modern and modular web dashboard. The dashboard is going to be built using angular.js. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2322) Unclosed stream may leak resource
[ https://issues.apache.org/jira/browse/FLINK-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635667#comment-14635667 ] ASF GitHub Bot commented on FLINK-2322: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/928#issuecomment-123453516 You should use tabs ;-) Otherwise this is a good fix. To optimize it: You need not close the stream in two places, the finally clause will do it anyways. Unclosed stream may leak resource - Key: FLINK-2322 URL: https://issues.apache.org/jira/browse/FLINK-2322 Project: Flink Issue Type: Bug Reporter: Ted Yu Labels: starter In UdfAnalyzerUtils.java : {code} ClassReader cr = new ClassReader(Thread.currentThread().getContextClassLoader() .getResourceAsStream(internalClassName.replace('.', '/') + .class)); {code} The stream returned by getResourceAsStream() should be closed upon exit of findMethodNode() In ParameterTool#fromPropertiesFile(): {code} props.load(new FileInputStream(propertiesFile)); {code} The FileInputStream should be closed before returning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
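The fix Stephan describes, closing the stream in one place only, is most naturally written with try-with-resources (Java 7+), which closes the stream whether the block exits normally or with an exception. A sketch of the ParameterTool#fromPropertiesFile case from the report (the class and method here are standalone stand-ins, not the real ParameterTool):

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

class PropertiesLoadSketch {
    // The stream is closed automatically when the try block exits --
    // no explicit close() call and no finally clause needed.
    static Properties fromPropertiesFile(String path) throws IOException {
        Properties props = new Properties();
        try (InputStream in = new FileInputStream(path)) {
            props.load(in);
        }
        return props;
    }
}
```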
[jira] [Updated] (FLINK-2388) JobManager should try retrieving jobs from archive
[ https://issues.apache.org/jira/browse/FLINK-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrique Bautista Barahona updated FLINK-2388: - Description: I was following the quickstart guide with the WordCount example and, when I entered the analyze page for the job, nothing came up. Apparently the JobManagerInfoServlet fails with a NullPointerException. I've been reading the code and I've seen the problem is in the processing of the RequestAccumulatorResultsStringified message in JobManager. There's a TODO where the accumulators should be retrieved from the archive. As I wanted to know more about Flink internals, I decided to try and fix it. I've later seen that there's currently ongoing work in that part of the code, so I guess maybe it's not needed, but if you want I could submit a PR. If you have already taken it into account and will solve it shortly, please feel free to close the issue. If you want to take a look, the commit is here: https://github.com/ebautistabar/flink/commit/8536352b21fb6c78ad8840b0397509df04358c6b was: I was following the quickstart guide with the WordCount example and, when I entered the analyze page for the job, nothing came up. Apparently the JobManagerInfoServlet fails with a NullPointerException. I've been reading the code and I've seen the problem is in the processing of the RequestAccumulatorResultsStringified message in JobManager. There's a TODO where the accumulators should be retrieved from the archive. As I wanted to know more about Flink internals, I decided to try and fix it. I've later seen that there's currently ongoing work in that part of the code, so I guess maybe it's not needed, but if you want I could submit a PR. If you have already taken it into account and will solve it shortly, please feel free to close the issue. 
JobManager should try retrieving jobs from archive -- Key: FLINK-2388 URL: https://issues.apache.org/jira/browse/FLINK-2388 Project: Flink Issue Type: Task Components: JobManager Reporter: Enrique Bautista Barahona I was following the quickstart guide with the WordCount example and, when I entered the analyze page for the job, nothing came up. Apparently the JobManagerInfoServlet fails with a NullPointerException. I've been reading the code and I've seen the problem is in the processing of the RequestAccumulatorResultsStringified message in JobManager. There's a TODO where the accumulators should be retrieved from the archive. As I wanted to know more about Flink internals, I decided to try and fix it. I've later seen that there's currently ongoing work in that part of the code, so I guess maybe it's not needed, but if you want I could submit a PR. If you have already taken it into account and will solve it shortly, please feel free to close the issue. If you want to take a look, the commit is here: https://github.com/ebautistabar/flink/commit/8536352b21fb6c78ad8840b0397509df04358c6b -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1228) Add REST Interface to JobManager
[ https://issues.apache.org/jira/browse/FLINK-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635638#comment-14635638 ] ASF GitHub Bot commented on FLINK-1228: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/297 Add REST Interface to JobManager Key: FLINK-1228 URL: https://issues.apache.org/jira/browse/FLINK-1228 Project: Flink Issue Type: Improvement Reporter: Arvid Heise For rolling out jobs to an external cluster, we currently have 3 choices: a) Manual submission with Web Interface b) Automatic/Manual submission with CLClient c) Automatic submission with custom client I propose to add a way to submit jobs automatically through an HTTP REST interface. Among other benefits, this extension allows an automatic submission of jobs through a restrictive proxy. Rough idea: The web interface would offer a REST entry point, for example /jobs. POSTing to this entry point allows the submission of a new job and returns the job URL. http://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-post-example.html GETting the job URL returns a small status. DELETING the job URL aborts the job. GETting /jobs returns a list of active and scheduled jobs. Since Flink already has a Jetty web server and uses Json for other services, the basic extension should require low effort. It would help Flink to be used inside larger corporations and align the interfaces with the other state-of-the-art MapReduce systems (S3, HDFS, HBase all have HTTP interfaces). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
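The rough idea above (POST to /jobs returns the new job URL, GET lists jobs) can be sketched with the JDK's built-in com.sun.net.httpserver, independent of Flink's actual Jetty setup; every name here is invented for illustration and nothing below is the Flink API:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class JobRestSketch {
    private final ConcurrentHashMap<Integer, String> jobs = new ConcurrentHashMap<>();
    private final AtomicInteger nextId = new AtomicInteger();
    private HttpServer server;

    // Start the server on an ephemeral port and return the port number.
    int start() throws IOException {
        server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/jobs", exchange -> {
            String response;
            if ("POST".equals(exchange.getRequestMethod())) {
                int id = nextId.incrementAndGet();
                jobs.put(id, "SCHEDULED");
                response = "/jobs/" + id;    // POST: submit job, return its URL
            } else {
                response = jobs.toString();  // GET: list jobs and their states
            }
            byte[] body = response.getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) { os.write(body); }
        });
        server.start();
        return server.getAddress().getPort();
    }

    void stop() { server.stop(0); }
}
```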
[jira] [Commented] (FLINK-2376) testFindConnectableAddress sometimes fails on Windows because of the time limit
[ https://issues.apache.org/jira/browse/FLINK-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635633#comment-14635633 ] ASF GitHub Bot commented on FLINK-2376: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/920 testFindConnectableAddress sometimes fails on Windows because of the time limit --- Key: FLINK-2376 URL: https://issues.apache.org/jira/browse/FLINK-2376 Project: Flink Issue Type: Bug Environment: Windows Reporter: Gabor Gevay Assignee: Gabor Gevay Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2362) distinct is missing in DataSet API documentation
[ https://issues.apache.org/jira/browse/FLINK-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635630#comment-14635630 ] ASF GitHub Bot commented on FLINK-2362: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/922 distinct is missing in DataSet API documentation Key: FLINK-2362 URL: https://issues.apache.org/jira/browse/FLINK-2362 Project: Flink Issue Type: Bug Components: Documentation, Java API, Scala API Affects Versions: 0.9, 0.10 Reporter: Fabian Hueske Assignee: pietro pinoli Fix For: 0.10, 0.9.1 The DataSet transformation {{distinct}} is not described or listed in the documentation. It is not contained in the DataSet API programming guide (https://ci.apache.org/projects/flink/flink-docs-release-0.9/apis/programming_guide.html) and not in the DataSet Transformation (https://ci.apache.org/projects/flink/flink-docs-release-0.9/apis/dataset_transformations.html) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2377) AbstractTestBase.deleteAllTempFiles sometimes fails on Windows
[ https://issues.apache.org/jira/browse/FLINK-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635631#comment-14635631 ] ASF GitHub Bot commented on FLINK-2377: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/924 AbstractTestBase.deleteAllTempFiles sometimes fails on Windows -- Key: FLINK-2377 URL: https://issues.apache.org/jira/browse/FLINK-2377 Project: Flink Issue Type: Bug Components: Tests Environment: Windows Reporter: Gabor Gevay Priority: Minor This is probably another file closing issue. (that is, Windows won't delete open files, as opposed to Linux) I have encountered two concrete tests so far where this actually appears: CsvOutputFormatITCase and CollectionSourceTest. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2377] Add reader.close() to readAllResu...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/924 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-2376] Raise the time limit in testFindC...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/920 ---
[GitHub] flink pull request: [FLINK-1938] Add Grunt for building the front-...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/623 ---
[jira] [Commented] (FLINK-1938) Add Grunt for building the front-end
[ https://issues.apache.org/jira/browse/FLINK-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635636#comment-14635636 ] ASF GitHub Bot commented on FLINK-1938: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/623 Add Grunt for building the front-end Key: FLINK-1938 URL: https://issues.apache.org/jira/browse/FLINK-1938 Project: Flink Issue Type: Improvement Components: Build System, Webfrontend Reporter: Vikhyat Korrapati Priority: Minor This is the first step towards implementing the web interface refactoring I proposed last year: https://groups.google.com/forum/#!topic/stratosphere-dev/GeXmDXF9DOY Once this is merged, I can get started with the rest of the refactoring. For now, the actual interface is kept the same, the only change is to how the build is done. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-2377) AbstractTestBase.deleteAllTempFiles sometimes fails on Windows
[ https://issues.apache.org/jira/browse/FLINK-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-2377. - Resolution: Fixed Assignee: Gabor Gevay Fix Version/s: 0.10 Fixed via 29454de57edeeeb8b7ec404bfb8c08cba6e45781 AbstractTestBase.deleteAllTempFiles sometimes fails on Windows -- Key: FLINK-2377 URL: https://issues.apache.org/jira/browse/FLINK-2377 Project: Flink Issue Type: Bug Components: Tests Environment: Windows Reporter: Gabor Gevay Assignee: Gabor Gevay Priority: Minor Fix For: 0.10 This is probably another file closing issue. (that is, Windows won't delete open files, as opposed to Linux) I have encountered two concrete tests so far where this actually appears: CsvOutputFormatITCase and CollectionSourceTest. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: FLINK-2018 Add ParameterUtil.fromGenericOption...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/720 ---
[GitHub] flink pull request: [FLINK-2374] Under Windows, skip tests that us...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/919 ---
[GitHub] flink pull request: [FLINK-2362] - distinct is missing in DataSet ...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/922 ---
[GitHub] flink pull request: [FLINK-1228] Add REST Interface to JobManager
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/297 ---
[GitHub] flink pull request: [FLINK-1943] [gelly] Added GSA compiler and tr...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/916 ---
[GitHub] flink pull request: [FLINK-297] [web frontend] First part of JobMa...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/677 ---
[GitHub] flink pull request: [docs] Document some missing configuration par...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/915 ---
[jira] [Resolved] (FLINK-2362) distinct is missing in DataSet API documentation
[ https://issues.apache.org/jira/browse/FLINK-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-2362. - Resolution: Fixed Fixed in 0.10 via 0818bec02c0bc90bae68f09d8d2e48dc0f6b9374 distinct is missing in DataSet API documentation Key: FLINK-2362 URL: https://issues.apache.org/jira/browse/FLINK-2362 Project: Flink Issue Type: Bug Components: Documentation, Java API, Scala API Affects Versions: 0.9, 0.10 Reporter: Fabian Hueske Assignee: pietro pinoli Fix For: 0.10, 0.9.1 The DataSet transformation {{distinct}} is not described or listed in the documentation. It is not contained in the DataSet API programming guide (https://ci.apache.org/projects/flink/flink-docs-release-0.9/apis/programming_guide.html) and not in the DataSet Transformation (https://ci.apache.org/projects/flink/flink-docs-release-0.9/apis/dataset_transformations.html) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-297) Redesign GUI client-server model
[ https://issues.apache.org/jira/browse/FLINK-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635632#comment-14635632 ] ASF GitHub Bot commented on FLINK-297: -- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/677 Redesign GUI client-server model Key: FLINK-297 URL: https://issues.apache.org/jira/browse/FLINK-297 Project: Flink Issue Type: Improvement Reporter: GitHub Import Labels: github-import Fix For: pre-apache Factor out job manager status information as a REST service running inside the same process. Implement the visualization server as a separate web application that runs on the client side and renders data fetched from the job manager RESTful API. Imported from GitHub Url: https://github.com/stratosphere/stratosphere/issues/297 Created by: [aalexandrov|https://github.com/aalexandrov] Labels: enhancement, gui, Created at: Tue Nov 26 14:54:53 CET 2013 State: open -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1943) Add Gelly-GSA compiler and translation tests
[ https://issues.apache.org/jira/browse/FLINK-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635635#comment-14635635 ] ASF GitHub Bot commented on FLINK-1943: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/916 Add Gelly-GSA compiler and translation tests Key: FLINK-1943 URL: https://issues.apache.org/jira/browse/FLINK-1943 Project: Flink Issue Type: Test Components: Gelly Affects Versions: 0.9 Reporter: Vasia Kalavri Assignee: Vasia Kalavri These should be similar to the corresponding Spargel tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2374) File.setWritable doesn't work on Windows, which makes several tests fail
[ https://issues.apache.org/jira/browse/FLINK-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635634#comment-14635634 ] ASF GitHub Bot commented on FLINK-2374: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/919 File.setWritable doesn't work on Windows, which makes several tests fail Key: FLINK-2374 URL: https://issues.apache.org/jira/browse/FLINK-2374 Project: Flink Issue Type: Bug Environment: Windows Reporter: Gabor Gevay Assignee: Gabor Gevay Priority: Trivial The tests that use setWritable to test the handling of certain error conditions should be skipped in case we are under Windows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
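Skipping the setWritable-based tests described above typically hinges on a simple OS check; a minimal sketch of the detection (in an actual JUnit test the result would feed something like Assume.assumeFalse(OsCheck.isWindows()), which is noted here only as the intended usage, not shown):

```java
class OsCheck {
    // Detect Windows from the JVM's os.name system property; tests that
    // rely on File.setWritable can then skip themselves on Windows.
    static boolean isWindows() {
        return System.getProperty("os.name").toLowerCase().startsWith("windows");
    }
}
```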
[jira] [Commented] (FLINK-1658) Rename AbstractEvent to AbstractTaskEvent and AbstractJobEvent
[ https://issues.apache.org/jira/browse/FLINK-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635862#comment-14635862 ] ASF GitHub Bot commented on FLINK-1658: --- GitHub user mjsax opened a pull request: https://github.com/apache/flink/pull/929 [FLINK-1658] Rename AbstractEvent to AbstractTaskEvent and AbstractJobEvent - deleted AbstractEvent in org.apache.flink.runtime.event.job (see comment section in https://issues.apache.org/jira/browse/FLINK-1658) You can merge this pull request into a Git repository by running: $ git pull https://github.com/mjsax/flink flink-1658-AbstractEvent Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/929.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #929 commit 3cc8e3427fbfb61690417d6d1ee62745e2018662 Author: mjsax mj...@informatik.hu-berlin.de Date: 2015-07-21T19:54:08Z [FLINK-1658] Rename AbstractEvent to AbstractTaskEvent and AbstractJobEvent - deleted AbstractEvent in org.apache.flink.runtime.event.job (see comment section in https://issues.apache.org/jira/browse/FLINK-1658) Rename AbstractEvent to AbstractTaskEvent and AbstractJobEvent -- Key: FLINK-1658 URL: https://issues.apache.org/jira/browse/FLINK-1658 Project: Flink Issue Type: Improvement Components: Distributed Runtime, Local Runtime Reporter: Gyula Fora Assignee: Matthias J. Sax Priority: Trivial The same name is used for different event classes in the runtime which can cause confusion.
[jira] [Created] (FLINK-2390) Replace iteration timeout with algorithm for detecting termination
Gyula Fora created FLINK-2390: - Summary: Replace iteration timeout with algorithm for detecting termination Key: FLINK-2390 URL: https://issues.apache.org/jira/browse/FLINK-2390 Project: Flink Issue Type: New Feature Components: Streaming Reporter: Gyula Fora Fix For: 0.10

Currently the user can set a timeout which shuts down the iteration source/sink nodes if no new data is received during that time, to allow program termination in iterative streaming jobs. This method is used due to the non-trivial nature of termination in iterative streaming jobs. While termination is not a main concern in long-running streaming jobs, this behaviour makes iterative tests non-deterministic, and they often fail on Travis due to the timeout. Setting a timeout can also cause jobs to terminate prematurely. I propose to remove iteration timeouts and replace them with the following algorithm for detecting termination:
- We first identify loop edges in the job graph (the channels from the iteration sources to the head operators).
- Once the head operators (the ones with loop input) finish with all their non-loop inputs, they broadcast a marker to their outputs.
- Each operator broadcasts a marker once it has received a marker from all of its non-finished inputs.
- Iteration sources are terminated when they receive 2 consecutive markers without receiving any record in between.

The idea behind the algorithm is to find out when no more outputs are generated by the operators inside an iteration after their normal inputs are finished.
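The termination rule for the iteration sources can be illustrated with a small stand-alone sketch (plain Java; the class and method names are hypothetical and not Flink API — this only models the "two consecutive markers, no record in between" condition from the proposal):

```java
/**
 * Toy model of the proposed termination rule for an iteration source:
 * terminate after two consecutive markers with no record in between.
 * Illustrative sketch only; names are hypothetical, not Flink API.
 */
public class IterationSourceModel {
    private boolean markerPending = false; // a marker arrived since the last record
    private boolean terminated = false;

    public void onRecord() {
        markerPending = false; // any record resets the consecutive-marker state
    }

    public void onMarker() {
        if (markerPending) {
            terminated = true; // second consecutive marker: the loop produced no more output
        }
        markerPending = true;
    }

    public boolean isTerminated() {
        return terminated;
    }

    public static void main(String[] args) {
        IterationSourceModel src = new IterationSourceModel();
        src.onRecord();
        src.onMarker();
        src.onMarker(); // two markers back to back -> terminate
        System.out.println(src.isTerminated()); // prints "true"
    }
}
```

A record arriving between the two markers would reset the state, which is exactly why the algorithm avoids the premature termination that a plain timeout can cause.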
[jira] [Commented] (FLINK-2363) Add an end-to-end overview of program execution in Flink to the docs
[ https://issues.apache.org/jira/browse/FLINK-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635786#comment-14635786 ] ASF GitHub Bot commented on FLINK-2363: --- Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/913#issuecomment-123472145 Hi @aljoscha, I haven't started and could do it but you can also give it a shot if you'd like to :smile: Add an end-to-end overview of program execution in Flink to the docs Key: FLINK-2363 URL: https://issues.apache.org/jira/browse/FLINK-2363 Project: Flink Issue Type: Improvement Components: Documentation Reporter: Stephan Ewen
[jira] [Commented] (FLINK-2304) Add named attribute access to Storm compatibility layer
[ https://issues.apache.org/jira/browse/FLINK-2304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635883#comment-14635883 ] ASF GitHub Bot commented on FLINK-2304: --- Github user mjsax commented on the pull request: https://github.com/apache/flink/pull/878#issuecomment-123489145 Just rebased because #879 was merged today. This PR should be ready to get merged now. Add named attribute access to Storm compatibility layer --- Key: FLINK-2304 URL: https://issues.apache.org/jira/browse/FLINK-2304 Project: Flink Issue Type: Improvement Components: flink-contrib Reporter: Matthias J. Sax Assignee: Matthias J. Sax Priority: Minor Currently, Bolts running in Flink can access fields only by index. Enabling named index access is possible for whole topologies and Tuple-type as well as POJO type inputs for embedded Bolts.
[jira] [Assigned] (FLINK-1267) Add crossGroup operator
[ https://issues.apache.org/jira/browse/FLINK-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] pietro pinoli reassigned FLINK-1267: Assignee: pietro pinoli

Add crossGroup operator --- Key: FLINK-1267 URL: https://issues.apache.org/jira/browse/FLINK-1267 Project: Flink Issue Type: New Feature Components: Java API, Local Runtime, Optimizer, Scala API Affects Versions: 0.7.0-incubating Reporter: Fabian Hueske Assignee: pietro pinoli Priority: Minor

A common operation is to pair-wise compare or combine all elements of a group (there were two questions about this on the user mailing list recently). Right now, this can be done in two ways:
1. {{groupReduce}}: consume and store the complete iterator in memory and build all pairs.
2. do a self-{{Join}}: the engine builds all pairs of the full symmetric Cartesian product.

Both approaches have drawbacks. The {{groupReduce}} variant requires that the full group fits into memory and is more cumbersome to implement for the user, but pairs can be built arbitrarily. The self-{{Join}} approach pushes most of the work into the system, but the execution strategy does not treat the self-join differently from a regular join (both identical inputs are shuffled, etc.) and always builds the full symmetric Cartesian product. I propose to add a dedicated {{crossGroup()}} operator that offers this functionality in a proper way.
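The {{groupReduce}} variant described above — buffering the whole group and emitting every pair — can be sketched outside of Flink in plain Java (a hypothetical helper, only to show the quadratic pair construction and the memory drawback, not the eventual {{crossGroup()}} API):

```java
import java.util.ArrayList;
import java.util.List;

public class GroupPairs {
    /**
     * Builds all unordered pairs (i < j) of one group's elements, as a
     * groupReduce UDF would after materializing the iterator in memory.
     * A group of n elements needs O(n) memory and yields n*(n-1)/2 pairs --
     * exactly the drawback a dedicated crossGroup() operator could address.
     */
    static <T> List<List<T>> allPairs(List<T> group) {
        List<List<T>> pairs = new ArrayList<>();
        for (int i = 0; i < group.size(); i++) {
            for (int j = i + 1; j < group.size(); j++) {
                pairs.add(List.of(group.get(i), group.get(j)));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(allPairs(List.of("a", "b", "c")).size()); // prints "3"
    }
}
```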
[jira] [Commented] (FLINK-2371) AccumulatorLiveITCase fails
[ https://issues.apache.org/jira/browse/FLINK-2371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634755#comment-14634755 ] ASF GitHub Bot commented on FLINK-2371: --- Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/925#discussion_r35079328

--- Diff: flink-tests/src/test/java/org/apache/flink/test/accumulators/AccumulatorLiveITCase.java ---
@@ -109,125 +123,157 @@ public void testProgram() throws Exception {
 ExecutionEnvironment env = new PlanExtractor();
 DataSet<String> input = env.fromCollection(inputData);
 input
-    .flatMap(new Tokenizer())
     .flatMap(new WaitingUDF())
     .output(new WaitingOutputFormat());
 env.execute();
-/** Extract job graph **/
+// Extract job graph and set job id for the task to notify of accumulator changes.
 JobGraph jobGraph = getOptimizedPlan(((PlanExtractor) env).plan);
+jobID = jobGraph.getJobID();
+
+// register for accumulator changes
+jobManager.tell(new TestingJobManagerMessages.NotifyWhenAccumulatorChange(jobID), getRef());
+expectMsgEquals(true);
--- End diff --

When you wait for messages then this should always happen within a `within` block. Otherwise this might never terminate in a failure case.

AccumulatorLiveITCase fails --- Key: FLINK-2371 URL: https://issues.apache.org/jira/browse/FLINK-2371 Project: Flink Issue Type: Bug Components: Tests Reporter: Matthias J. Sax Assignee: Maximilian Michels AccumulatorLiveITCase fails regularly (however, not in each run). The test relies on timing (via sleep), which does not work well on Travis. See dev-list for more details: https://mail-archives.apache.org/mod_mbox/flink-dev/201507.mbox/browser AccumulatorLiveITCase.testProgram:106-access$1100:68-checkFlinkAccumulators:189
[jira] [Commented] (FLINK-2371) AccumulatorLiveITCase fails
[ https://issues.apache.org/jira/browse/FLINK-2371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634756#comment-14634756 ] ASF GitHub Bot commented on FLINK-2371: --- Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/925#discussion_r35079360

--- Diff: flink-tests/src/test/java/org/apache/flink/test/accumulators/AccumulatorLiveITCase.java ---
@@ -109,125 +123,157 @@ public void testProgram() throws Exception {
 ExecutionEnvironment env = new PlanExtractor();
 DataSet<String> input = env.fromCollection(inputData);
 input
-    .flatMap(new Tokenizer())
     .flatMap(new WaitingUDF())
     .output(new WaitingOutputFormat());
 env.execute();
-/** Extract job graph **/
+// Extract job graph and set job id for the task to notify of accumulator changes.
 JobGraph jobGraph = getOptimizedPlan(((PlanExtractor) env).plan);
+jobID = jobGraph.getJobID();
+
+// register for accumulator changes
+jobManager.tell(new TestingJobManagerMessages.NotifyWhenAccumulatorChange(jobID), getRef());
+expectMsgEquals(true);
+
 // submit job
 jobManager.tell(new JobManagerMessages.SubmitJob(jobGraph, false), getRef());
 expectMsgClass(Status.Success.class);
--- End diff --

Same here. Best if you nest all actor interaction in a `within` block.
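The reviewer's point — an unbounded wait can hang a failing test forever — can be illustrated with a stdlib-only Java sketch (Akka's `within` block gives the same guarantee for actor messages; `BoundedExpect` and its queue-based inbox are hypothetical stand-ins, not the test kit API):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class BoundedExpect {
    /** Waits for an expected message, but never longer than the timeout. */
    static <T> T expectWithin(BlockingQueue<T> inbox, long timeoutMillis)
            throws InterruptedException {
        T msg = inbox.poll(timeoutMillis, TimeUnit.MILLISECONDS); // bounded, unlike take()
        if (msg == null) {
            throw new AssertionError("no message within " + timeoutMillis + " ms");
        }
        return msg;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> inbox = new LinkedBlockingQueue<>();
        inbox.add("Success");
        System.out.println(expectWithin(inbox, 100)); // prints "Success"
    }
}
```

With the timeout, a missing message fails the test promptly instead of blocking it indefinitely — the failure mode the `within` block is meant to rule out.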
[jira] [Commented] (FLINK-2371) AccumulatorLiveITCase fails
[ https://issues.apache.org/jira/browse/FLINK-2371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634763#comment-14634763 ] ASF GitHub Bot commented on FLINK-2371: --- Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/925#discussion_r35079637

--- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala ---
@@ -400,20 +400,16 @@ class JobManager(
 import scala.collection.JavaConverters._
 sender ! RegisteredTaskManagers(instanceManager.getAllRegisteredInstances.asScala)

-case Heartbeat(instanceID, metricsReport, accumulators) =>
+case Heartbeat(instanceID, metricsReport, accumulators, asyncAccumulatorUpdate) =>
 log.debug(s"Received hearbeat message from $instanceID.")
-  Future {
-    accumulators foreach {
-      case accumulators =>
-        currentJobs.get(accumulators.getJobID) match {
-          case Some((jobGraph, jobInfo)) =>
-            jobGraph.updateAccumulators(accumulators)
-          case None =>
-            // ignore accumulator values for old job
-        }
-    }
-  }(context.dispatcher)
+  if (asyncAccumulatorUpdate) {
--- End diff --

Mixing test code in your production code is IMO bad style. If you need synchronous execution, then start the actor with a `CallingThreadDispatcher` or overwrite this message in the `TestingJobManager`.
[jira] [Commented] (FLINK-1901) Create sample operator for Dataset
[ https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634622#comment-14634622 ] Chengxiang Li commented on FLINK-1901: -- To randomly choose a sample from a DataSet S, there exist two kinds of sampling requirement: sampling with a factor (such as randomly choosing 5 percent of the items in S) and sampling with a fixed size (such as randomly choosing 100 items from S). Besides, we do not know the size of S unless we take extra cost to compute it through DataSet::count().
# Sampling with a factor
#* With replacement: the expected sample size follows a [Poisson Distribution|https://en.wikipedia.org/wiki/Poisson_distribution] in this case, so Poisson sampling can be used to choose the sample items.
#* Without replacement: during sampling, we can treat the sampling of each item in the iterator as a [Bernoulli Trial|https://en.wikipedia.org/wiki/Bernoulli_trial].
# Sampling with a fixed size
#* Use DataSet::count() to get the dataset size; with the fixed size, we can turn this into sampling with a factor.
#* [Reservoir Sampling|https://en.wikipedia.org/wiki/Reservoir_sampling] is another commonly used algorithm to randomly choose a sample of k items from a list S containing n items, where n is either very large or unknown; there are different reservoir sampling algorithms that support both sampling with replacement and sampling without replacement.

Create sample operator for Dataset -- Key: FLINK-1901 URL: https://issues.apache.org/jira/browse/FLINK-1901 Project: Flink Issue Type: Improvement Components: Core Reporter: Theodore Vasiloudis In order to be able to implement Stochastic Gradient Descent and a number of other machine learning algorithms, we need a way to take a random sample from a Dataset. We need to be able to sample with or without replacement from the Dataset, choose the relative size of the sample, and set a seed for reproducibility.
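Reservoir sampling (Algorithm R) — the fixed-size, single-pass option mentioned above — can be sketched in plain Java (a hypothetical helper for illustration, not the eventual Flink operator):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

public class ReservoirSampler {
    /**
     * Algorithm R: uniform sample of k items without replacement from an
     * iterator of unknown length, in one pass and O(k) memory.
     */
    static <T> List<T> sample(Iterator<T> in, int k, Random rnd) {
        List<T> reservoir = new ArrayList<>(k);
        long n = 0; // items seen so far
        while (in.hasNext()) {
            T item = in.next();
            n++;
            if (reservoir.size() < k) {
                reservoir.add(item); // fill the reservoir with the first k items
            } else {
                long j = (long) (rnd.nextDouble() * n); // uniform index in [0, n)
                if (j < k) {
                    reservoir.set((int) j, item); // keep item with probability k/n
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            data.add(i);
        }
        List<Integer> s = sample(data.iterator(), 10, new Random(42));
        System.out.println(s.size()); // prints "10"
    }
}
```

The seeded `Random` gives the reproducibility the issue asks for; variants of the algorithm also cover sampling with replacement.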
[jira] [Commented] (FLINK-2371) AccumulatorLiveITCase fails
[ https://issues.apache.org/jira/browse/FLINK-2371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634766#comment-14634766 ] ASF GitHub Bot commented on FLINK-2371: --- Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/925#issuecomment-123212399 Is it bad that the accumulators are not serialized in a local setup? If not and you only want to test the serialization in a distributed setting, then you can also simply start the testing cluster with multiple `ActorSystems`. This should enforce serialization, if I'm not mistaken.
[GitHub] flink pull request: [FLINK-2371] improve AccumulatorLiveITCase
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/925#issuecomment-123213630 In the master, I only serialize the user-defined accumulators but not the Flink accumulators. As of this pull request, the accumulators are always serialized even in local setups. Starting multiple actor systems did not solve the problem because Akka always optimizes message passing in local one VM setups, i.e. it does not serialize objects even across actor systems. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-2371] improve AccumulatorLiveITCase
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/925#discussion_r35086920

--- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/minicluster/FlinkMiniCluster.scala ---
@@ -51,10 +51,13 @@ import scala.concurrent.{ExecutionContext, Future, Await}
 abstract class FlinkMiniCluster(
     val userConfiguration: Configuration,
     val singleActorSystem: Boolean,
+    val synchronousDispatcher: Boolean,
     val streamingMode: StreamingMode) {

-  def this(userConfiguration: Configuration, singleActorSystem: Boolean)
-    = this(userConfiguration, singleActorSystem, StreamingMode.BATCH_ONLY)
+  def this(userConfiguration: Configuration,
+      singleActorSystem: Boolean,
+      synchronousDispatcher: Boolean)
+    = this(userConfiguration, singleActorSystem, synchronousDispatcher, StreamingMode.BATCH_ONLY)
--- End diff --

I think the field `synchronousDispatcher` should not be part of the `FlinkMiniCluster`. The field seems to be only used for testing, referenced in `TestingCluster`, but the `FlinkMiniCluster` is also used to execute Flink programs locally.
[jira] [Commented] (FLINK-2371) AccumulatorLiveITCase fails
[ https://issues.apache.org/jira/browse/FLINK-2371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634902#comment-14634902 ] ASF GitHub Bot commented on FLINK-2371: --- Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/925#discussion_r35087953 Good catch. I noticed that as well and already changed that before I saw your answer. Now only the `TestingCluster` is affected by the changes.
[GitHub] flink pull request: [FLINK-1967] Introduce (Event)time in Streamin...
Github user aljoscha closed the pull request at: https://github.com/apache/flink/pull/879
[GitHub] flink pull request: [FLINK-1967] Introduce (Event)time in Streamin...
Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/906#issuecomment-123260957

    Manually merged
[jira] [Resolved] (FLINK-2295) TwoInput Task do not react to/forward checkpoint barriers
[ https://issues.apache.org/jira/browse/FLINK-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aljoscha Krettek resolved FLINK-2295.
Resolution: Fixed
Fix Version/s: 0.10

Fixed in a2eb6cc8774ab43475829b0b691e62739fbbe88b

TwoInput Task do not react to/forward checkpoint barriers
Key: FLINK-2295
URL: https://issues.apache.org/jira/browse/FLINK-2295
Project: Flink
Issue Type: Bug
Components: Streaming
Affects Versions: 0.9, 0.10, 0.9.1
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek
Priority: Blocker
Fix For: 0.10

The event listener for the checkpoint barriers was never enabled for TwoInput tasks. I have a fix for it and also tests that verify that it actually works.
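The bug above means a two-input task never reacted to barriers at all. The essential behavior it was missing can be sketched as follows; this is a simplified, hypothetical illustration of barrier alignment, not Flink's actual task code:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: a two-input task must listen for checkpoint barriers
// on *both* input gates and forward the barrier downstream only once it has
// arrived on every input (barrier alignment).
class TwoInputBarrierSketch {
    private final boolean[] barrierSeen = new boolean[2];
    private long currentCheckpoint = -1;
    final List<Long> forwarded = new ArrayList<>();

    // Called when a barrier for `checkpointId` arrives on input 0 or 1.
    void onBarrier(int inputIndex, long checkpointId) {
        if (checkpointId != currentCheckpoint) {
            // A barrier for a new checkpoint starts a fresh alignment round.
            currentCheckpoint = checkpointId;
            barrierSeen[0] = barrierSeen[1] = false;
        }
        barrierSeen[inputIndex] = true;
        if (barrierSeen[0] && barrierSeen[1]) {
            forwarded.add(checkpointId); // all inputs aligned: forward downstream
        }
    }
}
```

If the listener is never registered (the bug in FLINK-2295), `onBarrier` is simply never called and barriers are neither acted upon nor forwarded.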
[jira] [Commented] (FLINK-1967) Introduce (Event)time in Streaming
[ https://issues.apache.org/jira/browse/FLINK-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634934#comment-14634934 ]

ASF GitHub Bot commented on FLINK-1967:

Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/906#issuecomment-123260957

    Manually merged

Introduce (Event)time in Streaming
Key: FLINK-1967
URL: https://issues.apache.org/jira/browse/FLINK-1967
Project: Flink
Issue Type: Improvement
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek

This requires introducing a timestamp in the streaming record and a change in the sources to add timestamps to records. This will also introduce punctuations (or low watermarks) to allow windows to work correctly on unordered, timestamped input data. In the process of this, the windowing subsystem also needs to be adapted to use the punctuations. Furthermore, all operators need to be made aware of punctuations and correctly forward them. Then, a new operator must be introduced to allow modification of timestamps.
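The requirement above that all operators "correctly forward" punctuations has a simple core: an operator with several inputs may only advance its event time to the minimum watermark across all inputs, since the slower input can still deliver earlier elements. A hypothetical sketch of that rule (illustrative names, not Flink's API):

```java
import java.util.Arrays;

// Illustrative sketch: track the latest watermark per input; the operator's
// own watermark is the minimum over all inputs, and only that minimum may be
// forwarded downstream.
class WatermarkTracker {
    private final long[] inputWatermarks;

    WatermarkTracker(int numInputs) {
        inputWatermarks = new long[numInputs];
        Arrays.fill(inputWatermarks, Long.MIN_VALUE); // no watermark seen yet
    }

    // Record a watermark on one input; return the combined watermark the
    // operator may emit (watermarks never move backwards per input).
    long onWatermark(int input, long watermark) {
        inputWatermarks[input] = Math.max(inputWatermarks[input], watermark);
        long min = Long.MAX_VALUE;
        for (long w : inputWatermarks) {
            min = Math.min(min, w);
        }
        return min;
    }
}
```

Until every input has reported a watermark, the combined value stays at `Long.MIN_VALUE`, so no window downstream fires prematurely.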
[jira] [Commented] (FLINK-1967) Introduce (Event)time in Streaming
[ https://issues.apache.org/jira/browse/FLINK-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634937#comment-14634937 ]

ASF GitHub Bot commented on FLINK-1967:

Github user aljoscha closed the pull request at: https://github.com/apache/flink/pull/879
[jira] [Resolved] (FLINK-2301) In BarrierBuffer newer Barriers trigger old Checkpoints
[ https://issues.apache.org/jira/browse/FLINK-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aljoscha Krettek resolved FLINK-2301.
Resolution: Fixed
Fix Version/s: 0.10

Fixed in a2eb6cc8774ab43475829b0b691e62739fbbe88b

In BarrierBuffer newer Barriers trigger old Checkpoints
Key: FLINK-2301
URL: https://issues.apache.org/jira/browse/FLINK-2301
Project: Flink
Issue Type: Bug
Components: Streaming
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek
Fix For: 0.10

When the BarrierBuffer has some inputs blocked on barrier 0 and then receives barriers for barrier 1 on the other inputs, this makes the BarrierBuffer process the checkpoint with id 0. I think the BarrierBuffer should drop all previously pending checkpoints when it receives a barrier from a more recent checkpoint and unblock the previously blocked channels. This makes it ready to correctly react to the other barriers of the newer checkpoint. It should also ignore barriers that arrive late, once a more recent checkpoint has already been processed.
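The fix proposed above can be sketched in a few lines; this is a hypothetical simplification (it counts barriers instead of tracking blocked channels, and the names are not Flink's), but it shows the two rules: a newer barrier supersedes the pending checkpoint, and late barriers for superseded checkpoints are ignored:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the barrier-buffer fix: only the newest checkpoint
// is ever pending; older pending state is abandoned when a newer barrier
// arrives, and stragglers from superseded checkpoints are dropped.
class BarrierBufferSketch {
    private final int numInputs;
    private long pendingCheckpoint = -1;
    private int barriersReceived;
    final List<Long> completed = new ArrayList<>();

    BarrierBufferSketch(int numInputs) { this.numInputs = numInputs; }

    void onBarrier(long checkpointId) {
        if (checkpointId < pendingCheckpoint) {
            return; // late barrier from a superseded checkpoint: ignore
        }
        if (checkpointId > pendingCheckpoint) {
            // Newer checkpoint supersedes the pending one: drop old state
            // (in the real buffer, also unblock the blocked channels).
            pendingCheckpoint = checkpointId;
            barriersReceived = 0;
        }
        if (++barriersReceived == numInputs) {
            completed.add(checkpointId); // all inputs aligned: trigger checkpoint
        }
    }
}
```

Without the `checkpointId > pendingCheckpoint` reset, barriers from checkpoint 1 would be counted toward checkpoint 0, which is exactly the bug described.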
[GitHub] flink pull request: [FLINK-1967] Introduce (Event)time in Streamin...
Github user aljoscha closed the pull request at: https://github.com/apache/flink/pull/906
[jira] [Closed] (FLINK-2290) CoRecordReader Does Not Read Events From Both Inputs When No Elements Arrive
[ https://issues.apache.org/jira/browse/FLINK-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aljoscha Krettek closed FLINK-2290.
Resolution: Fixed
Fix Version/s: 0.10

Fixed in a2eb6cc8774ab43475829b0b691e62739fbbe88b

CoRecordReader Does Not Read Events From Both Inputs When No Elements Arrive
Key: FLINK-2290
URL: https://issues.apache.org/jira/browse/FLINK-2290
Project: Flink
Issue Type: Bug
Components: Streaming
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek
Priority: Blocker
Fix For: 0.10

When no elements arrive, the reader will always try to read from the same input index. This means that it will only process elements from this input. This could be problematic with Watermarks/Checkpoint barriers.
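The failure mode above is starvation: if the reader always starts polling at input 0, events queued on input 1 (such as watermarks or barriers) are never seen while input 0 is idle. A hypothetical sketch of the obvious remedy, rotating which input is polled first (not Flink's actual reader code):

```java
// Illustrative sketch: rotate the input index between polls so that both
// inputs are serviced even when one of them delivers no data elements.
class AlternatingReaderSketch {
    private int nextToCheck = 0;

    // Returns the index of the input to poll next, cycling 0, 1, 0, 1, ...
    int nextInput(int numInputs) {
        int idx = nextToCheck;
        nextToCheck = (nextToCheck + 1) % numInputs;
        return idx;
    }
}
```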
[GitHub] flink pull request: [FLINK-1967] Introduce (Event)time in Streamin...
Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/879#issuecomment-123261017

    Superseded by #906
[jira] [Commented] (FLINK-1967) Introduce (Event)time in Streaming
[ https://issues.apache.org/jira/browse/FLINK-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634935#comment-14634935 ]

ASF GitHub Bot commented on FLINK-1967:

Github user aljoscha closed the pull request at: https://github.com/apache/flink/pull/906
[jira] [Closed] (FLINK-1967) Introduce (Event)time in Streaming
[ https://issues.apache.org/jira/browse/FLINK-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aljoscha Krettek closed FLINK-1967.
Resolution: Fixed

Implemented in a2eb6cc8774ab43475829b0b691e62739fbbe88b
[jira] [Closed] (FLINK-2295) TwoInput Task do not react to/forward checkpoint barriers
[ https://issues.apache.org/jira/browse/FLINK-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aljoscha Krettek closed FLINK-2295.
[jira] [Closed] (FLINK-2301) In BarrierBuffer newer Barriers trigger old Checkpoints
[ https://issues.apache.org/jira/browse/FLINK-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aljoscha Krettek closed FLINK-2301.
[jira] [Commented] (FLINK-1967) Introduce (Event)time in Streaming
[ https://issues.apache.org/jira/browse/FLINK-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634936#comment-14634936 ]

ASF GitHub Bot commented on FLINK-1967:

Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/879#issuecomment-123261017

    Superseded by #906
[jira] [Commented] (FLINK-1901) Create sample operator for Dataset
[ https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634664#comment-14634664 ]

Till Rohrmann commented on FLINK-1901:

Hi Chengxiang, good to hear that you want to work on this. I can assign you the ticket. However, it is not only about the sampling strategy but also about the integration within Flink. The reason is that we have to make sure that the sampling operator also works within iterations. This means that it has to be part of the dynamic path so that it is triggered anew for every iteration. This will need a special operator type. But you can start with the sampling strategies and then continue with the iteration integration.

Create sample operator for Dataset
Key: FLINK-1901
URL: https://issues.apache.org/jira/browse/FLINK-1901
Project: Flink
Issue Type: Improvement
Components: Core
Reporter: Theodore Vasiloudis

In order to be able to implement Stochastic Gradient Descent and a number of other machine learning algorithms, we need to have a way to take a random sample from a Dataset. We need to be able to sample with or without replacement from the Dataset, choose the relative size of the sample, and set a seed for reproducibility.
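The two sampling modes the issue asks for (with/without replacement, relative size, fixed seed) can be sketched locally before worrying about the operator integration. This is a hypothetical illustration, not the eventual Flink operator API; `sampleWithoutReplacement` uses simple Bernoulli sampling, where each element is kept independently with the given fraction:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative sampling sketch: a fixed seed makes both variants reproducible.
class SampleSketch {
    // Without replacement: keep each element independently with probability `fraction`.
    static <T> List<T> sampleWithoutReplacement(List<T> data, double fraction, long seed) {
        Random rnd = new Random(seed);
        List<T> out = new ArrayList<>();
        for (T t : data) {
            if (rnd.nextDouble() < fraction) {
                out.add(t);
            }
        }
        return out;
    }

    // With replacement: draw `n` elements uniformly; duplicates are allowed.
    static <T> List<T> sampleWithReplacement(List<T> data, int n, long seed) {
        Random rnd = new Random(seed);
        List<T> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            out.add(data.get(rnd.nextInt(data.size())));
        }
        return out;
    }
}
```

As the comment notes, the harder part in Flink is making such an operator part of the dynamic path so it re-draws the sample in every iteration; the strategies themselves are the easy half.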
[jira] [Created] (FLINK-2383) Add new Window Operator that can use Timestamps/Watermarks
Aljoscha Krettek created FLINK-2383:

Summary: Add new Window Operator that can use Timestamps/Watermarks
Key: FLINK-2383
URL: https://issues.apache.org/jira/browse/FLINK-2383
Project: Flink
Issue Type: Improvement
Components: Streaming
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek

This should mostly be based on the design documents in the Wiki:
https://cwiki.apache.org/confluence/display/FLINK/Streams+and+Operations+on+Streams
https://cwiki.apache.org/confluence/display/FLINK/Time+and+Order+in+Streams
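The core mechanic such a window operator needs is small: buffer elements by the window their timestamp falls into, and fire a window once a watermark passes its end. A hypothetical sketch of tumbling event-time windows (illustrative only, not the design from the linked documents; assumes non-negative timestamps):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Illustrative sketch: elements are grouped into tumbling windows by
// timestamp; a window is emitted when a watermark reaches its end, so
// out-of-order (but not late) elements still land in the right window.
class EventTimeWindowSketch {
    private final long windowSize;
    private final TreeMap<Long, List<Long>> windows = new TreeMap<>();
    final List<List<Long>> fired = new ArrayList<>();

    EventTimeWindowSketch(long windowSize) { this.windowSize = windowSize; }

    void onElement(long timestamp) {
        long windowStart = timestamp - (timestamp % windowSize);
        windows.computeIfAbsent(windowStart, k -> new ArrayList<>()).add(timestamp);
    }

    // Fire every buffered window whose end is at or below the watermark.
    void onWatermark(long watermark) {
        while (!windows.isEmpty() && windows.firstKey() + windowSize <= watermark) {
            fired.add(windows.pollFirstEntry().getValue());
        }
    }
}
```

The operator never fires on element arrival; only the watermark, carrying the guarantee that no earlier elements are still in flight, triggers emission.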
[jira] [Assigned] (FLINK-1658) Rename AbstractEvent to AbstractTaskEvent and AbstractJobEvent
[ https://issues.apache.org/jira/browse/FLINK-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias J. Sax reassigned FLINK-1658:
Assignee: Matthias J. Sax

Rename AbstractEvent to AbstractTaskEvent and AbstractJobEvent
Key: FLINK-1658
URL: https://issues.apache.org/jira/browse/FLINK-1658
Project: Flink
Issue Type: Improvement
Components: Distributed Runtime, Local Runtime
Reporter: Gyula Fora
Assignee: Matthias J. Sax
Priority: Trivial

The same name is used for different event classes in the runtime, which can cause confusion.
[GitHub] flink pull request: [FLINK-2371] improve AccumulatorLiveITCase
Github user mxm commented on the pull request:

    https://github.com/apache/flink/pull/925#issuecomment-123283973

    Thanks for the feedback. Merging this later on if there are no objections.
[jira] [Comment Edited] (FLINK-1658) Rename AbstractEvent to AbstractTaskEvent and AbstractJobEvent
[ https://issues.apache.org/jira/browse/FLINK-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635006#comment-14635006 ]

Matthias J. Sax edited comment on FLINK-1658 at 7/21/15 12:02 PM:

I still see `org.apache.flink.runtime.event.job.AbstractEvent`. However, there is no sub-class for it. I guess it can be deleted. However, there is:
- `org.apache.flink.runtime.event.task.AbstractEvent`
- `org.apache.flink.runtime.event.task.TaskEvent` (which is actually abstract)
- `org.apache.flink.runtime.event.task.RuntimeEvent` (which is actually abstract)

Maybe TaskEvent and RuntimeEvent should be renamed into AbstractTaskEvent and AbstractRuntimeEvent? However, I just discovered that TaskEvent was introduced in d908ca19741bf2561cb6a7663541f642e60c0e6d as a renaming of AbstractTaskEvent (for whatever reason, an abstract class was renamed to not start with Abstract?).