[jira] [Commented] (FLINK-1396) Add hadoop input formats directly to the user API.
[ https://issues.apache.org/jira/browse/FLINK-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307154#comment-14307154 ]

ASF GitHub Bot commented on FLINK-1396:
---

Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/363#issuecomment-73040193

    Does this occur during local execution, or collection execution? The dependencies are not covered by the runtime dependencies?

> Add hadoop input formats directly to the user API.
> --------------------------------------------------
>
> Key: FLINK-1396
> URL: https://issues.apache.org/jira/browse/FLINK-1396
> Project: Flink
> Issue Type: Bug
> Reporter: Robert Metzger
> Assignee: Aljoscha Krettek

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (FLINK-1396) Add hadoop input formats directly to the user API.
[ https://issues.apache.org/jira/browse/FLINK-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307162#comment-14307162 ]

ASF GitHub Bot commented on FLINK-1396:
---

Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/363#issuecomment-73040767

    I think when executing it in an IDE the dependencies are not there, since flink-java does not depend on flink-runtime, which has the hadoop dependencies.
[GitHub] flink pull request: [FLINK-1396][FLINK-1303] Hadoop Input/Output d...
Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/363#issuecomment-73040767

    I think when executing it in an IDE the dependencies are not there, since flink-java does not depend on flink-runtime, which has the hadoop dependencies.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
[jira] [Commented] (FLINK-883) Use MiniYARNCluster to test the Flink YARN client
[ https://issues.apache.org/jira/browse/FLINK-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307210#comment-14307210 ]

ASF GitHub Bot commented on FLINK-883:
---

Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/134#issuecomment-73046443

    I guess this one is subsumed by @rmetzger's work on https://issues.apache.org/jira/browse/FLINK-883
    Can we close this PR?

> Use MiniYARNCluster to test the Flink YARN client
> -------------------------------------------------
>
> Key: FLINK-883
> URL: https://issues.apache.org/jira/browse/FLINK-883
> Project: Flink
> Issue Type: Improvement
> Components: YARN Client
> Affects Versions: 0.7.0-incubating
> Reporter: Robert Metzger
> Assignee: Sebastian Kunert
> Labels: github-import
> Fix For: 0.9
>
> I would like to have a test that verifies that our YARN client is properly working, using the `MiniYARNCluster` class of YARN.
> Imported from GitHub. Url: https://github.com/stratosphere/stratosphere/issues/883
> Created by: [rmetzger|https://github.com/rmetzger]
> Labels: enhancement, testing, YARN
> Created at: Thu May 29 09:13:06 CEST 2014
> State: open
[GitHub] flink pull request: [FLINK-1376] [runtime] Add proper shared slot ...
Github user tillrohrmann commented on the pull request:

    https://github.com/apache/flink/pull/317#issuecomment-73046225

    Of course. Thanks for merging.

    On Thu, Feb 5, 2015 at 2:30 PM, Stephan Ewen <notificati...@github.com> wrote:

    > This one has been manually merged (but I made a typo in the "This closes #xxx" message ;-) )
    > @tillrohrmann https://github.com/tillrohrmann Can you close the PR manually?
    >
    > — Reply to this email directly or view it on GitHub
    > https://github.com/apache/flink/pull/317#issuecomment-73045884.
[jira] [Updated] (FLINK-1463) RuntimeStatefulSerializerFactory declares ClassLoader as transient but later tries to use it
[ https://issues.apache.org/jira/browse/FLINK-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aljoscha Krettek updated FLINK-1463:
---
    Fix Version/s: 0.8.1

> RuntimeStatefulSerializerFactory declares ClassLoader as transient but later tries to use it
> --------------------------------------------------------------------------------------------
>
> Key: FLINK-1463
> URL: https://issues.apache.org/jira/browse/FLINK-1463
> Project: Flink
> Issue Type: Bug
> Affects Versions: 0.8, 0.9
> Reporter: Aljoscha Krettek
> Assignee: Aljoscha Krettek
> Priority: Blocker
> Fix For: 0.8.1
>
> At least one user has seen an exception because of this. In theory, the ClassLoader is set again in readParametersFromConfig, but the way it is used in TupleComparatorBase, this method is never called.
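The failure mode behind FLINK-1463 is easy to reproduce in isolation: Java serialization silently drops a transient ClassLoader field, so any code path that skips re-initialization (as reported for TupleComparatorBase) sees null. A minimal sketch; the class and method names below are made up for illustration and are not Flink's actual factory:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Illustrative stand-in for a serializer factory that marks its
// ClassLoader transient. After a serialize/deserialize round trip the
// field is null unless something explicitly restores it.
class FactoryWithTransientLoader implements Serializable {
    private static final long serialVersionUID = 1L;

    // Initialized on construction, but NOT restored by deserialization:
    // field initializers do not run when Java deserializes the object.
    transient ClassLoader loader = getClass().getClassLoader();

    static FactoryWithTransientLoader roundTrip(FactoryWithTransientLoader f) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
                out.writeObject(f);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
                return (FactoryWithTransientLoader) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }
}
```

The fix direction the issue implies is to make sure the restore path (here, something like readParametersFromConfig) is actually invoked before the loader is used.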
[jira] [Commented] (FLINK-1471) Allow KeySelectors to implement ResultTypeQueryable
[ https://issues.apache.org/jira/browse/FLINK-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307236#comment-14307236 ]

ASF GitHub Bot commented on FLINK-1471:
---

Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/359

> Allow KeySelectors to implement ResultTypeQueryable
> ---------------------------------------------------
>
> Key: FLINK-1471
> URL: https://issues.apache.org/jira/browse/FLINK-1471
> Project: Flink
> Issue Type: Bug
> Components: Java API
> Affects Versions: 0.9
> Reporter: Robert Metzger
> Assignee: Timo Walther
> Fix For: 0.9
>
> See https://github.com/apache/flink/pull/354
[jira] [Resolved] (FLINK-1442) Archived Execution Graph consumes too much memory
[ https://issues.apache.org/jira/browse/FLINK-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephan Ewen resolved FLINK-1442.
---
    Resolution: Fixed
    Fix Version/s: 0.9

Fixed via 9d181a86a0870204113271b6e45f611cba04fc7d and 8ae0dc2d768aecfa3129df553f43d827792b65d7

> Archived Execution Graph consumes too much memory
> -------------------------------------------------
>
> Key: FLINK-1442
> URL: https://issues.apache.org/jira/browse/FLINK-1442
> Project: Flink
> Issue Type: Bug
> Components: JobManager
> Affects Versions: 0.9
> Reporter: Stephan Ewen
> Assignee: Max Michels
> Fix For: 0.9
>
> The JobManager archives the execution graphs for analysis of jobs. The graphs may consume a lot of memory. Especially in all2all connection patterns, the execution edges are extremely numerous and add up in memory consumption. The execution edges connect all parallel tasks, so for an all2all pattern between n and m tasks there are n*m edges. For a parallelism of multiple hundred tasks, this can easily reach 100k objects and more, each with a set of metadata.
> I propose the following to solve that:
> 1. Clear all execution edges from the graph (the majority of the memory consumers) when it is given to the archiver.
> 2. Keep the map/list of the archived graphs behind a soft reference, so it will be removed under memory pressure before the JVM crashes. That may remove graphs from the history early, but that is much preferable to the JVM crashing, in which case the graph is lost as well...
> 3. Long term: The graph should be archived somewhere else. Something like the history server used by Hadoop and Hive would be a good idea.
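Proposal 2 in the issue above (archived graphs behind a soft reference) can be sketched in a few lines of plain Java. This is only an illustration of the technique under simplified, assumed types — `GraphArchive` and `ArchivedGraph` are hypothetical names, not Flink's actual classes:

```java
import java.lang.ref.SoftReference;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: keep archived graphs behind SoftReferences so the JVM may
// reclaim them under memory pressure instead of failing with an OOM.
class GraphArchive {

    static final class ArchivedGraph {
        final String jobId;
        ArchivedGraph(String jobId) { this.jobId = jobId; }
    }

    private final Map<String, SoftReference<ArchivedGraph>> archive = new LinkedHashMap<>();

    void archive(ArchivedGraph graph) {
        archive.put(graph.jobId, new SoftReference<>(graph));
    }

    /** Returns null if the graph was never archived or was collected under memory pressure. */
    ArchivedGraph get(String jobId) {
        SoftReference<ArchivedGraph> ref = archive.get(jobId);
        return ref == null ? null : ref.get();
    }
}
```

The trade-off is exactly the one the proposal names: the collector clears soft references only when memory runs low, so history may disappear early, but the JVM survives.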
[GitHub] flink pull request: [docs] add documentation for FLink GCE setup u...
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/361
[jira] [Commented] (FLINK-1479) The spawned threads in the sorter have no context class loader
[ https://issues.apache.org/jira/browse/FLINK-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307189#comment-14307189 ]

Stephan Ewen commented on FLINK-1479:
---

Patch coming up...

> The spawned threads in the sorter have no context class loader
> --------------------------------------------------------------
>
> Key: FLINK-1479
> URL: https://issues.apache.org/jira/browse/FLINK-1479
> Project: Flink
> Issue Type: Bug
> Components: Local Runtime
> Affects Versions: 0.9
> Reporter: Stephan Ewen
> Fix For: 0.9
>
> The context class loader of task threads is the user code class loader that has access to the libraries of the user program. The sorter spawns extra threads (reading, sorting, spilling) without setting the context class loader on those threads.
[jira] [Created] (FLINK-1479) The spawned threads in the sorter have no context class loader
Stephan Ewen created FLINK-1479:
---

Summary: The spawned threads in the sorter have no context class loader
Key: FLINK-1479
URL: https://issues.apache.org/jira/browse/FLINK-1479
Project: Flink
Issue Type: Bug
Components: Local Runtime
Affects Versions: 0.9
Reporter: Stephan Ewen
Fix For: 0.9
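The fix FLINK-1479 calls for is small: whoever spawns the reading/sorting/spilling threads must copy the user code class loader onto them before they start. A hedged sketch with illustrative names (this is not the actual sorter code):

```java
// Sketch: a worker thread spawned by the sorter should inherit the *user
// code* class loader as its context class loader, taken from the task
// thread that creates it.
class SpawnWithContextClassLoader {

    static Thread spawnWorker(Runnable work, ClassLoader userCodeClassLoader) {
        Thread t = new Thread(work, "sorter-worker");
        // Without this call the new thread inherits its creator's context
        // class loader (or the default), and user classes loaded only by
        // the user code class loader become invisible to it.
        t.setContextClassLoader(userCodeClassLoader);
        return t;
    }
}
```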
[jira] [Created] (FLINK-1480) Test Flink against Hadoop 2.6.0
Robert Metzger created FLINK-1480:
---

Summary: Test Flink against Hadoop 2.6.0
Key: FLINK-1480
URL: https://issues.apache.org/jira/browse/FLINK-1480
Project: Flink
Issue Type: Task
Components: Build System
Reporter: Robert Metzger
Assignee: Robert Metzger

One of our Travis builds always builds against the latest Hadoop release. Right now, we test against 2.5.1. I would like to bump the version to 2.6.0.
[jira] [Commented] (FLINK-1471) Allow KeySelectors to implement ResultTypeQueryable
[ https://issues.apache.org/jira/browse/FLINK-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307239#comment-14307239 ]

Timo Walther commented on FLINK-1471:
---

Fixed in d033fa8fa834d288ec977ef7bda043dfdc397e59.
[jira] [Resolved] (FLINK-1471) Allow KeySelectors to implement ResultTypeQueryable
[ https://issues.apache.org/jira/browse/FLINK-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timo Walther resolved FLINK-1471.
---
    Resolution: Fixed
[jira] [Commented] (FLINK-1396) Add hadoop input formats directly to the user API.
[ https://issues.apache.org/jira/browse/FLINK-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307147#comment-14307147 ]

ASF GitHub Bot commented on FLINK-1396:
---

Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/363#issuecomment-73039917

    @StephanEwen If I add the exclusions then users that just add flink-java as a dependency will get weird errors when using Hadoop InputFormats.
[GitHub] flink pull request: [FLINK-1471][java-api] Fixes wrong input valid...
Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/359#issuecomment-73042551

    Skipping the validation on raw types makes total sense.
[jira] [Commented] (FLINK-1471) Allow KeySelectors to implement ResultTypeQueryable
[ https://issues.apache.org/jira/browse/FLINK-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307180#comment-14307180 ]

ASF GitHub Bot commented on FLINK-1471:
---

Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/359#issuecomment-73042551

    Skipping the validation on raw types makes total sense.
[jira] [Updated] (FLINK-1463) RuntimeStatefulSerializerFactory declares ClassLoader as transient but later tries to use it
[ https://issues.apache.org/jira/browse/FLINK-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aljoscha Krettek updated FLINK-1463:
---
    Priority: Blocker (was: Major)
[GitHub] flink pull request: [docs] add documentation for FLink GCE setup u...
Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/361#issuecomment-73041953

    Will merge this...
[jira] [Commented] (FLINK-1463) RuntimeStatefulSerializerFactory declares ClassLoader as transient but later tries to use it
[ https://issues.apache.org/jira/browse/FLINK-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307183#comment-14307183 ]

ASF GitHub Bot commented on FLINK-1463:
---

Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/353#issuecomment-73042896

    Looks good. For efficiency, we could change the factory such that it returns the original in the first request, and a duplicate after that. Good change otherwise. +1 to merge
[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs
[ https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307203#comment-14307203 ]

Timo Walther commented on FLINK-1319:
---

Actually, I don't like the drop-in approach. I think it would be much better if the code analysis could be included in the release. Especially once the code is stable enough, it would be great to enable it by default and speed up jobs automatically.

I did some research about other frameworks we could use instead. Soot is the best framework; however, I think we can also build the code analysis on top of the ObjectWeb ASM library [1]. It provides some functionality for data flow analysis [2]. The examples for BasicInterpreter and BasicVerifier look promising. Other projects use it to determine types [3]. Using ASM requires us to implement more, but it gives us full flexibility for further analysis use cases. I would try to implement a simple proof-of-concept prototype. What do you think?

[1] http://asm.ow2.org/
[2] http://download.forge.objectweb.org/asm/asm4-guide.pdf, 115ff
[3] https://github.com/hraberg/enumerable/blob/master/src/main/java/org/enumerable/lambda/support/expression/ExpressionInterpreter.java

> Add static code analysis for UDFs
> ---------------------------------
>
> Key: FLINK-1319
> URL: https://issues.apache.org/jira/browse/FLINK-1319
> Project: Flink
> Issue Type: New Feature
> Components: Java API, Scala API
> Reporter: Stephan Ewen
> Assignee: Timo Walther
> Priority: Minor
>
> Flink's Optimizer takes information that tells it for UDFs which fields of the input elements are accessed, modified, or forwarded/copied. This information frequently helps to reuse partitionings, sorts, etc. It may speed up programs significantly, as it can frequently eliminate sorts and shuffles, which are costly.
> Right now, users can add lightweight annotations to UDFs to provide this information (such as adding {{@ConstantFields(0-3, 1, 2-1)}}).
> We worked with static code analysis of UDFs before, to determine this information automatically. This is an incredible feature, as it magically makes programs faster. For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this works surprisingly well in many cases. We used the Soot toolkit for the static code analysis. Unfortunately, Soot is LGPL licensed, and thus we did not include any of the code so far.
> I propose to add this functionality to Flink in the form of a drop-in addition, to work around the LGPL incompatibility with ASL 2.0. Users could simply download a special flink-code-analysis.jar and drop it into the lib folder to enable this functionality. We may even add a script to tools that downloads that library automatically into the lib folder. This should be legally fine, since we do not redistribute LGPL code and only dynamically link it (the incompatibility with ASL 2.0 is mainly in the patentability, if I remember correctly).
> Prior work on this has been done by [~aljoscha] and [~skunert], which could provide a code base to start with.
> *Appendix*
> Homepage of the Soot static analysis toolkit: http://www.sable.mcgill.ca/soot/
> Papers on static analysis and for optimization: http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf and http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf
> Quick introduction to the Optimizer: http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf (Section 6)
> Optimizer for Iterations: http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf (Sections 4.3 and 5.3)
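For contrast with the bytecode-analysis discussion above: the existing annotation-based mechanism the issue mentions boils down to declaring forward-field information on the UDF and reading it reflectively. A minimal, stdlib-only sketch with made-up names (`ConstantFieldsLike`, `MyMapper`); Flink's real annotations and their string format differ:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Sketch of the lightweight-annotation approach: a UDF declares which
// input fields it forwards unchanged, and the optimizer reads that
// declaration reflectively instead of analyzing bytecode.
class ForwardedFieldsDemo {

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface ConstantFieldsLike {
        String[] value();
    }

    // Hypothetical UDF declaring "input field 0 goes to output field 0,
    // input field 2 goes to output field 1".
    @ConstantFieldsLike({"0->0", "2->1"})
    static class MyMapper { /* pretend UDF body */ }

    /** Returns the declared forward information, or an empty array if none. */
    static String[] forwardedFields(Class<?> udfClass) {
        ConstantFieldsLike a = udfClass.getAnnotation(ConstantFieldsLike.class);
        return a == null ? new String[0] : a.value();
    }
}
```

The static-analysis work discussed in the comment would derive exactly this kind of information automatically, so users would no longer need to write the annotation by hand.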
[jira] [Commented] (FLINK-1376) SubSlots are not properly released in case that a TaskManager fatally fails, leaving the system in a corrupted state
[ https://issues.apache.org/jira/browse/FLINK-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307202#comment-14307202 ]

ASF GitHub Bot commented on FLINK-1376:
---

Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/317#issuecomment-73045884

    This one has been manually merged (but I made a typo in the "This closes #xxx" message ;-) )
    @tillrohrmann Can you close the PR manually?

> SubSlots are not properly released in case that a TaskManager fatally fails, leaving the system in a corrupted state
> --------------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-1376
> URL: https://issues.apache.org/jira/browse/FLINK-1376
> Project: Flink
> Issue Type: Bug
> Reporter: Till Rohrmann
> Assignee: Till Rohrmann
>
> In case that the TaskManager fatally fails and some of the failing node's slots are SharedSlots, the slots are not properly released by the JobManager. This causes the corresponding job to not be properly failed, leaving the system in a corrupted state. The reason is that the AllocatedSlot is not aware of being treated as a SharedSlot and thus cannot release the associated SubSlots.
[jira] [Commented] (FLINK-1376) SubSlots are not properly released in case that a TaskManager fatally fails, leaving the system in a corrupted state
[ https://issues.apache.org/jira/browse/FLINK-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307206#comment-14307206 ]

ASF GitHub Bot commented on FLINK-1376:
---

Github user tillrohrmann commented on the pull request:

    https://github.com/apache/flink/pull/317#issuecomment-73046225

    Of course. Thanks for merging.
[GitHub] flink pull request: yarn client tests
Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/134#issuecomment-73046443

    I guess this one is subsumed by @rmetzger's work on https://issues.apache.org/jira/browse/FLINK-883
    Can we close this PR?
[jira] [Commented] (FLINK-1376) SubSlots are not properly released in case that a TaskManager fatally fails, leaving the system in a corrupted state
[ https://issues.apache.org/jira/browse/FLINK-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307208#comment-14307208 ]

ASF GitHub Bot commented on FLINK-1376:
---

Github user tillrohrmann closed the pull request at:

    https://github.com/apache/flink/pull/317
[GitHub] flink pull request: yarn client tests
Github user rmetzger commented on the pull request:

    https://github.com/apache/flink/pull/134#issuecomment-73047820

    Yes, the changes here have been subsumed by FLINK-883. @skunert can you close this pull request?
[GitHub] flink pull request: Port FLINK-1391 and FLINK-1392 to release-0.8...
GitHub user rmetzger opened a pull request:

    https://github.com/apache/flink/pull/364

    Port FLINK-1391 and FLINK-1392 to release-0.8 branch.

    These commits port the fixes for the two issues (Avro and Protobuf support) to the release-0.8 branch. They also contain a hotfix regarding the closure cleaner by @aljoscha.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rmetzger/flink kryo081

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/364.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #364

commit 38ebc09ff5782005c5aa1f60b458cae250b8c26e
Author: Robert Metzger <metzg...@web.de>
Date:   2015-01-12T20:11:09Z

    [FLINK-1391] Add support for using Avro-POJOs and Avro types with Kryo

    Conflicts:
        flink-java/src/main/java/org/apache/flink/api/java/typeutils/TypeExtractor.java
        flink-java/src/main/java/org/apache/flink/api/java/typeutils/runtime/KryoSerializer.java

commit cbe633537567cc39a9877125e79cd7da49ee7f3b
Author: Robert Metzger <rmetz...@apache.org>
Date:   2015-01-13T09:21:29Z

    [FLINK-1392] Add Kryo serializer for Protobuf

    Conflicts:
        flink-java/pom.xml
        flink-java/src/main/java/org/apache/flink/api/java/typeutils/runtime/KryoSerializer.java
        flink-shaded/pom.xml
        pom.xml

commit 63472baff1fca18b83666831effe2204606cf355
Author: Aljoscha Krettek <aljoscha.kret...@gmail.com>
Date:   2015-01-15T10:46:53Z

    [hotfix] Also use java closure cleaner on grouped operations

commit 9043582a4a1f4fd25e960217a99f3f32d4ba18a9
Author: Robert Metzger <rmetz...@apache.org>
Date:   2015-02-05T13:07:48Z

    [backports] Cleanup and port changes to 0.8 branch.
[GitHub] flink pull request: [FLINK-1415] Akka cleanups
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/319
[jira] [Commented] (FLINK-1391) Kryo fails to properly serialize avro collection types
[ https://issues.apache.org/jira/browse/FLINK-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307230#comment-14307230 ]

ASF GitHub Bot commented on FLINK-1391:
---

GitHub user rmetzger opened a pull request:

    https://github.com/apache/flink/pull/364

    Port FLINK-1391 and FLINK-1392 to release-0.8 branch.

    These commits port the fixes for the two issues (Avro and Protobuf support) to the release-0.8 branch. They also contain a hotfix regarding the closure cleaner by @aljoscha.

> Kryo fails to properly serialize avro collection types
> ------------------------------------------------------
>
> Key: FLINK-1391
> URL: https://issues.apache.org/jira/browse/FLINK-1391
> Project: Flink
> Issue Type: Sub-task
> Affects Versions: 0.8, 0.9
> Reporter: Robert Metzger
> Assignee: Robert Metzger
>
> Before FLINK-610, Avro was the default generic serializer. Now, special types coming from Avro are handled by Kryo, which seems to cause errors like:
> {code}
> Exception in thread "main" org.apache.flink.runtime.client.JobExecutionException: java.lang.NullPointerException
>     at org.apache.avro.generic.GenericData$Array.add(GenericData.java:200)
>     at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:116)
>     at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:22)
>     at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761)
>     at org.apache.flink.api.java.typeutils.runtime.KryoSerializer.deserialize(KryoSerializer.java:143)
>     at org.apache.flink.api.java.typeutils.runtime.KryoSerializer.deserialize(KryoSerializer.java:148)
>     at org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:244)
>     at org.apache.flink.runtime.plugable.DeserializationDelegate.read(DeserializationDelegate.java:56)
>     at org.apache.flink.runtime.io.network.serialization.AdaptiveSpanningRecordDeserializer.getNextRecord(AdaptiveSpanningRecordDeserializer.java:71)
>     at org.apache.flink.runtime.io.network.channels.InputChannel.readRecord(InputChannel.java:189)
>     at org.apache.flink.runtime.io.network.gates.InputGate.readRecord(InputGate.java:176)
>     at org.apache.flink.runtime.io.network.api.MutableRecordReader.next(MutableRecordReader.java:51)
>     at org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:53)
>     at org.apache.flink.runtime.operators.DataSinkTask.invoke(DataSinkTask.java:170)
>     at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:257)
>     at java.lang.Thread.run(Thread.java:744)
> {code}
[jira] [Resolved] (FLINK-1299) Remove default values for JobManager and TaskManager heap sizes
[ https://issues.apache.org/jira/browse/FLINK-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi resolved FLINK-1299. Resolution: Won't Fix +1 Remove default values for JobManager and TaskManager heap sizes --- Key: FLINK-1299 URL: https://issues.apache.org/jira/browse/FLINK-1299 Project: Flink Issue Type: Improvement Components: Build System Reporter: Ufuk Celebi Priority: Minor Currently, the default config contains fixed values for both the job manager and the task manager heap sizes. I propose to comment both lines out and let the JVM set -Xms and -Xmx automatically. This should give a better out-of-the-box experience when trying out Flink locally or on a cluster.
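[Editor's note] For context on what "let the JVM set -Xms and -Xmx automatically" means in practice: without explicit flags, the JVM picks an ergonomic heap limit based on the machine's memory. A minimal check (not part of the thread; plain Java, illustrative only):

```java
// Prints the heap limit the JVM chose when no -Xmx was passed on the
// command line. The value depends on the machine and the JVM version.
public class HeapDefaults {
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        System.out.println("Default max heap: " + (maxHeapBytes() >> 20) + " MB");
    }
}
```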
[GitHub] flink pull request: [FLINK-1396][FLINK-1303] Hadoop Input/Output d...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/363#issuecomment-73040193 Does this occur during local execution, or collection execution? The dependencies are not covered by the runtime dependencies? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1463] Fix stateful/stateless Serializer...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/353#issuecomment-73042896 Looks good. For efficiency, we could change the factory such that it returns the original in the first request, and a duplicate after that. Good change otherwise. +1 to merge
[jira] [Commented] (FLINK-1481) Flakey JobManagerITCase
[ https://issues.apache.org/jira/browse/FLINK-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307321#comment-14307321 ] ASF GitHub Bot commented on FLINK-1481: --- GitHub user tillrohrmann opened a pull request: https://github.com/apache/flink/pull/365 [FLINK-1481] Fixes flakey JobManagerITCase The sometimes failing sender tasks are now guaranteed to have at least one task which fails. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tillrohrmann/flink fixFlakeyJobManagerITCase Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/365.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #365 commit 0964a6ebaa64c9ae644831564ff898f23e851bb6 Author: Till Rohrmann trohrm...@apache.org Date: 2015-02-05T14:48:23Z [FLINK-1481] Fixes flakey JobManagerITCase which relied on non-deterministic behaviour. Flakey JobManagerITCase --- Key: FLINK-1481 URL: https://issues.apache.org/jira/browse/FLINK-1481 Project: Flink Issue Type: Bug Reporter: Till Rohrmann Priority: Minor We currently have some test cases which rely on non-deterministic behaviour. For example the {{JobManagerItCase}} contains a test case with a {{SometimesExceptionSender}} which randomly throws an exception and otherwise blocks. The probability of a failure is 0.05 and we have 100 parallel tasks. Thus, the overall probability that no task fails at all is still 0.005. Thus every 200th test run, the test case will block and thus fail. In order to get a stable test case base, I think that we should try to avoid these kind of test cases with random behaviour. The same goes with sleep timeouts in order to establish an intended interleaving of concurrent processes. With Travis this can fail too often.
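[Editor's note] The failure-probability arithmetic in the issue checks out: with 100 independent tasks each failing with probability 0.05, the chance that no task fails is 0.95^100 ≈ 0.0059, i.e. roughly one blocked run in every couple of hundred. A quick sketch of the computation (values taken from the issue text):

```java
// Each of the 100 parallel SometimesExceptionSender tasks fails with p = 0.05.
// The test blocks when NO task fails, which happens with probability (1-p)^n.
public class FlakeyMath {
    static double noFailureProbability(double p, int tasks) {
        return Math.pow(1.0 - p, tasks);
    }

    public static void main(String[] args) {
        double q = noFailureProbability(0.05, 100);
        System.out.printf("P(no task fails) = %.4f (~one blocked run in %d)%n",
                q, Math.round(1.0 / q));
    }
}
```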
[jira] [Resolved] (FLINK-1443) Add replicated data source
[ https://issues.apache.org/jira/browse/FLINK-1443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabian Hueske resolved FLINK-1443. -- Resolution: Implemented Fix Version/s: 0.9 Implemented in a19b4a02bfa5237e0dcd2b264da36229546f23c0 Add replicated data source -- Key: FLINK-1443 URL: https://issues.apache.org/jira/browse/FLINK-1443 Project: Flink Issue Type: New Feature Components: Java API, JobManager, Optimizer Affects Versions: 0.9 Reporter: Fabian Hueske Assignee: Fabian Hueske Priority: Minor Fix For: 0.9 This issue proposes to add support for data sources that read the same data in all parallel instances. This feature can be useful if the data is replicated to all machines in a cluster and can be read locally. For example, a replicated input format can be used for a broadcast join without sending any data over the network. The following changes are necessary to achieve this: 1) Add a replicating InputSplitAssigner which assigns all splits to all parallel instances. This also requires extending the InputSplitAssigner interface to identify the exact parallel instance that requests an InputSplit (currently only the hostname is provided). 2) Make sure that the DOP of the replicated data source is identical to the DOP of its successor. 3) Let the optimizer know that the data is replicated and ensure that plan enumeration works correctly.
[jira] [Resolved] (FLINK-1318) Make quoted String parsing optional and configurable for CSVInputFormats
[ https://issues.apache.org/jira/browse/FLINK-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabian Hueske resolved FLINK-1318. -- Resolution: Fixed Fix Version/s: 0.9 Fixed in 2665cf4e2a9e33e0e94ac7e0b7518a10445febbb Make quoted String parsing optional and configurable for CSVInputFormats Key: FLINK-1318 URL: https://issues.apache.org/jira/browse/FLINK-1318 Project: Flink Issue Type: Improvement Components: Java API, Scala API Affects Versions: 0.8 Reporter: Fabian Hueske Assignee: Fabian Hueske Priority: Minor Fix For: 0.9 With the current implementation of the CSVInputFormat, quoted string parsing kicks in if the first non-whitespace character of a field is a double quote. I see two issues with this implementation: 1. Quoted String parsing cannot be disabled 2. The quoting character is fixed to double quotes (") I propose to add parameters to disable quoted String parsing and set the quote character.
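[Editor's note] To make the proposal concrete, here is a minimal sketch of the intended behavior. The class and method names below are hypothetical illustrations, not Flink's actual CSVInputFormat API: quoted parsing can be disabled entirely, and the quote character is a parameter rather than being hard-coded to `"`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical field splitter: a null quote character disables quoted
// parsing; otherwise a delimiter inside quotes is not a field break.
public class CsvFieldParser {
    private final char delimiter;
    private final Character quoteChar; // null disables quoted parsing

    public CsvFieldParser(char delimiter, Character quoteChar) {
        this.delimiter = delimiter;
        this.quoteChar = quoteChar;
    }

    public List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (char c : line.toCharArray()) {
            if (quoteChar != null && c == quoteChar) {
                inQuotes = !inQuotes;       // toggle quoted mode, drop the quote itself
            } else if (c == delimiter && !inQuotes) {
                fields.add(cur.toString()); // end of field
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());
        return fields;
    }

    public static void main(String[] args) {
        // Quoting enabled: the comma inside quotes does not split the field.
        System.out.println(new CsvFieldParser(',', '"').parse("a,\"b,c\",d"));
        // Quoting disabled: quotes are kept literally and every comma splits.
        System.out.println(new CsvFieldParser(',', null).parse("a,\"b,c\",d"));
    }
}
```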
[jira] [Created] (FLINK-1485) Typo in Documentation - Join with Join-Function
Johannes created FLINK-1485: --- Summary: Typo in Documentation - Join with Join-Function Key: FLINK-1485 URL: https://issues.apache.org/jira/browse/FLINK-1485 Project: Flink Issue Type: Bug Affects Versions: 0.8 Reporter: Johannes Assignee: Johannes Priority: Trivial Small typo in documentation In the java example for Join with Join-Function
[GitHub] flink pull request: Typo in - Join with Join-Function
GitHub user jkirsch opened a pull request: https://github.com/apache/flink/pull/369 Typo in - Join with Join-Function Fix typo in the Java Code Example You can merge this pull request into a Git repository by running: $ git pull https://github.com/jkirsch/incubator-flink typo Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/369.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #369 commit 7c15e144398fa615c6c67a6a21155830a9108f2e Author: jkirsch jkirschn...@gmail.com Date: 2015-02-05T20:42:17Z Typo in - Join with Join-Function Fix typo in the Java Code Example
[jira] [Commented] (FLINK-1471) Allow KeySelectors to implement ResultTypeQueryable
[ https://issues.apache.org/jira/browse/FLINK-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14306869#comment-14306869 ] ASF GitHub Bot commented on FLINK-1471: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/359#issuecomment-73013430 I do not quite get what this means now for the input validation. Assume two classes, `A` and `B`, where `B` is a subclass of `A`.

```java
DataSet<B> data = ...;
data.map(new MapFunction<A, Long>() { ... });
```

I thought that the validation previously checked that the MapFunction's input parameter is assignable from the data set type. So this is skipped now? It is not a big deal, I am just wondering why... Allow KeySelectors to implement ResultTypeQueryable --- Key: FLINK-1471 URL: https://issues.apache.org/jira/browse/FLINK-1471 Project: Flink Issue Type: Bug Components: Java API Affects Versions: 0.9 Reporter: Robert Metzger Assignee: Timo Walther Fix For: 0.9 See https://github.com/apache/flink/pull/354
[GitHub] flink pull request: [FLINK-1396][FLINK-1303] Hadoop Input/Output d...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/363#issuecomment-73014361 Looks good. I have one suggestion concerning the Hadoop dependencies: The `flink-java` project depends on the Hadoop API, for the `Writable` interface, the `NullValue` and the InputFormat classes. We should be able to exclude all transitive dependencies from the Hadoop dependency in `flink-java`, making the project more lightweight, so that someone that only writes against the Flink API does not have the long tail of transitive Hadoop dependencies. Whenever we really execute Hadoop code, we have the flink runtime involved, which then has the necessary dependencies. To exclude all transitive dependencies, use

```
<exclusions>
  <exclusion>
    <groupId>*</groupId>
    <artifactId>*</artifactId>
  </exclusion>
</exclusions>
```
[jira] [Commented] (FLINK-1476) Flink VS Spark on loop test
[ https://issues.apache.org/jira/browse/FLINK-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14306879#comment-14306879 ] Fabian Hueske commented on FLINK-1476: -- Hi Xuhong, your memory configuration does not look optimal. Your machines have 32GB each and you assign 2GB to each task manager. There is only one TM running on each node, so the total memory given to Flink is 6GB. Each TM allocates approx. 70% of its memory for in-memory processing and divides it among all slots (24 for each TM). Therefore, the memory for each slot is about (2GB * 0.7) / 24 = 60MB, which is not a lot. If you run the program with a DOP of 16, Flink will use 60MB * 16 = 960MB for in-memory processing and put all other data to disk. Flink VS Spark on loop test --- Key: FLINK-1476 URL: https://issues.apache.org/jira/browse/FLINK-1476 Project: Flink Issue Type: Test Affects Versions: 0.7.0-incubating, 0.8 Environment: 3 machines, every machine has 24 CPU cores and allocates 16 CPU cores for the tests. The memory situation is: 3 * 32G Reporter: xuhong Priority: Critical In the last days, I did some tests on Flink and Spark. The test results show that Flink does better on many operations, such as GroupBy, Join and some complex jobs. But when I did the KMeans, LinearRegression and other loop tests, I found that Flink is no better than Spark. I want to know whether Flink is better suited than Spark for loop jobs. I added the code env.setDegreeOfParallelism(16) in each test to allocate the same number of CPU cores as in the Spark tests. My English is not good, I hope you can understand me!
The following is some config of my Flink:

jobmanager.rpc.port: 6123
jobmanager.heap.mb: 2048
taskmanager.heap.mb: 2048
taskmanager.numberOfTaskSlots: 24
parallelization.degree.default: 72
jobmanager.web.port: 8081
webclient.port: 8085
fs.overwrite-files: true
taskmanager.memory.fraction: 0.8
taskmanager.network.numberofBuffers: 7
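[Editor's note] Fabian's per-slot estimate can be reproduced directly from the numbers in the thread (taskmanager.heap.mb: 2048, roughly 70% managed memory, 24 slots per TM); the ~60 MB and ~960 MB figures are rounded:

```java
// Reproduces the slot-memory estimate from the comment above.
// Values are taken from the reported configuration, not from Flink itself.
public class SlotMemory {
    static long perSlotMb(long tmHeapMb, double managedFraction, int slots) {
        // Managed memory of one TaskManager, divided evenly among its slots.
        return (long) (tmHeapMb * managedFraction) / slots;
    }

    public static void main(String[] args) {
        long perSlot = perSlotMb(2048, 0.7, 24);
        System.out.println("Memory per slot: ~" + perSlot + " MB");     // ~60 MB
        System.out.println("Used at DOP 16: ~" + perSlot * 16 + " MB"); // ~960 MB
    }
}
```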
[GitHub] flink pull request: [FLINK1443] Add support for replicating input ...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/360#issuecomment-73014844 Very nice, +1 to merge
[GitHub] flink pull request: [FLINK-1471][java-api] Fixes wrong input valid...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/359#issuecomment-73013430 I do not quite get what this means now for the input validation. Assume two classes, `A` and `B`, where `B` is a subclass of `A`.

```java
DataSet<B> data = ...;
data.map(new MapFunction<A, Long>() { ... });
```

I thought that the validation previously checked that the MapFunction's input parameter is assignable from the data set type. So this is skipped now? It is not a big deal, I am just wondering why...
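[Editor's note] The variance question in the comment above can be illustrated with plain Java generics (this is not Flink's TypeExtractor; the names below are hypothetical): a mapper declared on the supertype `A` can safely process elements of a `DataSet`-like collection of the subtype `B`, which is why allowing this assignability check to pass is sound.

```java
import java.util.List;
import java.util.function.Function;

public class VarianceDemo {
    static class A { int id() { return 1; } }
    static class B extends A { @Override int id() { return 2; } }

    // Analogue of applying a MapFunction<A, Long> to a DataSet<B>:
    // the input type bound `? extends T` admits the subtype's elements.
    static <T> Long applyFirst(List<? extends T> data, Function<T, Long> mapper) {
        return mapper.apply(data.get(0));
    }

    public static void main(String[] args) {
        List<B> data = List.of(new B());
        // A mapper typed on the supertype A accepts the B elements.
        Long result = applyFirst(data, (A a) -> (long) a.id());
        System.out.println(result);
    }
}
```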
[GitHub] flink pull request: [FLINK-592] Add support for Kerberos secured Y...
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/358#issuecomment-73013486 @warneke Thank you for your help!
[jira] [Commented] (FLINK-1396) Add hadoop input formats directly to the user API.
[ https://issues.apache.org/jira/browse/FLINK-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14306878#comment-14306878 ] ASF GitHub Bot commented on FLINK-1396: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/363#issuecomment-73014361 Looks good. I have one suggestion concerning the Hadoop dependencies: The `flink-java` project depends on the Hadoop API, for the `Writable` interface, the `NullValue` and the InputFormat classes. We should be able to exclude all transitive dependencies from the Hadoop dependency in `flink-java`, making the project more lightweight, so that someone that only writes against the Flink API does not have the long tail of transitive Hadoop dependencies. Whenever we really execute Hadoop code, we have the flink runtime involved, which then has the necessary dependencies. To exclude all transitive dependencies, use

```
<exclusions>
  <exclusion>
    <groupId>*</groupId>
    <artifactId>*</artifactId>
  </exclusion>
</exclusions>
```

Add hadoop input formats directly to the user API. -- Key: FLINK-1396 URL: https://issues.apache.org/jira/browse/FLINK-1396 Project: Flink Issue Type: Bug Reporter: Robert Metzger Assignee: Aljoscha Krettek
[jira] [Commented] (FLINK-1422) Missing usage example for withParameters
[ https://issues.apache.org/jira/browse/FLINK-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307292#comment-14307292 ] ASF GitHub Bot commented on FLINK-1422: --- Github user zentol commented on the pull request: https://github.com/apache/flink/pull/350#issuecomment-73056344 updated. Missing usage example for withParameters -- Key: FLINK-1422 URL: https://issues.apache.org/jira/browse/FLINK-1422 Project: Flink Issue Type: Improvement Components: Documentation Affects Versions: 0.8 Reporter: Alexander Alexandrov Assignee: Chesnay Schepler Priority: Trivial Fix For: 0.8.1 Original Estimate: 1h Remaining Estimate: 1h I am struggling to find a usage example of the withParameters method in the documentation. At the moment I only see this note: {quote} Note: As the content of broadcast variables is kept in-memory on each node, it should not become too large. For simpler things like scalar values you can simply make parameters part of the closure of a function, or use the withParameters(...) method to pass in a configuration. {quote}
[GitHub] flink pull request: [FLINK-1422] Add withParameters() to documenta...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/350#discussion_r24167714

--- Diff: docs/programming_guide.md ---
@@ -2398,6 +2399,61 @@ of a function, or use the `withParameters(...)` method to pass in a configuratio
 [Back to top](#top)
+Passing parameters to functions
+---
+
+Parameters can be passed to rich functions using either the constructor (if the function is
--- End diff --

Only the Configuration object method requires a RichFunction to override `open()`. Configuring via the constructor (or any other method which sets non-transient fields) works also for functions that implement the function interfaces. Also it is not required that the class is defined as `static final`. That is only necessary if the class is defined inside another class.
[jira] [Commented] (FLINK-1422) Missing usage example for withParameters
[ https://issues.apache.org/jira/browse/FLINK-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307361#comment-14307361 ] ASF GitHub Bot commented on FLINK-1422: --- Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/350#discussion_r24168441

--- Diff: docs/programming_guide.md ---
@@ -2398,6 +2399,61 @@ of a function, or use the `withParameters(...)` method to pass in a configuratio
 [Back to top](#top)
+Passing parameters to functions
+---
+
+Parameters can be passed to rich functions using either the constructor (if the function is
--- End diff --

If the function is not serialisable the program will fail in any case. That is not related to the configuration. IMO, the documentation should state that the function object is serialised using standard java serialization and shipped as-is to all parallel task instances. Mentioning that transient fields are not shipped is OK but not mandatory (somebody who uses the transient keyword should know what it does).

Missing usage example for withParameters -- Key: FLINK-1422 URL: https://issues.apache.org/jira/browse/FLINK-1422 Project: Flink Issue Type: Improvement Components: Documentation Affects Versions: 0.8 Reporter: Alexander Alexandrov Assignee: Chesnay Schepler Priority: Trivial Fix For: 0.8.1 Original Estimate: 1h Remaining Estimate: 1h I am struggling to find a usage example of the withParameters method in the documentation. At the moment I only see this note: {quote} Note: As the content of broadcast variables is kept in-memory on each node, it should not become too large. For simpler things like scalar values you can simply make parameters part of the closure of a function, or use the withParameters(...) method to pass in a configuration. {quote}
[GitHub] flink pull request: [FLINK-1422] Add withParameters() to documenta...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/350#discussion_r24168441

--- Diff: docs/programming_guide.md ---
@@ -2398,6 +2399,61 @@ of a function, or use the `withParameters(...)` method to pass in a configuratio
 [Back to top](#top)
+Passing parameters to functions
+---
+
+Parameters can be passed to rich functions using either the constructor (if the function is
--- End diff --

If the function is not serialisable the program will fail in any case. That is not related to the configuration. IMO, the documentation should state that the function object is serialised using standard java serialization and shipped as-is to all parallel task instances. Mentioning that transient fields are not shipped is OK but not mandatory (somebody who uses the transient keyword should know what it does).
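[Editor's note] The shipping behavior described above can be demonstrated with plain Java serialization (no Flink dependencies; the class below is an illustrative stand-in for a user function): non-transient fields set in the constructor survive the serialization round trip that ships the function to the parallel task instances, while transient fields do not.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ShippedFunction implements Serializable {
    final int threshold;                             // shipped with the function
    transient String scratch = "local only";         // NOT shipped

    public ShippedFunction(int threshold) { this.threshold = threshold; }

    // Simulates shipping the function object to a parallel task instance.
    static ShippedFunction roundTrip(ShippedFunction f) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(f);
        oos.flush();
        ObjectInputStream in =
            new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
        return (ShippedFunction) in.readObject();
    }

    public static void main(String[] args) throws Exception {
        ShippedFunction copy = roundTrip(new ShippedFunction(42));
        System.out.println(copy.threshold); // constructor parameter arrives intact
        System.out.println(copy.scratch);   // transient field is reset to null
    }
}
```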
[jira] [Commented] (FLINK-1422) Missing usage example for withParameters
[ https://issues.apache.org/jira/browse/FLINK-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307336#comment-14307336 ] ASF GitHub Bot commented on FLINK-1422: --- Github user zentol commented on a diff in the pull request: https://github.com/apache/flink/pull/350#discussion_r24168173

--- Diff: docs/programming_guide.md ---
@@ -2398,6 +2399,61 @@ of a function, or use the `withParameters(...)` method to pass in a configuratio
 [Back to top](#top)
+Passing parameters to functions
+---
+
+Parameters can be passed to rich functions using either the constructor (if the function is
--- End diff --

ok, so which requirements do exist for the operator to be serializable? or can i just omit those details and say something like parameters can be stored in non-transient fields if the function is serializable.

Missing usage example for withParameters -- Key: FLINK-1422 URL: https://issues.apache.org/jira/browse/FLINK-1422 Project: Flink Issue Type: Improvement Components: Documentation Affects Versions: 0.8 Reporter: Alexander Alexandrov Assignee: Chesnay Schepler Priority: Trivial Fix For: 0.8.1 Original Estimate: 1h Remaining Estimate: 1h I am struggling to find a usage example of the withParameters method in the documentation. At the moment I only see this note: {quote} Note: As the content of broadcast variables is kept in-memory on each node, it should not become too large. For simpler things like scalar values you can simply make parameters part of the closure of a function, or use the withParameters(...) method to pass in a configuration. {quote}
[GitHub] flink pull request: [FLINK-1422] Add withParameters() to documenta...
Github user zentol commented on a diff in the pull request: https://github.com/apache/flink/pull/350#discussion_r24168173

--- Diff: docs/programming_guide.md ---
@@ -2398,6 +2399,61 @@ of a function, or use the `withParameters(...)` method to pass in a configuratio
 [Back to top](#top)
+Passing parameters to functions
+---
+
+Parameters can be passed to rich functions using either the constructor (if the function is
--- End diff --

ok, so which requirements do exist for the operator to be serializable? or can i just omit those details and say something like parameters can be stored in non-transient fields if the function is serializable.
[GitHub] flink pull request: [FLINK-1481] Fixes flakey JobManagerITCase
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/365#issuecomment-73060980 Good one, thank you! +1 to merge
[jira] [Commented] (FLINK-1481) Flakey JobManagerITCase
[ https://issues.apache.org/jira/browse/FLINK-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307363#comment-14307363 ] ASF GitHub Bot commented on FLINK-1481: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/365#issuecomment-73060980 Good one, thank you! +1 to merge Flakey JobManagerITCase --- Key: FLINK-1481 URL: https://issues.apache.org/jira/browse/FLINK-1481 Project: Flink Issue Type: Bug Reporter: Till Rohrmann Priority: Minor We currently have some test cases which rely on non-deterministic behaviour. For example the {{JobManagerItCase}} contains a test case with a {{SometimesExceptionSender}} which randomly throws an exception and otherwise blocks. The probability of a failure is 0.05 and we have 100 parallel tasks. Thus, the overall probability that no task fails at all is still 0.005. Thus every 200th test run, the test case will block and thus fail. In order to get a stable test case base, I think that we should try to avoid these kind of test cases with random behaviour. The same goes with sleep timeouts in order to establish an intended interleaving of concurrent processes. With Travis this can fail too often.
[jira] [Commented] (FLINK-1166) Add a QA bot to Flink that is testing pull requests
[ https://issues.apache.org/jira/browse/FLINK-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307568#comment-14307568 ] ASF GitHub Bot commented on FLINK-1166: --- Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/366#discussion_r24179312

--- Diff: tools/qa-check.sh ---

```shell
#!/usr/bin/env bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

#
# QA check your changes.
# Possible options:
#   BRANCH  set another branch as the check reference
#
# Use the tool like this: BRANCH=release-0.8 ./tools/qa-check.sh
#

BRANCH=${BRANCH:-origin/master}

here=`dirname "$0"`          # relative
here=`( cd "$here" && pwd )` # absolutized and normalized
if [ -z "$here" ] ; then
  # error; for some reason, the path is not accessible
  # to the script (e.g. permissions re-evaled after suid)
  exit 1 # fail
fi
flink_home=`dirname "$here"`

echo "flink_home=$flink_home here=$here"
cd "$here"

if [ ! -d _qa_workdir ] ; then
  echo "_qa_workdir doesn't exist. Creating it"
  mkdir _qa_workdir
fi
# attention, it overwrites
echo "_qa_workdir" > .gitignore

cd _qa_workdir

if [ ! -d flink ] ; then
  echo "There is no flink copy in the workdir. Cloning flink"
  git clone https://github.com/apache/flink.git flink
  cd flink
fi
cd flink
git fetch origin
git checkout $BRANCH
cd "$here"
# go to reference flink directory

cd _qa_workdir
VAR_DIR=`pwd`
cd flink

# Initialize variables
export TESTS_PASSED=true
export MESSAGES="Flink QA-Check results:"$'\n'

goToTestDirectory() {
  cd "$flink_home"
}

##### Methods #####

## Javadocs
JAVADOC_MVN_COMMAND='mvn javadoc:aggregate -Pdocs-and-source -Dmaven.javadoc.failOnError=false -Dquiet=false | grep "WARNING\|warning\|error" | wc -l'

referenceJavadocsErrors() {
  eval $JAVADOC_MVN_COMMAND > $VAR_DIR/_JAVADOCS_NUM_WARNINGS
}

checkJavadocsErrors() {
  OLD_JAVADOC_ERR_CNT=`cat $VAR_DIR/_JAVADOCS_NUM_WARNINGS`
  NEW_JAVADOC_ERR_CNT=`eval $JAVADOC_MVN_COMMAND`
  if [ $NEW_JAVADOC_ERR_CNT -gt $OLD_JAVADOC_ERR_CNT ]; then
    MESSAGES+=":-1: The change increases the number of javadoc errors from $OLD_JAVADOC_ERR_CNT to $NEW_JAVADOC_ERR_CNT"$'\n'
    TESTS_PASSED=false
  else
    MESSAGES+=":+1: The number of javadoc errors was $OLD_JAVADOC_ERR_CNT and is now $NEW_JAVADOC_ERR_CNT"$'\n'
  fi
}

## Compiler warnings
COMPILER_WARN_MVN_COMMAND='mvn clean compile -Dmaven.compiler.showWarning=true -Dmaven.compiler.showDeprecation=true | grep "WARNING" | wc -l'

referenceCompilerWarnings() {
  eval $COMPILER_WARN_MVN_COMMAND > $VAR_DIR/_COMPILER_NUM_WARNINGS
}

checkCompilerWarnings() {
  OLD_COMPILER_ERR_CNT=`cat $VAR_DIR/_COMPILER_NUM_WARNINGS`
  NEW_COMPILER_ERR_CNT=`eval $COMPILER_WARN_MVN_COMMAND`
  if [ $NEW_COMPILER_ERR_CNT -gt $OLD_COMPILER_ERR_CNT ]; then
    MESSAGES+=":-1: The change increases the number of compiler warnings from $OLD_COMPILER_ERR_CNT to $NEW_COMPILER_ERR_CNT"$'\n'
    TESTS_PASSED=false
  else
    MESSAGES+=":+1: The number of compiler warnings was $OLD_COMPILER_ERR_CNT and is now $NEW_COMPILER_ERR_CNT"$'\n'
  fi
}

## Files in lib
BUILD_MVN_COMMAND="mvn clean package -DskipTests"
```

--- End diff --

we could also skip Javadocs here with `-Dmaven.javadoc.skip=true`
Add a QA bot to Flink that is testing pull requests --- Key: FLINK-1166
[GitHub] flink pull request: [FLINK-1166] Add qa-check.sh tool
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/366#issuecomment-73089829 Thank you for trying it out. Ideally, with the bot in place the number of warnings will go down over time. I'll address the comments in the source. I'm not sure if the number of compiler warnings is correct here. A third option would be to
a) check out the current master
b) get the reference counts on the master
c) apply the pull request as a patch to master (checking if patching is possible (basically testing if rebase is possible))
d) if rebase was possible, get the new counts.
[jira] [Commented] (FLINK-1482) BlobStore does not delete directories when process is killed
[ https://issues.apache.org/jira/browse/FLINK-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307619#comment-14307619 ] ASF GitHub Bot commented on FLINK-1482: --- Github user tillrohrmann closed the pull request at: https://github.com/apache/flink/pull/367 BlobStore does not delete directories when process is killed Key: FLINK-1482 URL: https://issues.apache.org/jira/browse/FLINK-1482 Project: Flink Issue Type: Bug Reporter: Till Rohrmann The BlobStore (BlobCache + BlobServer) does not delete the blob store directory if it is not properly shutdown. This happens if one kills the process as it is done by Flink's start and stop shell scripts. We can solve the problem by registering a shutdown hook to do the cleanup. The shutdown hook has to delete a unique directory which has been created by the blob store.
[jira] [Commented] (FLINK-1482) BlobStore does not delete directories when process is killed
[ https://issues.apache.org/jira/browse/FLINK-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307620#comment-14307620 ] ASF GitHub Bot commented on FLINK-1482: --- Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/367#issuecomment-73089771 Ufuk already implemented the shutdown hooks.
[jira] [Closed] (FLINK-1482) BlobStore does not delete directories when process is killed
[ https://issues.apache.org/jira/browse/FLINK-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann closed FLINK-1482. Resolution: Fixed Fixed with e766dba3653bb942ff66bd9edc999237a77c2ea2
[jira] [Commented] (FLINK-1166) Add a QA bot to Flink that is testing pull requests
[ https://issues.apache.org/jira/browse/FLINK-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307553#comment-14307553 ] ASF GitHub Bot commented on FLINK-1166: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/366#issuecomment-73084319 It's not completely building Flink; it's only generating the Javadocs or compiling to collect the compiler warnings. The main purpose of the script is to be executed automatically. Add a QA bot to Flink that is testing pull requests --- Key: FLINK-1166 URL: https://issues.apache.org/jira/browse/FLINK-1166 Project: Flink Issue Type: New Feature Components: Build System Reporter: Robert Metzger Assignee: Robert Metzger Priority: Minor Labels: starter We should have a QA bot (similar to Hadoop) that is checking incoming pull requests for a few things: - Changes to user-facing APIs - More compiler warnings than before - More Javadoc warnings than before - Change of the number of files in the lib/ directory - Unused dependencies - {{@author}} tags - Guava (and other shaded jars) in the lib/ directory. It should be somehow extensible to add new tests.
[jira] [Commented] (FLINK-1166) Add a QA bot to Flink that is testing pull requests
[ https://issues.apache.org/jira/browse/FLINK-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307552#comment-14307552 ] ASF GitHub Bot commented on FLINK-1166: --- Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/366#discussion_r24178788

--- Diff: tools/qa-check.sh ---
@@ -0,0 +1,175 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+
+#
+# QA check your changes.
+# Possible options:
+#   BRANCH   set another branch as the check reference
+#
+# Use the tool like this: BRANCH=release-0.8 ./tools/qa-check.sh
+#
+
+
+BRANCH=${BRANCH:-origin/master}
+
+
+here=`dirname "$0"`             # relative
+here=`( cd "$here" && pwd )`    # absolutized and normalized
+if [ -z "$here" ] ; then
+  # error; for some reason, the path is not accessible
+  # to the script (e.g. permissions re-evaled after suid)
+  exit 1  # fail
+fi
+flink_home=`dirname "$here"`
+
+echo "flink_home=$flink_home here=$here"
+cd "$here"
+
+if [ ! -d "_qa_workdir" ] ; then
+  echo "_qa_workdir doesn't exist. Creating it"
+  mkdir _qa_workdir
+fi
+# attention, it overwrites
+echo "_qa_workdir" > .gitignore
+
+cd _qa_workdir
+
+if [ ! -d "flink" ] ; then
+  echo "There is no flink copy in the workdir. Cloning flink"
+  git clone https://github.com/apache/flink.git flink
+  cd flink
--- End diff --

This `cd flink` is not necessary.
[jira] [Created] (FLINK-1482) BlobStore does not delete directories when process is killed
Till Rohrmann created FLINK-1482: --- Summary: BlobStore does not delete directories when process is killed Key: FLINK-1482 URL: https://issues.apache.org/jira/browse/FLINK-1482 Project: Flink Issue Type: Bug Reporter: Till Rohrmann The BlobStore (BlobCache + BlobServer) does not delete the blob store directory if it is not properly shutdown. This happens if one kills the process as it is done by Flink's start and stop shell scripts. We can solve the problem by registering a shutdown hook to do the cleanup. The shutdown hook has to delete a unique directory which has been created by the blob store.
[jira] [Commented] (FLINK-1166) Add a QA bot to Flink that is testing pull requests
[ https://issues.apache.org/jira/browse/FLINK-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307600#comment-14307600 ] ASF GitHub Bot commented on FLINK-1166: --- Github user mxm commented on the pull request: https://github.com/apache/flink/pull/366#issuecomment-73087911 I just ran the script on your pull request ;)

Flink QA-Check results:
:+1: The number of javadoc errors was 402 and is now 402
:-1: The change increases the number of compiler warnings from 104 to 314
:+1: The number of files in the lib/ folder was 142 before the change and is now 142
Test finished. Overall result: :-1:. Some tests failed. Please check messages above.

I guess if the user does not rebase to the latest master, then negative QA test results are likely to occur. Also, warnings could be hidden when the number of warnings has increased on the master. Wouldn't it be better to compare against the base of the user's changes?
[jira] [Commented] (FLINK-1482) BlobStore does not delete directories when process is killed
[ https://issues.apache.org/jira/browse/FLINK-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307612#comment-14307612 ] ASF GitHub Bot commented on FLINK-1482: --- GitHub user tillrohrmann opened a pull request: https://github.com/apache/flink/pull/367 [FLINK-1482] [runtime] Adds shutdown hook to delete blob store directories Adds shutdown hook to delete blob store directories. This also deletes the blob store files in case that the JVM was terminated. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tillrohrmann/flink fixBlobStoreFileDeletion Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/367.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #367 commit e868c0ed39a8f48d74d7315e911f23d5ccab111a Author: Till Rohrmann trohrm...@apache.org Date: 2015-02-05T17:29:13Z [FLINK-1482] [runtime] Adds shutdown hook to delete blob store directories
[jira] [Updated] (FLINK-1483) Temporary channel files are not properly deleted when Flink is terminated
[ https://issues.apache.org/jira/browse/FLINK-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-1483: --- Description: The temporary channel files are not properly deleted if the IOManager does not shut down properly. This can be the case when the TaskManagers are terminated by Flink's shell scripts. A solution could be to store all channel files of one TaskManager in a uniquely identifiable directory and to register a shutdown hook which deletes this directory upon termination. was: The temporary channel files are not properly deleted if the IOManager does not shut down properly. This can be the case when the Job/TaskManager are terminated by Flink's shell scripts. Temporary channel files are not properly deleted when Flink is terminated --- Key: FLINK-1483 URL: https://issues.apache.org/jira/browse/FLINK-1483 Project: Flink Issue Type: Bug Reporter: Till Rohrmann Assignee: Ufuk Celebi
[GitHub] flink pull request: [FLINK-1463] Fix stateful/stateless Serializer...
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/353#issuecomment-73077279 Do you think that with the additional checking logic this would really make up for one superfluous duplication?
[jira] [Commented] (FLINK-1166) Add a QA bot to Flink that is testing pull requests
[ https://issues.apache.org/jira/browse/FLINK-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307549#comment-14307549 ] ASF GitHub Bot commented on FLINK-1166: --- Github user uce commented on the pull request: https://github.com/apache/flink/pull/366#issuecomment-73083750 Very nice. Will try it out now :)
[jira] [Commented] (FLINK-1442) Archived Execution Graph consumes too much memory
[ https://issues.apache.org/jira/browse/FLINK-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307484#comment-14307484 ] ASF GitHub Bot commented on FLINK-1442: --- Github user hsaputra commented on the pull request: https://github.com/apache/flink/pull/344#issuecomment-73074657 Awesome! Thanks for the update. Archived Execution Graph consumes too much memory - Key: FLINK-1442 URL: https://issues.apache.org/jira/browse/FLINK-1442 Project: Flink Issue Type: Bug Components: JobManager Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Max Michels Fix For: 0.9 The JobManager archives the execution graphs for analysis of jobs. The graphs may consume a lot of memory. Especially the execution edges in all2all connection patterns are extremely many and add up in memory consumption. The execution edges connect all parallel tasks, so for an all2all pattern between n and m tasks, there are n*m edges. For a parallelism of multiple hundred tasks, this can easily reach 100k objects and more, each with a set of metadata. I propose the following to solve that: 1. Clear all execution edges from the graph (the majority of the memory consumers) when it is given to the archiver. 2. Keep the map/list of archived graphs behind a soft reference, so it will be removed under memory pressure before the JVM crashes. That may remove graphs from the history early, but is much preferable to the JVM crashing, in which case the graph is lost as well... 3. Long term: The graph should be archived somewhere else. Something like the History Server used by Hadoop and Hive would be a good idea.
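[Editor's note] Proposal (2) in FLINK-1442 above — holding archived graphs behind a soft reference so the garbage collector can evict them under memory pressure instead of crashing the JVM — can be sketched like this. The class below is an illustrative stand-in, not the JobManager's actual archive code.

```java
import java.lang.ref.SoftReference;
import java.util.LinkedHashMap;
import java.util.Map;

public class SoftArchive<K, V> {
    // Each value is wrapped in a SoftReference; the GC clears soft references
    // only when memory gets tight, so entries survive as long as space allows.
    private final Map<K, SoftReference<V>> entries = new LinkedHashMap<>();

    public void put(K key, V value) {
        entries.put(key, new SoftReference<>(value));
    }

    // Returns null when the entry was never stored or was already reclaimed;
    // callers must treat a missing graph as "evicted from history".
    public V get(K key) {
        SoftReference<V> ref = entries.get(key);
        return ref == null ? null : ref.get();
    }
}
```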
[jira] [Commented] (FLINK-1463) RuntimeStatefulSerializerFactory declares ClassLoader as transient but later tries to use it
[ https://issues.apache.org/jira/browse/FLINK-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307524#comment-14307524 ] ASF GitHub Bot commented on FLINK-1463: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/353#issuecomment-73079537 Thanks. Sorry for being picky ;-) RuntimeStatefulSerializerFactory declares ClassLoader as transient but later tries to use it Key: FLINK-1463 URL: https://issues.apache.org/jira/browse/FLINK-1463 Project: Flink Issue Type: Bug Affects Versions: 0.8, 0.9 Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek Priority: Blocker Fix For: 0.8.1 At least one user has seen an exception because of this. In theory, the ClassLoader is set again in readParametersFromConfig. But the way it is used in TupleComparatorBase, this method is never called.
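[Editor's note] The underlying FLINK-1463 pitfall — a transient field that is silently null after deserialization unless it is restored explicitly — can be demonstrated with a minimal stand-in class. The names below are hypothetical simplifications, not Flink's actual RuntimeStatefulSerializerFactory.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.Serializable;

public class FactoryWithLoader implements Serializable {
    private static final long serialVersionUID = 1L;

    // transient fields are skipped by Java serialization and come back null.
    private transient ClassLoader loader = getClass().getClassLoader();

    // Restore the transient field when the object is read back; without this
    // method, any later use of `loader` would throw a NullPointerException.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        loader = getClass().getClassLoader();
    }

    public ClassLoader getClassLoader() {
        return loader;
    }
}
```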
[GitHub] flink pull request: [FLINK-1166] Add qa-check.sh tool
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/366#issuecomment-73083755 The script builds Flink twice and therefore takes a while. I don't know if we can force users to execute it before making pull requests. However, it would be good to get an automatic report for open pull requests using this script.
[jira] [Commented] (FLINK-1463) RuntimeStatefulSerializerFactory declares ClassLoader as transient but later tries to use it
[ https://issues.apache.org/jira/browse/FLINK-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307516#comment-14307516 ] ASF GitHub Bot commented on FLINK-1463: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/353#issuecomment-73078373 Yes. Some serializers are heavyweight, and often the factories produce only one serializer through their lifetime. It is a nice improvement, not a crucial one, though.
[GitHub] flink pull request: [FLINK-1463] Fix stateful/stateless Serializer...
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/353#issuecomment-73079161 Ok, then I'll add this.
[GitHub] flink pull request: [FLINK-1166] Add qa-check.sh tool
Github user uce commented on the pull request: https://github.com/apache/flink/pull/366#issuecomment-73084249 The idea is to run this automatically or on request, which should be fine.
[jira] [Resolved] (FLINK-1477) YARN Client: Use HADOOP_HOME if set
[ https://issues.apache.org/jira/browse/FLINK-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger resolved FLINK-1477. --- Resolution: Fixed Resolved in http://git-wip-us.apache.org/repos/asf/flink/commit/5e1cc9e2 YARN Client: Use HADOOP_HOME if set --- Key: FLINK-1477 URL: https://issues.apache.org/jira/browse/FLINK-1477 Project: Flink Issue Type: Bug Components: YARN Client Reporter: Robert Metzger Assignee: Robert Metzger As part of FLINK-592 I removed code to load the Hadoop configuration from the HADOOP_HOME variable. But it seems that we still have users where this deprecated variable is the only hadoop conf variable which is set.
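[Editor's note] The HADOOP_HOME fallback FLINK-1477 restores amounts to deriving a configuration directory from the deprecated environment variable. A minimal sketch, assuming the standard Hadoop 1.x (conf/) and 2.x (etc/hadoop/) directory layouts; the class and method names are illustrative, not the YARN client's actual code.

```java
import java.io.File;

public class HadoopConfResolver {

    // Returns the first existing candidate conf directory under hadoopHome,
    // or null when neither known layout is present.
    static File resolveConfDir(File hadoopHome) {
        File[] candidates = {
            new File(hadoopHome, "conf"),       // Hadoop 1.x layout
            new File(hadoopHome, "etc/hadoop")  // Hadoop 2.x layout
        };
        for (File dir : candidates) {
            if (dir.isDirectory()) {
                return dir;
            }
        }
        return null;
    }

    // Entry point mirroring the fallback: only consulted when no explicit
    // Hadoop configuration path is set.
    static File resolveFromEnvironment() {
        String home = System.getenv("HADOOP_HOME");
        return home == null ? null : resolveConfDir(new File(home));
    }
}
```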
[GitHub] flink pull request: [FLINK1443] Add support for replicating input ...
Github user fhueske closed the pull request at: https://github.com/apache/flink/pull/360
[GitHub] flink pull request: [FLINK1443] Add support for replicating input ...
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/360#issuecomment-73025884 Merged as a19b4a02bfa5237e0dcd2b264da36229546f23c0
[GitHub] flink pull request: [FLINK-1415] Akka cleanups
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/319#issuecomment-73031242 Rebased on Stephan's latest master branch containing the subslot release fix and reduced memory footprint of archived ExecutionGraphs.
[GitHub] flink pull request: [FLINK-1442] Reduce memory consumption of arch...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/344
[jira] [Created] (FLINK-1477) YARN Client: Use HADOOP_HOME if set
Robert Metzger created FLINK-1477: --- Summary: YARN Client: Use HADOOP_HOME if set Key: FLINK-1477 URL: https://issues.apache.org/jira/browse/FLINK-1477 Project: Flink Issue Type: Bug Components: YARN Client Reporter: Robert Metzger Assignee: Robert Metzger As part of FLINK-592 I removed code to load the Hadoop configuration from the HADOOP_HOME variable. But it seems that we still have users where this deprecated variable is the only hadoop conf variable which is set.
[jira] [Commented] (FLINK-1396) Add hadoop input formats directly to the user API.
[ https://issues.apache.org/jira/browse/FLINK-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14306959#comment-14306959 ] ASF GitHub Bot commented on FLINK-1396: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/363#issuecomment-73024197 I vote for keeping @aljoscha's original approach. Users might not notice the different interfaces there, so the Hadoop in the method name makes it more explicit. Also, it could lead to confusion because Flink's and Hadoop's InputFormats have pretty similar names (they actually only differ in the package names). Lastly, it would cause some work on Aljoscha's side to update the code and the documentation. Add hadoop input formats directly to the user API. -- Key: FLINK-1396 URL: https://issues.apache.org/jira/browse/FLINK-1396 Project: Flink Issue Type: Bug Reporter: Robert Metzger Assignee: Aljoscha Krettek
[jira] [Commented] (FLINK-1318) Make quoted String parsing optional and configurable for CSVInputFormats
[ https://issues.apache.org/jira/browse/FLINK-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14306972#comment-14306972 ] ASF GitHub Bot commented on FLINK-1318: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/265 Make quoted String parsing optional and configurable for CSVInputFormats Key: FLINK-1318 URL: https://issues.apache.org/jira/browse/FLINK-1318 Project: Flink Issue Type: Improvement Components: Java API, Scala API Affects Versions: 0.8 Reporter: Fabian Hueske Assignee: Fabian Hueske Priority: Minor With the current implementation of the CSVInputFormat, quoted string parsing kicks in if the first non-whitespace character of a field is a double quote. I see two issues with this implementation: 1. Quoted String parsing cannot be disabled. 2. The quoting character is fixed to double quotes ("). I propose to add parameters to disable quoted String parsing and to set the quote character.
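[Editor's note] The configurable quoting FLINK-1318 proposes — an on/off switch plus a settable quote character — can be sketched as a small field parser. The names and behavior details (e.g. no escape handling, no surrounding whitespace handling) are illustrative simplifications, not Flink's actual CsvInputFormat API.

```java
public class QuotedFieldParser {
    private final boolean quotedParsingEnabled;
    private final char quoteChar;

    public QuotedFieldParser(boolean quotedParsingEnabled, char quoteChar) {
        this.quotedParsingEnabled = quotedParsingEnabled;
        this.quoteChar = quoteChar;
    }

    // Extract the first field of `line`, delimited by `delimiter`.
    // When quoted parsing is enabled and the field starts with the quote
    // character, everything up to the closing quote is the field (so the
    // delimiter may appear inside it); otherwise the raw text up to the
    // delimiter is returned unchanged.
    public String parseField(String line, char delimiter) {
        if (quotedParsingEnabled && !line.isEmpty() && line.charAt(0) == quoteChar) {
            int closing = line.indexOf(quoteChar, 1);
            if (closing < 0) {
                throw new IllegalArgumentException("Unterminated quoted field: " + line);
            }
            return line.substring(1, closing);
        }
        int end = line.indexOf(delimiter);
        return end < 0 ? line : line.substring(0, end);
    }
}
```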
[jira] [Commented] (FLINK-1471) Allow KeySelectors to implement ResultTypeQueryable
[ https://issues.apache.org/jira/browse/FLINK-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307049#comment-14307049 ] ASF GitHub Bot commented on FLINK-1471: --- Github user twalthr commented on the pull request: https://github.com/apache/flink/pull/359#issuecomment-73028436 The discussion is about how a Function without any generics (raw) should be treated. In general we have 3 possibilities: - Skip input validation completely if the Function implements ResultTypeQueryable - Introduce an interface/annotation SkipInputValidation for special use cases - This PR: Skip input validation if the Function is a raw type. Allow KeySelectors to implement ResultTypeQueryable --- Key: FLINK-1471 URL: https://issues.apache.org/jira/browse/FLINK-1471 Project: Flink Issue Type: Bug Components: Java API Affects Versions: 0.9 Reporter: Robert Metzger Assignee: Timo Walther Fix For: 0.9 See https://github.com/apache/flink/pull/354
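The first option above (honoring ResultTypeQueryable on a KeySelector) can be illustrated with a small sketch. The interfaces below are simplified stand-ins for Flink's real ones, assumed here only for illustration; in particular, Flink's ResultTypeQueryable returns a TypeInformation&lt;T&gt;, for which a plain Class&lt;T&gt; serves as a stand-in:

```java
// Simplified stand-ins for Flink's KeySelector and ResultTypeQueryable, to
// illustrate how a selector can declare its key type explicitly when generic
// type extraction cannot infer it (e.g. for a raw type).
public class KeySelectorTypeSketch {

    interface KeySelector<IN, KEY> {
        KEY getKey(IN value);
    }

    interface ResultTypeQueryable<T> {
        Class<T> getProducedType(); // stand-in for TypeInformation<T>
    }

    /** A selector that declares its produced key type explicitly. */
    static class LengthKeySelector
            implements KeySelector<String, Integer>, ResultTypeQueryable<Integer> {
        public Integer getKey(String value) { return value.length(); }
        public Class<Integer> getProducedType() { return Integer.class; }
    }

    /** Prefer the explicitly declared type over reflective extraction. */
    static Class<?> keyType(KeySelector<?, ?> selector) {
        if (selector instanceof ResultTypeQueryable) {
            return ((ResultTypeQueryable<?>) selector).getProducedType();
        }
        // a real type extractor would fall back to reflective analysis here
        throw new UnsupportedOperationException("reflective extraction not sketched");
    }
}
```

The point of the design is that the declared type wins, so input validation never needs to reason about the selector's (possibly raw) generic signature.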
[jira] [Created] (FLINK-1478) Add strictly local input split assignment
Stephan Ewen created FLINK-1478: --- Summary: Add strictly local input split assignment Key: FLINK-1478 URL: https://issues.apache.org/jira/browse/FLINK-1478 Project: Flink Issue Type: New Feature Components: JobManager Affects Versions: 0.9 Reporter: Stephan Ewen Fix For: 0.9
[GitHub] flink pull request: [FLINK-1458] Allow Interfaces and abstract typ...
Github user aljoscha closed the pull request at: https://github.com/apache/flink/pull/357 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1442) Archived Execution Graph consumes too much memory
[ https://issues.apache.org/jira/browse/FLINK-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307133#comment-14307133 ] ASF GitHub Bot commented on FLINK-1442: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/344 Archived Execution Graph consumes too much memory - Key: FLINK-1442 URL: https://issues.apache.org/jira/browse/FLINK-1442 Project: Flink Issue Type: Bug Components: JobManager Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Max Michels The JobManager archives the execution graphs for analysis of jobs. The graphs may consume a lot of memory. In particular, the execution edges in all2all connection patterns are extremely numerous and add up in memory consumption. The execution edges connect all parallel tasks, so for an all2all pattern between n and m tasks, there are n*m edges. For a parallelism of several hundred tasks, this can easily reach 100k objects and more, each with a set of metadata. I propose the following to solve that: 1. Clear all execution edges from the graph (the majority of the memory consumers) when it is given to the archiver. 2. Keep the map/list of the archived graphs behind a soft reference, so it will be removed under memory pressure before the JVM crashes. That may remove graphs from the history early, but is much preferable to the JVM crashing, in which case the graph is lost as well... 3. Long term: The graph should be archived somewhere else. Something like the History Server used by Hadoop and Hive would be a good idea.
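Point 2 of the proposal can be sketched with a soft-reference-backed archive. This is a minimal stand-alone illustration of the idea, not the actual JobManager archiver code; the class name is hypothetical:

```java
import java.lang.ref.SoftReference;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of point 2 above: keep archived execution graphs behind
// SoftReferences so the JVM may reclaim them under memory pressure
// instead of failing with an OutOfMemoryError.
public class SoftArchive<K, V> {
    private final Map<K, SoftReference<V>> archive = new LinkedHashMap<>();

    public void put(K jobId, V graph) {
        archive.put(jobId, new SoftReference<>(graph));
    }

    /** Returns the archived graph, or null if it was collected under memory pressure. */
    public V get(K jobId) {
        SoftReference<V> ref = archive.get(jobId);
        return ref == null ? null : ref.get();
    }
}
```

The JVM guarantees that all soft references are cleared before an OutOfMemoryError is thrown, which is exactly the "lose history early rather than crash" tradeoff described above.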
[GitHub] flink pull request: [FLINK-1396][FLINK-1303] Hadoop Input/Output d...
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/363#issuecomment-73020223 I addressed the comments. What do the others think about overloading readFile()? I made it like this on purpose, so that the user sees in the API that they are using Hadoop input formats or that they can be used.
[jira] [Commented] (FLINK-1396) Add hadoop input formats directly to the user API.
[ https://issues.apache.org/jira/browse/FLINK-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14306928#comment-14306928 ] ASF GitHub Bot commented on FLINK-1396: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/363#issuecomment-73020223 I addressed the comments. What do the others think about overloading readFile()? I made it like this on purpose, so that the user sees in the API that they are using Hadoop input formats or that they can be used. Add hadoop input formats directly to the user API. -- Key: FLINK-1396 URL: https://issues.apache.org/jira/browse/FLINK-1396 Project: Flink Issue Type: Bug Reporter: Robert Metzger Assignee: Aljoscha Krettek
[GitHub] flink pull request: [FLINK-1484] Adds explicit disconnect message ...
Github user hsaputra commented on a diff in the pull request: https://github.com/apache/flink/pull/368#discussion_r24220454 --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala --- @@ -125,6 +126,10 @@ Actor with ActorLogMessages with ActorLogging { override def postStop(): Unit = { log.info(s"Stopping job manager ${self.path}.") +// disconnect the registered task managers +instanceManager.getAllRegisteredInstances.asScala.foreach{ + _.getTaskManager ! Disconnected("JobManager is stopping")} + for((e,_) <- currentJobs.values){ e.fail(new Exception("The JobManager is shutting down.")) --- End diff -- Since we are cleaning up messages, maybe remove "The" so it is consistent with other messages.
[jira] [Commented] (FLINK-1484) JobManager restart does not notify the TaskManager
[ https://issues.apache.org/jira/browse/FLINK-1484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308540#comment-14308540 ] ASF GitHub Bot commented on FLINK-1484: --- Github user hsaputra commented on a diff in the pull request: https://github.com/apache/flink/pull/368#discussion_r24220454 --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala --- @@ -125,6 +126,10 @@ Actor with ActorLogMessages with ActorLogging { override def postStop(): Unit = { log.info(s"Stopping job manager ${self.path}.") +// disconnect the registered task managers +instanceManager.getAllRegisteredInstances.asScala.foreach{ + _.getTaskManager ! Disconnected("JobManager is stopping")} + for((e,_) <- currentJobs.values){ e.fail(new Exception("The JobManager is shutting down.")) --- End diff -- Since we are cleaning up messages, maybe remove "The" so it is consistent with other messages. JobManager restart does not notify the TaskManager -- Key: FLINK-1484 URL: https://issues.apache.org/jira/browse/FLINK-1484 Project: Flink Issue Type: Bug Reporter: Till Rohrmann A JobManager restart can happen due to an uncaught exception. However, connected TaskManagers are not informed about the disconnection and continue sending messages to a JobManager with a reset state. TaskManagers should be informed about a possible restart and clean up their own state in such a case. Afterwards, they can try to reconnect to the restarted JobManager.
[jira] [Resolved] (FLINK-1455) ExternalSortLargeRecordsITCase.testSortWithShortMediumAndLargeRecords: Potential Memory leak
[ https://issues.apache.org/jira/browse/FLINK-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-1455. - Resolution: Invalid Not a memory leak. The error message appears only because a JVM I/O problem caused the test to abort prematurely. The root cause is a spurious JVM failure: {code} Caused by: java.io.IOException: Cannot allocate memory at sun.nio.ch.FileDispatcherImpl.write0(Native Method) at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60) at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) at sun.nio.ch.IOUtil.write(IOUtil.java:65) at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:205) {code} ExternalSortLargeRecordsITCase.testSortWithShortMediumAndLargeRecords: Potential Memory leak Key: FLINK-1455 URL: https://issues.apache.org/jira/browse/FLINK-1455 Project: Flink Issue Type: Bug Components: Local Runtime Affects Versions: 0.9 Reporter: Robert Metzger Priority: Minor This error occurred in one of my Travis jobs: https://travis-ci.org/rmetzger/flink/jobs/48343022 Would be cool if somebody who knows the sorter better could verify/invalidate the issue. 
{code} Running org.apache.flink.runtime.operators.sort.ExternalSortLargeRecordsITCase java.lang.RuntimeException: Error obtaining the sorted input: Thread 'SortMerger spilling thread' terminated due to an exception: Cannot allocate memory at org.apache.flink.runtime.operators.sort.UnilateralSortMerger.getIterator(UnilateralSortMerger.java:593) at org.apache.flink.runtime.operators.sort.ExternalSortLargeRecordsITCase.testSortWithShortMediumAndLargeRecords(ExternalSortLargeRecordsITCase.java:285) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103) Caused by: java.io.IOException: Thread 'SortMerger spilling thread' terminated due to an exception: Cannot allocate memory at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:770) Caused by: java.io.IOException: Cannot allocate memory at sun.nio.ch.FileDispatcherImpl.write0(Native Method) at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60) at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) at sun.nio.ch.IOUtil.write(IOUtil.java:65) at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:205) at org.apache.flink.runtime.io.disk.iomanager.SegmentWriteRequest.write(AsynchronousFileIOChannel.java:267) at org.apache.flink.runtime.io.disk.iomanager.IOManagerAsync$WriterThread.run(IOManagerAsync.java:440) Tests run: 5,
[jira] [Commented] (FLINK-837) Allow FunctionAnnotations on Methods
[ https://issues.apache.org/jira/browse/FLINK-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14306890#comment-14306890 ] Stephan Ewen commented on FLINK-837: I guess that this is sufficient for now. Allow FunctionAnnotations on Methods Key: FLINK-837 URL: https://issues.apache.org/jira/browse/FLINK-837 Project: Flink Issue Type: Improvement Reporter: GitHub Import Labels: github-import Fix For: pre-apache Allowing function annotations on methods makes it possible to use them with anonymous inner classes. Imported from GitHub Url: https://github.com/stratosphere/stratosphere/issues/837 Created by: [StephanEwen|https://github.com/StephanEwen] Labels: Created at: Mon May 19 22:19:58 CEST 2014 State: open
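The mechanism behind the issue can be sketched as follows: if a function annotation also allows METHOD as a target, it can be attached to the map() method of an anonymous inner class, and the runtime can discover it reflectively. "ForwardedFieldsSketch" and the helper methods below are hypothetical stand-ins, not Flink's actual annotations or API:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Sketch: a function annotation that permits METHOD targets, so it can be
// placed inside an anonymous inner class. All names here are illustrative.
public class MethodAnnotationDemo {

    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.TYPE, ElementType.METHOD})
    @interface ForwardedFieldsSketch {
        String[] value();
    }

    interface MapFunction<I, O> {
        O map(I value);
    }

    /** An anonymous inner class carrying the annotation on its method. */
    static MapFunction<Integer, Integer> demoFunction() {
        return new MapFunction<Integer, Integer>() {
            @ForwardedFieldsSketch({"0"})
            public Integer map(Integer value) {
                return value + 1;
            }
        };
    }

    /** The runtime can read the annotation reflectively from the method. */
    static String[] forwardedFields(MapFunction<Integer, Integer> fn) throws Exception {
        ForwardedFieldsSketch ann = fn.getClass()
                .getMethod("map", Integer.class)
                .getAnnotation(ForwardedFieldsSketch.class);
        return ann == null ? null : ann.value();
    }
}
```

Without the METHOD target, the annotation could only go on a named class, which an anonymous inner class cannot provide.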
[jira] [Commented] (FLINK-1396) Add hadoop input formats directly to the user API.
[ https://issues.apache.org/jira/browse/FLINK-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14306943#comment-14306943 ] ASF GitHub Bot commented on FLINK-1396: --- Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/363#issuecomment-73022621 Hmm, yes. That's also a valid point. But on the other hand, new users might not even be aware of the different types of InputFormats. It would all look natural ;-) I am leaning more towards overloading, but would be fine with having separate functions as well. Add hadoop input formats directly to the user API. -- Key: FLINK-1396 URL: https://issues.apache.org/jira/browse/FLINK-1396 Project: Flink Issue Type: Bug Reporter: Robert Metzger Assignee: Aljoscha Krettek
[GitHub] flink pull request: [FLINK-1396][FLINK-1303] Hadoop Input/Output d...
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/363#issuecomment-73022621 Hmm, yes. That's also a valid point. But on the other hand, new users might not even be aware of the different types of InputFormats. It would all look natural ;-) I am leaning more towards overloading, but would be fine with having separate functions as well.
[GitHub] flink pull request: [FLINK-1396][FLINK-1303] Hadoop Input/Output d...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/363#issuecomment-73024197 I vote for keeping @aljoscha's original approach. Users might not notice the different interfaces there, so the "Hadoop" in the method name makes it more explicit. Also, it could lead to confusion because Flink's and Hadoop's InputFormats have very similar names (they actually differ only in the package names). Lastly, it would cause some work on Aljoscha's side to update the code and the documentation.
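The API-design tradeoff being debated in this thread can be made concrete with a small sketch. The signatures below are purely illustrative (the real ExecutionEnvironment API differs); the sketch only contrasts the two naming styles:

```java
// Sketch of the two API styles under discussion; hypothetical signatures,
// not the actual Flink ExecutionEnvironment API.
public class EnvironmentSketch {
    interface FlinkInputFormat<T> {}
    interface HadoopFileInputFormat<K, V> {}

    // Style 1: overloading -- same method name for both format families;
    // Hadoop use is implicit in the argument type.
    public <T> String readFile(FlinkInputFormat<T> format, String path) {
        return "flink:" + path;
    }
    public <K, V> String readFile(HadoopFileInputFormat<K, V> format, String path) {
        return "hadoop:" + path;
    }

    // Style 2: explicit name -- the "Hadoop" is visible at the call site.
    public <K, V> String readHadoopFile(HadoopFileInputFormat<K, V> format, String path) {
        return "hadoop:" + path;
    }
}
```

With overloading, the compiler dispatches on the format's static type and the call sites look uniform; with the explicit name, a reader can tell from the call site alone that a Hadoop format is involved, which is the argument made above for keeping it.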
[GitHub] flink pull request: [FLINK-1318] CsvInputFormat: Made quoted strin...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/265