[jira] [Closed] (FLINK-2629) Refactor YARN ApplicationMasterActor to use the async AM<-->RM client

2015-12-11 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels closed FLINK-2629.
-
Resolution: Duplicate

> Refactor YARN ApplicationMasterActor to use the async AM<-->RM client 
> --
>
> Key: FLINK-2629
> URL: https://issues.apache.org/jira/browse/FLINK-2629
> Project: Flink
>  Issue Type: Improvement
>  Components: YARN Client
>Affects Versions: 0.10.0
>Reporter: Robert Metzger
>
> YARN also offers an async client for the AM <--> RM communication: 
> https://hadoop.apache.org/docs/r2.6.0/api/org/apache/hadoop/yarn/client/api/async/AMRMClientAsync.html
> Currently, container failures can block the entire JobManager actor system 
> for a few seconds because of blocking AM <--> RM calls.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2936] Fix ClassCastException for Event-...

2015-12-11 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1448#issuecomment-163918357
  
@StephanEwen do we want to fix it like this now, or do we want to integrate 
it into the StreamRecordSerializer?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2936) ClassCastException when using EventTimeSourceFunction in non-EventTime program

2015-12-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052655#comment-15052655
 ] 

ASF GitHub Bot commented on FLINK-2936:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1448#issuecomment-163918357
  
@StephanEwen do we want to fix it like this now, or do we want to integrate 
it into the StreamRecordSerializer?


> ClassCastException when using EventTimeSourceFunction in non-EventTime program
> --
>
> Key: FLINK-2936
> URL: https://issues.apache.org/jira/browse/FLINK-2936
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 0.10.0
>Reporter: Fabian Hueske
>Assignee: Aljoscha Krettek
> Fix For: 1.0.0
>
>
> Using an {{EventTimeSourceFunction}} in a DataStream program that does not 
> operate with {{TimeCharacteristic.EventTime}} leads to a 
> {{ClassCastException}} when the first {{Watermark}} is emitted:
> {code}
> Caused by: java.lang.ClassCastException: 
> org.apache.flink.streaming.api.watermark.Watermark cannot be cast to 
> org.apache.flink.streaming.runtime.streamrecord.StreamRecord
> {code}
> This exception is not very helpful for users who simply forgot to set the 
> correct {{TimeCharacteristic}} and should be improved.
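One possible way to surface a clearer message is a fail-fast check at the point where a watermark is emitted. The following is a minimal Python sketch of that idea only; it is not Flink code and not the fix in the pull request. `Watermark`, `MissingEventTimeError`, and `check_emitted` are hypothetical names introduced for illustration.

```python
class Watermark:
    """Stand-in for a watermark element; hypothetical, not the Flink class."""

    def __init__(self, timestamp):
        self.timestamp = timestamp


class MissingEventTimeError(RuntimeError):
    """Hypothetical error type used only in this sketch."""


def check_emitted(element, event_time_enabled):
    """Reject watermarks early with a descriptive message.

    The goal is to replace a late, opaque cast failure with an error
    that names the missing time characteristic.
    """
    if isinstance(element, Watermark) and not event_time_enabled:
        raise MissingEventTimeError(
            "The source emitted a Watermark, but the program does not use "
            "TimeCharacteristic.EventTime. Set the time characteristic to "
            "EventTime or use a source that does not emit watermarks."
        )
    return element
```

Ordinary records pass through unchanged; only the combination of a watermark and a non-event-time program raises.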





[jira] [Commented] (FLINK-2769) Web dashboard port not configurable on client side

2015-12-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052551#comment-15052551
 ] 

ASF GitHub Bot commented on FLINK-2769:
---

Github user uce commented on the pull request:

https://github.com/apache/flink/pull/1449#issuecomment-163893598
  
Thanks for the reviews Robert and Sachin!

I've addressed Robert's comment. If there are no objections, I would like 
to merge this.


> Web dashboard port not configurable on client side
> --
>
> Key: FLINK-2769
> URL: https://issues.apache.org/jira/browse/FLINK-2769
> Project: Flink
>  Issue Type: Bug
>  Components: Webfrontend
>Affects Versions: 0.10.0
>Reporter: Ufuk Celebi
>Assignee: Ufuk Celebi
>Priority: Critical
> Fix For: 0.10.0
>
>
> The new web dashboard port configuration only affects the server side, but 
> not the client side.
> To reproduce, set the following in {{flink-conf.yaml}}:
> {code}
> jobmanager.web.port: 9091
> jobmanager.new-web-frontend: true
> {code}
> Run
> {code}
> $ bin/start-cluster.sh
> {code}
> You can browse to {{http://localhost:9091}}, but you won't see any data from 
> the job manager. Requests to http://localhost:9091/overview etc. work as 
> expected, but the client side JavaScript is running requests against 
> {{http://localhost:8081}}.
> The problem is that 
> {{flink-runtime-web/web-dashboard/app/scripts/index.coffee}} hard codes:
> {code}
> .value 'flinkConfig', {
>   jobServer: 'http://localhost:8081'
> }
> {code}
> which is picked up by all generated scripts.
> This needs to be configurable for standalone mode, and especially for 
> recovery mode. This was a major pain point for me today, because I was 
> working on the dashboard and thought that my changes were wrong.
> In general, we need to add more tests to the new dashboard, which should soon 
> replace the old one imo.
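To illustrate the mismatch described above, here is a minimal Python sketch that derives the job server URL from the configured port instead of hard-coding it. It is an illustration under assumptions, not the change made in the pull request: `job_server_url` is a hypothetical helper, the parser handles only simple `key: value` lines, and the fallback port 8081 is taken from the hard-coded value quoted in the issue.

```python
def job_server_url(conf_text, host="localhost", default_port=8081):
    """Build the base URL from flink-conf.yaml style text.

    Only flat "key: value" lines are understood; this is not a YAML parser.
    """
    port = default_port
    for line in conf_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "jobmanager.web.port":
            port = int(value.strip())
    return f"http://{host}:{port}"


conf = "jobmanager.web.port: 9091\njobmanager.new-web-frontend: true"
print(job_server_url(conf))
```

With the configuration from the issue, the client would then target port 9091, the same port the server listens on.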





[GitHub] flink pull request: [FLINK-2769] [runtime-web] Add configurable jo...

2015-12-11 Thread uce
Github user uce commented on the pull request:

https://github.com/apache/flink/pull/1449#issuecomment-163893598
  
Thanks for the reviews Robert and Sachin!

I've addressed Robert's comment. If there are no objections, I would like 
to merge this.




[GitHub] flink pull request: [FLINK-2769] [runtime-web] Add configurable jo...

2015-12-11 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/1449#issuecomment-163953942
  
I think it's good to merge, +1




[jira] [Commented] (FLINK-2769) Web dashboard port not configurable on client side

2015-12-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052853#comment-15052853
 ] 

ASF GitHub Bot commented on FLINK-2769:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/1449#issuecomment-163953942
  
I think it's good to merge, +1


> Web dashboard port not configurable on client side
> --
>
> Key: FLINK-2769
> URL: https://issues.apache.org/jira/browse/FLINK-2769
> Project: Flink
>  Issue Type: Bug
>  Components: Webfrontend
>Affects Versions: 0.10.0
>Reporter: Ufuk Celebi
>Assignee: Ufuk Celebi
>Priority: Critical
> Fix For: 0.10.0
>
>
> The new web dashboard port configuration only affects the server side, but 
> not the client side.
> To reproduce, set the following in {{flink-conf.yaml}}:
> {code}
> jobmanager.web.port: 9091
> jobmanager.new-web-frontend: true
> {code}
> Run
> {code}
> $ bin/start-cluster.sh
> {code}
> You can browse to {{http://localhost:9091}}, but you won't see any data from 
> the job manager. Requests to http://localhost:9091/overview etc. work as 
> expected, but the client side JavaScript is running requests against 
> {{http://localhost:8081}}.
> The problem is that 
> {{flink-runtime-web/web-dashboard/app/scripts/index.coffee}} hard codes:
> {code}
> .value 'flinkConfig', {
>   jobServer: 'http://localhost:8081'
> }
> {code}
> which is picked up by all generated scripts.
> This needs to be configurable for standalone mode, and especially for 
> recovery mode. This was a major pain point for me today, because I was 
> working on the dashboard and thought that my changes were wrong.
> In general, we need to add more tests to the new dashboard, which should soon 
> replace the old one imo.





[jira] [Commented] (FLINK-2936) ClassCastException when using EventTimeSourceFunction in non-EventTime program

2015-12-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052782#comment-15052782
 ] 

ASF GitHub Bot commented on FLINK-2936:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1448#issuecomment-163940693
  
Probably okay to do it like this.
How hard is it to pass the "timestamps" flag into the constructor of the 
SourceOperator, rather than through the runtime context?


> ClassCastException when using EventTimeSourceFunction in non-EventTime program
> --
>
> Key: FLINK-2936
> URL: https://issues.apache.org/jira/browse/FLINK-2936
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 0.10.0
>Reporter: Fabian Hueske
>Assignee: Aljoscha Krettek
> Fix For: 1.0.0
>
>
> Using an {{EventTimeSourceFunction}} in a DataStream program that does not 
> operate with {{TimeCharacteristic.EventTime}} leads to a 
> {{ClassCastException}} when the first {{Watermark}} is emitted:
> {code}
> Caused by: java.lang.ClassCastException: 
> org.apache.flink.streaming.api.watermark.Watermark cannot be cast to 
> org.apache.flink.streaming.runtime.streamrecord.StreamRecord
> {code}
> This exception is not very helpful for users who simply forgot to set the 
> correct {{TimeCharacteristic}} and should be improved.





[GitHub] flink pull request: [FLINK-2936] Fix ClassCastException for Event-...

2015-12-11 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1448#issuecomment-163940693
  
Probably okay to do it like this.
How hard is it to pass the "timestamps" flag into the constructor of the 
SourceOperator, rather than through the runtime context?




[GitHub] flink pull request: [FLINK-2936] Fix ClassCastException for Event-...

2015-12-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1448




[GitHub] flink pull request: [FLINK-3121] Emit Final Watermark in Kafka Sou...

2015-12-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1447




[jira] [Commented] (FLINK-3134) Make YarnJobManager's allocate call asynchronous

2015-12-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052874#comment-15052874
 ] 

ASF GitHub Bot commented on FLINK-3134:
---

GitHub user mxm opened a pull request:

https://github.com/apache/flink/pull/1450

[FLINK-3134][yarn] asynchronous YarnJobManager heartbeats

- use AMRMClientAsync instead of AMRMClient
- handle allocation and startup of containers in callbacks
- remove YarnHeartbeat message

The AMRMClientAsync uses one thread to communicate with the resource
manager and an additional thread to execute the callbacks.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mxm/flink yart-heartbeat-async

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1450.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1450


commit 8ec6d492795801ffc7c76f37a3a3200bbf85f81c
Author: Maximilian Michels 
Date:   2015-12-08T13:35:04Z

[FLINK-3134][yarn] asynchronous YarnJobManager heartbeats

- use AMRMClientAsync instead of AMRMClient
- handle allocation and startup of containers in callbacks
- remove YarnHeartbeat message

The AMRMClientAsync uses one thread to communicate with the resource
manager and an additional thread to execute the callbacks.




> Make YarnJobManager's allocate call asynchronous
> 
>
> Key: FLINK-3134
> URL: https://issues.apache.org/jira/browse/FLINK-3134
> Project: Flink
>  Issue Type: Bug
>  Components: YARN Client
>Affects Versions: 0.10.0, 1.0.0, 0.10.1
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
> Fix For: 1.0.0
>
>
> The {{allocate()}} call is used in the {{YarnJobManager}} to send a heartbeat 
> to the YARN resource manager. This call may block the JobManager actor system 
> for an arbitrary amount of time, e.g. if retry handlers are set up within the 
> call to allocate.
> I propose to use the {{AMRMClientAsync}}'s callback methods to send 
> heartbeats and update the container information. The API is available for our 
> supported Hadoop versions (2.3.0 and above).
> https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/yarn/client/api/async/AMRMClientAsync.html
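The general pattern proposed here, moving a blocking heartbeat call off the main message loop and handing results back through callbacks, can be sketched in Python as follows. This is not the YARN API and not the code in the pull request: `AsyncHeartbeat`, `blocking_call`, and `on_response` are hypothetical names. The sketch uses a single background thread for both the call and the callback, whereas the pull request notes that `AMRMClientAsync` uses separate threads for communication and for callbacks.

```python
import threading


class AsyncHeartbeat:
    """Run a blocking heartbeat call on its own thread.

    Results are delivered through a callback, so the caller's thread
    is never blocked by a slow or retrying call.
    """

    def __init__(self, blocking_call, on_response, interval_seconds):
        self._call = blocking_call
        self._on_response = on_response
        self._interval = interval_seconds
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        # Signal the loop to exit and wait for the worker thread to finish.
        self._stop.set()
        self._thread.join()

    def _run(self):
        while not self._stop.is_set():
            self._on_response(self._call())
            # Sleep until the next heartbeat, waking early on stop().
            self._stop.wait(self._interval)
```

A caller would pass the blocking call and a callback that updates its container bookkeeping, then continue processing other messages while heartbeats run in the background.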





[jira] [Commented] (FLINK-3121) Watermark forwarding does not work for sources not producing any data

2015-12-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052989#comment-15052989
 ] 

ASF GitHub Bot commented on FLINK-3121:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1447


> Watermark forwarding does not work for sources not producing any data
> -
>
> Key: FLINK-3121
> URL: https://issues.apache.org/jira/browse/FLINK-3121
> Project: Flink
>  Issue Type: Bug
>Reporter: Robert Metzger
>Assignee: Aljoscha Krettek
>Priority: Blocker
>
> This mailing list discussion explains the issue in detail: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Custom-TimestampExtractor-and-FlinkKafkaConsumer082-td3488.html
> As a workaround for now, the Kafka source can emit a final watermark for 
> sources not producing any data.





[jira] [Commented] (FLINK-2936) ClassCastException when using EventTimeSourceFunction in non-EventTime program

2015-12-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052990#comment-15052990
 ] 

ASF GitHub Bot commented on FLINK-2936:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1448


> ClassCastException when using EventTimeSourceFunction in non-EventTime program
> --
>
> Key: FLINK-2936
> URL: https://issues.apache.org/jira/browse/FLINK-2936
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 0.10.0
>Reporter: Fabian Hueske
>Assignee: Aljoscha Krettek
> Fix For: 1.0.0
>
>
> Using an {{EventTimeSourceFunction}} in a DataStream program that does not 
> operate with {{TimeCharacteristic.EventTime}} leads to a 
> {{ClassCastException}} when the first {{Watermark}} is emitted:
> {code}
> Caused by: java.lang.ClassCastException: 
> org.apache.flink.streaming.api.watermark.Watermark cannot be cast to 
> org.apache.flink.streaming.runtime.streamrecord.StreamRecord
> {code}
> This exception is not very helpful for users who simply forgot to set the 
> correct {{TimeCharacteristic}} and should be improved.





[jira] [Created] (FLINK-3160) Aggregate operator statistics by TaskManager

2015-12-11 Thread Greg Hogan (JIRA)
Greg Hogan created FLINK-3160:
-

 Summary: Aggregate operator statistics by TaskManager
 Key: FLINK-3160
 URL: https://issues.apache.org/jira/browse/FLINK-3160
 Project: Flink
  Issue Type: Improvement
  Components: Webfrontend
Affects Versions: 1.0.0
Reporter: Greg Hogan


The web client job info page presents a table of the following per-task 
statistics: start time, end time, duration, bytes received, records received, 
bytes sent, records sent, attempt, host, status.

Flink supports clusters with thousands of slots, and a job with a high 
parallelism makes this job info page unwieldy and difficult to analyze in 
real time.

It would be helpful to optionally or automatically aggregate statistics by 
TaskManager. These rows could then be expanded to reveal the current per-task 
statistics.

Start time, end time, duration, and attempt are not applicable to a TaskManager 
since new tasks for repeated attempts may be started. Bytes received, records 
received, bytes sent, and records sent are summed. Any throughput metrics can 
be averaged over the total task time or time window. Status could reference the 
number of running tasks on the TaskManager or an idle state.
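The aggregation rules above can be sketched in Python as follows. This is an illustration under assumptions, not dashboard code: the dictionary keys, the `"RUNNING"` status value, and the use of `host` to identify a TaskManager are all hypothetical choices made for the example.

```python
from collections import defaultdict

# Counters that are summed per TaskManager, as proposed in the issue.
SUMMED_FIELDS = ("bytes_received", "records_received", "bytes_sent", "records_sent")


def _empty_row():
    row = dict.fromkeys(SUMMED_FIELDS, 0)
    row["running_tasks"] = 0
    return row


def aggregate_by_taskmanager(tasks):
    """Collapse per-task statistics into one row per host."""
    rows = defaultdict(_empty_row)
    for task in tasks:
        row = rows[task["host"]]
        for field in SUMMED_FIELDS:
            row[field] += task[field]
        if task["status"] == "RUNNING":
            row["running_tasks"] += 1
    # Status reports the number of running tasks, or an idle state.
    for row in rows.values():
        running = row["running_tasks"]
        row["status"] = f"{running} running" if running else "idle"
    return dict(rows)
```

Start time, end time, duration, and attempt are deliberately left out of the aggregated row, since the issue notes they do not apply to a TaskManager.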





[jira] [Resolved] (FLINK-3121) Watermark forwarding does not work for sources not producing any data

2015-12-11 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek resolved FLINK-3121.
-
   Resolution: Fixed
Fix Version/s: 1.0.0

Fixed in 
https://github.com/apache/flink/commit/6bd5714d2a045e581b1a761830d010598f803de7

> Watermark forwarding does not work for sources not producing any data
> -
>
> Key: FLINK-3121
> URL: https://issues.apache.org/jira/browse/FLINK-3121
> Project: Flink
>  Issue Type: Bug
>Reporter: Robert Metzger
>Assignee: Aljoscha Krettek
>Priority: Blocker
> Fix For: 1.0.0
>
>
> This mailing list discussion explains the issue in detail: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Custom-TimestampExtractor-and-FlinkKafkaConsumer082-td3488.html
> As a workaround for now, the Kafka source can emit a final watermark for 
> sources not producing any data.





[jira] [Resolved] (FLINK-2936) ClassCastException when using EventTimeSourceFunction in non-EventTime program

2015-12-11 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek resolved FLINK-2936.
-
Resolution: Fixed

Fixed in 
https://github.com/apache/flink/commit/4b648870b4673c5a9c4d80f185e7de679967098e

> ClassCastException when using EventTimeSourceFunction in non-EventTime program
> --
>
> Key: FLINK-2936
> URL: https://issues.apache.org/jira/browse/FLINK-2936
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 0.10.0
>Reporter: Fabian Hueske
>Assignee: Aljoscha Krettek
> Fix For: 1.0.0
>
>
> Using an {{EventTimeSourceFunction}} in a DataStream program that does not 
> operate with {{TimeCharacteristic.EventTime}} leads to a 
> {{ClassCastException}} when the first {{Watermark}} is emitted:
> {code}
> Caused by: java.lang.ClassCastException: 
> org.apache.flink.streaming.api.watermark.Watermark cannot be cast to 
> org.apache.flink.streaming.runtime.streamrecord.StreamRecord
> {code}
> This exception is not very helpful for users who simply forgot to set the 
> correct {{TimeCharacteristic}} and should be improved.





[jira] [Created] (FLINK-3161) Externalize cluster start-up and tear-down when available

2015-12-11 Thread Greg Hogan (JIRA)
Greg Hogan created FLINK-3161:
-

 Summary: Externalize cluster start-up and tear-down when available
 Key: FLINK-3161
 URL: https://issues.apache.org/jira/browse/FLINK-3161
 Project: Flink
  Issue Type: Improvement
  Components: Start-Stop Scripts
Affects Versions: 1.0.0
Reporter: Greg Hogan
Assignee: Greg Hogan
Priority: Minor


I have been using pdsh, pdcp, and rpdcp to both distribute compiled Flink and 
to start and stop the TaskManagers. The current shell script initializes 
TaskManagers one-at-a-time. This is trivial to background but would be 
unthrottled.

From pdsh's archived homepage: "uses a sliding window of threads to execute 
remote commands, conserving socket resources while allowing some connections 
to timeout if needed".

What other tools could be supported when available?
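The throttling idea quoted from pdsh, a bounded window of concurrent remote operations, can be sketched in Python with a fixed-size thread pool. This is an illustration only, not pdsh's implementation or a Flink script: `run_throttled` and `action` are hypothetical names, and `action` stands in for whatever starts or stops a TaskManager on a host.

```python
from concurrent.futures import ThreadPoolExecutor


def run_throttled(hosts, action, window=4):
    """Apply action(host) to every host, at most `window` at a time.

    Returns a dict mapping each host to the result of its action.
    """
    hosts = list(hosts)
    with ThreadPoolExecutor(max_workers=window) as pool:
        # pool.map preserves input order, so results line up with hosts.
        return dict(zip(hosts, pool.map(action, hosts)))
```

Compared with backgrounding every start command in the shell script, the pool size caps how many connections are open at once.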





[GitHub] flink pull request: [FLINK-3134][yarn] asynchronous YarnJobManager...

2015-12-11 Thread mxm
GitHub user mxm opened a pull request:

https://github.com/apache/flink/pull/1450

[FLINK-3134][yarn] asynchronous YarnJobManager heartbeats

- use AMRMClientAsync instead of AMRMClient
- handle allocation and startup of containers in callbacks
- remove YarnHeartbeat message

The AMRMClientAsync uses one thread to communicate with the resource
manager and an additional thread to execute the callbacks.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mxm/flink yart-heartbeat-async

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1450.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1450


commit 8ec6d492795801ffc7c76f37a3a3200bbf85f81c
Author: Maximilian Michels 
Date:   2015-12-08T13:35:04Z

[FLINK-3134][yarn] asynchronous YarnJobManager heartbeats

- use AMRMClientAsync instead of AMRMClient
- handle allocation and startup of containers in callbacks
- remove YarnHeartbeat message

The AMRMClientAsync uses one thread to communicate with the resource
manager and an additional thread to execute the callbacks.






[jira] [Commented] (FLINK-7) [GitHub] Enable Range Partitioner

2015-12-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052409#comment-15052409
 ] 

ASF GitHub Bot commented on FLINK-7:


Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1255#issuecomment-163873769
  
Thanks for the update. Looks good. Will try it once more on a cluster.

@uce, @StephanEwen, we inject a `DataExchangeMode.BATCH` for range 
partitioning (`RangePartitionRewriter`, line 168). IIRC, there are some 
implications wrt. iterations. Will that work?


> [GitHub] Enable Range Partitioner
> -
>
> Key: FLINK-7
> URL: https://issues.apache.org/jira/browse/FLINK-7
> Project: Flink
>  Issue Type: Sub-task
>  Components: Distributed Runtime
>Reporter: GitHub Import
>Assignee: Chengxiang Li
> Fix For: pre-apache
>
>
> The range partitioner is currently disabled. We need to implement the 
> following aspects:
> 1) Distribution information, if available, must be propagated back together 
> with the ordering property.
> 2) A generic bucket lookup structure (currently specific to PactRecord).
> Tests to re-enable after fixing this issue:
>  - TeraSortITCase
>  - GlobalSortingITCase
>  - GlobalSortingMixedOrderITCase
>  Imported from GitHub 
> Url: https://github.com/stratosphere/stratosphere/issues/7
> Created by: [StephanEwen|https://github.com/StephanEwen]
> Labels: core, enhancement, optimizer, 
> Milestone: Release 0.4
> Assignee: [fhueske|https://github.com/fhueske]
> Created at: Fri Apr 26 13:48:24 CEST 2013
> State: open
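As background for item 2 above, a bucket lookup for range partitioning that works for any comparable key type can be sketched in Python with a binary search over sorted boundaries. This is a generic illustration, not the structure used in the pull request: `bucket_for` and the convention that each boundary is the inclusive upper bound of its bucket are assumptions made for the example.

```python
import bisect


def bucket_for(key, boundaries):
    """Return the index of the range bucket that `key` falls into.

    `boundaries` must be sorted; boundaries[i] is treated as the
    inclusive upper bound of bucket i. Keys above the last boundary
    go to the final bucket, len(boundaries).
    """
    return bisect.bisect_left(boundaries, key)


boundaries = [10, 20]
print([bucket_for(k, boundaries) for k in (5, 10, 11, 20, 25)])
```

Because the lookup relies only on ordering comparisons, the same function handles strings or tuples as keys.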





[GitHub] flink pull request: [FLINK-7] [Runtime] Enable Range Partitioner.

2015-12-11 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1255#issuecomment-163873769
  
Thanks for the update. Looks good. Will try it once more on a cluster.

@uce, @StephanEwen, we inject a `DataExchangeMode.BATCH` for range 
partitioning (`RangePartitionRewriter`, line 168). IIRC, there are some 
implications wrt. iterations. Will that work?




[jira] [Created] (FLINK-3162) Configure number of TaskManager slots as ratio of available processors

2015-12-11 Thread Greg Hogan (JIRA)
Greg Hogan created FLINK-3162:
-

 Summary: Configure number of TaskManager slots as ratio of 
available processors
 Key: FLINK-3162
 URL: https://issues.apache.org/jira/browse/FLINK-3162
 Project: Flink
  Issue Type: Improvement
  Components: Distributed Runtime
Affects Versions: 1.0.0
Reporter: Greg Hogan
Priority: Minor


The number of TaskManager slots is currently only configurable by explicitly 
setting {{taskmanager.numberOfTaskSlots}}. Make this configurable by a ratio of 
the number of available processors (for example, "2", for hyperthreading). This 
can work in the same way as {{taskmanager.memory.size}} and 
{{taskmanager.memory.fraction}}.
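A minimal Python sketch of how such a setting could be resolved follows. It assumes, by analogy with the memory settings mentioned above, that an explicit slot count takes precedence over the ratio. `resolve_slots` is a hypothetical helper, and rounding down, the minimum of one slot, and the fallback of one slot when neither value is set are assumptions of this sketch rather than proposed behavior.

```python
import math
import os


def resolve_slots(explicit_slots=None, ratio=None, available_processors=None):
    """Pick the number of task slots from an explicit count or a ratio."""
    if explicit_slots is not None:
        # An explicit value wins over the ratio.
        return explicit_slots
    if ratio is None:
        return 1
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    if available_processors is None:
        available_processors = os.cpu_count() or 1
    # Round down, but never configure fewer than one slot.
    return max(1, math.floor(ratio * available_processors))


print(resolve_slots(ratio=2, available_processors=18))
```

With a ratio of 2 and 18 processors this yields 36 slots, matching the hyperthreading example in the issue.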





[jira] [Created] (FLINK-3164) Spread out scheduling strategy

2015-12-11 Thread Greg Hogan (JIRA)
Greg Hogan created FLINK-3164:
-

 Summary: Spread out scheduling strategy
 Key: FLINK-3164
 URL: https://issues.apache.org/jira/browse/FLINK-3164
 Project: Flink
  Issue Type: Improvement
  Components: Distributed Runtime, Java API, Scala API
Affects Versions: 1.0.0
Reporter: Greg Hogan


The size of a Flink cluster is bounded by the amount of memory allocated for 
network buffers. The all-to-all distribution of data during a network shuffle 
means that doubling the number of TaskManager slots quadruples the required 
number of network buffers.

A Flink job can be configured to execute operators with lower parallelism, 
which reduces the number of network buffers used across the cluster. However, 
since the Flink scheduler clusters tasks, the number of network buffers to be 
configured cannot be reduced.

For example, if each TaskManager has 32 slots and the cluster has 32 
TaskManagers, the maximum parallelism can be set to 1024. If the preceding 
operator has a parallelism of 32, then the TaskManager fan-out is between 1*1024 
(tasks evenly distributed) and 32*1024 (executed on a single TaskManager).
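The arithmetic in this example can be reproduced with a short Python sketch. It counts one outgoing channel per pair of upstream and downstream tasks, following the numbers in the issue; it is not Flink's actual network buffer accounting, and `fan_out_bounds` is a hypothetical helper.

```python
def fan_out_bounds(task_managers, slots_per_tm, upstream_parallelism):
    """Return (max_parallelism, spread_fan_out, clustered_fan_out).

    The fan-out is the number of outgoing channels on the busiest
    TaskManager when every upstream task sends to every downstream task.
    """
    max_parallelism = task_managers * slots_per_tm
    # Spread out: upstream tasks divided as evenly as possible (ceiling division).
    spread = -(-upstream_parallelism // task_managers)
    # Clustered: upstream tasks fill the slots of one TaskManager first.
    clustered = min(upstream_parallelism, slots_per_tm)
    return max_parallelism, spread * max_parallelism, clustered * max_parallelism


print(fan_out_bounds(32, 32, 32))  # (1024, 1024, 32768)
```

The factor of 32 between the two bounds is what a spread-out scheduling strategy would save on the busiest TaskManager in this configuration.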





[jira] [Created] (FLINK-3163) Configure Flink for NUMA systems

2015-12-11 Thread Greg Hogan (JIRA)
Greg Hogan created FLINK-3163:
-

 Summary: Configure Flink for NUMA systems
 Key: FLINK-3163
 URL: https://issues.apache.org/jira/browse/FLINK-3163
 Project: Flink
  Issue Type: Improvement
  Components: Start-Stop Scripts
Affects Versions: 1.0.0
Reporter: Greg Hogan
Assignee: Greg Hogan


On NUMA systems Flink can be pinned to a single physical processor ("node") 
using {{numactl --membind=$node --cpunodebind=$node }}. Commonly 
available NUMA systems include the largest AWS and Google Compute instances.

For example, on an AWS c4.8xlarge system with 36 hyperthreads the user could 
configure a single TaskManager with 36 slots or have Flink create two 
TaskManagers, one bound to each NUMA node, each with 18 slots.
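The split described in this example can be sketched in Python as follows. The sketch only builds the `numactl` argument lists and does not run anything; `numa_plan` is a hypothetical helper, and `launch_command` is a placeholder for whatever command starts a TaskManager, which the issue does not specify.

```python
def numa_plan(total_hyperthreads, numa_nodes, launch_command):
    """Return one (argv, slots) pair per NUMA node.

    Each argv pins both memory and CPUs to a single node, using the
    numactl options quoted in the issue.
    """
    slots_per_node, remainder = divmod(total_hyperthreads, numa_nodes)
    if remainder:
        raise ValueError("hyperthreads do not divide evenly across NUMA nodes")
    plan = []
    for node in range(numa_nodes):
        argv = ["numactl", f"--membind={node}", f"--cpunodebind={node}", *launch_command]
        plan.append((argv, slots_per_node))
    return plan


# Placeholder launch command; substitute the real TaskManager start command.
for argv, slots in numa_plan(36, 2, ["echo", "start-taskmanager"]):
    print(slots, " ".join(argv))
```

For 36 hyperthreads and two nodes this produces two commands, each paired with 18 slots.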

There may be some extra overhead in transferring network buffers between 
TaskManagers on the same system, though the fraction of data shuffled in this 
manner decreases with the size of the cluster. The performance improvement from 
only accessing local memory looks to be significant though difficult to 
benchmark.

The JobManagers may fit into NUMA nodes rather than requiring full systems.


