[GitHub] flink issue #2109: [FLINK-3677] FileInputFormat: Allow to specify include/ex...

2016-06-25 Thread mushketyk
Github user mushketyk commented on the issue:

https://github.com/apache/flink/pull/2109
  
@kl0u 
>> and pass the FilePathFilter as an argument to the constructor of the 
FileInputFormat and then do the filtering the same way you do it
I removed the configuration override and added a setter instead of a 
constructor parameter. I assumed this would be better, since we don't need to 
update the constructors of all classes that inherit from FileInputFormat. What 
do you think about that?
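
For illustration, a minimal sketch of the setter-based approach (type and 
method names are illustrative, not the PR's actual code):

{code:java}
// Hypothetical sketch of setter-based filtering; not the PR's actual code.
public interface FilePathFilter {
    boolean filterPath(String filePath); // return true to exclude the file
}

class FileInputFormatSketch {
    // Default filter accepts everything.
    private FilePathFilter filter = path -> false;

    // A setter instead of a constructor parameter, so subclasses of
    // FileInputFormat need no constructor changes.
    public void setFilesFilter(FilePathFilter filter) {
        this.filter = filter;
    }

    boolean acceptFile(String path) {
        return !filter.filterPath(path);
    }
}
{code}

A user could then exclude files by pattern, e.g. with a filter whose 
filterPath returns true for paths matching a regular expression.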


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3677) FileInputFormat: Allow to specify include/exclude file name patterns

2016-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15349840#comment-15349840
 ] 

ASF GitHub Bot commented on FLINK-3677:
---

Github user mushketyk commented on the issue:

https://github.com/apache/flink/pull/2109
  
@kl0u 
>> and pass the FilePathFilter as an argument to the constructor of the 
FileInputFormat and then do the filtering the same way you do it
I removed the configuration override and added a setter instead of a 
constructor parameter. I assumed this would be better, since we don't need to 
update the constructors of all classes that inherit from FileInputFormat. What 
do you think about that?


> FileInputFormat: Allow to specify include/exclude file name patterns
> 
>
> Key: FLINK-3677
> URL: https://issues.apache.org/jira/browse/FLINK-3677
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.0.0
>Reporter: Maximilian Michels
>Assignee: Ivan Mushketyk
>Priority: Minor
>  Labels: starter
>
> It would be nice to be able to specify a regular expression to filter files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4118) The docker-flink image is outdated (1.0.2) and can be slimmed down

2016-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15349826#comment-15349826
 ] 

ASF GitHub Bot commented on FLINK-4118:
---

Github user iemejia commented on the issue:

https://github.com/apache/flink/pull/2164
  
The docker images script was simplified and the image size was reduced.

Previous image:
flink   latest   6475add651c7   24 minutes ago   711.6 MB

Image after FLINK-4118:
flink   latest   555e60f24c10   20 seconds ago   252.5 MB




> The docker-flink image is outdated (1.0.2) and can be slimmed down
> --
>
> Key: FLINK-4118
> URL: https://issues.apache.org/jira/browse/FLINK-4118
> Project: Flink
>  Issue Type: Improvement
>Reporter: Ismaël Mejía
>Priority: Minor
>
> This issue is to upgrade the docker image and polish some details in it (e.g. 
> it can be slimmed down if we remove some unneeded dependencies, and the code 
> can be polished).





[GitHub] flink issue #2164: [FLINK-4118] The docker-flink image is outdated (1.0.2) a...

2016-06-25 Thread iemejia
Github user iemejia commented on the issue:

https://github.com/apache/flink/pull/2164
  
The docker images script was simplified and the image size was reduced.

Previous image:
flink   latest   6475add651c7   24 minutes ago   711.6 MB

Image after FLINK-4118:
flink   latest   555e60f24c10   20 seconds ago   252.5 MB






[GitHub] flink pull request #2164: [FLINK-4118] The docker-flink image is outdated (1...

2016-06-25 Thread iemejia
GitHub user iemejia opened a pull request:

https://github.com/apache/flink/pull/2164

[FLINK-4118] The docker-flink image is outdated (1.0.2) and can be slimmed 
down

Thanks for contributing to Apache Flink. Before you open your pull request, 
please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your 
pull request. For more information and/or questions please refer to the [How To 
Contribute guide](http://flink.apache.org/how-to-contribute.html).
In addition to going through the list, please provide a meaningful 
description of your changes.

- [x] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira 
title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the 
JIRA id)

- [x] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [x] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/iemejia/flink FLINK-4118

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2164.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2164


commit 15bf574bfec2b12ee50f4a21f1b1a43421f632e2
Author: Ismaël Mejía 
Date:   2016-06-23T16:39:50Z

[FLINK-4118] Update docker image to 1.0.3 and remove unneeded deps

Some of the changes include:

- Remove unneeded dependencies (nano, wget)
- Remove apt lists to reduce image size
- Reduce number of layers on the docker image (best docker practice)
- Remove useless variables and base the code on generic ones, e.g.
FLINK_HOME
- Change the default JDK from Oracle to openjdk-8-jre-headless, for
two reasons:

1. You cannot legally repackage the Oracle JDK in docker images
2. The OpenJDK headless JRE is more appropriate for a server image (no GUI
stuff)

- Restore the port assignment to the standard Flink one:

Variable: docker-flink -> flink

taskmanager.rpc.port: 6121 -> 6122
taskmanager.data.port: 6122 -> 6121
jobmanager.web.port: 8080 -> 8081

commit 523e8b0ece28537635792cc9514d4e9bbaa2c53c
Author: Ismaël Mejía 
Date:   2016-06-24T14:52:22Z

[FLINK-4118] Base the image on the official java alpine and remove ssh






[jira] [Commented] (FLINK-4118) The docker-flink image is outdated (1.0.2) and can be slimmed down

2016-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15349825#comment-15349825
 ] 

ASF GitHub Bot commented on FLINK-4118:
---

GitHub user iemejia opened a pull request:

https://github.com/apache/flink/pull/2164

[FLINK-4118] The docker-flink image is outdated (1.0.2) and can be slimmed 
down

Thanks for contributing to Apache Flink. Before you open your pull request, 
please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your 
pull request. For more information and/or questions please refer to the [How To 
Contribute guide](http://flink.apache.org/how-to-contribute.html).
In addition to going through the list, please provide a meaningful 
description of your changes.

- [x] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira 
title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the 
JIRA id)

- [x] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [x] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/iemejia/flink FLINK-4118

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2164.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2164


commit 15bf574bfec2b12ee50f4a21f1b1a43421f632e2
Author: Ismaël Mejía 
Date:   2016-06-23T16:39:50Z

[FLINK-4118] Update docker image to 1.0.3 and remove unneeded deps

Some of the changes include:

- Remove unneeded dependencies (nano, wget)
- Remove apt lists to reduce image size
- Reduce number of layers on the docker image (best docker practice)
- Remove useless variables and base the code on generic ones, e.g.
FLINK_HOME
- Change the default JDK from Oracle to openjdk-8-jre-headless, for
two reasons:

1. You cannot legally repackage the Oracle JDK in docker images
2. The OpenJDK headless JRE is more appropriate for a server image (no GUI
stuff)

- Restore the port assignment to the standard Flink one:

Variable: docker-flink -> flink

taskmanager.rpc.port: 6121 -> 6122
taskmanager.data.port: 6122 -> 6121
jobmanager.web.port: 8080 -> 8081

commit 523e8b0ece28537635792cc9514d4e9bbaa2c53c
Author: Ismaël Mejía 
Date:   2016-06-24T14:52:22Z

[FLINK-4118] Base the image on the official java alpine and remove ssh




> The docker-flink image is outdated (1.0.2) and can be slimmed down
> --
>
> Key: FLINK-4118
> URL: https://issues.apache.org/jira/browse/FLINK-4118
> Project: Flink
>  Issue Type: Improvement
>Reporter: Ismaël Mejía
>Priority: Minor
>
> This issue is to upgrade the docker image and polish some details in it (e.g. 
> it can be slimmed down if we remove some unneeded dependencies, and the code 
> can be polished).





[jira] [Commented] (FLINK-1707) Add an Affinity Propagation Library Method

2016-06-25 Thread JIRA

[ 
https://issues.apache.org/jira/browse/FLINK-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15349699#comment-15349699
 ] 

Josep Rubió commented on FLINK-1707:


Hi Vasia,

Maybe it does not make sense to continue with this implementation. Even though 
it is a "graph" algorithm, it does not seem to fit well on distributed graph 
platforms. I know there are some implementations of the original AP that should 
work well (I guess you know them); maybe that is what you need for Gelly.

I also think performance should be tested, but I don't have access to a real 
cluster. I've done some tests for Hadoop before with a cluster set up on my 
laptop, but 4 nodes with 3 GB of memory each is the maximum I can reach. Not 
very useful :(

By the way, before doing anything I'll document an example with some iterations 
and ask some concrete questions about the implementation.

Thanks Vasia!

> Add an Affinity Propagation Library Method
> --
>
> Key: FLINK-1707
> URL: https://issues.apache.org/jira/browse/FLINK-1707
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Reporter: Vasia Kalavri
>Assignee: Josep Rubió
>Priority: Minor
>  Labels: requires-design-doc
> Attachments: Binary_Affinity_Propagation_in_Flink_design_doc.pdf
>
>
> This issue proposes adding an implementation of the Affinity Propagation 
> algorithm as a Gelly library method and a corresponding example.
> The algorithm is described in paper [1] and a description of a vertex-centric 
> implementation can be found is [2].
> [1]: http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf
> [2]: http://event.cwi.nl/grades2014/00-ching-slides.pdf
> Design doc:
> https://docs.google.com/document/d/1QULalzPqMVICi8jRVs3S0n39pell2ZVc7RNemz_SGA4/edit?usp=sharing





[jira] [Resolved] (FLINK-4041) Failure while asking ResourceManager for RegisterResource

2016-06-25 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels resolved FLINK-4041.
---
   Resolution: Fixed
Fix Version/s: 1.1.0

Resolved with a5aa4e115814539518fcaa88cbbca47da9d7ede5

> Failure while asking ResourceManager for RegisterResource
> -
>
> Key: FLINK-4041
> URL: https://issues.apache.org/jira/browse/FLINK-4041
> Project: Flink
>  Issue Type: Bug
>  Components: ResourceManager
>Affects Versions: 1.1.0
>Reporter: Robert Metzger
>Assignee: Maximilian Michels
>  Labels: test-stability
> Fix For: 1.1.0
>
>
> In this build 
> (https://s3.amazonaws.com/archive.travis-ci.org/jobs/136372462/log.txt), I 
> got the following YARN Test failure:
> {code}
> 2016-06-09 10:21:42,336 INFO  
> akka.remote.RemoteActorRefProvider$RemotingTerminator - Shutting down 
> remote daemon.
> 2016-06-09 10:21:42,336 INFO  
> akka.remote.RemoteActorRefProvider$RemotingTerminator - Remote daemon 
> shut down; proceeding with flushing remote transports.
> 2016-06-09 10:21:42,355 INFO  
> akka.remote.RemoteActorRefProvider$RemotingTerminator - Remoting shut 
> down.
> 2016-06-09 10:21:42,376 ERROR org.apache.flink.yarn.YarnJobManager
>   - Failure while asking ResourceManager for RegisterResource
> akka.pattern.AskTimeoutException: Ask timed out on 
> [Actor[akka://flink/user/$c#1255104255]] after [1 ms]
>   at 
> akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333)
>   at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)
>   at 
> akka.actor.LightArrayRevolverScheduler$TaskHolder.run(Scheduler.scala:476)
>   at 
> akka.actor.LightArrayRevolverScheduler$$anonfun$close$1.apply(Scheduler.scala:282)
>   at 
> akka.actor.LightArrayRevolverScheduler$$anonfun$close$1.apply(Scheduler.scala:281)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:742)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at akka.actor.LightArrayRevolverScheduler.close(Scheduler.scala:280)
>   at akka.actor.ActorSystemImpl.stopScheduler(ActorSystem.scala:688)
>   at 
> akka.actor.ActorSystemImpl$$anonfun$liftedTree2$1$1.apply$mcV$sp(ActorSystem.scala:617)
>   at 
> akka.actor.ActorSystemImpl$$anonfun$liftedTree2$1$1.apply(ActorSystem.scala:617)
>   at 
> akka.actor.ActorSystemImpl$$anonfun$liftedTree2$1$1.apply(ActorSystem.scala:617)
>   at akka.actor.ActorSystemImpl$$anon$3.run(ActorSystem.scala:641)
>   at 
> akka.actor.ActorSystemImpl$TerminationCallbacks$$anonfun$run$1.runNext$1(ActorSystem.scala:808)
>   at 
> akka.actor.ActorSystemImpl$TerminationCallbacks$$anonfun$run$1.apply$mcV$sp(ActorSystem.scala:811)
>   at 
> akka.actor.ActorSystemImpl$TerminationCallbacks$$anonfun$run$1.apply(ActorSystem.scala:804)
>   at 
> akka.actor.ActorSystemImpl$TerminationCallbacks$$anonfun$run$1.apply(ActorSystem.scala:804)
>   at akka.util.ReentrantGuard.withGuard(LockUtil.scala:15)
>   at 
> akka.actor.ActorSystemImpl$TerminationCallbacks.run(ActorSystem.scala:804)
>   at 
> akka.actor.ActorSystemImpl$$anonfun$terminationCallbacks$1.apply(ActorSystem.scala:638)
>   at 
> akka.actor.ActorSystemImpl$$anonfun$terminationCallbacks$1.apply(ActorSystem.scala:638)
>   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
>   at 
> scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
>   at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 2016-06-09 10:21:42,376 INFO  org.apache.flink.yarn.YarnJobManager
>   - Shutdown 

[jira] [Commented] (FLINK-1946) Make yarn tests logging less verbose

2016-06-25 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15349695#comment-15349695
 ] 

Maximilian Michels commented on FLINK-1946:
---

Fixed with 5b2ad7f03ef67ce529551e7b464d7db94e2a1d90. Also, the log level for 
Hadoop classes has been adjusted previously. I think we want to keep the log 
level at INFO for debugging purposes.

Anything else we can do?

> Make yarn tests logging less verbose
> 
>
> Key: FLINK-1946
> URL: https://issues.apache.org/jira/browse/FLINK-1946
> Project: Flink
>  Issue Type: Improvement
>  Components: YARN Client
>Reporter: Till Rohrmann
>Priority: Minor
>
> Currently, the yarn tests log on the INFO level making the test outputs 
> confusing. Furthermore some status messages are written to stdout. I think 
> these messages are not necessary to be shown to the user.





[jira] [Resolved] (FLINK-3838) CLI parameter parser is munging application params

2016-06-25 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels resolved FLINK-3838.
---
Resolution: Fixed

master: dfecca7794f56eb2a4438e3bdca6878379929cfa
release-1.0: f5a40855fbd90a375cb920358baaf176876a42f0

> CLI parameter parser is munging application params
> --
>
> Key: FLINK-3838
> URL: https://issues.apache.org/jira/browse/FLINK-3838
> Project: Flink
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.0.2, 1.0.3
>Reporter: Ken Krugler
>Assignee: Maximilian Michels
>Priority: Minor
> Fix For: 1.1.0, 1.0.4
>
>
> If parameters for an application use a single '-' (e.g. -maxtasks) then the 
> CLI argument parser will munge these, and the app gets passed either just the 
> parameter name (e.g. 'maxtask') if the start of the parameter doesn't match a 
> Flink parameter, or you get two values, with the first value being the part 
> that matched (e.g. '-m') and the second value being the rest (e.g. 'axtasks').
> The parser should ignore everything after the jar path parameter.
> Note that using -- does seem to work.
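
A minimal sketch of the "ignore everything after the jar path" idea 
(illustrative only, not the actual CLI frontend code; the real parser knows 
each option's arity, while this sketch simply locates the jar by its ".jar" 
suffix):

{code:java}
import java.util.Arrays;

public class ArgsSplitSketch {
    public static void main(String[] args) {
        String[] cli = {"-p", "4", "job.jar", "-maxtasks", "5"};
        // Everything before the jar path belongs to Flink, everything after
        // belongs to the user program and must be passed through unmodified.
        int jarIndex = 0;
        while (jarIndex < cli.length && !cli[jarIndex].endsWith(".jar")) {
            jarIndex++;
        }
        String[] flinkOptions = Arrays.copyOfRange(cli, 0, jarIndex);
        String[] programArgs = Arrays.copyOfRange(cli, jarIndex + 1, cli.length);
        // prints: [-p, 4] | [-maxtasks, 5]
        System.out.println(Arrays.toString(flinkOptions) + " | " + Arrays.toString(programArgs));
    }
}
{code}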





[jira] [Resolved] (FLINK-3757) addAccumulator does not throw Exception on duplicate accumulator name

2016-06-25 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels resolved FLINK-3757.
---
   Resolution: Fixed
 Assignee: Maximilian Michels
Fix Version/s: 1.1.0

Fixed via 9c15406400af16861be8cf08e60d94074addbba0

> addAccumulator does not throw Exception on duplicate accumulator name
> -
>
> Key: FLINK-3757
> URL: https://issues.apache.org/jira/browse/FLINK-3757
> Project: Flink
>  Issue Type: Bug
>Reporter: Konstantin Knauf
>Assignee: Maximilian Michels
>Priority: Minor
> Fix For: 1.1.0
>
>
> Works if tasks are chained, but in many situations it does not; see
> https://gist.github.com/knaufk/a0026eff9792aa905a2f586f3b0cb112
> Is this an undocumented feature or a valid bug?
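
For reference, a minimal sketch of the duplicate-name check under discussion 
(hypothetical registry, not the actual RuntimeContext implementation):

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: registering the same accumulator name twice fails fast.
public class AccumulatorRegistrySketch {
    private final Map<String, Object> accumulators = new HashMap<>();

    public void addAccumulator(String name, Object accumulator) {
        if (accumulators.containsKey(name)) {
            throw new UnsupportedOperationException(
                    "The accumulator '" + name + "' already exists.");
        }
        accumulators.put(name, accumulator);
    }
}
{code}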





[jira] [Commented] (FLINK-3757) addAccumulator does not throw Exception on duplicate accumulator name

2016-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15349690#comment-15349690
 ] 

ASF GitHub Bot commented on FLINK-3757:
---

Github user mxm commented on the issue:

https://github.com/apache/flink/pull/2138
  
Thank you for the suggestion. I incorporated it!


> addAccumulator does not throw Exception on duplicate accumulator name
> -
>
> Key: FLINK-3757
> URL: https://issues.apache.org/jira/browse/FLINK-3757
> Project: Flink
>  Issue Type: Bug
>Reporter: Konstantin Knauf
>Priority: Minor
> Fix For: 1.1.0
>
>
> Works if tasks are chained, but in many situations it does not; see
> https://gist.github.com/knaufk/a0026eff9792aa905a2f586f3b0cb112
> Is this an undocumented feature or a valid bug?





[GitHub] flink issue #2138: [FLINK-3757] clarify JavaDoc for addAccumulator method

2016-06-25 Thread mxm
Github user mxm commented on the issue:

https://github.com/apache/flink/pull/2138
  
Thank you for the suggestion. I incorporated it!




[jira] [Closed] (FLINK-3864) Yarn tests don't check for prohibited strings in log output

2016-06-25 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels closed FLINK-3864.
-
Resolution: Fixed

Fixed with d92aeb7aa63490278d9a9e36aae595e6cbe39f32

> Yarn tests don't check for prohibited strings in log output
> ---
>
> Key: FLINK-3864
> URL: https://issues.apache.org/jira/browse/FLINK-3864
> Project: Flink
>  Issue Type: Bug
>  Components: Tests, YARN Client
>Affects Versions: 1.1.0, 1.0.2
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
> Fix For: 1.1.0
>
>
> {{YarnTestBase.runWithArgs(...)}} provides a parameter for strings which must 
> not appear in the log output. {{perJobYarnCluster}} and 
> {{perJobYarnClusterWithParallelism}} have "System.out)" prepended to the 
> prohibited strings; probably an artifact of older test code.
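
A minimal sketch of such a prohibited-strings check (method and parameter 
names are illustrative, not YarnTestBase's actual signature). With 
"System.out)" accidentally prepended, entries would effectively never match 
real log lines, so the check silently passes:

{code:java}
// Illustrative sketch of a prohibited-strings log check.
public class LogCheckSketch {
    static void checkForProhibitedStrings(String logOutput, String[] prohibited) {
        for (String s : prohibited) {
            if (logOutput.contains(s)) {
                throw new AssertionError("Prohibited string found in log output: " + s);
            }
        }
    }
}
{code}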





[GitHub] flink pull request #2163: [FLINK-4119] Null checks in close() for Cassandra ...

2016-06-25 Thread alkagin
GitHub user alkagin opened a pull request:

https://github.com/apache/flink/pull/2163

[FLINK-4119] Null checks in close() for Cassandra Input/Output Formats, 
checking arguments via Flink Preconditions

Thanks for contributing to Apache Flink. Before you open your pull request, 
please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your 
pull request. For more information and/or questions please refer to the [How To 
Contribute guide](http://flink.apache.org/how-to-contribute.html).
In addition to going through the list, please provide a meaningful 
description of your changes.

- [ ] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira 
title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the 
JIRA id)

- [ ] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [ ] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/alkagin/flink FLINK-4119

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2163.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2163


commit b1752f82f61f074084a34e6b3a7a16940edf66f1
Author: Andrea Sella 
Date:   2016-06-25T16:03:20Z

[FLINK-4119] Add null checks, refactor check arg using Flink Preconditions






[jira] [Commented] (FLINK-4119) Null checks in close() for Cassandra Input/Output Formats, checking arguments via Flink Preconditions

2016-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15349655#comment-15349655
 ] 

ASF GitHub Bot commented on FLINK-4119:
---

GitHub user alkagin opened a pull request:

https://github.com/apache/flink/pull/2163

[FLINK-4119] Null checks in close() for Cassandra Input/Output Formats, 
checking arguments via Flink Preconditions

Thanks for contributing to Apache Flink. Before you open your pull request, 
please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your 
pull request. For more information and/or questions please refer to the [How To 
Contribute guide](http://flink.apache.org/how-to-contribute.html).
In addition to going through the list, please provide a meaningful 
description of your changes.

- [ ] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira 
title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the 
JIRA id)

- [ ] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [ ] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/alkagin/flink FLINK-4119

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2163.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2163


commit b1752f82f61f074084a34e6b3a7a16940edf66f1
Author: Andrea Sella 
Date:   2016-06-25T16:03:20Z

[FLINK-4119] Add null checks, refactor check arg using Flink Preconditions




> Null checks in close() for Cassandra Input/Output Formats, checking arguments 
> via Flink Preconditions
> -
>
> Key: FLINK-4119
> URL: https://issues.apache.org/jira/browse/FLINK-4119
> Project: Flink
>  Issue Type: Improvement
>  Components: Batch Connectors and Input/Output Formats, Cassandra 
> Connector
>Reporter: Andrea Sella
>Assignee: Andrea Sella
>Priority: Minor
> Fix For: 1.1.0
>
>
> Add null checks for session and cluster to align the behaviour with the 
> Cassandra Streaming Connector; refactor the argument checks to use Flink 
> Preconditions.
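
A minimal sketch of the described pattern (class and field names are 
hypothetical; the real formats hold Cassandra Session and Cluster instances):

{code:java}
import static org.apache.flink.util.Preconditions.checkArgument;

// Hypothetical sketch: validate arguments up front, guard close() with
// null checks since it may run before open() ever created these objects.
public class CassandraFormatSketch {
    private final String query;
    private transient AutoCloseable session; // stands in for com.datastax Session
    private transient AutoCloseable cluster; // stands in for com.datastax Cluster

    public CassandraFormatSketch(String query) {
        checkArgument(query != null && !query.isEmpty(), "Query cannot be null or empty");
        this.query = query;
    }

    public void close() throws Exception {
        if (session != null) {
            session.close();
        }
        if (cluster != null) {
            cluster.close();
        }
    }
}
{code}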





[jira] [Commented] (FLINK-3034) Redis Sink Connector

2016-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15349640#comment-15349640
 ] 

ASF GitHub Bot commented on FLINK-3034:
---

Github user tzulitai commented on the issue:

https://github.com/apache/flink/pull/1813
  
Hi @mjsax,

Regarding 1): it just came to me that for SORTED_SET, a fixed secondary key 
doesn't make sense, because the secondary key is supposed to define the order 
of the inserted element in the set. For HASH, I guess it depends on the use 
case. For example, a stream of user events with a user id can update Redis using 
the user id as the primary key and the inner hash field (secondary key) depending 
on what event occurred.

I agree with your suggestion to settle on a subset of commands to start 
with and, if we want to, make it more flexible with multiple commands per data 
type in new JIRAs, as that would probably need larger refactoring.
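
To make the HASH example concrete, a hypothetical mapper that derives the 
secondary key (the hash field) from each record; interface and method names 
are illustrative, not the connector's actual API:

{code:java}
// Hypothetical mapper sketch for the HASH example above; not the PR's API.
public interface RedisMapperSketch<T> {
    String getKeyFromData(T data);          // primary key, e.g. the user id
    String getSecondaryKeyFromData(T data); // hash field, e.g. the event type
    String getValueFromData(T data);        // value stored under that field
}

class UserEventMapper implements RedisMapperSketch<String[]> {
    // assumed record layout: { userId, eventType, payload }
    public String getKeyFromData(String[] event)          { return event[0]; }
    public String getSecondaryKeyFromData(String[] event) { return event[1]; }
    public String getValueFromData(String[] event)        { return event[2]; }
}
{code}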


> Redis Sink Connector
> 
>
> Key: FLINK-3034
> URL: https://issues.apache.org/jira/browse/FLINK-3034
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming Connectors
>Reporter: Matthias J. Sax
>Assignee: Subhankar Biswas
>Priority: Minor
>
> Flink does not provide a sink connector for Redis.
> See FLINK-3033





[GitHub] flink issue #1813: [FLINK-3034] Redis Sink Connector

2016-06-25 Thread tzulitai
Github user tzulitai commented on the issue:

https://github.com/apache/flink/pull/1813
  
Hi @mjsax,

Regarding 1): it just came to me that for SORTED_SET, a fixed secondary key 
doesn't make sense, because the secondary key is supposed to define the order 
of the inserted element in the set. For HASH, I guess it depends on the use 
case. For example, a stream of user events with a user id can update Redis using 
the user id as the primary key and the inner hash field (secondary key) depending 
on what event occurred.

I agree with your suggestion to settle on a subset of commands to start 
with and, if we want to, make it more flexible with multiple commands per data 
type in new JIRAs, as that would probably need larger refactoring.




[jira] [Commented] (FLINK-1815) Add methods to read and write a Graph as adjacency list

2016-06-25 Thread Faye Beligianni (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15349638#comment-15349638
 ] 

Faye Beligianni commented on FLINK-1815:


I think I found what was wrong! Ignore the above question.

> Add methods to read and write a Graph as adjacency list
> ---
>
> Key: FLINK-1815
> URL: https://issues.apache.org/jira/browse/FLINK-1815
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 0.9
>Reporter: Vasia Kalavri
>Assignee: Faye Beligianni
>Priority: Minor
>
> It would be nice to add utility methods to read a graph from an Adjacency 
> list format and also write a graph in such a format.
> The simple case would be to read a graph with no vertex or edge values, where 
> we would need to define (a) a line delimiter, (b) a delimiter to separate 
> vertices from neighbor list and (c) and a delimiter to separate the neighbors.
> For example, "1 2,3,4\n2 1,3" would give vertex 1 with neighbors 2, 3 and 4 
> and vertex 2 with neighbors 1 and 3.
> If we have vertex values and/or edge values, we also need to have a way to 
> separate IDs from values. For example, we could have "1 0.1 2 0.5, 3 0.2" to 
> define a vertex 1 with value 0.1, edge (1, 2) with weight 0.5 and edge (1, 3) 
> with weight 0.2.
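
A minimal sketch of parsing the simple format above, using the three 
delimiters (a), (b), and (c) (illustrative only, not a proposed Gelly API):

{code:java}
import java.util.Arrays;

public class AdjacencyListParseSketch {
    public static void main(String[] args) {
        String input = "1 2,3,4\n2 1,3";
        for (String line : input.split("\n")) {      // (a) line delimiter
            String[] parts = line.split(" ");        // (b) vertex vs. neighbor list
            String vertex = parts[0];
            String[] neighbors = parts[1].split(","); // (c) neighbor delimiter
            // prints "1 -> [2, 3, 4]" and "2 -> [1, 3]"
            System.out.println(vertex + " -> " + Arrays.toString(neighbors));
        }
    }
}
{code}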





[jira] [Commented] (FLINK-3034) Redis Sink Connector

2016-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15349615#comment-15349615
 ] 

ASF GitHub Bot commented on FLINK-3034:
---

Github user mjsax commented on the issue:

https://github.com/apache/flink/pull/1813
  
@tzulitai Thanks a lot for testing this! Your feedback is really great!

1) I am not a Redis user either -- from my understanding, the secondary key 
determines the field that gets updated -- it makes sense to me that this 
field is the same for all updates (i.e., independent of the data itself). Thus, 
it would not make sense to extract it from the record via `RedisMapper`.
2) Might be confusing for Redis users (I did not stumble over it as a 
non-user ;)) -- renaming seems reasonable -- maybe `RedisSinkConfig`?
3+4) Well, the data type is the same, no matter if you want to read or 
write it. But I agree that using the data type itself in `RedisSink` might not 
be sufficient. I think the idea behind supporting `LPUSH` was that Flink, as a 
streaming system, might do best to append at the tail, not at the head... 
5) Similar to my answer to (4) -- the data type is the same -- 
independent of what operation you perform on it. And you state yourself that 
`RedisMapper` maps the type to the command/action.
6) See my answer to (1)

As a non-Redis user, it is hard for me to judge what flexibility is 
required/expected by users (i.e., secondary key per record or not?). It also 
seems that not all available actions (LPUSH vs. RPUSH) are implemented. How 
important is it to support all actions -- do you think the current ones 
cover the most basic subset to get started with? It might be good to just 
start with a subset of commands and add new commands later on (just a 
suggestion). And do you think the implemented actions for each type are the 
most useful? If we want to support multiple different actions per data type, it 
would be a larger refactoring of this code, I think.

@subhankarb @rmetzger What is your opinion on this?
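
For reference, the two Jedis list commands in question (a minimal standalone 
sketch; "localhost" is a placeholder for a running Redis instance):

{code:java}
import redis.clients.jedis.Jedis;

public class JedisListSketch {
    public static void main(String[] args) throws Exception {
        try (Jedis jedis = new Jedis("localhost")) {
            jedis.lpush("mylist", "a"); // LPUSH prepends at the head: [a]
            jedis.rpush("mylist", "b"); // RPUSH appends at the tail:  [a, b]
        }
    }
}
{code}

For a stream sink that should preserve arrival order, appending at the tail 
(RPUSH) is the natural fit.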


> Redis Sink Connector
> 
>
> Key: FLINK-3034
> URL: https://issues.apache.org/jira/browse/FLINK-3034
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming Connectors
>Reporter: Matthias J. Sax
>Assignee: Subhankar Biswas
>Priority: Minor
>
> Flink does not provide a sink connector for Redis.
> See FLINK-3033





[GitHub] flink issue #1813: [FLINK-3034] Redis Sink Connector

2016-06-25 Thread mjsax
Github user mjsax commented on the issue:

https://github.com/apache/flink/pull/1813
  
@tzulitai Thanks a lot for testing this! Your feedback is really great!

1) I am not a Redis user either -- from my understanding, the secondary key 
determines the field that gets updated -- it makes sense to me that this 
field is the same for all updates (i.e., independent of the data itself). Thus, 
it would not make sense to extract it from the record via `RedisMapper`.
2) Might be confusing for Redis users (I did not stumble over it as a 
non-user ;)) -- renaming seems reasonable -- maybe `RedisSinkConfig`?
3+4) Well, the data type is the same, no matter if you want to read or 
write it. But I agree that using the data type itself in `RedisSink` might not 
be sufficient. I think the idea behind supporting `LPUSH` was that Flink, as a 
streaming system, might do best to append at the tail, not at the head... 
5) Similar to my answer to (4) -- the data type is the same -- 
independent of what operation you perform on it. And you state yourself that 
`RedisMapper` maps the type to the command/action.
6) See my answer to (1)

As a non-Redis user, it is hard for me to judge what flexibility is 
required/expected by users (i.e., secondary key per record or not?). It also 
seems that not all available actions (LPUSH vs. RPUSH) are implemented. How 
important is it to support all actions -- do you think the current ones 
cover the most basic subset to get started with? It might be good to just 
start with a subset of commands and add new commands later on (just a 
suggestion). And do you think the implemented actions for each type are the 
most useful? If we want to support multiple different actions per data type, it 
would be a larger refactoring of this code, I think.

@subhankarb @rmetzger What is your opinion on this?




[jira] [Commented] (FLINK-1815) Add methods to read and write a Graph as adjacency list

2016-06-25 Thread Faye Beligianni (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15349585#comment-15349585
 ] 

Faye Beligianni commented on FLINK-1815:


I faced the following problem when trying to test the Scala 
{{fromAdjacencyListFile}} method. 
The Scala {{fromAdjacencyListFile}} method is actually a wrapper around the 
corresponding Java code. 

In the Scala test method, I create a temporary file which represents the 
adjacency-list-formatted text file, like this:

{code:scala}
val adjFileContent = "0-Node0\t1-0.1,4-0.2,5-0.3\n" + "1-Node1\t2-0.3\n" +
  "2-Node2 3-0.3\n" + "3-Node3 \n" + "4-Node4\n" + "5-Node5 6-0.7\n" + 
"6-Node6 3-0.2\n" +
  "7-Node7 4-0.1\n" + "8-Node8 0-0.2\n"

val adjFileSplit = createTempFile(adjFileContent)
{code}

Just for clarification, in the "0-Node0\t1-0.1,4-0.2,5-0.3\n" line of an 
adjacency-list-formatted text file:
srcVertex-id: 0
srcVertex-value: Node0
neighbours: 1, 4, 5, with 0.1, 0.2, 0.3 being the corresponding edge values for 
each edge between the srcVertex and its neighbours.

When I read the file and split each line in order to create the graph, I 
noticed that I can split the srcVertex from its neighbours when there is a 
"\t" character, but not when there is just a whitespace character (" "). I 
didn't face this problem in the Java tests.

I use the split method to split the lines at the specified delimiters.
{code:java}
line.split("\\s+")
{code}  

Any suggestions on how to handle this?
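
For comparison, a standalone check of how the JDK split behaves on both 
delimiters, which suggests the delimiter itself may not be the issue:

{code:java}
import java.util.Arrays;

public class SplitCheck {
    public static void main(String[] args) {
        // Both a tab and a plain space match \s+, so both lines split the same way:
        System.out.println(Arrays.toString("0-Node0\t1-0.1,4-0.2,5-0.3".split("\\s+")));
        // prints [0-Node0, 1-0.1,4-0.2,5-0.3]
        System.out.println(Arrays.toString("2-Node2 3-0.3".split("\\s+")));
        // prints [2-Node2, 3-0.3]
    }
}
{code}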

> Add methods to read and write a Graph as adjacency list
> ---
>
> Key: FLINK-1815
> URL: https://issues.apache.org/jira/browse/FLINK-1815
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 0.9
>Reporter: Vasia Kalavri
>Assignee: Faye Beligianni
>Priority: Minor
>
> It would be nice to add utility methods to read a graph from an Adjacency 
> list format and also write a graph in such a format.
> The simple case would be to read a graph with no vertex or edge values, where 
> we would need to define (a) a line delimiter, (b) a delimiter to separate 
> vertices from neighbor list and (c) and a delimiter to separate the neighbors.
> For example, "1 2,3,4\n2 1,3" would give vertex 1 with neighbors 2, 3 and 4 
> and vertex 2 with neighbors 1 and 3.
> If we have vertex values and/or edge values, we also need to have a way to 
> separate IDs from values. For example, we could have "1 0.1 2 0.5, 3 0.2" to 
> define a vertex 1 with value 0.1, edge (1, 2) with weight 0.5 and edge (1, 3) 
> with weight 0.2.





[jira] [Comment Edited] (FLINK-1815) Add methods to read and write a Graph as adjacency list

2016-06-25 Thread Faye Beligianni (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15349585#comment-15349585
 ] 

Faye Beligianni edited comment on FLINK-1815 at 6/25/16 10:15 AM:
--

I faced the following problem when trying to test the Scala 
{{fromAdjacencyListFile}} method. 
The Scala {{fromAdjacencyListFile}} method is actually a wrapper around the 
corresponding Java code. 

In the Scala test method, I create a temporary file which represents the 
adjacency-list-formatted text file, like this:

{code:java}
val adjFileContent = "0-Node0\t1-0.1,4-0.2,5-0.3\n" + "1-Node1\t2-0.3\n" +
  "2-Node2 3-0.3\n" + "3-Node3 \n" + "4-Node4\n" + "5-Node5 6-0.7\n" + 
"6-Node6 3-0.2\n" +
  "7-Node7 4-0.1\n" + "8-Node8 0-0.2\n"

val adjFileSplit = createTempFile(adjFileContent)
{code}

Just for clarification, in the "0-Node0\t1-0.1,4-0.2,5-0.3\n" line of an 
adjacency-list-formatted text file:
srcVertex-id: 0
srcVertex-value: Node0
neighbours: 1, 4, 5, with 0.1, 0.2, 0.3 being the corresponding edge values for 
each edge between the srcVertex and its neighbours.

When I read the file and split each line in order to create the graph, I 
noticed that I can split the srcVertex from its neighbours when there is a 
"\t" character, but not when there is just a whitespace character (" "). I 
didn't face this problem in the Java tests.

I use the split method to split the lines at the specified delimiters.
{code:java}
line.split("\\s+")
{code}  

Any suggestions on how to handle this?


was (Author: fobeligi):
I faced the following problem when trying to test the Scala 
{{fromAdjacencyListFile}} method. 
The Scala {{fromAdjacencyListFile}} method is actually a wrapper around the 
corresponding Java code. 

In the Scala test method, I create a temporary file which represents the 
adjacency-list-formatted text file, like this:

{code:scala}
val adjFileContent = "0-Node0\t1-0.1,4-0.2,5-0.3\n" + "1-Node1\t2-0.3\n" +
  "2-Node2 3-0.3\n" + "3-Node3 \n" + "4-Node4\n" + "5-Node5 6-0.7\n" + 
"6-Node6 3-0.2\n" +
  "7-Node7 4-0.1\n" + "8-Node8 0-0.2\n"

val adjFileSplit = createTempFile(adjFileContent)
{code}

Just for clarification, in the "0-Node0\t1-0.1,4-0.2,5-0.3\n" line of an 
adjacency-list-formatted text file:
srcVertex-id: 0
srcVertex-value: Node0
neighbours: 1, 4, 5, with 0.1, 0.2, 0.3 being the corresponding edge values for 
each edge between the srcVertex and its neighbours.

When I read the file and split each line in order to create the graph, I 
noticed that I can split the srcVertex from its neighbours when there is a 
"\t" character, but not when there is just a whitespace character (" "). I 
didn't face this problem in the Java tests.

I use the split method to split the lines at the specified delimiters.
{code:java}
line.split("\\s+")
{code}  

Any suggestions on how to handle this?

> Add methods to read and write a Graph as adjacency list
> ---
>
> Key: FLINK-1815
> URL: https://issues.apache.org/jira/browse/FLINK-1815
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 0.9
>Reporter: Vasia Kalavri
>Assignee: Faye Beligianni
>Priority: Minor
>
> It would be nice to add utility methods to read a graph from an Adjacency 
> list format and also write a graph in such a format.
> The simple case would be to read a graph with no vertex or edge values, where 
> we would need to define (a) a line delimiter, (b) a delimiter to separate 
> vertices from neighbor list and (c) and a delimiter to separate the neighbors.
> For example, "1 2,3,4\n2 1,3" would give vertex 1 with neighbors 2, 3 and 4 
> and vertex 2 with neighbors 1 and 3.
> If we have vertex values and/or edge values, we also need to have a way to 
> separate IDs from values. For example, we could have "1 0.1 2 0.5, 3 0.2" to 
> define a vertex 1 with value 0.1, edge (1, 2) with weight 0.5 and edge (1, 3) 
> with weight 0.2.





[GitHub] flink issue #1813: [FLINK-3034] Redis Sink Connector

2016-06-25 Thread tzulitai
Github user tzulitai commented on the issue:

https://github.com/apache/flink/pull/1813
  
Hi,
I've tested this on a local single node installation & cluster 
installation. It works and I have not come across problems.

I have a few other questions on the usage of the connector (I haven't gone 
through the code thoroughly enough and am not an expert Redis user, so I 
apologize if I'm missing things in asking them). Most of them are my own 
opinion from a user's point of view:

1) As a user, I think the current way of giving a secondary key for `HASH` 
and `SORTED_SET` is a bit confusing. Is the current design a common usage 
pattern across Redis client libraries? If not, IMO, perhaps a 
`getSecondaryKeyFromData` in the `RedisMapper` interface would be more explicit.
2) I wonder what's the reason for having our own `JedisPoolConfig`, when 
Jedis has its own `JedisPoolConfig`? If there's a need for it for 
Flink-specific `RedisSink` settings, perhaps the name conflict is a bit 
misleading?
3) Would it be better to have a `RedisDataType.PUB` instead of 
`RedisDataType.PUBSUB` to better acknowledge the `publish` method on Jedis 
clients? We can have `RedisDataType.SUB` when we add a Redis source to the 
connector.
4) Can `RedisDataType.LIST` be changed to `RedisDataType.LPUSH` and 
`RedisDataType.RPUSH`? If it isn't too hard to add, the connector would be much 
more useful with this.
5) In general, I have the feeling that `RedisDataType` should be named 
`RedisCommandType`, as the `RedisMapper` is actually mapping the incoming data 
to a Redis command / action.




[GitHub] flink issue #2131: [FLINK-3231][streaming-connectors] FlinkKinesisConsumer r...

2016-06-25 Thread tzulitai
Github user tzulitai commented on the issue:

https://github.com/apache/flink/pull/2131
  
I've also tested the last commit with both the unit and manual tests.




[jira] [Commented] (FLINK-3231) Handle Kinesis-side resharding in Kinesis streaming consumer

2016-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15349496#comment-15349496
 ] 

ASF GitHub Bot commented on FLINK-3231:
---

Github user tzulitai commented on the issue:

https://github.com/apache/flink/pull/2131
  
I've also tested the last commit with both the unit and manual tests.


> Handle Kinesis-side resharding in Kinesis streaming consumer
> 
>
> Key: FLINK-3231
> URL: https://issues.apache.org/jira/browse/FLINK-3231
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kinesis Connector, Streaming Connectors
>Affects Versions: 1.1.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
> Fix For: 1.1.0
>
>
> A big difference between Kinesis shards and Kafka partitions is that Kinesis 
> users can choose to "merge" and "split" shards at any time for adjustable 
> stream throughput capacity. This article explains this quite clearly: 
> https://brandur.org/kinesis-by-example.
> This will break the static shard-to-task mapping implemented in the basic 
> version of the Kinesis consumer 
> (https://issues.apache.org/jira/browse/FLINK-3229). The static shard-to-task 
> mapping is done in a simple round-robin-like distribution which can be 
> locally determined at each Flink consumer task (Flink Kafka consumer does 
> this too).
> To handle Kinesis resharding, we will need some way to let the Flink consumer 
> tasks coordinate which shards they are currently handling, and allow the 
> tasks to ask the coordinator for a shards reassignment when the task finds 
> out it has found a closed shard at runtime (shards will be closed by Kinesis 
> when it is merged and split).
> We need a centralized coordinator state store which is visible to all Flink 
> consumer tasks. Tasks can use this state store to locally determine what 
> shards it can be reassigned. Amazon KCL uses a DynamoDB table for the 
> coordination, but as described in 
> https://issues.apache.org/jira/browse/FLINK-3211, we unfortunately can't use 
> KCL for the implementation of the consumer if we want to leverage Flink's 
> checkpointing mechanics. For our own implementation, Zookeeper can be used 
> for this state store, but that means it would require the user to set up ZK 
> to work.
> Since this feature introduces extensive work, it is opened as a separate 
> sub-task from the basic implementation 
> https://issues.apache.org/jira/browse/FLINK-3229.
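
For reference, a minimal sketch of a round-robin assignment that each subtask 
can compute locally (illustrative, not the connector's actual code):

{code:java}
// Each consumer subtask decides locally which shards it owns; no coordination
// is needed as long as the shard list is static.
public class ShardAssignmentSketch {
    static boolean isOwnedByThisSubtask(int shardIndex, int subtaskIndex, int parallelism) {
        return shardIndex % parallelism == subtaskIndex;
    }
}
{code}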





[GitHub] flink issue #2131: [FLINK-3231][streaming-connectors] FlinkKinesisConsumer r...

2016-06-25 Thread tzulitai
Github user tzulitai commented on the issue:

https://github.com/apache/flink/pull/2131
  
Hi @rmetzger,
I've addressed your initial comments with the last commit. No hurry on the 
remaining review; please take your time :) Thanks!




[jira] [Commented] (FLINK-3231) Handle Kinesis-side resharding in Kinesis streaming consumer

2016-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15349494#comment-15349494
 ] 

ASF GitHub Bot commented on FLINK-3231:
---

Github user tzulitai commented on the issue:

https://github.com/apache/flink/pull/2131
  
Hi @rmetzger,
I've addressed your initial comments with the last commit. No hurry on the 
remaining review; please take your time :) Thanks!


> Handle Kinesis-side resharding in Kinesis streaming consumer
> 
>
> Key: FLINK-3231
> URL: https://issues.apache.org/jira/browse/FLINK-3231
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kinesis Connector, Streaming Connectors
>Affects Versions: 1.1.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
> Fix For: 1.1.0
>
>
> A big difference between Kinesis shards and Kafka partitions is that Kinesis 
> users can choose to "merge" and "split" shards at any time for adjustable 
> stream throughput capacity. This article explains this quite clearly: 
> https://brandur.org/kinesis-by-example.
> This will break the static shard-to-task mapping implemented in the basic 
> version of the Kinesis consumer 
> (https://issues.apache.org/jira/browse/FLINK-3229). The static shard-to-task 
> mapping is done in a simple round-robin-like distribution which can be 
> locally determined at each Flink consumer task (Flink Kafka consumer does 
> this too).
> To handle Kinesis resharding, we will need some way to let the Flink consumer 
> tasks coordinate which shards they are currently handling, and allow the 
> tasks to ask the coordinator for a shards reassignment when the task finds 
> out it has found a closed shard at runtime (shards will be closed by Kinesis 
> when it is merged and split).
> We need a centralized coordinator state store which is visible to all Flink 
> consumer tasks. Tasks can use this state store to locally determine what 
> shards it can be reassigned. Amazon KCL uses a DynamoDB table for the 
> coordination, but as described in 
> https://issues.apache.org/jira/browse/FLINK-3211, we unfortunately can't use 
> KCL for the implementation of the consumer if we want to leverage Flink's 
> checkpointing mechanics. For our own implementation, Zookeeper can be used 
> for this state store, but that means it would require the user to set up ZK 
> to work.
> Since this feature introduces extensive work, it is opened as a separate 
> sub-task from the basic implementation 
> https://issues.apache.org/jira/browse/FLINK-3229.





[jira] [Commented] (FLINK-4118) The docker-flink image is outdated (1.0.2) and can be slimmed down

2016-06-25 Thread JIRA

[ 
https://issues.apache.org/jira/browse/FLINK-4118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15349470#comment-15349470
 ] 

Ismaël Mejía commented on FLINK-4118:
-

I am working on this and I have a PR ready for review. Can somebody please 
assign this to me? Thanks.

> The docker-flink image is outdated (1.0.2) and can be slimmed down
> --
>
> Key: FLINK-4118
> URL: https://issues.apache.org/jira/browse/FLINK-4118
> Project: Flink
>  Issue Type: Improvement
>Reporter: Ismaël Mejía
>Priority: Minor
>
> This issue is to upgrade the docker image and polish some details in it (e.g. 
> it can be slimmed down if we remove some unneeded dependencies, and the code 
> can be polished).





[jira] [Created] (FLINK-4118) The docker-flink image is outdated (1.0.2) and can be slimmed down

2016-06-25 Thread JIRA
Ismaël Mejía created FLINK-4118:
---

 Summary: The docker-flink image is outdated (1.0.2) and can be 
slimmed down
 Key: FLINK-4118
 URL: https://issues.apache.org/jira/browse/FLINK-4118
 Project: Flink
  Issue Type: Improvement
Reporter: Ismaël Mejía
Priority: Minor


This issue is to upgrade the docker image and polish some details in it (e.g. 
it can be slimmed down if we remove some unneeded dependencies, and the code 
can be polished).



