[jira] [Created] (FLINK-6351) Refactoring YarnFlinkApplicationMasterRunner by combining AbstractYarnFlinkApplicationMasterRunner in one class

2017-04-20 Thread mingleizhang (JIRA)
mingleizhang created FLINK-6351:
---

 Summary: Refactoring YarnFlinkApplicationMasterRunner by combining 
AbstractYarnFlinkApplicationMasterRunner in one class
 Key: FLINK-6351
 URL: https://issues.apache.org/jira/browse/FLINK-6351
 Project: Flink
  Issue Type: Improvement
  Components: YARN
Reporter: mingleizhang
Assignee: mingleizhang
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6346) Migrate from Java serialization for GenericWriteAheadSink's state

2017-04-20 Thread Tzu-Li (Gordon) Tai (JIRA)
Tzu-Li (Gordon) Tai created FLINK-6346:
--

 Summary: Migrate from Java serialization for 
GenericWriteAheadSink's state
 Key: FLINK-6346
 URL: https://issues.apache.org/jira/browse/FLINK-6346
 Project: Flink
  Issue Type: Sub-task
  Components: Streaming Connectors
Reporter: Tzu-Li (Gordon) Tai


See umbrella JIRA FLINK-6343 for details. This subtask tracks the migration for 
{{GenericWriteAheadSink}}.





[jira] [Created] (FLINK-6345) Migrate from Java serialization for ContinuousFileReaderOperator's state

2017-04-20 Thread Tzu-Li (Gordon) Tai (JIRA)
Tzu-Li (Gordon) Tai created FLINK-6345:
--

 Summary: Migrate from Java serialization for 
ContinuousFileReaderOperator's state
 Key: FLINK-6345
 URL: https://issues.apache.org/jira/browse/FLINK-6345
 Project: Flink
  Issue Type: Sub-task
  Components: Streaming Connectors
Reporter: Tzu-Li (Gordon) Tai


See umbrella JIRA FLINK-6343 for details. This subtask tracks the migration for 
{{ContinuousFileReaderOperator}}.





[jira] [Created] (FLINK-6344) Migrate from Java serialization for `BucketingSink`'s state

2017-04-20 Thread Tzu-Li (Gordon) Tai (JIRA)
Tzu-Li (Gordon) Tai created FLINK-6344:
--

 Summary: Migrate from Java serialization for `BucketingSink`'s 
state
 Key: FLINK-6344
 URL: https://issues.apache.org/jira/browse/FLINK-6344
 Project: Flink
  Issue Type: Sub-task
  Components: Streaming Connectors
Reporter: Tzu-Li (Gordon) Tai


See umbrella JIRA FLINK-6343 for details. This subtask tracks the migration for 
`BucketingSink`.





Idempotent Job Submission

2017-04-20 Thread James Bucher
Hey all,

I have been doing some digging to see if there is a good way to do an 
idempotent job submission. I was hoping to write a job submission agent that 
does the following:

  1.  Checks to see if the cluster is running yet (can contact a JobManager)
  2.  Checks to see if the job it is watching is running.
  3.  Submits the job if it is not yet running.
  4.  Retry if there are any issues.
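A rough sketch of that agent loop, in plain Java against a made-up FlinkClient interface (none of these names are real Flink APIs; the in-memory stub exists only to demonstrate the check-then-submit-then-retry shape):

```java
import java.util.HashSet;
import java.util.Set;

public class SubmissionAgent {
    /** Hypothetical minimal client view of a JobManager; not a real Flink API. */
    interface FlinkClient {
        boolean isReachable();
        Set<String> runningJobs();                  // job names known to the JM
        void submit(String jobName) throws Exception;
    }

    private final FlinkClient client;

    SubmissionAgent(FlinkClient client) { this.client = client; }

    /** Submit jobName unless it is already running; retry on any failure. */
    void ensureRunning(String jobName, int maxRetries) throws Exception {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            try {
                if (!client.isReachable()) continue;          // step 1: wait for a JM
                if (client.runningJobs().contains(jobName)) { // step 2: already running?
                    return;
                }
                client.submit(jobName);                       // step 3: submit it
                return;
            } catch (Exception e) {
                // step 4: swallow and retry (a real agent would back off here)
            }
        }
        throw new Exception("giving up after " + maxRetries + " attempts");
    }

    public static void main(String[] args) throws Exception {
        Set<String> jobs = new HashSet<>();
        FlinkClient stub = new FlinkClient() {
            public boolean isReachable() { return true; }
            public Set<String> runningJobs() { return jobs; }
            public void submit(String jobName) { jobs.add(jobName); }
        };
        SubmissionAgent agent = new SubmissionAgent(stub);
        agent.ensureRunning("my-job", 3);
        agent.ensureRunning("my-job", 3); // second call sees the job and is a no-op
        System.out.println(jobs.size());  // 1: only one instance was submitted
    }
}
```

Of course, the whole point of the race below is that step 2 can return a stale answer, so this loop alone is not actually idempotent.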

At the moment there doesn't seem to be any functionality for submitting a job 
only if it isn't already running. The current interface creates a situation 
where a race condition is possible (as far as I can tell):

For example, if the following sequence of events occurs:

  1.  The JobManager fails and a new leader is re-elected:
     *   The JobManager asynchronously starts restoring jobs.
  2.  The client lists the currently running jobs (before jobs are restored) 
and gets back an incomplete list of running jobs, because SubmitJob only 
registers jobs in currentJobs.
  3.  The client assumes the job is no longer running, so it uses 
HTTP/CLI/whatever to resubmit the job.
  4.  The current interfaces don't pass in the same JobID (a new one is 
generated for each submit), so a new job is submitted with a new JobID.
  5.  The JobManager restores the previous instance of the running job.
  6.  Now there are two instances of the job running in the cluster.

While the above state is pretty unlikely to hit when submitting jobs manually, 
it seems to me that an agent like the above might end up hitting it if the 
cluster were having trouble with JobManagers failing.

I can see that FLIP-6 is rewriting the whole JobManager itself. From my 
reading of the current code base, this work is about halfway done in master.

From my reading of the code/docs it seems that from the submission side the 
expectation for Docker/Kubernetes is that you will create two sets of 
containers:

  1.  A JobMaster/ResourceManager container that contains the user’s job in 
some form (jar or as a serialized JobGraph).
  2.  A TaskManager container which is either generic or potentially has user 
libs (up to the implementer/cluster maintainer)

As I currently understand the code the JobMaster instances will:

  1.  Start up a JobMasterRunner which connects to the Leader service and 
creates a JobMaster with the supplied JobGraph (which I assume will always have 
the same JobID for restore purposes).
  2.  When the node is granted leadership the JobMasterRunner starts the 
JobMaster which will schedule the ExecutionGraph which it created from the 
supplied JobGraph.

This all seems fine for a new job submission, but since the restore logic is 
not yet implemented, I am wondering how people will interact with clusters for 
job submission. From the FLIP-6 design doc it appears that the current 
JobManager infrastructure will instead become a "FlinkDispatcher". Is the 
intent to have the savepoint launch/restore logic in the FlinkDispatcher and 
have it control the job upgrade lifecycle?

We are currently looking at running Flink on Kubernetes, and FLIP-6 looks to 
organize that much better than the way things currently work. Specifically, we 
want to give clients a clear deployment/upgrade path for Flink jobs that can 
be integrated into automated build pipelines and the like. Is the intention in 
the new system to have another orchestration layer for upgrading jobs, or will 
the JobMaster itself handle those situations?

To me the JobMaster seems like the correct place to coordinate job upgrades, 
because it is the single point of control for a single job. Then, for example 
on Kubernetes, one would just relaunch the JobMaster containers and the 
JobMaster logic would take care of the rest, reconciling the upgraded 
JobGraph. On the other hand, this might not fit cleanly into the separation of 
concerns that currently exists within the JobManager.

I am also wondering what work needs doing on FLIP-6. Overall it closely aligns 
with what we are trying to do on our end to make Flink easier to use so I might 
get some time to help out with this effort.

Thanks,
James Bucher


Re: [VOTE] Release Apache Flink 1.2.1 (RC2)

2017-04-20 Thread Henry Saputra
LICENSE file exists
NOTICE file looks good
Signature files look good
Hash files look good
No 3rd-party executables in the source artifact
Source compiled and passes tests
Local run works
Ran a simple job on YARN

+1

- Henry

On Wed, Apr 12, 2017 at 4:06 PM, Robert Metzger  wrote:

> Dear Flink community,
>
> Please vote on releasing the following candidate as Apache Flink version
> 1.2.1.
>
> The commit to be voted on:
> 76eba4e0 (http://git-wip-us.apache.org/repos/asf/flink/commit/76eba4e0)
>
> Branch:
> release-1.2.1-rc2
>
> The release artifacts to be voted on can be found at:
> http://people.apache.org/~rmetzger/flink-1.2.1-rc2/
>
>
> The release artifacts are signed with the key with fingerprint D9839159:
> http://www.apache.org/dist/flink/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapacheflink-1117
>
> -
>
>
> The vote ends on Tuesday, 1pm CET.
>
> [ ] +1 Release this package as Apache Flink 1.2.1
> [ ] -1 Do not release this package, because ...
>


Why TimerService interface in ProcessFunction doesn't have deleteEventTimeTimer

2017-04-20 Thread Jagadish Bihani
Hi

I am working on a use case where I want to start a timer for a given event
type, and when that timer expires it performs a certain action. This can
be done using a ProcessFunction.

But I also want to cancel the scheduled timer when certain other types of
events arrive. I checked the implementations: HeapInternalTimerService, which
implements the InternalTimerService interface, already has those methods.
Also, SimpleTimerService, which implements TimerService, simply delegates to
an InternalTimerService, passing VoidNamespace.INSTANCE.

So in a way we are using the InternalTimerService implementations everywhere.

So what is the reason that ProcessFunction.Context uses TimerService? Is there
any reason 'deleteEventTimeTimer' is not exposed to users? If I want to use the
delete functionality, how should I go about it?
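For what it's worth, a common workaround when a timer cannot be deleted is to keep your own record of the "active" timer per key and ignore stale firings in onTimer(). Here is a toy, Flink-free simulation of that idea (all names are hypothetical; it only mirrors the register/fire shape of a ProcessFunction):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Toy simulation of the workaround: instead of deleting a timer,
 *  record that it was cancelled and ignore it when it fires. */
public class CancellableTimers {
    private final TreeMap<Long, String> timers = new TreeMap<>();  // timestamp -> key
    private final Map<String, Long> activeTimer = new HashMap<>(); // key -> expected ts
    private final StringBuilder fired = new StringBuilder();

    void registerTimer(String key, long ts) {
        timers.put(ts, key);
        activeTimer.put(key, ts);          // remember which timer is "current" for key
    }

    void cancelTimer(String key) {
        activeTimer.remove(key);           // no real deletion; just forget the timer
    }

    void advanceWatermarkTo(long watermark) {
        while (!timers.isEmpty() && timers.firstKey() <= watermark) {
            Map.Entry<Long, String> e = timers.pollFirstEntry();
            // "onTimer()": only act if this timestamp is still the active timer
            if (e.getKey().equals(activeTimer.get(e.getValue()))) {
                fired.append(e.getValue());
            }
        }
    }

    public static void main(String[] args) {
        CancellableTimers t = new CancellableTimers();
        t.registerTimer("a", 10L);
        t.registerTimer("b", 20L);
        t.cancelTimer("a");                // "a" still fires internally but is ignored
        t.advanceWatermarkTo(30L);
        System.out.println(t.fired);       // b
    }
}
```

In an actual ProcessFunction the "activeTimer" map would live in keyed state (e.g. a ValueState holding the expected timestamp).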





-- 
Thanks and Regards,
Jagadish Bihani


[jira] [Created] (FLINK-6341) JobManager can go to definite message sending loop when TaskManager registered

2017-04-20 Thread Tao Wang (JIRA)
Tao Wang created FLINK-6341:
---

 Summary: JobManager can go to definite message sending loop when 
TaskManager registered
 Key: FLINK-6341
 URL: https://issues.apache.org/jira/browse/FLINK-6341
 Project: Flink
  Issue Type: Bug
  Components: JobManager
Reporter: Tao Wang
Assignee: Tao Wang


When a TaskManager registers with the JobManager, the JM sends a 
"NotifyResourceStarted" message to kick off the resource manager, then 
triggers a reconnection to the resource manager by sending a 
"TriggerRegistrationAtJobManager".

When the JobManager's reference to the resource manager is not None and the 
reconnection targets the same resource manager, the JobManager enters an 
infinite message-sending loop, sending itself a "ReconnectResourceManager" 
every 2 seconds.

We have already observed this phenomenon. For more details, check how the 
JobManager handles `ReconnectResourceManager`.
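A guard along these lines (a hypothetical sketch, not the actual JobManager code) illustrates the kind of check that would break the loop: ignore reconnect requests that target the already-connected resource manager.

```java
public class ReconnectGuard {
    private String connectedResourceManager; // null until first connection

    /** Returns true only when a reconnection should actually be triggered. */
    boolean shouldReconnect(String resourceManager) {
        if (resourceManager.equals(connectedResourceManager)) {
            return false; // same RM: do not re-send ReconnectResourceManager
        }
        connectedResourceManager = resourceManager;
        return true;
    }

    public static void main(String[] args) {
        ReconnectGuard g = new ReconnectGuard();
        System.out.println(g.shouldReconnect("rm-1")); // true: first connection
        System.out.println(g.shouldReconnect("rm-1")); // false: loop is broken
    }
}
```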





[jira] [Created] (FLINK-6340) Introduce a TerminationFuture for Execution

2017-04-20 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-6340:
---

 Summary: Introduce a TerminationFuture for Execution
 Key: FLINK-6340
 URL: https://issues.apache.org/jira/browse/FLINK-6340
 Project: Flink
  Issue Type: Sub-task
  Components: Distributed Coordination
Affects Versions: 1.2.0
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.3.0


In order to have more flexibility in tracking how tasks reach a final state, 
each Execution should have a future that is completed when it reaches a final 
state.





[jira] [Created] (FLINK-6339) Remove useless and unused class ConnectorSource

2017-04-20 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-6339:
---

 Summary: Remove useless and unused class ConnectorSource
 Key: FLINK-6339
 URL: https://issues.apache.org/jira/browse/FLINK-6339
 Project: Flink
  Issue Type: Improvement
  Components: Streaming
Affects Versions: 1.2.0
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.3.0








[jira] [Created] (FLINK-6338) SimpleStringUtils should be called StringValueUtils

2017-04-20 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-6338:
---

 Summary: SimpleStringUtils should be called StringValueUtils
 Key: FLINK-6338
 URL: https://issues.apache.org/jira/browse/FLINK-6338
 Project: Flink
  Issue Type: Improvement
  Components: Core
Affects Versions: 1.2.0
Reporter: Stephan Ewen
Assignee: Stephan Ewen
Priority: Trivial
 Fix For: 1.3.0








[jira] [Created] (FLINK-6337) Remove the buffer provider from PartitionRequestServerHandler

2017-04-20 Thread zhijiang (JIRA)
zhijiang created FLINK-6337:
---

 Summary: Remove the buffer provider from 
PartitionRequestServerHandler
 Key: FLINK-6337
 URL: https://issues.apache.org/jira/browse/FLINK-6337
 Project: Flink
  Issue Type: Improvement
  Components: Network
Reporter: zhijiang
Assignee: zhijiang
Priority: Minor


Currently, {{PartitionRequestServerHandler}} creates a {{LocalBufferPool}} 
when the channel is registered. The {{LocalBufferPool}} is only used to obtain 
the segment size for creating the read view in {{SpillableSubpartition}}; the 
buffers in the pool are never actually used, so they waste buffer resources 
from the global pool.

We would like to remove the {{LocalBufferPool}} from 
{{PartitionRequestServerHandler}}; the {{LocalBufferPool}} in 
{{ResultPartition}} can provide the segment size for creating the subpartition 
view instead.





Re: [VOTE] Release Apache Flink 1.2.1 (RC2)

2017-04-20 Thread Ted Yu
I downloaded and expanded flink-1.2.1-src.tgz one more time.

The failing tests are not in the source tarball.

It looks like I had expanded the tarball into the same directory as the
previous RC last week.

Cheers

On Thu, Apr 20, 2017 at 1:32 AM, Aljoscha Krettek 
wrote:

> @Ted What is the hash of the commit where you saw the failing test? I
> think it might have been some intermediate commit because these tests are
> not in the code anymore on the release branch.
>
> > On 19. Apr 2017, at 18:35, Henry Saputra 
> wrote:
> >
> > This should be the one: https://github.com/aljoscha/FliRTT
> >
> > On Wed, Apr 19, 2017 at 7:48 AM, Ted Yu  wrote:
> >
> >> Till:
> >> A bit curious: where can I find the Flirrt tool ?
> >>
> >> Thanks
> >>
> >> On Wed, Apr 19, 2017 at 5:24 AM, Till Rohrmann 
> >> wrote:
> >>
> >>> +1 (binding) for the release
> >>>
> >>> - checked checksums and signature
> >>> - no dependencies added or removed
> >>> - build Flink with Hadoop 2.7.1 and Scala 2.11 from sources and ran all
> >>> tests
> >>> - Ran Flirrt locally, on standalone cluster and Yarn with Hadoop 2.7.1
> >> and
> >>> Scala 2.10
> >>>
> >>> Cheers,
> >>> Till
> >>>
> >>>
> >>> On Thu, Apr 13, 2017 at 11:27 AM, Gyula Fóra 
> >> wrote:
> >>>
>  Hi,
> 
>  Unfortunately I cannot test run the rc as I am on vacation. But we
> have
>  been running pretty much the same build (+1-2 commits) in production
> >> for
>  some time now.
> 
>  +1 from me
> 
>  Gyula
> 
>  On Thu, Apr 13, 2017, 08:27 Andrew Psaltis 
>  wrote:
> 
> > +1 -- checked out all code, built with all tests, ran local cluster,
> > deployed example streaming jobs
> >
> > On Thu, Apr 13, 2017 at 2:26 AM, Andrew Psaltis <
>  psaltis.and...@gmail.com>
> > wrote:
> >
> >> Ted -- I did not see those errors. My environment is:
> >> Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5;
> >> 2015-11-10T11:41:47-05:00)
> >> Maven home: /usr/local/Cellar/maven/3.3.9/libexec
> >> Java version: 1.8.0_121, vendor: Oracle Corporation
> >> Java home: /Library/Java/JavaVirtualMachines/jdk1.8.0_
> >> 121.jdk/Contents/Home/jre
> >> Default locale: en_US, platform encoding: UTF-8
> >> OS name: "mac os x", version: "10.12.3", arch: "x86_64", family:
> >>> "mac"
> >>
> >>
> >>
> >>
> >> On Thu, Apr 13, 2017 at 12:36 AM, Ted Yu 
> >>> wrote:
> >>
> >>> I ran test suite where the following failed:
> >>>
> >>> Failed tests:
> >>>  StreamExecutionEnvironmentTest.testDefaultParallelismIsDefaul
> >>> t:143
> >>> expected:<-1> but was:<24>
> >>>
> >>> StreamExecutionEnvironmentTest.testMaxParallelismMustBeBigge
> >>> rEqualParallelism
> >>> Expected test to throw an instance of java.lang.
>  IllegalArgumentException
> >>>
> >>> StreamExecutionEnvironmentTest.testParallelismMustBeSmallerE
> >>> qualMaxParallelism
> >>> Expected test to throw an instance of java.lang.
>  IllegalArgumentException
> >>>
> >>> This is what I used:
> >>>
> >>> Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5;
> >>> 2015-11-10T16:41:47+00:00)
> >>> Java version: 1.8.0_101, vendor: Oracle Corporation
> >>>
> >>> Have anyone else seen the above failures ?
> >>>
> >>> Cheers
> >>>
> >>> On Wed, Apr 12, 2017 at 4:06 PM, Robert Metzger <
> >>> rmetz...@apache.org>
> >>> wrote:
> >>>
>  Dear Flink community,
> 
>  Please vote on releasing the following candidate as Apache Flink
> > version
>  1.2
>  .1.
> 
>  The commit to be voted on:
>  76eba4e0 <
> > http://git-wip-us.apache.org/repos/asf/flink/commit/76eba4e0>
>  (*http://git-wip-us.apache.org/repos/asf/flink/commit/76eba4e0
>   >>> *)
> 
>  Branch:
>  release-1.2.1-rc2
> 
>  The release artifacts to be voted on can be found at:
>  http://people.apache.org/~rmetzger/flink-1.2.1-rc2/
> 
> 
>  The release artifacts are signed with the key with fingerprint
> > D9839159:
>  http://www.apache.org/dist/flink/KEYS
> 
>  The staging repository for this release can be found at:
>  *
> > https://repository.apache.org/content/repositories/
> >> orgapacheflink-1117
>  <
> > https://repository.apache.org/content/repositories/
> >> orgapacheflink-1117
>  *
> 
>  -
> 
> 
>  The vote ends on Tuesday, 1pm CET.
> 
>  [ ] +1 Release this package as Apache Flink 1.2.1
>  [ ] -1 

[jira] [Created] (FLINK-6336) Placement Constraints for Mesos

2017-04-20 Thread Stephen Gran (JIRA)
Stephen Gran created FLINK-6336:
---

 Summary: Placement Constraints for Mesos
 Key: FLINK-6336
 URL: https://issues.apache.org/jira/browse/FLINK-6336
 Project: Flink
  Issue Type: New Feature
  Components: Mesos
Affects Versions: 1.2.0
Reporter: Stephen Gran
Priority: Minor


Fenzo supports placement constraints for tasks, and operators can expose agent 
attributes to frameworks as part of the agent offers.

It would be extremely helpful in our multi-tenant cluster to be able to make 
use of this facility.





Re: [VOTE] Release Apache Flink 1.2.1 (RC2)

2017-04-20 Thread Aljoscha Krettek
@Ted What is the hash of the commit where you saw the failing test? I think it 
might have been some intermediate commit because these tests are not in the 
code anymore on the release branch.

> On 19. Apr 2017, at 18:35, Henry Saputra  wrote:
> 
> This should be the one: https://github.com/aljoscha/FliRTT

[jira] [Created] (FLINK-6335) Support DISTINCT over grouped window in stream SQL

2017-04-20 Thread Haohui Mai (JIRA)
Haohui Mai created FLINK-6335:
-

 Summary: Support DISTINCT over grouped window in stream SQL
 Key: FLINK-6335
 URL: https://issues.apache.org/jira/browse/FLINK-6335
 Project: Flink
  Issue Type: Bug
Reporter: Haohui Mai
Assignee: Haohui Mai


The SQL on the batch side supports the {{DISTINCT}} keyword in aggregations. 
This JIRA proposes supporting the {{DISTINCT}} keyword for streaming 
aggregations using the same technique as on the batch side.


