[jira] [Created] (FLINK-4448) Use Listeners to monitor execution status

2016-08-22 Thread Xiaogang Shi (JIRA)
Xiaogang Shi created FLINK-4448:
---

 Summary: Use Listeners to monitor execution status
 Key: FLINK-4448
 URL: https://issues.apache.org/jira/browse/FLINK-4448
 Project: Flink
  Issue Type: Sub-task
  Components: Cluster Management
Reporter: Xiaogang Shi
Assignee: Xiaogang Shi


Currently, the JobMaster monitors the ExecutionGraph's job status and execution 
state through Akka. Since the dependencies on Akka are to be removed in the 
refactoring, the JobMaster will instead use a JobStatusListener and an 
ExecutionStateListener to receive notifications from the ExecutionGraph.
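As a rough illustration of the listener approach described above (a hypothetical sketch only; the names echo the issue, but Flink's actual interfaces carry more context such as the job ID and failure cause):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch of listener-based status notification replacing Akka
// messages. Names echo the issue; Flink's real interfaces differ.
enum JobStatus { CREATED, RUNNING, FAILING, FINISHED }

interface JobStatusListener {
    void jobStatusChanges(JobStatus newStatus, long timestamp);
}

class ExecutionGraph {
    private final List<JobStatusListener> listeners = new CopyOnWriteArrayList<>();

    void registerJobStatusListener(JobStatusListener listener) {
        listeners.add(listener);
    }

    void transitionTo(JobStatus newStatus) {
        long now = System.currentTimeMillis();
        // Call listeners directly instead of sending actor messages via Akka.
        for (JobStatusListener listener : listeners) {
            listener.jobStatusChanges(newStatus, now);
        }
    }
}

public class ListenerSketch {
    public static void main(String[] args) {
        ExecutionGraph graph = new ExecutionGraph();
        // The JobMaster would register itself; a lambda stands in for it here.
        graph.registerJobStatusListener(
                (status, ts) -> System.out.println("job status -> " + status));
        graph.transitionTo(JobStatus.RUNNING);
        graph.transitionTo(JobStatus.FINISHED);
    }
}
```

In this scheme the JobMaster registers its listeners on the ExecutionGraph at submission time, removing the Akka message round-trip.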



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-4447) Include NettyConfig options on Configurations page

2016-08-22 Thread Greg Hogan (JIRA)
Greg Hogan created FLINK-4447:
-

 Summary: Include NettyConfig options on Configurations page
 Key: FLINK-4447
 URL: https://issues.apache.org/jira/browse/FLINK-4447
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.2.0
Reporter: Greg Hogan
Assignee: Greg Hogan
Priority: Trivial
 Fix For: 1.2.0


{{NettyConfig}} looks for the following configuration options, which are not 
listed in the Flink documentation.

{noformat}
public static final String NUM_ARENAS = "taskmanager.net.num-arenas";

public static final String NUM_THREADS_SERVER = 
"taskmanager.net.server.numThreads";

public static final String NUM_THREADS_CLIENT = 
"taskmanager.net.client.numThreads";

public static final String CONNECT_BACKLOG = 
"taskmanager.net.server.backlog";

public static final String CLIENT_CONNECT_TIMEOUT_SECONDS = 
"taskmanager.net.client.connectTimeoutSec";

public static final String SEND_RECEIVE_BUFFER_SIZE = 
"taskmanager.net.sendReceiveBufferSize";

public static final String TRANSPORT_TYPE = "taskmanager.net.transport";
{noformat}
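For the documentation page, these could be presented as a flink-conf.yaml fragment. The keys below are taken from {{NettyConfig}} above; the values are illustrative placeholders, not the actual defaults:

```yaml
# Netty options read by NettyConfig (values shown are placeholders)
taskmanager.net.num-arenas: 1
taskmanager.net.server.numThreads: 1
taskmanager.net.client.numThreads: 1
taskmanager.net.server.backlog: 0
taskmanager.net.client.connectTimeoutSec: 120
taskmanager.net.sendReceiveBufferSize: 0
taskmanager.net.transport: nio
```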



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] Python API for Flink libraries

2016-08-22 Thread Chesnay Schepler

Hello Greg,

While I generally agree with you, there have been cases reported where 
the Python API was actually faster due to the use of C libraries.


Regards,
Chesnay
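Chesnay's point generalizes beyond Flink: Python code that delegates work to a C-backed library can beat a pure-Python loop. A minimal, non-Flink illustration, using NumPy as a stand-in for such a C library:

```python
# Illustration (not Flink-specific): the same computation in an interpreted
# Python loop vs. delegated to NumPy, whose inner loop runs in C.
import numpy as np

def pure_python_sum(n):
    """Sum 0..n-1 with an interpreted Python loop."""
    total = 0
    for i in range(n):
        total += i
    return total

def numpy_sum(n):
    """Same sum, but the loop runs inside NumPy's compiled C code."""
    return int(np.arange(n, dtype=np.int64).sum())

# Both give the same result; for large n the NumPy version is typically much
# faster because the per-element work happens in compiled code.
print(pure_python_sum(1000))  # prints 499500
print(numpy_sum(1000))        # prints 499500
```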

On 22.08.2016 16:21, Greg Hogan wrote:

> Hi Ivan,
>
> My expectation would be that programs written for the Python API would be
> much slower than when implementing with Java or Scala. A performance
> comparison would be quite interesting. Gelly has both iterative and
> non-iterative algorithms.
>
> Greg
>
> On Sat, Aug 20, 2016 at 7:11 PM, Ivan Mushketyk wrote:
>
> > Hi Chesnay,
> >
> > Thank you for your reply.
> > Out of curiosity, do you know why the Python API reception was *tumbleweed*?
> >
> > Regarding the Python API, do you know what specifically should be done
> > there? I have some Python background so I was considering contributing,
> > but I didn't find many tasks in the "Python" component:
> > https://issues.apache.org/jira/browse/FLINK-1926?jql=project%20%3D%20FLINK%20AND%20component%20%3D%20%22Python%20API%22%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20due%20ASC%2C%20priority%20DESC%2C%20created%20ASC
> >
> > Best regards,
> > Ivan.
> >
> > On Fri, 19 Aug 2016 at 22:45 Chesnay Schepler wrote:
> >
> > > Hello,
> > >
> > > I would say no, as the general reception of the Python API was
> > > *tumbleweed* so far.
> > >
> > > In my opinion this would just lead to a massive increase in code to
> > > maintain; we would need at least 2-3 active long-term Python contributors.
> > > Especially so since ML, CEP and Table are AFAIK still in heavy development.
> > >
> > > If anything, before thinking about porting the libraries to Python it
> > > would make more sense to implement a Python streaming API.
> > > Or maybe /finish/ porting the DataSet API...
> > >
> > > Regards,
> > > Chesnay
> > >
> > > On 19.08.2016 22:07, Ivan Mushketyk wrote:
> > >
> > > > Hi Flink developers,
> > > >
> > > > It seems to me that Flink has two important "selling points":
> > > >
> > > > 1. It has Java, Scala and Python APIs
> > > > 2. It has a number of useful libraries (ML, Gelly, CEP, and Table)
> > > >
> > > > But as far as I understand, currently users cannot use any of these
> > > > libraries using a Python API. It seems to be a gap worth filling.
> > > >
> > > > What do you think about it? Does it make sense to add
> > > > CEP/Gelly/ML/Table Python APIs?
> > > >
> > > > Best regards,
> > > > Ivan.







[jira] [Created] (FLINK-4446) Move Flume Sink from Flink to Bahir

2016-08-22 Thread Robert Metzger (JIRA)
Robert Metzger created FLINK-4446:
-

 Summary: Move Flume Sink from Flink to Bahir
 Key: FLINK-4446
 URL: https://issues.apache.org/jira/browse/FLINK-4446
 Project: Flink
  Issue Type: Task
  Components: Streaming Connectors
Reporter: Robert Metzger
Assignee: Robert Metzger


As per [1], the Flink community decided to move the Flume connector from Flink 
to Bahir.
[1] 
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Move-Redis-and-Flume-connectors-to-Apache-Bahir-and-redirect-contributions-there-td13102.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] Python API for Flink libraries

2016-08-22 Thread Greg Hogan
Hi Ivan,

My expectation would be that programs written for the Python API would be
much slower than when implementing with Java or Scala. A performance
comparison would be quite interesting. Gelly has both iterative and
non-iterative algorithms.

Greg

On Sat, Aug 20, 2016 at 7:11 PM, Ivan Mushketyk 
wrote:

> Hi Chesnay,
>
> Thank you for your reply.
> Out of curiosity, do you know why the Python API reception was *tumbleweed*?
>
> Regarding the Python API, do you know what specifically should be done
> there? I have some Python background so I was considering contributing,
> but I didn't find many tasks in the "Python" component:
> https://issues.apache.org/jira/browse/FLINK-1926?jql=project%20%3D%20FLINK%20AND%20component%20%3D%20%22Python%20API%22%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20due%20ASC%2C%20priority%20DESC%2C%20created%20ASC
>
> Best regards,
> Ivan.
>
>
> On Fri, 19 Aug 2016 at 22:45 Chesnay Schepler  wrote:
>
> > Hello,
> >
> > I would say no, as the general reception of the Python API was
> > *tumbleweed* so far.
> >
> > In my opinion this would just lead to a massive increase in code to
> > maintain; we would need at least 2-3 active long-term Python contributors.
> > Especially so since ML, CEP and Table are AFAIK still in heavy development.
> >
> > If anything, before thinking about porting the libraries to Python it
> > would make more sense to implement a Python streaming API.
> > Or maybe /finish/ porting the DataSet API...
> >
> > Regards,
> > Chesnay
> >
> > On 19.08.2016 22:07, Ivan Mushketyk wrote:
> > > Hi Flink developers,
> > >
> > > It seems to me that Flink has two important "selling points":
> > >
> > > 1. It has Java, Scala and Python APIs
> > > 2. It has a number of useful libraries (ML, Gelly, CEP, and Table)
> > >
> > > But as far as I understand, currently users cannot use any of these
> > > libraries using a Python API. It seems to be a gap worth filling.
> > >
> > > What do you think about it? Does it make sense to add
> > > CEP/Gelly/ML/Table Python APIs?
> > >
> > > Best regards,
> > > Ivan.
> > >
> >
> >
>


Re: [DISCUSS] FLIP-3 - Organization of Documentation

2016-08-22 Thread Maximilian Michels
Very nice work, Ufuk!

On Fri, Aug 19, 2016 at 12:07 PM, Till Rohrmann  wrote:
> I second Aljoscha :-)
>
> On Fri, Aug 19, 2016 at 11:53 AM, Aljoscha Krettek 
> wrote:
>
>> I checked it out and I liked it. :-)
>>
>> On Thu, 18 Aug 2016 at 19:40 Ufuk Celebi  wrote:
>>
>> > Initial PR for the layout: https://github.com/apache/flink/pull/2387
>> >
>> > On Tue, Aug 2, 2016 at 5:18 PM, Aljoscha Krettek 
>> > wrote:
>> > > +1
>> > >
>> > > On Tue, 2 Aug 2016 at 03:15 Till Rohrmann 
>> wrote:
>> > >
>> > >> +1 :-)
>> > >>
>> > >> On Tue, Aug 2, 2016 at 6:09 PM, Stephan Ewen 
>> wrote:
>> > >>
>> > >> > +1, thanks :-)
>> > >> >
>> > >> > On Tue, Aug 2, 2016 at 11:39 AM, Ufuk Celebi 
>> wrote:
>> > >> >
>> > >> > > If there are no objections, I would like to work on this in the
>> next
>> > >> > > days. I would like to only do the restructuring and don't add any
>> > new
>> > >> > > content (e.g. we would have a few empty pages in the beginning).
>> > >> > >
>> > >> > > On Wed, Jul 20, 2016 at 9:57 PM, Stephan Ewen 
>> > >> wrote:
>> > >> > > > I added to the "Application Development" Docs the Section
>> "Types,
>> > >> > > > TypeInformation, Serialization".
>> > >> > > > I think that is an important enough aspect to warrant separate
>> > docs.
>> > >> > > >
>> > >> > > > On Mon, Jul 18, 2016 at 3:36 PM, Till Rohrmann <
>> > trohrm...@apache.org
>> > >> >
>> > >> > > wrote:
>> > >> > > >
>> > >> > > >> +1 for the FLIP and making streaming the common case. Very good
>> > >> > proposal
>> > >> > > >> :-)
>> > >> > > >>
>> > >> > > >> On Mon, Jul 18, 2016 at 11:48 AM, Aljoscha Krettek <
>> > >> > aljos...@apache.org
>> > >> > > >
>> > >> > > >> wrote:
>> > >> > > >>
>> > >> > > >> > +1 I like it a lot!
>> > >> > > >> >
>> > >> > > >> > On Fri, 15 Jul 2016 at 18:43 Stephan Ewen 
>> > >> wrote:
>> > >> > > >> >
>> > >> > > >> > > My take would be to take streaming as the common case and
>> > make
>> > >> > > special
>> > >> > > >> > > sections for batch.
>> > >> > > >> > >
>> > >> > > >> > > We can still have a few streaming-only sections (end to end
>> > >> > exactly
>> > >> > > >> once)
>> > >> > > >> > > and a few batch-only sections (optimizer).
>> > >> > > >> > >
>> > >> > > >> > > On Fri, Jul 15, 2016 at 6:03 PM, Ufuk Celebi <
>> u...@apache.org
>> > >
>> > >> > > wrote:
>> > >> > > >> > >
>> > >> > > >> > > > I very much like this proposal. This is long overdue. Our
>> > >> > > >> > > > documentation never "broke up" with the old batch focus.
>> > >> That's
>> > >> > > where
>> > >> > > >> > > > the current structure comes from and why people often
>> don't
>> > >> find
>> > >> > > what
>> > >> > > >> > > > they are looking for. We were trying to treat streaming
>> and
>> > >> > batch
>> > >> > > as
>> > >> > > >> > > > equals. We never were "brave" enough to move
>> streaming-only
>> > >> > > concepts
>> > >> > > >> > > > to the top-level. I really like that you are proposing
>> this
>> > >> now
>> > >> > > (for
>> > >> > > >> > > > example for Event time, State Backends etc.). I would
>> love
>> > to
>> > >> > have
>> > >> > > >> > > > this go hand in hand with the 1.2 release.
>> > >> > > >> > > >
>> > >> > > >> > > > What is your opinion about pages affecting both streaming
>> > and
>> > >> > > batch
>> > >> > > >> > > > like "Connectors" or "Failure model"? We could have the
>> > >> landing
>> > >> > > page
>> > >> > > >> > > > cover the general material (e.g. restart strategies) and
>> > then
>> > >> > have
>> > >> > > >> > > > sub-pages for streaming- and batch-specific stuff. Or we
>> > treat
>> > >> > > >> > > > streaming as the common case and have a sub-section for
>> > batch.
>> > >> > We
>> > >> > > >> > > > probably have to decide this case-by-case, but to me it
>> > feels
>> > >> > like
>> > >> > > >> > > > this was the main problem with the old documentation
>> > structure
>> > >> > > >> > > > (content is a different story of course ;)).
>> > >> > > >> > > >
>> > >> > > >> > > > On Fri, Jul 15, 2016 at 4:09 PM, Stephan Ewen <
>> > >> se...@apache.org
>> > >> > >
>> > >> > > >> > wrote:
>> > >> > > >> > > > > Hi all!
>> > >> > > >> > > > >
>> > >> > > >> > > > > I posted another FLIP - this time about a suggestion to
>> > make
>> > >> > the
>> > >> > > >> > > > > documentation more accessible.
>> > >> > > >> > > > >
>> > >> > > >> > > > > FLIP-3 - Organization of Documentation
>> > >> > > >> > > > >
>> > >> > > >> > > >
>> > >> > > >> > >
>> > >> > > >> >
>> > >> > > >>
>> > >> > >
>> > >> >
>> > >>
>> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-3+-+Organization+of+Documentation
>> > >> > > >> > > > >
>> > >> > > >> > > > > The issue of accessibility of information came up
>> > repeatedly
>> > >> > > from
>> > >> > > >> > > users I
>> > >> > > >> > > > > talked to, so this is a suggestion how to improve this.

Re: [DISCUSS] FLIP-10: Unify Savepoints and Checkpoints

2016-08-22 Thread Till Rohrmann
+1 for the FLIP.

I like the described changes and new functionality.

When looking at the public interface, I was wondering whether we shouldn't
allow the user to specify a TimeUnit for the periodic interval. I think
it's nicer to be able to specify the time unit instead of converting
everything to milliseconds.

Cheers,
Till
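To illustrate Till's suggestion, an overload accepting a TimeUnit might look like the following sketch (a hypothetical API, not Flink's actual signature):

```java
import java.util.concurrent.TimeUnit;

// Hypothetical configuration holder illustrating a TimeUnit-based overload
// instead of forcing callers to convert everything to milliseconds.
class SavepointConfig {
    private long periodicIntervalMillis;

    // Millisecond-only variant: the caller must convert manually.
    void setPeriodicInterval(long intervalMillis) {
        this.periodicIntervalMillis = intervalMillis;
    }

    // TimeUnit variant suggested in the thread: more readable call sites.
    void setPeriodicInterval(long interval, TimeUnit unit) {
        this.periodicIntervalMillis = unit.toMillis(interval);
    }

    long getPeriodicIntervalMillis() {
        return periodicIntervalMillis;
    }
}

public class SavepointConfigSketch {
    public static void main(String[] args) {
        SavepointConfig config = new SavepointConfig();
        // Reads as "every 5 minutes" rather than a bare 300000.
        config.setPeriodicInterval(5, TimeUnit.MINUTES);
        System.out.println(config.getPeriodicIntervalMillis()); // prints 300000
    }
}
```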

On Fri, Aug 19, 2016 at 6:21 PM, Aljoscha Krettek 
wrote:

> +1 A great proposal.
>
> I'd like to suggest one change. In one paragraph you say "If no savepoint
> path is configured, fail the savepoint.". I think this is
> problematic because it means that users who forgot to add the setting to
> the config will be at a dead end if they want to update. Can we make it work by
> specifying a file path with the command? A bit further below you mention
> adding an explicit file path to the savepoint command but it seems that one
> would only work if you also specified the config setting?
>
> Cheers,
> Aljoscha
>
> On Fri, 19 Aug 2016 at 18:07 Gyula Fóra  wrote:
>
> > Hi Ufuk,
> >
> > Great initiative! In general, I agree with pretty much everything in the
> > proposal.
> >
> > Some minor comments:
> > With these changes savepoints and checkpoints become pretty much the same
> > thing so we might as well drop one of the concepts and just refer to
> > checkpoints as savepoints that are discarded by the system.
> >
> > How do you plan to keep track of the available savepoints? It would be
> > great to provide a REST API or something to query them. Otherwise devs need
> > to do a lot of custom bookkeeping outside of the system. This might be
> > inevitable though...
> >
> > Cheers,
> > Gyula
> >
> > On Fri, Aug 19, 2016, 17:59 Ufuk Celebi  wrote:
> >
> > > Hey devs! I've created a FLIP to unify savepoints and checkpoints:
> > >
> > >
> > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-10%3A+Unify+Checkpoints+and+Savepoints#
> > >
> > > Looking forward to your feedback.
> > >
> > > – Ufuk
> > >
> >
>


Re: [FLINK-305] Code test coverage - how FLINK using it?

2016-08-22 Thread Till Rohrmann
Thanks a lot for your help with that, Pavel :-)

On Fri, Aug 19, 2016 at 9:48 PM, Pavel Fadeev 
wrote:

> Fabian, Till,
> thanks for your comments!
>
> I've raised https://issues.apache.org/jira/browse/INFRA-12458 for the
> same.
> Will play around to see if this solution is acceptable and does not affect
> build duration too much.
>
> 2016-08-19 12:51 GMT+03:00 Till Rohrmann :
>
> > Hi Pavel,
> >
> > I think it's a good point you're raising here. The Flink community isn't
> > using metrics like test coverage to ensure high quality code yet. I think
> > that is one thing which we can/should improve. Unfortunately, the ASF (or
> > at least the Apache Infra team) does not allow the use of codecov.io [1].
> > However, they encourage the use of coveralls.io.
> >
> > Do you know this tool and want to take a look at how it could be integrated
> > with the Flink repository? It seems as if it is free for open source
> > projects. Maybe you can create a JIRA issue for this integration and then
> > take the lead there.
> >
> > [1] https://issues.apache.org/jira/browse/INFRA-11273
> >
> > Cheers,
> > Till
> >
> > On Fri, Aug 19, 2016 at 11:33 AM, Fabian Hueske 
> wrote:
> >
> > > Hi Pavel,
> > >
> > > the Cobertura plugin was removed in this PR:
> > > https://github.com/apache/flink/pull/454
> > > Not sure if it was accidentally removed or on purpose.
> > > It was not included in the regular builds to reduce build time and, AFAIK,
> > > it wasn't used manually either (otherwise somebody would have noticed
> > > that it is gone).
> > >
> > > I am not aware of any code coverage stats for the Flink code base, but it
> > > would be nice to have some, IMO.
> > >
> > > Best, Fabian
> > >
> > > 2016-08-19 0:36 GMT+02:00 Pavel Fadeev :
> > >
> > > > Dear team,
> > > >
> > > > I'm just looking around the project - a complete novice at Flink :)
> > > > Sorry if the questions below already have answers!
> > > >
> > > > At first glance I've discovered that a code coverage feature was
> > > > introduced with FLINK-305 (https://issues.apache.org/jira/browse/FLINK-305)
> > > > and then removed for some reason in March 2015.
> > > >
> > > > Are you aware whether it is no longer required? I'm also a bit worried
> > > > about this after a local coverage run for the Flink code. Do we have some
> > > > integration like codecov here, and do you feel it is required?
> > > >
> > > > Also, do you know if there are some statistics (or team knowledge) for
> > > > regression bugs from uncovered code?
> > > >
> > >
> >
>
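A coveralls.io integration along the lines discussed might start from a POM fragment like the one below. This is a hypothetical sketch, not a tested configuration: the plugin coordinates are the ones the JaCoCo and coveralls-maven-plugin projects publish, but the versions shown are illustrative.

```xml
<!-- Hypothetical sketch: coverage via JaCoCo, reported to coveralls.io.
     Versions are illustrative; verify against the plugins' documentation. -->
<build>
  <plugins>
    <plugin>
      <groupId>org.jacoco</groupId>
      <artifactId>jacoco-maven-plugin</artifactId>
      <version>0.7.7.201606060606</version>
      <executions>
        <execution>
          <goals><goal>prepare-agent</goal></goals>
        </execution>
        <execution>
          <id>report</id>
          <phase>test</phase>
          <goals><goal>report</goal></goals>
        </execution>
      </executions>
    </plugin>
    <plugin>
      <groupId>org.eluder.coveralls</groupId>
      <artifactId>coveralls-maven-plugin</artifactId>
      <version>4.2.0</version>
    </plugin>
  </plugins>
</build>
```

The CI build would then run something like `mvn test jacoco:report coveralls:report`; build-time impact would need to be measured, as discussed above.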


[jira] [Created] (FLINK-4445) Ignore unmatched state when restoring from savepoint

2016-08-22 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-4445:
--

 Summary: Ignore unmatched state when restoring from savepoint
 Key: FLINK-4445
 URL: https://issues.apache.org/jira/browse/FLINK-4445
 Project: Flink
  Issue Type: Improvement
  Components: State Backends, Checkpointing
Affects Versions: 1.1.1
Reporter: Ufuk Celebi


Currently, when submitting a job with a savepoint, we require that all state is 
matched to the new job. Many users have noted that this is overly strict. I 
would like to loosen this and allow savepoints to be restored without matching 
all state.

The following options come to mind:

(1) Keep the current behaviour, but add a flag to allow ignoring state when 
restoring, e.g. {{bin/flink -s  --ignoreUnmatchedState}}. This would 
be non-API breaking.

(2) Ignore unmatched state and continue. Additionally add a flag to be strict 
about checking the state, e.g. {{bin/flink -s  --strict}}. This 
would be API-breaking as the default behaviour would change. Users might be 
confused by this because there is no straightforward way to notice that 
nothing has been restored.

I'm not sure what the best approach is here. [~gyfora], [~aljoscha] What do you 
think?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-4444) Add a DFSInputChannel and DFSSubPartition

2016-08-22 Thread shuai.xu (JIRA)
shuai.xu created FLINK-4444:
---

 Summary: Add a DFSInputChannel and DFSSubPartition
 Key: FLINK-4444
 URL: https://issues.apache.org/jira/browse/FLINK-4444
 Project: Flink
  Issue Type: Sub-task
  Components: Batch Connectors and Input/Output Formats
Reporter: shuai.xu


Add a new ResultPartitionType and ResultPartitionLocation type for DFS.
Add a DFSSubpartition and a DFSInputChannel for writing to and reading from DFS.
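A minimal sketch of the shapes the issue describes (the class names are from the issue; the surrounding interfaces are simplified stand-ins, as Flink's real ResultSubpartition/InputChannel abstractions are considerably richer):

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified stand-ins; Flink's actual interfaces differ substantially.
interface SubPartition {
    void add(ByteBuffer buffer);
    void finish();
}

interface InputChannel {
    ByteBuffer getNextBuffer();
}

// Producer side: would write buffers to a DFS file instead of keeping them
// in memory or serving them over the network. A list stands in for the file.
class DFSSubPartition implements SubPartition {
    private final List<ByteBuffer> written = new ArrayList<>();

    @Override
    public void add(ByteBuffer buffer) {
        // A real implementation would append to a DFS output stream here.
        written.add(buffer);
    }

    @Override
    public void finish() {
        // A real implementation would close the DFS file and publish its path.
    }

    List<ByteBuffer> buffers() { return written; }
}

// Consumer side: reads the partition data back from DFS.
class DFSInputChannel implements InputChannel {
    private final Iterator<ByteBuffer> source;

    DFSInputChannel(List<ByteBuffer> buffers) {
        // A real implementation would open a DFS input stream by path.
        this.source = buffers.iterator();
    }

    @Override
    public ByteBuffer getNextBuffer() {
        return source.hasNext() ? source.next() : null;
    }
}

public class DFSPartitionSketch {
    public static void main(String[] args) {
        DFSSubPartition producer = new DFSSubPartition();
        producer.add(ByteBuffer.wrap(new byte[]{1, 2, 3}));
        producer.finish();
        DFSInputChannel consumer = new DFSInputChannel(producer.buffers());
        System.out.println(consumer.getNextBuffer() != null); // prints true
    }
}
```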



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)