Re: [ANNOUNCE] Welcome Jark Wu and Kostas Kloudas as committers

2017-02-07 Thread Tzu-Li (Gordon) Tai
Great news, congratulations!

Thank you Jark and Kostas for all your contributions to the Flink community so
far,
and really looking forward to seeing it grow even more in the future with you
aboard :-D

Cheers,
Gordon

On February 8, 2017 at 11:35:08 AM, Jark Wu (wuchong...@alibaba-inc.com) wrote:

Thank you very much everyone! Hoping to help out the community as much as I 
can!  

- Jark  

> On Feb 8, 2017, at 11:04 AM, 艾广 wrote:  
>  
> Congratulations, Jark and Kostas!  
>  
> 2017-02-08 10:53 GMT+08:00 Shaoxuan Wang :  
>  
>> Congratulations, Jark and Kostas!  
>> Let's keep "flink" moving "forward" rapidly.  
>>  
>> -Shaoxuan  
>>  
>>  
>>  
>> On Wed, Feb 8, 2017 at 4:16 AM, Fabian Hueske  wrote:  
>>  
>>> Hi everybody,  
>>>  
>>> I'm very happy to announce that Jark Wu and Kostas Kloudas accepted the  
>>> invitation of the Flink PMC to become committers of the Apache Flink  
>>> project.  
>>>  
>>> Jark and Kostas are longtime members of the Flink community.  
>>> Both are actively driving Flink's development and contributing to its  
>>> community in many ways.  
>>>  
>>> Please join me in welcoming Kostas and Jark as committers.  
>>>  
>>> Fabian  
>>>  
>>  



Re: [ANNOUNCE] Welcome Jark Wu and Kostas Kloudas as committers

2017-02-07 Thread Jark Wu
Thank you very much everyone! Hoping to help out the community as much as I can!

- Jark 

> On Feb 8, 2017, at 11:04 AM, 艾广 wrote:
> 
> Congratulations, Jark and Kostas!
> 
> 2017-02-08 10:53 GMT+08:00 Shaoxuan Wang :
> 
>> Congratulations, Jark and Kostas!
>> Let's keep "flink" moving "forward" rapidly.
>> 
>> -Shaoxuan
>> 
>> 
>> 
>> On Wed, Feb 8, 2017 at 4:16 AM, Fabian Hueske  wrote:
>> 
>>> Hi everybody,
>>> 
>>> I'm very happy to announce that Jark Wu and Kostas Kloudas accepted the
>>> invitation of the Flink PMC to become committers of the Apache Flink
>>> project.
>>> 
>>> Jark and Kostas are longtime members of the Flink community.
>>> Both are actively driving Flink's development and contributing to its
>>> community in many ways.
>>> 
>>> Please join me in welcoming Kostas and Jark as committers.
>>> 
>>> Fabian
>>> 
>> 



Re: [ANNOUNCE] Welcome Jark Wu and Kostas Kloudas as committers

2017-02-07 Thread 艾广
Congratulations, Jark and Kostas!

2017-02-08 10:53 GMT+08:00 Shaoxuan Wang :

> Congratulations, Jark and Kostas!
> Let's keep "flink" moving "forward" rapidly.
>
> -Shaoxuan
>
>
>
> On Wed, Feb 8, 2017 at 4:16 AM, Fabian Hueske  wrote:
>
> > Hi everybody,
> >
> > I'm very happy to announce that Jark Wu and Kostas Kloudas accepted the
> > invitation of the Flink PMC to become committers of the Apache Flink
> > project.
> >
> > Jark and Kostas are longtime members of the Flink community.
> > Both are actively driving Flink's development and contributing to its
> > community in many ways.
> >
> > Please join me in welcoming Kostas and Jark as committers.
> >
> > Fabian
> >
>


Re: [ANNOUNCE] Welcome Jark Wu and Kostas Kloudas as committers

2017-02-07 Thread Shaoxuan Wang
Congratulations, Jark and Kostas!
Let's keep "flink" moving "forward" rapidly.

-Shaoxuan



On Wed, Feb 8, 2017 at 4:16 AM, Fabian Hueske  wrote:

> Hi everybody,
>
> I'm very happy to announce that Jark Wu and Kostas Kloudas accepted the
> invitation of the Flink PMC to become committers of the Apache Flink
> project.
>
> Jark and Kostas are longtime members of the Flink community.
> Both are actively driving Flink's development and contributing to its
> community in many ways.
>
> Please join me in welcoming Kostas and Jark as committers.
>
> Fabian
>


[jira] [Created] (FLINK-5738) Destroy created backend when task is canceled

2017-02-07 Thread Xiaogang Shi (JIRA)
Xiaogang Shi created FLINK-5738:
---

 Summary: Destroy created backend when task is canceled
 Key: FLINK-5738
 URL: https://issues.apache.org/jira/browse/FLINK-5738
 Project: Flink
  Issue Type: Bug
  Components: State Backends, Checkpointing
Affects Versions: 1.2.0
Reporter: Xiaogang Shi


When a task is canceled, the {{ClosableRegistry}} is closed in the cancel 
thread. However, the task thread may still be creating the {{KeyedStateBackend}}, 
and registering the backend with the {{ClosableRegistry}} will then fail. Because 
the backend has not been assigned to the operator yet (due to the exception), the 
backend is never destroyed when the task thread exits.

A simple solution is to catch the exception during registration and destroy the 
created backend in case of failure.
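
A minimal, self-contained sketch of this catch-and-dispose pattern (illustrative 
names only, not the actual Flink classes or signatures):

{code}
import java.io.Closeable;
import java.io.IOException;

// Illustrative sketch of the proposed fix: if registration fails because the
// registry was already closed by the cancel thread, dispose of the backend
// immediately, since nobody else will.
public class BackendCreationSketch {

    interface RegistryLike {
        // throws IOException if the registry was closed by the cancel thread
        void register(Closeable resource) throws IOException;
    }

    static Closeable createBackendSafely(RegistryLike cancelRegistry) throws IOException {
        Closeable backend = () -> { /* release native / off-heap resources */ };
        try {
            cancelRegistry.register(backend);
        } catch (IOException e) {
            // the backend was never assigned to the operator, so clean it up here
            backend.close();
            throw e;
        }
        return backend;
    }
}
{code}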



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [ANNOUNCE] Welcome Jark Wu and Kostas Kloudas as committers

2017-02-07 Thread Chen Qin
Congrats! 

> On Feb 7, 2017, at 17:52, Zhuoluo Yang  wrote:
> 
> Congrats! Good job guys!
> 
> Thanks,
> 
> Zhuoluo 
> 
> 
> 
> 
> 
>> On Feb 8, 2017, at 4:59 AM, Greg Hogan wrote:
>> 
>> Welcome Jark and Kostas! Thank you for your contributions and many more to
>> come.
>> 
>>> On Tue, Feb 7, 2017 at 3:16 PM, Fabian Hueske  wrote:
>>> 
>>> Hi everybody,
>>> 
>>> I'm very happy to announce that Jark Wu and Kostas Kloudas accepted the
>>> invitation of the Flink PMC to become committers of the Apache Flink
>>> project.
>>> 
>>> Jark and Kostas are longtime members of the Flink community.
>>> Both are actively driving Flink's development and contributing to its
>>> community in many ways.
>>> 
>>> Please join me in welcoming Kostas and Jark as committers.
>>> 
>>> Fabian
>>> 
> 


Re: [ANNOUNCE] Welcome Jark Wu and Kostas Kloudas as committers

2017-02-07 Thread Longda Feng

@Jark & @Kostas, congratulations, good job. More and more people are joining 
the community.

Cheers,
Longda

------------------------------------------------------------------
From: Greg Hogan
Send Time: Feb 8, 2017 (Wednesday) 04:59
To: dev
Subject: Re: [ANNOUNCE] Welcome Jark Wu and Kostas Kloudas as committers
Welcome Jark and Kostas! Thank you for your contributions and many more to
come.

On Tue, Feb 7, 2017 at 3:16 PM, Fabian Hueske  wrote:

> Hi everybody,
>
> I'm very happy to announce that Jark Wu and Kostas Kloudas accepted the
> invitation of the Flink PMC to become committers of the Apache Flink
> project.
>
> Jark and Kostas are longtime members of the Flink community.
> Both are actively driving Flink's development and contributing to its
> community in many ways.
>
> Please join me in welcoming Kostas and Jark as committers.
>
> Fabian
>



Re: [ANNOUNCE] Welcome Jark Wu and Kostas Kloudas as committers

2017-02-07 Thread Jinkui Shi
Congratulations, Jark Wu and Kostas Kloudas.

> On Feb 8, 2017, at 09:52, Zhuoluo Yang  wrote:
> 
> Congrats! Good job guys!
> 
> Thanks,
> 
> Zhuoluo 
> 
> 
> 
> 
> 
>> On Feb 8, 2017, at 4:59 AM, Greg Hogan wrote:
>> 
>> Welcome Jark and Kostas! Thank you for your contributions and many more to
>> come.
>> 
>> On Tue, Feb 7, 2017 at 3:16 PM, Fabian Hueske wrote:
>> 
>>> Hi everybody,
>>> 
>>> I'm very happy to announce that Jark Wu and Kostas Kloudas accepted the
>>> invitation of the Flink PMC to become committers of the Apache Flink
>>> project.
>>> 
>>> Jark and Kostas are longtime members of the Flink community.
>>> Both are actively driving Flink's development and contributing to its
>>> community in many ways.
>>> 
>>> Please join me in welcoming Kostas and Jark as committers.
>>> 
>>> Fabian
>>> 
> 
> 


Re: [ANNOUNCE] Welcome Jark Wu and Kostas Kloudas as committers

2017-02-07 Thread Zhuoluo Yang
Congrats! Good job guys!

Thanks,

Zhuoluo 





> On Feb 8, 2017, at 4:59 AM, Greg Hogan wrote:
> 
> Welcome Jark and Kostas! Thank you for your contributions and many more to
> come.
> 
> On Tue, Feb 7, 2017 at 3:16 PM, Fabian Hueske  wrote:
> 
>> Hi everybody,
>> 
>> I'm very happy to announce that Jark Wu and Kostas Kloudas accepted the
>> invitation of the Flink PMC to become committers of the Apache Flink
>> project.
>> 
>> Jark and Kostas are longtime members of the Flink community.
>> Both are actively driving Flink's development and contributing to its
>> community in many ways.
>> 
>> Please join me in welcoming Kostas and Jark as committers.
>> 
>> Fabian
>> 





FLINK-5734 : Code Generation for NormalizedKeySorter

2017-02-07 Thread Pattarawat Chormai
Hi all,

We're working on FLINK-5734: Implement code generation for 
NormalizedKeySorter. We have just finished a design document that describes how 
we're going to proceed with this implementation.

Because this implementation will touch several important parts of Flink, we would 
appreciate it if you could give us some feedback before starting the implementation.
Here is the link to the document: 
https://docs.google.com/document/d/1anGQhBn9qI0yqe7twVvrDIiym4U4gxalJkZzM4Ar4QM/edit?usp=sharing

Also, we would like to confirm that NormalizedKeySorter is always instantiated 
on the TaskManager, but we have not found a way to verify that yet. If you 
know how to do it, we would be very pleased to hear it.

Thank you in advance for your feedback.

Best regards, 
Serkan, Gábor and Pat

Re: [ANNOUNCE] Welcome Jark Wu and Kostas Kloudas as committers

2017-02-07 Thread Matthias J. Sax

Congrats!

On 2/7/17 12:55 PM, Haohui Mai wrote:
> Congratulations!
> 
> On Tue, Feb 7, 2017 at 12:17 PM Fabian Hueske 
> wrote:
> 
>> Hi everybody,
>> 
>> I'm very happy to announce that Jark Wu and Kostas Kloudas
>> accepted the invitation of the Flink PMC to become committers of
>> the Apache Flink project.
>> 
>> Jark and Kostas are longtime members of the Flink community. Both
>> are actively driving Flink's development and contributing to its 
>> community in many ways.
>> 
>> Please join me in welcoming Kostas and Jark as committers.
>> 
>> Fabian
>> 
> 


Re: [ANNOUNCE] Welcome Jark Wu and Kostas Kloudas as committers

2017-02-07 Thread Greg Hogan
Welcome Jark and Kostas! Thank you for your contributions and many more to
come.

On Tue, Feb 7, 2017 at 3:16 PM, Fabian Hueske  wrote:

> Hi everybody,
>
> I'm very happy to announce that Jark Wu and Kostas Kloudas accepted the
> invitation of the Flink PMC to become committers of the Apache Flink
> project.
>
> Jark and Kostas are longtime members of the Flink community.
> Both are actively driving Flink's development and contributing to its
> community in many ways.
>
> Please join me in welcoming Kostas and Jark as committers.
>
> Fabian
>


Re: [ANNOUNCE] Welcome Jark Wu and Kostas Kloudas as committers

2017-02-07 Thread Haohui Mai
Congratulations!

On Tue, Feb 7, 2017 at 12:17 PM Fabian Hueske  wrote:

> Hi everybody,
>
> I'm very happy to announce that Jark Wu and Kostas Kloudas accepted the
> invitation of the Flink PMC to become committers of the Apache Flink
> project.
>
> Jark and Kostas are longtime members of the Flink community.
> Both are actively driving Flink's development and contributing to its
> community in many ways.
>
> Please join me in welcoming Kostas and Jark as committers.
>
> Fabian
>


[ANNOUNCE] Welcome Jark Wu and Kostas Kloudas as committers

2017-02-07 Thread Fabian Hueske
Hi everybody,

I'm very happy to announce that Jark Wu and Kostas Kloudas accepted the
invitation of the Flink PMC to become committers of the Apache Flink
project.

Jark and Kostas are longtime members of the Flink community.
Both are actively driving Flink's development and contributing to its
community in many ways.

Please join me in welcoming Kostas and Jark as committers.

Fabian


RE: Stream SQL and Dynamic tables

2017-02-07 Thread Radu Tudoran
Hi,

I made some comments on the Dynamic Tables document. I'm not sure how to request 
feedback on them, hence this email.

Please let me know what you think.

https://docs.google.com/document/d/1qVVt_16kdaZQ8RTfA_f4konQPW4tnl8THw6rzGUdaqU/edit#heading=h.3eo2vkvydld6

Dr. Radu Tudoran
Senior Research Engineer - Big Data Expert
IT R&D Division


HUAWEI TECHNOLOGIES Duesseldorf GmbH
European Research Center
Riesstrasse 25, 80992 München

E-mail: radu.tudo...@huawei.com
Mobile: +49 15209084330
Telephone: +49 891588344173


-Original Message-
From: Fabian Hueske [mailto:fhue...@gmail.com] 
Sent: Monday, January 30, 2017 9:07 PM
To: dev@flink.apache.org
Subject: Re: Stream SQL and Dynamic tables

Hi Radu,

yes, the clean-up timeout would need to be defined somewhere.
I would actually prefer to do that within the query, because the clean-up 
timeout affects the result and hence the semantics of the computed result.
This could, for instance, look like this:

SELECT a, sum(b)
FROM myTable
WHERE rowtime BETWEEN now() - INTERVAL '1' DAY AND now() GROUP BY a;

In this query now() would always refer to the current time, i.e., the current 
wall-clock time for processing time or the current watermark time for event 
time.
The result of the query would be the grouped aggregate of the data received in 
the last day.
We can add syntactic sugar with built-in functions as for example:
last(rowtime, INTERVAL '1' DAY).

In addition we can also add a configuration parameter to the TableEnvironment 
to control the clean-up timeout.

Cheers,
Fabian

2017-01-30 18:14 GMT+01:00 Radu Tudoran :

> Hi Fabian,
>
> Thanks for the clarifications. I have a follow up question: you say 
> that operations are expected to be bounded in space and time (e.g., 
> the optimizer will do a cleanup after a certain timeout period) - can 
> I assume this implies that we will have, at the level of the 
> system, a couple of parameters that hold these thresholds and can 
> potentially be set up?
>
> For example having in the environment variable
>
> env.setCleanupTimeout(100, TimeUnit.MINUTES);
>
> ...or alternatively perhaps directly at the level of the table (either 
> table environment or the table itself)
>
> TableEnvironment tbEnv = ...
> tbEnv.setCleanupTimeOut(100, TimeUnit.MINUTES)
> Table tb = ...
> tb.setCleanupTimeOut(100, TimeUnit.MINUTES)
>
>
>
> -Original Message-
> From: Fabian Hueske [mailto:fhue...@gmail.com]
> Sent: Friday, January 27, 2017 9:41 PM
> To: dev@flink.apache.org
> Subject: Re: Stream SQL and Dynamic tables
>
> Hi Radu,
>
> the idea is to only support operations that are bounded in space and 
> compute time:
>
> - space: the size of state may not infinitely grow over time or with 
> growing key domains. For these cases, the optimizer will enforce a 
> cleanup timeout and all data which has passed that timeout will be discarded.
> Operations which cannot be bounded in space will be rejected.
>
> - compute time: certain queries cannot be efficiently executed because 
> newly arriving data (late data or just newly appended rows) might 
> trigger recomputation of large parts of the current state. Operations 
> that will result in such a computation pattern will be rejected. One 
> example would be event-time OVER ROWS windows as we discussed in the other 
> thread.
>
> So the plan is that the optimizer takes care of limiting the space 
> requirements and computation effort.
> However, you are of course right. Retraction and long-running windows 
> can result in significant amounts of operator state.
> I don't think this is a special requirement for the Table API (there 
> are users of the DataStream API with jobs that manage TBs of state). 
> Persisting state to disk with RocksDB and scaling out to more nodes 
> should address the scaling problem initially. In the long run, the 
> Flink community will work to improve the handling of large state with 
> features such as incremental checkpoints and new state backends.
>
> Looking forward to your comments.
>
> Best,
> Fabian
>
> 2017-01-27 11:01 GMT+01:00 Radu Tudoran :
>
> > Hi,
> >
> > Thanks for the clarification Fabian - 

[jira] [Created] (FLINK-5736) Fix Javadocs / Scaladocs build (Feb 2017)

2017-02-07 Thread Robert Metzger (JIRA)
Robert Metzger created FLINK-5736:
-

 Summary: Fix Javadocs / Scaladocs build (Feb 2017)
 Key: FLINK-5736
 URL: https://issues.apache.org/jira/browse/FLINK-5736
 Project: Flink
  Issue Type: Task
  Components: Build System, Documentation
Reporter: Robert Metzger


It looks like the scaladocs are not building properly.

Expected URL: 
https://ci.apache.org/projects/flink/flink-docs-release-1.2/api/java/

Buildbot output: 
https://ci.apache.org/builders/flink-docs-release-1.2/builds/15/steps/Java%20%26%20Scala%20docs/logs/stdio

Command to reproduce the issue on master: mvn clean javadoc:aggregate 
-Paggregate-scaladoc -DadditionalJOption="-Xdoclint:none" 
-Dheader="<a href=\"http://flink.apache.org/\" target=\"_top\">Back to Flink Website</a>"



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-5735) Non-overlapping sliding window is not deterministic

2017-02-07 Thread Timo Walther (JIRA)
Timo Walther created FLINK-5735:
---

 Summary: Non-overlapping sliding window is not deterministic
 Key: FLINK-5735
 URL: https://issues.apache.org/jira/browse/FLINK-5735
 Project: Flink
  Issue Type: Bug
  Components: Table API & SQL
Reporter: Timo Walther


I don't know if this is a problem of the Table API or the underlying API. We 
have to investigate this as part of the issue.

The following code leads to different results from time to time. Sometimes the 
count of "Hello" is 1, sometimes 2.

{code}
  val data = List(
(1L, 1, "Hi"),
(2L, 2, "Hallo"),
(3L, 2, "Hello"),
(6L, 3, "Hello"),
(4L, 5, "Hello"),
(16L, 4, "Hello world"),
(8L, 3, "Hello world"))

  @Test
  def testEventTimeSlidingWindowNonOverlapping(): Unit = {
val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
val tEnv = TableEnvironment.getTableEnvironment(env)
StreamITCase.testResults = mutable.MutableList()

val stream = env
  .fromCollection(data)
  .assignTimestampsAndWatermarks(new TimestampWithEqualWatermark())
val table = stream.toTable(tEnv, 'long, 'int, 'string)

val windowedTable = table
  .window(Slide over 5.milli every 10.milli on 'rowtime as 'w)
  .groupBy('w, 'string)
  .select('string, 'int.count, 'w.start, 'w.end)

val results = windowedTable.toDataStream[Row]
results.addSink(new StreamITCase.StringSink)
env.execute()

val expected = Seq(
  "Hallo,1,1970-01-01 00:00:00.0,1970-01-01 00:00:00.005",
  "Hello,2,1970-01-01 00:00:00.0,1970-01-01 00:00:00.005",
  "Hi,1,1970-01-01 00:00:00.0,1970-01-01 00:00:00.005")
assertEquals(expected.sorted, StreamITCase.testResults.sorted)
  }

  class TimestampWithEqualWatermark extends 
AssignerWithPunctuatedWatermarks[(Long, Int, String)] {

override def checkAndGetNextWatermark(
lastElement: (Long, Int, String),
extractedTimestamp: Long)
  : Watermark = {
  new Watermark(extractedTimestamp)
}

override def extractTimestamp(
element: (Long, Int, String),
previousElementTimestamp: Long): Long = {
  element._1
}
  }
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: New Flink team member - Kate Eri.

2017-02-07 Thread Felix Neutatz
Hi Katherin,

we are also working in a similar direction. We implemented a prototype to
integrate with SystemML:
https://github.com/apache/incubator-systemml/pull/119
SystemML provides many different matrix formats, operations, GPU support
and a couple of DL algorithms. Unfortunately, we realized that the lack of
a caching operator and a broadcast issue highly affects the performance
(e.g. compared to Spark). At the moment I am trying to tackle the broadcast
issue. But caching is still a problem for us.

Best regards,
Felix

2017-02-07 16:22 GMT+01:00 Katherin Eri :

> Thank you, Till.
>
> 1)  Regarding ND4J, I didn't know about such an unfortunate and critical
> restriction of it -> the lack of sparsity optimizations, and you are right:
> this issue is still open for them. I saw that Flink uses Breeze, but I
> thought its usage was due to historical reasons.
>
> 2)  Regarding integration with DL4J, I have read the source code of the
> DL4J/Spark integration; that's why I have dropped my idea of reusing
> their word2vec implementation for now, for example. I can perform a deeper
> investigation of this topic, if it is required.
>
>
>
> So I feel that we have the following picture:
>
> 1)  DL integration investigation could be part of Apache Bahir. I can
> perform further investigation of this topic, but I think we need a
> separate ticket to track this activity.
>
> 2)  GPU support, required for DL, is interesting, but requires something
> like ND4J.
>
> 3)  ND4J couldn't be incorporated because it doesn't support sparsity [1].
>
> Regarding ND4J: is this the single blocker for incorporating it, or are
> there other known ones?
>
>
> [1] https://deeplearning4j.org/roadmap.html
>
> On Tue, Feb 7, 2017 at 16:26, Till Rohrmann wrote:
>
> Thanks for initiating this discussion Katherin. I think you're right that
> in general it does not make sense to reinvent the wheel over and over
> again. Especially if you only have limited resources at hand. So if we
> could integrate Flink with some existing library that would be great.
>
> In the past, however, we couldn't find a good library which provided enough
> freedom to integrate it with Flink. Especially if you want to have
> distributed and somewhat high-performance implementations of ML algorithms
> you would have to take Flink's execution model (capabilities as well as
> limitations) into account. That is mainly the reason why we started
> implementing some of the algorithms "natively" on Flink.
>
> If I remember correctly, the problem with ND4J was and still is that
> it does not support sparse matrices which was a requirement from our side.
> As far as I know, it is quite common that you have sparse data structures
> when dealing with large scale problems. That's why we built our own
> abstraction which can have different implementations. Currently, the
> default implementation uses Breeze.
>
> I think the support for GPU based operations and the actual resource
> management are two orthogonal things. The implementation would have to work
> with no GPUs available anyway. If the system detects that GPUs are
> available, then ideally it would exploit them. Thus, we could add this
> feature later and maybe integrate it with FLINK-5131 [1].
>
> Concerning the integration with DL4J I think that Theo's proposal to do it
> in a separate repository (maybe as part of Apache Bahir) is a good idea.
> We're currently thinking about outsourcing some of Flink's libraries into
> sub projects. This could also be an option for the DL4J integration then.
> In general I think it should be feasible to run DL4J on Flink given that it
> also runs on Spark. Have you already taken a closer look at it?
>
> [1] https://issues.apache.org/jira/browse/FLINK-5131
>
> Cheers,
> Till
>
> On Tue, Feb 7, 2017 at 11:47 AM, Katherin Eri 
> wrote:
>
> > Thank you Theodore, for your reply.
> >
> > 1)Regarding GPU, your point is clear and I agree with it; ND4J looks
> > appropriate. But my current understanding is that we also need to cover
> > some resource management questions -> when we need to provide GPU support,
> > we also need to manage it as a resource. For example, Mesos already
> > supports GPU as a resource item (Initial support for GPU resources).
> > Flink uses Mesos as a cluster manager, and this means that this feature
> > of Mesos could be reused. Also, memory management questions in Flink
> > regarding GPU should be clarified.
> >
> > 2)Regarding integration with DL4J: what stops us from creating a ticket
> > and starting the discussion around this topic? Do we need some user story,
> > or is the community not sure that DL is really helpful? Why did the
> > discussion with Adam Gibson end with no implementation of any idea?
> > What concerns do we have?
> >
> > пн, 6 февр. 

Re: New Flink team member - Kate Eri.

2017-02-07 Thread Katherin Eri
Thank you, Till.

1)  Regarding ND4J, I didn't know about such an unfortunate and critical
restriction of it -> the lack of sparsity optimizations, and you are right:
this issue is still open for them. I saw that Flink uses Breeze, but I
thought its usage was due to historical reasons.

2)  Regarding integration with DL4J, I have read the source code of the
DL4J/Spark integration; that's why I have dropped my idea of reusing
their word2vec implementation for now, for example. I can perform a deeper
investigation of this topic, if it is required.



So I feel that we have the following picture:

1)  DL integration investigation could be part of Apache Bahir. I can
perform further investigation of this topic, but I think we need a
separate ticket to track this activity.

2)  GPU support, required for DL, is interesting, but requires something
like ND4J.

3)  ND4J couldn't be incorporated because it doesn't support sparsity [1].

Regarding ND4J: is this the single blocker for incorporating it, or are
there other known ones?


[1] https://deeplearning4j.org/roadmap.html

On Tue, Feb 7, 2017 at 16:26, Till Rohrmann wrote:

Thanks for initiating this discussion Katherin. I think you're right that
in general it does not make sense to reinvent the wheel over and over
again. Especially if you only have limited resources at hand. So if we
could integrate Flink with some existing library that would be great.

In the past, however, we couldn't find a good library which provided enough
freedom to integrate it with Flink. Especially if you want to have
distributed and somewhat high-performance implementations of ML algorithms
you would have to take Flink's execution model (capabilities as well as
limitations) into account. That is mainly the reason why we started
implementing some of the algorithms "natively" on Flink.

If I remember correctly, the problem with ND4J was and still is that
it does not support sparse matrices which was a requirement from our side.
As far as I know, it is quite common that you have sparse data structures
when dealing with large scale problems. That's why we built our own
abstraction which can have different implementations. Currently, the
default implementation uses Breeze.

I think the support for GPU based operations and the actual resource
management are two orthogonal things. The implementation would have to work
with no GPUs available anyway. If the system detects that GPUs are
available, then ideally it would exploit them. Thus, we could add this
feature later and maybe integrate it with FLINK-5131 [1].

Concerning the integration with DL4J I think that Theo's proposal to do it
in a separate repository (maybe as part of Apache Bahir) is a good idea.
We're currently thinking about outsourcing some of Flink's libraries into
sub projects. This could also be an option for the DL4J integration then.
In general I think it should be feasible to run DL4J on Flink given that it
also runs on Spark. Have you already taken a closer look at it?

[1] https://issues.apache.org/jira/browse/FLINK-5131

Cheers,
Till

On Tue, Feb 7, 2017 at 11:47 AM, Katherin Eri 
wrote:

> Thank you Theodore, for your reply.
>
> 1)Regarding GPU, your point is clear and I agree with it; ND4J looks
> appropriate. But my current understanding is that we also need to cover
> some resource management questions -> when we need to provide GPU support,
> we also need to manage it as a resource. For example, Mesos already
> supports GPU as a resource item (Initial support for GPU resources).
> Flink uses Mesos as a cluster manager, and this means that this feature
> of Mesos could be reused. Also, memory management questions in Flink
> regarding GPU should be clarified.
>
> 2)Regarding integration with DL4J: what stops us from creating a ticket
> and starting the discussion around this topic? Do we need some user story,
> or is the community not sure that DL is really helpful? Why did the
> discussion with Adam Gibson end with no implementation of any idea?
> What concerns do we have?
>
> On Mon, Feb 6, 2017 at 15:01, Theodore Vasiloudis <
> theodoros.vasilou...@gmail.com> wrote:
>
> > Hello all,
> >
> > This is a point that has come up in the past: Given the multitude of ML
> > libraries out there, should we have native implementations in FlinkML or
> > try to integrate other libraries instead?
> >
> > We haven't managed to reach a consensus on this before. My opinion is
> that
> > there is definitely value in having ML algorithms written natively in
> > Flink, both for performance optimization,
> > but more importantly for engineering simplicity, we don't want to force
> > users to use yet another piece of software to run their ML algos (at
> least
> > for a basic set of algorithms).
> >
> > We have in the past  discussed integrations with DL4J (particularly

ecosystem proposal, Temporal graph analytics on Flink with Gelly

2017-02-07 Thread Wouter Ligtenberg
Hello,

For my master thesis I have created Tink, short for temporal Flink.
It's a library that's supposed to work on top of Flink, together with
Gelly, to do temporal graph analytics.
I've worked on this project for the past 6 months as part of my master
thesis at the University of Eindhoven.
I read on [1] that I could propose my project to have it listed on the
Flink project website under the ecosystem page.
Could my project also be listed on the web page?

Please note that it's not completely done yet, at some point I had to start
writing my thesis, which is done now.

[1] https://flink.apache.org/ecosystem.html


Best regards,
Wouter Ligtenberg
Game designer at Onzichtbaar Developers


Re: ecosystem proposal, Temporal graph analytics on Flink with Gelly

2017-02-07 Thread Wouter Ligtenberg
I forgot to attach the github page of the project:

https://github.com/otherwise777/Temporal_Graph_library

Best regards,
Wouter Ligtenberg
Game designer at Onzichtbaar Developers


On Tue, Feb 7, 2017 at 3:41 PM, Wouter Ligtenberg 
wrote:

> Hello,
>
> For my master thesis I have created Tink, short for temporal Flink.
> It's a library that's supposed to work on top of Flink, together with
> Gelly, to do temporal graph analytics.
> I've worked on this project for the past 6 months as part of my master
> thesis at the University of Eindhoven.
> I read on [1] that I could propose my project to have it listed on the
> Flink project website under the ecosystem page.
> Could my project also be listed on the web page?
>
> Please note that it's not completely done yet, at some point I had to
> start writing my thesis, which is done now.
>
> [1] https://flink.apache.org/ecosystem.html
>
>
> Best regards,
> Wouter Ligtenberg
> Game designer at Onzichtbaar Developers
>
>


[jira] [Created] (FLINK-5734) Implement code generation for NormalizedKeySorter

2017-02-07 Thread Pattarawat Chormai (JIRA)
Pattarawat Chormai created FLINK-5734:
-

 Summary: Implement code generation for NormalizedKeySorter
 Key: FLINK-5734
 URL: https://issues.apache.org/jira/browse/FLINK-5734
 Project: Flink
  Issue Type: Sub-task
  Components: Local Runtime
Reporter: Pattarawat Chormai
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [Discuss] Organizing Documentation for Configuration Options

2017-02-07 Thread Greg Hogan
+1 and thanks for volunteering for an initial implementation.

Looking forward to auto-scaling of network buffers.

On Tue, Feb 7, 2017 at 3:04 AM, Ufuk Celebi  wrote:

> I fully agree with you Greg.
>
> Since this is doomed to get out of sync again very shortly after clean up,
> I vote to automate this. Stephan introduced the ConfigOption type, which
> makes it easy to define the options. It's already planned to migrate all
> configuration options from ConfigConstants to this approach.
>
> For an example see here: https://github.com/apache/flink/blob/master/
> flink-core/src/main/java/org/apache/flink/configuration/
> HighAvailabilityOptions.java
>
> I think that it is possible to build the configuration docs page from this
> with reasonable effort.
>
> This would translate the task to:
> 1) Automate ConfigOption to HTML/Markdown generation
> 2) Extend ConfigOption with description fields
> 3) Migrate ConfigConstants to ConfigOptions
>
> I would also volunteer to take a first stab at this.
>
> Regarding the network buffers: +1 to your suggestion. Nico (cc'd) is
> starting to work on automating the network buffer configuration in order to
> get rid of any manual tuning for most users (because of the issues you
> described + streaming and batch jobs require different tuning, which
> complicates things even more).
>
> – Ufuk
>
> On 6 February 2017 at 19:21:28, Greg Hogan (c...@greghogan.com) wrote:
> > > Hi devs,
> >
> > Flink's Configuration page [1] has grown intimidatingly long
> > and complex.
> > Options are described across three main sections: common options
> > (single
> > section), advanced options (multiple sections), and full reference.
> > The
> > trailing "background" section further describes the most impactful
> > options
> > in much greater detail.
> >
> > Several recent tickets, and a few outstanding, have added missing
> > options
> > to the configuration documentation. I'd like to propose a goal
> > of
> > organizing all options in the full reference into alphabetized,
> > tabular
> > form (one table per section), much like the system metrics [2].
> > Columns
> > would be option name, description, and default value.
> >
> > The common and advanced sections could also be converted to tabular
> > form
> > with the exception of Kerberos-based Security. Missing options
> > would be
> > added to the full reference.
> >
> > Lastly, the simple heuristic for configuring network buffers
> > has prompted
> > many questions on the mailing list. With the 1.3 release the total
> > and
> > number of available buffers is reported through metrics and
> > in the web
> > dashboard. My experience has been that the number of required
> > buffers is
> > highly dependent on job topology and cluster performance. I
> > propose keeping
> > the simple heuristic and description while directing users
> > to monitor the
> > balance of available buffers.
> >
> > Greg
> >
> > [1] https://ci.apache.org/projects/flink/flink-docs-
> master/setup/config.html
> > [2]
> > https://ci.apache.org/projects/flink/flink-docs-
> master/monitoring/metrics.html#system-metrics
> > [3]
> > https://ci.apache.org/projects/flink/flink-docs-
> master/setup/config.html#configuring-the-network-buffers
>
>
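
For illustration, a minimal sketch of declaring an option with the ConfigOption
type mentioned above (the holder class name below is hypothetical; the key and
default mirror the existing network-buffers setting, and the description field
proposed in step 2 does not exist yet):

import org.apache.flink.configuration.ConfigOption;
import static org.apache.flink.configuration.ConfigOptions.key;

public class NetworkOptions {
    // one declared option per configuration key; the docs generator would
    // iterate such fields and emit one table row per option
    public static final ConfigOption<Integer> NUM_NETWORK_BUFFERS =
        key("taskmanager.network.numberOfBuffers").defaultValue(2048);
}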


[jira] [Created] (FLINK-5733) Link to Bahir connectors

2017-02-07 Thread Greg Hogan (JIRA)
Greg Hogan created FLINK-5733:
-

 Summary: Link to Bahir connectors
 Key: FLINK-5733
 URL: https://issues.apache.org/jira/browse/FLINK-5733
 Project: Flink
  Issue Type: Improvement
  Components: Documentation, Project Website
Affects Versions: 1.3.0, 1.2.1
Reporter: Greg Hogan
Assignee: Greg Hogan
Priority: Minor


The [ecosystem|https://flink.apache.org/ecosystem.html] page lists and links to 
connectors included in the Flink distribution. Add to this list the connectors 
in [bahir-flink|https://github.com/apache/bahir-flink].

Also add Bahir connectors to the 
[connectors|https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/connectors/index.html]
 documentation.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: New Flink team member - Kate Eri.

2017-02-07 Thread Till Rohrmann
Thanks for initiating this discussion Katherin. I think you're right that
in general it does not make sense to reinvent the wheel over and over
again. Especially if you only have limited resources at hand. So if we
could integrate Flink with some existing library that would be great.

In the past, however, we couldn't find a good library which provided enough
freedom to integrate it with Flink. Especially if you want to have
distributed and somewhat high-performance implementations of ML algorithms
you would have to take Flink's execution model (capabilities as well as
limitations) into account. That is mainly the reason why we started
implementing some of the algorithms "natively" on Flink.

If I remember correctly, the problem with ND4J was and still is that
it does not support sparse matrices which was a requirement from our side.
As far as I know, it is quite common that you have sparse data structures
when dealing with large scale problems. That's why we built our own
abstraction which can have different implementations. Currently, the
default implementation uses Breeze.

I think the support for GPU based operations and the actual resource
management are two orthogonal things. The implementation would have to work
with no GPUs available anyway. If the system detects that GPUs are
available, then ideally it would exploit them. Thus, we could add this
feature later and maybe integrate it with FLINK-5131 [1].

Concerning the integration with DL4J I think that Theo's proposal to do it
in a separate repository (maybe as part of Apache Bahir) is a good idea.
We're currently thinking about outsourcing some of Flink's libraries into
sub projects. This could also be an option for the DL4J integration then.
In general I think it should be feasible to run DL4J on Flink given that it
also runs on Spark. Have you already taken a closer look at it?

[1] https://issues.apache.org/jira/browse/FLINK-5131

Cheers,
Till

On Tue, Feb 7, 2017 at 11:47 AM, Katherin Eri 
wrote:

> Thank you Theodore, for your reply.
>
> 1)Regarding GPU, your point is clear and I agree with it; ND4J looks
> appropriate. But my current understanding is that we also need to cover
> some resource management questions -> when we need to provide GPU support,
> we also need to manage it as a resource. For example, Mesos already
> supports GPU as a resource item (Initial support for GPU resources).
> Flink uses Mesos as a cluster manager, and this means that this feature
> of Mesos could be reused. Also, memory management questions in Flink
> regarding GPU should be clarified.
>
> 2)Regarding integration with DL4J: what stops us from creating a ticket
> and starting the discussion around this topic? Do we need some user story,
> or is the community not sure that DL is really helpful? Why did the
> discussion with Adam Gibson end with no implementation of any idea?
> What concerns do we have?
>
> On Mon, Feb 6, 2017 at 15:01, Theodore Vasiloudis <
> theodoros.vasilou...@gmail.com> wrote:
>
> > Hello all,
> >
> > This is a point that has come up in the past: Given the multitude of ML
> > libraries out there, should we have native implementations in FlinkML or
> > try to integrate other libraries instead?
> >
> > We haven't managed to reach a consensus on this before. My opinion is
> that
> > there is definitely value in having ML algorithms written natively in
> > Flink, both for performance optimization,
> > but more importantly for engineering simplicity, we don't want to force
> > users to use yet another piece of software to run their ML algos (at
> least
> > for a basic set of algorithms).
> >
> > We have in the past  discussed integrations with DL4J (particularly ND4J)
> > with Adam Gibson, the core developer of the library, but we never got
> > around to implementing anything.
> >
> > Whether it makes sense to have an integration with DL4J as part of the
> > Flink distribution would be up for discussion. I would suggest to make it
> > an independent repo to allow for
> > faster dev/release cycles, and because it wouldn't be directly related to
> > the core of Flink so it would add extra reviewing burden to an already
> > overloaded group of committers.
> >
> > Natively supporting GPU calculations in Flink would be much better
> achieved
> > through a library like ND4J, the engineering burden would be too much
> > otherwise.
> >
> > Regards,
> > Theodore
> >
> > On Mon, Feb 6, 2017 at 11:26 AM, Katherin Eri 
> > wrote:
> >
> > > Hello, guys.
> > >
> > > Theodore, last week I started the review of the PR:
> > > https://github.com/apache/flink/pull/2735 related to *word2Vec for
> > Flink*.
> > >
> > >
> > >
> > > During this review I have asked myself: why do we need to implement
> > > such a very popular algorithm like *word2vec one more time*, when
> > > there is already an available Java implementation provided by 

[jira] [Created] (FLINK-5732) Java quick start mvn command line is incorrect

2017-02-07 Thread Colin Breame (JIRA)
Colin Breame created FLINK-5732:
---

 Summary: Java quick start mvn command line is incorrect
 Key: FLINK-5732
 URL: https://issues.apache.org/jira/browse/FLINK-5732
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.2.0
Reporter: Colin Breame
Priority: Trivial


On this page:


https://ci.apache.org/projects/flink/flink-docs-release-1.2/quickstart/java_api_quickstart.html

Within the "Create Project" section it reads:

{quote}
$ mvn archetype:generate   \
  -DarchetypeGroupId=org.apache.flink  \
  -DarchetypeArtifactId=flink-quickstart-java  \
  -DarchetypeVersion=1.2
{quote}

However this is incorrect, it should read:

{quote}
$ mvn archetype:generate   \
  -DarchetypeGroupId=org.apache.flink  \
  -DarchetypeArtifactId=flink-quickstart-java  \
  -DarchetypeVersion=1.2.0
{quote}

(1.2 *.0* not 1.2)




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [Discuss] Organizing Documentation for Configuration Options

2017-02-07 Thread Till Rohrmann
+1 for automating. The more automation the fewer mistakes we can make :-)

On Tue, Feb 7, 2017 at 11:13 AM, Robert Metzger  wrote:

> +1 to automate this and describe the config parameters in the code.
>
> That's exactly the approach Apache Kafka is taking as well for their
> config.
>
> On Tue, Feb 7, 2017 at 9:04 AM, Ufuk Celebi  wrote:
>
> > I fully agree with you Greg.
> >
> > Since this is doomed to get out of sync again very shortly after clean
> up,
> > I vote to automate this. Stephan introduced the ConfigOption type, which
> > makes it easy to define the options. It's already planned to migrate all
> > configuration options from ConfigConstants to this approach.
> >
> > For an example see here: https://github.com/apache/flink/blob/master/
> > flink-core/src/main/java/org/apache/flink/configuration/
> > HighAvailabilityOptions.java
> >
> > I think that it is possible to build the configuration docs page from
> this
> > with reasonable effort.
> >
> > This would translate the task to:
> > 1) Automate ConfigOption to HTML/Markdown generation
> > 2) Extend ConfigOption with description fields
> > 3) Migrate ConfigConstants to ConfigOptions
> >
> > I would also volunteer to take a first stab at this.
> >
> > Regarding the network buffers: +1 to your suggestion. Nico (cc'd) is
> > starting to work on automating the network buffer configuration in order
> to
> > get rid of any manual tuning for most users (because of the issues you
> > described + streaming and batch jobs require different tuning, which
> > complicates things even more).
> >
> > – Ufuk
> >
> > On 6 February 2017 at 19:21:28, Greg Hogan (c...@greghogan.com) wrote:
> > > > Hi devs,
> > >
> > > Flink's Configuration page [1] has grown intimidatingly long
> > > and complex.
> > > Options are described across three main sections: common options
> > > (single
> > > section), advanced options (multiple sections), and full reference.
> > > The
> > > trailing "background" section further describes the most impactful
> > > options
> > > in much greater detail.
> > >
> > > Several recent tickets, and a few outstanding, have added missing
> > > options
> > > to the configuration documentation. I'd like to propose a goal
> > > of
> > > organizing all options in the full reference into alphabetized,
> > > tabular
> > > form (one table per section), much like the system metrics [2].
> > > Columns
> > > would be option name, description, and default value.
> > >
> > > The common and advanced sections could also be converted to tabular
> > > form
> > > with the exception of Kerberos-based Security. Missing options
> > > would be
> > > added to the full reference.
> > >
> > > Lastly, the simple heuristic for configuring network buffers
> > > has prompted
> > > many questions on the mailing list. With the 1.3 release the total
> > > and
> > > number of available buffers is reported through metrics and
> > > in the web
> > > dashboard. My experience has been that the number of required
> > > buffers is
> > > highly dependent on job topology and cluster performance. I
> > > propose keeping
> > > the simple heuristic and description while directing users
> > > to monitor the
> > > balance of available buffers.
> > >
> > > Greg
> > >
> > > [1] https://ci.apache.org/projects/flink/flink-docs-
> > master/setup/config.html
> > > [2]
> > > https://ci.apache.org/projects/flink/flink-docs-
> > master/monitoring/metrics.html#system-metrics
> > > [3]
> > > https://ci.apache.org/projects/flink/flink-docs-
> > master/setup/config.html#configuring-the-network-buffers
> >
> >
>


Re: New Flink team member - Kate Eri.

2017-02-07 Thread Katherin Eri
Thank you Theodore, for your reply.

1)Regarding GPU, your point is clear and I agree with it; ND4J looks
appropriate. But my current understanding is that we also need to cover
some resource management questions -> when we need to provide GPU support,
we also need to manage it as a resource. For example, Mesos already
supports GPU as a resource item (Initial support for GPU resources). Flink
uses Mesos as a cluster manager, and this means that this feature of Mesos
could be reused. Also, memory management questions in Flink regarding GPU
should be clarified.

2)Regarding integration with DL4J: what stops us from creating a ticket
and starting the discussion around this topic? Do we need some user story,
or is the community not sure that DL is really helpful? Why did the
discussion with Adam Gibson end with no implementation of any idea?
What concerns do we have?

On Mon, Feb 6, 2017 at 15:01, Theodore Vasiloudis <
theodoros.vasilou...@gmail.com> wrote:

> Hello all,
>
> This is a point that has come up in the past: Given the multitude of ML
> libraries out there, should we have native implementations in FlinkML or
> try to integrate other libraries instead?
>
> We haven't managed to reach a consensus on this before. My opinion is that
> there is definitely value in having ML algorithms written natively in
> Flink, both for performance optimization,
> but more importantly for engineering simplicity, we don't want to force
> users to use yet another piece of software to run their ML algos (at least
> for a basic set of algorithms).
>
> We have in the past  discussed integrations with DL4J (particularly ND4J)
> with Adam Gibson, the core developer of the library, but we never got
> around to implementing anything.
>
> Whether it makes sense to have an integration with DL4J as part of the
> Flink distribution would be up for discussion. I would suggest to make it
> an independent repo to allow for
> faster dev/release cycles, and because it wouldn't be directly related to
> the core of Flink so it would add extra reviewing burden to an already
> overloaded group of committers.
>
> Natively supporting GPU calculations in Flink would be much better achieved
> through a library like ND4J; the engineering burden would be too much
> otherwise.
>
> Regards,
> Theodore
>
> On Mon, Feb 6, 2017 at 11:26 AM, Katherin Eri 
> wrote:
>
> > Hello, guys.
> >
> > Theodore, last week I started the review of the PR:
> > https://github.com/apache/flink/pull/2735 related to *word2Vec for
> Flink*.
> >
> >
> >
> > During this review I have asked myself: why do we need to implement such
> > a very popular algorithm like *word2vec one more time*, when there is
> > already an available Java implementation provided by the deeplearning4j.org
> > library (DL4J -> Apache 2 licence).
> > This library tries to promote itself, there is hype around it in the ML
> > sphere, and it was integrated with Apache Spark to provide scalable
> > deep learning calculations.
> >
> >
> > *That's why I thought: could we also integrate this library with Flink?*
> >
> > 1) Personally I think providing support for and deployment of
> > *Deep Learning (DL) algorithms/models in Flink* is a promising and
> > attractive feature, because:
> >
> > a) during the last two years DL has proved its efficiency, and these
> > algorithms are used in many applications. For example, *Spotify* uses
> > DL-based algorithms for music content extraction (Recommending music on
> > Spotify with deep learning, August 05, 2014) for their music
> > recommendations. Developers need to scale up DL manually, which causes a
> > lot of work, so that's why platforms like Flink should support the
> > deployment of these models.
> >
> > b) The scope of Deep Learning use cases presented here covers many
> > scenarios that could be supported on Flink.
> >
> >
> > 2) But DL raises questions like:
> >
> > a) scaling up calculations over machines
> >
> > b) performing these calculations both on CPU and GPU. GPU is required
> > to train big DL models; otherwise the learning process could have very
> > slow convergence.
> >
> >
> > 3) I have checked this DL4J library, which already has rich support for
> > many attractive DL models like Recurrent Networks and LSTMs, Convolutional
> > Networks (CNN), Restricted Boltzmann Machines (RBM) and others. So we
> > won't need to implement them independently, but only provide the ability
> > to execute these models on a Flink cluster, quite similar to the way it
> > was integrated with Apache Spark.
> >
> >
> > Because of all of this I propose:
> >
> > 1)To create a new ticket in Flink's JIRA for the integration of Flink
> > with DL4J and decide on which side this 

[jira] [Created] (FLINK-5731) Split up CI builds

2017-02-07 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5731:
--

 Summary: Split up CI builds
 Key: FLINK-5731
 URL: https://issues.apache.org/jira/browse/FLINK-5731
 Project: Flink
  Issue Type: Improvement
  Components: Build System, Tests
Reporter: Ufuk Celebi
Assignee: Robert Metzger
Priority: Critical


Test builds regularly time out because we are hitting the Travis 50 min limit. 
Previously, we worked around this by splitting up the tests into groups. I 
think we have to split them further.




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [Discuss] Organizing Documentation for Configuration Options

2017-02-07 Thread Robert Metzger
+1 to automate this and describe the config parameters in the code.

That's exactly the approach Apache Kafka is taking as well for their config.

On Tue, Feb 7, 2017 at 9:04 AM, Ufuk Celebi  wrote:

> I fully agree with you Greg.
>
> Since this is doomed to get out of sync again very shortly after clean up,
> I vote to automate this. Stephan introduced the ConfigOption type, which
> makes it easy to define the options. It's already planned to migrate all
> configuration options from ConfigConstants to this approach.
>
> For an example see here: https://github.com/apache/flink/blob/master/
> flink-core/src/main/java/org/apache/flink/configuration/
> HighAvailabilityOptions.java
>
> I think that it is possible to build the configuration docs page from this
> with reasonable effort.
>
> This would translate the task to:
> 1) Automate ConfigOption to HTML/Markdown generation
> 2) Extend ConfigOption with description fields
> 3) Migrate ConfigConstants to ConfigOptions
>
> I would also volunteer to take a first stab at this.
>
> Regarding the network buffers: +1 to your suggestion. Nico (cc'd) is
> starting to work on automating the network buffer configuration in order to
> get rid of any manual tuning for most users (because of the issues you
> described + streaming and batch jobs require different tuning, which
> complicates things even more).
>
> – Ufuk
>
> On 6 February 2017 at 19:21:28, Greg Hogan (c...@greghogan.com) wrote:
> > > Hi devs,
> >
> > Flink's Configuration page [1] has grown intimidatingly long
> > and complex.
> > Options are described across three main sections: common options
> > (single
> > section), advanced options (multiple sections), and full reference.
> > The
> > trailing "background" section further describes the most impactful
> > options
> > in much greater detail.
> >
> > Several recent tickets, and a few outstanding, have added missing
> > options
> > to the configuration documentation. I'd like to propose a goal
> > of
> > organizing all options in the full reference into alphabetized,
> > tabular
> > form (one table per section), much like the system metrics [2].
> > Columns
> > would be option name, description, and default value.
> >
> > The common and advanced sections could also be converted to tabular
> > form
> > with the exception of Kerberos-based Security. Missing options
> > would be
> > added to the full reference.
> >
> > Lastly, the simple heuristic for configuring network buffers
> > has prompted
> > many questions on the mailing list. With the 1.3 release the total
> > and
> > number of available buffers is reported through metrics and
> > in the web
> > dashboard. My experience has been that the number of required
> > buffers is
> > highly dependent on job topology and cluster performance. I
> > propose keeping
> > the simple heuristic and description while directing users
> > to monitor the
> > balance of available buffers.
> >
> > Greg
> >
> > [1] https://ci.apache.org/projects/flink/flink-docs-
> master/setup/config.html
> > [2]
> > https://ci.apache.org/projects/flink/flink-docs-
> master/monitoring/metrics.html#system-metrics
> > [3]
> > https://ci.apache.org/projects/flink/flink-docs-
> master/setup/config.html#configuring-the-network-buffers
>
>


[jira] [Created] (FLINK-5730) User can concurrently modify state metadata of RocksDB asynchronous snapshots

2017-02-07 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-5730:
-

 Summary: User can concurrently modify state metadata of RocksDB 
asynchronous snapshots
 Key: FLINK-5730
 URL: https://issues.apache.org/jira/browse/FLINK-5730
 Project: Flink
  Issue Type: Bug
Reporter: Stefan Richter


The current implementation of asynchronous snapshots in RocksDB iterates the 
original state metadata structures as part of the asynchronous snapshot. Users 
could potentially modify the state metadata concurrently (e.g. by registering a 
new state while the snapshot is running), thereby corrupting the metadata.

For the way most users are registering their states (at the start of the task), 
this is not a problem. However, if state is conditionally registered at some 
later point in time this can be problematic.
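
An illustrative (non-Flink) sketch of the hazard, assuming the metadata lives in 
a plain {{HashMap}} that the snapshot thread iterates while the task thread keeps 
writing to it:

{code}
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only, not Flink code: a snapshot thread iterating live
// metadata races with the task thread registering a new state.
public class MetadataRaceSketch {

    private final Map<String, String> stateMetadata = new HashMap<>();

    void snapshotAsync() {
        new Thread(() -> {
            // the asynchronous snapshot iterates the original, live map ...
            for (Map.Entry<String, String> entry : stateMetadata.entrySet()) {
                write(entry); // may throw ConcurrentModificationException
            }
        }).start();
    }

    void processElement() {
        // ... while the task thread conditionally registers a new state later on
        stateMetadata.put("lateRegisteredState", "descriptor");
    }

    private void write(Map.Entry<String, String> entry) {
        // serialize the entry into the checkpoint stream
    }
}
{code}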



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


RE: [jira] [Created] (FLINK-5656) Add processing time OVER ROWS BETWEEN UNBOUNDED PRECEDING aggregation to SQL

2017-02-07 Thread Stefano Bortoli
Thanks for confirming this. After the email I started working on it and things 
have already become clearer. I will go ahead this way. 

Dr. Stefano Bortoli
Senior Research Engineer - Big Data and Semantic Technology Expert
IT R&D Division


-Original Message-
From: Fabian Hueske [mailto:fhue...@gmail.com] 
Sent: Monday, February 06, 2017 6:37 PM
To: dev@flink.apache.org
Subject: Re: [jira] [Created] (FLINK-5656) Add processing time OVER ROWS 
BETWEEN UNBOUNDED PRECEDING aggregation to SQL

Hi Stefano,

I don't think we should integrate this with LogicalWindowAggregate, which is 
meant for GroupBy windows, not Over windows.
Moreover, LogicalWindowAggregate is on the logical plan level, but we need to 
implement a physical operator, i.e., a DataStreamRel.
Calcite parses the SQL query into the logical representation already. The 
windowing semantics are captured in the LogicalProject / LogicalCalc.
Radu pointed out [1] that it makes sense to extract the window semantics into a 
LogicalWindow with a Calcite optimization rule.

From there we should add a DataStreamRel that creates the required DataStream 
transformations and functions, plus the corresponding translation rule that 
converts LogicalWindow / LogicalCalc into the DataStreamRel.
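
For illustration, a rough sketch of such a rule against Calcite's ConverterRule 
API; DataStreamOverAggregate is a placeholder for the physical node, not an 
existing Flink class:

{code}
import org.apache.calcite.plan.Convention;
import org.apache.calcite.plan.RelOptCluster;
import org.apache.calcite.plan.RelTraitSet;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.SingleRel;
import org.apache.calcite.rel.convert.ConverterRule;
import org.apache.calcite.rel.core.Window;
import org.apache.calcite.rel.logical.LogicalWindow;
import org.apache.calcite.rel.type.RelDataType;

import java.util.List;

// Hypothetical physical node; a real version would implement a DataStreamRel
// interface with a method that builds the DataStream transformations.
class DataStreamOverAggregate extends SingleRel {
    final List<Window.Group> groups;
    private final RelDataType rowType;

    DataStreamOverAggregate(RelOptCluster cluster, RelTraitSet traits,
            RelNode input, List<Window.Group> groups, RelDataType rowType) {
        super(cluster, traits, input);
        this.groups = groups;
        this.rowType = rowType;
    }

    @Override
    protected RelDataType deriveRowType() {
        return rowType;
    }
}

// The translation rule: match a LogicalWindow and replace it with the
// physical node in the DataStream convention.
public class DataStreamOverAggregateRule extends ConverterRule {

    public DataStreamOverAggregateRule(Convention dataStreamConvention) {
        super(LogicalWindow.class, Convention.NONE, dataStreamConvention,
                "DataStreamOverAggregateRule");
    }

    @Override
    public RelNode convert(RelNode rel) {
        LogicalWindow window = (LogicalWindow) rel;
        // Demand the physical convention from the input as well.
        RelNode input = convert(
                window.getInput(),
                window.getInput().getTraitSet().replace(getOutTrait()));
        // Hand the window groups (partitioning, ordering, frame bounds,
        // aggregate calls) to the physical node.
        return new DataStreamOverAggregate(
                window.getCluster(),
                window.getTraitSet().replace(getOutTrait()),
                input,
                window.groups,
                window.getRowType());
    }
}
{code}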

Btw. It would be good to move this discussion to the JIRA issue. Just replying 
to the CREATE mail will not mirror the discussion on JIRA.

Best, Fabian

[1] https://issues.apache.org/jira/browse/FLINK-5654

2017-02-06 15:26 GMT+01:00 Stefano Bortoli :

> Hi Fabian,
>
> After working on the rule, I am moving towards the implementation 
> of the Aggregation function.
>
> I started working on extending DataStreamRel (for which I created a Java 
> version). However, I noticed the LogicalWindowAggregate provides the 
> list of aggregatedCalls and other parameters. Perhaps it is a better 
> idea to start extending this one. But I may not be aware of some 
> implications related to this choice. What do you think?
>
> My first idea was to implement a WindowAggregateUtil class with some 
> methods to extract and perhaps interpret window parameters (e.g.
> boundaries, aggregate calls, parameter pointers, etc.).
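
For illustration, a rough sketch of what such a utility could extract from 
Calcite's Window node; the class and method names are illustrative, not 
existing Flink code:

{code}
import org.apache.calcite.rel.core.Window;
import org.apache.calcite.rex.RexWindowBound;

import java.util.List;

// Sketch of the WindowAggregateUtil idea: read the window parameters
// (partition keys, frame bounds, aggregate calls) from Calcite's Window node.
public final class WindowAggregateUtil {

    private WindowAggregateUtil() {}

    /** Returns the partition keys (field indexes) of the first window group. */
    public static int[] getPartitionKeys(Window window) {
        return window.groups.get(0).keys.toArray();
    }

    /** Returns true if the frame starts at UNBOUNDED PRECEDING. */
    public static boolean isUnboundedPreceding(Window window) {
        RexWindowBound lower = window.groups.get(0).lowerBound;
        return lower.isUnbounded() && lower.isPreceding();
    }

    /** Returns the aggregate calls (e.g. SUM, MIN) defined over the window. */
    public static List<Window.RexWinAggCall> getAggCalls(Window window) {
        return window.groups.get(0).aggCalls;
    }
}
{code}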
>
> Dr. Stefano Bortoli
> Senior Research Engineer - Big Data and Semantic Technology Expert
> IT R&D Division
>
> HUAWEI TECHNOLOGIES Duesseldorf GmbH
> European Research Center
> Riesstrasse 25, 80992 München
>
> -Original Message-
> From: Fabian Hueske [mailto:fhue...@gmail.com]
> Sent: Thursday, February 02, 2017 1:48 PM
> To: dev@flink.apache.org
> Subject: Re: [jira] [Created] (FLINK-5656) Add processing time OVER 
> ROWS BETWEEN UNBOUNDED PRECEDING aggregation to SQL
>
> Sounds good to me Stefano!
>
> Best, Fabian
>
> 2017-02-01 13:43 GMT+01:00 Stefano Bortoli :
>
> > Hi all,
> >
> > I was thinking of opening a JIRA for the procTime() function so that it 
> > could be merged earlier and others could use it as well. What do you
> think?
> >
> > Regards,
> > Stefano
> >
> >
> > -Original Message-
> > From: Fabian Hueske [mailto:fhue...@gmail.com]
> > Sent: Friday, January 27, 2017 10:34 AM
> > To: dev@flink.apache.org
> > Subject: Re: [jira] [Created] (FLINK-5656) Add processing time OVER 
> > ROWS BETWEEN UNBOUNDED PRECEDING aggregation to SQL
> >
> > Hi Stefano,
> >
> > I can assign the issue to you if you want to.
> > Just drop a comment in JIRA.
> >
> > Best, Fabian
> >
> > 2017-01-27 9:39 GMT+01:00 Stefano Bortoli :
> >
> > > Hi Fabian,
> > >
> > > In the coming days I will start working on this issue. As soon as I 
> > > have a proposal I will start sharing it for discussion.
> > >
> > > Regards,
> > > Dr. Stefano Bortoli
> > > Senior Research Engineer - Big Data and Semantic Technology Expert
> > > IT R&D Division
> > >
> > > -Original Message-
> > > From: Fabian Hueske (JIRA) [mailto:j...@apache.org]
> > > Sent: Thursday, January 26, 2017 2:49 PM
> > > To: dev@flink.apache.org
> > > Subject: [jira] [Created] (FLINK-5656) Add processing time OVER 
> > > ROWS BETWEEN UNBOUNDED PRECEDING aggregation to SQL
> > >
> > > Fabian Hueske created FLINK-5656:
> > > --------------------------------
> > >
> > >  Summary: Add processing time OVER ROWS BETWEEN 
> > > UNBOUNDED PRECEDING aggregation to SQL
> > >  Key: FLINK-5656
> > >  URL: https://issues.apache.org/jira/browse/FLINK-5656
> > >  Project: Flink
> > >   Issue Type: Sub-task
> > >   Components: Table API & SQL
> > > Reporter: Fabian Hueske
> > >
> > >
> > > The goal of this issue is to add support for OVER ROW aggregations 
> > > on processing time streams to the SQL interface.
> > >
> > > Queries similar to the following should be supported:
> > > {code}
> > > SELECT
> > >   a,
> > >   SUM(b) OVER (PARTITION BY c ORDER BY procTime() ROWS BETWEEN 
> > > UNBOUNDED PRECEDING AND CURRENT ROW) AS sumB,
> > >   

[jira] [Created] (FLINK-5729) add hostname option in SocketWindowWordCount example to be more convenient

2017-02-07 Thread Tao Wang (JIRA)
Tao Wang created FLINK-5729:
---------------------------

 Summary: add hostname option in SocketWindowWordCount example to 
be more convenient
 Key: FLINK-5729
 URL: https://issues.apache.org/jira/browse/FLINK-5729
 Project: Flink
  Issue Type: Improvement
  Components: Examples
Reporter: Tao Wang
Priority: Minor


"hostname" option will help users to get data from the right port, otherwise 
the example would fail due to connection refused.
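
For illustration, a minimal sketch of the proposed change, assuming the example 
keeps using ParameterTool (run e.g. with --hostname 10.0.0.5 --port 9000; the 
"localhost" default is an assumption):

{code}
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Sketch: accept an optional --hostname argument next to the existing --port.
public class SocketWindowWordCount {

    public static void main(String[] args) throws Exception {
        final ParameterTool params = ParameterTool.fromArgs(args);
        final String hostname = params.get("hostname", "localhost");
        final int port = params.getInt("port");

        final StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Connect to the user-supplied host instead of a hard-coded one.
        DataStream<String> text = env.socketTextStream(hostname, port, "\n");

        // ... the windowed word count from the original example goes here ...
        text.print();

        env.execute("Socket Window WordCount");
    }
}
{code}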



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [Discuss] Organizing Documentation for Configuration Options

2017-02-07 Thread Ufuk Celebi
I fully agree with you, Greg.

Since this is doomed to get out of sync again very shortly after a clean-up, I 
vote to automate this. Stephan introduced the ConfigOption type, which makes it 
easy to define the options. It's already planned to migrate all configuration 
options from ConfigConstants to this approach.

For an example see here: 
https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/configuration/HighAvailabilityOptions.java

I think that it is possible to build the configuration docs page from this with 
reasonable effort.

This would translate the task to:
1) Automate ConfigOption to HTML/Markdown generation
2) Extend ConfigOption with description fields
3) Migrate ConfigConstants to ConfigOptions

I would also volunteer to take a first stab at this.
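
For illustration, a sketch of steps 1) and 2), mirroring the builder style of 
the HighAvailabilityOptions file linked above; the HTML row generator and the 
description parameter stand in for the proposed (not yet existing) description 
field:

{code}
import org.apache.flink.configuration.ConfigOption;

import static org.apache.flink.configuration.ConfigOptions.key;

public class DocumentedOptions {

    // Existing style: key, typed default value, and deprecated keys.
    public static final ConfigOption<String> HA_MODE =
            key("high-availability")
                    .defaultValue("NONE")
                    .withDeprecatedKeys("recovery.mode");

    // Step 1) sketch: render one table row (name, description, default) per
    // option. The description would come from the field proposed in step 2).
    static String toHtmlRow(ConfigOption<?> option, String description) {
        return "<tr><td>" + option.key() + "</td><td>" + description
                + "</td><td>" + option.defaultValue() + "</td></tr>";
    }
}
{code}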

Regarding the network buffers: +1 to your suggestion. Nico (cc'd) is starting 
to work on automating the network buffer configuration in order to get rid of 
any manual tuning for most users (because of the issues you described, and 
because streaming and batch jobs require different tuning, which complicates 
things even more).

– Ufuk

On 6 February 2017 at 19:21:28, Greg Hogan (c...@greghogan.com) wrote:
> Hi devs,
>
> Flink's Configuration page [1] has grown intimidatingly long and complex.
> Options are described across three main sections: common options (single
> section), advanced options (multiple sections), and full reference. The
> trailing "background" section further describes the most impactful options
> in much greater detail.
>
> Several recent tickets, and a few outstanding, have added missing options
> to the configuration documentation. I'd like to propose a goal of
> organizing all options in the full reference into alphabetized, tabular
> form (one table per section), much like the system metrics [2]. Columns
> would be option name, description, and default value.
>
> The common and advanced sections could also be converted to tabular form,
> with the exception of Kerberos-based Security. Missing options would be
> added to the full reference.
>
> Lastly, the simple heuristic for configuring network buffers [3] has
> prompted many questions on the mailing list. With the 1.3 release, the
> total and available number of buffers is reported through metrics and in
> the web dashboard. My experience has been that the number of required
> buffers is highly dependent on job topology and cluster performance. I
> propose keeping the simple heuristic and description while directing users
> to monitor the balance of available buffers.
>
> Greg
>
> [1] https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html
> [2] https://ci.apache.org/projects/flink/flink-docs-master/monitoring/metrics.html#system-metrics
> [3] https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#configuring-the-network-buffers