Re: Sorting of fields

2015-02-04 Thread Timo Walther

Ok, I found an earlier discussion about it. Sorry for the mail.

However, I think this is a very important feature and it should be added 
soon.


On 04.02.2015 14:38, Timo Walther wrote:

Hey,

is it correct that we currently do not support sorting without any 
grouping? I have been asked this question by 2 users in the last weeks, and 
now I also need this functionality.



Is it possible to sort e.g. a word count result Tuple2<String, Integer> 
by count?
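
For illustration, a minimal sketch of the kind of sort being asked about. It
assumes the DataSet API's sortPartition() operator from later Flink releases,
so it is a workaround sketch rather than the missing feature itself:

import org.apache.flink.api.common.operators.Order;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class SortedWordCount {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // a tiny word count result, just for the sketch
        DataSet<Tuple2<String, Integer>> counts = env.fromElements(
                new Tuple2<>("flink", 3), new Tuple2<>("sort", 1), new Tuple2<>("field", 2));

        counts.sortPartition(1, Order.DESCENDING) // sort by the count field
              .setParallelism(1)                  // a single partition yields a total order
              .print();
    }
}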



Regards,
Timo





Re: Eclipse JDT, Java 8, lambdas

2015-02-09 Thread Timo Walther

Hey,

it seems that 4.4.2 also includes the fix 
(https://projects.eclipse.org/projects/eclipse/releases/4.4.2/bugs) and 
will be released at the end of February. I will try Eclipse Luna SR2 RC2 today 
and check if it is working.


Regards,
Timo


On 09.02.2015 10:05, Nam-Luc Tran wrote:

I did try the 4.5 M4 release and it did not go straightforward.



--
View this message in context: 
http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/Eclipse-JDT-Java-8-lambdas-tp3664p3688.html
Sent from the Apache Flink (Incubator) Mailing List archive. mailing list 
archive at Nabble.com.




Re: Eclipse JDT, Java 8, lambdas

2015-02-11 Thread Timo Walther
I will investigate that until the compiler is officially released (not 
just an RC).


I think we should update the documentation to reflect the current situation 
for lambda expressions, where only a specific minor release version of the 
Eclipse JDT compiler is officially supported. I will do this tomorrow...


On 09.02.2015 16:28, Stephan Ewen wrote:

Is it possible to use this compiler for the java 8 quickstart archetypes?

On Mon, Feb 9, 2015 at 4:14 PM, Timo Walther twal...@apache.org wrote:


The fix is included in 4.4.2. However, it seems that even if the compiler
option org.eclipse.jdt.core.compiler.codegen.lambda.genericSignature=generate
is set in the project's org.eclipse.jdt.core.prefs file, it has no effect.

The command line approach works:

java -jar ./plugins/org.eclipse.jdt.core_3.10.2.v20150120-1634.jar -8
-cp /usr/lib/jvm/jdk1.8.0/jre/lib/rt.jar -genericsignature
Test.java

I will further investigate the problem and open an issue at Eclipse JDT.
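
For context, a hedged sketch of the kind of Java 8 lambda that depends on these
generic signatures: without them, Flink's TypeExtractor cannot infer the Tuple2
output type and the program needs an explicit type hint.

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class LambdaWordCount {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<Tuple2<String, Integer>> counts = env.fromElements("a", "b", "a")
                // with generic signatures the extractor sees Tuple2<String, Integer>;
                // without them this line needs an explicit .returns(...) hint
                .map(word -> new Tuple2<>(word, 1))
                .groupBy(0)
                .sum(1);

        counts.print();
    }
}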


Regards
Timo



On 09.02.2015 10:39, Timo Walther wrote:


Hey,

it seems that 4.4.2 also includes the fix (https://projects.eclipse.org/
projects/eclipse/releases/4.4.2/bugs) and will be released at the end of
February. I will try Eclipse Luna SR2 RC2 today and check if it is working.

Regards,
Timo


On 09.02.2015 10:05, Nam-Luc Tran wrote:


I did try the 4.5 M4 release and it did not go straightforward.



--
View this message in context: http://apache-flink-incubator-
mailing-list-archive.1008284.n3.nabble.com/Eclipse-JDT-
Java-8-lambdas-tp3664p3688.html
Sent from the Apache Flink (Incubator) Mailing List archive. mailing
list archive at Nabble.com.





Re: kryoException : Buffer underflow

2015-02-11 Thread Timo Walther

Hey Nam-Luc,

I think your problem lies in the following line:

.returns("eu.euranova.flink.Centroid25")

If you do not specify the fields of the class in the String by using 
"myfield=String,otherField=int", the underlying parser will create a 
GenericTypeInfo type information, which then uses Kryo for serialization.


In general, lambda expressions are a very new feature which currently 
causes a lot of problems due to missing type information from compilers. 
Maybe it is better to use (anonymous) classes instead.


In case of map() functions you don't need to provide type hints 
through the returns() method.


For other operators you need to either specify all fields of the class 
in the String (which makes no sense in your case) or change the method to


.returns(Centroid25.class)

I hope that helps.
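
To make the alternatives concrete, a small hedged sketch; the operator chain
and the mapper name are made up, only the returns() call mirrors the advice above:

// Illustrative only: the class-based hint avoids the string parser and the
// GenericTypeInfo/Kryo fallback described above.
DataSet<Centroid25> centroids = input
        .map(new ToCentroid25Mapper())   // hypothetical mapper producing Centroid25
        .returns(Centroid25.class);      // instead of .returns("eu.euranova.flink.Centroid25")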

Regards,
Timo

On 11.02.2015 17:38, Nam-Luc Tran wrote:

Hello Stephan,

Thank you for your help.

I ensured all the POJO classes used comply to what you previously said
and the same exception occurs. Here is the listing of classes
Centroid25 and Point25:

public class Centroid25 extends Point25 {

public int id;

public Centroid25() {}

public Centroid25(int id, Double value0, Double value1, Double value2,
Double value3, Double value4, Double value5,
Double value6, Double value7, Double value8, Double value9, Double
value10, Double value11, Double value12,
Double value13, Double value14, Double value15, Double value16, Double
value17, Double value18,
Double value19, Double value20, Double value21, Double value22, Double
value23, Double value24) {
super(value0, value1, value2, value3, value4, value5, value6, value7,
value8, value9, value10, value11,
value12, value13, value14, value15, value16, value17, value18,
value19, value20, value21, value22,
value23, value24);
this.id = id;
}

public Centroid25(int id, Point25 p) {
super(p.f0,
p.f1,p.f2,p.f3,p.f4,p.f5,p.f6,p.f7,p.f8,p.f9,p.f10,p.f11,p.f12,p.f13,p.f14,p.f15,p.f16,p.f17,p.f18,p.f19,p.f20,p.f21,p.f22,p.f23,p.f24);
this.id = id;
}

public Centroid25(int id, Tuple25 p) {
super(p.f0,
p.f1,p.f2,p.f3,p.f4,p.f5,p.f6,p.f7,p.f8,p.f9,p.f10,p.f11,p.f12,p.f13,p.f14,p.f15,p.f16,p.f17,p.f18,p.f19,p.f20,p.f21,p.f22,p.f23,p.f24);
this.id = id;
}

@Override
public String toString() {
return id + " " + super.toString();
}
}

public class Point25{

public Double
f0,f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f11,f12,f13,f14,f15,f16,f17,f18,f19,f20,f21,f22,f23,f24
= 0.0;

public Point25() {
}

public Point25(Double value0, Double value1, Double value2, Double
value3, Double value4, Double value5,
Double value6, Double value7, Double value8, Double value9, Double
value10, Double value11, Double value12,
Double value13, Double value14, Double value15, Double value16, Double
value17, Double value18,
Double value19, Double value20, Double value21, Double value22, Double
value23, Double value24) {
f0=value0;
f1=value1;
f2=value2;
f3=value3;
f4=value4;
f5=value5;
f6=value6;
f7=value7;
f8=value8;
f9=value9;
f10=value10;
f11=value11;
f12=value12;
f13=value13;
f14=value14;
f15=value15;
f16=value16;
f17=value17;
f18=value18;
f19=value19;
f20=value20;
f21=value21;
f22=value22;
f23=value23;
f24=value24;

}

public List getFieldsAsList() {
return Arrays.asList(f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11,
f12, f13, f14, f15, f16, f17, f18, f19,
f20, f21, f22, f23, f24);
}

public Point25 add(Point25 other) {
f0 += other.f0;
f1 += other.f1;
f2 += other.f2;
f3 += other.f3;
f4 += other.f4;
f5 += other.f5;
f6 += other.f6;
f7 += other.f7;
f8 += other.f8;
f9 += other.f9;
f10 += other.f10;
f11 += other.f11;
f12 += other.f12;
f13 += other.f13;
f14 += other.f14;
f15 += other.f15;
f16 += other.f16;
f17 += other.f17;
f18 += other.f18;
f19 += other.f19;
f20 += other.f20;
f21 += other.f21;
f22 += other.f22;
f23 += other.f23;
f24 += other.f24;
return this;
}

public Point25 div(long val) {
f0 /= val;
f1 /= val;
f2 /= val;
f3 /= val;
f4 /= val;
f5 += val;
f6 += val;
f7 += val;
f8 += val;
f9 += val;
f10 += val;
f11 += val;
f12 += val;
f13 += val;
f14 += val;
f15 += val;
f16 += val;
f17 += val;
f18 += val;
f19 += val;
f20 += val;
f21 += val;
f22 += val;
f23 += val;
f24 += val;
return this;
}

public double euclideanDistance(Point25 other) {
List l = this.getFieldsAsList();
List ol = other.getFieldsAsList();
double res = 0;
for(int i=0;i

I came across an error for which I am unable to retrace the exact cause.

Starting from the flink-java-examples module, I have extended the KMeans
example to a case where points have 25 coordinates. It follows the exact
same structure and transformations as the original example, only with
points having 25 coordinates instead of 2.

When creating the centroids dataset within the code as follows, the job
iterates and executes well:

Centroid25 cent1 = new Centroid25(ThreadLocalRandom.current().nextInt(0, 1000),
-10.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,10.0);

Centroid25 cent2 = new


Re: Fwd: TypeSerializerInputFormat cannot determine its type automatically

2015-01-29 Thread Timo Walther

Hey Alexander,

I have looked into your issue. You can simply use 
env.createInput(InputFormat, TypeInformation) instead of env.readFile(). 
Then you can pass the TypeInformation manually without implementing 
ResultTypeQueryable.
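
A hedged sketch of that suggestion; note that the TypeSerializerInputFormat
constructor took a TypeSerializer in older releases and a TypeInformation in
later ones, and the path below is purely illustrative:

import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.io.TypeSerializerInputFormat;
import org.apache.flink.core.fs.Path;

public class CreateInputExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        TypeInformation<String> typeInfo = BasicTypeInfo.STRING_TYPE_INFO;

        TypeSerializerInputFormat<String> format = new TypeSerializerInputFormat<>(typeInfo);
        format.setFilePath(new Path("/tmp/tempdata"));

        // the second argument supplies the produced type explicitly,
        // so no automatic type inference is needed
        DataSet<String> data = env.createInput(format, typeInfo);
        data.print();
    }
}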


Regards,
Timo



On 29.01.2015 14:54, Alexander Alexandrov wrote:

The problem seems to be that the reflection analysis cannot determine the
type of the TypeSerializerInputFormat.

One possible solution is to add the ResultTypeQueryable interface and force
clients to explicitly set the TypeInformation.

This might break code which relies on automatic type inference, but at the
moment I cannot find any other usages of the TypeSerializerInputFormat
except from the unit test.


-- Forwarded message --
From: Alexander Alexandrov alexander.s.alexand...@gmail.com
Date: 2015-01-29 12:04 GMT+01:00
Subject: TypeSerializerInputFormat cannot determine its type automatically
To: u...@flink.apache.org


I am trying to use the TypeSerializer IO formats to write temp data to
disk. A gist with a minimal example can be found here:

https://gist.github.com/aalexandrov/90bf21f66bf604676f37

However, with the current setting I get the following error with the
TypeSerializerInputFormat:

Exception in thread "main"
org.apache.flink.api.common.InvalidProgramException: The type returned by
the input format could not be automatically determined. Please specify the
TypeInformation of the produced type explicitly.
 at
org.apache.flink.api.java.ExecutionEnvironment.readFile(ExecutionEnvironment.java:341)
 at SerializedFormatExample$.main(SerializedFormatExample.scala:48)
 at SerializedFormatExample.main(SerializedFormatExample.scala)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)

I think that the typeInformation instance at line 43 should be somehow
passed to the TypeSerializerInputFormat, but I cannot find a way to do it.

Any suggestions?

Thanks,
A.





Semantic Properties and Functions with Iterables

2015-03-06 Thread Timo Walther

Hey all,

I'm currently working a lot on the UDF static code analyzer. But I have 
a general question about Semantic Properties which might be also 
interesting for other users.


How is the ForwardedFields annotation interpreted for UDF functions with 
Iterables?


An example can be found in: 
org.apache.flink.examples.java.graph.EnumTrianglesBasic.TriadBuilder


Does this mean that each call of collect() must happen in the same order 
as the calls of next()? But this is not the case in the example above. 
Or does the annotation only refer to the first iterator element?


Other examples:

@ForwardedFields("*") // CORRECT?
public static class GroupReduce1 implements
        GroupReduceFunction<Tuple2<Long, Long>, Tuple2<Long, Long>> {

    @Override
    public void reduce(Iterable<Tuple2<Long, Long>> values,
            Collector<Tuple2<Long, Long>> out) throws Exception {
        out.collect(values.iterator().next());
    }
}

@ForwardedFields("*") // NOT CORRECT?
public static class GroupReduce3 implements
        GroupReduceFunction<Tuple2<Long, Long>, Tuple2<Long, Long>> {

    @Override
    public void reduce(Iterable<Tuple2<Long, Long>> values,
            Collector<Tuple2<Long, Long>> out) throws Exception {
        Iterator<Tuple2<Long, Long>> it = values.iterator();
        while (it.hasNext()) {
            Tuple2<Long, Long> t = it.next();
            if (t.f0 == 42) {
                out.collect(t);
            }
        }
    }
}

Thanks in advance.

Regards,
Timo


Re: [VOTE] Name of Expression API Representation

2015-03-26 Thread Timo Walther

+Table API

Same thoughts as Stephan. "Table" is more common in industry than "Relation".

On 25.03.2015 21:30, Stephan Ewen wrote:

+Table API / Table

I have a feeling that Relation is a name mostly used by people with a
deeper background in (relational) databases, while table is more the
pragmatic developer term. (As a reason for my choice)
On 25.03.2015 20:37, Fabian Hueske fhue...@gmail.com wrote:


I think the voting scheme is clear.
The mail that started the thread says:

The name with the most votes is chosen.
If the vote ends with no name having the most votes, a new vote
with an alternative voting scheme will be done.

So let's go with a single vote and handle corner cases as they appear.

2015-03-25 20:24 GMT+01:00 Ufuk Celebi u...@apache.org:


+Table, DataTable

---

How are votes counted? When voting for the name of the project, we didn't
vote for one name, but gave a preference ordering.

In this case, I am for Table or DataTable, but what happens if I vote for
Table and then there is a tie between DataTable and Relation? Will Table
count for DataTable then?

– Ufuk

On 25 Mar 2015, at 18:33, Vasiliki Kalavri vasilikikala...@gmail.com
wrote:


+Relation
On Mar 25, 2015 6:29 PM, Henry Saputra henry.sapu...@gmail.com

wrote:

+Relation

PS
Aljoscha, don't forget to cast your own vote :)


On Wednesday, March 25, 2015, Aljoscha Krettek aljos...@apache.org
wrote:


Please vote on the new name of the equivalent to DataSet and
DataStream in the new expression-based API.

 From the previous discussion thread three names emerged: Relation,
Table and DataTable.

The vote is open for the next 72 hours.
The name with the most votes is chosen.
If the vote ends with no name having the most votes, a new vote
with an alternative voting scheme will be done.

Please vote either of these:

+Relation
+Table
+DataTable







Re: [DISCUSS] Make a release to be announced at ApacheCon

2015-03-26 Thread Timo Walther

+1 for a beta release. So there is no feature freeze until the RC, right?


On 26.03.2015 15:32, Márton Balassi wrote:

+1 for the early release.

I'd call it 0.9-milestone1.

On Thu, Mar 26, 2015 at 1:37 PM, Maximilian Michels m...@apache.org wrote:


+1 for a beta release: 0.9-beta.

On Thu, Mar 26, 2015 at 12:09 PM, Paris Carbone par...@kth.se wrote:


+1 for an early release. It will help unblock the samoa PR that has 0.9
dependencies.


On 26 Mar 2015, at 11:44, Kostas Tzoumas ktzou...@apache.org wrote:

+1 for an early milestone release. Perhaps we can call it 0.9-milestone or so?

On Thu, Mar 26, 2015 at 11:01 AM, Robert Metzger rmetz...@apache.org
wrote:


Two weeks have passed since we've discussed the 0.9 release the last time.
The ApacheCon is in 18 days from now.
If we want, we can also release a 0.9.0-beta release that contains known
bugs, but allows our users to try out the new features easily (because they
are part of a release). The vote for such a release would be mainly about
the legal aspects of the release rather than the stability. So I suspect
that the vote will go through much quicker.



On Fri, Mar 13, 2015 at 12:01 PM, Robert Metzger rmetz...@apache.org
wrote:


I've reopened https://issues.apache.org/jira/browse/FLINK-1650 because
the issue is still occurring.

On Thu, Mar 12, 2015 at 7:05 PM, Ufuk Celebi u...@apache.org wrote:


On Thursday, March 12, 2015, Till Rohrmann till.rohrm...@gmail.com
wrote:


Have you run the 20 builds with the new shading code? With new shading the
TaskManagerFailsITCase should no longer fail. If it still does, then we
have to look into it again.


No, rebased on Monday before shading. Let me rebase and rerun tonight.








Re: Merge Python API

2015-04-20 Thread Timo Walther

+1

On 20.04.2015 14:49, Gyula Fóra wrote:

+1

On Mon, Apr 20, 2015 at 2:41 PM, Fabian Hueske fhue...@gmail.com wrote:


+1

2015-04-20 14:39 GMT+02:00 Maximilian Michels m...@apache.org:


+1 Let's merge it to flink-staging and get some people to use it.

On Mon, Apr 20, 2015 at 2:21 PM, Kostas Tzoumas ktzou...@apache.org
wrote:


I'm +1 for this

On Mon, Apr 20, 2015 at 11:03 AM, Robert Metzger rmetz...@apache.org
wrote:


Hi,

The Python API pull request [1] has been open for quite some time now.
I was wondering whether we are planning to merge it or not.
I took a closer look at the Python API a few weeks ago and I think we
should merge it to expose it to our users to collect feedback.
I hope by merging it, we'll find additional contributors for it and we
get more feedback.

Since it will be located in the flink-staging module and we'll mark it as
a beta component, there is not much risk that we break any existing code.

Please give me some +1's if you want to merge the Python API PR.
I'd like to merge it in the next 24 to 48 hours, depending on the feedback
I'm getting in this thread here.




[1] https://github.com/apache/flink/pull/202





Re: New project website

2015-05-11 Thread Timo Walther
I completely agree with Robert. Less marketing, more technical 
information targeted to the user group.


I think the diagram could be a little bit larger and the blog post 
smaller, so that the lines meet in the middle ;)


On 11.05.2015 14:39, Robert Metzger wrote:

I think it's fine when the front page looks a little bit more like
documentation.
Flink is no fancy hipster app, our users are either sysops running it on a
cluster or developers programming against APIs.
I think the new website will convince this target audience.

I agree with you that aligning the vertical separators might help visual
structure of the page. I think we only need to center the vertical
separator between the getting started and the blog/community section.
The positioning of the first line is fine, in my opinion


On Mon, May 11, 2015 at 2:27 PM, Felix Neutatz neut...@googlemail.com
wrote:


Hi Ufuk,

I really like the idea of redesigning the start page. But in my opinion
your page design looks more like a documentation webpage than a starting
page.

In my personal opinion I like the current design better, since you get a
really quick overview with many fancy pictures. (So if you wanna dive in
deep, you can check out things in the documentation or/and wiki).

Moreover, I have a design issue: can you align the structure so that there
aren't as many overlaps? Here you can see what I mean:
http://postimg.org/image/paogu6f3h/

This is only my personal opinion, maybe I have a wrong idea about starting
pages.

Best regards,
Felix


2015-05-11 14:06 GMT+02:00 Hermann Gábor reckone...@gmail.com:


Great!
It looks way better than the current site :)

On Mon, May 11, 2015 at 1:28 PM Stephan Ewen se...@apache.org wrote:


I think we may have to remove the term Language Integrated Queries, I
think that is trademarked by Microsoft.

Otherwise, +1

On Mon, May 11, 2015 at 1:19 PM, Maximilian Michels m...@apache.org
wrote:


+1 very nice comprehensive overhaul of the website. I'd suggest to merge
this as soon as possible. We can incrementally fix the remaining issues and
add a twitter feed to the front page.

On Mon, May 11, 2015 at 12:09 PM, Ted Dunning ted.dunn...@gmail.com
wrote:


If there is an active twitter entity for Flink, I recommend framing that on
the home page.



On Mon, May 11, 2015 at 8:51 AM, Ufuk Celebi u...@apache.org

wrote:

Hey all,

I reworked the project website over the last couple of days and would like to
share the preview:

http://uce.github.io/flink-web/

I would like to get this in asap. We can push incremental updates at any
time, but I think this version is a big improvement over the current status
quo. If I get some +1s I'll go ahead and update the website today.

– Ufuk




Re: About Operator and OperatorBase

2015-04-16 Thread Timo Walther

I share Stephan's opinion.

By the way, we could also find a common name for operators with two 
inputs. Sometimes it's "TwoInputXXX", sometimes "DualInputXXX" or 
"BinaryInputXXX"... pretty inconsistent.


On 15.04.2015 17:48, Till Rohrmann wrote:

I would also be in favour of making the distinction between the API and
common API layer more clear by using different names. This will ease the
understanding of the source code.

In the wake of a possible renaming we could also get rid of the legacy code
org.apache.flink.optimizer.dag.MatchNode and
rename org.apache.flink.runtime.operators.MatchDriver into JoinDriver to
make the naming more consistent.

On Wed, Apr 15, 2015 at 3:05 PM, Ufuk Celebi u...@apache.org wrote:


On 15 Apr 2015, at 15:01, Stephan Ewen se...@apache.org wrote:


I think we can rename the base operators.

Renaming the subclass of DataSet would be extremely api breaking. I think
that is not worth it.

Oh, that's right. We return MapOperator for DataSet operations. Stephan's
point makes sense.




Re: SQL on Flink

2015-05-27 Thread Timo Walther

It's passion for the future of the project rather than passion for SQL ;-)

I always try to think like someone from industry. And IMO people in 
industry are still thinking in SQL. If you want to persuade someone 
coming from the SQL world, you should offer a SQL interface to run 
legacy code first (similar to the Hadoop operators). Rewriting old queries 
in the Table API is not very convenient.


I share Stephan's opinion. Building both APIs concurrently would act as a 
good source to test and extend the Table API. Currently, the Table API 
is half-done, but I think the goal is to have SQL functionality. I can 
implement an SQL operator and extend the Table API where functionality is 
missing.


On 27.05.2015 16:41, Fabian Hueske wrote:

+1 for committer passion!

Please don't get me wrong, I think SQL on Flink would be a great feature.
I just wanted to make the point that the Table API needs to mirror all SQL
features, if SQL is implemented on top of the Table API.


2015-05-27 16:34 GMT+02:00 Kostas Tzoumas ktzou...@apache.org:


I think Fabian's arguments make a lot of sense.

However, if Timo *really wants* to start SQL on top of Table, that is what
he will do a great job at :-) As usual, we can keep it in beta status in
flink-staging until it is mature... and it will help create issues for the
Table API and give direction to its development. Perhaps we will have a
feature-poor SQL for a bit, then switch to hardening the Table API to
support more features and then back to SQL.

I'm just advocating for committer passion-first here :-) Perhaps Timo
should weigh in

On Wed, May 27, 2015 at 4:19 PM, Fabian Hueske fhue...@gmail.com wrote:


IMO, it is better to have one feature that is reasonably well developed
instead of two half-baked features. That's why I proposed to advance the
Table API a bit further before starting the next big thing. I played around
with the Table API recently and I think it definitely needs a bit more
contributor attention and more features to be actually usable. Also since
all features of the SQL interface need to be included in the Table API
(given we follow the SQL on Table approach) it makes sense IMO to push the
Table API a bit further before going for the next thing.

2015-05-27 16:06 GMT+02:00 Stephan Ewen se...@apache.org:


I see no reason why a SQL interface cannot be bootstrapped concurrently.
It would initially not support many operations,
but would act as a good source to test and drive functionality from the
Table API.


@Ted:

I would like to learn a bit more about the stack and internal abstractions
of Drill. It may make sense to reuse some of the query execution operators
from Drill. I especially like the "learning schema on the fly" part of Drill.

Flink DataSets and Streams have a schema, but it may in several cases be a
schema lower bound, like the greatest common superclass.
Those cases may benefit big time from Drill's ability to refine schema on
the fly.

That may be useful also in the Table API, making it again available to
LINQ-like programs, and SQL scripts.

On Wed, May 27, 2015 at 3:49 PM, Robert Metzger rmetz...@apache.org
wrote:


I didn't know that paper...  Thanks for sharing.

I've worked on a SQL layer for Stratosphere some time ago, using Apache
Calcite (called Optiq back then). I think the project provides a lot of
very good tooling for creating a SQL layer. So if we decide to go for SQL
on Flink, I would suggest to use Calcite.
I can also help you a bit with Calcite to get started with it.

I agree with Fabian that it would probably make more sense for now to
enhance the Table API.
I think the biggest limitation right now is that it only supports POJOs.
We should also support Tuples (I know that's difficult to do), data from
HCatalog (that includes Parquet & ORC), JSON, ...
Then, I would add filter and projection pushdown into the Table API.



On Tue, May 26, 2015 at 10:03 PM, Ted Dunning ted.dunn...@gmail.com
wrote:


It would also be relatively simple (I think) to retarget Drill to Flink if
Flink doesn't provide enough typing metadata to do traditional SQL.



On Tue, May 26, 2015 at 12:52 PM, Fabian Hueske fhue...@gmail.com

wrote:

Hi,

Flink's Table API is pretty close to what SQL provides. IMO, the best
approach would be to leverage that and build a SQL parser (maybe together
with a logical optimizer) on top of the Table API. The parser (and optimizer)
could be built using Apache Calcite, which provides exactly this.

Since the Table API is still a fairly new component and not very
feature-rich, it might make sense to extend and strengthen it before
putting something major on top.

Cheers, Fabian

2015-05-26 21:38 GMT+02:00 Timo Walther twal...@apache.org:


Hey everyone,

I would be interested in having a complete SQL API in Flink. How is the
status there? Is someone already working on it? If not, I would like to
work on it. I found http://ijcsi.org/papers/IJCSI-12-1-1-169-174.pdf but I

Re: Function input type validation

2015-11-08 Thread Timo Walther
The reason for input validation is to check whether the Function is fully 
compatible. Actually only the return types are necessary, but it 
prevents stupid implementation mistakes and undesired behavior.


E.g. if you implement a "class MyMapper extends MapFunction{}" whose input type does not match and use it for "env.fromElements(1,2,3).map(new MyMapper())", the validation catches the mismatch before the job even runs.
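
A hedged sketch of such a mistake; the concrete types are illustrative, since
the generic parameters of the original example were lost in the archive:

import org.apache.flink.api.common.functions.MapFunction;

// MyMapper declares String input, but the elements below are Integers;
// input type validation rejects this during pre-flight instead of at runtime.
public class MyMapper implements MapFunction<String, String> {
    @Override
    public String map(String value) {
        return value.toUpperCase();
    }
}

// usage that the validation catches:
//   env.fromElements(1, 2, 3).map(new MyMapper());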


Regards,
Timo

On 08.11.2015 23:27, Chesnay Schepler wrote:

On 08.11.2015 21:28, Gyula Fóra wrote:

Let's say I want to implement my own TupleTypeinfo that handles null
values, and I pass this typeinfo in the returns call of an operation. 
This
will most likely fail when the next operation validates the input 
although

I think it shouldn't.
So I just tried this: removed the final modifier from TupleTypeInfo, 
created one that extended TupleTypeInfo, threw a returns() call into the 
ConnectedComponents example, and it passed the input validation.


Unless I misunderstood something, your premise is invalid.




Re: Error in during TypeExtraction

2015-11-12 Thread Timo Walther
This looks like a bug. Can you open an issue for that? I will look into 
it later.


Regards,
Timo


On 12.11.2015 13:16, Gyula Fóra wrote:

Hey,

I get a weird error when I try to execute my job on the cluster. Locally
this works fine but running it from the command line fails during
typeextraction:

input1.union(input2, input3).map(Either::Left).returns(eventOrLongType);

This fails when trying to extract the output type from the mapper, but I
wouldn't actually care about that because I am providing my custom type
implementation for this Either type.

The error:
org.apache.flink.client.program.ProgramInvocationException: The main method
caused an error.
at
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:512)
at
org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:395)
at org.apache.flink.client.program.Client.runBlocking(Client.java:250)
at
org.apache.flink.client.CliFrontend.executeProgramBlocking(CliFrontend.java:669)
at org.apache.flink.client.CliFrontend.run(CliFrontend.java:320)
at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:971)
at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1021)
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
at java.util.ArrayList.elementData(ArrayList.java:418)
at java.util.ArrayList.get(ArrayList.java:431)
at
org.apache.flink.api.java.typeutils.TypeExtractor.createTypeInfoFromInputs(TypeExtractor.java:599)
at
org.apache.flink.api.java.typeutils.TypeExtractor.createTypeInfoWithTypeHierarchy(TypeExtractor.java:493)
at
org.apache.flink.api.java.typeutils.TypeExtractor.analyzePojo(TypeExtractor.java:1392)
at
org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1273)
at
org.apache.flink.api.java.typeutils.TypeExtractor.createTypeInfoWithTypeHierarchy(TypeExtractor.java:560)
at
org.apache.flink.api.java.typeutils.TypeExtractor.privateCreateTypeInfo(TypeExtractor.java:389)
at
org.apache.flink.api.java.typeutils.TypeExtractor.getUnaryOperatorReturnType(TypeExtractor.java:273)
at
org.apache.flink.api.java.typeutils.TypeExtractor.getMapReturnTypes(TypeExtractor.java:110)
at
org.apache.flink.streaming.api.datastream.DataStream.map(DataStream.java:550)

Any ideas?

Gyula





Re: [DISCUSSION] Type hints versus TypeInfoParser

2015-11-11 Thread Timo Walther

+1

It's still hacky but we don't have better alternatives.

I'm not 100% sure if we can get rid of the parser. I think it's still a 
nice way to quickly define the fields of a POJO if the type extractor 
fails to analyze it. But actually I don't know an example where it fails.
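
For comparison, a short hedged sketch of the two styles; "words" stands for
some DataSet<String>, and the TypeHint variant is written the way the
FLINK-2788 proposal was later implemented:

// String-based hint, handled by the TypeInfoParser:
words.map(w -> new Tuple2<>(w, 1))
     .returns("Tuple2<String, Integer>");

// Type-safe hint via an anonymous TypeHint subclass that captures the generics:
words.map(w -> new Tuple2<>(w, 1))
     .returns(new TypeHint<Tuple2<String, Integer>>() {});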


Regards,
Timo

On 11.11.2015 19:56, Aljoscha Krettek wrote:

Big +1

Of course, we had the initial talk about it… :D

On 11 Nov 2015, at 19:33, Kirschnick, Johannes 
<johannes.kirschn...@tu-berlin.de> wrote:

Hi Stephan,

looking at the TypeHint, I was reminded of how Gson (a JSON parser) handles 
this situation of parsing generics.

See here for an overview
https://sites.google.com/site/gson/gson-user-guide#TOC-Serializing-and-Deserializing-Generic-Types

Seems like this method was rediscovered :) And maybe there are some tricks that 
can be learned from the implementation

I'm all in favor for "hard" types over string literals.

Johannes

P.S.
Apparently GWT uses the same "trick" to handle generics ...

-----Original Message-----
From: ewenstep...@gmail.com [mailto:ewenstep...@gmail.com] On Behalf Of 
Stephan Ewen
Sent: Wednesday, November 11, 2015 15:53
To: dev@flink.apache.org
Subject: [DISCUSSION] Type hints versus TypeInfoParser

Hi all!

We discovered a nice way to give TypeHints in the Google Cloud Dataflow SDK, in 
a way that would fit Flink perfectly. I created a JIRA for that:
https://issues.apache.org/jira/browse/FLINK-2788

Since this is more powerful and type safe than the String/Parser way of giving 
hints, I was wondering whether we should add this and deprecate the String 
variant. If we do that, 1.0 is the time to do that.

What do you think about this idea?

@Timo Walther Since you worked a lot on types/parser/etc - what is your take on 
this?

Greetings,
Stephan




Re: Powered by Flink

2015-10-19 Thread Timo Walther

+1 for adding it to the website instead of the wiki.
"Who is using Flink?" is always a difficult question to answer for 
interested users.


On 19.10.2015 15:08, Suneel Marthi wrote:

+1 to this.

On Mon, Oct 19, 2015 at 3:00 PM, Fabian Hueske > wrote:


Sounds good +1

2015-10-19 14:57 GMT+02:00 Márton Balassi
>:

> Thanks for starting and big +1 for making it more prominent.
>
> On Mon, Oct 19, 2015 at 2:53 PM, Fabian Hueske
> wrote:
>
>> Thanks for starting this Kostas.
>>
>> I think the list is quite hidden in the wiki. Should we link from
>> flink.apache.org  to that page?
>>
>> Cheers, Fabian
>>
>> 2015-10-19 14:50 GMT+02:00 Kostas Tzoumas >:
>>
>>> Hi everyone,
>>>
>>> I started a "Powered by Flink" wiki page, listing some of the
>>> organizations that are using Flink:
>>>
>>> https://cwiki.apache.org/confluence/display/FLINK/Powered+by+Flink
>>>
>>> If you would like to be added to the list, just send me a
short email
>>> with your organization's name and a description and I will add
you to the
>>> wiki page.
>>>
>>> Best,
>>> Kostas
>>>
>>
>>
>






Re: Powered by Flink

2015-10-19 Thread Timo Walther

Ah ok, sorry. I think linking to the wiki is also ok.

On 19.10.2015 15:18, Fabian Hueske wrote:

@Timo: The proposal was to keep the list in the wiki (can be easily
extended) but link from the main website to the wiki page.

2015-10-19 15:16 GMT+02:00 Timo Walther <twal...@apache.org>:


+1 for adding it to the website instead of the wiki.
"Who is using Flink?" is always a difficult question to answer for
interested users.


On 19.10.2015 15:08, Suneel Marthi wrote:

+1 to this.

On Mon, Oct 19, 2015 at 3:00 PM, Fabian Hueske <fhue...@gmail.com> wrote:


Sounds good +1

2015-10-19 14:57 GMT+02:00 Márton Balassi < <balassi.mar...@gmail.com>
balassi.mar...@gmail.com>:


Thanks for starting and big +1 for making it more prominent.

On Mon, Oct 19, 2015 at 2:53 PM, Fabian Hueske < <fhue...@gmail.com>

fhue...@gmail.com> wrote:

Thanks for starting this Kostas.

I think the list is quite hidden in the wiki. Should we link from
flink.apache.org to that page?

Cheers, Fabian

2015-10-19 14:50 GMT+02:00 Kostas Tzoumas < <ktzou...@apache.org>

ktzou...@apache.org>:

Hi everyone,

I started a "Powered by Flink" wiki page, listing some of the
organizations that are using Flink:

https://cwiki.apache.org/confluence/display/FLINK/Powered+by+Flink

If you would like to be added to the list, just send me a short email
with your organization's name and a description and I will add you to

the

wiki page.

Best,
Kostas










Add BigDecimal and BigInteger as types

2015-11-18 Thread Timo Walther

Hey everyone,

I'm not sure if we already had a discussion about it but as we are 
currently adding new types like the Either type, I would like to discuss 
it again. I think especially for business or scientific applications it 
makes sense to support the BigInteger and BigDecimal types natively. In 
my opinion they are as important as Date or Void and should be added as 
BasicTypes. I need them for the SQL prototype (FLINK-2099) but I think 
people working with the Table API or Java/Scala API would also benefit 
from it.


What do you think?

Regards,
Timo


Re: [DISCUSSION] Type hints versus TypeInfoParser

2015-11-18 Thread Timo Walther
If the TypeExtractor is not able to handle the fields of a Pojo 
correctly, the String parser is quite useful to say 

Re: [ANNOUNCE] Welcome Matthias Sax as new committer

2015-09-02 Thread Timo Walther

Congratulations Matthias!

Regards,
Timo

On 02.09.2015 13:32, Chiwan Park wrote:

Welcome Matthias! :)

Regards,
Chiwan Park


On Sep 2, 2015, at 8:30 PM, Kostas Tzoumas  wrote:

The Project Management Committee (PMC) of Apache Flink has asked Matthias
Sax to become a committer, and we are pleased to announce that he has
accepted.

Matthias has been very active with Flink, and he is the original
contributor of the Storm compatibility functionality.

Being a committer enables easier contribution to the project since there is no
need to go via the pull request submission process. This should enable better
productivity. Being a PMC member enables assistance with the management and
to guide the direction of the project.

Please join me in welcoming Matthias as a new committer!




Re: Either not NotSerializableException and InvalidTypesException

2015-11-29 Thread Timo Walther

Hi Vasia,

regarding your TypeExtractor problem: the TypeExtractor works correctly. 
The with() function of the JoinOperator calls the wrong TypeExtractor 
method, one that does not allow missing type info. This is a bug. Can you 
open an issue for that?


Regards,
Timo

On 28.11.2015 20:18, Vasiliki Kalavri wrote:

Hi squirrels,

I have 2 problems with the new Either type and I could use your help to
understand them.

1. I have a piece of code that looks like this:

TypeInformation>> workSetTypeInfo = ...
DataSet>> initialWorkSet =
initialVertices.map(...).returns(workSetTypeInfo);

This gives me the following exception:

Exception in thread "main"
org.apache.flink.api.common.InvalidProgramException: Object
org.apache.flink.graph.spargelnew.MessagePassingIteration$InitializeWorkSet@75ba8574
not serializable

at
org.apache.flink.api.java.ClosureCleaner.ensureSerializable(ClosureCleaner.java:97)
at org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:59)
at org.apache.flink.api.java.DataSet.clean(DataSet.java:184)
at org.apache.flink.api.java.DataSet.map(DataSet.java:214)
at
org.apache.flink.graph.spargelnew.MessagePassingIteration.createResult(MessagePassingIteration.java:160)
at org.apache.flink.api.java.DataSet.runOperation(DataSet.java:1190)
at org.apache.flink.graph.Graph.runMessagePassingIteration(Graph.java:1650)
at org.apache.flink.graph.spargelnew.example.PregelCC.main(PregelCC.java:53)

Caused by: java.io.NotSerializableException:
org.apache.flink.api.java.typeutils.Either$Left
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
at
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
at
org.apache.flink.util.InstantiationUtil.serializeObject(InstantiationUtil.java:307)
at
org.apache.flink.api.java.ClosureCleaner.ensureSerializable(ClosureCleaner.java:95)

Making Either implement java.io.Serializable
solves this, but I am wondering why this is needed. Since I'm registering
the typeinfo with returns(), shouldn't the EitherTypeSerializer be
registered too? Also, this seem to be the only operation where I get this
error, even though I'm using the Either type in other places as well.


2. The second problem appeared after rebasing to the current master,
containing a fix for FLINK-3046 (Integrate the Either Java type with the
TypeExtractor).

Exception in thread "main"
org.apache.flink.api.common.functions.InvalidTypesException: Type of
TypeVariable 'Message' in 'class
org.apache.flink.graph.spargelnew.MessagePassingIteration$AppendVertexState'
could not be determined. This is most likely a type erasure problem. The
type extraction currently supports types with generic variables only in
cases where all variables in the return type can be deduced from the input
type(s).

at
org.apache.flink.api.java.typeutils.TypeExtractor.createSubTypesInfo(TypeExtractor.java:706)
at
org.apache.flink.api.java.typeutils.TypeExtractor.createTypeInfoWithTypeHierarchy(TypeExtractor.java:458)
at
org.apache.flink.api.java.typeutils.TypeExtractor.createSubTypesInfo(TypeExtractor.java:713)
at
org.apache.flink.api.java.typeutils.TypeExtractor.createTypeInfoWithTypeHierarchy(TypeExtractor.java:425)
at
org.apache.flink.api.java.typeutils.TypeExtractor.privateCreateTypeInfo(TypeExtractor.java:380)
at
org.apache.flink.api.java.typeutils.TypeExtractor.getBinaryOperatorReturnType(TypeExtractor.java:320)
at
org.apache.flink.api.java.typeutils.TypeExtractor.getFlatJoinReturnTypes(TypeExtractor.java:176)
at
org.apache.flink.api.java.typeutils.TypeExtractor.getFlatJoinReturnTypes(TypeExtractor.java:170)
at
org.apache.flink.api.java.operators.JoinOperator$DefaultJoin.with(JoinOperator.java:562)
at
org.apache.flink.graph.spargelnew.MessagePassingIteration.createResult(MessagePassingIteration.java:171)
at org.apache.flink.api.java.DataSet.runOperation(DataSet.java:1190)
at org.apache.flink.graph.Graph.runMessagePassingIteration(Graph.java:1650)
at org.apache.flink.graph.spargelnew.example.PregelCC.main(PregelCC.java:53)


The code giving this exception is the following:

DataSet, Either>> verticesWithMsgs =
    iteration.getSolutionSet().join(iteration.getWorkset())
        .where(0).equalTo(0)
        .with(new AppendVertexState())
        .returns(new TupleTypeInfo, Either>>(vertexType, nullableMsgTypeInfo));

Do I need to register the Either typeinfo differently now that it's
integrated with the TypeExtractor, or is this a bug?

If you want to see the complete code, I've pushed it here:

Re: The null in Flink

2015-11-25 Thread Timo Walther

Hi Chengxiang,

I totally agree that the Table API should fully support NULL values. The 
Table API is a logical API and therefore we should be as close to ANSI 
SQL as possible. Rows need to be nullable in the near future.


2. i, ii, iii and iv sound reasonable. But v, vi and vii sound too much 
like SQL magic. I think all other SQL magic (DBMS-specific corner cases) 
should be handled by the SQL API on top of the Table API.


Regards,
Timo


On 25.11.2015 11:31, Li, Chengxiang wrote:

Hi
On this mailing list, there are some discussions about null value handling in Flink, and I saw several 
related JIRAs as well (like FLINK-2203, FLINK-2210), but unfortunately they got reverted due to immature 
design, and there has been no further action since then. I would like to pick this topic up here, as it's quite an 
important part of data analysis and many features depend on it. Hopefully, through a plenary 
discussion, we can generate an acceptable solution and move forward. Stephan has explained very 
clearly how and why Flink handles "Null values in the Programming Language APIs", so 
I mainly talk about the second part, "Null values in the high-level (logical) APIs".

1. Why should Flink support Null values handling in Table API?
i.  Data sources may miss column values in many cases; without NULL value 
handling in the Table API, users need to write an extra ETL step to handle 
missing values manually.
ii. Some Table API operators generate NULL values on their own, like 
Outer Join/Cube/Rollup/Grouping Set, and so on. NULL value handling in the 
Table API is a prerequisite for these features.

2. The semantic of Null value handling in Table API.
Fortunately, there are already mature DBMS standards we can follow for NULL 
value handling. I list several semantics of NULL value handling here. Note 
that this may not cover all cases, and the semantics may vary in different 
DBMSs, so it should be totally open to discussion.
i.   NULL comparison: in ascending order, NULL is smaller than any other 
value, and NULL == NULL returns false.
ii.  NULL in a GroupBy key: all NULL values are grouped as a single group.
iii. NULL in aggregate columns: ignore NULL in the aggregation function.
iv.  NULL on both sides of a join key: per #i, NULL == NULL returns false, 
so there is no output for NULL join keys.
v.   NULL in a scalar expression: an expression involving NULL (e.g. 1 + 
NULL) returns NULL.
vi.  NULL in a Boolean expression: add an extra result, UNKNOWN; more 
semantics for Boolean expressions in reference #1.
vii. More related function support, like COALESCE, NVL, NANVL, and so on.

3. NULL value storage in Table API.
   Just set null as the Row field value. To mark a NULL value in serialized 
binary record data, one normally uses an extra flag per field to mark whether 
its value is NULL, which would change the data layout of the Row object. So 
any logic that accesses serialized Row data directly should be updated to 
match the new data layout, for example many methods in RowComparator.

Reference:
1. Nulls: Nothing to worry about: 
http://www.oracle.com/technetwork/issue-archive/2005/05-jul/o45sql-097727.html.
2. Null related functions: 
https://oracle-base.com/articles/misc/null-related-functions

-Original Message-
From: ewenstep...@gmail.com [mailto:ewenstep...@gmail.com] On Behalf Of Stephan 
Ewen
Sent: Thursday, June 18, 2015 8:43 AM
To: dev@flink.apache.org
Subject: Re: The null in Flink

Hi!

I think we actually have two discussions here, both of them important:

--
1) Null values in the Programming Language APIs
--

Fields in composite types may simply be null pointers.

In object types:
   - primitives members are naturally non-nullable
   - all other members are nullable

=> If you want to avoid the overhead of nullability, go with primitive types.

In Tuples, and derives types (Scala case classes):
   - Fields are non-nullable.

=> The reason here is that we initially decided to keep tuples as a very fast 
data type. Because tuples cannot hold primitives in Java/Scala, we would not have 
a way to make fast non-nullable fields. The performance of nullable fields affects 
the key-operations, especially on normalized keys.
We can work around that with some effort, but have not done it so far.

=> In Scala, the Option types is a natural way of elegantly working around that.


--
2) Null values in the high-level (logial) APIs
--

This is mainly what Ted was referring to, if I understood him correctly.

Here, we need to figure out what form of semantical null values in the Table 
API and later, in SQL.

Besides deciding what semantics to follow here in the logical APIs, we need to 
decide what these values confert 

Re: withParameters() for Streaming API

2015-11-24 Thread Timo Walther

Thanks for the hint Matthias.
So actually the parameter of the open() method is useless? IMHO that 
does not look like a nice API design...

We should try to keep DataSet and DataStream API in sync.
Does it make sense to deprecate withParameters() for 1.0?

Timo

On 24.11.2015 14:31, Matthias J. Sax wrote:

We had this discussion a while ago.

If I recall correctly, "withParameters()" is not encourage to be used in
DataSet either.

This is the thread:
https://mail-archives.apache.org/mod_mbox/flink-dev/201509.mbox/%3C55EC69CD.1070003%40apache.org%3E

-Matthias

On 11/24/2015 02:14 PM, Timo Walther wrote:

Hi all,

I want to set the Configuration of a streaming operator and access it
via the open method of the RichFunction.
There is no possibility to set the Configuration of the open method at
the moment, right? Can I open an issue for a withParameters() equivalent
for the Streaming API?

Regards,
Timo





withParameters() for Streaming API

2015-11-24 Thread Timo Walther

Hi all,

I want to set the Configuration of a streaming operator and access it 
via the open method of the RichFunction.
There is no possibility to set the Configuration of the open method at 
the moment, right? Can I open an issue for a withParameters() equivalent 
for the Streaming API?
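
For reference, a hedged sketch of the DataSet API pattern being discussed,
i.e. the equivalent that is missing on the streaming side:

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;

public class WithParametersExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        Configuration config = new Configuration();
        config.setInteger("multiplier", 10);

        DataSet<Integer> scaled = env.fromElements(1, 2, 3)
                .map(new RichMapFunction<Integer, Integer>() {
                    private int multiplier;

                    @Override
                    public void open(Configuration parameters) {
                        // parameters is the Configuration passed via withParameters()
                        multiplier = parameters.getInteger("multiplier", 1);
                    }

                    @Override
                    public Integer map(Integer value) {
                        return value * multiplier;
                    }
                })
                .withParameters(config);

        scaled.print();
    }
}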


Regards,
Timo


Re: [DISCUSSION] Type hints versus TypeInfoParser

2015-11-19 Thread Timo Walther
All that you have mentioned is implemented in the TypeExtractor. I just 
mean corner cases e.g. if you have a POJO


public class MyPojo {
public Object field1;
public Object field2;
public Tuple2 field3;
}

where the TypeExtractor cannot analyze anything. Then you may want to 
provide the TypeInfo manually. The TypeInfoParser makes it easy to specify 
the types of the fields of POJOs manually (but only as an internal 
feature). But as I said, this is just a corner case.


Timo


On 18.11.2015 18:43, Stephan Ewen wrote:

I think the TypeHints case can cover this:

public class MyPojo<T, R> {
 public T field1;
 public R field2;
}

If you say '.returns(new TypeHint<MyPojo<String, Double>>() {})' this
creates an anonymous subclass of the TypeHint, which has the types that T
and R bind to, which allows one to construct the POJO type info properly.
(Not sure if all that is implemented in the TypeExtractor, though).

What do you think?

Stephan




On Wed, Nov 18, 2015 at 6:03 PM, Timo Walther <twal...@apache.org> wrote:


If the TypeExtractor is not able to handle the fields of a Pojo correctly,
the String parser is quite useful to say
"org.my.Pojo

Re: Add BigDecimal and BigInteger as types

2015-11-19 Thread Timo Walther
I could imagine that some applications also want to group or join by a 
BigInteger or sort by BigDecimal. All DBMSs support these types by default.

I'm not from the industry but there is a need for that I think.

On 18.11.2015 18:21, Stephan Ewen wrote:

I agree that they are important.

They are currently generic types and handled by Kryo, which has (AFAIK)
proper serializers for them. Are there more benefits of native support
(other than more compact serialization) that you are thinking of?

On Wed, Nov 18, 2015 at 5:55 PM, Timo Walther <twal...@apache.org> wrote:


Hey everyone,

I'm not sure if we already had a discussion about it but as we are
currently adding new types like the Either type, I would like to discuss it
again. I think especially for business or scientific applications it makes
sense to support the BigInteger and BigDecimal types natively. In my
opinion they are as important as Date or Void and should be added as
BasicTypes. I need them for the SQL prototype (FLINK-2099) but I think
people working with the Table API or Java/Scala API would also benefit from
it.

What do you think?

Regards,
Timo





Re: primitiveDefaultValue in CodeGenUtils in Table API

2016-06-29 Thread Timo Walther

Hi Cody,

default values are needed in cases where NULL values are not supported. 
This happens if the null check is disabled in TableConfig for efficiency 
reasons. Using 0 to DataType.MAX_VALUE for numeric types and -1 as a 
NULL equivalent in special cases seems more reasonable to me.
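
For reference, a hedged sketch of where that switch lives, assuming the
TableConfig setter of that time (method and package names may differ between
versions):

// With the null check disabled, the generated code falls back to the
// primitive default values discussed above instead of handling NULL.
StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);
tableEnv.getConfig().setNullCheck(false);   // trade NULL support for efficiency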


Hope that helps.

Timo

On 29.06.2016 05:24, Cody Innowhere wrote:

Hi guys,
I found that in CodeGenUtils, default values of numeric primitive types are
set to -1, what's the consideration of setting the default values to -1
instead of 0? IMHO 0 would make more sense, although in DB if a field is
null then all operations on this field will return null anyway.



Re: Table API / SQL Queries and Code Generation

2016-04-05 Thread Timo Walther

Hi Gábor,

the code generation in the Table API is at a very early stage and does not 
contain much optimization logic so far. Currently we are extending the 
functionality to support the most important SQL operations. It will take 
some time until we can further optimize the generated code (e.g. for 
tracking nulls).


We are using the Janino compiler [1] for in-memory compilation and class 
loading. You can have a look at the code generation here [2].
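
For readers unfamiliar with Janino, a small hedged sketch of the in-memory
compile-and-load pattern, independent of the code the Table API actually
generates:

import org.codehaus.janino.SimpleCompiler;

public class JaninoDemo {
    public static void main(String[] args) throws Exception {
        String source =
                "public class GeneratedFunction {\n" +
                "  public int eval(int a, int b) { return a + b; }\n" +
                "}\n";

        SimpleCompiler compiler = new SimpleCompiler();
        compiler.cook(source);   // compile the source string in memory

        Class<?> clazz = compiler.getClassLoader().loadClass("GeneratedFunction");
        Object instance = clazz.getDeclaredConstructor().newInstance();
        Object result = clazz.getMethod("eval", int.class, int.class).invoke(instance, 40, 2);
        System.out.println(result);   // 42
    }
}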


Regards,
Timo

[1] http://unkrig.de/w/Janino
[2] 
https://github.com/apache/flink/blob/master/flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/codegen/CodeGenerator.scala



On 05.04.2016 16:25, Gábor Horváth wrote:

Hi!

During this summer I plan to introduce runtime code generation in the
serializers [1]
to improve the performance of Flink.

As Stephan Ewen pointed in Table API / SQL effort code generation will also
be used to
generate functions and data types. In order to share as much code as
possible and
align the code generation efforts I would like to cooperate with the
authors of that project.
Who are they, what libraries and approach are they planning to use? Is
there a design
document or a requirement list somewhere?

I know about one document [2], but that did not contain the answers I was
looking for.

Thanks in advance,
Gábor Horváth

[1]
https://docs.google.com/document/d/1VC8lCeErx9kI5lCMPiUn625PO0rxR-iKlVqtt3hkVnk
[2]
https://docs.google.com/document/d/1sITIShmJMGegzAjGqFuwiN_iw1urwykKsLiacokxSw0





Re: Input type validation is killing me

2016-03-02 Thread Timo Walther
Can you open an issue with an example of your custom TypeInfo? I will 
then open a suitable PR for it.



On 02.03.2016 15:33, Gyula Fóra wrote:

Would that work with generic classes?

Timo Walther <twal...@apache.org> ezt írta (időpont: 2016. márc. 2., Sze,
15:22):


After thinking about it, I think an even better solution is to provide
an interface for the TypeExtractor where the user can register mappings
from class to TypeInformation.
So the TypeExtractor becomes more extensible. This would also solve your
problem. What do you think?

On 02.03.2016 15:00, Gyula Fóra wrote:

Hi!

Yes I think, that sounds good :) We just need to make sure that this

works

with things like the TupleTypeInfo which is built-in but I can still mix

in

new Types for the fields.

   Thanks,
Gyula

Timo Walther <twal...@apache.org> ezt írta (időpont: 2016. márc. 2.,

Sze,

14:02):


The TypeExtractor's input type validation was designed for the built-in
TypeInformation classes.

In your case of a new, unknown TypeInformation, the validation should
simply be skipped, because we can assume that the user knows what they are
doing.
doing.

I can open a PR for that.


On 02.03.2016 11:34, Aljoscha Krettek wrote:

I think you have a point. Another user also just ran into problems with
the TypeExtractor. (The “Java Maps and TypeInformation” email).
So let’s figure out what needs to be changed to make it work for all people.

Cheers,
Aljoscha

On 02 Mar 2016, at 11:15, Gyula Fóra <gyf...@apache.org> wrote:

Hey,

I have brought up this issue a couple months back but I would like to

do it

again.

I think the current way of validating the input type of udfs against

the

out type of the preceeding operators is too aggressive and breaks a

lot

of

code that should otherwise work.

This issue appears all the time when I want to use my own
TypeInformations<> for operators such as creating my own Tuple

typeinfos

with custom types for the different fields and so.

I have a more complex streaming job which would not run if I have the

input

type validation. Replacing the Exceptions with logging my Job runs
perfectly (making my point) but you can see the errors that would have

been

reported as exceptions in the logs:

2016-03-02 11:06:03,447 ERROR
org.apache.flink.api.java.typeutils.TypeExtractor - Input mismatch:

Generic

object type ‘mypackage.TestEvent' expected but was ‘mypackage.Event’.
2016-03-02 11:06:03,450 ERROR
org.apache.flink.api.java.typeutils.TypeExtractor - Input mismatch:

Unknown

Error. Type is null.
2016-03-02 11:06:03,466 ERROR
org.apache.flink.api.java.typeutils.TypeExtractor - Input mismatch:

Basic

type expected.
2016-03-02 11:06:03,470 ERROR
org.apache.flink.api.java.typeutils.TypeExtractor - Input mismatch:

Basic

type expected.

Clearly all these errors where not valid in my case as my job runs
perfectly.

Would it make sense to change the current behaviour or am I just

abusing

the .returns(..) and ResultTypeQueryable interfaces in unintended

ways.

Cheers,
Gyula






Re: Input type validation is killing me

2016-03-02 Thread Timo Walther
After thinking about it, I think an even better solution is to provide 
an interface for the TypeExtractor where the user can register mappings 
from class to TypeInformation.
So the TypeExtractor becomes more extensible. This would also solve your 
problem. What do you think?


On 02.03.2016 15:00, Gyula Fóra wrote:

Hi!

Yes I think, that sounds good :) We just need to make sure that this works
with things like the TupleTypeInfo which is built-in but I can still mix in
new Types for the fields.

  Thanks,
Gyula

Timo Walther <twal...@apache.org> ezt írta (időpont: 2016. márc. 2., Sze,
14:02):


The TypeExtractor's input type validation was designed for the built-in
TypeInformation classes.

In your case of a new, unknown TypeInformation, the validation should
simply be skipped, because we can assume that the user knows what they are doing.
I can open a PR for that.


On 02.03.2016 11:34, Aljoscha Krettek wrote:

I think you have a point. Another user also just ran into problems with
the TypeExtractor. (The “Java Maps and TypeInformation” email).
So let’s figure out what needs to be changed to make it work for all people.

Cheers,
Aljoscha

On 02 Mar 2016, at 11:15, Gyula Fóra <gyf...@apache.org> wrote:

Hey,

I have brought up this issue a couple months back but I would like to

do it

again.

I think the current way of validating the input type of udfs against the
out type of the preceeding operators is too aggressive and breaks a lot

of

code that should otherwise work.

This issue appears all the time when I want to use my own
TypeInformations<> for operators such as creating my own Tuple typeinfos
with custom types for the different fields and so.

I have a more complex streaming job which would not run if I have the

input

type validation. Replacing the Exceptions with logging my Job runs
perfectly (making my point) but you can see the errors that would have

been

reported as exceptions in the logs:

2016-03-02 11:06:03,447 ERROR
org.apache.flink.api.java.typeutils.TypeExtractor - Input mismatch:

Generic

object type ‘mypackage.TestEvent' expected but was ‘mypackage.Event’.
2016-03-02 11:06:03,450 ERROR
org.apache.flink.api.java.typeutils.TypeExtractor - Input mismatch:

Unknown

Error. Type is null.
2016-03-02 11:06:03,466 ERROR
org.apache.flink.api.java.typeutils.TypeExtractor - Input mismatch:

Basic

type expected.
2016-03-02 11:06:03,470 ERROR
org.apache.flink.api.java.typeutils.TypeExtractor - Input mismatch:

Basic

type expected.

Clearly all these errors where not valid in my case as my job runs
perfectly.

Would it make sense to change the current behaviour or am I just abusing
the .returns(..) and ResultTypeQueryable interfaces in unintended ways.

Cheers,
Gyula






Re: Use case

2016-07-28 Thread Timo Walther

Hi Kevin,

I don't know what your entire program is doing, but wouldn't a 
FlatMapFunction containing a state with your biggest value be sufficient? 
Your stream goes through your FlatMapper and compares each value with the last 
saved biggest value. You can then emit something if the value has increased.
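
A minimal sketch of what I mean (assuming the stream is keyed first, e.g. via keyBy on a constant key for a single global maximum; the exact ValueStateDescriptor constructor varies slightly between Flink versions):

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

// Keeps the largest value seen so far in keyed state and emits a value
// only when the maximum increases.
public class MaxTracker extends RichFlatMapFunction<Long, Long> {

    private transient ValueState<Long> currentMax;

    @Override
    public void open(Configuration parameters) {
        // default value null means "no maximum seen yet"
        currentMax = getRuntimeContext().getState(
            new ValueStateDescriptor<>("currentMax", Long.class, null));
    }

    @Override
    public void flatMap(Long value, Collector<Long> out) throws Exception {
        Long max = currentMax.value();
        if (max == null || value > max) {
            currentMax.update(value);
            out.collect(value); // a new maximum was observed
        }
    }
}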


I hope that helps.

Timo

Am 28/07/16 um 17:00 schrieb Kevin Jacobs:

Hi all,

I am trying to keep track of the biggest value in a stream. I do this 
by using the iterative step mechanism of Apache Flink. However, I get 
an exception that checkpointing is not supported for iterative jobs. 
Why can't this be enabled? My iterative stream is also quite small: 
only one value. Is there a better way to keep track of a largest value 
in a stream?


Regards,
Kevin



--
Freundliche Grüße / Kind Regards

Timo Walther

Follow me: @twalthr
https://www.linkedin.com/in/twalthr



Re: Flink Table & SQL doesn't work in very simple example

2016-07-20 Thread Timo Walther

I will answer Radu's private e-mail here:

Sorry to bother you ... I am still running into the same problem and I cannot figure out why.
I have downloaded and recompiled the latest branch of Flink 1.1. I also tried using the jar snapshot from the website, but I get the same error.

What I am doing:
I am creating a new Java project and then I add as external dependencies the 
jar files

avatica-1.7.1.jar
calcite-core-1.7.0.jar
calcite-linq4j-1.7.0.jar
eigenbase-properties-1.1.5.jar
flink-dist_2.10-1.1-SNAPSHOT.jar
flink-table_2.10-1.1-SNAPSHOT.jar
guava-18.0.jar


I run the code and get the error.
How are you proceeding? Do you create a Maven project (if so, can you share the pom for this example)? Or do you add the dependencies like this, and if so, which other jars do you have?


You don't have to work much with JARs if you use Maven. Here is how I 
did it:


1. Create a quickstart project with Flink 1.0.3 
(https://ci.apache.org/projects/flink/flink-docs-release-1.0/quickstart/java_api_quickstart.html)

2. Import it into your IDE
3. Change version in pom.xml to 1.1-SNAPSHOT
4. Add

  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table_2.10</artifactId>
    <version>${flink.version}</version>
  </dependency>

to your dependencies.

In general, flink-tableXXX.jar already includes everything it needs 
(calcite, avatica, eigenbase, etc.).


Hope that helps.





Am 20/07/16 um 14:14 schrieb Timo Walther:
You can always find the latest nightly snapshot version here: 
http://flink.apache.org/contribute-code.html (at the end of the page)


Am 20/07/16 um 14:08 schrieb Radu Tudoran:

Hi,

I am also using v1.1 ... with Eclipse.

I will re-download the source and build it again.
Is there also a binary version for version 1.1 (I would like to test also against that), particularly if the issue persists?

Otherwise I am downloading and building the version from the main git branch...



From:Timo Walther
To:dev@flink.apache.org,
Date:2016-07-20 13:55:32
Subject:Re: Flink Table & SQL doesn't work in very simple example

I also tried it again with the latest 1.1-SNAPSHOT and everything works.
This Maven issue has been solved in FLINK-4111.



Am 20/07/16 um 13:43 schrieb Suneel Marthi:

I am not seeing an issue with this code Radu, this is from present
1.1-Snapshot.

This is what I have and it works (running from within IntelliJ and 
not cli)

:


List<Tuple3<Long, String, Integer>> input = new ArrayList<>();
input.add(new Tuple3<>(3L,"test",1));
input.add(new Tuple3<>(5L,"test2",2));
StreamExecutionEnvironment env =
StreamExecutionEnvironment.createLocalEnvironment(1);
DataStream<Tuple3<Long, String, Integer>> ds = 
env.fromCollection(input);


StreamTableEnvironment tableEnv = 
TableEnvironment.getTableEnvironment(env);


tableEnv.registerDataStream("Words", ds, "frequency, word, pos");
// run a SQL query on the Table and retrieve the result as a new Table
Table result = tableEnv.sql("SELECT STREAM word, pos FROM Words WHERE
frequency > 2");




On Wed, Jul 20, 2016 at 6:55 AM, Radu Tudoran <radu.tudo...@huawei.com>
wrote:


Hi,

As far as I managed to isolate the cause of the error so far it has 
to do

with some mismatch in the function call

val traitDefs:ImmutableList[RelTraitDef[_ <: RelTrait]] =
config.getTraitDefs

I am not sure though why it is not working, because when I tried to make a
dummy test by creating a program and calling that function, everything works.
Can it be that there is some overlapping between libraries that 
contain

the ImmutableList type?
google/common/collect/ImmutableList (with flink shaded)?
As per the error
"/apache/flink/shaded/calcite/com/google/common/collect/ImmutableList;" 




-Original Message-
From: Maximilian Michels [mailto:m...@apache.org]
Sent: Wednesday, July 20, 2016 11:52 AM
To: dev@flink.apache.org
Cc: Timo Walther
Subject: Re: Flink Table & SQL doesn't work in very simple example

CC Timo who I know is working on Table API and SQL.



On Tue, Jul 19, 2016 at 6:14 PM, Radu Tudoran 
<radu.tudo...@huawei.com>

wrote:

Hi,

I am not sure that this problem was solved. I am using the last 
pom to

compile the table API.

I was trying to run a simple program.


ArrayList<Tuple3<Long, String, Integer>> input = new

ArrayList<Tuple3<Long, String, Integer>>();

  input.add(new Tuple3<Long, String,

Integer>(3L,"test",1));

  input.add(new Tuple3<Long, String,
Integer>(5L,"test2",2));

  DataStream<Tuple3<Long, String, Integer>> ds =
env.fromCollection(input);

StreamTableEnvironment tableEnv =
TableEnvironment.getTableEnvironment(env);

  tableEnv.registerDataStream("Words", ds, 
"frequency,

word, position");
  // run a SQL query on the Table and retrieve the 
result

as a new Table

  Table result = tableEnv.sql(
&

Re: Flink Table & SQL doesn't work in very simple example

2016-07-20 Thread Timo Walther
I also tried it again with the latest 1.1-SNAPSHOT and everything works. 
This Maven issue has been solved in FLINK-4111.




Am 20/07/16 um 13:43 schrieb Suneel Marthi:

I am not seeing an issue with this code Radu, this is from present
1.1-Snapshot.

This is what I have and it works (running from within IntelliJ and not cli)
:


List<Tuple3<Long, String, Integer>> input = new ArrayList<>();
input.add(new Tuple3<>(3L,"test",1));
input.add(new Tuple3<>(5L,"test2",2));
StreamExecutionEnvironment env =
StreamExecutionEnvironment.createLocalEnvironment(1);
DataStream<Tuple3<Long, String, Integer>> ds = env.fromCollection(input);

StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);

tableEnv.registerDataStream("Words", ds, "frequency, word, pos");
// run a SQL query on the Table and retrieve the result as a new Table
Table result = tableEnv.sql("SELECT STREAM word, pos FROM Words WHERE
frequency > 2");




On Wed, Jul 20, 2016 at 6:55 AM, Radu Tudoran <radu.tudo...@huawei.com>
wrote:


Hi,

As far as I managed to isolate the cause of the error so far it has to do
with some mismatch in the function call

val traitDefs:ImmutableList[RelTraitDef[_ <: RelTrait]] =
config.getTraitDefs

I am not sure though why it is not working because when I tried to make a
dummy test by creating a program and  calling that function, everything
works.
Can it be that there is some overlapping between libraries that contain
the ImmutableList type?
google/common/collect/ImmutableList (with flink shaded)?
As per the error
"/apache/flink/shaded/calcite/com/google/common/collect/ImmutableList;"


-Original Message-
From: Maximilian Michels [mailto:m...@apache.org]
Sent: Wednesday, July 20, 2016 11:52 AM
To: dev@flink.apache.org
Cc: Timo Walther
Subject: Re: Flink Table & SQL doesn't work in very simple example

CC Timo who I know is working on Table API and SQL.



On Tue, Jul 19, 2016 at 6:14 PM, Radu Tudoran <radu.tudo...@huawei.com>
wrote:

Hi,

I am not sure that this problem was solved. I am using the last pom to

compile the table API.

I was trying to run a simple program.


ArrayList<Tuple3<Long, String, Integer>> input = new

ArrayList<Tuple3<Long, String, Integer>>();

 input.add(new Tuple3<Long, String,

Integer>(3L,"test",1));

 input.add(new Tuple3<Long, String,
Integer>(5L,"test2",2));

 DataStream<Tuple3<Long, String, Integer>> ds =
env.fromCollection(input);

StreamTableEnvironment tableEnv =
TableEnvironment.getTableEnvironment(env);

 tableEnv.registerDataStream("Words", ds, "frequency,

word, position");

 // run a SQL query on the Table and retrieve the result

as a new Table

 Table result = tableEnv.sql(
   "SELECT STREAM product, amount FROM Words WHERE
frequency > 2");



..and I get:

Exception in thread "main" java.lang.NoSuchMethodError:

org.apache.calcite.tools.FrameworkConfig.getTraitDefs()Lorg/apache/flink/shaded/calcite/com/google/common/collect/ImmutableList;

 at

org.apache.flink.api.table.FlinkPlannerImpl.<init>(FlinkPlannerImpl.scala:50)

 at

org.apache.flink.api.table.StreamTableEnvironment.sql(StreamTableEnvironment.scala:127)

 at TestStreamSQL.main(TestStreamSQL.java:69)


Any thoughts on how this can be solved?


Dr. Radu Tudoran
Research Engineer - Big Data Expert
IT R Division


HUAWEI TECHNOLOGIES Duesseldorf GmbH
European Research Center
Riesstrasse 25, 80992 München

E-mail: radu.tudo...@huawei.com
Mobile: +49 15209084330
Telephone: +49 891588344173

HUAWEI TECHNOLOGIES Duesseldorf GmbH
Hansaallee 205, 40549 Düsseldorf, Germany, www.huawei.com Registered
Office: Düsseldorf, Register Court Düsseldorf, HRB 56063, Managing
Director: Bo PENG, Wanzhou MENG, Lifang CHEN Sitz der Gesellschaft:
Düsseldorf, Amtsgericht Düsseldorf, HRB 56063,
Geschäftsführer: Bo PENG, Wanzhou MENG, Lifang CHEN This e-mail and
its attachments contain confidential information from HUAWEI, which is

intended only for the person or entity whose address is listed above. Any
use of the information contained herein in any way (including, but not
limited to, total or partial disclosure, reproduction, or dissemination) by
persons other than the intended recipient(s) is prohibited. If you receive
this e-mail in error, please notify the sender by phone or email
immediately and delete it!


-Original Message-
From: Fabian Hueske [mailto:fhue...@gmail.com]
Sent: Thursday, June 23, 2016 11:13 AM
To: dev@flink.apache.org
Subject: Re: Flink Table & SQL doesn't work in very simple example

Hi Jark Wu,

yes, that looks like a dependency issue.
Can you open a JIRA for it set "Fix Version" to 1.1.0. This issue should

Re: Flink Table & SQL doesn't work in very simple example

2016-07-20 Thread Timo Walther
You can always find the latest nightly snapshot version here: 
http://flink.apache.org/contribute-code.html (at the end of the page)


Am 20/07/16 um 14:08 schrieb Radu Tudoran:

Hi,

I am also using v1.1 ... with Eclipse.

I will re-download the source and build it again.
Is there also a binary version for version 1.1 (I would like to test also against that), particularly if the issue persists?

Otherwise I am downloading and building the version from the main git branch...


From:Timo Walther
To:dev@flink.apache.org,
Date:2016-07-20 13:55:32
Subject:Re: Flink Table & SQL doesn't work in very simple example

I also tried it again with the latest 1.1-SNAPSHOT and everything works.
This Maven issue has been solved in FLINK-4111.



Am 20/07/16 um 13:43 schrieb Suneel Marthi:

I am not seeing an issue with this code Radu, this is from present
1.1-Snapshot.

This is what I have and it works (running from within IntelliJ and not cli)
:


List<Tuple3<Long, String, Integer>> input = new ArrayList<>();
input.add(new Tuple3<>(3L,"test",1));
input.add(new Tuple3<>(5L,"test2",2));
StreamExecutionEnvironment env =
StreamExecutionEnvironment.createLocalEnvironment(1);
DataStream<Tuple3<Long, String, Integer>> ds = env.fromCollection(input);

StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);

tableEnv.registerDataStream("Words", ds, "frequency, word, pos");
// run a SQL query on the Table and retrieve the result as a new Table
Table result = tableEnv.sql("SELECT STREAM word, pos FROM Words WHERE
frequency > 2");




On Wed, Jul 20, 2016 at 6:55 AM, Radu Tudoran <radu.tudo...@huawei.com>
wrote:


Hi,

As far as I managed to isolate the cause of the error so far it has to do
with some mismatch in the function call

val traitDefs:ImmutableList[RelTraitDef[_ <: RelTrait]] =
config.getTraitDefs

I am not sure though why it is not working because when I tried to make a
dummy test by creating a program and  calling that function, everything
works.
Can it be that there is some overlapping between libraries that contain
the ImmutableList type?
google/common/collect/ImmutableList (with flink shaded)?
As per the error
"/apache/flink/shaded/calcite/com/google/common/collect/ImmutableList;"


-Original Message-
From: Maximilian Michels [mailto:m...@apache.org]
Sent: Wednesday, July 20, 2016 11:52 AM
To: dev@flink.apache.org
Cc: Timo Walther
Subject: Re: Flink Table & SQL doesn't work in very simple example

CC Timo who I know is working on Table API and SQL.



On Tue, Jul 19, 2016 at 6:14 PM, Radu Tudoran <radu.tudo...@huawei.com>
wrote:

Hi,

I am not sure that this problem was solved. I am using the last pom to

compile the table API.

I was trying to run a simple program.


ArrayList<Tuple3<Long, String, Integer>> input = new

ArrayList<Tuple3<Long, String, Integer>>();

  input.add(new Tuple3<Long, String,

Integer>(3L,"test",1));

  input.add(new Tuple3<Long, String,
Integer>(5L,"test2",2));

  DataStream<Tuple3<Long, String, Integer>> ds =
env.fromCollection(input);

StreamTableEnvironment tableEnv =
TableEnvironment.getTableEnvironment(env);

  tableEnv.registerDataStream("Words", ds, "frequency,

word, position");

  // run a SQL query on the Table and retrieve the result

as a new Table

  Table result = tableEnv.sql(
"SELECT STREAM product, amount FROM Words WHERE
frequency > 2");



..and I get:

Exception in thread "main" java.lang.NoSuchMethodError:

org.apache.calcite.tools.FrameworkConfig.getTraitDefs()Lorg/apache/flink/shaded/calcite/com/google/common/collect/ImmutableList;

  at

org.apache.flink.api.table.FlinkPlannerImpl.<init>(FlinkPlannerImpl.scala:50)

  at

org.apache.flink.api.table.StreamTableEnvironment.sql(StreamTableEnvironment.scala:127)

  at TestStreamSQL.main(TestStreamSQL.java:69)


Any thoughts on how this can be solved?


Dr. Radu Tudoran
Research Engineer - Big Data Expert
IT R Division


HUAWEI TECHNOLOGIES Duesseldorf GmbH
European Research Center
Riesstrasse 25, 80992 München

E-mail: radu.tudo...@huawei.com
Mobile: +49 15209084330
Telephone: +49 891588344173

HUAWEI TECHNOLOGIES Duesseldorf GmbH
Hansaallee 205, 40549 Düsseldorf, Germany, 
www.huawei.com<http://www.huawei.com> Registered
Office: Düsseldorf, Register Court Düsseldorf, HRB 56063, Managing
Director: Bo PENG, Wanzhou MENG, Lifang CHEN Sitz der Gesellschaft:
Düsseldorf, Amtsgericht Düsseldorf, HRB 56063,
Geschäftsführer: Bo PENG, Wanzhou MENG, Lifang CHEN This e-mail and
its attachments contain confidential information from HUAWEI, which is

intended only for the person or entity whose address is listed above. Any
use o

Re: [DISCUSS] Table API / SQL indicators for event and processing time

2017-02-22 Thread Timo Walther
FROM
   table1 AS t1,
   table2 AS t2
WHERE
   t1.currency = t2.currency AND
   t2.rowtime = (
     SELECT MAX(t22.rowtime)
     FROM table2 AS t22
     AND t22.rowtime <= t1.rowtime)

The query joins two streaming tables. Table 1 is a streaming table with amounts in a certain currency. Table 2 is a (slowly changing) streaming table of currency exchange rates.
We want to join the amounts stream with the exchange rate of the corresponding currency that is valid (i.e., last received value -> MAX(rowtime)) at the rowtime of the amounts row.
In order to specify the query, we need to refer to the rowtime of the different tables. Hence, we need a way to relate the rowtime expression (or marker) to a table.
This is not possible with a parameterless scalar function.

I'd like to comment on the concerns regarding the performance:
In fact, the columns could be completely virtual and only exist during query parsing and validation.
During execution, we can directly access the rowtime metadata of a Flink streaming record (which is present anyway) or look up the current processing time from the machine clock. So the processing overhead would actually be the same as with a marker function.

Regarding the question on what should be allowed with a system attribute:
IMO, it could be used as any other attribute. We need it at least in GROUP BY, ORDER BY, and WHERE to define windows and joins. We could also allow to access it in SELECT if we want users to give access to rowtime and processing time. So @Haohui, your query could be supported.
However, what would not be allowed is to modify the value of the rows, i.e., by naming another column rowtime, i.e., "SELECT sometimestamp AS rowtime" would not be allowed, because Flink does not support to modify the event time of a row (for good reasons) and processing time should not be modifiable anyway.

@Timo:
I think the approach to only use the system columns during parsing and validation and converting them to expressions afterwards makes a lot of sense.
The question is how this approach could be nicely integrated with

Calcite.

Best, Fabian



2017-02-15 16:50 GMT+01:00 Radu Tudoran <radu.tudo...@huawei.com>:


Hi,

My initial thought would be that it makes more sense to have procTime() and rowTime() only as functions which in fact are to be used as markers.
Having the value (even from special system attributes) does not make sense in some scenarios, such as the ones for creating windows, e.g., if you have SELECT Count(*) OVER (ORDER BY procTime()...).
If you get the value of procTime you cannot do anything, as you need the marker to know how to construct the window logic.

However, your final idea of having "implement some rule/logic that translates the attributes to special RexNodes internally" I believe is good and gives a solution to both problems. On the one hand, for those scenarios where you need the value you can access the value, while for others you can see the special type of the RexNode and use it as a marker.

Regarding keeping this data in a table... I am not sure, as you would say we need to augment the data with two fields whether needed or not... this is not necessarily very efficient.


Dr. Radu Tudoran
Senior Research Engineer - Big Data Expert
IT R Division


HUAWEI TECHNOLOGIES Duesseldorf GmbH
European Research Center
Riesstrasse 25, 80992 München

E-mail: radu.tudo...@huawei.com
Mobile: +49 15209084330
Telephone: +49 891588344173

HUAWEI TECHNOLOGIES Duesseldorf GmbH
Hansaallee 205, 40549 Düsseldorf, Germany, www.huawei.com
Registered Office: Düsseldorf, Register Court Düsseldorf, HRB 56063,
Managing Director: Bo PENG, Wanzhou MENG, Lifang CHEN
Sitz der Gesellschaft: Düsseldorf, Amtsgericht Düsseldorf, HRB 56063,
Geschäftsführer: Bo PENG, Wanzhou MENG, Lifang CHEN
This e-mail and its attachments contain confidential information from
HUAWEI, which is intended only for the person or entity whose address

is

listed above. Any use of the information contained herein in any way
(including, but not limited to, total or partial disclosure,

reproduction,

or dissemination) by persons other than the intended recipient(s) is
prohibited. If you receive this e-mail in error, please notify the

sender

by phone or email immediately and delete it!

-Original Message-
From: Timo Walther [mailto:twal...@apache.org]
Sent: Wednesday, February 15, 2017 9:33 AM
To: dev@flink.apache.org
Subject: Re: [DISCUSS] Table API / SQL indicators for event and
processing time

Hi all,

at first I also thought that built-in functions (rowtime() and
proctime()) are the easiest solution. However, I think to be

future-proof

we should make them system attributes; esp. to relate them to a
corresponding table in case of multiple tables. Logically they are
attributes of each row, which is already done in Table API.

I will ask on the Calcite ML if there is a good way for integrating
system attribute

Re: Contribute to Flink

2017-02-20 Thread Timo Walther

Welcome to the Flink community, Jin!

I gave you contributor permissions, so you can assign issues to yourself.

Regards,
Timo


Am 20/02/17 um 14:47 schrieb Jin Mingjian:

Hi, Flink dev community,

I'd like to contribute to Flink. Particularly, I am interested in kinds of
optimization works in Flink.

To be familiar with the process of contribution, I pick up some starter
issue as the entrance contribution(s) such as [FLINK-5692](
https://issues.apache.org/jira/browse/FLINK-5692) and/or some else.

It is appreciated who gives me the permission of that I can assign that
issue to myself. Then, the journey can be started:)

best regards,
Jin





Re: [DISCUSS] Table API / SQL indicators for event and processing time

2017-02-15 Thread Timo Walther

Hi all,

at first I also thought that built-in functions (rowtime() and 
proctime()) are the easiest solution. However, I think to be 
future-proof we should make them system attributes; esp. to relate them 
to a corresponding table in case of multiple tables. Logically they are 
attributes of each row, which is already done in Table API.


I will ask on the Calcite ML if there is a good way for integrating 
system attributes. Right now, I would propose the following implementation:


- we introduce a custom row type (extending RelDataType)
- in a streaming environment every row has two attributes by default 
(rowtime and proctime)
- we do not allow creating a row type with those attributes (this should 
already prevent `SELECT field AS rowtime FROM ...`)
- we need to ensure that these attributes are not part of expansion like 
`SELECT * FROM ...`
- implement some rule/logic that translates the attributes to special 
RexNodes internally, such that the optimizer does not modify these attributes
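
To make this more concrete, a query on a stream-backed table could then look roughly like this (hypothetical sketch; the table and field names are made up and the syntax is exactly what is under discussion, so nothing of this works today):

// Hypothetical: 'rowtime' and 'proctime' are implicit system attributes of
// every row of a streaming table and can be used like normal columns.
Table result = tableEnv.sql(
    "SELECT userId, COUNT(*) " +
    "FROM Clicks " +
    "GROUP BY FLOOR(rowtime TO HOUR), userId");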


What do you think?

Regards,
Timo




Am 15/02/17 um 03:36 schrieb Xingcan Cui:

Hi all,

thanks for this thread.

@Fabian If I didn't miss the point, the main difference between the two
approaches is whether or not taking these time attributes as common table
fields that are directly available to users. Whatever, these time
attributes should be attached to records (right?), and the discussion lies
in whether give them public qualifiers like other common fields or private
qualifiers and related get/set methods.

The former (system attributes) approach will be more compatible with
existing SQL read-only operations (e.g., select, join), but we need to add
restrictions on SQL modification operation (like what?). I think there are
no needs to forbid users modifying these attributes via table APIs (like
map function). Just inform them about these special attribute names like
system built in aggregator names in iteration.

As for the built in function approach, I don't know if, for now, there are
functions applied on a single row (maybe the value access functions like
COMPOSITE.get(STRING)?). It seems that most of the built in functions work
for a single field or on columns and thus it will be mountains of work if
we want to add a new kind of function to SQL. Maybe all existing operations
should be modified to support it.

All in all, if there are existing supports for single row function, I
prefer the built in function approach. Otherwise the system attributes
approach should be better. After all there are not so much modification
operations in SQL and maybe we can use alias to support time attributes
setting (just hypothesis, not sure if it's feasible).

@Haohui I think the given query is valid if we add an aggregate function to (PROCTIME() - ROWTIME()) / 1000 and it should be executed efficiently.

Best,
Xingcan

On Wed, Feb 15, 2017 at 6:17 AM, Haohui Mai  wrote:


Hi,

Thanks for starting the discussion. I can see there are multiple trade-offs
in these two approaches. One question I have is that to which extent Flink
wants to open its APIs to allow users to access both processing and event
time.

Before we talk about joins, my understanding for the two approaches that
you mentioned are essentially (1) treating the value of event / processing
time as first-class fields for each row, (2) limiting the scope of time
indicators to only specifying windows. Take the following query as an
example:

SELECT (PROCTIME() - ROWTIME()) / 1000 AS latency FROM table GROUP BY
FLOOR(PROCTIME() TO MINUTES)

There are several questions we can ask:

(1) Is it a valid query?
(2) How efficient the query will be?

For this query I can see arguments from both sides. I think at the end of
the day it really comes down to what Flink wants to support. After working
on FLINK-5624 I'm more inclined to support the second approach (i.e.,
built-in functions). The main reason why is that the APIs of Flink are
designed to separate times from the real payloads. It probably makes sense
for the Table / SQL APIs to have the same designs.

For joins I don't have a clear answer on top of my head. Flink requires two
streams to be put in the same window before doing the joins. This is
essentially a subset of what SQL can express. I don't know what would be
the best approach here.

Regards,
Haohui


On Tue, Feb 14, 2017 at 12:26 AM Fabian Hueske  wrote:


Hi,

It would as in the query I gave as an example before:

SELECT
   a,
   SUM(b) OVER (PARTITION BY c ORDER BY proctime ROWS BETWEEN 2
PRECEDING AND CURRENT ROW) AS sumB,
FROM myStream

Here "proctime" would be a system attribute of the table "myStream".
The table would also have another system attribute called "rowtime" which
would be used to indicate event time semantics.
These attributes would always be present in tables which are derived from
streams.
Because we still require that streams have timestamps and watermarks
assigned (either by the StreamTableSource or 

Re: New Flink team member - Kate Eri.

2017-01-17 Thread Timo Walther

Hi Katherin,

great to hear that you would like to contribute! Welcome!

I gave you contributor permissions. You can now assign issues to 
yourself. I assigned FLINK-1750 to you.
Right now there are many open ML pull requests, you are very welcome to 
review the code of others, too.


Timo


Am 17/01/17 um 10:39 schrieb Katherin Sotenko:

Hello, All!



I'm Kate Eri, I'm java developer with 6-year enterprise experience, also I
have some expertise with scala (half of the year).

Last 2 years I have participated in several BigData projects that were
related to Machine Learning (Time series analysis, Recommender systems,
Social networking) and ETL. I have experience with Hadoop, Apache Spark and
Hive.


I’m fond of ML topic, and I see that Flink project requires some work in
this area, so that’s why I would like to join Flink and ask me to grant the
assignment of the ticket https://issues.apache.org/jira/browse/FLINK-1750
to me.





Re: <无主题>

2016-08-16 Thread Timo Walther

Hi Renkai,

thanks for the interest in Flink and the Table API/SQL! I would 
recommend to maybe have a look into previous Pull Requests for new SQL 
features such as INTERSECT [1] to get an overview of layers and classes. 
I also recommend the design document [2] that describes the all planned 
features. It might be good if you look for an easy task to start with [3].



[1] https://github.com/apache/flink/pull/2159
[2] 
https://docs.google.com/document/d/1TLayJNOTBle_-m1rQfgA6Ouj1oYsfqRjPcp1h2TVqdI/edit#heading=h.4vdi2v1tlg8h
[3] 
https://issues.apache.org/jira/browse/FLINK-4393?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20%3D%20%22Table%20API%20%26%20SQL%22


Hope that helps.

Timo

Am 16/08/16 um 05:39 schrieb Renkai:

Hi all:

I’m interested in Table API and SQL. As far as I know, the SQL support is very limited currently. I want to help improve related features, such as distinct aggregates for SQL on batch tables, and aggregations for SQL on stream tables. I don’t have much experience in developing SQL related things, what could I do for these?

  

  







Re: [DISCUSS] Table API / SQL indicators for event and processing time

2017-03-01 Thread Timo Walther
 name of the attribute that carries the timestamp.

@Stefano: That's great news. I'd suggest to open a pull request and have a look at PR #3397 which handles the (partitioned) unbounded case. Would be good to share some code between these approaches.

Thanks, Fabian

2017-02-28 18:17 GMT+01:00 Stefano Bortoli <stefano.bort...@huawei.com>:


Hi all,

I have completed a first implementation that works for the SQL query
SELECT a, SUM(b) OVER (PARTITION BY c ORDER BY a RANGE BETWEEN 2
PRECEDING) AS sumB FROM MyTable

I have SUM, MAX, MIN, AVG, COUNT implemented but I could test it just on simple queries such as the one above. Is there any specific case I should be looking at?

Regards,
Stefano

-Original Message-
From: jincheng sun [mailto:sunjincheng...@gmail.com]
Sent: Tuesday, February 28, 2017 12:26 PM
To: dev@flink.apache.org
Subject: Re: [DISCUSS] Table API / SQL indicators for event and

processing

time

Hi everyone, thanks for sharing your thoughts. I really like Timo’s proposal, and I have a few thoughts I want to share.

We want to keep the query the same for batch and streaming. IMO, “process time” is something special to dataStream while it is not a well defined term for batch query. So it is kind of free to create something new for processTime. I think it is a good idea to add proctime as a reserved keyword for SQL.

Regarding “event time”, it is well defined for batch query. So IMO, we should keep the way of defining a streaming window exactly the same as a batch window. Therefore, the row for event time is nothing special, but just a normal column. The major difference between batch and stream is that in dataStream the event time column must be associated with a watermark function. I really like the way Timo proposed, that we can select any column as rowtime. But I think instead of just clarifying that a column is a rowtime (actually I do not think we need this special rowtime keyword), it is better to register/associate the waterMark function with this column when creating the table. For dataStream, we will validate a rowtime column only if it has been associated with the waterMark function. A prototype code to explain how it looks is shown below:

   TableAPI:
  toTable(tEnv, 'a, 'b, 'c)
   .registeredWatermarks('a, waterMarkFunction1)

  batchOrStreamTable
   .window(Tumble over 5.milli on 'a as 'w)
   .groupBy('w, 'b)
   .select('b, 'a.count as cnt1, 'c.sum as cnt2)

   SQL:
 addTable[(Int, String, Long)]("MyTable", 'a, 'b, 'c)
   .registeredWatermarks('a, waterMarkFunction1)

 SELECT a, SUM(b) OVER (PARTITION BY c ORDER BY a RANGE BETWEEN 2
PRECEDING) AS sumB FROM MyTable

What do you think ?

2017-02-22 23:44 GMT+08:00 Timo Walther <twal...@apache.org>:


Hi everyone,

I have created an issue [1] to track the progress of this topic. I have written a little design document [2] on how we could implement the indicators and which parts have to be touched. I would suggest to implement a prototype, also to see what is possible and can be integrated both in Flink and Calcite. Feedback is welcome.

Regards,
Timo

[1] https://issues.apache.org/jira/browse/FLINK-5884
[2] https://docs.google.com/document/d/1JRXm09x_wKst6z6UXdCGF9tgF1ueOAsFiQwahR72vbc/edit?usp=sharing



Am 21/02/17 um 15:06 schrieb Fabian Hueske:

Hi Xingcan,

thanks for your thoughts.
In principle you are right that the monotone attribute property would be sufficient, however there are more aspects to consider than that.

Flink is a parallel stream processor engine which means that data is processed in separate processes and shuffled across them. Maintaining a strict order when merging parallel streams would be prohibitively expensive. Flink's watermark mechanism helps operators to deal with out-of-order data (due to out-of-order input or shuffles). I don't think we can separate the discussion about time attributes from watermarks if we want to use Flink as a processing engine and not reimplement large parts from scratch.

When transforming a time attribute, we have to either align it with existing watermarks or generate new watermarks. If we want to allow all kinds of monotone transformations, we have to adapt the watermarks, which is not trivial. Instead, I think we should initially only allow very few monotone transformations which are aligned with the existing watermarks. We might later relax this condition if we see that users request this feature.

You are right that we need to track which attribute can be used as a time attribute (i.e., is increasing and guarded by watermarks). For that we need to expose the time attribute when a Table is created (either when a DataStream is converted like: stream.toTable(tEnv, 'a, 'b, 't.rowtime) or in a StreamTableSource) and track how it is used in queries. I am not sure if the monotone property would be the right choice here, since data is only quasi-monotone and a monotone annotation might tri

Re: [DISCUSS] FLIP-11: Table API Stream Aggregations

2016-09-06 Thread Timo Walther

Hi all,

I thought about the API of the FLIP again. If we allow the "systemtime" 
attribute, we cannot implement nice method chaining where the user can 
define an "allowLateness" only on event time. So even if the user 
expressed that "systemtime" is used, we have to offer an "allowLateness" 
method because we have to assume that this attribute can also be the 
batch event time column, which is not very nice.


class TumblingWindow(size: Expression) extends Window {
  def on(timeField: Expression): TumblingEventTimeWindow =
new TumblingEventTimeWindow(alias, timeField, size) // has 
allowLateness() method

}

What do you think?

Timo


Am 05/09/16 um 10:41 schrieb Fabian Hueske:

Hi Jark,

you had asked for non-windowed aggregates in the Table API a few times.
FLIP-11 proposes row-window aggregates which are a generalization of
running aggregates (SlideRow unboundedPreceding).

Can you have a look at the FLIP and give feedback whether this is what you
are looking for?
Improvement suggestions are very welcome as well.

Thank you,
Fabian

2016-09-01 16:12 GMT+02:00 Timo Walther <twal...@apache.org>:


Hi all!

Fabian and I worked on a FLIP for Stream Aggregations in the Table API.
You can find the FLIP-11 here:

https://cwiki.apache.org/confluence/display/FLINK/FLIP-11%
3A+Table+API+Stream+Aggregations

Motivation for the FLIP:

The Table API is a declarative API to define queries on static and
streaming tables. So far, only projection, selection, and union are
supported operations on streaming tables.

This FLIP proposes to add support for different types of aggregations on
top of streaming tables. In particular, we seek to support:

- Group-window aggregates, i.e., aggregates which are computed for a group
of elements. A (time or row-count) window is required to bound the infinite
input stream into a finite group.

- Row-window aggregates, i.e., aggregates which are computed for each row,
based on a window (range) of preceding and succeeding rows.
Each type of aggregate shall be supported on keyed/grouped or
non-keyed/grouped data streams for streaming tables as well as batch tables.

We are looking forward to your feedback.

Timo




--
Freundliche Grüße / Kind Regards

Timo Walther

Follow me: @twalthr
https://www.linkedin.com/in/twalthr



Re: Extending FLIP template

2016-09-01 Thread Timo Walther

+1

Makes sense, especially since FLIPs are always large changes that need to 
be done in several steps.


Am 01/09/16 um 11:24 schrieb Aljoscha Krettek:

+1 If you think it worthwhile you can add it to the template(s).

On Thu, 1 Sep 2016 at 10:38 Fabian Hueske <fhue...@gmail.com> wrote:


Hi,

I'm currently preparing a FLIP for Table API streaming aggregates and
noticed that there is no section about how the task can be divided into
subtasks.

I think it would make sense to extend the template by a section "Work Plan"
or "Implementation Plan" that explains in which steps or subtask the FLIP
can be implemented.
The described subtasks will then be reflected as individual JIRA issues.

What do you think?

Cheers, Fabian




--
Freundliche Grüße / Kind Regards

Timo Walther

Follow me: @twalthr
https://www.linkedin.com/in/twalthr



[DISCUSS] FLIP-11: Table API Stream Aggregations

2016-09-01 Thread Timo Walther

Hi all!

Fabian and I worked on a FLIP for Stream Aggregations in the Table API. 
You can find the FLIP-11 here:


https://cwiki.apache.org/confluence/display/FLINK/FLIP-11%3A+Table+API+Stream+Aggregations

Motivation for the FLIP:

The Table API is a declarative API to define queries on static and 
streaming tables. So far, only projection, selection, and union are 
supported operations on streaming tables.


This FLIP proposes to add support for different types of aggregations on 
top of streaming tables. In particular, we seek to support:


- Group-window aggregates, i.e., aggregates which are computed for a 
group of elements. A (time or row-count) window is required to bound the 
infinite input stream into a finite group.


- Row-window aggregates, i.e., aggregates which are computed for each 
row, based on a window (range) of preceding and succeeding rows.
Each type of aggregate shall be supported on keyed/grouped or 
non-keyed/grouped data streams for streaming tables as well as batch tables.


We are looking forward to your feedback.

Timo


Re: [DISCUSS] Some thoughts about unify Stream SQL and Batch SQL grammer

2016-08-30 Thread Timo Walther
Sorry, I thought that StreamTableSourceScanRule always fires but it is 
only used in StreamTableEnvironment so everything is fine. I think you 
can open PR if you like.


Timo

Am 30/08/16 um 05:25 schrieb Jark Wu:

It seems that we have done that (?).  The 
BatchTableEnvironment.registerTableSource(name, tableSource) only accept a 
BatchTableSource. In contrast, the 
StreamTableEnvironment.registerTableSource(name, tableSource) only accept a 
StreamTableSource. So that, if a TableSource implements both batch and stream, 
we will determine batch or stream by the type of table environment.  I think 
the TableSourceITCase.testCsvTableSource in batch and stream package can 
explain it.  Am I right ?

- Jark Wu


在 2016年8月29日,下午8:59,Timo Walther <twal...@apache.org> 写道:

At first glance, I thought we are losing the possibility to distinguish between 
choosing a batch or streaming table if a TableSource implements both. Because 
currently you are using a StreamTableSource as default if a TableSource 
implements both types. I think it would be better to determine batch or stream 
using the type of execution environment. What do you think?

Timo


Am 29/08/16 um 14:31 schrieb Jark Wu:

Hi Timo,

Yes, you are right. The STREAM keyword is invalid now. If there is a STREAM keyword in 
the query, the parser will throw "can’t convert table xxx to stream" exception. 
Because we register the table as a regular table not streamable.

- Jark Wu


在 2016年8月29日,下午8:13,Timo Walther <twal...@apache.org> 写道:

Hi Jark,

your code looks good and it also simplifies many parts. So the STREAM keyword 
is not optional but invalid now, right? What happens if there is keyword in the 
query?

Timo


Am 29/08/16 um 05:40 schrieb Jark Wu:

Hi Fabian, Timo,

I have created a prototype for removing STREAM keyword and using batch sql 
parser for stream jobs.

This is the working brach:  https://github.com/wuchong/flink/tree/remove-stream 
<https://github.com/wuchong/flink/tree/remove-stream>

Looking forward to your feedback.

- Jark Wu


在 2016年8月24日,下午4:56,Fabian Hueske <fhue...@gmail.com> 写道:

Starting with a prototype would be great, Jark.
We had some trouble with Calcite's StreamableTable interface anyways. A few
things can be simplified if we do not declare our tables as streamable.
I would try to implement DataStreamTable (and all related classes and
methods) equivalent to DataSetTables if possible.

Best, Fabian

2016-08-24 6:27 GMT+02:00 Jark Wu <wuchong...@alibaba-inc.com>:


Hi Fabian,

You are right, the main thing we need to change for removing STREAM
keyword is the table registration. If you would like, I can do a prototype.

Hi Timo,

I’m glad to contribute our work back to Flink. I will look into it and
create JIRAs next days.

- Jark Wu


在 2016年8月24日,上午12:13,Fabian Hueske <fhue...@gmail.com> 写道:

Hi Jark,

We can think about removing the STREAM keyword or not. In principle,
Calcite should allow the same windowing syntax on streaming and static
tables (this is one of the main goals of Calcite). The Table API can also
distinguish stream and batch without the STREAM keyword by looking at the
ExecutionEnvironment.
I think we would need to change the way that tables are registered in
Calcite's catalog and also add more validation (check that time windows
refer to a time column, etc).
A prototype should help to see what the consequence of removing the

STREAM

keyword (which is actually, changing the table registration, the parser

is

the same) would be.

Regarding streaming aggregates without window definition: We can

certainly

implement this feature in the Table API. There are a few points that need
to be considered like value expiration after a certain time of update
inactivity (otherwise the state might grow infinitely). But these aspects
should be rather easy to solve. I think for SQL, such running aggregates
are a special case of the Sliding Windows as discussed in Calcite's
StreamSQL document [1].

Thanks also for the document! I'll take that into account when sketching
the FLIP for streaming aggregation support.

Cheers, Fabian

[1] http://calcite.apache.org/docs/stream.html#sliding-windows

2016-08-23 13:09 GMT+02:00 Jark Wu <wuchong...@alibaba-inc.com>:


Hi Fabian, Timo,

Sorry for the late response.

Regarding Calcite’s StreamSQL syntax, what I concern is only the STREAM
keyword and no agg-without-window. Which makes different syntax for
streaming and static tables. I don’t think Flink should have a custom

SQL

syntax, but it’s better to have a consistent syntax for batch and
streaming. Regarding window syntax , I think it’s good and reasonable to
follow Calcite’s syntax. Actually, we implement Blink SQL Window

following

Calcite’s syntax[1].

In addition, I describe the Blink SQL design including UDF, UDTF, UDAF,
Window in google doc[1]. Hope that can help for the upcoming Flink SQL
design.

+1 for creating FLIP

[1] https://docs.google.com/document/d/15iVc

Re: KafkaProducer can not be instantiated

2016-10-06 Thread Timo Walther
Thanks for the information Tzu-Li. I will mock the FlinkKafkaProducer 
class until this issue is fixed.
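
For reference, the workaround in the test is just a plain Mockito mock, roughly like this (sketch only, not the final test code):

// Sketch: replace the real producer with a mock so that no Kafka broker
// (and no partition fetching in the constructor) is needed in the test.
@SuppressWarnings("unchecked")
FlinkKafkaProducerBase<Row> producerMock =
    (FlinkKafkaProducerBase<Row>) Mockito.mock(FlinkKafkaProducerBase.class);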


Timo


Am 05/10/16 um 17:57 schrieb Tzu-Li (Gordon) Tai:

Sorry, correction to my last statements:
On the consumer side I think the instantiation was already removed from the 
constructor in a recent commit.


On October 5, 2016 at 11:37:41 PM, Tzu-Li (Gordon) Tai (tzuli...@apache.org) 
wrote:

This matters on the consumer side, yes. Moving the instantiation out of the 
constructor will require such
guarantee that the list fetched individually at subtasks are determinate and 
identical.

On the producer side I don’t really think it matters. Unless the user 
implementations of the provided KafkaPartitioner depends on the ordering of the 
passed partition id array to KafkaPartitioner.open(), though. From the 
interface Javadoc I’m not really sure if there was a contract / guarantee on 
that to the user in the first place.

Otherwise, if we want to be really safe to not break any user code on the 
producer side, then we should also keep the ordering guarantee there too.


On October 5, 2016 at 11:26:43 PM, Chesnay Schepler (ches...@apache.org) wrote:

if you were to move the partition list fetching out of the constructor
int open(), is there any guarantee that for each fetching subtask the
partition list is identical?

On 05.10.2016 17:17, Tzu-Li (Gordon) Tai wrote:

Hi Timo,

I haven’t had the chance to look at the producer side too much yet, but after a 
look in the code,
I think it’s reasonable to remove the instantiation from the producer 
constructor.
The instantiation in the constructor is only used for partition list fetching & 
eager properties validation
before running up the job. With an alternative to do the eager properties 
validation in the constructor without relying on KafkaProducer,
it should be safe to remove it from the constructor.

The consumer side actually has the same problem right now too. I was hoping to 
bundle the fix with a bigger task,
but would probably consider moving it up TODO list so it can be resolved sooner 
as a standalone fix.

Cheers,
Gordon


On October 5, 2016 at 10:51:05 PM, Timo Walther (twal...@apache.org) wrote:

Hey everyone,

I'm currently rewriting the KafkaTableSinkTest and discovered something
that doesn't seem to be intended: Is it intended that
FlinkKafkaProducer08 cannot be instantiated without a running Kafka
instance?

The constructor of FlinkKafkaProducerBase calls getKafkaProducer() which
actually should be called in the open() method first. What happens if
the Client has no access to the Kafka properties (e.g. using a remote
execution environment)? Then it is impossible to create a KafkaProducer?

Thanks.

Timo






--
Freundliche Grüße / Kind Regards

Timo Walther

Follow me: @twalthr
https://www.linkedin.com/in/twalthr



KafkaProducer can not be instantiated

2016-10-05 Thread Timo Walther

Hey everyone,

I'm currently rewriting the KafkaTableSinkTest and discovered something 
that doesn't seem to be intended: Is it intended that 
FlinkKafkaProducer08 cannot be instantiated without a running Kafka 
instance?


The constructor of FlinkKafkaProducerBase calls getKafkaProducer() which 
actually should be called in the open() method first. What happens if 
the Client has no access to the Kafka properties (e.g. using a remote 
execution environment)? Then it is impossible to create a KafkaProducer?


Thanks.

Timo


Re: Some questions about Table API and FlinkSQL

2016-10-04 Thread Timo Walther

Hi Anton,

1) According to org.apache.calcite.sql.fun.SqlAvgAggFunction, "the 
result is the same type", so I think this is standard SQL behavior.
2) This seems to be a code generation bug. The sqrt/power function seems 
not to accept the data types. It would be great if you could open an issue 
if it does not yet exist in Jira.
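
For 1), a possible workaround is to cast the integer column before aggregating, so that the average is computed on a floating-point type (sketch, assuming CAST to DOUBLE works in your setup):

// Workaround sketch for 1): cast the INT column to DOUBLE before AVG,
// so that the result is no longer truncated to an integer.
Table result = tEnv.sql("SELECT AVG(_1), AVG(CAST(_2 AS DOUBLE)) FROM MyTable");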


I hope that helps.

Regards,
Timo



Am 04/10/16 um 18:04 schrieb Anton Mushin:

Hello all,

I have some questions about work with FlinkSQL.



1)I'm want calculate average for column values:



val env = ExecutionEnvironment.getExecutionEnvironment
val tEnv = TableEnvironment.getTableEnvironment(env, config)
val ds = env.fromElements(
   (1.0f, 1),
   (2.0f, 2)).toTable(tEnv)
 tEnv.registerTable("MyTable", ds)
 val sqlQuery="select avg(_1), avg(_2) from MyTable"
 tEnv.sql(sqlQuery).toDataSet[Row].collect().foreach(x=>print(s"$x "))



As result I'm getting: "1.5,1 ". But I expected: "1.5,1.5 "

Why does the avg function return the result as an integer for integer-typed columns? Where is this behavior described?



2) I wanted calculate stddev_pop function like as sequences sql aggregate 
functions, how it is describe in calcite javadocs: 
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/rules/AggregateReduceFunctionsRule.java#L64



val ds = env.fromElements(

   (1.0f, 1),

   (1.0f, 2)).toTable(tEnv)

 tEnv.registerTable("MyTable", ds)



val sqlQuery = "SELECT " +

   "SQRT((SUM(a*a) - SUM(a)*SUM(a)/COUNT(a))/COUNT(a)) "+

   "from (select _1 as a from MyTable)"

tEnv.sql(sqlQuery).toDataSet[Row].collect().foreach(print)

​

I got exception:



 org.apache.flink.runtime.client.JobExecutionException: Job execution 
failed.

...

 Caused by: java.lang.Exception: The user defined 'open(Configuration)' 
method in class org.apache.flink.api.table.runtime.FlatMapRunner caused an 
exception: Table program cannot be compiled. This is a bug. Please file an 
issue.

  at 
org.apache.flink.runtime.operators.BatchTask.openUserCode(BatchTask.java:1337)

  at 
org.apache.flink.runtime.operators.chaining.ChainedFlatMapDriver.openTask(ChainedFlatMapDriver.java:47)

.

 Caused by: org.codehaus.commons.compiler.CompileException: Line 59, Column 57: No applicable constructor/method 
found for actual parameters "float, java.math.BigDecimal"; candidates are: "public static double 
org.apache.calcite.runtime.SqlFunctions.power(long, java.math.BigDecimal)", "public static double 
org.apache.calcite.runtime.SqlFunctions.power(long, long)", "public static double 
org.apache.calcite.runtime.SqlFunctions.power(double, double)"

  at 
org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:10062)

  at 
org.codehaus.janino.UnitCompiler.findMostSpecificIInvocable(UnitCompiler.java:7476)



In this time if I am execute for int column ('_2') i getting result is equals 
'0.0'

What am I doing wrong?



Best regards,

Anton Mushin






--
Freundliche Grüße / Kind Regards

Timo Walther

Follow me: @twalthr
https://www.linkedin.com/in/twalthr



Re: [DISCUSS] FLIP-11: Table API Stream Aggregations

2016-09-19 Thread Timo Walther
ok like this:

class TumblingWindow(size: Expression) extends Window {
  def on(time: rowtime.type): TumblingEventTimeWindow =
  new TumblingEventTimeWindow(alias, ‘rowtime, size)// has
allowLateness() method

  def on(time: systemtime.type): TumblingProcessingTimeWindow=
 new TumblingProcessingTimeWindow(alias, ‘systemtime, size)
// hasn’t allowLateness() method
}
object rowtime
object systemtime

What do you think about this?

- Jark Wu


在 2016年9月6日,下午11:00,Timo Walther <twal...@apache.org 
<mailto:twal...@apache.org>> 写道:

Hi all,

I thought about the API of the FLIP again. If we allow the "systemtime"

attribute, we cannot implement a nice method chaining where the user can
define a "allowLateness" only on event time. So even if the user expressed
that "systemtime" is used we have to offer a "allowLateness" method because
we have to assume that this attribute can also be the batch event time
column, which is not very nice.

class TumblingWindow(size: Expression) extends Window {
def on(timeField: Expression): TumblingEventTimeWindow =
   new TumblingEventTimeWindow(alias, timeField, size) // has

allowLateness() method

}

What do you think?

Timo


Am 05/09/16 um 10:41 schrieb Fabian Hueske:

Hi Jark,

you had asked for non-windowed aggregates in the Table API a few times.
FLIP-11 proposes row-window aggregates which are a generalization of
running aggregates (SlideRow unboundedPreceding).

Can you have a look at the FLIP and give feedback whether this is what

you

are looking for?
Improvement suggestions are very welcome as well.

Thank you,
Fabian

2016-09-01 16:12 GMT+02:00 Timo Walther <twal...@apache.org 
<mailto:twal...@apache.org>>:


Hi all!

Fabian and I worked on a FLIP for Stream Aggregations in the Table API.
You can find the FLIP-11 here:

https://cwiki.apache.org/confluence/display/FLINK/FLIP-11% 
<https://cwiki.apache.org/confluence/display/FLINK/FLIP-11%>
3A+Table+API+Stream+Aggregations

Motivation for the FLIP:

The Table API is a declarative API to define queries on static and
streaming tables. So far, only projection, selection, and union are
supported operations on streaming tables.

This FLIP proposes to add support for different types of aggregations

on

top of streaming tables. In particular, we seek to support:

- Group-window aggregates, i.e., aggregates which are computed for a

group

of elements. A (time or row-count) window is required to bound the

infinite

input stream into a finite group.

- Row-window aggregates, i.e., aggregates which are computed for each

row,

based on a window (range) of preceding and succeeding rows.
Each type of aggregate shall be supported on keyed/grouped or
non-keyed/grouped data streams for streaming tables as well as batch

tables.

We are looking forward to your feedback.

Timo



--
Freundliche Grüße / Kind Regards

Timo Walther

Follow me: @twalthr
https://www.linkedin.com/in/twalthr <https://www.linkedin.com/in/twalthr>







--
Freundliche Grüße / Kind Regards

Timo Walther

Follow me: @twalthr
https://www.linkedin.com/in/twalthr



Re: Performance and Latency Chart for Flink

2016-09-16 Thread Timo Walther

Hi Amir,

it would be great if you could link to the details of your benchmark 
environment if you make such claims. Compared to which IBM system? 
Characteristics of your machines? Configuration of the software? 
Implementation code? etc.


In general the Beam Runner also adds some overhead compared to native 
Flink jobs.  There are many factors that could affect results. I don't 
know the Linear Road Benchmark but 150 times sounds unrealistic.


Timo


Am 16/09/16 um 10:02 schrieb amir bahmanyari:

FYI, we, at a well known IT department, have been actively measuring Beam Flink 
Runner performance using MIT's Linear Road to stress the Flink Cluster servers.The 
results, thus far does not even come close to the previous streaming engines we 
have bench-marked.Our optimistic assumption was, when we started, that Beam runners 
(Flink for instance) will leave Storm & IBM in smoke.Wrong. What IBM managed to 
perform is 150 times better than Flink. Needless to mention Storm, and 
Hortonworks.As an example, IBM  handled 150 expressways in 3.5 hours.In the same 
identical topology, everything fixed, Beam Flink Runner in a Flink Cluster handled 
10 expressways in 17 hours at its best so far.
I have followed every single performance tuning recommendation that is out there 
& none improved it even a bit.Works fine with 1 expressway. Sorry but thats our 
findings so far unless we are doing something wrong.I posted all details to this 
forum but never got any solid response that would make a difference in our 
observations.Therefore, we assume what we are seeing is the reality which we have 
to report to our superiors.Pls prove us wrong. We still have some time.Thanks.Amir-

   From: Fabian Hueske <fhue...@gmail.com>
  To: "dev@flink.apache.org" <dev@flink.apache.org>
  Sent: Friday, September 16, 2016 12:31 AM
  Subject: Re: Performance and Latency Chart for Flink

Hi,


I am not aware of periodic performance runs for the Flink releases.
I know a few benchmarks which have been published at different points in
time like [1], [2], and [3] (you'll probably find more).

In general, fair benchmarks that compare different systems (if there is
such thing) are very difficult and the results often depend on the use case.
IMO the best option is to run your own benchmarks, if you have a concrete
use case.

Best, Fabian

[1] 08/2015:
http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/
[2] 12/2015:
https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
[3] 02/2016:
http://data-artisans.com/extending-the-yahoo-streaming-benchmark/


2016-09-16 5:54 GMT+02:00 Chawla,Sumit <sumitkcha...@gmail.com>:


Hi

Is there any performance run that is done for each Flink release? Or you
are aware of any third party evaluation of performance metrics for Flink?
I am interested in seeing how performance has improved over release to
release, and performance vs other competitors.

Regards
Sumit Chawla







--
Freundliche Grüße / Kind Regards

Timo Walther

Follow me: @twalthr
https://www.linkedin.com/in/twalthr



Re: DataStream of Future

2016-09-16 Thread Timo Walther

Hi Albert,

you cannot use Futures between operators as objects are serialized and 
possibly sent through the cluster immediately. Right now there is no 
straightforward way in Flink to do async calls. However, there is a 
discussion going on which you might want to join [1]. As far as I know, 
the current solution is to create a FlatMap function manually which 
manages the async calls and emits the results [2].


[1] 
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Proposal-for-Asynchronous-I-O-in-FLINK-td13497.html
[2] 
http://stackoverflow.com/questions/38866078/how-to-look-up-and-update-the-state-of-a-record-from-a-database-in-apache-flink
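
Until proper async I/O support exists, such a manual function typically looks roughly like the following blocking sketch (CassandraClient and Enriched are placeholders for your own classes; each record waits for its lookup before being emitted):

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

// Sketch: per-record lookup that blocks on the returned future before
// emitting, which is what the Await in your Scala job effectively does.
public class EnrichFlatMap extends RichFlatMapFunction<String, Enriched> {

    private transient CassandraClient client;

    @Override
    public void open(Configuration parameters) {
        client = new CassandraClient(); // one connection per parallel subtask
    }

    @Override
    public void flatMap(String message, Collector<Enriched> out) throws Exception {
        // lookup() returns a Future; get() blocks until it completes
        Enriched enriched = client.lookup(message).get();
        if (enriched != null) {
            out.collect(enriched);
        }
    }

    @Override
    public void close() {
        client.close();
    }
}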


I hope that helps.

Timo



Am 16/09/16 um 13:16 schrieb Albert Gimenez:

Hello,

  


I just started learning Flink (using Scala) recently, and I developed a job
that, in short, does this steps:

  


-  Reads json messages from Kafka

-  Enriches the messages, reading data from Cassandra (using Phantom
DSL)

-  Puts the enriched messages back to another Kafka topic.

  


The job looks like this:

  


env

 .addSource(new FlinkKafkaProducer09[String](...))

 .map(MyService.enrichMessage _) // Returns Option

 .filter(!_.isEmpty)

 .map(_.get)

 .map(enrichedMessageToJsonMapper)

 .addSink(new FlinkKafkaConsumer09[String](...)))

  


The "enrichMessage" method is where I'm using Phantom DSL to query
Cassandra, and I would like to return a Future, but I can't figure out a way
to do it right now, so I'm using "Await" to force the resolution and return
a result. Is there a way to use a Future here?

  


I do have a second job that is updating the data in Cassandra, and since I
don't need to sink, I can have my map to return the Futures, and everything
happens asynchronously. I would like to know if it's possible to have a
similar behavior when I want to use a Sink (so, sink to Kafka as the Futures
are completed).

  


BTW, I'm using Flink 1.1.2 with Scala 2.11.

  


Thanks a lot for your help!

  


Kind regards,

  


Albert







--
Freundliche Grüße / Kind Regards

Timo Walther

Follow me: @twalthr
https://www.linkedin.com/in/twalthr



Re: [DISCUSS] Some thoughts about unify Stream SQL and Batch SQL grammer

2016-08-29 Thread Timo Walther
ernal handling of streaming windows in Calcite than the SQL parser. IMO, it should be 
possible to exchange or modify the parser if we want that.

Regarding Calcite's StreamSQL syntax: Except for the STREAM keyword, Calcite closely 
follows the SQL standard (e.g., no special keywords like WINDOW. Instead, stream-specific 
aspects like tumbling windows are done as functions such as TUMBLE [1]). One main 
motivation of the Calcite community is to have the same syntax for streaming and static 
tables. This includes support for tables which are static and streaming at the same time 
(the example of [1] is a table about orders to which new order records are added). When 
querying such a table, the STREAM keyword is required to distinguish the cases of a batch 
query which returns a result set and a standing query which returns a result stream. In 
the context of Flink we can do the distinction using the type of the TableEnvironment. So 
we could use the batch parser, but would need to change a couple of things internally and 
add checks for proper grouping on the timestamp column when doing windows, etc. So far 
the discussion about the StreamSQL syntax rather focused on the question whether 
1) StreamSQL should follow the SQL standard (as Calcite proposes) or 2) whether Flink 
should use a custom syntax with stream-specific features. For instance, a tumbling window 
is expressed in the GROUP BY clause [1] when following standard SQL, but it could be 
defined using a special WINDOW keyword in a custom StreamSQL dialect.

You are right that we have a dependency on Calcite. However, I think this dependency is 
rather in the internals than the parser, i.e., how does the validator/optimizer support 
and handle monotone / quasi-monotone attributes and windows. I am not sure how much is 
already supported but the Calcite community is working on this [2]. I think we need these 
features in Calcite unless we want to completely remove our dependency on Calcite for 
StreamSQL. I would not be in favor of removing Calcite at this point. We put a lot of 
effort into refactoring the Table API internals. Instead we should start to talk to the 
Calcite community and see how far they are, what is missing, and how we can help.

I will start a discussion on the Calcite dev mailing list in the next days and ask about 
the status of StreamSQL.

Best,
Fabian

[1] http://calcite.apache.org/docs/stream.html#tumbling-windows-improved
[2] https://issues.apache.org/jira/browse/CALCITE-1345










--
Freundliche Grüße / Kind Regards

Timo Walther

Follow me: @twalthr
https://www.linkedin.com/in/twalthr



Re: [DISCUSS] Some thoughts about unify Stream SQL and Batch SQL grammer

2016-08-29 Thread Timo Walther
At first glance, I thought we are losing the possibility to distinguish 
between choosing a batch or streaming table if a TableSource implements 
both, because currently a StreamTableSource is used as the default if 
a TableSource implements both types. I think it would be better to 
determine batch or stream using the type of the execution environment. What 
do you think?
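
A rough sketch of what I mean (package names are assumptions, the case bodies are 
only placeholders):

import org.apache.flink.table.api.{TableEnvironment, BatchTableEnvironment, StreamTableEnvironment}
import org.apache.flink.table.sources.TableSource

def registerSource(tEnv: TableEnvironment, name: String, source: TableSource[_]): Unit =
  tEnv match {
    // decide by the environment the user created, not by the source's interfaces
    case _: StreamTableEnvironment => // register "name" as a streaming table
    case _: BatchTableEnvironment  => // register "name" as a batch table
  }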


Timo


Am 29/08/16 um 14:31 schrieb Jark Wu:

Hi Timo,

Yes, you are right. The STREAM keyword is invalid now. If there is a STREAM keyword in 
the query, the parser will throw a "can’t convert table xxx to stream" exception, 
because we register the table as a regular table, not a streamable one.

- Jark Wu


On 29 Aug 2016, at 20:13, Timo Walther <twal...@apache.org> wrote:

Hi Jark,

your code looks good and it also simplifies many parts. So the STREAM keyword 
is not optional but invalid now, right? What happens if the keyword appears in the 
query?

Timo


Am 29/08/16 um 05:40 schrieb Jark Wu:

Hi Fabian, Timo,

I have created a prototype for removing the STREAM keyword and using the batch SQL 
parser for stream jobs.

This is the working branch: https://github.com/wuchong/flink/tree/remove-stream

Looking forward to your feedback.

- Jark Wu


On 24 Aug 2016, at 16:56, Fabian Hueske <fhue...@gmail.com> wrote:

Starting with a prototype would be great, Jark.
We had some trouble with Calcite's StreamableTable interface anyways. A few
things can be simplified if we do not declare our tables as streamable.
I would try to implement DataStreamTable (and all related classes and
methods) equivalent to DataSetTables if possible.

Best, Fabian

2016-08-24 6:27 GMT+02:00 Jark Wu <wuchong...@alibaba-inc.com>:


Hi Fabian,

You are right, the main thing we need to change for removing STREAM
keyword is the table registration. If you would like, I can do a prototype.

Hi Timo,

I’m glad to contribute our work back to Flink. I will look into it and
create JIRAs next days.

- Jark Wu


On 24 Aug 2016, at 00:13, Fabian Hueske <fhue...@gmail.com> wrote:

Hi Jark,

We can think about removing the STREAM keyword or not. In principle,
Calcite should allow the same windowing syntax on streaming and static
tables (this is one of the main goals of Calcite). The Table API can also
distinguish stream and batch without the STREAM keyword by looking at the
ExecutionEnvironment.
I think we would need to change the way that tables are registered in
Calcite's catalog and also add more validation (check that time windows
refer to a time column, etc).
A prototype should help to see what the consequence of removing the

STREAM

keyword (which is actually, changing the table registration, the parser

is

the same) would be.

Regarding streaming aggregates without window definition: We can

certainly

implement this feature in the Table API. There are a few points that need
to be considered like value expiration after a certain time of update
inactivity (otherwise the state might grow infinitely). But these aspects
should be rather easy to solve. I think for SQL, such running aggregates
are a special case of the Sliding Windows as discussed in Calcite's
StreamSQL document [1].

Thanks also for the document! I'll take that into account when sketching
the FLIP for streaming aggregation support.

Cheers, Fabian

[1] http://calcite.apache.org/docs/stream.html#sliding-windows

2016-08-23 13:09 GMT+02:00 Jark Wu <wuchong...@alibaba-inc.com>:


Hi Fabian, Timo,

Sorry for the late response.

Regarding Calcite's StreamSQL syntax, what concerns me is only the STREAM keyword and 
the missing agg-without-window, which make the syntax different for streaming and static 
tables. I don't think Flink should have a custom SQL syntax, but it's better to have a 
consistent syntax for batch and streaming. Regarding the window syntax, I think it's good 
and reasonable to follow Calcite's syntax [1]. Actually, we implemented the Blink SQL 
windows following Calcite's syntax [1].

In addition, I describe the Blink SQL design including UDF, UDTF, UDAF and windows in a 
Google doc [1]. I hope that can help for the upcoming Flink SQL design.

+1 for creating a FLIP

[1] https://docs.google.com/document/d/15iVc1781dxYWm3loVQlESYvMAxEzbbuVFPZWBYuY1Ek


- Jark Wu


On 23 Aug 2016, at 15:47, Fabian Hueske <fhue...@gmail.com> wrote:

Hi,

I did a bit of prototyping yesterday to check to what extent Calcite supports window 
operations on streams if we would implement them for the Table API.
For the Table API we do not go through Calcite's SQL parser and validator, but generate 
the logical plan (tree of RelNodes) ourselves, mostly using Calcite's RelBuilder.
It turns out that Calcite does not restrict grouped aggregations on streams at this 
abstraction level, i.e., it does not perform any checks.

I think it should be possible to implement windowed aggregates for the Table API. Once 
CALCITE-1345 [1] is implemented (and released), windowed aggregates are also supported by the SQL 

Re: RE:[DISCUSS] FLIP-11: Table API Stream Aggregations

2016-10-14 Thread Timo Walther

Hi everyone,

I think syntax in general is a question of taste; it will be hard to 
make everyone happy. On the one hand it would be great if the Table API and 
SQL could look consistent, but on the other hand there are also some 
negative aspects:


SQL is a language that has not been developed for today's needs, and 
Stream SQL will basically be a "hack", e.g. by using UDFs like TUMBLE, 
HOP, etc. However, the Table API is a newly designed API and does not 
need the same hacky solutions.


The Table API should be a fluent API for both Scala and Java. If we are 
moving windows into the groupBy() call, the question is what this would 
look like:


.groupBy('col, tumble(12.hours, 'rowtime, 'alias)) OR .groupBy('col, 
Tumble over 12.hours on 'rowtime as 'alias)


In Java the window definitions would then be defined as a string instead of 
method calls, so it is easier for the user to make mistakes and there 
is no Javadoc explanation.


I think we should decide whether a window is an operator or an 
expression. If it is an expression, we can also allow window definitions 
in .over() clauses. What do you think?
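
Just to make the comparison concrete, the two alternatives might look roughly like this 
in the Scala Table API (a sketch only; 'col and 'rowtime are the columns from the example 
above, and the exact call order is still up for discussion):

// (a) window as a separate operator call
table
  .groupBy('col)
  .window(Tumble over 12.hours on 'rowtime as 'w)
  .select('col, 'col.count)

// (b) window as an expression inside groupBy
table
  .groupBy('col, Tumble over 12.hours on 'rowtime as 'w)
  .select('col, 'col.count)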


I support the idea of introducing partitionBy().

Regards,
Timo




Am 13/10/16 um 13:04 schrieb Zhangrucong:

Hi Fabian:
 What is the strategy for new syntax which Calcite does not support? 
Will Calcite support it? For example, the row window syntax.

Thank you very much!



-Original Message-
From: Fabian Hueske [mailto:fhue...@gmail.com]
Sent: 13 October 2016 18:17
To: dev@flink.apache.org
Cc: Sean Wang; Timo Walther
Subject: Re: Reply: RE: [DISCUSS] FLIP-11: Table API Stream Aggregations

Hi Zhangrucong,

yes, we want to use Calcite's SQL parser including its window syntax, i.e.,

- the standard SQL OVER windows (in streaming with a few restriction such as no 
different partitionings or orders)
- the GroupBy window functions (TUMBLE, HOP, SESSION).

The GroupBy window function are not implemented in Calcite yet. There is
CALCITE-1345 [1] to track the issue.

As Shaoxuan mentioned, we are not using the STREAM keyword to be SQL compliant.

Best, Fabian

[1] https://issues.apache.org/jira/browse/CALCITE-1345

2016-10-13 12:05 GMT+02:00 Fabian Hueske <fhue...@gmail.com>:


Hi everybody,

happy to see a good discussion here :-) I'll reply to Shaoxuan's mail
first and comment on Zhangrucong question in a separate mail.

Shaoxuan, thanks for the suggestions! I think we all agree that for
SQL we should definitely follow the standard (batch) SQL syntax.
In my opinion, the Table API does not necessarily have to be as close
as possible to SQL but should try to make a few things easier and also
safer (easier is of course subjective).

- GroupBy without windows: These are currently intentionally not
supported and also not part of FLIP-11. Our motivation for not
supporting this, is to guard the user from defining a query that fails
when being executed due to a very memory consuming operation. FLIP-11
provides a way to define such a query as a sliding row window with
unbounded preceding rows. With the upcoming SQL proposal, queries that
consume unbounded memory should be identified and rejected. I would be
in favor of allowing groupBy without windows once the guarding mechanism are in 
place.

- GroupBy with window: I think this is a question of taste. Having a
window() call, makes the feature more explicit in my opinion. However,
I'm not opposed to move the windows into the groupBy clause.
Implementation-wise it should be easy to move the window definition
into to groupBy clause for the Scala Table API. For the Java Table API
we would need to extend the parser quite a bit because windows would
need to be defined as Strings and not via objects.

- RowWindows: The rowWindow() call mimics the standard SQL WINDOW
clause (implemented by PostgreSQL and Calcite) which allows to have "reusable"
window definitions. I think this is a desirable feature. In the
FLIP-11 proposal the over() clause in select() refers to the
predefined windows with aliases. In case only one window is defined,
the over() clause is optional and the same (and only) window is
applied to all aggregates. I think we can make the over() call
mandatory to have the windowing more explicit. It should also be
possible to extend the over clause to directly accept RowWindows
instead of window aliases. I would not make this a priority at the
moment, but a feature that could be later added, because
rowWindow() and over() cover all cases. Similar as for GroupBy with
windows, we would need to extend the parser for the Java Table API though.

Finally, I have an own suggestion:
In FLIP-11, groupBy() is  used to define the partitioning of
RowWindows. I think this should be changed to partitionBy() because
groupBy() groups data and applies an aggregation to all rows of a
group which is not happening here. In original SQL, the OVER clause
features a PARTITION BY clause. We are moving this out of the window
definition, i.e., OVER clause, to enf

Re: Move Row, RowInputFormat to core package

2016-11-25 Thread Timo Walther

Hi Anton,

I would also support the idea of moving Row and RowTypeInfo to Flink 
core. I think there are many real-world use cases where a 
variable-length record that supports null values is required. However, I 
think that those classes need to be reworked first. They should not 
depend on Scala-related things.


RowTypeInfo should not inherit from CaseClassTypeInfo; the current 
solution with the dummy field names is hacky anyway. Row 
should not inherit from Scala classes.
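
Just to illustrate the kind of API I have in mind for such a reworked Row (a sketch only, 
not the real implementation, which should rather live as a plain Java class in flink-core):

// a very small sketch of a Scala-independent, variable-arity row with null support
class Row(arity: Int) {
  private val fields = new Array[Any](arity)       // untyped fields, fixed arity per instance
  def setField(pos: Int, value: Any): Unit = fields(pos) = value
  def getField(pos: Int): Any = fields(pos)
  def getArity: Int = arity
}

val row = new Row(3)
row.setField(0, "hello")
row.setField(1, null)   // null values are explicitly allowed
row.setField(2, 42)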


Regards,
Timo

Am 24/11/16 um 16:46 schrieb Anton Solovev:

Hello,



In Scala, case classes can store a huge number of fields, which is really helpful for 
reading wide CSV files, but this is used only in the Table API.

What about this issue (https://issues.apache.org/jira/browse/FLINK-2186)? 
Should we use the Table API in the machine learning library?

To solve the issue, #readCsvFile could generate a RowInputFormat.

For convenience I added another constructor to RowTypeInfo 
(https://github.com/apache/flink/compare/master...tonycox:FLINK-2186-x).

What do you think about adding some Scala to Flink core and moving Row there?





Re: Type problem in RichFlatMapFunction when using GenericArray type

2016-10-11 Thread Timo Walther

I will also have a look at this issue.

Am 11/10/16 um 09:10 schrieb Chesnay Schepler:

Yes, i think a JIRA issue would be good for this.

On 11.10.2016 08:42, Martin Junghanns wrote:

Shall I open an issue for that?

The Exception gets thrown when using
RichFlatJoinFunction or RichFlatMapFunction (updated the Gist)
and the first field of the tuple is an array type.

I can look into it once the issue is there.

Cheers,

Martin


On 10.10.2016 13:39, Chesnay Schepler wrote:

Hello Martin,

Could you include the error you are getting?

Regards,
Chesnay

On 10.10.2016 13:31, Martin Junghanns wrote:

Hi,

I ran into a problem when using generic arrays in a tuple. I wrote 
a minimal program to reproduce the error [1].


The problem seems to be related to the order of tuple fields. When
I switch Tuple2<K[], K> to Tuple2<K, K[]> and perform the join on 
field 0, everything works as expected.


Using Flink 1.1.2.

Cheers,
Martin


[1] https://gist.github.com/s1ck/37aefb19198cd01a8b998fab354c2cfd









--
Freundliche Grüße / Kind Regards

Timo Walther

Follow me: @twalthr
https://www.linkedin.com/in/twalthr



Re: Type problem in RichFlatMapFunction when using GenericArray type

2016-10-11 Thread Timo Walther
I identified the problem and opened an issue for it: 
https://issues.apache.org/jira/browse/FLINK-4801



Am 11/10/16 um 15:31 schrieb Timo Walther:

I will also have a look at this issue.

Am 11/10/16 um 09:10 schrieb Chesnay Schepler:

Yes, i think a JIRA issue would be good for this.

On 11.10.2016 08:42, Martin Junghanns wrote:

Shall I open an issue for that?

The Exception gets thrown when using
RichFlatJoinFunction or RichFlatMapFunction (updated the Gist)
and the first field of the tuple is an array type.

I can look into it once the issue is there.

Cheers,

Martin


On 10.10.2016 13:39, Chesnay Schepler wrote:

Hello Martin,

Could you include the error you are getting?

Regards,
Chesnay

On 10.10.2016 13:31, Martin Junghanns wrote:

Hi,

I ran into a problem when using generic arrays in a tuple. I wrote 
a minimal program to reproduce the error [1].


The problem seems to be related to the order of tuple fields. When
I switch Tuple2<K[], K> to Tuple2<K, K[]> and perform the join on 
field 0, everything works as expected.


Using Flink 1.1.2.

Cheers,
Martin


[1] https://gist.github.com/s1ck/37aefb19198cd01a8b998fab354c2cfd












--
Freundliche Grüße / Kind Regards

Timo Walther

Follow me: @twalthr
https://www.linkedin.com/in/twalthr



Re: 1.2-SNAPSHOT documentation

2016-12-04 Thread Timo Walther

Hi Ken,

the docs on the website need to be built manually at the moment, so they 
might be out of sync.
If you want the most recent documentation, you can check out the git 
repository and build the docs:


git clone https://github.com/apache/flink.git
./flink/docs/build_docs.sh

Regards,
Timo


Am 04/12/16 um 02:42 schrieb Ken Krugler:

Hi all,

Quick question - how quickly are the online Javadocs updated to match snapshot 
builds?

E.g. 
https://ci.apache.org/projects/flink/flink-docs-master/api/java/index.html?org/apache/flink/streaming/api/datastream/KeyedStream.html
 
<https://ci.apache.org/projects/flink/flink-docs-master/api/java/index.html?org/apache/flink/streaming/api/datastream/KeyedStream.html>

says it’s the 1.2-SNAPSHOT documentation, but it doesn’t match the current 
1.2-SNAPSHOT jars that I’m downloading.

In particular, the RichTimelyFlatMapFunction is still in the documentation, but 
not the ProcessFunction.

So should I be using a different link for snapshot Javadocs?

Thanks,

— Ken

--
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr







--
Freundliche Grüße / Kind Regards

Timo Walther

Follow me: @twalthr
https://www.linkedin.com/in/twalthr



Re: [FLINK-4704 ] Move Table API to org.apache.flink.table

2016-12-05 Thread Timo Walther

Hi Anton,

thanks for bringing up this discussion and volunteering. Yes, we should 
do that before the 1.2 release. You can assign the issue to yourself if you 
like. However, we should first get most of the pull requests in (esp. 
the large PRs). The next release is still some weeks away, so it makes 
sense to do the refactoring early in January.


User-facing classes (API classes) should be moved to 
org.apache.flink.table.java/scala. All other stuff such as logical 
nodes, expressions, code gen should be in org.apache.flink.table.
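
Just to illustrate, that would roughly give a layout like the following (only a sketch, 
the exact packages are up for discussion):

org.apache.flink.table.java    - user-facing Java API classes
org.apache.flink.table.scala   - user-facing Scala API classes
org.apache.flink.table.*       - internal packages such as expressions, codegen, logical nodes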


Regards,
Timo


Am 05/12/16 um 09:07 schrieb Jark Wu:

+1

I would like to move these classes into `org.apache.flink.table.api` , and move 
Java `BatchTableEnvironment.scala` into `org.apache.flink.table.api.java`,
and Scala `BatchTableEnvironment.scala` into `org.apache.flink.table.api.scala`.
  
- Jark Wu



On 4 Dec 2016, at 03:18, Anton Mushin wrote:

Hi devs,

I would to do FLINK-4704 [1]


At the moment, special classes like `BatchTableEnvironment.scala`, 
`Row.scala` and all expression and codegen classes are in the 
`org.apache.flink.api.table` package of the Scala Table API module.

Where do these classes need to be moved? Simply into the

`org.apache.flink.table` package?


What do you think about moving the Table API classes?


[1]https://issues.apache.org/jira/browse/FLINK-4704


Best regards,
Anton Mushin






Re: [FLINK-3615] Add support for non-native SQL types

2017-01-09 Thread Timo Walther

You are right. The issue has been solved. I closed it.

Timo

Am 09/01/17 um 11:58 schrieb Till Rohrmann:

Hi Alex,

I guess that Fabian and Timo (cc'ed) will be able to answer this question.

Cheers,
Till

On Wed, Dec 28, 2016 at 9:38 AM, Alexander Chermenin 
> wrote:


Hi all.

Is FLINK-3615 an actual issue now? Or has it been solved in FLINK-3916?

Regards,
Alex






Re: [VOTE] Release Apache Flink 1.2.1 (RC1)

2017-03-29 Thread Timo Walther
A user reported that all tumbling and sliding window assigners contain 
a pretty obvious bug regarding offsets.


https://issues.apache.org/jira/browse/FLINK-6214

I think we should also fix this for 1.2.1. What do you think?

Regards,
Timo


Am 29/03/17 um 11:30 schrieb Robert Metzger:

Hi Haohui,
I agree that we should fix the parallelism issue. Otherwise, the 1.2.1
release would introduce a new bug.

On Tue, Mar 28, 2017 at 11:59 PM, Haohui Mai  wrote:


-1 (non-binding)

We recently found out that all jobs submitted via UI will have a
parallelism of 1, potentially due to FLINK-5808.

Filed FLINK-6209 to track it.

~Haohui

On Mon, Mar 27, 2017 at 2:59 AM Chesnay Schepler 
wrote:


If possible I would like to include FLINK-6183 & FLINK-6184 as well.

They fix 2 metric-related issues that could arise when a Task is
cancelled very early. (like, right away)

FLINK-6183 fixes a memory leak where the TaskMetricGroup was never closed
FLINK-6184 fixes a NullPointerExceptions in the buffer metrics

PR here: https://github.com/apache/flink/pull/3611

On 26.03.2017 12:35, Aljoscha Krettek wrote:

I opened a PR for FLINK-6188: https://github.com/apache/

flink/pull/3616



This improves the previously very sparse test coverage for

timestamp/watermark assigners and fixes the bug.

On 25 Mar 2017, at 10:22, Ufuk Celebi  wrote:

I agree with Aljoscha.

-1 because of FLINK-6188


On Sat, Mar 25, 2017 at 9:38 AM, Aljoscha Krettek <

aljos...@apache.org>

wrote:

I filed this issue, which was observed by a user:

https://issues.apache.org/jira/browse/FLINK-6188

I think that’s blocking for 1.2.1.


On 24 Mar 2017, at 18:57, Ufuk Celebi  wrote:

RC1 doesn't contain Stefan's backport for the Asynchronous snapshots
for heap-based keyed state that has been merged. Should we create

RC2

with that fix since the voting period only starts on Monday? I think
it would only mean rerunning the scripts on your side, right?

– Ufuk


On Fri, Mar 24, 2017 at 3:05 PM, Robert Metzger <

rmetz...@apache.org>

wrote:

Dear Flink community,

Please vote on releasing the following candidate as Apache Flink

version 1.2

.1.

The commit to be voted on:
*732e55bd* (*

http://git-wip-us.apache.org/repos/asf/flink/commit/732e55bd

*)

Branch:
release-1.2.1-rc1

The release artifacts to be voted on can be found at:
*http://people.apache.org/~rmetzger/flink-1.2.1-rc1/
*

The release artifacts are signed with the key with fingerprint

D9839159:

http://www.apache.org/dist/flink/KEYS

The staging repository for this release can be found at:


https://repository.apache.org/content/repositories/orgapacheflink-1116

-


The vote ends on Wednesday, March 29, 2017, 3pm CET.


[ ] +1 Release this package as Apache Flink 1.2.1
[ ] -1 Do not release this package, because ...






Re: TumblingEventTimeWindows with negative offset / Wrong documentation

2017-03-29 Thread Timo Walther

Hi Vladislav,

thank you very much for reporting this. You are right, this is a bug. I 
opened an issue for it: https://issues.apache.org/jira/browse/FLINK-6214


I think it will be solved soon.
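
Until the fix is in, a possible workaround (assuming the offset is only meant to shift 
the window alignment, e.g. for timezones) is to use the equivalent positive offset, since 
two offsets that differ by exactly one window size produce the same window boundaries. 
A sketch in Scala with made-up input types:

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time

val input: DataStream[(String, Long)] = ??? // your timestamped and watermarked stream

val result = input
  .keyBy(_._1)
  // Time.hours(16) yields the same daily alignment as the intended Time.hours(-8)
  .window(TumblingEventTimeWindows.of(Time.days(1), Time.hours(16)))
  .sum(1)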

Regards,
Timo


Am 29/03/17 um 16:57 schrieb Vladislav Pernin:

Hi,

The documentation mentions the possibility to use a negative offset with
a TumblingEventTimeWindows :

// daily tumbling event-time windows offset by -8 hours.input
 .keyBy()
 .window(TumblingEventTimeWindows.of(Time.days(1), Time.hours(-8)))
 .();


But the code will throw an IllegalArgumentException :

if (offset < 0 || offset >= size) {
throw new IllegalArgumentException("TumblingEventTimeWindows
parameters must satisfy 0 <= offset < size");
}


Regards,
Vladislav





Re: Flink on yarn passing yarn config params.

2017-03-29 Thread Timo Walther

Hi,

you can pass application tags using the `yarn.tags` option. See also here 
for more options: 
https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/config.html#yarn
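
For example, in flink-conf.yaml (the tag values are just placeholders):

yarn.tags: my-team,streaming-prod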


I hope that helps.

Regards,
Timo


Am 29/03/17 um 01:18 schrieb praveen kanamarlapudi:

yarn application tag





Re: [DISCUSS] Project build time and possible restructuring

2017-03-20 Thread Timo Walther
I agree with Aljoscha that we might consider moving from Travis to 
Jenkins. Is there any disadvantage in using Jenkins?


I think we should structure the project according to release management 
(e.g. more frequent releases of libraries) or other criteria (e.g. core 
and non-core) instead of build time. What would happen if the build of 
another submodule became too long? Would we split/restructure 
again and again? If Jenkins solves all our problems, we should use it.


Regards,
Timo



Am 20/03/17 um 12:21 schrieb Aljoscha Krettek:

I prefer Jenkins to Travis by far. Working on Beam, where we have good Jenkins 
integration, has opened my eyes to what is possible with good CI integration.

For example, look at this recent Beam PR: https://github.com/apache/beam/pull/2263 
. The Jenkins-Github integration will 
tell you exactly which tests failed and if you click on the links you can look at the 
log output/std out of the tests in question.

This is the overview page of one of the Jenkins Jobs that we have in Beam: 
https://builds.apache.org/job/beam_PostCommit_Java_RunnableOnService_Flink/ 
. This is an 
example of a stable build: 
https://builds.apache.org/job/beam_PostCommit_Java_RunnableOnService_Flink/lastStableBuild/ 
.
 Notice how it gives you fine grained information about the Maven run. This is an unstable run: 
https://builds.apache.org/job/beam_PostCommit_Java_RunnableOnService_Flink/lastUnstableBuild/ 
.
 There you can see which tests failed and you can easily drill down.

Best,
Aljoscha


On 20 Mar 2017, at 11:46, Robert Metzger  wrote:

Thank you for looking into the build times.

I didn't know that the build time situation is so bad. Even with yarn, mesos, 
connectors and libraries removed, we are still running into the build timeout :(

Aljoscha told me that the Beam community is using Jenkins for running the 
tests, and they are planning to completely move away from Travis. I wonder 
whether we should do the same, as having our own Jenkins servers would allow us 
to run tests for more than 50 minutes.

I agree with Stephan that we should keep the yarn and mesos tests in the core 
for stability / testing quality purposes.


On Mon, Mar 20, 2017 at 11:27 AM, Stephan Ewen > wrote:
@Greg

I am personally in favor of splitting "connectors" and "contrib" out as
well. I know that @rmetzger has some reservations about the connectors, but
we may be able to convince him.

For the cluster tests (yarn / mesos) - in the past there were many cases
where these tests caught cases that other tests did not, because they are
the only tests that actually use the "flink-dist.jar" and thus discover
many dependency and configuration issues. For that reason, my feeling would
be that they are valuable in the core repository.

I would actually suggest to do only the library split initially, to see
what the challenges are in setting up the multi-repo build and release
tooling. Once we gathered experience there, we can probably easily see what
else we can split out.

Stephan


On Fri, Mar 17, 2017 at 8:37 PM, Greg Hogan > wrote:


I’d like to use this refactoring opportunity to unspilt the Travis tests.
With 51 builds queued up for the weekend (some of which may fail or have
been force pushed) we are at the limit of the number of contributions we
can process. Fixing this requires 1) splitting the project, 2)
investigating speedups for long-running tests, and 3) staying cognizant of
test performance when accepting new code.

I’d like to add one to Stephan’s list of module group. I like that the
modules are generic (“libraries”) so that no one module is alone and
independent.

Flink has three “libraries”: cep, ml, and gelly.

“connectors” is a hotspot due to the long-running Kafka tests (and
connectors for three Kafka versions).

Both flink-storm and flink-python have a modest number of tests
and could live with the miscellaneous modules in “contrib”.

The YARN tests are long-running and problematic (I am unable to
successfully run these locally). A “cluster” module could host flink-mesos,
flink-yarn, and flink-yarn-tests.

That gets us close to running all tests in a single Travis build.
   https://travis-ci.org/greghogan/flink/builds/212122590 
 <
https://travis-ci.org/greghogan/flink/builds/212122590 
>

I also tested (https://github.com/greghogan/flink/commits/core_build 
 <

Re: FW: [DISCUSS] Table API / SQL indicators for event and processing time

2017-03-20 Thread Timo Walther
Yes, you are right. In the current design the user cannot assign 
timestamps and watermarks in a table program. Operators (such as windows) 
might adapt the metatimestamp; if this is the case, this adaptation might 
need to be expressed in the query itself too.


E.g. for a tumbling window we could limit the select part to 
table.select('rowtime.ceil(DAY) as 'newRowtime) (so that the logical rowtime 
matches the physical metatimestamp).


Do you have a good example use case that needs the assignment of rowtime 
within a query?


Am 20/03/17 um 13:39 schrieb Radu Tudoran:

Hi,

As suggested by Timo - I am forwarding this to the mailing list. Sorry for not 
having the conversation directly here - I initially thought it might not be of 
interest...

@Timo - thanks for the clarification. I get the main point now which is that 
the rowtime is encoded within the  metadata of the record. I think this is key. 
My view on the matter was maybe a bit updated in the sense that I saw the 
processing pipeline as an input source (as you exemplify - a table scan) and 
from there you have a timestamp and water mark assigner before the processing 
actually starts. So by overriding the timestamp extractor you match the field 
that carries the eventtime/rowtime with the mechanism from flink. But as far as 
I understand this would not be the case anymore...am I right? In case the 
assignment of the rowtime to the metadata of the record is done differently - 
what would be the way to do it?


Dr. Radu Tudoran
Senior Research Engineer - Big Data Expert
IT R Division


HUAWEI TECHNOLOGIES Duesseldorf GmbH
European Research Center
Riesstrasse 25, 80992 München

E-mail: radu.tudo...@huawei.com
Mobile: +49 15209084330
Telephone: +49 891588344173

HUAWEI TECHNOLOGIES Duesseldorf GmbH
Hansaallee 205, 40549 Düsseldorf, Germany, www.huawei.com
Registered Office: Düsseldorf, Register Court Düsseldorf, HRB 56063,
Managing Director: Bo PENG, Wanzhou MENG, Lifang CHEN
Sitz der Gesellschaft: Düsseldorf, Amtsgericht Düsseldorf, HRB 56063,
Geschäftsführer: Bo PENG, Wanzhou MENG, Lifang CHEN
This e-mail and its attachments contain confidential information from HUAWEI, 
which is intended only for the person or entity whose address is listed above. 
Any use of the information contained herein in any way (including, but not 
limited to, total or partial disclosure, reproduction, or dissemination) by 
persons other than the intended recipient(s) is prohibited. If you receive this 
e-mail in error, please notify the sender by phone or email immediately and 
delete it!


-Original Message-
From: Timo Walther [mailto:twal...@apache.org]
Sent: Monday, March 20, 2017 12:29 PM
To: Radu Tudoran
Subject: Re: [DISCUSS] Table API / SQL indicators for event and processing time

You are not bothering me, it is very interesting to compare the design
with real world use cases.

In your use case we would create a table like: tEnv.toTable('date, 'time1,
'time2, 'data, 'myrowtime.rowtime)

We would not "overwrite" an actual attribute of the record but only add a
logical "myrowtime". In general, just to make it clear again, the
rowtime must be in the metatimestamp of the record (by using a timestamp
extractor before). The Table API assumes that records that enter the
Table API are timestamped correctly. So in your use case, you would
create your own TableSource, extract the timestamp based on your 3 time
fields, and define an attribute that represents the rowtime logically. In
the current design we want the Table API to rely on Flink's time
handling, because time handling can be very tricky. So we only support
one event-time field.

But would it be possible to post our discussion on the ML? It might be
interesting for others as well. If yes, can you forward our conversation
to the ML?

Timo



Am 20/03/17 um 12:11 schrieb Radu Tudoran:

Thanks for the replies.

Regarding the ""It might be sometimes that this is not explicit to be guessed" 
That is
why I added the RelTimeConverter. After this conversion step it should
be as explicit as possible (by using the special types). And we can add
special handling of functions (i.e. ceil) that preserve the monotonicity."

..maybe I am missing something so sorry if I just bother you for nothing (it is 
just to make sure we think of all cases before hand). I saw examples of 
applications where you have multiple fields of the same type. For example an 
event can have 3 time fields of TIMESTAMP, 1 of DATE and 2 of TIME (this is 
actually from a real application with some sort fo standard communication 
schema). I was referring to such cases that it is unclear to me how the code 
will identify the exact field to use as rowtime for example. This is what I 
meant about how are we passing indicators to spot the row time field as well as 
what would happen with the code in such a situation as it can identify multiple 
time fields.

Dr. Radu Tudoran
Senior Research Engineer - Big Data Ex

Re: [DISCUSS] Table API / SQL indicators for event and processing time

2017-03-20 Thread Timo Walther

Hi Radu,

we differentiate rowtime and processing time fields by their field 
types. Both indicators extend the timestamp type. In my protoype I added 
the functions FlinkTypeFactory.isRowtime() and 
FlinkTypeFactory.isProctime() for checking this. If a time indicator has 
been materialized (e.g. long.cast(STRING)), it becomes a regular 
timestamp (or in this case a string after evaluation). So we cannot 
differentiate between rowtime and proctime anymore. However, we can add 
some exceptions for certain functions (e.g. for ceil() in combination 
with windows) that preserve the time attributes.


Count windows have to be defined over a time attribute. If you take a 
look at the tests of 
org.apache.flink.table.api.scala.stream.table.AggregationsITCase, you 
can see that count windows are still supported as before. As I said, 
most of the user-facing API does not change; it only tries to make time 
more explicit.


Timo


Am 20/03/17 um 10:34 schrieb Radu Tudoran:

Hi Timo,

I have some questions regarding your implementation:

" The timestamp (not an indicator anymore) becomes part of the physical row. 
E.g.
long.cast(STRING) would require a materialization "
=> If we have this how are we going to make a difference between rowtime and 
processtime? For supporting some queries/operators you only need to use these time 
indications as markers to have something like below. If you do not get access to 
any sort of unique markers to indicate these than we will have hard time to 
support many implementations. What would be the option to support this condition 
in your implementation
   if(rowtime)
...
   else if(proctime)
...some other implemenetation

"- Windows are only valid if they work on time indicators."
=> Does this mean we can no longer work with count windows? There are a lot of 
queries where windows would be defined based on cardinality of elements.



-Original Message-
From: Timo Walther [mailto:twal...@apache.org]
Sent: Monday, March 20, 2017 10:08 AM
To: dev@flink.apache.org
Subject: Re: [DISCUSS] Table API / SQL indicators for event and processing time

Hi everyone,

for the last two weeks I worked on a solution for the time indicator issue. I 
have implemented a prototype[1] which shows how we can express, track, and 
access time in a consistent way for batch and stream tables.

Main changes of my current solution:

- Processing and rowtime time indicators can be named arbitrarily
- They can be defined as follows: stream.toTable(tEnv, 'long, 'int, 'string, 
'proctime.proctime) or stream.toTable(tEnv, 'long.rowtime, 'int, 'string)
- In a streaming environment: if the "long" field is already defined in the record, it 
will not be read by the runtime. "long" always represents the timestamp of the row.
- In batch environment: "long" must be present in the record and will be read 
by the runtime.
- The table definition looks equivalent in both batch and streaming (better 
unification than current state)
- Internally row types are split up in a logical and a physical row type.
- The logical row type contains time indicators, the physical rowtime never contains time 
indicators (the pure "long" will never be in a record)
- After validation and query decorrelation, a special time indicator converter 
traverses the RelNodes and analyzes if the a time indicator is accessed or only 
forwarded.
- An access to a time indicator means that we need to materialize the rowtime 
using a ProcessFunction (not yet implemented). The timestamp (not an indicator 
anymore) becomes part of the physical row. E.g.
long.cast(STRING) would require a materialization
- Forwarding of time indicators does not materialize the rowtime. It remains a 
logical attribute. E.g. .select('long)
- Windows are only valid if they work on time indicators.

There are still a lot of open questions that we can discuss and/or fix in future 
PRs. For now it would be great if you could give some feedback about the 
current implementation. With some exceptions my branch can be built 
successfully.

Regards,
Timo


[1] https://github.com/twalthr/flink/tree/FLINK-5884


Am 02/03/17 um 07:22 schrieb jincheng sun:

Hi,
@Timo, thanks for your replay, and congratulations on your job.
@Fibian, No matter what way to achieve, as long as when the table is
generated or created, identity the field attributes, that is what we want.
I think at this point we are on the same page. We can go ahead.
And very glad to hear That: `the 'rowtime keyword would be removed`,
which is a very important step for keeping Stream and Batch consistent.

Best,
SunJincheng


2017-03-01 17:24 GMT+08:00 Fabian Hueske <fhue...@gmail.com>:


Hi,

@Xingcan
Yes that is right. It is not (easily) possible to change the
watermarks of a stream. All attributes which are used as event-time
timestamps must be aligned with these watermarks. This are only
attributes which are derived from the ori

Re: [DISCUSS] Project build time and possible restructuring

2017-03-20 Thread Timo Walther
Another solution would be to make the Travis builds more efficient. For 
example, we could write a script that determines the modified Maven 
modules and only runs the tests for those modules (and maybe their transitive 
dependencies). PRs for libraries such as Gelly, Table, CEP or connectors 
would not trigger a compilation of the entire stack anymore. Of course 
this would not solve all problems, but many of them.


What do you think about this?



Am 20/03/17 um 14:02 schrieb Robert Metzger:

Aljoscha, do you know how to configure jenkins?
Is Apache INFRA doing that, or are the beam people doing that themselves?

One downside of Jenkins is that we probably need some machines that execute
the tests. A Travis container has 2 CPU cores and 4 GB main memory. We
currently have 10 such containers available on travis concurrently. I think
we would need at least the same amount on Jenkins.


On Mon, Mar 20, 2017 at 1:48 PM, Timo Walther <twal...@apache.org> wrote:


I agree with Aljoscha that we might consider moving from Travis to
Jenkins. Is there any disadvantage in using Jenkins?

I think we should structure the project according to release management
(e.g. more frequent releases of libraries) or other criteria (e.g. core and
non-core) instead of build time. What would happen if the build of another
submodule became too long? Would we split/restructure again and
again? If Jenkins solves all our problems, we should use it.

Regards,
Timo



Am 20/03/17 um 12:21 schrieb Aljoscha Krettek:


I prefer Jenkins to Travis by far. Working on Beam, where we have good
Jenkins integration, has opened my eyes to what is possible with good CI
integration.

For example, look at this recent Beam PR: https://github.com/apache/beam/pull/2263. The
Jenkins-Github integration will tell you exactly which tests failed and if
you click on the links you can look at the log output/std out of the tests
in question.

This is the overview page of one of the Jenkins Jobs that we have in
Beam: https://builds.apache.org/job/beam_PostCommit_Java_RunnableOnService_Flink/.
This is an example of a stable build:
https://builds.apache.org/job/beam_PostCommit_Java_RunnableOnService_Flink/lastStableBuild/.
Notice how it gives you fine grained information about the Maven run. This is an unstable run:
https://builds.apache.org/job/beam_PostCommit_Java_RunnableOnService_Flink/lastUnstableBuild/.
There you can see which tests failed and you can easily drill down.

Best,
Aljoscha

On 20 Mar 2017, at 11:46, Robert Metzger <rmetz...@apache.org> wrote:

Thank you for looking into the build times.

I didn't know that the build time situation is so bad. Even with yarn,
mesos, connectors and libraries removed, we are still running into the
build timeout :(

Aljoscha told me that the Beam community is using Jenkins for running
the tests, and they are planning to completely move away from Travis. I
wonder whether we should do the same, as having our own Jenkins servers
would allow us to run tests for more than 50 minutes.

I agree with Stephan that we should keep the yarn and mesos tests in the
core for stability / testing quality purposes.


On Mon, Mar 20, 2017 at 11:27 AM, Stephan Ewen <se...@apache.org
<mailto:se...@apache.org>> wrote:
@Greg

I am personally in favor of splitting "connectors" and "contrib" out as
well. I know that @rmetzger has some reservations about the connectors,
but
we may be able to convince him.

For the cluster tests (yarn / mesos) - in the past there were many cases
where these tests caught cases that other tests did not, because they are
the only tests that actually use the "flink-dist.jar" and thus discover
many dependency and configuration issues. For that reason, my feeling
would
be that they are valuable in the core repository.

I would actually suggest to do only the library split initially, to see
what the challenges are in setting up the multi-repo build and release
tooling. Once we gathered experience there, we can probably easily see
what
else we can split out.

Stephan


On Fri, Mar 17, 2017 at 8:37 PM, Greg Hogan <c...@greghogan.com > wrote:

I’d like to use this refactoring opportunity to unspilt the Travis tests.

With 51 builds queued up for the weekend (some of which may fail or have
been force pushed) we are at the limit of the number of contributions we
can process. Fixing this requires 1) splitting the project, 2)
investigating speedups for long-running tests, and 3) staying cognizant
of
test performance when accepting new code.

I’d like to add one to Stephan’s list of module group. I like that the
modules are ge

Re: [DISCUSS] Project build time and possible restructuring

2017-03-21 Thread Timo Walther

So what do we want to move to the libraries repository?

I would propose to move these modules first:

flink-cep-scala
flink-cep
flink-gelly-examples
flink-gelly-scala
flink-gelly
flink-ml

All other modules (e.g. in flink-contrib) are rather connectors. I think 
it would be better to move those in a connectors repository later.


If we are not in a rush, we could do the moving after the 
feature freeze. This is the time when most of the PRs will have been merged.


Timo


Am 20/03/17 um 15:00 schrieb Greg Hogan:

We can add cluster tests using the distribution jar, and will need to do so to 
remove Flink’s dependency on Hadoop. The YARN and Mesos tests would still run 
nightly and running cluster tests should be much faster. As troublesome as 
TravisCI has been, a major driver for this change has been local build time.

I agree with splitting off one repo at a time, but we’ll first need to 
reorganize the core repo if using git submodules as flink-python and 
flink-table would need to first be moved. So I think planning this out first is 
a healthy idea, with the understanding that the plan will be reevaluated.

Any changes to the project structure need a scheduled period, perhaps a week, 
for existing pull requests to be reviewed and accepted or closed and later 
migrated.



On Mar 20, 2017, at 6:27 AM, Stephan Ewen  wrote:

@Greg

I am personally in favor of splitting "connectors" and "contrib" out as
well. I know that @rmetzger has some reservations about the connectors, but
we may be able to convince him.

For the cluster tests (yarn / mesos) - in the past there were many cases
where these tests caught cases that other tests did not, because they are
the only tests that actually use the "flink-dist.jar" and thus discover
many dependency and configuration issues. For that reason, my feeling would
be that they are valuable in the core repository.

I would actually suggest to do only the library split initially, to see
what the challenges are in setting up the multi-repo build and release
tooling. Once we gathered experience there, we can probably easily see what
else we can split out.

Stephan


On Fri, Mar 17, 2017 at 8:37 PM, Greg Hogan  wrote:


I’d like to use this refactoring opportunity to unspilt the Travis tests.
With 51 builds queued up for the weekend (some of which may fail or have
been force pushed) we are at the limit of the number of contributions we
can process. Fixing this requires 1) splitting the project, 2)
investigating speedups for long-running tests, and 3) staying cognizant of
test performance when accepting new code.

I’d like to add one to Stephan’s list of module group. I like that the
modules are generic (“libraries”) so that no one module is alone and
independent.

Flink has three “libraries”: cep, ml, and gelly.

“connectors” is a hotspot due to the long-running Kafka tests (and
connectors for three Kafka versions).

Both flink-storm and flink-python have a modest number of tests
and could live with the miscellaneous modules in “contrib”.

The YARN tests are long-running and problematic (I am unable to
successfully run these locally). A “cluster” module could host flink-mesos,
flink-yarn, and flink-yarn-tests.

That gets us close to running all tests in a single Travis build.
   https://travis-ci.org/greghogan/flink/builds/212122590

I also tested (https://github.com/greghogan/flink/commits/core_build) with a maven
parallelism of 2 and 4, with the latter a 6.4% drop in build time.
   https://travis-ci.org/greghogan/flink/builds/212137659
   https://travis-ci.org/greghogan/flink/builds/212154470

We can run Travis CI builds nightly to guard against breaking changes.

I also wanted to get an idea of how disruptive it would be to developers
to divide the project into multiple git repos. I wrote a simple python
script and configured it with the module partitions listed above. The usage
string from the top of the file lists commits with files from multiple
partitions and well as the modified files.
   https://gist.github.com/greghogan/f38a8efe6b6dd5a162a6b43335ac4897

Accounting for the merging of the batch and streaming connector modules,
and assuming that the project structure has not changed much over the past
15 months, for the following date ranges the listed number of commits would
have been split across repositories.

since "2017-01-01"
36 of 571 commits were mixed

since "2016-07-01"
155 of 1607 commits were mixed

since "2016-01-01"
272 of 2561 commits were mixed

Greg



On Mar 15, 2017, at 1:13 PM, Stephan Ewen  wrote:

@Robert - I think once we know that a separate git repo works well, and
that it actually 

Re: Dropping Java 7 support

2017-07-18 Thread Timo Walther

Hurray! Finally IntStreams, LongStreams, etc. in our stream processor ;-)

Timo

Am 18.07.17 um 16:31 schrieb Stephan Ewen:

Hi all!

Over the last days, there was a longer poll running concerning dropping the
support for Java 7.

The feedback from users was unanimous - in favor of dropping Java 7 and
going ahead with Java 8.

So let's do that!

Greetings,
Stephan

-- Forwarded message --
From: Stephan Ewen 
Date: Tue, Jul 18, 2017 at 4:29 PM
Subject: Re: [POLL] Who still uses Java 7 with Flink ?
To: user 


All right, thanks everyone.

I think the consensus here is clear :-)

On Thu, Jul 13, 2017 at 5:17 PM, nragon 

Re: Using native library in Flink

2017-07-18 Thread Timo Walther

Hi Mike,

do you run Flink locally or in a cluster? You have to make sure that the VM 
argument -Djava.library.path is set for all Flink JVMs. The JobManager and 
TaskManagers might run in separate JVMs. Also make sure that the 
library is accessible from all nodes. I don't know what happens if the 
file is accessed by multiple processes/threads at the same time. It 
might also be important where you put the static { ... } loading. It should 
be in the Function, because these classes get deserialized on the 
TaskManager.
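
A small sketch of what I mean (class and library names are placeholders only): trigger 
the loading lazily from within the function, so it happens in the TaskManager JVM where 
the function is deserialized and executed:

import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.configuration.Configuration

object NativeLib {
  // evaluated at most once per JVM/classloader
  lazy val loaded: Unit = System.loadLibrary("dummy2native")
}

class NativeMapper extends RichMapFunction[String, String] {
  override def open(parameters: Configuration): Unit = {
    NativeLib.loaded  // forces System.loadLibrary on the TaskManager side
  }
  override def map(value: String): String = {
    // call into the JNI-backed code here, e.g. new Dummy2(value)
    value
  }
}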


I hope this helps.

Timo


Am 17.07.17 um 21:30 schrieb Mike Accola:

I am new Flink user just trying to learn a little bit.  I am trying to
incorporate an existing C++ library into a new Flink application.  I am
seeing some strange behavior when trying to link in the native (C++)
library using java via JNI.
  
I am running this on Linux (RHEL6)
  
I can run my application once without error.  Sometimes it will run

successfully a 2nd or 3rd time.  However, eventually on a subsequent run,
I get an exception about the native library not being found:
  
java.lang.UnsatisfiedLinkError: no dummy2native in java.library.path

 at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
 at java.lang.Runtime.loadLibrary0(Runtime.java:870)
 at java.lang.System.loadLibrary(System.java:1122)
 at com.att.flink.tdata.spss.TinyLoader.loadNative(Dummy2.java:10)
  
For debugging purposes for now, my native library does not have any

external references.  It really contains 1 method that essentially does
nothing.
  
The behavior seems to indicate that there is some kind of cleanup being

done that "unloads" the native library.  I suspect this is somehow related
to Flink's implementation of its library cache manager, but I have not
been able to prove this yet.
  
A few more details:
  
- I have a c++ library libdummy2native.so that contains a method that can

be invoked via JNI.
- I have a jar containing a class, called Dummy2.  The Dummy2 constructor
will invoke the JNI method.
- The libdummy2native.so library is invoked with System.loadLibrary() like
this:
  static {System.loadLibrary("dummy2native"); }
- In my simple Flink application, I have extended the ProcessFunction
class.  Within this class, I have overriden processElement method that
declares a Dummy2 object.
- The Dummy2 class can be called and invoked without error when used in a
standalone java program.
  
Any thoughts or ideas on what to try next would be appreciated. Initially,

I'd be happy to be able to just explain this behavior.  I will worry about
fixing it afterwards.
  
Thanks.









Fwd: Re: AVRO Union type support in Flink

2017-07-19 Thread Timo Walther
We have similar checks in our KafkaAvroTableSource, but I could not find 
such a check in AvroTypeInfo. The field should have a generic type, so 
you can work with it. If you want to use it as a key, you might have to 
use a mapper beforehand and convert it into a valid key type.
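
A minimal sketch of such a mapper (assuming GenericRecord input and the "union_field" 
name from your schema): since the union field arrives as a plain Object, convert it to 
a string before keying:

import org.apache.avro.generic.GenericRecord
import org.apache.flink.streaming.api.scala._

val stream: DataStream[GenericRecord] = ??? // your Avro records

val keyed = stream
  .map(r => (String.valueOf(r.get("union_field")), r))  // Object -> String key
  .keyBy(_._1)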


Timo



 Weitergeleitete Nachricht 
Betreff:Re: AVRO Union type support in Flink
Datum:  Wed, 19 Jul 2017 10:26:24 -0400
Von:Vishnu Viswanath <vishnu.viswanat...@gmail.com>
An:     Timo Walther <twal...@apache.org>



Hi Timo,

Thanks for checking that. I did not try yet. My current application uses 
Cascading and it has the limitation that Union cannot contain two 
concrete types - link 
<https://github.com/ScaleUnlimited/cascading.avro/blob/master/scheme/src/main/java/cascading/avro/AvroToCascading.java#L137>, 
so was wondering if I can use Flink. Will give it a try.


Hi Martin,
The documentation is here 
<https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/batch/connectors.html#avro-support-in-flink>

I use it to create AVRO files from source files in S3 and write to Kafka.

Thanks,
Vishnu


On Wed, Jul 19, 2017 at 5:55 AM, Timo Walther <twal...@apache.org 
<mailto:twal...@apache.org>> wrote:


   Hi Vishnu,

   I took a look into the code. Actually, we should support it.
   However, those types might be mapped to Java Objects that will be
   serialized with our generic Kryo serializer. Have you tested it?

   Regards,
   Timo


   Am 19.07.17 um 06:30 schrieb Martin Eden:

Hey Vishnu,

For those of us on the list that are not very familiar with Flink
and Avro can you give a pointed to the docs you are referring to
and how you intend to use it? Just so we gain understanding as well.

Thanks,
Martin

On Tue, Jul 18, 2017 at 9:12 PM, Vishnu Viswanath
<vishnu.viswanat...@gmail.com
<mailto:vishnu.viswanat...@gmail.com>> wrote:

Hi All,

Does Flink support AVRO union types - Documentation says it
supports nullable types: {"name": "type_double_test", "type":
["null", "double"]}

But my schema has something like : {"name": "union_field",
"type": ["string", "double"]}

Thanks
Vishnu








Re: How to run wordcount program on Multi-node Cluster.

2017-08-07 Thread Timo Walther

Hi Ramanji,

you can find the source code of the examples here:
https://github.com/apache/flink/blob/master/flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples/wordcount/WordCount.java

A general introduction how the cluster execution works can be found here:
https://ci.apache.org/projects/flink/flink-docs-release-1.4/concepts/programming-model.html#programs-and-dataflows
https://ci.apache.org/projects/flink/flink-docs-release-1.4/concepts/runtime.html

It might also be helpful to have a look at the web interface which can 
show you a nice graph of the job.


I hope this helps. Feel free to ask further questions.

Regards,
Timo


Am 07.08.17 um 14:00 schrieb P. Ramanjaneya Reddy:

Hello Everyone,

I have followed the steps specified below link to Install & Run Apache
Flink on Multi-node Cluster.

http://data-flair.training/blogs/install-run-deploy-flink-multi-node-cluster/
  used flink-1.3.2-bin-hadoop27-scala_2.10.tgz for install

using the command
  " bin/flink run
/home/root1/NAI/Tools/BEAM/Flink_Cluster/rama/flink/examples/streaming/WordCount.jar"
I was able to run WordCount, but where can I see which input was used and which output
was generated?

And how can I specify the input and output paths?

I'm trying to understand how WordCount works on a multi-node
cluster.

Any suggestions would help my further understanding.

Thanks & Regards,
Ramanji.





Re: How to run wordcount program on Multi-node Cluster.

2017-08-07 Thread Timo Walther
Flink is distributed software for clusters. You need something like a 
distributed file system so that input and output files can be 
accessed from all nodes.


Each TM has a log directory where the execution logs are stored.

You can set additional properties on your output format by importing the 
example code into your IDE and adapting it.
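
For example (a sketch, assuming the result is a DataStream of (word, count) tuples 
called counts in your adapted job):

import org.apache.flink.core.fs.FileSystem.WriteMode

counts.writeAsText("file:///home/root1/wordcount_out", WriteMode.OVERWRITE)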


Am 07.08.17 um 16:03 schrieb P. Ramanjaneya Reddy:

Hi Timo,
The problem is resolved after copying the input file to all task managers.

And where should the output file be generated? On the JobManager or a TaskManager?

Where can I see the execution logs to understand how the word count is done on each
task manager?


By the way, is there any option to overwrite...?

08/07/2017 19:27:00 Keyed Aggregation -> Sink: Unnamed(1/1) switched to
FAILED
java.io.IOException: File or directory already exists. Existing files and
directories are not overwritten in NO_OVERWRITE mode. Use OVERWRITE mode to
overwrite existing files and directories.
at
org.apache.flink.core.fs.FileSystem.initOutPathLocalFS(FileSystem.java:763)
at
org.apache.flink.core.fs.SafetyNetWrapperFileSystem.initOutPathLocalFS(SafetyNetWrapperFileSystem.java:135)
at
org.apache.flink.api.common.io.FileOutputFormat.open(FileOutputFormat.java:231)
at
org.apache.flink.api.java.io.TextOutputFormat.open(TextOutputFormat.java:78)
at
org.apache.flink.streaming.api.functions.sink.OutputFormatSinkFunction.open(OutputFormatSinkFunction.java:61)
at
org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
at
org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:111)
at
org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:376)
at
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:253)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702)
at java.lang.Thread.run(Thread.java:745)


On Mon, Aug 7, 2017 at 6:49 PM, Timo Walther <twal...@apache.org> wrote:


Make sure that the file exists and is accessible from all Flink task
managers.


Am 07.08.17 um 14:35 schrieb P. Ramanjaneya Reddy:


Thank you Timo.


root1@root1-HP-EliteBook-840-G2:~/NAI/Tools/BEAM/Flink_Clust
er/rama/flink$
*./bin/flink
run ./examples/streaming/WordCount.jar --input
file:///home/root1/hamlet.txt --output file:///home/root1/wordcount_out*



Execution of the wordcount jar gives an error...

08/07/2017 18:03:16 Source: Custom File Source(1/1) switched to FAILED
java.io.FileNotFoundException: The provided file path
file:/home/root1/hamlet.txt does not exist.
at
org.apache.flink.streaming.api.functions.source.ContinuousFi
leMonitoringFunction.run(ContinuousFileMonitoringFunction.java:192)
at
org.apache.flink.streaming.api.operators.StreamSource.run(
StreamSource.java:87)
at
org.apache.flink.streaming.api.operators.StreamSource.run(
StreamSource.java:55)
at
org.apache.flink.streaming.runtime.tasks.SourceStreamTask.
run(SourceStreamTask.java:95)
at
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(
StreamTask.java:263)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702)
at java.lang.Thread.run(Thread.java:748)


On Mon, Aug 7, 2017 at 5:56 PM, Timo Walther <twal...@apache.org> wrote:

Hi Ramanji,

you can find the source code of the examples here:
https://github.com/apache/flink/blob/master/flink-examples/
flink-examples-streaming/src/main/java/org/apache/flink/
streaming/examples/wordcount/WordCount.java

A general introduction how the cluster execution works can be found here:
https://ci.apache.org/projects/flink/flink-docs-release-1.4/
concepts/programming-model.html#programs-and-dataflows
https://ci.apache.org/projects/flink/flink-docs-release-1.4/
concepts/runtime.html

It might also be helpful to have a look at the web interface which can
show you a nice graph of the job.

I hope this helps. Feel free to ask further questions.

Regards,
Timo


Am 07.08.17 um 14:00 schrieb P. Ramanjaneya Reddy:

Hello Everyone,


I have followed the steps specified below link to Install & Run Apache
Flink on Multi-node Cluster.

http://data-flair.training/blogs/install-run-deploy-flink-
multi-node-cluster/
used flink-1.3.2-bin-hadoop27-scala_2.10.tgz for install

using the command
" bin/flink run
/home/root1/NAI/Tools/BEAM/Flink_Cluster/rama/flink/examples
/streaming/WordCount.jar"
able to run wordcount, but where can i see which input consider and
output
generated?

and how can i specify the input and output paths?

I'm trying to understand how the wordcount will work using Multi-node
Cluster.?

any suggestions will help me further understanding?

Thanks & Regards,
Ramanji.







Re: How to run wordcount program on Multi-node Cluster.

2017-08-07 Thread Timo Walther
Make sure that the file exists and is accessible from all Flink task 
managers.



Am 07.08.17 um 14:35 schrieb P. Ramanjaneya Reddy:

Thank you Timo.


root1@root1-HP-EliteBook-840-G2:~/NAI/Tools/BEAM/Flink_Cluster/rama/flink$
*./bin/flink
run ./examples/streaming/WordCount.jar --input
file:///home/root1/hamlet.txt --output file:///home/root1/wordcount_out*


Execution of the wordcount jar gives an error...

08/07/2017 18:03:16 Source: Custom File Source(1/1) switched to FAILED
java.io.FileNotFoundException: The provided file path
file:/home/root1/hamlet.txt does not exist.
at
org.apache.flink.streaming.api.functions.source.ContinuousFileMonitoringFunction.run(ContinuousFileMonitoringFunction.java:192)
at
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:87)
at
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:55)
at
org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:95)
at
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:263)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702)
at java.lang.Thread.run(Thread.java:748)


On Mon, Aug 7, 2017 at 5:56 PM, Timo Walther <twal...@apache.org> wrote:


Hi Ramanji,

you can find the source code of the examples here:
https://github.com/apache/flink/blob/master/flink-examples/
flink-examples-streaming/src/main/java/org/apache/flink/
streaming/examples/wordcount/WordCount.java

A general introduction how the cluster execution works can be found here:
https://ci.apache.org/projects/flink/flink-docs-release-1.4/
concepts/programming-model.html#programs-and-dataflows
https://ci.apache.org/projects/flink/flink-docs-release-1.4/
concepts/runtime.html

It might also be helpful to have a look at the web interface which can
show you a nice graph of the job.

I hope this helps. Feel free to ask further questions.

Regards,
Timo


Am 07.08.17 um 14:00 schrieb P. Ramanjaneya Reddy:

Hello Everyone,

I have followed the steps specified below link to Install & Run Apache
Flink on Multi-node Cluster.

http://data-flair.training/blogs/install-run-deploy-flink-
multi-node-cluster/
   used flink-1.3.2-bin-hadoop27-scala_2.10.tgz for install

using the command
   " bin/flink run
/home/root1/NAI/Tools/BEAM/Flink_Cluster/rama/flink/examples
/streaming/WordCount.jar"
able to run wordcount, but where can i see which input consider and output
generated?

and how can i specify the input and output paths?

I'm trying to understand how the wordcount will work using Multi-node
Cluster.?

any suggestions will help me further understanding?

Thanks & Regards,
Ramanji.






Re: [DISCUSS] Stop serving docs for Flink version prior to 1.0

2017-08-23 Thread Timo Walther
I had an offline discussion with Ufuk. Since I looked into the docs build 
scripts recently, I can take care of removing the old docs. There are other 
issues that need to be fixed before the next release as well:


- Drop docs < 1.0
- Make Javadocs with Scala build again
- Build all docs >= 1.0 again (esp. to add the outdated warning)

Regards,
Timo

Am 23.08.17 um 08:46 schrieb Piotr Nowojski:

+1

Is there a way to serve “current” docs under some permanent link, showing docs 
from whatever is the latest release? Maybe such docs could be indexed by 
Google and ranked higher than any particular release, mitigating this issue for 
the future?


On Aug 22, 2017, at 6:48 PM, Till Rohrmann  wrote:

+1

On Tue, Aug 22, 2017 at 6:16 PM, Stefan Richter :

Please go ahead +1

Thank you for taking care!

On Tue, Aug 22, 2017 at 6:02 PM, Aljoscha Krettek 
wrote:


+1

On 22. Aug 2017, at 18:01, Stephan Ewen  wrote:

+1

On Tue, Aug 22, 2017 at 5:58 PM, Ufuk Celebi  wrote:


Quick update: If there are no objections, I will start looking into
how to stop serving the docs for Flink < 1.0 after tomorrow.


On Wed, Aug 16, 2017 at 10:45 PM, Ufuk Celebi  wrote:

Thanks for your early feedback Kostas and Ted.

@Kostas: I have a proposal for a more obvious warning here:
https://github.com/apache/flink/pull/4553 Would appreciate comments

in

that PR too :-)


On Wed, Aug 16, 2017 at 6:56 PM, Ted Yu  wrote:

Without any upgrade path, the old versions are not useful.

+1 on redirecting doc to stable release(s).

On Wed, Aug 16, 2017 at 9:46 AM, Kostas Kloudas <

k.klou...@data-artisans.com

wrote:
Hi Ufuk,

+1

I think that this is a nice change!
Thanks Ufuk for opening the discussion.

I think that broken/redirect links are not an issue,
as either way the information provided is far outdated
(== wrong).

As for the warning, can't we put it in the header instead
of the footer? Something like: this page may contain outdated info,
please refer to XXX for an updated version.

Kostas


On Aug 16, 2017, at 6:02 PM, Ufuk Celebi  wrote:

Hey devs,

I would like to stop serving the documentation for all Flink

versions

prior to 1.0.

Do you have any concerns about this?

These Flink versions are very old without any upgrade path to

newer

versions but they still get a lot of traffic and most new users

don't

realize that they are looking at a very outdated docs page.

Even with a stronger warning about being outdated, I don't see a

point

in serving them.

If we choose to do this, I would redirect these ancient pages to

the

docs of the latest stable release and update the release process

Wiki

page to update those redirects etc.

This will mean that some links in old StackOverflow questions and

the

mailing list archives will be broken/redirected. If you have

concerns

about this, please raise your voice in this thread.

– Ufuk

PS: I've created an issue to improve the visibility of the

outdated

warning here https://issues.apache.org/jira/browse/FLINK-7462. I

will

only apply this to versions >= 1.0 for now.










Re: [DISCUSS] Flink 1.4 and time based release

2017-08-23 Thread Timo Walther
I also think we shouldn't publish releases just to have a release 
regularly.


Maybe we can make time-based releases more flexible: instead of a 
feature freeze after 3 months plus 1 month of testing, we could do a 
feature freeze 3 months after the last release, with unlimited testing. This 
would limit us in not adding too many features, but it enables proper 
testing for robust releases. What do you think?


Regards,
Timo

Am 23.08.17 um 10:26 schrieb Till Rohrmann:

Thanks for starting the discussion Stephan. I agree with you that the last
release was probably a bit hasty due to the constraints we put on ourselves
with the strict time based release. Therefore and because of some of the
incomplete features, I would be in favour of loosening the strict deadline
such that we have more time finishing our work and properly testing the
release. Hard to tell, however, how much more time is needed.

Cheers,
Till

On Tue, Aug 22, 2017 at 6:56 PM, Chen Qin  wrote:


It would be great to avoid an immediate 1.x.1 bug-fixing release. It causes
confusion and raises quality concerns.

Also, is there already a way to communicate with Amazon EMR to make the latest
release speedily available? I may try to find someone who works there if needed.

Thanks
Chen


On Aug 22, 2017, at 9:32 AM, Stephan Ewen  wrote:

Hi all!

I want to bring up this discussion because we are approaching the date

when

there would be a feature freeze following the time based release

schedule.

To make it short, I would suggest to not follow the time-based schedule

for

that release. There are a bunch of reasons bringing me to that view:

  - 1.3.0, which was very much pushed by the time-based schedule was not
the best release we ever made. In fact, it had quite a few open issues

that

required an immediate 1.3.1 followup and only 1.3.2 fixed some of them.

  - 1.3.2, which is in some sense what 1.3.0 should have been is only 2
weeks back

  - The delta since the last release is still quite small. One could argue
to make a quick release and then soon another release after that, but
releases still tie up quite a good amount of resources, so that would
introduce a delay for much of the ongoing work. I am doubtful if this is

a

good idea at this point.

  - The current master has still quite a bit of "ongoing work" that is not
in perfect shape for a release, but could use some more weeks to provide
real value to users. Examples are the dependency reworking, network stack
enhancements, speedier state restore efforts, flip-6, exactly-once
sinks/side-effects, and others.


Alternatively, we could do what we did for 1.1 and 1.2, which is making

now

a list of features we want in the release, and then projecting based on
that when we fork off the 1.4 release branch.


What do you think?


Cheers,
Stephan





Re: [DISCUSS] Release Apache Flink 1.3.1

2017-06-19 Thread Timo Walther
I'm working on https://issues.apache.org/jira/browse/FLINK-6896 and 
https://issues.apache.org/jira/browse/FLINK-6881. I try to open a PR for 
both today.


Timo


Am 19.06.17 um 14:54 schrieb Robert Metzger:

Fabian and SunJincheng, it looks like we are cancelling the 1.3.1 RC1.
So there is the opportunity to get the two mentioned JIRAs in.

On Wed, Jun 14, 2017 at 4:16 PM, Robert Metzger  wrote:


I've closed my emails, so I didn't see your messages anymore Fabian.
The RC1 for 1.3.1 is out now. I personally think we should not cancel it
because of these two issues.
If we find more stuff we can do it, but I would like to push out 1.3.1
soon to make the ES5 connector and the fixes to the state descriptors
available.

On Wed, Jun 14, 2017 at 11:22 AM, jincheng sun 
wrote:


Hi @Robert,
I agree with @Fabian.
And thanks for review those PRs. @Fabian.

Cheers,
SunJincheng

2017-06-14 16:53 GMT+08:00 Fabian Hueske :


I don't think that

https://issues.apache.org/jira/browse/FLINK-6886
https://issues.apache.org/jira/browse/FLINK-6896

are blockers but it would be good to include them.
I'll try to review the PRs today and merge them.

Cheers, Fabian

2017-06-13 11:48 GMT+02:00 Till Rohrmann :


I've just merged the fix for this blocker (FLINK-6685).

On Tue, Jun 13, 2017 at 11:21 AM, Aljoscha Krettek <

aljos...@apache.org>

wrote:


A quick Jira search reveals one blocker: https://issues.apache.org/
jira/browse/FLINK-6685?filter=12334772=project%20%3D%
20FLINK%20AND%20priority%20%3D%20Blocker%20AND%20resolution%20%3D%
20Unresolved%20AND%20affectedVersion%20%3D%201.3.0 <
https://issues.apache.org/jira/browse/FLINK-6685?filter=
12334772=project%20=%20FLINK%20AND%20priority%20=%
20Blocker%20AND%20resolution%20=%20Unresolved%20AND%
20affectedVersion%20=%201.3.0>


On 13. Jun 2017, at 10:12, Chesnay Schepler 

wrote:

I would like to include FLINK-6898 and FLINK-6900 in 1.3.1.

They are related to the metric system, and limit the size of

individual

metric name components

as the default window operator names are so long they were causing

issues with file-system based

storages because the components exceeded 255 characters.

They both have open PRs and change 1 and 3 lines respectively, so

it's

very fast to review.

On 13.06.2017 09:33, jincheng sun wrote:

Hi Robert,
  From user mail-list I find 2 bugs as follows:

  https://issues.apache.org/jira/browse/FLINK-6886
  https://issues.apache.org/jira/browse/FLINK-6896

I'm not sure if they are as the release blocker. But I think is

better

to

merged those two PR. into 1.3.1 release.
What do you think? @Fabian, @Timo, @Robert

Best,
SunJincheng


2017-06-13 14:03 GMT+08:00 Tzu-Li (Gordon) Tai <

tzuli...@apache.org

:

I’ve just merged the last blockers for 1.3.1. IMO, the release

process

for

1.3.1 is ready for kick off.


On 8 June 2017 at 10:32:47 AM, Aljoscha Krettek (

aljos...@apache.org

)

wrote:

Yes, there is a workaround, as mentioned in the other thread:
https://lists.apache.org/thread.html/

eb7e256146fbe069a4210e1690fac5

d3453208fab61515ab1a2f6bf7@%3Cuser.flink.apache.org%3E <
https://lists.apache.org/thread.html/

eb7e256146fbe069a4210e1690fac5

d3453208fab61515ab1a2f6bf7@%3Cuser.flink.apache.org%3E>. It’s

just a

bit

cumbersome but I agree that it’s not a blocker now.

Best,
Aljoscha

On 8. Jun 2017, at 09:47, Till Rohrmann 

wrote:

There should be an easy work-around for this problem. Start a

standalone

cluster and run the queries against this cluster. But I also

see

that

it

might be annoying for users who used to do it differently. The

basic

question here should be whether we want the users to use the
LocalFlinkMiniCluster in a remote setting (running queries

against

it

from

a different process).

Cheers,
Till

On Wed, Jun 7, 2017 at 4:59 PM, Aljoscha Krettek <

aljos...@apache.org

wrote:


I would also like to raise another potential blocker: it’s

currently

not

easily possible for users to start a job in local mode in the

IDE

and to

then interact with that cluster, say for experimenting with

queryable

state. At least one user walked into this problem already with

the

1.3.0

RC: https://lists.apache.org/thread.html/

eb7e256146fbe069a4210e1690fac5

d3453208fab61515ab1a2f6bf7@%3Cuser.flink.apache.org%3E <
https://lists.apache.org/thread.html/

eb7e256146fbe069a4210e1690fac5

d3453208fab61515ab1a2f6bf7@%3Cuser.flink.apache.org%3E>

The reasons I have so far analysed are:
* the local flink cluster starts with HAServices that don’t

allow

external querying, by default. (Broadly spoken)
* the queryable state server is not started in the local flink

mini

cluster anymore and it cannot be configured to do so easily

What do you think?

Best,
Aljoscha

On 7. Jun 2017, at 11:54, Robert Metzger <

rmetz...@apache.org>

wrote:

 From the list [1], not many of the JIRAs have been fixed.
I think it 

Re: [DISCUSS] Release Apache Flink 1.3.1

2017-06-20 Thread Timo Walther

FLINK-6881 and FLINK-6896 are merged. The Table API is ready for a new RC.

Timo

Am 19.06.17 um 17:00 schrieb jincheng sun:

Thanks @Timo!

2017-06-19 22:02 GMT+08:00 Timo Walther <twal...@apache.org>:


I'm working on https://issues.apache.org/jira/browse/FLINK-6896 and
https://issues.apache.org/jira/browse/FLINK-6881. I try to open a PR for
both today.

Timo


Am 19.06.17 um 14:54 schrieb Robert Metzger:

Fabian and SunJincheng, it looks like we are cancelling the 1.3.1 RC1.

So there is the opportunity to get the two mentioned JIRAs in.

On Wed, Jun 14, 2017 at 4:16 PM, Robert Metzger <rmetz...@apache.org>
wrote:

I've closed my emails, so I didn't see your messages anymore Fabian.

The RC1 for 1.3.1 is out now. I personally think we should not cancel it
because of these two issues.
If we find more stuff we can do it, but I would like to push out 1.3.1
soon to make the ES5 connector and the fixes to the state descriptors
available.

On Wed, Jun 14, 2017 at 11:22 AM, jincheng sun <sunjincheng...@gmail.com
wrote:

Hi @Robert,

I agree with @Fabian.
And thanks for review those PRs. @Fabian.

Cheers,
SunJincheng

2017-06-14 16:53 GMT+08:00 Fabian Hueske <fhue...@gmail.com>:

I don't think that

https://issues.apache.org/jira/browse/FLINK-6886
https://issues.apache.org/jira/browse/FLINK-6896

are blockers but it would be good to include them.
I'll try to review the PRs today and merge them.

Cheers, Fabian

2017-06-13 11:48 GMT+02:00 Till Rohrmann <trohrm...@apache.org>:

I've just merged the fix for this blocker (FLINK-6685).

On Tue, Jun 13, 2017 at 11:21 AM, Aljoscha Krettek <


aljos...@apache.org>
wrote:

A quick Jira search reveals one blocker: https://issues.apache.org/

jira/browse/FLINK-6685?filter=12334772=project%20%3D%
20FLINK%20AND%20priority%20%3D%20Blocker%20AND%20resolution%20%3D%
20Unresolved%20AND%20affectedVersion%20%3D%201.3.0 <
https://issues.apache.org/jira/browse/FLINK-6685?filter=
12334772=project%20=%20FLINK%20AND%20priority%20=%
20Blocker%20AND%20resolution%20=%20Unresolved%20AND%
20affectedVersion%20=%201.3.0>

On 13. Jun 2017, at 10:12, Chesnay Schepler <ches...@apache.org>
wrote:
I would like to include FLINK-6898 and FLINK-6900 in 1.3.1.

They are related to the metric system, and limit the size of


individual

metric name components

as the default window operator names are so long they were causing


issues with file-system based


storages because the components exceeded 255 characters.

They both have open PRs and change 1 and 3 lines respectively, so


it's

very fast to review.

On 13.06.2017 09:33, jincheng sun wrote:


Hi Robert,
   From user mail-list I find 2 bugs as follows:

   https://issues.apache.org/jira/browse/FLINK-6886
   https://issues.apache.org/jira/browse/FLINK-6896

I'm not sure if they are as the release blocker. But I think is


better

to

merged those two PR. into 1.3.1 release.

What do you think? @Fabian, @Timo, @Robert

Best,
SunJincheng


2017-06-13 14:03 GMT+08:00 Tzu-Li (Gordon) Tai <


tzuli...@apache.org

:

I’ve just merged the last blockers for 1.3.1. IMO, the release

process

for


1.3.1 is ready for kick off.


On 8 June 2017 at 10:32:47 AM, Aljoscha Krettek (


aljos...@apache.org

)


wrote:

Yes, there is a workaround, as mentioned in the other thread:
https://lists.apache.org/thread.html/


eb7e256146fbe069a4210e1690fac5

d3453208fab61515ab1a2f6bf7@%3Cuser.flink.apache.org%3E <

https://lists.apache.org/thread.html/


eb7e256146fbe069a4210e1690fac5

d3453208fab61515ab1a2f6bf7@%3Cuser.flink.apache.org%3E>. It’s

just a

bit

cumbersome but I agree that it’s not a blocker now.

Best,
Aljoscha


On 8. Jun 2017, at 09:47, Till Rohrmann <trohrm...@apache.org>


wrote:

There should be an easy work-around for this problem. Start a

standalone

cluster and run the queries against this cluster. But I also

see

that

it


might be annoying for users who used to do it differently. The

basic

question here should be whether we want the users to use the

LocalFlinkMiniCluster in a remote setting (running queries


against

it

from

a different process).

Cheers,
Till

On Wed, Jun 7, 2017 at 4:59 PM, Aljoscha Krettek <


aljos...@apache.org

wrote:

I would also like to raise another potential blocker: it’s
currently

not


easily possible for users to start a job in local mode in the

IDE

and to

then interact with that cluster, say for experimenting with

queryable

state. At least one user walked into this problem already with

the

1.3.0

RC: https://lists.apache.org/thread.html/

eb7e256146fbe069a4210e1690fac5

d3453208fab61515ab1a2f6bf7@%3Cuser.flink.apache.org%3E <

https://lists.apache.org/thread.html/


eb7e256146fbe069a4210e1690fac5

d3453208fab61515ab1a2f6bf7@%3Cuser.flink.apache.org%3E>

The reasons I have so far analysed are:
* the local flink cluster starts with HAServices that don’t


allow

external querying, by default. (Broadly spoken)

* the queryable s

Re: [DISCUSS] Release 1.3.2 planning

2017-06-26 Thread Timo Walther
I just opened a PR which should be included in the next bug fix release 
for the Table API:

https://issues.apache.org/jira/browse/FLINK-7005

Timo

Am 23.06.17 um 14:09 schrieb Robert Metzger:

Thanks Haohui.

The first main task for the release management is to come up with a
timeline :)
Lets just wait and see which issues get reported. There are currently no
blockers set for 1.3.1 in JIRA.

On Thu, Jun 22, 2017 at 6:47 PM, Haohui Mai  wrote:


Hi,

Release management is tough, but I'm happy to help. Are there any timelines
you have in mind?

Haohui
On Fri, Jun 23, 2017 at 12:01 AM Robert Metzger 
wrote:


Hi all,

with the 1.3.1 release on the way, we can start thinking about the 1.3.2
release.

We have already one issue that should go in there:
- https://issues.apache.org/jira/browse/FLINK-6964

If there are any other blockers, let us know here :)

I'm wondering if there's somebody from the community who's willing to

take

care of the release management of 1.3.2 :)





Re: [VOTE] Release Apache Flink 1.3.1 (RC2)

2017-06-22 Thread Timo Walther

+1 (binding)

I tested the following:

- built from source
- tested the web interface
- ran some streaming programs

It seems that Flink cannot be built if the path contains spaces. I added 
an issue for this (https://issues.apache.org/jira/browse/FLINK-6987).


It seems that this error was present before Flink 1.3.0. That's why I 
give my +1 anyway.


Regards,
Timo

Am 22.06.17 um 15:53 schrieb Greg Hogan:

+1 (binding)

- verified source and binary signatures
- verified source and binary checksums
- verified LICENSEs
- verified NOTICEs
- built from source

Greg


On Jun 21, 2017, at 3:46 AM, Robert Metzger  wrote:

Dear Flink community,

Please vote on releasing the following candidate as Apache Flink version
1.3.1.

The commit to be voted on:
*http://git-wip-us.apache.org/repos/asf/flink/commit/1ca6e5b6
*

Branch:
release-1.3.1-rc2

The release artifacts to be voted on can be found at:
*http://people.apache.org/~rmetzger/flink-1.3.1-rc2/
*

The release artifacts are signed with the key with fingerprint D9839159:
http://www.apache.org/dist/flink/KEYS

The staging repository for this release can be found at:
*https://repository.apache.org/content/repositories/orgapacheflink-1125
*


-


The vote ends on Thursday (5pm CEST), June 22, 2017.
IMPORTANT: I've reduced the voting time to only one day because the number
of changes between RC1 and RC2 are mostly in the table API (mostly
documentation) and the serializer changes Till and Gordon were working on.
The list of changes is the following
- Reworked Table API documentation (this is a set of commits)
- [FLINK-6817] [table] Add OverWindowWithPreceding class to guide users
- [FLINK-6859] [table] Do not delete timers in StateCleaningCountTrigger
- [FLINK-6930] [table] Forbid selecting window start/end on row-based T…
- [FLINK-6886] [table] Fix conversion of Row Table to POJO
- [FLINK-6602] [table] Prevent TableSources with empty time attribute n…
- [FLINK-6941] [table] Validate that start and end window properties ar…
- [FLINK-6881] [FLINK-6896] [table] Creating a table from a POJO and de…
- [FLINK-6921] [serializer] Allow EnumValueSerializer to deal with appe…
- [FLINK-6948] [serializer] Harden EnumValueSerializer to detect change…
- [FLINK-6922] [serializer] Remove Java serialization from Enum(Value)S…
- [FLINK-6652] [core] Fix handling of delimiters split by buffers in De…



[ ] +1 Release this package as Apache Flink 1.3.1
[ ] -1 Do not release this package, because ...





Re: [DISCUSS] Release 1.3.0 RC1 (Non voting, testing release candidate)

2017-05-24 Thread Timo Walther
I tested the quickstarts, the SBT build, the PlanVisualizer, and the 
HistoryServer. I could not find any serious problems. However, we have 
to update the quickstart scripts, once 1.3 is released.


Timo


Am 24.05.17 um 16:05 schrieb Chesnay Schepler:
I've found a small problem in the yarn user-jar handling. We recently 
added a switch to disable the user-jar inclusion in the class path, 
which isn't working.


PR is open, but i wouldn't consider this a release-blocker. 
https://github.com/apache/flink/pull/3979


On 24.05.2017 15:45, Kostas Kloudas wrote:

Thanks Robert for pushing this.
This is going to be a voting RC?

Kostas

On May 24, 2017, at 3:16 PM, Robert Metzger  
wrote:


Great!
The fixes are in, and I'm now building the RC2 (There are currently no
blockers in our JIRA)

On Tue, May 23, 2017 at 9:37 PM, Stefan Richter 
:

It appears that rescaling is broken, I've filed

https://issues.apache.org/jira/browse/FLINK-6690 for that.

needless to say, this is a release blocker.

On 23.05.2017 20:55, Robert Metzger wrote:
I know I'm talking to myself here :) Anyways, I was running into 
some

issues while creating the release (I was using master instead of the
release-1.3 branch, which lead to some issues with the scala 2.10 
/ 2.11

switch).

The RC2 is basically ready, however, there's at least one new 
blocker:
https://issues.apache.org/jira/browse/FLINK-6685 which needs 
addressing

first.

Let me know if you want me to publish the new RC2. Otherwise, I'll

re-do it

with the fix included.

On Tue, May 23, 2017 at 10:35 AM, Robert Metzger 


wrote:


I've started building the RC.

On Mon, May 22, 2017 at 6:01 PM, Robert Metzger 


wrote:


Gordon's PR has been merged. I forgot one blocking issue. Till

created a

PR for it: https://issues.apache.org/jira/browse/FLINK-6328
Once travis has passed, I'll merge that one and then do the RC.

On Mon, May 22, 2017 at 10:36 AM, Robert Metzger 


wrote:


Hi Robert,

I did the following checks and found no issues:

  - Check if checksums and GPG files match the corresponding 
release

files
  - Verify that the source archives do not contain any binaries
  - Check if the source release is building properly with Maven
(including
license header check and checkstyle). Also the tests should be

executed

(mvn clean verify).
  - Check build for custom Hadoop version (2.3.0, 2.4.1, 2.6.3,

2.7.2)

  - Check build for Scala 2.11
  - Check that the README.md file is meaningful

thanks
Xiaowei

On Fri, May 19, 2017 at 6:29 PM, Chesnay Schepler <

ches...@apache.org>

wrote:


Whoops, this is the PR for enabling the test:
https://github.com/apache/flink/pull/3844


On 19.05.2017 12:14, Robert Metzger wrote:


Thank you for all your input.

@Chesnay, in your email you are pointing to the same PR twice:
This PR fixes the compilation on Windows: (reviewed once, most

recent

changes not reviewed)
https://github.com/apache/flink/pull/3854
This PR enables a test for savepoint compatibility: (nice 
to have,

easy to

review)
https://github.com/apache/flink/pull/3854

Also the "should define more than one task slot" thing is not

important

IMO.

I think the "empty path on windows" thing is not a release

blocker.

--

These are the issues mentioned in the thread that are still 
open

and

blockers:
- Add nested serializers to config snapshots of composite

serializers:

https://github.com/apache/flink/pull/3937 has no review yet
- FLINK-6610  WebServer could not be created, when set the "jobmanager.web.submit.enable" to false
- FLINK-6629  ClusterClient cannot submit jobs to HA cluster if address not set in configuration
configuration



On Fri, May 19, 2017 at 12:17 AM, Till Rohrmann <

trohrm...@apache.org>

wrote:

I might have found another blocker:

https://issues.apache.org/jira/browse/FLINK-6629.

The issue is that the ClusterClient only allows to submit 
jobs to

an HA

cluster if you have specified the JobManager's address in the
flink-conf.yaml or via the command line options. If no 
address is

set,

then
it fails completely. If the wrong address is set, which can

easily

happen

in an HA setting, then we are not able to find the proper

connecting

address for the ActorSystem. This basically voids Flink's HA
capabilities.

Cheers,
Till

On Thu, May 18, 2017 at 10:23 PM, Chesnay Schepler <


Re: [VOTE] Release Apache Flink 1.3.0 (RC3)

2017-05-31 Thread Timo Walther
I don't think that FLINK-6780 is a blocker, because the Table API is 
still a new feature. FLINK-6736 was also a hard bug. However, if there 
will be an RC4, a fix should be included.


Regards,
Timo


Am 31.05.17 um 02:55 schrieb Haohui Mai:

Hi,

We have discovered https://issues.apache.org/jira/browse/FLINK-6780 which
effectively makes external catalogs in the table API very difficult to use.

It may not be a show stopper but in my opinion it is worth a fix before the
release.

Regards,
Haohui

On Tue, May 30, 2017 at 11:22 AM Till Rohrmann  wrote:


Just some thoughts concerning the cons for cancelling RC3:

- Technically, the release is already delayed since the official release
date was the 26th of May
- Not sure whether it's a good argument to defer fixing major bugs because
they have not been introduced with 1.3.0. It's actually alarming that these
things have not been found earlier given that we test our releases
thoroughly.
- The shared serializer surfaced in the form of a cryptic
ArrayIndexOutOfBoundsException. Only if you realize that this is related to
a shared StateDescriptor you can look for a workaround. It took me 2h to
realize that.

Cheers,
Till

On Tue, May 30, 2017 at 7:02 PM, Robert Metzger 
wrote:


The vote time is over, but I'll keep it open for a bit longer until we've
decided regarding Till's issue.

On Tue, May 30, 2017 at 6:10 PM, Robert Metzger 
wrote:


Hi Till,
good catch! That is definitively a severe issue. Probably it didn't
surface yet, because
a) the code example in the documentation is using a new instance for

each

state descriptor
b) people are using stateless serializers?
c) don't have the same state descriptor on the same machine

I see two options how to handle the situation
1) Cancel RC3 and do another vote (potentially with a 24 hrs vote time)
2) Release RC3 as 1.3.0 and start the vote for 1.3.1 right afterwards.


+ Pros and - cons for cancelling RC3
- The release would be delayed (not sure who's expecting the 1.3.0 to

be

available on time)
- The bug has been there since many releases, probably no user is

affected

and it was not introduced during the rel 1.3.0 cycle.
- There is a workaround for the issue
+ We would have a better feeling for the 1.3.0 release because there

are

no known critical issues.

+ pro and - cons for releasing RC3:
+ there are some other "minor" issues that showed up during the 1.3.0
testing that could go into 1.3.1 (FLINK-6763, FLINK-6764) without too much
time-pressure (I'm happy to manage the 1.3.1 release and start it

tomorrow)


I'm undecided between both options and more than happy to hear your
opinion.



On Tue, May 30, 2017 at 4:18 PM, Till Rohrmann 
wrote:


I might have found a blocking issue [1]. The problem is that a
StateDescriptor cannot be shared by multiple subtasks because they

don't

duplicate their serializer. As a consequence, things break if you

have a

stateful serializer. The problem exists since 1.0. However, given that
this
issue is really hard to debug for the user and one can easily fall

into

this trap, I would like to fix it for the release.

[1] https://issues.apache.org/jira/browse/FLINK-6775

Cheers,
Till

On Tue, May 30, 2017 at 4:01 PM, Greg Hogan 

wrote:

+1 (binding)

- verified source and binary signatures
- verified source and binary checksums
- verified LICENSEs
- verified NOTICEs
- built from source

Greg



On May 26, 2017, at 12:58 PM, Robert Metzger 

(*http://git-wip-us.apache.org/repos/asf/flink/commit/760eea8a
*)

Branch:
release-1.3.0-rc3

The release artifacts to be voted on can be found at:
http://people.apache.org/~rmetzger/flink-1.3.0-rc3


The release artifacts are signed with the key with fingerprint

D9839159:

http://www.apache.org/dist/flink/KEYS

The staging repository for this release can be found at:
*https://repository.apache.org/content/repositories/orgapach

eflink-1122



Re: [VOTE] Release Apache Flink 1.3.0 (RC3)

2017-05-31 Thread Timo Walther
We should also include FLINK-6783. It seems that 
WindowedStream::aggregate is broken right now.



Am 31.05.17 um 14:31 schrieb Timo Walther:

I merged all Table API related PRs.

I'm also fine with a 1.3.1 release this or next week.


Am 31.05.17 um 14:08 schrieb Till Rohrmann:

I would be ok to quickly release 1.3.1 once the respective PRs have
been merged.

Just for your information, I'm not yet through with the testing of 
the type

serializer upgrade feature, though.

Cheers,
Till

On Wed, May 31, 2017 at 12:14 PM, Stefan Richter <
s.rich...@data-artisans.com> wrote:


+1 for releasing now and providing a 1.3.1 release soon.


Am 31.05.2017 um 11:02 schrieb Gyula Fóra <gyula.f...@gmail.com>:

Hi All,

I also lean towards getting the release out as soon as possible given

that

it had been delayed quite a bit and there is no major issue without a
straightforward workaround (agreeing with Nico and Kostas). I am sure

once

people will start using the new features we will see more issues that
should be fixed asap in 1.3.1.

Regarding the critical bug Till had found, we could add a line 
about it

to

the release notes so that people don't get blocked by it as there is a
workaround possible.

Regards,
Gyula


Kostas Kloudas <k.klou...@data-artisans.com> ezt írta (időpont: 2017.

máj.

31., Sze, 10:53):


Hi all,

I also tend to agree with the argument that says a release should 
be out
as soon as possible, given that 1) it improves 
usability/functionality

and
2) at a minimum, it does not include new known bugs. The arguments 
are

more or less aligned with Nico’s response on the matter.

Focusing on the bug that spiked the current discussion, I agree with

Till
that this is alarming, as it passed all previous testing efforts, 
but I

have to
add that if nobody so far encountered it, we could release 1.3 now 
and

fix

it in the upcoming 1.3.1.

Kostas


On May 31, 2017, at 10:20 AM, Nico Kruber <n...@data-artisans.com>

wrote:
IMHO, any release that improves things and does not break 
anything is

worth

releasing and should not be blocked on bugs that it did not cause.
There will always be a next (minor/major) release that may fix 
this at

a

later

time, given that the time between releases is not too high.

Consider someone waiting for a bugfix/feature that made it into 
1.3.0

who--if
delayed--would have to wait even longer for "his" bugfix/feature. 
Any

new

bugfixes (and there will always be more) can wait a few more days or

even a few

weeks and may be fixed in 1.3.1 or so.


Nico

On Tuesday, 30 May 2017 20:21:41 CEST Till Rohrmann wrote:

- Not sure whether it's a good argument to defer fixing major bugs

because
they have not been introduced with 1.3.0. It's actually alarming 
that

these

things have not been found earlier given that we test our releases
thoroughly.








Re: [VOTE] Release Apache Flink 1.3.0 (RC3)

2017-05-31 Thread Timo Walther
What do you think about waiting with the release announcement for Flink 
1.3.1 until next week?


IMHO the documentation is not in good shape for a release announcement 
right now anyway.


Most of the new features of the Table API are not documented. Docs for 
other features are missing as well or exist in open PR [1].


Regards,
Timo

[1] https://issues.apache.org/jira/browse/FLINK-6674


Am 31.05.17 um 15:03 schrieb Aljoscha Krettek:

Yes, FLINK-6783 might even have been a release blocker…. It’s a new feature 
that simply doesn’t work in most cases.


On 31. May 2017, at 14:51, Timo Walther <twal...@apache.org> wrote:

We should also include FLINK-6783. It seems that WindowedStream::aggregate is 
broken right now.


Am 31.05.17 um 14:31 schrieb Timo Walther:

I merged all Table API related PRs.

I'm also fine with a 1.3.1 release this or next week.


Am 31.05.17 um 14:08 schrieb Till Rohrmann:

I would be ok to quickly release 1.3.1 once the respective PRs have
been merged.

Just for your information, I'm not yet through with the testing of the type
serializer upgrade feature, though.

Cheers,
Till

On Wed, May 31, 2017 at 12:14 PM, Stefan Richter <
s.rich...@data-artisans.com> wrote:


+1 for releasing now and providing a 1.3.1 release soon.


Am 31.05.2017 um 11:02 schrieb Gyula Fóra <gyula.f...@gmail.com>:

Hi All,

I also lean towards getting the release out as soon as possible given

that

it had been delayed quite a bit and there is no major issue without a
straightforward workaround (agreeing with Nico and Kostas). I am sure

once

people will start using the new features we will see more issues that
should be fixed asap in 1.3.1.

Regarding the critical bug Till had found, we could add a line about it

to

the release notes so that people don't get blocked by it as there is a
workaround possible.

Regards,
Gyula


Kostas Kloudas <k.klou...@data-artisans.com> ezt írta (időpont: 2017.

máj.

31., Sze, 10:53):


Hi all,

I also tend to agree with the argument that says a release should be out
as soon as possible, given that 1) it improves usability/functionality

and

2) at a minimum, it does not include new known bugs. The arguments are
more or less aligned with Nico’s response on the matter.

Focusing on the bug that spiked the current discussion, I agree with

Till

that this is alarming, as it passed all previous testing efforts, but I
have to
add that if nobody so far encountered it, we could release 1.3 now and

fix

it in the upcoming 1.3.1.

Kostas


On May 31, 2017, at 10:20 AM, Nico Kruber <n...@data-artisans.com>

wrote:

IMHO, any release that improves things and does not break anything is

worth

releasing and should not be blocked on bugs that it did not cause.
There will always be a next (minor/major) release that may fix this at

a

later

time, given that the time between releases is not too high.

Consider someone waiting for a bugfix/feature that made it into 1.3.0

who--if

delayed--would have to wait even longer for "his" bugfix/feature. Any

new

bugfixes (and there will always be more) can wait a few more days or

even a few

weeks and may be fixed in 1.3.1 or so.


Nico

On Tuesday, 30 May 2017 20:21:41 CEST Till Rohrmann wrote:

- Not sure whether it's a good argument to defer fixing major bugs

because

they have not been introduced with 1.3.0. It's actually alarming that

these

things have not been found earlier given that we test our releases
thoroughly.





Re: [VOTE] Release Apache Flink 1.3.0 (RC3)

2017-05-31 Thread Timo Walther

I merged all Table API related PRs.

I'm also fine with a 1.3.1 release this or next week.


Am 31.05.17 um 14:08 schrieb Till Rohrmann:

I would be ok to quickly release 1.3.1 once the respective PRs have
been merged.

Just for your information, I'm not yet through with the testing of the type
serializer upgrade feature, though.

Cheers,
Till

On Wed, May 31, 2017 at 12:14 PM, Stefan Richter <
s.rich...@data-artisans.com> wrote:


+1 for releasing now and providing a 1.3.1 release soon.


Am 31.05.2017 um 11:02 schrieb Gyula Fóra :

Hi All,

I also lean towards getting the release out as soon as possible given

that

it had been delayed quite a bit and there is no major issue without a
straightforward workaround (agreeing with Nico and Kostas). I am sure

once

people will start using the new features we will see more issues that
should be fixed asap in 1.3.1.

Regarding the critical bug Till had found, we could add a line about it

to

the release notes so that people don't get blocked by it as there is a
workaround possible.

Regards,
Gyula


Kostas Kloudas  ezt írta (időpont: 2017.

máj.

31., Sze, 10:53):


Hi all,

I also tend to agree with the argument that says a release should be out
as soon as possible, given that 1) it improves usability/functionality

and

2) at a minimum, it does not include new known bugs. The arguments are
more or less aligned with Nico’s response on the matter.

Focusing on the bug that spiked the current discussion, I agree with

Till

that this is alarming, as it passed all previous testing efforts, but I
have to
add that if nobody so far encountered it, we could release 1.3 now and

fix

it in the upcoming 1.3.1.

Kostas


On May 31, 2017, at 10:20 AM, Nico Kruber 

wrote:

IMHO, any release that improves things and does not break anything is

worth

releasing and should not be blocked on bugs that it did not cause.
There will always be a next (minor/major) release that may fix this at

a

later

time, given that the time between releases is not too high.

Consider someone waiting for a bugfix/feature that made it into 1.3.0

who--if

delayed--would have to wait even longer for "his" bugfix/feature. Any

new

bugfixes (and there will always be more) can wait a few more days or

even a few

weeks and may be fixed in 1.3.1 or so.


Nico

On Tuesday, 30 May 2017 20:21:41 CEST Till Rohrmann wrote:

- Not sure whether it's a good argument to defer fixing major bugs

because

they have not been introduced with 1.3.0. It's actually alarming that

these

things have not been found earlier given that we test our releases
thoroughly.








Re: flink-table sql overlaps time.scala

2017-05-31 Thread Timo Walther

Hi Stefano,

I implemented the overlap according to Calcite's implementation. Maybe 
they changed the behavior in the meantime. I agree we should try to 
stay in sync with Calcite. What do other DB vendors do? Feel free to 
open an issue about this.
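
For illustration, a small self-contained Java sketch of the two readings being
discussed (this is neither Flink nor Calcite code; the predicate names and
values are illustrative only):

public class OverlapsSemanticsSketch {

    // Reading matching the Flink test quoted below: intervals that merely touch do NOT overlap.
    static boolean overlapsExclusive(long s1, long e1, long s2, long e2) {
        return s1 < e2 && s2 < e1;
    }

    // Reading matching the Calcite master expression quoted below: touching endpoints DO overlap.
    static boolean overlapsInclusive(long s1, long e1, long s2, long e2) {
        return s1 <= e2 && s2 <= e1;
    }

    public static void main(String[] args) {
        // (05:02:02, 05:02:02) vs (05:02:01, 05:02:02), expressed as seconds since 05:02:00
        long s1 = 2, e1 = 2, s2 = 1, e2 = 2;
        System.out.println(overlapsExclusive(s1, e1, s2, e2)); // false -> current Flink test expectation
        System.out.println(overlapsInclusive(s1, e1, s2, e2)); // true  -> behavior seen on Calcite master
    }
}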


Regards,
Timo


Am 30.05.17 um 14:24 schrieb Stefano Bortoli:

Hi all,

I am playing around with the table API, and I have a doubt about temporal 
operator OVERLAPS. In particular, the test 
scalarFunctionsTest.testOverlaps expects false for the following intervals:
testAllApis(
   temporalOverlaps("2011-03-10 05:02:02".toTimestamp, 0.second,
 "2011-03-10 05:02:02".toTimestamp, "2011-03-10 05:02:01".toTimestamp),
   "temporalOverlaps(toTimestamp('2011-03-10 05:02:02'), 0.second, " +
 "'2011-03-10 05:02:02'.toTimestamp, '2011-03-10 
05:02:01'.toTimestamp)",
   "(TIMESTAMP '2011-03-10 05:02:02', INTERVAL '0' SECOND) OVERLAPS " +
 "(TIMESTAMP '2011-03-10 05:02:02', TIMESTAMP '2011-03-10 05:02:01')",
   "false")

Basically, the compared intervals overlap just at one of the extremes. The 
interpretation of the time.scala implementation is:
AND(
  >=(DATETIME_PLUS(CAST('2011-03-10 05:02:02'):TIMESTAMP(3) NOT NULL, 0),
     CAST('2011-03-10 05:02:02'):TIMESTAMP(3) NOT NULL),
  >=(CAST('2011-03-10 05:02:01'):TIMESTAMP(3) NOT NULL,
     CAST('2011-03-10 05:02:02'):TIMESTAMP(3) NOT NULL)
),

Where the result is false as the second clause is not satisfied.

However, latest calcite master compiles the overlaps as follows:
[AND(
  >=(
    CASE(<=(2011-03-10 05:02:02, DATETIME_PLUS(2011-03-10 05:02:02, 0)),
         DATETIME_PLUS(2011-03-10 05:02:02, 0),
         2011-03-10 05:02:02),
    CASE(<=(2011-03-10 05:02:02, 2011-03-10 05:02:01),
         2011-03-10 05:02:02,
         2011-03-10 05:02:01)
  ),
  >=(
    CASE(<=(2011-03-10 05:02:02, 2011-03-10 05:02:01),
         2011-03-10 05:02:01,
         2011-03-10 05:02:02),
    CASE(<=(2011-03-10 05:02:02, DATETIME_PLUS(2011-03-10 05:02:02, 0)),
         2011-03-10 05:02:02,
         DATETIME_PLUS(2011-03-10 05:02:02, 0))
  )
)]

Where the result is true.

I believe the issue is about interpreting the extremes as part of the 
overlapping intervals or not. Flink does not consider the intervals as 
overlapping (as the test shows), whereas Calcite implements the test including 
them.

Which one should be preserved?

I think that Calcite's implementation is correct, and overlapping extremes should 
be considered. What do you think?

Best,
Stefano





Fwd: Re: Flink Watermark and timing

2017-10-04 Thread Timo Walther

Hi Björn,

the behavior of borderlines is defined clearly by the API: "start 
timestamp (inclusive) and an end timestamp (exclusive)". So it is always 
[0-4] [5-9]. You could increase the interval by one millisecond to 
include 5.
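
A minimal sketch of the boundary arithmetic (plain Java; for non-negative
timestamps this is the same start computation Flink uses for tumbling windows
without an offset):

public class WindowBoundarySketch {

    static long windowStart(long timestamp, long windowSize) {
        return timestamp - (timestamp % windowSize);
    }

    public static void main(String[] args) {
        long size = 5;
        for (long ts : new long[] {0, 4, 5, 9, 10}) {
            long start = windowStart(ts, size);
            // The start is inclusive, the end (start + size) is exclusive.
            System.out.printf("timestamp %d -> window [%d, %d)%n", ts, start, start + size);
        }
        // A timestamp of exactly 5 lands in [5, 10), not in [0, 5).
    }
}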



Regards,

Timo



 Forwarded Message 
Subject:	Re: Flink Watermark and timing
Date:  Tue, 3 Oct 2017 06:37:13 +0200
From:	Björn Zachrisson <bjo...@gmail.com>
To:	Timo Walther <twal...@apache.org>



Hi Timo,

One more question regarding that, to clarify:
Where do I specify in which window an event that arrives exactly on the 
window borderline goes? Window sizes are [0-5] [5-10] and the event arrives at 
exactly 5.

Where should the event go, and can I control this?

Regards
Björn

2017-10-02 19:28 GMT+02:00 Timo Walther <twal...@apache.org>:


   Hi Björn,


   I don't know if I get your example correctly, but I think your
   explanation "All events up to and equal to watermark should be
   handled in the prevoius window" is not 100% correct. Watermarks just
   indicate the progress ("until here we have seen all events with
   lower timestamp than X") and trigger the evaluation of a window. The
   assignment of events to windows is based on the timestamp not the
   watermark. The documentation will be improved for the upcoming release:

   
https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/operators/windows.html#window-assigners
   

   "Time-based windows have a start timestamp (inclusive) and an end
   timestamp (exclusive) that together describe the size of the window. "

   I hope this helps.

   Regards,
   Timo


   Am 10/2/17 um 1:06 PM schrieb Björn Zachrisson:

Hi,

I have a question regarding timing of events.

According to;

https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/event_time.html#event-time-and-watermarks


All events up to and equal to watermark should be handled in "the
prevoius window".

In my case I use event-timestamp.


I'm testing the timing out.

The case is events from 2000-01-01 02:00:00 up to 2000-01-01
02:20:00, where each event is 2 minutes apart. I try to group the
events in 5-minute windows:

2000-01-01 02:00:00 => 2000-01-01 02:05:00
2000-01-01 02:05:00 => 2000-01-01 02:10:00
2000-01-01 02:10:00 => 2000-01-01 02:15:00
2000-01-01 02:15:00 => 2000-01-01 02:20:00

However, the event at the exact time 02:10:00 (94669260) is put
in the window "2000-01-01 02:10:00 => 2000-01-01 02:15:00", which
does not match what I can read on the wiki.

This is the exact result:
2000-01-01 02:00:00, 94669200
2000-01-01 02:02:00, 94669212
2000-01-01 02:04:00, 94669224

2000-01-01 02:06:00, 94669236
2000-01-01 02:08:00, 94669248

2000-01-01 02:10:00, 94669260
2000-01-01 02:12:00, 94669272
2000-01-01 02:14:00, 94669284

2000-01-01 02:16:00, 94669296
2000-01-01 02:18:00, 94669308

2000-01-01 02:20:00, 94669320

Is this because I'm using the event-time extractor, or what might
be the cause?

Regards
Björn







Re: Metrics to data connectors

2017-10-02 Thread Timo Walther

@Chesnay do you know more about it?
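
For reference, a minimal sketch of how a sink can register its own metrics via
the runtime context (class and metric names are illustrative, not the actual
Cassandra connector code):

import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

public class InstrumentedSink extends RichSinkFunction<String> {

    private transient Counter recordsWritten;

    @Override
    public void open(Configuration parameters) {
        // Registers a counter under the operator's metric group.
        recordsWritten = getRuntimeContext().getMetricGroup().counter("recordsWritten");
    }

    @Override
    public void invoke(String value) {
        // ... write 'value' to the external system here ...
        recordsWritten.inc();
    }
}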


Am 10/1/17 um 4:45 PM schrieb Michael Fong:

Hi, all,


Are there any existing metrics implemented in any source/sink connectors? I am
thinking of adding some metrics to the C* connector that might help to give an
early sign of scalability problems (like back pressure) from the data source /
sink. I have not yet found any documentation regarding adding connector-specific
metrics, though. Thanks.

Regards,

Michael





Re: partial upgrade

2017-10-02 Thread Timo Walther

Hi Chen,

I think in a long-term perspective it makes sense to support things like 
this. The next big step is dynamic scaling without stopping the 
execution. Partial upgrades could be addressed afterwards, but I'm not 
aware of any plans.


Until then, I would recommend a different architecture: use connect() 
and stream in new logic dynamically. This is especially interesting for 
ML models etc.
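
A minimal sketch of that pattern (the "logic" here is just a multiplier carried
on a control stream, purely illustrative; for fault tolerance it should live in
managed state):

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction;
import org.apache.flink.util.Collector;

public class DynamicLogicSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Long> events = env.generateSequence(1, 1000);
        DataStream<Integer> control = env.fromElements(2, 3); // stand-in for model/rule updates

        events.connect(control)
              .flatMap(new CoFlatMapFunction<Long, Integer, Long>() {
                  private int multiplier = 1;

                  @Override
                  public void flatMap1(Long value, Collector<Long> out) {
                      out.collect(value * multiplier); // apply the current logic
                  }

                  @Override
                  public void flatMap2(Integer newMultiplier, Collector<Long> out) {
                      multiplier = newMultiplier; // "upgrade" the logic without restarting the job
                  }
              })
              .print();

        env.execute("connect() control stream sketch");
    }
}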


Regards,
Timo


Am 10/1/17 um 3:03 AM schrieb Chen Qin:

Hi there,

So far, a Flink job is interpreted and deployed during the bootstrap phase. Once
the pipeline runs, it's very hard to do a partial upgrade without stopping
execution (and a savepoint is heavy). Is there any plan to allow uploading an
annotated jar package which hints which stream task implementations CAN BE
partially upgraded after the next checkpoint succeeds, without worrying about backfill?


Thanks,
Chen





[DISCUSS] FLIP-24 - SQL Client

2017-12-19 Thread Timo Walther

Hey everyone,

in the past, the community already asked about having a way to write 
Flink jobs without extensive programming skills. During the last year we 
have put a lot of effort in the core of our Table & SQL API. Now it is 
time to improve the tooling around it as well and make Flink more 
accessible. I already opened an issue for adding a SQL CLI client [0]. 
We developed a small prototype at data Artisans that was shown at the 
last Flink Forward Berlin [1].


For Flink 1.5 it would be great to offer at least a CLI client to play 
around with Flink and use it for debugging/demo purposes. We created a 
FLIP-24 [2] that roughly sketches the functionality and architecture. We 
also show how this architecture can evolve from a CLI client to a SQL 
gateway/REST server. Most of it is still up for dicussion. The targeted 
minimum viable product for Flink 1.5 should act as a start for 
collecting feedback and attracting contributors.


So feedback on this FLIP is very welcome. What do you think?

Regards,
Timo


[0] https://issues.apache.org/jira/browse/FLINK-7594
[1] 
https://berlin-2017.flink-forward.org/kb_sessions/from-streams-to-tables-and-back-again-a-demo-of-flinks-table-sql-api/

[2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client



Re: [VOTE] Release 1.4.0, release candidate #3

2017-12-11 Thread Timo Walther

+1 (binding)

- build the source locally
- run various table programs
- checked the resource consumption of table programs with retention 
enabled and disabled

- built a quickstart project
- tested the web ui submission (found 
https://issues.apache.org/jira/browse/FLINK-8187 but this is non-blocking)



Am 12/11/17 um 2:16 PM schrieb Chesnay Schepler:

+1 (binding)

- checked contents of flink-dist for unshaded dependencies
- ran python examples (with/-out arguments) locally
- ran jobs on yarn on a cluster testing optional hadoop dependency
- verified that quickstarts work
- checked JM/TM logs for anything suspicious

On 11.12.2017 11:29, Fabian Hueske wrote:

+1 (binding)

- Checked hashes & signatures
- Checked no binaries in source release
- Checked Flink version in Quickstart pom files

Cheers, Fabian

2017-12-11 11:26 GMT+01:00 Stefan Richter :


+1 (non-binding)

- did extensive cluster tests on Google Cloud with special focus on
checkpointing and recovery and Kafka 0.11 end-to-end exactly-once +
at-least-once.
- build from source.

Am 11.12.2017 um 09:53 schrieb Piotr Nowojski 
:


Hi,

+1 (non-binding)

I have:
- verified Scala and Java sample projects are creating and working

properly and that Quickstart docs are ok
- verified that ChildFirstClassloader allows user to run his 
application

with some custom akka version

- tested Kafka 0.11 end to end exactly once
- did some manual checks whether docs/distribution files are ok

Piotrek


On 8 Dec 2017, at 16:49, Stephan Ewen  wrote:

@Eron Given that this is actually an undocumented "internal" 
feature at
this point, I would not expect that it is used heavily beyond 
Pravega.


Unless you feel strongly that this is a major issue, I would go ahead

with

the release...

On Fri, Dec 8, 2017 at 3:18 PM, Aljoscha Krettek 


wrote:


Thanks for the update! I would also say it's not a blocker but we

should

make sure that we don't break this after 1.4, then.


On 7. Dec 2017, at 22:37, Eron Wright  wrote:

Just discovered:  the removal of Flink's Future (FLINK-7252) 
causes a

breaking change in connectors that use
`org.apache.flink.runtime.checkpoint.MasterTriggerRestoreHook`,

because

`Future` is a type on one of the methods.

To my knowledge, this affects only the Pravega connector.  
Curious to

know

whether any other connectors are affected.  I don't think we (Dell

EMC)

consider it a blocker but it will mean that the connector is Flink

1.4+.

Eron


On Thu, Dec 7, 2017 at 12:25 PM, Aljoscha Krettek <

aljos...@apache.org>

wrote:


I just noticed that I did a copy-and-paste error and the last

paragraph

about voting period should be this:

The vote will be open for at least 72 hours. It is adopted by

majority

approval, with at least 3 PMC affirmative votes.

Best,
Aljoscha

On 7. Dec 2017, at 19:24, Bowen Li  
wrote:


I agree that it shouldn't block the release. The doc website 
part is

even

better!

On Thu, Dec 7, 2017 at 1:09 AM, Aljoscha Krettek <

aljos...@apache.org>

wrote:


Good catch, yes. This shouldn't block the release, though, since

the

doc
is always built form the latest state of a release branch, 
i.e. the

1.4

doc

on the website will update as soon as the doc on the release-1.4

branch

is

updated.


On 6. Dec 2017, at 20:47, Bowen Li 

wrote:

Hi Aljoscha,

I found Flink's State doc and javaDoc are very ambiguous on 
what

the

replacement of FoldingState is, which will confuse a lot of

users. We

need

to fix it in 1.4 release.

I have submitted a PR at https://github.com/apache/

flink/pull/5129

Thanks,
Bowen


On Wed, Dec 6, 2017 at 5:56 AM, Aljoscha Krettek <

aljos...@apache.org>

wrote:


Hi everyone,

Please review and vote on release candidate #3 for the version 1.4.0, as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release and binary convenience releases to be deployed to dist.apache.org [2], which are signed with the key with fingerprint F2A67A8047499BBB3908D17AA8F4FD97121D7293 [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "release-1.4.0-rc1" [5],
* website pull request listing the new release [6].

Please have a careful look at the website PR because I changed some wording and we're now also releasing a binary without Hadoop dependencies.

Please use this document for coordinating testing efforts: [7]

The only change between RC1 and this RC2 is that the source release package does not include the erroneously included binary Ruby dependencies of the documentation anymore. Because of this I would like to propose a shorter voting time and close the vote around the time that RC1 would

Re: Using SQL with dynamic tables where rows are updated

2017-12-20 Thread Timo Walther

Hi Ghassan,

In your example the result 3.5 is correct. The query is executed with standard SQL semantics. You only group by ProductID and, since it is the same for all four approved rows, the average is (1 + 4 + 6 + 3) / 4 = 3.5.


The second "review-3" does not replace anything. In general, the 
replacement would happen in the TableSink. The dynamic table performs 
view maintenance. The TableSink materializes the result to some 
key-value store or database.


It might be worth looking into TableSinks [0] and the JavaDocs of the mentioned classes.


Feel free to ask further questions if necessary.

Regards,
Timo

[0] 
https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/table/sourceSinks.html#define-a-tablesink




Am 12/19/17 um 9:08 PM schrieb Ghassan Yammine:

Hello,

I'm new to Flink and I need some help. I'd like to use the SQL API for processing an incoming stream that has the following characteristics:

   *   Each stream record has a key
   *   The record can be updated
   *   The record is of the form: reviewId -> (productId, rating)

For the above stream, I want to compute the average rating for each product ID. The key is the reviewId.
With the SQL API, I get incorrect results. However, I've been able to make it work through the use of RichFlatMapFunction and the DataStream API.

Below is the entire code listing, which does not work. I know I'm missing the definition/use of a primary key so that an update on the same key can occur. However, I'm not sure how to go about doing this. Any help/comments are welcome.

Thank you,

Ghassan


package com.bazaarvoice.flink_poc

import com.bazaarvoice.flink_poc.flink_poc.{ProductId, ReviewId}
import org.apache.flink.api.common.time.Time
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala.{DataStream, createTypeInformation, _}
import org.apache.flink.table.api.{StreamQueryConfig, TableEnvironment}
import org.apache.flink.table.api.scala._

package object flink_poc {
  type ProductId = String
  type ReviewId = String
}

case class SummaryReview(reviewId: ReviewId, productId: ProductId, approved: Boolean, rating: Double) extends Serializable {
  override def toString: String = {
    s"$reviewId,  $productId,  ${if (approved) "APPROVED" else "REJECTED"},  $rating"
  }
}

object AverageRatingWithSQL {

  def main(args: Array[String]) {

    val events = List(
      SummaryReview("review-1", "product-100", approved = true, 1),
      SummaryReview("review-2", "product-100", approved = true, 4),
      SummaryReview("review-3", "product-100", approved = true, 6),
      SummaryReview("review-3", "product-100", approved = true, 3) // <-- this should override the previous record
    ).toSeq
    // Average rating should be equal to (1+4+3)/3 = 2.67

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime)

    val inputStream: DataStream[SummaryReview] = env.fromCollection(events)

    val tableEnv = TableEnvironment.getTableEnvironment(env)

    tableEnv.registerDataStream("Reviews", inputStream, 'ReviewID, 'ProductID, 'Approved, 'Rating)

    val resultTable = tableEnv.sql(
      "SELECT ProductID, AVG(Rating) FROM Reviews WHERE Approved = true GROUP BY ProductID"
    )

    val typeInfo = createTypeInformation[(ProductId, Double)]
    val outStream = resultTable.toRetractStream(typeInfo)

    outStream.print()

    env.execute("Flink SQL Average rating")

  }
}
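
To make the TableSink point above concrete, here is a minimal, hand-written sketch (my own illustration, not an official Flink sink) of how the retract stream could be materialized: toRetractStream emits (flag, row) messages, and a downstream sink applies them to a keyed view so that only the latest average per ProductID survives. The class name and the Java Tuple2 layout are assumptions for illustration; the Scala conversion above produces Scala tuples, but the add/retract logic is identical.

import java.util.HashMap;
import java.util.Map;

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

// Hypothetical sink: keeps an in-memory "materialized view" keyed by ProductID.
public class UpsertingPrintSink
        extends RichSinkFunction<Tuple2<Boolean, Tuple2<String, Double>>> {

    private final Map<String, Double> currentView = new HashMap<>();

    @Override
    public void invoke(Tuple2<Boolean, Tuple2<String, Double>> value) throws Exception {
        boolean isAccumulate = value.f0;           // true = add row, false = retract row
        String productId = value.f1.f0;
        double avgRating = value.f1.f1;

        if (isAccumulate) {
            currentView.put(productId, avgRating); // insert or replace the current average
        } else {
            currentView.remove(productId);         // undo the previously emitted average
        }
        System.out.println("current view: " + currentView);
    }
}

In a real job the map would be an external key-value store or database; that is exactly the materialization step described in the TableSink documentation referenced above.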








Re: Support more intelligent function lookup in FunctionCatalog for UDF

2018-05-14 Thread Timo Walther

Hi Rong,

Yes, I think we can improve the type inference at this point. Input parameter type inference can be more tolerant, but return types should be as exact as possible.


The change should only touch ScalarSqlFunction and 
UserDefinedFunctionUtils#createEvalOperandTypeInference, right?


Regards,
Timo


Am 14.05.18 um 11:52 schrieb Fabian Hueske:

Hi Rong,

I didn't look into the details of the example that you provided, but I 
think if we can improve the internal type resolution of scalar UDFs we 
should definitely go for it.
There is quite a bit of information available such as the signatures 
of the eval() methods but also the argument types provided by 
Calcite's analyzer.

Not sure if we leverage all that information to the full extent.
The ScalarFunction interface also provides methods to override some of 
the type extraction behavior.


@Timo, what do you think?

Best,
Fabian




2018-05-04 20:09 GMT+02:00 Rong Rong >:


Hi,

We have been looking into more intelligent UDF supports such as
creating a
better type inference module to infer automatically composite data
types[1].

One of the most common pain points we have involves use cases where users
would like to re-use a rather generic UDF, for example:

public List eval(Map<String, ?> myMap) {
  return new ArrayList<>(myMap.keySet());
}

In this case, since we are only interested in the key sets of the map,
the value type cannot be easily resolved or overridden using concrete types.
Eventually we end up overriding the exact same function with multiple case
classes, so that each one uses a different ValueTypeInfo.

This is rather inefficient in terms of the user development cycle. I was
wondering if there's a better way in the FunctionCatalog lookup to match a UDF
in context.

Best,
Rong

[1] https://issues.apache.org/jira/browse/FLINK-9294
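
For readers following the thread: below is a small, hypothetical sketch of the existing type-extraction hooks on ScalarFunction (getParameterTypes/getResultType) that Fabian alludes to, assuming the Types helper from flink-core. It shows today's workaround, and also its limitation: the map's value type still has to be pinned to one concrete choice, which is precisely what this proposal wants to avoid.

import java.util.Map;

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.table.functions.ScalarFunction;

// Hypothetical UDF, for illustration of the override hooks only.
public class MapKeysFunction extends ScalarFunction {

    // Only the key set is used; the value type is irrelevant to the logic.
    public String[] eval(Map<String, String> myMap) {
        return myMap.keySet().toArray(new String[0]);
    }

    // Pin the operand type explicitly instead of relying on reflective extraction.
    @Override
    public TypeInformation<?>[] getParameterTypes(Class<?>[] signature) {
        return new TypeInformation<?>[] {Types.MAP(Types.STRING, Types.STRING)};
    }

    // Return types should stay exact, so pin the result type as well.
    @Override
    public TypeInformation<?> getResultType(Class<?>[] signature) {
        return Types.OBJECT_ARRAY(Types.STRING);
    }
}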







Re: Support more intelligent function lookup in FunctionCatalog for UDF

2018-05-15 Thread Timo Walther
I added some comments to your documents. I think we should work on these 
limitations step by step. A first step could be to support Map<String, 
?> by considering only the raw types. Another step would be to allow 
eval(Object) as a wild card for operands.


Regards,
Timo


Am 14.05.18 um 18:23 schrieb Rong Rong:

Thanks for the reply Timo / Fabian,

Yes that's what I had in mind. ParameterType can be vague but return type
has to be exact.
I can imagine that, depending on the input parameter type, the output type can be different. But I cannot think of a concrete use case as of now.

I actually created a doc [1] regarding the use cases we currently have, and
some very preliminary solution possibilities.

Please kindly take a look when you have time, any comments and suggestions
are highly appreciated.

--
Rong

[1]
https://docs.google.com/document/d/1zKSY1z0lvtQdfOgwcLnCMSRHew3weeJ6QfQjSD0zWas/edit?usp=sharing

On Mon, May 14, 2018 at 4:36 AM, Timo Walther <twal...@apache.org> wrote:


Hi Rong,

yes I think we can improve the type infererence at this point. Input
parameter type inference can be more tolerant but return types should be as
exact as possible.

The change should only touch ScalarSqlFunction and
UserDefinedFunctionUtils#createEvalOperandTypeInference, right?

Regards,
Timo


Am 14.05.18 um 11:52 schrieb Fabian Hueske:


Hi Rong,

I didn't look into the details of the example that you provided, but I
think if we can improve the internal type resolution of scalar UDFs we
should definitely go for it.
There is quite a bit of information available such as the signatures of
the eval() methods but also the argument types provided by Calcite's
analyzer.
Not sure if we leverage all that information to the full extent.
The ScalarFunction interface also provides methods to override some of
the type extraction behavior.

@Timo, what do you think?

Best,
Fabian




2018-05-04 20:09 GMT+02:00 Rong Rong <walter...@gmail.com >:

 Hi,

 We have been looking into more intelligent UDF supports such as
 creating a
 better type inference module to infer automatically composite data
 types[1].

 One of the most common pain points we have involves use cases where users
 would like to re-use a rather generic UDF, for example:

 public List eval(Map<String, ?> myMap) {
   return new ArrayList<>(myMap.keySet());
 }

 In this case, since we are only interested in the key sets of the map,
 the value type cannot be easily resolved or overridden using concrete types.
 Eventually we end up overriding the exact same function with multiple case
 classes, so that each one uses a different ValueTypeInfo.

 This is rather inefficient in terms of the user development cycle. I was
 wondering if there's a better way in the FunctionCatalog lookup to match a UDF
 in context.

 Best,
 Rong

 [1] https://issues.apache.org/jira/browse/FLINK-9294
 <https://issues.apache.org/jira/browse/FLINK-9294>







Re: [DISCUSS] GitBox

2018-05-17 Thread Timo Walther

+1 for using Gitbox

Timo

Am 16.05.18 um 17:43 schrieb Kenneth Knowles:

Actually, GitHub has a feature so you do not require picture-perfect
commits:
https://help.github.com/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork/

If the owner of the PR checks the box, it will give committers write access
to their branch (on their fork). A nice bonus is you can make the changes
and then continue the review, too.

Kenn

On Wed, May 16, 2018 at 8:31 AM Stefan Richter wrote:


+1


Am 16.05.2018 um 12:40 schrieb Chesnay Schepler :

Hello,

during the discussion about how to better manage pull requests [1] the topic of GitBox integration came up again.

This seems like a good opportunity to restart this discussion that we had about a year ago [2].

* What is GitBox

   Essentially, GitBox allows us to use GitHub features.
   We can decide for ourselves which features we want enabled.

   We could merge PRs directly on GitHub at the button of a click.
   That said the merge functionality is fairly limited and would
   require picture-perfect commits in the pull requests.
   Commits can be squashed, but you cannot amend commits in any way, be
   it fixing typos or changing the commit message. Realistically this
   limits how much we can use this feature, and it may lead to a
   decline in the quality of commit messages.

   Labels can be useful for the management of PRs as (ready for review,
   delayed for next release, waiting for changes). This is really what
   I'm personally most interested in.

   We've been using GitBox for flink-shaded for a while now and i
   didn't run into any issue. AFAIK GitBox is also the default for new
   projects.

* What this means for committers:

   The apache git remote URL will change, which will require all
   committers to update their git setup.
   This also implies that we may have to update the website build scripts.
   The new URL would (probably) be
   https://gitbox.apache.org/repos/asf/flink.git.

   To make use of GitHub features you have to link your GitHub and
   Apache accounts. [3]
   This also requires setting up two-factor authentication on GitHub.

   Update the scm entry in the parent pom.xml.

* What this means for contributors:

   Nothing should change for contributors. Small changes (like typos)
   may be merged more quickly, if the commit message is appropriate, as
   it could be done directly through GitHub.

[1]

http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Closing-automatically-inactive-pull-requests-tt22248.html

[2]

http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-GitBox-td18027.html

[3] https://gitbox.apache.org/setup/






Re: Re: Rewriting a new file instead of writing a ".valid-length" file in BucketSink when restoring

2018-05-15 Thread Timo Walther
As far as I know, the bucketing sink is currenlty also limited by 
relying on Hadoops file system abstraction. It is planned to switch to 
Flink's file system abstraction which might also improve this situation. 
Kostas (in CC) might know more about it.


But I think we can discuss if an other behavior should be configurable 
as well. Would you be willing to contribute?


Regards,
Timo


Am 15.05.18 um 14:01 schrieb Xinyu Zhang:

Thanks for your reply.
Indeed, if a file is very large, it will take a long time. However, the ".valid-length" file is not convenient for others who use the data in HDFS.
Maybe we should provide a configuration for users to choose which strategy they prefer.

Do you have any ideas?


-- Original message --
From: "Timo Walther" <twal...@apache.org>
Date: 2018-05-15 7:30
To: "dev" <dev@flink.apache.org>
Subject: Re: Rewriting a new file instead of writing a ".valid-length" file in BucketSink when restoring


I guess writing a new file would take much longer than just using the
.valid-length file, especially if the files are very large. The
restoring time should be as minimal as possible to ensure little
downtime on restarts.

Regards,
Timo


Am 15.05.18 um 09:31 schrieb Gary Yao:
> Hi,
>
> The BucketingSink truncates the file if the Hadoop FileSystem supports this
> operation (Hadoop 2.7 and above) [1]. What version of Hadoop are you using?
>
> Best,
> Gary
>
> [1]
> https://github.com/apache/flink/blob/bcd028d75b0e5c5c691e24640a2196b2fdaf85e0/flink-connectors/flink-connector-filesystem/src/main/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.java#L301
>
> On Mon, May 14, 2018 at 1:37 PM, 张馨予 <342689...@qq.com> wrote:
>
>> Hi
>>
>> I'm trying to copy data from kafka to HDFS. The data in HDFS is used to
>> do other computations by others in map/reduce.
>> If some tasks failed, the ".valid-length" file is created for the low
>> version hadoop. The problem is other people must know how to deal with the
>> ".valid-length" file, otherwise, the data may be not exactly-once.
>> Hence, why not rewrite a new file when restoring instead of writing a
>> ".valid-length" file. In this way, others who use the data in HDFS don't
>> need to know how to deal with the ".valid-length" file.
>>
>> Thanks!
>>
>> Zhang Xinyu





Re: Rewriting a new file instead of writing a ".valid-length" file in BucketSink when restoring

2018-05-15 Thread Timo Walther
I guess writing a new file would take much longer than just using the 
.valid-length file, especially if the files are very large. The 
restoring time should be as minimal as possible to ensure little 
downtime on restarts.


Regards,
Timo


Am 15.05.18 um 09:31 schrieb Gary Yao:

Hi,

The BucketingSink truncates the file if the Hadoop FileSystem supports this
operation (Hadoop 2.7 and above) [1]. What version of Hadoop are you using?

Best,
Gary

[1]
https://github.com/apache/flink/blob/bcd028d75b0e5c5c691e24640a2196b2fdaf85e0/flink-connectors/flink-connector-filesystem/src/main/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.java#L301

On Mon, May 14, 2018 at 1:37 PM, 张馨予 <342689...@qq.com> wrote:


Hi


I'm trying to copy data from kafka to HDFS . The data in HDFS is used to
do other computations by others in map/reduce.
If some tasks failed, the ".valid-length" file is created for the low
version hadoop. The problem is other people must know how to deal with the
".valid-length" file, otherwise, the data may be not exactly-once.
Hence, why not rewrite a new file when restoring instead of writing a
".valid-length" file. In this way, others who use the data in HDFS don't
need to know how to deal with the ".valid-length" file.


Thanks!


Zhang Xinyu
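
For context on what downstream consumers have to do today, here is a hedged sketch of a Hadoop-side reader that honours the ".valid-length" companion file by reading only the prefix of the part file that the sink declared valid. The file naming (no prefix, the default ".valid-length" suffix), the text encoding of the length, and the use of commons-io's BoundedInputStream are assumptions for illustration; prefixes and suffixes are configurable on the BucketingSink.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.commons.io.input.BoundedInputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ValidLengthAwareReader {

    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path partFile = new Path(args[0]);
        // Companion file written on restore when truncation is not available.
        Path validLengthFile = new Path(partFile.getParent(), partFile.getName() + ".valid-length");

        long validLength = Long.MAX_VALUE;
        if (fs.exists(validLengthFile)) {
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(fs.open(validLengthFile), StandardCharsets.UTF_8))) {
                validLength = Long.parseLong(r.readLine().trim());
            }
        }

        // Read only the bytes the sink has declared valid.
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new BoundedInputStream(fs.open(partFile), validLength), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}

A configuration switch as proposed above would make this extra step unnecessary for plain-text consumers, at the cost of rewriting potentially very large files on restore.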





Re: [DISCUSS] Drop "canEqual" from TypeInformation, TypeSerializer, etc.

2018-05-16 Thread Timo Walther

+1

TypeInformation has too many methods that need to be implemented but 
provide little benefit for Flink.


Am 16.05.18 um 10:55 schrieb Ted Yu:

+1 from me as well.

I checked a few serializer classes. The `equals` method on serializers contains the logic of the `canEqual` method, whose existence seems redundant.

On Wed, May 16, 2018 at 1:49 AM, Tzu-Li (Gordon) Tai 
wrote:


+1.

Looking at the implementations of the `canEqual` method in several
serializers, it seems like all that is done is a check whether the object
is of the same serializer class.
We’ll have to be careful and double check all `equals` method on
serializers that may have relied on the `canEqual` method to perform the
preliminary type check.
Otherwise, this sounds good.

On 16 May 2018 at 4:35:47 PM, Stephan Ewen (se...@apache.org) wrote:

Hi all!

As part of an attempt to simplify some code in the TypeInfo and
TypeSerializer area, I would like to drop the "canEqual" methods for the
following reason:

"canEqual()" is necessary to make proper equality checks across
hierarchies
of types. This is for example useful in a collection API, stating for
example whether a List can be equal to a Collection if they have the same
contents. We don't have that here.

A certain type information (and serializer) is equal to another one if
they
describe the same type, strictly. There is no necessity for cross
hierarchy
checks.

This has also led to the situation that most type infos and serializers
implement just a dummy/default version of "canEqual". Many "equals()"
methods do not even call the other object's "canEqual", etc.

As a first step, we could simply deprecate the method and implement an
empty default, and remove all calls to that method.

Best,
Stephan
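
To spell out the double check Gordon mentions: the pattern below is a small, self-contained illustration (not an actual Flink class) of an equals() that performs the strict type check itself, which is why a separate canEqual() adds nothing when no cross-hierarchy equality is wanted.

import java.util.Objects;

// Illustrative stand-in for a type info / serializer-like descriptor.
public final class ExampleTypeDescriptor {

    private final String fieldName;

    public ExampleTypeDescriptor(String fieldName) {
        this.fieldName = fieldName;
    }

    @Override
    public boolean equals(Object obj) {
        // Strict class check replaces the former obj.canEqual(this) call.
        if (obj == null || obj.getClass() != getClass()) {
            return false;
        }
        ExampleTypeDescriptor that = (ExampleTypeDescriptor) obj;
        return Objects.equals(fieldName, that.fieldName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(fieldName);
    }
}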





Re: [VOTE] Enable GitBox integration (#2)

2018-05-22 Thread Timo Walther

+1

Am 22.05.18 um 10:49 schrieb Ted Yu:

+1

 Original message 
From: Chesnay Schepler
Date: 5/22/18 1:12 AM (GMT-08:00)
To: dev@flink.apache.org
Subject: [VOTE] Enable GitBox integration (#2)
Hello,

since no concerns have been raised in the discussion about enabling
GitBox [1] I'm opening this vote to make things official.

Please vote on enabling GitBox integration for the flink and flink-web
repositories as follows:

[ ] +1, Approve GitBox
[ ] -1, Do not approve GitBox (please provide specific comments)

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

This is the second attempt for this vote. The previous vote was
cancelled due to the flink-web repository settings. Changes to the
previous vote are highlighted in bold.

If accepted I will file a ticket with INFRA to enable GitBox with the
following settings:

flink:

    * no wiki
    * no issues
    * no projects
    * no merge commit button (rebase/squash merge button still enabled)
    * protected branches [2]: master, release-1.[0-5] (the latter we will
  have to update with each release)

flink-web

    * no wiki
    * no issues
    * no projects
    * no merge commit button (rebase/squash merge button still enabled)
    * protected branch: asf-site
    * default branch: asf-site (while we're at it...)

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-GitBox-td22328.html
[2] https://help.github.com/articles/about-protected-branches/






Re: [VOTE] Release 1.5.0, release candidate #5

2018-05-24 Thread Timo Walther
I tried to build the release locally but 3 tests failed. 2 are related to the environment in which the tests are executed (see [1]) and the other should have been caught by Travis, which did not happen (see the last comments in [2]).


[1] https://issues.apache.org/jira/browse/FLINK-9424
[2] https://issues.apache.org/jira/browse/FLINK-9234

Timo

Am 23.05.18 um 17:33 schrieb Till Rohrmann:

-1

Piotr just found a race condition between the TM registration at the RM and
slot requests coming from the SlotManager/RM [1]. This is a release blocker
since it affects all deployments. Consequently I have to cancel this RC :-(

[1] https://issues.apache.org/jira/browse/FLINK-9427

On Wed, May 23, 2018 at 5:15 PM, Gary Yao  wrote:


+1 (non-binding)

I have run all examples (batch & streaming), and only found a non-blocking
issue
with TPCHQuery3 [1] which has been introduced a year ago.

I have also deployed a cluster with HA enabled on YARN (Hadoop 2.8.3)
without
problems.

[1] https://issues.apache.org/jira/browse/FLINK-9399

On Wed, May 23, 2018 at 4:50 PM, Ted Yu  wrote:


+1

Checked signatures
Ran test suite

Due to FLINK-9340 and FLINK-9091, I had to run tests in multiple rounds.

Cheers

On Wed, May 23, 2018 at 7:39 AM, Fabian Hueske 

wrote:

+1 (binding)

- checked hashes and signatures
- checked source archive and didn't find unexpected binary files
- built from source archive skipping the tests (mvn -DskipTests clean
install), started a local cluster, and ran an example program.

Thanks,
Fabian



2018-05-23 15:39 GMT+02:00 Till Rohrmann :


Fabian pointed me to the updated ASF release policy [1] and the changes it implies for the checksum files. New releases should no longer provide a MD5 checksum file and the sha checksum file should have a proper file name extension `sha512` instead of `sha`. I've updated the release artifacts [2] accordingly.

[1] https://www.apache.org/dev/release-distribution.html
[2] http://people.apache.org/~trohrmann/flink-1.5.0-rc5/

Cheers,
Till

On Wed, May 23, 2018 at 11:18 AM, Piotr Nowojski <pi...@data-artisans.com> wrote:

+1 from me.

Additionally for this RC5 I did some manual tests to double check backward compatibility of the bug fix:
https://issues.apache.org/jira/browse/FLINK-9295

The issue with this bug fix was that it was merged after 1.5.0 RC4 but just before 1.5.0 RC5, so it missed release testing.

Piotrek


On 23 May 2018, at 09:08, Till Rohrmann wrote:

Thanks for the pointer Sihua, I've properly closed FLINK-9070.

On Wed, May 23, 2018 at 4:49 AM, sihua zhou wrote:

Hi everyone,

Please review and vote on the release candidate #5 for the version 1.5.0, as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release and binary convenience releases to be deployed to dist.apache.org [2], which are signed with the key with fingerprint 1F302569A96CFFD5 [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "release-1.5.0-rc5" [5],

Please use this document for coordinating testing efforts: [6]

Since the newly included fixes affect only individual components and are covered by tests, I will shorten the voting period until tomorrow 5:30pm CET. It is adopted by majority approval, with at least 3 PMC affirmative votes.

Thanks,
Your friendly Release Manager

[1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12341764
[2] http://people.apache.org/~trohrmann/flink-1.5.0-rc5/
[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4] https://repository.apache.org/content/repositories/orgapacheflink-1160/
[5] https://git-wip-us.apache.org/repos/asf?p=flink.git;a=commit;h=841bfe4cceecc9cd6ad3d568173fdc0149a5dc9b
[6] https://docs.google.com/document/d/1C1WwUphQj597jExWAXFUVkLH9Bi7-ir6drW9BgB8Ezo/edit?usp=sharing

Pro-tip: you can create a settings.xml file with these contents:

<settings>
  <activeProfiles>
    <activeProfile>flink-1.5.0</activeProfile>
  </activeProfiles>
  <profiles>
    <profile>
      <id>flink-1.5.0</id>
      <repositories>
        <repository>
          <id>flink-1.5.0</id>
          <url>https://repository.apache.org/content/repositories/orgapacheflink-1160/</url>
        </repository>
      </repositories>
      <pluginRepositories>
        <pluginRepository>
          <id>archetype</id>
          <url>https://repository.apache.org/content/repositories/orgapacheflink-1160/</url>
        </pluginRepository>
      </pluginRepositories>
    </profile>
  </profiles>
</settings>

And reference that in your maven commands via --settings path/to/settings.xml. This is useful for creating a quickstart based on the staged release and for building against the staged jars.

Re: [VOTE] Release 1.5.0, release candidate #6

2018-05-25 Thread Timo Walther

+1

- I built the release locally, with the minor issues mentioned in the last thread
- I executed a couple of table programs on a local cluster
- Ran some end-to-end tests locally

@Piotr: Given the amount of changes that went into this release it is 
natural to find a lot of bugs. We have also increased our testing 
efforts (e.g. by having more automated tests) this time.


Timo

Am 25.05.18 um 14:12 schrieb Aljoscha Krettek:

+1

  - verified signature and hashes
  - ran an example on a kerberized Hadoop 2.8.3 cluster, both with kinit and by 
using keytab


On 25. May 2018, at 11:56, Piotr Nowojski  wrote:

Mild +1 from me. I’m concerned about the rate of bugs that we are discovering 
for each RC, which suggests that there are still some release blockers out 
there (but I’m not aware of any right now).

Piotrek


On 25 May 2018, at 11:25, Till Rohrmann  wrote:

+1

- Checked checksums and GPG files
- Verified that source archives do not contain any binaries
- Built Flink with Hadoop 2.8.1 and Scala 2.11.7 from source release
- Verified LICENSE and NOTICE file: The LICENSE file contains unnecessary
entries for jline-reader and jline-terminal
- Checked licenses of newly added dependencies
- Checked pom files
- Read README.md
- Run manual tests in flink-tests: CheckForbiddenMethodsUsage fails
because org.apache.flink.queryablestate.messages.KvStateRequest.serialize()
uses getBytes without charset. This is non-blocking because it was
introduced with 1.4 or before
- Checked builds with SBT and the SBT quickstarts
- Executed the run-nightly-tests.sh for 10 hours without failures
- Executed Jepsen tests without failures

There is a known problem when enabling SSL encryption which can lead to
failures [1]. Since there is a workaround for this problem, I would propose
to not block the release on it and fix it with Flink 1.5.1.

[1] https://issues.apache.org/jira/browse/FLINK-9437

Cheers,
Till


On Thu, May 24, 2018 at 6:51 PM, Till Rohrmann  wrote:


Hi everyone,

Please review and vote on the release candidate #6 for the version 1.5.0,
as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)


The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release and binary convenience releases to be
deployed to dist.apache.org [2], which are signed with the key with
fingerprint 1F302569A96CFFD5 [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "release-1.5.0-rc6" [5],
* PR to update the community web site [6]

Please use this document for coordinating testing efforts: [7]

The voting periods ends tomorrow at 5pm CET. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,
Your friendly Release Manager

[1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12341764
[2] http://people.apache.org/~trohrmann/flink-1.5.0-rc6/
[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4] https://repository.apache.org/content/repositories/orgapacheflink-1161/
[5] https://git-wip-us.apache.org/repos/asf?p=flink.git;a=commit;h=c61b108b8eaa22eac3dc492b3c95c22b4177003f
[6] https://github.com/apache/flink-web/pull/106
[7] https://docs.google.com/document/d/10KDBLzt25Z44pdZBSm8MTeKldc4UTUAbAynfuVQWOt0/edit?usp=sharing

Pro-tip: you can create a settings.xml file with these contents:

<settings>
  <activeProfiles>
    <activeProfile>flink-1.5.0</activeProfile>
  </activeProfiles>
  <profiles>
    <profile>
      <id>flink-1.5.0</id>
      <repositories>
        <repository>
          <id>flink-1.5.0</id>
          <url>https://repository.apache.org/content/repositories/orgapacheflink-1161/</url>
        </repository>
      </repositories>
      <pluginRepositories>
        <pluginRepository>
          <id>archetype</id>
          <url>https://repository.apache.org/content/repositories/orgapacheflink-1161/</url>
        </pluginRepository>
      </pluginRepositories>
    </profile>
  </profiles>
</settings>

And reference that in your maven commands via --settings path/to/settings.xml. This is useful for creating a quickstart based on the staged release and for building against the staged jars.





