Apache Flink 0.9 ALS API

2015-06-12 Thread Ronny Bräunlich
Hello everybody, for a university project we use the current implementation of ALS in Flink 0.9 and we were wondering about the API of predict() and fit() requiring a DataSet[(Int, Int)] or DataSet[(Int, Int, Double)] respectively, because the range of Int is quite limited. That is why we wante
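To make the range concern concrete, here is a minimal sketch in plain Java (no Flink dependency). The class name IdRangeSketch, the helper toIntId and the sample ID are hypothetical and only serve as illustration; they are not part of the ALS API discussed in the message.

```java
class IdRangeSketch {
    // Narrow a long ID to int, failing loudly instead of silently wrapping.
    // Math.toIntExact throws ArithmeticException when the value does not fit.
    static int toIntId(long id) {
        return Math.toIntExact(id);
    }

    public static void main(String[] args) {
        long largeId = 3_000_000_000L; // hypothetical ID above Integer.MAX_VALUE

        // The largest ID an Int-keyed tuple can carry.
        System.out.println(Integer.MAX_VALUE); // 2147483647

        // A plain cast wraps around and yields a negative, wrong ID.
        System.out.println((int) largeId); // -1294967296

        // The checked conversion reports the problem instead.
        try {
            toIntId(largeId);
        } catch (ArithmeticException e) {
            System.out.println("ID does not fit into an int");
        }
    }
}
```

A data set with more than roughly 2.1 billion distinct user or item IDs, or with IDs that are already stored as 64-bit values, would therefore need a remapping step before it fits an Int-based signature.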

Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Ufuk Celebi
I'm with Till on this. Robert's position is valid as well. Again, there is no core disagreement here. No one wants to add it to dist. On 12 Jun 2015, at 00:40, Ufuk Celebi wrote: On 11 Jun 2015, at 20:04, Fabian Hueske wrote: How about the following issues? 1. The HBase Hadoop Compat issue

Re: The correct location for zipWithIndex and zipWithUniqueId

2015-06-12 Thread Till Rohrmann
+1 for linking from DataSet's transformations. On Jun 12, 2015 5:27 PM, "Fabian Hueske" wrote: > Linking from the DataSet Transformations page would be good, IMO. > > 2015-06-12 17:11 GMT+02:00 Andra Lungu: > > > Thanks for the replies! > > > > I will add the two methods in a DataSetUtils separa

Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Till Rohrmann
I agree mostly with Robert. However, one could also argue that by not including the libraries in the dist package, the user code jar will also be blown up by the dependencies added by the library. This will slow down job submission, because it has to be distributed on the cluster. Furthermore, I wo

Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Robert Metzger
Regarding the discussion about including ML, Gelly, and the streaming connectors in "flink-dist": I'm strongly against adding those to our jar because they blow up the dependencies we are shipping by default. Also, the Maven archetype sets up everything so that the dependencies are packaged into the us

Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Till Rohrmann
I've finished the legal check of the source and binary distribution. The PR with the LICENSE and NOTICE file updates can be found here [1]. What I haven't done yet is addressing the issue with the shaded dependencies. I think that we have to add to all jars which contain dependencies as binary dat

Listing Apache-2.0 dependencies in LICENSE file

2015-06-12 Thread Till Rohrmann
Hi guys, I just updated our LICENSE of the binary distribution and noticed that we also list dependencies which are licensed under Apache-2.0. As far as I understand the ASF guidelines [1], this is not strictly necessary. Since it is a lot of work to keep the list up to date, I was wondering wheth

Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Maximilian Michels
I almost finished creating the new release candidate. Then the maven deploy command failed on me for the hadoop1 profile: [INFO] [INFO] BUILD FAILURE [INFO]

Re: The correct location for zipWithIndex and zipWithUniqueId

2015-06-12 Thread Fabian Hueske
Linking from the DataSet Transformations page would be good, IMO. 2015-06-12 17:11 GMT+02:00 Andra Lungu: > Thanks for the replies! > > I will add the two methods in a separate DataSetUtils class. Where would > you put the documentation for this? I think users should be able to easily > access i

Re: The correct location for zipWithIndex and zipWithUniqueId

2015-06-12 Thread Andra Lungu
Thanks for the replies! I will add the two methods in a separate DataSetUtils class. Where would you put the documentation for this? I think users should be able to easily access it. This means that, IMO, it shouldn't go in a separate zip page, but rather in the programming guide. Or there coul
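For readers unfamiliar with the two methods, here is a minimal sketch in plain Java of what the names suggest: zipWithIndex pairs every element with a consecutive index, while zipWithUniqueId pairs it with an ID that is unique but not necessarily consecutive. This is an illustration under stated assumptions, not the Flink implementation: the class ZipSketch, the representation of a distributed data set as a list of partitions, and the ID scheme (local position times number of partitions plus partition number) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ZipSketch {
    // Consecutive indices 0..n-1 across all partitions.
    // Pass 1 computes each partition's starting offset from the sizes of the
    // partitions before it; pass 2 adds the local position to that offset.
    static List<String> zipWithIndex(List<List<String>> partitions) {
        long[] offsets = new long[partitions.size()];
        long running = 0;
        for (int p = 0; p < partitions.size(); p++) {
            offsets[p] = running;
            running += partitions.get(p).size();
        }
        List<String> out = new ArrayList<>();
        for (int p = 0; p < partitions.size(); p++) {
            List<String> part = partitions.get(p);
            for (int i = 0; i < part.size(); i++) {
                out.add((offsets[p] + i) + ":" + part.get(i));
            }
        }
        return out;
    }

    // Unique but non-consecutive IDs, computed without a counting pass:
    // each partition only needs its own number and the partition count.
    static List<String> zipWithUniqueId(List<List<String>> partitions) {
        int n = partitions.size();
        List<String> out = new ArrayList<>();
        for (int p = 0; p < n; p++) {
            List<String> part = partitions.get(p);
            for (int i = 0; i < part.size(); i++) {
                long id = (long) i * n + p;
                out.add(id + ":" + part.get(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> parts = List.of(List.of("a", "b"), List.of("c"));
        System.out.println(zipWithIndex(parts));    // [0:a, 1:b, 2:c]
        System.out.println(zipWithUniqueId(parts)); // [0:a, 2:b, 1:c]
    }
}
```

The trade-off shown here is that consecutive indices require knowing how many elements precede each partition, whereas unique IDs can be assigned by every partition independently.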

Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Fabian Hueske
OK, guys. I merged and pushed the last outstanding commits to the release-0.9 branch. Good to go for a new candidate. 2015-06-12 14:30 GMT+02:00 Maximilian Michels: > +1 Let's constitute the changes in a new release candidate. > > On Fri, Jun 12, 2015 at 2:06 PM, Fabian Hueske wrote: > > > +1 f

Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Maximilian Michels
+1 Let's constitute the changes in a new release candidate. On Fri, Jun 12, 2015 at 2:06 PM, Fabian Hueske wrote: > +1 for b) > > I'm organizing + merging the commits that need to go into the new candidate > right now. Will let you know when I am done. > > 2015-06-12 14:03 GMT+02:00 Till Rohrmann:

Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Fabian Hueske
+1 for b) I'm organizing + merging the commits that need to go into the new candidate right now. Will let you know when I am done. 2015-06-12 14:03 GMT+02:00 Till Rohrmann: > I'm in favour of option b) as well. > > On Fri, Jun 12, 2015 at 12:05 PM Ufuk Celebi wrote: > > > Yes, the LICENSE files ar

Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Till Rohrmann
I'm in favour of option b) as well. On Fri, Jun 12, 2015 at 12:05 PM Ufuk Celebi wrote: > Yes, the LICENSE files are definitely a release blocker. > > a) Either we wait with the RC until we have fixed the LICENSES, or > > b) Put out next RC to continue with testing and then update it with the >

Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Ufuk Celebi
Yes, the LICENSE files are definitely a release blocker. a) Either we wait with the RC until we have fixed the LICENSES, or b) Put out next RC to continue with testing and then update it with the LICENSE [either we find something before the LICENSE update or we only have to review the LICENSE cha

Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Till Rohrmann
What about the shaded jars? On Fri, Jun 12, 2015 at 11:32 AM Ufuk Celebi wrote: > @Max: for the new RC. Can you make sure to set the variables correctly > with regard to stable/snapshot versions in the docs?

Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Ufuk Celebi
@Max: for the new RC. Can you make sure to set the variables correctly with regard to stable/snapshot versions in the docs?

Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Ufuk Celebi
On 12 Jun 2015, at 00:40, Ufuk Celebi wrote: > > On 11 Jun 2015, at 20:04, Fabian Hueske wrote: > >> How about the following issues? >> >> 1. The HBase Hadoop Compat issue, Ufuk is working on > > I was not able to reproduce this :( I ran HadoopInputFormats against various > sources and con

Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Ufuk Celebi
After thinking about it a bit more, I think that's fine. +1 to document and keep it as it is.

Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Maximilian Michels
Just to clarify. If you write a Flink program and include the Table API as a dependency, then you have to package your program in the JAR with the Table API and submit it to the cluster. IMHO that's ok but it should be documented to inform users which libraries are included in Flink binaries out of

Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Ufuk Celebi
On 12 Jun 2015, at 10:44, Till Rohrmann wrote: > Yes you're right Ufuk. At the moment the user has to place the jars in the > lib folder of Flink. If this folder is not shared then he has to do it for > every node on which Flink runs. OK. I guess there is a nice way to do this with YARN as well

Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Till Rohrmann
Yes you're right Ufuk. At the moment the user has to place the jars in the lib folder of Flink. If this folder is not shared then he has to do it for every node on which Flink runs. On Fri, Jun 12, 2015 at 10:42 AM Till Rohrmann wrote: > I think I found a real release blocker. Currently we don't

Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Till Rohrmann
I think I found a real release blocker. Currently we don't add license files to our shaded jars. For example the flink-shaded-include-yarn-0.9.0-milestone-1.jar shades hadoop code. This code also includes the `org.apache.util.bloom.*` classes. These classes are licensed under The European Commissi

Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Ufuk Celebi
On 12 Jun 2015, at 10:29, Till Rohrmann wrote: > Well I think the initial idea was to keep the dist jar as small as possible > and therefore we did not include the libraries. I'm not sure whether we can > decide this here ad-hoc. If the community says that we shall include these > libraries then

Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Till Rohrmann
Well I think the initial idea was to keep the dist jar as small as possible and therefore we did not include the libraries. I'm not sure whether we can decide this here ad-hoc. If the community says that we shall include these libraries then I can add them. But bear in mind that all of them have som

Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Márton Balassi
As for outstanding issues I think streaming is good to go as far as I know. I am personally against including all libraries - at least speaking for the streaming connectors. Robert, Stephan and myself had a detailed discussion on that some time ago and the disadvantage of having all the libraries i

Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Fabian Hueske
I have another fix, but this is just a documentation update (FLINK-2207) and will be done soon. 2015-06-12 10:02 GMT+02:00 Maximilian Michels: > We should have a nightly cluster test for every library. Let's keep that in > mind for the future. Very nice find, Till! > > Since there were no objec

Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Maximilian Michels
We should have a nightly cluster test for every library. Let's keep that in mind for the future. Very nice find, Till! Since there were no objections, I cherry-picked the proposed commits from the document to the release-0.9 branch. If I understand correctly, we can create the new release candida

Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Ufuk Celebi
On 12 Jun 2015, at 09:45, Till Rohrmann wrote: > Hi guys, > > I just noticed while testing the TableAPI on the cluster that it is not > part of the dist module. Therefore, programs using the TableAPI will only > run when you put the TableAPI jar directly on the cluster or if you build a > fat j

[jira] [Created] (FLINK-2209) Document how to use TableAPI, Gelly and FlinkML, StreamingConnectors on a cluster

2015-06-12 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-2209: Summary: Document how to use TableAPI, Gelly and FlinkML, StreamingConnectors on a cluster Key: FLINK-2209 URL: https://issues.apache.org/jira/browse/FLINK-2209 Proje

Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Márton Balassi
@Till: This also applies to the streaming connectors. On Fri, Jun 12, 2015 at 9:45 AM, Till Rohrmann wrote: > Hi guys, > > I just noticed while testing the TableAPI on the cluster that it is not > part of the dist module. Therefore, programs using the TableAPI will only > run when you put the Tab

Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Till Rohrmann
Hi guys, I just noticed while testing the TableAPI on the cluster that it is not part of the dist module. Therefore, programs using the TableAPI will only run when you put the TableAPI jar directly on the cluster or if you build a fat jar including the TableAPI jar. This is not documented anywhere. Fur

Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Till Rohrmann
I'm currently going through the license file and I discovered some skeletons in our closet. This has to be merged as well. But I'm still working on it (we have a lot of dependencies). Cheers, Till On Fri, Jun 12, 2015 at 12:51 AM Ufuk Celebi wrote: > > On 12 Jun 2015, at 00:49, Fabian Hueske w