Re: [VOTE] Release Apache Flink 0.8.0 (RC1)

2015-01-12 Thread Ufuk Celebi
On Mon, Jan 12, 2015 at 11:22 AM, Stephan Ewen se...@apache.org wrote:

 It would be good to have the patch, but it is also a very tricky patch, so
 pushing it hastily may be problematic.


I agree. @Till: would that be OK with you? If yes, I think we are good to
go for the next RC.


Re: Flink @ Heise

2015-01-12 Thread Robert Metzger
Hey Andreas,

thanks for sharing. Due to the press announcement by the ASF today, there
is quite some attention in the news.

For all non-germans: Heise is one of the biggest IT-news sites here.

On Mon, Jan 12, 2015 at 3:10 PM, Andreas Kunft andreas.ku...@tu-berlin.de
wrote:

 FYI

 http://www.heise.de/newsticker/meldung/Big-Data-
 Apache-Flink-wird-Top-Level-Projekt-2516177.html



Re: [VOTE] Release Apache Flink 0.8.0 (RC1)

2015-01-12 Thread Stephan Ewen
It would be good to have the patch, but it is also a very tricky patch, so
pushing it hastily may be problematic.

We could put out 0.8 and very soon a 0.8.1 with that fix.
Am 12.01.2015 11:12 schrieb Till Rohrmann trohrm...@apache.org:

 I only have to fix a last dead lock and then the fix should be working.

 On Mon, Jan 12, 2015 at 11:06 AM, Ufuk Celebi u...@apache.org wrote:

  Thanks, Marton. Yes, I think
  https://issues.apache.org/jira/browse/FLINK-1376.
 
  Till has a fix coming up for it. Should we wait for this or postpone it
 to
  0.8.1?
 
  – Ufuk
 
  On 12 Jan 2015, at 10:16, Márton Balassi balassi.mar...@gmail.com
 wrote:
 
   Based on the requests listed in this thread Stephan and myself
   cherry-picked the following commits:
  
   7634310 [FLINK-1266] Generalize DistributedFileSystem implementation
   (rmetzger)
   40a2705 [FLINK-1266] Update mongodb link and address pull request
  comments
   (rmetzger)
   cd66ced [FLINK-1266] Properly pass the fs.defaulFS setting when
   initializing … (rmetzger)
   3bf2d21 [FLINK-1266] More dependency exclusions (rmetzger)
   e8ab5b7 [FLINK-1266] Backport fix to 0.8 (rmetzger)
   7cd0f47 [docs] Prepare documentation for 0.8 release (rmetzger)
   e83ccd0 [FLINK-1385] Print warning if not resources are _currently_
   available… (rmetzger)
  
   4f9dcae [FLINK-1378] [scala] Fix type extraction for nested type
  parameters
   (aljoscha)
   fced2eb [FLINK-1378] Add support for Throwables in KryoSerializer
  (aljoscha)
   ada35eb [FLINK-1378] [scala] Add support for Try[A]
   (Success/Failure) (aljoscha)
  
   a64abe1 [docs] Update README and internals (scheduling) for graduation
  and
   fi… (SephanEwen)
   4f95ce7 Fix typo in README.md (uce)
  
   Both of us committed some other fixes mainly for the dependencies and
   licences, the distribution and the quickstarts. Pushing the results as
  soon
   as the tests pass.
  
   Does anyone have any other issue that has to be included in the next
   release candidate?
  
  
   On Sat, Jan 10, 2015 at 7:34 PM, Robert Metzger rmetz...@apache.org
  wrote:
  
   I've tested it on an empty YARN cluster, allocating more containers
 than
   available.
   Flink will then allocate as many containers as possible.
  
   On Sat, Jan 10, 2015 at 7:31 PM, Stephan Ewen se...@apache.org
 wrote:
  
   Seems reasonable. Have you tested it on a cluster with concurrent
 YARN
   jobs?
  
   On Sat, Jan 10, 2015 at 7:28 PM, Robert Metzger rmetz...@apache.org
 
   wrote:
  
   I would really like to include this commit into the 0.8 release as
   well:
  
  
  
  
 
 https://github.com/apache/flink/commit/ec2bb573d185429f8b3efe111850b8f0e67f2704
   A user is affected by this issue.
   If you agree, I can merge it.
  
   On Sat, Jan 10, 2015 at 7:25 PM, Stephan Ewen se...@apache.org
   wrote:
  
   I have gone through the code, cleaned up dependencies and made sure
   that
   all licenses are correctly handled.
   The changes are in the public release-0.8 branch.
  
   From that side, the code is now good to go in my opinion.
  
  
   Stephan
  
  
   On Sat, Jan 10, 2015 at 12:30 PM, Robert Metzger 
   rmetz...@apache.org
   wrote:
  
   I've updated the docs/_config.yml variables to reflect that
 hadoop2
   is
   now
   the default profile: https://github.com/apache/flink/pull/294
  
   On Fri, Jan 9, 2015 at 8:52 PM, Stephan Ewen se...@apache.org
   wrote:
  
   Just as a heads up. I am almost through with checking
   dependencies
   and
   licenses.
   Will commit that later tonight or tomorrow.
  
   On Fri, Jan 9, 2015 at 7:09 PM, Stephan Ewen se...@apache.org
   wrote:
  
   I vote to include it as well. It is sort of vital for advanced
   use
   of
   the
   Scala API.
  
   It is also an isolated change that does not affect other
   components,
   so
   it
   should be testable very well.
  
   On Fri, Jan 9, 2015 at 7:05 PM, Robert Metzger 
   rmetz...@apache.org
   wrote:
  
   I'd say yes because its affecting a user.
  
   On Fri, Jan 9, 2015 at 7:00 PM, Aljoscha Krettek 
   aljos...@apache.org
  
   wrote:
  
   I have a fix for this issue:
  
  
  
  
  
  
  
  
 
 https://issues.apache.org/jira/browse/FLINK-1378?jql=project%20%3D%20FLINK
   and I think this should also make it into the 0.8.0 release.
  
   What do you think?
  
   On Fri, Jan 9, 2015 at 3:28 PM, Márton Balassi 
   balassi.mar...@gmail.com
   wrote:
   Sure, thanks Ufuk.
  
   On Fri, Jan 9, 2015 at 3:15 PM, Ufuk Celebi 
   u...@apache.org
  
   wrote:
  
   Marton, could you also cherry-pick 7f659f6 and 7e08fa1
   for
   the
   next
   RC?
   It's a minor update to the README describing the IDE
   setup.
  
   I will closed the respective issue FLINK-1109.
  
   On 08 Jan 2015, at 23:50, Henry Saputra 
   henry.sapu...@gmail.com
  
   wrote:
  
   Marton, could you close this VOTE thread by replying to
   the
   original
   email and append [CANCEL] in the subject line.
  
   - Henry
  
   On Thu, Jan 8, 2015 at 9:35 AM, Márton Balassi 
   

Re: [VOTE] Release Apache Flink 0.8.0 (RC1)

2015-01-12 Thread Ufuk Celebi
Thanks, Marton. Yes, I think https://issues.apache.org/jira/browse/FLINK-1376.

Till has a fix coming up for it. Should we wait for this or postpone it to 
0.8.1?

– Ufuk

On 12 Jan 2015, at 10:16, Márton Balassi balassi.mar...@gmail.com wrote:

 Based on the requests listed in this thread Stephan and myself
 cherry-picked the following commits:
 
 7634310 [FLINK-1266] Generalize DistributedFileSystem implementation
 (rmetzger)
 40a2705 [FLINK-1266] Update mongodb link and address pull request comments
 (rmetzger)
 cd66ced [FLINK-1266] Properly pass the fs.defaulFS setting when
 initializing … (rmetzger)
 3bf2d21 [FLINK-1266] More dependency exclusions (rmetzger)
 e8ab5b7 [FLINK-1266] Backport fix to 0.8 (rmetzger)
 7cd0f47 [docs] Prepare documentation for 0.8 release (rmetzger)
 e83ccd0 [FLINK-1385] Print warning if not resources are _currently_
 available… (rmetzger)
 
 4f9dcae [FLINK-1378] [scala] Fix type extraction for nested type parameters
 (aljoscha)
 fced2eb [FLINK-1378] Add support for Throwables in KryoSerializer (aljoscha)
 ada35eb [FLINK-1378] [scala] Add support for Try[A]
 (Success/Failure) (aljoscha)
 
 a64abe1 [docs] Update README and internals (scheduling) for graduation and
 fi… (SephanEwen)
 4f95ce7 Fix typo in README.md (uce)
 
 Both of us committed some other fixes mainly for the dependencies and
 licences, the distribution and the quickstarts. Pushing the results as soon
 as the tests pass.
 
 Does anyone have any other issue that has to be included in the next
 release candidate?
 
 
 On Sat, Jan 10, 2015 at 7:34 PM, Robert Metzger rmetz...@apache.org wrote:
 
 I've tested it on an empty YARN cluster, allocating more containers than
 available.
 Flink will then allocate as many containers as possible.
 
 On Sat, Jan 10, 2015 at 7:31 PM, Stephan Ewen se...@apache.org wrote:
 
 Seems reasonable. Have you tested it on a cluster with concurrent YARN
 jobs?
 
 On Sat, Jan 10, 2015 at 7:28 PM, Robert Metzger rmetz...@apache.org
 wrote:
 
 I would really like to include this commit into the 0.8 release as
 well:
 
 
 
 https://github.com/apache/flink/commit/ec2bb573d185429f8b3efe111850b8f0e67f2704
 A user is affected by this issue.
 If you agree, I can merge it.
 
 On Sat, Jan 10, 2015 at 7:25 PM, Stephan Ewen se...@apache.org
 wrote:
 
 I have gone through the code, cleaned up dependencies and made sure
 that
 all licenses are correctly handled.
 The changes are in the public release-0.8 branch.
 
 From that side, the code is now good to go in my opinion.
 
 
 Stephan
 
 
 On Sat, Jan 10, 2015 at 12:30 PM, Robert Metzger 
 rmetz...@apache.org
 wrote:
 
 I've updated the docs/_config.yml variables to reflect that hadoop2
 is
 now
 the default profile: https://github.com/apache/flink/pull/294
 
 On Fri, Jan 9, 2015 at 8:52 PM, Stephan Ewen se...@apache.org
 wrote:
 
 Just as a heads up. I am almost through with checking
 dependencies
 and
 licenses.
 Will commit that later tonight or tomorrow.
 
 On Fri, Jan 9, 2015 at 7:09 PM, Stephan Ewen se...@apache.org
 wrote:
 
 I vote to include it as well. It is sort of vital for advanced
 use
 of
 the
 Scala API.
 
 It is also an isolated change that does not affect other
 components,
 so
 it
 should be testable very well.
 
 On Fri, Jan 9, 2015 at 7:05 PM, Robert Metzger 
 rmetz...@apache.org
 wrote:
 
 I'd say yes because its affecting a user.
 
 On Fri, Jan 9, 2015 at 7:00 PM, Aljoscha Krettek 
 aljos...@apache.org
 
 wrote:
 
 I have a fix for this issue:
 
 
 
 
 
 
 
 https://issues.apache.org/jira/browse/FLINK-1378?jql=project%20%3D%20FLINK
 and I think this should also make it into the 0.8.0 release.
 
 What do you think?
 
 On Fri, Jan 9, 2015 at 3:28 PM, Márton Balassi 
 balassi.mar...@gmail.com
 wrote:
 Sure, thanks Ufuk.
 
 On Fri, Jan 9, 2015 at 3:15 PM, Ufuk Celebi 
 u...@apache.org
 
 wrote:
 
 Marton, could you also cherry-pick 7f659f6 and 7e08fa1
 for
 the
 next
 RC?
 It's a minor update to the README describing the IDE
 setup.
 
 I will closed the respective issue FLINK-1109.
 
 On 08 Jan 2015, at 23:50, Henry Saputra 
 henry.sapu...@gmail.com
 
 wrote:
 
 Marton, could you close this VOTE thread by replying to
 the
 original
 email and append [CANCEL] in the subject line.
 
 - Henry
 
 On Thu, Jan 8, 2015 at 9:35 AM, Márton Balassi 
 balassi.mar...@gmail.com
 wrote:
 Cherry-picked and tested: found no duplicate
 dependencies
 in
 lib,
 yarn
 uberjar build goes without the mentioned warns.
 Travis tests are passing, pushing soon.
 
 On Thu, Jan 8, 2015 at 4:57 PM, Stephan Ewen 
 se...@apache.org
 wrote:
 
 Nice.
 
 @Marton: As soon as as you are done, I make a pass
 over
 the
 licenses...
 
 Stephan
 
 
 On Thu, Jan 8, 2015 at 4:42 PM, Robert Metzger 
 rmetz...@apache.org
 
 wrote:
 
 Allright. The travis tests are green and I tested it
 again
 with
 Tachyon
 on
 a cluster.
 
 My pull request also fixes some of the issues
 mentioned
 earlier
 in
 this
 thread by Stephan (the warnings from shading
 regarding
 duplicate
 classes).
 

Re: Gather a distributed dataset

2015-01-12 Thread Ufuk Celebi
Hey Alexander,

On 12 Jan 2015, at 11:42, Alexander Alexandrov 
alexander.s.alexand...@gmail.com wrote:

 Hi there,
 
 I wished for intermediate datasets, and Santa Ufuk made my wishes come true
 (thank you, Santa)!
 
 Now that FLINK-986 is in the mainline, I want to ask some practical
 questions.
 
 In Spark, there is a way to put a value from the local driver to the
 distributed runtime via
 
 val x = env.parallelize(...) // expose x to the distributed runtime
 val y = dataflow(env, x) // y is produced by a dataflow which reads from x
 
 and also to get a value from the distributed runtime back to the driver
 
 val z = env.collect(y)
 
 As far as I know, in Flink we have an equivalent for parallelize
 
 val x = env.fromCollection(...)
 
 but not for collect. Is this still the case?

Yes, but this will change soon.

 If yes, how hard would it be to add this feature at the moment? Can you
 give me some pointers?

There is a alpha version/hack of this using accumulators. See 
https://github.com/apache/flink/pull/210. The problem is that each collect call 
results in a new program being executed from the sources. I think Stephan is 
working on the scheduling to support this kind of programs. From the runtime 
perspective, it is not a problem to transfer the produced intermediate results 
back to the job manager. The job manager can basically use the same mechanism 
that the task managers use. Even the accumulator version should be fine as a 
initial version.

Re: [VOTE] Release Apache Flink 0.8.0 (RC1)

2015-01-12 Thread Till Rohrmann
Yeah I agree with that.

On Mon, Jan 12, 2015 at 11:30 AM, Ufuk Celebi u...@apache.org wrote:

 On Mon, Jan 12, 2015 at 11:22 AM, Stephan Ewen se...@apache.org wrote:

  It would be good to have the patch, but it is also a very tricky patch,
 so
  pushing it hastily may be problematic.
 

 I agree. @Till: would that be OK with you? If yes, I think we are good to
 go for the next RC.



[VOTE] Release Apache Flink 0.8.0 (RC2)

2015-01-12 Thread Márton Balassi
Please vote on releasing the following candidate as Apache Flink version
0.8.0

This release will be the first major release for Flink as a top level
project.

-
The commit to be voted on is in the branch release-0.8.0-rc2
(commit 9ee74f3):
https://git-wip-us.apache.org/repos/asf/flink/commit/9ee74f3

The release artifacts to be voted on can be found at:
http://people.apache.org/~mbalassi/flink-0.8.0-rc2/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/mbalassi.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapacheflink-1022
-


Please vote on releasing this package as Apache Flink 0.8.0.

The vote is open for the next 72 hours and passes if a majority of at least
three +1 PMC votes are cast.

[ ] +1 Release this package as Apache Flink 0.8.0
[ ] -1 Do not release this package because ...


[jira] [Created] (FLINK-1391) Kryo fails to properly serialize avro collection types

2015-01-12 Thread Robert Metzger (JIRA)
Robert Metzger created FLINK-1391:
-

 Summary: Kryo fails to properly serialize avro collection types
 Key: FLINK-1391
 URL: https://issues.apache.org/jira/browse/FLINK-1391
 Project: Flink
  Issue Type: Improvement
Affects Versions: 0.8, 0.9
Reporter: Robert Metzger


Before FLINK-610, Avro was the default generic serializer.
Now, special types coming from Avro are handled by Kryo .. which seems to cause 
errors like:

{code}
Exception in thread main 
org.apache.flink.runtime.client.JobExecutionException: 
java.lang.NullPointerException
at org.apache.avro.generic.GenericData$Array.add(GenericData.java:200)
at 
com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:116)
at 
com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:22)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761)
at 
org.apache.flink.api.java.typeutils.runtime.KryoSerializer.deserialize(KryoSerializer.java:143)
at 
org.apache.flink.api.java.typeutils.runtime.KryoSerializer.deserialize(KryoSerializer.java:148)
at 
org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:244)
at 
org.apache.flink.runtime.plugable.DeserializationDelegate.read(DeserializationDelegate.java:56)
at 
org.apache.flink.runtime.io.network.serialization.AdaptiveSpanningRecordDeserializer.getNextRecord(AdaptiveSpanningRecordDeserializer.java:71)
at 
org.apache.flink.runtime.io.network.channels.InputChannel.readRecord(InputChannel.java:189)
at 
org.apache.flink.runtime.io.network.gates.InputGate.readRecord(InputGate.java:176)
at 
org.apache.flink.runtime.io.network.api.MutableRecordReader.next(MutableRecordReader.java:51)
at 
org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:53)
at 
org.apache.flink.runtime.operators.DataSinkTask.invoke(DataSinkTask.java:170)
at 
org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:257)
at java.lang.Thread.run(Thread.java:744)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)