[jira] [Created] (FLINK-1632) Use DataSet's count() and collect() to simplify Gelly methods
Vasia Kalavri created FLINK-1632: Summary: Use DataSet's count() and collect() to simplify Gelly methods Key: FLINK-1632 URL: https://issues.apache.org/jira/browse/FLINK-1632 Project: Flink Issue Type: Improvement Components: Gelly Affects Versions: 0.9 Reporter: Vasia Kalavri The recently introduced count() and collect() methods of DataSet can be used to simplify several Gelly methods. We can get rid of GraphUtils and also simplify methods which return DataSets with single values, such as numberOfVertices() and isWeaklyConnected(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-1633) Add getTriplets() Gelly method
Vasia Kalavri created FLINK-1633: Summary: Add getTriplets() Gelly method Key: FLINK-1633 URL: https://issues.apache.org/jira/browse/FLINK-1633 Project: Flink Issue Type: New Feature Components: Gelly Affects Versions: 0.9 Reporter: Vasia Kalavri Priority: Minor In some graph algorithms, it is required to access the graph edges together with the vertex values of the source and target vertices. For example, several graph weighting schemes compute some kind of similarity weights for edges, based on the attributes of the source and target vertices. This issue proposes adding a convenience Gelly method that generates a DataSet of srcVertex, Edge, TrgVertex triplets from the input graph. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
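What such a method would compute can be sketched in plain Java (the Triplet type and field names here are hypothetical, not the eventual Gelly API): each edge is joined with the values of its source and target vertices.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class TripletSketch {
    // Hypothetical stand-in for the proposed Gelly triplet type.
    static class Triplet {
        final long srcId, trgId;
        final double srcValue, trgValue, edgeValue;
        Triplet(long srcId, long trgId, double srcValue, double trgValue, double edgeValue) {
            this.srcId = srcId; this.trgId = trgId;
            this.srcValue = srcValue; this.trgValue = trgValue; this.edgeValue = edgeValue;
        }
    }

    // Join each edge {src, trg, value} with the values of its endpoints;
    // in Gelly this would be two joins of the edge DataSet with the vertex DataSet.
    static List<Triplet> getTriplets(Map<Long, Double> vertexValues, List<double[]> edges) {
        List<Triplet> triplets = new ArrayList<>();
        for (double[] e : edges) {
            long src = (long) e[0], trg = (long) e[1];
            triplets.add(new Triplet(src, trg, vertexValues.get(src), vertexValues.get(trg), e[2]));
        }
        return triplets;
    }
}
```

An edge-weighting scheme can then derive a similarity weight from srcValue and trgValue without writing the joins by hand in every algorithm.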
Access flink-conf.yaml data
Hi, Can someone help me on how to access the flink-conf.yaml configuration values inside the flink sources? Are these readily available as a map somewhere? Thanks.
[jira] [Created] (FLINK-1634) Fix Could not build up connection to JobManager issue on some systems
Dulaj Viduranga created FLINK-1634: -- Summary: Fix Could not build up connection to JobManager issue on some systems Key: FLINK-1634 URL: https://issues.apache.org/jira/browse/FLINK-1634 Project: Flink Issue Type: Bug Affects Versions: 0.9 Reporter: Dulaj Viduranga Priority: Critical Fix For: 0.9 In some systems, flink 0.9-SNAPSHOT gives an error (org.apache.flink.client.program.ProgramInvocationException: Could not build up connection to JobManager.) when the taskmanager tries to connect to the jobmanager. This is because the taskmanager cannot resolve the IP where the jobmanager runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Access flink-conf.yaml data
I think that you can use `org.apache.flink.configuration.GlobalConfiguration` to obtain the configuration object. Regards. Chiwan Park (Sent with iPhone) On Mar 3, 2015, at 12:17 PM, Dulaj Viduranga vidura...@icloud.com wrote: Hi, Can someone help me on how to access the flink-conf.yaml configuration values inside the flink sources? Are these readily available as a map somewhere? Thanks.

Re: Could not build up connection to JobManager
Hi, I found the fix for this issue and I'll create a pull request in the following day.
Re: Problem mvn install
Hi Matthias, I just checked and could not reproduce the error. The files that Maven RAT complained about do not exist in Flink's master branch. I don't think they are put there as part of the build process. Best, Fabian 2015-03-02 15:09 GMT+01:00 Matthias J. Sax mj...@informatik.hu-berlin.de: Hi, if I start mvn -Dmaven.test.skip=true clean install, the goal fails and I get the following error: Unapproved licenses: flink-clients/bin/src/main/resources/web-docs/js/dagre-d3.js flink-clients/bin/src/main/resources/web-docs/js/d3.js flink-staging/flink-avro/bin/src/test/resources/avro/user.avsc It seems that an APL2-compatible license is expected. Can anyone comment on this issue? - I am on branch flink/master -Matthias
[jira] [Created] (FLINK-1624) Build of old sources fails due to git-commit-id plugin
Max Michels created FLINK-1624: -- Summary: Build of old sources fails due to git-commit-id plugin Key: FLINK-1624 URL: https://issues.apache.org/jira/browse/FLINK-1624 Project: Flink Issue Type: Bug Reporter: Max Michels Assignee: Max Michels Priority: Minor Fix For: 0.6-incubating Builds for Flink (Stratosphere) versions 0.6.0 fail because of a bug in the maven git-commit-id plugin. https://github.com/ktoso/maven-git-commit-id-plugin/issues/61 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-1623) Rename Expression API and Operation Representation
Aljoscha Krettek created FLINK-1623: --- Summary: Rename Expression API and Operation Representation Key: FLINK-1623 URL: https://issues.apache.org/jira/browse/FLINK-1623 Project: Flink Issue Type: Improvement Affects Versions: 0.9 Reporter: Aljoscha Krettek Right now the package is called flink-expressions and we refer to the API as the Expression API. The equivalent to DataSet and DataStream is the ExpressionOperation. I'm not very happy with these names, so we should find something that is more marketable before making any big announcements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-1618) Add parallel time discretisation for time-window transformations
Gyula Fora created FLINK-1618: - Summary: Add parallel time discretisation for time-window transformations Key: FLINK-1618 URL: https://issues.apache.org/jira/browse/FLINK-1618 Project: Flink Issue Type: Improvement Components: Streaming Reporter: Gyula Fora Currently, discretizers for all windowing policies, including time, are executed with parallelism 1 when they define global windows (for instance: the sum of the last 10 minutes). While this is necessary for arbitrary policies such as delta-based or user-defined policies, some discretizers, such as time, can be implemented in a distributed fashion. Distributed time discretizers (and other types) can be implemented in the following way:
- The discretizers should create StreamWindows with incrementally increasing IDs, starting from the same value, so that it is possible to merge them after the transformation.
- The partitioner for each discretizer should send the number of partitions created to the merger (the merger should be aware of the number of partitioners present, in order to wait for all the information).
- Based on all the partitioning info, the merger can merge the windows properly afterwards.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
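The merge step described in the ticket can be sketched in plain Java (types and names here are illustrative, not Flink's actual StreamWindow classes): parts carrying the same incrementally assigned window ID belong to the same logical window, and a window is emitted once parts from all partitions have arrived.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WindowMerger {
    // One partial window produced by a parallel discretizer instance.
    static class Part {
        final int windowId, partition;
        final List<Integer> elements;
        Part(int windowId, int partition, List<Integer> elements) {
            this.windowId = windowId; this.partition = partition; this.elements = elements;
        }
    }

    // Merge partial windows by ID; a window is complete (and emitted) once
    // parts from all numPartitions partitions have been seen for that ID.
    static Map<Integer, List<Integer>> merge(List<Part> parts, int numPartitions) {
        Map<Integer, List<Integer>> buffered = new HashMap<>();
        Map<Integer, Integer> seen = new HashMap<>();
        Map<Integer, List<Integer>> complete = new LinkedHashMap<>();
        for (Part p : parts) {
            buffered.computeIfAbsent(p.windowId, k -> new ArrayList<>()).addAll(p.elements);
            int n = seen.merge(p.windowId, 1, Integer::sum);
            if (n == numPartitions) {
                complete.put(p.windowId, buffered.remove(p.windowId));
            }
        }
        return complete;
    }
}
```

In a real implementation the per-window partition count would come from the discretizers themselves, as the ticket proposes, rather than being a fixed parameter.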
[jira] [Created] (FLINK-1620) Add pre-aggregator for count windows
Gyula Fora created FLINK-1620: - Summary: Add pre-aggregator for count windows Key: FLINK-1620 URL: https://issues.apache.org/jira/browse/FLINK-1620 Project: Flink Issue Type: Improvement Components: Streaming Reporter: Gyula Fora Currently there is only support for pre-aggregators for tumbling policies. A pre-aggregator should be added for count policies. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
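A pre-aggregator for count windows could work roughly like this (an illustrative sketch, not Flink's pre-aggregation API): instead of buffering every element of a window, fold each record into a running aggregate and emit it whenever the window size is reached.

```java
import java.util.ArrayList;
import java.util.List;

public class CountPreAggregator {
    // Emit one partial sum per complete count window of the given size,
    // keeping O(1) state instead of buffering windowSize elements.
    static List<Integer> preAggregateSums(List<Integer> input, int windowSize) {
        List<Integer> partials = new ArrayList<>();
        int sum = 0, count = 0;
        for (int value : input) {
            sum += value;
            if (++count == windowSize) {  // window complete: emit and reset
                partials.add(sum);
                sum = 0;
                count = 0;
            }
        }
        return partials;
    }
}
```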
Re: Could not build up connection to JobManager
Calling: java -cp ../examples/flink-java-examples-0.9-SNAPSHOT-KMeans.jar org.apache.flink.examples.java.clustering.util.KMeansDataGenerator 500 10 0.08 will not connect to Flink. It's just running a standalone KMeans data generator, not KMeans. I would suspect that the KMeans example is not running either. You can run the KMeans example like this: bin/flink run ./examples/flink-java-examples-0.9-SNAPSHOT-KMeans.jar. On Sat, Feb 28, 2015 at 5:47 AM, Dulaj Viduranga vidura...@icloud.com wrote: Hi, I’m thinking I’m doing something wrong. After setting jobManager address to 127.0.0.1, I can run the kmeans example (java -cp ../examples/flink-java-examples-0.9-SNAPSHOT-KMeans.jar org.apache.flink.examples.java.clustering.util.KMeansDataGenerator 500 10 0.08) But I can’t run the word count example (bin/flink run ./examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar file:'///Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/hamlet.txt' file:'///Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/count.txt’) I’m not sure whether I’m running it wrong On Feb 26, 2015, at 9:03 PM, Dulaj Viduranga vidura...@icloud.com wrote: Hi, It’s great to help out. :) Setting 127.0.0.1 instead of “localhost” in jobmanager.rpc.address helped to build the connection to the jobmanager. Apparently localhost resolving is different in the webclient and the jobmanager. I think it’s good to set jobmanager.rpc.address: 127.0.0.1 in future builds. But then I get this error when I tried to run examples. I don’t know if I should move this issue to another thread. If so please tell me.
bin/flink run /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/hamlet.txt $FLINK_DIRECTORY/count 20:46:21,998 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 02/26/2015 20:46:23 Job execution switched to status RUNNING. 02/26/2015 20:46:23 CHAIN DataSource (at getTextDataSet(WordCount.java:141) (org.apache.flink.api.java.io.TextInputFormat)) - FlatMap (FlatMap at main(WordCount.java:69)) - Combine(SUM(1), at main(WordCount.java:72)(1/1) switched to SCHEDULED 02/26/2015 20:46:23 CHAIN DataSource (at getTextDataSet(WordCount.java:141) (org.apache.flink.api.java.io.TextInputFormat)) - FlatMap (FlatMap at main(WordCount.java:69)) - Combine(SUM(1), at main(WordCount.java:72)(1/1) switched to DEPLOYING 02/26/2015 20:48:03 CHAIN DataSource (at getTextDataSet(WordCount.java:141) (org.apache.flink.api.java.io.TextInputFormat)) - FlatMap (FlatMap at main(WordCount.java:69)) - Combine(SUM(1), at main(WordCount.java:72)(1/1) switched to FAILED akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/taskmanager#-1628133761]] after [10 ms] at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333) at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117) at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694) at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691) at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467) at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419) at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423) at 
akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375) at java.lang.Thread.run(Thread.java:745) 02/26/2015 20:48:03 Job execution switched to status FAILING. 02/26/2015 20:48:03 Reduce (SUM(1), at main(WordCount.java:72)(1/1) switched to CANCELED 02/26/2015 20:48:03 DataSink(CsvOutputFormat (path: /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/count, delimiter: ))(1/1) switched to CANCELED 02/26/2015 20:48:03 Job execution switched to status FAILED. org.apache.flink.client.program.ProgramInvocationException: The program execution failed. at org.apache.flink.client.program.Client.run(Client.java:344) at org.apache.flink.client.program.Client.run(Client.java:306) at org.apache.flink.client.program.Client.run(Client.java:300) at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55) at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
[DISCUSS] Make a release to be announced at ApacheCon
Hi all! ApacheCon is coming up and it is the 15th anniversary of the Apache Software Foundation. In the course of the conference, Apache would like to make a series of announcements. If we manage to make a release during (or shortly before) ApacheCon, they will announce it through their channels. I am very much in favor of doing this, under the strong condition that we are very confident that the master has grown to be stable enough (there are major changes in the distributed runtime since version 0.8 that we are still stabilizing). No use in a widely announced build that does not have the quality. Flink now has many new features that warrant a release soon (once we have fixed the last quirks in the new distributed runtime). Notable new features are: - Gelly - Streaming windows - Flink on Tez - Expression API - Distributed runtime on Akka - Batch mode - Maybe even a first ML library version - Some streaming fault tolerance Robert proposed a feature freeze in mid-March for that. His corner points were: Feature freeze (forking off release-0.9): March 17 RC1 vote: March 24 The RC1 vote is 20 days before ApacheCon (April 13). For the last three releases, the average voting time was 20 days: R 0.8.0 -- 14 days R 0.7.0 -- 22 days R 0.6 -- 26 days Please share your opinion on this! Greetings, Stephan
Re: [DISCUSS] Make a release to be announced at ApacheCon
Hey, We have a nice list of new features - it definitely makes sense to have that as a release. On my side I really want to have a first limited version of streaming fault tolerance in it. +1 for Robert's proposal for the deadlines. I'm also volunteering for release manager. Best, Marton On Mon, Mar 2, 2015 at 2:03 PM, Stephan Ewen se...@apache.org wrote: [...]
Re: Could not build up connection to JobManager
In some places in the code, localhost is hard-coded. When it is resolved by DNS, it is possible to be directed to an IP other than 127.0.0.1 (like the private range 10.0.0.0/8). I changed those places to 127.0.0.1 and it works like a charm. But hard-coding 127.0.0.1 is not a good option, because when the jobmanager IP changes, this becomes an issue again. I'm thinking of setting the jobmanager IP from the config.yaml in these places. If you have a better idea on doing this with your experience, please let me know. Best.
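The proposed change can be sketched as follows, with plain java.util.Properties standing in for Flink's configuration object (the key name is the one used in flink-conf.yaml): look the address up in the configuration and fall back to the loopback address only when nothing is configured.

```java
import java.util.Properties;

public class JobManagerAddress {
    // Resolve the JobManager address from configuration instead of a
    // hard-coded "localhost", which DNS may resolve to a different IP.
    static String resolve(Properties conf) {
        return conf.getProperty("jobmanager.rpc.address", "127.0.0.1");
    }
}
```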
[jira] [Created] (FLINK-1622) Add ReducePartial and GroupReducePartial Operators
Aljoscha Krettek created FLINK-1622: --- Summary: Add ReducePartial and GroupReducePartial Operators Key: FLINK-1622 URL: https://issues.apache.org/jira/browse/FLINK-1622 Project: Flink Issue Type: Improvement Affects Versions: 0.9 Reporter: Aljoscha Krettek This does what a Reduce or GroupReduce Operator does, except it is only performed on a local partition. This is also similar to an explicit combine that can output a type that is different from the input. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
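The difference to a plain combine can be sketched in plain Java (names are hypothetical): a partial group-reduce runs independently on each local partition, without a shuffle, and unlike Flink's combiners at the time it may emit a type different from its input.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupReducePartial {
    // Per-partition word counting: the input type is String, the output type
    // is different (word plus count). Each partition is reduced locally;
    // no data is exchanged between partitions.
    static List<Map.Entry<String, Integer>> reducePartition(List<String> partition) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : partition) {
            counts.merge(word, 1, Integer::sum);
        }
        return new ArrayList<>(counts.entrySet());
    }
}
```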
Re: Could not build up connection to JobManager
Wow, great. Can you tell us what the issue was? On 02.03.2015 09:31, Dulaj Viduranga vidura...@icloud.com wrote: Hi, I found the fix for this issue and I'll create a pull request in the following day.
[jira] [Created] (FLINK-1621) Create a generalized combine function
Max Michels created FLINK-1621: -- Summary: Create a generalized combine function Key: FLINK-1621 URL: https://issues.apache.org/jira/browse/FLINK-1621 Project: Flink Issue Type: Improvement Components: Distributed Runtime Affects Versions: 0.9 Reporter: Max Michels Fix For: 0.9 Flink allows combiners which accept a type {{I}} and combine the values of this type into type {{O}}. In Google Dataflow, combiners are more generalized. They accept an Input {{I}}, produce an intermediate combine value of {{T}}, and finally an output {{O}}. Flink's combiners are like the {{SimpleCombineFn}} in Google Dataflow. Right now, we translate the {{KeyedCombineFn}} into a {{SortPartition}} followed by a {{MapPartition}} to emulate the Combiner's behavior. Rudimentary performance tests showed that this behavior causes a significant increase in run time compared to the proper Combine implementation. Let's implement a more generalized Combiner to create a better mapping from Google Dataflow to Flink. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
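The generalization can be illustrated with a hypothetical interface modeled on Google Dataflow's CombineFn (this is a sketch, not Flink's or Dataflow's actual API): input I, intermediate accumulator A, output O, where Flink's combiners at the time correspond to the special case A == O.

```java
import java.util.List;

public class CombineSketch {
    // Hypothetical generalized combiner: input I, intermediate accumulator A,
    // output O, in the style of Google Dataflow's CombineFn.
    interface CombineFn<I, A, O> {
        A createAccumulator();
        A addInput(A acc, I input);
        A mergeAccumulators(List<A> accs);
        O extractOutput(A acc);
    }

    // Example: averaging ints needs an intermediate (sum, count) pair that is
    // neither the input type nor the output type.
    static final CombineFn<Integer, long[], Double> AVG =
            new CombineFn<Integer, long[], Double>() {
        public long[] createAccumulator() { return new long[] {0, 0}; }
        public long[] addInput(long[] acc, Integer input) {
            acc[0] += input; acc[1]++; return acc;
        }
        public long[] mergeAccumulators(List<long[]> accs) {
            long[] merged = createAccumulator();
            for (long[] a : accs) { merged[0] += a[0]; merged[1] += a[1]; }
            return merged;
        }
        public Double extractOutput(long[] acc) { return (double) acc[0] / acc[1]; }
    };
}
```

Partial accumulators can be merged across partitions before extracting the output, which is exactly what the SortPartition-plus-MapPartition emulation cannot express efficiently.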
[jira] [Created] (FLINK-1619) Add pre-aggregator for time windows
Gyula Fora created FLINK-1619: - Summary: Add pre-aggregator for time windows Key: FLINK-1619 URL: https://issues.apache.org/jira/browse/FLINK-1619 Project: Flink Issue Type: Improvement Components: Streaming Reporter: Gyula Fora Currently there is only support for pre-aggregators for tumbling policies. A pre-aggregator should be added for time policies. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Tweets Custom Input Format
Great. Thank you. I gave some feedback in the pull request and asked some questions there. On Fri, Feb 27, 2015 at 5:43 PM, Mustafa Elbehery elbeherymust...@gmail.com wrote: @robert, I have created the PR https://github.com/apache/flink/pull/442, On Fri, Feb 27, 2015 at 11:58 AM, Mustafa Elbehery elbeherymust...@gmail.com wrote: @Robert, Thanks, I was asking about the procedure. I have opened a Jira ticket for Flink-Contrib and I will create a PR with the naming convention on Wiki, https://issues.apache.org/jira/browse/FLINK-1615, On Fri, Feb 27, 2015 at 11:55 AM, Robert Metzger rmetz...@apache.org wrote: I'm glad you've found the how-to-contribute guide. I cannot describe the process to open a pull request better than it is already written in the guide. Maybe this link is also helpful for you: https://help.github.com/articles/creating-a-pull-request/ Are you facing a particular error message? Maybe that helps me to help you better. On Fri, Feb 27, 2015 at 10:46 AM, Mustafa Elbehery elbeherymust...@gmail.com wrote: Actually, I am reading the how-to-contribute guide now to push the code. It's working and tested locally and on the cluster, and I have used it for an ETL. The structure is as follows: Java POJOs for the tweet object and the nested objects, a parser class using an event-driven approach, and the SimpleTweetInputFormat itself. Would you guide me on how to push the code, just to save some time :) On Fri, Feb 27, 2015 at 10:42 AM, Robert Metzger rmetz...@apache.org wrote: Hi, cool! Can you generalize the input format to read JSON into an arbitrary POJO? It would be great if you could contribute the InputFormat into the flink-contrib module. I've seen many users reading JSON data with Flink, so it's good to have a standard solution for that. If you want, you can add the Tweet POJO as an example into flink-contrib.
On Fri, Feb 27, 2015 at 10:37 AM, Mustafa Elbehery elbeherymust...@gmail.com wrote: Hi, I am really sorry for being so late; it was a whole month of projects and examinations, I was really busy. @Robert, it is an IF for reading tweets into a POJO. I use an event-driven parser and retrieve most of the tweet into Java POJOs; it was tested on a 1TB dataset for a Flink ETL job, and the performance was pretty good. On Sun, Jan 25, 2015 at 7:38 PM, Robert Metzger rmetz...@apache.org wrote: Hey, is it an input format for reading JSON data or an IF for reading tweets in some format into a POJO? I think a JSON input format would be something very useful for our users. Maybe you can add that and use the Tweet IF as a concrete example for it? Do you have a preview of the code somewhere? Best, Robert On Sat, Jan 24, 2015 at 11:06 AM, Fabian Hueske fhue...@gmail.com wrote: Hi Mustafa, that would be a nice contribution! We are currently discussing how to add non-core API features into Flink [1]. I will move this discussion onto the mailing list to decide where to add cool add-ons like yours. Cheers, Fabian [1] https://issues.apache.org/jira/browse/FLINK-1398 2015-01-23 20:42 GMT+01:00 Henry Saputra henry.sapu...@gmail.com : Contributions are welcome! Here is the link on how to contribute to Apache Flink: http://flink.apache.org/how-to-contribute.html You can start by creating a JIRA ticket [1] to help describe what you want to do and to get feedback from the community. - Henry [1] https://issues.apache.org/jira/secure/Dashboard.jspa On Fri, Jan 23, 2015 at 10:54 AM, Mustafa Elbehery elbeherymust...@gmail.com wrote: Hi, I have created a custom InputFormat for tweets on Flink, based on the JSON-Simple event-driven parser. I would like to contribute my work to Flink. Regards.
-- Mustafa Elbehery EIT ICT Labs Master School http://www.masterschool.eitictlabs.eu/home/ +49(0)15218676094 skype: mustafaelbehery87
Re: [DISCUSS] Offer Flink with Scala 2.11
+1 I also like it. We just have to figure out how we can publish two sets of release artifacts. On Mon, Mar 2, 2015 at 4:48 PM, Stephan Ewen se...@apache.org wrote: Big +1 from my side! Does it have to be a Maven profile, or does a Maven property work? (A profile may be needed for the quasiquotes dependency?) On Mon, Mar 2, 2015 at 4:36 PM, Alexander Alexandrov alexander.s.alexand...@gmail.com wrote: Hi there, since I'm relying on Scala 2.11.4 on a project I've been working on, I created a branch which updates the Scala version used by Flink from 2.10.4 to 2.11.4: https://github.com/stratosphere/flink/commits/scala_2.11 Everything seems to work fine and the PR contains minor changes compared to Spark: https://issues.apache.org/jira/browse/SPARK-4466 If you're interested, I can rewrite this as a Maven profile and open a PR so people can build Flink with 2.11 support. I suggest doing this sooner rather than later, in order to: * keep the number of code changes enforced by the migration small and tractable; * discourage the use of deprecated or 2.11-incompatible source code in future commits. Regards, A.
Re: Problem mvn install
Matthias! The files should not exist. Has some IDE setup copied the files into the bin directory (as part of compiling it without Maven)? It looks like you are not really building it through Maven... BTW: Does it make a difference whether you use mvn -Dmaven.test.skip=true clean install or mvn -DskipTests clean install? Stephan On Mon, Mar 2, 2015 at 3:38 PM, Fabian Hueske fhue...@gmail.com wrote: [...]
Re: [DISCUSS] Offer Flink with Scala 2.11
Spark currently only provides pre-builds for 2.10 and requires a custom build for 2.11. Not sure whether this is the best idea, but I can see the benefits from a project management point of view... Would you prefer to have a {scala_version} × {hadoop_version} matrix integrated on the website? 2015-03-02 16:57 GMT+01:00 Aljoscha Krettek aljos...@apache.org: [...]
Re: Thoughts About Object Reuse and Collection Execution
On Mon, Mar 2, 2015 at 5:17 PM, Stephan Ewen se...@apache.org wrote: There are two execution modes for the runtime: reuse and non-reuse. That makes a fair bit of sense.
Re: Problem mvn install
I guess Eclipse created those files. I deleted them manually, which resolved the problem. I had added bin to my local .gitignore, and thus git status did not list the files and I was not aware that they are not part of the repository. As far as I know, -Dmaven.test.skip=true is equal to -DskipTests. -Matthias On 03/02/2015 04:54 PM, Stephan Ewen wrote: [...]
Re: Queries regarding RDFs with Flink
Hey Santosh! RDF processing often involves either joins or graph-query-like (transitive) operations. Flink is fairly good at both types of operations. I would look into the graph examples and the graph API for a start: - Graph examples: https://github.com/apache/flink/tree/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/graph - Graph API: https://github.com/apache/flink/tree/master/flink-staging/flink-gelly/src/main/java/org/apache/flink/graph If you have a more specific question, I can give you better pointers ;-) Stephan On Fri, Feb 27, 2015 at 4:48 PM, santosh_rajaguru sani...@gmail.com wrote: Hello, how can flink be useful for processing the data to RDFs and build the ontology? Regards, Santosh -- View this message in context: http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/Queries-regarding-RDFs-with-Flink-tp4130.html Sent from the Apache Flink (Incubator) Mailing List archive at Nabble.com.
[jira] [Created] (FLINK-1625) Add cancel method to user defined sources and sinks and call them on task cancellation
Gyula Fora created FLINK-1625: - Summary: Add cancel method to user defined sources and sinks and call them on task cancellation Key: FLINK-1625 URL: https://issues.apache.org/jira/browse/FLINK-1625 Project: Flink Issue Type: Improvement Components: Streaming Reporter: Gyula Fora Currently on task cancellation the user defined functions get interrupted without notice. This can cause serious problems for functions that have established connection with the outside world, for instance message queue connectors, file sources etc. An explicit cancel() method should be added to the SourceFunction and SinkFunction interfaces so that the user would be forced to implement the cancel functionality which is necessary for the specific udf. The cancel() method in the StreamVertex should also be implemented in a way that it calls the cancel methods of the Sink and Source functions on cancellation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
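A sketch of the proposed contract (class and method names are illustrative, not the final API): the source loops on a volatile flag that cancel() clears, so it can exit cleanly and release external resources instead of being interrupted mid-call.

```java
import java.util.ArrayList;
import java.util.List;

public class CancellableSource {
    // Illustrative source with an explicit cancel() hook, as proposed:
    // the runtime calls cancel() on task cancellation and the UDF leaves
    // its emit loop cleanly, closing any external connections it holds.
    static class CountingSource {
        private volatile boolean running = true;
        final List<Long> emitted = new ArrayList<>();

        void run(long maxRecords) {
            long i = 0;
            while (running && i < maxRecords) {
                emitted.add(i++);  // emit the next record
            }
            // cleanup would go here, e.g. closing a message queue connection
        }

        void cancel() {  // invoked by the runtime on task cancellation
            running = false;
        }
    }
}
```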
Re: Thoughts About Object Reuse and Collection Execution
@Ted Here is a bit of background about how things are currently done in the Flink runtime: There are two execution modes for the runtime: reuse and non-reuse.
- The non-reuse mode will create new objects for every record received from the network, or taken out of a sort-buffer or hash-table. It has the advantage that if some user-code group operation (groupReduce) materializes the group, it simply works (dedicated objects for each element).
- The reuse mode will have one or two objects that will be reused each time a record is received from the network, or taken out of a sort-buffer or hash-table. It behaves similarly to object reuse in Hadoop and saves on garbage collection, but requires the user code to sometimes be aware of object reuse.
Flink has multiple runtime backends that can execute a program. Some do not support both modes:
- Flink's own runtime. Works in both reuse and non-reuse mode, depending on what users select.
- Java Collections (non-parallel, for testing and lightweight embedding). Tolerates object reuse in user code but does not attempt to reuse by itself.
- Tez (coming up) will in the long run also support both modes (reuse and non-reuse).
Where the internal algorithms (sorting/hashing) do not affect any user code behavior, they always work in reusing mode. Greetings, Stephan On Sat, Feb 28, 2015 at 10:33 AM, Aljoscha Krettek aljos...@apache.org wrote: @Stephan, they are not copied when object reuse is enabled. This might be a problem, though, so maybe we should just change it in that place. On Sat, Feb 28, 2015 at 7:57 AM, Ted Dunning ted.dunn...@gmail.com wrote: This is going to have profound performance implications if this is the only path for iteration. On Fri, Feb 27, 2015 at 10:58 PM, Stephan Ewen se...@apache.org wrote: I vote to have the key extractor return a new value each time. That means that objects are not reused everywhere where it is possible, but still in most places, which still helps.
What still puzzles me: I thought that the collection execution stores copies of the returned records by default (reuse-safe mode). On 27.02.2015 15:36, Aljoscha Krettek aljos...@apache.org wrote: Hello Nation of Flink, while figuring out this bug: https://issues.apache.org/jira/browse/FLINK-1569 I came upon some difficulties. The problem is that the KeyExtractorMappers always return the same tuple. This is problematic, since Collection Execution does simply store the returned values in a list. These elements are not copied before they are stored when object reuse is enabled. Therefore, the whole list will contain only that one reused element. I see two options for solving this: 1. Change KeyExtractorMappers to always return a new tuple, thereby making object-reuse mode in cluster execution useless for key extractors. 2. Change the collection execution mapper to always make copies of the returned elements. This would make object reuse in collection execution pretty much obsolete, IMHO. How should we proceed with this? Cheers, Aljoscha
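The reuse hazard discussed in this thread can be reproduced in a few lines of plain Java: if a downstream collection stores the one reused object instead of a copy, every stored element ends up carrying the last value.

```java
import java.util.ArrayList;
import java.util.List;

public class ReusePitfall {
    // Demonstrates why collection execution must copy records when the
    // upstream operator reuses a single mutable object (the FLINK-1569 bug):
    // storing the reused object directly yields a list of identical elements.
    static List<int[]> collect(boolean copyBeforeStoring) {
        int[] reused = new int[1];           // one object reused for every record
        List<int[]> sink = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            reused[0] = i;                   // "emit" record i by mutating in place
            sink.add(copyBeforeStoring ? reused.clone() : reused);
        }
        return sink;
    }
}
```

With copying disabled, all three stored references point to the same array, so the list only ever reflects the final mutation.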
Re: [DISCUSS] Offer Flink with Scala 2.11
+1 for Scala 2.11

On Mon, Mar 2, 2015 at 5:02 PM, Alexander Alexandrov alexander.s.alexand...@gmail.com wrote: Spark currently only provides pre-builds for 2.10 and requires a custom build for 2.11. Not sure whether this is the best idea, but I can see the benefits from a project management point of view... Would you prefer to have {scala_version} × {hadoop_version} builds integrated on the website?

2015-03-02 16:57 GMT+01:00 Aljoscha Krettek aljos...@apache.org: +1 I also like it. We just have to figure out how we can publish two sets of release artifacts.

On Mon, Mar 2, 2015 at 4:48 PM, Stephan Ewen se...@apache.org wrote: Big +1 from my side! Does it have to be a Maven profile, or does a Maven property work? (A profile may be needed for the quasiquotes dependency?)

On Mon, Mar 2, 2015 at 4:36 PM, Alexander Alexandrov alexander.s.alexand...@gmail.com wrote: Hi there, since I'm relying on Scala 2.11.4 in a project I've been working on, I created a branch which updates the Scala version used by Flink from 2.10.4 to 2.11.4: https://github.com/stratosphere/flink/commits/scala_2.11 Everything seems to work fine and the PR contains minor changes compared to Spark: https://issues.apache.org/jira/browse/SPARK-4466 If you're interested, I can rewrite this as a Maven profile and open a PR so people can build Flink with 2.11 support. I suggest doing this sooner rather than later in order to: * keep the number of code changes enforced by the migration small and tractable; * discourage the use of deprecated or 2.11-incompatible source code in future commits; Regards, A.
[jira] [Created] (FLINK-1626) Spurious failure in MatchTask cancelling test
Stephan Ewen created FLINK-1626: --- Summary: Spurious failure in MatchTask cancelling test Key: FLINK-1626 URL: https://issues.apache.org/jira/browse/FLINK-1626 Project: Flink Issue Type: Bug Components: Local Runtime Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.9 This is a problem in the test. Cancelling a task may actually throw an exception (especially interrupted exceptions). The test can be modified to either only call cancel and not call interrupt, or to tolerate exceptions that are follow-up exceptions of interrupted exceptions. I would prepare a patch for the first approach. The stack trace of the symptom is below: {code} java.lang.RuntimeException: Hashtable closing was interrupted at org.apache.flink.runtime.operators.hash.MutableHashTable.close(MutableHashTable.java:652) at org.apache.flink.runtime.operators.hash.ReusingBuildFirstHashMatchIterator.close(ReusingBuildFirstHashMatchIterator.java:100) at org.apache.flink.runtime.operators.MatchDriver.cleanup(MatchDriver.java:179) at org.apache.flink.runtime.operators.testutils.DriverTestBase.testDriverInternal(DriverTestBase.java:245) at org.apache.flink.runtime.operators.testutils.DriverTestBase.testDriver(DriverTestBase.java:175) at org.apache.flink.runtime.operators.MatchTaskTest$4.run(MatchTaskTest.java:783) Tests run: 47, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 15.589 sec FAILURE! - in org.apache.flink.runtime.operators.MatchTaskTest testCancelHashMatchTaskWhileBuildFirst[1](org.apache.flink.runtime.operators.MatchTaskTest) Time elapsed: 1.029 sec FAILURE! java.lang.AssertionError: Test threw an exception even though it was properly canceled. at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.flink.runtime.operators.MatchTaskTest.testCancelHashMatchTaskWhileBuildFirst(MatchTaskTest.java:802) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
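The second fix the issue mentions, tolerating follow-up exceptions of an interrupt, could be sketched as a helper that walks the cause chain. This is a hypothetical test utility, not Flink's actual test code:

```java
public class InterruptTolerance {
    // Returns true if the throwable is, or was caused by, an
    // InterruptedException, so a cancellation test can tolerate it
    // instead of failing the assertion.
    static boolean isCausedByInterrupt(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof InterruptedException) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Mirrors the symptom above: a RuntimeException wrapping an interrupt.
        RuntimeException followUp = new RuntimeException(
                "Hashtable closing was interrupted", new InterruptedException());
        System.out.println(isCausedByInterrupt(followUp));                // true
        System.out.println(isCausedByInterrupt(new RuntimeException())); // false
    }
}
```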
[jira] [Created] (FLINK-1627) Netty channel connect deadlock
Ufuk Celebi created FLINK-1627: -- Summary: Netty channel connect deadlock Key: FLINK-1627 URL: https://issues.apache.org/jira/browse/FLINK-1627 Project: Flink Issue Type: Bug Reporter: Ufuk Celebi [~StephanEwen] reports the following deadlock (https://travis-ci.org/StephanEwen/incubator-flink/jobs/52755844, logs: https://s3.amazonaws.com/flink.a.o.uce.east/travis-artifacts/StephanEwen/incubator-flink/477/477.2.tar.gz). {code} CHAIN Partition - Map (Map at testRestartMultipleTimes(SimpleRecoveryITCase.java:200)) (2/4) daemon prio=10 tid=0x7f5fdc008800 nid=0xe230 in Object.wait() [0x7f5fca8f2000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0xf2a13530 (a java.lang.Object) at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory$ConnectingChannel.waitForChannel(PartitionRequestClientFactory.java:179) - locked 0xf2a13530 (a java.lang.Object) at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory$ConnectingChannel.access$000(PartitionRequestClientFactory.java:125) at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.createPartitionRequestClient(PartitionRequestClientFactory.java:64) at org.apache.flink.runtime.io.network.netty.NettyConnectionManager.createPartitionRequestClient(NettyConnectionManager.java:53) at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestIntermediateResultPartition(RemoteInputChannel.java:92) at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.requestPartitions(SingleInputGate.java:287) - locked 0xf29dbcd8 (a java.lang.Object) at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:306) at org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:75) at org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34) at 
org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:59) at org.apache.flink.runtime.operators.NoOpDriver.run(NoOpDriver.java:91) at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496) at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362) at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:205) at java.lang.Thread.run(Thread.java:745) {code} {code} CHAIN Partition - Map (Map at testRestartMultipleTimes(SimpleRecoveryITCase.java:200)) (3/4) daemon prio=10 tid=0x7f5fdc005000 nid=0xe22f in Object.wait() [0x7f5fca9f3000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0xf2a13530 (a java.lang.Object) at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory$ConnectingChannel.waitForChannel(PartitionRequestClientFactory.java:179) - locked 0xf2a13530 (a java.lang.Object) at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory$ConnectingChannel.access$000(PartitionRequestClientFactory.java:125) at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.createPartitionRequestClient(PartitionRequestClientFactory.java:79) at org.apache.flink.runtime.io.network.netty.NettyConnectionManager.createPartitionRequestClient(NettyConnectionManager.java:53) at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestIntermediateResultPartition(RemoteInputChannel.java:92) at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.requestPartitions(SingleInputGate.java:287) - locked 0xf2896f88 (a java.lang.Object) at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:306) at org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:75) at 
org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34) at org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:59) at org.apache.flink.runtime.operators.NoOpDriver.run(NoOpDriver.java:91) at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496) at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362) at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:205) at
[jira] [Created] (FLINK-1628) Strange behavior of where function during a join
Daniel Bali created FLINK-1628: -- Summary: Strange behavior of where function during a join Key: FLINK-1628 URL: https://issues.apache.org/jira/browse/FLINK-1628 Project: Flink Issue Type: Bug Affects Versions: 0.9 Reporter: Daniel Bali Hello! If I use the `where` function with a field list during a join, it exhibits strange behavior. Here is the sample code that triggers the error: https://gist.github.com/balidani/d9789b713e559d867d5c This example joins a DataSet with itself, then counts the number of rows. If I use `.where(0, 1)` the result is (22), which is not correct. If I use `EdgeKeySelector`, I get the correct result (101). When I pass a field list to the `equalTo` function (but not `where`), everything works again. If I don't include the `groupBy` and `reduceGroup` parts, everything works. Also, when working with large DataSets, passing a field list to `where` makes it incredibly slow, even though I don't see any exceptions in the log (in DEBUG mode). Does anybody know what might cause this problem? Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
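For reference, joining on a field list like `.where(0, 1)` is expected to be semantically equivalent to matching on a composite key built from those fields, as the `EdgeKeySelector` apparently does. A plain-Java model of that expected behavior (illustrative only, not Flink code; `EdgeKey` and `selfJoinCount` are hypothetical names):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class CompositeKeyDemo {
    // Composite key over two edge fields, as .where(0, 1) should behave.
    static final class EdgeKey {
        final long src, dst;
        EdgeKey(long src, long dst) { this.src = src; this.dst = dst; }
        @Override public boolean equals(Object o) {
            return o instanceof EdgeKey
                    && ((EdgeKey) o).src == src && ((EdgeKey) o).dst == dst;
        }
        @Override public int hashCode() { return Objects.hash(src, dst); }
    }

    // A self-join on (src, dst) must pair every left occurrence of a key
    // with every right occurrence of the same key.
    static int selfJoinCount(long[][] edges) {
        Map<EdgeKey, Integer> counts = new HashMap<>();
        for (long[] e : edges) {
            counts.merge(new EdgeKey(e[0], e[1]), 1, Integer::sum);
        }
        int pairs = 0;
        for (int c : counts.values()) {
            pairs += c * c; // each left occurrence joins each right occurrence
        }
        return pairs;
    }

    public static void main(String[] args) {
        long[][] edges = {{1, 2}, {2, 3}, {1, 2}};
        System.out.println(selfJoinCount(edges)); // 5: 2*2 for (1,2), 1*1 for (2,3)
    }
}
```

Whichever key definition is used, the self-join count is fully determined by the key multiplicities, which is why the field-list and key-selector variants disagreeing (22 vs. 101) points to a bug rather than ambiguity in the API.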
[jira] [Created] (FLINK-1629) Add option to start Flink on YARN in a detached mode
Robert Metzger created FLINK-1629: - Summary: Add option to start Flink on YARN in a detached mode Key: FLINK-1629 URL: https://issues.apache.org/jira/browse/FLINK-1629 Project: Flink Issue Type: Improvement Components: YARN Client Reporter: Robert Metzger Assignee: Robert Metzger Right now, we expect the YARN command line interface to stay connected to the Application Master all the time to control the YARN session or the job. For very long-running sessions or jobs, users want to just fire and forget a job/session to YARN. Stopping the session will still be possible using YARN's tools. Also, prior to detaching itself, the CLI frontend could print the required command to kill the session as a convenience. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-1630) Add option to YARN client to re-allocate failed containers
Robert Metzger created FLINK-1630: - Summary: Add option to YARN client to re-allocate failed containers Key: FLINK-1630 URL: https://issues.apache.org/jira/browse/FLINK-1630 Project: Flink Issue Type: Improvement Components: YARN Client Affects Versions: 0.9 Reporter: Robert Metzger Assignee: Robert Metzger The current Flink YARN client tries to allocate only the initial number of containers. If containers fail (in particular during long-running sessions), there is no way of re-allocating them. We should add an option to the ApplicationMaster to re-allocate missing/failed containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [DISCUSS] Offer Flink with Scala 2.11
I'm +1 if this doesn't affect existing Scala 2.10 users. I would also suggest adding a Scala 2.11 build to Travis to ensure everything is working with the different Hadoop/JVM versions. It shouldn't be a big deal to offer scala_version x hadoop_version builds for newer releases. You only need to add more builds here: https://github.com/apache/flink/blob/master/tools/create_release_files.sh#L131

On Mon, Mar 2, 2015 at 6:17 PM, Till Rohrmann trohrm...@apache.org wrote: +1 for Scala 2.11

On Mon, Mar 2, 2015 at 5:02 PM, Alexander Alexandrov alexander.s.alexand...@gmail.com wrote: Spark currently only provides pre-builds for 2.10 and requires a custom build for 2.11. Not sure whether this is the best idea, but I can see the benefits from a project management point of view... Would you prefer to have {scala_version} × {hadoop_version} builds integrated on the website?

2015-03-02 16:57 GMT+01:00 Aljoscha Krettek aljos...@apache.org: +1 I also like it. We just have to figure out how we can publish two sets of release artifacts.

On Mon, Mar 2, 2015 at 4:48 PM, Stephan Ewen se...@apache.org wrote: Big +1 from my side! Does it have to be a Maven profile, or does a Maven property work? (A profile may be needed for the quasiquotes dependency?)

On Mon, Mar 2, 2015 at 4:36 PM, Alexander Alexandrov alexander.s.alexand...@gmail.com wrote: Hi there, since I'm relying on Scala 2.11.4 in a project I've been working on, I created a branch which updates the Scala version used by Flink from 2.10.4 to 2.11.4: https://github.com/stratosphere/flink/commits/scala_2.11 Everything seems to work fine and the PR contains minor changes compared to Spark: https://issues.apache.org/jira/browse/SPARK-4466 If you're interested, I can rewrite this as a Maven profile and open a PR so people can build Flink with 2.11 support.
I suggest doing this sooner rather than later in order to: * keep the number of code changes enforced by the migration small and tractable; * discourage the use of deprecated or 2.11-incompatible source code in future commits; Regards, A.
February 2015 in the Flink community
Hi everyone, February might be the shortest month of the year, but the community has been pretty busy: - Flink 0.8.1, a bugfix release, has been made available - The project added a new committer - Flink contributors developed a Flink adapter for Apache SAMOA - Flink committers contributed to Google's bdutil; starting from release 1.2, users of bdutil can deploy Flink clusters on Google Cloud Platform - Flink was mentioned in several articles online, and many large features have been merged to the master repository. You can read the full blog post here: http://flink.apache.org/news/2015/03/02/february-2015-in-flink.html
[jira] [Created] (FLINK-1631) Port collisions in ProcessReaping tests
Stephan Ewen created FLINK-1631: --- Summary: Port collisions in ProcessReaping tests Key: FLINK-1631 URL: https://issues.apache.org/jira/browse/FLINK-1631 Project: Flink Issue Type: Bug Components: Distributed Runtime Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.9 The process reaping tests for the JobManager spawn a process that starts a webserver on the default port. It may happen that this port is not available, due to another concurrently running task. I suggest adding an option to not start the webserver, by setting the webserver port to {{-1}}, to prevent this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
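Besides disabling the webserver via port {{-1}}, a common alternative for avoiding port collisions in tests is to ask the OS for an ephemeral port first and hand that port to the component under test. A generic sketch of that technique, not the actual Flink test code:

```java
import java.io.IOException;
import java.net.ServerSocket;

public class FreePortFinder {
    // Binds to port 0 so the OS assigns an unused port, then releases it.
    // There is a small race window before the caller rebinds the port,
    // but collisions between concurrently running suites become unlikely.
    static int findFreePort() throws IOException {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        int port = findFreePort();
        // A valid, currently unused TCP port in the ephemeral range.
        System.out.println(port > 0 && port <= 65535);
    }
}
```

The disable-the-webserver option proposed in the issue is simpler and race-free, which is likely why it was preferred here.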