[jira] [Resolved] (FLINK-1385) Add option to YARN client to disable resource availability checks

2015-01-23 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger resolved FLINK-1385.
---
   Resolution: Fixed
Fix Version/s: 0.9

Resolved in http://git-wip-us.apache.org/repos/asf/flink/commit/8f8efe2f

 Add option to YARN client to disable resource availability checks
 -

 Key: FLINK-1385
 URL: https://issues.apache.org/jira/browse/FLINK-1385
 Project: Flink
  Issue Type: Bug
  Components: YARN Client
Affects Versions: 0.8, 0.9
Reporter: Robert Metzger
Assignee: Robert Metzger
 Fix For: 0.9


 The YARN client checks which resources are available in the cluster at the 
 time of the initial deployment.
 If somebody wants to deploy a Flink YARN session to a full YARN cluster, the 
 Flink Client will reject the session.
 I'm going to add an option which disables the check. The YARN session will 
 then allocate new containers as they become available in the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1421) Implement a SAMOA Adapter for Flink Streaming

2015-01-23 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289486#comment-14289486
 ] 

Gianmarco De Francisci Morales commented on FLINK-1421:
---

Indeed, having some form of feedback between Processors is quite common in many 
algorithms, unless they are embarrassingly parallel (e.g., ensemble methods 
such as Bagging).
We are also trying to simplify the API and remove a lot of legacy code. 
Hopefully, you'll find it easier to test new datasets from real streams very 
soon.

 Implement a SAMOA Adapter for Flink Streaming
 -

 Key: FLINK-1421
 URL: https://issues.apache.org/jira/browse/FLINK-1421
 Project: Flink
  Issue Type: New Feature
  Components: Streaming
Reporter: Paris Carbone
Assignee: Paris Carbone
   Original Estimate: 336h
  Remaining Estimate: 336h

 Yahoo's Samoa is an experimental incremental machine learning library that 
 builds on an abstract compositional data streaming model to write streaming 
 algorithms. The task is to provide an adapter from SAMOA topologies to 
 Flink-streaming job graphs in order to support Flink as a backend engine for 
 SAMOA tasks.
 A startup guide can be viewed here:
 https://docs.google.com/document/d/18glDJDYmnFGT1UGtZimaxZpGeeg1Ch14NgDoymhPk2A/pub
 The main working branch of the adapter :
 https://github.com/senorcarbone/samoa/tree/flink





[jira] [Commented] (FLINK-1297) Add support for tracking statistics of intermediate results

2015-01-23 Thread Fridtjof Sander (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289511#comment-14289511
 ] 

Fridtjof Sander commented on FLINK-1297:


I started to prototype the design for this and pushed my first early result 
into the FLINK-1297 branch in our fork:
https://github.com/stratosphere/flink/tree/FLINK-1297

 Add support for tracking statistics of intermediate results
 ---

 Key: FLINK-1297
 URL: https://issues.apache.org/jira/browse/FLINK-1297
 Project: Flink
  Issue Type: Improvement
  Components: Distributed Runtime
Reporter: Alexander Alexandrov
Assignee: Alexander Alexandrov
 Fix For: 0.9

   Original Estimate: 1,008h
  Remaining Estimate: 1,008h

 One of the major problems related to the optimizer at the moment is the lack 
 of proper statistics.
 With the introduction of staged execution, it is possible to instrument the 
 runtime code with a statistics facility that collects the required 
 information for optimizing the next execution stage.
 I would therefore like to contribute code that can be used to gather basic 
 statistics for the (intermediate) result of dataflows (e.g. min, max, count, 
 count distinct) and make them available to the job manager.
 Before I start, I would like to hear some feedback from the other users.
 In particular, to handle skew (e.g. on grouping) it might be good to have 
 some sort of detailed sketch about the key distribution of an intermediate 
 result. I am not sure whether a simple histogram is the most effective way to 
 go. Maybe somebody would propose another lightweight sketch that provides 
 better accuracy.
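The key-distribution question above is exactly the territory of streaming sketches. As a rough, hypothetical illustration (not part of the FLINK-1297 prototype; all names here are made up), a count-min sketch bounds per-key frequency estimates in fixed space:

```java
import java.util.Random;

// Illustrative count-min sketch: estimates key frequencies of an
// intermediate result in fixed space. Hypothetical sketch, not the
// FLINK-1297 prototype code.
public class CountMinSketch {
    private final long[][] counts;
    private final int[] seeds;
    private final int width;

    public CountMinSketch(int depth, int width) {
        this.counts = new long[depth][width];
        this.seeds = new int[depth];
        this.width = width;
        Random rnd = new Random(42);
        for (int i = 0; i < depth; i++) {
            seeds[i] = rnd.nextInt();
        }
    }

    // One bucket per row, derived from the key's hash and a per-row seed.
    private int bucket(int row, Object key) {
        int h = key.hashCode() ^ seeds[row];
        h ^= (h >>> 16);
        return Math.floorMod(h, width);
    }

    public void add(Object key) {
        for (int i = 0; i < counts.length; i++) {
            counts[i][bucket(i, key)]++;
        }
    }

    // Over-estimates only: the minimum across rows bounds the true count.
    public long estimate(Object key) {
        long min = Long.MAX_VALUE;
        for (int i = 0; i < counts.length; i++) {
            min = Math.min(min, counts[i][bucket(i, key)]);
        }
        return min;
    }

    public static void main(String[] args) {
        CountMinSketch sketch = new CountMinSketch(4, 1024);
        for (int i = 0; i < 100; i++) sketch.add("hot-key");
        sketch.add("rare-key");
        System.out.println(sketch.estimate("hot-key"));  // >= 100 (never undercounts)
    }
}
```

Since the estimate never undercounts, heavily skewed keys stand out even with a small sketch; the accuracy/space trade-off is tunable via depth and width.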





[jira] [Resolved] (FLINK-1295) Add option to Flink client to start a YARN session per job

2015-01-23 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger resolved FLINK-1295.
---
   Resolution: Fixed
Fix Version/s: 0.9

Resolved in http://git-wip-us.apache.org/repos/asf/flink/commit/8f8efe2f

 Add option to Flink client to start a YARN session per job
 --

 Key: FLINK-1295
 URL: https://issues.apache.org/jira/browse/FLINK-1295
 Project: Flink
  Issue Type: New Feature
  Components: YARN Client
Reporter: Robert Metzger
Assignee: Robert Metzger
 Fix For: 0.9


 Currently, Flink users can only launch Flink on YARN as a YARN session 
 (meaning a long-running YARN application that can run multiple Flink jobs).
 Users have requested to extend the Flink Client to allocate YARN containers 
 only for executing a single job.
 As part of this pull request, I would suggest refactoring the YARN Client to 
 make it more modular and object-oriented.





[jira] [Commented] (FLINK-1201) Graph API for Flink

2015-01-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289564#comment-14289564
 ] 

ASF GitHub Bot commented on FLINK-1201:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/326#issuecomment-71230743
  
Ah, sorry, I forgot that, let me do that in a few hours. I am sort of 
having fun with git right now :-)

FYI: What I basically used is the git filter-branch command, using a tree 
filter and a `mv src/* flink-addons/flink-gelly/src` filter command. Some 
commits before and after to make sure the directories exist and then, after the 
relocation, removing them from the history.
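A sketch of the history rewrite described above, demonstrated in a throwaway repository (the exact flags and file names are assumptions; the comment only names the general `git filter-branch` + tree-filter approach):

```shell
# Demo in a throwaway repo: rewrite every commit so src/ moves under
# flink-addons/flink-gelly/src via git filter-branch --tree-filter.
# Flags and paths are illustrative, not the exact commands used.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.email dev@example.com
git config user.name dev
mkdir src && echo 'class Graph' > src/Graph.scala
git add -A && git commit -qm 'initial layout'

# The actual history rewrite: runs the filter against every commit.
git filter-branch --tree-filter '
  if [ -d src ]; then
    mkdir -p flink-addons/flink-gelly
    mv src flink-addons/flink-gelly/src
  fi
' -- --all

git ls-tree -r --name-only HEAD   # src/ content now lives under flink-addons/flink-gelly/src
```

A tree-filter checks out each commit into a scratch directory and re-commits whatever the filter leaves behind, which is why a plain `mv` suffices.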



 Graph API for Flink 
 

 Key: FLINK-1201
 URL: https://issues.apache.org/jira/browse/FLINK-1201
 Project: Flink
  Issue Type: New Feature
Reporter: Kostas Tzoumas
Assignee: Vasia Kalavri

 This issue tracks the development of a Graph API/DSL for Flink.
 Until the code is pushed to the Flink repository, collaboration is happening 
 here: https://github.com/project-flink/flink-graph





[jira] [Commented] (FLINK-1381) Only one output splitter supported per operator

2015-01-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289834#comment-14289834
 ] 

ASF GitHub Bot commented on FLINK-1381:
---

GitHub user qmlmoon opened a pull request:

https://github.com/apache/flink/pull/332

[FLINK-1381] Allow multiple output splitters for single stream operator



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/qmlmoon/incubator-flink stream-split

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/332.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #332


commit cff3eb0747f17185c4d71cf5117696d28f879964
Author: mingliang qmlm...@gmail.com
Date:   2015-01-23T09:59:02Z

[FLINK-1381] Allow multiple output splitters for single stream operator




 Only one output splitter supported per operator
 ---

 Key: FLINK-1381
 URL: https://issues.apache.org/jira/browse/FLINK-1381
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Affects Versions: 0.9
Reporter: Gyula Fora
Priority: Minor

 Currently the streaming api only supports output splitting once per operator. 
 The splitting logic should be reworked to allow any number of splitters.





[jira] [Commented] (FLINK-629) Add support for null values to the java api

2015-01-23 Thread mustafa elbehery (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289753#comment-14289753
 ] 

mustafa elbehery commented on FLINK-629:


[~rmetzger] I need this fix in the release-0.8 Maven repo. I have cherry-picked 
the fix into the 0.8 branch; shall I push it so you can merge it?

 Add support for null values to the java api
 ---

 Key: FLINK-629
 URL: https://issues.apache.org/jira/browse/FLINK-629
 Project: Flink
  Issue Type: Improvement
  Components: Java API
Reporter: Stephan Ewen
Assignee: Gyula Fora
Priority: Critical
  Labels: github-import
 Fix For: pre-apache

 Attachments: Selection_006.png, SimpleTweetInputFormat.java, 
 Tweet.java, model.tar.gz


 Currently, many runtime operations fail when encountering a null value. Tuple 
 serialization should allow null fields.
 I suggest to add a method to the tuples called `getFieldNotNull()` which 
 throws a meaningful exception when the accessed field is null. That way, we 
 simplify the logic of operators that should not deal with null fields, like 
 key grouping or aggregations.
 Even though SQL allows grouping and aggregating of null values, I suggest to 
 exclude this from the java api, because the SQL semantics of aggregating null 
 fields are messy.
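A minimal sketch of how the proposed accessor could look on a simplified two-field tuple (hypothetical; the real Flink Tuple classes differ):

```java
// Hypothetical sketch of the proposed getFieldNotNull() accessor on a
// simplified tuple; not the actual Flink Tuple implementation.
public class Tuple2Sketch<T0, T1> {
    public T0 f0;
    public T1 f1;

    public Tuple2Sketch(T0 f0, T1 f1) {
        this.f0 = f0;
        this.f1 = f1;
    }

    // Returns the field, or fails loudly where nulls are not permitted
    // (e.g. key grouping, aggregations).
    @SuppressWarnings("unchecked")
    public <T> T getFieldNotNull(int pos) {
        Object value;
        switch (pos) {
            case 0: value = f0; break;
            case 1: value = f1; break;
            default: throw new IndexOutOfBoundsException("No field " + pos);
        }
        if (value == null) {
            throw new NullPointerException(
                "Field " + pos + " is null, but null values are not allowed here.");
        }
        return (T) value;
    }

    public static void main(String[] args) {
        Tuple2Sketch<String, Integer> t = new Tuple2Sketch<>("flink", null);
        String key = t.getFieldNotNull(0);      // fine
        System.out.println(key);
        try {
            Integer v = t.getFieldNotNull(1);   // null field: meaningful error
        } catch (NullPointerException e) {
            System.out.println(e.getMessage());
        }
    }
}
```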
  Imported from GitHub 
 Url: https://github.com/stratosphere/stratosphere/issues/629
 Created by: [StephanEwen|https://github.com/StephanEwen]
 Labels: enhancement, java api, 
 Milestone: Release 0.5.1
 Created at: Wed Mar 26 00:27:49 CET 2014
 State: open





[jira] [Commented] (FLINK-1342) Quickstart's assembly can possibly filter out user's code

2015-01-23 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289913#comment-14289913
 ] 

Fabian Hueske commented on FLINK-1342:
--

This is also true for all dependencies which are from the o.a.f namespace.
For example flink-hadoop-compatibility is not added to ./lib and not in Flink's 
class path by default.
Programs which depend on the hadoop-compat module and which are built with the 
quick start Maven config won't run unless the hadoop-compat jar is manually 
added to ./lib.

I'd suggest filtering only those dependencies which are present in ./lib, even 
though this might end up in a rather bulky blacklist. 
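A hypothetical shade-plugin fragment along these lines, blacklisting only artifacts known to be in ./lib (the module list is illustrative, not the actual quickstart pom):

```xml
<!-- Illustrative sketch: exclude only the Flink artifacts shipped in ./lib,
     instead of the whole org.apache.flink namespace. Module names are an
     assumption, not the real quickstart configuration. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <artifactSet>
      <excludes>
        <exclude>org.apache.flink:flink-core</exclude>
        <exclude>org.apache.flink:flink-java</exclude>
        <exclude>org.apache.flink:flink-runtime</exclude>
        <exclude>org.apache.flink:flink-clients</exclude>
        <!-- flink-hadoop-compatibility is deliberately NOT excluded,
             since it is not on Flink's class path by default -->
      </excludes>
    </artifactSet>
  </configuration>
</plugin>
```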

 Quickstart's assembly can possibly filter out user's code
 -

 Key: FLINK-1342
 URL: https://issues.apache.org/jira/browse/FLINK-1342
 Project: Flink
  Issue Type: Bug
Affects Versions: 0.9
Reporter: Márton Balassi
 Fix For: 0.9, 0.8.1


 I've added a quick solution for [1] for the time being. The assembly still 
 filters out everything from the org.apache.flink namespace, so any user code 
 placed there will be missing from the fat jar.
 If we do not use filtering at all, the size of the jar goes up to almost 100 
 MB.
 [1] https://issues.apache.org/jira/browse/FLINK-1225 





[GitHub] flink pull request: [FLINK-1201] Add flink-gelly to flink-addons

2015-01-23 Thread cebe
Github user cebe commented on the pull request:

https://github.com/apache/flink/pull/326#issuecomment-71268972
  
that really sounds like great fun :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1352] [runtime] Fix buggy registration ...

2015-01-23 Thread hsaputra
Github user hsaputra commented on a diff in the pull request:

https://github.com/apache/flink/pull/328#discussion_r23485615
  
--- Diff: 
flink-runtime/src/main/scala/org/apache/flink/runtime/taskmanager/TaskManager.scala
 ---
@@ -175,62 +175,79 @@ import scala.collection.JavaConverters._
   }
 
   private def tryJobManagerRegistration(): Unit = {
-    registrationAttempts = 0
-    import context.dispatcher
-    registrationScheduler = Some(context.system.scheduler.schedule(
-      TaskManager.REGISTRATION_DELAY, TaskManager.REGISTRATION_INTERVAL,
-      self, RegisterAtJobManager))
+    registrationDuration = 0 seconds
+
+    registered = false
+
+    context.system.scheduler.scheduleOnce(registrationDelay, self, RegisterAtJobManager)
   }
 
   override def receiveWithLogMessages: Receive = {
     case RegisterAtJobManager => {
-      registrationAttempts += 1
+      if(!registered) {
+        registrationDuration += registrationDelay
+        // double delay for exponential backoff
+        registrationDelay *= 2
 
-      if (registered) {
-        registrationScheduler.foreach(_.cancel())
-      }
-      else if (registrationAttempts <= TaskManager.MAX_REGISTRATION_ATTEMPTS) {
+        if (registrationDuration > maxRegistrationDuration) {
+          log.warning("TaskManager could not register at JobManager {} after {}.", jobManagerAkkaURL,
+            maxRegistrationDuration)
 
-        log.info("Try to register at master {}. Attempt #{}", jobManagerAkkaURL,
-          registrationAttempts)
-        val jobManager = context.actorSelection(jobManagerAkkaURL)
-
-        jobManager ! RegisterTaskManager(connectionInfo, hardwareDescription, numberOfSlots)
-      }
-      else {
-        log.error("TaskManager could not register at JobManager.");
-        self ! PoisonPill
+          self ! PoisonPill
+        } else if (!registered) {
+          log.info(s"Try to register at master ${jobManagerAkkaURL}. ${registrationAttempts}. " +
+            s"Attempt")
+          val jobManager = context.actorSelection(jobManagerAkkaURL)
+
+          jobManager ! RegisterTaskManager(connectionInfo, hardwareDescription, numberOfSlots)
+
+          context.system.scheduler.scheduleOnce(registrationDelay, self, RegisterAtJobManager)
+        }
       }
     }
 
     case AcknowledgeRegistration(id, blobPort) => {
-      if (!registered) {
+      if(!registered) {
+        finishRegistration(id, blobPort)
         registered = true
-        currentJobManager = sender
-        instanceID = id
-
-        context.watch(currentJobManager)
-
-        log.info("TaskManager successfully registered at JobManager {}.",
-          currentJobManager.path.toString)
-
-        setupNetworkEnvironment()
-        setupLibraryCacheManager(blobPort)
+      } else {
+        if (log.isDebugEnabled) {
+          log.debug("The TaskManager {} is already registered at the JobManager {}, but received " +
+            "another AcknowledgeRegistration message.", self.path, currentJobManager.path)
+        }
+      }
+    }
 
-        heartbeatScheduler = Some(context.system.scheduler.schedule(
-          TaskManager.HEARTBEAT_INTERVAL, TaskManager.HEARTBEAT_INTERVAL, self, SendHeartbeat))
+    case AlreadyRegistered(id, blobPort) =>
+      if(!registered) {
+        log.warning("The TaskManager {} seems to be already registered at the JobManager {} even " +
+          "though it has not yet finished the registration process.", self.path, sender.path)
 
-        profiler foreach {
-          _.tell(RegisterProfilingListener, JobManager.getProfiler(currentJobManager))
+        finishRegistration(id, blobPort)
+        registered = true
+      } else {
+        // ignore AlreadyRegistered messages which arrived after AcknowledgeRegistration
+        if(log.isDebugEnabled){
+          log.debug("The TaskManager {} has already been registered at the JobManager {}.",
+            self.path, sender.path)
         }
+      }
 
-        for (listener <- waitForRegistration) {
-          listener ! RegisteredAtJobManager
-        }
+    case RefuseRegistration(reason) =>
+      if(!registered) {
+        log.error("The registration of task manager {} was refused by the job manager {} " +
+          "because {}.", self.path, jobManagerAkkaURL, reason)
 
-        waitForRegistration.clear()
+        // Shut task manager down
+        self ! PoisonPill
+      } else {
+        // ignore RefuseRegistration messages which arrived after AcknowledgeRegistration
+        if(log.isDebugEnabled) {
--- End diff --

This is 

[jira] [Resolved] (FLINK-1406) Windows compatibility

2015-01-23 Thread Fabian Hueske (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian Hueske resolved FLINK-1406.
--
   Resolution: Fixed
Fix Version/s: 0.9

Fixed in 96585f149c8d100f3d8fce2bb737743695f20c12

 Windows compatibility
 -

 Key: FLINK-1406
 URL: https://issues.apache.org/jira/browse/FLINK-1406
 Project: Flink
  Issue Type: Improvement
Reporter: Max Michels
Priority: Minor
 Fix For: 0.9

 Attachments: flink_1406.patch


 The documentation [1] states: "Flink runs on all UNIX-like environments: 
 Linux, Mac OS X, Cygwin. The only requirement is to have a working Java 6.x 
 (or higher) installation."
 I just found out that Flink also runs natively on Windows. Do we want to support 
 Windows? If so, we should update the documentation. Clearly, we don't support 
 it at the moment for development. At multiple places, the tests contain 
 references to /tmp or /dev/random/ which are not Windows compatible.
 Probably it's enough to update the documentation, stating that Flink runs on 
 Windows but cannot be built or developed on Windows without Cygwin.
 [1] http://flink.apache.org/docs/0.7-incubating/setup_quickstart.html





[jira] [Updated] (FLINK-1344) Add implicit conversion from scala streams to DataStreams

2015-01-23 Thread Paris Carbone (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paris Carbone updated FLINK-1344:
-
Description: 
Source definitions in the scala-api pass a collector to the UDF, thus enforcing 
an imperative style for defining custom streams. To encourage a purely 
functional coding style in the streaming scala-api while also adding some 
interoperability with scala constructs (ie. Seq and Streams) it would be nice 
to add an implicit conversion from Seq[T] to DataStream[T]. 

(An upcoming idea would be for sinks to also support wrapping up flink streams 
to scala streams for full interoperability with scala streaming code.)

  was:
Source definitions in the scala-api pass a collector to the UDF, thus enforcing 
an imperative style for defining custom streams. In order to maintain a purely 
functional coding style in the streaming scala-api while also adding some 
interoperability with scala constructs (ie. Seq and Streams) 

[UPDATE] Since there is already a fromCollection(Seq) function we can simply 
add an implicit conversion from Seq[T] to DataStream[T]. 

(An additional idea would be for sinks could also support wrapping up flink 
streams to scala streams for full interoperability with scala streaming code.)


 Add implicit conversion from scala streams to DataStreams
 -

 Key: FLINK-1344
 URL: https://issues.apache.org/jira/browse/FLINK-1344
 Project: Flink
  Issue Type: New Feature
  Components: Streaming
Reporter: Paris Carbone
Assignee: Paris Carbone
Priority: Trivial

 Source definitions in the scala-api pass a collector to the UDF, thus 
 enforcing an imperative style for defining custom streams. To encourage a 
 purely functional coding style in the streaming scala-api while also adding 
 some interoperability with scala constructs (ie. Seq and Streams) it would be 
 nice to add an implicit conversion from Seq[T] to DataStream[T]. 
 (An upcoming idea would be for sinks to also support wrapping up flink 
 streams to scala streams for full interoperability with scala streaming code.)





[GitHub] flink pull request: [FLINK-1352] [runtime] Fix buggy registration ...

2015-01-23 Thread hsaputra
Github user hsaputra commented on a diff in the pull request:

https://github.com/apache/flink/pull/328#discussion_r23485570
  
--- Diff: 
flink-runtime/src/main/scala/org/apache/flink/runtime/taskmanager/TaskManager.scala
 ---
@@ -175,62 +175,79 @@ import scala.collection.JavaConverters._
   }
 
   private def tryJobManagerRegistration(): Unit = {
-    registrationAttempts = 0
-    import context.dispatcher
-    registrationScheduler = Some(context.system.scheduler.schedule(
-      TaskManager.REGISTRATION_DELAY, TaskManager.REGISTRATION_INTERVAL,
-      self, RegisterAtJobManager))
+    registrationDuration = 0 seconds
+
+    registered = false
+
+    context.system.scheduler.scheduleOnce(registrationDelay, self, RegisterAtJobManager)
   }
 
   override def receiveWithLogMessages: Receive = {
     case RegisterAtJobManager => {
-      registrationAttempts += 1
+      if(!registered) {
+        registrationDuration += registrationDelay
+        // double delay for exponential backoff
+        registrationDelay *= 2
 
-      if (registered) {
-        registrationScheduler.foreach(_.cancel())
-      }
-      else if (registrationAttempts <= TaskManager.MAX_REGISTRATION_ATTEMPTS) {
+        if (registrationDuration > maxRegistrationDuration) {
+          log.warning("TaskManager could not register at JobManager {} after {}.", jobManagerAkkaURL,
+            maxRegistrationDuration)
 
-        log.info("Try to register at master {}. Attempt #{}", jobManagerAkkaURL,
-          registrationAttempts)
-        val jobManager = context.actorSelection(jobManagerAkkaURL)
-
-        jobManager ! RegisterTaskManager(connectionInfo, hardwareDescription, numberOfSlots)
-      }
-      else {
-        log.error("TaskManager could not register at JobManager.");
-        self ! PoisonPill
+          self ! PoisonPill
+        } else if (!registered) {
+          log.info(s"Try to register at master ${jobManagerAkkaURL}. ${registrationAttempts}. " +
+            s"Attempt")
+          val jobManager = context.actorSelection(jobManagerAkkaURL)
+
+          jobManager ! RegisterTaskManager(connectionInfo, hardwareDescription, numberOfSlots)
+
+          context.system.scheduler.scheduleOnce(registrationDelay, self, RegisterAtJobManager)
+        }
       }
     }
 
     case AcknowledgeRegistration(id, blobPort) => {
-      if (!registered) {
+      if(!registered) {
+        finishRegistration(id, blobPort)
         registered = true
-        currentJobManager = sender
-        instanceID = id
-
-        context.watch(currentJobManager)
-
-        log.info("TaskManager successfully registered at JobManager {}.",
-          currentJobManager.path.toString)
-
-        setupNetworkEnvironment()
-        setupLibraryCacheManager(blobPort)
+      } else {
+        if (log.isDebugEnabled) {
--- End diff --

Small nit: with slf4j formatting we do not need to check isDebugEnabled 
anymore, because parameterized messages check the log level before 
materializing the string. 
It will keep the code cleaner =)
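To illustrate the point with a stdlib-only mock (slf4j itself is not assumed on the classpath here; all names are made up): parameterized logging defers both the `{}` substitution and the argument's `toString()` until the level check passes, so a separate guard is redundant for a single log call.

```java
// Stdlib-only illustration of slf4j-style parameterized logging.
// When the level is disabled, neither the format string nor the
// argument's toString() is materialized, which is why wrapping a
// single debug() call in isDebugEnabled() adds nothing.
public class LazyLogDemo {
    public static boolean debugEnabled = false;
    public static int toStringCalls = 0;

    public static void debug(String format, Object arg) {
        if (!debugEnabled) {
            return;  // format and arg.toString() are never touched
        }
        System.out.println(format.replace("{}", String.valueOf(arg)));
    }

    public static void main(String[] args) {
        Object costly = new Object() {
            @Override
            public String toString() {
                toStringCalls++;        // stand-in for an expensive render
                return "costly";
            }
        };
        debug("value is {}", costly);   // debug disabled: no toString()
        System.out.println(toStringCalls);
    }
}
```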


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (FLINK-1344) Add implicit conversion from scala streams to DataStreams

2015-01-23 Thread Paris Carbone (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paris Carbone updated FLINK-1344:
-
Summary: Add implicit conversion from scala streams to DataStreams  (was: 
Add implicit conversions from scala streams to DataStreams)

 Add implicit conversion from scala streams to DataStreams
 -

 Key: FLINK-1344
 URL: https://issues.apache.org/jira/browse/FLINK-1344
 Project: Flink
  Issue Type: New Feature
  Components: Streaming
Reporter: Paris Carbone
Assignee: Paris Carbone
Priority: Trivial

 Source definitions in the scala-api pass a collector to the UDF, thus 
 enforcing an imperative style for defining custom streams. In order to maintain 
 a purely functional coding style in the streaming scala-api while also adding 
 some interoperability with scala constructs (ie. Seq and Streams) 
 [UPDATE] Since there is already a fromCollection(Seq) function we can simply 
 add an implicit conversion from Seq[T] to DataStream[T]. 
 (An additional idea would be for sinks could also support wrapping up flink 
 streams to scala streams for full interoperability with scala streaming code.)





[jira] [Created] (FLINK-1438) ClassCastException for Custom InputSplit in local mode

2015-01-23 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-1438:


 Summary: ClassCastException for Custom InputSplit in local mode
 Key: FLINK-1438
 URL: https://issues.apache.org/jira/browse/FLINK-1438
 Project: Flink
  Issue Type: Bug
  Components: JobManager
Affects Versions: 0.8
Reporter: Fabian Hueske
Priority: Minor


Jobs with custom InputSplits fail with a ClassCastException such as 
{{org.apache.flink.examples.java.misc.CustomSplitTestJob$TestFileInputSplit 
cannot be cast to 
org.apache.flink.examples.java.misc.CustomSplitTestJob$TestFileInputSplit}} if 
executed on a local setup. 

This issue is probably related to different ClassLoaders used by the JobManager 
when InputSplits are generated and when they are handed to the InputFormat by 
the TaskManager. Moving the class of the custom InputSplit into the {{./lib}} 
 folder and removing it from the job's jar makes the job work.

To reproduce the bug, run the following job on a local setup. 

{code}
public class CustomSplitTestJob {

public static void main(String[] args) throws Exception {

ExecutionEnvironment env = 
ExecutionEnvironment.getExecutionEnvironment();

DataSet<String> x = env.createInput(new TestFileInputFormat());
x.print();

env.execute();
}

public static class TestFileInputFormat implements 
InputFormat<String, TestFileInputSplit> {

@Override
public void configure(Configuration parameters) {

}

@Override
public BaseStatistics getStatistics(BaseStatistics 
cachedStatistics) throws IOException {
return null;
}

@Override
public TestFileInputSplit[] createInputSplits(int minNumSplits) 
throws IOException {
return new TestFileInputSplit[]{new 
TestFileInputSplit()};
}

@Override
public InputSplitAssigner 
getInputSplitAssigner(TestFileInputSplit[] inputSplits) {
return new LocatableInputSplitAssigner(inputSplits);
}

@Override
public void open(TestFileInputSplit split) throws IOException {

}

@Override
public boolean reachedEnd() throws IOException {
return false;
}

@Override
public String nextRecord(String reuse) throws IOException {
return null;
}

@Override
public void close() throws IOException {

}
}

public static class TestFileInputSplit extends FileInputSplit {

}

}
{code}





[GitHub] flink pull request: [FLINK-1168] Adds multi-char field delimiter s...

2015-01-23 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/264#issuecomment-71285644
  
That's a good point. I copied the code from the DelimitedInputFormat, which 
allows specifying the charset for the record delimiter. 
So, if we go with
1. we should also fix the DelimitedIF to UTF-8;
2. we need to extend StringValue, which does not support charsets.

1) would be the easy fix; charset support could be added later. Any 
objections to going for 1)?
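As a standalone sketch of option 1 (fixing the charset to UTF-8; this is not the actual CsvInputFormat code, and the helper names are made up), a String delimiter then maps to one well-defined byte sequence the input format can scan for:

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: with the charset fixed to UTF-8 (option 1), a
// multi-char String delimiter becomes one unambiguous byte sequence
// that a byte-oriented input format can search for. Not Flink code.
public class DelimiterDemo {
    // Naive byte-level search for the delimiter; returns the first
    // index of `needle` in `haystack`, or -1 if absent.
    public static int indexOf(byte[] haystack, byte[] needle) {
        outer:
        for (int i = 0; i <= haystack.length - needle.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) continue outer;
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] record = "a||b".getBytes(StandardCharsets.UTF_8);
        byte[] delim = "||".getBytes(StandardCharsets.UTF_8);
        System.out.println(indexOf(record, delim));  // 1
    }
}
```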



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1168) Support multi-character field delimiters in CSVInputFormats

2015-01-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290218#comment-14290218
 ] 

ASF GitHub Bot commented on FLINK-1168:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/264#issuecomment-71285644
  
That's a good point. I copied the code from the DelimitedInputFormat, which 
allows specifying the charset for the record delimiter. 
So, if we go with
1. we should also fix the DelimitedIF to UTF-8;
2. we need to extend StringValue, which does not support charsets.

1) would be the easy fix; charset support could be added later. Any 
objections to going for 1)?



 Support multi-character field delimiters in CSVInputFormats
 ---

 Key: FLINK-1168
 URL: https://issues.apache.org/jira/browse/FLINK-1168
 Project: Flink
  Issue Type: Improvement
Affects Versions: 0.7.0-incubating
Reporter: Fabian Hueske
Assignee: Manu Kaul
Priority: Minor
  Labels: starter

 The CSVInputFormat supports multi-char (String) line delimiters, but only 
 single-char (char) field delimiters.
 This issue proposes to add support for multi-char field delimiters.





[jira] [Created] (FLINK-1440) Missing plan visualizer image for http://flink.apache.org/docs/0.8/programming_guide.html page

2015-01-23 Thread Henry Saputra (JIRA)
Henry Saputra created FLINK-1440:


 Summary: Missing plan visualizer image for 
http://flink.apache.org/docs/0.8/programming_guide.html page
 Key: FLINK-1440
 URL: https://issues.apache.org/jira/browse/FLINK-1440
 Project: Flink
  Issue Type: Bug
  Components: Project Website
Reporter: Henry Saputra


In the http://flink.apache.org/docs/0.8/programming_guide.html page, it looks 
like we are missing the http://flink.apache.org/docs/0.8/img/plan_visualizer2.png file.

I could not find it anywhere in the source.





[GitHub] flink pull request: [FLINK-1353] Fixes the Execution to use the co...

2015-01-23 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/325#issuecomment-71170205
  
I'll merge it.




[jira] [Commented] (FLINK-1353) Execution always uses DefaultAkkaAskTimeout, rather than the configured value

2015-01-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289029#comment-14289029
 ] 

ASF GitHub Bot commented on FLINK-1353:
---

Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/325#issuecomment-71170205
  
I'll merge it.


 Execution always uses DefaultAkkaAskTimeout, rather than the configured value
 -

 Key: FLINK-1353
 URL: https://issues.apache.org/jira/browse/FLINK-1353
 Project: Flink
  Issue Type: Bug
  Components: JobManager
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Till Rohrmann
 Fix For: 0.9








[jira] [Assigned] (FLINK-1436) Command-line interface verbose option (-v)

2015-01-23 Thread Max Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Michels reassigned FLINK-1436:
--

Assignee: Max Michels

 Command-line interface verbose option (-v)
 --

 Key: FLINK-1436
 URL: https://issues.apache.org/jira/browse/FLINK-1436
 Project: Flink
  Issue Type: Improvement
  Components: Start-Stop Scripts
Reporter: Max Michels
Assignee: Max Michels
Priority: Trivial
  Labels: starter, usability

 Let me run just a basic Flink job and add the verbose flag. It's a general 
 option, so let me add it as the first parameter:
  ./flink -v run ../examples/flink-java-examples-0.8.0-WordCount.jar 
  hdfs:///input hdfs:///output9
 Invalid action!
 ./flink <ACTION> [GENERAL_OPTIONS] [ARGUMENTS]
   general options:
  -h,--help      Show the help for the CLI Frontend.
  -v,--verbose   Print more detailed error messages.
 Action "run" compiles and runs a program.
   Syntax: run [OPTIONS] <jar-file> <arguments>
   "run" action arguments:
  -c,--class <classname>           Class with the program entry point ("main"
                                   method or "getPlan()" method). Only needed
                                   if the JAR file does not specify the class
                                   in its manifest.
  -m,--jobmanager <host:port>      Address of the JobManager (master) to
                                   which to connect. Use this flag to connect
                                   to a different JobManager than the one
                                   specified in the configuration.
  -p,--parallelism <parallelism>   The parallelism with which to run the
                                   program. Optional flag to override the
                                   default value specified in the
                                   configuration.
 Action "info" displays information about a program.
   "info" action arguments:
  -c,--class <classname>           Class with the program entry point ("main"
                                   method or "getPlan()" method). Only needed
                                   if the JAR file does not specify the class
                                   in its manifest.
  -e,--executionplan               Show optimized execution plan of the
                                   program (JSON)
  -m,--jobmanager <host:port>      Address of the JobManager (master) to
                                   which to connect. Use this flag to connect
                                   to a different JobManager than the one
                                   specified in the configuration.
  -p,--parallelism <parallelism>   The parallelism with which to run the
                                   program. Optional flag to override the
                                   default value specified in the
                                   configuration.
 Action "list" lists running and finished programs.
   "list" action arguments:
  -m,--jobmanager <host:port>   Address of the JobManager (master) to which
                                to connect. Use this flag to connect to a
                                different JobManager than the one specified
                                in the configuration.
  -r,--running                  Show running programs and their JobIDs
  -s,--scheduled                Show scheduled programs and their JobIDs
 Action "cancel" cancels a running program.
   "cancel" action arguments:
  -i,--jobid <jobID>            JobID of program to cancel
  -m,--jobmanager <host:port>   Address of the JobManager (master) to which
                                to connect. Use this flag to connect to a
                                different JobManager than the one specified
                                in the configuration.
 What just happened? This results in a lot of output, which is usually 
 generated only if you use the --help option on command-line tools. If your 
 terminal window is large enough, you will see a tiny message:
 "Please specify an action." I did specify an action. Strange. If you read the 
 help messages carefully, you see that general options belong after the action:
  ./flink run -v ../examples/flink-java-examples-0.8.0-WordCount.jar 
  hdfs:///input hdfs:///output9
 For the sake of mitigating user frustration, let us also accept -v as the 
 first argument. It may seem trivial for the day-to-day Flink user but makes a 
 difference for a novice.
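The fix the ticket asks for can be sketched as follows: tolerate leading general options when locating the action keyword. This is an illustration only; the option names mirror the help text above, but the class and method names are hypothetical and not Flink's actual CliFrontend code.

```java
// Sketch: skip known general options (-v/-h) while scanning for the action,
// so "./flink -v run ..." and "./flink run -v ..." both resolve to "run".
public class LenientCliSketch {
    static String findAction(String[] args) {
        for (String arg : args) {
            if (arg.equals("-v") || arg.equals("--verbose")
                    || arg.equals("-h") || arg.equals("--help")) {
                continue;           // general option, keep scanning
            }
            return arg;             // first non-option token is the action
        }
        return null;                // no action given at all
    }

    public static void main(String[] args) {
        System.out.println(findAction(new String[]{"-v", "run", "job.jar"}));
    }
}
```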



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1353) Execution always uses DefaultAkkaAskTimeout, rather than the configured value

2015-01-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289043#comment-14289043
 ] 

ASF GitHub Bot commented on FLINK-1353:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/325


 Execution always uses DefaultAkkaAskTimeout, rather than the configured value
 -

 Key: FLINK-1353
 URL: https://issues.apache.org/jira/browse/FLINK-1353
 Project: Flink
  Issue Type: Bug
  Components: JobManager
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Till Rohrmann
 Fix For: 0.9






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-1353) Execution always uses DefaultAkkaAskTimeout, rather than the configured value

2015-01-23 Thread Till Rohrmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann closed FLINK-1353.

Resolution: Fixed

Fixed with a5702b69ed7794abdd44035581077b09fc551450.

 Execution always uses DefaultAkkaAskTimeout, rather than the configured value
 -

 Key: FLINK-1353
 URL: https://issues.apache.org/jira/browse/FLINK-1353
 Project: Flink
  Issue Type: Bug
  Components: JobManager
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Till Rohrmann
 Fix For: 0.9






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1353] Fixes the Execution to use the co...

2015-01-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/325




[jira] [Commented] (FLINK-1352) Buggy registration from TaskManager to JobManager

2015-01-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289085#comment-14289085
 ] 

ASF GitHub Bot commented on FLINK-1352:
---

Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/328#issuecomment-71177177
  
Yeah, you're right that the exponential backoff was the default behaviour 
before. I think Stephan's proposal is the best solution. I'll implement it and 
also change the Akka messages so that, in the future, a TaskManager can be 
refused by the JobManager, if needed.


 Buggy registration from TaskManager to JobManager
 -

 Key: FLINK-1352
 URL: https://issues.apache.org/jira/browse/FLINK-1352
 Project: Flink
  Issue Type: Bug
  Components: JobManager, TaskManager
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Till Rohrmann
 Fix For: 0.9


 The JobManager's InstanceManager may refuse the registration attempt from a 
 TaskManager, because it has this TaskManager already connected or, in the 
 future, because the TaskManager has been blacklisted as unreliable.
 Upon refused registration, the instance ID is null, to signal that the 
 registration was refused. The TaskManager reacts incorrectly to such 
 responses, assuming successful registration.
 Possible solution: The JobManager sends back a dedicated RegistrationRefused 
 message if the instance manager returns null as the registration result. If 
 the TaskManager receives that before being registered, it knows that the 
 registration response was lost (which should not happen on TCP and would 
 indicate a corrupt connection).
 Followup question: Does it make sense to have the TaskManager try 
 indefinitely to connect to the JobManager, with increasing interval (from 
 seconds to minutes)?
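The increasing-interval retry the followup question asks about is typically a capped exponential backoff. A minimal sketch, assuming a simple doubling policy with a maximum delay (the class and method names are illustrative, not Flink code):

```java
// Sketch of registration retry delays growing from seconds toward minutes:
// double the delay on each failed attempt, but never exceed the cap.
public class BackoffSketch {
    static long nextDelayMillis(long currentMillis, long maxMillis) {
        return Math.min(currentMillis * 2, maxMillis);
    }

    public static void main(String[] args) {
        long delay = 500;                // start at half a second
        long max = 5 * 60 * 1000;        // cap at five minutes
        for (int attempt = 1; attempt <= 5; attempt++) {
            delay = nextDelayMillis(delay, max);
            System.out.println("attempt " + attempt + ": wait " + delay + " ms");
        }
    }
}
```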



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1389] Allow changing the filenames of t...

2015-01-23 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/301#issuecomment-71178244
  
Hadoop's `FileOutputFormat` has the following method:
```java
public static String getUniqueName(JobConf conf, String name) {
    int partition = conf.getInt(JobContext.TASK_PARTITION, -1);
    if (partition == -1) {
      throw new IllegalArgumentException(
        "This method can only be called from within a Job");
    }

    String taskType = (conf.getBoolean(JobContext.TASK_ISMAP, true)) ? "m" : "r";

    NumberFormat numberFormat = NumberFormat.getInstance();
    numberFormat.setMinimumIntegerDigits(5);
    numberFormat.setGroupingUsed(false);

    return name + "-" + taskType + "-" + numberFormat.format(partition);
}
```
So users have to override the method if they want to use a custom filename.

I'll remove the configuration-based approach.
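The naming scheme quoted above ("&lt;name&gt;-&lt;m|r&gt;-&lt;00000&gt;") can be reproduced in a standalone form. This is an illustration of the zero-padded formatting logic only; the class name is hypothetical, and it deliberately drops the JobConf lookup so it can run outside a job.

```java
import java.text.NumberFormat;

// Standalone sketch of Hadoop-style unique output file naming.
public class UniqueNameSketch {
    static String uniqueName(String name, boolean isMap, int partition) {
        String taskType = isMap ? "m" : "r";
        NumberFormat nf = NumberFormat.getInstance();
        nf.setMinimumIntegerDigits(5);   // zero-pad the partition to five digits
        nf.setGroupingUsed(false);       // no thousands separators
        return name + "-" + taskType + "-" + nf.format(partition);
    }

    public static void main(String[] args) {
        System.out.println(uniqueName("part", true, 3));   // part-m-00003
    }
}
```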




[jira] [Commented] (FLINK-1425) Turn lazy operator execution off for streaming programs

2015-01-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289273#comment-14289273
 ] 

ASF GitHub Bot commented on FLINK-1425:
---

GitHub user uce opened a pull request:

https://github.com/apache/flink/pull/330

[FLINK-1425] [streaming] Add scheduling of all tasks at once

See corresponding 
[discussion](http://mail-archives.apache.org/mod_mbox/incubator-flink-dev/201501.mbox/%3CCA%2Bfaj9yHzsNDW-6hUVdoL_OY0EoVMRrEO-6fWjyD9A_oJsxvnA%40mail.gmail.com%3E)
 on the mailing list.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uce/incubator-flink schedule-all

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/330.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #330


commit 1f82e0fe8eed1947f95aa2e5d276a1432171a187
Author: Ufuk Celebi u...@apache.org
Date:   2015-01-23T10:30:30Z

[FLINK-1425] [streaming] Add scheduling of all tasks at once




 Turn lazy operator execution off for streaming programs
 ---

 Key: FLINK-1425
 URL: https://issues.apache.org/jira/browse/FLINK-1425
 Project: Flink
  Issue Type: Task
  Components: Streaming
Affects Versions: 0.9
Reporter: Gyula Fora
Assignee: Ufuk Celebi

 Streaming programs currently use the same lazy operator execution model as 
 batch programs. This makes the functionality of some operators like time 
 based windowing very awkward, since they start computing windows based on the 
 start of the operator.
 Also, one should expect for streaming programs to run continuously so there 
 is not much to gain from lazy execution.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-1436) Command-line interface verbose option & error reporting

2015-01-23 Thread Max Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Michels updated FLINK-1436:
---
Summary: Command-line interface verbose option & error reporting  (was: 
Command-line interface verbose option (-v))

 Command-line interface verbose option & error reporting
 ---

 Key: FLINK-1436
 URL: https://issues.apache.org/jira/browse/FLINK-1436
 Project: Flink
  Issue Type: Improvement
  Components: Start-Stop Scripts
Reporter: Max Michels
Assignee: Max Michels
Priority: Trivial
  Labels: starter, usability




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1436) Command-line interface verbose option (-v)

2015-01-23 Thread Max Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289416#comment-14289416
 ] 

Max Michels commented on FLINK-1436:


https://github.com/apache/flink/pull/331

 Command-line interface verbose option (-v)
 --

 Key: FLINK-1436
 URL: https://issues.apache.org/jira/browse/FLINK-1436
 Project: Flink
  Issue Type: Improvement
  Components: Start-Stop Scripts
Reporter: Max Michels
Assignee: Max Michels
Priority: Trivial
  Labels: starter, usability




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (FLINK-1421) Implement a SAMOA Adapter for Flink Streaming

2015-01-23 Thread Paris Carbone (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289468#comment-14289468
 ] 

Paris Carbone edited comment on FLINK-1421 at 1/23/15 4:29 PM:
---

 :)
Samoa is a really interesting project. We can currently create Flink 
datastreams from Samoa topologies, but only in cases where there are no cyclic 
dependencies. Apparently, cycles are quite common in Samoa tasks, so we will 
support them very soon through Flink iterations. I bet the performance will be 
comparatively good as well. I am really curious to run some cross-platform 
benchmarks.


was (Author: senorcarbone):
 :)
Samoa is a really interesting project, we can currently create flink 
datastreams from samoa topologies, just in cases where there are no cyclic 
dependencies. Apparently, cycles are quite common in Samoa tasks so we will 
support this very soon through Flink iterations.

 Implement a SAMOA Adapter for Flink Streaming
 -

 Key: FLINK-1421
 URL: https://issues.apache.org/jira/browse/FLINK-1421
 Project: Flink
  Issue Type: New Feature
  Components: Streaming
Reporter: Paris Carbone
Assignee: Paris Carbone
   Original Estimate: 336h
  Remaining Estimate: 336h

 Yahoo's Samoa is an experimental incremental machine learning library that 
 builds on an abstract compositional data streaming model to write streaming 
 algorithms. The task is to provide an adapter from SAMOA topologies to 
 Flink-streaming job graphs in order to support Flink as a backend engine for 
 SAMOA tasks.
 A startup guide can be viewed here:
 https://docs.google.com/document/d/18glDJDYmnFGT1UGtZimaxZpGeeg1Ch14NgDoymhPk2A/pub
 The main working branch of the adapter :
 https://github.com/senorcarbone/samoa/tree/flink



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1201] Add flink-gelly to flink-addons

2015-01-23 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/326#issuecomment-71204417
  
Yes. The package seems to be still wrong.




[jira] [Updated] (FLINK-1433) Add HADOOP_CLASSPATH to start scripts

2015-01-23 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-1433:
--
Fix Version/s: 0.8.1

 Add HADOOP_CLASSPATH to start scripts
 -

 Key: FLINK-1433
 URL: https://issues.apache.org/jira/browse/FLINK-1433
 Project: Flink
  Issue Type: Improvement
Reporter: Robert Metzger
 Fix For: 0.8.1


 With the Hadoop file system wrapper, it's important to have access to the 
 Hadoop filesystem classes.
 The HADOOP_CLASSPATH seems to be a standard environment variable used by 
 Hadoop for such libraries.
 Deployments like Google Compute Cloud set this variable containing the 
 Google Cloud Storage Hadoop wrapper. So if users want to use the Cloud 
 Storage in a non-YARN environment, we need to address this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1389) Allow setting custom file extensions for files created by the FileOutputFormat

2015-01-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289366#comment-14289366
 ] 

ASF GitHub Bot commented on FLINK-1389:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/301#issuecomment-71208053
  
I've updated the pull request. Please review again, I would like to merge 
this soon.


 Allow setting custom file extensions for files created by the FileOutputFormat
 --

 Key: FLINK-1389
 URL: https://issues.apache.org/jira/browse/FLINK-1389
 Project: Flink
  Issue Type: New Feature
Reporter: Robert Metzger
Assignee: Robert Metzger
Priority: Minor

 A user requested the ability to name avro files with the avro extension.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1427) Configuration through environment variables

2015-01-23 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289427#comment-14289427
 ] 

Ufuk Celebi commented on FLINK-1427:


You could easily check whether the variable is already set and print a warning. 
It's just a matter of what we want to give precedence to.

I agree that it is error prone to configure the same thing in different places 
(although I think that Alexander was referring to something else).

Since no user has really complained so far, we should stick to your 
non-intrusive proposal and essentially keep everything as is.

 Configuration through environment variables
 ---

 Key: FLINK-1427
 URL: https://issues.apache.org/jira/browse/FLINK-1427
 Project: Flink
  Issue Type: Improvement
  Components: Local Runtime
 Environment: Deployment
Reporter: Max Michels
Priority: Minor
  Labels: configuration, deployment

 Like Hadoop or Spark, etc. Flink should support configuration via shell 
 environment variables. In cluster setups, this makes things a lot easier 
 because writing config files can be omitted. Many automation tools (e.g. 
 Google's bdutil) use (or abuse) this feature.
 For example, to set up the task manager heap size, we would run `export 
 FLINK_TASKMANAGER_HEAP=4096` before starting the task manager on a node to 
 set the heap memory size to 4096MB.
 Environment variables should overwrite the regular config entries.
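The proposed precedence (environment variable overrides the config-file entry) can be sketched as below. The key names follow the ticket's example, but the resolution helper is hypothetical and not Flink's actual configuration code.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: an environment variable such as FLINK_TASKMANAGER_HEAP, when set,
// takes precedence over the value read from the config file.
public class EnvConfigSketch {
    static String resolve(String envKey, String fileValue, Map<String, String> env) {
        String fromEnv = env.get(envKey);
        return fromEnv != null ? fromEnv : fileValue;  // env wins when present
    }

    public static void main(String[] args) {
        Map<String, String> env = new HashMap<>();
        env.put("FLINK_TASKMANAGER_HEAP", "4096");
        // Config file says 1024 MB, environment says 4096 MB: env wins.
        System.out.println(resolve("FLINK_TASKMANAGER_HEAP", "1024", env));
    }
}
```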



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1421) Implement a SAMOA Adapter for Flink Streaming

2015-01-23 Thread Paris Carbone (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289468#comment-14289468
 ] 

Paris Carbone commented on FLINK-1421:
--

 :)
Samoa is a really interesting project. We can currently create Flink 
datastreams from Samoa topologies, but only in cases where there are no cyclic 
dependencies. Apparently, cycles are quite common in Samoa tasks, so we will 
support them very soon through Flink iterations.

 Implement a SAMOA Adapter for Flink Streaming
 -

 Key: FLINK-1421
 URL: https://issues.apache.org/jira/browse/FLINK-1421
 Project: Flink
  Issue Type: New Feature
  Components: Streaming
Reporter: Paris Carbone
Assignee: Paris Carbone
   Original Estimate: 336h
  Remaining Estimate: 336h

 Yahoo's Samoa is an experimental incremental machine learning library that 
 builds on an abstract compositional data streaming model to write streaming 
 algorithms. The task is to provide an adapter from SAMOA topologies to 
 Flink-streaming job graphs in order to support Flink as a backend engine for 
 SAMOA tasks.
 A startup guide can be viewed here:
 https://docs.google.com/document/d/18glDJDYmnFGT1UGtZimaxZpGeeg1Ch14NgDoymhPk2A/pub
 The main working branch of the adapter :
 https://github.com/senorcarbone/samoa/tree/flink



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [Flink-1436] refactor CLiFrontend to provide m...

2015-01-23 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/331#discussion_r23455785
  
--- Diff: 
flink-clients/src/main/java/org/apache/flink/client/CliFrontend.java ---
@@ -960,14 +960,15 @@ public int parseParameters(String[] args) {

// check for action
if (args.length < 1) {
-   System.out.println(Please specify an action.);
printHelp();
+   System.out.println();
+   System.out.println(Please specify an action.);
--- End diff --

instead of adding an additional call each time for the new-line, why aren't 
you pre-pending a '\n' to the string?




[GitHub] flink pull request: [Flink-1436] refactor CLiFrontend to provide m...

2015-01-23 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/331#discussion_r23455746
  
--- Diff: 
flink-clients/src/main/java/org/apache/flink/client/CliFrontend.java ---
@@ -945,13 +948,10 @@ private int handleError(Throwable t) {
private void evaluateGeneralOptions(CommandLine line) {
// check help flag
this.printHelp = line.hasOption(HELP_OPTION.getOpt());
-   
-   // check verbosity flag
-   this.verbose = line.hasOption(VERBOSE_OPTION.getOpt());
}

/**
-* Parses the command line arguments and starts the requested action.
+* Parses the command lin e arguments and starts the requested action.
--- End diff --

typo




[GitHub] flink pull request: [Flink-1436] refactor CLiFrontend to provide m...

2015-01-23 Thread mxm
GitHub user mxm opened a pull request:

https://github.com/apache/flink/pull/331

[Flink-1436]  refactor CLiFrontend to provide more identifiable and 
meaningful error messages



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mxm/flink flink-1436

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/331.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #331


commit 1bb326c347ce26b0458a4ed437e418a227820ed4
Author: Max m...@posteo.de
Date:   2015-01-23T13:05:31Z

[FLINK-1436] more meaningful error messages

commit 5daf545384b1a2dfbed5273c691804ed45e211c1
Author: Max m...@posteo.de
Date:   2015-01-23T13:05:55Z

[FLINK-1436] remove verbose flag and default to verbose output

commit 4cc1f691df57a0f411419283522952ba13a4d713
Author: Max m...@posteo.de
Date:   2015-01-23T14:31:07Z

[FLINK-1436] rework error message logic using checked exceptions

in particular, error messages are printed after usage information which
provides information about the error immediately (no more scrolling)

commit ab0bfa0fbd854facfbee6f74a601be975ea268b3
Author: Max m...@posteo.de
Date:   2015-01-23T14:39:40Z

[FLINK-1436] adapt tests for the new error handling of CliFrontend






[jira] [Commented] (FLINK-1436) Command-line interface verbose option (-v)

2015-01-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289426#comment-14289426
 ] 

ASF GitHub Bot commented on FLINK-1436:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/331#issuecomment-71212582
  
Testing JIRA integration FLINK-1436

Lets see


 Command-line interface verbose option (-v)
 --

 Key: FLINK-1436
 URL: https://issues.apache.org/jira/browse/FLINK-1436
 Project: Flink
  Issue Type: Improvement
  Components: Start-Stop Scripts
Reporter: Max Michels
Assignee: Max Michels
Priority: Trivial
  Labels: starter, usability




--
This