[jira] [Created] (FLINK-3657) Change access of DataSetUtils.countElements() to 'public'

2016-03-22 Thread Suneel Marthi (JIRA)
Suneel Marthi created FLINK-3657:


 Summary: Change access of DataSetUtils.countElements() to 'public' 
 Key: FLINK-3657
 URL: https://issues.apache.org/jira/browse/FLINK-3657
 Project: Flink
  Issue Type: Improvement
  Components: DataSet API
Affects Versions: 1.0.0
Reporter: Suneel Marthi
Assignee: Suneel Marthi
Priority: Minor
 Fix For: 1.0.1


The access of DataSetUtils.countElements() is presently 'private'; change it
to 'public'. We happened to be replicating the functionality in our project
and realized the method already existed in Flink.
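
For context, a minimal usage sketch of what this change would enable, assuming
the method is made public with its current per-partition counting signature
(returning one (subtaskIndex, elementCount) tuple per parallel partition); the
signature is my reading of the code, not something stated in this issue:

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.utils.DataSetUtils;

public class CountElementsSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<Long> data = env.generateSequence(1, 1000);

        // Hypothetical once public: one (subtaskIndex, elementCount) per partition.
        DataSet<Tuple2<Integer, Long>> counts = DataSetUtils.countElements(data);
        counts.print();
    }
}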



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Behavior of lib directory shipping on YARN

2016-03-22 Thread Ufuk Celebi
On Tue, Mar 22, 2016 at 8:42 PM, Stefano Baghino wrote:
> My feeling is that running a job on YARN should
> end up in having more or less the same effect, regardless of the way the
> job is run.

+1

I think that the current behaviour is buggy. The resource management
is currently undergoing a massive refactoring
(https://github.com/apache/flink/pull/1741). Maybe it's already fixed
there (if the issue is independent of the scripts).

Would be great to have a fix for this. If #1741 does not fix it, feel
free to open an issue and PR. :-)

– Ufuk


[jira] [Created] (FLINK-3656) Rework TableAPI tests

2016-03-22 Thread Vasia Kalavri (JIRA)
Vasia Kalavri created FLINK-3656:


 Summary: Rework TableAPI tests
 Key: FLINK-3656
 URL: https://issues.apache.org/jira/browse/FLINK-3656
 Project: Flink
  Issue Type: Improvement
  Components: Table API
Reporter: Vasia Kalavri


We should look into whether we can rework the Table API tests to extract the
checks of Table API parts that are common to DataSet and DataStream.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Behavior of lib directory shipping on YARN

2016-03-22 Thread Stefano Baghino
Hello everybody,

in the past few days my colleagues and I ran some tests with Flink on YARN
and detected a possibly inconsistent behavior in the way the contents of
the flink/lib directory are shipped to the cluster when running on YARN,
depending on whether the jobs are deployed individually or onto a
long-running session.

After some discussion on the user mailing list, we were under the impression
that the contents of that folder are always supposed to be copied so that all
the nodes have access to them. Furthermore, we've found a comment in the code
that states:

// remove uberjar from ship list (by default everything in the lib/ folder
is added to
// the list of files to ship, but we handle the uberjar separately.

However, after having a look at some portions of the code, I'm not really
sure this is actually the case. The Flink long-running YARN session actually
ships the contents because this is specified in the yarn-session.sh script;
however, running a single job on YARN does not automatically ship the
contents of the lib folder.

The behavior is not documented and I'd like to write some lines in the docs
to make clear what is shipped in which case. Also, if there is an
agreement on the behavior that the single jobs on YARN should have, I can
also provide a fix for it. My feeling is that running a job on YARN should
end up in having more or less the same effect, regardless of the way the
job is run.

Let me know what you think, thank you for your attention.

-- 
BR,
Stefano Baghino

Software Engineer @ Radicalbit


Re: RollingSink

2016-03-22 Thread Vijay Srinivasaraghavan
I have tried both the log4j logger as well as the System.out.println option,
but neither worked.
From what I have seen so far, the Filesystem streaming connector classes are
not packaged in the grand jar (flink-dist_2.10-1.1-SNAPSHOT.jar) that is
copied under /build-target/lib as part of the Flink maven build step.
So, I manually copied (overwrote) the compiled class files from the
org.apache.flink.streaming.connectors.fs package to my "Flink job"
distribution jar (otherwise it was using the standard jars that are defined
as mvn dependencies in Artifactory) and then uploaded the jar to the Job
Manager.
Am I missing something? How do I enable logging for the RollingSink class?

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-filesystem_2.11</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
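
If the connector classes really are missing from the dist jar as described
above, a likely fix (my assumption, not something confirmed in this thread) is
to drop the provided scope so that the connector is bundled into the job's fat
jar instead of being patched in by hand:

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-filesystem_2.11</artifactId>
    <version>${flink.version}</version>
    <!-- default (compile) scope: gets packaged into the job jar -->
</dependency>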

On Tuesday, March 22, 2016 3:04 AM, Aljoscha Krettek wrote:

Hi,
how are you printing the debug statements?

But yeah, all the logic of renaming in-progress files and cleaning up after a
failed job happens in restoreState(BucketState state). The steps are roughly
these:

1. Move the current in-progress file to its final location
2. Truncate the file if necessary (if truncate is not available, write a
.valid-length file)
3. Move pending files that were part of the checkpoint to their final location
4. Clean up any leftover pending/in-progress files

Cheers,
Aljoscha
> On 22 Mar 2016, at 10:08, Vijay Srinivasaraghavan wrote:
> 
> Hello,
> I have enabled checkpointing and I am using RollingSink to sink the data to
> HDFS (2.7.x) from KafkaConsumer. To simulate failover/recovery, I stopped a
> TaskManager and the job got rescheduled to another TaskManager instance.
> During this moment, the current "in-progress" file gets closed and renamed
> to part-0-1 from _part-0-1_in-progress.
> I was hoping to see the debug statement that I have added to the
> "restoreState" method, but none of my debug statements get printed. I am
> not sure if the restoreState() method gets invoked during this scenario.
> Could you please help me understand the flow during the "failover" scenario?
> P.S: Functionally the code appears to be working fine, but I am trying to
> understand the underlying implementation details of public void
> restoreState(BucketState state).
> Regards
> Vijay



[jira] [Created] (FLINK-3655) Allow comma-separated multiple directories to be specified for FileInputFormat

2016-03-22 Thread Gna Phetsarath (JIRA)
Gna Phetsarath created FLINK-3655:
-

 Summary: Allow comma-separated multiple directories to be 
specified for FileInputFormat
 Key: FLINK-3655
 URL: https://issues.apache.org/jira/browse/FLINK-3655
 Project: Flink
  Issue Type: Improvement
  Components: Core
Affects Versions: 1.0.0
Reporter: Gna Phetsarath
Priority: Minor


Allow comma-separated multiple directories to be specified for FileInputFormat:

{{env.readFile("/data/2016/01/01/*/*,/data/2016/01/02/*/*,/data/2016/01/03/*/*")}}

Wildcard support would be a bonus.
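
Until such support exists, a workaround sketch in the Java API: read each
directory separately and union the results. The paths are the ones from the
example above, at directory level; the nested wildcards are not covered by
this sketch.

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class MultiDirectoryRead {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Read each directory as its own source and union the results.
        DataSet<String> all = env.readTextFile("/data/2016/01/01")
                .union(env.readTextFile("/data/2016/01/02"))
                .union(env.readTextFile("/data/2016/01/03"));

        all.print();
    }
}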



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Flink-1.0.0 JobManager is not running in Docker Container on AWS

2016-03-22 Thread Deepak Jha
Hi Maximilian,
Thanks for the email and looking into the issue. I'm using Scala 2.11 so it
sounds perfect to me...
I will be more than happy to test it out.

On Tue, Mar 22, 2016 at 2:48 AM, Maximilian Michels  wrote:

> Hi Deepak,
>
> We have looked further into this and have a pretty easy fix. However,
> it will only work with Flink's Scala 2.11 version because newer
> versions of the Akka library are incompatible with Scala 2.10 (Flink's
> default Scala version). Would that be a viable option for you?
>
> We're currently discussing this here:
> https://issues.apache.org/jira/browse/FLINK-2821
>
> Best,
> Max
>
> On Mon, Mar 14, 2016 at 4:49 PM, Deepak Jha  wrote:
> > Hi Maximilian,
> > Thanks for your response. I will wait for the update.
> >
> > On Monday, March 14, 2016, Maximilian Michels  wrote:
> >
> >> Hi Deepak,
> >>
> >> We'll look more into this problem this week. Until now we considered it
> a
> >> configuration issue if the bind address was not externally reachable.
> >> However, one might not always have the possibility to change this
> network
> >> configuration.
> >>
> >> Looking further, it is actually possible to let the bind address be
> >> different from the advertised address. From the Akka FAQ at
> >> http://doc.akka.io/docs/akka/2.4.1/additional/faq.html:
> >>
> >> If you are running an ActorSystem under a NAT or inside a docker
> container,
> >> > make sure to set akka.remote.netty.tcp.hostname and
> >> > akka.remote.netty.tcp.port to the address it is reachable at from
> other
> >> > ActorSystems. If you need to bind your network interface to a
> different
> >> > address - use akka.remote.netty.tcp.bind-hostname and
> >> > akka.remote.netty.tcp.bind-port settings. Also make sure your network
> is
> >> > configured to translate from the address your ActorSystem is
> reachable at
> >> > to the address your ActorSystem network interface is bound to.
> >> >
> >>
> >> It looks like we have to expose this configuration to users who have a
> >> special network setup.
> >>
> >> Best,
> >> Max
> >>
> >> On Mon, Mar 14, 2016 at 5:42 AM, Deepak Jha wrote:
> >>
> >> > Hi Stephan & Ufuk,
> >> > Thanks for your response.
> >> >
> >> > Yes, there is a way to run docker (net = host mode) in which the guest
> >> > machine's network stack gets shared by the docker container.
> >> > Unfortunately it's not supported by AWS ECS.
> >> >
> >> > I do have one more question for you. Can you guys please explain to me
> >> > what happens when taskmanagers register themselves with the jobmanager
> >> > in HA mode? Does each taskmanager get connected to the jobmanager on a
> >> > separate port? The reason I'm asking is that if I run 2 taskmanagers
> >> > (on separate docker containers), they are able to attach themselves to
> >> > the Jobmanager (another docker container) (Flink HA setup using a
> >> > remote zk cluster) but soon after that they get disconnected. Logs are
> >> > not very helpful either... I suspect that each taskmanager gets
> >> > connected on a new port and since by default docker does not expose
> >> > all ports, this may happen... I do not see this happen when I do not
> >> > use docker containers.
> >> >
> >> > Here is the log file that I saw in jobmanager
> >> >
> >> > 2016-03-12 08:55:55,010 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registered TaskManager at 5673db03e679 (akka.tcp://flink@172.17.0.3:6121/user/taskmanager) as 7eafcfddd6bd084f2ec5a32594603f4f. Current number of registered hosts is 1. Current number of alive task slots is 1.
> >> > 2016-03-12 08:57:42,676 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registered TaskManager at 7200a7da4da7 (akka.tcp://flink@172.17.0.3:6121/user/taskmanager) as 320338e15a7a44ee64dc03a40f04fcd7. Current number of registered hosts is 2. Current number of alive task slots is 2.
> >> > 2016-03-12 08:57:48,422 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.runtime.jobmanager.JobManager - Task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager terminated.
> >> > 2016-03-12 08:57:48,422 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Unregistered task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager. Number of registered task managers 1. Number of available slots 1.
> >> > 2016-03-12 08:58:01,417 PST [WARN] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] a.remote.ReliableDeliverySupervisor - Association with 

[jira] [Created] (FLINK-3654) Disable Write-Ahead-Log in RocksDB State

2016-03-22 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created FLINK-3654:
---

 Summary: Disable Write-Ahead-Log in RocksDB State
 Key: FLINK-3654
 URL: https://issues.apache.org/jira/browse/FLINK-3654
 Project: Flink
  Issue Type: Improvement
  Components: Sure
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek


We do our own checkpointing of the RocksDB database, so the WAL is useless to
us. Disabling writes to the WAL should give us a very large performance boost.
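
For reference, the switch in question is plain RocksJava, not Flink API; a
minimal sketch using a recent RocksJava version (path and keys below are
placeholders):

import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.WriteOptions;

public class DisableWalSketch {
    public static void main(String[] args) throws RocksDBException {
        RocksDB.loadLibrary();
        try (Options options = new Options().setCreateIfMissing(true);
             RocksDB db = RocksDB.open(options, "/tmp/rocksdb-wal-sketch");
             // Writes issued with this option skip the write-ahead log,
             // trading durability for speed; acceptable here because Flink
             // checkpoints the whole database itself.
             WriteOptions writeOptions = new WriteOptions().setDisableWAL(true)) {
            db.put(writeOptions, "key".getBytes(), "value".getBytes());
        }
    }
}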



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3653) recovery.zookeeper.storageDir is not documented on the configuration page

2016-03-22 Thread Stefano Baghino (JIRA)
Stefano Baghino created FLINK-3653:
--

 Summary: recovery.zookeeper.storageDir is not documented on the 
configuration page
 Key: FLINK-3653
 URL: https://issues.apache.org/jira/browse/FLINK-3653
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.0.0
Reporter: Stefano Baghino
Assignee: Stefano Baghino
Priority: Minor
 Fix For: 1.1.0


The {{recovery.zookeeper.storageDir}} option is documented in the HA page but
is missing from the configuration page. Since it's required for HA, I think it
would be a good idea to have it on both pages.
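
For reference, a minimal flink-conf.yaml HA snippet showing the option in
context (hostnames and the storage path below are placeholders):

recovery.mode: zookeeper
recovery.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181
recovery.zookeeper.storageDir: hdfs:///flink/recovery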



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3652) Enable UnusedImports check for Scala checkstyle

2016-03-22 Thread Maximilian Michels (JIRA)
Maximilian Michels created FLINK-3652:
-

 Summary: Enable UnusedImports check for Scala checkstyle
 Key: FLINK-3652
 URL: https://issues.apache.org/jira/browse/FLINK-3652
 Project: Flink
  Issue Type: Improvement
  Components: Build System
Reporter: Maximilian Michels
Assignee: Maximilian Michels
Priority: Minor


For some reason, we don't have the UnusedImports check enabled in the Scala
checkstyle. This is not consistent with Java, where we strictly check for
unused imports.

I propose to enable it and fix any existing unused imports.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


RichMapPartitionFunction - problems with collect

2016-03-22 Thread Sergio Ramírez

Hi all,

I've been having some problems with RichMapPartitionFunction. Firstly, I
tried to convert the iterable into an array, unsuccessfully. Then I used
some buffers to store the values per column. I am trying to transpose the
local matrix of LabeledVectors that I have in each partition.

None of these solutions has worked. For example, for partition 7 and
feature 10 the vector is empty, whereas for the same partition and
feature 11 the vector contains 200 elements. And this changes on each
execution, with different partitions and features.

I think there is a problem with calling the collect method outside of the
iterable loop.


new RichMapPartitionFunction[LabeledVector, ((Int, Int), Array[Byte])]() {
  // Requires: import scala.collection.JavaConverters._
  override def mapPartition(
      it: java.lang.Iterable[LabeledVector],
      out: Collector[((Int, Int), Array[Byte])]): Unit = {

    val index = getRuntimeContext().getIndexOfThisSubtask()
    // One buffer per feature, to transpose the partition-local matrix.
    val mat = for (i <- 0 until nFeatures)
      yield new scala.collection.mutable.ListBuffer[Byte]

    for (reg <- it.asScala) {
      for (i <- 0 until (nFeatures - 1)) mat(i) += reg.vector(i).toByte
      mat(nFeatures - 1) += classMap(reg.label)
    }
    // Emit one (featureIndex, partitionIndex) -> column pair per feature.
    for (i <- 0 until nFeatures) out.collect((i, index) -> mat(i).toArray)
  }
}

Regards


[jira] [Created] (FLINK-3651) Fix faulty RollingSink Restore

2016-03-22 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created FLINK-3651:
---

 Summary: Fix faulty RollingSink Restore
 Key: FLINK-3651
 URL: https://issues.apache.org/jira/browse/FLINK-3651
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek


The RollingSink restore logic has a bug where the sink for subtask index 1 also 
removes files for subtask index 11 because the regex that checks for the file 
name also matches that one. Adding the suffix to the regex should solve the 
problem because then the regex for 1 will only match files for subtask index 1.
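
A small standalone illustration of the described bug, assuming file names of
the form part-{subtaskIndex}-{rollCounter} (the exact naming is my assumption
for this sketch):

import java.util.regex.Pattern;

public class PrefixMatchBug {
    public static void main(String[] args) {
        // A prefix-only pattern for subtask 1 also matches subtask 11's files.
        Pattern withoutSuffix = Pattern.compile("^part-1");
        System.out.println(withoutSuffix.matcher("part-11-3").find()); // true (wrong)

        // Anchoring the separator after the subtask index fixes the match.
        Pattern withSuffix = Pattern.compile("^part-1-");
        System.out.println(withSuffix.matcher("part-11-3").find()); // false
        System.out.println(withSuffix.matcher("part-1-3").find());  // true
    }
}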



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3650) Add maxBy/minBy to Scala DataSet API

2016-03-22 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-3650:


 Summary: Add maxBy/minBy to Scala DataSet API
 Key: FLINK-3650
 URL: https://issues.apache.org/jira/browse/FLINK-3650
 Project: Flink
  Issue Type: Improvement
  Components: Java API, Scala API
Affects Versions: 1.1.0
Reporter: Till Rohrmann


The stable Java DataSet API contains the API calls {{maxBy}} and {{minBy}}. 
These methods are not supported by the Scala DataSet API. These methods should 
be added in order to have a consistent API.
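
For reference, this is the existing Java API behavior the Scala API should
mirror; a minimal sketch with made-up data:

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class MaxByExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<Tuple2<String, Integer>> counts = env.fromElements(
                new Tuple2<>("a", 3),
                new Tuple2<>("b", 7),
                new Tuple2<>("c", 5));

        // Selects the tuple with the maximum value in field 1, i.e. (b, 7).
        counts.maxBy(1).print();
    }
}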



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3649) Document stable API methods maxBy/minBy

2016-03-22 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-3649:


 Summary: Document stable API methods maxBy/minBy
 Key: FLINK-3649
 URL: https://issues.apache.org/jira/browse/FLINK-3649
 Project: Flink
  Issue Type: Improvement
  Components: Documentation, Java API
Affects Versions: 1.1.0
Reporter: Till Rohrmann


The Java DataSet API contains the stable API methods {{maxBy}} and {{minBy}},
which are not mentioned anywhere in our documentation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3648) Introduce Trigger Test Harness

2016-03-22 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created FLINK-3648:
---

 Summary: Introduce Trigger Test Harness
 Key: FLINK-3648
 URL: https://issues.apache.org/jira/browse/FLINK-3648
 Project: Flink
  Issue Type: Sub-task
Reporter: Aljoscha Krettek


As mentioned in 
https://docs.google.com/document/d/1Xp-YBf87vLTduYSivgqWVEMjYUmkA-hyb4muX3KRl08/edit#
we should build on the processing-time clock introduced in FLINK-3646 and add
a testing harness for triggers. The harness should allow inputting elements,
updating the watermark, updating the processing time, and firing timers. The
output of the trigger, as well as the state managed by the partitioned state
abstraction, should be observable to verify correct trigger behavior.

Then, we should add tests for all triggers.
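
To make the proposal concrete, a sketch of what using such a harness could
look like; the harness type and its methods are invented for illustration
(only EventTimeTrigger, TimeWindow, StreamRecord and TriggerResult exist
today):

// Hypothetical harness API; all harness names are placeholders.
TriggerTestHarness<Object, TimeWindow> harness =
        new TriggerTestHarness<>(EventTimeTrigger.create());

TimeWindow window = new TimeWindow(0, 100);
harness.processElement(new StreamRecord<>(new Object(), 42L), window);

// Nothing should fire before the watermark reaches the window's max timestamp.
assertEquals(TriggerResult.CONTINUE, harness.advanceWatermark(42, window));
// Reaching it should fire the registered event-time timer.
assertEquals(TriggerResult.FIRE, harness.advanceWatermark(99, window));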



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3647) Change StreamSource to use Processing-Time Clock Service

2016-03-22 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created FLINK-3647:
---

 Summary: Change StreamSource to use Processing-Time Clock Service
 Key: FLINK-3647
 URL: https://issues.apache.org/jira/browse/FLINK-3647
 Project: Flink
  Issue Type: Sub-task
Reporter: Aljoscha Krettek


Currently, the {{StreamSource.AutomaticWatermarkContext}} has its own timer
service. This should be changed to use the Clock service introduced in
FLINK-3646 to make watermark emission testable by providing a custom Clock.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3646) Use Processing-Time Clock in Window Assigners/Triggers

2016-03-22 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created FLINK-3646:
---

 Summary: Use Processing-Time Clock in Window Assigners/Triggers
 Key: FLINK-3646
 URL: https://issues.apache.org/jira/browse/FLINK-3646
 Project: Flink
  Issue Type: Sub-task
  Components: Streaming
Affects Versions: 1.0.0
Reporter: Aljoscha Krettek


This is part of the effort to improve Window Triggers in Flink. We should 
replace all usage of {{System.currentTimeMillis()}} by a provided clock 
implementation. This way we can use a deterministic testing clock to verify the 
behavior of processing-time windowing components in unit tests.

This requires the following changes:

- Change {{StreamTask}} to have a {{Clock}} (name is WIP). By default this 
clock will use {{System.currentTimeMillis()}}. The clock must also provide an 
interface to register processing-time triggers; this is currently handled 
directly by {{StreamTask}} but must now also be handled by an external Clock
- Change API of {{WindowAssigner}} to take a context object that allows it to 
query the current processing time. This can be an abstract class 
{{AssignerContext}} with a single method {{long currentProcessingTime()}}
- Add a method {{long currentProcessingTime()}} in {{TriggerContext}}, change 
{{TriggerContext}} to use the clock methods provided by {{StreamTask}} and 
forwarded by {{WindowOperator}}
- Change processing-time triggers to use the new methods
- Change {{WindowOperator}} to support these changes
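
A minimal sketch of the first bullet, with invented names (the issue itself
notes the naming is WIP):

// Sketch only; all names are placeholders.
public interface Clock {
    long currentProcessingTime();

    // Register a callback to run at (roughly) the given processing time.
    void registerTimer(long timestamp, Runnable target);
}

// Default implementation backed by the wall clock.
class SystemClock implements Clock {
    private final java.util.concurrent.ScheduledExecutorService timerService =
            java.util.concurrent.Executors.newSingleThreadScheduledExecutor();

    @Override
    public long currentProcessingTime() {
        return System.currentTimeMillis();
    }

    @Override
    public void registerTimer(long timestamp, Runnable target) {
        long delay = Math.max(0, timestamp - currentProcessingTime());
        timerService.schedule(target, delay, java.util.concurrent.TimeUnit.MILLISECONDS);
    }
}

A deterministic test implementation would simply return a settable long and
collect registered timers for manual firing.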



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: RollingSink

2016-03-22 Thread Aljoscha Krettek
Hi,
how are you printing the debug statements?

But yeah, all the logic of renaming in-progress files and cleaning up after a
failed job happens in restoreState(BucketState state). The steps are roughly
these:

1. Move the current in-progress file to its final location
2. Truncate the file if necessary (if truncate is not available, write a
.valid-length file)
3. Move pending files that were part of the checkpoint to their final location
4. Clean up any leftover pending/in-progress files

Cheers,
Aljoscha
> On 22 Mar 2016, at 10:08, Vijay Srinivasaraghavan wrote:
> 
> Hello,
> I have enabled checkpointing and I am using RollingSink to sink the data to
> HDFS (2.7.x) from KafkaConsumer. To simulate failover/recovery, I stopped a
> TaskManager and the job got rescheduled to another TaskManager instance.
> During this moment, the current "in-progress" file gets closed and renamed
> to part-0-1 from _part-0-1_in-progress.
> I was hoping to see the debug statement that I have added to the
> "restoreState" method, but none of my debug statements get printed. I am
> not sure if the restoreState() method gets invoked during this scenario.
> Could you please help me understand the flow during the "failover" scenario?
> P.S: Functionally the code appears to be working fine, but I am trying to
> understand the underlying implementation details of public void
> restoreState(BucketState state).
> Regards
> Vijay



Re: Flink-1.0.0 JobManager is not running in Docker Container on AWS

2016-03-22 Thread Maximilian Michels
Hi Deepak,

We have looked further into this and have a pretty easy fix. However,
it will only work with Flink's Scala 2.11 version because newer
versions of the Akka library are incompatible with Scala 2.10 (Flink's
default Scala version). Would that be a viable option for you?

We're currently discussing this here:
https://issues.apache.org/jira/browse/FLINK-2821

Best,
Max

On Mon, Mar 14, 2016 at 4:49 PM, Deepak Jha  wrote:
> Hi Maximilian,
> Thanks for your response. I will wait for the update.
>
> On Monday, March 14, 2016, Maximilian Michels  wrote:
>
>> Hi Deepak,
>>
>> We'll look more into this problem this week. Until now we considered it a
>> configuration issue if the bind address was not externally reachable.
>> However, one might not always have the possibility to change this network
>> configuration.
>>
>> Looking further, it is actually possible to let the bind address be
>> different from the advertised address. From the Akka FAQ at
>> http://doc.akka.io/docs/akka/2.4.1/additional/faq.html:
>>
>> If you are running an ActorSystem under a NAT or inside a docker container,
>> > make sure to set akka.remote.netty.tcp.hostname and
>> > akka.remote.netty.tcp.port to the address it is reachable at from other
>> > ActorSystems. If you need to bind your network interface to a different
>> > address - use akka.remote.netty.tcp.bind-hostname and
>> > akka.remote.netty.tcp.bind-port settings. Also make sure your network is
>> > configured to translate from the address your ActorSystem is reachable at
>> > to the address your ActorSystem network interface is bound to.
>> >
>>
>> It looks like we have to expose this configuration to users who have a
>> special network setup.
>>
>> Best,
>> Max
>>
>> On Mon, Mar 14, 2016 at 5:42 AM, Deepak Jha wrote:
>>
>> > Hi Stephan & Ufuk,
>> > Thanks for your response.
>> >
>> > Yes, there is a way to run docker (net = host mode) in which the guest
>> > machine's network stack gets shared by the docker container.
>> > Unfortunately it's not supported by AWS ECS.
>> >
>> > I do have one more question for you. Can you guys please explain to me
>> > what happens when taskmanagers register themselves with the jobmanager
>> > in HA mode? Does each taskmanager get connected to the jobmanager on a
>> > separate port? The reason I'm asking is that if I run 2 taskmanagers (on
>> > separate docker containers), they are able to attach themselves to the
>> > Jobmanager (another docker container) (Flink HA setup using a remote zk
>> > cluster) but soon after that they get disconnected. Logs are not very
>> > helpful either... I suspect that each taskmanager gets connected on a
>> > new port and since by default docker does not expose all ports, this may
>> > happen... I do not see this happen when I do not use docker containers.
>> >
>> > Here is the log file that I saw in jobmanager
>> >
>> > 2016-03-12 08:55:55,010 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registered TaskManager at 5673db03e679 (akka.tcp://flink@172.17.0.3:6121/user/taskmanager) as 7eafcfddd6bd084f2ec5a32594603f4f. Current number of registered hosts is 1. Current number of alive task slots is 1.
>> > 2016-03-12 08:57:42,676 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registered TaskManager at 7200a7da4da7 (akka.tcp://flink@172.17.0.3:6121/user/taskmanager) as 320338e15a7a44ee64dc03a40f04fcd7. Current number of registered hosts is 2. Current number of alive task slots is 2.
>> > 2016-03-12 08:57:48,422 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.runtime.jobmanager.JobManager - Task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager terminated.
>> > 2016-03-12 08:57:48,422 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Unregistered task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager. Number of registered task managers 1. Number of available slots 1.
>> > 2016-03-12 08:58:01,417 PST [WARN] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] a.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@172.17.0.3:6121] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
>> > 2016-03-12 08:58:01,451 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.runtime.jobmanager.JobManager - Task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager wants to disconnect, because TaskManager akka://flink/user/taskmanager is disassociating.

RollingSink

2016-03-22 Thread Vijay Srinivasaraghavan
Hello,
I have enabled checkpointing and I am using RollingSink to sink the data to
HDFS (2.7.x) from KafkaConsumer. To simulate failover/recovery, I stopped a
TaskManager and the job got rescheduled to another TaskManager instance.
During this moment, the current "in-progress" file gets closed and renamed to
part-0-1 from _part-0-1_in-progress.
I was hoping to see the debug statement that I have added to the
"restoreState" method, but none of my debug statements get printed. I am not
sure if the restoreState() method gets invoked during this scenario. Could
you please help me understand the flow during the "failover" scenario?
P.S: Functionally the code appears to be working fine, but I am trying to
understand the underlying implementation details of public void
restoreState(BucketState state).
Regards
Vijay

Re: [DISCUSS] Improving Trigger/Window API and Semantics

2016-03-22 Thread Aljoscha Krettek
Hi,
I have some thoughts about Evictors as well, yes, but I didn’t yet write them
down. The basic idea is this:

class Evictor<T, W extends Window> {
   Predicate<T> getPredicate(Iterable<StreamRecord<T>> elements, int size, W window);
}

class Predicate<T> {
  boolean evict(StreamRecord<T> element);
}

The evictor will return a predicate that is evaluated on every element in the
buffer to decide whether we should keep it or not. The predicate can keep
internal state, so with the size it gets in getPredicate() it can do
count-based eviction (just evict elements until you reach your desired quota).
We can also do eviction based on event time, which was not possible before
because you could only evict from the start of the buffer. What do you think?
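
To make this concrete, here is a sketch of a count-based evictor against the
interface above; everything beyond the idea itself is invented for
illustration (Java 8, evicting from the head of the buffer):

// All names follow the pseudocode interface sketched above.
class CountEvictor<T, W extends Window> extends Evictor<T, W> {
    private final int maxCount;

    CountEvictor(int maxCount) {
        this.maxCount = maxCount;
    }

    @Override
    public Predicate<T> getPredicate(Iterable<StreamRecord<T>> elements, int size, W window) {
        return new Predicate<T>() {
            // The predicate keeps internal state: how many elements to drop.
            private int remaining = Math.max(0, size - maxCount);

            @Override
            public boolean evict(StreamRecord<T> element) {
                if (remaining > 0) {
                    remaining--;
                    return true;  // evict this element
                }
                return false;     // keep the rest
            }
        };
    }
}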

Cheers,
Aljoscha
> On 22 Mar 2016, at 09:24, Fabian Hueske  wrote:
> 
> Thanks for the write-up Aljoscha.
> I think it is a really good idea to separate the different aspects (fire,
> purging, lateness) a bit. At the moment, all of these need to be handled in
> the Trigger, and a custom trigger is necessary whenever you want some of
> these aspects handled slightly differently. This makes the Trigger interface
> and implementations of it really hard to understand.
> 
> +1 for the suggested changes. 
> Are there plans to touch the Evictor interface as well? IMO, this needs a 
> redesign as well.
> 
> Fabian
> 
> 2016-03-21 19:21 GMT+01:00 Aljoscha Krettek :
> Hi,
> my previous message might be a bit hard to parse for people that are not very 
> deep into the Trigger implementation. So I’ll try to give a bit more 
> explanation right in the mail.
> 
> The basic idea is that we observed some basic problems that keep coming up 
> for people on the mailing lists and I want to try and address them.
> 
> The first problem is with the Trigger semantics and the confusion between 
> triggers that purge the window contents and those that don’t. (For example, 
> using a ContinuousEventTimeTrigger with EventTimeWindows assigner is a bad 
> idea because state will be kept indefinitely.) While working on this we 
> should also tackle the issue of providing composite triggers such as 
> Repeatedly (fires a child-trigger repeatedly), Any (fires when any child 
> trigger fires) and All (fires when all child triggers fire).
> 
> Lateness. Right now, it is possible to write custom triggers that can deal 
> with late elements and can even behave differently based on the amount of 
> lateness. There is, however, no API for dealing with lateness. We should 
> address this.
> 
> The third issue is Trigger testability. We should introduce a testing harness 
> for triggers and move the processing time triggers to use a clock provider 
> instead of directly using System.currentTimeMillis(). This will allow testing 
> them deterministically.
> 
> All of these are expanded upon in the document I linked to before: 
> https://docs.google.com/document/d/1Xp-YBf87vLTduYSivgqWVEMjYUmkA-hyb4muX3KRl08/edit?usp=sharing
>  I think all of this is very important for people working on event-time based 
> pipelines.
> 
> Feedback is very welcome and I hope that we can expand the document together 
> and come up with good solutions.
> 
> Cheers,
> Aljoscha
> > On 21 Mar 2016, at 17:46, Aljoscha Krettek  wrote:
> >
> > Hi,
> > I’m also sending this to @user because the Trigger API concerns users 
> > directly.
> >
> > There are some things in the Trigger API that I think require some 
> > improvements. The issues are trigger testability, fire semantics and 
> > composite triggers and lateness. I started a document to keep track of 
> > things 
> > (https://docs.google.com/document/d/1Xp-YBf87vLTduYSivgqWVEMjYUmkA-hyb4muX3KRl08/edit?usp=sharing).
> >  Please read it if you are interested and want to get involved in this. 
> > We’ll evolve the document together and come up with Jira issues for the 
> > subtasks.
> >
> > Cheers,
> > Aljoscha
> 
> 



[jira] [Created] (FLINK-3645) HDFSCopyUtilitiesTest fails in a Hadoop cluster

2016-03-22 Thread Chiwan Park (JIRA)
Chiwan Park created FLINK-3645:
--

 Summary: HDFSCopyUtilitiesTest fails in a Hadoop cluster
 Key: FLINK-3645
 URL: https://issues.apache.org/jira/browse/FLINK-3645
 Project: Flink
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.1.0
Reporter: Chiwan Park
Assignee: Chiwan Park
Priority: Minor


The {{HDFSCopyUtilitiesTest}} class tests the {{HDFSCopyFromLocal.copyFromLocal}}
and {{HDFSCopyToLocal.copyToLocal}} methods. This test fails when run on a
machine where Hadoop is installed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] Improving Trigger/Window API and Semantics

2016-03-22 Thread Fabian Hueske
Thanks for the write-up Aljoscha.
I think it is a really good idea to separate the different aspects (fire,
purging, lateness) a bit. At the moment, all of these need to be handled in
the Trigger, and a custom trigger is necessary whenever you want some of
these aspects handled slightly differently. This makes the Trigger
interface and implementations of it really hard to understand.

+1 for the suggested changes.
Are there plans to touch the Evictor interface as well? IMO, this needs a
redesign as well.

Fabian

2016-03-21 19:21 GMT+01:00 Aljoscha Krettek :

> Hi,
> my previous message might be a bit hard to parse for people that are not
> very deep into the Trigger implementation. So I’ll try to give a bit more
> explanation right in the mail.
>
> The basic idea is that we observed some basic problems that keep coming up
> for people on the mailing lists and I want to try and address them.
>
> The first problem is with the Trigger semantics and the confusion between
> triggers that purge the window contents and those that don’t. (For example,
> using a ContinuousEventTimeTrigger with EventTimeWindows assigner is a bad
> idea because state will be kept indefinitely.) While working on this we
> should also tackle the issue of providing composite triggers such as
> Repeatedly (fires a child-trigger repeatedly), Any (fires when any child
> trigger fires) and All (fires when all child triggers fire).
>
> Lateness. Right now, it is possible to write custom triggers that can deal
> with late elements and can even behave differently based on the amount of
> lateness. There is, however, no API for dealing with lateness. We should
> address this.
>
> The third issue is Trigger testability. We should introduce a testing
> harness for triggers and move the processing time triggers to use a clock
> provider instead of directly using System.currentTimeMillis(). This will
> allow testing them deterministically.
>
> All of these are expanded upon in the document I linked to before:
> https://docs.google.com/document/d/1Xp-YBf87vLTduYSivgqWVEMjYUmkA-hyb4muX3KRl08/edit?usp=sharing
> I think all of this is very important for people working on event-time
> based pipelines.
>
> Feedback is very welcome and I hope that we can expand the document
> together and come up with good solutions.
>
> Cheers,
> Aljoscha
> > On 21 Mar 2016, at 17:46, Aljoscha Krettek  wrote:
> >
> > Hi,
> > I’m also sending this to @user because the Trigger API concerns users
> directly.
> >
> > There are some things in the Trigger API that I think require some
> improvements. The issues are trigger testability, fire semantics and
> composite triggers and lateness. I started a document to keep track of
> things (
> https://docs.google.com/document/d/1Xp-YBf87vLTduYSivgqWVEMjYUmkA-hyb4muX3KRl08/edit?usp=sharing).
> Please read it if you are interested and want to get involved in this.
> We’ll evolve the document together and come up with Jira issues for the
> subtasks.
> >
> > Cheers,
> > Aljoscha
>
>