Connected Streams - Controlling Order of arrival on the two streams

2016-08-09 Thread Sameer W
Hi,

I am using connected streams to send rules coded as JavaScript functions on
one stream and event data on another stream. They are both keyed by the
device id. The rules are cached in the co-map operation until another rule
arrives to override existing rule.

Is there a way to ensure that the rules stream arrives before the event
data stream. I am assuming there is no guarantee for this and I cache the
event data is the rules have not yet arrived and process and clear the
cache when the rules arrive. The rules are expected to arrive before the
event data. I am only using this method as a precautionary measure in case
the rules arrive late for reasons unrelated to when they were sent.

Is there a way to handle this situation without caching the streams?


Thanks,
Sameer


Window function - iterator data

2016-08-09 Thread Paul Joireman
When you are using a window function the docs:


https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/windows.html#windowfunction---the-generic-case


state that


A WindowFunction gets an Iterable containing all the elements of the window 
being processed


If the input data stream is timestamped using event times where the events can 
come in out of chronological order, are the events returned by the interator 
the same order as they were added?  Or does the window sort them internally on 
timestamp and return them through the iterator in timestamp chronological order?


Paul


Re: Flink : CEP processing

2016-08-09 Thread Sameer W
In one of the earlier thread Till explained this to me (
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/CEP-and-Within-Clause-td8159.html
)

1. Within does not use time windows. It sort of uses session windows where
the session begins when the first event of the pattern is identified. The
timer starts when the "first" event in the pattern fires. If the pattern
completes "within" the designated times (meaning the "next" and "followed
by" fire as will "within" the time specified) you have a match or else the
window is removed. I don't know how it is implemented but I doubt it stores
all the events in memory for the "within" window (there is not need to). It
will only store the relevant events (first, next, followed by, etc). So
memory would not be an issue here. If two "first" type events are
identified I think two "within" sessions are created.

2. Snapshotting (I don't know much in this area so I cannot answer). Why
should it be different though? You are using operators and state. It should
work the same way. But I am not too familiar with that.

3. The "Within" window is not an issue. Even the window preceding that
should not be unless you are using WindowFunction (more memory friendly
alternative is https://ci.apache.org/projects/flink/flink-docs-
master/apis/streaming/windows.html#window-functions ) by themselves and
using a really large window

4. The way I am using it, it is working fine. Some of the limitations I
have seen are related to this paper not being fully implemented (
https://people.cs.umass.edu/~yanlei/publications/sase-sigmod08.pdf). I
don't know how to support negation in an event stream but I don't need it
for now.

Thanks,
Sameer


On Tue, Aug 9, 2016 at 3:45 PM, M Singh  wrote:

> Hi Sameer:
>
> If we use a within window for event series -
>
> 1. Does it interfere with the default time windows ?
> 2. How does it affect snapshotting ?
> 3. If the window is too large are the events stored in a "processor" for
> the window to expire ?
> 4. Are there any other know limitations and best practices of using CEP
> with Flink ?
>
> Thanks again for your help.
>
>
>
> On Tuesday, August 9, 2016 11:29 AM, Sameer Wadkar 
> wrote:
>
>
> In that case you need to get them into one stream somehow (keyBy a dummy
> value for example). There is always some logical key to keyBy on when data
> is arriving from multiple sources (ex some portion of the time stamp).
>
> You are looking for patterns within something (events happening around the
> same time but arriving from multiple devices). That something should be the
> key. That's how I am using it.
>
> Sameer
>
> Sent from my iPhone
>
> On Aug 9, 2016, at 1:40 PM, M Singh  wrote:
>
> Thanks Sameer.
>
> So does that mean that if the events keys are not same we cannot use the
> CEP pattern match ?  What if events are coming from different sources and
> need to be correlated ?
>
> Mans
>
>
> On Tuesday, August 9, 2016 9:40 AM, Sameer W  wrote:
>
>
> Hi,
>
> You will need to use keyBy operation first to get all the events you need
> monitored in a pattern on the same node. Only then can you apply Pattern
> because it depends on the order of the events (first, next, followed by). I
> even had to make sure that the events were correctly sorted by timestamps
> to ensure that the first,next and followed by works correctly.
>
> Sameer
>
> On Tue, Aug 9, 2016 at 12:17 PM, M Singh  wrote:
>
> Hey Folks:
>
> I have a question about CEP processing in Flink - How does flink
> processing work when we have multiple partitions in which the events used
> in the pattern sequence might be scattered across multiple partitions on
> multiple nodes ?
>
> Thanks for your insight.
>
> Mans
>
>
>
>
>
>
>


Re: Flink : CEP processing

2016-08-09 Thread M Singh
Hi Sameer:
If we use a within window for event series - 
1. Does it interfere with the default time windows ?2. How does it affect 
snapshotting ?  3. If the window is too large are the events stored in a 
"processor" for the window to expire ?4. Are there any other know limitations 
and best practices of using CEP with Flink ?
Thanks again for your help.
 

On Tuesday, August 9, 2016 11:29 AM, Sameer Wadkar  
wrote:
 

 In that case you need to get them into one stream somehow (keyBy a dummy value 
for example). There is always some logical key to keyBy on when data is 
arriving from multiple sources (ex some portion of the time stamp). 
You are looking for patterns within something (events happening around the same 
time but arriving from multiple devices). That something should be the key. 
That's how I am using it. 

Sameer
Sent from my iPhone
On Aug 9, 2016, at 1:40 PM, M Singh  wrote:


Thanks Sameer.
So does that mean that if the events keys are not same we cannot use the CEP 
pattern match ?  What if events are coming from different sources and need to 
be correlated ?
Mans 

On Tuesday, August 9, 2016 9:40 AM, Sameer W  wrote:
 

 Hi,
You will need to use keyBy operation first to get all the events you need 
monitored in a pattern on the same node. Only then can you apply Pattern 
because it depends on the order of the events (first, next, followed by). I 
even had to make sure that the events were correctly sorted by timestamps to 
ensure that the first,next and followed by works correctly.
Sameer
On Tue, Aug 9, 2016 at 12:17 PM, M Singh  wrote:

Hey Folks:
I have a question about CEP processing in Flink - How does flink processing 
work when we have multiple partitions in which the events used in the pattern 
sequence might be scattered across multiple partitions on multiple nodes ?
Thanks for your insight.
Mans



   


  

Re: Flink : CEP processing

2016-08-09 Thread Sameer Wadkar
In that case you need to get them into one stream somehow (keyBy a dummy value 
for example). There is always some logical key to keyBy on when data is 
arriving from multiple sources (ex some portion of the time stamp). 

You are looking for patterns within something (events happening around the same 
time but arriving from multiple devices). That something should be the key. 
That's how I am using it. 

Sameer

Sent from my iPhone

> On Aug 9, 2016, at 1:40 PM, M Singh  wrote:
> 
> Thanks Sameer.
> 
> So does that mean that if the events keys are not same we cannot use the CEP 
> pattern match ?  What if events are coming from different sources and need to 
> be correlated ?
> 
> Mans
> 
> 
> On Tuesday, August 9, 2016 9:40 AM, Sameer W  wrote:
> 
> 
> Hi,
> 
> You will need to use keyBy operation first to get all the events you need 
> monitored in a pattern on the same node. Only then can you apply Pattern 
> because it depends on the order of the events (first, next, followed by). I 
> even had to make sure that the events were correctly sorted by timestamps to 
> ensure that the first,next and followed by works correctly.
> 
> Sameer
> 
> On Tue, Aug 9, 2016 at 12:17 PM, M Singh  wrote:
> Hey Folks:
> 
> I have a question about CEP processing in Flink - How does flink processing 
> work when we have multiple partitions in which the events used in the pattern 
> sequence might be scattered across multiple partitions on multiple nodes ?
> 
> Thanks for your insight.
> 
> Mans
> 
> 
> 


Re: Flink : CEP processing

2016-08-09 Thread M Singh
Thanks Sameer.
So does that mean that if the events keys are not same we cannot use the CEP 
pattern match ?  What if events are coming from different sources and need to 
be correlated ?
Mans 

On Tuesday, August 9, 2016 9:40 AM, Sameer W  wrote:
 

 Hi,
You will need to use keyBy operation first to get all the events you need 
monitored in a pattern on the same node. Only then can you apply Pattern 
because it depends on the order of the events (first, next, followed by). I 
even had to make sure that the events were correctly sorted by timestamps to 
ensure that the first,next and followed by works correctly.
Sameer
On Tue, Aug 9, 2016 at 12:17 PM, M Singh  wrote:

Hey Folks:
I have a question about CEP processing in Flink - How does flink processing 
work when we have multiple partitions in which the events used in the pattern 
sequence might be scattered across multiple partitions on multiple nodes ?
Thanks for your insight.
Mans



  

Re: Flink : CEP processing

2016-08-09 Thread Sameer W
Hi,

You will need to use keyBy operation first to get all the events you need
monitored in a pattern on the same node. Only then can you apply Pattern
because it depends on the order of the events (first, next, followed by). I
even had to make sure that the events were correctly sorted by timestamps
to ensure that the first,next and followed by works correctly.

Sameer

On Tue, Aug 9, 2016 at 12:17 PM, M Singh  wrote:

> Hey Folks:
>
> I have a question about CEP processing in Flink - How does flink
> processing work when we have multiple partitions in which the events used
> in the pattern sequence might be scattered across multiple partitions on
> multiple nodes ?
>
> Thanks for your insight.
>
> Mans
>


Re: Classloader issue using AvroParquetInputFormat via HadoopInputFormat

2016-08-09 Thread Ufuk Celebi
I've started a vote for 1.1.1 containing hopefully fixed artifacts. If
you have any spare time, would you mind checking whether it fixes your
problem?

The artifacts are here: http://home.apache.org/~uce/flink-1.1.1-rc1/

You would have to add the following repository to your Maven project
and update the Flink version to 1.1.1:



flink-rc
flink-rc
https://repository.apache.org/content/repositories/orgapacheflink-1101

true


false




Would really appreciate it!

On Tue, Aug 9, 2016 at 2:11 PM, Ufuk Celebi  wrote:
> This is a problem with the Maven artifacts of 1.1.0 :-( I've added a
> warning to the release note and will start a emergency vote for 1.1.1
> which only updates the Maven artifacts.
>
> On Tue, Aug 9, 2016 at 1:20 PM, LINZ, Arnaud  wrote:
>> Okay,
>>
>> That would also solve my issue.
>>
>> Greetings,
>>
>> Arnaud
>>
>>
>>
>> De : Stephan Ewen [mailto:se...@apache.org]
>> Envoyé : mardi 9 août 2016 12:41
>> À : user@flink.apache.org
>> Objet : Re: Classloader issue using AvroParquetInputFormat via
>> HadoopInputFormat
>>
>>
>>
>> Hi Shannon!
>>
>>
>>
>> It seams that the something in the maven deployment went wrong with this
>> release.
>>
>>
>>
>> There should be:
>>
>>   - flink-java (the default, with a transitive dependency to hadoop 2.x for
>> hadoop compatibility features)
>>
>>   - flink-java-hadoop1 (with a transitive dependency for hadoop 1.x fir
>> older hadoop compatibility features)
>>
>>
>>
>> Apparently the "flink-java" artifact git overwritten with the
>> "flink-java-hadoop1" artifact. Damn.
>>
>>
>>
>> I think we need to release new artifacts that fix these dependency
>> descriptors.
>>
>>
>>
>> That needs to be a 1.1.1 release, because maven artifacts cannot be changed
>> after they were deployed.
>>
>>
>>
>> Greetings,
>> Stephan
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Mon, Aug 8, 2016 at 11:08 PM, Shannon Carey  wrote:
>>
>> Correction: I cannot work around the problem. If I exclude hadoop1, I get
>> the following exception which appears to be due to flink-java-1.1.0's
>> dependency on Hadoop1.
>>
>>
>>
>> Failed to submit job 4b6366d101877d38ef33454acc6ca500
>> (com.expedia.www.flink.jobs.DestinationCountsHistoryJob$)
>>
>> org.apache.flink.runtime.client.JobExecutionException: Failed to submit job
>> 4b6366d101877d38ef33454acc6ca500
>> (com.expedia.www.flink.jobs.DestinationCountsHistoryJob$)
>>
>> at
>> org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$submitJob(JobManager.scala:1281)
>>
>> at
>> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:478)
>>
>> at
>> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>>
>> at
>> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:36)
>>
>> at
>> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>>
>> at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>>
>> at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>>
>> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>>
>> at
>> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>>
>> at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>
>> at
>> org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:121)
>>
>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>
>> at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>>
>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>>
>> at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>>
>> at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>>
>> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>
>> at
>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>
>> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>
>> at
>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>
>> Caused by: org.apache.flink.runtime.JobException: Creating the input splits
>> caused an error: Found interface org.apache.hadoop.mapreduce.JobContext, but
>> class was expected
>>
>> at
>> org.apache.flink.runtime.executiongraph.ExecutionJobVertex.(ExecutionJobVertex.java:172)
>>
>> at
>> org.apache.flink.runtime.executiongraph.ExecutionGraph.attachJobGraph(ExecutionGraph.java:695)
>>
>> at
>> org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$submitJob(JobManager.scala:1178)
>>
>> ... 19 more
>>
>> Caused by: java.lang.IncompatibleClassChangeError: Found interface
>> org.apache.hadoop.mapreduce.JobContext, but class was expected
>>
>> at
>> org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormatBase.createInputSplits(HadoopInputFormatBase.java:158)
>>
>> at
>> 

Re: Flink 1.1.0 : Hadoop 1/2 compatibility mismatch

2016-08-09 Thread Ufuk Celebi
I've started a vote for 1.1.1 containing hopefully fixed artifacts. If
you have any spare time, would you mind checking whether it fixes your
problem?

The artifacts are here: http://home.apache.org/~uce/flink-1.1.1-rc1/

You would have to add the following repository to your Maven project
and update the Flink version to 1.1.1:



flink-rc
flink-rc
https://repository.apache.org/content/repositories/orgapacheflink-1101

true


false




Would really appreciate it!


On Tue, Aug 9, 2016 at 2:11 PM, Ufuk Celebi  wrote:
> As noted in the other thread, this is a problem with the Maven
> artifacts of 1.1.0 :-( I've added a warning to the release note and
> will start a emergency vote for 1.1.1 which only updates the Maven
> artifacts.
>
> On Tue, Aug 9, 2016 at 9:45 AM, LINZ, Arnaud  wrote:
>> Hello,
>>
>>
>>
>> I’ve switched to 1.1.0, but part of my code doesn’t work any longer.
>>
>>
>>
>> Despite the fact that I have no Hadoop 1 jar in my dependencies (2.7.1
>> clients & flink-hadoop-compatibility_2.10 1.1.0), I have a weird JobContext
>> version mismatch error, that I was unable to understand.
>>
>>
>>
>> Code is a hive table read in a local batch flink cluster using a M/R job
>> (from good package mapreduce, not mapred).
>>
>>
>>
>> import org.apache.hadoop.mapreduce.InputFormat;
>>
>> import org.apache.hadoop.mapreduce.Job;
>>
>> import org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat;
>>
>> (…)
>>
>> final Job job = Job.getInstance();
>>
>> final InputFormat hCatInputFormat =
>> (InputFormat) HCatInputFormat.setInput(job, table.getDbName(),
>> table.getTableName(), filter);
>>
>>
>>
>> final HadoopInputFormat inputFormat
>> = new HadoopInputFormat>
>> DefaultHCatRecord>(hCatInputFormat, NullWritable.class,
>> DefaultHCatRecord.class,  job);
>>
>>
>>
>>
>>
>> final HCatSchema inputSchema =
>> HCatInputFormat.getTableSchema(job.getConfiguration());
>>
>> return cluster
>>
>> .createInput(inputFormat)
>>
>> .flatMap(new RichFlatMapFunction> DefaultHCatRecord>, T>() {
>>
>> @Override
>>
>> public void flatMap(Tuple2> DefaultHCatRecord> value,
>>
>> Collector out) throws Exception { // NOPMD
>>
>> (...)
>>
>> }
>>
>> }).returns(beanClass);
>>
>>
>>
>>
>>
>> Exception is :
>>
>> org.apache.flink.runtime.client.JobExecutionException: Failed to submit job
>> 69dba7e4d79c05d2967dca4d4a27cf38 (Flink Java Job at Tue Aug 09 09:19:41 CEST
>> 2016)
>>
>> at
>> org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$submitJob(JobManager.scala:1281)
>>
>> at
>> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:478)
>>
>> at
>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>>
>> at
>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>>
>> at
>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>>
>> at
>> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:36)
>>
>> at
>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>>
>> at
>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>>
>> at
>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>>
>> at
>> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>>
>> at
>> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>>
>> at
>> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
>>
>> at
>> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>>
>> at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>
>> at
>> org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:121)
>>
>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>
>> at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>>
>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>>
>> at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>>
>> at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>>
>> at
>> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>
>> at
>> 

Flink : CEP processing

2016-08-09 Thread M Singh
Hey Folks:
I have a question about CEP processing in Flink - How does flink processing 
work when we have multiple partitions in which the events used in the pattern 
sequence might be scattered across multiple partitions on multiple nodes ?
Thanks for your insight.
Mans

Re: Release notes 1.1.0?

2016-08-09 Thread Andrew Ge Wu
Oh sorry missed that part, no, Im not explicitly set that.


> On 09 Aug 2016, at 15:29, Aljoscha Krettek  wrote:
> 
> Hi,
> are you setting a StreamTimeCharacteristic, i.e. 
> env.setStreamTimeCharacteristic?
> 
> Cheers,
> Aljoscha
> 
> On Tue, 9 Aug 2016 at 14:52 Andrew Ge Wu  > wrote:
> Hi Aljoscha
> 
> 
> Plan attached, there are split streams and union operations around, but here 
> is how windows are created
> 
> Confidentiality Notice: This e-mail transmission may contain confidential or 
> legally privileged information that is intended only for the individual or 
> entity named in the e-mail address. If you are not the intended recipient, 
> you are hereby notified that any disclosure, copying, distribution, or 
> reliance upon the contents of this e-mail is strictly prohibited and may be 
> unlawful. If you have received this e-mail in error, please notify the sender 
> immediately by return e-mail and delete all copies of this message.
> 
> Let me know if I’m doing something out of ordinary here.
> 
> 
> 
> Thanks!
> 
> 
> Andrew
>> On 09 Aug 2016, at 14:18, Aljoscha Krettek > > wrote:
>> 
> 
>> Hi,
>> could you maybe post how exactly you specify the window? Also, did you set a 
>> "stream time characteristic", for example EventTime?
>> 
>> That could help us pinpoint the problem.
>> 
>> Cheers,
>> Aljoscha
>> 
> 
>> On Tue, 9 Aug 2016 at 12:42 Andrew Ge Wu > > wrote:
> 
>> I rolled back to 1.0.3
> 
>> If I understand this correctly, the peak when topology starts is because it 
>> is trying to fill all the buffers, but I can not see that in 1.1.0.
>> 
>> 
>> 
> 
>>> On 09 Aug 2016, at 12:10, Robert Metzger >> > wrote:
>>> 
>> 
>>> Which source are you using?
>>> 
>>> On Tue, Aug 9, 2016 at 11:50 AM, Andrew Ge Wu >> > wrote:
>>> Hi Robert
>>> 
>>> 
>>> Thanks for the quick reply, I guess I’m one of the early birds.
>>> Yes, it is much slower, I’m not sure why, I copied slaves, masters, 
>>> log4j.properties and flink-conf.yaml directly from 1.0.3
>>> I have parallelization 1 on my sources, I can increase that to achieve the 
>>> same speed, but I’m interested to know why is that.
>>> 
>>> 
>>> Thanks!
>>> 
>>> 
>>> Andrew
 On 09 Aug 2016, at 11:47, Robert Metzger > wrote:
 
 Hi Andrew,
 
 here is the release announcement, with a list of all changes: 
 http://flink.apache.org/news/2016/08/08/release-1.1.0.html 
 , 
 http://flink.apache.org/blog/release_1.1.0-changelog.html 
 
 
 What does the chart say? Are the results different? is Flink faster or 
 slower now?
 
 
 Regards,
 Robert
 
 On Tue, Aug 9, 2016 at 11:32 AM, Andrew Ge Wu > wrote:
 Hi,
 
 We found out there is a new stable version released: 1.1.0 but we can not 
 find any release note.
 Do anyone know where to find it?
 
 
 We are experience some change of behavior, I’m not sure if it is related.
 
 
 
 Thanks
 
 
 Andrew
 
 Confidentiality Notice: This e-mail transmission may contain confidential 
 or legally privileged information that is intended only for the individual 
 or entity named in the e-mail address. If you are not the intended 
 recipient, you are hereby notified that any disclosure, copying, 
 distribution, or reliance upon the contents of this e-mail is strictly 
 prohibited and may be unlawful. If you have received this e-mail in error, 
 please notify the sender immediately by return e-mail and delete all 
 copies of this message.
 
>>> 
>>> 
>>> Confidentiality Notice: This e-mail transmission may contain confidential 
>>> or legally privileged information that is intended only for the individual 
>>> or entity named in the e-mail address. If you are not the intended 
>>> recipient, you are hereby notified that any disclosure, copying, 
>>> distribution, or reliance upon the contents of this e-mail is strictly 
>>> prohibited and may be unlawful. If you have received this e-mail in error, 
>>> please notify the sender immediately by return e-mail and delete all copies 
>>> of this message.
>>> 
>> 
>> Confidentiality Notice: This e-mail transmission may contain confidential or 
>> legally privileged information that is intended only for the individual or 
>> entity named in the e-mail address. If you are not the intended recipient, 
>> you are hereby notified that any disclosure, copying, distribution, or 
>> reliance upon the 

Re: Release notes 1.1.0?

2016-08-09 Thread Aljoscha Krettek
Hi,
are you setting a StreamTimeCharacteristic, i.e.
env.setStreamTimeCharacteristic?

Cheers,
Aljoscha

On Tue, 9 Aug 2016 at 14:52 Andrew Ge Wu  wrote:

> Hi Aljoscha
>
>
> Plan attached, there are split streams and union operations around, but
> here is how windows are created
>
> Confidentiality Notice: This e-mail transmission may contain confidential
> or legally privileged information that is intended only for the individual
> or entity named in the e-mail address. If you are not the intended
> recipient, you are hereby notified that any disclosure, copying,
> distribution, or reliance upon the contents of this e-mail is strictly
> prohibited and may be unlawful. If you have received this e-mail in error,
> please notify the sender immediately by return e-mail and delete all copies
> of this message.
>
> Let me know if I’m doing something out of ordinary here.
>
>
>
> Thanks!
>
>
> Andrew
>
> On 09 Aug 2016, at 14:18, Aljoscha Krettek  wrote:
>
> Hi,
> could you maybe post how exactly you specify the window? Also, did you set
> a "stream time characteristic", for example EventTime?
>
> That could help us pinpoint the problem.
>
> Cheers,
> Aljoscha
>
> On Tue, 9 Aug 2016 at 12:42 Andrew Ge Wu  wrote:
>
> I rolled back to 1.0.3
>>
> If I understand this correctly, the peak when topology starts is because
>> it is trying to fill all the buffers, but I can not see that in 1.1.0.
>>
>>
>>
>> On 09 Aug 2016, at 12:10, Robert Metzger  wrote:
>>
>> Which source are you using?
>>
>> On Tue, Aug 9, 2016 at 11:50 AM, Andrew Ge Wu 
>> wrote:
>>
>>> Hi Robert
>>>
>>>
>>> Thanks for the quick reply, I guess I’m one of the early birds.
>>> Yes, it is much slower, I’m not sure why, I copied slaves, masters,
>>> log4j.properties and flink-conf.yaml directly from 1.0.3
>>> I have parallelization 1 on my sources, I can increase that to achieve
>>> the same speed, but I’m interested to know why is that.
>>>
>>>
>>> Thanks!
>>>
>>>
>>> Andrew
>>>
>>> On 09 Aug 2016, at 11:47, Robert Metzger  wrote:
>>>
>>> Hi Andrew,
>>>
>>> here is the release announcement, with a list of all changes:
>>> http://flink.apache.org/news/2016/08/08/release-1.1.0.html,
>>> http://flink.apache.org/blog/release_1.1.0-changelog.html
>>>
>>> What does the chart say? Are the results different? is Flink faster or
>>> slower now?
>>>
>>>
>>> Regards,
>>> Robert
>>>
>>> On Tue, Aug 9, 2016 at 11:32 AM, Andrew Ge Wu 
>>> wrote:
>>>
 Hi,

 We found out there is a new stable version released: 1.1.0 but we can
 not find any release note.
 Do anyone know where to find it?


 We are experience some change of behavior, I’m not sure if it is
 related.

 

 Thanks


 Andrew

 Confidentiality Notice: This e-mail transmission may contain
 confidential or legally privileged information that is intended only for
 the individual or entity named in the e-mail address. If you are not the
 intended recipient, you are hereby notified that any disclosure, copying,
 distribution, or reliance upon the contents of this e-mail is strictly
 prohibited and may be unlawful. If you have received this e-mail in error,
 please notify the sender immediately by return e-mail and delete all copies
 of this message.
>>>
>>>
>>>
>>>
>>> Confidentiality Notice: This e-mail transmission may contain
>>> confidential or legally privileged information that is intended only for
>>> the individual or entity named in the e-mail address. If you are not the
>>> intended recipient, you are hereby notified that any disclosure, copying,
>>> distribution, or reliance upon the contents of this e-mail is strictly
>>> prohibited and may be unlawful. If you have received this e-mail in error,
>>> please notify the sender immediately by return e-mail and delete all copies
>>> of this message.
>>>
>>
>>
>> Confidentiality Notice: This e-mail transmission may contain confidential
>> or legally privileged information that is intended only for the individual
>> or entity named in the e-mail address. If you are not the intended
>> recipient, you are hereby notified that any disclosure, copying,
>> distribution, or reliance upon the contents of this e-mail is strictly
>> prohibited and may be unlawful. If you have received this e-mail in error,
>> please notify the sender immediately by return e-mail and delete all copies
>> of this message.
>
>


Re: Release notes 1.1.0?

2016-08-09 Thread Aljoscha Krettek
Hi,
could you maybe post how exactly you specify the window? Also, did you set
a "stream time characteristic", for example EventTime?

That could help us pinpoint the problem.

Cheers,
Aljoscha

On Tue, 9 Aug 2016 at 12:42 Andrew Ge Wu  wrote:

> I rolled back to 1.0.3
> If I understand this correctly, the peak when topology starts is because
> it is trying to fill all the buffers, but I can not see that in 1.1.0.
>
>
>
> On 09 Aug 2016, at 12:10, Robert Metzger  wrote:
>
> Which source are you using?
>
> On Tue, Aug 9, 2016 at 11:50 AM, Andrew Ge Wu 
> wrote:
>
>> Hi Robert
>>
>>
>> Thanks for the quick reply, I guess I’m one of the early birds.
>> Yes, it is much slower, I’m not sure why, I copied slaves, masters,
>> log4j.properties and flink-conf.yaml directly from 1.0.3
>> I have parallelization 1 on my sources, I can increase that to achieve
>> the same speed, but I’m interested to know why is that.
>>
>>
>> Thanks!
>>
>>
>> Andrew
>>
>> On 09 Aug 2016, at 11:47, Robert Metzger  wrote:
>>
>> Hi Andrew,
>>
>> here is the release announcement, with a list of all changes:
>> http://flink.apache.org/news/2016/08/08/release-1.1.0.html,
>> http://flink.apache.org/blog/release_1.1.0-changelog.html
>>
>> What does the chart say? Are the results different? is Flink faster or
>> slower now?
>>
>>
>> Regards,
>> Robert
>>
>> On Tue, Aug 9, 2016 at 11:32 AM, Andrew Ge Wu 
>> wrote:
>>
>>> Hi,
>>>
>>> We found out there is a new stable version released: 1.1.0 but we can
>>> not find any release note.
>>> Do anyone know where to find it?
>>>
>>>
>>> We are experience some change of behavior, I’m not sure if it is related.
>>>
>>> 
>>>
>>> Thanks
>>>
>>>
>>> Andrew
>>>
>>> Confidentiality Notice: This e-mail transmission may contain
>>> confidential or legally privileged information that is intended only for
>>> the individual or entity named in the e-mail address. If you are not the
>>> intended recipient, you are hereby notified that any disclosure, copying,
>>> distribution, or reliance upon the contents of this e-mail is strictly
>>> prohibited and may be unlawful. If you have received this e-mail in error,
>>> please notify the sender immediately by return e-mail and delete all copies
>>> of this message.
>>
>>
>>
>>
>> Confidentiality Notice: This e-mail transmission may contain confidential
>> or legally privileged information that is intended only for the individual
>> or entity named in the e-mail address. If you are not the intended
>> recipient, you are hereby notified that any disclosure, copying,
>> distribution, or reliance upon the contents of this e-mail is strictly
>> prohibited and may be unlawful. If you have received this e-mail in error,
>> please notify the sender immediately by return e-mail and delete all copies
>> of this message.
>>
>
>
> Confidentiality Notice: This e-mail transmission may contain confidential
> or legally privileged information that is intended only for the individual
> or entity named in the e-mail address. If you are not the intended
> recipient, you are hereby notified that any disclosure, copying,
> distribution, or reliance upon the contents of this e-mail is strictly
> prohibited and may be unlawful. If you have received this e-mail in error,
> please notify the sender immediately by return e-mail and delete all copies
> of this message.


Re: Classloader issue using AvroParquetInputFormat via HadoopInputFormat

2016-08-09 Thread Ufuk Celebi
This is a problem with the Maven artifacts of 1.1.0 :-( I've added a
warning to the release note and will start a emergency vote for 1.1.1
which only updates the Maven artifacts.

On Tue, Aug 9, 2016 at 1:20 PM, LINZ, Arnaud  wrote:
> Okay,
>
> That would also solve my issue.
>
> Greetings,
>
> Arnaud
>
>
>
> De : Stephan Ewen [mailto:se...@apache.org]
> Envoyé : mardi 9 août 2016 12:41
> À : user@flink.apache.org
> Objet : Re: Classloader issue using AvroParquetInputFormat via
> HadoopInputFormat
>
>
>
> Hi Shannon!
>
>
>
> It seams that the something in the maven deployment went wrong with this
> release.
>
>
>
> There should be:
>
>   - flink-java (the default, with a transitive dependency to hadoop 2.x for
> hadoop compatibility features)
>
>   - flink-java-hadoop1 (with a transitive dependency for hadoop 1.x fir
> older hadoop compatibility features)
>
>
>
> Apparently the "flink-java" artifact git overwritten with the
> "flink-java-hadoop1" artifact. Damn.
>
>
>
> I think we need to release new artifacts that fix these dependency
> descriptors.
>
>
>
> That needs to be a 1.1.1 release, because maven artifacts cannot be changed
> after they were deployed.
>
>
>
> Greetings,
> Stephan
>
>
>
>
>
>
>
>
>
>
>
>
>
> On Mon, Aug 8, 2016 at 11:08 PM, Shannon Carey  wrote:
>
> Correction: I cannot work around the problem. If I exclude hadoop1, I get
> the following exception which appears to be due to flink-java-1.1.0's
> dependency on Hadoop1.
>
>
>
> Failed to submit job 4b6366d101877d38ef33454acc6ca500
> (com.expedia.www.flink.jobs.DestinationCountsHistoryJob$)
>
> org.apache.flink.runtime.client.JobExecutionException: Failed to submit job
> 4b6366d101877d38ef33454acc6ca500
> (com.expedia.www.flink.jobs.DestinationCountsHistoryJob$)
>
> at
> org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$submitJob(JobManager.scala:1281)
>
> at
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:478)
>
> at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>
> at
> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:36)
>
> at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>
> at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>
> at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>
> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>
> at
> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>
> at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>
> at
> org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:121)
>
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>
> at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>
> at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>
> at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>
> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>
> at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>
> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>
> at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>
> Caused by: org.apache.flink.runtime.JobException: Creating the input splits
> caused an error: Found interface org.apache.hadoop.mapreduce.JobContext, but
> class was expected
>
> at
> org.apache.flink.runtime.executiongraph.ExecutionJobVertex.(ExecutionJobVertex.java:172)
>
> at
> org.apache.flink.runtime.executiongraph.ExecutionGraph.attachJobGraph(ExecutionGraph.java:695)
>
> at
> org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$submitJob(JobManager.scala:1178)
>
> ... 19 more
>
> Caused by: java.lang.IncompatibleClassChangeError: Found interface
> org.apache.hadoop.mapreduce.JobContext, but class was expected
>
> at
> org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormatBase.createInputSplits(HadoopInputFormatBase.java:158)
>
> at
> org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormatBase.createInputSplits(HadoopInputFormatBase.java:56)
>
> at
> org.apache.flink.runtime.executiongraph.ExecutionJobVertex.(ExecutionJobVertex.java:156)
>
> ... 21 more
>
>
>
> And if I exclude hadoop2, I get the exception from my previous email with
> AvroParquetInputFormat.
>
>
>
>
>
>
>
> From: Shannon Carey 
> Date: Monday, August 8, 2016 at 2:46 PM
> To: "user@flink.apache.org" 
> Subject: Classloader issue using AvroParquetInputFormat via
> HadoopInputFormat
>
>
>
> Hi folks, congrats on 1.1.0!
>
>
>
> FYI, after updating to Flink 1.1.0 I get the exception at bottom when
> attempting to run a job that uses 

Re: Flink 1.1.0 : Hadoop 1/2 compatibility mismatch

2016-08-09 Thread Ufuk Celebi
As noted in the other thread, this is a problem with the Maven
artifacts of 1.1.0 :-( I've added a warning to the release note and
will start a emergency vote for 1.1.1 which only updates the Maven
artifacts.

On Tue, Aug 9, 2016 at 9:45 AM, LINZ, Arnaud  wrote:
> Hello,
>
>
>
> I’ve switched to 1.1.0, but part of my code doesn’t work any longer.
>
>
>
> Despite the fact that I have no Hadoop 1 jar in my dependencies (2.7.1
> clients & flink-hadoop-compatibility_2.10 1.1.0), I have a weird JobContext
> version mismatch error, that I was unable to understand.
>
>
>
> Code is a hive table read in a local batch flink cluster using a M/R job
> (from good package mapreduce, not mapred).
>
>
>
> import org.apache.hadoop.mapreduce.InputFormat;
>
> import org.apache.hadoop.mapreduce.Job;
>
> import org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat;
>
> (…)
>
> final Job job = Job.getInstance();
>
> final InputFormat hCatInputFormat =
> (InputFormat) HCatInputFormat.setInput(job, table.getDbName(),
> table.getTableName(), filter);
>
>
>
> final HadoopInputFormat inputFormat
> = new HadoopInputFormat
> DefaultHCatRecord>(hCatInputFormat, NullWritable.class,
> DefaultHCatRecord.class,  job);
>
>
>
>
>
> final HCatSchema inputSchema =
> HCatInputFormat.getTableSchema(job.getConfiguration());
>
> return cluster
>
> .createInput(inputFormat)
>
> .flatMap(new RichFlatMapFunction DefaultHCatRecord>, T>() {
>
> @Override
>
> public void flatMap(Tuple2 DefaultHCatRecord> value,
>
> Collector out) throws Exception { // NOPMD
>
> (...)
>
> }
>
> }).returns(beanClass);
>
>
>
>
>
> Exception is :
>
> org.apache.flink.runtime.client.JobExecutionException: Failed to submit job
> 69dba7e4d79c05d2967dca4d4a27cf38 (Flink Java Job at Tue Aug 09 09:19:41 CEST
> 2016)
>
> at
> org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$submitJob(JobManager.scala:1281)
>
> at
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:478)
>
> at
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>
> at
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>
> at
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>
> at
> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:36)
>
> at
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>
> at
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>
> at
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>
> at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>
> at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>
> at
> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
>
> at
> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>
> at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>
> at
> org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:121)
>
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>
> at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>
> at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>
> at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>
> at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>
> at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>
> at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>
> at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>
> Caused by: org.apache.flink.runtime.JobException: Creating the input splits
> caused an error: Found interface org.apache.hadoop.mapreduce.JobContext, but
> class was expected
>
> at
> org.apache.flink.runtime.executiongraph.ExecutionJobVertex.(ExecutionJobVertex.java:172)
>
> at
> org.apache.flink.runtime.executiongraph.ExecutionGraph.attachJobGraph(ExecutionGraph.java:695)

RE: Classloader issue using AvroParquetInputFormat via HadoopInputFormat

2016-08-09 Thread LINZ, Arnaud
Okay,
That would also solve my issue.
Greetings,
Arnaud

De : Stephan Ewen [mailto:se...@apache.org]
Envoyé : mardi 9 août 2016 12:41
À : user@flink.apache.org
Objet : Re: Classloader issue using AvroParquetInputFormat via HadoopInputFormat

Hi Shannon!

It seams that the something in the maven deployment went wrong with this 
release.

There should be:
  - flink-java (the default, with a transitive dependency to hadoop 2.x for 
hadoop compatibility features)
  - flink-java-hadoop1 (with a transitive dependency for hadoop 1.x fir older 
hadoop compatibility features)

Apparently the "flink-java" artifact git overwritten with the 
"flink-java-hadoop1" artifact. Damn.

I think we need to release new artifacts that fix these dependency descriptors.

That needs to be a 1.1.1 release, because maven artifacts cannot be changed 
after they were deployed.

Greetings,
Stephan






On Mon, Aug 8, 2016 at 11:08 PM, Shannon Carey 
> wrote:
Correction: I cannot work around the problem. If I exclude hadoop1, I get the 
following exception which appears to be due to flink-java-1.1.0's dependency on 
Hadoop1.

Failed to submit job 4b6366d101877d38ef33454acc6ca500 
(com.expedia.www.flink.jobs.DestinationCountsHistoryJob$)
org.apache.flink.runtime.client.JobExecutionException: Failed to submit job 
4b6366d101877d38ef33454acc6ca500 
(com.expedia.www.flink.jobs.DestinationCountsHistoryJob$)
at 
org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$submitJob(JobManager.scala:1281)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:478)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at 
org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:36)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
at 
org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at 
org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:121)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
at akka.dispatch.Mailbox.run(Mailbox.scala:221)
at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.apache.flink.runtime.JobException: Creating the input splits 
caused an error: Found interface org.apache.hadoop.mapreduce.JobContext, but 
class was expected
at 
org.apache.flink.runtime.executiongraph.ExecutionJobVertex.(ExecutionJobVertex.java:172)
at 
org.apache.flink.runtime.executiongraph.ExecutionGraph.attachJobGraph(ExecutionGraph.java:695)
at 
org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$submitJob(JobManager.scala:1178)
... 19 more
Caused by: java.lang.IncompatibleClassChangeError: Found interface 
org.apache.hadoop.mapreduce.JobContext, but class was expected
at 
org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormatBase.createInputSplits(HadoopInputFormatBase.java:158)
at 
org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormatBase.createInputSplits(HadoopInputFormatBase.java:56)
at 
org.apache.flink.runtime.executiongraph.ExecutionJobVertex.(ExecutionJobVertex.java:156)
... 21 more

And if I exclude hadoop2, I get the exception from my previous email with 
AvroParquetInputFormat.



From: Shannon Carey >
Date: Monday, August 8, 2016 at 2:46 PM
To: "user@flink.apache.org" 
>
Subject: Classloader issue using AvroParquetInputFormat via HadoopInputFormat

Hi folks, congrats on 1.1.0!

FYI, after updating to Flink 1.1.0 I get the exception at bottom when 
attempting to run a job that uses AvroParquetInputFormat wrapped in a Flink 
HadoopInputFormat. The ContextUtil.java:71 is trying to execute:

Class.forName("org.apache.hadoop.mapreduce.task.JobContextImpl");

I am using Scala 2.11.7. JobContextImpl is coming from 
flink-shaded-hadoop2:1.1.0. However, its parent 

Re: Release notes 1.1.0?

2016-08-09 Thread Andrew Ge Wu
I rolled back to 1.0.3

If I understand this correctly, the peak when topology starts is because it is 
trying to fill all the buffers, but I can not see that in 1.1.0.



> On 09 Aug 2016, at 12:10, Robert Metzger  wrote:
> 
> Which source are you using?
> 
> On Tue, Aug 9, 2016 at 11:50 AM, Andrew Ge Wu  > wrote:
> Hi Robert
> 
> 
> Thanks for the quick reply, I guess I’m one of the early birds.
> Yes, it is much slower, I’m not sure why, I copied slaves, masters, 
> log4j.properties and flink-conf.yaml directly from 1.0.3
> I have parallelization 1 on my sources, I can increase that to achieve the 
> same speed, but I’m interested to know why is that.
> 
> 
> Thanks!
> 
> 
> Andrew
>> On 09 Aug 2016, at 11:47, Robert Metzger > > wrote:
>> 
>> Hi Andrew,
>> 
>> here is the release announcement, with a list of all changes: 
>> http://flink.apache.org/news/2016/08/08/release-1.1.0.html 
>> , 
>> http://flink.apache.org/blog/release_1.1.0-changelog.html 
>> 
>> 
>> What does the chart say? Are the results different? is Flink faster or 
>> slower now?
>> 
>> 
>> Regards,
>> Robert
>> 
>> On Tue, Aug 9, 2016 at 11:32 AM, Andrew Ge Wu > > wrote:
>> Hi,
>> 
>> We found out there is a new stable version released: 1.1.0 but we can not 
>> find any release note.
>> Do anyone know where to find it?
>> 
>> 
>> We are experience some change of behavior, I’m not sure if it is related.
>> 
>> 
>> 
>> Thanks
>> 
>> 
>> Andrew
>> 
>> Confidentiality Notice: This e-mail transmission may contain confidential or 
>> legally privileged information that is intended only for the individual or 
>> entity named in the e-mail address. If you are not the intended recipient, 
>> you are hereby notified that any disclosure, copying, distribution, or 
>> reliance upon the contents of this e-mail is strictly prohibited and may be 
>> unlawful. If you have received this e-mail in error, please notify the 
>> sender immediately by return e-mail and delete all copies of this message.
>> 
> 
> 
> Confidentiality Notice: This e-mail transmission may contain confidential or 
> legally privileged information that is intended only for the individual or 
> entity named in the e-mail address. If you are not the intended recipient, 
> you are hereby notified that any disclosure, copying, distribution, or 
> reliance upon the contents of this e-mail is strictly prohibited and may be 
> unlawful. If you have received this e-mail in error, please notify the sender 
> immediately by return e-mail and delete all copies of this message.
> 


-- 
Confidentiality Notice: This e-mail transmission may contain confidential 
or legally privileged information that is intended only for the individual 
or entity named in the e-mail address. If you are not the intended 
recipient, you are hereby notified that any disclosure, copying, 
distribution, or reliance upon the contents of this e-mail is strictly 
prohibited and may be unlawful. If you have received this e-mail in error, 
please notify the sender immediately by return e-mail and delete all copies 
of this message.


Re: Classloader issue using AvroParquetInputFormat via HadoopInputFormat

2016-08-09 Thread Stephan Ewen
Hi Shannon!

It seams that the something in the maven deployment went wrong with this
release.

There should be:
  - *flink-java* (the default, with a transitive dependency to hadoop 2.x
for hadoop compatibility features)
  - *flink-java-hadoop1* (with a transitive dependency for hadoop 1.x fir
older hadoop compatibility features)

Apparently the "flink-java" artifact git overwritten with the
"flink-java-hadoop1" artifact. Damn.

I think we need to release new artifacts that fix these dependency
descriptors.

That needs to be a 1.1.1 release, because maven artifacts cannot be changed
after they were deployed.

Greetings,
Stephan






On Mon, Aug 8, 2016 at 11:08 PM, Shannon Carey  wrote:

> Correction: I cannot work around the problem. If I exclude hadoop1, I get
> the following exception which appears to be due to flink-java-1.1.0's
> dependency on Hadoop1.
>
> Failed to submit job 4b6366d101877d38ef33454acc6ca500 (
> com.expedia.www.flink.jobs.DestinationCountsHistoryJob$)
> org.apache.flink.runtime.client.JobExecutionException: Failed to submit
> job 4b6366d101877d38ef33454acc6ca500 (com.expedia.www.flink.jobs.
> DestinationCountsHistoryJob$)
> at org.apache.flink.runtime.jobmanager.JobManager.org$
> apache$flink$runtime$jobmanager$JobManager$$submitJob(JobManager.scala:
> 1281)
> at org.apache.flink.runtime.jobmanager.JobManager$$
> anonfun$handleMessage$1.applyOrElse(JobManager.scala:478)
> at scala.runtime.AbstractPartialFunction.apply(
> AbstractPartialFunction.scala:36)
> at org.apache.flink.runtime.LeaderSessionMessageFilter$$
> anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:36)
> at scala.runtime.AbstractPartialFunction.apply(
> AbstractPartialFunction.scala:36)
> at org.apache.flink.runtime.LogMessages$$anon$1.apply(
> LogMessages.scala:33)
> at org.apache.flink.runtime.LogMessages$$anon$1.apply(
> LogMessages.scala:28)
> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
> at org.apache.flink.runtime.LogMessages$$anon$1.
> applyOrElse(LogMessages.scala:28)
> at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
> at org.apache.flink.runtime.jobmanager.JobManager.
> aroundReceive(JobManager.scala:121)
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
> at akka.actor.ActorCell.invoke(ActorCell.scala:487)
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
> at akka.dispatch.Mailbox.run(Mailbox.scala:221)
> at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.
> runTask(ForkJoinPool.java:1339)
> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(
> ForkJoinPool.java:1979)
> at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(
> ForkJoinWorkerThread.java:107)
> Caused by: org.apache.flink.runtime.JobException: Creating the input
> splits caused an error: Found interface 
> org.apache.hadoop.mapreduce.JobContext,
> but class was expected
> at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.(
> ExecutionJobVertex.java:172)
> at org.apache.flink.runtime.executiongraph.ExecutionGraph.
> attachJobGraph(ExecutionGraph.java:695)
> at org.apache.flink.runtime.jobmanager.JobManager.org$
> apache$flink$runtime$jobmanager$JobManager$$submitJob(JobManager.scala:
> 1178)
> ... 19 more
> Caused by: java.lang.IncompatibleClassChangeError: Found interface
> org.apache.hadoop.mapreduce.JobContext, but class was expected
> at org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormatBase.
> createInputSplits(HadoopInputFormatBase.java:158)
> at org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormatBase.
> createInputSplits(HadoopInputFormatBase.java:56)
> at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.(
> ExecutionJobVertex.java:156)
> ... 21 more
>
> And if I exclude hadoop2, I get the exception from my previous email with
> AvroParquetInputFormat.
>
>
>
> From: Shannon Carey 
> Date: Monday, August 8, 2016 at 2:46 PM
> To: "user@flink.apache.org" 
> Subject: Classloader issue using AvroParquetInputFormat via
> HadoopInputFormat
>
> Hi folks, congrats on 1.1.0!
>
> FYI, after updating to Flink 1.1.0 I get the exception at bottom when
> attempting to run a job that uses AvroParquetInputFormat wrapped in a Flink
> HadoopInputFormat. The ContextUtil.java:71 is trying to execute:
>
> Class.forName("org.apache.hadoop.mapreduce.task.JobContextImpl");
>
> I am using Scala * 2.11*.7. JobContextImpl is coming from flink-shaded-
> *hadoop2*:1.1.0. However, its parent class (JobContext) is actually being
> loaded (according to output with JVM param "-verbose:class") from the
> flink-shaded-*hadoop1_2.10* jar.
>
> After adding an exclusion on flink-shaded-hadoop1_2.10, the problem
> appears to be resolved. Is that the right way to fix the problem?
>
> From what I can tell, the problem is that the JARs that are deployed to
> Maven Central were 

Re: Release notes 1.1.0?

2016-08-09 Thread Robert Metzger
Which source are you using?

On Tue, Aug 9, 2016 at 11:50 AM, Andrew Ge Wu 
wrote:

> Hi Robert
>
>
> Thanks for the quick reply, I guess I’m one of the early birds.
> Yes, it is much slower, I’m not sure why, I copied slaves, masters,
> log4j.properties and flink-conf.yaml directly from 1.0.3
> I have parallelization 1 on my sources, I can increase that to achieve the
> same speed, but I’m interested to know why is that.
>
>
> Thanks!
>
>
> Andrew
>
> On 09 Aug 2016, at 11:47, Robert Metzger  wrote:
>
> Hi Andrew,
>
> here is the release announcement, with a list of all changes:
> http://flink.apache.org/news/2016/08/08/release-1.1.0.html,
> http://flink.apache.org/blog/release_1.1.0-changelog.html
>
> What does the chart say? Are the results different? is Flink faster or
> slower now?
>
>
> Regards,
> Robert
>
> On Tue, Aug 9, 2016 at 11:32 AM, Andrew Ge Wu 
> wrote:
>
>> Hi,
>>
>> We found out there is a new stable version released: 1.1.0 but we can not
>> find any release note.
>> Do anyone know where to find it?
>>
>>
>> We are experience some change of behavior, I’m not sure if it is related.
>>
>> 
>>
>> Thanks
>>
>>
>> Andrew
>>
>> Confidentiality Notice: This e-mail transmission may contain confidential
>> or legally privileged information that is intended only for the individual
>> or entity named in the e-mail address. If you are not the intended
>> recipient, you are hereby notified that any disclosure, copying,
>> distribution, or reliance upon the contents of this e-mail is strictly
>> prohibited and may be unlawful. If you have received this e-mail in error,
>> please notify the sender immediately by return e-mail and delete all copies
>> of this message.
>
>
>
>
> Confidentiality Notice: This e-mail transmission may contain confidential
> or legally privileged information that is intended only for the individual
> or entity named in the e-mail address. If you are not the intended
> recipient, you are hereby notified that any disclosure, copying,
> distribution, or reliance upon the contents of this e-mail is strictly
> prohibited and may be unlawful. If you have received this e-mail in error,
> please notify the sender immediately by return e-mail and delete all copies
> of this message.
>


Re: Release notes 1.1.0?

2016-08-09 Thread Andrew Ge Wu
Hi Robert


Thanks for the quick reply, I guess I’m one of the early birds.
Yes, it is much slower, I’m not sure why, I copied slaves, masters, 
log4j.properties and flink-conf.yaml directly from 1.0.3
I have parallelization 1 on my sources, I can increase that to achieve the same 
speed, but I’m interested to know why is that.


Thanks!


Andrew
> On 09 Aug 2016, at 11:47, Robert Metzger  wrote:
> 
> Hi Andrew,
> 
> here is the release announcement, with a list of all changes: 
> http://flink.apache.org/news/2016/08/08/release-1.1.0.html 
> , 
> http://flink.apache.org/blog/release_1.1.0-changelog.html 
> 
> 
> What does the chart say? Are the results different? is Flink faster or slower 
> now?
> 
> 
> Regards,
> Robert
> 
> On Tue, Aug 9, 2016 at 11:32 AM, Andrew Ge Wu  > wrote:
> Hi,
> 
> We found out there is a new stable version released: 1.1.0 but we can not 
> find any release note.
> Do anyone know where to find it?
> 
> 
> We are experience some change of behavior, I’m not sure if it is related.
> 
> 
> 
> Thanks
> 
> 
> Andrew
> 
> Confidentiality Notice: This e-mail transmission may contain confidential or 
> legally privileged information that is intended only for the individual or 
> entity named in the e-mail address. If you are not the intended recipient, 
> you are hereby notified that any disclosure, copying, distribution, or 
> reliance upon the contents of this e-mail is strictly prohibited and may be 
> unlawful. If you have received this e-mail in error, please notify the sender 
> immediately by return e-mail and delete all copies of this message.
> 


-- 
Confidentiality Notice: This e-mail transmission may contain confidential 
or legally privileged information that is intended only for the individual 
or entity named in the e-mail address. If you are not the intended 
recipient, you are hereby notified that any disclosure, copying, 
distribution, or reliance upon the contents of this e-mail is strictly 
prohibited and may be unlawful. If you have received this e-mail in error, 
please notify the sender immediately by return e-mail and delete all copies 
of this message.


Re: Release notes 1.1.0?

2016-08-09 Thread Robert Metzger
Hi Andrew,

here is the release announcement, with a list of all changes:
http://flink.apache.org/news/2016/08/08/release-1.1.0.html,
http://flink.apache.org/blog/release_1.1.0-changelog.html

What does the chart say? Are the results different? is Flink faster or
slower now?


Regards,
Robert

On Tue, Aug 9, 2016 at 11:32 AM, Andrew Ge Wu 
wrote:

> Hi,
>
> We found out there is a new stable version released: 1.1.0 but we can not
> find any release note.
> Do anyone know where to find it?
>
>
> We are experience some change of behavior, I’m not sure if it is related.
>
>
> Thanks
>
>
> Andrew
>
> Confidentiality Notice: This e-mail transmission may contain confidential
> or legally privileged information that is intended only for the individual
> or entity named in the e-mail address. If you are not the intended
> recipient, you are hereby notified that any disclosure, copying,
> distribution, or reliance upon the contents of this e-mail is strictly
> prohibited and may be unlawful. If you have received this e-mail in error,
> please notify the sender immediately by return e-mail and delete all copies
> of this message.


Release notes 1.1.0?

2016-08-09 Thread Andrew Ge Wu
Hi,

We found out there is a new stable version released: 1.1.0 but we can not find 
any release note.
Do anyone know where to find it?


We are experience some change of behavior, I’m not sure if it is related.



Thanks


Andrew
-- 
Confidentiality Notice: This e-mail transmission may contain confidential 
or legally privileged information that is intended only for the individual 
or entity named in the e-mail address. If you are not the intended 
recipient, you are hereby notified that any disclosure, copying, 
distribution, or reliance upon the contents of this e-mail is strictly 
prohibited and may be unlawful. If you have received this e-mail in error, 
please notify the sender immediately by return e-mail and delete all copies 
of this message.


specify user name when connecting to hdfs

2016-08-09 Thread Dong-iL, Kim
Hi.
I’m trying to set external hdfs as state backend.
my os user name is ec2-user. hdfs user is hadoop.
there is a permission denied exception.
I wanna specify hdfs user name.
I set hadoop.job.ugi in core-site.xml and HADOOP_USER_NAME on command line.
but not works.
what shall I do?
thanks.

Flink 1.1.0 : Hadoop 1/2 compatibility mismatch

2016-08-09 Thread LINZ, Arnaud
Hello,

I’ve switched to 1.1.0, but part of my code doesn’t work any longer.

Despite the fact that I have no Hadoop 1 jar in my dependencies (2.7.1 clients 
& flink-hadoop-compatibility_2.10 1.1.0), I have a weird JobContext version 
mismatch error, that I was unable to understand.

Code is a hive table read in a local batch flink cluster using a M/R job (from 
good package mapreduce, not mapred).

import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat;
(…)
final Job job = Job.getInstance();
final InputFormat hCatInputFormat = 
(InputFormat) HCatInputFormat.setInput(job, table.getDbName(), 
table.getTableName(), filter);

final HadoopInputFormat inputFormat = 
new HadoopInputFormat(hCatInputFormat, NullWritable.class, 
DefaultHCatRecord.class,  job);


final HCatSchema inputSchema = 
HCatInputFormat.getTableSchema(job.getConfiguration());
return cluster
.createInput(inputFormat)
.flatMap(new RichFlatMapFunction, T>() {
@Override
public void flatMap(Tuple2 
value,
Collector out) throws Exception { // NOPMD
(...)
}
}).returns(beanClass);


Exception is :
org.apache.flink.runtime.client.JobExecutionException: Failed to submit job 
69dba7e4d79c05d2967dca4d4a27cf38 (Flink Java Job at Tue Aug 09 09:19:41 CEST 
2016)
at 
org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$submitJob(JobManager.scala:1281)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:478)
at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at 
org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:36)
at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at 
org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
at 
org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
at 
scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
at 
org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at 
org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:121)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
at akka.dispatch.Mailbox.run(Mailbox.scala:221)
at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
at 
scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.apache.flink.runtime.JobException: Creating the input splits 
caused an error: Found interface org.apache.hadoop.mapreduce.JobContext, but 
class was expected
at 
org.apache.flink.runtime.executiongraph.ExecutionJobVertex.(ExecutionJobVertex.java:172)
at 
org.apache.flink.runtime.executiongraph.ExecutionGraph.attachJobGraph(ExecutionGraph.java:695)
at 
org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$submitJob(JobManager.scala:1178)
... 23 more
Caused by: java.lang.IncompatibleClassChangeError: Found interface 
org.apache.hadoop.mapreduce.JobContext, but class was expected
at 
org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormatBase.createInputSplits(HadoopInputFormatBase.java:158)
at 

Re: [ANNOUNCE] Flink 1.1.0 Released

2016-08-09 Thread Fabian Hueske
Thanks Ufuk and everybody who contributed to the release!

Cheers, Fabian

2016-08-08 20:41 GMT+02:00 Henry Saputra :

> Great work all. Great Thanks to Ufuk as RE :)
>
> On Monday, August 8, 2016, Stephan Ewen  wrote:
>
> > Great work indeed, and big thanks, Ufuk!
> >
> > On Mon, Aug 8, 2016 at 6:55 PM, Vasiliki Kalavri <
> > vasilikikala...@gmail.com >
> > wrote:
> >
> > > yoo-hoo finally announced 
> > > Thanks for managing the release Ufuk!
> > >
> > > On 8 August 2016 at 18:36, Ufuk Celebi >
> > wrote:
> > >
> > > > The Flink PMC is pleased to announce the availability of Flink 1.1.0.
> > > >
> > > > On behalf of the PMC, I would like to thank everybody who contributed
> > > > to the release.
> > > >
> > > > The release announcement:
> > > > http://flink.apache.org/news/2016/08/08/release-1.1.0.html
> > > >
> > > > Release binaries:
> > > > http://apache.openmirror.de/flink/flink-1.1.0/
> > > >
> > > > Please update your Maven dependencies to the new 1.1.0 version and
> > > > update your binaries.
> > > >
> > > > – Ufuk
> > > >
> > >
> >
>