Re: Flink 0.10.2 and Kafka 0.8.1

2016-04-17 Thread Balaji Rajagopalan
I fought with Kafka 0.8.0.2 and Flink 0.10.2 (Scala 2.11) and was never able to
get it working; I kept being confounded by NoClassDefFoundError. After moving to
Flink 1.0.0 with Kafka 0.8.0.2 and Scala 2.11, things worked for me. If moving to
Flink 1.0.0 is an option for you, do so.
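
For reference, the connector dependency I ended up using with Flink 1.0.0 looks
roughly like this in Maven (artifact name for the Scala 2.11 build as I recall it;
please double-check against the 1.0 documentation):

<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-kafka-0.8_2.11</artifactId>
  <version>1.0.0</version>
</dependency>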

balaji

On Mon, Apr 18, 2016 at 3:19 AM, Robert Schmidtke 
wrote:

> Hi everyone,
>
> I have a Kafka cluster running on version 0.8.1, hence I'm using the
> FlinkKafkaConsumer081. When running my program, I saw a
> NoClassDefFoundError for org.apache.kafka.common.Node. So I packaged my
> binaries according to
> https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution,
> however I'm still seeing the error.
>
> I played around a bit and it turns out I have to package kafka-clients v.
> 0.8.2.0 instead of kafka_2.10 v. 0.8.1 with my program. Is there an error
> in the documentation or have I not figured out something properly?
>
> Thanks!
> Robert
>
> --
> My GPG Key ID: 336E2680
>


Flink 0.10.2 and Kafka 0.8.1

2016-04-17 Thread Robert Schmidtke
Hi everyone,

I have a Kafka cluster running on version 0.8.1, hence I'm using the
FlinkKafkaConsumer081. When running my program, I saw a
NoClassDefFoundError for org.apache.kafka.common.Node. So I packaged my
binaries according to
https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution,
however I'm still seeing the error.

I played around a bit and it turns out I have to package kafka-clients v.
0.8.2.0 instead of kafka_2.10 v. 0.8.1 with my program. Is there an error
in the documentation or have I not figured out something properly?
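
For reference, the kafka-clients dependency I ended up packaging looks roughly
like this (a sketch; these are the coordinates published on Maven Central as far
as I know):

<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-clients</artifactId>
  <version>0.8.2.0</version>
</dependency>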

Thanks!
Robert

-- 
My GPG Key ID: 336E2680


Re: withBroadcastSet for a DataStream missing?

2016-04-17 Thread Stavros Kontopoulos
I'm trying what you suggested. Is this what you are suggesting (this is just a
skeleton of the logic, not the actual implementation)?

val dataStream =  ... //window based stream

val modelStream = ...

val connected = dataStream.connect(modelStream)

val output = connected.map(
(x:String) => { true},
(y: MyModel) => {false}
  ).iterate {
iteration =>

  val feedback = iteration.filter(!_).broadcast
  (feedback, iteration.filter(x => x))
  }

  output.split(
(b: Boolean) => b match {
  case true => List("true")
  case false => List("false")
}
  ).select("true")


I could save the model in the CoFlatMap, but ideally I need the same model
everywhere. Does broadcast do that? From the documentation I read that it sends
the output to all parallel operators.
Is the iteration executed whenever there is data on the windowed input stream, or
is it run independently, so that I can feed back my improved model (like in the
DataSet case)?
If the latter holds, does that mean all partial updates from all operators will
have to be processed by each operator before the next window processing begins?

Thnx!


On Fri, Apr 1, 2016 at 10:51 PM, Stavros Kontopoulos <
st.kontopou...@gmail.com> wrote:

> Ok thnx Till i will give it a shot!
>
> On Thu, Mar 31, 2016 at 11:25 AM, Till Rohrmann 
> wrote:
>
>> Hi Stavros,
>>
>> you might be able to solve your problem using a CoFlatMap operation with
>> iterations. You would use one of the inputs for the iteration on which you
>> broadcast the model updates to every operator. On the other input you would
>> receive the data points which you want to cluster. As output you would emit
>> the clustered points and model updates. Here you have to use the split
>> and select function to split the output stream into model updates and
>> output elements. It’s important to broadcast the model updates, otherwise
>> not all operators have the same clustering model.
>>
>> Cheers,
>> Till
>> ​
>>
>> On Tue, Mar 29, 2016 at 7:23 PM, Stavros Kontopoulos <
>> st.kontopou...@gmail.com> wrote:
>>
>>> Hi, I am new here...
>>>
>>> I am trying to implement online k-means as here
>>> https://databricks.com/blog/2015/01/28/introducing-streaming-k-means-in-spark-1-2.html
>>> with flink.
>>> I don't see a withBroadcastSet call anywhere to save intermediate results.
>>> Is this currently supported?
>>>
>>> Is saving intermediate-result state somewhere, like in this example, a
>>> viable alternative:
>>>
>>> https://github.com/StephanEwen/flink-demos/blob/master/streaming-state-machine/src/main/scala/com/dataartisans/flink/example/eventpattern/StreamingDemo.scala
>>>
>>> Thnx,
>>> Stavros
>>>
>>
>>
>
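
A rough sketch of the CoFlatMap that Till describes above (all types and the
clustering logic are made up for illustration; the iteration/broadcast wiring is
only indicated in the comment at the end):

import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction
import org.apache.flink.util.Collector

// Hypothetical types, for illustration only.
case class Point(x: Double, y: Double)
case class Model(centroids: Vector[Point])

// One output type carries both kinds of results, so that split/select can
// separate clustered points from model updates afterwards.
sealed trait Out
case class ClusteredPoint(point: Point, centroid: Int) extends Out
case class ModelUpdate(model: Model) extends Out

class OnlineKMeansCoFlatMap extends CoFlatMapFunction[Point, Model, Out] {
  private var model = Model(Vector.empty)

  // Input 1: the data points. Emit the clustered point and (here unconditionally,
  // in practice only when the local model changed) a model update to feed back.
  override def flatMap1(p: Point, out: Collector[Out]): Unit = {
    val nearest = 0 // ... nearest-centroid computation and model update go here ...
    out.collect(ClusteredPoint(p, nearest))
    out.collect(ModelUpdate(model))
  }

  // Input 2: model updates arriving on the feedback input. This input has to be
  // broadcast so that every parallel instance sees the same model.
  override def flatMap2(m: Model, out: Collector[Out]): Unit = {
    model = m
  }
}

// Wiring, roughly: points.connect(modelFeedback).flatMap(new OnlineKMeansCoFlatMap),
// then split the Out stream into ModelUpdate and ClusteredPoint substreams,
// broadcast the ModelUpdate substream and close the iteration with it, and use
// the ClusteredPoint substream as the job's actual output.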


fan out parallel-able operator sub-task beyond total slots number

2016-04-17 Thread Chen Qin
Hi there,


I am trying to run a large number of subtasks within a task slot using a slot
sharing group. The scenario is an operator that makes a network call with high
latency but a small memory/CPU footprint (sample code below).

From the docs, slotSharingGroup seems to be the place to look, yet it does not
seem to be designed to address the scenario above.
https://ci.apache.org/projects/flink/flink-docs-master/concepts/concepts.html#workers-slots-resources

My question is: what is the best way to fan out a large number of subtasks in
parallel within a task?

public void testFanOut() throws Exception {
    env = StreamExecutionEnvironment.getExecutionEnvironment();
    ...
    env.addSource(...).setParallelism(1).disableChaining().shuffle()
        .flatMap(new FlatMapFunction<DummyFlinkRecord, Long>() {
            @Override
            public void flatMap(DummyFlinkRecord dummyFlinkRecord,
                                Collector<Long> collector) throws Exception {
                Thread.sleep(1000); // latency is high, needs to fan out
                collector.collect(1L);
            }
        }).slotSharingGroup("flatmap").setParallelism(100).rebalance()
        .filter(new FilterFunction<Long>() {
            @Override
            public boolean filter(Long aLong) throws Exception {
                return true;
            }
        }).setParallelism(10)
        .addSink(new SinkFunction<Long>() {
            @Override
            public void invoke(Long aLong) throws Exception {
                System.out.println(aLong);
            }
        });
    env.execute("fan out 100 subtasks for 1s delay mapper");
}

Thanks,
Chen Qin


Re:

2016-04-17 Thread Matthias J. Sax
Can you be a little bit more precise? Does it fail when you run

  bin/start-local.sh

? Or what do you mean by "try to start the web interface"? The web
interface is started automatically within the JobManager process.

What is the exact error message? Is there any stack trace? Any error in
the log files (in directory log/)?

-Matthias
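
For example, the JobManager log (which also covers the embedded web interface)
can be inspected with something like the following; the exact file name pattern
may differ slightly between versions:

  tail -n 200 log/flink-*-jobmanager-*.log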

On 04/17/2016 03:50 PM, Ahmed Nader wrote:
> Sorry the error is can't find the path specified*
> 
> On 17 April 2016 at 15:49, Ahmed Nader  > wrote:
> 
> Thanks, I followed the instructions and when i try to start the web
> interface i get an error can't find file specified. I tried to
> change the env.java.home variable to the path of Java JDK or Java
> JRE on my machine however still i get the same error.
> Any idea how to solve this? 
> 
> On 17 April 2016 at 12:48, Matthias J. Sax  > wrote:
> 
> You need to download Flink and install it. Follow these instructions:
> 
> https://ci.apache.org/projects/flink/flink-docs-release-1.0/quickstart/setup_quickstart.html
> 
> -Matthias
> 
> On 04/16/2016 04:00 PM, Ahmed Nader wrote:
> > Hello,
> > I'm new to flink so this might seem a basic question. I added
> flink to
> > an existing project using maven and can run the program
> locally with
> > StreamExecutionEnvironment with no problems, however i want to
> know how
> > can I submit jobs for that project and be able to view these
> jobs from
> > flink's web interface and run these jobs, while i don't have the
> > flink/bin folder in my project structure as i only added the
> dependencies.
> > Thanks.
> 
> 
> 





Re: Missing metrics on Flink UI

2016-04-17 Thread Aljoscha Krettek
Thanks for the heads up! I'm glad you figured it out.

On Sun, Apr 17, 2016, 13:35 neo21 zerro  wrote:

> Nevermind, I've figured it out.
> I was skipping the tuples that were coming from kafka based on some custom
> logic.
> That custom logic made sure that the kafka operator did not emit any
> tuples.
> Hence, the missing metrics in the flink ui.
>
>
>
> On Thursday, April 14, 2016 1:12 AM, neo21 zerro 
> wrote:
> Hello everybody,
>
> I have deployed the latest Flink Version 1.0.1 on Yarn 2.5.0-cdh5.3.0.
> When I push the WordCount example shipped with the Flink distribution, I
> can see metrics (bytes received) in the Flink Ui on the corresponding
> operator.
> However, I used the flink kafka connector and when I run my topology I
> cannot see any metrics reported on the operators. Has anybody experienced
> something like this?
>
> Thanks!
>


Re: providing java system arguments(-D) to specific job

2016-04-17 Thread Igor Berman
For the sake of history (at the task manager level):
in conf/flink-conf.yaml
env.java.opts: -Dmy-prop=bla -Dmy-prop2=bla2
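
A quick sketch of how such a property can then be read from user code running in
the task manager JVM ("my-prop" is just the example name from the line above):

import org.apache.flink.api.common.functions.MapFunction

class PropTaggingMap extends MapFunction[String, String] {
  override def map(value: String): String = {
    // the JVM property set via env.java.opts is visible to user functions
    val prop = System.getProperty("my-prop", "unset")
    s"$prop:$value"
  }
}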


On 17 April 2016 at 16:25, Igor Berman  wrote:

> How do I provide java arguments while submitting job? Suppose I have some
> legacy component that is dependent on java argument configuration.
>
> I suppose Flink reuses same jvm for all jobs, so in general I can start
> task manager with desired arguments, but then all my jobs can't have
> different system arguments.
>
> any suggestions?
>
>
>
>
>
>


Re:

2016-04-17 Thread Ahmed Nader
Sorry the error is can't find the path specified*

On 17 April 2016 at 15:49, Ahmed Nader  wrote:

> Thanks, I followed the instructions and when i try to start the web
> interface i get an error can't find file specified. I tried to change the
> env.java.home variable to the path of Java JDK or Java JRE on my machine
> however still i get the same error.
> Any idea how to solve this?
>
> On 17 April 2016 at 12:48, Matthias J. Sax  wrote:
>
>> You need to download Flink and install it. Follow these instructions:
>>
>> https://ci.apache.org/projects/flink/flink-docs-release-1.0/quickstart/setup_quickstart.html
>>
>> -Matthias
>>
>> On 04/16/2016 04:00 PM, Ahmed Nader wrote:
>> > Hello,
>> > I'm new to flink so this might seem a basic question. I added flink to
>> > an existing project using maven and can run the program locally with
>> > StreamExecutionEnvironment with no problems, however i want to know how
>> > can I submit jobs for that project and be able to view these jobs from
>> > flink's web interface and run these jobs, while i don't have the
>> > flink/bin folder in my project structure as i only added the
>> dependencies.
>> > Thanks.
>>
>>
>


Re:

2016-04-17 Thread Ahmed Nader
Thanks, I followed the instructions, and when I try to start the web interface I
get an error: "can't find file specified". I tried to change the env.java.home
variable to the path of the Java JDK or Java JRE on my machine, however I still
get the same error.
Any idea how to solve this?
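
For reference, the entry I tried in conf/flink-conf.yaml looks roughly like the
following (the path below is only a placeholder; it has to point at the JDK/JRE
installation directory, not at java.exe):

  env.java.home: C:/Java/jdk1.8.0_77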

On 17 April 2016 at 12:48, Matthias J. Sax  wrote:

> You need to download Flink and install it. Follow these instructions:
>
> https://ci.apache.org/projects/flink/flink-docs-release-1.0/quickstart/setup_quickstart.html
>
> -Matthias
>
> On 04/16/2016 04:00 PM, Ahmed Nader wrote:
> > Hello,
> > I'm new to flink so this might seem a basic question. I added flink to
> > an existing project using maven and can run the program locally with
> > StreamExecutionEnvironment with no problems, however i want to know how
> > can I submit jobs for that project and be able to view these jobs from
> > flink's web interface and run these jobs, while i don't have the
> > flink/bin folder in my project structure as i only added the
> dependencies.
> > Thanks.
>
>


providing java system arguments(-D) to specific job

2016-04-17 Thread Igor Berman
How do I provide Java system arguments while submitting a job? Suppose I have
some legacy component that depends on Java argument configuration.

I suppose Flink reuses the same JVM for all jobs, so in general I could start the
task manager with the desired arguments, but then my jobs cannot each have
different system arguments.

Any suggestions?


Re: Missing metrics on Flink UI

2016-04-17 Thread neo21 zerro
Never mind, I've figured it out.
I was skipping the tuples that were coming from Kafka based on some custom logic.
That custom logic made sure that the Kafka operator did not emit any tuples.
Hence the missing metrics in the Flink UI.



On Thursday, April 14, 2016 1:12 AM, neo21 zerro  wrote:
Hello everybody, 

I have deployed the latest Flink version 1.0.1 on YARN 2.5.0-cdh5.3.0.
When I submit the WordCount example shipped with the Flink distribution, I can
see metrics (bytes received) in the Flink UI on the corresponding operator.
However, when I use the Flink Kafka connector and run my topology, I cannot see
any metrics reported on the operators. Has anybody experienced something like
this?

Thanks!


Re: jar dependency in the cluster

2016-04-17 Thread Matthias J. Sax
Did you double check that your jar does contain the Kafka connector
classes? I would assume that the jar is not assembled correctly.

See here for some help on how to package jars correctly:
https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution


One way would be this (even though I think it would not be required to
solve your problem):
https://stackoverflow.com/questions/31784051/how-to-reference-the-external-jar-in-flink

-Matthias
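
As a rough sketch, building such a fat jar with the Maven shade plugin usually
boils down to something like the following (the plugin version is only an
example):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.4.1</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
    </execution>
  </executions>
</plugin>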


On 04/16/2016 08:31 PM, Radu Tudoran wrote:
> Hi,
> 
>  
> 
> Could anyone help me with the following problem:
> 
>  
> 
> I have a flink cluster of a couple of nodes (i am using the old version
> 0.10).
> 
> I am packaging a jar that needs to use kafka connector. When I create
> the jar in eclipse I am adding the flink connector dependency and set to
> be packed with the jar. Nevertheless, when I submitted it to be executed
> on the cluster I get an error that the jar connector is not visible for
> the class loader. Is there a way in which I can set flink to use a
> certain library path where to look for dependencies or maybe when I
> deploy either the flink cluster or submit the job to add extra
> dependencies.
> 
>  
> 
> Many thanks
> 
>  
> 
>  
> 
> Dr. Radu Tudoran
> 
> Research Engineer - Big Data Expert
> 
> IT R&D Division





Re:

2016-04-17 Thread Matthias J. Sax
You need to download Flink and install it. Follow these instructions:
https://ci.apache.org/projects/flink/flink-docs-release-1.0/quickstart/setup_quickstart.html

-Matthias

On 04/16/2016 04:00 PM, Ahmed Nader wrote:
> Hello,
> I'm new to Flink, so this might seem a basic question. I added Flink to an
> existing project using Maven and can run the program locally with
> StreamExecutionEnvironment with no problems. However, I want to know how I
> can submit jobs for that project, and be able to view and run these jobs
> from Flink's web interface, while I don't have the flink/bin folder in my
> project structure since I only added the dependencies.
> Thanks.





Testing Kafka interface using Flink interactive shell

2016-04-17 Thread Mich Talebzadeh
Hi,

In the Spark shell I can load the Kafka jar file through the spark-shell option --jars:

spark-shell --master spark://50.140.197.217:7077 --jars
/home/hduser/jars/spark-streaming-kafka-assembly_2.10-1.6.1.jar

This works fine.

In Flink I have added the jar file
/home/hduser/jars/flink-connector-kafka-0.10.1.jar to the CLASSPATH.

However, I don't get any support for it within the Flink shell:

Scala-Flink> import org.apache.flink.streaming.connectors.kafka
:54: error: object connectors is not a member of package org.apache.flink.streaming
       import org.apache.flink.streaming.connectors.kafka
                                         ^

Any ideas will be appreciated.

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com


Re: throttled stream

2016-04-17 Thread Márton Balassi
There is a utility in flink-streaming-examples that might be useful; it is
essentially the same idea that Niels suggests. [1]

[1]
https://github.com/apache/flink/blob/master/flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples/utils/ThrottledIterator.java

On Sun, Apr 17, 2016 at 8:42 AM, Niels Basjes  wrote:

> Simple idea: create a map function that only does "sleep 1/5 second" and
> put that in your pipeline somewhere.
>
> Niels
> On 16 Apr 2016 22:38, "Chen Bekor"  wrote:
>
>> Is there a way to consume a Kafka stream using Flink with a predefined
>> rate limit (e.g. 5 events per second)?
>>
>> We need this because we have to respect some 3rd-party API rate limits, so
>> even if we have a much larger throughput potential, we must control the
>> consumption rate in order not to overflow the API channel.
>>
>


Re: throttled stream

2016-04-17 Thread Niels Basjes
Simple idea: create a map function that only does "sleep 1/5 second" and
put that in your pipeline somewhere.

Niels
On 16 Apr 2016 22:38, "Chen Bekor"  wrote:

> Is there a way to consume a Kafka stream using Flink with a predefined
> rate limit (e.g. 5 events per second)?
>
> We need this because we have to respect some 3rd-party API rate limits, so
> even if we have a much larger throughput potential, we must control the
> consumption rate in order not to overflow the API channel.
>
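
A minimal sketch of that idea: an identity map that sleeps ~200 ms per element,
i.e. roughly 5 events per second per parallel subtask (class and type names are
made up; for a global limit of 5 events/s, run it with parallelism 1 or adjust
the sleep accordingly):

import org.apache.flink.api.common.functions.MapFunction

class ThrottleMap[T] extends MapFunction[T, T] {
  override def map(value: T): T = {
    Thread.sleep(200) // ~5 elements per second per parallel subtask
    value
  }
}

// usage (stream type assumed): kafkaStream.map(new ThrottleMap[String]).setParallelism(1)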