Re: Machine Learning on Apache Fink

2016-01-09 Thread Ashutosh Kumar
I see lot of study materials and even book available for ml on spark. Spark
seems to be more mature for analytics related work as of now. Please
correct me if I am wrong. As I have already built my collection and data
pre processing layers on flink , I want to use Flink for analytics as well.
Thanks in advance.


Ashutosh

On Sat, Jan 9, 2016 at 3:32 PM, Ashutosh Kumar 
wrote:

> I am looking for some study material and examples on machine learning .
> Are classification , recommendation and clustering libraries available ?
> What is the timeline for Flink as backend for Mahout? I am building a meta
> data driven framework over Flink . While building data collection and
> transformation modules was cool , I am struggling since I started analytics
> module. Thanks in advance.
> Ashutosh
>


Re: Working with storm compatibility layer

2016-01-09 Thread Shinhyung Yang
Dear Matthias,

Thank you for replying!

that sounds weird and should not happen -- Spout.open() should get
> called exactly once.


That's what I thought too. I'm new to both Storm and Flink so it's quite
complicated for me to handle both yet; would it be helpful for me if I know
storm's lifecyle and flink 's lifecycle? When submitTopology() invoked,
what should be called other than spout.open()?

I am not sure about multiple calls to
>
declareOuputFields though -- if might be called multiple times -- would
> need to double check the code.


I'll check my code too.


> However, the call to declareOuputFields should be idempotent, so it
> should actually not be a problem if it is called multiple times. Even if
> Storm might call this method only once, there is no guarantee that it is
> not called multiple time. If this is a problem for you, please let me
> know. I think we could fix this and make sure the method is only called
> once.


Actually it doesn't seem to be a problem for now. It just does the same job
multiple times.


> It would be helpful if you could share you code. What do you mean with
> "exits silently"? No submission happens? Did you check the logs? As you
> mentioned FlinkLocalCluster, I assume that you run within an IDE?


The topology doesn't seem to continue. There's a set of initialization code
in the open method of the program's spout and it looks hopeless if it's not
invoked. Is there any way to check the logs other than using println()
calls? I'm running it on the commandline with having
`bin/start_local.sh' running in the background and `bin/flink run'.


> Btw: lately we fixed a couple of bugs. I would suggest that you use the
> latest version from Flink master branch. I should work with 0.10.1
> without problems.


It was vey tedious for me to deal with a pom.xml file and .m2 repository.
So I preferred to use maven central. But I should try with the master
branch if I have to.

I will quickly check if I could share some of the code.

Thank you again for the help!
With best regards,
Shinhyung Yang


>
> -Matthias
>
>
>
> On 01/09/2016 01:27 AM, Shinhyung Yang wrote:
> > Howdies to everyone,
> >
> > I'm trying to use the storm compatibility layer on Flink 0.10.1. The
> > original storm topology works fine on Storm 0.9.5 and I have
> > incorporated FlinkLocalCluster, FlinkTopologyBuilder, and
> > FlinkTopology classes according to the programming guide
> > (
> https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/storm_compatibility.html
> ).
> > I'm running it on Oracle Java 8 (1.8.0_66-b17) on Centos 7 (7.2.1511).
> > What happens is, it seems to be going all the way to submitTopology
> > method without any problem, however it doesn't invoke open method of
> > Spout class but declareOutputFields method is called for multiple
> > times and the program exits silently. Do you guys have any idea what's
> > going on here or have any suggestions? If needed, then please ask me
> > for more information.
> >
> > Thank you for reading.
> > With best regards,
> > Shinhyung Yang
> >
>
>


Re: Working with storm compatibility layer

2016-01-09 Thread Matthias J. Sax
Hello Shinhyung,

that sounds weird and should not happen -- Spout.open() should get
called exactly once. I am not sure about multiple calls to
declareOuputFields though -- if might be called multiple times -- would
need to double check the code.

However, the call to declareOuputFields should be idempotent, so it
should actually not be a problem if it is called multiple times. Even if
Storm might call this method only once, there is no guarantee that it is
not called multiple time. If this is a problem for you, please let me
know. I think we could fix this and make sure the method is only called
once.

It would be helpful if you could share you code. What do you mean with
"exits silently"? No submission happens? Did you check the logs? As you
mentioned FlinkLocalCluster, I assume that you run within an IDE?

Btw: lately we fixed a couple of bugs. I would suggest that you use the
latest version from Flink master branch. I should work with 0.10.1
without problems.

-Matthias



On 01/09/2016 01:27 AM, Shinhyung Yang wrote:
> Howdies to everyone,
> 
> I'm trying to use the storm compatibility layer on Flink 0.10.1. The
> original storm topology works fine on Storm 0.9.5 and I have
> incorporated FlinkLocalCluster, FlinkTopologyBuilder, and
> FlinkTopology classes according to the programming guide
> (https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/storm_compatibility.html).
> I'm running it on Oracle Java 8 (1.8.0_66-b17) on Centos 7 (7.2.1511).
> What happens is, it seems to be going all the way to submitTopology
> method without any problem, however it doesn't invoke open method of
> Spout class but declareOutputFields method is called for multiple
> times and the program exits silently. Do you guys have any idea what's
> going on here or have any suggestions? If needed, then please ask me
> for more information.
> 
> Thank you for reading.
> With best regards,
> Shinhyung Yang
> 



signature.asc
Description: OpenPGP digital signature


Re: Working with storm compatibility layer

2016-01-09 Thread Matthias J. Sax
Hi,

I just double checked the Flink code and during translation from Storm
to Flink declareOuputFields() is called twice. You are right that is
does the same job twice, but that is actually not a problem. The Flink
code is cleaner this way to I guess we will not change it.

About lifecyle:
If you submit your code, during deployment, Spout.open() and
Bolt.prepare() should be called for each parallel instance on each
Spout/Bolt of your topology.

About your submission (I guess this should solve your current problem):
If you use bin/start-local.sh, you should *not* use FlinkLocalCluster,
but FlinkSubmitter. You have to distinguish three cases:

  - local/debug/IDE mode: use FlinkLocalCluster
=> you do not need to start any Flink cluster before --
FlinkLocalCluster is started up in you current JVM
* the purpose is local debugging in an IDE (this allows to easily
set break points and debug code)

  - pseudo-distributed mode: use FlinkSubmitter
=> you start up a local Flink cluster via bin/start-local.sh
* this local Flink cluster run in an own JVM and looks like a real
cluster to the Flink client, ie, "bin/flink run"
* thus, you just use FlinkSubmitter as for a real cluster (with
JobManager/Nimbus hostname "localhost")
* in contrast to FlinkLocalCluster, no "internal Flink Cluster" is
started in your current JVM, but your code is shipped to the local
cluster you started up beforehand via bin/start-local.sh and executed in
this JVM

  - distributed mode: use FlinkSubmitter
=> you start up Flink in a real cluster using bin/start-cluster.sh
* you use "bin/flink run" to submit your code to the real cluster


About further debugging: you can increase the log level to get more
information:
https://ci.apache.org/projects/flink/flink-docs-release-0.10/internals/logging.html

Hope this helps!

-Matthias

On 01/09/2016 04:38 PM, Shinhyung Yang wrote:
> Dear Matthias,
> 
> Thank you for replying!
> 
> that sounds weird and should not happen -- Spout.open() should get
> called exactly once.
> 
> 
> That's what I thought too. I'm new to both Storm and Flink so it's quite
> complicated for me to handle both yet; would it be helpful for me if I
> know storm's lifecyle and flink 's lifecycle? When submitTopology()
> invoked, what should be called other than spout.open()?
> 
> I am not sure about multiple calls to
> 
> declareOuputFields though -- if might be called multiple times -- would
> need to double check the code.
> 
> 
> I'll check my code too.
>  
> 
> However, the call to declareOuputFields should be idempotent, so it
> should actually not be a problem if it is called multiple times. Even if
> Storm might call this method only once, there is no guarantee that it is
> not called multiple time. If this is a problem for you, please let me
> know. I think we could fix this and make sure the method is only called
> once.
> 
> 
> Actually it doesn't seem to be a problem for now. It just does the same
> job multiple times.
>  
> 
> It would be helpful if you could share you code. What do you mean with
> "exits silently"? No submission happens? Did you check the logs? As you
> mentioned FlinkLocalCluster, I assume that you run within an IDE?
> 
> 
> The topology doesn't seem to continue. There's a set of initialization
> code in the open method of the program's spout and it looks hopeless if
> it's not invoked. Is there any way to check the logs other than using
> println() calls? I'm running it on the commandline with having
> `bin/start_local.sh' running in the background and `bin/flink run'.
>  
> 
> Btw: lately we fixed a couple of bugs. I would suggest that you use the
> latest version from Flink master branch. I should work with 0.10.1
> without problems.
> 
> 
> It was vey tedious for me to deal with a pom.xml file and .m2
> repository. So I preferred to use maven central. But I should try with
> the master branch if I have to.
> 
> I will quickly check if I could share some of the code.
> 
> Thank you again for the help!
> With best regards,
> Shinhyung Yang
>  
> 
> 
> -Matthias
> 
> 
> 
> On 01/09/2016 01:27 AM, Shinhyung Yang wrote:
> > Howdies to everyone,
> >
> > I'm trying to use the storm compatibility layer on Flink 0.10.1. The
> > original storm topology works fine on Storm 0.9.5 and I have
> > incorporated FlinkLocalCluster, FlinkTopologyBuilder, and
> > FlinkTopology classes according to the programming guide
> >
> 
> (https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/storm_compatibility.html).
> > I'm running it on Oracle Java 8 (1.8.0_66-b17) on Centos 7 (7.2.1511).
> > What happens is, it seems to be going all the way to submitTopology
> > method without any problem, however it doesn't invoke open method of
> > Spout class but declareOutputFields method is called for multiple
> > times and the program exits silently. Do you guys