Re: [PROPOSAL] Contribute Stateful Functions to Apache Flink

2019-10-11 Thread Trevor Grant
+1 non-binding on contribution.

Separate repo, or feature branch to start maybe? I just feel like in the
beginning this thing is going to have lots of breaking changes that maybe
aren't going to fit well with tests / other "v1+" release code. Just my
.02.



On Fri, Oct 11, 2019 at 4:38 AM Stephan Ewen  wrote:

> Dear Flink Community!
>
> Some of you probably heard it already: On Tuesday, at Flink Forward
> Berlin, we announced **Stateful Functions**.
>
> Stateful Functions is a library on Flink to implement general purpose
> applications. It is built around stateful functions (who would have thunk)
> that can communicate arbitrarily through messages, have consistent state,
> and a small resource footprint. They are a bit like keyed ProcessFunctions
> that can send each other messages.
> As simple as this sounds, it means you can now communicate in non-DAG
> patterns, so users can build programs they could not previously build with
> Flink. It also has other neat properties, like multiplexing of functions,
> modular composition, and tooling for both container-based and as-a-Flink-job
> deployments.
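>
> A rough conceptual sketch of that model (every name below is invented for
> illustration; this is not the actual SDK API):
>
> // Conceptual sketch only: made-up names, not the real SDK.
> trait Context {
>   def self: String                          // address of this instance
>   def send(target: String, msg: Any): Unit  // message another function
> }
>
> trait StatefulFunction {
>   def invoke(ctx: Context, input: Any): Unit
> }
>
> // Each keyed instance keeps its own consistent state and can message
> // any other function instance, DAG or not.
> class Greeter extends StatefulFunction {
>   private var seen = 0
>   override def invoke(ctx: Context, input: Any): Unit = {
>     seen += 1
>     ctx.send("counter", s"${ctx.self} greeted $seen times")
>   }
> }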
>
> You can find out more about it here
>   - Website: https://statefun.io/
>   - Code: https://github.com/ververica/stateful-functions
>   - Talk with motivation:
> https://speakerdeck.com/stephanewen/stateful-functions-building-general-purpose-applications-and-services-on-apache-flink?slide=12
>
>
> Now for the main issue: **We would like to contribute this project to
> Apache Flink**
>
> I believe that this is a great fit for both sides.
> For the Flink community, it would be a way to extend the capabilities and
> use cases of Flink into a completely different type of application and
> thus grow the community into this new field.
> Many recent discussions about evolving the Flink runtime (both on the
> mailing list and at conferences) show the interest of Flink users in the
> space that Stateful Functions covers.
> It seems natural that Stateful Functions should closely co-develop with
> Apache Flink, ideally as part of the project.
>
> There are many details to be discussed, for example whether this should be
> added to the Flink core repository, or whether we want to create a separate
> repository for this. But I think we should start discussing this after we
> have consensus on whether the community wants this contribution.
>
> Really looking forward to hearing what you think!
>
> Best Regards,
> Stephan
>
>


For weirdos trying to use OpenCV in Flink Stream

2017-08-14 Thread Trevor Grant
I'm new to JNIs / classloaders / etc., and had a hard time making this work
right.

Lots of posts referenced the Tomcat solution, but it took me a while to
"get it".

And so, for newbs like me, here is a quick bloggette on getting OpenCV (and
theoretically any JNIs) working in your Flink stream.

https://rawkintrevo.org/2017/08/14/using-jnis-like-opencv-in-flink/
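
Here's a minimal sketch of the general shape of the fix (names are mine; the
post above has the actual details, including the Tomcat-style classloader
trick): load the native library at most once per classloader, triggered from
open(), so repeated task deployments don't trip UnsatisfiedLinkError:

import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.configuration.Configuration

// Guard the JNI load so it happens at most once per classloader.
// Core.NATIVE_LIBRARY_NAME is the standard OpenCV Java binding constant.
object OpenCvLoader {
  lazy val load: Unit =
    System.loadLibrary(org.opencv.core.Core.NATIVE_LIBRARY_NAME)
}

class UsesOpenCv extends RichMapFunction[String, String] {
  override def open(parameters: Configuration): Unit = {
    OpenCvLoader.load // runs loadLibrary only on first access
  }
  override def map(imagePath: String): String = {
    // ... real OpenCV calls would go here ...
    s"processed $imagePath"
  }
}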


Apache IoT

2017-02-03 Thread Trevor Grant
Hey all,

This year at ApacheCon Miami, there is going to be a specific track
dedicated to IoT and how the Apache Family of Projects can be used in
concert to create full pipelines.  As Apache Flink is a relevant part of
that pipeline for the processing of data coming in from edge devices, I
want to invite everyone to consider submitting a talk:

http://us.apacheiot.org/

Ideally something around how Apache projects work together in an IoT
context, or any use cases you might have.  The deadline is the same as
ApacheCon's (Feb 11); sorry for the short notice.

Please feel free to reach out with any further questions and see you in
Miami!

tg


Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


Chicago Hands on Apache Flink Workshop

2017-02-01 Thread Trevor Grant
Any one who is going to be in or around Chicago 2/21:

Joe Olson is putting on a workshop for our local Flink meetup; drop by if
you can!

https://www.meetup.com/Chicago-Apache-Flink-Meetup-CHAF/events/237385428/

Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


Re: Apache siddhi into Flink

2016-08-28 Thread Trevor Grant
Thank you for confirming Hao,

Aparup, please don't refer to it as "Apache Siddhi", that is misleading.


Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Sun, Aug 28, 2016 at 10:50 AM, Hao Chen <h...@apache.org> wrote:

> Siddhi is not an Apache project, but it is licensed under Apache License
> v2, being open sourced and maintained by WSO2.
>
> - Hao
>
> On Sun, Aug 28, 2016 at 11:11 PM, Trevor Grant <trevor.d.gr...@gmail.com>
> wrote:
>
>> Aparup,
>>
>> Was Siddhi recently added as an incubator project?  I can't find it in
>> the project directory or on github.com/apache.  The closest thing I
>> can find is this: https://github.com/wso2/siddhi
>>
>> tg
>>
>>
>>
>>
>> Trevor Grant
>> Data Scientist
>> https://github.com/rawkintrevo
>> http://stackexchange.com/users/3002022/rawkintrevo
>> http://trevorgrant.org
>>
>> *"Fortunate is he, who is able to know the causes of things."  -Virgil*
>>
>>
>> On Sat, Aug 27, 2016 at 5:36 PM, Chen Qin <qinnc...@gmail.com> wrote:
>>
>>> ​+1​
>>>
>>>
>>> On Aug 26, 2016, at 11:23 PM, Aparup Banerjee (apbanerj) <
>>> apban...@cisco.com> wrote:
>>>
>>> Hi-
>>>
>>> Has anyone looked into embedding apache siddhi into Flink.
>>>
>>> Thanks,
>>> Aparup
>>>
>>>
>>>
>>
>


Re: Apache siddhi into Flink

2016-08-28 Thread Trevor Grant
Aparup,

Was Siddhi recently added as an incubator project?  I can't find it in the
project directory or on github.com/apache.  The closest thing I can find
is this: https://github.com/wso2/siddhi

tg




Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Sat, Aug 27, 2016 at 5:36 PM, Chen Qin <qinnc...@gmail.com> wrote:

> ​+1​
>
>
> On Aug 26, 2016, at 11:23 PM, Aparup Banerjee (apbanerj) <
> apban...@cisco.com> wrote:
>
> Hi-
>
> Has anyone looked into embedding apache siddhi into Flink.
>
> Thanks,
> Aparup
>
>
>


Re: Flink WebUI on YARN behind firewall

2016-08-26 Thread Trevor Grant
Ahh, sorry I misunderstood.

Your comment provided insight for me though.

To anyone else who is having issues, maybe the following will help.  I
was trying to deploy Flink on an IBM BigInsights Cloud cluster (disclaimer:
I work for IBM; not trying to promote a company, but they do give me free
cluster time so I tend to play there).

At any rate, flink_example.py is a script that will do it nice and clean for
anyone on BigInsights Cloud, and it may be useful for other similar cloud
YARN deployments, so I'll just leave it here:
https://github.com/rawkintrevo/bluemix-extra-services

The solution, in short:
1) install Flink
2) run yarn-session.sh
3) parse the stdout from the above command (see the Python script, or the
sketch below; easy to follow, or email me if not)
4) open the ApplicationUI via the YARN UI and go to the #/submit tab
5) in the submit tab, if you open it with a web browser (because it relies
on JavaScript) it will give you a hostname:port - save that port
6) combine the port you just got with an IP address you got when parsing
in step 3
7) open SSH tunneling to that IP:port

Ugly, but it does work.
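
And since step 3 is the fiddly part, here's a rough, untested Scala sketch of
scraping a host:port out of the yarn-session.sh output (the log format varies
by Flink version, so the regex is an assumption; adjust to your output):

import scala.sys.process._

object YarnSessionAddress {
  // Matches an "ip:port" pair anywhere in a log line.
  private val HostPort = """(\d{1,3}(?:\.\d{1,3}){3}):(\d{2,5})""".r

  def main(args: Array[String]): Unit = {
    // lineStream_! streams stdout lazily, so we can stop at the first match.
    val lines = Seq("./bin/yarn-session.sh", "-n", "2").lineStream_!
    lines.flatMap(HostPort.findFirstMatchIn(_)).headOption match {
      case Some(m) => println(s"JobManager at ${m.group(1)}:${m.group(2)}")
      case None    => println("no host:port found in yarn-session.sh output")
    }
  }
}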

best

tg



Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Fri, Aug 26, 2016 at 5:54 PM, Vijay Srinivasaraghavan <
vijikar...@yahoo.com> wrote:

> Hi Trevor,
>
> Sorry, I think I did not explain it clearly. I am not working on the issue
> that you have mentioned, but I have noticed a similar problem for another
> JIRA that I am currently working on (Flink Web UI basic authentication). The
> redirection from the Yarn UI to Flink does not propagate the authentication
> headers. I was planning to investigate from the Yarn UI as to what exactly is
> happening. Please go ahead and create a JIRA anyway and someone will pick
> it up.
>
> Regards
> Vijay
>
>
> On Friday, August 26, 2016 1:34 PM, Trevor Grant <trevor.d.gr...@gmail.com>
> wrote:
>
>
> That's awesome Vijay, thanks for the work- will definitely help (esp as
> Flink gets picked up for more cloud cluster deployments)!
>
> Do you have the JIRA so I can watch it?
>
> Trevor Grant
> Data Scientist
> https://github.com/rawkintrevo
> http://stackexchange.com/users/3002022/rawkintrevo
> http://trevorgrant.org
>
> *"Fortunate is he, who is able to know the causes of things."  -Virgil*
>
>
> On Fri, Aug 26, 2016 at 1:14 PM, Vijay Srinivasaraghavan <
> vijikar...@yahoo.com> wrote:
>
> Hi Trevor,
>
> I am seeing a similar issue for a JIRA that I am working on now. I have yet
> to trace the Yarn Web UI code to find out how the "tracking URL" is being
> handled. To unblock, you could use the tracking URL (the Flink UI URL)
> directly to access the Flink Web UI and bypass the Yarn UI redirection. You
> can find the tracking URL in the Job Manager log file from the Yarn container.
>
> Regards
> Vijay
>
>
> On Friday, August 26, 2016 10:52 AM, Trevor Grant <
> trevor.d.gr...@gmail.com> wrote:
>
>
> I decided it made the most sense to open up a new thread.
>
> I am running Flink on a cluster behind a firewall.  Things seem to be
> working fine, but when I access the YARN web UI and click on the Flink
> application UI, I get the JobManager UI, but it is broken.
>
> It shows a broken link to a Flink image, the "Apache Flink Dashboard"
> title, and the menu items:
>
>-  Overview
>-  Running Jobs
>-  Completed Jobs
>-  Task Managers
>-  Job Manager
>-  Submit new Job
>
> but none of the links work, no images, no pretty formatting.  Does anyone
> have a quick idea why this would be?  I found
> http://mail-archives.apache.org/mod_mbox/flink-user/201511.mbox/%3CCAGr9p8B_aKhbjLqVQQUZeO_eSz6P=eWKP+Kg1sQ65SB0npsX2Q@mail.gmail.com%3E
>
> but that seems to be more related to running jobs than accessing the UI
>
> any help would be appreciated
>
> tg
>
>
> Trevor Grant
> Data Scientist
> https://github.com/rawkintrevo
> http://stackexchange.com/users/3002022/rawkintrevo
> http://trevorgrant.org
>
> *"Fortunate is he, who is able to know the causes of things."  -Virgil*
>
>
>
>
>
>
>


Flink WebUI on YARN behind firewall

2016-08-26 Thread Trevor Grant
I decided it made the most sense to open up a new thread.

I am running Flink on a cluster behind a firewall.  Things seem to be
working fine, but when I access the YARN web UI and click on the Flink
application UI, I get the JobManager UI, but it is broken.

It shows a broken link to a Flink image, the "Apache Flink Dashboard"
title, and the menu items:

   -  Overview
   -  Running Jobs
   -  Completed Jobs
   -  Task Managers
   -  Job Manager
   -  Submit new Job

but none of the links work, no images, no pretty formatting.  Does anyone
have a quick idea why this would be?  I found
http://mail-archives.apache.org/mod_mbox/flink-user/201511.mbox/%3CCAGr9p8B_aKhbjLqVQQUZeO_eSz6P=eWKP+Kg1sQ65SB0npsX2Q@mail.gmail.com%3E

but that seems to be more related to running jobs than accessing the UI

any help would be appreciated

tg


Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


Re: Flink long-running YARN configuration

2016-08-26 Thread Trevor Grant
Stephan,

Will the JobManager UI exist?  E.g., if I am running Flink on YARN, will I be
able to submit apps / see logs and DAGs through the web interface?

thanks,
tg



Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Thu, Aug 25, 2016 at 12:59 PM, Stephan Ewen <se...@apache.org> wrote:

> Hi Craig!
>
> For YARN sessions, Flink will
>   - (a) register the app master hostname/port/etc. at Yarn, so you can get
> them, for example, from the Yarn UI and tools
>   - (b) create a .yarn-properties file that contains the
> hostname/port info. Future calls to the command line pick up the info from
> there.
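>
> (As an illustrative sketch of (b): the file can be read back with plain
> java.util.Properties; the path pattern and the property key here are
> assumptions, so check the actual file on your machine.)
>
> import java.io.FileInputStream
> import java.util.Properties
>
> val user = System.getProperty("user.name")
> val props = new Properties()
> props.load(new FileInputStream(s"/tmp/.yarn-properties-$user"))
> println(props.getProperty("jobManager")) // expect something like host:port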
>
> /cc Robert
>
> Greetings,
> Stephan
>
>
> On Thu, Aug 25, 2016 at 5:02 PM, Foster, Craig <foscr...@amazon.com>
> wrote:
>
>> I'm trying to understand Flink YARN configuration. The flink-conf.yaml
>> file is supposedly the way to configure Flink, except when you launch Flink
>> using YARN since that's determined for the AM. The following is
>> contradictory or not completely clear:
>>
>>
>>
>> "The system will use the configuration in conf/flink-config.yaml. Please
>> follow our configuration guide
>> <https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html>
>>  if you want to change something.
>>
>> Flink on YARN will overwrite the following configuration parameters
>> jobmanager.rpc.address (because the JobManager is always allocated at
>> different machines), taskmanager.tmp.dirs (we are using the tmp
>> directories given by YARN) and parallelism.default if the number of
>> slots has been specified."
>>
>>
>>
>> OK, so it will use conf/flink-config.yaml, except for
>> jobmanager.rpc.address/port which will be decided by YARN and not
>> necessarily reported to the user since those are dynamically allocated by
>> YARN. That's fine with me, but if I want to make a "long-running" Flink
>> cluster available for more than one user, where do I check in Flink for the
>> Application Master hostname--or do I just have to scrape output of logs
>> (which would definitely be undesirable)? First, I thought this would be
>> written by Flink to conf/flink-config.yaml. It is not. Then I thought it
>> must surely be written to the HDFS configuration directory (under something
>> like hdfs://$USER/.flink/) for that application but that is merely copied
>> from the original conf/flink-config.yaml and doesn't have an accurate
>> configuration for the specified application. So is there an accurate config
>> somewhere in HDFS or on the ResourceManager--i.e. where could I
>> programmatically find that (outside of manipulating YARN app names or
>> scraping)?
>>
>>
>>
>> Thanks,
>>
>> Craig
>>
>>
>>
>>
>>
>>
>>
>
>


Re: Setting up zeppelin with flink

2016-08-26 Thread Trevor Grant
That is a regression from upgrading Zeppelin to Spark 2.0 / Scala 2.11, as it
broke existing functionality; hopefully whoever did the upgrade will fix it...

Please report it to Zeppelin. Thanks, and good find!
On Aug 26, 2016 8:39 AM, "Frank Dekervel" <ker...@gmail.com> wrote:

> Hello,
>
> i added this to my Dockerfile to end up with a working setup:
>
> RUN cp /opt/zeppelin/interpreter/ignite/scala*jar
> /opt/zeppelin/interpreter/flink/
>
> which would copy:
>
> scala-compiler-2.11.7.jar
> scala-library-2.11.7.jar
> scala-parser-combinators_2.11-1.0.4.jar
> scala-reflect-2.11.7.jar
> scala-xml_2.11-1.0.4.jar
>
> actually "working" means "able to run the word count example" (i'm not
> sure if that really qualifies as working ...).
>
> i'll follow up on this on the zeppelin user list.
>
> Frank
>
>
>
> On Thu, Aug 25, 2016 at 6:01 PM, Trevor Grant <trevor.d.gr...@gmail.com>
> wrote:
>
>> I'm glad you were able to work it out!
>>
>> Your setup is somewhat unique, and as Zeppelin is the result of multiple
>> drive-by commits, interesting and unexpected things happen in the tail
>> cases.
>>
>> Could you please report your problem and solution on the Zeppelin user
>> list?  What you've discovered may in fact be a bug or a regression caused
>> by some of the recent Spark 2.0/scala 2.11 mess (I see you installed
>> Zeppelin 0.6.1).  Suffice to say, I don't think this is a Flink issue.
>>
>>
>> Finally, out of curiosity- what jars did you copy to the
>> interpreter/flink directory to get this to work?  I'd like to check the
>> Zeppelin/flink/pom.xml
>>
>> Happy to be a sounding board if nothing else ;)
>>
>> tg
>>
>>
>> Trevor Grant
>> Data Scientist
>> https://github.com/rawkintrevo
>> http://stackexchange.com/users/3002022/rawkintrevo
>> http://trevorgrant.org
>>
>> *"Fortunate is he, who is able to know the causes of things."  -Virgil*
>>
>>
>> On Thu, Aug 25, 2016 at 8:57 AM, Frank Dekervel <ker...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> Sorry for the spam, but i got it working after copying all scala
>>> libraries from another interpreter to the interpreter/flink directory. so i
>>> think the error is the scala libraries are missing from the binary release
>>> in the zeppelin/interpreters/flink/ directory. For now i'm adding the copy
>>> commands to the dockerfile, but I'm sure this is not the proper way to fix
>>> it, but i don't know maven enough to understand why the scala libs are
>>> missing for the flink interpreter but not for the ignite interpreter.
>>>
>>> I'm also unable to figure out why a local interpreter worked fine given
>>> the missing libraries ...
>>>
>>> greetings,
>>> Frank
>>>
>>>
>>> On Thu, Aug 25, 2016 at 3:08 PM, Frank Dekervel <ker...@gmail.com>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> For reference, below is the dockerfile i used to build the zeppelin
>>>> image (basically just openjdk 8 with the latest binary release of zeppelin)
>>>> the "docker-entrypoint.sh" script is just starting zeppelin.sh
>>>> (oneliner)
>>>>
>>>> FROM openjdk:alpine
>>>>
>>>> RUN apk add --no-cache bash snappy
>>>>
>>>> ARG ZEPPELIN_VERSION=0.6.1
>>>>
>>>> ARG INSTALL_PATH=/opt
>>>> ENV APP_HOME $INSTALL_PATH/zeppelin
>>>> ENV PATH $PATH:$APP_HOME/bin
>>>>
>>>> RUN set -x && \
>>>>   mkdir -p $INSTALL_PATH && \
>>>>   apk --update add --virtual build-dependencies curl && \
> >>>>   curl -s $(curl -s https://www.apache.org/dyn/closer.cgi\?preferred\=true)/zeppelin/zeppelin-0.6.1/zeppelin-$ZEPPELIN_VERSION-bin-all.tgz | \
>>>>   tar xvz -C $INSTALL_PATH && \
>>>>   ln -s $INSTALL_PATH/zeppelin-$ZEPPELIN_VERSION-bin-all $APP_HOME && \
>>>>   addgroup -S zeppelin && adduser -D -S -H -G zeppelin -h $APP_HOME
>>>> zeppelin && \
>>>>   chown -R zeppelin:zeppelin $INSTALL_PATH && \
>>>>   chown -h zeppelin:zeppelin $APP_HOME && \
>>>>   apk del build-dependencies && \
>>>>   rm -rf /var/cache/apk/*
>>>>
>>>> # Configure container
>>>> USER zeppelin
>>>> ADD docker-entrypoint.sh $APP_HOME/bin/
>>>>

Re: Setting up zeppelin with flink

2016-08-25 Thread Trevor Grant
I'm glad you were able to work it out!

Your setup is somewhat unique, and as Zeppelin is the result of multiple
drive-by commits, interesting and unexpected things happen in the tail
cases.

Could you please report your problem and solution on the Zeppelin user
list?  What you've discovered may in fact be a bug or a regression caused
by some of the recent Spark 2.0/scala 2.11 mess (I see you installed
Zeppelin 0.6.1).  Suffice to say, I don't think this is a Flink issue.


Finally, out of curiosity- what jars did you copy to the interpreter/flink
directory to get this to work?  I'd like to check the Zeppelin/flink/pom.xml

Happy to be a sounding board if nothing else ;)

tg


Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Thu, Aug 25, 2016 at 8:57 AM, Frank Dekervel <ker...@gmail.com> wrote:

> Hello,
>
> Sorry for the spam, but i got it working after copying all scala libraries
> from another interpreter to the interpreter/flink directory. so i think the
> error is the scala libraries are missing from the binary release in the
> zeppelin/interpreters/flink/ directory. For now i'm adding the copy
> commands to the dockerfile, but I'm sure this is not the proper way to fix
> it, but i don't know maven enough to understand why the scala libs are
> missing for the flink interpreter but not for the ignite interpreter.
>
> I'm also unable to figure out why a local interpreter worked fine given
> the missing libraries ...
>
> greetings,
> Frank
>
>
> On Thu, Aug 25, 2016 at 3:08 PM, Frank Dekervel <ker...@gmail.com> wrote:
>
>> Hello,
>>
>> For reference, below is the dockerfile i used to build the zeppelin image
>> (basically just openjdk 8 with the latest binary release of zeppelin)
>> the "docker-entrypoint.sh" script is just starting zeppelin.sh (oneliner)
>>
>> FROM openjdk:alpine
>>
>> RUN apk add --no-cache bash snappy
>>
>> ARG ZEPPELIN_VERSION=0.6.1
>>
>> ARG INSTALL_PATH=/opt
>> ENV APP_HOME $INSTALL_PATH/zeppelin
>> ENV PATH $PATH:$APP_HOME/bin
>>
>> RUN set -x && \
>>   mkdir -p $INSTALL_PATH && \
>>   apk --update add --virtual build-dependencies curl && \
>>   curl -s $(curl -s https://www.apache.org/dyn/closer.cgi\?preferred\=true)/zeppelin/zeppelin-0.6.1/zeppelin-$ZEPPELIN_VERSION-bin-all.tgz | \
>>   tar xvz -C $INSTALL_PATH && \
>>   ln -s $INSTALL_PATH/zeppelin-$ZEPPELIN_VERSION-bin-all $APP_HOME && \
>>   addgroup -S zeppelin && adduser -D -S -H -G zeppelin -h $APP_HOME
>> zeppelin && \
>>   chown -R zeppelin:zeppelin $INSTALL_PATH && \
>>   chown -h zeppelin:zeppelin $APP_HOME && \
>>   apk del build-dependencies && \
>>   rm -rf /var/cache/apk/*
>>
>> # Configure container
>> USER zeppelin
>> ADD docker-entrypoint.sh $APP_HOME/bin/
>> ENTRYPOINT ["docker-entrypoint.sh"]
>> CMD ["sh", "-c"]
>>
>> greetings,
>> Frank
>>
>>
>> On Thu, Aug 25, 2016 at 2:58 PM, Frank Dekervel <ker...@gmail.com> wrote:
>>
>>> Hello Trevor,
>>>
>>> Thanks for your suggestion. The log does not explain a lot: on the flink
>>> side i don't see anything at all, on the zeppelin side i see this:
>>> Your suggestion sounds plausible, as i always start zeppelin, and then
>>> change the configuration from local to remote.. however, port 6123 locally
>>> doesn't seem to be open
>>>
>>> ==> zeppelin--94490c51d71e.log <==
>>>  INFO [2016-08-25 12:53:24,168] ({qtp846063400-48}
>>> InterpreterFactory.java[createInterpretersForNote]:576) - Create
>>> interpreter instance flink for note 2BW8NMCKW
>>>  INFO [2016-08-25 12:53:24,168] ({qtp846063400-48}
>>> InterpreterFactory.java[createInterpretersForNote]:606) - Interpreter
>>> org.apache.zeppelin.flink.FlinkInterpreter 795344042 created
>>>  INFO [2016-08-25 12:53:24,169] ({pool-1-thread-3}
>>> SchedulerFactory.java[jobStarted]:131) - Job
>>> paragraph_1471964818018_1833520437 started by scheduler
>>> org.apache.zeppelin.interpreter.remote.RemoteInterpretershar
>>> ed_session513606587
>>>  INFO [2016-08-25 12:53:24,170] ({pool-1-thread-3}
>>> Paragraph.java[jobRun]:252) - run paragraph 20160823-150658_99117457 using
>>> null org.apache.zeppelin.interpreter.LazyOpenInterpreter@2f67fcaa
>>>  INFO [2016-08-25 12:53:24,170] ({pool-1-t

Re: Setting up zeppelin with flink

2016-08-24 Thread Trevor Grant
Frank,

can you post the zeppelin flink log please?

You can probably find it in zeppelin_dir/logs/*flink*.log

You've got a few moving pieces here.  I've never run zeppelin against Flink
in a docker container.   But I think the Zeppelin-Flink log is the first
place to look.

You say you can't get Zeppelin to work in local mode either right? Just
curious, is Zeppelin running in a docker too?

Thanks,
tg


Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Wed, Aug 24, 2016 at 6:50 AM, Maximilian Michels <m...@apache.org> wrote:

> Hi!
>
> There are some people familiar with the Zeppelin integration. CCing
> Till and Trevor. Otherwise, you could also send this to the Zeppelin
> community.
>
> Cheers,
> Max
>
> On Wed, Aug 24, 2016 at 12:58 PM, Frank Dekervel <ker...@gmail.com> wrote:
> > Hello,
> >
> > for reference:
> >
> > i already found out that "connect to existing process" was my error
> here: it
> > means connecting to an existing zeppelin interpreter, not an existing
> flink
> > cluster. After fixing my error, i'm now in the same situation as
> described
> > here:
> >
> > https://stackoverflow.com/questions/38688277/flink-
> zeppelin-not-responding
> >
> > i guess it's more a zeppelin problem than a flink problem tho, as i see
> both
> > interpreter JVM and main zeppelin JVM waiting on thrift input (so it
> seems
> > they are waiting for each other)
> >
> > greetings,
> > Frank
> >
> >
> >
> >
> > On Tue, Aug 23, 2016 at 2:09 PM, Frank Dekervel <ker...@gmail.com>
> wrote:
> >>
> >> Hello,
> >>
> >> I try to set up apache zeppelin with a flink cluster (one jobmanager,
> one
> >> task manager).
> >>
> >> What i did was using the dockerfiles in flink-contrib/docker-flink + the
> >> latest binary release of apache zeppelin with all interpreters:
> >>
> >>
> >> https://github.com/apache/flink/blob/master/flink-contrib/docker-flink/
> Dockerfile
> >> (i changed the flink version to 1.0.3 to match zeppelin's flink version)
> >>
> >> I built another docker image around the latest binary release of
> zeppelin
> >> (with all interpreters), and i reconfigure the flink interpreter:
> >>
> >> connect to existing process
> >> host: jobmanager, port: 6123
> >> i removed all other properties
> >>
> >> when i try to submit a flink job, i get an error state and the following
> >> exception appears in the log (nothing appears in the jobmanager log)
> >>
> >> ERROR [2016-08-23 11:44:57,932] ({Thread-16}
> >> JobProgressPoller.java[run]:54) - Can not get or update progress
> >> org.apache.zeppelin.interpreter.InterpreterException:
> >> org.apache.thrift.transport.TTransportException
> >> at
> >> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getProgress(
> RemoteInterpreter.java:373)
> >> at
> >> org.apache.zeppelin.interpreter.LazyOpenInterpreter.getProgress(
> LazyOpenInterpreter.java:111)
> >> at
> >> org.apache.zeppelin.notebook.Paragraph.progress(Paragraph.java:237)
> >> at
> >> org.apache.zeppelin.scheduler.JobProgressPoller.run(
> JobProgressPoller.java:51)
> >> Caused by: org.apache.thrift.transport.TTransportException
> >> at
> >> org.apache.thrift.transport.TIOStreamTransport.read(
> TIOStreamTransport.java:132)
> >> at
> >> org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
> >> at
> >> org.apache.thrift.protocol.TBinaryProtocol.readAll(
> TBinaryProtocol.java:429)
> >> at
> >> org.apache.thrift.protocol.TBinaryProtocol.readI32(
> TBinaryProtocol.java:318)
> >> at
> >> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(
> TBinaryProtocol.java:219)
> >> at
> >> org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
> >> at
> >> org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$
> Client.recv_getProgress(RemoteInterpreterService.java:296)
> >> at
> >> org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$
> Client.getProgress(RemoteInterpreterService.java:281)
> >> at
> >> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getProgress(
> RemoteInterpreter.java:370)
> >> ... 3 more
> >>
> >> Flink in local mode works fine on zeppelin.
> >> Could somebody point me to what i'm doing wrong ?
> >>
> >> Thanks a lot!
> >> Frank
> >>
> >>
> >>
> >
>


Re: Setting up zeppelin with flink

2016-08-24 Thread Trevor Grant
Hey Frank,

Saw your post on the Zeppelin list yesterday.  I can look at it later this
morning, but my gut feeling is a ghost Zeppelin daemon is running in the
background and its local Flink is holding port 6123. This is fairly
common and would explain the issue.

Idk if you're on Linux or Windows or whatever, but have you tried rebooting
the machine? (Sorry if you said you did earlier in the email.) Also I very
vaguely remember there is a boot order that matters with Flink and
Zeppelin, like you need to start Flink first then Zeppelin, or vice versa.
I feel like it is Flink first, then Zeppelin.

Hope that helps, will dig in later if not.

tg





Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Wed, Aug 24, 2016 at 6:50 AM, Maximilian Michels <m...@apache.org> wrote:

> Hi!
>
> There are some people familiar with the Zeppelin integration. CCing
> Till and Trevor. Otherwise, you could also send this to the Zeppelin
> community.
>
> Cheers,
> Max
>
> On Wed, Aug 24, 2016 at 12:58 PM, Frank Dekervel <ker...@gmail.com> wrote:
> > Hello,
> >
> > for reference:
> >
> > i already found out that "connect to existing process" was my error
> here: it
> > means connecting to an existing zeppelin interpreter, not an existing
> flink
> > cluster. After fixing my error, i'm now in the same situation as
> described
> > here:
> >
> > https://stackoverflow.com/questions/38688277/flink-
> zeppelin-not-responding
> >
> > i guess it's more a zeppelin problem than a flink problem tho, as i see
> both
> > interpreter JVM and main zeppelin JVM waiting on thrift input (so it
> seems
> > they are waiting for each other)
> >
> > greetings,
> > Frank
> >
> >
> >
> >
> > On Tue, Aug 23, 2016 at 2:09 PM, Frank Dekervel <ker...@gmail.com>
> wrote:
> >>
> >> Hello,
> >>
> >> I try to set up apache zeppelin with a flink cluster (one jobmanager,
> one
> >> task manager).
> >>
> >> What i did was using the dockerfiles in flink-contrib/docker-flink + the
> >> latest binary release of apache zeppelin with all interpreters:
> >>
> >>
> >> https://github.com/apache/flink/blob/master/flink-contrib/docker-flink/
> Dockerfile
> >> (i changed the flink version to 1.0.3 to match zeppelin's flink version)
> >>
> >> I built another docker image around the latest binary release of
> zeppelin
> >> (with all interpreters), and i reconfigure the flink interpreter:
> >>
> >> connect to existing process
> >> host: jobmanager, port: 6123
> >> i removed all other properties
> >>
> >> when i try to submit a flink job, i get an error state and the following
> >> exception appears in the log (nothing appears in the jobmanager log)
> >>
> >> ERROR [2016-08-23 11:44:57,932] ({Thread-16}
> >> JobProgressPoller.java[run]:54) - Can not get or update progress
> >> org.apache.zeppelin.interpreter.InterpreterException:
> >> org.apache.thrift.transport.TTransportException
> >> at
> >> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getProgress(
> RemoteInterpreter.java:373)
> >> at
> >> org.apache.zeppelin.interpreter.LazyOpenInterpreter.getProgress(
> LazyOpenInterpreter.java:111)
> >> at
> >> org.apache.zeppelin.notebook.Paragraph.progress(Paragraph.java:237)
> >> at
> >> org.apache.zeppelin.scheduler.JobProgressPoller.run(
> JobProgressPoller.java:51)
> >> Caused by: org.apache.thrift.transport.TTransportException
> >> at
> >> org.apache.thrift.transport.TIOStreamTransport.read(
> TIOStreamTransport.java:132)
> >> at
> >> org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
> >> at
> >> org.apache.thrift.protocol.TBinaryProtocol.readAll(
> TBinaryProtocol.java:429)
> >> at
> >> org.apache.thrift.protocol.TBinaryProtocol.readI32(
> TBinaryProtocol.java:318)
> >> at
> >> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(
> TBinaryProtocol.java:219)
> >> at
> >> org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
> >> at
> >> org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$
> Client.recv_getProgress(RemoteInterpreterService.java:296)
> >> at
> >> org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$
> Client.getProgress(RemoteInterpreterService.java:281)
> >> at
> >> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getProgress(
> RemoteInterpreter.java:370)
> >> ... 3 more
> >>
> >> Flink in local mode works fine on zeppelin.
> >> Could somebody point me to what i'm doing wrong ?
> >>
> >> Thanks a lot!
> >> Frank
> >>
> >>
> >>
> >
>


Re: Anyone going to ApacheCon Big Data in Vancouver?

2016-04-29 Thread Trevor Grant
Hey Ken,

I'll be there doing a talk on Monday afternoon on using Zeppelin for a Data
Science environment with a couple of Flink and Spark Examples.

I'm doing a tutorial Wednesday morning (I think for ApacheCon) that is
about setting up Zeppelin with Flink and Spark in cluster mode.

Would love to meet up with any and all from the community, and my Zeppelin
notebooks will be available (already are available) on my github in the
'apachecon' repository.

tg


Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Thu, Apr 28, 2016 at 5:59 PM, Ken Krugler <kkrugler_li...@transpac.com>
wrote:

>
> On Apr 28, 2016, at 3:37pm, Suneel Marthi <smar...@apache.org> wrote:
>
> I'll be there Ken, still waiting for Part 3 of ur blog series from 2013 on
> text processing. :)
>
>
> Oh geez. I completely forgot about that.
>
>  OK, maybe on the flight back from the conference :)
>
> Or we can discuss in person at the event...
>
> — Ken
>
>
> On Thu, Apr 28, 2016 at 6:34 PM, Ken Krugler <kkrugler_li...@transpac.com>
> wrote:
>
>> Hi all,
>>
>> Is anyone else from the community going?
>>
>> It would be fun to meet up with other Flink users during the event.
>>
>> I’ll be there from Sunday (May 8th) to early Wednesday afternoon (May
>> 11th).
>>
>> — Ken
>>
>> PS - On Monday I’ll be giving a talk
>> <http://apachebigdata2016.sched.org/event/6Lzi/a-faster-way-for-faster-workflows-ken-krugler-scale-unlimited>
>>  on
>> my experience with using the Flink planner for Cascading.
>>
>>
>
> --
> Ken Krugler
> +1 530-210-6378
> http://www.scaleunlimited.com
> custom big data solutions & training
> Hadoop, Cascading, Cassandra & Solr
>
>
>
>


Gelly CommunityDetection in scala example

2016-04-27 Thread Trevor Grant
The following example in the scala shell worked as expected:

import org.apache.flink.graph.library.LabelPropagation

val verticesWithCommunity = graph.run(new LabelPropagation(30))

// print the result
verticesWithCommunity.print


I tried to extend the example to use CommunityDetection:

import org.apache.flink.graph.library.CommunityDetection

val verticesWithCommunity = graph.run(new CommunityDetection(30, 0.5))

// print the result
verticesWithCommunity.print


And got the following error:
error: polymorphic expression cannot be instantiated to expected type;
found : [K]org.apache.flink.graph.library.CommunityDetection[K]
required: org.apache.flink.graph.GraphAlgorithm[Long,String,Double,?]
val verticesWithCommunity = graph.run(new CommunityDetection(30, 0.5))
^

I haven't been able to come up with a hack to make this work. Any
advice, or is this a bug?

I investigated the code base a little; it seems to be an issue with what
Graph.run expects to see vs. what LabelPropagation returns vs. what
CommunityDetection returns.
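
In the meantime, a rough, untested sketch of a workaround (signatures from
memory, so verify against your Gelly version):

import org.apache.flink.graph.Vertex
import org.apache.flink.graph.library.CommunityDetection

// CommunityDetection wants a Graph[K, Long, Double] (Long vertex labels,
// Double edge weights), so re-label the vertices first. Assuming `graph`
// is the Graph[Long, String, Double] from the error above.
val labeled = graph.mapVertices((v: Vertex[Long, String]) => v.getId)

val result = labeled.run(new CommunityDetection[Long](30, 0.5))

// Unlike LabelPropagation, CommunityDetection returns a Graph,
// so print its vertex DataSet rather than the result directly.
result.getVertices.print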



Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


Re: Simple GraphX example.

2016-04-27 Thread Trevor Grant
Ahh pro-tip.

Thanks Aljoscha!

Final solution in case any should stumble across this in the future:

You have to load the flink-gelly-scala jar AND the flink-gelly jar (to get
access to the Edge/Vertex).

import org.apache.flink.api.scala._

import org.apache.flink.graph.scala._
import org.apache.flink.graph.Edge
import org.apache.flink.graph.Vertex


val vertices = Seq(new Vertex[Long, String](1L, "foo"), new Vertex[Long,
String](2L, "bar"))
val edges = Seq(new Edge[Long, String](1L, 2L, "foobar"))

val graph = Graph.fromCollection( vertices, edges, env)


Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Wed, Apr 27, 2016 at 9:13 AM, Aljoscha Krettek <aljos...@apache.org>
wrote:

> Hi,
> I think you need to import the stuff from org.apache.flink.graph.scala.*
> instead of org.apache.flink.graph.*.
>
> Cheers,
> Aljoscha
>
> On Wed, 27 Apr 2016 at 16:07 Trevor Grant <trevor.d.gr...@gmail.com>
> wrote:
>
>> Hi, I'm running Flink 1.0.2 from the Zeppelin/shell- trying to experiment
>> with some graph stuff.
>>
>> Zeppelin has been known to add degrees of crazy to trouble shooting- but
>> I intuitively feel like this is something I'm doing wrong on the Flink
>> Side.
>>
>> The simplest example is not working for me.
>>
>> import org.apache.flink.api.scala._
>>
>> import org.apache.flink.graph.Edge
>> import org.apache.flink.graph.Vertex
>> import org.apache.flink.graph.Graph
>>
>> val vertices = Seq(new Vertex[Long, String](1L, "foo"), new Vertex[Long,
>> String](2L, "bar"))
>> val edges = Seq(new Edge[Long, String](1L, 2L, "foobar"))
>>
>> val graph = Graph.fromCollection(edges, vertices)
>>
>> Yields:
>>
>> :46: error: type mismatch;
>> found : Seq[org.apache.flink.graph.Edge[Long,String]]
>> required: java.util.Collection[org.apache.flink.graph.Edge[?,?]]
>> val graph = Graph.fromCollection(edges, vertices)
>>
>>
>> I get a similar error when I try to make DataSets,
>>
>> found :
>> org.apache.flink.api.scala.DataSet[org.apache.flink.graph.Edge[Long,Double]]
>> required:
>> org.apache.flink.api.java.DataSet[org.apache.flink.graph.Edge[?,?]]
>> val graph = Graph.fromDataSet( edges, vertices)
>>
>> Thoughts?
>>
>> thanks,
>> tg
>>
>> Trevor Grant
>> Data Scientist
>> https://github.com/rawkintrevo
>> http://stackexchange.com/users/3002022/rawkintrevo
>> http://trevorgrant.org
>>
>> *"Fortunate is he, who is able to know the causes of things."  -Virgil*
>>
>>


Simple GraphX example.

2016-04-27 Thread Trevor Grant
Hi, I'm running Flink 1.0.2 from the Zeppelin/shell- trying to experiment
with some graph stuff.

Zeppelin has been known to add degrees of crazy to trouble shooting- but I
intuitively feel like this is something I'm doing wrong on the Flink Side.

The simplest example is not working for me.

import org.apache.flink.api.scala._

import org.apache.flink.graph.Edge
import org.apache.flink.graph.Vertex
import org.apache.flink.graph.Graph

val vertices = Seq(new Vertex[Long, String](1L, "foo"), new Vertex[Long,
String](2L, "bar"))
val edges = Seq(new Edge[Long, String](1L, 2L, "foobar"))

val graph = Graph.fromCollection(edges, vertices)

Yields:

:46: error: type mismatch;
found : Seq[org.apache.flink.graph.Edge[Long,String]]
required: java.util.Collection[org.apache.flink.graph.Edge[?,?]]
val graph = Graph.fromCollection(edges, vertices)


I get a similar error when I try to make DataSets,

found :
org.apache.flink.api.scala.DataSet[org.apache.flink.graph.Edge[Long,Double]]
required:
org.apache.flink.api.java.DataSet[org.apache.flink.graph.Edge[?,?]]
val graph = Graph.fromDataSet( edges, vertices)

Thoughts?

thanks,
tg

Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


Re: Programatic way to get version

2016-04-21 Thread Trevor Grant
dug through the codebase, in case any others want to know:

import org.apache.flink.runtime.util.EnvironmentInformation;

EnvironmentInformation.getVersion()



Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Thu, Apr 21, 2016 at 5:05 PM, Trevor Grant <trevor.d.gr...@gmail.com>
wrote:

> Is there a programatic way to get the Flink version from the scala shell?
>
> I.e. something akin to
> sc.version
>
> I thought I used env.version or something like that once but I couldn't
> find anything in the scala docs.
>
> Trevor Grant
> Data Scientist
> https://github.com/rawkintrevo
> http://stackexchange.com/users/3002022/rawkintrevo
> http://trevorgrant.org
>
> *"Fortunate is he, who is able to know the causes of things."  -Virgil*
>
>


Programatic way to get version

2016-04-21 Thread Trevor Grant
Is there a programatic way to get the Flink version from the scala shell?

I.e. something akin to
sc.version

I thought I used env.version or something like that once but I couldn't
find anything in the scala docs.

Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


Re: DataSet.randomSplit()

2016-04-12 Thread Trevor Grant
Hey all,

Sorry I missed this thread.

The related issue is: https://issues.apache.org/jira/browse/FLINK-2259

I checked it out then forgot about it.  I'm cranking on it now.

tg



Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Tue, Mar 29, 2016 at 4:33 AM, Till Rohrmann <trohrm...@apache.org> wrote:

> Hi,
>
> I think Ufuk is completely right. As far as I know, we don't support this
> function and nobody's currently working on it. If you like, then you could
> take the lead there.
>
> Cheers,
> Till
>
> On Mon, Mar 28, 2016 at 10:50 PM, Ufuk Celebi <u...@apache.org> wrote:
>
>> Hey Gna! I think that it's not on the road map at the moment. Feel free
>> to ping in the linked PR though. Probably Till can chime in there.
>>
>> – Ufuk
>>
>> On Mon, Mar 28, 2016 at 5:16 PM, Sourigna Phetsarath <
>> gna.phetsar...@teamaol.com> wrote:
>>
>>> Ufuk,
>>>
>>> Thank you.  Yes, I saw the sampling methods in DataSetUtils and they are
>>> helpful.
>>>
>>> Just wanted to see if that particular method is on the road map for a
>>> future release.
>>>
>>> -Gna
>>>
>>> On Mon, Mar 28, 2016 at 6:22 AM, Ufuk Celebi <u...@apache.org> wrote:
>>>
>>>> Hey Sourigna,
>>>>
>>>> that particular method is not part of Flink yet.
>>>>
>>>> Did you have a look at the sampling methods in DataSetUtils? Maybe they
>>>> can be helpful for what you are trying to achieve.
>>>>
>>>> – Ufuk
>>>>
>>>> On Wed, Mar 23, 2016 at 5:19 PM, Sourigna Phetsarath <
>>>> gna.phetsar...@teamaol.com> wrote:
>>>>
>>>>> All:
>>>>>
>>>>> Does Flink DataSet have a randomSplit(weights:Array[Double], seed:
>>>>> Long): Array[DataSet[T]] function?
>>>>>
>>>>> There is this pull request: https://github.com/apache/flink/pull/921
>>>>>
>>>>> Does anyone have an update of the progress of this?
>>>>>
>>>>> Thank you.
>>>>>
>>>>> --
>>>>>
>>>>>
>>>>> *Gna Phetsarath*System Architect // AOL Platforms // Data Services //
>>>>> Applied Research Chapter
>>>>> 770 Broadway, 5th Floor, New York, NY 10003
>>>>> o: 212.402.4871 // m: 917.373.7363
>>>>> vvmr: 8890237 aim: sphetsarath20 t: @sourigna
>>>>>
>>>>> * <http://www.aolplatforms.com>*
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>>
>>>
>>> *Gna Phetsarath*System Architect // AOL Platforms // Data Services //
>>> Applied Research Chapter
>>> 770 Broadway, 5th Floor, New York, NY 10003
>>> o: 212.402.4871 // m: 917.373.7363
>>> vvmr: 8890237 aim: sphetsarath20 t: @sourigna
>>>
>>> * <http://www.aolplatforms.com>*
>>>
>>
>>
>


Re: Flink ML 1.0.0 - Saving and Loading Models to Score a Single Feature Vector

2016-04-08 Thread Trevor Grant
I'm just about to open an issue / PR solution for 'warm starts'.

Once this is in, we could just add a setter for the weight vector (and
whatever iteration you're on if you're going to do more partial fits).

Then all you need to save is your weight vector (and iteration number).



Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Fri, Apr 8, 2016 at 9:04 AM, Behrouz Derakhshan <
behrouz.derakhs...@gmail.com> wrote:

> Is there a reasons the Predictor or Estimator class don't have read and
> write methods for saving and retrieving the model? I couldn't find Jira
> issues for it. Does it make sense to create one ?
>
> BR,
> Behrouz
>
> On Wed, Mar 30, 2016 at 4:40 PM, Till Rohrmann <trohrm...@apache.org>
> wrote:
>
>> Yes, Suneel is completely right. If the data does not implement
>> IOReadableWritable it is probably easier to use the
>> TypeSerializerOutputFormat. What you need here to serialize the data is a
>> TypeSerializer. You can obtain it the following way:
>>
>> val model = mlr.weightsOption.get
>>
>> val weightVectorTypeInfo = TypeInformation.of(classOf[WeightVector])
>> val weightVectorSerializer = weightVectorTypeInfo.createSerializer(new 
>> ExecutionConfig())
>> val outputFormat = new TypeSerializerOutputFormat[WeightVector]
>> outputFormat.setSerializer(weightVectorSerializer)
>>
>> model.write(outputFormat, "path")
>>
>> Cheers,
>> Till
>> ​
>>
>> On Tue, Mar 29, 2016 at 8:22 PM, Suneel Marthi <smar...@apache.org>
>> wrote:
>>
>>> U may want to use FlinkMLTools.persist() methods which use
>>> TypeSerializerFormat and don't enforce IOReadableWritable.
>>>
>>>
>>>
>>> On Tue, Mar 29, 2016 at 2:12 PM, Sourigna Phetsarath <
>>> gna.phetsar...@teamaol.com> wrote:
>>>
>>>> Till,
>>>>
>>>> Thank you for your reply.
>>>>
>>>> Having this issue though: WeightVector does not extend IOReadableWritable:
>>>>
>>>> *public* *class* SerializedOutputFormat<*T* *extends*
>>>> IOReadableWritable>
>>>>
>>>> *case* *class* WeightVector(weights: Vector, intercept: Double)
>>>> *extends* Serializable {}
>>>>
>>>>
>>>> However, I will use the approach to write out the weights as text.
>>>>
>>>>
>>>> On Tue, Mar 29, 2016 at 5:01 AM, Till Rohrmann <trohrm...@apache.org>
>>>> wrote:
>>>>
>>>>> Hi Gna,
>>>>>
>>>>> there are no utilities yet to do that but you can do it manually. In
>>>>> the end, a model is simply a Flink DataSet which you can serialize to
>>>>> some file. Upon reading this DataSet you simply have to give it to
>>>>> your algorithm to be used as the model. The following code snippet
>>>>> illustrates this approach:
>>>>>
>>>>> mlr.fit(inputDS, parameters)
>>>>>
>>>>> // write model to disk using the SerializedOutputFormat
>>>>> mlr.weightsOption.get.write(new SerializedOutputFormat[WeightVector], 
>>>>> "path")
>>>>>
>>>>> // read the serialized model from disk
>>>>> val model = env.readFile(new SerializedInputFormat[WeightVector], "path")
>>>>>
>>>>> // set the read model for the MLR algorithm
>>>>> mlr.weightsOption = model
>>>>>
>>>>> Cheers,
>>>>> Till
>>>>> ​
>>>>>
>>>>> On Tue, Mar 29, 2016 at 10:46 AM, Simone Robutti <
>>>>> simone.robu...@radicalbit.io> wrote:
>>>>>
>>>>>> To my knowledge there is nothing like that. PMML is not supported in
>>>>>> any form and there's no custom saving format yet. If you really need a
>>>>>> quick and dirty solution, it's not that hard to serialize the model into 
>>>>>> a
>>>>>> file.
>>>>>>
>>>>>> 2016-03-28 17:59 GMT+02:00 Sourigna Phetsarath <
>>>>>> gna.phetsar...@teamaol.com>:
>>>>>>
>>>>>>> Flinksters,
>>>>>>>
>>>>>>> Is there an example of saving a Trained Model, loading a Trained
>>>>>>> Model and then scoring one or more feature vectors using Flink ML?
>>>>>>>
>>>>>>> All of the examples I've seen have shown only sequential fit and
>>>>>>> predict.
>>>>>>>
>>>>>>> Thank you.
>>>>>>>
>>>>>>> -Gna
>>>>>>> --
>>>>>>>
>>>>>>>
>>>>>>> *Gna Phetsarath*System Architect // AOL Platforms // Data Services
>>>>>>> // Applied Research Chapter
>>>>>>> 770 Broadway, 5th Floor, New York, NY 10003
>>>>>>> o: 212.402.4871 // m: 917.373.7363
>>>>>>> vvmr: 8890237 aim: sphetsarath20 t: @sourigna
>>>>>>>
>>>>>>> * <http://www.aolplatforms.com>*
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>>
>>>> *Gna Phetsarath*System Architect // AOL Platforms // Data Services //
>>>> Applied Research Chapter
>>>> 770 Broadway, 5th Floor, New York, NY 10003
>>>> o: 212.402.4871 // m: 917.373.7363
>>>> vvmr: 8890237 aim: sphetsarath20 t: @sourigna
>>>>
>>>> * <http://www.aolplatforms.com>*
>>>>
>>>
>>>
>>
>


Re: Zeppelin Integration

2015-10-21 Thread Trevor Grant
Hey Till,

I cloned your branch of Zeppelin and while it will compile, it fails tests on
a timeout, which incidentally was the same issue I was having when trying to
use Zeppelin.

Ideas?

---
Test set: org.apache.zeppelin.flink.FlinkInterpreterTest
---
Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 100.347 sec
<<< FAILURE! - in org.apache.zeppelin.flink.FlinkInterpreterTest
org.apache.zeppelin.flink.FlinkInterpreterTest  Time elapsed: 100.347 sec
 <<< ERROR!
java.util.concurrent.TimeoutException: Futures timed out after [10
milliseconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at
org.apache.flink.runtime.minicluster.FlinkMiniCluster.getLeaderIndex(FlinkMiniCluster.scala:171)
at
org.apache.flink.runtime.minicluster.LocalFlinkMiniCluster.getLeaderRPCPort(LocalFlinkMiniCluster.scala:132)
at
org.apache.zeppelin.flink.FlinkInterpreter.getPort(FlinkInterpreter.java:136)
at org.apache.zeppelin.flink.FlinkInterpreter.open(FlinkInterpreter.java:98)
at
org.apache.zeppelin.flink.FlinkInterpreterTest.setUp(FlinkInterpreterTest.java:42)

org.apache.zeppelin.flink.FlinkInterpreterTest  Time elapsed: 100.347 sec
 <<< ERROR!
java.lang.NullPointerException: null
at
org.apache.zeppelin.flink.FlinkInterpreter.close(FlinkInterpreter.java:221)
at
org.apache.zeppelin.flink.FlinkInterpreterTest.tearDown(FlinkInterpreterTest.java:48)



Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Wed, Oct 21, 2015 at 11:57 AM, Till Rohrmann <trohrm...@apache.org>
wrote:

> Hi Trevor,
>
> in order to use Zeppelin with a different Flink version in local mode,
> meaning that Zeppelin starts a LocalFlinkMiniCluster when executing your
> jobs, you have to build Zeppelin and change the flink.version property in
> the zeppelin/flink/pom.xml file to the version you want to use.
>
> If you want to let Zeppelin submit jobs to a remote cluster, you should
> build Zeppelin with the version of your cluster. That’s because internally
> Zeppelin will use this version to construct a JobGraph which is then
> submitted to the cluster. In order to configure the remote cluster, you
> have to go the *Interpreter* page and scroll down to the *flink* section.
> There you have to specify the address of your cluster under *host* and
> the port under *port*. This should then be used to submit jobs to the
> Flink cluster.
>
> I hope this answers your question.
>
> Btw: If you want to use Zeppelin with the latest Flink 0.10-SNAPSHOT
> version, you should checkout my branch
> https://github.com/tillrohrmann/incubator-zeppelin/tree/flink-0.10-SNAPSHOT
> where I’ve made the necessary changes.
>
> Cheers,
> Till
> ​
>
> On Wed, Oct 21, 2015 at 5:00 PM, Trevor Grant <trevor.d.gr...@gmail.com>
> wrote:
>
>> I'm setting up some Flink/Spark/Zeppelin at work.  Spark+Zeppelin seems
>> to be relatively well supported and configurable, but the Flink side is
>> not so much.
>>
>> I want Zeppelin to run against my 0.10 build instead of the 0.6 build
>> that ships with Zeppelin.  My best guess at the moment on how to accomplish
>> this is to create a symbolic link from the /opt/zeppelin/flink folder to
>> /opt/flink-0.10, but this feels dirty and wrong.
>>
>> Does anyone out there have any experience connecting Zeppelin to a
>> non-prepackaged Flink build?
>>
>> I feel like there is a great opportunity for a HOWTO write-up if none
>> currently exists.
>>
>> I'm asking on the Zeppelin user mailing list too as soon as I am added.
>>
>> Thanks for any help
>>
>> tg
>>
>>
>> Trevor Grant
>> Data Scientist
>> https://github.com/rawkintrevo
>> http://stackexchange.com/users/3002022/rawkintrevo
>>
>> *"Fortunate is he, who is able to know the causes of things."  -Virgil*
>>
>>
>


Extracting weights from linear regression model

2015-10-07 Thread Trevor Grant
Sorry if this is a novice question, but I can't figure out how to extract
the weight vector from a multiple linear regression model.  I can
fit/predict, but I can't get the weight vector.

Any advice would be appreciated (even snide "go read the docs" comments, so
long as they point me to applicable docs, because I've been struggling with
this all day).
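
(For posterity: the answer that surfaces elsewhere in this digest is
mlr.weightsOption. A rough sketch, with trainingDS assumed to be a
DataSet[LabeledVector] you already have:)

import org.apache.flink.ml.regression.MultipleLinearRegression

val mlr = MultipleLinearRegression()
mlr.fit(trainingDS) // trainingDS: DataSet[LabeledVector], assumed defined

// After fit(), the learned weights live in an Option[DataSet[WeightVector]].
mlr.weightsOption match {
  case Some(weights) => weights.print() // WeightVector(weights, intercept)
  case None          => println("model has not been fit yet")
}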

Thanks!
tg


Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo

*"Fortunate is he, who is able to know the causes of things."  -Virgil*