Re: Job Manager HA manual setup

2016-02-26 Thread Welly Tambunan
(Correcting typos in my previous message.)


We have tried this; the job manager can fail over, but the task managers
CAN'T be relocated to the new job manager. Is there a setting for this, or
can the task managers also relocate to the new job manager?

Cheers

On Sat, Feb 27, 2016 at 7:27 AM, Welly Tambunan <if05...@gmail.com> wrote:

> Hi All,
>
> We have already tried to set up Job Manager HA based on the documentation,
> using the provided scripts and ZooKeeper. It works.
>
> However, currently everything is done via the start-cluster script, which
> I believe requires passwordless SSH between nodes. Our environment is
> restricted, so this is not possible.
>
> Is it possible to set up Job Manager HA manually, by starting a job
> manager and a task manager on each node? We already have our ZooKeeper and
> HDFS clusters.
>
> We have tried this; the job manager can fail over, but the task managers
> can't be relocated to the new job manager. Is there a setting for this, or
> can the task managers also relocate to the new job manager?
>
> Any more details on the mechanism used for Job Manager HA and its
> interaction with ZooKeeper?
>
> Are task managers also registered in ZooKeeper? How do they find the
> right job manager master?
>
>
> Thanks for your help.
>
> Cheers
> --
> Welly Tambunan
> Triplelands
>
> http://weltam.wordpress.com
> http://www.triplelands.com <http://www.triplelands.com/blog/>
>



-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>


Job Manager HA manual setup

2016-02-26 Thread Welly Tambunan
Hi All,

We have already tried to set up Job Manager HA based on the documentation,
using the provided scripts and ZooKeeper. It works.

However, currently everything is done via the start-cluster script, which I
believe requires passwordless SSH between nodes. Our environment is
restricted, so this is not possible.

Is it possible to set up Job Manager HA manually, by starting a job manager
and a task manager on each node? We already have our ZooKeeper and HDFS
clusters.

We have tried this; the job manager can fail over, but the task managers
can't be relocated to the new job manager. Is there a setting for this, or
can the task managers also relocate to the new job manager?

Any more details on the mechanism used for Job Manager HA and its
interaction with ZooKeeper?

Are task managers also registered in ZooKeeper? How do they find the right
job manager master?
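[Editor's note] For reference, a minimal sketch of what a manual HA setup looked like in the Flink 0.10/1.0 era: each job manager is started by hand with `bin/jobmanager.sh`, each task manager with `bin/taskmanager.sh`, and both discover the current leader through ZooKeeper rather than through a static address. The key names below are from that era's documentation and should be checked against your Flink version; the quorum address and paths are placeholders:

```yaml
# flink-conf.yaml -- identical on every node; assumes an external ZooKeeper
# quorum and an HDFS path reachable by all job managers.
recovery.mode: zookeeper
recovery.zookeeper.quorum: zk-host1:2181,zk-host2:2181,zk-host3:2181
recovery.zookeeper.path.root: /flink
# Shared storage for job metadata; the leading job manager's address is
# published in ZooKeeper, so task managers can find the new leader after
# a failover and re-register with it.
recovery.zookeeper.storageDir: hdfs:///flink/recovery
```

With this in place, task managers do not need passwordless SSH or a restart on failover: they look up the leading job manager in ZooKeeper and re-register with it.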


Thanks for your help.

Cheers
-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>


Re: Optimal Configuration for Cluster

2016-02-23 Thread Welly Tambunan
Hi Ufuk and Fabian,

Is it better to start 48 task managers (one slot each) on one machine than
a single task manager with 48 slots? Are there any trade-offs we should
know about?

Cheers

On Tue, Feb 23, 2016 at 3:03 PM, Welly Tambunan <if05...@gmail.com> wrote:

> Hi Ufuk,
>
> Thanks for the explanation.
>
> Yes. Our jobs are all streaming jobs.
>
> Cheers
>
> On Tue, Feb 23, 2016 at 2:48 PM, Ufuk Celebi <u...@apache.org> wrote:
>
>> The new default is equivalent to the previous "streaming mode". The
>> community decided to get rid of this distinction, because it was
>> confusing to users.
>>
>> The difference between "streaming mode" and "batch mode" was how
>> Flink's managed memory was allocated, either lazily when required
>> ('streaming mode") or eagerly on task manager start up ("batch mode").
>> Now it's lazy by default.
>>
>> This is not something you need to worry about, but if you are mostly
>> using the DataSet API where pre allocation has benefits, you can get
>> the "batch mode" behaviour by using the following configuration key:
>>
>> taskmanager.memory.preallocate: true
>>
>> But you are using the DataStream API anyways, right?
>>
>> – Ufuk
>>
>>
>> On Tue, Feb 23, 2016 at 6:36 AM, Welly Tambunan <if05...@gmail.com>
>> wrote:
>> > Hi Fabian,
>> >
>> > Previously, with Flink 0.9/0.10, we started the cluster in streaming
>> > mode or batch mode. I see that this distinction is gone in the Flink
>> > 1.0 snapshot. So is this now handled and optimized by the runtime?
>> >
>> > On Mon, Feb 22, 2016 at 5:26 PM, Fabian Hueske <fhue...@gmail.com>
>> wrote:
>> >>
>> >> Hi Welly,
>> >>
>> >> sorry for the late response.
>> >>
>> >> The number of network buffers primarily depends on the maximum
>> parallelism
>> >> of your job.
>> >> The given formula assumes a specific cluster configuration (1 task
>> manager
>> >> per machine, one parallel task per CPU).
>> >> The formula can be translated to:
>> >>
>> >> taskmanager.network.numberOfBuffers: p ^ 2 * t * 4
>> >>
>> >> where p is the maximum parallelism of the job and t is the number of
>> >> task managers.
>> >> You can process more than one parallel task per TM if you configure
>> more
>> >> than one processing slot per machine ( taskmanager.numberOfTaskSlots).
>> The
>> >> TM will divide its memory among all its slots. So it would be possible
>> to
>> >> start one TM for each machine with 100GB+ memory and 48 slots each.
>> >>
>> >> We can compute the number of network buffers if you give a few more
>> >> details about your setup:
>> >> - How many task managers do you start? I assume more than one TM per
>> >> machine given that you assign only 4GB of memory out of 128GB to each
>> TM.
>> >> - What is the maximum parallelism of your program?
>> >> - How many processing slots do you configure for each TM?
>> >>
>> >> In general, pipelined shuffles with a high parallelism require a lot of
>> >> memory.
>> >> If you configure batch instead of pipelined transfer, the memory
>> >> requirement goes down
>> >> (ExecutionConfig.setExecutionMode(ExecutionMode.BATCH)).
>> >>
>> >> Eventually, we want to merge the network buffer and the managed memory
>> >> pools. So the "taskmanager.network.numberOfBuffers" configuration will
>> >> hopefully disappear at some point in the future.
>> >>
>> >> Best, Fabian
>> >>
>> >> 2016-02-19 9:34 GMT+01:00 Welly Tambunan <if05...@gmail.com>:
>> >>>
>> >>> Hi All,
>> >>>
>> >>> We are trying to running our job in cluster that has this information
>> >>>
>> >>> 1. # of machine: 16
>> >>> 2. memory : 128 gb
>> >>> 3. # of core : 48
>> >>>
>> >>> However when we try to run we have an exception.
>> >>>
>> >>> "insufficient number of network buffers. 48 required but only 10
>> >>> available. the total number of network buffers is currently set to
>> 2048"
>> >>>
>> >>> After looking at the documentation we set configuration bas

Re: Optimal Configuration for Cluster

2016-02-22 Thread Welly Tambunan
Hi Fabian,

Thanks a lot for your response.

- How many task managers do you start? I assume more than one TM per
machine given that you assign only 4GB of memory out of 128GB to each TM.

Currently we start 1 TM per machine with 48 task slots.

- What is the maximum parallelism of your program?

Parallelism is around 30 to 40.

- How many processing slots do you configure for each TM?
We configure 48 (one per core) for each TM, one TM per machine.

But I would like to ask another question: is it better to start 48 task
managers on one machine, each with one task slot? Are there any trade-offs
we should know about?




On Mon, Feb 22, 2016 at 5:26 PM, Fabian Hueske <fhue...@gmail.com> wrote:

> Hi Welly,
>
> sorry for the late response.
>
> The number of network buffers primarily depends on the maximum parallelism
> of your job.
> The given formula assumes a specific cluster configuration (1 task manager
> per machine, one parallel task per CPU).
> The formula can be translated to:
>
> taskmanager.network.numberOfBuffers: p ^ 2 * t * 4
>
> where p is the maximum parallelism of the job and t is the number of task
> managers.
> You can process more than one parallel task per TM if you configure more
> than one processing slot per machine ( taskmanager.numberOfTaskSlots).
> The TM will divide its memory among all its slots. So it would be possible
> to start one TM for each machine with 100GB+ memory and 48 slots each.
>
> We can compute the number of network buffers if you give a few more
> details about your setup:
> - How many task managers do you start? I assume more than one TM per
> machine given that you assign only 4GB of memory out of 128GB to each TM.
> - What is the maximum parallelism of your program?
> - How many processing slots do you configure for each TM?
>
> In general, pipelined shuffles with a high parallelism require a lot of
> memory.
> If you configure batch instead of pipelined transfer, the memory
> requirement goes down
> (ExecutionConfig.setExecutionMode(ExecutionMode.BATCH)).
>
> Eventually, we want to merge the network buffer and the managed memory
> pools. So the "taskmanager.network.numberOfBuffers" configuration will
> hopefully disappear at some point in the future.
>
> Best, Fabian
>
> 2016-02-19 9:34 GMT+01:00 Welly Tambunan <if05...@gmail.com>:
>
>> Hi All,
>>
>> We are trying to run our job in a cluster with the following specs:
>>
>> 1. # of machine: 16
>> 2. memory : 128 gb
>> 3. # of core : 48
>>
>> However when we try to run we have an exception.
>>
>> "insufficient number of network buffers. 48 required but only 10
>> available. the total number of network buffers is currently set to 2048"
>>
>> After looking at the documentation we set configuration based on docs
>>
>> taskmanager.network.numberOfBuffers: # core ^ 2 * # machine * 4
>>
>> However we face another error from JVM
>>
>> java.io.IOException: Cannot allocate network buffer pool: Could not
>> allocate enough memory segments for NetworkBufferPool (required (Mb): 2304,
>> allocated (Mb): 698, missing (Mb): 1606). Cause: Java heap space
>>
>> We fiddle the taskmanager.heap.mb: 4096
>>
>> Finally the cluster is running.
>>
>> However, I'm still not sure the configuration and the task manager heap
>> tweak are really well tuned. So my questions are:
>>
>>
>>    1. Am I doing the numberOfBuffers calculation right?
>>    2. How much should we allocate to taskmanager.heap.mb given this
>>    information?
>>    3. Any suggestions on which configuration keys to set to make the
>>    cluster optimal?
>>    4. Is there any chance this will be automatically resolved by the
>>    memory/network buffer manager?
>>
>> Thanks a lot for the help
>>
>> Cheers
>>
>> --
>> Welly Tambunan
>> Triplelands
>>
>> http://weltam.wordpress.com
>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>
>
>


-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>


Optimal Configuration for Cluster

2016-02-19 Thread Welly Tambunan
Hi All,

We are trying to run our job in a cluster with the following specs:

1. # of machines: 16
2. memory: 128 GB
3. # of cores: 48

However, when we try to run it we get this exception:

"insufficient number of network buffers. 48 required but only 10 available.
the total number of network buffers is currently set to 2048"

After looking at the documentation we set configuration based on docs

taskmanager.network.numberOfBuffers: # core ^ 2 * # machine * 4

However, we then face another error from the JVM:

java.io.IOException: Cannot allocate network buffer pool: Could not
allocate enough memory segments for NetworkBufferPool (required (Mb): 2304,
allocated (Mb): 698, missing (Mb): 1606). Cause: Java heap space
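[Editor's note] As a sanity check, the documented formula can be evaluated directly. The sketch below uses the default 32 KB network-buffer (segment) size from the 0.10.x docs and assumes one task manager per machine with one slot per core; treat the numbers as an upper-bound illustration of the formula, not a reproduction of the exact cluster above:

```python
# Network-buffer estimate per the docs formula:
#   numberOfBuffers = cores^2 * machines * 4
# Each buffer is one 32 KB memory segment by default, so the configured
# count translates directly into memory the NetworkBufferPool must
# allocate from the task manager heap on start-up.

cores_per_machine = 48
machines = 16
buffer_size_kb = 32  # default segment size (assumption from 0.10.x docs)

num_buffers = cores_per_machine ** 2 * machines * 4
buffer_memory_mb = num_buffers * buffer_size_kb // 1024

print(f"numberOfBuffers = {num_buffers}")          # 147456
print(f"buffer memory   = {buffer_memory_mb} MB")  # 4608 MB
```

In the log above, the pool needed 2304 MB of segments but only 698 MB of heap was free, so raising taskmanager.heap.mb (or lowering the buffer count to match the job's actual maximum parallelism in the p² · t · 4 formula) fixes the mismatch.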

We then set taskmanager.heap.mb: 4096.

Finally the cluster is running.

However, I'm still not sure the configuration and the task manager heap
tweak are really well tuned. So my questions are:


   1. Am I doing the numberOfBuffers calculation right?
   2. How much should we allocate to taskmanager.heap.mb given this
   information?
   3. Any suggestions on which configuration keys to set to make the
   cluster optimal?
   4. Is there any chance this will be automatically resolved by the
   memory/network buffer manager?

Thanks a lot for the help

Cheers

-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>


Re: Flink v0.10.2

2016-01-13 Thread Welly Tambunan
Hi Robert,

We are on a deadline right now for a demo to management before production,
so it would be great to have a stable 0.10.2 within this week, if possible.

Cheers

On Wed, Jan 13, 2016 at 4:13 PM, Robert Metzger <rmetz...@apache.org> wrote:

> Hi,
>
> there are currently no planned releases. I would actually like to start
> preparing for the 1.0 release soon, but the community needs to discuss that
> first.
>
> How urgently do you need a 0.10.2 release? If this is the last blocker for
> using Flink in production at your company, I can push for the bugfix
> release.
>
>
> On Wed, Jan 13, 2016 at 8:39 AM, Welly Tambunan <if05...@gmail.com> wrote:
>
>> Hi All,
>>
>> We are currently using the snapshot version for development, as we hit
>> the Data Stream union error. For deployment we may need to build Flink
>> from master.
>>
>>
>> I want to ask when this version will be released. Is there a roadmap or
>> plan I can look at for this release?
>>
>>
>> Thanks a lot
>>
>> Cheers
>> --
>> Welly Tambunan
>> Triplelands
>>
>> http://weltam.wordpress.com
>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>
>
>


-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>


Re: Security in Flink

2016-01-12 Thread Welly Tambunan
Hi Stephan,

Thanks a lot for the explanation.

Is there a timeline for when this will be released? I guess this will be
important for our case if we want to deploy Flink in production.

Cheers

On Tue, Jan 12, 2016 at 6:19 PM, Stephan Ewen <se...@apache.org> wrote:

> Hi Sourav!
>
> If you want to use Flink in a cluster where neither Hadoop/YARN (nor,
> soon, Mesos) is available, then I assume you have installed Flink in
> standalone mode on the cluster already.
>
> There is currently no support in Flink for managing user authentication.
> A few thoughts on how that may evolve:
>
> 1) It should be not too hard to add authentication to the web dashboard.
> That way, if the cluster is otherwise blocked off (the master's RPC ports
> are firewalled), one would have restricted job starts.
>
> 2) We plan to add authenticated / encrypted connections soon. With that,
> the client that submits the program would need to have access to the
> keystore or key and the corresponding password to connect.
>
> Greetings,
> Stephan
>
>
>
> On Mon, Jan 11, 2016 at 3:46 PM, Sourav Mazumder <
> sourav.mazumde...@gmail.com> wrote:
>
>> Thanks Steven for your details response. Things are more clear to me now.
>>
>> A follow up Qs -
>> Looks like most of the security support depends on Hadoop? What happens
>> if anyone wants to use Flink without Hadoop (in a cluster where Hadoop
>> is not there)?
>>
>> Regards,
>> Sourav
>>
>> On Sun, Jan 10, 2016 at 12:41 PM, Stephan Ewen <se...@apache.org> wrote:
>>
>>> Hi Sourav!
>>>
>>> There is user-authentication support in Flink via the Hadoop / Kerberos
>>> infrastructure. If you run Flink on YARN, it should seamlessly work that
>>> Flink acquires the Kerberos tokens of the user that submits programs, and
>>> authenticate itself at YARN, HDFS, and HBase with that.
>>>
>>> If you run Flink standalone, Flink can still authenticate at HDFS/HBase
>>> via Kerberos, with a bit of manual help by the user (running kinit on the
>>> workers).
>>>
>>> With Kafka 0.9 and Flink's upcoming connector (
>>> https://github.com/apache/flink/pull/1489), streaming programs can
>>> authenticate themselves at the stream brokers via SSL (and read via encrypted
>>> connections).
>>>
>>>
>>> What we have on the roadmap for the coming months is the following:
>>>   - Encrypt in-flight data streams that are exchanged between worker
>>> nodes (TaskManagers).
>>>   - Encrypt the coordination messages between client/master/workers.
>>> Note that these refer to encryption between Flink's own components only,
>>> which would use transient keys generated just for a specific job or session
>>> (hence would not need any user involvement).
>>>
>>>
>>> Let us know if that answers your questions, and if that meets your
>>> requirements.
>>>
>>> Greetings,
>>> Stephan
>>>
>>>
>>> On Fri, Jan 8, 2016 at 3:23 PM, Sourav Mazumder <
>>> sourav.mazumde...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Can anyone point me to any documentation on support for security in
>>>> Flink?
>>>>
>>>> The type of information I'm looking for are -
>>>>
>>>> 1. How do I do user-level authentication to ensure that a job is
>>>> submitted/deleted/modified by the right user? Is it possible through
>>>> the web client?
>>>> 2. Authentication across multiple slave nodes (where the task managers
>>>> are running) and driver program so that they can communicate with each 
>>>> other
>>>> 3. Support for SSL/encryption for data exchanged happening across the
>>>> slave nodes
>>>> 4. Support for pluggable authentication with existing solution like LDAP
>>>>
>>>> If not there today is there a roadmap for these security features ?
>>>>
>>>> Regards,
>>>> Sourav
>>>>
>>>
>>>
>>
>


-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>


Re: Security in Flink

2016-01-11 Thread Welly Tambunan
Hi Stephan,

Do you have a plan for which encryption method and mechanism will be used
in Flink? Could you share the details on this?

We have a very strict requirement from a client that every communication
must be encrypted, so any detail would be really appreciated for answering
their security concerns.


Cheers

On Mon, Jan 11, 2016 at 9:46 PM, Sourav Mazumder <
sourav.mazumde...@gmail.com> wrote:

> Thanks Steven for your details response. Things are more clear to me now.
>
> A follow up Qs -
> Looks like most of the security support depends on Hadoop? What happens
> if anyone wants to use Flink without Hadoop (in a cluster where Hadoop is
> not there)?
>
> Regards,
> Sourav
>
> On Sun, Jan 10, 2016 at 12:41 PM, Stephan Ewen <se...@apache.org> wrote:
>
>> Hi Sourav!
>>
>> There is user-authentication support in Flink via the Hadoop / Kerberos
>> infrastructure. If you run Flink on YARN, it should seamlessly work that
>> Flink acquires the Kerberos tokens of the user that submits programs, and
>> authenticate itself at YARN, HDFS, and HBase with that.
>>
>> If you run Flink standalone, Flink can still authenticate at HDFS/HBase
>> via Kerberos, with a bit of manual help by the user (running kinit on the
>> workers).
>>
>> With Kafka 0.9 and Flink's upcoming connector (
>> https://github.com/apache/flink/pull/1489), streaming programs can
>> authenticate themselves at the stream brokers via SSL (and read via encrypted
>> connections).
>>
>>
>> What we have on the roadmap for the coming months is the following:
>>   - Encrypt in-flight data streams that are exchanged between worker
>> nodes (TaskManagers).
>>   - Encrypt the coordination messages between client/master/workers.
>> Note that these refer to encryption between Flink's own components only,
>> which would use transient keys generated just for a specific job or session
>> (hence would not need any user involvement).
>>
>>
>> Let us know if that answers your questions, and if that meets your
>> requirements.
>>
>> Greetings,
>> Stephan
>>
>>
>> On Fri, Jan 8, 2016 at 3:23 PM, Sourav Mazumder <
>> sourav.mazumde...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Can anyone point me to any documentation on support for security in
>>> Flink?
>>>
>>> The type of information I'm looking for are -
>>>
>>> 1. How do I do user-level authentication to ensure that a job is
>>> submitted/deleted/modified by the right user? Is it possible through the
>>> web client?
>>> 2. Authentication across multiple slave nodes (where the task managers
>>> are running) and driver program so that they can communicate with each other
>>> 3. Support for SSL/encryption for data exchanged happening across the
>>> slave nodes
>>> 4. Support for pluggable authentication with existing solution like LDAP
>>>
>>> If not there today is there a roadmap for these security features ?
>>>
>>> Regards,
>>> Sourav
>>>
>>
>>
>


-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>


Re: Material on Apache flink internals

2015-12-04 Thread Welly Tambunan
Hi Madhu,

You can also check this page for the details on internals

https://cwiki.apache.org/confluence/display/FLINK/Flink+Internals
http://www.slideshare.net/KostasTzoumas/flink-internals

Cheers

On Fri, Dec 4, 2015 at 10:14 AM, madhu phatak <phatak@gmail.com> wrote:

> Hi,
> Thanks a lot for the resources.
> On Dec 1, 2015 9:11 PM, "Fabian Hueske" <fhue...@gmail.com> wrote:
>
>> Hi Madhu,
>>
>> checkout the following resources:
>>
>> - Apache Flink Blog: http://flink.apache.org/blog/index.html
>> - Data Artisans Blog: http://data-artisans.com/blog/
>> - Flink Forward Conference website (Talk slides & recordings):
>> http://flink-forward.org/?post_type=session
>> - Flink Meetup talk recordings:
>> https://www.youtube.com/channel/UCY8_lgiZLZErZPF47a2hXMA
>> - Slim's Flink Knowledge base:
>> http://sparkbigdata.com/component/tags/tag/27-flink
>>
>> Best, Fabian
>>
>> 2015-12-01 16:23 GMT+01:00 madhu phatak <phatak@gmail.com>:
>>
>>> Hi everyone,
>>>
>>> I am fascinated by the way the Flink core engine streams between
>>> operators, rather than the typical map/reduce approach followed by
>>> Hadoop or Spark. Is there any good documentation/blog/video available
>>> that talks about these internals? I am fine with either a batch or a
>>> streaming point of view.
>>>
>>> It will be great if some one can share this info. Thank you for your
>>> excellent work.
>>>
>>> --
>>> Regards,
>>> Madhukara Phatak
>>> http://datamantra.io/
>>>
>>
>>


-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>


Re: Running WebClient from Windows

2015-12-02 Thread Welly Tambunan
Hi Fabian,

I have already created JIRA for this one.
https://issues.apache.org/jira/browse/FLINK-3099

Thanks a lot for this.

Cheers

On Wed, Dec 2, 2015 at 6:02 PM, Fabian Hueske <fhue...@gmail.com> wrote:

> Hi Welly,
>
> at the moment we only provide native Windows .bat scripts for start-local
> and the CLI client.
> However, we check that the Unix scripts (including start-webclient.sh)
> work in a Windows Cygwin environment.
> I have to admit, I am not familiar with MinGW, so not sure what is
> happening there.
>
> It would be nice to have a Windows start script for the webclient though.
> Would you mind and open a JIRA for that?
>
> Thanks,
> Fabian
>
> 2015-12-02 3:00 GMT+01:00 Welly Tambunan <if05...@gmail.com>:
>
>> Hi All,
>>
>> Is there any way to run the WebClient for uploading jobs from Windows?
>>
>> I tried to run it from MinGW but got these errors:
>>
>>
>> $ bin/start-webclient.sh
>> /c/flink-0.10.0/bin/config.sh: line 261: conditional binary operator
>> expected
>> /c/flink-0.10.0/bin/config.sh: line 261: syntax error near `=~'
>> /c/flink-0.10.0/bin/config.sh: line 261: `if [[ "$SLAVE" =~
>> ^.*/([0-9a-zA-Z.
>> -]+)$ ]]; then'
>> /c/flink-0.10.0/bin/config.sh: line 261: conditional binary operator
>> expected
>> /c/flink-0.10.0/bin/config.sh: line 261: syntax error near `=~'
>> /c/flink-0.10.0/bin/config.sh: line 261: `if [[ "$SLAVE" =~
>> ^.*/([0-9a-zA-Z.
>> -]+)$ ]]; then'
>> Starting Flink webclient
>>
>> [Terminate]
>>
>> Cheers
>>
>> --
>> Welly Tambunan
>> Triplelands
>>
>> http://weltam.wordpress.com
>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>
>
>


-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>


Re: Data Stream union error after upgrading from 0.9 to 0.10.1

2015-12-01 Thread Welly Tambunan
Hi Aljoscha,

Is this fix already available in 0.10-SNAPSHOT?


Cheers

On Tue, Dec 1, 2015 at 6:04 PM, Welly Tambunan <if05...@gmail.com> wrote:

> Thanks a lot Aljoscha.
>
> When will it be released?
>
> Cheers
>
> On Tue, Dec 1, 2015 at 5:48 PM, Aljoscha Krettek <aljos...@apache.org>
> wrote:
>
>> Hi,
>> I relaxed the restrictions on union. This should make it into an upcoming
>> 0.10.2 bugfix release.
>>
>> Cheers,
>> Aljoscha
>> > On 01 Dec 2015, at 11:23, Welly Tambunan <if05...@gmail.com> wrote:
>> >
>> > Hi All,
>> >
>> > After upgrading our system to the latest version from 0.9 to 0.10.1 we
>> have this following error.
>> >
>> > Exception in thread "main" java.lang.UnsupportedOperationException: A
>> DataStream cannot be unioned with itself
>> >
>> > Then i find the relevant JIRA for this one.
>> > https://issues.apache.org/jira/browse/FLINK-3080
>> >
>> > Is there any plan which release this will be ?
>> >
>> >
>> > Another issue i have after upgrading is can't union with different
>> level of parallelism.
>> >
>> > I think we will need to fall back to 0.9 again for the time being.
>> >
>> > Cheers
>> >
>> > --
>> > Welly Tambunan
>> > Triplelands
>> >
>> > http://weltam.wordpress.com
>> > http://www.triplelands.com
>>
>>
>
>
> --
> Welly Tambunan
> Triplelands
>
> http://weltam.wordpress.com
> http://www.triplelands.com <http://www.triplelands.com/blog/>
>



-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>


Re: Data Stream union error after upgrading from 0.9 to 0.10.1

2015-12-01 Thread Welly Tambunan
Ok Robert,

Thanks a lot.

Looking forward to it.


Cheers

On Wed, Dec 2, 2015 at 5:50 AM, Robert Metzger <rmetz...@apache.org> wrote:

> No, it's not yet merged into the Flink source repo.
>
> You can find the code here: https://github.com/apache/flink/pull/1425
> You can also check out the code of the PR or download the PR contents as a
> patch and apply it to the Flink source.
>
> I think the change will be merged tomorrow and then you'll have it in
> 0.10-SNAPSHOT.
>
>
> For the 0.10.2 release: There are no concrete plans yet, but I think it'll
> happen in the next 2-3 weeks.
>
>
> On Tue, Dec 1, 2015 at 11:48 PM, Welly Tambunan <if05...@gmail.com> wrote:
>
>> Hi Aljoscha,
>>
>> Is this fix already available in 0.10-SNAPSHOT?
>>
>>
>> Cheers
>>
>> On Tue, Dec 1, 2015 at 6:04 PM, Welly Tambunan <if05...@gmail.com> wrote:
>>
>>> Thanks a lot Aljoscha.
>>>
>>> When will it be released?
>>>
>>> Cheers
>>>
>>> On Tue, Dec 1, 2015 at 5:48 PM, Aljoscha Krettek <aljos...@apache.org>
>>> wrote:
>>>
>>>> Hi,
>>>> I relaxed the restrictions on union. This should make it into an
>>>> upcoming 0.10.2 bugfix release.
>>>>
>>>> Cheers,
>>>> Aljoscha
>>>> > On 01 Dec 2015, at 11:23, Welly Tambunan <if05...@gmail.com> wrote:
>>>> >
>>>> > Hi All,
>>>> >
>>>> > After upgrading our system to the latest version from 0.9 to 0.10.1
>>>> we have this following error.
>>>> >
>>>> > Exception in thread "main" java.lang.UnsupportedOperationException: A
>>>> DataStream cannot be unioned with itself
>>>> >
>>>> > Then i find the relevant JIRA for this one.
>>>> > https://issues.apache.org/jira/browse/FLINK-3080
>>>> >
>>>> > Is there any plan which release this will be ?
>>>> >
>>>> >
>>>> > Another issue i have after upgrading is can't union with different
>>>> level of parallelism.
>>>> >
>>>> > I think we will need to fall back to 0.9 again for the time being.
>>>> >
>>>> > Cheers
>>>> >
>>>> > --
>>>> > Welly Tambunan
>>>> > Triplelands
>>>> >
>>>> > http://weltam.wordpress.com
>>>> > http://www.triplelands.com
>>>>
>>>>
>>>
>>>
>>> --
>>> Welly Tambunan
>>> Triplelands
>>>
>>> http://weltam.wordpress.com
>>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>>
>>
>>
>>
>> --
>> Welly Tambunan
>> Triplelands
>>
>> http://weltam.wordpress.com
>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>
>
>


-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>


Running WebClient from Windows

2015-12-01 Thread Welly Tambunan
Hi All,

Is there any way to run the WebClient for uploading jobs from Windows?

I tried to run it from MinGW but got these errors:


$ bin/start-webclient.sh
/c/flink-0.10.0/bin/config.sh: line 261: conditional binary operator
expected
/c/flink-0.10.0/bin/config.sh: line 261: syntax error near `=~'
/c/flink-0.10.0/bin/config.sh: line 261: `if [[ "$SLAVE" =~
^.*/([0-9a-zA-Z.
-]+)$ ]]; then'
/c/flink-0.10.0/bin/config.sh: line 261: conditional binary operator
expected
/c/flink-0.10.0/bin/config.sh: line 261: syntax error near `=~'
/c/flink-0.10.0/bin/config.sh: line 261: `if [[ "$SLAVE" =~
^.*/([0-9a-zA-Z.
-]+)$ ]]; then'
Starting Flink webclient

[Terminate]

Cheers

-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>


Data Stream union error after upgrading from 0.9 to 0.10.1

2015-12-01 Thread Welly Tambunan
Hi All,

After upgrading our system from 0.9 to the latest version, 0.10.1, we get
the following error:

Exception in thread "main" java.lang.UnsupportedOperationException: A
DataStream cannot be unioned with itself

I found the relevant JIRA for this:
https://issues.apache.org/jira/browse/FLINK-3080

Is there a plan for which release this fix will land in?


Another issue I have after upgrading is that streams with different levels
of parallelism can't be unioned.

I think we will need to fall back to 0.9 again for the time being.

Cheers

-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>


Re: Data Stream union error after upgrading from 0.9 to 0.10.1

2015-12-01 Thread Welly Tambunan
Thanks a lot Aljoscha.

When will it be released?

Cheers

On Tue, Dec 1, 2015 at 5:48 PM, Aljoscha Krettek <aljos...@apache.org>
wrote:

> Hi,
> I relaxed the restrictions on union. This should make it into an upcoming
> 0.10.2 bugfix release.
>
> Cheers,
> Aljoscha
> > On 01 Dec 2015, at 11:23, Welly Tambunan <if05...@gmail.com> wrote:
> >
> > Hi All,
> >
> > After upgrading our system to the latest version from 0.9 to 0.10.1 we
> have this following error.
> >
> > Exception in thread "main" java.lang.UnsupportedOperationException: A
> DataStream cannot be unioned with itself
> >
> > Then i find the relevant JIRA for this one.
> > https://issues.apache.org/jira/browse/FLINK-3080
> >
> > Is there any plan which release this will be ?
> >
> >
> > Another issue i have after upgrading is can't union with different level
> of parallelism.
> >
> > I think we will need to fall back to 0.9 again for the time being.
> >
> > Cheers
> >
> > --
> > Welly Tambunan
> > Triplelands
> >
> > http://weltam.wordpress.com
> > http://www.triplelands.com
>
>


-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>


Re: Standalone Cluster vs YARN

2015-11-25 Thread Welly Tambunan
Hi Ufuk

>In failure cases I find YARN more convenient, because it takes care of
restarting failed task manager processes/containers for you.

So does this mean we don't need ZooKeeper?


Cheers

On Wed, Nov 25, 2015 at 3:46 PM, Ufuk Celebi <u...@apache.org> wrote:

> > On 25 Nov 2015, at 02:35, Welly Tambunan <if05...@gmail.com> wrote:
> >
> > Hi All,
> >
> > I would like to know if there any feature differences between using
> Standalone Cluster vs YARN ?
> >
> > Until now we are using Standalone cluster for our jobs.
> > Is there any added value for using YARN ?
> >
> > We don't have any hadoop infrastructure in place right now but we can
> provide that if there's some value to that.
>
> There are no features, which only work on YARN or in standalone clusters.
> YARN mode is essentially starting a standalone cluster in YARN containers.
>
> In failure cases I find YARN more convenient, because it takes care of
> restarting failed task manager processes/containers for you.
>
> – Ufuk
>
>


-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>


Re: Standalone Cluster vs YARN

2015-11-25 Thread Welly Tambunan
Hi Fabian,

Interesting !

However, YARN is still tightly coupled to HDFS. Doesn't it seem wasteful to use
YARN without the rest of Hadoop?

Currently we are using Cassandra and CFS (the Cassandra File System).


Cheers

On Wed, Nov 25, 2015 at 3:51 PM, Fabian Hueske <fhue...@gmail.com> wrote:

> A strong argument for YARN mode can be the isolation of multiple users and
> jobs. You can easily start a new Flink cluster for each job or user.
> However, this comes at the price of resource (memory) fragmentation. YARN
> mode does not use memory as effective as cluster mode.
>
> 2015-11-25 9:46 GMT+01:00 Ufuk Celebi <u...@apache.org>:
>
>> > On 25 Nov 2015, at 02:35, Welly Tambunan <if05...@gmail.com> wrote:
>> >
>> > Hi All,
>> >
>> > I would like to know if there any feature differences between using
>> Standalone Cluster vs YARN ?
>> >
>> > Until now we are using Standalone cluster for our jobs.
>> > Is there any added value for using YARN ?
>> >
>> > We don't have any hadoop infrastructure in place right now but we can
>> provide that if there's some value to that.
>>
>> There are no features, which only work on YARN or in standalone clusters.
>> YARN mode is essentially starting a standalone cluster in YARN containers.
>>
>> In failure cases I find YARN more convenient, because it takes care of
>> restarting failed task manager processes/containers for you.
>>
>> – Ufuk
>>
>>
>


-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>


Re: Standalone Cluster vs YARN

2015-11-25 Thread Welly Tambunan
Hi Andreas,

Yes, it seems I can't avoid ZooKeeper right now. It would be really nice if we
could achieve HA via a gossip protocol, the way Cassandra/Spark DSE does.

Is this possible?


Cheers

On Wed, Nov 25, 2015 at 4:12 PM, Andreas Fritzler <
andreas.fritz...@gmail.com> wrote:

> Hi Welly,
>
> you will need Zookeeper if you want to setup the standalone cluster in HA
> mode.
> http://spark.apache.org/docs/latest/spark-standalone.html#high-availability
>
> In the YARN case you probably have already Zookeeper in place if you are
> running YARN in HA mode.
>
> Regards,
> Andreas
>
> On Wed, Nov 25, 2015 at 10:02 AM, Welly Tambunan <if05...@gmail.com>
> wrote:
>
>> Hi Ufuk
>>
>> >In failure cases I find YARN more convenient, because it takes care of
>> restarting failed task manager processes/containers for you.
>>
>> So this mean that we don't need zookeeper ?
>>
>>
>> Cheers
>>
>> On Wed, Nov 25, 2015 at 3:46 PM, Ufuk Celebi <u...@apache.org> wrote:
>>
>>> > On 25 Nov 2015, at 02:35, Welly Tambunan <if05...@gmail.com> wrote:
>>> >
>>> > Hi All,
>>> >
>>> > I would like to know if there any feature differences between using
>>> Standalone Cluster vs YARN ?
>>> >
>>> > Until now we are using Standalone cluster for our jobs.
>>> > Is there any added value for using YARN ?
>>> >
>>> > We don't have any hadoop infrastructure in place right now but we can
>>> provide that if there's some value to that.
>>>
>>> There are no features, which only work on YARN or in standalone
>>> clusters. YARN mode is essentially starting a standalone cluster in YARN
>>> containers.
>>>
>>> In failure cases I find YARN more convenient, because it takes care of
>>> restarting failed task manager processes/containers for you.
>>>
>>> – Ufuk
>>>
>>>
>>
>>
>> --
>> Welly Tambunan
>> Triplelands
>>
>> http://weltam.wordpress.com
>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>
>
>


-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>


Standalone Cluster vs YARN

2015-11-24 Thread Welly Tambunan
Hi All,

I would like to know whether there are any feature differences between using a
standalone cluster vs YARN.

Until now we have been using a standalone cluster for our jobs.
Is there any added value in using YARN?

We don't have any Hadoop infrastructure in place right now, but we can
provide it if there's some value in that.


Cheers

-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>


Re: Cancel Streaming Job

2015-11-24 Thread Welly Tambunan
Hi Gyula and Ufuk,

Thanks, I will give it a try.

Cheers

On Tue, Nov 24, 2015 at 3:42 PM, Ufuk Celebi <u...@apache.org> wrote:

> You can use the current release candidate if you like to try it out:
>
> Binaries are here:
>
> http://people.apache.org/~rmetzger/flink-0.10.1-rc1/
>
> The dependency with version 0.10.1 is found in the staging repositories:
>
> https://repository.apache.org/content/repositories/orgapacheflink-1058
>
> If you can wait a few more days, the official release vote will be over in
> <= 72 hours if no issues are found.
>
> – Ufuk
>
> > On 24 Nov 2015, at 08:26, Gyula Fóra <gyula.f...@gmail.com> wrote:
> >
> > Hi!
> >
> > This issue has been fixed very recently and the fix will go into the
> upcoming bugfix release. (0.10.1)
> >
> > Should be out in the next few days :)
> >
> > Cheers
> > Gyula
> > On Tue, Nov 24, 2015 at 4:49 AM Welly Tambunan <if05...@gmail.com>
> wrote:
> > Hi All,
> >
> > Finally i've found the solution for killing the job manager.
> >
> >
> https://flink.apache.org/faq.html#i-cant-stop-flink-with-the-provided-stop-scripts-what-can-i-do
> >
> >
> > But i do really hope that we have that cancel button for restarting job.
> >
> >
> > Cheers
> >
> > On Tue, Nov 24, 2015 at 8:30 AM, Welly Tambunan <if05...@gmail.com>
> wrote:
> > Hi All,
> >
> > Is there any way to stop/cancel the job that's restarting ?
> >
> > I have already stop the cluster and start it again but seems it's still
> restarting in dashboard.
> > I also try to cancel the job via CLI by running bin/flink cancel
>  but it's not working.
> >
> >
> >
> > Cheers
> >
> >
> > --
> > Welly Tambunan
> > Triplelands
> >
> > http://weltam.wordpress.com
> > http://www.triplelands.com
> >
> >
> >
> > --
> > Welly Tambunan
> > Triplelands
> >
> > http://weltam.wordpress.com
> > http://www.triplelands.com
>
>


-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>


Re: Adding TaskManager on Cluster

2015-11-24 Thread Welly Tambunan
Hi Till,

I've just tried that. It works like a charm. Thanks a lot.

Is there any documentation on taskmanager.sh, the other scripts, and their
parameters? I tried to look in the docs but couldn't find it.

Thanks again



Cheers
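
For reference, the steps Till describes can be run on the new node roughly like this (a sketch; paths assume a standard Flink 0.10 standalone install, and FLINK_HOME is an illustrative variable name, not something from the thread):

```shell
# On the machine being added to the running cluster. FLINK_HOME is assumed
# to point at the same Flink distribution used by the rest of the cluster,
# with conf/flink-conf.yaml pointing at the existing JobManager.
cd "$FLINK_HOME"

# Start an extra TaskManager for a streaming cluster ...
bin/taskmanager.sh start streaming

# ... or for a batch cluster:
# bin/taskmanager.sh start batch

# The new TaskManager registers with the JobManager; the rest of the
# cluster keeps running. To remove it again later:
bin/taskmanager.sh stop
```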

On Tue, Nov 24, 2015 at 4:29 PM, Till Rohrmann <till.rohrm...@gmail.com>
wrote:

> Hi Welly,
>
> you can always start a new TaskManager by simply calling taskmanager.sh
> start [streaming|batch], depending whether you are running a streaming
> cluster or a batch cluster. You can find the script in /bin.
>
> Cheers,
> Till
> ​
>
> On Tue, Nov 24, 2015 at 10:27 AM, Welly Tambunan <if05...@gmail.com>
> wrote:
>
>> What i'm looking for here is the ability to add a node to the cluster
>> (scale out) when there's no task slot left for use.
>>
>>
>> On Tue, Nov 24, 2015 at 4:24 PM, Welly Tambunan <if05...@gmail.com>
>> wrote:
>>
>>> Hi All,
>>>
>>> Currently we are running flink using standalone mode.
>>>
>>> Is there any way to add one node ( task manager ) to the cluster without
>>> bringing the cluster down ?
>>>
>>>
>>> Cheers
>>>
>>> --
>>> Welly Tambunan
>>> Triplelands
>>>
>>> http://weltam.wordpress.com
>>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>>
>>
>>
>>
>> --
>> Welly Tambunan
>> Triplelands
>>
>> http://weltam.wordpress.com
>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>
>
>


-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>


Adding TaskManager on Cluster

2015-11-24 Thread Welly Tambunan
Hi All,

Currently we are running Flink in standalone mode.

Is there any way to add a node (task manager) to the cluster without
bringing the cluster down?


Cheers

-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>


Cancel Streaming Job

2015-11-23 Thread Welly Tambunan
Hi All,

Is there any way to stop/cancel a job that's restarting?

I have already stopped the cluster and started it again, but the job still
shows as restarting in the dashboard.
I also tried to cancel the job via the CLI by running bin/flink cancel 
but it's not working.



Cheers
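
For reference, cancelling through the CLI looks roughly like this (Flink 0.10; the job ID placeholder is illustrative, the thread itself omits it):

```shell
# List running and scheduled jobs to find the job ID.
bin/flink list

# Cancel a job by the ID printed above (placeholder shown).
bin/flink cancel <jobID>

# If the JobManager JVM itself will not shut down via the stop scripts,
# the FAQ linked later in this thread suggests killing the process directly
# as a last resort (kill, or kill -9 if it does not react).
```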


-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>


Re: Cancel Streaming Job

2015-11-23 Thread Welly Tambunan
Hi All,

Finally I've found the solution for killing the job manager:

https://flink.apache.org/faq.html#i-cant-stop-flink-with-the-provided-stop-scripts-what-can-i-do


But I do really hope we get a cancel button for restarting jobs.


Cheers

On Tue, Nov 24, 2015 at 8:30 AM, Welly Tambunan <if05...@gmail.com> wrote:

> Hi All,
>
> Is there any way to stop/cancel the job that's restarting ?
>
> I have already stop the cluster and start it again but seems it's still
> restarting in dashboard.
> I also try to cancel the job via CLI by running bin/flink cancel 
> but it's not working.
>
>
>
> Cheers
>
>
> --
> Welly Tambunan
> Triplelands
>
> http://weltam.wordpress.com
> http://www.triplelands.com <http://www.triplelands.com/blog/>
>



-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>


Re: Specially introduced Flink to chinese users in CNCC(China National Computer Congress)

2015-11-18 Thread Welly Tambunan
Agreed.

And stateful streaming operator instances in Flink look natural compared
to Apache Spark.

On Thu, Nov 19, 2015 at 10:54 AM, Liang Chen <chenliang...@huawei.com>
wrote:

> Two aspects are attracting them:
> 1.Flink is using java, it is easy for most of them to start Flink, and be
> more easy to maintain in comparison to Storm(as Clojure is difficult to
> maintain, and less people know it.)
> 2.Users really want an unified system supporting streaming and batch
> processing.
>
>
>
>
> --
> View this message in context:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Specially-introduced-Flink-to-chinese-users-in-CNCC-China-National-Computer-Congress-tp3254p3574.html
> Sent from the Apache Flink User Mailing List archive. mailing list archive
> at Nabble.com.
>



-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>


Re: Apache Flink Operator State as Query Cache

2015-11-16 Thread Welly Tambunan
Hi Stephan,

So that will be in Flink 1.0, right?

Cheers

On Mon, Nov 16, 2015 at 9:06 PM, Stephan Ewen <se...@apache.org> wrote:

> Hi Anwar!
>
> 0.10.0 was feature frozen at that time already and under testing.
> Key/value state on connected streams will have to go into the next
> release...
>
> Stephan
>
>
> On Mon, Nov 16, 2015 at 3:00 PM, Anwar Rizal <anriza...@gmail.com> wrote:
>
>> Stephan,
>>
>> Having a look at the brand new 0.10 release, I noticed that OperatorState
>> is not implemented for ConnectedStream, which is quite the opposite of what
>> you said below.
>>
>> Or maybe I misunderstood your sentence here ?
>>
>> Thanks,
>> Anwar.
>>
>>
>> On Wed, Nov 11, 2015 at 10:49 AM, Stephan Ewen <se...@apache.org> wrote:
>>
>>> Hi!
>>>
>>> In general, if you can keep state in Flink, you get better
>>> throughput/latency/consistency and have one less system to worry about
>>> (external k/v store). State outside means that the Flink processes can be
>>> slimmer and need fewer resources and as such recover a bit faster. There
>>> are use cases for that as well.
>>>
>>> Storing the model in OperatorState is a good idea, if you can. On the
>>> roadmap is to migrate the operator state to managed memory as well, so that
>>> should take care of the GC issues.
>>>
>>> We are just adding functionality to make the Key/Value operator state
>>> usable in CoMap/CoFlatMap as well (currently it only works in windows and
>>> in Map/FlatMap/Filter functions over the KeyedStream).
>>> Until the, you should be able to use a simple Java HashMap and use the
>>> "Checkpointed" interface to get it persistent.
>>>
>>> Greetings,
>>> Stephan
>>>
>>>
>>> On Sun, Nov 8, 2015 at 10:11 AM, Welly Tambunan <if05...@gmail.com>
>>> wrote:
>>>
>>>> Thanks for the answer.
>>>>
>>>> Currently the approach that i'm using right now is creating a
>>>> base/marker interface to stream different type of message to the same
>>>> operator. Not sure about the performance hit about this compare to the
>>>> CoFlatMap function.
>>>>
>>>> Basically this one is providing query cache, so i'm thinking instead of
>>>> using in memory cache like redis, ignite etc, i can just use operator state
>>>> for this one.
>>>>
>>>> I just want to gauge do i need to use memory cache or operator state
>>>> would be just fine.
>>>>
>>>> However i'm concern about the Gen 2 Garbage Collection for caching our
>>>> own state without using operator state. Is there any clarification on that
>>>> one ?
>>>>
>>>>
>>>>
>>>> On Sat, Nov 7, 2015 at 12:38 AM, Anwar Rizal <anriza...@gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>> Let me understand your case better here. You have a stream of model
>>>>> and stream of data. To process the data, you will need a way to access 
>>>>> your
>>>>> model from the subsequent stream operations (map, filter, flatmap, ..).
>>>>> I'm not sure in which case Operator State is a good choice, but I
>>>>> think you can also live without.
>>>>>
>>>>> val modelStream =  // get the model stream
>>>>> val dataStream   =  // get the data stream
>>>>>
>>>>> modelStream.broadcast.connect(dataStream).coFlatMap( ... )
>>>>>
>>>>> Then you can keep the latest model in a RichCoFlatMapFunction, not
>>>>> necessarily as Operator State, although maybe OperatorState is a good
>>>>> choice too.
>>>>>
>>>>> Does it make sense to you ?
>>>>>
>>>>> Anwar
>>>>>
>>>>> On Fri, Nov 6, 2015 at 10:21 AM, Welly Tambunan <if05...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> We have a high density data that required a downsample. However this
>>>>>> downsample model is very flexible based on the client device and user
>>>>>> interaction. So it will be wasteful to precompute and store to db.
>>>>>>
>>>>>> So we want to use Apache Flink to do downsampling and cache the
>>>>>> result for subsequent query.
>>>>>>
>>>>>> We are considering using Flink Operator state for that one.
>>>>>>
>>>>>> Is that the right approach to use that for memory cache ? Or if that
>>>>>> preferable using memory cache like redis etc.
>>>>>>
>>>>>> Any comments will be appreciated.
>>>>>>
>>>>>>
>>>>>> Cheers
>>>>>> --
>>>>>> Welly Tambunan
>>>>>> Triplelands
>>>>>>
>>>>>> http://weltam.wordpress.com
>>>>>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Welly Tambunan
>>>> Triplelands
>>>>
>>>> http://weltam.wordpress.com
>>>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>>>
>>>
>>>
>>
>


-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>
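
Stephan's and Anwar's suggestions above (broadcast the model stream, connect it with the data stream, and keep the latest model in a plain HashMap made persistent via the "Checkpointed" interface) might look roughly like the following Scala sketch against the 0.10-era API. This is not the definitive implementation: Model and Event are placeholder types, and the import paths should be double-checked against the Flink version in use.

```scala
import java.util.HashMap

import org.apache.flink.streaming.api.checkpoint.Checkpointed
import org.apache.flink.streaming.api.functions.co.RichCoFlatMapFunction
import org.apache.flink.util.Collector

// Placeholder types for illustration only.
case class Model(id: String, factor: Double)
case class Event(modelId: String, value: Double)

class ModelHolder
    extends RichCoFlatMapFunction[Model, Event, Double]
    with Checkpointed[HashMap[String, Model]] {

  // Latest models, keyed by id; persisted via the Checkpointed callbacks below.
  private var models = new HashMap[String, Model]()

  // Input 1 (broadcast model stream): just remember the newest model.
  override def flatMap1(model: Model, out: Collector[Double]): Unit =
    models.put(model.id, model)

  // Input 2 (data stream): apply the latest matching model, if any.
  override def flatMap2(event: Event, out: Collector[Double]): Unit = {
    val m = models.get(event.modelId)
    if (m != null) out.collect(event.value * m.factor)
  }

  // Checkpointed: hand the map to Flink's checkpointing mechanism ...
  override def snapshotState(checkpointId: Long,
                             checkpointTimestamp: Long): HashMap[String, Model] =
    models

  // ... and restore it after a failure.
  override def restoreState(state: HashMap[String, Model]): Unit =
    models = state
}

// Usage, following Anwar's snippet:
//   modelStream.broadcast.connect(dataStream).flatMap(new ModelHolder)
```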


Re: Apache Flink 0.10.0 released

2015-11-16 Thread Welly Tambunan
Great Job guys,

So this is the first production-ready release of the Streaming API!

Cool !

Cheers

On Mon, Nov 16, 2015 at 9:02 PM, Leonard Wolters <leon...@sagent.io> wrote:

> congrats!
>
> L.
>
>
> On 16-11-15 14:53, Fabian Hueske wrote:
>
> Hi everybody,
>
> The Flink community is excited to announce that Apache Flink 0.10.0 has
> been released.
> Please find the release announcement here:
>
> -->  http://flink.apache.org/news/2015/11/16/release-0.10.0.html
>
> Best,
> Fabian
>
>
> --
> Leonard Wolters
> Chief Product Manager
> *M*: +31 (0)6 55 53 04 01 | *T*: +31 (0)88 10 44 555
> *E*: leon...@sagent.io | *W*: sagent.io | Disclaimer
> <http://sagent.io/email-disclaimer> | Sagent BV
> Herengracht 504 | 1017CB Amsterdam | Netherlands
>



-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>


Re: Apache Flink Operator State as Query Cache

2015-11-15 Thread Welly Tambunan
Hi Kostas,

Yes. Exactly. Thanks a lot for this one.

That's really what we need !


Cheers

On Sun, Nov 15, 2015 at 8:53 PM, Kostas Tzoumas <ktzou...@apache.org> wrote:

> Hi Wally,
>
> This version adds support for specifying and switching between time
> semantics - processing time, ingestion time, or event time.
>
> When working with event time, you can specify watermarks to track the
> progress of event time. So, even if events arrive out of order, windows
> will be specified on the event time (not arrival time), and the computation
> will be triggered on watermark arrival.
>
> You can see the API reference and an example here:
> https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/streaming_guide.html#working-with-time
>
> Is this what you are looking for?
>
> Kostas
>
>
> On Sat, Nov 14, 2015 at 1:54 AM, Welly Tambunan <if05...@gmail.com> wrote:
>
>> Hi Robert,
>>
>> Is this version has already handle the stream perfection or out of order
>> event ?
>>
>> Any resource on how this work and the API reference ?
>>
>>
>> Cheers
>>
>> On Fri, Nov 13, 2015 at 4:00 PM, Welly Tambunan <if05...@gmail.com>
>> wrote:
>>
>>> Awesome !
>>>
>>> This is really the best weekend gift ever. :)
>>>
>>> Cheers
>>>
>>> On Fri, Nov 13, 2015 at 3:54 PM, Robert Metzger <rmetz...@apache.org>
>>> wrote:
>>>
>>>> Hi Welly,
>>>> Flink 0.10.0 is out, its just not announced yet.
>>>> Its available on maven central and the global mirrors are currently
>>>> syncing it. This mirror for example has the update already:
>>>> http://apache.mirror.digionline.de/flink/flink-0.10.0/
>>>>
>>>> On Fri, Nov 13, 2015 at 9:50 AM, Welly Tambunan <if05...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Aljoscha,
>>>>>
>>>>> Thanks for this one. Looking forward for 0.10 release version.
>>>>>
>>>>> Cheers
>>>>>
>>>>> On Thu, Nov 12, 2015 at 5:34 PM, Aljoscha Krettek <aljos...@apache.org
>>>>> > wrote:
>>>>>
>>>>>> Hi,
>>>>>> I don’t know yet when the operator state will be transitioned to
>>>>>> managed memory but it could happen for 1.0 (which will come after 0.10).
>>>>>> The good thing is that the interfaces won’t change, so state can be used 
>>>>>> as
>>>>>> it is now.
>>>>>>
>>>>>> For 0.10, the release vote is winding down right now, so you can
>>>>>> expect the release to happen today or tomorrow. I think the streaming is
>>>>>> production ready now, we expect to mostly to hardening and some
>>>>>> infrastructure changes (for example annotations that specify API 
>>>>>> stability)
>>>>>> for the 1.0 release.
>>>>>>
>>>>>> Let us know if you need more information.
>>>>>>
>>>>>> Cheers,
>>>>>> Aljoscha
>>>>>> > On 12 Nov 2015, at 02:42, Welly Tambunan <if05...@gmail.com> wrote:
>>>>>> >
>>>>>> > Hi Stephan,
>>>>>> >
>>>>>> > >Storing the model in OperatorState is a good idea, if you can. On
>>>>>> the roadmap is to migrate the operator state to managed memory as well, 
>>>>>> so
>>>>>> that should take care of the GC issues.
>>>>>> > Is this using off the heap memory ? Which version we expect this
>>>>>> one to be available ?
>>>>>> >
>>>>>> > Another question is when will the release version of 0.10 will be
>>>>>> out ? We would love to upgrade to that one when it's available. That
>>>>>> version will be a production ready streaming right ?
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > On Wed, Nov 11, 2015 at 4:49 PM, Stephan Ewen <se...@apache.org>
>>>>>> wrote:
>>>>>> > Hi!
>>>>>> >
>>>>>> > In general, if you can keep state in Flink, you get better
>>>>>> throughput/latency/consistency and have one less system to worry about
>>>>>> (external k/v store). State outside means that the Flink pr
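
The event-time and watermark mechanism Kostas describes earlier in this thread can be sketched as below. Note this sketch uses the later, stabilized form of the Scala API (e.g. assignAscendingTimestamps and timeWindow); in 0.10 the names differ slightly (assignTimestamps with a TimestampExtractor), so check the streaming guide linked above for the exact signatures. Reading is a placeholder type.

```scala
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

// Placeholder event type; `ts` is an epoch-millis event timestamp.
case class Reading(sensor: String, ts: Long, value: Double)

object EventTimeSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // Switch from the default (processing time) to event time.
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

    val readings: DataStream[Reading] = env.fromElements(
      Reading("a", 1000L, 1.0), Reading("a", 3000L, 2.0))

    // Tell Flink where the event timestamp lives so it can generate
    // watermarks (here assuming timestamps arrive in ascending order).
    val withTimestamps = readings.assignAscendingTimestamps(_.ts)

    // Windows now fire on watermark progress (event time), not arrival time,
    // so out-of-order elements still land in the right window.
    withTimestamps
      .keyBy(_.sensor)
      .timeWindow(Time.seconds(10))
      .sum("value")
      .print()

    env.execute("event time sketch")
  }
}
```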

Apache Flink Forward Videos

2015-11-13 Thread Welly Tambunan
Hi All,

I've just noticed that the videos are already available:

http://flink-forward.org/?post_type=session


Another weekend gift for all.

Cheers
-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>


Re: Apache Flink Operator State as Query Cache

2015-11-13 Thread Welly Tambunan
Hi Aljoscha,

Thanks for this one. Looking forward to the 0.10 release.

Cheers

On Thu, Nov 12, 2015 at 5:34 PM, Aljoscha Krettek <aljos...@apache.org>
wrote:

> Hi,
> I don’t know yet when the operator state will be transitioned to managed
> memory but it could happen for 1.0 (which will come after 0.10). The good
> thing is that the interfaces won’t change, so state can be used as it is
> now.
>
> For 0.10, the release vote is winding down right now, so you can expect
> the release to happen today or tomorrow. I think the streaming is
> production ready now, we expect to mostly to hardening and some
> infrastructure changes (for example annotations that specify API stability)
> for the 1.0 release.
>
> Let us know if you need more information.
>
> Cheers,
> Aljoscha
> > On 12 Nov 2015, at 02:42, Welly Tambunan <if05...@gmail.com> wrote:
> >
> > Hi Stephan,
> >
> > >Storing the model in OperatorState is a good idea, if you can. On the
> roadmap is to migrate the operator state to managed memory as well, so that
> should take care of the GC issues.
> > Is this using off the heap memory ? Which version we expect this one to
> be available ?
> >
> > Another question is when will the release version of 0.10 will be out ?
> We would love to upgrade to that one when it's available. That version will
> be a production ready streaming right ?
> >
> >
> >
> >
> >
> > On Wed, Nov 11, 2015 at 4:49 PM, Stephan Ewen <se...@apache.org> wrote:
> > Hi!
> >
> > In general, if you can keep state in Flink, you get better
> throughput/latency/consistency and have one less system to worry about
> (external k/v store). State outside means that the Flink processes can be
> slimmer and need fewer resources and as such recover a bit faster. There
> are use cases for that as well.
> >
> > Storing the model in OperatorState is a good idea, if you can. On the
> roadmap is to migrate the operator state to managed memory as well, so that
> should take care of the GC issues.
> >
> > We are just adding functionality to make the Key/Value operator state
> usable in CoMap/CoFlatMap as well (currently it only works in windows and
> in Map/FlatMap/Filter functions over the KeyedStream).
> > Until the, you should be able to use a simple Java HashMap and use the
> "Checkpointed" interface to get it persistent.
> >
> > Greetings,
> > Stephan
> >
> >
> > On Sun, Nov 8, 2015 at 10:11 AM, Welly Tambunan <if05...@gmail.com>
> wrote:
> > Thanks for the answer.
> >
> > Currently the approach that i'm using right now is creating a
> base/marker interface to stream different type of message to the same
> operator. Not sure about the performance hit about this compare to the
> CoFlatMap function.
> >
> > Basically this one is providing query cache, so i'm thinking instead of
> using in memory cache like redis, ignite etc, i can just use operator state
> for this one.
> >
> > I just want to gauge do i need to use memory cache or operator state
> would be just fine.
> >
> > However i'm concern about the Gen 2 Garbage Collection for caching our
> own state without using operator state. Is there any clarification on that
> one ?
> >
> >
> >
> > On Sat, Nov 7, 2015 at 12:38 AM, Anwar Rizal <anriza...@gmail.com>
> wrote:
> >
> > Let me understand your case better here. You have a stream of model and
> stream of data. To process the data, you will need a way to access your
> model from the subsequent stream operations (map, filter, flatmap, ..).
> > I'm not sure in which case Operator State is a good choice, but I think
> you can also live without.
> >
> > val modelStream =  // get the model stream
> > val dataStream   =  // get the data stream
> >
> > modelStream.broadcast.connect(dataStream).coFlatMap( ... )
> >
> > Then you can keep the latest model in a RichCoFlatMapFunction, not
> > necessarily as Operator State, although maybe OperatorState is a good
> > choice too.
> >
> > Does it make sense to you ?
> >
> > Anwar
> >
> > On Fri, Nov 6, 2015 at 10:21 AM, Welly Tambunan <if05...@gmail.com>
> wrote:
> > Hi All,
> >
> > We have a high density data that required a downsample. However this
> downsample model is very flexible based on the client device and user
> interaction. So it will be wasteful to precompute and store to db.
> >
> > So we want to use Apache Flink to do downsampling and cache the result
> for subsequent query.
> >
> > We are considering using Flink Operator state for that one.
> >
> >

Re: Flink, Kappa and Lambda

2015-11-13 Thread Welly Tambunan
Hi rss rss,

Yes. I have already read that book.

However, given the state of streaming right now and the Kappa Architecture,
I don't think we need the Lambda Architecture anymore. Or do we?

Any thoughts?

On Thu, Nov 12, 2015 at 12:29 PM, rss rss <rssde...@gmail.com> wrote:

> Hello,
>
>   regarding the Lambda architecture there is a following book -
> https://www.manning.com/books/big-data (Big Data. Principles and best
> practices of scalable realtime data systems
>  Nathan Marz and James Warren).
>
> Regards,
> Roman
>
> 2015-11-12 4:47 GMT+03:00 Welly Tambunan <if05...@gmail.com>:
>
>> Hi Stephan,
>>
>>
>> Thanks for your response.
>>
>>
>> We are trying to justify whether it's enough to use Kappa Architecture
>> with Flink. This more about resiliency and message lost issue etc.
>>
>> The article is worry about message lost even if you are using Kafka.
>>
>> No matter the message queue or broker you rely on whether it be RabbitMQ,
>> JMS, ActiveMQ, Websphere, MSMQ and yes even Kafka you can lose messages in
>> any of the following ways:
>>
>>- A downstream system from the broker can have data loss
>>- All message queues today can lose already acknowledged messages
>>during failover or leader election.
>>- A bug can send the wrong messages to the wrong systems.
>>
>> Cheers
>>
>> On Wed, Nov 11, 2015 at 4:13 PM, Stephan Ewen <se...@apache.org> wrote:
>>
>>> Hi!
>>>
>>> Can you explain a little more what you want to achieve? Maybe then we
>>> can give a few more comments...
>>>
>>> I briefly read through some of the articles you linked, but did not
>>> quite understand their train of thoughts.
>>> For example, letting Tomcat write to Cassandra directly, and to Kafka,
>>> might just be redundant. Why not let the streaming job that reads the Kafka
>>> queue
>>> move the data to Cassandra as one of its results? Further more, durable
>>> storing the sequence of events is exactly what Kafka does, but the article
>>> suggests to use Cassandra for that, which I find very counter intuitive.
>>> It looks a bit like the suggested approach is only adopting streaming for
>>> half the task.
>>>
>>> Greetings,
>>> Stephan
>>>
>>>
>>> On Tue, Nov 10, 2015 at 7:49 AM, Welly Tambunan <if05...@gmail.com>
>>> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I read a couple of article about Kappa and Lambda Architecture.
>>>>
>>>>
>>>> http://www.confluent.io/blog/real-time-stream-processing-the-next-step-for-apache-flink/
>>>>
>>>> I'm convince that Flink will simplify this one with streaming.
>>>>
>>>> However i also stumble upon this blog post that has valid argument to
>>>> have a system of record storage ( event sourcing ) and finally lambda
>>>> architecture is appear at the solution. Basically it will write twice to
>>>> Queuing system and C* for safety. System of record here is basically
>>>> storing the event (delta).
>>>>
>>>> [image: Inline image 1]
>>>>
>>>>
>>>> https://lostechies.com/ryansvihla/2015/09/17/event-sourcing-and-system-of-record-sane-distributed-development-in-the-modern-era-2/
>>>>
>>>> Another approach is about lambda architecture for maintaining the
>>>> correctness of the system.
>>>>
>>>>
>>>> https://lostechies.com/ryansvihla/2015/09/17/real-time-analytics-with-spark-streaming-and-cassandra/
>>>>
>>>>
>>>> Given that he's using Spark for the streaming processor, do we have to
>>>> do the same thing with Apache Flink ?
>>>>
>>>>
>>>>
>>>> Cheers
>>>> --
>>>> Welly Tambunan
>>>> Triplelands
>>>>
>>>> http://weltam.wordpress.com
>>>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>>>
>>>
>>>
>>
>>
>> --
>> Welly Tambunan
>> Triplelands
>>
>> http://weltam.wordpress.com
>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>
>
>


-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>


Re: Apache Flink Operator State as Query Cache

2015-11-13 Thread Welly Tambunan
Awesome !

This is really the best weekend gift ever. :)

Cheers

On Fri, Nov 13, 2015 at 3:54 PM, Robert Metzger <rmetz...@apache.org> wrote:

> Hi Welly,
> Flink 0.10.0 is out, its just not announced yet.
> Its available on maven central and the global mirrors are currently
> syncing it. This mirror for example has the update already:
> http://apache.mirror.digionline.de/flink/flink-0.10.0/
>
> On Fri, Nov 13, 2015 at 9:50 AM, Welly Tambunan <if05...@gmail.com> wrote:
>
>> Hi Aljoscha,
>>
>> Thanks for this one. Looking forward for 0.10 release version.
>>
>> Cheers
>>
>> On Thu, Nov 12, 2015 at 5:34 PM, Aljoscha Krettek <aljos...@apache.org>
>> wrote:
>>
>>> Hi,
>>> I don’t know yet when the operator state will be transitioned to managed
>>> memory but it could happen for 1.0 (which will come after 0.10). The good
>>> thing is that the interfaces won’t change, so state can be used as it is
>>> now.
>>>
>>> For 0.10, the release vote is winding down right now, so you can expect
>>> the release to happen today or tomorrow. I think the streaming is
>>> production ready now, we expect to mostly to hardening and some
>>> infrastructure changes (for example annotations that specify API stability)
>>> for the 1.0 release.
>>>
>>> Let us know if you need more information.
>>>
>>> Cheers,
>>> Aljoscha
>>> > On 12 Nov 2015, at 02:42, Welly Tambunan <if05...@gmail.com> wrote:
>>> >
>>> > Hi Stephan,
>>> >
>>> > >Storing the model in OperatorState is a good idea, if you can. On the
>>> roadmap is to migrate the operator state to managed memory as well, so that
>>> should take care of the GC issues.
>>> > Is this using off the heap memory ? Which version we expect this one
>>> to be available ?
>>> >
>>> > Another question is when will the release version of 0.10 will be out
>>> ? We would love to upgrade to that one when it's available. That version
>>> will be a production ready streaming right ?
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > On Wed, Nov 11, 2015 at 4:49 PM, Stephan Ewen <se...@apache.org>
>>> wrote:
>>> > Hi!
>>> >
>>> > In general, if you can keep state in Flink, you get better
>>> throughput/latency/consistency and have one less system to worry about
>>> (external k/v store). State outside means that the Flink processes can be
>>> slimmer and need fewer resources and as such recover a bit faster. There
>>> are use cases for that as well.
>>> >
>>> > Storing the model in OperatorState is a good idea, if you can. On the
>>> roadmap is to migrate the operator state to managed memory as well, so that
>>> should take care of the GC issues.
>>> >
>>> > We are just adding functionality to make the Key/Value operator state
>>> usable in CoMap/CoFlatMap as well (currently it only works in windows and
>>> in Map/FlatMap/Filter functions over the KeyedStream).
>>> > Until the, you should be able to use a simple Java HashMap and use the
>>> "Checkpointed" interface to get it persistent.
>>> >
>>> > Greetings,
>>> > Stephan
>>> >
>>> >
>>> > On Sun, Nov 8, 2015 at 10:11 AM, Welly Tambunan <if05...@gmail.com>
>>> wrote:
>>> > Thanks for the answer.
>>> >
>>> > Currently the approach that i'm using right now is creating a
>>> base/marker interface to stream different type of message to the same
>>> operator. Not sure about the performance hit about this compare to the
>>> CoFlatMap function.
>>> >
>>> > Basically this one is providing query cache, so i'm thinking instead
>>> of using in memory cache like redis, ignite etc, i can just use operator
>>> state for this one.
>>> >
>>> > I just want to gauge do i need to use memory cache or operator state
>>> would be just fine.
>>> >
>>> > However i'm concern about the Gen 2 Garbage Collection for caching our
>>> own state without using operator state. Is there any clarification on that
>>> one ?
>>> >
>>> >
>>> >
>>> > On Sat, Nov 7, 2015 at 12:38 AM, Anwar Rizal <anriza...@gmail.com>
>>> wrote:
>>> >
>>> > Let me understand your case better here. You have a stream of 

Re: Apache Flink Forward Videos

2015-11-13 Thread Welly Tambunan
Thanks Max,

I see that all the videos are already there. The keynote has also been
uploaded.

Great stuff !!

Cheers

On Fri, Nov 13, 2015 at 5:12 PM, Maximilian Michels <m...@apache.org> wrote:

> Hi Welly,
>
> Thanks for sharing! The videos are coming. They soon will all be available.
>
> Cheers,
> Max
>
> On Fri, Nov 13, 2015 at 11:08 AM, Welly Tambunan <if05...@gmail.com>
> wrote:
> > Hi All,
> >
> > I've just notice that the video has already available for this one.
> >
> > http://flink-forward.org/?post_type=session
> >
> >
> > Another weekend gift for all.
> >
> > Cheers
> > --
> > Welly Tambunan
> > Triplelands
> >
> > http://weltam.wordpress.com
> > http://www.triplelands.com
>



-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>


Re: Apache Flink Operator State as Query Cache

2015-11-13 Thread Welly Tambunan
Hi Robert,

Does this version already handle stream imperfections such as out-of-order
events?

Is there any resource on how this works, and an API reference?


Cheers

On Fri, Nov 13, 2015 at 4:00 PM, Welly Tambunan <if05...@gmail.com> wrote:

> Awesome !
>
> This is really the best weekend gift ever. :)
>
> Cheers
>
> On Fri, Nov 13, 2015 at 3:54 PM, Robert Metzger <rmetz...@apache.org>
> wrote:
>
>> Hi Welly,
>> Flink 0.10.0 is out, its just not announced yet.
>> Its available on maven central and the global mirrors are currently
>> syncing it. This mirror for example has the update already:
>> http://apache.mirror.digionline.de/flink/flink-0.10.0/
>>
>> On Fri, Nov 13, 2015 at 9:50 AM, Welly Tambunan <if05...@gmail.com>
>> wrote:
>>
>>> Hi Aljoscha,
>>>
>>> Thanks for this one. Looking forward for 0.10 release version.
>>>
>>> Cheers
>>>
>>> On Thu, Nov 12, 2015 at 5:34 PM, Aljoscha Krettek <aljos...@apache.org>
>>> wrote:
>>>
>>>> Hi,
>>>> I don’t know yet when the operator state will be transitioned to
>>>> managed memory but it could happen for 1.0 (which will come after 0.10).
>>>> The good thing is that the interfaces won’t change, so state can be used as
>>>> it is now.
>>>>
>>>> For 0.10, the release vote is winding down right now, so you can expect
>>>> the release to happen today or tomorrow. I think the streaming is
>>>> production ready now, we expect to mostly to hardening and some
>>>> infrastructure changes (for example annotations that specify API stability)
>>>> for the 1.0 release.
>>>>
>>>> Let us know if you need more information.
>>>>
>>>> Cheers,
>>>> Aljoscha
>>>> > On 12 Nov 2015, at 02:42, Welly Tambunan <if05...@gmail.com> wrote:
>>>> >
>>>> > Hi Stephan,
>>>> >
>>>> > >Storing the model in OperatorState is a good idea, if you can. On
>>>> the roadmap is to migrate the operator state to managed memory as well, so
>>>> that should take care of the GC issues.
>>>> > Is this using off the heap memory ? Which version we expect this one
>>>> to be available ?
>>>> >
>>>> > Another question is when will the release version of 0.10 will be out
>>>> ? We would love to upgrade to that one when it's available. That version
>>>> will be a production ready streaming right ?
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > On Wed, Nov 11, 2015 at 4:49 PM, Stephan Ewen <se...@apache.org>
>>>> wrote:
>>>> > Hi!
>>>> >
>>>> > In general, if you can keep state in Flink, you get better
>>>> throughput/latency/consistency and have one less system to worry about
>>>> (external k/v store). State outside means that the Flink processes can be
>>>> slimmer and need fewer resources and as such recover a bit faster. There
>>>> are use cases for that as well.
>>>> >
>>>> > Storing the model in OperatorState is a good idea, if you can. On the
>>>> roadmap is to migrate the operator state to managed memory as well, so that
>>>> should take care of the GC issues.
>>>> >
>>>> > We are just adding functionality to make the Key/Value operator state
>>>> usable in CoMap/CoFlatMap as well (currently it only works in windows and
>>>> in Map/FlatMap/Filter functions over the KeyedStream).
>>>> > Until then, you should be able to use a simple Java HashMap and use
>>>> the "Checkpointed" interface to get it persistent.
>>>> >
>>>> > Greetings,
>>>> > Stephan
>>>> >
>>>> >
>>>> > On Sun, Nov 8, 2015 at 10:11 AM, Welly Tambunan <if05...@gmail.com>
>>>> wrote:
>>>> > Thanks for the answer.
>>>> >
>>>> > Currently the approach that i'm using right now is creating a
>>>> base/marker interface to stream different type of message to the same
>>>> operator. Not sure about the performance hit about this compare to the
>>>> CoFlatMap function.
>>>> >
>>>> > Basically this one is providing query cache, so i'm thinking instead
>>>> of using in memory cache like redis, ignite etc, i can just use operator
&

Re: Multilang Support on Flink

2015-11-13 Thread Welly Tambunan
Hi Max,

Do you know where the repo is?

I tried to search in flink-staging, but it seems it's not there anymore (via
Google).

Cheers

On Fri, Nov 13, 2015 at 5:07 PM, Maximilian Michels <m...@apache.org> wrote:

> Hi Welly,
>
> There is a protocol for communicating with other processes. This is
> reflected in flink-language-binding-generic module. I'm not aware how
> Spark or Storm communication protocols work but this protocol is
> rather low level.
>
> Cheers,
> Max
>
> On Fri, Nov 13, 2015 at 9:49 AM, Welly Tambunan <if05...@gmail.com> wrote:
> > Hi All,
> >
> > I want to ask if there's multilang support ( like in Storm and pipeTo in
> > Spark ) in flink ?
> >
> > I try to find it in the docs but can't find it.
> >
> > Any link or direction would be really appreciated.
> >
> >
> > Cheers
> >
> > --
> > Welly Tambunan
> > Triplelands
> >
> > http://weltam.wordpress.com
> > http://www.triplelands.com
>



-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>


Re: Flink, Kappa and Lambda

2015-11-11 Thread Welly Tambunan
Hi Stephan,


Thanks for your response.


We are trying to justify whether it's enough to use the Kappa Architecture
with Flink. This is more about resiliency, message loss, and so on.

The article worries about message loss even if you are using Kafka:

No matter the message queue or broker you rely on whether it be RabbitMQ,
JMS, ActiveMQ, Websphere, MSMQ and yes even Kafka you can lose messages in
any of the following ways:

   - A downstream system from the broker can have data loss
   - All message queues today can lose already acknowledged messages during
   failover or leader election.
   - A bug can send the wrong messages to the wrong systems.

Cheers

On Wed, Nov 11, 2015 at 4:13 PM, Stephan Ewen <se...@apache.org> wrote:

> Hi!
>
> Can you explain a little more what you want to achieve? Maybe then we can
> give a few more comments...
>
> I briefly read through some of the articles you linked, but did not quite
> understand their train of thoughts.
> For example, letting Tomcat write to Cassandra directly, and to Kafka,
> might just be redundant. Why not let the streaming job that reads the Kafka
> queue
> move the data to Cassandra as one of its results? Further more, durable
> storing the sequence of events is exactly what Kafka does, but the article
> suggests to use Cassandra for that, which I find very counter intuitive.
> It looks a bit like the suggested approach is only adopting streaming for
> half the task.
>
> Greetings,
> Stephan
>
>
> On Tue, Nov 10, 2015 at 7:49 AM, Welly Tambunan <if05...@gmail.com> wrote:
>
>> Hi All,
>>
>> I read a couple of article about Kappa and Lambda Architecture.
>>
>>
>> http://www.confluent.io/blog/real-time-stream-processing-the-next-step-for-apache-flink/
>>
>> I'm convince that Flink will simplify this one with streaming.
>>
>> However i also stumble upon this blog post that has valid argument to
>> have a system of record storage ( event sourcing ) and finally lambda
>> architecture is appear at the solution. Basically it will write twice to
>> Queuing system and C* for safety. System of record here is basically
>> storing the event (delta).
>>
>> [image: Inline image 1]
>>
>>
>> https://lostechies.com/ryansvihla/2015/09/17/event-sourcing-and-system-of-record-sane-distributed-development-in-the-modern-era-2/
>>
>> Another approach is about lambda architecture for maintaining the
>> correctness of the system.
>>
>>
>> https://lostechies.com/ryansvihla/2015/09/17/real-time-analytics-with-spark-streaming-and-cassandra/
>>
>>
>> Given that he's using Spark for the streaming processor, do we have to do
>> the same thing with Apache Flink ?
>>
>>
>>
>> Cheers
>> --
>> Welly Tambunan
>> Triplelands
>>
>> http://weltam.wordpress.com
>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>
>
>


-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>


Flink, Kappa and Lambda

2015-11-09 Thread Welly Tambunan
Hi All,

I read a couple of articles about Kappa and Lambda Architecture.

http://www.confluent.io/blog/real-time-stream-processing-the-next-step-for-apache-flink/

I'm convinced that Flink will simplify this with streaming.

However, I also stumbled upon this blog post, which makes a valid argument
for a system-of-record storage (event sourcing), with lambda architecture
appearing as the solution. Basically it writes twice, to the queuing system
and to C*, for safety. The system of record here is basically storing the
events (deltas).

[image: Inline image 1]

[image: Inline image 1]

https://lostechies.com/ryansvihla/2015/09/17/event-sourcing-and-system-of-record-sane-distributed-development-in-the-modern-era-2/

Another approach is about lambda architecture for maintaining the
correctness of the system.

https://lostechies.com/ryansvihla/2015/09/17/real-time-analytics-with-spark-streaming-and-cassandra/


Given that he's using Spark for the streaming processor, do we have to do
the same thing with Apache Flink ?



Cheers
-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>


Re: Apache Flink Operator State as Query Cache

2015-11-08 Thread Welly Tambunan
Thanks for the answer.

Currently the approach I'm using is creating a base/marker interface to
stream different types of messages to the same operator. I'm not sure about
the performance hit of this compared to the CoFlatMap function.

Basically this is providing a query cache, so I'm thinking that instead of
using an in-memory cache like Redis, Ignite, etc., I can just use operator
state for this.

I just want to gauge whether I need to use a memory cache, or whether
operator state would be just fine.

However, I'm concerned about Gen 2 garbage collection when caching our own
state without using operator state. Is there any clarification on that?
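
For reference, the pattern discussed in this thread (a plain HashMap kept
as operator state, with snapshot/restore hooks as Flink's "Checkpointed"
interface would invoke them) can be sketched in plain Java. The class and
method names below are illustrative, not the actual Flink API:

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch: a query cache held in a HashMap, with the
// snapshot/restore hooks a Checkpointed-style operator would expose.
public class QueryCacheSketch {
    private Map<String, Double> cache = new HashMap<>();

    // Called for each model-update element (one input of a CoFlatMap).
    public void updateModel(String key, double value) {
        cache.put(key, value);
    }

    // Called for each query element (the other input); falls back to a
    // default when the key has not been cached yet.
    public double lookup(String key) {
        return cache.getOrDefault(key, 0.0);
    }

    // What a snapshotState hook would return: a copy of the cache, so the
    // checkpoint is not affected by updates that arrive afterwards.
    public Map<String, Double> snapshotState() {
        return new HashMap<>(cache);
    }

    // What a restoreState hook would do after a failure.
    public void restoreState(Map<String, Double> state) {
        cache = new HashMap<>(state);
    }

    public static void main(String[] args) {
        QueryCacheSketch op = new QueryCacheSketch();
        op.updateModel("sensor-1", 42.0);
        Map<String, Double> checkpoint = op.snapshotState();
        op.updateModel("sensor-1", 99.0);     // update after the checkpoint
        op.restoreState(checkpoint);          // simulate recovery
        System.out.println(op.lookup("sensor-1")); // prints 42.0
    }
}
```

Since updates and lookups run in the same operator, there is no separate
cache service to keep consistent; the checkpoint copy is what survives a
failure.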



On Sat, Nov 7, 2015 at 12:38 AM, Anwar Rizal <anriza...@gmail.com> wrote:

>
> Let me understand your case better here. You have a stream of model and
> stream of data. To process the data, you will need a way to access your
> model from the subsequent stream operations (map, filter, flatmap, ..).
> I'm not sure in which case Operator State is a good choice, but I think
> you can also live without.
>
> val modelStream =  // get the model stream
> val dataStream   =
>
> modelStream.broadcast.connect(dataStream). coFlatMap(  ) Then you can keep
> the latest model in a CoFlatMapRichFunction, not necessarily as Operator
> State, although maybe OperatorState is a good choice too.
>
> Does it make sense to you ?
>
> Anwar
>
> On Fri, Nov 6, 2015 at 10:21 AM, Welly Tambunan <if05...@gmail.com> wrote:
>
>> Hi All,
>>
>> We have a high density data that required a downsample. However this
>> downsample model is very flexible based on the client device and user
>> interaction. So it will be wasteful to precompute and store to db.
>>
>> So we want to use Apache Flink to do downsampling and cache the result
>> for subsequent query.
>>
>> We are considering using Flink Operator state for that one.
>>
>> Is that the right approach to use that for memory cache ? Or if that
>> preferable using memory cache like redis etc.
>>
>> Any comments will be appreciated.
>>
>>
>> Cheers
>> --
>> Welly Tambunan
>> Triplelands
>>
>> http://weltam.wordpress.com
>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>
>
>


-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>


Re: Efficiency for Filter then Transform ( filter().map() vs flatMap() )

2015-09-04 Thread Welly Tambunan
Hi Stephan,

Cheers

On Fri, Sep 4, 2015 at 2:31 PM, Stephan Ewen <se...@apache.org> wrote:

> We will definitely also try to get the chaining overhead down a bit.
>
> BTW: To reach this kind of throughput, you need sources that can produce
> very fast...
>
> On Fri, Sep 4, 2015 at 12:20 AM, Welly Tambunan <if05...@gmail.com> wrote:
>
>> Hi Stephan,
>>
>> That's good information to know. We will hit that throughput easily. Our
>> computation graph has lot of chaining like this right now.
>> I think it's safe to minimize the chain right now.
>>
>> Thanks a lot for this Stephan.
>>
>> Cheers
>>
>> On Thu, Sep 3, 2015 at 7:20 PM, Stephan Ewen <se...@apache.org> wrote:
>>
>>> In a set of benchmarks a while back, we found that the chaining
>>> mechanism has some overhead right now, because of its abstraction. The
>>> abstraction creates iterators for each element and makes it hard for the
>>> JIT to specialize on the operators in the chain.
>>>
>>> For purely local chains at full speed, this overhead is observable (can
>>> decrease throughput from 25mio elements/core to 15-20mio elements per
>>> core). If your job does not reach that throughput, or is I/O bound, source
>>> bound, etc, it does not matter.
>>>
>>> If you care about super high performance, collapsing the code into one
>>> function helps.
>>>
>>> On Thu, Sep 3, 2015 at 5:59 AM, Welly Tambunan <if05...@gmail.com>
>>> wrote:
>>>
>>>> Hi Gyula,
>>>>
>>>> Thanks for your response. Seems i will use filter and map for now as
>>>> that one is really make the intention clear, and not a big performance hit.
>>>>
>>>> Thanks again.
>>>>
>>>> Cheers
>>>>
>>>> On Thu, Sep 3, 2015 at 10:29 AM, Gyula Fóra <gyula.f...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hey Welly,
>>>>>
>>>>> If you call filter and map one after the other like you mentioned,
>>>>> these operators will be chained and executed as if they were running in 
>>>>> the
>>>>> same operator.
>>>>> The only small performance overhead comes from the fact that the
>>>>> output of the filter will be copied before passing it as input to the map
>>>>> to keep immutability guarantees (but no serialization/deserialization will
>>>>> happen). Copying might be practically free depending on your data type,
>>>>> though.
>>>>>
>>>>> If you are using operators that don't make use of the immutability of
>>>>> inputs/outputs (i.e you don't hold references to those values) than you 
>>>>> can
>>>>> disable copying altogether by calling env.getConfig().enableObjectReuse(),
>>>>> in which case they will have exactly the same performance.
>>>>>
>>>>> Cheers,
>>>>> Gyula
>>>>>
>>>>> Welly Tambunan <if05...@gmail.com> ezt írta (időpont: 2015. szept.
>>>>> 3., Cs, 4:33):
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I would like to filter some item from the event stream. I think there
>>>>>> are two ways doing this.
>>>>>>
>>>>>> Using the regular pipeline filter(...).map(...). We can also use
>>>>>> flatMap for doing both in the same operator.
>>>>>>
>>>>>> Any performance improvement if we are using flatMap ? As that will be
>>>>>> done in one operator instance.
>>>>>>
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Welly Tambunan
>>>>>> Triplelands
>>>>>>
>>>>>> http://weltam.wordpress.com
>>>>>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Welly Tambunan
>>>> Triplelands
>>>>
>>>> http://weltam.wordpress.com
>>>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>>>
>>>
>>>
>>
>>
>> --
>> Welly Tambunan
>> Triplelands
>>
>> http://weltam.wordpress.com
>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>
>
>


-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>


Re: Efficiency for Filter then Transform ( filter().map() vs flatMap() )

2015-09-04 Thread Welly Tambunan
Hi Stephan,

Thanks for your clarification.

Basically we will have lots of sensors that will push this kind of data to
a queuing system (currently we are using RabbitMQ, but will soon move to
Kafka).
We will also use the same pipeline to process the historical data.

I also want to minimize chaining, as the filter is doing very little work.
By minimizing the pipeline we can reduce db/external-source hits and cache
local data efficiently.

Cheers

On Fri, Sep 4, 2015 at 2:58 PM, Welly Tambunan <if05...@gmail.com> wrote:

> Hi Stephan,
>
> Cheers
>
> On Fri, Sep 4, 2015 at 2:31 PM, Stephan Ewen <se...@apache.org> wrote:
>
>> We will definitely also try to get the chaining overhead down a bit.
>>
>> BTW: To reach this kind of throughput, you need sources that can produce
>> very fast...
>>
>> On Fri, Sep 4, 2015 at 12:20 AM, Welly Tambunan <if05...@gmail.com>
>> wrote:
>>
>>> Hi Stephan,
>>>
>>> That's good information to know. We will hit that throughput easily. Our
>>> computation graph has lot of chaining like this right now.
>>> I think it's safe to minimize the chain right now.
>>>
>>> Thanks a lot for this Stephan.
>>>
>>> Cheers
>>>
>>> On Thu, Sep 3, 2015 at 7:20 PM, Stephan Ewen <se...@apache.org> wrote:
>>>
>>>> In a set of benchmarks a while back, we found that the chaining
>>>> mechanism has some overhead right now, because of its abstraction. The
>>>> abstraction creates iterators for each element and makes it hard for the
>>>> JIT to specialize on the operators in the chain.
>>>>
>>>> For purely local chains at full speed, this overhead is observable (can
>>>> decrease throughput from 25mio elements/core to 15-20mio elements per
>>>> core). If your job does not reach that throughput, or is I/O bound, source
>>>> bound, etc, it does not matter.
>>>>
>>>> If you care about super high performance, collapsing the code into one
>>>> function helps.
>>>>
>>>> On Thu, Sep 3, 2015 at 5:59 AM, Welly Tambunan <if05...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Gyula,
>>>>>
>>>>> Thanks for your response. Seems i will use filter and map for now as
>>>>> that one is really make the intention clear, and not a big performance 
>>>>> hit.
>>>>>
>>>>> Thanks again.
>>>>>
>>>>> Cheers
>>>>>
>>>>> On Thu, Sep 3, 2015 at 10:29 AM, Gyula Fóra <gyula.f...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hey Welly,
>>>>>>
>>>>>> If you call filter and map one after the other like you mentioned,
>>>>>> these operators will be chained and executed as if they were running in 
>>>>>> the
>>>>>> same operator.
>>>>>> The only small performance overhead comes from the fact that the
>>>>>> output of the filter will be copied before passing it as input to the map
>>>>>> to keep immutability guarantees (but no serialization/deserialization 
>>>>>> will
>>>>>> happen). Copying might be practically free depending on your data type,
>>>>>> though.
>>>>>>
>>>>>> If you are using operators that don't make use of the immutability of
>>>>>> inputs/outputs (i.e you don't hold references to those values) than you 
>>>>>> can
>>>>>> disable copying altogether by calling 
>>>>>> env.getConfig().enableObjectReuse(),
>>>>>> in which case they will have exactly the same performance.
>>>>>>
>>>>>> Cheers,
>>>>>> Gyula
>>>>>>
>>>>>> Welly Tambunan <if05...@gmail.com> ezt írta (időpont: 2015. szept.
>>>>>> 3., Cs, 4:33):
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> I would like to filter some item from the event stream. I think
>>>>>>> there are two ways doing this.
>>>>>>>
>>>>>>> Using the regular pipeline filter(...).map(...). We can also use
>>>>>>> flatMap for doing both in the same operator.
>>>>>>>
>>>>>>> Any performance improvement if we are using flatMap ? As that will
>>>>>>> be done in one operator instance.
>>>>>>>
>>>>>>>
>>>>>>> Cheers
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Welly Tambunan
>>>>>>> Triplelands
>>>>>>>
>>>>>>> http://weltam.wordpress.com
>>>>>>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Welly Tambunan
>>>>> Triplelands
>>>>>
>>>>> http://weltam.wordpress.com
>>>>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Welly Tambunan
>>> Triplelands
>>>
>>> http://weltam.wordpress.com
>>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>>
>>
>>
>
>
> --
> Welly Tambunan
> Triplelands
>
> http://weltam.wordpress.com
> http://www.triplelands.com <http://www.triplelands.com/blog/>
>



-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>


Re: Question on flink and hdfs

2015-09-04 Thread Welly Tambunan
Hi Jerry,

Yes, that's possible; Flink can run standalone without HDFS. You can
download the appropriate version from
https://flink.apache.org/downloads.html
[image: Inline image 1]

Cheers

On Fri, Sep 4, 2015 at 1:57 AM, Jerry Peng <jerry.boyang.p...@gmail.com>
wrote:

> Hello,
>
> Does flink require hdfs to run? I know you can use hdfs to checkpoint and
> process files in a distributed fashion.  So can flink run standalone
> without hdfs?
>



-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>


Re: Efficiency for Filter then Transform ( filter().map() vs flatMap() )

2015-09-03 Thread Welly Tambunan
Hi Stephan,

That's good information to know. We will hit that throughput easily; our
computation graph has a lot of chaining like this right now.
I think it's safe to minimize the chains right now.

Thanks a lot for this Stephan.

Cheers

On Thu, Sep 3, 2015 at 7:20 PM, Stephan Ewen <se...@apache.org> wrote:

> In a set of benchmarks a while back, we found that the chaining mechanism
> has some overhead right now, because of its abstraction. The abstraction
> creates iterators for each element and makes it hard for the JIT to
> specialize on the operators in the chain.
>
> For purely local chains at full speed, this overhead is observable (can
> decrease throughput from 25mio elements/core to 15-20mio elements per
> core). If your job does not reach that throughput, or is I/O bound, source
> bound, etc, it does not matter.
>
> If you care about super high performance, collapsing the code into one
> function helps.
>
> On Thu, Sep 3, 2015 at 5:59 AM, Welly Tambunan <if05...@gmail.com> wrote:
>
>> Hi Gyula,
>>
>> Thanks for your response. Seems i will use filter and map for now as that
>> one is really make the intention clear, and not a big performance hit.
>>
>> Thanks again.
>>
>> Cheers
>>
>> On Thu, Sep 3, 2015 at 10:29 AM, Gyula Fóra <gyula.f...@gmail.com> wrote:
>>
>>> Hey Welly,
>>>
>>> If you call filter and map one after the other like you mentioned, these
>>> operators will be chained and executed as if they were running in the same
>>> operator.
>>> The only small performance overhead comes from the fact that the output
>>> of the filter will be copied before passing it as input to the map to keep
>>> immutability guarantees (but no serialization/deserialization will happen).
>>> Copying might be practically free depending on your data type, though.
>>>
>>> If you are using operators that don't make use of the immutability of
>>> inputs/outputs (i.e you don't hold references to those values) than you can
>>> disable copying altogether by calling env.getConfig().enableObjectReuse(),
>>> in which case they will have exactly the same performance.
>>>
>>> Cheers,
>>> Gyula
>>>
>>> Welly Tambunan <if05...@gmail.com> ezt írta (időpont: 2015. szept. 3.,
>>> Cs, 4:33):
>>>
>>>> Hi All,
>>>>
>>>> I would like to filter some item from the event stream. I think there
>>>> are two ways doing this.
>>>>
>>>> Using the regular pipeline filter(...).map(...). We can also use
>>>> flatMap for doing both in the same operator.
>>>>
>>>> Any performance improvement if we are using flatMap ? As that will be
>>>> done in one operator instance.
>>>>
>>>>
>>>> Cheers
>>>>
>>>>
>>>> --
>>>> Welly Tambunan
>>>> Triplelands
>>>>
>>>> http://weltam.wordpress.com
>>>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>>>
>>>
>>
>>
>> --
>> Welly Tambunan
>> Triplelands
>>
>> http://weltam.wordpress.com
>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>
>
>


-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>


Efficiency for Filter then Transform ( filter().map() vs flatMap() )

2015-09-02 Thread Welly Tambunan
Hi All,

I would like to filter some items from the event stream. I think there are
two ways of doing this.

We can use the regular pipeline filter(...).map(...), or we can use flatMap
to do both in the same operator.

Is there any performance improvement if we use flatMap, given that the work
will be done in one operator instance?
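
The two options can be illustrated with java.util.stream as a stand-in for
a Flink DataStream (the Flink API differs, but the shape of both pipelines
is the same):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FilterMapVsFlatMap {
    // Option 1: separate filter and map steps.
    static List<Integer> filterThenMap(List<Integer> input) {
        return input.stream()
                .filter(x -> x % 2 == 0)   // keep even items
                .map(x -> x * 10)          // transform the survivors
                .collect(Collectors.toList());
    }

    // Option 2: one flatMap that filters and transforms in a single
    // operator, emitting zero or one element per input.
    static List<Integer> fusedFlatMap(List<Integer> input) {
        return input.stream()
                .flatMap(x -> x % 2 == 0 ? Stream.of(x * 10) : Stream.empty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> input = List.of(1, 2, 3, 4, 5);
        System.out.println(filterThenMap(input)); // [20, 40]
        System.out.println(fusedFlatMap(input));  // [20, 40]
    }
}
```

Both produce the same result; the question in this thread is only whether
fusing them into one operator saves the (small) chaining overhead.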


Cheers

-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>


Re: Keep Model in Operator instance up to date

2015-08-21 Thread Welly Tambunan
Hi Gyula,

Thanks a lot. That's really help a lot !

Have a great vacation

Cheers

On Fri, Aug 21, 2015 at 7:47 PM, Gyula Fóra gyula.f...@gmail.com wrote:

 Hi

 You are right, if all operators need continuous updates than the most
 straightforward way is to push all the updates to the operators like you
 described.

 If the cached data is the same for all operators and is small enough you
 can centralize the updates in a dedicated operator and push the updated
 data to the operators every once in a while.

 Cheers
 Gyula



 On Thu, Aug 20, 2015 at 4:31 PM Welly Tambunan if05...@gmail.com wrote:

 Hi Gyula,

 I have a couple of operator on the pipeline. Filter, mapper, flatMap, and
 each of these operator contains some cache data.

 So i think that means for every other operator on the pipeline, i will
 need to add a new stream to update each cache data.


 Cheers

 On Thu, Aug 20, 2015 at 5:33 PM, Gyula Fóra gyula.f...@gmail.com wrote:

 Hi,

 I don't think I fully understand your question, could you please try to
 be a little more specific?

 I assume by caching you mean that you keep the current model as an
 operator state. Why would you need to add new streams in this case?

 I might be slow to answer as I am currently on vacation without stable
 internet connection.

 Cheers,
 Gyula

 On Thu, Aug 20, 2015 at 5:36 AM Welly Tambunan if05...@gmail.com
 wrote:

 Hi Gyula,

 I have another question. So if i cache something on the operator, to
 keep it up to date,  i will always need to add and connect another stream
 of changes to the operator ?

 Is this right for every case ?

 Cheers

 On Wed, Aug 19, 2015 at 3:21 PM, Welly Tambunan if05...@gmail.com
 wrote:

 Hi Gyula,

 That's really helpful. The docs is improving so much since the last
 time (0.9).

 Thanks a lot !

 Cheers

 On Wed, Aug 19, 2015 at 3:07 PM, Gyula Fóra gyula.f...@gmail.com
 wrote:

 Hey,

 If it is always better to check the events against a more up-to-date
 model (even if the events we are checking arrived before the update) then
 it is fine to keep the model outside of the system.

 In this case we need to make sure that we can push the updates to the
 external system consistently. If you are using the PersistentKafkaSource
 for instance it can happen that some messages are replayed in case of
 failure. In this case you need to make sure that you remove duplicate
 updates or have idempotent updates.

 You can read about the checkpoint mechanism in the Flink website:
 https://ci.apache.org/projects/flink/flink-docs-master/internals/stream_checkpointing.html

 Cheers,
 Gyula

 On Wed, Aug 19, 2015 at 9:56 AM Welly Tambunan if05...@gmail.com
 wrote:

 Thanks Gyula,

 Another question i have..

  ... while external model updates would be *tricky *to keep
 consistent.
 Is that still the case if the Operator treat the external model as
 read-only ? We create another stream that will update the external model
 separately.

 Could you please elaborate more about this one ?

 Cheers

 On Wed, Aug 19, 2015 at 2:52 PM, Gyula Fóra gyula.f...@gmail.com
 wrote:

 In that case I would apply a map to wrap in some common type, like
 an Either<T1,T2> before the union.

 And then in the coflatmap you can unwrap it.
 On Wed, Aug 19, 2015 at 9:50 AM Welly Tambunan if05...@gmail.com
 wrote:

 Hi Gyula,

 Thanks.

 However update1 and update2 have a different type. Based on my
 understanding, i don't think we can use union. How can we handle this 
 one ?

 We like to create our event strongly type to get the domain
 language captured.


 Cheers

 On Wed, Aug 19, 2015 at 2:37 PM, Gyula Fóra gyula.f...@gmail.com
 wrote:

 Hey,

 One input of your co-flatmap would be model updates and the other
 input would be events to check against the model if I understand 
 correctly.

 This means that if your model updates come from more than one
 stream you need to union them into a single stream before connecting 
 them
 with the event stream and applying the coatmap.

 DataStream updates1 = 
 DataStream updates2 = 
 DataStream events = 

 events.connect(updates1.union(updates2).broadcast()).flatMap(...)

 Does this answer your question?

 Gyula


 On Wednesday, August 19, 2015, Welly Tambunan if05...@gmail.com
 wrote:

 Hi Gyula,

 Thanks for your response.

 However the model can received multiple event for update. How
 can we do that with co-flatmap as i can see the connect API only 
 received
 single datastream ?


  ... while external model updates would be tricky to keep
 consistent.
 Is that still the case if the Operator treat the external model
 as read-only ? We create another stream that will update the 
 external model
 separately.

 Cheers

 On Wed, Aug 19, 2015 at 2:05 PM, Gyula Fóra gyf...@apache.org
 wrote:

 Hey!

 I think it is safe to say that the best approach in this case
 is creating a co-flatmap that will receive updates on one input. 
 The events
 should probably be broadcasted in this case so you can check in 
 parallel

Keep Model in Operator instance up to date

2015-08-19 Thread Welly Tambunan
Hi All,

We have a streaming computation that required to validate the data stream
against the model provided by the user.

Right now what I have done is to load the model into flink operator and
then validate against it. However the model can be updated and changed
frequently. Fortunately we always publish this event to RabbitMQ.

I think we can


   1. Create a RabbitMQ listener for the model-changed event inside the
   operator, then update the model when the event arrives.

   But I think this will create a race condition if not handled correctly,
   and it seems odd to keep this.

   2. Move the model into an external in-memory cache storage and keep it
   up to date using Flink. The operator will then retrieve the model from
   the cache.

   3. Create two streams and use a co-operator to manage the shared
   state.


What is your suggestion on keeping the state up to date from external
events? Is there some kind of best practice for maintaining an up-to-date
model in a streaming operator?
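To illustrate the race-condition concern in option 1: one way to make the listener-thread update safe is to publish an immutable model through an AtomicReference, so the operator always reads a consistent snapshot. This is a plain-Java sketch outside Flink, and the `Model`/`ModelHolder` types are hypothetical, not part of any API mentioned here:

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model type: a single validation threshold.
class Model {
    final double threshold;
    Model(double threshold) { this.threshold = threshold; }
}

class ModelHolder {
    // The listener thread swaps in a whole new immutable Model; the
    // operator thread always reads a consistent snapshot, so there is
    // no race from in-place mutation.
    private final AtomicReference<Model> current =
            new AtomicReference<>(new Model(0.0));

    // Called from the RabbitMQ listener thread on a model-changed event.
    void update(Model newModel) { current.set(newModel); }

    // Called from the operator for each record.
    boolean validate(double value) { return value >= current.get().threshold; }
}
```

The key design choice is that the model is replaced wholesale rather than mutated, which avoids locking on the per-record read path.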

Thanks a lot


Cheers

-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com http://www.triplelands.com/blog/


Re: Keep Model in Operator instance up to date

2015-08-19 Thread Welly Tambunan
Hi Gyula,

Thanks for your response.

However, the model can receive multiple events for updates. How can we do
that with a co-flatmap, as I can see the connect API only receives a single
datastream?


 ... while external model updates would be tricky to keep consistent.
Is that still the case if the operator treats the external model as
read-only? We create another stream that will update the external model
separately.

Cheers

On Wed, Aug 19, 2015 at 2:05 PM, Gyula Fóra gyf...@apache.org wrote:

 Hey!

 I think it is safe to say that the best approach in this case is creating
 a co-flatmap that will receive updates on one input. The events should
 probably be broadcasted in this case so you can check in parallel.

 This approach can be used effectively with Flink's checkpoint mechanism,
 while external model updates would be tricky to keep consistent.

 Cheers,
 Gyula




 On Wed, Aug 19, 2015 at 8:44 AM Welly Tambunan if05...@gmail.com wrote:

 Hi All,

 We have a streaming computation that needs to validate the data stream
 against the model provided by the user.

 Right now what I have done is to load the model into the Flink operator and
 then validate against it. However, the model can be updated and changed
 frequently. Fortunately, we always publish this event to RabbitMQ.

 I think we can


1. Create a RabbitMQ listener for the model-changed event inside the
operator, then update the model when the event arrives.

But I think this will create a race condition if not handled correctly,
and it seems odd to keep this.

2. Move the model into an external in-memory cache storage and keep it
up to date using Flink. The operator will then retrieve the model from
the cache.

3. Create two streams and use a co-operator to manage the shared
state.


 What is your suggestion on keeping the state up to date from external
 events? Is there some kind of best practice for maintaining an up-to-date
 model in a streaming operator?

 Thanks a lot


 Cheers


 --
 Welly Tambunan
 Triplelands

 http://weltam.wordpress.com
 http://www.triplelands.com http://www.triplelands.com/blog/




-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com http://www.triplelands.com/blog/


Re: Keep Model in Operator instance up to date

2015-08-19 Thread Welly Tambunan
Hi Gyula,

That's really helpful. The docs have improved so much since the last time
(0.9).

Thanks a lot !

Cheers

On Wed, Aug 19, 2015 at 3:07 PM, Gyula Fóra gyula.f...@gmail.com wrote:

 Hey,

 If it is always better to check the events against a more up-to-date model
 (even if the events we are checking arrived before the update) then it is
 fine to keep the model outside of the system.

 In this case we need to make sure that we can push the updates to the
 external system consistently. If you are using the PersistentKafkaSource
 for instance it can happen that some messages are replayed in case of
 failure. In this case you need to make sure that you remove duplicate
 updates or have idempotent updates.

 You can read about the checkpoint mechanism in the Flink website:
 https://ci.apache.org/projects/flink/flink-docs-master/internals/stream_checkpointing.html
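 The "idempotent updates" idea above can be sketched in plain Java. This
 assumes each update carries a monotonically increasing version number, an
 assumption not stated in the thread; with it, a replayed Kafka message
 simply becomes a no-op:

 ```java
 import java.util.HashMap;
 import java.util.Map;

 // Deduplicates replayed updates by remembering the last version
 // applied per key; older or equal versions are ignored.
 class IdempotentUpdater {
     private final Map<String, Long> lastApplied = new HashMap<>();
     private final Map<String, String> store = new HashMap<>();

     // Returns true if the update was applied, false if it was a
     // replayed duplicate (or out of date).
     boolean apply(String key, long version, String value) {
         Long seen = lastApplied.get(key);
         if (seen != null && version <= seen) {
             return false; // replay after failure: ignore
         }
         lastApplied.put(key, version);
         store.put(key, value);
         return true;
     }

     String get(String key) { return store.get(key); }
 }
 ```

 With this shape, the external store stays consistent even when the source
 replays messages after a failure, because applying an update twice changes
 nothing.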

 Cheers,
 Gyula

 On Wed, Aug 19, 2015 at 9:56 AM Welly Tambunan if05...@gmail.com wrote:

 Thanks Gyula,

 Another question i have..

  ... while external model updates would be *tricky* to keep consistent.
 Is that still the case if the operator treats the external model as
 read-only? We create another stream that will update the external model
 separately.

 Could you please elaborate more about this one ?

 Cheers

 On Wed, Aug 19, 2015 at 2:52 PM, Gyula Fóra gyula.f...@gmail.com wrote:

 In that case I would apply a map to wrap them in some common type, like
 an Either<T1, T2>, before the union.

 And then in the co-flatmap you can unwrap it.
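 A minimal sketch of that wrapper in plain Java (this Either type is
 written here for illustration; it is not the Flink API, though later Flink
 versions ship a similar class):

 ```java
 // Minimal Either wrapper so two differently-typed update streams can be
 // mapped to one common type before union(). Null payloads are not
 // supported in this sketch, since null is used to mark the empty side.
 class Either<L, R> {
     private final L left;
     private final R right;

     private Either(L left, R right) { this.left = left; this.right = right; }

     static <L, R> Either<L, R> left(L value)  { return new Either<>(value, null); }
     static <L, R> Either<L, R> right(R value) { return new Either<>(null, value); }

     boolean isLeft() { return left != null; }
     L getLeft()  { return left; }
     R getRight() { return right; }
 }
 ```

 In the co-flatmap you check isLeft() to decide which update type arrived;
 on the stream side the idea would be to map each update stream into this
 wrapper before the union (sketched, not verified against a specific Flink
 version).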
 On Wed, Aug 19, 2015 at 9:50 AM Welly Tambunan if05...@gmail.com
 wrote:

 Hi Gyula,

 Thanks.

 However, updates1 and updates2 have different types. Based on my
 understanding, I don't think we can use union. How can we handle this one?

 We like to create our events strongly typed so the domain language is
 captured.


 Cheers

 On Wed, Aug 19, 2015 at 2:37 PM, Gyula Fóra gyula.f...@gmail.com
 wrote:

 Hey,

 One input of your co-flatmap would be model updates and the other
 input would be events to check against the model if I understand 
 correctly.

  This means that if your model updates come from more than one stream,
  you need to union them into a single stream before connecting them with
  the event stream and applying the co-flatmap.

 DataStream updates1 = 
 DataStream updates2 = 
 DataStream events = 

 events.connect(updates1.union(updates2).broadcast()).flatMap(...)

 Does this answer your question?

 Gyula


 On Wednesday, August 19, 2015, Welly Tambunan if05...@gmail.com
 wrote:

 Hi Gyula,

 Thanks for your response.

  However, the model can receive multiple events for updates. How can we
  do that with a co-flatmap, as I can see the connect API only receives a
  single datastream?


   ... while external model updates would be tricky to keep
  consistent.
  Is that still the case if the operator treats the external model as
  read-only? We create another stream that will update the external model
  separately.

 Cheers

 On Wed, Aug 19, 2015 at 2:05 PM, Gyula Fóra gyf...@apache.org
 wrote:

 Hey!

 I think it is safe to say that the best approach in this case is
 creating a co-flatmap that will receive updates on one input. The events
 should probably be broadcasted in this case so you can check in 
 parallel.

 This approach can be used effectively with Flink's checkpoint
 mechanism, while external model updates would be tricky to keep 
 consistent.

 Cheers,
 Gyula




 On Wed, Aug 19, 2015 at 8:44 AM Welly Tambunan if05...@gmail.com
 wrote:

 Hi All,

  We have a streaming computation that needs to validate the data
  stream against the model provided by the user.

  Right now what I have done is to load the model into the Flink operator
  and then validate against it. However, the model can be updated and
  changed frequently. Fortunately, we always publish this event to
  RabbitMQ.

 I think we can


1. Create a RabbitMQ listener for the model-changed event inside
the operator, then update the model when the event arrives.

But I think this will create a race condition if not handled
correctly, and it seems odd to keep this.

2. Move the model into an external in-memory cache storage and keep
it up to date using Flink. The operator will then retrieve the model
from the cache.

3. Create two streams and use a co-operator to manage the shared
state.


  What is your suggestion on keeping the state up to date from
  external events? Is there some kind of best practice for maintaining an
  up-to-date model in a streaming operator?

 Thanks a lot


 Cheers


 --
 Welly Tambunan
 Triplelands

 http://weltam.wordpress.com
 http://www.triplelands.com http://www.triplelands.com/blog/




 --
 Welly Tambunan
 Triplelands

 http://weltam.wordpress.com
 http://www.triplelands.com http://www.triplelands.com/blog/




 --
 Welly Tambunan
 Triplelands

 http://weltam.wordpress.com
 http://www.triplelands.com http://www.triplelands.com/blog/




 --
 Welly Tambunan

Re: Open method is not called with custom implementation RichWindowMapFunction

2015-07-03 Thread Welly Tambunan
Thanks Chiwan,


Glad to hear that.


Cheers

On Fri, Jul 3, 2015 at 3:24 PM, Chiwan Park chiwanp...@apache.org wrote:

 Hi tambunanw,

 The issue is already known and we’ll patch it soon. [1]
 In the next release (maybe 0.9.1), the problem will be solved.

 Regards,
 Chiwan Park

 [1] https://issues.apache.org/jira/browse/FLINK-2257
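 For context, the contract the user expects can be sketched outside Flink
 with hypothetical minimal interfaces (these are not Flink's actual
 classes): the runtime should call open() exactly once before any element
 is processed, which is what was being skipped here:

 ```java
 import java.util.ArrayList;
 import java.util.List;

 // Hypothetical rich-function contract: open() once, then map() per element.
 interface RichMapper<I, O> {
     void open();     // lifecycle hook, called once before any element
     O map(I value);  // called for every element
 }

 class LifecycleHarness {
     // Mimics the expected runtime behavior.
     static <I, O> List<O> run(RichMapper<I, O> f, List<I> in) {
         f.open(); // exactly once, before processing starts
         List<O> out = new ArrayList<>();
         for (I v : in) {
             out.add(f.map(v));
         }
         return out;
     }
 }
 ```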

  On Jul 3, 2015, at 4:57 PM, tambunanw if05...@gmail.com wrote:
 
  Hi All,
 
  I'm trying to run some experiments with a rich windowing function and
  operator state. I modified the streaming stock prices example from
 
 
 https://github.com/mbalassi/flink/blob/stockprices/flink-staging/flink-streaming/flink-streaming-examples/src/main/scala/org/apache/flink/streaming/scala/examples/windowing/StockPrices.scala
 
  I create the simple windowing function like below
 
  class MyWindowFunction extends RichWindowMapFunction[StockPricex,
  StockPricex] {
    println("created")
    private var counter = 0

    override def open(conf: Configuration): Unit = {
      println("opened")
    }

    override def mapWindow(values: Iterable[StockPricex], out:
  Collector[StockPricex]): Unit = {
      // if not initialized ..
      println(counter)
      println(values)
      counter = counter + 1
    }
  }
 
  However, the open() method is not invoked when I try to run this code in
  my local environment
 
  spx
    .groupBy(x => x.symbol)
    .window(Time.of(5, TimeUnit.SECONDS)).every(Time.of(1,
  TimeUnit.SECONDS))
    .mapWindow(new MyWindowFunction)
 
  Any thought on this one ?
 
 
  Cheers
 
 
 
  --
  View this message in context:
 http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Open-method-is-not-called-with-custom-implementation-RichWindowMapFunction-tp1924.html
  Sent from the Apache Flink User Mailing List archive. mailing list
 archive at Nabble.com.








-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com http://www.triplelands.com/blog/


Re: Data Source from Cassandra

2015-07-02 Thread Welly Tambunan
Hi Stephan,

Thanks a lot !

I will give it a look.


Cheers

On Thu, Jul 2, 2015 at 6:05 PM, Stephan Ewen se...@apache.org wrote:

 Hi!

 If there is a CassandraSource for Hadoop, you can also use that with the
 HadoopInputFormatWrapper.

 If you want to implement a Flink-specific source, extending InputFormat is
 the right thing to do. A user has started to implement a cassandra sink in
 this fork (may be able to reuse some code or testing infrastructure):
 https://github.com/rzvoncek/flink/tree/zvo/cassandraSink

 Greetings,
 Stephan





 On Thu, Jul 2, 2015 at 11:34 AM, tambunanw if05...@gmail.com wrote:

 Hi All,

 I want to know if there's a custom data source available for Cassandra?

 From my observation, it seems that we need to implement one by extending
 InputFormat. Is there any guide on how to do this robustly?


 Cheers



 --
 View this message in context:
 http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Data-Source-from-Cassandra-tp1908.html
 Sent from the Apache Flink User Mailing List archive. mailing list
 archive at Nabble.com.





-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com http://www.triplelands.com/blog/


Re: Batch Processing as Streaming

2015-07-02 Thread Welly Tambunan
Thanks Stephan,

That's clear !

Cheers

On Thu, Jul 2, 2015 at 6:13 PM, Stephan Ewen se...@apache.org wrote:

 Hi!

 I am actually working to get some more docs out there, there is a lack
 right now, agreed.

 Concerning your questions:

 (1) Batch programs basically recover from the data sources right now.
 Checkpointing as in the streaming case does not happen for batch programs.
 We have branches that materialize the intermediate streams and apply
 backtracking logic for batch programs, but they are not merged into the
 master at this point.

 (2) Streaming operators and user functions are long lived. They are
 started once and live to the end of the stream, or the machine failure.

 Greetings,
 Stephan


 On Thu, Jul 2, 2015 at 11:48 AM, tambunanw if05...@gmail.com wrote:

 Hi All,

 I see that the way batch processing works in Flink is quite different from
 Spark's. It's all about using the streaming engine in Flink.

 I have a couple of question

 1. Is there any support for checkpointing in batch processing as well, or
 is that only for streaming?

 2. I want to ask about the operator lifecycle: is it short-lived or
 long-lived? Any docs where I can read more about this?


 Cheers



 --
 View this message in context:
 http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Batch-Processing-as-Streaming-tp1909.html
 Sent from the Apache Flink User Mailing List archive. mailing list
 archive at Nabble.com.





-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com http://www.triplelands.com/blog/