Java library for Flink-Kudu integration

2017-03-27 Thread ruben.casado.tejedor
Hi all,

I apologize for sending the email to both lists, but I am not sure where this
topic fits better.

In my team, we have been working on some PoCs and PoVs about new data
architectures. As part of this work, we have implemented a library to connect
Kudu and Flink. The library allows reading from and writing to Kudu tablets
using the DataSet API, and also writing to Kudu using the DataStream API.

You can find the code and documentation (including some examples) in 
https://github.com/rubencasado/Flink-Kudu
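
For a first impression of the API, a minimal DataSet read sketch could look
like the following. KuduInputFormat is the class referenced in the repository
above; its constructor arguments and the RowSerializable record type are
assumptions based on the repository layout, not verified signatures.

import es.accenture.flink.Sources.KuduInputFormat;
import es.accenture.flink.Utils.RowSerializable; // assumed location of the record type
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class KuduReadSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical constructor arguments: table name and Kudu master address.
        KuduInputFormat input = new KuduInputFormat("my_table", "kudu-master:7051");

        // Read the Kudu table as a DataSet and print it.
        DataSet<RowSerializable> rows = env.createInput(input);
        rows.print();
    }
}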

Any comment/suggestion/contribution is very welcome ☺

We will try to publish this contribution to the Apache Bahir project.

Best


Rubén Casado Tejedor, PhD
accenture digital
Big Data Manager
Tel: +34 629 009 429
Email: ruben.casado.teje...@accenture.com





Re: Need guidance for write a client connector for 'Flink'

2017-01-19 Thread ruben.casado.tejedor
Hi,

Just in case it could be useful: we are working on a Flink-Kudu integration
[1]. It is still a work in progress, but we had to implement an InputFormat to
read from Kudu tables, so maybe the code is useful for you [2].

Best

[1] https://github.com/rubencasado/Flink-Kudu
[2] 
https://github.com/rubencasado/Flink-Kudu/blob/master/src/main/java/es/accenture/flink/Sources/KuduInputFormat.java
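
As general background, the skeleton below shows the two-phase lifecycle of a
Flink InputFormat that Fabian describes in the quoted thread further down:
createInputSplits() partitions the input once, and each parallel instance is
then opened with one split at a time. This is a generic sketch against the
flink-core interface, not the actual KuduInputFormat code.

import java.io.IOException;
import org.apache.flink.api.common.io.DefaultInputSplitAssigner;
import org.apache.flink.api.common.io.InputFormat;
import org.apache.flink.api.common.io.statistics.BaseStatistics;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.core.io.GenericInputSplit;
import org.apache.flink.core.io.InputSplitAssigner;

public class SketchInputFormat implements InputFormat<String, GenericInputSplit> {

    private int currentRecord;
    private int recordsInSplit;

    @Override
    public void configure(Configuration parameters) {
        // Read user-supplied parameters (e.g. connection settings) here.
    }

    @Override
    public BaseStatistics getStatistics(BaseStatistics cachedStatistics) {
        return cachedStatistics; // No statistics available for this source.
    }

    @Override
    public GenericInputSplit[] createInputSplits(int minNumSplits) {
        // Phase 1: partition the input, here simply one split per parallel instance.
        GenericInputSplit[] splits = new GenericInputSplit[minNumSplits];
        for (int i = 0; i < minNumSplits; i++) {
            splits[i] = new GenericInputSplit(i, minNumSplits);
        }
        return splits;
    }

    @Override
    public InputSplitAssigner getInputSplitAssigner(GenericInputSplit[] splits) {
        return new DefaultInputSplitAssigner(splits);
    }

    @Override
    public void open(GenericInputSplit split) throws IOException {
        // Phase 2: open the underlying resource (scanner, connection) for one split.
        currentRecord = 0;
        recordsInSplit = 10; // placeholder record count
    }

    @Override
    public boolean reachedEnd() {
        return currentRecord >= recordsInSplit;
    }

    @Override
    public String nextRecord(String reuse) {
        return "record-" + currentRecord++;
    }

    @Override
    public void close() {
        // Release whatever open() acquired.
    }
}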


On 19/1/17 6:03, "Pawan Manishka Gunarathna" wrote:

Hi,
When we are implementing that InputFormat interface, if our data analytics
server APIs already cover the input-split part, can we go directly to the
second phase that you described earlier?

Since our data source has a database-table architecture, I am thinking of
following the 'JDBCInputFormat' in Flink. Can you provide some information
about how that JDBCInputFormat execution happens?

Thanks,
Pawan
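
For reference, the JDBCInputFormat mentioned above is typically built and
wired in roughly like this. The sketch assumes the Flink 1.2-era flink-jdbc
builder API; the driver, URL, and query are placeholders.

import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.io.jdbc.JDBCInputFormat;
import org.apache.flink.api.java.typeutils.RowTypeInfo;
import org.apache.flink.types.Row;

public class JdbcReadSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Expected schema of the query result: (id INT, name VARCHAR).
        RowTypeInfo rowType = new RowTypeInfo(
                BasicTypeInfo.INT_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO);

        JDBCInputFormat jdbcInput = JDBCInputFormat.buildJDBCInputFormat()
                .setDrivername("org.apache.derby.jdbc.EmbeddedDriver")
                .setDBUrl("jdbc:derby:memory:example")
                .setQuery("SELECT id, name FROM books")
                .setRowTypeInfo(rowType)
                .finish();

        // Execute the query as a DataSet source and print the result.
        DataSet<Row> rows = env.createInput(jdbcInput, rowType);
        rows.print();
    }
}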

On Mon, Jan 16, 2017 at 3:37 PM, Pawan Manishka Gunarathna <
pawan.manis...@gmail.com> wrote:

> Hi Fabian,
> Thanks for providing those information.
>
> On Mon, Jan 16, 2017 at 2:36 PM, Fabian Hueske  wrote:
>
>> Hi Pawan,
>>
>> this sounds like you need to implement a custom InputFormat [1].
>> An InputFormat is basically executed in two phases. In the first phase it
>> generates InputSplits. An InputSplit references a chunk of data that
>> needs to be read. Hence, InputSplits define how the input data is split
>> so that it can be read in parallel. In the second phase, multiple
>> instances of the InputFormat are started, and they request InputSplits
>> from an InputSplitProvider. Each instance of the InputFormat processes
>> one InputSplit at a time.
>>
>> It is hard to give general advice on implementing InputFormats because
>> this very much depends on the data source and data format to read from.
>>
>> I'd suggest having a look at other InputFormats.
>>
>> Best, Fabian
>>
>> [1]
>> https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/common/io/InputFormat.java
>>
>>
>> 2017-01-16 6:18 GMT+01:00 Pawan Manishka Gunarathna <
>> pawan.manis...@gmail.com>:
>>
>> > Hi,
>> >
>> > we have a data analytics server that has analytics data tables. So I
>> > need to write a custom *Java* implementation to read data from that
>> > data source and do (*batch*) processing using Apache Flink. Basically
>> > it's like a new client connector for Flink.
>> >
>> > So it would be great if you could provide guidance for my requirement.
>> >
>> > Thanks,
>> > Pawan
>> >
>>
>
>
>
> --
>
> *Pawan Gunaratne*
> *Mob: +94 770373556*
>



--

*Pawan Gunaratne*
*Mob: +94 770373556*







Re: Flink CEP development is stalling

2017-01-12 Thread ruben.casado.tejedor
+1

I have some clients interested in CEP features.

On 11/1/17 16:23, "Ivan Mushketyk" wrote:

Hi Flink devs,

Roughly half a year ago I implemented several PRs for Flink CEP [1][2][3],
but since then there has been no progress with the reviews. What is
frustrating about this situation is that Flink customers are asking for the
features in these PRs. For example, customers commented on [1] and [2] asking
for these features. During a presentation about CEP at Flink Forward 2016,
somebody asked [4] for a feature that is implemented in [1]. Another CEP
feature, requested in this Stack Overflow post [5], was implemented by PR [2].

I also started conversations regarding the following JIRA issues:

https://issues.apache.org/jira/browse/FLINK-4641

https://issues.apache.org/jira/browse/FLINK-3414

https://issues.apache.org/jira/browse/FLINK-3320 (wrote to Till about this one)

and I would like to work on them, but it seems pointless if nobody is going
to review new PRs.

I wrote to Till (who is the only Flink CEP reviewer at the moment), but it
seems that he is very busy and cannot help with these PRs. On the other
hand, Flink CEP has got some attention and customers are asking for new
features.

Is there any way for the community to make progress with Flink CEP?
Are there other core committers who can review Flink CEP PRs?

Best regards,
Ivan.



[1] - https://github.com/apache/flink/pull/2361
[2] - https://github.com/apache/flink/pull/2367
[3] - https://github.com/apache/flink/pull/2396
[4] - https://youtu.be/vws5bv3XdD8?t=35m26s
[5] - http://stackoverflow.com/questions/38225286/ho-can-i-do-a-lazy-match-with-flink-cep







RE: Apache Flink and Kudu integration

2016-11-09 Thread ruben.casado.tejedor
Hi,

I am starting a PoC to do that. I will try to develop both a source and a
sink. I will let you know as soon as I have something ;-)

Best

-Original Message-
From: Márton Balassi [mailto:balassi.mar...@gmail.com]
Sent: Friday, 28 October 2016 8:50
To: dev@flink.apache.org
Subject: Re: Apache Flink and Kudu integration

Hi Ruben,

I am currently not aware of such an effort, but I definitely do agree that it
is an interesting pattern to investigate. As motivation, you could have a look
at the Spark connector implementations to see the Kudu APIs in use.
For that I would recommend the DataSource API implementation that is now part
of Spark [1], or Ted Malaska's prototype [2], which is a bit less complex and
thus might be easier to read.

Let us know if you decide to give the implementation a try.

[1]
https://kudu.apache.org/docs/developing.html#_kudu_integration_with_spark
[2]
https://github.com/tmalaska/SparkOnKudu

Best,

Marton
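
For orientation, the core Kudu Java client write path that any Flink sink
would have to wrap looks roughly like this. The sketch is against the Kudu
1.0 Java client; the master address, table, and column names are placeholders.

import org.apache.kudu.client.Insert;
import org.apache.kudu.client.KuduClient;
import org.apache.kudu.client.KuduException;
import org.apache.kudu.client.KuduSession;
import org.apache.kudu.client.KuduTable;

public class KuduWriteSketch {
    public static void main(String[] args) throws KuduException {
        KuduClient client =
                new KuduClient.KuduClientBuilder("kudu-master:7051").build();
        try {
            KuduTable table = client.openTable("metrics");
            KuduSession session = client.newSession();

            // Build one insert row; a Flink sink would do this per record.
            Insert insert = table.newInsert();
            insert.getRow().addInt("id", 1);
            insert.getRow().addString("value", "hello");
            session.apply(insert);

            session.close(); // Flushes pending operations.
        } finally {
            client.shutdown();
        }
    }
}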

On Fri, Oct 28, 2016 at 8:33 AM,  wrote:

> Hi all,
>
> Is there any PoC about reading/writing from/to Kudu? I think the
> kafka-flink-kudu flow is an interesting pattern. I would like to evaluate
> it, so please let me know if there is any existing attempt, to avoid
> starting from scratch. Advice is welcome :)
>
> Best
>
>
> 
> Rubén Casado Tejedor, PhD
> accenture digital
> Big Data Manager
> Tel: +34 629 009 429
> Email: ruben.casado.teje...@accenture.com


Apache Flink and Kudu integration

2016-10-28 Thread ruben.casado.tejedor
Hi all,

Is there any PoC about reading/writing from/to Kudu? I think the
kafka-flink-kudu flow is an interesting pattern. I would like to evaluate it,
so please let me know if there is any existing attempt, to avoid starting from
scratch. Advice is welcome :)
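
To make the kafka-flink-kudu flow concrete, the pipeline I have in mind would
look roughly like the sketch below. FlinkKafkaConsumer09 is the Flink 1.1-era
Kafka connector; KuduSink is a hypothetical sink function that does not exist
yet and would wrap the Kudu client.

import java.util.Properties;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class KafkaToKuduSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "kudu-ingest");

        // Source: consume a Kafka topic as a stream of strings.
        DataStream<String> events = env.addSource(
                new FlinkKafkaConsumer09<>("events", new SimpleStringSchema(), props));

        // Sink: hypothetical KuduSink mapping each record to a Kudu insert/upsert.
        events.addSink(new KuduSink("kudu-master:7051", "events_table"));

        env.execute("kafka-flink-kudu");
    }
}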

Best



Rubén Casado Tejedor, PhD
accenture digital
Big Data Manager
Tel: +34 629 009 429
Email: ruben.casado.teje...@accenture.com






RE: [DISCUSS] Drop Hadoop 1 support with Flink 1.2

2016-10-13 Thread ruben.casado.tejedor
I totally agree with Robert. From an industry point of view, we are not using
Hadoop 1.x with any client. Even in legacy systems, we have already upgraded
the software.

From: Robert Metzger [mailto:rmetz...@apache.org]
Sent: Thursday, 13 October 2016 16:48
To: dev@flink.apache.org; u...@flink.apache.org
Subject: [DISCUSS] Drop Hadoop 1 support with Flink 1.2

Hi,

The Apache Hadoop community has recently released the first alpha version of
Hadoop 3.0.0, while we are still supporting Hadoop 1. I think it's time to
finally drop Hadoop 1 support in Flink.

The last minor Hadoop 1 release was on 27 June 2014.
Apache Spark dropped Hadoop 1 support with their 2.0 release in July 2016.
Hadoop 2.2 was first released in October 2013, so there has been enough time
for users to upgrade.

I have also added the user@ list to the discussion to get opinions about this
from there as well.

Let me know what you think about this!


Regards,
Robert


