Aggregation based on Timestamp

2017-08-10 Thread Madhukar Thota
Hi

We have a use case where thousands of Telegraf agents send data to Kafka
(some at a 10s interval, some at 15s, and some at 30s). We would like to
aggregate the incoming data to a 1-minute interval, keyed by hostname,
before we write it into InfluxDB. Is it possible to implement this type of
use case with Flink? If so, is there any sample to get started with?

Sample data (InfluxDB line protocol) coming from Kafka:

weather,location=us-midwest,season=summer temperature=82 1465839830100400200


-Madhu
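
For reference, a minimal sketch of such a job with the DataStream API (Flink 1.3-era
APIs; use whichever FlinkKafkaConsumer version matches your Kafka setup). The topic
name, Kafka properties and the naive parseLine() helper are placeholders, the job sums
values per host over 1-minute event-time windows, and the print() sink stands in for an
InfluxDB sink:

import java.util.Properties;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class TelegrafOneMinuteAggregation {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        Properties kafkaProps = new Properties();
        kafkaProps.setProperty("bootstrap.servers", "localhost:9092");
        kafkaProps.setProperty("group.id", "telegraf-aggregator");

        env.addSource(new FlinkKafkaConsumer010<>("telegraf", new SimpleStringSchema(), kafkaProps))
           // parse each raw line into (host, value, eventTimeMillis)
           .map(new MapFunction<String, Tuple3<String, Double, Long>>() {
               @Override
               public Tuple3<String, Double, Long> map(String line) {
                   return parseLine(line);
               }
           })
           // agents report at 10s/15s/30s intervals, so allow some out-of-orderness
           .assignTimestampsAndWatermarks(
               new BoundedOutOfOrdernessTimestampExtractor<Tuple3<String, Double, Long>>(Time.seconds(30)) {
                   @Override
                   public long extractTimestamp(Tuple3<String, Double, Long> t) {
                       return t.f2;
                   }
               })
           .keyBy(0)                                   // key by hostname
           .timeWindow(Time.minutes(1))                // 1-minute tumbling event-time windows
           .reduce(new ReduceFunction<Tuple3<String, Double, Long>>() {
               @Override
               public Tuple3<String, Double, Long> reduce(Tuple3<String, Double, Long> a,
                                                          Tuple3<String, Double, Long> b) {
                   // sums the values per host and minute; use an AggregateFunction /
                   // WindowFunction instead if you need averages or min/max
                   return new Tuple3<>(a.f0, a.f1 + b.f1, Math.max(a.f2, b.f2));
               }
           })
           .print();                                   // replace with an InfluxDB sink

        env.execute("Telegraf 1-minute aggregation");
    }

    // Very naive line-protocol parsing, for illustration only: assumes a single numeric
    // field and a "host" tag, and converts the nanosecond timestamp to milliseconds.
    private static Tuple3<String, Double, Long> parseLine(String line) {
        String[] parts = line.split(" ");
        String host = "unknown";
        for (String tag : parts[0].split(",")) {
            if (tag.startsWith("host=")) {
                host = tag.substring("host=".length());
            }
        }
        double value = Double.parseDouble(parts[1].split("=")[1]);
        long eventTimeMillis = Long.parseLong(parts[2]) / 1_000_000L;
        return new Tuple3<>(host, value, eventTimeMillis);
    }
}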


Flink streaming Python

2017-06-07 Thread Madhukar Thota
Hi

I asked the same question back in Jan 2016, and I am checking again with the
community to see if there is any update or plan for supporting Flink
streaming in Python.


Amazon Athena

2017-05-30 Thread Madhukar Thota
Has anyone used Amazon Athena with Apache Flink?

I have a use case where I want to write streaming data (which is in Avro
format) from Kafka to S3, converting it into Parquet format, and then update
the S3 location with daily partitions on an Athena table.

Any guidance is appreciated.
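
Not an Athena answer, but the Flink side of this (Kafka -> daily "directories" of
Parquet files on S3) can be sketched with the bulk-format StreamingFileSink available
in newer Flink releases (1.6+), via flink-parquet. WeatherRecord below stands in for an
Avro-generated SpecificRecord class, the bucket path is made up, and the daily Athena
partitions still have to be registered separately (e.g. MSCK REPAIR TABLE or ALTER
TABLE ADD PARTITION):

import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.avro.ParquetAvroWriters;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.DateTimeBucketAssigner;

public class ParquetS3Sink {

    // Attach the returned sink to the Avro stream coming from the Kafka source:
    //   avroStream.addSink(buildSink());
    // Note that the StreamingFileSink only finalizes files on checkpoints, so
    // checkpointing must be enabled for the Parquet files to become visible.
    public static StreamingFileSink<WeatherRecord> buildSink() {
        return StreamingFileSink
                .forBulkFormat(
                        new Path("s3://my-bucket/weather"),
                        ParquetAvroWriters.forSpecificRecord(WeatherRecord.class))
                // one directory per day, e.g. s3://my-bucket/weather/2017-05-30/,
                // which maps naturally onto a daily Athena partition
                .withBucketAssigner(new DateTimeBucketAssigner<WeatherRecord>("yyyy-MM-dd"))
                .build();
    }
}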


HTTP listener source

2017-05-30 Thread Madhukar Thota
Hi

Has anyone implemented an HTTP listener as a Flink source, i.e. one that
acts as a REST API receiving a JSON payload via the POST method and writes
it to Kafka, Kinesis, or any other sink?

Any guidance or sample snippet will be appreciated.
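
In case it helps, here is a rough sketch of one way to do it with nothing but the JDK's
built-in HttpServer wrapped in a SourceFunction. The endpoint path and port are made up,
there is no checkpointing or back-pressure handling, and the JSON body is simply
forwarded as a String:

import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

import org.apache.flink.streaming.api.functions.source.SourceFunction;

import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;

// Sketch of a source that embeds a tiny HTTP endpoint and emits every POSTed
// body as one String record. Parallelism 1, unbounded queue, no fault tolerance.
public class HttpListenerSource implements SourceFunction<String> {

    private final int port;
    private volatile boolean running = true;
    private transient HttpServer server;

    public HttpListenerSource(int port) {
        this.port = port;
    }

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

        server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/ingest", (HttpExchange exchange) -> {
            if ("POST".equalsIgnoreCase(exchange.getRequestMethod())) {
                try (InputStream in = exchange.getRequestBody()) {
                    // readAllBytes() needs Java 9+; copy the stream manually on older JDKs
                    queue.offer(new String(in.readAllBytes(), StandardCharsets.UTF_8));
                }
                byte[] ok = "accepted".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, ok.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(ok);
                }
            } else {
                exchange.sendResponseHeaders(405, -1);   // method not allowed, empty body
            }
            exchange.close();
        });
        server.start();

        while (running) {
            String payload = queue.poll(100, TimeUnit.MILLISECONDS);
            if (payload != null) {
                synchronized (ctx.getCheckpointLock()) {
                    ctx.collect(payload);
                }
            }
        }
        server.stop(0);
    }

    @Override
    public void cancel() {
        running = false;
    }
}

The stream can then be forwarded to Kafka with, for example,
env.addSource(new HttpListenerSource(8080)).addSink(new FlinkKafkaProducer010<>("my-topic",
new SimpleStringSchema(), kafkaProps));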


Re: Elasticsearch Http Connector

2016-10-25 Thread Madhukar Thota
Thanks Philipp,

I have also started looking at the Jest client. Did you use it with Flink?
Is it possible for you to share the project so that I can reuse it?


On Tue, Oct 25, 2016 at 11:54 AM, Philipp Bussche  wrote:

> Hi there,
> not me (which I guess is not what you wanted to hear), but I had to
> implement a custom Elasticsearch sink based on Jest to be able to sink
> data into ES on AWS.
> Works quite alright!
> Philipp
>
> https://github.com/searchbox-io/Jest/tree/master/jest
>
>
>
>


Elasticsearch Http Connector

2016-10-24 Thread Madhukar Thota
Friends

Is anyone using the new Elasticsearch RestClient
(https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/index.html)
with Flink?
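
For reference, a rough sketch of wrapping the low-level REST client (the 5.x-era
performRequest API) in a RichSinkFunction. Host, index and type names are placeholders,
and a real sink would batch documents through the _bulk endpoint instead of issuing one
request per element:

import java.util.Collections;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import org.apache.http.HttpHost;
import org.apache.http.entity.ContentType;
import org.apache.http.nio.entity.NStringEntity;
import org.elasticsearch.client.RestClient;

public class ElasticsearchRestSink extends RichSinkFunction<String> {

    private transient RestClient restClient;

    @Override
    public void open(Configuration parameters) throws Exception {
        // one low-level REST client per sink subtask
        restClient = RestClient.builder(new HttpHost("localhost", 9200, "http")).build();
    }

    @Override
    public void invoke(String jsonDocument) throws Exception {
        // one document per call; use the _bulk endpoint for real workloads
        restClient.performRequest(
                "POST",
                "/my-index/my-type",
                Collections.<String, String>emptyMap(),
                new NStringEntity(jsonDocument, ContentType.APPLICATION_JSON));
    }

    @Override
    public void close() throws Exception {
        if (restClient != null) {
            restClient.close();
        }
    }
}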


Upserts with Flink-elasticsearch

2016-03-28 Thread Madhukar Thota
Is it possible to do an upsert with the existing flink-elasticsearch
connector today?
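
For what it's worth, with the ElasticsearchSinkFunction-based connectors (package names
below follow the Flink 1.3+ elasticsearch-base layout) an upsert can be expressed by
handing the RequestIndexer an UpdateRequest with docAsUpsert enabled. A rough sketch,
where the index/type names, the Tuple2 element type (document id, value) and the field
name are all made up:

import java.util.HashMap;
import java.util.Map;

import org.apache.flink.api.common.functions.RuntimeContext;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkFunction;
import org.apache.flink.streaming.connectors.elasticsearch.RequestIndexer;
import org.elasticsearch.action.update.UpdateRequest;

// Element type: (document id, value).
public class UpsertSinkFunction implements ElasticsearchSinkFunction<Tuple2<String, Double>> {

    @Override
    public void process(Tuple2<String, Double> element, RuntimeContext ctx, RequestIndexer indexer) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("value", element.f1);

        // docAsUpsert(true): insert the document if the id does not exist yet,
        // otherwise merge the given fields into the existing document.
        UpdateRequest request = new UpdateRequest("my-index", "my-type", element.f0)
                .doc(doc)
                .docAsUpsert(true);

        indexer.add(request);
    }
}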


protobuf messages from Kafka to elasticsearch using flink

2016-03-08 Thread Madhukar Thota
Friends,

Can someone guide me, or share an example of how to consume protobuf
messages from Kafka and index them into Elasticsearch using Flink?
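
A rough sketch of the Kafka-consuming side, assuming a protoc-generated class
(SensorReading and its .proto file are hypothetical here) and the DeserializationSchema
interface from flink-streaming:

import java.io.IOException;

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.typeutils.TypeExtractor;
import org.apache.flink.streaming.util.serialization.DeserializationSchema;

// SensorReading is the class generated by protoc from a hypothetical sensor_reading.proto.
public class ProtobufDeserializationSchema implements DeserializationSchema<SensorReading> {

    @Override
    public SensorReading deserialize(byte[] message) throws IOException {
        // generated protobuf classes provide a static parseFrom(byte[]) method
        return SensorReading.parseFrom(message);
    }

    @Override
    public boolean isEndOfStream(SensorReading nextElement) {
        return false;
    }

    @Override
    public TypeInformation<SensorReading> getProducedType() {
        return TypeExtractor.getForClass(SensorReading.class);
    }
}

The resulting DataStream<SensorReading> can then be handed to the Elasticsearch sink as
usual. Note that protobuf-generated classes are not POJOs from Flink's point of view,
so Flink will usually fall back to Kryo for them; registering a protobuf-aware Kryo
serializer can be worthwhile.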


Re: flink kafka scala error

2016-01-06 Thread Madhukar Thota
I solved the problem by changing the Scala version of the Kafka library,
since I had downloaded the Scala 2.11 build of Flink
(flink-0.10.1-bin-hadoop27-scala_2.11.tgz
<http://www.carfab.com/apachesoftware/flink/flink-0.10.1/flink-0.10.1-bin-hadoop27-scala_2.11.tgz>).

Before:

<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.10</artifactId>
    <version>0.8.2.2</version>
</dependency>

After:

<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.11</artifactId>
    <version>0.8.2.2</version>
</dependency>



On Wed, Jan 6, 2016 at 4:13 AM, Till Rohrmann <trohrm...@apache.org> wrote:

> Hi Madhukar,
>
> could you check whether your Flink installation contains the
> flink-dist-0.10.1.jar in the lib folder? This file contains the necessary
> scala-library.jar which you are missing. You can also remove the line
> <exclude>org.scala-lang:scala-library</exclude>, which excludes the
> scala-library dependency from being included in the fat jar of your job.
>
> Cheers,
> Till
>
>
> On Wed, Jan 6, 2016 at 5:54 AM, Madhukar Thota <madhukar.th...@gmail.com>
> wrote:
>
>> Hi
>>
>> I am seeing the following error when I try to run the jar in a Flink
>> cluster. I am not sure which dependency is missing.
>>
>>  /opt/DataHUB/flink-0.10.1/bin/flink  run datahub-heka-1.0-SNAPSHOT.jar
>> flink.properties
>> java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
>> at kafka.utils.Pool.<init>(Pool.scala:28)
>> at kafka.consumer.FetchRequestAndResponseStatsRegistry$.<init>(FetchRequestAndResponseStats.scala:60)
>> at kafka.consumer.FetchRequestAndResponseStatsRegistry$.<clinit>(FetchRequestAndResponseStats.scala)
>> at kafka.consumer.SimpleConsumer.<init>(SimpleConsumer.scala:39)
>> at kafka.javaapi.consumer.SimpleConsumer.<init>(SimpleConsumer.scala:34)
>> at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer.getPartitionsForTopic(FlinkKafkaConsumer.java:691)
>> at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer.<init>(FlinkKafkaConsumer.java:281)
>> at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer082.<init>(FlinkKafkaConsumer082.java:49)
>> at com.lmig.datahub.heka.Main.main(Main.java:39)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:497)
>> at
>> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:497)
>> at
>> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:395)
>> at
>> org.apache.flink.client.program.Client.runBlocking(Client.java:252)
>> at
>> org.apache.flink.client.CliFrontend.executeProgramBlocking(CliFrontend.java:676)
>> at org.apache.flink.client.CliFrontend.run(CliFrontend.java:326)
>> at
>> org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:978)
>> at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1028)
>> Caused by: java.lang.ClassNotFoundException:
>> scala.collection.GenTraversableOnce$class
>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>> ... 20 more
>>
>> The exception above occurred while trying to run your command.
>>
>>
>> Here is my pom.xml:
>>
>> <project xmlns="http://maven.apache.org/POM/4.0.0"
>>          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>          xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
>>                              http://maven.apache.org/xsd/maven-4.0.0.xsd">
>>     <modelVersion>4.0.0</modelVersion>
>>
>>     <groupId>com.datahub</groupId>
>>     <artifactId>datahub-heka</artifactId>
>>     <version>1.0-SNAPSHOT</version>
>>
>>     <dependencies>
>>         <dependency>
>>             <groupId>org.apache.flink</groupId>
>>             <artifactId>flink-java</artifactId>
>>             <version>0.10.1</version>
>>         </dependency>
>>         <dependency>
>>             <groupId>org.apache.flink</groupId>
>>             <artifactId>flink-streaming-java</artifactId>
>>             <version>0.10.1</version>
>>         </dependency>
>>         <dependency>
>>             <groupId>org.apache.flink</groupId>
>>             <artifactId>flink-clients</artifactId>
>>             <version>0.10.1</version>
>>         </dependency>
>>         <dependency>
>>             <groupId>org.apache.kafka</groupId>
>>             <artifactId>kafka_2.10</artifactId>
>>             <version>0.8.2.2</version>
>>         </dependency>
>>         <dependency>
>>             <groupId>org.apache.flink</groupId>
>>             <artifactId>flink-connector-kafka</artifactId>
>

flink kafka scala error

2016-01-05 Thread Madhukar Thota
Hi

I am seeing the following error when I try to run the jar in a Flink
cluster. I am not sure which dependency is missing.

 /opt/DataHUB/flink-0.10.1/bin/flink  run datahub-heka-1.0-SNAPSHOT.jar
flink.properties
java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
at kafka.utils.Pool.<init>(Pool.scala:28)
at kafka.consumer.FetchRequestAndResponseStatsRegistry$.<init>(FetchRequestAndResponseStats.scala:60)
at kafka.consumer.FetchRequestAndResponseStatsRegistry$.<clinit>(FetchRequestAndResponseStats.scala)
at kafka.consumer.SimpleConsumer.<init>(SimpleConsumer.scala:39)
at kafka.javaapi.consumer.SimpleConsumer.<init>(SimpleConsumer.scala:34)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer.getPartitionsForTopic(FlinkKafkaConsumer.java:691)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer.<init>(FlinkKafkaConsumer.java:281)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer082.<init>(FlinkKafkaConsumer082.java:49)
at com.lmig.datahub.heka.Main.main(Main.java:39)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:497)
at
org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:395)
at
org.apache.flink.client.program.Client.runBlocking(Client.java:252)
at
org.apache.flink.client.CliFrontend.executeProgramBlocking(CliFrontend.java:676)
at org.apache.flink.client.CliFrontend.run(CliFrontend.java:326)
at
org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:978)
at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1028)
Caused by: java.lang.ClassNotFoundException:
scala.collection.GenTraversableOnce$class
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 20 more

The exception above occurred while trying to run your command.


Here is my pom.xml:


<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                             http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.datahub</groupId>
    <artifactId>datahub-heka</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-java</artifactId>
            <version>0.10.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java</artifactId>
            <version>0.10.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients</artifactId>
            <version>0.10.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka_2.10</artifactId>
            <version>0.8.2.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-kafka</artifactId>
            <version>0.10.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-elasticsearch</artifactId>
            <version>0.10.1</version>
        </dependency>
        <dependency>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch</artifactId>
            <version>1.7.2</version>
        </dependency>
        <dependency>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch-shield</artifactId>
            <version>1.3.3</version>
        </dependency>
        <dependency>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch-license-plugin</artifactId>
            <version>1.0.0</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-core</artifactId>
            <version>2.6.4</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>2.6.4</version>
        </dependency>
    </dependencies>

    <repositories>
        <repository>
            <id>elasticsearch-releases</id>
            <url>http://maven.elasticsearch.org/releases</url>
            <releases>
                <enabled>true</enabled>
            </releases>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </repository>
    </repositories>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.4.1</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <artifactSet>
                                <excludes>
                                    <exclude>org.apache.flink:flink-shaded-*</exclude>
                                    <exclude>org.apache.flink:flink-core</exclude>
                                    <exclude>org.apache.flink:flink-java</exclude>
                                    <exclude>org.apache.flink:flink-scala</exclude>
                                    <exclude>org.apache.flink:flink-runtime</exclude>
                                    <exclude>org.apache.flink:flink-optimizer</exclude>
                                    <exclude>org.apache.flink:flink-clients</exclude>
                                    <exclude>org.apache.flink:flink-avro</exclude>
                                    <exclude>org.apache.flink:flink-java-examples</exclude>
                                    <exclude>org.apache.flink:flink-scala-examples</exclude>
                                    <exclude>org.apache.flink:flink-streaming-examples</exclude>
                                    <exclude>org.apache.flink:flink-streaming-java</exclude>
                                    <exclude>org.scala-lang:scala-library</exclude>
                                    <exclude>org.scala-lang:scala-compiler</exclude>
                                    <exclude>org.scala-lang:scala-reflect</exclude>
                                    <exclude>com.amazonaws:aws-java-sdk</exclude>


Re: Flink-Elasticsearch connector support for elasticsearch 2.0

2015-12-04 Thread Madhukar Thota
Sure. I can submit the pull request.

On Fri, Dec 4, 2015 at 12:37 PM, Maximilian Michels <m...@apache.org> wrote:

> Hi Madhu,
>
> Great. Do you want to contribute it back via a GitHub pull request? If
> not, that's also fine. We will try to look into the 2.0 connector next
> week.
>
> Best,
> Max
>
> On Fri, Dec 4, 2015 at 4:16 PM, Madhukar Thota <madhukar.th...@gmail.com>
> wrote:
> > I have created a working connector for Elasticsearch 2.0 based on the
> > elasticsearch-flink connector. I am using it right now, but I would like
> > an official connector from Flink.
> >
> > ElasticsearchSink.java
> >
> >
> > import org.apache.flink.api.java.utils.ParameterTool;
> > import org.apache.flink.configuration.Configuration;
> > import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
> > import org.slf4j.Logger;
> > import org.slf4j.LoggerFactory;
> >
> > import java.net.InetAddress;
> > import java.net.UnknownHostException;
> > import java.util.List;
> > import java.util.Map;
> > import java.util.concurrent.atomic.AtomicBoolean;
> > import java.util.concurrent.atomic.AtomicReference;
> >
> > import org.elasticsearch.action.bulk.BulkItemResponse;
> > import org.elasticsearch.action.bulk.BulkProcessor;
> > import org.elasticsearch.action.bulk.BulkRequest;
> > import org.elasticsearch.action.bulk.BulkResponse;
> > import org.elasticsearch.action.index.IndexRequest;
> > import org.elasticsearch.client.Client;
> > import org.elasticsearch.client.transport.TransportClient;
> > import org.elasticsearch.cluster.node.DiscoveryNode;
> > import org.elasticsearch.common.settings.Settings;
> > import org.elasticsearch.common.transport.InetSocketTransportAddress;
> > import org.elasticsearch.common.unit.ByteSizeUnit;
> > import org.elasticsearch.common.unit.ByteSizeValue;
> > import org.elasticsearch.common.unit.TimeValue;
> >
> >
> > public class ElasticsearchSink<T> extends RichSinkFunction<T> {
> >
> > public static final String CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS =
> > "bulk.flush.max.actions";
> > public static final String CONFIG_KEY_BULK_FLUSH_MAX_SIZE_MB =
> > "bulk.flush.max.size.mb";
> > public static final String CONFIG_KEY_BULK_FLUSH_INTERVAL_MS =
> > "bulk.flush.interval.ms";
> >
> > private static final long serialVersionUID = 1L;
> > private static final int DEFAULT_PORT = 9300;
> > private static final Logger LOG =
> > LoggerFactory.getLogger(ElasticsearchSink.class);
> >
> > /**
> >  * The user specified config map that we forward to Elasticsearch
> when
> > we create the Client.
> >  */
> > private final Map<String, String> userConfig;
> >
> > /**
> >  * The builder that is used to construct an {@link IndexRequest} from
> > the incoming element.
> >  */
> > private final IndexRequestBuilder<T> indexRequestBuilder;
> >
> > /**
> >  * The Client that was either retrieved from a Node or is a
> > TransportClient.
> >  */
> > private transient Client client;
> >
> > /**
> >  * Bulk processor that was created using the client
> >  */
> > private transient BulkProcessor bulkProcessor;
> >
> > /**
> >  * This is set from inside the BulkProcessor listener if there were
> > failures in processing.
> >  */
> > private final AtomicBoolean hasFailure = new AtomicBoolean(false);
> >
> > /**
> >  * This is set from inside the BulkProcessor listener if a Throwable
> was
> > thrown during processing.
> >  */
> > private final AtomicReference<Throwable> failureThrowable = new
> > AtomicReference<Throwable>();
> >
> > public ElasticsearchSink(Map<String, String> userConfig,
> > IndexRequestBuilder<T> indexRequestBuilder) {
> > this.userConfig = userConfig;
> > this.indexRequestBuilder = indexRequestBuilder;
> > }
> >
> >
> > @Override
> > public void open(Configuration configuration) {
> >
> > ParameterTool params = ParameterTool.fromMap(userConfig);
> > Settings settings = Settings.settingsBuilder()
> > .put(userConfig)
> > .build();
> >
> > TransportClient transportClient =
> > TransportClient.builder().settings(settings).build();
> > for (String server : params.get("esHost").split(";"))
> > {
> >

Running Flink in Cloudfoundry Environment

2015-11-23 Thread Madhukar Thota
Hi

Is it possible to run Flink in a Cloud Foundry environment? If yes, how can
we achieve this?

Any help is appreciated. Thanks in Advance.

Thanks,
Madhu


Re: Kafka + confluent schema registry Avro parsing

2015-11-19 Thread Madhukar Thota
ndOfStream(MyType nextElement) {
>  return false;
> }
>
> }
>
> Then you supply this class when creating the consumer:
>
> DeserializationSchema<MyType> decoder = new MyAvroDeserializer();
> Properties props = new Properties();
> OffsetStore offsetStore = FlinkKafkaConsumer.OffsetStore.KAFKA;
> FetcherType fetcherType = FlinkKafkaConsumer.FetcherType.LEGACY_LOW_LEVEL;
>
> FlinkKafkaConsumer<MyType> consumer = new FlinkKafkaConsumer<>("myTopic", decoder,
> props, offsetStore, fetcherType);
>
>
> Let me know if that works for you.
>
> Best regards,
> Max
>
> On Thu, Nov 19, 2015 at 9:51 AM, Madhukar Thota <madhukar.th...@gmail.com>
> wrote:
>
>> Hi
>>
>> I am very new to Avro. Currently I am using the Confluent Kafka
>> distribution and I am able to write an Avro message to Kafka, storing the
>> schema in the schema registry. Now I need to consume those messages using
>> the Flink Kafka consumer, and I am having a hard time deserializing them.
>>
>> I am looking for an example of how to deserialize an Avro message whose
>> schema is stored in the schema registry.
>>
>> Any help is appreciated. Thanks in Advance.
>>
>> Thanks,
>> Madhu
>>
>>
>


Re: Kafka + confluent schema registry Avro parsing

2015-11-19 Thread Madhukar Thota
an Michels <m...@apache.org> wrote:

> Stephan is right, this should do it in deserialize():
>
> if (decoder == null) {
> decoder = new KafkaAvroDecoder(vProps);
> }
>
> Further, you might have to specify the correct return type for
> getProducedType(). You may use
>
> public TypeInformation<String> getProducedType() {
> return TypeExtractor.getForClass(String.class);
> }
>
> Cheers,
> Max
>
> On Thu, Nov 19, 2015 at 12:18 PM, Stephan Ewen <se...@apache.org> wrote:
> > The KafkaAvroDecoder is not serializable, and Flink uses serialization to
> > distribute the code to the TaskManagers in the cluster.
> >
> > I think you need to "lazily" initialize the decoder, in the first
> invocation
> > of "deserialize()". That should do it.
> >
> > Stephan
> >
> >
> > On Thu, Nov 19, 2015 at 12:10 PM, Madhukar Thota <
> madhukar.th...@gmail.com>
> > wrote:
> >>
> >> Hi Max
> >>
> >> Thanks for the example.
> >>
> >> Based on your example here is what i did:
> >>
> >> public class Streamingkafka {
> >> public static void main(String[] args) throws Exception {
> >> StreamExecutionEnvironment env =
> >> StreamExecutionEnvironment.getExecutionEnvironment();
> >> env.enableCheckpointing(500);
> >> env.setParallelism(1);
> >>
> >> ParameterTool parameterTool = ParameterTool.fromArgs(args);
> >> Properties props = new Properties();
> >> props.put("schema.registry.url", "http://localhost:8081;);
> >> props.put("specific.avro.reader", true);
> >> VerifiableProperties vProps = new VerifiableProperties(props);
> >>
> >> DeserializationSchema<String> decoder = new
> >> MyAvroDeserializer(vProps);
> >> env.addSource(new
> >> FlinkKafkaConsumer082<>(parameterTool.getRequired("topic"), decoder,
> >> parameterTool.getProperties())).print();
> >>
> >> env.execute();
> >> }
> >>
> >> public class MyAvroDeserializer implements
> >> DeserializationSchema<String> {
> >> private KafkaAvroDecoder decoder;
> >>
> >> public MyAvroDeserializer(VerifiableProperties vProps) {
> >> this.decoder = new KafkaAvroDecoder(vProps);
> >> }
> >>
> >> @Override
> >> public String deserialize(byte[] message) throws IOException {
> >> return (String) this.decoder.fromBytes(message);
> >> }
> >>
> >> @Override
> >> public boolean isEndOfStream(String nextElement) {
> >> return false;
> >> }
> >>
> >> @Override
> >> public TypeInformation<String> getProducedType() {
> >> return null;
> >> }
> >> }
> >>
> >>
> >> Here is the error i am seeing...
> >>
> >>
> >> Exception in thread "main"
> >> org.apache.flink.api.common.InvalidProgramException: Object
> >>
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer082@3bf9ce3e
> >> not serializable
> >> at
> >>
> org.apache.flink.api.java.ClosureCleaner.ensureSerializable(ClosureCleaner.java:97)
> >> at
> org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:59)
> >> at
> >>
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.clean(StreamExecutionEnvironment.java:1228)
> >> at
> >>
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.addSource(StreamExecutionEnvironment.java:1163)
> >> at
> >>
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.addSource(StreamExecutionEnvironment.java:1107)
> >> at
> >>
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.addSource(StreamExecutionEnvironment.java:1089)
> >> at test.flink.Streamingkafka.main(Streamingkafka.java:25)
> >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >> at
> >>
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> >> at
> >>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >> at java.lang.reflect.Method.invoke(Method.java:497)
> >> at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
> >> Caused by: java.io.N
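
Putting the suggestions from this thread together (create the Confluent decoder lazily
on the task side and return a proper produced type), a sketch of the deserializer could
look like the following; it keeps only the serializable java.util.Properties as a field
and builds the KafkaAvroDecoder on the first call to deserialize():

import java.io.IOException;
import java.util.Properties;

import io.confluent.kafka.serializers.KafkaAvroDecoder;
import kafka.utils.VerifiableProperties;

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.typeutils.TypeExtractor;
import org.apache.flink.streaming.util.serialization.DeserializationSchema;

public class MyAvroDeserializer implements DeserializationSchema<String> {

    // Plain Properties are serializable; KafkaAvroDecoder and VerifiableProperties are
    // not, so the decoder is created lazily on the task managers, not on the client.
    private final Properties props;
    private transient KafkaAvroDecoder decoder;

    public MyAvroDeserializer(Properties props) {
        this.props = props;
    }

    @Override
    public String deserialize(byte[] message) throws IOException {
        if (decoder == null) {
            decoder = new KafkaAvroDecoder(new VerifiableProperties(props));
        }
        // returns the record's string form; map to your own type or a GenericRecord in practice
        return decoder.fromBytes(message).toString();
    }

    @Override
    public boolean isEndOfStream(String nextElement) {
        return false;
    }

    @Override
    public TypeInformation<String> getProducedType() {
        return TypeExtractor.getForClass(String.class);
    }
}

The consumer is then created as before, e.g. new FlinkKafkaConsumer082<>(
parameterTool.getRequired("topic"), new MyAvroDeserializer(props),
parameterTool.getProperties()).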