Aggregation based on Timestamp
Hi, We have a use case where thousands of Telegraf agents send data to Kafka (some at a 10s interval, some at 15s, some at 30s). We would like to aggregate the incoming data to a 1-minute interval, keyed by hostname, before we write into InfluxDB. Is it possible to handle this type of use case with Flink? If so, is there any sample to get started? Sample data (InfluxDB line protocol) coming from Kafka: weather,location=us-midwest,season=summer temperature=82 1465839830100400200 -Madhu
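In Flink this maps to keyBy on the host tag plus a 1-minute tumbling event-time window. As a minimal plain-Java sketch of the per-record work such a job would do before the keyBy/window (class and field names are illustrative; the naive splitting ignores line-protocol escaping of commas and spaces inside tag values):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: parse one InfluxDB line-protocol record and truncate
// its nanosecond timestamp to the start of its 1-minute window. A Flink job
// would run this in a map(), then keyBy the host tag and apply a 1-minute
// tumbling window before sinking to InfluxDB.
class LineProtocol {
    final String measurement;
    final Map<String, String> tags = new HashMap<>();
    final long epochSecond;

    LineProtocol(String line) {
        String[] parts = line.split(" ");          // [measurement+tags, fields, timestamp]
        String[] head = parts[0].split(",");       // measurement, then tag=value pairs
        measurement = head[0];
        for (int i = 1; i < head.length; i++) {
            String[] kv = head[i].split("=");
            tags.put(kv[0], kv[1]);
        }
        epochSecond = Long.parseLong(parts[2]) / 1_000_000_000L; // ns -> s
    }

    /** Start of the 1-minute window this record falls into (epoch seconds). */
    long minuteBucket() {
        return (epochSecond / 60) * 60;
    }
}
```

Because the agents report at different intervals, an event-time window with the parsed timestamp (rather than processing time) keeps the 10s/15s/30s streams aligned in the same minute buckets.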
Flink streaming Python
Hi, I asked the same question back in Jan 2016 and am checking again with the community to see if there is any update or plan for supporting streaming Flink in Python.
Amazon Athena
Has anyone used Amazon Athena with Apache Flink? I have a use case where I want to write streaming data (in Avro format) from Kafka to S3, converting it into Parquet format, and update the S3 location with daily partitions on an Athena table. Any guidance is appreciated.
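One common layout (an assumption here, not something the thread confirms) is to write the Parquet files under Hive-style dt=YYYY-MM-DD prefixes, which Athena can then pick up via ALTER TABLE ... ADD PARTITION for each new day. A small sketch of deriving that prefix from an event timestamp (bucket/table names are illustrative):

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: build the S3 prefix for a Hive/Athena-style daily
// partition from a record's event-time epoch millis. The sink would route
// each record's Parquet file under this prefix; a daily job (or Glue crawler)
// then registers the partition with Athena.
class AthenaPartitionPath {
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC);

    static String forEpochMillis(String bucket, String table, long epochMillis) {
        return "s3://" + bucket + "/" + table
                + "/dt=" + DAY.format(Instant.ofEpochMilli(epochMillis)) + "/";
    }
}
```

Partitioning on a dt= column keeps Athena scans limited to the days a query actually touches, which is usually the main cost lever with S3-backed tables.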
HTTP listener source
Hi, Has anyone implemented an HTTP listener as a Flink source, i.e. one that acts as a REST API receiving JSON payloads via the POST method and writes them to Kafka, Kinesis, or any other sink? Any guidance or sample snippet would be appreciated.
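I am not aware of a stock HTTP source in Flink; one approach is to embed a small HTTP server inside a custom SourceFunction and bridge requests into the stream through a queue. A minimal, Flink-free sketch of that bridge using the JDK's built-in com.sun.net.httpserver (all names are illustrative; a real source would drain `payloads` in run() and emit records via the SourceContext):

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch: an embedded HTTP server that accepts JSON via POST on /ingest and
// hands each payload to a queue, which a SourceFunction's run() loop would
// drain and emit downstream (e.g. to a Kafka or Kinesis sink).
class JsonPostReceiver {
    final BlockingQueue<String> payloads = new LinkedBlockingQueue<>();
    private HttpServer server;

    /** Starts the server on an ephemeral port and returns the bound port. */
    int start() throws Exception {
        server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/ingest", exchange -> {
            if (!"POST".equals(exchange.getRequestMethod())) {
                exchange.sendResponseHeaders(405, -1);  // method not allowed
                return;
            }
            try (InputStream in = exchange.getRequestBody()) {
                payloads.add(new String(in.readAllBytes(), StandardCharsets.UTF_8));
            }
            byte[] ok = "{\"status\":\"accepted\"}".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, ok.length);
            try (OutputStream out = exchange.getResponseBody()) { out.write(ok); }
        });
        server.start();
        return server.getAddress().getPort();
    }

    void stop() { server.stop(0); }

    /** Self-contained demo: POSTs one payload to itself and returns what was queued. */
    static String demo() {
        try {
            JsonPostReceiver r = new JsonPostReceiver();
            int port = r.start();
            java.net.HttpURLConnection c = (java.net.HttpURLConnection)
                    java.net.URI.create("http://127.0.0.1:" + port + "/ingest").toURL().openConnection();
            c.setRequestMethod("POST");
            c.setDoOutput(true);
            c.getOutputStream().write("{\"temp\":82}".getBytes(StandardCharsets.UTF_8));
            if (c.getResponseCode() != 200) throw new IllegalStateException("unexpected status");
            String received = r.payloads.take();
            r.stop();
            return received;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Note that a source like this is not easily made fault-tolerant (an acked POST can still be lost before a checkpoint), which is one reason fronting Flink with Kafka or Kinesis is the more common pattern.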
Re: Elasticsearch Http Connector
Thanks Philipp, I have also started looking at the Jest client. Did you use it with Flink? Is it possible for you to share the project so that I can reuse it? On Tue, Oct 25, 2016 at 11:54 AM, Philipp Bussche wrote: > Hi there, > not me (which I guess is not what you wanted to hear), but I had to implement a custom Elasticsearch sink based on Jest to be able to sink data into ES on AWS. Works quite alright! > Philipp > > https://github.com/searchbox-io/Jest/tree/master/jest
Elasticsearch Http Connector
Friends, is anyone using the new Elasticsearch RestClient (https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/index.html) with Flink?
Upserts with Flink-elasticsearch
Is it possible to do upserts with the existing flink-elasticsearch connector today?
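As far as I can tell, the connector of this era only lets you build IndexRequests, so upserts need a custom sink issuing update requests. For reference, the body an upsert sends to the Elasticsearch update endpoint (POST /{index}/{type}/{id}/_update) uses doc plus doc_as_upsert; a trivial sketch of that payload (the helper name is mine, and this naive concatenation assumes docJson is already valid JSON):

```java
// Sketch of the request body for an Elasticsearch upsert via the REST update
// API: "doc" holds the partial document, and "doc_as_upsert": true tells ES
// to index the doc as-is when the id does not exist yet.
class UpsertBody {
    static String docAsUpsert(String docJson) {
        return "{\"doc\":" + docJson + ",\"doc_as_upsert\":true}";
    }
}
```

With the TransportClient-based connector, the equivalent would be an UpdateRequest with docAsUpsert(true), but that requires replacing the stock sink's IndexRequest plumbing.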
protobuf messages from Kafka to elasticsearch using flink
Friends, can someone guide me to, or share, an example of how to consume protobuf messages from Kafka and index them into Elasticsearch using Flink?
Re: flink kafka scala error
I solved the problem by changing the Scala version of the Kafka library, since I had downloaded the scala_2.11 build of Flink (flink-0.10.1-bin-hadoop27-scala_2.11.tgz).

Before: org.apache.kafka kafka_2.10 0.8.2.2
After: org.apache.kafka kafka_2.11 0.8.2.2

On Wed, Jan 6, 2016 at 4:13 AM, Till Rohrmann <trohrm...@apache.org> wrote:
> Hi Madhukar,
>
> could you check whether your Flink installation contains the flink-dist-0.10.1.jar in the lib folder? This file contains the necessary scala-library.jar which you are missing. You can also remove the line org.scala-lang:scala-library which excludes the scala-library dependency to be included in the fat jar of your job.
>
> Cheers,
> Till
>
> On Wed, Jan 6, 2016 at 5:54 AM, Madhukar Thota <madhukar.th...@gmail.com> wrote:
>> Hi,
>>
>> I am seeing the following error when I try to run the jar on a Flink cluster. I am not sure what dependency is missing.
flink kafka scala error
Hi, I am seeing the following error when I try to run the jar on a Flink cluster. I am not sure what dependency is missing.

/opt/DataHUB/flink-0.10.1/bin/flink run datahub-heka-1.0-SNAPSHOT.jar flink.properties

java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
    at kafka.utils.Pool.(Pool.scala:28)
    at kafka.consumer.FetchRequestAndResponseStatsRegistry$.(FetchRequestAndResponseStats.scala:60)
    at kafka.consumer.FetchRequestAndResponseStatsRegistry$.(FetchRequestAndResponseStats.scala)
    at kafka.consumer.SimpleConsumer.(SimpleConsumer.scala:39)
    at kafka.javaapi.consumer.SimpleConsumer.(SimpleConsumer.scala:34)
    at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer.getPartitionsForTopic(FlinkKafkaConsumer.java:691)
    at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer.(FlinkKafkaConsumer.java:281)
    at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer082.(FlinkKafkaConsumer082.java:49)
    at com.lmig.datahub.heka.Main.main(Main.java:39)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:497)
    at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:395)
    at org.apache.flink.client.program.Client.runBlocking(Client.java:252)
    at org.apache.flink.client.CliFrontend.executeProgramBlocking(CliFrontend.java:676)
    at org.apache.flink.client.CliFrontend.run(CliFrontend.java:326)
    at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:978)
    at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1028)
Caused by: java.lang.ClassNotFoundException: scala.collection.GenTraversableOnce$class
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 20 more

The exception above occurred while trying to run your command.

Here is my pom.xml (summarized as groupId:artifactId:version):

Project: com.datahub:datahub-heka:1.0-SNAPSHOT

Dependencies:
- org.apache.flink:flink-java:0.10.1
- org.apache.flink:flink-streaming-java:0.10.1
- org.apache.flink:flink-clients:0.10.1
- org.apache.kafka:kafka_2.10:0.8.2.2
- org.apache.flink:flink-connector-kafka:0.10.1
- org.apache.flink:flink-connector-elasticsearch:0.10.1
- org.elasticsearch:elasticsearch:1.7.2
- org.elasticsearch:elasticsearch-shield:1.3.3
- org.elasticsearch:elasticsearch-license-plugin:1.0.0
- com.fasterxml.jackson.core:jackson-core:2.6.4
- com.fasterxml.jackson.core:jackson-databind:2.6.4

Repository: elasticsearch-releases (http://maven.elasticsearch.org/releases, releases enabled, snapshots disabled)

Build: maven-shade-plugin 2.4.1, shade goal bound to the package phase, excluding: org.apache.flink:flink-shaded-*, org.apache.flink:flink-core, org.apache.flink:flink-java, org.apache.flink:flink-scala, org.apache.flink:flink-runtime, org.apache.flink:flink-optimizer, org.apache.flink:flink-clients, org.apache.flink:flink-avro, org.apache.flink:flink-java-examples, org.apache.flink:flink-scala-examples, org.apache.flink:flink-streaming-examples, org.apache.flink:flink-streaming-java, org.scala-lang:scala-library, org.scala-lang:scala-compiler, org.scala-lang:scala-reflect, com.amazonaws:aws-java-sdk
Re: Flink-Elasticsearch connector support for elasticsearch 2.0
Sure, I can submit the pull request.

On Fri, Dec 4, 2015 at 12:37 PM, Maximilian Michels <m...@apache.org> wrote:
> Hi Madhu,
>
> Great. Do you want to contribute it back via a GitHub pull request? If not, that's also fine. We will try to look into the 2.0 connector next week.
>
> Best,
> Max
>
> On Fri, Dec 4, 2015 at 4:16 PM, Madhukar Thota <madhukar.th...@gmail.com> wrote:
> > I have created a working connector for Elasticsearch 2.0 based on the elasticsearch-flink connector. I am using it right now, but I want an official connector from Flink.
> >
> > ElasticsearchSink.java
> >
> > import org.apache.flink.api.java.utils.ParameterTool;
> > import org.apache.flink.configuration.Configuration;
> > import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
> > import org.slf4j.Logger;
> > import org.slf4j.LoggerFactory;
> >
> > import java.net.InetAddress;
> > import java.net.UnknownHostException;
> > import java.util.List;
> > import java.util.Map;
> > import java.util.concurrent.atomic.AtomicBoolean;
> > import java.util.concurrent.atomic.AtomicReference;
> >
> > import org.elasticsearch.action.bulk.BulkItemResponse;
> > import org.elasticsearch.action.bulk.BulkProcessor;
> > import org.elasticsearch.action.bulk.BulkRequest;
> > import org.elasticsearch.action.bulk.BulkResponse;
> > import org.elasticsearch.action.index.IndexRequest;
> > import org.elasticsearch.client.Client;
> > import org.elasticsearch.client.transport.TransportClient;
> > import org.elasticsearch.cluster.node.DiscoveryNode;
> > import org.elasticsearch.common.settings.Settings;
> > import org.elasticsearch.common.transport.InetSocketTransportAddress;
> > import org.elasticsearch.common.unit.ByteSizeUnit;
> > import org.elasticsearch.common.unit.ByteSizeValue;
> > import org.elasticsearch.common.unit.TimeValue;
> >
> > public class ElasticsearchSink extends RichSinkFunction {
> >
> >     public static final String CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS = "bulk.flush.max.actions";
> >     public static final String CONFIG_KEY_BULK_FLUSH_MAX_SIZE_MB = "bulk.flush.max.size.mb";
> >     public static final String CONFIG_KEY_BULK_FLUSH_INTERVAL_MS = "bulk.flush.interval.ms";
> >
> >     private static final long serialVersionUID = 1L;
> >     private static final int DEFAULT_PORT = 9300;
> >     private static final Logger LOG = LoggerFactory.getLogger(ElasticsearchSink.class);
> >
> >     /** The user-specified config map that we forward to Elasticsearch when we create the Client. */
> >     private final Map<String, String> userConfig;
> >
> >     /** The builder that is used to construct an {@link IndexRequest} from the incoming element. */
> >     private final IndexRequestBuilder indexRequestBuilder;
> >
> >     /** The Client that was either retrieved from a Node or is a TransportClient. */
> >     private transient Client client;
> >
> >     /** Bulk processor that was created using the client. */
> >     private transient BulkProcessor bulkProcessor;
> >
> >     /** This is set from inside the BulkProcessor listener if there were failures in processing. */
> >     private final AtomicBoolean hasFailure = new AtomicBoolean(false);
> >
> >     /** This is set from inside the BulkProcessor listener if a Throwable was thrown during processing. */
> >     private final AtomicReference failureThrowable = new AtomicReference();
> >
> >     public ElasticsearchSink(Map<String, String> userConfig, IndexRequestBuilder indexRequestBuilder) {
> >         this.userConfig = userConfig;
> >         this.indexRequestBuilder = indexRequestBuilder;
> >     }
> >
> >     @Override
> >     public void open(Configuration configuration) {
> >
> >         ParameterTool params = ParameterTool.fromMap(userConfig);
> >         Settings settings = Settings.settingsBuilder()
> >                 .put(userConfig)
> >                 .build();
> >
> >         TransportClient transportClient = TransportClient.builder().settings(settings).build();
> >         for (String server : params.get("esHost").split(";")) {
> >
Running Flink in Cloudfoundry Environment
Hi, Is it possible to run Flink in a Cloud Foundry environment? If yes, how can we achieve this? Any help is appreciated. Thanks in advance. Thanks, Madhu
Re: Kafka + confluent schema registry Avro parsing
…isEndOfStream(MyType nextElement) {
>     return false;
> }
>
> }
>
> Then you supply this class when creating the consumer:
>
> DeserializationSchema decoder = new MyAvroDeserializer();
> Properties props = new Properties();
> OffsetStore offsetStore = FlinkKafkaConsumer.OffsetStore.KAFKA;
> FetcherType fetcherType = FlinkKafkaConsumer.FetcherType.LEGACY_LOW_LEVEL;
>
> FlinkKafkaConsumer consumer = new FlinkKafkaConsumer("myTopic", decoder, props, offsetStore, fetcherType);
>
> Let me know if that works for you.
>
> Best regards,
> Max
>
> On Thu, Nov 19, 2015 at 9:51 AM, Madhukar Thota <madhukar.th...@gmail.com> wrote:
>> Hi,
>>
>> I am very new to Avro. Currently I am using the Confluent Kafka distribution, and I am able to write an Avro message to Kafka by storing the schema in the schema registry. Now I need to consume those messages using the Flink Kafka consumer and am having a hard time deserializing them.
>>
>> I am looking for an example of how to deserialize an Avro message whose schema is stored in the schema registry.
>>
>> Any help is appreciated. Thanks in advance.
>>
>> Thanks,
>> Madhu
Re: Kafka + confluent schema registry Avro parsing
an Michels <m...@apache.org> wrote:
> Stephan is right, this should do it in deserialize():
>
> if (decoder == null) {
>     decoder = new KafkaAvroDecoder(vProps);
> }
>
> Further, you might have to specify the correct return type for getProducedType(). You may use
>
> public TypeInformation getProducedType() {
>     return TypeExtractor.getForClass(String.class);
> }
>
> Cheers,
> Max
>
> On Thu, Nov 19, 2015 at 12:18 PM, Stephan Ewen <se...@apache.org> wrote:
> > The KafkaAvroDecoder is not serializable, and Flink uses serialization to distribute the code to the TaskManagers in the cluster.
> >
> > I think you need to "lazily" initialize the decoder, in the first invocation of "deserialize()". That should do it.
> >
> > Stephan
> >
> > On Thu, Nov 19, 2015 at 12:10 PM, Madhukar Thota <madhukar.th...@gmail.com> wrote:
> >> Hi Max,
> >>
> >> Thanks for the example.
> >>
> >> Based on your example, here is what I did:
> >>
> >> public class Streamingkafka {
> >>     public static void main(String[] args) throws Exception {
> >>         StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
> >>         env.enableCheckpointing(500);
> >>         env.setParallelism(1);
> >>
> >>         ParameterTool parameterTool = ParameterTool.fromArgs(args);
> >>         Properties props = new Properties();
> >>         props.put("schema.registry.url", "http://localhost:8081");
> >>         props.put("specific.avro.reader", true);
> >>         VerifiableProperties vProps = new VerifiableProperties(props);
> >>
> >>         DeserializationSchema decoder = new MyAvroDeserializer(vProps);
> >>         env.addSource(new FlinkKafkaConsumer082<>(parameterTool.getRequired("topic"), decoder, parameterTool.getProperties())).print();
> >>
> >>         env.execute();
> >>     }
> >>
> >>     public class MyAvroDeserializer implements DeserializationSchema {
> >>         private KafkaAvroDecoder decoder;
> >>
> >>         public MyAvroDeserializer(VerifiableProperties vProps) {
> >>             this.decoder = new KafkaAvroDecoder(vProps);
> >>         }
> >>
> >>         @Override
> >>         public String deserialize(byte[] message) throws IOException {
> >>             return (String) this.decoder.fromBytes(message);
> >>         }
> >>
> >>         @Override
> >>         public boolean isEndOfStream(String nextElement) {
> >>             return false;
> >>         }
> >>
> >>         @Override
> >>         public TypeInformation getProducedType() {
> >>             return null;
> >>         }
> >>     }
> >>
> >> Here is the error I am seeing:
> >>
> >> Exception in thread "main" org.apache.flink.api.common.InvalidProgramException: Object org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer082@3bf9ce3e not serializable
> >>     at org.apache.flink.api.java.ClosureCleaner.ensureSerializable(ClosureCleaner.java:97)
> >>     at org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:59)
> >>     at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.clean(StreamExecutionEnvironment.java:1228)
> >>     at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.addSource(StreamExecutionEnvironment.java:1163)
> >>     at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.addSource(StreamExecutionEnvironment.java:1107)
> >>     at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.addSource(StreamExecutionEnvironment.java:1089)
> >>     at test.flink.Streamingkafka.main(Streamingkafka.java:25)
> >>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> >>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >>     at java.lang.reflect.Method.invoke(Method.java:497)
> >>     at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
> >> Caused by: java.io.N
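The pattern behind Stephan's and Max's fix is general: keep the non-serializable object in a transient field and build it lazily on first use, so the enclosing function survives Flink's closure serialization. A dependency-free sketch of just that pattern (the nested Decoder is a stand-in for KafkaAvroDecoder; all names are illustrative):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

// Sketch: a Serializable function holding a non-serializable helper. The
// helper field is transient (skipped during Java serialization) and created
// lazily on first use, i.e. after the object has been shipped to a worker.
class LazyDeserializer implements Serializable {
    private static final long serialVersionUID = 1L;

    /** Stand-in for a heavyweight, non-serializable decoder such as KafkaAvroDecoder. */
    static class Decoder {
        String fromBytes(byte[] b) { return new String(b, StandardCharsets.UTF_8); }
    }

    private transient Decoder decoder;  // NOT serialized with the function

    String deserialize(byte[] message) {
        if (decoder == null) {          // first call on this JVM:
            decoder = new Decoder();    // build here, not in the constructor
        }
        return decoder.fromBytes(message);
    }

    /** Round-trips the function through Java serialization, then decodes. */
    static String demo() {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            new ObjectOutputStream(bos).writeObject(new LazyDeserializer());
            ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
            LazyDeserializer copy = (LazyDeserializer) in.readObject();
            return copy.deserialize("hello".getBytes(StandardCharsets.UTF_8));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Initializing the decoder in the constructor (as in the posted MyAvroDeserializer) stores it in a non-transient field, which is exactly what makes the closure cleaner reject the consumer as "not serializable".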