Re: StreamTableEnvironment can not run in batch mode

2020-02-19 Thread Jingsong Li
Hi Fanbin, TableEnvironment is the unification of batch/streaming in the blink planner. Use: TableEnvironment.create(fsSettings) We are continuing to improve TableEnvironment to cover more features. [1] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/common.html#create-a-tableenvironment

JDBC source running continuously

2020-02-19 Thread Fanbin Bu
Hi, My app creates the source from a JDBC InputFormat, runs some SQL, and prints the output. But the source terminates itself after the query is done. Is there any way to keep the source running? sample code: val env = StreamExecutionEnvironment.getExecutionEnvironment val settings =

Fwd: StreamTableEnvironment can not run in batch mode

2020-02-19 Thread Fanbin Bu
Hi, Currently, "StreamTableEnvironment can not run in batch mode for now, please use TableEnvironment." Are there any plans on the batch/streaming unification roadmap to use StreamTableEnvironment for both streamingMode and batchMode? Thanks, Fanbin

FlinkKinesisConsumer throws "Unknown datum type org.joda.time.DateTime"

2020-02-19 Thread Lian Jiang
Hi, I use a FlinkKinesisConsumer in a Flink job to de-serialize kinesis events. Flink: 1.9.2 Avro: 1.9.2 The serDe class is like: public class ManagedSchemaKinesisPayloadSerDe implements KinesisSerializationSchema, KinesisDeserializationSchema { private static final String

Re: Flink job fails with org.apache.kafka.common.KafkaException: Failed to construct kafka consumer

2020-02-19 Thread John Smith
I think so also. But I was wondering whether this was the consumer or the actual Kafka broker. This error was displayed on the Flink task node where the task was running. The brokers looked fine at the time. I have about a dozen topics, all single-partition except one which has 18. So I really doubt

Re: Side Outputs from RichAsyncFunction

2020-02-19 Thread KristoffSC
Hi, any thoughts about this one? Regards, Krzysztof -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Tests in FileUtilsTest while building Flink in local

2020-02-19 Thread Arujit Pradhan
Hi all, I was trying to build Flink on my local machine and these two unit tests are failing:
[ERROR] Errors:
[ERROR] FileUtilsTest.testCompressionOnRelativePath:261->verifyDirectoryCompression:440 » NoSuchFile
[ERROR] FileUtilsTest.testDeleteDirectoryConcurrently » FileSystem

StreamTableEnvironment can not run in batch mode

2020-02-19 Thread Fanbin Bu
Hi, Currently, "StreamTableEnvironment can not run in batch mode for now, please use TableEnvironment." Are there any plans on the batch/streaming unification roadmap to use StreamTableEnvironment for both streamingMode and batchMode? Thanks, Fanbin

Re: Link to Flink on K8S Webinar

2020-02-19 Thread Austin Bennett
Cool; @aniket and @dagang, As someone who hasn't dug into the code of either (I will go through your recording) -- might you share any thoughts on the differences between https://github.com/googlecloudplatform/flink-on-k8s-operator and https://github.com/lyft/flinkk8soperator ? Also, for those in

Re: Flink Kafka connector consume from a single kafka partition

2020-02-19 Thread hemant singh
Hi Arvid, Thanks for your response. I think I did not word my question properly. I wanted to confirm that if the data is distributed to more than one partition, then the ordering cannot be maintained (which is documented). From your response, I understand that if I set the parallelism to the number
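Hemant's point can be illustrated with a small, self-contained simulation (plain Java, no Kafka or Flink dependencies; the key-hash partitioner here only conceptually mirrors Kafka's default routing): records sharing a key always land in the same partition and keep their send order, while no ordering exists across partitions.

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch of key-based partitioning: records with the same
// key always land in the same partition, so per-key order is preserved;
// no ordering guarantee exists across partitions.
public class PartitionOrderDemo {
    static int partitionFor(String key, int numPartitions) {
        // Simplified stand-in for Kafka's default key-hash routing.
        return Math.abs(key.hashCode() % numPartitions);
    }

    static List<List<String>> route(List<String[]> records, int numPartitions) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) partitions.add(new ArrayList<>());
        for (String[] rec : records) { // rec = {key, payload}
            partitions.get(partitionFor(rec[0], numPartitions)).add(rec[1]);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<String[]> records = List.of(
            new String[]{"device-1", "d1-event-1"},
            new String[]{"device-2", "d2-event-1"},
            new String[]{"device-1", "d1-event-2"},
            new String[]{"device-2", "d2-event-2"});
        // All device-1 events sit in one partition, in send order.
        System.out.println(route(records, 4));
    }
}
```

This is also why matching source parallelism to the partition count (as discussed in the thread) only preserves per-partition order, never a global order.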

Re: Updating ValueState not working in hosted Kinesis

2020-02-19 Thread Chris Stevens
Thanks so much Timo, got it working now. All down to my lack of Java skill. Many thanks, Chris Stevens Head of Research & Development +44 7565 034 595 On Wed, 19 Feb 2020 at 15:12, Timo Walther wrote: > Hi Chris, > > your observation is right. By `new Sensor() {}` instead of just `new >

Re: CSV StreamingFileSink

2020-02-19 Thread Austin Cawley-Edwards
Hey Timo, Thanks for the assignment link! Looks like most of my issues can be solved by getting better acquainted with Java file APIs and not in Flink-land. Best, Austin On Wed, Feb 19, 2020 at 6:48 AM Timo Walther wrote: > Hi Austin, > > the StreamingFileSink allows bucketing the output

Re: BucketingSink capabilities for DataSet API

2020-02-19 Thread aj
Thanks, Timo. I have not used or explored the Table API until now. I have only used the DataSet and DataStream APIs. I will read about the Table API. On Wed, Feb 19, 2020 at 4:33 PM Timo Walther wrote: > Hi Anuj, > > another option would be to use the new Hive connectors. Have you looked > into those?

Re: BucketingSink capabilities for DataSet API

2020-02-19 Thread aj
Thanks, Rafi. I will try this, but yes, if partitioning is not possible then I will also have to look for some other solution. On Wed, Feb 19, 2020 at 3:44 PM Rafi Aroch wrote: > Hi Anuj, > > It's been a while since I wrote this (Flink 1.5.2). There could be a > better/newer way, but this is how I

RE: KafkaFetcher closed before end of stream is received for all partitions.

2020-02-19 Thread Bill Wicker
Thanks for the update! Since we are still in the planning stage I will try to find another way to achieve what we are trying to do in the meantime and I'll keep an eye on that Jira. Two workarounds I thought about are to either match the parallelism of the source to the partition count, or

Re: Updating ValueState not working in hosted Kinesis

2020-02-19 Thread Timo Walther
Hi Chris, your observation is right. By `new Sensor() {}` instead of just `new Sensor()` you are creating an anonymous non-static class that references the outer method and class. If you check your logs, there might also be a reason why your POJO is used as a generic type. I assume because
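Timo's point about `new Sensor() {}` can be reproduced in plain Java (the `Sensor` class here is a hypothetical stand-in for Chris's POJO): the trailing `{}` creates an anonymous subclass, which is an inner class bound to its enclosing instance, so serializers end up pulling the outer class in as well.

```java
// Shows why `new Sensor() {}` differs from `new Sensor()`: the former
// creates an anonymous inner subclass that keeps a reference to its
// enclosing class, breaking POJO detection and bloating serialization.
public class AnonymousSensorDemo {
    public static class Sensor {           // plain static POJO
        public double value;
    }

    Sensor makeAnonymous() {
        return new Sensor() {};            // anonymous subclass, bound to this demo instance
    }

    public static void main(String[] args) {
        Sensor plain = new Sensor();
        Sensor anon = new AnonymousSensorDemo().makeAnonymous();

        System.out.println(plain.getClass().isAnonymousClass()); // false
        System.out.println(anon.getClass().isAnonymousClass());  // true
        // The anonymous subclass is not Sensor itself but a nested type
        // declared inside AnonymousSensorDemo.
        System.out.println(anon.getClass().getEnclosingClass().getSimpleName());
    }
}
```

Dropping the `{}` gives back the plain top-level (or static nested) class that POJO serializers expect.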

Re: Process stream multiple time with different KeyBy

2020-02-19 Thread Lehuede sebastien
Hi guys, Thanks for your answers and sorry for the late reply. My use case is: I receive some events on one stream, and each event can contain: - 1 field category - 1 field subcategory - 1 field category AND 1 field subcategory. Events are matched against rules which can contain :
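One common way to key such a stream is to derive a single composite key from whichever of the two fields is present. This is a sketch only; the field names and the "*" placeholder for a missing field are assumptions, since the original message is truncated before the rule format is described.

```java
import java.util.Objects;

// Hypothetical composite-key extraction for events that carry a
// category, a subcategory, or both; a missing field normalizes to "*"
// so that all three event shapes map onto one key space.
public class CompositeKeyDemo {
    static String routingKey(String category, String subcategory) {
        String cat = Objects.requireNonNullElse(category, "*");
        String sub = Objects.requireNonNullElse(subcategory, "*");
        return cat + "|" + sub;
    }

    public static void main(String[] args) {
        System.out.println(routingKey("network", null));      // network|*
        System.out.println(routingKey(null, "dns"));          // *|dns
        System.out.println(routingKey("network", "dns"));     // network|dns
    }
}
```

With such a key extractor, a single keyBy can cover all three event shapes instead of processing the stream multiple times.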

Re: Re: Updating ValueState not working in hosted Kinesis

2020-02-19 Thread Chris Stevens
Thanks again Timo, I hope I replied correctly this time. As per my previous message the Sensor class is a very simple POJO type (I think). When the serialization trace talks about PGSql stuff it makes me think that something from my operator is being included in serialization. Not just the

Re: Parallelize Kafka Deserialization of a single partition?

2020-02-19 Thread Till Rohrmann
I have to correct myself. DataStream respects ExecutionConfig.enableObjectReuse, which happens in the form of creating different Outputs in the OperatorChain. This is also in line with the different behaviour you are observing. Concerning your initial question, Theo, you could do the following
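The pitfall behind object reuse can be sketched without Flink: with reuse enabled, a chained operator may receive the same mutable instance for every record, so any operator that stores references across records observes later mutations. The `Event` type and the two consumers below are illustrative only, not Flink API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Illustrates the object-reuse trade-off: a reusing source mutates and
// re-emits one Event instance, so a downstream operator that stores
// references (instead of copying values out) sees only the last value.
public class ObjectReuseDemo {
    static class Event { long value; }

    // Emits n records through `downstream`, reusing a single Event.
    static void emitWithReuse(int n, Consumer<Event> downstream) {
        Event reused = new Event();
        for (int i = 0; i < n; i++) {
            reused.value = i;       // mutate in place instead of allocating
            downstream.accept(reused);
        }
    }

    public static void main(String[] args) {
        List<Event> stored = new ArrayList<>();
        emitWithReuse(3, stored::add);              // buggy: stores references
        List<Long> copied = new ArrayList<>();
        emitWithReuse(3, e -> copied.add(e.value)); // safe: copies the value out

        // All stored references point at the same mutated object.
        System.out.println(stored.get(0).value); // 2
        System.out.println(copied);              // [0, 1, 2]
    }
}
```

This is why enabling object reuse saves serialization/copying cost but is only safe when operators never hold on to input objects.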

Re: Flink Kafka connector consume from a single kafka partition

2020-02-19 Thread Arvid Heise
Hi Hemant, Flink passes your configuration to the Kafka consumer, so you could check if you can subscribe to only one partition there. However, I would discourage that approach. I don't see the benefit over just subscribing to the topic entirely and having dedicated processing for the different

Fwd: Re: Updating ValueState not working in hosted Kinesis

2020-02-19 Thread Timo Walther
Hi Chris, [forwarding the private discussion to the mailing list again] first of all, are you sure that your Sensor class is either a top-level class or a static inner class? Because it seems there is way more stuff in it (maybe included by accident transitively?). Such as:

Re: Table API: Joining on Tables of Complex Types

2020-02-19 Thread Timo Walther
Hi Andreas, you are right, currently the Row type only supports accessing fields by index. Usually, we recommend fully working in the Table API. There you can access structured type fields by name (`SELECT row.field.field` or `'row.get("field").get("field")`) and additional utilities such as
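The index-versus-name distinction can be sketched without Flink: a purely positional row only offers `getField(i)`, and name-based access needs an external name-to-index schema, which is exactly what the Table API maintains for you. The `Row` class below is a minimal stand-in, not Flink's `org.apache.flink.types.Row`.

```java
import java.util.Map;

// Minimal stand-in for a positional row type: fields are addressed by
// index; name-based access only works once a name->index schema exists.
public class RowAccessDemo {
    static class Row {
        final Object[] fields;
        Row(Object... fields) { this.fields = fields; }
        Object getField(int i) { return fields[i]; }
    }

    public static void main(String[] args) {
        Row row = new Row("sensor-1", 21.5);
        // Index-based access: all the raw row type offers.
        System.out.println(row.getField(0));

        // Name-based access requires a separate schema mapping names to
        // positions; this bookkeeping is what the Table API hides.
        Map<String, Integer> schema = Map.of("id", 0, "temperature", 1);
        System.out.println(row.getField(schema.get("temperature")));
    }
}
```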

Re: How Do i Serialize a class using default kryo and protobuf in scala

2020-02-19 Thread Timo Walther
Hi, would Apache Avro be an option for you? Because, as far as I know, this is currently still the best supported format when it comes to schema upgrades. Maybe Gordon in CC can give you some additional hints. Regards, Timo On 18.02.20 10:38, ApoorvK wrote: I have some case class which

Re: Updating ValueState not working in hosted Kinesis

2020-02-19 Thread Timo Walther
Hi Chris, it seems there are fields serialized into state that actually don't belong there. You should aim to treat Sensor as a POJO instead of a Kryo generic serialized black-box type. Furthermore, it seems that fields such as "org.apache.logging.log4j.core.layout.AbstractCsvLayout" should

Re: CSV StreamingFileSink

2020-02-19 Thread Timo Walther
Hi Austin, the StreamingFileSink allows bucketing the output data. This should help for your use case: https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/streamfile_sink.html#bucket-assignment Regards, Timo On 19.02.20 01:00, Austin Cawley-Edwards wrote: Following up on

Re: Parallelize Kafka Deserialization of a single partition?

2020-02-19 Thread Timo Walther
Hi Theo, there are a lot of performance improvements that Flink could do, but they would further complicate the interfaces and API. They would require deep knowledge from users about the runtime, i.e. when it is safe to reuse objects and when not. The Table/SQL API of Flink uses a lot of these

Re: BucketingSink capabilities for DataSet API

2020-02-19 Thread Timo Walther
Hi Anuj, another option would be to use the new Hive connectors. Have you looked into those? They might work on SQL internal data types which is why you would need to use the Table API then. Maybe Bowen in CC can help you here. Regards, Timo On 19.02.20 11:14, Rafi Aroch wrote: Hi Anuj,

Re: BucketingSink capabilities for DataSet API

2020-02-19 Thread Rafi Aroch
Hi Anuj, It's been a while since I wrote this (Flink 1.5.2). There could be a better/newer way, but this is how I read & write Parquet with hadoop-compatibility: // imports > import org.apache.avro.generic.GenericRecord; > import org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat; >

Re: Exceptions in Web UI do not appear in logs

2020-02-19 Thread Robert Metzger
Thanks a lot for reporting this issue. Which version of Flink are you using? I checked the code of the Kinesis ShardConsumer (the current version though), and I found that exceptions from the ShardConsumer are properly forwarded to the lower level runtime. Did you check the *.out files of the

RE: Parallelize Kafka Deserialization of a single partition?

2020-02-19 Thread Theo Diefenthal
I have the same experience as Eleanore. When enabling object reuse, I saw a significant performance improvement, and in my profiling session I saw that a lot of serialization/deserialization was not performed any more. That's why my question originally aimed a bit further: It's good