Re: Flink - Kafka topic null error; happens only when running on cluster

2020-10-22 Thread Timo Walther
Hi Manas, you need to make sure to differentiate between what Flink calls "pre-flight phase" and "cluster phase". The pre-flight phase is where the pipeline is constructed and all functions are instantiated. They are then later serialized and sent to the cluster. If you are reading your pro…
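
As an illustration of that split, a minimal sketch in Java; `SomeClient` stands in for any hypothetical non-serializable resource:

    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;

    public class EnrichingMapper extends RichMapFunction<String, String> {

        // anything assigned in the constructor is created during the
        // pre-flight phase on the client and must be serializable
        private transient SomeClient client;

        @Override
        public void open(Configuration parameters) {
            // runs during the cluster phase, once per parallel instance
            client = new SomeClient();
        }

        @Override
        public String map(String value) {
            return client.enrich(value);
        }
    }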

Re: In 1.11.2/flinkSql/batch, tableEnv.getConfig.setNullCheck(false) seems to break group by-s

2020-10-16 Thread Timo Walther
Hi Jon, I would not recommend using the configuration parameter. It is not deprecated yet but can be considered legacy code from before we reworked the type system. Regards, Timo On 16.10.20 13:23, Kurt Young wrote: Yes, I think this is a bug, feel free to open a jira and a pull request.

Re: Streaming SQL Job Switches to FINISHED before all records processed

2020-10-12 Thread Timo Walther
…duplicates, but for records with no duplicates, I'd have to wait until no more records are coming -- am I missing something? Thanks so much, Austin On Fri, Oct 9, 2020 at 10:44 AM Timo Walther <twal...@apache.org> wrote: Hi Austin, if you don't want to worry a…

Re: Best way to test Table API and SQL

2020-10-09 Thread Timo Walther
Hi Rex, let me copy paste my answer from a similar thread 2 months ago: Hi, this might be helpful as well: https://lists.apache.org/thread.html/rfe3b45a10fc58cf19d2f71c6841515eb7175ba731d5055b06f236b3f%40%3Cuser.flink.apache.org%3E First of all, it is important to know if you are interested i

Re: sql/table configuration naming guide/style/spec

2020-10-09 Thread Timo Walther
Hi Luan, we haven't updated all config parameters to string-based options. This is still ongoing. The idle state retention will be configurable in 1.12: https://issues.apache.org/jira/browse/FLINK-18555 I hope this helps. Regards, Timo On 09.10.20 15:33, Luan Cooper wrote: Hi I've read…

Re: Streaming SQL Job Switches to FINISHED before all records processed

2020-10-09 Thread Timo Walther
…ub.com/austince/flink-1.10-sql-windowing-error On Mon, Oct 5, 2020 at 4:24 AM Timo Walther <twal...@apache.org> wrote: Btw which planner are you using? Regards, Timo On 05.10.20 10:23, Timo Walther wrote: > Hi Austin, > > could you share…

Re: Flink SQL - can I force the planner to reuse a temporary view to be reused across multiple queries that access different fields?

2020-10-05 Thread Timo Walther
…28, 2020 at 6:41 AM Timo Walther <twal...@apache.org> wrote: Hi Dan, unfortunately, it is very difficult to read your plan. Maybe you can share a higher resolution and highlight which part of the pipeline is A, B etc. In general, the planner should be smart…

Re: Streaming SQL Job Switches to FINISHED before all records processed

2020-10-05 Thread Timo Walther
Btw which planner are you using? Regards, Timo On 05.10.20 10:23, Timo Walther wrote: Hi Austin, could you share some details of your SQL query with us? The reason why I'm asking is because I guess that the rowtime field is not inserted into the `StreamRecord` of DataStream API. The ro

Re: Streaming SQL Job Switches to FINISHED before all records processed

2020-10-05 Thread Timo Walther
…Yes, it should also work for ingestion time. I am not entirely sure whether event time is preserved when converting a Table into a retract stream. It should be possible, and if it is not working, then I guess it is a missing feature. But I am sure that @Timo Walther…

Re: Flink Batch Processing

2020-09-29 Thread Timo Walther
Hi Sunitha, currently, not every connector can be mixed with every API. I agree that it is confusing from time to time. The HBase connector is an InputFormat. DataSet, DataStream and Table API can work with InputFormats. The current Hbase input format might work best with Table API. If you li

Re: Flink SQL - can I force the planner to reuse a temporary view to be reused across multiple queries that access different fields?

2020-09-28 Thread Timo Walther
Hi Dan, unfortunately, it is very difficult to read your plan. Maybe you can share a higher resolution and highlight which part of the pipeline is A, B etc. In general, the planner should be smart enough to reuse subplans where appropriate. Maybe this is a bug or shortcoming in the optimizer r…

Re: Back pressure with multiple joins

2020-09-25 Thread Timo Walther
Hi Dan, could you share the plan with us using `TableEnvironment.explainSql()` for both queries? In general, views should not have an impact on the performance. They are a logical concept that gives a bunch of operations a name. The contained operations are inlined into the bigger query duri

Re: RichFunctions in Flink's Table / SQL API

2020-09-23 Thread Timo Walther
Hi Piyush, unfortunately, UDFs have no direct access to Flink's state. Aggregate functions are the only type of functions that can be stateful at the moment. Aggregate functions store their state in an accumulator that is serialized/deserialized on access, but an accumulator field can be back
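
For illustration, a minimal sketch of such an accumulator-backed function, assuming the Table API AggregateFunction of that era (all names illustrative):

    import org.apache.flink.table.functions.AggregateFunction;

    public class CountFunction
            extends AggregateFunction<Long, CountFunction.CountAcc> {

        // the accumulator is the function's only state; it is
        // serialized/deserialized on access as described above
        public static class CountAcc {
            public long count = 0L;
        }

        @Override
        public CountAcc createAccumulator() {
            return new CountAcc();
        }

        public void accumulate(CountAcc acc, String value) {
            acc.count += 1;
        }

        @Override
        public Long getValue(CountAcc acc) {
            return acc.count;
        }
    }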

Re: Is there a way to avoid submit hive-udf's resources when we submit a job?

2020-09-22 Thread Timo Walther
Hi Husky, I guess https://issues.apache.org/jira/browse/FLINK-14055 is what is needed to make this feature possible. @Rui: Do you know more about this issue and current limitations? Regards, Timo On 18.09.20 09:11, Husky Zeng wrote: When we submit a job which uses a Hive UDF, the job will…

Re: hourly counter

2020-09-22 Thread Timo Walther
Hi Lian, you are right that timers are not available in a ProcessWindowFunction, but the state store can be accessed. So given that your window width is 1 min, you could maintain an additional state value for counting the minutes and updating your counter once this value reaches 60. Otherwise…
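
A rough sketch of that suggestion, assuming one-minute windows keyed by String with per-window counts of type Long (names illustrative):

    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
    import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
    import org.apache.flink.util.Collector;

    public class HourlyCounter
            extends ProcessWindowFunction<Long, Long, String, TimeWindow> {

        private final ValueStateDescriptor<Integer> minutesDesc =
                new ValueStateDescriptor<>("minutes-seen", Integer.class);
        private final ValueStateDescriptor<Long> sumDesc =
                new ValueStateDescriptor<>("hourly-sum", Long.class);

        @Override
        public void process(String key, Context ctx, Iterable<Long> counts,
                            Collector<Long> out) throws Exception {
            // globalState() survives across windows of the same key
            ValueState<Integer> minutes = ctx.globalState().getState(minutesDesc);
            ValueState<Long> sum = ctx.globalState().getState(sumDesc);

            long total = sum.value() == null ? 0L : sum.value();
            for (Long c : counts) {
                total += c;
            }
            int seen = (minutes.value() == null ? 0 : minutes.value()) + 1;

            if (seen >= 60) { // 60 one-minute windows have passed
                out.collect(total);
                seen = 0;
                total = 0L;
            }
            minutes.update(seen);
            sum.update(total);
        }
    }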

Re: Zookeeper connection loss causing checkpoint corruption

2020-09-21 Thread Timo Walther
Hi Arpith, is there a JIRA ticket for this issue already? If not, it would be great if you can report it. This sounds like a critical priority issue to me. Thanks, Timo On 22.09.20 06:25, Arpith P wrote: Hi Peter, I have recently had a similar issue where I could not load from the checkpoi

Re: How to disconnect taskmanager via rest api?

2020-09-21 Thread Timo Walther
Hi Luan, this sounds more like a new feature request to me. Maybe you can already open an issue for it. I will loop in Chesnay in CC if there is some possibility to achieve this already? Regards, Timo On 21.09.20 06:37, Luan Cooper wrote: Hi We're running a Flink standalone cluster on k8s whe…

Re: Problem with zookeeper and flink config

2020-09-21 Thread Timo Walther
Hi Saksham, if I understand you correctly, you are running Zookeeper and Flink locally on your machine? Are you using Docker or is this a bare metal setup? The exception indicates that your paths contain `hdfs:` as URL scheme. Are you sure you want to use HDFS? If yes, you need to add an HDFS

Re: Watermark advancement in late side output

2020-09-21 Thread Timo Walther
Hi Ori, first of all, watermarks are sent to all side outputs (this is tested here [1]). Thus, operators in the side output branch of the pipeline will work similar to operators in the main branch. When calling `assignTimestampsAndWatermarks`, the inserted operator will erase incoming waterm

Re: Backquote in SQL dialect

2020-09-17 Thread Timo Walther
Hi Satyam, this has historical reasons. In the beginning all SQL queries were embedded in Java programs and thus Java strings. So single quote was handy for declaring SQL strings in a Java string and backticks for escaping keywords. But I agree that we should make this configurable. Feel free

Re:

2020-09-10 Thread Timo Walther
Hi Violeta, I just noticed that the plan might be generated from Flink's old planner instead of the new, more performant Blink planner. Which planner are you currently using? Regards, Timo On 08.09.20 17:51, Timo Walther wrote: You are using the old connectors. The new connector

Re: Slow Performance inquiry

2020-09-10 Thread Timo Walther
…r way? Regards, Heidy ---- From: Timo Walther <twal...@apache.org> Sent: Wednesday, September 9, 2020 1:58 PM To: user@flink.apache.org

Re: Slow Performance inquiry

2020-09-09 Thread Timo Walther
Hi Hazem, I guess your performance is mostly driven by the serialization overhead in this case. How do you declare your state type? Flink comes with different serializers. Not all of them are extracted automatically when using reflective extraction methods: - Note that `Serializable` decla
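
For illustration, a sketch of declaring the state type explicitly instead of relying on reflective extraction (`MyPojo` is hypothetical; these declarations would live in a RichFunction):

    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.common.typeinfo.TypeHint;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import java.util.Map;

    // a proper POJO type gets an efficient, evolvable serializer
    ValueStateDescriptor<MyPojo> pojoState =
            new ValueStateDescriptor<>("my-state", TypeInformation.of(MyPojo.class));

    // generic types need a TypeHint so they don't fall back to Kryo
    ValueStateDescriptor<Map<String, Long>> countsState =
            new ValueStateDescriptor<>("counts",
                    TypeInformation.of(new TypeHint<Map<String, Long>>() {}));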

Re: Flink alert after database lookUp

2020-09-09 Thread Timo Walther
…possible sample source code for reference to stream database. Please help, I am badly stuck. In the mail, I see you asked me to register. Are you referring to any training here or any other registration? Regards, Sunitha. On Tuesday, September 8, 2020, 08:19:49 PM GMT+5:30, Timo Walther wrote…

Re:

2020-09-08 Thread Timo Walther
You are using the old connectors. The new connectors are available via SQL DDL (and execute_sql() API) like documented here: https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/connectors/filesystem.html Maybe this will give you some performance boost, but certainly not enough…
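
For reference, a sketch of the DDL-based filesystem connector described on that page, assuming a TableEnvironment named tableEnv (schema and path are illustrative):

    tableEnv.executeSql(
        "CREATE TABLE my_source ("
            + "  user_id STRING,"
            + "  amount DOUBLE"
            + ") WITH ("
            + "  'connector' = 'filesystem',"
            + "  'path' = 'file:///path/to/input',"
            + "  'format' = 'csv'"
            + ")");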

Re: Flink alert after database lookUp

2020-09-08 Thread Timo Walther
Hi Sunitha, what you are describing is a typical streaming enrichment. We need to enrich the stream with some data from a database. There are different strategies to handle this: 1) You are querying the database for every record. This is usually not what you want because it would slow down y
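
When per-record lookups are unavoidable, async I/O is a common way to soften that slowdown. A sketch, where `events` is the input stream and `AsyncDatabaseRequest` is a hypothetical RichAsyncFunction issuing the queries:

    import java.util.concurrent.TimeUnit;
    import org.apache.flink.streaming.api.datastream.AsyncDataStream;
    import org.apache.flink.streaming.api.datastream.DataStream;

    // at most 100 in-flight requests, 1 s timeout per lookup
    DataStream<EnrichedEvent> enriched = AsyncDataStream.unorderedWait(
            events, new AsyncDatabaseRequest(), 1, TimeUnit.SECONDS, 100);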

Re: FLINK DATASTREAM Processing Question

2020-09-08 Thread Timo Walther
Hi Vijay, one comment to add is that the performance might suffer with multiple map() calls. For safety reasons, records between chained operators are serialized and deserialized so that they strictly cannot influence each other. If all functions of a pipeline are guaranteed to not modify incoming…
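
If every function in the pipeline is guaranteed not to cache or mutate its inputs, those defensive copies can be switched off. A sketch:

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();
    // skips the copies between chained operators; only safe if no
    // function holds on to or modifies incoming records
    env.getConfig().enableObjectReuse();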

Re:

2020-09-08 Thread Timo Walther
Hi Violeta, can you share your connector code with us? The plan looks quite complicated given the relatively simple query. Maybe there is some optimization potential. But before we dive deeper, I see a `Map(to: Row)` which indicates that we might work with a legacy sink connector. Did you tr

Re: Editing Rowtime for SQL Table

2020-09-02 Thread Timo Walther
…m On Tue, Sep 1, 2020 at 4:46 AM Timo Walther <t...@ververica.com> wrote: Hi Satyam, Matthias is right. A rowtime attribute cannot be modified and needs to be passed "as is" through the pipeline. The only exceptions are if a newer rowtime is offered…

Re: UT/IT Table API/SQL streaming job

2020-08-19 Thread Timo Walther
Hi, this might be helpful as well: https://lists.apache.org/thread.html/rfe3b45a10fc58cf19d2f71c6841515eb7175ba731d5055b06f236b3f%40%3Cuser.flink.apache.org%3E First of all, it is important to know if you are interested in end-to-end tests (incl. connectors) or excluding connectors. If you jus

Re: Flink SQL UDAF com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID

2020-08-18 Thread Timo Walther
Hi Forideal, luckily these problems will belong to the past in Flink 1.12 when UDAF are updated to the new type system [1]. Lists will be natively supported and registering custom KryoSerializers consistently as well. Until then, another workaround is to override getAccumulatorType() and def
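
A sketch of that workaround inside a pre-1.12 AggregateFunction subclass, where `MyAcc` is a hypothetical accumulator containing a List:

    import org.apache.flink.api.common.typeinfo.TypeHint;
    import org.apache.flink.api.common.typeinfo.TypeInformation;

    @Override
    public TypeInformation<MyAcc> getAccumulatorType() {
        // declare the accumulator type explicitly instead of letting the
        // legacy type extraction fall back to Kryo defaults
        return TypeInformation.of(new TypeHint<MyAcc>() {});
    }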

Re: How to write a customer sink partitioner when using flinksql kafka-connector

2020-08-18 Thread Timo Walther
Hi Lei, you can check how the FlinkFixedPartitioner [1] or Tuple2FlinkPartitioner [2] are implemented. Since you are using SQL connectors of the newest generation, you should receive an instance of org.apache.flink.table.data.RowData in your partitioner. You can create a Maven project with a
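
A sketch of such a partitioner, modeled on the two classes referenced above; the routing logic is purely illustrative:

    import org.apache.flink.streaming.connectors.kafka.partitioner.FlinkKafkaPartitioner;
    import org.apache.flink.table.data.RowData;

    public class MyCustomPartitioner extends FlinkKafkaPartitioner<RowData> {

        @Override
        public int partition(RowData record, byte[] key, byte[] value,
                             String targetTopic, int[] partitions) {
            // route by the first (String) column of the row
            int hash = record.getString(0).toString().hashCode();
            return partitions[Math.floorMod(hash, partitions.length)];
        }
    }

The class can then be referenced through the Kafka table's 'sink.partitioner' option.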

Re: Flink 1.11 SQL error: No operators defined in streaming topology. Cannot execute.

2020-08-13 Thread Timo Walther
Hi Lu, `env.execute("table api");` is not necessary after FLIP-84 [1]. Every method that has `execute` in its name will immediately execute a job. Therefore your `env.execute` has an empty pipeline. Regards, Timo [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
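
In other words (a sketch; table names illustrative, tableEnv being the TableEnvironment):

    // submits a job by itself under the FLIP-84 semantics
    tableEnv.executeSql("INSERT INTO sink_table SELECT * FROM source_table");

    // a trailing env.execute("table api") now sees an empty pipeline and
    // fails with "No operators defined in streaming topology"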

Re: Is there a way to start a timer without ever receiving an event?

2020-08-13 Thread Timo Walther
What you can do is create an initial control stream, e.g. using `StreamExecutionEnvironment.fromElements()`, and either use `union(controlStream, actualStream)` or use `actualStream.connect(controlStream)`. Regards, Timo On 12.08.20 18:15, Andrey Zagrebin wrote: I do not think so. Each timer…
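
A compact sketch of the union variant, assuming String streams and an existing `env` and `actualStream` (names illustrative):

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
    import org.apache.flink.util.Collector;

    DataStream<String> control = env.fromElements("bootstrap");

    actualStream
        .union(control)
        .keyBy(v -> v)
        .process(new KeyedProcessFunction<String, String, String>() {
            @Override
            public void processElement(String value, Context ctx,
                                       Collector<String> out) {
                // the bootstrap element lets a timer be registered before
                // any real event has arrived
                ctx.timerService().registerProcessingTimeTimer(
                        ctx.timerService().currentProcessingTime() + 60_000L);
            }
        });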

Re: Using Event Timestamp sink get's back with machine timezone

2020-08-12 Thread Timo Walther
Hi Faye, the problem lies in the wrong design of JDK's java.sql.Timestamp. You can also find a nice summary in the answer here [1]. java.sql.Timestamp is timezone dependent. Internally, we subtract/normalize the timezone and work with the UNIX timestamp. Beginning from Flink 1.9 we are using

Re: how to add a new runtime operator

2020-08-12 Thread Timo Walther
Hi Vincent, we don't have a step by step guide for adding new operators. Most of the important operations are exposed via DataStream API. Esp. ProcessFunction [1] fits for most complex use cases with access to the primitives such as time and state. What kind of operator is missing for your u

Re: Event time based disconnection detection logic

2020-08-11 Thread Timo Walther
…This produces the expected output. Also, I will assume that this is the best way to solve my problem - I can't use Flink's session windows. Let me know if anyone has any other ideas though! Thank you for your time and quick response! On Tue, Aug 11, 2020 at 1:45 PM Timo Walther…

Re: Proper way to do Integration Testing ?

2020-08-11 Thread Timo Walther
Hi Faye, Flink does not officially provide testing tools at the moment. However, you can use internal Flink tools if they solve your problem. The `flink-end-to-end-tests` module [1] shows some examples how we test Flink together with other systems. Many tests are still using plain bash scrip

Re: Batch version of StreamingFileSink.forRowFormat(...)

2020-08-11 Thread Timo Walther
Hi Dan, InputFormats are the connectors of the DataSet API. Yes, you can use readFile, readCsvFile, readFileOfPrimitives, etc. However, I would recommend also giving Table API a try. The unified TableEnvironment is able to perform batch processing and is integrated with a bunch of conn…
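
For example, a sketch of the DataSet CSV input (path illustrative):

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.tuple.Tuple2;

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    DataSet<Tuple2<String, Integer>> input = env
            .readCsvFile("file:///path/to/data.csv")
            .types(String.class, Integer.class);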

Re: Event time based disconnection detection logic

2020-08-11 Thread Timo Walther
Hi Manas, at the first glance your code looks correct to me. I would investigate if your keys and watermarks are correct. Esp. the watermark frequency could be an issue. If watermarks are generated at the same time as the heartbeats itself, it might be the case that the timers fire first befo

Re: [SQL DDL] How to extract timestamps from Kafka message's metadata

2020-08-11 Thread Timo Walther
Hi Dongwon, another possibility is to use DataStream API before. There you can extract the metadata and use DataStream.assignTimestampsAndWatermarks before converting the stream to a table. Regards, Timo On 11.08.20 09:41, Dongwon Kim wrote: Hi Dawid, I'll try your suggestion [2] and wait
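
A sketch of that approach with the 1.11-era APIs; `Event`, `kafkaStream`, and `tableEnv` are assumed, and the Kafka record timestamp is taken from the pre-assigned record timestamp:

    import java.time.Duration;
    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.table.api.Table;
    import static org.apache.flink.table.api.Expressions.$;

    DataStream<Event> withTimestamps = kafkaStream.assignTimestampsAndWatermarks(
            WatermarkStrategy.<Event>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                    .withTimestampAssigner((event, recordTs) -> recordTs));

    // expose the extracted timestamp as a rowtime attribute
    Table table = tableEnv.fromDataStream(withTimestamps,
            $("payload"), $("rowtime").rowtime());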

Re: Flink 1.11.0 UDAF fails with "No match found for function signature fun()"

2020-07-29 Thread Timo Walther
s about TableFunction, so maybe it is something different, but related. I have created a small github project with both cases: https://github.com/dmytroDragan/flink-1.11-sql-agg-issue/blob/master/src/test/java/lets/test/flink/AggFunTest.java I would appreciate if you could take a look. On 27/07/2

Re: problem with build from source flink 1.11

2020-07-27 Thread Timo Walther
…recommended Maven version for building Flink. @Felipe Can you provide us with the full stacktrace? This could be a library issue in regards to JDK compatibility. On 27/07/2020 15:23, Timo Walther wrote: Hi Felipe, are you sure that Maven and the TaskManagers are using the JDK version that you mentioned…

Re: Flink 1.11.0 UDAF fails with "No match found for function signature fun()"

2020-07-27 Thread Timo Walther
Hi Dmytro, aggregate functions will support the new type system in Flink 1.12. Until then, they cannot be used with the new `call()` syntax as anonymous functions. In order to use the old type system, you need to register the function explicitly using SQL `CREATE FUNCTION a AS 'myFunc'` and t…
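
A sketch of that registration, assuming a TableEnvironment named tableEnv and an illustrative class name:

    // registers the function through the old type system stack
    tableEnv.executeSql(
        "CREATE FUNCTION myAgg AS 'com.example.MyAggregateFunction'");
    Table result = tableEnv.sqlQuery("SELECT myAgg(amount) FROM orders");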

Re: problem with build from source flink 1.11

2020-07-27 Thread Timo Walther
Hi Felipe, are you sure that Maven and the TaskManagers are using the JDK version that you mentioned? Usually, a `mvn clean install` in the `.../flink/` directory should succeed without any problems. Also your Maven version seems pretty old. I'm using Apache Maven 3.6.3 for example. The No

Re: Kafka connector with PyFlink

2020-07-27 Thread Timo Walther
…you check that? @Timo I think we should define the serialVersionUID for all the classes which implement Serializable. What do you think? Regards, Dian On 27 Jul 2020, at 16:38, Timo Walther <twal...@apache.org> wrote: Hi, the InvalidClassException indicates that you are using…

Re: Flink 1.11 Simple pipeline (data stream -> table with egg -> data stream) failed

2020-07-27 Thread Timo Walther
…them, but I'm wondering if there is a better way to do it. It seems quite strange that, with the Blink planner optimising pipelines for both stream and batch jobs, there is a necessity to write the same things in the DataStream and DataSet APIs. On 24/07/2020, 15:36, "Timo…

Re: Unable to deduce RocksDB api calls in streaming.

2020-07-27 Thread Timo Walther
Hi Aviral, as far as I know we are not calling RocksDB API to perform snapshots. As the Stackoverflow answer also indicates most of the snapshotting is done outside of RocksDB by just dealing with the SST files. Have you checked the available metrics in the web UI? https://ci.apache.org/proj

Re: Kafka connector with PyFlink

2020-07-27 Thread Timo Walther
Hi, the InvalidClassException indicates that you are using different versions of the same class. Are you sure you are using the same Flink minor version (including the Scala suffix) for all dependencies and Kubernetes? Regards, Timo On 27.07.20 09:51, Wojciech Korczyński wrote: Hi, when

Re: Flink 1.11 Simple pipeline (data stream -> table with egg -> data stream) failed

2020-07-24 Thread Timo Walther
Hi Dmytro, `StreamTableEnvironment` does not support batch mode currently. Only `TableEnvironment` supports the unified story. I saw that you disabled the check in the `create()` method. This check exists for a reason. For batch execution, the planner sets specific properties on the stream g
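
For reference, a sketch of the supported unified path in Flink 1.11:

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    EnvironmentSettings settings = EnvironmentSettings.newInstance()
            .useBlinkPlanner()
            .inBatchMode()
            .build();
    // the unified environment, not StreamTableEnvironment
    TableEnvironment tEnv = TableEnvironment.create(settings);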

Re: Byte arrays in Avro

2020-07-16 Thread Timo Walther
definitely the `AvroSerializer` if the type information is `AvroTypeInfo`. You can check that via `dataStream.getType`. I hope this helps. Regards, Timo On 16.07.20 14:28, Timo Walther wrote: Hi Lasse, are you using Avro specific records? A look into the code shows that the warnings in the

Re: Byte arrays in Avro

2020-07-16 Thread Timo Walther
Hi Lasse, are you using Avro specific records? A look into the code shows that the warnings in the log are generated after the Avro check: https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/java/typeutils/TypeExtractor.java#L1741 Somehow your Avro object

Re: Convert sql table with field of type MULITSET to datastream with field of type java.util.Map[T, java.lang.Integer]

2020-06-29 Thread Timo Walther
Hi YI, not all conversions might be supported in the `toRetractStream` method. Unfortunately, the rework of the type system is still in progress. I hope we can improve the user experience there quite soon. Have you tried to use `Row` instead? `toRetractStream[Row]` should work for all data types…
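
In Java, that corresponds roughly to this sketch (`tableEnv` and `table` assumed):

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.types.Row;

    // Boolean flag: true = insert/accumulate, false = retract
    DataStream<Tuple2<Boolean, Row>> retractStream =
            tableEnv.toRetractStream(table, Row.class);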

Re: Error reporting for Flink jobs

2020-06-29 Thread Timo Walther
Hi Satyam, I'm not aware of an API to solve all your problems at once. A common pattern for failures in user-code is to catch errors in user-code and define a side output for an operator to pipe the errors to dedicated sinks. However, such a functionality does not exist in SQL yet. For the si

Re: Reading from AVRO files

2020-06-10 Thread Timo Walther
Hi Lorenzo, as far as I know we don't support Avro's logical times in Flink's AvroInputFormat yet. E.g. AvroRowDeserializationSchema [1] supports the 1.8.2 version of logical types but might be incompatible with 1.9.2. Reg 2) Specific record generated with AVRO 1.9.2 plugin: Could you send u

Re: Incremental state

2020-06-10 Thread Timo Walther
Hi Annemarie, if TTL is what you are looking for and queryable state is what limits you, it might make sense to come up with a custom implementation of queryable state. TTL might be more difficult to implement. As far as I know, it is more of an experimental feature without any consi…

Re: Writing to SQL server

2020-05-21 Thread Timo Walther
Hi Martin, usually, this error occurs when people forget to add `org.apache.flink.api.scala._` to their imports. It is triggered by the Scala macro that the DataStream API uses for extracting types. Can you try to call `result.toAppendStream[Row]` directly? This should work if you import `or

Re: Properly using ConnectorDescriptor instead of registerTableSource

2020-05-18 Thread Timo Walther
Hi Nikola, the reason for deprecating `registerTableSource` is that we aim to have everything declarative in Table API. A table program should simply declare what it needs and the planner should find a suitable connector, regardless how the underlying class structure looks like. This might al

Re: Infer if a Table will create an AppendStream / RetractStream

2020-05-18 Thread Timo Walther
Hi Yuval, currently there is no API for getting those insights. I guess you need to use internal API for getting this information. Which planner and version are you using? Regards, Timo On 18.05.20 14:16, Yuval Itzchakov wrote: Hi, Is there any way to infer if a Table is going to generate

Re: Publishing Sink Task watermarks outside flink

2020-05-18 Thread Timo Walther
…out of the box, is it possible to add some extra operator after the sink which will always have a watermark greater than the sink function's watermarks, as it's a downstream operator? Also, does the problem simplify if we have a Kafka sink? On Tue, Apr 28, 2020 at 10:35 PM Timo Walther…

Re: ML/DL via Flink

2020-04-28 Thread Timo Walther
Hi Max, as far as I know a better ML story for Flink is in the making. I will loop in Becket in CC who may give you more information. Regards, Timo On 28.04.20 07:20, m@xi wrote: Hello Flinkers, I am building a *streaming* prototype system on top of Flink and I want ideally to enable ML tra

Re: Publishing Sink Task watermarks outside flink

2020-04-28 Thread Timo Walther
Hi Shubham, you can call stream.process(...). The context of ProcessFunction gives you access to TimerService which lets you access the current watermark. I'm assuming you are using the Table API? As far as I remember, watermarks are travelling through the stream even if there is no time-ba…
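
A sketch of such a ProcessFunction; `Event` is hypothetical, and publishing the value externally is left open:

    import org.apache.flink.streaming.api.functions.ProcessFunction;
    import org.apache.flink.util.Collector;

    public class WatermarkObserver extends ProcessFunction<Event, Event> {

        @Override
        public void processElement(Event value, Context ctx,
                                   Collector<Event> out) {
            // the watermark as seen by this operator at this moment
            long currentWatermark = ctx.timerService().currentWatermark();
            // publish currentWatermark to an external system here
            out.collect(value);
        }
    }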

Re: Converting String/boxed-primitive Array columns back to DataStream

2020-04-28 Thread Timo Walther
…Timo, I am trying to convert simply back to a DataStream. Let's say: DataStream<...> I can convert the DataStream into a table without a problem, the problem is getting a DataStream back. Thanks Gyula On Tue, Apr 28, 2020 at 6:32 PM Timo Walther <twal...@apache.org> wrote…

Re: "Fill in" notification messages based on event time watermark

2020-04-28 Thread Timo Walther
Hi Manas, Reg. 1: I would recommend to use a debugger in your IDE and check which watermarks are travelling through your operators. Reg. 2: All event-time operations are only performed once the watermark arrived from all parallel instances. So roughly speaking, in machine time you can assume

Re: Converting String/boxed-primitive Array columns back to DataStream

2020-04-28 Thread Timo Walther
Hi Gyula, are you coming from DataStream API or are you trying to implement a source/sink? It looks like the array is currently serialized with Kryo. I would recommend to take a look at this class: org.apache.flink.table.types.utils.LegacyTypeInfoDataTypeConverter This is the current mapping

Re: Joining table with row attribute against an enrichment table

2020-04-20 Thread Timo Walther
Hi Gyula, first of all the exception ``` org.apache.flink.table.api.TableException: Rowtime attributes must not be in the input rows of a regular join. As a workaround you can cast the time attributes of input tables to TIMESTAMP before. ``` is IMHO one of the biggest shortcomings that we cu

Re: Processing Message after emitting to Sink

2020-04-15 Thread Timo Walther
Yes. But that's the problem of your use cases, right? If you need to wait for the sink to be completed, it is not a terminating operator anymore. Regards, Timo On 15.04.20 10:50, KristoffSC wrote: Thank you very much for your answer. I have a question regarding your first paragraph: " it req

Re: Objects with fields that are not serializable

2020-04-15 Thread Timo Walther
Hi Dominik, Flink does not use Java serialization logic for network communication. So objects do not need to implement `Serializable` for usage during runtime (DataStream). Only if those classes are member variables of a Function like MapFunction do they need to be serializable, in order to ship the function cod…

Re: Processing Message after emitting to Sink

2020-04-15 Thread Timo Walther
Hi Kristoff, synchronization across operators is not easy to achieve. If one needs to wait until a sink has processed some element, it requires that a sink participates in the pipeline. So it is not located as a "leaf" operator but sits somewhere in the middle. So your idea to call MQ di…

Re: Registering UDAF in blink batch app

2020-04-14 Thread Timo Walther
Hi Dmytro, table function will be supported in Flink 1.11 with the new type system. Hopefully, we can also support aggregate functions until then. Regards, Timo On 14.04.20 15:33, godfrey he wrote: Hi Dmytro, Currently, TableEnvironment does not support register AggregationFunction and Tab

Re: Checkpoints for kafka source sometimes get 55 GB size (instead of 2 MB) and flink job fails during restoring from such checkpoint

2020-04-14 Thread Timo Walther
Hi Oleg, this sounds indeed like abnormal behavior. Are you sure that these large checkpoints are related to the Kafka consumer only? Are there other operators in the pipeline? Because internally the state kept in a Kafka consumer is pretty minimal and only related to Kafka partition and offs

Re: Flink sql Session window

2020-04-14 Thread Timo Walther
Hi, currently we don't provide more flexible windowing semantics in SQL. For this, a programmatic API like the DataStream API is a better fit with custom triggers and other more advanced features. Regards, Timo On 14.04.20 13:31, snack white wrote: Hi, In flink sql session window, is ther

Re: Flink

2020-04-14 Thread Timo Walther
Hi Navneeth, it might also be worth looking into Ververica Platform for this. The community edition was published recently and is free of charge. It provides first-class K8s support [1]. There is also a tutorial on how to deploy it on EKS [2] (not the most recent one, though). Regards, Timo [1]

Re: Inserting nullable data into NOT NULL columns

2020-04-09 Thread Timo Walther
Hi Gyula, some disclaimer: the type system rework is still ongoing and there are a couple of known issues and missing end-to-end tests around this topic. I would therefore recommend to declare the sink as `STRING NULL` for now. Can you open an issue for your concrete use case with some example

Re: [SURVEY] What Change Data Capture tools are you using?

2020-04-03 Thread Timo Walther
Hi Jark, thanks for sharing the results. I think for the databases usage part, I'm pretty sure that we could also rely on some Gartner landscape research. Just an idea. Thanks for performing the survey! Timo On 03.04.20 10:17, Jark Wu wrote: Hi everyone, Thanks all for the feedbacks! I wo

Re: Questions regarding Key Managed state

2020-04-02 Thread Timo Walther
Hi Kristoff, case 1: first of all Flink groups keys internally into so-called "key groups" for reducing the management overhead. The maximum parallelism decides about the number of key groups. When performing a rescale, the key groups are basically distributed using some consistent hashing al
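
Paraphrased as a sketch (the real logic lives in KeyGroupRangeAssignment in flink-runtime):

    import org.apache.flink.util.MathUtils;

    // key -> key group; stable regardless of the current parallelism
    static int keyGroupFor(Object key, int maxParallelism) {
        return MathUtils.murmurHash(key.hashCode()) % maxParallelism;
    }

    // key group -> operator subtask under the current parallelism
    static int operatorIndexFor(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }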

Re: Perform processing only when watermark updates, buffer data otherwise

2020-04-02 Thread Timo Walther
Hi Manas, first of all, after assigning watermarks at the source level, usually Flink operators make sure to handle the watermarks. In case of a `union()`, the subsequent operator will increment its internal event-time clock and emit a new watermark only if all input streams (and their paral

Re: Making job fail on Checkpoint Expired?

2020-04-02 Thread Timo Walther
Hi Robin, this is a very good observation and maybe even unintended behavior. Maybe Arvid in CC is more familiar with the checkpointing? Regards, Timo On 02.04.20 15:37, Robin Cassan wrote: Hi all, I am wondering if there is a way to make a flink job fail (not cancel it) when one or sever

Re: Issues with Watermark generation after join

2020-03-24 Thread Timo Walther
Or better: "But for sources, you need to emit a watermark from all sources in order to have progress in event-time." On 24.03.20 13:09, Timo Walther wrote: Hi, 1) yes with "partition" I meant "parallel instance". If the watermarking is correct in the DataStre

Re: Issues with Watermark generation after join

2020-03-24 Thread Timo Walther
…watermark progresses and the output is generated? Best Regards, Dom. On Tue, 24 Mar 2020 at 10:01, Timo Walther <twal...@apache.org> wrote: Hi Dominik, the big conceptual difference between DataStream and Table API is that record timestamps are part of the schema in…

Re: Issues with Watermark generation after join

2020-03-24 Thread Timo Walther
Hi Dominik, the big conceptual difference between DataStream and Table API is that record timestamps are part of the schema in Table API whereas they are attached internally to each record in DataStream API. When you call `y.rowtime` during a stream to table conversion, the runtime will extra

Re: Need help on timestamp type conversion for Table API on Pravega Connector

2020-03-24 Thread Timo Walther
…o/pravega/connectors/flink/FlinkPravegaTableITCase.java#L310 Best Regards, Brian From: Jark Wu Sent: Thursday, March 19, 2020 20:25 To: Till Rohrmann Cc: Zhou, Brian; Timo Walther; Jingsong Li; user Subject: Re: Need help on timestamp type conversion for Table API on Pravega Connect…

Re: time-windowed joins and tumbling windows

2020-03-18 Thread Timo Walther
…in the 2nd time window because it is from the first table and 3rd one. -- Vinod On Fri, Mar 13, 2020 at 6:42 AM Timo Walther <twal...@apache.org> wrote: Hi Vinod, I cannot spot any problems in your SQL query. Some questions for clarification: 1) Whic…

Re: time-windowed joins and tumbling windows

2020-03-13 Thread Timo Walther
Hi Vinod, I cannot spot any problems in your SQL query. Some questions for clarification: 1) Which planner are you using? 2) How do you create your watermarks? 3) Did you unit test with only parallelism of 1 or higher? 4) Can you share the output of TableEnvironment.explain() with us? Shouldn't

Re: Updating ValueState not working in hosted Kinesis

2020-02-19 Thread Timo Walther
…How do I prevent this undesirable behaviour? I'm quite happy for my solution to serialize only what I explicitly tell it to; I don't need exactly-once or anything. Many thanks, Chris Stevens Head of Research & Development +44 7565 034 595 On Wed, 19 Feb 2020 at 12:19, Timo Walther…

Fwd: Re: Updating ValueState not working in hosted Kinesis

2020-02-19 Thread Timo Walther
…working in hosted Kinesis Date: Wed, 19 Feb 2020 12:02:16 + From: Chris Stevens To: Timo Walther Hi Timo, Thanks for your reply. This makes sense to me; how do I treat something as a POJO instead of a generic serialized BB type? Sorry, relatively new to Java and Flink.

Re: Table API: Joining on Tables of Complex Types

2020-02-19 Thread Timo Walther
…which will end up in this metadata loss. We know what fields the Table must be composed of, but we just won't know which index they live in, so Row#getField() isn't quite what we need. // ah -Original Message- From: Timo Walther Sent: Friday, January 17, 2020 11:29 AM To:

Re: How Do i Serialize a class using default kryo and protobuf in scala

2020-02-19 Thread Timo Walther
Hi, would Apache Avro be an option for you? Because this is currently still the best-supported format when it comes to schema upgrades, as far as I know. Maybe Gordon in CC can give you some additional hints. Regards, Timo On 18.02.20 10:38, ApoorvK wrote: I have some case classes which have…

Re: Updating ValueState not working in hosted Kinesis

2020-02-19 Thread Timo Walther
Hi Chris, it seems there are fields serialized into state that actually don't belong there. You should aim to treat Sensor as a POJO instead of a Kryo generic serialized black-box type. Furthermore, it seems that fields such as "org.apache.logging.log4j.core.layout.AbstractCsvLayout" should not…
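
For Flink to pick the PojoSerializer instead of Kryo, the class must roughly follow the POJO rules, as in this hypothetical sketch of the Sensor type:

    // public class, public no-argument constructor, and every field either
    // public or reachable via getters/setters
    public class Sensor {
        public String id;
        public double reading;

        public Sensor() {}
    }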

Re: CSV StreamingFileSink

2020-02-19 Thread Timo Walther
Hi Austin, the StreamingFileSink allows bucketing the output data. This should help for your use case: https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/streamfile_sink.html#bucket-assignment Regards, Timo On 19.02.20 01:00, Austin Cawley-Edwards wrote: Following up on th
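
A sketch of a bucketed row-format sink (the default date/time assigner is shown explicitly; path and format are illustrative):

    import org.apache.flink.api.common.serialization.SimpleStringEncoder;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
    import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.DateTimeBucketAssigner;

    StreamingFileSink<String> sink = StreamingFileSink
            .forRowFormat(new Path("s3://my-bucket/output"),
                          new SimpleStringEncoder<String>("UTF-8"))
            .withBucketAssigner(new DateTimeBucketAssigner<>("yyyy-MM-dd--HH"))
            .build();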

Re: Parallelize Kafka Deserialization of a single partition?

2020-02-19 Thread Timo Walther
Hi Theo, there are a lot of performance improvements that Flink could do, but they would further complicate the interfaces and API. They would require deep knowledge from users about the runtime, i.e. when it is safe to reuse objects and when not. The Table/SQL API of Flink uses a lot of these optimizat…

Re: BucketingSink capabilities for DataSet API

2020-02-19 Thread Timo Walther
Hi Anuj, another option would be to use the new Hive connectors. Have you looked into those? They might work on SQL internal data types which is why you would need to use the Table API then. Maybe Bowen in CC can help you here. Regards, Timo On 19.02.20 11:14, Rafi Aroch wrote: Hi Anuj, I

Re: 1.9 timestamp type default

2020-02-14 Thread Timo Walther
Hi, the type system is still under heavy refactoring that touches a lot of interfaces. Where would you like to use java.sql.Timestamp? UDFs are not well supported right now. Sources and sinks might work for the Blink planner, and java.sql.Timestamp is the only supported conversion class of old…

Re: [ANNOUNCE] Apache Flink 1.10.0 released

2020-02-12 Thread Timo Walther
Congratulations everyone! Great stuff :-) Regards, Timo On 12.02.20 16:05, Leonard Xu wrote: Great news! Thanks to everyone involved! Thanks Gary and Yu for being the release managers! Best, Leonard Xu On 12 Feb 2020, 23:02, Stephan Ewen wrote: Congrats to us all. A big piece of work, nicely done…

Re: Best approach for recalculating statistics based on amended or deleted events?

2020-02-06 Thread Timo Walther
…so quickly! We could do it with insert-only rows. When you say flags in the data, do you mean a field with a name like 'retracts' and then the value of that field is the id of the event/row we want to retract? How would that be possible with Flink? Thanks! On 2020/02/04 15:27:20, Timo…

Re: Best approach for recalculating statistics based on amended or deleted events?

2020-02-04 Thread Timo Walther
Hi Stephan, the use cases you are describing sound like a perfect fit to Flink. Internally, Flink deals with insertions and deletions that are flowing through the system and can update chained aggregations and complex queries. The only bigger limitation at the moment is that we only support s

Re: time column used by timer

2020-02-04 Thread Timo Walther
Hi, timestamps and watermarks are attached to every stream record in the runtime. After assignTimestampsAndWatermarks() extracted them, Flink handles those attributes internally. For example, it doesn't matter which class you are emitting in a flatMap function, the runtime will set the times

Re: Question: Determining Total Recovery Time

2020-02-04 Thread Timo Walther
Hi Morgan, as far as I know this is not possible mostly because measuring "till the point when the system catches up to the last message" is very pipeline/connector dependent. Some sources might need to read from the very beginning, some just continue from the latest checkpointed offset. Mea

Re: Flink Dynamodb as sink

2020-02-04 Thread Timo Walther
Hi Hemant, maybe this thread from last year could also help you: http://mail-archives.apache.org/mod_mbox/flink-user/201903.mbox/%3c2df93e1c-ae46-47ca-9c62-0d26b2b3d...@gmail.com%3E Someone also proposes a skeleton of the code there. Regards, Timo On 04.02.20 08:10, hemant singh wrote: Than

Re: Table API: Joining on Tables of Complex Types

2020-01-17 Thread Timo Walther
Hi Andreas, if dataset.getType() returns a RowTypeInfo you can ignore this log message. The type extractor runs before the ".returns()" but with this method you override the old type. Regards, Timo On 15.01.20 15:27, Hailu, Andreas wrote: Dawid, this approach looks promising. I’m able to fl
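
A sketch of that combination; `input` and its getters are hypothetical:

    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.typeutils.RowTypeInfo;
    import org.apache.flink.types.Row;

    DataSet<Row> rows = input
            .map(v -> Row.of(v.getName(), v.getCount()))
            // overrides whatever the type extractor inferred for the lambda
            .returns(new RowTypeInfo(Types.STRING, Types.LONG));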
