Re: JDBCInputFormat and SplitDataProperties

2018-08-09 Thread Alexis Sarda
er case, you properly want to annotate them with > Forward Field annoations. > > The number of source tasks is unrelated to the number of splits. If you > have more tasks than splits, some tasks won't process any data. > > Best, Fabian > > [1] > https://ci.apache.org/projects/fli

JDBCInputFormat and SplitDataProperties

2018-08-07 Thread Alexis Sarda
Hi everyone, I have the following scenario: I have a database table with 3 columns: a host (string), a timestamp, and an integer ID. Conceptually, what I'd like to do is: group by host and timestamp -> based on all the IDs in each group, create a mapping to n new tuples -> for each unique tuple,

Re: JDBCInputFormat and SplitDataProperties

2018-08-08 Thread Alexis Sarda
n > (ExecutionEnvironment.getExecutionPlan()) using the plan visualizer [1] and > validate that the result are identical whether you use SDP or not. > > Best, Fabian > > [1] https://flink.apache.org/visualizer/ > > 2018-08-07 22:32 GMT+02:00 Alexis Sarda : > >> H

Re: JDBCInputFormat and SplitDataProperties

2018-08-10 Thread Alexis Sarda
, even though a groupBy was used. - The third and final subtask is a sink that saves back to the database. Does anyone know why parallelism is not being used? Regards, Alexis. On Thu, Aug 9, 2018 at 11:22 AM Alexis Sarda wrote: > Hi Fabian, > > Thanks a lot for the help. The scala Da

Re: JDBCInputFormat and SplitDataProperties

2018-08-10 Thread Alexis Sarda
hare the plan for the program? > > Are you sure that more than 1 split is generated by the JdbcInputFormat? > > 2018-08-10 12:04 GMT+02:00 Alexis Sarda : > >> It seems I may have spoken too soon. After executing the job with more >> data, I can see the following things i

Data loss when connecting keyed streams

2021-05-21 Thread Alexis Sarda-Espinosa
Hello everyone, I just experienced something weird and I'd like to know if anyone has any idea of what could have happened. I have a simple Flink cluster version 1.11.3 running on Kubernetes with a single worker. I was testing a pipeline that connects 2 keyed streams and processes the result

OOM Metaspace after multiple jobs

2021-07-07 Thread Alexis Sarda-Espinosa
Hello, I am currently testing a scenario where I would run the same job multiple times in a loop with different inputs each time. I am testing with a local Flink cluster v1.12.4. I initially got an OOM - Metaspace error, so I increased the corresponding memory in the TM's JVM (to 512m), but it

Re: OOM Metaspace after multiple jobs

2021-07-07 Thread Alexis Sarda-Espinosa
cleaned up? In this scenario only my jobs would be running in the cluster, so I can have a bit more control. Regards, Alexis. From: Alexis Sarda-Espinosa Sent: Thursday, July 8, 2021 12:14 AM To: user@flink.apache.org Subject: OOM Metaspace after multiple jobs

Re: Using Flink's Kubernetes API inside Java

2021-07-02 Thread Alexis Sarda-Espinosa
Alexis. From: Roman Khachatryan Sent: Friday, July 2, 2021 9:19 PM To: Alexis Sarda-Espinosa ; Yang Wang Cc: user@flink.apache.org Subject: Re: Using Flink's Kubernetes API inside Java Hi Alexis, Have you looked at flink-on-k8s-operator [1]? It seems t

RE: Using Flink's Kubernetes API inside Java

2021-07-07 Thread Alexis Sarda-Espinosa
Thanks Roman and Yang, I understand. I’ll have a look and ask on the developer list depending on what I find. Regards, Alexis. From: Yang Wang Sent: Mittwoch, 7. Juli 2021 05:14 To: ro...@apache.org Cc: Alexis Sarda-Espinosa ; user@flink.apache.org Subject: Re: Using Flink's Kubernetes API

Using Flink's Kubernetes API inside Java

2021-07-01 Thread Alexis Sarda-Espinosa
Hello everyone, I'm testing a custom Kubernetes operator that should fulfill some specific requirements I have for Flink. I know of this WIP project: https://github.com/wangyang0918/flink-native-k8s-operator I can see that it uses some classes that aren't publicly documented, and I believe it

Re: [Schema Evolution] Cannot restore from savepoint after deleting field from POJO

2021-04-29 Thread Alexis Sarda-Espinosa
, March 12, 2021 6:22 PM To: Alexis Sarda-Espinosa Cc: user@flink.apache.org Subject: Re: [Schema Evolution] Cannot restore from savepoint after deleting field from POJO Hi Alexis, This looks like a bug, I've created a Jira ticket to address it [1]. Please feel free to provide any additional

DataStream in batch mode - handling (un)ordered bounded data

2021-03-12 Thread Alexis Sarda-Espinosa
Hello, Regarding the new BATCH mode of the data stream API, I see that the documentation states that some operators will process all data for a given key before moving on to the next one. However, I don't see how Flink is supposed to know whether the input will provide all data for a given key

[Schema Evolution] Cannot restore from savepoint after deleting field from POJO

2021-03-11 Thread Alexis Sarda-Espinosa
Hi everyone, It seems I'm having either the same problem, or a problem similar to the one mentioned here: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Problem-when-restoring-from-savepoint-with-missing-state-amp-POJO-modification-td39945.html I have a POJO class that is

Re: DataStream in batch mode - handling (un)ordered bounded data

2021-03-13 Thread Alexis Sarda-Espinosa
: Dawid Wysakowicz Sent: Friday, March 12, 2021 4:10 PM To: Alexis Sarda-Espinosa; user@flink.apache.org Subject: Re: DataStream in batch mode - handling (un)ordered bounded data Hi Alexis, As of now there is no such feature in the DataStream API. The Batch mode in DataStream API is a new feature

Clarification about Flink's managed memory and metric monitoring

2021-04-13 Thread Alexis Sarda-Espinosa
Hello, I have a Flink TM configured with taskmanager.memory.managed.size: 1372m. There is a streaming job using RocksDB for checkpoints, so I assume some of this memory will indeed be used. I was looking at the metrics exposed through the REST interface, and I queried some of them:

RE: Clarification about Flink's managed memory and metric monitoring

2021-04-13 Thread Alexis Sarda-Espinosa
:30 To: Alexis Sarda-Espinosa Cc: user@flink.apache.org Subject: Re: Clarification about Flink's managed memory and metric monitoring Hi Alexis, First of all, I strongly recommend not to look into the JVM metrics. These metrics are fetched directly from JVM and do not well correspond to Flink's

RE: Deploying Flink on Kubernetes with fractional CPU and different limits and requests

2021-08-26 Thread Alexis Sarda-Espinosa
I think it would be nice if the task manager pods get their values from the configuration file only if the pod templates don’t specify any resources. That was the goal of supporting pod templates, right? Allowing more custom scenarios without letting the configuration options get bloated.

Keystore format limitations for TLS

2021-08-16 Thread Alexis Sarda-Espinosa
Hello, I am trying to configure TLS communication for a Flink cluster running on Kubernetes. I am currently using the BCFKS format and setting that as default via javax.net.ssl.keystoretype and javax.net.ssl.truststoretype (which are injected in the environment variable FLINK_ENV_JAVA_OPTS).

RE: Deploying Flink on Kubernetes with fractional CPU and different limits and requests

2021-09-02 Thread Alexis Sarda-Espinosa
; Alexis Sarda-Espinosa ; matth...@ververica.com; user@flink.apache.org Subject: Re: Deploying Flink on Kubernetes with fractional CPU and different limits and requests Hi Yang, I agree with you, but I think the limit-factor should be greater than or equal to 1, and default to 1 is a better

Re: logback variable substitution in kubernetes

2021-09-01 Thread Alexis Sarda-Espinosa
I'm fairly certain you need the curly braces surrounding the variable, the substitution is not done by the shell, it's just similar syntax (as mentioned in the doc http://logback.qos.ch/manual/configuration.html#variableSubstitution). Chapter 3: Logback configuration -

RE: Deploying Flink on Kubernetes with fractional CPU and different limits and requests

2021-09-03 Thread Alexis Sarda-Espinosa
. September 2021 08:09 To: Alexis Sarda-Espinosa Cc: spoon_lz ; Denis Cosmin NUTIU ; matth...@ververica.com; user@flink.apache.org Subject: Re: Deploying Flink on Kubernetes with fractional CPU and different limits and requests Hi Alexis Thanks for your valuable inputs. First, I want to share

Re: TaskManagers OOM'ing for Flink App with very large state only when restoring from checkpoint

2021-09-13 Thread Alexis Sarda-Espinosa
I'm not very knowledgeable when it comes to Linux memory management, but do note that Linux (and by extension Kubernetes) takes disk IO into account when deciding whether a process is using more memory than it's allowed to, see e.g.

RE: Fast serialization for Kotlin data classes

2021-09-16 Thread Alexis Sarda-Espinosa
Someone please correct me if I’m wrong but, until FLINK-16686 [1] is fixed, a class must be a POJO to be used in managed state with RocksDB, right? That’s not to say that the approach with TypeInfoFactory won’t work, just that even then it will mean none of the data classes can be used for

Using POJOs with the table API

2021-08-05 Thread Alexis Sarda-Espinosa
Hi everyone, I had been using the DataSet API until now, but since that's been deprecated, I started looking into the Table API. In my DataSet job I have a lot of POJOs, all of which are even annotated with @TypeInfo and provide the corresponding factories. The Table API documentation talks

RE: Troubleshooting checkpoint timeout

2021-10-21 Thread Alexis Sarda-Espinosa
, Alexis. From: Alexis Sarda-Espinosa Sent: Mittwoch, 20. Oktober 2021 09:43 To: Parag Somani ; Caizhi Weng Cc: Flink ML Subject: RE: Troubleshooting checkpoint timeout Currently the windows are 10 minutes in size with a 1-minute slide time. The approximate 500 event/minute throughput is already

Troubleshooting checkpoint timeout

2021-10-19 Thread Alexis Sarda-Espinosa
Hello everyone, I am doing performance tests for one of our streaming applications and, after increasing the throughput a bit (~500 events per minute), it has started failing because checkpoints cannot be completed within 10 minutes. The Flink cluster is not exactly under my control and is

RE: Troubleshooting checkpoint timeout

2021-10-20 Thread Alexis Sarda-Espinosa
(keySelector) .connect(DataStreamUtils.reinterpretAsKeyedStream(openedEventsTimestamped, keySelector)) .process(...) Could this lead to delays or alignment issues? Regards, Alexis. From: Parag Somani Sent: Mittwoch, 20. Oktober 2021 09:22 To: Caizhi Weng Cc: Alexis Sarda-Espinosa

Re: Kubernetes HA - Reusing storage dir for different clusters

2021-10-08 Thread Alexis Sarda-Espinosa
, Alexis. From: Yang Wang Sent: Friday, October 8, 2021 5:24 AM To: Alexis Sarda-Espinosa Cc: Flink ML Subject: Re: Kubernetes HA - Reusing storage dir for different clusters When the Flink job reached to global terminal state(FAILED, CANCELED, FINISHED), all the HA

Kubernetes HA - Reusing storage dir for different clusters

2021-10-04 Thread Alexis Sarda-Espinosa
Hello, If I deploy a Flink-Kubernetes application with HA, I need to set high-availability.storageDir. If my application is a batch job that may run multiple times with the same configuration, do I need to manually clean up the storage dir between each execution? Regards, Alexis.

RE: Troubleshooting checkpoint timeout

2021-10-25 Thread Alexis Sarda-Espinosa
eam operator has lower parallelism? Regards, Alexis. From: Piotr Nowojski Sent: Montag, 25. Oktober 2021 09:59 To: Alexis Sarda-Espinosa Cc: Parag Somani ; Caizhi Weng ; Flink ML Subject: Re: Troubleshooting checkpoint timeout Hi Alexis, You can read about those metrics in the documentation

Watermark behavior when connecting streams

2021-12-01 Thread Alexis Sarda-Espinosa
Hi everyone, Based on what I know, a single operator with parallelism > 1 checks the watermarks from all its streams and uses the smallest one out of the non-idle streams. My first question is whether watermarks are forwarded as long as a different watermark strategy is not applied downstream?

RE: Using POJOs with the table API

2021-10-26 Thread Alexis Sarda-Espinosa
Regards, Alexis. From: Alexis Sarda-Espinosa Sent: Donnerstag, 5. August 2021 15:49 To: user@flink.apache.org Subject: Using POJOs with the table API Hi everyone, I had been using the DataSet API until now, but since that's been deprecated, I started looking into the Table API. In my DataSet job I

RE: Troubleshooting checkpoint timeout

2021-10-25 Thread Alexis Sarda-Espinosa
checkpoints. Thanks again for all the info. Regards, Alexis. From: Piotr Nowojski Sent: Montag, 25. Oktober 2021 15:51 To: Alexis Sarda-Espinosa Cc: Parag Somani ; Caizhi Weng ; Flink ML Subject: Re: Troubleshooting checkpoint timeout Hi Alexis, > Should I understand these metrics as a prope

RE: Troubleshooting checkpoint timeout

2021-10-25 Thread Alexis Sarda-Espinosa
be that the new checkpoint barriers created after the first restart are behind more data than before it restarted, no? Regards, Alexis. From: Piotr Nowojski Sent: Montag, 25. Oktober 2021 13:35 To: Alexis Sarda-Espinosa Cc: Parag Somani ; Caizhi Weng ; Flink ML Subject: Re: Troubleshooting checkpoint

RE: Watermark behavior when connecting streams

2021-12-02 Thread Alexis Sarda-Espinosa
://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/windows/#consecutive-windowed-operations Regards, Alexis. From: David Morávek Sent: Donnerstag, 2. Dezember 2021 17:26 To: Alexis Sarda-Espinosa Cc: user@flink.apache.org Subject: Re: Watermark behavior when connecting streams

RE: Buffering when connecting streams

2021-12-02 Thread Alexis Sarda-Espinosa
. Dezember 2021 17:18 To: Alexis Sarda-Espinosa Cc: user@flink.apache.org Subject: Re: Buffering when connecting streams Even with the TwoInputStreamOperator you can not "halt" the processing. You need to buffer these elements for example in the ListState for later processing. At th

RE: Buffering when connecting streams

2021-12-02 Thread Alexis Sarda-Espinosa
ávek Sent: Donnerstag, 2. Dezember 2021 16:59 To: Alexis Sarda-Espinosa Cc: user@flink.apache.org Subject: Re: Buffering when connecting streams I think this would require using lower level API and implementing a custom `TwoInputStreamOperator`. Then you can hook to `processWatemark{1,2}` metho

RE: Buffering when connecting streams

2021-12-02 Thread Alexis Sarda-Espinosa
reaches processElement1, even when considering watermarks. Regards, Alexis. From: David Morávek Sent: Donnerstag, 2. Dezember 2021 15:45 To: Alexis Sarda-Espinosa Cc: user@flink.apache.org Subject: Re: Buffering when connecting streams You can not rely on order of the two streams that easily

Buffering when connecting streams

2021-12-02 Thread Alexis Sarda-Espinosa
Hello, I have a use case with event-time processing that ends up with a DAG roughly like this: source -> filter -> keyBy -> watermark -> window -> process -> keyBy_2 -> connect (KeyedCoProcessFunction) -> sink | /

RE: Buffering when connecting streams

2021-12-02 Thread Alexis Sarda-Espinosa
n second 17, and my windows are evaluated every minute, so it wasn’t a race condition. Regards, Alexis. From: David Morávek Sent: Donnerstag, 2. Dezember 2021 14:52 To: Alexis Sarda-Espinosa Cc: user@flink.apache.org Subject: Re: Buffering when connecting streams Hi Alexis, I'm not sure w

Re: OOM Metaspace after multiple jobs

2021-07-16 Thread Alexis Sarda-Espinosa
per job, it seems) of some of my job's classes (e.g. sources), and their GC roots were the Flink User Class Loader. I haven't figured out why they would remain across different jobs. Regards, Alexis. From: Alexis Sarda-Espinosa Sent: Thursday, July 8, 2021 12:51

Re: Interval join operator is not forwarding watermarks correctly

2022-03-15 Thread Alexis Sarda-Espinosa
For completeness, this still happens with Flink 1.14.4 Regards, Alexis. From: Alexis Sarda-Espinosa Sent: Friday, March 11, 2022 12:21 AM To: user@flink.apache.org Cc: pnowoj...@apache.org Subject: Re: Interval join operator is not forwarding watermarks

Interval join operator is not forwarding watermarks correctly

2022-03-10 Thread Alexis Sarda-Espinosa
Hello, I'm in the process of updating from Flink 1.11.3 to 1.14.3, and it seems the interval join in my pipeline is no longer working. More specifically, I have a sliding window after the interval join, and the window isn't firing. After many tests, I ended up creating a custom operator that

RE: Interval join operator is not forwarding watermarks correctly

2022-03-10 Thread Alexis Sarda-Espinosa
I found [1] and [2], which are closed, but could be related? [1] https://issues.apache.org/jira/browse/FLINK-23698 [2] https://issues.apache.org/jira/browse/FLINK-18934 Regards, Alexis. From: Alexis Sarda-Espinosa Sent: Donnerstag, 10. März 2022 19:27 To: user@flink.apache.org Subject

Re: Interval join operator is not forwarding watermarks correctly

2022-03-10 Thread Alexis Sarda-Espinosa
] https://github.com/asardaes/flink-interval-join-test Regards, Alexis. From: Alexis Sarda-Espinosa Sent: Thursday, March 10, 2022 7:47 PM To: user@flink.apache.org Cc: pnowoj...@apache.org Subject: RE: Interval join operator is not forwarding watermarks

Max parallelism and reactive mode

2022-03-03 Thread Alexis Sarda-Espinosa
Hi everyone, I have some questions regarding max parallelism and how interacts with deployment modes. The documentation states that max parallelism should be "set on a per-job and per-operator granularity" but doesn't provide more details. Is it possible to have different values of max

RE: Determinism of interval joins

2022-01-29 Thread Alexis Sarda-Espinosa
handling multiple streams that need joins and so on. What do you think? Regards, Alexis. From: Robert Metzger Sent: Freitag, 28. Januar 2022 14:49 To: Alexis Sarda-Espinosa Cc: user@flink.apache.org Subject: Re: Determinism of interval joins Instead of using `reinterpretAsKeyedStream` can you

RE: Determinism of interval joins

2022-02-02 Thread Alexis Sarda-Espinosa
{ if (mark.getTimestamp() > maxTimestamp2) { maxTimestamp2 = mark.getTimestamp(); } maybeProcessWatermark2(mark, maxTimestamp1, maxTimestamp1); } } Regards, Alexis. From: Alexis Sarda-Espinosa Sent: Samstag, 29. Januar 2022 13:47 To: Robert Metzger Cc: user@

RE: Pojo State Migration - NPE with field deletion

2022-02-02 Thread Alexis Sarda-Espinosa
Hello, Happened to me too, here’s the JIRA ticket: https://issues.apache.org/jira/browse/FLINK-21752 Regards, Alexis. From: bastien dine Sent: Mittwoch, 2. Februar 2022 16:01 To: user Subject: Pojo State Migration - NPE with field deletion Hello, I have some trouble restoring a state

Re: Determinism of interval joins

2022-01-27 Thread Alexis Sarda-Espinosa
rks should be deterministic, the input file is sorted, and the watermark strategies should essentially behave like the monotonous generator. [1] https://issues.apache.org/jira/browse/FLINK-24466 Regards, Alexis. ____ From: Alexis Sarda-Espinosa Sent: Thursday, January 27, 20

Determinism of interval joins

2022-01-27 Thread Alexis Sarda-Espinosa
Hi everyone, I'm seeing a lack of determinism in unit tests when using an interval join. I am testing with both Flink 1.11.3 and 1.14.3 and the relevant bits of my pipeline look a bit like this: keySelector1 = ... keySelector2 = ... rightStream = stream1 .flatMap(...) .keyBy(keySelector1)

RE: Buffering when connecting streams

2022-01-18 Thread Alexis Sarda-Espinosa
in a (Process)JoinFunction? The join needs keys, but I don’t know if the resulting stream counts as keyed from the state’s point of view. Regards, Alexis. From: Piotr Nowojski Sent: Montag, 6. Dezember 2021 08:43 To: David Morávek Cc: Alexis Sarda-Espinosa ; user@flink.apache.org Subject: Re

Re: Flink native k8s integration vs. operator

2022-01-20 Thread Alexis Sarda-Espinosa
mean an operator is a bad idea, it's just something that other users might want to keep in mind. Regards, Alexis. From: Robert Metzger Sent: Thursday, January 20, 2022 7:06 PM To: Alexis Sarda-Espinosa Cc: dev ; user Subject: Re: Flink native k8s integration vs

RE: RocksDB's state size discrepancy with what's seen with state processor API

2022-04-12 Thread Alexis Sarda-Espinosa
in the state as well. Regards, Alexis. -Original Message- From: Roman Khachatryan Sent: Dienstag, 12. April 2022 14:06 To: Alexis Sarda-Espinosa Cc: user@flink.apache.org Subject: Re: RocksDB's state size discrepancy with what's seen with state processor API /shared folder contains ke

RE: RocksDB's state size discrepancy with what's seen with state processor API

2022-04-12 Thread Alexis Sarda-Espinosa
rom: Roman Khachatryan Sent: Dienstag, 12. April 2022 12:37 To: Alexis Sarda-Espinosa Cc: user@flink.apache.org Subject: Re: RocksDB's state size discrepancy with what's seen with state processor API Hi Alexis, Thanks a lot for sharing this. I think the program is correct. Although it doesn't t

RE: RocksDB's state size discrepancy with what's seen with state processor API

2022-04-11 Thread Alexis Sarda-Espinosa
oman Khachatryan mailto:ro...@apache.org>> Sent: Friday, April 8, 2022 11:06 PM To: Alexis Sarda-Espinosa mailto:alexis.sarda-espin...@microfocus.com>> Cc: user@flink.apache.org<mailto:user@flink.apache.org> mailto:user@flink.apache.org>> Subject: Re: RocksDB's state size

Re: RocksDB's state size discrepancy with what's seen with state processor API

2022-04-08 Thread Alexis Sarda-Espinosa
h parallelism=4, this doesn't affect the result, does it? Regards, Alexis. From: Roman Khachatryan Sent: Friday, April 8, 2022 11:06 PM To: Alexis Sarda-Espinosa Cc: user@flink.apache.org Subject: Re: RocksDB's state size discrepancy with what's seen with state processo

RE: Using state processor API to read state defined with a TypeHint

2022-04-08 Thread Alexis Sarda-Espinosa
Message- From: Roman Khachatryan Sent: Freitag, 8. April 2022 11:48 To: Alexis Sarda-Espinosa Cc: user@flink.apache.org Subject: Re: Using state processor API to read state defined with a TypeHint Hi Alexis, I think your setup is fine, but probably Java type erasure makes Flink consider

Using state processor API to read state defined with a TypeHint

2022-04-08 Thread Alexis Sarda-Espinosa
Hi everyone, I have a ProcessWindowFunction that uses Global window state. It uses MapState with a descriptor defined like this: MapStateDescriptor> msd = new MapStateDescriptor<>( "descriptorName", TypeInformation.of(Long.class), TypeInformation.of(new TypeHint>() {})

RE: RocksDB's state size discrepancy with what's seen with state processor API

2022-04-14 Thread Alexis Sarda-Espinosa
File State; that accounts for almost 1.5GB. I believe that is one of the files RocksDB uses internally, but is that related to managed state used by my functions? Or does that indicate size growth is elsewhere? Regards, Alexis. -Original Message- From: Alexis Sarda-Espinosa Sent

Semantics of purging with global windows

2023-08-30 Thread Alexis Sarda-Espinosa
Hello, According to the javadoc of TriggerResult.PURGE, "All elements in the window are cleared and the window is discarded, without evaluating the window function or emitting any elements." However, I've noticed that using a GlobalWindow (with a custom trigger) followed by an AggregateFunction

Re: Failure to restore from last completed checkpoint

2023-09-08 Thread Alexis Sarda-Espinosa
Hello, Just a shot in the dark here, but could it be related to https://issues.apache.org/jira/browse/FLINK-32241 ? Such failures can cause many exceptions, but I think the ones you've included aren't pointing to the root cause, so I'm not sure if that issue applies to you. Regards, Alexis. On

Re: Updating existing state with state processor API

2023-10-27 Thread Alexis Sarda-Espinosa
> > PS: I’m currently working on this ticket in order to get some glitches > removed: FLINK-26585 <https://issues.apache.org/jira/browse/FLINK-26585> > > > > > > *From:* Alexis Sarda-Espinosa > *Sent:* Thursday, October 26, 2023 4:01 PM > *To:* user &

Updating existing state with state processor API

2023-10-26 Thread Alexis Sarda-Espinosa
Hello, The documentation of the state processor API has some examples to modify an existing savepoint by defining a StateBootstrapTransformation. In all cases, the entrypoint is OperatorTransformation#bootstrapWith, which expects a DataStream. If I pass an empty DataStream to bootstrapWith and

Side outputs documentation

2023-09-22 Thread Alexis Sarda-Espinosa
Hello, very quick question, the documentation for side outputs states that an OutputTag "needs to be an anonymous inner class, so that we can analyze the type" (this is written in a comment in the example). Is this really true? I've seen many examples where it's a static element and it seems to

Re: Side outputs documentation

2023-09-25 Thread Alexis Sarda-Espinosa
behavior. A ticket and PR could be > added to fix the document. What do you think? > > Best, > Yunfeng > > On Fri, Sep 22, 2023 at 4:55 PM Alexis Sarda-Espinosa > wrote: > > > > Hello, > > > > very quick question, the documentation for side outputs state

Re: Continuous errors with Azure ABFSS

2023-09-28 Thread Alexis Sarda-Espinosa
Singh Lilhore < surendralilh...@gmail.com>: > Hi Alexis, > > Could you please check the TaskManager log for any exceptions? > > Thanks > Surendra > > > On Thu, Sep 28, 2023 at 7:06 AM Alexis Sarda-Espinosa < > sarda.espin...@gmail.com> wrote: > >> He

Re: Continuous errors with Azure ABFSS

2023-09-28 Thread Alexis Sarda-Espinosa
> Ram > > On Thu, Sep 28, 2023 at 2:06 AM Alexis Sarda-Espinosa < > sarda.espin...@gmail.com> wrote: > >> Hello, >> >> We are using ABFSS for RocksDB's backend as well as the storage dir >> required for Kubernetes HA. In the Azure Portal's monitoring

Re: Side outputs documentation

2023-09-26 Thread Alexis Sarda-Espinosa
dependent from each > other and OutputTag's document is correct from this aspect. > > [1] > https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/util/OutputTag.java#L82 > > Best, > Yunfeng > > On Mon, Sep 25, 2023 at 10:57 PM Alexis Sarda-Espi

Continuous errors with Azure ABFSS

2023-09-27 Thread Alexis Sarda-Espinosa
Hello, We are using ABFSS for RocksDB's backend as well as the storage dir required for Kubernetes HA. In the Azure Portal's monitoring insights I see that every single operation contains failing transactions for the GetPathStatus API. Unfortunately I don't see any additional details, but I know

Re: Continuous errors with Azure ABFSS

2023-10-06 Thread Alexis Sarda-Espinosa
job , is it able to > safely use the checkpoint and get back to the checkpointed state? > > Regards > Ram, > > On Thu, Sep 28, 2023 at 4:46 PM Alexis Sarda-Espinosa < > sarda.espin...@gmail.com> wrote: > >> Hi Surendra, >> >> there are no exceptions in the logs,

Re: Question about serialization of java.util classes

2023-08-15 Thread Alexis Sarda-Espinosa
r classes this but unfortunately I have no > idea how to use these classes or how they might be able to help me. This is > all very new to me and I honestly can't wrap my head around Flink's type > information system. > > Best regards, > Saleh. > > On 14 Aug 2023, at 4:05 PM, Alex

Re: RocksDB's state size discrepancy with what's seen with state processor API

2022-04-22 Thread Alexis Sarda-Espinosa
b run and wait? I've been reading through RocksDB documentation as well, but that might not be enough because I don't know how Flink handles its own framework state internally. Regards, Alexis. From: David Anderson Sent: Friday, April 22, 2022 9:57 AM To: Alexi

RE: RocksDB's state size discrepancy with what's seen with state processor API

2022-04-21 Thread Alexis Sarda-Espinosa
at 10:52 PM Alexis Sarda-Espinosa wrote: > > I can look into RocksDB metrics, I need to configure Prometheus at some point > anyway. However, going back to the original question, is there no way to gain > more insight into this with the state processor API? You've mentioned > p

RE: RocksDB's state size discrepancy with what's seen with state processor API

2022-04-19 Thread Alexis Sarda-Espinosa
ructors, i.e. before they are serialized to the TM, but the descriptors are Serializable, so I imagine that's not relevant. [1] https://stackoverflow.com/a/50510054/5793905 Regards, Alexis. -Original Message- From: Roman Khachatryan Sent: Dienstag, 19. April 2022 11:48 To: Alexis Sard

Re: RocksDB's state size discrepancy with what's seen with state processor API

2022-04-19 Thread Alexis Sarda-Espinosa
ds, Alexis. From: Roman Khachatryan Sent: Tuesday, April 19, 2022 5:51 PM To: Alexis Sarda-Espinosa Cc: user@flink.apache.org Subject: Re: RocksDB's state size discrepancy with what's seen with state processor API > I assume that when you say "new states",

RE: RocksDB state not cleaned up

2022-04-08 Thread Alexis Sarda-Espinosa
May I ask if anyone tested RocksDB’s periodic compaction in the meantime? And if yes, if it helped with this case. Regards, Alexis. From: tao xiao Sent: Samstag, 18. September 2021 05:01 To: David Morávek Cc: Yun Tang ; user Subject: Re: RocksDB state not cleaned up Thanks for the feedback!

RocksDB's state size discrepancy with what's seen with state processor API

2022-04-08 Thread Alexis Sarda-Espinosa
Hello, I have a streaming job running on Flink 1.14.4 that uses managed state with RocksDB with incremental checkpoints as backend. I've been monitoring a dev environment that has been running for the last week and I noticed that state size and end-to-end duration have been increasing

RE: Interval join operator is not forwarding watermarks correctly

2022-03-17 Thread Alexis Sarda-Espinosa
Hi Dawid, Thanks for the update, I also managed to work around it by adding another watermark assignment operator between the join and the window. I’ll have to see if it’s possible to assign watermarks at the source, but even if it is, I worry that the different partitions created by all my

Re: Continuous errors with Azure ABFSS

2023-11-10 Thread Alexis Sarda-Espinosa
. Seems like normal operations, so it's just unfortunate the Azure API exposes that as continuous ClientOtherError metrics. Regards, Alexis. Am Fr., 6. Okt. 2023 um 08:10 Uhr schrieb Alexis Sarda-Espinosa < sarda.espin...@gmail.com>: > Yes, that also works correctly, at least based on

RE: Schema Evolution of POJOs fails on Field Removal

2022-05-18 Thread Alexis Sarda-Espinosa
Hi David, Please refer to https://issues.apache.org/jira/browse/FLINK-21752 Regards, Alexis. -Original Message- From: David Jost Sent: Mittwoch, 18. Mai 2022 15:07 To: user@flink.apache.org Subject: Schema Evolution of POJOs fails on Field Removal Hi, we currently have an issue,

Re: Making Kafka source respect offset changed externally

2022-07-20 Thread Alexis Sarda-Espinosa
\ --command-config kafka-preprod.properties \ --reset-offsets --to-datetime '2022-07-20T00:01:00.000' \ --execute Regards, Alexis. Am Do., 14. Juli 2022 um 22:56 Uhr schrieb Alexis Sarda-Espinosa < sarda.espin...@gmail.com>: > Hi Yaroslav, > > The test I did was just using earl

Re: Making Kafka source respect offset changed externally

2022-07-21 Thread Alexis Sarda-Espinosa
n the job is initially started > from a clear slate. > Once checkpoints are involved it only relies on offsets stored in the > state. > > On 20/07/2022 14:51, Alexis Sarda-Espinosa wrote: > > Hello again, > > I just performed a test > using OffsetsInitializer.committedOffse

Making Kafka source respect offset changed externally

2022-07-14 Thread Alexis Sarda-Espinosa
Hello, Regarding the new Kafka source (configure with a consumer group), I found out that if I manually change the group's offset with Kafka's admin API independently of Flink (while the job is running), the Flink source will ignore that and reset it to whatever it stored internally. Is there any

Did the semantics of Kafka's earliest offset change with the new source API?

2022-07-13 Thread Alexis Sarda-Espinosa
Hello, I have a job running with Flink 1.15.0 that consumes from Kafka with the new KafkaSource API, setting a group ID explicitly and specifying OffsetsInitializer.earliest() as a starting offset. Today I restarted the job ignoring both savepoint and checkpoint, and the consumer started reading

Re: Making Kafka source respect offset changed externally

2022-07-14 Thread Alexis Sarda-Espinosa
In this case, it should get the offsets from Kafka and > not the state. > > On Thu, Jul 14, 2022 at 11:18 AM Alexis Sarda-Espinosa < > sarda.espin...@gmail.com> wrote: > >> Hello, >> >> Regarding the new Kafka source (configure with a consumer group), I found

Re: Did the semantics of Kafka's earliest offset change with the new source API?

2022-07-18 Thread Alexis Sarda-Espinosa
tartFromEarliest()** / **setStartFromLatest()**: Start from the >> earliest / latest record. Under these modes, committed offsets in Kafka >> will be ignored and not used as starting positions.* >> >> On 13/07/2022 18:53, Alexis Sarda-Espinosa wrote: >> >> Hello,

RE: RocksDB's state size discrepancy with what's seen with state processor API

2022-05-02 Thread Alexis Sarda-Espinosa
Ok Regards, Alexis. From: Peter Brucia Sent: Freitag, 22. April 2022 15:31 To: Alexis Sarda-Espinosa Subject: Re: RocksDB's state size discrepancy with what's seen with state processor API No Sent from my iPhone

Serialization in window contents and network buffers

2022-09-27 Thread Alexis Sarda-Espinosa
Hi everyone, I know the low level details of this are likely internal, but at a high level we can say that operators usually have some state associated with them. Particularly for error handling and job restarts, I imagine windows must persist state, and operators in general probably persist

Broadcast state and job restarts

2022-10-27 Thread Alexis Sarda-Espinosa
Hello, The documentation for broadcast state specifies that it is always kept in memory. My assumptions based on this statement are: 1. If a job restarts in the same Flink cluster (i.e. using a restart strategy), the tasks' attempt number increases and the broadcast state is restored since it's

Window state size with global window and custom trigger

2022-10-07 Thread Alexis Sarda-Espinosa
Hello, I found an SO thread that clarifies some details of window state size [1]. I would just like to confirm that this also applies when using a global window with a custom trigger. The reason I ask is that the TriggerResult API is meant to cover all supported scenarios, so FIRE vs

Re: Window state size with global window and custom trigger

2022-10-10 Thread Alexis Sarda-Espinosa
. > Hope these can help you. > > > On Sat, Oct 8, 2022 at 4:49 AM Alexis Sarda-Espinosa < > sarda.espin...@gmail.com> wrote: > >> Hello, >> >> I found an SO thread that clarifies some details of window state size >> [1]. I would just like to confirm

Partial broadcast/keyed connected streams

2022-10-11 Thread Alexis Sarda-Espinosa
Hi everyone, I am currently thinking about a use case for a streaming job and, while I'm fairly certain it cannot be done with the APIs that Flink currently provides, I figured I'd put it out there in case other users think something like this would be useful to a wider audience. The current

Re: Partial broadcast/keyed connected streams

2022-10-11 Thread Alexis Sarda-Espinosa
as normal keyedstream. > > > > Best Regards! > > 从 Windows 版邮件 <https://go.microsoft.com/fwlink/?LinkId=550986>发送 > > > > *发件人: *Alexis Sarda-Espinosa > *发送时间: *2022年10月12日 4:11 > *收件人: *user > *主题: *Partial broadcast/keyed connected streams > > > >

Broadcast state restoration for BroadcastProcessFunction

2022-10-14 Thread Alexis Sarda-Espinosa
Hello, I wrote a test for a broadcast function to check how it handles broadcast state during retries [1] (the gist only shows a subset of the test in Kotlin, but it's hopefully understandable). The test will not pass unless my function also implements CheckpointedFunction, although those

Re: Cleanup for high-availability.storageDir

2022-12-06 Thread Alexis Sarda-Espinosa
Alexis Sarda-Espinosa < sarda.espin...@gmail.com>: > Hi Gyula, > > that certainly helps, but to set up automatic cleanup (in my case, of > azure blob storage), the ideal option would be to be able to set a simple > policy that deletes blobs that haven't been updated in some t

Re: Cleanup for high-availability.storageDir

2022-12-08 Thread Alexis Sarda-Espinosa
> > @Gyula: Please correct me if I misunderstood something here. > > I hope that helped. > Matthias > > On Wed, Dec 7, 2022 at 4:19 PM Alexis Sarda-Espinosa < > sarda.espin...@gmail.com> wrote: > >> I see, thanks for the details. >> >> I do mea

Re: Deterministic co-results

2022-12-08 Thread Alexis Sarda-Espinosa
Hi Salva, Just to give you further thoughts from another user, I think the "temporal join" semantics are very critical in this use case, and what you implement for that may not easily generalize to other cases. Because of that, I'm not sure if you can really define best practices that apply in

Re: Clarification on checkpoint cleanup with RETAIN_ON_CANCELLATION

2022-12-12 Thread Alexis Sarda-Espinosa
nt deletion? > In this case, the checkpoint will be cleaned and not retained and the > savepoint will remain. So you still could use savepoint to restore. > > On Mon, Dec 5, 2022 at 6:33 PM Alexis Sarda-Espinosa < > sarda.espin...@gmail.com> wrote: > >> Hello, >

  1   2   >