Re: How to enable RocksDB native metrics?

2024-04-07 Thread Marco Villalobos
Hi Lei, Have you tried enabling these Flink configuration properties? [link: Configuration — nightlies.apache.org] Sent from my iPhone. On Apr 7, 2024, at 6:03 PM, Lei Wang wrote: I want to enable it only for specified jobs, how can I specify the configurations on the cmd line when submitting a job? Thanks, Lei On

Re: Flink cache support

2024-03-28 Thread Marco Villalobos
Zhanghao is correct. You can use what is called "keyed state". It's like a cache. https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/datastream/fault-tolerance/state/ > On Mar 28, 2024, at 7:48 PM, Zhanghao Chen wrote: > > Hi, > > You can maintain a cache manually in
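A minimal sketch of the keyed-state-as-cache idea from the reply above (class, state, and lookup names are hypothetical; the state descriptor API is from the linked Flink docs):

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Hypothetical per-key cache: Flink scopes the ValueState to the current key,
// so each key effectively gets its own cache entry, checkpointed with the job.
public class CachingEnricher extends KeyedProcessFunction<String, String, String> {

    private transient ValueState<String> cached;

    @Override
    public void open(Configuration parameters) {
        cached = getRuntimeContext().getState(
                new ValueStateDescriptor<>("cache", String.class));
    }

    @Override
    public void processElement(String value, Context ctx, Collector<String> out)
            throws Exception {
        String hit = cached.value();
        if (hit == null) {                 // cache miss: look up once, remember
            hit = expensiveLookup(ctx.getCurrentKey());
            cached.update(hit);
        }
        out.collect(value + "|" + hit);
    }

    private String expensiveLookup(String key) {
        return "enrichment-for-" + key;    // stand-in for a real lookup
    }
}
```

Unlike an in-process HashMap cache, this state survives restarts via checkpoints.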

Re: need flink support framework for dependency injection

2024-03-26 Thread Marco Villalobos
Hi Ganesh, I disagree. I don’t think Flink needs a dependency injection framework. I have implemented many complex jobs without one. Can you please articulate why you think it needs a dependency injection framework, along with some use cases that will show its benefit? I would rather see more

Re: Request for sample codes for Dockerizing Java application

2024-02-13 Thread Marco Villalobos
Hi Nida, I request that you read https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/docker/ in order to learn how to Dockerize your Flink job. You're Welcome & Regards, Marco A. Villalobos > On Feb 13, 2024, at 12:00 AM, Fidea Lidea wrote: > > Hi

Re: Request to provide sample codes on Data stream using flink, Spring Boot & kafka

2024-02-07 Thread Marco Villalobos
Hi Nida, You can find sample code for using Kafka here: https://kafka.apache.org/documentation/ You can find sample code for using Flink here: https://nightlies.apache.org/flink/flink-docs-stable/ You can find sample code for using Flink with Kafka here:

Re: [DISCUSS] Status of Statefun Project

2023-06-06 Thread Marco Villalobos
a Flink Streaming job and >>>>> micro-services. >>>> >>>> This is essentially how I use it as well, and I would also be sad to see >>>> it sunsetted. It works well; I don't know that there is a lot of new >>>> development required,

Re: A WordCount job using DataStream API but behave like the batch WordCount example

2023-04-29 Thread Marco Villalobos
. 3. You can also run the stream in batch mode. Remember, a stream does not end (unless it is run in batch mode). > On Apr 29, 2023, at 9:11 AM, Marco Villalobos > wrote: > > Hi Luke, > > A batch has a beginning and an end. Although a stream has a beginning, it has &

Re: [DISCUSS] Status of Statefun Project

2023-04-17 Thread Marco Villalobos
I am currently using Stateful Functions in my application. I use Apache Flink for stream processing, and StateFun as a hand-off point for the rest of the application. It serves well as a bridge between a Flink Streaming job and micro-services. I would be disappointed if StateFun was sunsetted.

Re: Is it practical to enrich a Flink DataStream in a middle operator with Flink Stateful Functions?

2022-09-28 Thread Marco Villalobos
Did this list receive my email? I’m only asking because my last few questions have gone unanswered and maybe the list server is blocking me. Anybody, please let me know. > On Sep 26, 2022, at 8:41 PM, Marco Villalobos > wrote: > > I indeed see the value of Flink Statef

Is it practical to enrich a Flink DataStream in a middle operator with Flink Stateful Functions?

2022-09-26 Thread Marco Villalobos
I indeed see the value of Flink Stateful Functions. However, if I already have a Flink Job, is it possible to enrich a datastream with it? For example, like this: I really don't see how it would fit such a purpose. But, I do see that it would be very useful at the end of a Flink Job, not

Re: Enrichment of stream from another stream

2022-09-17 Thread Marco Villalobos
I might need more details, but conceptually, streams can be thought of as never ending tables and our code as functions applied to them. JOIN is a concept supported in the SQL API and DataStream API. However, the SQL API is more succinct (unlike my writing ;). So, how about the "fast stream"

Re: What is the recommended solution for this error of too many files open during a checkpoint?

2022-09-03 Thread Marco Villalobos
Thus, Flink 1.12.2 was using the version of RocksDB with a known bug. > On Sep 2, 2022, at 10:49 AM, Marco Villalobos > wrote: > > What is the recommended solution for this error of too many files open during > a checkpoint? > > 2022-09-02 10:04:56 java.io.IOExcepti

What is the recommended solution for this error of too many files open during a checkpoint?

2022-09-02 Thread Marco Villalobos
What is the recommended solution for this error of too many files open during a checkpoint? 2022-09-02 10:04:56 java.io.IOException: Could not perform checkpoint 119366 for operator tag enrichment (3/4)#104. at

Re: Is this a Batch SQL Bug?

2022-08-18 Thread Marco Villalobos
at is reused, but > DeserializationSchemaAdapter#Reader only does a shallow copy of the produced > data, so that the final result will always be the last row value. > > Could you please help create a jira to track it? > > Best regards, > Yuxia > > ----- Original Mail - > From:

Is this a Batch SQL Bug?

2022-08-17 Thread Marco Villalobos
Given this program: ```java package mvillalobos.bug; import org.apache.flink.api.common.RuntimeExecutionMode; import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; import org.apache.flink.table.api.TableResult; import

without DISTINCT unique lines show up many times in FLINK SQL

2022-08-16 Thread Marco Villalobos
Hello everybody, When I perform this simple set of queries, a unique line from the source file shows up many times. I have verified many times that a unique line in the source shows up as much as 100 times in the select statement. Is this the correct behavior for Flink 1.15.1? FYI, it does

Flink SQL and tumble window by size (number of rows)

2022-08-02 Thread Marco Villalobos
Is it possible in Flink SQL to tumble a window by row size instead of time? Let's say that I want a window for every 1 rows for example using the Flink SQL API. is that possible? I can't find any documentation on how to do that, and I don't know if it is supported.
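To my knowledge Flink SQL has no row-count tumble, but the DataStream API offers countWindow on a keyed stream. A hedged sketch (element values and the window size are placeholders):

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CountWindowSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.fromElements(Tuple2.of("sensor-1", 1.0), Tuple2.of("sensor-1", 2.0))
           .returns(Types.TUPLE(Types.STRING, Types.DOUBLE))
           .keyBy(t -> t.f0)
           .countWindow(2)                                  // fires every 2 rows per key
           .reduce((a, b) -> Tuple2.of(a.f0, a.f1 + b.f1))  // sum the window's values
           .print();
        env.execute("count-window-sketch");
    }
}
```

Note the window is per key, so each key's row count advances independently.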

how does a slow FaaS affect the Flink StateFun cluster?

2022-05-25 Thread Marco Villalobos
If the performance of a stateful function (FaaS) is very slow, how does this impact performance on the Flink StateFun Cluster? I am trying to figure out what is too slow for a FaaS. I expect the Flink StateFun Cluster to receive about 2000 events per minute, but some, not all FaaS might

Re: Supporting different content-types such as String or JSON seems to always fail for me in Stateful Functions.

2022-04-16 Thread Marco Villalobos
You're right. I didn't notice that the ports were different. That was very subtle. Thank you for pointing this out to me. I was stuck on it for quite a while. > On Apr 16, 2022, at 6:17 PM, Marco Villalobos > wrote: > > I'm sorry, I accidentally hit send before I was finished.

Re: Supporting different content-types such as String or JSON seems to always fail for me in Stateful Functions.

2022-04-16 Thread Marco Villalobos
022, at 6:12 PM, Marco Villalobos > wrote: > > IK > > If what you're saying is true, then why do most of the examples in the > flink-statefun-playground example use HTTP as an alternative entry point? > > Here is the greeter example: > > https://github.com/apache/fl

Re: Supporting different content-types such as String or JSON seems to always fail for me in Stateful Functions.

2022-04-16 Thread Marco Villalobos
d-universe/part-ii-building-next-gen-event-driven-application-powered-by-stateful-functions-a3139f299736> > > Best, > Tymur Yarosh > Apr 14, 2022, 03:51 +0300, Marco Villalobos , > wrote: >> I'm trying to write a very simple echo app with Stateful Function to prove it &

Supporting different content-types such as String or JSON seems to always fail for me in Stateful Functions.

2022-04-13 Thread Marco Villalobos
I'm trying to write a very simple echo app with Stateful Functions to prove it as a technology for some of my use cases. I have not been able to accept different content types though. Here is an example of my code for a simple echo function: My Echo stateful function class. package

Re: Trouble sinking to Kafka

2022-02-23 Thread Marco Villalobos
I fixed this, but I'm not 100% sure why. Here is my theory: My checkpoint interval is one minute, and the minimum pause interval is also one minute. My transaction timeout time is also one minute. I think the checkpoint causes Flink to hold the transaction open for one minute, and thus it times

Trouble sinking to Kafka

2022-02-22 Thread Marco Villalobos
I keep on receiving this exception during the execution of a simple job that receives time series data via Kafka, transforms it into avro format, and then sends it into a Kafka topic consumed by Druid. Any advice would be appreciated as to how to resolve this type of error. I'm using Apache Kafka

Flink 1.12.1 and KafkaSource

2022-02-02 Thread Marco Villalobos
According to the Flink 1.12 documentation ( https://nightlies.apache.org/flink/flink-docs-release-1.12/dev/connectors/kafka.html), it states to use FlinkKafkaSource when consuming from Kafka. However, I noticed that the newer API uses KafkaSource, which uses KafkaSourceBuilder and
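The newer builder-based API mentioned above, as a hedged sketch (the broker address, topic, and group id are placeholders; these builder methods exist from Flink 1.12 onward):

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KafkaSourceSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("broker:9092")          // placeholder
                .setTopics("input-topic")                    // placeholder
                .setGroupId("my-group")                      // placeholder
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        // Unlike FlinkKafkaConsumer (an addSource function), KafkaSource is
        // attached with fromSource() and an explicit WatermarkStrategy.
        DataStream<String> stream = env.fromSource(
                source, WatermarkStrategy.noWatermarks(), "Kafka Source");
        stream.print();
        env.execute();
    }
}
```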

Re: Is it possible to support many different windows for many different keys?

2022-01-26 Thread Marco Villalobos
quot; */); > b.keyBy(...).window(TumblingEventTimeWindows.of(Time.minutes(5))); > > If needed, you can then union all of the separate results streams together. > a.union(b, c ...); > > There is no need for separate Flink deployments to create such a pipeline. > > Best, > A

Is it possible to support many different windows for many different keys?

2022-01-26 Thread Marco Villalobos
Hi, I am working with time series data in the form of (timestamp, name, value), and an event time that is the timestamp when the data was published onto kafka, and I have a business requirement in which each stream element becomes enriched, and then processing requires different time series names

Re: How do you configure setCommitOffsetsOnCheckpoints in Flink 1.12 when using KafkaSourceBuilder?

2021-12-01 Thread Marco Villalobos
be helpful for you to understand the behavior. > > > Marco Villalobos <mailto:mvillalo...@kineteque.com>> wrote on Wed, Dec 1, 2021 at 3:43 AM: > Thanks! > > However, I noticed that KafkaSourceOptions.COMMIT_OFFSETS_ON_CHECKPOINT does > not exist in Flink 1.12. > > Is that

Re: How do you configure setCommitOffsetsOnCheckpoints in Flink 1.12 when using KafkaSourceBuilder?

2021-11-30 Thread Marco Villalobos
ther additional properties could be found here : > https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/connectors/datastream/kafka/#additional-properties > > Marco Villalobos wrote on Tue, Nov 30, 2021 at 11:08 AM: > >> Thank you for the information. That still does not answer my

Re: How do you configure setCommitOffsetsOnCheckpoints in Flink 1.12 when using KafkaSourceBuilder?

2021-11-29 Thread Marco Villalobos
]. > > [1] > https://nightlies.apache.org/flink/flink-docs-release-1.14/zh/release-notes/flink-1.14/#deprecate-flinkkafkaconsumer > > Marco Villalobos wrote on Tue, Nov 30, 2021 at 7:12 AM: > >> Hi everybody, >> >> I am using Flink 1.12 and migrating my code from using FlinkKaf

How do you configure setCommitOffsetsOnCheckpoints in Flink 1.12 when using KafkaSourceBuilder?

2021-11-29 Thread Marco Villalobos
Hi everybody, I am using Flink 1.12 and migrating my code from using FlinkKafkaConsumer to using the KafkaSourceBuilder. FlinkKafkaConsumer has the method /** > * Specifies whether or not the consumer should commit offsets back to > Kafka on checkpoints. > * This setting will only have effect

How do I configure commit offsets on checkpoint in new KafkaSource api?

2021-11-19 Thread Marco Villalobos
The FlinkKafkaConsumer that will be deprecated has the method "setCommitOffsetsOnCheckpoints(boolan)" method. However, that functionality is not the new KafkaSource class. How is this behavior / functionality configured in the new API? -Marco A. Villalobos
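In the new API this behavior moved to a source property rather than a builder method. With checkpointing enabled the source commits offsets on checkpoint by default; later releases expose it explicitly. A hedged sketch (the property key is the assumption here — it is surfaced as KafkaSourceOptions.COMMIT_OFFSETS_ON_CHECKPOINT from Flink 1.14, and the thread above notes it does not exist in 1.12):

```java
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;

public class CommitOnCheckpointSketch {
    public static void main(String[] args) {
        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("broker:9092")          // placeholder
                .setTopics("input-topic")                    // placeholder
                .setGroupId("my-group")                      // placeholder
                .setValueOnlyDeserializer(new SimpleStringSchema())
                // Assumption: this key exists from Flink 1.14 onward; with
                // checkpointing enabled it already defaults to true.
                .setProperty("commit.offsets.on.checkpoint", "true")
                .build();
    }
}
```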

Re: to join or not to join, that is the question...

2021-11-05 Thread Marco Villalobos
Dario Heinisch wrote: > > Union creates a new stream containing all elements of the unioned > > streams: > > > https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/datastream/operators/overview/#union > > > > > > On 05.11.21 14:25, Marco Villalobo

to join or not to join, that is the question...

2021-11-05 Thread Marco Villalobos
Can two different streams flow to the same operator (an operator with the same name, uid, and implementation) and then share keyed state or will that require joining the streams first?
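A hedged sketch of both answers to this question: union works when the two streams share a type, after which a single keyed operator sees elements from both; connect plus a KeyedCoProcessFunction covers differently-typed streams. All names below are hypothetical:

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class SharedStateSketch {

    // Counts elements per key, regardless of which source stream they came from.
    static class PerKeyCounter extends KeyedProcessFunction<String, String, String> {
        private transient ValueState<Integer> count;

        @Override
        public void open(Configuration parameters) {
            count = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("count", Integer.class));
        }

        @Override
        public void processElement(String value, Context ctx, Collector<String> out)
                throws Exception {
            int c = count.value() == null ? 1 : count.value() + 1;
            count.update(c);
            out.collect(ctx.getCurrentKey() + " seen " + c + " times");
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> a = env.fromElements("key1:a");
        DataStream<String> b = env.fromElements("key1:b");

        // Same type: union, then key -- one operator, shared keyed state.
        a.union(b).keyBy(s -> s.split(":")[0]).process(new PerKeyCounter()).print();

        // Different types would instead use a.connect(b) with a
        // KeyedCoProcessFunction, whose two callbacks share the same keyed state.
        env.execute();
    }
}
```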

Re: Problem with Flink job and Kafka.

2021-10-20 Thread Marco Villalobos
ka connector > version, Kafka broker version, and full exception stack? Also it will be > helpful to paste part of your code (on DataStream API) or SQL (on Table & > SQL API). > > -- > Best Regards, > > Qingsheng Ren > Email: renqs...@gmail.com > On Oct 19, 2021, 9:28

Problem with Flink job and Kafka.

2021-10-18 Thread Marco Villalobos
I have the simplest Flink job that simply dequeues from a Kafka topic and writes to another Kafka topic, but with headers, and manually copying the event time into the Kafka sink. It works as intended, but sometimes I am getting this error:

How do I verify data is written to a JDBC sink?

2021-09-26 Thread Marco Villalobos
In my Flink Job, I am using event time to process time-series data. Due to our business requirements, I need to verify that a specific subset of data written to a JDBC sink has been written before I send an activemq message to another component. My job flows like this: 1. Kafka Source 2. Split

could not stop with a Savepoint.

2021-09-26 Thread Marco Villalobos
Today, I kept on receiving a timeout exception when stopping my job with a savepoint. This happened with Flink version 1.12.2 running in EMR. I had to use the deprecated cancel with savepoint feature instead. In fact, stopping with a savepoint, creating a savepoint, and cancelling with a

Re: stream processing savepoints and watermarks question

2021-09-24 Thread Marco Villalobos
t checkpoint barrier. > > [1] > https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/cli/#stopping-a-job-gracefully-creating-a-final-savepoint > > Best, > JING ZHANG > > Marco Villalobos wrote on Fri, Sep 24, 2021 at 12:54 PM: > >> Something strange happe

stream processing savepoints and watermarks question

2021-09-23 Thread Marco Villalobos
Something strange happened today. When we tried to shutdown a job with a savepoint, the watermarks became equal to 2^63 - 1. This caused timers to fire indefinitely and crash downstream systems with overloaded untrue data. We are using event time processing with Kafka as our source. It seems

Re: What is the event time of an element produced in a timer?

2021-09-07 Thread Marco Villalobos
ter/docs/dev/table/sql/queries/window-agg/> > > Best regards, > JING ZHANG > > Marco Villalobos <mailto:mvillalo...@kineteque.com>> wrote on Tue, Sep 7, 2021 at 2:24 PM: > If an event time timer is registered to fire exactly every 15 minutes, > starting from exactly at the top o

Re: What is the event time of an element produced in a timer?

2021-09-07 Thread Marco Villalobos
-master/docs/dev/datastream/operators/windows/ > [2] > https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/table/sql/queries/window-agg/ > > Best regards, > JING ZHANG > > Marco Villalobos wrote on Tue, Sep 7, 2021 at 2:24 PM: > >> If an event time timer is registered

What is the event time of an element produced in a timer?

2021-09-07 Thread Marco Villalobos
If an event time timer is registered to fire exactly every 15 minutes, starting from exactly at the top of the hour (exactly 00:00, 00:15, 00:30, 00:45 for example), and within that timer it produces an element in the stream, what event time will that element have, and what window will it belong
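An element collected in onTimer carries the timer's timestamp as its event time, so it falls into whichever window contains that instant. A hedged sketch of quarter-hour-aligned timers (class name and emitted payload are hypothetical):

```java
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Hypothetical function: registers an event-time timer aligned to the next
// quarter hour and re-arms it every 15 minutes thereafter.
public class QuarterHourEmitter extends KeyedProcessFunction<String, Long, String> {
    private static final long FIFTEEN_MIN = 15 * 60 * 1000L;

    @Override
    public void processElement(Long value, Context ctx, Collector<String> out) {
        long t = ctx.timestamp();
        long nextBoundary = t - (t % FIFTEEN_MIN) + FIFTEEN_MIN;  // e.g. 00:15, 00:30
        ctx.timerService().registerEventTimeTimer(nextBoundary);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) {
        // This element's event time is `timestamp` (the timer's firing time);
        // a downstream tumbling window assigns it to the window containing it.
        out.collect("boundary " + timestamp);
        ctx.timerService().registerEventTimeTimer(timestamp + FIFTEEN_MIN);
    }
}
```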

aggregation, triggers, and no activity

2021-08-20 Thread Marco Villalobos
I use event time, with Kafka as my source. The system that I am developing requires data to be aggregated every 15 minutes, thus I am using a Tumbling Event Time window. However, my system is also required to take action every 15 minutes even if there is no activity. I need the elements collected in

State Processor API and existing state

2021-06-28 Thread Marco Villalobos
Let's say that a job has operators with UIDs: W, X, Y, and Z, and uses RocksDB as a backend with checkpoint data URI s3://checkpoints" Then I stop the job with a savepoint at s3://savepoint-1. I assume that all the data within the checkpoint are stored within the given Savepoint. Is that

Re: Please advise bootstrapping large state

2021-06-18 Thread Marco Villalobos
scala:101) > > > > On Thu, Jun 17, 2021 at 12:51 AM Timo Walther > <mailto:twal...@apache.org>> wrote: > > > > Hi Marco, > > > > which operations do you want to execute in the bootstrap pipeline? > > > > Maybe you don't need t

Re: Please advise bootstrapping large state

2021-06-17 Thread Marco Villalobos
nd old planner. At least this would > simplify the friction by going through another API layer. > > The JDBC connector can be directly be used in DataSet API as well. > > Regards, > Timo > > > > On 17.06.21 07:33, Marco Villalobos wrote: > > Thank you very much

Re: Please advise bootstrapping large state

2021-06-16 Thread Marco Villalobos
ta using the DataSet API (could be less > convenient than the Flink SQL JDBC connector). > > [1] > https://ci.apache.org/projects/flink/flink-docs-master/docs/connectors/table/jdbc/ > > > On Wed, Jun 16, 2021 at 6:50 AM Marco Villalobos < > mvillalo...@kineteque.com> wrote: &g

Please advise bootstrapping large state

2021-06-15 Thread Marco Villalobos
I must bootstrap state from postgres (approximately 200 GB of data) and I notice that the state processor API requires the DataSet API in order to bootstrap state for the Stream API. I wish there was a way to use the SQL API and use a partitioned scan, but I don't know if that is even possible

Re: DataStream API in Batch Execution mode

2021-06-09 Thread Marco Villalobos
d opening an issue for lacking the document? > > [1] > https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-files/src/test/java/org/apache/flink/connector/file/src/FileSourceTextLinesITCase.java > Best, > Guowei > > > On Tue, Jun 8, 2021 at 5:59 AM Ma

DataStream API in Batch Execution mode

2021-06-07 Thread Marco Villalobos
How do I use a hierarchical directory structure as a file source in S3 when using the DataStream API in Batch Execution mode? I have been trying to find out if the API supports that, because currently our data is organized by years, halves, quarters, months, but before I launch the job, I

Re: Is it possible to use OperatorState, when NOT implementing a source or sink function?

2021-06-05 Thread Marco Villalobos
in this operator, it only goes to one task manager node. I need state, but I don't really need it keyed. On Sat, Jun 5, 2021 at 4:56 AM Marco Villalobos wrote: > Does that work in the DataStream API in Batch Execution Mode? > > On Sat, Jun 5, 2021 at 12:04 AM JING ZHANG wrote: > >>

Re: Is it possible to use OperatorState, when NOT implementing a source or sink function?

2021-06-05 Thread Marco Villalobos
> Best regards, > JING ZHANG > > > Marco Villalobos wrote on Sat, Jun 5, 2021 at 1:55 PM: > >> Is it possible to use OperatorState, when NOT implementing a source or >> sink function? >> >> If yes, then how? >

Is it possible to use OperatorState, when NOT implementing a source or sink function?

2021-06-04 Thread Marco Villalobos
Is it possible to use OperatorState, when NOT implementing a source or sink function? If yes, then how?

DataStream API in Batch Mode job is timing out, please advise on how to adjust the parameters.

2021-05-24 Thread Marco Villalobos
I am running with one job manager and three task managers. Each task manager is receiving at most 8 gb of data, but the job is timing out. What parameters must I adjust? Sink: back fill db sink) (15/32) (50626268d1f0d4c0833c5fa548863abd) switched from SCHEDULED to FAILED on [unassigned

How do you debug a DataStream flat join on common window?

2021-05-24 Thread Marco Villalobos
Hi, Stream one has one element. Stream two has 2 elements. Both streams derive from side-outputs. I am using the DataStream API in batch execution mode. I want to join them on a common key and window. I am certain the keys match, but the flat join does not seem to be working. I deduce that

Re: Does Flink 1.12.1 DataStream API batch execution mode support side outputs?

2021-05-23 Thread Marco Villalobos
I found the problem. I tried to assign timestamps to the operator (I don't know why), and when I did that, because I used the Flink API fluently, I was no longer referencing the operator that contained the side-outputs. Disregard my question. On Sat, May 22, 2021 at 9:28 PM Marco Villalobos

Does Flink 1.12.1 DataStream API batch execution mode support side outputs?

2021-05-22 Thread Marco Villalobos
I have been struggling for two days with an issue using the DataStream API in Batch Execution mode. It seems as though my side-output has no elements available to downstream operators. However, I am certain that the downstream operator received events. I logged the side-output element just

Is it possible to leverage the sort order in DataStream Batch Execution Mode?

2021-05-21 Thread Marco Villalobos
Hello. I am using Flink 1.12.1 in EMR. I am processing historical time-series data with the DataStream API in Batch execution mode. I must average time series data into a fifteen minute interval and forward fill missing values. For example, this input: name, timestamp, value
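The averaging half of this can be sketched with a 15-minute tumbling event-time window and an AggregateFunction; forward-filling the gaps would need a downstream KeyedProcessFunction with timers, which is not shown. Record shape and field names are hypothetical:

```java
import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.api.java.tuple.Tuple2;

// Averages (name, value) records into 15-minute event-time buckets per name.
// Accumulator is {sum, count}.
public class FifteenMinuteAverage
        implements AggregateFunction<Tuple2<String, Double>, double[], Double> {

    @Override
    public double[] createAccumulator() { return new double[]{0.0, 0.0}; }

    @Override
    public double[] add(Tuple2<String, Double> v, double[] acc) {
        acc[0] += v.f1;   // running sum
        acc[1] += 1;      // running count
        return acc;
    }

    @Override
    public Double getResult(double[] acc) { return acc[0] / acc[1]; }

    @Override
    public double[] merge(double[] a, double[] b) {
        a[0] += b[0];
        a[1] += b[1];
        return a;
    }
}
// Usage sketch:
//   input.keyBy(t -> t.f0)
//        .window(TumblingEventTimeWindows.of(Time.minutes(15)))
//        .aggregate(new FifteenMinuteAverage());
```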

Re: Questions Flink DataStream in BATCH execution mode scalability advice

2021-05-19 Thread Marco Villalobos
> On May 19, 2021, at 7:26 AM, Yun Gao wrote: > > Hi Marco, > > For the remaining issues, > > 1. For the aggregation, the 500GB of files are not required to be fit into > memory. > Rough speaking for the keyed().window().reduce(), the input records would be > first > sort according to

Re: DataStream Batch Execution Mode and large files.

2021-05-19 Thread Marco Villalobos
figured by io.tmp.dirs[2]. > > > [1] > https://ci.apache.org/projects/flink/flink-docs-master/docs/ops/batch/blocking_shuffle/ > [2] > https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/config/#io-tmp-dirs > > --Original Mail -- &g

Re: DataStream API Batch Execution Mode restarting...

2021-05-19 Thread Marco Villalobos
e job to execute it from the scratch. > > Best, > Yun > > > > ------Original Mail -- > *Sender:*Marco Villalobos > *Send Date:*Wed May 19 11:27:37 2021 > *Recipients:*user > *Subject:*DataStream API Batch Execution Mode restarting...

Questions Flink DataStream in BATCH execution mode scalability advice

2021-05-19 Thread Marco Villalobos
Questions Flink DataStream in BATCH execution mode scalability advice. Here is the problem that I am trying to solve. Input is an S3 bucket directory with about 500 GB of data across many files. The instance that I am running on only has 50GB of EBS storage. The nature of this data is time

DataStream API Batch Execution Mode restarting...

2021-05-18 Thread Marco Villalobos
I have a DataStream running in Batch Execution mode within YARN on EMR. My job failed an hour into the job two times in a row because the task manager heartbeat timed out. Can somebody point me out how to restart a job in this situation? I can't find that section of the documentation. thank you.

DataStream Batch Execution Mode and large files.

2021-05-18 Thread Marco Villalobos
Hi, I am using the DataStream API in Batch Execution Mode, and my "source" is an s3 Buckets with about 500 GB of data spread across many files. Where does Flink stored the results of processed / produced data between tasks? There is no way that 500GB will fit in memory. So I am very curious

How can I demarcate which event elements are the boundaries of a window?

2021-04-19 Thread Marco Villalobos
I have a tumbling window that aggregates into a process window function. Downstream there is a keyed process function. [window aggregate into process function] -> keyed process function I am not quite sure how the keyed process knows which elements are at the boundary of the window. Is there a

Re: questions regarding stateful functions

2021-04-07 Thread Marco Villalobos
eam to a stateful function can > emit a message that > in turn will be routed to that function using the data stream integration. > > > On Wed, Apr 7, 2021 at 7:16 PM Marco Villalobos > wrote: >> Thank you for the clarification. >> >> BUT there was one qu

Re: questions regarding stateful functions

2021-04-07 Thread Marco Villalobos
teFun within a DataStream application [1] > > [1] > https://ci.apache.org/projects/flink/flink-statefun-docs-master/docs/sdk/flink-datastream/ > > On Wed, Apr 7, 2021 at 2:49 AM Marco Villalobos > wrote: > >> Upon reading about stateful functions, it seems as though first, a

questions regarding stateful functions

2021-04-06 Thread Marco Villalobos
Upon reading about stateful functions, it seems as though first, a data stream has to flow to an event ingress. Then, the stateful functions will perform computations via whatever functionality it provides. Finally, the results of said computations will flow to the event egress which will be yet

questions about broadcasts

2021-03-05 Thread Marco Villalobos
Is it possible for an operator to receive two different kinds of broadcasts? Is it possible for an operator to receive two different types of streams and a broadcast? For example, I know there is a KeyedCoProcessFunction, but is there a version of that which can also receive broadcasts?
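For the single-broadcast case the standard pattern is connect-with-broadcast plus a KeyedBroadcastProcessFunction; two broadcast streams of the same type can be union-ed first. A hedged sketch (element formats and state names are hypothetical):

```java
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.KeyedBroadcastProcessFunction;
import org.apache.flink.util.Collector;

public class BroadcastSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> events = env.fromElements("k1:payload");
        DataStream<String> rules = env.fromElements("rule-a");

        MapStateDescriptor<String, String> desc = new MapStateDescriptor<>(
                "rules", BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO);
        BroadcastStream<String> broadcast = rules.broadcast(desc);

        events.keyBy(s -> s.split(":")[0])
              .connect(broadcast)
              .process(new KeyedBroadcastProcessFunction<String, String, String, String>() {
                  @Override
                  public void processElement(String value, ReadOnlyContext ctx,
                                             Collector<String> out) throws Exception {
                      // Keyed side: read-only view of the broadcast state.
                      String rule = ctx.getBroadcastState(desc).get("current");
                      out.collect(value + " / rule=" + rule);
                  }

                  @Override
                  public void processBroadcastElement(String rule, Context ctx,
                                                      Collector<String> out) throws Exception {
                      // Broadcast side: every parallel instance updates its copy.
                      ctx.getBroadcastState(desc).put("current", rule);
                  }
              })
              .print();
        env.execute();
    }
}
```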

clarification on backpressure metrics in Apache Flink Dashboard

2021-02-10 Thread Marco Villalobos
given: [source] -> [operator 1] -> [operator 2] -> [sink]. If within the dashboard, operator 1 shows that it has backpressure, does that mean I need to improve the performance of operator 2 in order to alleviate backpressure upon operator 1?

What is the difference between RuntimeContext state and ProcessWindowFunction.Context state when using a ProcessWindowFunction?

2021-02-09 Thread Marco Villalobos
Hi, I am having a difficult time distinguishing between RuntimeContext state and global state when using a ProcessWindowFunction. A ProcessWindowFunction has access to three different kinds of state. 1. RuntimeContext state. 2. ProcessWindowFunction.Context global state 3.

Re: threading and distribution

2021-02-05 Thread Marco Villalobos
. On Fri, Feb 5, 2021 at 3:06 AM Marco Villalobos wrote: > as data flows from a source through a pipeline of operators and finally > sinks, is there a means to control how many threads are used within an > operator, and how an operator is distributed across the network? > > Wher

hybrid state backends

2021-02-05 Thread Marco Villalobos
Is it possible to use different statebackends for different operators? There are certain situations where I want the state to reside completely in memory, and other situations where I want it stored in rocksdb.

threading and distribution

2021-02-05 Thread Marco Villalobos
as data flows from a source through a pipeline of operators and finally sinks, is there a means to control how many threads are used within an operator, and how an operator is distributed across the network? Where can I read up on these types of details specifically?

Re: org.codehaus.janino.CompilerFactory cannot be cast ....

2021-02-03 Thread Marco Villalobos
Oh, I found the solution. I simply need to not use TRACE log level for Flink. On Wed, Feb 3, 2021 at 7:07 PM Marco Villalobos wrote: > > Please advise me. I don't know what I am doing wrong. > > After I added the blink table planner to my dependency management: &g

org.codehaus.janino.CompilerFactory cannot be cast ....

2021-02-03 Thread Marco Villalobos
Please advise me. I don't know what I am doing wrong. After I added the blink table planner to my dependency management: dependency "org.apache.flink:flink-table-planner-blink_${scalaVersion}:${flinkVersion}" and added it as a dependency: implementation

Re: Question regarding a possible use case for Iterative Streams.

2021-02-03 Thread Marco Villalobos
Hi Gorden, Thank you very much for the detailed response. I considered using the state-state processor API, however, our enrichment requirements make the state-processor API a bit inconvenient. 1. if an element from the stream matches a record in the database then it can remain in the cache a

Question regarding a possible use case for Iterative Streams.

2021-02-02 Thread Marco Villalobos
Hi everybody, I am brainstorming how it might be possible to perform database enrichment with the DataStream API, use keyed state for caching, and also utilize Async IO. Since AsyncIO does not support keyed state, then is it possible to use an Iterative Stream that uses keyed state for caching

Re: question on checkpointing

2021-02-01 Thread Marco Villalobos
heckpoint timeout. > > 2) What kind of effects are you worried about? > > On 1/28/2021 8:05 PM, Marco Villalobos wrote: > > Is it possible that checkpointing times out due to an operator taking > > too long? > > > > Also, does windowing affect the checkpoint barriers? > > >

Re: Flink and Amazon EMR

2021-02-01 Thread Marco Villalobos
king. You would need to analyse what's working slower than > expected. Checkpointing times? (Async duration? Sync duration? Start > delay/back pressure?) Throughput? Recovery/startup? Are you being rate > limited by Amazon? > > Piotrek > > czw., 28 sty 2021 o 03:46 Marco Villa

Re: flink checkpoints adjustment strategy

2021-01-29 Thread Marco Villalobos
y ...), and async duration(too much io/network process ...), etc. > > Best, > Congxian > > > Marco Villalobos wrote on Fri, Jan 29, 2021 at 7:19 AM: > >> I am kind of stuck in determining how large a checkpoint interval should >> be. >> >> Is there a guide for that

flink checkpoints adjustment strategy

2021-01-28 Thread Marco Villalobos
I am kind of stuck in determining how large a checkpoint interval should be. Is there a guide for that? If the timeout is 10 minutes and we time out, what is a good strategy for adjusting it? What is a good starting point for a checkpoint interval? How should it be adjusted? We often see checkpoint

question on checkpointing

2021-01-28 Thread Marco Villalobos
Is it possible that checkpointing times out due to an operator taking too long? Also, does windowing affect the checkpoint barriers?

Re: presto s3p checkpoints and local stack

2021-01-28 Thread Marco Villalobos
-style-access: true > s3.path.style.access: true (only one of them is needed, but I don't know > which, so please try out) > > [1] > https://ci.apache.org/projects/flink/flink-docs-stable/deployment/filesystems/s3.html#configure-access-credentials > > On Thu, Jan 28, 2021 at 4

presto s3p checkpoints and local stack

2021-01-28 Thread Marco Villalobos
. Any advice would be appreciated. -Marco Villalobos

Re: What causes a buffer pool exception? How can I mitigate it?

2021-01-28 Thread Marco Villalobos
t; double-checked and saw that the buffer pool is only released on >> cancellation or shutdown. >> >> So I'm assuming that there is another issue (e.g., Kafka cluster not >> reachable) and there is a race condition while shutting down. It seems like >> the buffer pool ex

Flink and Amazon EMR

2021-01-27 Thread Marco Villalobos
Just curious, has anybody had success with Amazon EMR with RocksDB and checkpointing in S3? That's the configuration I am trying to setup, but my system is running more slowly than expected.

stopping with save points

2021-01-27 Thread Marco Villalobos
When I try to stop with a savepoint, I usually get the error below. I have not been able to create a single save point. Please advise. I am using Flink 1.11.0 Draining job "ed51084378323a7d9fb1c4c97c2657df" with a savepoint. The

Re: windows and triggers

2021-01-26 Thread Marco Villalobos
based on the > progress of time but only by count. Right now, you have to write your own > custom trigger if you want to react based on both time and count. > On Tue, Jan 26, 2021 at 10:44 AM Marco Villalobos wrote: > I wrote this simple test: > > .window(TumblingProcessingTimeW

windows and triggers

2021-01-26 Thread Marco Villalobos
I wrote this simple test: .window(TumblingProcessingTimeWindows.of(Time.minutes(1))) .trigger(PurgingTrigger.of(CountTrigger.of(5))) Thinking that if I send 2 elements of data, it would collect them after a minute. But that doesn't seem to be happening. Is my understanding of how windows and
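As the reply above explains, replacing the trigger removes the default time-based firing, so a bare count trigger never fires on time alone. Flink 1.12+ ships ProcessingTimeoutTrigger, which wraps a nested trigger with a timeout; a hedged sketch (elements and durations are placeholders):

```java
import java.time.Duration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.triggers.CountTrigger;
import org.apache.flink.streaming.api.windowing.triggers.ProcessingTimeoutTrigger;

public class TimeoutTriggerSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.fromElements("a", "a", "b")
           .keyBy(s -> s)
           .window(TumblingProcessingTimeWindows.of(Time.minutes(1)))
           // Fires after 5 elements OR after one minute of processing time,
           // whichever comes first.
           .trigger(ProcessingTimeoutTrigger.of(CountTrigger.of(5), Duration.ofMinutes(1)))
           .reduce((a, b) -> a + b)
           .print();
        env.execute();
    }
}
```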

Re: What causes a buffer pool exception? How can I mitigate it?

2021-01-26 Thread Marco Villalobos
ve the option to upgrade to 1.11.3? > > Best, > > Arvid > > On Tue, Jan 26, 2021 at 6:35 AM Marco Villalobos < > mvillalo...@kineteque.com> wrote: > >> Hi. What causes a buffer pool exception? How can I mitigate it? It is >> causing u

Re: memory tuning

2021-01-26 Thread Marco Villalobos
26, 2021 at 12:34 AM Matthias Pohl wrote: > Hi Marco, > Could you share the preconfiguration logs? They are printed in the > beginning of the taskmanagers' logs and contain a summary of the used > memory configuration? > > Best, > Matthias > > On Tue, Jan 26, 2021

memory tuning

2021-01-25 Thread Marco Villalobos
I have a flink job that collects and aggregates time-series data from many devices into one object (let's call that X) that was collected by a window. X contains time-series data, so it contains many String, Instant, a HashMap, and another type (Let's call Y) objects. When I collect 4 X

What causes a buffer pool exception? How can I mitigate it?

2021-01-25 Thread Marco Villalobos
Hi. What causes a buffer pool exception? How can I mitigate it? It is causing us plenty of problems right now. 2021-01-26 04:14:33,041 INFO org.apache.flink.streaming.api.functions.sink.filesystem.Buckets [] - Subtask 1 received completion notification for checkpoint with id=4. 2021-01-26

JDBC connection pools

2021-01-21 Thread Marco Villalobos
Currently, my jobs that require JDBC initialize a connection in the open method directly via JDBC driver. 1. What are the established best practices for this? 2. Is it better to use a connection pool that can validate the connection and reconnect? 3. Would each operator require its own connection
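A hedged sketch of the open()-managed connection described above, extended with validation and reconnect (URL, credentials, and query are placeholders; each parallel operator instance gets its own connection):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

// Hypothetical pattern: open the connection once per operator instance,
// validate (and reopen) it before each use, and close it on shutdown.
public class JdbcEnricher extends RichMapFunction<String, String> {
    private static final String URL = "jdbc:postgresql://db:5432/app"; // placeholder

    private transient Connection conn;

    @Override
    public void open(Configuration parameters) throws Exception {
        conn = DriverManager.getConnection(URL, "user", "password");   // placeholders
    }

    private Connection connection() throws SQLException {
        if (conn == null || !conn.isValid(5)) {    // stale? reconnect
            conn = DriverManager.getConnection(URL, "user", "password");
        }
        return conn;
    }

    @Override
    public String map(String value) throws Exception {
        try (PreparedStatement stmt = connection().prepareStatement("SELECT 1")) {
            stmt.execute();                        // placeholder lookup
        }
        return value;
    }

    @Override
    public void close() throws Exception {
        if (conn != null) conn.close();
    }
}
```

A pooled DataSource (e.g. HikariCP) shipped in the job jar is the usual alternative when several operators on the same task manager need connections.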

question about timers

2021-01-19 Thread Marco Villalobos
If there are timers that have been checkpointed (we use rocksdb), and the system goes down, and then the system goes back up after the timers should have fired, do those timers that were skipped still fire, even though we are past that time? example: for example, if the current time is 1:00 p.m.

Re: UDTAGG and SQL

2021-01-05 Thread Marco Villalobos
o move the implementation there and keep the SQL query > simple. But this is up to you. Consecutive windows are supported. > > Regards, > Timo > > > On 05.01.21 15:23, Marco Villalobos wrote: > > Hi Timo, > > > > Thank you for the quick response. > > >

Re: UDTAGG and SQL

2021-01-05 Thread Marco Villalobos
ar function called > `MyUDTAGG` in your example and cannot find one. > > Maybe the built-in function `COLLECT` or `LISTAGG` is what you are looking for? > > https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/functions/systemFunctions.html > > Regards, > Timo &

UDTAGG and SQL

2021-01-05 Thread Marco Villalobos
I am trying to use User defined Table Aggregate function directly in the SQL so that I could combine all the rows collected in a window into one object. GIVEN a User defined Table Aggregate function public class MyUDTAGG extends TableAggregateFunction { public PurchaseWindow
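For reference, the documented shape of a TableAggregateFunction is the Top2 example from the Flink docs (shown here as a sketch; it is not the MyUDTAGG in question, whose definition is truncated above):

```java
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.table.functions.TableAggregateFunction;
import org.apache.flink.util.Collector;

// Emits up to two rows of (value, rank) per group: the two largest values.
public class Top2 extends TableAggregateFunction<Tuple2<Integer, Integer>, Top2.Acc> {

    // Mutable accumulator holding the two largest values seen so far.
    public static class Acc {
        public Integer first = Integer.MIN_VALUE;
        public Integer second = Integer.MIN_VALUE;
    }

    @Override
    public Acc createAccumulator() {
        return new Acc();
    }

    public void accumulate(Acc acc, Integer value) {
        if (value > acc.first) {
            acc.second = acc.first;
            acc.first = value;
        } else if (value > acc.second) {
            acc.second = value;
        }
    }

    public void emitValue(Acc acc, Collector<Tuple2<Integer, Integer>> out) {
        if (acc.first != Integer.MIN_VALUE) out.collect(Tuple2.of(acc.first, 1));
        if (acc.second != Integer.MIN_VALUE) out.collect(Tuple2.of(acc.second, 2));
    }
}
```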
