Re: Using Custom JSON Formatting with Flink Operator

2024-02-22 Thread Rion Williams
n this case I'd need to share the lib directory for the image running the job to copy it over from the original image. Does that make sense? Is there a better way to handle this in this scenario? On Thu, Feb 22, 2024 at 6:31 AM Rion Williams wrote: > Correct! Building a custom image for the depl

Re: Using Custom JSON Formatting with Flink Operator

2024-02-22 Thread Rion Williams
> in the lib folder (don’t confuse this with the usrlib folder). > > Kind Regards > Dominik > From: Rion Williams > Date: Thursday, 22 February 2024 at 13:09 > To: Bünzli Dominik, INI-DNA-INF > Cc: user@flink.apache.org > Subject: Re: Using Custom JSON Formatting with Fli

Re: Using Custom JSON Formatting with Flink Operator

2024-02-22 Thread Rion Williams
ut that I also needed to add the additional > dependencies (I guess JsonTemplateLayout is one of them) to the lib folder of > the deployment. > > Kind regards > Dominik > > From: Rion Williams > Date: Thursday, 22 February 2024 at 00:46 > To: Flink User List &g

Using Custom JSON Formatting with Flink Operator

2024-02-21 Thread Rion Williams
Hey Flinkers, Recently I’ve been in the process of migrating a series of older Flink jobs to use the official operator and have run into a snag on the logging front. I’ve attempted to use the following configuration for the job: ``` logConfiguration: log4j-console.properties: |+

Handling Batched Failures in ElasticsearchSink

2023-03-23 Thread Rion Williams
Hi all, I have a pipeline that is currently reading from Kafka and writing to Elasticsearch. I recently was doing some testing for how it handles failures and was wondering if there’s a best practice or recommendation for doing so. Specifically, if I have a batch of 100 records being sent via

Re: Handling JSON Serialization without Kryo

2023-03-22 Thread Rion Williams
, Ken Krugler wrote:Hi Rion,I’m using Gson to deserialize to a Map.1-2 records/second sounds way too slow, unless each record is enormous.— KenOn Mar 21, 2023, at 6:18 AM, Rion Williams <rionmons...@gmail.com> wrote:Hi Ken,Thanks for the response. I hadn't tried exploring the use of the Record

Re: Handling JSON Serialization without Kryo

2023-03-20 Thread Rion Williams
and other types supported by flink[1] https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/formats/json/Best,Shammon FYOn Sun, Mar 19, 2023 at 7:44 AM Rion Williams <rionmons...@gmail.com> wrote:Hi all, I’m reaching out today for some suggestions (and hopefully a so

Handling JSON Serialization without Kryo

2023-03-18 Thread Rion Williams
Hi all, I’m reaching out today for some suggestions (and hopefully a solution) for a Flink job that I’m working on. The job itself reads JSON strings from a Kafka topic and reads those into JSONObjects (currently via Gson), which are then operated against, before ultimately being written out

Handling JSON Serialization without Kryo

2023-03-18 Thread Rion Williams
Hi all, I’m reaching out today for some suggestions (and hopefully a solution) for a Flink job that I’m working on. The job itself reads JSON strings from a Kafka topic and reads those into JSONObjects (currently via Gson), which are then operated against, before ultimately being written out

Flink Forward Session Question

2022-12-31 Thread Rion Williams
Hey Flinkers, Firstly, early Happy New Year’s to everyone in the community. I’ve been digging a bit into exactly-once processing with Flink and Pinot and I came across this session from Flink Foward last year: -

Re: Question Regarding State Migrations in Ververica Platform

2022-08-31 Thread Rion Williams
+dev > On Aug 30, 2022, at 11:20 AM, Rion Williams wrote: > >  > Hi all, > > I wasn't sure if this would be the best audience, if not, please advise if > you know of a better place to ask it. I figured that at least some folks here > either work for Verv

Question Regarding State Migrations in Ververica Platform

2022-08-30 Thread Rion Williams
Hi all, I wasn't sure if this would be the best audience, if not, please advise if you know of a better place to ask it. I figured that at least some folks here either work for Ververica or might have used their platform. *tl;dr; I'm trying to migrate an existing stateful Flink job to run in

Exception Handling in ElasticsearchSink

2022-04-21 Thread Rion Williams
Hi all, I've recently been encountering some issues that I've noticed in the logs of my Flink job that handles writing to an Elasticsearch index. I was hoping to leverage some of the metrics that Flink exposes (or piggyback on them) to update metric counters when I encounter specific kinds of

Re: [ANNOUNCE] Apache Flink 1.14.0 released

2021-09-29 Thread Rion Williams
Great news all! Looking forward to it! > On Sep 29, 2021, at 10:43 AM, Theo Diefenthal > wrote: > >  > Awesome, thanks for the release. > > - Ursprüngliche Mail - > Von: "Dawid Wysakowicz" > An: "dev" , "user" , > annou...@apache.org > Gesendet: Mittwoch, 29. September 2021

Re: Dynamic Cluster/Index Routing for Elasticsearch Sink

2021-09-04 Thread Rion Williams
nit test doesn't in this case, so I'd recommend it as the next > step (I'm also bit concerned that this test would take a long time to > execute / be resource intensive as it would need to spawn more elastic > clusters?). > > Best, > D. > > On Thu, Aug 26, 2021 at 5:47 PM R

Re: Dynamic Cluster/Index Routing for Elasticsearch Sink

2021-08-26 Thread Rion Williams
don't want to go overkill. Thanks again for all of your help, Rion On Wed, Aug 25, 2021 at 2:10 PM Rion Williams wrote: > Thanks again David, > > I've spun up a JIRA issue for the ticket > <https://issues.apache.org/jira/browse/FLINK-23977> while I work on > getting things

Re: Dynamic Cluster/Index Routing for Elasticsearch Sink

2021-08-25 Thread Rion Williams
data. Most complete generic work on this topic > that I'm aware of are Splittable DoFn based IOs in Apache Beam. > > I think the best module for the contribution would be > "elasticsearch-base", because this could be easily reused for all ES > versions that we currently support. > &g

Re: Dynamic Cluster/Index Routing for Elasticsearch Sink

2021-08-25 Thread Rion Williams
//github.com/apache/flink/blob/release-1.13.2/flink-connectors/flink-connector-elasticsearch-base/src/test/java/org/apache/flink/streaming/connectors/elasticsearch/ElasticsearchSinkBaseTest.java > > Best, > D. > > On Tue, Aug 24, 2021 at 12:03 AM Rion Williams > wrote: > >> Hi

Re: Dynamic Cluster/Index Routing for Elasticsearch Sink

2021-08-23 Thread Rion Williams
(ElasticsearchSinkFunction< > >Tuple2>) >(el, ctx, indexer) -> { >// Construct index >request. >

Re: Handling HTTP Requests for Keyed Streams

2021-08-17 Thread Rion Williams
; As you mentioned that the configuration fetching is very infrequent, why > don't you use a blocking approach to send HTTP requests and receive > responses? This seems like a more reasonable solution to me. > > Rion Williams 于2021年8月17日周二 上午4:00写道: >> Hi all, >> >> I've be

Handling HTTP Requests for Keyed Streams

2021-08-16 Thread Rion Williams
Hi all, I've been exploring a few different options for storing tenant-specific configurations within Flink state based on the messages I have flowing through my job. Initially I had considered creating a source that would periodically poll an HTTP API and connect that stream to my original event

Re: Dynamic Cluster/Index Routing for Elasticsearch Sink

2021-08-16 Thread Rion Williams
gh-level I think could look something like > this <https://gist.github.com/dmvk/3f8124d585cd33e52cacd4a38b80f8c8> [1]. > > [1] https://gist.github.com/dmvk/3f8124d585cd33e52cacd4a38b80f8c8 > > On Fri, Aug 13, 2021 at 2:57 PM Rion Williams > wrote: > >> Hi David, >> &g

Re: Dynamic Cluster/Index Routing for Elasticsearch Sink

2021-08-13 Thread Rion Williams
(I'm able to provide some guidance). > > [1] > https://github.com/apache/flink/blob/release-1.13.2/flink-connectors/flink-connector-elasticsearch-base/src/main/java/org/apache/flink/streaming/connectors/elasticsearch/ElasticsearchSinkFunction.java > > Best, > D. > >> O

Dynamic Cluster/Index Routing for Elasticsearch Sink

2021-08-08 Thread Rion Williams
Hi folks, I have a use-case that I wanted to initially pose to the mailing list as I’m not terribly familiar with the Elasticsearch connector to ensure I’m not going down the wrong path trying to accomplish this in Flink (or if something downstream might be a better option). Basically, I have

Re: Dead Letter Queue for JdbcSink

2021-08-03 Thread Rion Williams
gt; > wt., 3 sie 2021 o 18:07 Rion Williams napisał(a): >> >> Thanks Maciek, >> >> It looks like my initial issue had another problem with a bad interface that >> was being used (or an improper one), but after changing that and ensuring >> all of the fie

Re: Dead Letter Queue for JdbcSink

2021-08-03 Thread Rion Williams
verriding functions like open, > setRuntimeContext, snapshotState, initializeState - the calls needs to > be passed to the inner sink function. > > pon., 2 sie 2021 o 19:31 Rion Williams napisał(a): > > > > Hi again Maciek (and all), > > > > I just recently returned to

Re: Dead Letter Queue for JdbcSink

2021-08-02 Thread Rion Williams
ckpointComplete, snapshotState, initializeState and > setRuntimeContext. > > The problem is that if you want to catch problematic record you need > to set batch size to 1, which gives very bad performance. > > Regards, > Maciek > > śr., 14 lip 2021 o 17:31 Rion Williams napisał(

Re: Dead Letter Queue for JdbcSink

2021-07-14 Thread Rion Williams
nks, Rion On Wed, Jul 14, 2021 at 9:56 AM Maciej Bryński wrote: > Hi Rion, > We have implemented such a solution with Sink Wrapper. > > > Regards, > Maciek > > śr., 14 lip 2021 o 16:21 Rion Williams napisał(a): > > > > Hi all, > > > > Recently I've bee

Dead Letter Queue for JdbcSink

2021-07-14 Thread Rion Williams
Hi all, Recently I've been encountering an issue where some external dependencies or process causes writes within my JDBCSink to fail (e.g. something is being inserted with an explicit constraint that never made it's way there). I'm trying to see if there's a pattern or recommendation for

Handling Large Broadcast States

2021-06-16 Thread Rion Williams
Hey Flink folks, I was discussing the use of the Broadcast Pattern with some colleagues today for a potential enrichment use-case and noticed that it wasn’t currently backed by RocksDB. This seems to indicate that it would be solely limited to the memory allocated, which might not support a

Guidance for Integration Tests with External Technologies

2021-05-18 Thread Rion Williams
Hey all, I’ve been taking a very TDD-oriented approach to developing many of the Flink apps I’ve worked on, but recently I’ve encountered a problem that has me scratching my head. A majority of my integration tests leverage a few external technologies such as Kafka and typically a relational

Re: Handling "Global" Updating State

2021-05-17 Thread Rion Williams
o executing the Flink pipeline >> to ensure synchronicity) >> - Use this to initialize the state of my broadcast stream (if possible) >> - At this point that stream would be broadcasting any new records coming in, >> so I “should” stay up to date at that point. >&

Re: Handling "Global" Updating State

2021-05-16 Thread Rion Williams
or is there an obviously better / well known approach to handling this? Thanks, Rion > On May 14, 2021, at 9:51 AM, Rion Williams wrote: > >  > Hi all, > > I've encountered a challenge within a Flink job that I'm currently working > on. The gist of it is that I have a job that listens

Handling "Global" Updating State

2021-05-14 Thread Rion Williams
Hi all, I've encountered a challenge within a Flink job that I'm currently working on. The gist of it is that I have a job that listens to a series of events from a Kafka topic and eventually sinks those down into Postgres via the JDBCSink. A requirement recently came up for the need to filter

Capturing Statement Execution / Results within JdbcSink

2021-03-19 Thread Rion Williams
Hey all, I've been working with JdbcSink and it's really made my life much easier, but I had two questions about it that folks might be able to answer or provide some clarity around. *Accessing Statement Execution / Results* Is there any mechanism in place (or out of the box) to support reading

Re: Unit Testing for Custom Metrics in Flink

2021-03-16 Thread Rion Williams
TaskMetricGroup() { > @Overridepublic OperatorMetricGroup > getOrAddOperator(OperatorID id, String name) { > return operatorMetricGroup;} > };new MockEnvironmentBuilder() > .setMetricGroup(taskMetricGroup) > >

Re: Unit Testing for Custom Metrics in Flink

2021-03-16 Thread Rion Williams
:36 AM Chesnay Schepler wrote: > Are you actually running a job, or are you using a harness for testing > your function? > > On 3/16/2021 3:24 PM, Rion Williams wrote: > > Hi Chesnay, > > Thanks for the prompt response and feedback, it's very much appreciated. >

Re: Unit Testing for Custom Metrics in Flink

2021-03-16 Thread Rion Williams
e JM and > one for the TM. > To remedy this, I would recommend creating a factory that returns a static > reporter instance instead; overall this tends to be cleaner. > > Alternatively, when using the testing harnesses IIRC you can also set set > a custom MetricGroup implementatio

Unit Testing for Custom Metrics in Flink

2021-03-15 Thread Rion Williams
Hi all, Recently, I was working on adding some custom metrics to a Flink job that required the use of dynamic labels (i.e. capturing various counters that were "slicable" by things like tenant / source, etc.). I ended up handling it in a very naive fashion that would just keep a dictionary of

Re: Handling Bounded Sources with KafkaSource

2021-03-13 Thread Rion Williams
). Any ideas/recommendations/workarounds would be greatly welcome and I’d be happy to share my specific code / use-cases if needed. Thanks much, Rion > On Mar 12, 2021, at 10:19 AM, Rion Williams wrote: > >  > Hi all, > > I've been using the KafkaSource API as opposed to th

Handling Bounded Sources with KafkaSource

2021-03-12 Thread Rion Williams
Hi all, I've been using the KafkaSource API as opposed to the classic consumer and things have been going well. I configured my source such that it could be used in either a streaming or bounded mode, with the bounded approach specifically aimed at improving testing (unit/integration). I've

Request for Flink JIRA Access

2021-03-07 Thread Rion Williams
Hey folks, The community here has been awesome with my recent questions about Flink, so I’d like to give back. I’m already a member of the ASF JIRA but I was wondering if I could get access to the Flink Project. I’ve contributed a good bit to Apache Beam in the past, but I figured that I’ll

Dynamic JDBC Sink Support

2021-03-05 Thread Rion Williams
Hi all, I’ve been playing around with a proof-of-concept application with Flink to assist a colleague of mine. The application is fairly simple (take in a single input and identify various attributes about it) with the goal of outputting those to separate tables in Postgres: object

Re: Defining GlobalJobParameters in Flink Unit Testing Harnesses

2021-03-04 Thread Rion Williams
ally calls > open(), thus any mutations on the harness happen too late. > > I'd suggest to take a look at the implementation of that method and > essentially copy the code. > You can then call the harness constructor manually and mutate the > execution config before calling open(). > >

Re: Defining GlobalJobParameters in Flink Unit Testing Harnesses

2021-03-04 Thread Rion Williams
:47 AM Chesnay Schepler wrote: > Could you show us how you create test harness? > > On 3/4/2021 5:13 AM, Rion Williams wrote: > > Hi all, > > Early today I had asked a few questions regarding the use of the many > testing constructs available within Flink and believe tha

Defining GlobalJobParameters in Flink Unit Testing Harnesses

2021-03-03 Thread Rion Williams
Hi all, Early today I had asked a few questions regarding the use of the many testing constructs available within Flink and believe that I have things in a good direction at present. I did run into a specific case that either may not be supported, or just isn't documented well enough for me to

Re: Unit Testing State Stores in KeyedProcessFunctions

2021-03-03 Thread Rion Williams
e that anyway) > > On 3/3/2021 8:10 PM, Rion Williams wrote: >> Hi all! >> >> Is it possible to apply assertions against the underlying state stores >> within a KeyedProcessFunction using the existing >> KeyedOneInputStreamOperatorTestHarness class within u

Unit Testing State Stores in KeyedProcessFunctions

2021-03-03 Thread Rion Williams
Hi all! Is it possible to apply assertions against the underlying state stores within a KeyedProcessFunction using the existing KeyedOneInputStreamOperatorTestHarness class within unit tests? Basically I wanted to ensure that if I passed in two elements each with unique keys that I would be able

Re: Handling Data Separation / Watermarking from Kafka in Flink

2021-03-01 Thread Rion Williams
, but maybe not? Thanks much! Rion On Sat, Feb 27, 2021 at 10:56 AM Rion Williams wrote: > Thanks David, > > I figured that the correct approach would obviously be to adopt a keying > strategy upstream to ensure the same data that I used as a key downstream > fell on the same part

Re: Using Prometheus Client Metrics in Flink

2021-02-28 Thread Rion Williams
metadata from context into metric labels. > > If this doesn't work for you, then consider encoding tenant identifier > into job names, and extract this identifier in a metric_relabel_config [2] > > [0]: https://github.com/prometheus/node_exporter/issues/319 > [1]: > https://prom

Re: Using Prometheus Client Metrics in Flink

2021-02-28 Thread Rion Williams
/github.com/prometheus/node_exporter/issues/319 > [1]: > https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config > [2]: > https://prometheus.io/docs/prometheus/latest/configuration/configuration/#metric_relabel_configs > > > From: Rion Wi

Re: Using Prometheus Client Metrics in Flink

2021-02-28 Thread Rion Williams
rks is you are using the > metric counter. > > Prasanna. > >> On Sat, Feb 27, 2021 at 9:01 PM Rion Williams wrote: >> Hi folks, >> >> I’ve just recently started working with Flink and I was in the process of >> adding some metrics through my existi

Re: Handling Data Separation / Watermarking from Kafka in Flink

2021-02-27 Thread Rion Williams
e processing you need to > do. Otherwise, a simple change that may help would be to increase the bounded > delay you use in calculating your own per-tenant watermarks, thereby making > late events less likely. > > David > >> On Sat, Feb 27, 2021 at 3:29 AM Rion Williams wr

Using Prometheus Client Metrics in Flink

2021-02-27 Thread Rion Williams
Hi folks, I’ve just recently started working with Flink and I was in the process of adding some metrics through my existing pipeline with the hopes of building some Grafana dashboards with them to help with observability. Initially I looked at the built-in Flink metrics that were available,

Re: Handling Data Separation / Watermarking from Kafka in Flink

2021-02-26 Thread Rion Williams
Flink provides but calling >> your own logic based on the timestamps that enter your process function >> and the stored state. >> >> Regards, >> Timo >> >> >> On 26.02.21 00:29, Rion Williams wrote: >> >  >> > Hi David, >>

Re: Handling Data Separation / Watermarking from Kafka in Flink

2021-02-25 Thread Rion Williams
gt; > [1] > https://ci.apache.org/projects/flink/flink-docs-release-1.12/learn-flink/event_driven.html#example > >> On Thu, Feb 25, 2021 at 9:05 PM Rion Williams wrote: >> Hey folks, I have a somewhat high-level/advice question regarding Flink and >> if it has

Handling Data Separation / Watermarking from Kafka in Flink

2021-02-25 Thread Rion Williams
Hey folks, I have a somewhat high-level/advice question regarding Flink and if it has the mechanisms in place to accomplish what I’m trying to do. I’ve spent a good bit of time using Apache Beam, but recently pivoted over to native Flink simply because some of the connectors weren’t as mature or