Custom metrics in Stateful Functions

2021-04-27 Thread Cliff Resnick
We think Embedded Statefun is a nicer fit than Datastream for some problem domains, but one thing we miss is support for custom metrics/counters. Is there a way to access the Flink support? It looks like if we want custom metrics we'll need to roll our own.

KeyedStream and chained forward operators

2020-04-21 Thread Cliff Resnick
I'm running a massive file sifting by timestamp DataSteam job from s3. The basic job is: FileMonitor -> ContinuousFileReader -> MultipleFileOutputSink The MultipleFileOutputSink sifts data based on timestamp to date-hour directories It's a lot of data, so I'm using high parallelism, but I want

post-checkpoint watermark out of sync with event stream?

2020-04-14 Thread Cliff Resnick
We have an event-time pipeline that uses a ProcessFunction to accept events with an allowed lateness of a number of days. We a BoundedOutOfOrdernessTimestampExtractor and our event stream has a long tail that occasionally exceeds our allowed lateness, in which case we drop the events. The logic

Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

2020-01-27 Thread Cliff Resnick
I know from experience that Flink's shaded S3A FileSystem does not reference core-site.xml, though I don't remember offhand what file (s) it does reference. However since it's shaded, maybe this could be fixed by building a Flink FS referencing 3.3.0? Last I checked I think it referenced 3.1.0.

Re: State Migration with RocksDB MapState

2019-04-25 Thread Cliff Resnick
till bump into the same issue. > > Best, > Gordon > > On Wed, Apr 24, 2019 at 7:45 PM Cliff Resnick wrote: > >> Hi Gordon, >> >> I noticed there has been no movement on this issue and I'm wondering if I >> can find some way to work around this. >> M

Re: State Migration with RocksDB MapState

2019-04-24 Thread Cliff Resnick
s definitely possible. > > Cheers, > Gordon > > [1] https://issues.apache.org/jira/browse/FLINK-11947 > > On Mon, Mar 18, 2019 at 11:20 AM Cliff Resnick wrote: > >> After trying out state migration in 1.8 rc2 I ran into this hard stop >> below. The comment do

State Migration with RocksDB MapState

2019-03-17 Thread Cliff Resnick
After trying out state migration in 1.8 rc2 I ran into this hard stop below. The comment does not give an indication why rocksdb map state cannot be migrated, and I'm wondering what the status is, since we do need this functionality and would like to know if this is a long-term blocker or not.

how to use Hadoop Inputformats with flink shaded s3?

2019-01-31 Thread Cliff Resnick
I need to process some Parquet data from S3 as a unioned input in my DataStream pipeline. From what I know, this requires using the hadoop AvroParquetInputFormat. The problem I'm running into is that also requires using un-shaded hadoop classes that conflict with the Flink shaded hadoop3

Re: Problem building 1.7.1 with scala-2.12

2019-01-03 Thread Cliff Resnick
scala-2.11 profile) > > On 02.01.2019 17:48, Cliff Resnick wrote: > > The build fails at flink-connector-kafka-0.9 because _2.12 libraries > > apparently do not exist for kafka < 0.10. Any help appreciated! > > > > > > -Cliff > > >

Problem building 1.7.1 with scala-2.12

2019-01-02 Thread Cliff Resnick
The build fails at flink-connector-kafka-0.9 because _2.12 libraries apparently do not exist for kafka < 0.10. Any help appreciated! -Cliff

Re: Task Manager allocation issue when upgrading 1.6.0 to 1.6.2

2018-11-12 Thread Cliff Resnick
icationId ` you should see the problem why the TMs > don't start up. > > Cheers, > Till > > On Fri, Nov 9, 2018 at 8:32 PM Cliff Resnick wrote: > >> Hi Till, >> >> Here are Job Manager logs, same job in both 1.6.0 and 1.6.2 at DEBUG >> level. I saw

Re: Flink Web UI does not show specific exception messages when job submission fails

2018-11-09 Thread Cliff Resnick
+1! On Fri, Nov 9, 2018 at 1:34 PM Gary Yao wrote: > Hi, > > We only propagate the exception message but not the complete stacktrace > [1]. > Can you create a ticket for that? > > Best, > Gary > > [1] >

Task Manager allocation issue when upgrading 1.6.0 to 1.6.2

2018-11-08 Thread Cliff Resnick
I'm running a YARN cluster of 8 * 4 core instances = 32 cores, with a configuration of 3 slots per TM. The cluster is dedicated to a single job that runs at full capacity in "FLIP6" mode. So in this cluster, the parallelism is 21 (7 TMs * 3, one container dedicated for Job Manager). When I run

Re: classloading strangeness with Avro in Flink

2018-08-21 Thread Cliff Resnick
, Aug 21, 2018 at 8:02 AM Cliff Resnick wrote: > Hi Aljoscha, > > We need flink-shaded-hadoop2-uber.jar because there is no hadoop distro on > the instance the Flink session/jobs is managed from and the process that > launches Flink is not a java process, but execs a process that ca

Re: classloading strangeness with Avro in Flink

2018-08-21 Thread Cliff Resnick
5, vino yang wrote: > > Hi Cliff, > > If so, you can explicitly exclude Avro's dependencies from related > dependencies (using ) and then directly introduce dependencies on > the Avro version you need. > > Thanks, vino. > > Cliff Resnick 于2018年8月21日周二 上午5:13写道

Re: classloading strangeness with Avro in Flink

2018-08-20 Thread Cliff Resnick
the Avro dependency to the user jar. However, since I'm using YARN, I'm required to have flink-shaded-hadoop2-uber.jar loaded from lib -- and that has avro bundled un-shaded. So I'm back to the start problem... Any advice is welcome! -Cliff On Mon, Aug 20, 2018 at 1:42 PM Cliff Resnick wrote: >

Re: classloading strangeness with Avro in Flink

2018-08-20 Thread Cliff Resnick
eratorStateBackend.deserializeStateValues(DefaultOperatorStateBackend.java:584) >> at >> org.apache.flink.runtime.state.DefaultOperatorStateBackend.restore(DefaultOperatorStateBackend.java:399) >> at >> org.apache.flink.streaming.runtime.tasks.StreamTask.createOperatorStateBackend(

Re: classloading strangeness with Avro in Flink

2018-08-20 Thread Cliff Resnick
, 2018 at 8:43 AM Cliff Resnick wrote: > Hi Vino, > > Thanks for the explanation, but the job only ever uses the Avro (1.8.2) > pulled in by flink-formats/avro, so it's not a class version conflict > there. > > I'm using default child-first loading. It might be a further tr

Re: classloading strangeness with Avro in Flink

2018-08-20 Thread Cliff Resnick
documentation.[2] > > [1]: > https://ci.apache.org/projects/flink/flink-docs-release-1.6/ops/config.html > [2]: > https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/debugging_classloading.html > > Thanks, vino. > > Cliff Resnick 于2018年8月20日周一 上午10:40写道: > >&

classloading strangeness with Avro in Flink

2018-08-19 Thread Cliff Resnick
Our Flink/YARN pipeline has been reading Avro from Kafka for a while now. We just introduced a source of Avro OCF (Object Container Files) read from S3. The Kafka Avro continued to decode without incident, but the OCF files failed 100% with anomalous parse errors in the decoding phase after the

Re: MIs-reported metrics using SideOutput stream + Broadcast

2018-07-03 Thread Cliff Resnick
stMetadataJoin.broadcast(metadata) for the* late-join*. > Is this intended, or just a copy error? > > > On 03.07.2018 04:16, Cliff Resnick wrote: > > Our topology has a metadata source that we push via Broadcast. Because > this metadata source is critical, but sometimes late, we a

MIs-reported metrics using SideOutput stream + Broadcast

2018-07-02 Thread Cliff Resnick
Our topology has a metadata source that we push via Broadcast. Because this metadata source is critical, but sometimes late, we added a buffering mechanism via a SideOutput. We call the initial look-up from Broadcast "join" and the secondary, state-backed buffered lookup, "late-join" Today I

Re: Task Manager detached under load

2018-01-30 Thread Cliff Resnick
I've seen a similar issue while running successive Flink SQL batches on 1.4. In my case, the Job Manager would fail with the log output about unreachability (with an additional statement about something going "horribly wrong"). Under workload pressure, I reverted to 1.3.2 where everything works

CSV writer/parser inconsistency when using the Table API?

2017-12-22 Thread Cliff Resnick
I've been trying out the Table API for some ETL using a two-stage job of CsvTableSink (DataSet) -> CsvInputFormat (Stream). I ran into an issue where the first stage produces output with trailing null values (valid), which causes a parse error in the second stage. Looking at

Question on checkpoint management

2017-05-08 Thread Cliff Resnick
When a job cancel-with-savepoint finishes a successful Savepoint, the preceding last successful Checkpoint is removed. Is this the intended behavior? I thought that checkpoints and savepoints were separate entities and, as such, savepoints should not infringe on checkpoints. This is actually an

Re: REST api: how to upload jar?

2017-01-25 Thread Cliff Resnick
; Can you also please open a jira for fixing the documentation? > > - Sachin > > > On Jan 25, 2017 06:55, "Cliff Resnick" <cre...@gmail.com> wrote: > >> The 1.2 release documentation (https://ci.apache.org/project >> s/flink/flink-docs-release-1.2/monitoring

REST api: how to upload jar?

2017-01-24 Thread Cliff Resnick
The 1.2 release documentation (https://ci.apache.org/ projects/flink/flink-docs-release-1.2/monitoring/rest_api.html) states "It is possible to upload, run, and list Flink programs via the REST APIs and web frontend". However there is no documentation about uploading a jar via REST api. Does this

Re: Streaming pipeline failing to complete checkpoints at scale

2016-12-23 Thread Cliff Resnick
for a bit): > > - There is some much enhanced Checkpoint Monitoring almost ready to be > merged. That should help in fining out where the barriers get delayed. > > - Finally, we are experimenting with some other checkpoint alignment > variants (alternatives to the BarrierBuffer). We can p

Streaming pipeline failing to complete checkpoints at scale

2016-12-23 Thread Cliff Resnick
We are running a DataStream pipeline using Exactly Once/Event Time semantics on 1.2-SNAPSHOT. The pipeline sources from S3 using the ContinuousFileReaderOperator. We use a custom version of the ContinuousFileMonitoringFunction since our source directory changes over time. The pipeline transforms

Re: Resource under-utilization when using RocksDb state backend [SOLVED]

2016-12-08 Thread Cliff Resnick
or how long did you look at iotop. It could be that the IO access happens > in bursts, depending on how data is cached. > > I'll also add Stefan Richter to the conversation, he has maybe some more > ideas what we can do here. > > > On Mon, Dec 5, 2016 at 6:19 PM, Cliff Res

Re: Resource under-utilization when using RocksDb state backend

2016-12-05 Thread Cliff Resnick
ou could look > into tuning the RocksDB settings so that it uses more memory for caching. > > Regards, > Robert > > > On Fri, Dec 2, 2016 at 11:34 PM, Cliff Resnick <cre...@gmail.com> wrote: > >> In tests comparing RocksDb to fs state backend we observe mu

Resource under-utilization when using RocksDb state backend

2016-12-02 Thread Cliff Resnick
In tests comparing RocksDb to fs state backend we observe much lower throughput, around 10x slower. While the lowered throughput is expected, what's perplexing is that machine load is also very low with RocksDb, typically falling to < 25% CPU and negligible IO wait (around 0.1%). Our test

Re: Why are externalized checkpoints deleted on Job Manager exit?

2016-11-15 Thread Cliff Resnick
Hi, Anything keeping this from being merged into master? On Thu, Nov 3, 2016 at 10:56 AM, Ufuk Celebi wrote: > A fix is pending here: https://github.com/apache/flink/pull/2750 > > The behaviour on graceful shut down/suspension respects the > cancellation behaviour with this