Re: flink checkpoint timeout

2020-09-14 Thread Deshpande, Omkar
I have followed this https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/memory/mem_migration.html and I am using taskmanager.memory.flink.size now instead of taskmana

Re: restoring from externalized incremental rocksdb checkpoint?

2020-09-14 Thread Jeffrey Martin
Thanks for the quick reply Congxian. The non-empty chk-N directories I looked at contained only files whose names are UUIDs. Nothing named _metadata (unless HDFS hides files that start with an underscore?). Just to be clear though -- I should expect a metadata file when using incremental checkpoi

Re: Unit Test for KeyedProcessFunction with out-of-core state

2020-09-14 Thread Arvid Heise
The new backend would be for unit tests (instead of a RocksDB mock). It's kind of the mock for out-of-core behavior that you initially requested. To use rocksDB in an IT Case with multiple task managers, you would adjust the configuration in the usual minicluster setup, for example [1]. Note that

Re: Flink Table API and not recognizing s3 plugins

2020-09-14 Thread Dan Hill
Yes, the client runs in K8. It uses a different K8 config than the Helm chart and does not load the plugins. Does the client use the same plugin structure as the Flink job/task manager? I can try using it tomorrow. Cool, that link would work too. Thanks, Arvid! On Mon, Sep 14, 2020 at 10:59

Re: Flink Table API and not recognizing s3 plugins

2020-09-14 Thread Arvid Heise
Hi Dan, Are you running the client also in K8s? If so you need an initialization step, where you add the library to the plugins directory. Putting it into lib or into the user jar doesn't work anymore as we removed the shading in s3 in Flink 1.10. The official Flink docker image has an easy way t

Re: flink checkpoint timeout

2020-09-14 Thread Congxian Qiu
Hi You can try to find out is there is some hot method, or the snapshot stack is waiting for some lock. and maybe Best, Congxian Deshpande, Omkar 于2020年9月15日周二 下午12:30写道: > Few of the subtasks fail. I cannot upgrade to 1.11 yet. But I can still > get the thread dump. What should I be lookin

Re: restoring from externalized incremental rocksdb checkpoint?

2020-09-14 Thread Congxian Qiu
Hi Jeff You can restore from retained checkpoint such as[1] `bin/flink run -s :checkpointMetaDataPath [:runArgs]` , you may find the metadata in the `chk-xxx` directory[2] [1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/state/checkpoints.html#resuming-from-a-retained-check

restoring from externalized incremental rocksdb checkpoint?

2020-09-14 Thread Jeffrey Martin
Hi, My job on Flink 1.10 uses RocksDB with incremental checkpointing enabled. The checkpoints are retained on cancellation. How do I resume from the retained checkpoint after cancellation (e.g., when upgrading the job binary)? Docs say to use the checkpoint or savepoint metadata file, but AFAICT

flink checkpoint timeout

2020-09-14 Thread Deshpande, Omkar
Hello, I recently upgraded from flink 1.9 to 1.10. The checkpointing succeeds first couple of times and then starts failing because of timeouts. The checkpoint time grows with every checkpoint and starts exceeding 10 minutes. I do not see any exceptions in the logs. I have enabled debug logging

Re: Performance issue associated with managed RocksDB memory

2020-09-14 Thread Yun Tang
Hi Juha Would you please consider to contribute this back to community? If agreed, please open a JIRA ticket and we could help review your PR then. Best Yun Tang From: Juha Mynttinen Sent: Thursday, September 10, 2020 19:05 To: Stephan Ewen Cc: Yun Tang ; user@

Re: Flink Table API and not recognizing s3 plugins

2020-09-14 Thread Dan Hill
Thanks for the update! I'm trying a bunch of combinations on the client side to get the S3 Filesystem to be picked up correctly. Most of my attempts involved building into the job jar (which I'm guessing won't work). I then started getting issues with ClassCastExceptions. I might try a little m

Re: info about flinkml

2020-09-14 Thread Yun Tang
Hi The flinkML has been choosen to drop since Flink-1.9 [1] and a new machine learning library has been developed under the umbrella of FLIP-39 [2][3]. As far as I know, the new Flink ml library has not been completed and you could try Alink [4], a Machine Learning algorithm platform based on Fl

Re: flink checkpoint timeout

2020-09-14 Thread Yun Tang
Hi Omkar First of all, you should check the web UI of checkpoint [1] to see whether many subtasks fail to complete in time or just few of them. The former one might be your checkpoint time out is not enough for current case. The later one might be some task stuck in slow machine or cannot grab

[DISCUSS] FLIP-144: Native Kubernetes HA for Flink

2020-09-14 Thread Yang Wang
Hi devs and users, I would like to start the discussion about FLIP-144[1], which will introduce a new native high availability service for Kubernetes. Currently, Flink has provided Zookeeper HA service and been widely used in production environments. It could be integrated in standalone cluster,

Re: Unit Test for KeyedProcessFunction with out-of-core state

2020-09-14 Thread Alexey Trenikhun
Thank you for ideas. Do you suggest to use new backend with unit test or integration test? Thanks, Alexey From: Arvid Heise Sent: Monday, September 14, 2020 4:26:47 AM To: Dawid Wysakowicz Cc: Alexey Trenikhun ; Flink User Mail List Subject: Re: Unit Test for

Re: Flink Table API and not recognizing s3 plugins

2020-09-14 Thread Jingsong Li
Hi Dan, I think Arvid and Dawid are right, as a workaround, you can try making S3Filesystem works in the client. But for a long term solution, we can fix it. I created https://issues.apache.org/jira/browse/FLINK-19228 for tracking this. Best, Jingsong On Mon, Sep 14, 2020 at 3:57 PM Dawid Wysak

Get only the 1st gz file from an s3 folder

2020-09-14 Thread Vijay Balakrishnan
Hi, Able to read *.gz files from an s3 folder. I want to *get the 1st gz file* from the s3 folder and then sort only the 1st gz file into an Ordered Map as below and get the orderedMap.*getFirstKey() as a 1st event timestamp*. I want to then *pass this 1st event timestamp to all TaskManagers along

Re: Struggling with reading the file from s3 as Source

2020-09-14 Thread Vijay Balakrishnan
My problem was the plugins jar needs to be under plugins/s3-fs-hadoop. Running code with Added to flink-conf.yaml: s3.access-key: s3.secret-key: Removed from pom.xml all hadoop dependencies. cd / /bin/start-cluster.sh ./bin/flink runxyz..jar Still struggling with how to get it work with pom.xml

Re: Struggling with reading the file from s3 as Source

2020-09-14 Thread Vijay Balakrishnan
Hi Robert, Thanks for the link. Is there a simple example I can use as a starting template for using S3 with pom.xml ? I copied the flink-s3-fs-hadoop-1.11.1.jar into the plugins/s3-fs-hadoop directory Running from flink-1.11.1/ flink run -cp ../target/monitoring-rules-influx-1.0.jar -jar /Users/v

Re: Flink DynamoDB stream connector losing records

2020-09-14 Thread Ying Xu
Hi Jiawei: Sorry for the delayed reply. When you mention certain records getting skipped, is it from the same run or across different runs. Any more specific details on how/when records are lost? FlinkDynamoDBStreamsConsumer is built on top of FlinkKinesisConsumer , with similar offset manageme

Re: Parallelism of Keyed Process Function

2020-09-14 Thread Arvid Heise
Hi Arti, This is nothing specific to KeyedProcessFunction, but the general way Flink distributes subtasks. The general idea is to use as few task managers as possible such that they are available for cluster downsizing or other concurrent jobs. You can change this behavior through cluster.evenly-

Parallelism of Keyed Process Function

2020-09-14 Thread Arti Pande
Hi, Here is a question related to parallelism of keyed-process-function that is applied to the KeyedStream. For some code that looks like this myStream.keyBy(...) .process(new MyKeyedProcessFunction()) .process().setParallelism(10) On a Flink cluster with 5 TM nodes each with 10 task s

Re: Streaming data to parquet

2020-09-14 Thread Senthil Kumar
Arvid, Jan and Ayush, Thanks for the ideas! -Kumar From: Jan Lukavský Date: Monday, September 14, 2020 at 6:23 AM To: "user@flink.apache.org" Subject: Re: Streaming data to parquet Hi, I'd like to mention another approach, which might not be as "flinkish", but removes the source of issues w

Re: I hit a bad jobmanager address when trying to use Flink SQL Client

2020-09-14 Thread Robert Metzger
Hi Dan, I don't think the SQL Client officially supports running against Kubernetes. What you could try is using an undocumented, untested feature: Put something like jobmanager: kubernetes into the "deployment:" section of the Sql Client configuration

Re: time window and reduce problem

2020-09-14 Thread Arvid Heise
Hi Jiazhi, not sure if I got the question correctly, but the reduce function will be repeatedly applied to all elements in your tumbling window until only one final aggregate per key and window remains. So in your case, you would get the user with the max id per UserBehavior(?) for each 30 secs w

Re: Streaming data to parquet

2020-09-14 Thread Jan Lukavský
Hi, I'd like to mention another approach, which might not be as "flinkish", but removes the source of issues which arise when writing bulk files. The actual cause of issues here is that when creating bulk output, the most efficient option is to have _reversed flow of commit_. That is to say -

Re: I hit a bad jobmanager address when trying to use Flink SQL Client

2020-09-14 Thread Arvid Heise
Hi Dan, Can you verify from the pod that jobmanager and *10.98.253.58:8081 *is actually accessible (e.g., with curl)? I'd probably also try out localhost:8081 as you are connecting to the respective pod directly. On Fri, Sep 11, 2020 at 9:59 PM Dan Hill wrote: > Hi Ro

Re: Flink 1.8.3 GC issues

2020-09-14 Thread Piotr Nowojski
Hi Josson, The TM logs that you attached are only from a 5 minutes time period. Are you sure they are encompassing the period before the potential failure and after the potential failure? It would be also nice if you would provide the logs matching to the charts (like the one you were providing in

Re: Streaming data to parquet

2020-09-14 Thread Arvid Heise
Hi Kumar, for late events, I have seen two approaches: * Initial compaction every day, repeated compaction after two days, and after 1 week. * Using something like delta lake [1], which is a set of specially structured parquet files. Usually you also compact them after some time (e.g. 1 week in y

Re: How to schedule Flink Batch Job periodically or daily

2020-09-14 Thread Arvid Heise
Hi Sunitha, oozie is a valid approach, but I'd recommend to evaluate Airflow first [1]. It's much better maintained and easier to use. Both tools are more used to compose complex workflows though. If you just need a repeated execution, I'd go with cron jobs. For example, you can completely rely o

Re: Flink alert after database lookUp

2020-09-14 Thread Arvid Heise
Hi Sunitha, to listen to changes in your database a change-data-capture approach should be taken [1], which is supported in Flink since 1.11. Basically, a tool like debezium [2] will monitor the changelog of the database and publish the result as a change stream, which can be ingested in Flink as

Re: Unit Test for KeyedProcessFunction with out-of-core state

2020-09-14 Thread Arvid Heise
Hi Alexey, Definition of test levels are always a bit blurry when writing tests for a data processing framework, but I'm convinced that in your case, you should rather think in terms of integration tests than unit tests: * Unit test should really just be about business logic * If it's about specif

info about flinkml

2020-09-14 Thread Cristian Lorenzetto
Hi i m evaluating to adopt flink instead spark for data mining processor. I knew flinkML for this scope but in the last release i cant find it. Why? Can you suggest the best way ? -- Cristian Lorenzetto Direzione ICT e Agenda Digitale U.O. Demand, Progettazione e Sviluppo Software Tel: 041 2792

Re: Flink Stateful Functions API

2020-09-14 Thread Tzu-Li (Gordon) Tai
Hi! Dawid is right, there currently is no developer documentation for the remote request-reply protocol. One reason for this is that the protocol isn't considered a fully stable user-facing interface yet, and thus not yet properly advertised in the documentation. However, there are plans to revisi

Re: Flink Table API and not recognizing s3 plugins

2020-09-14 Thread Dawid Wysakowicz
Hi Dan, As far as I checked in the code, the FileSystemSink will try to create staging directories from the client. I think it might be problematic, as your case shows. We might need to revisit that part. I am cc'ing Jingsong who worked on the FileSystemSink. As a workaround you might try putting

Re: Flink Stateful Functions API

2020-09-14 Thread Dawid Wysakowicz
Hi, Not sure if there is a "developer" documentation for the protocol. I am cc'ing Igal and Gordon who know better than I if there is one. To give you some hints though. If I am correct the Python API is implemented as a so called remote functions [1][2], which communicate with Flink via HTTP/gRP