Re: Failed to resume job from checkpoint

2018-12-07 Thread Ben Yan
I have already tested it. [root@node ~]# ll /mnt/yarn/local/usercache/yarn/appcache/application_1544101169829_0038/ total 32 drwxr-xr-x 2 yarn hadoop 4096 Dec 8 02:29 blobStore-273cf1a6-0f98-4c86-801e-5d76fef66a58 drwxr-xr-x 2 yarn hadoop 4096 Dec 8 02:29 blobStore-992562a5-f42f-43f7-90de-a415b

Re: Failed to resume job from checkpoint

2018-12-07 Thread Ben Yan
Thank you for your advice! I will check this out next and share any new progress as it comes. On Sat, Dec 8, 2018 at 12:05 AM, Stefan Richter wrote: > I think then you need to investigate what goes wrong > in RocksDBIncrementalRestoreOperation::restoreInstanceDirectoryFromPath. If > you

Re: Failed to resume job from checkpoint

2018-12-07 Thread Stefan Richter
I think then you need to investigate what goes wrong in RocksDBIncrementalRestoreOperation::restoreInstanceDirectoryFromPath. If you look at the code it lists the files in a directory and tries to hard link them into another directory, and I would only expect to see the mentioned exception if t
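
The "list the files in a directory and hard link them into another directory" pattern described above can be sketched in plain Java. This is only an illustration under stated assumptions, not Flink's restore code: the class `HardLinkSketch`, the helper `linkAll`, and the file name `000001.sst` are hypothetical, and both directories are assumed to live on the same filesystem, which hard links require.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

class HardLinkSketch {
    // Hypothetical helper (not Flink code): hard-links every regular file
    // found in `source` into `target` and returns how many links were created.
    static int linkAll(Path source, Path target) throws IOException {
        Files.createDirectories(target);
        int linked = 0;
        // The iteration is driven by listing the files that currently exist.
        try (DirectoryStream<Path> files = Files.newDirectoryStream(source)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    // createLink(link, existing): the new link comes first.
                    Files.createLink(target.resolve(file.getFileName()), file);
                    linked++;
                }
            }
        }
        return linked;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the temporary download directory (assumption).
        Path source = Files.createTempDirectory("restore-src");
        Files.write(source.resolve("000001.sst"), new byte[] {1, 2, 3});
        // Sibling directory, so both paths share one filesystem.
        Path target = source.resolveSibling(source.getFileName() + "-db");
        System.out.println("linked files: " + linkAll(source, target));
    }
}
```

Running a sketch like this against a copy of the affected directories may help narrow down whether listing or linking is the step that fails.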

Re: Failed to resume job from checkpoint

2018-12-07 Thread Ben Yan
The version of the recovered checkpoint is also 1.7.0. On Fri, Dec 7, 2018 at 11:06 PM, Stefan Richter wrote: > Just to clarify, the checkpoint from which you want to resume in 1.7, was > that taken by 1.6 or by 1.7? So far this is a bit mysterious because it > says FileNotFound, but the whole iteration is dr

[SURVEY] Usage of flink-python and flink-streaming-python

2018-12-07 Thread Till Rohrmann
Dear Flink community, in order to better understand the needs of our users and to plan for the future, I wanted to reach out to you and ask how much you use Flink's Python API, namely flink-python and flink-streaming-python. In order to gather feedback, I would like to ask all Python users to res

[ANNOUNCE] Weekly community update #49

2018-12-07 Thread Till Rohrmann
Dear community, this is the weekly community update thread #49. Please post any news and updates you want to share with the community to this thread. # Flink 1.7.0 has been released The community has released Flink 1.7.0 [1]. # Flink intro slide set Fabian has refined the slide set for an intro

Re: Failed to resume job from checkpoint

2018-12-07 Thread Stefan Richter
Just to clarify, the checkpoint from which you want to resume in 1.7, was that taken by 1.6 or by 1.7? So far this is a bit mysterious because it says FileNotFound, but the whole iteration is driven by listing the existing files. Can you somehow monitor which files and directories are created du

Re: Failed to resume job from checkpoint

2018-12-07 Thread Ben Yan
Hi Stefan, thank you for your explanation. I used Flink 1.6.2 without any problems. I have tested it a few times with version 1.7.0, but every time I resume from the checkpoint, the job throws the exception I showed earlier, which makes the job unrecoverable. And I checked all the l

Re: Failed to resume job from checkpoint

2018-12-07 Thread Stefan Richter
Hi, From what I can see in the log here, it looks like your RocksDB is not recovering from local but from a remote filesystem. This recovery basically works in steps: 1: Create a temporary directory (in your example, this is the dir that ends …/5683a26f-cde2-406d-b4cf-3c6c3976f8ba) and download all

Re: number of files in checkpoint directory grows endlessly

2018-12-07 Thread Andrey Zagrebin
Could you also have a look into other task executors and RocksDB LOG files? How many files are hard linked there after the last checkpoint? Does the total counter match the number of files in the shared directory? It would confirm that the problem is with the timer state compaction and there is no file h

Re: Failed to resume job from checkpoint

2018-12-07 Thread Ben Yan
Thanks. If you need more information from me, please let me know and I will provide it. On Fri, Dec 7, 2018 at 7:31 PM, Piotr Nowojski wrote: > Adding back user mailing list. > > Andrey, could you take a look at this? > > Piotrek > > On 7 Dec 2018, at 12:28, Ben Yan wrote: > > Yes. Previous ve

Re: Flink with Docker: docker-compose and FLINK_JOB_ARGUMENT exception

2018-12-07 Thread Spico Florin
Hello! I have successfully used Flink with Docker, following the approach presented in this article: http://diegoreico.com/environments/runningflinkgclusterwithzeppelin/ It is using an older version of Flink (1.4.2) in order to be compatible with the Zeppelin client, but you can check if it works with new vers

Re: Flink with Docker: docker-compose and FLINK_JOB_ARGUMENT exception

2018-12-07 Thread Jeff Zhang
I didn't use the built-in Docker setup of Flink, but the following Flink Docker image works pretty well for me. https://github.com/big-data-europe/docker-flink On Fri, Dec 7, 2018 at 6:20 PM, Piotr Nowojski wrote: > Hi, > > I have never used flink and docker together, so I’m not sure if I will be > able to help, howeve

Re: Failed to resume job from checkpoint

2018-12-07 Thread Piotr Nowojski
Adding back the user mailing list. Andrey, could you take a look at this? Piotrek > On 7 Dec 2018, at 12:28, Ben Yan wrote: > > Yes. This never happened with previous versions. > > On Fri, Dec 7, 2018 at 7:27 PM, Piotr Nowojski <pi...@data-artisans.com> wrote: > Hey, > > Do you mean that the problem started occ

Re: Failed to resume job from checkpoint

2018-12-07 Thread Piotr Nowojski
Hey, Do you mean that the problem started occurring only after upgrading to Flink 1.7.0? Piotrek > On 7 Dec 2018, at 11:28, Ben Yan wrote: > > Hi. I am using flink-1.7.0. I am using RocksDB and HDFS as the state backend, but > recently I found the following exception when the job resumed from the

Re: Use event time

2018-12-07 Thread Piotr Nowojski
You are welcome :) More or less you are correct. Assigning event time doesn’t reorder anything in the stream; it is just meta information about a record that can be used by various functions/operators, not only by windowed operations. As I answered in “A question on the Flink "rolling" FoldF

Failed to resume job from checkpoint

2018-12-07 Thread Ben Yan
Hi. I am using flink-1.7.0. I am using RocksDB and HDFS as the state backend, but recently I found the following exception when the job resumed from the checkpoint. Task-local state is always considered a secondary copy, the ground truth of the checkpoint state is the primary copy in the distributed sto

Re: Flink with Docker: docker-compose and FLINK_JOB_ARGUMENT exception

2018-12-07 Thread Piotr Nowojski
Hi, I have never used Flink and Docker together, so I’m not sure if I will be able to help. However, have you seen this README: https://github.com/apache/flink/tree/master/flink-container/docker ? Shouldn’t you be passing your arguments via the `FLINK_JOB_ARGUMENTS` environment variable? Piotrek >

RE: Re: Use event time

2018-12-07 Thread min.tan
Many thanks for your email. Does this mean that the event time only affects which events are selected for a time window? Without a time window, does the event time have no impact on the order of any records/events? Is my understanding correct? Thank you very much for your help. Regards

Re: Use event time

2018-12-07 Thread Piotr Nowojski
Hi again! Flink doesn’t order/sort the records according to event time. The prevailing idea is: - records will be arriving out of order, operators should handle that - watermarks are used as indicators of the current lower bound of the event time “clock” For example, windowed joins/aggregation
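
To illustrate the two points above, here is a minimal plain-Java sketch of how an operator could buffer out-of-order records and release them only once a watermark has passed their timestamps. It deliberately uses no Flink API: the class `WatermarkSketch` and its methods `onRecord` and `onWatermark` are hypothetical names chosen for this example, and records are reduced to bare timestamps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class WatermarkSketch {
    // Buffered event timestamps, smallest first.
    private final PriorityQueue<Long> buffer = new PriorityQueue<>();

    // Records may arrive in any order; they are only buffered here.
    void onRecord(long timestamp) {
        buffer.add(timestamp);
    }

    // A watermark is treated as a lower bound of the event-time clock:
    // everything at or below it is released, in timestamp order.
    List<Long> onWatermark(long watermark) {
        List<Long> ready = new ArrayList<>();
        while (!buffer.isEmpty() && buffer.peek() <= watermark) {
            ready.add(buffer.poll());
        }
        return ready;
    }

    public static void main(String[] args) {
        WatermarkSketch op = new WatermarkSketch();
        for (long ts : new long[] {3, 1, 2, 5}) {
            op.onRecord(ts);
        }
        System.out.println(op.onWatermark(3));  // prints [1, 2, 3]
        System.out.println(op.onWatermark(10)); // prints [5]
    }
}
```

The ordering here comes from the buffering operator, not from assigning timestamps, which matches the point that assigning event time alone does not sort the stream.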

Re: A question on the Flink "rolling" FoldFunction

2018-12-07 Thread Piotr Nowojski
Hi Min, Welcome to the Flink community. One small remark: the dev mailing list is for developers of Flink and all of the issues/discussions that arise in the process (discussing how to implement a new feature etc.), so the user mailing list is the right one to ask questions about using Flink

Flink with Docker: docker-compose and FLINK_JOB_ARGUMENT exception

2018-12-07 Thread Marke Builder
Hi, I'm trying to run Flink with Docker (docker-compose) and job arguments "config-dev.properties". But it seems that the job arguments are not available: docker-compose.yml version: '2' services: job-cluster: image: ${FLINK_DOCKER_IMAGE_NAME:-timeseries-v1} ports: - '8081:8081'

Re: runtime.resourcemanager

2018-12-07 Thread Piotr Nowojski
Hi, Please investigate the logs/standard output/error from the task manager that has failed (the logs that you showed are from the job manager). There is probably some obvious error/exception explaining why it has failed. Most common reasons: - out of memory - long GC pause - seg fault or other error fr

Use event time

2018-12-07 Thread min.tan
Hi, I am new to Flink. I have the following small piece of code that uses event time. I did not get the result I expected, i.e. the events printed in event-time order. Did I miss something here? Regards, Min --Event time-- public static void main(String[] args)

A question on the Flink "rolling" FoldFunction

2018-12-07 Thread min.tan
Hi, I am new to Flink. I have a question on this "rolling" fold function. If its parallelism is larger than one, does the "rolling" order remain the same, i.e. does it always keep the "1-2-3-4-5" increasing sequence? Regards, Min ---