Re: Checkpoint error - "The job has failed"

2021-04-28 Thread Dan Hill
1.11.1. > > [1] https://issues.apache.org/jira/browse/FLINK-16753 > > Best > Yun Tang > -- > *From:* Dan Hill > *Sent:* Tuesday, April 27, 2021 7:50 > *To:* Yun Tang > *Cc:* Robert Metzger ; user > *Subject:* Re: Checkpoint error - "The job has

Re: Checkpoint error - "The job has failed"

2021-04-28 Thread Yun Tang
n Tang Cc: Robert Metzger ; user Subject: Re: Checkpoint error - "The job has failed" Hey Yun and Robert, I'm using Flink v1.11.1. Robert, I'll send you a separate email with the logs. On Mon, Apr 26, 2021 at 12:46 AM Yun Tang wrote: Hi Dan, I think y

Re: Checkpoint error - "The job has failed"

2021-04-26 Thread Dan Hill
.3. > > > [1] https://issues.apache.org/jira/browse/FLINK-16753 > > Best > Yun Tang > -- > *From:* Robert Metzger > *Sent:* Monday, April 26, 2021 14:46 > *To:* Dan Hill > *Cc:* user > *Subject:* Re: Checkpoint error - "T

Re: Checkpoint error - "The job has failed"

2021-04-26 Thread Yun Tang
Hill Cc: user Subject: Re: Checkpoint error - "The job has failed" Hi Dan, can you provide me with the JobManager logs to take a look as well? (This will also tell me which Flink version you are using) On Mon, Apr 26, 2021 at 7:20 AM Dan Hill

Re: Checkpoint error - "The job has failed"

2021-04-26 Thread Robert Metzger
Hi Dan, can you provide me with the JobManager logs to take a look as well? (This will also tell me which Flink version you are using) On Mon, Apr 26, 2021 at 7:20 AM Dan Hill wrote: > My Flink job failed to checkpoint with a "The job has failed" error. The > logs contained no other recent

Re: Re: Re: Checkpoint Error

2021-03-10 Thread Till Rohrmann
og? > > Also, have you enabled concurrent checkpoint? > > Best, > Yun > > > --Original Mail -- > *Sender:*Navneeth Krishnan > *Send Date:*Mon Mar 8 13:10:46 2021 > *Recipients:*Yun Gao > *CC:*user > *Subject:*Re: Re: Checkpoint

Re: Re: Re: Checkpoint Error

2021-03-08 Thread Yun Gao
:46 2021 Recipients:Yun Gao CC:user Subject:Re: Re: Checkpoint Error Hi Yun, Thanks for the response. I checked the mounts and only the JM's and TM's are mounted with this EFS. Not sure how to debug this. Thanks On Sun, Mar 7, 2021 at 8:29 PM Yun Gao wrote: Hi Navneeth, It seems from

Re: Re: Checkpoint Error

2021-03-07 Thread Navneeth Krishnan
Krishnan > *Send Date:*Sun Mar 7 15:44:59 2021 > *Recipients:*user > *Subject:*Re: Checkpoint Error > >> Hi All, >> >> Any suggestions? >> >> Thanks >> >> On Mon, Jan 18, 2021 at 7:38 PM Navneeth Krishnan < >> reachnavnee...@gma

Re: Re: Checkpoint Error

2021-03-07 Thread Yun Gao
Hi Navneeth, From the stack trace, it seems the exception is caused by a problem with the underlying EFS. Have you checked whether any errors were reported for EFS, or whether the same EFS might be mounted more than once and someone else deleted the directory? Best, Yun

Re: Checkpoint Error

2021-03-06 Thread Navneeth Krishnan
Hi All, Any suggestions? Thanks On Mon, Jan 18, 2021 at 7:38 PM Navneeth Krishnan wrote: > Hi All, > > We are running our streaming job on flink 1.7.2 and we are noticing the > below error. Not sure what's causing it, any pointers would help. We have > 10 TM's checkpointing to AWS EFS. > >

Re: Checkpoint Error Because "Could not find any valid local directory for s3ablock-0001"

2020-04-24 Thread Robert Metzger
Thanks for opening the ticket. I've asked a committer who knows the streaming sink well to take a look at the ticket. On Fri, Apr 24, 2020 at 6:47 AM Lu Niu wrote: > Hi, Robert > > BTW, I did some field study and I think it's possible to support streaming > sink using presto s3 filesystem. I

Re: Checkpoint Error Because "Could not find any valid local directory for s3ablock-0001"

2020-04-23 Thread Lu Niu
Hi, Robert BTW, I did some field study and I think it's possible to support the streaming sink using the presto s3 filesystem. I think that would let users use the presto s3 fs for all access to S3. I created this Jira ticket: https://issues.apache.org/jira/browse/FLINK-17364 . What do you think? Best Lu

Re: Checkpoint Error Because "Could not find any valid local directory for s3ablock-0001"

2020-04-21 Thread Lu Niu
Cool, thanks! On Tue, Apr 21, 2020 at 4:51 AM Robert Metzger wrote: > I'm not aware of anything. I think the presto s3 file system is generally > the recommended S3 FS implementation. > > On Mon, Apr 13, 2020 at 11:46 PM Lu Niu wrote: > >> Thank you both. Given the debug overhead, I might just

Re: Checkpoint Error Because "Could not find any valid local directory for s3ablock-0001"

2020-04-21 Thread Robert Metzger
I'm not aware of anything. I think the presto s3 file system is generally the recommended S3 FS implementation. On Mon, Apr 13, 2020 at 11:46 PM Lu Niu wrote: > Thank you both. Given the debug overhead, I might just try out presto s3 > file system then. Besides that presto s3 file system

Re: Checkpoint Error Because "Could not find any valid local directory for s3ablock-0001"

2020-04-13 Thread Lu Niu
Thank you both. Given the debug overhead, I might just try out the presto s3 file system then. Besides the fact that the presto s3 file system doesn't support the streaming sink, is there anything else I need to keep in mind? Thanks! Best Lu On Thu, Apr 9, 2020 at 12:29 AM Robert Metzger wrote: > Hey, > Others

Re: Checkpoint Error Because "Could not find any valid local directory for s3ablock-0001"

2020-04-09 Thread Robert Metzger
Hey, Others have experienced this as well, yes: https://lists.apache.org/thread.html/5cfb48b36e2aa2b91b2102398ddf561877c28fdbabfdb59313965f0a%40%3Cuser.flink.apache.org%3E DiskErrorException I have also notified the Hadoop project about this issue: https://issues.apache.org/jira/browse/HADOOP-15915

Re: Checkpoint Error Because "Could not find any valid local directory for s3ablock-0001"

2020-04-08 Thread Congxian Qiu
Hi Lu, I'm not familiar with the S3 file system. Maybe others in the Flink community can help you in this case, or you could reach out to the S3 teams/community for help. Best, Congxian On Wed, Apr 8, 2020 at 11:05 AM Lu Niu wrote: > Hi, Congxian > > Thanks for replying. Yeah, I also found those references.

Re: Checkpoint Error Because "Could not find any valid local directory for s3ablock-0001"

2020-04-07 Thread Lu Niu
Hi, Congxian Thanks for replying. Yeah, I also found those references. However, as I mentioned in the original post, there is enough capacity on all disks. Also, when I switch to the presto file system, the problem goes away. I'm wondering whether others have encountered a similar issue. Best Lu On Tue, Apr 7, 2020

Re: Checkpoint Error Because "Could not find any valid local directory for s3ablock-0001"

2020-04-07 Thread Congxian Qiu
Hi, From the stack, it seems the problem is "org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for s3ablock-0001-". I googled the exception and found some related pages [1]. Could you please make sure

Re: Checkpoint Error in flink with Rockdb state backend

2016-05-29 Thread Aljoscha Krettek
Ah yes, if you used a local filesystem for backups this certainly was the source of the problem. On Sun, 29 May 2016 at 17:57 arpit srivastava wrote: > I think the problem was that I was using a local filesystem in a cluster. > Now I have switched to HDFS. > > Thanks, > Arpit

Re: Checkpoint Error in flink with Rockdb state backend

2016-05-29 Thread arpit srivastava
I think the problem was that I was using a local filesystem in a cluster. Now I have switched to HDFS. Thanks, Arpit On Sun, May 29, 2016 at 12:57 PM, Aljoscha Krettek wrote: > Hi, > could you please provide the code of your user function that has the > Checkpointed

Re: Checkpoint Error in flink with Rockdb state backend

2016-05-29 Thread Aljoscha Krettek
Hi, could you please provide the code of your user function that has the Checkpointed interface and is keeping state? This might give people a chance of understanding what is going on. Cheers, Aljoscha On Sat, 28 May 2016 at 20:55 arpit srivastava wrote: > Hi, > > I am