Recently, several of our jobs that use RocksDB as the state backend and HDFS as the remote filesystem have been failing frequently. The failures share these characteristics: 1. The jobs had been running stably, then suddenly started failing on a particular day. 2. When this error appears, multiple jobs fail their checkpoints with the same error at the same time.
The error message is as follows:

org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/stream/flink-checkpoints/19523bf083346eb80b409167e9b91b53/chk-43396/cef72b90-8492-4b09-8d1b-384b0ebe5768 could only be replicated to 0 nodes instead of minReplication (=1). There are 8 datanode(s) running and no node(s) are excluded in this operation. at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1723)

Could anyone take a look? Thanks.
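For context on this exception: "could only be replicated to 0 nodes instead of minReplication (=1)" means the NameNode could not pick any DataNode eligible to receive the new block, even though 8 DataNodes are alive. Common causes are DataNodes that have run out of DFS-usable disk space (including space consumed by non-DFS data or reserved via dfs.datanode.du.reserved), DataNodes saturated with concurrent block writers, or the NameNode being in safe mode. A few standard HDFS commands that help narrow this down (a diagnostic sketch only; these require access to the cluster, and the checkpoint path is taken from the error message above):

```shell
# Show overall capacity and per-DataNode "DFS Remaining"; this error often
# coincides with remaining space being (near) zero on every DataNode.
hdfs dfsadmin -report

# Check whether the NameNode is in safe mode (block allocation is refused there).
hdfs dfsadmin -safemode get

# Inspect block health under the Flink checkpoint directory.
hdfs fsck /user/stream/flink-checkpoints -files -blocks -locations
```

If `-report` shows healthy free space on all nodes, the next place to look is the NameNode and DataNode logs around the failure timestamp for the reason each candidate node was rejected.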
