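Is it still reporting the same error? Then there is probably a bug in the code that switches reader implementations: it should be using
ProtobufWALStreamReader and should no longer be using the TailingReader.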
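
To make the expected behaviour concrete, here is a minimal sketch of the selection rule as it is described in this thread: only the last file in the queue is tailed, and EOF on any earlier file is skipped. This is not HBase's actual code; the class and method names (ReaderChoiceSketch, useTailingReader, skipOnEof) and the file names are hypothetical, and the real logic in WALEntryStream may differ.

```java
import java.util.List;

public class ReaderChoiceSketch {

  // Assumption from the thread: only the last WAL in the queue, which may
  // still be written to, is read with the tailing reader.
  public static boolean useTailingReader(int index, int queueSize) {
    return index == queueSize - 1;
  }

  // Assumption from the thread: once a newer WAL exists, EOF on an older
  // file is skipped; EOF on the last file is retried instead.
  public static boolean skipOnEof(int index, int queueSize) {
    return index < queueSize - 1;
  }

  public static void main(String[] args) {
    // Hypothetical queue contents, oldest first.
    List<String> queue = List.of("wal.1", "wal.2", "wal.3");
    for (int i = 0; i < queue.size(); i++) {
      System.out.println(queue.get(i)
          + " tailing=" + useTailingReader(i, queue.size())
          + " skipOnEof=" + skipOnEof(i, queue.size()));
    }
  }
}
```

Under this rule, a queue of 800+ files that is still being read with the tailing reader on a file other than the last one would point at the switching logic rather than at the file itself.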

sudo rm -rf /* <2326130...@qq.com.invalid> wrote on Mon, Jul 22, 2024 at 22:06:
>
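> I'll try an attachment, or send the screenshot again tomorrow; I'm at home and can't log in to Gmail. It is not the last file; the replication queue has already backed up to more than 800.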
>
>
> ------------------ Original Message ------------------
> From: "user-zh" <palomino...@gmail.com>;
> Sent: Monday, July 22, 2024, 9:56 PM
> To: "user-zh"<user-zh@hbase.apache.org>;
> Subject: Re: hbase2.6.0 replicationSource WALReader exception while reading WAL
>
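> The screenshot seems to be broken; I can't see it.
>
> If it is still reading with the tailing reader, that means this is the last file, and the tailing reader does not skip empty files.
>
> If there is already a newer WAL file, it should no longer be reading with the tailing reader; in that case, if it hits EOF, there is logic to skip the file directly.
>
> Is the file the tailing reader keeps reading really the last file? Or is it in fact no longer the last file, yet still being read with the tailing reader?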
>
> sudo rm -rf /* <2326130...@qq.com.invalid> wrote on Mon, Jul 22, 2024 at 21:41:
>
> > Hi Zhang,
> >    Thank you for your reply. Replication is stuck. I picked one RS node; its replication status is shown in the screenshot below:
> >
> >
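> > The first file in the screenshot has the form: hdfs://coreHBaseProdHa/hbase/WALs/sh2-int-hbase-main-ha-2,16020,1720603345541/sh2-int-hbase-main-ha-2%2C16020%2C1720603345541.1720606991648
> > The first file no longer exists.
> > The second and third files point into the oldWALs directory and do exist. Reading them with hbase wal -p <file> fails with the following error: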
> > Writer Classes: ProtobufLogWriter AsyncProtobufLogWriter
> > SecureProtobufLogWriter SecureAsyncProtobufLogWriter
> > Cell Codec Class: org.apache.hadoop.hbase.regionserver.wal.WALCellCodec
> > Exception in thread "main" java.io.EOFException: EOF while reading message
> > size
> >         at
> > org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.parseDelimitedFrom(ProtobufUtil.java:3727)
> >         at
> > org.apache.hadoop.hbase.regionserver.wal.ProtobufWALStreamReader.next(ProtobufWALStreamReader.java:56)
> >         at
> > org.apache.hadoop.hbase.wal.WALStreamReader.next(WALStreamReader.java:42)
> >         at
> > org.apache.hadoop.hbase.wal.WALPrettyPrinter.processFile(WALPrettyPrinter.java:297)
> >         at
> > org.apache.hadoop.hbase.wal.WALPrettyPrinter.run(WALPrettyPrinter.java:516)
> >         at
> > org.apache.hadoop.hbase.wal.WALPrettyPrinter.main(WALPrettyPrinter.java:429)
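> > This looks like the error you get when reading an empty file.
> > For other, normal WAL files, the hbase wal -p command runs fine and can parse the WAL contents. Please take another look when you have time. Thank you very much.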
> >
> >
> > ------------------ Original Message ------------------
> > *From:* "user-zh" <palomino...@gmail.com>;
> > *Sent:* Monday, July 22, 2024, 8:56 PM
> > *To:* "user-zh"<user-zh@hbase.apache.org>;
> > *Subject:* Re: hbase2.6.0 replicationSource WALReader exception while reading WAL
> >
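> > Is replication stuck? The stream reader keeps tailing the file, so if it
> > hits an entry that is only half written, an exception is possible, and it will retry. If replication is not stuck and it can keep reading afterwards, there is no problem.
> >
> > You could also try reading that file with WALPrettyPrinter to see whether it can be read.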
> >
> > leojie <leo...@apache.org> wrote on Mon, Jul 22, 2024 at 18:03:
> > >
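> > > Hi Zhang,
> > >     I have a question. I have recently been testing hbase2.6.0. With replication enabled, the replicationSource
> > > thread currently uses the ProtobufWALStreamReader class (new in 2.6.0) to read and parse WAL files, and it hits the following exception: InvalidProtocolBufferException$InvalidWireTypeException:
> > > Protocol message tag had invalid wire type.
> > > I read the source code but did not fully understand it, since it involves low-level protobuf serialization. Could this be caused by data written through a lower-version hbase-client API (for example hbase2.2.7)?
> > > My environment: hadoop3.3.6, hbase2.6.0
> > > The full exception stack trace is as follows: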
> > > 2024-07-22T17:47:49,130 WARN
> > >
> > [RS_CLAIM_REPLICATION_QUEUE-regionserver/sh2-int-hbase-main-ha-9:16020-0.replicationSource,test_hbase_258-tx1-int-hbase-main-prod-3,16020,1720602522464.replicationSource.wal-reader.tx1-int-hbase-main-prod-3%2C16020%2C1720602522464,test_hbase_258-tx1-int-hbase-main-prod-3,16020,1720602522464]
> > > wal.ProtobufWALStreamReader: Error while reading WALKey,
> > > originalPosition=0, currentPosition=81
> > >
> > org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException$InvalidWireTypeException:
> > > Protocol message tag had invalid wire type.
> > >         at
> > >
> > org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException.invalidWireType(InvalidProtocolBufferException.java:119)
> > > ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
> > >         at
> > >
> > org.apache.hbase.thirdparty.com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:503)
> > > ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
> > >         at
> > >
> > org.apache.hbase.thirdparty.com.google.protobuf.GeneratedMessage$Builder.parseUnknownField(GeneratedMessage.java:770)
> > > ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
> > >         at
> > >
> > org.apache.hadoop.hbase.shaded.protobuf.generated.WALProtos$WALKey$Builder.mergeFrom(WALProtos.java:2829)
> > > ~[hbase-protocol-shaded-2.6.0.jar:2.6.0]
> > >         at
> > >
> > org.apache.hadoop.hbase.shaded.protobuf.generated.WALProtos$WALKey$1.parsePartialFrom(WALProtos.java:4212)
> > > ~[hbase-protocol-shaded-2.6.0.jar:2.6.0]
> > >         at
> > >
> > org.apache.hadoop.hbase.shaded.protobuf.generated.WALProtos$WALKey$1.parsePartialFrom(WALProtos.java:4204)
> > > ~[hbase-protocol-shaded-2.6.0.jar:2.6.0]
> > >         at
> > >
> > org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:192)
> > > ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
> > >         at
> > >
> > org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:209)
> > > ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
> > >         at
> > >
> > org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:214)
> > > ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
> > >         at
> > >
> > org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25)
> > > ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
> > >         at
> > >
> > org.apache.hbase.thirdparty.com.google.protobuf.GeneratedMessage.parseWithIOException(GeneratedMessage.java:321)
> > > ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
> > >         at
> > >
> > org.apache.hadoop.hbase.shaded.protobuf.generated.WALProtos$WALKey.parseFrom(WALProtos.java:2321)
> > > ~[hbase-protocol-shaded-2.6.0.jar:2.6.0]
> > >         at
> > >
> > org.apache.hadoop.hbase.regionserver.wal.ProtobufWALTailingReader.readWALKey(ProtobufWALTailingReader.java:128)
> > > ~[hbase-server-2.6.0.jar:2.6.0]
> > >         at
> > >
> > org.apache.hadoop.hbase.regionserver.wal.ProtobufWALTailingReader.next(ProtobufWALTailingReader.java:257)
> > > ~[hbase-server-2.6.0.jar:2.6.0]
> > >         at
> > >
> > org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.readNextEntryAndRecordReaderPosition(WALEntryStream.java:490)
> > > ~[hbase-server-2.6.0.jar:2.6.0]
> > >         at
> > >
> > org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.lastAttempt(WALEntryStream.java:306)
> > > ~[hbase-server-2.6.0.jar:2.6.0]
> > >         at
> > >
> > org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.tryAdvanceEntry(WALEntryStream.java:388)
> > > ~[hbase-server-2.6.0.jar:2.6.0]
> > >         at
> > >
> > org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.hasNext(WALEntryStream.java:130)
> > > ~[hbase-server-2.6.0.jar:2.6.0]
> > >         at
> > >
> > org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.run(ReplicationSourceWALReader.java:153)
> > > ~[hbase-server-2.6.0.jar:2.6.0]
> > > 2024-07-22T17:48:13,315 WARN  [RS-EventLoopGroup-1-65]
> > > ipc.NettyRpcConnection: Exception encountered while connecting to the
> > > server tx1-int-hbase-main-prod-3:16020
> > > org.apache.hbase.thirdparty.io.netty.channel.ConnectTimeoutException:
> > > connection timed out after 10000 ms: tx1-int-hbase-main-prod-3/
> > > 127.0.0.1:16020
> > >         at
> > >
> > org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$2.run(AbstractEpollChannel.java:615)
> > > ~[hbase-shaded-netty-4.1.7.jar:?]
> > >         at
> > >
> > org.apache.hbase.thirdparty.io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
> > > ~[hbase-shaded-netty-4.1.7.jar:?]
> > >         at
> > >
> > org.apache.hbase.thirdparty.io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:153)
> > > ~[hbase-shaded-netty-4.1.7.jar:?]
> > >         at
> > >
> > org.apache.hbase.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173)
> > > ~[hbase-shaded-netty-4.1.7.jar:?]
> > >         at
> > >
> > org.apache.hbase.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166)
> > > ~[hbase-shaded-netty-4.1.7.jar:?]
> > >         at
> > >
> > org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
> > > ~[hbase-shaded-netty-4.1.7.jar:?]
> > >         at
> > >
> > org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:416)
> > > ~[hbase-shaded-netty-4.1.7.jar:?]
> > >         at
> > >
> > org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
> > > ~[hbase-shaded-netty-4.1.7.jar:?]
> > >         at
> > >
> > org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> > > ~[hbase-shaded-netty-4.1.7.jar:?]
> > >         at
> > >
> > org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> > > ~[hbase-shaded-netty-4.1.7.jar:?]
> > >         at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_202]
> >
>
