semistone commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-215965
@lhotari
I tested
Pulsar 3.2.3 + bookkeeper 4.16.6-SNAPSHOT.jar + netty buffer
4.1.111.Final-SNAPSHOT
and I think this issue has been fixed.
I will test
semistone commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2136866092
@lhotari
I tested on a standalone server.
Here is my test report:
https://github.com/semistone/personal_notes/blob/main/pulsar_issue_22601/20240528Test.md
I will
lhotari commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2136253977
I have made a fix to bookkeeper which fixes the issue:
https://github.com/apache/bookkeeper/pull/4404 .
I tested with Pulsar 3.2.3 with bookkeeper.version set to
lhotari commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2134759227
RangeCache race condition fix: https://github.com/apache/pulsar/pull/22789
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to
lhotari commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2134495454
> Thanks for pointing out that config issue.
> I tested and verified that it worked.
@semistone It seems that V2 should be supported also when using TLS. The
downside of using V3 is
semistone commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2134203674
@lhotari
Thanks for pointing out that config issue.
I tested and verified that it worked.
Also, about our release history:
actually we have noticed that issue
lhotari commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2133851254
It looks like there's a feature that enables V3 protocol using a separate
client when V2 is configured: https://github.com/apache/bookkeeper/pull/2085 .
As mentioned in the
lhotari commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2133843464
Previous comment about bookkeeperUseV2WireProtocol
https://github.com/apache/pulsar/issues/21421#issuecomment-1818774973
lhotari commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2133840855
I found https://github.com/apache/bookkeeper/issues/2071 and
https://github.com/apache/bookkeeper/pull/2085 which seem to indicate that
running with
lhotari commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2133830552
@semistone Your problem is caused by invalid TLS configuration. When TLS is
enabled between Broker and Bookkeeper, you must set
`bookkeeperUseV2WireProtocol=false` in
lhotari commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2133426757
I can confirm that this issue reproduces with
https://github.com/lhotari/pulsar-playground/tree/master/issues/issue22601/standalone_env
/
lhotari commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2133358300
> hope you could reproduce in your local .
@semistone Thanks, very useful. I revisited your repro instructions so that
it's possible to semi-automate the steps. I have the
lhotari commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2133033536
Experiment in [branch lh-issue22601-experiment in my
fork](https://github.com/lhotari/pulsar/commits/lh-issue22601-experiment/) to
build with Bookkeeper 4.16.6-SNAPSHOT and Netty
lhotari commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2132973488
@semistone Some initial comments about the reproduce steps.
Btw. regarding "op.data = data.asReadOnly(); <=== then that error disappear
but I still don't known why." in
semistone commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2132818143
I will try again in branch-4.16
semistone commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2132795726
@lhotari
I have updated the repo with all the standalone server config at that link.
I only applied your BookKeeper PR as a patch on top of tag release-4.16.5
```
[chenyinchin01@cockroach308
semistone commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2132777078
@lhotari
semistone commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2131233568
I wrote up the reproduction steps and some investigation history in
https://github.com/semistone/personal_notes/blob/main/pulsar_issue_22601/Test.md
lhotari commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2127868798
I finally found the root cause in Bookkeeper client. The client had several
buffer leaks since the ByteBufList lifecycle was incorrectly handled and write
promises were dropped and
lhotari commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2127692053
It's likely that this is a Bookkeeper client issue. The PR #22760 might be
fixing a different issue.
There's a fix https://github.com/apache/bookkeeper/pull/4289 which will be
semistone commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2126606215
> > I tested with that patch. I rolled back all libs to version 3.2.2 and replaced
only pulsar-testclient.jar and pulsar-common.jar. I also tried to run my test
in one data center
lhotari commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2126291607
> I tested with that patch. I rolled back all libs to version 3.2.2 and replaced
only pulsar-testclient.jar and pulsar-common.jar. I also tried to run my test
in one data center only.
semistone commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2126072096
I tested with that patch.
I rolled back all libs to version 3.2.2 and replaced only pulsar-testclient.jar
and pulsar-common.jar.
I also tried to run my test in one data center
lhotari commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2125182828
> It seems that changing it to read-only actually changes the behavior... :(
> Please ignore my previous comments.
`.duplicate()` or `.retainedDuplicate()` should be sufficient to replace
lhotari commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2124331850
I have tested #22760 with the repro case [and
scripts](https://github.com/lhotari/pulsar-playground/tree/master/issues/issue22601)
that are based on the modified pulsar-perf and
semistone commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2124240776
> @semistone I have created a PR #22760 to fix the problem. It's currently
in draft state since I'm currently testing the solution to verify that it
mitigates the problem.
lhotari commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2124209699
@semistone I have created a PR #22760 to fix the problem. It's currently in
draft state since I'm currently testing the solution to verify that it
mitigates the problem.
semistone commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2124039443
Thanks for helping check this issue.
I haven't made any progress today :(
I can see that another IO thread is using that unwrapped ByteBuf,
but at least it could
eolivelli commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2123993443
This is a great finding
lhotari commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2123954717
Looking up in Netty issue tracker.
Found these issues that provide a lot of context:
* https://github.com/netty/netty/issues/6184
*
lhotari commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2123898015
I couldn't reproduce with Pulsar Standalone, but I have a way with a local
Microk8s cluster where I could also attach a debugger. With break points in
semistone commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2123891351
My JVM options are:
```
/usr/bin/java -Dlog4j.shutdownHookEnabled=false -cp
/opt/pulsar/hybrid/conf:::/opt/pulsar/hybrid/lib/*:
-Dlog4j.configurationFile=log4j2.yaml
lhotari commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2122988851
This `Got exception java.lang.IllegalArgumentException: newPosition > limit:
(2094 > 88)` issue also reproduces with Pulsar 3.2.3:
```
[pulsar-testenv-deployment-broker-0]
lhotari commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2122801177
I also got this type of exception:
```
[pulsar-testenv-deployment-broker-2] 2024-05-21T13:31:48,622+
[pulsar-io-3-10] WARN org.apache.pulsar.broker.service.ServerCnx -
lhotari commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2122642708
I was able to reproduce a few issues with the test setup.
Logs are at https://gist.github.com/lhotari/8302131cde5a0f0999e39f8fbd391f09 .
```
lhotari commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2122163905
> I will debug who touch that object tomorrow.
It's also possible that nothing touches it, but it's due to a multithreading
issue. One useful experiment would be to make the
lhotari commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2122157125
Which JVM args do you run with? For example, heap size etc.
semistone commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2122149152
```
java --version
java 17.0.10 2024-01-16 LTS
Java(TM) SE Runtime Environment (build 17.0.10+11-LTS-240)
Java HotSpot(TM) 64-Bit Server VM (build 17.0.10+11-LTS-240,
lhotari commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2122133892
> and it happens only when a consumer is running and at higher QPS (on my
server it happens at about 1000 QPS)
> and -s 2000 (payload 2K)
consumer
```
semistone commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2122073555
> Do you happen to run with debug logging level when the issue reproduces?
(just wondering if debug logging code like
>
>
semistone commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2122071018
> > -bp 2 to reproduce it.
>
> I was using `-bp 5` before, updated that to `-bp 2`.
lhotari commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2122067529
Do you happen to run with debug logging level when the issue reproduces?
(just wondering if debug logging code like
lhotari commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2122046569
> -bp 2 to reproduce it.
I was using `-bp 5` before, updated that to `-bp 2`.
https://github.com/lhotari/pulsar-playground/commit/63035e9c4ebf656efe12bfcea859743e8ffb8a8c
semistone commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2122027036
> I've been trying to reproduce the issue with local microk8s cluster by
deploying Pulsar with Apache Pulsar Helm chart using this values file:
semistone commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2122009334
I have 6 bookkeepers in 3 different data centers, and I left only one broker
running for debugging.
lhotari commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2121997245
How many brokers and bookies do you have in the cluster where it reproduces?
lhotari commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2121995724
I've been trying to reproduce the issue with local microk8s cluster by
deploying Pulsar with Apache Pulsar Helm chart using this values file:
semistone commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2121994296
> data.duplicate();
I tested it and it also seems to work.
I will repeat the success/failure test later to confirm it again.
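For illustration only, and not code from the thread: the sketch below uses the JDK's `java.nio.ByteBuffer` as a stand-in for Netty's `ByteBuf` (an assumption made so the example runs without Netty on the classpath). Both types document `duplicate()` as returning a view that shares the underlying bytes but keeps its own indexes, which may be why a `data.duplicate()` view is not disturbed when another holder of the buffer moves its position. The class name `DuplicateIndexDemo` and the helper `consumeFromDuplicate` are hypothetical.

```java
import java.nio.ByteBuffer;

public class DuplicateIndexDemo {

    /**
     * Hypothetical helper: simulates a second reader that consumes n bytes
     * through its own duplicate view and returns that view's new position.
     */
    public static int consumeFromDuplicate(ByteBuffer shared, int n) {
        // duplicate() shares the bytes but has an independent position and limit
        ByteBuffer view = shared.duplicate();
        view.position(view.position() + n);
        return view.position();
    }

    public static void main(String[] args) {
        ByteBuffer data = ByteBuffer.wrap(new byte[] {10, 20, 30, 40, 50, 60});

        // The duplicate advances; the original's position stays where it was.
        int viewPosition = consumeFromDuplicate(data, 4);
        System.out.println("view position     = " + viewPosition);
        System.out.println("original position = " + data.position());

        // The content is still shared: an absolute write through a duplicate
        // is visible when reading the original.
        data.duplicate().put(0, (byte) 99);
        System.out.println("original byte 0   = " + data.get(0));
    }
}
```

Note that a duplicate only isolates the indexes; it does not protect the shared bytes from being overwritten or released by another owner.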
lhotari commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2121976803
> then that issue seems to disappear, but I'm not sure whether there is any side effect or
not.
> and I don't know who could touch that bytebuf.
> It has OpAddEntry.getData and
semistone commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2121966346
I also debugged in
PulsarDecoder.channelRead:
I printed the bytebuf object id and compared it with the bytebuf in OpAddEntry.
I don't see the same bytebuf object being reused during
semistone commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2121902080
> acknowledgmentAtBatchIndexLevelEnabled
Yes, I enabled it.
After debugging, I found that
if I add
```
op.data = data.asReadOnly(); << make it read only
lhotari commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2121816254
@semistone would it be possible to share your broker.conf customizations?
That could help reproduce the issue. I noticed `--batch-index-ack` in the
pulsar-perf command line. I
lhotari commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2119233183
> the error happens when messageKeyGenerationMode=random; without
messageKeyGenerationMode, the error disappears
This is a useful detail. When messageKeyGenerationMode is random
semistone commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2118793218
> > unfortunately I can't reproduce in docker, I guess docker standalone is
different from my pulsar cluster.
> > my pulsar cluster is
> > almost default config but with
lhotari commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2118789154
> unfortunately I can't reproduce in docker, I guess docker standalone is
different from my pulsar cluster.
> my pulsar cluster is
> almost default config but with TLS auth
semistone commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2117009710
I also tested again.
If the published payload is always 20K, it doesn't happen.
It only happens when the normal payload is 2K but some data is bigger than 16K (sounds like the
netty receive buffer size, but I also
semistone commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2116971327
@lhotari
I am checking when that byteBuf ends up with wrong data.
In
OpAddEntry.java
I verify the data when this object is constructed and save the original data,
and during
lhotari commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2102186905
> > @semistone since you have some way to reproduce this in your own tests,
would you be able to test if this can be reproduced with
semistone commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2102175699
> @semistone since you have some way to reproduce this in your own tests,
would you be able to test if this can be reproduced with
semistone commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2101968443
Hi @lhotari
I updated the perf tool in
https://github.com/semistone/pulsar/tree/debug_ssues_22601
It includes only one commit, which modifies PerformanceProducer.java to
semistone commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2101947995
Hi @lhotari
semistone commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2100631193
I can almost reproduce it with the perf tool
when very few payloads are > 30K bytes and the others are 3K bytes.
Then
the error happens when messageKeyGenerationMode=random
if without
lhotari commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2100050623
@semistone since you have some way to reproduce this in your own tests,
would you be able to test if this can be reproduced with
lhotari commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2099829370
> I tried to upgrade to bookkeeper 4.17.0
> but still have the same issue :(
@semistone Thanks for testing this.
semistone commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2099819649
I tried to upgrade to bookkeeper 4.17.0
but still have the same issue :(
semistone commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2089814280
We did many tests.
The current broker setting is
```
maxMessageSize=5242880
and producer setting (small batch message and big max bytes)
```
semistone commented on issue #22601:
URL: https://github.com/apache/pulsar/issues/22601#issuecomment-2088011747
We are still trying to compare the differences between our producer and the perf
tool.
We will give feedback later once we have a conclusion.