2019-03-15 09:52:51 UTC - Kev Jackson: @Sijie Guo that sounds like a plan -
start in BK and add Consul support in the same vein as etcd; then, if that
looks like it will work, move up the stack
----
2019-03-15 10:19:35 UTC - jia zhai: @Sanjeev Kulkarni Are there any log lines
from class `org.apache.bookkeeper.stream.server.StorageServer` that also show
up in bookie1.log?
----
2019-03-15 10:29:40 UTC - Shivji Kumar Jha: When I run a test, it fails with
this exception
Caused by: com.github.dockerjava.api.exception.NotFoundException:
{"message":"pull access denied for apachepulsar/pulsar-test-latest-version,
repository does not exist or may require 'docker login'"}
----
2019-03-15 10:30:32 UTC - Sijie Guo: pulsar doesn’t publish
pulsar-test-latest-version image.
you can run `mvn clean install -DskipTests -Pdocker` to produce a test image
locally.
----
2019-03-15 10:31:23 UTC - Shivji Kumar Jha: ack, thanks!
----
2019-03-15 10:33:12 UTC - Shivji Kumar Jha: I have a patch but I need a quick
way to start the server from source and run my test. Ideas on quick hacks to
start broker server from source?
----
2019-03-15 10:41:22 UTC - Alexandre DUVAL: Do not hesitate to ask if I can
help. @Sijie Guo this patch doesn't add the header :p
----
2019-03-15 10:47:23 UTC - Shivji Kumar Jha: @Sijie Guo
----
2019-03-15 11:04:27 UTC - Sijie Guo: are you testing a change on client side or
server side?
----
2019-03-15 11:05:13 UTC - Sijie Guo: @Alexandre DUVAL :slightly_smiling_face:
----
2019-03-15 11:05:24 UTC - Sijie Guo: is your admin talking to a broker or a
proxy?
----
2019-03-15 11:05:50 UTC - Alexandre DUVAL: I tried directly to one broker
----
2019-03-15 11:06:06 UTC - Alexandre DUVAL: Should I go through proxy node?
----
2019-03-15 11:17:23 UTC - Shivji Kumar Jha: broker side -
SchemaUpdateStrategyTest
----
2019-03-15 11:18:44 UTC - Shivji Kumar Jha: I am introducing an option to
disable the schema update compatibility check on the broker side. The test for
it will be included in this test file, and then I have to run this test.
----
2019-03-15 11:22:45 UTC - Shivji Kumar Jha: @Sijie Guo
----
2019-03-15 11:53:05 UTC - Sijie Guo: oh i see. if you are reusing
SchemaUpdateStrategyTest, unfortunately you have to generate a test image
locally :-)
----
2019-03-15 11:54:16 UTC - Sijie Guo: oh you can write a separate test under
pulsar-broker module, so you can use some mocking classes
----
2019-03-15 12:26:11 UTC - Darragh: all our commands were run with -b 0 from the
start
----
2019-03-15 12:37:41 UTC - Alexandre DUVAL: Same issue.
----
2019-03-15 12:57:48 UTC - Darragh: although I've managed to get rid of most
tail latencies, there are still some spikes going to 200+ms
----
2019-03-15 15:23:33 UTC - Matteo Merli: how frequent are the spikes? What
percentile do these affect?
----
2019-03-15 15:24:51 UTC - Matteo Merli: Typically there are 2 sources of
latency spikes:
1. Disk writes stalling for 100 ms or so (this happens on SSDs with or without
fsyncs)
2. JVM GC pauses
----
2019-03-15 15:25:16 UTC - Matteo Merli: for 1. doing w=3 a=2 will be able to
smooth the latency
----
2019-03-15 15:26:13 UTC - Matteo Merli: for 2. same as above w=3 a=2 will
smooth the latency for Bookies GC pauses, although broker (and client) pauses
will still be there
----
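A rough way to see why w=3 a=2 smooths bookie-side stalls: with a write quorum
of 3 and an ack quorum of 2, an entry is acknowledged once the two fastest of
the three bookies respond, so a single stalled bookie does not show up in the
publish latency. The sketch below only models that idea; the function name and
the latency numbers are made up for illustration and are not BookKeeper code.

```python
def add_entry_latency(bookie_latencies_ms, ack_quorum):
    """Latency until `ack_quorum` bookies have acknowledged the write.

    Illustrative model: the entry is sent to every bookie in the write
    quorum in parallel and completes on the ack_quorum-th fastest reply.
    """
    if not 1 <= ack_quorum <= len(bookie_latencies_ms):
        raise ValueError("ack quorum must be between 1 and the write quorum")
    return sorted(bookie_latencies_ms)[ack_quorum - 1]


# Hypothetical numbers: one bookie stalls for 120 ms, the others are fast.
latencies = [2.0, 120.0, 3.0]

print(add_entry_latency(latencies, ack_quorum=2))  # waits for the 2 fastest
print(add_entry_latency(latencies, ack_quorum=3))  # waits for the stalled one
```

As noted in the message above, this only helps with stalls on the bookies;
pauses in the broker or the client are outside this model.
----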
2019-03-15 15:31:49 UTC - Darragh: it's for the 99th percentile range
----
2019-03-15 15:31:54 UTC - Darragh: 50 etc is fine
----
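A small sketch of why rare spikes can leave the median untouched while still
dominating the 99th percentile. The latency samples are invented for
illustration, and the nearest-rank definition used here is only one of several
common percentile definitions; the benchmark tool may compute its percentiles
differently.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list, with 0 < pct <= 100."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical publish latencies in ms: 980 fast samples and 20 spikes.
latencies_ms = [5] * 980 + [250] * 20

print("p50:", percentile(latencies_ms, 50))
print("p99:", percentile(latencies_ms, 99))
```
----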
2019-03-15 15:32:27 UTC - Matteo Merli: (for improving GC pauses you should
consider using Shenandoah or ZGC in Java 11)
----
2019-03-15 15:32:44 UTC - Darragh: hm ok currently we've been using java8
----
2019-03-15 15:33:11 UTC - Matteo Merli: if you’re on RHEL/CentOS that would
come with Shenandoah
----
2019-03-15 15:33:30 UTC - Darragh: we're using amazon linux 2
----
2019-03-15 15:33:35 UTC - Darragh: so it's rhel based iirc
----
2019-03-15 15:33:56 UTC - Matteo Merli: Yes, I think it does have it by default
----
2019-03-15 15:34:13 UTC - Darragh: ok I'll try that then
----
2019-03-15 15:34:22 UTC - Darragh: we already are using w=3 a=2 as the default
----
2019-03-15 15:34:29 UTC - Darragh: in our broker conf
----
2019-03-15 15:34:45 UTC - Matteo Merli: Ok, can you then correlate the latency
spikes with the GC pauses?
----
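One possible way to do this correlation offline, assuming the spike timestamps
and the GC pause intervals can be exported from the dashboards. The function
name, the slack window, and the sample data are all hypothetical; this is a
sketch of the idea, not a feature of Pulsar or Grafana.

```python
def spikes_during_gc(spike_times_s, gc_pauses, slack_s=0.5):
    """Split spike timestamps into those near a GC pause and the rest.

    gc_pauses is a list of (start_s, duration_s) tuples. A spike counts
    as GC-related if it falls inside a pause, widened by slack_s on both
    sides to allow for clock and scrape-interval skew.
    """
    matched, unmatched = [], []
    for t in spike_times_s:
        hit = any(start - slack_s <= t <= start + dur + slack_s
                  for start, dur in gc_pauses)
        (matched if hit else unmatched).append(t)
    return matched, unmatched


# Hypothetical data: seconds since the start of the test run.
spikes = [12.1, 100.3, 187.0, 200.4]
gc = [(100.0, 0.35), (200.0, 0.40)]

matched, unmatched = spikes_during_gc(spikes, gc)
print("GC-related:", matched)
print("unexplained:", unmatched)
```

Spikes that end up in the second list would point at the other source named
earlier (disk write stalls) or at pauses in a different process.
----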
2019-03-15 15:35:03 UTC - Darragh: I'll have to recheck with grafana
----
2019-03-15 15:35:44 UTC - Darragh: and these would be gc pauses on the bookies
right ?
----
2019-03-15 15:38:21 UTC - Darragh: I don't really see any spike showing up in
the GC pauses graph
----
2019-03-15 15:41:04 UTC - Darragh:
----
2019-03-15 15:41:49 UTC - Darragh: this is with -r 10000 -b 0
----
2019-03-15 15:42:06 UTC - Darragh: I'll try with java11 next week I guess
----
2019-03-15 15:44:03 UTC - Darragh: broker seems to have had some gc pause spikes
----
2019-03-15 15:44:51 UTC - Darragh: just 2 though and I've seen more latency
spikes
----
2019-03-15 15:48:05 UTC - Darragh: yeah I see a pattern in the broker's GC
spikes: about every ~1.40 minutes, up to 300/400 ms
----
2019-03-15 15:48:16 UTC - Shivji Kumar Jha: Hi, I am running a test
(SchemaUpdateStrategyTest) in debug mode and while the server is running I wish
to check something using the REST APIs. Let's say
curl -X GET <http://localhost:32783/admin/v2/brokers/:cluster>
I can't get the curl working... wrong broker URL? Please help!
----
2019-03-15 15:48:41 UTC - Shivji Kumar Jha: I see the test starts pulsar using
docker, here is my docker ps
----
2019-03-15 15:48:47 UTC - Shivji Kumar Jha:
----
2019-03-15 15:49:46 UTC - Shivji Kumar Jha: The docker thing worked well
actually, thank you very much :slightly_smiling_face:
----
2019-03-15 15:55:50 UTC - Alexandre DUVAL: @Matteo Merli did you have time to
work on it? Can I help you?
----
2019-03-15 15:56:29 UTC - Matteo Merli: Started but don’t have solution yet.
Trying to get this completed today
----
2019-03-15 15:56:59 UTC - Alexandre DUVAL: Cool, do not hesitate to ping me for
anything or when it's done :stuck_out_tongue:.
----
2019-03-15 15:57:00 UTC - Alexandre DUVAL: Thanks
----
2019-03-15 15:58:10 UTC - Matteo Merli: Yes, publishing without batching is
more GC-intensive. If you use a 1 ms batching time, it would reduce that
since the broker will only deal in “batches”.
Another option is to increase the JVM heap size to make the pauses less frequent
----
2019-03-15 15:59:44 UTC - Sanjeev Kulkarni: Hey @jia zhai i resolved the issue.
it had to do with specifying the right quorum
+1 : jia zhai
----
2019-03-15 16:10:08 UTC - Maarten Tielemans: Thanks for the feedback @Matteo
Merli We'll look into Java11, prob Monday. We will also do some testing with
smaller msg/sec rate and bigger msg size. Hopefully those changes will resolve
the last spikes
----
2019-03-15 16:28:48 UTC - Matteo Merli: I’d say the easiest fix might be to
enable batching with 1ms group time
----
2019-03-15 18:18:40 UTC - Joe Francis: :+1: A few points to note though -->
Pulsar use of ZK is very different from Kafka. Pulsar does not allow clients to
access ZK. --> It would also be good to split the metadata storage and
cluster management functions so that they can use separate services.
----
2019-03-15 18:48:07 UTC - Ali Ahmed: <http://localhost:32783/> is the bookie
URL; you need to connect to localhost:32788
----
2019-03-15 19:49:33 UTC - JAYARAM NAGARAJAN: Hello:
Facing a couple of issues with S3 offloading, as follows:
Setup broker.conf:
managedLedgerOffloadDriver=aws-s3
s3ManagedLedgerOffloadBucket=pulsar-topic-offload=<ourBucket>/temp/pulsar
s3ManagedLedgerOffloadRegion=us-east-1
1) Auto offload issue
Set 1M as the size threshold:
bin/pulsar-admin namespaces set-offload-threshold --size 1M nest/cicd
Sent a file larger than 1 MB to the topic (went fine):
bin/pulsar-client produce <persistent://nest/cicd/coaf_sql_revup> -f licenseNew
Tried checking the offload status, but the offload had not run:
bin/pulsar-admin topics offload-status nest/cicd/coaf_sql_revup
Offload has not been run for <persistent://nest/cicd/coaf_sql_revup> since broker startup
Repeated the process many times so that the size grew to more than 15 MB, still
no luck....
2) Manual offload issue
bin/pulsar-admin topics offload --size-threshold 1M nest/cicd/coaf_sql_revup
Offload triggered for <persistent://nest/cicd/coaf_sql_revup> for messages before 65:0:-1
[root@ip-10-207-192-140 apache-pulsar-2.3.0]# bin/pulsar-admin topics offload-status nest/cicd/coaf_sql_revup
Offload was a success
Although I get success here, we do not see anything in the S3 path
s3://<ourBucket>/test/pulsar. We expected some offloaded file to show up.
Please let me know what needs to be done....
----
2019-03-15 20:02:49 UTC - David Kjerrumgaard: @JAYARAM NAGARAJAN How are you
providing your AWS credentials to Pulsar?
----
2019-03-15 20:02:52 UTC - David Kjerrumgaard: "To be able to access AWS S3, you
need to authenticate with AWS S3. Pulsar does not provide any direct means of
configuring authentication for AWS S3, but relies on the mechanisms supported
by the DefaultAWSCredentialsProviderChain."
----
2019-03-15 20:21:44 UTC - JAYARAM NAGARAJAN: @David Kjerrumgaard Our EC2
instance has an IAM role with full access to my S3 bucket, and since the status
it gave me was "Offload was a success" I am guessing that it got the required
authentication... otherwise I should have seen an error...
----
2019-03-15 21:02:49 UTC - David Kjerrumgaard: @JAYARAM NAGARAJAN Are there any
messages in the log file?
----
2019-03-15 21:04:32 UTC - David Kjerrumgaard: @JAYARAM NAGARAJAN Also, does the
bucket path in S3 already exist, i.e. is there a
`<ourBucket>/test/pulsar` inside S3 currently?
----
2019-03-15 21:10:23 UTC - JAYARAM NAGARAJAN: @David Kjerrumgaard We started
Pulsar in standalone mode, and when we ran the CLI commands above we did not
see any log file. Is there a place where the logs get written?
Also, YES, the S3 path exists; we currently have some sample files in the path
<ourBucket>/test/pulsar
----
2019-03-15 21:11:10 UTC - Ali Ahmed: @JAYARAM NAGARAJAN is this a standalone
container ?
----
2019-03-15 21:13:26 UTC - JAYARAM NAGARAJAN: @Ali Ahmed This is not a
container but a single bare-metal EC2 node, and I am running Pulsar standalone
on it
----
2019-03-15 21:14:56 UTC - Ali Ahmed: if you used the pulsar standalone can you
check the stdout from the process
----
2019-03-15 21:15:25 UTC - Ali Ahmed: also can you try with a root level empty
s3 bucket
----
2019-03-15 21:15:37 UTC - Ali Ahmed: so that we can better isolate the problem
----
2019-03-15 22:30:49 UTC - JAYARAM NAGARAJAN: This is what I am getting in the
logs now. We updated the S3 bucket setting to just the root bucket
<our_bucket> and re-ran ... now offloading does not work
----
2019-03-15 22:34:11 UTC - Ali Ahmed: @JAYARAM NAGARAJAN this makes more sense.
There may still be a permission issue; I can’t tell from the logs and will have
to check the code.
----
2019-03-16 00:25:26 UTC - David Kjerrumgaard: Based on the log output, it looks
like you are using the
`org.apache.bookkeeper.mledger.impl.NullLedgerOffloader`, which only happens if
the offload driver property isn't set
----
2019-03-16 00:34:26 UTC - David Kjerrumgaard: So for some reason, Pulsar is not
seeing the `managedLedgerOffloadDriver` setting in your broker.conf file.
----
2019-03-16 00:37:14 UTC - David Kjerrumgaard: @JAYARAM NAGARAJAN Make sure you
only have one setting for the `managedLedgerOffloadDriver` property in your
broker.conf and restart. Look for the following error message in the log file,
which indicates that Pulsar was unable to find a value for that property: `No
ledger offloader configured, using NULL instance`
----
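A quick way to check for duplicate or empty settings of a key in a
properties-style file such as broker.conf. This is a minimal sketch: the
function name and the sample file contents are hypothetical, and it
deliberately does not model how Pulsar itself resolves repeated keys.

```python
def find_settings(conf_text, key):
    """Return (line_number, value) for every active `key=value` line."""
    hits = []
    for lineno, raw in enumerate(conf_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            hits.append((lineno, value.strip()))
    return hits


# Hypothetical file contents, only to show a duplicate being detected.
sample = """\
# Driver to use to offload old data to long term storage
managedLedgerOffloadDriver=
s3ManagedLedgerOffloadRegion=us-east-1
managedLedgerOffloadDriver=aws-s3
"""

for lineno, value in find_settings(sample, "managedLedgerOffloadDriver"):
    print(f"line {lineno}: managedLedgerOffloadDriver={value!r}")
```

To run it against a real file, pass the file's text instead of `sample`, for
example `open("conf/broker.conf").read()` (the path is an assumption about the
install layout).
----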
2019-03-16 07:16:51 UTC - naga: Guys...how about making this aws managed service
----
2019-03-16 07:17:27 UTC - naga: I can volunteer to manage this project
----
2019-03-16 07:51:29 UTC - Ali Ahmed: @naga the focus is to make Pulsar work
well with Kubernetes, as that seems to be where the cloud providers are
moving
----
2019-03-16 08:06:09 UTC - naga: Yeah... good then...
----
2019-03-16 08:06:22 UTC - naga: Any idea of when this would be available
----
2019-03-16 08:18:05 UTC - Shivji Kumar Jha: Doesn't work either. There is
something that's eluding me since yesterday :thinking_face:
----
2019-03-16 09:06:46 UTC - Shivji Kumar Jha: Though I can go inside docker and
get a response:
root@pulsar-broker-1:/pulsar# curl
<http://pulsar-broker-1:8080/admin/v2/brokers/health>
ok
----