2019-09-05 10:29:13 UTC - Ravi Shah: What is the max size of the producer queue?
I am getting this error: `Producer send queue is full`
----
2019-09-05 10:30:16 UTC - Ravi Shah: I am evaluating Pulsar on a standalone
cluster with 10000K, is that ok?
----
2019-09-05 11:43:31 UTC - borlandor: Cannot use JSONObject in a Pulsar Java Function.
When I use any json jar, a Pulsar Java function that uses JSONObject/JsonObject fails its health check:
```19:31:49.252 [function-timer-thread-90-1] ERROR org.apache.pulsar.functions.runtime.ProcessRuntime - Health check failed for AddWindowFunction-0
java.util.concurrent.ExecutionException: io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
	at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) ~[?:1.8.0_222]
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) ~[?:1.8.0_222]
	at org.apache.pulsar.functions.runtime.ProcessRuntime.lambda$start$1(ProcessRuntime.java:164) ~[org.apache.pulsar-pulsar-functions-runtime-2.4.0.jar:2.4.0]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_222]
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_222]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_222]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_222]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_222]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_222]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222]```
----
2019-09-05 11:43:36 UTC - Retardust: Is there any CLI or REST API to get the
(latest - 10) messages in a partitioned topic?
----
2019-09-05 11:45:21 UTC - borlandor: The Pulsar Function code looks like this:
```package org.apache.pulsar.functions.api.examples;

import java.util.Collection;
import java.util.function.Function;

import com.google.gson.Gson;
import com.google.gson.JsonObject;

import lombok.extern.slf4j.Slf4j;

/**
 * Example function that acts on a window of tuples at a time rather than on a
 * per-tuple basis.
 */
/*
// Original Integer-summing variant:
@Slf4j
public class AddWindowFunction implements Function<Collection<Integer>, Integer> {
    @Override
    public Integer apply(Collection<Integer> integers) {
        return integers.stream().reduce(0, (x, y) -> x + y);
    }
}
*/
@Slf4j
public class AddWindowFunction implements Function<Collection<String>, String> {
    @Override
    public String apply(Collection<String> logItems) {
        String result = "";
        for (String record : logItems) {
            // Parse each record with Gson; new JsonObject(record) and
            // JSONObject.parseObject(record) from other json jars were also tried.
            JsonObject jsonObject = new Gson().fromJson(record, JsonObject.class);
            String s = jsonObject.get("pktype").getAsString();
            result += "," + s;
        }
        return result;
    }
}```
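For reference, a hedged sketch of how a window function like this is typically deployed (tenant/namespace, topic names, jar path, and window size are all illustrative). Note that external JSON libraries such as Gson generally need to be bundled into the function jar itself (e.g. via a shaded uber jar), since that jar is what gets shipped to the function runtime:
```bin/pulsar-admin functions create \
  --jar target/my-function.jar \
  --classname org.apache.pulsar.functions.api.examples.AddWindowFunction \
  --inputs persistent://public/default/log-in \
  --output persistent://public/default/log-out \
  --window-length-count 10 \
  --name AddWindowFunction```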
----
2019-09-05 16:15:04 UTC - Martin Ashby: Hi, does anyone know of any RxJava
integration with Apache Pulsar?
----
2019-09-05 16:52:33 UTC - Sijie Guo: Do you mind filing a GitHub issue with
more detailed information about your case? That would help us understand your
use case and be able to help you.
----
2019-09-05 16:53:43 UTC - Sijie Guo: there is a setting in the producer
configuration called `maxPendingMessages`. Tune the setting accordingly.
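For example, a minimal sketch with the Java client (topic name and queue size are illustrative; `maxPendingMessages` defaults to 1000):
```import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

public class ProducerQueueExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // Raise the pending-message queue, and block instead of failing with
        // "Producer send queue is full" when the queue fills up.
        Producer<byte[]> producer = client.newProducer()
                .topic("my-topic")            // illustrative topic name
                .maxPendingMessages(5000)     // default is 1000
                .blockIfQueueFull(true)
                .create();

        producer.send("hello".getBytes());
        producer.close();
        client.close();
    }
}```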
----
2019-09-05 17:00:44 UTC - Sijie Guo: Not that I am aware of.
----
2019-09-05 17:15:27 UTC - Luke Lu: Has anyone here gotten Athena (AWS Presto as
a service) to work with Pulsar Proxy with auth?
----
2019-09-05 17:29:22 UTC - Kendall Magesh-Davis: I’ve had a minikube-scale
Pulsar cluster running for a week. A node in my cluster died and was
automatically replaced, and one bookie has not been able to reschedule itself.
```Normal   Scheduled              83s                default-scheduler                         Successfully assigned pulsar-test-240/pulsar-test-240-bookkeeper-2 to node1.redacted.compute.internal
Normal   SuccessfulMountVolume  83s                kubelet, node1.redacted.compute.internal  MountVolume.SetUp succeeded for volume "pulsar-test-240-bookkeeper-journal"
Normal   SuccessfulMountVolume  83s                kubelet, node1.redacted.compute.internal  MountVolume.SetUp succeeded for volume "pulsar-test-240-bookkeeper-ledgers"
Normal   SuccessfulMountVolume  83s                kubelet, node1.redacted.compute.internal  MountVolume.SetUp succeeded for volume "default-token-m2rbj"
Normal   Pulled                 82s                kubelet, node1.redacted.compute.internal  Container image "apachepulsar/pulsar-all:2.4.0" already present on machine
Normal   Created                82s                kubelet, node1.redacted.compute.internal  Created container
Normal   Started                82s                kubelet, node1.redacted.compute.internal  Started container
Normal   Pulled                 79s                kubelet, node1.redacted.compute.internal  Container image "apachepulsar/pulsar-all:2.4.0" already present on machine
Normal   Created                79s                kubelet, node1.redacted.compute.internal  Created container
Normal   Started                79s                kubelet, node1.redacted.compute.internal  Started container
Normal   Pulled                 20s (x4 over 76s)  kubelet, node1.redacted.compute.internal  Container image "apachepulsar/pulsar-all:2.4.0" already present on machine
Normal   Created                20s (x4 over 76s)  kubelet, node1.redacted.compute.internal  Created container
Normal   Started                20s (x4 over 76s)  kubelet, node1.redacted.compute.internal  Started container
Warning  BackOff                14s (x4 over 65s)  kubelet, node1.redacted.compute.internal  Back-off restarting failed container```
Any suggestions? I tried removing that pod’s entry from ZK /ledger/cookies and
deleting the pod, but it didn’t help.
----
2019-09-05 17:50:47 UTC - Guy Cole: @Guy Cole has joined the channel
----
2019-09-05 18:39:22 UTC - Sijie Guo: Unfortunately, there is currently no
command for that. Feel free to create a GitHub issue to request this
functionality.
----
2019-09-05 18:39:48 UTC - Retardust: Ok, I get it, thanks
----
2019-09-05 18:42:30 UTC - Sijie Guo: I am not sure how the network is set up
between Athena and your AWS cluster, but since you mentioned “Pulsar Proxy”, I
am guessing that your Pulsar cluster is behind a VPC and the Pulsar proxy is
the access point. If my guess is correct, it is currently hard to get Athena
to access your Pulsar cluster, because the Presto connector reads directly
from the storage layer (BookKeeper). That means Athena would need to be able
to reach the BookKeeper machines for the Presto connector to work.
Feel free to file a GitHub issue and we will look into how to improve this or
provide a solution.
----
2019-09-05 18:45:58 UTC - Sijie Guo: `bookie has not been able to reschedule
itself`
---
Did your pod keep crashing, or have you seen any errors in your bookkeeper pod?
If the pod didn’t keep crashing, I suspect the problem is not with the bookie
itself. Can you use kubectl to get more details about why Kubernetes doesn’t
reschedule the pod?
----
2019-09-05 18:48:30 UTC - Retardust:
<https://github.com/apache/pulsar/issues/5126>
----
2019-09-05 18:51:13 UTC - Kendall Magesh-Davis: The pod keeps crashing; what I
pasted were the events from the pod.
----
2019-09-05 18:51:50 UTC - Kendall Magesh-Davis: pod logs say
```17:26:33.813 [main] ERROR org.apache.bookkeeper.bookie.Bookie - There are directories without a cookie, and this is neither a new environment, nor is storage expansion enabled. Empty directories are [data/bookkeeper/journal/current, data/bookkeeper/ledgers/current]```
and
```17:26:33.823 [main] ERROR org.apache.bookkeeper.server.Main - Failed to build bookie server
org.apache.bookkeeper.bookie.BookieException$InvalidCookieException:
	at org.apache.bookkeeper.bookie.Bookie.checkEnvironmentWithStorageExpansion(Bookie.java:470) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]```
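For context, a hedged reading of the error above: at startup the bookie compares the cookie it registered in ZooKeeper against its local journal/ledger directories, and empty directories on an already-registered bookie fail the check unless storage expansion is enabled in the bookie configuration:
```# bookkeeper.conf -- the flag referenced by the error message above.
# This only makes the bookie accept empty/added directories; it does NOT
# recover the data that was lost with the old persistent volume.
allowStorageExpansion=true```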
----
2019-09-05 19:15:41 UTC - Karthik Duraisamy: @Karthik Duraisamy has joined the
channel
----
2019-09-05 19:38:46 UTC - Luke Lu: Got it. So Presto needs to be in the same
network as Pulsar? Does Presto work with Pulsar auth turned on (e.g. for
certain namespaces only)?
----
2019-09-05 19:41:39 UTC - Sijie Guo: @Luke Lu - currently yes, it has to be in
the same network. The authentication part is very basic, because the approach
the Presto connector takes bypasses the broker and reads directly from the
storage layer.
----
2019-09-05 19:44:09 UTC - Luke Lu: What about data already offloaded to s3? Are
you saying the connector would talk to s3 directly as well?
----
2019-09-05 19:44:56 UTC - Luke Lu: How does the connector get the s3
credentials then?
----
2019-09-05 19:53:51 UTC - Sijie Guo: @Luke Lu yes, the connector talks to s3
directly.
> How does the connector get the s3 credentials then?
I believe you have to configure the presto worker nodes with the s3 credentials
at this moment.
Currently the authentication / authorization for this part is very basic. It
requires quite some work at the bookkeeper level, and around how to manage
credentials for offloaded data.
2019-09-05 20:09:39 UTC - Luke Lu: Thanks. Will dig more then.
----
2019-09-05 20:56:14 UTC - Chris Bartholomew: It looks like you lost the storage
for a bookie when the node died. Did your minikube config include persistent
volumes?
----
2019-09-05 22:46:26 UTC - Luke Lu: Is there any way to set up offloading such
that the offloaded ledgers are larger than a certain size? We already set the
offload threshold to 10MB. We found that if the ingestion rate of a particular
topic is fairly low (say 200 msg/s), the offloaded ledger sizes vary between
500KB and 10MB, which makes catch-up reads and seeks back to an earlier time
extremely slow (on the order of 200 msg/s). This can easily be reproduced with
pulsar-perf. The performance of catch-up reads on offloaded topics is
acceptable when the ingestion rate is high, where we see offloaded ledger
sizes between 500MB and 1.5GB.
----
2019-09-05 22:52:10 UTC - Matteo Merli: There are a few tunables to control
ledger rollovers:
```
# Max number of entries to append to a ledger before triggering a rollover.
# A ledger rollover is triggered on these conditions:
#  * Either the max rollover time has been reached
#  * or max entries have been written to the ledger and at least min-time
#    has passed
managedLedgerMaxEntriesPerLedger=50000

# Minimum time between ledger rollovers for a topic
managedLedgerMinLedgerRolloverTimeMinutes=10

# Maximum time before forcing a ledger rollover for a topic
managedLedgerMaxLedgerRolloverTimeMinutes=240
```
----
2019-09-05 22:52:25 UTC - Matteo Merli: you can increase the max time to get
bigger ledgers
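Rough arithmetic under the defaults above (a hedged sketch, assuming small ~100-byte messages): at 200 msg/s the 50000-entry cap is reached after ~250s, so the ledger rolls over once the 10-minute minimum has passed, holding roughly 120,000 entries, i.e. only ~12MB. Raising `managedLedgerMaxEntriesPerLedger` together with the min/max rollover times lets a low-rate topic fill much larger ledgers before they are offloaded:
```200 msg/s x 600 s (min rollover time)  = 120,000 entries per ledger
120,000 entries x ~100 B per message   ~ 12 MB per ledger```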
----
2019-09-05 22:57:45 UTC - Luke Lu: Thanks!
----
2019-09-06 05:17:39 UTC - Ali Ahmed: new slides by Yahoo Japan on Pulsar
----
2019-09-06 05:17:42 UTC - Ali Ahmed:
<https://www.slideshare.net/NozomiKurihara1/apache-pulsar-meetup-pulsarmeetupjapan20190904>
----
2019-09-06 08:20:30 UTC - borlandor: Ok, thanks! See also:
<https://github.com/apache/pulsar/issues/5136>
----
2019-09-06 08:46:45 UTC - Martin Ashby: OK. It seems like a pretty good
candidate; both are systems that let you operate on streams.
----
2019-09-06 08:48:39 UTC - Sijie Guo: yeah, feel free to contribute an
integration :slightly_smiling_face:
+1 : Martin Ashby
----