2019-09-05 10:29:13 UTC - Ravi Shah: What is the max size of the producer queue?
I am getting this error: `Producer send queue is full`
----
2019-09-05 10:30:16 UTC - Ravi Shah: I am evaluating Pulsar on a standalone
cluster with 10000K, is that ok?
----
2019-09-05 11:43:31 UTC - borlandor: Cannot use JSONObject in a Pulsar Java Function.
When I use any json jar, a Pulsar Java function that uses JSONObject/JsonObject fails its health check:
```19:31:49.252 [function-timer-thread-90-1] ERROR org.apache.pulsar.functions.runtime.ProcessRuntime - Health check failed for AddWindowFunction-0
java.util.concurrent.ExecutionException: io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
	at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) ~[?:1.8.0_222]
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) ~[?:1.8.0_222]
	at org.apache.pulsar.functions.runtime.ProcessRuntime.lambda$start$1(ProcessRuntime.java:164) ~[org.apache.pulsar-pulsar-functions-runtime-2.4.0.jar:2.4.0]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_222]
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_222]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_222]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_222]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_222]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_222]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222]```
----
2019-09-05 11:43:36 UTC - Retardust: Is there any CLI or REST API to get the
(latest - 10) messages in a partitioned topic?
----
2019-09-05 11:45:21 UTC - borlandor: The Pulsar Function code looks like this:
```package org.apache.pulsar.functions.api.examples;

import java.util.Collection;
import java.util.function.Function;

import com.google.gson.Gson;
import com.google.gson.JsonObject;

import lombok.extern.slf4j.Slf4j;

/**
 * Example function that acts on a window of tuples at a time rather than on a
 * per-tuple basis.
 */
/*
// Original Integer-summing variant:
@Slf4j
public class AddWindowFunction implements Function<Collection<Integer>, Integer> {
    @Override
    public Integer apply(Collection<Integer> integers) {
        return integers.stream().reduce(0, (x, y) -> x + y);
    }
}
*/
@Slf4j
public class AddWindowFunction implements Function<Collection<String>, String> {
    @Override
    public String apply(Collection<String> logItems) {
        String result = "";
        for (String record : logItems) {
            // Parse each record with Gson; new JsonObject(record) and
            // JSONObject.parseObject(record) from other json jars were also tried.
            JsonObject jsonObject = new Gson().fromJson(record, JsonObject.class);
            String s = jsonObject.get("pktype").getAsString();
            result += "," + s;
        }
        return result;
    }
}```
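For reference, a hedged sketch of how a window function like this is typically deployed (tenant/namespace, topic names, jar path, and window size are all illustrative). Note that external JSON libraries such as Gson generally need to be bundled into the function jar itself (e.g. via a shaded uber jar), since that jar is what gets shipped to the function runtime:
```bin/pulsar-admin functions create \
  --jar target/my-function.jar \
  --classname org.apache.pulsar.functions.api.examples.AddWindowFunction \
  --inputs persistent://public/default/log-in \
  --output persistent://public/default/log-out \
  --window-length-count 10 \
  --name AddWindowFunction```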
----
2019-09-05 16:15:04 UTC - Martin Ashby: Hi, does anyone know of any RxJava
integration with Apache Pulsar?
----
2019-09-05 16:52:33 UTC - Sijie Guo: Do you mind filing a GitHub issue with
more detailed information about your case? That would help us understand your
use case and be able to help you.
----
2019-09-05 16:53:43 UTC - Sijie Guo: there is a setting in the producer
configuration called `maxPendingMessages`. Tune the setting accordingly.
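For example, a minimal sketch with the Java client (topic name and queue size are illustrative; `maxPendingMessages` defaults to 1000):
```import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

public class ProducerQueueExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // Raise the pending-message queue, and block instead of failing with
        // "Producer send queue is full" when the queue fills up.
        Producer<byte[]> producer = client.newProducer()
                .topic("my-topic")            // illustrative topic name
                .maxPendingMessages(5000)     // default is 1000
                .blockIfQueueFull(true)
                .create();

        producer.send("hello".getBytes());
        producer.close();
        client.close();
    }
}```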
----
2019-09-05 17:00:44 UTC - Sijie Guo: Not that I am aware of.
----
2019-09-05 17:15:27 UTC - Luke Lu: Has anyone here gotten Athena (AWS Presto as
a service) to work with Pulsar Proxy with auth?
----
2019-09-05 17:29:22 UTC - Kendall Magesh-Davis: I’ve had a minikube-scale
Pulsar cluster running for a week. A node in my cluster died and was
automatically replaced, and one bookie has not been able to reschedule itself.
```Normal   Scheduled              83s                default-scheduler                         Successfully assigned pulsar-test-240/pulsar-test-240-bookkeeper-2 to node1.redacted.compute.internal
Normal   SuccessfulMountVolume  83s                kubelet, node1.redacted.compute.internal  MountVolume.SetUp succeeded for volume "pulsar-test-240-bookkeeper-journal"
Normal   SuccessfulMountVolume  83s                kubelet, node1.redacted.compute.internal  MountVolume.SetUp succeeded for volume "pulsar-test-240-bookkeeper-ledgers"
Normal   SuccessfulMountVolume  83s                kubelet, node1.redacted.compute.internal  MountVolume.SetUp succeeded for volume "default-token-m2rbj"
Normal   Pulled                 82s                kubelet, node1.redacted.compute.internal  Container image "apachepulsar/pulsar-all:2.4.0" already present on machine
Normal   Created                82s                kubelet, node1.redacted.compute.internal  Created container
Normal   Started                82s                kubelet, node1.redacted.compute.internal  Started container
Normal   Pulled                 79s                kubelet, node1.redacted.compute.internal  Container image "apachepulsar/pulsar-all:2.4.0" already present on machine
Normal   Created                79s                kubelet, node1.redacted.compute.internal  Created container
Normal   Started                79s                kubelet, node1.redacted.compute.internal  Started container
Normal   Pulled                 20s (x4 over 76s)  kubelet, node1.redacted.compute.internal  Container image "apachepulsar/pulsar-all:2.4.0" already present on machine
Normal   Created                20s (x4 over 76s)  kubelet, node1.redacted.compute.internal  Created container
Normal   Started                20s (x4 over 76s)  kubelet, node1.redacted.compute.internal  Started container
Warning  BackOff                14s (x4 over 65s)  kubelet, node1.redacted.compute.internal  Back-off restarting failed container```
Any suggestions? I tried removing that pod’s entry from ZK /ledger/cookies and
deleting the pod, but it didn’t help.
----
2019-09-05 17:50:47 UTC - Guy Cole: @Guy Cole has joined the channel
----
2019-09-05 18:39:22 UTC - Sijie Guo: Unfortunately, there is currently no
command for that. Feel free to create a GitHub issue to request this
functionality.
----
2019-09-05 18:39:48 UTC - Retardust: Ok, I get it, thanks
----
2019-09-05 18:42:30 UTC - Sijie Guo: I am not sure how the network is set up
between Athena and your AWS cluster, but since you mentioned “Pulsar Proxy”, I
am guessing that your Pulsar cluster is behind a VPC and the Pulsar proxy is
the access point. If my guess is correct, it is currently hard to get Athena
to access your Pulsar cluster, because the Presto connector reads directly
from the storage layer (BookKeeper). That means Athena would need to be able
to reach the BookKeeper machines for the Presto connector to work.
Feel free to file a GitHub issue and we will look into how to improve this or
provide a solution.
----
2019-09-05 18:45:58 UTC - Sijie Guo: `bookie has not been able to reschedule
itself`
---
Did your pod keep crashing, or have you seen any errors in your bookkeeper pod?
If the pod didn’t keep crashing, I suspect the problem is not with the bookie
itself. Can you use kubectl to get more details about why Kubernetes doesn’t
reschedule the pod?
----
2019-09-05 18:48:30 UTC - Retardust:
<https://github.com/apache/pulsar/issues/5126>
----
2019-09-05 18:51:13 UTC - Kendall Magesh-Davis: The pod keeps crashing; what I
pasted were the events from the pod.
----
2019-09-05 18:51:50 UTC - Kendall Magesh-Davis: pod logs say
```17:26:33.813 [main] ERROR org.apache.bookkeeper.bookie.Bookie - There are directories without a cookie, and this is neither a new environment, nor is storage expansion enabled. Empty directories are [data/bookkeeper/journal/current, data/bookkeeper/ledgers/current]```
and
```17:26:33.823 [main] ERROR org.apache.bookkeeper.server.Main - Failed to build bookie server
org.apache.bookkeeper.bookie.BookieException$InvalidCookieException:
	at org.apache.bookkeeper.bookie.Bookie.checkEnvironmentWithStorageExpansion(Bookie.java:470) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]```
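For context, a hedged reading of the error above: at startup the bookie compares the cookie it registered in ZooKeeper against its local journal/ledger directories, and empty directories on an already-registered bookie fail the check unless storage expansion is enabled in the bookie configuration:
```# bookkeeper.conf -- the flag referenced by the error message above.
# This only makes the bookie accept empty/added directories; it does NOT
# recover the data that was lost with the old persistent volume.
allowStorageExpansion=true```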
----
2019-09-05 19:15:41 UTC - Karthik Duraisamy: @Karthik Duraisamy has joined the
channel
----
2019-09-05 19:38:46 UTC - Luke Lu: Got it. So Presto needs to be in the same
network as Pulsar? Does Presto work with Pulsar auth turned on (e.g. for
certain namespaces only)?
----
2019-09-05 19:41:39 UTC - Sijie Guo: @Luke Lu - currently yes, it has to be in
the same network. The authentication part is very basic, because the approach
the Presto connector takes bypasses the broker and reads directly from the
storage layer.
----
2019-09-05 19:44:09 UTC - Luke Lu: What about data already offloaded to s3? Are
you saying the connector would talk to s3 directly as well?
----
2019-09-05 19:44:56 UTC - Luke Lu: How does the connector get the s3
credentials then?
----
2019-09-05 19:53:51 UTC - Sijie Guo: @Luke Lu yes, the connector talks to s3
directly.
> How does the connector get the s3 credentials then?
I believe you have to configure the presto worker nodes with the s3 credentials
at this moment.
Currently the authentication / authorization for this part is very basic. It
requires quite some work at the bookkeeper level, and around how to manage
credentials for offloaded data.
2019-09-05 20:09:39 UTC - Luke Lu: Thanks. Will dig more then.
----
2019-09-05 20:56:14 UTC - Chris Bartholomew: It looks like you lost the storage
for a bookie when the node died. Did your minikube config include persistent
volumes?
----
2019-09-05 22:46:26 UTC - Luke Lu: Is there any way to set up offloading such
that the offloaded ledgers are larger than a certain size? We already set the
offload threshold to 10MB. We found that if the ingestion rate of a particular
topic is fairly low (say 200 msg/s), the offloaded ledger sizes vary between
500KB and 10MB, which makes catch-up reads and seeks back to an earlier time
extremely slow (on the order of 200 msg/s). This can easily be reproduced with
pulsar-perf. The performance of catch-up reads on offloaded topics is
acceptable when the ingestion rate is high, where we see offloaded ledger
sizes between 500MB and 1.5GB.
----
2019-09-05 22:52:10 UTC - Matteo Merli: There are a few tunables to control
ledger rollovers:
```
# Max number of entries to append to a ledger before triggering a rollover.
# A ledger rollover is triggered on these conditions:
#  * Either the max rollover time has been reached
#  * or max entries have been written to the ledger and at least min-time
#    has passed
managedLedgerMaxEntriesPerLedger=50000

# Minimum time between ledger rollovers for a topic
managedLedgerMinLedgerRolloverTimeMinutes=10

# Maximum time before forcing a ledger rollover for a topic
managedLedgerMaxLedgerRolloverTimeMinutes=240
```
----
2019-09-05 22:52:25 UTC - Matteo Merli: you can increase the max time to get
bigger ledgers
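Rough arithmetic under the defaults above (a hedged sketch, assuming small ~100-byte messages): at 200 msg/s the 50000-entry cap is reached after ~250s, so the ledger rolls over once the 10-minute minimum has passed, holding roughly 120,000 entries, i.e. only ~12MB. Raising `managedLedgerMaxEntriesPerLedger` together with the min/max rollover times lets a low-rate topic fill much larger ledgers before they are offloaded:
```200 msg/s x 600 s (min rollover time)  = 120,000 entries per ledger
120,000 entries x ~100 B per message   ~ 12 MB per ledger```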
----
2019-09-05 22:57:45 UTC - Luke Lu: Thanks!
----
2019-09-06 05:17:39 UTC - Ali Ahmed: new slides by Yahoo Japan on Pulsar
----
2019-09-06 05:17:42 UTC - Ali Ahmed:
<https://www.slideshare.net/NozomiKurihara1/apache-pulsar-meetup-pulsarmeetupjapan20190904>
----
2019-09-06 08:20:30 UTC - borlandor: Ok, thanks! See also:
<https://github.com/apache/pulsar/issues/5136>
----
2019-09-06 08:46:45 UTC - Martin Ashby: OK. It seems like a pretty good
candidate; both are systems that let you operate on streams.
----
2019-09-06 08:48:39 UTC - Sijie Guo: yeah, feel free to contribute an
integration :slightly_smiling_face:
+1 : Martin Ashby
----