Slack digest for #general - 2020-06-07

Apache Pulsar Slack Sun, 07 Jun 2020 02:12:12 -0700

2020-06-06 09:24:11 UTC - Liam Clarke: More fun, I'm debugging, and noticed 
this in the logs:


```09:21:46.710 [main] INFO  org.apache.bookkeeper.mledger.offload.jcloud.impl.BlobStoreManagedLedgerOffloader - Constructor offload driver: aws-s3, host: null, container: test, region: ap-southeast-2```
So the Jcloud offloader got the region okay - but the OffloadPolicies in 
BrokerService#getManagedLedgerConfig is still missing the necessary values:

```OffloadPolicies{managedLedgerOffloadDriver=aws-s3, 
managedLedgerOffloadMaxThreads=2, managedLedgerOffloadPrefetchRounds=1, 
managedLedgerOffloadThresholdInBytes=-1, 
managedLedgerOffloadDeletionLagInMillis=60000, 
s3ManagedLedgerOffloadRegion=null, s3ManagedLedgerOffloadBucket=null, 
s3ManagedLedgerOffloadServiceEndpoint=null, 
s3ManagedLedgerOffloadMaxBlockSizeInBytes=67108864, 
s3ManagedLedgerOffloadReadBufferSizeInBytes=1048576, 
s3ManagedLedgerOffloadRole=null, 
s3ManagedLedgerOffloadRoleSessionName=pulsar-s3-offload, 
gcsManagedLedgerOffloadRegion=null, gcsManagedLedgerOffloadBucket=null, 
gcsManagedLedgerOffloadMaxBlockSizeInBytes=67108864, 
gcsManagedLedgerOffloadReadBufferSizeInBytes=1048576, 
gcsManagedLedgerOffloadServiceAccountKeyFile=null, fileSystemProfilePath=null, 
fileSystemURI=null}```
----
2020-06-06 09:30:55 UTC - Ebere Abanonu: Hi, I have been able to look into 
this. PatternMultiTopicConsumer supports auto-discovery of new topics. You can 
configure that with ConsumerBuilder.
----
2020-06-06 09:44:14 UTC - Liam Clarke: Okay, so using `pulsar-admin namespaces 
set-offload-policies --driver aws-s3 --region ap-southeast-2 --bucket test ... 
test-tenant/test-namespace`  to set an explicit offload policy on the namespace 
worked, so I guess my question is - is this because I was using 
`standalone.conf`  vs `broker.conf`? Or will I have to set a per-namespace 
offload policy for a production cluster also?
----
2020-06-06 10:16:58 UTC - Adriaan de Haan: Hi, I am trying to get the jdbc io 
connector working, but I keep getting the following:
```07:11:25.185 [main] INFO  org.apache.pulsar.functions.utils.io.ConnectorUtils - Searching for connectors in /home/adriaan/apache-pulsar-2.5.2/connectors
07:11:26.013 [main] INFO  org.apache.pulsar.functions.utils.io.ConnectorUtils - Found connector ConnectorDefinition(name=jdbc, description=Jdbc sink, sourceClass=null, sinkClass=org.apache.pulsar.io.jdbc.JdbcAutoSchemaSink) from /home/adriaan/apache-pulsar-2.5.2/connectors/pulsar-io-jdbc-2.5.2.nar
Exception in thread "main" java.lang.NullPointerException
        at org.apache.pulsar.functions.LocalRunner.startThreadedMode(LocalRunner.java:421)
        at org.apache.pulsar.functions.LocalRunner.start(LocalRunner.java:319)
        at org.apache.pulsar.functions.LocalRunner.main(LocalRunner.java:152)```
NullPointerException is not very helpful in trying to debug the issue... any 
advice on how I can determine what is wrong?
----
2020-06-06 10:25:01 UTC - Liam Clarke: Hi Adriaan, line 421 is

`instanceConfig.setMaxPendingAsyncRequests(functionConfig.getMaxPendingAsyncRequests());`

maxPendingAsyncRequests in InstanceConfig is an `int` while in FunctionConfig 
it's an `Integer` - if it was set to `null` in the function config, it will 
throw an NPE on unboxing to an `int`.
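
The unboxing failure described here can be reproduced in isolation. This is a minimal sketch that assumes hypothetical stand-in classes (`FunctionConfigStub`, `InstanceConfigStub`), not Pulsar's real config classes; only the `Integer` versus `int` mismatch is taken from the message above.

```java
final class UnboxingDemo {
    /** Hypothetical stand-in: holds the value as a boxed Integer, so it may be null. */
    public static final class FunctionConfigStub {
        private Integer maxPendingAsyncRequests = 1000;
        public Integer getMaxPendingAsyncRequests() { return maxPendingAsyncRequests; }
        public void setMaxPendingAsyncRequests(Integer v) { maxPendingAsyncRequests = v; }
    }

    /** Hypothetical stand-in: holds the value as a primitive int, which cannot be null. */
    public static final class InstanceConfigStub {
        private int maxPendingAsyncRequests = 1000;
        public int getMaxPendingAsyncRequests() { return maxPendingAsyncRequests; }
        public void setMaxPendingAsyncRequests(int v) { maxPendingAsyncRequests = v; }
    }

    /** Returns true if copying the value between the two configs throws an NPE. */
    public static boolean copyThrowsNpe(FunctionConfigStub functionConfig) {
        InstanceConfigStub instanceConfig = new InstanceConfigStub();
        try {
            // Passing an Integer where an int is expected auto-unboxes it;
            // unboxing a null Integer throws NullPointerException.
            instanceConfig.setMaxPendingAsyncRequests(functionConfig.getMaxPendingAsyncRequests());
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        FunctionConfigStub functionConfig = new FunctionConfigStub();
        System.out.println("default value throws NPE: " + copyThrowsNpe(functionConfig));
        functionConfig.setMaxPendingAsyncRequests(null);
        System.out.println("null value throws NPE: " + copyThrowsNpe(functionConfig));
    }
}
```

Note that the same line also throws an NPE if the config object itself is null, so the stack trace alone does not distinguish the two causes.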
----
2020-06-06 10:27:33 UTC - Liam Clarke: In both *Config classes it defaults to 
1000. Are you setting it explicitly to null?
----
2020-06-06 10:28:52 UTC - Adriaan de Haan: I don't set it at all
----
2020-06-06 10:42:27 UTC - Liam Clarke: Try setting it to 1000, can't hurt and 
might resolve the issue
----
2020-06-06 12:39:22 UTC - Adriaan de Haan: so the null pointer exception at 
that line would imply that functionConfig is null
----
2020-06-06 12:42:14 UTC - Aaron Batilo: @Aaron Batilo has joined the channel
----
2020-06-06 12:46:28 UTC - Aaron Batilo: :wave: Hi everyone. I'm Aaron. I came 
across Pulsar a few weeks ago and have been trying to push it on my 
organization because I think it solves a lot of our use cases.
+1 : Enrico Olivelli, Karthik Ramasamy
----
2020-06-06 12:46:46 UTC - Adriaan de Haan: Since this is a Sink it has a 
SinkConfig and not a FunctionConfig I believe... so it seems that might be why 
it's failing
----
2020-06-06 12:56:14 UTC - Adriaan de Haan: Hi, can anybody please confirm that 
sinks still work in v2.5.x?
----
2020-06-06 12:57:37 UTC - Adriaan de Haan: It seems that this commit:
<https://github.com/apache/pulsar/commit/55d5430701d41d92ce290d838e332eb9d9154b9e>
might have introduced a bug that will result in a null pointer exception - 
since functionConfig is null for a sink, but it is using functionConfig without 
checking for null
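
A sketch of the suspected failure mode and one possible guard, assuming a hypothetical stand-in type (`FunctionConfigStub`) and a hypothetical helper (`guarded`). It is not the code from the linked commit, and the fallback of 1000 is only the default mentioned earlier in this thread.

```java
final class NullGuardDemo {
    /** Hypothetical stand-in for a function config; the reference is null when a sink is run. */
    public static final class FunctionConfigStub {
        private final Integer maxPendingAsyncRequests;
        public FunctionConfigStub(Integer v) { this.maxPendingAsyncRequests = v; }
        public Integer getMaxPendingAsyncRequests() { return maxPendingAsyncRequests; }
    }

    /** Unguarded read: throws NPE if the config is null or if the boxed value is null. */
    public static int unguarded(FunctionConfigStub functionConfig) {
        return functionConfig.getMaxPendingAsyncRequests();
    }

    /** Guarded read: uses the fallback when either the config or the value is missing. */
    public static int guarded(FunctionConfigStub functionConfig, int fallback) {
        if (functionConfig == null) {
            return fallback;
        }
        Integer value = functionConfig.getMaxPendingAsyncRequests();
        return value != null ? value : fallback;
    }

    public static void main(String[] args) {
        System.out.println(guarded(null, 1000));                         // no function config (sink case)
        System.out.println(guarded(new FunctionConfigStub(null), 1000)); // config present, value unset
        System.out.println(guarded(new FunctionConfigStub(50), 1000));   // config and value present
        try {
            unguarded(null);
        } catch (NullPointerException e) {
            System.out.println("unguarded read on a null config throws NPE");
        }
    }
}
```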
----
2020-06-06 13:01:17 UTC - alex kurtser: Hi @Sijie Guo

We set it up as a separate StatefulSet (separate from the brokers) with "bin/pulsar 
proxy" as the entrypoint command for the container.
We also provide a function_worker.yaml config file with parameters like this:
processContainerFactory:
  extraFunctionDependenciesDir: null
  javaInstanceJarLocation: null
  logDirectory: null
  pythonInstanceLocation: null
----
2020-06-06 13:03:36 UTC - alex kurtser: Of course, we have other parameters like 
pulsar endpoints and so on. Important to note that the functions are actually 
working fine. The only issue is with metrics. As I mentioned earlier, each 
function instance inside the container opens a random port to expose its 
metrics. So we cannot know which port it will expose and can't define it in 
the annotations or in the Prometheus config file.
----
2020-06-06 14:45:12 UTC - YounggyuChun: @YounggyuChun has joined the channel
----
2020-06-06 15:55:12 UTC - Amit Pal: @Amit Pal has joined the channel
----
2020-06-06 16:40:47 UTC - Asaf Mesika: @Asaf Mesika has joined the channel
----
2020-06-06 16:48:02 UTC - Asaf Mesika: I’ve got a couple of questions on that:
1. I searched a lot in the documentation and on the internet to answer this 
exact question. Is it documented somewhere and I missed it?
2. Does the default behaviour mean that I can acknowledge a message to the 
broker, the broker acks back, and that information can still be lost (meaning 
the messages acked in that 1 sec will be redelivered)? From what you know, is 
that different from Kafka's design (out of curiosity, comparing the two)?
----
2020-06-06 17:11:28 UTC - Asaf Mesika: I’m reading a lot about Apache Pulsar to 
understand how it works and how it fails. There is one failure I don't 
understand yet. If I experience a complete data loss (all machines terminated, 
or some corruption ruined the data dir of all ZK nodes), is there any way to 
recover other than backing up the ZK disks and restoring them? Or, without the 
ZK data, is Pulsar + BookKeeper essentially useless?
----
2020-06-06 17:19:07 UTC - Matteo Merli: Yes. ZK stores the metadata, so the 
pointers to the data. If that is missing, the data is not accessible.

Though....

ZK availability is determined by the number of nodes. E.g., in a normal 
production environment one would run 5 ZK nodes.

On a bare-metal deployment, that would mean that 5 disks would have to 
physically break down in a very short amount of time to lose this data.
It would be **very** unlikely to happen. Sure, there's still a chance, but in 
any storage system the durability guarantee can never be 100%; it can only 
approach that through more redundancy.

On a cloud deployment, the local VM disks are ephemeral, so it's not a good 
idea to use them for ZK. Rather, you would use EBS volumes (or similar). At 
that point, the data on each EBS volume is already replicated 2 ways and it can 
be remounted in a different VM.

Finally, it's certainly possible to take offline backups of the ZK snapshots 
and txn-log. You can restore ZK nodes from those.
+1 : Asaf Mesika
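
For the availability point above: a ZooKeeper ensemble keeps serving as long as a majority of its nodes is reachable. A small sketch of that arithmetic, with hypothetical helper names (`quorumSize`, `toleratedFailures`); it covers availability only, not the durability argument about losing every disk.

```java
final class QuorumDemo {
    /** Smallest majority of an ensemble with the given number of nodes. */
    public static int quorumSize(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    /** How many nodes can be down while a majority is still reachable. */
    public static int toleratedFailures(int ensembleSize) {
        return ensembleSize - quorumSize(ensembleSize);
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 5, 7}) {
            System.out.println(n + " nodes: quorum of " + quorumSize(n)
                    + ", tolerates " + toleratedFailures(n) + " down");
        }
    }
}
```

With the 5 nodes mentioned above, the quorum is 3, so the ensemble stays available with up to 2 nodes down.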
----
2020-06-06 22:48:17 UTC - Nicolas Ha: the json seems fixed, but I still can’t 
get to the page

<http://pulsar.apache.org/functions-rest-api/?version=2.5.1>
----
