2020-08-21 11:22:06 UTC - Takahiro Hozumi: Is it possible to use an Avro schema without code generation in Pulsar? I have an Avro schema as a JSON file and want to create a Pulsar message with `org.apache.avro.generic.GenericRecord`, which uses the schema. It seems that the Pulsar producer requires a POJO generated from the schema.
<http://pulsar.apache.org/docs/en/schema-understand/> <https://github.com/sijie/pulsar-avro-schema-example/blob/f85c6e1a83b47fe5017840e35d6989e6e153aa4f/src/main/java/org/apache/pulsar/examples/TweetProducer.java#L22> ---- 2020-08-21 11:25:33 UTC - Joshua Decosta: That seems like a standard process. It’s the same way you would produce or consume any message. ---- 2020-08-21 12:13:12 UTC - Aaron: You can produce messages with a Producer of type GenericRecord ----
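What Aaron describes uses Pulsar's own `org.apache.pulsar.client.api.schema.GenericRecord` (not Avro's), built from the Avro JSON definition at runtime, so no generated POJO is needed. A minimal sketch, assuming a 2.6-era Java client; the service URL, topic, and schema definition are illustrative, and `SchemaInfo.builder()` may look different in other client versions:
```
import static java.nio.charset.StandardCharsets.UTF_8;

import java.util.Collections;

import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;
import org.apache.pulsar.client.api.schema.GenericRecord;
import org.apache.pulsar.client.api.schema.GenericSchema;
import org.apache.pulsar.common.schema.SchemaInfo;
import org.apache.pulsar.common.schema.SchemaType;

public class GenericAvroProducer {
    public static void main(String[] args) throws Exception {
        // The Avro definition as JSON; in practice this would be read from the file.
        String avroJson = "{\"type\":\"record\",\"name\":\"Tweet\",\"fields\":["
                + "{\"name\":\"id\",\"type\":\"long\"},"
                + "{\"name\":\"text\",\"type\":\"string\"}]}";

        // Wrap the raw Avro definition in a Pulsar SchemaInfo of type AVRO.
        SchemaInfo info = SchemaInfo.builder()
                .name("Tweet")
                .type(SchemaType.AVRO)
                .schema(avroJson.getBytes(UTF_8))
                .properties(Collections.emptyMap())
                .build();

        // Schema.generic() yields a schema whose records are Pulsar
        // GenericRecords, so no generated POJO is involved.
        GenericSchema<GenericRecord> schema = Schema.generic(info);

        try (PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // illustrative URL
                .build()) {
            Producer<GenericRecord> producer = client.newProducer(schema)
                    .topic("persistent://public/default/tweets") // illustrative topic
                    .create();

            GenericRecord record = schema.newRecordBuilder()
                    .set("id", 1L)
                    .set("text", "hello pulsar")
                    .build();
            producer.send(record);
            producer.close();
        }
    }
}
```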
2020-08-21 13:41:01 UTC - Raghav: What is the use of the command `./bookkeeper shell localconsistencycheck`? In my cluster with 3 bookies and E, Qw, Qa set to (3, 3, 3), the simpletest works fine: `./bookkeeper shell simpletest -ensemble 3 -writeQuorum 3 -ackQuorum 3 -numEntries 100`. But localconsistencycheck fails with this exception on all 3 boxes:
```Exception in thread "main" com.google.common.util.concurrent.UncheckedExecutionException: Error open RocksDB database
	at org.apache.bookkeeper.tools.cli.commands.bookie.LocalConsistencyCheckCommand.apply(LocalConsistencyCheckCommand.java:56)
	at org.apache.bookkeeper.bookie.BookieShell$LocalConsistencyCheck.runCmd(BookieShell.java:787)
	at org.apache.bookkeeper.bookie.BookieShell$MyCommand.runCmd(BookieShell.java:223)
	at org.apache.bookkeeper.bookie.BookieShell.run(BookieShell.java:1976)
	at org.apache.bookkeeper.bookie.BookieShell.main(BookieShell.java:2067)
Caused by: java.io.IOException: Error open RocksDB database
	at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.<init>(KeyValueStorageRocksDB.java:182)
	at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.<init>(KeyValueStorageRocksDB.java:83)
	at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.lambda$static$0(KeyValueStorageRocksDB.java:58)
	at org.apache.bookkeeper.bookie.storage.ldb.LedgerMetadataIndex.<init>(LedgerMetadataIndex.java:69)
	at org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage.<init>(SingleDirectoryDbLedgerStorage.java:161)
	at org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorage.newSingleDirectoryDbLedgerStorage(DbLedgerStorage.java:149)
	at org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorage.initialize(DbLedgerStorage.java:129)
	at org.apache.bookkeeper.bookie.Bookie.mountLedgerStorageOffline(Bookie.java:657)
	at org.apache.bookkeeper.tools.cli.commands.bookie.LocalConsistencyCheckCommand.check(LocalConsistencyCheckCommand.java:63)
	at org.apache.bookkeeper.tools.cli.commands.bookie.LocalConsistencyCheckCommand.apply(LocalConsistencyCheckCommand.java:54)
	... 4 more
Caused by: org.rocksdb.RocksDBException: While lock file: /var/pulsar/bookie/ledger1/data-1/current/ledgers/LOCK: Resource temporarily unavailable
	at org.rocksdb.RocksDB.open(Native Method)
	at org.rocksdb.RocksDB.open(RocksDB.java:231)
	at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.<init>(KeyValueStorageRocksDB.java:179)
	... 13 more``` ---- 2020-08-21 14:52:30 UTC - Addison Higham: The biggest one I can think of is schemas; if you aren't using schemas, then you wouldn't need to worry ---- 2020-08-21 14:54:24 UTC - Addison Higham: That version was built manually and should have included rc1 in the tag. Since rc1 passed, it is pretty much the same, just not the official version ---- 2020-08-21 15:00:36 UTC - Lari Hotari: Thanks. It looks like <https://hub.docker.com/r/apachepulsar/pulsar/tags?page=1&name=2.6.1|the official 2.6.1 image is now available>, so I'll use that one. ---- 2020-08-21 15:32:27 UTC - Frank Kelly: FYI, the documentation says the Python client for 2.6.1 is available <https://pulsar.apache.org/docs/en/client-libraries-python/#install-using-pip>, but I see the following:
```$ pip3 install pulsar-client==2.6.1
ERROR: Could not find a version that satisfies the requirement pulsar-client==2.6.1 (from versions: 2.1.0, 2.1.1, 2.2.0, 2.2.1, 2.3.0, 2.3.0.post1, 2.3.1, 2.3.2, 2.4.0, 2.4.1, 2.4.1.post1, 2.4.2, 2.5.0, 2.5.1, 2.5.2, 2.6.0)
ERROR: No matching distribution found for pulsar-client==2.6.1``` ---- 2020-08-21 15:36:48 UTC - Matt Mitchell: Anyone know what might cause this?
```Caused by: org.apache.pulsar.client.api.PulsarClientException$IncompatibleSchemaException: Topic does not have schema to check
	at org.apache.pulsar.client.impl.ClientCnx.getPulsarClientException(ClientCnx.java:1000)
	at org.apache.pulsar.client.impl.ClientCnx.handleError(ClientCnx.java:609)
	at org.apache.pulsar.common.protocol.PulsarDecoder.channelRead(PulsarDecoder.java:171)``` ---- 2020-08-21 15:48:37 UTC - Matt Mitchell: I have several services running, and I’m thinking one of them is using an older version of the client code, which contains the schema (protobuf format) - is that potentially the cause of this error? ---- 2020-08-21 15:51:04 UTC - Addison Higham: do you have the full stacktrace? ---- 2020-08-21 15:51:19 UTC - Addison Higham: err, actually, better yet: are there logs from the broker? ---- 2020-08-21 15:54:10 UTC - Matt Mitchell: checking ----
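While the broker logs are the authoritative place to look, one quick check for an `IncompatibleSchemaException` like this is whether the topic has any schema registered at all. A hedged sketch with the Java admin client, assuming a 2.6-era release; the admin URL and topic name are placeholders:
```
import static java.nio.charset.StandardCharsets.UTF_8;

import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.client.admin.PulsarAdminException;
import org.apache.pulsar.common.schema.SchemaInfo;

public class CheckTopicSchema {
    public static void main(String[] args) throws Exception {
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080") // placeholder admin URL
                .build()) {
            String topic = "persistent://public/default/my-topic"; // placeholder topic
            try {
                // Fetch whatever schema the broker currently has for the topic.
                SchemaInfo info = admin.schemas().getSchemaInfo(topic);
                System.out.println("type: " + info.getType());
                System.out.println("definition: " + new String(info.getSchema(), UTF_8));
            } catch (PulsarAdminException.NotFoundException e) {
                // Lines up with the client-side "Topic does not have schema to
                // check": nothing is registered broker-side for this topic.
                System.out.println("no schema registered for " + topic);
            }
        }
    }
}
```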
2020-08-21 16:48:15 UTC - Nathan Mills: Just bumping this to see if anyone can provide some clarity for me: <https://apache-pulsar.slack.com/archives/C5Z4T36F7/p1597961001029900> ---- 2020-08-21 17:23:48 UTC - Addison Higham: does it allow you to set it up that way? Backlog quotas are usually thought of as a "subset" of your retention policy, but with infinite retention it may still make sense to have a limit on the size of a subscription. Stepping back a bit, it is important to remember the distinction between messages in and out of a subscription. A retention policy applies to messages NOT in a subscription; backlog quotas and TTL only apply to messages IN a subscription. I like to think of subscriptions as a "view" over all the messages, with each subscription having its own view over the same underlying data. The backlog quota and TTL let you place constraints on how long a message is visible in that view, but the retention policy is what determines how long the data remains in the underlying storage (see the sketch after this exchange). ---- 2020-08-21 17:24:19 UTC - Addison Higham: so, more concretely: if messages are evicted from your subscription, they will no longer be visible in your subscription, but they remain in the underlying storage ---- 2020-08-21 17:28:48 UTC - Nathan Mills: ok, just to make sure I understand correctly: with `consumer_backlog_eviction` the messages still get written to the topic, just removed from subscriptions that have exceeded the backlog quota, but if someone uses the `producer_exception` policy, then if any of the subscriptions exceeds the backlog quota it will cause the producer to disconnect? ---- 2020-08-21 17:29:11 UTC - Addison Higham: yes ---- 2020-08-21 17:29:20 UTC - Nathan Mills: :thumbsup: thanks ----
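To make the three knobs concrete, here is a minimal sketch of setting them at the namespace level with the Java admin client. The namespace, URL, and values are illustrative, and the two-argument `BacklogQuota` constructor assumes a 2.6-era API:
```
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.BacklogQuota;
import org.apache.pulsar.common.policies.data.RetentionPolicies;

public class NamespacePolicies {
    public static void main(String[] args) throws Exception {
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080") // illustrative admin URL
                .build()) {
            String ns = "my-tenant/my-namespace"; // illustrative namespace

            // Retention: how long messages stay in storage once they are NOT
            // in any subscription (acked everywhere). -1/-1 means infinite.
            admin.namespaces().setRetention(ns,
                    new RetentionPolicies(7 * 24 * 60 /* minutes */, 51200 /* MB */));

            // Backlog quota: a cap on messages still IN a subscription's view.
            // 5 GiB here, evicting from slow subscriptions when exceeded.
            admin.namespaces().setBacklogQuota(ns,
                    new BacklogQuota(5L * 1024 * 1024 * 1024,
                            BacklogQuota.RetentionPolicy.consumer_backlog_eviction));

            // TTL: unacked messages older than this are acked automatically,
            // i.e. they drop out of the subscription view (not out of storage).
            admin.namespaces().setNamespaceMessageTTL(ns, 24 * 3600 /* seconds */);
        }
    }
}
```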
2020-08-21 19:18:54 UTC - Nathan Mills: So here is what I'm trying to figure out: I'm investigating reports of missing messages. I created a function, stopped the function, and reset the cursor for the input topic to before it was created. When I look at the internal stats, initially I get
```"canvas-cdc%2Ffiltered%2Fcdc-filter-96": {
    "markDeletePosition": "8367455:-1",
    "readPosition": "8367455:0",
    "waitingReadOp": false,
    "pendingReadOps": 0,
    "messagesConsumedCounter": -16267653,
    "cursorLedger": 9274512,
    "cursorLedgerLastEntry": 1,
    "individuallyDeletedMessages": "[]",
    "lastLedgerSwitchTimestamp": "2020-08-21T19:10:31.355Z",
    "state": "Open",
    "numberOfEntriesSinceFirstNotAckedMessage": 1,
    "totalNonContiguousDeletedMessagesRange": 0,
    "properties": {}
}```
After a couple of minutes, with the function still stopped, the internal stats for the cursor look like:
```"canvas-cdc%2Ffiltered%2Fcdc-filter-96": {
    "markDeletePosition": "9216422:3654",
    "readPosition": "9216422:3655",
    "waitingReadOp": false,
    "pendingReadOps": 0,
    "messagesConsumedCounter": 2189250,
    "cursorLedger": 9274512,
    "cursorLedgerLastEntry": 2,
    "individuallyDeletedMessages": "[]",
    "lastLedgerSwitchTimestamp": "2020-08-21T19:10:31.355Z",
    "state": "Open",
    "numberOfEntriesSinceFirstNotAckedMessage": 1,
    "totalNonContiguousDeletedMessagesRange": 0,
    "properties": {}
}```
The read position seems to jump forward without the function running. Would this be caused by the backlog quota policy, which is currently
```{
  "destination_storage" : {
    "limit" : 5368709120,
    "policy" : "consumer_backlog_eviction"
  }
}``` ---- 2020-08-21 19:21:57 UTC - Addison Higham: yes, that is what would be expected ---- 2020-08-21 19:23:06 UTC - Nathan Mills: Any recommended settings for the backlog policy, since it lives at the namespace level? Just increase the limit to an arbitrarily large size? ---- 2020-08-21 19:23:46 UTC - Addison Higham: @Nathan Mills sorry, should keep that threaded - is there a default backlog quota set? ---- 2020-08-21 19:23:54 UTC - Addison Higham: a cluster-wide one, I mean ---- 2020-08-21 19:24:05 UTC - Nathan Mills: I think so, need to validate that though ---- 2020-08-21 19:27:00 UTC - Nathan Mills: yes, the one above is the default quota that was inherited ---- 2020-08-21 19:27:14 UTC - Nathan Mills: I guess I could just set the limit to `-1`? ---- 2020-08-21 19:27:21 UTC - Nathan Mills: for that namespace ---- 2020-08-21 19:35:13 UTC - Joe Selvi: @Joe Selvi has joined the channel ---- 2020-08-21 19:37:31 UTC - Addison Higham: IDK if -1 is a valid "unlimited" value; there is a `remove-backlog-quota` command, but I think that will just set you back to the cluster default ---- 2020-08-21 19:37:54 UTC - Nathan Mills: yeah, I tried that, but it looks like it just set it back to the cluster default. ---- 2020-08-21 19:38:07 UTC - Nathan Mills: I'll try a value larger than the topic size and see what happens ---- 2020-08-21 19:38:18 UTC - Addison Higham: your best bet may be an arbitrarily large number, but it is pretty crappy UX; there *might* be an issue/bug for this ---- 2020-08-21 19:38:59 UTC - Addison Higham: it is like there are 2 "empty" states; you need both an "unset, use default" and an "unlimited" ---- 2020-08-21 19:40:19 UTC - Nathan Mills: yeah, that would be nice. ---- 2020-08-21 19:40:47 UTC - Addison Higham: if you want to look and see if there is an issue for this or file one, that would be super great ---- 2020-08-21 20:13:59 UTC - Nathan Mills: thanks for the help, looks like `-1` does work to disable the backlog quota for a namespace. But we have a default TTL as well, and you aren't able to disable it at the namespace level, so I set it back to its max of 68 years, and created <https://github.com/apache/pulsar/issues/7875> ----
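For anyone reproducing this, the cursor movement can be watched programmatically rather than by eyeballing `stats-internal` output. A rough sketch, again with the Java admin client (placeholder URL and topic; field names assume a 2.6-era `PersistentTopicInternalStats`):
```
import java.util.Map;

import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.PersistentTopicInternalStats;
import org.apache.pulsar.common.policies.data.PersistentTopicInternalStats.CursorStats;

public class WatchCursor {
    public static void main(String[] args) throws Exception {
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080") // placeholder admin URL
                .build()) {
            String topic = "persistent://canvas-cdc/filtered/some-topic"; // placeholder

            // Poll the same numbers `pulsar-admin topics stats-internal` prints;
            // a forward jump of markDeletePosition while the consumer is stopped
            // is the signature of consumer_backlog_eviction kicking in.
            for (int i = 0; i < 5; i++) {
                PersistentTopicInternalStats stats = admin.topics().getInternalStats(topic);
                for (Map.Entry<String, CursorStats> e : stats.cursors.entrySet()) {
                    CursorStats c = e.getValue();
                    System.out.printf("%s markDelete=%s read=%s%n",
                            e.getKey(), c.markDeletePosition, c.readPosition);
                }
                Thread.sleep(60_000); // check once a minute
            }
        }
    }
}
```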
2020-08-21 20:24:26 UTC - Vil: I came across this on Twitter, written by the Confluent folks: <https://www.confluent.io/blog/kafka-fastest-messaging-system/> Any comments from us? The technical parts are a bit too deep for me. The only thing I can say is that it does look ‘fair’ to me - like they at least tried to make it fair. I am still a beginner, so I can't say whether the statements are true or not ---- 2020-08-21 20:27:33 UTC - Frank Kelly: Is there any documentation on the backwards-compatibility strategy for Apache Pulsar? E.g., is the expectation that a minor version upgrade will be backwards compatible - 2.6 client -> 2.5 server, or 2.5 client -> 2.6 server? Thanks in advance ---- 2020-08-21 20:39:31 UTC - Addison Higham: See <#C5Z1W2BDY|random>, where there has been some discussion about it; there are some issues with the way they configure the BookKeeper disks that make the test not a very apples-to-apples comparison +1 : Vil, Sijie Guo ---- 2020-08-21 20:40:03 UTC - Addison Higham: oh okay, cool ---- 2020-08-21 21:40:17 UTC - Vil: thanks for the pointer ---- 2020-08-21 22:09:20 UTC - Jorge Miralles: Hello, is there a way to delete messages that are acked or outside the retention limit? ----
