2020-08-21 11:22:06 UTC - Takahiro Hozumi: Is it possible to use an Avro schema without code generation in Pulsar? I have an Avro schema as a JSON file and want to create a Pulsar message with `org.apache.avro.generic.GenericRecord`, which uses the schema. It seems that the Pulsar producer requires a POJO generated from the schema.
<http://pulsar.apache.org/docs/en/schema-understand/> <https://github.com/sijie/pulsar-avro-schema-example/blob/f85c6e1a83b47fe5017840e35d6989e6e153aa4f/src/main/java/org/apache/pulsar/examples/TweetProducer.java#L22> ---- 2020-08-21 11:25:33 UTC - Joshua Decosta: That seems like a standard process. It’s the same way you would produce or consume any message. ---- 2020-08-21 12:13:12 UTC - Aaron: You can produce messages with a Producer of type GenericRecord ----
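What Aaron describes uses Pulsar's own `org.apache.pulsar.client.api.schema.GenericRecord` (not Avro's), built from the Avro JSON definition at runtime, so no generated POJO is needed. A minimal sketch, assuming a 2.6-era Java client; the service URL, topic, and schema definition are illustrative, and `SchemaInfo.builder()` may look different in other client versions:
```
import static java.nio.charset.StandardCharsets.UTF_8;

import java.util.Collections;

import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;
import org.apache.pulsar.client.api.schema.GenericRecord;
import org.apache.pulsar.client.api.schema.GenericSchema;
import org.apache.pulsar.common.schema.SchemaInfo;
import org.apache.pulsar.common.schema.SchemaType;

public class GenericAvroProducer {
    public static void main(String[] args) throws Exception {
        // The Avro definition as JSON; in practice this would be read from the file.
        String avroJson = "{\"type\":\"record\",\"name\":\"Tweet\",\"fields\":["
                + "{\"name\":\"id\",\"type\":\"long\"},"
                + "{\"name\":\"text\",\"type\":\"string\"}]}";

        // Wrap the raw Avro definition in a Pulsar SchemaInfo of type AVRO.
        SchemaInfo info = SchemaInfo.builder()
                .name("Tweet")
                .type(SchemaType.AVRO)
                .schema(avroJson.getBytes(UTF_8))
                .properties(Collections.emptyMap())
                .build();

        // Schema.generic() yields a schema whose records are Pulsar
        // GenericRecords, so no generated POJO is involved.
        GenericSchema<GenericRecord> schema = Schema.generic(info);

        try (PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // illustrative URL
                .build()) {
            Producer<GenericRecord> producer = client.newProducer(schema)
                    .topic("persistent://public/default/tweets") // illustrative topic
                    .create();

            GenericRecord record = schema.newRecordBuilder()
                    .set("id", 1L)
                    .set("text", "hello pulsar")
                    .build();
            producer.send(record);
            producer.close();
        }
    }
}
```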
2020-08-21 13:41:01 UTC - Raghav: What is the use of the command `./bookkeeper shell localconsistencycheck`? In my cluster with 3 bookies and E, Qw, Qa set to (3, 3, 3), the simpletest works fine: `./bookkeeper shell simpletest -ensemble 3 -writeQuorum 3 -ackQuorum 3 -numEntries 100`. But localconsistencycheck fails with this exception on all 3 boxes:
```Exception in thread "main" com.google.common.util.concurrent.UncheckedExecutionException: Error open RocksDB database
	at org.apache.bookkeeper.tools.cli.commands.bookie.LocalConsistencyCheckCommand.apply(LocalConsistencyCheckCommand.java:56)
	at org.apache.bookkeeper.bookie.BookieShell$LocalConsistencyCheck.runCmd(BookieShell.java:787)
	at org.apache.bookkeeper.bookie.BookieShell$MyCommand.runCmd(BookieShell.java:223)
	at org.apache.bookkeeper.bookie.BookieShell.run(BookieShell.java:1976)
	at org.apache.bookkeeper.bookie.BookieShell.main(BookieShell.java:2067)
Caused by: java.io.IOException: Error open RocksDB database
	at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.<init>(KeyValueStorageRocksDB.java:182)
	at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.<init>(KeyValueStorageRocksDB.java:83)
	at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.lambda$static$0(KeyValueStorageRocksDB.java:58)
	at org.apache.bookkeeper.bookie.storage.ldb.LedgerMetadataIndex.<init>(LedgerMetadataIndex.java:69)
	at org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage.<init>(SingleDirectoryDbLedgerStorage.java:161)
	at org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorage.newSingleDirectoryDbLedgerStorage(DbLedgerStorage.java:149)
	at org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorage.initialize(DbLedgerStorage.java:129)
	at org.apache.bookkeeper.bookie.Bookie.mountLedgerStorageOffline(Bookie.java:657)
	at org.apache.bookkeeper.tools.cli.commands.bookie.LocalConsistencyCheckCommand.check(LocalConsistencyCheckCommand.java:63)
	at org.apache.bookkeeper.tools.cli.commands.bookie.LocalConsistencyCheckCommand.apply(LocalConsistencyCheckCommand.java:54)
	... 4 more
Caused by: org.rocksdb.RocksDBException: While lock file: /var/pulsar/bookie/ledger1/data-1/current/ledgers/LOCK: Resource temporarily unavailable
	at org.rocksdb.RocksDB.open(Native Method)
	at org.rocksdb.RocksDB.open(RocksDB.java:231)
	at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.<init>(KeyValueStorageRocksDB.java:179)
	... 13 more``` ---- 2020-08-21 14:52:30 UTC - Addison Higham: The biggest one I can think of is schemas; if you aren't using schemas, then you wouldn't need to worry ---- 2020-08-21 14:54:24 UTC - Addison Higham: That version was built manually and should have included rc1 in the tag. Since rc1 passed, it is pretty much the same, just not the official version ---- 2020-08-21 15:00:36 UTC - Lari Hotari: Thanks. It looks like <https://hub.docker.com/r/apachepulsar/pulsar/tags?page=1&name=2.6.1|the official 2.6.1 image is now available>, so I'll use that one. ---- 2020-08-21 15:32:27 UTC - Frank Kelly: FYI, the documentation says the Python client for 2.6.1 is available <https://pulsar.apache.org/docs/en/client-libraries-python/#install-using-pip>, but I see the following:
```$ pip3 install pulsar-client==2.6.1
ERROR: Could not find a version that satisfies the requirement pulsar-client==2.6.1 (from versions: 2.1.0, 2.1.1, 2.2.0, 2.2.1, 2.3.0, 2.3.0.post1, 2.3.1, 2.3.2, 2.4.0, 2.4.1, 2.4.1.post1, 2.4.2, 2.5.0, 2.5.1, 2.5.2, 2.6.0)
ERROR: No matching distribution found for pulsar-client==2.6.1``` ---- 2020-08-21 15:36:48 UTC - Matt Mitchell: Anyone know what might cause this?
```Caused by: org.apache.pulsar.client.api.PulsarClientException$IncompatibleSchemaException: Topic does not have schema to check
	at org.apache.pulsar.client.impl.ClientCnx.getPulsarClientException(ClientCnx.java:1000)
	at org.apache.pulsar.client.impl.ClientCnx.handleError(ClientCnx.java:609)
	at org.apache.pulsar.common.protocol.PulsarDecoder.channelRead(PulsarDecoder.java:171)``` ---- 2020-08-21 15:48:37 UTC - Matt Mitchell: I have several services running, and I’m thinking one of them is using an older version of the client code, which contains the schema (protobuf format) - is that potentially the cause of this error? ---- 2020-08-21 15:51:04 UTC - Addison Higham: do you have the full stacktrace? ---- 2020-08-21 15:51:19 UTC - Addison Higham: err, actually, better yet: are there logs from the broker? ---- 2020-08-21 15:54:10 UTC - Matt Mitchell: checking ----
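While the broker logs are the authoritative place to look, one quick check for an `IncompatibleSchemaException` like this is whether the topic has any schema registered at all. A hedged sketch with the Java admin client, assuming a 2.6-era release; the admin URL and topic name are placeholders:
```
import static java.nio.charset.StandardCharsets.UTF_8;

import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.client.admin.PulsarAdminException;
import org.apache.pulsar.common.schema.SchemaInfo;

public class CheckTopicSchema {
    public static void main(String[] args) throws Exception {
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080") // placeholder admin URL
                .build()) {
            String topic = "persistent://public/default/my-topic"; // placeholder topic
            try {
                // Fetch whatever schema the broker currently has for the topic.
                SchemaInfo info = admin.schemas().getSchemaInfo(topic);
                System.out.println("type: " + info.getType());
                System.out.println("definition: " + new String(info.getSchema(), UTF_8));
            } catch (PulsarAdminException.NotFoundException e) {
                // Lines up with the client-side "Topic does not have schema to
                // check": nothing is registered broker-side for this topic.
                System.out.println("no schema registered for " + topic);
            }
        }
    }
}
```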
2020-08-21 16:48:15 UTC - Nathan Mills: Just bumping this to see if anyone can provide some clarity for me: <https://apache-pulsar.slack.com/archives/C5Z4T36F7/p1597961001029900> ---- 2020-08-21 17:23:48 UTC - Addison Higham: does it allow you to set it up that way? Backlog quotas are usually thought of as a "subset" of your retention policy, but with infinite retention it may still make sense to have a limit on the size of a subscription. Stepping back a bit, it is important to remember the distinction between messages in and out of a subscription. A retention policy applies to messages NOT in a subscription; backlog quotas and TTL only apply to messages IN a subscription. I like to think of subscriptions as a "view" over all the messages, with each subscription having its own view over the same underlying data. The backlog quota and TTL let you place constraints on how long a message is visible in that view, but the retention policy is what determines how long the data remains in the underlying storage (see the sketch after this exchange). ---- 2020-08-21 17:24:19 UTC - Addison Higham: so, more concretely: if messages are evicted from your subscription, they will no longer be visible in your subscription, but they remain in the underlying storage ---- 2020-08-21 17:28:48 UTC - Nathan Mills: ok, just to make sure I understand correctly: with `consumer_backlog_eviction` the messages still get written to the topic, just removed from subscriptions that have exceeded the backlog quota, but if someone uses the `producer_exception` policy, then if any of the subscriptions exceeds the backlog quota it will cause the producer to disconnect? ---- 2020-08-21 17:29:11 UTC - Addison Higham: yes ---- 2020-08-21 17:29:20 UTC - Nathan Mills: :thumbsup: thanks ----
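To make the three knobs concrete, here is a minimal sketch of setting them at the namespace level with the Java admin client. The namespace, URL, and values are illustrative, and the two-argument `BacklogQuota` constructor assumes a 2.6-era API:
```
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.BacklogQuota;
import org.apache.pulsar.common.policies.data.RetentionPolicies;

public class NamespacePolicies {
    public static void main(String[] args) throws Exception {
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080") // illustrative admin URL
                .build()) {
            String ns = "my-tenant/my-namespace"; // illustrative namespace

            // Retention: how long messages stay in storage once they are NOT
            // in any subscription (acked everywhere). -1/-1 means infinite.
            admin.namespaces().setRetention(ns,
                    new RetentionPolicies(7 * 24 * 60 /* minutes */, 51200 /* MB */));

            // Backlog quota: a cap on messages still IN a subscription's view.
            // 5 GiB here, evicting from slow subscriptions when exceeded.
            admin.namespaces().setBacklogQuota(ns,
                    new BacklogQuota(5L * 1024 * 1024 * 1024,
                            BacklogQuota.RetentionPolicy.consumer_backlog_eviction));

            // TTL: unacked messages older than this are acked automatically,
            // i.e. they drop out of the subscription view (not out of storage).
            admin.namespaces().setNamespaceMessageTTL(ns, 24 * 3600 /* seconds */);
        }
    }
}
```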
2020-08-21 19:18:54 UTC - Nathan Mills: So here is what I'm trying to figure out: I'm investigating reports of missing messages. I created a function, stopped the function, and reset the cursor for the input topic to before it was created. When I look at the internal stats, initially I get
```"canvas-cdc%2Ffiltered%2Fcdc-filter-96": {
    "markDeletePosition": "8367455:-1",
    "readPosition": "8367455:0",
    "waitingReadOp": false,
    "pendingReadOps": 0,
    "messagesConsumedCounter": -16267653,
    "cursorLedger": 9274512,
    "cursorLedgerLastEntry": 1,
    "individuallyDeletedMessages": "[]",
    "lastLedgerSwitchTimestamp": "2020-08-21T19:10:31.355Z",
    "state": "Open",
    "numberOfEntriesSinceFirstNotAckedMessage": 1,
    "totalNonContiguousDeletedMessagesRange": 0,
    "properties": {}
}```
After a couple of minutes, with the function still stopped, the internal stats for the cursor look like:
```"canvas-cdc%2Ffiltered%2Fcdc-filter-96": {
    "markDeletePosition": "9216422:3654",
    "readPosition": "9216422:3655",
    "waitingReadOp": false,
    "pendingReadOps": 0,
    "messagesConsumedCounter": 2189250,
    "cursorLedger": 9274512,
    "cursorLedgerLastEntry": 2,
    "individuallyDeletedMessages": "[]",
    "lastLedgerSwitchTimestamp": "2020-08-21T19:10:31.355Z",
    "state": "Open",
    "numberOfEntriesSinceFirstNotAckedMessage": 1,
    "totalNonContiguousDeletedMessagesRange": 0,
    "properties": {}
}```
The read position seems to jump forward without the function running. Would this be caused by the backlog quota policy, which is currently
```{
  "destination_storage" : {
    "limit" : 5368709120,
    "policy" : "consumer_backlog_eviction"
  }
}``` ---- 2020-08-21 19:21:57 UTC - Addison Higham: yes, that is what would be expected ---- 2020-08-21 19:23:06 UTC - Nathan Mills: Any recommended settings for the backlog policy, since it lives at the namespace level? Just increase the limit to an arbitrarily large size? ---- 2020-08-21 19:23:46 UTC - Addison Higham: @Nathan Mills sorry, should keep that threaded - is there a default backlog quota set? ---- 2020-08-21 19:23:54 UTC - Addison Higham: a cluster-wide one, I mean ---- 2020-08-21 19:24:05 UTC - Nathan Mills: I think so, need to validate that though ---- 2020-08-21 19:27:00 UTC - Nathan Mills: yes, the one above is the default quota that was inherited ---- 2020-08-21 19:27:14 UTC - Nathan Mills: I guess I could just set the limit to `-1`? ---- 2020-08-21 19:27:21 UTC - Nathan Mills: for that namespace ---- 2020-08-21 19:35:13 UTC - Joe Selvi: @Joe Selvi has joined the channel ---- 2020-08-21 19:37:31 UTC - Addison Higham: IDK if -1 is a valid "unlimited" value; there is a `remove-backlog-quota` command, but I think that will just set you back to the cluster default ---- 2020-08-21 19:37:54 UTC - Nathan Mills: yeah, I tried that, but it looks like it just set it back to the cluster default. ---- 2020-08-21 19:38:07 UTC - Nathan Mills: I'll try a value larger than the topic size and see what happens ---- 2020-08-21 19:38:18 UTC - Addison Higham: your best bet may be an arbitrarily large number, but it is pretty crappy UX; there *might* be an issue/bug for this ---- 2020-08-21 19:38:59 UTC - Addison Higham: it is like there are 2 "empty" states; you need both an "unset, use default" and an "unlimited" ---- 2020-08-21 19:40:19 UTC - Nathan Mills: yeah, that would be nice. ---- 2020-08-21 19:40:47 UTC - Addison Higham: if you want to look and see if there is an issue for this or file one, that would be super great ---- 2020-08-21 20:13:59 UTC - Nathan Mills: thanks for the help, looks like `-1` does work to disable the backlog quota for a namespace. But we have a default TTL as well, and you aren't able to disable it at the namespace level, so I set it back to its max of 68 years, and created <https://github.com/apache/pulsar/issues/7875> ----
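For anyone reproducing this, the cursor movement can be watched programmatically rather than by eyeballing `stats-internal` output. A rough sketch, again with the Java admin client (placeholder URL and topic; field names assume a 2.6-era `PersistentTopicInternalStats`):
```
import java.util.Map;

import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.PersistentTopicInternalStats;
import org.apache.pulsar.common.policies.data.PersistentTopicInternalStats.CursorStats;

public class WatchCursor {
    public static void main(String[] args) throws Exception {
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080") // placeholder admin URL
                .build()) {
            String topic = "persistent://canvas-cdc/filtered/some-topic"; // placeholder

            // Poll the same numbers `pulsar-admin topics stats-internal` prints;
            // a forward jump of markDeletePosition while the consumer is stopped
            // is the signature of consumer_backlog_eviction kicking in.
            for (int i = 0; i < 5; i++) {
                PersistentTopicInternalStats stats = admin.topics().getInternalStats(topic);
                for (Map.Entry<String, CursorStats> e : stats.cursors.entrySet()) {
                    CursorStats c = e.getValue();
                    System.out.printf("%s markDelete=%s read=%s%n",
                            e.getKey(), c.markDeletePosition, c.readPosition);
                }
                Thread.sleep(60_000); // check once a minute
            }
        }
    }
}
```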
2020-08-21 20:24:26 UTC - Vil: I came across this on Twitter, written by the Confluent folks: <https://www.confluent.io/blog/kafka-fastest-messaging-system/> Any comments from us? The technical parts are a bit too deep for me. The only thing I can say is that it does look ‘fair’ to me - like they at least tried to make it fair. I am still a beginner, so I can't say whether the statements are true or not ---- 2020-08-21 20:27:33 UTC - Frank Kelly: Is there any documentation on the backwards-compatibility strategy for Apache Pulsar? E.g., is the expectation that a minor version upgrade will be backwards compatible - 2.6 client -> 2.5 server, or 2.5 client -> 2.6 server? Thanks in advance ---- 2020-08-21 20:39:31 UTC - Addison Higham: See <#C5Z1W2BDY|random>, where there has been some discussion about it; there are some issues with the way they configure the BookKeeper disks that make the test not a very apples-to-apples comparison +1 : Vil, Sijie Guo ---- 2020-08-21 20:40:03 UTC - Addison Higham: oh okay, cool ---- 2020-08-21 21:40:17 UTC - Vil: thanks for the pointer ---- 2020-08-21 22:09:20 UTC - Jorge Miralles: Hello, is there a way to delete messages that are acked or outside the retention limit? ----
