2018-01-16 19:11:30 UTC - Daniel Ferreira Jorge: do you see what I mean? 
basically, it is impossible to reprocess old data with pulsar using 
subscriptions (which is one of the strong features that Pulsar has over Kafka: 
being able to act as both a streaming platform and a queue)
----
2018-01-16 19:13:41 UTC - Daniel Ferreira Jorge: What is the lifecycle of a 
subscription? If I don't have any consumers consuming from it, will it still 
exist? Because then I can create a bootstrap consumer that will not consume 
anything just as an excuse to create a subscription. Will this work?
----
2018-01-16 19:15:09 UTC - Matteo Merli: > What is the lifecycle of a 
subscription? 

Once it’s created, it retains all messages published after that (minus explicit 
TTL). Subscriptions can be dropped by explicitly unsubscribing (in the 
`Consumer` API) or through the REST/CLI.
----
2018-01-16 19:16:03 UTC - Matteo Merli: > Because then I can create a 
bootstrap consumer that will not consume anything just as an excuse to create a 
subscription. Will this work? 

Yes, that would be the same as creating the subscription and leaving it there
----
2018-01-16 19:24:30 UTC - Daniel Ferreira Jorge: I think that works for me... 
thanks!
----
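For what it's worth, the bootstrap-consumer trick can be sketched with a small helper (a hypothetical name; it only assumes a client object with the Python client's `subscribe(topic, subscription_name)` shape, returning a consumer with `close()` and `unsubscribe()`):

```
def bootstrap_subscription(client, topic, subscription_name):
    """Create a durable subscription without consuming anything.

    Subscribing registers the subscription on the broker, so every
    message published afterwards is retained for it (minus TTL).
    """
    consumer = client.subscribe(topic, subscription_name)
    # close(), NOT unsubscribe(): closing leaves the subscription in
    # place, while unsubscribe() would drop it and its backlog
    consumer.close()
```

With the real Python client this would be called as something like `bootstrap_subscription(pulsar.Client('pulsar://localhost:6650'), 'my-topic', 'my-sub')`.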
2018-01-17 02:26:06 UTC - Jaebin Yoon: I'm getting the following errors in the 
broker. In the classpath, there is "pulsar-checksum-1.21.0-incubating.nar". I'm 
bringing up the Pulsar broker in an embedded environment with a simple wrapper. 
Is there something I need to set in the environment to use that package 
correctly? 

```2018-01-17 02:17:32,604 - WARN  - [pulsar-io-49-3:ServerCnx@163] - 
[/100.127.66.35:47937] Got exception: 
org/apache/pulsar/checksum/utils/Crc32cChecksum
java.lang.NoClassDefFoundError: org/apache/pulsar/checksum/utils/Crc32cChecksum
        at 
org.apache.pulsar.broker.service.Producer.verifyChecksum(Producer.java:144)
        at 
org.apache.pulsar.broker.service.Producer.publishMessage(Producer.java:124)
        at 
org.apache.pulsar.broker.service.ServerCnx.handleSend(ServerCnx.java:622)```
----
2018-01-17 02:29:31 UTC - Sijie Guo: ah
----
2018-01-17 02:30:43 UTC - Sijie Guo: @Jaebin Yoon I think the 
`pulsar-broker-shaded` is a bit problematic at the moment. Basically, it 
relocates `org.apache.pulsar.checksum` to a different package, but it doesn’t 
update the manifest in the nar package, so it fails to load the class.
----
2018-01-17 02:31:35 UTC - Jaebin Yoon: I'm not using pulsar-broker-shaded any 
more (had enough problems).
----
2018-01-17 02:31:43 UTC - Sijie Guo: oh okay
----
2018-01-17 02:32:03 UTC - Jaebin Yoon: but I'm still getting this error. Not 
sure if *.nar should be treated differently.
----
2018-01-17 02:32:12 UTC - Sijie Guo: one second let me check
----
2018-01-17 02:33:07 UTC - Sijie Guo: did you see pulsar-checksum.jar in your 
classpath?
----
2018-01-17 02:33:29 UTC - Jaebin Yoon: i see 
`pulsar-checksum-1.21.0-incubating.nar` not jar
----
2018-01-17 02:35:08 UTC - Sijie Guo: is your application a Maven project or 
not? Do you mind showing me how you include the pulsar-broker dependency in 
your application?
----
2018-01-17 02:36:32 UTC - Jaebin Yoon: it's Gradle. Here's part of the 
build.gradle:
```    compile 'org.apache.pulsar:pulsar-broker:1.21.0-incubating'
    compile 'org.apache.pulsar:pulsar-client-admin:1.21.0-incubating'
    compile 'org.apache.pulsar:pulsar-client-tools:1.21.0-incubating'```
----
2018-01-17 02:37:57 UTC - Jaebin Yoon: Does this *.nar have c++ binary?
----
2018-01-17 02:37:57 UTC - Sijie Guo: can you include 
`compile 'org.apache.pulsar:pulsar-checksum:1.21.0-incubating'`?
----
2018-01-17 02:38:21 UTC - Jaebin Yoon: actually I tried that, and it installed 
the nar package, not the jar.
----
2018-01-17 02:38:36 UTC - Sijie Guo: yes, pulsar-checksum uses nar to package a 
C++ implementation of CRC32C.
----
2018-01-17 02:38:46 UTC - Sijie Guo: 
`org.apache.pulsar:pulsar-checksum:jar:1.21.0-incubating`
----
2018-01-17 02:38:53 UTC - Sijie Guo: specify the `jar`?
----
2018-01-17 02:39:11 UTC - Jaebin Yoon: let me try that.
----
2018-01-17 02:41:19 UTC - Jaebin Yoon: I'm getting this error : 
```> Could not resolve all dependencies for configuration 
':compileClasspath'.
   > Could not find pulsar-checksum-1.21.0-incubating.jar 
(org.apache.pulsar:pulsar-checksum:1.21.0-incubating).
     Searched in the following locations:
         
<https://repo.maven.apache.org/maven2/org/apache/pulsar/pulsar-checksum/1.21.0-incubating/pulsar-checksum-1.21.0-incubating-1.21.0-incubating.jar>```
----
2018-01-17 02:41:53 UTC - Sijie Guo: oh, I probably wrote it the wrong way
----
2018-01-17 02:42:03 UTC - Sijie Guo: let me check gradle syntax (sorry about 
that)
----
2018-01-17 02:44:04 UTC - Sijie Guo: 
`org.apache.pulsar:pulsar-checksum:1.21.0-incubating@jar`?
----
2018-01-17 02:45:22 UTC - Jaebin Yoon: yeah, that resolves at least... let me 
try to package it to see if that selects the *.jar.
----
2018-01-17 02:45:32 UTC - Sijie Guo: cool
----
2018-01-17 02:48:18 UTC - Jaebin Yoon: I hadn't seen a *.nar package before, so 
I'm not sure why it installed the nar package and not the jar.
----
2018-01-17 02:51:55 UTC - Sijie Guo: I think pulsar-checksum is defined as a 
`nar` module, <https://search.maven.org/#search%7Cga%7C1%7Cpulsar-checksum>; 
both the nar and the jar are published to Maven. I guess in Gradle, if you 
don’t specify an ext/classifier, it downloads the artifact type defined in the 
pom file, so if you specify `jar` in the compile dependencies, it resolves the 
jar artifact. Maven seems to use a different strategy.
----
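Putting the thread together, a working set of Gradle dependencies would look something like this (versions as discussed above; the `@jar` artifact-only notation tells Gradle to fetch the jar instead of the nar declared in the pom):

```dependencies {
    compile 'org.apache.pulsar:pulsar-broker:1.21.0-incubating'
    compile 'org.apache.pulsar:pulsar-client-admin:1.21.0-incubating'
    compile 'org.apache.pulsar:pulsar-client-tools:1.21.0-incubating'
    // '@jar' forces the jar artifact; without it Gradle resolves the
    // nar packaging declared in the module's pom
    compile 'org.apache.pulsar:pulsar-checksum:1.21.0-incubating@jar'
}```

Note that Gradle's artifact-only `@jar` notation also skips the module's transitive dependencies, which may matter if the module declares any.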
2018-01-17 02:56:15 UTC - Jaebin Yoon: I see. I didn't know what a nar file is; 
I'm learning that now. ^^ 
<https://medium.com/hashmapinc/nifi-nar-files-explained-14113f7796fd>
Any reason that Pulsar packages it in nar format?
----
2018-01-17 02:58:39 UTC - Sijie Guo: @Jaebin Yoon I think NAR provides an easy 
way to package native code. I am not sure if we can make it a jar module. 
@Matteo Merli probably has more context on that.
----
2018-01-17 03:02:50 UTC - Jaebin Yoon: @Sijie Guo that solved the issue. Now I 
can produce/consume messages!!! That was the last piece. Thanks a lot, Sijie, 
for your help!
----
2018-01-17 03:03:20 UTC - Sijie Guo: awesome! glad to hear that.
----
2018-01-17 03:03:25 UTC - Sijie Guo: you are welcome
----
2018-01-17 16:07:37 UTC - Daniel Ferreira Jorge: Hi again! Are the kafka 
connectors compatible with pulsar using the kafka client wrapper?
----
2018-01-17 16:09:57 UTC - Matteo Merli: Some of them might be, some others not. 
It really depends on the connector implementation, because I've seen 
assumptions made in some of them, for example that the topic name does not 
include certain characters...
----
2018-01-17 16:10:49 UTC - Matteo Merli: I think that most would work, 
especially if not doing fancy things with the API
----
2018-01-17 16:12:32 UTC - Matteo Merli: In one case (the Elasticsearch sink), I 
had to fix a couple of bugs in the connector, e.g. a race condition that didn't 
show up with Kafka, since consumer assignment there takes a few seconds instead 
of the few ms it takes in Pulsar 
----
2018-01-17 16:20:55 UTC - Daniel Ferreira Jorge: Oh, ok, so each connector has 
to be tested... that is fine! Also, I would like to request something... When 
time permits, I think an option to reset the cursor to a point in time when 
the subscription is first created, before any messages have been consumed, 
would be great.
----
2018-01-17 16:24:56 UTC - Daniel Ferreira Jorge: Also, another question: is it 
possible to run bookkeeper with JBOD (meaning multiple ledgers/journal disks 
per bookie)?
----
2018-01-17 16:27:16 UTC - Matteo Merli: &gt;When time permits, I think that an 
option to reset the cursor to a point in time, when the subscription is first 
created, before any messages have been consumed yet, would be great.

Yes, that is something I heard more than once at this point 
:slightly_smiling_face:
----
2018-01-17 16:31:07 UTC - Matteo Merli: > Also, another question: is it 
possible to run bookkeeper with JBOD (meaning multiple ledger/journal disks 
per bookie)? 

Yes, though the default LedgerStorage that comes with Pulsar bookies 
(DbLedgerStorage, which stores indexes in RocksDB) doesn’t support it. (It’s a 
trivial thing to add, but since we always used it in combination with RAID, 
there was never the need.) Other BookKeeper ledger storage implementations 
(e.g. SortedLedgerStorage, which is the default in BookKeeper) support 
multiple directories for ledger storage.
----
2018-01-17 16:32:30 UTC - Matteo Merli: For the journal: yes, in any case you 
can use multiple disks. With fast SSDs, it’s also convenient to have multiple 
journals on the same disk, to take advantage of the disk’s multiple IO queues.
----
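As a sketch, the multi-disk layout Matteo describes would be configured in `bookkeeper.conf` along these lines. The directory paths are placeholders, and the multi-journal property requires a BookKeeper version that supports it; note the caveat above that DbLedgerStorage only handles a single ledger directory:

```# bookkeeper.conf -- illustrative JBOD layout, paths are placeholders

# Multiple journal devices (or multiple journals on one fast SSD)
journalDirectories=/mnt/journal-ssd-0,/mnt/journal-ssd-1

# A ledger storage implementation that supports multiple directories
ledgerStorageClass=org.apache.bookkeeper.bookie.SortedLedgerStorage

# One ledger directory per data disk (JBOD)
ledgerDirectories=/mnt/ledger-disk-0,/mnt/ledger-disk-1,/mnt/ledger-disk-2```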
2018-01-17 16:49:48 UTC - Daniel Ferreira Jorge: Ok, thanks for the info! 
Another feature that I've seen around and would be nice to have (in the long 
term... I'm not familiar with Pulsar's architecture, but I think this is 
non-trivial) is a second, cheaper storage layer. For use cases where messages 
are not deleted, the messages could be sent to, say, S3 / Cloud Storage / 
HDFS. Something like "--send-to-cold-storage-after-retention-expires". Then 
Pulsar could have an interface that serves messages directly from this storage 
in case we need to reprocess messages (this, I believe, would be non-trivial). 
Or maybe just a very simple producer that streams messages from the "cold 
storage" back into BookKeeper to be reprocessed...
----
2018-01-17 16:56:23 UTC - Matteo Merli: Oh, yes, you mentioned this earlier 
regarding Pravega. We have thought about it for a very long time now, but 
never got around to actually doing it. I think it’s mostly a prioritization 
thing. The change is non-trivial, but it shouldn’t be too hard to implement. 
It could even be done with a separate process that copies the data to S3/HDFS 
and then updates the metadata to reflect that the data is not in a BK ledger 
anymore. All the rest can remain the same (e.g. still referring to it by 
ledgerId/entryId), and this would be at the storage level, so it would require 
zero changes in the broker dispatcher code.
----
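The separate-process approach Matteo outlines can be sketched roughly as below. Every name here is hypothetical (there is no such API in Pulsar); the stores are stand-ins for BookKeeper, the object store, and the topic metadata, included only to show the shape of the copy-then-repoint idea:

```
def offload_expired_ledgers(ledger_store, object_store, metadata_store, topic):
    """Hypothetical sketch of tiered-storage offload.

    A separate process copies whole ledgers to cheap storage and
    rewrites the topic metadata so the (ledgerId, entryId) addressing
    stays valid; the broker dispatch path would be unchanged.
    """
    for ledger_id in metadata_store.expired_ledgers(topic):
        data = ledger_store.read_ledger(ledger_id)       # read from BookKeeper
        uri = object_store.put(topic, ledger_id, data)   # copy to S3/HDFS/etc.
        metadata_store.mark_offloaded(topic, ledger_id, uri)  # repoint metadata
        ledger_store.delete_ledger(ledger_id)            # reclaim BK space
```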
2018-01-17 17:05:30 UTC - Daniel Ferreira Jorge: Oh, yes, I also think this has 
very low priority... It can be done by other means... for instance, sending 
all the messages to BigQuery and back using Dataflow... that is how we do it 
with Cloud Pub/Sub to persist important topics...
----
2018-01-17 19:04:02 UTC - Allen Wang: Hey, question on routing mode for 
partitioned topics. What is the default configuration and what is used in the 
Kafka adaptor?
----
2018-01-17 19:06:23 UTC - Matteo Merli: The default is to use the hash of the 
key on a message. If the message has no key, the producer will use a “default” 
partition (it picks one random partition and uses it for all the messages it 
publishes). 

This is to maintain the same ordering guarantee as when there are no 
partitions: per-producer ordering.
----
2018-01-17 19:06:44 UTC - Matteo Merli: The same applies when using the Kafka 
wrapper
----
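The default routing Matteo describes can be sketched as plain logic. The class name is made up, and the real clients use their own hash function; `crc32` stands in here only because Python's built-in `hash()` of strings is salted per process:

```
import random
import zlib

class DefaultRouter:
    """Sketch of Pulsar's default partition routing.

    Messages with a key go to hash(key) % num_partitions, so all
    messages with the same key land on the same partition.  Messages
    without a key all go to one randomly chosen "default" partition,
    preserving per-producer ordering as on a non-partitioned topic.
    """

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        # picked once per producer and reused for every keyless message
        self.default_partition = random.randrange(num_partitions)

    def choose_partition(self, key=None):
        if key is not None:
            # stable hash (crc32 here; the real client uses its own)
            return zlib.crc32(key.encode("utf-8")) % self.num_partitions
        return self.default_partition
```

So two producers without keys may publish to different partitions, but each one individually keeps its messages ordered on a single partition.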
2018-01-17 19:08:45 UTC - Allen Wang: I see. Thanks.
----
