Slack digest for #general - 2018-02-23

Apache Pulsar Slack Fri, 23 Feb 2018 09:20:03 -0800

2018-02-22 19:16:18 UTC - Sijie Guo: @Masakazu Kitajo: I think @Matteo Merli
set up a Jenkins job for the Slack digest, so if you have Jenkins access,
you can access that
----
2018-02-22 19:17:09 UTC - Matteo Merli: Yes, I replied on the mailing list.. it 
was my mistake on the cron schedule…
----
2018-02-22 19:17:20 UTC - Matteo Merli: it went off every min for 1h
----
2018-02-22 19:18:27 UTC - Sijie Guo: oh i see
----
2018-02-22 19:25:14 UTC - Sijie Guo: @SansWord Huang sorry for late response. 
just saw your replies.

- you don’t need large capacity for the journal disk, but it is critical to
latency. you probably want an hdd with a battery-backup-unit or an ssd, since it
is doing fsyncs. ledger disks are basically the place where the data is
eventually stored, so you need to calculate their size based on how much data
you are going to store.

> data first written into journal disk then “flush” into ledger storage?

yes, data is first written to the journal disk and the write is acknowledged
once the data is fsynced to the journal disk. the data is then asynchronously
indexed and flushed to ledger storage.

> when will data be rebalanced?

if you are using bookkeeper for a log/messaging workload, you typically don’t
need data rebalancing, because as new ledgers are created, old ledgers are
deleted once their data has expired under the retention policy.

however if you are using bookkeeper for long term storage, you might need some
sort of data rebalance. There was a BP (bookkeeper proposal) in bookkeeper for
that purpose. but you can still do it using autorecovery to rebalance the data
manually.

so this topic depends on what you are using pulsar (bookkeeper) for.

> how do I know my data is already replicated?

if a ledger is underreplicated, it will be listed in zookeeper under an
`underreplicated` znode. there are bookkeeper CLI commands and metrics for that
as well.
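
As a rough illustration of the ledger-disk sizing mentioned above, here is a
minimal sketch. It is not from the thread: the function name, the parameters,
the 30% headroom, and the example numbers are all hypothetical, and it assumes
data is spread evenly across bookies and kept for a fixed retention period.

```python
def ledger_disk_bytes_per_bookie(msgs_per_sec, avg_msg_bytes, retention_secs,
                                 write_quorum, num_bookies, headroom=0.3):
    """Rough per-bookie ledger-disk estimate (illustrative assumption only)."""
    # Bytes produced over the whole retention window, before replication.
    raw = msgs_per_sec * avg_msg_bytes * retention_secs
    # Every entry is stored on `write_quorum` bookies.
    replicated = raw * write_quorum
    # Assume an even spread across bookies, plus some free-space headroom.
    return replicated / num_bookies * (1 + headroom)


# Hypothetical workload: 10k msgs/s of 1 kB each, kept for one day,
# two copies of every entry, four bookies.
estimate = ledger_disk_bytes_per_bookie(
    msgs_per_sec=10_000, avg_msg_bytes=1_000, retention_secs=24 * 3600,
    write_quorum=2, num_bookies=4)
print(f"{estimate / 1e9:.1f} GB per bookie")
```

The journal disk is deliberately left out of the estimate, since it needs low
fsync latency rather than capacity.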
----
2018-02-22 19:57:02 UTC - Karthik Palanivelu: @Sijie Guo I cannot use the
deployment model directly from source code. ASG is an autoscaling group, which
brings up a new instance in case of a node failure. In the ZK case, if a node
fails, a new ZK node comes up with a different IP. That IP then needs to be
updated in Pulsar in place of the old one. To avoid this, AWS allows us to
generate static IPs. We can use these IPs for ZK so that we can hard-code them
in Pulsar. In this scenario, if a ZK node fails, the new ZK node comes up with
an assigned IP. Is there a better way to handle this scenario?
----
2018-02-22 20:00:55 UTC - Matteo Merli: @Karthikeyan Palanivelu If you’re using
ASG with ZK nodes, you could also assign DNS names to each ZK server. That way
there’s no need to change the configuration in other ZK ensemble members when a
node is replaced with one with a different IP
----
2018-02-22 20:57:02 UTC - Karthik Palanivelu: @Matteo Merli we have a separate
system to manage DNS, which adds one more point of failure.
----
2018-02-22 22:54:41 UTC - Matteo Merli: Sure, I was thinking more of the AWS
managed DNS
----
2018-02-23 06:14:58 UTC - SansWord Huang: @Sijie Guo Thanks for all the answers,
they help me a lot in understanding how Pulsar works.
----
2018-02-23 06:20:43 UTC - SansWord Huang: @SansWord Huang uploaded a file:
<https://apache-pulsar.slack.com/files/U9CDBEH1P/F9D3G828H/bookie_restart_error|bookie_restart_error>
and commented: In an experiment today I put too many messages into Pulsar and
the bookie nodes shut down.
After extending the storage they use, I tried to restart all bookies.
And here come two problems:
1. I've skipped all messages using pulsar-admin; when will disk space be
released?
2. one of my bookie nodes cannot restart, with the following error message. what
can I do?
----
2018-02-23 07:00:43 UTC - SansWord Huang: @SansWord Huang commented on
@SansWord Huang’s file
<https://apache-pulsar.slack.com/files/U9CDBEH1P/F9D3G828H/bookie_restart_error|bookie_restart_error>:
For the first question: once I produced more messages, old ledgers were
deleted and disk space was released.
----
2018-02-23 07:01:28 UTC - SansWord Huang: @SansWord Huang commented on
@SansWord Huang’s file
<https://apache-pulsar.slack.com/files/U9CDBEH1P/F9D3G828H/bookie_restart_error|bookie_restart_error>:
For the second, I still don’t know why, but I decided to delete this node’s
journal and ledgers and start again.
----
2018-02-23 07:18:56 UTC - Sijie Guo: @SansWord Huang sorry for late response.
just saw the message now.
----
2018-02-23 07:19:55 UTC - Sijie Guo: for the first question 1) the ledgers are
deleted when new ledgers are rolled. new ledgers are rolled based on time or
size. so if you produce new messages, it will trigger ledger rolling, which
will then delete ledgers that are already skipped.
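
The behaviour described above can be sketched with a toy model. This is not
Pulsar's or BookKeeper's implementation: the class `ToyTopic`, its methods, and
the size-only rollover rule are hypothetical simplifications, and time-based
rolling is ignored. It only shows why skipped data stays on disk until a later
publish triggers a rollover.

```python
class ToyTopic:
    """Toy model: ledgers are trimmed only when a new ledger is rolled."""

    def __init__(self, max_entries_per_ledger):
        self.max_entries = max_entries_per_ledger
        self.ledgers = [[]]   # lists of entry indexes; the last ledger is open
        self.next_entry = 0
        self.cursor = 0       # entries below this index are consumed or skipped

    def publish(self):
        # Size-based rollover happens on the publish path.
        if len(self.ledgers[-1]) >= self.max_entries:
            self._roll()
        self.ledgers[-1].append(self.next_entry)
        self.next_entry += 1

    def skip_all(self):
        # Moving the cursor does not delete anything by itself.
        self.cursor = self.next_entry

    def _roll(self):
        closed = self.ledgers
        # Drop closed ledgers whose last entry is already behind the cursor,
        # then open a fresh ledger.
        self.ledgers = [l for l in closed if l[-1] >= self.cursor] + [[]]


topic = ToyTopic(max_entries_per_ledger=3)
for _ in range(7):
    topic.publish()
before_skip = len(topic.ledgers)
topic.skip_all()
after_skip = len(topic.ledgers)      # unchanged: no rollover happened yet
for _ in range(3):
    topic.publish()                  # the third publish rolls a new ledger
after_publish = len(topic.ledgers)
print(before_skip, after_skip, after_publish)
```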
----
2018-02-23 07:21:18 UTC - Sijie Guo: for the second question 2) it seems that
while replaying the journal, it tried to replay the entries and encountered
issues inserting those entries. I am wondering if your disks were full at
that time?
----
2018-02-23 08:11:50 UTC - SansWord Huang: yes, I’ve noticed that even after I
expanded the disk, it was not enough for the journal to replay.
so the quickest way was to delete the data and restart this bookkeeper node.

but the lessons I learned are:
1. I should really use separate disks for journal and ledgers.
2. if not doing so, I should reserve some space for ledgers so that replay is
still possible while the journal is growing.
----
2018-02-23 08:18:47 UTC - Sijie Guo: yeah i see
----
2018-02-23 10:43:31 UTC - Till Rathschlag: @Till Rathschlag has joined the
channel
----
2018-02-23 10:55:27 UTC - Till Rathschlag: Hello everybody, I'm currently
evaluating pulsar and I'm trying to understand if it fits the following use
case: I'd like to use pulsar (among others) as a task queue. I want my task
producer to generate only as many jobs as the consumers can work on, so I need
some kind of communication consumers -> producer. I tried to build this with
acknowledgements but noticed that acks are only propagated to pulsar and not
back to the producer. So my question is, how would I do this? I thought about
the following:
- Provide some other topic for job acknowledgement
- Monitor the ack-ratio from the producer service
Is pulsar the right tool for this? I would be glad if someone can share their
experience with this. Thanks in advance!
----
2018-02-23 16:58:33 UTC - Matteo Merli: @Till Rathschlag The primary function
of a messaging system is to decouple the producers and the consumers, and that’s
why we don’t have correlation of consumer acks to the producer
:slightly_smiling_face:

However, if you’re not requiring exact precision, you can try using backlog
quota to stop the producer.
You can configure a very low quota (eg: 10MB or 1MB…) and the default action is
to block the producers when the consumers accumulate that amount of “backlog”
in the queue.

I’m saying it’s not precise because the check for quota is only done
periodically in background (every 1min by default I think) for efficiency
reasons, so a user can go a bit over quota before getting stopped.

If you need finer control, you could use a 2nd topic. For example:
* Consumer gets a message, processes it
* Consumer sends confirmation on the 2nd topic (referring to a particular
msgId for 1st topic)
* Consumer acks the message

Producer can do a kind of “semaphore” limiting the number of “in-processing”
messages, by waiting for confirmations on the 2nd topic. This could work even
if there are multiple producers, because you can ignore msg Ids that were
published by other producers
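
The “semaphore” idea above can be sketched without a broker. This is a minimal
simulation and not Pulsar client code: two in-memory `queue.Queue` objects
stand in for the two topics, the function `run` and all of its parameters are
hypothetical, and message ids are chosen by the producer here, whereas a real
broker assigns them on publish.

```python
import queue
import threading


def run(num_jobs, max_in_flight, num_consumers=2):
    jobs = queue.Queue()            # stands in for the 1st (task) topic
    confirmations = queue.Queue()   # stands in for the 2nd (confirmation) topic
    permits = threading.Semaphore(max_in_flight)
    lock = threading.Lock()
    my_ids = set()                  # ids published by this producer, unconfirmed
    state = {"outstanding": 0, "peak": 0, "confirmed": 0}

    def consumer():
        while True:
            msg_id = jobs.get()
            if msg_id is None:      # shutdown marker
                return
            # ... process the job here ...
            confirmations.put(msg_id)
            # a real consumer would ack the message on the 1st topic here

    def listener():
        while True:
            msg_id = confirmations.get()
            if msg_id is None:
                return
            with lock:
                if msg_id not in my_ids:
                    continue        # published by another producer: ignore
                my_ids.discard(msg_id)
                state["outstanding"] -= 1
                state["confirmed"] += 1
            permits.release()       # frees one "in-processing" slot

    # A confirmation for a message this producer never sent.
    confirmations.put("other-producer-1")

    threads = [threading.Thread(target=consumer) for _ in range(num_consumers)]
    threads.append(threading.Thread(target=listener))
    for t in threads:
        t.start()

    for i in range(num_jobs):
        permits.acquire()           # blocks while max_in_flight jobs are pending
        msg_id = f"job-{i}"
        with lock:
            my_ids.add(msg_id)
            state["outstanding"] += 1
            state["peak"] = max(state["peak"], state["outstanding"])
        jobs.put(msg_id)

    for _ in range(max_in_flight):
        permits.acquire()           # wait until every job has been confirmed
    for _ in range(num_consumers):
        jobs.put(None)
    confirmations.put(None)
    for t in threads:
        t.join()
    return state["confirmed"], state["peak"]


confirmed, peak = run(num_jobs=20, max_in_flight=3)
print(confirmed, peak)
```

The producer never has more than `max_in_flight` unconfirmed jobs, regardless
of how many consumers there are or how fast they work.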
----
