2019-01-02 12:55:24 UTC - Vincent Ngan: Any examples for me to write my own 
schemas?
----
2019-01-02 15:50:53 UTC - Sijie Guo: 
<https://github.com/apache/pulsar/tree/master/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema>

all the existing schemas are available under this directory.
----
2019-01-02 16:32:15 UTC - Emma Pollum: Can someone explain the difference 
between an unacked message and the backlog in pulsar?
----
2019-01-02 16:38:30 UTC - Sijie Guo: backlog is measured using the number of 
unacked messages
----
2019-01-02 16:54:31 UTC - Emma Pollum: Interesting. I have a function worker 
that has created a topic _compaction under public/functions that is showing 2 
unacked messages but no backlog for the subscription.
----
2019-01-02 17:04:59 UTC - Shalin: I ran some performance testing using 
`pulsar-perf produce my-topic`, it filled up my disk with ledgers - 
obviously(or not). I am trying to delete the ledgers using `pulsar-admin 
namespaces clear-backlog public/local/default`, which deletes the topic, but 
the journal and ledgers still remain. Is there anything else that prevents the 
ledgers from being deleted?
----
2019-01-02 18:11:08 UTC - Sijie Guo: can you share me the details how you have 
learned “2 unacked messages but no backlog”?
----
2019-01-02 18:43:51 UTC - Sijie Guo: I think a few things to be noted here:

- journal and ledgers are garbage collected lazily in the background. so it 
might not be reflected immediately.

to understand how data deletion is working, brokers deletes data by segments. 
only when the segments are deleted, the bookies will garbage collected their 
data.

so when you clear-backlog, it most likely only move cursors to latest. after 
moving the cursors, it will start deleting segments, the deletion will be 
honoring to the retention policy.

in order to troubleshoot this, I would suggest you using `pulsar-admin topics 
stats-internal` to list the internal stats of your `my-topic`.
----
2019-01-02 18:50:12 UTC - Shalin: Gotcha. But it says `my-topic` doesn't exist 
anymore after I ran `clear-backlog`
----
2019-01-02 18:51:10 UTC - Sijie Guo: okay. can you check the logs of bookies to 
see if it is doing garbage collection?
----
2019-01-02 18:57:15 UTC - Shalin: Nope. Nothing there.
----
2019-01-02 18:57:45 UTC - Sijie Guo: can you show me the latest few lines?
----
2019-01-02 18:59:58 UTC - Shalin: ```
Jan  2 18:59:25 ip-10-0-1-244 pulsar: at 
org.apache.bookkeeper.mledger.util.SafeRun$1.safeRun(SafeRun.java:32) 
[org.apache.pulsar-managed-ledger-original-2.2.0.jar:2.2.0]
Jan  2 18:59:25 ip-10-0-1-244 pulsar: at 
org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) 
[org.apache.bookkeeper-bookkeeper-common-4.7.2.jar:4.7.2]
```
----
2019-01-02 19:01:27 UTC - Shalin: ```
Jan  2 18:27:26 ip-10-0-1-244 pulsar: 18:27:26.547 
[GarbageCollectorThread-17-1] INFO  
org.apache.bookkeeper.bookie.GarbageCollectorThread - Enter minor compaction, 
suspendMinor false
Jan  2 18:27:26 ip-10-0-1-244 pulsar: 18:27:26.547 
[GarbageCollectorThread-17-1] INFO  
org.apache.bookkeeper.bookie.GarbageCollectorThread - Do compaction to compact 
those files lower than 0.2
Jan  2 18:27:26 ip-10-0-1-244 pulsar: 18:27:26.547 
[GarbageCollectorThread-17-1] INFO  
org.apache.bookkeeper.bookie.GarbageCollectorThread - Compaction: entry log 
usage buckets[10% 20% 30% 40% 50% 60% 70% 80% 90% 100%] = [0, 0, 0, 0, 0, 0, 0, 
0, 0, 0]
```
----
2019-01-02 19:02:01 UTC - Shalin: ^ Thats the last Garbage collector lines.
----
2019-01-02 19:02:22 UTC - Sijie Guo: entry log usage buckets[10% 20% 30% 40% 
50% 60% 70% 80% 90% 100%] = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

it seems that the compaction (garbage collection) is happening. is the logging 
before your test?
----
2019-01-02 19:02:40 UTC - Shalin: No after
----
2019-01-02 19:03:13 UTC - Sijie Guo: then the logs should have been garbage 
collected. did the disk usage went down?
----
2019-01-02 19:03:47 UTC - Shalin: No, its constant.
----
2019-01-02 19:04:18 UTC - Sijie Guo: can you

`ls data/bookkeeper/ledgers/current` and `ls data/bookkeeper/journal/current`
----
2019-01-02 19:05:10 UTC - Shalin: ```
total 974M
drwxr-xr-x. 4 root root  109 Jan  2 17:12 .
drwxr-xr-x. 3 root root   21 Jan  2 14:57 ..
-rw-r--r--. 1 root root 974M Jan  2 17:12 3.log
-rw-r--r--. 1 root root    0 Jan  2 15:53 4.log
-rw-r--r--. 1 root root    2 Jan  2 15:53 lastId
-rw-r--r--. 1 root root   16 Jan  2 17:12 lastMark
drwxr-xr-x. 2 root root  123 Jan  2 14:57 ledgers
drwxr-xr-x. 2 root root 4.0K Jan  2 18:02 locations
-rw-r--r--. 1 root root  162 Jan  2 14:57 VERSION
```
----
2019-01-02 19:05:43 UTC - Shalin: ```
total 4.1G
drwxr-xr-x. 2 root root   90 Jan  2 16:47 .
drwxr-xr-x. 3 root root   21 Jan  2 14:57 ..
-rw-r--r--. 1 root root 2.1G Jan  2 15:50 1680f11ba0f.txn
-rw-r--r--. 1 root root 2.1G Jan  2 16:47 1680f11ba10.txn
-rw-r--r--. 1 root root  64M Jan  2 16:47 1680f11ba11.txn
-rw-r--r--. 1 root root  162 Jan  2 14:57 VERSION
```
----
2019-01-02 19:07:46 UTC - Sijie Guo: okay -

the latest few journal files will be kept - you can tune the setting 
`journalMaxBackups` : 
<https://github.com/apache/pulsar/blob/master/conf/bookkeeper.conf#L293>

if you don’t want to keep those journal files
----
2019-01-02 19:09:02 UTC - Sijie Guo: in the ledgers directory, it seems that 
`3.log` was deleted : -rw-r--r--. 1 root root 974M Jan  2 17:12 3.log

can you search the log of bookie to see if there any logging message regarding 
`3.log`?
----
2019-01-02 19:11:07 UTC - Shalin: The last one :
```
Jan  2 16:00:18 ip-10-0-1-185 pulsar: 16:00:18.209 [db-storage-15-1] INFO  
org.apache.bookkeeper.bookie.EntryLogger - Flushing entry logger 3 back to 
filesystem, pending for syncing entry loggers : [BufferedChannel{logId=3, 
logFile=data/bookkeeper/ledgers/current/3.log}].
```
----
2019-01-02 19:12:52 UTC - Shalin: Ok, there seems to be something wrong with 
the functions I registered `ImportError: No module named _pulsar` - do you 
think that could throw off the garbage collection by the bookie. The bookie and 
broker are on the same node
----
2019-01-02 19:13:52 UTC - Shalin: Maybe signaling `bookie` that the ledgers can 
be deleted?
----
2019-01-02 19:14:03 UTC - Sijie Guo: well it should not be related. because 
these are different components. what bookie garbage collection cares is whether 
the ledgers are deleted or not.
----
2019-01-02 19:14:20 UTC - Sijie Guo: are you using one topic or multiple topics?
----
2019-01-02 19:14:29 UTC - Sijie Guo: can you make sure topics are deleted?
----
2019-01-02 19:15:03 UTC - Sijie Guo: by that mean, can you just run 
`bin/pulsar-admin namespaces topics &lt;namespace&gt;` to check if your topic 
is still there?
----
2019-01-02 19:16:16 UTC - Shalin: Yup. I ran `pulsar-perf` on `my-topic` and 
`public/default` is empty
----
2019-01-02 19:17:13 UTC - emersontres: @emersontres has joined the channel
----
2019-01-02 19:19:27 UTC - Sijie Guo: okay then can you restart the bookie? so 
the gc should be restarted in around 10 minutes, I would like to see if it 
garbage collects 3.log.
----
2019-01-02 19:22:11 UTC - Shalin: :thumbsup:
----
2019-01-02 19:28:32 UTC - Shalin: Is it just `bookieformat` and `initbookie`?
----
2019-01-02 19:29:20 UTC - Sijie Guo: no
----
2019-01-02 19:29:30 UTC - Sijie Guo: just restart the bookie process
----
2019-01-02 19:29:48 UTC - Sijie Guo: `bookieformat` will reformat the whole disk
----
2019-01-02 19:30:13 UTC - Shalin: Ah ok
----
2019-01-02 19:52:33 UTC - Shalin: I get an `InvalidCookieException` when 
restarting.
----
2019-01-02 19:52:41 UTC - Shalin: ```There are directories without a cookie, 
and this is neither a new environment, nor is storage expansion enabled. Empty 
directories are [data/bookkeeper/journal/current, 
data/bookkeeper/ledgers/current]
```
----
2019-01-02 19:53:12 UTC - Sijie Guo: are you running inside docker?
----
2019-01-02 19:53:21 UTC - Sijie Guo: it seems that your `data` directory is 
gone?
----
2019-01-02 19:57:15 UTC - Shalin: It seems be there
----
2019-01-02 19:57:36 UTC - Shalin: Ah, nevermind.
----
2019-01-02 19:57:43 UTC - Shalin: It was a path issue.
----
2019-01-02 19:57:57 UTC - Sijie Guo: ok
----
2019-01-02 20:26:15 UTC - Shalin: Still there.
----
2019-01-02 21:13:26 UTC - Emma Pollum: I've been looking through the  
prometheus metrics
----
2019-01-02 21:16:35 UTC - Sijie Guo: Can you share the bookie log file, so I 
can exam later?
----
2019-01-02 21:18:04 UTC - Sijie Guo: Can you try run “pulsar-admin topics 
stats-internal” to get details of that topic?
----
2019-01-02 22:55:55 UTC - Emma Pollum: I've already processed the messages, and 
have not had it reproduce. Is it a 1:1 relationship backlog:unacked ?
----
2019-01-03 02:38:58 UTC - Vincent Ngan: Thanks
----
2019-01-03 03:04:42 UTC - Sijie Guo: No. Unack is the messages have been 
delivered and no acked; the backlog is both unacked and undelivered.
----
2019-01-03 03:05:59 UTC - Sijie Guo: So typically if there are unacked 
messages, backlog should not be zero. Unless there something not accurate 
regarding metrics collection 
----

Reply via email to