Hi all,

It works: the DSGC removes the blobs flawlessly.

The key was to have at least one referenced blob before deleting the unreferenced ones.

Thanks for your guidance,

Ruben Lozano

On 19/07/2019 at 10:01, Amit Jain wrote:
Hi,

The 2nd case is where version GC is needed; it would delete the older
revisions from the node store.

The 3rd case is a bit of a problem. DSGC takes a paranoid approach: if
there are no blob references at all, it fails with an exception rather
than wiping out the whole datastore (we might have had a problem with
the NodeStore reference collection). I believe that's the right
approach though.

For your case, the way you can test is to keep some references that
aren't deleted, so that the process will proceed. But if you are not
just testing and you know such situations will be present in your
application, then rather than running DSGC you can simply list all the
blob ids from the DataStore and delete them directly. Will that work
for you?
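If it helps, here is a minimal sketch of that direct approach, assuming
your store implements GarbageCollectableBlobStore. The 24-hour safety
margin is an assumption of mine, and you should verify the
getAllChunkIds / countDeleteChunks signatures against your Oak version:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.TimeUnit;

import org.apache.jackrabbit.oak.spi.blob.GarbageCollectableBlobStore;

public class DirectBlobDeletion {

    // Lists and deletes every chunk older than the cutoff, with no mark
    // phase at all. Only safe when you are certain that none of the
    // listed blobs is still referenced from the node store.
    static long deleteAllChunks(GarbageCollectableBlobStore blobStore)
            throws Exception {
        // Assumed safety margin: leave anything touched in the last 24h alone.
        long cutoff = System.currentTimeMillis()
                - TimeUnit.HOURS.toMillis(24);

        List<String> ids = new ArrayList<>();
        Iterator<String> chunkIds = blobStore.getAllChunkIds(cutoff);
        while (chunkIds.hasNext()) {
            ids.add(chunkIds.next());
        }
        // countDeleteChunks returns how many chunks were actually removed.
        return blobStore.countDeleteChunks(ids, cutoff);
    }
}
```

This bypasses DSGC's mark phase entirely, so it never hits the "Marked
references not available" check, but it also gives you none of DSGC's
safety guarantees.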

Thanks
Amit

On Fri, Jul 19, 2019 at 1:13 PM Ruben Lozano <[email protected]>
wrote:

Hi,

I'm still trying to delete that blob when the nodes that reference it
are deleted.

While looking for a way to delete the last reference to the blob, I
ended up disabling node versioning, as I couldn't delete the version
history of the node.


I have three cases right now:

- If I delete the node with versioning enabled, then run the version gc
and the blob gc, I end up with a frozenNode reference (protected) that
prevents the blob from being physically deleted by the blob gc.

- If I delete the node without versioning, skip the version gc and only
run the blob gc, I end up with a reference from a logically deleted
node, and the blob gc doesn't delete the blob.

- If I create the file without versioning, then run the version gc and
the blob gc, I end up with an IOException "Marked references not
available", because the blob is not referenced anywhere.


I thought the blob gc was supposed to delete unreferenced blobs, not
throw an exception. I'm stuck, because I don't know in which case the gc
is going to delete the blob.


Thanks for your help,

Ruben Lozano


On 15/07/2019 at 14:57, Ruben Lozano wrote:
Hi again,

First of all, thanks for your answer.


After the node deletion, I ran the VersionGarbageCollector before the
MarkSweepGarbageCollector as you suggested, but the blob is still
referenced and therefore not deleted.

ns.getClock().waitUntil(ns.getClock().getTime() + 1000);
vGC.gc(0, TimeUnit.MILLISECONDS);

The node, and the child nodes I added, are being deleted properly by
the version garbage collector, but if I run checkConsistency:


Number of valid blob references marked under mark phase of Blob garbage collection [2]

Blob garbage collection completed in 22.61 ms (22 ms). Number of blobs deleted [0] with max modification time of [2019-07-15 14:33:06.117]
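For reference, this is roughly how I'm invoking the consistency check
that produced the output above (a sketch; gc is the
MarkSweepGarbageCollector instance from my first mail, and the exact
return-value semantics should be confirmed against the Oak javadoc):

```java
import org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector;

public class ConsistencyCheck {

    // checkConsistency() runs the mark phase only: it collects the blob
    // references from the node store and reports how many of them are
    // missing from the blob store. A healthy repository reports 0; it
    // does not delete anything.
    static long report(MarkSweepGarbageCollector gc) throws Exception {
        long missing = gc.checkConsistency();
        System.out.println("References missing from the blob store: " + missing);
        return missing;
    }
}
```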


Probably I'm missing something, but if the file nodes are being
deleted, shouldn't the blob references be completely deleted as well?

Thanks for your time,

Ruben Lozano


On 11/07/2019 at 12:36, Amit Jain wrote:
Hi,

You need to run version GC before doing data store garbage collection
(DSGC); it is a pre-requisite for it.

You would need to call VersionGarbageCollector#gc to delete older node
revisions for DSGC to be effective. Do take a look at the test case [1]
which sets up the deleted nodes to be version-collected before running
DSGC. The version garbage collector uses a max age parameter which must
have elapsed before it will collect the corresponding nodes.

Also, there's a max age parameter for deleting only aged blobs, which
you have set to 1 ms, so that should be ok.
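Putting the two steps together, a sketch of the full sequence (assuming
a DocumentNodeStore named ns and the MarkSweepGarbageCollector from
your snippet; the zero max-age is for testing only, and you should
check getVersionGarbageCollector() exists in your Oak version):

```java
import java.util.concurrent.TimeUnit;

import org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector;
import org.apache.jackrabbit.oak.plugins.document.DocumentNodeStore;
import org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector;

public class GcSequence {

    static void collect(DocumentNodeStore ns, MarkSweepGarbageCollector dsgc)
            throws Exception {
        // 1. Version GC first: removes old revisions and deleted-node
        //    documents older than maxRevisionAge, so that their blob
        //    references disappear from the node store.
        VersionGarbageCollector vgc = ns.getVersionGarbageCollector();
        vgc.gc(0, TimeUnit.MILLISECONDS); // zero age: testing only

        // 2. DSGC second: mark the remaining references, then sweep
        //    blobs that are unreferenced and older than the collector's
        //    own max-age setting.
        dsgc.collectGarbage(false); // false = full mark + sweep
    }
}
```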

Thanks
Amit

[1]

https://github.com/apache/jackrabbit-oak/blob/trunk/oak-store-document/src/test/java/org/apache/jackrabbit/oak/plugins/document/MongoBlobGCTest.java#L152

On Thu, Jul 11, 2019 at 3:33 PM Ruben Lozano <[email protected]>
wrote:

Hi, greetings from Spain

I have been working with oak for a month in a spring boot application
using the oak API to create the content repository service.

I can upload and download large files, but I have a problem with the
file delete services.

After invoking the node.remove and session.save operations, if I try to
get the file, the node is deleted properly, but in the blobs collection
the file's space remains occupied.

In order to free the deleted blob I have tried both the
VersionGarbageCollector and the MarkSweepGarbageCollector, but neither
worked.

The way I've been calling the MarkSweepGarbageCollector is:

MarkSweepGarbageCollector gc = new MarkSweepGarbageCollector(
        new DocumentBlobReferenceRetriever(documentNodeStore),
        (GarbageCollectableBlobStore) documentNodeStore.getBlobStore(),
        (ThreadPoolExecutor) Executors.newFixedThreadPool(1),
        ADMIN, 5, 1, "mongodb://" + "localhost" + ":" + PORT);

gc.collectGarbage(false);


The collector can find the proper blobs but they're not being deleted:

Collected (115) blob references

Number of valid blob references marked under mark phase of Blob garbage collection [138]

Number of blobs present in BlobStore : [23]

Blob garbage collection completed in 56.33 ms (56 ms). Number of blobs deleted [0] with max modification time of [2019-07-11 10:22:18.875]


I'm sure I'm doing something wrong; maybe I need to create a new
session or mark the blob for deletion somehow.


Thanks for your help.




