Yesterday, I put the finishing touches on the FileStorageNode. It uses the StorageNode API to read and write Atomese s-expressions to and from a flat file. It's fast, and it's compact. It's 10x faster than using plain scheme (guile) to dump Atoms, thanks to code originally written by Alexey Potapov and Anatoly Belikov; I wrote a wrapper around it so that it works with the StorageNode API.
Some stats: I tested two datasets, a MOZI biology dataset and a natural language dataset, of 7 million and 20 million Atoms, respectively. When these are loaded into the AtomSpace (in RAM), they take up 632 and 775 bytes/Atom of RSS (operating system resident set size). This is very typical for Atoms in the AtomSpace. (I put these two datasets up at https://linas.org/datasets/ for Amirouche.)

Dumped to a file as plain, uncompressed Atomese s-expressions, this becomes 55 and 154 bytes/Atom. When compressed with bzip2, it shrinks to 4 and 6 bytes/Atom! Tiny! Clearly, keeping searchable indexes in the AtomSpace costs a huge amount of RAM. The actual data content in typical Atoms is ... tiny.

See https://wiki.opencog.org/w/FileStorageNode and the demo in https://github.com/opencog/atomspace/blob/master/examples/atomspace/persist-store.scm

-- Linas

--
Patrick: Are they laughing at us?
Sponge Bob: No, Patrick, they are laughing next to us.
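
P.S. For a sense of scale, here is a back-of-the-envelope sketch in Python that turns the per-Atom figures above into totals and ratios. It assumes the rounded Atom counts (exactly 7 million and 20 million) and decimal megabytes; the names `DATASETS`, `totals_mb` and `ratios` are made up for this illustration and are not part of any OpenCog API.

```python
# Per-Atom sizes quoted in the message above. The Atom counts are the
# rounded "7 million" and "20 million", so the totals are approximate.
DATASETS = {
    "MOZI biology": {"atoms": 7_000_000, "rss": 632, "file": 55, "bzip2": 4},
    "natural language": {"atoms": 20_000_000, "rss": 775, "file": 154, "bzip2": 6},
}


def totals_mb(d):
    """Total size in MB (10^6 bytes) for each representation."""
    return {k: d["atoms"] * d[k] / 1e6 for k in ("rss", "file", "bzip2")}


def ratios(d):
    """How many times larger one representation is than another."""
    return {
        "rss_vs_file": d["rss"] / d["file"],
        "file_vs_bzip2": d["file"] / d["bzip2"],
        "rss_vs_bzip2": d["rss"] / d["bzip2"],
    }


if __name__ == "__main__":
    for name, d in DATASETS.items():
        t, r = totals_mb(d), ratios(d)
        print(f"{name}:")
        print(f"  in RAM (RSS): {t['rss']:9.0f} MB")
        print(f"  flat file:    {t['file']:9.0f} MB  ({r['rss_vs_file']:.1f}x smaller than RSS)")
        print(f"  bzip2:        {t['bzip2']:9.0f} MB  ({r['rss_vs_bzip2']:.0f}x smaller than RSS)")
```

With these inputs, the MOZI dataset comes to roughly 4.4 GB in RAM, 385 MB as a flat file and 28 MB after bzip2; the natural language dataset comes to roughly 15.5 GB, 3.1 GB and 120 MB.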
