Thanks for the great comments! Keep them coming! I've responded with my
thoughts from the comments provided so far, and would like to keep this
discussion going for another round or two. This should be a feature YOU all
want to use, and therefore designed the way you want to use it, not the way
I first imagined it to be implemented.

I'm replying to several emails in this one message, inline below:

On Fri, Feb 18, 2011 at 3:55 PM, Rick Bullotta <
[email protected]> wrote:

> Hi, Tobias.
>
> I had a few posts on this a couple weeks back.  Similar ideas to what
> you've suggested (in fact, we've done a blog, wiki, and forum engine using
> Neo as the datastore).  A few other design criteria I'd throw into the mix:
>
> - Pluggable BLOB/CLOB storage providers (e.g. FileSystem, Native Neo, S3,
> MongoDB, etc.)
>

With FileSystem or NativeNeo being the default one (don't know what the
difference between those two would be).

As a general rule of thumb, I am very skeptical about pluggability in things
that concern the storage format. But it might be worth exploring in this
case.


> - The internal representation of the property might include:
>        - Storage provider name
>        - Storage provider key (that is meaningful to the underlying storage
>          provider such as a file name, S3 GUID, neo node id, etc.)
>        - Including an optional mime type would be very useful (to make it
>          easy to stream them via the REST API) - very useful for images, XML,
>          JSON, media like video/sound, etc...
>

Yes, mime type would be extremely useful. Thank you for that suggestion.
Sadly that also means that we would need something other than
ReadableByteChannel for the properties, something that also specifies the
mime type.
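
To make that concrete, here is a minimal sketch of what such a value type
might look like. BlobValue and InMemoryBlob are hypothetical names invented
for this illustration, not existing Neo4j types, and the in-memory
implementation is only there to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

// Hypothetical type, not part of any Neo4j API: a blob property value that
// carries an optional mime type alongside the streaming content.
interface BlobValue {
    /** Mime type such as "image/png", or null if none was specified. */
    String mimeType();

    /** Opens a fresh channel over the blob content; the caller closes it. */
    ReadableByteChannel open();
}

// Minimal in-memory implementation, for illustration only. A real store
// would hand out a channel over a file rather than a byte array.
final class InMemoryBlob implements BlobValue {
    private final String mimeType;
    private final byte[] content;

    InMemoryBlob(String mimeType, byte[] content) {
        this.mimeType = mimeType;
        this.content = content.clone();
    }

    @Override
    public String mimeType() {
        return mimeType;
    }

    @Override
    public ReadableByteChannel open() {
        return Channels.newChannel(new ByteArrayInputStream(content));
    }
}
```

With something like this, a REST layer could set the Content-Type header from
mimeType() and stream the body from open().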


> - These properties should only cache the reference, never the content, in
> memory...
>

For sure.


> - A variant of this capability that we use is storing JSON objects
> (currently in Neo). I could envision special traversal capabilities or even
> a "native" JSON property type in Neo
>

I have been thinking of native JSON support as a completely separate
feature. It is on my list of other property types that would be useful:
* BLOBs (the concern of this email)
* JSON-like objects. Defined as something like this bastardized
Java-generics / BNF hybrid:
  jsonObject := Number | String | Map<String, jsonObject> | List<jsonObject>
* Dates, transferred as java.util.Date. This type should probably be added
to the JSON-like-object above as well.
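
As a rough illustration of that grammar, the check below accepts exactly the
shapes listed above, plus java.util.Date. JsonLike is a hypothetical helper
name, not an existing API, and the sketch assumes the value contains no
cycles.

```java
import java.util.Date;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of any Neo4j API.
final class JsonLike {
    private JsonLike() {}

    /**
     * Returns true if value matches
     *   jsonObject := Number | String | Map<String, jsonObject> | List<jsonObject>
     * extended with java.util.Date. Assumes the structure has no cycles.
     */
    static boolean isJsonLike(Object value) {
        if (value instanceof Number || value instanceof String || value instanceof Date) {
            return true;
        }
        if (value instanceof List) {
            for (Object element : (List<?>) value) {
                if (!isJsonLike(element)) {
                    return false;
                }
            }
            return true;
        }
        if (value instanceof Map) {
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                // Keys must be strings; values recurse into the grammar.
                if (!(entry.getKey() instanceof String) || !isJsonLike(entry.getValue())) {
                    return false;
                }
            }
            return true;
        }
        return false; // null and every other type are rejected
    }
}
```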


> - Ideally the storage providers could (optionally) participate in
> transactions
>
>
I think they always should. At least to some extent. The way I've been
thinking about storing blob properties in a transactionally safe way is:
* On setProperty(key, BLOB):
  1. Get unique ID from the BLOB store.
  2. Write the data to the blobstore under the given ID
     With the native store this would be:
     a) Get filename (ID maps uniquely to a filename)
     b) Write the data to the file.
     c) Optionally flush the file (if flush isn't done here it has to be
        done on commit)
  3. Store the blob ID as reference to the blob in the property record for
     the tx.
* On commit:
  1. Write the blob ID from the blobstore to the tx-log
  2. Write the blob ID from the blobstore to the property record
* On rollback:
  1. Remove the blob with the given ID from the blobstore
  2. Mark the ID as free in the blobstore
* In HA and (incremental) backup:
  * When transferring a transaction log, the blobstore files that were
    written during that transaction need to be transferred as well. This
    might require a slightly more involved method of copying the
    transaction log. Currently we just read all bytes; we might have to
    visit each command and enlist the blobstore files for each blob
    property command. This shouldn't add much overhead to what we
    currently do, since we visit each command to search for the Done
    command.
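
The setProperty and rollback steps above could be sketched roughly like this
for the native, file-based store. FileBlobStore, its method names and the
"blob-<id>" file naming are all assumptions made for the illustration, and
the sketch assumes Java 7 or later for java.nio.file. Writing the ID to the
tx-log and the property record on commit is left to the surrounding
transaction code and is not shown.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical file-backed blob store; names and layout are assumptions.
final class FileBlobStore {
    private final Path directory;
    private final Deque<Long> freeIds = new ArrayDeque<Long>();
    private long nextId = 0;

    FileBlobStore(Path directory) {
        this.directory = directory;
    }

    /** Step 1: hand out an ID that no live blob is using. */
    synchronized long allocateId() {
        Long reused = freeIds.poll();
        return reused != null ? reused : nextId++;
    }

    /** Step 2a: each ID maps to exactly one file name. */
    Path fileFor(long id) {
        return directory.resolve("blob-" + id);
    }

    /** Steps 2b and 2c: write the data, then force it to disk. */
    void write(long id, ReadableByteChannel data) throws IOException {
        try (FileChannel out = FileChannel.open(fileFor(id),
                StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)) {
            ByteBuffer buffer = ByteBuffer.allocate(8192);
            while (data.read(buffer) != -1) {
                buffer.flip();
                while (buffer.hasRemaining()) {
                    out.write(buffer);
                }
                buffer.clear();
            }
            out.force(true); // flush here so commit does not have to
        }
    }

    /** Rollback: remove the blob file and mark its ID as free. */
    synchronized void rollback(long id) throws IOException {
        Files.deleteIfExists(fileFor(id));
        freeIds.add(id);
    }
}
```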

The uniqueness of the ID generator for the blobstore, with the required
flushing of the blob files should make this transactionally safe, without
requiring writing the blob data to the transaction log. The tricky part
comes with ID reuse for the blobstore. To conserve space when removing or
overwriting a blob property one would want the previous blob file to be
removed (when the transaction commits of course, never earlier than that).
The problem with that would be for backups. If you want to be able to roll
forward from a backup to a particular point in time (that is not the latest
time at which a backup was taken), you need to keep the removed blob files
around until they have been backed up. This would be done by renaming them
instead of removing them, so that the backup process can still reach them,
while not being in the way for normal operation. This would be done in the
normal way on Unix systems (hardlink on prepare, unlink of previous
reference on commit, unlink of new on rollback). For Windows I have no idea
how this would work. Do Windows filesystems have hardlinks?
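
For reference, the hardlink dance could look something like the sketch below
using java.nio.file.Files.createLink (Java 7 and later), which is documented
to throw UnsupportedOperationException where the platform cannot create hard
links. BlobRetention, the backup directory and the method names are
assumptions made for this illustration only.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: keeps a replaced blob file reachable for backup.
final class BlobRetention {
    private BlobRetention() {}

    /** Prepare: add a second name for the blob under the backup directory. */
    static Path prepare(Path blobFile, Path backupDir) throws IOException {
        Path retained = backupDir.resolve(blobFile.getFileName());
        Files.createLink(retained, blobFile); // hard link, not a copy
        return retained;
    }

    /** Commit: drop the previous name; the data lives on via the retained link. */
    static void commit(Path blobFile) throws IOException {
        Files.delete(blobFile);
    }

    /** Rollback: drop the new link; the original name is untouched. */
    static void rollback(Path retained) throws IOException {
        Files.delete(retained);
    }
}
```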


On Fri, Feb 18, 2011 at 4:39 PM, Craig Taverner <[email protected]> wrote:

> I was wondering about one part of your original comment, that we might want
> to exclude large 'blobs' from the transaction log. Does this mean the main
> difference between the current ideas for long strings and large blobs is
> that the strings will be transaction safe, while the blobs are not? In
> addition then, the HA support would presumably mirror the blobs in an
> 'eventually consistent' way, instead of using the logfile? I would think
> that some non-acid, eventually consistent, approach to blobs makes a lot of
> sense, but of course we need to be exposed very explicitly to the
> users/developers, so they understand it.
>
> Have I understood it?
>

No, I would still want ACID guarantees, and I've outlined how I envision
this working above.

As I also outlined, in HA the blob files would be transferred along with the
transaction logs for which they were written, so from HA's point of view it
would be as if the blob data *was* included in the transaction log.
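
A rough sketch of the "visit each command and enlist the blob files" idea is
below. The command classes are stand-ins invented for this illustration; they
do not reflect the actual transaction log format or command types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical command model; the real transaction log format is not shown here.
abstract class LogCommand {}

final class DoneCommand extends LogCommand {}

final class OtherCommand extends LogCommand {}

final class BlobPropertyCommand extends LogCommand {
    final long blobId;

    BlobPropertyCommand(long blobId) {
        this.blobId = blobId;
    }
}

final class BlobEnlister {
    private BlobEnlister() {}

    /** Collects the blob IDs referenced by one transaction, stopping at Done. */
    static List<Long> blobsToTransfer(List<LogCommand> commands) {
        List<Long> blobIds = new ArrayList<Long>();
        for (LogCommand command : commands) {
            if (command instanceof DoneCommand) {
                break; // end of this transaction
            }
            if (command instanceof BlobPropertyCommand) {
                blobIds.add(((BlobPropertyCommand) command).blobId);
            }
        }
        return blobIds;
    }
}
```

The files for the returned IDs would then be shipped together with the log
bytes for that transaction.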


>
> I note that Rick talks about storage providers 'optionally participating in
> transactions'. It seems the question of whether or not the blobs are atomic
> is important, and potentially flexible?
>

I don't intend on including multiple blobstore providers in my first spike
of this. The compelling reason for doing so in a later phase (instead of
just deciding that we should use the built in native filesystem based
blobstore) is that it would allow us to use another store that is tailored
for handling these kinds of data, and allow us to focus on being kickass at
the graph aspect of the data.

Another compelling feature of using a distributed K/V store such as S3 or
MongoDB for storing blobs is that in HA we wouldn't have to replicate the
blob data, since it would already be available for distributed access from
the distributed K/V store.


On Fri, Feb 18, 2011 at 5:24 PM, Paul A. Jackson <[email protected]>
 wrote:

> Does it go without saying that when this is implemented that a neo instance
> would still be able to open a graph from a prior version?  Would this be an
> automatic one-time conversion, or would there be a utility that would
> convert from one format to the other, or something else?
>

A very valid question indeed. Since this would introduce a brand new
property type there would be no conflict with anything that might have
existed in a previous version of a Neo4j store. When moving to a new version
of Neo4j that has support for this kind of property, all data from the
previous version would of course work without any need for upgrade. The
reverse is of course not true. If you have used a newer version of Neo4j
with support for blob properties, you would not be able to go back to a
version that lacks this support. In theory it would of course be possible to
do so if the blob property feature had not been used, but it is my goal that
no one should ever want to migrate from a newer to an older version of
Neo4j.

The same is of course true for the short string feature, in the discussion
thread of which this discussion about blob properties and large string
properties was first brought up. All strings stored with the previous
version would of course be readable with the new version, but as you start
writing strings with the new version, some of them (if short enough) will be
stored differently from how they would have been stored with the previous
version. So if you had stored the string "hello" with Neo4j 1.2 it would,
with Neo4j 1.3, be read from the DynamicStringStore, as previously. Nothing
would change with that string. But if you store the string "hello" with
Neo4j 1.3 it will be stored as a short string, without involving the
DynamicStringStore. We are considering creating a migration tool that would
compress the DynamicStringStore of an existing Neo4j installation by
converting to short strings where possible.

The blob property feature would upgrade in a similar way. Except there is of
course no conversion to be done, so the migration tool would not have to be
involved.

For the change from one stringstore and one arraystore to a largestore and a
smallstore, there would have to be a migration tool involved, since the
previous stringstore and arraystore would not exist in the new store
version. Unless we decide to try and do something clever with the property
types to keep those around for backwards compatibility. Something we could
of course do. The Neo4j PropertyStore currently uses 12 different property
types, and with a byte to encode the property type we could have up to 256
property types. We could use this to have one type for DynamicStringStore
string (the current type), another one for LargeStoreString and yet
another for SmallStoreString, to preserve backwards compatibility without
the need for a migration tool. Writing new strings would then of course only
write LargeStoreString and SmallStoreString, but when reading, all three
types would be possible. The drawback of this would be that memory mapping
would be spread over four files instead of only two.
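
To sketch the idea: the type byte decides which store a string is read from,
while new writes only ever pick the small or the large store. The enum name,
the type code values and the length threshold parameter are placeholders I
made up for the example, not actual PropertyStore values.

```java
// Hypothetical type codes; the actual byte values used by Neo4j are not given here.
enum StringStorage {
    DYNAMIC_STRING_STORE((byte) 1, false), // legacy: readable, never written
    LARGE_STORE_STRING((byte) 2, true),
    SMALL_STORE_STRING((byte) 3, true);

    final byte typeCode;
    final boolean writable;

    StringStorage(byte typeCode, boolean writable) {
        this.typeCode = typeCode;
        this.writable = writable;
    }

    /** Reading: all three types are understood. */
    static StringStorage forTypeCode(byte code) {
        for (StringStorage storage : values()) {
            if (storage.typeCode == code) {
                return storage;
            }
        }
        throw new IllegalArgumentException("Unknown property type: " + code);
    }

    /** Writing: new strings only go to the small or the large store. */
    static StringStorage forNewString(int length, int smallStoreLimit) {
        return length <= smallStoreLimit ? SMALL_STORE_STRING : LARGE_STORE_STRING;
    }
}
```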

Keep the comments coming.

Cheers,
Tobias

 -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of
> Tobias Ivarsson
> Sent: Friday, February 18, 2011 9:20 AM
> To: Neo user discussions
> Subject: [Neo4j] Better support for large property data
>
> Having tackled short strings, I feel up for taking a stab at long strings,
> and large binary data objects.
>
> I know that Rick Bullotta is really interested in this, and I can imagine
> others wanting to store large properties as well. I would love to get your
> input on the ideas I have, as well as hearing about the ideas you might
> have.
>
> The way I see it there are two different kinds of large data objects.
>
> The first one is long strings, or text. Imagine building a blog engine on
> Neo4j, the text body of a blog post is likely going to be around a thousand
> characters. That is a lot of blocks in the DynamicStringStore. But you
> still want to support shorter strings (the title of the post for example),
> without much overhead, so you don't want to increase the block size for the
> DynamicStringStore. In your code you want to deal with these values as
> String objects though, you don't want a different object type just because
> the string happens to be longer.
>
> The second one is large binary data objects. Data objects that are too
> large to want to have allocated as a String object, or even as a byte[]
> object. You want to manipulate them through some sort of streaming
> interface. These data objects are also so large that you would prefer if
> their content wasn't written to the transaction logs, because that would
> mean that Neo4j needed to rotate the log extremely frequently, and since
> you keep the logical logs for HA and backup, it would fill up your disks
> twice as quickly as it needed. Properties like this would, for example, be
> used for storing images that are included in the blog posts.
>
>
> For long Strings (the first point), the solution I'm thinking of is to
> replace the stringstore and arraystore with a smallstore and a largestore.
> Both being dynamic block stores as they are today, but with different block
> sizes. Then store both arrays and strings in both of these stores. The type
> of the data stored in the block is stored in the property record for the
> property that references the blocks anyhow, so there isn't a great
> advantage of having different block stores for strings and arrays.
>
> For BLOBs (the second point), we need additions to the API, since you want
> to work with these things in a streaming fashion.
> I am thinking that we use java.nio.channels.ReadableByteChannel for these
> properties. Why ReadableByteChannel you ask? Why not InputStream?
> First reason: InputStream can be converted to ReadableByteChannel, and vice
> versa:
>
> http://download.oracle.com/javase/6/docs/api/index.html?java/nio/channels/Channels.html
> Second reason: ReadableByteChannel is a really simple interface (only three
> methods) if you want to write your own custom implementation.
>
> Setting a BLOB property would then look like this:
>
> ReadableByteChannel myBlob = ...
> node.setProperty("a_blob", myBlob);
>
> Getting would look like this:
>
> ReadableByteChannel myBlob =
> (ReadableByteChannel)node.getProperty("a_blob");
>
>
> Perhaps we could then also come up with some nice API for appending to a
> BLOB property:
>
> ReadableByteChannel moreData = ...
> ReadableByteChannel myBlob =
> (ReadableByteChannel)node.getProperty("a_blob");
> node.setProperty( "a_blob", BlobUtils.append(myBlob, moreData) );
>
>
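
One possible shape for the BlobUtils.append call quoted above, as a minimal
sketch: it leans on the InputStream/ReadableByteChannel conversions in
java.nio.channels.Channels and on java.io.SequenceInputStream to concatenate
the two sources lazily. BlobUtils itself is hypothetical and does not exist in
Neo4j.

```java
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

// Hypothetical utility matching the BlobUtils.append call; not an existing Neo4j class.
final class BlobUtils {
    private BlobUtils() {}

    /** Returns a channel that yields all of first, then all of second. */
    static ReadableByteChannel append(ReadableByteChannel first, ReadableByteChannel second) {
        InputStream a = Channels.newInputStream(first);
        InputStream b = Channels.newInputStream(second);
        // SequenceInputStream reads a to its end before moving on to b.
        return Channels.newChannel(new SequenceInputStream(a, b));
    }
}
```

Nothing is buffered up front here; bytes are only pulled from the two source
channels as the combined channel is read.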
-- 
Tobias Ivarsson <[email protected]>
Hacker, Neo Technology
www.neotechnology.com
Cellphone: +46 706 534857