I was wondering about one part of your original comment, that we might want
to exclude large 'blobs' from the transaction log. Does this mean the main
difference between the current ideas for long strings and large blobs is
that the strings will be transaction safe, while the blobs will not? In
addition, the HA support would presumably mirror the blobs in an
'eventually consistent' way, instead of using the logfile? I would think
that some non-ACID, eventually consistent approach to blobs makes a lot of
sense, but of course it needs to be exposed very explicitly to the
users/developers, so they understand it.

Have I understood it?

I note that Rick talks about storage providers 'optionally participating in
transactions'. It seems the question of whether or not the blobs are atomic
is important, and potentially flexible?

On Fri, Feb 18, 2011 at 3:55 PM, Rick Bullotta <
[email protected]> wrote:

> Hi, Tobias.
>
> I had a few posts on this a couple weeks back.  Similar ideas to what you've
> suggested (in fact, we've done a blog, wiki, and forum engine using Neo as
> the datastore).  A few other design criteria I'd throw into the mix:
>
> - Pluggable BLOB/CLOB storage providers (e.g. FileSystem, Native Neo, S3,
> MongoDB, etc.)
> - The internal representation of the property might include:
>        - Storage provider name
>        - Storage provider key (one that is meaningful to the underlying
>          storage provider, such as a file name, S3 GUID, neo node id, etc.)
>        - An optional mime type (to make it easy to stream them via the REST
>          API), very useful for images, XML, JSON, media like video/sound,
>          etc.
> - These properties should only cache the reference, never the content, in
> memory...
> - A variant of this capability that we use is storing JSON objects
> (currently in Neo). I could envision special traversal capabilities or even
> a "native" JSON property type in Neo
> - Ideally the storage providers could (optionally) participate in
> transactions
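A rough sketch of how the design criteria listed above might fit together, purely as an illustration. Nothing here is an existing Neo4j API: `BlobReference`, `BlobStorageProvider`, `InMemoryBlobProvider` and `isTransactional()` are hypothetical names, and the in-memory provider is only a test stand-in for a real backend such as a file system or S3.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical property value: a pointer to the content, never the content itself. */
final class BlobReference {
    final String providerName; // which storage provider holds the data
    final String providerKey;  // key meaningful only to that provider
    final String mimeType;     // optional, may be null

    BlobReference(String providerName, String providerKey, String mimeType) {
        if (providerName == null || providerKey == null) {
            throw new IllegalArgumentException("provider name and key are required");
        }
        this.providerName = providerName;
        this.providerKey = providerKey;
        this.mimeType = mimeType;
    }
}

/** Hypothetical pluggable storage provider. */
interface BlobStorageProvider {
    String name();
    BlobReference store(ReadableByteChannel content, String mimeType) throws IOException;
    ReadableByteChannel open(BlobReference ref) throws IOException;
    /** Whether this provider takes part in the surrounding transaction. */
    boolean isTransactional();
}

/** Test stand-in only: a real provider would not hold the content in memory. */
final class InMemoryBlobProvider implements BlobStorageProvider {
    private final Map<String, byte[]> blobs = new HashMap<String, byte[]>();
    private int nextKey = 0;

    public String name() { return "memory"; }

    public BlobReference store(ReadableByteChannel content, String mimeType) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ByteBuffer buf = ByteBuffer.allocate(4096);
        while (content.read(buf) != -1) {
            buf.flip();                                // switch buffer from filling to draining
            out.write(buf.array(), 0, buf.limit());
            buf.clear();
        }
        String key = "blob-" + (nextKey++);
        blobs.put(key, out.toByteArray());
        return new BlobReference(name(), key, mimeType);
    }

    public ReadableByteChannel open(BlobReference ref) throws IOException {
        byte[] data = blobs.get(ref.providerKey);
        if (data == null) {
            throw new FileNotFoundException("no blob stored under key " + ref.providerKey);
        }
        return Channels.newChannel(new ByteArrayInputStream(data));
    }

    public boolean isTransactional() { return false; } // opts out of transactions
}
```

The point of the split is that only the small `BlobReference` would need to live in the property record and in memory, while the bytes stay with whichever provider is named in it.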
>
> Best,
>
> Rick
>
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Tobias Ivarsson
> Sent: Friday, February 18, 2011 9:20 AM
> To: Neo user discussions
> Subject: [Neo4j] Better support for large property data
>
> Having tackled short strings, I feel up for taking a stab at long strings,
> and large binary data objects.
>
> I know that Rick Bullotta is really interested in this, and I can imagine
> others wanting to store large properties as well. I would love to get your
> input on the ideas I have, as well as hearing about the ideas you might
> have.
>
> The way I see it there are two different kinds of large data objects.
>
> The first one is long strings, or text. Imagine building a blog engine on
> Neo4j: the text body of a blog post is likely going to be around a thousand
> characters. That is a lot of blocks in the DynamicStringStore. But you still
> want to support shorter strings (the title of the post, for example) without
> much overhead, so you don't want to increase the block size for the
> DynamicStringStore. In your code you want to deal with these values as
> String objects though; you don't want a different object type just because
> the string happens to be longer.
>
> The second one is large binary data objects: data objects that are too large
> to allocate as a String object, or even as a byte[] object.
> You want to manipulate them through some sort of streaming interface. These
> data objects are also so large that you would prefer that their content not
> be written to the transaction logs, because that would mean that Neo4j needed
> to rotate the log extremely frequently, and since you keep the logical logs
> for HA and backup, it would fill up your disks twice as quickly as
> necessary. Properties like this would, for example, be used for storing images
> that are included in the blog posts.
>
>
> For long Strings (the first point), the solution I'm thinking of is to
> replace the stringstore and arraystore with a smallstore and a largestore,
> both being dynamic block stores as they are today, but with different block
> sizes. Both arrays and strings would then be stored in both of these stores.
> The type of the data stored in the blocks is already recorded in the property
> record that references the blocks, so there isn't a great advantage
> in having different block stores for strings and arrays.
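To make the smallstore/largestore trade-off concrete, here is a small sketch of how a value might be routed to one store or the other by size. All numbers and names are assumptions for illustration only: the block sizes (32 and 1024 bytes), the `MAX_SMALL_BLOCKS` threshold and the `DynamicStoreChoiceSketch` class are hypothetical, and per-block header overhead is ignored.

```java
final class DynamicStoreChoiceSketch {
    // Hypothetical block sizes; the real stores would be configured differently.
    static final int SMALL_BLOCK_BYTES = 32;
    static final int LARGE_BLOCK_BYTES = 1024;
    // Hypothetical rule: values needing more small blocks than this go to the large store.
    static final int MAX_SMALL_BLOCKS = 4;

    enum Store { SMALL, LARGE }

    /** Number of blocks a payload occupies, ignoring any per-block header. */
    static int blocksNeeded(int payloadBytes, int blockBytes) {
        if (payloadBytes <= 0) {
            return 1; // even an empty value occupies one block in this sketch
        }
        return (payloadBytes + blockBytes - 1) / blockBytes; // ceiling division
    }

    /** Picks a store purely by size; strings and arrays are treated alike. */
    static Store choose(int payloadBytes) {
        return blocksNeeded(payloadBytes, SMALL_BLOCK_BYTES) <= MAX_SMALL_BLOCKS
                ? Store.SMALL
                : Store.LARGE;
    }
}
```

Under these assumed sizes, a 40-byte title needs 2 small blocks and stays in the small store, while a 2000-byte post body would need 63 small blocks but only 2 large ones.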
>
> For BLOBs (the second point), we need additions to the API, since you want
> to work with these things in a streaming fashion.
> I am thinking that we use java.nio.channels.ReadableByteChannel for these
> properties. Why ReadableByteChannel, you ask? Why not InputStream?
> First reason: InputStream can be converted to ReadableByteChannel, and vice
> versa:
>
> http://download.oracle.com/javase/6/docs/api/index.html?java/nio/channels/Channels.html
> Second reason: ReadableByteChannel is a really simple interface (only three
> methods) if you want to write your own custom implementation.
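For reference, the conversion mentioned in the first reason uses two static methods on java.nio.channels.Channels: `newChannel(InputStream)` and `newInputStream(ReadableByteChannel)`. The wrapper class `ChannelConversionSketch` and its `readAll` helper below are made up for the example and are not part of any proposed API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

final class ChannelConversionSketch {
    /** Wraps an existing stream as a channel. */
    static ReadableByteChannel toChannel(InputStream in) {
        return Channels.newChannel(in);
    }

    /** Wraps an existing channel as a stream. */
    static InputStream toStream(ReadableByteChannel channel) {
        return Channels.newInputStream(channel);
    }

    /** Drains a channel into a byte array; only sensible for small test data. */
    static byte[] readAll(ReadableByteChannel channel) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ByteBuffer buf = ByteBuffer.allocate(8);
        while (channel.read(buf) != -1) {
            buf.flip();                                // switch buffer from filling to draining
            out.write(buf.array(), 0, buf.limit());
            buf.clear();
        }
        return out.toByteArray();
    }
}
```

Because the wrapping goes both ways, code that already produces an InputStream (a file upload, say) could still feed a channel-based property API without copying the data first.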
>
> Setting a BLOB property would then look like this:
>
> ReadableByteChannel myBlob = ...
> node.setProperty("a_blob", myBlob);
>
> Getting would look like this:
>
> ReadableByteChannel myBlob =
> (ReadableByteChannel)node.getProperty("a_blob");
>
>
> Perhaps we could then also come up with some nice API for appending to a
> BLOB property:
>
> ReadableByteChannel moreData = ...
> ReadableByteChannel myBlob =
> (ReadableByteChannel)node.getProperty("a_blob");
> node.setProperty( "a_blob", BlobUtils.append(myBlob, moreData) );
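`BlobUtils.append` is only a proposed name in the message above and does not exist. One guess at how such a helper could work without buffering either blob is a channel that reads the first source until it is exhausted and then continues with the second. The sketch below assumes blocking sources and is not thread-safe.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.ReadableByteChannel;

/** Hypothetical helper; the name comes from the proposal, the implementation is a guess. */
final class BlobUtils {
    private BlobUtils() {}

    /** Returns a channel that yields all bytes of first, followed by all bytes of second. */
    static ReadableByteChannel append(final ReadableByteChannel first,
                                      final ReadableByteChannel second) {
        return new ReadableByteChannel() {
            private boolean firstDone = false;
            private boolean open = true;

            @Override
            public int read(ByteBuffer dst) throws IOException {
                if (!open) {
                    throw new ClosedChannelException();
                }
                if (!firstDone) {
                    int n = first.read(dst);
                    if (n != -1) {
                        return n;          // still data (or nothing yet) from the first source
                    }
                    firstDone = true;      // first source exhausted, fall through to second
                }
                return second.read(dst);   // -1 here means the combined channel is at its end
            }

            @Override
            public boolean isOpen() {
                return open;
            }

            @Override
            public void close() throws IOException {
                open = false;
                try {
                    first.close();
                } finally {
                    second.close();        // close the second source even if the first fails
                }
            }
        };
    }
}
```

Note that in the usage shown above the existing blob is read back out of the store and written again, so a real append API would probably want the store itself to extend the blob in place.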
>
>
> Comment please.
> --
> Tobias Ivarsson <[email protected]>
> Hacker, Neo Technology
> www.neotechnology.com
> Cellphone: +46 706 534857
> _______________________________________________
> Neo4j mailing list
> [email protected]
> https://lists.neo4j.org/mailman/listinfo/user
>
