Re: [Neo4j] Brainstorming on my project: neo4john

John cyuczieekc Sun, 31 Jul 2011 10:46:11 -0700

Hey Niels, thanks for the concise reply.

On Sun, Jul 31, 2011 at 5:10 PM, Niels Hoogeveen
<pd_aficion...@hotmail.com> wrote:


>
> Hi John,
>
> I think when approaching a project there are two distinct issues at play:
> one is the tooling level,
> the other is the actual solution you are trying to create for an actual
> problem.
>
I seem to want a generic solution for multiple problems: something generic
enough that it can then be applied specifically. The tooling (if I understand
this right) should be something the user can mould to his own needs, the way
Java and/or Eclipse can be used to build whatever program the user wants to
code.
  I want this to be a foundation, such that I could supposedly spend 99% of my
time inside it doing my work, rather than using the OS and its
applications...

> When looking at the tooling level, it is great to have as much covered as
> possible.
> Neo4j offers a graph database and pretty good integration with Lucene.
> This overall is a good choice of tools, because there is hardly any
> overlapping functionality.
> Neo4j offers storage and navigation, while Lucene provides indexing. So the
> tools are pretty much orthogonal to each other.
>
> When adding BDB to the mix, things become a bit messier. BDB offers
> indexing and storage,
> so now you have to decide what to use BDB for. If you choose to only use it
> for indexing,
> like an alternative to Lucene, things remain pretty much orthogonal.
>
As I understand it, that's exactly what bdb-index is supposed to do for your
graph-collections: replace the Lucene index, or at least be similar to it in
interface/usage.

> When you decide to use BDB for storage, the question becomes: what to store
> in Neo4j and what to store in BDB.
>
The way it is right now, I could implement what I've got in either BDB or
Neo4j; I don't need both.
Btw, the Lucene index seems as fast as BDB, possibly 1ms slower, from what I've
tested (granted, only superficially).

>
> When it comes to storing and retrieving properties to entities both seem to
> be pretty fast, and unless you have serious performance issues with the
> storage of properties, either Neo4j or BDB is suitable for the task.
>
> When it comes to storing relationships between entities, Neo4j is by far
> the better solution. Fetching a relationship is a really cheap action, since
> it "only" involves moving a file pointer to a certain position (id * record
> length) and reading the record (i.e. if that data is not already available
> in the cache).
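To make the "id * record length" point concrete, here is a minimal Java sketch of a fixed-length record file. The 16-byte layout (start node id + end node id) and the class names are made up for illustration. This is not Neo4j's actual store format. The only point is that the byte offset of record `id` is computed rather than searched for, so a fetch is one seek plus one read.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical fixed-length record file: NOT Neo4j's actual store format.
class FixedRecordStore implements AutoCloseable {
    // Assumed layout: start node id (8 bytes) + end node id (8 bytes).
    static final int RECORD_SIZE = 16;
    private final RandomAccessFile file;

    FixedRecordStore(Path path) throws IOException {
        this.file = new RandomAccessFile(path.toFile(), "rw");
    }

    void write(long id, long startNode, long endNode) throws IOException {
        file.seek(id * RECORD_SIZE); // offset = id * record length
        file.writeLong(startNode);
        file.writeLong(endNode);
    }

    long[] read(long id) throws IOException {
        file.seek(id * RECORD_SIZE); // one seek + one read, no search structure needed
        return new long[] { file.readLong(), file.readLong() };
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}

class FixedRecordDemo {
    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("relationships", ".db");
        try (FixedRecordStore store = new FixedRecordStore(path)) {
            store.write(0, 1, 2);
            store.write(42, 7, 9); // writing past the end grows the file
            long[] rel = store.read(42);
            System.out.println(rel[0] + " -> " + rel[1]); // prints "7 -> 9"
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```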
>
As I've seen it, though, I need to use an index (i.e. Lucene) to check with
Neo4j whether A->B exists when A has 1 million outgoing relationships to 1
million different nodes, of which B is one.
Otherwise it takes from over 700ms to a few seconds using findSinglePath (of
course, I might have missed something), whereas with an index it takes ~1ms.
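Here is an in-memory Java analogy of that difference. It does not use Neo4j's API: findSinglePath and RelationshipIndex are not modelled, and the class names are hypothetical. Walking A's outgoing relationships costs O(n). A lookup keyed on the (start, end) pair costs one hash probe, no matter how many relationships A has. Actual timings will vary by machine.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// A relationship identified by its (start, end) pair; the record supplies equals/hashCode.
record RelKey(long start, long end) {}

class ExistenceCheck {
    private final List<RelKey> outgoing = new ArrayList<>(); // what a naive walk visits
    private final Set<RelKey> index = new HashSet<>();        // lookup keyed on (start, end)

    void add(long start, long end) {
        RelKey rel = new RelKey(start, end);
        if (index.add(rel)) {
            outgoing.add(rel);
        }
    }

    // O(n): visits relationships one by one until it finds a match
    boolean existsByScan(long start, long end) {
        for (RelKey rel : outgoing) {
            if (rel.start() == start && rel.end() == end) {
                return true;
            }
        }
        return false;
    }

    // O(1) expected: a single hash lookup, independent of how many rels A has
    boolean existsByIndex(long start, long end) {
        return index.contains(new RelKey(start, end));
    }
}

class ExistenceDemo {
    public static void main(String[] args) {
        ExistenceCheck rels = new ExistenceCheck();
        long a = 0;
        for (long b = 1; b <= 1_000_000; b++) {
            rels.add(a, b); // A gets one million outgoing relationships
        }
        long t0 = System.nanoTime();
        boolean scanned = rels.existsByScan(a, 999_999);
        long t1 = System.nanoTime();
        boolean indexed = rels.existsByIndex(a, 999_999);
        long t2 = System.nanoTime();
        System.out.printf("scan=%b in %d us, index=%b in %d us%n",
                scanned, (t1 - t0) / 1000, indexed, (t2 - t1) / 1000);
    }
}
```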


> When having a relationship, it is also cheap to fetch the associated nodes
> (again moving a file pointer to a position, or reading it from the cache). And
> while we are at it, when having a node or a relationship, it is again cheap
> to fetch the properties associated with it.
>
> The motto of Neo4j seems to be: keep it local, stupid.
>
> This works great, unless things are not local and this is where indexing
> comes into play.
>
> Suppose we know a name or a certain value and want to know what nodes or
> relationships it is associated with, doing a local search becomes
> ineffective. We could iterate over all nodes (and/or all relationships) and
> check for that particular value, but that doesn't scale beyond a couple of
> thousand nodes or relationships.
>
That is one use case I need, but BDB does this search in ~0ms, without me
doing any iteration in Java code; in my case, though, the value is either the
string name of a Node or just another node id.

>
> One option could be to do the indexing in the graph. We could create a node
> that can easily be addressed through the reference node, that functions as a
> tree root, and traverse over the index to find a particular node or
> relationship.
>
Did you do this, btw, with SortedTree or something similar within
graph-collections? I admit I only skimmed it superficially at some point, and
noticed an acquireLock() method that caught my attention (unrelated).

>
> It works, but is not as fast as dedicated indexing. A dedicated index will
> fetch index blocks in one read operation and manipulate those index blocks
> in memory, where an index built in Neo4j would model an index block as a set
> of nodes that need to be read one after another (and likely from very
> different places in the store). So a dedicated index is more local than
> Neo4j can be when manipulating the index trees.
>
That Lucene index, i.e. RelationshipIndex, works rather well: I get results in
0 to 1ms with it, similar to BDB, so using it would be a must for me, assuming
I have millions of relationships :)


> A dedicated index will win hands down over Neo4j when it comes to raw speed
> of an index lookup/manipulation, and will likely consume less memory doing so.
>
> Neo4j already supports Lucene, which is great for certain jobs (full text
> indexing, composite queries), but is probably (I would have to run tests to
> verify this assumption) slower than BDB when it comes to simple key-value
> mappings. Lucene is also not very good at handling unicity constraints, an
> area where a more regular key-value store like BDB has the advantage too.
>
From my superficial tests, it seems only 1ms slower than BDB.

>
> All this is just about the tooling level of an application (fun in its own
> right, but it doesn't solve any real problems). Things become more
> interesting when we start looking at an actual application.
>
I kind of agree, but from where I stand this tooling level seems like 90% of
the entire job. I mean, once I have the foundation, it's all about playing
with it and adding the data (nodes), which, depending on their context, can be
real-time programs running on their own.

>
> So my question is, what use cases do you want to solve with your neo4john
> project.
>
Any :)
I don't know; anything that can be built upon a layer based on
connectivity/accessibility, so practically anything, I'd say. We should be
able to model anything (without regard to how many bytes it consumes), and at
the same time the model itself would be the source code, the javadoc and the
binary, all in one :)
 OK, so big words, but I don't know what use cases to think of. I would
imagine my entire operating system built on this; disregard how slow it
would be for now, just imagine it is built upon this tooling. The use cases
and the tooling are practically one, such that you can build another tooling
on top of another tooling, and that tooling can help you build some
application or other tooling...

>
> Your example with buttons on a screen is a bit too high level, because it
> contains a lot more tooling than just neo4j and or BDB. You would need
> presentation (GUI or HTML) and reactiveness (how to respond to input) and
> you would need to somehow model your domain.
>
Yeah, definitely, those would have to be built upon this...
I mean, starting bottom-up like that, we'd see how and what needs to be
built. For example, the Java code could parse a portion of the graph and
execute it as if it were a script, but all the commands, inputs and outputs
would be in the graph domain:
- the code is in the graph, represented by nodes and their rels
- the input could be mapped 1-to-1 from keyboard and mouse directly into the
graph, such that when you press keys they would be appended to some ordered
list, and in-graph programs would be able to listen for events such as an
element being appended to a list, or specifically to this key-buffer list
- the output could be newly created nodes or stuff like that
Basically, we could map computer software, and how it all works right now, on
top of a graph, such that it's more accessible and you could see the
structure. Right now, if you picked a random RAM address, it would be somewhat
tough to tell which program is using it, and within that program which
structure, which function and so on; but if the foundation were a graph, we'd
just parse the parents, i.e. the incoming rels, for that byte, so to speak,
which would likely be just a node, not a byte...
  A graph could be alive and change its structures, just like RAM changes its
contents in response to programs, but with a graph I would be able to see
exactly how it's composed, fiddle with it and, most importantly, understand
it (almost) just by parsing it.
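Here is a rough sketch of the key-buffer idea. It assumes plain Java objects stand in for the in-graph ordered list. ListenableList and onAppend are hypothetical names and are not part of Neo4j. Every append notifies the registered listeners, which is how an "in-graph program" could react to key presses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Stand-in for an in-graph ordered list, e.g. a key buffer fed by the keyboard.
// Hypothetical API; plain Java objects, not nodes and relationships.
class ListenableList<T> {
    private final List<T> elements = new ArrayList<>();
    private final List<Consumer<T>> appendListeners = new ArrayList<>();

    void onAppend(Consumer<T> listener) {
        appendListeners.add(listener);
    }

    void append(T element) {
        elements.add(element);
        for (Consumer<T> listener : appendListeners) {
            listener.accept(element); // notified after the element is in the list
        }
    }

    List<T> contents() {
        return List.copyOf(elements);
    }
}

class KeyBufferDemo {
    public static void main(String[] args) {
        ListenableList<Character> keyBuffer = new ListenableList<>();
        StringBuilder typed = new StringBuilder();
        // An "in-graph program" listening for appends to the key buffer.
        keyBuffer.onAppend(c -> typed.append(c.charValue()));
        for (char c : "hi".toCharArray()) {
            keyBuffer.append(c); // simulated key presses
        }
        System.out.println(typed); // prints "hi"
    }
}
```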

>
> So my suggestion would be to first list a couple of real world scenarios
> you want to solve with your neo4john project and then look at your tooling
> to see what trade-offs you need to make to implement it. You may need a mix
> of Neo4j, Lucene and BDB, but maybe you don't need all three to solve your
> particular problem.
>
By tooling, do you mean Neo4j, Lucene or BDB? Or do you mean the foundation,
i.e. the final program that would be fed some input data and be expected to
produce some output data? In which case I'm not sure which would be the
tooling and which the data, lol.

Real-world scenarios... like, I don't know, making any system, such as a
program (which, say, I'd usually write in Java), but making it on top of
neo4john; that is, neo4john would be the foundation on which I make and base
my program, say a networking chat program, or something that creates a 3D
object. The tooling should be able to, say, interpret the graph and create
what it says. I guess in a way it would be like an accessibility framework,
whatever that means heh, like a 1-to-1 mapping between Java code and graph
nodes, such that manipulating graph nodes would yield some result in the 3D
world or... well, more like the in-graph data being a higher-level mapping on
top of the lower-level Java/OS stuff, for starters...
 That still seems too generic as an example... I don't know what examples to
give you; you try to give an example and I'll see if it should be supported
xD
But generally, if you think about it, we kind of live in a hierarchical world,
and we use our brains to sort of cache it, make a copy of it or of parts of
it, and we even create things in our brains that don't exist in the real
world. I want neo4john to be something along those lines: a framework for
creating stuff which, at the lowest level, is ultimately interconnected
somehow, i.e. you could start from any point in the (big) system and be able
to parse and understand the entire system (sure, it would take you a while,
since you're kind of caching it in your brain in order to understand it).
Anyway, I need this to be a facilitator that lets me design stuff in a
connected way. As I see it, with computers today, stuff that isn't connected
in the computer requires you, the user, to hold the connections in your
brain, so an intelligent computer wouldn't have those connections and would
have to compute them somehow (say, like our brain did).

OK, what was I talking about? :))
Thing is, as you can tell, I cannot yet adequately grasp many things, and
what I want is one of them.
  I just need to know I can have this framework of connectivity upon which I
could build, say, a system or another framework or whatever, as long as I do
not need any external information about how the system works, because I can
deduce how it works by looking at it (you know, bin+src+doc in one).
 So it should probably handle many use cases. Nodes would be symbols for
certain things, i.e. one Node can mean your green car, another the letter a
in a specific font, or just the global letter a. It's pretty much how we use
language: we use a word to mean a lot of things, and that word could be
represented by one node; multiple meanings could be added to it, and the
specific meanings could themselves be represented by other nodes that are
clearly connected to that first node.
  I cannot visualize all of that yet, but maybe going bottom-up I could get
there, having each level be consistent within itself and building upon it,
one system built upon another, heh, quoting matrixes here.

For now, what I intend to do is all that: starting from key->value, well,
actually starting from just Nodes, build one level on another and get to
Pointer, Set, Ordered List, ..., then something like hooks/events, then look
into locks or multiversioning/snapshots. I think I've brainstormed my way out
of locks once or twice without compromising parallelization, while, as
always, keeping consistency on all levels/layers, which is crazy, I know.

I think that once the tooling is done right, the application level is
rather, I don't know, irrelevant somehow, because anything should be
buildable upon/with the tooling... like with Java, for example: once Java is
complete (the JVM, java.exe or whatever), any program can be built upon it...

>
> In any case, it's important to rise above the tooling level, because that
> is only a means to a goal. Even if your project provides additional tooling,
> there is still an application level to it. Focusing on the application level
> is good practice, because only there do you actually provide solutions.
>
Still not sure which is tooling and which is application :)
For example, in this Gmail email that I'm writing, I would think the
application is the email I am writing (all the specific text and data) and
the tooling is the framework allowing me to do so? Or it could be that the
application is the entire email interface, with its various buttons, and the
tooling is the framework that allows it to add those buttons... I lost it :)



>
> Niels
>
>
> > Date: Sun, 31 Jul 2011 15:09:20 +0200
> > From: cyuczie...@gmail.com
> > To: user@lists.neo4j.org
> > Subject: [Neo4j] Brainstorming on my project: neo4john
> >
> > Hey guys,
> >
> > I've been thinking that I would like to have a topic (like this current
> > one) where I would be allowed to post anything related to brainstorming on
> > my project, which is currently a mix of Neo4j and Berkeley DB Java
> > Edition. That is, I would like to start from scratch and explain and
> > explore ideas, where anyone could step in and say what's on their mind,
> > especially with notes on how something would be better with Neo4j rather
> > than Berkeley DB.
> >
> > But I'd like to know if this is a good idea to do here, and if the Neo4j
> > people would allow me to do this here. This would probably mean you'd
> > receive lots of emails with this subject and "Re:" this subject, which you
> > may not want to receive, in which case I would suggest a filter to ignore
> > such emails (easily done within Gmail, for example), but be sure not to
> > filter on the sender, which is always user@lists.neo4j.org for any
> > topic/subject, not just this one. So anyone could potentially ignore the
> > emails I send here, should they be annoyed or should there be too many
> > too soon.
> >   Still, I would not do this unless most (if not all) of you (mainly the
> > Neo4j devs, I'd say) agree to allow me to post here. I would post replies
> > only to this topic... well, you get the idea :)
> > Though you should reserve the right to say stop at any time if you don't
> > want me to post anymore (due to, e.g., too frequent posts, too dumb
> > content, or content that seems like noise and doesn't help anyone); that
> > is, in case you allow me to post :) So if I'm allowed, please reply and
> > say so; otherwise, if there are no replies either way, I will default to
> > `not allowed` and won't try to post anymore :) Be kind, lol.
> >   If I know Peter, and I don't lol, he'd be happy with some brainstorming,
> > I think, right? :) Then what about the others?
> > Btw, if you feel that saying I'm "allowed" would be too much of a
> > responsibility, or would be taking the decision from others, then maybe
> > say that you wouldn't mind whether I posted or not, or that it would make
> > no difference to you. Though the Neo4j guys & girls (i.e. devs) would
> > probably know whether `me posting on this topic would be a good idea, for
> > them and the users of this mailing list`.
> >
> > If you're wondering why I would do this: most importantly, because it
> > helps me to type my thoughts rather than just think them in my head. If I
> > don't type them, I easily get distracted by other things and they end up
> > postponed/abandoned. Expressing my thoughts by typing them seems to bridge
> > the physical and the mental in a way that makes both of them happy to do
> > this, heh.
> >   This might also be helpful to others reading it, unless they get
> > annoyed by my way of writing (which means both of us are at fault, or
> > rather both are the cause of the annoyance) or they get annoyed for other
> > reasons that are still triggered by reading what I write.
> >  I am not good at writing, or at programming for that matter, and I'm
> > aware of this, but I believe that expressing myself in this written form
> > might help (at least) me (and I hope not at the expense of others, e.g.
> > like spam) and will likely, along the way, trigger some progress in me,
> > which, if you ask me, is in everyone's interest: the more people "evolve",
> > the better it is for everyone, no? Yes, good :)
> >
> >   No one is required (or expected by me) to read what I write, btw; but
> > you should know that my subconscious, for some reason, likes knowing that
> > someone did read it and got beneficial results from it, i.e. got something
> > positive rather than negative (though any change is progress, except,
> > e.g., if you build a system on top of that saying that a `counter` must
> > increase for it to be considered progressing. Then, while any change is
> > progress at the lower level, even if the counter decreases, at this higher
> > level a decreasing counter is no longer considered progress. But then
> > again, at an even higher level, over time the counter could increase by 10
> > and then decrease by 10, so that it would seem to be oscillating, and this
> > would be considered no progress; rather, it would be considered constant,
> > unless the oscillation amplitude changed, e.g. the counter increased by 15
> > and then decreased by 15 over time, which would be progress as considered
> > at this level).
> >   So while I am sort of waiting for a "good enough" reason to hack my own
> > subconscious and change it (assuming that's possible; hey, neuroplasticity
> > would say so, heh) such that I wouldn't need to express my thoughts in
> > writing, or to feel empowered knowing that others are reading them, I am
> > going along with what seems to be the next feel-empowered step... (kinda
> > forgot what I wanted to say here xD)
> >   Also, the subconscious (not just mine, I'd say) likes to know that it did
> > something; it needs a foundation for allowing itself to feel empowered.
> > Having done something in the physical world that it is proud of can be
> > used as a permission slip to allow itself to feel happy, or rather
> > empowered, about it; so in this respect, writing my thoughts here also
> > helps with that. :)
> >   There's also some inherent desire to share, so by writing here I am
> > sharing my thoughts, me... this also works well as a permission slip for
> > my subconscious :P I mean, hi :)
> >
> > Thing is, I am not yet sure which would be better suited for me to use,
> > Neo4j or Berkeley DB. I like Neo4j's features, especially the
> > transactions, though I do not understand/know its limitations, and I'm
> > not sure I need all its features; some of them seem like overkill (e.g.
> > properties: I'd need one, i.e. key="name", but only for some (few)
> > nodes). On the other hand, I like the simplicity of Berkeley DB, upon
> > which I've built part of the project so far (btw, the project is here:
> > https://github.com/13th-floor/neo4john ).
> >
> > But the brainstorming would be a bit more general, I think: I'd be
> > stating and following my thoughts about what I'd like to have done and
> > how I'd do it, but eventually limiting them to Java and the layers I want
> > to build them upon (i.e. Neo4j or BDB).
> >
> > I've been trying a bottom-up approach with my project: basically, I want
> > the lowest accessible level to be a Symbol, or rather a Node as you'd say,
> > such that I would know whether it's connected to anything or not, rather
> > than how RAM "works", i.e. like an array.
> >
> > Typically, I want to be able to know whether any part of the system is
> > connected to any other parts, with the purpose of exploring the system and
> > understanding how it works. OK, too generic, let's take an example,
> > probably a bad one: say you are now in your browser, as I am here, and I
> > see the buttons Send, Save Now and Discard on the page, and I decide to
> > investigate how they work, where the text inside them is, and, if that
> > text is changed, whether the button will enlarge automatically or the text
> > will be cut off because the button stays the same size. These are the
> > kinds of questions I should be able to answer when using a system built on
> > the `system that I want to design` (= my project).
> >   And also questions like: how do I add another button? By checking how
> > the other buttons are added; likely they would be part of a list which
> > currently has those 3 buttons as children, but those 3 children are in
> > fact Nodes which identify the buttons. Any other details related to the
> > buttons are deeper, either children of those nodes, or simply children of
> > a parent which refers to the identifying node and to a node that
> > identifies the details for that button node... OK, this is way too vague,
> > so ignore it.
> >   Sure, you'd say there is some Firefox plugin which can show me info
> > about those buttons and stuff, but that took time to make. I need that
> > accessibility now, not after I make the plugin; lucky that someone made
> > it, right? :) But think of how many things you already want to be able to
> > access in a manner that both you and your computer understand, but which
> > are only accessible to the computer, and even then only in a way that
> > lets the computer execute them, not necessarily understand them (if
> > computers had intelligence).
> >
> > But seriously xD, my intent is to have accessibility. I'd probably need
> > just a simple 3D tool to parse nodes and keep track of them temporarily
> > while I'm studying some system... In fact, I could be studying the system
> > that is the program for this very 3D tool I use to study it, such that I
> > could customize it on the fly (sure, by mistakenly removing some node I
> > could cut off my own access, e.g. removing the keyboard/mouse from the
> > inputs would render the 3D tool idle and I couldn't undo that action, but
> > safeguards could always be implemented: some features could be
> > non-modifiable, or a backup 3D tool could take over, if it was set up as a
> > safeguard while working on the real one, ...)
> >
> > ================== some brainstorming or something:
> > What I have done so far, with Berkeley DB:
> > Typically, I wanted an easily identifiable entity, a Symbol, or we can
> > simply call it a Node.
> > This Node is uniquely identifiable in the system by its long (i.e. Java
> > Long) identifier; this is how it is identified inside Berkeley DB (Neo4j
> > does this too, a long is used for an ID, as I deduced).
> > So at this lowest level there can be 0 or more nodes, and no two nodes
> > can be duplicates: two equal IDs just mean you're referring to the same
> > node.
> > This would be, say, Level 0.
> > ------------------
> > At the next level, Level 1, any two nodes could be grouped together such
> > that one of them would be first and the other second (or last); in other
> > words, this is like an ordered list of 2 elements (Nodes), which you in
> > Neo4j call a relationship.
> > Node A --> Node B
> > differs from
> > Node B --> Node A
> > that is, they are two different relationships, clearly indicating which
> > node is first.
> >
> >   Neo4j here introduces a new element (previously Node was the only one)
> > called Relationship, assigns it a long aka identifier, so that this
> > relationship is also uniquely identifiable, and associates the two nodes
> > with this relationship. Btw, I don't know this for sure, but I am guessing
> > this is what Neo4j does, i.e. the relationship ID is part of its storage
> > and not just a construct made only for Java methods to use.
> >
> >    Anyway, what I did with Berkeley DB: I've got two primary databases,
> > which are always just key->value, where you can look up data by key and
> > get back the value associated with it.
> > Since you can't look up by value, I had to create that second primary
> > database (not a secondary database, because that implies a symmetric
> > key-value mapping, that is, for any key only one value would be associated
> > with it, so A->B and A->C could not both exist with a primary database and
> > a secondary database in Berkeley DB).
> >
> > forwardDB:
> > firstNode -> secondNode
> >
> > backwardDB:
> > secondNode -> firstNode
> >
> > That's the format of the databases' contents.
> >
> > A key can exist multiple times, but it will be seen as one, e.g.
> > A->B
> >   ->D
> >   ->C
> > X->F
> >   ->B
> > Y->C
> >
> > So basically, a key with multiple values can be seen as a set of those
> > values.
> >   The values are in no (user-)defined order (though they are internally
> > stored sorted by Berkeley DB, with the right settings, for faster lookups:
> > you can actually ask "is A->C in the database?" and it will return true if
> > so).
> >   The values can be iterated with a Cursor (BDB cursor), though, if I
> > remember well, you can't jump to the last value of a specific key (though
> > you can jump to the last value in the database); I wasn't too happy about
> > this lack of functionality, but I lived :)
> >
> > So this was my way of storing relationships, and I would always use both
> > the start and end node in order to identify a relationship.
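Here is a minimal in-memory Java sketch of the forwardDB/backwardDB scheme. It uses TreeMap/TreeSet instead of Berkeley DB, so it says nothing about BDB's own API or persistence. Every add writes both directions, and a pair is stored at most once. The values stay sorted, so "is A->C there?" is a direct lookup. With a TreeSet, jumping to the last value for a key is just `last()`.

```java
import java.util.Collections;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

// In-memory stand-in for forwardDB and backwardDB: each maps a key to a sorted
// set of values, so A->B is stored at most once and values are kept sorted.
class TwoWayRelStore {
    private final TreeMap<Long, TreeSet<Long>> forward = new TreeMap<>();  // firstNode -> secondNode
    private final TreeMap<Long, TreeSet<Long>> backward = new TreeMap<>(); // secondNode -> firstNode

    // Writes both directions; returns false if first->second already existed.
    boolean add(long first, long second) {
        boolean added = forward.computeIfAbsent(first, k -> new TreeSet<>()).add(second);
        backward.computeIfAbsent(second, k -> new TreeSet<>()).add(first);
        return added;
    }

    // Removes both directions; returns false if first->second did not exist.
    boolean remove(long first, long second) {
        TreeSet<Long> values = forward.get(first);
        if (values == null || !values.remove(second)) {
            return false;
        }
        if (values.isEmpty()) {
            forward.remove(first);
        }
        TreeSet<Long> keys = backward.get(second);
        keys.remove(first);
        if (keys.isEmpty()) {
            backward.remove(second);
        }
        return true;
    }

    // "Is A->C in the database?" answered by a sorted-set lookup, no iteration.
    boolean exists(long first, long second) {
        TreeSet<Long> values = forward.get(first);
        return values != null && values.contains(second);
    }

    NavigableSet<Long> children(long first) {
        return Collections.unmodifiableNavigableSet(forward.getOrDefault(first, new TreeSet<>()));
    }

    NavigableSet<Long> parents(long second) {
        return Collections.unmodifiableNavigableSet(backward.getOrDefault(second, new TreeSet<>()));
    }
}

class TwoWayDemo {
    public static void main(String[] args) {
        TwoWayRelStore db = new TwoWayRelStore();
        long a = 1, b = 2, c = 3, d = 4, f = 6, x = 24, y = 25;
        db.add(a, b); db.add(a, d); db.add(a, c);
        db.add(x, f); db.add(x, b);
        db.add(y, c);
        System.out.println(db.children(a));        // prints [2, 3, 4]
        System.out.println(db.parents(b));         // prints [1, 24]
        System.out.println(db.exists(a, c));       // prints true
        System.out.println(db.children(a).last()); // prints 4
    }
}
```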
> >
> > I am not yet sure why I decided this way is better than having a
> > Relationship entity, uniquely identifiable by its long, with its ID
> > associated with each of the nodes. I guess I never needed to refer to a
> > relationship more than once, and having a RelationshipID in the middle
> > seemed like extraneous info.
> > Not to mention that it wouldn't feel like the Node is the (only) element
> > anymore; it would be both Node and Relationship. Still, if it's really
> > needed, I will consider it, but for now I didn't believe it would be.
> >
> > ------------------
> > There is actually another side level, parallel to Level 0 here: from
> > Java, we need to refer to the same Node across program restarts.
> > E.g. on the first run we create a unique node, and we don't know what its
> > ID is going to be, since, e.g., other nodes may already be present in the
> > database. So on the next application restart, how would we know how to
> > get the same node that we created on the last run?
> >
> > One particular case where this would not be needed at first is when you
> > know the database is empty and you specify the ID of the node to be
> > created, e.g. create or get the node with ID = 1038.
> > This might work at first, but it's a very bad idea to use raw IDs like
> > that; rather, just let BDB give me the next available ID, e.g. by using a
> > Sequence.
> >
> > So then another layer was needed, associating a String1 with a NodeID;
> > this is similar to adding a Neo4j index entry with key="name" and
> > value=String1.
> > This String1 would then be considered the name of the node. It has no
> > other purpose than allowing the in-Java program to access the same node
> > across application restarts; no data is supposed to be stored in it.
> > Thus, here in Berkeley DB, I made a primary and a secondary database,
> > such that I can ask the questions:
> > 1) what is the name of the node with this NodeID?
> > 2) what is the NodeID of the node with this given name?
> > This is a database that acts like a HashMap (formed of a primary and a
> > secondary database), a HashMap that doesn't accept nulls.
> >
> > You can call this the NamingLevel :) That is, just in case I or you need
> > to refer to it later using just one word rather than a phrase.
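Here is the NamingLevel as a small Java sketch. It assumes two HashMaps kept in sync, plus an AtomicLong standing in for a BDB Sequence. The class and method names are hypothetical. The sketch is in-memory and not thread-safe, so unlike the real thing it would not survive restarts. It only shows the two lookups and the rejection of nulls.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicLong;

// In-memory sketch of the NamingLevel: two maps kept in sync, plus a counter
// standing in for a BDB Sequence. A real version must persist all three so that
// names survive application restarts. Not thread-safe.
class NamingLevel {
    private final AtomicLong sequence;
    private final Map<String, Long> idByName = new HashMap<>();
    private final Map<Long, String> nameById = new HashMap<>();

    NamingLevel(long firstId) {
        this.sequence = new AtomicLong(firstId);
    }

    // Returns the node already bound to this name, or allocates the next id for it.
    long getOrCreate(String name) {
        Objects.requireNonNull(name, "names must not be null");
        Long existing = idByName.get(name);
        if (existing != null) {
            return existing;
        }
        long id = sequence.getAndIncrement();
        idByName.put(name, id);
        nameById.put(id, name);
        return id;
    }

    // Question 2: what is the NodeID of the node with this name?
    Optional<Long> idOf(String name) {
        return Optional.ofNullable(idByName.get(name));
    }

    // Question 1: what is the name of the node with this NodeID?
    Optional<String> nameOf(long id) {
        return Optional.ofNullable(nameById.get(id));
    }
}

class NamingDemo {
    public static void main(String[] args) {
        NamingLevel names = new NamingLevel(1);
        long keyBuffer = names.getOrCreate("keyBuffer");
        System.out.println(keyBuffer);                           // prints 1
        System.out.println(names.getOrCreate("keyBuffer"));      // prints 1 (same node)
        System.out.println(names.nameOf(keyBuffer).orElse("?")); // prints keyBuffer
    }
}
```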
> >
> > --------------
> > Level 2 here would of course be based on Level 1.
> > Since we have that kind of grouping in Level 1, we can treat a Node as a
> > set of 0 or more elements (which are Nodes):
> > we can think of a node as being a parent of a bunch of nodes, simply
> > because it is the first node (i.e. key) in a bunch of relationships,
> > and we can think of that same node as being a child in one or more
> > relationships, or being pointed to by those nodes, because it is the end
> > node (second node) of those relationships.
> >
> > So here we can pretty much define some Java entities like Pointer and
> > Set, and make sure that in Java they are limited to what they do,
> > e.g. a Pointer should be able to have 0 or 1 children, not more, and the
> > user shouldn't be allowed to add more.
> > But of course this doesn't stop a user from using the underlyingNode
> > directly and adding children with the Level 1 methods on it (I'm
> > borrowing this naming from Niels's graph-collections; it seems easier to
> > understand).
> >   Basically, the Pointer class would sit on top of a Node, the
> > underlyingNode.
> > The problem here is that, although we've defined a new level/layer, the
> > limitations it enforces can be bypassed by going directly to the level
> > below it, Level 1 that is, which can easily invalidate it as a Pointer.
> >   So I am not yet sure how I would deal with this issue. Would I check
> > whether the Pointer is still valid whenever any of its methods are used?
> > And if it's invalid (e.g. it now has 3 children instead of the 0 or 1
> > limit), then what do I do? Throw an exception, I guess?
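Here is one way to answer the "check whenever any of its methods are used" question, as a Java sketch over a minimal in-memory Level 1 store. Level1Store, Pointer and their methods are illustrative names only. The Pointer re-validates its underlying node on every call. It throws IllegalStateException once Level 1 has given that node more than one child.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Minimal Level 1: node -> its children (end nodes of its outgoing relationships).
class Level1Store {
    private final Map<Long, Set<Long>> children = new HashMap<>();

    void link(long parent, long child) {
        children.computeIfAbsent(parent, k -> new LinkedHashSet<>()).add(child);
    }

    void unlink(long parent, long child) {
        Set<Long> set = children.get(parent);
        if (set != null) {
            set.remove(child);
        }
    }

    // Returns a snapshot, so callers can unlink while iterating it.
    Set<Long> childrenOf(long parent) {
        return Set.copyOf(children.getOrDefault(parent, Set.of()));
    }
}

// Level 2 view: a Pointer is just an underlying node that must have 0 or 1 children.
class Pointer {
    private final Level1Store store;
    private final long underlyingNode;

    Pointer(Level1Store store, long underlyingNode) {
        this.store = store;
        this.underlyingNode = underlyingNode;
        checkValid();
    }

    // Re-checked on every call, since Level 1 can be modified directly, bypassing this class.
    private Set<Long> checkValid() {
        Set<Long> current = store.childrenOf(underlyingNode);
        if (current.size() > 1) {
            throw new IllegalStateException("node " + underlyingNode + " has "
                    + current.size() + " children and is no longer a valid Pointer");
        }
        return current;
    }

    Optional<Long> get() {
        return checkValid().stream().findFirst();
    }

    // Points at a new node; null clears the pointer.
    void set(Long pointee) {
        for (long old : checkValid()) {
            store.unlink(underlyingNode, old);
        }
        if (pointee != null) {
            store.link(underlyingNode, pointee);
        }
    }
}

class PointerDemo {
    public static void main(String[] args) {
        Level1Store store = new Level1Store();
        Pointer p = new Pointer(store, 10);
        p.set(2L);
        System.out.println(p.get()); // prints Optional[2]
        store.link(10, 3); // Level 1 used directly: node 10 now has 2 children
        try {
            p.get();
        } catch (IllegalStateException e) {
            System.out.println("invalid: " + e.getMessage());
        }
    }
}
```

As the text above anticipates, the exception surfaces on the next Pointer call, not at the line that broke the Pointer.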
> >  What about knowing when, and by which code, the invalidation was done?
> > E.g. put a hook on that Node such that, when the user tries to attach more
> > than 1 child (just one example of invalidating the Pointer), the hook
> > throws at that point, so the offending code shows up in the stack trace.
> >  The funny thing is that, to build such a hook system (which is
> > definitely a must at levels higher than the current Level 2), one must
> > make use of levels like the one I am trying to define here, namely
> > Pointers, Sets and even ordered lists. So it seems I need to implement
> > these levels first, without hooks; then, when done with them, build the
> > hook system on top of them; and then, when this hook system is complete,
> > rebuild Pointers, Sets and ordered lists again as even higher levels which
> > this time make use of the hook system and are able to detect code that
> > tries to invalidate the Pointer by working at lower levels directly
> > (unless, of course, that code works at a level below the hook system,
> > heh; then, hmm, can't think... need to get there first :/ )
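Here is a sketch of the hook variant. The hooks are implemented in plain Java below the graph, which sidesteps the bootstrapping problem above rather than solving it. BeforeLinkHook, HookedStore and AT_MOST_ONE_CHILD are hypothetical names. The hook runs before the write. So the exception is thrown inside the offending link(...) call, its stack trace points at the culprit, and nothing invalid is stored.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical hook: runs before a relationship parent->child is created and may veto it.
@FunctionalInterface
interface BeforeLinkHook {
    void beforeLink(Set<Long> currentChildren, long parent, long child);
}

class HookedStore {
    private final Map<Long, Set<Long>> children = new HashMap<>();
    private final Map<Long, List<BeforeLinkHook>> hooks = new HashMap<>();

    void addHook(long node, BeforeLinkHook hook) {
        hooks.computeIfAbsent(node, k -> new ArrayList<>()).add(hook);
    }

    Set<Long> childrenOf(long parent) {
        return Set.copyOf(children.getOrDefault(parent, Set.of()));
    }

    void link(long parent, long child) {
        Set<Long> current = childrenOf(parent);
        // Hooks run before the write, so a veto's stack trace points at the caller
        // that attempted the invalid change, and nothing is stored.
        for (BeforeLinkHook hook : hooks.getOrDefault(parent, List.of())) {
            hook.beforeLink(current, parent, child);
        }
        children.computeIfAbsent(parent, k -> new LinkedHashSet<>()).add(child);
    }
}

class PointerHooks {
    // Enforces the Pointer rule: at most one distinct child.
    static final BeforeLinkHook AT_MOST_ONE_CHILD = (current, parent, child) -> {
        if (!current.isEmpty() && !current.contains(child)) {
            throw new IllegalStateException("node " + parent
                    + " is a Pointer and already points to " + current);
        }
    };
}

class HookDemo {
    public static void main(String[] args) {
        HookedStore store = new HookedStore();
        store.addHook(10, PointerHooks.AT_MOST_ONE_CHILD);
        store.link(10, 2);
        try {
            store.link(10, 3); // vetoed; the stack trace points at this line
        } catch (IllegalStateException e) {
            e.printStackTrace();
        }
        System.out.println(store.childrenOf(10)); // prints [2]
    }
}
```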
> >
> > a Set again, same thing, a node would be use to identify this set, that
> node
> > would be underlayingNode, but this Set thingy is just a wrapper in java
> on
> > top of a Node, basically it will just allow the java coder/user to treat
> a
> > node as a set from within java, but that node would still act as if it's
> > just a simple start node that is, a node having multiple outgoing
> > relationships:
> > A->B
> > A->C
> > A->E
> > A->D
> > (the order of children is undefined, btw; it is a set, not an ordered
> > list, not at this level anyway)
> > so A would be a Set, and B, C, E, D are its elements.
> > This is how it can be seen from Java; or it can be seen as it is stored
> > too, just as groups of start and end nodes.
> >
> > So this Level 2 thingy is more of a level in Java than a real level, as
> > Level 1 and Level 0 are.
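A minimal sketch of such a Java-side wrapper, assuming a hypothetical in-memory Level 1 (`SetGraph` stands in for the real BDB or Neo4j store). `TreatAsSet` holds nothing but its underlayingNode: add, remove and contains are just link, unlink and link lookup on that node, so in the stored graph the set is still only A's outgoing links.

```java
import java.util.*;

/** Hypothetical in-memory stand-in for Level 1: directed parent->child links, no duplicates. */
final class SetGraph {
    private final Map<String, Set<String>> children = new HashMap<>();

    boolean link(String parent, String child) {
        return children.computeIfAbsent(parent, k -> new HashSet<>()).add(child);
    }

    boolean unlink(String parent, String child) {
        Set<String> kids = children.get(parent);
        return kids != null && kids.remove(child);
    }

    boolean hasLink(String parent, String child) {
        return children.getOrDefault(parent, Collections.emptySet()).contains(child);
    }

    Set<String> childrenOf(String parent) {
        return Collections.unmodifiableSet(children.getOrDefault(parent, Collections.emptySet()));
    }
}

/** Java-side view of a node as a set: the node's children are the set's elements. */
final class TreatAsSet {
    private final SetGraph graph;
    private final String underlayingNode; // the node that identifies this set

    TreatAsSet(SetGraph graph, String underlayingNode) {
        this.graph = graph;
        this.underlayingNode = underlayingNode;
    }

    /** Adds an element; returns false if it was already present (sets hold no duplicates). */
    boolean add(String element) {
        return graph.link(underlayingNode, element);
    }

    boolean remove(String element) {
        return graph.unlink(underlayingNode, element);
    }

    boolean contains(String element) {
        return graph.hasLink(underlayingNode, element);
    }

    /** Elements in no particular order, as with A->B, A->C, A->E, A->D. */
    Set<String> elements() {
        return graph.childrenOf(underlayingNode);
    }
}
```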
> >
> > Also, on this level, Pointer to Domain and DomainSet can be defined.
> > DomainPointer: a pointer that is only allowed to `point-to / have` a
> > child which is a child of a certain Set D, the domain,
> > i.e. P->A
> > D->{A,C,B,X}
> > P is the pointer, A is the pointee, and D is the domain; notice that A is
> > a child of D.
> > So DomainPointer would enforce having a pointee only from that domain D.
> >
> > There are also some variants: for Pointer and DomainPointer, are they
> > allowed to be null, i.e. have no children / point to nothing?
> >
> > DomainSet is a Set whose children must be children of a domain D:
> > DS->{A,X,C}
> > D->{A,C,B,X}
> >
> > Also note: a set cannot contain duplicates (almost forgot to say that).
> > If duplicates are needed, an intermediary node and a special format
> > would be expected from that node,
> > i.e.
> > S->h->A
> > where h is an intermediary node.
> > If such a set is ever defined, it would probably only make sense when
> > having an ordered list; that is, having duplicate elements would make
> > sense in a list, ordered or not (an unordered list, or a list where you
> > don't care about the order, can be based on an ordered list).
> > An Ordered List would be defined differently, as a doubly linked list
> > with head/tail and an ElementCapsule (aka `entry` as seen in
> > graph-collections by Niels, which is here btw:
> > https://github.com/peterneubauer/graph-collections ) which has next/prev
> > and a pointer to the real element; this makes it easy to support
> > duplicates.
> >
> > But in an ordered list I should also be able to specify whether the list
> > can contain null elements (i.e. entry.element has no children), or
> > whether it can contain elements only from a specific domain D.
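A minimal sketch of that list shape. To keep it self-contained, it models the nodes as plain Java objects rather than storing them in a graph. `CapsuleList` and its `allowNull`/`domain` options are hypothetical names. Each ElementCapsule has prev/next and a reference to the real element, which is why the same element can sit in several capsules.

```java
import java.util.*;

/**
 * Sketch of the ordered-list shape: head/tail plus one ElementCapsule (entry)
 * per position, each with prev/next and a pointer to the real element.
 */
final class CapsuleList {
    /** One capsule per position; the same element may appear in several capsules. */
    static final class ElementCapsule {
        ElementCapsule prev, next;
        final String element; // null means "points to nothing"

        ElementCapsule(String element) {
            this.element = element;
        }
    }

    private ElementCapsule head, tail;
    private final boolean allowNull;
    private final Set<String> domain; // null: any element allowed

    CapsuleList(boolean allowNull, Set<String> domain) {
        this.allowNull = allowNull;
        this.domain = domain;
    }

    ElementCapsule append(String element) {
        if (element == null && !allowNull) {
            throw new IllegalArgumentException("null elements not allowed");
        }
        if (element != null && domain != null && !domain.contains(element)) {
            throw new IllegalArgumentException(element + " is outside the domain");
        }
        ElementCapsule ec = new ElementCapsule(element);
        if (tail == null) {
            head = tail = ec;
        } else {
            tail.next = ec;
            ec.prev = tail;
            tail = ec;
        }
        return ec;
    }

    /** Unlinks a capsule; assumes ec belongs to this list. */
    void remove(ElementCapsule ec) {
        if (ec.prev != null) ec.prev.next = ec.next; else head = ec.next;
        if (ec.next != null) ec.next.prev = ec.prev; else tail = ec.prev;
        ec.prev = ec.next = null;
    }

    List<String> toList() {
        List<String> out = new ArrayList<>();
        for (ElementCapsule ec = head; ec != null; ec = ec.next) out.add(ec.element);
        return out;
    }
}
```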
> >
> > Back to the set with intermediary nodes: I don't see the need for it,
> > but if needed, there are 3 variants:
> > 1. the set always uses intermediary nodes, even if nodes are not dups
> > 2. the set only uses intermediary nodes for dup nodes (never tried)
> > 3. the set never uses intermediary nodes, which means no dups are
> > supported
> > There may be other ways to define these, but they are more complex,
> > i.e. not using intermediary nodes but storing a counter for duplicate
> > nodes somewhere else (neo4j could easily add a counter property on the
> > relationship S->A itself: if A is in the set twice, that very
> > relationship would contain counter=2; all other rels would either have
> > no counter, which means 1, or have counter set to 1).
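The counter-on-the-relationship variant can be sketched like this. The sketch assumes one S->element link per distinct element and an optional `counter` property on that link; both are modelled with plain collections. `CountedLinkSet` is a hypothetical name, not a Neo4j type. A link without the property counts as 1, matching the convention described above.

```java
import java.util.*;

/** Sketch: one S->element link per distinct element, plus an optional "counter" property on it. */
final class CountedLinkSet {
    private final Set<String> links = new LinkedHashSet<>();          // S->element relationships
    private final Map<String, Integer> counterProp = new HashMap<>(); // "counter" property on a link

    /** Counter as read from the link: no link = 0, link without property = 1. */
    int count(String element) {
        if (!links.contains(element)) return 0;
        return counterProp.getOrDefault(element, 1);
    }

    void add(String element) {
        if (links.add(element)) return; // new link, no counter property needed
        counterProp.put(element, count(element) + 1);
    }

    boolean remove(String element) {
        int c = count(element);
        if (c == 0) return false;
        if (c == 1) {
            links.remove(element);
            counterProp.remove(element);
        } else if (c == 2) {
            counterProp.remove(element); // back to the implicit counter of 1
        } else {
            counterProp.put(element, c - 1);
        }
        return true;
    }

    int relationshipCount() {
        return links.size();
    }
}
```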
> >
> > In case 1, an intermediary node is always expected, so it's easy to
> > know.
> > In case 2, you don't really know whether the node h in S->h->A is an
> > intermediary node or the element itself,
> > i.e.
> > S->C
> > S->h->A
> > S->B
> > so you can tag h with another node called
> > AllIntermediariesForSetsWithDups,
> > i.e.
> > AllIntermediariesForSetsWithDups->h
> > AllIntermediariesForSetsWithDups->x
> > S->C
> > S->h->A
> > S->B
> > B->x->A
> > and make sure that all intermediary nodes are unique, i.e. created on
> > the fly when S.add(A) is executed, and simply don't make a set which uses
> > another set's intermediary nodes as its elements, i.e. C->h when
> > S->h->A, because that way h in C->h could be considered an intermediary
> > node.
> > To avoid this, you can also do
> > AllSetsWithDups->S
> > AllSetsWithDups->B
> > AllSetsWithDups->C
> > such that, when adding an element with C.add(h), you would solve this
> > system of 2 equations:
> > { AllSetsWithDups->(X*)->h
> > { AllIntermediariesForSetsWithDups->h
> > and if you find an (X*), i.e. S in our case, then you know that h is
> > part of another set as its intermediary node, and thus avoid using it,
> > i.e. throw.
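A sketch of the case-2 bookkeeping and the two-equation check, assuming a hypothetical in-memory graph that indexes both children and parents (`DupSetGraph` and the generated `#h0`, `#h1`, ... node names are illustrative). `addDuplicate` creates a fresh intermediary node each time, so none is ever shared. `add` refuses a node h for which some X satisfies AllSetsWithDups->X->h while AllIntermediariesForSetsWithDups->h also holds.

```java
import java.util.*;

/** Sketch of sets with dups via intermediary nodes, plus the "is h someone's intermediary?" check. */
final class DupSetGraph {
    static final String ALL_INTERMEDIARIES = "AllIntermediariesForSetsWithDups";
    static final String ALL_SETS_WITH_DUPS = "AllSetsWithDups";

    private final Map<String, Set<String>> children = new HashMap<>();
    private final Map<String, Set<String>> parents = new HashMap<>();
    private int nextId = 0;

    void link(String parent, String child) {
        children.computeIfAbsent(parent, k -> new LinkedHashSet<>()).add(child);
        parents.computeIfAbsent(child, k -> new LinkedHashSet<>()).add(parent);
    }

    boolean hasLink(String parent, String child) {
        return children.getOrDefault(parent, Collections.emptySet()).contains(child);
    }

    void registerSetWithDups(String set) {
        link(ALL_SETS_WITH_DUPS, set);
    }

    /** Stores another copy of element via a fresh intermediary node: set->h->element. */
    String addDuplicate(String set, String element) {
        String h = "#h" + (nextId++); // created on the fly, so never shared between sets
        link(ALL_INTERMEDIARIES, h);
        link(set, h);
        link(h, element);
        return h;
    }

    /** Adds element directly (set->element), refusing any set's intermediary node. */
    void add(String set, String element) {
        if (isIntermediaryOfADupSet(element)) {
            throw new IllegalArgumentException(element + " is an intermediary node of another set");
        }
        link(set, element);
    }

    /** Solves { AllSetsWithDups->(X*)->h , AllIntermediariesForSetsWithDups->h }. */
    boolean isIntermediaryOfADupSet(String h) {
        if (!hasLink(ALL_INTERMEDIARIES, h)) return false;
        for (String x : parents.getOrDefault(h, Collections.emptySet())) {
            if (hasLink(ALL_SETS_WITH_DUPS, x)) return true; // found X*
        }
        return false;
    }
}
```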
> >
> > limitations are the way to define systems
> >
> > So since I don't intend to use these yet (I think), I will skip further
> > brainstorming on sets with intermediary nodes; but intermediary nodes
> > will be used in some places later.
> >
> > This Level 2 only really makes sense if defining these (Pointer, Set,
> > DomainPointer, DomainSet and their variants, allowing null or not)
> > involves storing metadata about them in the "graph", such that a
> > Pointer could store a link/relation from a parent to it, i.e.
> > AllPointers -> P1
> > This way you know that P1 is a pointer; otherwise only the Java code
> > would know that P1 is a pointer, by checking its underlayingNode == P1.
> > But this in itself can be seen as a Set: AllPointers is a set whose
> > elements identify pointers as defined in this Level 2 as Pointer.
> > So this would mean I'd happily make use of Set, which is also defined
> > here; but then, if you think about it, adding this kind of metadata sort
> > of requires that non-metadata Pointer/Set etc. be defined first, and
> > that these be based upon those.
> > That is, if eventually DomainSet uses a Pointer to point to the Domain,
> > that data must be known to anyone trying to identify what "type" the
> > node (the underlayingNode of the DomainSet) is without knowing anything
> > about it, just by checking its parents...
> >  Say two Java programs are using the same environment (which I'm not
> > yet sure how to implement, due to the need for isolation/serializability
> > ... we'll see; though bdb supports opening the same environment/databases
> > from two or more Java programs at the same time, even with common
> > caching, and it is an embedded db), and one of them is just exploring:
> > it should be able to tell what this node is treated as, i.e. DomainSet,
> > DomainPointer, Pointer...
> > This will happen at higher levels, though, once such metadata is added.
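The following sketch shows how an exploring program could tell what a node is treated as by reading only in-graph metadata of the AllPointers->P1 kind. `AllPointers` comes from the message. `AllSets`, `AllDomainPointers`, `AllDomainSets` and the class `TypedGraph` are hypothetical names for this example.

```java
import java.util.*;

/** Sketch: type metadata kept in-graph as links from category nodes, e.g. AllPointers->P1. */
final class TypedGraph {
    // Only AllPointers appears in the text; the other category node names are illustrative.
    static final List<String> CATEGORIES =
        Arrays.asList("AllPointers", "AllSets", "AllDomainPointers", "AllDomainSets");

    private final Map<String, Set<String>> parents = new HashMap<>();

    void link(String parent, String child) {
        parents.computeIfAbsent(child, k -> new HashSet<>()).add(parent);
    }

    /** What another program exploring the graph can learn from the node's parents alone. */
    List<String> treatedAs(String node) {
        Set<String> ps = parents.getOrDefault(node, Collections.emptySet());
        List<String> kinds = new ArrayList<>();
        for (String category : CATEGORIES) {
            if (ps.contains(category)) kinds.add(category);
        }
        return kinds;
    }
}
```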
> >
> > btw, this is a graphviz picture of what a DomainSet would look like if it
> > had in-graph metadata:
> >
> > https://github.com/13th-floor/neo4john/raw/master/diagrams/level1%20potential%20domainset.png
> >
> > So it's not a bad idea to first have TreatAs_X classes in Java (where
> > X==Pointer, for example), such that they are just wrappers on top of
> > Level 1, without any metadata stored in-graph, and then use these to
> > define Pointer and Set etc. with in-graph metadata as above. And they
> > could simply check, on each method call, whether they are still valid,
> > so as to avoid problems, or warn early when they detect they are no
> > longer valid (due to, e.g., the user doing Level 1 changes directly).
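A minimal sketch of a TreatAs wrapper that re-validates itself on every call. It assumes a hypothetical Level 1 stand-in (`RawGraph`) that other code can modify directly. `TreatAsPointer` stores no metadata in-graph; it throws an exception as soon as it notices that its underlayingNode has more than one child.

```java
import java.util.*;

/** Hypothetical Level 1 stand-in that any code can modify directly. */
final class RawGraph {
    private final Map<String, Set<String>> children = new HashMap<>();

    void link(String parent, String child) {
        children.computeIfAbsent(parent, k -> new LinkedHashSet<>()).add(child);
    }

    void unlinkAll(String parent) {
        children.remove(parent);
    }

    Set<String> childrenOf(String parent) {
        return Collections.unmodifiableSet(children.getOrDefault(parent, Collections.emptySet()));
    }
}

/** Wrapper with no in-graph metadata that re-validates itself on every method call. */
final class TreatAsPointer {
    private final RawGraph g;
    private final String underlayingNode;

    TreatAsPointer(RawGraph g, String underlayingNode) {
        this.g = g;
        this.underlayingNode = underlayingNode;
    }

    /** Fails fast if Level 1 changes left the node with more than one child. */
    private Set<String> checkedChildren() {
        Set<String> kids = g.childrenOf(underlayingNode);
        if (kids.size() > 1) {
            throw new IllegalStateException(
                "Pointer " + underlayingNode + " is invalid: children " + kids);
        }
        return kids;
    }

    /** Returns the pointee, or null if the pointer points to nothing. */
    String pointee() {
        Set<String> kids = checkedChildren();
        return kids.isEmpty() ? null : kids.iterator().next();
    }

    void pointTo(String target) {
        checkedChildren();
        g.unlinkAll(underlayingNode);
        if (target != null) g.link(underlayingNode, target);
    }
}
```

Compared with a hook, this detects the problem only on the next call through the wrapper. The stack trace therefore points at the reader rather than at the code that made the Level 1 change.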
> >
> > Though I am not very happy about the user's ability to use Level 1
> > directly and change/invalidate constructs defined at higher levels, even
> > though at about Level 6 we could define some hook/event layer which could
> > prevent such invalid changes and directly pinpoint the Java code blocks
> > trying to make them.
> >
> > where was I? forgot xD
> >
> > So far, in my project, these TreatAs wrappers for Pointer and Set etc.
> > are not doing any checks to see whether they're still valid, so for
> > example I could add 10 children to a Pointer and it would not complain
> > that it has to have 0 or 1 (unless I added some asserts lately; I should
> > recheck).
> >
> > -----------
> > Anyway, somewhere on Level 3 an ordered list would be defined.
> > Here I am wondering whether I need one without in-graph metadata first,
> > or not... so far I assumed I don't, and thus allowed the ordered list to
> > be defined fully in-graph; but the code for this is not yet part of my
> > project (it is part of the old project, which I was trying to copy from
> > but without all the extraneous checks and generic mumbo-jumbo).
> >
> > This is a graphviz picture of how an ordered list would look with
> > in-graph metadata:
> >
> > https://github.com/13th-floor/neo4john/raw/master/diagrams/level4%20ordered%20list.png
> >
> > There is an extension to this such that it would also store a set along
> > with it for fast finds; that is, to check whether a certain element
> > exists in the ordered list without parsing it entirely, it would first
> > check whether it's in the set, since checking Set->X (two nodes, you
> > imagine) is lightning fast, because bdb does this internally with
> > something like Cursor.getSearchBoth() (unsure).
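A sketch of the companion-set extension, assuming plain Java collections in place of the capsule chain and the Set node (`IndexedOrderedList` is a hypothetical name). The set alone answers membership questions. The list is walked only when a position is actually needed. Removal is left out on purpose: with duplicates allowed, removing one copy would also require knowing whether another copy remains, e.g. via a counter.

```java
import java.util.*;

/** Sketch: an ordered list (may hold duplicates) plus a companion set for fast membership checks. */
final class IndexedOrderedList {
    private final LinkedList<String> order = new LinkedList<>(); // stands in for the capsule chain
    private final Set<String> members = new HashSet<>();         // stands in for the companion Set node

    void append(String element) {
        order.addLast(element);
        members.add(element); // adding an existing member is a no-op
    }

    /** Set->X check only; the list itself is never walked. */
    boolean contains(String element) {
        return members.contains(element);
    }

    /** Walks the list only after the set confirms X is present. */
    int indexOf(String element) {
        if (!members.contains(element)) return -1;
        return order.indexOf(element);
    }
}
```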
> >   Now, since I am here, I was thinking: how do I quickly find the
> > ElementCapsule (aka `entry`, if you understand it better in
> > graph-collections terms), given that I've just used Set->X and X is
> > indeed part of the ordered list, without having to parse the list again?
> > If I remember right, in the old project's code I wouldn't actually parse
> > the list, but would instead do it the right way, that is:
> > a = count parents of X
> > b = (count children of X)*3 or something, where this would yield how
> > many elements are really in the list
> > If a is big (e.g. 1 million) and b is like 200, then it might be wise to
> > iterate the list, or not, considering bdb can find this in 0ms for 1
> > million rels anyway.
> > So let us consider, then, the parents of X... we basically need to parse
> > bottom-up from the element X, which is any node (not an ElementCapsule),
> > up to the node identifying the list.
> > So something like solving this:
> > ourList->randomECnode3719->randomElementIdentifyingNode189->X
> > AllElementCapsules->randomECnode3719
> > AllElements of ElementCapsules -> randomElementIdentifyingNode189->X
> >
> > The unknowns in those equations are
> > randomECnode3719 and randomElementIdentifyingNode189;
> > that is how they can be found.
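A sketch of the bottom-up lookup, assuming a hypothetical in-memory graph with parent and child indexes (`CapsuleLookup` and the node names in the test are illustrative). It solves the three equations by intersection. Candidates for randomElementIdentifyingNode189 are parents of X that are children of `AllElements of ElementCapsules`. Candidates for randomECnode3719 are their parents that are children of both `AllElementCapsules` and ourList. The cost grows with the number of parents of X (the `a` above), not with the list length.

```java
import java.util.*;

/** Sketch of the bottom-up lookup ourList->ec->elementNode->x without walking the list. */
final class CapsuleLookup {
    static final String ALL_ECS = "AllElementCapsules";
    static final String ALL_EC_ELEMENTS = "AllElements of ElementCapsules";

    private final Map<String, Set<String>> children = new HashMap<>();
    private final Map<String, Set<String>> parents = new HashMap<>();

    void link(String parent, String child) {
        children.computeIfAbsent(parent, k -> new LinkedHashSet<>()).add(child);
        parents.computeIfAbsent(child, k -> new LinkedHashSet<>()).add(parent);
    }

    private boolean hasLink(String parent, String child) {
        return children.getOrDefault(parent, Collections.emptySet()).contains(child);
    }

    private Set<String> parentsOf(String node) {
        return parents.getOrDefault(node, Collections.emptySet());
    }

    /**
     * Solves for the unknowns elementNode and ec in:
     *   ourList->ec->elementNode->x, ALL_ECS->ec, ALL_EC_ELEMENTS->elementNode.
     * Returns every matching capsule, since a list may hold x more than once.
     */
    List<String> capsulesOf(String ourList, String x) {
        List<String> found = new ArrayList<>();
        for (String elementNode : parentsOf(x)) {
            if (!hasLink(ALL_EC_ELEMENTS, elementNode)) continue; // not an element slot
            for (String ec : parentsOf(elementNode)) {
                if (hasLink(ALL_ECS, ec) && hasLink(ourList, ec)) found.add(ec);
            }
        }
        return found;
    }
}
```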
> >
> > So in the picture, to get symbol2's ElementCapsule without parsing the
> > entire list, first find unique50 by solving this:
> > AllElements of ElementCapsules -> (x) -> symbol2
> > and (x) would be unique50, because unique50 is a node that is used only
> > in this list as an `AllElements of ElementCapsules` member, and no other
> > list uses it for this purpose. And if some other layer needs to add a
> > comment to it, e.g. a phrase, it would point to it, i.e. be its parent,
> > such as:
> > AllComments->F
> > F->unique50
> > F->phrase1
> > AllComments'Phrases->phrase1
> > This way unique50 has the phrase1 node associated with it, which could
> > point to other nodes identifying words, which eventually identify letters
> > and numbers; all this could be interpreted by some program code and
> > displayed on screen in some way, or the 3D tool could use them and show
> > that phrase above unique50 when it's in the viewport (shown on screen).
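The comment pattern can be read back in the same way: walk up from the target to F, then down again to the phrase. This sketch assumes a hypothetical in-memory graph. `CommentGraph` and the generated `#F0`, `#F1`, ... node names are illustrative, while AllComments and AllComments'Phrases come from the message.

```java
import java.util.*;

/** Sketch of AllComments->F, F->target, F->phrase, AllComments'Phrases->phrase. */
final class CommentGraph {
    static final String ALL_COMMENTS = "AllComments";
    static final String ALL_PHRASES = "AllComments'Phrases";

    private final Map<String, Set<String>> children = new HashMap<>();
    private final Map<String, Set<String>> parents = new HashMap<>();
    private int nextId = 0;

    void link(String parent, String child) {
        children.computeIfAbsent(parent, k -> new LinkedHashSet<>()).add(child);
        parents.computeIfAbsent(child, k -> new LinkedHashSet<>()).add(parent);
    }

    private boolean hasLink(String parent, String child) {
        return children.getOrDefault(parent, Collections.emptySet()).contains(child);
    }

    /** Attaches phrase to target through a fresh comment node F; target gets no new children. */
    String comment(String target, String phrase) {
        String f = "#F" + (nextId++);
        link(ALL_COMMENTS, f);
        link(f, target);
        link(f, phrase);
        link(ALL_PHRASES, phrase);
        return f;
    }

    /** All phrases commenting on target: up to each comment node F, then down to its phrases. */
    List<String> phrasesFor(String target) {
        List<String> out = new ArrayList<>();
        for (String f : parents.getOrDefault(target, Collections.emptySet())) {
            if (!hasLink(ALL_COMMENTS, f)) continue; // some other kind of parent
            for (String child : children.getOrDefault(f, Collections.emptySet())) {
                if (!child.equals(target) && hasLink(ALL_PHRASES, child)) out.add(child);
            }
        }
        return out;
    }
}
```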
> >
> > where was I? I forgot my name :)
> >
> > That would probably do for now, I guess; that's to give you an idea of
> > what garbage I could talk about in my brainstorming sessions, if allowed
> > to keep posting here.
> >
> > Cheerios,
> > John
> > _______________________________________________
> > Neo4j mailing list
> > User@lists.neo4j.org
> > https://lists.neo4j.org/mailman/listinfo/user