Re: Thoughts on the new concat() step

2023-08-16 Thread Marko Rodriguez
You guys are ruining the language with your approach to string manipulation. 
Look to math()-step to see how you should handle non-graph data structure 
manipulations.

Marko.

> On Aug 16, 2023, at 4:35 AM, Stephen Mallette  wrote:
> 
> I think the syntax should just be changed from concat(Traversal) to
> concat(Traversal...) which would let you work around this issue by using
> constant():
> 
> g.V(3).as('a').values('code').concat(constant(' is in '),
> select('a').values('city'))
> 
> This would address the symmetry problem with concat(String...) without
> having to open up concat(Object...) (at least not yet). I think format()
> could be a neat step and there was an open issue for it at one point:
> 
> https://issues.apache.org/jira/browse/TINKERPOP-2334
> 
> I'd closed it once I saw that Dave had suggested these lower-level string
> functions. I'd like to see the lower-level functions added first and then
> talk about bringing back the format() idea.
> 
> On Tue, Aug 15, 2023 at 7:54 PM Valentyn Kahamlyk
>  wrote:
> 
>> Maybe for such situations it would be more natural to add a new `format`
>> step? It would also be very useful in many other situations and could
>> replace the asString step.
>> ```
>> g.V(3).format("%s is in %s", values("code", "city"))
>> ```
>> 
>> or other example for modern graph
>> ```
>> g.V().hasLabel("person").format("Person %s is %s years old", values("name",
>> "age"))
>> ```
>> 
>> On Tue, Aug 15, 2023 at 8:24 AM Kelvin Lawrence <
>> kelvin.r.lawre...@gmail.com>
>> wrote:
>> 
>>> Playing with Gremlin 3.7, and looking at concat() I kind of wish this
>>> worked
>>> 
>>> g.V(3).as('a').values('code').concat(' is in ',
>> select('a').values('city'))
>>> 
>>> rather than having to do
>>> 
>>> g.V(3).as('a').values('code').concat(' is in
>>> ').concat(select('a').values('city'))
>>> 
>>> we do allow
>>> 
>>> g.V(3).as('a').values('code').concat(' is in ', 'Austin')  //String...
>>> 
>>> so it feels a little unbalanced. I wish I had noticed this before, but I
>>> guess the type signature would have to be Object... for that to work.
>>> Perhaps there is a possible compromise where we could do something using
>>> Traversal... and String...
>>> 
>>> --
>>> Cheers, Kelvin
>>> 
>> 
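The trade-off discussed in this thread (a `concat()` that mixes literal strings with computed sub-results, versus a higher-level `format()` step with `%s` placeholders) can be sketched in plain Python. This is not TinkerPop code and the vertex values are made up; it only illustrates the two API shapes being compared:

```python
# Plain-Python sketch of the two string-manipulation shapes discussed above.
# Neither function is TinkerPop API; both are illustrative stand-ins.

def concat(*parts):
    """Concatenate literal strings and computed values, mirroring the
    proposed concat(Traversal...) / concat(String...) symmetry."""
    return "".join(str(p) for p in parts)

def fmt(template, *values):
    """Substitute values into %s placeholders, mirroring the proposed
    format() step."""
    return template % values

# A stand-in for a vertex's properties (hypothetical air-routes-style data).
vertex = {"code": "AUS", "city": "Austin"}

print(concat(vertex["code"], " is in ", vertex["city"]))   # AUS is in Austin
print(fmt("%s is in %s", vertex["code"], vertex["city"]))  # AUS is in Austin
```

Both calls produce the same string; the difference is that the concat shape interleaves literals and values positionally, while the format shape separates the template from the values, which is what makes it a candidate replacement for chained concat() calls.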



Re: [DISCUSS] Draft ASF Board Report - April 2022

2022-04-14 Thread Marko Rodriguez
Yes, that was the point — to cross lines and to show there are no demons on the 
other side. I take social media very seriously. My “rolfing”-style seeks to
snap the mental lock that keeps men fearful of words (utterances of sound or 
characters on blank pages). It is exactly that style that broke man's obsessive 
fixation on relational tables as the only way in which to understand data. And 
as much as it ‘upsets’ those entrenched in selling relational databases, the 
profiteers of an entrenched zeitgeist will do anything to keep men in fear.

Marko.

> On Apr 13, 2022, at 2:43 PM, Daniel Craig  wrote:
> 
> People will make their own opinions.  For me, your twitter posts crossed
> the line.
> 
> On 2022/04/13 08:33:29 Marko Rodriguez wrote:
> Thanks Florian.
> 
> It sucks, man. Really sucks. The Apache Board meddled in TinkerPop so as to
> appease some voracious mob of anonymous Twitter fools. I contend that all
> would have been averted had someone stood up and simply said:
> 
> “C’mon guys. Yes, Marko is an odd fella with a twisted sense of humor, but
> you are not going to use that against him (and thus, our project) to
> satiate whatever you think you are going to gain by ruining his
> relationship with TinkerPop. I suggest you drop this and move on with your
> life.”
> 
> They are grown adults succumbing to childish “racist Nazi”-talk. Either
> they lack wisdom or they bask in the easy power they can wield by adopting
> the psychology of our sad, depressed, dopamine addicted adolescents. In
> your lives, I urge you to stand up to this “cancel culture” scourge.
> Companies are being drained of talent, people fear their colleagues,
> disillusionment and isolation are the symptoms of those afraid to speak
> their mind. Your mind is all you have and every thought that passes through
> it is valid, true, for it _exists_ and that is all the reason you need to
> say: “I think, therefore I am right.”
> 
> Take care y’all,
> Marko.
> 
> 
> 
> On Apr 13, 2022, at 1:57 AM, Florian Hockmann 
> wrote:
> 
> Here is the final report I submitted to the board. I changed the structure a
> bit as I've used the ASF Board Report Wizard for this which already suggests
> a structure for board reports. This tool by the way also mentioned that we
> should not simply include statistics like activity on mailing lists, but
> only interpretations of those stats ("the story behind the metrics") if we
> want to talk about them.
> 
> 
> 
> Stephen also informed me about gremlin-rs which has a very basic version of
> a GLV for Rust and I mentioned the Discord events we had as they were also
> in the last quarter.
> 
> 
> 
> --
> 
> 
> 
> ## Description:
> 
> Apache TinkerPop is a graph computing framework for both graph databases
> 
> (OLTP) and graph analytic systems (OLAP).
> 
> 
> 
> 
> 
> ## Issues:
> 
> There are no issues requiring board attention.
> 
> 
> 
> ## Membership Data:
> 
> Community changes, past quarter:
> 
> - No new PMC members. Last addition was Joshua Shinavier on 2021-06-01.
> 
> - Mike Personick was added as committer on 2022-03-17. He has already
>   contributed great improvements around core aspects of Gremlin.
> 
> Stephen Mallette has decided to leave the PMC to focus on other aspects of
> his career. His contributions as a PMC member will be missed.
> 
> 
> 
> ## Project Activity:
> 
> TinkerPop just released 3.5.3 and 3.6.0. Version 3.5.3 is mostly a
> maintenance release. 3.6.0 represents a major release with breaking changes
> and a variety of new features [1], including support for regular expressions
> directly in Gremlin and better support for commonly used upsert-like
> functionality. The default logging implementation in the distributions of
> Gremlin Server and Gremlin Console was also changed in 3.6.0 from log4j 1.2.x
> to logback due to the vulnerability CVE-2019-17571 [2].
> 
> 
> 
> These releases are accompanied by the first pre-release versions of
> gremlin-go, making Gremlin natively available in Go, which has been the most
> requested programming language among users over the last years for which we
> did not yet offer a Gremlin Language Variant (GLV) [3]. This new GLV is also
> notable for having been developed not by a single contributor but by a group
> of contributors, an effort mostly led by committer Lyndon Bauto.
> 
> 
> 
> 
> 
> ### Releases:
> 
> 3.5.3 was released on 2022-04-04.
> 

Re: [DISCUSS] Draft ASF Board Report - April 2022

2022-04-13 Thread Marko Rodriguez
Thanks Florian.

It sucks, man. Really sucks. The Apache Board meddled in TinkerPop so as to 
appease some voracious mob of anonymous Twitter fools. I contend that all would 
have been averted had someone stood up and simply said:

“C’mon guys. Yes, Marko is an odd fella with a twisted sense of humor, but you
are not going to use that against him (and thus, our project) to satiate 
whatever you think you are going to gain by ruining his relationship with 
TinkerPop. I suggest you drop this and move on with your life.”

They are grown adults succumbing to childish “racist Nazi”-talk. Either they 
lack wisdom or they bask in the easy power they can wield by adopting the 
psychology of our sad, depressed, dopamine addicted adolescents. In your lives, 
I urge you to stand up to this “cancel culture” scourge. Companies are being
drained of talent, people fear their colleagues, disillusionment and isolation 
are the symptoms of those afraid to speak their mind. Your mind is all you have 
and every thought that passes through it is valid, true, for it _exists_ and 
that is all the reason you need to say: “I think, therefore I am right.”

Take care y’all,
Marko.



> On Apr 13, 2022, at 1:57 AM, Florian Hockmann  
> wrote:
> 
> Here is the final report I submitted to the board. I changed the structure a
> bit as I've used the ASF Board Report Wizard for this which already suggests
> a structure for board reports. This tool by the way also mentioned that we
> should not simply include statistics like activity on mailing lists, but
> only interpretations of those stats ("the story behind the metrics") if we
> want to talk about them.
> 
> 
> 
> Stephen also informed me about gremlin-rs which has a very basic version of
> a GLV for Rust and I mentioned the Discord events we had as they were also
> in the last quarter.
> 
> 
> 
> --
> 
> 
> 
> ## Description:
> 
> Apache TinkerPop is a graph computing framework for both graph databases
> 
> (OLTP) and graph analytic systems (OLAP).
> 
> 
> 
> 
> 
> ## Issues:
> 
> There are no issues requiring board attention.
> 
> 
> 
> ## Membership Data:
> 
> Community changes, past quarter:
> 
> - No new PMC members. Last addition was Joshua Shinavier on 2021-06-01.
> 
> - Mike Personick was added as committer on 2022-03-17. He has already
>   contributed great improvements around core aspects of Gremlin.
> 
> Stephen Mallette has decided to leave the PMC to focus on other aspects of
> his career. His contributions as a PMC member will be missed.
> 
> 
> 
> ## Project Activity:
> 
> TinkerPop just released 3.5.3 and 3.6.0. Version 3.5.3 is mostly a
> maintenance release. 3.6.0 represents a major release with breaking changes
> and a variety of new features [1], including support for regular expressions
> directly in Gremlin and better support for commonly used upsert-like
> functionality. The default logging implementation in the distributions of
> Gremlin Server and Gremlin Console was also changed in 3.6.0 from log4j 1.2.x
> to logback due to the vulnerability CVE-2019-17571 [2].
> 
> 
> 
> These releases are accompanied by the first pre-release versions of
> gremlin-go, making Gremlin natively available in Go, which has been the most
> requested programming language among users over the last years for which we
> did not yet offer a Gremlin Language Variant (GLV) [3]. This new GLV is also
> notable for having been developed not by a single contributor but by a group
> of contributors, an effort mostly led by committer Lyndon Bauto.
> 
> 
> 
> 
> 
> ### Releases:
> 
> 3.5.3 was released on 2022-04-04.
> 
> 3.6.0 was released on 2022-04-04.
> 
> 3.4.13 was released on 2022-01-10.
> 
> 3.5.2 was released on 2022-04-04.
> 
> 
> 
> ## Community Health:
> 
> As already mentioned in the last board report, we are seeing growing activity
> on our Discord server. We had the first live events on Discord in the last
> quarter, where Arthur Bigeard, developer of the Gremlin IDE G.V() [4],
> performed a live demonstration of G.V().
> 
> 
> 
> We've learned that gremlin-rs [5], a Gremlin Language Variant for the Rust
> programming language, recently added support for some advanced capabilities
> normally reserved for TinkerPop's official drivers. It is interesting to note
> this growth in the wider TinkerPop community, as Rust, after Go, is probably
> the next most requested programming language for official support.
> 
> 
> 
> ## Links
> 
> [1] https://tinkerpop.apache.org/docs/3.6.0/upgrade/#_tinkerpop_3_6_0_2
> 
> [2] https://nvd.nist.gov/vuln/detail/CVE-2019-17571
> 
> [3]
> https://tinkerpop.apache.org/docs/3.5.2/reference/#gremlin-drivers-variants
> 
> [4] https://gdotv.com/
> 
> [5] https://github.com/wolf4ood/gremlin-rs
> 
> 
> 
> Von: Florian Hockmann  
> Gesendet: 

Re: [DISCUSS] Draft ASF Board Report - April 2022

2022-04-08 Thread Marko Rodriguez
Understood.

For the record, as I had said previously, my intention is to make sure people 
know exactly what happened on TinkerPop both in terms of what the Apache Board 
did and what the Apache TinkerPop PMC didn’t do.

There are young, talented developers that are coming into themselves with 
bright ideas. If their work could be stripped from them for something as inane 
as posting a Photoshop picture of their pet chicken in WW2 regalia to a social 
media site (and under no Apache capacity), they need to know this. More 
generally, regardless of how “right” or “wrong” you believe some behavior is, 
realize it is possible that one day, how you conduct yourself may be considered 
“wrong” by the powers that be and Apache is structured in such a way that you 
can be separated from your work as they have inserted a “must follow social 
norms” clause into Apache’s statute (this came after me signing my codebase 
over to Apache). As an aside, around 10 Apache members from various projects,
including an Apache Board member, resigned from Apache over the weakness of the
argument for why I should be removed from Apache TinkerPop (to which I still
have not received a response to my questions regarding their decision).

Moreover, such “moral elitism” is being used where it suits individuals and 
institutions best. Prior to starting TinkerPop3, Stephen and I were courted by 
IBM’s Kelvin Lawrence (now on the TinkerPop PMC). IBM wanted TinkerPop in an 
OSS foundation such as Apache or Eclipse. Stephen and I thought this was a good 
idea and went through the process. IBM’s intentions were not pure. When, at the
point of getting into Incubation, Sam Ruby (IBM and Apache Board member)
realized my character was not one to be pushed around, they wanted me off the
project — in essence, IBM wanted my codebase and for me to disappear. So much
so that Sam Ruby threatened me with physical violence and the Apache Board did
nothing but stand there and watch. If you know me, you
know there is no amount of space nor time that can distance you from a 
grievance of mine. And so, I stood my ground as I continue to do so to this 
day. Imagine if I didn’t have the strength of character to do so. TinkerPop 
would be a nothing project. The beauty that is TinkerPop3 comes from the muse 
that drives me and the precision and steadfastness that Stephen wields. That 
core is the foundation to which everything else has attached and come to be.

To conclude, it would be a disservice to young bright developers to withhold 
information about the organization and people they may be handing their life’s 
work over to. Apache is not what it says it is. This organization is corrupt. 
Over this last year, I’ve spent my time as a consultant explaining to people 
what OSS has become. In doing so, as I tell my story, other Apache members past
and present have reached out to me to tell similar stories (and some even more
outrageous than mine) of what the Apache Board has done in the past and is 
currently doing now. This information is all being collated so the next 
generation can make an informed decision regarding the direction they take 
their software (their creative energy).

Marko.


> On Apr 6, 2022, at 4:52 PM, Stephen Mallette  wrote:
> 
> I imagine Florian omitted my leaving the PMC in this draft as he was
> waiting for it to become public knowledge. I expected to send an email
> about it after the release at which point the report could have been
> amended.
> 
> 
> 
> On Wed, Apr 6, 2022 at 12:05 PM Marko Rodriguez 
> wrote:
> 
>> Hello dev@,
>> 
>> Stephen Mallette has stepped down from both the PMC Chair and now the PMC.
>> I believe this should be included in the report given the significance of
>> the event. Furthermore, it would be good to address why the two primary
>> developers of Apache TinkerPop (10+ years) are still active members in the
>> project, but are no longer on the PMC. Noting this is important for the
>> historical record of the project and more generally, so others who may look
>> to submit their work to Apache can have full knowledge of what ~2 years ago
>> would be considered unthinkable, but now has become manifest: an
>> institution becoming so degenerate that it would separate a man from his
>> work against the will of his colleagues.
>> 
>> Thank you,
>> Marko.
>> 
>> http://markorodriguez.com
>> 
>> 
>>> On Apr 6, 2022, at 3:53 AM, Florian Hockmann 
>> wrote:
>>> 
>>> Here is the attached draft of our board report for this quarter - please
>> let
>>> me know if there is anything to add or edit.
>>> 
>>> 
>

Re: [DISCUSS] Draft ASF Board Report - April 2022

2022-04-06 Thread Marko Rodriguez
Hello dev@,

Stephen Mallette has stepped down from both the PMC Chair and now the PMC. I 
believe this should be included in the report given the significance of the 
event. Furthermore, it would be good to address why the two primary developers 
of Apache TinkerPop (10+ years) are still active members in the project, but 
are no longer on the PMC. Noting this is important for the historical record of 
the project and more generally, so others who may look to submit their work to 
Apache can have full knowledge of what ~2 years ago would be considered 
unthinkable, but now has become manifest: an institution becoming so degenerate 
that it would separate a man from his work against the will of his colleagues.

Thank you,
Marko.

http://markorodriguez.com 


> On Apr 6, 2022, at 3:53 AM, Florian Hockmann  wrote:
> 
> Here is the attached draft of our board report for this quarter - please let
> me know if there is anything to add or edit.
> 
> 
> 
> --
> 
> 
> 
> ## Description:
> 
> Apache TinkerPop is a graph computing framework for both graph databases
> 
> (OLTP) and graph analytic systems (OLAP).
> 
> 
> 
> ## Activity:
> 
> TinkerPop is currently in the process of releasing 3.5.3 and 3.6.0. Version
> 3.5.3 is mostly a maintenance release. 3.6.0 represents a major release with
> breaking changes and a variety of new features, including support for
> regular expressions directly in Gremlin and better support for commonly used
> upsert-like functionality.
> 
> The default logging implementation in the distributions of Gremlin Server
> and Gremlin Console will also be changed in 3.6.0 from log4j 1.2.x to
> logback due to the vulnerability CVE-2019-17571 [1].
> 
> 
> 
> These releases will be accompanied by the first pre-release versions of
> gremlin-go, making Gremlin natively available in Go, which has been the most
> requested programming language among users over the last years for which we
> did not yet offer a Gremlin Language Variant (GLV) [2].
> 
> This new GLV is also notable for having been developed not by a single
> contributor but by a group of contributors, an effort mostly led by
> committer Lyndon Bauto.
> 
> 
> 
> We have welcomed Mike Personick as a new committer who has already
> contributed great improvements around core aspects of Gremlin.
> 
> 
> 
> ## Issues:
> 
> There are no issues requiring board attention at this time.
> 
> 
> 
> ## Releases:
> 
> - 3.4.13 (January 10, 2022)
> 
> - 3.5.2 (January 10, 2022)
> 
> 
> 
> ## PMC/Committer:
> 
> - Last PMC addition was Kelvin Lawrence/Josh Shinavier - June 2021
> 
> - Last committer addition was Mike Personick - March 2022
> 
> 
> 
> ## Links
> 
> [1] https://nvd.nist.gov/vuln/detail/CVE-2019-17571
> 
> [2]
> https://tinkerpop.apache.org/docs/3.5.2/reference/#gremlin-drivers-variants
> 



Re: [DISCUSS] Removal of Marko A. Rodriguez from Apache TinkerPop

2022-01-10 Thread Marko Rodriguez
Open like source.

* Nothing racist is implied or intended by that turn of phrase (recursive 
Kleene closure on that).

Marko.

> On Jan 10, 2022, at 9:10 AM, Stephen Mallette  wrote:
> 
> Marko, would you be open to a one-on-one conversation with me over video
> chat?
> 
> On Fri, Jan 7, 2022 at 11:05 AM Marko Rodriguez 
> wrote:
> 
>> Thanks for the update. I opened up my email this morning to ping:
>> 
>>“Any updates? I can’t believe open source software has come to
>> this. I look back and think, once you took your recent corporate position
>> and then were put on the Apache Board, it turned a once great software
>> development team, never mired in politics, into secret backdoor discussions
>> of what I can only presume to be of the nature  “What does he know? More
>> than what he mentioned? **k that guy! We’re Apache — we're not going to be
>> held hostage by some Nazi racist!” Stephen, you would be surprised by who
>> has connected with me after hearing of Apache’s move to remove me from my
>> project. The people you have allied yourself with (for whatever reason) are
>> not of the caliber of person that I know you to be. I’ve known you for 15
>> years now, working closely in harmony over numerous companies, and while
>> I’m aware of your life stresses and what that can do to a man, I’m certain
>> you are not of their breed. Don’t sully your soul by remaining entangled
>> with an organization that was once the life blood of open source software
>> and now, given all I’ve seen with my situation with them and have come to
>> learn from others, also its death. Good luck to you, old friend — I will
>> continue to remain in my holding pattern.”
>> 
>> Take care,
>> Marko.
>> 
>> 
>>> On Jan 7, 2022, at 4:28 AM, Stephen Mallette 
>> wrote:
>>> 
>>> Hi Marko, I know this thread is a week old at this point. I just wanted
>> to
>>> let you know it's not being ignored. Thank you for your patience.
>>> 
>>> On Wed, Jan 5, 2022 at 3:24 PM Marko Rodriguez 
>> wrote:
>>> 
>>>> Understood.
>>>> 
>>>> Marko.
>>>> 
>>>>> On Jan 5, 2022, at 12:31 PM, Stephen Mallette 
>>>> wrote:
>>>>> 
>>>>> Please allow some more time for a reply as I've been away for the New
>>>> Years
>>>>> Eve weekend.
>>>>> 
>>>>> On Wed, Jan 5, 2022 at 12:50 PM Marko Rodriguez 
>>>>> wrote:
>>>>> 
>>>>>> Hey Stephen,
>>>>>> 
>>>>>> Any movement on what I presented below? Meaning, do you 1.) agree with
>>>>>> inconsistent application of the “violated social norms” clause and if
>> so
>>>>>> 2.) do you plan to argue my point ‘in good faith’ (meaning, the
>>>> following
>>>>>> sentiment resonates with you: "every person has skeletons in their
>>>> closet
>>>>>> so why are we attacking Marko after contributing his PhD work to
>>>> Apache
>>>>>> and then spending over a decade developing it only to kick him off the
>>>>>> project for telling jokes on Twitter?”).
>>>>>> 
>>>>>> If you don’t agree, then please tell me so I can move forward on my
>>>> side.
>>>>>> 
>>>>>> Thank you very much,
>>>>>> Marko.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Dec 31, 2021, at 2:14 PM, Marko Rodriguez 
>>>>>> wrote:
>>>>>>> 
>>>>>>> Hello everyone,
>>>>>>> 
>>>>>>> As you all may know, I was recently removed from TinkerPop for the
>>>> crime
>>>>>> of “being a Nazi troll.” When arguing I’m not a Nazi, I was told I
>>>>>> “violated social norms.” Assuming I violated social norms, I inquired
>>>> as to
>>>>>> where such social norms are specified as I never signed anything when
>>>>>> providing TinkerPop to Apache that mentioned ’social norms'. Moreover,
>>>> if
>>>>>> the crime of violating social norms is in fact how Apache wishes to
>>>> judge
>>>>>> people for the sake of removal by committee, then I believe this
>> statute
>>>>>> should be applied fairly and equally. Thus, please review the
>> following
>>>>>> “social norm violations” made by peo

Re: [DISCUSS] Removal of Marko A. Rodriguez from Apache TinkerPop

2022-01-07 Thread Marko Rodriguez
Thanks for the update. I opened up my email this morning to ping:

“Any updates? I can’t believe open source software has come to this. I 
look back and think, once you took your recent corporate position and then were 
put on the Apache Board, it turned a once great software development team, 
never mired in politics, into secret backdoor discussions of what I can only 
presume to be of the nature  “What does he know? More than what he mentioned? 
**k that guy! We’re Apache — we're not going to be held hostage by some Nazi 
racist!” Stephen, you would be surprised by who has connected with me after 
hearing of Apache’s move to remove me from my project. The people you have 
allied yourself with (for whatever reason) are not of the caliber of person 
that I know you to be. I’ve known you for 15 years now, working closely in 
harmony over numerous companies, and while I’m aware of your life stresses and 
what that can do to a man, I’m certain you are not of their breed. Don’t sully 
your soul by remaining entangled with an organization that was once the life 
blood of open source software and now, given all I’ve seen with my situation 
with them and have come to learn from others, also its death. Good luck to you, 
old friend — I will continue to remain in my holding pattern.”

Take care,
Marko.


> On Jan 7, 2022, at 4:28 AM, Stephen Mallette  wrote:
> 
> Hi Marko, I know this thread is a week old at this point. I just wanted to
> let you know it's not being ignored. Thank you for your patience.
> 
> On Wed, Jan 5, 2022 at 3:24 PM Marko Rodriguez  wrote:
> 
>> Understood.
>> 
>> Marko.
>> 
>>> On Jan 5, 2022, at 12:31 PM, Stephen Mallette 
>> wrote:
>>> 
>>> Please allow some more time for a reply as I've been away for the New
>> Years
>>> Eve weekend.
>>> 
>>> On Wed, Jan 5, 2022 at 12:50 PM Marko Rodriguez 
>>> wrote:
>>> 
>>>> Hey Stephen,
>>>> 
>>>> Any movement on what I presented below? Meaning, do you 1.) agree with
>>>> inconsistent application of the “violated social norms” clause and if so
>>>> 2.) do you plan to argue my point ‘in good faith’ (meaning, the
>> following
>>>> sentiment resonates with you: "every person has skeletons in their
>> closet
>>>> so why are we attacking Marko after contributing his PhD work to
>> Apache
>>>> and then spending over a decade developing it only to kick him off the
>>>> project for telling jokes on Twitter?”).
>>>> 
>>>> If you don’t agree, then please tell me so I can move forward on my
>> side.
>>>> 
>>>> Thank you very much,
>>>> Marko.
>>>> 
>>>> 
>>>> 
>>>>> On Dec 31, 2021, at 2:14 PM, Marko Rodriguez 
>>>> wrote:
>>>>> 
>>>>> Hello everyone,
>>>>> 
>>>>> As you all may know, I was recently removed from TinkerPop for the
>> crime
>>>> of “being a Nazi troll.” When arguing I’m not a Nazi, I was told I
>>>> “violated social norms.” Assuming I violated social norms, I inquired
>> as to
>>>> where such social norms are specified as I never signed anything when
>>>> providing TinkerPop to Apache that mentioned ’social norms'. Moreover,
>> if
>>>> the crime of violating social norms is in fact how Apache wishes to
>> judge
>>>> people for the sake of removal by committee, then I believe this statute
>>>> should be applied fairly and equally. Thus, please review the following
>>>> “social norm violations” made by people in Apache and on Apache
>> TinkerPop.
>>>> Given that social norms are not specified anywhere, I offer simply what
>> I
>>>> believe fall within this fuzzy category.
>>>>> 
>>>>> 1. Roy Fielding stating I’m a Nazi troll. When asked for evidence of me
>>>> being part of the Nazi party, none was presented. As far as I know, the
>>>> Nazi party dissolved post WW2 and seems to exist as a word used by modern
>>>> folk to remove people they dislike from their positions. The question:
>> is
>>>> libel a violation of social norms?
>>>>> 
>>>>> 2. Sam Ruby in the past had threatened me with physical violence. If
>>>> threat of violence is not breaking social norms then that seems like a
>>>> break from social norms in and of itself. Thus, was Sam Ruby removed
>> from
>>>> his position in Apache? The question: is threat of violence a violation
>> of
>>>> social norms?
>>>

Re: [DISCUSS] Removal of Marko A. Rodriguez from Apache TinkerPop

2022-01-05 Thread Marko Rodriguez
Understood.

Marko.

> On Jan 5, 2022, at 12:31 PM, Stephen Mallette  wrote:
> 
> Please allow some more time for a reply as I've been away for the New Years
> Eve weekend.
> 
> On Wed, Jan 5, 2022 at 12:50 PM Marko Rodriguez 
> wrote:
> 
>> Hey Stephen,
>> 
>> Any movement on what I presented below? Meaning, do you 1.) agree with
>> inconsistent application of the “violated social norms” clause and if so
>> 2.) do you plan to argue my point ‘in good faith’ (meaning, the following
>> sentiment resonates with you: "every person has skeletons in their closet
>> so why are we attacking Marko after contributing his PhD work to Apache
>> and then spending over a decade developing it only to kick him off the
>> project for telling jokes on Twitter?”).
>> 
>> If you don’t agree, then please tell me so I can move forward on my side.
>> 
>> Thank you very much,
>> Marko.
>> 
>> 
>> 
>>> On Dec 31, 2021, at 2:14 PM, Marko Rodriguez 
>> wrote:
>>> 
>>> Hello everyone,
>>> 
>>> As you all may know, I was recently removed from TinkerPop for the crime
>> of “being a Nazi troll.” When arguing I’m not a Nazi, I was told I
>> “violated social norms.” Assuming I violated social norms, I inquired as to
>> where such social norms are specified as I never signed anything when
>> providing TinkerPop to Apache that mentioned ’social norms'. Moreover, if
>> the crime of violating social norms is in fact how Apache wishes to judge
>> people for the sake of removal by committee, then I believe this statute
>> should be applied fairly and equally. Thus, please review the following
>> “social norm violations” made by people in Apache and on Apache TinkerPop.
>> Given that social norms are not specified anywhere, I offer simply what I
>> believe fall within this fuzzy category.
>>> 
>>> 1. Roy Fielding stating I’m a Nazi troll. When asked for evidence of me
>> being part of the Nazi party, none was presented. As far as I know, the
>> Nazi party dissolved post WW2 and seems to exist as a word used by modern
>> folk to remove people they dislike from their positions. The question: is
>> libel a violation of social norms?
>>> 
>>> 2. Sam Ruby in the past had threatened me with physical violence. If
>> threat of violence is not breaking social norms then that seems like a
>> break from social norms in and of itself. Thus, was Sam Ruby removed from
>> his position in Apache? The question: is threat of violence a violation of
>> social norms?
>>> 
>>> 3. danielfb@ is the mysterious character that had access to our
>> private@tinkerpop mailing list and said that a picture I made in
>> photoshop of one of my chickens in WW2 regalia was “offensive” to him (I
>> assume ‘him' given the name ‘daniel’). My response was initially to joke
>> (as I do), but then continued with (I paraphrase) “let’s talk more as I
>> think you will find me to be a jokester.” That man was never heard from
>> again. The question: is allowing seemingly random people on our private
>> mailing list in order to entrap me a violation of social norms?
>>> 
>>> 4. Roy Fielding was unhappy with the fact that no one on the TinkerPop
>> PMC cared about danielfb@’s allegation of me being a racist. In fact,
>> Jorge said (I paraphrase) "that’s not racism, he’s just being silly.” He
>> went on to note organizations that Apache could get behind that help fight
>> racism — unfortunately, that fell on deaf ears. Instead, Roy Fielding went
>> ahead and ignored the PMC's brush off saying (I paraphrase) “I know you are
>> friends and its hard to punish people you’ve worked with.” This seemed odd
>> to me because the email prior I had said “no one ever stands up for me
>> because most people never understand the point I’m trying to make with my
>> craft.” (I consider much of the work I do ‘art’). Thus, Roy Fielding pushed
>> an agenda placing thoughts/emotions in colleagues that did not exist. The
>> question: is baiting the group so they do his 'dirty work' not a violation
>> of social norms?
>>> 
>>> 5. Stephen Mallette and I have worked together for over a decade. It
>> came as a shock to me that he said nothing in favor of my person when I was
>> deemed a “racist” and a “nazi.” The question: is not standing up for a
>> friend who has been there for you for many years not a violation of social
>> norms?
>>> 
>>> 6. Stephen Mallette knows what I was “charged with” was just some social
>> ploy using the rhetoric of the times to restructure power

Re: [DISCUSS] Removal of Marko A. Rodriguez from Apache TinkerPop

2022-01-05 Thread Marko Rodriguez
Hey Stephen,

Any movement on what I presented below? Meaning, do you 1.) agree that the 
“violated social norms” clause has been applied inconsistently, and if so 2.) do 
you plan to argue my point ‘in good faith’ (meaning, the following sentiment 
resonates with you: “every person has skeletons in their closet, so why are we 
attacking Marko after he contributed his PhD work to Apache and then spent over 
a decade developing it, only to kick him off the project for telling jokes on 
Twitter?”). 

If you don’t agree, then please tell me so I can move forward on my side.

Thank you very much,
Marko.



> On Dec 31, 2021, at 2:14 PM, Marko Rodriguez  wrote:
> 
> Hello everyone,
> 
> As you all may know, I was recently removed from TinkerPop for the crime of 
> “being a Nazi troll.” When arguing I’m not a Nazi, I was told I “violated 
> social norms.” Assuming I violated social norms, I inquired as to where such 
> social norms are specified as I never signed anything when providing 
> TinkerPop to Apache that mentioned ‘social norms’. Moreover, if the crime of 
> violating social norms is in fact how Apache wishes to judge people for the 
> sake of removal by committee, then I believe this statute should be applied 
> fairly and equally. Thus, please review the following “social norm 
> violations” made by people in Apache and on Apache TinkerPop. Given that 
> social norms are not specified anywhere, I offer simply what I believe falls 
> within this fuzzy category.
> 
> 1. Roy Fielding stating I’m a Nazi troll. When asked for evidence of me being 
> part of the Nazi party, none was presented. As far as I know, the Nazi party 
> dissolved post WW2 and seems to exist as a word used by modern folk to remove 
> people they dislike from their positions. The question: is libel a violation 
> of social norms?
> 
> 2. Sam Ruby in the past had threatened me with physical violence. If threat 
> of violence is not breaking social norms then that seems like a break from 
> social norms in and of itself. Thus, was Sam Ruby removed from his position 
> in Apache? The question: is threat of violence a violation of social norms?
> 
> 3. danielfb@ is the mysterious character that had access to our 
> private@tinkerpop mailing list and said that a picture I made in photoshop of 
> one of my chickens in WW2 regalia was “offensive” to him (I assume ‘him' 
> given the name ‘daniel’). My response was initially to joke (as I do), but 
> then continued with (I paraphrase) “let’s talk more as I think you will find 
> me to be a jokester.” That man was never heard from again. The question: is 
> allowing seemingly random people on our private mailing list in order to 
> entrap me a violation of social norms?
> 
> 4. Roy Fielding was unhappy with the fact that no one on the TinkerPop PMC 
> cared about danielfb@’s allegation of me being a racist. In fact, Jorge said 
> (I paraphrase) "that’s not racism, he’s just being silly.” He went on to note 
> organizations that Apache could get behind that help fight racism — 
> unfortunately, that fell on deaf ears. Instead, Roy Fielding went ahead and 
> ignored the PMC's brush off saying (I paraphrase) “I know you are friends and 
> its hard to punish people you’ve worked with.” This seemed odd to me because 
> the email prior I had said “no one ever stands up for me because most people 
> never understand the point I’m trying to make with my craft.” (I consider 
> much of the work I do ‘art’). Thus, Roy Fielding pushed an agenda placing 
> thoughts/emotions in colleagues that did not exist. The question: is baiting 
> the group so they do his 'dirty work' not a violation of social norms?
> 
> 5. Stephen Mallette and I have worked together for over a decade. It came as 
> a shock to me that he said nothing in favor of my person when I was deemed a 
> “racist” and a “nazi.” The question: is not standing up for a friend who has 
> been there for you for many years not a violation of social norms?
> 
> 6. Stephen Mallette knows what I was “charged with” was just some social ploy 
> using the rhetoric of the times to restructure power by removing those 
> individuals that don’t toe some party line, which I was never made aware of. 
> While I assert these are whimsical and without merit, you know what real 
> charges you have against yourself, Stephen, and I won’t get into those, but I 
> believe you would feel much better (less socially stressed) as a person if you 
> were to say: “letting organizations condemn people so they can steal prestige 
> or money from them is not right and I take my stand against it.” As such, the 
> question: when a person living in a glass house throws stones, is that not a 
> violation of social norms?
> 
> 7. Stephen Mallette knows very well the quality of Josh 

[DISCUSS] Removal of Marko A. Rodriguez from Apache TinkerPop

2021-12-31 Thread Marko Rodriguez
Hello everyone,

As you all may know, I was recently removed from TinkerPop for the crime of 
“being a Nazi troll.” When arguing I’m not a Nazi, I was told I “violated 
social norms.” Assuming I violated social norms, I inquired as to where such 
social norms are specified as I never signed anything when providing TinkerPop 
to Apache that mentioned ‘social norms’. Moreover, if the crime of violating 
social norms is in fact how Apache wishes to judge people for the sake of 
removal by committee, then I believe this statute should be applied fairly and 
equally. Thus, please review the following “social norm violations” made by 
people in Apache and on Apache TinkerPop. Given that social norms are not 
specified anywhere, I offer simply what I believe falls within this fuzzy 
category.

1. Roy Fielding stating I’m a Nazi troll. When asked for evidence of me being 
part of the Nazi party, none was presented. As far as I know, the Nazi party 
dissolved post WW2 and seems to exist as a word used by modern folk to remove 
people they dislike from their positions. The question: is libel a violation of 
social norms?

2. Sam Ruby in the past had threatened me with physical violence. If threat of 
violence is not breaking social norms then that seems like a break from social 
norms in and of itself. Thus, was Sam Ruby removed from his position in Apache? 
The question: is threat of violence a violation of social norms?

3. danielfb@ is the mysterious character that had access to our 
private@tinkerpop mailing list and said that a picture I made in photoshop of 
one of my chickens in WW2 regalia was “offensive” to him (I assume ‘him' given 
the name ‘daniel’). My response was initially to joke (as I do), but then 
continued with (I paraphrase) “let’s talk more as I think you will find me to 
be a jokester.” That man was never heard from again. The question: is allowing 
seemingly random people on our private mailing list in order to entrap me a 
violation of social norms?

4. Roy Fielding was unhappy with the fact that no one on the TinkerPop PMC 
cared about danielfb@’s allegation of me being a racist. In fact, Jorge said (I 
paraphrase) "that’s not racism, he’s just being silly.” He went on to note 
organizations that Apache could get behind that help fight racism — 
unfortunately, that fell on deaf ears. Instead, Roy Fielding went ahead and 
ignored the PMC's brush off saying (I paraphrase) “I know you are friends and 
its hard to punish people you’ve worked with.” This seemed odd to me because 
the email prior I had said “no one ever stands up for me because most people 
never understand the point I’m trying to make with my craft.” (I consider much 
of the work I do ‘art’). Thus, Roy Fielding pushed an agenda placing 
thoughts/emotions in colleagues that did not exist. The question: is baiting 
the group so they do his 'dirty work' not a violation of social norms?

5. Stephen Mallette and I have worked together for over a decade. It came as a 
shock to me that he said nothing in favor of my person when I was deemed a 
“racist” and a “nazi.” The question: is not standing up for a friend who has 
been there for you for many years not a violation of social norms?

6. Stephen Mallette knows what I was “charged with” was just some social ploy 
using the rhetoric of the times to restructure power by removing those 
individuals that don’t toe some party line, which I was never made aware of. 
While I assert these are whimsical and without merit, you know what real 
charges you have against yourself, Stephen, and I won’t get into those, but I 
believe you would feel much better (less socially stressed) as a person if you 
were to say: “letting organizations condemn people so they can steal prestige 
or money from them is not right and I take my stand against it.” As such, the 
question: when a person living in a glass house throws stones, is that not a 
violation of social norms?

7. Stephen Mallette knows very well the quality of Josh Shinavier’s 
contributions and the hollowness of his promises as over the years we have 
joked many times about it. So why would he be put on the PMC right after I was 
removed as you and I both know he is a “do-nothing” (says but never does). Was 
this a way for you to slow down the project as for many years you have been 
pushing off TinkerPop4 for reasons I’m unsure of (aging? corporate force?). 
Regardless, the question: is using an unsuspecting (arguably socially inept) 
person as a pawn in a social game to secure an outcome for yourself not a 
violation of a social norm?

If the 7 points I made above are all considered legitimate behaviors that do 
not violate Apache’s unspecified “social norm” statute, then I believe this 
statute should be revised given the composition of this organization — in 
particular, lifelong programmers typically lack the sophisticated circuitry 
necessary to comprehend and thrive in socially nuanced environments. If the 
argument is that not having such 

Re: [DISCUSS] ASF Board Draft Report - October 2021

2021-10-05 Thread Marko Rodriguez
ularly affect end users, there was consensus in the community to
> release sooner than later. These changes did include some minor enhancements
> as well. After 3.5.1 released, it was announced that JanusGraph became the
> first graph provider to support the 3.5.x release line.
> 
> Development on 3.4.13, 3.5.2 and 3.6.0 is all well underway and it would be
> likely that we'd see releases of at least 3.4.13 and 3.5.2 this year. It is
> also likely that we will be reaching the end of the 3.4.x line of
> maintenance.
> 
> We've recently become aware that the Tibco Graph Database[1] implemented
> TinkerPop support a couple of years ago and that there is a new
> implementation of TinkerPop with ArcadeDB[2] that was recently announced.
> That brings the total number of graph systems supporting TinkerPop to
> thirty.
> 
> ## Issues:
> There are no issues requiring board attention at this time.
> 
> ## Releases:
> - 3.4.12 (July 19, 2021)
> - 3.5.1 (July 19, 2021)
> 
> ## PMC/Committer:
> - Last PMC addition was Kelvin Lawrence/Josh Shinavier - June 2021
> - Last committer addition was Øyvind Sæbø - March 2021
> 
> ## Links
> [1] https://www.tibco.com/products/tibco-graph-database/
> [2] https://arcadedb.com/
> 
> 
> 
> On Sun, Oct 3, 2021 at 10:59 AM Marko Rodriguez <okramma...@gmail.com>
> wrote:
> 
>> Hello,
>> 
>> This looks good, though I think we should add some items regarding project
>> leadership.
>> 
>> First off, Tibco has been a TinkerPop-enabled graph database for over 5
>> years now. So that is nothing new.
>> 
>> Next, we should alert the Apache Board about the lack of contributions by
>> recently elected PMC members. More generally, why is the project removing
>> contributing members and replacing them with non-contributing members? I
>> bring up Josh in particular. Of his performance of late, I’ve noted a
>> single "VOTE +1” for a .toString() pull request by Stephen. Given the
>> response time to the PR, there wasn’t even sufficient time for Josh to have
>> compiled and tested the PR. This goes counter to what Stephen was arguing
>> to me (Marko) earlier regarding why the PMC members were elected — they are
>> needed to test the code, not necessarily contribute code/documentation/blog
>> posts/academic articles/etc. So… what is the truth here? What’s going on
>> with the leadership of this project? I believe this project is losing the
>> meritocracy that Apache so holds dear for "nepotism" (not genetic nepotism,
>> but through corporate affiliation). However, if “nepotism" is the direction
>> Apache is going, then I think this should be made clear as it’s fraudulent
>> to be underhanded about the reasoning behind the decisions being made for
>> the project. Finally, this might also be the reasoning why I was removed
>> from the project given my lack of support for Amazon in the OSS community
>> [1].
>> 
>> Thank you Stephen for your efforts on TinkerPop. You are a shining star.
>> 
>> Marko.
>> 
>> [1]
>> https://www.slideshare.net/slidarko/mmadt-a-virtual-machinean-economic-machine
>> [slides 1-37]
>>- Please note that these slides are no longer indexed by Google.
>> All other project slides/articles etc. are.
>>- Unfortunate that large companies would be threatened by such
>> small individuals. Is this what is happening with TinkerPop?
>> 
>> 
>>> On Oct 1, 2021, at 4:13 PM, Stephen Mallette 
>> wrote:
>>> 
>>> Here is the attached draft of our board report for this quarter.
>>> 
>>> 
>> --
>>> 
>>> ## Description:
>>> Apache TinkerPop is a graph computing framework for both graph databases
>>> (OLTP) and graph analytic systems (OLAP).
>>> 
>>> ## Activity:
>>> TinkerPop released 3.4.12 and 3.5.1 on July 19, 2021. These releases
>> came a
>>> bit earlier than expected to address a bug implementers had encountered
>> in
>>> 3.5.0. While the bug had a relatively simple workaround a

Re: [DISCUSS] ASF Board Draft Report - October 2021

2021-10-03 Thread Marko Rodriguez
Then don’t. I simply requested Stephen to add my original comments to the ASF 
Board Report for October 2021.

Marko.

> On Oct 3, 2021, at 11:20 AM, Joshua Shinavier  wrote:
> 
> I don't care to continue such a nasty conversation on a public list.
> 
> Josh
> 
> 
> On Sun, Oct 3, 2021 at 9:50 AM Marko Rodriguez <okramma...@gmail.com> wrote:
> 
>> Josh — if you are going to come onto this project, put your name on it,
>> then you need to do something. You promised me for 2 years that you would
>> work on TinkerPop4. You haven’t. You lied. I have no respect for liars.
>> That is my problem with you. You are ineffectual at best, two-faced at
>> worst. You want people to respect you with your accolades, but you don’t do
>> anything to earn said accolades. People who seek the respect of others are
>> not leaders, they are weak souls who will, when the time is right, do what
>> is best for them.
>> 
>> Marko.
>> 
>> 
>> 
>>> On Oct 3, 2021, at 10:42 AM, Joshua Shinavier  wrote:
>>> 
>>> Marko, why you are so concerned with what I am doing or not doing is
>> beyond
>>> me. Likewise, you make vague accusations against Stephen on this list,
>>> before calling him a "shining star"? How "nepotism", exactly? I commented
>>> earlier that you had "self-cancelled" because with all of this behavior,
>>> you seemed to be daring the rest of the world to take offense / get
>> annoyed
>>> / shut you out. I saw your edgy Twitter posts as an exercise in free
>>> speech, and I was against your removal from the PMC as you well know. Bad
>>> things were bound to happen, though; you are making an ass of yourself in
>>> every public forum, and obviously that does not reflect well on you. If
>> you
>>> want others to respect you and follow you, quit trying to tear others
>> down,
>>> and get back to producing. I don't understand your rage against Amazon,
>> but
>>> your ideas around the mm-ADT "economic machine" seemed good -- why not
>>> execute on them.
>>> 
>>> Josh
>>> 
>>> 
>>> On Sun, Oct 3, 2021 at 7:59 AM Marko Rodriguez <okramma...@gmail.com> wrote:
>>> 
>>>> Hello,
>>>> 
>>>> This looks good, though I think we should add some items regarding
>> project
>>>> leadership.
>>>> 
>>>> First off, Tibco has been a TinkerPop-enabled graph database for over 5
>>>> years now. So that is nothing new.
>>>> 
>>>> Next, we should alert the Apache Board about the lack of contributions
>> by
>>>> recently elected PMC members. More generally, why is the project
>> removing
>>>> contributing members and replacing them with non-contributing members? I
>>>> bring up Josh in particular. Of his performance of late, I’ve noted a
>>>> single "VOTE +1” for a .toString() pull request by Stephen. Given the
>>>> response time to the PR, there wasn’t even sufficient time for Josh to
>> have
>>>> compiled and tested the PR. This goes counter to what Stephen was
>> arguing
>>>> to me (Marko) earlier regarding why the PMC members were elected — they
>> are
>>>> needed to test the code, not necessarily contribute
>> code/documentation/blog
>>>> posts/academic articles/etc. So… what is the truth here? What’s going on
>>>> with the leadership of this project? I believe this project is losing
>> the
>>>> meritocracy that Apache so holds dear for "nepotism" (not genetic
>> nepotism,
>>>> but through corporate affiliation). However, if “nepotism" is the
>> direction
>>>> Apache is going, then I think this should be made clear as it’s
>> fraudulent
>>>> to be underhanded about the reasoning behind the decisions being made
>> for
>>>> the project. Finally, this might also be the reasoning why I was removed
>>>> from the project given my lack of support for Amazon in the OSS
>> community
>>>> [1].
>>>> 
>>>> Thank you Stephen for your efforts on TinkerPop. You are a shining star.
>>>> 
>>>> Marko.
>>>> 
>>>> [1]
>>>> https://www.slideshare.net/slidarko/mmadt-a-virtual-machinean-economic-machine

Re: [DISCUSS] ASF Board Draft Report - October 2021

2021-10-03 Thread Marko Rodriguez
Josh — if you are going to come onto this project, put your name on it, then 
you need to do something. You promised me for 2 years that you would work on 
TinkerPop4. You haven’t. You lied. I have no respect for liars. That is my 
problem with you. You are ineffectual at best, two-faced at worst. You want 
people to respect you with your accolades, but you don’t do anything to earn 
said accolades. People who seek the respect of others are not leaders, they are 
weak souls who will, when the time is right, do what is best for them.

Marko.



> On Oct 3, 2021, at 10:42 AM, Joshua Shinavier  wrote:
> 
> Marko, why you are so concerned with what I am doing or not doing is beyond
> me. Likewise, you make vague accusations against Stephen on this list,
> before calling him a "shining star"? How "nepotism", exactly? I commented
> earlier that you had "self-cancelled" because with all of this behavior,
> you seemed to be daring the rest of the world to take offense / get annoyed
> / shut you out. I saw your edgy Twitter posts as an exercise in free
> speech, and I was against your removal from the PMC as you well know. Bad
> things were bound to happen, though; you are making an ass of yourself in
> every public forum, and obviously that does not reflect well on you. If you
> want others to respect you and follow you, quit trying to tear others down,
> and get back to producing. I don't understand your rage against Amazon, but
> your ideas around the mm-ADT "economic machine" seemed good -- why not
> execute on them.
> 
> Josh
> 
> 
> On Sun, Oct 3, 2021 at 7:59 AM Marko Rodriguez <okramma...@gmail.com> wrote:
> 
>> Hello,
>> 
>> This looks good, though I think we should add some items regarding project
>> leadership.
>> 
>> First off, Tibco has been a TinkerPop-enabled graph database for over 5
>> years now. So that is nothing new.
>> 
>> Next, we should alert the Apache Board about the lack of contributions by
>> recently elected PMC members. More generally, why is the project removing
>> contributing members and replacing them with non-contributing members? I
>> bring up Josh in particular. Of his performance of late, I’ve noted a
>> single "VOTE +1” for a .toString() pull request by Stephen. Given the
>> response time to the PR, there wasn’t even sufficient time for Josh to have
>> compiled and tested the PR. This goes counter to what Stephen was arguing
>> to me (Marko) earlier regarding why the PMC members were elected — they are
>> needed to test the code, not necessarily contribute code/documentation/blog
>> posts/academic articles/etc. So… what is the truth here? What’s going on
>> with the leadership of this project? I believe this project is losing the
>> meritocracy that Apache so holds dear for "nepotism" (not genetic nepotism,
>> but through corporate affiliation). However, if “nepotism" is the direction
>> Apache is going, then I think this should be made clear as it’s fraudulent
>> to be underhanded about the reasoning behind the decisions being made for
>> the project. Finally, this might also be the reasoning why I was removed
>> from the project given my lack of support for Amazon in the OSS community
>> [1].
>> 
>> Thank you Stephen for your efforts on TinkerPop. You are a shining star.
>> 
>> Marko.
>> 
>> [1]
>> https://www.slideshare.net/slidarko/mmadt-a-virtual-machinean-economic-machine
>> [slides 1-37]
>>- Please note that these slides are no longer indexed by Google.
>> All other project slides/articles etc. are.
>>- Unfortunate that large companies would be threatened by such
>> small individuals. Is this what is happening with TinkerPop?
>> 
>> 
>>> On Oct 1, 2021, at 4:13 PM, Stephen Mallette 
>> wrote:
>>> 
>>> Here is the attached draft of our board report for this quarter.
>>> 
>>> 
>> --
>>> 
>>> ## Description:
>>> Apache TinkerPop is a graph computing framework for both graph databases
>>> (OLTP) and graph analytic systems (OLAP).
>>> 
>>> ## Activity:
>>> TinkerPop released 3.4.12 and 3.5.1 on July 19, 2021. These releases
>> came a
>>> bit earlier than expected to address a bug implementers had encountered
>> in
>>> 3.5.0. While the bug had a relatively sim

Re: [DISCUSS] ASF Board Draft Report - October 2021

2021-10-03 Thread Marko Rodriguez
Hello,

This looks good, though I think we should add some items regarding project 
leadership.

First off, Tibco has been a TinkerPop-enabled graph database for over 5 years 
now. So that is nothing new.

Next, we should alert the Apache Board about the lack of contributions by 
recently elected PMC members. More generally, why is the project removing 
contributing members and replacing them with non-contributing members? I bring 
up Josh in particular. Of his performance of late, I’ve noted a single "VOTE 
+1” for a .toString() pull request by Stephen. Given the response time to the 
PR, there wasn’t even sufficient time for Josh to have compiled and tested the 
PR. This goes counter to what Stephen was arguing to me (Marko) earlier 
regarding why the PMC members were elected — they are needed to test the code, 
not necessarily contribute code/documentation/blog posts/academic articles/etc. 
So… what is the truth here? What’s going on with the leadership of this 
project? I believe this project is losing the meritocracy that Apache so holds 
dear for "nepotism" (not genetic nepotism, but through corporate affiliation). 
However, if “nepotism" is the direction Apache is going, then I think this 
should be made clear as it’s fraudulent to be underhanded about the reasoning 
behind the decisions being made for the project. Finally, this might also be 
the reasoning why I was removed from the project given my lack of support for 
Amazon in the OSS community [1].

Thank you Stephen for your efforts on TinkerPop. You are a shining star.

Marko.

[1] https://www.slideshare.net/slidarko/mmadt-a-virtual-machinean-economic-machine [slides 1-37]
- Please note that these slides are no longer indexed by Google. All 
other project slides/articles etc. are.
- Unfortunate that large companies would be threatened by such small 
individuals. Is this what is happening with TinkerPop?


> On Oct 1, 2021, at 4:13 PM, Stephen Mallette  wrote:
> 
> Here is the attached draft of our board report for this quarter.
> 
> --
> 
> ## Description:
> Apache TinkerPop is a graph computing framework for both graph databases
> (OLTP) and graph analytic systems (OLAP).
> 
> ## Activity:
> TinkerPop released 3.4.12 and 3.5.1 on July 19, 2021. These releases came a
> bit earlier than expected to address a bug implementers had encountered in
> 3.5.0. While the bug had a relatively simple workaround and did not
> particularly affect end users, there was consensus in the community to
> release sooner than later. These changes did include some minor enhancements
> as well. After 3.5.1 released, it was announced that JanusGraph became the
> first graph provider to support the 3.5.x release line.
> 
> Development on 3.4.13, 3.5.2 and 3.6.0 is all well underway and it would be
> likely that we'd see releases of at least 3.4.13 and 3.5.2 this year. It is
> also likely that we will be reaching the end of the 3.4.x line of
> maintenance.
> 
> We've recently become aware of two new TinkerPop implementations in the
> Tibco Graph Database[1] and ArcadeDB[2]. That brings the total number of
> graph systems supporting TinkerPop to thirty.
> 
> We are aware that our committer growth has been slow and are considering
> ideas to improve our ability to attract and retain folks.
> 
> ## Issues:
> There are no issues requiring board attention at this time.
> 
> ## Releases:
> - 3.4.12 (July 19, 2021)
> - 3.5.1 (July 19, 2021)
> 
> ## PMC/Committer:
> - Last PMC addition was Kelvin Lawrence/Josh Shinavier - June 2021
> - Last committer addition was Øyvind Sæbø - March 2021
> 
> ## Links
> [1] https://www.tibco.com/products/tibco-graph-database/
> [2] https://arcadedb.com/



Re: Anything I could do to help?

2021-09-16 Thread Marko Rodriguez
Yo,

> I agree that the PMC is not about management in the sense you are speaking,
> but nor is it a technical body...

Yes. Thank you, Mr. Robot, for explaining.

>> it seems you are doing little to attract talented contributors
> 
> That's a point we've not discussed enough. You've mentioned a number of
> reasons why you think we haven't attracted fresh talent. I'd wonder if
> there are others to consider as well.

So if in the last 3 years no one new and exciting has come along, why in the 
hell would you witch-hunt me — especially for something so stupid as a 
picture of my chicken wearing a WW2 outfit? Are you people so weak-minded 
that you find something this ‘off the wall’ ridiculous to be “hurtful” and 
“shameful”? You think that sort of mindset yields a fountain of creative 
thinking? Where do you think my ideas have come from all these years — acting 
like everyone else? Jesus man, grow up and smell the roses; you turn a blind 
eye to the meat factory.

> Do we have a clear direction for the
> future that new people can find a connection to and be excited about? Is
> the code base approachable for someone who is looking at it for the first
> time? Have the long maintenance cycles on release lines of recent years
> helped users but not excited nor enticed potential contributors hoping to
> be more on the forefront of bigger changes?

Over the last 3 years, I've seen GremlinServer connection 
pool/threading/configuration stuff and test suite refactoring as the big 
majors. Yes, getting your GremlinServer code dope is important for users, but 
test suites — users don’t care. What they want to see is “what is next?” How 
does graph take over as the predominant data structure, ESPECIALLY against the 
pressure to abandon NoSQL as BigData falls by the wayside and competition in 
the market dries up? If I’m a user, I’m thinking: MySQL. Why deal with the headache 
of a technology space where the alpha dogs got fed up and left, the monopolies 
are sitting on a heap of code they can’t find free labor to maintain/advance, 
and OSS organizations like Apache are twiddling their thumbs talking about 
‘racism’ and ’sexism’ instead of making Apache a breeding ground for novelty 
and intellectual excellence.

There is so much to be done on TinkerPop and much of those ideas were advanced 
in mm-ADT: dynamic runtime parameterization of pipeline arguments, a bytecode 
that can be reasoned on, a VM architecture that can support a wide swathe of 
execution engines (not just the outdated OLAP/OLTP distinction of TP3), a 
unification of the graph data structure processes with other data structures 
such as list, map, etc., and a compiler that isn’t as fickle as strategies 
(though wonderful for their time), …

Maintaining code is super important and I thank you for handling it while I was 
on sabbatical and developing these ideas in mm-ADT, but now the project is 
stagnating, idle, with an ominous “we seek and destroy ‘racists’!!!”-vibe 
hovering over it. No intelligent human being with a creative spirit will be 
inclined to support TinkerPop, and that is why you are left adding “career men” 
to the PMC who think more about their resumes than about doing real work. And 
that is a vicious downward spiral — a whole bunch of chiefs and not enough 
Indians. <— OH NO! RACISM! AAAH!
 
Outz,
Marko.

> 
> 
> 
> On Sun, Sep 12, 2021 at 1:18 PM Marko Rodriguez <okramma...@gmail.com>
> wrote:
> 
>> Understood. However, it seems you are doing little (if not in willful
>> opposition) to attract talented contributors. The PMC has replaced
>> non-corporate (those unbridled by common thought) with corporate-minded
>> individuals who boast about contributing but don’t or have contributed in
>> the past but have aged out of performing at that level. Now you may argue
>> that the PMC is about “management,” but can you really say that with an
>> honest face given how little the PMC actually did for all those years
>> (meaning private@ has maybe 5 non-VOTE email conversations on it)? Next,
>> when it is publicly known that Apache TinkerPop kicks off PMC members who
>> don’t live according to “corporate norms” (completely separate from their
>> role at TinkerPop), can you honestly say that this inspires potential
>> talent to risk contributing their time and energy only to be judged for who
>> they are and how they act in a world ruled by this inane concept of
>> ‘canceling’ that even your own PMC members (Josh) speak of nonchalantly as
>> if it’s a natural state of the human condition and not some aberration of
>> the fear and despair people feel as competition is being killed out of our
>> dying industry by ‘inclusive and diverse' organizations like Apache who
>> have forced you to enact mental gymnastics in order to demonize your own
>> teammates…

Re: Anything I could do to help?

2021-09-12 Thread Marko Rodriguez
Understood. However, it seems you are doing little (if not acting in willful 
opposition) to attract talented contributors. The PMC has replaced the 
non-corporate (those unbridled by common thought) with corporate-minded 
individuals who boast about contributing but don’t, or who contributed in the 
past but have aged out of performing at that level. Now you may argue that the 
PMC is about “management,” but can you really say that with an honest face 
given how little the PMC actually did for all those years (meaning private@ has 
maybe 5 non-VOTE email conversations on it)? Next, when it is publicly known 
that Apache TinkerPop kicks off PMC members who don’t live according to 
“corporate norms” (completely separate from their role at TinkerPop), can you 
honestly say that this inspires potential talent to risk contributing their 
time and energy, only to be judged for who they are and how they act in a world 
ruled by this inane concept of ‘canceling’ that even your own PMC members 
(Josh) speak of nonchalantly, as if it’s a natural state of the human condition 
and not some aberration of the fear and despair people feel as competition is 
being killed out of our dying industry by ‘inclusive and diverse’ organizations 
like Apache, who have forced you to enact mental gymnastics in order to demonize 
your own teammates? Do you honestly believe talent is found in this world you 
have positioned yourself in? Talent lives in the young, fresh-faced rebels who 
created our industry in the first place, and without the quirky blog posts, the 
thought-provoking technological advances, and the triumph of beauty over 
conformity, you will not find talent, only the droning on of the nothingness 
that has become this once-great project. A project within an organization that 
has gone completely against the doctrines of Apache by being exclusive, 
desirous of a monoculture meant to halt innovation and stagnate progress, much 
like what such thinking did to the automobile industry of the olden 
generation... 

Thoughts?,
Marko.

> On Sep 10, 2021, at 4:01 PM, Stephen Mallette  wrote:
> 
> Marko, I agree with your assertion that the project needs innovation and
> talented contributors to continue to thrive. It needs that as much as it
> needs stability and reliability for the users who depend on it today.
> Obviously, things can't quite be as they were, but perhaps they can become
> something new.
> 
> On Tue, Sep 7, 2021 at 6:34 PM Marko Rodriguez <okramma...@gmail.com> wrote:
> 
>> Hi guys/gals,
>> 
>> Looks like it’s just been Stephen nick-nacking away again as it’s been the
>> last few years. Given the recent big turnover in management, I was hoping
>> to eat my own words and see some performance out of Josh, but unfortunately
>> as given the last 15+ years, 'talk and walk’ (which is even worse than
>> ‘commit and split’). Given that Amazon Neptune is including openCypher in
>> their distribution and with Neo4j just took in a whomping $300+ million in
>> a Series , seems Apache TinkerPop will be falling to the
>> wayside unless some real innovation happens.
>> 
>> As such, perhaps I could offer a helping hand given my intimate knowledge
>> of the codebase and my master of the theory and history of graph computing
>> that I helped formulate over the last 15 years. With that said, I
>> completely understand if y’all need to hold to the narrative that I’m a
>> “Nazi racist” and thus, unworthy of contributing (after all, the "Nazi
>> code" I wrote over a decade has proven how detrimental ‘racism’ has been to
>> the integrity of the software). However, on the other hand, if y’all have
>> moved past such trivial concepts of ‘good and evil’, perhaps we can get
>> TinkerPop movin' again.
>> 
>> Take care mein comrades,
>> Marko.
>> 
>> http://markorodriguez.com



Re: Anything I could do to help?

2021-09-08 Thread Marko Rodriguez
Self-cancel? What does that mean? I was going about my business when Apache gave 
some fellow ‘danielfb’ access to our private@tinkerpop mailing list. He said he 
was offended by a picture I posted on Twitter in which I dressed up one of my 
chickens in WW2 paraphernalia. What that had to do with Apache, I don’t know. 
However, Stephen didn’t seem to stand up for his long-time collaborator and just 
acted as if everything would go away (Stephen has always had an ‘ostrich head in 
the sand’ type of approach to life for as long as I’ve known him). Continuing, 
it then happened that ‘danielfb’ wasn’t able to get support for getting me 
kicked off from TinkerPop (no one on the PMC cared about some random goofy 
Twitter picture), so Apache decided to say “well if Marko’s colleagues aren’t 
going to say he is a Nazi racist, then we are!” and they decided to step over 
TinkerPop and have me removed from the project. Next, notice that the TinkerPop 
PMC has gone completely quiet. Where is Kuppitz? Where is The Baptist? … It’s 
just become Stephen idling about on the codebase that hasn’t seen any major 
innovation in years… Where is any of his help?

So, not self-cancel, but Apache-cancel.
And ‘crazy’ is what you want to call it? Because arguably you are the craziest one 
of us all, with the life choices you’ve made that constantly leave you high and 
dry (Asperger’s is a son-of-a-gun, especially in this highly social world), but I 
digress — the danielfb situation seems crazy.

Since you are now on the TinkerPop PMC, could you inquire as to who this 
‘danielfb’ fellow is and why he was given access to our private mailing list? 
Moreover, can you inquire as to why a picture I posted on Twitter had anything 
to do with Apache? I asked these questions but I was ignored by the Apache 
Board. They were quick to remove my email address from all the internal mailing 
lists.

Getting to the bottom of these problems would be beneficial as there are some 
dangling issues that should be resolved to determine whether Apache is a 
legitimate OSS organization or a “committee by whim” pushing ideologies and 
making sure people behave according to some unwritten agendas. There was a 
fellow on gremlin-users@ that hinted at such failings of Apache.

I appreciate your time. Hope you will get around to accomplishing good work for 
TinkerPop,
Marko.

http://markorodriguez.com






> On Sep 7, 2021, at 7:23 PM, Joshua Shinavier  wrote:
> 
> Marko, I doubt anyone thinks you're actually a Nazi racist. Why you chose
> to self-cancel like that, the world may never know, but you might have
> shown more consideration toward those who wanted to support you in spite of
> all the craziness. I don't see us working together so soon after these
> weird rants on the dev list, but I won't speak for anyone else. You're
> still a TinkerPop contributor. Go ahead and do something.
> 
> Josh
> 
> 
> 
> 
> 
> On Tue, Sep 7, 2021 at 3:34 PM Marko Rodriguez <okramma...@gmail.com> wrote:
> 
>> Hi guys/gals,
>> 
>> Looks like it’s just been Stephen nick-nacking away again as it’s been the
>> last few years. Given the recent big turnover in management, I was hoping
>> to eat my own words and see some performance out of Josh, but unfortunately
>> as given the last 15+ years, 'talk and walk’ (which is even worse than
>> ‘commit and split’). Given that Amazon Neptune is including openCypher in
>> their distribution and with Neo4j just took in a whomping $300+ million in
>> a Series , seems Apache TinkerPop will be falling to the
>> wayside unless some real innovation happens.
>> 
>> As such, perhaps I could offer a helping hand given my intimate knowledge
>> of the codebase and my master of the theory and history of graph computing
>> that I helped formulate over the last 15 years. With that said, I
>> completely understand if y’all need to hold to the narrative that I’m a
>> “Nazi racist” and thus, unworthy of contributing (after all, the "Nazi
>> code" I wrote over a decade has proven how detrimental ‘racism’ has been to
>> the integrity of the software). However, on the other hand, if y’all have
>> moved past such trivial concepts of ‘good and evil’, perhaps we can get
>> TinkerPop movin' again.
>> 
>> Take care mein comrades,
>> Marko.
>> 
>> http://markorodriguez.com



Anything I could do to help?

2021-09-07 Thread Marko Rodriguez
Hi guys/gals,

Looks like it’s just been Stephen nick-nacking away again, as it has been for the 
last few years. Given the recent big turnover in management, I was hoping to eat 
my own words and see some performance out of Josh, but unfortunately, as over the 
last 15+ years, it’s been ‘talk and walk’ (which is even worse than ‘commit and 
split’). Given that Amazon Neptune is including openCypher in its distribution 
and Neo4j just took in a whopping $300+ million in a Series , it 
seems Apache TinkerPop will fall by the wayside unless some real 
innovation happens.

As such, perhaps I could offer a helping hand given my intimate knowledge of 
the codebase and my mastery of the theory and history of graph computing that I 
helped formulate over the last 15 years. With that said, I completely 
understand if y’all need to hold to the narrative that I’m a “Nazi racist” and 
thus unworthy of contributing (after all, the "Nazi code" I wrote over a 
decade ago has proven how detrimental ‘racism’ has been to the integrity of the 
software). However, on the other hand, if y’all have moved past such trivial 
concepts of ‘good and evil’, perhaps we can get TinkerPop movin' again.

Take care mein comrades,
Marko.

http://markorodriguez.com 




Re: [TinkerPop] Welcome Josh Shinavier as a TinkerPop PMC member

2021-06-04 Thread Marko Rodriguez
Ha! Gotta get those stats up so the monthly reports look good, eh?

Marko.

> On Jun 4, 2021, at 2:12 PM, Stephen Mallette  wrote:
> 
> The TinkerPop PMC is pleased to announce that Josh Shinavier has accepted the 
> invitation to become a PMC member. Thanks, Josh, for your continued support 
> of the project and we are happy to have you here.
> 
> Best regards,
> 
> The TinkerPop PMC
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Gremlin-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to gremlin-users+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/gremlin-users/CAA-H43_qsz_9fcosM_Bk-%3DwQU-CjF61iszrxpZ6fi%3D-jr8Pweg%40mail.gmail.com



Re: 3.5.0 Announcement Volunteers

2021-05-02 Thread Marko Rodriguez
Do it Josh. Do what you say you will. For once.

Crazy talk? It’s honesty. And the lack of it in this project is what is driving 
it into malaise. Stephen, when you were in meetings with the Apache Board about me, 
did you tell them about you? Bohahaha! ….  $1000 says you didn’t. Eeek — hide, 
run, hide. When people stop being honest and start chatting with me offline in 
a completely different tone than online, with a “bro, you know I have a 
family”-style, this is when you are sacrificing the project for your other 
objectives. This is when you are just trying to make things look good for your 
employers, your reputation, ’the board’ …. Be honest with yourself and everyone. 
That is where the creative spark lies.

So, I ask you:

1. OLAP bulk write.
2. gremlin-spark/ DataFrames.
3. All traversal parameters determined dynamically by traversals.

Do you still uphold that this is not “enterprise” or “innovation is not 
needed”?  Or are you going to say what you know is true: “Yes, these are great 
features! Damn, I wish we had this.” And next, you know my terms. How would you 
accomplish this? I’m glad you asked.

1. Tell ’The Board’ what ’social norms’ you have violated.
2. Tell them to kick you off the PMC because, arguably, your violations are way 
worse than me telling jokes online. 
3. Watch them squirm as their ‘moral code’ implodes.
4. Then we start doing some real work on TinkerPop.

Easy peasy lemon squeezy,
Marko.





> On May 1, 2021, at 11:35 PM, Joshua Shinavier  wrote:
> 
> Okiedokie. Lots to unpack there, but let me just say this one thing:
> Haskell rocks.
> 
> Anyway, I think there are good technical reasons for optimism about the
> current state of TinkerPop. If you are suggesting some alternative to
> incremental improvements, plus prototyping the more substantial changes,
> the specifics of it are not clear from your emails. Maybe Stephen has more
> context on a previous line of conversation than I do, but I don't see how
> the crazy talk is adding anything of value.
> 
> Josh
> 
> 
> 
> On Sat, May 1, 2021 at 6:55 PM Marko Rodriguez  wrote:
> 
>> Josh — You have been talking for over 2 years now about what you will
>> accomplish. 2 years ago you asked to be a committer. Do you remember what I
>> said? "You have to do something to be a committer." However, I felt for you
>> because you were looking for a job and I fool-heartedly vouched for you
>> thinking you wouldn’t dare cross me once more with your empty promises.
>> However, once you got your name on the TinkerPop webpage, what have you
>> done since except parade it around on resumes and the like? And some
>> internal Uber code in Haskell is not accomplishing anything for TinkerPop.
>> You fooled us with your promises and now you act (once again) as if you
>> will do something in the future. I’ve worked with you for 15 years now —
>> think about it 15 years as your CTO in one company, CEO in another, and
>> your advisor at LANL — and it all comes to not. You know it. I would love
>> for you to finally prove me wrong and finally grab the bull by the horns
>> and accomplish something of value instead of relegating Gremlin to “the
>> bastard child of Ripple” and living off the successes of others with your
>> name all proud front-and-center on the work created by the hands of other
>> men.
>> 
>> This is the point people. You all have learned how to talk and act, but
>> what have you done in the last 3 years that keeps this project burning
>> beyond the whims of your dying organizations and fading careers? To claim
>> we are now in ‘enterprise world’ or ‘I promise to do’ all the while
>> allowing those who did do stuff to be butchered like pigs in front of your
>> own eyes. Cowards.
>> 
>> Stephen — you dilly-dally. Kuppitz left. I left. Your great collaborators
>> faded away … laying in wait for truth once more. You have only so many
>> decisions left to make before you will not come back from the void you are
>> staring into. Nut up — as leader of this project, create the thriving
>> environment we once enjoyed. Don’t let your social and political fears trap
>> you in mediocrity. You are a hero. You will only come to this point again
>> and again and again in your lives to come. Why waste time? Slay the dragon
>> and let us feast on the magical meat of creation once more — as in the time
>> when our dining halls were not filled with lost bards and delirious jesters.
>> 
>> Marko.
>> 
>> 
>> 
>> 
>> 
>> 
>>> On May 1, 2021, at 10:30 AM, Joshua Shinavier  wrote:
>>> 
>>> I think a great way to lose developers, and not gain new ones, is to make
>>> negative…

Re: 3.5.0 Announcement Volunteers

2021-05-01 Thread Marko Rodriguez
Josh — You have been talking for over 2 years now about what you will 
accomplish. 2 years ago you asked to be a committer. Do you remember what I 
said? "You have to do something to be a committer." However, I felt for you 
because you were looking for a job, and I foolhardily vouched for you, 
thinking you wouldn’t dare cross me once more with your empty promises. 
However, once you got your name on the TinkerPop webpage, what have you done 
since except parade it around on résumés and the like? And some internal Uber 
code in Haskell is not accomplishing anything for TinkerPop. You fooled us with 
your promises, and now you act (once again) as if you will do something in the 
future. I’ve worked with you for 15 years now — think about it: 15 years as your 
CTO in one company, CEO in another, and your advisor at LANL — and it all comes 
to naught. You know it. I would love for you to finally prove me wrong, finally 
grab the bull by the horns, and accomplish something of value instead of 
relegating Gremlin to “the bastard child of Ripple” and living off the 
successes of others, with your name all proud front-and-center on the work 
created by the hands of other men.

This is the point people. You all have learned how to talk and act, but what 
have you done in the last 3 years that keeps this project burning beyond the 
whims of your dying organizations and fading careers? To claim we are now in 
‘enterprise world’ or ‘I promise to do’ all the while allowing those who did do 
stuff to be butchered like pigs in front of your own eyes. Cowards.

Stephen — you dilly-dally. Kuppitz left. I left. Your great collaborators faded 
away … lying in wait for truth once more. You have only so many decisions left 
to make before you will not come back from the void you are staring into. Nut 
up — as leader of this project, create the thriving environment we once 
enjoyed. Don’t let your social and political fears trap you in mediocrity. You 
are a hero. You will only come to this point again and again and again in your 
lives to come. Why waste time? Slay the dragon and let us feast on the magical 
meat of creation once more — as in the time when our dining halls were not 
filled with lost bards and delirious jesters.

Marko.






> On May 1, 2021, at 10:30 AM, Joshua Shinavier  wrote:
> 
> I think a great way to lose developers, and not gain new ones, is to make
> negative comments on the dev and/or users list, even if they are only half
> serious. Or more than half serious? I can't tell. In any case, I think
> TinkerPop is in a good place, and would be surprised if you truly don't
> agree. There are Gremlin implementations almost everywhere there are graph
> databases. To my mind, the scaffolding stage of the project -- building the
> structure and filling the space -- is done. Now we have a chance to go back
> and make things truly robust. Formalizing the data model, formalizing the
> semantics of traversals in a way which adds power without subtracting
> versatility. Building better bridges between TinkerPop-compatible graphs
> and the rest of the world's data. Other, OLAPy and distrtibuted-systems-y
> things I haven't thought as much about, but which others have. I think some
> of the changes will require a clean break from the existing code base,
> hence a new major version, but others can follow more of a
> replace-and-deprecate pattern.
> 
> Josh
> 
> 
> 
> On Sat, May 1, 2021 at 8:54 AM Marko Rodriguez  wrote:
> 
>> Hello,
>> 
>>> not quite the topic for this thread but...
>> 
>> Oh but it is. Over the last 3 years there has been little done to advance
>> the 50% area of the codebase that I wrote — the virtual machine, OLAP, and
>> language.
>> 
>>1. Talking with DataBricks about gremlin-spark, it’s odd that
>> DataFrames hasn’t been adopted.
>>2. Why can’t OLAP do bulk writes/updates?
>>3. Why can’t every parameter in a traversal be determined by a
>> traversal?
>>4. …
>> 
>> The problem I see is that TinkerPop doesn’t have any developers anymore.
>> All the work is focused on GremlinServer because you know GremlinServer.
>> And you know very well that it is because there is a lack of talent on the
>> project and in order to make it all look as everything is going swell, you
>> say “maintenance mode”, “innovation is over,” “software has gone
>> enterprise.”
>> 
>>If this is so, then why is DataBricks having to rewrite
>> gremlin-spark/?
>>If this is so, then why has mm-ADT solved the parameter traversal
>> problem?
>> 
>> You can meander in muck of small changes into the indefinite future or you
>> can be a leader and get back the real team that knows how to build quality,
>> innovative software.

Re: 3.5.0 Announcement Volunteers

2021-05-01 Thread Marko Rodriguez
Hello,

> not quite the topic for this thread but...

Oh but it is. Over the last 3 years there has been little done to advance the 
50% area of the codebase that I wrote — the virtual machine, OLAP, and 
language. 

1. Talking with DataBricks about gremlin-spark, it’s odd that 
DataFrames hasn’t been adopted.
2. Why can’t OLAP do bulk writes/updates?
3. Why can’t every parameter in a traversal be determined by a 
traversal?
4. …
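The third item above (every parameter of a traversal determined by a traversal) can be sketched generically outside of Gremlin. The following Python sketch is purely illustrative; `resolve`, `has`, and the sample data are hypothetical and are not TinkerPop or mm-ADT API:

```python
# Hypothetical sketch: a filter step whose argument may be either a
# constant or a "sub-traversal" (here, a plain function) evaluated
# against the current element. This models the idea of step arguments
# being resolved dynamically per traverser rather than fixed up front.

def resolve(arg, element):
    """An argument is either a constant or a callable of the element."""
    return arg(element) if callable(arg) else arg

def has(key, value):
    """Return a step: keep elements whose `key` equals the resolved value."""
    def step(stream):
        for e in stream:
            if e.get(key) == resolve(value, e):
                yield e
    return step

people = [
    {"name": "marko", "age": 29, "min_age": 29},
    {"name": "josh", "age": 32, "min_age": 35},
]

# Constant argument: the case Gremlin handles today.
const_q = list(has("age", 29)(iter(people)))

# Dynamic argument: the comparison value is computed per element,
# which is the capability the email argues Gremlin lacks.
dyn_q = list(has("age", lambda e: e["min_age"])(iter(people)))

print([p["name"] for p in const_q])  # ['marko']
print([p["name"] for p in dyn_q])    # ['marko']
```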

The problem I see is that TinkerPop doesn’t have any developers anymore. All 
the work is focused on GremlinServer because you know GremlinServer. And you 
know very well that it is because there is a lack of talent on the project, and 
in order to make it all look as if everything is going swell, you say “maintenance 
mode,” “innovation is over,” “software has gone enterprise.”

If this is so, then why is DataBricks having to rewrite gremlin-spark/?
If this is so, then why has mm-ADT solved the parameter traversal 
problem?

You can meander in the muck of small changes into the indefinite future, or you can 
be a leader and get back the real team that knows how to build quality, 
innovative software. It takes courage, it takes being forthright, it takes 
standing up for greatness.

Else you will be left with what you have in your GoogleDocs table… meandering 
insignificance.

Be a man, do what men do, and work towards re-manifesting the beauty that once 
was, else you will regret it in your olden years. And that will be a sad state 
of affairs, my old friend.

Marko.



> 
> On Fri, Apr 30, 2021 at 9:55 AM Marko Rodriguez 
> wrote:
> 
>> Hello mein freunden,
>> 
>> I’d love to contribute a body of work from mm-ADT that is one of the main
>> issues with the Gremlin language: every step should support pipeline
>> arguments (i.e., every argument can be a dynamically/traversal determined
>> value). I solved this problem in mm-ADT elegantly and efficiently. A
>> beautiful feature indeed.
>> 
> 
> yep - that remains an open problem with Gremlin. The limitation Java
> lambdas imposed on GLVs wasn't realized in those early days unfortunately.
> 
> 
>> ….unfortunately, Apache Board overruled the TinkerPop PMC and had me
>> forcefully removed from the PMC for being (how do you say in American
>> English?) “Nazi Troll.” If the Board is willing to look past the SS on my
>> uniform and put me back in my rightful place as Obergruppenführer of the
>> PMC, then we shall be unstoppable!
>> 
>> Those are my terms. Boohaha.
>> 
>> Marko.
>> 
>>> On Apr 30, 2021, at 7:34 AM, Stephen Mallette 
>> wrote:
>>> 
>>> Wow, this is great - lots of volunteers! Here's a running list of what we
>>> have so far:
>>> 
>>> * UnifiedChannelizer - Stephen
>>> * gremlin-language - Josh
>>> * Gremlin.Net - Florian
>>> * gremlin-python - Kelvin
>>> 
>>> There's definitely a lot more topics to tackle. Let's keep expanding the
>>> list.
>>> 
>>> 
>>> 
>>> On Fri, Apr 30, 2021 at 9:27 AM Kelvin Lawrence wrote:
>>> 
>>>> I am happy to help. The area I have been closest too is probably the
>>>> enhancements to the Python client. I could write something around those
>>>> features.
>>>> 
>>>> Cheers, Kelvin
>>>> 
>>>>> On Apr 30, 2021, at 04:30, f...@florian-hockmann.de wrote:
>>>>> 
>>>>> I could write something for .NET. Added GraphBinary support and
>>>> switching the JSON library could be interesting for some Gremlin.Net
>> users.
>>>>> 
>>>>> -Ursprüngliche Nachricht-
>>>>> Von: Stephen Mallette 
>>>>> Gesendet: Donnerstag, 29. April 2021 21:32
>>>>> An: dev@tinkerpop.apache.org
>>>>> Betreff: Re: 3.5.0 Announcement Volunteers
>>>>> 
>>>>> Right now, I think it's fine for these to just have each person's
>>>> individual style - might make the posts more interesting assuming we
>> get a
>>>> few more volunteers. If you can come up with a neat image that could go
>>>> with a tweet to promote the announcement (that we will push through the
>>>> TinkerPop account), that would be cool. We've not really come up with
>>>> anything that sort of iconifies the gremlin-language module, so if you
>> feel
>>>> like thinking about that, that would be neat.
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Thu, Apr 29, 2021 at 2:45 PM Joshua Shinavier 
>>>> wrote:

Re: 3.5.0 Announcement Volunteers

2021-04-30 Thread Marko Rodriguez
Hello mein freunden,

I’d love to contribute a body of work from mm-ADT that addresses one of the main 
issues with the Gremlin language: every step should support pipeline arguments 
(i.e., every argument can be a dynamically/traversal-determined value). I 
solved this problem in mm-ADT elegantly and efficiently. A beautiful feature 
indeed.

….unfortunately, Apache Board overruled the TinkerPop PMC and had me forcefully 
removed from the PMC for being (how do you say in American English?) “Nazi 
Troll.” If the Board is willing to look past the SS on my uniform and put me 
back in my rightful place as Obergruppenführer of the PMC, then we shall be 
unstoppable! 

Those are my terms. Boohaha.

Marko.

> On Apr 30, 2021, at 7:34 AM, Stephen Mallette  wrote:
> 
> Wow, this is great - lots of volunteers! Here's a running list of what we
> have so far:
> 
> * UnifiedChannelizer - Stephen
> * gremlin-language - Josh
> * Gremlin.Net - Florian
> * gremlin-python - Kelvin
> 
> There's definitely a lot more topics to tackle. Let's keep expanding the
> list.
> 
> 
> 
> On Fri, Apr 30, 2021 at 9:27 AM Kelvin Lawrence 
> wrote:
> 
>> I am happy to help. The area I have been closest too is probably the
>> enhancements to the Python client. I could write something around those
>> features.
>> 
>> Cheers, Kelvin
>> 
>>> On Apr 30, 2021, at 04:30, f...@florian-hockmann.de wrote:
>>> 
>>> I could write something for .NET. Added GraphBinary support and
>> switching the JSON library could be interesting for some Gremlin.Net users.
>>> 
>>> -Ursprüngliche Nachricht-
>>> Von: Stephen Mallette 
>>> Gesendet: Donnerstag, 29. April 2021 21:32
>>> An: dev@tinkerpop.apache.org
>>> Betreff: Re: 3.5.0 Announcement Volunteers
>>> 
>>> Right now, I think it's fine for these to just have each person's
>> individual style - might make the posts more interesting assuming we get a
>> few more volunteers. If you can come up with a neat image that could go
>> with a tweet to promote the announcement (that we will push through the
>> TinkerPop account), that would be cool. We've not really come up with
>> anything that sort of iconifies the gremlin-language module, so if you feel
>> like thinking about that, that would be neat.
>>> 
>>> 
>>> 
 On Thu, Apr 29, 2021 at 2:45 PM Joshua Shinavier 
>> wrote:
 
 Sounds good. I'll write the announcement. If you have thoughts on the
 format, please feel free to share.
 
 Josh
 
 On Thu, Apr 29, 2021 at 10:56 AM Stephen Mallette
 
 wrote:
 
> On Thu, Apr 29, 2021 at 1:38 PM Joshua Shinavier 
> wrote:
> 
>> I would be happy to collaborate on gremlin-language if there is
 something
>> which needs doing.
>> 
>> Josh
>> 
>> 
> great josh - thanks! The upgrade docs sorta tuck that feature away
> in the provider section
> 
> 
 https://tinkerpop.apache.org/docs/3.5.0-SNAPSHOT/upgrade/#_gremlin_lan
 guage
> 
> because at this point it doesn't have direct user impact, but i
> think it might be useful to the community to write something in an
> announcement
 that
> helps describe what this module lays the foundation for. you've had
> some interesting ideas in this area that i'm not sure have gotten
> outside of
 the
> dev list as of yet.
> 
 
>>> 
>> 
>> 



Re: [DISCUSS] exp4j

2019-07-30 Thread Marko Rodriguez
I could look it up, but I could also just write this email.

Do you know what parser they use? If ANTLR, mm-ADT is happy. And if it’s 
licensed Apache 2, maybe we gut it for parts. Too Fast Too Furious style.



Marko.

http://rredux.com




> On Jul 30, 2019, at 3:07 PM, Stephen Mallette  wrote:
> 
> Interesting - didn't expect that as an answer.
> 
> fwiw, exp4j makes adding new functions and operator really easy. also just
> a few lines of code.
> 
> On Tue, Jul 30, 2019 at 2:26 PM Daniel Kuppitz  wrote:
> 
>> I agree. Calculators are the Hello World of ANTLR, thus it will be pretty
>> easy to make our own lib, and it will be super easy to add new functions
>> (e.g. if someone asks for STDDEV and PERCENTILE, it's really just a few
>> lines of code for us).
>> From a user perspective, there would be no difference compared to what we
>> have now, everything would be string-based.
>> 
>> Cheers,
>> Daniel
>> 
>> 
>> On Tue, Jul 30, 2019 at 4:34 AM Marko Rodriguez 
>> wrote:
>> 
>>> Hi,
>>> 
>>> I think we should create our own math library. We will need it for
>> mm-ADT,
>>> Kuppitz has the ANTLR chops down, …
>>> 
>>> Marko.
>>> 
>>> http://rredux.com
>>> 
>>> 
>>> 
>>> 
>>>> On Jul 30, 2019, at 5:31 AM, Stephen Mallette 
>>> wrote:
>>>> 
>>>> Kuppitz just answered a question on gremlin-users that involved math()
>>>> which is backed by exp4j. That made me recall that exp4j is technically
>>> not
>>>> maintained anymore. While it is a stable library it seems a bit
>> worrisome
>>>> that we're a bit dead-ended there. The README currently says that the
>>>> author is looking for volunteers to replace him and it's been that way
>>> for
>>>> a while.
>>>> 
>>>> I"m not sure what the alternatives are to exp4j and I imagine that
>>>> alternatives might come with expression syntax changes which wouldn't
>> be
>>>> good.
>>>> 
>>>> Anyone have any thoughts on this?
>>> 
>>> 
>> 
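Kuppitz’s point that “calculators are the Hello World” of parsing, and that adding functions like STDDEV is just a few lines, can be made concrete even without ANTLR. Below is a hedged sketch of a string-based expression evaluator with a pluggable function table; every name here is hypothetical and this is not the exp4j or TinkerPop math()-step implementation:

```python
# Minimal recursive-descent evaluator for arithmetic expressions with a
# pluggable function table. Adding a new function (e.g. stddev) is a
# single entry in FUNCTIONS. Illustrative sketch only.
import math
import re
import statistics

FUNCTIONS = {
    "sin": math.sin,
    "sqrt": math.sqrt,
    # A new function really is just a few lines:
    "stddev": lambda *xs: statistics.pstdev(xs),
}

TOKEN = re.compile(r"\s*(\d+\.?\d*|[A-Za-z_]\w*|[()+\-*/,])")

def tokenize(src):
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if not m:
            raise ValueError(f"bad input at {src[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def evaluate(src):
    tokens = tokenize(src)
    i = 0  # cursor into the token list

    def peek():
        return tokens[i] if i < len(tokens) else None

    def advance():
        nonlocal i
        tok = tokens[i]
        i += 1
        return tok

    def expr():  # additive level
        v = term()
        while peek() in ("+", "-"):
            v = v + term() if advance() == "+" else v - term()
        return v

    def term():  # multiplicative level
        v = atom()
        while peek() in ("*", "/"):
            v = v * atom() if advance() == "*" else v / atom()
        return v

    def atom():  # numbers, parens, and function calls
        tok = advance()
        if tok == "(":
            v = expr()
            advance()  # consume ")"
            return v
        if tok in FUNCTIONS:
            advance()  # consume "(" (no error handling in this sketch)
            args = [expr()]
            while peek() == ",":
                advance()
                args.append(expr())
            advance()  # consume ")"
            return FUNCTIONS[tok](*args)
        return float(tok)

    return expr()

print(evaluate("1 + 2 * 3"))                        # 7.0
print(evaluate("stddev(2, 4, 4, 4, 5, 5, 7, 9)"))   # 2.0
```

From the user’s perspective everything stays string-based, which matches the math()-step usage described in the thread.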



Re: [DISCUSS] exp4j

2019-07-30 Thread Marko Rodriguez
Hi,

I think we should create our own math library. We will need it for mm-ADT, 
Kuppitz has the ANTLR chops down, …

Marko.

http://rredux.com 




> On Jul 30, 2019, at 5:31 AM, Stephen Mallette  wrote:
> 
> Kuppitz just answered a question on gremlin-users that involved math()
> which is backed by exp4j. That made me recall that exp4j is technically not
> maintained anymore. While it is a stable library it seems a bit worrisome
> that we're a bit dead-ended there. The README currently says that the
> author is looking for volunteers to replace him and it's been that way for
> a while.
> 
> I"m not sure what the alternatives are to exp4j and I imagine that
> alternatives might come with expression syntax changes which wouldn't be
> good.
> 
> Anyone have any thoughts on this?



Re: [DISCUSS] code freeze 3.3.8/3.4.3

2019-07-26 Thread Marko Rodriguez
My bio is choice.


> On Jul 26, 2019, at 8:40 AM, Florian Hockmann  
> wrote:
> 
> My bio is still up to date.
> 
> -Ursprüngliche Nachricht-
> Von: Stephen Mallette  
> Gesendet: Freitag, 26. Juli 2019 16:29
> An: dev@tinkerpop.apache.org
> Betreff: [DISCUSS] code freeze 3.3.8/3.4.3
> 
> Code freeze for the tp34 and tp33 branches goes into effect at close of 
> business today. The master branch remains open for business. We have a few 
> PRs that need reviews still:
> 
> https://github.com/apache/tinkerpop/pull/1165
> https://github.com/apache/tinkerpop/pull/1166
> https://github.com/apache/tinkerpop/pull/1167
> https://github.com/apache/tinkerpop/pull/1169
> 
> As usual, please use this thread for release related issues during this week. 
> If someone wants to pick up release manager duties please let me know.
> 
> Also note that, given our slightly revised policy around the Contributor 
> Listing we now call for updates to bios during code freeze week. So, for 
> committers and/or PMC members, your name is listed on the TinkerPop home page 
> in the Contributor List[1] with your "bio". If you are active on the project, 
> your "bio" reflects what you have been working on and what you expect to be 
> working on with respect to TinkerPop for recent times (i.e.
> for the previous six months and the following six months). If you are 
> currently inactive on the project, your "bio" reflects the full scope of all 
> your contributions throughout your active periods. You can refer to the 
> contributor listing policy[2] for full details.
> 
> Please take a moment to update your bio directly in Git[3] or, if you would 
> prefer, please reply to this post with your bio update and it will be added 
> for you. If no changes are required, please reply to this email to confirm 
> that this is the case.
> 
> [1] http://tinkerpop.apache.org/#contributors
> [2]
> http://tinkerpop.apache.org/docs/current/dev/developer/#contributor-listing
> [3]
> https://github.com/apache/tinkerpop/blob/master/docs/site/home/index.html
> 



mm-ADT: A Multi-Model Abstract Data Type

2019-07-09 Thread Marko Rodriguez
Hello everyone,

Over the last few months, Kuppitz, Stephen, and I have been working on a 
database virtual machine specification called mm-ADT.

http://rredux.com/mm-adt/ 

This is our first public showing. While there is a large body of work here, it 
is still very rough. We have hit the wall with what we can achieve using reason 
alone. In order to push forward in certain areas, prototyping is required.

———

I see mm-ADT as the next generation of the work being done at Apache TinkerPop. 
The primary impetus for its development was the need to have a formal bytecode 
specification for TinkerPop3 in order to aid language designers and developers 
of non-JVM implementations of the GremlinVM. However, the work ran away with 
itself and much more was realized.

1. mm-ADT has an extensible type system.
2. mm-ADT works for graphs, tables, key/values, documents, etc.
3. mm-ADT approaches “strategies” in a completely different way via the 
crown jewel of mm-ADT: references.
4. mm-ADT processing is based on stream ring theory which provides a 
nice algebra for reasoning on maps, filters, flatmaps, etc.
5. mm-ADT supports user-defined schemas.
6. mm-ADT has both compile-time and runtime optimization techniques.

This specification provides a generally interesting approach to developing 
loosely coupled database systems. I see the need for such work in the open 
source world where storage systems, processors, and languages are being 
developed independently of one another. mm-ADT can serve as the integration 
point for these technologies.

We would love to hear your thoughts on the work. At this point, we are open to 
any ideas, directions, and the like.

Enjoy!,
Marko.

http://rredux.com 






Re: mm-ADT to TinkerPop3

2019-06-14 Thread Marko Rodriguez
Hey,

> One thing I wonder at the moment which I don't think has come up in
> relation to mm-ADT discussion yet is DSLs. By every account, people are
> either using DSLs now or as soon as they learn about them, they immediately
> see the value and start to organize their code around them. So, any
> thoughts yet on how DSLs work under mm-ADT (in relation to TP3 and/or
> future) or is the model largely the same as what we do now?

mm-ADT is a bytecode specification. While we have a human readable/writable 
text representation (currently being called mm-ADT-bc), mm-ADT is primarily for 
machine consumption. Higher-level languages like Gremlin or a custom DSL would 
compile to mm-ADT bytecode. Thus, if Gremlin 
compiles to mm-ADT, then all the Gremlin DSL infrastructure would just work as 
is. However, things can get a bit more interesting.

You can create derived types of arbitrary complexity in mm-ADT.

[define,person,[name:@string,age:@int,knows:@person*]]

From a DSL perspective, users can make their own objects. Look at the knows 
field. It is not a container, but just zero or more person objects 
(sequence/stream). When this model is embedded in a graph database (and there 
are different ways to specify the embedding), those people could be referenced 
via a “knows”-edge.

As you can see, there is nothing “graph” here. No vertices, no edges… just a 
domain model.  But with mm-ADT-bc, you can create processes over that domain 
model and thus, traverse the “graph”:

[db][values,people]  // people is defined, I just don’t show it in this 
email
[has,name,eq,marko]
[values,knows]
[value,age]
[sum]

There is nothing pretty about mm-ADT-bc to a human user, but that is where DSLs 
come in: languages that make it easy to write mm-ADT-bc.

If Gremlin were the higher-level language, the following traversal would create 
the above bytecode:
g.V().has(‘person',‘name’,’marko’).out(‘knows’).values(‘age’).sum()
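To make the bytecode/DSL relationship concrete, here is a toy Python sketch that treats mm-ADT-style bytecode as plain nested lists and evaluates it over an in-memory domain model. The instruction names and their semantics are my own illustrative approximations, not the spec's.

```python
# Hypothetical sketch: bytecode as plain data, evaluated over a toy domain
# model. Instruction semantics are illustrative only, not from the mm-ADT spec.
people = [
    {"name": "marko", "age": 29, "knows": [{"name": "josh", "age": 32},
                                           {"name": "peter", "age": 35}]},
]

def evaluate(stream, bytecode):
    for inst in bytecode:
        op, *args = inst
        if op == "has":            # [has, key, eq, value]: filter the stream
            key, _, val = args
            stream = [o for o in stream if o.get(key) == val]
        elif op == "values":       # [values, key]: flatmap out the referents
            (key,) = args
            stream = [v for o in stream for v in
                      (o[key] if isinstance(o[key], list) else [o[key]])]
        elif op == "sum":          # [sum]: reduce the stream to one value
            stream = [sum(stream)]
    return stream

# Roughly: g.V().has('person','name','marko').out('knows').values('age').sum()
result = evaluate(people, [["has", "name", "eq", "marko"],
                           ["values", "knows"],
                           ["values", "age"],
                           ["sum"]])
print(result)  # [67]
```

The point of the sketch is that a DSL compiler only needs to emit the nested-list form; the evaluator never sees the surface syntax.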

How do you see this being used from your perspective?

Marko.

http://rredux.com




mm-ADT to TinkerPop3

2019-06-13 Thread Marko Rodriguez
Hello,

Various stakeholders in Apache TinkerPop have been wondering whether mm-ADT can 
be leveraged in TinkerPop3. While I originally planned for mm-ADT to form the 
foundation of TinkerPop4, there is a subset of features in mm-ADT that could 
really help TP3 moving forward. Here is a preliminary outline of the mm-ADT 
features that could push the TP3 roadmap.

1. Type system: mm-ADT has a nominal type system for the built-in types 
and a structural type system for all derived types. Bytecode instructions that 
CRUD on database data can be statically typed and reasoned about at compile time.

2. Strategies: mm-ADT has a completely different approach to query 
optimization than TP3. While there are compile-time strategies for manipulating 
a query into a semantically equivalent, though computationally more efficient 
form, the concept of “provider strategies” (indices) goes out the window in 
favor of reference graphs. The primary benefit of the mm-ADT model is that the 
implementation for providers will be much simpler, less error prone, doesn’t 
require custom instructions, and is able to naturally capitalize on other 
internal provider optimizations such as schemas, denormalizations, views, etc.

3. Instruction Set: mm-ADT’s instruction set is less ad hoc than TP3’s. 
Relational operators are polymorphic. Math operators are polymorphic. Container 
(collection) operators are polymorphic. Unlike TP3, a “vertex” is just a map 
like any other map. Thus, has(), value(), where(), select(), etc. operate 
across all such derivations. Moreover, mm-ADT’s instruction set greatly reduces 
the number of ways in which an expression can be represented, relying primarily 
on reference graphs (see #2 above) as the means of optimization. This should 
help limit the degrees of freedom in the Gremlin language and reduce its 
apparent complexity to newcomers.

4. References: mm-ADT introduces references (pointers) as first-class 
citizens. References form one of the primary data types in mm-ADT with numerous 
usages including:
* Query planning. (providers exposing secondary data access 
paths via reference graphs -- see #2 above)
* Modeling complex objects. (will not come into play given 
TP3’s central focus on the property graph data type).
* Bytecode arguments. (nested bytecode are dynamic references 
and every instruction’s arguments can take references (even the opcode 
itself!)).
* Remote proxies. (TP3 detached vertices are awkward and 
limiting in comparison to mm-ADT proxy references).
* Schemas. (will probably not come into play, but “person” 
vertices are possible in mm-ADT. Thus, if TP3 wants to introduce graph schemas, 
mm-ADT provides the functionality).

I’ll leave it at that for now. Any questions, please ask.

Take care,
Marko.

http://rredux.com 






I believe mm-ADT needs the concept of a sequence.

2019-06-05 Thread Marko Rodriguez
Hello,

I think that mm-ADT needs a new data type.

sequence: zero or more objects.

…where the two collections are types of sequence:

list: an integer-based index of a sequence.
map: an object-based index of a sequence.

Why does a sequence exist? It exists because we don’t have a name for this 
anonymous type:

@object*

The above type/pattern is one of two things:

1. A multi-referent reference.
2. A sequence.

Given that references are implicitly dereferenced, I don’t think we should mix 
references in our patterns as references ultimately are the thing they are 
referencing. 

As such, @object* is a “sequence.” It’s not a list, because a list is typed as:

[@object*]

And what is that really saying? — it’s saying that a list is a container of a 
sequence. What is the nature of that container? Integer-based indexing.

A sequence is like a “structureless list.” Why? Because these are the only 
mutation instructions that work on it:

1. [append,@object]
2. [delete,@object]

You can’t [insert] because there is no way to say where to insert — there are 
no indices.

Finally, if “sequence" is the name of @object*, then I think a reference can be 
redefined to only allow one-to-one linking.

[db][,people]

This doesn’t return a reference to zero or more person objects. No, it returns 
a reference to a single sequence. Likewise, with respect to indexing:

-> [has,name,eq,$x.@string] => *

We have a reference to zero or more persons — a sequence of persons.

Finally, this also means that the following types are subtypes of sequence as 
they are all subtypes of @object*.

@object{2}
@object?
@object{5,100}
@object+
@object{0}
@object

With respect to the last type, this means that every denoted thing is a 
sequence. 

‘hello'

That is a sequence of one element that is a string — i.e. @string. Or, more 
specifically, @‘hello’.

This then means that [append] and [delete] can be applied everywhere that 
supports 0 or >1 quantification.

[constant,’hello’]   // ‘hello'
[append,42]  // ‘hello’ 42
[append,’marko’] // ‘hello’ 42 ‘marko'
[delete,’hello’] // 42 ‘marko'
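As a loose illustration of the claim that a sequence is a “structureless list,” here is a small Python sketch of a container that exposes no indices: only append, delete, and iteration are possible. The class name and API are my own invention.

```python
# Illustrative sketch (not from any spec): a "sequence" that, unlike a list,
# exposes no indices -- so [insert] is impossible by construction.
class Sequence:
    def __init__(self, *objects):
        self._objects = list(objects)   # internal storage; index never exposed

    def append(self, obj):              # [append, @object]
        self._objects.append(obj)

    def delete(self, obj):              # [delete, @object]
        self._objects.remove(obj)

    def __iter__(self):                 # the only read access: streaming
        return iter(self._objects)

# Mirrors the [constant]/[append]/[delete] trace above
s = Sequence("hello")
s.append(42)
s.append("marko")
s.delete("hello")
print(list(s))  # [42, 'marko']
```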

What scares me is that we don’t have a way of representing instances of a 
multi-element sequence in our language. The below is ambiguous with sequences 
of containers. 

[name:marko, projects:tinkerpop,lop,gremlin]

Maybe we do:

[name:marko, projects:]

…where

[name:marko]
<=>
[name:]

…or perhaps:

[name:marko, projects:(tinkerpop,lop,gremlin)]

It’s a pattern match group.

Finally, realize that this gets at a problem that I can’t seem to solve without 
the concept of a “sequence” (a containerless collection). And if you say that 
“sequences” are just lists, then bytecode will be a nasty tangle of [unfold] 
instructions — and overly complex reference graphs.

Thoughts?,
Marko.

http://rredux.com 






mm-ADT breakthrough regarding writes and partial data loading

2019-05-29 Thread Marko Rodriguez
Hi,

*** This email is primarily for Kuppitz (and Stephen might appreciate the 
general idea — especially the last part). ***

We have 3 types of containers in mm-ADT.

1. map-> [:]
2. list   -> [ ]
3. ?sequence? -> .*

This #3 thing doesn’t have a name, but it’s what is iterated by a reference — 
it’s “the referents.” However, we have been using this “thing” extensively as it 
is how we model the following database structures:

Graph: [db][values,V]  => *
RDBMS: [db][values,people] => *
RDF:   [db][triples]   => *

I’ve been banging my head against the wall all day trying to figure out how to 
write to these #3 things! That is, how will we do:

Graph: add/update/delete vertex
RDBMS: add/update/delete row
RDF:   add/delete statement
...

While I was gardening this afternoon, it struck me! 

* is exactly what it says it is — a reference to zero or more 
person objects.

This isn’t the table! No, it’s a cursor to the head of a 
sequence/stream/iterator. It’s not a “container” — it’s transient! A result set.

So then what is the people-table? It’s a list!

Here is the full bytecode to create an mm-ADT RDBMS.

[define,row,map,[(@string,@object)*]]
[define,table,list,[@row*]]
[define,db,map,[(@string:@table)*]]

1. a row extends map with the constraint that all keys must be strings.
2. a table extends list with the constraint that all elements must be rows.
3. a db extends map with the constraint that all keys are strings (table name) 
with values that are tables.

So that is the meta-model. What about a particular instance of an RDBMS? In 
bytecode, here is how you define the people-table and the person-object. 

[define,person,row,[name:@string,age:@int]]
[define,people,table,[@person*]]

Thus, people-table is a list of zero or more person-rows. Now let’s CREATE TABLE 
and put me in it: 

[db][insert,[create,people,[[[create,person,[name:marko,age:29]]
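A hedged Python sketch of the meta-model above: a “row” is a map whose keys must be strings, a “table” is a list whose elements must be rows, and the db maps table names to tables. The class names mirror the `[define,...]` statements; the enforcement mechanics are my own invention.

```python
# Sketch of [define,row,...], [define,table,...], [define,db,...] as
# constrained Python containers. Entirely illustrative.
class Row(dict):
    def __setitem__(self, key, value):
        if not isinstance(key, str):                # row keys must be strings
            raise TypeError("row keys must be strings")
        super().__setitem__(key, value)

class Table(list):
    def append(self, item):
        if not isinstance(item, Row):               # table elements must be rows
            raise TypeError("table elements must be rows")
        super().append(item)

db = {"people": Table()}                            # db: table name -> table

# CREATE TABLE people ... ; INSERT INTO people VALUES ('marko', 29)
marko = Row()
marko["name"] = "marko"
marko["age"] = 29
db["people"].append(marko)
print(db["people"])  # [{'name': 'marko', 'age': 29}]
```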

##

Okay. So here is where the fun begins… What is the output signature of the 
following bytecode?

[db][value,people]

It’s:

@people

"Whoa! Chill out there cowboy?! So are you saying that when you access the 
people-table, you get back the entire list representation of the table? That 
could be an insane amount of data!?”

To that I say — "yes, you are right, the above bytecode will do that. It’s a 
very dangerous piece of bytecode." However, the bytecode below does something 
different…and this is what is going to open a vast new territory for the mm-ADT 
spec:

[db][,people] => 

[] is a “get by reference” where [value] is a “get by value”. Why is this 
cool? Well, according to the mm-ADT specification, operations on a reference 
MUST be semantically equivalent to operations on the object itself. Thus, we 
can now read/write/delete the table like any old list!

[db][,people][insert,[create,person,[name:josh,age:32]]]

Tada! Appending to the reference will push the append to the database with the 
only data transfer being the  reference (cheap) fetched and the 
person-object pushed (basically INSERT INTO people (josh,32)).

 @stephen: this is where you will get excited. 

[db][,people][has,name,eq,marko]
=> person[name:marko,age:29]

As expected. I got the person-row with name:marko. All this ‘by value’.

Now, let me formally define this new [] instruction:

opcode   : 
arguments: @object,@pattern? 

The optional pattern says: “what aspects of the referents do you want by 
value?” In other words, what do you want to know for certain in the referent 
pattern (it all connects!).

[db][,people,@person[age:@int]][has,name,eq,marko]
=> person[name:,age:29]

And there you have it. A half populated object. With the @person[age:@int] 
pattern, we are saying, I only want the age-property of the person object by 
value. Everything else, by reference. Thus, if we actually want the name, well, 
we have the reference to go and get it (which is a database call), but if our 
bytecode optimizer is good and we never use the name, well, we only grab the 
data we need! 
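The partial-population idea can be sketched in Python: fields matched by the pattern are fetched by value up front, and everything else is pulled lazily on first access, each lazy pull standing in for a database call. All names here are hypothetical.

```python
# Loose illustration of "get by reference" with a pattern: the pattern names
# the fields wanted by value; other fields dereference lazily on access.
class Ref:
    def __init__(self, store, key, pattern=()):
        self._store, self._key = store, key
        # by-value portion: only the fields named in the pattern
        self._values = {f: store[key][f] for f in pattern}
        self.fetches = 0                      # count of extra round trips

    def __getitem__(self, field):
        if field not in self._values:         # dereference: a "database call"
            self.fetches += 1
            self._values[field] = self._store[self._key][field]
        return self._values[field]

store = {"marko": {"name": "marko", "age": 29}}
p = Ref(store, "marko", pattern=("age",))     # like @person[age:@int]
print(p["age"], p.fetches)   # 29 0   -- age came by value
print(p["name"], p.fetches)  # marko 1 -- name required a dereference
```

If the optimizer can prove the name is never used, that second fetch simply never happens, which is the bandwidth win described above.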

You may be thinking, but don’t we need the name for the next instruction: 
[has,NAME,eq,marko]?!  To that I say: “You poor fool. Realize that we have a 
 and if there is an index on the people table, well 
[has,name,@cop,@string] is an instruction pattern and thus, we are in the 
reference graph! And if we don’t have an index on the people-table, then YES, 
we will need to pull the name by value cause we will be filtering in the 
processor, not the storage system. It’s all so self-consistent I can barely 
contain my bowels.

Three problems solved with []

1. ?sequences? are no longer these weird anonymous data structures. 
They are cursors in the classic database sense.
2. The same instructions we use for manipulating containers can 
manipulate ‘remote’ containers. (referenced containers)
3. We can selectively populate only portions of an object with all the 
remaining 

Re: A Type System Core to mm-ADT

2019-05-26 Thread Marko Rodriguez
Hey,


> OK. I see the "referent" concept is broader than I had thought. They are
> not just pointers, but (paraphrasing) expressions awaiting evaluation. The
> "referent pattern" is more or less the type of the expression, i.e. the
> type of whatever the expression evaluates to.

Yes. As I see it.



Reference = Referent Pattern + Instruction Patterns + Referents
* Referent Pattern = The most specific description possible of 
the structure of the referents of the reference.
* Instruction Patterns = The result of streaming the referents 
through the instruction.
* Referents = The objects pointed to by the reference. 
Accessing them is a dereference (a manifestation).

Why are references more than just pointers to referents? Kuppitz and I are 
realizing that ?all? database optimization can be understood as techniques to 
avoid dereferencing. If the reference can tell the VM what to expect from a 
dereference, then certain instructions can avoid dereferencing the reference. 

Page 17 of the mm-ADT spec has a summary table of the subsequent subsections 
that show how foreign keys, schemas, indices, denormalizations, views, 
pipelines, and aggregations are all just variations on this theme.
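A toy Python sketch of “optimization = avoiding dereference”: a reference carries registered instruction patterns (here, a hash index), and a matching instruction is answered from the pattern without touching the referents. Every name here is hypothetical.

```python
# Sketch: an index as an instruction pattern on a reference. A matched
# instruction avoids dereferencing; an unmatched one falls back to a scan.
class Reference:
    def __init__(self, referents):
        self._referents = referents          # dereferencing = touching these
        self._patterns = {}                  # instruction opcode -> handler
        self.dereferences = 0

    def register(self, opcode, handler):
        self._patterns[opcode] = handler

    def apply(self, opcode, *args):
        if opcode in self._patterns:         # pattern match: no dereference
            return self._patterns[opcode](*args)
        self.dereferences += 1               # fall back: scan the referents
        return list(self._referents)

people = [{"name": "marko"}, {"name": "josh"}]
ref = Reference(people)
index = {"marko": [people[0]], "josh": [people[1]]}  # a hash index on name
ref.register("has_name_eq", lambda name: index[name])

assert ref.apply("has_name_eq", "marko") == [{"name": "marko"}]
print(ref.dereferences)  # 0 -- the index answered without a dereference
```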

Now, what I realized today was that there are two types of “referent patterns.”
* The reference’s referent pattern describes what to expect (locally). 
// read-oriented
* The database’s type pattern describes what is legal (globally).   
 // write-oriented
I didn’t have the second and I need it. Enter a type system. We need a way to 
extend the mm-ADT’s type system. Enter you.


> For example, in Haskell
> notation:
> 
>  sum [1,2,3] :: Int
> 
> Here, "sum [1,2,3]" is the reference. The referent is something which has
> yet to be determined (the number 6). We know that the referent's type is
> Int, and we can type-check the expression to be verify that it will produce
> an Int. Another example:
> 
>  fmap (\n -> "number " ++ show n) $ filter (> 1) [1,2,3] :: [String]
> 
> Here, "fmap ... [1,2,3]" is the reference, and the referent is a list of
> strings: ["number 2","number 3”].

Bingo. So there are three types of “right hand sides” to an instruction pattern.

[instruction pattern]->result pattern (a description of what to expect)
[instruction pattern]->result object  (the actual result to expect — 
memoization-style)
[instruction pattern]->bytecode   (a different computation that should be 
executed)

…where the first is the second given a specific enough pattern ;). That is, 
the pattern is so specific that it’s an explicit object. And the last is 
something new that I realized this weekend with respect to writing, that might 
not be necessary….


> Instruction patterns seem like additional "referents" to me, with the
> difference that they are applied to objects, and that they are composed of
> concrete instructions.

Yes. Though not “concrete” instructions, but instruction patterns. For 
instance, what is a database hash-index over the name-attribute?

*person{name:@string, age:@int /  // referent pattern
  [has,name,eq,$x.@string]->*person{name:x, age:@int} }   // instruction pattern

The concrete instruction [has,name,eq,marko] will match the above instruction 
pattern and will return a reference to person objects whose name is now known 
and whose age is still just some integer.

> Instruction patterns seem like additional "referents" to me


I would argue that referent patterns are actually just instruction patterns as:

[value,name]->@string  // instruction pattern

is the same thing as

{name:@string} // referent pattern

…I don’t have an ultra-confident reason why references aren’t just instruction 
patterns, but it’s related to behavioral differences in reading/writing. More 
exploration is required.


> If a referent is nullary (has some type "a"), an
> instruction pattern seems unary (has some type "a->b", consuming an "a" and
> producing a "b"). But I need to grok more.

Sorta. The semantics of X->Y are:

“Given a matching X-instruction executed on the reference’s referents, 
the result is Y."

If Y is sufficient to solve the computation, the VM avoided dereferencing… and 
this is database optimization in a nutshell.

Marko.

http://rredux.com





Re: A Type System Core to mm-ADT

2019-05-26 Thread Marko Rodriguez
Hello,

>> [db][get,’people’] // *{name:@string, age:!gt(20)&!lt(33)}
>> 
>> We have lost information about the “schema.” This is not good as
>> compile-time write validation is not possible.
>> 
> 
> So far, I am thinking: yeah, dealing with schema changes can be tricky.

This is not a “schema change” but a greater specification of what the referents 
are. Again, a reference is defined by its referent pattern (and instruction 
patterns). The referent pattern is a description of the current instances 
(referents) while the “schema” is a description of what is legal for all 
instances (referents). Without “schema,” I lose compile-time validation.

> I then create a people-key on the db map that maintains a person-reference.
>> 
> 
> OK. I think by people-key you mean the primary key for the person type,
> i.e. the vertex id. Correct me if I am wrong.

No. db.get(‘people’) is the "people table.” RDBMSs are modeled as a map with 
the keys being the table names and the values being *{:} references to maps 
(i.e. rows).

> I see this as a type plus a constraint. And... you won't be surprised to
> hear me say this... you express it with a select statement:
> 
>youngishPeople := σ_{age <= 20 ∧ age >= 33}(people)

Well, type definitions like this won’t happen at runtime. The VM will just be 
able to tell you if the range has been restricted. It won’t create new types. 
But yea, referent patterns are (now) a type plus a constraint. Before, they 
were just constraints and that is why I lost schema information at runtime.

Take care,
Marko.

http://rredux.com 



A Type System Core to mm-ADT

2019-05-26 Thread Marko Rodriguez
Hi,

*** This email is primarily for Josh (and Kuppitz).

I think we need Josh's type system core to mm-ADT. I’m facing the problem of 
having to model both a “range” (legal schema) and a “codomain” (current schema) 
of the referents of a reference. Let me explain with an example.

Suppose that there is an SQL table called ‘people’ and the table is empty. When 
mm-ADT serves up this table, it looks like this in mm-ADT:


[db][get,’people’] // *{name:@string, age:@int}

This says that ‘people’ is a pointer to maps containing a name-key with a 
string value and an age-key with an integer value.

Now let’s say I insert some rows into this table. Now, according to the mm-ADT 
spec, every reference must have as much information as possible about the 
referents. Thus, the people-reference pattern can change. Let’s assume it does 
and it now is:

[db][get,’people’] // *{name:@string, age:!gt(20)&!lt(33)}

We have lost information about the “schema.” This is not good as compile-time 
write validation is not possible.

Thus, I want to make a distinction between “range” and “codomain”. Here is some 
bytecode:

[db][define,’person’,{name:@string,age:@int}]
[db][create,’people’,*person]

I define a type called person, where all such instances must match the 
respective map-pattern.
I then create a people-key on the db map that maintains a person-reference.

Now:

[db][add,’people’,{name:marko,age:29}]
[db][add,’people’,{name:josh,age:32}]
...
[db][get,’people’]   // *person{name:@string, age:!gt(20)&!lt(33)}
[db][type,’person']   // {name:@string, age:@int}

Thus, when I get the reference at people, I see the “codomain” of current 
person referents, but when I get the person-type, I get the “range” of legal 
person referents.

In this way, “types” become central to mm-ADT, where schema is crucial in 
specifying a referent range.
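The range/codomain split can be illustrated with a toy Python sketch: the declared type (“range”) validates writes, while the observed pattern (“codomain”) is just the tightest description of the referents currently present. Everything here is illustrative.

```python
# Sketch: "range" = declared schema used for write validation;
# "codomain" = observed pattern computed from the current referents.
people = []
RANGE = {"name": str, "age": int}            # like [define,'person',...]

def add(person):                             # write check against the range
    for key, typ in RANGE.items():
        if not isinstance(person[key], typ):
            raise TypeError(f"{key} must be {typ.__name__}")
    people.append(person)

add({"name": "marko", "age": 29})
add({"name": "josh", "age": 32})

# codomain: tightest description of the instances currently present
ages = [p["age"] for p in people]
codomain = {"name": str, "age": (min(ages), max(ages))}
print(codomain["age"])  # (29, 32)
```

The codomain narrows as data changes; the range stays fixed until the type is redefined, which is why losing it breaks compile-time validation.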

—I have more to say on the necessity of multi-types (union of types) and their 
role in pattern definitions.

Thoughts?,
Marko.

http://rredux.com 






Next steps on the mm-ADT specification.

2019-05-24 Thread Marko Rodriguez
Hello,

** This email is primarily for Kuppitz, Josh, and Stephen. ***

I sent the DRAFT specification to George to get some feedback regarding a 
potential home for the work. 

As it stands, I think we have a really solid approach. Once the concept of 
“instruction patterns” hit home, we were off to the races. The general trends 
that I would like to see us get solid:

1. The mm-ADT ?Programming? Language [Kuppitz and Marko]
* This is developing into an interesting body of work unto 
itself.
* What is the role of this language beyond communication in the 
doc and use in the binary serialization protocol?
* Should we have a whole “language spec” section ?!
- I would like to get the language solid first so we 
don’t have to update so much when things change.

2. The mm-ADT Modeling Secondary Structures [Marko, Kupppitz, Stephen]
* Kuppitz and I had a great run on these ideas this week.
- We understand indices, foreign keys, schemas, 
denormalizations, views, aggregations...
* However, we need to show how to express all known database 
techniques in mm-ADT.

3. The model-ADTs Specifications [Marko, Josh]
* We have property graphs 60% covered.
* We have RDBMSs “covered” in the running examples section.
* We need to fill out the RDF, Document, Wide-Column, RDBMS, …
* Depends heavily on the 1 & 2 above AND 4 below.

4. mm-ADT Writes [Marko, Kuppitz, Josh]
* We have ‘reading’ understood, but what about writes?
* All I have thus far is how you would “write to references.” 
(table inserts, vertex add, ..)
* This will affect our story on views, denormalization, foreign 
keys, …
* This is probably our biggest unknown right now.

Anywho, this weekend I’m going to continue reading through the various database 
textbooks I have and mapping the concepts to mm-ADT. I plan to start hard 
charging again early Monday morning.

Epic week.

Take care,
Marko.

http://rredux.com 






Modeling SQL Table Schema with Map-References

2019-05-20 Thread Marko Rodriguez
Hi,

This is mainly for Josh/Kuppitz/Mallette, but others might find it interesting.

// a map-reference demonstrating how various RDBMS table features are captured 
via key/value ?patterns and reference instructions.
// created a new concept called self* which is just a pointer to the current 
reference (like "this" in java).
// you can see that an instruction pattern match on some of these hardcore 
operations simply returns self
// and thus, are just no-ops. however, these noops tell you the nature of the 
table definition.

{ name:?string, age:?int . |// table columns and 
their datatypes
  [count]:20,   // table size   
  [order,age,desc,name,asc]:self*,  // table rows have a 
sort order
  [dedup,[[valueMap,name,age]]]:self*,  // name and age are the 
primary key
  [has,name,eq,?string]:{name:?string,age:?int}*,   // table has an index 
on name
  [identity]:self* }*   // just showing how 
identity is a no-op


CREATE TABLE people (
  name varchar(255),
  age int,
  PRIMARY KEY (name,age)
)
CREATE INDEX people_name_idx ON people (name)



*** If the formatting sucks: 
https://gist.github.com/okram/41d2d40214a9411b0e69cc972ad9cb61 


I don’t know how to force a sort on a table, so the [order] instruction isn’t 
being used in the CREATE TABLE. However, in general, this is how RDBMS tables 
and TP4 map-references are related.

Neat?,
Marko.

http://rredux.com 






Re: The Bytecode Pattern-Matching Model

2019-05-17 Thread Marko Rodriguez
Hi,

Kuppitz makes fun of me for my constant use of the word “tuple” for anything 
that has to do with TP4 structure/.

Perhaps this is the API:
https://gist.github.com/okram/84912722a2c00f26f07f1c4825eacd50 
<https://gist.github.com/okram/84912722a2c00f26f07f1c4825eacd50>
My response below to Stephen is still worth reading as it’s more detailed and I 
assume you understand it for the link above.

What I like about the updated API:

Are you only talking RDBMS? 
TMap. “relations"
Are you only talking GraphDB? 
TMap, TPrimitive. “vertices” and “edges” and their property 
values.
@Josh: want to build a type system over graphdb —> 
“vertex+edge” = “relations”.
Are you only talking DocumentDB? 
TMap, TList, TPrimitive. “objects” containing “objects”, 
“lists”, “primitives"
Are you only talking Wide-Column? 
TMap. “relations"
…

I’ll stop for now. I don’t want to overload y’all. And it's the freakin’ 
weekend… oh wait, every day is the weekend for me.

Peace in the Far East (LA),
Marko.

http://rredux.com





Re: The Bytecode Pattern-Matching Model

2019-05-17 Thread Marko Rodriguez
Hi,

Thanks for your question. 

I suppose that a “limit bandwidth”-optimization could be based on the provider 
looking at all the instructions in the submitted bytecode and then using that 
information to constrain which bytecode patterns it exposes. A simple 
ProviderStrategy would be the means of doing that.

Perhaps showing you what I think the Tuple API should look like would help. 
This API would represent the primary way in which the TP VM interacts with the 
structure/ provider. Thus, this is for all cookies in the cookie jar!



public interface Tuple<A> extends Iterator<Tuple<A>> {

  public boolean hasKey(Object key);
  public boolean hasValue(Object value);
  public <B> Tuple<B> get(Object key);
  public A value();
  public long count();
  public boolean hasNext();
  public Tuple<A> next();

  public boolean match(Instruction instruction);
  public Tuple<A> apply(Instruction instruction);

}
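To ground the interface, here is a minimal in-memory sketch of the simplest case, a single fully-known tuple (MapTuple is a hypothetical name, not part of any TP4 API; real providers would hand back lazy references whose count is unknown until iterated):

```java
import java.util.*;

// Illustrative only: the simplest case of the Tuple API -- a single,
// fully-known tuple. Real providers would return lazy references whose
// count() is -1 until iterated.
public class MapTuple implements Iterator<MapTuple> {
    private final Map<String, Object> data;
    private boolean consumed = false;

    public MapTuple(Map<String, Object> data) { this.data = data; }

    public boolean hasKey(Object key)     { return data.containsKey(key); }
    public boolean hasValue(Object value) { return data.containsValue(value); }
    public Map<String, Object> value()    { return data; }
    public long count()                   { return consumed ? 0 : 1; }  // size is known: 1

    // Iterating a fully-known single tuple yields itself, once.
    public boolean hasNext() { return !consumed; }
    public MapTuple next()   { consumed = true; return this; }

    public static void main(String[] args) {
        Map<String, Object> d = Map.of("type", "vertex", "name", "marko", "age", 29);
        MapTuple a = new MapTuple(d);
        assert a.count() == 1;
        assert a.hasKey("name") && !a.hasKey("blah");
        assert a.next() == a;      // its iteration is itself
        assert a.count() == 0;     // the reference is spent
    }
}
```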



Structure neo4j = Neo4jStructureFactory.open(config1)
Tuple db = neo4j.root(); 
  => { type:graph | [V] }#1

//

Let a = 

{ type:vertex, name:marko, age:29 | [inE] [outE] }#1

a.count()   => 1
a.value()   => 
Map.of('type','vertex','name','marko','age',29)
a.get('type')   => { 'vertex' }#1
a.get('name')   => { 'marko' }#1
a.hasKey('blah')=> false
a.match(Instruction.of('outE')) => true

//

b = a.apply(Instruction.of('outE'))

{ type:edge, label:?string | [outV] [inV] }#?

b.count()  => -1
b.hasKey('weight') => null// not false because all we 
know is type:edge & label:?string about #? of things.
b.hasKey('type')   => true
b.hasKey('label')  => true
b.get('label') => { ?string }#?   // ?string is something like 
Unknown.of(Type.string())

//

c = b.apply(Instruction.of('inV'))

{ type:vertex }#?

c.count()  => -1
c.value()  => Map.of('type','vertex')
c.hasNext()=> true
c.next()   => { type:vertex, name:stephen, age:17 | [inE] [outE] }
c.hasNext()=> true
c.next()   => { type:vertex, name:kuppitz | [inE] [outE] }
c.hasNext()=> false
c.count()  => 0

//

d = { type:vertex, name:kuppitz | [inE] [outE] }

e = d.get('name')

{ kuppitz }#1

e.count() => 1
e.value() => 'kuppitz'

//

Let f = 

{ type:edge | [outV] [inV] [has,label,eq,?0] }#10

f.count()=> 10
f.get('type')=> { 'edge' }#10
f.match(Instruction.of('has','label',P.eq,'knows'))  => true

//

g = f.apply(Instruction.of('has','label',P.eq,'knows'))

{ type:edge, label:knows | [outV] [inV] }#1

g.count()  => 1
g.hasNext()=> true
g.next()   => { type:edge, label:knows | [outV] [inV] }#1  // its iteration 
is itself!
g.hasNext()=> false// g lost the 
reference
g.count()  => 0

//

Cool? Questions?

Thanks,
Marko.

http://rredux.com




> On May 17, 2019, at 6:57 AM, Stephen Mallette  wrote:
> 
> This is a nicely refined representation of this concept. I think I've
> followed this abstractly since you first started discussing it, but I've
> struggled with the implementation of it and how it would best work (which
> is probably the reason I keep thinking that I'm not following the
> abstraction hehe). You nicely wrote this from the perspective of the
> individual providers which I think connected me more to the more concrete
> aspect of things, which leads me to this question:  Does the provider send
> the instructions by looking at the query or do they just provide all the
> possible instructions and TP figures it out? (i feel like i've kinda read
> it both ways at different times).
> 
> On Fri, May 17, 2019 at 8:12 AM Marko Rodriguez
> wrote:
> 
>> Hello,
>> 
>> This email is primarily for Kuppitz and Josh. Kuppitz offered me his
>> attention yesterday. I explained to him an idea I’ve been working on this
>> week. I’ve been frustrated lately because emails and IM are so hard to
>> express abstract ideas. Fortunately, Kuppitz was patient with me. Then he
>> got it. Then he innovated on it. I was elated.
>> 
>>https://twitter.com/twarko/status/1129117666910674944
>> 
>> Josh was interested in what this was all about. I had to leave for
>> hockey, but I gave him a fast 

The Bytecode Pattern-Matching Model

2019-05-17 Thread Marko Rodriguez
Hello,

This email is primarily for Kuppitz and Josh. Kuppitz offered me his attention 
yesterday. I explained to him an idea I’ve been working on this week. I’ve been 
frustrated lately because emails and IM are so hard to express abstract ideas. 
Fortunately, Kuppitz was patient with me. Then he got it. Then he innovated on 
it. I was elated.

https://twitter.com/twarko/status/1129117666910674944 


Josh was interested in what this was all about. I had to leave for hockey, but 
I gave him a fast breakdown. He sorta got the vibe, but wanted to know more…..



There is only one type of “tuple.”

{ }#?

The notation says: there are objects, but I don’t know how many of them there 
are…..if you want to know more, iterate.



Let us begin..


——TP4 WITH PROVIDER A——

g.

{ [V] }#1

There is one object. Thus, what you see is all that I know about this object. 
In particular, what I know is that it can be mapped via the bytecode 
instruction [V].

Let us apply [V].

{ name:?string | [has,age,?0,?1] [has,id,eq,?0] }#?

There are some number of objects. If you want to know what they are, iterate. 
However, I am aware of a feature that they all share. I do know for a fact (by 
the way I was designed by my creator ProviderA) that every one of the objects 
has a name-key to some string value. Also, two has() bytecode patterns are 
available.

Let us apply [hasKey,name]. 

{ name:?string | [has,age,?0,?1] [has,id,eq,?0] }#?

The instruction didn't match any of the available bytecode patterns. Thus, the 
instruction has to be evaluated. Did you need to iterate and filter out those 
that don’t have a name-key? No. As I told you, I know that every one of the 
objects has a name-key.

Let us apply [has,id,eq,1]. 

{ name:marko, age:29 | [inE] [outE] }#1

There is one thing. It has primitive key/value data —  a name and an age. 

Let us apply [values,name]. 

{ marko }#1

That bytecode instruction didn't match any of the available bytecode patterns. 
The instruction was evaluated and there is one thing: the string “marko.”

We did: 

g.V().hasKey(‘name’).hasId(1).values(‘name’)

The query you provided used an index on id. How do we know that? You didn’t 
have to iterate all the objects and filter on id. I was able to jump from all 
vertices to the one with id=1.
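The dispatch rule at work above — try to match the instruction against the advertised bytecode patterns, and only fall back to evaluation when nothing matches — can be sketched as follows (all names here are illustrative, not TP4 code):

```java
import java.util.*;

// Illustrative dispatch: an instruction that matches an advertised pattern is
// answered by the provider (e.g. via an index jump); otherwise the VM evaluates
// it as a scan-and-filter over the tuple's iterator.
public class Dispatch {
    static String apply(Set<String> advertisedPatterns, String instructionOp) {
        if (advertisedPatterns.contains(instructionOp)) {
            return "provider";   // e.g. [has,id,eq,?0] -> index jump
        }
        return "evaluate";       // e.g. [hasKey,name] -> standard evaluation
    }

    public static void main(String[] args) {
        Set<String> patterns = Set.of("has,age,?0,?1", "has,id,eq,?0");
        assert apply(patterns, "has,id,eq,?0").equals("provider");
        assert apply(patterns, "hasKey,name").equals("evaluate");
    }
}
```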

——TP4 WITH PROVIDER B——

{ type:person, name:?string, age:?int | [has,name,eq,?0] }#10

There are 10 objects. Some providers can’t determine how many objects there are 
without full iteration. But, by the way I was designed, I know. I also know 
that all the objects have a type:person key/value. I also know they all have a 
name-key and an age-key with known value types.

What am I?

CREATE TABLE people (
  name varchar(100),
  age int
)
CREATE INDEX people_name_idx ON people (name);

——TP4 WITH PROVIDER C——

g.V().has(‘name’,’marko’).has(‘age’,gt(20)).id()

This is easy. My creator, ProviderC, provides multi-key indices. And when the 
database instance was created, a (name,age)-index was created. Also, because 
you only want the id of those vertices named marko whose age is greater than 
20, I don’t have to manifest the vertices, I can simply get the id out of the 
index. This is what I provided for each instruction of your query...

1. { type:graph | [V] }#1
2. { type:vertex | [has,name,eq,?0] [has,age,?0,?1] [id] }#?
3. { type:vertex, label:person, name:marko | [has,age,?0,?1] [id] }#?
4. { type:vertex, label:person, name:marko, age:gt(20) | [id] }#?
5. { type:int }#?

Unlike ProviderA, all the objects in me have a type-key. It is just something I 
like to do. Call it my quirk. Thus, on line #2, I know that there are some 
number of vertex objects. And do you see my multi-property index there? On line 
#3, I know for a fact that every one of those objects has a name:marko entry. 
Finally, by line #5, I don’t know how many id-objects there are, but I do know 
they are all integers. If you want to know what they are, iterate.

Below are the possible "bytecode pattern”-paths that are available off of the 
graph object. At any point through this pattern, you could iterate.

                      [V]
              /        |        \
             /       [id]        \
            /                     \
  [has,name,eq,?0]          [has,age,?0,?1]
      /      \                /        \
[has,age,?0,?1] [id]  [has,name,eq,?0]  [id]
      |                      |
    [id]                   [id]


*** In case the diagram above looks weird in your mail client: 
https://gist.github.com/okram/f7f20a3c33aa7caca7c28e85fd16be3f 


——TP4 WITH PROVIDER D——

I support "vertex-centric indices.” For certain queries, I don’t have to 
manifest/iterate the incident edges of a vertex to 

Re: N-Tuple Transactions?

2019-05-15 Thread Marko Rodriguez
Wow. I totally understood what you wrote.

Question: What is the TransactionLog in a distributed environment?
e.g. Akka-driven traversers spawned from the same query 
migrating around the cluster mutating stuff.

Thanks for the lesson,
Marko.

http://rredux.com 




> On May 15, 2019, at 8:58 AM, Joshua Shinavier  wrote:
> 
> Hi Stephen,
> 
> More the latter. TinkerPop transactions would be layered on top of the
> native transactions of the database (if any), which gives the VM more
> control over the operational semantics of a computation in between database
> commits. For example, in many scenarios it would be desirable not to mutate
> the graph at all until a traversal has completed, so that the result does
> not depend on the order of evaluation. Consider a traversal which adds or
> deletes elements as it goes. In some cases, you want writes and reads to
> build on each other, so that what you wrote in one step is accessible for
> reading in the next step. This is a very imperative style of traversal for
> which you need to understand how the VM builds a query plan in order to
> predict the result. In many other cases, you might prefer a more functional
> approach, for which you can forget about the query plan. Without VM-level
> transactions, you don't have this choice; you are at the mercy of the
> underlying database. The extra level of control will be useful for
> concurrency and parallelism, as well -- without it, the same programs may
> have different results when executed on different databases.
> 
> Josh
> 
> 
> 
> 
> On Wed, May 15, 2019 at 6:47 AM Stephen Mallette 
> wrote:
> 
>> Hi Josh, interesting... we have graphs with everything from no transactions
>> like TinkerGraph to more acid transactional systems and everything in
>> between - will transaction support as you describe it cover all the
>> different transactional semantics of the underlying graphs which we might
>> encounter? or is this an approach that helps unify those different
>> transactional semantics under TinkerPop's definition of a transaction?
>> 
>> On Wed, May 15, 2019 at 9:23 AM Joshua Shinavier 
>> wrote:
>> [...]



A Novel "Bytecode" Optimization Mechanism for TP4

2019-05-15 Thread Marko Rodriguez
Hi,

Thinking last night, I came up with another way of doing bytecode optimization 
in TP4 that has some interesting properties.

1. Providers don't write custom strategies.
2. Providers don’t write custom instructions.
3. TP4 can be ignorant of large swaths of optimization techniques.
==> Instead, providers define custom Sequences.
- In Java, Sequence.iterator() => Iterator
- In mm-ADT, a sequence tuple.

———

Assume the following {graph} tuple.

{ type:graph, V:<V> }

g.V()
=compiles to=>
[V]
=evaluates to=>
<V>

When an instruction is applied to a tuple, it first sees if that tuple has a 
respective "instruction-key.” If so, the value of that key is returned. Thus, 
[V] => <V>.

<V> is a “sequence” (an iterator) of vertex tuples. It is a reference/pointer 
to a bunch of vertices.

< type:V, parent:{graph}, bytecode:[[V]], hasId:<V.hasId> >

If all you did was g.V(), then <V> would be dereferenced (iterated) yielding 
all the vertex tuples of the graph. Note that the {graph} tuple was the one 
who said there was a V-key that returned <V>. In other words, the graph 
provider knows what a <V> sequence is as it created it! Thus, the provider 
knows how to generate an iterator of vertex tuples when <V>.iterator() is 
called.

Moving on….

// g.V().hasId(1)
[V][hasId,1]
  ==> <V.hasId>

Note above that the provider’s created <V> sequence has a hasId-key that 
returns a <V.hasId> sequence. Again, like <V>, <V.hasId> is a reference/pointer 
to a bunch of vertex tuples.

< type:V.hasId, parent:<V>, bytecode:[[V][hasId,1]] >

If all you did was g.V().hasId(1), then <V.hasId> would be dereferenced 
(iterated) yielding v[1]. Note that <V.hasId> was created by <V> which was 
created by {graph}. Thus, the graph provider indirectly created <V.hasId> and 
thus, <V.hasId> knows how to dereference/iterate itself with respect to the 
{graph} (follow the parent-key chain back to {graph}). Assume that for this 
graph provider, a <V.hasId> dereference/iteration performs an index lookup by id.

Notice how we haven’t done anything with bytecode strategies. g.V().hasId(1) was 
able to trigger an index lookup. No custom strategy. No custom instructions. 
Why? Because the graph provider delays dereferencing these sequences and thus, 
delays manifesting vertex objects! When it finally has to manifest vertex 
objects, it has an instruction-provenance chain that allows it to be smart 
about how to get the data — i.e. an index lookup is possible.
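The delayed-dereference mechanism described above can be sketched as follows (Seq, extend(), and dereference() are illustrative names; in reality the iteration logic would come from the provider):

```java
import java.util.*;
import java.util.function.Supplier;

// Illustrative lazy sequence: each instruction that the current reference
// "knows about" extends the chain instead of iterating; dereferencing only
// happens when an unknown instruction (or the end of the bytecode) is hit.
public class Seq {
    final String type;                     // e.g. "V.hasId"
    final Seq parent;                      // provenance chain back to the graph
    final Supplier<List<String>> deref;    // provider-supplied iteration logic

    Seq(String type, Seq parent, Supplier<List<String>> deref) {
        this.type = type; this.parent = parent; this.deref = deref;
    }

    // Known instruction-key: delay evaluation, return a narrower reference.
    Seq extend(String op, Supplier<List<String>> deref) {
        return new Seq(this.type + "." + op, this, deref);
    }

    List<String> dereference() { return deref.get(); }

    public static void main(String[] args) {
        Seq v = new Seq("V", null, () -> List.of("v[1]", "v[2]", "v[3]"));
        // [hasId,1] is a known key: no iteration yet, just a narrower reference
        // whose deref performs an index lookup.
        Seq vHasId = v.extend("hasId", () -> List.of("v[1]"));
        assert vHasId.type.equals("V.hasId");
        assert vHasId.parent == v;                              // provenance chain
        assert vHasId.dereference().equals(List.of("v[1]"));    // index lookup
    }
}
```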

A dereference doesn’t just happen when the end of the bytecode is reached. No, 
it also happens when a sequence doesn’t have a respective instruction-key. 
Watch...

// g.V().hasId(1).has(‘name’,’marko’)
[V][hasId,1][has,name,marko]
  ==> { type:vertex, id:1, name:marko, outE:<outE> }

The <V.hasId> sequence from before does not have a has-key. Thus, the 
sequence chain can no longer delay evaluation. <V.hasId> is dereferenced, index 
lookup occurs, and v[1] is flatmapped into the processor stream. The 
has(name,marko) instruction is evaluated on v[1]. The v[1] tuple doesn’t have a 
has-key so the HasFunction does its standard evaluation on a vertex (no delayed 
evaluation as we are back into standard TP-stream processing).

Moving on...

// g.V().hasId(1).has(‘name’,’marko’).outE()
[V][hasId,1][has,name,marko][outE]
  ==> <outE>

When the v[1] vertex tuple is flatmapped into the processor stream from 
<V.hasId>, HasFunction lets it live, and then the [outE] instruction is called. 
The v[1] vertex tuple has an outE-key. Thus, instead of OutEdgesFunction 
evaluating on v[1], v[1] provides an <outE> sequence object to the processor 
stream.

< type:outE, parent:{vertex id:1}, bytecode:[[outE]], hasLabel:<outE.hasLabel> >

If no more instructions, <outE> is dereferenced. Since v[1] created the <outE>, 
it must have the logic to create an iterator() of outgoing incident edge tuples.

Moving on...

// g.V().hasId(1).has(‘name’,’marko’).outE(‘knows’)
[V][hasId,1][has,name,marko][outE][hasLabel,knows]
  ==> <outE.hasLabel>

< type:outE.hasLabel, parent:<outE>, bytecode:[[outE][hasLabel,knows]], 
inV:<outE.hasLabel.inV> >

Do you see where this is going?

// g.V().hasId(1).has(‘name’,’marko’).out(‘knows’)
[V][hasId,1][has,name,marko][outE][hasLabel,knows][inV]
  ==> <outE.hasLabel.inV>

< type:outE.hasLabel.inV, parent:<outE.hasLabel>, 
bytecode:[[outE][hasLabel,knows][inV]] >

When the <outE.hasLabel.inV> sequence is dereferenced, v[1] will know how to 
get all its knows-adjacent vertices. Guess what cool things just happened? 
1. We didn’t materialize any incident edges.
2. We used a vertex-centric index to directly grab the v[1] knows-edges 
off disk. 

——

Here is why this direction may prove fruitful for TP4:

1. Optimizations don’t have to be determined at compile time via strategies. 
* Instead they can be determined at runtime via this “delayed 
evaluation”-mechanism.
2. Optimizations don’t have to be global to a type, they can also be 
local to an instance.
* v[1] could have a out(‘knows’)-index, but v[4] might not!
* This realization happens at runtime, not at compile time.
* Now think about 

N-Tuple Transactions?

2019-05-13 Thread Marko Rodriguez
Hello Josh (others),

You mentioned a week or so ago that the n-tuple model should be able to capture 
both indices and transactions.

At the time, I scoffed at the notion. However, as you know, I have been 
recently enlightened to how n-tuples can model indices and am using it 
extensively in the spec. What I would like to contemplate now is how you figure 
transactions being modeled?

Thanks,
Marko.

http://rredux.com 






Re: A collection of examples that map a query language query to provider bytecode.

2019-05-12 Thread Marko Rodriguez
Hi,

> Machine machine = RemoteMachine
>.withStructure(NeptuneStructure.class, config1)
>.withProcessor(AkkaProcessor.class, config2)
>.withCompiler(CypherCompiler.class, config3)
>.open(config0);


Yea, I think something like this would work well. 

I like it because it exposes the three main components that TinkerPop is gluing 
together:

Language
Structure
Process

Thus, I would have it:

withStructure()
withProcessor()
withLanguage()

Marko.

http://rredux.com


> On May 10, 2019, at 8:27 AM, Dmitry Novikov  wrote:
> 
> Stephen, Remote Compiler - very interesting idea to explore. Just for 
> brainstorming, let me imagine how this may look like:
> 
> 
> 1. If the client supports compilation - compiles on the client side
> 2. If remote supports compilation - compiles on the server side
> 3. If neither client nor remote supports compilation, `config3` could contain 
> the path to microservice.  Microservice does compilation and either return 
> bytecode, either send bytecode to remote and proxy response to the client. 
> Microservice could be deployed on remote as well.
> 
> `config3` may look like respectively:
> 
> 1. `{compilation: 'embedded'}`
> 2. `{compilation: 'remote'}`
> 3. `{compilation: 'external', uri: 'localhost:3000/cypher'}`
> 
> On 2019/05/10 13:45:50, Stephen Mallette  wrote: 
>>> If VM, server or compiler is implemented in another language, there is
>> always a possibility to use something like gRPC or even REST to call
>> microservice that will do query→Universal Bytecode conversion.
>> 
>> That's an interesting way to handle it especially if it could be done in a
>> completely transparent way - a Remote Compiler of some sort. If we had such
>> a thing then the compilation could conceivably happen anywhere, client or
>> server of the host programming language.
>> 
>> On Fri, May 10, 2019 at 9:08 AM Dmitry Novikov 
>> wrote:
>> 
>>> Hello,
>>> 
>>> Marko, thank you for the clear explanation.
>>> 
>>>> I don’t like that you would have to create a CypherCompiler class (even
>>> if its just a wrapper) for all popular programming languages. :(
>>> 
>>> Fully agree about this. For declarative languages like SQL, Cypher and
>>> SPARQL complex compilation will be needed, most probably requiring AST
>>> walk. Writing compilers for all popular languages could be possible in
>>> theory, but increases the amount of work n times (where n>language count)
>>> and complicates testing. Also, libraries necessary for the task might not
>>> be available for all languages.
>>> 
>>> In my opinion, to avoid the situation when the number of supported query
>>> languages differs depending on client programming language, it is
>>> preferable to introduce a plugin system. The server might have multiple
>>> endpoints, one for Bytecode, one for SQL, Cypher, etc.
>>> 
>>> If VM, server or compiler is implemented in another language, there is
>>> always a possibility to use something like gRPC or even REST to call
>>> microservice that will do query→Universal Bytecode conversion.
>>> 
>>> Regards,
>>> Dmitry
>>> 
>>> On 2019/05/10 12:03:30, Stephen Mallette  wrote:
>>>>> I don’t like that you would have to create a CypherCompiler class
>>> (even
>>>> if its just a wrapper) for all popular programming languages. :(
>>>> 
>>>> Yeah, this is the trouble I saw with sparql-gremlin and how to make it so
>>>> that GLVs can support the g.sparql() step properly. It seems like no
>>> matter
>>>> what you do, you end up with a situation where the language designer has
>>> to
>>>> do something in each programming language they want to support. The bulk
>>> of
>>>> the work seems to be in the "compiler" so if that were moved to the
>>> server
>>>> (what we did in TP3) then the language designer would only have to write
>>>> that once per VM they wanted to support and then provide a more
>>> lightweight
>>>> library for each programming language they supported on the client-side.
>>> A
>>>> programming language that had the full compiler implementation would have
>>>> the advantage that they could client-side compile or rely on the server.
>>> I
>>>> suppose that a lightweight library would then become the basis for a
>>> future
>>>> full blown compiler in that language...

Re: A collection of examples that map a query language query to provider bytecode.

2019-05-09 Thread Marko Rodriguez
Hello Dmitry,

> In TP3 compilation to Bytecode can happen on Gremlin Client side or Gremlin 
> Server side:
> 
> 1. If compilation is simple, it is possible to implement it for all Gremlin 
> Clients: Java, Python, JavaScript, .NET...
> 2. If compilation is complex, it is possible to create a plugin for Gremlin 
> Server. Clients send query string, and server does the compilation.

Yes, but not for the reasons you state. Every TP3-compliant language must be 
able to compile to TP3 bytecode. That bytecode is then submitted, evaluated by 
the TP3 VM, and a traverser iterator is returned.

However, TP3’s GremlinServer also supports JSR223 ScriptEngine which can 
compile query language Strings server side and then return a traverser 
iterator. This exists so people can submit complex Groovy/Python/JS scripts to 
GremlinServer. The problem with this access point is that arbitrary code can be 
submitted and thus while(true) { } can hang the system! dar.

> For example, in Cypher for Gremlin it is possible to use compilation to 
> Bytecode in JVM client, or on the server when using [other language 
> clients][1].

I’m not too familiar with GremlinServer plugin stuff, so I don’t know. I would 
say that all TP3-compliant query languages must be able to compile to TP3 
bytecode.

> My current understanding is that TP4 Server would serve only for I/O purposes.

This is still up in the air, but I believe that we should:

1. Only support one data access point.
TP4 bytecode in and traversers out.
2. The TP4 server should have two components.
(1) One (or many) bytecode input locations (IP/port) that pass 
the bytecode to the TP4 VM.
(2) Multiple traverser output locations where distributed 
processors can directly send halted traversers back to the client.

For me, thats it. However, I’m not a network server-guy so I don’t have a clear 
understanding of what is absolutely necessary.

> Where do you see "Query language -> Universal Bytecode" part in TP4 
> architecture? Will it be in the VM? Or in middleware? How will clients look 
> like in TP4?

TP4 will publish a binary serialization specification.
It will be dead simple compared to TP3’s binary specification.
The only types of objects are: Bytecode, Instruction, Traverser, Tuple, and 
Primitive.

Every query language designer that wants to have their query language execute 
on the TP4 VM (and thus, against all supporting processing engines and data 
storage systems) will need to have a compiler from their language to TP4 
bytecode.

We will provide 2 tools in all the popular programming languages (Java, Python, 
JS, …).
1. A TP4 serializer and deserializer.
2. A lightweight network client to submit serialized bytecode and 
deserialize Iterator<Traverser> into objects in that language. 

Thus, if the Cypher-TP4 compiler is written in Scala, you would:
1. build up a org.apache.tinkerpop.machine.bytecode.Bytecode object 
during your compilation process.
2. use our org.apache.tinkerpop.machine.io.RemoteMachine object to send the 
Bytecode and get back Iterator<Traverser> objects.
- RemoteMachine does the serialization and deserialization for 
you.

I originally wrote out how it currently looks in the tp4/ branch, but realized 
that it asks you to write one too many classes. Thus, I think we will probably 
go with something like this:

Machine machine = RemoteMachine.
withStructure(NeptuneStructure.class, config1).
withProcessor(AkkaProcessor.class, config2).
  open(config0);

Iterator<Traverser> results = machine.submit(CypherCompiler.compile("MATCH 
(x)-[knows]->(y)"));

Thus, you would only have to provide a single CypherCompiler class.

If you have any better ideas, please say so. I don’t like that you would have 
to create a CypherCompiler class (even if it's just a wrapper) for all popular 
programming languages. :(

Perhaps TP4 has a Compiler interface and compilation happens server side….? But 
then that requires language designers to write their compiler in Java … hmm…..
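For illustration, the Compiler idea floated here might look roughly like this (all types below are stand-ins, not actual TP4 classes; in particular this Bytecode is a toy, not org.apache.tinkerpop.machine.bytecode.Bytecode):

```java
import java.util.*;

// Hypothetical sketch: a language designer implements Compiler once; it can
// run client-side or be hosted server-side behind a string-in/bytecode-out
// endpoint.
interface Compiler {
    Bytecode compile(String query);
}

// Toy stand-in for a TP4 Bytecode object: an ordered list of instructions.
class Bytecode {
    final List<String[]> instructions = new ArrayList<>();
    Bytecode add(String... instruction) { instructions.add(instruction); return this; }
}

// Toy "compiler": treats the query as a dot-separated instruction list.
class ToyCompiler implements Compiler {
    public Bytecode compile(String query) {
        Bytecode bc = new Bytecode();
        for (String op : query.split("\\.")) bc.add(op);
        return bc;
    }
}

public class CompilerSketch {
    public static void main(String[] args) {
        Bytecode bc = new ToyCompiler().compile("V.outE.inV");
        assert bc.instructions.size() == 3;
        assert bc.instructions.get(0)[0].equals("V");
    }
}
```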

Hope I’m clear,
Marko.

http://rredux.com 







The TP4 Universal Model ==> A Multi-Model ADT

2019-05-09 Thread Marko Rodriguez
Hi,

*** I’ve started a GoogleDoc called “A Multi-Model Data Type Specification.” ***

An abstract data type is a data structure + operations to manipulate it. 
Classic examples include:
1. stacks — arrays with push() and pop() operations.
2. lists — arrays with add(), remove(), get(), etc. operations.
3. graphs — networks with out(), in(), etc. operations.
4. …

Databases can be defined by their ADT. Database ADTs typically involve a data 
structure+indices and a set of data manipulation operations. 
1. key/value — pairs+key-index with get(), remove(), put(), etc. 
operations.
2. relational — relations+indices with select(), project(), join(), 
etc. operations.
3. RDF — statements+spog-indices with subject(), predicate(), object(), 
match(), etc. operations.
4. graph — vertices+edges+indices with has(), values(), out(), in(), 
etc. operations.
5. …

In the spec thus far, I argue that the database industry has become overly 
fixated on classifying databases into discrete categories each with their own 
unique terminology (vertices/edge, tables/rows, documents, statements) and 
overlapping operations. I believe the primary reason for this is that databases 
are monolithic systems composed of a query language, a processing engine, and a 
data storage system. When all these pieces are assembled by the database 
engineering team, a “data perspective” (ADT) is set in stone.

I believe this has unnecessarily created database technology silos.

———

What we are trying to do at Apache TinkerPop is create a multi-model ADT that 
spans the various database categories by using a generic lexicon and set of 
operations capable of performing all database operations. People may argue: 

“Why not just use the relational ADT and table/row lexicon as it can 
emulate every other known ADT relatively naturally?”

I believe we are basically doing that with n-tuples (i.e. "schemaless rows"). 
However, what makes our approach unique is that our ADT doesn’t assume that it 
will solely be used by a monolithic database system. Instead, our ADT is 
designed on the assumption that the storage system, the processing engine, and 
the query language are independent components that are ultimately integrated 
into a “synthetic database” (a database that is custom assembled to meet the 
data modeling and performance requirements of an end user’s applications). 
Synthetic databases are possible with our multi-model ADT.

A multi-model ADT compliant property graph query language assumes a basic 
property graph ADT embedding.

{graph}
  V()
{vertex}
  id()
  label()
  outE()
  inE()
{edge}
  id()
  label()
  outV()
  inV()

The query language says that it understands map-tuples ({}) of type graph, 
vertex, and edge. Moreover, along with core bytecode (has,values,…), it states 
that these tuples should be able to be processed using the provided property 
graph-specific instructions. Thus,

g.V(1).out(‘knows’).values(‘name’)
  === Gremlin compiles to Basic Property Graph Bytecode ==>
V().filter(id().is(1)).outE().filter(label().is(‘knows’)).inV().values(‘name’)

Without strategies, the above bytecode would execute as a series of inefficient 
linear “scan and filter” operations — the basic functional requirements of a 
“property graph." However, the data storage system says that it has various 
indices (and accessing instructions) for these types of tuples.

{graph}
  V()
  V(object)
{vertex}
  id()
  label()
  outE()
  inE()
  out(string..)
  in(string...)
{edge}
  id()
  label()
  outV()
  inV()

Thus, its property graph ADT extends the basic property graph ADT used by the 
query language. This enables TP4 strategies to rewrite the submitted bytecode 
to use the data storage system’s supported instructions (i.e. optimizations).

V().filter(id().is(1)).outE().filter(label().is(‘knows’)).inV().values(‘name’)
  === Property Graph Bytecode compiles to Data Storage System Optimized 
Bytecode ==>
V(1).out(‘knows’).values(‘name’)

This bytecode is then passed to the processing engine which seamlessly operates 
on the data storage system’s tuples as defined by the instructions.

Question: What is the out(‘knows’) instruction? Simple: it’s a 
FlatMapFunction that calls the following method on the TP4 Vertex 
interface.

> Vertex.out(String… labels)

The data storage system says that its vertex tuple objects support the 
out(string…) instruction; thus, the data storage system is organizing 
incident edges by label in its respective substrate (i.e. disk or memory). 
Great!
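To make the flat-map idea concrete, here is a minimal Python sketch (illustrative only; plain objects stand in for TP4 tuples, and the names Vertex and out_instruction are hypothetical, not TP4 API):

```python
# Hypothetical sketch of out('knows') as a flat-map instruction:
# one incoming vertex maps to many adjacent vertices.

class Vertex:
    def __init__(self, vid, out_by_label):
        self.id = vid
        # incident out-edges pre-grouped by label, mirroring how a
        # storage system that supports out(string...) organizes them
        self._out_by_label = out_by_label

    def out(self, *labels):
        """Adjacent vertices reachable over out-edges with these labels."""
        for label in labels:
            yield from self._out_by_label.get(label, [])

def out_instruction(*labels):
    """A flat-map function over vertex tuples."""
    return lambda vertex: vertex.out(*labels)

v2, v4 = Vertex(2, {}), Vertex(4, {})
v1 = Vertex(1, {'knows': [v2, v4]})
step = out_instruction('knows')
assert [v.id for v in step(v1)] == [2, 4]  # v[1] knows v[2] and v[4]
```

The point of the sketch is only that the instruction delegates to the data structure's own adjacency organization rather than scanning and filtering.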

IMPORTANT: Notice that our multi-model ADT’s operations are bytecode 
instructions. There is no longer a concept of “pointers.” A pointer is simply a 
map instruction! This means that our multi-model specification is not just a 
data structure specification, but also a bytecode specification. I believe we 
will ultimately have one spec driving TP4 VM development!




What is a TP4 structure and what is a TP4 property graph structure?

2019-05-07 Thread Marko Rodriguez
Hello,

*** I believe the following email provides the most elegant TP4 structure 
proposal to date. ***

Every database (withStructure()) is understood by TP4 as an unordered sequence 
of #type'd tuples.

{#type:?, ...}
{#type:?, ...}
{#type:?, ...}
...

This is the TP4 “universal model.”

Every graph database is understood by TP4 as an unordered sequence of tuples 
with the following #types.

{#type:index, ...}
{#type:vertex, ...}
{#type:edge, ...}
...

This is the TP4 “property graph model.”

It is up to the underlying database to organize tuples in memory and on disk as 
it sees fit. For example, an RDBMS-based graph database may want an 
indices-table, a vertices-table, and an edges-table. Or, for schema-oriented 
graph data, it may want a person-table, a knows-table, a project-table, a 
created-table, etc. The TP4 VM doesn't care how the tuples are organized by the 
database. The TP4 VM interprets the entire database (data + indices) as an 
unordered heterogeneous sequence of #type’d tuples.



All structure providers (i.e. database vendors) must implement a 
db(string...)-instruction which will emit all tuples of a particular #type.

db() -> a sequence of all tuples
db('vertex') -> a sequence of all #type=vertex tuples
db('index','vertex') -> a sequence of all #type=index and #type=vertex tuples.

The difference between a graph database, a relational database, a document 
database, etc. is wholly dependent on the tuple #types and their respective 
keys.
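The db(string...) instruction above can be sketched in a few lines of Python (illustrative only; dicts stand in for #type'd tuples, and the sample data is made up):

```python
# A database as an unordered sequence of #type'd tuples, and the
# db(string...) instruction that emits tuples of the requested #types.

DB = [
    {'#type': 'index'},
    {'#type': 'vertex', '#id': 1, '#label': 'person'},
    {'#type': 'edge', '#id': 7, '#label': 'knows'},
]

def db(*types):
    """Emit all tuples, or only those whose #type is in `types`."""
    for t in DB:
        if not types or t['#type'] in types:
            yield t

assert len(list(db())) == 3                        # db()
assert [t['#id'] for t in db('vertex')] == [1]     # db('vertex')
assert len(list(db('index', 'vertex'))) == 2       # db('index','vertex')
```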

###
### A TP4 Property Graph Database Specification ###
###

#type=index OR vertex OR edge
...
#type=index.#vertex-ids=(pointer(object...) to sequence)          [OPTIONAL]
#type=index.#vertex-property=(pointer(string,object) to sequence) [OPTIONAL]
...
#type=vertex.#id=(object)
#type=vertex.#label=(string)
#type=vertex.#outE=(pointer to sequence)
#type=vertex.#inE=(pointer to sequence)
#type=vertex.#outE-labels=(pointer(string...) to sequence)  [OPTIONAL]
#type=vertex.#inE-labels=(pointer(string...) to sequence)   [OPTIONAL]
#type=vertex.#out-labels=(pointer(string...) to sequence) [OPTIONAL]
#type=vertex.#in-labels=(pointer(string...) to sequence)  [OPTIONAL]
...
#type=edge.#id=(object)
#type=edge.#label=(string)
#type=edge.#outV=(pointer to vertex)
#type=edge.#inV=(pointer to vertex)

The above specifies how the TP4 “universal model” is constrained to encode a 
TP4 "property graph model."

=== EXAMPLE #1 ===

Suppose a graph database provider supports #vertex-ids and #out-labels. The 
graph database's Structure implementation would express this fact via its 
getStrategies() implementation returning: 
GraphCentricIndexStrategy.create(‘vertex-ids’)
VertexCentricIndexStrategy.create(‘out-labels’)

These TP4 provided strategies would then perform the following bytecode 
rewrites:

g.V(1).out('knows').values('name’)
  == Gremlin compiles to TP4 universal bytecode ==>
db('vertex').has('#id',1).values('#outE').has('#label','knows').values('#inV').values('name')
  == GraphCentricIndexStrategy rewrites the bytecode to ==>
db('index').values('#vertex.ids').apply(1).values('#outE').has('#label','knows').values('#inV').values('name')
  == VertexCentricIndexStrategy rewrites the bytecode to ==>
db('index').values('#vertex.ids').apply(1).values('#out.labels').apply('knows').values('name')

If the original TP4 universal bytecode were to execute, it would do a linear 
scan of all vertex-tuples and filter out those that don’t have an #id=1. It 
would then do a linear scan of all outgoing incident edge-tuples to v[1] and 
filter out those edges that don’t have #label=knows. The result would be 
semantically correct, but the evaluation would be painfully slow for large 
graph datasets. Indices can be used to speed up data access. In the TP4 
universal model, an index is simply a pointer to a sub-sequence of tuples in 
db().

GraphCentricIndexStrategy re-writes the original compiled TP4 universal 
bytecode to use a graph-centric index (pointer) to directly access the vertex 
with #id=1.
VertexCentricIndexStrategy re-writes the previous strategized TP4 bytecode to 
use a vertex-centric index (pointer) to directly access all the incident 
knows-edges of v[1]. 

NOTES:
  values(‘#vertex-ids’) references an n-arg pointer. apply(object...) resolves 
the pointer to its tuple sequence using the provided arguments.
  values(‘#outE’) references a 0-arg pointer. apply() is not required, as the 
pointer is immediately resolved to its tuple sequence.
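The 0-arg versus n-arg pointer distinction can be sketched as follows (illustrative Python only; lambdas stand in for pointers, dicts for tuples, and the data is made up):

```python
# A pointer is a function from arguments to a tuple sub-sequence.
# n-arg pointers need apply(object...); 0-arg pointers resolve immediately.

edges = [
    {'#type': 'edge', '#id': 7, '#label': 'knows', '#outV': 1, '#inV': 2},
    {'#type': 'edge', '#id': 8, '#label': 'knows', '#outV': 1, '#inV': 4},
]
vertices = {1: {'#type': 'vertex', '#id': 1}}

# n-arg pointer: '#vertex-ids' maps ids to the matching vertex tuples
vertex_ids_pointer = lambda *ids: [vertices[i] for i in ids]

# 0-arg pointer: '#outE' on v[1] is already bound to its edge sub-sequence
out_e_pointer = lambda: [e for e in edges if e['#outV'] == 1]

assert vertex_ids_pointer(1)[0]['#id'] == 1          # apply(1)
assert [e['#id'] for e in out_e_pointer()] == [7, 8]  # no apply() needed
```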

[TO BE CONTINUED…]

Tomorrow I will write this same email but from the perspective of a relational 
data structure encoded in the TP4 universal model.

Feedback is much appreciated.

Take care,
Marko.

http://rredux.com 






Re: N-Tuples, Pointers, Data Model Interfaces, and Bytecode Instructions

2019-05-07 Thread Marko Rodriguez
> How does TP4 know that #v-idx-name is a global name index? Likewise, how 
> does TP4 know that #outE.knows is a vertex-centric index on outgoing 
> knows-edges?
I believe that we would specify a set of common optimization #keys that are 
generally found in graph database systems.

Much like we require #id, #label, #outE, #inE, #outV, #inV keys on respective 
#type:vertex and #type:edge tuples, we could have a standard naming convention 
for indices.

#outE.<label>: put on vertex-tuples with a pointer to all incident 
<label>-edges.
#v-idx-<key>: put on an index-tuple with a pointer to all vertices with a 
key=<key> and a value=@[0].
…

In this way, a “graphdb” is a set of tuple #types, a set of tuple #keys, and a 
set of standard tuple pointers (optimizations). In the situation that TP4 
doesn’t have an optimization represented, well, the provider can always write a 
strategy.

Thoughts?,
Marko.

http://rredux.com <http://rredux.com/>







> 
> Cheers,
> Daniel
> 
> 
> 
> On Tue, May 7, 2019 at 11:04 AM Marko Rodriguez  <mailto:okramma...@gmail.com>>
> wrote:
> 
>> Here is a thought…
>> 
>> Think about what this means for Apache Cassandra. Users will be able to
>> define a graphdb-based n-tuple schema that has #fields denoting global
>> indices, vertex-centric indices, full-text search indices, denormalized
>> data, etc. … It will be possible for Apache TinkerPop to take the user's
>> n-tuple schema and then CREATE TABLEs accordingly in Cassandra and thus,
>> optimized for the user’s graph data and queries. In other words,
>> theoretically, we should be able to create an optimially index’d Cassandra
>> keyspace for a user’s specific n-tuple data. In other words, TP4 creates
>> Titans! And best of all, there is no “Titan” middle layer. We just talk
>> directly to Cassandra. Cassandra gives us tuples and resolves pointers.
>> 
>> Now hold on to your undergarments.
>> 
>> A user can state that they plan to have numerous concurrent users
>> interacting with the database.
>>- Connect RxJavaSerialProcessor.
>> A user can state that they plan to do batch analytics weekly.
>>- Connect SparkProcessor.
>> A user can state that their queries tend to jump around the graph.
>>- Connect AkkaProcessor.
>> …
>> 
>> Given a questionnaire and a user’s graphdb-based n-tuple schema, we could
>> provide a service that does:
>> 
>>Processing your questionnaire answers and n-tuple schema.
>> 
>>Using Apache Cassandra as the underlying structure.
>>Generating CQL script:
>>...for creating appropriate indices.
>>...for denormalizing row data in order to limit pointer
>> chasing.
>>Using PipesProcessor as the primary real-time processor.
>>Using SparkProcessor as the primary batch processor.
>>Exposing Gremlin as a graph query language.
>>Exposing Cypher as a graph query language.
>>Configuring TP4VM MachineServer for 127.0.0.1:8080.
>> 
>>Your database is ready for download:
>> 
>> http://tinkerpop.apache.org/db-generator/GraphDB+Cassandra+Pipes+Spark+Gremlin+Cypher.zip
>> <
>> http://tinkerpop.apache.org/db-generator/GraphDB+Cassandra+Pipes+Spark+Gremlin+Cypher.zip
>>  
>> <http://tinkerpop.apache.org/db-generator/GraphDB+Cassandra+Pipes+Spark+Gremlin+Cypher.zip>
>>> 
>> 
>> Its wild, but its not crazy. From what I can see, it is theoretically
>> possible to provide a service of this nature with TP4.
>> 
>> Marko.
>> 
>> http://rredux.com
>> 
>> 
>> 
>> 
>>> On May 7, 2019, at 11:14 AM, Marko Rodriguez >> <mailto:okramma...@gmail.com>>
>> wrote:
>>> 
>>> When your hot, your hot. Going to keep pushin’.
>>> 
>>> What is the difference between a relational database's and a graph
>> database’s encoding of a property graph? The short answer, NOTHING. The
>> n-tuple model for a subset of the TinkerPop toy graph is:
>>> 
>>> {#type:vertex, #id:1, #label:person, name:marko, age:29,
>> #outE:*{#outV=*{#id=1}}}
>>> {#type:vertex, #id:4, #label:person, name:josh, age:32}
>>> {#type:vertex, #id:2, #label:person, name:vadas, age:27}
>>> {#type:edge, #id:7, #label:knows, #outV:*{#id=1}, #inV:*{#id=2}}
>>> {#type:edge, #id:8, #label:knows, #outV:*{#id=1}, #inV:*{#id=4}}
>>> 
>>> MySQL and Neo4j would talk with TP4 via the same above tuples. What
>> makes MySQL different from Neo4j is how the pointers are

Re: N-Tuples, Pointers, Data Model Interfaces, and Bytecode Instructions

2019-05-07 Thread Marko Rodriguez
Here is a thought…

Think about what this means for Apache Cassandra. Users will be able to define 
a graphdb-based n-tuple schema that has #fields denoting global indices, 
vertex-centric indices, full-text search indices, denormalized data, etc. … It 
will be possible for Apache TinkerPop to take the user's n-tuple schema and 
then CREATE TABLEs accordingly in Cassandra, optimized for the user’s 
graph data and queries. In other words, theoretically, we should be able to 
create an optimally indexed Cassandra keyspace for a user’s specific n-tuple 
data. In effect, TP4 creates Titans! And best of all, there is no “Titan” 
middle layer. We just talk directly to Cassandra. Cassandra gives us tuples and 
resolves pointers.

Now hold on to your undergarments.

A user can state that they plan to have numerous concurrent users interacting 
with the database.
- Connect RxJavaSerialProcessor.
A user can state that they plan to do batch analytics weekly.
- Connect SparkProcessor.
A user can state that their queries tend to jump around the graph.
- Connect AkkaProcessor.
…

Given a questionnaire and a user’s graphdb-based n-tuple schema, we could 
provide a service that does:

Processing your questionnaire answers and n-tuple schema.

Using Apache Cassandra as the underlying structure.
Generating CQL script:
...for creating appropriate indices.
...for denormalizing row data in order to limit pointer chasing.
Using PipesProcessor as the primary real-time processor.
Using SparkProcessor as the primary batch processor.
Exposing Gremlin as a graph query language.
Exposing Cypher as a graph query language.
Configuring TP4VM MachineServer for 127.0.0.1:8080.

Your database is ready for download:

http://tinkerpop.apache.org/db-generator/GraphDB+Cassandra+Pipes+Spark+Gremlin+Cypher.zip
 
<http://tinkerpop.apache.org/db-generator/GraphDB+Cassandra+Pipes+Spark+Gremlin+Cypher.zip>

It’s wild, but it’s not crazy. From what I can see, it is theoretically possible 
to provide a service of this nature with TP4.

Marko.

http://rredux.com <http://rredux.com/>




> On May 7, 2019, at 11:14 AM, Marko Rodriguez  wrote:
> 
> When your hot, your hot. Going to keep pushin’.
> 
> What is the difference between a relational database's and a graph database’s 
> encoding of a property graph? The short answer, NOTHING. The n-tuple model 
> for a subset of the TinkerPop toy graph is:
> 
> {#type:vertex, #id:1, #label:person, name:marko, age:29, 
> #outE:*{#outV=*{#id=1}}}
> {#type:vertex, #id:4, #label:person, name:josh, age:32}
> {#type:vertex, #id:2, #label:person, name:vadas, age:27}
> {#type:edge, #id:7, #label:knows, #outV:*{#id=1}, #inV:*{#id=2}}
> {#type:edge, #id:8, #label:knows, #outV:*{#id=1}, #inV:*{#id=4}}
> 
> MySQL and Neo4j would talk with TP4 via the same above tuples. What makes 
> MySQL different from Neo4j is how the pointers are resolved.
> 
> First, lets talk about the MySQL schema that ultimately holds the graphdb 
> instance data.
> 
> CREATE TABLE #V {
>   #id int NOT NULL PRIMARY KEY,
>   #label varchar,
>   name varchar,
>   age int,
>   #outE varchar
> }
> 
> CREATE TABLE #E {
>   #id int NOT NULL PRIMARY KEY,
>   #label varchar,
>   #outV int,
>   #inV int,
>   FOREIGN KEY (#outV) REFERENCES V(#id),
>   FOREIGN KEY (#inV) REFERENCES V(#id)
> }
> 
> We know how graph databases will execute bytecode so lets focus on how MySQL 
> will do it.
> 
> // g.V()
> db().values(‘#V’)
>   == strategizes to ==>
> db().sql(‘SELECT * FROM #V’)
> 
> // g.V().outE(‘knows’).id()
> db().values(‘#V’).values(‘#outE’).has(‘#label’,’knows’).values(‘#id’)
>   == strategizes to ==>
> db().sql(‘SELECT * FROM #V’).sql(‘SELECT * FROM #E WHERE 
> #outV=$id’).by(‘#id’).has(‘#label’,’knows’).values(‘#id’)
>   == strategizes to ==>
> db().sql(‘SELECT * FROM #V’).sql(‘SELECT * FROM #E WHERE #label=knows AND 
> #outV=$id’).by(‘#id’).values(‘#id’)
>   == strategizes to ==>
> db().sql(‘SELECT #id FROM #V’).sql(‘SELECT #id FROM #E WHERE #label=knows AND 
> #outV=$id’).by(‘#id’)
>   == strategizes to ==>
> db().sql(‘SELECT #E.#id FROM #V, #E WHERE #E.#label=knows AND 
> #E.#outV=#V.#id’)
> 
> What just happened? The RDBMS TP4 compiler strategy knows that rows can not 
> have direct reference to one another. Thus, in order to get the outgoing 
> edges of a vertex, a join is required. What is interesting is that the 
> n-tuple model’s #outE provides the all the necessary information to perform 
> the join. Since we defined the #outV foreign key as a reference to the 
> primary key #id, we know that the *{#outV=...} pointer is simply referencing 
> the 

Re: N-Tuples, Pointers, Data Model Interfaces, and Bytecode Instructions

2019-05-07 Thread Marko Rodriguez
When you’re hot, you’re hot. Going to keep pushin’.

What is the difference between a relational database's and a graph database’s 
encoding of a property graph? The short answer: NOTHING. The n-tuple model for 
a subset of the TinkerPop toy graph is:

{#type:vertex, #id:1, #label:person, name:marko, age:29, 
#outE:*{#outV=*{#id=1}}}
{#type:vertex, #id:4, #label:person, name:josh, age:32}
{#type:vertex, #id:2, #label:person, name:vadas, age:27}
{#type:edge, #id:7, #label:knows, #outV:*{#id=1}, #inV:*{#id=2}}
{#type:edge, #id:8, #label:knows, #outV:*{#id=1}, #inV:*{#id=4}}

MySQL and Neo4j would talk with TP4 via the same above tuples. What makes MySQL 
different from Neo4j is how the pointers are resolved.

First, let’s talk about the MySQL schema that ultimately holds the graphdb 
instance data.

CREATE TABLE #V {
  #id int NOT NULL PRIMARY KEY,
  #label varchar,
  name varchar,
  age int,
  #outE varchar
}

CREATE TABLE #E {
  #id int NOT NULL PRIMARY KEY,
  #label varchar,
  #outV int,
  #inV int,
  FOREIGN KEY (#outV) REFERENCES V(#id),
  FOREIGN KEY (#inV) REFERENCES V(#id)
}

We know how graph databases will execute bytecode, so let’s focus on how MySQL 
will do it.

// g.V()
db().values(‘#V’)
  == strategizes to ==>
db().sql(‘SELECT * FROM #V’)

// g.V().outE(‘knows’).id()
db().values(‘#V’).values(‘#outE’).has(‘#label’,’knows’).values(‘#id’)
  == strategizes to ==>
db().sql(‘SELECT * FROM #V’).sql(‘SELECT * FROM #E WHERE 
#outV=$id’).by(‘#id’).has(‘#label’,’knows’).values(‘#id’)
  == strategizes to ==>
db().sql(‘SELECT * FROM #V’).sql(‘SELECT * FROM #E WHERE #label=knows AND 
#outV=$id’).by(‘#id’).values(‘#id’)
  == strategizes to ==>
db().sql(‘SELECT #id FROM #V’).sql(‘SELECT #id FROM #E WHERE #label=knows AND 
#outV=$id’).by(‘#id’)
  == strategizes to ==>
db().sql(‘SELECT #E.#id FROM #V, #E WHERE #E.#label=knows AND #E.#outV=#V.#id’)

What just happened? The RDBMS TP4 compiler strategy knows that rows cannot 
directly reference one another. Thus, in order to get the outgoing edges 
of a vertex, a join is required. What is interesting is that the n-tuple 
model’s #outE provides all the necessary information to perform the join. 
Since we defined the #outV foreign key as a reference to the primary key #id, 
we know that the *{#outV=...} pointer is simply referencing the #id of the #V 
table. Thus, constructing our SQL is easy. However, on our initial pass, we are 
doing a new SQL call to grab the edges for each vertex emitted from the 
sql(SELECT * FROM #V) instruction. Fortunately, we are able to recursively 
merge SQL calls until we can no longer merge. Note the algorithmic nature of 
this process (easy to code). Each strategy pass folds as much information into 
the preceding SQL instruction as possible so we don’t have to shuffle tuples 
back and forth from MySQL. In the end, we reach a single sql() query “fixed 
point” and tada — TP4 bytecode strategized to use MySQL’s SQL execution engine. 
You have successfully encoded (in one particular way) a property graph into a 
relational database.
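The "merge until fixed point" idea above can be sketched with a toy strategy (illustrative Python only; real TP4 strategies would operate on bytecode objects, not strings, and the instruction tuples here are made up):

```python
# Toy fixed-point strategy: fold a has(key, value) filter that follows a
# sql(query) instruction into the query's WHERE clause, and repeat until
# nothing changes.

def strategize(bytecode):
    changed = True
    while changed:                       # run to a fixed point
        changed = False
        for i in range(len(bytecode) - 1):
            op, nxt = bytecode[i], bytecode[i + 1]
            if op[0] == 'sql' and nxt[0] == 'has':
                joiner = ' AND ' if 'WHERE' in op[1] else ' WHERE '
                bytecode[i] = ('sql', op[1] + joiner + f'{nxt[1]}={nxt[2]}')
                del bytecode[i + 1]
                changed = True
                break
    return bytecode

prog = [('sql', 'SELECT * FROM #E'), ('has', '#label', 'knows'),
        ('has', '#outV', 1)]
assert strategize(prog) == \
    [('sql', 'SELECT * FROM #E WHERE #label=knows AND #outV=1')]
```

Each pass pushes one more filter into the preceding sql() instruction, so the loop terminates with a single query, mirroring the strategize-to chain shown above.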

Drop mic,
Marko.

http://rredux.com <http://rredux.com/>




> On May 7, 2019, at 10:20 AM, Marko Rodriguez  wrote:
> 
> I’m in the pocket so I’m just going to riff…
> 
> The concept of a pointer as I have defined it is a "0-argument function.” 
> That is, we currently have Pointer as:
> 
> Pointer implements Supplier>
> 
> However, pointers can be n-arg functions!
> 
> Pointer implements Function, Iterator>
> 
> If so, providers can support dynamic tuple fields.
> 
> {#id:1, #label:person, name:marko, age:29, #outE_by_label:*{#outV=*{#id:1}, 
> #label=@[0]}}
> 
> I’m using @[0] to denote the first argument passed into the pointer.
> 
> What this tuple is saying is that the provider is able to do a direct lookup 
> of all incident edges based on edge label. Thus, as TP4, we can optimize:
> 
> values(‘#outE’).has(‘#label’,‘knows’)
>   == to ==>
> values(‘#outE_by_label’).by(‘knows’)
> 
> Lets say a user is doing outE(‘knows’,’likes’). With #outE_by_label we can 
> only reference one edge sequence as we only use one argument. Thus, the 
> compiler would have to write out the following bytecode:
> 
> union(values(‘#outE_by_label’).by(‘knows’), 
> values(‘#outE_by_label’).by(‘likes’))
> 
> However, lets say another provider has this field on their vertex-tuples:
> 
> {#id:1, #label:person, name:marko, #outE_by_labels:*{#outV=*{#id:1}, #label 
> within @}}
> 
> This tuple is saying that it can get a single sequence from a pointer that 
> references an arbitrary number of incident edges by label, where “within” is 
> a predicate like =. Thus, TP4 can compile outE(‘knows’,’likes’) to the 
> following bytecode:
> 
> values(‘#outE_by_labels’).by(‘knows’).by(‘likes’)
> 
> *** I’m starting to see #fields as bytecode! ….. foggy vision right now.

Re: N-Tuples, Pointers, Data Model Interfaces, and Bytecode Instructions

2019-05-07 Thread Marko Rodriguez
I’m in the pocket so I’m just going to riff…

The concept of a pointer as I have defined it is a “0-argument function.” That 
is, we currently have Pointer as:

Pointer implements Supplier<Iterator<Tuple>>

However, pointers can be n-arg functions!

Pointer implements Function<List<Object>, Iterator<Tuple>>

If so, providers can support dynamic tuple fields.

{#id:1, #label:person, name:marko, age:29, #outE_by_label:*{#outV=*{#id:1}, 
#label=@[0]}}

I’m using @[0] to denote the first argument passed into the pointer.

What this tuple is saying is that the provider is able to do a direct lookup of 
all incident edges based on edge label. Thus, as TP4, we can optimize:

values(‘#outE’).has(‘#label’,‘knows’)
== to ==>
values(‘#outE_by_label’).by(‘knows’)

Let’s say a user is doing outE(‘knows’,’likes’). With #outE_by_label we can only 
reference one edge sequence, as we only use one argument. Thus, the compiler 
would have to write out the following bytecode:

union(values(‘#outE_by_label’).by(‘knows’), 
values(‘#outE_by_label’).by(‘likes’))

However, let’s say another provider has this field on their vertex-tuples:

{#id:1, #label:person, name:marko, #outE_by_labels:*{#outV=*{#id:1}, #label 
within @}}

This tuple is saying that it can get a single sequence from a pointer that 
references an arbitrary number of incident edges by label, where “within” is a 
predicate like =. Thus, TP4 can compile outE(‘knows’,’likes’) to the following 
bytecode:

values(‘#outE_by_labels’).by(‘knows’).by(‘likes’)

*** I’m starting to see #fields as bytecode! ….. foggy vision right now.

——

I’m starting to think that there are no graph interfaces, relational 
interfaces, document interfaces, etc. No — instead, everything is always in 
terms of tuples!

1. Database providers communicate with TP4 by giving it tuples.
- these tuples “abstractly” represent objects like vertices, 
edges, rows, documents, etc.
2. TP4 communicates with database providers by either 
(1) joining tuple sequences to create a new tuple sequence (for 
relational-style data) or 
(2) jumping to a tuple sequence via #fields (for pointer-style 
data).

That’s it. That’s all there is to the game.

// Gremlin
g.V().out(‘knows’).values(’name’)

// TP4 Bytecode
== minimum tuple-field requirement to call yourself a graphdb ==>
db().values(‘#V’).values(‘#outE’).has(‘#label’,’knows’).values(‘#inV’).values(‘name’)
== edge-label vertex-centric index optimization ==>
db().values(‘#V’).values(‘#out_label’).by(‘knows’).values(‘name’)
== denormalized adjacent vertex property data for read-heavy systems ==>
db().values('#V’).values(‘#out_label_key’).by(‘knows’).by(‘name’)

So what makes a graphdb? The tuple types and their respective fields.
- you have to have #type=vertex and #type=edge (you can have other 
types in there too!, but the graphdb instruction set only cares about vertices 
and edges).
- you have to have #id and #label
- you have to have #outE and #inE
- you have to have #outV and #inV (that reference a single #type=vertex)
If your n-tuple representation has all that, then you are a “graphdb” and 
Gremlin will be able to process you.
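The "what makes a graphdb" checklist above can be sketched as a compliance check (illustrative Python only; is_graphdb and REQUIRED are hypothetical names, and dicts stand in for tuples):

```python
# Minimum tuple-field requirement to call yourself a "graphdb":
# vertex tuples need #id/#label/#outE/#inE, edge tuples need
# #id/#label/#outV/#inV. Other #types (e.g. row) are allowed and ignored.

REQUIRED = {
    'vertex': {'#id', '#label', '#outE', '#inE'},
    'edge':   {'#id', '#label', '#outV', '#inV'},
}

def is_graphdb(tuples):
    """True if every vertex/edge tuple carries the required #keys."""
    return all(
        REQUIRED[t['#type']] <= t.keys()
        for t in tuples if t['#type'] in REQUIRED
    )

store = [
    {'#type': 'vertex', '#id': 1, '#label': 'person',
     '#outE': None, '#inE': None, 'name': 'marko'},
    {'#type': 'edge', '#id': 7, '#label': 'knows', '#outV': 1, '#inV': 2},
    {'#type': 'row', '#table': 'yearly_expenses'},  # fine: not graph data
]
assert is_graphdb(store)
assert not is_graphdb([{'#type': 'edge', '#id': 7}])  # missing #keys
```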

Now check this mind-numbing entailment. Your n-tuple representation could have 
other tuple #types. For instance, #type=row.

{#type:row, #table:yearly_expenses, person:*{#type:vertex,#id:1}, january:$10, 
february:$12, march:$54, april:…. }

I have a graphdb mixed with a relationaldb all within the same db! You could 
effectively encode this in Cassandra (same keyspace) or an RDBMS (same 
database). Watch and learn:

// what are the names of the friends of the person that spent the least 
last year?
sql("SELECT person FROM yearly_expenses WHERE 
MIN(january+february+…)”).out(‘knows’).values(‘name’)

We aren’t talking “multi-model” …. no my friend, we are talking “hybrid-model.”

It’s all just tuples and pointers.

Outz,
Marko.

http://rredux.com <http://rredux.com/>




> On May 7, 2019, at 7:26 AM, Marko Rodriguez  wrote:
> 
> Whoa.
> 
> Check out this trippy trick.
> 
> First, here is how you define a pointer to a map-tuple.
> 
>   *{k1?v1, k2?v2, …, kn?vn}
>   * says “this is a pointer to a map" { }
>   ? is some comparator like =, >, <, !=, contains(), etc.
> 
> Assume the vertex map tuple v[1]:
> 
> {#id:1, #label:person, name:marko, age:29} 
> 
> Now, we can add the following fields:
> 
> 1. #outE:*{#outV=*{#id=1}}  // references all tuples that have an outV field 
> that is a pointer to the the v[1] vertex tuple.
> 2. #outE.knows:*{#outV=*{#id=1},#label=knows} // references all outgoing 
> knows-edges.
> 3. #outE.knows.weight_gt_85:*{#outV=*{#id=1},#label=knows,weight>0.85} // 
> references all strong outgoing knows-edges
> 
> By using different types of poin

Re: N-Tuples, Pointers, Data Model Interfaces, and Bytecode Instructions

2019-05-07 Thread Marko Rodriguez
Whoa.

Check out this trippy trick.

First, here is how you define a pointer to a map-tuple.

*{k1?v1, k2?v2, …, kn?vn}
* says “this is a pointer to a map" { }
? is some comparator like =, >, <, !=, contains(), etc.

Assume the vertex map tuple v[1]:

{#id:1, #label:person, name:marko, age:29} 

Now, we can add the following fields:

1. #outE:*{#outV=*{#id=1}}  // references all tuples that have an #outV field 
that is a pointer to the v[1] vertex tuple.
2. #outE.knows:*{#outV=*{#id=1},#label=knows} // references all outgoing 
knows-edges.
3. #outE.knows.weight_gt_85:*{#outV=*{#id=1},#label=knows,weight>0.85} // 
references all strong outgoing knows-edges

By using different types of pointers, a graph database provider can make 
explicit their internal structure. Assume all three fields above are in the 
v[1] vertex tuple. This means that:

1. all of v[1]’s outgoing edges are grouped together. <— linear scan
2. all of v[1]’s outgoing knows-edges are grouped together. <— indexed by 
label
3. all of v[1]’s strong outgoing knows-edges are grouped together. <— 
indexed by label and weight

Thus, a graph database provider can describe the way in which it internally 
organizes adjacent edges — i.e. vertex-centric indices! This means then that 
TP4 can do vertex-centric index optimizations automatically for providers!

1. values(“#outE”).hasLabel(‘knows’).has(‘weight’,gt(0.85)) // grab all 
edges, then filter on label, then filter on weight.
2. values(“#outE.knows”).has(‘weight’,gt(0.85)) // grab all 
knows-edges, then filter on weight.
3. values(“#outE.knows.weight_gt_85”) // grab all strong knows-edges.

*** Realize that Gremlin outE() will just compile to bytecode values(“#outE”).
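The three access paths can be sketched side by side (illustrative Python only; the index dictionaries stand in for a provider's vertex-centric indices, and the edge data is made up):

```python
# (1) scan-and-filter over #outE, (2) label-indexed pointer,
# (3) label+weight-indexed pointer. All three agree on the answer;
# the indexed paths just skip the filtering work.

edges = [
    {'#label': 'knows', 'weight': 0.9, '#inV': 2},
    {'#label': 'knows', 'weight': 0.4, '#inV': 4},
    {'#label': 'created', 'weight': 1.0, '#inV': 3},
]
v1 = {
    '#outE': edges,                                              # path 1
    '#outE.knows': [e for e in edges
                    if e['#label'] == 'knows'],                  # path 2
    '#outE.knows.weight_gt_85': [e for e in edges
                                 if e['#label'] == 'knows'
                                 and e['weight'] > 0.85],        # path 3
}

scan = [e for e in v1['#outE']
        if e['#label'] == 'knows' and e['weight'] > 0.85]
assert scan == v1['#outE.knows.weight_gt_85'] == [edges[0]]
```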

Freakin’ crazy! … Josh was interested in using the n-tuple structure to 
describe indices. I was against it. I believe I still am. However, this is 
pretty neat. As Josh was saying though, without a rich enough n-tuple 
description of the underlying database, there should be no reason for providers 
to have to write custom strategies and instructions ?!?!?!?!? crazy!?

Marko.

http://rredux.com <http://rredux.com/>




> On May 7, 2019, at 4:44 AM, Marko Rodriguez  wrote:
> 
> Hey Josh,
> 
>> I think of your Pointer as a reference to an entity. It does not contain
>> the entity it refers to, but it contains the primary key of that entity.
> 
> Exactly! I was just thinking that last night. Tuples don’t need a separate ID 
> system. No -- pointers reference the primary key of a tuple! Better yet 
> perhaps, they can reference one-to-many. For instance:
> 
> { id:1, label:person, name:marko, age:29, outE:*(outV=id) }
> 
> Thus, a pointer is defined by a pattern match. Haven’t thought through the 
> consequences, but … :)
> 
>> Here, I have invented an Entity class to indicate that the pointer resolves
>> to a vertex (an entity without a tuple, or rather with a 0-tuple -- the
>> unit element).
> 
> Ah — the 0-tuple. Neat thought.
> 
> I look forward to your slides from the Knowledge Graph Conference. If I 
> wasn’t such a reclusive hermit, I would have loved to have joined you there.
> 
> Take care,
> Marko.
> 
> http://rredux.com <http://rredux.com/>
> 
> 
>> On Mon, May 6, 2019 at 9:38 PM Marko Rodriguez > <mailto:okramma...@gmail.com>> wrote:
>> 
>>> Hey Josh,
>>> 
>>>> I am feeling the tuples... as long as they can be typed, e.g.
>>>> 
>>>> myTuple.get(Integer) -- int-indexed tuples
>>>> myTuple.get(String) -- string-indexed tuples
>>>> In most programming languages, "tuples" are not lists, though they are
>>> typed by a list of element types. E.g. in Haskell you might have a tuple
>>> with the type
>>>>(Double, Double, Bool)
>>> 
>>> 
>>> Yes, we have Pair, Triple, Quadruple, etc. However
>>> for base Tuple of unknown length, the best I can do in Java is . :|
>>> You can see my stubs in the gist:
>>>    https://gist.github.com/okram/25d50724da89452853a3f4fa894bcbe8 (LINES
>>> #21-42)
>>> 
>>>> If this is in line with your proposal, then we agree that tuples should
>>> be the atomic unit of data in TP4.
>>> 
>>> Yep. Vertices, Edges, Rows, Documents, etc. are all just tuples. However,
>>> I suspect that we will disagree on some of my tweaks. Thus, I’d really like
>>> to get y

Re: N-Tuples, Pointers, Data Model Interfaces, and Bytecode Instructions

2019-05-07 Thread Marko Rodriguez
Hey Josh,

> I think of your Pointer as a reference to an entity. It does not contain
> the entity it refers to, but it contains the primary key of that entity.

Exactly! I was just thinking that last night. Tuples don’t need a separate ID 
system. No -- pointers reference the primary key of a tuple! Better yet 
perhaps, they can reference one-to-many. For instance:

{ id:1, label:person, name:marko, age:29, outE:*(outV=id) }

Thus, a pointer is defined by a pattern match. Haven’t thought through the 
consequences, but … :)

> Here, I have invented an Entity class to indicate that the pointer resolves
> to a vertex (an entity without a tuple, or rather with a 0-tuple -- the
> unit element).

Ah — the 0-tuple. Neat thought.

I look forward to your slides from the Knowledge Graph Conference. If I wasn’t 
such a reclusive hermit, I would have loved to have joined you there.

Take care,
Marko.

http://rredux.com <http://rredux.com/>


> On Mon, May 6, 2019 at 9:38 PM Marko Rodriguez  <mailto:okramma...@gmail.com>> wrote:
> 
>> Hey Josh,
>> 
>>> I am feeling the tuples... as long as they can be typed, e.g.
>>> 
>>> myTuple.get(Integer) -- int-indexed tuples
>>> myTuple.get(String) -- string-indexed tuples
>>> In most programming languages, "tuples" are not lists, though they are
>> typed by a list of element types. E.g. in Haskell you might have a tuple
>> with the type
>>>(Double, Double, Bool)
>> 
>> 
>> Yes, we have Pair, Triple, Quadruple, etc. However
>> for base Tuple of unknown length, the best I can do in Java is . :|
>> You can see my stubs in the gist:
>>    https://gist.github.com/okram/25d50724da89452853a3f4fa894bcbe8 (LINES
>> #21-42)
>> 
>>> If this is in line with your proposal, then we agree that tuples should
>> be the atomic unit of data in TP4.
>> 
>> Yep. Vertices, Edges, Rows, Documents, etc. are all just tuples. However,
>> I suspect that we will disagree on some of my tweaks. Thus, I’d really like
>> to get your feedback on:
>> 
>>1. pointers (tuple entries referencing tuples).
>>2. sequences (multi-value tuple entries).
>>3. # hidden map keys :|
>>- sorta ghetto.
>> 
>> Also, I’m still not happy with db().has().has().as(‘x’).db().where()… its
>> an intense syntax and its hard to strategize.
>> 
>> I really want to nail down this “universal model” (tuple structure and
>> tuple-oriented instructions) as then I can get back on the codebase and
>> start to flush this stuff out with confidence.
>> 
>> See ya,
>> Marko.
>> 
>> http://rredux.com
>> 
>> 
>>> 
>>> Josh
>>> 
>>> 
>>> On Mon, May 6, 2019 at 5:34 PM Marko Rodriguez <okramma...@gmail.com> wrote:
>>> Hi,
>>> 
>>> I spent this afternoon playing with n-tuples, pointers, data model
>> interfaces, and bytecode instructions.
>>> 
>>> https://gist.github.com/okram/25d50724da89452853a3f4fa894bcbe8
>>> 
>>> *** Kuppitz: They are tuples :). A Map extends Tuple>.
>> Tada!
>>> 
>>> What I like about this is that it combines the best of both worlds
>> (Josh+Marko).
>>>* just flat tuples of arbitrary length.
>>>* pattern matching for arbitrary joins. (k1=k2 AND k3=k4
>> …)
>>>* pointers chasing for direct links. (edges, foreign
>> keys, document _id references, URI resolutions, …)
>>>* sequences are a special type of tuple used for multi-valued
>> entries.
>>>* has()/values()/etc. work on all tuple types! (maps, lists,
>> tuples, vertices, edges, rows, statements, documents, etc.)
>>> 
>>> Thoughts?,
>>> Marko.
>>> 
>>> http://rredux.com



Re: N-Tuples, Pointers, Data Model Interfaces, and Bytecode Instructions

2019-05-06 Thread Marko Rodriguez
Hey Josh,

> I am feeling the tuples... as long as they can be typed, e.g.
> 
>  myTuple.get(Integer) -- int-indexed tuples
>  myTuple.get(String) -- string-indexed tuples
> In most programming languages, "tuples" are not lists, though they are typed 
> by a list of element types. E.g. in Haskell you might have a tuple with the 
> type
> (Double, Double, Bool)


Yes, we have Pair, Triple, Quadruple, etc. However for 
base Tuple of unknown length, the best I can do in Java is . :| You can 
see my stubs in the gist:
https://gist.github.com/okram/25d50724da89452853a3f4fa894bcbe8 (LINES #21-42)
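
The int- vs. string-indexed access being discussed can be sketched in plain Java. This is an illustration only; the `Tuple` name and method shapes here are assumptions, not the gist's actual stubs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: a tuple supporting both string-indexed and
// int-indexed access, per the discussion above. Field order is the
// insertion order, so get(0) returns the first field's value.
final class Tuple {
    private final Map<String, Object> fields = new LinkedHashMap<>();

    Tuple set(String key, Object value) {
        fields.put(key, value);
        return this;
    }

    Object get(String key) {          // string-indexed tuples
        return fields.get(key);
    }

    Object get(int index) {           // int-indexed tuples
        return fields.values().toArray()[index];
    }
}
```

A Pair or Triple could then just be a Tuple with a fixed, known field count.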

> If this is in line with your proposal, then we agree that tuples should be 
> the atomic unit of data in TP4.

Yep. Vertices, Edges, Rows, Documents, etc. are all just tuples. However, I 
suspect that we will disagree on some of my tweaks. Thus, I’d really like to 
get your feedback on:

1. pointers (tuple entries referencing tuples).
2. sequences (multi-value tuple entries).
3. # hidden map keys :|
- sorta ghetto.

Also, I’m still not happy with db().has().has().as(‘x’).db().where()… it’s an 
intense syntax and it’s hard to strategize.

I really want to nail down this “universal model” (tuple structure and 
tuple-oriented instructions) as then I can get back on the codebase and start 
to flesh this stuff out with confidence.

See ya,
Marko.

http://rredux.com


> 
> Josh
> 
> 
> On Mon, May 6, 2019 at 5:34 PM Marko Rodriguez <okramma...@gmail.com> wrote:
> Hi,
> 
> I spent this afternoon playing with n-tuples, pointers, data model 
> interfaces, and bytecode instructions.
> 
> https://gist.github.com/okram/25d50724da89452853a3f4fa894bcbe8
> 
> *** Kuppitz: They are tuples :). A Map extends Tuple>. Tada!
> 
> What I like about this is that it combines the best of both worlds 
> (Josh+Marko).
> * just flat tuples of arbitrary length.
> * pattern matching for arbitrary joins. (k1=k2 AND k3=k4 …)
> * pointers chasing for direct links. (edges, foreign keys, 
> document _id references, URI resolutions, …)
> * sequences are a special type of tuple used for multi-valued entries.
> * has()/values()/etc. work on all tuple types! (maps, lists, tuples, 
> vertices, edges, rows, statements, documents, etc.)
> 
> Thoughts?,
> Marko.
> 
> http://rredux.com
> 
> 



N-Tuples, Pointers, Data Model Interfaces, and Bytecode Instructions

2019-05-06 Thread Marko Rodriguez
Hi,

I spent this afternoon playing with n-tuples, pointers, data model interfaces, 
and bytecode instructions.

https://gist.github.com/okram/25d50724da89452853a3f4fa894bcbe8 


*** Kuppitz: They are tuples :). A Map extends Tuple>. Tada!

What I like about this is that it combines the best of both worlds (Josh+Marko).
* just flat tuples of arbitrary length.
* pattern matching for arbitrary joins. (k1=k2 AND k3=k4 …)
* pointer chasing for direct links. (edges, foreign keys, 
document _id references, URI resolutions, …)
* sequences are a special type of tuple used for multi-valued entries.
* has()/values()/etc. work on all tuple types! (maps, lists, tuples, 
vertices, edges, rows, statements, documents, etc.)

Thoughts?,
Marko.

http://rredux.com 




A collection of examples that map a query language query to provider bytecode.

2019-05-06 Thread Marko Rodriguez
Hello,

I’m experimenting with moving between X query language and Y bytecode via 
Universal Bytecode.

The general (and very difficult) goal of TP4 is to be able to execute queries 
(from any known query language) against any database (regardless of underlying 
data model) using any processing engine.
- e.g. Gremlin over MySQL (as Relational) using RxJava.
- e.g. SQL over Cassandra (as WideColumn) using Flink.
- e.g. SPARQL over MongoDB (as Document) using Akka.
- e.g. Cypher over Neptune (as Graph) using Pipes.
- e.g. ...

——

NOTES:
1. Realize that databases are both processors and structures.
- MySQL has its own SQL engine.
- Cassandra has its own CQL engine.
- MongoDB has its own DocumentQuery engine.
- …
2. What can be processed by the database’s engine should be evaluated 
by the database (typically).
3. What can not be processed by the database’s engine should be 
evaluated by the processor.
4. DATABASE_ENGINE->PROCESSOR->DATABASE_ENGINE->PROCESSOR->etc.
- data may move from database to processor back to database 
back to processor, etc. to yield the final query result.

The universal bytecode chunks in the examples to come assume the following 
n-tuple structure accessible via db():

[0][id:1, label:person, name:marko, outE:*1]
[1][0:*2, 1:*3]
[2][id:7, label:knows, outV:*0, inV:*4]
[3][id:8, label:knows, outV:*0, inV:*5]
[4][id:2, label:person, name:vadas]
[5][id:4, label:person, name:josh]

- All tuples have an id that is outside the id-space of the underlying data 
model.
- Field values can have pointers to other tuples via *-prefix notation.
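
As a hedged sketch of the structure above (class names and store layout are assumptions, not TP4 code), tuples can be held in an id-keyed map, with any `*n` value treated as a pointer that dereferences to tuple n:

```java
import java.util.Map;

// Sketch of the *-prefix pointer notation described above. The store
// keys tuples by an internal id outside the data model's id space;
// value() dereferences any "*n" pointer into the referenced tuple.
final class TupleStore {
    private final Map<Integer, Map<String, Object>> tuples;

    TupleStore(Map<Integer, Map<String, Object>> tuples) {
        this.tuples = tuples;
    }

    Object value(int tupleId, String key) {
        Object v = tuples.get(tupleId).get(key);
        if (v instanceof String s && s.startsWith("*"))   // pointer: dereference
            return tuples.get(Integer.parseInt(s.substring(1)));
        return v;
    }
}
```

With the six tuples above loaded, value(2, "inV") returns the vadas tuple rather than the literal string "*4".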

Every compilation goes:

Query language -> Universal Bytecode -> Data Model Bytecode -> Provider 
Bytecode

——

How do you compile Gremlin to universal bytecode for evaluation over MySQL?

—— GREMLIN QUERY ——

g.V().has(“name”,”marko”).out(“knows”).values(“name”)

—— UNIVERSAL BYTECODE ——

   ==> (using tuple pointers)

db().has(‘label’, within(‘person’,’project’)).
 has(‘name’,’marko’).
  values(‘outE’).has(‘label’,’knows’).
  values(‘inV’).values(‘name’)

—— RELATIONAL BYTECODE ——

   ==>

R(“people”,”projects”).has(“name”,”marko”).
 join(R(“knows”)).by(“id”,eq(“outV”)).
 join(R(“people”)).by(“inV”,eq(“id”)).
  values(“name”)

—— JDBC BYTECODE ——

   ==>

union(
  sql(‘SELECT name FROM people as p1, knows, people as p2 WHERE 
p1.id=knows.outV AND knows.inV=p2.id’),
  sql(‘SELECT name FROM projects as p1, knows, projects as p2 WHERE 
p1.id=knows.outV AND knows.inV=p2.id’))

The assumed SQL tables are:

CREATE TABLE people (
  id int,
  label varchar(255), -- 'person'
  name varchar(255),
  PRIMARY KEY (id)
);

CREATE TABLE knows (
  id int,
  label varchar(255), -- 'knows'
  name varchar(255),
  outV int,
  inV int,
  PRIMARY KEY (id),
  FOREIGN KEY (outV) REFERENCES people(id),
  FOREIGN KEY (inV) REFERENCES people(id)
);

There need to be two mapping specifications (Graph->Universal & 
Universal->Relational):
- V() -> vertex tables are people+projects.
- label() -> plural is table name
- outE -> person.outE.knows is resolved via knows.outV (foreign key to 
person table by id)
- inV -> knows.values(‘inV’) is resolved to person (foreign key to 
person table by id)
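
The mapping rules above can be made concrete with a toy join over in-memory rows (illustrative names and data shapes only, not the actual mapping-spec format): person.outE.knows resolves through the knows table's outV foreign key, and inV resolves back into the people table by id.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Toy sketch of the foreign-key resolution rules listed above.
// Rows are plain maps; tables are lists of rows.
final class GraphOverTables {
    static List<String> knownNames(List<Map<String, Object>> people,
                                   List<Map<String, Object>> knows, int personId) {
        return knows.stream()
                .filter(e -> e.get("outV").equals(personId))  // outE via outV FK
                .map(e -> e.get("inV"))                       // inV FK back to people
                .map(inV -> people.stream()
                        .filter(p -> p.get("id").equals(inV))
                        .findFirst().orElseThrow())
                .map(p -> (String) p.get("name"))
                .collect(Collectors.toList());
    }
}
```

This is exactly the join that the SELECT statements above perform inside the database.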

Next, we are assuming that a property graph is encoded in MySQL, as we have 
outE, inV, etc. column names. If we want to interpret arbitrary relational data 
as a graph, then it is important to denote which tables are “join tables” and 
what the column names are for joining. What about when a “join table” references 
more than 2 other rows (i.e. an n-ary relation — a hypergraph)? Is there a 
general solution to looking at any relational schema in terms of a binary 
property graph?

——

How do you compile SQL to universal bytecode for evaluation over Cassandra?

—— SQL QUERY ——

SELECT p2.name FROM people as p1, knows, people as p2 
  WHERE p1.name=marko AND p1.id=knows.outV AND knows.inV=p2.id

—— UNIVERSAL BYTECODE ——

   ==> (using tuple pointers)

db().has(‘label’,’person’).has(‘name’,’marko’).
  values(‘outE’).has(‘label’,’knows’).
  values(‘inV’).values(‘name’)

—— WIDE-COLUMN BYTECODE ——

   ==>

R(‘people’).has(‘name’,’marko’).
  values(‘outE’).has(‘label’,’knows’).values(‘inV’).as(‘$inV’)
R(‘people’).has(‘id’,eq(path(‘$inV’))).values(‘name’)

—— CASSANDRA BYTECODE ——

   ==> 

cql('SELECT outE:knows FROM people WHERE name=marko').
cql('SELECT name FROM people WHERE id=$inV').by('inV')

There needs to be a mapping specification from SQL->Universal
- knows.outV is a foreign key to a person row.
- person.outE.knows is referenced by knows.outV.

The people table is defined below, where each edge is a column.

CREATE TABLE people (
id int,
name string,
age int,
outE list>,
PRIMARY KEY (id,name)
);

The last bytecode is chained CQL where each result from 

Re: The Fundamental Structure Instructions Already Exist! (w/ RDBMS Example)

2019-05-06 Thread Marko Rodriguez
Hey Josh,


> One more thing is needed: disjoint unions. I described these in my email on
> algebraic property graphs. They are the "plus" operator to complement the
> "times" operator in our type algebra. A disjoint union type is just like a
> tuple type, but instead of having values for field a AND field b AND field
> c, an instance of a union type has a value for field a XOR field b XOR
> field c. Let me know if you are not completely sold on union types, and I
> will provide additional motivation.

Huh. That is an interesting concept. Can you please provide examples?
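
For readers following along, here is one possible reading of the disjoint-union idea, sketched with a sealed Java interface. The names are invented for illustration and this is not Josh's definition:

```java
// Illustrative sketch of a disjoint union (sum) type: a NameOwner is
// EITHER a Person OR a Project, never both at once. Compare with a
// tuple (product) type, which carries all of its fields together.
sealed interface NameOwner permits Person, Project {}

record Person(String name, int age) implements NameOwner {}

record Project(String name, String lang) implements NameOwner {}

final class Unions {
    // the out-type of "name" can then be the union of all element types
    static String name(NameOwner owner) {
        return (owner instanceof Person p) ? p.name() : ((Project) owner).name();
    }
}
```

Under this reading, "the out-type of name is (Person OR Project)" simply means name is defined on the union type.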

>> The instructions:
>>1. relations can be “queried” for matching tuples.
>> 
> 
> Yes.

One thing I want to stress: the “universal bytecode” is just standard 
[op,arg*]* bytecode save that data access is via the “universal model’s” db() 
instruction. Thus, AND/OR/pattern matching/etc. is all available. Likewise, 
union(), repeat(), coalesce(), choose(), etc. are all available.

db().and(as('a').values('knows').as('b'),
 or(as('a').has('name','marko'),
as('a').values('created').count().is(gt(1))),
 as('b').values('created').as('c')).
 path('c')

As you can see, and()/or() pattern matching is possible and can be nested.
  *** SIDENOTE: In TP3, such nested and()/or() pattern matching is expressed 
using match() where the root grouping is assumed to be and()’d together.
  *** SIDENOTE: In TP4, I want to get rid of an explicit match() bytecode 
instruction and replace it with and()/or() instructions with prefix/suffix 
as()s.
  *** SIDENOTE: In TP4, in general, any nested bytecode that starts with as(x) 
is path(x) and any bytecode that ends with as(y) is where(eq(path(y))).
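
The last sidenote can be illustrated with a toy rewrite over string-encoded instructions. This is purely illustrative and not the TP4 compiler:

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of the desugaring rule above: in a nested bytecode
// block, a leading as(x) is read as path(x), and a trailing as(y) is
// read as where(eq(path(y))).
final class Desugar {
    static List<String> rewrite(List<String> nested) {
        List<String> out = new ArrayList<>(nested);
        String first = out.get(0);
        if (first.startsWith("as("))
            out.set(0, "path(" + label(first) + ")");
        String last = out.get(out.size() - 1);
        if (last.startsWith("as("))
            out.set(out.size() - 1, "where(eq(path(" + label(last) + ")))");
        return out;
    }

    static String label(String as) {   // as(x) -> x
        return as.substring(3, as.length() - 1);
    }
}
```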

> 
>>2. tuple values can be projected out to yield primitives.
>> 
> 
> Or other tuples, or tagged values. E.g. any edge projects to two vertices,
> which are (trivial) tuples as opposed to primitive values.

Good point. I started to do some modeling and I’ve been getting some good 
mileage from a new “pointer” primitive. Assume every N-Tuple has a unique ID 
(outside the data model’s id space). If so, the TinkerPop toy graph as N-Tuples 
is:

[0][id:1,name:marko,age:29,created:*1,knows:*2]
[1][0:*3]
[2][0:*4,1:*5]
[3][id:3,name:lop,lang:java]
[4][id:2,name:vadas,age:27]
[5][id:4,name:josh,age:32,created:*…]

I know you are thinking that vertices don’t have “outE” projections, so this 
isn’t in line with your thinking. However, check this out. If we assume that 
pointers are automatically dereferenced on reference, then:

db().has(‘name’,’marko’).values(‘knows’).values(‘name’) => vadas, josh

Pointers are useful when a tuple has another tuple as a value. Instead of 
nesting, you “blank node.” DocumentDBs (with nested list/maps) would use this 
extensively.
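
A minimal sketch of the dereference-and-flatten behavior above (assumed names and toy data layout): 'knows' points at an integer-keyed sequence tuple whose entries are themselves pointers, so values("knows") yields the referenced person tuples rather than the sequence itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative only: a sequence tuple is detected by its all-integer
// keys and flattened, with each entry dereferenced in index order.
final class Sequences {
    static Object deref(Map<Integer, Map<String, Object>> store, Object v) {
        return (v instanceof String s && s.startsWith("*"))
                ? store.get(Integer.parseInt(s.substring(1))) : v;
    }

    static List<Object> values(Map<Integer, Map<String, Object>> store,
                               int tupleId, String key) {
        Object v = deref(store, store.get(tupleId).get(key));
        List<Object> out = new ArrayList<>();
        if (v instanceof Map<?, ?> m
                && m.keySet().stream().allMatch(k -> k.toString().matches("\\d+"))) {
            // a sequence tuple: flatten its entries in index order
            m.keySet().stream().map(Object::toString).sorted()
                    .forEach(k -> out.add(deref(store, m.get(k))));
        } else {
            out.add(v);
        }
        return out;
    }
}
```

Chaining a second values("name") over the two returned tuples gives exactly the vadas, josh result above.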

> Grumble... db() is just an alias for select()... grumble…

select() and project() are existing instructions in TP3 (TP4?).

SELECT
db() will iterate all N-Tuples
has() will filter out those N-Tuples with respective key/values.
and()/or() are used for nested pattern matching.

PROJECT
values() will project out the n-tuple values.

> Here, we are kind of mixing fields with property keys. Yes,
> db().has('name', 'marko') can be used to search for elements of any type...
> if that type agrees with the out-type of the "name" relation. In my
> TinkerPop Classic example, the out type of "name" is (Person OR Project),
> so your query will get you people or projects.

Like indices, I don’t think we should introduce types. But this is up for 
further discussion...

> Which is to say that we define the out-type of "name" to be the disjoint
> union of all element types. The type becomes trivial. However, we can also
> be more selective if we want to, restricting "name" only to a small subset
> of types.

Hm… I’m listening. I’m running into problems in my modeling when trying to 
generically fit things into relational tables. Maybe typing is necessary :(.


> Good idea. TP4 can provide several "flavors" of interfaces, each of which
> is idiomatic for each major class of database provider. Meeting the
> providers halfway will make integration that much easier.

Yes. With respect to graphdb providers, they want to think in terms of 
Vertex/Edges/etc. We want to put the bytecode in their language so:

1. It is easier for them to write custom strategies.
2. inV() can operate on their Vertex object without them having to 
implement inV().
*** Basically just like TP3 is now. GraphDB providers implement 
Graph/Vertex/Edge and everything works! However, they will then want to write 
custom instructions/strategies to use their database’s optimizations, such as 
vertex-centric indices for outE(‘knows’).has(‘stars’,gt(3)).inV().


> I think we will see steps like V() and R() in Gremlin, but do not need them
> in bytecode. Again, db() is just select(), V() is just select(), etc. The
> model-specific interfaces 

Re: The Fundamental Structure Instructions Already Exist! (w/ RDBMS Example)

2019-05-02 Thread Marko Rodriguez
es and URI-based 
identifiers.
4. “Agnostic” data(bases) such as Redis, Ignite, Spark, etc. can easily 
support common data structures and their respective development communities.
- With TP4, vendors can expand their product offering into 
communities they are only tangentially aware of.
- E.g. Redis can immediately “jump into” the RDF space 
without having background knowledge of that space.
- E.g. Ignite can immediately “jump into” the property 
graph space...
- E.g. Spark can immediately “jump into” the document 
space…
5. All TP4-enabled processors automatically work over all TP4-enabled 
databases.
- JanusGraph gets dynamic query routing with Akka.
- Amazon Neptune gets multi-threaded query execution with 
RxJava.
- CosmosDB gets cluster-oriented OLAP query execution with Spark.
- …
6. Language designers that have compilers to TP4 bytecode can work with 
all supporting TP4 databases/processors.
- Neo4j no longer has to convince vendors to implement Cypher.
- Amazon doesn’t have to choose between Gremlin, SPARQL, 
Cypher, etc.
- Their customers can use their favorite language.
- Obviously, some languages are better at 
expressing certain computations than others (e.g. SQL over graphs is horrible).
- Some impedance mismatch issues can arise 
(e.g. RDF requires URIs for ids).
- A plethora of new languages may emerge as designers don’t 
have to convince vendors to support them.
- Language designers only have to develop a compiler to 
TP4 bytecode.

And there you have it — I believe Apache TinkerPop is on the verge of offering 
a powerful new data(base) theory and technology.

The Database Virtual Machine

Thanks for reading,
Marko.

http://rredux.com




> On Apr 30, 2019, at 4:47 PM, Marko Rodriguez  wrote:
> 
> Hello,
> 
>> First, the "root". While we do need context for traversals, I don't think
>> there should be a distinct kind of root for each kind of structure. Once
>> again, select(), or operations derived from select() will work just fine.
> 
> So given your example below, “root” would be db in this case. 
> db is the reference to the structure as a whole.
> Within db, substructures exist. 
> Logically, this makes sense.
> For instance, a relational database’s references don’t leak outside the RDBMS 
> into other areas of your computer’s memory.
> And there is always one entry point into every structure — the connection. 
> And what does that connection point to:
>   vertices, keyspaces, databases, document collections, etc. 
> In other words, “roots.” (even the JVM has a “root” — it’s called the heap).
> 
>> Want the "person" table? db.select("person"). Want a sequence of vertices
>> with the label "person"? db.select("person"). What we are saying in either
>> case is "give me the 'person' relation. Don't project any specific fields;
>> just give me all the data". A relational DB and a property graph DB will
>> have different ways of supplying the relation, but in either case, it can
>> hide behind the same interface (TRelation?).
> 
> In your lexicon, for both RDBMS and graph:
>   db.select(‘person’) is saying, select the people table (which is 
> composed of a sequence of “person" rows)
>   db.select(‘person’) is saying, select the person vertices (which is 
> composed of a sequence of “person" vertices)
> …right off the bat you have the syntax-problem of people vs. person. Tables 
> are typically named the plural of the rows. That
> doesn’t exist in graph databases as there is just one vertex set (i.e. one 
> “table”).
> 
> In my lexicon (TP instructions)
>   db().values(‘people’) is saying, flatten out the person rows of the 
> people table.
>   V().has(label,’person’) is saying, flatten out the vertex objects of 
> the graph’s vertices and filter out non-person vertices.
> 
> Well, that is stupid, why not have the same syntax for both structures?
> Because they are different. There are no “person” relations in the classic 
> property graph (Neo4j 1.0). There are only vertex relations with a 
> label=person entry.
> In a relational database there are “person” relations and these are bundled 
> into disjoint tables (i.e. relation sets — and schema constrained).
> 
> The point I’m making is that instead of trying to fit all these data 
> structures into a strict type system that ultimately looks like
> a bunch of disjoint relational sets, lets mimic 

Re: The Fundamental Structure Instructions Already Exist! (w/ RDBMS Example)

2019-04-30 Thread Marko Rodriguez
; Whereas select() matches on fields of a relation, has() matches on property
> values and other higher-order things. If you want properties of properties,
> don't use has(); use select()/from(). Most of the time, you will just want
> to use has().
> 
> Agreed that every *entity* should have an id(), and also a label() (though
> it should always be possible to infer label() from the context). I would
> suggest TEntity (or TElement), which has id(), label(), and value(), where
> value() provides the raw value (usually a TTuple) of the entity.
> 
> Josh
> 
> 
> 
> On Mon, Apr 29, 2019 at 10:35 AM Marko Rodriguez <okramma...@gmail.com>
> wrote:
> 
>> Hello Josh,
>> 
>>> A has("age",29), for example, operates at a different level of
>> abstraction than a
>>> has("city","Santa Fe") if "city" is a column in an "addresses" table.
>> 
>> So hasXXX() operators work on TTuples. Thus:
>> 
>> g.V().hasLabel(‘person’).has(‘age’,29)
>> g.V().hasLabel(‘address’).has(‘city’,’Santa Fe’)
>> 
>> ..both work as a person-vertex and an address-vertex are TTuples. If these
>> were tables, then:
>> 
>> jdbc.db().values(‘people’).has(‘age’,29)
>> jdbc.db().values(‘addresses’).has(‘city’,’Santa Fe’)
>> 
>> …also works as both people and addresses are TTables which extend
>> TTuple.
>> 
>> In summary, if it’s a TTuple, then hasXXX() is good to go.
>> 
>> // IGNORE UNTIL AFTER READING NEXT SECTION //
>> *** SIDENOTE: A TTable (which is a TSequence) could have Symbol-based
>> metadata. Thus TTable.value(#label) -> “people.” If so, then
>> jdbc.db().hasLabel(“people”).has(“age”,29)
>> 
>>> At least, they
>>> are different if the data model allows for multi-properties,
>>> meta-properties, and hyper-edges. A property is something that can either
>>> be there, attached to an element, or not be there. There may also be more
>>> than one such property, and it may have other properties attached to it.
>> A
>>> column of a table, on the other hand, is always there (even if its value
>> is
>>> allowed to be null), always has a single value, and cannot have further
>>> properties attached.
>> 
>> 1. Multi-properties.
>> 
>> Multi-properties work because if name references a TSequence, then it’s
>> the sequence that you analyze with has(). This is another reason why
>> TSequence is important. It’s a reference to a “stream” so there isn’t
>> another layer of tuple-nesting.
>> 
>> // assume v[1] has name={marko,mrodriguez,markor}
>> g.V(1).value(‘name’) => TSequence
>> g.V(1).values(‘name’) => marko, mrodriguez, markor
>> g.V(1).has(‘name’,’marko’) => v[1]
>> 
>> 2. Meta-properties
>> 
>> // assume v[1] has name=[value:marko,creator:josh,timestamp:12303] // i.e.
>> a tuple value
>> g.V(1).value(‘name’) => TTuple // doh!
>> g.V(1).value(‘name’).value(‘value’) => marko
>> g.V(1).value(‘name’).value(‘creator’) => josh
>> 
>> So things get screwy — however, it only gets screwy when you mix your
>> “metadata” key/values with your “data” key/values. This is why I think
>> TSymbols are important. Imagine the following meta-property tuple for v[1]:
>> 
>> [#value:marko,creator:josh,timestamp:12303]
>> 
>> If you do g.V(1).value(‘name’), we could look to the value indexed by the
>> symbol #value, thus => “marko”.
>> If you do g.V(1).values(‘name’), you would get back a TSequence with a
>> single TTuple being the meta property.
>> If you do g.V(1).values(‘name’).value(), we could get the value indexed by
>> the symbol #value.
>> If you do g.V(1).values(‘name’).value(‘creator’), it will return the
>> primitive string “josh”.
>> 
>> I believe that the following symbols should be recommended for use across
>> all data structures.
>>#id, #label, #key, #value
>> …where id(), label(), key(), value() are tuple.get(Symbol). Other symbols
>> for use with propertygraph/ include:
>>#outE, #inV, #inE, #outV, #bothE, #bothV
>> 
>>> In order to simplify user queries, you can let has() and values() do
>> double
>>> duty, but I still feel that there are lower-level operations at play, at
>> a
>>> logical level even if not at a bytecode level. However, expressing a
>>> traversal in terms of its lowest-level relational operations may also be
>>> useful for query optimization.
>> 
>> One thing that I’m doing, that perhaps you haven’t caught 

TP4 + Cypher

2019-04-30 Thread Marko Rodriguez
Hello,

I had the most interesting meeting this morning with Dmitry Novikov (the author 
of Cypher-for-Gremlin). The fellow is sharp and has a thorough understanding of 
Gremlin (language + mechanics). Here are two points to consider:

1. 
https://github.com/opencypher/cypher-for-gremlin/tree/master/tinkerpop/cypher-gremlin-extensions
 

- This page presents the issues that he is running into trying 
to get Cypher-for-Gremlin to be 100% openCypher compliant.
- When he went through each problem one-by-one, I was able to 
say that most of his issues are known and have respective solutions in TP4.
- However, there are some concepts he presented that I was 
completely unaware of. (e.g. generators!)

2. Neo4j is interested in working closely with TP4.
- They want Cypher to be the reference implementation language 
for TP4 property graphs.
- I think this is a great idea.
- I see SPARQL being the reference implementation language for 
TP4 RDF stores.
- I see SQL being the reference implementation language for TP4 
RDBMs.
- Finally, I see Gremlin as the multi-model assembly language 
for the TP4 VM.
- graphs, triples, tables, documents, .. Gremlin can do 
it all.

I really like Dmitry and believe collaborating with him will benefit the 
project. When tp4/ stabilizes, I offered that he start working on a 
org.apache.tinkerpop.language.cypher . With both of us working 
side-by-side, we should be able to rectify all the points he identifies in (1) 
above and at the same time, riff on each other’s knowledge to gain a deeper 
understanding of what all of this is all about!

Any thoughts?,
Marko.

http://rredux.com 



Re: [DISCUSS] The Two Protocols of TP4

2019-04-29 Thread Marko Rodriguez
Hi,

> Currently users can send either bytecode or groovy scripts to be executed
> on the server. I'm saying we replace "groovy scripts evaluation" with
> "gremlin groovy traversal execution”.

I concur. But why even send Gremlin-Groovy traversals? Just send bytecode.
- assuming we can get rid of lambdas

> In TP3, it's possible for the user to submit to the script engine something
> like "Thread.sleep(4000)" that will be executed inside a sandboxed vm.
> I'm proposing we get rid of this approach in TP4 and, as gremlin groovy
> script are still useful (for example, you can store a bunch of traversals
> to execute in a text file), we replace it with a language recognition
> engine that will parse what is sent and evaluate it, using a restricted
> grammar set. The variant for gremlin strings would still be groovy/java but
> the user won't be able to submit arbitrary groovy instructions.

Understood. Again, I would make this super simple by just sending bytecode.

One thing I’m pushing for is a “reference implementation server.” No more 
monolithic GremlinServer. The reference server has the following features:

- Sits on a socket waiting for bytecode.
- Executes bytecode and returns traversers.
- For distributed processors, can send traversers back to client from 
any machine in the cluster.

Providers can extend this reference server as they see fit. Perhaps 
someone wants to execute Groovy scripts!

- ScriptEngineStrategy
- ScriptEngineFlatMap
- [ex:script,groovy,Thread.sleep(1000)]

In other words, our reference implementation server is bare bones, rock solid, 
speedy, and safe. How the pieces are reassembled by the provider is up to them.

Thoughts?,
Marko.

http://rredux.com 



Re: The Fundamental Structure Instructions Already Exist! (w/ RDBMS Example)

2019-04-29 Thread Marko Rodriguez
Hey,

Check this out:


Machine machine = LocalMachine.open();
TraversalSource jdbc =
    Gremlin.traversal(machine).
        withProcessor(PipesProcessor.class).
        withStructure(JDBCStructure.class,
            Map.of(JDBCStructure.JDBC_CONNECTION, "jdbc:h2:/tmp/test"));

System.out.println(jdbc.db().values("people").as("x").
    db().values("addresses").as("y").has("name",
        __.path("x").by("name")).
    path("x", "y").toList());
System.out.println("\n\n");
System.out.println(jdbc.db().values("people").as("x").
    db().values("addresses").as("y").has("name",
        __.path("x").by("name")).
    path("x", "y").explain().toList());


[[{NAME=marko, AGE=29}, {CITY=santa fe, NAME=marko}], [{NAME=josh, AGE=32}, 
{CITY=san jose, NAME=josh}]]


[Original   [db, values(people)@x, db, 
values(addresses)@y, hasKeyValue(name,[path(x,[value(name)])]), path(x,y,|)]
JDBCStrategy[db(), values(people)@x, db(), values(addresses)@y, 
hasKeyValue(name,[path(x,[value(name)])]), path(x,y,|)]
JDBCQueryStrategy   [jdbc:sql(conn9: url=jdbc:h2:/tmp/test 
user=,x,y,SELECT x.*, y.* FROM people AS x, addresses AS y WHERE x.name=y.name)]
PipesStrategy   [jdbc:sql(conn9: url=jdbc:h2:/tmp/test 
user=,x,y,SELECT x.*, y.* FROM people AS x, addresses AS y WHERE x.name=y.name)]
CoefficientStrategy [jdbc:sql(conn9: url=jdbc:h2:/tmp/test 
user=,x,y,SELECT x.*, y.* FROM people AS x, addresses AS y WHERE x.name=y.name)]
CoefficientVerificationStrategy [jdbc:sql(conn9: url=jdbc:h2:/tmp/test 
user=,x,y,SELECT x.*, y.* FROM people AS x, addresses AS y WHERE x.name=y.name)]
---
Compilation [FlatMapInitial]
Execution Plan [PipesProcessor] [InitialStep[FlatMapInitial]]]





I basically look for a db.values.db.values.has-pattern in the bytecode and if I 
find it, I try and roll it into a single provider-specific instruction that 
does a SELECT query.
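
The pattern-rolling idea can be sketched at toy scale (this is not the linked JDBCQueryStrategy, just the shape of it): scan string-encoded bytecode for a db/values/has prefix and fuse it into a single provider-specific sql() instruction.

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of a fusion strategy: if the bytecode starts with
// db, values(table), has(key,value), replace those three instructions
// with one sql() instruction and keep the rest untouched.
final class FuseStrategy {
    static List<String> apply(List<String> bytecode) {
        if (bytecode.size() >= 3
                && bytecode.get(0).equals("db")
                && bytecode.get(1).startsWith("values(")
                && bytecode.get(2).startsWith("has(")) {
            String table = arg(bytecode.get(1));
            String[] kv = arg(bytecode.get(2)).split(",");
            List<String> fused = new ArrayList<>();
            fused.add("sql(SELECT * FROM " + table + " WHERE " + kv[0] + "='" + kv[1] + "')");
            fused.addAll(bytecode.subList(3, bytecode.size()));
            return fused;
        }
        return bytecode;
    }

    static String arg(String instruction) {   // values(people) -> people
        return instruction.substring(instruction.indexOf('(') + 1, instruction.length() - 1);
    }
}
```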

Here is JDBCQueryStrategy (it’s ghetto and error-prone, but I just wanted to get 
the basic concept working):

https://github.com/apache/tinkerpop/blob/7142dc16d8fc81ad8bd4090096b42e5b9b1744f4/java/machine/structure/jdbc/src/main/java/org/apache/tinkerpop/machine/structure/jdbc/strategy/JDBCQueryStrategy.java
Here is SqlFlatMapStep (hyper-ghetto… but whateva’):

https://github.com/apache/tinkerpop/blob/7142dc16d8fc81ad8bd4090096b42e5b9b1744f4/java/machine/structure/jdbc/src/main/java/org/apache/tinkerpop/machine/structure/jdbc/function/flatmap/SqlFlatMap.java

Na na!,
Marko.

http://rredux.com




> On Apr 29, 2019, at 11:50 AM, Marko Rodriguez  wrote:
> 
> Hello Kuppitz,
> 
>> I don't think it's a good idea to keep this mindset for TP4; NULLs are too
>> important in RDBMS. I don't know, maybe you can convince SQL people that
>> dropping a value is the same as setting its value to NULL. It would work
>> for you and me and everybody else who's familiar with Gremlin, but SQL
>> people really love their NULLs….
> 
> Hmm……. I don’t like nulls. Perhaps with time a clever solution will emerge. 
> 
> 
>> I'd prefer to just have special accessors for these. E.g. g.V().meta("id").
>> At least valueMaps would then only have String-keys.
>> I see the issue with that (naming collisions), but it's still better than
>> the enums in my opinion (which became a pain when started to implement
>> GLVs).
> 
> So, TSymbols are not Java enums. They are simply a “primitive”-type that will 
> have a serialization like:
> 
>   symbol[id]
> 
> Meaning, that people can make up Symbols all day long without having to 
> update serializers. How I see them working is that they are Strings prefixed 
> with #.
> 
> g.V().outE() <=>   g.V().values(“#outE”)
> g.V().id()   <=>   g.V().value(“#id”)
> g.V().hasLabel(“person") <=>   g.V().has(“#label”,”person”)
> 
> Now that I type this out, perhaps we don’t even have a TSymbol-class. 
> Instead, any 

Re: The Fundamental Structure Instructions Already Exist! (w/ RDBMS Example)

2019-04-29 Thread Marko Rodriguez
Hello Kuppitz,

> I don't think it's a good idea to keep this mindset for TP4; NULLs are too
> important in RDBMS. I don't know, maybe you can convince SQL people that
> dropping a value is the same as setting its value to NULL. It would work
> for you and me and everybody else who's familiar with Gremlin, but SQL
> people really love their NULLs….

Hmm……. I don’t like nulls. Perhaps with time a clever solution will emerge. 

> I'd prefer to just have special accessors for these. E.g. g.V().meta("id").
> At least valueMaps would then only have String-keys.
> I see the issue with that (naming collisions), but it's still better than
> the enums in my opinion (which became a pain when started to implement
> GLVs).

So, TSymbols are not Java enums. They are simply a “primitive”-type that will 
have a serialization like:

symbol[id]

Meaning, that people can make up Symbols all day long without having to update 
serializers. How I see them working is that they are Strings prefixed with #.

g.V().outE() <=>   g.V().values(“#outE”)
g.V().id()   <=>   g.V().value(“#id”)
g.V().hasLabel(“person") <=>   g.V().has(“#label”,”person”)

Now that I type this out, perhaps we don’t even have a TSymbol-class. Instead, 
any String that starts with # is considered a symbol. Now watch this:

g.V().label()  <=>   g.V().value(“#label”)
g.V().labels() <=>   g.V().values(“#label”)

In this way, we can support Neo4j multi-labels as a Neo4jVertex’s #label-Key 
references a TSequence.

g.V(1).label() => TSequence
g.V(1).labels() => String, String, String, …
g.V(1).label().add(“programmer”)
g.V(1).label().drop(“person”)

So we could do “meta()”, but then you need respective “hasXXX”-meta() methods. 
I think #symbol is easiest…?
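That String-prefix idea can be sketched with a plain Java map. Everything below (the class name, the helper) is illustrative, not proposed TP4 API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the "#"-prefix convention: any String key beginning
// with "#" is treated as a symbol (metadata such as id/label), everything
// else as ordinary property data.
public class SymbolSketch {

    static boolean isSymbol(Object key) {
        return key instanceof String && ((String) key).startsWith("#");
    }

    public static void main(String[] args) {
        Map<String, Object> vertex = new LinkedHashMap<>();
        vertex.put("#id", 1);           // g.V(1).id()    <=> value("#id")
        vertex.put("#label", "person"); // g.V(1).label() <=> value("#label")
        vertex.put("name", "marko");    // ordinary property

        System.out.println(vertex.get("#label")); // person
        System.out.println(isSymbol("#label"));   // true
        System.out.println(isSymbol("name"));     // false
    }
}
```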

> Also, what I'm wondering about now: Have you thought about Stored
> Procedures and Views in RDBMS? Views can be treated as tables, easy, but
> what about stored procedures? SPs can be found in many more DBMS, would be
> bad to not support them (or hack something ugly together later in the
> development process).

I’m not super versed in RDBMS technology. Can you please explain to me how to 
create a StoredProcedure and the range of outputs a StoredProcedure produces? 
From there, I can try and “Bytecode-ize” it.

Thanks Kuppitz,
Marko.

http://rredux.com <http://rredux.com/>




> On Mon, Apr 29, 2019 at 7:34 AM Marko Rodriguez  <mailto:okramma...@gmail.com>>
> wrote:
> 
>> Hi,
>> 
>> *** This email is primarily for Josh (and Kuppitz). However, if others are
>> interested… ***
>> 
>> So I did a lot of thinking this weekend about structure/ and this morning,
>> I prototyped both graph/ and rdbms/.
>> 
>> This is the way I’m currently thinking of things:
>> 
>>1. There are 4 base types in structure/.
>>- Primitive: string, long, float, int, … (will constrain
>> these at some point).
>>- TTuple: key/value map.
>>- TSequence: an iterable of v objects.
>>- TSymbol: like Ruby, I think we need “enum-like” symbols
>> (e.g., #id, #label).
>> 
>>2. Every structure has a “root.”
>>- for graph its TGraph implements TSequence
>>- for rdbms its a TDatabase implements
>> TTuple
>> 
>>3. Roots implement Structure and thus, are what is generated by
>> StructureFactory.mint().
>>- defined using withStructure().
>>- For graph, its accessible via V().
>>- For rdbms, its accessible via db().
>> 
>>4. There is a list of core instructions for dealing with these
>> base objects.
>>- value(K key): gets the TTuple value for the provided key.
>>- values(K key): gets an iterator of the value for the
>> provided key.
>>- entries(): gets an iterator of T2Tuple objects for the
>> incoming TTuple.
>>- hasXXX(A,B): various has()-based filters for looking
>> into a TTuple and a TSequence
>>- db()/V()/etc.: jump to the “root” of the withStructure()
>> structure.
>>- drop()/add(): behave as one would expect and thus.
>> 
>> 
>> 
>> For RDBMS, we have three interfaces in rdbms/.
>> (machine/machine-core/structure/rdbms)
>> 
>>1. TDatabase implements TTuple // the root
>> structure that indexes the tables.
>>2. TTable implements TSequence> // a table is a sequence
>> of rows
>>3. TRow implements TTuple> // a row has string column
>> names
>> 
>> I

Re: The Fundamental Structure Instructions Already Exist! (w/ RDBMS Example)

2019-04-29 Thread Marko Rodriguez
Hello Josh,

> A has("age",29), for example, operates at a different level of abstraction 
> than a
> has("city","Santa Fe") if "city" is a column in an "addresses" table.

So hasXXX() operators work on TTuples. Thus:

g.V().hasLabel(‘person’).has(‘age’,29)
g.V().hasLabel(‘address’).has(‘city’,’Santa Fe’)

..both work as a person-vertex and an address-vertex are TTuples. If these were 
tables, then:

jdbc.db().values(‘people’).has(‘age’,29)
jdbc.db().values(‘addresses’).has(‘city’,’Santa Fe’)

…also works as both people and addresses are TTables which extend 
TTuple.

In summary, if it's a TTuple, then hasXXX() is good to go.

// IGNORE UNTIL AFTER READING NEXT SECTION //
*** SIDENOTE: A TTable (which is a TSequence) could have Symbol-based metadata. 
Thus TTable.value(#label) -> “people.” If so, then
jdbc.db().hasLabel(“people”).has(“age”,29) works. ***

> At least, they
> are different if the data model allows for multi-properties,
> meta-properties, and hyper-edges. A property is something that can either
> be there, attached to an element, or not be there. There may also be more
> than one such property, and it may have other properties attached to it. A
> column of a table, on the other hand, is always there (even if its value is
> allowed to be null), always has a single value, and cannot have further
> properties attached.

1. Multi-properties.

Multi-properties work because if name references a TSequence, then it's the 
sequence that you analyze with has(). This is another reason why TSequence is 
important. It's a reference to a “stream,” so there isn’t another layer of 
tuple-nesting.

// assume v[1] has name={marko,mrodriguez,markor}
g.V(1).value(‘name’) => TSequence
g.V(1).values(‘name’) => marko, mrodriguez, markor
g.V(1).has(‘name’,’marko’) => v[1]

2. Meta-properties

// assume v[1] has name=[value:marko,creator:josh,timestamp:12303] // i.e. a 
tuple value
g.V(1).value(‘name’) => TTuple // doh!
g.V(1).value(‘name’).value(‘value’) => marko
g.V(1).value(‘name’).value(‘creator’) => josh

So things get screwy. However, it only gets screwy when you mix your 
“metadata” key/values with your “data” key/values. This is why I think TSymbols 
are important. Imagine the following meta-property tuple for v[1]:

[#value:marko,creator:josh,timestamp:12303]

If you do g.V(1).value(‘name’), we could look to the value indexed by the 
symbol #value, thus => “marko”.
If you do g.V(1).values(‘name’), you would get back a TSequence with a single 
TTuple being the meta property.
If you do g.V(1).values(‘name’).value(), we could get the value indexed by the 
symbol #value.
If you do g.V(1).values(‘name’).value(‘creator’), it will return the primitive 
string “josh”.
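A sketch of that lookup rule, where `MetaPropertySketch` and its `value` helper are hypothetical names, just to make the resolution concrete:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: value(key) on a tuple whose entry is itself a tuple
// resolves through the #value symbol, as described in the email above.
public class MetaPropertySketch {

    // If the stored entry is a nested tuple, return what its #value symbol
    // indexes; otherwise return the entry as-is.
    @SuppressWarnings("unchecked")
    static Object value(Map<String, Object> tuple, String key) {
        Object v = tuple.get(key);
        if (v instanceof Map) {
            return ((Map<String, Object>) v).get("#value");
        }
        return v;
    }

    public static void main(String[] args) {
        Map<String, Object> metaProperty = new LinkedHashMap<>();
        metaProperty.put("#value", "marko");
        metaProperty.put("creator", "josh");
        metaProperty.put("timestamp", 12303);

        Map<String, Object> v1 = new LinkedHashMap<>();
        v1.put("name", metaProperty);

        System.out.println(value(v1, "name"));           // marko
        System.out.println(metaProperty.get("creator")); // josh
    }
}
```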

I believe that the following symbols should be recommended for use across all 
data structures.
#id, #label, #key, #value
…where id(), label(), key(), value() are tuple.get(Symbol). Other symbols for 
use with propertygraph/ include:
#outE, #inV, #inE, #outV, #bothE, #bothV

> In order to simplify user queries, you can let has() and values() do double
> duty, but I still feel that there are lower-level operations at play, at a
> logical level even if not at a bytecode level. However, expressing the a
> traversal in terms of its lowest-level relational operations may also be
> useful for query optimization.

One thing that I’m doing, that perhaps you haven’t caught onto yet, is that I’m 
not modeling everything in terms of “tables.” Each data structure is trying to 
stay as pure to its conceptual model as possible. Thus, there are no “joins” in 
property graphs as outE() references a TSequence<TEdge>, where TEdge is an 
interface that extends TTuple. You can just walk without doing any type of 
INNER JOIN. Now, if you model a property graph in a relational database, you 
will have to strategize the bytecode accordingly! Just a heads up in case you 
haven’t noticed that.

Thanks for your input,
Marko.

http://rredux.com <http://rredux.com/>



> 
> Josh
> 
> 
> 
> On Mon, Apr 29, 2019 at 7:34 AM Marko Rodriguez  <mailto:okramma...@gmail.com>>
> wrote:
> 
>> Hi,
>> 
>> *** This email is primarily for Josh (and Kuppitz). However, if others are
>> interested… ***
>> 
>> So I did a lot of thinking this weekend about structure/ and this morning,
>> I prototyped both graph/ and rdbms/.
>> 
>> This is the way I’m currently thinking of things:
>> 
>>1. There are 4 base types in structure/.
>>- Primitive: string, long, float, int, … (will constrain
>> these at some point).
>>- TTuple: key/value map.
>>- TSequence: an iterable of v objects.
>>- TSymbol: like Ruby, I think we need “enum-like” symbols
>> (e.g., #

The Fundamental Structure Instructions Already Exist! (w/ RDBMS Example)

2019-04-29 Thread Marko Rodriguez
Hi,

*** This email is primarily for Josh (and Kuppitz). However, if others are 
interested… ***

So I did a lot of thinking this weekend about structure/ and this morning, I 
prototyped both graph/ and rdbms/.

This is the way I’m currently thinking of things:

1. There are 4 base types in structure/.
- Primitive: string, long, float, int, … (will constrain these 
at some point).
- TTuple: key/value map.
- TSequence: an iterable of v objects.
- TSymbol: like Ruby, I think we need “enum-like” symbols 
(e.g., #id, #label).

2. Every structure has a “root.”
- for graph it's TGraph implements TSequence
- for rdbms it's a TDatabase implements TTuple

3. Roots implement Structure and thus are what is generated by 
StructureFactory.mint().
- defined using withStructure().
- For graph, it's accessible via V().
- For rdbms, it's accessible via db().

4. There is a list of core instructions for dealing with these base 
objects.
- value(K key): gets the TTuple value for the provided key.
- values(K key): gets an iterator of the value for the provided 
key.
- entries(): gets an iterator of T2Tuple objects for the 
incoming TTuple.
- hasXXX(A,B): various has()-based filters for looking into a 
TTuple and a TSequence
- db()/V()/etc.: jump to the “root” of the withStructure() 
structure.
- drop()/add(): behave as one would expect.



For RDBMS, we have three interfaces in rdbms/. 
(machine/machine-core/structure/rdbms)

1. TDatabase implements TTuple<String,TTable> // the root structure 
that indexes the tables.
2. TTable implements TSequence<TRow> // a table is a sequence of rows
3. TRow implements TTuple<String,V> // a row has string column names

I then created a new project (at machine/structure/jdbc). The classes in here 
implement the above rdbms/ interfaces.

Here is an RDBMS session:

final Machine machine = LocalMachine.open();
final TraversalSource jdbc =
Gremlin.traversal(machine).
withProcessor(PipesProcessor.class).
withStructure(JDBCStructure.class, 
Map.of(JDBCStructure.JDBC_CONNECTION, "jdbc:h2:/tmp/test"));

System.out.println(jdbc.db().toList());
System.out.println(jdbc.db().entries().toList());
System.out.println(jdbc.db().value("people").toList());
System.out.println(jdbc.db().values("people").toList());
System.out.println(jdbc.db().values("people").value("name").toList());
System.out.println(jdbc.db().values("people").entries().toList());

This yields:

[]
[PEOPLE:]
[]
[, ]
[marko, josh]
[NAME:marko, AGE:29, NAME:josh, AGE:32]

The bytecode of the last query is:

[db(), values(people), entries]

JDBCDatabase implements TDatabase, Structure. 
*** JDBCDatabase is the root structure and is referenced by db() *** 
(CRUCIAL POINT)

Assume another table called ADDRESSES with two columns: name and city.

jdbc.db().values(“people”).as(“x”).db().values(“addresses”).has(“name”,eq(path(“x”).by(“name”))).value(“city”)

The above is equivalent to:

SELECT city FROM people,addresses WHERE people.name=addresses.name

If you want to do an inner join (a product), you do this:


jdbc.db().values(“people”).as(“x”).db().values(“addresses”).has(“name”,eq(path(“x”).by(“name”))).as(“y”).path(“x”,”y")

The above is equivalent to:

SELECT * FROM addresses INNER JOIN people ON people.name=addresses.name

NOTES:
1. Instead of select(), we simply jump to the root via db() (or V() for 
graph).
2. Instead of project(), we simply use value() or values().
3. Instead of select() being overloaded with by() join syntax, we use 
has() and path().
- like TP3 we will be smart about dropping path() data once its 
no longer referenced.
4. We can also do LEFT and RIGHT JOINs (haven’t thought through FULL 
OUTER JOIN yet).
- however, we don’t support ‘null' in TP so I don’t know if we 
want to support these null-producing joins?

LEFT JOIN:
* If an address doesn’t exist for the person, emit a “null”-filled path.

jdbc.db().values(“people”).as(“x”).
  db().values(“addresses”).as(“y”).
choose(has(“name”,eq(path(“x”).by(“name”))),
  identity(),
  path(“y”).by(null).as(“y”)).
  path(“x”,”y")

SELECT * FROM addresses LEFT JOIN people ON people.name=addresses.name

RIGHT JOIN:

jdbc.db().values(“people”).as(“x”).
  db().values(“addresses”).as(“y”).
choose(has(“name”,eq(path(“x”).by(“name”))),
  identity(),
  path(“x”).by(null).as(“x”)).
  path(“x”,”y")


SUMMARY:

There are no “low level” instructions. Everything is based on the standard 
instructions that we know and love. Finally, if not apparent, the above 
bytecode chunks would ultimately get strategized 

Re: A TP4 Structure Agnostic Bytecode Specification (The Universal Structure)

2019-04-25 Thread Marko Rodriguez
as('p','ex:age').
 select('T').by('s',path('x')).has('p','ex:name').project('o').as('z')

- This can all be made run-time optimized with the match() 
instruction.
- Again, realize that this is bytecode, not Gremlin. I’m simply 
writing the bytecode in a Gremlin-like syntax to make it easier to read.

3. Unlike TP3, an RDF graph is not being embedded in a property graph.
- TP4 will support native RDF. (unlike TP3)
- A SPARQL compiler to TP4 bytecode will be W3C compliant. 
(unlike TP3)



DELETE

I never discussed delete(). It is simple:

1. select(variable).delete() 
- deletes the entire global sequence.
- select('vertices').delete() in an RDBMS is equivalent to DROP TABLE.
- select('T').delete() in a triple store deletes all the data.
- select('V').delete() in a property graph deletes all the data
- select('V').has(id,1).delete('outE') 
// g.V(1).outE().drop()
- select('V').has(id,1).project('outE').has('weight',gt(0.5)) 
// g.V(1).outE().has('weight',gt(0.5)).drop()
2. delete(key...)
- deletes the key/value entry of a tuple.
- select('V').delete('name') in property graphs removes all names from 
the vertices.



Hope this clears up any confusions.

Take care,
Marko.

http://rredux.com





> On Apr 25, 2019, at 11:46 AM, Marko Rodriguez  wrote:
> 
> Hello,
> 
> This email proposes a TP4 bytecode specification that is agnostic to the 
> underlying data structure and thus, is both:
> 
>   1. Turing Complete: the instruction set has process-oriented 
> instructions capable of expressing any algorithm (universal processing).
>   2. Pointer-Based: the instruction set has data-oriented instructions 
> for moving through referential links in memory (universal structuring).
> 
> Turing Completeness has already been demonstrated for TinkerPop using the 
> following sub-instruction set.
>   union(), repeat(), choose(), etc. // i.e. the standard program flow 
> instructions
> 
> We will focus on the universal structuring aspect of this proposed bytecode 
> spec. This work is founded on Josh Shinavier’s Category Theoretic approach to 
> data structures. My contribution has been to reformulate his ideas according 
> to the idioms and requirements of TP4 and then deduce a set of TP4-specific 
> implementation details.
> 
> TP4 REQUIREMENTS:
>   1. The TP4 VM should be able to process any data structure (not just 
> property graphs).
>   2. The TP4 VM should respect the lexicon of the data structure (not 
> just embed the data structure into a property graph).
>   3. The TP4 VM should allow query languages to naturally process their 
> respective data structures  (standards compliant language compilation).
> 
> Here is a set of axioms defining the structures and processes of a universal 
> data structure.
> 
> THE UNIVERSAL STRUCTURE:
>   1. There are 2 data read instructions — select() and project().
>   2. There are 2 data write instructions — insert() and delete().
>   3. There are 3 sorts of data  — tuples, primitives, and sequences.
>   - Tuples can be thought of as “key/value maps.”
>   - Primitives are doubles, floats, integers, booleans, Strings, 
> etc.
>   - Sequences are contiguous streams of tuples and/or primitives. 
>   4 Tuple data is accessed via keys. 
>   - A key is a primitive used for referencing a value in the 
> tuple. (not just String keys)
>   - A tuple can not have duplicate keys.
>   - Tuple values can be tuples, primitives, or sequences.
> 
> Popular data structures can be defined as specializations of this universal 
> structure. In other words, the data structures used by relational databases, 
> graphdbs, triplestores, document databases, column stores, key/value stores, 
> etc. all demand a particular set of constraints on the aforementioned axioms.
> 
> 
> /// A Schema-Oriented Multi-Relational Structure (RDBMS) ///
> 
> 
> RDBMS CONSTRAINTS ON THE UNIVERSAL STRUCTURE:
>   1. There are an arbitrary number of global tuple sequences (tables)
>   2. All tuple keys are Strings. (column names)
>   3. All tuple values are primitives. (row values)
>   4. All tuples in the same sequence have the same keys. (tables have 
> predefined columns)
>   5. All tuples in the same sequence have the same primitive value type 
> for the same key.

A TP4 Structure Agnostic Bytecode Specification (The Universal Structure)

2019-04-25 Thread Marko Rodriguez
Hello,

This email proposes a TP4 bytecode specification that is agnostic to the 
underlying data structure and thus, is both:

1. Turing Complete: the instruction set has process-oriented 
instructions capable of expressing any algorithm (universal processing).
2. Pointer-Based: the instruction set has data-oriented instructions 
for moving through referential links in memory (universal structuring).

Turing Completeness has already been demonstrated for TinkerPop using the 
following sub-instruction set.
union(), repeat(), choose(), etc. // i.e. the standard program flow 
instructions

We will focus on the universal structuring aspect of this proposed bytecode 
spec. This work is founded on Josh Shinavier’s Category Theoretic approach to 
data structures. My contribution has been to reformulate his ideas according to 
the idioms and requirements of TP4 and then deduce a set of TP4-specific 
implementation details.

TP4 REQUIREMENTS:
1. The TP4 VM should be able to process any data structure (not just 
property graphs).
2. The TP4 VM should respect the lexicon of the data structure (not 
just embed the data structure into a property graph).
3. The TP4 VM should allow query languages to naturally process their 
respective data structures  (standards compliant language compilation).

Here is a set of axioms defining the structures and processes of a universal 
data structure.

THE UNIVERSAL STRUCTURE:
1. There are 2 data read instructions — select() and project().
2. There are 2 data write instructions — insert() and delete().
3. There are 3 sorts of data  — tuples, primitives, and sequences.
- Tuples can be thought of as “key/value maps.”
- Primitives are doubles, floats, integers, booleans, Strings, 
etc.
- Sequences are contiguous streams of tuples and/or primitives. 
4. Tuple data is accessed via keys. 
- A key is a primitive used for referencing a value in the 
tuple. (not just String keys)
- A tuple can not have duplicate keys.
- Tuple values can be tuples, primitives, or sequences.

Popular data structures can be defined as specializations of this universal 
structure. In other words, the data structures used by relational databases, 
graphdbs, triplestores, document databases, column stores, key/value stores, 
etc. all demand a particular set of constraints on the aforementioned axioms.


/// A Schema-Oriented Multi-Relational Structure (RDBMS) ///


RDBMS CONSTRAINTS ON THE UNIVERSAL STRUCTURE:
1. There are an arbitrary number of global tuple sequences (tables)
2. All tuple keys are Strings. (column names)
3. All tuple values are primitives. (row values)
4. All tuples in the same sequence have the same keys. (tables have 
predefined columns)
5. All tuples in the same sequence have the same primitive value type 
for the same key. (tables have predefined row value types)
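Constraints 4 and 5 can be made concrete with a quick check over a table modeled as a list of maps (an assumed in-memory encoding, not TP4 code):

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: verify constraint 4 (all tuples in a sequence share
// the same keys) and constraint 5 (same primitive value type per key) for a
// "table" modeled as a List of String-keyed rows.
public class TableConstraintSketch {

    static boolean satisfiesConstraints(List<Map<String, Object>> table) {
        if (table.isEmpty()) return true;
        Map<String, Object> first = table.get(0);
        for (Map<String, Object> row : table) {
            if (!row.keySet().equals(first.keySet())) return false; // constraint 4
            for (String key : row.keySet()) {                       // constraint 5
                if (!row.get(key).getClass().equals(first.get(key).getClass()))
                    return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> vertices = List.of(
            Map.of("id", 1, "label", "person", "name", "marko", "age", 29),
            Map.of("id", 2, "label", "person", "name", "josh", "age", 35));
        System.out.println(satisfiesConstraints(vertices)); // true
    }
}
```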

Assume the following tables in a relational database.

vertices
id  label   name   age
1   person  marko  29
2   person  josh   35

edges
id outV label inV
0  1knows 2

An SQL query is presented and then the respective TP4 bytecode is provided 
(using fluent notation vs. [op,arg*]*).

// SELECT * FROM vertices WHERE id=1
select(‘vertices’).has(‘id’,1) 
  => v[1]
// SELECT name FROM vertices WHERE id=1
select(‘vertices’).has(‘id’,1).project('name’) 
  => "marko"
// SELECT * FROM edges WHERE outV=1
select('edges’).has('outV’,1) 
  => e[0][v[1]-knows->v[2]]
// SELECT * FROM edges WHERE outV=(SELECT 'id' FROM vertices WHERE name=‘marko')
select('edges’).has(‘outV’,within(select(‘vertices’).has(‘name’,’marko’).project(‘id’)))
 
  => e[0][v[1]-knows->v[2]] 
// SELECT vertices.* FROM edges,vertices WHERE outV=1 AND id=inV
select(‘vertices’).has(‘id’,1).select('edges').by('outV',eq('id')).select('vertices').by('id',eq('inV'))
 
  => v[2]

VARIATIONS:
1. Relational databases that support JSON-blobs values can have nested 
“JSON-constrained” tuple values.
2. Relational databases that materialize views create new tuple 
sequences.

//
/// A Schema-Less, Recursive, Bi-Relational Structure (GRAPHDB) //
//

GRAPHDB CONSTRAINTS ON THE UNIVERSAL STRUCTURE:
1. There are two sorts of tuples: vertex and edge tuples.
2. All tuples have an id key and a label key.
- id keys reference primitive values.
- label keys reference String values.
3. All vertices are in a tuple sequence denoted “V”.
4. Vertex tuples have an inE and outE key each containing edge tuples.
5. Edge tuples have an outV and inV key 

Re: What makes 'graph traversals' and 'relational joins' the same?

2019-04-24 Thread Marko Rodriguez
Hey,

Thinking through things more and re-reading your emails.

It's like this:

From an object you want to be able to go to the relations in which that 
object is a particular entry.
From that relation you want to go to another object referenced in 
another entry.

For instance assume this set of 3-tuple relations:

talk_table
speaker  listener  statement
markojosh  “sup bro"
markokuppitz   “dude man"

Lets say I’m at josh and I want to know what marko said to him:

josh.adjacents(‘talk’,’listener’, …) // and this is why you have 
from().restrict().to()

Using your from()/restrict()/to() notation:

josh.from(‘talk’,’listener’).restrict(‘speaker’,marko).to(‘statement’) 
=> “sup bro”

I want to get some terminology down:

Relation: a tuple with key/value entries. (basically a map)
Key: A relation column name.
Value: A relation column value.

So there are three operations:

1. Get the relations in which the current object is a value for the 
specified key. [select] // like a back()
2. Filter out those relations that don’t have a particular value for a 
particular key. [filter]
3. Get those objects in the remaining relations associated with a 
particular key. [project] // like a forward()

What did Kuppitz hear from Marko?


kuppitz.select(‘talk’,’listener’).filter(‘speaker’,marko).project(‘statement’) 
=> “dude man”
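That select/filter/project pipeline can be sketched over plain maps. The in-memory representation is assumed; only the table data comes from above:

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the three operations above over the talk_table
// relations, using plain maps in place of real structure objects.
public class RelationSketch {

    static final List<Map<String, String>> TALK_TABLE = List.of(
        Map.of("speaker", "marko", "listener", "josh",    "statement", "sup bro"),
        Map.of("speaker", "marko", "listener", "kuppitz", "statement", "dude man"));

    // [select] relations where `listener` is the value for the listener key,
    // [filter] those with the given speaker, [project] the statement.
    static Optional<String> heard(String listener, String speaker) {
        return TALK_TABLE.stream()
            .filter(r -> r.get("listener").equals(listener))  // select
            .filter(r -> r.get("speaker").equals(speaker))    // filter
            .map(r -> r.get("statement"))                     // project
            .findFirst();
    }

    public static void main(String[] args) {
        // What did Kuppitz hear from Marko?
        System.out.println(heard("kuppitz", "marko").orElseThrow()); // dude man
    }
}
```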

So, how do we do this with just goto pointer chasing?

kuppitz.goto(‘listener’).filter(goto(‘speaker’).is(marko)).goto(‘statement’)

That is, I went from Kuppitz to all those relations in which he is a listener. 
I then filtered out those relations that don’t have marko as the speaker. I 
then went to the statements associated with those remaining relations. However, 
with this model, I’m assuming that “listener” is unique to the talk_table and 
this is not smart…

Anywho, is this more in line with what you are getting at?

Thanks for your patience,
Marko.

http://rredux.com <http://rredux.com/>




> On Apr 24, 2019, at 11:30 AM, Marko Rodriguez  wrote:
> 
> Hi,
> 
> I think I understand you now. The concept of local and non-local data is what 
> made me go “ah!”
> 
> So let me reiterate what I think you are saying.
> 
> v[1] is guaranteed to have its id data local to it. All other information 
> could be derived via id-based "equi-joins.” Thus, we can’t assume that a 
> vertex will always have its properties and edges co-located with it. However, 
> we can assume that it knows where to get its property and edge data when 
> requested. Assume the following RDBMS-style data structure that is referenced 
> by com.example.MyGraph.
> 
> vertex_table
> id label
> 1  person
> 2  person
> …
> 
> properties_table
> id  name   age
> 1   marko  29
> 2   josh   35
> …
> 
> edge_table
> id outV  label  inV
> 0  1knows   2
> …
> 
> If we want to say that the above data structure is a graph, what is required 
> of “ComplexType” such that we can satisfy both Neo4j-style and RDBMS-style 
> graph encodings? Assume ComplexType is defined as:
> 
> interface ComplexType
>   Iterator adjacents(String label, Object... identifiers)
> 
> Take this basic Gremlin traversal:
> 
> g.V(1).out(‘knows’).values(‘name’)
> 
> I now believe this should compile to the following:
> 
> [goto,V,1] [goto,outE,knows] [goto,inV] [goto,properties,name]
> 
> Given MyGraph/MyVertex/MyEdge all implement ComplexType and there is no local 
> caching of data on these respective objects, then the bytecode isn’t 
> rewritten and the following cascade of events occurs:
> 
> mygraph
> [goto,V,1] => 
>   mygraph.adjacents(“V”,1) => 
> SELECT * FROM vertex_table WHERE id=1
> myvertex1
> [goto,outE,knows] => 
>   myvertex1.adjacents(“outE”,”knows”) => 
> SELECT id FROM edge_table WHERE outV=1 AND label=knows
> myedge0
> [goto,inV,knows] => 
>   myedge1.adjacents(“inV”) => 
> SELECT vertex_table.id FROM vertex_table, edge_table WHERE 
> vertex_table.id=edge_table.inV AND edge_table.id=0
> myvertex2
> [goto,properties,name] => 
>   myvertex2.adjacents(“properties”,”name”) => 
> SELECT name FROM properties_table WHERE id=2
> “josh"
> 
> Lets review the ComplexType adjacents()-method:
> 
> complexType.adjacents(label,identifiers...)
> 
> complexType must have sufficient information to represent the tail of the 
> relation.
> label specifies the relation type (we will always assume that a single String 
> is sufficient)
> identifiers... must contain sufficient information to identify the head of 
> the relation.
> 
> The return of the the method adjacents() is then the object(s) on the other 
> side of the relation(s).
> 

Re: What makes 'graph traversals' and 'relational joins' the same?

2019-04-24 Thread Marko Rodriguez
Hi,

I think I understand you now. The concept of local and non-local data is what 
made me go “ah!”

So let me reiterate what I think you are saying.

v[1] is guaranteed to have its id data local to it. All other information could 
be derived via id-based "equi-joins.” Thus, we can’t assume that a vertex will 
always have its properties and edges co-located with it. However, we can assume 
that it knows where to get its property and edge data when requested. Assume 
the following RDBMS-style data structure that is referenced by 
com.example.MyGraph.

vertex_table
id label
1  person
2  person
…

properties_table
id  name   age
1   marko  29
2   josh   35
…

edge_table
id outV  label  inV
0  1knows   2
…

If we want to say that the above data structure is a graph, what is required of 
“ComplexType” such that we can satisfy both Neo4j-style and RDBMS-style graph 
encodings? Assume ComplexType is defined as:

interface ComplexType
  Iterator adjacents(String label, Object... identifiers)

Take this basic Gremlin traversal:

g.V(1).out(‘knows’).values(‘name’)

I now believe this should compile to the following:

[goto,V,1] [goto,outE,knows] [goto,inV] [goto,properties,name]

Given MyGraph/MyVertex/MyEdge all implement ComplexType and there is no local 
caching of data on these respective objects, then the bytecode isn’t rewritten 
and the following cascade of events occurs:

mygraph
[goto,V,1] => 
  mygraph.adjacents(“V”,1) => 
SELECT * FROM vertex_table WHERE id=1
myvertex1
[goto,outE,knows] => 
  myvertex1.adjacents(“outE”,”knows”) => 
SELECT id FROM edge_table WHERE outV=1 AND label=knows
myedge0
[goto,inV,knows] => 
  myedge1.adjacents(“inV”) => 
SELECT vertex_table.id FROM vertex_table, edge_table WHERE 
vertex_table.id=edge_table.inV AND edge_table.id=0
myvertex2
[goto,properties,name] => 
  myvertex2.adjacents(“properties”,”name”) => 
SELECT name FROM properties_table WHERE id=2
“josh"

Lets review the ComplexType adjacents()-method:

complexType.adjacents(label,identifiers...)

complexType must have sufficient information to represent the tail of the 
relation.
label specifies the relation type (we will always assume that a single String 
is sufficient)
identifiers... must contain sufficient information to identify the head of the 
relation.

The return of the adjacents() method is then the object(s) on the other 
side of the relation(s).
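A minimal in-memory sketch of that contract: the table contents come from the example above, while `vertex` and the backing map are illustrative:

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of ComplexType.adjacents() resolving the
// properties_table lookup described above, entirely in memory.
public class AdjacentsSketch {

    interface ComplexType {
        Iterator<Object> adjacents(String label, Object... identifiers);
    }

    // In-memory stand-in for: SELECT ... FROM properties_table WHERE id=?
    static final Map<Integer, Map<String, Object>> PROPERTIES_TABLE = Map.of(
        1, Map.of("name", "marko", "age", 29),
        2, Map.of("name", "josh", "age", 35));

    static ComplexType vertex(int id) {
        return (label, identifiers) -> {
            if (label.equals("properties")) {
                String key = (String) identifiers[0];
                return List.of(PROPERTIES_TABLE.get(id).get(key)).iterator();
            }
            throw new UnsupportedOperationException(label);
        };
    }

    public static void main(String[] args) {
        // myvertex2.adjacents("properties","name") => "josh"
        System.out.println(vertex(2).adjacents("properties", "name").next());
    }
}
```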

Now, given the way I have my data structure organized, I could beef up the 
MyXXX implementation such that MyStrategy rewrites the base bytecode to:

[goto,V,1] [goto,out,knows][goto,properties,name]

The following cascade of events occurs:

mygraph
[goto,V,1] => 
  mygraph.adjacents(“V”,1) => 
SELECT * FROM vertex_table WHERE id=1
myvertex1
[goto,out,knows] => 
  myvertex1.adjacents(“outE”,”knows”) => 
SELECT vertex_table.id FROM vertex_table,edge_table WHERE outV=1 AND 
label=knows AND inV=vertex_table.id
myvertex2
[goto,properties,name] => 
  myvertex2.adjacents(“properties”,”name”) => 
SELECT name FROM properties_table WHERE id=2
“josh"

Now, I could really beef up MyStrategy when I realize that no path information 
is used in the traversal. Thus, the base bytecode compiles to:

[my:sql,SELECT name FROM properties_table,vertex_table,edge_table WHERE … lots 
of join equalities]

This would then just emit “josh” given the mygraph object.

——

To recap.

1. There are primitives.
2. There are Maps and Lists.
3. There are ComplexTypes.
4. ComplexTypes are adjacent to other objects via relations.
- These adjacent objects may be cached locally with the 
ComplexType instance.
- These adjacent objects may require some database lookup.
- Regardless, TP4 doesn’t care — its up to the provider’s 
ComplexType instance to decide how to resolve the adjacency.
5. ComplexTypes don’t go over the wire — a ComplexTypeProxy with 
appropriately provided toString() is all that leaves the TP4 VM.

Finally, to solve the asMap()/asList() problem, we simply have:

asMap(’name’,’age’) => complexType.adjacents(‘asMap’,’name’,’age')
asList() => complexType.adjacents(‘asList’)

It is up to the complexType to manifest a Map or List accordingly.

I see this as basically a big flatmap system. ComplexTypes just map from self 
to any number of logical neighbors as specified by the relation.

Am I getting it?,
Marko.

http://rredux.com <http://rredux.com/>




> On Apr 24, 2019, at 9:56 AM, Joshua Shinavier  wrote:
> 
> On Tue, Apr 23, 2019 at 10:28 AM Marko Rodriguez 
> wrote:
> 
>> Hi,
>> 
>> I think we are very close to something useable for TP4 structure/. Solving
>> this problem elegantly will open the flood gates on tp4/ development.
>> 
> 
> Yes, and formality often brings elegance. I don't think we can do much
> better than relational algebra and

Re: TP4 Processors now support both push- and pull-based semantics.

2019-04-24 Thread Marko Rodriguez
Hello,

> I think it would be better to either expose Flowable on the API (or Flow if 
> you don't want to be tied in to RxJava)

We definitely don’t want to expose anything “provider-specific,” especially at 
the Processor interface level. I note your Flow API reference in 
java.util.concurrent and have noticed that RxJava mimics many java.util.concurrent 
classes (Subscriber, Subscription, etc.). I will dig deeper.

> 1. Using Consumer will break the Rx chain. This is undesirable as it will 
> prevent backpressure and cancellation from working properly.

Understood about breaking the chain.

> 2. The Scheduler to run the traversal on can be set. For instance, in the 
> case where only certain threads are allowed to perform IO once the user has 
> the Flowable they can call subscribeOn before subscribe.
> 3. Backpressure strategy can be set, such as dropping results on buffer 
> overflow.
> 4. Buffer size can be set.

Hm. Here are my thoughts on the matter.

RxJava is just one of many Processors that will interact with TP4. If we start 
exposing backpressure strategies, buffer sizes, etc. at the Processor API 
level, then we expect other providers to have those concepts. Does Spark 
support backpressure? Does Hadoop? Does Pipes? ...

I believe such provider-specific parameterization should happen via 
language-agnostic configuration. For instance:

g = g.withProcessor(RxJavaProcessor.class, Map.of(“rxjava.backpressure”, 
“drop”, “rxjava.bufferSize”, 2000))
g.V().out().blah()

Unlike TP3, TP4 users will never interact with our Java API. They will never 
have a reference to a Processor instance. They only talk to the TP4 VM via 
Bytecode. However, with that said, systems that will integrate the TP4 VM (e.g. 
database vendors, data server systems, etc.) will have to handle Processor 
traverser results in some way (i.e. within Java). Thus, if they are a Reactive 
architecture, then they will want to be able to Flow, but we need to make sure 
that java.concurrent Flow semantics don't go too far in demanding 
“unreasonable” behaviors from other Processor implementations. (I need to study 
the java.concurrent Flow API)

Thus, I see it like this:

1. RxJava-specific configuration is not available at the Processor API 
level (only via configuration).
2. Drop Consumer and expose java.concurrent Flow in Processor so the 
chain isn’t broken for systems integrating the TP4 VM.
- predicated on java.concurrent Flow having reasonable 
expectations of non-reactive sources (i.e. processors).
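Point 2 can be sketched with the JDK's own Flow machinery (SubmissionPublisher, which is a java.util.concurrent.Flow.Publisher). This is only an illustration of the idea, not TP4 API; the String "traversers" stand in for real traverser objects:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.SubmissionPublisher;

public class FlowSketch {
    // Expose processor results through java.util.concurrent.Flow so reactive
    // integrators keep backpressure/cancellation without the Processor
    // interface ever referencing RxJava types.
    public static List<String> run(List<String> traversers) {
        List<String> received = new CopyOnWriteArrayList<>();
        try (SubmissionPublisher<String> publisher = new SubmissionPublisher<>()) {
            var done = publisher.consume(received::add); // subscribe first
            traversers.forEach(publisher::submit);       // push each traverser
            publisher.close();                           // signal onComplete
            done.join();                                 // wait for delivery
        }
        return received;
    }
}
```

An RxJava processor could then adapt its Flowable to this interface, while Pipes or Spark would publish from a plain loop.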

Does this make sense to you?

———

Stephen said you made a comment regarding ParallelRxJava as not being 
necessary. If this is a true statement, can you explain your thoughts on 
ParallelRxJava? My assumptions regarding serial vs. parallel:

1. For TP4 VM vendors in a highly concurrent, multi-user environment, 
multi-threading individual queries is bad.
2. For TP4 VM vendors in a lowly concurrent, limited-user environment, 
multi-threading a single query is good.
- also related to the workload — e.g. ParallelRxJava for an AI 
system where one query at a time is happening over lots of data.

Thank you for your feedback,
Marko.

http://rredux.com




> On Apr 24, 2019, at 3:41 AM, brynco...@gmail.com wrote:
> 
> 
> 
> On 2019/04/23 13:07:09, Marko Rodriguez  <mailto:okramma...@gmail.com>> wrote: 
>> Hi,
>> 
>> Stephen and Bryn were looking over my RxJava implementation the other day 
>> and Bryn, with his British accent, was like [I paraphrase]:
>> 
>>  “Whoa dawg! Bro should like totally not be blocking to fill an 
>> iterator. Gnar gnar for surezies.”
>> 
>> Prior to now, Processor implemented Iterator, where for RxJava, 
>> when you do next()/hasNext() if there were no results in the queue and the 
>> flowable was still running, then the iterator while()-blocks waiting for a 
>> result or for the flowable to terminate.
>> 
>> This morning I decided to redo the Processor interface (and respective 
>> implementations) and it is much nicer now. We have two “execute” methods:
>> 
>> Iterator<Traverser> Processor.iterator(Iterator<Traverser> starts)
>> void Processor.subscribe(Iterator<Traverser> starts, Consumer<Traverser> 
>> consumer)
>> 
>> A processor can only be executed using one of the methods above. Thus, 
>> depending on context and the underlying processor, the VM determines whether 
>> to use pull-based or push-based semantics. Pretty neat, eh?
>> 
>>  
>> https://github.com/apache/tinkerpop/blob/tp4/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/processor/Processor.java
>>  
>> <https://github.com/apache/tinkerpop/blob/tp4/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/processor/Processor.java>
>>  

More emails from Marko. Yes!

2019-04-23 Thread Marko Rodriguez
Hi,

The parallel Josh/Marko/Pieter thread got me thinking… So, given

ComplexType
Iterator<Object> siblings(String label)
Iterator<Object> children(String label)

…lets see how both structure and processor providers can influence each other 
within the TP4 VM.

Lets take JanusGraph as the example structure.

JanusVertex implements ComplexType

Lets take Akka as the example processor. AkkaProcessor can document:

“If you want query routing functionality for your ComplexTypes, provide 
an akka:location child reference.”

The JanusGraph team plans to provide AkkaProcessor support so they do as asked.

janusVertex.children(“akka:location”) => 127.0.2.2

This is the physical location of the vertex in JanusGraph’s underlying 
Cassandra/HBase/etc. cluster. Now, an AkkaProviderStrategy can do the following:

g.V().has(’name’,’marko’).out(‘knows’).asMap()
==strategizesTo==>
g.V().has(’name’,’marko’).out(‘knows’).akka:route().asMap()

akka:route() is a provider-specific instruction that will look at the incoming 
object, check to see if it has an akka:location child reference, if it does, it 
will teleport the traverser to that machine for the final asMap() execution. 
(i.e. data local query routing). Why pull a bunch of map data over the wire 
when you can send the traverser to the hosting machine and populate the map 
there?
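The routing decision can be sketched in a few lines of Java (every interface and name here is illustrative; this is neither Akka nor JanusGraph API):

```java
import java.util.Iterator;
import java.util.List;
import java.util.Optional;

public class RouteSketch {
    // Hypothetical ComplexType child lookup, per the children() idea above.
    interface ComplexType {
        Iterator<Object> children(String label);
    }

    // akka:route() semantics: if the incoming object advertises an
    // akka:location child reference, resolve it; otherwise stay local.
    static Optional<String> route(ComplexType incoming) {
        Iterator<Object> loc = incoming.children("akka:location");
        return loc.hasNext() ? Optional.of(loc.next().toString()) : Optional.empty();
    }

    // A toy "JanusVertex" whose physical location is known.
    static ComplexType vertexAt(String host) {
        return label -> label.equals("akka:location")
                ? List.<Object>of(host).iterator()
                : List.of().iterator();
    }
}
```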

——

We have always talked about providers being able to have custom instructions 
(inserted via provider-specific strategies). What we haven’t discussed and what 
I bring up here is the idea that providers can require/recommend/etc. that data 
providers use certain reference types that they can capitalize on.

Thus, providers interact with other providers within the TP4 VM via:

1. Custom bytecode instructions (process interaction).
2. Custom type references (structure interaction).

Bye,
Marko.

http://rredux.com






Re: What makes 'graph traversals' and 'relational joins' the same?

2019-04-23 Thread Marko Rodriguez
Hi,

I think we are very close to something useable for TP4 structure/. Solving this 
problem elegantly will open the flood gates on tp4/ development.

——

I still don’t grok your comeFrom().goto() stuff. I don’t get the benefit of 
having two instructions for “pointer chasing” instead of one.

Let's put that aside for now and turn to modeling a Vertex. Go back to my 
original representation:

vertex.goto(‘label’)
vertex.goto(‘id’)
vertex.goto(‘outE’)
vertex.goto(‘inE’)
vertex.goto(‘properties’)

Any object can be converted into a Map. In TinkerPop3 we convert vertices into 
maps via:

g.V().has(‘name’,’marko’).valueMap() => {name:marko,age:29}
g.V().has(‘name’,’marko’).valueMap(true) => 
{id:1,label:person,name:marko,age:29}

In the spirit of instruction reuse, we should have an asMap() instruction that 
works for ANY object. (As a side: this gets back to ONLY sending primitives 
over the wire, no Vertex/Edge/Document/Table/Row/XML/ColumnFamily/etc.). Thus, 
the above is:

g.V().has(‘name’,’marko’).properties().asMap() => {name:marko,age:29}
g.V().has(‘name’,’marko’).asMap() => 
{id:1,label:person,properties:{name:marko,age:29}}

You might ask, why didn’t it go to outE and inE and map-ify that data? Because 
those are “sibling” references, not “children” references. 

goto(‘outE’) is a “sibling” reference. (a vertex does not contain an 
edge)
goto(‘id’) is a “child” reference. (a vertex contains the id)

Where do we find sibling references?
Graphs: vertices don’t contain each other.
OO heaps: many objects don’t contain each other.
RDBMS: rows are linked by joins, but don’t contain each other.

So, the way in which we structure our references (pointers) determines the 
shape of the data and ultimately how different instructions will behave. We 
can’t assume that asMap() knows anything about 
vertices/edges/documents/rows/tables/etc. It will simply walk all 
child-references and create a map.
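A sketch of an asMap() that walks only child references (the childLabels() helper is my assumption for enumerating a type's child references; it is not part of the proposal):

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AsMapSketch {
    // Hypothetical shape of the proposed ComplexType (names illustrative).
    interface ComplexType {
        Iterator<Object> children(String label);
        List<String> childLabels(); // assumed helper, not in the proposal
    }

    // asMap() walks only child references -- siblings (outE, inE) are skipped,
    // which is why a vertex map-ifies to {id,label,properties}, not its edges.
    static Map<String, Object> asMap(ComplexType type) {
        Map<String, Object> map = new LinkedHashMap<>();
        for (String label : type.childLabels()) {
            Iterator<Object> kids = type.children(label);
            if (kids.hasNext()) map.put(label, kids.next());
        }
        return map;
    }
}
```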

We don’t want TP to get involved in “complex data types.” We don’t care. You 
can propagate MyDatabaseObject through the TP4 VM pipeline and load your object 
up with methods for optimizations with your DB and all that, but for TP4, your 
object just needs to implement:

ComplexType
- Iterator<Object> children(String label)
- Iterator<Object> siblings(String label)
- default Iterator<Object> references(String label) { return 
IteratorUtils.concat(children(label), siblings(label)); }
- String toString()

When a ComplexType goes over the wire to the user, it is just represented as a 
ComplexTypeProxy with a toString() like v[1], 
tinkergraph[vertices:10,edges:34], etc. All references are disconnected. Yes, 
even children references. We do not want language drivers having to know about 
random object types and have to deal with implementing serializers and all that 
nonsense. The TP4 serialization protocol is primitives, maps, lists, bytecode, 
and traversers. That's it!

*** Only Maps and Lists (that don’t contain complex data types) maintain their 
child references “over the wire.”

——

I don’t get your hypergraph example, so let me try another example:

tp ==member==> marko, josh

TP is a vertex and there is a directed hyperedge with label “member” connecting 
to marko and josh vertices.

tp.goto(“outE”).filter(goto(“label”).is(“member”)).goto(“inV”)

It looks exactly like a property graph query. However, it's not, because goto(“inV”) 
returns 2 vertices, not 1. EdgeVertexFlatmapFunction works for property graphs 
and hypergraphs. It doesn’t care — it just follows goto() pointers! That is, it 
follows the ComplexType.references(“inV”). Multi-properties are the same as 
well. Likewise for meta-properties. These data model variations are not 
“special” to the TP4 VM. It just walks references whether there are 0,1,2, or N 
of them.

Thus, what is crucial to all this is the “shape of the data.” Using your 
pointers wisely so instructions produce useful results.

Does any of what I wrote update your comeFrom().goto() stuff? If not, can you 
please explain to me why comeFrom() is cool — sorry for being dense (aka “being 
Kuppitz” — that's right, I said it. boom!).

Thanks,
Marko.

http://rredux.com




> On Apr 23, 2019, at 10:25 AM, Joshua Shinavier  wrote:
> 
> On Tue, Apr 23, 2019 at 5:14 AM Marko Rodriguez 
> wrote:
> 
>> Hey Josh,
>> 
>> This gets to the notion I presented in “The Fabled GMachine.”
>>http://rredux.com/the-fabled-gmachine.html <
>> http://rredux.com/the-fabled-gmachine.html> (first paragraph of
>> “Structures, Processes, and Languages” section)
>> 
>> All that exists are memory addresses that contain either:
>> 
>>1. A primitive
>>2. A set of labeled references to other references or primitives.
>> 
>> Using y

[Article] Pull vs. Push-Based Loop Fusion in Query Engines

2019-04-23 Thread Marko Rodriguez
Hello,

I just read this article:

Push vs. Pull-Based Loop Fusion in Query Engines
https://arxiv.org/abs/1610.09166 

It is a really good read if you are interested in TP4. Here are some notes I 
jotted down:

1. Pull-based engines are inefficient when there are lots of filters().
- they require a while(predicate.test(next())) which introduces 
branch flow control and subsequent JVM performance issues.
- push-based engines simply don’t emit() if the 
predicate.test() is false. Thus, no branching.
2. Pull-based engines are better at limit() based queries.
- they only process what is necessary to satisfy the limit.
- push-based engines will provide more results than needed 
given their eager evaluation strategy (backpressure comes into play).
3. We should introduce a "collection()" operator in TP4 for better 
expressivity with list and map manipulation and so we don’t have to use 
unfold()…fold().
- [9,11,13].collection(incr().is(gt(10))) => [12,14]
- the ability to chain functions in a collection manipulation 
sequence is crucial for performance as you don’t create intermediate 
collections.
4. Given that some bytecode is best on a push-based vs. a pull-based 
(and vice versa), we can strategize for this accordingly.
- We have Pipes for pull-based.
- We have RxJava for push-based.
- We can even isolate sub-sections of a flow. For instance:
g.V().has(‘age’,gt(10)).out(‘knows').limit(10)
==>becomes
g.V().has(‘age’,gt(10)).local(out(‘knows’).limit(10))
- where the local(bytecode) (TP3-style) is 
executed by Pipes and the root bytecode by rxJava.
5. They have lots of good tips for writing JVM performant 
operators/steps/functions.
- All their work is done in Scala.
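Points 1 and 3 can be illustrated in plain Java (no query engine here; incrGt10 mimics the proposed [9,11,13].collection(incr().is(gt(10))) chain, and the names are made up):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

public class FusionSketch {
    // Pull: the consumer loops until the predicate passes -- the branchy
    // while(predicate.test(next())) control flow the paper complains about.
    static Integer pullNext(Iterator<Integer> source, Predicate<Integer> p) {
        while (source.hasNext()) {
            Integer next = source.next();
            if (p.test(next)) return next;   // tested on every pull
        }
        return null;
    }

    // Push: the producer simply does not emit failures downstream.
    static void push(List<Integer> source, Predicate<Integer> p, Consumer<Integer> downstream) {
        for (Integer next : source) {
            if (p.test(next)) downstream.accept(next);
        }
    }

    // collection(incr().is(gt(10))): chain map+filter without building an
    // intermediate collection between the two functions.
    static List<Integer> incrGt10(List<Integer> in) {
        List<Integer> out = new ArrayList<>();
        push(in.stream().map(i -> i + 1).toList(), i -> i > 10, out::add);
        return out;
    }
}
```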

Enjoy!,
Marko.

http://rredux.com






TP4 Processors now support both push- and pull-based semantics.

2019-04-23 Thread Marko Rodriguez
Hi,

Stephen and Bryn were looking over my RxJava implementation the other day and 
Bryn, with his British accent, was like [I paraphrase]:

“Whoa dawg! Bro should like totally not be blocking to fill an 
iterator. Gnar gnar for surezies.”

Prior to now, Processor implemented Iterator<Traverser>. For RxJava, if you did 
next()/hasNext() when there were no results in the queue and the flowable was 
still running, the iterator while()-blocked waiting for a result or for the 
flowable to terminate.

This morning I decided to redo the Processor interface (and respective 
implementations) and it is much nicer now. We have two “execute” methods:

Iterator<Traverser> Processor.iterator(Iterator<Traverser> starts)
void Processor.subscribe(Iterator<Traverser> starts, Consumer<Traverser> 
consumer)

A processor can only be executed using one of the methods above. Thus, 
depending on context and the underlying processor, the VM determines whether to 
use pull-based or push-based semantics. Pretty neat, eh?


https://github.com/apache/tinkerpop/blob/tp4/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/processor/Processor.java
 


Check out how I do Pipes:


https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/pipes/src/main/java/org/apache/tinkerpop/machine/processor/pipes/Pipes.java#L113-L126
 


Pipes is inherently pull-based. However, to simulate push-based semantics, I 
Thread().start() the iterator.hasNext()/next() and just consume.accept() the 
results. Thus, as desired, subscribe() returns immediately.
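That thread trick can be sketched in a few lines (illustrative only; the real Pipes code is linked above):

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

public class PipesPushSketch {
    // Pull-based engine simulating push: drain the iterator on a worker
    // thread, handing each result to the consumer, so that subscribe()
    // itself returns immediately.
    static <T> Thread subscribe(Iterator<T> results, Consumer<T> consumer) {
        Thread worker = new Thread(() -> results.forEachRemaining(consumer));
        worker.start();
        return worker;          // caller may join() if it needs completion
    }

    // Convenience for demonstration: subscribe and wait for the worker.
    static <T> List<T> drain(Iterator<T> results) {
        List<T> out = new CopyOnWriteArrayList<>();
        try {
            subscribe(results, out::add).join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return out;
    }
}
```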

Next, here is my RxJava implementation.


https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/rxjava/src/main/java/org/apache/tinkerpop/machine/processor/rxjava/SerialRxJava.java#L59-L65
 


https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/rxjava/src/main/java/org/apache/tinkerpop/machine/processor/rxjava/AbstractRxJava.java#L66-L86
 


You can see how I turn a push-based subscription into a pull-based iteration 
using the good ‘ol while()-block :).


https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/rxjava/src/main/java/org/apache/tinkerpop/machine/processor/rxjava/AbstractRxJava.java#L98-L102
 


——

What I need to do next is to redo the RxJava execution planner such that nested 
traversals (e.g. map(out())) are subscription-based with the parent flowable. 
I don’t quite know how I will do it — but I believe I will have to write custom 
Publisher/Subscriber objects for use with Flowable.compose() such that onNext() 
and onComplete() will be called accordingly within the consumer.accept(). It 
will be tricky as I’m not too good with low-level RxJava, but thems the breaks.

Please note that my push-based conceptual skills are not the sharpest so if 
anyone has any recommendations, please advise.

Take care,
Marko.

http://rredux.com






Re: What makes 'graph traversals' and 'relational joins' the same?

2019-04-23 Thread Marko Rodriguez
Hey Josh,

This gets to the notion I presented in “The Fabled GMachine.”
http://rredux.com/the-fabled-gmachine.html 
<http://rredux.com/the-fabled-gmachine.html> (first paragraph of “Structures, 
Processes, and Languages” section)

 All that exists are memory addresses that contain either:

1. A primitive
2. A set of labeled references to other references or primitives.

Using your work and the above, here is a super low-level ‘bytecode' for 
property graphs.

v.goto("id") => 1
v.goto("label") => person
v.goto("properties").goto("name") => "marko"
v.goto("properties").goto("name").goto(0) => "m"
v.goto("outE").goto("inV") => v[2], v[4]
g.goto("V").goto(1) => v[1]

The goto() instruction moves the “memory reference” (traverser) from the 
current “memory address” to the “memory address” referenced by the goto() 
argument.

The Gremlin expression:

g.V().has(‘name’,’marko’).out(‘knows’).drop()

..would compile to:


g.goto(“V”).filter(goto(“properties”).goto(“name”).is(“marko”)).goto(“outE”).filter(goto(“label”).is(“knows”)).goto(“inV”).free()

…where free() is the opposite of malloc().

If we can get things that low-level and still compile them efficiently, then we 
can model every data structure. All you are doing is pointer chasing through a 
withStructure() data structure.
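A toy memory model for the goto() idea (Address, prim, node, and chase are all made-up names for illustration; a real structure provider would back this with its own storage):

```java
import java.util.List;
import java.util.Map;

public class GotoSketch {
    // A "memory address" holds either a primitive or labeled references.
    record Address(Object primitive, Map<String, List<Address>> refs) {
        static Address prim(Object value) { return new Address(value, Map.of()); }
        static Address node(Map<String, List<Address>> refs) { return new Address(null, refs); }
        // goto(label): follow the labeled references (0..N of them).
        List<Address> go(String label) { return refs.getOrDefault(label, List.of()); }
    }

    // Chase a sequence of goto()s down to a primitive, e.g.
    // v.goto("properties").goto("name") => "marko"
    static Object chase(Address start, String... labels) {
        Address current = start;
        for (String label : labels) current = current.go(label).get(0);
        return current.primitive();
    }
}
```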

No one would ever want to write strategies for goto()-based Bytecode. Thus, 
perhaps there could be a PropertyGraphDecorationStrategy that does:

g = Gremlin.traversal(machine).withStructure(JanusGraph.class)  // this will 
register the strategy
g.V().has(‘name’,’marko’).out(‘knows’).drop() // this generates goto()-based 
bytecode underneath the covers
==submit==>
[goto,V][filter,[goto…]][goto][goto][free]] // Bytecode from the “fundamental 
instruction set” 
[V][has,name,marko][out,knows][drop] // PropertyGraphDecorationStrategy 
converts goto() instructions into a property graph-specific instruction set.
[V-idx,name,marko][out,knows][drop] // JanusGraphProviderStrategy converts 
V().has() into an index lookup instruction.

[I AM NOW GOING OFF THE RAILS]

Like fluent-style Gremlin, we could have an AssemblyLanguage that only has 
goto(), free(), malloc(), filter(), map(), reduce(), flatmap(), barrier(), 
branch(), repeat(), sideEffect() instructions. For instance, if you wanted to 
create an array list (not a linked list! :):

[“marko”,29,true]

you would do:

malloc(childrefs(0,1,2)).sideEffect(goto(0).malloc(“marko”)).sideEffect(goto(1).malloc(29)).sideEffect(goto(2).malloc(true))

This tells the underlying data structure (e.g. database) that you want to 
create a set of “children references" labeled 0, 1, and 2. And then you goto() 
each reference and add primitives. Now, if JanusGraph got this batch of 
instructions, it would do the following:

Vertex refs = graph.addVertex()
refs.addEdge(“childref", graph.addVertex(“value”,”marko”)).property(“ref”,0)
refs.addEdge(“childref", graph.addVertex(“value”,29)).property(“ref”,1)
refs.addEdge(“childref", graph.addVertex(“value”,true)).property(“ref”,2)

The reason for childref, is that if you delete the list, you should delete all 
the children referenced data! In other words, refs-vertex has cascading deletes.

list.drop()
==>
list.sideEffect(goto(0,1,2).free()).free()

JanusGraph would then do:

refs.out(“childref").drop()
refs.drop()

Or, more generally:

refs.emit().repeat(out(“childref”)).drop()

Trippy.
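The cascading childref delete can be sketched over a plain adjacency map (illustrative only; no graph database involved):

```java
import java.util.List;
import java.util.Map;

public class ChildRefSketch {
    // Toy adjacency: node id -> ids of its "childref" children.
    // Dropping a list frees all children-referenced data first (cascading
    // delete), then the refs node itself -- list.drop() from above.
    static void dropCascade(Map<Integer, List<Integer>> childrefs, int id, List<Integer> freed) {
        for (int child : childrefs.getOrDefault(id, List.of())) {
            dropCascade(childrefs, child, freed);   // free() the children first
        }
        freed.add(id);                              // then free() this node
    }
}
```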

[I AM NOW BACK ON THE RAILS]

It's as if “properties”, “outE”, “label”, “inV”, etc. references mean something 
to property graph providers and they can do more intelligent stuff than what 
MongoDB would do with such information. However, someone, of course, can create 
a MongoDBPropertyGraphStrategy that would make documents look like vertices and 
edges and then use O(log(n)) lookups on ids to walk the graph. However, if that 
didn’t exist, it would still do something that works, even if it's horribly 
inefficient, as every database can make primitives with references between them!

Anywho @Josh, I believe goto() is what you are doing with multi-references off 
an object. How do we make it all clean, easy, and universal?

Marko.

http://rredux.com




> On Apr 22, 2019, at 6:42 PM, Joshua Shinavier  wrote:
> 
> Ah, glad you asked. It's all in the pictures. I have nowhere to put them 
> online at the moment... maybe this attachment will go through to the list?
> 
> Btw. David Spivak gave his talk today at Uber; it was great. Juan Sequeda 
> (relational <--> RDF mapping guy) was also here, and Ryan joined remotely. 
> Really interesting discussion about databases vs. graphs, and what category 
> theory brings to the table.
> 
> 
> On Mon, Apr 22, 2019 at 1:45 PM Marko Rodriguez  <mailto:okramma...@gmail.c

Re: [DISCUSS] The Two Protocols of TP4

2019-04-23 Thread Marko Rodriguez
Whoa! — are you saying that we should write an ANTLR parser that compiles 
Gremlin-XXX into Bytecode directly?

Thus, for every Gremlin language variant, we will have an ANTLR parser.

Marko.

http://rredux.com




> On Apr 23, 2019, at 5:01 AM, Jorge Bay Gondra  
> wrote:
> 
> Hi,
> Language recognition engines will give us a set of tokens, usually in some
> sort of tree but the result can be thought of nested collections, for
> example:
> 
> The following string "g.V().values('name')" could be parsed into something
> like [["g"], ["V"], ["values", "name"]].
> 
> Then, we would have to create some sort of "evaluator", that translates
> these string tokens into a traversal, similar to bytecode parsing and
> execution. This evaluator can use static evaluation of the tokens (like, do
> the tokens evaluate into something meaningful?), can be optimized with
> caching techniques (like preparing traversals) and more importantly, will
> only execute class methods that are whitelisted, i.e., users can't use it
> to execute arbitrary groovy code.
> 
> Best,
> Jorge
> 
> 
> On Tue, Apr 23, 2019 at 12:36 PM Marko Rodriguez  <mailto:okramma...@gmail.com>>
> wrote:
> 
>> Hi Jorge,
>> 
>>> Instead of supporting a ScriptEngine or enable providers to implement
>> one,
>>> TP4 could be a good opportunity to ditch script engines while continue
>>> supporting gremlin-groovy string literals using language recognition
>>> engines like ANTLR.
>> 
>> Huh…….. Can you explain how you think of using ANTLR vs
>> ScriptEngine.submit(String)
>> 
>>> Language recognition and parsing engines have several benefits over the
>>> current approach, most notably that it's safe to parse text using
>> language
>>> recognition as it results in string tokens, opposed to let users run code
>>> in a sandboxed vm.
>> 
>> How would the ANTLR-parsed text ultimately be executed?
>> 
>> Thanks,
>> Marko.
>> 
>> http://rredux.com



Re: [DISCUSS] The Two Protocols of TP4

2019-04-23 Thread Marko Rodriguez
Hi Jorge,

> Instead of supporting a ScriptEngine or enable providers to implement one,
> TP4 could be a good opportunity to ditch script engines while continue
> supporting gremlin-groovy string literals using language recognition
> engines like ANTLR.

Huh…….. Can you explain how you think of using ANTLR vs 
ScriptEngine.submit(String)

> Language recognition and parsing engines have several benefits over the
> current approach, most notably that it's safe to parse text using language
> recognition as it results in string tokens, opposed to let users run code
> in a sandboxed vm.

How would the ANTLR-parsed text ultimately be executed?

Thanks,
Marko.

http://rredux.com 




Re: What makes 'graph traversals' and 'relational joins' the same?

2019-04-22 Thread Marko Rodriguez
Hey Josh,

I’m digging what you are saying, but the pictures didn’t come through for me… 
Can you provide them again (or, if dev@ is filtering them, can you give me 
URLs to them)?

Thanks,
Marko.


> On Apr 21, 2019, at 12:58 PM, Joshua Shinavier  wrote:
> 
> On the subject of "reified joins", maybe be a picture will be worth a few 
> words. As I said in the thread 
> <https://groups.google.com/d/msg/gremlin-users/_s_DuKW90gc/Xhp5HMfjAQAJ> on 
> property graph standardization, if you think of vertex labels, edge labels, 
> and property keys as types, each with projections to two other types, there 
> is a nice analogy with relations of two columns, and this analogy can be 
> easily extended to hyper-edges. Here is what the schema of the TinkerPop 
> classic graph looks like if you make each type (e.g. Person, Project, knows, 
> name) into a relation:
> 
> 
> 
> I have made the vertex types salmon-colored, the edge types yellow, the 
> property types green, and the data types blue. The "o" and "I" columns 
> represent the out-type (e.g. out-vertex type of Person) and in-type (e.g. 
> property value type of String) of each relation. More than two arrows from a 
> column represent a coproduct, e.g. the out-type of "name" is Person OR 
> Project. Now you can think of out() and in() as joins of two tables on a 
> primary and foreign key.
> 
> We are not limited to "out" and "in", however. Here is the ternary 
> relationship (hyper-edge) from hyper-edge slide 
> <https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012/49>
>  of my Graph Day preso, which has three columns/roles/projections:
> 
> 
> 
> I have drawn Says in light blue to indicate that it is a generalized element; 
> it has projections other than "out" and "in". Now the line between relations 
> and edges begins to blur. E.g. in the following, is PlaceEvent a vertex or a 
> property?
> 
> 
> 
> With the right type system, we can just speak of graph elements, and use 
> "vertex", "edge", "property" when it is convenient. In the relational model, 
> they are relations. If you materialize them in a relational database, they 
> are rows. In any case, you need two basic graph traversal operations:
> project() -- forward traversal of the arrows in the above diagrams. Takes you 
> from an element to a component like in-vertex.
> select() -- reverse traversal of the arrows. Allows you to answer questions 
> like "in which Trips is John Doe the rider?"
> 
> Josh
> 
> 
> On Fri, Apr 19, 2019 at 10:03 AM Marko Rodriguez  <mailto:okramma...@gmail.com>> wrote:
> Hello,
> 
> I agree with everything you say. Here is my question:
> 
> Relational database — join: Table x Table x equality function -> Table
> Graph database — traverser: Vertex x edge label -> Vertex
> 
> I want a single function that does both. The only thing I could think of was to represent 
> traverser() in terms of join():
> 
> Graph database — traverser: Vertices x Vertex x equality function -> 
> Vertices
> 
> For example, 
> 
> V().out(‘address’)
> 
> ==>
> 
> g.join(V().hasLabel(‘person’).as(‘a’)
>V().hasLabel(‘addresses’).as(‘b’)).
>  by(‘name’).select(?address vertex?)
> 
> That is, join the vertices with themselves based on some predicate to go from 
> vertices to vertices.
> 
> However, I would like instead to transform the relational database join() 
> concept into a traverser() concept. Kuppitz and I were talking the other day 
> about a link() type operator that says: “try and link to this thing in some 
> specified way.” .. ?? The problem we ran into is again, “link it to what?”
> 
> - in graph, the ‘to what’ is hardcoded so you don’t need to specify 
> anything.
> - in rdbms, the ’to what’ is some other specified table.
> 
> So what does the link() operator look like?
> 
> ——
> 
> Some other random thoughts….
> 
> Relational databases join on the table (the whole collection)
> Graph databases traverser on the vertex (an element of the whole collection)
> 
> We can make a relational database join on a single row (by providing a filter 
> to a particular primary key). This is the same as a table with one row. 
> Likewise, for graph in the join() context above:
> 
> V(1).out(‘address’) 
> 
> ==>
> 
> g.join(V(1).as(‘a’)
>V().hasLabel(‘addresses’).as(‘b’)).
>  by(‘name’).select(?address vertex?)
> 
> More thoughts please….
> 
> Marko.
> 
> http://rre

TP4 and Apache Cassandra

2019-04-22 Thread Marko Rodriguez
Hi,

Apache Cassandra has the following abstract data model.

Keyspace: A List of Tables
Table: A Map of Rows
Row: A Map of Primitive Values.

In Java syntax:

SortedMap<RowKey, SortedMap<ColumnName, ColumnValue>>

https://www.ebayinc.com/stories/blogs/tech/cassandra-data-modeling-best-practices-part-1/
 


This structure can be processed with the proposed generalized 
has()/values()/drop()/add() steps.

Here are some CQL queries and then their “Gremlin-esque” representation:

*** I’m assuming R(tableName) is shorthand for T(tableName).values() ***

SELECT COUNT(*) FROM users;
g.R(‘users’).count()

SELECT max(points), COUNT(*) FROM users;
g.R(‘users’).union(values(‘points’).max(),count())
// we need some sort of branching-project() in TP4 to do this better

SELECT lastname FROM cycling.cyclist_name LIMIT 5;
g.R(‘cycling.cyclist_name’).values(‘lastname’).limit(5)

SELECT first_name, last_name FROM emp WHERE empID IN (105, 107, 104);
g.R(‘emp’).has(‘empID’,within(105,107,104)).values(‘first_name’,’last_name’) 
// assuming values() with two or more arguments produces a 
Map (where values().asMap() is valueMap())

SELECT * FROM playlists WHERE venue CONTAINS 'The Fillmore’;
g.R(‘playlists’).has(‘venue’,regex(‘*The Fillmore*’)).values()
// assuming values() with zero arguments produces a Map

SELECT sum(race_points) FROM cycling.cyclist_points WHERE id=e3b19ec4 AND 
race_points > 7;
g.R(‘cycling.cyclist_points’).has(‘id’,’e3b19ec4’).has(‘race_points’, 
gt(7)).values(‘race_points’).sum()

INSERT INTO cycling.cyclist_name (id, lastname, firstname) VALUES (c4b65263, 
'RATTO', 'Rissella')
g.T(‘cycling.cyclist_name’).add().value(‘id’,’c4b65263’).value(‘lastname’,’RATTO’).value(‘firstname’,’Rissella’)
// assuming value() is property() … still looking for general terms 
that are clear.
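One of the mappings above, sketched over plain nested collections (the sample data and helper names are made up; this is not a Cassandra driver, just the "nested Collections" model in Java):

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CassandraSketch {
    // Cassandra's abstract model as nested collections (illustrative):
    // table name -> row key -> column name -> primitive value.
    static Map<String, TreeMap<String, Map<String, Object>>> keyspace() {
        TreeMap<String, Map<String, Object>> emp = new TreeMap<>();
        emp.put("r1", Map.of("empID", 105, "first_name", "ann"));
        emp.put("r2", Map.of("empID", 200, "first_name", "bob"));
        emp.put("r3", Map.of("empID", 107, "first_name", "cal"));
        return Map.of("emp", emp);
    }

    // g.R('emp').has('empID', within(...)).values('first_name')
    static List<Object> hasWithinValues(String table, String key, List<Object> within, String col) {
        return keyspace().get(table).values().stream()
                .filter(row -> within.contains(row.get(key)))  // has(key, within(...))
                .map(row -> row.get(col))                      // values(col)
                .toList();
    }
}
```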

It seems pretty straightforward to support Cassandra in TP4. Cassandra is just 
nested Collections w/ no “Linkables.”

Take care,
Marko.

http://rredux.com






Re: What makes 'graph traversals' and 'relational joins' the same?

2019-04-19 Thread Marko Rodriguez
Hello,

I agree with everything you say. Here is my question:

Relational database — join: Table x Table x equality function -> Table
Graph database — traverser: Vertex x edge label -> Vertex

I want a single function that does both. The only thing I could think of was to represent 
traverser() in terms of join():

Graph database — traverser: Vertices x Vertex x equality function -> 
Vertices

For example, 

V().out(‘address’)

==>

g.join(V().hasLabel(‘person’).as(‘a’)
   V().hasLabel(‘addresses’).as(‘b’)).
 by(‘name’).select(?address vertex?)

That is, join the vertices with themselves based on some predicate to go from 
vertices to vertices.
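A minimal sketch of join: Table x Table x equality function -> Table, with rows as plain maps (illustrative only; the "a"/"b" output labels echo the as('a')/as('b') example above):

```java
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

public class JoinSketch {
    // Nested-loop join of two "tables" on an arbitrary equality function,
    // producing a table of {a: leftRow, b: rightRow} pairs.
    static List<Map<String, Object>> join(List<Map<String, Object>> left,
                                          List<Map<String, Object>> right,
                                          BiPredicate<Map<String, Object>, Map<String, Object>> on) {
        return left.stream()
                .flatMap(l -> right.stream()
                        .filter(r -> on.test(l, r))
                        .map(r -> Map.of("a", (Object) l, "b", (Object) r)))
                .toList();
    }
}
```

A traverser is then the special case where `left` is a single row and the equality function is hardcoded by the edge.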

However, I would like instead to transform the relational database join() 
concept into a traverser() concept. Kuppitz and I were talking the other day 
about a link() type operator that says: “try and link to this thing in some 
specified way.” .. ?? The problem we ran into is again, “link it to what?”

- in graph, the ‘to what’ is hardcoded so you don’t need to specify 
anything.
- in rdbms, the ’to what’ is some other specified table.

So what does the link() operator look like?

——

Some other random thoughts….

Relational databases join on the table (the whole collection)
Graph databases traverser on the vertex (an element of the whole collection)

We can make a relational database join on a single row (by providing a filter to 
a particular primary key). This is the same as a table with one row. Likewise, 
for graph in the join() context above:

V(1).out(‘address’) 

==>

g.join(V(1).as(‘a’)
   V().hasLabel(‘addresses’).as(‘b’)).
 by(‘name’).select(?address vertex?)

More thoughts please….

Marko.

http://rredux.com




> On Apr 19, 2019, at 4:20 AM, pieter martin  wrote:
> 
> Hi,
> The way I saw it is that the big difference is that graphs have
> reified joins. This is both a blessing and a curse.
> A blessing because it's much easier (less text to type, fewer mistakes,
> clearer semantics...) to traverse an edge than to construct a manual
> join.A curse because there are almost always far more ways to traverse
> a data set than just by the edges some architect might have considered
> when creating the data set. Often the architect is not the domain
> expert and the edges are a hardcoded layout of the dataset, which
> almost certainly won't survive the real world's demands. In graphs, if
> their are no edges then the data is not reachable, except via indexed
> lookups. This is the standard engineering problem of database design,
> but it is important and useful that data can be traversed, joined,
> without having reified edges.
> In Sqlg at least, but I suspect it generalizes, I want to create the
> notion of a "virtual edge". Which in meta data describes the join and
> then the standard to(direction, "virtualEdgeName") will work.
> In a way this is precisely to keep the graphy nature of gremlin, i.e.
> traversing edges, and avoid using the manual join syntax you described.
> CheersPieter
> 
> On Thu, 2019-04-18 at 14:15 -0600, Marko Rodriguez wrote:
>> Hi,
>> *** This is mainly for Kuppitz, but if others care. 

What makes 'graph traversals' and 'relational joins' the same?

2019-04-18 Thread Marko Rodriguez
Hi,

*** This is mainly for Kuppitz, but if others care. 

Was thinking last night about relational data and Gremlin. The T() step returns 
all the tables in the withStructure() RDBMS database. Tables are ‘complex 
values’ so they can't leave the VM (only a simple ‘toString’).

Below is a fake Gremlin session. (and these are just ideas…)
tables -> a ListLike of rows
rows -> a MapLike of primitives

gremlin> g.T()
==>t[people]
==>t[addresses]
gremlin> g.T(‘people’)
==>t[people]
gremlin> g.T(‘people’).values()
==>r[people:1]
==>r[people:2]
==>r[people:3]
gremlin> g.T(‘people’).values().asMap()
==>{name:marko,age:29}
==>{name:kuppitz,age:10}
==>{name:josh,age:35}
gremlin> g.T(‘people’).values().has(‘age’,gt(20))
==>r[people:1]
==>r[people:3]
gremlin> g.T(‘people’).values().has(‘age’,gt(20)).values(‘name’)
==>marko
==>josh

Makes sense. Nice that values() and has() generally apply to all ListLike and 
MapLike structures. Also, note how asMap() is the valueMap() of TP4, but 
generalizes to anything that is MapLike so it can be turned into a primitive 
form as a data-rich result from the VM.
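As a sketch of that uniformity claim (plain Python, hypothetical helper names, not TP4 code), has() and values() defined once over MapLike elements cover the session above:

```python
rows = [{"name": "marko", "age": 29},
        {"name": "kuppitz", "age": 10},
        {"name": "josh", "age": 35}]

def has(stream, key, predicate):
    # keep MapLike elements whose value at `key` passes the predicate
    return [m for m in stream if key in m and predicate(m[key])]

def values(stream, key):
    # project the value at `key` out of each MapLike element
    return [m[key] for m in stream if key in m]

# T('people').values().has('age', gt(20)).values('name')
names = values(has(rows, "age", lambda a: a > 20), "name")
```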

gremlin> g.T()
==>t[people]
==>t[addresses]
gremlin> g.T(‘addresses’).values().asMap()
==>{name:marko,city:santafe}
==>{name:kuppitz,city:tucson}
==>{name:josh,city:desertisland}
gremlin> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).
 by(select(‘a’).value(’name’).is(eq(select(‘b’).value(’name’))).
   values().asMap()
==>{a.name:marko,a.age:29,b.name:marko,b.city:santafe}
==>{a.name:kuppitz,a.age:10,b.name:kuppitz,b.city:tucson}
==>{a.name:josh,a.age:35,b.name:josh,b.city:desertisland}
gremlin> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).
 by(’name’). // shorthand for equijoin on name column/key
   values().asMap()
==>{a.name:marko,a.age:29,b.name:marko,b.city:santafe}
==>{a.name:kuppitz,a.age:10,b.name:kuppitz,b.city:tucson}
==>{a.name:josh,a.age:35,b.name:josh,b.city:desertisland}
gremlin> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).
 by(’name’)
==>t[people<-name->addresses]  // without asMap(), just the complex value 
‘toString'
gremlin>

And of course, all of this is strategized into a SQL call, so the joins aren’t 
necessarily computed using TP4-VM resources.

Anywho — what I hope to realize is the relationship between “links” (graph) and 
“joins” (tables). How can we make (bytecode-wise at least) RDBMS join 
operations and graph traversal operations ‘the same’?

Singleton: Integer, String, Float, Double, etc.
Collection: List, Map (Vertex, Table, Document)
Linkable: Vertex, Table

Vertices and Tables can be “linked.” Unlike Collections, they don’t maintain a 
“parent/child” relationship with the objects they reference. What does this 
mean…?

Take care,
Marko.

http://rredux.com 





Re: [DISCUSS] Primitive Types, Complex Types, and their Entailments in TP4

2019-04-16 Thread Marko Rodriguez
delete(): Map.clear() // equivalent to delete(all)
delete(all, ’marko’): Removes all the key/value pairs whose value is 
marko.
delete(“name"): Map.remove(“name")
delete(regex(“*n”)): Remove all key/value pairs where the key matches 
the regex.
delete(type(string).count(current).gt(3)): Remove all key/value pairs 
where the keys are strings and whose size is > 3. 


Now that the instructions above are generally applicable to collections, we can 
see whether complex types can leverage them:

Property graph vertices: 
- g.V(1).has(’marko’) // vertex.values().contains(“marko”)
- g.V(1).has(‘name’,’marko’) // 
vertex.get(“name”).equals(“marko”)
- g.V(1).get(‘name’) 
- g.V(1).add(‘name’,’josh’) // put(‘name’,’josh’)
- g.V(1).using(‘y’).is(within(V().using(‘x’))) // checks if 
vertex 1 in graph ‘y' is contained in graph ‘x’.
- g.V(1).delete() // deletes the vertex
- g.V(1).delete(‘name’) // deletes the vertex’s name property
- g.V(1).delete(all, ‘marko’) // deletes the vertex properties 
with a marko value
- g.V(1).delete(all, type(int).is(lt(3))) // deletes the vertex 
properties with values that are integers less than 3
- g.V(1).delete(“age", type(int).is(lt(3))) // deletes the 
vertex age properties with values that are integers less than 3
- g.V(1).out() // vertex.get(“outE”).unfold().get(“inV”) // 
crazy thought

RDF graph vertices:
g.V(uri:1).outE(‘foaf:knows’).has(‘ng’,uri:2) // would determine 
if the triple is in the named graph uri:2.
g.V(uri:1).out(‘foaf:name’).id() // would return 
marko^^xsd:string
g.V(uri:1).delete() // DELETE uri:1 ?x ?y && ?x ?y uri:1

Relational table rows:
g.R(‘people’).has(‘name’,’marko’) // should filter out those 
rows that don’t have a name/marko entry.
g.R(‘people’).get(‘name’) // would emit the value of the name 
column of each row.
g.R(‘people’).is(within(map)) // would check if the row’s 
key/value pairs are in the map argument.
g.R(‘people’).count(local) // would return the number of columns 
in the row.
g.R(‘people’).toMap() // would turn the complex row object into 
the primitive TMap. // toMap() replaces valueMap().
g.R(‘people’).join(g.R(‘addresses’)).by(‘ssn’) // join will be 
added to TP4 instruction set
g.R(‘people’).has(‘age’,lt(10)).delete() // this deletes all 
rows from the people table that are < 10 years old
g.R(‘people’).has(‘age’,lt(10)).toMap().delete() // this clears 
the map, leaving the database row unchanged.

Document database:
g.D(‘uuid:1’).has(‘name’,’marko’) // should filter out those 
documents who don’t have a key/value of name/marko.
g.D(‘uuid:1’).get(‘name’) // will emit the value of the name 
key.
g.D(‘uuid:1’).delete() // deletes the document from the 
database.
g.D(‘uuid:1’).delete(‘name’) // delete the name key/value from 
the document (and subsequently, from the database)

For the most part, property graph vertices, relational database rows, and 
document database documents are just generalized maps… maps are just generalized 
lists… lists are just generalized strings… and strings are just generalized 
singletons.
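Under that reading, delete() can be defined once against a MapLike and inherited by vertices, rows, and documents alike. A hedged Python sketch (the ALL sentinel and the dispatch rules are assumptions for illustration, not TP4 API):

```python
import re

ALL = object()  # stands in for Gremlin's `all` token (assumption)

def delete(maplike, arg=None, value=None):
    if arg is None:                      # delete(): Map.clear()
        maplike.clear()
    elif arg is ALL:                     # delete(all, v): drop pairs whose value == v
        for k in [k for k, v in maplike.items() if v == value]:
            del maplike[k]
    elif isinstance(arg, re.Pattern):    # delete(regex(..)): drop keys matching regex
        for k in [k for k in maplike if arg.search(k)]:
            del maplike[k]
    else:                                # delete("name"): Map.remove("name")
        maplike.pop(arg, None)
    return maplike

doc = {"name": "marko", "age": 29, "nickname": "marko"}
delete(doc, ALL, "marko")        # drops name and nickname (both valued marko)
row = {"name": "josh", "lang": "java"}
delete(row, re.compile("^n"))    # drops keys matching the regex, i.e. name
```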

Bye,
Marko.

http://rredux.com




> On Apr 15, 2019, at 1:07 PM, Marko Rodriguez  wrote:
> 
> Hello,
> 
>> I think this does satisfy your requirements, though I don't think I
>> understand all aspects the approach, especially the need for
>> TinkerPop-specific types *for basic scalar values* like booleans, strings,
>> and numbers. Since we are committed to the native data types supported by
>> the JVM.
> 
> TinkerPop4 will have VM implementations on various language-platforms. For 
> sure, Apache’s distribution will have a JVM and .NET implementation. The 
> purpose of TinkerPop-specific types (and not JVM, Mono, Python, etc. types) 
> is that we know it’s the same type across all VMs.
> 
>> To my mind, your approach is headed in the direction of a
>> TinkerPop-specific notion of a *type*, in general, which captures the
>> structure and constraints of a logical data type
>> <https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012/42
>>  
>> <https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012/42>>,
>> and which can be used for query planning and optimization. These include
>> both scalar types as 

Re: [DISCUSS] The Two Protocols of TP4

2019-04-16 Thread Marko Rodriguez
Hi,


> hmm - it sounds like supporting the vm protocol requires a session. like
> each "g" from a client needs to hold state on the server between requests.
> or am i thinking about it too concretely and this protocol is more of an
> abstraction of what's happening?

No, you are right. It’s pretty analogous to TP3. The server holds a bunch of “g” 
instances. “g” instances are thread-safe and immutable. Submitted bytecode can 
have a source instruction that references a cached “g” on the server (e.g. via 
a UUID — though this is up to the Machine implementation). If it does, then 
that cached “g” is used to spawn the traversal via the operation instructions. 
Also, this is not just for “over the wire” communication. It’s not specific to 
server behavior. The Machine interface can be a LocalMachine and you still have 
this notion of pre-compiled source instructions that were machine.register()’ed.


https://github.com/apache/tinkerpop/blob/tp4/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/species/LocalMachine.java#L41

Finally, if you want to build a Machine that doesn’t pre-compile the source 
instructions, well, this is what your Machine implementation looks like:


https://github.com/apache/tinkerpop/blob/tp4/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/species/BasicMachine.java
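A toy model of that register/submit/unregister contract, with all names and shapes assumed for illustration (this is not the actual Machine API):

```python
import uuid

class LocalMachine:
    """Toy stand-in for the TP4 Machine contract (assumption, not real API)."""
    def __init__(self):
        self._sources = {}  # registration id -> "pre-compiled" source instructions

    def register(self, bytecode):
        # the expensive part: process source instructions once, cache the result
        source_id = str(uuid.uuid4())
        self._sources[source_id] = tuple(bytecode["sources"])
        return {"source_id": source_id}

    def submit(self, registration, ops):
        sources = self._sources[registration["source_id"]]  # cached, no recompile
        # a real machine would evaluate ops against the configured structures;
        # here we just echo the compiled plan back as a single "traverser"
        return iter([("traverser", sources, tuple(ops))])

    def unregister(self, registration):
        # g.close() in this thread basically does this
        self._sources.pop(registration["source_id"], None)

m = LocalMachine()
g = m.register({"sources": [("withStructure", "TinkerGraph")]})
results = list(m.submit(g, [("V",), ("out", "knows")]))
m.unregister(g)
```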

Marko.

> 
> 
> On Tue, Apr 16, 2019 at 1:58 PM Marko Rodriguez  <mailto:okramma...@gmail.com>>
> wrote:
> 
>> Hi,
>> 
>>> i get the "submit" part but could you explain the "register" and
>>> "unregister" parts (referenced in another post somewhere perhaps)?
>> 
>> These three methods are from the Machine API.
>> 
>> 
>> https://github.com/apache/tinkerpop/blob/tp4/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/Machine.java
>>  
>> <https://github.com/apache/tinkerpop/blob/tp4/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/Machine.java>
>> <
>> https://github.com/apache/tinkerpop/blob/tp4/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/Machine.java
>>  
>> <https://github.com/apache/tinkerpop/blob/tp4/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/Machine.java>
>>> 
>> 
>> Bytecode is composed of two sets of instructions.
>>- source instructions
>>- operation instructions
>> 
>> source instructions are withProcessor(), withStructure(), withStrategy(),
>> etc.
>> operation instructions are out(), in(), count(), where(), etc.
>> 
>> The source instructions are expensive to execute. Why? — when you evaluate
>> a withStructure(), you are creating a connection to the database. When you
>> evaluate a withStrategy(), you are sorting strategies. It is for this
>> reason that we have the concept of a TraversalSource in TP3 that does all
>> that “setup stuff” once and only once for each g. This is the reason we tell
>> people not to do graph.traversal().V(), but instead g = graph.traversal(). Once
>> you have ‘g’, you can then spawn as many traversals as you want off of it
>> without incurring the cost of re-processing the source instructions again.
>> 
>> In TP4, there is no state in Gremlin’s TraversalSource. Gremlin doesn’t
>> know about databases, processors, strategy compilation, etc. Thus, when you
>> Machine.register(Bytecode) you are sending over the source instructions,
>> having them processed at the TP4 VM and then all subsequent submits() with
>> the same source instruction header will use the “pre-compiled” source
>> bytecode cached in the TP4 VM. g.close() basically does
>> Machine.unregister().
>> 
>> 
>> https://github.com/apache/tinkerpop/blob/tp4/java/language/gremlin/src/main/java/org/apache/tinkerpop/language/gremlin/TraversalSource.java#L107-L112
>> <
>> https://github.com/apache/tinkerpop/blob/tp4/java/language/gremlin/src/main/java/org/apache/tinkerpop/language/gremlin/TraversalSource.java#L107-L112
>>  
>> <https://github.com/apache/tinkerpop/blob/tp4/java/language/gremlin/src/main/java/org/apache/tinkerpop/language/gremlin/TraversalSource.java#L107-L112>
>>> 
>> 
>> https://github.com/apache/tinkerpop/blob/tp4/java/language/gremlin/src/main/java/org/apache/tinkerpop/language/gremlin/TraversalSource.java#L114-L116
>>  
>> <https://gith

Re: [DISCUSS] The Two Protocols of TP4

2019-04-16 Thread Marko Rodriguez
Hi,

> i was thinking that an extensible bytecode model would be the solution for
> these kinds of things. without the scriptengine anymore (stoked to see that
> go away) graph providers with schema languages and other admin functions
> will need something to replace that. what's neat about that option is that
> such features would no longer need to be bound to just the JVM. Python
> users could use the JanusGraph clean utility to drop a database or use
> javscript to create a graph in DSE Graph. pretty cool.

Exactly!

However, lets say some provider decides they want to support ScriptEngine.

[[submit, [ex:script, gremlin-groovy, g.V.out.name]]] 

As you note, extensible bytecode will make it so that seemingly disparate 
operations all use the same “bytecode protocol” pattern. And you just made me 
realize the benefit of that for all the language drivers. Not only is our 
serialization protocol going to be dead simple (always primitives), but also 
our communication protocol (always bytecode->traversers) as well. 
Gremlin-Brainfuck might just be a reality! 
[https://en.wikipedia.org/wiki/Brainfuck]

- processor execution
- database operations
- server status inquiry
- HDFS file system management
- ...

For the last one:

[[submit, [hadoop:hdfs, head -10 /data.txt]]]

That returns Iterator<Traverser<String>>.

It’s as if Strategies are like “server plugins.” If you make namespaced 
instructions with a corresponding Strategy that can handle those instructions, 
then you are basically communicating with a server-side “plugin,” RPC-style.
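The plugin analogy can be sketched as a namespace dispatch (plain Python; the instruction shapes are the hypothetical ones from this thread, and the fake hadoop handler is an assumption):

```python
def dispatch(instruction, strategies):
    """Route a namespaced instruction to the strategy that claims its prefix."""
    op = instruction[0]
    namespace = op.split(":", 1)[0] if ":" in op else "core"
    handler = strategies.get(namespace)
    if handler is None:
        raise ValueError(f"no strategy registered for namespace '{namespace}'")
    return handler(instruction)

# a fake "hadoop" strategy handling the hdfs instruction from the email
strategies = {
    "hadoop": lambda ins: [f"line {i} of {ins[1].split()[-1]}" for i in range(2)],
}

out = dispatch(["hadoop:hdfs", "head -10 /data.txt"], strategies)
```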

Sky’s the limit,
Marko.

http://rredux.com

Re: [DISCUSS] The Two Protocols of TP4

2019-04-16 Thread Marko Rodriguez
Hi,

> i get the "submit" part but could you explain the "register" and
> "unregister" parts (referenced in another post somewhere perhaps)?

These three methods are from the Machine API.


https://github.com/apache/tinkerpop/blob/tp4/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/Machine.java

Bytecode is composed of two sets of instructions.
- source instructions
- operation instructions

source instructions are withProcessor(), withStructure(), withStrategy(), etc.
operation instructions are out(), in(), count(), where(), etc.

The source instructions are expensive to execute. Why? — when you evaluate a 
withStructure(), you are creating a connection to the database. When you 
evaluate a withStrategy(), you are sorting strategies. It is for this reason 
that we have the concept of a TraversalSource in TP3 that does all that “setup 
stuff” once and only once for each g. The reason we tell people to not do 
graph.traversal().V(), but instead g = graph.traversal(). Once you have ‘g’, 
you can then spawn as many traversals as you want off that it without incurring 
the cost of re-processing the source instructions again.

In TP4, there is no state in Gremlin’s TraversalSource. Gremlin doesn’t know 
about databases, processors, strategy compilation, etc. Thus, when you 
Machine.register(Bytecode) you are sending over the source instructions, having 
them processed at the TP4 VM and then all subsequent submits() with the same 
source instruction header will use the “pre-compiled” source bytecode cached in 
the TP4 VM. g.close() basically does Machine.unregister().


https://github.com/apache/tinkerpop/blob/tp4/java/language/gremlin/src/main/java/org/apache/tinkerpop/language/gremlin/TraversalSource.java#L107-L112

https://github.com/apache/tinkerpop/blob/tp4/java/language/gremlin/src/main/java/org/apache/tinkerpop/language/gremlin/TraversalSource.java#L114-L116

In short, we have just offloaded the TP3 TraversalSource work to TP4 Machine.

HTH,
Marko.

P.S. I don’t like the term “source instructions.” I’m thinking of calling them 
“meta instructions” or “setup instructions” or “staging instructions”… ?





> 
> regarding this:
> 
>> just like processing instructions are extended via namespaced
> instructions and strategies, so are server instructions
> 
> i was thinking that an extensible bytecode model would be the solution for
> these kinds of things. without the scriptengine anymore (stoked to see that
> go away) graph providers with schema languages and other admin functions
> will need something to replace that. what's neat about that option is that
> such features would no longer need to be bound to just the JVM. Python
> users could use the JanusGraph clean utility to drop a database or use
> javscript to create a graph in DSE Graph. pretty cool.
> 
> 
> On Mon, Apr 15, 2019 at 2:44 PM Marko Rodriguez  <mailto:okramma...@gmail.com>>
> wrote:
> 
>> Hello,
>> 
>> I believe there will only be two protocols in TP4.
>> 
>>1. The VM communication protocol. (Rexster)
>>2. The data serialization protocol. (Frames)
>> 
>> [VM COMMUNICATION PROTOCOL]
>> 
>>1. Register bytecode —returns—> bytecode.
>>2. Submit bytecode —returns—> iterator of traversers.
>>3. Unregister bytecode source —returns—> void
>> 
>> Here is a trippy idea. These operations are simply bytecode.
>> 
>>1. [[register,[bytecode]]] —returns—> single traverser referencing
>> bytecode.
>>2. [[submit, [bytecode]]] —returns—> many traversers referencing
>> primitives.
>>3. [[unregister, [bytecode]]] —returns —> no traversers.
>> 
>> Thus, THE ONLY THING YOU SEND TO THE TP4 VM IS BYTECODE and THE ONLY THING
>> RETURNED IS ZERO OR MORE TRAVERSERS!
>> 
>> Now, think about JanusGraph. It has database operations such as create
>> index, create schema, drop graph, etc. These are just custom instructions
>> in the bytecode of submit.
>> 
>>[[submit, [[jg:createIndex,people-idx,person]]]
>> 
> A JanusGraph strategy will know what to do with that instruction and a
>> traverser can be returned. Traverser.of(“SUCCESS”). And there you have,
>> just l

Re: [DISCUSS] Primitive Types, Complex Types, and their Entailments in TP4

2019-04-15 Thread Marko Rodriguez
-property graph (named graph property?) with 
vertex-based literals.
… ?.

Like Graph.Features in TP3.

> IMO it's OK if URIs, in an RDF context, become Strings in a TP context. You
> can think of URI as a constraint on String, which should be enforced at the
> appropriate time, but does not require a vendor-specific class. Can you
> concatenate two URIs? Sure... just concatenate the Strings, but also be
> aware that the result is not a URI.

Cool.

Thanks for reading and providing good ideas.

Marko.

http://rredux.com



> On Mon, Apr 15, 2019 at 5:06 AM Marko Rodriguez  <mailto:okramma...@gmail.com>>
> wrote:
> 
>> Hello,
>> 
>> I have a consolidated approach to handling data structures in TP4. I would
>> appreciate any feedback you may have.
>> 
>>1. Every object processed by TinkerPop has a TinkerPop-specific
>> type.
>>- TLong, TInteger, TString, TMap, TVertex, TEdge, TPath,
>> TList, …
>>- BENEFIT #1: A universal type system will protect us from
>> language platform peculiarities (e.g. Python long vs Java long).
>>- BENEFIT #2: The serialization format is constrained and
>> consistent across all languages platforms. (no more coming across a
>> MySpecialClass).
>>    2. All primitive T-type data can be directly accessed via get().
>>- TBoolean.get() -> java.lang.Boolean | System.Boolean |
>> ...
>>- TLong.get() -> java.lang.Long | System.Int64 | ...
>>- TString.get() -> java.lang.String | System.String | …
>>- TList.get() -> java.lang.ArrayList | .. // can only
>> contain primitives
>>- TMap.get() -> java.lang.LinkedHashMap | .. // can only
>> contain primitives
>>- ...
>>3. All complex T-types have no methods! (except those afforded by
>> Object)
>>- TVertex: no accessible methods.
>>- TEdge: no accessible methods.
>>- TRow: no accessible methods.
>>- TDocument: no accessible methods.
>>- TDocumentArray: no accessible methods. // a document
>> list field that can contain complex objects
>>- ...
>> 
>> REQUIREMENT #1: We need to be able to support multiple graphdbs in the
>> same query.
>>- e.g., read from JanusGraph and write to Neo4j.
>> REQUIREMENT #2: We need to make sure complex objects can not be queried
>> client-side for properties/edges/etc. data.
>>- e.g., vertices are universally assumed to be “detached."
>> REQUIREMENT #3: We no longer want to maintain a structure test suite.
>> Operational semantics should be verified via Bytecode ->
>> Processor/Structure.
>>- i.e., the only way to read/write vertices is via
>> Bytecode as complex T-types don’t have APIs.
>> REQUIREMENT #4: We should support other database data structures besides
>> graph.
>>- e.g., reading from MySQL and writing to JanusGraph.
>> 
>> ———
>> 
>> Assume the following TraversalSource:
>> 
>> g.withStructure(JanusGraphStructure.class, config1).
>>  withStructure(Neo4jStructure.class, conflg2)
>> 
>> Now, assume the following traversal fragment:
>> 
>>outE(’knows’).has(’stars’,5).inV()
>> 
>> This would initially be written to Bytecode as:
>> 
>>[[outE,knows],[has,stars,5],[inV]]
>> 
>> A decoration strategy realizes that there are two structures registered in
>> the Bytecode source instructions and would rewrite the above as:
>> 
>>[choose,[[type,TVertex]],[[outE,knows],[has,stars,5],[inV]]]
>> 
>> A JanusGraph strategy would rewrite this as:
>> 
>> 
>> [choose,[[type,TVertex]],[[outE,knows],[has,stars,5],[inV]],[[type,JanusVertex]],[[jg:vertexCentric,out,knows,stars,5]]]
>> 
>> A Neo4j strategy would rewrite this as:
>> 
>> 
>> [choose,[[type,TVertex]],[[outE,knows],[has,stars,5],[inV]],[[type,JanusVertex]],[[jg:vertexCentric,out,knows,stars,5]],[[type,Neo4jVertex]],[[neo:outE,knows],[neo:has,stars,5],[neo:inV]]]
>> 
>> A finalization strategy would rewrite this as:
>> 
>> 
>> [choose,[[type,JanusVertex]],[[jg:vertexCentric,out,knows,stars,5]],[[type,Neo4jVertex]],[[neo:outE,knows],[neo:has,stars,5],[neo:inV]]]
>> 
>> Now, when a TVertex gets to this CFunction, it will check its type, if its
>> a JanusVertex, it goes down the JanusGraph-specific instruction branch. If
>> the type is Neo4jVertex, it g

[DISCUSS] The Two Protocols of TP4

2019-04-15 Thread Marko Rodriguez
Hello,

I believe there will only be two protocols in TP4.

1. The VM communication protocol. (Rexster)
2. The data serialization protocol. (Frames)

[VM COMMUNICATION PROTOCOL]

1. Register bytecode —returns—> bytecode.
2. Submit bytecode —returns—> iterator of traversers.
3. Unregister bytecode source —returns—> void

Here is a trippy idea. These operations are simply bytecode.

1. [[register,[bytecode]]] —returns—> single traverser referencing 
bytecode.
2. [[submit, [bytecode]]] —returns—> many traversers referencing 
primitives.
3. [[unregister, [bytecode]]] —returns —> no traversers.

Thus, THE ONLY THING YOU SEND TO THE TP4 VM IS BYTECODE and THE ONLY THING 
RETURNED IS ZERO OR MORE TRAVERSERS!

Now, think about JanusGraph. It has database operations such as create index, 
create schema, drop graph, etc. These are just custom instructions in the 
bytecode of submit.

[[submit, [[jg:createIndex,people-idx,person]]]

A JanusGraph strategy will know what to do with that instruction and a 
traverser can be returned: Traverser.of(“SUCCESS”). And there you have it: just 
like processing instructions are extended via namespaced instructions and 
strategies, so are server instructions. Providers have an extensible framework 
to support all their custom operations because, in the end, its just bytecode, 
strategies, and resultant traversers! (everything is the same).
 
Next, in order to send bytecode and get back traversers ‘over the wire', there 
needs to be a serialization specification.

[DATA SERIALIZATION PROTOCOL]

1. I don’t know much about GraphBinary, but I believe it’s this without 
complex types.
- Why?
- bytecode is primitive.
- traversers are primitive (as they can’t reference 
complex types — see other [DISCUSS] from today).


Thoughts?,
Marko.

http://rredux.com 






Re: [DISCUSS] Primitive Types, Complex Types, and their Entailments in TP4

2019-04-15 Thread Marko Rodriguez
Hello Stephen,

> I'd also wonder about how we treat subgraph() and tree()? could those be a
> List somehow??

Yes, Tree is List. Subgraph… hmm… shooting from the hip: you don’t get 
back a graph; it’s stored in:

g.withStructure(TinkerGraphStructure.class, config1)

That is, the subgraph is written to one of the registered structures. You can 
then query it like any other registered structure. Remember, in TP4, we will 
support an arbitrary number of structures associated with a Bytecode source.

> isn't a URI a complex type? that list is expected to grow? maybe all
> complex types have simple type representations?

The problem with every complex type having a simple type representation is that 
the serializer will have to know about complex types (as objects). This is just 
more code for Python, JavaScript, Java, etc. to maintain. If the serialization 
format is ONLY primitives, and primitives come from a static set of ~10 types, 
then writing, testing, and maintaining serializers in other languages will be 
trivial.

Bytecode in [a nested list of primitives]
Traversers out [a collection of coefficient wrapped primitives]

Everything communicated over the wire is primitive! Basic. (TTraverser will 
have to be primitive, where get() returns a coefficient [bulk] and primitive 
[object] pair).

> sorry, if some of these questions/ideas are a bit half-cocked, but i read
> this really fast and won't be at my laptop for the rest of the day and
> wanted to get some thoughts out. i'm really really interested in seeing
> this aspect of TP done "right"….

No worries. Thanks for replying.

Some random ideas I was having.

- TXML: Assume an XML database. out() would be the children tags. 
value() would be the tag attribute value. label() would be the tag type. In 
other words, there is a clean mapping from the instructions to XML.
- TMatrix: Assume a database of n x m matrices. math() instruction will 
be augmented to support matrix multiplication. A matrix is a table with rows 
and columns. We would need some nice instructions for that.
- TJPEG: Assume a database of graphics. Does our instruction set have 
instructions that are useful for manipulating images? Probably need row/column 
type instructions like TMatrix.
- TObject: Assume an object database. value() are primitive fields. 
out() is object fields. id() is unique object identifier. label() is object 
class. has() is a primitive field filter.
- TTimeSeries: ? I don’t know anything about time series databases, but 
the question remains…do our instructions make sense for this data structure?
- https://en.wikipedia.org/wiki/List_of_data_structures

The point being. I’m trying to think of odd ball data structures and then 
trying to see if the TP4 instruction set is sufficiently general to encompass 
operations used by those structures.

The beautiful thing is that providers can create as many complex types as they 
want. These types are always contained with the TP4-VM and thus require no 
changes to the serialization format and respective objects in the deserializing 
language. Imagine, some XML database out there is using the TP4-VM, with the 
XPath language compiling to TP4 bytecode, and is processing their XML documents 
in real-time (Pipes/Rx), near-time (Flink/Akka), or batch-time (Spark/Hadoop). 
The TP4-VM has a life beyond graph! What a wonderful asset to the entire space 
of data processing!

…now think of the RDF community using the TP4-VM. SPARQL will be W3C-compliant 
and can execute in real-time, near-time, batch-time, etc. What a useful 
technology to adopt for your RDF triple-store. I could see Stardog using TP4 
for their batch processing. I could see Jena or OpenRDF importing TP4 to 
provide different SPARQL execution engines to their triple-store providers.

The TP4 virtual machine may just turn out to be a technological masterpiece.

Marko.

http://rredux.com







> 
> On Mon, Apr 15, 2019 at 8:06 AM Marko Rodriguez  <mailto:okramma...@gmail.com>>
> wrote:
> 
>> Hello,
>> 
>> I have a consolidated approach to handling data structures in TP4. I would
>> appreciate any feedback you may have.
>> 
>>1. Every object processed by TinkerPop has a TinkerPop-specific
>> type.
>>- TLong, TInteger, TString, TMap, TVertex, TEdge, TPath,
>> TList, …
>>- BENEFIT #1: A universal type system will protect us from
>> language platform peculiarities (e.g. Python long vs Java long).
>>- BENEFIT #2: The serialization format is constrained and
>> consistent across all languages platforms. (no more coming across a
>> MySpecialClass).
>>2. All primitive T-type data can be directly access via ge

[DISCUSS] Primitive Types, Complex Types, and their Entailments in TP4

2019-04-15 Thread Marko Rodriguez
Hello,

I have a consolidated approach to handling data structures in TP4. I would 
appreciate any feedback you may have.

1. Every object processed by TinkerPop has a TinkerPop-specific type.
- TLong, TInteger, TString, TMap, TVertex, TEdge, TPath, TList, 
…
- BENEFIT #1: A universal type system will protect us from 
language platform peculiarities (e.g. Python long vs Java long).
- BENEFIT #2: The serialization format is constrained and 
consistent across all language platforms (no more coming across a 
MySpecialClass).
2. All primitive T-type data can be directly accessed via get().
- TBoolean.get() -> java.lang.Boolean | System.Boolean | ...
- TLong.get() -> java.lang.Long | System.Int64 | ...
- TString.get() -> java.lang.String | System.String | …
- TList.get() -> java.lang.ArrayList | .. // can only contain 
primitives
- TMap.get() -> java.lang.LinkedHashMap | .. // can only 
contain primitives
- ...
3. All complex T-types have no methods! (except those afforded by 
Object)
- TVertex: no accessible methods.
- TEdge: no accessible methods.
- TRow: no accessible methods.
- TDocument: no accessible methods.
- TDocumentArray: no accessible methods. // a document list 
field that can contain complex objects
- ...
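As a concept sketch, the primitive/complex split above might look like the following in Java (all class and method names here are illustrative stand-ins, not the actual tp4/ API):

```java
import java.util.List;

// Illustrative sketch of the proposed T-type split: primitives expose get(),
// complex types deliberately expose nothing beyond Object's methods.
public class TTypesSketch {

    // Primitive T-types wrap a platform value and expose it via get().
    interface TPrimitive<V> { V get(); }

    record TLong(Long value) implements TPrimitive<Long> {
        public Long get() { return value; }
    }

    record TString(String value) implements TPrimitive<String> {
        public String get() { return value; }
    }

    // A TList may only contain primitives, per the proposal.
    record TList(List<TPrimitive<?>> value) implements TPrimitive<List<TPrimitive<?>>> {
        public List<TPrimitive<?>> get() { return value; }
    }

    // Complex T-types have no accessible methods: the only way to read or
    // write them is through bytecode executed inside the VM.
    static final class TVertex {
        private final Object providerData; // opaque to the client
        TVertex(Object providerData) { this.providerData = providerData; }
    }

    public static void main(String[] args) {
        TPrimitive<?> s = new TString("marko");
        assert s.get().equals("marko");
        // new TVertex(...) compiles, but there is no property()/edges() to call.
    }
}
```

The point of the sketch: client code can always unwrap a primitive, while a complex type offers nothing to call, forcing all graph access back through bytecode.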

REQUIREMENT #1: We need to be able to support multiple graphdbs in the same 
query.
- e.g., read from JanusGraph and write to Neo4j.
REQUIREMENT #2: We need to make sure complex objects cannot be queried 
client-side for properties/edges/etc. data.
- e.g., vertices are universally assumed to be “detached.”
REQUIREMENT #3: We no longer want to maintain a structure test suite. 
Operational semantics should be verified via Bytecode -> Processor/Structure.
- i.e., the only way to read/write vertices is via Bytecode as 
complex T-types don’t have APIs.
REQUIREMENT #4: We should support other database data structures besides graph.
- e.g., reading from MySQL and writing to JanusGraph.

———

Assume the following TraversalSource:

g.withStructure(JanusGraphStructure.class, config1).
  withStructure(Neo4jStructure.class, config2)

Now, assume the following traversal fragment:

outE('knows').has('stars',5).inV()

 This would initially be written to Bytecode as:

[[outE,knows],[has,stars,5],[inV]]

A decoration strategy realizes that there are two structures registered in the 
Bytecode source instructions and would rewrite the above as:

[choose,[[type,TVertex]],[[outE,knows],[has,stars,5],[inV]]]

A JanusGraph strategy would rewrite this as:


[choose,[[type,TVertex]],[[outE,knows],[has,stars,5],[inV]],[[type,JanusVertex]],[[jg:vertexCentric,out,knows,stars,5]]]

A Neo4j strategy would rewrite this as:


[choose,[[type,TVertex]],[[outE,knows],[has,stars,5],[inV]],[[type,JanusVertex]],[[jg:vertexCentric,out,knows,stars,5]],[[type,Neo4jVertex]],[[neo:outE,knows],[neo:has,stars,5],[neo:inV]]]

A finalization strategy would rewrite this as:


[choose,[[type,JanusVertex]],[[jg:vertexCentric,out,knows,stars,5]],[[type,Neo4jVertex]],[[neo:outE,knows],[neo:has,stars,5],[neo:inV]]]

Now, when a TVertex gets to this CFunction, it will check its type: if it's a 
JanusVertex, it goes down the JanusGraph-specific instruction branch; if the 
type is Neo4jVertex, it goes down the Neo4j-specific instruction branch.
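The runtime dispatch that the final bytecode encodes could be sketched like this (all class names are stand-ins for illustration, not the real tp4/ classes):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the final choose-by-type dispatch: each registered
// structure contributes a branch keyed on its concrete vertex type, and an
// incoming traverser is routed down the branch matching its runtime type.
public class TypeDispatchSketch {

    static class JanusVertex {}
    static class Neo4jVertex {}

    // branch bodies stand in for the provider-specific compiled instructions
    private final Map<Class<?>, Function<Object, String>> branches = new LinkedHashMap<>();

    void addBranch(Class<?> type, Function<Object, String> compiled) {
        branches.put(type, compiled);
    }

    String route(Object vertex) {
        for (Map.Entry<Class<?>, Function<Object, String>> e : branches.entrySet()) {
            if (e.getKey().isInstance(vertex))
                return e.getValue().apply(vertex); // first matching type wins
        }
        throw new IllegalStateException("no structure registered for " + vertex.getClass());
    }

    public static void main(String[] args) {
        TypeDispatchSketch choose = new TypeDispatchSketch();
        choose.addBranch(JanusVertex.class, v -> "jg:vertexCentric");
        choose.addBranch(Neo4jVertex.class, v -> "neo:outE");
        assert choose.route(new JanusVertex()).equals("jg:vertexCentric");
        assert choose.route(new Neo4jVertex()).equals("neo:outE");
    }
}
```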

REQUIREMENT #1 SOLVED

The last instruction of the root bytecode cannot return a complex object. If 
it does, an exception is thrown. g.V() is illegal. g.V().id() is legal. Complex 
objects do not exist outside the TP4-VM. Only primitives can cross the 
VM-client barrier. If you want vertex property data (e.g.), you have to access 
it and return it within the traversal — e.g., g.V().valueMap().
BENEFIT #1: Language variant implementations are simple. Just 
primitives.
BENEFIT #2: The serialization specification is simple. Just primitives. 
(also, note that Bytecode is just a TList of primitives! — though TBytecode 
will exist.)
BENEFIT #3: The concept of a “DetachedVertex” is universally assumed.

REQUIREMENT #2 SOLVED

It is completely up to the structure provider to use structure-specific 
instructions for dealing with their particular TVertex. They will have to 
provide CFunction implementations for out, in, both, has, outE, inE, bothE, 
drop, property, value, id, label … (seems like a lot, but out/in/both could be 
one parameterized CFunction).
BENEFIT #1: No more structure/ API and structure/ test suite.
BENEFIT #2: The structure provider has full control of where the vertex 
data is stored (cached in memory or fetched from the db or a cut vertex or …). No 
assumptions are 

TinkerPop4 Status Report #3

2019-04-11 Thread Marko Rodriguez
Hello,

I spent most of the last 1.5 weeks working on RxJavaProcessor (ReactiveX — 
http://reactivex.io/ ), where 3 of those days were spent 
in a nasty code hell trying to figure out how to do cyclic stream topologies 
for repeat(). I’ve never read so much of someone else’s code in my life — I’ve 
come to know the inner workings of RxJava quite well.

Without further ado, here is what the tp4/ branch is looking like these days:

Machine machine = LocalMachine.open()
TraversalSource g = 
Gremlin.traversal(machine).withProcessor(RxJavaProcessor.class)

From g, you can spawn single-threaded Rx Flowables. Here is the SerialRxJava 
processor code:


https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/rxjava/src/main/java/org/apache/tinkerpop/machine/processor/rxjava/SerialRxJava.java
 

- So simple. 130 lines of code.

If you do:

TraversalSource g = 
Gremlin.traversal(machine).withProcessor(RxJavaProcessor.class, 
Map.of("rx.threadPool.size", 10))

You can spawn multi-threaded Rx ParallelFlowables. Here is the ParallelRxJava 
processor code:


https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/rxjava/src/main/java/org/apache/tinkerpop/machine/processor/rxjava/ParallelRxJava.java
 

- So simple. 150 lines of code.

I now am completely confident that the TP4 CFunction intermediate 
representation (Compilation.class) is sufficient to support all types of 
execution engines.

- Pipes: single-threaded, pull-based.
- Beam: distributed, push-based.
- RxJava: multi-threaded, push-based

Implementing BeamProcessor was important to know that multi-machine execution 
would come naturally and RxJava was important to know that multi-threading 
would come naturally.  I know that Akka will work just fine as it is both 
multi-threaded and distributed. Therefore, I believe we have converged on the 
chain of representational mappings that we will use in the TP4 VM.

Language ==> Bytecode ==> CFunction Intermediate Representation ==> 
Processor-Specific Execution Plan

In TP3, we do:

Language ==> Bytecode ==> Pipes ==> Processor-Specific Execution Plan

…we foolishly embedded one execution engine within another and this has been a 
cause of various pains that are now rectified in TP4.

——

Ted Wilmes did some preliminary benchmarking of Pipes vs. SerialRxJava vs. 
ParallelRxJava. Here are his results for two traversals:

RxSerialTraversalBenchmark.g_inject_unfold_incr_incr_incr_incr:g_inject_unfold_incr_incr_incr_incr·p0.50
 sample  6.988  ms/op
RxParallelTraversalBenchmark.g_inject_unfold_incr_incr_incr_incr:g_inject_unfold_incr_incr_incr_incr·p0.50
 sample 11.633  ms/op
PipesTraversalBenchmark.g_inject_unfold_incr_incr_incr_incr:g_inject_unfold_incr_incr_incr_incr·p0.50
 sample  6.627  ms/op

RxSerialTraversalBenchmark.g_inject_unfold_repeat_times:g_inject_unfold_repeat_times·p0.50
   sample 3.592  ms/op
RxParallelTraversalBenchmark.g_inject_unfold_repeat_times:g_inject_unfold_repeat_times·p0.50
   sample  7.897  ms/op
PipesTraversalBenchmark.g_inject_unfold_repeat_times:g_inject_unfold_repeat_times·p0.50
   sample  3.887  ms/op

We should expect ParallelRxJava to shine when interacting with a data source 
where lots of time is wasted on I/O. I’m hoping that ParallelRxJava will be 
able to transform borderline real-time queries in TP3 into genuine real-time 
queries in TP4.

——

One of the outstanding problems I’m having (and I have given up on for now) is 
that I can’t figure out how to do cyclic stream topologies in ReactiveX. 
Instead, I have to do the repetition-based implementation of repeat() as defined 
in Stream Ring Theory.


https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/rxjava/src/main/java/org/apache/tinkerpop/machine/processor/rxjava/SerialRxJava.java#L108-L127
 


What I’m doing is creating MAX_REPETITIONS amount of “unrolled loops” with exit 
streams for emit() and until() breaks. I then merge all those exit streams into 
the main outgoing stream. What I would like to do is be able to send a 
traverser back to a previous Operator. I’ve gone through about 5 different 
implementations, but they each have their problems. If anyone is versed in 
ReactiveX and can tell me the best way to do looping, I would appreciate it. 
And yes, I’ve read many a 
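For readers unfamiliar with the trick, the unrolled-loop-with-exit-streams shape can be sketched with plain Java collections standing in for Rx streams (the repeat/until names and until-after-body semantics here are illustrative; the real implementation operates on Flowables):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Concept sketch of repeat() as MAX_REPETITIONS unrolled iterations with
// per-iteration exit streams, using plain lists in place of Rx Flowables.
public class UnrolledRepeatSketch {

    static <T> List<T> repeat(List<T> input, UnaryOperator<T> body,
                              Predicate<T> until, int maxRepetitions) {
        List<T> exits = new ArrayList<>();   // merged "exit streams"
        List<T> inFlight = new ArrayList<>(input);
        for (int i = 0; i < maxRepetitions && !inFlight.isEmpty(); i++) {
            List<T> next = new ArrayList<>();
            for (T t : inFlight) {
                T out = body.apply(t);                // one unrolled iteration
                if (until.test(out)) exits.add(out);  // break out to an exit stream
                else next.add(out);
            }
            inFlight = next;
        }
        exits.addAll(inFlight);              // traversers that hit the repetition cap
        return exits;
    }

    public static void main(String[] args) {
        // incr() until the value reaches 4, starting from [1, 2, 3]
        List<Integer> result = repeat(List.of(1, 2, 3), x -> x + 1, x -> x >= 4, 10);
        assert result.equals(List.of(4, 4, 4));
    }
}
```

Each pass of the outer loop corresponds to one "unrolled" copy of the repeat body; the exits list plays the role of the merged exit streams flowing into the main outgoing stream.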

Re: [TP4 Benchmarking] Pipes vs. Single-Threaded RxJava vs. Multi-Threaded RxJava

2019-04-10 Thread Marko Rodriguez
Hello,

> I hadn't put together that each compilation could have its own processor.
> Very cool.

Yea. This is an important aspect of TP4. We do something similar in TP3; it is 
just not so overt — and it's not configurable.

In TP3, for example, SparkGraphComputer uses Spark to do “global traversals” 
and uses Pipes to do “local traversals.”

global: the root bytecode, branch bytecode, repeat bytecode.
local: single input nested bytecode.

Where this TP3-distinction falls apart is in local traversals that have complex 
global patterns inside them. For example:

g.V().where(…)

In TP3, ‘…’ is always considered a local traversal. This makes sense for 

g.V().where(out(‘knows’))

However, imagine this situation:

g.V().where(match(repeat(union)))

This is where SparkGraphComputer will throw the typical VerificationStrategy 
exception of “can’t process beyond the adjacent vertex.” Why does it do that? 
Because it relies on Pipes to do the processing and Pipes can only see the data 
within SparkGraphComputer’s StarVertex. Bummer.

*** Many unfortunate complications were introduced by this seemingly innocuous 
interface:

https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/TraversalParent.java
 
<https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/TraversalParent.java>

——

In TP4, we don’t make the global/local-traversal distinction as it is 
completely up to the processor provider to decide how they want to execute each 
Bytecode chunk. For example, a batch analytical processor strategy can look at 
where(…) and decide “eh, I’ll just use Pipes” or it can see that its 
“match(repeat(union))” and decide to continue to execute using its batch self. 
That explanation should leave you wondering how Spark could do that given the 
non-random access limitation of most batch processors. And if not, then you 
might be wondering — why don’t we just do that for TP3? I’ll leave the answer 
to a future post on ‘scoped traversers’ (which is a big-deal concept to come).
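A toy sketch of that per-chunk processor decision (hypothetical names throughout; the real strategy machinery is richer than a keyword check):

```java
import java.util.List;

// Hypothetical sketch of per-compilation processor selection: a strategy
// inspects each nested bytecode chunk and picks a processor for it, rather
// than hard-coding "nested traversals always run on Pipes" as TP3 does.
public class ProcessorSelectionSketch {

    enum Processor { PIPES, BATCH }

    // stand-in for a nested bytecode chunk; "global" marks instructions like
    // match/repeat/union that may roam beyond the local star vertex
    record Chunk(List<String> instructions) {
        boolean hasGlobalPattern() {
            return instructions.stream()
                    .anyMatch(i -> i.equals("match") || i.equals("repeat") || i.equals("union"));
        }
    }

    static Processor select(Chunk nested) {
        // "eh, I'll just use Pipes" for simple local work; stay on the batch
        // engine when the nested chunk contains global patterns
        return nested.hasGlobalPattern() ? Processor.BATCH : Processor.PIPES;
    }

    public static void main(String[] args) {
        assert select(new Chunk(List.of("out", "knows"))) == Processor.PIPES;
        assert select(new Chunk(List.of("match", "repeat", "union"))) == Processor.BATCH;
    }
}
```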

> Thanks for the benchmarking numbers. I had a tp3 inspired
> JMH-based module in progress when I saw your results so I added the two
> test traversals in. It doesn't do any parameterization of input sizes at
> this point but if you're interested in checking it out I pushed it to the
> tp4-jmh branch:
> https://github.com/apache/tinkerpop/blob/tp4-jmh/java/machine/machine-perf-test/src/main/java/org/apache/tinkerpop/benchmark/util/AbstractTraversalBenchmarkBase.java
>  
> <https://github.com/apache/tinkerpop/blob/tp4-jmh/java/machine/machine-perf-test/src/main/java/org/apache/tinkerpop/benchmark/util/AbstractTraversalBenchmarkBase.java>
> .

Awesome. Stephen is thinking through how we will do language agnostic testing 
in TP4, where our JVM-based VM will be one of many. I think when he gets that 
sorted out, we should figure out how to attach your benchmarking work to that 
same package so we can:

1. Verify the operational semantics of any TP4 VM (and underlying 
database + processor)
2. Benchmark the execution efficiency of any TP4 VM (and underlying 
database + processor)

It would be great to provide providers with information such as:

TP4 .NET VM w/ LINQ+CosmosDB is 100% TP4 compliant.
The LINQ processor is in the top 90%-tile of single-machine, 
multi-threaded processors.
The CosmosDB structure is in the top 80%-tile of sharded 
transactional structures.


> RxSerialTraversalBenchmark.g_inject_unfold_incr_incr_incr_incr:g_inject_unfold_incr_incr_incr_incr·p0.50
>  sample  6.988  ms/op
> RxParallelTraversalBenchmark.g_inject_unfold_incr_incr_incr_incr:g_inject_unfold_incr_incr_incr_incr·p0.50
>  sample 11.633  ms/op
> PipesTraversalBenchmark.g_inject_unfold_incr_incr_incr_incr:g_inject_unfold_incr_incr_incr_incr·p0.50
>  sample  6.627  ms/op
> 
> RxSerialTraversalBenchmark.g_inject_unfold_repeat_times:g_inject_unfold_repeat_times·p0.50
>sample 3.592  ms/op
> RxParallelTraversalBenchmark.g_inject_unfold_repeat_times:g_inject_unfold_repeat_times·p0.50
>sample  7.897  ms/op
> PipesTraversalBenchmark.g_inject_unfold_repeat_times:g_inject_unfold_repeat_times·p0.50
>sample  3.887  ms/op

Pretty crazy how fast SerialRxJava is compared to Pipes — especially with my 
branch/repeat implementation being pretty freakin’ ghetto. I banged my head 
against the wall all yesterday morning trying to figure out how to do a loop in 
Rx. :| … have some new ideas this morning.

Thanks for the effort.

Take care,
Marko.

http://rredux.com


> On Mon, Apr 8, 2019 at 12:16 PM Marko Rodriguez  <mailto:okramma

[TP4 Benchmarking] Pipes vs. Single-Threaded RxJava vs. Multi-Threaded RxJava

2019-04-08 Thread Marko Rodriguez
Hi,

I implemented Multi-threaded RxJava this morning — its called ParallelRxJava. 
Single-threaded is called SerialRxJava.

The RxJavaProcessor factory will generate either depending on the Map.of() 
configuration:


https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/rxjava/src/main/java/org/apache/tinkerpop/machine/processor/rxjava/RxJavaProcessor.java#L49-L53
 


 You can see the source code for each RxJava implementation here:


https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/rxjava/src/main/java/org/apache/tinkerpop/machine/processor/rxjava/SerialRxJava.java
 


https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/rxjava/src/main/java/org/apache/tinkerpop/machine/processor/rxjava/ParallelRxJava.java
 


Given Ted’s comments last week, I decided to create a micro-benchmark @Test to 
compare SerialRxJava, ParallelRxJava, and Pipes.


https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/rxjava/src/test/java/org/apache/tinkerpop/machine/processor/rxjava/RxJavaBenchmark.java
 


The results are below. My notes on the results are as follows:

* ParallelRxJava used 7 threads.
* Times averaged over 30 runs (minus the first 2 runs — JVM warmup).
* SerialRxJava and Pipes are very close on non-branching traversals 
with a small input set.
* ParallelRxJava never beats Pipes, but does beat SerialRxJava on large 
input sets.
* My implementation of repeat() in RxJava is not good. I need to think 
of a better way to implement recursion (and branching in general).
* ParallelRxJava will probably shine when the query has a lot of 
database operations (e.g. out(), inE(), addV(), etc.).
* There is a lot of intelligence to add to ParallelRxJava — e.g., 
** If the nested traversal is simple (only a few steps), don’t 
thread. For example, there is no need to thread the is() of 
choose(is(gt(3)),….).
** One of the beautiful things about TP4 is that each 
Compilation (nest) can have a different processor.
*** Thus, parallel for long sequences and serial for 
short sequences…or, Pipes for short sequences! (Beam uses Pipes for short 
sequences).

———

g.inject(input).unfold().incr().incr().incr().incr().iterate()

Input size: 10
Average time [seri]: 0.4
Average time [para]: 2.4
Average time [pipe]: 0.5

Input size: 100
Average time [seri]: 0.9664
Average time [para]: 4.335
Average time [pipe]: 0.8

Input size: 1000
Average time [seri]: 2.533
Average time [para]: 4.165
Average time [pipe]: 1.7

Input size: 10000
Average time [seri]: 12.1
Average time [para]: 10.63
Average time [pipe]: 8.1

Input size: 100000
Average time [seri]: 103.96667
Average time [para]: 95.06
Average time [pipe]: 59.94

——
——

g.inject(input).unfold().repeat(incr()).times(4).iterate()

Input size: 10
Average time [seri]: 1.334
Average time [para]: 4.8
Average time [pipe]: 0.833

Input size: 100
Average time [seri]: 2.9
Average time [para]: 8.87
Average time [pipe]: 1.033

Input size: 1000
Average time [seri]: 15.7
Average time [para]: 22.08
Average time [pipe]: 3.4

Input size: 10000
Average time [seri]: 50.4
Average time [para]: 35.8
Average time [pipe]: 8.57

Input size: 100000
Average time [seri]: 387.06668
Average time [para]: 271.2
Average time [pipe]: 60.56

——
——

One of the reasons for implementing a multi-threaded single machine processor 
was to see how threading would work with the intermediate CFunction 
representation. At first, I thought I was going to have to make CFunctions 
thread safe (as they can nest and can contain Compilations), but then I 
realized we provide a clone() method. Thus, it's up to the processor to clone 
CFunctions accordingly across threads (“rails” in RxJava). For ParallelRxJava, 
it's as simple as using ThreadLocal. Here is the MapFlow ReactiveX Function:


https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/rxjava/src/main/java/org/apache/tinkerpop/machine/processor/rxjava/MapFlow.java
 


After using ThreadLocal in the map, flatmap, filter, 
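The ThreadLocal-cloning idea can be sketched as follows (CFunction here is a stand-in interface, not the real tp4 one):

```java
import java.util.function.UnaryOperator;

// Sketch of the ThreadLocal-cloning idea: each Rx "rail" (thread) gets its
// own clone of a stateful function, so nothing needs to be made thread-safe.
public class ThreadLocalCloneSketch {

    interface CFunction extends UnaryOperator<Long> {
        CFunction copy();
    }

    static class IncrFunction implements CFunction {
        private long callCount; // per-thread state that must not be shared
        public Long apply(Long x) { callCount++; return x + 1; }
        public CFunction copy() { return new IncrFunction(); }
    }

    // One clone per thread, created lazily the first time a rail touches it.
    static ThreadLocal<CFunction> perRail(CFunction prototype) {
        return ThreadLocal.withInitial(prototype::copy);
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadLocal<CFunction> fn = perRail(new IncrFunction());
        assert fn.get().apply(1L) == 2L;                     // this thread's clone
        Thread rail = new Thread(() -> fn.get().apply(10L)); // a different clone
        rail.start();
        rail.join();
    }
}
```

Because each rail lazily receives its own copy() of the prototype, the prototype's internal state is never touched concurrently.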

Re: A Question Regarding TP4 Processor Classifications

2019-04-04 Thread Marko Rodriguez
Hi,

> The Pipes implementation will no doubt be
> faster for execution of a single traversal but I'm wondering if the
> Flowable RxJava would beat the Pipes processor by providing higher
> throughput in a scenario when many, many users are executing queries
> concurrently.

Why would you think that? When a query comes in, you can just new 
Thread(pipes).start() and thread it, no?

*** My knowledge of server architectures is pretty weak, but….

I think the model for dealing with many concurrent users will be up to the 
server implementation. If you are using LocalMachine, then it's one query at a 
time. But if you are using RemoteMachine to, let's say, the TP4 MachineServer, 
traversals are executed in parallel. And, for most providers who will have 
their own server implementation (that can be communicated with by 
RemoteMachine), they will handle it as they see fit (e.g. doing complex stuff 
like routing queries to different machines in the cluster to load balance or 
whatever).

One thing I’m trying to stress in TP4 is “no more complex server 
infrastructure.” You can see our MachineServer implementation. It's ~100 lines 
of code and does parallel execution of queries. It's pretty brain-dead simple, 
but with some modern thread/server techniques you all might have, we can make 
it a solid little server that meets most providers’ needs — else, they just 
roll those requirements into their server system.


https://github.com/apache/tinkerpop/blob/tp4/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/species/remote/MachineServer.java
 
<https://github.com/apache/tinkerpop/blob/tp4/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/species/remote/MachineServer.java>
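In the spirit of that linked class, a minimal sketch of parallel query execution over a shared thread pool (the query and result types are placeholders, not the real tp4 classes):

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Minimal sketch of parallel query execution in the spirit of MachineServer:
// each submitted "bytecode compilation" runs as its own task on a shared pool.
public class MachineServerSketch {

    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    // submit a query; the caller gets a Future over the (primitive) results
    Future<List<Object>> submit(Callable<List<Object>> compiledQuery) {
        return pool.submit(compiledQuery);
    }

    void close() { pool.shutdown(); }

    public static void main(String[] args) throws Exception {
        MachineServerSketch server = new MachineServerSketch();
        // two queries executing concurrently on the pool
        Future<List<Object>> a = server.submit(() -> List.of(1, 2, 3));
        Future<List<Object>> b = server.submit(() -> List.of("marko"));
        assert a.get().equals(List.of(1, 2, 3));
        assert b.get().equals(List.of("marko"));
        server.close();
    }
}
```

A provider's own server would replace the fixed pool with whatever routing/load-balancing scheme it prefers; the point is only that concurrency lives in the server, not in the VM.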

> Regardless, providers having multiple processor options is a
> good thing and I don't mean to suggest any premature optimization. At this
> point, I think I'll put together some simple benchmarks just out of
> curiosity but will report back.

Yea, I’m trying to cover all the “semantic bases:”

Actor model: Akka
Map/Reduce model: Spark
Push-based model: RxJava
Pull-based model: Pipes

If the Compilation/Processor/Bytecode/Traverser/etc. classes are sufficiently 
abstract to naturally enable all these different execution models, then we are 
happy. So far so good…

Marko.

http://rredux.com

> 
> --Ted
> 
> On Thu, Apr 4, 2019 at 12:36 PM Marko Rodriguez 
> wrote:
> 
>> Hi,
>> 
>> This is a pretty neat explanation of why Pipes will be faster than RxJava
>> single-threaded.
>> 
>> The map-operator for Pipes:
>> 
>> https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/pipes/src/main/java/org/apache/tinkerpop/machine/processor/pipes/MapStep.java
>> <
>> https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/pipes/src/main/java/org/apache/tinkerpop/machine/processor/pipes/MapStep.java
>>> 
>> 
>> The map-operator for RxJava:
>> 
>> https://github.com/ReactiveX/RxJava/blob/2.x/src/main/java/io/reactivex/internal/operators/flowable/FlowableMap.java
>> <
>> https://github.com/ReactiveX/RxJava/blob/2.x/src/main/java/io/reactivex/internal/operators/flowable/FlowableMap.java
>>> 
>> 
>> RxJava has a lot of overhead. Pipes is as bare bones as you can get.
>> 
>> Marko.
>> 
>> http://rredux.com <http://rredux.com/>
>> 
>> 
>> 
>> 
>>> On Apr 4, 2019, at 11:07 AM, Marko Rodriguez 
>> wrote:
>>> 
>>> Hello,
>>> 
>>> Thank you for the response.
>>> 
>>>> Excellent progress on the RxJava processor. I was wondering if
>>>> categories 1 and 2 can be combined where Pipes becomes the Flowable
>> version
>>>> of the RxJava processor?
>>> 
>>> I don’t quite understand your questions. Are you saying:
>>> 
>>>  Flowable.of().flatMap(pipesProcessor)
>>> 
>>> or are you saying:
>>> 
>>>  “Get rid of Pipes altogether and just use single-threaded RxJava
>> instead."
>>> 
>>> For the first, I don’t see the benefit of that. For the second, Pipes4
>> is really fast! — much faster than Pipes3. (more on this next)
>>> 
>>> 
>>>> In this case, though single threaded, we'd still
>>>> get the benefit of asynchronous execution of traversal steps versus
>>>> blocking execution on thread pools like the current TP3 model.
>>> 
>>> Again, I’m confused. Apologies. I believe that perhaps you think that
>> the Step-model of Pipes is what Bytecode gets compiled to in the TP4 VM. If
>> so, note that this is not the case. The concept of Steps (chained
>>

Re: A Question Regarding TP4 Processor Classifications

2019-04-04 Thread Marko Rodriguez
Hi,

This is a pretty neat explanation of why Pipes will be faster than RxJava 
single-threaded.

The map-operator for Pipes:

https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/pipes/src/main/java/org/apache/tinkerpop/machine/processor/pipes/MapStep.java
 
<https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/pipes/src/main/java/org/apache/tinkerpop/machine/processor/pipes/MapStep.java>

The map-operator for RxJava:

https://github.com/ReactiveX/RxJava/blob/2.x/src/main/java/io/reactivex/internal/operators/flowable/FlowableMap.java
 
<https://github.com/ReactiveX/RxJava/blob/2.x/src/main/java/io/reactivex/internal/operators/flowable/FlowableMap.java>

RxJava has a lot of overhead. Pipes is as bare bones as you can get.
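To make the contrast concrete, here is roughly what a bare-bones Pipes-style map step amounts to (an illustrative sketch, not the linked MapStep.java verbatim):

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Illustrative chained-iterator map step in the Pipes style: one next() call,
// one function application, no subscription/backpressure machinery in between.
public class PipesMapSketch {

    static final class MapStep<S, E> implements Iterator<E> {
        private final Iterator<S> previous;
        private final Function<S, E> function;

        MapStep(Iterator<S> previous, Function<S, E> function) {
            this.previous = previous;
            this.function = function;
        }
        public boolean hasNext() { return previous.hasNext(); }
        public E next() { return function.apply(previous.next()); }
    }

    public static void main(String[] args) {
        Iterator<Integer> incr =
                new MapStep<>(List.of(1, 2, 3).iterator(), x -> x + 1);
        assert incr.next() == 2 && incr.next() == 3 && incr.next() == 4;
    }
}
```

FlowableMap, by contrast, has to wrap the same function application inside the Reactive Streams subscriber/subscription protocol (request accounting, fusion modes, error signaling), which is the overhead being described.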

Marko.

http://rredux.com <http://rredux.com/>




> On Apr 4, 2019, at 11:07 AM, Marko Rodriguez  wrote:
> 
> Hello,
> 
> Thank you for the response.
> 
>> Excellent progress on the RxJava processor. I was wondering if
>> categories 1 and 2 can be combined where Pipes becomes the Flowable version
>> of the RxJava processor?
> 
> I don’t quite understand your questions. Are you saying:
> 
>   Flowable.of().flatMap(pipesProcessor)
> 
> or are you saying:
> 
>   “Get rid of Pipes altogether and just use single-threaded RxJava 
> instead."
> 
> For the first, I don’t see the benefit of that. For the second, Pipes4 is 
> really fast! — much faster than Pipes3. (more on this next)
> 
> 
>> In this case, though single threaded, we'd still
>> get the benefit of asynchronous execution of traversal steps versus
>> blocking execution on thread pools like the current TP3 model.
> 
> Again, I’m confused. Apologies. I believe that perhaps you think that the 
> Step-model of Pipes is what Bytecode gets compiled to in the TP4 VM. If so, 
> note that this is not the case. The concept of Steps (chained iterators) is 
> completely within the pipes/ package. The machine-core/ package compiles 
> Bytecode to a nested List of stateless, unconnected functions (called a 
> Compilation). It is this intermediate representation that ultimately is used 
> by Pipes, RxJava, and Beam to create their respective execution plan (where 
> Pipes does the whole chained iterator step thing).
> 
> Compilation: 
> https://github.com/apache/tinkerpop/blob/tp4/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/bytecode/compiler/Compilation.java#L43
>  
> <https://github.com/apache/tinkerpop/blob/tp4/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/bytecode/compiler/Compilation.java#L43>
> 
>   Pipes: 
> https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/pipes/src/main/java/org/apache/tinkerpop/machine/processor/pipes/Pipes.java#L47
>  
> <https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/pipes/src/main/java/org/apache/tinkerpop/machine/processor/pipes/Pipes.java#L47>
>   Beam: 
> https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/beam/src/main/java/org/apache/tinkerpop/machine/processor/beam/Beam.java#L132
>  
> <https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/beam/src/main/java/org/apache/tinkerpop/machine/processor/beam/Beam.java#L132>
>   RxJava: 
> https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/rxjava/src/main/java/org/apache/tinkerpop/machine/processor/rxjava/RxJava.java#L103
>  
> <https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/rxjava/src/main/java/org/apache/tinkerpop/machine/processor/rxjava/RxJava.java#L103>
> 
>> I would
>> imagine Pipes would beat the Flowable performance on a single traversal
>> side-by-side basis (though perhaps not by much), but the Flowable version
>> would likely scale up to higher throughput and better CPU utilization when
>> under concurrent load.
> 
> 
> Pipes is definitely faster than RxJava (single-threaded). While I only 
> learned RxJava 36 hours ago, I don’t believe it will ever beat Pipes because 
> Pipes4 is brain dead simple — much simpler than in TP3 where a bunch of extra 
> data structures were needed to account for GraphComputer semantics (e.g. 
> ExpandableIterator).
> 
> I believe, given the CPU utilization/etc. points you make, that RxJava will 
> come into its own in multi-threaded mode (called ParallelFlowable) when 
> trying to get real-time performance from a query that touches/generates lots 
> of data (traversers). This is the reason for Category 2 — real-time, 
> multi-threaded, single machine. I only gave a quick pass last night at making 
> ParallelFlowable work, but gave up when various test cases were failing (— I 
> now believe I know th

Re: A Question Regarding TP4 Processor Classifications

2019-04-04 Thread Marko Rodriguez
Hello,

Thank you for the response.

> Excellent progress on the RxJava processor. I was wondering if
> categories 1 and 2 can be combined where Pipes becomes the Flowable version
> of the RxJava processor?

I don’t quite understand your questions. Are you saying:

Flowable.of().flatMap(pipesProcessor)

or are you saying:

“Get rid of Pipes altogether and just use single-threaded RxJava 
instead."

For the first, I don’t see the benefit of that. For the second, Pipes4 is 
really fast! — much faster than Pipes3. (more on this next)


> In this case, though single threaded, we'd still
> get the benefit of asynchronous execution of traversal steps versus
> blocking execution on thread pools like the current TP3 model.

Again, I’m confused. Apologies. I believe that perhaps you think that the 
Step-model of Pipes is what Bytecode gets compiled to in the TP4 VM. If so, 
note that this is not the case. The concept of Steps (chained iterators) is 
completely within the pipes/ package. The machine-core/ package compiles 
Bytecode to a nested List of stateless, unconnected functions (called a 
Compilation). It is this intermediate representation that ultimately is used by 
Pipes, RxJava, and Beam to create their respective execution plan (where Pipes 
does the whole chained iterator step thing).

Compilation: 
https://github.com/apache/tinkerpop/blob/tp4/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/bytecode/compiler/Compilation.java#L43
 
<https://github.com/apache/tinkerpop/blob/tp4/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/bytecode/compiler/Compilation.java#L43>

Pipes: 
https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/pipes/src/main/java/org/apache/tinkerpop/machine/processor/pipes/Pipes.java#L47
 
<https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/pipes/src/main/java/org/apache/tinkerpop/machine/processor/pipes/Pipes.java#L47>
Beam: 
https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/beam/src/main/java/org/apache/tinkerpop/machine/processor/beam/Beam.java#L132
 
<https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/beam/src/main/java/org/apache/tinkerpop/machine/processor/beam/Beam.java#L132>
RxJava: 
https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/rxjava/src/main/java/org/apache/tinkerpop/machine/processor/rxjava/RxJava.java#L103
 
<https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/rxjava/src/main/java/org/apache/tinkerpop/machine/processor/rxjava/RxJava.java#L103>

> I would
> imagine Pipes would beat the Flowable performance on a single traversal
> side-by-side basis (though perhaps not by much), but the Flowable version
> would likely scale up to higher throughput and better CPU utilization when
> under concurrent load.


Pipes is definitely faster than RxJava (single-threaded). While I only learned 
RxJava 36 hours ago, I don’t believe it will ever beat Pipes because Pipes4 is 
brain dead simple — much simpler than in TP3 where a bunch of extra data 
structures were needed to account for GraphComputer semantics (e.g. 
ExpandableIterator).

I believe, given the CPU utilization/etc. points you make, that RxJava will 
come into its own in multi-threaded mode (called ParallelFlowable) when trying 
to get real-time performance from a query that touches/generates lots of data 
(traversers). This is the reason for Category 2 — real-time, multi-threaded, 
single machine. I only gave a quick pass last night at making ParallelFlowable 
work, but gave up when various test cases were failing (I now believe I know 
the reason why). I hope to have ParallelFlowable working by mid-week next week 
and then we can benchmark its performance.

I hope I answered your questions or at least explained my confusion.

Thanks,
Marko.

http://rredux.com <http://rredux.com/>




> On Apr 4, 2019, at 10:33 AM, Ted Wilmes  wrote:
> 
> Hello,
> 
> 
> --Ted
> 
> On Tue, Apr 2, 2019 at 7:31 AM Marko Rodriguez  <mailto:okramma...@gmail.com>> wrote:
> 
>> Hello,
>> 
>> TP4 will not make a distinction between STANDARD (OLTP) and COMPUTER
>> (OLAP) execution models. In TP4, if a processing engine can convert a
>> bytecode Compilation into a working execution plan then that is all that
>> matters. TinkerPop does not need to concern itself with whether that
>> execution plan is “OLTP" or “OLAP" or with the semantics of its execution
>> (function oriented, iterator oriented, RDD-based, etc.). With that, here
>> are 4 categories of processors that I believe define the full spectrum of
>> what we will be dealing with:
>> 
>>1. Real-time single-threaded single-machine.
>>* This is STANDARD (OLTP) in TP3.
>>* T

Re: The Machine Interface of TP4.

2019-04-02 Thread Marko Rodriguez
Hi,

> I'm still not sure I follow how caching will work effectively. Like, I
> follow that you can have bytecode local and remote and if the same bytecode
> is seen in a cache the UUID can be sent in its stead but at least in TP3
> semantics the bytecode for:

There are two-levels to bytecode:

source instructions (withXXX)
instructions (out, in, count)

LocalMachine is just caching a compilation of the source instructions. This is 
necessary because TraversalSource no longer has any state so if you want to 
have the strategies pre-sorted and database connections open, you have to do it 
via a Machine implementation — via SourceCompilation. This is what the 
Machine.register() method does. 

Explained another way — TraversalSource is a Gremlin language-specific class. 
The TP4 virtual machine doesn’t know what a TraversalSource is. It only cares 
about Bytecode.
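
To make the two levels concrete, here is a rough sketch (all names and shapes are illustrative, not the actual tp4/ branch API) of bytecode split into source instructions and step instructions:

```java
import java.util.List;

public class BytecodeLevels {
    // Illustrative shape only: bytecode splits into source instructions
    // (withXXX, compiled once per "g") and step instructions (compiled per query).
    record Instruction(String op, Object... args) {}
    record Bytecode(List<Instruction> sourceInstructions, List<Instruction> instructions) {}

    public static void main(String[] args) {
        Bytecode bc = new Bytecode(
                List.of(new Instruction("withStrategy", "SomeStrategy"),
                        new Instruction("withProcessor", "Pipes")),
                List.of(new Instruction("V"),
                        new Instruction("out", "knows"),
                        new Instruction("count")));
        // Machine.register() would cache a compilation of just the source instructions,
        // so spawning another traversal off the same "g" skips that work.
        System.out.println(bc.sourceInstructions().size()); // prints 2
    }
}
```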

> and therefore the compiled script for both were equal and caching was easy.
> Is there a way to simulate that type of parameterization for bytecode so
> that local/remote caching works nicely and we can see some significant
> performance gains?

I have not thought through instruction bytecode parameterization and caching. 
This is different than Machine.register(). This would occur when you 
Machine.submit() -- if the bytecode already exists in a cache as a Compilation, 
then you just fetch it and you don’t have to re-apply strategies and 
re-generate the intermediate function representation. As such, this type of 
caching doesn’t affect the Machine interface definition and would just be a 
Machine-specific implementation detail. Not that it’s not important, as we need 
to get bytecode parameterization and caching down, just that it’s a “different 
topic.” I forget how we did “bindings” in TP3, but I remember you saying it’s a 
ThreadLocal model and it’s janky. What do you recommend for bindings in TP4? 
Perhaps create another email thread.

HTH,
Marko.

http://rredux.com



> sorry if you already answered this somewhere as i have this sense we had
> this conversation somewhere, but maybe i'm making that up.
> 
> On Wed, Mar 27, 2019 at 9:09 AM Marko Rodriguez wrote:
> 
>> Hi,
>> 
>>> LocalMachine, it will lookup the registered UUID and if it exists, use
>> the
>>> pre-compiled source code.
>> 
>> So what Machine.register() does generally, is up to the implementation.
>> 
>> LocalMachine.register() does what TP3 does in TraversalSource. It
>> “pre-compiles”.
>> 
>>- sort strategies
>>TP3:
>> https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/TraversalSource.java#L138
>> https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/util/DefaultTraversalStrategies.java#L47
>>- sets up processor
>>TP3:
>> https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/TraversalSource.java#L141
>>- sets up structure
>>TP3: this came

A Question Regarding TP4 Processor Classifications

2019-04-02 Thread Marko Rodriguez
Hello,

TP4 will not make a distinction between STANDARD (OLTP) and COMPUTER (OLAP) 
execution models. In TP4, if a processing engine can convert a bytecode 
Compilation into a working execution plan then that is all that matters. 
TinkerPop does not need to concern itself with whether that execution plan is 
“OLTP" or “OLAP" or with the semantics of its execution (function oriented, 
iterator oriented, RDD-based, etc.). With that, here are 4 categories of 
processors that I believe define the full spectrum of what we will be dealing 
with:

1. Real-time single-threaded single-machine.
* This is STANDARD (OLTP) in TP3.
* This is the Pipes processor in TP4.

2. Real-time multi-threaded single-machine.
* This does not exist in TP3.
* We should provide an RxJava processor in TP4.

3. Near-time distributed multi-machine.
* This does not exist in TP3.
* We should provide an Akka processor in TP4.

4. Batch-time distributed multi-machine.
* This is COMPUTER (OLAP) in TP3 (Spark or Giraph).
* We should provide a Spark processor in TP4.

I’m not familiar with the specifics of the Flink, Apex, DataFlow, Samza, etc. 
stream-based processors. I believe they can be made to work in near-time or 
batch-time depending on the amount of data pulled from the database. Once we 
understand these technologies better, we should be able to fit them into the 
categories above.

In conclusion: Do these categories make sense to people? Terminology-wise -- 
Near-time? Batch-time? Are these distinctions valid?

Thank you,
Marko.

http://rredux.com 






Re: [TinkerPop] What is the fundamental bytecode for TP4?

2019-03-30 Thread Marko Rodriguez
Hello,

> (As in SQL to "guide" (force?) you, then PL/SQL or TSQL or UDFs, etc.)  The 
> core should be simple, but not too simple, and avoid redundancy.


If you look at how I currently have it set up, we have “core instruction set” 
and “common instruction set.” Common is your standard count, group, sum, 
repeat, etc (~20 instructions). Core is only 6 instructions — branch, initial, 
map, flatmap, filter, and reduce. Every time an instruction is added to common, 
the respective core instruction is also added. The test suite for Pipes uses 
common and the test suite for Beam uses core. By just riding out these two 
instruction set branches I hope to see a pattern emerge and perhaps a 
“common/core instruction set” can be converged upon.
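
As a rough illustration of the core/common relationship (hypothetical names, not the tp4/ branch API): a common instruction like count can be expressed with the core reduce instruction alone, which is why every common instruction has a core counterpart.

```java
import java.util.List;
import java.util.function.BiFunction;

public class CoreDecomposition {
    // A core "reduce" instruction: a seed value plus a stateless accumulator function.
    record Reduce<S, E>(E seed, BiFunction<E, S, E> accumulator) {
        E apply(List<S> traversers) {
            E result = seed;
            for (S s : traversers) result = accumulator.apply(result, s);
            return result;
        }
    }

    public static void main(String[] args) {
        // count == reduce(0, (sum, traverser) -> sum + 1)
        Reduce<String, Long> count = new Reduce<>(0L, (sum, t) -> sum + 1);
        System.out.println(count.apply(List.of("a", "b", "c"))); // prints 3
    }
}
```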

And yes, redundancy is a big flaw in the TP3 instruction set. This popped up on 
the radar early due to the stream ring theory article and initial stabs on TP4 
development. I suspect that the common instruction set will have 1/3 of the 
instructions of TP3.

> I kinda wonder if it just shoves the complexity down into analyzing the
> arguments of the instructions themselves or other contexts associated with
> the instructions... maybe too early to tell. I'm just really hoping that
> TP4 can offer what TP3 didn't, which was an easy way to reason about complex
> query patterns. we promised that with "tools" in TP3 but those never really
> materialized (attempts were made, but nothing seemed to stick really).

The problem with TP3 reasoning is that you are reasoning at the “step” level, 
not at the “instruction” level. In TP3, after bytecode, the compilation goes to 
Pipes. This was a wrong move. It meant that we had to embed one execution 
engine (Pipes) into another (Spark, e.g.). In TP4, we compile from bytecode to 
CFunctions (coefficient functions). CFunctions do not assume an execution 
engine. They are simply Map, FlatMap, Reduce, Initial, Branch, and Filter 
functions (stateless functions). It is then up to the execution engine to 
coordinate these functions accordingly. Thus, the strategy reasoning in TP3 was 
awkward because you had to work at manipulating methods/fields on Pipe steps 
(i.e. object reasoning). In TP4, you manipulate [op,arg*]-instructions (i.e. 
primitive array reasoning).
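
A toy sketch of the engine-independence claim, using plain java.util.function types as stand-ins for CFunctions: the same stateless function objects can be coordinated by a serial driver or a parallel one, with no Pipes-style engine embedded in either.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

public class EngineAgnostic {
    public static void main(String[] args) {
        // Two stateless "CFunction"-style objects (illustrative, not the tp4/ API).
        Function<Integer, Integer> map = x -> x * 2;
        Predicate<Integer> filter = x -> x > 4;

        List<Integer> data = List.of(1, 2, 3, 4);

        // Engine 1: a serial iterator-style driver.
        long serial = data.stream().map(map).filter(filter).count();

        // Engine 2: a parallel driver coordinating the *same* functions.
        long parallel = data.parallelStream().map(map).filter(filter).count();

        System.out.println(serial + " " + parallel); // prints "2 2": both engines agree
    }
}
```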

I have not fleshed out strategies to any great extent in TP4, but I believe 
they will be easier to write than in TP3. However, I sorta don’t think 
strategies are going to go the same direction as they did in TP3. I’m having 
some inklings that we are not thinking about bytecode optimization in the most 
elegant way… 

Take care,
Marko.

http://rredux.com




> On Mar 30, 2019, at 10:00 AM, Ben Krug  wrote:
> 
> As an outsider of sorts, this was my thought, too.  Supposedly 'mov' is 
> Turing-complete, but I wouldn't want to program with just that.
> (https://www.cl.cam.ac.uk/~sd601/papers/mov.pdf)
> 
> Ideally, you have a core language that guides you in how to think, model, and 
> approach, then probably extensions for greater flexibility.
> 
> Hopefully that's the goal.
> 
> On Sat, Mar 30, 2019 at 6:15 AM Stephen Mallette wrote:
> Do you/kuppitz think that the reduced/core instruction set means that complex 
> strategy development is simplified? on the surface, less instructions sounds 
> like it will be easier to reason about patterns when providers go to build 
> strategies, but I'm not sure. I kinda wonder if it just shoves the complexity 
> down into analyzing the arguments of the instructions themselves or other 
> contexts associated with the instructions... maybe too early to tell. 
> I'm just really hoping that TP4 can offer what TP3 didn't, which was an easy 
> way to reason about complex query patterns. we promised that with "tools" in 
> TP3 but those never really materialized (attempts were made, but nothing 
> seemed to stick really). 
> 
> On Sat, Mar 23, 2019 at 12:25 PM Marko Rodriguez wrote:
> Hello,
> 
> As you know, one of the major objectives of TP4 is to generalize the virtual 
> machine in order to support any data structure (not just graph).
> 
> Here is an idea that Kuppitz and I batted around yesterday and I spent this 
> morning implementing on the tp4/ branch. 
> 
> From the Stream Ring Theory paper [https://zenodo.org/record/2565243],
> we know that universal computation is possible with branch, initial, map, 
> flatmap, filter, reduce stream-based f

TinkerPop4 Status Report #2

2019-03-29 Thread Marko Rodriguez
Hello,

This is an update of what I’ve been up to on the tp4/ branch since the last 
report 2 weeks ago.

1. Arguments
TP4 brings the concept of an Argument to the front and center. 
An argument can either be a constant (e.g. 2) or a dynamically determined value 
(e.g. out().count()). This means that users will be able to do things such as:
* has(‘name’,out(‘father’).value(‘name’)) // is he a jr?
* is(eq(out(‘manager’))) // is he his own boss?
This flexibility is starting to make the steps bleed into each 
other.
is(eq(select(‘a’))) == where(eq(‘a’))
One Gremlin-C# guy on Twitter was saying that Gremlin has too 
many ways to do things. It will be nice if we can reduce the number of steps we 
have with Arguments.
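
One way to picture the Argument concept (purely illustrative; the tp4/ branch may model it differently): an Argument is either a constant or a value computed per incoming traverser, so has() and is() no longer need separate overloads for the two cases.

```java
import java.util.Map;
import java.util.function.Function;

public class ArgumentSketch {
    // Hypothetical Argument: resolved against the incoming traverser's object.
    interface Argument<S, E> extends Function<S, E> {
        static <S, E> Argument<S, E> constant(E value) { return s -> value; }
        static <S, E> Argument<S, E> dynamic(Function<S, E> f) { return f::apply; }
    }

    public static void main(String[] args) {
        Map<String, String> person = Map.of("name", "Marko jr.", "fatherName", "Marko jr.");

        // has('name', 'Marko jr.') style: a constant argument.
        Argument<Map<String, String>, String> constantArg = Argument.constant("Marko jr.");
        // has('name', out('father').value('name')) style: computed from the traverser.
        Argument<Map<String, String>, String> dynamicArg = Argument.dynamic(m -> m.get("fatherName"));

        boolean isJr = person.get("name").equals(dynamicArg.apply(person))
                && person.get("name").equals(constantArg.apply(person));
        System.out.println(isJr); // prints true
    }
}
```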

2. Console
Java9+ brings with it JShell. I posed the question on dev@ — do 
we need GremlinConsole?

https://lists.apache.org/thread.html/b9083cf992b01bcfe4b82d14b9aa2d30c90707c4c134c6cfefade4ae@%3Cdev.tinkerpop.apache.org%3E
 

It is possible to configure JShell to look (and feel?) like the 
GremlinConsole with a short startup script.
I would like to shoot for TP4 being as small and compact as 
possible — less to build, less to document, less to maintain, …
Gremlin-Java -> JShell, Gremlin-Groovy -> GroovySh, 
Gremlin-Python -> Python CLI, … why not reuse?
The most beautiful code is the code that was never written. The 
greatest programmers are those that coded themselves out of a job. Let us be 
great and beautiful.

3. Data Structures
I’m still trying to figure out how to generalize Gremlin out of 
graph. Limited luck.
Worked with Kuppitz a bit on how to represent all steps using 
just map, flatmap, reduce, filter, branch only! (it’s a little too nutz for my 
tastes, but maybe…)
https://twitter.com/twarko/status/1109491874333515778 

Ryan Wisnesky was kind enough to provide a demo of his Category 
Query Language (CQL) on Monday. Cool stuff indeed.
Ryan pointed me to this paper which I found worthwhile: 
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.49.3252&rep=rep1&type=pdf
 

This is the big unknown for me and I want to solve it. If we 
can do this right, TinkerPop will permeate all things Apache…all things data.
https://twitter.com/twarko/status/1109540859442163712 


4. The Machine
I introduced the Machine interface.

https://github.com/apache/tinkerpop/blob/tp4/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/Machine.java
 

This interface encompasses both TraversalSource and 
RemoteConnection functionality.
The general use is g = 
Gremlin.traversal(machine).withProcess(...).withStrategy(...)
This move turned Gremlin into basically “nothing” — Gremlin is 
just the “builder-pattern” applied to Bytecode. Check out how small Gremlin 
is!

https://github.com/apache/tinkerpop/tree/tp4/java/language/gremlin/src/main/java/org/apache/tinkerpop/language/gremlin
 

That’s it!? … Gremlin is trivial. Much less to 
consider for Gremlin-JS, Gremlin-C#, Gremlin-?? …

5. RemoteMachine, TraverserServer, and MachineServer
https://twitter.com/twarko/status/1110612168968265729 

“GremlinServer” is too serial in concept. Receive bytecode, 
execute bytecode, aggregate traversers, return traversers.
- This is bad. We need to start thinking distributed 
execution and aggregation from the start. We need to blur the concept of a 
“server.”

https://github.com/apache/tinkerpop/tree/tp4/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/species/remote
 

MachineServer — sits somewhere and accepts Bytecode. 
(multi-threaded server)
RemoteMachine —  can talk to a MachineServer to submit 
Bytecode.(single 

Re: The Machine Interface of TP4.

2019-03-27 Thread Marko Rodriguez
Hi,

> LocalMachine, it will lookup the registered UUID and if it exists, use the
> pre-compiled source code.

So what Machine.register() does generally, is up to the implementation.

LocalMachine.register() does what TP3 does in TraversalSource. It 
“pre-compiles”.

- sort strategies
TP3: 
https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/TraversalSource.java#L138
https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/util/DefaultTraversalStrategies.java#L47
- sets up processor
TP3: 
https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/TraversalSource.java#L141
- sets up structure
TP3: this came for free because we created the traversal source 
from Graph.traversal().

This way when you keep spawning traversals off of the same “g” we don’t have to 
re-compile the source instructions.

> maybe i didn't follow properly but is this for the purpose of caching
> traversals to avoid the costs of traversal to bytecode compilation?


Note this is a SourceCompilation (just the source instructions are compiled), 
not the full instructions which is a Compilation.

>  in other words, is this describing a general way to cache compiled bytecode 
> so
> that it doesn't have to go through strategy application more than once?


To the concept of caching traversals, that is easy to do with the Machine 
interface. On Machine.submit(), a Map can exist. Same as 
TP3. However, check this, we can do it another way. Why even send the full 
Bytecode? If the RemoteMachine (which is local to the client) knows it already 
sent the same Bytecode before, it can send a single instruction Bytecode with 
an encoded UUID-like instruction. Thus, Map. Less data to 
transfer.

RemoteMachine (client side) can keep a Map and do the proper 
UUID-encoding.
MachineServer (server side) can then keep a Map, where if the received Bytecode 
is a single UUID-like instruction, it is a fast lookup. If not, it can still 
look it up!

Thus, it is easy for us to do both types of caching with the Machine interface:

SourceCompilation: source bytecode caching.
Compilation: full bytecode caching.
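
A minimal sketch of the UUID-encoding idea (illustrative names; the real protocol details are still open, and both "sides" live in one process here purely for demonstration): the client sends full Bytecode once, then a small token on repeat submissions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class BytecodeCacheSketch {
    // Client remembers which bytecode the server has already seen (bytecode -> token);
    // server can resolve a token back to the bytecode it cached (token -> bytecode).
    static final Map<String, UUID> clientCache = new HashMap<>();
    static final Map<UUID, String> serverCache = new HashMap<>();

    static Object encode(String bytecode) {
        UUID token = clientCache.get(bytecode);
        if (token != null) return token;   // cache hit: transfer only the small token
        token = UUID.randomUUID();
        clientCache.put(bytecode, token);
        serverCache.put(token, bytecode);  // first submission carries the full bytecode
        return bytecode;
    }

    public static void main(String[] args) {
        String bc = "[[V],[out,knows],[count]]";
        System.out.println(encode(bc) instanceof String); // first send: full bytecode -> true
        System.out.println(encode(bc) instanceof UUID);   // repeat send: token only -> true
    }
}
```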

Keep the questions coming.

Marko.

http://markorodriguez.com


> 
> 
> 
> On Mon, Mar 25, 2019 at 8:48 AM Marko Rodriguez wrote:
> 
>> Hi,
>> 
>> Here is how the TP4 bytecode submission infrastructure is looking.
>> 
>> In TP3, TraversalSource maintained the “pre-compilation” of strategies,
>> database connectivity, etc. This was not smart for the following reasons:
>> 
>>1. It assumed the traversal would execute on the same machine that
>> it was created on.
>>2. We had to make an explicit distinction between local and remote
>> execution via RemoteStrategy.
>>3. RemoteStrategy passes an excessive amount of data over the wire
>> on each traversal submission (the source instructions!).
>>4. RemoteStrategy is bug prone with traversal inspection and
>> RemoteStep, etc.
>> 
>> In TP4, we are now going to assume that Bytecode (a traversal) is always
>> submitted somewhere and this “somewhere" could be local or remote. This
>> “somewhere” must implement the Machine interface.
>> 
>> 
>> https://github.com/apache/tinkerpop/blob/596caf3ab82f3b15c2c343af87be6d03f26d6d6e/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/Machine.java
>> 
>> Machine makes explicit the TP4 communication protocol. The only objects
>> being transmitted are either Bytecode or Traversers. Simple.
>> 
>> Here is an example using LocalMachine:
>> 
>> Machine machine = LocalMachine.open();
>> TraversalSource g =
>> Gremlin.traversal(machine).withProcessor(…).withStructure(…).withStrategy(…)
>> 
>> The first time a traversal is generated from g, the Bytecode source
>> instructions are registered with the machine.
>> 
>> 
>> https://github.com/apache/tinkerpop/blob/596caf3ab82f3b15c2c343af87be6d03f26d6d6e/java/language/gremlin/src/
