[jira] [Created] (TINKERPOP-2214) Gremlin client with version 3.2.3 connect to server with version 3.4.1 auth failed

2019-05-12 Thread Fang Yong (JIRA)
Fang Yong created TINKERPOP-2214:


 Summary: Gremlin client with version 3.2.3 connect to server with 
version 3.4.1 auth failed
 Key: TINKERPOP-2214
 URL: https://issues.apache.org/jira/browse/TINKERPOP-2214
 Project: TinkerPop
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.1
Reporter: Fang Yong


When I use a client with version 3.2.3 to connect to a Gremlin Server with
version 3.4.1, authentication fails with the following message:

Incorrect type for : sasl - base64 encoded String is expected

I found that this message comes from SaslAuthenticationHandler: when this
handler receives a byte[] from a 3.2.3 client, it fails.
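A minimal sketch of the kind of compatibility shim that would resolve this, assuming the handler can normalize the "sasl" argument before validating it. The class and method names (`SaslArgCompat`, `toSaslBytes`) are hypothetical, not actual TinkerPop API:

```java
import java.util.Base64;

// Hypothetical sketch: normalize the "sasl" argument so that both the
// byte[] sent by 3.2.x clients and the base64-encoded String sent by
// 3.4.x clients are accepted. Names are illustrative only.
public final class SaslArgCompat {
    private SaslArgCompat() {}

    // Returns the raw SASL payload regardless of wire representation.
    public static byte[] toSaslBytes(Object saslArg) {
        if (saslArg instanceof byte[]) {
            return (byte[]) saslArg;                              // legacy 3.2.x clients
        } else if (saslArg instanceof String) {
            return Base64.getDecoder().decode((String) saslArg);  // 3.4.x clients
        }
        throw new IllegalArgumentException(
            "Incorrect type for : sasl - byte[] or base64 encoded String is expected");
    }
}
```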



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: N-Tuples, Pointers, Data Model Interfaces, and Bytecode Instructions

2019-05-12 Thread Joshua Shinavier
Hi Marko,

Just a quick reply for now, but this sounds like a good game plan. There is
certainly a ton of research on query optimization, adaptive query
processing, etc. that could help guide the development, and more
opportunities for static analysis (i.e., a type system) will also help.

Btw. my initial implementation of MatchStep had some stats-driven
optimization built in. It was a little "off the top of my head", and also
not OLAP-friendly, but it would be worthwhile to revisit this in a bigger
and more formal way in TP4.

Josh


On Sun, May 12, 2019 at 10:37 AM Marko Rodriguez 
wrote:

> Hi,
>
> Thank you for your reply Josh. I think we are ultimately converging on:
>
> TinkerPop4 as an open source, programmable query planner.
>
> We are going in this direction because of our three requirements:
> 1. Easy for language designers to have their language work with
> various data processing/storage systems.
> 2. Easy for processing engines to execute language expressions
> over data.
> 3. Easy for data storage systems to expose their data for
> computation.
> All the “easy”-talk implies “hard”-work for us. :)
>
> Thus far, with TinkerPop as a graph processing framework, query planning has
> been relatively straightforward, as we have relied on:
> (1) users knowing the shape of their data and writing queries that
> are ‘smart’.
> (2) compiler strategies rewriting common expressions into cheaper
> expressions.
> (3) our match()-step dynamically re-sorting patterns based on
> runtime cost analyses.
> However, I think moving forward (to capture more complex data scenarios)
> we will need data storage system providers to expose statistics about their
> data. What does this entail? I believe it entails allowing data storage
> systems to expose:
> (1) data paths (i.e. supported references through the data)
> (2) data statistics (i.e. the time and space costs associated with
> particular data paths)
>
> For SQL query planning, it will take a few years for our framework to
> become top-notch. However, for Gremlin/Cypher, RQL/SPARQL,
> MongoQuery/XPath, CQL, etc. I believe we can pull it off with the resources
> we have on our first release as these “NoSQL”-systems tend to have simple
> 'data paths’ with, arguably, graph and RDF being the most difficult to
> reason on.
>
> ———
>
> What does the TP4 VM need to know and how will the various system
> components (language, processor, structure) provide that information?
>
> I believe we have been talking about this the whole time except now I am
> introducing costs.
> * What are the types of tuples and how do they relate?
> * How much does it cost to move through these tuple-relations?
>
> pg.graph
>   [data access instructions]
>   V()
>   V(id)
>   V(key,value)
>   [data costs instructions]
>   cost(V())
>   cost(V(id))
> pg.graph.vertex
>   [data access instructions]
>   out()
>   out(string)
>   [data costs instructions]
>   cost(out())
>   cost(out(string))
>   …
>
> In other words, for every type of tuple, we need to know what instructions
> it supports and we need to know the time/space costs of said
> instructions.
>
> TP4 VM uses the “data cost”-instructions to construct the query
> plan.
> cost(out(‘knows’)) = { space:10345, time:O(1) } //
> in-memory graphdb
> cost(out(‘knows’)) = { space:10345, time:O(log(n)) } //
> RDBMS-encoded graphdb
> TP4 processors use the “data access”-instructions to process the
> data.
> out(‘knows’) -> Iterator
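The tuple-type/instruction/cost idea could be modeled roughly as follows. This is only a sketch of the concept; `CostModel`, `Cost`, and the string-keyed registry are hypothetical names, not TP4 API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a provider-exposed cost model: for each tuple
// type (e.g. "pg.graph.vertex") the provider declares which instructions
// it supports and the estimated time/space cost of each.
public final class CostModel {
    // Simple cost record: an estimated space footprint and a time class.
    public static final class Cost {
        public final long space;
        public final String time; // e.g. "O(1)", "O(log(n))"
        public Cost(long space, String time) { this.space = space; this.time = time; }
    }

    private final Map<String, Cost> costs = new HashMap<>();

    // Provider registers the cost of an instruction, keyed "tupleType/instruction".
    public void register(String tupleType, String instruction, Cost cost) {
        costs.put(tupleType + "/" + instruction, cost);
    }

    // Planner queries the cost; null means the instruction is unsupported.
    public Cost cost(String tupleType, String instruction) {
        return costs.get(tupleType + "/" + instruction);
    }
}
```

An in-memory graphdb might register cost("pg.graph.vertex", "out(knows)") as { space:10345, time:O(1) }, while an RDBMS-encoded graphdb would register O(log(n)) for the same instruction.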
>
> What I showed above was PropertyGraphs as of TP2. What about when labels
> and schemas are involved? This is where Josh’s concept of “typing” comes
> into play.
>
> pg.graph
> pg.graph.vertex.person
>   out(‘knows’)
>   out(‘created’)
>   cost(out(‘knows’))
>   cost(out(‘created’))
> pg.graph.vertex.project
> pg.graph.edge.knows
> pg.graph.edge.created
> ...
>
> With schema, we of course get more refined statistics….
>
> Thus, I think that we are ultimately trying to create a Multi-Model ADT
> that exposes data paths (the explicit structure in the data including
> auxiliary indices) and data costs (the time/space-statistics of such
> paths). TP4 VM uses that information to:
>
> 1. Take unoptimized bytecode from the language provider (easy for
> them, only operational semantic query planning required).
> 2. Convert that bytecode into an optimized bytecode for the data
> storage system (easy for them, they only need to say what instructions they
> support and costs).
> 3. Submit that bytecode to processor for execution (easy for them,
> they simply use their query planner to execute the data storage optimized
> data flow).
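The three steps above could be sketched as a tiny pipeline. Bytecode is reduced here to a list of instruction strings, and the optimizer and processor are pluggable functions standing in for the real TP4 components; all names are illustrative:

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Hypothetical sketch of the three-step flow:
//   1. take unoptimized bytecode from the language provider,
//   2. rewrite it for the data storage system using its cost model,
//   3. hand the optimized bytecode to a processor for execution.
public final class PlannerPipeline {
    public static <R> Iterator<R> execute(
            List<String> unoptimized,                        // 1. from the language provider
            UnaryOperator<List<String>> storageOptimizer,    // 2. cost-driven rewrite
            Function<List<String>, Iterator<R>> processor) { // 3. processor executes
        List<String> optimized = storageOptimizer.apply(unoptimized);
        return processor.apply(optimized);
    }
}
```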
>
> Thoughts?
> Marko.
>
> http://rredux.com 
>
>
>
>
> > On May 11, 2019, at 9:08 AM, Joshua Shinavier  wrote:
> >
> > Oops, looked back at my email and noticed some nonsense. This:
> >
> > 

Re: A collection of examples that map a query language query to provider bytecode.

2019-05-12 Thread Marko Rodriguez
Hi,

> Machine machine = RemoteMachine
>.withStructure(NeptuneStructure.class, config1)
>.withProcessor(AkkaProcessor.class, config2)
>.withCompiler(CypherCompiler.class, config3)
>.open(config0);


Yea, I think something like this would work well. 

I like it because it exposes the three main components that TinkerPop is gluing 
together:

Language
Structure
Process

Thus, I would have it:

withStructure()
withProcessor()
withLanguage()
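A minimal sketch of such a three-component builder. The quoted `RemoteMachine` API above is only a proposal, and this sketch simplifies it further; none of these names are real TP4 API:

```java
// Hypothetical sketch of the three-component builder discussed above.
// It only illustrates how language, processor, and structure could be
// wired independently before opening a machine.
public final class MachineBuilder {
    private String structure;
    private String processor;
    private String language;

    public MachineBuilder withStructure(String s) { this.structure = s; return this; }
    public MachineBuilder withProcessor(String p) { this.processor = p; return this; }
    public MachineBuilder withLanguage(String l)  { this.language = l;  return this; }

    // open() would return a live Machine; here it just describes the wiring.
    public String open() {
        return language + " -> " + processor + " -> " + structure;
    }
}
```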

Marko.

http://rredux.com 


> On May 10, 2019, at 8:27 AM, Dmitry Novikov  wrote:
> 
> Stephen, Remote Compiler - very interesting idea to explore. Just for 
> brainstorming, let me imagine how this may look:
> 
> 
> 1. If the client supports compilation, it compiles on the client side.
> 2. If the remote supports compilation, it compiles on the server side.
> 3. If neither the client nor the remote supports compilation, `config3` could
> contain the path to a microservice. The microservice does the compilation and
> either returns bytecode, or sends the bytecode to the remote and proxies the
> response to the client. The microservice could be deployed on the remote as
> well.
> 
> `config3` may look like respectively:
> 
> 1. `{compilation: 'embedded'}`
> 2. `{compilation: 'remote'}`
> 3. `{compilation: 'external', uri: 'localhost:3000/cypher'}`
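Client-side dispatch on the `compilation` key might look like this sketch; `CompilationDispatch` and the returned description strings are illustrative only, not a real client API:

```java
import java.util.Map;

// Hypothetical sketch of dispatching on the `compilation` key of config3:
// compile locally, defer to the remote, or call out to an external
// compiler microservice. The mode strings follow the examples above.
public final class CompilationDispatch {
    public static String route(Map<String, String> config3) {
        String mode = config3.getOrDefault("compilation", "embedded");
        switch (mode) {
            case "embedded": return "compile on client";
            case "remote":   return "send query string; server compiles";
            case "external": return "POST query to " + config3.get("uri");
            default: throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }
}
```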
> 
> On 2019/05/10 13:45:50, Stephen Mallette  wrote: 
>>> If VM, server or compiler is implemented in another language, there is
>> always a possibility to use something like gRPC or even REST to call
>> microservice that will do query→Universal Bytecode conversion.
>> 
>> That's an interesting way to handle it especially if it could be done in a
>> completely transparent way - a Remote Compiler of some sort. If we had such
>> a thing then the compilation could conceivably happen anywhere, client or
>> server of the host programming language.
>> 
>> On Fri, May 10, 2019 at 9:08 AM Dmitry Novikov 
>> wrote:
>> 
>>> Hello,
>>> 
>>> Marko, thank you for the clear explanation.
>>> 
 I don’t like that you would have to create a CypherCompiler class (even
>>> if its just a wrapper) for all popular programming languages. :(
>>> 
>>> Fully agree about this. For declarative languages like SQL, Cypher and
>>> SPARQL complex compilation will be needed, most probably requiring AST
>>> walk. Writing compilers for all popular languages could be possible in
>>> theory, but it increases the amount of work n times (where n is the
>>> language count) and complicates testing. Also, libraries necessary for the task might not
>>> be available for all languages.
>>> 
>>> In my opinion, to avoid the situation when the number of supported query
>>> languages differs depending on client programming language, it is
>>> preferable to introduce a plugin system. The server might have multiple
>>> endpoints, one for Bytecode, one for SQL, Cypher, etc.
>>> 
>>> If VM, server or compiler is implemented in another language, there is
>>> always a possibility to use something like gRPC or even REST to call
>>> microservice that will do query→Universal Bytecode conversion.
>>> 
>>> Regards,
>>> Dmitry
>>> 
>>> On 2019/05/10 12:03:30, Stephen Mallette  wrote:
> I don’t like that you would have to create a CypherCompiler class (even
> if its just a wrapper) for all popular programming languages. :(

 Yeah, this is the trouble I saw with sparql-gremlin and how to make it so
 that GLVs can support the g.sparql() step properly. It seems like no matter
 what you do, you end up with a situation where the language designer has to
 do something in each programming language they want to support. The bulk of
 the work seems to be in the "compiler" so if that were moved to the server
 (what we did in TP3) then the language designer would only have to write
 that once per VM they wanted to support and then provide a more lightweight
 library for each programming language they supported on the client-side. A
 programming language that had the full compiler implementation would have
 the advantage that they could client-side compile or rely on the server. I
 suppose that a lightweight library would then become the basis for a future
 full blown compiler in that language... hard one.
 
 
 
 On Thu, May 9, 2019 at 6:09 PM Marko Rodriguez 
>>> wrote:
 
> Hello Dmitry,
> 
>> In TP3 compilation to Bytecode can happen on Gremlin Client side or
> Gremlin Server side:
>> 
>> 1. If compilation is simple, it is possible to implement it for all
> Gremlin Clients: Java, Python, JavaScript, .NET...
>> 2. If compilation is complex, it is possible to create a plugin for
> Gremlin Server. Clients send query string, and server does the
>>> compilation.
> 
> Yes, but not for the reasons you state. Every TP3-compliant language
>>> must
> be able to compile to TP3 bytecode. That bytecode is then