So, suppose I start from the relational tables in my original post. I guess, if I tried to translate that by hand, I'd get something like: -- all terms CREATE CLASS term extends V; CREATE property id integer; CREATE property term string; INSERT INTO term VALUES (0, 'OrientDB'); INSERT INTO term VALUES (1, 'is'); INSERT INTO term VALUES (2, 'cool');
-- all docs CREATE CLASS doc extends V; CREATE property id integer; CREATE property title string; INSERT INTO doc VALUES (10, 'manual'); INSERT INTO doc VALUES (11, 'license'); -- the edges CREATE CLASS occurs_in extends E CREATE EDGE occurs_in from (select from term where id=0) to (select from doc where id=10) CREATE EDGE occurs_in from (select from term where id=0) to (select from doc where id=11) CREATE EDGE occurs_in from (select from term where id=1) to (select from doc where id=10) CREATE EDGE occurs_in from (select from term where id=2) to (select from doc where id=10) This is how I would do it for a small example like this. But in a real scenario, when the relationships come from a stored (large) table (term_doc above) - this is a standard many-to-many relationship schema - I don't know how to construct the edges (either with SQL or with the ETL processor): somehow I need 2 joins (term -> term_doc <- doc) in order to connect the vertices of term and doc. Background: the reason why I'm trying to do this is to replicate real data with real data distributions and sizes, from a relational DB into OrientDB, in order to run some tests and see how it would fit my needs (where it would be slower/faster, which features would be available/missing, etc). Obviously, if I want to compare the two, I need to map the problem into the graph space, and not use OrientDB in a relational way (as I mentioned above, I tried this and as one could expected the performance loss is huge). Roberto On Thursday, 2 October 2014 10:35:44 UTC+2, Lvc@ wrote: > > Hi Roberto, > Could you provide a simple example of what you need in terms of source > tables and expected result in OrientDB? > > Lvc@ > ᐧ > > On 2 October 2014 09:56, Roberto Cornacchia <[email protected] > <javascript:>> wrote: > >> Thanks Luca, >> >> I just had a look at the documentation for ETL. >> If I understand correctly, it's not possible to start from the 3 separate >> tables like above (term, doc, term_doc), as the joinFieldName needs to >> refer to a field of one of the Vertex tables (either term or doc in this >> case). >> Is that correct? Or can this still be done? >> So I'd have to join the tables before using the ETL module? >> >> >> Best, Roberto >> >> On Wednesday, 1 October 2014 18:47:28 UTC+2, Lvc@ wrote: >>> >>> Hi Roberto, >>> I suggest you to use the ETL module: much more powerful and fast: >>> >>> http://www.orientechnologies.com/docs/last/orientdb-etl. >>> wiki/Import-from-DBMS.html >>> >>> Lvc@ >>> >>> ᐧ >>> >>> On 1 October 2014 17:56, Roberto Cornacchia <[email protected]> >>> wrote: >>> >>>> Hi there, >>>> >>>> I'm trying to approach OrientDB, coming from relational background. >>>> >>>> I've already looked at this: http://www.orientechnologies.com/docs/ >>>> last/orientdb.wiki/Import-RDBMS-to-Graph-Model.html >>>> >>>> However, I'm not sure how I would do this when I start from a >>>> many-to-many relationship. >>>> >>>> *Example (term-doc matrix as used in information retrieval - which term >>>> occurs in which document)* >>>> >>>> -- all terms >>>> CREATE TABLE term (id INTEGER, term STRING); >>>> INSERT INTO term VALUES (0, 'OrientDB'); >>>> INSERT INTO term VALUES (1, 'is'); >>>> INSERT INTO term VALUES (2, 'cool'); >>>> >>>> -- all docs >>>> CREATE TABLE doc (id INTEGER, title STRING); >>>> INSERT INTO doc VALUES (10, 'manual'); >>>> INSERT INTO doc VALUES (11, 'license'); >>>> >>>> -- many-to-many relations >>>> CREATE TABLE term_doc (term_id INTEGER, doc_id INTEGER); >>>> INSERT INTO term_doc VALUES (0, 10); >>>> INSERT INTO term_doc VALUES (0, 11); >>>> INSERT INTO term_doc VALUES (1, 10); >>>> INSERT INTO term_doc VALUES (2, 10); >>>> >>>> *First question:* >>>> With OrientDB, turning the term and the doc records into Vertex records >>>> is easy. No problem there. >>>> How could I take the content of term_doc and create the respective >>>> edges? Can this be done in the OrientDB SQL console? >>>> >>>> >>>> *Second question:*Supposing I have solved the first issue, and I want >>>> to count how many times each term appears in each document. >>>> In relational, this would be: >>>> SELECT term_id, doc_id, count(*) >>>> FROM term_doc >>>> GROUP BY term,doc; >>>> >>>> Now, I suppose this is not how you would do it once you have modeled >>>> the term-doc problem into a graph database. Also because when I tried to >>>> do >>>> run this query against a simple import of term_doc, it was incredibly slow. >>>> So, supposing I have real edges between term and doc, instead of the >>>> explicit table term_doc, how could I obtain the count I mention above? >>>> >>>> Thanks! >>>> Roberto >>>> >>>> -- >>>> >>>> --- >>>> You received this message because you are subscribed to the Google >>>> Groups "OrientDB" group. >>>> To unsubscribe from this group and stop receiving emails from it, send >>>> an email to [email protected]. >>>> For more options, visit https://groups.google.com/d/optout. >>>> >>> >>> -- >> >> --- >> You received this message because you are subscribed to the Google Groups >> "OrientDB" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to [email protected] <javascript:>. >> For more options, visit https://groups.google.com/d/optout. >> > > -- --- You received this message because you are subscribed to the Google Groups "OrientDB" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. For more options, visit https://groups.google.com/d/optout.
