Hi Roberto,
Could you provide a simple example of what you need, in terms of the source tables and the expected result in OrientDB?
Lvc@

On 2 October 2014 09:56, Roberto Cornacchia <[email protected]> wrote:

> Thanks Luca,
>
> I just had a look at the documentation for ETL.
> If I understand correctly, it's not possible to start from the 3 separate
> tables like above (term, doc, term_doc), as the joinFieldName needs to
> refer to a field of one of the Vertex tables (either term or doc in this
> case).
> Is that correct? Or can this still be done?
> So I'd have to join the tables before using the ETL module?
>
> Best, Roberto
>
> On Wednesday, 1 October 2014 18:47:28 UTC+2, Lvc@ wrote:
>>
>> Hi Roberto,
>> I suggest you use the ETL module, which is much more powerful and faster:
>>
>> http://www.orientechnologies.com/docs/last/orientdb-etl.wiki/Import-from-DBMS.html
>>
>> Lvc@
>>
>> On 1 October 2014 17:56, Roberto Cornacchia <[email protected]>
>> wrote:
>>
>>> Hi there,
>>>
>>> I'm trying to approach OrientDB, coming from a relational background.
>>>
>>> I've already looked at this:
>>> http://www.orientechnologies.com/docs/last/orientdb.wiki/Import-RDBMS-to-Graph-Model.html
>>>
>>> However, I'm not sure how I would do this when I start from a
>>> many-to-many relationship.
>>>
>>> *Example (term-doc matrix as used in information retrieval - which term
>>> occurs in which document)*
>>>
>>> -- all terms
>>> CREATE TABLE term (id INTEGER, term STRING);
>>> INSERT INTO term VALUES (0, 'OrientDB');
>>> INSERT INTO term VALUES (1, 'is');
>>> INSERT INTO term VALUES (2, 'cool');
>>>
>>> -- all docs
>>> CREATE TABLE doc (id INTEGER, title STRING);
>>> INSERT INTO doc VALUES (10, 'manual');
>>> INSERT INTO doc VALUES (11, 'license');
>>>
>>> -- many-to-many relations
>>> CREATE TABLE term_doc (term_id INTEGER, doc_id INTEGER);
>>> INSERT INTO term_doc VALUES (0, 10);
>>> INSERT INTO term_doc VALUES (0, 11);
>>> INSERT INTO term_doc VALUES (1, 10);
>>> INSERT INTO term_doc VALUES (2, 10);
>>>
>>> *First question:*
>>> With OrientDB, turning the term and the doc records into Vertex records
>>> is easy. No problem there.
>>> How could I take the content of term_doc and create the respective
>>> edges? Can this be done in the OrientDB SQL console?
>>>
>>> *Second question:*
>>> Suppose I have solved the first issue, and I want to count how many
>>> times each term appears in each document.
>>> In relational, this would be:
>>> SELECT term_id, doc_id, count(*)
>>> FROM term_doc
>>> GROUP BY term_id, doc_id;
>>>
>>> Now, I suppose this is not how you would do it once you have modeled the
>>> term-doc problem in a graph database, not least because when I tried to
>>> run this query against a simple import of term_doc, it was incredibly slow.
>>> So, supposing I have real edges between term and doc, instead of the
>>> explicit table term_doc, how could I obtain the count I mention above?
>>>
>>> Thanks!
>>> Roberto
>>>
>>> --
>>>
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "OrientDB" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> For more options, visit https://groups.google.com/d/optout.
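For readers following the thread, here is a minimal sketch of the relational side of the two questions, run against the sample rows from the original post. It is not OrientDB code and not the ETL module: an in-memory SQLite database stands in for the source DBMS (an assumption, with STRING mapped to TEXT), and the names `counts`, `edges` and `docs_by_term` are purely illustrative. It shows the per-pair count, grouping on the term_id and doc_id columns, and the (term, doc) pairs that each term_doc row would turn into if it became an edge.

```python
import sqlite3
from collections import defaultdict

# Assumption: in-memory SQLite stands in for the relational source.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE term (id INTEGER, term TEXT);
    INSERT INTO term VALUES (0, 'OrientDB'), (1, 'is'), (2, 'cool');

    CREATE TABLE doc (id INTEGER, title TEXT);
    INSERT INTO doc VALUES (10, 'manual'), (11, 'license');

    CREATE TABLE term_doc (term_id INTEGER, doc_id INTEGER);
    INSERT INTO term_doc VALUES (0, 10), (0, 11), (1, 10), (2, 10);
""")

# Second question, relational form: occurrences per (term, doc) pair.
counts = conn.execute(
    "SELECT term_id, doc_id, COUNT(*) FROM term_doc "
    "GROUP BY term_id, doc_id ORDER BY term_id, doc_id"
).fetchall()

# First question, relational form: resolve each term_doc row to the two
# records it connects. Each resulting pair corresponds to one edge.
edges = conn.execute(
    "SELECT t.term, d.title FROM term_doc td "
    "JOIN term t ON t.id = td.term_id "
    "JOIN doc d ON d.id = td.doc_id "
    "ORDER BY td.term_id, td.doc_id"
).fetchall()

# Adjacency view: the documents reachable from each term.
docs_by_term = defaultdict(list)
for term, title in edges:
    docs_by_term[term].append(title)

print(counts)
print(dict(docs_by_term))
```

With real edges in place of term_doc, the per-term figure is simply the number of edges leaving each term vertex, which is what `docs_by_term` mimics here.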
