So, suppose I start from the relational tables in my original post.

I guess, if I tried to translate that by hand, I'd get something like:
-- all terms
CREATE CLASS term extends V;
CREATE property id integer;
CREATE property term string;
INSERT INTO term VALUES (0, 'OrientDB');
INSERT INTO term VALUES (1, 'is');
INSERT INTO term VALUES (2, 'cool');

-- all docs
CREATE CLASS doc extends V;
CREATE property id integer;
CREATE property title string;
INSERT INTO doc VALUES (10, 'manual');
INSERT INTO doc VALUES (11, 'license');

-- the edges
CREATE CLASS occurs_in extends E
CREATE EDGE occurs_in from (select from term where id=0) to (select from 
doc where id=10)
CREATE EDGE occurs_in from (select from term where id=0) to (select from 
doc where id=11)
CREATE EDGE occurs_in from (select from term where id=1) to (select from 
doc where id=10)
CREATE EDGE occurs_in from (select from term where id=2) to (select from
 doc where id=10)

This is how I would do it for a small example like this. 
But in a real scenario, when the relationships come from a stored (large) 
table (term_doc above) - this is a standard many-to-many relationship 
schema - I don't know how to construct the edges (either with SQL or with 
the ETL processor): somehow I need 2 joins (term -> term_doc <- doc) in 
order to connect the vertices of term and doc.


Background: the reason why I'm trying to do this is to replicate real data 
with real data distributions and sizes, from a relational DB into OrientDB, 
in order to run some tests and see how it would fit my needs (where it 
would be slower/faster, which features would be available/missing, etc). 
Obviously, if I want to compare the two, I need to map the problem into the 
graph space, and not use OrientDB in a relational way (as I mentioned 
above, I tried this and as one could expected the performance loss is huge).

Roberto

On Thursday, 2 October 2014 10:35:44 UTC+2, Lvc@ wrote:
>
> Hi Roberto,
> Could you provide a simple example of what you need in terms of source 
> tables and expected result in OrientDB?
>
> Lvc@
> ᐧ
>
> On 2 October 2014 09:56, Roberto Cornacchia <[email protected] 
> <javascript:>> wrote:
>
>> Thanks Luca,
>>
>> I just had a look at the documentation for ETL.
>> If I understand correctly, it's not possible to start from the 3 separate 
>> tables like above (term, doc, term_doc), as the joinFieldName needs to 
>> refer to a field of one of the Vertex tables (either term or doc in this 
>> case). 
>> Is that correct? Or can this still be done?
>> So I'd have to join the tables before using the ETL module?
>>
>>
>> Best, Roberto
>>
>> On Wednesday, 1 October 2014 18:47:28 UTC+2, Lvc@ wrote:
>>>
>>> Hi Roberto,
>>> I suggest you to use the ETL module: much more powerful and fast:
>>>
>>> http://www.orientechnologies.com/docs/last/orientdb-etl.
>>> wiki/Import-from-DBMS.html
>>>
>>> Lvc@
>>>
>>> ᐧ
>>>
>>> On 1 October 2014 17:56, Roberto Cornacchia <[email protected]> 
>>> wrote:
>>>
>>>> Hi there,
>>>>
>>>> I'm trying to approach OrientDB, coming from relational background.
>>>>
>>>> I've already looked at this: http://www.orientechnologies.com/docs/
>>>> last/orientdb.wiki/Import-RDBMS-to-Graph-Model.html
>>>>
>>>> However, I'm not sure how I would do this when I start from a 
>>>> many-to-many relationship.
>>>>
>>>> *Example (term-doc matrix as used in information retrieval - which term 
>>>> occurs in which document)*
>>>>
>>>> -- all terms
>>>> CREATE TABLE term (id INTEGER, term STRING);
>>>> INSERT INTO term VALUES (0, 'OrientDB');
>>>> INSERT INTO term VALUES (1, 'is');
>>>> INSERT INTO term VALUES (2, 'cool');
>>>>
>>>> -- all docs
>>>> CREATE TABLE doc (id INTEGER, title STRING);
>>>> INSERT INTO doc VALUES (10, 'manual');
>>>> INSERT INTO doc VALUES (11, 'license');
>>>>
>>>> -- many-to-many relations
>>>> CREATE TABLE term_doc (term_id INTEGER, doc_id INTEGER);
>>>> INSERT INTO term_doc VALUES (0, 10);
>>>> INSERT INTO term_doc VALUES (0, 11);
>>>> INSERT INTO term_doc VALUES (1, 10);
>>>> INSERT INTO term_doc VALUES (2, 10);
>>>>
>>>> *First question:*
>>>> With OrientDB, turning the term and the doc records into Vertex records 
>>>> is easy. No problem there.
>>>> How could I take the content of term_doc and create the respective 
>>>> edges? Can this be done in the OrientDB SQL console?
>>>>
>>>>
>>>> *Second question:*Supposing I have solved the first issue, and I want 
>>>> to count how many times each term appears in each document.
>>>> In relational, this would be:
>>>> SELECT term_id, doc_id, count(*)
>>>> FROM   term_doc
>>>> GROUP  BY term,doc;
>>>>
>>>> Now, I suppose this is not how you would do it once you have modeled 
>>>> the term-doc problem into a graph database. Also because when I tried to 
>>>> do 
>>>> run this query against a simple import of term_doc, it was incredibly slow.
>>>> So, supposing I have real edges between term and doc, instead of the 
>>>> explicit table term_doc, how could I obtain the count I mention above?
>>>>
>>>> Thanks!
>>>> Roberto
>>>>
>>>> -- 
>>>>
>>>> --- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "OrientDB" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected].
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>>  -- 
>>
>> --- 
>> You received this message because you are subscribed to the Google Groups 
>> "OrientDB" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected] <javascript:>.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"OrientDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Reply via email to