[orientdb] Re: Creating edges using ETL that require multiple join fields

William Tue, 11 Apr 2017 13:54:06 -0700

Upon further investigation, I think the issue is explained here 
<http://stackoverflow.com/questions/5306741/do-json-keys-need-to-be-unique> 
with regard to what is happening in data2.json: multiple "joinFieldName" 
entries (and likewise the multiple "lookup" entries) don't cause a JSON parse 
error, but the last one silently overwrites the first.
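Python's standard json module shows the same "last key wins" behaviour, which makes the effect easy to see. This is only an illustration using Python's parser, not OrientDB's own JSON handling. The edge transformer object from data2.json loads with a single joinFieldName/lookup pair, and an object_pairs_hook (the helper name find_duplicate_keys is hypothetical) can flag the duplicates instead of silently dropping them:

```python
import json

# The "edge" transformer object from data2.json, duplicate keys included.
EDGE_CONFIG = """
{ "class": "Conn",
  "joinFieldName": "b1",
  "lookup": "A.a1",
  "joinFieldName": "b2",
  "lookup": "A.a2",
  "direction": "out" }
"""

# Default behaviour: duplicate keys are accepted and the last value wins.
edge = json.loads(EDGE_CONFIG)
print(edge["joinFieldName"], edge["lookup"])  # b2 A.a2


def find_duplicate_keys(pairs):
    """Hypothetical object_pairs_hook that rejects repeated keys."""
    seen, dupes = {}, []
    for key, value in pairs:
        if key in seen:
            dupes.append(key)
        seen[key] = value
    if dupes:
        raise ValueError(f"duplicate keys: {dupes}")
    return seen


try:
    json.loads(EDGE_CONFIG, object_pairs_hook=find_duplicate_keys)
except ValueError as err:
    print(err)  # duplicate keys: ['joinFieldName', 'lookup']
```

A check like this could be run over an ETL config before handing it to oetl.sh, so a second join field never disappears unnoticed.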


Enabling DEBUG messages in oetl.sh for my example resulted in:

OrientDB etl v.2.2.18 (build 3e8d46e73aa087fce245fa1125ab7d984a247f6e) https://www.orientdb.com
[file] INFO Load from file ./data2.csv
BEGIN ETL PROCESSOR
[file] INFO Reading from file ./data2.csv with encoding UTF-8
Started execution with 1 worker threads
[orientdb] DEBUG orientdb: found 0 vertices in class 'null'
[orientdb] DEBUG orientdb: found 0 vertices in class 'null'
Start extracting
[csv] DEBUG document={b2:1,b1:1}
[1:vertex] DEBUG Transformer input: {b2:1,b1:1}
[csv] DEBUG document={b2:3,b1:2}
[csv] DEBUG document={b2:2,b1:1}
[csv] DEBUG document={b2:1,b1:2}
Extraction completed
<SNIP/>
[4:vertex] DEBUG Transformer input: {b2:1,b1:2}
[4:vertex] DEBUG Transformer output: v(B)[#36:0]
[4:edge] DEBUG Transformer input: v(B)[#36:0]
[4:edge] DEBUG joinCurrentValue=1, lookupResult=[#25:0, #28:0]
[4:edge] DEBUG created new edge=e[#45:0][#36:0-Conn->#25:0]
[4:edge] DEBUG created new edge=e[#46:0][#36:0-Conn->#28:0]
[4:edge] DEBUG Transformer output: v(B)[#36:0]
[4:log] DEBUG Transformer input: v(B)[#36:0]
[4:log] INFO >>> v(B)[#36:0]
[4:log] DEBUG Transformer output: v(B)[#36:0]
[orientdb] INFO committing
Pipeline worker done without errors: true
END ETL PROCESSOR

Looking at just the last vertex, B(2,1), which I'd like to link only to 
A(2,1): the lookup uses only the second value (joinCurrentValue=1, i.e. b2), 
so it returns both #25:0 (A(1,1)) and #28:0 (A(2,1)). The edges are only 
being generated based on a match between A.a2 and B.b2.
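A minimal Python sketch of the lookup that effectively runs, assuming (from the DEBUG output above) that only the "b2" / "A.a2" join survives parsing. The rows are the data1.csv and data2.csv values from the original message below, in column order; RIDs are left out. Joining on that single field reproduces all six edges in the query result, including the two unwanted ones:

```python
# Rows from data1.csv as (a1, a2) and data2.csv as (b1, b2).
A_ROWS = [(1, 1), (1, 2), (2, 3), (2, 1)]
B_ROWS = [(1, 1), (2, 3), (1, 2), (2, 1)]


def single_key_edges(a_rows, b_rows):
    """Link each B to every A whose a2 equals its b2 (the surviving join)."""
    edges = []
    for b1, b2 in b_rows:
        for a1, a2 in a_rows:
            if a2 == b2:
                edges.append(((b1, b2), (a1, a2)))
    return edges


edges = single_key_edges(A_ROWS, B_ROWS)
for b, a in edges:
    print(f"B{b} --Conn--> A{a}")
print(len(edges))  # 6
```

B(1,1) and B(2,1) both have b2 = 1, so each of them links to both A(1,1) and A(2,1). That matches the two extra edges in the query output.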

I'm still interested in knowing if it's possible to generate edges between 
nodes using ETL transformers in OrientDB based on multiple keys rather than 
just one.  If it's not possible, I'll have to write some custom code to do 
this... but I'd rather do that through the ETL process if possible :)
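If this does end up as custom code, the intended AND semantics amount to matching on the tuple (a1, a2) == (b1, b2). Below is a minimal Python sketch under that assumption. It uses the sample rows from the original message below and a dict index keyed on the composite value, so each B is resolved with one lookup instead of a scan over every A. The function name and row layout are hypothetical. The sketch only computes which pairs should be linked; writing the edges into OrientDB is left out.

```python
from collections import defaultdict

# Rows from data1.csv and data2.csv as field-name -> value dicts.
A_ROWS = [{"a1": 1, "a2": 1}, {"a1": 1, "a2": 2},
          {"a1": 2, "a2": 3}, {"a1": 2, "a2": 1}]
B_ROWS = [{"b1": 1, "b2": 1}, {"b1": 2, "b2": 3},
          {"b1": 1, "b2": 2}, {"b1": 2, "b2": 1}]


def composite_key_edges(a_rows, b_rows, a_fields, b_fields):
    """Link B to A only when every paired field matches (a logical AND)."""
    # Index A rows by their composite key; a list tolerates duplicate keys.
    index = defaultdict(list)
    for a in a_rows:
        index[tuple(a[f] for f in a_fields)].append(a)
    edges = []
    for b in b_rows:
        key = tuple(b[f] for f in b_fields)
        # .get() avoids inserting empty entries into the defaultdict.
        for a in index.get(key, []):
            edges.append((b, a))
    return edges


edges = composite_key_edges(A_ROWS, B_ROWS, ("a1", "a2"), ("b1", "b2"))
for b, a in edges:
    print(f"B({b['b1']},{b['b2']}) --Conn--> A({a['a1']},{a['a2']})")
print(len(edges))  # 4
```

Passing the field lists as parameters keeps the same function usable for three or more join fields.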



On Tuesday, April 11, 2017 at 2:08:54 PM UTC-6, William wrote:
>
> I have a problem where I need to be able to generate edges between nodes 
> using 2 or more join fields to properly resolve the match.
>
> It's similar to this question on Stack Overflow 
> <http://stackoverflow.com/questions/39517796/orientdb-etl-edge-transformer-2-joinfieldnames?noredirect=1&lq=1>. 
> The solution there is to add multiple joinFieldName entries to the edge 
> transformer, but this isn't quite working as expected when I tried it out.
>
> If I change the data by appending a new row, 2,1, to each data file to get 
> this:
>
> data1.csv
> a1,a2
> 1,1
> 1,2
> 2,3
> 2,1
>
> data2.csv
> b1,b2
> 1,1
> 2,3
> 1,2
> 2,1
>
> then using the json provided:
>
> data1.json
> {
>   "source": { "file": { "path": "./data1.csv" } },
>   "extractor": { "csv": {} },
>   "transformers": [
>     { "vertex": { "class": "A" } }
>   ],
>   "loader": {
>     "orientdb": {
>        "dbURL": "plocal:./test.orientdb",
>        "dbType": "graph",
>        "dbAutoCreate": true,
>        "classes": [
>          {"name": "A", "extends": "V"},
>          {"name": "B", "extends": "V"},
>          {"name": "Conn", "extends": "E"}
>        ]
>     }
>   }
> }
>
>
> data2.json
> {
>   "source": { "file": { "path": "./data2.csv" } },
>   "extractor": { "csv": {} },
>   "transformers": [
>     { "vertex": { "class": "B" } },
>     { "edge": { "class": "Conn",
>                 "joinFieldName": "b1",
>                 "lookup": "A.a1",
>                 "joinFieldName": "b2",
>                 "lookup": "A.a2",
>                 "direction": "out"
>             }}
>   ],
>   "loader": {
>     "orientdb": {
>        "dbURL": "plocal:./test.orientdb",
>        "dbType": "graph",
>        "dbAutoCreate": true,
>        "classes": [
>          {"name": "B", "extends": "V"},
>          {"name": "Conn", "extends": "E"}
>        ]
>     }
>   }
> }
>
> the result of running oetl.sh on data1.json and then on data2.json is 
> this:
> orientdb {db=test.orientdb}> select from v
>
>
> +----+-----+------+----+----+-------------+----+----+-------------+
> |#   |@RID |@CLASS|a1  |a2  |in_Conn      |b2  |b1  |out_Conn     |
> +----+-----+------+----+----+-------------+----+----+-------------+
> |0   |#25:0|A     |1   |1   |[#41:0,#45:0]|    |    |             |
> |1   |#26:0|A     |1   |2   |[#44:0]      |    |    |             |
> |2   |#27:0|A     |2   |3   |[#43:0]      |    |    |             |
> |3   |#28:0|A     |2   |1   |[#42:0,#46:0]|    |    |             |
> |4   |#33:0|B     |    |    |             |1   |1   |[#41:0,#42:0]|
> |5   |#34:0|B     |    |    |             |3   |2   |[#43:0]      |
> |6   |#35:0|B     |    |    |             |2   |1   |[#44:0]      |
> |7   |#36:0|B     |    |    |             |1   |2   |[#45:0,#46:0]|
> +----+-----+------+----+----+-------------+----+----+-------------+
>
>
> 8 item(s) found. Query executed in 0.01 sec(s).
>
> which seems wrong to me... if I write out the edges:
>
> A(1,1) <-- #41:0 --- B(1,1)   OK
> A(1,1) <-- #45:0 --- B(2,1)   WRONG
> A(1,2) <-- #44:0 --- B(1,2)   OK
> A(2,3) <-- #43:0 --- B(2,3)   OK
> A(2,1) <-- #42:0 --- B(1,1)   WRONG
> A(2,1) <-- #46:0 --- B(2,1)   OK
>
> My understanding is that the two joinFieldName entries *should* combine 
> the two keys with an AND, so I expect an A to match a B if A.a1 == B.b1 
> AND A.a2 == B.b2, but that isn't what happens. From the looks of it, the 
> first joinFieldName is ignored and only the second joinFieldName entry is 
> actually used to match.
>
> Is this a bug?  If not and it's working as intended, how can I set up 
> something in ETL to generate edges between nodes based on more than one 
> field?
>
> Thanks!
>   -William
>
>
>

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"OrientDB" group.
