I need to generate edges between nodes using two or more join fields so 
that the match resolves correctly.

It's similar to this question on Stack Overflow: 
<http://stackoverflow.com/questions/39517796/orientdb-etl-edge-transformer-2-joinfieldnames?noredirect=1&lq=1>

The solution given there is to add multiple joinFieldName entries to the 
edge transformer, but this didn't work as expected when I tried it.

If I change the data by appending a new row, 2,1, to each data file, I get 
this:

data1.csv
a1,a2
1,1
1,2
2,3
2,1

data2.csv
b1,b2
1,1
2,3
1,2
2,1

Then, using the JSON provided:

data1.json
{
  "source": { "file": { "path": "./data1.csv" } },
  "extractor": { "csv": {} },
  "transformers": [
    { "vertex": { "class": "A" } }
  ],
  "loader": {
    "orientdb": {
       "dbURL": "plocal:./test.orientdb",
       "dbType": "graph",
       "dbAutoCreate": true,
       "classes": [
         {"name": "A", "extends": "V"},
         {"name": "B", "extends": "V"},
         {"name": "Conn", "extends": "E"}
       ]
    }
  }
}


data2.json
{
  "source": { "file": { "path": "./data2.csv" } },
  "extractor": { "csv": {} },
  "transformers": [
    { "vertex": { "class": "B" } },
    { "edge": { "class": "Conn",
                "joinFieldName": "b1",
                "lookup": "A.a1",
                "joinFieldName": "b2",
                "lookup": "A.a2",
                "direction": "out"
            }}
  ],
  "loader": {
    "orientdb": {
       "dbURL": "plocal:./test.orientdb",
       "dbType": "graph",
       "dbAutoCreate": true,
       "classes": [
         {"name": "B", "extends": "V"},
         {"name": "Conn", "extends": "E"}
       ]
    }
  }
}
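
A side note on the edge transformer above: it repeats the joinFieldName and 
lookup keys inside a single JSON object. The following is a minimal sketch 
using Python's standard json module, not OrientDB's own parser, to show how 
one common parser treats repeated keys. Whether the ETL parser behaves the 
same way is an assumption; the helper name record_duplicates is made up for 
this illustration.

```python
import json

# The edge transformer block from data2.json, repeated keys kept as-is.
edge_config = """
{ "edge": { "class": "Conn",
            "joinFieldName": "b1",
            "lookup": "A.a1",
            "joinFieldName": "b2",
            "lookup": "A.a2",
            "direction": "out" } }
"""

# Python's json module keeps only the last value seen for a repeated key.
parsed = json.loads(edge_config)
print(parsed["edge"])

# object_pairs_hook receives every key/value pair of each object,
# which makes it possible to notice keys that appear more than once.
duplicates = []

def record_duplicates(pairs):
    seen = set()
    for key, _value in pairs:
        if key in seen:
            duplicates.append(key)
        seen.add(key)
    return dict(pairs)

json.loads(edge_config, object_pairs_hook=record_duplicates)
print(duplicates)
```

With this parser, only the b2 / A.a2 pair survives, so the b1 / A.a1 pair 
never reaches whatever code consumes the configuration.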

Running oetl.sh on data1.json and then on data2.json gives me this:
orientdb {db=test.orientdb}> select from v


+----+-----+------+----+----+-------------+----+----+-------------+
|#   |@RID |@CLASS|a1  |a2  |in_Conn      |b2  |b1  |out_Conn     |
+----+-----+------+----+----+-------------+----+----+-------------+
|0   |#25:0|A     |1   |1   |[#41:0,#45:0]|    |    |             |
|1   |#26:0|A     |1   |2   |[#44:0]      |    |    |             |
|2   |#27:0|A     |2   |3   |[#43:0]      |    |    |             |
|3   |#28:0|A     |2   |1   |[#42:0,#46:0]|    |    |             |
|4   |#33:0|B     |    |    |             |1   |1   |[#41:0,#42:0]|
|5   |#34:0|B     |    |    |             |3   |2   |[#43:0]      |
|6   |#35:0|B     |    |    |             |2   |1   |[#44:0]      |
|7   |#36:0|B     |    |    |             |1   |2   |[#45:0,#46:0]|
+----+-----+------+----+----+-------------+----+----+-------------+


8 item(s) found. Query executed in 0.01 sec(s).

This seems wrong to me. If I write out the edges:

A(1,1) <-- #41:0 --- B(1,1)   OK
A(1,1) <-- #45:0 --- B(2,1)   WRONG
A(1,2) <-- #44:0 --- B(1,2)   OK
A(2,3) <-- #43:0 --- B(2,3)   OK
A(2,1) <-- #42:0 --- B(1,1)   WRONG
A(2,1) <-- #46:0 --- B(2,1)   OK

My understanding is that the two joinFieldName entries *should* create an 
AND operation between the two keys, so I expect an A to match a B only if 
A.a1 == B.b1 AND A.a2 == B.b2. That isn't what is happening. From the looks 
of it, the first joinFieldName is ignored and only the second joinFieldName 
entry (b2 against A.a2) is used for the match.
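
To make the difference concrete, here is a small Python sketch that 
simulates both matching rules on the rows from data1.csv and data2.csv 
above. It is only a model of the join logic, not of how ETL is implemented; 
the join helper is a name made up for this illustration.

```python
# Rows copied from data1.csv (a1, a2) and data2.csv (b1, b2).
a_rows = [(1, 1), (1, 2), (2, 3), (2, 1)]
b_rows = [(1, 1), (2, 3), (1, 2), (2, 1)]

def join(match):
    """Return (b, a) pairs where match(b, a) holds; each pair is one B -> A edge."""
    return [(b, a) for b in b_rows for a in a_rows if match(b, a)]

# Expected behavior: both fields must agree (AND).
expected = join(lambda b, a: b[0] == a[0] and b[1] == a[1])

# Behavior that fits the result: only b2 == a2 is checked.
observed = join(lambda b, a: b[1] == a[1])

print(len(expected), "edges expected,", len(observed), "edges observed")
for b, a in observed:
    flag = "OK" if (b, a) in expected else "WRONG"
    print(f"A{a} <-- B{b}   {flag}")
```

Matching on the second field alone yields six edges, two of which are not 
in the AND result: A(1,1) <-- B(2,1) and A(2,1) <-- B(1,1). These are the 
same two edges marked WRONG above.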

Is this a bug? If not, and it's working as intended, how can I set up ETL 
to generate edges between nodes based on more than one field?
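
One possible workaround, sketched here in Python and not confirmed as the 
intended approach, would be to precompute a single composite key column in 
both CSV files before running ETL, then join on that one field. The column 
names a_key and b_key, the "|" separator, and the add_composite_key helper 
are all hypothetical.

```python
import csv
import io

def add_composite_key(csv_text, fields, key_name, sep="|"):
    """Return csv_text with an extra column that joins the given fields."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames + [key_name], lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        row[key_name] = sep.join(row[f] for f in fields)
        writer.writerow(row)
    return out.getvalue()

# Same contents as data1.csv and data2.csv above.
data1 = "a1,a2\n1,1\n1,2\n2,3\n2,1\n"
data2 = "b1,b2\n1,1\n2,3\n1,2\n2,1\n"

data1_keyed = add_composite_key(data1, ["a1", "a2"], "a_key")
data2_keyed = add_composite_key(data2, ["b1", "b2"], "b_key")
print(data1_keyed)
print(data2_keyed)
```

The edge transformer could then carry a single joinFieldName of b_key with 
a lookup of A.a_key, assuming the extractor loads the new column as a plain 
string.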

Thanks!
  -William

