There's an example in the docs <https://orientdb.com/docs/last/Transformer.html> 
that looks like this:

{
   "source":{
      "content":{
         "value":"id,name,surname,friendSince,friendId,friendName,friendSurname\n0,Jay,Miner,1996,1,Luca,Garulli"
      }
   },
   "extractor":{
      "row":{

      }
   },
   "transformers":[
      {
         "csv":{

         }
      },
      {
         "vertex":{
            "class":"V1"
         }
      },
      {
         "edge":{
            "unresolvedLinkAction":"CREATE",
            "class":"Friend",
            "joinFieldName":"friendId",
            "lookup":"V2.fid",
            "targetVertexFields":{
               "name":"${input.friendName}",
               "surname":"${input.friendSurname}"
            },
            "edgeFields":{
               "since":"${input.friendSince}"
            }
         }
      },
      {
         "field":{
            "fieldNames":[
               "friendSince",
               "friendId",
               "friendName",
               "friendSurname"
            ],
            "operation":"remove"
         }
      }
   ],
   "loader":{
      "orientdb":{
         "dbURL":"memory:ETLBaseTest",
         "dbType":"graph",
         "useLightweightEdges":false
      }
   }
}

In the *edge* transformer's *lookup* field, what do *V2* and *fid* refer 
to? *V2* is not defined in the vertex transformer and *fid* is not a column 
in the CSV input. Where do they come from?

In particular, I have two CSV files:

*users.csv:*
username,first_name,last_name
user1,John,Doe
user2,Jane,Doe
user3,Gene,Doe

*user_friends.csv:*
username,friend_name
user1,user2
user1,user3
user2,user1
user2,user3
user3,user1
user3,user2

I first import the users.csv using this ETL config:

{
  "source":       {
    "file": {
      "path": "/tmp/users.csv"
    }
  },
  "extractor":    {
    "csv": {}
  },
  "transformers": [
    {
      "vertex": {
        "class": "User"
      }
    }
  ],
  "loader":       {
    "orientdb": {
      "dbURL":   "plocal:/temp/databases/users_friends",
      "dbType":  "graph",
      "classes": [
        {
          "name":    "User",
          "extends": "V"
        },
        {
          "name":    "HasFriend",
          "extends": "E"
        }
      ],
      "indexes": [
        {
          "class":  "User",
          "fields": [
            "username:string"
          ],
          "type":   "UNIQUE"
        }
      ]
    }
  }
}

All the records are imported without any errors. I then want to import 
the friendship CSV using the following ETL config:

{
  "source":       {
    "file": {
      "path": "/tmp/user_friends.csv"
    }
  },
  "extractor":    {
    "csv": {}
  },
  "transformers": [
    {
      "vertex": {
        "class": "User"
      }
    },
    {
      "edge": {
        "class":         "HasFriend",
        "joinFieldName": "friend_name",
        "lookup":        "User.username",
        "direction":     "in"
      }
    }
  ],
  "loader":       {
    "orientdb": {
      "dbURL":   "plocal:/temp/databases/users_friends",
      "dbType":  "graph",
      "classes": [
        {
          "name":    "User",
          "extends": "V"
        },
        {
          "name":    "HasFriend",
          "extends": "E"
        }
      ],
      "indexes": [
        {
          "class":  "User",
          "fields": [
            "username:string"
          ],
          "type":   "UNIQUE"
        }
      ]
    }
  }
}

However, the import fails, apparently because the same username can appear 
in multiple rows of the second CSV file:

Uncaught exception in thread 'pool-2-thread-1'
com.orientechnologies.orient.core.storage.ORecordDuplicatedException: 
Cannot index record User{friend_name:user-2,username:user1}: found 
duplicated key 'user-0' in index 'User.username' previously assigned to the 
record #25:0
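
To make the repetition concrete, here is a small Python sketch, separate from 
the ETL run itself, that counts how often each username occurs in 
user_friends.csv. The helper name username_counts is made up for illustration, 
and the CSV text is copied from the sample above. It assumes that the vertex 
transformer attempts one User insert per row, which is my reading of the 
error rather than something I have confirmed.

```python
import csv
import io
from collections import Counter

# Contents of user_friends.csv, copied from the sample above.
USER_FRIENDS_CSV = """username,friend_name
user1,user2
user1,user3
user2,user1
user2,user3
user3,user1
user3,user2
"""


def username_counts(csv_text):
    """Count how many rows carry each value of the `username` column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["username"] for row in reader)


counts = username_counts(USER_FRIENDS_CSV)

# Usernames that occur in more than one row. Under the assumption above,
# each of these would be inserted more than once against the UNIQUE index.
repeated = sorted(name for name, n in counts.items() if n > 1)
print(repeated)
```

Every username in the sample shows up in two rows, so a unique index on 
User.username would be hit by any approach that creates a new vertex per row.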

Is there a way to handle scenarios like this?

Thanks in advance.
