I have a really huge CSV file (about 240 GB) with several columns (let's say columns A-H). Column A is the primary key of the main record (vertex class MainRecord). Columns D, E, F and G, however, should each be stored in their own vertex, because their values repeat across all the records and I don't want to store them in the main record again and again. The value of each of these columns should be stored as a property called "title" in a separate vertex (without creating duplicates), and these vertices then need to be linked to the main record. Is it possible to achieve this with a single orient-etl configuration?

The only way I know of is to split the huge CSV file by columns and create a separate file for each vertex class. But I don't want to do that unless it is necessary (the file is very big). I hope you can give me some advice.
I tried to describe what I need in the ETL config JSON syntax (of course, this will not work):

    {
      "source": { "file": { "path": "dataexport.csv" } },
      "extractor": { "row": {} },
      "transformers": [
        { "csv": { "separator": ",", "nullValue": "NULL", "skipFrom": -1, "skipTo": -1 } },
        { "field": { "fieldName": "_id", "expression": "$input._id.substring(9, 33)" } },
        { "field": { "fieldName": "colD", "class": "ColumnD", "classProperty": "title" } },
        { "field": { "fieldName": "colE", "class": "ColumnE", "classProperty": "title" } },
        { "field": { "fieldName": "colF", "class": "ColumnF", "classProperty": "title" } },
        { "field": { "fieldName": "colG", "class": "ColumnG", "classProperty": "title" } },
        { "vertex": { "class": "MainRecord" } }
      ],
      "loader": {
        "orientdb": {
          "dbURL": "remote:127.0.0.1/msales_testing",
          "dbUser": "admin",
          "dbPassword": "admin",
          "dbAutoCreate": true,
          "dbType": "graph",
          "classes": [
            { "name": "MainRecord", "extends": "V" },
            { "name": "ColumnD", "extends": "V" },
            { "name": "ColumnE", "extends": "V" },
            { "name": "ColumnF", "extends": "V" },
            { "name": "ColumnG", "extends": "V" }
          ],
          "indexes": [
            { "class": "MainRecord", "fields": ["_id:string"], "type": "UNIQUE" }
          ]
        }
      }
    }

By the way: _id is in MongoDB ObjectID format. I only want to store the original hex value, so I used the SQL substring method to extract the hex ID. Maybe there is a better way.

Regards,
Lars
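For comparison, here is a minimal sketch of how a single configuration might express this. It assumes the OrientDB ETL `edge` transformer's `lookup` and `unresolvedLinkAction: "CREATE"` options behave as described in the ETL documentation for the version in use: for each row, the edge transformer looks up a `ColumnD` vertex whose `title` equals the row's `colD` value, creates that vertex if none exists, and links it to the current `MainRecord` vertex (likewise for E, F and G). The edge class names (`HasColumnD` and so on) are hypothetical, the CSV header is assumed to name the columns `_id` and `colD` to `colG`, and the `_id` transformer is kept unchanged from the config above. JSON has no comments, so these assumptions are stated here rather than inside the config.

```json
{
  "source": { "file": { "path": "dataexport.csv" } },
  "extractor": { "row": {} },
  "transformers": [
    { "csv": { "separator": ",", "nullValue": "NULL", "skipFrom": -1, "skipTo": -1 } },
    { "field": { "fieldName": "_id", "expression": "$input._id.substring(9, 33)" } },
    { "vertex": { "class": "MainRecord" } },
    { "edge": { "class": "HasColumnD", "joinFieldName": "colD",
                "lookup": "ColumnD.title", "unresolvedLinkAction": "CREATE" } },
    { "edge": { "class": "HasColumnE", "joinFieldName": "colE",
                "lookup": "ColumnE.title", "unresolvedLinkAction": "CREATE" } },
    { "edge": { "class": "HasColumnF", "joinFieldName": "colF",
                "lookup": "ColumnF.title", "unresolvedLinkAction": "CREATE" } },
    { "edge": { "class": "HasColumnG", "joinFieldName": "colG",
                "lookup": "ColumnG.title", "unresolvedLinkAction": "CREATE" } },
    { "field": { "fieldName": "colD", "operation": "remove" } },
    { "field": { "fieldName": "colE", "operation": "remove" } },
    { "field": { "fieldName": "colF", "operation": "remove" } },
    { "field": { "fieldName": "colG", "operation": "remove" } }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "remote:127.0.0.1/msales_testing",
      "dbUser": "admin",
      "dbPassword": "admin",
      "dbAutoCreate": true,
      "dbType": "graph",
      "classes": [
        { "name": "MainRecord", "extends": "V" },
        { "name": "ColumnD", "extends": "V" },
        { "name": "ColumnE", "extends": "V" },
        { "name": "ColumnF", "extends": "V" },
        { "name": "ColumnG", "extends": "V" },
        { "name": "HasColumnD", "extends": "E" },
        { "name": "HasColumnE", "extends": "E" },
        { "name": "HasColumnF", "extends": "E" },
        { "name": "HasColumnG", "extends": "E" }
      ],
      "indexes": [
        { "class": "MainRecord", "fields": ["_id:string"], "type": "UNIQUE" },
        { "class": "ColumnD", "fields": ["title:string"], "type": "UNIQUE" },
        { "class": "ColumnE", "fields": ["title:string"], "type": "UNIQUE" },
        { "class": "ColumnF", "fields": ["title:string"], "type": "UNIQUE" },
        { "class": "ColumnG", "fields": ["title:string"], "type": "UNIQUE" }
      ]
    }
  }
}
```

In this sketch the UNIQUE indexes on `title` are meant to spare the lookups a full class scan and to turn an accidental duplicate into an error instead of a silent extra vertex. Whether a lookup sees a vertex created earlier in the same uncommitted batch, whether the trailing `remove` operations actually keep the raw strings off `MainRecord`, and how rows with a NULL in columns D-G are handled may all depend on the OrientDB version, so it is worth running the config on a small slice of the file before starting the full 240 GB import.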