Hi Lars,

Did you find a solution to this problem?

On Thursday, January 22, 2015 at 12:36:15 AM UTC+5:30, Lars Plessmann wrote:
>
> I have a really huge CSV file (about 240 GB) with several columns (let's say 
> columns A - H). 
> The first column, A, is the primary key of the main record (vertex 
> MainRecord). But columns D, E, F, and G should each be stored 
> in their own vertex (because these fields are redundant across all records 
> and I don't want to store them in the main record again and again). So each 
> value of columns D-G should be stored as a property called "title" in 
> a new vertex (without generating duplicates). Afterwards, these 
> vertices need to be linked.
> Is it possible to achieve this with a single orient-etl configuration? The 
> only way I know is to split the huge CSV file's columns and 
> create separate files for each vertex. But I don't want to do that if it 
> is not necessary (the file is so big).
> I hope you can give me some advice.
>
> I have tried to describe what I need in the config JSON syntax (of course, 
> this will not work):
>
> {
>   "source": {
>     "file": {
>       "path": "dataexport.csv"
>     }
>   },
>   "extractor": {"row": {}},
>   "transformers": [
>     {
>       "csv": {
>         "separator": ",",
>         "nullValue": "NULL",
>         "skipFrom": -1,
>         "skipTo": -1
>       }
>     },
>     {
>       "field": {
>         "fieldName": "_id",
>         "expression": "$input._id.substring(9, 33)"
>       }
>     },
>     {
>       "field": {
>         "fieldName": "colD",
>         "class": "ColumnD",
>         "classProperty": "title"
>       }
>     },
>     {
>       "field": {
>         "fieldName": "colE",
>         "class": "ColumnE",
>         "classProperty": "title"
>       }
>     },
>     {
>       "field": {
>         "fieldName": "colF",
>         "class": "ColumnF",
>         "classProperty": "title"
>       }
>     },
>     {
>       "field": {
>         "fieldName": "colG",
>         "class": "ColumnG",
>         "classProperty": "title"
>       }
>     },
>     {
>       "vertex": {"class": "MainRecord"}
>     }
>   ],
>   "loader": {
>     "orientdb": {
>       "dbURL": "remote:127.0.0.1/msales_testing",
>       "dbUser": "admin",
>       "dbPassword": "admin",
>       "dbAutoCreate": true,
>       "dbType": "graph",
>       "classes": [
>         {
>           "name": "MainRecord",
>           "extends": "V"
>         },
>         {
>           "name": "ColumnD",
>           "extends": "V"
>         },
>         {
>           "name": "ColumnE",
>           "extends": "V"
>         },
>         {
>           "name": "ColumnF",
>           "extends": "V"
>         },
>         {
>           "name": "ColumnG",
>           "extends": "V"
>         }
>       ],
>       "indexes": [
>         {
>           "class": "MainRecord",
>           "fields": ["_id:string"],
>           "type": "UNIQUE"
>         }
>       ]
>     }
>   }
> }
>
>
>
> By the way: _id is in the MongoDB ObjectID format. I just want to store 
> the original hex value, so I used the substring SQL method to extract the 
> hex id. Maybe there is a better way.
>
>
> regards
> Lars
>
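
One approach that might fit the requirement above, going by the OrientDB ETL documentation, is the "edge" transformer with "unresolvedLinkAction" set to "CREATE". As far as I understand it, each row is first turned into a MainRecord vertex, then the edge transformer looks up the value of colD in the index ColumnD.title; if no vertex is found, one is created with that value as its title, so duplicates should be avoided by the UNIQUE index. The sketch below is untested and rests on several assumptions: the CSV header names the columns colD ... colG as in the config above, the edge class name HasColumnD is made up for illustration, and the index on ColumnD(title) gets the default name ColumnD.title. Only column D is shown; E, F and G would repeat the same edge, class and index entries. The _id field transformer is left out for brevity.

```json
{
  "source": {"file": {"path": "dataexport.csv"}},
  "extractor": {"row": {}},
  "transformers": [
    {"csv": {"separator": ",", "nullValue": "NULL"}},
    {"vertex": {"class": "MainRecord"}},
    {
      "edge": {
        "class": "HasColumnD",
        "joinFieldName": "colD",
        "lookup": "ColumnD.title",
        "direction": "out",
        "unresolvedLinkAction": "CREATE"
      }
    }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "remote:127.0.0.1/msales_testing",
      "dbUser": "admin",
      "dbPassword": "admin",
      "dbAutoCreate": true,
      "dbType": "graph",
      "classes": [
        {"name": "MainRecord", "extends": "V"},
        {"name": "ColumnD", "extends": "V"},
        {"name": "HasColumnD", "extends": "E"}
      ],
      "indexes": [
        {"class": "MainRecord", "fields": ["_id:string"], "type": "UNIQUE"},
        {"class": "ColumnD", "fields": ["title:string"], "type": "UNIQUE"}
      ]
    }
  }
}
```

The edge transformer has to come after the vertex transformer, since it needs a vertex to attach the edge to. I have not verified whether the colD value also stays on the MainRecord vertex as a plain property; if it does, it would have to be removed in a separate step. Please check the option names against the ETL documentation for your OrientDB version before running this on the full file.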
