Hi Lars,
Did you find a solution to this problem?
On Thursday, January 22, 2015 at 12:36:15 AM UTC+5:30, Lars Plessmann wrote:
>
> I have a really huge CSV file (about 240 GB) with several columns (let's
> say columns A - H).
> The first column, A, is the primary key of the main record (vertex
> MainRecord). Columns D, E, F and G, however, should each be stored in
> their own vertex, because their values are redundant across all the records
> and I don't want to store them in the main record again and again. So each
> column value of D-G should be stored as a property called "title" in a new
> vertex, without generating duplicates. Afterwards these vertices need to be
> linked.
> Is it possible to achieve this with a single orient-etl configuration? The
> only way I know is to split the huge CSV file by column and create separate
> files for each vertex. But I don't want to do that unless it is necessary
> (the file is so big).
> I hope you can give me some advice.
>
> I'll try to describe what I need in the config JSON syntax (of course,
> this will not work):
>
> {
> "source": {
> "file": {
> "path": "dataexport.csv"
> }
> },
> "extractor": {"row": {}},
> "transformers": [
> {
> "csv": {
> "separator": ",",
> "nullValue": "NULL",
> "skipFrom": -1,
> "skipTo": -1
> }
> },
> {
> "field": {
> "fieldName": "_id",
> "expression": "$input._id.substring(9, 33)"
> }
> },
> {
> "field": {
> "fieldName": "colD",
> "class": "ColumnD",
> "classProperty": "title"
> }
> },
> {
> "field": {
> "fieldName": "colE",
> "class": "ColumnE",
> "classProperty": "title"
> }
> },
> {
> "field": {
> "fieldName": "colF",
> "class": "ColumnF",
> "classProperty": "title"
> }
> },
> {
> "field": {
> "fieldName": "colG",
> "class": "ColumnG",
> "classProperty": "title"
> }
> },
> {
> "vertex": {"class": "MainRecord"}
> }
> ],
> "loader": {
> "orientdb": {
> "dbURL": "remote:127.0.0.1/msales_testing",
> "dbUser": "admin",
> "dbPassword": "admin",
> "dbAutoCreate": true,
> "dbType": "graph",
> "classes": [
> {
> "name": "MainRecord",
> "extends": "V"
> },
> {
> "name": "ColumnD",
> "extends": "V"
> },
> {
> "name": "ColumnE",
> "extends": "V"
> },
> {
> "name": "ColumnF",
> "extends": "V"
> },
> {
> "name": "ColumnG",
> "extends": "V"
> }
> ],
> "indexes": [
> {
> "class": "MainRecord",
> "fields": ["_id:string"],
> "type": "UNIQUE"
> }
> ]
> }
> }
> }
>
>
>
> By the way: _id is in the MongoDB ObjectID format. I just want to store
> the original hex value, so I used the SQL substring method to extract the
> hex ID. Maybe there is a better way.
>
>
> regards
> Lars
>
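
In case it helps anyone else reading this thread, one direction that might be worth trying is the ETL "edge" transformer with "unresolvedLinkAction": "CREATE". As far as I understand the ETL documentation, it looks up the target vertex by an indexed property and creates it when it is missing, which would avoid both the duplicates and the need to split the file. The config below is only an untested sketch: the edge class names (HasColumnD ... HasColumnG) are made up for illustration, it assumes the CSV header names the columns colD ... colG, and it assumes a UNIQUE index on "title" in each target class so that the lookup can resolve. The colD ... colG values would presumably still be stored as properties on MainRecord as well.

```json
{
  "source": { "file": { "path": "dataexport.csv" } },
  "extractor": { "row": {} },
  "transformers": [
    { "csv": { "separator": ",", "nullValue": "NULL" } },
    { "field": { "fieldName": "_id", "expression": "$input._id.substring(9, 33)" } },
    { "vertex": { "class": "MainRecord" } },
    { "edge": { "class": "HasColumnD", "joinFieldName": "colD", "lookup": "ColumnD.title",
                "direction": "out", "unresolvedLinkAction": "CREATE" } },
    { "edge": { "class": "HasColumnE", "joinFieldName": "colE", "lookup": "ColumnE.title",
                "direction": "out", "unresolvedLinkAction": "CREATE" } },
    { "edge": { "class": "HasColumnF", "joinFieldName": "colF", "lookup": "ColumnF.title",
                "direction": "out", "unresolvedLinkAction": "CREATE" } },
    { "edge": { "class": "HasColumnG", "joinFieldName": "colG", "lookup": "ColumnG.title",
                "direction": "out", "unresolvedLinkAction": "CREATE" } }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "remote:127.0.0.1/msales_testing",
      "dbUser": "admin",
      "dbPassword": "admin",
      "dbAutoCreate": true,
      "dbType": "graph",
      "classes": [
        { "name": "MainRecord", "extends": "V" },
        { "name": "ColumnD", "extends": "V" },
        { "name": "ColumnE", "extends": "V" },
        { "name": "ColumnF", "extends": "V" },
        { "name": "ColumnG", "extends": "V" },
        { "name": "HasColumnD", "extends": "E" },
        { "name": "HasColumnE", "extends": "E" },
        { "name": "HasColumnF", "extends": "E" },
        { "name": "HasColumnG", "extends": "E" }
      ],
      "indexes": [
        { "class": "MainRecord", "fields": ["_id:string"], "type": "UNIQUE" },
        { "class": "ColumnD", "fields": ["title:string"], "type": "UNIQUE" },
        { "class": "ColumnE", "fields": ["title:string"], "type": "UNIQUE" },
        { "class": "ColumnF", "fields": ["title:string"], "type": "UNIQUE" },
        { "class": "ColumnG", "fields": ["title:string"], "type": "UNIQUE" }
      ]
    }
  }
}
```

I have not run this against a file anywhere near 240 GB, so it would be worth testing on a small sample of the export first.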
--
---
You received this message because you are subscribed to the Google Groups
"OrientDB" group.