Hi Luca,

I know these fields do not exist. This should only be an example what I'm 
looking for for a better understanding (see my note above the config file 
;)).
The Wiki is not really informative about complex Transformation :-(

I think its more efficient to store records in its own classes instead of 
building document records with lots of redundant fields (esp. during 
reporting/OLAP)? So that is the reason I want to use multiple classes 
(reduce space and enhance query speed).

So let me make an concrete example of an CSV file I want to process:

id, voucherid, description,               creationdate,          brand,     
model,       customerid
1,  4711,      "Special Offer III",       "2012-05-02T09:12:17", "Samsung", 
"Galaxy S3", 50112
2,  4712,      "Special Offer II",        "2012-05-03T09:1417",  "HTC",     
"One",       50113
3,  4713,      "Special Offer III",       "2012-05-04T10:17:17", "HTC",     
"One",       50002
4,  4714,      "Special Offer for Mr XY", "2012-05-04T11:09:17", "Apple",   
"iPhone 5",  50017
[...]

So I want to create the class "Order" (extends V), "Brand" (extends V), 
"Model" (extends V), "Customer" (extends V).
The Brand and Model and the Customer should be linked with the Order and I 
do not want duplicates in the class Order, Brand, Model, Customer.

class Order gets the properties voucherid, description, creationdate.
class Brand gets the property title, where the value of csv field "brand" 
is stored.
class Model gets the property title, where the value of csv field "model" 
is stored.
class Customer gets the property customerid (firstname, lastname etc.), 
where the csv field "customerid" is stored (and firstname, lastname etc.)

I want to run a ETL job just one time (because there are millions of 
records) because its a huge file instead of running the job for each Class 
again and again.
I hope you know what I want?


regards
Lars


Am Mittwoch, 21. Januar 2015 23:09:40 UTC+1 schrieb Lvc@:
>
> Hi Lars,
> I don't know the "class" and "classProperty" fields in "field" 
> transformer. This is the syntax:
>
>
> http://www.orientechnologies.com/docs/last/orientdb-etl.wiki/Transformer.html#field
>
> Furthermore why do you need multiple classes?
>
> Lvc@
>
>
> On 21 January 2015 at 20:06, Lars Plessmann <lples...@gmail.com 
> <javascript:>> wrote:
>
>> I have a really huge CSV (about 240GB) file with several columns (lets 
>> say there are columns A - H). 
>> The first column A is the primary key of the main record (vertex 
>> MainRecord). But the columns D, E, F, G are columns which should be stored 
>> in an own vertex (because these fields are redundant over all the records 
>> and I dont want to store them in the main record again and again). So the 
>> column value of D-G itself should be stored as a property called "title" in 
>> a new vertex (but it should not generate duplicates). Afterwards these 
>> vertexes needs to be linked.
>> Is this possible to reach this with an single orient-etl configuration? I 
>> think the only way I know is to split the huge csv file's columns and 
>> create sepperate files for each vertex. But I dont want to do this if that 
>> is not neccessairy (file is so big).
>> I hope you can give me an advice?
>>
>> I try to describe it in the config json syntax what I need (of course, 
>> this will not work):
>>
>> {
>>   "source": {
>>     "file": {
>>       "path": "dataexport.csv"
>>     }
>>   },
>>   "extractor": {"row": {}},
>>   "transformers": [
>>     {
>>       "csv": {
>>         "separator": ",",
>>         "nullValue": "NULL",
>>         "skipFrom": -1,
>>         "skipTo": -1
>>       }
>>     },
>>     {
>>       "field": {
>>         "fieldName": "_id",
>>         "expression": "$input._id.substring(9, 33)"
>>       }
>>     },
>>     {
>>       "field": {
>>         "fieldName": "colD",
>>         "class": "ColumnD",
>>         "classProperty": "title"
>>       }
>>     },
>>     {
>>       "field": {
>>         "fieldName": "colE",
>>         "class": "ColumnE",
>>         "classProperty": "title"
>>       }
>>     },
>>     {
>>       "field": {
>>         "fieldName": "colF",
>>         "class": "ColumnF",
>>         "classProperty": "title"
>>     }    {
>>       "field": {
>>         "fieldName": "colG",
>>         "class": "ColumnG",
>>         "classProperty": "title"
>>       }
>>     }
>>     },
>>     {
>>       "vertex": {"class": "MainRecord"}
>>     }
>>   ],
>>   "loader": {
>>     "orientdb": {
>>       "dbURL": "remote:127.0.0.1/msales_testing",
>>       "dbUser": "admin",
>>       "dbPassword": "admin",
>>       "dbAutoCreate": true,
>>       "dbType": "graph",
>>       "classes": [
>>         {
>>           "name": "MainRecord",
>>           "extends": "V"
>>         },
>>         {
>>           "name": "ColumnD",
>>           "extends": "V"
>>         },
>>         {
>>           "name": "ColumnE",
>>           "extends": "V"
>>         },
>>         {
>>           "name": "ColumnF",
>>           "extends": "V"
>>         },
>>         {
>>           "name": "ColumnG",
>>           "extends": "V"
>>         }
>>       ],
>>       "indexes": [
>>         {
>>           "class": "MainRecord",
>>           "fields": ["_id:string"],
>>           "type": "UNIQUE"
>>         }
>>       ]
>>     }
>>   }
>> }
>>
>>
>>
>> By the way: _id is in the MongoDB ObjectID format. I just want to store 
>> the original hex value, so I used the substring sql method to extract the 
>> hex id. Maybe there is a better way.
>>
>>
>> regards
>> Lars
>>
>> -- 
>>
>> --- 
>> You received this message because you are subscribed to the Google Groups 
>> "OrientDB" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to orient-databa...@googlegroups.com <javascript:>.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"OrientDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to orient-database+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to