On 18/04/2016 10:25, Roberto Franchini wrote:
> Now, ETL. If you configure ETL to store on a given cluster, all the
> documents loaded will be stored in that cluster.
> So, you can load different data partitions on different clusters of
> the same class.
> Suppose you have 12 CSVs, one for each month. Each CSV
> contains invoices for a single month:
> invoices_01.csv contains invoices for January
> invoices_12.csv contains invoices for December
>
> It could be useful to "partition" the Invoice class into 12 clusters, and load
> each CSV into its own cluster.
>
> I hope this clarifies the purpose of clusters.
Thank you for the reply. I have read the documentation and I think I
correctly understand the role of clusters in OrientDB.
So, I will explain my issue using your invoices example.
Suppose we have two CSV files: invoices_01.csv, defined as follows:
"id","customer","total"
"1","John","1000"
"2","Bob","250"
"3","Jack","630"
"4","Alice","900"
and invoices_02.csv, defined as follows:
"id","customer","total"
"1","John","1000"
"2","Bob","250"
"3","Jack","630"
"4","Alice","900"
So, I would like to create a class named invoices (with the default main
cluster, also named invoices) and two more clusters, named invoices_01 (for
the data of the first CSV file) and invoices_02 (for the data of the
second one).
I define my first ETL loader as follows:
"loader": {
  "orientdb": {
    "dbURL": "plocal:../databases/invoices",
    "wal": false,
    "tx": false,
    "batchCommit": 10000,
    "dbType": "graph",
    "cluster": "invoices_01",
    "classes": [
      {"name": "invoices", "extends": "V"}
    ],
    "indexes": [
      {"class": "invoices", "fields": ["id:integer"], "type": "UNIQUE"}
    ]
  }
}
Note the parameter "cluster" with value "invoices_01" (the second JSON
ETL loader is similar; only the cluster name and the CSV file path change).
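Since the two loaders differ only in the cluster name and the CSV path, here is a minimal sketch that builds both configurations from one template. The "loader" values are copied from the configuration above. The "source", "extractor" and "transformers" sections are assumptions, since only the "loader" section appears in this message. The function name build_config is hypothetical.

```python
import json


def build_config(month: str) -> dict:
    """Build a (hypothetical) ETL config for one monthly CSV file.

    Only the CSV path and the target cluster depend on `month`.
    The "loader" block mirrors the one above; "source", "extractor"
    and "transformers" are assumed, not taken from the original post.
    """
    return {
        # Assumed: location of the monthly CSV file
        "source": {"file": {"path": f"invoices_{month}.csv"}},
        # Assumed: plain CSV extraction with default options
        "extractor": {"csv": {}},
        # Assumed: one vertex of class "invoices" per CSV row
        "transformers": [{"vertex": {"class": "invoices"}}],
        "loader": {
            "orientdb": {
                "dbURL": "plocal:../databases/invoices",
                "wal": False,
                "tx": False,
                "batchCommit": 10000,
                "dbType": "graph",
                # The only loader setting that changes between months
                "cluster": f"invoices_{month}",
                "classes": [{"name": "invoices", "extends": "V"}],
                "indexes": [
                    {"class": "invoices", "fields": ["id:integer"], "type": "UNIQUE"}
                ],
            }
        },
    }


if __name__ == "__main__":
    # Print one JSON config per month; redirect each to its own file.
    for month in ("01", "02"):
        print(json.dumps(build_config(month), indent=2))
```

Generating both files from one function guarantees that nothing except the cluster name and path can drift between the two runs.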
When I launch the first ETL module, I expect it to create a class with two
clusters, named invoices and invoices_01, and I expect the invoices
cluster to contain no records and invoices_01 to contain all 4
records of the CSV file.
But the output is different: it creates two clusters with ids
11 (invoices) and 12 (invoices_01), and it loads the data into the
clusters as follows:
[1:vertex] DEBUG Transformer output: v(invoices)[#11:0]
[2:vertex] DEBUG Transformer output: v(invoices)[#12:0]
[3:vertex] DEBUG Transformer output: v(invoices)[#11:1]
[4:vertex] DEBUG Transformer output: v(invoices)[#12:1]
I think this is not correct, because my loader should load data
only into the cluster with id 12.
Likewise, when I launch the second ETL loader the result is similar: it
creates a new cluster named invoices_02 with id 13, and the log contains:
[1:vertex] DEBUG Transformer output: v(invoices)[#11:2]
[2:vertex] DEBUG Transformer output: v(invoices)[#12:2]
[3:vertex] DEBUG Transformer output: v(invoices)[#13:0]
[4:vertex] DEBUG Transformer output: v(invoices)[#11:3]
I think that the second ETL loader should load data only into the cluster
with id 13 (invoices_02). In the end I have three clusters (11, 12, 13),
which contain 4, 3 and 1 records respectively.
I don't know what I am doing wrong.
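The RIDs in both logs fit one hypothesis, which this thread does not confirm: each new record is spread round-robin over every cluster of the invoices class instead of being sent to the cluster named in the loader. OrientDB classes do support configurable cluster-selection strategies, round-robin among them. Below is a minimal sketch of that assumed behaviour. It further assumes that each ETL run restarts from the class's first cluster and that each cluster keeps its own position counter. The function name assign_round_robin is hypothetical and does not model OrientDB's actual code. Under these assumptions, the sketch reproduces the logged RIDs and the final 4/3/1 per-cluster counts.

```python
from collections import defaultdict


def assign_round_robin(cluster_ids, n_records, next_pos):
    """Assign n_records new records to cluster_ids in round-robin order.

    Assumptions (not confirmed by the thread): each load starts again
    from the first cluster of the class, and every cluster keeps its own
    next-position counter in `next_pos`. Returns "#cluster:position" RIDs.
    """
    rids = []
    for i in range(n_records):
        cid = cluster_ids[i % len(cluster_ids)]
        rids.append(f"#{cid}:{next_pos[cid]}")
        next_pos[cid] += 1
    return rids


next_pos = defaultdict(int)  # next free position in each cluster
# First run: class invoices has clusters 11 (invoices) and 12 (invoices_01)
first = assign_round_robin([11, 12], 4, next_pos)
# Second run: cluster 13 (invoices_02) has been added to the class
second = assign_round_robin([11, 12, 13], 4, next_pos)

print(first)           # ['#11:0', '#12:0', '#11:1', '#12:1']
print(second)          # ['#11:2', '#12:2', '#13:0', '#11:3']
print(dict(next_pos))  # {11: 4, 12: 3, 13: 1}
```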
--
Fabio Rinnone
Skype: fabiorinnone
Web: http://www.fabiorinnone.eu
--
---
You received this message because you are subscribed to the Google Groups
"OrientDB" group.
