Re: [orientdb] Loading data from CSV into specified cluster

Fabio Rinnone Mon, 18 Apr 2016 02:29:58 -0700

On 18/04/2016 10:25, Roberto Franchini wrote:

> Now, ETL. If you configure ETL to store on a given cluster, all the
> documents loaded will be stored in that cluster.
> So, you can load different partitions of the data on different clusters
> of the same class.
> Suppose you have 12 CSVs, one for each month. Each CSV contains
> invoices for a single month:
> invoices_01.csv contains invoices for January
> invoices_12.csv contains invoices for December
> 
> It could be useful to "partition" the Invoice class into 12 clusters,
> and load each CSV into its own cluster.
> 
> I hope this clarifies the purpose of clusters.


Thank you for the reply. I have read the documentation and I think I
correctly understand the role of clusters in OrientDB.

So, I will explain my issue using your invoices example.

Suppose we have two CSV files, invoices_01.csv defined as follows:

"id","customer","total"
"1","John","1000"
"2","Bob","250"
"3","Jack","630"
"4","Alice","900"

and invoices_02.csv defined as follows:

"id","customer","total"
"1","John","1000"
"2","Bob","250"
"3","Jack","630"
"4","Alice","900"

I would like to create a class named invoices (with its default cluster,
also named invoices) and two more clusters named invoices_01 (for the
data of the first CSV file) and invoices_02 (for the data of the second
one).

I define my first ETL loader as follows:

"loader": {
  "orientdb": {
     "dbURL": "plocal:../databases/invoices",
     "wal": false,
     "tx": false,
     "batchCommit": 10000,
     "dbType": "graph",
     "cluster": "invoices_01",
     "classes": [
       {"name": "invoices", "extends": "V"}
     ],
     "indexes": [
       {"class":"invoices", "fields":["id:integer"], "type":"UNIQUE" }
     ]
  }
}

Note the "cluster" parameter with value "invoices_01" (the second JSON
ETL loader is similar; only the cluster name and the CSV file path change).
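
Since the second loader differs only in the cluster name, it can be
derived from the first one. The following is a minimal Python sketch of
that idea, not part of the ETL tool itself: the variable names are
hypothetical, and the CSV file path is left out because it belongs to the
source section of the ETL file, which is not shown here.

```python
import copy
import json

# Loader section of the first ETL configuration, as shown above.
first_loader = {
    "orientdb": {
        "dbURL": "plocal:../databases/invoices",
        "wal": False,
        "tx": False,
        "batchCommit": 10000,
        "dbType": "graph",
        "cluster": "invoices_01",
        "classes": [{"name": "invoices", "extends": "V"}],
        "indexes": [
            {"class": "invoices", "fields": ["id:integer"], "type": "UNIQUE"}
        ],
    }
}

# Deep-copy so the first loader is left untouched, then change only
# the target cluster name.
second_loader = copy.deepcopy(first_loader)
second_loader["orientdb"]["cluster"] = "invoices_02"

# Print the fragment in the same "loader": {...} form used above.
print('"loader": ' + json.dumps(second_loader, indent=2))
```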

When I launch the first ETL module, I expect it to create a class with
two clusters named invoices and invoices_01, and I expect the invoices
cluster to contain no records and invoices_01 to contain all 4 records
of the CSV file.

But my output is different: it creates two clusters, with IDs 11
(invoices) and 12 (invoices_01), and it loads the data as follows:

[1:vertex] DEBUG Transformer output: v(invoices)[#11:0]
[2:vertex] DEBUG Transformer output: v(invoices)[#12:0]
[3:vertex] DEBUG Transformer output: v(invoices)[#11:1]
[4:vertex] DEBUG Transformer output: v(invoices)[#12:1]

I think this is not correct, because my loader should load data only
into the cluster with ID 12.

When I launch the second ETL loader, the result is similar: it creates a
new cluster named invoices_02 with ID 13, and the log contains:

[1:vertex] DEBUG Transformer output: v(invoices)[#11:2]
[2:vertex] DEBUG Transformer output: v(invoices)[#12:2]
[3:vertex] DEBUG Transformer output: v(invoices)[#13:0]
[4:vertex] DEBUG Transformer output: v(invoices)[#11:3]

I think that the second ETL loader should load data only into the
cluster with ID 13 (invoices_02). In the end I have three clusters (11,
12, 13) which contain 4, 3 and 1 records respectively.

I don't know what my mistake is.
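
As a quick check of these counts, the record IDs from the two logs can be
tallied per cluster. This is a minimal Python sketch: the log lines are
copied from the two runs above, and the regular expression is only an
assumption about the record ID format (#<cluster id>:<position>).

```python
import re
from collections import Counter

# Log lines copied from the two ETL runs above.
log = """\
[1:vertex] DEBUG Transformer output: v(invoices)[#11:0]
[2:vertex] DEBUG Transformer output: v(invoices)[#12:0]
[3:vertex] DEBUG Transformer output: v(invoices)[#11:1]
[4:vertex] DEBUG Transformer output: v(invoices)[#12:1]
[1:vertex] DEBUG Transformer output: v(invoices)[#11:2]
[2:vertex] DEBUG Transformer output: v(invoices)[#12:2]
[3:vertex] DEBUG Transformer output: v(invoices)[#13:0]
[4:vertex] DEBUG Transformer output: v(invoices)[#11:3]
"""

# Capture the cluster id from each record ID of the form #<cluster>:<position>.
rid_pattern = re.compile(r"#(\d+):\d+")

# Count how many records were written to each cluster.
counts = Counter(int(m.group(1)) for m in rid_pattern.finditer(log))

for cluster_id in sorted(counts):
    print(cluster_id, counts[cluster_id])
# prints:
# 11 4
# 12 3
# 13 1
```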

-- 
Fabio Rinnone
Skype: fabiorinnone
Web: http://www.fabiorinnone.eu
