Thanks for the reply and questions, Evgenii. Here are some additional details
(long-winded... sorry) to give you some context on the usage.
The reasons for not using a flat table structure are as follows:
1. - A real-world retail transaction has a much larger and more complex
structure than my simple example test case. A single transaction can have
many simple and complex nested elements with varying structures that house
one or more records for terminal, store, item-sales, item-returns,
discounts, rewards, tenders, taxes, customer, salesperson, etc.
2. - The transaction structure can be (and generally is) extended over time,
without impacting the data store or the code.
3. - If we can attain reasonable performance with this 'meta' key-value type
storage structure, then we can tailor the number and selection of individual
elements that need to be queried to each customer's requirements, based on
their need and appetite for big data analytics. This could be achieved
purely through configuration (a list of keys used in queries), without
requiring custom development (see the extraction sketch after the examples
below). For example:
A) Customer with basic financial reporting needs: we simply store a few
records as follows...
pk=1 id=2195 k='trans.body' v='{[full JSON transaction]}'
pk=2 id=2195 k='trans.date' v='2018-12-01'
B) Customer with a larger appetite for complex reporting and analytics: we
would then store several records for the matching keys required for
querying, as follows...
pk=1 id=2195 k='trans.body' v='{[full JSON transaction]}'
pk=2 id=2195 k='trans.date' v='2018-12-01'
pk=3 id=2195 k='trans.item.sale[].sku' v='6005120257329'
pk=4 id=2195 k='trans.item.sale[].sku' v='2460058410533'
pk=5 id=2195 k='trans.cust.lastname' v='Smith'
pk=6 id=2195 k='trans.tender.type.card.amount' v='2595'
pk=7 id=2195 k='trans.amount' v='2595'
pk=8 id=2195 k='trans.tax.type0.amount' v='189'
pk=9 id=2195 k='trans.tax.type1.amount' v='0'
pk=10 id=2195 k='trans.store.id' v='109574'
... (etc)
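To give an idea of the configuration mechanism, here is a rough, untested
sketch (Java, using Jackson) of how a configured key list could drive the
extraction of these rows. The class name is made up, and it assumes the key
paths mirror the structure of the JSON document:

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.ArrayList;
import java.util.List;

// Hypothetical extractor: given a transaction id, its JSON body, and the
// configured list of key paths, emit the (id, k, v) rows to be stored.
public class KeyRowExtractor {
    private final ObjectMapper mapper = new ObjectMapper();

    public List<String[]> extract(String txnId, String json, List<String> keyPaths) throws Exception {
        JsonNode root = mapper.readTree(json);
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {txnId, "trans.body", json}); // always keep the full document
        for (String path : keyPaths)
            collect(root, path.split("\\."), 0, txnId, path, rows);
        return rows;
    }

    // Walks one dotted path; a segment ending in "[]" fans out over the
    // elements of the named array, so a single configured key can yield
    // several rows (as with trans.item.sale[].sku above).
    private void collect(JsonNode node, String[] segs, int i, String id, String key, List<String[]> rows) {
        if (node == null || node.isMissingNode())
            return;
        if (i == segs.length) {
            if (node.isValueNode())
                rows.add(new String[] {id, key, node.asText()});
            return;
        }
        String seg = segs[i];
        if (seg.endsWith("[]")) {
            JsonNode arr = node.get(seg.substring(0, seg.length() - 2));
            if (arr != null && arr.isArray())
                for (JsonNode el : arr)
                    collect(el, segs, i + 1, id, key, rows);
        } else {
            collect(node.get(seg), segs, i + 1, id, key, rows);
        }
    }
}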
The table structure (with all indexes) would be as follows:
CREATE TABLE public.transactions (
pk UUID, id UUID, k VARCHAR, v VARCHAR, PRIMARY KEY (pk, id))
WITH "TEMPLATE=PARTITIONED, BACKUPS=1, ATOMICITY=TRANSACTIONAL,
WRITE_SYNCHRONIZATION_MODE=FULL_SYNC, AFFINITY_KEY=id";
CREATE INDEX transactions_id_k_v ON public.transactions (id, k, v);
CREATE INDEX transactions_k_id_v ON public.transactions (k, id, v);
CREATE INDEX transactions_k_v_id ON public.transactions (k, v, id);
CREATE INDEX transactions_id ON public.transactions (id);
CREATE INDEX transactions_k ON public.transactions (k);
CREATE INDEX transactions_v ON public.transactions (v);
CREATE INDEX transactions_pk ON public.transactions (pk);
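For reference, the kind of lookup these indexes are meant to serve would be
something like the following (a hypothetical, untested sketch, assuming the
thin JDBC driver and a node on localhost): fetch the full JSON body of every
transaction that sold a given SKU, with the (k, v, id) index driving the
filter and the (id, k, v) index driving the body fetch.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class FindBySku {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.ignite.IgniteJdbcThinDriver");
        try (Connection conn = DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1/");
             // Self-join on id: one side filters on the key/value pair,
             // the other side pulls the matching trans.body rows.
             PreparedStatement ps = conn.prepareStatement(
                 "SELECT b.v FROM transactions s JOIN transactions b ON b.id = s.id " +
                 "WHERE s.k = 'trans.item.sale[].sku' AND s.v = ? AND b.k = 'trans.body'")) {
            ps.setString(1, "6005120257329"); // sample SKU from the records above
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next())
                    System.out.println(rs.getString(1)); // full JSON transaction body
            }
        }
    }
}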
Inserting data through plain JDBC INSERT operations is worryingly slow.
Although the docs for the JDBC Client Driver state that setting
'streaming=true' in the connection string will enable streaming, I have
found that although no errors occur, the table remains empty. (I don't know
whether I am doing something wrong, or whether it is a bug.) Do you know of
any examples of doing this?
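For what it's worth, this is roughly what I would try next with the thin
driver instead (an untested sketch, assuming Ignite 2.5+ and a node on
localhost; my understanding is that streamed rows are buffered and only
become visible once streaming is switched off or the connection is closed,
which might also explain the empty table I am seeing):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;
import java.util.UUID;

public class StreamLoad {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.ignite.IgniteJdbcThinDriver");
        try (Connection conn = DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1/")) {
            try (Statement st = conn.createStatement()) {
                st.execute("SET STREAMING ON"); // thin-driver streaming command
            }
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO transactions (pk, id, k, v) VALUES (?, ?, ?, ?)")) {
                // One sample row; a real load would loop over all key rows.
                ps.setObject(1, UUID.randomUUID());
                ps.setObject(2, UUID.randomUUID());
                ps.setString(3, "trans.date");
                ps.setString(4, "2018-12-01");
                ps.executeUpdate(); // buffered, not yet visible to queries
            }
            try (Statement st = conn.createStatement()) {
                st.execute("SET STREAMING OFF"); // flushes the buffered rows
            }
        } // closing the connection also flushes
    }
}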
Many thanks,
Jose