I have a need to create a flat or denormalized data structure in Phoenix.

If the sources are 3 Phoenix tables:

A (idA, colA1, colA2, colA3)
B (idB, colB1, colB2, colB3, idA, idC)
C (idC, colC1, colC2, colC3)

and the user wants to see & write queries on

table ABC (idA|idB|idC, colA1, colA2, colA3, colB1, colB2, colB3, colC1, colC2, colC3)

where "idA|idB|idC" is a compound key of the 3 identifiers and B is the table that has keys to join A & C to it, then it seems to me I could approach this two ways. ( In my case number of rows are rowcountA < rowcountB < rowcountC ).

1) Create a program / MapReduce job that works through B, looks up the matching A and C rows, and writes out a new table "ABC" containing the flattened data. (I could use M/R to do the joins efficiently; a possible Phoenix-only alternative is sketched after my question below.)

2) Join the tables at runtime. But what might the performance of that be?

For certain types of join, it would seem that a query against table B for columns idB, colB1, colB2, colB3, idA, idC could cause a coprocessor to execute inside a region server and pull in the matching data dynamically from A and C.

Assuming that idA != idB != idC, there is no reason to suppose the associated rows from A and C would be local to the same region server, so there would be a lot of network traffic to achieve this naive join, particularly compared to some other, more efficient method.
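Concretely, the runtime join I have in mind for option (2) would be something like the query below (a sketch, assuming a Phoenix version with join support and the schemas above):

    -- Drive from B and look up the matching A and C rows at query time
    SELECT b.idA, b.idB, b.idC,
           a.colA1, a.colA2, a.colA3,
           b.colB1, b.colB2, b.colB3,
           c.colC1, c.colC2, c.colC3
    FROM B b
    JOIN A a ON a.idA = b.idA
    JOIN C c ON c.idC = b.idC;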

Is my thinking about option (2) correct - that, assuming neither A, B, nor C fits into memory, (2) would perform poorly compared to the classic denormalized or flattened table? It just seems so wasteful to store colA1, colA2, colA3 again and again and again.
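(For option (1), instead of hand-rolled M/R, I believe an UPSERT SELECT over the same join could populate ABC in one pass - I haven't verified this, and it assumes the sketched DDL and join support above:

    -- One-shot flatten of the B-driven join into the ABC table
    UPSERT INTO ABC (idA, idB, idC,
                     colA1, colA2, colA3,
                     colB1, colB2, colB3,
                     colC1, colC2, colC3)
    SELECT b.idA, b.idB, b.idC,
           a.colA1, a.colA2, a.colA3,
           b.colB1, b.colB2, b.colB3,
           c.colC1, c.colC2, c.colC3
    FROM B b
    JOIN A a ON a.idA = b.idA
    JOIN C c ON c.idC = b.idC;
)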

Thanks,

Andrew.

