I would also look at the current setup. I agree with Chris that 500 GB is fairly insignificant.
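To make Chris's steps below a bit more concrete, a rough sketch of the round trip might look something like the commands that follow. Everything here is a placeholder guess at your environment: the JDBC URL, credentials, the ORDERS source table, its order_date/region/amount columns, and the DAILY_SUMMARY target table would all need to be swapped for your actual schema.

  # 1. Stage the rows needed for the summaries into Hive tables on HDFS.
  #    (JDBC URL, credentials, and table/column names are placeholders.)
  sqoop import \
    --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
    --username etl_user -P \
    --table ORDERS \
    --hive-import --hive-table orders \
    --num-mappers 8

  # 2. Crunch the summary in parallel with Hive (Pig or plain MapReduce
  #    would also work for this step).
  hive -e "
  CREATE TABLE IF NOT EXISTS daily_summary (
    order_date STRING, region STRING, total_amount DOUBLE, order_cnt BIGINT)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

  INSERT OVERWRITE TABLE daily_summary
  SELECT order_date, region, SUM(amount), COUNT(*)
  FROM orders
  GROUP BY order_date, region;"

  # 3. Push the summary rows back out to the existing Oracle summary table,
  #    so the current reports keep working unchanged.
  #    (/user/hive/warehouse/daily_summary is the default warehouse path for
  #    a managed table in the default database; adjust if yours differs.)
  sqoop export \
    --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
    --username etl_user -P \
    --table DAILY_SUMMARY \
    --export-dir /user/hive/warehouse/daily_summary \
    --input-fields-terminated-by ',' \
    --num-mappers 8

  # 4. Rebuild/refresh any Oracle indexes on DAILY_SUMMARY as a final step
  #    (e.g. from your existing SQL scripts); not shown here.

Whether something like this actually beats the 10-hour rebuild depends heavily on how much of the 500 GB has to move each night; the Sqoop transfers themselves can easily dominate the runtime, which is Chris's caveat.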
Best,
Vinay Bagare

On Dec 19, 2013, at 12:51 PM, Chris Embree <[email protected]> wrote:

> In big data terms, 500G isn't big. But moving that much data around
> every night is not trivial either. I'm going to guess at a lot here,
> but at a very high level:
>
> 1. Sqoop the data required to build the summary tables into Hadoop.
> 2. Crunch the summaries into new tables (really just files on Hadoop).
> 3. Sqoop the summarized data back out to Oracle.
> 4. Build indexes as needed.
>
> Depending on the size of the data being sqoop'd, this might help. It
> might also take longer. A real solution would require more details
> and analysis.
>
> Chris
>
> On 12/19/13, Jay Vee <[email protected]> wrote:
>> We have a large relational database (~500 GB, hundreds of tables).
>>
>> We have summary tables that we rebuild from scratch each night, which
>> takes about 10 hours. A web interface accesses these summary tables to
>> build reports.
>>
>> There is a business reason for doing a complete rebuild of the summary
>> tables each night, and using views (in the Oracle sense) is not an
>> option at this time.
>>
>> If I wanted to leverage Big Data technologies to speed up the summary
>> table rebuild, what would be the first step toward getting all the data
>> into some big data storage technology?
>>
>> Ideally, in the end, we want to retain the summary tables in a
>> relational database and have reporting work the same without
>> modifications.
>>
>> It's just the crunching of the data and the building of these relational
>> summary tables where we need a significant performance increase.
