Hi,

Need help with Hadoop cluster capacity planning and hardware specification:
=================================================

We are planning to migrate our existing "Enterprise Data Warehouse" / "Business Intelligence" platform to a Hadoop-based solution. The current system uses Teradata for storage, Ab Initio for ETL, and MicroStrategy for reporting. In the Hadoop solution, we would store all raw CDRs in HDFS and do the ETL processing of those CDRs with Hive or Spark (or any other SQL-on-Hadoop tool); a minimal sketch of the kind of job we have in mind is included below, after the questions. The current Teradata system has 128 TB of storage plus 100 TB+ of CDR files.

Questions:
1. How many nodes are needed to store and process 228 TB (128 TB + 100 TB) of data? My rough back-of-envelope sizing is sketched below; please correct the assumptions.
2. What hardware configuration is required for each slave node and for the master node?
3. Which is the best SQL-on-Hadoop tool for writing ETL jobs? We are considering Hive, Spark, Cassandra, and Cascading for evaluation. Please also suggest any other tools worth evaluating.
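For question 1, this is the rough sizing math I have been working from. The replication factor, ETL headroom, and per-node disk figures are all assumptions on my part, not measured values, so please correct anything that looks off:

```python
import math

# Rough back-of-envelope HDFS sizing. Every constant below is an assumption,
# not a vendor recommendation -- adjust to the actual cluster plan.
RAW_DATA_TB = 128 + 100        # current Teradata data + CDR files = 228 TB
REPLICATION = 3                # HDFS default replication factor
ETL_HEADROOM = 0.25            # space for intermediate/scratch ETL output (assumed)
DISKS_PER_NODE = 12            # e.g. 12 x 4 TB drives per slave node (assumed)
DISK_SIZE_TB = 4
NON_HDFS_RESERVE = 0.25        # fraction of each node's disk kept for OS/logs (assumed)

required_tb = RAW_DATA_TB * REPLICATION * (1 + ETL_HEADROOM)
usable_per_node_tb = DISKS_PER_NODE * DISK_SIZE_TB * (1 - NON_HDFS_RESERVE)

nodes = math.ceil(required_tb / usable_per_node_tb)
print(f"Total HDFS capacity needed: {required_tb:.0f} TB")         # 855 TB
print(f"Usable HDFS space per node: {usable_per_node_tb:.0f} TB")  # 36 TB
print(f"Estimated slave nodes:      {nodes}")                      # 24
```

Under these assumptions it works out to roughly 855 TB of raw HDFS capacity and about 24 slave nodes, before accounting for compression, which could reduce the CDR footprint substantially.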
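And to make question 3 concrete, this is the kind of cleanse-and-load job we intend to write. The HDFS path, column names, and target table are hypothetical placeholders, not our real schema:

```python
from pyspark.sql import SparkSession, functions as F

# Hypothetical sketch only: path, schema, and table names are placeholders.
spark = (SparkSession.builder
         .appName("cdr-etl")
         .enableHiveSupport()
         .getOrCreate())

# Read raw CDR files landed in HDFS (CSV assumed here for illustration).
raw = spark.read.csv("hdfs:///data/raw/cdr/", header=True, inferSchema=True)

# Typical cleanse/derive step: parse the call timestamp, derive a date
# column for partitioning, and drop obviously bad records.
cleaned = (raw
           .withColumn("call_ts",
                       F.unix_timestamp("call_start", "yyyy-MM-dd HH:mm:ss")
                        .cast("timestamp"))
           .withColumn("call_date", F.to_date("call_ts"))
           .filter(F.col("duration_sec") >= 0))

# Load into a partitioned Hive table that the reporting layer can query.
(cleaned.write
        .mode("append")
        .partitionBy("call_date")
        .saveAsTable("dw.cdr_fact"))
```

If a pure SQL interface turns out to be preferable, the same job could be expressed as a HiveQL INSERT ... SELECT instead.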
Please provide your valuable input; thanks for your support.

Thanks,
Saravanan
https://www.linkedin.com/in/saravanan303