[GitHub] [spark] DJeyCodeX opened a new pull request #27506: Demo Pipeline of Migrating OnPremise Database to Spark & Hadoop

GitBox Sat, 08 Feb 2020 23:34:55 -0800

DJeyCodeX opened a new pull request #27506: Demo Pipeline of Migrating 
OnPremise Database to Spark & Hadoop
URL: https://github.com/apache/spark/pull/27506
 
 
   ### What changes were proposed in this pull request?
   This PR consists of the following:
   
   1. Many organisations face difficulties migrating their existing 
databases, such as MySQL, to the Hadoop & Spark ecosystem.
   2. Created a demo pipeline that covers 3 use cases:
      **Case 1: Storing the data in HDFS part files & then reading them in 
Spark**
      **Case 2: Converting the data to Parquet format & then reading the 
Parquet files in Spark**
      **Special Case: Analysing the data directly in Spark from MySQL without 
storing it in HDFS**
   3. Finally, after all the aggregations in Spark, generating a reporting 
dashboard using Tableau.
   
   This code may help Spark users who want to perform a similar migration.
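
The three use cases listed above could look roughly like the following PySpark sketch. It is not the code from this pull request. The helper names (`mysql_jdbc_options`, `run_demo`, `main`), the host, database, table, credentials, and HDFS paths are all hypothetical. It also assumes that the part files are written as CSV by Spark itself and that the MySQL Connector/J driver is on the Spark classpath.

```python
def mysql_jdbc_options(host, database, table, user, password, port=3306):
    """Build the option map for Spark's JDBC data source (hypothetical helper)."""
    return {
        "url": f"jdbc:mysql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
        # Driver class of MySQL Connector/J 8.x; assumed to be on the classpath.
        "driver": "com.mysql.cj.jdbc.Driver",
    }


def run_demo(spark, jdbc_opts, hdfs_base):
    """Run the three read paths and return a row count for each of them."""
    # Special case: read the MySQL table directly, without storing it in HDFS.
    source_df = spark.read.format("jdbc").options(**jdbc_opts).load()

    # Case 1: store the table as CSV part files in HDFS, then read them back.
    csv_path = f"{hdfs_base}/csv"
    source_df.write.mode("overwrite").csv(csv_path, header=True)
    from_parts = spark.read.csv(csv_path, header=True, inferSchema=True)

    # Case 2: convert the data to Parquet, then read the Parquet files.
    parquet_path = f"{hdfs_base}/parquet"
    from_parts.write.mode("overwrite").parquet(parquet_path)
    from_parquet = spark.read.parquet(parquet_path)

    return {
        "hdfs_part_files": from_parts.count(),
        "parquet": from_parquet.count(),
        "direct_jdbc": source_df.count(),
    }


def main():
    # Imported here so the helpers above can be used without a Spark install.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("mysql-to-hadoop-demo").getOrCreate()
    # All connection details below are placeholders, not values from the PR.
    opts = mysql_jdbc_options("db.example.internal", "shop", "orders",
                              "demo_user", "demo_password")
    print(run_demo(spark, opts, "hdfs:///tmp/migration_demo"))
    spark.stop()
```

Calling `main()` from a script submitted with `spark-submit` would run all three paths against the placeholder database. The three row counts should match if the data was copied without loss, which gives a quick sanity check before building aggregations or a dashboard on top.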


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

