Hi Mich!
   I think you can combine the good/rejected handling into one method that
internally does the following:

   - Creates the good and rejected DFs from an input DF and the input
   rules/predicates to apply to that DF.
   - Creates a third DF containing the good rows plus the rejected rows with
   the bad columns nulled out.
   - Appends/inserts the two DFs into their respective Hive good/exception
   tables.
   - Returns a tuple of (goodDf, exceptionsDf, combinedDf), or maybe just
   (combinedDf, exceptionsDf).
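
A minimal sketch of such a method, under some assumptions that are mine and
not from your code: the names RowValidation, SplitResult, splitAndLoad,
mainTable and exceptionTable are hypothetical, the rules are passed as a map
from column name to a Column predicate that is true when the value is valid,
and the two DFs being persisted are the exceptions DF and the combined DF
(following steps 5 to 7 of your description). Both Hive tables are assumed to
exist already with the same column order as the input DF.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{coalesce, col, lit, when}

object RowValidation {

  // Hypothetical container for the three resulting DataFrames.
  final case class SplitResult(
      good: DataFrame,
      exceptions: DataFrame,
      combined: DataFrame)

  /** Splits `input` by `rules` and appends the results to the Hive tables.
    *
    * @param rules column name -> predicate that is true when that column's
    *              value is valid (assumed shape, adapt to your own rules)
    */
  def splitAndLoad(
      input: DataFrame,
      rules: Map[String, Column],
      mainTable: String,
      exceptionTable: String): SplitResult = {
    require(rules.nonEmpty, "at least one rule is required")

    // A predicate that evaluates to NULL is treated as a failed check.
    val checks: Map[String, Column] =
      rules.map { case (name, ok) => name -> coalesce(ok, lit(false)) }
    val allValid: Column = checks.values.reduce(_ && _)

    val good = input.filter(allValid)
    val exceptions = input.filter(!allValid)

    // Every input row, with each column that fails its rule set to NULL.
    // `when` without `otherwise` yields NULL when the condition is false.
    val nulledColumns = input.columns.map { c =>
      checks.get(c) match {
        case Some(ok) => when(ok, col(c)).as(c)
        case None     => col(c)
      }
    }
    val combined = input.select(nulledColumns: _*)

    // insertInto matches columns by position, not by name.
    exceptions.write.mode("append").insertInto(exceptionTable)
    combined.write.mode("append").insertInto(mainTable)

    SplitResult(good, exceptions, combined)
  }
}
```

The nulled-out columns are built in a single select so that every rule is
evaluated against the original values, even when one rule refers to a column
that another rule nulls. Reading the XML can then stay in its own method that
simply returns the input DF for this one.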


On Sat, 2 May 2020 at 06:00, Mich Talebzadeh <
mich.talebza...@gmail.com> wrote:

>
> Hi,
>
> I have a Spark Scala program created and compiled with Maven. It works
> fine. It basically does the following:
>
>
>    1. Reads an xml file from HDFS location
>    2. Creates a DF on top of what it reads
>    3. Creates a new DF with some columns renamed etc
>    4. Creates a new DF for rejected rows (incorrect value for a column)
>    5. Puts rejected data into Hive exception table
>    6. Puts valid rows into Hive main table
>    7. Nullifies the invalid rows by setting the invalid column to NULL
>    and puts the rows into the main Hive table
>
> These are currently performed in one method. Ideally I want to break this
> down as follows:
>
>
>    1. A method to read the XML file, create a DF and then create a new DF
>    on top of the previous DF
>    2. A method to create a DF on top of rejected rows using t
>    3. A method to put invalid rows into the exception table using tmp
>    table
>    4. A method to put the correct rows into the main table again using
>    tmp table
>
> I was wondering if this is correct approach?
>
> Thanks,
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
