GitHub user dmainou edited a comment on the discussion: Question regarding 
Metadata Injection (MDI)

Hey, I'm not sure what the question is.

I built the following on Friday, in case it helps you figure out what you need:

Situation: I just migrated a client from Pentaho to Hop and need to prove that 
things work.
Task: Create a validation framework that compares every table on the prod 
server (Pentaho) with every table on my test server (Hop).
Actions:
In a completely new project I created the following:

1. Pipeline 1
- gets a list of files ending in .hpl from a directory (the target project)
- loads the files into memory
- parses the XML to find the transform blocks
- filters for transforms of type TableOutput
- extracts the database connection and table name
- using a pipeline executor, passes those 2 elements to the next pipeline
2. Pipeline 2 (metadata injector)
- Gets the list of columns and their metadata for the connection and table
- Excludes any column containing SK
- Uppercases all text
- Casts all dates to text in yyyyMMdd format
- Builds an SQL SELECT statement using the above
- Figures out a sort order list
- Figures out a merge-diff plan
- Figures out a filename and output fields for the differing rows
- Creates a placeholder for the table name and connection
- Injects all of the above into a blank template.
3. Pipeline 3 (blank template)
- 2x empty table input steps, one pointing at prod and the other at test
- 2x empty sort steps, one for prod and the other for test
- 1 empty merge diff step
- a filter splitting the rows into identical and everything else
- the "everything else" branch writes the differences to a file
- the "everything else" branch also copies its rows back to the identical side
- the identical side then sorts and runs a group by to count the identical, 
new, deleted and modified rows, and the counts are appended to a CSV.
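
If it helps to see the pipeline 1 logic outside of Hop, here is a rough Python sketch of the "find the TableOutput transforms and pull out connection and table" part. This is not the actual pipeline, just the same idea in code. It assumes each transform in the .hpl XML is a `<transform>` element with `<type>`, `<connection>` and `<table>` children; the function names and the sample XML are made up for illustration, so check them against your own files.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def table_outputs(root):
    """Return (connection, table) pairs for every TableOutput transform.

    Assumes each transform is a <transform> element with <type>,
    <connection> and <table> child elements (an assumption about the
    .hpl layout, verify against your own pipelines).
    """
    pairs = []
    for transform in root.iter("transform"):
        if transform.findtext("type") != "TableOutput":
            continue
        pairs.append((transform.findtext("connection"),
                      transform.findtext("table")))
    return pairs


def scan_project(project_dir):
    """Collect (connection, table) pairs from all .hpl files in a project.

    rglob searches subfolders too; use glob("*.hpl") for a single folder.
    """
    found = []
    for path in sorted(Path(project_dir).rglob("*.hpl")):
        found.extend(table_outputs(ET.parse(path).getroot()))
    return found


# Hypothetical, heavily trimmed pipeline file used only for illustration.
SAMPLE_HPL = """<pipeline>
  <transform>
    <name>read source</name>
    <type>TableInput</type>
    <connection>staging</connection>
  </transform>
  <transform>
    <name>write dim</name>
    <type>TableOutput</type>
    <connection>dwh</connection>
    <table>dim_customer</table>
  </transform>
</pipeline>"""

sample_pairs = table_outputs(ET.fromstring(SAMPLE_HPL))
```

Each pair in `sample_pairs` is what the pipeline executor would hand to pipeline 2, one row per table to validate.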

I did not output a hardcoded, populated template as I don't really need it.
I simply executed the job and validated ~50 tables and ~40M rows in about 
30 minutes.
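
For the metadata side (pipeline 2), the column handling could be sketched in Python roughly like this. Again, this is an illustration and not what Hop runs: the `build_select` name, the `'string'`/`'date'` kind labels and the `TO_CHAR(..., 'YYYYMMDD')` date formatting are assumptions, and the date function in particular depends on your database.

```python
def build_select(table, columns):
    """Build a normalised SELECT plus the sort key list for one table.

    columns is a list of (name, kind) tuples, where kind is 'string',
    'date' or anything else (hypothetical labels for this sketch).
    Returns (sql, sort_keys).
    """
    expressions = []
    sort_keys = []
    for name, kind in columns:
        # Skip anything containing SK. This is a blunt, case-insensitive
        # match, so it would also drop a column such as TASK_ID.
        if "SK" in name.upper():
            continue
        if kind == "string":
            # Uppercase text so case differences do not show up as changes.
            expressions.append(f"UPPER({name}) AS {name}")
        elif kind == "date":
            # Assumed dialect: TO_CHAR as in Oracle or PostgreSQL.
            expressions.append(f"TO_CHAR({name}, 'YYYYMMDD') AS {name}")
        else:
            expressions.append(name)
        sort_keys.append(name)
    if not expressions:
        raise ValueError(f"no comparable columns left for {table}")
    sql = f"SELECT {', '.join(expressions)} FROM {table}"
    return sql, sort_keys
```

The returned SQL is what gets injected into both table inputs of the template, and the key list feeds the two sort steps and the merge diff.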

Today I have moved to a new project.

 
![image](https://github.com/user-attachments/assets/4e58f892-4ad2-446b-93cc-623d6aa53709)
![image](https://github.com/user-attachments/assets/59cb8ec8-f390-4d7e-a301-e44f3cd0cc8a)
![image](https://github.com/user-attachments/assets/349498ca-f6f1-4939-a033-f4950df18645)


GitHub link: 
https://github.com/apache/hop/discussions/5486#discussioncomment-13680260
