Thank you for the advice, Diego! We had come across this type of multi-schema text input/output capability in Talend, and I was hoping we could create our own plugins to accomplish something similar here.

From: Diego Mainou <[email protected]>
Sent: Monday, March 27, 2023 3:44 PM
To: users <[email protected]>; Austin, Justin <[email protected]>
Subject: Re: Custom plugin - multi-schema text input

[EXTERNAL EMAIL]

Hi Justin,

It seems to me that you want to do too many things in one step, and that you will struggle to find a piece of software, cheap or expensive, that does what you are describing in one step. ETL tools are good, but they are not magical; even AI needs to be trained.

Best practice is to separate acquisition from business logic. So my recommendation would be to grab those files and acquire them in their native state plus governance (e.g. a load ID) before you do anything to them.

Further, because you are dealing with many files of distinct nature, you may wish to segregate the "acquisition" from the loading, e.g. by creating:

* A generic and reusable component that copies/moves the files from wherever they are located into your landing zone.
* A bespoke component that acquires either a specific file or a specific file type (e.g. JSON) and outputs to a generic format, e.g. a serialised file.
* A generic and reusable component that grabs files of the generic format and loads them into a table containing the raw data plus governance.

The above will result in files from all walks of life being loaded into your staging database in their raw state. This is very important for governance purposes.

Potentially your next step is to create a generic and reusable component that utilises metadata injection to parse JSON into columns plus governance. Rinse and repeat for XML, CSV, etc.

The next step is the mapping of your data and your dimensions. Once you have your SKs, you can then drop the values that were used to map those SKs, and so on.

Diego

Diego Mainou
Product Manager
M. +61 415 152 091
E. [email protected]
www.bizcubed.com.au

________________________________
From: "Austin, Justin via users" <[email protected]>
To: "users" <[email protected]>
Sent: Tuesday, 28 March, 2023 1:41:06 AM
Subject: Custom plugin - multi-schema text input

Hi Hop users,

We're evaluating whether HOP is the right tool to solve a common problem for our business. We encounter hundreds of different file formats containing similar layers of one-to-many hierarchy (simplified example below).

Getting this to work using out-of-box inputs/outputs and transform components results in a complex/convoluted set of workflows & pipelines. Since we run into this so often, we would like to develop a plugin with a custom "input" component that reads the input file, inserts some ID fields for relationships, and exposes multiple output rowsets (one for each schema/row type) that can be mapped to separate downstream transformations. Eventually we'd like to make another custom "output" component that can accept multiple inputs to load them where we need them with hierarchy preserved (JSON, relational DB, etc.).

After reviewing the plugin documentation and samples, I'm still not sure whether this is possible. It seems that the relevant plugin base classes assume there will always be a single schema (IRowMeta) and single rowset shared by all input and output connections/hops. I believe we would require a single "transform" to have multiple IRowMeta and multiple rowsets and the ability to select a specific one for any given hop to a downstream transform/component.

Is there a good path to accomplishing this with a HOP plugin? Or perhaps a better approach to the problem with existing Hop features?

Thanks!

Example file:

REC|Jane Smith|03-20-2003
ADDR|123 Main Street|Apartment 321|Anytown|US|55555
ACT|987654321|$4321.56|02-01-2023|03-02-2023
DTL|debit|$23.45|02-05-2023
DTL|debit|$143.20|02-13-2023
DTL|credit|$652.02|02-14-2023
DTL|debit|$8.78|02-28-2023
ACT|56789123|$7894.56|02-01-2023|03-02-2023
DTL|credit|$0.28|02-14-2023
REC|John Jacobs|03-20-2003
ADDR|876 Big Avenue||Anywhere|US|55556
ACT|5632178|$2256.79|02-01-2023|03-02-2023
DTL|credit|$0.02|02-14-2023
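
For illustration only, the splitting logic described above (read the file, insert ID fields for relationships, and produce one rowset per record type) could be sketched roughly as follows. This is plain Python, not a Hop plugin, and it does not use any Hop API. The hierarchy in PARENT_OF is an assumption inferred from the example file, and the names split_by_record_type, id, parent_id and fields are hypothetical.

```python
from collections import defaultdict
from itertools import count

# Assumed hierarchy, inferred from the example file: record type -> parent type.
PARENT_OF = {"REC": None, "ADDR": "REC", "ACT": "REC", "DTL": "ACT"}

EXAMPLE = """\
REC|Jane Smith|03-20-2003
ADDR|123 Main Street|Apartment 321|Anytown|US|55555
ACT|987654321|$4321.56|02-01-2023|03-02-2023
DTL|debit|$23.45|02-05-2023
DTL|debit|$143.20|02-13-2023
DTL|credit|$652.02|02-14-2023
DTL|debit|$8.78|02-28-2023
ACT|56789123|$7894.56|02-01-2023|03-02-2023
DTL|credit|$0.28|02-14-2023
REC|John Jacobs|03-20-2003
ADDR|876 Big Avenue||Anywhere|US|55556
ACT|5632178|$2256.79|02-01-2023|03-02-2023
DTL|credit|$0.02|02-14-2023
"""


def _descendants(rec_type):
    """Yield every record type below rec_type in the assumed hierarchy."""
    for child, parent in PARENT_OF.items():
        if parent == rec_type:
            yield child
            yield from _descendants(child)


def split_by_record_type(lines):
    """Route each line to a per-type row list, adding id and parent_id."""
    rowsets = defaultdict(list)  # one list of rows per record type
    current_id = {}              # id of the most recent open row per type
    ids = count(1)               # simple sequential ids across all types
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec_type, *fields = line.split("|")
        if rec_type not in PARENT_OF:
            raise ValueError(f"unknown record type: {rec_type!r}")
        parent = PARENT_OF[rec_type]
        if parent is not None and parent not in current_id:
            raise ValueError(f"{rec_type} row has no open {parent} parent")
        # A new row closes any open rows below it, so children of a previous
        # parent cannot be attached to this one by mistake.
        for lower in _descendants(rec_type):
            current_id.pop(lower, None)
        row_id = next(ids)
        current_id[rec_type] = row_id
        rowsets[rec_type].append(
            {"id": row_id, "parent_id": current_id.get(parent), "fields": fields}
        )
    return dict(rowsets)


if __name__ == "__main__":
    for rec_type, rows in split_by_record_type(EXAMPLE.splitlines()).items():
        print(rec_type, len(rows))
```

Each row keeps its raw fields plus an id and the id of its parent row, so the four lists can be processed separately downstream and joined back together later on parent_id.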
