Hi Justin,

I think you could start by looking at the documentation for metadata injection (MDI):
https://hop.apache.org/manual/latest/pipeline/metadata-injection.html

That could be a start, but I'm sure that more expert Hoppers have better suggestions.

Best regards,
Michele

On Tue, 28 Mar 2023, 15:41 Austin, Justin via users, <[email protected]> wrote:

> Thank you for the advice Diego!
>
> We had come across this type of multi-schema text input/output capability
> in Talend and I was hoping we could create our own plugins to accomplish
> something similar here.
>
> *From:* Diego Mainou <[email protected]>
> *Sent:* Monday, March 27, 2023 3:44 PM
> *To:* users <[email protected]>; Austin, Justin <[email protected]>
> *Subject:* Re: Custom plugin - multi-schema text input
>
> Hi Justin,
>
> It seems to me that you are trying to do too many things in one step,
> and that you will struggle to find a piece of software, cheap or expensive,
> that does what you are describing in one step.
>
> ETL tools are good, but they are not magical; even AI needs to be trained.
>
> Best practice is to separate acquisition from business logic.
>
> So my recommendation would be to grab those files and acquire them in
> their native state + governance (e.g. a load id) before you do anything to
> them.
>
> Further, because you are dealing with many files of distinct nature, you
> may wish to segregate the "acquisition" from the loading,
> e.g. by creating:
>
> - A generic and reusable component that 'copies/moves' the files from
>   wherever they are located into your landing zone.
> - A bespoke component that acquires either a specific file or a
>   specific file type (e.g. JSON) and outputs to a generic format, e.g. a
>   serialised file.
> - A generic and reusable component that grabs files of the generic
>   format and loads them into a table containing the raw data plus governance.
>
> The above will result in files from all walks of life being loaded into
> your staging database in their raw state.
> This is very important for governance purposes.
>
> Potentially your next step is to create a generic and reusable component
> that utilises metadata injection to parse JSON into columns + governance.
>
> Rinse and repeat for XML, CSV, etc.
>
> The next step is the mapping of your data and your dimensions. Once you
> have your SKs you can then drop the values that were used to map those
> SKs, etc.
>
> Diego
>
> Diego Mainou
> Product Manager
> M. +61 415 152 091
> E. [email protected]
> www.bizcubed.com.au
> ------------------------------
>
> *From:* "Austin, Justin via users" <[email protected]>
> *To:* "users" <[email protected]>
> *Sent:* Tuesday, 28 March, 2023 1:41:06 AM
> *Subject:* Custom plugin - multi-schema text input
>
> Hi Hop users,
>
> We’re evaluating whether HOP is the right tool to solve a common problem
> for our business.
>
> We encounter hundreds of different file formats containing similar layers
> of one-to-many hierarchy (simplified example below). Getting this to work
> using out-of-box inputs/outputs and transform components results in a
> complex/convoluted set of workflows & pipelines. Since we run into this so
> often, we would like to develop a plugin with a custom “input” component
> that reads the input file, inserts some ID fields for relationships, and
> exposes multiple output rowsets (one for each schema/row type) that can be
> mapped to separate downstream transformations. Eventually we’d like to make
> another custom “output” component that can accept multiple inputs to load
> them where we need them with hierarchy preserved (JSON, relational DB,
> etc.).
>
> After reviewing the plugin documentation and samples, I’m still not sure
> whether this is possible. It seems that the relevant plugin base classes
> assume there will always be a single schema (IRowMeta) and single rowset
> shared by all input and output connections/hops.
> I believe we would require a single “transform” to have multiple IRowMeta
> and multiple rowsets and the ability to select a specific one for any given
> hop to a downstream transform/component.
>
> Is there a good path to accomplishing this with a HOP plugin? Or perhaps
> a better approach to the problem with existing Hop features?
>
> Thanks!
>
> Example file:
>
> REC|Jane Smith|03-20-2003
> ADDR|123 Main Street|Apartment 321|Anytown|US|55555
> ACT|987654321|$4321.56|02-01-2023|03-02-2023
> DTL|debit|$23.45|02-05-2023
> DTL|debit|$143.20|02-13-2023
> DTL|credit|$652.02|02-14-2023
> DTL|debit|$8.78|02-28-2023
> ACT|56789123|$7894.56|02-01-2023|03-02-2023
> DTL|credit|$0.28|02-14-2023
> REC|John Jacobs|03-20-2003
> ADDR|876 Big Avenue||Anywhere|US|55556
> ACT|5632178|$2256.79|02-01-2023|03-02-2023
> DTL|credit|$0.02|02-14-2023
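
For illustration, the splitting logic Justin describes (one input file, one rowset per record type, generated ID fields to preserve the hierarchy) can be sketched in plain Python. This is not a Hop plugin and uses no Hop API. The function name `split_rowsets`, the `*_id` field names, and the assumed hierarchy REC > ADDR/ACT > DTL are all hypothetical, inferred only from the example file above.

```python
from collections import defaultdict

# Assumed hierarchy, inferred from the example file (not from any Hop API):
# each record type maps to its parent type; REC is the root.
PARENT = {"REC": None, "ADDR": "REC", "ACT": "REC", "DTL": "ACT"}

SAMPLE = """\
REC|Jane Smith|03-20-2003
ADDR|123 Main Street|Apartment 321|Anytown|US|55555
ACT|987654321|$4321.56|02-01-2023|03-02-2023
DTL|debit|$23.45|02-05-2023
DTL|debit|$143.20|02-13-2023
DTL|credit|$652.02|02-14-2023
DTL|debit|$8.78|02-28-2023
ACT|56789123|$7894.56|02-01-2023|03-02-2023
DTL|credit|$0.28|02-14-2023
REC|John Jacobs|03-20-2003
ADDR|876 Big Avenue||Anywhere|US|55556
ACT|5632178|$2256.79|02-01-2023|03-02-2023
DTL|credit|$0.02|02-14-2023
"""


def split_rowsets(lines, delimiter="|"):
    """Split a multi-schema file into one list of rows per record type.

    Every row gets a generated id plus the id of its most recent parent
    row, so the one-to-many hierarchy can be rebuilt downstream.
    """
    rowsets = defaultdict(list)   # record type -> list of row dicts
    counters = defaultdict(int)   # record type -> last id handed out
    current = {}                  # record type -> id of the latest open row

    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec_type, *fields = line.split(delimiter)
        if rec_type not in PARENT:
            raise ValueError(f"unknown record type: {rec_type!r}")

        # A new row closes any open descendants of its own type, so a
        # child can never attach to a row from an earlier parent.
        stale = [rec_type]
        while stale:
            closed = stale.pop()
            for child, parent in PARENT.items():
                if parent == closed:
                    current.pop(child, None)
                    stale.append(child)

        counters[rec_type] += 1
        current[rec_type] = counters[rec_type]
        row = {f"{rec_type.lower()}_id": counters[rec_type], "fields": fields}

        parent = PARENT[rec_type]
        if parent is not None:
            if parent not in current:
                raise ValueError(f"{rec_type} row without an open {parent} row")
            row[f"{parent.lower()}_id"] = current[parent]
        rowsets[rec_type].append(row)

    return dict(rowsets)


if __name__ == "__main__":
    for rec_type, rows in split_rowsets(SAMPLE.splitlines()).items():
        print(rec_type, len(rows))
```

Each list in the result plays the role of one output rowset, keyed by record type. Whether a single Hop transform can expose several such streams with different row metadata is the open question in the thread; the sketch only shows that the ID bookkeeping itself is small.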
