Hi Justin
I think you could try looking at the documentation for metadata injection
(MDI):

https://hop.apache.org/manual/latest/pipeline/metadata-injection.html

That could be a start.
But I'm sure that more expert Hoppers have better suggestions.

Best regards,
Michele


On Tue, 28 Mar 2023, 15:41 Austin, Justin via users, <[email protected]>
wrote:

> Thank you for the advice, Diego!
>
> We had come across this type of multi-schema text input/output capability
> in Talend and I was hoping we could create our own plugins to accomplish
> something similar here.
>
>
>
> *From:* Diego Mainou <[email protected]>
> *Sent:* Monday, March 27, 2023 3:44 PM
> *To:* users <[email protected]>; Austin, Justin
> <[email protected]>
> *Subject:* Re: Custom plugin - multi-schema text input
>
>
>
> Hi Justin,
>
>
>
> It seems to me that you want to do too many things with one step, and that
> you will struggle to find a piece of software, cheap or expensive, that
> does what you are describing in one step.
>
>
>
> ETL tools are good, but they are not magical; even AI needs to be trained.
>
>
>
> Best practice is to separate acquisition from business logic.
>
> So my recommendation would be to grab those files and acquire them in
> their native state, plus governance metadata (e.g. a load ID), before you
> do anything to them.
>
>
>
> Further, because you are dealing with many files of distinct natures, you
> may wish to separate the "acquisition" from the loading, for example by
> creating:
>
>    - A generic and reusable component that 'copies/moves' the files from
>    wherever they are located into your landing zone.
>    - A bespoke component that acquires either a specific file or a
>    specific file type (e.g. JSON) and outputs it to a generic format, e.g. a
>    serialised file.
>    - A generic and reusable component that grabs files of the generic
>    format and loads them into a table containing the raw data plus governance.
>
> The above will result in files from all walks of life being loaded into
> your staging database in their raw state. This is very important for
> governance purposes.
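
As an illustration of acquiring files "in their native state plus governance", here is a minimal plain-Java sketch (not Hop code and not using any Hop API). The `RawRow` and `RawLanding` names, the choice of governance columns (load ID, source file, line number) and the use of a UUID as the load ID are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical staging row: the line exactly as received, plus governance columns.
record RawRow(String loadId, String sourceFile, int lineNumber, String rawLine) {}

class RawLanding {
    // Copies every line verbatim (blank lines included) and tags it with governance data.
    // No parsing or business rules happen at this stage.
    static List<RawRow> land(String loadId, String sourceFile, List<String> lines) {
        List<RawRow> rows = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            rows.add(new RawRow(loadId, sourceFile, i + 1, lines.get(i)));
        }
        return rows;
    }

    public static void main(String[] args) {
        // One load ID per acquisition run; a UUID is just one possible choice (assumption).
        String loadId = UUID.randomUUID().toString();
        List<String> lines = List.of(
            "REC|Jane Smith|03-20-2003",
            "ADDR|123 Main Street|Apartment 321|Anytown|US|55555");
        for (RawRow row : land(loadId, "statement_2023-03.txt", lines)) {
            System.out.println(row);
        }
    }
}
```

Because nothing is parsed at this stage, a file with an unexpected layout still lands intact and can be reprocessed later under the same load ID.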
>
>
>
> Potentially your next step is to create a generic and reusable component
> that utilises metadata injection to parse JSON into columns + governance.
>
> Rinse and repeat for XML, CSV, etc.
>
>
>
> The next step is mapping your data to your dimensions. Once you have your
> surrogate keys (SKs), you can then drop the values that were used to map
> those SKs, etc.
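
The surrogate-key step can be sketched the same way. This hedged plain-Java example assumes a customer dimension keyed on name plus birth date and keeps it in memory; the names `CustomerDimension`, `AccountFact` and `SkMappingDemo` are hypothetical, and a real load would look the keys up in a dimension table instead.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory customer dimension: natural key -> surrogate key (SK).
class CustomerDimension {
    private final Map<String, Long> skByNaturalKey = new HashMap<>();
    private long nextSk = 1;

    // Returns the SK already assigned to this natural key, or assigns the next one.
    long lookupOrAssign(String naturalKey) {
        return skByNaturalKey.computeIfAbsent(naturalKey, k -> nextSk++);
    }
}

// Fact row after mapping: only the SK is kept; name and birth date are dropped.
record AccountFact(long customerSk, String accountNumber, String balance) {}

class SkMappingDemo {
    public static void main(String[] args) {
        CustomerDimension customers = new CustomerDimension();
        // Natural key built from name + birth date (an assumption for this sample data).
        long janeSk = customers.lookupOrAssign("Jane Smith|03-20-2003");
        AccountFact fact = new AccountFact(janeSk, "987654321", "$4321.56");
        System.out.println(fact);
    }
}
```

Once the fact row carries `customerSk`, the name and birth date used to find it are no longer needed downstream.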
>
>
>
> Diego
>
> Diego Mainou
> Product Manager
> M. +61 415 152 091
> E. [email protected]
>
> www.bizcubed.com.au
>
> ------------------------------
>
> *From: *"Austin, Justin via users" <[email protected]>
> *To: *"users" <[email protected]>
> *Sent: *Tuesday, 28 March, 2023 1:41:06 AM
> *Subject: *Custom plugin - multi-schema text input
>
>
>
> Hi Hop users,
>
>
>
> We’re evaluating whether Hop is the right tool to solve a common problem
> for our business.
>
>
>
> We encounter hundreds of different file formats containing similar layers
> of one-to-many hierarchy (simplified example below).  Getting this to work
> using out-of-box inputs/outputs and transform components results in a
> complex/convoluted set of workflows & pipelines.  Since we run into this so
> often, we would like to develop a plugin with a custom “input” component
> that reads the input file, inserts some ID fields for relationships, and
> exposes multiple output rowsets (one for each schema/row type) that can be
> mapped to separate downstream transformations. Eventually we’d like to make
> another custom “output” component that can accept multiple inputs to load
> them where we need them with hierarchy preserved (JSON, relational DB,
> etc.).
>
>
>
> After reviewing the plugin documentation and samples, I’m still not sure
> whether this is possible.  It seems that the relevant plugin base classes
> assume there will always be a single schema (IRowMeta) and single rowset
> shared by all input and output connections/hops.  I believe we would
> require a single “transform” to have multiple IRowMeta and multiple rowsets
> and the ability to select a specific one for any given hop to a downstream
> transform/component.
>
>
>
> Is there a good path to accomplishing this with a HOP plugin?  Or perhaps
> a better approach to the problem with existing Hop features?
>
>
>
> Thanks!
>
>
>
> Example file:
>
> REC|Jane Smith|03-20-2003
> ADDR|123 Main Street|Apartment 321|Anytown|US|55555
> ACT|987654321|$4321.56|02-01-2023|03-02-2023
> DTL|debit|$23.45|02-05-2023
> DTL|debit|$143.20|02-13-2023
> DTL|credit|$652.02|02-14-2023
> DTL|debit|$8.78|02-28-2023
> ACT|56789123|$7894.56|02-01-2023|03-02-2023
> DTL|credit|$0.28|02-14-2023
> REC|John Jacobs|03-20-2003
> ADDR|876 Big Avenue||Anywhere|US|55556
> ACT|5632178|$2256.79|02-01-2023|03-02-2023
> DTL|credit|$0.02|02-14-2023
>
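
For anyone wondering what such a custom input would actually have to do with the example file, here is a minimal plain-Java sketch. It is not a Hop plugin and uses no Hop API. It assumes the first field is always the record type, that the hierarchy is REC, then ADDR and ACT under REC, then DTL under ACT (as the sample suggests), and the field names in `LAYOUTS` are invented. Keeping the layouts as data rather than code is in the spirit of the metadata-driven approach suggested above, although real Hop metadata injection is configured quite differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class HierarchicalFileSplitter {
    // Layout metadata: record type -> field names. Names are guesses for the sample file.
    static final Map<String, List<String>> LAYOUTS = Map.of(
        "REC", List.of("name", "birthDate"),
        "ADDR", List.of("street", "unit", "city", "country", "postalCode"),
        "ACT", List.of("accountNumber", "balance", "periodStart", "periodEnd"),
        "DTL", List.of("type", "amount", "date"));

    static final List<String> SAMPLE = List.of(
        "REC|Jane Smith|03-20-2003",
        "ADDR|123 Main Street|Apartment 321|Anytown|US|55555",
        "ACT|987654321|$4321.56|02-01-2023|03-02-2023",
        "DTL|debit|$23.45|02-05-2023",
        "DTL|debit|$143.20|02-13-2023",
        "DTL|credit|$652.02|02-14-2023",
        "DTL|debit|$8.78|02-28-2023",
        "ACT|56789123|$7894.56|02-01-2023|03-02-2023",
        "DTL|credit|$0.28|02-14-2023",
        "REC|John Jacobs|03-20-2003",
        "ADDR|876 Big Avenue||Anywhere|US|55556",
        "ACT|5632178|$2256.79|02-01-2023|03-02-2023",
        "DTL|credit|$0.02|02-14-2023");

    // Routes each line to the row list for its record type and adds generated IDs
    // (recId, actId) so child rows can later be joined back to their parents.
    static Map<String, List<Map<String, String>>> split(List<String> lines) {
        Map<String, List<Map<String, String>>> streams = new LinkedHashMap<>();
        int recId = 0;
        int actId = 0;
        for (String line : lines) {
            if (line.isBlank()) continue;
            String[] parts = line.split("\\|", -1); // -1 keeps empty fields such as "||"
            String type = parts[0];
            List<String> fields = LAYOUTS.get(type);
            if (fields == null) {
                throw new IllegalArgumentException("Unknown record type: " + type);
            }
            Map<String, String> row = new LinkedHashMap<>();
            switch (type) {
                case "REC" -> row.put("recId", String.valueOf(++recId));
                case "ADDR" -> row.put("recId", String.valueOf(recId));
                case "ACT" -> {
                    row.put("actId", String.valueOf(++actId));
                    row.put("recId", String.valueOf(recId));
                }
                case "DTL" -> row.put("actId", String.valueOf(actId));
                default -> { }
            }
            // Map positional values onto the field names from the layout metadata.
            for (int i = 0; i < fields.size(); i++) {
                row.put(fields.get(i), i + 1 < parts.length ? parts[i + 1] : "");
            }
            streams.computeIfAbsent(type, k -> new ArrayList<>()).add(row);
        }
        return streams;
    }

    public static void main(String[] args) {
        split(SAMPLE).forEach((type, rows) -> {
            System.out.println(type + " (" + rows.size() + " rows)");
            rows.forEach(r -> System.out.println("  " + r));
        });
    }
}
```

Each list in the returned map corresponds to one of the "multiple output rowsets" described in the original question; in a Hop plugin, the open question is how to send each of them, with its own row layout, to a separate downstream hop.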
