GitHub user zoomingrocket created a discussion: Best practice for dynamic 'De-serialize From File' in a Producer/Consumer pattern
I am implementing a Producer/Consumer pattern to optimize data movement from a fast REST API to a slow Oracle database.

Producer Pipeline: Fetches paginated data from a REST API and uses the Serialize to File transform to dump each page to a local .ser file.

Consumer Pipeline: Needs to pick up these files via Get File Names, read them with De-serialize from File, and push the rows to an Oracle DB.

The Problem: The De-serialize from File transform requires a static filename or a variable; it does not currently accept a filename from an incoming field in the stream. Since I am processing multiple files, a single variable can only point at one file per pipeline execution, so I cannot read them all in one run without wrapping the pipeline in a complex loop/workflow.

Questions for the Community:

1. Is Metadata Injection (MDI) the recommended way to dynamically pass the filename into the De-serialize transform, or is there a more lightweight approach?
2. In a high-throughput scenario, is there a performance penalty to using MDI for every file vs. a workflow loop?
3. Are there plans to add "Accept filename from field" to the De-serialize transform to match the behavior of the Text File Input transform?

I first tested this pattern by writing the raw REST API data out via text output, but that used a lot of disk space, so I pivoted to serialization because it offers much better performance and compression. I am just stuck on the last mile: deserializing the files 🤷‍♂️

Thank you in advance for your input.

GitHub link: https://github.com/apache/hop/discussions/6545
