GitHub user zoomingrocket created a discussion: Best practice for dynamic 
'De-serialize From File' in a Producer/Consumer pattern

I am implementing a Producer/Consumer pattern to optimize data movement from a 
fast REST API into a slow Oracle database.

Producer Pipeline: Fetches paginated data from a REST API and uses the 
Serialize to File transform to dump pages to local .ser files.
Consumer Pipeline: Needs to pick up these files via Get File Names, use 
De-serialize from File, and push to an Oracle DB.
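The hand-off between the two pipelines can be made concrete outside of Hop. Below is a minimal plain-Java sketch of the idea. It is an analogy only: it uses standard Java object serialization, not Hop's Serialize to File format or any Hop API. The class name SerializedPageHandoff, the page-NNNNN.ser naming scheme, and the row layout are all hypothetical. The consumer takes one filename per file, the way a field from Get File Names would supply it, instead of reading a single fixed variable.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SerializedPageHandoff {

    // Producer side: write one page of rows to its own .ser file.
    // Uses plain Java serialization (hypothetical stand-in for Serialize to File).
    static Path writePage(Path dir, int pageNo, List<Map<String, Object>> rows) throws IOException {
        Path file = dir.resolve(String.format("page-%05d.ser", pageNo));
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeObject(new ArrayList<>(rows));
        }
        return file;
    }

    // Read back one page; the filename is supplied per call, like a per-row field value.
    @SuppressWarnings("unchecked")
    static List<Map<String, Object>> readPage(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            return (List<Map<String, Object>>) in.readObject();
        }
    }

    // Consumer side: list the .ser files (like Get File Names), then deserialize each one.
    static List<Map<String, Object>> consume(Path dir) throws IOException, ClassNotFoundException {
        List<Path> files;
        try (Stream<Path> s = Files.list(dir)) {
            files = s.filter(p -> p.getFileName().toString().endsWith(".ser"))
                     .sorted()
                     .collect(Collectors.toList());
        }
        List<Map<String, Object>> all = new ArrayList<>();
        for (Path f : files) {
            all.addAll(readPage(f)); // stand-in for the Oracle insert step
        }
        return all;
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("pages");
        // Simulate three REST pages of four rows each.
        for (int p = 0; p < 3; p++) {
            List<Map<String, Object>> rows = new ArrayList<>();
            for (int i = 0; i < 4; i++) {
                HashMap<String, Object> row = new HashMap<>();
                row.put("id", p * 4 + i);
                row.put("page", p);
                rows.add(row);
            }
            writePage(dir, p, rows);
        }
        System.out.println("rows loaded: " + consume(dir).size()); // prints "rows loaded: 12"
    }
}
```

Keeping one file per page means the consumer never needs to know how many pages the producer wrote; it simply processes whatever filenames arrive.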

The Problem:
The De-serialize from File transform requires a static filename or a variable; 
it does not currently accept a filename from an incoming field in the stream. 
Because I am processing many files, a single variable cannot cover them all in 
one pipeline execution unless I wrap the pipeline in a complex loop or workflow.

Questions for the Community:
Is Metadata Injection (MDI) the recommended way to dynamically pass the 
filename into the De-serialize transform, or is there a more lightweight 
approach?
In a high-throughput scenario, is there a performance penalty for using MDI on 
every file compared with a workflow loop?
Are there plans to add "Accept filename from field" to the De-serialize 
transform to match the behavior of the Text File Input transform?

I have tested this pattern by writing the raw REST API data out with a text 
file output, but that used a lot of disk space, so I pivoted to serialization, 
which offers much better performance and compression. I am just stuck on the 
last mile: deserializing the files 🤷‍♂️

Thank you for your input in advance.

GitHub link: https://github.com/apache/hop/discussions/6545

----
This is an automatically sent email for [email protected].
