GitHub user zoomingrocket added a comment to the discussion: 2.18 roadmap and 
goals

2. Move file shape for SFTP/FTP -> This translates into the "RENAME" command 
for SFTP. Many banking SFTP endpoints do not allow the RM command, so after 
processing/downloading a file, they expect the processor to move it to a 
different directory on the SFTP server. With the current shapes in Hop, the 
only way to move a file is to delete it while getting it from SFTP and then 
manually re-upload (put) it to a different directory. This approach works where 
the vendor does not restrict the RM SFTP command. However, if RM is restricted 
and only "RENAME" (which moves and/or renames files, similar to the shell "mv") 
is allowed, it would be beneficial to have a shape for it within Hop. To work 
around this, we run an external shell script from Hop that opens an SFTP 
session and issues RENAME.
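
As a rough illustration of that workaround (a sketch only, not what Hop does internally), the batch of RENAME commands for an OpenSSH `sftp -b` run could be generated like this. The function name, directories, and file names are hypothetical.

```python
import posixpath


def build_rename_batch(files, source_dir, archive_dir):
    """Return the text of an OpenSSH `sftp -b` batch file that moves each
    file from source_dir to archive_dir using only the `rename` command."""
    lines = []
    for name in files:
        # Keep the sketch simple: refuse names that would need escaping.
        if any(ch in name for ch in '"\\\n'):
            raise ValueError(f"unsupported character in file name: {name!r}")
        src = posixpath.join(source_dir, name)
        dst = posixpath.join(archive_dir, name)
        # Quote both paths so names containing spaces survive.
        lines.append(f'rename "{src}" "{dst}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical directories and file name, for illustration only.
    print(build_rename_batch(["statement_0101.csv"],
                             "/outbound", "/outbound/processed"), end="")
```

The generated text can be saved to a file and passed to `sftp -b <batchfile> user@host`, so no RM command is ever sent to the server.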

3. We are doing that today by pre-determining which rows are inserts vs. 
updates: we grab the key column data and spool it off at the start of the 
pipeline. This is OK for small to medium-sized tables, but once we hit larger 
tables with 1B+ rows, spooling the keys itself takes more time than processing 
the actual rows. Hence, the suggestion was more towards implementing MERGE 
(Oracle), ON CONFLICT (Postgres), ON DUPLICATE KEY UPDATE (MySQL), etc. Almost 
all modern databases support insert-or-update within a single statement, so we 
would save significant network round-trips and let the database manage the 
upsert for us.
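
To make the single-statement idea concrete, here is a minimal sketch using Python's built-in sqlite3 module, chosen only because it runs anywhere. SQLite (3.24 or newer) accepts a Postgres-style ON CONFLICT clause. The table and column names are made up, and this is not how Hop implements anything.

```python
import sqlite3


def upsert_rows(conn, rows):
    """Insert new rows and update existing ones in a single statement per
    row, with no separate key lookup beforehand."""
    conn.executemany(
        """
        INSERT INTO customers (id, name) VALUES (?, ?)
        ON CONFLICT (id) DO UPDATE SET name = excluded.name
        """,
        rows,
    )
    conn.commit()


def demo():
    # Hypothetical table, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    upsert_rows(conn, [(1, "Ada"), (2, "Grace")])
    # id 2 already exists and is updated; id 3 is new and is inserted.
    upsert_rows(conn, [(2, "Grace Hopper"), (3, "Edsger")])
    return conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()


if __name__ == "__main__":
    print(demo())
```

The database decides per row whether to insert or update, so the pipeline never has to spool the existing keys first.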

4. I was wondering if we can print out/redirect pipeline metrics to the stdout 
of a hop-run.sh execution. I see the pipeline log allows redirecting to a 
database, but in our case we are using hop-run with Airflow and then pushing 
all the standard output into an OpenTelemetry-based collector for distributed 
tracing & overall observability. So we are missing pipeline metrics, which are 
vital for stakeholder KPIs, metrics for day-to-day operations, reports, etc.

GitHub link: 
https://github.com/apache/hop/discussions/6445#discussioncomment-15610923
