GitHub user zoomingrocket added a comment to the discussion: 2.18 roadmap and goals
GitHub link: https://github.com/apache/hop/discussions/6445#discussioncomment-15610923

2. Move file shape for SFTP/FTP: this translates into the "RENAME" command for SFTP. Many banking SFTP endpoints do not allow the RM command. After processing or downloading a file, they expect the client to move it to a different directory on the SFTP server. With the current shapes in Hop, we can only delete the file while getting it from SFTP and then manually re-upload (put) it to a different directory. That approach works when the vendor does not restrict the RM command. However, if RM is restricted and only "RENAME" (which moves and/or renames files, similar to the shell "mv") is allowed, it would be beneficial to have a shape for it within Hop. As a workaround, we call an external shell script from Hop that opens an SFTP session and fires off RENAME.

3. We handle this today by determining up front which rows are inserts vs. updates: we grab the key column data and spool it off at the start of the pipeline. This is fine for small to medium-sized tables, but once we hit larger tables with 1B+ rows, spooling the keys alone takes more time than processing the actual rows. Hence the suggestion to implement MERGE (Oracle), ON CONFLICT (Postgres), ON DUPLICATE KEY UPDATE (MySQL), etc. Almost all modern databases support insert-or-update within a single statement, so we would save significant network round-trips and let the database manage the upsert for us.

4. I was wondering whether pipeline metrics could be printed to (or redirected to) stdout during a hop-run.sh execution. I see the pipeline log can be redirected to a database, but in our case we run hop-run from Airflow and push all stdout into an OpenTelemetry-based collector for distributed tracing and overall observability. As a result, we are missing pipeline metrics, which are a vital piece of information for stakeholder KPIs, day-to-day operational metrics, reports, etc.
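For item 2, the external-script workaround can be sketched in Python as generating an OpenSSH `sftp` batch file made of `rename oldpath newpath` commands and running it with `sftp -b`. This is not a Hop feature. The helper names (`build_rename_batch`, `sftp_command`), the `/outbox` and `/outbox/processed` directories, the file name and the `user@bank.example` target are all hypothetical. Only the `-b` flag and the `rename` batch command come from the OpenSSH client.

```python
import posixpath
from typing import Iterable, List


def build_rename_batch(filenames: Iterable[str], src_dir: str, dest_dir: str) -> str:
    """Build an OpenSSH sftp batch script that moves each file via 'rename'.

    Remote SFTP paths are POSIX-style, so posixpath is used regardless of the
    local OS. Paths are double-quoted so names containing spaces survive.
    Paths containing a double quote are rejected rather than escaped
    (assumption: such names do not occur on these endpoints).
    """
    lines = []
    for name in filenames:
        src = posixpath.join(src_dir, name)
        dst = posixpath.join(dest_dir, name)
        if '"' in src or '"' in dst:
            raise ValueError(f"unsupported path: {src!r} -> {dst!r}")
        lines.append(f'rename "{src}" "{dst}"')
    return "".join(line + "\n" for line in lines)


def sftp_command(batch_file: str, target: str) -> List[str]:
    """argv for a non-interactive run: sftp -b <batch_file> <user@host>."""
    return ["sftp", "-b", batch_file, target]


if __name__ == "__main__":
    # Hypothetical layout: files are picked up from /outbox and parked in
    # /outbox/processed once the pipeline has finished with them.
    print(build_rename_batch(["stmt_20240101.csv"], "/outbox", "/outbox/processed"), end="")
    print(" ".join(sftp_command("moves.txt", "user@bank.example")))
```

The batch text would be written to a file and run, for example, with `subprocess.run(sftp_command(...), check=True)`. In batch mode `sftp` aborts at the first failing `rename`, so a partially moved set is detectable from the exit status. It also needs non-interactive (key-based) authentication, because it cannot prompt for a password.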
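To illustrate the single-statement upsert suggested in item 3, here is a minimal sketch using Python's built-in `sqlite3`. SQLite (3.24.0 and later) accepts the same `INSERT ... ON CONFLICT (...) DO UPDATE SET col = excluded.col` form as PostgreSQL, so it works as a self-contained stand-in. The `accounts` table and `upsert_accounts` helper are hypothetical. Oracle's MERGE and MySQL's ON DUPLICATE KEY UPDATE use different syntax and are not shown.

```python
import sqlite3
from typing import Iterable, Tuple

# SQLite's UPSERT clause follows PostgreSQL's ON CONFLICT syntax;
# "excluded" refers to the row that was proposed for insertion.
UPSERT_SQL = """
INSERT INTO accounts (id, balance) VALUES (?, ?)
ON CONFLICT(id) DO UPDATE SET balance = excluded.balance
"""


def upsert_accounts(conn: sqlite3.Connection, rows: Iterable[Tuple[int, float]]) -> None:
    """Insert new keys and update existing ones in one batched statement,
    without spooling existing keys first to decide insert vs. update."""
    with conn:  # commits on success, rolls back on error (does not close)
        conn.executemany(UPSERT_SQL, rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
    upsert_accounts(conn, [(1, 10.0), (2, 20.0)])
    upsert_accounts(conn, [(2, 25.0), (3, 30.0)])  # id 2 is updated, id 3 is inserted
    print(conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall())
    # prints [(1, 10.0), (2, 25.0), (3, 30.0)]
```

The conflict target (`id` here) must be backed by a primary key or unique constraint. The database resolves the insert-or-update decision per row during the write, which is what removes the up-front key spooling.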
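As a rough illustration of the output item 4 asks for, the sketch below writes one JSON object per transform to stdout (JSON Lines). A line-oriented collector can parse each record independently, without multi-line handling. This is not Hop's API or log format. The `TransformMetrics` fields, the `"event": "pipeline_metrics"` tag, and the helper names are hypothetical.

```python
import json
import sys
import time
from dataclasses import asdict, dataclass
from typing import Iterable, List, Optional, TextIO


@dataclass
class TransformMetrics:
    # Hypothetical per-transform counters; the field names are illustrative
    # and are not Hop's actual logging schema.
    pipeline: str
    transform: str
    lines_read: int
    lines_written: int
    lines_rejected: int
    errors: int
    duration_ms: int


def to_json_lines(metrics: Iterable[TransformMetrics], ts: Optional[float] = None) -> List[str]:
    """Serialize each record as a single-line JSON object with a shared timestamp."""
    stamp = time.time() if ts is None else ts
    return [
        json.dumps({"event": "pipeline_metrics", "ts": stamp, **asdict(m)}, sort_keys=True)
        for m in metrics
    ]


def emit_metrics(metrics: Iterable[TransformMetrics], out: Optional[TextIO] = None) -> int:
    """Write the records to stdout (or `out`), flush, and return how many were written."""
    stream = sys.stdout if out is None else out
    lines = to_json_lines(metrics)
    for line in lines:
        stream.write(line + "\n")
    stream.flush()
    return len(lines)


if __name__ == "__main__":
    emit_metrics([TransformMetrics("load_accounts", "Table output", 1000, 998, 2, 0, 1500)])
```

The common `event` key lets the collector route metric lines separately from ordinary log lines that share the same stdout stream.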
