Dear community, I have an Airflow server that schedules tasks ranging from one minute to a few hours. Most of each task's runtime is spent running an optimization solver: a compiled commercial or open-source third-party application that a Python package calls through the command line (via the subprocess module).
Simplified example (taken from https://github.com/coin-or/pulp/blob/6af38012df67a9d8327593d4c8f8a1b5649d59c7/pulp/apis/coin_api.py#L169):

```python
import subprocess, os

devnull = subprocess.DEVNULL
args = [MY_PATH_TO_SOLVER_EXECUTABLE, "arg1", "arg2", "arg3"]

# Either show the solver output in the console, discard it, or write it to a file
if SHOW_IN_CONSOLE:
    pipe = None
else:
    pipe = open(os.devnull, "w")

logPath = PATH_TO_WRITE_LOG
if logPath is not None:
    pipe = open(logPath, "w")

my_subprocess = subprocess.Popen(args, stdout=pipe, stderr=pipe, stdin=devnull)
if my_subprocess.wait() != 0:
    pass  # raise error

try:
    pipe.close()
except Exception:
    pass
# etc.
```

As the example shows, we can (and do) collect the stdout of these solvers so that we can store the logs after they have finished running.

*My problem*: I would like to read the solvers' logs while they are still running (as one can do in the terminal when running locally).

*My question*: is it possible to redirect the stdout of a solver running inside an Airflow task so that it is written incrementally into the Airflow logging system? Alternatively, is there any other way to efficiently update a file, bucket, or database with the solver's stdout at some frequency (e.g., every minute)?

Some considerations:

- The stdout of the solver is not in a standard log format. Here is an example: https://github.com/pchtsp/orloge/blob/master/tests/data/gurobi700-app1-2.out
- We cannot modify the third-party solvers or change their log format.
- We do have ways of parsing and interpreting the format.

Regards,
Franco Peschiera
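For context, one pattern I have been considering (a minimal sketch, not Airflow-specific) is to read the child process's stdout line by line via `subprocess.PIPE` and forward each line to Python's `logging` module, since Airflow captures task-level loggers into the task log. The `fake_solver` command below is a stand-in I made up for illustration; a real deployment would pass the solver executable and its arguments instead. One caveat: some solvers block-buffer their output when stdout is not a terminal, so lines may arrive in bursts rather than one at a time.

```python
import logging
import subprocess
import sys

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

def run_and_stream(args):
    """Run a command, forwarding each stdout line to the logger as it arrives."""
    captured = []
    proc = subprocess.Popen(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same stream
        stdin=subprocess.DEVNULL,
        text=True,
        bufsize=1,  # line-buffered on the Python side
    )
    # Iterating over proc.stdout yields lines as the child emits them,
    # so each line reaches the logger incrementally instead of at exit.
    for line in proc.stdout:
        line = line.rstrip("\n")
        captured.append(line)
        logger.info(line)
    returncode = proc.wait()
    if returncode != 0:
        raise RuntimeError(f"solver exited with code {returncode}")
    return captured

# Stand-in "solver" that prints a few lines; replace with the real solver args.
fake_solver = [sys.executable, "-c", "print('iter 1'); print('iter 2'); print('done')"]
lines = run_and_stream(fake_solver)
```

The captured lines could also be parsed on the fly (e.g., with orloge) or flushed periodically to a file, bucket, or database instead of, or in addition to, the logger.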