I am using postgres as the backend (my settings related to this are at the
bottom of this message).
Also, there are no logs: not in the webserver UI, and not at the path
shown in the task details for the tasks that failed (i.e. those
log_filepath values do not actually exist on the machine), which is why
the problem seemed so odd to me.
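
For reference, a minimal sketch of how the "log file does not exist" check can be scripted rather than done by hand. The function name and the example path are hypothetical (the path only follows what I assume is the default log layout); substitute the log_filepath values shown in the task details.

```python
from pathlib import Path


def missing_log_files(log_paths):
    """Return the paths from log_paths that do not exist as regular files."""
    return [p for p in log_paths if not Path(p).is_file()]


if __name__ == "__main__":
    # Hypothetical example value; replace with the log_filepath entries
    # copied from the task instance details in the webserver UI.
    candidates = [
        "/home/airflow/airflow/logs/mydag/task_level1_table1/"
        "2019-12-18T21:21:48.424900+00:00/1.log",
    ]
    for path in missing_log_files(candidates):
        print("missing:", path)
```

Any path it prints is one the UI points to but that is not on disk, which is the situation described above.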
Could you explain a bit more about why you doubt that the threading issue
was the problem?
The docs here (
https://cloud.google.com/composer/docs/how-to/using/troubleshooting-dags#task_fails_without_emitting_logs)
are what initially made me think to take a second look at my machine specs
vs airflow.cfg concurrency settings.
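
A rough sketch of that comparison, in case it is useful to anyone: it reads the concurrency settings and puts them next to the CPU count. It does not show that the settings caused the failures. The function name, the sample config text, and the section names ([core] for parallelism, [scheduler] for max_threads) are my assumptions; in practice you would read your own airflow.cfg instead of the sample string.

```python
import configparser
import os

# Hypothetical sample mirroring the values quoted in this thread.
SAMPLE_CFG = """
[core]
parallelism = 8
dag_concurrency = 2
max_active_runs_per_dag = 1

[scheduler]
max_threads = 2
"""


def concurrency_report(cfg_text, cpu_count):
    """Summarise concurrency settings next to the number of CPUs."""
    # interpolation=None so that '%' characters elsewhere in a real
    # airflow.cfg are not treated as interpolation markers.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    parallelism = parser.getint("core", "parallelism")
    max_threads = parser.getint("scheduler", "max_threads")
    return {
        "cpu_count": cpu_count,
        "parallelism": parallelism,
        "max_threads": max_threads,
        "parallelism_exceeds_cpus": parallelism > cpu_count,
    }


if __name__ == "__main__":
    # os.cpu_count() can return None on some platforms; fall back to 1.
    print(concurrency_report(SAMPLE_CFG, os.cpu_count() or 1))
```

Note that the sample has no inline "# ..." comments after the values; with the default configparser settings, getint would fail on a value followed by an inline comment.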

postgresql settings (based on following this guide):

[airflow@airflowetl airflow]$ rpm -q postgresql-server postgresql postgresql-devel
postgresql-server-9.2.24-1.el7_5.x86_64
postgresql-9.2.24-1.el7_5.x86_64
postgresql-devel-9.2.24-1.el7_5.x86_64


[airflow@airflowetl airflow]$ pip3 freeze | grep sqlalchemy
marshmallow-sqlalchemy==0.19.0
[airflow@airflowetl airflow]$ pip3 freeze | grep psycopg2
psycopg2==2.8.4



[airflow@airflowetl airflow]$ psql airflow
psql (9.2.24)
Type "help" for help.

airflow=> \du
                             List of roles
 Role name |                   Attributes                   | Member of
-----------+------------------------------------------------+-----------
 airflow   |                                                | {}
 postgres  | Superuser, Create role, Create DB, Replication | {}

airflow-> \l
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
 airflow   | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =Tc/postgres         +
           |          |          |             |             | postgres=CTc/postgres+
           |          |          |             |             | airflow=CTc/postgres
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres

airflow=> \c airflow
You are now connected to database "airflow" as user "airflow".

airflow=> \dt
No relations found.

airflow=> \conninfo
You are connected to database "airflow" as user "airflow" via socket in "/var/run/postgresql" at port "5432".



[root@airflowetl airflow]# cat /var/lib/pgsql/data/pg_hba.conf
....
# TYPE  DATABASE        USER            ADDRESS                 METHOD

# "local" is for Unix domain socket connections only
local   all             all                                     peer
# IPv4 local connections:
#host    all             all             127.0.0.1/32            ident
host    all             all             0.0.0.0/0               md5
# IPv6 local connections:
host    all             all             ::1/128                 ident
# Allow replication connections from localhost, by a user with the
# replication privilege.
#local   replication     postgres                                peer
#host    replication     postgres        127.0.0.1/32            ident
#host    replication     postgres        ::1/128                 ident



[root@airflowetl airflow]# cat /var/lib/pgsql/data/postgresql.conf
....
#------------------------------------------------------------------------------
# CONNECTIONS AND AUTHENTICATION
#------------------------------------------------------------------------------

# - Connection Settings -

#listen_addresses = 'localhost' # what IP address(es) to listen on;
listen_addresses = '*' # for Airflow connection



[airflow@airflowetl airflow]$ cat airflow.cfg
....
[core]
....
# The executor class that airflow should use. Choices include
# SequentialExecutor, LocalExecutor, CeleryExecutor, DaskExecutor, KubernetesExecutor
#executor = SequentialExecutor
executor = LocalExecutor

# The SqlAlchemy connection string to the metadata database.
# SqlAlchemy supports many different database engine, more information
# their website
#sql_alchemy_conn = sqlite:////home/airflow/airflow/airflow.db
# if use localhost instead of 127.0.0.1, posgres will use IPv6
sql_alchemy_conn = postgresql+psycopg2://airflow:[email protected]:5432/airflow



On Wed, Dec 18, 2019 at 10:36 PM Jarek Potiuk <[email protected]>
wrote:

> I seriously doubt it's the problem. There should be dag/tasks logs in your
> logs folder as well and they should tell you what happened. What database
> are you using ? Can you please dig deeper and provide more logs?
>
> J,
>
> On Thu, Dec 19, 2019 at 1:28 AM Reed Villanueva <[email protected]>
> wrote:
>
>> Looking again at my lscpu specs, I noticed...
>>
>> [airflow@airflowetl airflow]$ lscpu
>> Architecture:          x86_64
>> CPU op-mode(s):        32-bit, 64-bit
>> Byte Order:            Little Endian
>> CPU(s):                8
>> On-line CPU(s) list:   0-7
>> Thread(s) per core:    1
>> Core(s) per socket:    4
>> Socket(s):             2
>>
>> Notice Thread(s) per core: 1
>>
>> Looking at my airflow.cfg settings I see max_threads = 2. Setting max_threads
>> = 1 and restarting both the scheduler
>> <https://www.astronomer.io/guides/airflow-scaling-workers/> seems to
>> have fixed the problem.
>>
>> If anyone knows more about what exactly is going wrong under the hood
>> (eg. why the task fails rather than just waiting for another thread to
>> become available), would be interested to hear about it.
>>
>> On Wed, Dec 18, 2019 at 11:45 AM Reed Villanueva <[email protected]>
>> wrote:
>>
>>> Running airflow dag that ran fine with SequentialExecutor now has many
>>> (though not all) simple tasks that fail without any log information when
>>> running with LocalExecutor and minimal parallelism, eg.
>>>
>>> <airflow.cfg>
>>> # overall task concurrency limit for airflow
>>> parallelism = 8 # which is same as number of cores shown by lscpu
>>> # max tasks per dag
>>> dag_concurrency = 2
>>> # max instances of a given dag that can run on airflow
>>> max_active_runs_per_dag = 1
>>> # max threads used per worker / core
>>> max_threads = 2
>>>
>>> see https://www.astronomer.io/guides/airflow-scaling-workers/
>>>
>>> Looking at the airflow-webserver.* logs nothing looks out of the
>>> ordinary, but looking at airflow-scheduler.out I see...
>>>
>>> [airflow@airflowetl airflow]$ tail -n 20 airflow-scheduler.out
>>> ....
>>> [2019-12-18 11:29:17,773] {scheduler_job.py:1283} INFO - Executor reports execution of mydag.task_level1_table1 execution_date=2019-12-18 21:21:48.424900+00:00 exited with status failed for try_number 1
>>> [2019-12-18 11:29:17,779] {scheduler_job.py:1283} INFO - Executor reports execution of mydag.task_level1_table2 execution_date=2019-12-18 21:21:48.424900+00:00 exited with status failed for try_number 1
>>> [2019-12-18 11:29:17,782] {scheduler_job.py:1283} INFO - Executor reports execution of mydag.task_level1_table3 execution_date=2019-12-18 21:21:48.424900+00:00 exited with status failed for try_number 1
>>> [2019-12-18 11:29:18,833] {scheduler_job.py:832} WARNING - Set 1 task instances to state=None as their associated DagRun was not in RUNNING state
>>> [2019-12-18 11:29:18,844] {scheduler_job.py:1283} INFO - Executor reports execution of mydag.task_level1_table4 execution_date=2019-12-18 21:21:48.424900+00:00 exited with status success for try_number 1
>>> ....
>>>
>>> but not really sure what to take away from this.
>>>
>>> Anyone know what could be going on here or how to get more helpful
>>> debugging info?
>>>
>>
>> This electronic message is intended only for the named
>> recipient, and may contain information that is confidential or
>> privileged. If you are not the intended recipient, you are
>> hereby notified that any disclosure, copying, distribution or
>> use of the contents of this message is strictly prohibited. If
>> you have received this message in error or are not the named
>> recipient, please notify us immediately by contacting the
>> sender at the electronic mail address noted above, and delete
>> and destroy all copies of this message. Thank you.
>>
>
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129 <+48660796129>
>
>
