Re: Build failed in Jenkins: Mesos » autotools,gcc,--verbose --enable-libevent --enable-ssl,GLOG_v=1 MESOS_VERBOSE=1,ubuntu:14.04,(docker||Hadoop)&&(!ubuntu-us1)&&(!ubuntu-6)&&(!ubuntu-eu2) #2933

2016-11-17 Thread Benjamin Bannier
Hi,

>> What do folks think about removing future timeouts in tests altogether?
>> Instead, we can time the whole suite differently on different CIs?

> Has there been any response from the ASF Infra folks on addressing the
> VM/hardware issues? Seems like it will be difficult to get good signal
> from the ASF CI in the absence of some improvements on the
> infrastructure side.

Alex brings up a valid way to largely decouple us from VM lag problems, which 
seem to be mostly a problem because we expect actions in tests to finish 
faster than they actually happen. The real, tested code would be much less 
aggressive in interpreting small response lags as fatal errors.

If we set the default timeout for, say, `AWAIT_READY` in our test code to, 
e.g., infinity, slow VMs would be much less of an issue. To not indefinitely 
block machines for broken tests we should probably then either limit the 
duration of our Jenkins jobs (if ASF doesn’t already have that safeguard), or 
maybe even add that to our test execution setup itself (e.g., simply with 
`timeout(1)` or equivalents from the outside, or directly inside the harness).
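
To make the idea concrete, here is a minimal sketch of an await helper whose 
timeout is configurable and can be infinite. It is not the libprocess 
`AWAIT_READY` implementation: it uses `std::future` as a stand-in, and the 
names `awaitTimeout`, `awaitReady`, `kForever`, and the environment variable 
`TEST_AWAIT_TIMEOUT_SECS` are all made up for illustration.

```cpp
#include <chrono>
#include <cstdlib>
#include <future>
#include <string>

// Sentinel meaning "no per-await timeout" (hypothetical name).
const std::chrono::seconds kForever = std::chrono::seconds::max();

// Reads the await timeout from the made-up environment variable
// TEST_AWAIT_TIMEOUT_SECS. Unset or empty means "wait forever".
// Note: std::stol throws on non-numeric input; a real harness would
// want to report that more gracefully.
inline std::chrono::seconds awaitTimeout()
{
  const char* value = std::getenv("TEST_AWAIT_TIMEOUT_SECS");
  if (value == nullptr || std::string(value).empty()) {
    return kForever;
  }
  return std::chrono::seconds(std::stol(value));
}

// Returns true once `future` is ready, false if `timeout` elapsed first.
// With kForever this blocks indefinitely and relies on an outer limit
// (Jenkins job timeout, `timeout(1)`) to stop a hung run.
template <typename T>
bool awaitReady(const std::future<T>& future, std::chrono::seconds timeout)
{
  if (timeout == kForever) {
    future.wait(); // Avoids overflow from passing a huge duration to wait_for.
    return true;
  }
  return future.wait_for(timeout) == std::future_status::ready;
}
```

A test would call something like `awaitReady(future, awaitTimeout())`, and the 
whole run would be guarded from the outside, e.g., with 
`timeout 2h make check`, where the two hour limit is only an example value.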

The downside of this is of course that a hanging test (e.g., due to some true 
race) could block execution of all other tests.

Being more patient can be helpful in other environments as well (e.g., 
`valgrind`).


Cheers,

Benjamin

2016-11-16 Thread Neil Conway
Has there been any response from the ASF Infra folks on addressing the
VM/hardware issues? Seems like it will be difficult to get good signal
from the ASF CI in the absence of some improvements on the
infrastructure side.

Neil

On Wed, Nov 16, 2016 at 10:45 AM, Alex R wrote:
> Looks like VM lag again: http://pastebin.com/GZhG4fuN
>
> What do folks think about removing future timeouts in tests altogether?
> Instead, we can time the whole suite differently on different CIs?

2016-11-16 Thread Alex R
Looks like VM lag again: http://pastebin.com/GZhG4fuN

What do folks think about removing future timeouts in tests altogether?
Instead, we can time the whole suite differently on different CIs?

On 16 November 2016 at 15:30, Apache Jenkins Server <
jenk...@builds.apache.org> wrote:

> See  COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%
> 20--enable-ssl,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=
> ubuntu%3A14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-
> us1)&&(!ubuntu-6)&&(!ubuntu-eu2)/2933/changes>
>
> Changes:
>
> [alexr] Added a comment about deprecation cycle of quota get authz.
>
> --
> [...truncated 222325 lines...]
> I1116 14:27:35.400284 30948 containerizer.cpp:202] Using isolation:
> posix/cpu,posix/mem,filesystem/posix,network/cni
> W1116 14:27:35.400846 30948 backend.cpp:76] Failed to create 'aufs'
> backend: AufsBackend requires root privileges, but is running as user mesos
> W1116 14:27:35.400980 30948 backend.cpp:76] Failed to create 'bind'
> backend: BindBackend requires root privileges
> I1116 14:27:35.405436 30978 slave.cpp:208] Mesos agent started on (644)@
> 172.17.0.3:56829
> I1116 14:27:35.405462 30978 slave.cpp:209] Flags at startup: --acls=""
> --appc_simple_discovery_uri_prefix="http://"
> --appc_store_dir="/tmp/mesos/store/appc"
> --authenticate_http_readonly="true" --authenticate_http_readwrite="true"
> --authenticatee="crammd5" --authentication_backoff_factor="1secs"
> --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false"
> --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup"
> --cgroups_limit_swap="false" --cgroups_root="mesos" 
> --container_disk_watch_interval="15secs"
> --containerizers="mesos" --credential="/tmp/Endpoint_SlaveEndpointTest_
> AuthorizedRequest_1_6t56bO/credential" --default_role="*"
> --disk_watch_interval="1mins" --docker="docker"
> --docker_kill_orphans="true" --docker_registry="https://
> registry-1.docker.io" --docker_remove_delay="6hrs"
> --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns"
> --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_
> dir="/var/run/mesos/isolators/docker/volume" 
> --enforce_container_disk_quota="false"
> --executor_registration_timeout="1mins" 
> --executor_shutdown_grace_period="5secs"
> --fetcher_cache_dir="/tmp/Endpoint_SlaveEndpointTest_
> AuthorizedRequest_1_6t56bO/fetch" --fetcher_cache_size="2GB"
> --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1"
> --hadoop_home="" --help="false" --hostname_lookup="true"
> --http_authenticators="basic" --http_command_executor="false"
> --http_credentials="/tmp/Endpoint_SlaveEndpointTest_
> AuthorizedRequest_1_6t56bO/http_credentials" 
> --image_provisioner_backend="copy"
> --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem"
> --launcher="posix" --launcher_dir="/mesos/mesos-1.2.0/_build/src"
> --logbufsecs="0" --logging_level="INFO" 
> --max_completed_executors_per_framework="150"
> --oversubscribed_resources_interval="15secs" --perf_duration="10secs"
> --perf_interval="1mins" --qos_correction_interval_min="0ns"
> --quiet="false" --recover="reconnect" --recovery_timeout="15mins"
> --registration_backoff_factor="10ms" --resources="cpus:2;gpus:0;
> mem:1024;disk:1024;ports:[31000-32000]" --revocable_cpu_low_priority="true"
> --runtime_dir="/tmp/Endpoint_SlaveEndpointTest_AuthorizedRequest_1_6t56bO"
> --sandbox_directory="/mnt/mesos/sandbox" --strict="true"
> --switch_user="true" --systemd_enable_support="true"
> --systemd_runtime_directory="/run/systemd/system" --version="false"
> --work_dir="/tmp/Endpoint_SlaveEndpointTest_AuthorizedRequest_1_OjLwwx"
> I1116 14:27:35.406361 30978 credentials.hpp:86] Loading credential for
> authentication from '/tmp/Endpoint_SlaveEndpointTest_
> AuthorizedRequest_1_6t56bO/credential'
> I1116 14:27:35.406677 30978 slave.cpp:346] Agent using credential for:
> test-principal
> I1116 14:27:35.406782 30978 credentials.hpp:37] Loading credentials for
> authentication from '/tmp/Endpoint_SlaveEndpointTest_
> AuthorizedRequest_1_6t56bO/http_credentials'
> I1116 14:27:35.407196 30978 http.cpp:895] Using default 'basic' HTTP
> authenticator for realm 'mesos-agent-readonly'
> I1116 14:27:35.407546 30978 http.cpp:895] Using default 'basic' HTTP
> authenticator for realm 'mesos-agent-readwrite'
> I1116 14:27:35.409431 30978 slave.cpp:533] Agent resources: cpus(*):2;
> mem(*):1024; disk(*):1024; ports(*):[31000-32000]
> I1116 14:27:35.409689 30978 slave.cpp:541] Agent attributes: [  ]
> I1116 14:27:35.409821 30978 slave.cpp:546] Agent hostname: 5817a5b56afb
> I1116 14:27:35.411998 30970 state.cpp:57] Recovering state from
> '/tmp/Endpoint_SlaveEndpointTest_AuthorizedRequest_1_OjLwwx/meta'
> I1116 14:27:35.412678 30968 status_update_manager.cpp:203] Recovering
> status update manager
> I1116 14:27:35.413048 30975 containerizer.cpp:561] Recovering containerizer
>