Hi,

I'm curious about what your places are doing.
How come you were able to start so many places without running into this:
https://groups.google.com/d/msg/racket-users/oE72JfIKDO4/zbFI6knhAQAJ

Could you share what each of the places does?
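
On the descriptor limit in the quoted message below: the ulimit output shows a soft limit of 1024 open files and a hard limit of 4096, and a process may raise its own soft limit up to the hard limit without root. A minimal sketch, assuming a POSIX shell on Linux with /proc available; the helper name count_fds is hypothetical, and the shell's own PID stands in for the racket process:

```shell
#!/bin/sh
# Raise the soft open-file limit to the hard limit for this shell
# and for anything started from it. No root is needed as long as
# the soft limit stays at or below the hard limit.
hard=$(ulimit -Hn)
ulimit -Sn "$hard"
echo "soft open-file limit is now $(ulimit -Sn)"

# Count the descriptors held by a process, given its PID.
# Plain `ls` is used because `ls -l` prints an extra "total" line,
# which makes `wc -l` report one more than the real count.
count_fds() {
    ls "/proc/$1/fd" | wc -l
}

# Example: descriptors held by this shell (use $(pidof racket) instead).
count_fds "$$"
```

Racket would have to be launched from the same shell after the ulimit call for the higher limit to apply to it.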

Paulo Matos

On 04/11/2018 18:18, Matt Jadud wrote:
> Hi all,
> 
> I have some code that is unhappy.  I suspect I'm running into an
> OS-level resource limit.
> 
> I'm working with an Intel Phi machine running CentOS that reports 256
> cores. It is built on the Intel Xeon Phi 7210, which has 64 cores with
> four hardware threads each. I compiled Racket from source, but I don't
> think that makes a difference here.
> 
> I've parallelized some code using places, and seem to be OK when (<=
> num-places 96). I use channels (synchronous, and asynchronous in
> places) as well as two semaphore-protected queues internally in the
> queen bee process. I have around N:M places-to-threads in the queen, and
> my code is not *intentionally* non-deterministic.
> 
> The worker bees process messages in and out on their place-channels.
> Each worker holds two connections, one to each of two databases. (I'm
> wondering if compiled .zo files of libraries count towards the number
> of open files each place holds on to...)
> 
> When I run with 64 places, things are OK. When I run with 96 places,
> things seem OK (code runs to completion). When I run with 128 places,
> things are not OK. 
> 
> My current best guess is that I'm running into a max number of allowed
> open file descriptors. I don't have root on the system, so I can't
> easily change this, but I thought I'd throw this to the list, and see if
> anyone has any further thoughts as to what I might look for. Given that
> it takes a while to spin up all 128 of the places, I suspect things look
> like they're running fine... until enough of the places spin up, I run
> out of descriptors (I suspect), and then all kinds of badness begins.
> 
> [mjadud@phi data]$ ulimit -Ha
> core file size          (blocks, -c) unlimited
> data seg size           (kbytes, -d) unlimited
> scheduling priority             (-e) 0
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 385394
> max locked memory       (kbytes, -l) 64
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 4096
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> real-time priority              (-r) 0
> stack size              (kbytes, -s) unlimited
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 385394
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited
> 
> [mjadud@phi data]$ ulimit -Sn
> 1024
> 
> 
> When running with 96 places:
> ...
> [mjadud@phi data]$ ls -l /proc/$(pidof racket)/fd | wc -l
> 873
> # This is a steady state.
> 
> Keeping an eye on things when running with 128 places:
> ...
> [mjadud@phi data]$ ls -l /proc/$(pidof racket)/fd | wc -l
> 983
> [mjadud@phi data]$ ls -l /proc/$(pidof racket)/fd | wc -l
> 999
> # <-- Somewhere around here, things went badly.
> # It never achieves a steady state.
> [mjadud@phi data]$ ls -l /proc/$(pidof racket)/fd | wc -l
> 1006
> [mjadud@phi data]$ ls -l /proc/$(pidof racket)/fd | wc -l
> 1011
> [mjadud@phi data]$ ls -l /proc/$(pidof racket)/fd | wc -l
> 1014
> [mjadud@phi data]$ ls -l /proc/$(pidof racket)/fd | wc -l
> 1019
> [mjadud@phi data]$ ls -l /proc/$(pidof racket)/fd | wc -l
> 1014
> [mjadud@phi data]$ ls -l /proc/$(pidof racket)/fd | wc -l
> 
> 
> Thoughts appreciated,
> Matt
> 
> --- Machine ---
> cat /etc/os-release
> NAME="CentOS Linux"
> VERSION="7 (Core)"
> ID="centos"
> ID_LIKE="rhel fedora"
> VERSION_ID="7"
> PRETTY_NAME="CentOS Linux 7 (Core)"
> ANSI_COLOR="0;31"
> CPE_NAME="cpe:/o:centos:centos:7"
> HOME_URL="https://www.centos.org/"
> BUG_REPORT_URL="https://bugs.centos.org/"
> 
> CENTOS_MANTISBT_PROJECT="CentOS-7"
> CENTOS_MANTISBT_PROJECT_VERSION="7"
> REDHAT_SUPPORT_PRODUCT="centos"
> REDHAT_SUPPORT_PRODUCT_VERSION="7"
> 
> 
> ------ When Things Die ------
> 
> open-input-file: cannot open input file
> 
>   path:
> /usr/netapp/faculty/mjadud/racket-src/racket/share/pkgs/cldr-bcp47/cldr/bcp47/data/timezone.xml
> 
>   system error: Too many open files; errno=24
> 
>   context...:
> 
>   
> /usr/netapp/faculty/mjadud/racket-src/racket/collects/racket/private/kw-file.rkt:102:2:
> call-with-input-file*61
> 
>   
> "/usr/netapp/faculty/mjadud/racket-src/racket/share/pkgs/cldr-bcp47/cldr/bcp47/timezone.rkt":
> [running body]
> 
>    temp37_0
> 
>    for-loop
> 
>    run-module-instance!125
> 
>    for-loop
> 
>    [repeats 1 more time]
> 
>    run-module-instance!125
> 
>    for-loop
> 
>    [repeats 1 more time]
> 
>    run-module-instance!125
> 
>    for-loop
> 
>    [repeats 1 more time]
> 
>    run-module-instance!125
> 
>    for-loop
> 
>    [repeats 1 more time]
> 
>    ...
> 
> instantiate: unknown module
> 
>   module name: #<resolved-module-path:(submod
> 'typed-racket/private/type-contract.rkt[8709188] predicates)>
> 
>   context...:
> 
>    namespace-module-instantiate!96
> 
>    for-loop
> 
>    [repeats 1 more time]
> 
>    run-module-instance!125
> 
>    for-loop
> 
>    [repeats 1 more time]
> 
>    run-module-instance!125
> 
>    for-loop
> 
>    [repeats 1 more time]
> 
>    run-module-instance!125
> 
>    for-loop
> 
>    [repeats 1 more time]
> 
>    run-module-instance!125
> 
>    for-loop
> 
>    [repeats 1 more time]
> 
>    run-module-instance!125
> 
>    ...
> 
> 
> standard-module-name-resolver: collection not found
> 
>   for module path: typed-racket/utils/hash-contract
> 
>   collection: "typed-racket/utils"
> 
>   in collection directories:
> 
>    /usr/netapp/faculty/mjadud/.racket/development/collects
> 
>    /usr/netapp/faculty/mjadud/racket-src/racket/collects
> 
>   context...:
> 
> open-input-file: cannot open input file
> 
>   path:
> /usr/netapp/faculty/mjadud/racket-src/racket/share/pkgs/cldr-core/cldr/compiled/core_rkt.zo
> 
>   system error: Too many open files; errno=24   show-collection-err
> 
>    standard-module-name-resolver
> 
>    namespace-module-instantiate!96
> 
>    for-loop
> 
>    [repeats 1 more time]
> 
>    run-module-instance!125
> 
>    perform-require!78
> 
>    for-loop
> 
>    [repeats 1 more time]
> 
>    add-lifted-require!
> 
>    do-local-lift-to-module48
> 
>    syntax-local-lift-require
> 
>   
> /usr/netapp/faculty/mjadud/racket-src/racket/share/pkgs/typed-racket-lib/typed-racket/utils/redirect-contract.rkt:32:2:
> redirect
> 
> 
>   context...:
> 
>    default-load-handler
> 
>    [repeats 1 more time]
> 
>    standard-module-name-resolver
> 
>    apply-transformer-in-context
> 
>    namespace-module-instantiate!96
> 
>    for-loop
> 
>    apply-transformer52
> 
>    ...
> 
>    [repeats 1 more time]
> 
>    run-module-instance!125
> 
>    for-loop
> 
>    [repeats 1 more time]
> 
>    run-module-instance!125
> 
>    for-loop
> 
>    [repeats 1 more time]
> 
>    run-module-instance!125
> 
>    for-loop
> 
>    [repeats 1 more time]
> 
>    run-module-instance!125
> 
>    for-loop
> 
>    ...
> 
> 
> -- 
> You received this message because you are subscribed to the Google
> Groups "Racket Users" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to racket-users+unsubscr...@googlegroups.com
> <mailto:racket-users+unsubscr...@googlegroups.com>.
> For more options, visit https://groups.google.com/d/optout.

-- 
Paulo Matos