Author: Remi Meier <remi.me...@gmail.com>
Branch: c8-reshare-pages
Changeset: r2016:afef19229966
Date: 2017-02-27 17:00 +0100
http://bitbucket.org/pypy/stmgc/changeset/afef19229966/
Log:	merge default

diff too long, truncating to 2000 out of 3600 lines

diff --git a/README.txt b/README.md
rename from README.txt
rename to README.md
--- a/README.txt
+++ b/README.md
@@ -1,28 +1,250 @@
-STM-GC
-======
+# STM-GC
 
 Welcome!
 
 This is a C library that combines a GC with STM capabilities.
 It is meant to be a general library that can be used in C programs.
 
-The library interface is in "c4/stmgc.h".
+The library interface is in `c8/stmgc.h`.
 
 Progress (these revisions are roughly stable versions, pick the last one):
 
-- 3aea86a96daf: last rev of "c3", the previous version
-- f1ccf5bbcb6f: first step, working but with no GC
-- 8da924453f29: minor collection seems to be working, no major GC
-- e7249873dcda: major collection seems to be working
-The file "c4/doc-objects.txt" contains some low-level explanations.
+ - 3af462f
 
-Run tests with "py.test".
+Run tests with `py.test`.
 
-A demo program can be found in "c4/demo1.c".
-It can be built with "make debug-demo1" or "make build-demo1".
+Demo programs can be found in `c8/demo/`.
 
 The plan is to use this C code directly with PyPy, and not write
 manually the many calls to the shadow stack and the barrier functions.
 But the manual way is possible too, say when writing a small
 interpreter directly in C.
+
+
+# Other resources
+
+http://doc.pypy.org/en/latest/stm.html
+
+# How to run things
+
+## Get PyPy
+
+ 1. `hg clone https://bitbucket.org/pypy/pypy` (this will take a while, but you
+    can continue with the instructions below)
+ 2. `hg checkout stmgc-c8`
+
+
+## Get STMGC
+
+ 1. `hg clone https://bitbucket.org/pypy/stmgc`
+ 2. `gcc-seg-gs/README.txt` mentions which GCC version should work. Maybe the
+    -fXXX flags mentioned at the end are still needed for compiling the full
+    PyPy-STM.
+
+The folder `c8` contains the current version of the STMGC library.
+
+### Project layout of c8
+
+ - `stmgc.h`: the main header file for the library
+ - `stm/`:
+
+   For the GC part:
+
+   - `nursery`: minor collection
+   - `gcpage`: major collection
+   - `largemalloc`, `smallmalloc`: object allocation
+   - `finalizer`: object finalizer support
+   - `weakref`: weak references support
+
+   For the STM part:
+
+   - `core`: commit, abort, barrier logic of STM
+   - `sync`: segment management and thread support
+   - `pages`: management of page metadata
+   - `signal_handler`: manages pages together with `pages`
+   - `locks`: a set of locks to protect segments
+   - `rewind_setjmp`: setjmp/longjmp implementation that supports arbitrary rollback
+   - `forksupport`: support for forking an STM process
+   - `extra`: on-commit and on-abort callback mechanism
+   - `detach`: transaction detach mechanism (optimised transactional zones)
+   - `setup`: sets up the memory layout and segments
+
+   Misc:
+
+   - `fprintcolor`: colourful debug output
+   - `hash_id`: PyPy-compatible identity and identity-hash functionality
+   - `hashtable`: transactional hash table implementation
+   - `queue`: transactional work-queue implementation
+   - `list`: simple growable list implementation
+   - `marker`, `prof`: mechanism to record events
+   - `misc`: mostly debug and testing interface
+   - `pagecopy`: fast copy implementation for pages
+   - `prebuilt`: logic for PyPy's prebuilt objects
+
+
+
+### Running Tests
+
+Tests are written in Python and call the C library through CFFI (a Python
+package).
+
+ 1. install the `pytest` and `cffi` packages for Python (via `pip`)
+ 2. running `py.test` in `c8/test` should run all the tests (alternatively, the
+    PyPy checkout has a `pytest.py` script in its root folder, which should work
+    too)
+
+### Running Demos
+
+Demos are small C programs that use the STMGC library directly.
They sometimes
+expose real data-races that the sequential Python tests cannot expose.
+
+ 1. for example: `make build-demo_random2`
+ 2. then run `./build-demo_random2`
+
+
+### Debugging
+
+GDB works fine for debugging programs with the STMGC library. However, you have
+to tell GDB to ignore `SIGSEGV` by default. A `.gdbinit` could look like this:
+
+    handle SIGSEGV nostop pass noprint
+
+    define sigon
+        handle SIGSEGV stop nopass print
+    end
+
+    define sigoff
+        handle SIGSEGV nostop pass noprint
+    end
+
+    define lon
+        set scheduler-locking on
+    end
+    define loff
+        set scheduler-locking off
+    end
+
+    # run until crash
+    define runloop
+        set pagination off
+        p $_exitcode = 0
+        while $_exitcode == 0
+            p $_exitcode = -1
+            r
+        end
+        set pagination on
+    end
+
+The commands `sigon` and `sigoff` enable and disable `SIGSEGV` handling. `lon`
+and `loff` enable and disable stopping of other threads while stepping through
+one of them. After reaching a breakpoint in GDB, I usually run `sigon` and `lon`
+to let GDB handle real `SIGSEGV` (e.g., while printing) and to stop other
+threads.
+
+`runloop` re-runs a program until there is a crash (useful for reproducing rare
+race conditions).
+
+Furthermore, there are some useful GDB extensions under `/c7/gdb/gdb_stm.py`
+that allow for inspecting segment-local pointers. To enable them, add the
+following line to your `.gdbinit`:
+
+    python exec(open('PATH-TO-STMGC/c7/gdb/gdb_stm.py').read())
+
+
+
+
+## Building PyPy-STM
+
+The STM branch of PyPy contains a *copy* of the STMGC library. After changes to
+STMGC, run the `import_stmgc.py` script in `/rpython/translator/stm/`. In the
+following, `/` is the root of your PyPy checkout.
+
+ 0. Follow the [build instructions](http://doc.pypy.org/en/latest/build.html)
+    for PyPy until you get to the point where you run the translation.
+
+ 1. The Makefile expects a `gcc-seg-gs` executable to be on the `$PATH`. This
+    should be a GCC that is either patched or a wrapper around GCC 6.1 that
+    passes the necessary options. In my case, this is a script that points to
+    my custom build of GCC with the following content:
+
+        :::bash
+        #!/bin/bash
+        BUILD=/home/remi/work/bin/gcc-build
+        exec $BUILD/gcc/xgcc -B $BUILD/gcc -fno-ivopts -fno-tree-vectorize -fno-tree-loop-distribute-patterns "$@"
+
+
+ 2. `cd /pypy/goal/`
+
+ 3. A script to translate PyPy-STM (adapt all paths):
+
+        :::bash
+        #!/bin/bash
+        export PYPY_USESSION_KEEP=200
+        export PYPY_USESSION_DIR=~/pypy-usession
+
+        STM=--stm  #--stm
+        JIT=-Ojit  #-Ojit #-O2
+        VERSION=$(hg id -i)
+        ionice -c3 pypy ~/pypy_dir/rpython/bin/rpython --no-shared --source $STM $JIT targetpypystandalone.py
+        # --no-allworkingmodules
+
+        notify-send "PyPy" "C source generated."
+
+        cd ~/pypy-usession/usession-$(hg branch)-remi/testing_1/
+        ionice -c3 make -Bj4
+
+        TIME=$(date +%y-%m-%d-%H:%M)
+        cp pypy-c ~/pypy_dir/pypy/goal/pypy-c-$STM-$JIT-$VERSION-$TIME
+        cp pypy-c ~/pypy_dir/pypy/goal/pypy-c
+
+        notify-send "PyPy" "Make finished."
+
+    The usession folder will keep the produced C source files. You will need
+    them whenever you make a change to the STMGC library only (no need to
+    retranslate the full PyPy). In that case:
+
+    1. Go to `~/pypy-usession/usession-stmgc-c8-$USER/testing_1/`
+    2. `make clean && make -j8` will rebuild all C sources
+
+    A faster alternative that works in most cases: `rm ../module_cache/*.o`
+    instead of `make clean`. This will remove the compiled STMGC files,
+    forcing a rebuild from the *copy* in the `/rpython/translator/stm`
+    folder.
+
+ 4. The script puts a `pypy-c` into `/pypy/goal/` that should be ready to run.
+
+
+### Log tools
+
+STMGC produces an event log if requested. Some tools to parse and analyse these
+logs are in the PyPy repository under `/pypy/stm/`. To produce a log, set the
+environment variable `PYPYSTM` to a file name.
E.g.:
+
+`env PYPYSTM=log.pypystm pypy-c program.py`
+
+and then see some statistics with
+
+`/pypy/stm/print_stm_log.py log.pypystm`
+
+
+### Benchmarks
+
+In PyPy's benchmark repository (`https://bitbucket.org/pypy/benchmarks`) under
+`multithread` is a collection of multi-threaded Python programs to measure
+performance.
+
+One way to run them is to check out the branch `multithread-runner` and do the
+following:
+
+`./runner.py pypy-c config-raytrace.json result.json`
+
+This will use the configuration in the JSON file and run a few iterations, then
+write the result into a JSON file again. It will also print the command line
+used to run the benchmark, in case you don't want to use the runner. The
+`getresults.py` script can be used to compare two versions of PyPy against each
+other, but it is very limited.
+
+
+
+
diff --git a/c8/TODO b/c8/TODO
--- a/c8/TODO
+++ b/c8/TODO
@@ -1,3 +1,103 @@
+
+- investigate if userfaultfd() helps:
+  http://kernelnewbies.org/Linux_4.3#head-3deefea7b0add8c1b171b0e72ce3b69c5ed35cb0
+
+  AFAICS, we could avoid the current in-kernel-pagefault+SIGSEGV-handler for
+  making pages accessible on-demand, and replace that mechanism with a
+  user-pagefault "handler". That should save on the number of needed VMAs and
+  possibly be faster (although I'm quite unsure of that).
+
+- investigate if membarrier() helps:
+  http://man7.org/linux/man-pages/man2/membarrier.2.html
+
+
+##################
+Issue: lee_router_tm in one of its versions uses a temporary grid (big list)
+to do some calculations. This grid gets cleared every time the router does
+one transaction (lay_next_track). Hence, the transaction writes a big amount
+of *old* memory.
+ * For one, clearing an array causes tons of stm_write_card() calls and
+   creates tons of small backup slices.
+ * Also, all of this stuff goes to the commit log, as we "modify" an old
+   object.
+ * Of course this is all completely unnecessary, as the temporary grid is
+   basically thread-local, always gets cleared at the start of an atomic
+   block ("application-level transaction"), and therefore wouldn't need
+   to be mentioned in the commit log (and in theory wouldn't even need
+   reverting on abort).
+
+Here is a slice of the perf-profile:
+-    Total   Self
+- 14.39%  13.62%  pypy-c-clcollec  libc-2.19.so        [.] __memcpy_sse2_unaligned
+   - __memcpy_sse2_unaligned
+      + 80.47% import_objects.constprop.50
+      + 10.72% make_bk_slices_for_range
++ 16.82%   1.43%  pypy-c-clcollec  pypy-c-clcollector  [.] make_bk_slices
++ 14.71%   4.44%  pypy-c-clcollec  pypy-c-clcollector  [.] make_bk_slices_for_range
++  4.63%   4.62%  pypy-c-clcollec  pypy-c-clcollector  [.] go_to_the_past
+
+On this benchmark, pypy-stm is ~4x slower than pypy-default. It also doesn't
+scale at all (probably because a lot of the things above are actually protected
+by the privatization locks, which seem to be quite contended).
+Probably around 10% of the time is spent importing the changes done to
+a thread-local object. So here are a few ideas:
+ * Play with the card-marking card size to speed up make_bk_slices?
+ * After marking a significant percentage of the cards of an obj, maybe just
+   mark all of them (do a full write barrier)?
+ * Should modification-heavy transactions run longer (with high priority)?
+ * Special allocation area for thread-local objects which is always mapped
+   shared between all segments.
+   + changes to objs in this area do not need to be added to the commit log
+   - need to guarantee that only one thread/segment at a time accesses an
+     obj
+   - needs explicit designation as thread-local at allocation time (by the
+     programmer)
+   - doesn't avoid creating backup copies
+ * Special objs that are transaction-local
+   + no need for backup copies, should always "reset" automatically
+   - needs programmer support
+   - limited applicability
+ * As long as a slice is in a page that is only mapped in one segment,
+   we can depend on the slice not being imported in other segs since
+   the page is not mapped. At least it avoids importing to seg0 before
+   major GC and also pushing overflow objs to seg0 on commit. However,
+   this requires the elimination of seg0 as it is now, which is hard:
+   * the main concern is that major GC needs to trace objs somewhere, and
+     right now it traces them in seg0. We probably need a way to tell in
+     which seg an obj is accessible.
+   * one way would be to say an obj is always fully accessible in the seg it
+     was first allocated in, but that makes page resharing more difficult.
+     Still, we could record this information in the largemalloc-header, which
+     we first need to find (per-obj check for where the obj-header is
+     accessible). A check whether all pages of an obj are accessible in some
+     segment is probably too slow, as is checking on-the-fly during tracing...
+   * largemalloc currently keeps its data structures in seg0. We would need
+     to keep the data structure up-to-date in all segments (e.g. by always
+     writing to all accessible segments, but this requires us to hold the
+     privatization readlock when allocating, and privatization locks are
+     already contended in some cases)
+   * smallmalloc is a bit simpler since objs are always in a single page.
+     But I guess we still need to find the accessible page for each obj...
+   Overall:
+   + modifications to single-mapped pages would not be copied if they stay
+     single-mapped
+   + fully automatic/transparent
+   - provides only page-level granularity
+   - potentially slow complication of largemalloc and major GC
+   - requires us to put effort into making threads always run in the same
+     segment to increase effectiveness
+   - doesn't avoid creating backup copies
+ * One could also argue that keeping around a tempgrid is not good and
+   the programmer should just re-create it in every transaction
+   (counter-intuitive). This indeed speeds up the benchmark ("only" 2x
+   slower than pypy-default), but causes tons of major GCs. These
+   major GCs completely prevent scaling, as they are stop-the-world.
+   So this opens up a whole new can of worms (concurrent, parallel,
+   incremental GC?).
+
+
+##################
+
 - stm_identityhash spends a good amount of time figuring out if an obj is
   prebuilt (40% of its time). maybe after setup_prebuilt, we could defer
   the test of GCFLAG_HAS_SHADOW in id_or_identityhash to after an address
   comparison.

diff --git a/c8/demo/Makefile b/c8/demo/Makefile
--- a/c8/demo/Makefile
+++ b/c8/demo/Makefile
@@ -2,9 +2,9 @@
 # Makefile for the demos.
# -DEBUG_EXE = debug-demo2 -BUILD_EXE = build-demo2 -RELEASE_EXE = release-demo2 +DEBUG_EXE = debug-demo_simple +BUILD_EXE = build-demo_simple +RELEASE_EXE = release-demo_simple debug: $(DEBUG_EXE) # with prints and asserts build: $(BUILD_EXE) # without prints, but with asserts diff --git a/c8/stm/atomic.h b/c8/stm/atomic.h --- a/c8/stm/atomic.h +++ b/c8/stm/atomic.h @@ -24,16 +24,16 @@ #if defined(__i386__) || defined(__amd64__) - static inline void spin_loop(void) { asm("pause" : : : "memory"); } - static inline void write_fence(void) { asm("" : : : "memory"); } + static inline void stm_spin_loop(void) { asm("pause" : : : "memory"); } + static inline void stm_write_fence(void) { asm("" : : : "memory"); } /*# define atomic_exchange(ptr, old, new) do { \ (old) = __sync_lock_test_and_set(ptr, new); \ } while (0)*/ #else - static inline void spin_loop(void) { asm("" : : : "memory"); } - static inline void write_fence(void) { __sync_synchronize(); } + static inline void stm_spin_loop(void) { asm("" : : : "memory"); } + static inline void stm_write_fence(void) { __sync_synchronize(); } /*# define atomic_exchange(ptr, old, new) do { \ (old) = *(ptr); \ @@ -42,19 +42,19 @@ #endif -static inline void _spinlock_acquire(uint8_t *plock) { +static inline void _stm_spinlock_acquire(uint8_t *plock) { retry: if (__builtin_expect(__sync_lock_test_and_set(plock, 1) != 0, 0)) { - spin_loop(); + stm_spin_loop(); goto retry; } } -static inline void _spinlock_release(uint8_t *plock) { +static inline void _stm_spinlock_release(uint8_t *plock) { assert(*plock == 1); __sync_lock_release(plock); } -#define spinlock_acquire(lock) _spinlock_acquire(&(lock)) -#define spinlock_release(lock) _spinlock_release(&(lock)) +#define stm_spinlock_acquire(lock) _stm_spinlock_acquire(&(lock)) +#define stm_spinlock_release(lock) _stm_spinlock_release(&(lock)) #endif /* _STM_ATOMIC_H */ diff --git a/c8/stm/core.c b/c8/stm/core.c --- a/c8/stm/core.c +++ b/c8/stm/core.c @@ -1,5 +1,6 @@ #ifndef _STM_CORE_H_ 
# error "must be compiled via stmgc.c" +# include "core.h" // silence flymake #endif char *stm_object_pages; @@ -113,212 +114,6 @@ } -/* ############# signal handler ############# */ - -static void copy_bk_objs_in_page_from(int from_segnum, uintptr_t pagenum, - bool only_if_not_modified) -{ - /* looks at all bk copies of objects overlapping page 'pagenum' and - copies the part in 'pagenum' back to the current segment */ - dprintf(("copy_bk_objs_in_page_from(%d, %ld, %d)\n", - from_segnum, (long)pagenum, only_if_not_modified)); - - assert(modification_lock_check_rdlock(from_segnum)); - struct list_s *list = get_priv_segment(from_segnum)->modified_old_objects; - struct stm_undo_s *undo = (struct stm_undo_s *)list->items; - struct stm_undo_s *end = (struct stm_undo_s *)(list->items + list->count); - - import_objects(only_if_not_modified ? -2 : -1, - pagenum, undo, end); -} - -static void go_to_the_past(uintptr_t pagenum, - struct stm_commit_log_entry_s *from, - struct stm_commit_log_entry_s *to) -{ - assert(modification_lock_check_wrlock(STM_SEGMENT->segment_num)); - assert(from->rev_num >= to->rev_num); - /* walk BACKWARDS the commit log and update the page 'pagenum', - initially at revision 'from', until we reach the revision 'to'. */ - - /* XXXXXXX Recursive algo for now, fix this! 
*/ - if (from != to) { - struct stm_commit_log_entry_s *cl = to->next; - go_to_the_past(pagenum, from, cl); - - struct stm_undo_s *undo = cl->written; - struct stm_undo_s *end = cl->written + cl->written_count; - - import_objects(-1, pagenum, undo, end); - } -} - - -long ro_to_acc = 0; -static void handle_segfault_in_page(uintptr_t pagenum) -{ - /* assumes page 'pagenum' is ACCESS_NONE, privatizes it, - and validates to newest revision */ - dprintf(("handle_segfault_in_page(%lu), seg %d\n", pagenum, STM_SEGMENT->segment_num)); - - /* XXX: bad, but no deadlocks: */ - acquire_all_privatization_locks(); - - long i; - int my_segnum = STM_SEGMENT->segment_num; - - uint8_t page_status = get_page_status_in(my_segnum, pagenum); - assert(page_status == PAGE_NO_ACCESS - || page_status == PAGE_READONLY); - - if (page_status == PAGE_READONLY) { - /* make our page write-ready */ - page_mark_accessible(my_segnum, pagenum); - - dprintf((" > found READONLY, make others NO_ACCESS\n")); - /* our READONLY copy *has* to have the current data, no - copy necessary */ - /* make READONLY pages in other segments NO_ACCESS */ - for (i = 1; i < NB_SEGMENTS; i++) { - if (i == my_segnum) - continue; - - if (get_page_status_in(i, pagenum) == PAGE_READONLY) - page_mark_inaccessible(i, pagenum); - } - - ro_to_acc++; - - release_all_privatization_locks(); - return; - } - - /* find who has the most recent revision of our page */ - /* XXX: uh, *more* recent would be enough, right? 
*/ - int copy_from_segnum = -1; - uint64_t most_recent_rev = 0; - bool was_readonly = false; - for (i = 1; i < NB_SEGMENTS; i++) { - if (i == my_segnum) - continue; - - if (!was_readonly && get_page_status_in(i, pagenum) == PAGE_READONLY) { - was_readonly = true; - break; - } - - struct stm_commit_log_entry_s *log_entry; - log_entry = get_priv_segment(i)->last_commit_log_entry; - if (get_page_status_in(i, pagenum) != PAGE_NO_ACCESS - && (copy_from_segnum == -1 || log_entry->rev_num > most_recent_rev)) { - copy_from_segnum = i; - most_recent_rev = log_entry->rev_num; - } - } - OPT_ASSERT(copy_from_segnum != my_segnum); - - if (was_readonly) { - assert(page_status == PAGE_NO_ACCESS); - /* this case could be avoided by making all NO_ACCESS to READONLY - when resharing pages (XXX: better?). - We may go from NO_ACCESS->READONLY->ACCESSIBLE on write with - 2 SIGSEGV in a row.*/ - dprintf((" > make a previously NO_ACCESS page READONLY\n")); - page_mark_readonly(my_segnum, pagenum); - - release_all_privatization_locks(); - return; - } - - /* make our page write-ready */ - page_mark_accessible(my_segnum, pagenum); - - /* account for this page now: XXX */ - /* increment_total_allocated(4096); */ - - - if (copy_from_segnum == -1) { - dprintf((" > found newly allocated page: copy from seg0\n")); - - /* this page is only accessible in the sharing segment seg0 so far (new - allocation). We can thus simply mark it accessible here. */ - pagecopy(get_virtual_page(my_segnum, pagenum), - get_virtual_page(0, pagenum)); - release_all_privatization_locks(); - return; - } - - dprintf((" > import data from seg %d\n", copy_from_segnum)); - - /* before copying anything, acquire modification locks from our and - the other segment */ - uint64_t to_lock = (1UL << copy_from_segnum); - acquire_modification_lock_set(to_lock, my_segnum); - pagecopy(get_virtual_page(my_segnum, pagenum), - get_virtual_page(copy_from_segnum, pagenum)); - - /* if there were modifications in the page, revert them. 
*/ - copy_bk_objs_in_page_from(copy_from_segnum, pagenum, false); - - /* we need to go from 'src_version' to 'target_version'. This - might need a walk into the past. */ - struct stm_commit_log_entry_s *src_version, *target_version; - src_version = get_priv_segment(copy_from_segnum)->last_commit_log_entry; - target_version = STM_PSEGMENT->last_commit_log_entry; - - - dprintf(("handle_segfault_in_page: rev %lu to rev %lu\n", - src_version->rev_num, target_version->rev_num)); - /* adapt revision of page to our revision: - if our rev is higher than the page we copy from, everything - is fine as we never read/modified the page anyway - */ - if (src_version->rev_num > target_version->rev_num) - go_to_the_past(pagenum, src_version, target_version); - - release_modification_lock_set(to_lock, my_segnum); - release_all_privatization_locks(); -} - -static void _signal_handler(int sig, siginfo_t *siginfo, void *context) -{ - assert(_stm_segfault_expected > 0); - - int saved_errno = errno; - char *addr = siginfo->si_addr; - dprintf(("si_addr: %p\n", addr)); - if (addr == NULL || addr < stm_object_pages || - addr >= stm_object_pages+TOTAL_MEMORY) { - /* actual segfault, unrelated to stmgc */ - fprintf(stderr, "Segmentation fault: accessing %p\n", addr); - detect_shadowstack_overflow(addr); - abort(); - } - - int segnum = get_segment_of_linear_address(addr); - OPT_ASSERT(segnum != 0); - if (segnum != STM_SEGMENT->segment_num) { - fprintf(stderr, "Segmentation fault: accessing %p (seg %d) from" - " seg %d\n", addr, segnum, STM_SEGMENT->segment_num); - abort(); - } - dprintf(("-> segment: %d\n", segnum)); - - char *seg_base = STM_SEGMENT->segment_base; - uintptr_t pagenum = ((char*)addr - seg_base) / 4096UL; - if (pagenum < END_NURSERY_PAGE) { - fprintf(stderr, "Segmentation fault: accessing %p (seg %d " - "page %lu)\n", addr, segnum, pagenum); - abort(); - } - - DEBUG_EXPECT_SEGFAULT(false); - handle_segfault_in_page(pagenum); - DEBUG_EXPECT_SEGFAULT(true); - - errno = 
saved_errno;
-    /* now return and retry */
-}

 /* ############# commit log ############# */

@@ -361,6 +156,7 @@
 }

 static void reset_modified_from_backup_copies(int segment_num, object_t *only_obj); /* forward */
+static void undo_modifications_to_single_obj(int segment_num, object_t *only_obj); /* forward */

 static bool _stm_validate(void)
 {
@@ -442,7 +238,7 @@
         for (; undo < end; undo++) {
             object_t *obj;

-            if (undo->type != TYPE_POSITION_MARKER) {
+            if (LIKELY(undo->type != TYPE_POSITION_MARKER)) {
                 /* common case: 'undo->object' was written to
                    in this past commit, so we must check that
                    it was not read by us. */
@@ -472,13 +268,17 @@
                        an abort. However, from now on, we also
                        assume that an abort would not roll-back
                        to what is in the backup copy, as we don't
                        trace the bkcpy
-                       during major GCs.
+                       during major GCs. (Seg0 may contain the version
+                       found in the other segment and thus not have the
+                       content of our bk_copy)
+
                        We choose the approach to reset all our changes
                        to this obj here, so that we can throw away the
                        backup copy completely: */
                     /* XXX: this browses through the whole list of
                        modified fragments; this may become a problem... */
-                    reset_modified_from_backup_copies(my_segnum, obj);
+                    undo_modifications_to_single_obj(my_segnum, obj);
+
                     continue;
                 }
@@ -604,11 +404,15 @@
  */
 static void _validate_and_attach(struct stm_commit_log_entry_s *new)
 {
+    uintptr_t cle_length = 0;
     struct stm_commit_log_entry_s *old;

     OPT_ASSERT(new != NULL);
     OPT_ASSERT(new != INEV_RUNNING);

+    cle_length = list_count(STM_PSEGMENT->modified_old_objects);
+    assert(cle_length == new->written_count * 3);
+
     soon_finished_or_inevitable_thread_segment();

 retry_from_start:
@@ -617,6 +421,16 @@
         stm_abort_transaction();
     }

+    if (cle_length != list_count(STM_PSEGMENT->modified_old_objects)) {
+        /* something changed the list of modified objs during _stm_validate; or
+         * during a major GC that also does _stm_validate(). That "something"
+         * can only be a reset of a noconflict obj.
Thus, we recreate the CL + * entry */ + free_cle(new); + new = _create_commit_log_entry(); + cle_length = list_count(STM_PSEGMENT->modified_old_objects); + } + #if STM_TESTS if (STM_PSEGMENT->transaction_state != TS_INEVITABLE && STM_PSEGMENT->last_commit_log_entry->next == INEV_RUNNING) { @@ -815,6 +629,10 @@ size_t start_offset; if (first_call) { start_offset = 0; + + /* flags like a never-touched obj */ + assert(obj->stm_flags & GCFLAG_WRITE_BARRIER); + assert(!(obj->stm_flags & GCFLAG_WB_EXECUTED)); } else { start_offset = -1; } @@ -1249,8 +1067,7 @@ assert(tree_is_cleared(STM_PSEGMENT->nursery_objects_shadows)); assert(tree_is_cleared(STM_PSEGMENT->callbacks_on_commit_and_abort[0])); assert(tree_is_cleared(STM_PSEGMENT->callbacks_on_commit_and_abort[1])); - assert(list_is_empty(STM_PSEGMENT->young_objects_with_light_finalizers)); - assert(STM_PSEGMENT->finalizers == NULL); + assert(list_is_empty(STM_PSEGMENT->young_objects_with_destructors)); assert(STM_PSEGMENT->active_queues == NULL); #ifndef NDEBUG /* this should not be used when objects_pointing_to_nursery == NULL */ @@ -1259,6 +1076,13 @@ check_nursery_at_transaction_start(); + if (tl->mem_reset_on_abort) { + assert(!!tl->mem_stored_for_reset_on_abort); + memcpy(tl->mem_stored_for_reset_on_abort, tl->mem_reset_on_abort, + tl->mem_bytes_to_reset_on_abort); + } + + /* Change read-version here, because if we do stm_validate in the safe-point below, we should not see our old reads from the last transaction. 
*/ @@ -1282,7 +1106,7 @@ } #ifdef STM_NO_AUTOMATIC_SETJMP -static int did_abort = 0; +int did_abort = 0; #endif long _stm_start_transaction(stm_thread_local_t *tl) @@ -1294,6 +1118,12 @@ #else long repeat_count = stm_rewind_jmp_setjmp(tl); #endif + if (repeat_count) { + /* only if there was an abort, we need to reset the memory: */ + if (tl->mem_reset_on_abort) + memcpy(tl->mem_reset_on_abort, tl->mem_stored_for_reset_on_abort, + tl->mem_bytes_to_reset_on_abort); + } _do_start_transaction(tl); if (repeat_count == 0) { /* else, 'nursery_mark' was already set @@ -1419,6 +1249,7 @@ push_large_overflow_objects_to_other_segments(); /* push before validate. otherwise they are reachable too early */ + /* before releasing _stm_detached_inevitable_from_thread, perform the commit. Otherwise, the same thread whose (inev) transaction we try to commit here may start a new one in another segment *but* w/o @@ -1436,7 +1267,7 @@ /* but first, emit commit-event of this thread: */ timing_event(STM_SEGMENT->running_thread, STM_TRANSACTION_COMMIT); STM_SEGMENT->running_thread = NULL; - write_fence(); + stm_write_fence(); assert(_stm_detached_inevitable_from_thread == -1); _stm_detached_inevitable_from_thread = 0; } @@ -1481,6 +1312,48 @@ invoke_general_finalizers(tl); } +static void undo_modifications_to_single_obj(int segment_num, object_t *obj) +{ + /* special function used for noconflict objs to reset all their + * modifications and make them appear untouched in the current transaction. + * I.e., reset modifications and remove from all lists. 
*/ + + struct stm_priv_segment_info_s *pseg = get_priv_segment(segment_num); + + reset_modified_from_backup_copies(segment_num, obj); + + /* reset read marker (must not be considered read either) */ + ((struct stm_read_marker_s *) + (pseg->pub.segment_base + (((uintptr_t)obj) >> 4)))->rm = 0; + + /* reset possibly marked cards */ + if (get_page_status_in(segment_num, (uintptr_t)obj / 4096) == PAGE_ACCESSIBLE + && obj_should_use_cards(pseg->pub.segment_base, obj)) { + /* if header is not accessible, we didn't mark any cards */ + _reset_object_cards(pseg, obj, CARD_CLEAR, false, false); + } + + /* remove from all other lists */ + LIST_FOREACH_R(pseg->old_objects_with_cards_set, object_t * /*item*/, + { + if (item == obj) { + /* copy last element over this one (HACK) */ + _lst->count -= 1; + _lst->items[_i] = _lst->items[_lst->count]; + break; + } + }); + LIST_FOREACH_R(pseg->objects_pointing_to_nursery, object_t * /*item*/, + { + if (item == obj) { + /* copy last element over this one (HACK) */ + _lst->count -= 1; + _lst->items[_i] = _lst->items[_lst->count]; + break; + } + }); +} + static void reset_modified_from_backup_copies(int segment_num, object_t *only_obj) { #pragma push_macro("STM_PSEGMENT") @@ -1490,6 +1363,9 @@ assert(modification_lock_check_wrlock(segment_num)); DEBUG_EXPECT_SEGFAULT(false); + /* WARNING: resetting the obj will remove the WB flag. Make sure you either + * re-add it or remove it from lists where it was added based on the flag. 
*/ + struct stm_priv_segment_info_s *pseg = get_priv_segment(segment_num); struct list_s *list = pseg->modified_old_objects; struct stm_undo_s *undo = (struct stm_undo_s *)list->items; @@ -1500,7 +1376,7 @@ continue; object_t *obj = undo->object; - if (only_obj != NULL && obj != only_obj) + if (UNLIKELY(only_obj != NULL) && LIKELY(obj != only_obj)) continue; char *dst = REAL_ADDRESS(pseg->pub.segment_base, obj); @@ -1515,19 +1391,15 @@ free_bk(undo); - if (only_obj != NULL) { - assert(IMPLY(only_obj != NULL, - (((struct object_s *)dst)->stm_flags - & (GCFLAG_NO_CONFLICT - | GCFLAG_WRITE_BARRIER - | GCFLAG_WB_EXECUTED)) - == (GCFLAG_NO_CONFLICT | GCFLAG_WRITE_BARRIER))); + if (UNLIKELY(only_obj != NULL)) { + assert(((struct object_s *)dst)->stm_flags & GCFLAG_NO_CONFLICT); + /* copy last element over this one */ end--; list->count -= 3; - if (undo < end) - *undo = *end; - undo--; /* next itr */ + *undo = *end; + /* to neutralise the increment for the next iter: */ + undo--; } } @@ -1652,6 +1524,13 @@ if (tl->mem_clear_on_abort) memset(tl->mem_clear_on_abort, 0, tl->mem_bytes_to_clear_on_abort); + if (tl->mem_reset_on_abort) { + /* temporarily set the memory of mem_reset_on_abort to zeros since in the + case of vmprof, the old value is really wrong if we didn't do the longjmp + back yet (that restores the C stack). We restore the memory in + _stm_start_transaction() */ + memset(tl->mem_reset_on_abort, 0, tl->mem_bytes_to_reset_on_abort); + } invoke_and_clear_user_callbacks(1); /* for abort */ @@ -1760,7 +1639,7 @@ 0. We have to wait for this to happen bc. otherwise, eg. 
_stm_detach_inevitable_transaction is not safe to do yet */ while (_stm_detached_inevitable_from_thread == -1) - spin_loop(); + stm_spin_loop(); assert(_stm_detached_inevitable_from_thread == 0); soon_finished_or_inevitable_thread_segment(); @@ -1830,13 +1709,13 @@ assert(STM_PSEGMENT->privatization_lock); assert(obj->stm_flags & GCFLAG_WRITE_BARRIER); assert(!(obj->stm_flags & GCFLAG_WB_EXECUTED)); + assert(!(obj->stm_flags & GCFLAG_CARDS_SET)); ssize_t obj_size = stmcb_size_rounded_up( (struct object_s *)REAL_ADDRESS(STM_SEGMENT->segment_base, obj)); OPT_ASSERT(obj_size >= 16); if (LIKELY(is_small_uniform(obj))) { - assert(!(obj->stm_flags & GCFLAG_CARDS_SET)); OPT_ASSERT(obj_size <= GC_LAST_SMALL_SIZE); _synchronize_fragment((stm_char *)obj, obj_size); return; diff --git a/c8/stm/core.h b/c8/stm/core.h --- a/c8/stm/core.h +++ b/c8/stm/core.h @@ -1,3 +1,9 @@ +#ifndef _STMGC_H +# error "must be compiled via stmgc.c" +# include "../stmgc.h" // silence flymake +#endif + + #define _STM_CORE_H_ #include <stdlib.h> @@ -7,7 +13,8 @@ #include <errno.h> #include <pthread.h> #include <signal.h> - +#include <stdbool.h> +#include "list.h" /************************************************************/ @@ -139,9 +146,9 @@ pthread_t running_pthread; #endif - /* light finalizers */ - struct list_s *young_objects_with_light_finalizers; - struct list_s *old_objects_with_light_finalizers; + /* destructors */ + struct list_s *young_objects_with_destructors; + struct list_s *old_objects_with_destructors; /* regular finalizers (objs from the current transaction only) */ struct finalizers_s *finalizers; @@ -304,6 +311,14 @@ static bool _stm_validate(void); static void _core_commit_transaction(bool external); +static void import_objects( + int from_segnum, /* or -1: from undo->backup, + or -2: from undo->backup if not modified */ + uintptr_t pagenum, /* or -1: "all accessible" */ + struct stm_undo_s *undo, + struct stm_undo_s *end); + + static inline bool was_read_remote(char *base, 
object_t *obj) { uint8_t other_transaction_read_version = @@ -326,12 +341,12 @@ static inline void acquire_privatization_lock(int segnum) { - spinlock_acquire(get_priv_segment(segnum)->privatization_lock); + stm_spinlock_acquire(get_priv_segment(segnum)->privatization_lock); } static inline void release_privatization_lock(int segnum) { - spinlock_release(get_priv_segment(segnum)->privatization_lock); + stm_spinlock_release(get_priv_segment(segnum)->privatization_lock); } static inline bool all_privatization_locks_acquired(void) diff --git a/c8/stm/detach.c b/c8/stm/detach.c --- a/c8/stm/detach.c +++ b/c8/stm/detach.c @@ -1,5 +1,6 @@ #ifndef _STM_CORE_H_ # error "must be compiled via stmgc.c" +# include "core.h" // silence flymake #endif #include <errno.h> @@ -107,7 +108,7 @@ is reset to a value different from -1 */ dprintf(("reattach_transaction: busy wait...\n")); while (_stm_detached_inevitable_from_thread == -1) - spin_loop(); + stm_spin_loop(); /* then retry */ goto restart; @@ -157,7 +158,7 @@ /* busy-loop: wait until _stm_detached_inevitable_from_thread is reset to a value different from -1 */ while (_stm_detached_inevitable_from_thread == -1) - spin_loop(); + stm_spin_loop(); goto restart; } if (!__sync_bool_compare_and_swap(&_stm_detached_inevitable_from_thread, @@ -209,7 +210,7 @@ /* busy-loop: wait until _stm_detached_inevitable_from_thread is reset to a value different from -1 */ while (_stm_detached_inevitable_from_thread == -1) - spin_loop(); + stm_spin_loop(); goto restart; } } diff --git a/c8/stm/extra.c b/c8/stm/extra.c --- a/c8/stm/extra.c +++ b/c8/stm/extra.c @@ -1,5 +1,6 @@ #ifndef _STM_CORE_H_ # error "must be compiled via stmgc.c" +# include "core.h" // silence flymake #endif diff --git a/c8/stm/finalizer.c b/c8/stm/finalizer.c --- a/c8/stm/finalizer.c +++ b/c8/stm/finalizer.c @@ -1,68 +1,100 @@ - +#ifndef _STM_CORE_H_ +# error "must be compiled via stmgc.c" +# include "core.h" // silence flymake +#endif +#include "finalizer.h" +#include 
"fprintcolor.h" +#include "nursery.h" +#include "gcpage.h" /* callbacks */ -void (*stmcb_light_finalizer)(object_t *); +void (*stmcb_destructor)(object_t *); void (*stmcb_finalizer)(object_t *); static void init_finalizers(struct finalizers_s *f) { f->objects_with_finalizers = list_create(); - f->count_non_young = 0; - f->run_finalizers = NULL; - f->running_next = NULL; + f->probably_young_objects_with_finalizers = list_create(); + f->run_finalizers = list_create(); + f->lock = 0; + f->running_trigger_now = NULL; } static void setup_finalizer(void) { init_finalizers(&g_finalizers); + + for (long j = 1; j < NB_SEGMENTS; j++) { + struct stm_priv_segment_info_s *pseg = get_priv_segment(j); + + assert(pseg->finalizers == NULL); + struct finalizers_s *f = malloc(sizeof(struct finalizers_s)); + if (f == NULL) + stm_fatalerror("out of memory in create_finalizers"); /* XXX */ + init_finalizers(f); + pseg->finalizers = f; + } } -static void teardown_finalizer(void) +void stm_setup_finalizer_queues(int number, stm_finalizer_trigger_fn *triggers) { - if (g_finalizers.run_finalizers != NULL) - list_free(g_finalizers.run_finalizers); - list_free(g_finalizers.objects_with_finalizers); + assert(g_finalizer_triggers.count == 0); + assert(g_finalizer_triggers.triggers == NULL); + + g_finalizer_triggers.count = number; + g_finalizer_triggers.triggers = (stm_finalizer_trigger_fn *) + malloc(number * sizeof(stm_finalizer_trigger_fn)); + + for (int qindex = 0; qindex < number; qindex++) { + g_finalizer_triggers.triggers[qindex] = triggers[qindex]; + dprintf(("setup_finalizer_queue(qindex=%d,fun=%p)\n", qindex, triggers[qindex])); + } +} + +static void teardown_finalizer(void) { + LIST_FREE(g_finalizers.run_finalizers); + LIST_FREE(g_finalizers.objects_with_finalizers); + LIST_FREE(g_finalizers.probably_young_objects_with_finalizers); memset(&g_finalizers, 0, sizeof(g_finalizers)); + + if (g_finalizer_triggers.triggers) + free(g_finalizer_triggers.triggers); + 
memset(&g_finalizer_triggers, 0, sizeof(g_finalizer_triggers)); } static void _commit_finalizers(void) { /* move finalizer lists to g_finalizers for major collections */ while (__sync_lock_test_and_set(&g_finalizers.lock, 1) != 0) { - spin_loop(); + stm_spin_loop(); } - if (STM_PSEGMENT->finalizers->run_finalizers != NULL) { + struct finalizers_s *local_fs = STM_PSEGMENT->finalizers; + if (!list_is_empty(local_fs->run_finalizers)) { /* copy 'STM_PSEGMENT->finalizers->run_finalizers' into 'g_finalizers.run_finalizers', dropping any initial NULLs (finalizers already called) */ - struct list_s *src = STM_PSEGMENT->finalizers->run_finalizers; - uintptr_t frm = 0; - if (STM_PSEGMENT->finalizers->running_next != NULL) { - frm = *STM_PSEGMENT->finalizers->running_next; - assert(frm <= list_count(src)); - *STM_PSEGMENT->finalizers->running_next = (uintptr_t)-1; - } - if (frm < list_count(src)) { - if (g_finalizers.run_finalizers == NULL) - g_finalizers.run_finalizers = list_create(); + struct list_s *src = local_fs->run_finalizers; + if (list_count(src)) { g_finalizers.run_finalizers = list_extend( g_finalizers.run_finalizers, - src, frm); + src, 0); } - list_free(src); } + LIST_FREE(local_fs->run_finalizers); /* copy the whole 'STM_PSEGMENT->finalizers->objects_with_finalizers' into 'g_finalizers.objects_with_finalizers' */ g_finalizers.objects_with_finalizers = list_extend( g_finalizers.objects_with_finalizers, - STM_PSEGMENT->finalizers->objects_with_finalizers, 0); - list_free(STM_PSEGMENT->finalizers->objects_with_finalizers); + local_fs->objects_with_finalizers, 0); + LIST_FREE(local_fs->objects_with_finalizers); + assert(list_is_empty(local_fs->probably_young_objects_with_finalizers)); + LIST_FREE(local_fs->probably_young_objects_with_finalizers); - free(STM_PSEGMENT->finalizers); - STM_PSEGMENT->finalizers = NULL; + // re-init + init_finalizers(local_fs); __sync_lock_release(&g_finalizers.lock); } @@ -71,24 +103,22 @@ { /* like _commit_finalizers(), but forget 
everything from the current transaction */ - if (pseg->finalizers != NULL) { - if (pseg->finalizers->run_finalizers != NULL) { - if (pseg->finalizers->running_next != NULL) { - *pseg->finalizers->running_next = (uintptr_t)-1; - } - list_free(pseg->finalizers->run_finalizers); - } - list_free(pseg->finalizers->objects_with_finalizers); - free(pseg->finalizers); - pseg->finalizers = NULL; - } + LIST_FREE(pseg->finalizers->run_finalizers); + LIST_FREE(pseg->finalizers->objects_with_finalizers); + LIST_FREE(pseg->finalizers->probably_young_objects_with_finalizers); + // re-init + init_finalizers(pseg->finalizers); + + // if we were running triggers, release the lock: + if (g_finalizers.running_trigger_now == pseg) + g_finalizers.running_trigger_now = NULL; /* call the light finalizers for objects that are about to be forgotten from the current transaction */ char *old_gs_register = STM_SEGMENT->segment_base; bool must_fix_gs = old_gs_register != pseg->pub.segment_base; - struct list_s *lst = pseg->young_objects_with_light_finalizers; + struct list_s *lst = pseg->young_objects_with_destructors; long i, count = list_count(lst); if (lst > 0) { for (i = 0; i < count; i++) { @@ -98,15 +128,15 @@ set_gs_register(pseg->pub.segment_base); must_fix_gs = false; } - stmcb_light_finalizer(obj); + stmcb_destructor(obj); } list_clear(lst); } /* also deals with overflow objects: they are at the tail of - old_objects_with_light_finalizers (this list is kept in order + old_objects_with_destructors (this list is kept in order and we cannot add any already-committed object) */ - lst = pseg->old_objects_with_light_finalizers; + lst = pseg->old_objects_with_destructors; count = list_count(lst); while (count > 0) { object_t *obj = (object_t *)list_item(lst, --count); @@ -117,7 +147,7 @@ set_gs_register(pseg->pub.segment_base); must_fix_gs = false; } - stmcb_light_finalizer(obj); + stmcb_destructor(obj); } if (STM_SEGMENT->segment_base != old_gs_register) @@ -125,44 +155,42 @@ } -void 
stm_enable_light_finalizer(object_t *obj) +void stm_enable_destructor(object_t *obj) { if (_is_young(obj)) { - LIST_APPEND(STM_PSEGMENT->young_objects_with_light_finalizers, obj); + LIST_APPEND(STM_PSEGMENT->young_objects_with_destructors, obj); } else { assert(_is_from_same_transaction(obj)); - LIST_APPEND(STM_PSEGMENT->old_objects_with_light_finalizers, obj); + LIST_APPEND(STM_PSEGMENT->old_objects_with_destructors, obj); } } -object_t *stm_allocate_with_finalizer(ssize_t size_rounded_up) + +void stm_enable_finalizer(int queue_index, object_t *obj) { - object_t *obj = _stm_allocate_external(size_rounded_up); - - if (STM_PSEGMENT->finalizers == NULL) { - struct finalizers_s *f = malloc(sizeof(struct finalizers_s)); - if (f == NULL) - stm_fatalerror("out of memory in create_finalizers"); /* XXX */ - init_finalizers(f); - STM_PSEGMENT->finalizers = f; + if (_is_young(obj)) { + LIST_APPEND(STM_PSEGMENT->finalizers->probably_young_objects_with_finalizers, obj); + LIST_APPEND(STM_PSEGMENT->finalizers->probably_young_objects_with_finalizers, queue_index); } - assert(STM_PSEGMENT->finalizers->count_non_young - <= list_count(STM_PSEGMENT->finalizers->objects_with_finalizers)); - LIST_APPEND(STM_PSEGMENT->finalizers->objects_with_finalizers, obj); - return obj; + else { + assert(_is_from_same_transaction(obj)); + LIST_APPEND(STM_PSEGMENT->finalizers->objects_with_finalizers, obj); + LIST_APPEND(STM_PSEGMENT->finalizers->objects_with_finalizers, queue_index); + } } + /************************************************************/ -/* Light finalizers +/* Destructors */ -static void deal_with_young_objects_with_finalizers(void) +static void deal_with_young_objects_with_destructors(void) { - /* for light finalizers: executes finalizers for objs that don't survive + /* for destructors: executes destructors for objs that don't survive this minor gc */ - struct list_s *lst = STM_PSEGMENT->young_objects_with_light_finalizers; + struct list_s *lst = 
STM_PSEGMENT->young_objects_with_destructors; long i, count = list_count(lst); for (i = 0; i < count; i++) { object_t *obj = (object_t *)list_item(lst, i); @@ -171,28 +199,29 @@ object_t *TLPREFIX *pforwarded_array = (object_t *TLPREFIX *)obj; if (pforwarded_array[0] != GCWORD_MOVED) { /* not moved: the object dies */ - stmcb_light_finalizer(obj); + stmcb_destructor(obj); } else { obj = pforwarded_array[1]; /* moved location */ assert(!_is_young(obj)); - LIST_APPEND(STM_PSEGMENT->old_objects_with_light_finalizers, obj); + LIST_APPEND(STM_PSEGMENT->old_objects_with_destructors, obj); } } list_clear(lst); } -static void deal_with_old_objects_with_finalizers(void) +static void deal_with_old_objects_with_destructors(void) { - /* for light finalizers */ + /* for destructors */ int old_gs_register = STM_SEGMENT->segment_num; int current_gs_register = old_gs_register; long j; - assert(list_is_empty(get_priv_segment(0)->old_objects_with_light_finalizers)); + assert(list_is_empty(get_priv_segment(0)->old_objects_with_destructors)); for (j = 1; j < NB_SEGMENTS; j++) { struct stm_priv_segment_info_s *pseg = get_priv_segment(j); - struct list_s *lst = pseg->old_objects_with_light_finalizers; + assert(list_is_empty(pseg->young_objects_with_destructors)); + struct list_s *lst = pseg->old_objects_with_destructors; long i, count = list_count(lst); lst->count = 0; for (i = 0; i < count; i++) { @@ -214,7 +243,7 @@ set_gs_register(get_segment_base(j)); current_gs_register = j; } - stmcb_light_finalizer(obj); + stmcb_destructor(obj); } else { /* object survives */ @@ -227,6 +256,7 @@ } + /************************************************************/ /* Algorithm for regular (non-light) finalizers. 
Follows closely pypy/doc/discussion/finalizer-order.rst @@ -325,20 +355,23 @@ struct list_s *marked = list_create(); + assert(list_is_empty(f->probably_young_objects_with_finalizers)); struct list_s *lst = f->objects_with_finalizers; long i, count = list_count(lst); lst->count = 0; - f->count_non_young = 0; - for (i = 0; i < count; i++) { + for (i = 0; i < count; i += 2) { object_t *x = (object_t *)list_item(lst, i); + uintptr_t qindex = list_item(lst, i + 1); assert(_finalization_state(x) != 1); if (_finalization_state(x) >= 2) { list_set_item(lst, lst->count++, (uintptr_t)x); + list_set_item(lst, lst->count++, qindex); continue; } LIST_APPEND(marked, x); + LIST_APPEND(marked, qindex); struct list_s *pending = _finalizer_pending; LIST_APPEND(pending, x); @@ -370,27 +403,29 @@ struct list_s *run_finalizers = f->run_finalizers; long i, count = list_count(marked); - for (i = 0; i < count; i++) { + for (i = 0; i < count; i += 2) { object_t *x = (object_t *)list_item(marked, i); + uintptr_t qindex = list_item(marked, i + 1); int state = _finalization_state(x); assert(state >= 2); if (state == 2) { - if (run_finalizers == NULL) - run_finalizers = list_create(); LIST_APPEND(run_finalizers, x); + LIST_APPEND(run_finalizers, qindex); _recursively_bump_finalization_state_from_2_to_3(pseg, x); } else { struct list_s *lst = f->objects_with_finalizers; list_set_item(lst, lst->count++, (uintptr_t)x); + list_set_item(lst, lst->count++, qindex); } } - list_free(marked); + LIST_FREE(marked); f->run_finalizers = run_finalizers; } + static void deal_with_objects_with_finalizers(void) { /* for non-light finalizers */ @@ -433,11 +468,10 @@ static void mark_visit_from_finalizer1( struct stm_priv_segment_info_s *pseg, struct finalizers_s *f) { - if (f != NULL && f->run_finalizers != NULL) { - LIST_FOREACH_R(f->run_finalizers, object_t * /*item*/, - ({ - mark_visit_possibly_overflow_object(item, pseg); - })); + long i, count = list_count(f->run_finalizers); + for (i = 0; i < count; i += 
2) { + object_t *x = (object_t *)list_item(f->run_finalizers, i); + mark_visit_possibly_overflow_object(x, pseg); } } @@ -451,40 +485,6 @@ mark_visit_from_finalizer1(get_priv_segment(0), &g_finalizers); } -static void _execute_finalizers(struct finalizers_s *f) -{ - if (f->run_finalizers == NULL) - return; /* nothing to do */ - - restart: - if (f->running_next != NULL) - return; /* in a nested invocation of execute_finalizers() */ - - uintptr_t next = 0, total = list_count(f->run_finalizers); - f->running_next = &next; - - while (next < total) { - object_t *obj = (object_t *)list_item(f->run_finalizers, next); - list_set_item(f->run_finalizers, next, 0); - next++; - - stmcb_finalizer(obj); - } - if (next == (uintptr_t)-1) { - /* transaction committed: the whole 'f' was freed */ - return; - } - f->running_next = NULL; - - if (f->run_finalizers->count > total) { - memmove(f->run_finalizers->items, - f->run_finalizers->items + total, - (f->run_finalizers->count - total) * sizeof(uintptr_t)); - goto restart; - } - - LIST_FREE(f->run_finalizers); -} /* XXX: according to translator.backendopt.finalizer, getfield_gc for primitive types is a safe op in light finalizers. @@ -492,43 +492,185 @@ getfield on *dying obj*). */ +static void _trigger_finalizer_queues(struct finalizers_s *f) +{ + /* runs triggers of finalizer queues that have elements in the queue. May + NOT run outside of a transaction, but triggers never leave the + transactional zone. 
+ + returns true if there are also old-style finalizers to run */ + assert(in_transaction(STM_PSEGMENT->pub.running_thread)); + + bool *to_trigger = (bool*)alloca(g_finalizer_triggers.count * sizeof(bool)); + memset(to_trigger, 0, g_finalizer_triggers.count * sizeof(bool)); + + while (__sync_lock_test_and_set(&f->lock, 1) != 0) { + /* somebody is adding more finalizers (_commit_finalizer()) */ + stm_spin_loop(); + } + + int count = list_count(f->run_finalizers); + for (int i = 0; i < count; i += 2) { + int qindex = (int)list_item(f->run_finalizers, i + 1); + dprintf(("qindex=%d\n", qindex)); + to_trigger[qindex] = true; + } + + __sync_lock_release(&f->lock); + + // trigger now: + for (int i = 0; i < g_finalizer_triggers.count; i++) { + if (to_trigger[i]) { + dprintf(("invoke-finalizer-trigger(qindex=%d)\n", i)); + g_finalizer_triggers.triggers[i](); + } + } +} + +static bool _has_oldstyle_finalizers(struct finalizers_s *f) +{ + int count = list_count(f->run_finalizers); + for (int i = 0; i < count; i += 2) { + int qindex = (int)list_item(f->run_finalizers, i + 1); + if (qindex == -1) + return true; + } + return false; +} + +static void _invoke_local_finalizers() +{ + /* called inside a transaction; invoke local triggers, process old-style + * local finalizers */ + dprintf(("invoke_local_finalizers %lu\n", list_count(STM_PSEGMENT->finalizers->run_finalizers))); + if (list_is_empty(STM_PSEGMENT->finalizers->run_finalizers) + && list_is_empty(g_finalizers.run_finalizers)) + return; + + struct stm_priv_segment_info_s *pseg = get_priv_segment(STM_SEGMENT->segment_num); + //try to run local triggers + if (STM_PSEGMENT->finalizers->running_trigger_now == NULL) { + // we are not recursively running them + STM_PSEGMENT->finalizers->running_trigger_now = pseg; + _trigger_finalizer_queues(STM_PSEGMENT->finalizers); + STM_PSEGMENT->finalizers->running_trigger_now = NULL; + } + + // try to run global triggers + if (__sync_lock_test_and_set(&g_finalizers.running_trigger_now, 
pseg) == NULL) { + // nobody is already running these triggers (recursively) + _trigger_finalizer_queues(&g_finalizers); + g_finalizers.running_trigger_now = NULL; + } + + if (!_has_oldstyle_finalizers(STM_PSEGMENT->finalizers)) + return; // no oldstyle to run + + object_t *obj; + while ((obj = stm_next_to_finalize(-1)) != NULL) { + stmcb_finalizer(obj); + } +} + static void _invoke_general_finalizers(stm_thread_local_t *tl) { - /* called between transactions */ + /* called between transactions + * triggers not called here, since all should have been called already in _invoke_local_finalizers! + * run old-style finalizers (q_index=-1) + * queues that are not empty. */ + dprintf(("invoke_general_finalizers %lu\n", list_count(g_finalizers.run_finalizers))); + if (list_is_empty(g_finalizers.run_finalizers)) + return; + + if (!_has_oldstyle_finalizers(&g_finalizers)) + return; // no oldstyle to run + + // run old-style finalizers: rewind_jmp_buf rjbuf; stm_rewind_jmp_enterframe(tl, &rjbuf); _stm_start_transaction(tl); - /* XXX: become inevitable, bc. otherwise, we would need to keep - around the original g_finalizers.run_finalizers to restore it - in case of an abort. */ - _stm_become_inevitable(MSG_INEV_DONT_SLEEP); - /* did it work? */ - if (STM_PSEGMENT->transaction_state != TS_INEVITABLE) { /* no */ - /* avoid blocking here, waiting for another INEV transaction. - If we did that, application code could not proceed (start the - next transaction) and it will not be obvious from the profile - why we were WAITing. 
*/ - _stm_commit_transaction(); - stm_rewind_jmp_leaveframe(tl, &rjbuf); - return; - } - while (__sync_lock_test_and_set(&g_finalizers.lock, 1) != 0) { - /* somebody is adding more finalizers (_commit_finalizer()) */ - spin_loop(); - } - struct finalizers_s copy = g_finalizers; - assert(copy.running_next == NULL); - g_finalizers.run_finalizers = NULL; - /* others may add to g_finalizers again: */ - __sync_lock_release(&g_finalizers.lock); - - if (copy.run_finalizers != NULL) { - _execute_finalizers(©); + dprintf(("invoke_oldstyle_finalizers %lu\n", list_count(g_finalizers.run_finalizers))); + object_t *obj; + while ((obj = stm_next_to_finalize(-1)) != NULL) { + assert(STM_PSEGMENT->transaction_state == TS_INEVITABLE); + stmcb_finalizer(obj); } _stm_commit_transaction(); stm_rewind_jmp_leaveframe(tl, &rjbuf); +} - LIST_FREE(copy.run_finalizers); +object_t* stm_next_to_finalize(int queue_index) { + assert(STM_PSEGMENT->transaction_state != TS_NONE); + + /* first check local run_finalizers queue, then global */ + if (!list_is_empty(STM_PSEGMENT->finalizers->run_finalizers)) { + struct list_s *lst = STM_PSEGMENT->finalizers->run_finalizers; + int count = list_count(lst); + for (int i = 0; i < count; i += 2) { + int qindex = (int)list_item(lst, i + 1); + if (qindex == queue_index) { + /* no need to become inevitable for local ones */ + /* Remove obj from list and return it. 
*/ + object_t *obj = (object_t*)list_item(lst, i); + int remaining = count - i - 2; + if (remaining > 0) { + memmove(&lst->items[i], + &lst->items[i + 2], + remaining * sizeof(uintptr_t)); + } + lst->count -= 2; + return obj; + } + } + } + + /* no local finalizers found, continue in global list */ + + while (__sync_lock_test_and_set(&g_finalizers.lock, 1) != 0) { + /* somebody is adding more finalizers (_commit_finalizer()) */ + stm_spin_loop(); + } + + struct list_s *lst = g_finalizers.run_finalizers; + int count = list_count(lst); + for (int i = 0; i < count; i += 2) { + int qindex = (int)list_item(lst, i + 1); + if (qindex == queue_index) { + /* XXX: become inevitable, bc. otherwise, we would need to keep + around the original g_finalizers.run_finalizers to restore it + in case of an abort. */ + if (STM_PSEGMENT->transaction_state != TS_INEVITABLE) { + _stm_become_inevitable(MSG_INEV_DONT_SLEEP); + /* did it work? */ + if (STM_PSEGMENT->transaction_state != TS_INEVITABLE) { /* no */ + /* avoid blocking here, waiting for another INEV transaction. + If we did that, application code could not proceed (start the + next transaction) and it will not be obvious from the profile + why we were WAITing. XXX: still true? */ + __sync_lock_release(&g_finalizers.lock); + return NULL; + } + } + + /* Remove obj from list and return it. 
*/ + object_t *obj = (object_t*)list_item(lst, i); + int remaining = count - i - 2; + if (remaining > 0) { + memmove(&lst->items[i], + &lst->items[i + 2], + remaining * sizeof(uintptr_t)); + } + lst->count -= 2; + + __sync_lock_release(&g_finalizers.lock); + return obj; + } + } + + /* others may add to g_finalizers again: */ + __sync_lock_release(&g_finalizers.lock); + + return NULL; } diff --git a/c8/stm/finalizer.h b/c8/stm/finalizer.h --- a/c8/stm/finalizer.h +++ b/c8/stm/finalizer.h @@ -1,16 +1,20 @@ +#ifndef _STM_FINALIZER_H_ +#define _STM_FINALIZER_H_ + +#include <stdint.h> /* see deal_with_objects_with_finalizers() for explanation of these fields */ struct finalizers_s { long lock; + struct stm_priv_segment_info_s * running_trigger_now; /* our PSEG, if we are running triggers */ struct list_s *objects_with_finalizers; - uintptr_t count_non_young; + struct list_s *probably_young_objects_with_finalizers; /* empty on g_finalizers! */ struct list_s *run_finalizers; - uintptr_t *running_next; }; static void mark_visit_from_finalizer_pending(void); -static void deal_with_young_objects_with_finalizers(void); -static void deal_with_old_objects_with_finalizers(void); +static void deal_with_young_objects_with_destructors(void); +static void deal_with_old_objects_with_destructors(void); static void deal_with_objects_with_finalizers(void); static void setup_finalizer(void); @@ -27,19 +31,22 @@ /* regular finalizers (objs from already-committed transactions) */ static struct finalizers_s g_finalizers; +static struct { + int count; + stm_finalizer_trigger_fn *triggers; +} g_finalizer_triggers; + static void _invoke_general_finalizers(stm_thread_local_t *tl); +static void _invoke_local_finalizers(void); #define invoke_general_finalizers(tl) do { \ - if (g_finalizers.run_finalizers != NULL) \ - _invoke_general_finalizers(tl); \ + _invoke_general_finalizers(tl); \ } while (0) -static void _execute_finalizers(struct finalizers_s *f); -#define any_local_finalizers() 
(STM_PSEGMENT->finalizers != NULL && \ - STM_PSEGMENT->finalizers->run_finalizers != NULL) #define exec_local_finalizers() do { \ - if (any_local_finalizers()) \ - _execute_finalizers(STM_PSEGMENT->finalizers); \ + _invoke_local_finalizers(); \ } while (0) + +#endif diff --git a/c8/stm/forksupport.c b/c8/stm/forksupport.c --- a/c8/stm/forksupport.c +++ b/c8/stm/forksupport.c @@ -1,5 +1,6 @@ #ifndef _STM_CORE_H_ # error "must be compiled via stmgc.c" +# include "core.h" // silence flymake #endif #include <fcntl.h> /* For O_* constants */ diff --git a/c8/stm/fprintcolor.h b/c8/stm/fprintcolor.h --- a/c8/stm/fprintcolor.h +++ b/c8/stm/fprintcolor.h @@ -1,3 +1,7 @@ +#ifndef _FPRINTCOLOR_H +#define _FPRINTCOLOR_H + + /* ------------------------------------------------------------ */ #ifdef STM_DEBUGPRINT /* ------------------------------------------------------------ */ @@ -40,3 +44,5 @@ __attribute__((unused)) static void stm_fatalerror(const char *format, ...) __attribute__((format (printf, 1, 2), noreturn)); + +#endif diff --git a/c8/stm/gcpage.c b/c8/stm/gcpage.c --- a/c8/stm/gcpage.c +++ b/c8/stm/gcpage.c @@ -1,5 +1,6 @@ #ifndef _STM_CORE_H_ # error "must be compiled via stmgc.c" +# include "core.h" // silence flymake #endif static struct tree_s *tree_prebuilt_objs = NULL; /* XXX refactor */ @@ -75,7 +76,7 @@ /* uncommon case: need to initialize some more pages */ - spinlock_acquire(lock_growth_large); + stm_spinlock_acquire(lock_growth_large); char *start = uninitialized_page_start; if (addr + size > start) { @@ -99,7 +100,7 @@ ((struct object_s*)addr)->stm_flags = 0; - spinlock_release(lock_growth_large); + stm_spinlock_release(lock_growth_large); return (stm_char*)(addr - stm_object_pages); } @@ -188,7 +189,7 @@ DEBUG_EXPECT_SEGFAULT(true); release_all_privatization_locks(); - write_fence(); /* make sure 'nobj' is fully initialized from + stm_write_fence(); /* make sure 'nobj' is fully initialized from all threads here */ return (object_t *)nobj; } @@ -976,9 
+977,9 @@ LIST_FREE(marked_objects_to_trace); - /* weakrefs and execute old light finalizers */ + /* weakrefs and execute old destructors */ stm_visit_old_weakrefs(); - deal_with_old_objects_with_finalizers(); + deal_with_old_objects_with_destructors(); /* cleanup */ clean_up_segment_lists(); diff --git a/c8/stm/gcpage.h b/c8/stm/gcpage.h --- a/c8/stm/gcpage.h +++ b/c8/stm/gcpage.h @@ -1,3 +1,7 @@ +#ifndef _STM_GCPAGE_H_ +#define _STM_GCPAGE_H_ + +#include <stdbool.h> /* Granularity when grabbing more unused pages: take 20 at a time */ #define GCPAGE_NUM_PAGES 20 @@ -22,3 +26,9 @@ static void major_collection_with_mutex(void); static bool largemalloc_keep_object_at(char *data); /* for largemalloc.c */ static bool smallmalloc_keep_object_at(char *data); /* for smallmalloc.c */ + +static inline bool mark_visited_test(object_t *obj); +static bool is_overflow_obj_safe(struct stm_priv_segment_info_s *pseg, object_t *obj); +static void mark_visit_possibly_overflow_object(object_t *obj, struct stm_priv_segment_info_s *pseg); + +#endif diff --git a/c8/stm/hash_id.c b/c8/stm/hash_id.c --- a/c8/stm/hash_id.c +++ b/c8/stm/hash_id.c @@ -1,5 +1,6 @@ #ifndef _STM_CORE_H_ # error "must be compiled via stmgc.c" +# include "core.h" // silence flymake #endif diff --git a/c8/stm/hashtable.c b/c8/stm/hashtable.c --- a/c8/stm/hashtable.c +++ b/c8/stm/hashtable.c @@ -40,7 +40,7 @@ Inspired by: http://ppl.stanford.edu/papers/podc011-bronson.pdf */ - +#include <stdint.h> uint32_t stm_hashtable_entry_userdata; @@ -216,7 +216,7 @@ } biggertable->resize_counter = rc; - write_fence(); /* make sure that 'biggertable' is valid here, + stm_write_fence(); /* make sure that 'biggertable' is valid here, and make sure 'table->resize_counter' is updated ('table' must be immutable from now on). */ VOLATILE_HASHTABLE(hashtable)->table = biggertable; @@ -278,7 +278,7 @@ just now. In both cases, this thread must simply spin loop. 
 */
         if (IS_EVEN(rc)) {
-            spin_loop();
+            stm_spin_loop();
             goto restart;
         }
         /* in the other cases, we need to grab the RESIZING_LOCK.
@@ -348,7 +348,7 @@
             hashtable->additions++;
         }
         table->items[i] = entry;
-        write_fence();   /* make sure 'table->items' is written here */
+        stm_write_fence();   /* make sure 'table->items' is written here */
         VOLATILE_TABLE(table)->resize_counter = rc - 6;    /* unlock */
         stm_read((object_t*)entry);
         return entry;
@@ -437,7 +437,7 @@
         table = VOLATILE_HASHTABLE(hashtable)->table;
         rc = VOLATILE_TABLE(table)->resize_counter;
         if (IS_EVEN(rc)) {
-            spin_loop();
+            stm_spin_loop();
             goto restart;
         }
diff --git a/c8/stm/largemalloc.c b/c8/stm/largemalloc.c
--- a/c8/stm/largemalloc.c
+++ b/c8/stm/largemalloc.c
@@ -1,5 +1,6 @@
 #ifndef _STM_CORE_H_
 # error "must be compiled via stmgc.c"
+# include "core.h" // silence flymake
 #endif
 
 /* This contains a lot of inspiration from malloc() in the GNU C Library.
@@ -116,12 +117,12 @@
 
 static void lm_lock(void)
 {
-    spinlock_acquire(lm.lock);
+    stm_spinlock_acquire(lm.lock);
 }
 
 static void lm_unlock(void)
 {
-    spinlock_release(lm.lock);
+    stm_spinlock_release(lm.lock);
 }
diff --git a/c8/stm/list.c b/c8/stm/list.c
--- a/c8/stm/list.c
+++ b/c8/stm/list.c
@@ -1,5 +1,6 @@
 #ifndef _STM_CORE_H_
 # error "must be compiled via stmgc.c"
+# include "core.h" // silence flymake
 #endif
diff --git a/c8/stm/list.h b/c8/stm/list.h
--- a/c8/stm/list.h
+++ b/c8/stm/list.h
@@ -1,5 +1,11 @@
+#ifndef _LIST_H
+#define _LIST_H
+
+
 #include <stdlib.h>
 #include <stdbool.h>
+#include <stdint.h>
+
 
 /************************************************************/
@@ -11,13 +17,13 @@
 static struct list_s *list_create(void) __attribute__((unused));
 
-static inline void list_free(struct list_s *lst)
+static inline void _list_free(struct list_s *lst)
 {
     free(lst);
 }
 
 #define LIST_CREATE(lst) ((lst) = list_create())
-#define LIST_FREE(lst) (list_free(lst), (lst) = NULL)
+#define LIST_FREE(lst) (_list_free(lst), (lst) = NULL)
 
 static struct list_s *_list_grow(struct list_s *, uintptr_t);
@@ -245,3 +251,5 @@
     TREE_FIND(tree, addr, result, return false);
     return true;
 }
+
+#endif
diff --git a/c8/stm/marker.c b/c8/stm/marker.c
--- a/c8/stm/marker.c
+++ b/c8/stm/marker.c
@@ -1,5 +1,6 @@
 #ifndef _STM_CORE_H_
 # error "must be compiled via stmgc.c"
+# include "core.h" // silence flymake
 #endif
diff --git a/c8/stm/misc.c b/c8/stm/misc.c
--- a/c8/stm/misc.c
+++ b/c8/stm/misc.c
@@ -1,5 +1,6 @@
 #ifndef _STM_CORE_H_
 # error "must be compiled via stmgc.c"
+# include "core.h" // silence flymake
 #endif
diff --git a/c8/stm/nursery.c b/c8/stm/nursery.c
--- a/c8/stm/nursery.c
+++ b/c8/stm/nursery.c

_______________________________________________
pypy-commit mailing list
pypy-commit@python.org
https://mail.python.org/mailman/listinfo/pypy-commit