On Sat, Mar 14, 2026 at 5:56 PM Michael Paquier <[email protected]> wrote:
>
> On Fri, Mar 13, 2026 at 10:39:52AM +0800, Xuneng Zhou wrote:
> > Thanks for fixing this and for taking the time to review and test
> > the patches.
>
> Looking at the rest, I have produced some numbers:
> pgstattuple_small (20k tuples, io_uring) base= 60839.9ms
> patch=10949.9ms 5.56x ( 82.0%) (reads=4139->260,
> io_time=49616.97->55.25ms)
> pgstattuple_small (20k tuples, worker=3) base= 60577.5ms
> patch=11470.0ms 5.28x ( 81.1%) (reads=4139->260,
> io_time=49359.79->69.60ms)
> hash_vacuum (1M tuples, io_uring)  base=199929.0ms patch=161747.0ms
> 1.24x ( 19.1%) (reads=4665->1615, io_time=47084.8->9925.77ms)
> hash_vacuum (1M tuples, worker=12) base=203417.0ms patch=161687.0ms
> 1.26x ( 20.5%) (reads=4665->1615, io_time=48356.3->9917.24ms)
>
> The hash vacuum numbers are less amazing here than yours.  Trying out
> various configurations does not change the results much (I was puzzled
> for a couple of hours that I did not see any performance impact but
> forgot the eviction of the index pages from the shared buffers, that
> influences the numbers to what I have here), but I'll take it anyway.

My guess is that the results are influenced by the write delay. Vacuum
operations can be write-intensive, so when both read and write delays
are set to 2 ~ 5 ms, a large portion of the runtime may be spent on
writes. According to Amdahl’s Law, the overall performance improvement
from optimizing a single component is limited by the fraction of time
that component actually contributes to the total execution time. In
this case, the potential rate of speedup from streaming the read path
could be masked by the time spent performing writes.

To investigate this, I added a new option, write-delay. When it is set
to zero, the benchmark simulates a system with a fast write device and
a slow read device, reducing the proportion of time spent on writes.
Admittedly, this setup is somewhat artificial—we would not normally
expect such a large discrepancy between read and write performance in
real systems.

-- worker 12, write-delay 2 ms
hash_vacuum_medium         base= 33743.2ms  patch= 27371.3ms   1.23x
( 18.9%)  (reads=4662→1612, read_time=8242.51→1725.03ms,
writes=12689→12651, write_time=25144.87→25041.75ms)

-- worker 12, write-delay 0 ms
hash_vacuum_medium         base=  8601.1ms  patch=  2234.0ms   3.85x
( 74.0%)  (reads=4662→1612, read_time=8021.65→1637.87ms,
writes=12689→12651, write_time=337.38→288.15ms)

To better understand the behavior, the latest version of the script
separates the I/O time into read time and write time. This allows us
to directly observe their respective contributions and how they change
across runs. A further improvement would be to report the speedup for
the read and write components separately, making it easier to
understand where and how much the performance gains actually occur.

> One thing that I was wondering for the pgstattuple patch is if we
> should have "scanned" put outside the private data of the callback as
> we get back to the main loop once we know that the page is not
> all-visible, so we could increment the counter in the main loop
> instead of the callback.  Now I get that you have done that as it
> feels cleaner for the "default" return path of the callback, while the
> logic remains the same, so I have kept it as-is at the end, tweaked a
> few things, and applied this one.

Thanks for the review and for applying it. My reasoning for putting
scanned inside the callback was to keep all per-block accounting in
one place — the callback is already the point where the skip-vs-read
decision is made, so it seemed natural to count reads there as well.
But I agree the main loop would also be a clean spot for it.

> I have not been able to review yet the patch for the hash VACUUM
> proposal, which would be the last one.
> --
> Michael



-- 
Best,
Xuneng
#!/usr/bin/env bash
set -euo pipefail

###############################################################################
# Streaming Read Patches Benchmark
#
# Usage: ./run_streaming_bench.sh [OPTIONS] <patch>
#
# Options:
#   --clean           Remove existing builds and start fresh
#   --baseline        Also build and test vanilla PostgreSQL for comparison
#   --test TEST       Run specific test (bloom_scan, bloom_vacuum, pgstattuple,
#                     pgstatindex, gin_vacuum, wal_logging, hash_vacuum, or "all")
#   --io-method MODE  I/O method: io_uring, worker, or sync (default: io_uring)
#   --io-workers N    Number of I/O workers for worker mode (default: 3)
#   --io-concurrency N  Max concurrent I/Os per process (default: 64)
#   --direct-io         Enable direct IO (debug_io_direct=data), bypasses OS page cache
#   --read-delay MS     Simulate read latency via dm_delay (requires pre-created device)
#   --write-delay MS    Simulate write latency via dm_delay (default: 0, requires --read-delay)
#   --profile           Enable perf profiling and flamegraph generation
#
# Environment:
#   WORKROOT       Base directory (default: $HOME/pg_bench)
#   REPS           Repetitions per test (default: 5)
#   SIZES          Table sizes to test (default: "large")
#   FLAMEGRAPH_DIR Path to FlameGraph tools (default: $HOME/FlameGraph)
#   DM_DELAY_DEV   dm_delay device name for --read-delay (default: "delayed")
###############################################################################

log() { printf '\033[1;34m==>\033[0m %s\n' "$*"; }
die() { printf '\033[1;31mERROR:\033[0m %s\n' "$*" >&2; exit 1; }

# --- CLI parsing ---
CLEAN=0
BASELINE=0
DO_PROFILE=0
DIRECT_IO=0
IO_DELAY_MS=""
WRITE_DELAY_MS="0"
TEST="all"
IO_METHOD="${IO_METHOD:-io_uring}"
IO_WORKERS="${IO_WORKERS:-3}"
IO_MAX_CONCURRENCY="${IO_MAX_CONCURRENCY:-64}"
DM_DELAY_DEV="${DM_DELAY_DEV:-delayed}"
PATCH=""

while [[ $# -gt 0 ]]; do
  case "$1" in
    --clean)          CLEAN=1 ;;
    --baseline)       BASELINE=1 ;;
    --profile)        DO_PROFILE=1 ;;
    --direct-io)      DIRECT_IO=1 ;;
    --read-delay)     IO_DELAY_MS="$2"; shift ;;
    --write-delay)    WRITE_DELAY_MS="$2"; shift ;;
    --test)           TEST="$2"; shift ;;
    --io-method)      IO_METHOD="$2"; shift ;;
    --io-workers)     IO_WORKERS="$2"; shift ;;
    --io-concurrency) IO_MAX_CONCURRENCY="$2"; shift ;;
    -h|--help)        sed -n '3,27p' "$0" | sed 's/^# \?//'; exit 0 ;;
    -*)               die "Unknown option: $1" ;;
    *)                PATCH="$1" ;;
  esac
  shift
done

# Validate io_method
case "$IO_METHOD" in
  io_uring|worker|sync) ;;
  *) die "Invalid --io-method: $IO_METHOD (must be io_uring, worker, or sync)" ;;
esac

# Validate dm_delay device if --read-delay is used
if [[ -n "$IO_DELAY_MS" ]]; then
  command -v dmsetup >/dev/null 2>&1 || die "--read-delay requires dmsetup (sudo apt install dmsetup)"
  sudo dmsetup status "$DM_DELAY_DEV" >/dev/null 2>&1 \
    || die "dm_delay device '$DM_DELAY_DEV' not found. Create it first, e.g.:\n  umount /srv && dmsetup create $DM_DELAY_DEV --table \"0 \$(blockdev --getsz /dev/DEVICE) delay /dev/DEVICE 0 $IO_DELAY_MS\" && mount /dev/mapper/$DM_DELAY_DEV /srv/"
fi

[[ -z "$PATCH" ]] && die "Usage: $0 [--clean] [--baseline] [--test TEST] <patch>"
[[ ! -f "$PATCH" ]] && die "Patch not found: $PATCH"
[[ "$PATCH" != /* ]] && PATCH="$PWD/$PATCH"

# --- Profiling validation ---
FLAMEGRAPH_DIR="${FLAMEGRAPH_DIR:-$HOME/FlameGraph}"
PERF_SUDO="${PERF_SUDO:-sudo}"
PERF_EVENT="${PERF_EVENT:-cycles}"  # cycles = user+kernel; cycles:u = user-only
if [[ $DO_PROFILE -eq 1 ]]; then
  command -v perf >/dev/null 2>&1 || die "Need perf (sudo apt install linux-tools-$(uname -r))"
  [[ -x "$FLAMEGRAPH_DIR/stackcollapse-perf.pl" ]] || die "Missing $FLAMEGRAPH_DIR/stackcollapse-perf.pl (git clone https://github.com/brendangregg/FlameGraph)"
  [[ -x "$FLAMEGRAPH_DIR/flamegraph.pl" ]] || die "Missing $FLAMEGRAPH_DIR/flamegraph.pl"
fi

# --- Configuration ---
WORKROOT="${WORKROOT:-$HOME/pg_bench}"
REPS="${REPS:-5}"
SIZES="${SIZES:-large}"

ROOT_BASE="$WORKROOT/vanilla"
PATCH_TAG=$(basename "$PATCH" .patch | tr -dc '[:alnum:]_-' | cut -c1-40)
ROOT_PATCH="$WORKROOT/$PATCH_TAG"

# --- Helpers ---
pg() { echo "$1/pg/bin/$2"; }

pick_port() {
  for p in $(seq "${1:-5432}" 60000); do
    lsof -iTCP:"$p" -sTCP:LISTEN >/dev/null 2>&1 || { echo "$p"; return; }
  done
  die "No free port found"
}

set_io_delay() {
  local ms="$1"
  [[ -z "$IO_DELAY_MS" ]] && return
  local table size dev
  table=$(sudo dmsetup table "$DM_DELAY_DEV")
  size=$(echo "$table" | awk '{print $2}')
  dev=$(echo "$table" | awk '{print $4}')
  log "Setting dm_delay on $DM_DELAY_DEV to ${ms}ms read / ${WRITE_DELAY_MS}ms write"
  sudo dmsetup suspend "$DM_DELAY_DEV"
  sudo dmsetup reload "$DM_DELAY_DEV" --table "0 $size delay $dev 0 $ms $dev 0 $WRITE_DELAY_MS"
  sudo dmsetup resume "$DM_DELAY_DEV"
}

# --- Build PostgreSQL ---
build_pg() {
  local ROOT="$1" PATCH_FILE="${2:-}"
  
  [[ $CLEAN -eq 1 ]] && rm -rf "$ROOT"
  
  if [[ ! -x "$(pg "$ROOT" initdb)" ]]; then
    log "Building PostgreSQL: $ROOT"
    mkdir -p "$ROOT"
    
    git clone --depth 1 https://github.com/postgres/postgres "$ROOT/src" 2>/dev/null
    cd "$ROOT/src"
    
    [[ -n "$PATCH_FILE" ]] && { log "Applying patch"; git apply "$PATCH_FILE"; }
    
    ./configure --prefix="$ROOT/pg" --with-liburing \
      CFLAGS='-O2 -ggdb3 -fno-omit-frame-pointer' >/dev/null 2>&1
    
    make -j"$(nproc)" install >/dev/null 2>&1
  else
    log "Reusing build: $ROOT"
    cd "$ROOT/src"
  fi
  
  # Always install contribs (idempotent, catches reused builds missing new extensions)
  make -C contrib/bloom install >/dev/null 2>&1
  make -C contrib/pgstattuple install >/dev/null 2>&1
  make -C contrib/pg_buffercache install >/dev/null 2>&1
  make -C contrib/pg_prewarm install >/dev/null 2>&1
}

# --- Cluster management ---
init_cluster() {
  local ROOT="$1" PORT="$2"
  
  rm -rf "$ROOT/data"
  "$(pg "$ROOT" initdb)" -D "$ROOT/data" --no-locale >/dev/null 2>&1
  
  cat >> "$ROOT/data/postgresql.conf" <<EOF
port = $PORT
listen_addresses = '127.0.0.1'
shared_buffers = '32GB'
effective_io_concurrency = 200
io_method = $IO_METHOD
io_workers = $IO_WORKERS
io_max_concurrency = $IO_MAX_CONCURRENCY
track_io_timing = on
track_wal_io_timing = on
synchronous_commit = on
autovacuum = off
checkpoint_timeout = 1h
max_wal_size = 10GB
max_parallel_workers_per_gather = 0
EOF
  
  [[ $DIRECT_IO -eq 1 ]] && echo "debug_io_direct = data" >> "$ROOT/data/postgresql.conf"
  
  "$(pg "$ROOT" pg_ctl)" -D "$ROOT/data" -l "$ROOT/server.log" start -w >/dev/null
  
  psql_run "$ROOT" "$PORT" -c "CREATE EXTENSION IF NOT EXISTS pg_buffercache;"
  psql_run "$ROOT" "$PORT" -c "CREATE EXTENSION IF NOT EXISTS pg_prewarm;"
}

stop_cluster() {
  local ROOT="$1"
  "$(pg "$ROOT" pg_ctl)" -D "$ROOT/data" stop -m fast 2>/dev/null || true
}

drop_caches() {
  local ROOT="$1" PORT="$2"
  shift 2
  local rels=("$@")
  
  # Evict target relations from shared buffers (no PG restart needed)
  for rel in "${rels[@]}"; do
    psql_run "$ROOT" "$PORT" -c "SELECT pg_buffercache_evict_relation('${rel}'::regclass);" >/dev/null
  done
  
  # Drop OS page cache (skip with direct IO — no page cache involved)
  if [[ $DIRECT_IO -eq 0 ]]; then
    sync
    echo 3 | sudo tee /proc/sys/vm/drop_caches >/dev/null 2>&1 || true
    sleep 2
  fi
}

psql_run() {
  local ROOT="$1" PORT="$2"
  shift 2
  "$(pg "$ROOT" psql)" -h 127.0.0.1 -p "$PORT" -d postgres -v ON_ERROR_STOP=1 -Atq "$@"
}

# --- Timing ---
run_timed() {
  local ROOT="$1" PORT="$2" SQL="$3"
  local ms
  # -X: ignore .psqlrc, -v ON_ERROR_STOP=1: fail on SQL errors
  # Parse last Time: line, handle both "ms" and "s" units
  ms=$("$(pg "$ROOT" psql)" -h 127.0.0.1 -p "$PORT" -d postgres -X -v ON_ERROR_STOP=1 -At \
    -c '\timing on' -c "$SQL" 2>&1 | \
    awk '
      /Time:/ {
        val=$2; unit=$3;
        if (unit=="ms") ms=val;
        else if (unit=="s") ms=val*1000;
      }
      END { if (ms=="") exit 1; printf "%.3f\n", ms; }
    ')
  # Validate numeric output
  [[ "$ms" =~ ^[0-9]+(\.[0-9]+)?$ ]] || { echo "ERROR: Non-numeric timing: $ms" >&2; return 1; }
  echo "$ms"
}

# --- I/O Stats ---
# Run SQL and capture timing + I/O stats from pg_stat_io
# Resets stats before query, waits for flush, then reads absolute values
# Note: pg_stat_io has PGSTAT_MIN_INTERVAL=1000ms flush delay, so we wait 1.5s
#       after the query to ensure stats are flushed to shared memory.
# Note: pg_stat_io counts I/O operations, not pages (with io_combine_limit=128kB,
#       up to 16 pages per operation). This is expected behavior.
# Returns: ms,reads,read_time,writes,write_time
run_timed_with_io() {
  local ROOT="$1" PORT="$2" SQL="$3"
  local result
  
  # Reset stats, run query, wait for flush, read absolute values
  # - Filter by client backend and io worker (excludes bgwriter/checkpointer)
  # - 1.5s delay allows stats to flush (PGSTAT_MIN_INTERVAL=1000ms)
  result=$("$(pg "$ROOT" psql)" -h 127.0.0.1 -p "$PORT" -d postgres -X -v ON_ERROR_STOP=1 <<EOSQL
SELECT pg_stat_reset_shared('io');
\\timing on
$SQL
\\timing off
SELECT pg_sleep(1.5);
\\t on
SELECT 
  COALESCE(SUM(reads),0)::bigint,
  COALESCE(SUM(read_time),0)::numeric(12,2),
  COALESCE(SUM(writes),0)::bigint,
  COALESCE(SUM(write_time),0)::numeric(12,2)
FROM pg_stat_io 
WHERE object = 'relation' AND backend_type IN ('client backend', 'io worker');
EOSQL
  2>&1)
  
  # Parse timing (last Time: line)
  local ms
  ms=$(echo "$result" | awk '
    /Time:/ {
      val=$2; unit=$3;
      if (unit=="ms") ms=val;
      else if (unit=="s") ms=val*1000;
    }
    END { if (ms=="") exit 1; printf "%.3f\n", ms; }
  ')
  
  # Parse I/O stats (last non-empty line with pipe separator: reads|read_time|writes|write_time)
  local reads read_time writes write_time
  local io_line
  io_line=$(echo "$result" | grep '|' | tail -1)
  reads=$(echo "$io_line"     | cut -d'|' -f1 | tr -d ' ')
  read_time=$(echo "$io_line"  | cut -d'|' -f2 | tr -d ' ')
  writes=$(echo "$io_line"    | cut -d'|' -f3 | tr -d ' ')
  write_time=$(echo "$io_line" | cut -d'|' -f4 | tr -d ' ')
  
  # Default to 0 if not found
  [[ "$reads"      =~ ^-?[0-9]+$             ]] || reads=0
  [[ "$read_time"  =~ ^-?[0-9]+(\.[0-9]+)?$ ]] || read_time=0
  [[ "$writes"     =~ ^-?[0-9]+$             ]] || writes=0
  [[ "$write_time" =~ ^-?[0-9]+(\.[0-9]+)?$ ]] || write_time=0
  
  echo "$ms,$reads,$read_time,$writes,$write_time"
}

# --- Statistics ---
calc_median() {
  awk -F, 'NR>1{a[++n]=$2}END{
    if(n==0){print 0; exit}
    for(i=1;i<=n;i++)for(j=i+1;j<=n;j++)if(a[i]>a[j]){t=a[i];a[i]=a[j];a[j]=t}
    print (n%2)?a[int(n/2)+1]:(a[n/2]+a[n/2+1])/2
  }' "$1"
}

calc_median_col() {
  local file="$1" col="$2"
  awk -F, -v col="$col" 'NR>1{a[++n]=$col}END{
    if(n==0){print 0; exit}
    for(i=1;i<=n;i++)for(j=i+1;j<=n;j++)if(a[i]>a[j]){t=a[i];a[i]=a[j];a[j]=t}
    print (n%2)?a[int(n/2)+1]:(a[n/2]+a[n/2+1])/2
  }' "$file"
}

calc_stats() {
  local csv="$1"
  awk -F, 'NR>1{a[++n]=$2;s+=$2}END{
    if(n==0)exit
    for(i=1;i<=n;i++)for(j=i+1;j<=n;j++)if(a[i]>a[j]){t=a[i];a[i]=a[j];a[j]=t}
    med=(n%2)?a[int(n/2)+1]:(a[n/2]+a[n/2+1])/2
    avg=s/n; for(i=1;i<=n;i++)ss+=(a[i]-avg)^2; sd=sqrt(ss/n)
    printf "median=%.1fms mean=%.1f±%.1fms n=%d", med, avg, sd, n
  }' "$csv"
}

# --- Profiling ---
# Run a SQL command under perf, attaching to the backend PID.
# Generates perf.data and flamegraph SVG.
#   profile_sql ROOT PORT LABEL SQL
profile_sql() {
  [[ $DO_PROFILE -ne 1 ]] && return
  
  local ROOT="$1" PORT="$2" LABEL="$3" SQL="$4"
  local PROF_DIR="$ROOT/profile"
  mkdir -p "$PROF_DIR"
  
  local PERF_DATA="$PROF_DIR/${LABEL}.perf.data"
  local SVG="$PROF_DIR/${LABEL}.svg"
  local psql_bin
  psql_bin="$(pg "$ROOT" psql)"
  
  # Use a unique application_name to find the backend PID
  local APP="prof_${LABEL}_$$"
  
  # Launch a psql session that will first identify itself, then run the SQL
  # The pg_sleep() gives us time to find the backend PID and attach perf
  PGAPPNAME="$APP" "$psql_bin" -h 127.0.0.1 -p "$PORT" -d postgres \
    -X -v ON_ERROR_STOP=1 <<EOSQL >/dev/null 2>&1 &
SELECT pg_sleep(2);
$SQL
EOSQL
  local QUERY_SHELL_PID=$!
  
  # Find the backend PID via pg_stat_activity
  local BACKEND_PID=""
  for ((n=0; n<100; n++)); do
    BACKEND_PID=$("$psql_bin" -h 127.0.0.1 -p "$PORT" -d postgres -Atq \
      -c "SELECT pid FROM pg_stat_activity WHERE application_name='${APP}' ORDER BY backend_start DESC LIMIT 1;" 2>/dev/null)
    [[ -n "$BACKEND_PID" && -d "/proc/$BACKEND_PID" ]] && break
    sleep 0.05
  done
  
  if [[ -z "$BACKEND_PID" || ! -d "/proc/$BACKEND_PID" ]]; then
    log "WARNING: Could not find backend PID for profiling, skipping"
    wait "$QUERY_SHELL_PID" 2>/dev/null || true
    return
  fi
  
  log "Profiling backend PID $BACKEND_PID → $PERF_DATA"
  
  # Attach perf to the backend; we explicitly kill -INT it after the query finishes
  $PERF_SUDO perf record -g --call-graph dwarf \
    -p "$BACKEND_PID" -o "$PERF_DATA" \
    --event="$PERF_EVENT" 2>/dev/null &
  local PERF_PID=$!
  sleep 0.1
  
  # Verify perf actually started (permissions, valid PID, etc.)
  if ! kill -0 "$PERF_PID" 2>/dev/null; then
    log "WARNING: perf record failed to start (permissions/config?), skipping flamegraph"
    wait "$QUERY_SHELL_PID" 2>/dev/null || true
    return
  fi
  
  # Wait for the query to finish
  wait "$QUERY_SHELL_PID" 2>/dev/null || true
  
  # Give perf a moment to flush, then stop it
  sleep 0.5
  $PERF_SUDO kill -INT "$PERF_PID" 2>/dev/null || true; wait "$PERF_PID" 2>/dev/null || true
  
  # Generate flamegraph
  generate_flamegraph "$PERF_DATA" "$SVG" "$LABEL"
}

# Convert perf.data → flamegraph SVG
#   generate_flamegraph PERF_DATA SVG_PATH TITLE
generate_flamegraph() {
  local PERF_DATA="$1" SVG="$2" TITLE="$3"
  
  [[ -f "$PERF_DATA" ]] || return
  
  local FOLDED="${PERF_DATA%.perf.data}.folded"
  if $PERF_SUDO perf script -i "$PERF_DATA" 2>/dev/null \
      | "$FLAMEGRAPH_DIR/stackcollapse-perf.pl" > "$FOLDED" 2>/dev/null \
      && [[ -s "$FOLDED" ]]; then
    "$FLAMEGRAPH_DIR/flamegraph.pl" --title "$TITLE" --countname samples \
      "$FOLDED" > "$SVG" 2>/dev/null
    log "Flamegraph: $SVG"
    rm -f "$FOLDED"
  else
    log "WARNING: Failed to generate flamegraph for $TITLE"
    rm -f "$FOLDED"
  fi
}

# --- Benchmark runner ---
# benchmark ROOT PORT NAME SQL RELATION [RELATION...]
benchmark() {
  local ROOT="$1" PORT="$2" NAME="$3" SQL="$4"
  shift 4
  local rels=("$@")
  local OUT="$ROOT/results/${NAME}.csv"
  
  mkdir -p "$ROOT/results"
  echo "run,ms,reads,read_time_ms,writes,write_time_ms" > "$OUT"
  
  for ((i=1; i<=REPS; i++)); do
    drop_caches "$ROOT" "$PORT" "${rels[@]}"
    local result ms reads read_time writes write_time
    result=$(run_timed_with_io "$ROOT" "$PORT" "$SQL")
    IFS=',' read -r ms reads read_time writes write_time <<<"$result"
    echo "$i,$ms,$reads,$read_time,$writes,$write_time" >> "$OUT"
    log "$NAME [$i/$REPS]: ${ms}ms (reads=$reads, read_time=${read_time}ms, writes=$writes, write_time=${write_time}ms)"
  done
}

# --- Data setup functions ---
setup_bloom() {
  local ROOT="$1" PORT="$2" SIZE="$3"
  local NROWS
  case "$SIZE" in
    small)  NROWS=100000 ;;
    medium) NROWS=1000000 ;;
    large)  NROWS=10000000 ;;
    *) die "Invalid size '$SIZE' (must be small, medium, or large)" ;;
  esac
  
  log "Creating Bloom test data ($SIZE: $NROWS rows)"
  psql_run "$ROOT" "$PORT" <<SQL
CREATE EXTENSION IF NOT EXISTS bloom;
DROP TABLE IF EXISTS bloom_test;
CREATE TABLE bloom_test (id INT, data TEXT, val1 INT, val2 INT);
INSERT INTO bloom_test SELECT i, 'data_'||i, i%1000, i%100 FROM generate_series(1,$NROWS) i;
CREATE INDEX bloom_idx ON bloom_test USING bloom (val1, val2);
VACUUM ANALYZE bloom_test;
CHECKPOINT;
SQL
}

setup_pgstattuple() {
  local ROOT="$1" PORT="$2" SIZE="$3"
  local NROWS
  case "$SIZE" in
    small)  NROWS=100000 ;;
    medium) NROWS=1000000 ;;
    large)  NROWS=10000000 ;;
    *) die "Invalid size '$SIZE' (must be small, medium, or large)" ;;
  esac
  
  log "Creating pgstattuple test data ($SIZE: $NROWS rows)"
  psql_run "$ROOT" "$PORT" <<SQL
CREATE EXTENSION IF NOT EXISTS pgstattuple;
DROP TABLE IF EXISTS heap_test;
CREATE TABLE heap_test (id SERIAL PRIMARY KEY, data TEXT);
INSERT INTO heap_test (data) SELECT repeat('x',100) FROM generate_series(1,$NROWS);
VACUUM ANALYZE heap_test;
CHECKPOINT;
SQL
}

setup_pgstatindex() {
  local ROOT="$1" PORT="$2" SIZE="$3"
  local NROWS
  case "$SIZE" in
    small)  NROWS=100000 ;;
    medium) NROWS=1000000 ;;
    large)  NROWS=10000000 ;;
    *) die "Invalid size '$SIZE' (must be small, medium, or large)" ;;
  esac
  
  log "Creating pgstatindex test data ($SIZE: $NROWS rows)"
  psql_run "$ROOT" "$PORT" <<SQL
CREATE EXTENSION IF NOT EXISTS pgstattuple;
DROP TABLE IF EXISTS idx_test;
CREATE TABLE idx_test (id SERIAL PRIMARY KEY, data TEXT);
INSERT INTO idx_test (data) SELECT 'data_row_' || i || '_' || repeat('x',50) FROM generate_series(1,$NROWS) i;
VACUUM ANALYZE idx_test;
CHECKPOINT;
SQL
}

setup_gin() {
  local ROOT="$1" PORT="$2" SIZE="$3"
  local NROWS
  case "$SIZE" in
    small)  NROWS=100000 ;;
    medium) NROWS=1000000 ;;
    large)  NROWS=5000000 ;;
    *) die "Invalid size '$SIZE' (must be small, medium, or large)" ;;
  esac
  
  log "Creating GIN test data ($SIZE: $NROWS rows)"
  psql_run "$ROOT" "$PORT" <<SQL
DROP TABLE IF EXISTS gin_test;
-- No PRIMARY KEY: isolate GIN index vacuum from btree overhead
CREATE TABLE gin_test (id INT, tags TEXT[]);
INSERT INTO gin_test (id, tags)
SELECT i, ARRAY(SELECT 'tag_'||(random()*100)::int FROM generate_series(1,5))
FROM generate_series(1,$NROWS) i;
CREATE INDEX gin_idx ON gin_test USING gin (tags);
VACUUM ANALYZE gin_test;
CHECKPOINT;
SQL
}

setup_hash() {
  local ROOT="$1" PORT="$2" SIZE="$3"
  local NROWS
  case "$SIZE" in
    small)  NROWS=500000 ;;
    medium) NROWS=1000000 ;;
    large)  NROWS=20000000 ;;
    *) die "Invalid size '$SIZE' (must be small, medium, or large)" ;;
  esac
  
  log "Creating Hash test data ($SIZE: $NROWS unique values)"
  psql_run "$ROOT" "$PORT" <<SQL
DROP TABLE IF EXISTS hash_test;
-- No PRIMARY KEY: isolate hash index vacuum from btree overhead
CREATE TABLE hash_test (id INT, data TEXT);
INSERT INTO hash_test SELECT i, 'x' FROM generate_series(1,$NROWS) i;
CREATE INDEX hash_idx ON hash_test USING hash (id);
VACUUM ANALYZE hash_test;
CHECKPOINT;
SQL
}

setup_wal() {
  local ROOT="$1" PORT="$2" SIZE="$3"
  local NROWS
  case "$SIZE" in
    small)  NROWS=1000000 ;;
    medium) NROWS=5000000 ;;
    large)  NROWS=20000000 ;;
    *) die "Invalid size '$SIZE' (must be small, medium, or large)" ;;
  esac
  
  log "Creating table for GIN index build / log_newpage_range test ($SIZE: $NROWS rows)"
  psql_run "$ROOT" "$PORT" <<SQL
DROP TABLE IF EXISTS wal_test;
-- Table with tsvector column for GIN indexing (full-text search)
-- GIN index builds always call log_newpage_range() at the end of
-- ginbuild() (gininsert.c) to WAL-log all index pages. 
CREATE TABLE wal_test (id INT, doc TEXT, doc_tsv TSVECTOR);
INSERT INTO wal_test
  SELECT i,
         'word' || (random()*10000)::int || ' term' || (random()*10000)::int
           || ' token' || (random()*5000)::int || ' phrase' || (random()*8000)::int,
         to_tsvector('simple',
           'word' || (random()*10000)::int || ' term' || (random()*10000)::int
           || ' token' || (random()*5000)::int || ' phrase' || (random()*8000)::int)
  FROM generate_series(1,$NROWS) i;
VACUUM ANALYZE wal_test;
CHECKPOINT;
SQL
}

# --- Test functions ---
test_bloom_scan() {
  local ROOT="$1" PORT="$2" LABEL="$3" SIZE="$4"
  setup_bloom "$ROOT" "$PORT" "$SIZE"
  benchmark "$ROOT" "$PORT" "${LABEL}_bloom_scan_${SIZE}" \
    "SET enable_seqscan=off; SELECT COUNT(*) FROM bloom_test WHERE val1=42 AND val2=7;" \
    bloom_test bloom_idx
  # Profile after benchmark reps: shared_buffers memory already faulted in,
  # so page-fault noise is gone; drop_caches ensures cold IO for the profile.
  if [[ $DO_PROFILE -eq 1 ]]; then
    drop_caches "$ROOT" "$PORT" bloom_test bloom_idx
    profile_sql "$ROOT" "$PORT" "${LABEL}_bloom_scan_${SIZE}" \
      "SET enable_seqscan=off; SELECT COUNT(*) FROM bloom_test WHERE val1=42 AND val2=7;"
  fi
}

test_bloom_vacuum() {
  local ROOT="$1" PORT="$2" LABEL="$3" SIZE="$4"
  local OUT="$ROOT/results/${LABEL}_bloom_vacuum_${SIZE}.csv"
  mkdir -p "$ROOT/results"
  echo "run,ms,reads,read_time_ms,writes,write_time_ms" > "$OUT"
  
  for ((i=1; i<=REPS; i++)); do
    # Fresh table each run for consistent state
    setup_bloom "$ROOT" "$PORT" "$SIZE"
    # Create 10% dead tuples
    psql_run "$ROOT" "$PORT" -c "DELETE FROM bloom_test WHERE id % 10 = 0;"
    
    drop_caches "$ROOT" "$PORT" bloom_test bloom_idx
    local result ms reads read_time writes write_time
    result=$(run_timed_with_io "$ROOT" "$PORT" "VACUUM bloom_test;")
    IFS=',' read -r ms reads read_time writes write_time <<<"$result"
    echo "$i,$ms,$reads,$read_time,$writes,$write_time" >> "$OUT"
    log "${LABEL}_bloom_vacuum_${SIZE} [$i/$REPS]: ${ms}ms (reads=$reads, read_time=${read_time}ms, writes=$writes, write_time=${write_time}ms)"
  done
  
  if [[ $DO_PROFILE -eq 1 ]]; then
    setup_bloom "$ROOT" "$PORT" "$SIZE"
    psql_run "$ROOT" "$PORT" -c "DELETE FROM bloom_test WHERE id % 10 = 0;"
    drop_caches "$ROOT" "$PORT" bloom_test bloom_idx
    profile_sql "$ROOT" "$PORT" "${LABEL}_bloom_vacuum_${SIZE}" "VACUUM bloom_test;"
  fi
}

test_pgstattuple() {
  local ROOT="$1" PORT="$2" LABEL="$3" SIZE="$4"
  local OUT="$ROOT/results/${LABEL}_pgstattuple_${SIZE}.csv"
  mkdir -p "$ROOT/results"
  echo "run,ms,reads,read_time_ms,writes,write_time_ms" > "$OUT"
  
  # Setup once — rolled-back DELETE keeps layout identical across all reps
  setup_pgstattuple "$ROOT" "$PORT" "$SIZE"
  # Rolled-back DELETE clears the all-visible bit in the Visibility Map so
  # pgstattuple_approx must actually read those pages (it skips all-visible pages).
  # Using ROLLBACK keeps the physical layout identical across all reps (no TOAST
  # out-of-page updates, no dirty pages to flush from shared_buffers).
  psql_run "$ROOT" "$PORT" -c "BEGIN; DELETE FROM heap_test WHERE id % 500 = 0; ROLLBACK;"
  # Warmup pass: The rolled-back DELETE left every touched tuple with an xmax
  # pointing to the aborted transaction but no hint bits set. On the first
  # pgstattuple_approx call, HeapTupleSatisfiesVacuum → HeapTupleSatisfiesVacuumHorizon
  # must resolve each such xmax: TransactionIdIsInProgress (ProcArray scan) then
  # TransactionIdDidCommit (CLOG lookup) — only then can it call SetHintBits to
  # stamp HEAP_XMAX_INVALID and MarkBufferDirtyHint. Without this warmup, rep 1
  # pays ~1100ms extra CPU for those CLOG/ProcArray lookups. Subsequent reps hit
  # the early-exit at "if (t_infomask & HEAP_XMAX_INVALID) return HEAPTUPLE_LIVE"
  # and skip the expensive path entirely.
  # After this pass, the dirtied hint-bit pages are flushed to disk via
  # drop_caches, so all reps start from the same on-disk state.
  psql_run "$ROOT" "$PORT" -c "SELECT * FROM pgstattuple_approx('heap_test');" >/dev/null

  for ((i=1; i<=REPS; i++)); do
    drop_caches "$ROOT" "$PORT" heap_test heap_test_pkey
    local result ms reads read_time writes write_time
    result=$(run_timed_with_io "$ROOT" "$PORT" "SELECT * FROM pgstattuple_approx('heap_test');")
    IFS=',' read -r ms reads read_time writes write_time <<<"$result"
    echo "$i,$ms,$reads,$read_time,$writes,$write_time" >> "$OUT"
    log "${LABEL}_pgstattuple_${SIZE} [$i/$REPS]: ${ms}ms (reads=$reads, read_time=${read_time}ms, writes=$writes, write_time=${write_time}ms)"
  done
  
  if [[ $DO_PROFILE -eq 1 ]]; then
    psql_run "$ROOT" "$PORT" -c "BEGIN; DELETE FROM heap_test WHERE id % 500 = 0; ROLLBACK;"
    drop_caches "$ROOT" "$PORT" heap_test heap_test_pkey
    profile_sql "$ROOT" "$PORT" "${LABEL}_pgstattuple_${SIZE}" \
      "SELECT * FROM pgstattuple_approx('heap_test');"
  fi
}

test_pgstatindex() {
  local ROOT="$1" PORT="$2" LABEL="$3" SIZE="$4"
  setup_pgstatindex "$ROOT" "$PORT" "$SIZE"
  benchmark "$ROOT" "$PORT" "${LABEL}_pgstatindex_${SIZE}" \
    "SELECT * FROM pgstatindex('idx_test_pkey');" \
    idx_test idx_test_pkey
  if [[ $DO_PROFILE -eq 1 ]]; then
    drop_caches "$ROOT" "$PORT" idx_test idx_test_pkey
    profile_sql "$ROOT" "$PORT" "${LABEL}_pgstatindex_${SIZE}" \
      "SELECT * FROM pgstatindex('idx_test_pkey');"
  fi
}

test_gin_vacuum() {
  local ROOT="$1" PORT="$2" LABEL="$3" SIZE="$4"
  local OUT="$ROOT/results/${LABEL}_gin_vacuum_${SIZE}.csv"
  mkdir -p "$ROOT/results"
  echo "run,ms,reads,read_time_ms,writes,write_time_ms" > "$OUT"
  
  for ((i=1; i<=REPS; i++)); do
    # Fresh table each run for consistent state
    setup_gin "$ROOT" "$PORT" "$SIZE"
    
    drop_caches "$ROOT" "$PORT" gin_test gin_idx
    local result ms reads read_time writes write_time
    # VACUUM ANALYZE forces ginvacuumcleanup() to run and scan all pages
    result=$(run_timed_with_io "$ROOT" "$PORT" "VACUUM ANALYZE gin_test;")
    IFS=',' read -r ms reads read_time writes write_time <<<"$result"
    echo "$i,$ms,$reads,$read_time,$writes,$write_time" >> "$OUT"
    log "${LABEL}_gin_vacuum_${SIZE} [$i/$REPS]: ${ms}ms (reads=$reads, read_time=${read_time}ms, writes=$writes, write_time=${write_time}ms)"
  done
  
  if [[ $DO_PROFILE -eq 1 ]]; then
    setup_gin "$ROOT" "$PORT" "$SIZE"
    drop_caches "$ROOT" "$PORT" gin_test gin_idx
    profile_sql "$ROOT" "$PORT" "${LABEL}_gin_vacuum_${SIZE}" "VACUUM ANALYZE gin_test;"
  fi
}

test_hash_vacuum() {
  local ROOT="$1" PORT="$2" LABEL="$3" SIZE="$4"
  local OUT="$ROOT/results/${LABEL}_hash_vacuum_${SIZE}.csv"
  mkdir -p "$ROOT/results"
  echo "run,ms,reads,read_time_ms,writes,write_time_ms" > "$OUT"
  
  for ((i=1; i<=REPS; i++)); do
    # Fresh table each run for consistent state
    setup_hash "$ROOT" "$PORT" "$SIZE"
    # Create 10% dead tuples
    psql_run "$ROOT" "$PORT" -c "DELETE FROM hash_test WHERE id % 10 = 0;"
    
    drop_caches "$ROOT" "$PORT" hash_test hash_idx
    local result ms reads read_time writes write_time
    result=$(run_timed_with_io "$ROOT" "$PORT" "VACUUM hash_test;")
    IFS=',' read -r ms reads read_time writes write_time <<<"$result"
    echo "$i,$ms,$reads,$read_time,$writes,$write_time" >> "$OUT"
    log "${LABEL}_hash_vacuum_${SIZE} [$i/$REPS]: ${ms}ms (reads=$reads, read_time=${read_time}ms, writes=$writes, write_time=${write_time}ms)"
  done
  
  if [[ $DO_PROFILE -eq 1 ]]; then
    setup_hash "$ROOT" "$PORT" "$SIZE"
    psql_run "$ROOT" "$PORT" -c "DELETE FROM hash_test WHERE id % 10 = 0;"
    drop_caches "$ROOT" "$PORT" hash_test hash_idx
    profile_sql "$ROOT" "$PORT" "${LABEL}_hash_vacuum_${SIZE}" "VACUUM hash_test;"
  fi
}

test_wal_logging() {
  local ROOT="$1" PORT="$2" LABEL="$3" SIZE="$4"
  local OUT="$ROOT/results/${LABEL}_wal_logging_${SIZE}.csv"
  mkdir -p "$ROOT/results"
  echo "run,ms,reads,read_time_ms,writes,write_time_ms" > "$OUT"
  
  # Build table once - only rebuild index each rep
  setup_wal "$ROOT" "$PORT" "$SIZE"
  
  local WAL_SQL="CREATE INDEX wal_test_gin_idx ON wal_test USING gin (doc_tsv);"
  
  for ((i=1; i<=REPS; i++)); do
    # Drop index from previous iteration
    psql_run "$ROOT" "$PORT" -c "DROP INDEX IF EXISTS wal_test_gin_idx;"
    
    # Drop OS caches - source table pages are COLD on disk
    drop_caches "$ROOT" "$PORT" wal_test
    
    # CREATE INDEX on GIN (tsvector_ops):
    # - GIN always uses the same build path: ginbuild() populates the
    #   index in memory, flushes to disk, then calls log_newpage_range()
    #   to read ALL index pages and write them to WAL (gininsert.c:785-790)
    local result ms reads read_time writes write_time
    result=$(run_timed_with_io "$ROOT" "$PORT" "$WAL_SQL")
    IFS=',' read -r ms reads read_time writes write_time <<<"$result"
    echo "$i,$ms,$reads,$read_time,$writes,$write_time" >> "$OUT"
    log "${LABEL}_wal_logging_${SIZE} [$i/$REPS]: ${ms}ms (reads=$reads, read_time=${read_time}ms, writes=$writes, write_time=${write_time}ms)"
  done
  
  if [[ $DO_PROFILE -eq 1 ]]; then
    psql_run "$ROOT" "$PORT" -c "DROP INDEX IF EXISTS wal_test_gin_idx;"
    drop_caches "$ROOT" "$PORT" wal_test
    profile_sql "$ROOT" "$PORT" "${LABEL}_wal_logging_${SIZE}" "$WAL_SQL"
  fi
}

# --- Run tests for a build ---
warmup_catalog() {
  local ROOT="$1" PORT="$2"
  # Explicitly prewarm catalog tables and their indexes into shared_buffers
  # so rep 1 doesn't pay disk-read cost for catalog pages.
  # pg_buffercache_evict_relation only evicts the test relation, not catalogs,
  # so these stay warm across all reps.
  psql_run "$ROOT" "$PORT" <<SQL >/dev/null
SELECT pg_prewarm('pg_class',     'buffer');
SELECT pg_prewarm('pg_attribute', 'buffer');
SELECT pg_prewarm('pg_namespace', 'buffer');
SELECT pg_prewarm('pg_proc',      'buffer');
SELECT pg_prewarm('pg_type',      'buffer');
SQL
}

run_tests() {
  local ROOT="$1" LABEL="$2"
  local PORT
  PORT=$(pick_port)
  
  log "[$LABEL] Starting cluster on port $PORT"
  init_cluster "$ROOT" "$PORT"
  warmup_catalog "$ROOT" "$PORT"
  set_io_delay "$IO_DELAY_MS"
  
  trap "stop_cluster '$ROOT'" EXIT
  
  for SIZE in $SIZES; do
    case "$TEST" in
      bloom_scan)   test_bloom_scan "$ROOT" "$PORT" "$LABEL" "$SIZE" ;;
      bloom_vacuum) test_bloom_vacuum "$ROOT" "$PORT" "$LABEL" "$SIZE" ;;
      pgstattuple)  test_pgstattuple "$ROOT" "$PORT" "$LABEL" "$SIZE" ;;
      pgstatindex)  test_pgstatindex "$ROOT" "$PORT" "$LABEL" "$SIZE" ;;
      gin_vacuum)   test_gin_vacuum "$ROOT" "$PORT" "$LABEL" "$SIZE" ;;
      hash_vacuum)  test_hash_vacuum "$ROOT" "$PORT" "$LABEL" "$SIZE" ;;
      wal_logging)  test_wal_logging "$ROOT" "$PORT" "$LABEL" "$SIZE" ;;
      all)
        test_bloom_vacuum "$ROOT" "$PORT" "$LABEL" "$SIZE"
        test_pgstattuple "$ROOT" "$PORT" "$LABEL" "$SIZE"
        test_pgstatindex "$ROOT" "$PORT" "$LABEL" "$SIZE"
        test_hash_vacuum "$ROOT" "$PORT" "$LABEL" "$SIZE"
        test_wal_logging "$ROOT" "$PORT" "$LABEL" "$SIZE"
        ;;
      *) die "Unknown test: $TEST" ;;
    esac
  done
  
  stop_cluster "$ROOT"
  trap - EXIT
}

# --- Compare results ---
compare_results() {
  local base_csv="$1" patch_csv="$2" label="$3"
  
  [[ ! -f "$base_csv" || ! -f "$patch_csv" ]] && return
  
  local base_med patch_med
  base_med=$(calc_median "$base_csv")
  patch_med=$(calc_median "$patch_csv")
  
  # Guard against empty or zero values to prevent division by zero
  [[ -z "$base_med" || "$base_med" == "0" ]] && base_med="0.001"
  [[ -z "$patch_med" || "$patch_med" == "0" ]] && patch_med="0.001"
  
  local speedup pct
  speedup=$(awk "BEGIN { printf \"%.2f\", $base_med / $patch_med }")
  pct=$(awk "BEGIN { printf \"%.1f\", ($base_med - $patch_med) / $base_med * 100 }")
  
  local io_info=""
  if head -1 "$base_csv" | grep -q "reads"; then
    # Standard test: columns are run,ms,reads,read_time_ms,writes,write_time_ms
    local base_reads patch_reads base_rtime patch_rtime base_writes patch_writes base_wtime patch_wtime
    base_reads=$(calc_median_col "$base_csv" 3)
    patch_reads=$(calc_median_col "$patch_csv" 3)
    base_rtime=$(calc_median_col "$base_csv" 4)
    patch_rtime=$(calc_median_col "$patch_csv" 4)
    base_writes=$(calc_median_col "$base_csv" 5)
    patch_writes=$(calc_median_col "$patch_csv" 5)
    base_wtime=$(calc_median_col "$base_csv" 6)
    patch_wtime=$(calc_median_col "$patch_csv" 6)
    # Default to 0 if empty
    [[ -z "$base_reads" ]]   && base_reads=0
    [[ -z "$patch_reads" ]]  && patch_reads=0
    [[ -z "$base_rtime" ]]   && base_rtime=0
    [[ -z "$patch_rtime" ]]  && patch_rtime=0
    [[ -z "$base_writes" ]]  && base_writes=0
    [[ -z "$patch_writes" ]] && patch_writes=0
    [[ -z "$base_wtime" ]]   && base_wtime=0
    [[ -z "$patch_wtime" ]]  && patch_wtime=0
    io_info="  (reads=${base_reads}→${patch_reads}, read_time=${base_rtime}→${patch_rtime}ms, writes=${base_writes}→${patch_writes}, write_time=${base_wtime}→${patch_wtime}ms)"
  fi
  
  printf "%-26s base=%8.1fms  patch=%8.1fms  %5.2fx  (%5.1f%%)%s\n" \
    "$label" "$base_med" "$patch_med" "$speedup" "$pct" "$io_info"
}

print_summary() {
  echo ""
  echo "═══════════════════════════════════════════════════════════════════════"
  echo "                     STREAMING READ BENCHMARK RESULTS                   "
  echo "═══════════════════════════════════════════════════════════════════════"
  echo ""
  
  if [[ $BASELINE -eq 1 ]]; then
    printf "%-26s %-17s %-17s %-7s %-7s %s\n" "TEST" "BASELINE" "PATCHED" "SPEEDUP" "CHANGE" "I/O TIME"
    echo "─────────────────────────────────────────────────────────────────────────────────────────────────"
    
    for SIZE in $SIZES; do
      for test_name in bloom_scan bloom_vacuum pgstattuple pgstatindex gin_vacuum hash_vacuum wal_logging; do
        [[ "$TEST" != "all" && "$TEST" != "$test_name" ]] && continue
        compare_results \
          "$ROOT_BASE/results/base_${test_name}_${SIZE}.csv" \
          "$ROOT_PATCH/results/patched_${test_name}_${SIZE}.csv" \
          "${test_name}_${SIZE}"
      done
    done
  else
    echo "Results (patched only):"
    echo ""
    for f in "$ROOT_PATCH/results/"*.csv; do
      [[ -f "$f" ]] || continue
      printf "%-40s %s\n" "$(basename "$f" .csv):" "$(calc_stats "$f")"
    done
  fi
  
  echo ""
  echo "═══════════════════════════════════════════════════════════════════════"
  echo "CSV files: $ROOT_PATCH/results/"
  [[ $BASELINE -eq 1 ]] && echo "Baseline:  $ROOT_BASE/results/"
  
  # List generated flamegraphs
  if [[ $DO_PROFILE -eq 1 ]]; then
    local svgs=()
    for dir in "$ROOT_BASE/profile" "$ROOT_PATCH/profile"; do
      [[ -d "$dir" ]] || continue
      for svg in "$dir"/*.svg; do
        [[ -f "$svg" ]] && svgs+=("$svg")
      done
    done
    if [[ ${#svgs[@]} -gt 0 ]]; then
      echo ""
      echo "Flamegraphs:"
      for svg in "${svgs[@]}"; do echo "  $svg"; done
    fi
  fi
  
  echo "═══════════════════════════════════════════════════════════════════════"
}

# --- Main ---
main() {
  log "Streaming Read Benchmark"
  log "Patch: $PATCH ($PATCH_TAG)"
  log "Tests: $TEST"
  log "Sizes: $SIZES"
  log "Reps:  $REPS"
  log "I/O:   $IO_METHOD (workers=$IO_WORKERS, concurrency=$IO_MAX_CONCURRENCY)"
  [[ $DIRECT_IO -eq 1 ]] && log "Direct IO: enabled (debug_io_direct=data)"
  [[ -n "$IO_DELAY_MS" ]] && log "I/O delay: ${IO_DELAY_MS}ms read / ${WRITE_DELAY_MS}ms write via dm_delay ($DM_DELAY_DEV)"
  [[ $DO_PROFILE -eq 1 ]] && log "Profile: enabled (flamegraphs → <root>/profile/)"
  
  # Build
  if [[ $BASELINE -eq 1 ]]; then
    build_pg "$ROOT_BASE" ""
  fi
  build_pg "$ROOT_PATCH" "$PATCH"
  
  # Run tests
  if [[ $BASELINE -eq 1 ]]; then
    log "Running baseline tests"
    run_tests "$ROOT_BASE" "base"
  fi
  
  log "Running patched tests"
  run_tests "$ROOT_PATCH" "patched"
  
  # Summary
  print_summary
}

main

Reply via email to