Resending to rrd-developers ...

All,

I've found an issue with rrdcached on Solaris 10. Simple graphs are created fine from the cache, but the more complex ones with multiple source RRDs fail.
I'm using rrdtool version 1.4.4 on Solaris 10. I found the issue through Ganglia, but I've now reproduced it on the command line.

In short, without the cache it works:

$ /opt/CSCrrdtool/bin/rrdtool graph /tmp/foo.png \
    --start '-3600' --end N --width 1024 --height 600 \
    --title 'host Load last hour' --lower-limit 0 \
    --vertical-label 'Load/Procs' --rigid \
    DEF:'load_one'='/opt/rrd/ganglia/Management/host/load_one.rrd':'sum':AVERAGE \
    DEF:'proc_run'='/opt/rrd/ganglia/Management/host/proc_run.rrd':'sum':AVERAGE \
    DEF:'cpu_num'='/opt/rrd/ganglia/Management/host/cpu_num.rrd':'sum':AVERAGE \
    AREA:'load_one'#CCCCCC:'1-min Load' \
    LINE2:'cpu_num'#FF0000:'CPUs' \
    LINE2:'proc_run'#0000FF:'Running Processes'
1121x673

With the cache it fails:

$ /opt/CSCrrdtool/bin/rrdtool graph /tmp/foo.png \
    --daemon unix:/tmp/rrdcached.socket \
    --start '-3600' --end N --width 1024 --height 600 \
    --title 'host Load last hour' --lower-limit 0 \
    --vertical-label 'Load/Procs' --rigid \
    DEF:'load_one'='/opt/rrd/ganglia/Management/host/load_one.rrd':'sum':AVERAGE \
    DEF:'proc_run'='/opt/rrd/ganglia/Management/host/proc_run.rrd':'sum':AVERAGE \
    DEF:'cpu_num'='/opt/rrd/ganglia/Management/host/cpu_num.rrd':'sum':AVERAGE \
    AREA:'load_one'#CCCCCC:'1-min Load' \
    LINE2:'cpu_num'#FF0000:'CPUs' \
    LINE2:'proc_run'#0000FF:'Running Processes'
ERROR: rrdc_flush (/opt/rrd/ganglia/Management/host/proc_run.rrd) failed with status -1.

This works fine under Linux; see this ganglia-general thread:

http://www.mail-archive.com/[email protected]/msg05775.html

Startup command:

# /opt/CSCrrdtool/bin/rrdcached -p /opt/rrd/rrdcache/rrdcached.pid -l /tmp/rrdcached.socket -g
starting up
listening for connections
<no other output>

# ps -ef | grep rrdcached
 rrdtool  1401     1   0 15:02:22 ?           0:06 /opt/CSCrrdtool/bin/rrdcached -p /opt/rrd/rrdcache/rrdcached.pid -l /tmp/rrdcac

# ./dtruss -a -p 1401
<snip>
1401/1: 6241 2024 140 accept(0x4, 0xFFBFFA98, 0xFFBFFA94) = 6 0
1401/1: 6259 31 2 lwp_kill(0x12, 0x0, 0xFE2E3200) = -1 Err#3
1401/1: 6327 84 55 lwp_create(0xFFBFF7F8, 0xC0, 0xFFBFF7F4) = 19 0
1401/1: 6350 39 9 lwp_continue(0x13, 0x1, 0xFE2E3200) = 0 0
1401/19: 48 1885 1 setcontext(0x3, 0xFE2E3288, 0x0) = 0 0
1401/19: 63 31 5 schedctl(0xFE5374C0, 0x0, 0x0) = -12607376 0
1401/19: 143 150043 64 pollsys(0xFDD7BF70, 0x1, 0xFDD7BF00) = 1 0
1401/19: 171 47 17 read(0x6, "flush /opt/rrd/ganglia/Management/shango/load_one.rrd\n\004\020\0", 0x2000) = 54 0
1401/19: 178 26 1 gtime() = 1279543205 0
1401/5: 6354 1884 5 gtime() = 1279543205 0
1401/5: 6416 67 39 open("/opt/rrd/ganglia/Management/shango/load_one.rrd\0", 0x2, 0x1B6) = 7 0
1401/5: 6428 31 6 fstat(0x7, 0xFE07BCF0, 0x0) = 0 0
1401/5: 6482 74 49 mmap(0x0, 0x2F70, 0x3) = -17891328 0
1401/5: 6493 31 5 memcntl(0xFEEF0000, 0x2F70, 0x4) = 0 0
1401/5: 6510 31 14 memcntl(0xFEEF0000, 0x78, 0x4) = 0 0
1401/5: 6517 18 2 memcntl(0xFEEF0000, 0x78, 0x4) = 0 0
1401/5: 6580 28 10 memcntl(0xFEEF0000, 0x78, 0x4) = 0 0
1401/5: 6593 25 8 memcntl(0xFEEF0000, 0x230, 0x4) = 0 0
1401/5: 6606 24 8 memcntl(0xFEEF0000, 0x8, 0x4) = 0 0
1401/5: 6634 43 17 fcntl(0x7, 0x6, 0xFE07BD60) = 0 0
1401/5: 7389 26 5 memcntl(0xFEEF0000, 0x2F70, 0x1) = 0 0
1401/5: 7488 120 94 munmap(0xFEEF0000, 0x2F70) = 0 0
1401/19: 270 19699 38 lwp_park(0x0, 0x0, 0x5) = 0 0
1401/19: 327 57 30 write(0x6, "0 Successfully flushed /opt/rrd/ganglia/Management/shango/load_one.rrd.\n\0", 0x48) = 72 0
1401/5: 7773 15636 276 close(0x7) = 0 0
1401/5: 7809 37 11 lwp_park(0x1, 0x13, 0x0) = 0 0
1401/19: 374 898 33 pollsys(0xFDD7BF70, 0x1, 0xFDD7BF00) = 1 0
1401/19: 394 29 12 read(0x6, "0 Successfully flushed /opt/rrd/ganglia/Management/shango/load_one.rrd.\nflush /opt/rrd/ganglia/Management/shango/proc_run.rrd\na\320\0", 0x2000) = 126 0
1401/19: 400 17 0 gtime() = 1279543205 0
1401/19: 452 46 27 write(0x6, "-1 Unknown command: 0\n\0", 0x16) = 22 0
1401/5: 7855 1616 21 lwp_park(0x0, 0x0, 0x0) = 0 0
1401/5: 7868 16 0 gtime() = 1279543205 0
1401/5: 7916 47 31 open("/opt/rrd/ganglia/Management/shango/proc_run.rrd\0", 0x2, 0x1B6) = 7 0
1401/5: 7927 21 5 fstat(0x7, 0xFE07BCF0, 0x0) = 0 0
1401/5: 7967 51 36 mmap(0x0, 0x2F70, 0x3) = -17891328 0
1401/5: 7975 18 2 memcntl(0xFEEF0000, 0x2F70, 0x4) = 0 0
1401/5: 7989 26 10 memcntl(0xFEEF0000, 0x78, 0x4) = 0 0
1401/5: 7995 17 2 memcntl(0xFEEF0000, 0x78, 0x4) = 0 0
1401/5: 8049 26 9 memcntl(0xFEEF0000, 0x78, 0x4) = 0 0
1401/5: 8063 24 8 memcntl(0xFEEF0000, 0x230, 0x4) = 0 0
1401/5: 8075 23 8 memcntl(0xFEEF0000, 0x8, 0x4) = 0 0
1401/5: 8093 26 10 fcntl(0x7, 0x6, 0xFE07BD60) = 0 0
1401/5: 8767 21 4 memcntl(0xFEEF0000, 0x2F70, 0x1) = 0 0
1401/5: 8824 69 52 munmap(0xFEEF0000, 0x2F70) = 0 0
1401/5: 8930 117 99 close(0x7) = 0 0
1401/5: 8961 28 10 lwp_park(0x1, 0x13, 0x0) = 0 0
1401/19: 510 2021 24 lwp_park(0x0, 0x0, 0x5) = 0 0
1401/19: 540 25 8 write(0x6, "0 Successfully flushed /opt/rrd/ganglia/Management/shango/proc_run.rrd.\n\0", 0x48) = -1 Err#32
<snip>

This output seems a bit odd. Firstly, I don't understand why lwp_kill is being called, and having tried to read the code I'm none the wiser. Secondly, the second read() on the socket returns the daemon's own "0 Successfully flushed" reply in front of the next flush command, which is then rejected with "-1 Unknown command: 0", and the final write fails with Err#32.

I used the errinfo DTrace script from the DTraceToolkit:

# ./errinfo -n rrdcached
            EXEC          SYSCALL  ERR  DESC
       rrdcached         lwp_kill    3  No such process

When calling rrdtool from Ganglia using the cache, I get some extra messages in the rrdcached -g output:

send_response: could not write status message

Has anyone else seen this?

Thanks in advance,

Peter.
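
P.S. For anyone who wants to reproduce the flush exchange from the trace without going through rrdtool graph, here is a minimal Python sketch. It assumes the daemon's line-based text protocol as it appears in the trace (a "flush <file>" request answered by a status line such as "0 Successfully flushed ..."), and that a non-negative status is the number of further response lines to read. The function names parse_status and flush_files are made up for illustration; the socket path is the one from the startup command.

```python
import socket


def parse_status(line):
    """Split a daemon status line into (code, message).

    Assumption: the line looks like "<integer> <free text>", as in the trace.
    """
    code, _, message = line.strip().partition(" ")
    return int(code), message


def flush_files(sock_path, rrd_files):
    """Send one flush per file over a single connection.

    Returns a list of (code, message) pairs, one per file.
    """
    results = []
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(sock_path)
        with sock.makefile("r", encoding="ascii", newline="\n") as reader:
            for path in rrd_files:
                sock.sendall(f"flush {path}\n".encode("ascii"))
                line = reader.readline()
                if not line:
                    raise ConnectionError("daemon closed the connection")
                code, message = parse_status(line)
                # Assumption: a non-negative code counts the extra lines that follow.
                for _ in range(max(code, 0)):
                    reader.readline()
                results.append((code, message))
    return results


# Hypothetical usage against the socket from the startup command:
# flush_files("/tmp/rrdcached.socket",
#             ["/opt/rrd/ganglia/Management/shango/load_one.rrd",
#              "/opt/rrd/ganglia/Management/shango/proc_run.rrd"])
```

If two sequential flushes both come back with status 0 this way, that would suggest the problem lies in how the rrdtool client and daemon share the connection rather than in the flush itself.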
