Fixed it. I went to the Subscribers pane of the web console and spotted an old machine that still held a durable subscription on the accounts topic in "NC" mode - the client was long dead.
Thankfully I was able to delete it, and after a pause quite a few of the data files were GC'd. I am going to repeat this for the other "NC" clients and hope that further clean-up happens.

James

On 9 May 2011 14:43, James Green <[email protected]> wrote:
> Gary,
>
> Let me check I understand you correctly.
>
> If I see the GC kick in and list lots of references after lots of channels
> are considered, and then the list suddenly drops by a large amount, that
> channel is likely to hold considerable references (even though they are
> likely dead)?
>
> This is what I see:
>
> 2011-05-09 14:29:27,367 [eckpoint Worker] TRACE MessageDatabase - gc candidates after dest:1:Requests.DeliveryNotificationsRebuild, [113, 114, 117, 118, 121, 122, 123, 134, 135, 136, 138, 139, 140, 143, 144, 148, 149, 152, 153, 165, 166, 167, 169, 170, 171, 174, 175, 178, 179, 180, 183, 184, 196, 197, 200, 201, 202, 205, 206, 209, 210, 211, 214, 215, 217, 218, 219, 222, 223, 226, 227, 228, 231, 232, 245, 246, 249, 250, 254, 255, 258, 259, 263, 264, 276, 277, 280, 281, 282, 285, 286, 287, 289, 290, 291, 294, 295, 296, 308, 309, 312, 313, 314, 317, 318, 319, 322, 323, 327, 328, 340, 341, 344, 345, 346, 349, 350, 351, 354, 355, 356, 358, 359, 360, 372, 373, 374, 377, 378, 379, 381, 382, 383, 386, 387, 391, 392, 414, 419, 423, 424, 441, 442, 445, 446, 447, 450, 451, 455, 457, 469, 470, 474, 475, 476, 479, 480, 481, 485, 486, 487, 490, 491, 492, 507, 508, 512, 513, 514, 519, 520, 521, 524, 525, 526, 529, 530, 531, 545, 546, 547, 551, 552, 553, 556, 557, 558, 563, 564, 567, 568, 569, 582, 583, 584, 585, 589, 590, 591, 595, 596, 597, 600, 601, 602, 603, 606, 622, 623, 624, 628, 629, 630, 634, 636, 640, 641, 642, 645, 646, 647, 661, 662, 663, 666, 667, 668, 669, 672, 673, 674, 678, 681, 684, 685, 686, 699, 700, 701, 702, 705, 706, 707, 710, 711, 712, 713, 719, 722, 723, 724, 737, 738, 739, 740, 743, 744, 745, 749, 750, 751, 752, 755, 756, 757, 760, 761, 762, 775, 776, 777, 778, 781, 782, 783, 784, 787, 788, 789, 790, 794, 795, 796, 799, 800, 801, 812, 813, 814, 815, 819, 821, 824, 825, 826, 832, 833, 834, 837, 838, 839, 852, 853, 854, 855, 859, 860, 865, 866, 867, 870, 871, 872, 876, 891, 892, 893, 897, 898, 899, 902, 903, 904, 908, 909, 910, 911, 914, 915, 916, 930, 932, 936, 937, 938, 942, 943, 947, 948, 949, 968, 969, 973, 974, 975, 979, 980, 985, 986, 987, 990, 991, 992, 1006, 1007, 1008, 1012, 1013, 1017, 1019, 1022, 1024, 1053, 1054, 1058, 1060, 1082, 1083, 1084, 1085, 1088, 1090, 1091, 1094, 1095, 1096, 1099, 1100, 1101, 1102]
> 2011-05-09 14:29:27,367 [eckpoint Worker] TRACE MessageDatabase - gc candidates after dest:1:accounts, [553, 556, 1102]
>
> accounts is a topic, consumed by each of our production boxes. I am not
> aware of any problems in this respect. Each consumer is durable, and each
> message is processed and then ACKed.
>
> Can you suggest any reason this situation is occurring?
>
> Is there a way to list the contents of these data files in a more
> meaningful way? For example, to list references to other data files, as
> you suggest?
>
> Thanks,
>
> James
>
>
> On 6 May 2011 17:57, Gary Tully <[email protected]> wrote:
>> Reading the trace output is a little unintuitive, as it follows the code
>> logic: it starts with the entire set of data files and considers them
>> all gc candidates. It then asks each destination in turn whether it
>> still has pending references and, if so, removes those files from the
>> gc candidate set.
>>
>> The list should get smaller as destinations claim their data files.
>>
>> In the case below, it looks like after asking
>> dest:0:Outbound.Account.22312 there are still lots of data files that
>> would be ok to gc.
>>
>> The second step is to determine whether the candidate data files contain
>> acks for referenced data files. Deleting those would mean that after a
>> failure/recovery restart the acks would be gone, and the messages in the
>> referenced data files would be replayed in error.
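Gary's two-step cleanup pass can be sketched as a small model. This is illustrative only, not the actual KahaDB code; the data structures (`refs_by_dest`, `acks_in_file`) are invented for the sketch, and the real broker derives them from the journal:

```python
# Illustrative model of the KahaDB cleanup pass described above.
# Step 1: every data file starts as a GC candidate; each destination
#         removes the files it still references.
# Step 2: a candidate that holds acks for a retained file must also be
#         kept, or a recovery restart would replay those messages.

def gc_candidates(all_files, refs_by_dest, acks_in_file):
    """all_files: set of data-file ids.
    refs_by_dest: destination name -> set of file ids it still references.
    acks_in_file: file id -> set of file ids its acks point at."""
    candidates = set(all_files)
    # Step 1: destinations with pending references pull files from the set.
    for dest, refs in refs_by_dest.items():
        candidates -= refs
    # Step 2: keep any candidate whose acks point at a retained file.
    retained = all_files - candidates
    candidates = {f for f in candidates
                  if not (acks_in_file.get(f, set()) & retained)}
    return candidates
```

The point of step 2 is the one made above: a data file that only holds acks is still load-bearing as long as the files those acks point at survive.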
>> That further reduces the candidate list.
>>
>> So you need to look for the channel that pulls the most from the gc
>> candidate list.
>>
>>
>> On 6 May 2011 17:18, James Green <[email protected]> wrote:
>> > OK, to take just one channel:
>> >
>> > 2011-05-06 17:16:25,154 [eckpoint Worker] TRACE MessageDatabase - gc candidates after dest:0:Outbound.Account.22312, [113, 114, 117, 118, 121, 122, 123, 134, 135, 136, 138, 139, 140, 143, 144, 148, 149, 152, 153, 165, 166, 167, 169, 170, 171, 174, 175, 178, 179, 180, 183, 184, 196, 197, 200, 201, 202, 205, 206, 209, 210, 211, 214, 215, 217, 218, 219, 222, 223, 226, 227, 228, 231, 232, 245, 246, 249, 250, 254, 255, 258, 259, 263, 264, 276, 277, 280, 281, 282, 285, 286, 287, 289, 290, 291, 294, 295, 296, 308, 309, 312, 313, 314, 317, 318, 319, 322, 323, 327, 328, 340, 341, 344, 345, 346, 349, 350, 351, 354, 355, 356, 358, 359, 360, 372, 373, 374, 377, 378, 379, 381, 382, 383, 386, 387, 391, 392, 414, 419, 423, 424, 441, 442, 445, 446, 447, 450, 451, 455, 457, 469, 470, 474, 475, 476, 479, 480, 481, 485, 486, 487, 490, 491, 492, 507, 508, 512, 513, 514, 519, 520, 521, 524, 525, 526, 529, 530, 531, 545, 546, 547, 551, 552, 553, 556, 557, 558, 563, 564, 567, 568, 569, 582, 583, 584, 585, 589, 590, 591, 595, 596, 597, 600, 601, 602, 603, 606, 622, 623, 624, 628, 629, 630, 634, 636, 640, 641, 642, 645, 646, 647, 661, 662, 663, 666, 667, 668, 669, 672, 673, 674, 678, 681, 684, 685, 686, 699, 700, 701, 702, 705, 706, 707, 710, 711, 712, 713, 719, 722, 723, 724, 737, 738, 739, 740, 743, 744, 745, 749, 750, 751, 752, 755, 756, 757, 760, 761, 762, 775, 776, 777, 778, 781, 782, 783, 784, 787, 788, 789, 790, 794, 795, 796, 799, 800, 801, 812, 813, 814, 815, 819, 821, 824, 825, 826, 832, 833, 834, 837, 838, 839, 852, 853, 854, 855, 859, 860, 865, 866, 867, 870, 871, 872, 876, 891, 892, 893, 897, 898, 899, 902, 903, 904, 908, 909, 910, 911, 914, 915, 916, 930, 932, 936, 937, 938, 942, 943, 947, 948, 949, 968, 969, 973, 974, 975, 979, 980, 985, 986, 987, 990, 991, 992, 1006, 1007, 1008, 1012, 1013, 1017, 1019, 1022, 1024, 1053, 1054, 1058, 1060, 1082, 1083, 1084, 1085, 1088, 1090, 1091, 1094, 1095, 1096, 1099, 1100]
>> >
>> > That channel would only ever have messages sent/received on remote
>> > machines. The messages would never go over the network.
>> >
>> > Clearly that is a lot of references which should not be there. Any
>> > ideas?
>> >
>> > James
>> >
>> > On 6 May 2011 15:01, Gary Tully <[email protected]> wrote:
>> >> On the broker, enable TRACE level logging for:
>> >> org.apache.activemq.store.kahadb.MessageDatabase
>> >>
>> >> The cleanup processing will then tell you which destination has a
>> >> reference to those data files.
>> >>
>> >> On 6 May 2011 09:26, James Green <[email protected]> wrote:
>> >> > Ubuntu Linux running AMQ 5.5.0, previously on 5.4.x releases.
>> >> >
>> >> > I have just noticed (via df -h) that our "hub" machine has 12% of
>> >> > its store used. Inside the kahadb dir there are 357 .log files
>> >> > consuming 12G of space. They begin Oct 2010 - there are no obvious
>> >> > large gaps over time, but some files are clearly gone.
>> >> >
>> >> > Looking at lsof, only three are currently open. The hub receives
>> >> > messages on queues and publishes messages on topics.
>> >> >
>> >> > Can anyone advise on investigation work, please?
>> >> >
>> >> > James
>> >>
>> >> --
>> >> http://blog.garytully.com
>> >> http://fusesource.com
>>
>> --
>> http://blog.garytully.com
>> http://fusesource.com
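For anyone reproducing the diagnosis in this thread: the TRACE logging Gary suggests can be enabled in the broker's conf/log4j.properties (log4j 1.x syntax, as shipped with ActiveMQ 5.x; the logger name is taken from Gary's advice above):

```properties
# Emit the KahaDB cleanup/gc-candidate trace shown in this thread
log4j.logger.org.apache.activemq.store.kahadb.MessageDatabase=TRACE
```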

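Gary's advice to "look for the channel that pulls the most from the gc candidate list" can be mechanised by diffing successive "gc candidates after" lines. A rough sketch, assuming each TRACE record has been unwrapped onto a single line (the regex and function name are invented here, not part of ActiveMQ):

```python
import re

# Match: "gc candidates after <destination>, [n, n, ...]"
LINE = re.compile(r"gc candidates after (\S+), \[([^\]]*)\]")

def biggest_holders(log_text):
    """Return (destination, files_removed) pairs, largest drop first.
    A large drop means that destination is pinning many data files."""
    prev = None
    drops = {}
    for m in LINE.finditer(log_text):
        dest = m.group(1)
        files = {int(n) for n in m.group(2).split(",") if n.strip()}
        if prev is not None:
            drops[dest] = len(prev - files)
        prev = files
    return sorted(drops.items(), key=lambda kv: -kv[1])
```

Note the first destination in the log cannot be scored without the initial full candidate set, and a real run would first need to strip the `>`/`>>` quote prefixes and rejoin wrapped lines.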