Hi Pierre,

Yes, I'm describing multiple symptoms here. But the message queues were the problem, despite not seeing anything in the logs.

5. The integrity errors look like this (not a failing disk in this case):
2017-03-07T07:36:04-0500 [-] Got fatal Exception on DB
        Traceback (most recent call last):
Failure: sqlalchemy.exc.IntegrityError: (IntegrityError) update or delete on table "changes" violates foreign key constraint "changes_parent_changeids_fkey" on table "changes" DETAIL: Key (changeid)=(5983) is still referenced from table "changes". 'DELETE FROM changes WHERE changes.changeid IN (%(changeid_1)s, %(changeid_2)s, %(changeid_3)s, %(changeid_4)s, %(changeid_5)s, %(changeid_6)s, %(changeid_7)s, %(changeid_8)s, %(changeid_9)s, %(changeid_10)s, %(changeid_11)s, %(changeid_12)s, %(changeid_13)s, %(changeid_14)s, %(changeid_15)s, %(changeid_16)s, %(changeid_17)s, %(changeid_18)s, %(changeid_19)s, %(changeid_20)s, %(changeid_21)s, %(changeid_22)s, %(changeid_23)s, %(changeid_24)s, %(changeid_25)s, %(changeid_26)s, %(changeid_27)s, %(changeid_28)s, %(changeid_29)s, %(changeid_30)s, %(changeid_31)s, %(changeid_32)s, %(changeid_33)s, %(changeid_34)s, %(changeid_35)s, %(changeid_36)s, %(changeid_37)s, %(changeid_38)s, %(changeid_39)s, %(changeid_40)s, %(changeid_41)s, %(changeid_42)s, %(changeid_43)s, %(changeid_44)s, %(changeid_45)s, %(changeid_46)s, %(changeid_47)s, %(changeid_48)s, %(changeid_49)s, %(changeid_50)s, %(changeid_51)s, %(changeid_52)s, %(changeid_53)s, %(changeid_54)s, %(changeid_55)s, %(changeid_56)s, %(changeid_57)s, %(changeid_58)s, %(changeid_59)s, %(changeid_60)s, %(changeid_61)s, %(changeid_62)s, %(changeid_63)s, %(changeid_64)s, %(changeid_65)s, %(changeid_66)s, %(changeid_67)s, %(changeid_68)s, %(changeid_69)s, %(changeid_70)s, %(changeid_71)s, %(changeid_72)s, %(changeid_73)s, %(changeid_74)s, %(changeid_75)s, %(changeid_76)s, %(changeid_77)s, %(changeid_78)s, %(changeid_79)s, %(changeid_80)s, %(changeid_81)s, %(changeid_82)s, %(changeid_83)s, %(changeid_84)s, %(changeid_85)s, %(changeid_86)s, %(changeid_87)s, %(changeid_88)s, %(changeid_89)s, %(changeid_90)s, %(changeid_91)s, %(changeid_92)s, %(changeid_93)s, %(changeid_94)s, %(changeid_95)s, %(changeid_96)s, %(changeid_97)s, %(changeid_98)s, %(changeid_99)s, %(changeid_100)s)' 
{'changeid_100': 5903, 'changeid_29': 5974, 'changeid_28': 5975, 'changeid_27': 5976, 'changeid_26': 5977, 'changeid_25': 5978, 'changeid_24': 5979, 'changeid_23': 5980, 'changeid_22': 5981, 'changeid_21': 5982, 'changeid_20': 5983, 'changeid_89': 5914, 'changeid_88': 5915, 'changeid_81': 5922, 'changeid_80': 5923, 'changeid_83': 5920, 'changeid_82': 5921, 'changeid_85': 5918, 'changeid_84': 5919, 'changeid_87': 5916, 'changeid_86': 5917, 'changeid_38': 5965, 'changeid_39': 5964, 'changeid_34': 5969, 'changeid_35': 5968, 'changeid_36': 5967, 'changeid_37': 5966, 'changeid_30': 5973, 'changeid_31': 5972, 'changeid_32': 5971, 'changeid_33': 5970, 'changeid_98': 5905, 'changeid_99': 5904, 'changeid_96': 5907, 'changeid_97': 5906, 'changeid_94': 5909, 'changeid_95': 5908, 'changeid_92': 5911, 'changeid_93': 5910, 'changeid_90': 5913, 'changeid_91': 5912, 'changeid_63': 5940, 'changeid_62': 5941, 'changeid_61': 5942, 'changeid_60': 5943, 'changeid_67': 5936, 'changeid_66': 5937, 'changeid_65': 5938, 'changeid_64': 5939, 'changeid_69': 5934, 'changeid_68': 5935, 'changeid_16': 5987, 'changeid_17': 5986, 'changeid_14': 5989, 'changeid_15': 5988, 'changeid_12': 5991, 'changeid_13': 5990, 'changeid_10': 5993, 'changeid_11': 5992, 'changeid_18': 5985, 'changeid_19': 5984, 'changeid_70': 5933, 'changeid_71': 5932, 'changeid_72': 5931, 'changeid_73': 5930, 'changeid_74': 5929, 'changeid_75': 5928, 'changeid_76': 5927, 'changeid_77': 5926, 'changeid_78': 5925, 'changeid_79': 5924, 'changeid_45': 5958, 'changeid_44': 5959, 'changeid_47': 5956, 'changeid_46': 5957, 'changeid_41': 5962, 'changeid_40': 5963, 'changeid_43': 5960, 'changeid_42': 5961, 'changeid_49': 5954, 'changeid_48': 5955, 'changeid_4': 5999, 'changeid_5': 5998, 'changeid_6': 5997, 'changeid_7': 5996, 'changeid_1': 6002, 'changeid_2': 6001, 'changeid_3': 6000, 'changeid_8': 5995, 'changeid_9': 5994, 'changeid_58': 5945, 'changeid_59': 5944, 'changeid_52': 5951, 'changeid_53': 5950, 'changeid_50': 5953, 
'changeid_51': 5952, 'changeid_56': 5947, 'changeid_57': 5946, 'changeid_54': 5949, 'changeid_55': 5948}

2017-03-07T07:36:04-0500 [-] while pruning changes
        Traceback (most recent call last):
          File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
            self.run()
          File "/usr/lib/python2.7/threading.py", line 504, in run
            self.__target(*self.__args, **self.__kwargs)
          File "/usr/local/lib/python2.7/dist-packages/Twisted-16.3.0-py2.7-linux-x86_64.egg/twisted/_threads/_threadworker.py", line 46, in work
            task()
          File "/usr/local/lib/python2.7/dist-packages/Twisted-16.3.0-py2.7-linux-x86_64.egg/twisted/_threads/_team.py", line 190, in doWork
            task()
        --- <exception caught here> ---
          File "/usr/local/lib/python2.7/dist-packages/Twisted-16.3.0-py2.7-linux-x86_64.egg/twisted/python/threadpool.py", line 246, in inContext
            result = inContext.theWork()
          File "/usr/local/lib/python2.7/dist-packages/Twisted-16.3.0-py2.7-linux-x86_64.egg/twisted/python/threadpool.py", line 262, in <lambda>
            inContext.theWork = lambda: context.call(ctx, func, *args, **kw)
          File "/usr/local/lib/python2.7/dist-packages/Twisted-16.3.0-py2.7-linux-x86_64.egg/twisted/python/context.py", line 118, in callWithContext
            return self.currentContext().callWithContext(ctx, func, *args, **kw)
          File "/usr/local/lib/python2.7/dist-packages/Twisted-16.3.0-py2.7-linux-x86_64.egg/twisted/python/context.py", line 81, in callWithContext
            return func(*args,**kw)
          File "/usr/local/lib/python2.7/dist-packages/buildbot-0.9.3-py2.7.egg/buildbot/db/pool.py", line 180, in __thd
            rv = callable(arg, *args, **kwargs)
          File "/usr/local/lib/python2.7/dist-packages/buildbot-0.9.3-py2.7.egg/buildbot/db/changes.py", line 338, in thd
            table.delete(table.c.changeid.in_(batch)))
          File "build/bdist.linux-x86_64/egg/sqlalchemy/engine/base.py", line 662, in execute

          File "build/bdist.linux-x86_64/egg/sqlalchemy/engine/base.py", line 761, in _execute_clauseelement

          File "build/bdist.linux-x86_64/egg/sqlalchemy/engine/base.py", line 874, in _execute_context

          File "build/bdist.linux-x86_64/egg/sqlalchemy/engine/base.py", line 1024, in _handle_dbapi_exception

          File "build/bdist.linux-x86_64/egg/sqlalchemy/util/compat.py", line 195, in raise_from_cause

          File "build/bdist.linux-x86_64/egg/sqlalchemy/engine/base.py", line 867, in _execute_context

          File "build/bdist.linux-x86_64/egg/sqlalchemy/engine/default.py", line 324, in do_execute

sqlalchemy.exc.IntegrityError: (IntegrityError) update or delete on table "changes" violates foreign key constraint "changes_parent_changeids_fkey" on table "changes" DETAIL: Key (changeid)=(5983) is still referenced from table "changes". 'DELETE FROM changes WHERE changes.changeid IN (%(changeid_1)s, %(changeid_2)s, %(changeid_3)s, %(changeid_4)s, %(changeid_5)s, %(changeid_6)s, %(changeid_7)s, %(changeid_8)s, %(changeid_9)s, %(changeid_10)s, %(changeid_11)s, %(changeid_12)s, %(changeid_13)s, %(changeid_14)s, %(changeid_15)s, %(changeid_16)s, %(changeid_17)s, %(changeid_18)s, %(changeid_19)s, %(changeid_20)s, %(changeid_21)s, %(changeid_22)s, %(changeid_23)s, %(changeid_24)s, %(changeid_25)s, %(changeid_26)s, %(changeid_27)s, %(changeid_28)s, %(changeid_29)s, %(changeid_30)s, %(changeid_31)s, %(changeid_32)s, %(changeid_33)s, %(changeid_34)s, %(changeid_35)s, %(changeid_36)s, %(changeid_37)s, %(changeid_38)s, %(changeid_39)s, %(changeid_40)s, %(changeid_41)s, %(changeid_42)s, %(changeid_43)s, %(changeid_44)s, %(changeid_45)s, %(changeid_46)s, %(changeid_47)s, %(changeid_48)s, %(changeid_49)s, %(changeid_50)s, %(changeid_51)s, %(changeid_52)s, %(changeid_53)s, %(changeid_54)s, %(changeid_55)s, %(changeid_56)s, %(changeid_57)s, %(changeid_58)s, %(changeid_59)s, %(changeid_60)s, %(changeid_61)s, %(changeid_62)s, %(changeid_63)s, %(changeid_64)s, %(changeid_65)s, %(changeid_66)s, %(changeid_67)s, %(changeid_68)s, %(changeid_69)s, %(changeid_70)s, %(changeid_71)s, %(changeid_72)s, %(changeid_73)s, %(changeid_74)s, %(changeid_75)s, %(changeid_76)s, %(changeid_77)s, %(changeid_78)s, %(changeid_79)s, %(changeid_80)s, %(changeid_81)s, %(changeid_82)s, %(changeid_83)s, %(changeid_84)s, %(changeid_85)s, %(changeid_86)s, %(changeid_87)s, %(changeid_88)s, %(changeid_89)s, %(changeid_90)s, %(changeid_91)s, %(changeid_92)s, %(changeid_93)s, %(changeid_94)s, %(changeid_95)s, %(changeid_96)s, %(changeid_97)s, %(changeid_98)s, %(changeid_99)s, %(changeid_100)s)' {'changeid_100': 
5903, 'changeid_29': 5974, 'changeid_28': 5975, 'changeid_27': 5976, 'changeid_26': 5977, 'changeid_25': 5978, 'changeid_24': 5979, 'changeid_23': 5980, 'changeid_22': 5981, 'changeid_21': 5982, 'changeid_20': 5983, 'changeid_89': 5914, 'changeid_88': 5915, 'changeid_81': 5922, 'changeid_80': 5923, 'changeid_83': 5920, 'changeid_82': 5921, 'changeid_85': 5918, 'changeid_84': 5919, 'changeid_87': 5916, 'changeid_86': 5917, 'changeid_38': 5965, 'changeid_39': 5964, 'changeid_34': 5969, 'changeid_35': 5968, 'changeid_36': 5967, 'changeid_37': 5966, 'changeid_30': 5973, 'changeid_31': 5972, 'changeid_32': 5971, 'changeid_33': 5970, 'changeid_98': 5905, 'changeid_99': 5904, 'changeid_96': 5907, 'changeid_97': 5906, 'changeid_94': 5909, 'changeid_95': 5908, 'changeid_92': 5911, 'changeid_93': 5910, 'changeid_90': 5913, 'changeid_91': 5912, 'changeid_63': 5940, 'changeid_62': 5941, 'changeid_61': 5942, 'changeid_60': 5943, 'changeid_67': 5936, 'changeid_66': 5937, 'changeid_65': 5938, 'changeid_64': 5939, 'changeid_69': 5934, 'changeid_68': 5935, 'changeid_16': 5987, 'changeid_17': 5986, 'changeid_14': 5989, 'changeid_15': 5988, 'changeid_12': 5991, 'changeid_13': 5990, 'changeid_10': 5993, 'changeid_11': 5992, 'changeid_18': 5985, 'changeid_19': 5984, 'changeid_70': 5933, 'changeid_71': 5932, 'changeid_72': 5931, 'changeid_73': 5930, 'changeid_74': 5929, 'changeid_75': 5928, 'changeid_76': 5927, 'changeid_77': 5926, 'changeid_78': 5925, 'changeid_79': 5924, 'changeid_45': 5958, 'changeid_44': 5959, 'changeid_47': 5956, 'changeid_46': 5957, 'changeid_41': 5962, 'changeid_40': 5963, 'changeid_43': 5960, 'changeid_42': 5961, 'changeid_49': 5954, 'changeid_48': 5955, 'changeid_4': 5999, 'changeid_5': 5998, 'changeid_6': 5997, 'changeid_7': 5996, 'changeid_1': 6002, 'changeid_2': 6001, 'changeid_3': 6000, 'changeid_8': 5995, 'changeid_9': 5994, 'changeid_58': 5945, 'changeid_59': 5944, 'changeid_52': 5951, 'changeid_53': 5950, 'changeid_50': 5953, 'changeid_51': 5952, 
'changeid_56': 5947, 'changeid_57': 5946, 'changeid_54': 5949, 'changeid_55': 5948}
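
To illustrate what the constraint in the log is objecting to, here is a minimal sketch. It is not Buildbot's code: it uses an in-memory SQLite table instead of the real database, the column name parent_changeids is inferred from the constraint name, and the child change 6003 is made up for the example. It shows that deleting a change that a newer change still names as its parent fails, and that clearing the parent reference first lets the delete go through.

```python
import sqlite3


def make_db():
    """Build a toy 'changes' table with a self-referencing foreign key."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE changes ("
        " changeid INTEGER PRIMARY KEY,"
        " parent_changeids INTEGER REFERENCES changes(changeid))"
    )
    # 5983 is old enough to prune; 6003 (hypothetical) still points at it.
    conn.executemany(
        "INSERT INTO changes (changeid, parent_changeids) VALUES (?, ?)",
        [(5983, None), (6003, 5983)],
    )
    conn.commit()
    return conn


def prune(conn, batch, detach_children):
    """Delete the change ids in batch, optionally detaching children first."""
    marks = ",".join("?" * len(batch))
    if detach_children:
        # Assumed workaround: drop the parent link on rows that would
        # otherwise block the delete.
        conn.execute(
            "UPDATE changes SET parent_changeids = NULL"
            " WHERE parent_changeids IN (%s)" % marks,
            batch,
        )
    conn.execute("DELETE FROM changes WHERE changeid IN (%s)" % marks, batch)
    conn.commit()


def demo():
    conn = make_db()
    try:
        prune(conn, [5983], detach_children=False)
        plain = "deleted"
    except sqlite3.IntegrityError:
        conn.rollback()
        plain = "IntegrityError"
    prune(conn, [5983], detach_children=True)
    remaining = conn.execute(
        "SELECT changeid, parent_changeids FROM changes"
    ).fetchall()
    return plain, remaining
```

Whether detaching the child rows is acceptable depends on whether anything still needs the parent link, so treat this as a way to reproduce the symptom rather than a recommended fix.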

Thanks!

Neil Gilmore
grammatech.com

On 3/7/2017 3:49 AM, Pierre Tardy wrote:
Hi Neil,
I am not sure exactly how I can help with this, as you are describing a lot of symptoms.

What comes to mind right now is a problem with the message queue. In the multimaster tests I am doing, I found that a disconnection of the message queue is not currently recovered from, which could explain why builds do not start (the masters will not check for new requests unless they receive a message).

However, when the mq fails, I can see evidence of it in the logs, but you don't mention any issue in the logs.

The database integrity errors look bad too. What kind of errors are they? We have already had some reports of those that were due to a failing disk. Could that be the case?

Regards
Pierre


On Mon, Mar 6, 2017 at 10:36 PM Neil Gilmore <[email protected]> wrote:

    Hi everyone,

    Well, things ran OK for a couple of weeks. But we had some problems
    starting last weekend. At least some failure emails don't seem to be
    getting sent out. And a problem we'd been having a bit of got a
    lot worse.

    For whatever reason, queued builds don't seem to want to start,
    sometimes for hours. Even forced builds. This doesn't seem to be a
    locking problem, though I'll be having a look at that side in a
    bit. But we'll have builds sitting for hours before they start, if
    they start. Some of our people get antsy, cancel the current queue,
    and then force a build. But sometimes those wait, too.

    And we're having trouble getting the masters to deal with new
    revisions from svn. Everything else looks OK (postcommit hooks,
    etc.). I'm just not sure what's going on.

    Reconfig hasn't helped, nor has restarting one of the masters.

    We are getting integrity errors in our database, too.

    Except for the database problem, the rest looks like network
    connection stuff, perhaps, though we haven't had any problems there
    for a while.

    Neil Gilmore
    grammatech.com
    _______________________________________________
    users mailing list
    [email protected]
    https://lists.buildbot.net/mailman/listinfo/users

