A possible workaround is to tell Robinhood not to clear "older" records. To avoid flooding the Robinhood log, you may want to replace LVL_VERB with LVL_DEBUG or LVL_MAJOR:
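For readers who want to see the idea in isolation, here is a minimal sketch of the same guard outside of Robinhood. The struct and function names (reader_state, should_clear) are hypothetical and are not part of the Robinhood sources; only the comparison itself mirrors the patch below.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for the per-MDT reader state. */
struct reader_state {
    uint64_t last_commit_id; /* highest record id committed so far */
    uint64_t last_clear_id;  /* highest record id already cleared */
};

/*
 * Returns 1 if a clear request up to last_commit_id may be sent,
 * 0 if it would go backward (records up to a higher id are already cleared).
 */
static int should_clear(const struct reader_state *st)
{
    if (st->last_clear_id > st->last_commit_id) {
        printf("skip: clear up to #%" PRIu64
               " requested, already cleared up to #%" PRIu64 "\n",
               st->last_commit_id, st->last_clear_id);
        return 0;
    }
    return 1;
}
```

A caller would test should_clear() before issuing the clear request and simply return success when it yields 0, which is what the patch does with its early "return 0".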
index 6f4dc853..360120f8 100644
--- a/src/chglog_reader/chglog_reader.c
+++ b/src/chglog_reader/chglog_reader.c
@@ -210,6 +210,16 @@ static int clear_changelog_records(reader_thr_info_t *p_info)
         return 0;
     }
 
+    // Make sure record id is higher than currently cleared record ids
+    if (p_info->last_clear.rec_id > p_info->last_commit.rec_id) {
+        DisplayLog(LVL_VERB, CHGLOG_TAG,
+                   "%s: ChangeLog backward clear records up to #%"PRIu64
+                   " already cleared up to #%"PRIu64,
+                   p_info->mdtdevice, p_info->last_commit.rec_id,
+                   p_info->last_clear.rec_id);
+        return 0;
+    }
+
     reader_id = cl_reader_config.mdt_def[p_info->thr_index].reader_id;

On Thu, 2019-07-18 at 09:39 +0200, Carsten Beyer wrote:
> Hi Thomas,
>
> yes, changelog reader is registered on both MDT. The filesystem has 5
> MDT in total and the other three are connected to second RBH server
> (same problem/error).
>
> I got also a notice from our storage vendor, it's already listed in
> JIRA:
>
> Failure to clear the changelog for user 1 on MDT =>
> https://jira.whamcloud.com/browse/LU-11205
>
> Looks like that we hit this.
>
> Cheers,
> Carsten
>
>
> On 17.07.19 16:16, thomas.leibov...@cea.fr wrote:
> > > 2019/07/17 14:20:09 robinhood@mrh0[40836/3] ChangeLog | ERROR: llapi_changelog_clear("lustre01-MDT0001", "cl1", 42031384423) returned -22
> >
> > Hello,
> >
> > Did you register a changelog reader on each MDT?
> > Here it seems your filesystem has at least 2 MDTs (MDT0001 must be
> > the second one).
> >
> > Regards,
> > Thomas
> >
> > -----Original Message-----
> > From: Carsten Beyer [mailto:be...@dkrz.de]
> > Sent: Wednesday, 17 July 2019 15:56
> > To: robinhood
> > Subject: [robinhood-support] ChangeLog | ERROR: llapi_changelog_clear("lustre01-MDT0000", "cl1", 11515556002) returned -22
> >
> > Hi @all,
> >
> > I have a question if someone is using Lustre 2.11 (serverside /
> > clientside) with Robinhood v3.1.5 ?
> >
> > We have 3 Lustre systems (one testsystem, 2 production systems) and I
> > get error messages for llapi_changelog_clear when I start Robinhood.
> > It's after updating the Lustre filesystems from v2.5 to v2.11. Errors
> > occur only on the production systems but not on the testsystem. It's
> > maybe the load on the filesystems. Maybe somebody has the same issue on
> > other Lustre version(s) ?
> >
> > [root@mrh0 robinhood]# tail -f robinhood.log
> > 2019/07/17 14:20:09 robinhood@mrh0[40836/1] CheckFS | '/mnt/lustre01' matches mount point '/mnt/lustre01', type=lustre, fs=10.50.32.53@o2ib:10.50.32.54@o2ib:/lustre01
> > 2019/07/17 14:20:09 robinhood@mrh0[40836/1] ListMgr | Table does not exist: 'SELECT value FROM VARS WHERE varname='VersionFunctionSet'' (Table 'robinhood_lustre01.VARS' doesn't exist)
> > 2019/07/17 14:20:09 robinhood@mrh0[40836/1] ListMgr | No function versioning (expected: 1.6). Existing functions will be dropped and re-created.
> > 2019/07/17 14:20:09 robinhood@mrh0[40836/1] ListMgr | Table does not exist: 'SELECT value FROM VARS WHERE varname='VersionTriggerSet'' (Table 'robinhood_lustre01.VARS' doesn't exist)
> > 2019/07/17 14:20:09 robinhood@mrh0[40836/1] ListMgr | No trigger versioning (expected: 1.4). Existing triggers will be dropped and re-created.
> > 2019/07/17 14:20:09 robinhood@mrh0[40836/1] ListMgr | table VARS does not exist (or wrong version): creating it.
> > 2019/07/17 14:20:09 robinhood@mrh0[40836/1] ListMgr | table ENTRIES does not exist (or wrong version): creating it.
> > 2019/07/17 14:20:09 robinhood@mrh0[40836/1] ListMgr | table NAMES does not exist (or wrong version): creating it.
> > 2019/07/17 14:20:09 robinhood@mrh0[40836/1] ListMgr | table ANNEX_INFO does not exist (or wrong version): creating it.
> > 2019/07/17 14:20:09 robinhood@mrh0[40836/1] ListMgr | function sz_range does not exist (or wrong version): creating it.
> > 2019/07/17 14:20:09 robinhood@mrh0[40836/1] ListMgr | table ACCT_STAT does not exist (or wrong version): creating it.
> > 2019/07/17 14:20:09 robinhood@mrh0[40836/1] ListMgr | Populating accounting table from existing DB contents. This can take a while...
> > 2019/07/17 14:20:09 robinhood@mrh0[40836/1] ListMgr | table STRIPE_INFO does not exist (or wrong version): creating it.
> > 2019/07/17 14:20:09 robinhood@mrh0[40836/1] ListMgr | table STRIPE_ITEMS does not exist (or wrong version): creating it.
> > 2019/07/17 14:20:09 robinhood@mrh0[40836/1] ListMgr | table SOFT_RM does not exist (or wrong version): creating it.
> > 2019/07/17 14:20:09 robinhood@mrh0[40836/1] ListMgr | trigger ACCT_ENTRY_INSERT does not exist (or wrong version): creating it.
> > 2019/07/17 14:20:09 robinhood@mrh0[40836/1] ListMgr | trigger ACCT_ENTRY_DELETE does not exist (or wrong version): creating it.
> > 2019/07/17 14:20:09 robinhood@mrh0[40836/1] ListMgr | trigger ACCT_ENTRY_UPDATE does not exist (or wrong version): creating it.
> > 2019/07/17 14:20:09 robinhood@mrh0[40836/1] ListMgr | function one_path does not exist (or wrong version): creating it.
> > 2019/07/17 14:20:09 robinhood@mrh0[40836/1] ListMgr | function this_path does not exist (or wrong version): creating it.
> > 2019/07/17 14:20:09 robinhood@mrh0[40836/1] llapi | warning: llapi_changelog_start() called without CHANGELOG_FLAG_EXTRA_FLAGS
> > 2019/07/17 14:20:09 robinhood@mrh0[40836/1] Main | Daemon started (running modules: log_reader)
> > 2019/07/17 14:20:09 robinhood@mrh0[40836/2] ChangeLog | LU-1331 is fixed in this version of Lustre.
> > 2019/07/17 14:20:09 robinhood@mrh0[40836/3] llapi | cannot purge records for 'cl1'
> > 2019/07/17 14:20:09 robinhood@mrh0[40836/3] ChangeLog | ERROR: llapi_changelog_clear("lustre01-MDT0001", "cl1", 42031384423) returned -22
> > 2019/07/17 14:20:09 robinhood@mrh0[40836/3] EntryProc | Error -22 performing callback at stage STAGE_CHGLOG_CLR.
> >
> >
> > [root@mrh0 robinhood]# egrep '(ChangeLog \| ERROR|STAGE_CHGLOG_CLR)' robinhood.log
> > 2019/07/17 14:20:09 robinhood@mrh0[40836/3] ChangeLog | ERROR: llapi_changelog_clear("lustre01-MDT0001", "cl1", 42031384423) returned -22
> > 2019/07/17 14:20:09 robinhood@mrh0[40836/3] EntryProc | Error -22 performing callback at stage STAGE_CHGLOG_CLR.
> > 2019/07/17 14:20:11 robinhood@mrh0[40836/31] ChangeLog | ERROR: llapi_changelog_clear("lustre01-MDT0000", "cl1", 11515555990) returned -22
> > 2019/07/17 14:20:11 robinhood@mrh0[40836/31] EntryProc | Error -22 performing callback at stage STAGE_CHGLOG_CLR.
> > 2019/07/17 14:20:11 robinhood@mrh0[40836/6] ChangeLog | ERROR: llapi_changelog_clear("lustre01-MDT0000", "cl1", 11515555991) returned -22
> > 2019/07/17 14:20:11 robinhood@mrh0[40836/6] EntryProc | Error -22 performing callback at stage STAGE_CHGLOG_CLR.
> > 2019/07/17 14:20:11 robinhood@mrh0[40836/33] ChangeLog | ERROR: llapi_changelog_clear("lustre01-MDT0000", "cl1", 11515555992) returned -22
> > 2019/07/17 14:20:11 robinhood@mrh0[40836/33] EntryProc | Error -22 performing callback at stage STAGE_CHGLOG_CLR.
> > 2019/07/17 14:20:11 robinhood@mrh0[40836/8] ChangeLog | ERROR: llapi_changelog_clear("lustre01-MDT0000", "cl1", 11515555993) returned -22
> > 2019/07/17 14:20:11 robinhood@mrh0[40836/8] EntryProc | Error -22 performing callback at stage STAGE_CHGLOG_CLR.
> > 2019/07/17 14:20:11 robinhood@mrh0[40836/5] ChangeLog | ERROR: llapi_changelog_clear("lustre01-MDT0000", "cl1", 11515555994) returned -22
> > 2019/07/17 14:20:11 robinhood@mrh0[40836/5] EntryProc | Error -22 performing callback at stage STAGE_CHGLOG_CLR.
> >
> > Robinhood is on RHEL6 with Lustre 2.11 client / MariaDB / RBH v3.1.5
> >
> > [root@mrh0 robinhood]# rpm -qa | egrep -i '(lustre-client|robinhood|mariadb)' | sort
> > kmod-lustre-client-2.11.0-1_2.6.32_754.14.2.el6.x86_64
> > lustre-client-2.11.0-1_2.6.32_754.14.2.el6.x86_64
> > MariaDB-client-10.2.11-1.el6.x86_64
> > MariaDB-common-10.2.11-1.el6.x86_64
> > MariaDB-compat-10.2.11-1.el6.x86_64
> > MariaDB-devel-10.2.11-1.el6.x86_64
> > MariaDB-server-10.2.11-1.el6.x86_64
> > MariaDB-shared-10.2.11-1.el6.x86_64
> > robinhood-adm-3.1.5-1.x86_64
> > robinhood-lustre-3.1.5-1.lustre2.11.el6.x86_64
> > [root@mrh0 robinhood]#
> >
> >
> > Cheers,
> > Carsten
> >
>
> _______________________________________________
> robinhood-support mailing list
> robinhood-support@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/robinhood-support