Dear Theodoros
On 07/24/2012 02:28 PM, Theodoros Theodoropoulos wrote:
[...] a websubmit_dump script that dumps the
submission tables for a specific doctype. Where could I find this
script?
Please find the websubmit_dump script attached to this email.
*NOTE* THAT THE SCRIPT IS PROVIDED "AS IS", AND REQUIRES YOU TO
UNDERSTAND WHAT YOU ARE DOING WHEN USING/EDITING THE DUMP FILES
CREATED WITH THE SCRIPT.
With the script, you can for example do:
$ ./websubmit_dump --dump=DEMOART > DEMOART_db_dump.sql
$ ./websubmit_dump --diff=DEMOART -i d,o < DEMOART_db_dump.sql | less -S
$ /opt/invenio/bin/dbexec < DEMOART_db_dump.sql
The above commands are typically executed to dump a submission from
a development environment, commit the dump file to a repository, diff
it with the existing submission on a production system, and load it
to replace the current submission. Custom WebSubmit functions files
would need to be moved/archived separately.
By default the script will consider elements and functions prefixed
with the submission doctype (for eg. "DEMOART" -> DEMOART_TITLE,
DEMOART_ABSTRACT, etc.) only, so that elements/functions that are
common to several submissions are not dumped/inserted several times.
Categories are also considered, but Javascript checks are not dumped.
Please also check '$ ./websubmit_dump --help' and in particular the
'--method=RELATIONS' option which should allow in principle to
workaround the need to prefix elements and functions with the
submission name. Note however that this type of dump is considered
experimental, and as far as I know has never been really used nor
extensively tested.
In any case ensure that this script matches your submission development
practice and deployment workflow before using it in a production
environment.
The reason for this script not to be part of Invenio is that a cleaner,
more generic database configuration dumping tool is being expected. It
is also wished that created dumps would be more resistant to changes in
the database schema. Nevertheless this script has been successfully
used at CERN for several years already, and should perform equally well
on Invenio v0.9x.x or v1.x series.
Best regards
--
Jerome Caffaro
#! /usr/bin/env python
# -*- coding: utf-8 -*-
## $Id$
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""
WebSubmit dump tool -- Dump submissions from the database.
"""
__revision__ = \
"$Id$"
import os
import sys
import getopt
import difflib
import re
import time
import tempfile
from MySQLdb.converters import conversions
from MySQLdb import escape, escape_string
from invenio.config import CFG_PREFIX
from invenio.dbquery import run_sql
CFG_DEFAULT_METHOD = "NAMES"
CFG_WEBSUBMIT_DB_SCHEMA_VERSION = 1
def dump_submission(doctype, method=None, include_cleaning=True,
ignore_duplicate_insert=False):
"""Returns a .sql dump of submission with given doctype"""
def build_table_dump(table_name, rows_with_desc, ignore_duplicate_insert):
"Build a dump-like output from the given table and rows"
table_dump = ''
for row in rows_with_desc[0]:
table_dump += 'INSERT%s INTO %s VALUES (%s);\n' % \
(ignore_duplicate_insert and ' IGNORE' or '',
table_name,
','.join([escape(column, conversions) for column in
row]))
return table_dump
if not method:
method = CFG_DEFAULT_METHOD
dump_header = "-- %s dump %s v%i\n" % (doctype,
time.strftime("%Y-%m-%d %H:%M:%S"),
CFG_WEBSUBMIT_DB_SCHEMA_VERSION)
if method == "NAMES":
dump_header += "-- Extra:NAMES (the following dump contains rows in
sbmALLFUNCDESCR, sbmFUNDESC, sbmFIELD and sbmFIELDDESC tables which are not
specific to this submission, but that include keyword %s)\n" % doctype
elif method == "RELATIONS":
dump_header += "-- Extra:RELATIONS (the following dump contains rows in
sbmALLFUNCDESCR, sbmFUNDESC, sbmFIELD and sbmFIELDDESC tables that are not
specific to doctype %s\n" % doctype
else:
dump_header += "-- Extra:None (the following dump only has rows
specific to submission %s i.e. does not contains rows from sbmALLFUNCDESCR,
sbmFUNDESC, sbmFIELD and sbmFIELDDESC tables\n" % doctype
if include_cleaning:
if method == 'NAMES':
dump_header += """
DELETE FROM sbmFUNDESC WHERE function LIKE '%(doctype)s%%';
DELETE FROM sbmFIELD WHERE subname LIKE '%%%(doctype)s';
DELETE FROM sbmFIELDDESC WHERE name LIKE '%(doctype)s%%';
DELETE FROM sbmALLFUNCDESCR WHERE function LIKE '%(doctype)s%%';
""" % {'doctype': escape_string(doctype)}
elif method == "RELATIONS":
dump_header += """
DELETE sbmALLFUNCDESCR.* FROM sbmALLFUNCDESCR, sbmFUNCTIONS WHERE
sbmALLFUNCDESCR.function=sbmFUNCTIONS.function and
sbmFUNCTIONS.doctype='%(doctype)s';
DELETE sbmFUNDESC.* FROM sbmFUNDESC, sbmFUNCTIONS WHERE
sbmFUNDESC.function=sbmFUNCTIONS.function and
sbmFUNCTIONS.doctype='%(doctype)s';
DELETE sbmFIELDDESC.* FROM sbmFIELDDESC, sbmFIELD, sbmIMPLEMENT WHERE
sbmFIELD.fidesc=sbmFIELDDESC.name AND sbmFIELD.subname=sbmIMPLEMENT.subname AND
sbmIMPLEMENT.docname='%(doctype)s';
DELETE sbmFIELD.* FROM sbmFIELD, sbmIMPLEMENT WHERE
sbmFIELD.subname=sbmIMPLEMENT.subname AND sbmIMPLEMENT.docname='%(doctype)s';
""" % {'doctype': escape_string(doctype)}
dump_header += """DELETE FROM sbmDOCTYPE WHERE sdocname='%(doctype)s';
DELETE FROM sbmCATEGORIES WHERE doctype ='%(doctype)s';
DELETE FROM sbmFUNCTIONS WHERE doctype='%(doctype)s';
DELETE FROM sbmIMPLEMENT WHERE docname='%(doctype)s';
DELETE FROM sbmPARAMETERS WHERE doctype='%(doctype)s';
""" % {'doctype': escape_string(doctype)}
dump_output = ''
res = run_sql('SELECT * FROM sbmDOCTYPE WHERE sdocname=%s', (doctype,),
with_desc=1)
dump_output += build_table_dump('sbmDOCTYPE', res, ignore_duplicate_insert)
res = run_sql('SELECT * FROM sbmCATEGORIES WHERE doctype=%s', (doctype,),
with_desc=1)
dump_output += build_table_dump('sbmCATEGORIES', res,
ignore_duplicate_insert)
# res = run_sql("SELECT * FROM sbmFIELD WHERE subname like '%%%s'" %
(escape_string(doctype),), with_desc=1)
# dump_output += build_table_dump('sbmFIELD', res)
# res = run_sql("SELECT * FROM sbmFIELDDESC WHERE name like '%s%%'" %
(escape_string(doctype),), with_desc=1)
# dump_output += build_table_dump('sbmFIELDDESC', res)
res = run_sql('SELECT * FROM sbmFUNCTIONS WHERE doctype=%s', (doctype,),
with_desc=1)
dump_output += build_table_dump('sbmFUNCTIONS', res,
ignore_duplicate_insert)
res = run_sql('SELECT * FROM sbmIMPLEMENT WHERE docname=%s', (doctype,),
with_desc=1)
dump_output += build_table_dump('sbmIMPLEMENT', res,
ignore_duplicate_insert)
res = run_sql('SELECT * FROM sbmPARAMETERS WHERE doctype=%s', (doctype,),
with_desc=1)
dump_output += build_table_dump('sbmPARAMETERS', res,
ignore_duplicate_insert)
if method == "NAMES":
res = run_sql("SELECT * FROM sbmALLFUNCDESCR WHERE function LIKE
'%s%%'" % (escape_string(doctype),), with_desc=1)
dump_output += build_table_dump('sbmALLFUNCDESCR', res,
ignore_duplicate_insert)
res = run_sql("SELECT * FROM sbmFUNDESC WHERE function LIKE '%s%%'" %
(escape_string(doctype),), with_desc=1)
dump_output += build_table_dump('sbmFUNDESC', res,
ignore_duplicate_insert)
res = run_sql("SELECT * FROM sbmFIELD WHERE subname LIKE '%%%s'" %
(escape_string(doctype),), with_desc=1)
dump_output += build_table_dump('sbmFIELD', res,
ignore_duplicate_insert)
res = run_sql("SELECT * FROM sbmFIELDDESC WHERE name LIKE '%s%%'" %
(escape_string(doctype),), with_desc=1)
dump_output += build_table_dump('sbmFIELDDESC', res,
ignore_duplicate_insert)
elif method == "RELATIONS":
res = run_sql("SELECT DISTINCT sbmALLFUNCDESCR.* FROM sbmALLFUNCDESCR,
sbmFUNCTIONS WHERE sbmALLFUNCDESCR.function=sbmFUNCTIONS.function and
sbmFUNCTIONS.doctype=%s", \
(doctype,), with_desc=1)
dump_output += build_table_dump('sbmALLFUNCDESCR', res,
ignore_duplicate_insert)
res = run_sql("SELECT DISTINCT sbmFUNDESC.* FROM sbmFUNDESC,
sbmFUNCTIONS WHERE sbmFUNDESC.function=sbmFUNCTIONS.function and
sbmFUNCTIONS.doctype=%s", \
(doctype,), with_desc=1)
dump_output += build_table_dump('sbmFUNDESC', res,
ignore_duplicate_insert)
res = run_sql("SELECT DISTINCT sbmFIELD.* FROM sbmFIELD, sbmIMPLEMENT
WHERE sbmFIELD.subname=sbmIMPLEMENT.subname AND sbmIMPLEMENT.docname=%s", \
(doctype,), with_desc=1)
dump_output += build_table_dump('sbmFIELD', res,
ignore_duplicate_insert)
# check:
res = run_sql("SELECT DISTINCT sbmFIELDDESC.* FROM sbmFIELDDESC,
sbmFIELD, sbmIMPLEMENT WHERE sbmFIELD.fidesc=sbmFIELDDESC.name AND
sbmFIELD.subname=sbmIMPLEMENT.subname AND sbmIMPLEMENT.docname=%s", \
(doctype,), with_desc=1)
#res = run_sql("SELECT DISTINCT sbmFIELDDESC.* FROM sbmFIELDDESC,
sbmFIELD, sbmIMPLEMENT WHERE sbmFIELD.fidesc=sbmFIELDDESC.name AND
sbmFIELDDESC.name=sbmIMPLEMENT.subname AND sbmIMPLEMENT.docname=%s", \
# (doctype,), with_desc=1)
dump_output += build_table_dump('sbmFIELDDESC', res,
ignore_duplicate_insert)
# Sort
dump_output_lines = dump_output.splitlines()
dump_output_lines.sort()
return dump_header + '\n'.join(dump_output_lines)
def remove_submission(doctype, method=None):
"Remove submission from database"
# NOT TESTED
if not method:
# Play safe...
method = CFG_DEFAULT_METHOD
run_sql("DELETE FROM sbmDOCTYPE WHERE sdocname=%s", (doctype,))
run_sql("DELETE FROM sbmCATEGORIES WHERE doctype=%s", (doctype,))
run_sql("DELETE FROM sbmFUNCTIONS WHERE doctype=%s", (doctype,))
run_sql("DELETE FROM sbmIMPLEMENT WHERE docname=%s", (doctype,))
run_sql("DELETE FROM sbmPARAMETERS WHERE doctype=%s", (doctype,))
if method == "NAMES":
run_sql("DELETE FROM sbmALLFUNCDESCR WHERE function LIKE '%s%%'" %
(doctype,))
run_sql("DELETE FROM sbmFIELD WHERE subname LIKE '%%%s'" % (doctype,))
run_sql("DELETE FROM sbmFIELDDESC WHERE name LIKE '%s%%'" % (doctype,))
run_sql("DELETE FROM sbmFUNDESC WHERE function LIKE '%s%%'" %
(doctype,))
elif method == "RELATIONS":
run_sql("DELETE FROM sbmALLFUNCDESCR, sbmFUNCTIONS WHERE
sbmALLFUNCDESCR.function=sbmFUNCTIONS.function and sbmFUNCTIONS.doctype=%s",
(doctype,))
run_sql("DELETE FROM sbmFUNDESC, sbmFUNCTIONS WHERE
sbmFUNDESC.function=sbmFUNCTIONS.function and sbmFUNCTIONS.doctype=%s",
(doctype,))
run_sql("DELETE sbmFIELD.* FROM sbmFIELD, sbmIMPLEMENT WHERE
sbmFIELD.subname=sbmIMPLEMENT.subname AND sbmIMPLEMENT.docname=%s", (doctype,))
run_sql("DELETE sbmFIELDDESC.* FROM sbmFIELDDESC, sbmFIELD,
sbmIMPLEMENT WHERE sbmFIELD.fidesc=sbmFIELDDESC.name AND
sbmFIELD.subname=sbmIMPLEMENT.subname AND sbmIMPLEMENT.docname=%s", (doctype,))
re_method_pattern = re.compile("-- Extra:(?P<method>\S*)\s")
def load_submission(doctype, dump, method=None):
"Insert submission into database. Return tuple(error code, msg)"
# NOT TESTED
messages = []
def guess_dump_method(dump):
"""Guess which method was used to dump this file (i.e. if it contains
all the submission rows or not)"""
match_obj = re_method_pattern.search(dump)
if match_obj:
return match_obj.group('method')
else:
return None
if not method:
method = guess_dump_method(dump)
if method is None:
method = CFG_DEFAULT_METHOD
messages.append("WARNING: method could not be guessed. Using method
%s" % method)
else:
messages.append("Used method %s to load data" % method)
(dump_fd, dump_path) = tempfile.mkstemp(prefix=doctype)
dump_fd.write(dump)
dump_fd.close()
# We need to remove the submission. But let's create a backup first.
submission_backup = dump_submission(doctype, method)
submission_backup_path = "%s_db_dump%s.sql" % (doctype,
time.strftime("%Y%m%d_%H%M%S"))
fd = file(os.path.join("tmp", submission_backup_path), "w")
fd.write(submission_backup)
fd.close()
remove_submission(doctype, method)
# Do the deed
cmd = "%s/bin/dbexec < %s" % (CFG_PREFIX, os.path.abspath(dump_path))
if os.system(cmd):
messages.append("ERROR: failed execution of" + cmd)
return (1, messages)
messages.append("Submission loaded. Previous submission saved to %s" %
submission_backup_path)
return (0, messages)
def diff_submission(submission1_dump, submission2_dump, verbose=2,
ignore_dates=False, ignore_positions=False,
ignore_pages=False):
"Output diff between submissions"
def clean_line(line, ignore_dates, ignore_positions, ignore_pages):
"Clean one line of the submission"
updated_line = line
if ignore_dates:
if line.startswith('INSERT INTO sbmFIELD VALUES'):
args = updated_line.split(",")
args[-3] = ''
args[-4] = ''
updated_line = ','.join(args)
elif line.startswith('INSERT INTO sbmFIELDDESC VALUES'):
args = updated_line.split(",")
args[-4] = ''
args[-5] = ''
updated_line = ','.join(args)
elif line.startswith('INSERT INTO sbmIMPLEMENT VALUES '):
args = updated_line.split(",")
args[-6] = ''
args[-7] = ''
updated_line = ','.join(args)
if ignore_positions:
if line.startswith('INSERT INTO sbmFIELD VALUES'):
args = updated_line.split(",")
args[2] = ''
updated_line = ','.join(args)
if ignore_pages:
if line.startswith('INSERT INTO sbmFIELD VALUES'):
args = updated_line.split(",")
args[1] = ''
updated_line = ','.join(args)
if line.startswith('INSERT INTO sbmIMPLEMENT VALUES '):
args = updated_line.split(",")
args[4] = ''
updated_line = ','.join(args)
return updated_line
file1 = [line.strip() for line in submission1_dump.splitlines() if line]
file2 = [line.strip() for line in submission2_dump.splitlines() if line]
file1 = [clean_line(line, ignore_dates, ignore_positions, ignore_pages) for
line in file1]
file2 = [clean_line(line, ignore_dates, ignore_positions, ignore_pages) for
line in file2]
file1.sort()
file2.sort()
d = difflib.Differ()
result = d.compare(file2, file1)
result = [line for line in result if not line.startswith(' ')]
if verbose > 1:
result = [line.rstrip().replace('? ', ' ', 1) for line in result]
else:
result = [line for line in result if not line.startswith('? ')]
print '\n'.join(result)
def usage(exitcode=1, msg=""):
"Print usage"
print """./websubmit_dump [options]
Options:
-v, --verbose Verbose level (0=min, 2=default, 3=max).
-h, --help Prints this help
-d, --dump=DOCTYPE Dump given DOCTYPE from database
-c, --clean={y|n} Create dump that includes lines to remove
submission from database before
insertion (`y', default) or not (`n')
-n, --no-fail-insert Create dump that does not fail when inserting
duplicate rows
-f, --diff=DOCTYPE Diff given DOCTYPE from database with standard
input
-i, --ignore={d|o|p} Ignore some differences (d=date, o=order,
p=page). Use with --diff
-m, --method=METHOD Type of dumps: NAMES (default) or RELATIONS:
- NAMES: includes functions and elements
(including
definitions) with a name starting with
doctype,
even if not used by the submission.
Might then miss
functions and elements (mostly
``generic'' ones) and
add some unwanted elements.
- RELATIONS: include all functions and elements
used
by the submission. Might leave aside
elements that are defined, but not
used.
Dump submission:
Eg: ./websubmit_dump --dump=ATN > ATN_db_dump.sql
Dump submission including all used functions and elements definitions:
Eg: ./websubmit_dump --dump=ATN -m relations > ATN_db_dump.sql
Diff submission with given dump:
Eg: ./websubmit_dump --diff=ATN < ATN_db_dump.sql
Diff between CVS version and submission in database:
Eg: cvs update -p ATN_db_dump.sql | ./websubmit_dump --diff=ATN | less -S
Diff between CVS version and submission in database, ignoring dates
and ordering of submission fields on the page:
Eg: cvs update -p ATN_db_dump.sql | ./websubmit_dump -i d,o --diff=ATN | less -S
"""
print msg
sys.exit(exitcode)
if __name__ == '__main__':
try:
opts, args = getopt.getopt(sys.argv[1:], "hv:i:d:l:f:r:m:c:n",
["help",
"verbose=",
"ignore=",
"dump=",
"load=",
"diff=",
"remove=",
"method=",
"clean=",
"no-fail-insert"])
except getopt.GetoptError, err:
print err
usage(1)
_ignore_date = False
_ignore_position = False
_ignore_page = False
_doctype = None
_verbose = 2
_action = None
_method = None
_clean = True
_no_fail_insert = False
try:
for opt in opts:
if opt[0] in ["-h", "--help"]:
usage()
elif opt[0] in ["-v", "--verbose"]:
_verbose = opt[1]
elif opt[0] in ["-m", "--method"]:
_method = opt[1].upper()
if not _method in ["NAMES", "RELATIONS"]:
usage("Parameter --method must be 'NAMES' or 'RELATIONS'")
elif opt[0] in ["-c", "--clean"]:
_clean = opt[1].lower()
if not _clean in ["y", "n"]:
usage("Parameter --clean must be 'y' or 'n'")
_clean = _clean == 'y' and True or False
elif opt[0] in ["-n", "--no-fail-insert"]:
_no_fail_insert = True
elif opt[0] in ["-i", "--ignore"]:
ignore = opt[1].split(',')
if 'd' in ignore:
_ignore_date = True
if 'p' in ignore:
_ignore_page = True
if 'o' in ignore:
_ignore_position = True
elif opt[0] in ["-d", "--dump"]:
if _action:
usage("Choose only one action among --dump, --load, --diff
and --remove")
_action = 'dump'
_doctype = opt[1]
elif opt[0] in ["-l", "--load"]:
if _action:
usage("Choose only one action among --dump, --load, --diff
and --remove")
_action = 'load'
_doctype = opt[1]
elif opt[0] in ["-f", "--diff"]:
if _action:
usage("Choose only one action among --dump, --load, --diff
and --remove")
_action = 'diff'
_doctype = opt[1]
elif opt[0] in ["-r", "--remove"]:
if _action:
usage("Choose only one action among --dump, --load, --diff
and --remove")
action = 'remove'
_doctype = opt[1]
except StandardError, _exception:
print _exception
usage(1)
if not _action:
usage(1, 'You must specify an action among --dump, --load, --diff and
--remove')
if not _doctype:
usage(1, 'You must specify a doctype')
if _action == 'dump':
print dump_submission(doctype=_doctype,
method=_method,
include_cleaning=_clean,
ignore_duplicate_insert=_no_fail_insert)
elif _action == 'load':
#input_stream = sys.stdin.read()
#(code, messages) = load_submission(doctype=_doctype,
dump=input_stream, method=_method)
#print '\n'.join(messages)
#sys.exit(code)
pass
elif _action == 'diff':
if not sys.stdin.isatty():
input_stream = sys.stdin.read()
dump1 = dump_submission(doctype=_doctype,
method=_method,
include_cleaning=_clean,
ignore_duplicate_insert=_no_fail_insert)
diff_submission(dump1, input_stream, _verbose, _ignore_date,
_ignore_position, _ignore_page)
elif _action == 'remove':
#remove_submission(doctype=_doctype, method=_method)
pass