Hello Max,
I looked into bibexport recently and had not even considered your
problem (I had assumed that I would get restricted collections as long
as they matched the provided filter). I have now gone through the code
again and run some tests, and it does seem to behave just like the web
interface: it auto-selects "Any public collection" when you perform a
search, so no matter the filter, only records in public collections are
found. There should be a better way to do it, but I thought it would not
take much to at least work around your problem (and possibly mine in the
near future), so I coded a marcxml method that allows the standard
filter usage plus a new type of constraint that names a collection. The
config file would look like this:
[export_job]
export_method = marcxml_extra
[export_criterias]
thesis = collection_name:Theses
report = 980__a:REPORT
Note that the "collection_name" filter uses the actual collection name
(the one you see in the WebSearch admin) instead of the 980__a value,
and that the two kinds of constraint (collections and filters) cannot be
combined in a single criterion. Also note that it uses a marcxml_extra
method, so you would have to place the file I attach inside the Invenio
lib folder, most probably:
/opt/invenio/lib/python/invenio/
and change its owner to the user running Invenio, most probably www-data:
sudo chown www-data:www-data /opt/invenio/lib/python/invenio/bibexport_method_marcxml_extra.py
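In case it helps to see how each criterion is interpreted, here is a
rough standalone sketch. It does not import Invenio and is written for
Python 3 so it can be run anywhere (the attached module itself is
Python 2). The names search_kwargs, load_criteria and SAMPLE_CFG are
made up for this illustration; only the regular expression and the "c"
and "p" arguments come from the attached code.

```python
import re
from configparser import ConfigParser

# Same pattern as in the attached module: "collection_name:<name>".
COLLECTION_RE = re.compile(
    r"\s*collection_name\s*:\s*(?P<coll_name>.+?)\s*$", re.IGNORECASE)

# Hypothetical in-memory copy of the config file shown above.
SAMPLE_CFG = """
[export_job]
export_method = marcxml_extra
[export_criterias]
thesis = collection_name:Theses
report = 980__a:REPORT
"""


def search_kwargs(export_pattern):
    """Map one criterion to keyword arguments for perform_request_search."""
    match = COLLECTION_RE.match(export_pattern)
    if match:
        # Collection constraint: search inside the named collection.
        return {"c": [match.group("coll_name")]}
    # Anything else is passed through as a regular search pattern.
    return {"p": export_pattern}


def load_criteria(cfg_text):
    """Read the [export_criterias] section into a name -> pattern dict."""
    parser = ConfigParser()
    parser.read_string(cfg_text)
    return dict(parser.items("export_criterias"))


criteria = load_criteria(SAMPLE_CFG)
for name in sorted(criteria):
    print(name, search_kwargs(criteria[name]))
```

A criterion is therefore either a collection or a filter, never both,
which is why the two cannot be combined.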
For the record, differences between methods' code:
diff lib/python/invenio/bibexport_method_marcxml_extra.py lib/python/invenio/bibexport_method_marcxml.py
43d42
< import re
133,135c132,133
<         return perform_request_search(dt="m", d1y = one_month_ago.year, d1m = one_month_ago.month,
<                                       d1d = one_month_ago.day,
<                                       **self._get_perform_search_dict(export_pattern))
---
>
>         return perform_request_search(dt="m", p=export_pattern, d1y = one_month_ago.year, d1m = one_month_ago.month, d1d = one_month_ago.day)
139,154c137,138
<         return perform_request_search(**self._get_perform_search_dict(export_pattern))
<
<     def _get_perform_search_dict(self, export_pattern):
<         """
<         Get dictionary paramameter to call perform_request_search
<         """
<         result = {}
<         group = re.match(r"\s*collection_name\s*:\s*(?P<coll_name>.+?)\s*$",
<                 export_pattern, re.IGNORECASE)
<         if group:
<             result["c"]=list(group.groups("coll_name"))
<             print result["c"]
<         else:
<             result["p"]=export_pattern
<         return result
<
---
>         return perform_request_search(p=export_pattern)
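The "last month" export uses the same dictionary, merged with the date
window. A small sketch of that merge, again standalone and in Python 3:
the function name last_month_kwargs and its optional today argument are
hypothetical and only there to make the example reproducible, while the
dt, d1y, d1m and d1d arguments are the ones used in the code above.

```python
import datetime


def last_month_kwargs(criterion_kwargs, today=None):
    """Combine the 31-day modification window with a criterion's kwargs."""
    if today is None:
        today = datetime.date.today()
    one_month_ago = today - datetime.timedelta(days=31)
    kwargs = {
        "dt": "m",  # filter on modification date
        "d1y": one_month_ago.year,
        "d1m": one_month_ago.month,
        "d1d": one_month_ago.day,
    }
    # Adds either "c" (collection) or "p" (pattern), depending on the criterion.
    kwargs.update(criterion_kwargs)
    return kwargs


print(last_month_kwargs({"c": ["Theses"]}, today=datetime.date(2014, 6, 20)))
```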
I hope this works for you.
Regards
Mauricio.
On 06/19/2014 05:23 PM, Max Cohen wrote:
Hello all.
Does anyone know if it is possible to do a bibexport of restricted
records without unrestricting them, running bibexport, and then
re-restricting the records?
I ran bibexport with my superadmin privileges (because of course I
have access to those records) but it didn't seem to work...
So ./bibexport marcxml.cfg --u "myusername"
My marcxml.cfg
[export_job]
export_method = marcxml
[export_criterias]
grey = 980__a:GREY
article = 980__a:ARTICLE (this is the restricted section)
Which results in all_article.xml and nothing inside that file except:
<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
</collection>
Max Cohen, Research Information Support
Union of Concerned Scientists | 2 Brattle Square | Cambridge, MA
02138-3780 | 617-301-8048
# -*- coding: utf-8 -*-
##
## This file is part of Invenio.
## Copyright (C) 2009, 2010, 2011 CERN.
##
## Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""
BibExport plugin implementing MARCXML exporting method.

The main function is run_export_method(jobname) defined below.
This is what BibExport daemon calls for all the export jobs that use
this exporting method.

The MARCXML exporting method exports as MARCXML all the records
matching a particular search query, zips them and moves them to the
requested folder. The output of this exporting method is similar to
what one would get by listing the records in MARCXML from the web
search interface. The exporter also exports all the records modified
in the last month.

* all exportable records:
    /export/marcxml/all_"export_name".xml.gz - where "export_name" is the name specified in the config

* records modified in the last month:
    /export/marcxml/lastmonth_"export_name".xml.gz - where "export_name" is the name specified in the config
"""
import re
from invenio.config import CFG_WEBDIR, CFG_ETCDIR
from invenio.bibtask import write_message
from invenio.search_engine import perform_request_search, print_record
from ConfigParser import ConfigParser
import os
import gzip
import datetime

def run_export_method(jobname):
    """Main function, reading params and running the task."""
    # read jobname's cfg file to detect export criterias
    jobconf = ConfigParser()
    jobconffile = CFG_ETCDIR + os.sep + 'bibexport' + os.sep + jobname + '.cfg'
    if not os.path.exists(jobconffile):
        write_message("ERROR: cannot find config file %s." % jobconffile)
        return None
    jobconf.read(jobconffile)
    export_criterias = dict(jobconf.items('export_criterias'))

    write_message("bibexport_marcxml: job %s started." % jobname)

    try:
        output_directory = CFG_WEBDIR + os.sep + "export" + os.sep + "marcxml"
        exporter = MARCXMLExporter(output_directory, export_criterias)
        exporter.export()
    except MARCXMLExportException, ex:
        write_message("%s Exception: %s" %(ex.get_error_message(), ex.get_inner_exception()))

    write_message("bibexport_marcxml: job %s finished." % jobname)

class MARCXMLExporter:
    """Export data to MARCXML"""

    _output_directory = ""
    _export_criterias = {}

    def __init__(self, output_directory, export_criterias):
        """Constructor of MARCXMLExporter

        @param output_directory: directory where files will be placed
        @param export_criterias: dictionary of names and associated search patterns
        """
        self.set_output_directory(output_directory)
        self._export_criterias = export_criterias

    def export(self):
        """Export all records and records modified last month"""
        for export_name, export_pattern in self._export_criterias.iteritems():
            LAST_MONTH_FILE_NAME = "lastmonth_" + export_name + '.xml'
            ALL_MONTH_FILE_NAME = "all_" + export_name + '.xml'

            # Export records modified last month
            records = self._get_records_modified_last_month(export_name, export_pattern)
            self._delete_files(self._output_directory, LAST_MONTH_FILE_NAME)
            #self._split_records_into_files(records, SPLIT_BY_RECORDS, LAST_MONTH_FILE_NAME_PATTERN, self._output_directory)
            self._save_records_into_file(records, LAST_MONTH_FILE_NAME, self._output_directory)

            # Export all records
            all_records = self._get_all_records(export_name, export_pattern)
            self._delete_files(self._output_directory, ALL_MONTH_FILE_NAME)
            self._save_records_into_file(all_records, ALL_MONTH_FILE_NAME, self._output_directory)

    def set_output_directory(self, path_to_directory):
        """Check if the directory exists. If it does not exist, create it."""
        directory = path_to_directory
        # remove the slash from the end of the path if it exists
        if directory[-1] == os.sep:
            directory = directory[:-1]

        # if the directory does not exist then create it
        if not os.path.exists(directory):
            try:
                os.makedirs(directory)
            except(IOError, OSError), exception:
                self._report_error("Directory %s does not exist and cannot be created." % (directory, ), exception)

        # if it is not a path to a directory report an error
        if not os.path.isdir(directory):
            self._report_error("%s is not a directory." % (directory, ))
            return

        self._output_directory = directory

    def _get_records_modified_last_month(self, export_name, export_pattern):
        """Returns all records modified last month and matching the criteria."""
        current_date = datetime.date.today()
        one_month_ago = current_date - datetime.timedelta(days = 31)

        return perform_request_search(dt="m", d1y = one_month_ago.year, d1m = one_month_ago.month,
                                      d1d = one_month_ago.day,
                                      **self._get_perform_search_dict(export_pattern))

    def _get_all_records(self, export_name, export_pattern):
        """Return all records matching the criteria regardless of their modification date."""
        return perform_request_search(**self._get_perform_search_dict(export_pattern))

    def _get_perform_search_dict(self, export_pattern):
        """
        Build the keyword arguments for perform_request_search.

        A pattern of the form "collection_name:<name>" selects the
        collection <name> (argument "c"); anything else is passed
        through as a regular search pattern (argument "p").
        """
        result = {}
        match = re.match(r"\s*collection_name\s*:\s*(?P<coll_name>.+?)\s*$",
                         export_pattern, re.IGNORECASE)
        if match:
            # group() returns the captured name; groups() would return a tuple
            result["c"] = [match.group("coll_name")]
        else:
            result["p"] = export_pattern
        return result

    def _save_records_into_file(self, records, file_name, output_directory):
        """Save all the records into file in MARCXML

        file_name - the name of the file where records will be saved

        output_directory - directory where the file will be placed"""

        output_file = self._open_output_file(file_name, output_directory)
        self._write_to_output_file(output_file,
                                   '<?xml version="1.0" encoding="UTF-8"?>\n<collection xmlns="http://www.loc.gov/MARC21/slim">\n')

        for record in records:
            marcxml = self._get_record_MARCXML(record)
            output_file.write(marcxml)

        self._write_to_output_file(output_file, "\n</collection>")
        self._close_output_file(output_file)

    def _open_output_file(self, file_name, output_directory):
        """Opens new file for writing.

        file_name - the name of the file without the .gz extension.

        output_directory - the directory where file will be created"""

        path = output_directory + os.sep + file_name + '.gz'

        try:
            output_file = gzip.GzipFile(filename = path, mode = "w")
            return output_file
        except (IOError, OSError), exception:
            self._report_error("Failed to open file %s." % (path, ), exception)
            return None

    def _close_output_file(self, output_file):
        """Closes the file"""
        if output_file is None:
            return
        output_file.close()

    def _write_to_output_file(self, output_file, text_to_write):
        """Writes the text passed as a parameter to the file"""
        try:
            output_file.write(text_to_write)
        except (IOError, OSError), exception:
            self._report_error("Failed to write to file " + output_file.name, exception)

    def _get_record_MARCXML(self, record):
        """Returns the record in MARCXML format."""
        return print_record(record, format='xm')

    def _delete_files(self, path_to_directory, name_pattern):
        """Deletes files with file name starting with name_pattern
        from directory specified by path_to_directory"""

        files = os.listdir(path_to_directory)

        for current_file in files:
            if current_file.startswith(name_pattern):
                path_to_file = path_to_directory + os.sep + current_file
                os.remove(path_to_file)

    def _report_error(self, error_message, exception = None):
        """Reports an error during exporting"""
        raise MARCXMLExportException(error_message, exception)

class MARCXMLExportException(Exception):
    """Exception indicating an error when exporting to MARCXML."""

    _error_message = ""
    _inner_exception = None

    def __init__(self, error_message, inner_exception = None):
        """Constructor of the exception"""
        Exception.__init__(self, error_message, inner_exception)

        self._error_message = error_message
        self._inner_exception = inner_exception

    def get_error_message(self):
        """Returns the error message that explains the reason for the exception"""
        return self._error_message

    def get_inner_exception(self):
        """Returns the inner exception that is the cause for the current exception"""
        return self._inner_exception