On 08.11.2012 09:32, Vesa Sivunen wrote:

Hi!

I would be very interested in seeing the script. The reason
being exactly "...no one really wanted to do this by mouse and
keyboard... ;)".

The second, and IMHO much more important, reason is that you never get it
consistent if you do it by hand on more than one instance, even if you
have diligent librarians doing the setup ;)

For the sake of simplicity I've included the real-world configs from
JuSER, which is the Jülich instance of the ongoing project with DESY,
GSI, Jülich and RWTH Aachen.

CreateCollections.py is the main script; just call it with its input
files in the same directory. We use _2_ config files. The first one is
specific to a given instance. Its name is derived in the usual Invenio
style from CFG_WEBSTYLE_TEMPLATE_SKIN. Our local instance uses

       CFG_WEBSTYLE_TEMPLATE_SKIN = fzj

so our local, instance-specific file is Collectionlist_fzj.txt. The
second one is a "generic" list we share between all instances.
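The naming rule can be sketched as a tiny helper that mirrors what CreateCollections.py does with its `base` and `ext` variables. This is a minimal sketch, assuming the skin name is passed in as a plain string (the real script reads it from `invenio.config.CFG_WEBSTYLE_TEMPLATE_SKIN`); the helper name `collection_files` is hypothetical.

```python
def collection_files(skin, base='Collectionlist', ext='.txt'):
    """Return (specific, generic) config file names for a template skin.

    Hypothetical helper mirroring the naming in CreateCollections.py;
    in Invenio the skin comes from CFG_WEBSTYLE_TEMPLATE_SKIN.
    """
    specific = base + '_' + skin + ext   # e.g. Collectionlist_fzj.txt
    generic = base + ext                 # shared by all instances
    return specific, generic


print(collection_files('fzj'))  # ('Collectionlist_fzj.txt', 'Collectionlist.txt')
```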

Note that the _fzj file is processed first, as it sets up the parent
collections for the generic ones. For us these are mainly the
collections Workflow, Documenttypes, Authorities, InstColl and
FullTexts, which should be first-level children of our main instance,
JuSER. This explains the first (and subsequent) lines in this file:

Place a collection named internally "Workflow" as a _v_irtual child of
JuSER, and name it "Workflow collections" in English (@en) and "Workflow
collections" in German (@de).

For Documenttypes you can see language-specific names: "Document types"
and "Dokumenttypen". Note also that none of the collections in the _fzj
file use a collection query, except the special case "FullTexts".
Therefore those lines have _2_ consecutive tab characters after the
internal name (the query column is empty). ":set list" in vi shows them,
or use a spreadsheet application like Gnumeric.
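If no editor is at hand, the tabs can also be made visible, and lines that lost them (e.g. to an editor or mail client converting tabs to spaces) can be flagged, with a few lines of Python. A minimal sketch, assuming every line needs at least the first four columns (name, query, r|v, parent); `show_tabs` and `short_lines` are hypothetical names.

```python
def show_tabs(line):
    """Render tabs as ^I, similar to what ':set list' shows in vi."""
    return line.replace('\t', '^I')


def short_lines(lines, min_fields=4):
    """Return (line number, field count) for lines with too few columns.

    A line whose tabs were turned into spaces collapses into fewer
    tab-separated fields; blank lines are ignored.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip('\r\n').split('\t')
        if line.strip() and len(fields) < min_fields:
            bad.append((number, len(fields)))
    return bad


good = 'Workflow\t\tv\tJuSER\tWorkflow collections@en'
broken = 'Workflow    v   JuSER   Workflow collections@en'
print(show_tabs(good))
print(short_lines([good, broken]))
```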

Collectionlist.txt contains all the more or less regular collections we
use. The syntax is the same:

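Internalname <tab> collection query <tab> r|v <tab> parent collection <tab> translations

where r|v attaches the collection as a regular or virtual child of the
parent collection.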
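One line of this format can be split into named columns as follows. A minimal sketch, assuming exactly the column order the script relies on (`data[0]` name, `data[1]` query, `data[2]` r|v, `data[3]` parent, translations from `data[4]` on); the `Collection` field names and `parse_line` are our own, hypothetical labels.

```python
from collections import namedtuple

# Hypothetical field names for the columns described above.
Collection = namedtuple('Collection', 'name query kind parent translations')


def parse_line(line):
    """Split one tab-separated line into its columns.

    Translation cells are kept as raw 'Text@lang' strings.
    """
    fields = line.rstrip('\r\n').split('\t')
    name, query, kind, parent = fields[:4]
    if kind not in ('r', 'v'):
        raise ValueError('third column must be r or v, got %r' % kind)
    return Collection(name, query, kind, parent, fields[4:])


rec = parse_line('FullTexts\tcollection:"JUWEL"\tr\tJuSER\tJUWEL@en\tJUWEL@de')
print(rec.name, rec.query, rec.kind, rec.parent, rec.translations)
```

An empty query, as in the _fzj lines, simply comes back as an empty string.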

You can specify whatever languages you want: just append the Invenio
internal language code, preceded by an @, to each translation (e.g.
Books@en). So you can have more columns than the files show.
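The script turns these cells into one value per site language, in the order Invenio lists its languages. A minimal sketch of that step, assuming the site languages come as (code, name) pairs, as the script's use of `lang[0]` on `get_languages()` suggests; `translation_list` is a hypothetical name, and splitting at the last `@` is our choice so an `@` inside the text survives.

```python
def translation_list(cells, site_langs):
    """Order 'Text@lang' cells by the site's languages.

    site_langs: (code, name) pairs. Languages without a translation
    get '' so the list lines up with site_langs.
    """
    trans = {}
    for cell in cells:
        # Split at the last '@' only; cells without '@' are ignored.
        text, sep, code = cell.rpartition('@')
        if sep:
            trans[code.strip()] = text
    return [trans.get(code, '') for code, _name in site_langs]


langs = [('de', 'Deutsch'), ('en', 'English'), ('fr', 'Français')]
print(translation_list(['Books@en', 'Bücher@de'], langs))  # ['Bücher', 'Books', '']
```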

If you look at CreateCollections.py in more detail you'll see that it
doesn't deserve the "rocket science" label. It is just a series of
straightforward calls against Invenio's high-level API. In a way it
mimics exactly what would happen if you clicked here and typed there in
the web frontend.
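To preview what the script will do before letting it loose on a live instance, the sequence of API calls per line can be listed without touching the database. A minimal dry-run sketch, assuming rows are already split into fields; `plan_calls` is hypothetical, the tuples only name the Invenio functions the script calls, and collection names stand in for the IDs that `get_colID()` would return.

```python
def plan_calls(rows):
    """List the calls CreateCollections.py would make, without running them.

    Each row is [name, query, r|v, parent, translations...]. Nothing is
    executed; names stand in for the IDs get_colID() would return.
    """
    calls = []
    for row in rows:
        name, query, kind, parent = row[:4]
        calls.append(('delete_col', name))
        calls.append(('add_col', name, query))
        calls.append(('add_col_dad_son', parent, name, kind))
        calls.append(('modify_translations', name, tuple(row[4:])))
    return calls


for call in plan_calls([['Workflow', '', 'v', 'JuSER',
                         'Workflow collections@en',
                         'Workflow collections@de']]):
    print(call)
```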

HTH :)

--

Kind regards,

Alexander Wagner
Subject Specialist
Central Library
52425 Juelich

mail : [email protected]
phone: +49 2461 61-1586
Fax  : +49 2461 61-6103
www.fz-juelich.de/zb/DE/zb-fi


#!/usr/bin/env python
#
##
## This file is part of Invenio.
## Copyright (C) 2011, HGF
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.

import csv

from invenio.config import CFG_WEBSTYLE_TEMPLATE_SKIN
from invenio.websearchadminlib import \
        add_col, \
        add_col_dad_son, \
        delete_col
from invenio.search_engine import \
        get_colID
from invenio.bibrankadminlib import \
        get_languages, \
        modify_translations

base = 'Collectionlist'
ext  = '.txt'

# Generic list shared between all instances: Collectionlist.txt
generic = base + ext
# Instance-specific list, e.g. Collectionlist_fzj.txt for the skin "fzj"
specific = base + '_' + CFG_WEBSTYLE_TEMPLATE_SKIN + ext

print(generic)
print(specific)

# Languages configured for this site; lang[0] is the language code
sitelangs = get_languages()


def create_collections(filename):
        """(Re)create all collections listed in a tab-separated file.

        Columns: internal name, collection query, r|v, parent collection,
        followed by any number of translations of the form Text@lang.
        """
        listfile = open(filename, 'rb')
        rows = list(csv.reader(listfile, delimiter='\t'))
        listfile.close()

        for data in rows:
                # Skip empty lines
                if not data or not data[0].strip():
                        continue
                ## print "Collection: ", data[0]
                ## print "Query     : ", data[1]
                ## print "r/v       : ", data[2]
                ## print "dad       : ", data[3]
                ## print "en        : ", data[4]
                ## print "de        : ", data[5]
                ## there could be more translations at the end of the line

                # Drop the collection first before recreating it, just in
                # case it already exists.
                delete_col(get_colID(data[0]))

                # Add the collection and attach it to its parent as a
                # regular (r) or virtual (v) child
                add_col(data[0], data[1])
                add_col_dad_son(get_colID(data[3]), get_colID(data[0]),
                                data[2])

                # Extract available translations. Languages are marked as
                # @en etc. Build a dictionary keyed by language so the
                # values can easily be sorted into a list for the
                # modify_translations call.
                transdict = {}
                for cell in data[4:]:
                        # Split at the last '@' only; cells without an
                        # '@' are ignored instead of raising IndexError
                        parts = cell.rsplit('@', 1)
                        if len(parts) == 2:
                                transdict[parts[1].strip()] = parts[0]

                # Add the translated value for every site language, or ''
                # if no proper translation is found
                translist = []
                for lang in sitelangs:
                        translist.append(transdict.get(lang[0], ''))
                modify_translations(get_colID(data[0]), sitelangs, 'ln',
                                    translist, "collection")


# The instance-specific file goes first: it sets up the parent
# collections the generic list refers to.
create_collections(specific)
create_collections(generic)

HGVVOC	collection:"HGFVOC"	r	Authorities	Controlled vocabulary@en	Kontrolliertes Vokabular@de
StatID	collection:"StatID"	r	Authorities	Statistics keys@en	Statistikschlüssel@de
PubTypes	collection:"PUB"	r	Authorities	Publication types@en	Publikationsformen@de
Periodicals	collection:"PERI"	r	Authorities	Periodicals@en	Periodika@de
People	collection:"P"	r	Authorities	People@en	Personen@de
Institutes	collection:"I"	r	Authorities	Institutes@en	Institute@de
Institution	collection:"Institution"	r	Authorities	Institutions@en	Institutionen@de
Grants	collection:"G"	r	Authorities	Grants@en	Projekte@de
Unpublished		r	Documenttypes	Unpublished@en	Unpubliziertes@de
Theses		r	Documenttypes	Theses@en	Hochschulschriften@de
Reports		r	Documenttypes	Reports@en	Berichte@de
Presentations		r	Documenttypes	Presentations@en	Präsentationen@de
Patents		r	Documenttypes	Patents@en	Patente@de
Other Resources		r	Documenttypes	Other Resources@en	Andere@de
Events		r	Documenttypes	Events@en	Ereignisse@de
Books		r	Documenttypes	Books@en	Bücher@de
Articles		r	Documenttypes	Articles@en	Aufsätze@de
EDITOR	collection:"EDITOR"	r	Workflow	In process@en	In Bearbeitung@de
LIBRARY	collection:"LIBRARY"	r	Workflow	At library@en	Bibliotheksprüfung@de
MAIL	collection:"MAIL"	r	Workflow	Mail to editor@en	Sachbearbeiter benachrichtigt@de
TEMPENTRY	collection:"TEMPENTRY"	r	Workflow	Temporary Entries@en	Temporäre Einträge@de
USER	collection:"USER"	r	Workflow	User submitted records@en	Eingereichte Dokumente@de
VDB	collection:"VDB"	r	Workflow	Publications database@en	Publikationsdatenbank@de
VDBINPRINT	collection:"VDBINPRINT"	r	Workflow	Documents in print@en	Im Druck@de
VDBRELEVANT	collection:"VDBRELEVANT"	r	Workflow	Relevant for Publication database@en	Für Publikationsdatenbank relevant@de
MIGRATION	collection:"MIGRATION"	r	Workflow	Migrated datasets (backup)@en	Migrierte Datensätze (Backup)@de
MASSMEDIA	collection:"MASSMEDIA"	r	Workflow	In the media@en	In den Medien@de
UNRESTRICTED	collection:"UNRESTRICTED"	r	Workflow	Public records@en	Öffentliche Einträge@de
Project	collection:"UNRESTRICTED" and collection:"project"	r	Unpublished	Projects@en	Projekte@de
Notes	collection:"UNRESTRICTED" and collection:"notes"	r	Unpublished	Notes@en	Notizen@de
News	collection:"UNRESTRICTED" and collection:"news"	r	Unpublished	News@en	Nachrichten@de
FormTemplate	collection:"UNRESTRICTED" and collection:"formtmp"	r	Unpublished	Forms / Templates@en	Formulare / Vorlagen@de
Communication	collection:"UNRESTRICTED" and collection:"comm"	r	Unpublished	Communication@en	Mitteilung@de
Staatsexamen	collection:"UNRESTRICTED" and collection:"exam"	r	Theses	Staatsexamen@en	Staatsexamen@de
PostdoctoralThesis	collection:"UNRESTRICTED" and collection:"habil"	r	Theses	Postdoctoral Theses@en	Habilitationen@de
PhDThesis	collection:"UNRESTRICTED" and collection:"phd"	r	Theses	Ph.D. Theses@en	Doktorarbeiten@de
MasterThesis	collection:"UNRESTRICTED" and collection:"master"	r	Theses	Master Theses@en	Masterarbeiten@de
Magisterarbeit	collection:"UNRESTRICTED" and collection:"magister"	r	Theses	Magisterarbeit@en	Magisterarbeiten@de
DiplomaThesis	collection:"UNRESTRICTED" and collection:"diploma"	r	Theses	Diploma Theses@en	Diplomarbeiten@de
Coursework	collection:"UNRESTRICTED" and collection:"course"	r	Theses	Course works@en	Kursarbeiten@de
BachelorThesis	collection:"UNRESTRICTED" and collection:"bachelor"	r	Theses	Bachelor Theses@en	Bachelorarbeiten@de
Report	collection:"UNRESTRICTED" and collection:"report"	r	Reports	Reports@en	Berichte@de
Preprint	collection:"UNRESTRICTED" and collection:"preprint"	r	Reports	Preprints@en	Vorabdrucke@de
Minutes	collection:"UNRESTRICTED" and collection:"minutes"	r	Reports	Minutes@en	Protokolle@de
InternalReport	collection:"UNRESTRICTED" and collection:"intrep"	r	Reports	Internal Reports@en	Interne Berichte@de
Talknon-conference	collection:"UNRESTRICTED" and collection:"talk"	r	Presentations	Talks (non-conference)@en	Vorträge (nicht Konferenz)@de
Poster	collection:"UNRESTRICTED" and collection:"poster"	r	Presentations	Poster@en	Poster@de
Lecture	collection:"UNRESTRICTED" and collection:"lecture"	r	Presentations	Lectures@en	Vorlesungen@de
ConferencePresentation	collection:"UNRESTRICTED" and collection:"conf"	r	Presentations	Conference Presentations@en	Konferenzvorträge@de
Abstract	collection:"UNRESTRICTED" and collection:"abstract"	r	Presentations	Abstracts@en	Zusammenfassungen@de
Patent	collection:"UNRESTRICTED" and collection:"patent"	r	Patents	Patents@en	Patente@de
Software	collection:"UNRESTRICTED" and collection:"sware"	r	Other Resources	Software@en	Software@de
PhysicalObject	collection:"UNRESTRICTED" and collection:"physobj"	r	Other Resources	Physical Objects@en	Physikalische Objekte@de
Multimedia	collection:"UNRESTRICTED" and collection:"media"	r	Other Resources	Multimedia@en	Multimedia@de
Images	collection:"UNRESTRICTED" and collection:"images"	r	Other Resources	Images@en	Bilder@de
Dataset	collection:"UNRESTRICTED" and collection:"dataset"	r	Other Resources	Datasets@en	Datensätze@de
Contribution2proceeding	collection:"UNRESTRICTED" and collection:"contrib"	r	Events	Contributions to a conference proceeding@en	Beiträge zu Proceedings@de
Conferences	collection:"UNRESTRICTED" and collection:"Conferences"	r	Events	Conferences@en	Konferenzen@de
ConferenceEvent	collection:"UNRESTRICTED" and collection:"ConferenceEvent"	r	Events	Conferences / Events@en	Konferenzen / Veranstaltungen@de
Reference	collection:"UNRESTRICTED" and collection:"refs"	r	Books	Reference@en	Referenzen@de
Proceedings	collection:"UNRESTRICTED" and collection:"proc"	r	Books	Proceedings@en	Proceedings@de
Contribution2book	collection:"UNRESTRICTED" and collection:"contb"	r	Books	Contribution to a book@en	Buchbeitrag@de
Book	collection:"UNRESTRICTED" and collection:"book"	r	Books	Books@en	Bücher@de
JournalArticle	collection:"UNRESTRICTED" and collection:"journal"	r	Articles	Journal Article@en	Zeitschriftenaufsätze@de

Workflow		v	JuSER	Workflow collections@en	Workflow collections@de
Documenttypes		v	JuSER	Document types@en	Dokumenttypen@de
Authorities		r	JuSER	Authorities@en	Normsätze@de
InstColl		r	JuSER	Institute Collections@en	Institutssammlungen@de
FullTexts	collection:"JUWEL"	r	JuSER	JUWEL@en	JUWEL@de
