Hi,

> There are no silly questions when it comes to installing and running
> invenio! :-)

Nor simple answers ;-)
 
> The indexer and other daemons were indeed not running when I made the
> test submissions, but I set them running yesterday. Re-indexing all
> the indexes has not helped. 

In general, I tend to agree with Alexander: if the information is
stored in the MARC record, use that.  It is more standard, the search
API is well defined and, in most cases, it has a web interface, so the
data can be retrieved from a different host if needed.
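
To make the "different host" point concrete, here is a minimal sketch
(Python 3) that only builds a search URL asking for MARCXML output.  It
assumes an Invenio 1.x style /search endpoint with the p (pattern), of
(output format) and rg (records per page) parameters.  The host name,
the report number and the function name build_search_url are
placeholders of mine, not part of any real installation.

```python
from urllib.parse import urlencode


def build_search_url(base_url, pattern, output_format="xm", max_records=10):
    """Return a search URL asking for records in the given output format.

    Assumes the Invenio 1.x query parameters: p = search pattern,
    of = output format ("xm" being MARCXML), rg = records per page.
    """
    query = urlencode({"p": pattern, "of": output_format, "rg": max_records})
    return "%s/search?%s" % (base_url.rstrip("/"), query)


if __name__ == "__main__":
    # Hypothetical host and report number; substitute your own.
    print(build_search_url("https://invenio.example.org",
                           "reportnumber:TEST-2012-001"))
```

The resulting URL can be fetched with any HTTP client from any machine
that can reach the web interface, so there is no need for access to
the database or the filesystem.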

But sometimes not all the information is in the record, or even in the
database.  In my case, a similar need was better solved by extracting
the information from Invenio's temporary submission files.

A word of warning: it is simplistic and crude (you know, internal
stuff), and obviously it shouldn't re-extract all the information every
day (I run it from cron), but for now it gives us useful info.  You may
have to adjust some paths.  The shell script sets some filenames and
calls the Python script of the same name.

Hope it helps you in another way,

Ferran

---

#!/bin/sh
# -*- coding: utf-8 -*-
# Time-stamp: <2012.10.01 11:31:35 submissions.sh [email protected]>

# Keep a historical monthly extract of submission details from Invenio
# temp files

what=submissions

year=$(date +"%Y")
month=$(date +"%m" | sed 's|^0||')
logfile=${what}_a${year}m${month}.log
logdir=~/log

# Stop here if the log directory is missing, so nothing is written elsewhere
cd "$logdir" || exit 1

~/bin/$what.py | sort -nr >$logfile
sort -nru ${what}_a*.log >$what.log

---

#!/usr/bin/python
# -*- coding: utf-8 -*-
# Time-stamp: <2012.10.01 11:21:38 submissions.py [email protected]>

# Extract submission details from Invenio temp files

import os
import glob
import time
import tarfile

basedir = os.path.expanduser('~/invenio/var/data/submit/storage/done/running/')

def extract_info(tarball):
    #print tarball
    mtime = time.localtime(os.path.getmtime(tarball))
    timestamp = time.strftime("%Y.%m.%d %H:%M:%S", mtime)

    record = {}
    record['RN'] = tarball.split('_')[0]
    record['timestamp'] = timestamp
    record['lastuploadedfile'] = '-'

    tar = tarfile.open(tarball)
    filenames = tar.getnames()
    for filename in filenames:
        key = filename.split('/')[-1]
        try:
            value = tar.extractfile(filename).read()
        except AttributeError:
            value = '-'
        record[key] = value
    tar.close()

    # Not every submission stores every field; fall back to '-' instead
    # of failing with a KeyError.
    fields = ('SN', 'timestamp', 'SuE', 'doctype', 'RN', 'access',
              'lastuploadedfile')
    out = '\t'.join([record.get(field, '-') for field in fields])
    return out


if __name__ == '__main__':
    os.chdir(basedir)
    doctypes = os.listdir(basedir)
    for doctype in doctypes:
        # Skip stray files: only directories can be entered and scanned
        if not os.path.isdir(doctype):
            continue
        os.chdir(doctype)
        tarballs = glob.glob('*.tar.gz')
        for tarball in tarballs:
            print extract_info(tarball)
        os.chdir(basedir)
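
---

If you want to try the idea without touching a real installation, the
following is a self-contained Python 3 sketch of the same technique.
It builds a small synthetic tarball and prints one tab-separated line
from it.  The helper names (read_fields, format_line,
make_sample_tarball), the sample values and the directory layout inside
the tarball are all made up for illustration; only the field names and
the file name convention are taken from the script above.

```python
import io
import os
import tarfile
import tempfile

FIELDS = ("SN", "timestamp", "SuE", "doctype", "RN", "access",
          "lastuploadedfile")


def read_fields(tarball_path):
    """Map each regular file's base name to its decoded, stripped content."""
    record = {}
    with tarfile.open(tarball_path) as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # directories have no content to read
            data = tar.extractfile(member).read()
            key = os.path.basename(member.name)
            record[key] = data.decode("utf-8", "replace").strip()
    return record


def format_line(record):
    """Join the known fields with tabs, using '-' for missing ones."""
    return "\t".join(record.get(field, "-") for field in FIELDS)


def make_sample_tarball(path, fields):
    """Write a small synthetic tarball, for demonstration only."""
    with tarfile.open(path, "w:gz") as tar:
        for name, text in fields.items():
            payload = text.encode("utf-8")
            # Hypothetical directory name inside the archive
            info = tarfile.TarInfo(name="sample_dir/" + name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "TEST-2012-001_1349083295.tar.gz")
        make_sample_tarball(sample, {"SN": "42", "doctype": "TEST"})
        record = read_fields(sample)
        # As in the script above, the report number comes from the file name
        record["RN"] = os.path.basename(sample).split("_")[0]
        print(format_line(record))
```

Unlike the script above, this version decodes the contents and strips
surrounding whitespace, so a trailing newline in a field cannot break
the one-line-per-submission format.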
