Dear Sergio,

Sergio Laberer wrote:
The documents within such a collection are now exported by someone via excel and processed further outside CDS-invenio. This person would now be interested in being able to pick up the incremental additions to this collection since they exported the documents last.

You could simply implement a check in the Excel output:

for each record:
    if the record modification/addition time is older \
       than the last export time then:
        skip record
    else:
        format record to excel
save current export time in a file
(also save collection if export is done on a collection basis)
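
As a rough Python sketch of the check above: the record representation (a dict with a "modified" datetime), the state file location and the function names are all assumptions made for illustration, not Invenio's actual API. You would have to adapt it to whatever the export function really receives.

```python
import json
import os
from datetime import datetime

# Hypothetical location for the saved export times; not an Invenio path.
STATE_FILE = "/tmp/last_excel_export.json"


def load_last_export(collection, state_file=STATE_FILE):
    """Return the last export time of a collection, or None if never exported."""
    if not os.path.exists(state_file):
        return None
    with open(state_file) as f:
        state = json.load(f)
    stamp = state.get(collection)
    return datetime.fromisoformat(stamp) if stamp else None


def save_last_export(collection, when, state_file=STATE_FILE):
    """Remember the export time per collection, keeping other collections intact."""
    state = {}
    if os.path.exists(state_file):
        with open(state_file) as f:
            state = json.load(f)
    state[collection] = when.isoformat()
    with open(state_file, "w") as f:
        json.dump(state, f)


def records_to_export(records, last_export):
    """Skip records whose modification time is older than the last export."""
    if last_export is None:
        # First export: nothing to skip.
        return list(records)
    return [r for r in records if r["modified"] >= last_export]
```

The export time is stored per collection, so exports of different collections do not interfere with each other.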

You can find the Excel export function in
/opt/cds-invenio/lib/python/invenio/bibformat.py

That would however not be very efficient and could lead to a
few problems (see below).

Alternatively, I was thinking of possibly doing this by adding a field for each document, which indicates whether the document was already exported or not. However, I would then have to batch update all those fields after it being exported to excel. Do you think this would be feasible? If so, do you have some pointers?

That would be feasible: at export time (see "algorithm" above), just
start a BibUpload task with the updated XML. That would be as simple
as adding a field:

For example, generate /tmp/your_modif.xml:
<record>
 <controlfield tag="001">XXXX</controlfield>
 <datafield tag="999" ind1="9" ind2="9">
   <subfield code="a">PROCESSED</subfield>
 </datafield>
</record>

and then:
$ /opt/cds-invenio/bin/bibupload -a /tmp/your_modif.xml
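
If many records are exported at once, the XML could be generated in one go. Here is a minimal sketch using only the Python standard library. The function name and its defaults are made up for illustration, and wrapping several records in a <collection> element is the usual MARCXML convention, which I am assuming your BibUpload accepts (see the admin guide).

```python
import xml.etree.ElementTree as ET


def build_processed_marcxml(recids, tag="999", ind1="9", ind2="9",
                            value="PROCESSED"):
    """Build one MARCXML document marking each given record ID as processed."""
    collection = ET.Element("collection")
    for recid in recids:
        record = ET.SubElement(collection, "record")
        # 001 holds the record ID, so the field is attached to an existing record.
        control = ET.SubElement(record, "controlfield", {"tag": "001"})
        control.text = str(recid)
        datafield = ET.SubElement(
            record, "datafield", {"tag": tag, "ind1": ind1, "ind2": ind2})
        subfield = ET.SubElement(datafield, "subfield", {"code": "a"})
        subfield.text = value
    return ET.tostring(collection, encoding="unicode")
```

The returned string can be written to /tmp/your_modif.xml and uploaded with the bibupload command shown above.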

You could even update the collection of the record, if you want
to have a collection tree in this form:
"Collection YYYYY"
 -> "Collection YYYYY new"
 -> "Collection YYYYY processed"

See BibUpload admin guide:
<http://invenio-demo.cern.ch/help/admin/bibupload-admin-guide>

However, I see a problem in this workflow: what if someone exports the
data to an Excel output, but does not save the file (for example by
clicking "Cancel" by mistake when asked to save)? The records would
already be marked as processed, so it would no longer be possible to
re-export them "easily".

Also there would be a problem if the documents are exported
concurrently, by several people at the same time.

While there is the alerting functionality which in my view would solve this matter, they actually would like to do this whenever they wanted it and not being bound on a regular schedule.

If they set up a daily alert that goes to some dedicated mailbox,
they could process them whenever they want. That would also give them
a good idea of the amount of work ("ZZZZ unread messages").

The alerts could all go to a shared mailbox, or could be set up so
that they go to a particular person depending on the collection.

Note that alerts can be sent to WebBaskets too: your users could
progressively empty their baskets while records are processed.  Also
the contents of WebBaskets can be exported to some output formats: you
could plug one of the possibilities discussed above in this module
(remove records from the basket once they have been exported). However
the same limitations apply.

If I understand the new feature you mentioned, there would be a staging area where the latest additions would be queued before being promoted to be viewable. Will those "pre-release" documents be already within the database, i.e. searchable or would they simply be queued before being loaded into the repository? If it is the former, then I could see a way around my problem. The latter might not solve my problem.

They would be queued before being searchable/viewable.

Then you might think about setting up a second Invenio server: Records
are harvested from external sources on the first Invenio instance,
while the second Invenio instance harvests from the first one. The
first server would be for search/view, while the second would be used
only for its possibility to queue new records before they are
integrated. That could however be a bit heavy for your needs.

Finally, consider using CDS Invenio itself to process the records
instead of producing Excel listings for external processing: with
some custom WebSubmit submissions you can achieve quite powerful
workflows.

Best regards

--
Jerome Caffaro ** CERN Document Server ** <http://cds.cern.ch/>

