Re: Draft for bibdocadmin CLI tool

Tibor Simko Mon, 21 Jan 2008 16:07:23 +0100

Hi Sam:

On Fri, 18 Jan 2008, Samuele Kaplun wrote:


> bibdocadmin <query> <action> [url:<url>] [docname:<docname>] [format:<format>]
>    [description:<description>] [comment:<comment>] [restriction:<restriction>]
>    [icon:<icon>] [newdocname:<newdocname>]

I prefer the traditional "--option-name option-value" convention.

Interesting missing options:

   --docid

because even if we "hide" docIDs from users now, the admins may still
need to work with them.

(BTW, generic options are missing too, e.g. --verbose will be much
needed.)

I also prefer options to be named more explicitly, e.g. to mirror read
and write operations as much as possible, such as:

$ bibdocadmin --record 8 --docname thesis --get-comment
This is foo and bar.
$ bibdocadmin --record 8 --docname thesis --set-comment "This is baz."

Generally, we could have two families of options: a) one family for
selecting which documents to process; b) another family for selecting
actions together with their arguments.  (See also below.)  The names
could support this distinction to make things clearer.
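
To illustrate the two families, here is a minimal sketch using Python's
argparse.  The grouping, the metavar names and the rule of exactly one
action per invocation are assumptions made for illustration, not a
decided interface:

```python
import argparse


def build_parser():
    """Hypothetical bibdocadmin parser with two option families."""
    parser = argparse.ArgumentParser(prog="bibdocadmin")
    parser.add_argument("--verbose", type=int, default=1, metavar="LEVEL")

    # Family (a): options selecting which documents to process.
    select = parser.add_argument_group("document selection")
    select.add_argument("--record", type=int, metavar="RECID")
    select.add_argument("--docid", type=int, metavar="DOCID")
    select.add_argument("--docname", metavar="NAME")

    # Family (b): actions with their arguments.  Assumption: exactly one
    # action is given per invocation.
    actions = parser.add_argument_group("actions")
    action = actions.add_mutually_exclusive_group(required=True)
    action.add_argument("--get-comment", action="store_true")
    action.add_argument("--set-comment", metavar="TEXT")
    return parser


args = build_parser().parse_args(
    ["--record", "8", "--docname", "thesis", "--set-comment", "This is baz."]
)
```

Giving both --get-comment and --set-comment in one call makes argparse
exit with a usage error, which keeps read and write operations apart.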

> <query> should be any query that perform_request_search understand.

When selecting documents, we need to separate search pattern (=p) from
searched collection (=c) arguments, e.g. to be able to say:

   --select-by-pattern 'author:"Ellis, John"'
   --select-by-collection Theses

or shorter:

   --pattern 'author:"Ellis, John"'
   --collection Theses

This would simply transform to perform_request_search(p=..., cc=...).
We probably do not need other options from p_r_s() capabilities.  See
also "bibreformat --help".
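
A minimal sketch of that transformation.  The helper names
build_search_kwargs and select_records are hypothetical, and fake_search
is only a stand-in with canned data so that the example runs without
Invenio; the real call would pass perform_request_search instead:

```python
def build_search_kwargs(pattern=None, collection=None):
    """Map --pattern / --collection onto the p= and cc= keywords."""
    kwargs = {}
    if pattern is not None:
        kwargs["p"] = pattern
    if collection is not None:
        kwargs["cc"] = collection
    return kwargs


def select_records(search, pattern=None, collection=None):
    """Call the given search function with the translated keywords."""
    return search(**build_search_kwargs(pattern, collection))


def fake_search(p="", cc=""):
    # Stand-in for perform_request_search, returning canned record IDs.
    canned = {('author:"Ellis, John"', "Theses"): [10, 12]}
    return canned.get((p, cc), [])


recids = select_records(
    fake_search, pattern='author:"Ellis, John"', collection="Theses"
)
```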

> one or more result <action>:
> list_docname
> list_docfiles (their urls and filesystem path)
> list_info (all the possibile informartion about bibdoc and bibdocfiles)
> list_statistics (total size of the latest versions and total size in general)
> delete
> purge
> expunge
> fix
>
> one and only result <action>:
> add (to add a docname)
> append (to append a new format or icon)
> revise to revise a document (by optionally providing a new url)

Interesting missing actions:

  --stamp    ... or maybe leave this for the above layer?
                 (--get-file && stamp.py && --revise-file ?)
  --check    ... compare MD5 checksums, report health status
  --get-restrictions ... report about document restrictions
  --set-restrictions ... set a restriction on document
  --undelete ... a mirror action to delete; or should we name them
                 "enable/disable" to make it less stressful?

     (BTW, there is a use case from Theodoros for a submitted file to
     be viewable only after a certain date in the future, due to
     copyright restrictions.  We do not treat this in bibdoc so far.
     We could eventually introduce the notion of "starting date" into
     there.)

  --get-usage-stats ... list some download data from rnkDOWNLOADS for
                        this file (of course, some nice format for
                        post-processing would be needed)
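
For --check, the comparison itself could look roughly like the sketch
below.  It assumes the expected MD5 checksum is available from
somewhere (where bibdoc keeps it is not covered here), and the function
names are hypothetical:

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_file(path, expected_md5):
    """True if the file on disk still matches the recorded checksum."""
    return md5_of_file(path) == expected_md5.lower()
```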

BTW, if by the phrase "one or more result <action>" you mean we should
allow action chaining, then it could be misleading if people write
things like:

   --list-docnames --delete

If we distinguish actions to be roughly of three types --get-foo,
--set-foo, and --do-foo, then it is probably safer not to allow
cross-chaining between these categories, and not to allow any multiple
actions inside some categories at all.
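
A minimal sketch of such a chaining rule.  Which categories may carry
several actions is an assumption here (only --get-foo), and the
function names are hypothetical:

```python
def action_category(option):
    """Return 'get', 'set' or 'do' for an option such as --get-comment."""
    for prefix in ("get", "set", "do"):
        if option.startswith("--%s-" % prefix):
            return prefix
    raise ValueError("not an action option: %s" % option)


def validate_actions(options, chainable=("get",)):
    """Reject cross-category chaining and multiple non-chainable actions."""
    if not options:
        raise ValueError("no action given")
    categories = set(action_category(option) for option in options)
    if len(categories) > 1:
        raise ValueError("cannot mix action types: %s" % sorted(categories))
    category = categories.pop()
    # Assumption: only read-only actions may be combined in one call.
    if len(options) > 1 and category not in chainable:
        raise ValueError("only one --%s-* action allowed" % category)
    return category
```

With this rule, "--get-comment --get-restrictions" passes, while
"--get-comment --do-delete" or two --set-* actions are refused before
anything is touched.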

> To let revise/add/append more than one data at a time we can think
> of having special symbols to divide them: say a '-' and '--' with
> the semantic:

Again, it could be somewhat error-prone to allow treating more
than one data object at the same time with different arguments.  Maybe
people could do this only with bibupload FFTs.  It is hard to combine
all the multi-multi options into a clean CLI interface.  So I'd
prefer the CLI interface to be very simple and rock-solid, and for
complexish things, people could use MARCXML with FFTs, or the future
XML-based "inveniocfg" dump-edit-load tool.

> bibdocadmin recid:1234567 revise docname:foo icon:http://cds.cern.ch/cds.gif

In my speak this would be something like:

  $ bibdocadmin --record 1234567 --docname foo \
                --revise-icon http://cds.cern.ch/cds.gif

Other interesting use cases we could mention:

  $ bibdocadmin --collection Theses --check --verbose 9
  INFO: Record 10, file thesis.pdf ... ok
  ERROR: Record 10, file defence.ppt ... FAILED
  INFO: Record 12, file article.pdf ... ok
  [...]
  INFO: 123 files checked, 113 okay, 10 failed
  FAILED

  $ bibdocadmin --collection Theses --get-restrictions
  INFO: Record 10, file thesis.pdf, restrictions NONE
  INFO: Record 12, file article.pdf, restrictions my-nice-group
  [...]
  INFO: 123 files checked, 100 NONE restriction, 23 my-nice-group

  $ bibdocadmin --pattern 'author:"Crackpot, John"' \
                --docname '*' \
                --add-comment "This file was removed due to foo."
  OK

These examples show that we may also need some "output conventions" so
that people can parse bibdocadmin output if they need to.  (In addition
to our usage of proper exit status codes to indicate success/failure.)
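
As a rough sketch of one possible convention, modelled on the --check
example above: one line per file in a fixed layout, plus an exit status
derived from the results.  The exact layout and the function names are
assumptions, not an agreed format:

```python
import re

# Assumed layout: "LEVEL: Record N, file NAME ... STATUS"
LINE_RE = re.compile(r"^(INFO|ERROR): Record (\d+), file (.+) \.\.\. (ok|FAILED)$")


def format_check_line(recid, filename, ok):
    """Render one result line in the assumed layout."""
    level, status = ("INFO", "ok") if ok else ("ERROR", "FAILED")
    return "%s: Record %d, file %s ... %s" % (level, recid, filename, status)


def parse_check_line(line):
    """Parse a result line back into a dict, or None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    level, recid, filename, status = match.groups()
    return {"level": level, "recid": int(recid),
            "file": filename, "ok": status == "ok"}


def exit_status(results):
    """0 if every parsed result is ok, 1 otherwise."""
    return 0 if all(result["ok"] for result in results) else 1
```

A caller could then feed each output line through parse_check_line and
rely on the exit status alone when only pass/fail matters.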

(See also the simple "bibsched status" output that is there to quickly
report the health of the bibsched queue; we may want to harmonize
things so that e.g. BibEdit and watchdogs can call this better.)

                                * * *

Okay, these were some first comments off the top of my head.  Unless
other people want to brainstorm more, let's take this discussion IRL
and let's polish the thing and let's document the decisions in
Savannah when ready.

Best regards
-- 
Tibor Simko ** CERN Document Server ** <http://cds.cern.ch/>
