Xqt has submitted this change. ( https://gerrit.wikimedia.org/r/c/pywikibot/core/+/826937 )

Change subject: [pep8] PEP8 changes for create_isbn_edition.py
......................................................................

[pep8] PEP8 changes for create_isbn_edition.py

- Code style issues
- clear trailing white space
- untabify file
- keep lines below 80 chars
- update function documentation and parameter list
- update shebang
- script documentation is in __doc__
- replace print statements by pywikibot.info
- add main() function, mostly needed for windows and for script tests
- add pywikibot.handle_args to handle global options and test -help
- add isbnlib dependency
- lazy import isbnlib and unidecode
- replace sys.stdin.read by pywikibot.input to show a input message
- create wikidata_site and repo after global args are read to prevent
  site warning

Change-Id: I6917ec9b511db609c2f1828486b9a53998d1e376
---
M scripts/create_isbn_edition.py
M setup.py
M tests/script_tests.py
M tox.ini
4 files changed, 317 insertions(+), 212 deletions(-)
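The patch keeps the ISBN-scraping approach described in the script docstring: free text read from stdin is matched against a simple regular expression. A minimal sketch of that extraction step; the input string here is hypothetical, while the pattern is the one the script compiles as `isbnre`:

```python
import re

# ISBN candidates: 10-17 characters of digits with optional dashes,
# the same pattern scripts/create_isbn_edition.py compiles as `isbnre`
isbnre = re.compile(r'[0-9-]{10,17}')

# Hypothetical free text, e.g. a pasted Wikipedia references list
text = 'See ISBN 978-0-596-10089-6 and 9789042925564 in the references.'
candidates = isbnre.findall(text)  # both hyphenated and plain forms match
```

The pattern is deliberately loose; the script afterwards normalises candidates via isbnlib (e.g. `isbnlib.mask` for the canonical hyphenated form).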

Approvals:
  jenkins-bot: Verified
  Xqt: Looks good to me, approved
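Among the listed changes, "lazy import isbnlib and unidecode" stores the ImportError in place of the module, so a missing optional dependency only fails at the point of use and global option handling such as `-help` still works. A minimal, self-contained sketch of that pattern; the module name and the `ensure_loaded` helper are illustrative, not part of the patch:

```python
# Deferred-import pattern: keep the ImportError instead of raising it
try:
    import some_missing_dependency  # hypothetical optional dependency
except ImportError as e:
    some_missing_dependency = e


def ensure_loaded(module):
    """Raise the stored ImportError at the first point of use."""
    if isinstance(module, ImportError):
        raise module
    return module
```

In the patch itself the stored exception simply takes the place of `isbnlib` or `unidecode`, so the script can start (and be import-tested) without the libraries installed.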



diff --git a/scripts/create_isbn_edition.py b/scripts/create_isbn_edition.py
index ee6b18c..2aea6c1 100644
--- a/scripts/create_isbn_edition.py
+++ b/scripts/create_isbn_edition.py
@@ -1,15 +1,15 @@
-#!/home/geertivp/pwb/bin/python3
-
-codedoc = """
-Pywikibot client to load ISBN related data into Wikidata
+#!/usr/bin/python3
+"""Pywikibot script to load ISBN related data into Wikidata.

 Pywikibot script to get ISBN data from a digital library,
 and create or amend the related Wikidata item for edition
 (with the P212=ISBN number as unique external ID).

-Use digital libraries to get ISBN data in JSON format, and integrate the results into Wikidata.
+Use digital libraries to get ISBN data in JSON format, and integrate the
+results into Wikidata.

-Then the resulting item number can be used e.g. to generate Wikipedia references using template Cite_Q.
+Then the resulting item number can be used e.g. to generate Wikipedia
+references using template Cite_Q.

 Parameters:

@@ -34,39 +34,49 @@
                     Default LANG; e.g. en, nl, fr, de, es, it, etc.

         P3 P4...:   P/Q pairs to add additional claims (repeated)
-                    e.g. P921 Q107643461 (main subject: database management linked to P2163 Fast ID)
+                    e.g. P921 Q107643461 (main subject: database
+                    management linked to P2163 Fast ID)

     stdin: ISBN numbers (International standard book number)

-        Free text (e.g. Wikipedia references list, or publication list) is accepted.
-        Identification is done via an ISBN regex expression.
+        Free text (e.g. Wikipedia references list, or publication list)
+        is accepted. Identification is done via an ISBN regex expression.

 Functionality:

-    * The ISBN number is used as a primary key (P212 where no duplicates are allowed)
-        The item update is not performed when there is no unique match
-    * Statements are added or merged incrementally; existing data is not overwritten.
-    * Authors and publishers are searched to get their item number (ambiguous items are skipped)
+    * The ISBN number is used as a primary key (P212 where no duplicates
+      are allowed). The item update is not performed when there is no
+      unique match
+    * Statements are added or merged incrementally; existing data is not
+      overwritten.
+    * Authors and publishers are searched to get their item number
+      (ambiguous items are skipped)
     * Book title and subtitle are separated with '.', ':', or '-'
     * This script can be run incrementally with the same parameters
-        Caveat: Take into account the Wikidata Query database replication delay.
-        Wait for minimum 5 minutes to avoid creating duplicate objects.
+        Caveat: Take into account the Wikidata Query database
+        replication delay. Wait for minimum 5 minutes to avoid creating
+        duplicate objects.

 Data quality:

-    * Use https://query.wikidata.org/querybuilder/ to identify P212 duplicates
-        Merge duplicate items before running the script again.
+    * Use https://query.wikidata.org/querybuilder/ to identify P212
+        duplicates. Merge duplicate items before running the script
+        again.
     * The following properties should only be used for written works
         P5331:  OCLC work ID (editions should only have P243)
-        P8383:  Goodreads-identificatiecode for work (editions should only have P2969)
+        P8383:  Goodreads-identificatiecode for work (editions should
+        only have P2969)

 Examples:

-    # Default library (Google Books), language (LANG), no additional statements
+    # Default library (Google Books), language (LANG), no additional
+      statements
+
     ./create_isbn_edition.py
     9789042925564

     # Wikimedia, language Dutch, main subject: database management
+
     ./create_isbn_edition.py wiki en P921 Q107643461
     978-0-596-10089-6

@@ -109,10 +119,11 @@
     P1036:  Dewey Decimal Classification
     P2163:  Fast ID (inverse lookup via Wikidata Query) -> P921: main subject
     P2969:  Goodreads-identificatiecode
-
+
     (only for written works)
     P5331:  OCLC work ID (editions should only have P243)
-    P8383:  Goodreads-identificatiecode for work (editions should only have P2969)
+    P8383:  Goodreads-identificatiecode for work (editions should only
+            have P2969)

 Author:

@@ -154,7 +165,7 @@
     https://pypi.org/search/?q=isbnlib_

         pip install isbnlib (mandatory)
-
+
         (optional)
         pip install isbnlib-bol
         pip install isbnlib-bnf
@@ -169,24 +180,32 @@
     * Better use the ISO 639-1 language code parameter as a default
         The language code is not always available from the digital library.
     * SPARQL queries run on a replicated database
-        Possible important replication delay; wait 5 minutes before retry -- otherwise risk for creating duplicates.
+        Possible important replication delay; wait 5 minutes before retry
+        -- otherwise risk for creating duplicates.

 Known problems:

     * Unknown ISBN, e.g. 9789400012820
-    * No ISBN data available for an edition either causes no output (goob = Google Books), or an error message (wiki, openl)
+    * No ISBN data available for an edition either causes no output
+        (goob = Google Books), or an error message (wiki, openl)
         The script is taking care of both
     * Only 6 ISBN attributes are listed by the webservice(s)
         missing are e.g.: place of publication, number of pages
-    * Not all ISBN atttributes have data (authos, publisher, date of publication, language)
-    * The script uses multiple webservice calls (script might take time, but it is automated)
-    * Need to amend ISBN items that have no author, publisher, or other required data (which additional services to use?)
+    * Not all ISBN attributes have data (authors, publisher, date of
+      publication, language)
+    * The script uses multiple webservice calls (script might take time,
+      but it is automated)
+    * Need to amend ISBN items that have no author, publisher, or other
+      required data (which additional services to use?)
     * How to add still more digital libraries?
-        * Does the KBR has a public ISBN service (Koninklijke Bibliotheek van België)?
-    * Filter for work properties -- need to amend Q47461344 (written work) instance and P629 (edition of) + P747 (has edition) statements
-        https://www.wikidata.org/wiki/Q63413107
+        * Does the KBR have a public ISBN service (Koninklijke
+          Bibliotheek van België)?
+    * Filter for work properties -- need to amend Q47461344 (written
+        work) instance and P629 (edition of) + P747 (has edition)
+        statements https://www.wikidata.org/wiki/Q63413107
         ['9781282557246', '9786612557248', '9781847196057', '9781847196040']
-        P8383: Goodreads-identificatiecode voor work 13957943 (should have P2969)
+        P8383: Goodreads-identificatiecode voor work 13957943 (should
+               have P2969)
         P5331: OCLC-identificatiecode voor work 793965595 (should have P243)

 To do:
@@ -205,7 +224,7 @@
 Environment:

     The python script can run on the following platforms:
-
+
         Linux client
         Google Chromebook (Linux container)
         Toolforge Portal
@@ -238,7 +257,7 @@
 Related projects:

     https://phabricator.wikimedia.org/T314942 (this script)
-
+
     (other projects)
     https://phabricator.wikimedia.org/T282719
     https://phabricator.wikimedia.org/T214802
@@ -254,64 +273,71 @@
     https://en.wikipedia.org/wiki/bibliographic_database
     https://www.titelbank.nl/pls/ttb/f?p=103:4012:::NO::P4012_TTEL_ID:3496019&cs=19BB8084860E3314502A1F777F875FE61

+.. versionadded:: 7.7
 """
-
+#
+# (C) Pywikibot team, 2022
+#
+# Distributed under the terms of the MIT license.
+#
 import logging          # Error logging
 import os               # Operating system
-import re                      # Regular expressions (very handy!)
+import re               # Regular expressions (very handy!)
 import sys              # System calls
-import unidecode        # Unicode

-import pywikibot               # API interface to Wikidata
-
-from isbnlib import *   # ISBN data
-from pywikibot import pagegenerators as pg # Wikidata Query interface
+import pywikibot        # API interface to Wikidata
+from pywikibot import pagegenerators as pg  # Wikidata Query interface
+from pywikibot.backports import List
 from pywikibot.data import api

+try:
+    import isbnlib
+except ImportError as e:
+    isbnlib = e
+
+try:
+    from unidecode import unidecode
+except ImportError as e:
+    unidecode = e
+
 # Initialisation
 debug = True            # Show debugging information
 verbose = True          # Verbose mode

 booklib = 'goob'        # Default digital library
-isbnre = re.compile(r'[0-9-]{10,17}')       # ISBN number: 10 or 13 digits with optional dashes (-)
+
+# ISBN number: 10 or 13 digits with optional dashes (-)
+isbnre = re.compile(r'[0-9-]{10,17}')
 propre = re.compile(r'P[0-9]+')             # Wikidata P-number
 qsuffre = re.compile(r'Q[0-9]+')            # Wikidata Q-number

 # Other statements are added via command line parameters
 target = {
-'P31':'Q3331189',                           # Is an instance of an edition
+    'P31': 'Q3331189',  # Is an instance of an edition
 }

 # Statement property and instance validation rules
 propreqinst = {
-'P50':'Q5',                                 # Author requires human
-'P123':{'Q2085381', 'Q1114515', 'Q1320047'},# Publisher requires publisher
-'P407':{'Q34770', 'Q33742', 'Q1288568'},    # Edition language requires at least one of (living, natural) language
+    'P50': 'Q5',  # Author requires human
+    # Publisher requires publisher
+    'P123': {'Q2085381', 'Q1114515', 'Q1320047'},
+    # Edition language requires at least one of (living, natural) language
+    'P407': {'Q34770', 'Q33742', 'Q1288568'},
 }

 mainlang = os.getenv('LANG', 'en')[:2]      # Default description language

 # Connect to database
-transcmt = '#pwb Create ISBN edition'                  # Wikidata transaction comment
-wikidata_site = pywikibot.Site('wikidata', 'wikidata')  # Login to Wikibase instance
-repo = wikidata_site.data_repository()                  # Required for wikidata object access (item, property, statement)
+transcmt = '#pwb Create ISBN edition'  # Wikidata transaction comment


-def is_in_list(statement_list, checklist):
+def is_in_list(statement_list, checklist: List[str]) -> bool:
+    """Verify if statement list contains at least one item from the checklist.
+
+    :param statement_list: Statement list
+    :param checklist: List of values
+    :Returns: True when match
     """
-Verify if statement list contains at least one item from the checklist
-
-Parameters:
-
-    statement_list: Statement list
-
-    checklist:      List of values (string)
-
-Returns:
-
-    Boolean (True when match)
-    """
-
     for seq in statement_list:
         if seq.getTarget().getID() in checklist:
             isinlist = True
@@ -322,84 +348,92 @@


 def get_item_list(item_name, instance_id):
+    """Get list of items by name, belonging to an instance (list).
+
+    :param item_name: Item name (string; case sensitive)
+    :param instance_id:    Instance ID (string, set, or list)
+    :Returns: Set of items (Q-numbers)
     """
-Get list of items by name, belonging to an instance (list)
-
-Parameters:
-
-    item_name:      Item name (string; case sensitive)
-
-    instance_id:    Instance ID (string, set, or list)
-
-Returns:
-
-    Set of items (Q-numbers)
-    """
-
     item_list = set()       # Empty set
-    params = {'action': 'wbsearchentities', 'format': 'json', 'type': 'item', 'strictlanguage': False,
-              'language': mainlang,       # All languages are searched, but labels are in native language
-              'search': item_name}        # Get item list from label
+    params = {
+        'action': 'wbsearchentities',
+        'format': 'json',
+        'type': 'item',
+        'strictlanguage': False,
+        # All languages are searched, but labels are in native language
+        'language': mainlang,
+        'search': item_name,  # Get item list from label
+    }
     request = api.Request(site=wikidata_site, parameters=params)
     result = request.submit()

     if 'search' in result:
         for res in result['search']:
             item = pywikibot.ItemPage(repo, res['id'])
-            item.get(get_redirect = True)
+            item.get(get_redirect=True)
             if 'P31' in item.claims:
-                for seq in item.claims['P31']:       # Loop through instances
-                    if seq.getTarget().getID() in instance_id:  # Matching instance
-                        for lang in item.labels:                # Search all languages
-                            if unidecode.unidecode(item_name.lower()) == unidecode.unidecode(item.labels[lang].lower()):    # Ignore label case and accents
-                                item_list.add(item.getID())     # Label math
+                for seq in item.claims['P31']:  # Loop through instances
+                    # Matching instance
+                    if seq.getTarget().getID() in instance_id:
+                        for lang in item.labels:  # Search all languages
+                            # Ignore label case and accents
+                            if (unidecode(item_name.lower())
+                                    == unidecode(item.labels[lang].lower())):
+                                item_list.add(item.getID())  # Label match
                         for lang in item.aliases:
-                            if item_name in item.aliases[lang]: # Case sensitive for aliases
-                                item_list.add(item.getID())     # Alias match
+                            # Case sensitive for aliases
+                            if item_name in item.aliases[lang]:
+                                item_list.add(item.getID())  # Alias match
     return item_list


-def amend_isbn_edition(isbn_number):
-    """
-Amend ISBN registration.
-
-Parameters:
-
-    isbn_number:    ISBN number (string; 10 or 13 digits with optional hyphens)
-
-Result:
+def amend_isbn_edition(isbn_number):  # noqa: C901
+    """Amend ISBN registration.

     Amend Wikidata, by registering the ISBN-13 data via P212,
     depending on the data obtained from the digital library.
+
+    :param isbn_number:  ISBN number (string; 10 or 13 digits with
+        optional hyphens)
     """
+    global logger
     global proptyx
+    global targetx

     isbn_number = isbn_number.strip()
     if isbn_number == '':
-        return 3    # Do nothing when the ISBN number is missing
-
+        return 3  # Do nothing when the ISBN number is missing
+
     # Validate ISBN data
     if verbose:
-        print()
+        pywikibot.info()

     try:
-        isbn_data = meta(isbn_number, service=booklib)
+        isbn_data = isbnlib.meta(isbn_number, service=booklib)
         logger.info(isbn_data)
-        # {'ISBN-13': '9789042925564', 'Title': 'De Leuvense Vaart - Van De Vaartkom Tot Wijgmaal. Aspecten Uit De Industriele Geschiedenis Van Leuven', 'Authors': ['A. Cresens'], 'Publisher': 'Peeters Pub & Booksellers', 'Year': '2012', 'Language': 'nl'}
+        # {'ISBN-13': '9789042925564',
+        #  'Title': 'De Leuvense Vaart - Van De Vaartkom Tot Wijgmaal. '
+        #           'Aspecten Uit De Industriele Geschiedenis Van Leuven',
+        #  'Authors': ['A. Cresens'],
+        #  'Publisher': 'Peeters Pub & Booksellers',
+        #  'Year': '2012',
+        #  'Language': 'nl'}
     except Exception as error:
         # When the book is unknown the function returns
         logger.error(error)
-        #raise ValueError(error)
+        # raise ValueError(error)
         return 3

     if len(isbn_data) < 6:
-        logger.error('Unknown or incomplete digital library registration for %s' % isbn_number)
+        logger.error(
+            'Unknown or incomplete digital library registration for %s'
+            % isbn_number)
         return 3

     # Show the raw results
     if verbose:
         for i in isbn_data:
-            print('%s:\t%s' % (i, isbn_data[i]))
+            pywikibot.info('%s:\t%s' % (i, isbn_data[i]))

     # Get the book language from the ISBN book reference
     booklang = mainlang         # Default language
@@ -419,10 +453,10 @@

     # Get formatted ISBN number
     isbn_number = isbn_data['ISBN-13']  # Numeric format
-    isbn_fmtd = mask(isbn_number)       # Canonical format
+    isbn_fmtd = isbnlib.mask(isbn_number)       # Canonical format
     if verbose:
-        print()
-    print(isbn_fmtd)                    # First one
+        pywikibot.info()
+    pywikibot.info(isbn_fmtd)                    # First one

     # Get (sub)title when there is a dot
     titles = isbn_data['Title'].split('. ')          # goob is using a '.'
@@ -435,14 +469,17 @@
     if len(titles) > 1:
         subtitle = titles[1].strip()

-    # Print book titles
+    # Show book titles
     if debug:
-        print(objectname, file=sys.stderr)
-        print(subtitle, file=sys.stderr)                # Optional
-        for i in range(2,len(titles)):                  # Print subsequent subtitles, when available
-            print(titles[i].strip(), file=sys.stderr)   # Not stored in Wikidata...
+        pywikibot.info(objectname, file=sys.stderr)
+        pywikibot.info(subtitle, file=sys.stderr)  # Optional
+        # print subsequent subtitles, when available
+        for i in range(2, len(titles)):
+            # Not stored in Wikidata...
+            pywikibot.info(titles[i].strip(), file=sys.stderr)

     # Search the ISBN number in Wikidata both canonical and numeric
+    # P212 should have canonical hyphenated format
     isbn_query = ("""# Get ISBN number
 SELECT ?item WHERE {
   VALUES ?isbn_number {
@@ -451,13 +488,13 @@
   }
   ?item wdt:P212 ?isbn_number.
 }
""" % (isbn_fmtd, isbn_number))      # P212 should have canonical hyphenated format
+""" % (isbn_fmtd, isbn_number))

     logger.info(isbn_query)
     generator = pg.WikidataSPARQLPageGenerator(isbn_query, site=wikidata_site)

     rescnt = 0
-    for item in generator:                     # Main loop for all DISTINCT items
+    for item in generator:  # Main loop for all DISTINCT items
         rescnt += 1
         qnumber = item.getID()
         logger.warning('Found item: %s' % qnumber)
@@ -479,7 +516,7 @@
     # Add all P/Q values
     # Make sure that labels are known in the native language
     if debug:
-        print(target, file=sys.stderr)
+        pywikibot.info(target, file=sys.stderr)

     # Register statements
     for propty in target:
@@ -489,8 +526,11 @@
             targetx[propty] = pywikibot.ItemPage(repo, target[propty])

             try:
-                logger.warning('Add %s (%s): %s (%s)' % (proptyx[propty].labels[booklang], propty, targetx[propty].labels[booklang], target[propty]))
-            except:
+                logger.warning('Add %s (%s): %s (%s)'
+                               % (proptyx[propty].labels[booklang], propty,
+                                  targetx[propty].labels[booklang],
+                                  target[propty]))
+            except:  # noqa: B001, E722, H201
                 logger.warning('Add %s:%s' % (propty, target[propty]))

             claim = pywikibot.Claim(repo, propty)
@@ -508,20 +548,23 @@
     if 'P1476' not in item.claims:
         logger.warning('Add Title (P1476): %s' % (objectname))
         claim = pywikibot.Claim(repo, 'P1476')
-        claim.setTarget(pywikibot.WbMonolingualText(text=objectname, language=booklang))
+        claim.setTarget(pywikibot.WbMonolingualText(text=objectname,
+                                                    language=booklang))
         item.addClaim(claim, bot=True, summary=transcmt)

     # Subtitle
     if subtitle != '' and 'P1680' not in item.claims:
         logger.warning('Add Subtitle (P1680): %s' % (subtitle))
         claim = pywikibot.Claim(repo, 'P1680')
-        claim.setTarget(pywikibot.WbMonolingualText(text=subtitle, language=booklang))
+        claim.setTarget(pywikibot.WbMonolingualText(text=subtitle,
+                                                    language=booklang))
         item.addClaim(claim, bot=True, summary=transcmt)

     # Date of publication
     pub_year = isbn_data['Year']
     if pub_year != '' and 'P577' not in item.claims:
-        logger.warning('Add Year of publication (P577): %s' % (isbn_data['Year']))
+        logger.warning('Add Year of publication (P577): %s'
+                       % (isbn_data['Year']))
         claim = pywikibot.Claim(repo, 'P577')
         claim.setTarget(pywikibot.WbTime(year=int(pub_year), precision='year'))
         item.addClaim(claim, bot=True, summary=transcmt)
@@ -543,7 +586,8 @@
                             break

                 if add_author:
-                    logger.warning('Add author %d (P50): %s (%s)' % (author_cnt, author_name, author_list[0]))
+                    logger.warning('Add author %d (P50): %s (%s)'
+                                   % (author_cnt, author_name, author_list[0]))
                     claim = pywikibot.Claim(repo, 'P50')
                     claim.setTarget(pywikibot.ItemPage(repo, author_list[0]))
                     item.addClaim(claim, bot=True, summary=transcmt)
@@ -559,11 +603,13 @@
     # Get the publisher
     publisher_name = isbn_data['Publisher'].strip()
     if publisher_name != '':
-        publisher_list = list(get_item_list(publisher_name, propreqinst['P123']))
+        publisher_list = list(get_item_list(publisher_name,
+                                            propreqinst['P123']))

         if len(publisher_list) == 1:
             if 'P123' not in item.claims:
-                logger.warning('Add publisher (P123): %s (%s)' % (publisher_name, publisher_list[0]))
+                logger.warning('Add publisher (P123): %s (%s)'
+                               % (publisher_name, publisher_list[0]))
                 claim = pywikibot.Claim(repo, 'P123')
                 claim.setTarget(pywikibot.ItemPage(repo, publisher_list[0]))
                 item.addClaim(claim, bot=True, summary=transcmt)
@@ -573,30 +619,33 @@
             logger.warning('Ambiguous publisher: %s' % publisher_name)

     # Get additional data from the digital library
-    isbn_cover = cover(isbn_number)
-    isbn_editions = editions(isbn_number, service='merge')
-    isbn_doi = doi(isbn_number)
-    isbn_info = info(isbn_number)
+    isbn_cover = isbnlib.cover(isbn_number)
+    isbn_editions = isbnlib.editions(isbn_number, service='merge')
+    isbn_doi = isbnlib.doi(isbn_number)
+    isbn_info = isbnlib.info(isbn_number)

     if verbose:
-        print()
-        print(isbn_info)
-        print(isbn_doi)
-        print(isbn_editions)
+        pywikibot.info()
+        pywikibot.info(isbn_info)
+        pywikibot.info(isbn_doi)
+        pywikibot.info(isbn_editions)

     # Book cover images
     for i in isbn_cover:
-        print('%s:\t%s' % (i, isbn_cover[i]))
+        pywikibot.info('%s:\t%s' % (i, isbn_cover[i]))

     # Handle ISBN classification
-    isbn_classify = classify(isbn_number)
+    isbn_classify = isbnlib.classify(isbn_number)
     if debug:
         for i in isbn_classify:
-            print('%s:\t%s' % (i, isbn_classify[i]), file=sys.stderr)
+            pywikibot.info('%s:\t%s' % (i, isbn_classify[i]), file=sys.stderr)

     # ./create_isbn_edition.py '978-3-8376-5645-9' - de P407 Q188
     # Q113460204
-    # {'owi': '11103651812', 'oclc': '1260160983', 'lcc': 'TK5105.8882', 'ddc': '300', 'fast': {'1175035': 'Wikis (Computer science)', '1795979': 'Wikipedia', '1122877': 'Social sciences'}}
+    # {'owi': '11103651812', 'oclc': '1260160983', 'lcc': 'TK5105.8882',
+    #  'ddc': '300', 'fast': {'1175035': 'Wikis (Computer science)',
+    #                         '1795979': 'Wikipedia',
+    #                         '1122877': 'Social sciences'}}

     # Set the OCLC ID
     if 'oclc' in isbn_classify and 'P243' not in item.claims:
@@ -608,54 +657,75 @@
     # OCLC ID and OCLC work ID should not be both assigned
     if 'P243' in item.claims and 'P5331' in item.claims:
         if 'P629' in item.claims:
-            oclcwork = item.claims['P5331'][0]      # OCLC Work should be unique
-            oclcworkid = oclcwork.getTarget()       # Get the OCLC Work ID from the edition
-            work = item.claims['P629'][0].getTarget()           # Edition should belong to only one single work
-            logger.warning('Move OCLC Work ID %s to work %s' % (oclcworkid, work.getID()))  # There doesn't exist a moveClaim method?
-            if 'P5331' not in work.claims:                      # Keep current OCLC Work ID if present
+            oclcwork = item.claims['P5331'][0]  # OCLC Work should be unique
+            # Get the OCLC Work ID from the edition
+            oclcworkid = oclcwork.getTarget()
+            # Edition should belong to only one single work
+            work = item.claims['P629'][0].getTarget()
+            # There doesn't exist a moveClaim method?
+            logger.warning('Move OCLC Work ID %s to work %s'
+                           % (oclcworkid, work.getID()))
+            # Keep current OCLC Work ID if present
+            if 'P5331' not in work.claims:
                 claim = pywikibot.Claim(repo, 'P5331')
                 claim.setTarget(oclcworkid)
                 work.addClaim(claim, bot=True, summary=transcmt)
-            item.removeClaims(oclcwork, bot=True, summary=transcmt)  # OCLC Work ID does not belong to edition
+            # OCLC Work ID does not belong to edition
+            item.removeClaims(oclcwork, bot=True, summary=transcmt)
         else:
-            logger.error('OCLC Work ID %s conflicts with OCLC ID %s and no work available' % (item.claims['P5331'][0].getTarget(), item.claims['P243'][0].getTarget()))
+            logger.error('OCLC Work ID %s conflicts with OCLC ID %s and no '
+                         'work available'
+                         % (item.claims['P5331'][0].getTarget(),
+                            item.claims['P243'][0].getTarget()))

     # OCLC work ID should not be registered for editions, only for works
     if 'owi' not in isbn_classify:
         pass
-    elif 'P629' in item.claims:                     # Get the work related to the edition
-        work = item.claims['P629'][0].getTarget()   # Edition should only have one single work
-        if 'P5331' not in work.claims:              # Assign the OCLC work ID if missing
-            logger.warning('Add OCLC work ID (P5331): %s to work %s' % (isbn_classify['owi'], work.getID()))
+    elif 'P629' in item.claims:  # Get the work related to the edition
+        # Edition should only have one single work
+        work = item.claims['P629'][0].getTarget()
+        if 'P5331' not in work.claims:  # Assign the OCLC work ID if missing
+            logger.warning('Add OCLC work ID (P5331): %s to work %s'
+                           % (isbn_classify['owi'], work.getID()))
             claim = pywikibot.Claim(repo, 'P5331')
             claim.setTarget(isbn_classify['owi'])
             work.addClaim(claim, bot=True, summary=transcmt)
     elif 'P243' in item.claims:
-        logger.warning('OCLC Work ID %s ignored because of OCLC ID %s' % (isbn_classify['owi'], item.claims['P243'][0].getTarget()))
-    elif 'P5331' not in item.claims:                # Assign the OCLC work ID only if there is no work, and no OCLC ID for edition
-        logger.warning('Add OCLC work ID (P5331): %s to edition' % (isbn_classify['owi']))
+        logger.warning('OCLC Work ID %s ignored because of OCLC ID %s'
+                       % (isbn_classify['owi'],
+                          item.claims['P243'][0].getTarget()))
+    # Assign the OCLC work ID only if there is no work, and no OCLC ID
+    # for edition
+    elif 'P5331' not in item.claims:
+        logger.warning('Add OCLC work ID (P5331): %s to edition'
+                       % (isbn_classify['owi']))
         claim = pywikibot.Claim(repo, 'P5331')
         claim.setTarget(isbn_classify['owi'])
         item.addClaim(claim, bot=True, summary=transcmt)

-    # Reverse logic for moving OCLC ID and P212 (ISBN) from work to edition is more difficult because of 1:M relationship...
+    # Reverse logic for moving OCLC ID and P212 (ISBN) from work to
+    # edition is more difficult because of 1:M relationship...

     # Same logic as for OCLC (work) ID

     # Goodreads-identificatiecode (P2969)

-    # Goodreads-identificatiecode for work (P8383) should not be registered for editions; should rather use P2969
+    # Goodreads-identificatiecode for work (P8383) should not be
+    # registered for editions; should rather use P2969

     # Library of Congress Classification (works and editions)
     if 'lcc' in isbn_classify and 'P8360' not in item.claims:
-        logger.warning('Add Library of Congress Classification for edition (P8360): %s' % (isbn_classify['lcc']))
+        logger.warning(
+            'Add Library of Congress Classification for edition (P8360): %s'
+            % (isbn_classify['lcc']))
         claim = pywikibot.Claim(repo, 'P8360')
         claim.setTarget(isbn_classify['lcc'])
         item.addClaim(claim, bot=True, summary=transcmt)

     # Dewey Decimale Classificatie
     if 'ddc' in isbn_classify and 'P1036' not in item.claims:
-        logger.warning('Add Dewey Decimale Classificatie (P1036): %s' % (isbn_classify['ddc']))
+        logger.warning('Add Dewey Decimale Classificatie (P1036): %s'
+                       % (isbn_classify['ddc']))
         claim = pywikibot.Claim(repo, 'P1036')
         claim.setTarget(isbn_classify['ddc'])
         item.addClaim(claim, bot=True, summary=transcmt)
@@ -666,7 +736,8 @@
     # https://www.oclc.org/research/areas/data-science/fast.html
     # https://www.oclc.org/content/dam/oclc/fast/FAST-quick-start-guide-2022.pdf

-    # Authority control identifier from WorldCat's “FAST Linked Data” authority file (external ID P2163)
+    # Authority control identifier from WorldCat's “FAST Linked Data”
+    # authority file (external ID P2163)
     # Corresponding to P921 (Wikidata main subject)
     if 'fast' in isbn_classify:
         for fast_id in isbn_classify['fast']:
@@ -679,109 +750,142 @@
 """ % (fast_id))

             logger.info(main_subject_query)
-            generator = pg.WikidataSPARQLPageGenerator(main_subject_query, site=wikidata_site)
+            generator = pg.WikidataSPARQLPageGenerator(main_subject_query,
+                                                       site=wikidata_site)

             rescnt = 0
-            for main_subject in generator:                 # Main loop for all DISTINCT items
+            for main_subject in generator:  # Main loop for all DISTINCT items
                 rescnt += 1
                 qmain_subject = main_subject.getID()
                 try:
                     main_subject_label = main_subject.labels[booklang]
-                    logger.info('Found main subject %s (%s) for Fast ID %s' % (main_subject_label, qmain_subject, fast_id))
-                except:
+                    logger.info('Found main subject %s (%s) for Fast ID %s'
+                                % (main_subject_label, qmain_subject, fast_id))
+                except:  # noqa B001, E722, H201
                     main_subject_label = ''
-                    logger.info('Found main subject (%s) for Fast ID %s' % (qmain_subject, fast_id))
-                    logger.error('Missing label for item %s' % qmain_subject)
+                    logger.info('Found main subject (%s) for Fast ID %s'
+                                % (qmain_subject, fast_id))
+                    logger.error('Missing label for item %s'
+                                 % qmain_subject)

             # Create or amend P921 statement
             if rescnt == 0:
-                logger.error('Main subject not found for Fast ID %s' % (fast_id))
+                logger.error('Main subject not found for Fast ID %s'
+                             % (fast_id))
             elif rescnt == 1:
                 add_main_subject = True
-                if 'P921' in item.claims:               # Check for duplicates
+                if 'P921' in item.claims:  # Check for duplicates
                     for seq in item.claims['P921']:
                         if seq.getTarget().getID() == qmain_subject:
                             add_main_subject = False
                             break

                 if add_main_subject:
-                    logger.warning('Add main subject (P921) %s (%s)' % (main_subject_label, qmain_subject))
+                    logger.warning('Add main subject (P921) %s (%s)'
+                                   % (main_subject_label, qmain_subject))
                     claim = pywikibot.Claim(repo, 'P921')
                     claim.setTarget(main_subject)
                     item.addClaim(claim, bot=True, summary=transcmt)
                 else:
-                    logger.info('Skipping main subject %s (%s)' % (main_subject_label, qmain_subject))
+                    logger.info('Skipping main subject %s (%s)'
+                                % (main_subject_label, qmain_subject))
             else:
-                logger.error('Ambiguous main subject for Fast ID %s' % (fast_id))
+                logger.error('Ambiguous main subject for Fast ID %s'
+                             % (fast_id))

     # Book description
-    isbn_description = desc(isbn_number)
+    isbn_description = isbnlib.desc(isbn_number)
     if isbn_description != '':
-        print()
-        print(isbn_description)
+        pywikibot.info()
+        pywikibot.info(isbn_description)

     # Currently does not work (service not available)
     try:
         logger.warning('BibTex unavailable')
         return 0
-        bibtex_metadata = doi2tex(isbn_doi)
-        print(bibtex_metadata)
+        bibtex_metadata = isbnlib.doi2tex(isbn_doi)
+        pywikibot.info(bibtex_metadata)
     except Exception as error:
         logger.error(error)     # Data not available

     return 0


-# Error logging
-logger = logging.getLogger('create_isbn_edition')
-#logging.basicConfig(level=logging.DEBUG)       # Uncomment for debugging
-##logger.setLevel(logging.DEBUG)
+def main(*args: str) -> None:
+    """
+    Process command line arguments and invoke bot.

-pgmnm = sys.argv.pop(0)
-logger.debug('%s %s' % (pgmnm, '2022-08-23 (gvp)'))
+    If args is an empty list, sys.argv is used.

-# Get optional parameters
+    :param args: command line arguments
+    """
+    # Error logging
+    global logger
+    global repo
+    global targetx
+    global wikidata_site

-# Get the digital library
-if len(sys.argv) > 0:
-    booklib = sys.argv.pop(0)
-    if booklib == '-':
-        booklib = 'goob'
+    logger = logging.getLogger('create_isbn_edition')

-# Get the native language
-# The language code is only required when P/Q parameters are added, or different from the LANG code
-if len(sys.argv) > 0:
-    mainlang = sys.argv.pop(0)
+    # Get optional parameters
+    local_args = pywikibot.handle_args(*args)

-# Get additional P/Q parameters
-while len(sys.argv) > 0:
-    inpar = propre.findall(sys.argv.pop(0).upper())[0]
-    target[inpar] = qsuffre.findall(sys.argv.pop(0).upper())[0]
+    # Login to Wikibase instance
+    wikidata_site = pywikibot.Site('wikidata')
+    # Required for wikidata object access (item, property, statement)
+    repo = wikidata_site.data_repository()

-# Validate P/Q list
-proptyx={}
-targetx={}
+    # Get the digital library
+    if local_args:
+        booklib = local_args.pop(0)
+        if booklib == '-':
+            booklib = 'goob'

-# Validate the propery/instance pair
-for propty in target:
-    if propty not in proptyx:
-        proptyx[propty] = pywikibot.PropertyPage(repo, propty)
-    targetx[propty] = pywikibot.ItemPage(repo, target[propty])
-    targetx[propty].get(get_redirect=True)
-    if propty in propreqinst and ('P31' not in targetx[propty].claims or not is_in_list(targetx[propty].claims['P31'], propreqinst[propty])):
-        logger.critical('%s (%s) is not a language' % (targetx[propty].labels[mainlang], target[propty]))
-        sys.exit(12)
+    # Get the native language
+    # The language code is only required when P/Q parameters are added,
+    # or different from the LANG code
+    if local_args:
+        mainlang = local_args.pop(0)

-# Get list of item numbers
-inputfile = sys.stdin.read()            # Typically the Appendix list of references of e.g. a Wikipedia page containing ISBN numbers
-itemlist = sorted(set(isbnre.findall(inputfile)))   # Extract all ISBN numbers
+    # Get additional P/Q parameters
+    while local_args:
+        inpar = propre.findall(local_args.pop(0).upper())[0]
+        target[inpar] = qsuffre.findall(local_args.pop(0).upper())[0]

-for isbn_number in itemlist:            # Process the next edition
-    amend_isbn_edition(isbn_number)
+    # Validate P/Q list
+    proptyx = {}
+    targetx = {}

-# Einde van de miserie
-"""
-Notes:
+    # Validate the property/instance pair
+    for propty in target:
+        if propty not in proptyx:
+            proptyx[propty] = pywikibot.PropertyPage(repo, propty)
+        targetx[propty] = pywikibot.ItemPage(repo, target[propty])
+        targetx[propty].get(get_redirect=True)
+        if propty in propreqinst and (
+            'P31' not in targetx[propty].claims
+            or not is_in_list(targetx[propty].claims['P31'],
+                              propreqinst[propty])):
+            logger.critical('%s (%s) is not a language'
+                            % (targetx[propty].labels[mainlang],
+                               target[propty]))
+            sys.exit(12)
+
+    # check dependencies
+    for module in (isbnlib, unidecode):
+        if isinstance(module, ImportError):
+            raise module
+
+    # Get list of item numbers
+    # Typically the Appendix list of references of e.g. a Wikipedia page
+    # containing ISBN numbers
+    inputfile = pywikibot.input('Get list of item numbers')
+    # Extract all ISBN numbers
+    itemlist = sorted(set(isbnre.findall(inputfile)))
+
+    for isbn_number in itemlist:            # Process the next edition
+        amend_isbn_edition(isbn_number)


-"""
+if __name__ == '__main__':
+    main()
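One of the commit-message items, "lazy import isbnlib and unidecode", works together with the `for module in (isbnlib, unidecode)` dependency check added in `main()`: a failed import is bound to the `ImportError` instance and only re-raised when the script actually needs the module. A minimal sketch of that pattern follows; the `lazy_import` helper name is hypothetical, since the script's actual import block is outside this diff:

```python
def lazy_import(name):
    """Return the imported module, or the ImportError if unavailable.

    Binding the error instead of raising defers a missing optional
    dependency until the module is actually required.
    """
    try:
        return __import__(name)
    except ImportError as exc:
        return exc


# A stdlib module imports fine; a missing one yields the error object.
json_mod = lazy_import('json')
missing = lazy_import('module_that_does_not_exist_xyz')

# The dependency check from main(): raise only when really required.
for module in (json_mod,):
    if isinstance(module, ImportError):
        raise module
```

This keeps `-help` and the script tests working even when the optional packages are not installed, which is why `_allowed_failures` in `tests/script_tests.py` can drop `create_isbn_edition` below.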
diff --git a/setup.py b/setup.py
index 21779d9..00a1cb9 100755
--- a/setup.py
+++ b/setup.py
@@ -97,6 +97,7 @@

 # ------- setup extra_requires for scripts ------- #
 script_deps = {
+    'create_isbn_edition.py': ['isbnlib', 'unidecode'],
     'commons_information.py': extra_deps['mwparserfromhell'],
     'patrol.py': extra_deps['mwparserfromhell'],
     'weblinkchecker.py': extra_deps['memento'],
diff --git a/tests/script_tests.py b/tests/script_tests.py
index 94d0b80..d499269 100755
--- a/tests/script_tests.py
+++ b/tests/script_tests.py
@@ -26,6 +26,7 @@
 # These dependencies are not always the package name which is in setup.py.
 # Here, the name given to the module which will be imported is required.
 script_deps = {
+    'create_isbn_edition': ['isbnlib', 'unidecode'],
     'commons_information': ['mwparserfromhell'],
     'patrol': ['mwparserfromhell'],
     'weblinkchecker': ['memento_client'],
@@ -374,7 +375,7 @@
     # Here come scripts requiring and missing dependencies, that haven't been
     # fixed to output -help in that case.
     _expected_failures = {'version'}
-    _allowed_failures = ['create_isbn_edition']
+    _allowed_failures = []

     _arguments = '-help'
     _results = None
diff --git a/tox.ini b/tox.ini
index ecf4bfc..3b35408 100644
--- a/tox.ini
+++ b/tox.ini
@@ -164,7 +164,6 @@
     scripts/clean_sandbox.py: N816
     scripts/commonscat.py: N802, N806, N816
     scripts/cosmetic_changes.py: N816
-    scripts/create_isbn_edition.py: C901, D100, E402, E501, F405, T201
     scripts/dataextend.py: C901, D101, D102, E126, E127, E131, E501
     scripts/harvest_template.py: N802, N816
     scripts/interwiki.py: N802, N803, N806, N816

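The new input handling in the script diff above replaces `sys.stdin.read()` with `pywikibot.input()` and then extracts ISBN candidates via `sorted(set(isbnre.findall(inputfile)))`. A self-contained sketch of that extraction step, using an illustrative stand-in regex (the script's real `isbnre` is defined outside this diff):

```python
import re

# Illustrative stand-in for the script's isbnre pattern: a digit,
# then 8-15 digits/hyphens, ending in a digit or ISBN-10 check 'X'.
isbnre = re.compile(r'[0-9][0-9-]{8,15}[0-9Xx]')

# Typically the pasted references section of a Wikipedia page.
inputfile = ('Refs: ISBN 978-3-16-148410-0, ISBN 0-19-853453-1 '
             'and 978-3-16-148410-0 once more.')

# Extract all ISBN-like numbers, deduplicated and sorted,
# mirroring: itemlist = sorted(set(isbnre.findall(inputfile)))
itemlist = sorted(set(isbnre.findall(inputfile)))
```

Each entry of `itemlist` is then passed to `amend_isbn_edition()` in turn, so duplicate ISBNs in the pasted text are processed only once.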
--
To view, visit https://gerrit.wikimedia.org/r/c/pywikibot/core/+/826937
To unsubscribe, or for help writing mail filters, visit https://gerrit.wikimedia.org/r/settings

Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Change-Id: I6917ec9b511db609c2f1828486b9a53998d1e376
Gerrit-Change-Number: 826937
Gerrit-PatchSet: 17
Gerrit-Owner: Xqt <[email protected]>
Gerrit-Reviewer: D3r1ck01 <[email protected]>
Gerrit-Reviewer: Geertivp <[email protected]>
Gerrit-Reviewer: Xqt <[email protected]>
Gerrit-Reviewer: jenkins-bot
Gerrit-MessageType: merged
_______________________________________________
Pywikibot-commits mailing list -- [email protected]
To unsubscribe send an email to [email protected]
