https://bugzilla.wikimedia.org/show_bug.cgi?id=10867
Philippe Verdy <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[email protected]

--- Comment #5 from Philippe Verdy <[email protected]>  2009-11-20 17:00:38 UTC ---
Interestingly, the various number types that can be mapped to the EAN
international standard (which encompasses them all) can also be parsed by
recognizing a list of known prefixes, from which we could automatically
determine the product type, or the geographical area (or linguistic area).

If such an application for automatic determination of these types of
information is developed, the table of prefixes should include:

- the exact standard identifier (such as "ISBN", "ASIN", "ISMN", possibly
  "UPC", but NOT "EAN", which identifies nothing by itself because it
  includes all the other standards in their longest number form...)
- the total length of the identifier (including its prefix below and the
  check digit)
- the prefix (possibly a regular expression): it may be empty if all numbers
  with the above length are part of the standard.
- an optional qualifier (such as country or language zone in ISBN)
- a comment containing a verifiable reference for this prefix.
- an optional field containing a bigger length on which the shorter

The table should be editable somewhere (possibly restricted to admins).

There may be several lines for the same standard identifier, because there
are alternate forms using other lengths, or different mappings from a
shorter legacy number to the longer number, or because one wants to
subdivide the standard space for easier identification of products of the
same type, for building specific search pages per product type and/or per
optional subdivision such as the geographic area.
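The prefix table described above could be sketched roughly as follows. This
is only an illustration in Python (not the PHP the extension would use): the
names PrefixRule, RULES and classify are hypothetical, the sample rows are a
small illustrative subset rather than a complete table, and the reference
field is left empty because a real table would have to cite a verifiable
source for each row. Rows are tried in order, so more specific prefixes must
come before broader ones.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PrefixRule:
    standard: str                    # e.g. "ISBN", "ISMN"; never the catch-all "EAN"
    length: int                      # total length, prefix and check digit included
    prefix: str                      # regular expression; "" accepts every number of that length
    qualifier: Optional[str] = None  # e.g. country or language zone
    reference: str = ""              # should point to a verifiable source in a real table


# Illustrative subset only; first matching row wins.
RULES = [
    PrefixRule("ISMN", 13, r"9790"),
    PrefixRule("ISBN", 13, r"978[01]", qualifier="English-language zone"),
    PrefixRule("ISBN", 13, r"9782", qualifier="French-language zone"),
    PrefixRule("ISBN", 13, r"97[89]"),
    PrefixRule("ISBN", 10, r""),
    PrefixRule("UPC", 12, r""),
]


def classify(number: str) -> Optional[PrefixRule]:
    """Return the first rule whose length and prefix match, or None."""
    compact = number.replace("-", "").replace(" ", "").upper()
    for rule in RULES:
        if len(compact) == rule.length and re.match(rule.prefix, compact):
            return rule
    return None
```

For instance, classify("978-2-07-036822-8") would return the ISBN row
qualified as French-language zone, while a 13-digit number starting with
9790 would be reported as ISMN rather than ISBN because its row comes first.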
Note that the geographic area can be complex to determine (it may require
lots of lines, notably for ISBN), and some legacy geographical areas remain
in the standards even though they are now split across distinct countries:
if you want to search for a book with vendors for a specific geographic
zone, they may need to support the legacy numbers that were allocated to
larger zones. These legacy zones are still in use for newer products,
because the geographic subdivisions are themselves subdivided by
producer/publisher company, and those companies still have unallocated space
in their current number blocks, or because the vendors have disappeared,
have been split or merged, and their existing number blocks became shared,
so they no longer map exactly to the geographic area in which the block
prefixes were allocated.

Another table could contain optional equivalent identifiers, or could
contain a list of regular expressions used to recognize them, and another
one used to parse the acceptable number formats (because there may be
letters sometimes), and an identifier for the method used to verify the
number validity (by its check digit): the legacy ISBN-10 check digit is
computed differently from the newer ISBN-13 identifier, which uses the EAN
method.

These tables do not necessarily have to be on a wiki page or in the
database. They may perfectly well reside in a PHP source file, because the
number validation or remapping methods will frequently require specific PHP
code, and also because it will probably be more efficient there.
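To make the difference between the two check-digit methods concrete, here is
a minimal sketch, again in Python for illustration only rather than PHP. The
function names and the CHECKERS registry (standing in for the "identifier
for the method" column) are hypothetical. ISBN-10 uses weights 10 down to 1
modulo 11, with "X" standing for 10 in the last position; EAN-13 (and
therefore ISBN-13) uses alternating weights 1 and 3 modulo 10.

```python
import re


def _compact(number: str) -> str:
    """Strip hyphens and spaces, and upper-case a possible trailing 'x'."""
    return number.replace("-", "").replace(" ", "").upper()


def isbn10_is_valid(number: str) -> bool:
    chars = _compact(number)
    if not re.fullmatch(r"[0-9]{9}[0-9X]", chars):
        return False
    values = [10 if c == "X" else int(c) for c in chars]
    # Weights 10, 9, ..., 1; the weighted sum must be divisible by 11.
    return sum(w * v for w, v in zip(range(10, 0, -1), values)) % 11 == 0


def _ean_weighted_sum(digits: str) -> int:
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    return sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits))


def ean13_is_valid(number: str) -> bool:
    digits = _compact(number)
    if not re.fullmatch(r"[0-9]{13}", digits):
        return False
    return _ean_weighted_sum(digits) % 10 == 0


def isbn10_to_isbn13(number: str) -> str:
    """Remap a valid ISBN-10 to its 13-digit form under the 978 prefix."""
    if not isbn10_is_valid(number):
        raise ValueError("not a valid ISBN-10: %r" % number)
    body = "978" + _compact(number)[:9]   # drop the old check digit
    check = (10 - _ean_weighted_sum(body) % 10) % 10
    return body + str(check)


# Hypothetical method identifiers that a table row could refer to.
CHECKERS = {
    "isbn10-mod11": isbn10_is_valid,
    "ean13-mod10": ean13_is_valid,
}
```

With this sketch, isbn10_to_isbn13("0-306-40615-2") gives "9780306406157":
the nine leading digits are kept, but the check digit changes from 2 to 7
because the two methods weight the digits differently.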
