Hi all,

Based on the latest data dump (August 31st), I made VacuumBot clean up some of the 6791 "formats".
Many formats are badly split MARC title lines (I think because field delimiters in the MARC records were partially missing) and include the "by statement" (e.g. "[microform] / by Jeffrey C. Hyde") or the subtitle (e.g. "[microform] : European culture studies."). In some spare time, I used Google Refine to split these formats into a format plus a by statement or subtitle, and used VacuumBot to update the records (this task still fits my definition of cleaning). If the target field (by_statement or subtitle) already existed and was not empty, the split-off text was put in the notes field instead (appended if notes was not empty). This is also explained in the edit comment.

There may be bad data left. I didn't check extensively for partially missing field delimiters; for example, some subtitles may start with "b ".

I also changed many other formats, though those include changes as small as "eBook" to "E-book" and even "10 cm." to "10 cm".

All in all, the number of different "formats" is down to about 4500.

Ben

_______________________________________________
Ol-tech mailing list
[email protected]
http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
To unsubscribe from this mailing list, send email to [email protected]
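
For anyone who wants to reproduce the split outside Google Refine, the rule described above can be sketched roughly as follows. This is only an illustration, not the actual Refine steps or VacuumBot code. It assumes the record is a plain dict, that the format lives under a key called physical_format (a name chosen for this sketch), and that overflow text is joined to existing notes with a newline. The function names split_format and apply_split are hypothetical.

```python
import re

# Split at the first " / " (by statement) or " : " (subtitle) delimiter.
_SPLIT = re.compile(r"^(?P<fmt>.*?)\s+(?P<sep>[/:])\s+(?P<rest>.+)$")


def split_format(value):
    """Return (format, field_name, text); field_name is None when nothing splits."""
    value = value.strip()
    m = _SPLIT.match(value)
    if not m:
        return value, None, None
    field = "by_statement" if m.group("sep") == "/" else "subtitle"
    return m.group("fmt"), field, m.group("rest")


def apply_split(record):
    """Return an updated copy of a record dict (assumed layout, see above)."""
    rec = dict(record)
    fmt, field, text = split_format(rec.get("physical_format", ""))
    if field is None:
        return rec  # nothing to split, leave the record alone
    rec["physical_format"] = fmt
    if not rec.get(field):
        # Target field is missing or empty: store the split-off text there.
        rec[field] = text
    else:
        # Target field is already filled: fall back to notes, appending
        # when notes already has content (newline join is an assumption).
        notes = rec.get("notes")
        rec["notes"] = f"{notes}\n{text}" if notes else text
    return rec
```

A value such as "E-book" or "10 cm" has no " / " or " : " delimiter and passes through unchanged. Subtitles that start with a stray "b " are not handled here and would still need a separate check.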
