... my OCR scanner has rested for tooo long -

To enjoy with your morning coffee.....

Regards to all,

John in Wales

===========================================================
From NETWORK in The Independent's "The Monday Review":  4th March 2002

[1400 words: graphics (2) removed]

Remembrance of Data Past

As digital records increase, so does the need to keep them readable.
Charles Arthur reports on a new drive to keep old information alive.


If you want to amaze a child used to playing on a games console, try telling
them that computer games used to come on cassette tapes. In 1982, proud
owners of the Sinclair ZX Spectrum (which boasted a stunning 16K - that's
kilobytes - of memory in its basic configuration) would connect the audio
output of the cassette recorder to the Spectrum's input; the program,
recorded as a series of high and low tones, was then translated into data
and loaded into memory.

Doing that now might seem pointless; after all, you can nip out to the shops
and buy a new game with far more bang for the buck than a Spectrum. True,
you could probably pick one up for £5 at a car boot sale, but you'll have to
find it first.

What, though, if you were a historian, and the program you were looking for
was the one that was used originally to arm nuclear missiles in the 1960s,
and fed in by punched cards to computers that are now long since consigned
to the scrap heap? What if you needed to find a working version of the
computer to test the program? Suddenly, the problem doesn't seem so trivial.
After all, it would be a major research coup to discover that there was once
a bug in our scheme to eliminate the enemy.

And what about the public and private records being generated today - the
letters that a famous author wrote on a PC that has been discontinued, using
a program whose developers have long ago gone bust? And what of the e-mails
being generated within this government that will one day have to be made
public?

Getting a handle on the preservation of this digital data is the purpose of
the Digital Preservation Coalition (DPC), which last week announced an
action plan "to ensure that the digital information we are producing is not
lost to current and future generations".

At the launch of the project, which has backing from 19 UK organisations -
including the Public Record Office (PRO), the Joint Information Systems
Committee of the Higher and Further Education Funding Councils (JISC), the
British Library and the University of London - a pertinent example was
mentioned: the BBC Domesday Project. This was a multimedia project that
eventually produced a pair of interactive video discs, made by the BBC, to
celebrate the 900th anniversary of the original Domesday Book. More than a
million people contributed in some way, providing offerings from schools and
researchers.

These were then stored on the discs and could be viewed using a BBC Acorn
computer. It was claimed that it would take you more than seven years to
look at everything on the discs. However, by the time you had looked at all
that content, the computers would long since have become obsolete. And
that's pretty much what has happened: "As a multimedia resource and
interactive learning tool it was unsurpassed," said Loyd Grossman, chairman
of the DPC. "Yet despite those achievements, the problems of hardware and
software dependence have now rendered the system obsolete. With few working
examples left, the information on this incredible historical object will
soon disappear forever."

Lynne Brindley, who chairs the DPC, concurs: "When the average life cycle of
a website is six weeks, and the life cycle of new technologies is measured
in singleton years, the concept of long-term access to digital content being
measured in hundreds of years is, to say the least, challenging."

Among those who feel really challenged is the PRO. There, David Ryan, head
of archive services, has the unenviable task of trying to marshal the
growing flood of computer-based information that is coming in from all over
the civil service.

Items are sent to the PRO when they are at least 30 years old; most are
weeded out over time, and regarded as not worth keeping as a matter of
historical record about the working of government, and so the PRO only
receives 3 per cent of the paperwork that was generated in any department.
It was even so for 2001 - covering the period stretching back to 1971 and
(for more secret documents) even earlier, which generated a stack of paper
that covers the equivalent of 1.5 kilometres (0.9 miles) of shelf space. And
in a few years, there will be more and more computer tapes and disks. The
question is, how should they be preserved? And what is the best medium and
encoding format to make them available over the long term, perhaps hundreds
of years?

"I don't know," says Ryan bluntly. But it's not said in defeat; instead, he
relishes the idea of tackling this problem. "I'm actually fairly optimistic
about all this. I think that society is migrating from being paper-based to
being computer-based. We're at a crossover period, and so the rate of change
in formats and media is because we are in the early age of the computer
revolution. Computers will become ubiquitous, and in a few years many of
these issues will have been dealt with. Look at cash machines, for example:
the cards are all the same size because the system depends on
interoperability. Maybe in the future it will be the same with the computing
infrastructure."

One problem with storing digital data for the future is finding a standard,
open format. "In the 1980s everyone would have said that it was ASCII, which
is just plain text," he says. "Now, people are saying it's XML [Extensible
Markup Language, of which HTML is a subset]. I would say - perhaps; but the
really important thing is that digital data is very different from paper
data. The latter has very low entropy. You can store it and it will last
literally for centuries if it's on acid-free paper. But with digital data,
you have to keep paying attention to how standards are changing." Otherwise,
you'll end up with your marvellous BBC Domesday discs - and no way to unlock
the content.

To that end, the PRO is assembling its own computer library of emulation
programs. "We recognise that not everyone will have a copy of Wordstar for
DOS, so we're working on either using an emulator to present that document
in the same format as it would have appeared, or to export it to PDF." (PDF,
the Portable Document Format, is not a proprietary standard, despite having
been defined by the graphics company Adobe.)

He doesn't even put forward an opinion on whether the PRO's future storage
will use magnetic or optical media; a tender is about to go out for
companies to bid for the contract to store its electronic records. And the
PRO will store any computer programs sent to it, although the punch cards
that might have controlled our missiles in the Cuban missile crisis are long
gone. "Those would have been transferred to magnetic tape," Ryan says.

What he is expecting, though, is that formats will settle down. One can
imagine, for example, that historians will be interested in the e-mails sent
within the Department of Transport for the period between 11 September 2001
and mid-February 2002, especially where Jo Moore or Martin Sixsmith are among
the senders or recipients.

But they won't have to hunt around at the future equivalents of car boot
sales for machines to run them on. "Frankly, I would be depressed if in 200
years people are still having to go through this loop of finding old
machines and emulators," says Ryan.

Certainly, when there's enough interest, the programs and data will live on.
Using the ZX Spectrum's rubber keys was once memorably described as being
"like typing on dead flesh". But for a generation, it was their introduction
to computing and, even, to hacking operating systems. Their enthusiasm for
the box means that today there are dozens of Spectrum emulators on the net,
available for free and written in Java. And you can get a stack of games for
free - though ironically Amstrad, which bought the Sinclair name, recently
announced that it will sell those games via its latest Em@iler web
appliance. Clearly, the best way for a technology to combat obsolescence is
the simple one: always remain popular.

end



