Re: Archiving information, was Re: ADM-3A question

2019-08-17 Thread Antonio Carlini via cctalk

On 16/08/2019 16:33, Noel Chiappa via cctalk wrote:

There is an automatic backup system which sends copies to a machine at his
house, so the particular scenario above (hosting service goes away with no
warning) is not an issue. (Yes, a Chicxulub event in Scandinavia would defeat
that, but we'd all probably have larger problems to worry about!) After the
first event, I make manual backups here of all the articles I contribute.

The biggest concern is if he has an unfortunate interaction with a truck. I
did raise this issue with him, and he had some initial suggestions, but I
haven't followed through. If people start contributing, it'd probably be time
to formalize something.



I think having it mirrored would be a smart move. bitsavers is mirrored, 
manx nearly vanished but is now online (although I've just noticed that 
it's hosted on codeplex ...).



It is possible to Special:Export each page via a script but it would be 
much easier to have the existing backup mechanism make copies available 
to multiple people. (It's easy to install mediawiki, so testing the 
backup occasionally should be straightforward).
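
A script along those lines might look like the minimal sketch below. It assumes the wiki serves pages under the /wiki/ path seen in the gunkies.org Main_Page URL quoted elsewhere in this thread, and that Special:Export/<title> returns the page's XML (standard MediaWiki behaviour; as far as I know this form gives only the current revision). The function names are hypothetical, and the list of titles to export would have to come from somewhere else.

```python
import os
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: base path taken from the Main_Page URL quoted in this thread.
WIKI_BASE = "http://gunkies.org/wiki/"


def export_url(title, base=WIKI_BASE):
    """Build the Special:Export URL for one page title."""
    # MediaWiki URLs use underscores where titles have spaces.
    name = quote(title.replace(" ", "_"), safe="/:()")
    return base + "Special:Export/" + name


def save_export(title, outdir, base=WIKI_BASE, timeout=30):
    """Fetch the export XML for one page and write it into outdir."""
    with urlopen(export_url(title, base), timeout=timeout) as resp:
        xml = resp.read()
    # Slashes in titles (subpages) are not valid in file names.
    filename = title.replace(" ", "_").replace("/", "_") + ".xml"
    path = os.path.join(outdir, filename)
    with open(path, "wb") as fh:
        fh.write(xml)
    return path
```

Calling save_export("Main Page", "some-directory") for each title would give a pile of XML files that several people could keep copies of, though a server-side dump would still be less work.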



Antonio


--
Antonio Carlini
anto...@acarlini.com



Re: Archiving information, was Re: ADM-3A question

2019-08-17 Thread Christian Corti via cctalk

On Fri, 16 Aug 2019, ben wrote:

Well, for me, I have been finding that with many searches the modern browsers
refuse to display sites for "what they figure is unsafe", yet the porn ads 
still show. I can find it, but not view it.


I use current versions of Firefox and Chrome/Chromium, and I don't have 
any problems viewing sites. And for your porn problem, what about using 
uBlock and NoScript?


Christian


Re: Archiving information, was Re: ADM-3A question

2019-08-16 Thread Cameron Kaiser via cctalk
> Then imagine that a law is passed in a far away land, and the site owner 
> decides it is too risky to bother with, and they then take the entire 
> site down - wiki and fora - with no warning and no access to the material...

Gosh, Steven, I can't imagine for the *life of me* what site you're
referring to here. I mean, it's not like Some Guy Immediately pulls it
all down on a whim, is it?

;)

-- 
 personal: http://www.cameronkaiser.com/ --
  Cameron Kaiser * Floodgap Systems * www.floodgap.com * ckai...@floodgap.com
-- God may be subtle, but He isn't plain mean. -- Albert Einstein -


Re: Archiving information, was Re: ADM-3A question

2019-08-16 Thread dwight via cctalk
One of the problems with archiving is what to do with items that are not 
popular. Some things might be more valued ten or twenty years in the future but 
not now. Is the fact that the item has relatively low interest now a possible 
reason to not archive it in a searchable form for future reference?
What about things that are scattered on other personal sites currently that may 
be gone next week? So much information is already lost.
Who determines what should be saved? What say you come across a rare document 
but the copy was poorly done at a lower than desired resolution. Do you refuse 
to post it because it doesn't meet your standards or do you post it with a note 
that it is the best to date? Judging such things can be arbitrary and be the 
reason for lost information.
At least when you publish a book, there is a chance that some copy may be 
saved. Now with information sitting on someone's disk drives, it could be 
deleted with one mistake.
This is a really complicated issue. I'm getting older and know I'm on the tail 
end of my life. Still, I have no way to begin to pass on what I have. I doubt 
my heirs would care much unless it had significant monetary value.
Dwight


From: cctalk  on behalf of Seth J. Morabito via 
cctalk 
Sent: Friday, August 16, 2019 8:31 AM
To: General Discussion: On-Topic and Off-Topic Posts 
Subject: Re: Archiving information, was Re: ADM-3A question


Paul Koning via cctalk writes:

> Anything worth having around deserves backup.  Which makes me wonder
> -- how is Wikipedia backed up?  I guess it has a fork, which isn't
> quite the same thing.  I know Bitsavers is replicated in a number of
> places.  And one argument in favor of GIT is that every workspace is a
> full backup of the original, history and all.
>
> One should worry for smaller scale efforts, though.

This is a problem I think about a lot.

In the early 2000s I worked on the LOCKSS program at Stanford
University. LOCKSS stands for "Lots Of Copies Keep Stuff Safe", and is a
distributed network of servers that replicate backup copies of
electronic academic journals. It stemmed from a research project that
looked at how to design an attack-resistant peer-to-peer digital
archival network.  Each node in the network keeps a copy of the original
journal content, does a cryptographic hash of each resource (HTML page,
image, PDF, etc.), and participates in a steady stream of polls with all
the other nodes where they vote on the hashes. If a minority of nodes
loses a poll, their content is assumed to be damaged, missing, or bad,
and they replicate the content from the winners of the poll.

It's designed as a "Dark" archive, meaning the data is there, but nobody
tries to access it unless the original web content disappears. Then, the
servers act as transparent web proxies, so when you hit the original URL
or URI, they serve up the content that's now missing from the real
public Internet.

It's a neat idea. It's also open source, and unencumbered with
patents. I've always thought a similar model could be used to archive
and replicate just about anything, but it's just one of those things
that nobody's ever gotten around to doing.

>paul

-Seth

--
  Seth Morabito
  Poulsbo, WA, USA
  w...@loomcom.com


Re: Archiving information, was Re: ADM-3A question

2019-08-16 Thread Tomasz Rola via cctalk
On Fri, Aug 16, 2019 at 02:21:36AM -0400, Noel Chiappa via cctalk
wrote:
[...]
> Yeah, I added "CHWiki" to the text on the Main Page to make it a
> little easier

Out of curiosity, I tried.

On gog:
 === chwiki - because gog discovers I type from Poland and "chwiki"
 looks like Polish word "chwili" (a genetivus of "chwila" which means
 "moment", or "a second", like "just a second"), so it gave me page
 full of stuff like "this moment is best" or "no better moment than
 her touch" (which even for a native speaker sounds a bit too contorted,
 but gog just indexes whatever garbage local folk produce)

 === computer history wiki - fifth result on first page

 === gunkies - first link on first page

On double duck:
 all the same, like above

Please note, for me gunkies.org and http://gunkies.org/wiki/Main_Page
are equals, so I assume finding gunkies.org counts.


-- 
Regards,
Tomasz Rola

--
** A C programmer asked whether computer had Buddha's nature.  **
** As the answer, master did "rm -rif" on the programmer's home**
** directory. And then the C programmer became enlightened...  **
** **
** Tomasz Rola  mailto:tomasz_r...@bigfoot.com **


Re: Archiving information, was Re: ADM-3A question

2019-08-16 Thread ben via cctalk

On 8/16/2019 1:50 AM, Christian Corti via cctalk wrote:

On Thu, 15 Aug 2019, Noel Chiappa wrote:
An additional issue, I think, is that Google is deprecating sites that use
HTTP, versus HTTPS. I can't comment more, lest I start ranting at the utter


Not true; on the contrary, Google even crawls through FTP sites :-)

Christian


Well, for me, I have been finding that with many searches the modern browsers
refuse to display sites for "what they figure is unsafe", yet the porn 
ads still show. I can find it, but not view it.

Ben.



Re: Archiving information, was Re: ADM-3A question

2019-08-16 Thread Noel Chiappa via cctalk
> From: Steven M Jones

> imagine that a law is passed in a far away land, and the site owner
> decides it is too risky to bother with, and they then take the entire
> site down - wiki and fora - with no warning and no access to the
> material...
> ..
> I would strongly suggest that if people are going to do something of
> the scale you describe, they might want to consider setting up a
> distribution or replication mechanism 

Past events have made me very concerned about this issue! On a couple of
occasions, Tore (who runs the CHWiki) has forgotten to pay the DNS fee, or
something similar, and it went off-line (the first time for a week, as he
was off camping). Leading to total panic on my part when he wasn't reachable,
about all the content I'd written!

There is an automatic backup system which sends copies to a machine at his
house, so the particular scenario above (hosting service goes away with no
warning) is not an issue. (Yes, a Chicxulub event in Scandinavia would defeat
that, but we'd all probably have larger problems to worry about!) After the
first event, I make manual backups here of all the articles I contribute.

The biggest concern is if he has an unfortunate interaction with a truck. I
did raise this issue with him, and he had some initial suggestions, but I
haven't followed through. If people start contributing, it'd probably be time
to formalize something.

Noel


Re: Archiving information, was Re: ADM-3A question

2019-08-16 Thread Seth J. Morabito via cctalk


Paul Koning via cctalk writes:

> Anything worth having around deserves backup.  Which makes me wonder
> -- how is Wikipedia backed up?  I guess it has a fork, which isn't
> quite the same thing.  I know Bitsavers is replicated in a number of
> places.  And one argument in favor of GIT is that every workspace is a
> full backup of the original, history and all.
>
> One should worry for smaller scale efforts, though.

This is a problem I think about a lot.

In the early 2000s I worked on the LOCKSS program at Stanford
University. LOCKSS stands for "Lots Of Copies Keep Stuff Safe", and is a
distributed network of servers that replicate backup copies of
electronic academic journals. It stemmed from a research project that
looked at how to design an attack-resistant peer-to-peer digital
archival network.  Each node in the network keeps a copy of the original
journal content, does a cryptographic hash of each resource (HTML page,
image, PDF, etc.), and participates in a steady stream of polls with all
the other nodes where they vote on the hashes. If a minority of nodes
loses a poll, their content is assumed to be damaged, missing, or bad,
and they replicate the content from the winners of the poll.
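
The poll-and-repair step described above can be illustrated with a toy sketch. This is not the actual LOCKSS protocol, just a plain majority vote over SHA-256 hashes, assuming every node votes on the same single resource; the function names and the choice of hash are made up for illustration.

```python
import hashlib
from collections import Counter


def digest(content):
    """Cryptographic hash of one resource (SHA-256 chosen for this sketch)."""
    return hashlib.sha256(content).hexdigest()


def poll_and_repair(copies):
    """Run one poll over a single resource.

    copies maps node name -> the bytes that node currently holds.
    Returns (repaired_copies, losers), where losers are the nodes
    whose hash disagreed with the majority.
    """
    votes = {node: digest(data) for node, data in copies.items()}
    winning_hash, count = Counter(votes.values()).most_common(1)[0]
    if count * 2 <= len(copies):
        # Without a strict majority there is no way to pick a good copy.
        raise ValueError("no majority, cannot tell which copy is good")
    # Any node on the winning side can serve as the repair source.
    good = next(data for node, data in copies.items()
                if votes[node] == winning_hash)
    losers = sorted(node for node, h in votes.items() if h != winning_hash)
    repaired = {node: good for node in copies}
    return repaired, losers
```

With three nodes where one holds a damaged copy, the damaged node loses the poll and ends up with the majority's content; with only two nodes that disagree, the sketch refuses to guess.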

It's designed as a "Dark" archive, meaning the data is there, but nobody
tries to access it unless the original web content disappears. Then, the
servers act as transparent web proxies, so when you hit the original URL
or URI, they serve up the content that's now missing from the real
public Internet.

It's a neat idea. It's also open source, and unencumbered with
patents. I've always thought a similar model could be used to archive
and replicate just about anything, but it's just one of those things
that nobody's ever gotten around to doing.

>   paul

-Seth

--
  Seth Morabito
  Poulsbo, WA, USA
  w...@loomcom.com


Re: Archiving information, was Re: ADM-3A question

2019-08-16 Thread Jon Elson via cctalk

On 08/16/2019 02:50 AM, Christian Corti via cctalk wrote:

On Thu, 15 Aug 2019, Noel Chiappa wrote:
An additional issue, I think, is that Google is deprecating sites that use
HTTP, versus HTTPS. I can't comment more, lest I start ranting at the utter


Not true; on the contrary, Google even crawls through FTP sites :-)


I kind of wonder what this is all about. I mean, why do you 
have to encrypt today's weather report, a company's public 
web page, and such stuff? Just to waste CPU time?


Jon


Re: Archiving information, was Re: ADM-3A question

2019-08-16 Thread Paul Koning via cctalk



> On Aug 16, 2019, at 6:14 AM, Steven M Jones via cctalk 
>  wrote:
> 
> On 08/15/2019 23:21, Noel Chiappa via cctalk wrote:
>> I have on several occasions posted appeals to this list for people to
>> contribute content to it, and gotten almost no response (with one notable
>> exception), in terms of added content; that was a large part of why I merely
>> mentioned it in an offhand way.
> 
> I don't want to discourage anybody from contributing to this or any other 
> project. However...
> 
> Imagine if you will that many people, over many years, put a lot of work into 
> pulling information together on a site with forums, and then distilling that 
> information into a lot of wiki pages. Many discussions in the forums, with 
> hard-won facts and interesting projects documented there. Things the 
> manufacturer(s) never admitted you could do! So many wiki pages carefully 
> explaining things, recording specifications, procedures, configurations, part 
> numbers, substitutions. An incredibly useful resource and a very active 
> community.
> 
> Then imagine that a law is passed in a far away land, and the site owner 
> decides it is too risky to bother with, and they then take the entire site 
> down - wiki and fora - with no warning and no access to the material...

You don't even have to assume government malice.  Lots of providers have gone 
out of business without any warning simply because of not being economically 
viable.  Or even because the operators decided they weren't interested any 
longer.

Anything worth having around deserves backup.  Which makes me wonder -- how is 
Wikipedia backed up?  I guess it has a fork, which isn't quite the same thing.  
I know Bitsavers is replicated in a number of places.  And one argument in 
favor of GIT is that every workspace is a full backup of the original, history 
and all.

One should worry for smaller scale efforts, though.

paul



Re: Archiving information, was Re: ADM-3A question

2019-08-16 Thread Steven M Jones via cctalk

On 08/15/2019 23:21, Noel Chiappa via cctalk wrote:


I have on several occasions posted appeals to this list for people to
contribute content to it, and gotten almost no response (with one notable
exception), in terms of added content; that was a large part of why I merely
mentioned it in an offhand way.


I don't want to discourage anybody from contributing to this or any 
other project. However...


Imagine if you will that many people, over many years, put a lot of work 
into pulling information together on a site with forums, and then 
distilling that information into a lot of wiki pages. Many discussions 
in the forums, with hard-won facts and interesting projects documented 
there. Things the manufacturer(s) never admitted you could do! So many 
wiki pages carefully explaining things, recording specifications, 
procedures, configurations, part numbers, substitutions. An incredibly 
useful resource and a very active community.


Then imagine that a law is passed in a far away land, and the site owner 
decides it is too risky to bother with, and they then take the entire 
site down - wiki and fora - with no warning and no access to the material...


I'm not arguing against community collaborations at all - I guess I'm 
mostly just venting my considerable spleen. :(


But I would strongly suggest that if people are going to do something of 
the scale you describe, they might want to consider setting up a 
distribution or replication mechanism at their earliest convenience.


--S.


Re: Archiving information, was Re: ADM-3A question

2019-08-16 Thread Christian Corti via cctalk

On Thu, 15 Aug 2019, Noel Chiappa wrote:

An additional issue, I think, is that Google is deprecating sites that use
HTTP, versus HTTPS. I can't comment more, lest I start ranting at the utter


Not true; on the contrary, Google even crawls through FTP sites :-)

Christian


Re: Archiving information, was Re: ADM-3A question

2019-08-15 Thread Noel Chiappa via cctalk
> From: Eric Christopherson

>> Anyway, the whole 'how do we find the info' is a part of why I started
>> working on CHWiki, once I discovered it

> Psst: it would've been a good idea to share the URL to CHWiki.

Well, that passing reference wasn't an attempt to get people to go look at
it, hence no URL! :-) I was focused on the abstract discussion about 'how do
we make information accessible, if relying on search engines to find blog
postings doesn't work'.

I have on several occasions posted appeals to this list for people to
contribute content to it, and gotten almost no response (with one notable
exception), in terms of added content; that was a large part of why I merely
mentioned it in an offhand way.

> a site I was already familiar with, but not under the name you used for
> it.

Ah, formally it's the 'Computer History Wiki', except that's a lot of typing,
so I've been using 'CHWiki' as a short, easy-to-type, name for it for some
time now.

> (It was a bit hard to find with Google, which just goes to show...)

Yeah, I added "CHWiki" to the text on the Main Page to make it a little easier
to find from the short name, after a previous case where I'd used that term
here, to some people's confusion. But I see it still doesn't work well; I
guess I'll have to add 'CHWiki' links from more pages. Using 'Computer History
Wiki' as a search term only works slightly better, though; it's at the bottom
of the first page of results for me, below a bunch of Wikipedia links.

Noel

PS: In response to a point raised in a private reply to me; the site is for
_all_ historical computers: personal computers, mainframes, the lot. I myself
have added a lot of PDP-11 material, but only because I'm very fond of them,
and know them well. The field of historical computers is _way_ too broad for
one person to cover in depth, which is part of why I previously appealed to
people who knew/were familiar with other corners of it to add detailed content
in those areas.


Re: Archiving information, was Re: ADM-3A question

2019-08-15 Thread Eric Christopherson via cctalk
On Thu, Aug 15, 2019, 7:38 PM Noel Chiappa via cctalk 
wrote:

> Anyway, the whole 'how do we find the info' is a part of why I started
> working on CHWiki, once I discovered it - in addition to the usual
> advantages
> of wikis (good for collaboration, good for adding stuff incrementally), it
> would put all the info in one place, a 'one stop shopping' for old computer
> info.
>

Psst: it would've been a good idea to share the URL to CHWiki. It's
http://gunkies.org/wiki/Main_Page - the address to a site I was already
familiar with, but not under the name you used for it. (It was a bit hard
to find with Google, which just goes to show...)

-- 
Eric Christopherson


Re: Archiving information, was Re: ADM-3A question

2019-08-15 Thread Noel Chiappa via cctalk
> From: Seth J. Morabito

>> having stuff scattered across a zillion personal pages (be they blogs,
>> or whatever) is going to make it hard to find the useful one when
>> needed

> The sheer vastness of content available, combined with a Google
> monoculture, combined with a concerted attempt to GAME the Google
> monoculture, is making search and discovery hard

An additional issue, I think, is that Google is deprecating sites that use
HTTP, versus HTTPS. I can't comment more, lest I start ranting at the utter
stupidity of forcing everyone to use HTTPS. But if those blogs are using
HTTP, that will push them down the results.

> I honestly don't know what to do about it. I don't have a better idea,
> unless we go back to something like a directory-style curated
> experience, a-la Yahoo! circa 1998-ish. 

I'm not sure that would scale to cover detailed pages on obsolete computers;
why is a manual indexer going to cover them?

Anyway, the whole 'how do we find the info' is a part of why I started
working on CHWiki, once I discovered it - in addition to the usual advantages
of wikis (good for collaboration, good for adding stuff incrementally), it
would put all the info in one place, a 'one stop shopping' for old computer
info.

But when I tried to convince people to post stuff there, instead of on their
blogs, I got at least one person who was pretty vehement that no way in h***
were they going to stop putting their stuff in their own blog.

Noel


RE: Archiving information, was Re: ADM-3A question

2019-08-15 Thread Electronics Plus via cctalk
OTOH, there are vast quantities of old manuals, schematics, text books, etc. 
that get thrown out each year because no one will pay for them. I have had the 
unenjoyable experience of trashing boxes full of stuff because they did not 
sell. $1-5 is pretty cheap, considering the time to check the condition, 
photograph, list on website, pack it properly, and get it to the right place.

If something has sat here for 23 years and not moved, it is soon going to go 
away. I filled up John's car last time he came down. I would much rather they 
went to a good home than the dumpster, but most people do not want the 
"clutter".

-Original Message-
From: cctalk [mailto:cctalk-boun...@classiccmp.org] On Behalf Of ben via cctalk
Sent: Thursday, August 15, 2019 6:57 PM
To: cctalk@classiccmp.org
Subject: Re: Archiving information, was Re: ADM-3A question

On 8/15/2019 4:33 PM, Marvin Johnston via cctalk wrote:

> Instead of the search engines working to improve AI, they should be
> putting more effort into ESP.
>

However with 'FREE' web hosting vanishing faster than the Dodo,
you have lost most of the small sites that may have had the
information. A blog tends to lose things after the current
year.

> Marvin

My other gripe is that technical books tend to be revised for the latest
trend in marketing. A fictional book like "Software tools for fools":
Version #1 8008, Version #2 Z80, Version #3 386, Version #4 RISC machine,
#5 latest machine available only for Beta testing.
* library has removed books that have not been checked out in the
last 3 years. We can borrow the latest copy when it comes in print from the
main branch.
Ben.





Re: Archiving information, was Re: ADM-3A question

2019-08-15 Thread ben via cctalk

On 8/15/2019 4:33 PM, Marvin Johnston via cctalk wrote:

Instead of the search engines working to improve AI, they should be 
putting more effort into ESP.




However with 'FREE' web hosting vanishing faster than the Dodo,
you have lost most of the small sites that may have had the
information. A blog tends to lose things after the current
year.


Marvin


My other gripe is that technical books tend to be revised for the latest
trend in marketing. A fictional book like "Software tools for fools":
Version #1 8008, Version #2 Z80, Version #3 386, Version #4 RISC machine,
#5 latest machine available only for Beta testing.
* library has removed books that have not been checked out in the
last 3 years. We can borrow the latest copy when it comes in print from the
main branch.
Ben.


Archiving information, was Re: ADM-3A question

2019-08-15 Thread Marvin Johnston via cctalk




Al Kossow via cctalk writes:

> On 8/14/19 8:53 AM, Anders Nelson via cctalk wrote:
>> I hope this thread will be written to a blog post
>
> Buried in a filing cabinet in the basement with a sign that says
> "Beware of Leopard".
>
> Blogs are a stupid way to archive information, almost as stupid as
> putting it on Facebook.

The problem is not archiving, but rather retrieving the data.

As a current example, I am looking for information on the Jonos Escort 
computers. A slight misspelling (Jonas instead of Jonos) resulted in a 
whole slew of graphic escort services. And spelling it properly has 
resulted in basically zero useful information about the computer itself.


It is hard to believe the almost total lack of information on the Jonos. 
If the scarcity is real, it must be worth at least as much as the Apple 
I :).


And ditto for the Molecular Computer although not as bad as the Jonos.

BTW, these are two computers I'm looking at bringing to VCFMW if there 
is any serious interest.


Instead of the search engines working to improve AI, they should be 
putting more effort into ESP.



Marvin