Re: dpaste and the wayback machine

2016-02-09 Thread Andrei Alexandrescu via Digitalmars-d

On 2/8/16 11:44 AM, Wyatt wrote:

On Sunday, 7 February 2016 at 21:59:00 UTC, Andrei Alexandrescu
wrote:

Dpaste currently does not expire pastes by default. I was thinking
it would be nice if it saved them in the Wayback Machine such that
they are archived redundantly.

I'm not sure what's the way to do it - probably linking the
newly-generated paste URLs from a page that the Wayback Machine
already knows of.

I just saved this by hand: http://dpaste.dzfl.pl/2012caf872ec (when
the WM does not see a link that is searched for, it offers the
option to archive it) obtaining
https://web.archive.org/web/20160207215546/http://dpaste.dzfl.pl/2012caf872ec.




Thoughts?


You want it in Wayback?  Sounds like you need some WARC [0]. Since
anyone can upload to IA (using a nice S3-like API, even [1]), this
should be pretty uncomplicated.  If you can get a list of all the
paste URLs, you can use wget [2] to build the WARC fairly trivially.
[3]  Then I'd suggest getting a dlang account and making an item [4]
out of it. Just make sure it's set to mediatype:web and it should get
ingested by Wayback.

After that?  Generate a WARC when a paste is made and use the dlang
S3 keys to add it to the previous item (or maybe just do it daily or
weekly so as to not stress the derive queue too much). I'm pretty
sure that's all that's needed.


That's intense. I think a simple page (or a chain of linked pages)
containing links to all existing pastes would suffice. For example,
consider having dpaste.dzfl.pl contain a link to
dpaste.dzfl.pl/today.html. That page would contain e.g. the links
generated today and a "More" button linked to
dpaste.dzfl.pl/2016-02-08.html (which would be yesterday). That in turn
would contain links to yesterday's pastes and a link to the day before,
etc.

My understanding is this is enough to have wayback archive all pastes.
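
A minimal sketch of such a chain of daily index pages, in Python. The function name render_day_page, the HTML layout, and the idea that paste IDs are available per day are assumptions for illustration only; nothing here reflects dpaste's actual code. Each page lists that day's paste links and a "More" link to the previous day's page:

```python
from datetime import date, timedelta
from html import escape

BASE = "http://dpaste.dzfl.pl"  # site root as used in this thread


def render_day_page(day, paste_ids):
    """Return HTML listing one day's pastes plus a "More" link to the day before.

    `paste_ids` is a hypothetical list of paste identifiers created on `day`.
    """
    previous = day - timedelta(days=1)
    items = "\n".join(
        f'<li><a href="{BASE}/{escape(pid, quote=True)}">{escape(pid)}</a></li>'
        for pid in paste_ids
    )
    return (
        f"<html><body><h1>Pastes for {day.isoformat()}</h1>\n"
        f"<ul>\n{items}\n</ul>\n"
        f'<a href="{BASE}/{previous.isoformat()}.html">More</a>\n'
        "</body></html>\n"
    )


# Example: the page for 2016-02-09 links back to 2016-02-08.html.
page = render_day_page(date(2016, 2, 9), ["2012caf872ec"])
```

A crawler that starts from the front page can then follow the "More" links back one day at a time and reach every paste.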


I'm pretty sure that's Andrei's thought, too. It's a pastebin; people
use it to make web links to pasted things. If it were to disappear, a
lot of links would break very permanently because Heritrix has no way
to index and crawl the site.


Yah.


Andrei




Re: dpaste and the wayback machine

2016-02-08 Thread Wyatt via Digitalmars-d
On Sunday, 7 February 2016 at 21:59:00 UTC, Andrei Alexandrescu 
wrote:
Dpaste currently does not expire pastes by default. I was 
thinking it would be nice if it saved them in the Wayback 
Machine such that they are archived redundantly.


I'm not sure what's the way to do it - probably linking the 
newly-generated paste URLs from a page that the Wayback Machine 
already knows of.


I just saved this by hand: http://dpaste.dzfl.pl/2012caf872ec 
(when the WM does not see a link that is searched for, it offers 
the option to archive it) obtaining 
https://web.archive.org/web/20160207215546/http://dpaste.dzfl.pl/2012caf872ec.



Thoughts?

You want it in Wayback?  Sounds like you need some WARC [0].  
Since anyone can upload to IA (using a nice S3-like API, even 
[1]), this should be pretty uncomplicated.  If you can get a list 
of all the paste URLs, you can use wget [2] to build the WARC 
fairly trivially. [3]  Then I'd suggest getting a dlang account 
and making an item [4] out of it.  Just make sure it's set to 
mediatype:web and it should get ingested by Wayback.


After that?  Generate a WARC when a paste is made and use the 
dlang S3 keys to add it to the previous item (or maybe just do it 
daily or weekly so as to not stress the derive queue too much).  
I'm pretty sure that's all that's needed.
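
A rough sketch of the wget step, wrapped in Python. It assumes a wget build with WARC support (the --warc-file option) and a hypothetical file urls.txt holding one paste URL per line; the helper names and file names are illustrative only. Uploading the resulting file to an IA item is deliberately left out.

```python
import subprocess


def build_warc_command(url_list_path, warc_basename):
    """Build a wget command that fetches every URL in a file into one WARC.

    wget adds the ".warc.gz" suffix to `warc_basename` itself.
    """
    return [
        "wget",
        f"--input-file={url_list_path}",  # see [2]: read URLs from a file
        f"--warc-file={warc_basename}",   # record the fetched traffic as a WARC
        "--delete-after",                 # discard the plain downloaded copies
        "--no-verbose",
    ]


def build_warc(url_list_path, warc_basename):
    """Run wget; raises CalledProcessError if wget exits with an error."""
    subprocess.run(build_warc_command(url_list_path, warc_basename), check=True)


# Only builds the command; call build_warc(...) to actually run wget.
cmd = build_warc_command("urls.txt", "dpaste-2016-02-08")
```

Running this once a day over that day's URLs would match the "daily or weekly" batching suggested above.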


-Wyatt

[0] http://fileformats.archiveteam.org/wiki/WARC
[1] https://archive.org/help/abouts3.txt
[2] wget -i, --input-file=FILE: download URLs found in local or external FILE.
[3] http://www.archiveteam.org/index.php?title=Wget#Creating_WARC_with_wget
[4] https://blog.archive.org/2011/03/31/how-archive-org-items-are-structured/


Re: dpaste and the wayback machine

2016-02-08 Thread Wyatt via Digitalmars-d

On Monday, 8 February 2016 at 20:02:41 UTC, Jesse Phillips wrote:


I'm not sure the Wayback Machine should be used for version 
control; if you want to keep a history of your past, I suggest 
using gist.github.com.


I view the Wayback Machine as a view of what the web used to 
look like, not necessarily of what information was in it.


I'm pretty sure that's Andrei's thought, too.  It's a pastebin; 
people use it to make web links to pasted things.  If it were to 
disappear, a lot of links would break very permanently because 
Heritrix has no way to index and crawl the site.


-Wyatt


Re: dpaste and the wayback machine

2016-02-08 Thread Jesse Phillips via Digitalmars-d
On Sunday, 7 February 2016 at 21:59:00 UTC, Andrei Alexandrescu 
wrote:
Dpaste currently does not expire pastes by default. I was 
thinking it would be nice if it saved them in the Wayback 
Machine such that they are archived redundantly.


I'm not sure what's the way to do it - probably linking the 
newly-generated paste URLs from a page that the Wayback Machine 
already knows of.


I just saved this by hand: http://dpaste.dzfl.pl/2012caf872ec 
(when the WM does not see a link that is searched for, it offers 
the option to archive it) obtaining 
https://web.archive.org/web/20160207215546/http://dpaste.dzfl.pl/2012caf872ec.



Thoughts?

Andrei


I'm not sure the Wayback Machine should be used for version 
control; if you want to keep a history of your past, I suggest 
using gist.github.com.


I view the Wayback Machine as a view of what the web used to 
look like, not necessarily of what information was in it.


dpaste and the wayback machine

2016-02-07 Thread Andrei Alexandrescu via Digitalmars-d
Dpaste currently does not expire pastes by default. I was thinking it 
would be nice if it saved them in the Wayback Machine such that they are 
archived redundantly.


I'm not sure what's the way to do it - probably linking the 
newly-generated paste URLs from a page that the Wayback Machine already 
knows of.


I just saved this by hand: http://dpaste.dzfl.pl/2012caf872ec (when the 
WM does not see a link that is searched for, it offers the option to 
archive it) obtaining 
https://web.archive.org/web/20160207215546/http://dpaste.dzfl.pl/2012caf872ec.
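
The archived address above follows a visible pattern: a 14-digit timestamp followed by the original URL. A small Python sketch of that pattern, assuming the timestamp is YYYYMMDDhhmmss (inferred from the URL above rather than taken from any IA documentation); the helper name snapshot_url is made up for illustration:

```python
from datetime import datetime

WAYBACK_PREFIX = "https://web.archive.org/web"


def snapshot_url(captured_at, original_url):
    """Compose a Wayback Machine snapshot URL from a capture time and a URL."""
    stamp = captured_at.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{stamp}/{original_url}"


url = snapshot_url(datetime(2016, 2, 7, 21, 55, 46),
                   "http://dpaste.dzfl.pl/2012caf872ec")
print(url)
```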



Thoughts?

Andrei