Re: Reducing the pain of a clojars outage

2016-01-06 Thread Lucas Bradstreet
Ouch. I'm not sure what happened to that email. I blame autocorrect.

There were some scaling problems with npm in the past, and they ended
up taking funding. The list of issues you've provided looks good.
Perhaps some "newbie" tags on the issues would help too. I will
join the maintainers list.

Thank you for your effort in providing this essential service.

Lucas

On 5 January 2016 at 11:51, Toby Crawley  wrote:
> On Mon, Jan 4, 2016 at 3:31 PM, Lucas Bradstreet
>  wrote:
>> Good info. Now that we've performed the initial clojars drive, which was
>> performed at a very fortuitous time, do you think that the problem is
>> primarily one of money, man poweror, or both? I realise that there's a lot
>> of kI'm happy to help in I'm one of
>> Ri way, because I think we definitely want to avoid some of the past
>> issues in Node JS - which I think they have mostly solved now
>
> I don't quite follow all of that, but I think I get the gist :)
>
> Seriously though, what issues did the Node JS community have? I
> haven't been involved there at all, so haven't paid attention.
>
> The donations have been great, and I appreciate every bit of it. But
> what we primarily need right now is time from others. For the past
> nine months, I've been the only administrator, but today Daniel
> Compton graciously agreed to help out with that[1], so I think we are
> good there. I also need help with some of the bigger issues (moving
> the repo to block storage[2], possibly behind a CDN[3], and
> implementing atomic deploys[4]), which I plan to post bounties[5] for
> (using some of the donations) in the next few days.
>
> Beyond that, we have quite a few other smaller issues that are ready
> for work (marked with the "ready" tag[6], along with a subjective
> rough estimate of effort involved ("small", "medium", "large")), if
> people are looking for other ways to contribute. And, if you are
> wanting to be more involved in and up to date with what is happening
> with Clojars, I urge you to join the clojars-maintainers list[7].
>
> - Toby
>
> [1]: https://groups.google.com/d/msg/clojars-maintainers/75VmB2F0VX4/hL6dQZAKCQAJ
> [2]: https://github.com/clojars/clojars-web/issues/433
> [3]: https://github.com/clojars/clojars-web/issues/434
> [4]: https://github.com/clojars/clojars-web/issues/226
> [5]: https://www.bountysource.com/teams/clojars
> [6]: https://github.com/clojars/clojars-web/labels/ready
> [7]: https://groups.google.com/forum/#!forum/clojars-maintainers
>



Re: Reducing the pain of a clojars outage

2016-01-04 Thread Toby Crawley
On Mon, Jan 4, 2016 at 3:31 PM, Lucas Bradstreet
 wrote:
> Good info. Now that we've performed the initial clojars drive, which was
> performed at a very fortuitous time, do you think that the problem is
> primarily one of money, man poweror, or both? I realise that there's a lot
> of kI'm happy to help in I'm one of
> Ri way, because I think we definitely want to avoid some of the past
> issues in Node JS - which I think they have mostly solved now

I don't quite follow all of that, but I think I get the gist :)

Seriously though, what issues did the Node JS community have? I
haven't been involved there at all, so haven't paid attention.

The donations have been great, and I appreciate every bit of it. But
what we primarily need right now is time from others. For the past
nine months, I've been the only administrator, but today Daniel
Compton graciously agreed to help out with that[1], so I think we are
good there. I also need help with some of the bigger issues (moving
the repo to block storage[2], possibly behind a CDN[3], and
implementing atomic deploys[4]), which I plan to post bounties[5] for
(using some of the donations) in the next few days.

Beyond that, we have quite a few other smaller issues that are ready
for work (marked with the "ready" tag[6], along with a subjective
rough estimate of effort involved ("small", "medium", "large")), if
people are looking for other ways to contribute. And, if you are
wanting to be more involved in and up to date with what is happening
with Clojars, I urge you to join the clojars-maintainers list[7].

- Toby

[1]: https://groups.google.com/d/msg/clojars-maintainers/75VmB2F0VX4/hL6dQZAKCQAJ
[2]: https://github.com/clojars/clojars-web/issues/433
[3]: https://github.com/clojars/clojars-web/issues/434
[4]: https://github.com/clojars/clojars-web/issues/226
[5]: https://www.bountysource.com/teams/clojars
[6]: https://github.com/clojars/clojars-web/labels/ready
[7]: https://groups.google.com/forum/#!forum/clojars-maintainers



Re: Reducing the pain of a clojars outage

2016-01-04 Thread Lucas Bradstreet
Good info. Now that we've performed the initial clojars drive, which was 
performed at a very fortuitous time, do you think that the problem is primarily 
one of money, man poweror, or both? I realise that there's a lot of kI'm happy 
to help in I'm one of
Ri way, because I think we definitely want to avoid some of the past  
issues in Node JS - which I think they have mostly solved now

Lucas

> On 4 Jan 2016, at 5:14 AM, Nando Breiter  wrote:
> 
> I've spent some time looking into both Cloudflare and Fastly over the 
> weekend. Fastly seems to have a sophisticated purging mechanism which the 
> ticket mentions would be a requirement. See 
> https://docs.fastly.com/guides/purging/
> 
> Initial setup is dead easy (for both), basically requiring a signup and a 
> change to the DNS record, adding a CNAME. Fastly charges for bandwidth and 
> caches everything. Cloudflare charges monthly flat rates but only caches the 
> most popular assets, unless the subscriber pays $200 a month. In a nutshell, 
> you have full control over the content cached in the CDN with Fastly and full 
> control of the price paid, but not the service rendered, with Cloudflare.
> 
> 
> 
> Aria Media Sagl
> Via Rompada 40
> 6987 Caslano
> Switzerland
> 
> +41 (0)91 600 9601
> +41 (0)76 303 4477 cell
> skype: ariamedia
> 
>> On Sun, Jan 3, 2016 at 8:00 PM, Toby Crawley  wrote:
>> Cloudflare (or a similar CDN) would be useful - we have an open issue
>> to implement that, but haven't had a chance to get to it:
>> https://github.com/clojars/clojars-web/issues/434
>> 
>> - Toby
>> 
>> On Sat, Jan 2, 2016 at 4:30 AM, Nando Breiter  wrote:
>> > Would CloudFlare help on the short term? I haven't used the service yet, I
>> > just ran across it researching DDoS solutions, but judging from the 
>> > overview
>> > of how it works, it might be able to cache all clojars.org assets in a
>> > distributed manner and handle the DNS issue as well.
>> > https://www.cloudflare.com/ If it would work, the advantage is a very quick
>> > initial setup. All you need to do is let them handle the DNS.
>> >
>> >
>> >
>> >
>> >
>> > Aria Media Sagl
>> > Via Rompada 40
>> > 6987 Caslano
>> > Switzerland
>> >
>> > +41 (0)91 600 9601
>> > +41 (0)76 303 4477 cell
>> > skype: ariamedia
>> >
>> > On Sat, Jan 2, 2016 at 4:31 AM, Toby Crawley  wrote:
>> >>
>> >> Given the recent DDoS-triggered outages at linode (including the one
>> >> today that has been the worst yet, currently 10 hours at the time I'm
>> >> writing this), I've been giving some more thought to how we can make
>> >> future outages less painful for the community.
>> >>
>> >> I have an open issue[1] (but no code yet) to move the repository off
>> >> of the server and on to a block store (s3, etc), with the goal there
>> >> to make repo reads (which is what we use clojars for 99.9% of the
>> >> time) independent of the status of the server. But I'm not sure that
>> >> really solves the problem we are seeing today. Currently, we have two
>> >> points of failure for repo reads:
>> >>
>> >> (1) the server itself (hosted on linode)
>> >> (2) DNS for the clojars.org domain (also hosted on linode)
>> >>
>> >> moving the repo off of the server to a block store still has two
>> >> points of failure:
>> >>
>> >> (1) the block store (aws, rackspace, etc)
>> >> (2) DNS for the clojars.org domain, since we would CNAME the block
>> >>  store (hosted on linode)
>> >>
>> >> Though the block store provider would probably be better distributed,
>> >> and have more resources to withstand a DDoS (but do any block store
>> >> providers have 100% uptime?).
>> >>
>> >> The block store solution is complex - it introduces more moving parts
>> >> into clojars, and requires reworking the way we generate usage stats,
>> >> and how the api gets its data. It also requires reworking the way we
>> >> administer the repo (deletion requests, cleaning up failed/partial
>> >> deploys). And it may not solve the availability problem at all, since
>> >> we still have two points of failure.
>> >>
>> >> I think a better solution may be to have multiple mirrors of the repo,
>> >> either run by concerned citizens or maintained by the clojars staff. I
>> >> know some folks in the community already run internal caching proxies
>> >> or rsynced mirrors (and are probably chuckling knowingly at those of
>> >> us affected by the outage), but those proxies don't really help those
>> >> in the community that don't have that internal infrastructure. And I
>> >> don't want to recommend that everyone set up a private mirror - that
>> >> seems like a lot of wasted effort.
>> >>
>> >> Ideally, it would be nice if we had a turn-key tool for creating a
>> >> mirror of clojars. We currently provide a way to rsync the repo[2], so
>> >> the seed for a mirror could be small, and could then slurp down the
>> >> full repo (and could continue to do so on a schedule to remain up to
>> >> date). We could then publish a list of mirrors that the community
>> >> cou

Re: Reducing the pain of a clojars outage

2016-01-04 Thread Toby Crawley
Nando:

Thanks for looking in to this. I've added your comments to the issue.

- Toby

On Sun, Jan 3, 2016 at 4:14 PM, Nando Breiter  wrote:
> I've spent some time looking into both Cloudflare and Fastly over the
> weekend. Fastly seems to have a sophisticated purging mechanism which the
> ticket mentions would be a requirement. See
> https://docs.fastly.com/guides/purging/
>
> Initial setup is dead easy (for both), basically requiring a signup and a
> change to the DNS record, adding a CNAME. Fastly charges for bandwidth and
> caches everything. Cloudflare charges monthly flat rates but only caches the
> most popular assets, unless the subscriber pays $200 a month. In a nutshell,
> you have full control over the content cached in the CDN with Fastly and
> full control of the price paid, but not the service rendered, with
> Cloudflare.



Re: Reducing the pain of a clojars outage

2016-01-04 Thread Toby Crawley
On Sat, Jan 2, 2016 at 8:47 PM, Mikhail Kryshen  wrote:
> I would suggest also considering decentralized technologies.
> IPFS (https://ipfs.io/) looks like a good fit for the task.

IPFS looks interesting, but I'm not sure it's worth moving to an
experimental solution, especially when there are simpler,
battle-tested solutions (block stores, CDNs) we're not yet taking
advantage of.

- Toby



Re: Reducing the pain of a clojars outage

2016-01-03 Thread Nando Breiter
I've spent some time looking into both Cloudflare and Fastly over the
weekend. Fastly seems to have a sophisticated purging mechanism which the
ticket mentions would be a requirement. See
https://docs.fastly.com/guides/purging/

Initial setup is dead easy (for both), basically requiring a signup and a
change to the DNS record, adding a CNAME. Fastly charges for bandwidth and
caches everything. Cloudflare charges monthly flat rates but only caches
the most popular assets, unless the subscriber pays $200 a month. In a
nutshell, with Fastly you have full control over what the CDN caches, while
with Cloudflare you have full control over the price paid, but not over the
service rendered.
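
For concreteness, here is a minimal sketch of the single-URL purge described
in the Fastly guide above (the artifact URL and API-token header are
assumptions, not an official client):

    import urllib.request

    # Fastly's purging guide describes purging a single cached object by
    # sending an HTTP PURGE request to the URL itself; an API token can be
    # passed in a Fastly-Key header for authenticated purging (assumed here).
    def purge(url, api_token=None):
        req = urllib.request.Request(url, method="PURGE")
        if api_token:
            req.add_header("Fastly-Key", api_token)
        with urllib.request.urlopen(req) as resp:
            return resp.status, resp.read().decode()

    # Hypothetical artifact path:
    print(purge("https://clojars.org/repo/example/example/1.0.0/example-1.0.0.pom"))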



Aria Media Sagl
Via Rompada 40
6987 Caslano
Switzerland

+41 (0)91 600 9601
+41 (0)76 303 4477 cell
skype: ariamedia

On Sun, Jan 3, 2016 at 8:00 PM, Toby Crawley  wrote:

> Cloudflare (or a similar CDN) would be useful - we have an open issue
> to implement that, but haven't had a chance to get to it:
> https://github.com/clojars/clojars-web/issues/434
>
> - Toby
>
> On Sat, Jan 2, 2016 at 4:30 AM, Nando Breiter 
> wrote:
> > Would CloudFlare help on the short term? I haven't used the service yet,
> I
> > just ran across it researching DDoS solutions, but judging from the
> overview
> > of how it works, it might be able to cache all clojars.org assets in a
> > distributed manner and handle the DNS issue as well.
> > https://www.cloudflare.com/ If it would work, the advantage is a very
> quick
> > initial setup. All you need to do is let them handle the DNS.
> >
> >
> >
> >
> >
> > Aria Media Sagl
> > Via Rompada 40
> > 6987 Caslano
> > Switzerland
> >
> > +41 (0)91 600 9601
> > +41 (0)76 303 4477 cell
> > skype: ariamedia
> >
> > On Sat, Jan 2, 2016 at 4:31 AM, Toby Crawley  wrote:
> >>
> >> Given the recent DDoS-triggered outages at linode (including the one
> >> today that has been the worst yet, currently 10 hours at the time I'm
> >> writing this), I've been giving some more thought to how we can make
> >> future outages less painful for the community.
> >>
> >> I have an open issue[1] (but no code yet) to move the repository off
> >> of the server and on to a block store (s3, etc), with the goal there
> >> to make repo reads (which is what we use clojars for 99.9% of the
> >> time) independent of the status of the server. But I'm not sure that
> >> really solves the problem we are seeing today. Currently, we have two
> >> points of failure for repo reads:
> >>
> >> (1) the server itself (hosted on linode)
> >> (2) DNS for the clojars.org domain (also hosted on linode)
> >>
> >> moving the repo off of the server to a block store still has two
> >> points of failure:
> >>
> >> (1) the block store (aws, rackspace, etc)
> >> (2) DNS for the clojars.org domain, since we would CNAME the block
> >>  store (hosted on linode)
> >>
> >> Though the block store provider would probably be better distributed,
> >> and have more resources to withstand a DDoS (but do any block store
> >> providers have 100% uptime?).
> >>
> >> The block store solution is complex - it introduces more moving parts
> >> into clojars, and requires reworking the way we generate usage stats,
> >> and how the api gets its data. It also requires reworking the way we
> >> administer the repo (deletion requests, cleaning up failed/partial
> >> deploys). And it may not solve the availability problem at all, since
> >> we still have two points of failure.
> >>
> >> I think a better solution may be to have multiple mirrors of the repo,
> >> either run by concerned citizens or maintained by the clojars staff. I
> >> know some folks in the community already run internal caching proxies
> >> or rsynced mirrors (and are probably chuckling knowingly at those of
> >> us affected by the outage), but those proxies don't really help those
> >> in the community that don't have that internal infrastructure. And I
> >> don't want to recommend that everyone set up a private mirror - that
> >> seems like a lot of wasted effort.
> >>
> >> Ideally, it would be nice if we had a turn-key tool for creating a
> >> mirror of clojars. We currently provide a way to rsync the repo[2], so
> >> the seed for a mirror could be small, and could then slurp down the
> >> full repo (and could continue to do so on a schedule to remain up to
> >> date). We could then publish a list of mirrors that the community
> >> could turn to in times of need (or use all the time, if they are
> >> closer geographically or just generally more responsive). Any deploys
> >> would still need to hit the primary server, but deploys are
> >> dwarfed by reads.
> >>
> >> There are a few issues with using mirrors:
> >>
> >> (1) security - with artifacts in more places, there are more
> >> opportunities to introduce malicious versions. This could be
> >> prevented if we had better tools for verifying that the artifacts
> >> are signed by trusted keys, and we required that all artifacts be
> >> signed, 

Re: Reducing the pain of a clojars outage

2016-01-03 Thread Toby Crawley
Cloudflare (or a similar CDN) would be useful - we have an open issue
to implement that, but haven't had a chance to get to it:
https://github.com/clojars/clojars-web/issues/434

- Toby

On Sat, Jan 2, 2016 at 4:30 AM, Nando Breiter  wrote:
> Would CloudFlare help on the short term? I haven't used the service yet, I
> just ran across it researching DDoS solutions, but judging from the overview
> of how it works, it might be able to cache all clojars.org assets in a
> distributed manner and handle the DNS issue as well.
> https://www.cloudflare.com/ If it would work, the advantage is a very quick
> initial setup. All you need to do is let them handle the DNS.
>
>
>
>
>
> Aria Media Sagl
> Via Rompada 40
> 6987 Caslano
> Switzerland
>
> +41 (0)91 600 9601
> +41 (0)76 303 4477 cell
> skype: ariamedia
>
> On Sat, Jan 2, 2016 at 4:31 AM, Toby Crawley  wrote:
>>
>> Given the recent DDoS-triggered outages at linode (including the one
>> today that has been the worst yet, currently 10 hours at the time I'm
>> writing this), I've been giving some more thought to how we can make
>> future outages less painful for the community.
>>
>> I have an open issue[1] (but no code yet) to move the repository off
>> of the server and on to a block store (s3, etc), with the goal there
>> to make repo reads (which is what we use clojars for 99.9% of the
>> time) independent of the status of the server. But I'm not sure that
>> really solves the problem we are seeing today. Currently, we have two
>> points of failure for repo reads:
>>
>> (1) the server itself (hosted on linode)
>> (2) DNS for the clojars.org domain (also hosted on linode)
>>
>> moving the repo off of the server to a block store still has two
>> points of failure:
>>
>> (1) the block store (aws, rackspace, etc)
>> (2) DNS for the clojars.org domain, since we would CNAME the block
>>  store (hosted on linode)
>>
>> Though the block store provider would probably be better distributed,
>> and have more resources to withstand a DDoS (but do any block store
>> providers have 100% uptime?).
>>
>> The block store solution is complex - it introduces more moving parts
>> into clojars, and requires reworking the way we generate usage stats,
>> and how the api gets its data. It also requires reworking the way we
>> administer the repo (deletion requests, cleaning up failed/partial
>> deploys). And it may not solve the availability problem at all, since
>> we still have two points of failure.
>>
>> I think a better solution may be to have multiple mirrors of the repo,
>> either run by concerned citizens or maintained by the clojars staff. I
>> know some folks in the community already run internal caching proxies
>> or rsynced mirrors (and are probably chuckling knowingly at those of
>> us affected by the outage), but those proxies don't really help those
>> in the community that don't have that internal infrastructure. And I
>> don't want to recommend that everyone set up a private mirror - that
>> seems like a lot of wasted effort.
>>
>> Ideally, it would be nice if we had a turn-key tool for creating a
>> mirror of clojars. We currently provide a way to rsync the repo[2], so
>> the seed for a mirror could be small, and could then slurp down the
>> full repo (and could continue to do so on a schedule to remain up to
>> date). We could then publish a list of mirrors that the community
>> could turn to in times of need (or use all the time, if they are
>> closer geographically or just generally more responsive). Any deploys
>> would still need to hit the primary server, but deploys are
>> dwarfed by reads.
>>
>> There are a few issues with using mirrors:
>>
>> (1) security - with artifacts in more places, there are more
>> opportunities to introduce malicious versions. This could be
>> prevented if we had better tools for verifying that the artifacts
>> are signed by trusted keys, and we required that all artifacts be
>> signed, but that's not the case currently. But if we had a regular
>> process that crawled all of the mirrors and the canonical repo to
>> verify that the checksums of every artifact are identical, this could
>> actually improve security, since we could detect if any checksum
>> had been changed (a malicious party would have to change the
>> checksum of a modified artifact, since maven/lein/boot all confirm
>> checksums by default).
>>
>> (2) download stats - any downloads from a mirror wouldn't get
>> reflected in the stats for the artifact unless we had some way to
>> report those stats back to clojars.org. We currently generate the
>> stats by parsing the nginx access logs, mirrors could do the same
>> and report stats back to clojars.org if we care enough about
>> this. We don't get stats from the existing private mirrors, and
>> the stats aren't critical, so this may be a non-issue, and
>> definitely isn't something that has to be solved right away, if
>> ever.
>>
>> The repo 

Re: Reducing the pain of a clojars outage

2016-01-02 Thread Mikhail Kryshen
I would suggest also considering decentralized technologies.
IPFS (https://ipfs.io/) looks like a good fit for the task.

- It is distributed: every node used to access the repository will
  contribute to its availability.

- Directory trees in IPFS work like Clojure's persistent data
  structures: they are immutable and share identical substructures.

- IPFS is content-addressable: to access the repository one will only
  need to know the hash of the current version of the root directory.
  The content cannot be changed without also changing the hash.

- The current hash can be published using IPNS
  (https://github.com/ipfs/examples/tree/master/examples/ipns) or using
  a special DNS TXT record ("dnslink=/ipfs/") on clojars.org
  domain.  Then the current version of the repository (at least the
  files other nodes have copies of) will be accessible regardless of the
  availability of the main server (see the sketch below):
  - via public IPFS gateway: https://ipfs.io/ipns/clojars.org/
  - or local gateway: http://localhost:8080/ipns/clojars.org/
  - or local fuse mount at /ipns/clojars.org/
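
As a minimal sketch (hypothetical artifact path, and assuming clojars.org
actually published such a dnslink record), reading an artifact through the
public gateway could look like:

    import urllib.request

    # Resolves via the dnslink record described above, through the public
    # ipfs.io gateway, without depending on the main clojars.org server.
    GATEWAY = "https://ipfs.io/ipns/clojars.org"
    PATH = "/repo/example/example/1.0.0/example-1.0.0.jar"  # hypothetical

    with urllib.request.urlopen(GATEWAY + PATH) as resp:
        data = resp.read()
    print(len(data), "bytes")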

Toby Crawley  writes:

> Given the recent DDoS-triggered outages at linode (including the one
> today that has been the worst yet, currently 10 hours at the time I'm
> writing this), I've been giving some more thought to how we can make
> future outages less painful for the community.
>
> I have an open issue[1] (but no code yet) to move the repository off
> of the server and on to a block store (s3, etc), with the goal there
> to make repo reads (which is what we use clojars for 99.9% of the
> time) independent of the status of the server. But I'm not sure that
> really solves the problem we are seeing today. Currently, we have two
> points of failure for repo reads:
>
> (1) the server itself (hosted on linode)
> (2) DNS for the clojars.org domain (also hosted on linode)
>
> moving the repo off of the server to a block store still has two
> points of failure:
>
> (1) the block store (aws, rackspace, etc)
> (2) DNS for the clojars.org domain, since we would CNAME the block
>  store (hosted on linode)
>
> Though the block store provider would probably be better distributed,
> and have more resources to withstand a DDoS (but do any block store
> providers have 100% uptime?).
>
> The block store solution is complex - it introduces more moving parts
> into clojars, and requires reworking the way we generate usage stats,
> and how the api gets its data. It also requires reworking the way we
> administer the repo (deletion requests, cleaning up failed/partial
> deploys). And it may not solve the availability problem at all, since
> we still have two points of failure.
>
> I think a better solution may be to have multiple mirrors of the repo,
> either run by concerned citizens or maintained by the clojars staff. I
> know some folks in the community already run internal caching proxies
> or rsynced mirrors (and are probably chuckling knowingly at those of
> us affected by the outage), but those proxies don't really help those
> in the community that don't have that internal infrastructure. And I
> don't want to recommend that everyone set up a private mirror - that
> seems like a lot of wasted effort.
>
> Ideally, it would be nice if we had a turn-key tool for creating a
> mirror of clojars. We currently provide a way to rsync the repo[2], so
> the seed for a mirror could be small, and could then slurp down the
> full repo (and could continue to do so on a schedule to remain up to
> date). We could then publish a list of mirrors that the community
> could turn to in times of need (or use all the time, if they are
> closer geographically or just generally more responsive). Any deploys
> would still need to hit the primary server, but deploys are
> dwarfed by reads.
>
> There are a few issues with using mirrors:
>
> (1) security - with artifacts in more places, there are more
> opportunities to introduce malicious versions. This could be
> prevented if we had better tools for verifying that the artifacts
> are signed by trusted keys, and we required that all artifacts be
> signed, but that's not the case currently. But if we had a regular
> process that crawled all of the mirrors and the canonical repo to
> verify that the checksums of every artifact are identical, this could
> actually improve security, since we could detect if any checksum
> had been changed (a malicious party would have to change the
> checksum of a modified artifact, since maven/lein/boot all confirm
> checksums by default).
>
> (2) download stats - any downloads from a mirror wouldn't get
> reflected in the stats for the artifact unless we had some way to
> report those stats back to clojars.org. We currently generate the
> stats by parsing the nginx access logs, mirrors could do the same
> and report stats back to clojars.org if we care enough about
> this. We don't get stats from the existing private mirro

Re: Reducing the pain of a clojars outage

2016-01-02 Thread Colin Fleming
I'm travelling at the moment so I don't have time to respond to everything
right now, but one thing about the Java 6 issue - IntelliJ won't be fully
on Java 8 until IntelliJ 16. This means that Java 6 will be around until a)
everyone is on whatever comes after El Capitan (the last OSX to support
Apple's Java 6, which came out not long ago), or b) everyone is on IntelliJ
16, which has only just gone into beta. I support the last two major
IntelliJ versions, so that'll be another two years or so. Of course, there
may be a vanishingly small number of users still on Java 6 at that point
but that's the timeline. It's anyone's guess when a majority of OSX users
will be on JDK 8 - at some point I'll just have to say that you need to
upgrade IntelliJ if you want to use Leiningen on OSX, but that won't be for
a while yet - at least a year I guess.

On 3 January 2016 at 09:33, Toby Crawley  wrote:

> On Sat, Jan 2, 2016 at 1:59 PM, Michael Gardner 
> wrote:
> > Still, my personal opinion (for whatever it's worth) is that ensuring
> the entire process is always cryptographically secure end-to-end should be
> a higher priority than establishing mirrors.
>
> I agree, ensuring the process is cryptographically secure end-to-end
> should be a priority, but it is also a Sisyphean task, since it would
> at least require:
>
> * getting everyone to sign releases: not difficult - we just require
>   signatures at deploy time on clojars.org and deal with the pain of
>   bringing everyone up to speed
> * dealing with existing unsigned releases: deprecate them? give the
>   authors a way to sign them after the fact?
> * changing tooling to confirm that the artifacts are signed with keys
>   that are in your web of trust: lein and boot can already tell you
>   what in the dep graph is signed, and verify that the signatures are
>   valid, but don't yet confirm against the caller's web of
>   trust. Without that, how would you know that the artifact isn't
>   signed with a random, throwaway key?
> * organizing key-signing parties around the world to build the web of
>   trust for the clojure community: Phil Hagelberg started that process
>   with key-signing meetings at clojure conferences, but it didn't
>   spread very far. Initiatives like https://keybase.io/ may help with
>   this.
>
> And this assumes that everyone in your web of trust who publishes
> artifacts is who you think they are, keeps their keys 100% secure,
> and can't be coerced.
>
> Even after all that, we still won't be able to pull jars when
> clojars.org is down unless we have some alternate source.
>
> - Toby
>



Re: Reducing the pain of a clojars outage

2016-01-02 Thread Glen Mailer
This seems like it could be a fruitful avenue to me (cloudflare or another CDN)

I know the folks at npm use fastly in a similar fashion - gaining both 
geographical distribution and improved resiliency.



Re: Reducing the pain of a clojars outage

2016-01-02 Thread Toby Crawley
On Sat, Jan 2, 2016 at 1:59 PM, Michael Gardner  wrote:
> Still, my personal opinion (for whatever it's worth) is that ensuring the 
> entire process is always cryptographically secure end-to-end should be a 
> higher priority than establishing mirrors.

I agree, ensuring the process is cryptographically secure end-to-end
should be a priority, but it is also a Sisyphean task, since it would
at least require:

* getting everyone to sign releases: not difficult - we just require
  signatures at deploy time on clojars.org and deal with the pain of
  bringing everyone up to speed
* dealing with existing unsigned releases: deprecate them? give the
  authors a way to sign them after the fact?
* changing tooling to confirm that the artifacts are signed with keys
  that are in your web of trust: lein and boot can already tell you
  what in the dep graph is signed, and verify that the signatures are
  valid, but don't yet confirm against the caller's web of
  trust. Without that, how would you know that the artifact isn't
  signed with a random, throwaway key?
* organizing key-signing parties around the world to build the web of
  trust for the clojure community: Phil Hagelberg started that process
  with key-signing meetings at clojure conferences, but it didn't
  spread very far. Initiatives like https://keybase.io/ may help with
  this.

And this assumes that everyone in your web of trust who publishes
artifacts is who you think they are, keeps their keys 100% secure,
and can't be coerced.

Even after all that, we still won't be able to pull jars when
clojars.org is down unless we have some alternate source.
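
To make the third point above concrete, here is a rough sketch of the kind
of check tooling could add on top of signature verification (this is not
lein's or boot's actual implementation; the trusted fingerprint and file
names are hypothetical, and it assumes gpg is installed):

    import subprocess

    # Verify the detached .asc signature, then require that the signing
    # key's fingerprint is in an explicit trusted set instead of accepting
    # any key that happens to produce a valid signature.
    TRUSTED_FINGERPRINTS = {"0123456789ABCDEF0123456789ABCDEF01234567"}

    def signed_by_trusted_key(artifact, signature):
        out = subprocess.run(
            ["gpg", "--status-fd", "1", "--verify", signature, artifact],
            capture_output=True, text=True)
        for line in out.stdout.splitlines():
            # gpg prints "[GNUPG:] VALIDSIG <fingerprint> ..." on success
            if line.startswith("[GNUPG:] VALIDSIG"):
                return line.split()[2] in TRUSTED_FINGERPRINTS
        return False

    print(signed_by_trusted_key("example-1.0.0.jar", "example-1.0.0.jar.asc"))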

- Toby



Re: Reducing the pain of a clojars outage

2016-01-02 Thread Michael Gardner

> On Jan 2, 2016, at 10:27, Toby Crawley  wrote:
> 
> On Sat, Jan 2, 2016 at 12:47 AM, Michael Gardner  wrote:
>> 
>> I would caution against this approach. An attacker could easily target 
>> specific organizations, serving compromised artifacts only to particular IP 
>> ranges. A periodic verification process wouldn't detect this[1], and might 
>> lend a false sense of security that lulls people into putting off real 
>> security measures.
>> 
>> [1] Unless run by every organization that uses lein, and even then it still 
>> might not catch anything if the attackers are clever.
>> 
> 
> That's a good point. Would you trust this approach more if the mirrors
> were all managed by the clojars staff instead of by community members?
> You currently trust the clojars staff to not act maliciously, and to
> detect an intrusion by a third party against clojars.org.

I would trust it somewhat more. An increase in the number of servers still 
means an increase in the system's attack surface, but at least there shouldn't 
be any additional risk from those running the mirrors.

Still, my personal opinion (for whatever it's worth) is that ensuring the 
entire process is always cryptographically secure end-to-end should be a higher 
priority than establishing mirrors.



Re: Reducing the pain of a clojars outage

2016-01-02 Thread Toby Crawley
On Sat, Jan 2, 2016 at 12:47 AM, Michael Gardner  wrote:
>
>> On Jan 1, 2016, at 21:31, Toby Crawley  wrote:
>>
>> But if we had a regular
>>process that crawled all of the mirrors and the canonical repo to
>>verify that the checksums of every artifact are identical, this could
>>actually improve security, since we could detect if any checksum
>>had been changed
>
> I would caution against this approach. An attacker could easily target 
> specific organizations, serving compromised artifacts only to particular IP 
> ranges. A periodic verification process wouldn't detect this[1], and might 
> lend a false sense of security that lulls people into putting off real 
> security measures.
>
> [1] Unless run by every organization that uses lein, and even then it still 
> might not catch anything if the attackers are clever.
>

That's a good point. Would you trust this approach more if the mirrors
were all managed by the clojars staff instead of by community members?
You currently trust the clojars staff to not act maliciously, and to
detect an intrusion by a third party against clojars.org.
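
A sketch of what such a crawler might look like (the mirror URLs and
artifact path are hypothetical; comparing the .sha1 files that Maven-style
repos publish alongside each artifact would avoid downloading full jars):

    import hashlib
    import urllib.request

    CANONICAL = "https://clojars.org/repo"
    MIRRORS = [  # hypothetical mirror base URLs
        "https://mirror1.example.org/clojars",
        "https://mirror2.example.org/clojars",
    ]

    def sha1_of(url):
        with urllib.request.urlopen(url) as resp:
            return hashlib.sha1(resp.read()).hexdigest()

    def mismatched_mirrors(path):
        # Compare each mirror's copy of an artifact against the canonical repo.
        expected = sha1_of(CANONICAL + path)
        return [m for m in MIRRORS if sha1_of(m + path) != expected]

    print(mismatched_mirrors("/example/example/1.0.0/example-1.0.0.jar"))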

- Toby



Re: Reducing the pain of a clojars outage

2016-01-02 Thread Nando Breiter
Would CloudFlare help on the short term? I haven't used the service yet, I
just ran across it researching DDoS solutions, but judging from
the overview of how it works, it *might* be able to cache all clojars.org
assets in a distributed manner and handle the DNS issue as well.
https://www.cloudflare.com/ If it would work, the advantage is a very quick
initial setup. All you need to do is let them handle the DNS.





Aria Media Sagl
Via Rompada 40
6987 Caslano
Switzerland

+41 (0)91 600 9601
+41 (0)76 303 4477 cell
skype: ariamedia

On Sat, Jan 2, 2016 at 4:31 AM, Toby Crawley  wrote:

> Given the recent DDoS-triggered outages at linode (including the one
> today that has been the worst yet, currently 10 hours at the time I'm
> writing this), I've been giving some more thought to how we can make
> future outages less painful for the community.
>
> I have an open issue[1] (but no code yet) to move the repository off
> of the server and on to a block store (s3, etc), with the goal there
> to make repo reads (which is what we use clojars for 99.9% of the
> time) independent of the status of the server. But I'm not sure that
> really solves the problem we are seeing today. Currently, we have two
> points of failure for repo reads:
>
> (1) the server itself (hosted on linode)
> (2) DNS for the clojars.org domain (also hosted on linode)
>
> moving the repo off of the server to a block store still has two
> points of failure:
>
> (1) the block store (aws, rackspace, etc)
> (2) DNS for the clojars.org domain, since we would CNAME the block
>  store (hosted on linode)
>
> Though the block store provider would probably be better distributed,
> and have more resources to withstand a DDoS (but do any block store
> providers have 100% uptime?).
>
> The block store solution is complex - it introduces more moving parts
> into clojars, and requires reworking the way we generate usage stats,
> and how the api gets its data. It also requires reworking the way we
> administer the repo (deletion requests, cleaning up failed/partial
> deploys). And it may not solve the availability problem at all, since
> we still have two points of failure.
>
> I think a better solution may be to have multiple mirrors of the repo,
> either run by concerned citizens or maintained by the clojars staff. I
> know some folks in the community already run internal caching proxies
> or rsynced mirrors (and are probably chuckling knowingly at those of
> us affected by the outage), but those proxies don't really help those
> in the community that don't have that internal infrastructure. And I
> don't want to recommend that everyone set up a private mirror - that
> seems like a lot of wasted effort.
>
> Ideally, it would be nice if we had a turn-key tool for creating a
> mirror of clojars. We currently provide a way to rsync the repo[2], so
> the seed for a mirror could be small, and could then slurp down the
> full repo (and could continue to do so on a schedule to remain up to
> date). We could then publish a list of mirrors that the community
> could turn to in times of need (or use all the time, if they are
> closer geographically or just generally more responsive). Any deploys
> would still need to hit the primary server, but deploys are
> dwarfed by reads.
>
> There are a few issues with using mirrors:
>
> (1) security - with artifacts in more places, there are more
> opportunities to introduce malicious versions. This could be
> prevented if we had better tools for verifying that the artifacts
> are signed by trusted keys, and we required that all artifacts be
> signed, but that's not the case currently. But if we had a regular
> process that crawled all of the mirrors and the canonical repo to
> verify that the checksums of every artifact are identical, this could
> actually improve security, since we could detect if any checksum
> had been changed (a malicious party would have to change the
> checksum of a modified artifact, since maven/lein/boot all confirm
> checksums by default).
>
> (2) download stats - any downloads from a mirror wouldn't get
> reflected in the stats for the artifact unless we had some way to
> report those stats back to clojars.org. We currently generate the
> stats by parsing the nginx access logs, mirrors could do the same
> and report stats back to clojars.org if we care enough about
> this. We don't get stats from the existing private mirrors, and
> the stats aren't critical, so this may be a non-issue, and
> definitely isn't something that has to be solved right away, if
> ever.
>
> The repo is just served as static files, so I think a mirror could
> simply be:
>
> (1) a webserver (preferably (required to be?) HTTPS)
> (2) a cronjob that rsyncs every N minutes
>
> And the cronjob would just need the rsync command in [2], so, to get
> this started, we just need:
>
> (1) linode to be up
> (2) people willing to run mirrors
>
> (I would say "

Re: Reducing the pain of a clojars outage

2016-01-01 Thread Michael Gardner

> On Jan 1, 2016, at 21:31, Toby Crawley  wrote:
> 
> But if we had a regular
>process that crawled all of the mirrors and the canonical repo to
>verify that the checksums of every artifact are identical, this could
>actually improve security, since we could detect if any checksum
>had been changed

I would caution against this approach. An attacker could easily target specific 
organizations, serving compromised artifacts only to particular IP ranges. A 
periodic verification process wouldn't detect this[1], and might lend a false 
sense of security that lulls people into putting off real security measures.

[1] Unless run by every organization that uses lein, and even then it still 
might not catch anything if the attackers are clever.



Re: Reducing the pain of a clojars outage

2016-01-01 Thread Toby Crawley
On Fri, Jan 1, 2016 at 11:50 PM, Daniel Compton
 wrote:
> IntelliJ 15 (the new version), bundles JDK8 for Mac OS X so the concern about 
> Java 6 will get less over time.

Ah, good to know.

>
> It could be helpful to extend 
> https://github.com/clojars/clojars-web/issues/432 to support these third 
> party mirrors so people just need to point an Ansible script at a server and 
> it will be set up for them.

Yes, definitely. I was thinking of the bare minimum to get a few
mirrors started.
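
As a bare-minimum sketch of the "cronjob that rsyncs every N minutes" idea
(the rsync source below is an assumption; use whatever the wiki
instructions referenced earlier actually document):

    import subprocess
    import time

    RSYNC_SOURCE = "clojars.org::clojars"   # assumed rsync module name
    LOCAL_REPO = "/srv/clojars-mirror/repo"
    INTERVAL_MINUTES = 15

    while True:
        # -a preserves timestamps/permissions; --delete drops artifacts
        # that were removed upstream (e.g. after a deletion request).
        subprocess.run(
            ["rsync", "-a", "--delete", RSYNC_SOURCE + "/", LOCAL_REPO],
            check=False)
        time.sleep(INTERVAL_MINUTES * 60)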



Re: Reducing the pain of a clojars outage

2016-01-01 Thread Ken Restivo
Any tooling would also have to upgrade to clj-http 2.0.0 and/or HttpClient 4.5, 
because before that SNI was broken even on Java 8:

https://issues.apache.org/jira/browse/HTTPCLIENT-1613?devStatusDetailDialog=repository

Supposedly fixed in 4.5 of HttpClient, which 2.0.0 of clj-http pulls in, but I 
haven't tested to confirm.

-ken
--
-
On Fri, Jan 01, 2016 at 10:49:13PM -0500, Toby Crawley wrote:
> One potential issue with the mirrors is java 6 and HTTPS - the mirrors
> couldn't use 2048-bit dhparams[1] or SNI[2], since neither are
> supported in java 6. Yes, we all should be on java 7 or 8 at this
> point, but I believe Intellij still uses java 6 on MacOS, which would
> mean Cursive couldn't download from the mirrors.
> 
> [1]: https://weakdh.org/sysadmin.html
> [2]: https://en.wikipedia.org/wiki/Server_Name_Indication
> 
> On Fri, Jan 1, 2016 at 10:31 PM, Toby Crawley  wrote:
> > Given the recent DDoS-triggered outages at linode (including the one
> > today that has been the worst yet, currently 10 hours at the time I'm
> > writing this), I've been giving some more thought to how we can make
> > future outages less painful for the community.
> >
> > I have an open issue[1] (but no code yet) to move the repository off
> > of the server and on to a block store (s3, etc), with the goal there
> > to make repo reads (which is what we use clojars for 99.9% of the
> > time) independent of the status of the server. But I'm not sure that
> > really solves the problem we are seeing today. Currently, we have two
> > points of failure for repo reads:
> >
> > (1) the server itself (hosted on linode)
> > (2) DNS for the clojars.org domain (also hosted on linode)
> >
> > moving the repo off of the server to a block store still has two
> > points of failure:
> >
> > (1) the block store (aws, rackspace, etc)
> > (2) DNS for the clojars.org domain, since we would CNAME the block
> >  store (hosted on linode)
> >
> > Though the block store provider would probably be better distributed,
> > and have more resources to withstand a DDoS (but do any block store
> > providers have 100% uptime?).
> >
> > The block store solution is complex - it introduces more moving parts
> > into clojars, and requires reworking the way we generate usage stats,
> > and how the api gets its data. It also requires reworking the way we
> > administer the repo (deletion requests, cleaning up failed/partial
> > deploys). And it may not solve the availability problem at all, since
> > we still have two points of failure.
> >
> > I think a better solution may be to have multiple mirrors of the repo,
> > either run by concerned citizens or maintained by the clojars staff. I
> > know some folks in the community already run internal caching proxies
> > or rsynced mirrors (and are probably chuckling knowingly at those of
> > us affected by the outage), but those proxies don't really help those
> > in the community that don't have that internal infrastructure. And I
> > don't want to recommend that everyone set up a private mirror - that
> > seems like a lot of wasted effort.
> >
> > Ideally, it would be nice if we had a turn-key tool for creating a
> > mirror of clojars. We currently provide a way to rsync the repo[2], so
> > the seed for a mirror could be small, and could then slurp down the
> > full repo (and could continue to do so on a schedule to remain up to
> > date). We could then publish a list of mirrors that the community
> > could turn to in times of need (or use all the time, if they are
> > closer geographically or just generally more responsive). Any deploys
> > would still need to hit the primary server, but deploys are
> > dwarfed by reads.
> >
> > There are a few issues with using mirrors:
> >
> > (1) security - with artifacts in more places, there are more
> > opportunities to introduce malicious versions. This could be
> > prevented if we had better tools for verifying that the artifacts
> > are signed by trusted keys, and we required that all artifacts be
> > signed, but that's not the case currently. But if we had a regular
> > process that crawled all of the mirrors and the canonical repo to
> > verify that the checksums of every artifact are identical, this could
> > actually improve security, since we could detect if any checksum
> > had been changed (a malicious party would have to change the
> > checksum of a modified artifact, since maven/lein/boot all confirm
> > checksums by default).
> >
> > (2) download stats - any downloads from a mirror wouldn't get
> > reflected in the stats for the artifact unless we had some way to
> > report those stats back to clojars.org. We currently generate the
> > stats by parsing the nginx access logs, mirrors could do the same
> > and report stats back to clojars.org if we care enough about
> > this. We don't get stats from the existing private mirrors, and
> > the stats aren't critical, so this may be a non-issue, and

Re: Reducing the pain of a clojars outage

2016-01-01 Thread Daniel Compton
IntelliJ 15 (the new version) bundles JDK 8 for Mac OS X, so the concern about
Java 6 will lessen over time.

It could be helpful to extend https://github.com/clojars/clojars-web/issues/432 
to support these third party mirrors so people just need to point an Ansible 
script at a server and it will be set up for them. 



Re: Reducing the pain of a clojars outage

2016-01-01 Thread Toby Crawley
One potential issue with the mirrors is java 6 and HTTPS - the mirrors
couldn't use 2048-bit dhparams[1] or SNI[2], since neither are
supported in java 6. Yes, we all should be on java 7 or 8 at this
point, but I believe Intellij still uses java 6 on MacOS, which would
mean Cursive couldn't download from the mirrors.

[1]: https://weakdh.org/sysadmin.html
[2]: https://en.wikipedia.org/wiki/Server_Name_Indication

On Fri, Jan 1, 2016 at 10:31 PM, Toby Crawley  wrote:
> Given the recent DDoS-triggered outages at linode (including the one
> today that has been the worst yet, currently 10 hours at the time I'm
> writing this), I've been giving some more thought to how we can make
> future outages less painful for the community.
>
> I have an open issue[1] (but no code yet) to move the repository off
> of the server and on to a block store (s3, etc), with the goal there
> to make repo reads (which is what we use clojars for 99.9% of the
> time) independent of the status of the server. But I'm not sure that
> really solves the problem we are seeing today. Currently, we have two
> points of failure for repo reads:
>
> (1) the server itself (hosted on linode)
> (2) DNS for the clojars.org domain (also hosted on linode)
>
> moving the repo off of the server to a block store still has two
> points of failure:
>
> (1) the block store (aws, rackspace, etc)
> (2) DNS for the clojars.org domain, since we would CNAME the block
>  store (hosted on linode)
>
> Though the block store provider would probably be better distributed,
> and have more resources to withstand a DDoS (but do any block store
> providers have 100% uptime?).
>
> The block store solution is complex - it introduces more moving parts
> into clojars, and requires reworking the way we generate usage stats,
> and how the api gets its data. It also requires reworking the way we
> administer the repo (deletion requests, cleaning up failed/partial
> deploys). And it may not solve the availability problem at all, since
> we still have two points of failure.
>
> I think a better solution may be to have multiple mirrors of the repo,
> either run by concerned citizens or maintained by the clojars staff. I
> know some folks in the community already run internal caching proxies
> or rsynced mirrors (and are probably chuckling knowingly at those of
> us affected by the outage), but those proxies don't really help those
> in the community that don't have that internal infrastructure. And I
> don't want to recommend that everyone set up a private mirror - that
> seems like a lot of wasted effort.
>
> Ideally, it would be nice if we had a turn-key tool for creating a
> mirror of clojars. We currently provide a way to rsync the repo[2], so
> the seed for a mirror could be small, and could then slurp down the
> full repo (and could continue to do so on a schedule to remain up to
> date). We could then publish a list of mirrors that the community
> could turn to in times of need (or use all the time, if they are
> closer geographically or just generally more responsive). Any deploys
> would still need to hit the primary server, but deploys are
> dwarfed by reads.
>
> There are a few issues with using mirrors:
>
> (1) security - with artifacts in more places, there are more
> opportunities to introduce malicious versions. This could be
> prevented if we had better tools for verifying that the artifacts
> are signed by trusted keys, and we required that all artifacts be
> signed, but that's not the case currently. But if we had a regular
> process that crawled all of the mirrors and the canonical repo to
> verify that the checksums of every artifact are identical, this could
> actually improve security, since we could detect if any checksum
> had been changed (a malicious party would have to change the
> checksum of a modified artifact, since maven/lein/boot all confirm
> checksums by default).
>
> (2) download stats - any downloads from a mirror wouldn't get
> reflected in the stats for the artifact unless we had some way to
> report those stats back to clojars.org. We currently generate the
> stats by parsing the nginx access logs, mirrors could do the same
> and report stats back to clojars.org if we care enough about
> this. We don't get stats from the existing private mirrors, and
> the stats aren't critical, so this may be a non-issue, and
> definitely isn't something that has to be solved right away, if
> ever.
>
> The repo is just served as static files, so I think a mirror could
> simply be:
>
> (1) a webserver (preferably (required to be?) HTTPS)
> (2) a cronjob that rsyncs every N minutes
>
> And the cronjob would just need the rsync command in [2], so, to get
> this started, we just need:
>
> (1) linode to be up
> (2) people willing to run mirrors
>
> (I would say "(3) add a page to the wiki on how to use a mirror", but
> that would destroy the symmetry of all the other 2-item lists in

Reducing the pain of a clojars outage

2016-01-01 Thread Toby Crawley
Given the recent DDoS-triggered outages at linode (including the one
today that has been the worst yet, currently 10 hours at the time I'm
writing this), I've been giving some more thought to how we can make
future outages less painful for the community.

I have an open issue[1] (but no code yet) to move the repository off
of the server and on to a block store (s3, etc), with the goal there
to make repo reads (which is what we use clojars for 99.9% of the
time) independent of the status of the server. But I'm not sure that
really solves the problem we are seeing today. Currently, we have two
points of failure for repo reads:

(1) the server itself (hosted on linode)
(2) DNS for the clojars.org domain (also hosted on linode)

moving the repo off of the server to a block store still has two
points of failure:

(1) the block store (aws, rackspace, etc)
(2) DNS for the clojars.org domain, since we would CNAME the block
 store (hosted on linode)

Though the block store provider would probably be better distributed,
and have more resources to withstand a DDoS (but do any block store
providers have 100% uptime?).

The block store solution is complex - it introduces more moving parts
into clojars, and requires reworking the way we generate usage stats,
and how the api gets its data. It also requires reworking the way we
administer the repo (deletion requests, cleaning up failed/partial
deploys). And it may not solve the availability problem at all, since
we still have two points of failure.

I think a better solution may be to have multiple mirrors of the repo,
either run by concerned citizens or maintained by the clojars staff. I
know some folks in the community already run internal caching proxies
or rsynced mirrors (and are probably chuckling knowingly at those of
us affected by the outage), but those proxies don't really help those
in the community that don't have that internal infrastructure. And I
don't want to recommend that everyone set up a private mirror - that
seems like a lot of wasted effort.

Ideally, it would be nice if we had a turn-key tool for creating a
mirror of clojars. We currently provide a way to rsync the repo[2], so
the seed for a mirror could be small, and could then slurp down the
full repo (and could continue to do so on a schedule to remain up to
date). We could then publish a list of mirrors that the community
could turn to in times of need (or use all the time, if they are
closer geographically or just generally more responsive). Any deploys
would still need to hit the primary server, but deploys are
dwarfed by reads.
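
For people wanting to actually use a mirror from their builds, pointing
Leiningen at one should just be a :mirrors entry in project.clj or
~/.lein/profiles.clj, along these lines (the mirror URL is made up):

    ;; in project.clj or ~/.lein/profiles.clj; the mirror URL is a placeholder
    :mirrors {"clojars" {:name "clojars mirror"
                         :url  "https://clojars-mirror.example.org/repo/"}}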

There are a few issues with using mirrors:

(1) security - with artifacts in more places, there are more
opportunities to introduce malicious versions. This could be
prevented if we had better tools for verifying that the artifacts
are signed by trusted keys, and we required that all artifacts be
signed, but that's not the case currently. But if we had a regular
process that crawled all of the mirrors and the canonical repo to
verify that the checksums of every artifact are identical, this could
actually improve security, since we could detect if any checksum
had been changed (a malicious party would have to change the
checksum of a modified artifact, since maven/lein/boot all confirm
checksums by default).

(2) download stats - any downloads from a mirror wouldn't get
reflected in the stats for the artifact unless we had some way to
report those stats back to clojars.org. We currently generate the
stats by parsing the nginx access logs; mirrors could do the same
and report stats back to clojars.org if we care enough about
this. We don't get stats from the existing private mirrors, and
the stats aren't critical, so this may be a non-issue, and
definitely isn't something that has to be solved right away, if
ever.
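
Just for concreteness, the log-parsing part is small; here is a rough
sketch that boils an access log down to per-artifact counts, assuming
the stock nginx combined log format and that the mirror serves the
repo under /repo/ (both of which an operator would adjust):

    (require '[clojure.java.io :as io]
             '[clojure.string :as str])

    (defn jar-downloads
      "Given a seq of nginx access-log lines, returns a map of
      [group artifact version] -> download count for GETs of .jar files."
      [lines]
      (->> lines
           (keep #(second (re-find #"\"GET /repo/(\S+\.jar) HTTP" %)))
           (keep (fn [path]
                   (let [parts (str/split path #"/")]
                     (when (<= 4 (count parts))
                       (let [[version artifact & group] (reverse (butlast parts))]
                         [(str/join "." (reverse group)) artifact version])))))
           frequencies))

    ;; e.g. (jar-downloads (line-seq (io/reader "/var/log/nginx/access.log")))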

The repo is just served as static files, so I think a mirror could
simply be:

(1) a webserver (preferably (required to be?) HTTPS)
(2) a cronjob that rsyncs every N minutes

And the cronjob would just need the rsync command in [2] (see the
sketch after the lists below), so, to get this started, we just need:

(1) linode to be up
(2) people willing to run mirrors

(I would say "(3) add a page to the wiki on how to use a mirror", but
that would destroy the symmetry of all the other 2-item lists in this
message)
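
To make (2) above concrete: the mirror-side job only needs to shell
out to rsync on a schedule. A sketch in Clojure (the source and
destination are placeholders - the real invocation is whatever [2]
documents), though a one-line crontab entry would work just as well:

    (require '[clojure.java.shell :as sh])

    (defn sync-mirror!
      "Pulls the repo from an rsync source into local-dir; returns true
      on success, printing rsync's stderr on failure."
      [rsync-source local-dir]
      (let [{:keys [exit err]} (sh/sh "rsync" "-av" "--delete-after"
                                      rsync-source local-dir)]
        (when-not (zero? exit)
          (binding [*out* *err*]
            (println "rsync failed:" err)))
        (zero? exit)))

    ;; run every N minutes (from cron, a scheduled executor, etc.), e.g.:
    ;; (sync-mirror! "<source from [2]>" "/srv/clojars-mirror/repo/")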

And it would be nice to have the process in place to verify checksums
soon - that would actually be a boon if we had another linode
compromise[3].
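
As a first cut at that process, something could walk a list of
artifact paths, read the recorded .sha1 from the canonical repo, hash
the bytes each mirror actually serves, and flag any differences. A
rough sketch (the mirror URL below is hypothetical):

    (require '[clojure.java.io :as io]
             '[clojure.string :as str])
    (import '(java.security MessageDigest))

    (defn sha1-of-url
      "SHA-1 of the bytes served at url, as a 40-char hex string."
      [url]
      (with-open [in (io/input-stream url)]
        (let [md  (MessageDigest/getInstance "SHA-1")
              buf (byte-array 8192)]
          (loop []
            (let [n (.read in buf)]
              (when (pos? n)
                (.update md buf 0 n)
                (recur))))
          (format "%040x" (BigInteger. 1 (.digest md))))))

    (defn canonical-sha1
      "Reads the recorded checksum for an artifact path from the canonical repo."
      [repo-base path]
      (-> (slurp (str repo-base path ".sha1")) str/trim (str/split #"\s+") first))

    (defn tampered?
      "True when the artifact a mirror serves doesn't match the canonical checksum."
      [canonical-base mirror-base path]
      (not= (canonical-sha1 canonical-base path)
            (sha1-of-url (str mirror-base path))))

    ;; e.g. (tampered? "https://clojars.org/repo/"
    ;;                 "https://some-mirror.example.org/repo/"
    ;;                 "ring/ring-core/1.4.0/ring-core-1.4.0.jar")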

Does anyone see any issues with this plan? I'm curious if there are
security implications (or anything else) that I haven't thought of.

Are you willing to run a mirror?

One issue that comes to mind is that if we do decide to move the repo
to a block store, it actually makes mirroring more difficult unless we
keep a copy of the repo on disk on clojars.org as well. But I would
like to have mirrors in place as soon as possible, and worry about
that later.

- Toby

[1]: ht