Re: [openstack-dev] [Zaqar] Comments on the concerns arose during the TC meeting

2014-09-09 Thread Samuel Merritt

On 9/9/14, 12:03 PM, Monty Taylor wrote:

On 09/04/2014 01:30 AM, Clint Byrum wrote:

Excerpts from Flavio Percoco's message of 2014-09-04 00:08:47 -0700:

Greetings,

Last Tuesday the TC held the first graduation review for Zaqar. During
the meeting some concerns arose. I've listed those concerns below with
some comments hoping that it will help start a discussion before the
next meeting. In addition, I've added some comments about the project
stability at the bottom and an etherpad link pointing to a list of use
cases for Zaqar.



Hi Flavio. This was an interesting read. As somebody whose attention has
recently been drawn to Zaqar, I am quite interested in seeing it
graduate.


# Concerns

- Concern on operational burden of requiring NoSQL deploy expertise to
the mix of openstack operational skills

For those of you not familiar with Zaqar, it currently supports 2 nosql
drivers - MongoDB and Redis - and those are the only 2 drivers it
supports for now. This will require operators willing to use Zaqar to
maintain a new (?) NoSQL technology in their system. Before expressing
our thoughts on this matter, let me say that:

 1. By removing the SQLAlchemy driver, we basically removed the
chance
for operators to use an already deployed OpenStack-technology
 2. Zaqar won't be backed by any AMQP based messaging technology for
now. Here's[0] a summary of the research the team (mostly done by
Victoria) did during Juno
 3. We (OpenStack) used to require Redis for the zmq matchmaker
 4. We (OpenStack) also use memcached for caching and as the oslo
caching lib becomes available - or a wrapper on top of dogpile.cache -
Redis may be used in place of memcached in more and more deployments.
 5. Ceilometer's recommended storage driver is still MongoDB,
although
Ceilometer now has support for sqlalchemy. (Please correct me if I'm
wrong).

That being said, it's obvious we already, to some extent, promote some
NoSQL technologies. However, for the sake of the discussion, lets assume
we don't.

I truly believe, with my OpenStack (not Zaqar's) hat on, that we can't
keep avoiding these technologies. NoSQL technologies have been around
for years and we should be prepared - including OpenStack operators - to
support these technologies. Not every tool is good for all tasks - one
of the reasons we removed the sqlalchemy driver in the first place -
therefore it's impossible to keep a homogeneous environment for all
services.



I wholeheartedly agree that non-traditional storage technologies that
are becoming mainstream are good candidates for use cases where SQL
based storage gets in the way. I wish there wasn't so much FUD
(warranted or not) about MongoDB, but that is the reality we live in.


With this, I'm not suggesting to ignore the risks and the extra burden
this adds but, instead of attempting to avoid it completely by not
evolving the stack of services we provide, we should probably work on
defining a reasonable subset of NoSQL services we are OK with
supporting. This will help make the burden smaller and it'll give
operators the option to choose.

[0] http://blog.flaper87.com/post/marconi-amqp-see-you-later/


- Concern on whether we should really reinvent a queue system rather than
piggyback on one

As mentioned in the meeting on Tuesday, Zaqar is not reinventing message
brokers. Zaqar provides a service akin to SQS from AWS with an OpenStack
flavor on top. [0]



I think Zaqar is more like SMTP and IMAP than AMQP. You're not really
trying to connect two processes in real time. You're trying to do fully
asynchronous messaging with fully randomized access to any message.

Perhaps somebody should explore whether the approaches taken by large
scale IMAP providers could be applied to Zaqar.

Anyway, I can't imagine writing a system to intentionally use the
semantics of IMAP and SMTP. I'd be very interested in seeing actual use
cases for it, apologies if those have been posted before.


It seems like you're EITHER describing something called XMPP that has at
least one open source scalable backend called ejabberd. OR, you've
actually hit the nail on the head with bringing up SMTP and IMAP but for
some reason that feels strange.

SMTP and IMAP already implement every feature you've described, as well
as retries/failover/HA and a fully end-to-end secure transport (if
installed properly). If you don't actually set them up to run as a public
messaging interface but just as a cloud-local exchange, then you could
get by with very low overhead for a massive throughput - it can very
easily be run on a single machine for Sean's simplicity, and could just
as easily be scaled out using well known techniques for public cloud
sized deployments?

So why not use existing daemons that do this? You could still use the
REST API you've got, but instead of writing it to a mongo backend and
trying to implement all of the things that already exist in SMTP/IMAP -
you could just have them front to it. You could even bypass normal

Re: [openstack-dev] [Zaqar] Comments on the concerns arose during the TC meeting

2014-09-09 Thread Samuel Merritt

On 9/9/14, 4:47 PM, Devananda van der Veen wrote:

On Tue, Sep 9, 2014 at 4:12 PM, Samuel Merritt s...@swiftstack.com wrote:

On 9/9/14, 12:03 PM, Monty Taylor wrote:

[snip]

So which is it? Because it sounds like to me it's a thing that actually
does NOT need to diverge in technology in any way, but that I've been
told that it needs to diverge because it's delivering a different set of
features - and I'm pretty sure if it _is_ the thing that needs to
diverge in technology because of its feature set, then it's a thing I
don't think we should be implementing in python in OpenStack because it
already exists and it's called AMQP.



Whether Zaqar is more like AMQP or more like email is a really strange
metric to use for considering its inclusion.



I don't find this strange at all -- I had been judging the technical
merits of Zaqar (ex-Marconi) for the last ~18 months based on the
understanding that it aimed to provide Queueing-as-a-Service, and
found its delivery of that to be lacking on technical grounds. The
implementation did not meet my view of what a queue service should
provide; it is based on some serious antipatterns (storing a queue in
an RDBMS is probably the most obvious); and in fact, it isn't even
queue-like in the access patterns enabled by the REST API (random
access to a set != a queue). That was the basis for a large part of my
objections to the project over time, and a source of frustration for
me as the developers justified many of their positions rather than
accepting feedback and changing course during the incubation period. The
reason for this seems clear now...

As was pointed out in the TC meeting today, Zaqar is (was?) actually
aiming to provide Messaging-as-a-Service -- not queueing as a service!
This is another way of saying it's "more like email and less like
AMQP", which means my "but it's not a queue" objection to the project's
graduation is irrelevant, and I need to rethink all my previous
assessments of the project.

The questions now before us are:
- should OpenStack include, in the integrated release, a
messaging-as-a-service component?


I certainly think so. I've worked on a few reasonable-scale web 
applications, and they all followed the same pattern: HTTP app servers 
serving requests quickly, background workers for long-running tasks, and 
some sort of durable message-broker/queue-server thing for conveying 
work from the first to the second.


A quick straw poll of my nearby coworkers shows that every non-trivial 
web application that they've worked on in the last decade follows the 
same pattern.


While not *every* application needs such a thing, web apps are quite 
common these days, and Zaqar satisfies one of their big requirements. 
Not only that, it does so in a way that requires much less babysitting 
than run-your-own-broker does.
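
To make the pattern concrete, here's a minimal Python sketch of that split
(the broker here is a toy in-memory stand-in, and enqueue/dequeue are
invented names rather than any particular service's API; in a real
deployment the broker would be a durable service such as Zaqar, SQS, or
RabbitMQ):

    import json
    import time

    class ToyBroker(object):
        """In-memory stand-in for a durable message broker (illustration only)."""
        def __init__(self):
            self.messages = []

        def enqueue(self, queue, body):
            self.messages.append((queue, json.dumps(body)))

        def dequeue(self, queue):
            for i, (q, body) in enumerate(self.messages):
                if q == queue:
                    del self.messages[i]
                    return json.loads(body)
            return None

    broker = ToyBroker()

    def handle_signup(user_id):
        # Web tier: do the quick work, hand off the slow work, respond fast.
        broker.enqueue('welcome-emails', {'user': user_id, 'ts': time.time()})
        return '202 Accepted'

    def worker_loop():
        # Background worker: pull jobs and do the long-running part.
        while True:
            job = broker.dequeue('welcome-emails')
            if job is None:
                time.sleep(1)
                continue
            print('sending welcome email to %s' % job['user'])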



- is Zaqar a technically sound implementation of such a service?

As an aside, there are still references to Zaqar as a queue in the
wiki [0], the governance repo [1], and on launchpad [2].

Regards,
Devananda


[0] "Multi-tenant queues based on Keystone project IDs"
   https://wiki.openstack.org/wiki/Zaqar#Key_features

[1] "Queue service" is even the official OpenStack Program name, and
the mission statement starts with "To produce an OpenStack message
queueing API and service."
   
http://git.openstack.org/cgit/openstack/governance/tree/reference/programs.yaml#n315

[2] "Zaqar is a new OpenStack project to create a multi-tenant cloud
queuing service"
   https://launchpad.net/zaqar



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Bogus -1 scores from turbo hipster

2014-01-08 Thread Samuel Merritt

On 1/7/14 2:53 PM, Michael Still wrote:

Hi. Thanks for reaching out about this.

It seems this patch has now passed turbo hipster, so I am going to
treat this as a more theoretical question than perhaps you intended. I
should note though that Joshua Hesketh and I have been trying to read
/ triage every turbo hipster failure, but that has been hard this week
because we're both at a conference.

The problem this patch faced is that we are having trouble defining
what is a reasonable amount of time for a database migration to run
for. Specifically:

2014-01-07 14:59:32,012 [output] 205 - 206...
2014-01-07 14:59:32,848 [heartbeat]
2014-01-07 15:00:02,848 [heartbeat]
2014-01-07 15:00:32,849 [heartbeat]
2014-01-07 15:00:39,197 [output] done

So applying migration 206 took slightly over a minute (67 seconds).
Our historical data (mean + 2 standard deviations) says that this
migration should take no more than 63 seconds. So this only just
failed the test.


It seems to me that requiring a runtime less than (mean + 2 stddev) 
leads to a false-positive rate of 1 in 40, right? If the runtimes have a 
normal(-ish) distribution, then 95% of them will be within 2 standard 
deviations of the mean, so that's 1 in 20 falling outside that range. 
Then discard the ones that are faster than (mean - 2 stddev), and that 
leaves 1 in 40. Please correct me if I'm wrong; I'm no statistician.
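
(A quick back-of-the-envelope check of that figure, assuming normally
distributed runtimes; the exact one-sided tail beyond two standard
deviations is about 2.3%, so the 1-in-40 figure from the 95% rule of
thumb is in the right ballpark:)

    import math

    # P(runtime > mean + 2*stddev) for a normal distribution
    p_too_slow = 0.5 * math.erfc(2 / math.sqrt(2))
    print(p_too_slow)        # ~0.0228
    print(1 / p_too_slow)    # ~44, i.e. roughly 1 false positive per 40-ish runs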


Such a high false-positive rate may make it too easy to ignore turbo hipster 
as the bot that cried wolf. This problem already exists with Jenkins and 
the devstack/tempest tests; when one of those fails, I don't wonder what 
I broke, but rather how many times I'll have to recheck the patch until 
the tests pass.


Unfortunately, I don't have a solution to offer, but perhaps someone 
else will.


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Swift] domain-level quotas

2014-01-23 Thread Samuel Merritt

On 1/23/14 1:46 AM, Matthieu Huin wrote:

Hello Christian,

- Original Message -

From: Christian Schwede christian.schw...@enovance.com
To: OpenStack Development Mailing List (not for usage questions) 
openstack-dev@lists.openstack.org, Matthieu
Huin matthieu.h...@enovance.com
Sent: Wednesday, January 22, 2014 10:47:24 PM
Subject: Re: [openstack-dev] [Swift] domain-level quotas

Hi Matthieu,

On 22.01.14 20:02, Matthieu Huin wrote:

The idea is to have a middleware checking a domain's current usage
against a limit set in the configuration before allowing an upload.
The domain id can be extracted from the token, then used to query
keystone for a list of projects belonging to the domain. Swift would
then compute the domain usage in a similar fashion as the way it is
currently done for accounts, and proceed from there.


the problem might be to compute the current usage of all accounts within
a domain. It won't be a problem if you have only a few accounts in a
domain, but with tens, hundreds, or even thousands of accounts in a domain
there will be a performance impact because you need to iterate over all
accounts (doing a HEAD on every account) and sum up the total usage.


One might object that this is already a concern when applying quotas
to potentially huge accounts with lots of containers, although I agree
that domains add an order of magnitude to this problem.


Swift accounts and containers keep track* of the total object count and 
size, so account/container quotas need only perform a single HEAD 
request to get the current usage. The number of containers per account 
or objects per container doesn't affect the speed with which the quota 
check runs.


Since domains don't map directly to a single entity in Swift, getting 
the usage for a domain requires making O(N) requests to fetch the 
individual usage data. Domain quotas wouldn't just make usage checks an 
order of magnitude more costly; they'd take it from roughly constant to 
potentially unbounded.


* subject to eventual consistency, of course
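
To make the difference concrete, here's a rough sketch (plain HTTP, with
the account URLs and token handling left as placeholders; building the
list of accounts in a domain would itself require a Keystone lookup):

    import requests

    def account_usage(storage_url, token):
        # One HEAD per account; Swift maintains these totals itself.
        resp = requests.head(storage_url, headers={'X-Auth-Token': token})
        return int(resp.headers['X-Account-Bytes-Used'])

    def domain_usage(account_urls, token):
        # One HEAD per account in the domain: O(N) in the number of
        # projects, which is the unbounded cost described above.
        return sum(account_usage(url, token) for url in account_urls)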

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [swift] what does swift do if the auditor find that all 3 replicas are corrupt?

2013-11-06 Thread Samuel Merritt

On 11/6/13 7:12 AM, Daniel Li wrote:

Hi,
 I have a question about swift: what does swift do if the auditor
finds that all 3 replicas are corrupt?
Will it notify the owner of the object (email to the account owner)?
What will happen on a GET request for the corrupted object?
Will it return a special error telling that all the replicas are corrupted?
  Or will it just say that the object does not exist?
  Or will it just return one of the corrupted replicas?
  Or something else?


If all 3 (or N) replicas are corrupt, then the auditors will eventually 
quarantine all of them, and subsequent GET requests will receive 404 
responses.


No notifications are sent, nor is it really feasible to start sending 
them. The auditor is not a single process; there is one Swift auditor 
process running on each node in a cluster. Therefore, when an object is 
quarantined, there's no way for its auditor to know if the other copies 
are okay or not.


Note that this is highly unlikely to ever happen, at least with the 
default of 3 replicas. When an auditor finds a corrupt object, it 
quarantines it (moves it to a quarantines directory). Then, since that 
object is missing, the replication processes will recreate the object by 
copying it from a node with a good copy. You'd need to have all replicas 
become corrupt within a very short timespan so that the replicators 
don't get a chance to replace the damaged ones.


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [swift] what does swift do if the auditor find that all 3 replicas are corrupt?

2013-11-07 Thread Samuel Merritt

On 11/7/13 5:59 AM, Daniel Li wrote:


Thanks very much for your help, and please see my inline comments/questions.

On Thu, Nov 7, 2013 at 2:30 AM, Samuel Merritt s...@swiftstack.com wrote:

On 11/6/13 7:12 AM, Daniel Li wrote:

Hi,
  I have a question about swift: what does swift do if the
auditor
finds that all 3 replicas are corrupt?
Will it notify the owner of the object (email to the account owner)?
What will happen on a GET request for the corrupted object?
Will it return a special error telling that all the replicas are
corrupted?
   Or will it just say that the object does not exist?
   Or will it just return one of the corrupted replicas?
   Or something else?


If all 3 (or N) replicas are corrupt, then the auditors will
eventually quarantine all of them, and subsequent GET requests will
receive 404 responses.

No notifications are sent, nor is it really feasible to start
sending them. The auditor is not a single process; there is one
Swift auditor process running on each node in a cluster. Therefore,
when an object is quarantined, there's no way for its auditor to
know if the other copies are okay or not.

Note that this is highly unlikely to ever happen, at least with the
default of 3 replicas. When an auditor finds a corrupt object, it
quarantines it (moves it to a quarantines directory).

  Did you mean that when the auditor finds the corruption, it does not
copy a good replica from another object server to overwrite the corrupted
one, but just moves it to a quarantines directory?


That is correct. The object auditors don't perform any network IO, and 
in fact do not use the ring at all. All they do is scan the filesystems 
and quarantine bad objects in an infinite loop.


(Of course, there are also container and account auditors that do 
similar things, but for container and account databases.)
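
For illustration, the auditor's core job amounts to something like the
sketch below (greatly simplified, and not the actual Swift code; the real
auditor also checks sizes and metadata, rate-limits itself, and reports
stats, and the helper for looking up the expected checksum is invented
here):

    import hashlib
    import os
    import shutil
    import time

    def audit_forever(datadir, quarantine_dir, expected_md5_for):
        while True:
            for dirpath, _dirs, files in os.walk(datadir):
                for name in files:
                    path = os.path.join(dirpath, name)
                    md5 = hashlib.md5()
                    with open(path, 'rb') as fp:
                        for chunk in iter(lambda: fp.read(65536), b''):
                            md5.update(chunk)
                    if md5.hexdigest() != expected_md5_for(path):
                        # Corrupt: set it aside. No network IO happens here;
                        # replicators on other nodes will eventually push a
                        # good copy back.
                        shutil.move(path, quarantine_dir)
            time.sleep(30)  # then start the next pass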



Then, since that object is missing, the replication processes will
recreate the object by copying it from a node with a good copy.

When do the replication processes recreate the object by copying it
from a node with a good copy? Does the auditor send a message to
replication so the replication will do the copy immediately? And what is
a 'good' copy? Is the good copy's MD5 value checked before copying?


It'll happen whenever the other replicators, which are running on other 
nodes, get around to it.


Replication in Swift is push-based, not pull-based; there is no receiver 
here to which a message could be sent.


Currently, a good copy is one that hasn't been quarantined. Since 
replication uses rsync to push files around the network, there's no 
checking of MD5 at copy time. However, there is work underway to develop 
a replication protocol that avoids rsync entirely and uses the object 
server throughout the entire replication process, and that would give 
the object server a chance to check MD5 checksums on incoming writes.


Note that this is only important should 2 replicas experience 
near-simultaneous bitrot; in that case, there is a chance that bad-copy 
A will get quarantined and replaced with bad-copy B. Eventually, though, 
a bad copy will get quarantined and replaced with a good copy, and then 
you've got 2 good copies and 1 bad one, which reduces to a 
previously-discussed scenario.


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [gate] The gate: a failure analysis

2014-07-21 Thread Samuel Merritt

On 7/21/14, 3:38 AM, Matthew Booth wrote:

[snip]

I would like to make the radical proposal that we stop gating on CI
failures. We will continue to run them on every change, but only after
the change has been successfully merged.

Benefits:
* Without rechecks, the gate will use 8 times fewer resources.
* Log analysis is still available to indicate the emergence of races.
* Fixes can be merged quicker.
* Vastly less developer time spent monitoring gate failures.

Costs:
* A rare class of merge bug will make it into master.

Note that the benefits above will also offset the cost of resolving this
rare class of merge bug.


I think this is definitely a move in the right direction, but I'd like 
to propose a slight modification: let's cease blocking changes on 
*known* CI failures.


More precisely, if Elastic Recheck knows about all the failures that 
happened on a test run, treat that test run as successful.


I think this will gain virtually all the benefits you name while still 
retaining most of the gate's ability to keep breaking changes out.


As a bonus, it'll encourage people to make Elastic Recheck better. 
Currently, the easy path is to just type "recheck no bug" and click 
submit; it takes a lot less time than scrutinizing log files to guess 
at what went wrong. If failures identified by E-R don't block 
developers' changes, then the easy path is to improve E-R's checks, 
which benefits everyone.


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [swift] - question about statsd messages and 404 errors

2014-07-25 Thread Samuel Merritt

On 7/25/14, 4:58 AM, Seger, Mark (Cloud Services) wrote:

I’m trying to track object server GET errors using statsd and I’m not
seeing them.  The test I’m doing is to simply do a GET on an
non-existent object.  As expected, a 404 is returned and the object
server log records it.  However, statsd implies it succeeded because
there were no errors reported.  A read of the admin guide does clearly
say the GET timing includes failed GETs, but my question then becomes
how is one to tell there was a failure?  Should there be another type of
message that DOES report errors?  Or how about including these in the
‘object-server.GET.errors.timing’ message?


What "error" means with respect to Swift's backend-server timing metrics 
is pretty fuzzy at the moment, and could probably use some work.


The idea is that object-server.GET.timing has timing data for everything 
that Swift handled successfully, and object-server.GET.errors.timing has 
timing data for things where Swift failed.


Some things are pretty easy to divide up. For example, a 200-series status 
code always counts as success, and a 500-series status code always counts 
as error.


It gets tricky in the 400-series status codes. For example, a 404 means 
that a client asked for an object that doesn't exist. That's not Swift's 
fault, so that goes into the success bucket (object-server.GET.timing). 
Similarly, a 412 means that a client set an unsatisfiable precondition 
in the If-Match, If-None-Match, If-Modified-Since, or 
If-Unmodified-Since headers, and Swift correctly determined that the 
requested object can't fulfill the precondition, so that one goes in the 
success bucket too.
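
In rough pseudocode, the bucketing described so far looks something like
this (a sketch of the policy, not the actual middleware code):

    def timing_metric_for(verb, status):
        if 200 <= status < 300 or status in (404, 412):
            # Swift did its job correctly, even if the client didn't get
            # what it asked for.
            return 'object-server.%s.timing' % verb
        if status >= 500:
            # Something went wrong on Swift's side.
            return 'object-server.%s.errors.timing' % verb
        # 400, 409, etc. are the ambiguous cases discussed below; where
        # they should land isn't clear-cut.
        return None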


However, there are other status codes that are more ambiguous. Consider 
409; the object server responds with 409 if the request's X-Timestamp is 
less than the object's X-Timestamp (on PUT/POST/DELETE). You can get 
this with two near-simultaneous POSTs:


  1. request A hits proxy; proxy assigns X-Timestamp: 1406316223.851131
  2. request B hits proxy; proxy assigns X-Timestamp: 1406316223.851132
  3. request B hits object server and gets 202
  4. request A hits object server and gets 409

Does that error count as Swift's fault? If the client requests were 
nearly simultaneous, then I think not; there's always going to be *some* 
delay between accept() and gettimeofday(). On the other hand, if one 
proxy server's time is significantly behind another's, then it is 
Swift's fault.


It's even worse with 400; sometimes it's for bad paths (like asking an 
object server for /partition/account/container; this can happen if 
the administrator misconfigures their rings), and sometimes it's for bad 
X-Delete-At / X-Delete-After values (which are set by the client).


I'm not sure what the best way to fix this is, but if you just want to 
see some error metrics, unmount a disk to get some 507s.


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [ceilometer] [swift] Improving ceilometer.objectstore.swift_middleware

2014-07-30 Thread Samuel Merritt

On 7/30/14, 8:06 AM, Chris Dent wrote:


ceilometer/objectstore/swift_middleware.py[1] counts the size of web
request and response bodies through the swift proxy server and publishes
metrics of the size of the request and response and that a request
happened at all.

There are (at least) two bug reports associated with this bit of code:

* avoid requirement on tarball for unit tests
   https://bugs.launchpad.net/ceilometer/+bug/1285388

* significant performance degradation when ceilometer middleware for
   swift proxy uses
   https://bugs.launchpad.net/ceilometer/+bug/1337761

[snip]



Some options appear to be:

* Move the middleware to swift or move the functionality to swift.

   In the process make the functionality drop generic notifications for
   storage.objects.incoming.bytes and storage.objects.outgoing.bytes
   that anyone can consume, including ceilometer.

   This could potentially address both bugs.

* Move or copy swift.common.utils.{InputProxy,split_path} to somewhere
   in oslo, but keep the middleware in ceilometer.

   This would require somebody sharing the info on how to properly
   participate in swift's logging setup without incorporating swift.

   This would fix the first bug without saying anything about the
   second.

* Carry on importing the swift tarball or otherwise depending on
   swift.

   Fixes neither bug, maintains status quo.

What are other options? Of those above which are best or most
realistic?


Swift is already emitting those numbers[1] in statsd format; could 
ceilometer consume those metrics and convert them to whatever 
notification format it uses?


When configured to log to statsd, the Swift proxy will emit metrics of 
the form proxy-server.<type>.<verb>.<status>.xfer; for example, a 
successful object download would have a metric name of 
proxy-server.object.GET.200.xfer and a value of the number of bytes 
downloaded. Similarly, PUTs would look like 
proxy-server.object.PUT.2xx.xfer.


If ceilometer were to consume these metrics in a process outside the 
Swift proxy server, this would solve both problems. The performance fix 
comes by being outside the Swift proxy, and consuming statsd metrics can 
be done without pulling in Swift code[2].
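
As a rough illustration, an out-of-process consumer is just a small UDP
listener (toy sketch, not ceilometer code; statsd metrics arrive as
datagrams of the form "name:value|type"):

    import socket

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(('0.0.0.0', 8125))  # the usual statsd port

    while True:
        data, _addr = sock.recvfrom(65536)
        for line in data.decode('utf-8', 'replace').splitlines():
            name, _, rest = line.partition(':')
            value, _, mtype = rest.partition('|')
            if name.startswith('proxy-server.object.') and name.endswith('.xfer'):
                # e.g. proxy-server.object.GET.200.xfer:5508|c
                print('transfer metric:', name, value, mtype)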


[1] 
http://docs.openstack.org/developer/swift/admin_guide.html#reporting-metrics-to-statsd


[2] e.g. pystatsd

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [ceilometer] [swift] Improving ceilometer.objectstore.swift_middleware

2014-07-31 Thread Samuel Merritt

On 7/31/14, 1:06 AM, Eoghan Glynn wrote:




Swift is already emitting those numbers[1] in statsd format; could
ceilometer consume those metrics and convert them to whatever
notification format it uses?


The problem with that approach, IIUC, is that the statsd metrics
provide insufficient context.

Ceilometer wants to meter usage on a per-user and per-tenant basis,
so captures[1] the http-x-user-id and http-x-tenant-id headers from
the incoming request for this purpose.

Similarly, the resource-id is fabricated from the swift account.

I don't think this supporting contextual info would be available
from raw statsd metrics, or?


Good point. Adding per-user and per-tenant fields to the statsd metrics 
is the wrong way to go on a couple of levels. First, it would leak 
Keystone-isms into the core Swift code, which is at odds with Swift 
having pluggable auth systems. Second, it would immediately wreck anyone 
who's got the statsd metrics flowing into Graphite, as suddenly there'd 
be lots of new metrics for every single tenant/user pair, which would 
rapidly fill up the Graphite system's disks until it fell over.


I think your suggestion elsewhere in the thread of combining multiple 
API calls into a single notification is a better way to go. That'll 
certainly result in less client-visible slowdown from sending 
notifications, particularly if the notifications are sent in a separate 
greenthread.


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Cross distribution talks on Friday

2014-11-10 Thread Samuel Merritt

On 11/1/14, 3:51 PM, Alan Pevec wrote:

%install
export OSLO_PACKAGE_VERSION=%{version}
%{__python} setup.py install -O1 --skip-build --root %{buildroot}

Then everything should be ok and PBR will become your friend.


Still not my friend because I don't want a _build_ tool as runtime dependency :)
e.g. you don't ship make(1) to run C programs, do you?
For runtime, only pbr.version part is required but unfortunately
oslo.version was abandoned.


Swift has an elegant* solution** to this problem that makes PBR into a 
build-time-only dependency.


Take a look at the top-level __init__.py in the Swift source tree: 
https://github.com/openstack/swift/blob/709187b54ff2e9b81ac53977d4283523ce16af38/swift/__init__.py


* kind of ugly
** hack
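
From memory, the pattern looks roughly like this (a rough reconstruction;
see the linked file for the authoritative version):

    import pkg_resources

    try:
        # If we're installed, PKG-INFO already has our version, so we
        # don't need pbr at runtime at all.
        __version__ = pkg_resources.get_provider(
            pkg_resources.Requirement.parse('swift')).version
    except pkg_resources.DistributionNotFound:
        # Running from a source checkout: let pbr (a build-time tool)
        # work out the version from git.
        import pbr.version
        __version__ = pbr.version.VersionInfo('swift').release_string()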

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] LTFS integration with OpenStack Swift for scenario like - Data Archival as a Service .

2014-11-14 Thread Samuel Merritt

On 11/13/14, 10:19 PM, Sachin Goswami wrote:

In OpenStack Swift, the XFS file system is integrated, which provides a
maximum file system size of 8 exbibytes minus one byte (2^63 - 1 bytes).


Not exactly. The Swift storage nodes keep their data on POSIX 
filesystems with support for extended attributes. While XFS filesystems 
are typically used, XFS is not required.



We are studying use of LTFS integration with OpenStack Swift for
scenario like - *Data Archival as a Service* .

Was integration of LTFS with Swift considered before? If so, can you
 please share your study output? Will integration of LTFS with Swift
fit into the existing Swift architecture?


Assuming it's POSIX enough and supports extended attributes, a tape 
filesystem on a spinning disk might technically work, but I don't see it 
performing well at all.


If you're talking about using actual tapes for data storage, I can't 
imagine that working out for you. Most clients aren't prepared to wait 
multiple minutes for HTTP responses while a tape laboriously spins back 
and forth, so they'll just time out.



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Defining API Success in OpenStack APIs (specifically Swift)

2013-06-20 Thread Samuel Merritt

On 6/20/13 4:21 AM, Sean Dague wrote:

The following patch review came into Tempest yesterday to stop checking
for specific 20x codes on a number of Swift API -
https://review.openstack.org/#/c/33689/

The official documentation for these APIs says the following -
http://docs.openstack.org/api/openstack-object-storage/1.0/content/retrieve-account-metadata.html


"The HTTP return code will be 2xx (between 200 and 299, inclusive) if
the request succeeds"

This seems kind of broken to me that that's the contract provided. I've
got a -1 on the patch right now, but I think this is worth raising for
broader discussion. It seems to go somewhat contrary to
https://wiki.openstack.org/wiki/APIChangeGuidelines and to the spirit of
having stable, well defined interfaces.

So I guess I open up the question of is it ok for OpenStack core
projects to not commit to success codes for API calls? If so, we'll let
the test change into Tempest. If not, we probably need to call that out
on API standards.


I think that's really two separate questions. There's the question of 
what new APIs should be, but there's also the question of what existing 
APIs are. IMHO, it's entirely reasonable to have guidelines or rules for 
new APIs, but to go back and retroactively impose new standards on old 
APIs is too much, especially when it's done without even consulting that 
project's developers.


Remember, Swift predates not only the OpenStack API Change Guidelines 
mentioned above, but it also predates OpenStack, and it's only ever had 
one API version. If an old API isn't up to new standards, that's just 
something to grandfather in.


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [keystone] SPFE: Authenticated Encryption (AE) Tokens

2015-02-16 Thread Samuel Merritt

On 2/14/15 9:49 PM, Adam Young wrote:

On 02/13/2015 04:19 PM, Morgan Fainberg wrote:

On February 13, 2015 at 11:51:10 AM, Lance Bragstad
(lbrags...@gmail.com) wrote:

Hello all,


I'm proposing the Authenticated Encryption (AE) Token specification
[1] as an SPFE. AE tokens increase scalability of Keystone by
removing token persistence. This provider has been discussed prior
to, and at the Paris summit [2]. There is an implementation that is
currently up for review [3], that was built off a POC. Based on the
POC, there has been some performance analysis done with respect to
the token formats available in Keystone (UUID, PKI, PKIZ, AE) [4].

The Keystone team spent some time discussing limitations of the
current POC implementation at the mid-cycle. One case that still
needs to be addressed (and is currently being worked), is federated
tokens. When requesting unscoped federated tokens, the token contains
unbound groups which would need to be carried in the token. This case
can be handled by AE tokens but it would be possible for an unscoped
federated AE token to exceed an acceptable AE token length (i.e. > 255
characters). Long story short, a federation migration could be
used to ensure federated AE tokens never exceed a certain length.

Feel free to leave your comments on the AE Token spec.

Thanks!

Lance

[1] https://review.openstack.org/#/c/130050/
[2] https://etherpad.openstack.org/p/kilo-keystone-authorization
[3] https://review.openstack.org/#/c/145317/
[4] http://dolphm.com/benchmarking-openstack-keystone-token-formats/
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe:
openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



I am for granting this exception as long as it’s clear that the
following is clear/true:

* All current use-cases for tokens (including federation) will be
supported by the new token provider.

* The federation tokens being possibly over 255 characters can be
addressed in the future if they are not addressed here (a “federation
migration” does not clearly state what is meant).


I think the length of the token is not a real issue.  We need to keep
them within header lengths.  That is 8k.  Anything smaller than that
will work.


I'd like to respectfully disagree here. Large tokens can dramatically 
increase the overhead for users of Swift with small objects since the 
token must be passed along with every request.


For example, I have a small static web site: 68 files, mean file size 
5508 bytes, median 636 bytes, total 374517 bytes. (It's an actual site; 
these are genuine data.)


If I upload these things to Swift using a UUID token, then I incur maybe 
400 bytes of overhead per file in the HTTP request, which is a 7.3% 
bloat. On the other hand, if the token + other headers is 8K, then I'm 
looking at 149% bloat, so I've more than doubled my transfer 
requirements just from tokens. :/
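
(The arithmetic, for anyone who wants to check it:)

    files = 68
    total_bytes = 374517

    uuid_overhead = files * 400    # ~400 bytes of headers per request
    big_overhead = files * 8192    # token + other headers at ~8K per request

    print(uuid_overhead / float(total_bytes))  # ~0.073  ->  7.3% bloat
    print(big_overhead / float(total_bytes))   # ~1.49   -> 149% bloat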


I think that, for users of Swift and any other OpenStack data-plane 
APIs, token size is a definite concern. I am very much in favor of 
anything that shrinks token sizes while keeping the scalability benefits 
of PKI tokens.


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [keystone] SPFE: Authenticated Encryption (AE) Tokens

2015-02-16 Thread Samuel Merritt

On 2/16/15 11:48 AM, Lance Bragstad wrote:



On Mon, Feb 16, 2015 at 1:21 PM, Samuel Merritt s...@swiftstack.com wrote:

On 2/14/15 9:49 PM, Adam Young wrote:

On 02/13/2015 04:19 PM, Morgan Fainberg wrote:

On February 13, 2015 at 11:51:10 AM, Lance Bragstad
(lbrags...@gmail.com) wrote:

Hello all,


I'm proposing the Authenticated Encryption (AE) Token
specification
[1] as an SPFE. AE tokens increases scalability of
Keystone by
removing token persistence. This provider has been
discussed prior
to, and at the Paris summit [2]. There is an
implementation that is
currently up for review [3], that was built off a POC.
Based on the
POC, there has been some performance analysis done with
respect to
the token formats available in Keystone (UUID, PKI,
PKIZ, AE) [4].

The Keystone team spent some time discussing limitations
of the
current POC implementation at the mid-cycle. One case
that still
needs to be addressed (and is currently being worked),
is federated
tokens. When requesting unscoped federated tokens, the
token contains
unbound groups which would need to be carried in the
token. This case
can be handled by AE tokens but it would be possible for
an unscoped
federated AE token to exceed an acceptable AE token
length (i.e. > 255
characters). Long story short, a federation
migration could be
used to ensure federated AE tokens never exceed a
certain length.

Feel free to leave your comments on the AE Token spec.

Thanks!

Lance

[1] https://review.openstack.org/#/c/130050/
[2] https://etherpad.openstack.org/p/kilo-keystone-authorization
[3] https://review.openstack.org/#/c/145317/
[4] http://dolphm.com/benchmarking-openstack-keystone-token-formats/

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



I am for granting this exception as long as it’s clear that the
following is clear/true:

* All current use-cases for tokens (including federation)
will be
supported by the new token provider.

* The federation tokens being possibly over 255 characters
can be
addressed in the future if they are not addressed here (a
“federation
migration” does not clearly state what is meant).

I think the length of the token is not a real issue.  We need to
keep
them within header lengths.  That is 8k.  Anything smaller than that
will work.


I'd like to respectfully disagree here. Large tokens can
dramatically increase the overhead for users of Swift with small
objects since the token must be passed along with every request.

For example, I have a small static web site: 68 files, mean file
size 5508 bytes, median 636 bytes, total 374517 bytes. (It's an
actual site; these are genuine data.)

If I upload these things to Swift using a UUID token, then I incur
maybe 400 bytes of overhead per file in the HTTP request, which is a
7.3% bloat. On the other hand, if the token + other headers is 8K,
then I'm looking at 149% bloat, so I've more than doubled my
transfer requirements just from tokens. :/

I think that, for users of Swift and any other OpenStack data-plane
APIs, token size is a definite concern. I am very much in favor of
anything that shrinks token sizes while keeping the scalability
benefits of PKI tokens.


Ideally, what's

Re: [openstack-dev] Reasoning behind my vote on the Go topic

2016-06-07 Thread Samuel Merritt

On 6/7/16 12:00 PM, Monty Taylor wrote:

[snip]


I'd rather see us focus energy on Python3, asyncio and its pluggable
event loops. The work in:

http://magic.io/blog/uvloop-blazing-fast-python-networking/

is a great indication in an actual apples-to-apples comparison of what
can be accomplished in python doing IO-bound activities by using modern
Python techniques. I think that comparing python2+eventlet to a fresh
rewrite in Go isn't 100% of the story. A TON of work has gone in to
Python that we're not taking advantage of because we're still supporting
Python2. So what I'd love to see in the realm of comparative
experimentation is to see if the existing Python we already have can be
leveraged as we adopt newer and more modern things.


Asyncio, eventlet, and other similar libraries are all very good for 
performing asynchronous IO on sockets and pipes. However, none of them 
help for filesystem IO. That's why Swift needs a golang object server: 
the go runtime will keep some goroutines running even though some other 
goroutines are performing filesystem IO, whereas filesystem IO in Python 
blocks the entire process, asyncio or no asyncio.
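
(The standard asyncio workaround is to push the blocking read onto a
thread pool via loop.run_in_executor(); a minimal sketch is below. But
that just moves the blocking call onto an OS thread for each outstanding
read, which isn't the same as the lightweight scheduling goroutines give
you.)

    import asyncio

    async def read_chunk(path, size=65536):
        loop = asyncio.get_event_loop()
        with open(path, 'rb') as fp:
            # fp.read() blocks whichever thread runs it; run_in_executor
            # keeps it off the event loop so sockets stay serviced.
            return await loop.run_in_executor(None, fp.read, size)

    data = asyncio.get_event_loop().run_until_complete(read_chunk('/etc/hosts'))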


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [swift] Can swift identify user agent come from chrome browser?

2016-03-19 Thread Samuel Merritt

On 3/17/16 1:53 AM, Linpeimin wrote:

Hello, everyone.

I have configured a web server (tengine) as a proxy server for swift, and
sent a GET request via a Chrome browser in order to access a swift
container. From the log files, it can be seen that the web server has passed
the request to swift, but swift returns an unauthorized error. The log files
record this:

Access logs of *tengine:*

10.74.167.183 - - [17/Mar/2016:16:30:03 +] "GET /auth/v1.0 HTTP/1.1"
401 131 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36
(KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36" "-"

10.74.167.183 - - [17/Mar/2016:16:30:03 +] "GET /favicon.ico
HTTP/1.1" 401 649 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64)
AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72
Safari/537.36" "-"

Proxy logs of *swift*:

Mar 17 15:12:27 localhost journal: proxy-logging 10.74.167.183
192.168.1.5 17/Mar/2016/15/12/27 GET /auth/v1.0 HTTP/1.0 401 -
Mozilla/5.0%20%28Windows%20NT%206.1%3B%20WOW64%29%20AppleWebKit/537.36%20%28KHTML%2C%20like%20Gecko%29%20Chrome/28.0.1500.72%20Safari/537.36
- - 131 - tx21863381504d47098a73846d621fcbd0 - 0.0003 -

Mar 17 15:12:27 localhost journal: tempauth 10.74.167.183 192.168.1.5
17/Mar/2016/15/12/27 GET /auth/v1.0 HTTP/1.0 401 -
Mozilla/5.0%20%28Windows%20NT%206.1%3B%20WOW64%29%20AppleWebKit/537.36%20%28KHTML%2C%20like%20Gecko%29%20Chrome/28.0.1500.72%20Safari/537.36
- - - - tx21863381504d47098a73846d621fcbd0 - 0.0013


It's the same value, just URL-encoded. Swift's access log is formatted 
as one record per line, with fields delimited by spaces. Since the 
user-agent string may contain spaces, it's escaped before logging so 
that the log formatting isn't broken.
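
You can reproduce the escaping yourself; Swift uses something equivalent
to urllib's quote() for its log fields:

    try:
        from urllib.parse import quote, unquote  # Python 3
    except ImportError:
        from urllib import quote, unquote        # Python 2

    ua = ('Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
          '(KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36')
    print(quote(ua))
    # Mozilla/5.0%20%28Windows%20NT%206.1%3B%20WOW64%29%20AppleWebKit/537.36...
    print(unquote('Mozilla/5.0%20%28Windows%20NT%206.1%3B%20WOW64%29'))
    # Mozilla/5.0 (Windows NT 6.1; WOW64)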



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tc] supporting Go

2016-05-10 Thread Samuel Merritt

On 5/9/16 5:21 PM, Robert Collins wrote:

On 10 May 2016 at 10:54, John Dickinson  wrote:

On 9 May 2016, at 13:16, Gregory Haynes wrote:


This is a bit of an aside but I am sure others are wondering the same
thing - Is there some info (specs/etherpad/ML thread/etc) that has more
details on the bottleneck you're running in to? Given that the only
clients of your service are the public facing DNS servers I am now even
more surprised that you're hitting a python-inherent bottleneck.


In Swift's case, the summary is that it's hard[0] to write a network
service in Python that shuffles data between the network and a block
device (hard drive) and effectively utilizes all of the hardware
available. So far, we've done very well by fork()'ing child processes,

...

Initial results from a golang reimplementation of the object server in
Python are very positive[1]. We're not proposing to rewrite Swift
entirely in Golang. Specifically, we're looking at improving object
replication time in Swift. This service must discover what data is on
a drive, talk to other servers in the cluster about what they have,
and coordinate any data sync process that's needed.

[0] Hard, not impossible. Of course, given enough time, we can do
 anything in a Turing-complete language, right? But we're not talking
 about possible, we're talking about efficient tools for the job at
 hand.

...

I'm glad you're finding you can get good results in (presumably)
clean, understandable code.

Given go's historically poor performance with multiple cores
(https://golang.org/doc/faq#Why_GOMAXPROCS) I'm going to presume the
major advantage is in the CSP programming model - something that
Twisted does very well: and frustratingly we've had numerous
discussions from folk in the Twisted world who see the pain we have
and want to help, but as a community we've consistently stayed with
eventlet, which has a threaded programming model - and threaded models
are poorly suited for the case here.


At its core, the problem is that filesystem IO can take a surprisingly 
long time, during which the calling thread/process is blocked, and 
there's no good asynchronous alternative.


Some background:

With Eventlet, when your greenthread tries to read from a socket and the 
socket is not readable, then recvfrom() returns -1/EWOULDBLOCK; then, 
the Eventlet hub steps in, unschedules your greenthread, finds an 
unblocked one, and lets it proceed. It's pretty good at servicing a 
bunch of concurrent connections and keeping the CPU busy.


On the other hand, when the socket is readable, then recvfrom() returns 
quickly (a few microseconds). The calling process was technically 
blocked, but the syscall is so fast that it hardly matters.


Now, when your greenthread tries to read from a file, that read() call 
doesn't return until the data is in your process's memory. This can take 
a surprisingly long time. If the data isn't in buffer cache and the 
kernel has to go fetch it from a spinning disk, then you're looking at a 
seek time of ~7 ms, and that's assuming there are no other pending 
requests for the disk.


There's no EWOULDBLOCK when reading from a plain file, either. If the 
file pointer isn't at EOF, then the calling process blocks until the 
kernel fetches data for it.


Back to Swift:

The Swift object server basically does two things: it either reads from 
a disk and writes to a socket or vice versa. There's a little HTTP 
parsing in there, but the vast majority of the work is shuffling bytes 
between network and disk. One Swift object server can service many 
clients simultaneously.


The problem is those pauses due to read(). If your process is servicing 
hundreds of clients reading from and writing to dozens of disks (in, 
say, a 48-disk 4U server), then all those little 7 ms waits are pretty 
bad for throughput. Now, a lot of the time, the kernel does some 
readahead so your read() calls can quickly return data from buffer 
cache, but there are still lots of little hitches.


But wait: it gets worse. Sometimes a disk gets slow. Maybe it's got a 
lot of pending IO requests, maybe its filesystem is getting close to 
full, or maybe the disk hardware is just starting to get flaky. For 
whatever reason, IO to this disk starts taking a lot longer than 7 ms on 
average; think dozens or hundreds of milliseconds. Now, every time your 
process tries to read from this disk, all other work stops for quite a 
long time. The net effect is that the object server's throughput 
plummets while it spends most of its time blocked on IO from that one 
slow disk.


Now, of course there's things we can do. The obvious one is to use a 
couple of IO threads per disk and push the blocking syscalls out 
there... and, in fact, Swift did that. In commit b491549, the object 
server gained a small threadpool for each disk[1] and started doing its 
IO there.
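
(For reference, the shape of that change is roughly the sketch below: wrap
the blocking filesystem calls so they run in native threads and the
eventlet hub can keep scheduling other greenthreads. This is a simplified
illustration using eventlet's tpool, not the actual code from that commit.)

    from eventlet import tpool

    def read_obj_chunk(fp, size=65536):
        # Called directly, fp.read() would block the whole process; run it
        # in a native thread so other greenthreads keep making progress.
        return tpool.execute(fp.read, size)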


This worked pretty well for avoiding the slow-disk problem. Requests 
that touched the slow disk would back up, 

Re: [openstack-dev] [tc] supporting Go

2016-05-11 Thread Samuel Merritt

On 5/11/16 7:09 AM, Thomas Goirand wrote:

On 05/10/2016 09:56 PM, Samuel Merritt wrote:

On 5/9/16 5:21 PM, Robert Collins wrote:

On 10 May 2016 at 10:54, John Dickinson <m...@not.mn> wrote:

On 9 May 2016, at 13:16, Gregory Haynes wrote:


This is a bit of an aside but I am sure others are wondering the same
thing - Is there some info (specs/etherpad/ML thread/etc) that has more
details on the bottleneck you're running in to? Given that the only
clients of your service are the public facing DNS servers I am now even
more surprised that you're hitting a python-inherent bottleneck.


In Swift's case, the summary is that it's hard[0] to write a network
service in Python that shuffles data between the network and a block
device (hard drive) and effectively utilizes all of the hardware
available. So far, we've done very well by fork()'ing child processes,

...

Initial results from a golang reimplementation of the object server in
Python are very positive[1]. We're not proposing to rewrite Swift
entirely in Golang. Specifically, we're looking at improving object
replication time in Swift. This service must discover what data is on
a drive, talk to other servers in the cluster about what they have,
and coordinate any data sync process that's needed.

[0] Hard, not impossible. Of course, given enough time, we can do
 anything in a Turing-complete language, right? But we're not talking
 about possible, we're talking about efficient tools for the job at
 hand.

...

I'm glad you're finding you can get good results in (presumably)
clean, understandable code.

Given go's historically poor performance with multiple cores
(https://golang.org/doc/faq#Why_GOMAXPROCS) I'm going to presume the
major advantage is in the CSP programming model - something that
Twisted does very well: and frustratingly we've had numerous
discussions from folk in the Twisted world who see the pain we have
and want to help, but as a community we've consistently stayed with
eventlet, which has a threaded programming model - and threaded models
are poorly suited for the case here.


At its core, the problem is that filesystem IO can take a surprisingly
long time, during which the calling thread/process is blocked, and
there's no good asynchronous alternative.

Some background:

With Eventlet, when your greenthread tries to read from a socket and the
socket is not readable, then recvfrom() returns -1/EWOULDBLOCK; then,
the Eventlet hub steps in, unschedules your greenthread, finds an
unblocked one, and lets it proceed. It's pretty good at servicing a
bunch of concurrent connections and keeping the CPU busy.

On the other hand, when the socket is readable, then recvfrom() returns
quickly (a few microseconds). The calling process was technically
blocked, but the syscall is so fast that it hardly matters.

Now, when your greenthread tries to read from a file, that read() call
doesn't return until the data is in your process's memory. This can take
a surprisingly long time. If the data isn't in buffer cache and the
kernel has to go fetch it from a spinning disk, then you're looking at a
seek time of ~7 ms, and that's assuming there are no other pending
requests for the disk.

There's no EWOULDBLOCK when reading from a plain file, either. If the
file pointer isn't at EOF, then the calling process blocks until the
kernel fetches data for it.

Back to Swift:

The Swift object server basically does two things: it either reads from
a disk and writes to a socket or vice versa. There's a little HTTP
parsing in there, but the vast majority of the work is shuffling bytes
between network and disk. One Swift object server can service many
clients simultaneously.

The problem is those pauses due to read(). If your process is servicing
hundreds of clients reading from and writing to dozens of disks (in,
say, a 48-disk 4U server), then all those little 7 ms waits are pretty
bad for throughput. Now, a lot of the time, the kernel does some
readahead so your read() calls can quickly return data from buffer
cache, but there are still lots of little hitches.

But wait: it gets worse. Sometimes a disk gets slow. Maybe it's got a
lot of pending IO requests, maybe its filesystem is getting close to
full, or maybe the disk hardware is just starting to get flaky. For
whatever reason, IO to this disk starts taking a lot longer than 7 ms on
average; think dozens or hundreds of milliseconds. Now, every time your
process tries to read from this disk, all other work stops for quite a
long time. The net effect is that the object server's throughput
plummets while it spends most of its time blocked on IO from that one
slow disk.

Now, of course there's things we can do. The obvious one is to use a
couple of IO threads per disk and push the blocking syscalls out
there... and, in fact, Swift did that. In commit b491549, the object
server gained a small threadpool for each disk[1] and started doing its
IO there.

This worked pretty well for avoiding the slow-disk problem. Re