Hey Kota,

We’re currently using the following versions:

```
# Download RiakCS
# Version: 1.4.5
# OS: Ubuntu 12.04 (Precise) AMD64
curl -O http://s3.amazonaws.com/downloads.basho.com/riak-cs/1.4/1.4.5/ubuntu/precise/riak-cs_1.4.5-1_amd64.deb

# Download Riak
# Version: 1.4.8
# OS: Ubuntu 12.04 (Precise) AMD64
curl -O http://s3.amazonaws.com/downloads.basho.com/riak/1.4/1.4.8/ubuntu/precise/riak_1.4.8-1_amd64.deb
```

I checked our RiakCS app.config and `fold_objects_for_list_keys` is set to false. 
What impact would flipping it to true have on my cluster? Would I simply 
update the app.config and restart RiakCS?

As for the garbage-collection consideration: the slow performance has been 
happening consistently over the span of a week (since we first noticed it; we 
don’t often list buckets). I suspect it’s not a case of large numbers of 
deleted objects, since essentially all data going into that bucket is 
write-once (we convert PDF pages to .JPGs and PUT them in that bucket; the only 
time overwrites occur is if we manually re-trigger the processing script on a 
specific document).

Adding [email protected] as we have another thread going on about this same 
topic, I figured we could merge the discussion to reduce duplicate effort here. 

Alex Millar, CTO  
Office: 1-800-354-8010 ext. 704  
Mobile: 519-729-2539  
GoBonfire.com

From: Kota Uenishi <[email protected]>
Reply: Kota Uenishi <[email protected]>
Date: August 18, 2014 at 10:03:40 PM
To: Alex Millar <[email protected]>
Cc: Charlie Voiselle <[email protected]>, Tad Bickford 
<[email protected]>, Riak-Users <[email protected]>, Brandon Noad 
<[email protected]>
Subject: Re: Fwd: RiakCS 504 Timeout on s3cmd for certain keys

Alex,

Riak CS 1.4.5 and 1.5.0 include many improvements made after those articles you 
linked were written; Riak CS no longer uses Riak's bucket listing, but instead 
uses Riak's internal fold API for more efficient listing. What version of Riak 
CS are you using? Please make sure you're on one of those versions and that the 
line `{fold_objects_for_list_keys, true},` is present in the riak_cs section of 
app.config (assuming the rest of the Riak configuration is correct).
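For reference, a minimal sketch of the relevant part of the riak_cs section 
(your app.config will contain many other entries; only this one line matters 
here, and the path may differ on your nodes):

```
%% /etc/riak-cs/app.config -- riak_cs section, sketch only
{riak_cs, [
    %% use the more efficient fold-based object listing
    {fold_objects_for_list_keys, true},
    %% ...other settings unchanged...
]}
```

Riak CS reads app.config at startup, so each Riak CS node needs a restart to 
pick up the change.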

> Based on this I’m thinking that cost of this type of query is only going to 
>get worse over time as we add more keys to this bucket (unless secondary 
>indexes can be added). Or am I totally out to lunch here and there’s some 
>other underlying problem?

The strange part is s3cmd. Riak CS has an incremental bucket-listing API that 
requires clients to iterate 1000 objects (or common prefixes) at a time, but 
s3cmd iterates over the entire specified bucket before printing anything. You 
can observe how s3cmd and Riak CS interact if you specify the '-d' option like 
this:

```
s3cmd -d -c yours.s3cfg ls -r s3://yourbucket/yourdir/
```

I would not expect Riak CS's listing API to be slow enough to need 5 seconds 
(or, say, >10 seconds) per request, because each request returns only 1000 
objects.

There is another possible cause of the slow query: if you had many (say, more 
than ten thousand) deleted objects in the same bucket, it could slow down each 
1000-object page of the listing. This will eventually resolve itself as Riak 
CS's garbage collection removes deleted manifests, which for now are just 
marked as deleted (and correctly ignored).
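If it helps, the GC cadence is controlled from the riak_cs section of 
app.config; a sketch with the default values (setting names per the Riak CS 
documentation, values illustrative):

```
{riak_cs, [
    %% seconds a deleted object must age before GC may collect it
    {leeway_seconds, 86400},
    %% seconds between runs of the GC daemon
    {gc_interval, 900},
    %% ...other settings unchanged...
]}
```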

[1] 
http://www.quora.com/Riak/Is-it-really-expensive-for-Riak-to-list-all-buckets-Why

On Thu, Aug 14, 2014 at 6:05 AM, Alex Millar <[email protected]> wrote:
Good afternoon Charlie,

So the issue we’re having is only with bucket listing.

```
alxndrmlr@alxndrmlr-mbp $ time s3cmd -c .s3cfg-riakcs-admin ls s3://bonfirehub-resources-can-east-doc-conversion
                       DIR   s3://bonfirehub-resources-can-east-doc-conversion/organizations/

real    2m0.747s
user    0m0.076s
sys     0m0.030s
```

whereas…

```
alxndrmlr@alxndrmlr-mbp $ time s3cmd -c .s3cfg-riakcs-admin ls s3://bonfirehub-resources-can-east-doc-conversion/organizations/OrganizationID-1/documents/proposals
                       DIR   s3://bonfirehub-resources-can-east-doc-conversion/organizations/OrganizationID-1/documents/proposals/

real    0m10.262s
user    0m0.075s
sys     0m0.028s
```

This bucket contains a lot of very small files (basically, for each PDF we 
receive, we split it into one .JPG per page and store them here). Based on my 
latest counts, we have around 170,000 .JPG files in that bucket.

Here’s a snippet from the HAProxy log for the 504 timeouts…

```
Aug 12 16:01:34 localhost.localdomain haproxy[4718]: 192.0.223.236:48457 [12/Aug/2014:16:01:24.454] riak_cs~ riak_cs_backend/riak3 161/0/0/-1/10162 504 194 - - sH-- 0/0/0/0/0 0/0 {bonfirehub-resources-can-east-doc-conversion.bf-riakcs.com} "GET /?delimiter=/ HTTP/1.1"
```
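Reading that line against the HAProxy HTTP log format: the five slash-separated 
numbers are the timers Tq/Tw/Tc/Tr/Tt in milliseconds, and the `sH--` 
termination state means the server-side timeout expired while HAProxy was still 
waiting for response headers. A quick way to split the timer field out (plain 
awk, using the value copied from the line above):

```shell
# Split HAProxy's Tq/Tw/Tc/Tr/Tt timer field (ms). Tr = -1 means no response
# arrived before "timeout server" fired; Tt = 10162 ms is the total session time.
echo '161/0/0/-1/10162' | awk -F/ '{printf "Tq=%s Tw=%s Tc=%s Tr=%s Tt=%s\n", $1, $2, $3, $4, $5}'
# -> Tq=161 Tw=0 Tc=0 Tr=-1 Tt=10162
```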

I’ve put together a video showing the `top` output on each of the 5 Riak nodes 
while running $ time s3cmd -c .s3cfg-riakcs-admin ls 
s3://bonfirehub-resources-can-east-doc-conversion

https://dl.dropboxusercontent.com/u/5723659/RiakCS%20ls%20monitoring%20results.mov

Now I’ve had a hunch this is just a fundamentally expensive operation which 
exceeds the 5000ms response time threshold set in our HAProxy config (which I 
raised during the video to illustrate what’s going on). After reading 
http://www.quora.com/Riak/Is-it-really-expensive-for-Riak-to-list-all-buckets-Why
 and http://www.paperplanes.de/2011/12/13/list-all-of-the-riak-keys.html I’m 
feeling like this is just a fundamental issue with the data structure in Riak. 
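In the meantime I’m considering permanently raising the server timeout for the 
Riak CS backend; a sketch (backend name taken from the log line above, value 
only illustrative):

```
# haproxy.cfg -- sketch only; pick a value above the worst-case listing time
backend riak_cs_backend
    timeout server 60s
```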

Based on this I’m thinking that cost of this type of query is only going to get 
worse over time as we add more keys to this bucket (unless secondary indexes 
can be added). Or am I totally out to lunch here and there’s some other 
underlying problem?

I’ve cc’d the mailing list on this as suggested.

Alex Millar, CTO
Office: 1-800-354-8010 ext. 704
Mobile: 519-729-2539  
GoBonfire.com

From: Charlie Voiselle <[email protected]>
Reply: Charlie Voiselle <[email protected]>
Date: August 13, 2014 at 10:36:51 AM
To: Alex Millar <[email protected]>
Cc: Tad Bickford <[email protected]>
Subject: Fwd: RiakCS 504 Timeout on s3cmd for certain keys



_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com




--
Kota UENISHI / @kuenishi
Basho Japan KK
