Re: Retrieving an S3 Object using Provider Specific APIs

2015-09-29 Thread Ignasi Barrera
Hi Jeremy,

Apologies for the late reply. I'm not very familiar with the BlobStore
APIs, but I'll try to help with how you can instantiate them. I'll also fix
the docs in the site as soon as I can.

You can use the ContextBuilder to create a view [1], such as the BlobStore
one, which provides portable interfaces across multiple cloud providers, or
to create the provider APIs directly by calling, say,
"buildApi(AWSS3Api.class)".

I'd suggest you create the BlobStore view to keep your code as portable as
possible, and get the provider-specific API only when you need it. Views are
just... views :) that delegate operations to the underlying provider API
while hiding its complexity from you. When you need to access the API, you can
use the "unwrapApi" method from the context like:
"context.unwrapApi(AWSS3Api.class)" to get it.

HTH!

I.

[1] https://jclouds.apache.org/start/concepts/
El 24/9/2015 0:28, "Jeremy Glesner" 
escribió:

> Hello,
>
> We're using jclouds 1.9.1 for working with S3.  I am new to jclouds, and
> trying to get up to speed on the API. Apologies in advance for the newbie
> questions.  Based on our use cases, which include assigning ACLs to
> objects, it appears that I need to use the Provider specific APIs. However,
> I'm struggling to make use of those.  The example on the jclouds site (
> https://jclouds.apache.org/guides/aws/) refers to
> context.getProviderSpecificContext().getApi(), while other examples on the
> web point to RestContext, both of which appear to be deprecated. I've seen
> some conversations about unwrap() replacing getProviderSpecificContext(),
> but also seen references in the javadocs to using contextBuilder to
> assemble APIs.  Would greatly appreciate being pointed in the right
> direction.
>
> Could you please provide, or point me to a current example that:
> 1. Demonstrates how to set up a S3Client and S3Object that leverages the
> AWS S3 provider specific API,
> 2. Connects to an object via a "bucket" and "key",
> 3. Retrieves that Object ... recognizing that the Object was stored as a
> byte[] payload,
> 4. Reconstitutes the byte[] from the payload object
>
> I would *greatly* appreciate any help.
>
> Thanks much,
> Jeremy
>
> Jeremy M Glesner
> Chief Technology Officer
> Berico Technologies, LLC.
>
> 11130 Sunrise Valley Drive, Suite 300
> Reston, VA 20191
>
> 703.731.6984 (m)
> 703.390.9926 x2014 (o)
> www.bericotechnologies.com
>
>
>


Re: aws-s3 etag when using multipart

2015-09-29 Thread Andrew Gaul
S3 emits different ETags for single- and multi-part uploads.  You can
use both types of ETags for future conditional GET and PUT operations
but only single-part upload returns an MD5 hash.  Multi-part upload
returns an opaque token, which is likely a hash of hashes combined with
the number of parts.

You can ensure data integrity in transit by comparing the ETag or by
providing a Content-MD5 for single-part uploads.  Multi-part is more
complicated; each upload part call can have a Content-MD5 and each call
returns the MD5 hash.  jclouds supplies the per-part ETag hashes to the
final complete multi-part upload call but does not provide a way to
check the results of per-part calls or a way to supply a Content-MD5 for
each.
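To make the "hash of hashes" idea concrete: the commonly observed (but not officially documented) shape of an S3 multipart ETag is an MD5 over the concatenated raw per-part MD5 digests, suffixed with "-" and the part count. A hedged Java sketch:

```java
import java.io.ByteArrayOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class MultipartEtag {

    // Commonly observed (not officially documented) S3 multipart ETag shape:
    // MD5 over the concatenated raw 16-byte per-part MD5 digests, plus "-<partCount>".
    static String expectedMultipartEtag(byte[][] parts) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            ByteArrayOutputStream partDigests = new ByteArrayOutputStream();
            for (byte[] part : parts) {
                // digest() resets the MessageDigest, so it can be reused per part
                partDigests.write(md5.digest(part), 0, 16);
            }
            return hex(md5.digest(partDigests.toByteArray())) + "-" + parts.length;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

This matches the `90644a2d0c7b74483f8d2036f3e29fc5-2` shape quoted below: 32 hex characters, a dash, then the part count.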

Fixing this requires calculating the MD5 in
BaseBlobStore.putMultipartBlob.  We could either calculate it beforehand
for repeatable Payloads or compare afterwards for InputStream payloads.
There is some subtlety to this for providers like Azure which do not
return an MD5 ETag.  We would likely want to guard this with a property
since not every caller wants to pay the CPU overhead.  Would you like to
take a look at this?

If you want a purely application fix, look at calling the BlobStore
methods initiateMultipartUpload, uploadMultipartPart, and
completeMultipartUpload.  jclouds internally uses these to implement
putBlob(new PutOptions.multipart()).
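A hedged sketch of that application-level path. The jclouds calls are shown as comments (they need the jclouds jars on the classpath, and exact signatures may differ by version, so treat them as pseudocode); the local MD5 helper is what you would compare each per-part ETag against:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ManualMultipartSketch {

    // Sketch of the manual multipart flow using the BlobStore methods named above:
    //
    //   MultipartUpload mpu = blobStore.initiateMultipartUpload(
    //       container, blob.getMetadata(), new PutOptions());
    //   List<MultipartPart> parts = new ArrayList<>();
    //   int partNumber = 1;
    //   for (Payload slice : slices) {
    //       MultipartPart part = blobStore.uploadMultipartPart(mpu, partNumber++, slice);
    //       // compare the returned per-part ETag against md5Hex(sliceBytes) here
    //       parts.add(part);
    //   }
    //   String finalEtag = blobStore.completeMultipartUpload(mpu, parts);

    // Local MD5 as a lowercase hex string, to compare against a per-part ETag:
    static String md5Hex(byte[] data) {
        try {
            StringBuilder sb = new StringBuilder();
            for (byte b : MessageDigest.getInstance("MD5").digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```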

On Tue, Sep 22, 2015 at 05:10:18PM +0200, Veit Guna wrote:
> Hi.
>  
> We're using jclouds 1.9.1 with the aws-s3 provider. Until now, we have used
> the returned ETag of blobStore.putBlob() to manually verify it
> against a client-provided hash. That worked quite well for us. But since we
> are hitting the 5 GB limit of S3, we switched to the multipart() upload
> that jclouds offers. But now, putBlob() returns something like
> 90644a2d0c7b74483f8d2036f3e29fc5-2, which of course fails
> our validation.
>  
> I guess this is due to the fact that each chunk is hashed separately and
> sent to S3. So there is no complete hash over the whole payload that could
> be returned by putBlob() - is that correct?
>  
> During my research I stumbled across this:
>  
> https://github.com/jclouds/jclouds/commit/f2d897d9774c2c0225c199c7f2f46971637327d6
>  
> Now I'm wondering what the contract of putBlob() is. Should it only return
> valid ETag/MD5 hashes and otherwise return null?
>  
> I'm asking because otherwise I would have to start parsing and
> validating the returned value myself and skip any
> validation when it isn't a normal MD5 hash. My guess is that this is the
> hash of the last transferred chunk plus
> the chunk number?
>  
> Maybe someone can shed some light on this :).
>  
> Thanks
> Veit
>  

-- 
Andrew Gaul
http://gaul.org/