Re: Retrieving an S3 Object using Provider Specific APIs
Hi Jeremy,

Apologies for the late reply. I'm not very familiar with the BlobStore APIs, but I'll try to help with how you can instantiate them. I'll also fix the docs on the site as soon as I can.

You can use the ContextBuilder to create a view [1], such as the BlobStore one, that provides portable interfaces across multiple cloud providers, or to create the provider APIs directly by calling, say, "buildApi(AWSS3Api.class)".

I'd suggest you create the BlobStore view to keep your code as portable as possible and get the provider-specific API only when you need it. Views are just... views :) They delegate operations to the underlying provider API while hiding its complexity from you. When you need to access the API, you can use the "unwrapApi" method on the context, like "context.unwrapApi(AWSS3Api.class)".

HTH!

I.

[1] https://jclouds.apache.org/start/concepts/

On 24/9/2015 0:28, "Jeremy Glesner" wrote:

> Hello,
>
> We're using jclouds 1.9.1 for working with S3. I am new to jclouds, and
> trying to get up to speed on the API. Apologies in advance for the newbie
> questions. Based on our use cases, which include assigning ACLs to
> objects, it appears that I need to use the provider-specific APIs. However,
> I'm struggling to make use of those. The example on the jclouds site
> (https://jclouds.apache.org/guides/aws/) refers to
> context.getProviderSpecificContext().getApi(), while other examples on the
> web point to RestContext, both of which appear to be deprecated. I've seen
> some conversations about unwrap() replacing getProviderSpecificContext(),
> but also seen references in the javadocs to using ContextBuilder to
> assemble APIs. Would greatly appreciate being pointed in the right
> direction.
>
> Could you please provide, or point me to, a current example that:
>
> 1. Demonstrates how to set up an S3Client and S3Object that leverages the
>    AWS S3 provider-specific API,
> 2. Connects to an object via a "bucket" and "key",
> 3. Retrieves that object, recognizing that the object was stored as a
>    byte[] payload,
> 4. Reconstitutes the byte[] from the payload object.
>
> I would *greatly* appreciate any help.
>
> Thanks much,
> Jeremy
>
> Jeremy M Glesner
> Chief Technology Officer
> Berico Technologies, LLC.
>
> 11130 Sunrise Valley Drive, Suite 300
> Reston, VA 20191
>
> 703.731.6984 (m)
> 703.390.9926 x2014 (o)
> www.bericotechnologies.com
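To illustrate the approach Ignasi describes, here is a minimal sketch of creating a BlobStore view, reading an object stored as a byte[] payload, and unwrapping the provider-specific API. Bucket name, key, and credential variables are placeholders; it needs the jclouds 1.9.x jars and real AWS credentials to run.

```java
import java.io.InputStream;

import org.jclouds.ContextBuilder;
import org.jclouds.aws.s3.AWSS3Api;
import org.jclouds.blobstore.BlobStore;
import org.jclouds.blobstore.BlobStoreContext;
import org.jclouds.blobstore.domain.Blob;

import com.google.common.io.ByteStreams;

public class GetS3ObjectSketch {
    public static void main(String[] args) throws Exception {
        String accessKey = "YOUR_ACCESS_KEY";   // placeholder
        String secretKey = "YOUR_SECRET_KEY";   // placeholder

        // Build the portable BlobStore view for the aws-s3 provider.
        BlobStoreContext context = ContextBuilder.newBuilder("aws-s3")
                .credentials(accessKey, secretKey)
                .buildView(BlobStoreContext.class);
        try {
            // Portable path: fetch the blob by bucket and key.
            BlobStore blobStore = context.getBlobStore();
            Blob blob = blobStore.getBlob("my-bucket", "my-key");

            // Reconstitute the byte[] from the payload.
            byte[] data;
            try (InputStream in = blob.getPayload().openStream()) {
                data = ByteStreams.toByteArray(in);
            }
            System.out.println("read " + data.length + " bytes");

            // Provider-specific path, only where you actually need it
            // (e.g. for ACL operations).
            AWSS3Api s3Api = context.unwrapApi(AWSS3Api.class);
        } finally {
            context.close();
        }
    }
}
```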
Re: aws-s3 etag when using multipart
S3 emits different ETags for single- and multi-part uploads. You can use both kinds of ETag for future conditional GET and PUT operations, but only a single-part upload returns an MD5 hash. A multi-part upload returns an opaque token, likely a hash of the per-part hashes combined with the number of parts.

You can ensure in-transit data integrity by comparing the ETag or by providing a Content-MD5 for single-part uploads. Multi-part is more complicated: each upload-part call can have a Content-MD5, and each call returns the MD5 hash of that part. jclouds supplies the per-part ETag hashes to the final complete-multipart-upload call, but does not provide a way to check the results of the per-part calls or a way to supply a Content-MD5 for each.

Fixing this requires calculating the MD5 in BaseBlobStore.putMultipartBlob. We could either calculate it beforehand for repeatable Payloads or compare afterwards for InputStream payloads. There is some subtlety here for providers like Azure which do not return an MD5 ETag. We would likely want to guard this with a property, since not every caller wants to pay the CPU overhead. Would you like to take a look at this?

If you want a purely application-level fix, look at calling the BlobStore methods initiateMultipartUpload, uploadMultipartPart, and completeMultipartUpload. jclouds uses these internally to implement putBlob(new PutOptions.multipart()).

On Tue, Sep 22, 2015 at 05:10:18PM +0200, Veit Guna wrote:

> Hi.
>
> We're using jclouds 1.9.1 with the aws-s3 provider. Until now, we have used
> the returned ETag of blobStore.putBlob() to manually verify it
> against a client-provided hash. That worked quite well for us. But since we
> are hitting the 5GB limit of S3, we switched to the multipart() upload
> that jclouds offers. Now putBlob() returns something like
> 90644a2d0c7b74483f8d2036f3e29fc5-2, which of course
> fails our validation.
>
> I guess this is due to the fact that each chunk is hashed separately and
> sent to S3. So there is no complete hash over the whole payload that could
> be returned by putBlob() - is that correct?
>
> During my research I stumbled across this:
>
> https://github.com/jclouds/jclouds/commit/f2d897d9774c2c0225c199c7f2f46971637327d6
>
> Now I'm wondering what the contract of putBlob() is. Should it only return
> valid ETags/hashes and otherwise return null?
>
> I'm asking because otherwise I would have to start parsing and
> validating the returned value myself and skip any
> validation when it isn't a normal MD5 hash. My guess is that this is the
> hash of the last transferred chunk plus
> the chunk number?
>
> Maybe someone can shed some light on this :).
>
> Thanks
> Veit

--
Andrew Gaul
http://gaul.org/
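The "hash of hashes" scheme Andrew mentions is widely observed but not officially documented by AWS: the multipart ETag appears to be the MD5 of the concatenated binary per-part MD5 digests, followed by "-" and the part count, which matches the "<32 hex chars>-2" shape Veit sees. A self-contained sketch of that presumed scheme (class and method names are mine, not jclouds APIs):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class MultipartEtagDemo {
    /** Hex-encode a byte array. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    /**
     * Compute the presumed S3 multipart ETag: the MD5 of the
     * concatenated per-part MD5 digests, plus "-" and the part count.
     */
    static String multipartEtag(byte[][] parts) {
        try {
            MessageDigest outer = MessageDigest.getInstance("MD5");
            for (byte[] part : parts) {
                MessageDigest inner = MessageDigest.getInstance("MD5");
                outer.update(inner.digest(part)); // feed the raw 16-byte digest
            }
            return hex(outer.digest()) + "-" + parts.length;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[][] parts = {
            "part one".getBytes(StandardCharsets.UTF_8),
            "part two".getBytes(StandardCharsets.UTF_8)
        };
        // Prints a 32-hex-digit token with a "-2" suffix, like the
        // value putBlob() returned for a two-part upload.
        System.out.println(multipartEtag(parts));
    }
}
```

Note that such a token cannot be compared against a client-provided MD5 of the whole payload, which is why a per-part Content-MD5 (or the server-side change discussed above) is needed for end-to-end integrity checks.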