Re: putBlob with an already existing object

2018-04-05 Thread john . calcote
> multi-part uploads.  Unfortunately Atmos is odd for a number of reasons
> and delete and retry was the best workaround at the time, especially for
> a low-popularity provider.  Some blobstores like Ceph can address this
> issue with conditional PUT but this is not supported elsewhere.

In the long run, the best solution might be to throw a documented exception and 
allow the user to handle it. Sadly, this would mean that everyone would have to 
watch for the exception in all cases now because they don't know what the 
underlying provider semantics are.
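
For illustration only, here is a minimal sketch of what caller-side handling of such an exception could look like. Nothing in it is jclouds API: BlobExistsException, NoOverwriteStore and putOnce are hypothetical names, and the store is an in-memory stand-in for a provider that refuses overwrites. A real provider would more likely expose a content hash in metadata than require fetching the whole object.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical exception: stands in for whatever a provider would document.
class BlobExistsException extends RuntimeException {
    BlobExistsException(String key) {
        super("blob already exists: " + key);
    }
}

// Hypothetical in-memory store that refuses overwrites; not a jclouds type.
class NoOverwriteStore {
    private final ConcurrentMap<String, byte[]> blobs = new ConcurrentHashMap<>();

    void put(String key, byte[] data) {
        // putIfAbsent is atomic, so two racing writers cannot both succeed.
        if (blobs.putIfAbsent(key, data.clone()) != null) {
            throw new BlobExistsException(key);
        }
    }

    byte[] get(String key) {
        byte[] data = blobs.get(key);
        return data == null ? null : data.clone();
    }
}

class IdempotentPut {
    static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Returns true if the blob was written and false if identical content
    // was already stored. Rethrows when the stored content differs, so the
    // caller decides what to do; nothing is deleted behind the caller's back.
    static boolean putOnce(NoOverwriteStore store, String key, byte[] data) {
        try {
            store.put(key, data);
            return true;
        } catch (BlobExistsException e) {
            byte[] existing = store.get(key);
            if (existing != null
                    && MessageDigest.isEqual(sha256(existing), sha256(data))) {
                return false;
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        NoOverwriteStore store = new NoOverwriteStore();
        byte[] payload = {1, 2, 3};
        System.out.println(putOnce(store, "key", payload));
        System.out.println(putOnce(store, "key", payload));
    }
}
```

The sketch treats a repeated put of identical bytes as success and surfaces only the conflicting case, which is the behavior asked about earlier in the thread.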


Re: putBlob with an already existing object

2018-04-05 Thread Andrew Gaul
On Thu, Apr 05, 2018 at 04:03:04PM -, john.calc...@gmail.com wrote:
> Thanks for the quick response Andrew - 
> 
> > The closest analog is AtmosUtils.putBlob which retries on
> > KeyAlreadyExistsException after removing.  Generally the jclouds
> > portable abstraction tries to make all blobstores act the same and uses
> > the native behavior for the providers.  Which blobstore has similar
> > behavior?  I am not sure how we should handle this for almost
> > S3-compatible implementations like Hitachi.
> 
> So, clarifying - you're saying if it gets a KeyAlreadyExistsException, it 
> then deletes the key and retries the put? That seems a bit harsh - what if 
> you're building a distributed system on top of jclouds and you have two 
> cluster nodes racing to put the same key? Would it not be better to at least 
> test the metadata to see if you're trying to overwrite the same data and just 
> silently return ok?

Agreed that this is racy and something
https://issues.apache.org/jira/browse/JCLOUDS- unsuccessfully tried
to address through a newer header that not all implementations support.
Atmos does not return an ETag so we cannot check the same content,
although ETag checking does not always work on S3, for example with
multi-part uploads.  Unfortunately Atmos is odd for a number of reasons
and delete and retry was the best workaround at the time, especially for
a low-popularity provider.  Some blobstores like Ceph can address this
issue with conditional PUT but this is not supported elsewhere.
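
As a rough illustration of a conditional PUT, the sketch below builds a create-only request with the standard HTTP precondition header If-None-Match: *. This is an assumption made for the example: the thread does not say which header any particular blobstore honors, and the URL and the createOnly helper are hypothetical. It uses only the JDK HTTP client types (Java 11+) and does not send anything. A server that implements the precondition answers 412 Precondition Failed when an object already exists at the URI.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class ConditionalPut {
    // Builds a PUT that asks the server to store the body only if nothing
    // exists at objectUri yet. Assumes the server honors If-None-Match on PUT.
    static HttpRequest createOnly(URI objectUri, byte[] body) {
        return HttpRequest.newBuilder(objectUri)
                .header("If-None-Match", "*")
                .PUT(HttpRequest.BodyPublishers.ofByteArray(body))
                .build();
    }

    public static void main(String[] args) {
        // Placeholder endpoint; the request is only constructed, never sent.
        HttpRequest request = createOnly(
                URI.create("http://localhost:8080/bucket/key"),
                new byte[] {1, 2, 3});
        System.out.println(request.method() + " " + request.uri());
        System.out.println(request.headers().map());
    }
}
```

With this approach the server, not the client, decides atomically whether the key is free, which avoids the window between a delete and the retried put.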

-- 
Andrew Gaul
http://gaul.org/


Re: putBlob with an already existing object

2018-04-05 Thread john . calcote
Thanks for the quick response Andrew - 

> The closest analog is AtmosUtils.putBlob which retries on
> KeyAlreadyExistsException after removing.  Generally the jclouds
> portable abstraction tries to make all blobstores act the same and uses
> the native behavior for the providers.  Which blobstore has similar
> behavior?  I am not sure how we should handle this for almost
> S3-compatible implementations like Hitachi.

So, clarifying - you're saying if it gets a KeyAlreadyExistsException, it then 
deletes the key and retries the put? That seems a bit harsh - what if you're 
building a distributed system on top of jclouds and you have two cluster nodes 
racing to put the same key? Would it not be better to at least test the 
metadata to see if you're trying to overwrite the same data and just silently 
return ok?

John


Re: putBlob with an already existing object

2018-04-04 Thread Andrew Gaul
On Thu, Apr 05, 2018 at 02:12:28AM -, john.calc...@gmail.com wrote:
> What is jclouds's general policy with regard to putting a blob to a cloud 
> service where the blob already exists and the cloud provider doesn't allow 
> overwrites?
> 
> Seems like it would be nice to be able to treat the operation like it's an 
> idempotent http PUT, but if the service disallows overwrites, jclouds would 
> receive an exception in this case. Jclouds could then verify that the 
> existing object has the same content and silently return "ok" as if the put 
> worked. 
> 
> However, what happens if the cloud service has an object with the same name 
> and different content? The only way to maintain the idempotent quality would 
> be to silently delete the existing object and try the put again under the 
> covers - this seems imprudent to me and unlikely to be the current 
> functionality.

The closest analog is AtmosUtils.putBlob which retries on
KeyAlreadyExistsException after removing.  Generally the jclouds
portable abstraction tries to make all blobstores act the same and uses
the native behavior for the providers.  Which blobstore has similar
behavior?  I am not sure how we should handle this for almost
S3-compatible implementations like Hitachi.

> P.S. I'd look this stuff up myself if I could only trace my way to the bottom 
> levels of the jclouds code. There's so much interface wrapping going on in 
> there, along with dependency injection, it's nearly impossible to tell where 
> the rubber hits the road. If anyone can provide a hint about how to read the 
> code from user-level to wire-level, I'd really appreciate it.

jclouds uses metaprogramming which allows compact notation but obscures
the intent.  Most of the magic lies in RestAnnotationProcessor if you
want to see how it works.

-- 
Andrew Gaul
http://gaul.org/


putBlob with an already existing object

2018-04-04 Thread john . calcote
What is jclouds's general policy with regard to putting a blob to a cloud 
service where the blob already exists and the cloud provider doesn't allow 
overwrites?

Seems like it would be nice to be able to treat the operation like it's an 
idempotent http PUT, but if the service disallows overwrites, jclouds would 
receive an exception in this case. Jclouds could then verify that the existing 
object has the same content and silently return "ok" as if the put worked. 

However, what happens if the cloud service has an object with the same name and 
different content? The only way to maintain the idempotent quality would be to 
silently delete the existing object and try the put again under the covers - 
this seems imprudent to me and unlikely to be the current functionality.

What really happens? 

Thanks,
John

P.S. I'd look this stuff up myself if I could only trace my way to the bottom 
levels of the jclouds code. There's so much interface wrapping going on in 
there, along with dependency injection, it's nearly impossible to tell where 
the rubber hits the road. If anyone can provide a hint about how to read the 
code from user-level to wire-level, I'd really appreciate it.