On Dec 23 2015, Cliff Stanford <[email protected]> wrote:
> On 23/12/15 19:12, Nikolaus Rath wrote:
>
>> And that's not surprising. If you want to store data in S3, you need to
>> use the S3 backend, not the local backend.
>
> I made the foolish assumption that the stored data format would be the same.
>
>>> (b) is there any way to recover from it?
>>
>> Not sure what you mean. Did you delete the local copy? If so, you can
>> just download the data from S3 using whatever tool you used to upload it
>> and you should be able to mount it using the local backend.
>
> The source data is no longer accessible; it is on a hard disk in a
> disconnected machine in Wales.  I am in Spain.  Downloading and
> re-uploading is not a possibility; the data (5 Terabytes) was uploaded
> by sending a USB disk to AWS in Ireland.  They insist on wiping the
> source disk so I can't even recover from that when it's returned.
>
>> If you want to "convert" a local S3QL file system to an S3 one, you can
>> use the contrib/clone_fs.py script from the S3QL tarball.
>
> I don't suppose there's a way to convert the data in situ?  Is it just
> the metadata that is different or is it all data?  Any suggestions?

S3 supports storing metadata for each storage object together with the
actual data. The closest equivalent on a local filesystem would be
extended attributes, but since these are not always available, the local
backend instead stores the metadata and the data consecutively in the
same file (metadata first, then the payload).
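To make that concrete, this is roughly what "consecutively" means
(untested sketch; it assumes the local backend still prefixes each object
file with a pickled metadata dict, see s3ql/backends/local.py for the
actual serialization):

    # Untested sketch: assumes each local-backend object file starts with a
    # pickled metadata dict, immediately followed by the raw payload.
    import pickle

    def split_local_object(path):
        with open(path, 'rb') as fh:
            meta = pickle.load(fh)   # embedded metadata comes first
            offset = fh.tell()       # payload starts right after it
        return meta, offset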

So in principle an in-place conversion should be possible. Such a tool
would need to read the first kB or so of each storage object to extract
the embedded metadata, and then write it back as proper S3 object
metadata. That leaves the complication that the actual data still starts
at an offset within the storage object, but luckily S3QL already has a
mechanism to handle this (there is a 'payload_offset' metadata entry,
introduced to support file system upgrades). Doing all this would require
a bit of programming though (i.e., I would charge for doing it).
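Just to sketch the idea (completely untested, and the metadata key names
below are made up; a real tool should reuse S3QL's own backend code to
serialize the metadata in whatever format the S3 backend actually
expects): since S3 can rewrite an object's metadata with a server-side
copy onto itself, the conversion would never have to download or
re-upload the payload itself.

    # Untested sketch. Assumes boto3, that the embedded metadata fits in the
    # first few kB of the object, and that it can simply be re-attached as
    # S3 user metadata under made-up key names.
    import io
    import pickle
    import boto3

    s3 = boto3.client('s3')

    def convert_object(bucket, key, peek=4096):
        # Ranged GET: fetch only the head of the object, where the embedded
        # metadata lives.
        head = s3.get_object(Bucket=bucket, Key=key,
                             Range='bytes=0-%d' % (peek - 1))['Body'].read()
        buf = io.BytesIO(head)
        meta = pickle.load(buf)     # fails if the metadata is larger than 'peek'
        offset = buf.tell()         # this is where the actual data starts

        # Hypothetical encoding as S3 user metadata, plus the payload_offset
        # entry so the backend knows where the real payload begins.
        s3_meta = {str(k): str(v) for k, v in meta.items()}
        s3_meta['payload_offset'] = str(offset)

        # Server-side copy onto itself, replacing only the metadata.
        s3.copy_object(Bucket=bucket, Key=key,
                       CopySource={'Bucket': bucket, 'Key': key},
                       Metadata=s3_meta, MetadataDirective='REPLACE')

Since the copy happens entirely on the S3 side, only the first few kB of
each object would ever need to be transferred.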

Can't you use an EC2 instance to do the download + re-upload? I believe
traffic from S3 to EC2 in the same region is free, and you should have
pretty good bandwidth even for 5 TB.


Best,
-Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             »Time flies like an arrow, fruit flies like a Banana.«
