Re: [Duplicity-team] [Merge] lp:~mterry/duplicity/early-catch-498933 into lp:duplicity

2011-08-30 Thread Michael Terry
I'll mark this branch 'rejected' as it has been superseded by the merged 
check-volumes branch.
-- 
https://code.launchpad.net/~mterry/duplicity/early-catch-498933/+merge/72607
Your team duplicity-team is requested to review the proposed merge of 
lp:~mterry/duplicity/early-catch-498933 into lp:duplicity.

___
Mailing list: https://launchpad.net/~duplicity-team
Post to : duplicity-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~duplicity-team
More help   : https://help.launchpad.net/ListHelp


Re: [Duplicity-team] [Merge] lp:~mterry/duplicity/early-catch-498933 into lp:duplicity

2011-08-24 Thread edso
 
 in this case it means we add a new interface like get_info() that backends 
 are free to implement.
 if they do, some extra duplicity functionality kicks in. if not, then it 
 does not ;).
 
 But no reason to invent some complicated thing that all backends have to 
 support.  Why not do get_info now, and if we ever find an actual use for a 
 list of info and we want to allow backends to optimize it, we can add a new 
 get_list_info that certain backends can provide, which if found will be used, 
 else we can fake it, iterating over a file list, calling get_info on each 
 entry.

i explicitly try to avoid the 'backend has to support' here. my solution would 
be a backend.py::get_info() that returns an empty list. backends can override 
that with an own get_info() that returns information and the calling code has 
to decide what to do according to the return data (empty list or meta data).
 
 I'm just saying, we can optimize later for this problem we don't even have 
 right now, rather than requiring all backends to implement a more complicated 
 function.

this is not optimizing but API design. backend.py's main method right now are

def put(self, source_path, remote_filename = None):
def get(self, remote_filename, local_path):
def list(self):
def delete(self, filename_list):

where the two latter actually return a list or take a list as argument so they 
can deal with lists and with singles. it seems to me only natural to have a 

def get_info(self, filename_list = None, attribute_list = None):

which returns a list. if you really implement the list functionality is up to 
you. you might want to throw a NotImplementedError if the list contains more 
than one item, but the interface will be stable and we don't have to modify the 
calling code after we decided to support lists.

maybey we should call it list_info instead as it lists info.
 
 why do we actually log an error here? shouldn't we try num-retries to put 
 the file on the backend
 and only fail if that does not work out?
 
 Backends already do try multiple times.  But it makes sense that if we 
 havne't yet tried the maximum number yet, retry until we do hit the cap.

that's what i meant... thanks all for your efforts btw. ;) ..ede


-- 
https://code.launchpad.net/~mterry/duplicity/early-catch-498933/+merge/72607
Your team duplicity-team is requested to review the proposed merge of 
lp:~mterry/duplicity/early-catch-498933 into lp:duplicity.

___
Mailing list: https://launchpad.net/~duplicity-team
Post to : duplicity-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~duplicity-team
More help   : https://help.launchpad.net/ListHelp


Re: [Duplicity-team] [Merge] lp:~mterry/duplicity/early-catch-498933 into lp:duplicity

2011-08-24 Thread edso
On 24.08.2011 16:12, Michael Terry wrote:
 I feel like the importance of the details of the API design, while not a 
 trivial matter, has still been dwarfed by the amount of virtual ink we've 
 spilled.  :)

that's the nice thing of going paperless ;)
you might have noticed, i try to model future proof, because ever so often i 
see people stumbling over bricks that could have been a nice wall already.

 I'm honestly fine with either way.  I'd be willing to do the work for either 
 way.  Maybe Ken can chime in with further suggestions or how he'd like to see 
 it, 

yeah, a maintainers input would be nice on this matter.

and I can start on a branch once I know how to code the API.

would you be interested in implementing it in sftp or rather ftp? i could do 
one of those.


..ede

-- 
https://code.launchpad.net/~mterry/duplicity/early-catch-498933/+merge/72607
Your team duplicity-team is requested to review the proposed merge of 
lp:~mterry/duplicity/early-catch-498933 into lp:duplicity.

___
Mailing list: https://launchpad.net/~duplicity-team
Post to : duplicity-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~duplicity-team
More help   : https://help.launchpad.net/ListHelp


Re: [Duplicity-team] [Merge] lp:~mterry/duplicity/early-catch-498933 into lp:duplicity

2011-08-24 Thread edso
On 24.08.2011 16:31, Kenneth Loafman wrote:
 I think you two have indeed hashed the design down to it's essentials, so
 here are some thoughts on the issue.
 
 The idea of get_info(filename_list) is a good idea, but it is not necessary
 to specify what info to return.  It is necessary for this enhancement to

we can add that later, if needed (it's just the placeholder is already there), 
same goes for the filename_list implementation.
how about 

def list_info(self, filename_list = None, attribute_list = None):
(because it lists info)

 return the size, but most of the backends already do that, they just ignore
 the data.  For example, ftp and ssh both return the results of 'ls -l'
 although ftp may be problematic due to its various flavors.

i expected so, because i am quite familiar with sftp/ftp and know how they work.
are you sure there is no backend that won't deliver a size because that is not 
part of the protocol or server implemention?

 
 Initially we just wanted it to return size info on just one file.  Maybe we
 should make that default, but return the info in a list of dicts, i.e.
 [{'name' : 'filename', 'size' : }, ...].  'size' could be set to None if
 the backend is not able to supply it.  

i actually would even (because of the above) not have a key named size if the 
backend does not deliver it. a size None can still mean there was a computing 
error or such.

Then we can just warn the user that
 this backend is problematic and they should verify.

hmm. here i think that duplicity until the resume integration had no issue with 
this problem, because it always restarted from the scratch. so i'd rather vote 
to 

make resume optional (--resume parameter) 
OR
tell the user to consider to disable resume (--no-resume) because the backend 
is problematic and they should verify

 Ede, I am in favor of dicts for this.  foo[5] is mostly meaningless if you
 are reading the code for the first time, while foo['size'] tells you
 something about it.

agreed. can python stack dicts? as filename is the id already it might be handy 
to stack it like this

{'filename1' : { 'size' : , ... }, 'filename2' : { 'size' : , ... }, 
... }

so it can be accessed 

foo['filename1']['size']

..ede

-- 
https://code.launchpad.net/~mterry/duplicity/early-catch-498933/+merge/72607
Your team duplicity-team is requested to review the proposed merge of 
lp:~mterry/duplicity/early-catch-498933 into lp:duplicity.

___
Mailing list: https://launchpad.net/~duplicity-team
Post to : duplicity-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~duplicity-team
More help   : https://help.launchpad.net/ListHelp


Re: [Duplicity-team] [Merge] lp:~mterry/duplicity/early-catch-498933 into lp:duplicity

2011-08-23 Thread edso
On 23.08.2011 20:25, Kenneth Loafman wrote:
 The size info is the problem.  Each backend returns a list of filenames, but
 getting size info is problematic.  That's going to be a major piece of work
 to get that info.

I am no pythonist per se, but is there a possibility to add a list_extended to 
backends that would fetch these infos, if possible? So if a backend supports 
these, they are there in a list structure 
0={[name]=,[size]=,...},1={[name]=,[size]=,...},2= 
if not (or if not implemented by now) only they are missing 
0={[name]=},1={[name]=},2=

The victim of zero sized files here 
http://lists.gnu.org/archive/html/duplicity-talk/2011-08/msg00013.html used the 
ftp backend, which actually could deliver this info.

 
 Downloading the file is not a good idea.  Performance would take a serious
 hit, i.e. double the time for backups.  If we could get file size info, we
 would know that the sizes were correct, and that's what we need.

agreed, see my verify does this point
 
 My guess is that the zero length files are really the last file transmitted
 before a backup failed.  Getting file sizes would help in proving this.

exactly my point. ede


 
 ...Ken
 
 On Tue, Aug 23, 2011 at 1:02 PM, Michael Terry
 michael.te...@canonical.comwrote:
 
 That is a good and sensible idea, but the current backend.list architecture
 isn't well suited for that, performance wise.

 (A) We can't ask for information about just one file, we have to list all
 files each time
 (B) We can't ask for size information.  We'd have to download the file
 again.

 Both could be solved by after each upload, just downloading the file again
 and comparing to the original to 'prove' the round trip worked.  That's a
 decent idea, actually, just seems a lot less performant.

 I could whip that up, depending on interest.  Ken?
 --

 https://code.launchpad.net/~mterry/duplicity/early-catch-498933/+merge/72607https://code.launchpad.net/%7Emterry/duplicity/early-catch-498933/+merge/72607
 Your team duplicity-team is requested to review the proposed merge of
 lp:~mterry/duplicity/early-catch-498933 into lp:duplicity.

 ___
 Mailing list: https://launchpad.net/~duplicity-team
 Post to : duplicity-team@lists.launchpad.net
 Unsubscribe : https://launchpad.net/~duplicity-team
 More help   : https://help.launchpad.net/ListHelp

 

-- 
https://code.launchpad.net/~mterry/duplicity/early-catch-498933/+merge/72607
Your team duplicity-team is requested to review the proposed merge of 
lp:~mterry/duplicity/early-catch-498933 into lp:duplicity.

___
Mailing list: https://launchpad.net/~duplicity-team
Post to : duplicity-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~duplicity-team
More help   : https://help.launchpad.net/ListHelp


Re: [Duplicity-team] [Merge] lp:~mterry/duplicity/early-catch-498933 into lp:duplicity

2011-08-23 Thread Michael Terry
 true but backends usually list all files anyway. dont they? .. ede

They do if you ask them to, and that's the only listing capability the current 
Duplicity backend.py API has.

But if we're talking about adding to that API, I suspect most backends have a 
query info about a particular file call that doesn't involve listing all 
files.  And when verifying after each volume, that's what we want; we don't 
need to ask for info on more than just the one volume.
-- 
https://code.launchpad.net/~mterry/duplicity/early-catch-498933/+merge/72607
Your team duplicity-team is requested to review the proposed merge of 
lp:~mterry/duplicity/early-catch-498933 into lp:duplicity.

___
Mailing list: https://launchpad.net/~duplicity-team
Post to : duplicity-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~duplicity-team
More help   : https://help.launchpad.net/ListHelp


Re: [Duplicity-team] [Merge] lp:~mterry/duplicity/early-catch-498933 into lp:duplicity

2011-08-23 Thread edso
On 23.08.2011 21:02, Michael Terry wrote:
 true but backends usually list all files anyway. dont they? .. ede
 
 They do if you ask them to, and that's the only listing capability the 
 current Duplicity backend.py API has.
 
 But if we're talking about adding to that API, I suspect most backends have a 
 query info about a particular file call that doesn't involve listing all 
 files.  And when verifying after each volume, that's what we want; we don't 
 need to ask for info on more than just the one volume.

you are right. let's leave it to the backend. eventually python logic could 
fish it out of a full listing in case a backend really doesn't support it. 
seems like ftp and sftp support it that atomically though. we should really 
implement it.

OR

at least make resuming optionally. too much cases seem to went wrong sice it 
was introduced. while i see the advantage, i also see the disadvantage of 
corrupt backups. and too much users with thin lines do whether verify nor keep 
their chains short. 

COMMENTS?

back to the issue itself. why not have a get_info() method stuffing all 
avilable info into a list structure. that way we wont have to reinvent the 
wheel when theremight be more information necessary.

ede

-- 
https://code.launchpad.net/~mterry/duplicity/early-catch-498933/+merge/72607
Your team duplicity-team is requested to review the proposed merge of 
lp:~mterry/duplicity/early-catch-498933 into lp:duplicity.

___
Mailing list: https://launchpad.net/~duplicity-team
Post to : duplicity-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~duplicity-team
More help   : https://help.launchpad.net/ListHelp


Re: [Duplicity-team] [Merge] lp:~mterry/duplicity/early-catch-498933 into lp:duplicity

2011-08-23 Thread Michael Terry
ede, making resume optional is a good idea (maybe a --no-resume option).  But 
that's a separate bug/branch.

As for get_info() vs get_size(), I'm nervous about getting *all* infor we can.  
That might be a lot, and some info is more or less expensive on different 
backends.

See, for example, all the info the GIO backend could request of a file: 
file:///usr/share/doc/libglib2.0-doc/gio/GFileInfo.html

I'm sure we could come up with a reasonable subset of metadata to query, but 
adding more later isn't so hard.

What about a get_info() method that returned a dictionary?  And for an initial 
pass, we would only have backends fill out a size entry.  That way, if we 
later discover some piece of metadata we want, we keep the same API and can 
just edit the backends.

Now, as for whether get_info() should return information on a list or a single 
file...  ftp and sftp can operate on multiple files easily, but there would be 
no savings to do so.  We'd ideally want to verify after uploading each file, so 
we're still making the same number of remote requests.  And we're not saving 
computationally, because getting info on the whole list is asking for more 
information than we will use.
-- 
https://code.launchpad.net/~mterry/duplicity/early-catch-498933/+merge/72607
Your team duplicity-team is requested to review the proposed merge of 
lp:~mterry/duplicity/early-catch-498933 into lp:duplicity.

___
Mailing list: https://launchpad.net/~duplicity-team
Post to : duplicity-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~duplicity-team
More help   : https://help.launchpad.net/ListHelp


Re: [Duplicity-team] [Merge] lp:~mterry/duplicity/early-catch-498933 into lp:duplicity

2011-08-23 Thread Michael Terry
Heh, obviously that GFileInfo link should have been 
http://developer.gnome.org/gio/stable/GFileInfo.html
-- 
https://code.launchpad.net/~mterry/duplicity/early-catch-498933/+merge/72607
Your team duplicity-team is requested to review the proposed merge of 
lp:~mterry/duplicity/early-catch-498933 into lp:duplicity.

___
Mailing list: https://launchpad.net/~duplicity-team
Post to : duplicity-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~duplicity-team
More help   : https://help.launchpad.net/ListHelp


Re: [Duplicity-team] [Merge] lp:~mterry/duplicity/early-catch-498933 into lp:duplicity

2011-08-23 Thread edso
On 23.08.2011 22:49, Michael Terry wrote:
 this is a mapped list, right? 
 
 I'm not familiar with that term, but were you asking what I meant by 
 dictionary?  I meant a python dictionary, like:
 
 a = {'key': 'value'}
 a['key'] == 'value'

i am no pythonist, so yes a key-value list is called a map in java and was 
what i meant

 
 get_info(path/to/file, (size,last_mod) )
 
 With this API, you'll still have to handle the backend not providing 
 'last_mod', so why bother telling the backend which bits you want?  Backends 
 can just fill in all the bits that duplicity needs (which for right now is 
 just size).

because it is the more elegant approach. you were worrying about too much info 
by default. ok, so we know which info we need, so let's request it. if we can't 
get it, we'll get a no key=value pair for it, so we can deal with it.

performancewise it also makes sense, becuse no info is requested that is not 
needed.

we'll probably end up with backends, that won't/can't tell us the size anyway, 
so we should design the api with the possibility to deliver a NULL info. we 
simply cannot know if every backend is going to deliver it. this way we could 
easily implement a dummy method in backend that returns an empty list and 
implementing backends can overwrite and return whatever they are capable of is 
requested.

 
 i was just argumenting API-wise, and for the api it'd make sense to be able 
 to return lists
 instead of atomic data sets. who knows what we'll need it for in the future.
 
 I'm only leery of over-engineering.  The more we build into the API, the 
 harder it is to write a backend (the ease of which is a nice thing right 
 now).  The rest of the current backend API is oriented around single objects. 

except of list() of course ;)

 If we don't know why we'd need to be able to query multiple files at once, 
 why optimize for it?

that's exactly the reason we don't know! 

i actually believe that some extra thought in application design will save you 
the extra hours of doctoring afterwards. so therefor i plead to keep the API 
open. that does not mean that everyting is to be implemented. it just means 
that you don't restrict yourself without any need for it. so why restrict the 
return value to one value set, if we can as well wrap this by default into a 
list again, just to have the possibility to return a list of dictionaries for 
several files.

in this case it means we add a new interface like get_info() that backends are 
free to implement. if they do, some extra duplicity functionality kicks in. if 
not, then it does not ;).

another thought:

why do we actually log an error here? shouldn't we try num-retries to put the 
file on the backend and only fail if that does not work out?

ede


-- 
https://code.launchpad.net/~mterry/duplicity/early-catch-498933/+merge/72607
Your team duplicity-team is requested to review the proposed merge of 
lp:~mterry/duplicity/early-catch-498933 into lp:duplicity.

___
Mailing list: https://launchpad.net/~duplicity-team
Post to : duplicity-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~duplicity-team
More help   : https://help.launchpad.net/ListHelp