Re: [Duplicity-team] [Merge] lp:~mterry/duplicity/early-catch-498933 into lp:duplicity
I'll mark this branch 'rejected' as it has been superseded by the merged check-volumes branch. -- https://code.launchpad.net/~mterry/duplicity/early-catch-498933/+merge/72607 Your team duplicity-team is requested to review the proposed merge of lp:~mterry/duplicity/early-catch-498933 into lp:duplicity. ___ Mailing list: https://launchpad.net/~duplicity-team Post to : duplicity-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~duplicity-team More help : https://help.launchpad.net/ListHelp
Re: [Duplicity-team] [Merge] lp:~mterry/duplicity/early-catch-498933 into lp:duplicity
in this case it means we add a new interface like get_info() that backends are free to implement. if they do, some extra duplicity functionality kicks in. if not, then it does not ;). But no reason to invent some complicated thing that all backends have to support. Why not do get_info now, and if we ever find an actual use for a list of info and we want to allow backends to optimize it, we can add a new get_list_info that certain backends can provide, which if found will be used, else we can fake it, iterating over a file list, calling get_info on each entry. i explicitly try to avoid the 'backend has to support' here. my solution would be a backend.py::get_info() that returns an empty list. backends can override that with an own get_info() that returns information and the calling code has to decide what to do according to the return data (empty list or meta data). I'm just saying, we can optimize later for this problem we don't even have right now, rather than requiring all backends to implement a more complicated function. this is not optimizing but API design. backend.py's main method right now are def put(self, source_path, remote_filename = None): def get(self, remote_filename, local_path): def list(self): def delete(self, filename_list): where the two latter actually return a list or take a list as argument so they can deal with lists and with singles. it seems to me only natural to have a def get_info(self, filename_list = None, attribute_list = None): which returns a list. if you really implement the list functionality is up to you. you might want to throw a NotImplementedError if the list contains more than one item, but the interface will be stable and we don't have to modify the calling code after we decided to support lists. maybey we should call it list_info instead as it lists info. why do we actually log an error here? shouldn't we try num-retries to put the file on the backend and only fail if that does not work out? Backends already do try multiple times. But it makes sense that if we havne't yet tried the maximum number yet, retry until we do hit the cap. that's what i meant... thanks all for your efforts btw. ;) ..ede -- https://code.launchpad.net/~mterry/duplicity/early-catch-498933/+merge/72607 Your team duplicity-team is requested to review the proposed merge of lp:~mterry/duplicity/early-catch-498933 into lp:duplicity. ___ Mailing list: https://launchpad.net/~duplicity-team Post to : duplicity-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~duplicity-team More help : https://help.launchpad.net/ListHelp
Re: [Duplicity-team] [Merge] lp:~mterry/duplicity/early-catch-498933 into lp:duplicity
On 24.08.2011 16:12, Michael Terry wrote: I feel like the importance of the details of the API design, while not a trivial matter, has still been dwarfed by the amount of virtual ink we've spilled. :) that's the nice thing of going paperless ;) you might have noticed, i try to model future proof, because ever so often i see people stumbling over bricks that could have been a nice wall already. I'm honestly fine with either way. I'd be willing to do the work for either way. Maybe Ken can chime in with further suggestions or how he'd like to see it, yeah, a maintainers input would be nice on this matter. and I can start on a branch once I know how to code the API. would you be interested in implementing it in sftp or rather ftp? i could do one of those. ..ede -- https://code.launchpad.net/~mterry/duplicity/early-catch-498933/+merge/72607 Your team duplicity-team is requested to review the proposed merge of lp:~mterry/duplicity/early-catch-498933 into lp:duplicity. ___ Mailing list: https://launchpad.net/~duplicity-team Post to : duplicity-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~duplicity-team More help : https://help.launchpad.net/ListHelp
Re: [Duplicity-team] [Merge] lp:~mterry/duplicity/early-catch-498933 into lp:duplicity
On 24.08.2011 16:31, Kenneth Loafman wrote: I think you two have indeed hashed the design down to it's essentials, so here are some thoughts on the issue. The idea of get_info(filename_list) is a good idea, but it is not necessary to specify what info to return. It is necessary for this enhancement to we can add that later, if needed (it's just the placeholder is already there), same goes for the filename_list implementation. how about def list_info(self, filename_list = None, attribute_list = None): (because it lists info) return the size, but most of the backends already do that, they just ignore the data. For example, ftp and ssh both return the results of 'ls -l' although ftp may be problematic due to its various flavors. i expected so, because i am quite familiar with sftp/ftp and know how they work. are you sure there is no backend that won't deliver a size because that is not part of the protocol or server implemention? Initially we just wanted it to return size info on just one file. Maybe we should make that default, but return the info in a list of dicts, i.e. [{'name' : 'filename', 'size' : }, ...]. 'size' could be set to None if the backend is not able to supply it. i actually would even (because of the above) not have a key named size if the backend does not deliver it. a size None can still mean there was a computing error or such. Then we can just warn the user that this backend is problematic and they should verify. hmm. here i think that duplicity until the resume integration had no issue with this problem, because it always restarted from the scratch. so i'd rather vote to make resume optional (--resume parameter) OR tell the user to consider to disable resume (--no-resume) because the backend is problematic and they should verify Ede, I am in favor of dicts for this. foo[5] is mostly meaningless if you are reading the code for the first time, while foo['size'] tells you something about it. agreed. can python stack dicts? as filename is the id already it might be handy to stack it like this {'filename1' : { 'size' : , ... }, 'filename2' : { 'size' : , ... }, ... } so it can be accessed foo['filename1']['size'] ..ede -- https://code.launchpad.net/~mterry/duplicity/early-catch-498933/+merge/72607 Your team duplicity-team is requested to review the proposed merge of lp:~mterry/duplicity/early-catch-498933 into lp:duplicity. ___ Mailing list: https://launchpad.net/~duplicity-team Post to : duplicity-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~duplicity-team More help : https://help.launchpad.net/ListHelp
Re: [Duplicity-team] [Merge] lp:~mterry/duplicity/early-catch-498933 into lp:duplicity
On 23.08.2011 20:25, Kenneth Loafman wrote: The size info is the problem. Each backend returns a list of filenames, but getting size info is problematic. That's going to be a major piece of work to get that info. I am no pythonist per se, but is there a possibility to add a list_extended to backends that would fetch these infos, if possible? So if a backend supports these, they are there in a list structure 0={[name]=,[size]=,...},1={[name]=,[size]=,...},2= if not (or if not implemented by now) only they are missing 0={[name]=},1={[name]=},2= The victim of zero sized files here http://lists.gnu.org/archive/html/duplicity-talk/2011-08/msg00013.html used the ftp backend, which actually could deliver this info. Downloading the file is not a good idea. Performance would take a serious hit, i.e. double the time for backups. If we could get file size info, we would know that the sizes were correct, and that's what we need. agreed, see my verify does this point My guess is that the zero length files are really the last file transmitted before a backup failed. Getting file sizes would help in proving this. exactly my point. ede ...Ken On Tue, Aug 23, 2011 at 1:02 PM, Michael Terry michael.te...@canonical.comwrote: That is a good and sensible idea, but the current backend.list architecture isn't well suited for that, performance wise. (A) We can't ask for information about just one file, we have to list all files each time (B) We can't ask for size information. We'd have to download the file again. Both could be solved by after each upload, just downloading the file again and comparing to the original to 'prove' the round trip worked. That's a decent idea, actually, just seems a lot less performant. I could whip that up, depending on interest. Ken? -- https://code.launchpad.net/~mterry/duplicity/early-catch-498933/+merge/72607https://code.launchpad.net/%7Emterry/duplicity/early-catch-498933/+merge/72607 Your team duplicity-team is requested to review the proposed merge of lp:~mterry/duplicity/early-catch-498933 into lp:duplicity. ___ Mailing list: https://launchpad.net/~duplicity-team Post to : duplicity-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~duplicity-team More help : https://help.launchpad.net/ListHelp -- https://code.launchpad.net/~mterry/duplicity/early-catch-498933/+merge/72607 Your team duplicity-team is requested to review the proposed merge of lp:~mterry/duplicity/early-catch-498933 into lp:duplicity. ___ Mailing list: https://launchpad.net/~duplicity-team Post to : duplicity-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~duplicity-team More help : https://help.launchpad.net/ListHelp
Re: [Duplicity-team] [Merge] lp:~mterry/duplicity/early-catch-498933 into lp:duplicity
true but backends usually list all files anyway. dont they? .. ede They do if you ask them to, and that's the only listing capability the current Duplicity backend.py API has. But if we're talking about adding to that API, I suspect most backends have a query info about a particular file call that doesn't involve listing all files. And when verifying after each volume, that's what we want; we don't need to ask for info on more than just the one volume. -- https://code.launchpad.net/~mterry/duplicity/early-catch-498933/+merge/72607 Your team duplicity-team is requested to review the proposed merge of lp:~mterry/duplicity/early-catch-498933 into lp:duplicity. ___ Mailing list: https://launchpad.net/~duplicity-team Post to : duplicity-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~duplicity-team More help : https://help.launchpad.net/ListHelp
Re: [Duplicity-team] [Merge] lp:~mterry/duplicity/early-catch-498933 into lp:duplicity
On 23.08.2011 21:02, Michael Terry wrote: true but backends usually list all files anyway. dont they? .. ede They do if you ask them to, and that's the only listing capability the current Duplicity backend.py API has. But if we're talking about adding to that API, I suspect most backends have a query info about a particular file call that doesn't involve listing all files. And when verifying after each volume, that's what we want; we don't need to ask for info on more than just the one volume. you are right. let's leave it to the backend. eventually python logic could fish it out of a full listing in case a backend really doesn't support it. seems like ftp and sftp support it that atomically though. we should really implement it. OR at least make resuming optionally. too much cases seem to went wrong sice it was introduced. while i see the advantage, i also see the disadvantage of corrupt backups. and too much users with thin lines do whether verify nor keep their chains short. COMMENTS? back to the issue itself. why not have a get_info() method stuffing all avilable info into a list structure. that way we wont have to reinvent the wheel when theremight be more information necessary. ede -- https://code.launchpad.net/~mterry/duplicity/early-catch-498933/+merge/72607 Your team duplicity-team is requested to review the proposed merge of lp:~mterry/duplicity/early-catch-498933 into lp:duplicity. ___ Mailing list: https://launchpad.net/~duplicity-team Post to : duplicity-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~duplicity-team More help : https://help.launchpad.net/ListHelp
Re: [Duplicity-team] [Merge] lp:~mterry/duplicity/early-catch-498933 into lp:duplicity
ede, making resume optional is a good idea (maybe a --no-resume option). But that's a separate bug/branch. As for get_info() vs get_size(), I'm nervous about getting *all* infor we can. That might be a lot, and some info is more or less expensive on different backends. See, for example, all the info the GIO backend could request of a file: file:///usr/share/doc/libglib2.0-doc/gio/GFileInfo.html I'm sure we could come up with a reasonable subset of metadata to query, but adding more later isn't so hard. What about a get_info() method that returned a dictionary? And for an initial pass, we would only have backends fill out a size entry. That way, if we later discover some piece of metadata we want, we keep the same API and can just edit the backends. Now, as for whether get_info() should return information on a list or a single file... ftp and sftp can operate on multiple files easily, but there would be no savings to do so. We'd ideally want to verify after uploading each file, so we're still making the same number of remote requests. And we're not saving computationally, because getting info on the whole list is asking for more information than we will use. -- https://code.launchpad.net/~mterry/duplicity/early-catch-498933/+merge/72607 Your team duplicity-team is requested to review the proposed merge of lp:~mterry/duplicity/early-catch-498933 into lp:duplicity. ___ Mailing list: https://launchpad.net/~duplicity-team Post to : duplicity-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~duplicity-team More help : https://help.launchpad.net/ListHelp
Re: [Duplicity-team] [Merge] lp:~mterry/duplicity/early-catch-498933 into lp:duplicity
Heh, obviously that GFileInfo link should have been http://developer.gnome.org/gio/stable/GFileInfo.html -- https://code.launchpad.net/~mterry/duplicity/early-catch-498933/+merge/72607 Your team duplicity-team is requested to review the proposed merge of lp:~mterry/duplicity/early-catch-498933 into lp:duplicity. ___ Mailing list: https://launchpad.net/~duplicity-team Post to : duplicity-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~duplicity-team More help : https://help.launchpad.net/ListHelp
Re: [Duplicity-team] [Merge] lp:~mterry/duplicity/early-catch-498933 into lp:duplicity
On 23.08.2011 22:49, Michael Terry wrote: this is a mapped list, right? I'm not familiar with that term, but were you asking what I meant by dictionary? I meant a python dictionary, like: a = {'key': 'value'} a['key'] == 'value' i am no pythonist, so yes a key-value list is called a map in java and was what i meant get_info(path/to/file, (size,last_mod) ) With this API, you'll still have to handle the backend not providing 'last_mod', so why bother telling the backend which bits you want? Backends can just fill in all the bits that duplicity needs (which for right now is just size). because it is the more elegant approach. you were worrying about too much info by default. ok, so we know which info we need, so let's request it. if we can't get it, we'll get a no key=value pair for it, so we can deal with it. performancewise it also makes sense, becuse no info is requested that is not needed. we'll probably end up with backends, that won't/can't tell us the size anyway, so we should design the api with the possibility to deliver a NULL info. we simply cannot know if every backend is going to deliver it. this way we could easily implement a dummy method in backend that returns an empty list and implementing backends can overwrite and return whatever they are capable of is requested. i was just argumenting API-wise, and for the api it'd make sense to be able to return lists instead of atomic data sets. who knows what we'll need it for in the future. I'm only leery of over-engineering. The more we build into the API, the harder it is to write a backend (the ease of which is a nice thing right now). The rest of the current backend API is oriented around single objects. except of list() of course ;) If we don't know why we'd need to be able to query multiple files at once, why optimize for it? that's exactly the reason we don't know! i actually believe that some extra thought in application design will save you the extra hours of doctoring afterwards. so therefor i plead to keep the API open. that does not mean that everyting is to be implemented. it just means that you don't restrict yourself without any need for it. so why restrict the return value to one value set, if we can as well wrap this by default into a list again, just to have the possibility to return a list of dictionaries for several files. in this case it means we add a new interface like get_info() that backends are free to implement. if they do, some extra duplicity functionality kicks in. if not, then it does not ;). another thought: why do we actually log an error here? shouldn't we try num-retries to put the file on the backend and only fail if that does not work out? ede -- https://code.launchpad.net/~mterry/duplicity/early-catch-498933/+merge/72607 Your team duplicity-team is requested to review the proposed merge of lp:~mterry/duplicity/early-catch-498933 into lp:duplicity. ___ Mailing list: https://launchpad.net/~duplicity-team Post to : duplicity-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~duplicity-team More help : https://help.launchpad.net/ListHelp