Re: [Shotwell] RFC - option to force Import to overwrite originals by [Shotwell-perceived] duplicates

Michael Hendry Mon, 25 Oct 2010 12:12:11 -0700

On Thu, 2010-10-14 at 12:05 -0700, Jim Nelson wrote:
> There are a number of different issues and problems in play here, and
> I'm afraid they're all being conflated under the banner of "duplicate
> detection".  It's important to understand at least a couple of basic
> concepts.


Thanks for the clarification, Jim - see my interpolated comments below:

> 
> First, with regard to a file that is twice the size of another being
> regarded as a duplicate, that is most likely due to this bug:
> http://trac.yorba.org/ticket/2587  It was an oversight on my part and
> I would like to fix this for 0.8.

Excellent.

> 
> Now, beyond that, there are three kinds of duplicate detection (that
> banner I was mentioning before):
> 
> 1. The same filepath: If you import /home/jim/photo.jpg and then
> import again /home/jim/photo.jpg, Shotwell will treat that as a
> duplicate.  This is a special case of duplicate detection; Shotwell
> will not allow you to have two photo "objects" in your library
> pointing to the same file.  Note that this is true even if you
> update/change the file outside of Shotwell, i.e. modify its EXIF.
> Shotwell 0.7 does *not* detect external changes and update the
> library.
> 
> Shotwell 0.8, on the other hand, will do that exact thing by detecting
> the change at startup: http://trac.yorba.org/ticket/2476

If I understand this correctly, in Shotwell 0.8 it will be necessary to
close Shotwell and open it again if one of the files is changed (in my
case, if I change the date-stamp in the EXIF data). An attempt to
import that file again will still be seen as an attempt to import a
duplicate because the file has the same path, but all will be well the
next time I start Shotwell.
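
To check that I have the idea right, here is a minimal sketch of
change detection at startup. It is not Shotwell's code; it assumes the
library keeps the size and modification time recorded at import, and
the names (FileStamp, stamp, changed_since_import) are made up for
illustration.

```python
import os
from typing import NamedTuple


class FileStamp(NamedTuple):
    """Size and modification time recorded when a file was imported."""
    size: int
    mtime_ns: int


def stamp(path):
    """Read the current size and modification time of a file."""
    st = os.stat(path)
    return FileStamp(st.st_size, st.st_mtime_ns)


def changed_since_import(library):
    """Return the paths whose on-disk stamp differs from the recorded one.

    `library` is a hypothetical dict mapping file path -> FileStamp.
    Files that no longer exist are skipped here; a real application
    would have to handle them separately.
    """
    return [
        path
        for path, recorded in library.items()
        if os.path.exists(path) and stamp(path) != recorded
    ]
```

Run once at startup, this would flag a file whose EXIF date-stamp I
had edited outside Shotwell, because rewriting the file updates its
modification time.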

> 
> 2. The same file contents: If two photos are byte-for-byte identical,
> they are considered duplicates and Shotwell will only import one of
> them.  This comparison is of the entire file; changing the EXIF in one
> is enough to make the files different.

Precisely what I was expecting from duplicate detection.
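
In other words, something along these lines. This is only a sketch
under my own assumptions: it uses a SHA-256 digest of the whole file as
a stand-in for a byte-for-byte comparison, and the function names are
hypothetical, not anything from Shotwell.

```python
import hashlib


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the entire file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def is_content_duplicate(path, known_digests):
    """True if a file with identical contents is already in the library.

    `known_digests` is a hypothetical set of digests of imported files.
    """
    return file_digest(path) in known_digests
```

Because the digest covers the whole file, changing a single EXIF field
in one copy is enough for the two files to stop matching.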

> 
> 3. The same file on camera and on disk: Camera import introduces a
> problem.  Because it takes so long to download a file and it's
> desirable to be able to see which photos you've already imported
> before importing them, we want to detect duplicates before pulling
> down the file.  We do this by comparing the photo thumbnails (which is
> what led to that bug I mentioned at the top of this message).
> Otherwise, we would download the file, compare it to the library,
> realize it was a duplicate, and then delete the file.
> 

OK - I found this useful today after I'd failed to delete image files
I'd already downloaded from my camera. I think I'd prefer to have the
option of overwriting the already-downloaded images, though.
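
A rough sketch of what I mean by an overwrite option, assuming the
thumbnails are compared by digest before anything is downloaded. The
names and the `overwrite` flag are my own invention, and a thumbnail
match is of course weaker evidence than a full-file match.

```python
import hashlib


def thumb_digest(thumbnail_bytes):
    """Digest of a thumbnail, used as a cheap pre-download fingerprint."""
    return hashlib.md5(thumbnail_bytes).hexdigest()


def plan_camera_import(camera_thumbs, library_digests, overwrite=False):
    """Split the photos on the camera into (to_download, skipped).

    `camera_thumbs` maps a file name on the camera to its thumbnail bytes;
    `library_digests` is the set of thumbnail digests already in the
    library. With overwrite=True, apparent duplicates are downloaded anyway.
    """
    to_download, skipped = [], []
    for name, thumb in camera_thumbs.items():
        if thumb_digest(thumb) in library_digests and not overwrite:
            skipped.append(name)
        else:
            to_download.append(name)
    return to_download, skipped
```

With the default setting this behaves like the current skip-duplicates
import; with overwrite=True every photo is fetched regardless.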

> Now, apart from all this, the trash can creates a special case.  If
> you move a photo to the trash and then import a file that is a
> duplicate -- i.e. is either the same filepath OR the same contents --
> Shotwell will restore the photo object from the trash and not import
> the new file.  (In the case of the same filepath, that's essentially
> what you're asking for.)  We assume if you import a duplicate file,
> you're saying "You know what, I want this photo after all," and we
> restore it from the trash.  It's also restoring all the stuff you've
> done to it while it was in Shotwell, i.e. its tags, transformations,
> and so on.  We see this as valuable stuff to be preserving.
> 

I hear what you're saying, but I have to say I find its implementation
in practice totally counterintuitive! If I had put something in the
wastebasket and then realised I wanted it after all, I would take it out
of the wastebasket myself.

I agree that transferring the tags associated with the original file
would be helpful - this could be implemented as a dialog box asking
something like:

Importing file with same name and path - keep tags?
[yes]...[yes to all]...[no]...[no to all]
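
The "to all" answers would simply be remembered for the rest of the
import. A minimal sketch of that logic, with a hypothetical `ask`
callback standing in for the dialog box:

```python
def resolve_keep_tags(paths, ask):
    """Decide, per file, whether to keep the tags of the original.

    `ask(path)` stands in for the dialog and must return one of
    "yes", "yes to all", "no", "no to all". Once a "to all" answer is
    given, the remaining files get the same decision without asking.
    """
    decisions = {}
    sticky = None  # becomes True/False after a "... to all" answer
    for path in paths:
        if sticky is not None:
            decisions[path] = sticky
            continue
        answer = ask(path)
        keep = answer.startswith("yes")
        if answer.endswith("to all"):
            sticky = keep
        decisions[path] = keep
    return decisions
```

So answering "yes" for the first file and "no to all" for the second
would keep the tags on the first file only, and the dialog would not
appear again during that import.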

> This is all quite distinct from a different operation we refer to as
> "re-import": To take an existing photo in the library and re-examine
> all its properties and reflect any changes in the database
> (thumbnails, tags, etc.).  I think what Michael is asking for when
> importing a file already in the trash is for Shotwell to re-import it.
> In 0.8 we're attacking the problem slightly differently, by detecting
> the change automatically and re-importing it in the background.  But
> 0.7 does not have this capability in any manner.

Yes, I think 0.8 will deal with my complaints. How will it handle the
preservation of tags when it discovers a changed file on startup? And
how about the case of a changed file which has at some stage been edited via
Shotwell? My preference would be to transfer the tags and drop the edit
trail.

> 
> I know this is a lot to absorb, but I feel like some concepts and
> terms are being muddied.  Some of it is my own fault, as we're trying
> to keep it simple for users, and some of it is due to a bug that's
> being grouped in as a part of designed behavior.  By being aware of
> the different cases of "duplicate detection", I think we can figure
> out what is useful, what can be tweaked, and what needs to be changed.
> 
> -- Jim
> 

"Duplicate" means "identical" to me - you can't have shades of
duplicate-ness. If the process of checking a potential duplicate is
going to take a long time, then the user should be given the option to
abort or continue.

Michael

_______________________________________________
Shotwell mailing list
[email protected]
http://lists.yorba.org/cgi-bin/mailman/listinfo/shotwell
