Re: muchsync files renames

2015-09-09 Thread Amadeusz Żołnowski
Thank you, David B., for the explanation.  I think that everything is clear
now.

David M., what about updating your website on that?  I think it's
important to warn about possible file moves between new/ and cur/.

And what I can do is prepare patches for afew to handle it
appropriately.

-- 
Amadeusz Żołnowski


signature.asc
Description: PGP signature
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Re: muchsync files renames

2015-09-02 Thread Amadeusz Żołnowski
David Bremner  writes:
> If I understand the code correctly, this movement will only happen
> when one of the maildir-flag-equivalent tags is changed. I haven't dug
> back through the archives, but I think mutt uses presence in new/ as
> some kind of extra unseen state, so people requested not to move files
> until needed.

When I added the 'unread' tag, the file was still in new/. Only after
removing 'unread' afterwards was the file moved to cur/. So it
seems you're right, but take a look at the following excerpt from
T340-maildir-sync.sh:

test_begin_subtest "Message in new with maildir info is moved to cur on any tag change"
add_message [filename]='message-with-info-to-be-moved-to-cur:2,' [dir]=new
notmuch tag +anytag id:$gen_msg_id
output=$(cd "$MAIL_DIR"; ls */message-with-info-to-be-moved-to-cur*)
test_expect_equal "$output" "cur/message-with-info-to-be-moved-to-cur:2,"

What is different between the test case and my case is that my mail file
doesn't have the ":2," suffix. Adding the suffix to the file name makes it
work as expected by the test case. I see I would have to convert my mail
file names, but I think this inconsistency in notmuch should also get
some attention.

-- 
Amadeusz Żołnowski




Re: muchsync files renames

2015-09-02 Thread David Bremner
Amadeusz Żołnowski  writes:

> When I added the 'unread' tag, the file was still in new/. Only after
> removing 'unread' afterwards was the file moved to cur/.

The unread tag corresponds to the *absence* of the ,S flag, so if you
don't add unread at notmuch new, tagging it unread later is effectively
a no-op from the point of view of maildir-flag synching. I guess the
part that is optional is moving from new/foo to cur/foo:2, . I believe
we used to be more aggressive about doing this, but mutt users
complained.
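
The tag/flag correspondence described above can be sketched as follows. This is only an illustration, not notmuch's code: the D/F/P/R pairs are the ones documented for maildir.synchronize_flags, and the helper name flags_for_tags is hypothetical.

```python
# Flag/tag pairs as documented for notmuch's maildir.synchronize_flags option.
FLAG_TO_TAG = {"D": "draft", "F": "flagged", "P": "passed", "R": "replied"}


def flags_for_tags(tags):
    """Return the maildir flag string implied by a set of notmuch tags.

    Hypothetical helper for illustration; notmuch does this in C.
    """
    flags = {flag for flag, tag in FLAG_TO_TAG.items() if tag in tags}
    # 'S' (seen) is the inverse of the 'unread' tag.
    if "unread" not in tags:
        flags.add("S")
    # Maildir flags are written in ASCII order.
    return "".join(sorted(flags))


if __name__ == "__main__":
    # A message tagged only 'new' or only 'spam' has no 'unread' tag,
    # so it maps to the seen flag even though nobody has read it.
    print(repr(flags_for_tags({"new"})))
    print(repr(flags_for_tags({"spam"})))
    print(repr(flags_for_tags({"unread", "reklama"})))
```

Under this mapping a message that never received the unread tag is "seen" as far as flag synchronization is concerned, which is consistent with the moves to cur/ reported in this thread.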

> So it seems you're right, but take a look at the following excerpt
> from T340-maildir-sync.sh:
>
[...]
> What is different between the test case and my case is that my mail file
> doesn't have the ":2," suffix. Adding the suffix to the file name makes it
> work as expected by the test case. I see I would have to convert my mail
> file names, but I think this inconsistency in notmuch should also get
> some attention.

Have a look at

 http://cr.yp.to/proto/maildir.html
 http://www.qmail.org/man/man5/maildir.html
 
I don't think messages in new are supposed to have : in their names. So
this test is dealing with a corner case of some out-of-spec MUA writing
:info onto the filename. So I don't think adding a suffix is the right
thing to do here. It also seems like leaving a message in new/ when
tagging it as unread is a reasonable option.

The gory details (per David's earlier request) are in
_new_maildir_filename in lib/message.cc.
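
For readers who don't want to open the C source, here is a rough sketch of the renaming behaviour as it was observed in this thread. It is an assumption pieced together from the test case and the reports above, not a transcription of _new_maildir_filename, and the function name new_maildir_path is hypothetical.

```python
import posixpath


def new_maildir_path(path, flags):
    """Return where a message file would end up for the given flag string.

    Assumed behaviour (from the thread, not from the notmuch source):
    a file in new/ with no ":2," info and no flags to record stays put;
    anything else lands in cur/ with a ":2,<flags>" suffix.
    """
    directory, name = posixpath.split(path)
    maildir, subdir = posixpath.split(directory)
    # Split off any existing ":2,<flags>" info part of the file name.
    base, has_info, _old_flags = name.partition(":2,")
    if subdir == "new" and not has_info and not flags:
        return path
    return posixpath.join(maildir, "cur", base + ":2," + flags)


if __name__ == "__main__":
    # Unread message without info: nothing to record, stays in new/.
    print(new_maildir_path("spam/new/123.host", ""))
    # Same message once it counts as seen: moved to cur/ with the S flag.
    print(new_maildir_path("spam/new/123.host", "S"))
    # The out-of-spec test-suite case: info already present in new/.
    print(new_maildir_path("spam/new/123.host:2,", ""))
```

The caller is expected to pass the complete desired flag string; merging with flags already present in the file name is left out to keep the sketch short.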


d


Re: muchsync files renames

2015-09-01 Thread David Bremner
Amadeusz Żołnowski  writes:

> What's more surprising is that there is a test case in the notmuch test
> suite which tests whether a mail is moved from new/ to cur/ after one of
> its tags is modified. Yes, it should be moved on any tag modification if I
> understand correctly. But it seems it is not for my maildirs...
>

If I understand the code correctly, this movement will only happen when
one of the maildir-flag-equivalent tags is changed. I haven't dug back
through the archives, but I think mutt uses presence in new/ as some
kind of extra unseen state, so people requested not to move files until
needed.

d


Re: muchsync files renames

2015-09-01 Thread dm-list-email-notmuch
David Bremner  writes:

> Amadeusz Żołnowski  writes:
>
>> What's more surprising is that there is a test case in the notmuch test
>> suite which tests whether a mail is moved from new/ to cur/ after one of
>> its tags is modified. Yes, it should be moved on any tag modification if I
>> understand correctly. But it seems it is not for my maildirs...
>>
>
> If I understand the code correctly, this movement will only happen when
> one of the maildir-flag-equivalent tags is changed. I haven't dug back
> through the archives, but I think mutt uses presence in new/ as some
> kind of extra unseen state, so people requested not to move files until
> needed.

Can you explain how/where this is implemented?  I would like muchsync to
do exactly what notmuch does, and ideally without replicating its logic,
if I can just have libnotmuch handle this.  Currently, my code looks
something like this:

  notmuch_message_freeze()
  notmuch_message_remove_all_tags()
  notmuch_message_add_tag(); notmuch_message_add_tag(); ...
  if (synchronize_tags)
    notmuch_message_tags_to_maildir_flags()
  notmuch_message_thaw()

And what we're finding is the above code causes the message to move from
new/ to cur/, while the "notmuch tag" command does not, even while
changing between the same before and after tag sets.

Any ideas?

Thanks,
David


Re: muchsync files renames

2015-09-01 Thread Amadeusz Żołnowski
Hi David,

David Mazieres  writes:
> Let's just make sure I understand:  Your mail starts out like this:
>
> Path:  spam/new/nnn.MnnnPnnnQnRn.machine
> Tags:  new
>
> Then you run afew, and afew runs
>
> notmuch tag -new +spam 
>
> You are saying that even though maildir.synchronize_flags is true,
> you end up with:
>
> Path:  spam/new/nnn.MnnnPnnnQnRn.machine
> Tags:  spam

Yes.


> That's a little surprising, because the next time you run "notmuch new,"
> I would have expected it to add the unread flag based on the pathname.

What's more surprising is that there is a test case in the notmuch test
suite which tests whether a mail is moved from new/ to cur/ after one of
its tags is modified. Yes, it should be moved on any tag modification if I
understand correctly. But it seems it is not for my maildirs...

$ notmuch search --output=files thread:000108bf
/home/aidecoe/Mail/aidecoe/2015/new/1441022521.M714465P23412VFE04I00141A38_0.freja,S=53857
$ notmuch search thread:000108bf
thread:000108bf  Yest. 11:58 [1/1] Somebody; Subject (reklama unread)
$ notmuch tag +hey thread:000108bf
$ notmuch search thread:000108bf
thread:000108bf  Yest. 11:58 [1/1] Somebody; Subject (hey reklama unread)
$ notmuch search --output=files thread:000108bf
/home/aidecoe/Mail/aidecoe/2015/new/1441022521.M714465P23412VFE04I00141A38_0.freja,S=53857


> Then it will add the unread tag to the Xapian database.  But maybe if it
> finds a file in the new folder it doesn't add the unread flag.

Might be.


> But why does notmuch_message_tags_to_maildir_flags() then feel the need
> to rename the file when muchsync calls it?  Muchsync should ideally
> behave exactly the same as the notmuch tag command.  Specifically, when
> muchsync receives a new file from the server, it does the following:
>
>  1. create file in same directory as the server (presumably spam/new)
>
>  2. Call the following functions on this file:
>   notmuch_database_add_message()
>   notmuch_message_freeze()
>   notmuch_message_remove_all_tags()
>   notmuch_message_add_tag() for each tag in new.tags
>   if (synchronize_tags) notmuch_message_tags_to_maildir_flags()
>   notmuch_message_thaw()
>
>  3. get the current tags of the message from the server (presumably just
> spam)
>
>  4. Call the following functions on the Message-ID:
>   notmuch_message_freeze()
>   notmuch_message_remove_all_tags()
>   notmuch_message_add_tag() for each tag sent *by the server*
>   if (synchronize_tags) notmuch_message_tags_to_maildir_flags()
>   notmuch_message_thaw()

So for some reason mails in my maildirs are not moved from new/ to cur/
on tag manipulation, but they are moved on the client side by muchsync.  I
will have to investigate why this happens to me.


-- 
Amadeusz Żołnowski




Re: muchsync files renames

2015-08-31 Thread dm-list-email-notmuch
Amadeusz Żołnowski  writes:

>> So... based on all the evidence so far the culprit seems to be that
>> something is moving mail files into your Spam folder on the client.
>> If that rings any bells and solves the problem, great.  If not, here
>> is what we need to do to track it down further.
>
> I have followed your hints to track down the issue.  All of these
> messages are spam. What I suspect follows.
>
> All of these files have been placed to new/ subdir by maildrop and
> during posthook (afew) have been stripped of any tags besides 'spam'
> tag, in particular 'unread' tag has been removed, but files still remain
> in new/ subdir.  So... what had to happen is that during muchsync these
> messages have been discovered as already read, so they don't belong to
> new/ but must be moved to cur/.  And this is what happened on client
> side.  During next muchsync these changes had to be pushed to server,
> i.e. move from new/ to cur/.

Right.  Muchsync checks to see if maildir.synchronize_flags is true on
the client.  If it is, then muchsync calls
notmuch_message_tags_to_maildir_flags after setting the tags (which is
the same as what would happen if you set the tags manually with the
"notmuch tag" command).

A maildir file in the new/ directory can't have any flags (except the
implicit unread state, which is indicated by the absence of "S" at the
end of the filename).  So the notmuch_message_tags_to_maildir_flags()
function is renaming the file into the cur subdirectory, and muchsync is
then propagating this rename back to the server.

The one thing I'm still unclear on is whether afew is running on the
client or the server.  If you are running it on the client, then this
makes sense.  If you are running it on the server, then somehow afew
must not be respecting the maildir.synchronize_flags setting.
Otherwise, the file should already be moved to the cur directory after
having the unread tag stripped off on the server.  I guess the other
option is that your maildir.synchronize_flags is false on the server and
true on the client.

> So if my assumptions are correct, actually there is no issue!  I would
> just have to adjust afew filtering to prevent this behaviour.

Right.  You could have afew preserve the unread flag on spam.
Alternatively, you could just disable maildir.synchronize_flags on both
the client and server.  Finally, you could just accept the performance
penalty, as one would hope that this is a one-time thing and that
usually you don't have 5000 new spam messages every time you synchronize
your mail.

David


Re: muchsync files renames

2015-08-31 Thread Amadeusz Żołnowski
Hi David,

dm-list-email-notm...@scs.stanford.edu writes:
> The one thing I'm still unclear on is whether afew is running on the
> client or the server.

It is run as a post-hook, i.e. after "notmuch new", so it's on the
server.

> I guess the other option is that your maildir.synchronize_flags is false
> on the server and true on the client.

Both have this set to true.

> If you are running it on the server, then somehow afew must not be
> respecting the maildir.synchronize_flags setting.  Otherwise, the file
> should already be moved to the cur directory after having the unread
> tag stripped off on the server.

Not necessarily. The recommended setup of notmuch for afew is that
"notmuch new" tags messages with the "new" tag only. Then afew processes all
messages with the "new" tag. So if a message is spam, it gets "new" removed
and "spam" added. A spam message never has the "unread" tag
assigned, which should explain this behaviour.  So the problem is to be
fixed on the afew side.

Once again thank you for helping with tracking down the issue.

-- 
Amadeusz Żołnowski




Re: muchsync files renames

2015-08-31 Thread David Mazieres
Amadeusz Żołnowski  writes:

> Not necessarily. The recommended setup of notmuch for afew is that
> "notmuch new" tags messages with the "new" tag only. Then afew processes all
> messages with the "new" tag. So if a message is spam, it gets "new" removed
> and "spam" added. A spam message never has the "unread" tag
> assigned, which should explain this behaviour.  So the problem is to be
> fixed on the afew side.

Let's just make sure I understand:  Your mail starts out like this:

Path:  spam/new/nnn.MnnnPnnnQnRn.machine
Tags:  new

Then you run afew, and afew runs

notmuch tag -new +spam 

You are saying that even though maildir.synchronize_flags is true,
you end up with:

Path:  spam/new/nnn.MnnnPnnnQnRn.machine
Tags:  spam

That's a little surprising, because the next time you run "notmuch new,"
I would have expected it to add the unread flag based on the pathname.
But, I suppose it might make sense for notmuch to special-case that
flag.  In other words, if notmuch new finds a file called:

spam/new/nnn.MnnnPnnnQnRn.machine:2,

Then it will add the unread tag to the Xapian database.  But maybe if it
finds a file in the new folder it doesn't add the unread flag.

But why does notmuch_message_tags_to_maildir_flags() then feel the need
to rename the file when muchsync calls it?  Muchsync should ideally
behave exactly the same as the notmuch tag command.  Specifically, when
muchsync receives a new file from the server, it does the following:

 1. create file in same directory as the server (presumably spam/new)

 2. Call the following functions on this file:
  notmuch_database_add_message()
  notmuch_message_freeze()
  notmuch_message_remove_all_tags()
  notmuch_message_add_tag() for each tag in new.tags
  if (synchronize_tags) notmuch_message_tags_to_maildir_flags()
  notmuch_message_thaw()

 3. get the current tags of the message from the server (presumably just
spam)

 4. Call the following functions on the Message-ID:
  notmuch_message_freeze()
  notmuch_message_remove_all_tags()
  notmuch_message_add_tag() for each tag sent *by the server*
  if (synchronize_tags) notmuch_message_tags_to_maildir_flags()
  notmuch_message_thaw()
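
The call sequence in steps 2 and 4 can be sketched as a small helper. This is an illustration only: it accepts any object exposing freeze, remove_all_tags, add_tag, tags_to_maildir_flags and thaw methods (names chosen here to mirror the C functions; whether a given binding spells them exactly this way is an assumption to check), and RecordingMessage is a hypothetical stand-in that merely logs calls.

```python
def replace_tags(message, tags, synchronize_flags):
    """Replace all tags on a message inside one freeze/thaw pair."""
    message.freeze()
    try:
        message.remove_all_tags()
        for tag in tags:
            message.add_tag(tag)
        # Only touch file names when flag synchronization is enabled.
        if synchronize_flags:
            message.tags_to_maildir_flags()
    finally:
        # Always thaw, even if one of the calls above raised.
        message.thaw()


class RecordingMessage:
    """Hypothetical stand-in for a message object; it only logs calls."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Invoked only for attributes that do not exist, i.e. the methods.
        def record(*args):
            self.calls.append((name,) + args)
        return record


if __name__ == "__main__":
    msg = RecordingMessage()
    replace_tags(msg, ["spam"], synchronize_flags=True)
    for call in msg.calls:
        print(call)
```

Note that the flag synchronization here runs unconditionally after the tags are replaced, whether or not a flag-equivalent tag actually changed; that is the shape of the sequence listed above, not a claim about what "notmuch tag" does internally.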

So what I'm wondering is how this is any different from what is already
happening on the server.  "notmuch new" should be doing what muchsync
does in step 2, and afew (via "notmuch tag") should be doing what
muchsync does in step 4.

Yet somehow you are saying that on the server the file stays in
spam/new/, while on the client muchsync's actions cause the file to move
to spam/cur/?  So that means there's still something I don't really
understand--possibly the series of notmuch library calls happening
server side (which I should then maybe emulate client side).

None of this is super serious, beyond a one-time extra cost, but I like
to understand things thoroughly, particularly when writing software that
manipulates critical data like mail...

Is there any possibility that new.tags has a different setting on your
client and server machines?

Thanks,
David


Re: muchsync files renames

2015-08-31 Thread Amadeusz Żołnowski
Hi David,

First of all, thank you very much for the support.  I am Cc'ing the ml
because the last paragraph may be a useful hint for other users.


David Mazieres writes:
> So to be clear, you are getting tons of lines that start "[SERVER]
> [notmuch]" and contain the string "Ignoring non-mail file"?  Is the
> "##...##" literal, or is that an ellipsis?

I have just cut off a few directories from the path. :-) All of these files
are indeed invalid spam mail.  I have removed them.  One problem less.


> Also, those file names were not generated by
> muchsync.  Any mail file created by muchsync will have a file name of
> the form:
>
> nnn.MnnnPnnnQnRn.machine
> nnn.MnnnPnnnQnRn.machine:2,

Just to make things clear (once again? :-)), these file names are
generated only on the client side.  Muchsync is never going to sync file
names to the server, is it?


> When you run "notmuch new" on the server, without muchsync, does it
> take forever and print all these message while scanning non mail
> files?

No. Notmuch doesn't print these messages when I just run "notmuch new"
myself.  Anyway, there were only around 100 invalid mail files.


> Okay, this is the interesting part.  It appears that 5775 out of your
> 115877 messages have been moved to a different directory on the
> client.  I notice that the one message you include above has been
> moved to the Spam maildir.

> Is it possible that A) you have some spam filtering on the client that
> is moving things to the Spam folder,

I have a mailfilter rule which moves mails with "X-Spam-Status: Yes" to
Spam directory, but this happens on delivery before notmuch indexing.


> or B) that one of your two machines is using a case-independent file
> system that is causing confusion between "Spam" and "spam"?

I am testing it on a single GNU/Linux host between different users.  It is
an ext4 fs.


> So... based on all the evidence so far the culprit seems to be that
> something is moving mail files into your Spam folder on the client.
> If that rings any bells and solves the problem, great.  If not, here
> is what we need to do to track it down further.

I have followed your hints to track down the issue.  All of these
messages are spam.  What I suspect follows.

All of these files have been placed in the new/ subdir by maildrop and
during the post-hook (afew) have been stripped of any tags besides the
'spam' tag; in particular the 'unread' tag has been removed, but the files
still remain in the new/ subdir.  So... what had to happen is that during
muchsync these messages have been discovered as already read, so they
don't belong in new/ but must be moved to cur/.  And this is what happened
on the client side.  During the next muchsync these changes had to be
pushed to the server, i.e. the move from new/ to cur/.

So if my assumptions are correct, actually there is no issue!  I would
just have to adjust afew filtering to prevent this behaviour.


Thank you,

-- 
Amadeusz Żołnowski




Re: muchsync files renames

2015-08-26 Thread Amadeusz Żołnowski
Hi David,

(Resending, because I forgot to Cc mailing list.)

David Mazieres dm-list-email-notm...@scs.stanford.edu writes:
>> 3. I ran muchsync SERVER.
>> 4. When it lasted much longer than initialization I canceled it with a
>> single SIGINT (^c).
>
> Interesting.  I wish I knew why this was taking much longer than running
> it on the server, and whether the delay was caused by client activity or
> server activity.

I think there was something happening on the server side, because with
--noup it completed in a few seconds.

> I don't suppose you'd be willing to make a copy of your mail database
> to repeat the experiment without any risk of messing up your real
> maildir?

I would try it, but unfortunately I would have to make a bit more space
for a second copy of my mail.  I am testing muchsync on the same
machine between different users' home directories, so it already takes
some space.  I'll try the experiment some day this week, I hope.

>> 5. I reran muchsync SERVER and then it notified me that notmuch
>> identified file name changes - more than 1000.
>
> Were the link changes on the client (sent) or the server (received)
> side?

On the server side.  That's why I am worried.


> I don't think that will change things.  maildir.synchronize_flags
> will make things slower, but it shouldn't affect the SUMMARY numbers
> you see at the end of muchsync, other than maybe files moving from
> .../new to .../cur.  But presumably most of your mail files were
> already in cur directories.

So what would have happened on my machine is that first the client
initialization took place.  During this stage muchsync moved some files
from new/ to cur/.  Later, running muchsync SERVER tried to reflect the
client changes on the server by pushing rename requests to the server.
Is that what could actually have happened?


-- 
Amadeusz Żołnowski




Re: muchsync files renames

2015-08-24 Thread Amadeusz Żołnowski
David Mazieres dm-list-email-notm...@scs.stanford.edu writes:
> Initially, when you run muchsync --init, it copies all the files to
> your maildir, and for each file invokes
> notmuch_message_tags_to_maildir_flags.  That changes the name of the
> file, with the result that the sql database and the actual mail
> directory end up out of sync.  That on its own is not a big deal, but
> it means that the next time muchsync runs, it will have to rescan all
> of the files, as their names are no longer correct.  That shouldn't
> cause any extra traffic between the two machines, but it will require
> time on the client.  That is likely the source of the delay you were
> seeing.

Yes, that seems to be a probable scenario.

> However, if you C-c the client during this process, I still don't see
> any problems arising that cause more links to be transferred between
> machines.  So I'm kind of stumped about that part.

I don't think that C-c caused more link transfers.  It's just that
on the rerun (after C-c) notmuch on the server reported renames which
probably took place in the previous run.

-- 
Amadeusz Żołnowski




Re: muchsync files renames

2015-08-23 Thread David Mazieres
Amadeusz Żołnowski aide...@aidecoe.name writes:

> Hi David,
>
> First of all thank you for such an elaborate answer.
>
> I have missed the paragraph about maildir.synchronize_flags somehow.  I
> have it enabled.  So this must be the source of the problem (?).

I've only ever tested with maildir.synchronize_flags = true, because I'm
worried that if I disable it I will have a hard time re-enabling it.  I
do think it is a source of inefficiency, but muchsync should still be
usable.

> Here are the steps I followed:
>
> 1. I initialized server locally with muchsync -vv.  My mail is stored in
> ~/Mail on the server.
> 2. I ran muchsync --init ~/mail SERVER. (Directory names do not need to
> be the same, do they?)

Confirmed that directory names do not need to be the same.

> 3. I ran muchsync SERVER.
> 4. When it lasted much longer than initialization I canceled it with a
> single SIGINT (^c).

Interesting.  I wish I knew why this was taking much longer than running
it on the server, and whether the delay was caused by client activity or
server activity.

I don't suppose you'd be willing to make a copy of your mail database to
repeat the experiment without any risk of messing up your real maildir?
Basically what would be interesting to see is (assuming .notmuch-copy on
server is configured for this disposable copy):

# Log everything for later analysis
script
# Use new config file location for disposable copy
export NOTMUCH_CONFIG=$HOME/.notmuch-copy
# Set up a new initial database
muchsync - --init ~/test-copy SERVER - --config=.notmuch-copy

# Now initial copy is made, see if client is slow
# Is notmuch new itself slow?
notmuch new
# Is resynchronizing muchsync with notmuch slow?
muchsync -

# Now see if something weird happened on server to make notmuch new slow
ssh SERVER notmuch --config=.notmuch-copy new
# Now see if something weird happened on server to make muchsync slow
ssh SERVER muchsync - --config=.notmuch-copy

# Now finally try resynchronizing to see if this is slow
muchsync - SERVER - --config=.notmuch-copy
# Save script file
exit

Does something along these lines make sense?  If you can figure out
which of these is slow (other than --init, which always will be), and
what the verbose chatter is, that would help me narrow down and identify
any problem.

> 5. I reran muchsync SERVER and then it notified me that notmuch
> identified file name changes - more than 1000.

Were the link changes on the client (sent) or the server (received)
side?

> 6. I waited a bit and then I canceled it with SIGINT.
> 7. I ran muchsync --noup SERVER. This took only seconds to finish.

So this suggests the issue is on the client side.  It sounds like a bug.
I wonder if we can just reproduce this using a public email corpus, so
we don't have to worry about people's private email.

> I suspected that muchsync at steps 3 and 5 tried to push file renames
> back to the server.  But now I am not sure what was going on.  Have I
> desynchronized mail file flags?  It's hard to say if anything has broken
> for me, but I am a bit worried anyway.
>
> If I just disable maildir.synchronize_flags and rerun muchsync, will
> everything get synchronized properly?

I don't think that will change things.  maildir.synchronize_flags will
make things slower, but it shouldn't affect the SUMMARY numbers you see
at the end of muchsync, other than maybe files moving from .../new to
.../cur.  But presumably most of your mail files were already in cur
directories.

David


Re: muchsync files renames

2015-08-23 Thread Amadeusz Żołnowski
Hi David,

First of all thank you for such an elaborate answer.

I have missed the paragraph about maildir.synchronize_flags somehow.  I
have it enabled.  So this must be the source of the problem (?).

Here are the steps I followed:

1. I initialized server locally with muchsync -vv.  My mail is stored in
~/Mail on the server.
2. I ran muchsync --init ~/mail SERVER. (Directory names do not need to
be the same, do they?)
3. I ran muchsync SERVER.
4. When it lasted much longer than initialization I canceled it with a
single SIGINT (^c).
5. I reran muchsync SERVER and then it notified me that notmuch
identified file name changes - more than 1000.
6. I waited a bit and then I canceled it with SIGINT.
7. I ran muchsync --noup SERVER. This took only seconds to finish.

I suspected that muchsync at steps 3 and 5 tried to push file renames
back to the server.  But now I am not sure what was going on.  Have I
desynchronized mail file flags?  It's hard to say if anything has broken
for me, but I am a bit worried anyway.

If I just disable maildir.synchronize_flags and rerun muchsync, will
everything get synchronized properly?


-- 
Amadeusz Żołnowski




Re: muchsync files renames

2015-08-23 Thread David Mazieres
So just to follow up a bit.  I looked into things a bit further, and
here is what I found with maildir.synchronize_flags set to true.

Initially, when you run muchsync --init, it copies all the files to
your maildir, and for each file invokes
notmuch_message_tags_to_maildir_flags.  That changes the name of the
file, with the result that the sql database and the actual mail
directory end up out of sync.  That on its own is not a big deal, but
it means that the next time muchsync runs, it will have to rescan all
of the files, as their names are no longer correct.  That shouldn't
cause any extra traffic between the two machines, but it will require
time on the client.  That is likely the source of the delay you were
seeing.

However, if you C-c the client during this process, I still don't see
any problems arising that cause more links to be transferred between
machines.  So I'm kind of stumped about that part.

Maybe muchsync should create the original file name based on tags, so as
to avoid having to rescan in the common case.  I wish there were some
way of getting the renamed file after
notmuch_message_tags_to_maildir_flags.

David


Re: muchsync files renames

2015-08-22 Thread David Mazieres
Amadeusz Żołnowski aide...@aidecoe.name writes:

> Hi,
>
> I am testing muchsync-2 and it looks to me that files names across
> machines are different.  Moreover when syncing again after
> initialization it seems muchsync is working on something.  I have
> canceled this and rerun muchsync.  notmuch reported lots of files
> renames on server.  What and why it happens?

What muchsync specifically synchronizes for messages is the mapping:

(directory, SHA-1-hash, link-count)

So if a directory contains two copies of a file on one machine, it will
end up with two copies on the other machine.  However, the file names
themselves are not the same, but rather are created in accordance with
the maildir spec.  (Note SHA-1 wouldn't be my first choice of hash
function, but notmuch already uses this for messages with long message
IDs, so I figured I'd just be consistent with existing practice.)
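
To make the mapping concrete, here is a minimal sketch that computes (directory, SHA-1 hash) keys with a link count for a mail tree. It is an illustration under assumptions, not muchsync's implementation: the function name sync_mapping is hypothetical, and how muchsync actually defines "directory" or stores the counts is not shown here.

```python
import hashlib
import os
from collections import Counter


def sync_mapping(mail_root):
    """Count files per (relative directory, SHA-1 of contents).

    File names are deliberately ignored: two differently named copies of
    the same message in one directory yield a single key with count 2.
    """
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(mail_root):
        rel_dir = os.path.relpath(dirpath, mail_root)
        for name in filenames:
            with open(os.path.join(dirpath, name), "rb") as fh:
                digest = hashlib.sha1(fh.read()).hexdigest()
            counts[(rel_dir, digest)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python sync_mapping.py ~/Mail
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for (rel_dir, digest), links in sorted(sync_mapping(root).items()):
        print(links, digest, rel_dir)
```

Two replicas whose mappings compare equal hold the same messages in the same directories, even if every file name differs.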

In terms of what muchsync is working on, you can run it with - on
both sides to get an idea, as in muchsync - server -.  Better
yet, you can just run it on one side with muchsync -.  You'll get
a lot of output, so maybe run it inside the script command to save the
output.  If
you have enabled maildir.synchronize_flags, it could be that notmuch is
initially renaming all of your files, in which case muchsync needs to
re-hash them to make sure they haven't changed.

How did you cancel muchsync?  If you send it a single SIGINT or SIGTERM,
it attempts to clean up after itself.  However, upon multiple signals or
other signals, it immediately exits.  Muchsync is conservative about
updating the database, to avoid missing tags or files that have been
changed.  It always updates the notmuch database first, then its own
sqlite database with a version number.  That means if you kill muchsync,
some number of files may get picked up as changed again even though
really they were just copied from a peer.

To mitigate this problem, the muchsync client syncs the database every
10 seconds, so that in theory you should only get 10 seconds of extra
work from killing the client.  However, the server does not sync
periodically, on the assumption that it is more likely to read an EOF
than get killed, although currently it doesn't appear to commit any
pending transactions to the sqlite database upon EOF, which may be an
oversight.
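
The periodic-commit idea can be sketched like this. It is a generic illustration using Python's sqlite3 module, not muchsync's code; the class name PeriodicCommitter and the injectable clock parameter are assumptions made for the example.

```python
import sqlite3
import time


class PeriodicCommitter:
    """Commit a SQLite connection at most once per `interval` seconds."""

    def __init__(self, conn, interval=10.0, clock=time.monotonic):
        self._conn = conn
        self._interval = interval
        self._clock = clock
        self._last_commit = clock()

    def maybe_commit(self):
        """Commit if the interval has elapsed; return True if it did."""
        now = self._clock()
        if now - self._last_commit < self._interval:
            return False
        self._conn.commit()
        self._last_commit = now
        return True


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE seen (hash TEXT PRIMARY KEY)")
    committer = PeriodicCommitter(conn, interval=10.0)
    for i in range(1000):
        conn.execute("INSERT INTO seen VALUES (?)", (str(i),))
        # Called after each unit of work; at most ~10 seconds of
        # bookkeeping is lost if the process is killed.
        committer.maybe_commit()
    conn.commit()
```

Calling maybe_commit after each processed file bounds the amount of work that has to be redone after an abrupt exit to roughly one interval.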

So to summarize:

  * File names are not the same across machines, only file contents and
directory structure.

  * Give muchsync lots of -v options to see what it is doing.

  * Try to avoid killing muchsync.  Doing so is safe, but likely to
generate extra work in the form of phantom renames or tag changes
that get synchronized even though they don't need to be.

  * Possibly the server should handle EOF more gracefully and commit any
pending transactions, or the client should periodically send a
commit command to the server.

If you think something is wrong, I can help you figure it out, but I
need to know what maildir.synchronize_flags is set to on each replica,
what you mean by canceled, and roughly what was happening when you
canceled (uploading or downloading).

David