thread id specification

2014-10-09 Thread Rama
Can anyone in-the-know shed some light on how notmuch generates its
thread ids? Until recently, I'd only seen numeric thread ids 16
characters long, padded with zeroes. For example:

thread:0001
thread:0002
thread:0005
etc etc

Today, several new threads were created with non-numeric ids:

thread:000d
thread:000e
thread:000f

Is this normal behavior?

--
Rama
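
One reading consistent with the ids above, offered as an assumption and not
as a confirmed description of notmuch internals: if a thread id is a counter
printed as 16 zero-padded hexadecimal digits, then the letters a to f show up
as soon as the counter reaches 10. A minimal sketch (the helper name
format_thread_id is hypothetical):

```python
def format_thread_id(counter: int) -> str:
    """Format a counter as a 16-character, zero-padded, lowercase hex string.

    Assumption: this mirrors how the ids in the question could arise; it is
    not taken from the notmuch sources.
    """
    return f"{counter:016x}"


if __name__ == "__main__":
    # Counters 1..9 look purely numeric; from 10 onwards hex letters appear.
    for counter in (1, 2, 5, 13, 14, 15):
        print("thread:" + format_thread_id(counter))
```

Under that assumption, ids such as ...000d are simply the 13th, 14th and 15th
values of the same sequence and not a different kind of id.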





[PATCH v2 2/4] cli: Extend the search command for --output=addresses and similar

2014-10-09 Thread Michal Sojka
On Mon, Oct 06 2014, Tomi Ollila wrote:
> On Sun, Oct 05 2014, Michal Sojka <sojk...@fel.cvut.cz> wrote:
>
>> The new outputs allow printing senders, recipients or both of matching
>> messages.
>>
>> This code based on a patch from Jani Nikula.
>
> OK, IMO...
>
> 1/4 OK
>
> Before 2/4 add support for 'flag' arguments, drop the --output=addresses
> option which is now done as --output=sender --output=recipients

OK

> In deduplication comment did not describe the deduplication at all...
> so I looked a bit into the code now... the Default you described was
> that with "John Doe" <john@example.com> and "John Doe" <john@EXAMPLE.COM>
> only one was printed (but not which one).

I intentionally didn't want to define which one, but I agree that it
might be useful in some cases. It would depend on the --sort option and on
the order of addresses in email headers.

> Secondly, what happens with "Doe, John" <john@example.com> and
> "John Doe" <john@example.com>... ah, it is same as *addr* with
> case-insensitive address.
>
> Sorry, but IMO these options are a bit strange.

My impression is that I did a bad job describing the deduplication
algorithm, which is why you don't understand it. Maybe we can also
change the name of the option to --filter-by, or something like this.

When thinking about how to best document such an option, it seems that
the user must be aware that this is implemented as flags that are ORed.
Which means that the default should be what was in the previous patch
--unique=none.

What about the following?

``--filter-flag=``\ (**addr**\ \|\ **name**\ \|\ **addrfold**)

Can be used with ``--output=addresses``, ``--output=sender`` or
``--output=recipients`` to filter out duplicate addresses. The
filtering algorithm receives a sequence of email addresses and
outputs the same sequence without the addresses that are
considered a duplicate of a previously output address. What is
considered a duplicate depends on the flags given:

**addr** means that the address part is compared.
Case-sensitivity can be controlled by the **addrfold** flag (see
below). For example, the addresses "John Doe <j...@example.com>"
and "Dr. John Doe <j...@example.com>" will be considered
duplicate.

**name** means that the name part is compared in a case-sensitive
manner. For example, the addresses "John Doe <j...@example.com>"
and "John Doe <j...@doe.name>" will be considered duplicate.

**addrfold** when used with the **addr** flag, the address
comparison is performed in a case-insensitive manner. For example,
the addresses "John Doe <j...@example.com>" and "Dr. John Doe
<j...@example.com>" will be considered duplicate.

To specify multiple flags, this option can be given multiple
times. For example, ``--filter-flag=name --filter-flag=addr``
will print unique case-sensitive combinations of both name and
address parts.

With this, the previous default behavior would now have to be spelled
as "--filter-flag=addr --filter-flag=addrfold".
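
To make the proposed semantics concrete, here is a minimal Python sketch of
the filtering described above. It is only an illustration under assumptions,
not the code in notmuch-search.c: the function name filter_addresses and the
(name, addr) tuple representation are hypothetical, and only the flag names
(addr, name, addrfold) come from the proposal.

```python
def filter_addresses(addresses, flags):
    """Return the addresses that are not duplicates of an earlier output one.

    addresses: iterable of (name, addr) tuples (hypothetical representation).
    flags: any combination of "addr", "name" and "addrfold".
    """
    flags = set(flags)
    seen = set()
    result = []
    for name, addr in addresses:
        key = []
        if "name" in flags:
            key.append(name)  # the name part is always compared case-sensitively
        if "addr" in flags:
            # addrfold only has an effect together with addr
            key.append(addr.lower() if "addrfold" in flags else addr)
        key = tuple(key)
        if key:  # with no comparison flags, nothing is filtered out
            if key in seen:
                continue
            seen.add(key)
        result.append((name, addr))
    return result


sample = [
    ("John Doe", "john@example.com"),
    ("Dr. John Doe", "john@example.com"),
    ("John Doe", "JOHN@EXAMPLE.COM"),
    ("John Doe", "john@doe.name"),
]

if __name__ == "__main__":
    for flags in (set(), {"addr"}, {"addr", "addrfold"}, {"name"},
                  {"name", "addr"}, {"name", "addr", "addrfold"}):
        print(sorted(flags), filter_addresses(sample, flags))
```

In this sketch the flags select which parts go into the comparison key, so an
empty flag set behaves like the earlier --unique=none, and addrfold given on
its own changes nothing.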

I'm not sure it is wise to present such a low-level interface (flags) to
command-line users, but it is hopefully more understandable now. What do
you think?

> Not to go to choose which one to choose (first, last, most common) instead
> of the suggested options these should be the ones:
>
> 1) "John Doe" <john@example.com> and "John Doe" <john@EXAMPLE.COM>:
> only one printed, but if either were "Dr. John Doe", both of these are printed
> (this as default).

According to the above, this could be achieved with --filter-flag=name
--filter-flag=addr --filter-flag=addrfold.

> 2) same as above, but only make case-insensitive

case-insensitive is already in 1), you probably mean case-sensitive.

> address match -- i.e. in the 2 above cases in option 1, print only
> one.

This would be --filter-flag=name --filter-flag=addr.

> (and same name but different address to perhaps never been an option...)
>
> I might like to have option that does case-sensitive address match, 

This would be just --filter-flag=addr.

As a side note, it is interesting that you listed your options as an
enumeration even though they are actually combinations of several on/off
flags. I think that it is more natural for human brains to think in
terms of simple lists than in terms of combinations of flags. That's why
I originally implemented --output=addresses as just another keyword,
rather than requiring the user to specify both sender and recipients.

Thanks for the review.
-Michal

> In those cases I don't know the recipient's culture and the email he
> sent to me used format <foo@example.org> (and not knowing which
> one is the first and which last name (or whatever names these are) --
> just to reply in same case format in respect...


>
>
> Tomi
>
>
>> ---
>>  completion/notmuch-completion.bash |   2 +-
>>  completion/notmuch-completion.zsh  |   3 +-
>>  doc/man1/notmuch-search.rst        |  22 +++-
>>  notmuch-search.c                   | 100 ++---
>>  test/T090-search-output.sh         |  64
>>  5 files changed,
