Searching for phrases in the body of an email

2015-07-19 Thread Suvayu Ali
On Sat, Jul 18, 2015 at 12:34:16PM -0400, Xu Wang wrote:
> On Sat, Jul 18, 2015 at 11:32 AM, Suvayu Ali wrote:
> 
> > Of course this does not help me solve my original goal, but I guess now
> > I can try different queries based on your idea.
> 
> Ah I see. Your goal is to search for phrases close to "no plain text".
> But if you use fuzzy searching with an exact grep, then it is normal
> that the numbers are not consistent, no? Because your grep is not
> fuzzy.

My grep was this (case insensitive): 'plain[[:space:]/]+text'.  Since I
thought I was searching for the _phrase_ "plain text", I assumed that
would be adequately fuzzy.  However, after following Jani's advice, I
realise it wasn't always being treated as a phrase, nor was NEAR being
treated as an operator.
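For reference, here is a quick local check of what that grep pattern does and does not match. The sample lines are invented for illustration; the pattern and flags are the ones quoted above (-i ignores case, -E enables extended regular expressions so that "+" means "one or more").

```shell
# Invented sample lines, one per line.
sample='This message has no plain text part.
PLAIN   TEXT written in capitals
plain/text with a slash
plaintext written as one word
text/plain as a MIME type'

# Prints only the first three lines: "plaintext" has no separator between
# the words, and "text/plain" has the words in the opposite order.
printf '%s\n' "$sample" | grep -iE 'plain[[:space:]/]+text'
```

So the pattern tolerates case, extra spaces and a slash, but it is still an ordered, single-line match.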

I wanted to combine a phrase ("plain text") with a NEAR query (NEAR no),
but maybe that combination is not possible.  That's why I tried to
combine NEAR and ADJ (as per your suggestion) by grouping them, but that
does not seem to work either!

-- 
Suvayu

Open source is the future. It sets us free.


Searching for phrases in the body of an email

2015-07-19 Thread Suvayu Ali
Hi Jani,

On Sat, Jul 18, 2015 at 06:53:53PM +0300, Jani Nikula wrote:
> On Jul 18, 2015 6:32 PM, "Suvayu Ali"  wrote:
> > On Sat, Jul 18, 2015 at 10:54:30AM -0400, Xu Wang wrote:
> > >
> > > First note that I believe notmuch search is case insensitive by
> > > default, so your grep should be case insensitive as well.
> >
> > Good point.  I tried that, but it didn't change the numbers much.  The
> > number of matches from grep went up to 24, whereas notmuch count says 463.
> >
> > > More importantly, I'm not sure how 'no NEAR "plain text" ' syntax is
> > > parsed. Maybe it is parsed as {no NEAR plain} or {text}.
> > >
> >
> > Exactly, that's what I do not understand.
> >
> 
> export NOTMUCH_DEBUG_QUERY=1
> 
> might help.

That helped a lot!  This is what I get:

  $ notmuch count -- no NEAR \"plain\ text\"
  Query string is:
  no NEAR "plain text"
  Exclude query is:
  Xapian::Query()
  Final query is:
  Xapian::Query((Tmail AND Zno:(pos=1) AND near:(pos=2) AND Zplain:(pos=3) AND text:(pos=4)))
  465
  $ notmuch count -- \"plain\ text\"
  Query string is:
  "plain text"
  Exclude query is:
  Xapian::Query()
  Final query is:
  Xapian::Query((Tmail AND (plain:(pos=1) PHRASE 2 text:(pos=2))))
  870

I wanted the "plain text" to be treated as a phrase, as in the second
case.  I have tried nesting the quotes.  The closest I got was this:

  $ notmuch count -- no NEAR 'plain\ text'
  Query string is:
  no NEAR plain\ text
  Exclude query is:
  Xapian::Query()
  Final query is:
  Xapian::Query((Tmail AND (no:(pos=1) NEAR 11 plain:(pos=2)) AND Ztext:(pos=3)))
  151
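Much of the difference between these runs happens in the shell before notmuch sees anything. The sketch below shows what each quoting style delivers; show_args and join_args are hypothetical helpers, and join_args only assumes, based on the "Query string is:" lines above, that notmuch joins its arguments with single spaces.

```shell
# Print each argument exactly as the shell delivers it, one per line.
show_args() { printf '[%s]\n' "$@"; }
# Join arguments with single spaces (assumed to mirror "Query string is:").
join_args() { printf '%s\n' "$*"; }

show_args no NEAR "plain text"     # third line: [plain text]   (quotes removed by the shell)
show_args no NEAR \"plain\ text\"  # third line: ["plain text"] (literal quotes survive)
show_args no NEAR 'plain\ text'    # third line: [plain\ text]  (backslash survives)

join_args no NEAR "plain text"     # no NEAR plain text
join_args no NEAR \"plain\ text\"  # no NEAR "plain text"
```

In other words, an unescaped "plain text" on the command line reaches notmuch as two bare words, and single quotes pass the backslash through literally, as the third debug output shows.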

I then tried this:

  $ notmuch count -- no NEAR \(plain ADJ/1 text\)
  Query string is:
  no NEAR (plain ADJ/1 text)
  Exclude query is:
  Xapian::Query()
  Final query is:
  Xapian::Query((Tmail AND Zno:(pos=1) AND near:(pos=2) AND Zplain:(pos=3) AND (adj:(pos=4) PHRASE 2 1:(pos=5)) AND Ztext:(pos=6)))
  0

Again, this is not what I was expecting.  With the last one, I was
expecting to group "plain" and "text" within a distance of 1, in the
given order, and then to require "no" to be near (within 10 words, the
default) the "plain ADJ/1 text" combination.
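The intended match can at least be checked locally on candidate messages. This is a rough approximation written for illustration, not how Xapian evaluates NEAR or ADJ: the hypothetical near_phrase helper treats "plain" immediately followed by "text" as the phrase, and succeeds if the word "no" appears within WIN words (default 10) before or after it.

```shell
# Hypothetical helper: exit 0 if "no" occurs within WIN words (default 10)
# before "plain" or after "text", where "plain" is directly followed by
# "text". Reads the message text on stdin.
near_phrase() {
  win=${1:-10}
  # One lowercase word per line; any non-alphanumeric character separates words.
  tr -cs '[:alnum:]' '\n' | tr '[:upper:]' '[:lower:]' |
  awk -v win="$win" '
    NF { w[++n] = $0 }             # remember each word by its position
    END {
      found = 0
      for (i = 1; i < n; i++) {
        if (w[i] != "plain" || w[i + 1] != "text") continue
        lo = i - win;     if (lo < 1) lo = 1
        hi = i + 1 + win; if (hi > n) hi = n
        for (j = lo; j <= hi; j++) if (w[j] == "no") found = 1
      }
      exit (found ? 0 : 1)
    }'
}
```

For example, "notmuch show -- thread:a2a6 | near_phrase && echo match" would post-filter one thread.  Because words are split on any non-alphanumeric character, line breaks and "plain/text" do not get in the way, while "plaintext" is still a single word.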

Is my understanding of the query language completely wrong?  Apart from
`man notmuch-search-terms', I looked here:
http://xapian.org/docs/queryparser.html

Thanks for any help.

Cheers,

-- 
Suvayu

Open source is the future. It sets us free.


Searching for phrases in the body of an email

2015-07-18 Thread Jani Nikula
On Jul 18, 2015 6:32 PM, "Suvayu Ali"  wrote:
>
> Hi Xu,
>
> On Sat, Jul 18, 2015 at 10:54:30AM -0400, Xu Wang wrote:
> >
> > First note that I believe notmuch search is case insensitive by
> > default, so your grep should be case insensitive as well.
>
> Good point.  I tried that, but it didn't change the numbers much.  The
> number of matches from grep went up to 24, whereas notmuch count says 463.
>
> > More importantly, I'm not sure how 'no NEAR "plain text" ' syntax is
> > parsed. Maybe it is parsed as {no NEAR plain} or {text}.
> >
>
> Exactly, that's what I do not understand.
>

export NOTMUCH_DEBUG_QUERY=1

might help.

> > You would like to search for the exact phrase, correct? How about the
> > following?
> >
> > notmuch search no adj plain adj text
>
> Good suggestion.  I tried it, and it gives me very consistent numbers:
>
> $ notmuch count -- no ADJ plain ADJ text
> 20
> $ notmuch show -- $(notmuch search --output=messages -- no NEAR \"plain\ text\") | \
>   grep -c -iE 'plain[[:space:]/]+text'
> 24
>
> Of course this does not help me solve my original goal, but I guess now
> I can try different queries based on your idea.
>
> Thanks a lot!
>
> --
> Suvayu
>
> Open source is the future. It sets us free.
> ___
> notmuch mailing list
> notmuch at notmuchmail.org
> http://notmuchmail.org/mailman/listinfo/notmuch



Searching for phrases in the body of an email

2015-07-18 Thread Suvayu Ali
Hi Xu,

On Sat, Jul 18, 2015 at 10:54:30AM -0400, Xu Wang wrote:
> 
> First note that I believe notmuch search is case insensitive by
> default, so your grep should be case insensitive as well.

Good point.  I tried that, but it didn't change the numbers much.  The
number of matches from grep went up to 24, whereas notmuch count says 463.

> More importantly, I'm not sure how 'no NEAR "plain text" ' syntax is
> parsed. Maybe it is parsed as {no NEAR plain} or {text}.
> 

Exactly, that's what I do not understand.

> You would like to search for the exact phrase, correct? How about the
> following?
> 
> notmuch search no adj plain adj text

Good suggestion.  I tried it, and it gives me very consistent numbers:

$ notmuch count -- no ADJ plain ADJ text
20
$ notmuch show -- $(notmuch search --output=messages -- no NEAR \"plain\ text\") | \
  grep -c -iE 'plain[[:space:]/]+text'
24
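One reason these two numbers are hard to compare: notmuch count reports messages, while grep -c reports matching lines of the combined notmuch show output.  A message that mentions the phrase on two lines counts twice, and one where the phrase is split across a line break counts zero.  A small sketch with invented text:

```shell
sample='plain text here, and plain text again
no plain text
nothing relevant'

# -c counts matching LINES: prints 2.
printf '%s\n' "$sample" | grep -ciE 'plain[[:space:]/]+text'
# -o prints every match on its own line, so wc -l counts occurrences: 3.
printf '%s\n' "$sample" | grep -oiE 'plain[[:space:]/]+text' | wc -l
```

Neither figure is a per-message count, so some mismatch is expected even when both tools agree on which messages are relevant.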

Of course this does not help me solve my original goal, but I guess now
I can try different queries based on your idea.

Thanks a lot!

-- 
Suvayu

Open source is the future. It sets us free.


Searching for phrases in the body of an email

2015-07-18 Thread Xu Wang
On Sat, Jul 18, 2015 at 11:32 AM, Suvayu Ali wrote:

> Of course this does not help me solve my original goal, but I guess now
> I can try different queries based on your idea.

Ah I see. Your goal is to search for phrases close to "no plain text".
But if you use fuzzy searching with an exact grep, then it is normal
that the numbers are not consistent, no? Because your grep is not
fuzzy.

Kind regards,

Xu


Searching for phrases in the body of an email

2015-07-18 Thread Suvayu Ali
Hi Lewis,

On Fri, Jul 17, 2015 at 10:48:57AM -0500, J. Lewis Muir wrote:
> 
> 1. Perhaps you are remembering the "no plain text" message incorrectly?
>    For example, the message could have referred to "text/plain" or
>    "plaintext" (no space).  These would be sufficiently different to not
>    match your grep pattern.

True, but what puzzles me is that notmuch shouldn't return those results
in the first place, since I provided a quoted string, "plain text"
(unless, of course, I need to escape the quotes).  Okay, I just checked:
escaping them doesn't make a difference to the number of hits from notmuch.

> 2. Perhaps your email client rendered the "no plain text" message when
>    it encountered an email with only a "text/html" content type?  In
>    this case, the "no plain text" (or whatever) message would not be
>    present in the email itself since it would be generated by the email
>    client when rendering the email.

This is possible, but I use mutt.  As far as I know, it doesn't do
"smart" things like that.  I also recall looking at the MIME parts
individually as I was surprised at the behaviour, and it was indeed a
useless text/plain part with that message.

> 3. A really long shot, but could a line wrap have occurred after "plain"
>    such that "text" appeared on the next line?  Your grep pattern would
>    not match that.

Good point, I tried grepping for this instead: 'plain[[:space:]/]+text',
no luck.

Thanks for your comments.

Cheers,

-- 
Suvayu

Open source is the future. It sets us free.


Searching for phrases in the body of an email

2015-07-18 Thread Xu Wang
On Sat, Jul 18, 2015 at 5:11 AM, Suvayu Ali wrote:
> Hi Lewis,
>
> On Fri, Jul 17, 2015 at 10:48:57AM -0500, J. Lewis Muir wrote:
>>
>> 1. Perhaps you are remembering the "no plain text" message incorrectly?
>>    For example, the message could have referred to "text/plain" or
>>    "plaintext" (no space).  These would be sufficiently different to not
>>    match your grep pattern.
>
> True, but what puzzles me is that notmuch shouldn't return those results
> in the first place, since I provided a quoted string, "plain text"
> (unless, of course, I need to escape the quotes).  Okay, I just checked:
> escaping them doesn't make a difference to the number of hits from notmuch.
>
>> 2. Perhaps your email client rendered the "no plain text" message when
>>    it encountered an email with only a "text/html" content type?  In
>>    this case, the "no plain text" (or whatever) message would not be
>>    present in the email itself since it would be generated by the email
>>    client when rendering the email.
>
> This is possible, but I use mutt.  As far as I know, it doesn't do
> "smart" things like that.  I also recall looking at the MIME parts
> individually as I was surprised at the behaviour, and it was indeed a
> useless text/plain part with that message.
>
>> 3. A really long shot, but could a line wrap have occurred after "plain"
>>    such that "text" appeared on the next line?  Your grep pattern would
>>    not match that.
>
> Good point, I tried grepping for this instead: 'plain[[:space:]/]+text',
> no luck.
>
> Thanks for your comments.
>
> Cheers,
>
> --
> Suvayu

Hi Suvayu,

First note that I believe notmuch search is case insensitive by
default, so your grep should be case insensitive as well.
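The effect on the counts is easy to see with a made-up sample: a case-sensitive grep -c misses capitalised variants that a case-insensitive search would count.

```shell
sample='No plain text available.
NO PLAIN TEXT AVAILABLE.
Plain Text only.'

printf '%s\n' "$sample" | grep -c 'plain text'    # case-sensitive: prints 1
printf '%s\n' "$sample" | grep -ci 'plain text'   # case-insensitive: prints 3
```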

More importantly, I'm not sure how 'no NEAR "plain text" ' syntax is
parsed. Maybe it is parsed as {no NEAR plain} or {text}.

You would like to search for the exact phrase, correct? How about the
following?

notmuch search no adj plain adj text

Best,

Xu


Searching for phrases in the body of an email

2015-07-17 Thread Suvayu Ali
Hi,

I'm trying to find those annoying emails which have useless plain text
parts.  As I recall, they had a phrase something along the lines of "not
available in plain text" or "no plain text".  So of course I searched
for "plain text".  But that returns hundreds of messages with no obvious
matches; I can't even find the phrase "plain text" in the body of most
of the results!

Here is an example:

$ notmuch search --limit=1 -- no NEAR "plain text"
thread:a2a6   Sat. 00:30 [1/1] NASA Jet Propulsion Laboratory; NASA's Curiosity Mars Rover Tracks Sunspots (2015 2015-07 inbox)
$ notmuch show --format=raw -- thread:a2a6 | grep 'plain text'
$

To make this stranger, here are more numbers:

$ notmuch show -- $(notmuch search --output=messages -- no NEAR "plain text") | \
  grep -c -e 'plain text'
7
$ notmuch count -- no NEAR "plain text"
461

I do not understand this at all!  Any thoughts?

Thanks in advance,

-- 
Suvayu

Open source is the future. It sets us free.


Searching for phrases in the body of an email

2015-07-17 Thread J. Lewis Muir
On 7/17/15 7:11 AM, Suvayu Ali wrote:
> Hi,
>
> I'm trying to find those annoying emails which have useless plain
> text parts.  As I recall, they had a phrase something along the lines
> of "not available in plain text" or "no plain text".  So of course I
> searched for "plain text".  But that returns hundreds of messages with
> no obvious matches, I can't even find the phrase "plain text" in the
> body for most of the results!

[snip]

> I do not understand this at all!  Any thoughts?

Hello, Suvayu.

I can't speak to the notmuch search results since I actually don't have
experience with it (I'm planning to switch my email setup to using
notmuch, but I actually haven't switched yet!), but I can give a few
ideas for some of your puzzlements:

1. Perhaps you are remembering the "no plain text" message incorrectly?
   For example, the message could have referred to "text/plain" or
   "plaintext" (no space).  These would be sufficiently different to not
   match your grep pattern.

2. Perhaps your email client rendered the "no plain text" message when
   it encountered an email with only a "text/html" content type?  In
   this case, the "no plain text" (or whatever) message would not be
   present in the email itself since it would be generated by the email
   client when rendering the email.

3. A really long shot, but could a line wrap have occurred after "plain"
   such that "text" appeared on the next line?  Your grep pattern would
   not match that.
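The third point is easy to check locally, because grep matches one line at a time, so [[:space:]] never gets a chance to match the newline.  A sketch with invented text, joining the lines first with the POSIX paste utility (GNU grep's -z option is another route, but it is a GNU extension):

```shell
wrapped='Sorry, this message has no plain
text version.'

# Line by line, "plain" and "text" are on different lines: prints 0.
# (grep exits with status 1 when nothing matches, hence "|| true".)
printf '%s\n' "$wrapped" | grep -ciE 'plain[[:space:]/]+text' || true
# paste -s -d ' ' - joins all lines with spaces first: prints 1.
printf '%s\n' "$wrapped" | paste -s -d ' ' - | grep -ciE 'plain[[:space:]/]+text'
```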

Regards,

Lewis

