[notmuch] [PATCH] Store the size of the file for each message

2009-12-19 Thread Carl Worth
On Sat, 19 Dec 2009 09:02:11 +0100, Marten Veldthuis wrote:
> > Anyone have a solution here?
> 
> Something like "git help add" just opens the manpage for git-add. Can't
> we do the same here?

The granularity is different, though. I like that "notmuch help show"
shows just the documentation for "notmuch show". And I also like that
"man notmuch" shows all the documentation, (without having to have N
different man pages for each of the sub-commands).

Meanwhile, the git approach does mean that one doesn't get any "builtin"
help unless the external man-based stuff is working and installed. I'm
not sure that I want to depend on that.
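
One way to get both views from a single source is sketched below. This is only an illustration of the idea, not anything proposed in the thread: the `CommandDoc` table, the function names, and the entry texts are all hypothetical. A per-command lookup plays the role of "notmuch help show", and a walk over the same table plays the role of the full manual.

```cpp
#include <string>
#include <vector>

// Hypothetical documentation table: one entry per sub-command.
struct CommandDoc {
    std::string name;  // sub-command name, e.g. "show"
    std::string text;  // documentation for that sub-command
};

// Illustrative entries only; the real notmuch texts are not reproduced here.
static const std::vector<CommandDoc> kDocs = {
    {"new", "Find and import new messages."},
    {"show", "Show all messages matching the search terms."},
};

// "help <command>" view: documentation for a single sub-command,
// or an empty string when the command is unknown.
std::string help_for(const std::string &command) {
    for (const CommandDoc &doc : kDocs)
        if (doc.name == command)
            return doc.name + "\n    " + doc.text + "\n";
    return "";
}

// "Whole manual" view: every entry, concatenated in table order.
std::string help_all() {
    std::string out;
    for (const CommandDoc &doc : kDocs)
        out += help_for(doc.name);
    return out;
}
```

Because the table is compiled into the binary, the builtin help keeps working even when no man pages are installed; a man page could be generated from the same table at build time.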

-Carl



[notmuch] [PATCH] Store the size of the file for each message

2009-12-19 Thread Marten Veldthuis
On Fri, 18 Dec 2009 21:24:10 -0800, Carl Worth wrote:
> Currently we're replicating all of our documentation both in the man
> page and in the output from "notmuch help". It's annoying to have to
> add everything in two places, but I don't have a good idea for making
> that sharable yet.
> 
> Anyone have a solution here?

Something like "git help add" just opens the manpage for git-add. Can't
we do the same here?

-- 
- Marten


[notmuch] [PATCH] Store the size of the file for each message

2009-12-19 Thread James Westby
On Fri, 18 Dec 2009 16:57:16 -0800, Carl Worth wrote:
> You can, actually. Just set the NOTMUCH_CONFIG environment variable to
> your alternate configuration file. (And yes, we're missing any mention
> of this in our documentation.)

Sweet. Where would be the best place to document it? Just in the
man page?

Thanks,

James


[notmuch] [PATCH] Store the size of the file for each message

2009-12-19 Thread James Westby
On Fri, 18 Dec 2009 14:29:21 -0800, Carl Worth wrote:
> On Fri, 18 Dec 2009 21:21:03 +, James Westby wrote:
> Yes, a value makes sense here and should make the value easy to
> retrieve.

Excellent.

> I usually use a little tool I wrote called xapian-dump. It currently
> exists only in the git history of notmuch. Look at commit:
> 
>   22691064666c03c5e76bc787395bfe586929f4cc
> 
> or so.

Thanks, I found delve, which at least showed that something was
being stored. It's in the xapian-tools package, and

   delve -V2 

prints out the filesize value for each document.

It would be great if we could specify an alternative configuration
file for testing so that I can set up a small maildir and test
against that.

> If the file size is just an integer, then you shouldn't need a custom
> ValueRangeProcessor. One of the existing processors in Xapian should
> work fine.

Correct, I hadn't read the documentation closely enough. After fixing
that and doing some testing I have this working now. Patch incoming.

Thanks,

James


[notmuch] [PATCH] Store the size of the file for each message

2009-12-18 Thread Carl Worth
On Sat, 19 Dec 2009 01:35:46 +, James Westby wrote:
> On Fri, 18 Dec 2009 16:57:16 -0800, Carl Worth wrote:
> > You can, actually. Just set the NOTMUCH_CONFIG environment variable to
> > your alternate configuration file. (And yes, we're missing any mention
> > of this in our documentation.)
> 
> Sweet. Where would be the best place to document it? Just in the
> man page?

Currently we're replicating all of our documentation both in the man
page and in the output from "notmuch help". It's annoying to have to
add everything in two places, but I don't have a good idea for making
that sharable yet.

Anyone have a solution here?

-Carl



[notmuch] [PATCH] Store the size of the file for each message

2009-12-18 Thread James Westby
When indexing a message, store the file size along with it so that,
when we store all the filenames for a message-id, we can cheaply tell
whether any of them have different content.

The value stored is defined to be the largest filesize of any
of the files for that message.

This changes the API for efficiency reasons. The size is often
known to the caller, and so we save a second stat by asking them
to provide it. If they don't know it they can pass -1 and the
stat will be done for them.
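
The "pass -1 and the stat will be done for them" convention can be sketched on its own as follows. This is a simplified stand-in, not the patch's function: `resolve_filesize` is a hypothetical name, and it reports failure as -1 rather than through a notmuch status code.

```cpp
#include <sys/stat.h>
#include <sys/types.h>

// Returns the size to record for `filename`.
// A non-negative `size` is trusted as-is (the caller already did the stat);
// a negative value such as -1 means "unknown", so the file is stat'ed here.
// Returns -1 if that stat fails.
off_t resolve_filesize(const char *filename, off_t size) {
    if (size >= 0)
        return size;
    struct stat st;
    if (stat(filename, &st) != 0)
        return -1;
    return st.st_size;
}
```

A caller that has just walked the directory and already holds a `struct stat` passes `st.st_size`, which avoids the second system call; any other caller passes -1.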

We store the filesize such that we can query a range. Thus it
would be possible to query "filesize:0..100" if you somehow
knew the raw message was less than 100 bytes.
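
Range queries over values compare the stored strings, which is why the number has to be stored in an order-preserving form rather than as decimal text (as strings, "100" sorts before "20"). The sketch below illustrates the idea with a fixed-width big-endian encoding. It is a stand-in chosen for clarity and is not the encoding produced by Xapian::sortable_serialise; `encode_sortable` and `in_range` are hypothetical names.

```cpp
#include <cstdint>
#include <string>

// Encode a size as 8 big-endian bytes so that byte-wise string order
// matches numeric order. NOT Xapian's encoding, only a stand-in.
std::string encode_sortable(std::uint64_t n) {
    std::string out(8, '\0');
    for (int i = 7; i >= 0; --i) {
        out[i] = static_cast<char>(n & 0xff);
        n >>= 8;
    }
    return out;
}

// True when `size` lies in the inclusive range [lo, hi], decided purely
// by comparing the encoded strings.
bool in_range(std::uint64_t size, std::uint64_t lo, std::uint64_t hi) {
    const std::string v = encode_sortable(size);
    return encode_sortable(lo) <= v && v <= encode_sortable(hi);
}
```

With this kind of encoding, a query such as "filesize:0..100" only works if both ends of the range are encoded the same way as the stored value before they are compared.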
---

  Here's the first part, storing the filesize. I'm using
  add_value so that we can make it sortable, is that valid
  for retrieving it as well?

  The only thing I'm not sure about is if it works. Is there
  a way to inspect a document to see the values that are
  stored? Doing a search isn't working, so I imagine I made
  a mistake.

  Thanks,

  James

 lib/database.cc   |   17 +
 lib/message.cc|   25 +
 lib/notmuch-private.h |8 +++-
 lib/notmuch.h |5 +
 notmuch-new.c |2 +-
 5 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index b6c4d07..0ec77cd 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -454,6 +454,17 @@ notmuch_database_create (const char *path)
 return notmuch;
 }

+struct FilesizeValueRangeProcessor : public Xapian::ValueRangeProcessor {
+FilesizeValueRangeProcessor() {}
+
+Xapian::valueno operator()(std::string &begin, std::string &) {
+if (begin.substr(0, 9) != "filesize:")
+return Xapian::BAD_VALUENO;
+begin.erase(0, 9);
+return NOTMUCH_VALUE_FILESIZE;
+}
+};
+
 notmuch_database_t *
 notmuch_database_open (const char *path,
   notmuch_database_mode_t mode)
@@ -463,6 +474,7 @@ notmuch_database_open (const char *path,
 struct stat st;
 int err;
 unsigned int i;
+FilesizeValueRangeProcessor filesize_proc;

 if (asprintf (&notmuch_path, "%s/%s", path, ".notmuch") == -1) {
notmuch_path = NULL;
@@ -508,6 +520,7 @@ notmuch_database_open (const char *path,
notmuch->query_parser->set_stemmer (Xapian::Stem ("english"));
notmuch->query_parser->set_stemming_strategy (Xapian::QueryParser::STEM_SOME);
notmuch->query_parser->add_valuerangeprocessor (notmuch->value_range_processor);
+   notmuch->query_parser->add_valuerangeprocessor (&filesize_proc);

for (i = 0; i < ARRAY_SIZE (BOOLEAN_PREFIX_EXTERNAL); i++) {
prefix_t *prefix = &BOOLEAN_PREFIX_EXTERNAL[i];
@@ -889,6 +902,7 @@ _notmuch_database_link_message (notmuch_database_t *notmuch,
 notmuch_status_t
 notmuch_database_add_message (notmuch_database_t *notmuch,
  const char *filename,
+ const off_t size,
  notmuch_message_t **message_ret)
 {
 notmuch_message_file_t *message_file;
@@ -992,6 +1006,9 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
if (private_status == NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND) {
_notmuch_message_set_filename (message, filename);
_notmuch_message_add_term (message, "type", "mail");
+   ret = _notmuch_message_set_filesize (message, filename, size);
+   if (ret)
+   goto DONE;
} else {
ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
goto DONE;
diff --git a/lib/message.cc b/lib/message.cc
index 49519f1..2bfc5ed 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -426,6 +426,31 @@ _notmuch_message_set_filename (notmuch_message_t *message,
 message->doc.set_data (s);
 }

+notmuch_status_t
+_notmuch_message_set_filesize (notmuch_message_t *message,
+  const char *filename,
+  const off_t size)
+{
+struct stat st;
+off_t realsize = size;
+notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS;
+
+if (realsize < 0) {
+   if (stat (filename, &st)) {
+   ret = NOTMUCH_STATUS_FILE_ERROR;
+   goto DONE;
+   } else {
+   realsize = st.st_size;
+   }
+}
+
+message->doc.add_value (NOTMUCH_VALUE_FILESIZE,
+Xapian::sortable_serialise (realsize));
+
+  DONE:
+return ret;
+}
+
 const char *
 notmuch_message_get_filename (notmuch_message_t *message)
 {
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index 116f63d..1ba3055 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -100,7 +100,8 @@ _internal_error (const char *format, ...) PRINTF_ATTRIBUTE (1, 2);

 typedef enum {
 NOTMUCH_VALUE_TIMESTAMP = 0,
-NOTMUCH_VALUE_MESSAGE_ID
+NOTMUCH_VALUE_MESSAGE_ID,
+NOTMUCH_VALUE_FILESIZE
 } notmuch_value_t;

 /* Xapian (with flint backend) complains if we provide a term longer
@@ -193,6 

[notmuch] [PATCH] Store the size of the file for each message

2009-12-18 Thread Carl Worth
On Sat, 19 Dec 2009 00:08:24 +, James Westby wrote:
> Thanks, I found delve, which at least showed that something was
> being stored. It's in the xapian-tools package, and
> 
>delve -V2 
> 
> prints out the filesize value for each document.

Ah, right. I had forgotten about that.

> It would be great if we could specify an alternative configuration
> file for testing so that I can set up a small maildir and test
> against that.

You can, actually. Just set the NOTMUCH_CONFIG environment variable to
your alternate configuration file. (And yes, we're missing any mention
of this in our documentation.)
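
A minimal sketch of how such an override is typically resolved is shown below. Only the variable name NOTMUCH_CONFIG comes from the thread; `config_path` is a hypothetical helper, the fallback is whatever default the caller supplies, and treating an empty value as unset is an assumption of this sketch rather than documented notmuch behavior.

```cpp
#include <cstdlib>
#include <string>

// Pick the configuration file path: NOTMUCH_CONFIG if it is set and
// non-empty, otherwise `fallback` (a default supplied by the caller).
std::string config_path(const std::string &fallback) {
    const char *env = std::getenv("NOTMUCH_CONFIG");
    if (env != nullptr && *env != '\0')
        return env;
    return fallback;
}
```

For testing against a small maildir, the variable would be set in the environment of the test run so that the regular configuration file is left untouched.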

> Correct, I hadn't read the documentation closely enough. After fixing
> that and doing some testing I have this working now. Patch incoming.

Cool!

-Carl



[notmuch] [PATCH] Store the size of the file for each message

2009-12-18 Thread Carl Worth
On Fri, 18 Dec 2009 21:21:03 +, James Westby wrote:
>   Here's the first part, storing the filesize. I'm using
>   add_value so that we can make it sortable, is that valid
>   for retrieving it as well?

Yes, a value makes sense here and should make the value easy to
retrieve.

>   The only thing I'm not sure about is if it works. Is there
>   a way to inspect a document to see the values that are
>   stored?

I usually use a little tool I wrote called xapian-dump. It currently
exists only in the git history of notmuch. Look at commit:

22691064666c03c5e76bc787395bfe586929f4cc

or so.

> Doing a search isn't working, so I imagine I made a mistake.

Let's see... (just reviewing here, not testing)..

> +struct FilesizeValueRangeProcessor : public Xapian::ValueRangeProcessor {
> +FilesizeValueRangeProcessor() {}
> +
> +Xapian::valueno operator()(std::string &begin, std::string &) {
> +if (begin.substr(0, 9) != "filesize:")
> +return Xapian::BAD_VALUENO;
> +begin.erase(0, 9);
> +return NOTMUCH_VALUE_FILESIZE;
> +}
> +};

If the file size is just an integer, then you shouldn't need a custom
ValueRangeProcessor. One of the existing processors in Xapian should
work fine.

Having never written a custom processor, I can't say whether the one
above is correct or not.
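
For reference, the quoted processor strips the "filesize:" prefix but leaves both ends of the range as decimal text, while the patch stores the value through Xapian::sortable_serialise. That mismatch is a plausible reason the search returned nothing, though the thread does not confirm it. A stock numeric processor (NumberValueRangeProcessor in the Xapian releases of that period, as far as I know) handles the prefix and the numeric conversion together. The stand-alone sketch below models only that parsing step; it uses no Xapian types, and `parse_filesize_range` is a hypothetical name.

```cpp
#include <cstdlib>
#include <string>

// Stand-alone model of a prefixed numeric range processor.
// Given the two ends of "filesize:0..100" ("filesize:0" and "100"),
// strip the prefix and parse both ends as numbers.
// Returns false when the prefix is missing or an end is not a number.
bool parse_filesize_range(std::string begin, std::string end,
                          double &lo, double &hi) {
    const std::string prefix = "filesize:";
    if (begin.compare(0, prefix.size(), prefix) != 0)
        return false;
    begin.erase(0, prefix.size());

    char *stop = nullptr;
    lo = std::strtod(begin.c_str(), &stop);
    if (begin.empty() || *stop != '\0')
        return false;
    hi = std::strtod(end.c_str(), &stop);
    if (end.empty() || *stop != '\0')
        return false;
    return true;
}
```

In a real processor the parsed numbers would then be serialised the same way as the stored values before the range is handed back to the query parser.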

-Carl


