[PATCH v3 04/10] notmuch-dump: add --format=(notmuch|sup)

2012-01-15 Thread David Bremner
On Sun, 15 Jan 2012 15:35:11 -0500, Austin Clements wrote:

> We definitely need a round-trip-able dump format.  Did you consider
> using JSON to allow for future flexibility (e.g., expansion of what we
> store in the database) and so we don't have to invent our own
> encodings?  A JSON format wouldn't necessarily be a reason *not* to
> also have this format, especially considering how
> shell-script-friendly this is (versus how shell-script-unfriendly JSON
> is), I'm just curious what trade-offs you're considering.

I was looking for something fairly close to what we have, to allow
people to migrate their various scripts (e.g. nmbug) to the new format
without too much pain.  Maybe some small amount of header information at
the start of the file would support extensibility, while still being
shell script friendly.

I'm also not too sure how much overhead the JSON quoting would
induce. My tags file is currently about 10M, and on my old laptop takes
about 15s to dump. That's a long 15s when I'm trying to sync my mail.
For "normal" backup use, a little more overhead doesn't matter, although
the stories of non-linear slowdowns that people report suggest we
shouldn't get too cavalier about that.

> You might want to call this format something more self-descriptive
> like "text" or "hextext" or something in case we do want to expand in
> the future.  "sup" is probably fine for the legacy format since that's
> set in stone at this point.

Yeah, I'm definitely open to better suggestions for a name.



___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[PATCH v3 04/10] notmuch-dump: add --format=(notmuch|sup)

2012-01-15 Thread Austin Clements
Quoth David Bremner on Jan 14 at  9:40 pm:
> From: David Bremner 
> 
> sup is the old format, and remains the default.
> 
> Each line of the notmuch format is "msg_id tag tag...tag" where each
> space-separated token is 'hex-encoded' to remove troubling characters.
> In particular this format won't have the same problem with e.g. spaces
> in message-ids or tags; they will be round-trip-able.

We definitely need a round-trip-able dump format.  Did you consider
using JSON to allow for future flexibility (e.g., expansion of what we
store in the database) and so we don't have to invent our own
encodings?  A JSON format wouldn't necessarily be a reason *not* to
also have this format, especially considering how
shell-script-friendly this is (versus how shell-script-unfriendly JSON
is), I'm just curious what trade-offs you're considering.

You might want to call this format something more self-descriptive
like "text" or "hextext" or something in case we do want to expand in
the future.  "sup" is probably fine for the legacy format since that's
set in stone at this point.




[PATCH v3 04/10] notmuch-dump: add --format=(notmuch|sup)

2012-01-14 Thread David Bremner
From: David Bremner 

sup is the old format, and remains the default.

Each line of the notmuch format is "msg_id tag tag...tag" where each
space-separated token is 'hex-encoded' to remove troubling characters.
In particular this format won't have the same problem with e.g. spaces
in message-ids or tags; they will be round-trip-able.
---
 dump-restore-private.h |   12 
 notmuch-dump.c |   47 +++
 2 files changed, 51 insertions(+), 8 deletions(-)
 create mode 100644 dump-restore-private.h
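
For readers without hex-escape.h at hand, here is a rough sketch of what
encoding one token might look like. The escaping rule (a '%' followed by
two hex digits for every byte outside a small safe set) and all sketch_*
names are assumptions made for illustration; they are not taken from
hex-escape.h, whose actual rule and signature may differ.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical safe set: ASCII letters, digits and a few punctuation
 * characters.  Space and '%' are deliberately NOT safe, so an encoded
 * token never contains a space and '%' always starts an escape. */
static int
sketch_is_safe (unsigned char c)
{
    return isalnum (c) || (c != '\0' && strchr ("+-_.@=:,", c) != NULL);
}

/* Write the encoded form of `token` into `out` (capacity `out_size`,
 * including the terminating NUL).  Returns 0 on success, -1 if `out`
 * is too small. */
static int
sketch_encode_token (const char *token, char *out, size_t out_size)
{
    static const char hex[] = "0123456789abcdef";
    size_t n = 0;

    assert (token != NULL && out != NULL);
    if (out_size == 0)
        return -1;

    for (const unsigned char *p = (const unsigned char *) token; *p; p++) {
        if (sketch_is_safe (*p)) {
            if (n + 1 >= out_size)
                return -1;
            out[n++] = (char) *p;
        } else {
            if (n + 3 >= out_size)
                return -1;
            out[n++] = '%';
            out[n++] = hex[*p >> 4];
            out[n++] = hex[*p & 0x0f];
        }
    }
    out[n] = '\0';
    return 0;
}
```

Under these assumptions, the tag "my tag" would be written as "my%20tag",
so splitting a dump line on spaces can no longer cut a tag or message-id
in half.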

diff --git a/dump-restore-private.h b/dump-restore-private.h
new file mode 100644
index 000..34a5022
--- /dev/null
+++ b/dump-restore-private.h
@@ -0,0 +1,12 @@
+#ifndef DUMP_RESTORE_PRIVATE_H
+#define DUMP_RESTORE_PRIVATE_H
+
+#include "hex-escape.h"
+#include "command-line-arguments.h"
+
+typedef enum dump_formats {
+DUMP_FORMAT_SUP,
+DUMP_FORMAT_NOTMUCH
+} dump_format_t;
+
+#endif
diff --git a/notmuch-dump.c b/notmuch-dump.c
index a735875..0231db2 100644
--- a/notmuch-dump.c
+++ b/notmuch-dump.c
@@ -19,6 +19,7 @@
  */

 #include "notmuch-client.h"
+#include "dump-restore-private.h"

 int
 notmuch_dump_command (unused (void *ctx), int argc, char *argv[])
@@ -44,9 +45,15 @@ notmuch_dump_command (unused (void *ctx), int argc, char *argv[])
 char *output_file_name = NULL;
 int opt_index;

+int output_format = DUMP_FORMAT_SUP;
+
 notmuch_opt_desc_t options[] = {
-   { NOTMUCH_OPT_POSITION, &output_file_name, 0, 0, 0  },
-   { 0, 0, 0, 0, 0 }
+   { NOTMUCH_OPT_KEYWORD, &output_format, "format", 'f',
+ (notmuch_keyword_t []){ { "sup", DUMP_FORMAT_SUP },
+ { "notmuch", DUMP_FORMAT_NOTMUCH },
+ {0, 0} } },
+   { NOTMUCH_OPT_POSITION, &output_file_name, 0, 0, 0 },
+   { 0,0, 0, 0, 0 }
 };

 opt_index = parse_arguments (argc, argv, options, 1);
@@ -85,29 +92,53 @@ notmuch_dump_command (unused (void *ctx), int argc, char *argv[])
  */
 notmuch_query_set_sort (query, NOTMUCH_SORT_UNSORTED);

+char *buffer = NULL;
+size_t buffer_size = 0;
+
 for (messages = notmuch_query_search_messages (query);
 notmuch_messages_valid (messages);
 notmuch_messages_move_to_next (messages))
 {
int first = 1;
-   message = notmuch_messages_get (messages);
+   const char *message_id;

-   fprintf (output,
-"%s (", notmuch_message_get_message_id (message));
+   message = notmuch_messages_get (messages);
+   message_id = notmuch_message_get_message_id (message);
+
+   if (output_format == DUMP_FORMAT_SUP) {
+   fprintf (output, "%s (", message_id);
+   } else {
+   if (hex_encode (notmuch, message_id,
+   &buffer, &buffer_size) != HEX_SUCCESS)
+   return 1;
+   fprintf (output, "%s ", buffer);
+   }

for (tags = notmuch_message_get_tags (message);
 notmuch_tags_valid (tags);
 notmuch_tags_move_to_next (tags))
{
+   const char *tag_str = notmuch_tags_get (tags);
+
if (! first)
-   fprintf (output, " ");
+   fputs (" ", output);

-   fprintf (output, "%s", notmuch_tags_get (tags));
+   if (output_format == DUMP_FORMAT_SUP) {
+   fputs (tag_str, output);
+   } else {
+   if (hex_encode (notmuch, tag_str,
+   &buffer, &buffer_size) != HEX_SUCCESS)
+   return 1;

+   fputs (buffer, output);
+   }
first = 0;
}

-   fprintf (output, ")\n");
+   if (output_format == DUMP_FORMAT_SUP)
+   fputs (")\n", output);
+   else
+   fputs ("\n", output);

notmuch_message_destroy (message);
 }
-- 
1.7.7.3
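
To illustrate why the "msg_id tag tag...tag" layout is round-trip-able,
the following sketch reads one such line back. It assumes a hypothetical
escaping rule of '%' followed by two hex digits for every troublesome
byte; the real rule lives in hex-escape.h and may differ. All sketch_*
names are made up for this example and are not part of notmuch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if `c` is not a hex digit. */
static int
sketch_hex_value (char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode "%XX" escapes in place.  Returns 0 on success, -1 on a
 * truncated or malformed escape (or an escaped NUL byte). */
static int
sketch_decode_token (char *token)
{
    char *r = token, *w = token;

    assert (token != NULL);
    while (*r) {
        if (*r == '%') {
            int hi = sketch_hex_value (r[1]);
            int lo = hi < 0 ? -1 : sketch_hex_value (r[2]);
            if (hi < 0 || lo < 0 || hi * 16 + lo == 0)
                return -1;
            *w++ = (char) (hi * 16 + lo);
            r += 3;
        } else {
            *w++ = *r++;
        }
    }
    *w = '\0';
    return 0;
}

/* Split `line` on spaces and decode every token in place.  tokens[0]
 * is the message-id, the rest are tags.  Returns the token count, or
 * -1 on a malformed escape or when max_tokens is exceeded. */
static int
sketch_parse_line (char *line, char **tokens, int max_tokens)
{
    int n = 0;
    char *p = line;

    line[strcspn (line, "\n")] = '\0';   /* drop the trailing newline */
    while (*p) {
        char *end = strchr (p, ' ');
        if (end)
            *end = '\0';
        if (*p) {                        /* skip empty tokens */
            if (n == max_tokens || sketch_decode_token (p) < 0)
                return -1;
            tokens[n++] = p;
        }
        if (! end)
            break;
        p = end + 1;
    }
    return n;
}
```

Because an encoded token can never contain a literal space, splitting on
spaces before decoding is safe; with the sup format a tag such as
"my tag" would instead come back as two tags.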


