Re: [hackers] [PATCH][sbase] tar: use bigger buffer size to increase performance

2017-09-10 Thread Silvan Jegen
Hi

On Sat, Sep 09, 2017 at 10:29:21AM -0700, Michael Forney wrote:
> On Sat, Sep 9, 2017 at 2:08 AM, Silvan Jegen  wrote:
> > From: Jim Beveridge 
> >
> > The original code is by Jim Beveridge working on Fuchsia. I merged it
> > with slight changes.
> >
> > Time to tar two 1GB files:
> >
> > Before patch:
> >
> > real0m6.428s
> > user0m0.245s
> > sys 0m4.881s
> >
> > real0m6.454s
> > user0m0.239s
> > sys 0m4.883s
> >
> > real0m6.515s
> > user0m0.259s
> > sys 0m4.839s
> >
> > After patch:
> >
> > real0m4.755s
> > user0m0.026s
> > sys 0m1.598s
> >
> > real0m4.788s
> > user0m0.063s
> > sys 0m1.578s
> >
> > real0m4.822s
> > user0m0.007s
> > sys 0m1.662s
> >
> > A similar speedup can be observed for untaring.
> >
> > In addition to the buffer size increase we change the code to only create
> > directories for non-compliant tar files and we check for st to be NULL
> > in the recursive copy function.
> 
> He also sent me a pull request on my github branch for oasis:
> https://github.com/michaelforney/sbase/pull/2

Ah, I did not see that one.
 

> I think we should work on fixing correctness of tar before trying to
> optimize it. Currently it does not handle short reads or writes at all
> (when working with pipes). I was thinking we should add a readall in
> libutil analogous to writeall and then make use of that.

That sounds good to me.
 

> Regarding COPY_CHUNK_SIZE, it is probably a good idea to put that in
> util.h (perhaps with a different name). concat has the same problem
> with a small BUFSIZ (musl's is only 1024).

What do you think of Hiltjo's idea of querying for the page size with

long sz = sysconf(_SC_PAGESIZE);

or similar? Such code could still be put inot util.h of course.
 

> > ---
> >  tar.c | 72 
> > +++
> >  1 file changed, 55 insertions(+), 17 deletions(-)
> >
> > diff --git a/tar.c b/tar.c
> > index 53a737c..8cd1abe 100644
> > --- a/tar.c
> > +++ b/tar.c
> > @@ -16,6 +16,8 @@
> >  #include "util.h"
> >
> >  #define BLKSIZ 512
> > +// COPY_CHUNK_SIZE must be a power of 2
> > +#define COPY_CHUNK_SIZE 8192
> >
> >  enum Type {
> > REG   = '0',
> > @@ -236,10 +238,13 @@ archive(const char *path)
> > ewrite(tarfd, b, BLKSIZ);
> >
> > if (fd != -1) {
> > -   while ((l = eread(fd, b, BLKSIZ)) > 0) {
> > -   if (l < BLKSIZ)
> > -   memset(b + l, 0, BLKSIZ - l);
> > -   ewrite(tarfd, b, BLKSIZ);
> > +   char chunk[COPY_CHUNK_SIZE];
> > +   while ((l = eread(fd, chunk, COPY_CHUNK_SIZE)) > 0) {
> > +   // Ceiling to BLKSIZ boundary
> > +   int ceilsize = (l + (BLKSIZ-1)) & ~(BLKSIZ-1);
> > +   if (l < ceilsize)
> > +   memset(chunk + l, 0, ceilsize - l);
> > +   ewrite(tarfd, chunk, ceilsize);
> > }
> > close(fd);
> > }
> > @@ -250,7 +255,7 @@ archive(const char *path)
> >  static int
> >  unarchive(char *fname, ssize_t l, char b[BLKSIZ])
> >  {
> > -   char lname[101], *tmp, *p;
> > +   char lname[101], *p;
> > long mode, major, minor, type, mtime, uid, gid;
> > struct header *h = (struct header *)b;
> > int fd = -1;
> > @@ -261,9 +266,13 @@ unarchive(char *fname, ssize_t l, char b[BLKSIZ])
> > if (remove(fname) < 0 && errno != ENOENT)
> > weprintf("remove %s:", fname);
> >
> > -   tmp = estrdup(fname);
> > -   mkdirp(dirname(tmp), 0777, 0777);
> > -   free(tmp);
> > +   // tar files normally create the directory chain. This is a fallback
> > +   // for noncompliant tar files.
> > +   if (h->type != DIRECTORY) {
> > +   char* tmp = estrdup(fname);
> > +   mkdirp(dirname(tmp), 0777, 0777);
> > +   free(tmp);
> > +   }
> 
> If you tar a file within another directory, you end up with just one
> entry. This check means that the parent directory won't be created
> when trying to extract this file. Other tar implementations are able
> to extract such an archive.
> 
> >
> > switch (h->type) {
> > case REG:
> > @@ -319,9 +328,25 @@ unarchive(char *fname, ssize_t l, char b[BLKSIZ])
> > eprintf("strtol %s: invalid number\n", h->gid);
> >
> > if (fd != -1) {
> > -   for (; l > 0; l -= BLKSIZ)
> > -   if (eread(tarfd, b, BLKSIZ) > 0)
> > -   ewrite(fd, b, MIN(l, BLKSIZ));
> > +   // Ceiling to BLKSIZ boundary
> > +   int readsize = (l + (BLKSIZ-1)) & ~(BLKSIZ-1);
> > +   char chunk[COPY_CHUNK_SIZE];
> > +   int lastread = 0;
> > +
> > +   for (; readsize > 0; l -= lastread, readsize -= lastread) {
> > +   in

Re: [hackers] [PATCH][sbase] tar: use bigger buffer size to increase performance

2017-09-10 Thread Silvan Jegen
Hi Hiltjo

Thanks for the review!

On Sat, Sep 09, 2017 at 01:06:21PM +0200, Hiltjo Posthuma wrote:
> On Sat, Sep 09, 2017 at 11:08:42AM +0200, Silvan Jegen wrote:
> > From: Jim Beveridge 
> > 
> > The original code is by Jim Beveridge working on Fuchsia. I merged it
> > with slight changes.
> > 
> 
> To be clear: is it under the sbase LICENSE?

Yes, from what I can tell the License for this code is the same as for
the rest of sbase.

> >  #define BLKSIZ 512
> > +// COPY_CHUNK_SIZE must be a power of 2
> > +#define COPY_CHUNK_SIZE 8192
> >  
> 
> Instead of COPY_CHUNK_SIZE is might be worthwhile to query the pagesize, but
> i've not tested it.

Yes, I will have a look.

 
> >  enum Type {
> > REG   = '0',
> > @@ -236,10 +238,13 @@ archive(const char *path)
> > ewrite(tarfd, b, BLKSIZ);
> >  
> > if (fd != -1) {
> > -   while ((l = eread(fd, b, BLKSIZ)) > 0) {
> > -   if (l < BLKSIZ)
> > -   memset(b + l, 0, BLKSIZ - l);
> > -   ewrite(tarfd, b, BLKSIZ);
> > +   char chunk[COPY_CHUNK_SIZE];
> > +   while ((l = eread(fd, chunk, COPY_CHUNK_SIZE)) > 0) {
> > +   // Ceiling to BLKSIZ boundary
> > +   int ceilsize = (l + (BLKSIZ-1)) & ~(BLKSIZ-1);
> > +   if (l < ceilsize)
> > +   memset(chunk + l, 0, ceilsize - l);
> > +   ewrite(tarfd, chunk, ceilsize);
> > }
> > close(fd);
> > }
> > @@ -250,7 +255,7 @@ archive(const char *path)
> >  static int
> >  unarchive(char *fname, ssize_t l, char b[BLKSIZ])
> >  {
> > -   char lname[101], *tmp, *p;
> > +   char lname[101], *p;
> > long mode, major, minor, type, mtime, uid, gid;
> > struct header *h = (struct header *)b;
> > int fd = -1;
> > @@ -261,9 +266,13 @@ unarchive(char *fname, ssize_t l, char b[BLKSIZ])
> > if (remove(fname) < 0 && errno != ENOENT)
> > weprintf("remove %s:", fname);
> >  
> > -   tmp = estrdup(fname);
> > -   mkdirp(dirname(tmp), 0777, 0777);
> > -   free(tmp);
> > +   // tar files normally create the directory chain. This is a fallback
> > +   // for noncompliant tar files.
> > +   if (h->type != DIRECTORY) {
> > +   char* tmp = estrdup(fname);
> > +   mkdirp(dirname(tmp), 0777, 0777);
> > +   free(tmp);
> > +   }
> >  
> > switch (h->type) {
> > case REG:
> > @@ -319,9 +328,25 @@ unarchive(char *fname, ssize_t l, char b[BLKSIZ])
> > eprintf("strtol %s: invalid number\n", h->gid);
> >  
> > if (fd != -1) {
> > -   for (; l > 0; l -= BLKSIZ)
> > -   if (eread(tarfd, b, BLKSIZ) > 0)
> > -   ewrite(fd, b, MIN(l, BLKSIZ));
> > +   // Ceiling to BLKSIZ boundary
> > +   int readsize = (l + (BLKSIZ-1)) & ~(BLKSIZ-1);
> > +   char chunk[COPY_CHUNK_SIZE];
> > +   int lastread = 0;
> > +
> > +   for (; readsize > 0; l -= lastread, readsize -= lastread) {
> > +   int chunk_size = MIN(readsize, COPY_CHUNK_SIZE);
> > +   // Short reads are legal, so don't expect to read
> > +   // everything that was requested.
> > +   lastread = eread(tarfd, chunk, chunk_size);
> > +   if (lastread == 0) {
> > +   close(fd);
> > +   remove(fname);
> 
> Do all the tar tools remove the file in this case? It might be better to not
> remove it.

You are right, this code should probably be removed if we use a
ReadAll-like function.


> > +   eprintf("unexpected end of file reading %s.\n",
> > +   fname);
> > +   }
> > +
> > +   ewrite(fd, chunk, MIN(l, lastread));
> > +   }
> > close(fd);
> > }
> >  
> > @@ -331,7 +356,7 @@ unarchive(char *fname, ssize_t l, char b[BLKSIZ])
> > times[0].tv_sec = times[1].tv_sec = mtime;
> > times[0].tv_nsec = times[1].tv_nsec = 0;
> > if (!mflag && utimensat(AT_FDCWD, fname, times, AT_SYMLINK_NOFOLLOW) < 
> > 0)
> > -   weprintf("utimensat %s:\n", fname);
> > +   weprintf("utimensat %s %d:\n", fname, errno);
> > if (h->type == SYMLINK) {
> > if (!getuid() && lchown(fname, uid, gid))
> > weprintf("lchown %s:\n", fname);
> > @@ -349,10 +374,23 @@ static void
> >  skipblk(ssize_t l)
> >  {
> > char b[BLKSIZ];
> > -
> > -   for (; l > 0; l -= BLKSIZ)
> > -   if (!eread(tarfd, b, BLKSIZ))
> > -   break;
> > +   int lastread = 0;
> > +   // Ceiling to BLKSIZ boundary
> > +   int ceilsize = (l + (BLKSIZ-1)) & ~(BLKSIZ-1);
> > +
> > +   off_t offset = lseek(tarfd, ceilsize, SEEK_CUR);
> > +   if (offset >= ceilsize)
> > +   return;
> > +   if (errno != ESPIPE)
> > +   eprintf("unexpected end of file.\n");
> > +
> > +   // This is a pipe, socke

[hackers] [slstatus] slstatus load_avg format string || Kamil Cholewiński

2017-09-10 Thread git
commit b7d7ce9c5f9ea360955325526e7f8010ae5b1347
Author: Kamil Cholewiński 
AuthorDate: Fri Sep 8 23:08:28 2017 +0200
Commit: Aaron Marcher 
CommitDate: Sun Sep 10 14:08:20 2017 +0200

slstatus load_avg format string

diff --git a/config.def.h b/config.def.h
index f4a6bed..0e28d1e 100644
--- a/config.def.h
+++ b/config.def.h
@@ -29,7 +29,7 @@ static const char unknown_str[] = "n/a";
  * ipv6 IPv6 addressinterface name
  * kernel_release   `uname -r`  NULL
  * keyboard_indicators  caps/num lock indicatorsNULL
- * load_avg load averageNULL
+ * load_avg load averageformat string
  * num_filesnumber of files in a directory  path
  * ram_free free memory in GB   NULL
  * ram_perc memory usage in percent NULL
diff --git a/slstatus.c b/slstatus.c
index d0d1767..e69423b 100644
--- a/slstatus.c
+++ b/slstatus.c
@@ -52,7 +52,7 @@ static const char *ipv4(const char *iface);
 static const char *ipv6(const char *iface);
 static const char *kernel_release(void);
 static const char *keyboard_indicators(void);
-static const char *load_avg(void);
+static const char *load_avg(const char *fmt);
 static const char *num_files(const char *dir);
 static const char *ram_free(void);
 static const char *ram_perc(void);
@@ -394,7 +394,7 @@ keyboard_indicators(void)
 }
 
 static const char *
-load_avg(void)
+load_avg(const char *fmt)
 {
double avgs[3];
 
@@ -403,7 +403,7 @@ load_avg(void)
return unknown_str;
}
 
-   return bprintf("%.2f %.2f %.2f", avgs[0], avgs[1], avgs[2]);
+   return bprintf(fmt, avgs[0], avgs[1], avgs[2]);
 }
 
 static const char *



[hackers] [PATCH] dd: Use sigaction(2) to obviate select(2)

2017-09-10 Thread Eric Pruitt
By setting the SIGINT handler with sigaction(2), automatic retries of
the splice(2) syscall can be disabled by not setting SA_RESTART. This
makes it possible to use Ctrl+C even if the "if" operand refers to the
controlling terminal. The SIGINT message has also been moved outside the
signal handler since fprintf(3) is not an async-signal-safe function.
---
 dd.c | 61 -
 1 file changed, 32 insertions(+), 29 deletions(-)

diff --git dd.c dd.c
index cc05d40..2dce98e 100644
--- dd.c
+++ dd.c
@@ -8,7 +8,6 @@
  */
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -38,6 +37,15 @@ struct dd_config {
 
 static int sigint = 0;
 
+static void
+sig_int(int unused_1, siginfo_t *unused_2, void *unused_3)
+{
+   (void) unused_1;
+   (void) unused_2;
+   (void) unused_3;
+   sigint = 1;
+}
+
 static int
 prepare_copy(struct dd_config *ddc, int *ifd, int *ofd)
 {
@@ -147,7 +155,6 @@ copy_splice(struct dd_config *ddc)
int ifd, ofd, p[2] = {-1, -1};
ssize_t r = 0;
size_t n = 0;
-   fd_set rfd, wfd;
 
if (prepare_copy(ddc, &ifd, &ofd) < 0)
return -1;
@@ -165,27 +172,24 @@ copy_splice(struct dd_config *ddc)
 #endif
n = ddc->bs;
for (;ddc->b_out != ddc->count && !sigint;) {
-   FD_ZERO(&rfd);
-   FD_ZERO(&wfd);
-   FD_SET(ifd, &rfd);
-   FD_SET(ofd, &wfd);
-   r = select(ifd > ofd ? ifd + 1 : ofd + 1, &rfd, &wfd, NULL, 
NULL);
if (r < 0)
break;
-   if (FD_ISSET(ifd, &rfd) == 1 && FD_ISSET(ofd, &wfd) == 1) {
-   if (n > ddc->count - ddc->b_out)
-   n = ddc->count - ddc->b_out;
-   r = splice(ifd, NULL, p[1], NULL, n, SPLICE_F_MORE);
-   if (r <= 0)
-   break;
-   ++ddc->rec_in;
-   r = splice(p[0], NULL, ofd, NULL, r, SPLICE_F_MORE);
-   if (r <= 0)
-   break;
-   ddc->b_out += r;
-   ++ddc->rec_out;
-   }
+   if (n > ddc->count - ddc->b_out)
+   n = ddc->count - ddc->b_out;
+   r = splice(ifd, NULL, p[1], NULL, n, SPLICE_F_MORE);
+   if (r <= 0)
+   break;
+   ++ddc->rec_in;
+   r = splice(p[0], NULL, ofd, NULL, r, SPLICE_F_MORE);
+   if (r <= 0)
+   break;
+   ddc->b_out += r;
+   ++ddc->rec_out;
}
+
+   if (sigint)
+   fprintf(stderr, "SIGINT! Aborting ...\n");
+
close(ifd);
close(ofd);
close(p[0]);
@@ -227,14 +231,6 @@ print_stat(const struct dd_config *ddc)
 }
 
 static void
-sig_int(int unused)
-{
-   (void) unused;
-   fprintf(stderr, "SIGINT! Aborting ...\n");
-   sigint = 1;
-}
-
-static void
 usage(void)
 {
eprintf("usage: %s [-h] [if=infile] [of=outfile] [bs[=N]] [seek=N] "
@@ -248,6 +244,7 @@ main(int argc, char *argv[])
int i = 0;
char buf[1024];
struct dd_config config;
+   struct sigaction sa;
 
argv0 = argv[0];
memset(&config, 0, sizeof(config));
@@ -286,7 +283,13 @@ main(int argc, char *argv[])
}
 
signal(SIGPIPE, SIG_IGN);
-   signal(SIGINT, sig_int);
+
+   sa.sa_flags = SA_SIGINFO;
+   sigemptyset(&sa.sa_mask);
+   sa.sa_sigaction = sig_int;
+
+   if (sigaction(SIGINT, &sa, NULL) == -1)
+   weprintf("sigaction");
 
if (copy(&config) < 0)
weprintf("copy:");
-- 
2.11.0





Re: [hackers] [PATCH] [ubase] dd: Use sigaction(2) to obviate select(2)

2017-09-10 Thread Eric Pruitt
On Sun, Sep 10, 2017 at 11:55:53AM +0100, Richard Ipsum wrote:
> This isn't a comment on this patch exactly, since this call exists in
> the existing handler, but unless I'm missing something about the way
> you're doing this it isn't safe to call fprintf in a signal handler
> since your signal may be interrupting an fprintf call itself, and
> since the data structures for stdio aren't reentrant, things will get
> messy.

I personally don't think there's any benefit in displaying the message
at all. I'll post a revised patch since moving the warning is trivial,
and it'll give me a chance to fix my botched commit message.

Eric



Re: [hackers] [PATCH] [ubase] dd: Use sigaction(2) to obviate select(2)

2017-09-10 Thread Richard Ipsum
On Sat, Sep 09, 2017 at 11:06:15PM -0700, Eric Pruitt wrote:
> By setting the SIGINT handler using sigaction(2), automatic retries of
> the splice(2) call can be disabled by not setting SA_RESTART. This makes
> it possible to use Ctrl+C even if standard input is a terminal.
> ---
[snip]
> +static void
> +sig_int(int unused_1, siginfo_t *unused_2, void *unused_3)
> +{
> + (void) unused_1;
> + (void) unused_2;
> + (void) unused_3;
> + fprintf(stderr, "SIGINT! Aborting ...\n");

This isn't a comment on this patch exactly, since this call
exists in the existing handler, but unless I'm missing something about
the way you're doing this it isn't safe to call fprintf in a signal
handler since your signal may be interrupting an fprintf call itself,
and since the data structures for stdio aren't reentrant, things will
get messy.

Richard