Re: uniq -c outputs SPACES NUMBER TAB LINE

2009-03-05 Thread Leif LeBaron
> Would you care to submit a documentation patch?

... 


I have much respect and appreciation for rms et al., whose work has
contributed to software/information freedom; however, I am increasingly
dismayed with GNU software, and increasingly diverted to other tools.

It seems useful to promote the potential of potent simplicity.

http://harmful.cat-v.org/cat-v/unix_prog_design.pdf



___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils


error in join --help

2009-03-05 Thread Toth Alexandru

Hi,

help for join command states:

  -a FILENUM        print unpairable lines coming from file FILENUM, where
                    FILENUM is 1 or 2, corresponding to FILE1 or FILE2


Maybe it should be:

  -a FILENUM        print joined lines
                    and unpairable lines coming from file FILENUM, where
                    FILENUM is 1 or 2, corresponding to FILE1 or FILE2

For example, option -v is correctly documented.
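
To make the difference concrete, here is a small sketch with two made-up
input files (the names f1 and f2 and their contents are only illustrative).
It compares plain join, join -a 1 and join -v 1 on the same sorted input:

```shell
# Hypothetical sample data; file names and contents are illustrative only.
tmp=$(mktemp -d)
printf '1 a\n2 b\n3 c\n' > "$tmp/f1"
printf '2 x\n3 y\n4 z\n' > "$tmp/f2"

# Default: only the joined (paired) lines.
plain=$(join "$tmp/f1" "$tmp/f2")
# -a 1: the joined lines plus the unpairable lines from file 1.
with_a=$(join -a 1 "$tmp/f1" "$tmp/f2")
# -v 1: only the unpairable lines from file 1.
with_v=$(join -v 1 "$tmp/f1" "$tmp/f2")

printf 'join:\n%s\n\njoin -a 1:\n%s\n\njoin -v 1:\n%s\n' \
  "$plain" "$with_a" "$with_v"
rm -rf "$tmp"
```

With this input, -a 1 prints the line "1 a" in addition to the two joined
lines, while -v 1 prints "1 a" alone.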

-Alex


  




Incorrect return code from test -s.

2009-03-05 Thread Marc POIRIER



Hello,

# ls
# NOM.htm - in lowercase
# [ -s NOM.HTM ]
# echo $?
# 0  - should be 1


Regards,

M.P.


  




Re: Incorrect return code from test -s.

2009-03-05 Thread Philip Rowlands

On Thu, 5 Mar 2009, Marc POIRIER wrote:


> # ls
> # NOM.htm - in lowercase
> # [ -s NOM.HTM ]
> # echo $?
> # 0  - should be 1


What is the output of the following command (on Linux)?

# strace -e trace=file [ -s NOM.HTM ]
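
For reference, a minimal sketch of what [ -s FILE ] is expected to report:
success only when FILE exists and has a size greater than zero. The
temporary directory and the empty.htm and absent.htm names are made up for
illustration. The result for NOM.HTM is printed but deliberately not
asserted, since it depends on whether the file system treats names as
case-sensitive:

```shell
# Illustrative setup: one non-empty file, one empty file, one missing name.
tmp=$(mktemp -d)
printf 'data\n' > "$tmp/NOM.htm"
: > "$tmp/empty.htm"

# Record each exit status without aborting when the test fails.
rc_nonempty=0; [ -s "$tmp/NOM.htm" ]    || rc_nonempty=$?  # exists, size > 0
rc_empty=0;    [ -s "$tmp/empty.htm" ]  || rc_empty=$?     # exists, size 0
rc_absent=0;   [ -s "$tmp/absent.htm" ] || rc_absent=$?    # does not exist
# Same name in upper case: the outcome depends on the file system.
rc_upper=0;    [ -s "$tmp/NOM.HTM" ]    || rc_upper=$?

echo "non-empty=$rc_nonempty empty=$rc_empty absent=$rc_absent NOM.HTM=$rc_upper"
rm -rf "$tmp"
```

If the last value is 0, the name lookup itself is matching NOM.HTM to
NOM.htm, which would point at the file system rather than at test.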


Cheers,
Phil




Degraded performance in cat + patch

2009-03-05 Thread Tzvi Rotshtein
Hi,
I've been using cat to feed large files into some data cruncher
application using something like this:
   cat my_data | data_cruncher

However, cat was reading/writing the file at sub-optimal speeds (not even
half as fast as the disk and OS can provide). I traced this to the buffer
size selection algorithm in cat: while it generally provides a good balance
with a low memory footprint, it prevents cat from reaching the peak
performance of the disk (or of the OS cache).

While it is usually not crucial for most applications to have cat
operating at peak performance, I thought it would be useful to let the user
decide.
I have made the following changes to allow the user to provide and override
the buffer sizes in cat, effectively improving performance. Here's a quick
benchmark (it's a typical result after multiple runs on Linux x86 with
kernel 2.6.18):


$ time ./cat test_sample_150mb_file.txt > /dev/null
0.00user 0.54system 0:00.59elapsed 90%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+186minor)pagefaults 0swaps

$ time ./cat -r 1048576 test_sample_150mb_file.txt > /dev/null
0.00user 0.09system 0:00.12elapsed 73%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+444minor)pagefaults 0swaps


The ability to specify an explicit (and larger) buffer size improved
performance by a factor of about 5 on my test system (0.59s vs. 0.12s
elapsed), which is quite a noticeable gain, especially when dealing with
files at least 50GB in size.
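
A similar experiment can be tried without patching cat. The sketch below
uses dd, a different tool whose bs= operand sets the block size explicitly;
it is not the mechanism of the patch, and the 8 MiB sample file is only a
stand-in for the test file above. It checks that the block size changes how
the data is transferred but not the data itself. Prefix each dd line with
time to compare speeds; no timings are claimed here:

```shell
# Illustrative only: a small sample file stands in for the 150 MB test file.
tmp=$(mktemp -d)
sample="$tmp/sample.bin"
dd if=/dev/zero of="$sample" bs=1048576 count=8 2>/dev/null

# Copy the same data once with a small and once with a large block size.
dd if="$sample" of="$tmp/small.out" bs=4096 2>/dev/null
dd if="$sample" of="$tmp/large.out" bs=1048576 2>/dev/null

# The block size affects the number of read/write calls, not the content.
if cmp -s "$tmp/small.out" "$tmp/large.out"; then same=yes; else same=no; fi
size=$(wc -c < "$tmp/large.out" | tr -d ' ')
echo "identical=$same bytes=$size"
rm -rf "$tmp"
```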

Let me know what you think of it. The patch I used is available below.

-- Tzvi



*** coreutils-7.1/src/cat.c 2008-12-21 08:13:31.0 -0600
--- coreutils-7.1-new/src/cat.c 2009-03-05 22:52:47.0 -0600
***************
*** 40,45 ****
--- 40,46 ----
  #include "quote.h"
  #include "safe-read.h"
  #include "xfreopen.h"
+ #include "xstrtol.h"

  /* The official name of this program (e.g., no `g' prefix).  */
  #define PROGRAM_NAME "cat"
***************
*** 102,107 ****
--- 103,109 ----
    -e                       equivalent to -vE\n\
    -E, --show-ends          display $ at end of each line\n\
    -n, --number             number all output lines\n\
+   -r, --read-buffer-size   enforce a read buffer size (in bytes)\n\
    -s, --squeeze-blank      suppress repeated empty output lines\n\
  "), stdout);
        fputs (_("\
***************
*** 109,114 ****
--- 111,117 ----
    -T, --show-tabs          display TAB characters as ^I\n\
    -u                       (ignored)\n\
    -v, --show-nonprinting   use ^ and M- notation, except for LFD and TAB\n\
+   -w, --write-buffer-size  enforce a write buffer size (in bytes)\n\
  "), stdout);
        fputs (HELP_OPTION_DESCRIPTION, stdout);
        fputs (VERSION_OPTION_DESCRIPTION, stdout);
***************
*** 546,551 ****
--- 549,556 ----
    bool show_ends = false;
    bool show_nonprinting = false;
    bool show_tabs = false;
+   size_t read_buffer_size = 0;
+   size_t write_buffer_size = 0;
    int file_open_mode = O_RDONLY;

    static struct option const long_options[] =
***************
*** 557,562 ****
--- 562,569 ----
      {"show-ends", no_argument, NULL, 'E'},
      {"show-tabs", no_argument, NULL, 'T'},
      {"show-all", no_argument, NULL, 'A'},
+     {"read-buffer-size", required_argument, NULL, 'r'},
+     {"write-buffer-size", required_argument, NULL, 'w'},
      {GETOPT_HELP_OPTION_DECL},
      {GETOPT_VERSION_OPTION_DECL},
      {NULL, 0, NULL, 0}
***************
*** 576,582 ****

    /* Parse command line options.  */

!   while ((c = getopt_long (argc, argv, "benstuvAET", long_options, NULL))
           != -1)
      {
        switch (c)
--- 583,589 ----

    /* Parse command line options.  */

!   while ((c = getopt_long (argc, argv, "benr:stuvw:AET", long_options, NULL))
           != -1)
      {
        switch (c)
***************
*** 595,600 ****
--- 602,616 ----
            number = true;
            break;

+         case 'r':
+           {
+             long int buffer_size;
+             if ( xstrtol (optarg, NULL, 10, &buffer_size, "") != LONGINT_OK )
+               error (EXIT_FAILURE, 0, _("%s: invalid read buffer size"), optarg);
+             read_buffer_size = buffer_size;
+           }
+           break;
+
          case 's':
            squeeze_blank = true;
            break;
***************
*** 612,617 ****
--- 628,642 ----
            show_nonprinting = true;
            break;

+         case 'w':
+           {
+             long int buffer_size;
+             if ( xstrtol (optarg, NULL, 10, &buffer_size, "") != LONGINT_OK )
+               error (EXIT_FAILURE, 0, _("%s: invalid write buffer size"), optarg);
+             write_buffer_size = buffer_size;
+           }
+           break;
+
          case 'A':
            show_nonprinting = true;
            show_ends = true;
***************
*** 641,646 ****
--- 666,674 ----
      error (EXIT_FAILURE, errno, _("standard output"));

    outsize = ST_BLKSIZE (stat_buf);
+   if ( write_buffer_size > 0 )
+     outsize = write_buffer_size;
+
    /* Input file can be output file for non-regular files.
       fstat on pipes returns S_IFSOCK on some systems, S_IFIFO
       on others, so the checking should not be done for those types,
***************
*** 705,710 ****
--- 733,740 ----