Re: [OctDev] textread: comment out lines starting with #

Jaroslav Hajek Mon, 19 Oct 2009 03:39:24 -0700

On Mon, Oct 19, 2009 at 10:00 AM, Søren Hauberg <so...@hauberg.org> wrote:
> On Sun, 18 Oct 2009 at 14:19 +0200, Søren Hauberg wrote:
>> I'm attaching the code for comments. It should be noted that Matlab has
>> a 'strread' function that does the same thing as 'textread' except it
>> works on strings instead of files. So, I changed the code to behave like
>> 'strread' and created a simple wrapper around this for 'textread'.
>>
>> Should I replace the current version with this one?
>
> Attached is a slightly smarter approach.
>
> Søren
>


Attached is a version that gets rid of even the last loop.
However, due to a bug in cellslices, it requires the following patch
to work correctly:
http://hg.savannah.gnu.org/hgweb/octave/rev/78ac37d73557

So it's up to you whether this is OK to include.

regards

-- 
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz

## Copyright (C) 2009 Eric Chassande-Mottin, CNRS (France)
##
## This program is free software; you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 3 of the License, or
## (at your option) any later version.
##
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with this program; if not, see
## <http://www.gnu.org/licenses/>.

## -*- texinfo -*-
## @deftypefn  {Function File} {[@var{a}, @var{b}, @dots{}] =} strread (@var{str}, @var{format})
## @deftypefnx {Function File} {[@var{a}, @var{b}, @dots{}] =} strread (@var{str}, @var{format}, @var{prop}, @var{value})
## Read data from a string.
## The string @var{format} describes the columns of @var{str} and
## may contain the following specifiers:
## @table @code
## @item %s
## for a string,
##
## @item %d, %f
## for a number (integer or floating point), returned as a double, and
##
## @item %*
## to ignore a column.
## @end table
##
## For example, the string
##
## @example
## @group
## @var{str} = "\
## Bunny Bugs   5.5\n\
## Duck Daffy  -7.5e-5\n\
## Penguin Tux   6"
## @end group
## @end example
##
## can be read using
##
## @example
## [a, b, c] = strread (@var{str}, "%s %s %f");
## @end example
##
## Currently implemented @var{prop} arguments are:
## @itemize
## @item "headerlines":
## @var{value} represents the number of header lines to skip.
## @item "commentstyle":
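## @var{value} is the comment style and can be one of
## @itemize
## @item "shell": comments start with # and end at the end of the line
## @item "c": comments start with /* and end with */
## @item "c++": comments start with // and end at the end of the line
## @item "matlab": comments start with % and end at the end of the line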
## @end itemize
## @end itemize
##
## @seealso{textread, load, dlmread, fscanf}
## @end deftypefn

function varargout = strread (str, formatstr = "%f", varargin)
  ## Check input
  if (nargin < 1)
    print_usage ();
  endif
  
  if (!ischar (str) || !ischar (formatstr))
    error ("strread: first and second input arguments must be strings");
  endif

  ## Parse options
  comment_flag = false;
  header_skip = 0;
  numeric_fill_value = 0; # XXX: the user cannot set this
  white_spaces = " \n\r\t"; # XXX: should the user be able to set these?
  for n = 1:2:length (varargin)
    switch (varargin {n})
      case "commentstyle"
        comment_flag = true;
        switch (varargin {n+1})
          case "c"
            comment_specif = {"/*", "*/"};
          case "c++"
            comment_specif = {"//", "\n"};
          case "shell"
            comment_specif = {"#", "\n"};
          case "matlab"
            comment_specif = {"%", "\n"};
          otherwise
            comment_flag = false;
            warning ("strread: unknown comment style '%s'", varargin {n+1});
        endswitch
      case "headerlines"
        header_skip = varargin {n+1};
      otherwise
        warning ("strread: unknown option '%s'", varargin {n});
    endswitch
  endfor

  ## Parse format string
  idx = strfind (formatstr, "%")';
  specif = formatstr ([idx, idx+1]);
  nspecif = length (idx);
  idx_star = strfind (formatstr, "%*");
  nfields = length (idx) - length (idx_star);

  if (nargout != nfields)
    error ("strread: the number of output variables must match that of format specifiers");
  endif

  ## Remove header
  if (header_skip > 0)
    e = find (str == "\n", header_skip);
    if (length (e) >= header_skip)
      str = str (e (end)+1:end);
    else
      ## We don't have enough data so we discard it all
      str = "";
    endif
  endif

  ## Remove comments (XXX: can this be done in a smarter way?)
  if (comment_flag)
    cstart = strfind (str, comment_specif{1});
    cstop  = strfind (str, comment_specif{2});
    if (length (cstart) > 0)
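      ## lookup (cstop, cstart) gives, for each opener, the number of
      ## closers that precede it.  Openers sharing a value lie inside the
      ## same comment, so keep only the first of each group (this ignores
      ## nested openers).
      [idx, cidx] = unique (lookup (cstop, cstart), "first");
      if (idx(end) == length (cstop))
        cidx(end) = []; ## Drop the last opener if no closer follows it.
      endif
      cstart = cstart(cidx);
    endif
    if (length (cstop) > 0)
      ## Likewise, keep only the first closer after each remaining opener
      ## (this ignores nested closers).
      [idx, cidx] = unique (lookup (cstart, cstop), "first");
      if (idx(1) == 0)
        cidx(1) = []; ## Drop the first closer if no opener precedes it.
      endif
      cstop = cstop(cidx);
    endif
    ## Keep only the text outside the comments: str(1:cstart(1)-1),
    ## str(cstop(1)+c2len:cstart(2)-1), ..., str(cstop(end)+c2len:end).
    len = length (str);
    c2len = length (comment_specif{2});
    str = cellslices (str, [1, cstop + c2len], [cstart - 1, len]);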
    str = [str{:}];
  endif
  
  ## Split 'str' into words
  words = split_by (str, white_spaces);
  num_words = numel (words);
  num_lines = ceil (num_words / nspecif);
  
  ## For each specifier
  k = 1;
  for m = 1:nspecif
    data = words (m:nspecif:end);

    ## Map to format
    switch specif (m, :)
      case "%s"
        data (end+1:num_lines) = {""};
        varargout {k} = data';
        k++;
      case {"%d", "%f"}
        data = str2double (data);
        data (end+1:num_lines) = numeric_fill_value;
        varargout {k} = data.';
        k++;
      case "%*"
        ## do nothing
    endswitch
  endfor
endfunction

function out = split_by (text, sep)
  out = strtrim (strsplit (text, sep, true));
endfunction

%!test
%! str = "# comment\n# comment\n1 2 3";
%! [a, b] = strread (str, '%d %s', 'commentstyle', 'shell');
%! assert (a, [1; 3]);
%! assert (b, {"2"; ""});
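
## Hypothetical extra tests (a sketch, not part of the posted file):
## exercise the "matlab" and "c++" comment styles.  The expected values
## come from tracing the code above by hand and assume cellslices
## includes the fix mentioned in the mail.
%!test
%! str = sprintf ("%% header\n1 2\n3 4");
%! [a, b] = strread (str, "%f %f", "commentstyle", "matlab");
%! assert (a, [1; 3]);
%! assert (b, [2; 4]);

%!test
%! str = sprintf ("1 // one\n2 // two\n");
%! a = strread (str, "%f", "commentstyle", "c++");
%! assert (a, [1; 2]);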

%!test
%! str = '';
%! a = rand (10, 1);
%! b = char (round (65 + 20 * rand (10, 1)));
%! for k = 1:10
%!   str = sprintf ('%s %.6f %s\n', str, a (k), b (k));
%! endfor
%! [aa, bb] = strread (str, '%f %s');
%! assert (a, aa, 1e-5);
%! assert (cellstr (b), bb);
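
## Hypothetical extra test (a sketch, not part of the posted file): when
## the number of words is not a multiple of the number of specifiers, the
## short columns are padded, strings with "" and numbers with
## numeric_fill_value (0 in the code above).
%!test
%! [a, b] = strread ("a 1 b", "%s %f");
%! assert (a, {"a"; "b"});
%! assert (b, [1; 0]);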

%!test
%! str = '';
%! a = rand (10, 1);
%! b = char (round (65 + 20 * rand (10, 1)));
%! for k = 1:10
%!   str = sprintf ('%s %.6f %s\n', str, a (k), b (k));
%! endfor
%! aa = strread (str, '%f %*s');
%! assert (a, aa, 1e-5);

%!test
%! str = sprintf ('/* this is\nacomment*/ 1 2 3');
%! a = strread (str, '%f', 'commentstyle', 'c');
%! assert (a, [1; 2; 3]);
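
## Hypothetical extra tests (a sketch, not part of the posted file):
## "headerlines" skips whole lines before parsing, and requesting a
## different number of outputs than there are non-skipped specifiers
## raises an error.
%!test
%! str = sprintf ("name value\n----------\nfoo 1\nbar 2");
%! [a, b] = strread (str, "%s %f", "headerlines", 2);
%! assert (a, {"foo"; "bar"});
%! assert (b, [1; 2]);

%!error [a, b] = strread ("1 2", "%f");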

_______________________________________________
Octave-dev mailing list
Octave-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/octave-dev
