Re: [Rd] error handling in strcapture

2016-10-04 Thread Michael Lawrence
Once again, nice catch. I've committed a check for this.

Michael

On Tue, Oct 4, 2016 at 2:37 PM, William Dunlap wrote:
> It is also not catching the cases where the number of capture expressions
> does not match the number of entries in proto.  I think all of the following
> should give an error about the mismatch.
>
>> strcapture("(.)(.)", c("ab", "cde", "fgh", "ij", "lm"),
>> proto=list(A="",B="",C=""))
>    A  B  C
> 1  a  b cd
> 2  d fg  f
> 3 ij  i  j
> 4  l  m ab
> Warning message:
> In matrix(as.character(unlist(str)), ncol = ntokens, byrow = TRUE) :
>   data length [15] is not a sub-multiple or multiple of the number of rows
> [4]
>> strcapture("(.)(.)(.)", c("abc", "def", "ghi", "jkl", "mno"),
>> proto=list(A="",B=""))
>     A   B
> 1   a   b
> 2 def   d
> 3   f ghi
> 4   h   i
> 5   j   k
> 6 mno   m
> 7   o abc
> Warning message:
> In matrix(as.character(unlist(str)), ncol = ntokens, byrow = TRUE) :
>   data length [20] is not a sub-multiple or multiple of the number of rows
> [7]
>> strcapture("(.)(.)(.)", c("abc", "def"), proto=list(A=""))
>   A
> 1 a
> 2 c
> 3 d
> 4 f
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Tue, Oct 4, 2016 at 2:21 PM, Michael Lawrence 
> wrote:
>>
>> Hi Bill,
>>
>> This is a bug in regexec() and I will commit a fix.
>>
>> Thanks for the report,
>> Michael
>>
>> On Tue, Oct 4, 2016 at 1:40 PM, William Dunlap  wrote:
>> > I noticed a problem in the strcapture from R-devel (2016-09-27 r71386),
>> > when
>> > the text contains a missing value and perl=TRUE.
>> >
>> > {
>> >   # NA in text input should map to row of NA's in output, without
>> > warning
>> >   r9p <- strcapture(perl = TRUE, "(.).* ([[:digit:]]+)", c("One 1",
>> > NA,
>> > "Fifty 50"), data.frame(Initial=factor(), Number=numeric()))
>> >   e9p <- structure(list(Initial = structure(c(2L, NA, 1L), .Label =
>> > c("F", "O"), class = "factor"),
>> >Number = c(1, NA, 50)),
>> >   row.names = c(NA, -3L),
>> >   class = "data.frame")
>> >   all.equal(e9p, r9p)
>> >   }
>> > #Error in if (any(ind)) { : missing value where TRUE/FALSE needed
>> >
>> >
>> > Bill Dunlap
>> > TIBCO Software
>> > wdunlap tibco.com
>> >
>> > On Wed, Sep 21, 2016 at 2:32 PM, Michael Lawrence
>> >  wrote:
>> >>
>> >> The new behavior is that it yields NAs when the pattern does not match
>> >> (like strptime) and for empty captures in a matching pattern it yields
>> >> the empty string, which is consistent with regmatches().
>> >>
>> >> Michael
>> >>
>> >> On Wed, Sep 21, 2016 at 2:21 PM, William Dunlap 
>> >> wrote:
>> >> > If there are any matches then strcapture can see if the pattern has
>> >> > the
>> >> > same
>> >> > number of capture expressions as the prototype has columns and give
>> >> > an
>> >> > error if not.  That seems appropriate.
>> >> >
>> >> > If there are no matches, then there is no easy way to see if the
>> >> > prototype
>> >> > is compatible with the pattern, so should strcapture just assume the
>> >> > best
>> >> > and fill in the prototype with NA's?
>> >> >
>> >> > Should there be warnings?  This is kind of like strptime(), which
>> >> > silently
>> >> > gives NA's when the format does not match the text input.
>> >> >
>> >> >
>> >> > Bill Dunlap
>> >> > TIBCO Software
>> >> > wdunlap tibco.com
>> >> >
>> >> > On Wed, Sep 21, 2016 at 2:10 PM, Michael Lawrence
>> >> >  wrote:
>> >> >>
>> >> >> Hi Bill,
>> >> >>
>> >> >> Thanks, another good suggestion. strcapture() now returns NAs for
>> >> >> non-matches. It's nice to have someone kicking the tires on that
>> >> >> function.
>> >> >>
>> >> >> Michael
>> >> >>
>> >> >> On Wed, Sep 21, 2016 at 12:11 PM, William Dunlap via R-devel
>> >> >>  wrote:
>> >> >> > Michael, thanks for looking at my first issue with
>> >> >> > utils::strcapture.
>> >> >> >
>> >> >> > Another issue is how it deals with lines that don't match the
>> >> >> > pattern.
>> >> >> > Currently it gives an error
>> >> >> >
>> >> >> >> strcapture("(.+) (.+)", c("One 1", "noSpaceInLine", "Three 3"),
>> >> >> > proto=list(Name="", Number=0))
>> >> >> > Error in strcapture("(.+) (.+)", c("One 1", "noSpaceInLine",
>> >> >> > "Three
>> >> >> > 3"),
>> >> >> > :
>> >> >> >   number of matches does not always match ncol(proto)
>> >> >> >
>> >> >> > First, isn't the 'number of matches' the number of parenthesized
>> >> >> > subpatterns in the regular expression?  I thought that if the
>> >> >> > entire
>> >> >> > pattern matches then the subpatterns without matches would be
>> >> >> > shown as matches at position 0 with length 0.  Hence either the
>> >> >> > pattern is compatible with the prototype or it isn't, it does not
>> >> >> > depend
>> >> >> > on the text input.  E.g.,
>> >> >> >
>> >> >> >> regexec("^(([[:alpha:]]+)|([[:digit:]]+))$", c("Twelve", "12",
>> >> >> >> "Z280"))
>> >> >> > [[1]]
>> >> >> > [1] 1 1 1 0
>> >> >> > attr(,"match.length")
>> >> >> > [1] 6 6 6 0
>> >> >> > attr(,"useBytes")
>> >> 

Re: [Rd] error handling in strcapture

2016-10-04 Thread William Dunlap via R-devel
It is also not catching the cases where the number of capture expressions
does not match the number of entries in proto.  I think all of the
following should give an error about the mismatch.

> strcapture("(.)(.)", c("ab", "cde", "fgh", "ij", "lm"),
proto=list(A="",B="",C=""))
   A  B  C
1  a  b cd
2  d fg  f
3 ij  i  j
4  l  m ab
Warning message:
In matrix(as.character(unlist(str)), ncol = ntokens, byrow = TRUE) :
  data length [15] is not a sub-multiple or multiple of the number of rows
[4]
> strcapture("(.)(.)(.)", c("abc", "def", "ghi", "jkl", "mno"),
proto=list(A="",B=""))
    A   B
1   a   b
2 def   d
3   f ghi
4   h   i
5   j   k
6 mno   m
7   o abc
Warning message:
In matrix(as.character(unlist(str)), ncol = ntokens, byrow = TRUE) :
  data length [20] is not a sub-multiple or multiple of the number of rows
[7]
> strcapture("(.)(.)(.)", c("abc", "def"), proto=list(A=""))
  A
1 a
2 c
3 d
4 f
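
One way to catch the mismatch could look like the sketch below. It assumes the check runs on the regexec() result before the matrix is built: each matching element holds one entry for the whole match plus one per capture expression, so its length minus one can be compared with the number of entries in proto. The helper name check_capture_count is hypothetical and is not part of utils.

```r
# Hypothetical helper, not part of utils: compare the number of capture
# expressions reported by regexec() with the number of entries in 'proto'.
check_capture_count <- function(pattern, x, proto, ...) {
  m <- regexec(pattern, x, ...)
  # A non-match is a single -1 and an NA input is a single NA; skip both.
  matched <- vapply(m, function(mi) {
    length(mi) > 0L && !is.na(mi[1L]) && mi[1L] != -1L
  }, logical(1))
  if (!any(matched)) return(invisible(TRUE))  # nothing to compare against
  # Each matching element is the whole match plus one entry per capture.
  ncaptures <- unique(lengths(m[matched])) - 1L
  if (any(ncaptures != length(proto)))
    stop("number of capture expressions does not match length(proto)")
  invisible(TRUE)
}

# Two captures, two entries in proto: passes silently.
check_capture_count("(.)(.)", c("ab", "cde"), list(A = "", B = ""))
# Two captures, three entries in proto: signals an error (caught here).
mismatch <- try(check_capture_count("(.)(.)", c("ab", "cde"),
                                    list(A = "", B = "", C = "")),
                silent = TRUE)
```

When nothing matches, the sketch returns without complaint, since the text gives no information about the number of captures in that case.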


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Oct 4, 2016 at 2:21 PM, Michael Lawrence wrote:

> Hi Bill,
>
> This is a bug in regexec() and I will commit a fix.
>
> Thanks for the report,
> Michael
>
> On Tue, Oct 4, 2016 at 1:40 PM, William Dunlap  wrote:
> > I noticed a problem in the strcapture from R-devel (2016-09-27 r71386),
> when
> > the text contains a missing value and perl=TRUE.
> >
> > {
> >   # NA in text input should map to row of NA's in output, without
> > warning
> >   r9p <- strcapture(perl = TRUE, "(.).* ([[:digit:]]+)", c("One 1",
> NA,
> > "Fifty 50"), data.frame(Initial=factor(), Number=numeric()))
> >   e9p <- structure(list(Initial = structure(c(2L, NA, 1L), .Label =
> > c("F", "O"), class = "factor"),
> >Number = c(1, NA, 50)),
> >   row.names = c(NA, -3L),
> >   class = "data.frame")
> >   all.equal(e9p, r9p)
> >   }
> > #Error in if (any(ind)) { : missing value where TRUE/FALSE needed
> >
> >
> > Bill Dunlap
> > TIBCO Software
> > wdunlap tibco.com
> >
> > On Wed, Sep 21, 2016 at 2:32 PM, Michael Lawrence
> >  wrote:
> >>
> >> The new behavior is that it yields NAs when the pattern does not match
> >> (like strptime) and for empty captures in a matching pattern it yields
> >> the empty string, which is consistent with regmatches().
> >>
> >> Michael
> >>
> >> On Wed, Sep 21, 2016 at 2:21 PM, William Dunlap 
> wrote:
> >> > If there are any matches then strcapture can see if the pattern has
> the
> >> > same
> >> > number of capture expressions as the prototype has columns and give an
> >> > error if not.  That seems appropriate.
> >> >
> >> > If there are no matches, then there is no easy way to see if the
> >> > prototype
> >> > is compatible with the pattern, so should strcapture just assume the
> >> > best
> >> > and fill in the prototype with NA's?
> >> >
> >> > Should there be warnings?  This is kind of like strptime(), which
> >> > silently
> >> > gives NA's when the format does not match the text input.
> >> >
> >> >
> >> > Bill Dunlap
> >> > TIBCO Software
> >> > wdunlap tibco.com
> >> >
> >> > On Wed, Sep 21, 2016 at 2:10 PM, Michael Lawrence
> >> >  wrote:
> >> >>
> >> >> Hi Bill,
> >> >>
> >> >> Thanks, another good suggestion. strcapture() now returns NAs for
> >> >> non-matches. It's nice to have someone kicking the tires on that
> >> >> function.
> >> >>
> >> >> Michael
> >> >>
> >> >> On Wed, Sep 21, 2016 at 12:11 PM, William Dunlap via R-devel
> >> >>  wrote:
> >> >> > Michael, thanks for looking at my first issue with
> utils::strcapture.
> >> >> >
> >> >> > Another issue is how it deals with lines that don't match the
> >> >> > pattern.
> >> >> > Currently it gives an error
> >> >> >
> >> >> >> strcapture("(.+) (.+)", c("One 1", "noSpaceInLine", "Three 3"),
> >> >> > proto=list(Name="", Number=0))
> >> >> > Error in strcapture("(.+) (.+)", c("One 1", "noSpaceInLine", "Three
> >> >> > 3"),
> >> >> > :
> >> >> >   number of matches does not always match ncol(proto)
> >> >> >
> >> >> > First, isn't the 'number of matches' the number of parenthesized
> >> >> > subpatterns in the regular expression?  I thought that if the
> entire
> >> >> > pattern matches then the subpatterns without matches would be
> >> >> > shown as matches at position 0 with length 0.  Hence either the
> >> >> > pattern is compatible with the prototype or it isn't, it does not
> >> >> > depend
> >> >> > on the text input.  E.g.,
> >> >> >
> >> >> >> regexec("^(([[:alpha:]]+)|([[:digit:]]+))$", c("Twelve", "12",
> >> >> >> "Z280"))
> >> >> > [[1]]
> >> >> > [1] 1 1 1 0
> >> >> > attr(,"match.length")
> >> >> > [1] 6 6 6 0
> >> >> > attr(,"useBytes")
> >> >> > [1] TRUE
> >> >> >
> >> >> > [[2]]
> >> >> > [1] 1 1 0 1
> >> >> > attr(,"match.length")
> >> >> > [1] 2 2 0 2
> >> >> > attr(,"useBytes")
> >> >> > [1] TRUE
> >> >> >
> >> >> > [[3]]
> >> >> > [1] -1
> >> >> > attr(,"match.length")
> >> >> > [1] -1
> >> >> > attr(,"useBytes")
> >> >> > [1] TRUE
> >> >> >
> >> >> > Second, an error message like 'some li

Re: [Rd] error handling in strcapture

2016-10-04 Thread Michael Lawrence
Hi Bill,

This is a bug in regexec() and I will commit a fix.

Thanks for the report,
Michael

On Tue, Oct 4, 2016 at 1:40 PM, William Dunlap wrote:
> I noticed a problem in the strcapture from R-devel (2016-09-27 r71386), when
> the text contains a missing value and perl=TRUE.
>
> {
>   # NA in text input should map to row of NA's in output, without
> warning
>   r9p <- strcapture(perl = TRUE, "(.).* ([[:digit:]]+)", c("One 1", NA,
> "Fifty 50"), data.frame(Initial=factor(), Number=numeric()))
>   e9p <- structure(list(Initial = structure(c(2L, NA, 1L), .Label =
> c("F", "O"), class = "factor"),
>Number = c(1, NA, 50)),
>   row.names = c(NA, -3L),
>   class = "data.frame")
>   all.equal(e9p, r9p)
>   }
> #Error in if (any(ind)) { : missing value where TRUE/FALSE needed
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Wed, Sep 21, 2016 at 2:32 PM, Michael Lawrence
>  wrote:
>>
>> The new behavior is that it yields NAs when the pattern does not match
>> (like strptime) and for empty captures in a matching pattern it yields
>> the empty string, which is consistent with regmatches().
>>
>> Michael
>>
>> On Wed, Sep 21, 2016 at 2:21 PM, William Dunlap  wrote:
>> > If there are any matches then strcapture can see if the pattern has the
>> > same
>> > number of capture expressions as the prototype has columns and give an
>> > error if not.  That seems appropriate.
>> >
>> > If there are no matches, then there is no easy way to see if the
>> > prototype
>> > is compatible with the pattern, so should strcapture just assume the
>> > best
>> > and fill in the prototype with NA's?
>> >
>> > Should there be warnings?  This is kind of like strptime(), which
>> > silently
>> > gives NA's when the format does not match the text input.
>> >
>> >
>> > Bill Dunlap
>> > TIBCO Software
>> > wdunlap tibco.com
>> >
>> > On Wed, Sep 21, 2016 at 2:10 PM, Michael Lawrence
>> >  wrote:
>> >>
>> >> Hi Bill,
>> >>
>> >> Thanks, another good suggestion. strcapture() now returns NAs for
>> >> non-matches. It's nice to have someone kicking the tires on that
>> >> function.
>> >>
>> >> Michael
>> >>
>> >> On Wed, Sep 21, 2016 at 12:11 PM, William Dunlap via R-devel
>> >>  wrote:
>> >> > Michael, thanks for looking at my first issue with utils::strcapture.
>> >> >
>> >> > Another issue is how it deals with lines that don't match the
>> >> > pattern.
>> >> > Currently it gives an error
>> >> >
>> >> >> strcapture("(.+) (.+)", c("One 1", "noSpaceInLine", "Three 3"),
>> >> > proto=list(Name="", Number=0))
>> >> > Error in strcapture("(.+) (.+)", c("One 1", "noSpaceInLine", "Three
>> >> > 3"),
>> >> > :
>> >> >   number of matches does not always match ncol(proto)
>> >> >
>> >> > First, isn't the 'number of matches' the number of parenthesized
>> >> > subpatterns in the regular expression?  I thought that if the entire
>> >> > pattern matches then the subpatterns without matches would be
>> >> > shown as matches at position 0 with length 0.  Hence either the
>> >> > pattern is compatible with the prototype or it isn't, it does not
>> >> > depend
>> >> > on the text input.  E.g.,
>> >> >
>> >> >> regexec("^(([[:alpha:]]+)|([[:digit:]]+))$", c("Twelve", "12",
>> >> >> "Z280"))
>> >> > [[1]]
>> >> > [1] 1 1 1 0
>> >> > attr(,"match.length")
>> >> > [1] 6 6 6 0
>> >> > attr(,"useBytes")
>> >> > [1] TRUE
>> >> >
>> >> > [[2]]
>> >> > [1] 1 1 0 1
>> >> > attr(,"match.length")
>> >> > [1] 2 2 0 2
>> >> > attr(,"useBytes")
>> >> > [1] TRUE
>> >> >
>> >> > [[3]]
>> >> > [1] -1
>> >> > attr(,"match.length")
>> >> > [1] -1
>> >> > attr(,"useBytes")
>> >> > [1] TRUE
>> >> >
>> >> > Second, an error message like 'some lines were bad' is not very
>> >> > helpful.
>> >> > Should it put NA's in all the columns of the current output row if
>> >> > the
>> >> > input line didn't match the pattern and perhaps warn the user that
>> >> > there
>> >> > were problems?  The user could then look for rows of NA's to see
>> >> > where
>> >> > the
>> >> > problems were.
>> >> >
>> >> > Bill Dunlap
>> >> > TIBCO Software
>> >> > wdunlap tibco.com
>> >> >
>> >> > __
>> >> > R-devel@r-project.org mailing list
>> >> > https://stat.ethz.ch/mailman/listinfo/r-devel
>> >
>> >
>
>



Re: [Rd] error handling in strcapture

2016-10-04 Thread William Dunlap via R-devel
I noticed a problem with strcapture() in R-devel (2016-09-27 r71386)
when the text contains a missing value and perl=TRUE.

{
  # NA in text input should map to row of NA's in output, without warning
  r9p <- strcapture(perl = TRUE, "(.).* ([[:digit:]]+)",
                    c("One 1", NA, "Fifty 50"),
                    data.frame(Initial=factor(), Number=numeric()))
  e9p <- structure(list(Initial = structure(c(2L, NA, 1L),
                                            .Label = c("F", "O"),
                                            class = "factor"),
                        Number = c(1, NA, 50)),
                   row.names = c(NA, -3L),
                   class = "data.frame")
  all.equal(e9p, r9p)
}
#Error in if (any(ind)) { : missing value where TRUE/FALSE needed
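
For a build that shows this error, one possible workaround is sketched below. It is not the fix that went into R. It assumes utils::strcapture() is available (R 3.4.0 or later) and that proto is a data frame. The wrapper name strcapture_na_safe is hypothetical.

```r
# Hypothetical wrapper, not part of utils: keep NA inputs away from the
# matcher and put rows of NA's back in their original positions.
strcapture_na_safe <- function(pattern, x, proto, ...) {
  ok <- !is.na(x)
  res <- utils::strcapture(pattern, x[ok], proto, ...)
  # idx maps each input element to its row in 'res'; NA inputs stay NA,
  # and indexing a data frame with NA yields a row of NA's.
  idx <- rep(NA_integer_, length(x))
  idx[ok] <- seq_len(sum(ok))
  out <- res[idx, , drop = FALSE]
  rownames(out) <- NULL
  out
}

r <- strcapture_na_safe("(.).* ([[:digit:]]+)",
                        c("One 1", NA, "Fifty 50"),
                        data.frame(Initial = factor(), Number = numeric()),
                        perl = TRUE)
```

The second row of r is all NA and the other rows hold the captures, which is the shape the test above expects.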


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, Sep 21, 2016 at 2:32 PM, Michael Lawrence wrote:

> The new behavior is that it yields NAs when the pattern does not match
> (like strptime) and for empty captures in a matching pattern it yields
> the empty string, which is consistent with regmatches().
>
> Michael
>
> On Wed, Sep 21, 2016 at 2:21 PM, William Dunlap  wrote:
> > If there are any matches then strcapture can see if the pattern has the
> same
> > number of capture expressions as the prototype has columns and give an
> > error if not.  That seems appropriate.
> >
> > If there are no matches, then there is no easy way to see if the
> prototype
> > is compatible with the pattern, so should strcapture just assume the best
> > and fill in the prototype with NA's?
> >
> > Should there be warnings?  This is kind of like strptime(), which
> silently
> > gives NA's when the format does not match the text input.
> >
> >
> > Bill Dunlap
> > TIBCO Software
> > wdunlap tibco.com
> >
> > On Wed, Sep 21, 2016 at 2:10 PM, Michael Lawrence
> >  wrote:
> >>
> >> Hi Bill,
> >>
> >> Thanks, another good suggestion. strcapture() now returns NAs for
> >> non-matches. It's nice to have someone kicking the tires on that
> >> function.
> >>
> >> Michael
> >>
> >> On Wed, Sep 21, 2016 at 12:11 PM, William Dunlap via R-devel
> >>  wrote:
> >> > Michael, thanks for looking at my first issue with utils::strcapture.
> >> >
> >> > Another issue is how it deals with lines that don't match the pattern.
> >> > Currently it gives an error
> >> >
> >> >> strcapture("(.+) (.+)", c("One 1", "noSpaceInLine", "Three 3"),
> >> > proto=list(Name="", Number=0))
> >> > Error in strcapture("(.+) (.+)", c("One 1", "noSpaceInLine", "Three
> 3"),
> >> > :
> >> >   number of matches does not always match ncol(proto)
> >> >
> >> > First, isn't the 'number of matches' the number of parenthesized
> >> > subpatterns in the regular expression?  I thought that if the entire
> >> > pattern matches then the subpatterns without matches would be
> >> > shown as matches at position 0 with length 0.  Hence either the
> >> > pattern is compatible with the prototype or it isn't, it does not
> depend
> >> > on the text input.  E.g.,
> >> >
> >> >> regexec("^(([[:alpha:]]+)|([[:digit:]]+))$", c("Twelve", "12",
> "Z280"))
> >> > [[1]]
> >> > [1] 1 1 1 0
> >> > attr(,"match.length")
> >> > [1] 6 6 6 0
> >> > attr(,"useBytes")
> >> > [1] TRUE
> >> >
> >> > [[2]]
> >> > [1] 1 1 0 1
> >> > attr(,"match.length")
> >> > [1] 2 2 0 2
> >> > attr(,"useBytes")
> >> > [1] TRUE
> >> >
> >> > [[3]]
> >> > [1] -1
> >> > attr(,"match.length")
> >> > [1] -1
> >> > attr(,"useBytes")
> >> > [1] TRUE
> >> >
> >> > Second, an error message like 'some lines were bad' is not very
> helpful.
> >> > Should it put NA's in all the columns of the current output row if the
> >> > input line didn't match the pattern and perhaps warn the user that
> there
> >> > were problems?  The user could then look for rows of NA's to see where
> >> > the
> >> > problems were.
> >> >
> >> > Bill Dunlap
> >> > TIBCO Software
> >> > wdunlap tibco.com
> >> >
> >
> >
>



Re: [Rd] error handling in strcapture

2016-09-21 Thread Michael Lawrence
The new behavior is that it yields NAs when the pattern does not match
(like strptime) and for empty captures in a matching pattern it yields
the empty string, which is consistent with regmatches().
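
A short illustration of both cases, assuming a version of R that ships utils::strcapture() with this behavior (3.4.0 or later). The inputs are taken from the examples quoted below, but proto is written as a data frame here instead of the list used there.

```r
# Non-matching input: the second element has no space, so the pattern
# does not match and that row comes back as NA in every column.
x <- c("One 1", "noSpaceInLine", "Three 3")
proto <- data.frame(Name = character(), Number = numeric(),
                    stringsAsFactors = FALSE)
res <- utils::strcapture("(.+) (.+)", x, proto)

# Empty capture in a matching pattern: the alternative that did not take
# part in the match comes back from regmatches() as the empty string.
y <- c("Twelve", "12")
m <- regexec("^(([[:alpha:]]+)|([[:digit:]]+))$", y)
caps <- regmatches(y, m)
```

Here res has an all-NA second row, and caps[[1]] ends with "" because the digit alternative did not take part in the match for "Twelve".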

Michael

On Wed, Sep 21, 2016 at 2:21 PM, William Dunlap wrote:
> If there are any matches then strcapture can see if the pattern has the same
> number of capture expressions as the prototype has columns and give an
> error if not.  That seems appropriate.
>
> If there are no matches, then there is no easy way to see if the prototype
> is compatible with the pattern, so should strcapture just assume the best
> and fill in the prototype with NA's?
>
> Should there be warnings?  This is kind of like strptime(), which silently
> gives NA's when the format does not match the text input.
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Wed, Sep 21, 2016 at 2:10 PM, Michael Lawrence
>  wrote:
>>
>> Hi Bill,
>>
>> Thanks, another good suggestion. strcapture() now returns NAs for
>> non-matches. It's nice to have someone kicking the tires on that
>> function.
>>
>> Michael
>>
>> On Wed, Sep 21, 2016 at 12:11 PM, William Dunlap via R-devel
>>  wrote:
>> > Michael, thanks for looking at my first issue with utils::strcapture.
>> >
>> > Another issue is how it deals with lines that don't match the pattern.
>> > Currently it gives an error
>> >
>> >> strcapture("(.+) (.+)", c("One 1", "noSpaceInLine", "Three 3"),
>> > proto=list(Name="", Number=0))
>> > Error in strcapture("(.+) (.+)", c("One 1", "noSpaceInLine", "Three 3"),
>> > :
>> >   number of matches does not always match ncol(proto)
>> >
>> > First, isn't the 'number of matches' the number of parenthesized
>> > subpatterns in the regular expression?  I thought that if the entire
>> > pattern matches then the subpatterns without matches would be
>> > shown as matches at position 0 with length 0.  Hence either the
>> > pattern is compatible with the prototype or it isn't, it does not depend
>> > on the text input.  E.g.,
>> >
>> >> regexec("^(([[:alpha:]]+)|([[:digit:]]+))$", c("Twelve", "12", "Z280"))
>> > [[1]]
>> > [1] 1 1 1 0
>> > attr(,"match.length")
>> > [1] 6 6 6 0
>> > attr(,"useBytes")
>> > [1] TRUE
>> >
>> > [[2]]
>> > [1] 1 1 0 1
>> > attr(,"match.length")
>> > [1] 2 2 0 2
>> > attr(,"useBytes")
>> > [1] TRUE
>> >
>> > [[3]]
>> > [1] -1
>> > attr(,"match.length")
>> > [1] -1
>> > attr(,"useBytes")
>> > [1] TRUE
>> >
>> > Second, an error message like 'some lines were bad' is not very helpful.
>> > Should it put NA's in all the columns of the current output row if the
>> > input line didn't match the pattern and perhaps warn the user that there
>> > were problems?  The user could then look for rows of NA's to see where
>> > the
>> > problems were.
>> >
>> > Bill Dunlap
>> > TIBCO Software
>> > wdunlap tibco.com
>> >
>
>



Re: [Rd] error handling in strcapture

2016-09-21 Thread William Dunlap via R-devel
If there are any matches then strcapture can see if the pattern has the
same number of capture expressions as the prototype has columns and give an
error if not.  That seems appropriate.

If there are no matches, then there is no easy way to see if the prototype
is compatible with the pattern, so should strcapture just assume the best
and fill in the prototype with NA's?

Should there be warnings?  This is kind of like strptime(), which silently
gives NA's when the format does not match the text input.
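
For comparison, a small sketch of the strptime() behavior referred to above, followed by one way a user might find all-NA rows in a result. The data frame res is made-up illustration data, not output from strcapture.

```r
# strptime() returns NA, without a warning, when the format does not
# match the text input.
bad <- strptime("noSpaceInLine", format = "%Y-%m-%d", tz = "UTC")
is.na(bad)

# With silent NA's, the rows that failed to match can still be located
# afterwards: they are the rows in which every column is NA.
res <- data.frame(Name = c("One", NA, "Three"), Number = c(1, NA, 3))
bad_rows <- which(rowSums(!is.na(res)) == 0L)
```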


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, Sep 21, 2016 at 2:10 PM, Michael Lawrence wrote:

> Hi Bill,
>
> Thanks, another good suggestion. strcapture() now returns NAs for
> non-matches. It's nice to have someone kicking the tires on that
> function.
>
> Michael
>
> On Wed, Sep 21, 2016 at 12:11 PM, William Dunlap via R-devel
>  wrote:
> > Michael, thanks for looking at my first issue with utils::strcapture.
> >
> > Another issue is how it deals with lines that don't match the pattern.
> > Currently it gives an error
> >
> >> strcapture("(.+) (.+)", c("One 1", "noSpaceInLine", "Three 3"),
> > proto=list(Name="", Number=0))
> > Error in strcapture("(.+) (.+)", c("One 1", "noSpaceInLine", "Three
> 3"),  :
> >   number of matches does not always match ncol(proto)
> >
> > First, isn't the 'number of matches' the number of parenthesized
> > subpatterns in the regular expression?  I thought that if the entire
> > pattern matches then the subpatterns without matches would be
> > shown as matches at position 0 with length 0.  Hence either the
> > pattern is compatible with the prototype or it isn't, it does not depend
> > on the text input.  E.g.,
> >
> >> regexec("^(([[:alpha:]]+)|([[:digit:]]+))$", c("Twelve", "12", "Z280"))
> > [[1]]
> > [1] 1 1 1 0
> > attr(,"match.length")
> > [1] 6 6 6 0
> > attr(,"useBytes")
> > [1] TRUE
> >
> > [[2]]
> > [1] 1 1 0 1
> > attr(,"match.length")
> > [1] 2 2 0 2
> > attr(,"useBytes")
> > [1] TRUE
> >
> > [[3]]
> > [1] -1
> > attr(,"match.length")
> > [1] -1
> > attr(,"useBytes")
> > [1] TRUE
> >
> > Second, an error message like 'some lines were bad' is not very helpful.
> > Should it put NA's in all the columns of the current output row if the
> > input line didn't match the pattern and perhaps warn the user that there
> > were problems?  The user could then look for rows of NA's to see where
> the
> > problems were.
> >
> > Bill Dunlap
> > TIBCO Software
> > wdunlap tibco.com
> >
>



Re: [Rd] error handling in strcapture

2016-09-21 Thread Michael Lawrence
Hi Bill,

Thanks, another good suggestion. strcapture() now returns NAs for
non-matches. It's nice to have someone kicking the tires on that
function.

Michael

On Wed, Sep 21, 2016 at 12:11 PM, William Dunlap via R-devel wrote:
> Michael, thanks for looking at my first issue with utils::strcapture.
>
> Another issue is how it deals with lines that don't match the pattern.
> Currently it gives an error:
>
>> strcapture("(.+) (.+)", c("One 1", "noSpaceInLine", "Three 3"),
> proto=list(Name="", Number=0))
> Error in strcapture("(.+) (.+)", c("One 1", "noSpaceInLine", "Three 3"),  :
>   number of matches does not always match ncol(proto)
>
> First, isn't the 'number of matches' the number of parenthesized
> subpatterns in the regular expression?  I thought that if the entire
> pattern matches then the subpatterns without matches would be
> shown as matches at position 0 with length 0.  Hence either the
> pattern is compatible with the prototype or it isn't; it does not depend
> on the text input.  E.g.,
>
>> regexec("^(([[:alpha:]]+)|([[:digit:]]+))$", c("Twelve", "12", "Z280"))
> [[1]]
> [1] 1 1 1 0
> attr(,"match.length")
> [1] 6 6 6 0
> attr(,"useBytes")
> [1] TRUE
>
> [[2]]
> [1] 1 1 0 1
> attr(,"match.length")
> [1] 2 2 0 2
> attr(,"useBytes")
> [1] TRUE
>
> [[3]]
> [1] -1
> attr(,"match.length")
> [1] -1
> attr(,"useBytes")
> [1] TRUE
>
> Second, an error message like 'some lines were bad' is not very helpful.
> Should it put NA's in all the columns of the current output row if the
> input line didn't match the pattern and perhaps warn the user that there
> were problems?  The user could then look for rows of NA's to see where the
> problems were.
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
