Re: [Boston.pm] converting a thing to a number
The C functions strtol and strtod would seem to be canonical for this
problem. Accessible through the POSIX module, they can handle large
numbers, a wide range of representations, and provide both domain and
range error checking.

You could use a regex like /^\s*(-|\+)?0(.)/ to match binary, octal, and
hex representations, but octal has an implicit base that will need to be
accounted for. Solving for that makes the regex unnecessary.

After parsing the first few bytes to extract the sign and base into
components*, the rest falls into place:

    sub parse
        extract input into sign, base, and str_val
        convert str_val to numeric value using strto[dl]
        handle errors  # See man POSIX::strto[dl]
        negate numeric value if sign eq '-'
        return numeric value

* Extracting the sign and base will necessarily require a position
variable. Let's call it $i. The final value for $i can be used to extract
the (absolute) numeric part of the input value with substr($str_val, $i).

In this context, eval is a red herring with a big stench.

-Gyepi

On Wed, Feb 23, 2022 at 08:31:12AM -0600, em...@greglondon.com wrote:
> Copy/pasting a regexp for checking
> would probably work in my situation.
> [...]

--
The shortest answer is the doing the thing.
--Author Unknown

___
Boston-pm mailing list
Boston-pm@pm.org
https://mail.pm.org/mailman/listinfo/boston-pm
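Gyepi's outline can be fleshed out roughly as follows. This is a minimal sketch, not his code: the name parse_number is hypothetical, octal is assumed whenever a leading 0 is followed by another digit, and POSIX::strtod is only used for base-10 strings containing a '.' or an exponent. It relies on the documented list-context return of POSIX::strtol and POSIX::strtod (value, count of unparsed characters) and on $! being set to ERANGE on overflow, which the POSIX docs note not every platform guarantees.

```perl
use strict;
use warnings;
use POSIX ();

# parse_number is a hypothetical helper following the outline above:
# pull off sign and base with a position variable $i, then hand the
# remaining digits to POSIX::strtol (integers) or POSIX::strtod (floats).
sub parse_number {
    my ($input) = @_;
    (my $str = $input) =~ s/^\s+|\s+$//g;      # trim surrounding whitespace

    my $i    = 0;                              # position variable
    my $sign = '+';
    if ($str =~ /^[-+]/) {
        $sign = substr($str, 0, 1);
        $i++;
    }

    my $base   = 10;
    my $prefix = substr($str, $i, 2);
    if    ($prefix =~ /^0[xX]/)  { $base = 16; $i += 2 }
    elsif ($prefix =~ /^0[bB]/)  { $base = 2;  $i += 2 }
    elsif ($prefix =~ /^0[0-9]/) { $base = 8;  $i += 1 }   # implicit octal base

    my $digits = substr($str, $i);
    # strtol/strtod would silently skip whitespace or accept a second
    # sign here, so insist the remainder starts with a digit or a '.'.
    die "no digits in '$input'\n" unless $digits =~ /^[0-9a-zA-Z.]/;

    $! = 0;
    my ($value, $unparsed) =
        ($base == 10 && $digits =~ /[.eE]/)
        ? POSIX::strtod($digits)
        : POSIX::strtol($digits, $base);
    die "out of range: '$input'\n"    if $!{ERANGE};
    die "trailing junk in '$input'\n" if $unparsed;

    return $sign eq '-' ? -$value : $value;
}
```

For example, parse_number('0177') gives 127, while parse_number('0xdeadybeef') dies because strtol stops at the 'y' and leaves characters unparsed.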
Re: [Boston.pm] converting a thing to a number
Copy/pasting a regexp for checking
would probably work in my situation.

That numpack idea is great.

It solves another issue I didn't
even mention, which is that some
numbers can be arbitrarily huge.

If I parse one char at a time,
I should be able to use bignum
and still get the correct answer.

A little surprised there isn't a
way to restrict eval to some
particular perl grammar subrule like
"Numeric literal". (Shrug)

Regexps will be a bit more work
but will do the job.

Thanks everyone!
Greg

On 2022-02-22 22:01, Jerrad Pierce wrote:
> There are two things you can do:
> [...]
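Greg's one-char-at-a-time idea can be sketched with Math::BigInt, the core class the bignum pragma builds on, so it needs nothing from CPAN. digits_to_bigint is a hypothetical helper; it assumes the sign and any 0x/0b/0 prefix have already been stripped, and it only handles bases up to 16. Each step computes value = value * base + digit.

```perl
use strict;
use warnings;
use Math::BigInt;

# Hypothetical helper: read the digit string one character at a time
# and accumulate into a Math::BigInt, so long inputs never overflow.
sub digits_to_bigint {
    my ($digits, $base) = @_;
    my $n = Math::BigInt->new(0);
    for my $c (split //, lc $digits) {
        my $d = index('0123456789abcdef', $c);   # digit value, -1 if not a digit
        die "bad digit '$c' for base $base\n" if $d < 0 || $d >= $base;
        $n->bmul($base)->badd($d);               # in place: n = n * base + d
    }
    return $n;
}

print digits_to_bigint('f' x 20, 16), "\n";   # prints 1208925819614629174706175 (2**80 - 1)
```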
Re: [Boston.pm] converting a thing to a number
Third option is to use oct() and hex() based on pattern:

    ~/ perl -le 'print hex("0xdeadbeef")'
    3735928559
    ~/ perl -le 'print oct("0177")'    # also handles 0b
    127

    my $num;
    if    ( /^\s*0x([0-9a-f]+)$/i )     { $num = hex $1 }    # hex: pass digits only
    elsif ( /^\s*(0b[01]+|0[0-7]*)$/ )  { $num = oct $1 }    # binary or octal
    else                                { $num = $_ + 0 }    # decimal int/float
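perldoc -f oct documents that oct() itself recognizes a leading 0x (hex), 0b (binary) or plain 0 (octal), so a single oct() call can cover all three prefixed formats. The sketch below leans on that. to_num is a hypothetical name, and the sign handling and the two validating regexes are assumptions added here, not part of the post above; anything that matches neither pattern is rejected instead of being numified to 0.

```perl
use strict;
use warnings;

# Hypothetical helper: validate the shape first, then let oct() handle
# the 0x / 0b / leading-0 formats and plain numification handle decimal.
sub to_num {
    my ($orig) = @_;
    (my $s = $orig) =~ s/^\s+|\s+$//g;           # trim surrounding whitespace
    my $sign = $s =~ s/^([-+])// ? $1 : '+';     # oct() does not take a sign
    my $val;
    if ($s =~ /^0(?:x[0-9a-fA-F]+|b[01]+|[0-7]*)$/) {
        $val = oct $s;                           # hex, binary, octal, or "0"
    }
    elsif ($s =~ /^(?:(?:[1-9][0-9]*|0)(?:\.[0-9]*)?|\.[0-9]+)(?:[eE][-+]?[0-9]+)?$/) {
        $val = $s + 0;                           # decimal integer or float
    }
    else {
        die "not a number: '$orig'\n";           # e.g. 0xdeadybeef, 09, abc
    }
    return $sign eq '-' ? -$val : $val;
}
```

Note that "09" is rejected rather than read as 9, matching the way Perl itself refuses 09 as a literal.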
Re: [Boston.pm] converting a thing to a number
There are two things you can do:

a) use regular expressions on the "numbers" to see if they conform to a
known format, and then eval iff they do; you can then bypass eval for
integer/float since +0 will cover it.

e.g. /^\s*0b[01]+\s*$/

You should be able to crib the RE from Regexp::Common; no need to install
and use the full module.

b) use regular expressions to identify the format plus some packing/
unpacking and code-point indexing to convert the number without eval:

    # binary 285; this implementation is sensitive to leading zeroes--
    # others may not be--and requires whole bytes. Padding is left as
    # an exercise for the reader
    $num = numpack("B*", "0000000100011101");

    # hex 3735928559
    $num = numpack("H*", "deadbeef");

    sub numpack {
        my ($fmt, $val) = @_;
        my ($i, $sum) = (1, 0);

        my @char  = split //, unpack("a*", pack($fmt, $val));
        my $bytes = scalar(@char);
        foreach (@char) { $sum += ord() << (8 * ($bytes - $i++)) }   # shift and add bytes
        return $sum;
    }
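One way to do the padding exercise, assuming only the "B*" and "H*" formats used above: left-pad the digit string with zeroes to a whole number of bytes before packing. numpack_padded is a hypothetical helper; numpack is the posted version, lightly tidied.

```perl
use strict;
use warnings;

# numpack as posted above, lightly tidied; behavior unchanged.
sub numpack {
    my ($fmt, $val) = @_;
    my ($i, $sum) = (1, 0);
    my @char  = split //, unpack("a*", pack($fmt, $val));
    my $bytes = scalar(@char);
    foreach (@char) { $sum += ord($_) << (8 * ($bytes - $i++)) }   # shift and add bytes
    return $sum;
}

# Hypothetical helper: pad to whole bytes first
# (8 binary digits per byte for "B*", 2 hex digits per byte for "H*").
sub numpack_padded {
    my ($fmt, $val) = @_;
    my $per_byte = $fmt eq 'B*' ? 8 : 2;
    my $pad = ($per_byte - length($val) % $per_byte) % $per_byte;
    return numpack($fmt, ('0' x $pad) . $val);
}

print numpack_padded("B*", "100011101"), "\n";   # 285
print numpack_padded("H*", "deadbeef"), "\n";    # 3735928559
```

Like numpack itself, this still accumulates into a native integer, so inputs wider than the native integer size (typically 8 bytes) will not convert correctly.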
Re: [Boston.pm] converting a thing to a number
On Tue, Feb 22, 2022 at 12:12 PM wrote:

> Extra level of annoyance: using modules from cpan is difficult.
> So, if it can be pure perl, all the better.
> The only thing that concerns me a bit is that using string eval() means
> the file could contain code I don't want to execute. and I can't figure out
> an easy way to prevent that.

There's wisdom here. Avoiding dependencies can be nice, but it's usually
not worth adding a risk of arbitrary code injection.

--
"We are caught in an inescapable network of mutuality, tied in a single
garment of destiny. Whatever affects one directly, affects all
indirectly." -- MLK
Re: [Boston.pm] converting a thing to a number
On Tue, Feb 22, 2022, 13:55 Morse, Richard E.,MGH via Boston-pm <
boston-pm@pm.org> wrote:

>
> > On Feb 22, 2022, at 12:12 PM, em...@greglondon.com wrote:
> >
> > ok, so, I need to read in some numbers from a file.
> > Numbers need to allow integer, float, signed/unsigned,
> > as well as supporting decimal, hex, binary, octal.
> >
> > Extra level of annoyance: using modules from cpan is difficult.
> > So, if it can be pure perl, all the better.
>
> You might look at Regexp::Common on CPAN. I know that you can’t use it
> directly, but it has a lot of tested regexes, and I don’t think it updates
> frequently. So you could probably either just pull the correct regexes out
> (or, it looks like, make it generate them for you), or just include the
> whole module, kit and caboodle, in some local lib directory as a part of
> your

Ricky speaks sense. Copying from Re:C is no worse than from StackExchange
or this ailing list.

First, though, make a list of test cases: at least one for each numeric
format that works with string eval, and a bunch of the edge cases that
eval won't detect (hex digits without a leading x, use of the wrong
locale's thousands and decimal separators, other punctuation errors). You
can mark the ones eval can't detect as SKIP while running prove against
the eval "$thing"; version.

(I.e., e.g., to wit, and viz: this is an ideal miniproject to dip a toe
into Test First Development / TDD.)
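A minimal test-first starting point along those lines, using the core Test::More module. to_numeric here is a hypothetical, trimmed-down stand-in for the string-eval converter from the original post; swap in the real function. This sketch uses a TODO block rather than SKIP for the cases eval lets through: TODO tests still run without failing the suite, and prove reports them once they unexpectedly start passing. Saved as, say, t/to_numeric.t (a hypothetical path), it runs under prove.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical stand-in for the string-eval converter under test.
sub to_numeric {
    my ($thing) = @_;
    my $val = eval "use warnings FATAL => 'all'; 0 + ($thing)";
    die "cannot convert '$thing': $@" if $@;
    return $val;
}

# One case per numeric format that string eval handles.
my @good = (
    [ '534'       => 534 ],
    [ '-234'      => -234 ],
    [ '4345.652'  => 4345.652 ],
    [ '0xbeef'    => 48879 ],
    [ '0b1010101' => 85 ],
    [ '0177'      => 127 ],
);
for my $case (@good) {
    my ($in, $want) = @$case;
    is( to_numeric($in), $want, "converts $in" );
}

# Mistakes that string eval does reject (both are syntax errors).
for my $in ( '0xdeadybeef', '12abc' ) {
    ok( !eval { to_numeric($in); 1 }, "rejects $in" );
}

# Mistakes that string eval lets through: expected to fail for now.
TODO: {
    local $TODO = 'string eval accepts expressions and function calls';
    ok( !eval { to_numeric('6 * 7'); 1 }, 'rejects an arithmetic expression' );
    ok( !eval { to_numeric('time'); 1 },  'rejects a function call without parens' );
}

done_testing();
```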
Re: [Boston.pm] converting a thing to a number
> On Feb 22, 2022, at 12:12 PM, em...@greglondon.com wrote:
>
> ok, so, I need to read in some numbers from a file.
> Numbers need to allow integer, float, signed/unsigned,
> as well as supporting decimal, hex, binary, octal.
>
> Extra level of annoyance: using modules from cpan is difficult.
> So, if it can be pure perl, all the better.

You might look at Regexp::Common on CPAN. I know that you can’t use it
directly, but it has a lot of tested regexes, and I don’t think it updates
frequently. So you could probably either just pull the correct regexes out
(or, it looks like, make it generate them for you), or just include the
whole module, kit and caboodle, in some local lib directory as a part of
your project?

Ricky
[Boston.pm] converting a thing to a number
ok, so, I need to read in some numbers from a file.
Numbers need to allow integer, float, signed/unsigned,
as well as supporting decimal, hex, binary, octal.

Extra level of annoyance: using modules from cpan is difficult.
So, if it can be pure perl, all the better.

What I came up with was this:

    use Carp qw(confess);

    our $DEBUG = 0;
    our $TEMPVAL;

    sub convert_thing_to_numeric {
        my ($orig_val) = @_;
        if ($orig_val =~ m{[\'\"]}) {   # " '
            confess "Error: to_numeric cannot handle input strings with quotes ";
        }
        $TEMPVAL = undef;
        my $evalstring = "use warnings FATAL => 'all'; \$TEMPVAL = $orig_val ; \$TEMPVAL = \$TEMPVAL + 0;";
        print ":DEBUG: to_numeric($orig_val) evalstring is '$evalstring' \n" if ($DEBUG);
        eval($evalstring);
        if ($@) {
            confess "Error: to_numeric() could not convert input string '$orig_val' ";
        }
        unless (defined($TEMPVAL)) {
            confess "Error: to_numeric() ended up with undefined value? ";
        }
        return $TEMPVAL;
    }

So the idea is to use perl string eval to read the numerical thing as a
perl numeric literal. This will allow every format I need to support.

If $orig_val is any of these:

    534
    -234
    4345.652
    -256.234
    0xbeef
    0b1010101

then the eval string turns into

    $TEMPVAL = 534;
    $TEMPVAL = -234;
    $TEMPVAL = 4345.652;
    $TEMPVAL = -256.234;
    $TEMPVAL = 0xbeef;
    $TEMPVAL = 0b1010101;

and eval() will correctly convert the thing into a number and store it in
TEMPVAL.

The last part of the eval string adds zero to TEMPVAL:

    $TEMPVAL = $TEMPVAL + 0;

This will force perl to numify TEMPVAL. If the user had a bad number in the
original file, such as '0xdeadybeef', then TEMPVAL will start out a string,
and then adding zero will force perl to try to convert it into a number.

Normally, if perl can't turn the thing into a number to add zero, it only
throws a WARNING, which doesn't show up in $@ after an eval. So the first
part of the eval string forces all warnings to be fatal:

    use warnings FATAL => 'all';

The only thing that concerns me a bit is that using string eval() means the
file could contain code I don't want to execute, and I can't figure out an
easy way to prevent that.

I have a check for quotation marks in the argument. I could add a check for
parens, but that would only catch some function calls, not all, and would
prevent legitimate users from using mathematical parens. And a bad string
might have a function call with no parens.

Any suggestions on how to put some checks in here to keep things safe-ish?

The file being read is local, mostly under our control, so it's not like I
have to worry about reading strings passed in by random users from a
website. It's more a matter of catching mistakes and typos that end up
doing weird things without emitting any warning.
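One way to make the eval safe-ish is Jerrad's option (a): only hand the string to eval after it matches a whitelist regex for a single signed numeric literal, so no function calls, parens or operators ever reach the compiler. A sketch under that assumption; gated_to_numeric and $NUMERIC_LITERAL are hypothetical names, only lowercase 0x/0b prefixes are accepted, and the pattern deliberately rejects underscores, arithmetic, and anything else outside the listed formats.

```perl
use strict;
use warnings;
use Carp qw(confess);

# Hypothetical whitelist: optional sign plus exactly one literal
# (0x hex, 0b binary, leading-0 octal, or decimal integer/float).
my $NUMERIC_LITERAL = qr{
    \A \s* [-+]?
    (?: 0x[0-9a-fA-F]+
      | 0b[01]+
      | 0[0-7]+
      | (?: (?:[1-9][0-9]*|0) (?:\.[0-9]*)? | \.[0-9]+ ) (?:[eE][-+]?[0-9]+)?
    )
    \s* \z
}x;

# Hypothetical converter: string eval is reached only after the input
# has matched the whitelist, so only a literal is ever compiled.
sub gated_to_numeric {
    my ($orig_val) = @_;
    confess "Error: '$orig_val' is not a numeric literal"
        unless $orig_val =~ $NUMERIC_LITERAL;
    my $val = eval "use warnings FATAL => 'all'; 0 + ($orig_val)";
    confess "Error: could not convert '$orig_val': $@" if $@;
    return $val;
}
```

With the gate in place, gated_to_numeric('time') dies instead of returning the current epoch seconds.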