Re: [Boston.pm] converting a thing to a number

2022-02-23 Thread Gyepi SAM


The C functions strtol and strtod would seem to be canonical
for this problem.  Accessible through the POSIX module, they
can handle large numbers, a wide range of representations, and
provide both domain and range error checking.

You could use a regex like /^\s*(-|\+)?0(.)/ to match binary, octal,
and hex representations but octal has an implicit base that will need to be
accounted for.  Solving for that makes the regex unnecessary.

After parsing the first few bytes to extract the sign and base into components*,
the rest falls into place:

sub parse
    extract input into sign, base, and str_val
    convert str_val to numeric value using strto[dl]
    handle errors # See man POSIX::strto[dl]
    negate numeric value if sign eq '-'
    return numeric value

* Extracting the sign and base will necessarily require a position variable.
Let's call it $i.  The final value for $i can be used to extract the
(absolute) numeric part of the input, str_val, with substr($input, $i).
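
The outline above might be fleshed out roughly as follows. This is only a
sketch under some assumptions: the name parse_number and the error messages
are made up for illustration, decimal input is handed to strtod (so floats
work) while hex, binary, and octal input goes to strtol as an integer, and
the 0b prefix is stripped by hand because strtol has not traditionally
recognised it.

```perl
use strict;
use warnings;
use POSIX qw(strtol strtod);

# Hypothetical helper: returns a number or dies with a reason.
sub parse_number {
    my ($input) = @_;
    (my $str = $input) =~ s/^\s+|\s+$//g;    # trim surrounding whitespace

    my $i    = 0;                            # the position variable
    my $sign = '';
    if (substr($str, $i, 1) =~ /^[-+]$/) {
        $sign = substr($str, $i, 1);
        $i++;
    }

    # Work out the base from the next one or two characters.
    my $base   = 10;
    my $prefix = lc substr($str, $i, 2);
    if    ($prefix eq '0x')       { $base = 16; $i += 2 }
    elsif ($prefix eq '0b')       { $base = 2;  $i += 2 }
    elsif ($prefix =~ /^0[0-7]$/) { $base = 8;  $i += 1 }   # octal: implicit base

    my $str_val = substr($str, $i);          # the unsigned numeric part

    # strto[dl] would accept a second sign, a second 0x, "inf", and so on;
    # refuse those before calling them.
    die "malformed number '$input'\n"
        unless $str_val =~ /^[0-9a-fA-F.]/ && $str_val !~ /x/i;

    $! = 0;
    my ($num, $unparsed) = $base == 10
        ? strtod($str_val)
        : strtol($str_val, $base);
    die "range error in '$input': $!\n"  if $!;
    die "trailing garbage in '$input'\n" if $unparsed;

    return $sign eq '-' ? -$num : $num;
}

print parse_number($_), "\n" for qw(534 -256.234 0xbeef 0b1010101 0177);
```

Both functions return the converted value plus the count of unparsed
characters, so a nonzero count catches typos such as 0xdeadybeef, and $!
catches values too large for a long.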

In this context, eval is a red herring with a big stench.

-Gyepi

On Wed, Feb 23, 2022 at 08:31:12AM -0600, em...@greglondon.com wrote:
> Copy/pasting a regexp for checking
> would probably work in my situation.
> 
> That numpack idea is great.
> 
> It solves another issue i didnt
> even mention which is that some
> numbers can be arbitrarily huge.
> 
> If i parse one char at a time,
> I should be able to use bignum
> and still get the correct answer.
> 
> A little surprised there isnt a
> way to restrict eval to some
> particular perl grammar subrule like
> "Numeric literal". (Shrug)
> 
> Regexps will be a bit more work
> but will do the job.
> 
> Thanks everyone!
> Greg
> 
> 
> 
> On 2022-02-22 22:01, Jerrad Pierce wrote:
> > There are two things you can do:
> > 
> > a) use regular expressions on the "numbers" to see if they conform
> > to known format, and then eval iff they do; you can then bypass eval
> > for integer/float since +0 will cover it.
> > 
> > e.g; e.g; /\s+0b[01]+\s+/
> > 
> > You should be able to crib RE from RegExp::Common, no need to install
> > and use the full module.
> > 
> > b) use regular expressions to identify the format plus some packing/
> > unpacking and code-point indexing to convert the number without eval:
> > 
> > # binary 285, this implementation is sensitive to leading zeroes--
> > # others may not be--and requires whole bytes. Padding is left as
> > # an exercise for the reader
> > $num = numpack("B*", "000100011101");
> > 
> > #hex 3735928559
> > $num = numpack("H*", "deadbeef")
> > 
> > sub numpack {
> > my($fmt, $val)=@_;
> > my($i, $sum)=(1, 0);
> > 
> > my @char =split //, unpack("a*", pack($fmt, $val));
> > my $bytes=scalar(@char);
> > foreach(@char){ $sum+=ord()<<(8*($bytes-$i++))}; #shift and add bytes
> > return $sum
> > }
> 
> ___
> Boston-pm mailing list
> Boston-pm@pm.org
> https://mail.pm.org/mailman/listinfo/boston-pm

-- 
The shortest answer is the doing the thing. --Author Unknown

___
Boston-pm mailing list
Boston-pm@pm.org
https://mail.pm.org/mailman/listinfo/boston-pm


Re: [Boston.pm] converting a thing to a number

2022-02-23 Thread email

Copy/pasting a regexp for checking
would probably work in my situation.

That numpack idea is great.

It solves another issue I didn't
even mention which is that some
numbers can be arbitrarily huge.

If I parse one char at a time,
I should be able to use bignum
and still get the correct answer.
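
A rough sketch of that char-at-a-time idea, assuming the core Math::BigInt
module is acceptable (it ships with perl, so nothing from CPAN is needed).
The names big_from_digits and %DIGIT are made up for illustration, and the
sign and any 0x/0b prefix are assumed to have been stripped already.

```perl
use strict;
use warnings;
use Math::BigInt;

# Map each digit character to its value, for bases up to 16.
my %DIGIT;
@DIGIT{ 0 .. 9, 'a' .. 'f' } = 0 .. 15;

# Hypothetical helper: accumulate one digit at a time in a Math::BigInt.
sub big_from_digits {
    my ($digits, $base) = @_;
    my $sum = Math::BigInt->new(0);
    for my $ch (split //, lc $digits) {
        my $d = $DIGIT{$ch};
        die "bad digit '$ch' for base $base\n"
            unless defined $d && $d < $base;
        $sum->bmul($base)->badd($d);    # sum = sum * base + digit
    }
    return $sum;
}

print big_from_digits("deadbeef", 16), "\n";
print big_from_digits("f" x 20, 16), "\n";    # 80 bits, beyond a native integer
```

Math::BigInt->new() is also documented to accept strings with a 0x or 0b
prefix directly, which may make the loop unnecessary.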

A little surprised there isn't a
way to restrict eval to some
particular perl grammar subrule like
"Numeric literal". (Shrug)

Regexps will be a bit more work
but will do the job.

Thanks everyone!
Greg



On 2022-02-22 22:01, Jerrad Pierce wrote:

> There are two things you can do:
>
> a) use regular expressions on the "numbers" to see if they conform
> to known format, and then eval iff they do; you can then bypass eval
> for integer/float since +0 will cover it.
>
> e.g; e.g; /\s+0b[01]+\s+/
>
> You should be able to crib RE from RegExp::Common, no need to install
> and use the full module.
>
> b) use regular expressions to identify the format plus some packing/
> unpacking and code-point indexing to convert the number without eval:
>
> # binary 285, this implementation is sensitive to leading zeroes--
> # others may not be--and requires whole bytes. Padding is left as
> # an exercise for the reader
> $num = numpack("B*", "000100011101");
>
> #hex 3735928559
> $num = numpack("H*", "deadbeef")
>
> sub numpack {
> my($fmt, $val)=@_;
> my($i, $sum)=(1, 0);
>
> my @char =split //, unpack("a*", pack($fmt, $val));
> my $bytes=scalar(@char);
> foreach(@char){ $sum+=ord()<<(8*($bytes-$i++))}; #shift and add bytes
> return $sum
> }




Re: [Boston.pm] converting a thing to a number

2022-02-22 Thread email
Third option is to use oct() and hex() based on pattern

~/perl -le 'print hex("0xdeadbeef")'
3735928559
~/perl -le 'print oct("0177")' #also handles 0b
127

# Capture just the number so hex()/oct() never see surrounding whitespace.
my $num;
if   ( /^\s*(0x[0-9a-f]+)\s*$/i )        { $num = hex($1) }
elsif( /^\s*(0(?:b[01]+|[0-7]+))\s*$/i ) { $num = oct($1) }  # 0b... or leading-0 octal
else                                     { $num = $_+0 }     # plain decimal int/float
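
Since oct() on its own handles 0x, 0b, and leading-0 octal, the same idea
can be wrapped in a sub that also deals with a sign and rejects anything
else. A minimal sketch; the name to_number and the decimal pattern are my
own assumptions, and an input like 08 is treated here as decimal 8.

```perl
use strict;
use warnings;

# Hypothetical wrapper around oct() with sign handling and validation.
sub to_number {
    my ($text) = @_;
    my ($sign, $body) = $text =~ /^\s*([-+]?)(\S+)\s*$/
        or die "empty input\n";

    my $num;
    if ($body =~ /^0(?:x[0-9a-f]+|b[01]+|[0-7]+)$/i) {
        $num = oct $body;               # hex, binary, or octal
    }
    elsif ($body =~ /^(?:\d+\.?\d*|\.\d+)(?:e[-+]?\d+)?$/i) {
        $num = $body + 0;               # decimal integer or float
    }
    else {
        die "not a number: '$text'\n";
    }
    return $sign eq '-' ? -$num : $num;
}

print to_number($_), "\n" for ('0xdeadbeef', '-0177', '0b101', ' 4345.652 ');
```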



Re: [Boston.pm] converting a thing to a number

2022-02-22 Thread Jerrad Pierce
There are two things you can do:

a) use regular expressions on the "numbers" to see if they conform
to known format, and then eval iff they do; you can then bypass eval
for integer/float since +0 will cover it.

e.g., /\s+0b[01]+\s+/

You should be able to crib REs from Regexp::Common, no need to install
and use the full module.

b) use regular expressions to identify the format plus some packing/
unpacking and code-point indexing to convert the number without eval:

# binary 285; this implementation is sensitive to leading zeroes--
# others may not be--and requires whole bytes, so the bit string is
# left-padded to 16 bits here. Padding arbitrary input is left as
# an exercise for the reader
$num = numpack("B*", "0000000100011101");

#hex 3735928559
$num = numpack("H*", "deadbeef");

sub numpack {
    my($fmt, $val)=@_;
    my($i, $sum)=(1, 0);

    # pack the digit string into raw bytes, then split into single bytes
    my @char =split //, unpack("a*", pack($fmt, $val));
    my $bytes=scalar(@char);
    foreach(@char){ $sum+=ord()<<(8*($bytes-$i++))}; #shift and add bytes
    return $sum
}
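
For option (a), validating first and only then calling eval, something
along these lines might do. It is a sketch, not a vetted grammar: the
pattern is my own approximation of Perl's numeric literals (no underscores,
no hex floats), and the names $NUMERIC_LITERAL and safe_numeric_eval are
made up for illustration.

```perl
use strict;
use warnings;

# Approximate whitelist of numeric literal forms (an assumption, not
# lifted from Regexp::Common).
my $NUMERIC_LITERAL = qr{
    \A \s*
    (                                   # capture the literal without padding
        [-+]?
        (?:
            0x [0-9a-f]+                # hex
          | 0b [01]+                    # binary
          | 0  [0-7]*                   # octal, or a plain zero
          | [1-9] [0-9]* (?: \. [0-9]* )? (?: e [-+]? [0-9]+ )?   # decimal
          | 0? \. [0-9]+ (?: e [-+]? [0-9]+ )?                    # .5 or 0.5
        )
    )
    \s* \z
}xi;

# Hypothetical helper: eval the text only if it is nothing but a literal.
sub safe_numeric_eval {
    my ($text) = @_;
    my ($literal) = $text =~ $NUMERIC_LITERAL
        or die "not a numeric literal: '$text'\n";
    my $value = eval $literal;
    die "eval failed for '$literal': $@" if $@;
    return $value;
}

print safe_numeric_eval($_), "\n" for ('534', '-234', '0xbeef', '0b1010101');
```

Because the pattern is anchored at both ends, anything that is not purely a
literal (function calls, quotes, stray letters) is refused before eval ever
sees it.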



Re: [Boston.pm] converting a thing to a number

2022-02-22 Thread Conor Walsh
On Tue, Feb 22, 2022 at 12:12 PM  wrote:
> Extra level of annoyance: using modules from cpan is difficult.
> So, if it can be pure perl, all the better.

> The only thing that concerns me a bit is that using string eval() means
> the file could contain code I don't want to execute. and I can't figure out
> an easy way to prevent that.

There's wisdom here.  Avoiding dependencies can be nice, but it's
usually not worth adding a risk of arbitrary code injection.


-- 
"We are caught in an inescapable network of mutuality, tied in a
single garment of destiny. Whatever affects one directly, affects all
indirectly." -- MLK



Re: [Boston.pm] converting a thing to a number

2022-02-22 Thread Bill Ricker via Boston-pm
On Tue, Feb 22, 2022, 13:55 Morse, Richard E.,MGH via Boston-pm <
boston-pm@pm.org> wrote:

>
> > On Feb 22, 2022, at 12:12 PM, em...@greglondon.com wrote:
> >
> > ok, so, I need to read in some numbers from a file.
> > Numbers need to allow integer, float, signed/unsigned,
> > as well as supporting decimal, hex, binary, octal.
> >
> > Extra level of annoyance: using modules from cpan is difficult.
> > So, if it can be pure perl, all the better.
>
> You might look at Regexp::Common on CPAN. I know that you can’t use it
> directly, but it has a lot of tested regexes, and I don’t think it updates
> frequently. So you could probably either just pull the correct regexes out
> (or, it looks like, make it generate them for you), or just include the
> whole module, kit and caboodle, in some local lib directory as a part of
> your
>

Ricky speaks sense. Copying from Re:C is no worse than from StackExchange
or this mailing list.

First though make a list of test cases - at least one for each numeric
format that works with string eval, and a bunch of the edge cases that eval
won't detect (hex digits without leading x, use of wrong locale's thousands
and decimal, other punctuation errors). You can mark the ones eval can't
detect as SKIP while running prove against the eval "$thing"; version.

(I.e., e.g., to wit, and viz, This is an ideal miniproject to dip a toe
into Test First Development / TDD)
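
A starting point for such a test file might look like this, using the core
Test::More module. Everything specific here is an assumption for
illustration: to_number is a stand-in for whichever converter is under
test, TEST_EVAL_VERSION is a made-up environment switch, and the
"eval detects it" flags are guesses that would need checking against the
real eval version.

```perl
use strict;
use warnings;
use Test::More;

# True when the sub under test is the string-eval version (hypothetical switch).
my $TESTING_EVAL = $ENV{TEST_EVAL_VERSION} ? 1 : 0;

# Stand-in for the converter under test; replace with the real one.
sub to_number {
    my ($s) = @_;
    return oct($1) if $s =~ /^\s*(0(?:x[0-9a-f]+|b[01]+|[0-7]+))\s*$/i;
    return $1 + 0  if $s =~ /^\s*([-+]?(?:\d+\.?\d*|\.\d+)(?:e[-+]?\d+)?)\s*$/i;
    die "not a number: '$s'\n";
}

# [ input, expected value (undef = must be rejected), can eval detect it? ]
my @cases = (
    [ '534',       534,      1 ],
    [ '-234',      -234,     1 ],
    [ '4345.652',  4345.652, 1 ],
    [ '0xbeef',    48879,    1 ],
    [ '0b1010101', 85,       1 ],
    [ '0177',      127,      1 ],
    [ 'beef',      undef,    0 ],   # hex digits without the leading 0x
    [ '1.234,56',  undef,    0 ],   # another locale's thousands/decimal marks
    [ '12;34',     undef,    0 ],   # stray punctuation
);

for my $case (@cases) {
    my ($input, $want, $eval_detects) = @$case;
    SKIP: {
        skip "string eval cannot detect '$input'", 1
            if $TESTING_EVAL && !$eval_detects;
        my $got = eval { to_number($input) };
        if (defined $want) {
            cmp_ok($got, '==', $want, "'$input' converts to $want");
        }
        else {
            ok(!defined $got, "'$input' is rejected");
        }
    }
}

done_testing;
```

Run it with prove; the same table then serves both the eval version and
whatever replaces it.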



Re: [Boston.pm] converting a thing to a number

2022-02-22 Thread Morse, Richard E.,MGH via Boston-pm

> On Feb 22, 2022, at 12:12 PM, em...@greglondon.com wrote:
> 
> ok, so, I need to read in some numbers from a file.
> Numbers need to allow integer, float, signed/unsigned,
> as well as supporting decimal, hex, binary, octal.
> 
> Extra level of annoyance: using modules from cpan is difficult.
> So, if it can be pure perl, all the better.

You might look at Regexp::Common on CPAN. I know that you can’t use it 
directly, but it has a lot of tested regexes, and I don’t think it updates 
frequently. So you could probably either just pull the correct regexes out (or, 
it looks like, make it generate them for you), or just include the whole 
module, kit and caboodle, in some local lib directory as a part of your project?

Ricky




[Boston.pm] converting a thing to a number

2022-02-22 Thread email

ok, so, I need to read in some numbers from a file.
Numbers need to allow integer, float, signed/unsigned,
as well as supporting decimal, hex, binary, octal.

Extra level of annoyance: using modules from cpan is difficult.
So, if it can be pure perl, all the better.

What I came up with was this:


use Carp qw(confess);

our $TEMPVAL;
our $DEBUG;    # assumed to be a package-level debug flag set elsewhere

sub convert_thing_to_numeric{
    my ($orig_val)=@_;

    if($orig_val =~ m{[\'\"]}){  # " '
        confess "Error: to_numeric cannot handle input strings with quotes ";
    }

    $TEMPVAL = undef;

    # The input is compiled as Perl source; with warnings fatal, a
    # non-numeric value dies when zero is added to it.
    my $evalstring = "use warnings FATAL => 'all'; \$TEMPVAL = $orig_val ; \$TEMPVAL = \$TEMPVAL + 0;";

    print ":DEBUG: to_numeric($orig_val) evalstring is '$evalstring' \n" if($DEBUG);

    eval($evalstring);

    if($@){
        confess "Error: to_numeric() could not convert input string '$orig_val' ";
    }

    unless(defined($TEMPVAL)){
        confess "Error: to_numeric() ended up with undefined value? ";
    }

    return $TEMPVAL;    # was $retval, which is never assigned
}

so the idea is use perl string eval to read the numerical thing as a 
perl numeric literal.

This will allow every format I need to support.


if $orig_val is any of these:
534
-234
4345.652
-256.234
0xbeef
0b1010101

then the eval string turns into
$TEMPVAL = 534;
$TEMPVAL = -234;
$TEMPVAL = 4345.652;
$TEMPVAL = -256.234;
$TEMPVAL = 0xbeef;
$TEMPVAL = 0b1010101;

And eval() will correctly convert the thing into a number and store it 
in TEMPVAL.


The last part of eval string adds zero to TEMPVAL
$TEMPVAL = $TEMPVAL + 0;

This will force perl to numify TEMPVAL.
If the user had a bad number in the original file, such as '0xdeadybeef'
then TEMPVAL will start out a string, and then adding zero will force 
perl to try to convert it into a number.


Normally, if perl can't turn the thing into a number to add zero,
it only throws a WARNING, which doesn't show up in $@ after an eval block.
So, the first part of the eval string is to force all warnings into fatals.

use warnings FATAL => 'all';
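
The effect of fatal warnings can be seen without compiling any input as
code. A minimal sketch using a block eval and only the 'numeric' warning
category; strict_numify is a made-up name. Note that plain numification
knows nothing about 0x or 0b prefixes, so those are rejected here too.

```perl
use strict;
use warnings;

# Hypothetical helper: numify a string, dying instead of warning.
sub strict_numify {
    my ($text) = @_;
    my $num;
    my $ok = eval {
        use warnings FATAL => 'numeric';   # "isn't numeric" now dies
        $num = $text + 0;
        1;
    };
    die "not numeric: '$text'\n" unless $ok;
    return $num;
}

for my $text ('534', '-256.234', '12abc', '0xbeef') {
    my $num = eval { strict_numify($text) };
    print defined $num ? "$text => $num\n" : "$text => rejected\n";
}
```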

The only thing that concerns me a bit is that using string eval() means
the file could contain code I don't want to execute, and I can't figure out
an easy way to prevent that.

I have a check for quotation marks in the argument.
I could add a check for parens, but that would only catch some function calls,
not all, and would prevent legitimate users from using mathematical parens.

And a bad string might have a function call with no parens.

Any suggestions on how to put some checks in here to keep things safe-ish?


The file being read is local, mostly under our control, so it's not like
I have to worry about reading strings passed in by random users from a website.
It's more a matter of catching mistakes and typos that end up doing weird things
without emitting any warning.


