Re: ANN: A new version (0.5.1) of python-gnupg has been released.

2023-07-22 Thread Dan Sommers via Python-list
On 2023-07-22 at 11:04:35 +,
Vinay Sajip via Python-list  wrote:

> What Changed?
> =

What changed, indeed.

Maybe I'm old, and curmudgeonly, but it would be nice if the body of
these announcement emails (not just this one) contained the name of the
program and a one-line summary of what the program does, preferably
right at the top.

(Admittedly, in this case, once I found the name of the program in the
subject and the footnotes, I was able to figure out what it does.  Not
all software is named that usefully.)
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why doc call `__init__` as a method rather than function?

2023-09-15 Thread Dan Sommers via Python-list
On 2023-09-15 at 10:49:10 +,
scruel tao via Python-list  wrote:

> ```python
> >>> class A:
> ...   def __init__(self):
> ...     pass
> ...
> >>> A.__init__
> 
> >>> a = A()
> >>> a.__init__
> >
> ```
> 
> On many books and even the official documents, it seems that many authors 
> prefer to call `__init__` as a "method" rather than a "function".
> The book PYTHON CRASH COURSE  mentioned that "A function that’s part of a 
> class is a method.", however, ` A.__init__` tells that `__init__` is a 
> function...

I always call __init__ "the initializer."  YMMV.

> I wonder how can I call `__init__` as? Consider the output above.
> Maybe both are OK? If you prefer or think that we must use one of the two, 
> please explain the why, I really want to know, thanks!

Usually, you don't call (or even refer to) __init__ from your
application.  One __init__ can call another one in the case of
initializing superclasses.

When you evaluate A(), Python calls __init__ for you.  You can see this
if you add something "visible" to __init__, like a print statement.
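
A minimal sketch of that experiment. The print call, the `initialized` attribute, and the subclass `B` are illustrative additions, not part of the original question:

```python
import types


class A:
    def __init__(self):
        # Runs automatically whenever A() is evaluated.
        print("A.__init__ called")
        self.initialized = True


class B(A):
    def __init__(self):
        # One initializer calling another: the superclass case.
        super().__init__()
        self.extra = 42


a = A()  # prints "A.__init__ called"
b = B()  # prints "A.__init__ called" again, via super()

# Looked up on the class, __init__ is a plain function;
# looked up on an instance, it is a bound method.
print(isinstance(A.__init__, types.FunctionType))  # True
print(isinstance(a.__init__, types.MethodType))    # True
```

So "function" and "method" are both defensible: which one you see depends on whether you look the name up on the class or on an instance.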


Re: Beep on Windows 11

2023-11-12 Thread Dan Sommers via Python-list
On 2023-11-11 at 23:44:19 +,
Y Y via Python-list  wrote:

> I am curious and humble to ask: What is the purpose of a BEEP?

It's a simple way for a terminal-based program to alert (hence '\a') a
user or an operator that their attention is requested or required.

See also .
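
A small sketch of the idea in Python. The helper name `beep` is made up for illustration, and whether anything audible (or visible) happens is entirely up to the terminal:

```python
import sys

BEL = "\a"  # ASCII BEL, code point 7


def beep(stream=sys.stderr):
    """Write the BEL character and flush so it reaches the terminal promptly."""
    stream.write(BEL)
    stream.flush()


beep()
```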


Re: on writing a number as 2^s * q, where q is odd

2023-11-29 Thread Dan Sommers via Python-list
On 2023-11-29 at 21:44:01 -0300,
Julieta Shem via Python-list  wrote:

> How would you write this procedure?
> 
> --8<---cut here---start->8---
> def powers_of_2_in(n):
>   s = 0
>   while "I still find factors of 2 in n...":
>     q, r = divmod(n, 2)
>     if r == 0:
>       s = s + 1
>       n = n // 2
>     else:
>       return s, n
> --8<---cut here---end--->8---

What's wrong with what you have?
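
For comparison only, here is one alternative formulation using bit operations. It is a sketch, not a claim that the loop needs replacing; the explicit check for zero is an addition (the loop in the question would never terminate for n == 0):

```python
def powers_of_2_in(n):
    """Return (s, q) such that n == 2**s * q and q is odd; n must be a nonzero int."""
    if n == 0:
        raise ValueError("0 has no odd part")
    lowest_set_bit = n & -n              # isolates the lowest set bit, i.e. 2**s
    s = lowest_set_bit.bit_length() - 1  # position of that bit
    return s, n >> s


print(powers_of_2_in(40))  # (3, 5), since 40 == 2**3 * 5
```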


Re: How/where to store calibration values - written by program A, read by program B

2023-12-06 Thread Dan Sommers via Python-list
On 2023-12-06 at 09:32:02 +,
Chris Green via Python-list  wrote:

> Thomas Passin  wrote:

[...]

> > Just go with an .ini file. Simple, well-supported by the standard 
> > library. And it gives you key/value pairs.
> > 
> My requirement is *slightly* more complex than just key value pairs,
> it has one level of hierarchy, e.g.:-
> 
> KEY1:
>   a: v1
>   c: v3
>   d: v4
> KEY2:
>   a: v7
>   b: v5
>   d: v6
> 
> Different numbers of value pairs under each KEY.

INI files have sections.

See .
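
A sketch of how the one-level hierarchy from the question might map onto INI sections with the standard library's configparser. The section and option names are taken from the example above; the variable names are illustrative:

```python
import configparser

# What program A might write: one section per KEY, one option per value.
text = """\
[KEY1]
a = v1
c = v3
d = v4

[KEY2]
a = v7
b = v5
d = v6
"""

# What program B might do to read it back.
config = configparser.ConfigParser()
config.read_string(text)

calibration = {section: dict(config[section]) for section in config.sections()}
print(calibration["KEY1"]["c"])   # v3
print(sorted(calibration["KEY2"]))  # ['a', 'b', 'd']
```

Each section can hold a different set of options, which covers the "different numbers of value pairs under each KEY" requirement. Note that configparser lower-cases option names by default and returns every value as a string.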


Re: Extract lines from file, add to new files

2024-01-12 Thread Dan Sommers via Python-list
On 2024-01-13 at 02:02:39 +0100,
Left Right via Python-list  wrote:

> Actually, after some Web search.  I think, based on this:
> https://docs.python.org/3/reference/simple_stmts.html#grammar-token-python-grammar-augtarget
> that in Python you call this "augmented assignment target". The term
> isn't in the glossary, but so are many others.

The Python term, at least colloquially, is "tuple unpacking."

HTH.


Re: Extract lines from file, add to new files

2024-01-13 Thread Dan Sommers via Python-list
On 2024-01-13 at 11:34:29 +0100,
Left Right  wrote:

> > The Python term, at least colloquially, is "tuple unpacking."

That quote is from me.  Please do preserve attributions.

> Well, why use colloquialism if there's a language specification? Also,
> there weren't any tuples used in my example, at least not explicitly
> (i could've been a tuple, but that wasn't specified).

According to the language specification,⁰ it's a "target list," and
there can be more than one target in that list.

The unpacking isn't really called anything, it's just the way Python
assignment works, all the way back to its earliest stages.¹

⁰ https://docs.python.org/3/reference/simple_stmts.html#assignment-statements,
¹ https://docs.python.org/release/1.4/ref/ref6.html#HDR2
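
A few illustrative assignments with more than one target in the target list (all names here are made up for the example):

```python
# Two targets, one iterable on the right-hand side.
a, b = 1, 2

# The right-hand side need not be a tuple; any iterable works.
x, y, z = "abc"
first, *rest = [10, 20, 30]

# Target lists can nest.
(row, col), label = (3, 4), "cell"

# Swapping: the right-hand side is evaluated before any target is bound.
a, b = b, a

print(a, b, x, y, z, first, rest, row, col, label)
```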


Re: Configuring an object via a dictionary

2024-03-15 Thread Dan Sommers via Python-list
On 2024-03-15 at 15:48:17 -0400,
Thomas Passin via Python-list  wrote:

> [...] And I suppose there is always the possibility that sometime in
> the future an "or" clause like that will be changed to return a
> Boolean, which one would expect anyway.

Not only is the current value way more useful, but changing it would
be a compatibility and maintenance nightmare.
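
A short illustration of the current behaviour: `or` returns one of its operands, not a Boolean. The variable names are hypothetical:

```python
print(None or "fallback")   # fallback
print(0 or 8080)            # 8080
print("" or [] or {})       # {}  (every operand is falsy, so the last one is returned)
print("value" or "unused")  # value (the right operand is never evaluated)

timeout = None
effective_timeout = timeout or 30
print(effective_timeout)    # 30
```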

If I want Java, I know where to find it.  :-)


Re: Configuring an object via a dictionary

2024-03-20 Thread Dan Sommers via Python-list
On 2024-03-20 at 09:49:54 +0100,
Roel Schroeven via Python-list  wrote:

> You haven't only checked for None! You have rejected *every* falsish value,
> even though they may very well be acceptable values.

OTOH, only you can answer these questions about your situations.

Every application, every item of configuration data, is going to be a
little bit different.

What, exactly, does "missing" mean?  That there's no entry in a config
file?  That there's some sort of degenerate entry with "missing"
semantics (e.g. a line in a text file that contains the name of the
value and an equals sign, but no value)?  An empty string or list?  Are
you making your program easier for users to use, easier for testers to
test, easier for authors to write and to maintain, or something else?
What is your program allowed and not allowed to do in the face of
"missing" configuration data?

Once you've nailed down the semantics of the configuration data, then
the code usually falls out pretty quickly.  But arguing about corner
cases and failure modes without specifications is a losing battle.
Every piece of code is suspect unless you know what the inputs mean, and
what the application "should" do if they don't look like that.
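
As one example of how the code falls out once the semantics are pinned down: suppose (and this is only an assumption for the sketch) that "missing" means "the key is absent" and that falsy values such as 0 or an empty string are legitimate. The helper and key names are hypothetical:

```python
_MISSING = object()  # sentinel: distinguishes "no entry" from any real value


def get_setting(config, key, default):
    """Return config[key] unless the key is absent; falsy values are kept."""
    value = config.get(key, _MISSING)
    return default if value is _MISSING else value


config = {"retries": 0, "greeting": ""}

lenient = config.get("retries") or 3         # 3: the configured 0 is silently replaced
strict = get_setting(config, "retries", 3)   # 0: only an absent key triggers the default
absent = get_setting(config, "timeout", 30)  # 30: the key really is missing

print(lenient, strict, absent)
```

With a different definition of "missing" (say, an empty string counts as missing), the right code would be different, which is rather the point.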

Python's flexibility and expressiveness are double-edged swords.  Use
them wisely.  :-)

Sorry for the rant.

Carry on.


Re: how to discover what values produced an exception?

2024-05-04 Thread Dan Sommers via Python-list
On 2024-05-03 at 10:56:39 -0300,
Johanne Fairchild via Python-list  wrote:

> How to discover what values produced an exception?  Or perhaps---why
> doesn't the Python traceback show the values involved in the TypeError?
> For instance:
> 
> --8<>8---
> >>> (0,0) < 4
> Traceback (most recent call last):
>   File "", line 1, in 
> TypeError: '<' not supported between instances of 'tuple' and 'int'
> --8<>8---
> 
> It could have said something like: 
> 
> --8<>8---
> TypeError: '<' not supported between instances of 'tuple' and 'int'
>   in (0,0) < 4.
> --8<>8---
> 
> We would know which were the values that caused the problem, which would
> be very helpful.

I'm not disagreeing that knowing the values could be useful in many
cases.  In the general case, though, it's not practical.  Consider a
function like this:

    def f(x, y):
        return g(x) < h(y)

The printed values of x, y, g(x), and h(y) could all be millions of (or
more) glyphs.  Worse, one or more of those values could contain circular
lists or similar structures.  And h or g could have changed x or y.  In
summary, printing run-time values isn't always safe or useful.  At least
printing the types is safe.  In the face of ambiguity, refuse to guess.
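
When you control the call site, you can opt in to seeing the values yourself. A sketch, assuming a hypothetical wrapper `checked_less_than`, that uses the standard library's reprlib to keep the output bounded:

```python
import reprlib


def checked_less_than(x, y):
    try:
        return x < y
    except TypeError as exc:
        # reprlib.repr() abbreviates long containers and limits nesting depth,
        # so huge or self-referencing values cannot flood the message.
        detail = f"{exc} in {reprlib.repr(x)} < {reprlib.repr(y)}"
        raise TypeError(detail) from exc


try:
    checked_less_than((0, 0), 4)
except TypeError as exc:
    message = str(exc)
    print(message)

circular = [1, 2]
circular.append(circular)               # a list that contains itself
short = reprlib.repr(circular)          # still a short string
huge = reprlib.repr(list(range(10**6)))  # abbreviated, not a million numbers
print(short)
print(huge)
```

This addresses the size and circularity concerns, but not the concern that g or h may already have changed x or y by the time the message is built.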


Re: Terminal Emulator (Posting On Python-List Prohibited)

2024-05-19 Thread Dan Sommers via Python-list
On 2024-05-19 at 18:13:23 +,
Gilmeh Serda via Python-list  wrote:

> Was there a reason they chose the name Pip?

Package Installer for Python

https://pip.pypa.io/en/stable/index.html

Every time I see PIP, I think Peripheral Interchange Program, but I'm
old.


Re: Any marginally usable programming language approaches an ill defined barely usable re-implementation of half of Common-Lisp

2024-05-27 Thread Dan Sommers via Python-list
On 2024-05-27 at 12:37:01 -0700,
HenHanna via Python-list  wrote:

> 
> On 5/27/2024 7:18 AM, Cor wrote:
> > Some entity, AKA "B. Pym" ,
> > wrote this mindboggling stuff:
> > (selectively-snipped-or-not-p)
> > 
> > > On 12/16/2023, [email protected] wrote:
> > > 
> > > > Any marginally usable programming language approaches an ill
> > > > defined barely usable re-implementation of half of common-lisp
> > > 
> > > The good news is, it's not Lisp that sucks, but Common Lisp.
> > >   --- Paul Graham
> > 
> > Just to set the record straight;
> > This is not My line.
> > I quoted it but don't know who the originator of that remark is.

https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule


Formatted Output and Argument Parsing (was: Re: Flubbed it in the second iteration through the string: range error... HOW?)

2024-05-29 Thread Dan Sommers via Python-list
On 2024-05-29 at 17:14:51 +1000,
Chris Angelico via Python-list  wrote:

> I wouldn't replace str.format() everywhere, nor would I replace
> percent encoding everywhere - but in this case, I think Thomas is
> correct. Not because it's 2024 (f-strings were brought in back in
> 2015, so they're hardly chronologically special), but because most of
> this looks like debugging output that can take advantage of this
> feature:
> 
> print(f"if block {name[index]=} {index=}")

defsnark:

After years of getopt (and even more, if you include non-Python
experience), I'm glad I decided to wait a decade before chugging the
optparse koolaid.

(For the history-impaired, getopt existed long before Python and will
likely exist long after it, but getopt's "replacement" optparse lasted
only from 2003 until 2011.)

That said, I agree that the = thing makes f-strings eminently useful for
debugging, and I wholeheartedly agree with not fixing things that aren't
broken.
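
For anyone who has not met the = specifier (available since Python 3.8), a small sketch with made-up values for name and index:

```python
name = "python"
index = 2

# The trailing "=" echoes the expression text, then the repr() of its value.
line = f"if block {name[index]=} {index=}"
print(line)  # if block name[index]='t' index=2
```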


Re: Any marginally usable programming language approaches an ill defined barely usable re-implementation of half of Common-Lisp

2024-05-29 Thread Dan Sommers via Python-list
On 2024-05-29 at 11:39:14 -0700,
HenHanna via Python-list  wrote:

> On 5/27/2024 1:59 PM, [email protected] wrote:

> > https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule

[...]

> Are  the Rules 1--9  by  Greenspun   good too?

I don't know; let me look it up.  Oh, there it is:

https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule says that
Greenspun said he "was just trying to give the rule a memorable name."

Sadly, the citation link is failing for me right now.


Re: Best use of "open" context manager

2024-07-06 Thread Dan Sommers via Python-list
On 2024-07-06 at 11:49:06 +0100,
Rob Cliffe via Python-list  wrote:

> Is there a better / more Pythonic solution?

https://docs.python.org/3/library/fileinput.html

At least this attempts to abstract the problem of iterating over a file
(or multiple files) into a library routine.  I've used it a little, but
I don't know the full depths of your use case and/or requirements.
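
A minimal sketch of what that looks like, assuming two small throwaway files created just for the demonstration (the file names and contents are made up):

```python
import fileinput
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create two small input files to iterate over.
    paths = []
    for filename, text in [("a.txt", "one\ntwo\n"), ("b.txt", "three\n")]:
        path = os.path.join(tmp, filename)
        with open(path, "w") as f:
            f.write(text)
        paths.append(path)

    # fileinput presents the files as one stream of lines and
    # opens and closes each file as it goes.
    collected = []
    with fileinput.input(files=paths) as lines:
        for line in lines:
            collected.append(
                (os.path.basename(lines.filename()), lines.filelineno(), line.rstrip("\n"))
            )

print(collected)
```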

HTH,
Dan


Re: new here

2024-08-21 Thread Dan Sommers via Python-list
On 2024-08-20 at 23:16:48 -0400,
AVI GROSS via Python-list  wrote:

> I do wonder if the people at python.org want multiple forums. There is
> also one that sort of tutors people that obviously has an overlapping
> but different audience.

$ python -m this
The Zen of Python, by Tim Peters
[...]
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
[...]


Re: Is there a better way? [combining f-string, thousands separator, right align]

2024-08-26 Thread Dan Sommers via Python-list
On 2024-08-26 at 20:42:32 +1200,
dn via Python-list  wrote:

> and if we really want to go over-board:
> 
> >>> RIGHT_JUSTIFIED = ">"
> >>> THOUSANDS_SEPARATOR = ","
> >>> s_format = F"{RIGHT_JUSTIFIED}{S_FIELD_WIDTH}{THOUSANDS_SEPARATOR}"
> 
> or (better) because right-justification is the default for numbers:
> 
> >>> s_format = F"{S_FIELD_WIDTH}{THOUSANDS_SEPARATOR}"
> 
> 
> To the extreme that if your user keeps fiddling with presentations (none
> ever do, do they?), all settings to do with s_format could be added to a
> config/environment file, and thus be even further separated from
> program-logic!

And then you'll need a parser, many of whose Unique Challenges™ aren't
even apparent until you start parsing files from actual users, and
you'll still need some sort of fallback in the code anyway for the case
that s_format can't be parsed (for whatever reason).
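
A sketch of the fallback part, assuming the format spec arrives as a string from somewhere outside the program. `format_amount` and `DEFAULT_FORMAT` are hypothetical names, and the width of 12 is arbitrary:

```python
DEFAULT_FORMAT = ">12,"  # right-aligned, 12 wide, comma as thousands separator


def format_amount(value, spec=DEFAULT_FORMAT):
    try:
        return format(value, spec)
    except ValueError:
        # The spec (perhaps read from a config file) could not be parsed.
        return format(value, DEFAULT_FORMAT)


print(repr(format_amount(1234567)))            # '   1,234,567'
print(repr(format_amount(1234567, "15,")))     # numbers right-align by default
print(repr(format_amount(1234567, "bogus!")))  # falls back to the default
```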

Isn't a config file what just caused the global CrowdStrike outage?  ;-)

That said, I understand that report generators are a thing, not to
mention RPG (https://en.wikipedia.org/wiki/IBM_RPG).

Okay, sorry; I'll just crawl back into the hole from whence I came.


Re: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API

2024-10-01 Thread Dan Sommers via Python-list
On 2024-10-01 at 23:03:01 +0200,
Left Right  wrote:

> > If I recognize the first digit, then I *can* hand that over to an
> > external function to accumulate the digits that follow.
> 
> And what is that external function going to do with this information?
> The point is you didn't parse anything if you just sent the digit.
> You just delegated the parsing further. Parsing is only meaningful if
> you extracted some information, but your idea is, essentially "what if
> I do nothing?".

If the parser detects the first digit of a number, then the parser can
read digits one at a time (i.e., "streaming"), assimilate and accumulate
the value of the number being parsed, and successfully finish parsing
the number when it reads a non-digit.  Whether the function that accumulates
the value during the process is internal or external isn't relevant; the
point is that it is possible to parse integers from most significant
digit to least significant digit under a streaming model (and if you're
sufficiently clever, you can even write partial results to external
storage and/or another transmission protocol, thus allowing for numbers
bigger (as measured by JSON or your internal representation) than your
RAM).

At most, the parser has to remember the non-digit character it read so
that it (the parser) can begin to parse whatever comes after the number.
Does that break your notion of "streaming"?
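
A sketch of that, reading one character at a time from a file-like object and handing back the one character of lookahead. The function name and the sample input are made up, and this handles plain unsigned integers only, not full JSON numbers:

```python
import io


def read_integer(stream, first_digit):
    """Accumulate an integer whose first digit has already been read.

    Returns (value, lookahead), where lookahead is the non-digit character
    that ended the number, or "" at end of input.
    """
    value = int(first_digit)
    while True:
        ch = stream.read(1)
        if not ("0" <= ch <= "9"):  # a non-digit, or "" at end of input
            return value, ch
        value = value * 10 + int(ch)


stream = io.StringIO("12345, 678]")
first = stream.read(1)
value, lookahead = read_integer(stream, first)
rest = stream.read()

print(value, repr(lookahead), repr(rest))  # 12345 ',' ' 678]'
```

The only state carried between characters is the accumulated value; the only state carried out of the function is that value and one character.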

Why do I have to start with the least significant digit?

> > Under that constraint, I'm not sure I can parse anything.  How can I
> > parse a string (and hand it over to an external function) until I've
> > found the closing quote?
> 
> Nobody says that parsing a number is the only pathological case.  You,
> however, exaggerate by saying you cannot parse _anything_. You can
> parse booleans or null, for example.  There's no problem there.

My intent was only to repeat what you implied:  that any parser that
reads its input until it has parsed a value is not streaming.

So how much information can the parser keep before you consider it not
to be "streaming"?

[...]

> In principle, any language that has infinite words will have the same
> problem with streaming [...]

So what magic allows anyone to stream any JSON file over SCSI or IP?
Let alone some kind of "live stream" that by definition is indefinite,
even if it only lasts a few tenths of a second?

> [...] If you ever pondered h/w or low-level
> protocols s.a. SCSI or IP [...]

I spent a good deal of my career designing and implementing all manner
of communications protocols, from transmitting and receiving single
bits over a wire all the way up to what are now known as session and
presentation layers.  Some imposed maximum lengths in certain places;
some allowed for indefinite amounts of data to be transferred from one
end to the other without stopping, resetting, or overflowing.  And yet
somehow, the universe never collapsed.

If you believe that some implementation of fsync fails to meet a
specification, or fails to work correctly on files containing JSON, then
file a bug report.


Re: shutil.rmtree() fails when used in Fedora (rpm) "mock" environment

2024-10-24 Thread Dan Sommers via Python-list
On 2024-10-24 at 20:54:53 +0100,
MRAB via Python-list  wrote:

> On 2024-10-24 20:21, Left Right wrote:
> > > > > The stack is created on line 760 with os.lstat and entries are 
> > > > > appended
> > > > > on lines 677 (os.rmdir), 679 (os.close) and 689 (os.lstat).
> > > > >
> > > > > 'func' is popped off the stack on line 651 and check in the following 
> > > > > lines.
> > > > >
> > > > > I can't see anywhere else where something else is put onto the stack 
> > > > > or
> > > > > an entry is replaced.
> > 
> > But the _rmtree_safe_fd() compares func to a *dynamically* resolved
> > reference: os.lstat. If the reference to os changed (or os object was
> > modified to have new reference at lstat) between the time os.lstat was
> > added to the stack and the time of comparison, then comparison
> > would've failed.  To illustrate my idea:
> > 
> > os.lstat = lambda x: x # thread 1
> > stack.append((os.lstat, ...)) # thread 1
> > os.lstat = lambda x: x # thread 2
> > func, *_ = stack.pop() # thread 1
> > assert func is os.lstat # thread 1 (failure!)
> > 
> > The only question is: is it possible to modify os.lstat like that, and
> > if so, how?
> > 
> > Other alternatives include a malfunctioning "is" operator,
> > malfunctioning module cache... all those are a lot less likely.
> What is the probability of replacing os.lstat, os.close or os.rmdir from
> another thread at just the right time?

That is never the right question in a multi-threaded system.  The answer
is always that it doesn't matter, the odds will beat you in the end.  Or
sometimes right in the middle of a CPU instruction; does anyone remember
the MC680XX series?
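
For what it's worth, the interleaving quoted above can be replayed deterministically, without threads. This sketch uses a stand-in namespace rather than the real os module, and plain sequential statements in place of the two threads; every name in it is hypothetical:

```python
import types

# Stand-in for the os module, so nothing real gets patched.
fake_os = types.SimpleNamespace(lstat=lambda path: path)

stack = [(fake_os.lstat, "some/path")]  # "thread 1" pushes the current function
fake_os.lstat = lambda path: path       # "thread 2" rebinds the attribute
func, _ = stack.pop()                   # "thread 1" pops and compares

same = func is fake_os.lstat
print(same)  # False: the identity check now fails
```

It says nothing about how likely the rebinding is in a real program, only that the failure follows if it ever happens at that point.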

Yes, as a matter of fact, I did used to make my living designing,
building, delivering, and maintaining such systems.


Re: Common objects for CLI commands with Typer

2024-09-21 Thread Dan Sommers via Python-list
On 2024-09-21 at 06:38:05 +0100,
Barry via Python-list  wrote:

> > On 20 Sep 2024, at 21:01, Loris Bennett via Python-list 
> >  wrote:
> > 
> > Hi,
> > 
> > Apologies if the following description is too brief - I can expand if no
> > one knows what I'm on about, but maybe a short description is enough.
> > 
> > I am developing a command line application using Typer.  Most commands
> > need to do something in a database and also do LDAP stuff.  Currently
> > each command creates its own Database and LDAP objects, since each
> > command forms an entry point to the program.
> > 
> > With Typer, is there a way I can define the equivalent of class
> > attributes at a single point which are then available to all commands?
> 
> I do not know typer. But the general solution is to create an instance of 
> your class
> and tell typer to call member function of the instance.
> 
> app = Application()
> …
> typer.set_callback(app.my_handler)

Despite the fact that "everything is an object" in Python, you don't
have to put data or functions inside classes or objects.  I also know
nothing about Typer, but there's nothing wrong with functions in a
module.

There's also nothing wrong with writing a function that creates and
returns the database and LDAP connections (perhaps as instances of
application-level classes), and calling that function from within each
command.

DRY.  Yeah, yeah, yeah.  :-/ So there's one line at the top of each
command that initializes things, and possibly a line at the bottom to
close those things down.  Turn those lines into a context manager, which
is actually a sub-framework inside Typer.  Don't convolute/complicate
your design to eliminate one line at the top of each command.
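
A sketch of the context manager idea using only the standard library. It deliberately uses no Typer API; `FakeConnection`, `connections`, and `show_user` are hypothetical stand-ins for the real database and LDAP code and for one command:

```python
from contextlib import contextmanager


class FakeConnection:
    """Stand-in for a real database or LDAP connection."""

    def __init__(self, name):
        self.name = name
        self.is_open = True

    def close(self):
        self.is_open = False


@contextmanager
def connections():
    """Open both connections, hand them to the caller, and always close them."""
    db = FakeConnection("database")
    ldap = FakeConnection("ldap")
    try:
        yield db, ldap
    finally:
        ldap.close()
        db.close()


def show_user(username):
    # The one line at the top of each command.
    with connections() as (db, ldap):
        return f"{username}: {db.name} open={db.is_open}, {ldap.name} open={ldap.is_open}"


print(show_user("alice"))

with connections() as (db, ldap):
    inside = (db.is_open, ldap.is_open)
after = (db.is_open, ldap.is_open)
print(inside, after)  # (True, True) (False, False)
```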

Go ahead, accuse me of writing FORTRAN (all caps, no numbers or
qualifiers, as $deity intended) in Python.  But neither optimize
prematurely nor invoke the Inner Platform Effect to save one or two
lines in your not-yet-written commands, either.

Sorry for the rant.  :-)

Simple is better than complex.
Complex is better than complicated.

HTH.


Re: Common objects for CLI commands with Typer

2024-09-23 Thread Dan Sommers via Python-list
On 2024-09-23 at 19:00:10 +0100,
Barry Scott  wrote:

> > On 21 Sep 2024, at 11:40, Dan Sommers via Python-list 
> >  wrote:

> But once your code gets big the discipline of using classes helps
> maintenance. Code with lots of globals is problematic.

Even before your code gets big, discipline helps maintenance.  :-)

Every level of your program has globals.  An application with too many
classes is no better (or worse) than a class with too many methods, or a
module with too many functions.  Insert your own definitions of (and
tolerances for) "too many," which will vary in flexibility.

(And as was alluded to elsewhere in this thread, you could probably
deduce the original and/or preferred programming languages of people
with certain such definitions.  But I digress.)

$ python -m this|grep Namespaces
Namespaces are one honking great idea -- let's do more of those!


Re: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API

2024-09-30 Thread Dan Sommers via Python-list
On 2024-09-30 at 11:44:50 -0400,
Grant Edwards via Python-list  wrote:

> On 2024-09-30, Left Right via Python-list  wrote:
> > Whether and to what degree you can stream JSON depends on JSON
> > structure. In general, however, JSON cannot be streamed (but commonly
> > it can be).
> >
> > Imagine a pathological case of this shape: 1... <60GB of digits>. This
> > is still a valid JSON (it doesn't have any limits on how many digits a
> > number can have). And you cannot parse this number in a streaming way
> > because in order to do that, you need to start with the least
> > significant digit.
> 
> Which is how arabic numbers were originally parsed, but when
> > westerners adopted them from a R->L written language, they didn't flip
> them around to match the L->R written language into which they were
> being adopted.

Interesting.

> So now long numbers can't be parsed as a stream in software. They
> should have anticipated this problem back in the 13th century and
> flipped the numbers around.

What am I missing?  Handwavingly, start with the first digit, and as
long as the next character is a digit, multiply the accumulated result
by 10 (or the appropriate base) and add the next value.  Oh, and handle
scientific notation as a special case, and perhaps fail spectacularly
instead of recovering gracefully in certain edge cases.  And in the
pathological case of a single number with 60 billion digits, run out of
memory (and complain loudly to the person who claimed that the file
contained a "dataset").  But why do I need to start with the least
significant digit?
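
The handwaving, minus scientific notation and the edge cases, as a sketch. The function name is made up, and the input can be any iterable of digit characters, consumed most significant digit first:

```python
def accumulate_digits(chars, base=10):
    """Fold digit characters, most significant first, into an int."""
    result = 0
    for ch in chars:
        # int() raises ValueError if ch is not a digit in this base.
        result = result * base + int(ch, base)
    return result


print(accumulate_digits("65535"))           # 65535
print(accumulate_digits(iter("ffff"), 16))  # 65535
```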


Re: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API

2024-09-30 Thread Dan Sommers via Python-list
On 2024-10-01 at 09:09:07 +1000,
Chris Angelico via Python-list  wrote:

> On Tue, 1 Oct 2024 at 08:56, Grant Edwards via Python-list
>  wrote:
> >
> > On 2024-09-30, Dan Sommers via Python-list  wrote:
> >
> > > In Common Lisp, integers can be written in any integer base from two
> > > to thirty six, inclusive.  So knowing the last digit doesn't tell
> > > you whether an integer is even or odd until you know the base
> > > anyway.
> >
> > I had to think about that for an embarrassingly long time before it
> > clicked.
> 
> The only part I'm not clear on is what identifies the base. If you're
> going to write numbers little-endian, it's not that hard to also write
> them with a base indicator before the digits [...]

In Common Lisp, you can write integers as #nnR[digits], where nn is the
decimal representation of the base (possibly without a leading zero),
the # and the R are literal characters, and the digits are written in
the intended base.  So the input #16f is read as the integer 65535.

You can also set or bind the global variable *read-base* (yes, the
asterisks are part of the name) to an integer between 2 and 36, and then
anything that looks like an integer in that base is interpreted as such
(including literals in programs).  The literals I described above are
still handled correctly no matter the current value of *read-base*.  So
if the value of *read-base* is 16, then the input FFFF is read as the
integer 65535 (as is the input #16rFFFF).

(Pedants may point out details I omitted.  I admit to omitting them.)

IIRC, certain [old 8080 and Z-80?] assemblers used to put the base
indicator at the end.  So 10 meant, well, 10, but 10H meant 16 and 10b
meant 2 (IDK; the capital H and the lower case b both look right to me).
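
For comparison, a rough Python analogue (not Common Lisp, and not any particular assembler's syntax). int() takes the base as a separate argument, and the suffix parser below is a hypothetical helper that knows only the H and B suffixes mentioned above:

```python
print(int("FFFF", 16))   # 65535
print(int("0xFFFF", 0))  # 65535; base 0 means "infer the base from the prefix"

SUFFIX_BASES = {"h": 16, "b": 2}


def parse_suffixed(text):
    """Parse '10', '10H', or '10b', with the base indicator at the end."""
    suffix = text[-1].lower()
    if suffix in SUFFIX_BASES:
        return int(text[:-1], SUFFIX_BASES[suffix])
    return int(text, 10)


print(parse_suffixed("10"))   # 10
print(parse_suffixed("10H"))  # 16
print(parse_suffixed("10b"))  # 2
```

With the indicator at the end, the parser cannot interpret any digit until it has seen the last character, which is the opposite trade-off from a prefix such as #16R or 0x.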

I don't recall numbers written from least significant digit to most
significant digit (big and little endian *storage*, yes, but not the
digits when presented to or read from a human).


Re: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API

2024-10-01 Thread Dan Sommers via Python-list
On 2024-09-30 at 18:48:02 -0700,
Keith Thompson via Python-list  wrote:

> [email protected] writes:
> [...]
> > In Common Lisp, you can write integers as #nnR[digits], where nn is the
> > decimal representation of the base (possibly without a leading zero),
> > the # and the R are literal characters, and the digits are written in
> > the intended base.  So the input #16f is read as the integer 65535.
> 
> Typo: You meant #16R, not #16f.

Yep.  Sorry.


Re: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API

2024-10-01 Thread Dan Sommers via Python-list
On 2024-09-30 at 21:34:07 +0200,
Regarding "Re: Help with Streaming and Chunk Processing for Large JSON Data (60 
GB) from Kenna API,"
Left Right via Python-list  wrote:

> > What am I missing?  Handwavingly, start with the first digit, and as
> > long as the next character is a digit, multiply the accumulated result
> > by 10 (or the appropriate base) and add the next value.  Oh, and handle
> > scientific notation as a special case, and perhaps fail spectacularly
> > instead of recovering gracefully in certain edge cases.  And in the
> > pathological case of a single number with 60 billion digits, run out of
> > memory (and complain loudly to the person who claimed that the file
> > contained a "dataset").  But why do I need to start with the least
> > significant digit?
> 
> You probably forgot that it has to be _streaming_. Suppose you parse
> the first digit: can you hand this information over to an external
> function to process the parsed data? -- No! because you don't know the
> magnitude yet.  What about two digits? -- Same thing.  You cannot
> leave the parser code until you know the magnitude (otherwise the
> information is useless to the external code).

If I recognize the first digit, then I *can* hand that over to an
external function to accumulate the digits that follow.

> So, even if you have enough memory and don't care about special cases
> like scientific notation: yes, you will be able to parse it, but it
> won't be a streaming parser.

Under that constraint, I'm not sure I can parse anything.  How can I
parse a string (and hand it over to an external function) until I've
found the closing quote?

How much state can a parser maintain (before it invokes an external
function) and still be considered streaming?  I fear that we may be
getting hung up on terminology rather than solving the problem at hand.


Re: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API

2024-09-30 Thread Dan Sommers via Python-list
On 2024-10-01 at 04:46:35 +1000,
Chris Angelico via Python-list  wrote:

> On Tue, 1 Oct 2024 at 04:30, Dan Sommers via Python-list
>  wrote:
> >
> > But why do I need to start with the least
> > significant digit?
> 
> If you start from the most significant, you don't know anything about
> the number until you finish parsing it. There's almost nothing you can
> say about a number given that it starts with a particular sequence
> (since you don't know how MANY digits there are). However, if you know
> the LAST digits, you can make certain statements about it (trivial
> examples being whether it's odd or even).

But that wasn't the question.  Sure, under certain circumstances and for
specific use cases and/or requirements, there might be arguments to read
potential numbers as strings and possibly not have to parse them
completely before accepting or rejecting them.

And if I start with the least significant digit and the number happens
to be written in scientific notation and/or has a decimal point, then I
can't tell whether it's odd or even until I further process the whole
thing anyway.

> It's not very, well, significant. But there's something to it. And it
> extends nicely to p-adic numbers, which can have an infinite number of
> nonzero digits to the left of the decimal:
> 
> https://en.wikipedia.org/wiki/P-adic_number

In Common Lisp, integers can be written in any integer base from two to
thirty six, inclusive.  So knowing the last digit doesn't tell you
whether an integer is even or odd until you know the base anyway.
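
A quick check of that claim in Python, using the digit string "12" as an arbitrary example. Its last digit is 2 in every base, yet the parity of the value changes with the base:

```python
digits = "12"
for base in (3, 10):
    value = int(digits, base)
    print(base, value, "even" if value % 2 == 0 else "odd")
# base 3: the value is 5, which is odd
# base 10: the value is 12, which is even

# Across all bases in which "12" is valid: odd in odd bases, even in even bases.
parity_by_base = {base: int(digits, base) % 2 for base in range(3, 37)}
print(parity_by_base)
```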

Curiously, we agree:  if you move the goal posts arbitrarily, then
some algorithms that parse JSON numbers will fail.


Re: Division-Bug in decimal and mpmath

2024-12-14 Thread Dan Sommers via Python-list
On 2024-12-14 at 12:08:29 +,
Mark Bourne via Python-list  wrote:

> Martin Ruppert wrote:
> > Hi,
> > 
> > the division 0.4/7 provides a wrong result. It should give a periodic
> > decimal fraction with at most six digits, but it doesn't.
> > 
> > Below is the comparison of the result of decimal, mpmath, dc and calc.
> > 
> > 0.0571428571428571460292086417861615440675190516880580357142857 decimal: 
> > 0.4/7
> > 0.0571428571428571460292086417861615440675190516880580357142857 mpmath: 
> > 0.4/7
> > 0.0571428571428571428571428571428571428571428571428571428571428 dc: 0.4/7
> > 0.0571428571428571428571428571428571428571428571428571428571429 calc: 0.4/7
> > 0.05714285714285715 builtin: 0.4/7
> > 
> > Both decimal and mpmath give an identical result, which is not a
> > periodic decimal fraction with at most six digits.
> > 
> > calc and dc provide as well an identical result, which *is* a periodic
> > decimal fraction with six digits, so I think that's right.
> 
> It looks like you might be running into limitations in floating-point
> numbers.  At least with decimal, calculating 4/70 instead of 0.4/7 appears
> to give the correct result.  As does:
> ```
> from decimal import Decimal as dec
> z2 = dec(4) / dec(10)
> print(z2 / dec(nen))
> ```
> You can also pass a string, and `dec("0.4")/dec(10)` gives the correct
> result as well.
> 
> Your `z` is a float, and therefore limited by the precision of a float. It
> doesn't represent exactly 0.4, since that can't be exactly represented by a
> float.  Anything you do from then on is limited to that precision.
> 
> I can't easily find documentation for dc and calc (links from PyPI are
> either broken or don't exist), but I'm guessing they use some heuristics to
> determine that the float passed in is very close to 0.4 so that was probably
> intended, rather than using the exact value represented by that float.

I'm going to guess that, since dc is a shell utility and calc is either
another shell utility or the calculator in emacs, they both do their
own conversion from a string to an internal representation without
going through an IEEE float.
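
The difference is easy to see from Python itself. A sketch using the default decimal context, with variable names that are illustrative only:

```python
from decimal import Decimal
from fractions import Fraction

from_float = Decimal(0.4)     # exact value of the binary float nearest to 0.4
from_string = Decimal("0.4")  # exactly four tenths

print(from_float)       # slightly more than 0.4
print(from_float / 7)   # inherits the float's representation error
print(from_string / 7)  # 0.0571428571428571428571...

# Fractions sidestep the question of precision entirely.
exact = Fraction("0.4") / 7
print(exact)            # 2/35
```

Starting from the string "0.4" gives the periodic result the original poster expected; starting from the float 0.4 gives the result decimal and mpmath reported.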

Why couldn't we have evolved with eight fingers on each hand?  ;-)