Am I banned from Discuss forum?

2023-02-10 Thread Marco Sulla
I was banned from the mailing list and the Discuss forum for a very long time.
Too long, IMHO, but I paid my dues.

Now this is my state in the forum:
- I haven't posted anything disrespectful in recent months
- I'm limited to three posts per thread, but only in some threads
- Some random posts of mine are obscured and must be restored manually by
moderators
- I opened a thread proposing a new section called Brainstorming. It was
closed without a reason.
- I can't post links
- Two discussions I posted in the Ideas section were moved to Help, without a
single line of explanation.

If I'm not welcome, I want to be publicly banned with a good reason, or
at least a reason.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to generate a .pyi file for a C Extension using stubgen

2022-07-30 Thread Marco Sulla
On Fri, 29 Jul 2022 at 23:23, Barry  wrote:
>
>
>
> > On 29 Jul 2022, at 19:33, Marco Sulla  wrote:
> >
> > I tried to follow the instructions here:
> >
> > https://mypy.readthedocs.io/en/stable/stubgen.html
> >
> > but the instructions about creating a stub for a C Extension are a little
> > mysterious. I tried to use it on the .so file without luck.
>
> It says that stubgen works on .py files not .so files.
> You will need to write the .pyi for your .so manually.
>
> The docs could do with splitting the need for .pyi for .so
> away from the stubgen description.

But it says:

"Mypy includes the stubgen tool that can automatically generate stub
files (.pyi files) for Python modules and C extension modules."

I tried stubgen -m modulename, but it generates very little code.
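For what it's worth, when stubgen's output for a C extension is too skeletal, the usual fallback is to hand-write the `.pyi` file: annotated signatures with `...` bodies, placed next to the compiled module. A minimal sketch (the module and names `frozen_copy`/`FrozenDict` are made up for illustration):

```python
# modulename.pyi -- hand-written stub for a hypothetical C extension
from typing import Iterator

def frozen_copy(mapping: dict[str, int]) -> dict[str, int]: ...

class FrozenDict:
    def __getitem__(self, key: str) -> int: ...
    def __iter__(self) -> Iterator[str]: ...
    def __len__(self) -> int: ...
```

stubgen can only introspect a compiled extension at runtime, not parse its source, which is why the generated stub is so sparse compared to what it produces for pure-Python modules.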


How to generate a .pyi file for a C Extension using stubgen

2022-07-29 Thread Marco Sulla
I tried to follow the instructions here:

https://mypy.readthedocs.io/en/stable/stubgen.html

but the instructions about creating a stub for a C Extension are a little
mysterious. I tried to use it on the .so file without luck.


Re: Why I fail so bad to check for memory leak with this code?

2022-07-22 Thread Marco Sulla
On Fri, 22 Jul 2022 at 09:00, Barry  wrote:
> With code as complex as python’s there will be memory allocations that
occur that will not be directly related to the python code you test.
>
> To put it another way there is noise in your memory allocation signal.
>
> Usually the signal of a memory leak is very clear, as you noticed.
>
> For rare leaks I would use a tool like valgrind.

Thank you all, but I needed a simple decorator to automate the memory
leak (and segfault) tests. I think this version is good enough; I hope
it can be useful to someone:

import gc
import tracemalloc

def trace(iterations=100):
    def decorator(func):
        def wrapper():
            print(
                f"Loops: {iterations} - Evaluating: {func.__name__}",
                flush=True
            )

            tracemalloc.start()

            snapshot1 = tracemalloc.take_snapshot().filter_traces(
                (tracemalloc.Filter(True, __file__), )
            )

            for i in range(iterations):
                func()

            gc.collect()

            snapshot2 = tracemalloc.take_snapshot().filter_traces(
                (tracemalloc.Filter(True, __file__), )
            )

            top_stats = snapshot2.compare_to(snapshot1, 'lineno')
            tracemalloc.stop()

            for stat in top_stats:
                if stat.count_diff * 100 > iterations:
                    raise ValueError(f"stat: {stat}")

        return wrapper

    return decorator


If the decorated function fails, try raising the iterations
parameter. I found that in my cases I sometimes needed a value of 200 or 300.
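As a sanity check that the decorator actually catches leaks, here is a condensed, self-contained copy of it applied to a deliberately leaky function (the names `_sink` and `leaky` are mine, for illustration):

```python
import gc
import tracemalloc

def trace(iterations=100):
    # Condensed copy of the decorator above, for a self-contained demo.
    def decorator(func):
        def wrapper():
            tracemalloc.start()
            snap1 = tracemalloc.take_snapshot().filter_traces(
                (tracemalloc.Filter(True, __file__),)
            )
            for _ in range(iterations):
                func()
            gc.collect()
            snap2 = tracemalloc.take_snapshot().filter_traces(
                (tracemalloc.Filter(True, __file__),)
            )
            top_stats = snap2.compare_to(snap1, "lineno")
            tracemalloc.stop()
            for stat in top_stats:
                if stat.count_diff * 100 > iterations:
                    raise ValueError(f"stat: {stat}")
        return wrapper
    return decorator

_sink = []

@trace(iterations=100)
def leaky():
    _sink.append(bytearray(100))  # survives the call: a real "leak"

try:
    leaky()
    caught = False
except ValueError:
    caught = True

print("leak detected:", caught)
```

The 100 surviving bytearrays give a `count_diff` far above the threshold, so the decorator raises as intended.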


Re: Why I fail so bad to check for memory leak with this code?

2022-07-21 Thread Marco Sulla
I've done this other simple test:

#!/usr/bin/env python3

import tracemalloc
import gc
import pickle

tracemalloc.start()

snapshot1 = tracemalloc.take_snapshot().filter_traces(
    (tracemalloc.Filter(True, __file__), )
)

for i in range(1000):
    pickle.dumps(iter([]))

gc.collect()

snapshot2 = tracemalloc.take_snapshot().filter_traces(
    (tracemalloc.Filter(True, __file__), )
)

top_stats = snapshot2.compare_to(snapshot1, 'lineno')
tracemalloc.stop()

for stat in top_stats:
    print(stat)

The result is:

/home/marco/sources/test.py:14: size=3339 B (+3339 B), count=63 (+63),
average=53 B
/home/marco/sources/test.py:9: size=464 B (+464 B), count=1 (+1),
average=464 B
/home/marco/sources/test.py:10: size=456 B (+456 B), count=1 (+1),
average=456 B
/home/marco/sources/test.py:13: size=28 B (+28 B), count=1 (+1), average=28
B

It seems that, after 10 million loops, only 63 allocations remain, totalling
only ~3 KB. It seems to me that we can't call that a leak, no? Probably pickle
needs a lot more cycles to show an actual leak.


Re: Why I fail so bad to check for memory leak with this code?

2022-07-21 Thread Marco Sulla
This naive code shows no leak:

import resource
import pickle

c = 0

while True:
    pickle.dumps(iter([]))

    if (c % 1) == 0:
        max_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        print(f"iteration: {c}, max rss: {max_rss} kb")

    c += 1


Re: Why I fail so bad to check for memory leak with this code?

2022-07-21 Thread Marco Sulla
On Thu, 21 Jul 2022 at 22:28, MRAB  wrote:
>
> It's something to do with pickling iterators because it still occurs
> when I reduce func_76 to:
>
> @trace
> def func_76():
>  pickle.dumps(iter([]))

That's really strange. I found a bunch of true memory leaks with this
decorator, so it seems to be reliable. It behaves correctly with pickle and
with iter on their own, but not when pickling iterators.


Why I fail so bad to check for memory leak with this code?

2022-07-21 Thread Marco Sulla
I tried to check for memory leaks in a bunch of functions of mine using a
simple decorator. It works, but it fails with this code, returning a random
count_diff at every run. Why?

import tracemalloc
import gc
import functools
from uuid import uuid4
import pickle

def getUuid():
    return str(uuid4())

def trace(func):
    @functools.wraps(func)
    def inner():
        tracemalloc.start()

        snapshot1 = tracemalloc.take_snapshot().filter_traces(
            (tracemalloc.Filter(True, __file__), )
        )

        for i in range(100):
            func()

        gc.collect()

        snapshot2 = tracemalloc.take_snapshot().filter_traces(
            (tracemalloc.Filter(True, __file__), )
        )

        top_stats = snapshot2.compare_to(snapshot1, 'lineno')
        tracemalloc.stop()

        for stat in top_stats:
            if stat.count_diff > 3:
                raise ValueError(f"count_diff: {stat.count_diff}")

    return inner

dict_1 = {getUuid(): i for i in range(1000)}

@trace
def func_76():
    pickle.dumps(iter(dict_1))

func_76()


Re: Subtract n months from datetime

2022-06-22 Thread Marco Sulla
The package arrow has a simple shift method for months, weeks etc

https://arrow.readthedocs.io/en/latest/#replace-shift
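With arrow this looks like `arrow.utcnow().shift(months=-3)`. For reference, the month arithmetic that such a shift performs (clamping the day when the target month is shorter) can be sketched with the stdlib alone; `shift_months` is my name for this hypothetical helper:

```python
import calendar
from datetime import date

def shift_months(d, months):
    # Move by whole months; clamp the day to the target month's length
    # (so e.g. Mar 31 minus one month gives Feb 28).
    year, month0 = divmod(d.year * 12 + (d.month - 1) + months, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))

print(shift_months(date(2022, 3, 31), -1))  # 2022-02-28
print(shift_months(date(2020, 1, 31), 1))   # 2020-02-29 (leap year)
```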


Re: tail

2022-05-19 Thread Marco Sulla
On Wed, 18 May 2022 at 23:32, Cameron Simpson  wrote:
>
> On 17May2022 22:45, Marco Sulla  wrote:
> >Well, I've done a benchmark.
> >>>> timeit.timeit("tail('/home/marco/small.txt')", globals={"tail":tail}, 
> >>>> number=10)
> >1.5963431186974049
> >>>> timeit.timeit("tail('/home/marco/lorem.txt')", globals={"tail":tail}, 
> >>>> number=10)
> >2.5240604374557734
> >>>> timeit.timeit("tail('/home/marco/lorem.txt', chunk_size=1000)", 
> >>>> globals={"tail":tail}, number=10)
> >1.8944984432309866
>
> This suggests that the file size does not dominate your runtime.

Yes, this is what I wanted to test and it seems good.

> Ah.
> _Or_ that there are similar numbers of newlines vs text in the files so
> reading similar amounts of data from the end. If the "line desnity" of
> the files were similar you would hope that the runtimes would be
> similar.

No, well, small.txt has very short lines, while lorem.txt is a lorem ipsum,
so it has really long lines. Indeed I get better results by tuning chunk_size.
Anyway, even with the default value the performance is not bad at all.

> >But the time of Linux tail surprise me:
> >
> >marco@buzz:~$ time tail lorem.txt
> >[text]
> >
> >real0m0.004s
> >user0m0.003s
> >sys0m0.001s
> >
> >It's strange that it's so slow. I thought it was because it decodes
> >and print the result, but I timed
>
> You're measuring different things. timeit() tries hard to measure just
> the code snippet you provide. It doesn't measure the startup cost of the
> whole python interpreter. Try:
>
> time python3 your-tail-prog.py /home/marco/lorem.txt

Well, I'll try it, but isn't it a bit unfair to compare Python startup with C?
> BTW, does your `tail()` print output? If not, again not measuring the
> same thing.
> [...]
> Also: does tail(1) do character set / encoding stuff? Does your Python
> code do that? Might be apples and oranges.

Well, as I wrote I also timed

timeit.timeit("print(tail('/home/marco/lorem.txt').decode('utf-8'))",
globals={"tail":tail}, number=10)

and I got ~36 seconds.

> If you have the source of tail(1) to hand, consider getting to the core
> and measuring `time()` immediately before and immediately after the
> central tail operation and printing the result.

IMHO this is a very good idea, but I have to find the time(). Ahah. Emh.
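The same trick works on the Python side: timing immediately around the call with `time.perf_counter` excludes both interpreter startup and printing. A self-contained sketch, with a dummy workload standing in for the real `tail('/home/marco/lorem.txt')` call:

```python
import time

def best_of(func, *args, repeat=10):
    # Best-of-N wall time of the call alone -- comparable to tail(1)
    # only once startup and output costs are excluded on both sides.
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best

print(f"{best_of(sum, range(100_000)):.6f} s")
```

Using the minimum of several runs, as timeit does, filters out scheduler noise better than an average.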


Re: tail

2022-05-18 Thread Marco Sulla
Well, I've done a benchmark.

>>> timeit.timeit("tail('/home/marco/small.txt')", globals={"tail":tail}, 
>>> number=10)
1.5963431186974049
>>> timeit.timeit("tail('/home/marco/lorem.txt')", globals={"tail":tail}, 
>>> number=10)
2.5240604374557734
>>> timeit.timeit("tail('/home/marco/lorem.txt', chunk_size=1000)", 
>>> globals={"tail":tail}, number=10)
1.8944984432309866

small.txt is a text file of 1.3 KB. lorem.txt is a lorem ipsum of 1.2
GB. It seems the performance is good, thanks to the chunk suggestion.

But the time of Linux tail surprises me:

marco@buzz:~$ time tail lorem.txt
[text]

real0m0.004s
user0m0.003s
sys0m0.001s

It's strange that mine is so much slower. I thought it was because tail also
decodes and prints the result, but I timed

timeit.timeit("print(tail('/home/marco/lorem.txt').decode('utf-8'))",
globals={"tail":tail}, number=10)

and I got ~36 seconds. It seems quite strange to me. Maybe I got the
benchmarks wrong at some point?


Re: tail

2022-05-16 Thread Marco Sulla
On Fri, 13 May 2022 at 12:49, <2qdxy4rzwzuui...@potatochowder.com> wrote:
>
> On 2022-05-13 at 12:16:57 +0200,
> Marco Sulla  wrote:
>
> > On Fri, 13 May 2022 at 00:31, Cameron Simpson  wrote:
>
> [...]
>
> > > This is nearly the worst "specification" I have ever seen.
>
> > You're lucky. I've seen much worse (or none at all).
>
> At least with *no* documentation, the source code stands for itself.

So I did well not to include one the first time. I think that after
100 posts about tail, chunks, etc., it was clear what that stuff was
about and how to use it.

Speaking of more serious things, so far I've tested with:

* a file that does not end with \n
* a file that ends with \n (after Stefan test)
* a file with more than 10 lines
* a file with less than 10 lines

It seemed to work. I only have to benchmark it. I suppose I have to test
with at least a 1 GB file, a big lorem ipsum, and do an admittedly unequal
comparison with Linux tail. I'll do it when I have time, so Chris will
no longer be angry with me.


Re: tail

2022-05-13 Thread Marco Sulla
On Fri, 13 May 2022 at 00:31, Cameron Simpson  wrote:

> On 12May2022 19:48, Marco Sulla  wrote:
> >On Thu, 12 May 2022 at 00:50, Stefan Ram  wrote:
> >>   There's no spec/doc, so one can't even test it.
> >
> >Excuse me, you're very right.
> >
> >"""
> >A function that "tails" the file. If you don't know what that means,
> >google "man tail"
> >
> >filepath: the file path of the file to be "tailed"
> >n: the numbers of lines "tailed"
> >chunk_size: oh don't care, use it as is
>
> This is nearly the worst "specification" I have ever seen.
>

You're lucky. I've seen much worse (or none at all).


Re: tail

2022-05-12 Thread Marco Sulla
Thank you very much. This helped me to improve the function:

import os

_lf = b"\n"
_err_n = "Parameter n must be a positive integer number"
_err_chunk_size = "Parameter chunk_size must be a positive integer number"

def tail(filepath, n=10, chunk_size=100):
    if n <= 0:
        raise ValueError(_err_n)

    if n % 1 != 0:
        raise ValueError(_err_n)

    if chunk_size <= 0:
        raise ValueError(_err_chunk_size)

    if chunk_size % 1 != 0:
        raise ValueError(_err_chunk_size)

    n_chunk_size = n * chunk_size
    pos = os.stat(filepath).st_size
    chunk_line_pos = -1
    newlines_to_find = n
    first_step = True

    with open(filepath, "rb") as f:
        text = bytearray()

        while pos != 0:
            pos -= n_chunk_size

            if pos < 0:
                pos = 0

            f.seek(pos)
            chars = f.read(n_chunk_size)
            text[0:0] = chars
            search_pos = n_chunk_size

            while search_pos != -1:
                chunk_line_pos = chars.rfind(_lf, 0, search_pos)

                if first_step and chunk_line_pos == search_pos - 1:
                    newlines_to_find += 1

                first_step = False

                if chunk_line_pos != -1:
                    newlines_to_find -= 1

                if newlines_to_find == 0:
                    break

                search_pos = chunk_line_pos

            if newlines_to_find == 0:
                break

    return bytes(text[chunk_line_pos+1:])



On Thu, 12 May 2022 at 20:29, Stefan Ram  wrote:

>   I am not aware of a definition of "line" above,
>   but the PLR says:
>
> |A physical line is a sequence of characters terminated
> |by an end-of-line sequence.
>
>   . So 10 lines should have 10 end-of-line sequences.
>

Maybe. Maybe not. What if the file ends with no newline?


Re: tail

2022-05-12 Thread Marco Sulla
On Thu, 12 May 2022 at 00:50, Stefan Ram  wrote:
>
> Marco Sulla  writes:
> >def tail(filepath, n=10, chunk_size=100):
> >if (n <= 0):
> >raise ValueError(_err_n)
> ...
>
>   There's no spec/doc, so one can't even test it.

Excuse me, you're very right.

"""
A function that "tails" the file. If you don't know what that means,
google "man tail"

filepath: the file path of the file to be "tailed"
n: the numbers of lines "tailed"
chunk_size: oh don't care, use it as is
"""


Re: tail

2022-05-11 Thread Marco Sulla
On Wed, 11 May 2022 at 22:09, Chris Angelico  wrote:
>
> Have you actually checked those three, or do you merely suppose them to be 
> true?

I only suppose, as I said. I should do some benchmarks and some other
tests and, frankly, I don't want to. I don't want to because I'm
quite sure the implementation is fast, since it reads by chunks and
caches them. I'm not sure it's 100% free of bugs, but the concept is
very simple, since it simply mimics the *nix tail, so it should be
reliable.

>
> > I'd very much like to see a CPython implementation of that function. It
> > could be a method of a file object opened in binary mode, and *only* in
> > binary mode.
> >
> > What do you think about it?
>
> Still not necessary. You can simply have it in your own toolkit. Why
> should it be part of the core language?

Why not?

> How much benefit would it be
> to anyone else?

I suppose that every programmer, at least once in their life, has done a tail.

> All the same assumptions are still there, so it still
> isn't general

It's general. It mimics the *nix tail. I can't think of a more general
way to implement a tail.

> I don't understand why this wants to be in the standard library.

Well, the answer is really simple: I needed it, and if I had found it in
the stdlib, I would have used it instead of writing my first horrible
function. Furthermore, tail is such a useful tool that I suppose many
others are interested, based on this quick Google search:

https://www.google.com/search?q=python+tail

A highly upvoted Stack Overflow question, many other Stack Overflow
questions, a package that seems to do exactly the same thing (that is,
mimic *nix tail), and a blog post about how to tail in Python.
Furthermore, if you search "python tail pypi", you can find a bunch
of other packages:

https://www.google.com/search?q=python+tail+pypi

It seems the subject is quite popular, and I can't imagine otherwise.


Re: tail

2022-05-11 Thread Marco Sulla
On Mon, 9 May 2022 at 23:15, Dennis Lee Bieber 
wrote:
>
> On Mon, 9 May 2022 21:11:23 +0200, Marco Sulla
>  declaimed the following:
>
> >Nevertheless, tail is a fundamental tool in *nix. It's fast and
> >reliable. Also the tail command can't handle different encodings?
>
> Based upon
> https://github.com/coreutils/coreutils/blob/master/src/tail.c the ONLY
> thing tail looks at is single byte "\n". It does not handle other line
> endings, and appears to perform BINARY I/O, not text I/O. It does nothing
> for bytes that are not "\n". Split multi-byte encodings are irrelevant
> since, if it does not find enough "\n" bytes in the buffer (chunk) it reads
> another binary chunk and seeks for additional "\n" bytes. Once it finds the
> desired amount, it is synchronized on the byte following the "\n" (which,
> for multi-byte encodings might be a NUL, but in any event, should be a safe
> location for subsequent I/O).
>
> Interpretation of encoding appears to fall to the console driver
> configuration when displaying the bytes output by tail.

Ok, I understand. This should be a Python implementation of *nix tail:

import os

_lf = b"\n"
_err_n = "Parameter n must be a positive integer number"
_err_chunk_size = "Parameter chunk_size must be a positive integer number"

def tail(filepath, n=10, chunk_size=100):
    if n <= 0:
        raise ValueError(_err_n)

    if n % 1 != 0:
        raise ValueError(_err_n)

    if chunk_size <= 0:
        raise ValueError(_err_chunk_size)

    if chunk_size % 1 != 0:
        raise ValueError(_err_chunk_size)

    n_chunk_size = n * chunk_size
    pos = os.stat(filepath).st_size
    chunk_line_pos = -1
    lines_not_found = n

    with open(filepath, "rb") as f:
        text = bytearray()

        while pos != 0:
            pos -= n_chunk_size

            if pos < 0:
                pos = 0

            f.seek(pos)
            chars = f.read(n_chunk_size)
            text[0:0] = chars
            search_pos = n_chunk_size

            while search_pos != -1:
                chunk_line_pos = chars.rfind(_lf, 0, search_pos)

                if chunk_line_pos != -1:
                    lines_not_found -= 1

                if lines_not_found == 0:
                    break

                search_pos = chunk_line_pos

            if lines_not_found == 0:
                break

    return bytes(text[chunk_line_pos+1:])

The function opens the file in binary mode and searches only for b"\n". It
returns the last n lines of the file as bytes.
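For anyone who wants to try the approach without copying the whole function, the core idea compresses into a few lines. `tail_bytes` is my name for this sketch; it trades the careful chunk bookkeeping above for simplicity:

```python
import os
import tempfile

def tail_bytes(path, n=10, chunk_size=4096):
    # Read backwards in chunks until at least n newlines are seen,
    # then return the last n lines as bytes (b"\n" endings only).
    with open(path, "rb") as f:
        pos = f.seek(0, os.SEEK_END)
        buf = b""
        while pos > 0 and buf.count(b"\n") <= n:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            buf = f.read(step) + buf
    return b"".join(buf.splitlines(keepends=True)[-n:])

# Quick demonstration on a throwaway file:
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"".join(b"line%02d\n" % i for i in range(20)))

print(tail_bytes(tmp.name, 3))  # b'line17\nline18\nline19\n'
os.unlink(tmp.name)
```

Using `splitlines` on the accumulated buffer also handles a file whose last line has no trailing newline.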

I suppose this function is fast. It reads the bytes from the file in chunks
and stores them in a bytearray, prepending them to it. The final result is
read from the bytearray and converted to bytes (to be consistent with the
read method).

I suppose the function is reliable. The file is opened in binary mode and
only b"\n" is searched for as the line end, as *nix tail (and Python's
readline in binary mode) do. And bytes are returned. The caller can use
them as is, or convert them to a string using whatever encoding they want,
or do whatever their imagination can think of :)

Finally, it seems to me the function is quite simple.

If all my claims are true, the three obstacles Chris listed
should be overcome.

I'd very much like to see a CPython implementation of that function. It
could be a method of a file object opened in binary mode, and *only* in
binary mode.

What do you think about it?


Re: tail

2022-05-09 Thread Marco Sulla
On Mon, 9 May 2022 at 19:53, Chris Angelico  wrote:
>
> On Tue, 10 May 2022 at 03:47, Marco Sulla  
> wrote:
> >
> > On Mon, 9 May 2022 at 07:56, Cameron Simpson  wrote:
> > >
> > > The point here is that text is a very different thing. Because you
> > > cannot seek to an absolute number of characters in an encoding with
> > > variable sized characters. _If_ you did a seek to an arbitrary number
> > > you can end up in the middle of some character. And there are encodings
> > > where you cannot inspect the data to find a character boundary in the
> > > byte stream.
> >
> > Ooook, now I understand what you and Barry mean. I suppose there's no
> > reliable way to tail a big file opened in text mode with a decent 
> > performance.
> >
> > Anyway, the previous-previous function I posted worked only for files
> > opened in binary mode, and I suppose it's reliable, since it searches
> > only for b"\n", as readline() in binary mode do.
>
> It's still fundamentally impossible to solve this in a general way, so
> the best way to do things will always be to code for *your* specific
> use-case. That means that this doesn't belong in the stdlib or core
> language, but in your own toolkit.

Nevertheless, tail is a fundamental tool in *nix. It's fast and
reliable. Besides, can't the tail command handle different encodings?


Re: tail

2022-05-09 Thread Marco Sulla
On Mon, 9 May 2022 at 07:56, Cameron Simpson  wrote:
>
> The point here is that text is a very different thing. Because you
> cannot seek to an absolute number of characters in an encoding with
> variable sized characters. _If_ you did a seek to an arbitrary number
> you can end up in the middle of some character. And there are encodings
> where you cannot inspect the data to find a character boundary in the
> byte stream.

Ooook, now I understand what you and Barry mean. I suppose there's no
reliable way to tail a big file opened in text mode with a decent performance.

Anyway, the previous-previous function I posted worked only for files
opened in binary mode, and I suppose it's reliable, since it searches
only for b"\n", as readline() in binary mode do.


Re: tail

2022-05-08 Thread Marco Sulla
On Sun, 8 May 2022 at 22:34, Barry  wrote:
>
> > On 8 May 2022, at 20:48, Marco Sulla  wrote:
> >
> > On Sun, 8 May 2022 at 20:31, Barry Scott  wrote:
> >>
> >>>> On 8 May 2022, at 17:05, Marco Sulla  
> >>>> wrote:
> >>>
> >>> def tail(filepath, n=10, newline=None, encoding=None, chunk_size=100):
> >>>   n_chunk_size = n * chunk_size
> >>
> >> Why use tiny chunks? You can read 4KiB as fast as 100 bytes as its 
> >> typically the smaller size the file system will allocate.
> >> I tend to read on multiple of MiB as its near instant.
> >
> > Well, I tested on a little file, a list of my preferred pizzas, so
>
> Try it on a very big file.

I'm not saying it's a good idea, it's only the value that I needed for my tests.
Anyway, it's not a problem with big files. The problem is with files
with long lines.

> >> In text mode you can only seek to a value return from f.tell() otherwise 
> >> the behaviour is undefined.
> >
> > Why? I don't see any recommendation about it in the docs:
> > https://docs.python.org/3/library/io.html#io.IOBase.seek
>
> What does adding 1 to a pos mean?
> If it’s binary it mean 1 byte further down the file but in text mode it may 
> need to
> move the point 1, 2 or 3 bytes down the file.

Emh. I re-quote

seek(offset, whence=SEEK_SET)
Change the stream position to the given byte offset.

And so on. No mention of differences between text and binary mode.

> >> You have on limit on the amount of data read.
> >
> > I explained that previously. Anyway, chunk_size is small, so it's not
> > a great problem.
>
> Typo I meant you have no limit.
>
> You read all the data till the end of the file that might be mega bytes of 
> data.

Yes, I already explained why and how it could be optimized. I quote myself:

Shortly, the file is always opened in text mode. File is read at the
end in bigger and bigger chunks, until the file is finished or all the
lines are found.

Why? Because in encodings that have more than 1 byte per character,
reading a chunk of n bytes, then reading the previous chunk, can
eventually split the character between the chunks in two distinct
bytes.

I think one can read chunk by chunk and test the chunk junction
problem. I suppose the code will be faster this way. Anyway, it seems
that this trick is quite fast anyway and it's a lot simpler.


Re: tail

2022-05-08 Thread Marco Sulla
On Sun, 8 May 2022 at 22:02, Chris Angelico  wrote:
>
> Absolutely not. As has been stated multiple times in this thread, a
> fully general approach is extremely complicated, horrifically
> unreliable, and hopelessly inefficient.

Well, my implementation is quite general now. It's neither complicated
nor inefficient. As for reliability, I can't say anything without a test
case.

> The ONLY way to make this sort
> of thing any good whatsoever is to know your own use-case and code to
> exactly that. Given the size of files you're working with, for
> instance, a simple approach of just reading the whole file would make
> far more sense than the complex seeking you're doing. For reading a
> multi-gigabyte file, the choices will be different.

Apart from the fact that it's very, very simple to optimize for small
files: this is, IMHO, a premature optimization. The code is quite fast
even if the file is small. Can it be faster? Of course, but it depends
on the use case. Every optimization in CPython must pass the benchmark
suite test. If there's little or no gain, the optimization is usually
rejected.

> No, this does NOT belong in the core language.

I respect your opinion, but IMHO you think that the task is more
complicated than the reality. It seems to me that the method can be
quite simple and fast.


Re: tail

2022-05-08 Thread Marco Sulla
On Sun, 8 May 2022 at 20:31, Barry Scott  wrote:
>
> > On 8 May 2022, at 17:05, Marco Sulla  wrote:
> >
> > def tail(filepath, n=10, newline=None, encoding=None, chunk_size=100):
> >n_chunk_size = n * chunk_size
>
> Why use tiny chunks? You can read 4KiB as fast as 100 bytes as its typically 
> the smaller size the file system will allocate.
> I tend to read on multiple of MiB as its near instant.

Well, I tested on a little file, a list of my preferred pizzas, so

> >pos = os.stat(filepath).st_size
>
> You cannot mix POSIX API with text mode.
> pos is in bytes from the start of the file.
> Textmode will be in code points. bytes != code points.
>
> >chunk_line_pos = -1
> >lines_not_found = n
> >
> >with open(filepath, newline=newline, encoding=encoding) as f:
> >text = ""
> >
> >hard_mode = False
> >
> >if newline == None:
> >newline = _lf
> >elif newline == "":
> >hard_mode = True
> >
> >if hard_mode:
> >while pos != 0:
> >pos -= n_chunk_size
> >
> >if pos < 0:
> >pos = 0
> >
> >f.seek(pos)
>
> In text mode you can only seek to a value return from f.tell() otherwise the 
> behaviour is undefined.

Why? I don't see any recommendation about it in the docs:
https://docs.python.org/3/library/io.html#io.IOBase.seek

> >text = f.read()
>
> You have on limit on the amount of data read.

I explained that previously. Anyway, chunk_size is small, so it's not
a great problem.

> >lf_after = False
> >
> >for i, char in enumerate(reversed(text)):
>
> Simple use text.rindex('\n') or text.rfind('\n') for speed.

I can't use them when I have to find both \n and \r. So I preferred to
simplify the code and use the for loop in every case. Bear in mind
anyway that this is a prototype for a Python C API implementation
(builtin, I hope, or a C extension if not).

> > Shortly, the file is always opened in text mode. File is read at the end in
> > bigger and bigger chunks, until the file is finished or all the lines are
> > found.
>
> It will fail if the contents is not ASCII.

Why?

> > Why? Because in encodings that have more than 1 byte per character, reading
> > a chunk of n bytes, then reading the previous chunk, can eventually split
> > the character between the chunks in two distinct bytes.
>
> No it cannot. text mode only knows how to return code points. Now if you are 
> in
> binary it could be split, but you are not in binary mode so it cannot.

From the docs:

seek(offset, whence=SEEK_SET)
Change the stream position to the given byte offset.

> > Do you think there are chances to get this function as a method of the file
> > object in CPython? The method for a file object opened in bytes mode is
> > simpler, since there's no encoding and newline is only \n in that case.
>
> State your requirements. Then see if your implementation meets them.

The method should return the last n lines from a file object.
If the file object is in text mode, the newline parameter must be honored.
If the file object is in binary mode, a newline is always b"\n", to be
consistent with readline.

I suppose the current implementation of tail satisfies the
requirements for text mode. The previous one satisfied binary mode.

Anyway, apart from my implementation, I'm curious whether you think a tail
method would be worth adding to the builtin file objects in CPython.


Re: tail

2022-05-08 Thread Marco Sulla
I think I've _almost_ found a simpler, general way:

import os

_lf = "\n"
_cr = "\r"

def tail(filepath, n=10, newline=None, encoding=None, chunk_size=100):
    n_chunk_size = n * chunk_size
    pos = os.stat(filepath).st_size
    chunk_line_pos = -1
    lines_not_found = n

    with open(filepath, newline=newline, encoding=encoding) as f:
        text = ""

        hard_mode = False

        if newline == None:
            newline = _lf
        elif newline == "":
            hard_mode = True

        if hard_mode:
            while pos != 0:
                pos -= n_chunk_size

                if pos < 0:
                    pos = 0

                f.seek(pos)
                text = f.read()
                lf_after = False

                for i, char in enumerate(reversed(text)):
                    if char == _lf:
                        # was `lf_after == True`, a no-op comparison;
                        # an assignment is clearly what was intended
                        lf_after = True
                    elif char == _cr:
                        lines_not_found -= 1

                        newline_size = 2 if lf_after else 1

                        lf_after = False
                    elif lf_after:
                        lines_not_found -= 1
                        newline_size = 1
                        lf_after = False

                    if lines_not_found == 0:
                        chunk_line_pos = len(text) - 1 - i + newline_size
                        break

                if lines_not_found == 0:
                    break
        else:
            while pos != 0:
                pos -= n_chunk_size

                if pos < 0:
                    pos = 0

                f.seek(pos)
                text = f.read()

                for i, char in enumerate(reversed(text)):
                    if char == newline:
                        lines_not_found -= 1

                        if lines_not_found == 0:
                            chunk_line_pos = len(text) - 1 - i + len(newline)
                            break

                if lines_not_found == 0:
                    break

    if chunk_line_pos == -1:
        chunk_line_pos = 0

    return text[chunk_line_pos:]


Shortly, the file is always opened in text mode. File is read at the end in
bigger and bigger chunks, until the file is finished or all the lines are
found.

Why? Because in encodings that have more than 1 byte per character, reading
a chunk of n bytes, then reading the previous chunk, can eventually split
the character between the chunks in two distinct bytes.
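The problem is easy to demonstrate: cut a UTF-8 byte stream at an arbitrary offset and the chunk boundary can land inside a character:

```python
text = "è" * 5                    # each "è" is 2 bytes in UTF-8
data = text.encode("utf-8")       # 10 bytes total
head, rest = data[:3], data[3:]   # offset 3 falls inside the 2nd character

try:
    head.decode("utf-8")
    split_ok = True
except UnicodeDecodeError as exc:
    split_ok = False
    print("decoding a chunk cut mid-character fails:", exc.reason)

# Rejoining the chunks before decoding is always safe:
assert (head + rest).decode("utf-8") == text
```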

I think one can read chunk by chunk and test the chunk junction problem. I
suppose the code will be faster this way. Anyway, it seems that this trick
is quite fast anyway and it's a lot simpler.

The final result is read from the chunk, and not from the file, so there's
no problems of misalignment of bytes and text. Furthermore, the builtin
encoding parameter is used, so this should work with all the encodings
(untested).

Furthermore, a newline parameter can be specified, as in open(). If it's
equal to the empty string, things are a little more complicated; anyway,
I suppose the code is clear. It's untested too. I only tested with a
UTF-8 Linux file.

Do you think there are chances to get this function as a method of the file
object in CPython? The method for a file object opened in bytes mode is
simpler, since there's no encoding and newline is only \n in that case.


Re: tail

2022-05-07 Thread Marco Sulla
On Sat, 7 May 2022 at 19:02, MRAB  wrote:
>
> On 2022-05-07 17:28, Marco Sulla wrote:
> > On Sat, 7 May 2022 at 16:08, Barry  wrote:
> >> You need to handle the file in bin mode and do the handling of line 
> >> endings and encodings yourself. It’s not that hard for the cases you 
> >> wanted.
> >
> >>>> "\n".encode("utf-16")
> > b'\xff\xfe\n\x00'
> >>>> "".encode("utf-16")
> > b'\xff\xfe'
> >>>> "a\nb".encode("utf-16")
> > b'\xff\xfea\x00\n\x00b\x00'
> >>>> "\n".encode("utf-16").lstrip("".encode("utf-16"))
> > b'\n\x00'
> >
> > Can I use the last trick to get the encoding of a LF or a CR in any 
> > encoding?
>
> In the case of UTF-16, it's 2 bytes per code unit, but those 2 bytes
> could be little-endian or big-endian.
>
> As you didn't specify which you wanted, it defaulted to little-endian
> and added a BOM (U+FEFF).
>
> If you specify which endianness you want with "utf-16le" or "utf-16be",
> it won't add the BOM:
>
>  >>> # Little-endian.
>  >>> "\n".encode("utf-16le")
> b'\n\x00'
>  >>> # Big-endian.
>  >>> "\n".encode("utf-16be")
> b'\x00\n'

Well, ok, but I need a generic method to get LF and CR for any
encoding a user can input.
Do you think that

"\n".encode(encoding).lstrip("".encode(encoding))

is good for any encoding? Furthermore, is there a way to get the
encoding of an opened file object?
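For illustration, a sketch of that trick as a helper, plus the encoding
attribute that text-mode file objects expose (the helper name is invented,
and the BOM-stripping lstrip is only a heuristic):

```python
import tempfile

def newline_bytes(encoding, newline="\n"):
    # Encoding the empty string yields just the BOM (if the codec emits
    # one); lstrip removes those leading bytes from the encoded newline.
    return newline.encode(encoding).lstrip("".encode(encoding))

assert newline_bytes("utf-16") == b"\n\x00"    # BOM stripped, little-endian
assert newline_bytes("utf-16be") == b"\x00\n"
assert newline_bytes("utf-8") == b"\n"

# A text-mode file object records the encoding it was opened with:
with tempfile.TemporaryFile("w+", encoding="utf-16") as f:
    assert f.encoding == "utf-16"
```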


Re: tail

2022-05-07 Thread Marco Sulla
On Sat, 7 May 2022 at 16:08, Barry  wrote:
> You need to handle the file in bin mode and do the handling of line endings 
> and encodings yourself. It’s not that hard for the cases you wanted.

>>> "\n".encode("utf-16")
b'\xff\xfe\n\x00'
>>> "".encode("utf-16")
b'\xff\xfe'
>>> "a\nb".encode("utf-16")
b'\xff\xfea\x00\n\x00b\x00'
>>> "\n".encode("utf-16").lstrip("".encode("utf-16"))
b'\n\x00'

Can I use the last trick to get the encoding of a LF or a CR in any encoding?


Re: tail

2022-05-07 Thread Marco Sulla
On Sat, 7 May 2022 at 01:03, Dennis Lee Bieber  wrote:
>
> Windows also uses  for the EOL marker, but Python's I/O system
> condenses that to just  internally (for TEXT mode) -- so using the
> length of a string so read to compute a file position may be off-by-one for
> each EOL in the string.

So there's no way to reliably read lines in reverse in text mode using
seek and read, and the only option is readlines()?


Re: tail

2022-05-06 Thread Marco Sulla
I have a little problem.

I tried to extend the tail function, so it can read lines from the bottom
of a file object opened in text mode.

The problem is it does not work. It gets a starting position that is lower
than expected by 3 characters, so the first line is read for only 2 chars,
and the last line is missing.

import os

_lf = "\n"
_cr = "\r"
_lf_ord = ord(_lf)

def tail(f, n=10, chunk_size=100):
    n_chunk_size = n * chunk_size
    pos = os.stat(f.fileno()).st_size
    chunk_line_pos = -1
    lines_not_found = n
    binary_mode = "b" in f.mode
    lf = _lf_ord if binary_mode else _lf

    while pos != 0:
        pos -= n_chunk_size

        if pos < 0:
            pos = 0

        f.seek(pos)
        chars = f.read(n_chunk_size)

        for i, char in enumerate(reversed(chars)):
            if char == lf:
                lines_not_found -= 1

                if lines_not_found == 0:
                    chunk_line_pos = len(chars) - i - 1
                    print(chunk_line_pos, i)
                    break

        if lines_not_found == 0:
            break

    line_pos = pos + chunk_line_pos + 1

    f.seek(line_pos)

    res = b"" if binary_mode else ""

    for i in range(n):
        res += f.readline()

    return res

Maybe the problem is 1 char != 1 byte?
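For illustration, that is indeed the trap: seek() positions are byte
offsets, while len() on a decoded string counts characters, and the two
diverge as soon as a multi-byte character appears.

```python
# One character is not always one byte, so character counts computed
# with len() cannot be used as file positions.
s = "è\n"                            # 2 characters
assert len(s) == 2
assert len(s.encode("utf-8")) == 3   # but 3 bytes in UTF-8
assert len(s.encode("utf-16-le")) == 4
```

On Windows the mismatch appears even in pure ASCII, since text mode
condenses "\r\n" to "\n" on read.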


Re: tail

2022-05-02 Thread Marco Sulla
On Mon, 2 May 2022 at 00:20, Cameron Simpson  wrote:
>
> On 01May2022 18:55, Marco Sulla  wrote:
> >Something like this is OK?
> [...]
> >def tail(f):
> >chunk_size = 100
> >size = os.stat(f.fileno()).st_size
>
> I think you want os.fstat().

It's the same since Python 3.3: os.stat() also accepts a file descriptor.

> >chunk_line_pos = -1
> >pos = 0
> >
> >for pos in positions:
> >f.seek(pos)
> >chars = f.read(chunk_size)
> >chunk_line_pos = chars.rfind(b"\n")
> >
> >if chunk_line_pos != -1:
> >break
>
> Normal text file _end_ in a newline. I'd expect this to stop immediately
> at the end of the file.

I think it's correct. The last line in this case is an empty bytes.

> >if chunk_line_pos == -1:
> >nbytes = pos
> >pos = 0
> >f.seek(pos)
> >chars = f.read(nbytes)
> >chunk_line_pos = chars.rfind(b"\n")
>
> I presume this is because unless you're very lucky, 0 will not be a
> position in the range(). I'd be inclined to avoid duplicating this code
> and special case and instead maybe make the range unbounded and do
> something like this:
>
> if pos < 0:
> pos = 0
> ... seek/read/etc ...
> if pos == 0:
> break
>
> around the for-loop body.

Yes, I was not very happy to duplicate the code... I have to think about it.

> Seems sane. I haven't tried to run it.

Thank you ^^


Re: tail

2022-05-02 Thread Marco Sulla
Ok, I suppose \n and \r are enough:


readline(size=- 1, /)

Read and return one line from the stream. If size is specified, at
most size bytes will be read.

The line terminator is always b'\n' for binary files; for text files,
the newline argument to open() can be used to select the line
terminator(s) recognized.

open(file, mode='r', buffering=- 1, encoding=None, errors=None,
newline=None, closefd=True, opener=None)
[...]
newline controls how universal newlines mode works (it only applies to
text mode). It can be None, '', '\n', '\r', and '\r\n'
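For illustration, a quick check of those two newline modes on a throwaway
file:

```python
import os
import tempfile

# Write raw bytes with all three line endings, then read them back in
# text mode with the two relevant newline settings.
with tempfile.NamedTemporaryFile("wb", delete=False) as f:
    f.write(b"a\r\nb\rc\n")
    name = f.name

with open(name, newline=None) as f:   # universal newlines (the default)
    assert f.read() == "a\nb\nc\n"    # \r\n, \r and \n all become \n

with open(name, newline="") as f:     # no translation at all
    assert f.read() == "a\r\nb\rc\n"

os.unlink(name)
```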



Re: tail

2022-05-02 Thread Marco Sulla
On Mon, 2 May 2022 at 18:31, Stefan Ram  wrote:
>
> |The Unicode standard defines a number of characters that
> |conforming applications should recognize as line terminators:[7]
> |
> |LF:Line Feed, U+000A
> |VT:Vertical Tab, U+000B
> |FF:Form Feed, U+000C
> |CR:Carriage Return, U+000D
> |CR+LF: CR (U+000D) followed by LF (U+000A)
> |NEL:   Next Line, U+0085
> |LS:Line Separator, U+2028
> |PS:Paragraph Separator, U+2029
> |
> Wikipedia "Newline".

Should I suppose that other encodings may have more line ending chars?
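For illustration, Python itself already distinguishes the two levels:
str.splitlines() recognizes all the Unicode line boundaries, while text
mode files only split on \n, \r and \r\n.

```python
# \u2028 is LS (Line Separator), \u0085 is NEL (Next Line).
text = "a\u2028b\u0085c\nd"
assert text.splitlines() == ["a", "b", "c", "d"]
# A plain split on \n sees only one terminator:
assert text.split("\n") == ["a\u2028b\u0085c", "d"]
```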


Re: new sorting algorithm

2022-05-01 Thread Marco Sulla
I suppose you should write to python-...@python.org , or in
https://discuss.python.org/ under the section Core development


Re: tail

2022-05-01 Thread Marco Sulla
Something like this is OK?

import os

def tail(f):
    chunk_size = 100
    size = os.stat(f.fileno()).st_size

    positions = iter(range(size, -1, -chunk_size))
    next(positions)

    chunk_line_pos = -1
    pos = 0

    for pos in positions:
        f.seek(pos)
        chars = f.read(chunk_size)
        chunk_line_pos = chars.rfind(b"\n")

        if chunk_line_pos != -1:
            break

    if chunk_line_pos == -1:
        nbytes = pos
        pos = 0
        f.seek(pos)
        chars = f.read(nbytes)
        chunk_line_pos = chars.rfind(b"\n")

    if chunk_line_pos == -1:
        line_pos = pos
    else:
        line_pos = pos + chunk_line_pos + 1

    f.seek(line_pos)

    return f.readline()

This is simply for one line and for utf8.


Re: tail

2022-04-24 Thread Marco Sulla
On Sun, 24 Apr 2022 at 11:21, Roel Schroeven  wrote:

> dn schreef op 24/04/2022 om 0:04:
> > Disagreeing with @Chris in the sense that I use tail very frequently,
> > and usually in the context of server logs - but I'm talking about the
> > Linux implementation, not Python code!
> If I understand Marco correctly, what he want is to read the lines from
> bottom to top, i.e. tac instead of tail, despite his subject.
> I use tail very frequently too, but tac is something I almost never use.
>

Well, the inverse reader is only a secondary suggestion. I suppose a tail
is much more useful.


Re: tail

2022-04-24 Thread Marco Sulla
On Sun, 24 Apr 2022 at 00:19, Cameron Simpson  wrote:

> An approach I think you both may have missed: mmap the file and use
> mmap.rfind(b'\n') to locate line delimiters.
> https://docs.python.org/3/library/mmap.html#mmap.mmap.rfind
>

Ah, I played very little with mmap, I didn't know about this. So I suppose
you can locate the newline and at that point read the line without using
chunks?
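For illustration, a sketch along those lines (the helper name is invented):
mmap lets you search backwards without managing chunks yourself.

```python
import mmap
import os
import tempfile

def last_line(path):
    """Return the last non-empty line of a file as bytes, via mmap."""
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        end = len(mm)
        # ignore a trailing newline, then search backwards for the
        # previous one; rfind returns -1 at start-of-file, so +1 gives 0
        if end and mm[end - 1:end] == b"\n":
            end -= 1
        start = mm.rfind(b"\n", 0, end) + 1
        return mm[start:end]
```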


Re: tail

2022-04-24 Thread Marco Sulla
On Sat, 23 Apr 2022 at 23:18, Chris Angelico  wrote:

> Ah. Well, then, THAT is why it's inefficient: you're seeking back one
> single byte at a time, then reading forwards. That is NOT going to
> play nicely with file systems or buffers.
>
> Compare reading line by line over the file with readlines() and you'll
> see how abysmal this is.
>
> If you really only need one line (which isn't what your original post
> suggested), I would recommend starting with a chunk that is likely to
> include a full line, and expanding the chunk until you have that
> newline. Much more efficient than one byte at a time.
>

Well, I would like to have a sort of tail, so to generalise to more than 1
line. But I think that once you have a good algorithm for one line, you can
repeat it N times.

I understand that you can read a chunk instead of a single byte, so when
the newline is found you can return all the cached chunks concatenated. But
will this make the search for the start of the line faster? I suppose you
always have to read byte by byte (or more, if you're using utf16 etc.) and
see if there's a newline.
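For illustration, the search inside a chunk need not be a Python-level
loop: bytes.rfind() scans the buffer in C, which is the main speedup of the
chunked approach over seeking back one byte at a time.

```python
def find_last_newline(chunk):
    """Index of the last b'\\n' in a chunk, or -1 (delegates to C code)."""
    return chunk.rfind(b"\n")

assert find_last_newline(b"ab\ncd\nef") == 5
assert find_last_newline(b"abcdef") == -1
```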


Re: tail

2022-04-23 Thread Marco Sulla
On Sat, 23 Apr 2022 at 23:00, Chris Angelico  wrote:
> > > This is quite inefficient in general.
> >
> > Why inefficient? I think that readlines() will be much slower, not
> > only more time consuming.
>
> It depends on which is more costly: reading the whole file (cost
> depends on size of file) or reading chunks and splitting into lines
> (cost depends on how well you guess at chunk size). If the lines are
> all *precisely* the same number of bytes each, you can pick a chunk
> size and step backwards with near-perfect efficiency (it's still
> likely to be less efficient than reading a file forwards, on most file
> systems, but it'll be close); but if you have to guess, adjust, and
> keep going, then you lose efficiency there.

Emh, why chunks? My function simply reads byte per byte and compares it to
b"\n". When it finds one, it stops and does a readline():

def tail(filepath):
    """
    @author Marco Sulla
    @date May 31, 2016
    """

    try:
        filepath.is_file
        fp = str(filepath)
    except AttributeError:
        fp = filepath

    with open(fp, "rb") as f:
        size = os.stat(fp).st_size
        start_pos = 0 if size - 1 < 0 else size - 1

        if start_pos != 0:
            f.seek(start_pos)
            char = f.read(1)

            if char == b"\n":
                start_pos -= 1
                f.seek(start_pos)

        if start_pos == 0:
            f.seek(start_pos)
        else:
            for pos in range(start_pos, -1, -1):
                f.seek(pos)

                char = f.read(1)

                if char == b"\n":
                    break

        return f.readline()

This is only for one line and in utf8, but it can be generalised.


Re: tail

2022-04-23 Thread Marco Sulla
On Sat, 23 Apr 2022 at 20:59, Chris Angelico  wrote:
>
> On Sun, 24 Apr 2022 at 04:37, Marco Sulla  
> wrote:
> >
> > What about introducing a method for text streams that reads the lines
> > from the bottom? Java has also a ReversedLinesFileReader with Apache
> > Commons IO.
>
> It's fundamentally difficult to get precise. In general, there are
> three steps to reading the last N lines of a file:
>
> 1) Find out the size of the file (currently, if it's being grown)
> 2) Seek to the end of the file, minus some threshold that you hope
> will contain a number of lines
> 3) Read from there to the end of the file, split it into lines, and
> keep the last N
>
> Reading the preceding N lines is basically a matter of repeating the
> same exercise, but instead of "end of the file", use the byte position
> of the line you last read.
>
> The problem is, seeking around in a file is done by bytes, not
> characters. So if you know for sure that you can resynchronize
> (possible with UTF-8, not possible with some other encodings), then
> you can do this, but it's probably best to build it yourself (opening
> the file in binary mode).

Well, indeed I have an implementation that does more or less what you
described, for utf8 only. The only difference is that I just started
from the end of the file minus 1. I'm just wondering if this would be
useful in the stdlib. I think it's not too difficult to generalise for
every encoding.

> This is quite inefficient in general.

Why inefficient? I think that readlines() will be much slower, and more
memory consuming too.


Re: Receive a signal when waking or suspending?

2022-04-23 Thread Marco Sulla
I don't know in Python, but maybe you can create a script that writes
to a named pipe, and read the pipe from Python?
https://askubuntu.com/questions/226278/run-script-on-wakeup
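For illustration, a minimal POSIX-only sketch of the named-pipe idea (the
path and names are invented; the wakeup script is assumed to do something
like `echo wakeup > /tmp/wake_fifo`):

```python
import os

FIFO = "/tmp/wake_fifo"   # hypothetical path, shared with the shell script

def wait_for_wakeup():
    """Block until the wakeup script writes a line into the FIFO."""
    if not os.path.exists(FIFO):
        os.mkfifo(FIFO)          # POSIX only
    # Opening a FIFO for reading blocks until a writer opens it too.
    with open(FIFO) as f:
        return f.read().strip()
```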


tail

2022-04-23 Thread Marco Sulla
What about introducing a method for text streams that reads the lines
from the bottom? Java has also a ReversedLinesFileReader with Apache
Commons IO.


Re: Suggestion for Linux Distro (from PSA: Linux vulnerability)

2022-04-18 Thread Marco Sulla
On Sat, 16 Apr 2022 at 17:14, Peter J. Holzer  wrote:
>
> On 2022-04-16 16:49:17 +0200, Marco Sulla wrote:
> > Furthermore, you didn't answer my simple question: why does the
> > security update package contain metadata about Debian patches, if the
> > Ubuntu security team did not benefit from Debian security patches but
> > only from internal work?
>
> It DOES NOT contain metadata about Debian patches. You are
> misinterpreting the name "debian". The directory has this name because
> the tools (dpkg, quilt, etc.) were originally written by the Debian team
> for the Debian distribution. Ubuntu uses the same tools. They didn't
> bother to rename the directory (why should they?), so the directory is
> still called "debian" on Ubuntu (and yes I know this because I've built
> numerous .deb packages on Ubuntu systems).

Ah ok, now I understand. Sorry for the confusion.


Re: Suggestion for Linux Distro (from PSA: Linux vulnerability)

2022-04-16 Thread Marco Sulla
On Sat, 16 Apr 2022 at 10:15, Peter J. Holzer  wrote:
> It doesn't (or at least you can't conclude that from the evidence you
> posted).
>
> There is a subdirectory called "debian" in the build directory of every
> .deb package. This is true on Debian, Ubuntu and every other
> distribution which uses the .deb package format. This directory is
> required by the build tools and it contains all the data (e.g. build
> instructions, dependencies, patches, description, extra documentation)
> which was added by the packager. The name of the directory does not
> imply that any of the files there was created by Debian. I have built
> quite a few packages myself and I'm not a member of the Debian team.

Actually I don't care if the package was made by Debian. I'm sure it
was not, since Ubuntu packages use a different version naming scheme.
For example, the git package is version 2.17.1-1ubuntu0.10

The important fact is that it seems quite evident that the Ubuntu team
uses Debian patches to release their security updates: the release notes
are public and worldwide, made by a professional company, not by an
amateur. Furthermore, I checked all the security updates my system
received since we started this discussion, and all of them have release
notes that contain information about security patches made by Debian.
Only the security updates have this info. Is it an amazing coincidence? I
suppose not.

Furthermore, you didn't answer my simple question: why does the
security update package contain metadata about Debian patches, if the
Ubuntu security team did not benefit from Debian security patches but
only from internal work? I suppose I have to answer myself: because
the patch applied by Ubuntu _is_ actually a Debian patch.

Even more interesting: I checked all the security updates and it seems
they are only applications of Debian patches. So it seems that the work
of the Ubuntu security team consists only of applying Debian security
patches. If so, Debian is probably more secure than Ubuntu, since I don't
know whether all the security patches made by Debian get applied.


Re: Why does datetime.timedelta only have the attributes 'days' and 'seconds'?

2022-04-14 Thread Marco Sulla
On Thu, 14 Apr 2022 at 19:16, MRAB  wrote:
>
> When you're working only with dates, timedelta not having a 'days'
> attribute would be annoying, especially when you consider that a day is
> usually 24 hours, but sometimes 23 or 25 hours (DST).

I agree. Furthermore, timedelta is, well, a time delta, not a date
with a timezone. How could a timedelta take into account DST, leap
seconds etc?

About the initial question, I think it's a good question.


Re: Suggestion for Linux Distro (from PSA: Linux vulnerability)

2022-04-14 Thread Marco Sulla
On Wed, 13 Apr 2022 at 20:05, Peter J. Holzer  wrote:
>
> On 2022-04-12 21:03:00 +0200, Marco Sulla wrote:
> > On Tue, 29 Mar 2022 at 00:10, Peter J. Holzer  wrote:
> > > They are are about a year apart, so they will usually contain different
> > > versions of most packages right from the start. So the Ubuntu and Debian
> > > security teams probably can't benefit much from each other.
> >
> > Well, this is what my updater on Lubuntu says to me today:
> >
> > Changes for tcpdump versions:
> > Installed version: 4.9.3-0ubuntu0.18.04.1
> > Available version: 4.9.3-0ubuntu0.18.04.2
> >
> > Version 4.9.3-0ubuntu0.18.04.2:
> >
> >   * SECURITY UPDATE: buffer overflow in read_infile
> > - debian/patches/CVE-2018-16301.patch: Add check of
> >   file size before allocating and reading content in
> >   tcpdump.c and netdissect-stdinc.h.
> > - CVE-2018-16301
> >   * SECURITY UPDATE: resource exhaustion with big packets
> > - debian/patches/CVE-2020-8037.patch: Add a limit to the
> >   amount of space that can be allocated when reading the
> >   packet.
> > - CVE-2020-8037
> >
> > I use an LTS version. So it seems that Ubuntu benefits from Debian
> > security patches.
>
> Why do you think so? Because the release notes mention debian/patches/*.patch?

Of course.

> This may be an artefact of the build process. The build tools for .deb
> packages expect all kinds of meta-data to live in a subdirectory called
> "debian", even on non-debian systems. This includes patches, at least if
> the maintainer is using quilt (which AFAIK is currently the recommended
> tool for that purpose).

And why does the security update package contain metadata about Debian
patches, if the Ubuntu security team did not benefit from Debian
security patches but only from internal work?

> OTOH tcpdump would be one of the those packages where Ubuntu could use a
> Debian patch directly [...]

It doesn't seem so. This is a fresh new security update:

Changes for git versions:
Installed version: 1:2.17.1-1ubuntu0.9
Available version: 1:2.17.1-1ubuntu0.10

Version 1:2.17.1-1ubuntu0.10:

  * SECURITY UPDATE: Run commands in diff users
- debian/patches/CVE-2022-24765-*.patch: fix GIT_CEILING_DIRECTORIES; add
  an owner check for the top-level-directory; add a function to
  determine whether a path is owned by the current user in patch.c,
  t/t0060-path-utils.sh, setup.c, compat/mingw.c, compat/mingw.h,
  git-compat-util.hi, config.c, config.h.
- CVE-2022-24765

I checked packages.debian.org and git 2.17 was never on Debian:

Package git

stretch (oldoldstable) (vcs): fast, scalable, distributed revision
control system
1:2.11.0-3+deb9u7: amd64 arm64 armel armhf i386 mips mips64el mipsel
ppc64el s390x
stretch-backports (vcs): fast, scalable, distributed revision control system
1:2.20.1-1~bpo9+1: amd64 arm64 armel armhf i386 mips mips64el mipsel
ppc64el s390x
buster (oldstable) (vcs): fast, scalable, distributed revision control system
1:2.20.1-2+deb10u3: amd64 arm64 armel armhf i386 mips mips64el mipsel
ppc64el s390x

etc.
https://packages.debian.org/search?keywords=git


Re: Suggestion for Linux Distro (from PSA: Linux vulnerability)

2022-04-12 Thread Marco Sulla
On Tue, 29 Mar 2022 at 00:10, Peter J. Holzer  wrote:
> They are are about a year apart, so they will usually contain different
> versions of most packages right from the start. So the Ubuntu and Debian
> security teams probably can't benefit much from each other.

Well, this is what my updater on Lubuntu says to me today:

Changes for tcpdump versions:
Installed version: 4.9.3-0ubuntu0.18.04.1
Available version: 4.9.3-0ubuntu0.18.04.2

Version 4.9.3-0ubuntu0.18.04.2:

  * SECURITY UPDATE: buffer overflow in read_infile
- debian/patches/CVE-2018-16301.patch: Add check of
  file size before allocating and reading content in
  tcpdump.c and netdissect-stdinc.h.
- CVE-2018-16301
  * SECURITY UPDATE: resource exhaustion with big packets
- debian/patches/CVE-2020-8037.patch: Add a limit to the
  amount of space that can be allocated when reading the
  packet.
- CVE-2020-8037

I use an LTS version. So it seems that Ubuntu benefits from Debian
security patches. Not sure about the contrary.


Re: dict.get_deep()

2022-04-03 Thread Marco Sulla
On Sun, 3 Apr 2022 at 21:46, Peter J. Holzer  wrote:
>
> > > data.get_deep("users", 0, "address", "street", default="second star")
>
> Yep. Did that, too. Plus pass the final result through a function before
> returning it.

I didn't understand. Have you added a func parameter?

> I'm not sure whether I considered this when I wrote it, but a function
> has the advantage of working with every class which can be indexed. A
> method must be implemented on any class (so at least dict and list to be
> useful).

You're right, but where should it go? I don't know if an iterableutil
package exists. And if included in the stdlib, I don't know where to put
it. In collections maybe?

PS: if you're interested, here is my implementation:

def get_deep(self, *args, default=_sentinel):
    r"""
    Get a nested element of the dictionary.

    The method accepts multiple arguments or a single one. If a single
    argument is passed, it must be an iterable. This represents the
    keys or indexes of the nested element.

    The method first tries to get the value v1 of the dict using the
    first key. If it finds v1 and there's no other key, v1 is
    returned. Otherwise, the method tries to retrieve the value from v1
    associated with the second key/index, and so on.

    If in any point, for any reason, the value can't be retrieved, the
    `default` parameter is returned if specified. Otherwise, a
    KeyError or an IndexError is raised.
    """

    if len(args) == 1:
        single = True

        it_tpm = args[0]

        try:
            len(it_tpm)
            it = it_tpm
        except Exception:
            # maybe it's a generator
            try:
                it = tuple(it_tpm)
            except Exception:
                err = (
                    f"`{self.get_deep.__name__}` called with a single " +
                    "argument supports only iterables"
                )

                raise TypeError(err) from None
    else:
        it = args
        single = False

    if not it:
        if single:
            raise ValueError(
                f"`{self.get_deep.__name__}` argument is empty"
            )
        else:
            raise TypeError(
                f"`{self.get_deep.__name__}` expects at least one argument"
            )

    obj = self

    for k in it:
        try:
            obj = obj[k]
        except (KeyError, IndexError) as e:
            if default is _sentinel:
                raise e from None

            return default

    return obj


Re: dict.get_deep()

2022-04-03 Thread Marco Sulla
On Sun, 3 Apr 2022 at 18:57, Dieter Maurer  wrote:
> You know you can easily implement this yourself -- in your own
> `dict` subclass.

Well, of course, but the question is whether such a method is worth
being a builtin, in a world imbued with JSON. I suppose your answer is no.


Re: dict.get_deep()

2022-04-03 Thread Marco Sulla
On Sun, 3 Apr 2022 at 16:59, Kirill Ratkin via Python-list
 wrote:
>
> Hi Marco.
>
> Recently I met same issue. A service I intergated with was documented
> badly and sent ... unpredictable jsons.
>
> And pattern matching helped me in first solution. (later I switched to
> Pydantic models)
>
> For your example I'd make match rule for key path you need. For example:
>
>
> data = {"users": [{"address": {"street": "Baker"}}]}
>
> match data:
>  case {"users": [{"address": {"street": street}}]}:
>  print(f"street: {street}")
>
>  case _:
>  print("unsupported message structure")

Hi. I think your solution is very brilliant, but I'm a bit allergic to
pattern matching... :D Maybe it's me, but I found it really strange
and "magical".


dict.get_deep()

2022-04-02 Thread Marco Sulla
A proposal. Very often dicts are used as a deeply nested carrier of
data, usually decoded from JSON. Sometimes I need to get some of
this data, something like this:

data["users"][0]["address"]["street"]

What about something like this instead?

data.get_deep("users", 0, "address", "street")

and also, instead of this

try:
    result = data["users"][0]["address"]["street"]
except (KeyError, IndexError):
    result = "second star"

write this:

data.get_deep("users", 0, "address", "street", default="second star")

?
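For illustration, a standalone sketch of the proposed behaviour (a plain
function here, not a dict method; the name and API are only the ones
proposed above):

```python
_sentinel = object()  # distinguishes "no default given" from default=None

def get_deep(obj, *keys, default=_sentinel):
    """Follow keys/indexes into nested containers, like the proposal."""
    for key in keys:
        try:
            obj = obj[key]
        except (KeyError, IndexError):
            if default is _sentinel:
                raise
            return default
    return obj

data = {"users": [{"address": {"street": "Baker"}}]}
assert get_deep(data, "users", 0, "address", "street") == "Baker"
assert get_deep(data, "users", 1, default="second star") == "second star"
```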


Re: Suggestion for Linux Distro (from PSA: Linux vulnerability)

2022-03-31 Thread Marco Sulla
On Thu, 31 Mar 2022 at 18:38, Cecil Westerhof via Python-list
 wrote:
> Most people think that
> Ubuntu is that also, because it is based on Debian. But Ubuntu wants
> also provide the newest versions of software and this will affect the
> stability and security negatively.

I think you're referring to the fact that Ubuntu releases a new stable
version every 6 months, while Debian does every 2 years. This is true,
but Ubuntu also releases an LTS every 2 years. You can install an LTS
and change the options so the system is upgraded only when a new LTS
comes out. Furthermore, you're not forced to upgrade: you can do it
when the LTS reaches its end of life.

On the other hand, you can live on the edge with Debian too. You can
install an unstable branch.

Furthermore, there's the company factor. According to Google, Debian
has about 1k devs, while Ubuntu only about 250. But these devs work
full time on Ubuntu and are paid for it. I'm not sure this is an
unimportant point. For what I know, historically the distros with a
reputation for stability are the ones maintained by companies, Red
Hat and Gentoo for example.

About stability and security, I can't disagree. But I suppose the
people that use the unstable version of some Linux distro are useful
for testing and reporting bugs, security ones included. So they
contribute to the stable versions, and I think we have to be grateful
to these "pioneers".


Re: Temporally disabling buffering

2022-03-31 Thread Marco Sulla
Dirty suggestion: stderr?

On Thu, 31 Mar 2022 at 18:38, Cecil Westerhof via Python-list
 wrote:
>
> In Python when the output of a script is going to a pipe stdout is
> buffered. When sending output to tee that is very inconvenient.
>
> We can set PYTHONUNBUFFERED, but then stdout is always unbuffered.
>
> On Linux we can do:
> PYTHONUNBUFFERED=T script.py | tee script.log
>
> Now the output is only unbuffered for the current run and buffered for
> other runs where the output goes to a pipe.
>
> --
> Cecil Westerhof
> Senior Software Engineer
> LinkedIn: http://www.linkedin.com/in/cecilwesterhof


Re: Suggestion for Linux Distro (from PSA: Linux vulnerability)

2022-03-30 Thread Marco Sulla
On Tue, 29 Mar 2022 at 00:10, Peter J. Holzer  wrote:
> They are are about a year apart, so they will usually contain different
> versions of most packages right from the start. So the Ubuntu and Debian
> security teams probably can't benefit much from each other.

Are you sure? Since the LTS releases of Debian and Ubuntu last 5 years, I
suppose the package versions should overlap at some point.


Re: Best practice for caching hash

2022-03-16 Thread Marco Sulla
On Wed, 16 Mar 2022 at 09:11, Chris Angelico  wrote:
> Caching the hash of a
> string is very useful; caching the hash of a tuple, not so much; again
> quoting from the CPython source code:
>
> /* Tests have shown that it's not worth to cache the hash value, see
>https://bugs.python.org/issue9685 */

This is really interesting. Unluckily I can't use the pyperformance
benchmarks: I should use code that uses frozendict, and I suppose
that's really hard...
Anyway, this discourages me from continuing to cache the unhashable
state, since I should also store the error message. Storing only the
hash when the object is hashable is much cheaper, and maybe the extra
field is not so much a problem, since a dict consumes more space than
a tuple:

>>> sys.getsizeof({})
64
>>> sys.getsizeof(())
40
>>> sys.getsizeof({1:1})
232
>>> sys.getsizeof((1,))
48

> I don't know what use-cases frozendicts have, but I would
> suspect that if they are used at all, they'll often be used in cases
> where their keys are identical (for instance, the __dict__ of an
> immutable object type, where the keys are extremely stable across
> objects of the same type).

Well, I tried to implement them as dicts with shared keys, but I
abandoned it when Inada optimized dict(another_dict), where
another_dict is a compact dict. Since it's compact, you "only" have to
memcpy the entries (an oversimplification).

I tried to do the same trick for the sparse dict structure, but
memcpying the keys and the values was not enough. I had to incref all
the values *twice*, and this slowed down creation a lot. So I decided
to move to the compact structure.


Re: Best practice for caching hash

2022-03-16 Thread Marco Sulla
On Wed, 16 Mar 2022 at 00:59, Chris Angelico  wrote:
>
> (Though it's a little confusing; a frozendict has to have nothing but
> immutable objects, yet it permits them to be unhashable?

It can have mutable objects. For example, a key k can have a list v as
its value. You can modify v, but you can't assign another value w to
the key k. It's the same with tuples, as you said: an index i can
contain a list l. Since it's a tuple, you can't set another object at
index i, but you can modify the list l.
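For illustration, the tuple version of exactly that, including the
consequence for hashing:

```python
t = ([1, 2],)             # immutable container, mutable element
t[0].append(3)            # the contained list can still be mutated
assert t == ([1, 2, 3],)

# ...and because of that element, the container is unhashable:
try:
    hash(t)
    hashable = True
except TypeError:
    hashable = False
assert not hashable
```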


Re: Best practice for caching hash

2022-03-16 Thread Marco Sulla
On Wed, 16 Mar 2022 at 00:42, Cameron Simpson  wrote:
>
> Is it sensible to compute the hash only from the immutable parts?
> Bearing in mind that usually you need an equality function as well and
> it may have the same stability issues.

[...]

> In that case I would be inclined to never raise TypeError at all. I'd
> compute the hash entirely from the keys of the dict and compute equality
> in the normal fashion: identical keys and then equal corresponding
> values. That removes the requirement that values be immutable and/or
> hashable.

Well, I followed PEP 416, so I allowed mutable types for values, as
tuple does. A tuple, too, is hashable only if all its values are
hashable.

The equality function is the same as dict's, with a little
modification: I do not check the hash in equality. I could add that:
if both hashes are cached, differ from -1 and are not equal, False is
returned.

> >In this case I currently cache the value -1. The subsequent calls to
> >__hash__() will check if the value is -1. If so, a TypeError is
> >immediately raised.
>
> This will also make these values behave badly in dicts/sets, as they all
> hash to the same bucket.

Not sure I understand. If the hash is -1, it's not hashable, so it
can't be a member of a dict or set.

> You could, you know, cache the original exception.

I thought about it :) What prevented me is that it's another Py_ssize_t
to store in memory.


Re: Best practice for caching hash

2022-03-15 Thread Marco Sulla
On Sat, 12 Mar 2022 at 22:37, <2qdxy4rzwzuui...@potatochowder.com> wrote:
> Once hashing an object fails, why would an application try again?  I can
> see an application using a hashable value in a hashable situation again
> and again and again (i.e., taking advantage of the cache), but what's
> the use case for *repeatedly* trying to use an unhashable value again
> and again and again (i.e., taking advantage of a cached failure)?

Honestly? Don't know. Maybe because the object is passed to different
functions and all of them independently test the hashability? I'm
clutching at straws.


Re: Suggestion for Linux Distro (from PSA: Linux vulnerability)

2022-03-14 Thread Marco Sulla
On Mon, 14 Mar 2022 at 18:33, Loris Bennett  wrote:
> I am not sure how different the two situations are.  Ubuntu is
> presumably relying on the Debian security team as well as other
> volunteers and at least one company, namely Canonical.

So do you think that Canonical contributes to the LTS security team of
Debian? It could be. In this perspective, there should be little
difference between Debian and Ubuntu. Debian 11 with XFCE is really
tempting...


Best practice for caching hash

2022-03-12 Thread Marco Sulla
I have a custom immutable object, and I added a cache for its hash
value. The problem is the object can be composed of mutable or
immutable objects, so the hash can raise TypeError.

In this case I currently cache the value -1. The subsequent calls to
__hash__() will check if the value is -1. If so, a TypeError is
immediately raised.

The problem is the first time I get an error with details, for example:

TypeError: unhashable type: 'list'

The subsequent times I simply raise a generic error:

TypeError

Ok, I can improve it by raising, for example, TypeError: not all
values are hashable. But do you think this is acceptable? Now I'm
thinking about it and it seems a little hacky to me.

Furthermore, in the C extension I have to define another property in
the struct, ma_hash_calculated, to track if the hash value is cached
or not, since there's no bogus value I can use in cache property,
ma_hash, to signal this. If I don't cache unhashable values, -1 can be
used to signal that ma_hash contains no cached value.

So if I do not cache if the object is unhashable, I save a little
memory per object (1 int) and I get a better error message every time.

On the other hand, if I leave the things as they are, testing the
unhashability of the object multiple times is faster. The code:

try:
    hash(o)
except TypeError:
    pass

execute in nanoseconds, if called more than 1 time, even if o is not
hashable. Not sure if this is a big advantage.
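A minimal pure-Python sketch of the caching strategy described above (the class name and the generic message are hypothetical; the real implementation is in C):

```python
class FrozenDict:
    """Simplified sketch: cache the hash, including the failure case."""

    def __init__(self, d):
        self._d = dict(d)
        self._hash = None  # None = not computed yet

    def __hash__(self):
        if self._hash == -1:
            # A previous attempt failed: raise the generic cached error.
            raise TypeError("not all values are hashable")
        if self._hash is None:
            try:
                self._hash = hash(frozenset(self._d.items()))
            except TypeError:
                self._hash = -1  # cache the failure
                raise            # the first failure keeps the detailed message
        return self._hash
```

The first call raises the detailed "unhashable type: 'list'", subsequent calls raise the generic cached error, and hashable instances pay the hashing cost only once.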

What do you think about? Here is the python code:
https://github.com/Marco-Sulla/python-frozendict/blob/35611f4cd869383678104dc94f82aa636c20eb24/frozendict/src/3_10/frozendictobject.c#L652-L697


Re: Suggestion for Linux Distro (from PSA: Linux vulnerability)

2022-03-11 Thread Marco Sulla
On Fri, 11 Mar 2022 at 19:10, Michael Torrie  wrote:
> Both Debian stable and Ubuntu LTS state they have a five year support
> life cycle.

Yes, but it seems that official security support in Debian ends after
three years:

"Debian LTS is not handled by the Debian security team, but by a
separate group of volunteers and companies interested in making it a
success"
https://wiki.debian.org/LTS

This is the only problem for me.


Re: Suggestion for Linux Distro (from PSA: Linux vulnerability)

2022-03-11 Thread Marco Sulla
On Fri, 11 Mar 2022 at 06:38, Dan Stromberg  wrote:
> That's an attribute of your desktop environment, not the Linux distribution.
>
> EG: I'm using Debian with Cinnamon, which does support ctrl-alt-t.

Never used Cinnamon. It comes from Mint, right?

> Some folks say the desktop environment matters more than the distribution, 
> when choosing what OS to install.

Yes, it's important. I switched from Ubuntu to Xubuntu (then Lubuntu)
when Ubuntu started using Unity. I liked GNOME 2 and KDE prior to
Plasma. They were simple, lightweight and effective. I found these
qualities in XFCE and LXDE.

Anyway I think I'll not install Debian, because its LTS releases are
not long enough for me. I don't know if there's a Debian-based distro
with long LTS support, apart from Ubuntu.


Re: Suggestion for Linux Distro (from PSA: Linux vulnerability)

2022-03-10 Thread Marco Sulla
On Thu, 10 Mar 2022 at 14:13, Jack Dangler  wrote:
> or why not get a cloud desktop running whatever distro you want and you
> don't have to do anything

Three reasons: privacy, speed, price. Not in this order.

On Thu, 10 Mar 2022 at 15:20, Chris Angelico  wrote:
> Very easy. I use Debian with Xfce, and it's an easy thing to add
> shortcuts - even dynamically

I used Xubuntu for a long time. I like Xfce.



On Thu, 10 Mar 2022 at 16:35, Loris Bennett  wrote:
> The shortcuts are properties of the desktop environment.  You could just
> install LXDE/LXQt on Debian if that's what you're used to from Lubuntu.

I tried LXQt on my desktop. Very disappointed: the OS update interface
is just an "alert". LXDE, unfortunately, is no longer developed.

> Of course, if you're too old and lazy to set up a shortcut, you might
> also be too old and lazy to install a different desktop environment ;-)

Okay, I'm lazy for boring things :D

PS: Is it just my impression or is there a plebiscite for Debian?


Suggestion for Linux Distro (from PSA: Linux vulnerability)

2022-03-10 Thread Marco Sulla
On Thu, 10 Mar 2022 at 04:50, Michael Torrie  wrote:
>
> On 3/9/22 13:05, Marco Sulla wrote:
> > So my laziness pays. I use only LTS distros, and I update only when
> > there are security updates.
> > PS: any suggestions for a new LTS distro? My Lubuntu is reaching its
> > end-of-life. I prefer lightweight debian-like distros.
>
> Maybe Debian itself?

I tried Debian on a VM, but I found it too basic. A little
example: it does not have the Ctrl+Alt+T shortcut to open a terminal
that Ubuntu has. I'm quite sure it's simple to add, but I'm starting
to get old and lazy...


Re: Could frozendict or frozenmap be of some use for PEP 683 (Immortal objects)?

2022-03-10 Thread Marco Sulla
On Wed, 9 Mar 2022 at 23:28, Martin Di Paola  wrote:
> Think in the immutable strings (str). What would happen with a program
> that does heavy parsing? I imagine that it will generate thousands of
> little strings. If those are immortal, the program will fill its memory
> very quickly as the GC will not reclaim their memory.

Well, as far as I know immortality was also suggested for interned
strings. If I understood well, the problem with "normal" strings is
that they are not really immutable in CPython. They have cache etc.
Also frozendict caches hash, but that cache can be easily removed.


Could frozendict or frozenmap be of some use for PEP 683 (Immortal objects)?

2022-03-09 Thread Marco Sulla
As title. dict can't be an immortal object, but hashable frozendict
and frozenmap can. I think this can increase their usefulness.

Another advantage: frozen dataclasses would be really immutable if
they could use a frozen(dict|map) instead of a dict as __dict__.


Re: PSA: Linux vulnerability

2022-03-09 Thread Marco Sulla
So my laziness pays. I use only LTS distros, and I update only when
there are security updates.
PS: any suggestions for a new LTS distro? My Lubuntu is reaching its
end-of-life. I prefer lightweight debian-like distros.

On Tue, 8 Mar 2022 at 19:56, Ethan Furman  wrote:
>
> https://arstechnica.com/information-technology/2022/03/linux-has-been-bitten-by-its-most-high-severity-vulnerability-in-years/
>
> --
> ~Ethan~


Re: Cpython: when to incref before insertdict

2022-03-06 Thread Marco Sulla
On Sun, 6 Mar 2022 at 03:20, Inada Naoki  wrote:
> In general, when reference is borrowed from a caller, the reference is
> available during the API.
> But merge_dict borrows reference of key/value from other dict, not caller.
> [...]
> Again, insertdict takes the reference. So _PyDict_FromKeys() **does**
> INCREF before calling insertdict, when key/value is borrowed
> reference.
> https://github.com/python/cpython/blob/6927632492cbad86a250aa006c1847e03b03e70b/Objects/dictobject.c#L2287-L2290
> https://github.com/python/cpython/blob/6927632492cbad86a250aa006c1847e03b03e70b/Objects/dictobject.c#L2309-L2311
>
> On the other hand, slow path uses PyIter_Next() which returns strong
> reference. So no need to INCREF it.

Thank you Inada, these points make me things clear now.
(PS: dictobject will change a lot in 3.11... sigh :D)


Re: virtualenv and make DESTDIR=

2022-03-05 Thread Marco Sulla
On Sat, 5 Mar 2022 at 17:36, Barry Scott  wrote:
> Note: you usually cannot use pip when building an RPM with mock as the 
> network is disabled inside the build for
> security reasons.

Couldn't he download the packages beforehand and run pip on the local packages?


Cpython: when to incref before insertdict

2022-03-05 Thread Marco Sulla
I noticed that some functions inside dictobject.c that call insertdict
or PyDict_SetItem do an incref of key and value before the call, and a
decref after it. An example is dict_merge. Other functions, such as
_PyDict_FromKeys, don't do an incref before.

When an incref of key and value is needed before insertdict and when
is not? And why is an incref needed?
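The ownership rules themselves live in C, but their effect is observable from Python: a dict owns one strong reference to each key and value it stores, which is exactly the reference insertdict "consumes". A small CPython-specific sketch (sys.getrefcount values are an implementation detail):

```python
import sys

value = object()
base = sys.getrefcount(value)   # includes the temporary ref of the call itself

d = {"k": value}                # the dict takes its own strong reference
assert sys.getrefcount(value) == base + 1

del d                           # the dict releases it again
assert sys.getrefcount(value) == base
```

So a caller holding only a *borrowed* reference (as dict_merge does, borrowing from another dict) must INCREF before the insert, while a caller holding a fresh *strong* reference (e.g. from PyIter_Next) can let insertdict consume it directly.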


Re: Error installing requirements

2022-02-19 Thread Marco Sulla
Maybe you compiled Python 2.7 by hand, David? It happened to me when I
tried to compile Python without zlib headers installed on my OS. Don't
know how it can be done on Windows.


Re: How to solve the given problem?

2022-02-10 Thread Marco Sulla
Narshad, I propose you post your questions to StackOverflow. I'm sure
they will be very happy.


Re: Global VS Local Subroutines

2022-02-10 Thread Marco Sulla
I agree with Chris. I don't know if it was already written: if you
want a local function for speed reasons, you can use the classic
approach of a main function.
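A sketch of that classic pattern, where the helper lives inside main() instead of the module namespace (local name lookups inside a function are faster than global ones in CPython):

```python
def main():
    # A "local" helper: visible only inside main, and looked up as a
    # fast local name instead of a global.
    def helper(x):
        return x * 2

    return sum(helper(i) for i in range(5))

if __name__ == "__main__":
    print(main())
```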


Re: How do you log in your projects?

2022-02-10 Thread Marco Sulla
On Wed, 9 Feb 2022 at 20:40, Martin Di Paola  wrote:
>
> If the logs are meant to be read by my users I log high level messages,
> specially before parts that can take a while (like the classic
> "Loading...").

? Logs are not intended to be read by end users. Logs are primarily
used to understand what the code is doing in a production environment.
They could also be used to gather metrics data.

Why should you log to give a message instead of simply using a print?

> For exceptions I print the message but not the traceback.

Why? Traceback is vital to understand what and where the problem is. I
think you're confusing logs with messages. The stack trace can be
logged (I would say must), but the end user generally sees a vague
message with some hints, unless the program is used internally only.


Re: How do you log in your projects?

2022-02-08 Thread Marco Sulla
These are a lot of questions. I hope we're not off topic.
I don't know if mine are best practices. I can tell what I try to do.

On Tue, 8 Feb 2022 at 15:15, Lars Liedtke  wrote:
> - On a line per line basis? on a function/method basis?

I usually log the start and end of functions. I could also log inside
a branch or in other parts of the function/method.

> - Do you use decorators to mark beginnings and ends of methods/functions
> in log files?

No, since I put the function parameters in the first log. But I think
that such a decorator is not a bad idea.

> - Which kind of variable contents do you write into your logfiles? Of
> course you shouldn't leak secrets...

Well, all the data that is useful to understand what the code is
doing. It's better to repeat, in all the logs of a function, the
essential data that identifies a specific call, so that if the function
is called simultaneously by several clients you can distinguish them.

> - How do you decide, which kind of log message goes into which level?

It depends on the importance, the verbosity and the occurrences of the logs.

> - How do you prevent logging cluttering your actual code?

I have the opposite problem, I should log more. So I can't answer your question.
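For what it's worth, the start/end decorator mentioned above can be sketched like this (format and level are arbitrary choices):

```python
import functools
import logging

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")
logger = logging.getLogger(__name__)

def logged(fn):
    """Log entry (with arguments) and exit of the decorated function."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        logger.debug("%s start args=%r kwargs=%r", fn.__name__, args, kwargs)
        result = fn(*args, **kwargs)
        logger.debug("%s end result=%r", fn.__name__, result)
        return result
    return wrapper

@logged
def add(a, b):
    return a + b

add(1, 2)
```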


Re: Waht do you think about my repeated_timer class

2022-02-02 Thread Marco Sulla
You could add a __del__ that calls stop :)

On Wed, 2 Feb 2022 at 21:23, Cecil Westerhof via Python-list
 wrote:
>
> I need (sometimes) to repeatedly execute a function. For this I wrote
> the below class. What do you think about it?
> from threading  import Timer
>
>
>
> class repeated_timer(object):
> def __init__(self, fn, interval, start = False):
> if not callable(fn):
> raise TypeError('{} is not a function'.format(fn))
> self._fn = fn
> self._check_interval(interval)
> self._interval   = interval
> self._timer  = None
> self._is_running = False
> if start:
> self.start()
>
> def _check_interval(self, interval):
> if not type(interval) in [int, float]:
> raise TypeError('{} is not numeric'.format(interval))
> if interval <= 0:
> raise ValueError('{} is not greater as 0'.format(interval))
>
> def _next(self):
> self._timer = Timer(self._interval, self._run)
> self._timer.start()
>
> def _run(self):
> self._next()
> self._fn()
>
> def set_interval(self, interval):
> self._check_interval(interval)
> self._interval = interval
>
> def start(self):
> if not self._is_running:
> self._next()
> self._is_running = True
>
> def stop(self):
> if self._is_running:
> self._timer.cancel()
> self._timer  = None
> self._is_running = False
>
> --
> Cecil Westerhof
> Senior Software Engineer
> LinkedIn: http://www.linkedin.com/in/cecilwesterhof
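A trimmed-down sketch of the same idea with the suggested __del__ added (class renamed RepeatedTimer; the interval checks are omitted for brevity):

```python
from threading import Timer

class RepeatedTimer:
    """Minimal sketch: call fn every `interval` seconds until stopped."""

    def __init__(self, fn, interval):
        self._fn = fn
        self._interval = interval
        self._timer = None
        self._is_running = False

    def _run(self):
        if not self._is_running:
            return
        # Reschedule first, then run the callback.
        self._timer = Timer(self._interval, self._run)
        self._timer.start()
        self._fn()

    def start(self):
        if not self._is_running:
            self._is_running = True
            self._timer = Timer(self._interval, self._run)
            self._timer.start()

    def stop(self):
        if self._is_running:
            self._is_running = False
            self._timer.cancel()
            self._timer = None

    def __del__(self):
        # Without this, a dropped-but-running instance would keep
        # scheduling callbacks forever.
        self.stop()
```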


Re: Why dict.setdefault() has value as optional?

2022-02-02 Thread Marco Sulla
On Wed, 2 Feb 2022 at 14:34, Lars Liedtke  wrote:
>
> This is a quite philosophical queston if you look at it in general:
> "What value do you give a variable, that is not set?"

Maybe I expressed my question badly. My existential doubt is why
setdefault has an optional parameter for the value and not a required
parameter. I'm not asking why the default is None.


Why dict.setdefault() has value as optional?

2022-02-02 Thread Marco Sulla
Just out of curiosity: why dict.setdefault() has the default parameter
that well, has a default value (None)? I used setdefault in the past,
but I always specified a value. What's the use case of setting None by
default?
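The two forms side by side; the value-less call inserts None, which is rarely what you want:

```python
d = {}

# With an explicit default: inserts and returns it when the key is missing.
assert d.setdefault("a", []) == []
d["a"].append(1)
assert d == {"a": [1]}

# Without a default: None is inserted.
assert d.setdefault("b") is None
assert d == {"a": [1], "b": None}

# When the key exists, the stored value wins and the default is ignored.
assert d.setdefault("a", "ignored") == [1]
```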


Re: Segfault after deepcopy in a C extension

2022-01-31 Thread Marco Sulla
and I had to Py_INCREF(memo)! Thank you A LOT!

On Mon, 31 Jan 2022 at 23:01, Chris Angelico  wrote:

> On Tue, 1 Feb 2022 at 08:54, Marco Sulla 
> wrote:
> > PyObject* d = PyDict_New();
> > args = PyTuple_New(2);
> > PyTuple_SET_ITEM(args, 0, d);
> > PyTuple_SET_ITEM(args, 1, memo);
> > Py_DECREF(d);
> >
>
> https://docs.python.org/3/c-api/tuple.html#c.PyTuple_SET_ITEM
>
> SET_ITEM steals a reference, so you'll need to not also decref the
> dict yourself.
>
> ChrisA
>


Segfault after deepcopy in a C extension

2022-01-31 Thread Marco Sulla
Well, this is more or less what I'm trying to do.

I have an immutable object. I would like copy.deepcopy() to return the
object itself if it's hashable. If not, it must return a deepcopy of it.

So I tried to implement a __deepcopy__ for the object. It segfaults if the
object is not hashable. And I don't understand why. gdb gives me an
incomprehensible backtrace. So I tried the old "print at each line", but
the segfault does not happen in the function. It happens when I quit REPL
or if I try to see the deepcopy.

What can I do to further debug it?

If someone is interested, this is the code:

PyObject* frozendict_deepcopy(PyObject* self, PyObject* memo) {
if (PyAnyFrozenDict_CheckExact(self)) {
frozendict_hash(self);

if (PyErr_Occurred()) {
PyErr_Clear();
}
else {
Py_INCREF(self);
return self;
}
}

if (! PyAnyFrozenDict_Check(self)) {
Py_RETURN_NOTIMPLEMENTED;
}

PyObject* d = PyDict_New();

if (d == NULL) {
return NULL;
}

PyObject* copy_module_name = NULL;
PyObject* copy_module = NULL;
PyObject* deepcopy_fun = NULL;
PyObject* args = NULL;
PyObject* res = NULL;

if (PyDict_Merge(d, self, 1)) {
goto end;
}
copy_module_name = PyUnicode_FromString("copy");

if (copy_module_name == NULL) {
goto end;
}

copy_module = PyImport_Import(copy_module_name);

if (copy_module == NULL) {
goto end;
}

deepcopy_fun = PyObject_GetAttrString(copy_module, "deepcopy");

if (deepcopy_fun == NULL) {
goto end;
}

args = PyTuple_New(2);

if (args == NULL) {
goto end;
}

PyTuple_SET_ITEM(args, 0, d);
PyTuple_SET_ITEM(args, 1, memo);

res = PyObject_CallObject(deepcopy_fun, args);

end:
Py_XDECREF(args);
Py_XDECREF(deepcopy_fun);
Py_XDECREF(copy_module);
Py_XDECREF(copy_module_name);
Py_DECREF(d);

return res;
}


Re: Pandas or Numpy

2022-01-26 Thread Marco Sulla
On Mon, 24 Jan 2022 at 05:37, Dennis Lee Bieber  wrote:
> Note that the comparison warns that /indexing/ in pandas can be slow.
> If your manipulation is always "apply operationX to columnY" it should be
> okay -- but "apply operationX to the nth row of columnY", and repeat for
> other rows, is going to be slow.

In my small way, I can confirm. In one of my previous jobs, we used
numpy and Pandas. Writing the code in Pandas is quick, but the team
realised it was really slow, and they tried to convert as much Pandas
code to numpy code as possible.

Furthermore, I saw that they were so accustomed to Pandas that they
used it for everything, even for simple csv creation, when the csv
builtin module is enough.
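For that last point, a sketch of what the stdlib alone can do; no DataFrame needed for a simple dump:

```python
import csv
import io

rows = [("name", "value"), ("a", 1), ("b", 2)]

# io.StringIO stands in for a real file here.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerows(rows)

print(buf.getvalue())
```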


Re: Why operations between dict views return a set and not a frozenset?

2022-01-16 Thread Marco Sulla
Thank you a lot for letting me understand :)

On Tue, 11 Jan 2022 at 22:09, Peter J. Holzer  wrote:

> On 2022-01-11 19:49:20 +0100, Marco Sulla wrote:
> > I think this is what you mean:
> >
> > >>> dis.dis("for _ in {1, 2}: pass")
> >   1   0 SETUP_LOOP  12 (to 14)
> >   2 LOAD_CONST   3 (frozenset({1, 2}))
> >   4 GET_ITER
> > >>6 FOR_ITER 4 (to 12)
> >   8 STORE_NAME   0 (_)
> >  10 JUMP_ABSOLUTE6
> > >>   12 POP_BLOCK
> > >>   14 LOAD_CONST   2 (None)
> >  16 RETURN_VALUE
> > >>> a = {1, 2}
> > >>> dis.dis("for _ in a: pass")
> >   1   0 SETUP_LOOP  12 (to 14)
> >   2 LOAD_NAME0 (a)
> >   4 GET_ITER
> > >>6 FOR_ITER 4 (to 12)
> >   8 STORE_NAME   1 (_)
> >  10 JUMP_ABSOLUTE6
> > >>   12 POP_BLOCK
> > >>   14 LOAD_CONST   0 (None)
> >  16 RETURN_VALUE
>
> I think you have omitted the part that Chris was hinting at.
>
> >>> dis.dis("a = {1, 2};\nfor _ in a: pass")
>   1   0 LOAD_CONST   0 (1)
>   2 LOAD_CONST   1 (2)
>   4 BUILD_SET2
>   6 STORE_NAME   0 (a)
>
>   2   8 LOAD_NAME0 (a)
>  10 GET_ITER
> >>   12 FOR_ITER 4 (to 18)
>  14 STORE_NAME   1 (_)
>  16 JUMP_ABSOLUTE   12
> >>   18 LOAD_CONST   2 (None)
>  20 RETURN_VALUE
>
> Now compare
>
>   2 LOAD_CONST   3 (frozenset({1, 2}))
>
> with
>
>   1   0 LOAD_CONST   0 (1)
>   2 LOAD_CONST   1 (2)
>   4 BUILD_SET2
>
> and you see the difference between using a frozenset as a constant and
> building a set at runtime.
>
> hp
>
> --
>_  | Peter J. Holzer| Story must make more sense than reality.
> |_|_) ||
> | |   | h...@hjp.at |-- Charles Stross, "Creative writing
> __/   | http://www.hjp.at/ |   challenge!"
>
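The difference can also be observed and measured directly; a sketch (CPython-specific: the compiler turns a set literal used in a membership test into a frozenset constant):

```python
import timeit

def literal(x):
    return x in {1, 2, 3}        # compiled to a frozenset constant

def built(x):
    return x in set((1, 2, 3))   # builds a fresh set on every call

# The literal version carries a ready-made frozenset in co_consts.
assert any(isinstance(c, frozenset) for c in literal.__code__.co_consts)

print(timeit.timeit("literal(2)", globals=globals(), number=100_000))
print(timeit.timeit("built(2)", globals=globals(), number=100_000))
```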


Doc or example about conda custom build?

2022-01-16 Thread Marco Sulla
Sorry for being maybe a little OT. I tried to get help from other Conda
users, from chat and from the mailing list without success.

I would like to add a custom build to my conda package. Is there a doc
or an example about it somewhere?

(Specifically, I want to pass a custom parameter to the setup.py that lets
me package only the pure py version of the code.)


Re: Pickle segfaults with custom type

2022-01-15 Thread Marco Sulla
Found. I simply forgot:


if (PyType_Ready(_Type) < 0) {
goto fail;
}

in the frozendict_exec function for the module.

On Fri, 7 Jan 2022 at 20:27, Marco Sulla 
wrote:

> I have a custom implementation of dict using a C extension. All works but
> the pickling of views and iter types. Python segfaults if I try to pickle
> them.
>
> For example, I have:
>
>
> static PyTypeObject PyFrozenDictIterKey_Type = {
> PyVarObject_HEAD_INIT(NULL, 0)
> "frozendict.keyiterator",   /* tp_name */
> sizeof(dictiterobject), /* tp_basicsize */
> 0,  /* tp_itemsize */
> /* methods */
> (destructor)dictiter_dealloc,   /* tp_dealloc */
> 0,  /* tp_vectorcall_offset */
> 0,  /* tp_getattr */
> 0,  /* tp_setattr */
> 0,  /* tp_as_async */
> 0,  /* tp_repr */
> 0,  /* tp_as_number */
> 0,  /* tp_as_sequence */
> 0,  /* tp_as_mapping */
> PyObject_HashNotImplemented,/* tp_hash */
> 0,  /* tp_call */
> 0,  /* tp_str */
> PyObject_GenericGetAttr,/* tp_getattro */
> 0,  /* tp_setattro */
> 0,  /* tp_as_buffer */
> Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC,/* tp_flags */
> 0,  /* tp_doc */
> (traverseproc)dictiter_traverse,/* tp_traverse */
> 0,  /* tp_clear */
> 0,  /* tp_richcompare */
> 0,  /* tp_weaklistoffset */
> PyObject_SelfIter,  /* tp_iter */
> (iternextfunc)frozendictiter_iternextkey,   /* tp_iternext */
> dictiter_methods,   /* tp_methods */
> 0,
> };
>
> This is the backtrace I get with gdb:
>
> #0  PyObject_Hash (v=0x7f043ce15540 ) at
> ../cpython_3_10/Objects/object.c:788
> #1  0x0048611c in PyDict_GetItemWithError (op=0x7f043e1f4900,
> key=key@entry=0x7f043ce15540 )
> at ../cpython_3_10/Objects/dictobject.c:1520
> #2  0x7f043ce227f6 in save (self=self@entry=0x7f043d8507d0,
> obj=obj@entry=0x7f043e1fb0b0, pers_save=pers_save@entry=0)
> at /home/marco/sources/cpython_3_10/Modules/_pickle.c:4381
> #3  0x7f043ce2534d in dump (self=self@entry=0x7f043d8507d0,
> obj=obj@entry=0x7f043e1fb0b0) at
> /home/marco/sources/cpython_3_10/Modules/_pickle.c:4515
> #4  0x7f043ce2567f in _pickle_dumps_impl (module=,
> buffer_callback=, fix_imports=,
> protocol=,
> obj=0x7f043e1fb0b0) at
> /home/marco/sources/cpython_3_10/Modules/_pickle.c:1203
> #5  _pickle_dumps (module=, args=,
> nargs=, kwnames=)
> at /home/marco/sources/cpython_3_10/Modules/clinic/_pickle.c.h:619
>
> and so on. The problematic part is in the second frame. Indeed the code of
> _pickle.c here is:
>
>
> reduce_func = PyDict_GetItemWithError(st->dispatch_table,
>   (PyObject *)type);
>
> The problem is that type is NULL. It tries to get the attribute tp_hash
> and it segfaults.
>
> I tried to change the header of the type to:
>
> PyVarObject_HEAD_INIT(_Type, 0)
>
> This way it works but, as known, it does not compile on Windows.
>
> The strange fact is that pickling the main type works, even if the type is
> NULL, as suggested for a custom type. This is the main type:
>
> PyTypeObject PyFrozenDict_Type = {
> PyVarObject_HEAD_INIT(NULL, 0)
> "frozendict." FROZENDICT_CLASS_NAME,/* tp_name */
> sizeof(PyFrozenDictObject), /* tp_basicsize */
> 0,  /* tp_itemsize */
> (destructor)dict_dealloc,   /* tp_dealloc */
> 0,  /* tp_vectorcall_offset */
> 0,  /* tp_getattr */
> 0,  /* tp_setattr */
> 0,  /* tp_as_async */
> (reprfunc)frozendict_repr,  /* tp_repr */
> _as_number,  /* tp_as_number */
> _as_sequence,  

Re: Why operations between dict views return a set and not a frozenset?

2022-01-11 Thread Marco Sulla
Ok... so I suppose, since you're inviting me to use dis and look at the
bytecode, that you're talking about constants in assembly, like const in
C? Sorry for the confusion: I'm not so skilled in C and I know nearly
nothing about assembly. Furthermore I had never looked at the bytecode
of any language before, so I simply didn't understand you.

I think this is what you mean:

>>> dis.dis("for _ in {1, 2}: pass")
  1   0 SETUP_LOOP  12 (to 14)
  2 LOAD_CONST   3 (frozenset({1, 2}))
  4 GET_ITER
>>6 FOR_ITER 4 (to 12)
  8 STORE_NAME   0 (_)
 10 JUMP_ABSOLUTE6
>>   12 POP_BLOCK
>>   14 LOAD_CONST   2 (None)
 16 RETURN_VALUE
>>> a = {1, 2}
>>> dis.dis("for _ in a: pass")
  1   0 SETUP_LOOP  12 (to 14)
  2 LOAD_NAME0 (a)
  4 GET_ITER
>>6 FOR_ITER 4 (to 12)
  8 STORE_NAME   1 (_)
 10 JUMP_ABSOLUTE6
>>   12 POP_BLOCK
>>   14 LOAD_CONST   0 (None)
 16 RETURN_VALUE


On Tue, 11 Jan 2022 at 01:05, Chris Angelico  wrote:

> On Tue, Jan 11, 2022 at 10:26 AM Marco Sulla
>  wrote:
> >
> > On Wed, 5 Jan 2022 at 23:02, Chris Angelico  wrote:
> > >
> > > On Thu, Jan 6, 2022 at 8:01 AM Marco Sulla <
> marco.sulla.pyt...@gmail.com> wrote:
> > > >
> > > > On Wed, 5 Jan 2022 at 14:16, Chris Angelico 
> wrote:
> > > > > That's an entirely invisible optimization, but it's more than just
> > > > > "frozenset is faster than set". It's that a frozenset or tuple can
> be
> > > > > stored as a function's constants, which is a massive difference.
> > > >
> > > > Can you explain this?
> > >
> > > Play around with dis.dis and timeit.
> >
> > ? I don't understand. You're talking about function constants. What
> > are they? I can't dig deep into something if I can't know what it is.
> > Maybe are you talking about function default values for parameters?
>
> No, I'm talking about constants. Every function has them.
>
> > Of course. You can use a proxy and slow down almost everything much
> > more. Or you can simply create a version of the mutable object with
> > fewer methods, as more or less frozenset is. I checked the
> > implementation, no fast iteration is implemented. I do not understand
> > why in `for x in {1, 2, 3}` the set is substituted by a frozenset.
>
> Constants. Like I said, play around with dis.dis, and explore what's
> already happening. A set can't be a constant, a frozenset can be.
> Constants are way faster than building from scratch.
>
> Explore. Play around. I'm not going to try to explain everything in detail.
>
> If you're delving into the details of the C implementation of the
> dictionary, I would have expected you'd already be familiar with the
> way that functions behave.
>
> ChrisA
>


Re: Why operations between dict views return a set and not a frozenset?

2022-01-10 Thread Marco Sulla
On Wed, 5 Jan 2022 at 23:02, Chris Angelico  wrote:
>
> On Thu, Jan 6, 2022 at 8:01 AM Marco Sulla  
> wrote:
> >
> > On Wed, 5 Jan 2022 at 14:16, Chris Angelico  wrote:
> > > That's an entirely invisible optimization, but it's more than just
> > > "frozenset is faster than set". It's that a frozenset or tuple can be
> > > stored as a function's constants, which is a massive difference.
> >
> > Can you explain this?
>
> Play around with dis.dis and timeit.

? I don't understand. You're talking about function constants. What
are they? I can't dig deep into something if I can't know what it is.
Maybe you're talking about function default values for parameters?

> > > Function positional arguments aren't interchangeable, so it makes
> > > sense to have them as a tuple.
> >
> > You are wrong, since kwarg is a dict. Indeed I proposed to use
> > frozendict for kwargs, and Guido said that it's a pity that this will
> > break a lot of existing Python code :D, since the fact that args is
> > _immutable_ and kwargs not always bothered him.
>
> Excuse me? I mentioned kwargs in the part that you removed from the
> quote, and the part you're quoting explicitly says "positional
> arguments".

Ok, I quote also the other part:

> (Function *keyword* arguments, on the other hand, are different; as
> long as the mapping from keys to values is maintained, you can remove
> some of them and pass the rest on, without fundamentally changing
> their meaning.)

First of all, I repeat, Guido said (more or less) that in a perfect
world, kwargs are immutable. Or maybe I did not understand what he
said, maybe he said that in a perfect world args are also mutable. But
I suppose the first hypothesis is more probable :D

Secondly, you can also get the args from a function, transform it in a
list, change something and pass it unpacked to another function. You
will not change the meaning of the tuple, since, well, you copied it
in another mutable object. The original object is untouched.

I perfectly agree that, in the majority of cases, returning an
immutable vs a mutable object is a matter of... sense? Meaning? Ok, I
perfectly agree. But IMHO there are many cases in which immutable
objects are used for speed, and I bet that args is one of them.

> > Anyway, I'm starting to think that neither set nor frozenset are good
> > for dict items:
> >
> > (venv_3_10) marco@buzz:~$ python
> > Python 3.10.0 (heads/3.10-dirty:f6e8b80d20, Nov 18 2021, 19:16:18)
> > [GCC 10.1.1 20200718] on linux
> > Type "help", "copyright", "credits" or "license" for more information.
> > >>> a = {1: 2}
> > >>> b = {3: []}
> > >>> a | b
> > {1: 2, 3: []}
> > >>> a.items() | b.items()
> > Traceback (most recent call last):
> >   File "", line 1, in 
> > TypeError: unhashable type: 'list'
> > >>>
>
> Well yes. Only dict keys can be considered to be set-like.

This is not true. Since at least Python 3.6, and I think even before,
almost the full Set API has been available on both keys and items
views.
Items are indeed a sort of set, in the mathematical sense, since any
pair (key, value) is unique, even if the value is mutable.

> I don't
> know WHAT you think you're trying to do here, but if you ever thought
> of set operations on dict values, you may want to completely rethink
> what you're doing.

set ops on values? Never said that :) I said that currently you can
operate on item views with set operators. This is a fact.

I also said that, since py sets accept only hashable objects, maybe
another ad-hoc object should be used for the result of the items
operations.
But maybe the change isn't worth the additional trouble. Indeed I
didn't know about the new set methods and operations on dict views
until I explored dictobject.c

> Performance is not an automatic result of immutability. That simply
> isn't how it works.

Of course. You can use a proxy and slow down almost everything much
more. Or you can simply create a version of the mutable object with
fewer methods, as frozenset more or less is. I checked the
implementation: no fast iteration is implemented. I do not understand
why in `for x in {1, 2, 3}` the set is substituted by a frozenset.


Pickle segfaults with custom type

2022-01-07 Thread Marco Sulla
I have a custom implementation of dict using a C extension. Everything
works except the pickling of the view and iterator types. Python
segfaults if I try to pickle them.

For example, I have:


static PyTypeObject PyFrozenDictIterKey_Type = {
PyVarObject_HEAD_INIT(NULL, 0)
"frozendict.keyiterator",   /* tp_name */
sizeof(dictiterobject), /* tp_basicsize */
0,  /* tp_itemsize */
/* methods */
(destructor)dictiter_dealloc,   /* tp_dealloc */
0,  /* tp_vectorcall_offset */
0,  /* tp_getattr */
0,  /* tp_setattr */
0,  /* tp_as_async */
0,  /* tp_repr */
0,  /* tp_as_number */
0,  /* tp_as_sequence */
0,  /* tp_as_mapping */
PyObject_HashNotImplemented,/* tp_hash */
0,  /* tp_call */
0,  /* tp_str */
PyObject_GenericGetAttr,/* tp_getattro */
0,  /* tp_setattro */
0,  /* tp_as_buffer */
Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC,/* tp_flags */
0,  /* tp_doc */
(traverseproc)dictiter_traverse,/* tp_traverse */
0,  /* tp_clear */
0,  /* tp_richcompare */
0,  /* tp_weaklistoffset */
PyObject_SelfIter,  /* tp_iter */
(iternextfunc)frozendictiter_iternextkey,   /* tp_iternext */
dictiter_methods,   /* tp_methods */
0,
};

This is the backtrace I get with gdb:

#0  PyObject_Hash (v=0x7f043ce15540 ) at
../cpython_3_10/Objects/object.c:788
#1  0x0048611c in PyDict_GetItemWithError (op=0x7f043e1f4900,
key=key@entry=0x7f043ce15540 )
at ../cpython_3_10/Objects/dictobject.c:1520
#2  0x7f043ce227f6 in save (self=self@entry=0x7f043d8507d0,
obj=obj@entry=0x7f043e1fb0b0, pers_save=pers_save@entry=0)
at /home/marco/sources/cpython_3_10/Modules/_pickle.c:4381
#3  0x7f043ce2534d in dump (self=self@entry=0x7f043d8507d0,
obj=obj@entry=0x7f043e1fb0b0) at
/home/marco/sources/cpython_3_10/Modules/_pickle.c:4515
#4  0x7f043ce2567f in _pickle_dumps_impl (module=<optimized out>,
buffer_callback=<optimized out>, fix_imports=<optimized out>,
protocol=<optimized out>,
obj=0x7f043e1fb0b0) at
/home/marco/sources/cpython_3_10/Modules/_pickle.c:1203
#5  _pickle_dumps (module=<optimized out>, args=<optimized out>,
nargs=<optimized out>, kwnames=<optimized out>)
at /home/marco/sources/cpython_3_10/Modules/clinic/_pickle.c.h:619

and so on. The problematic part is in the second frame. Indeed the code of
_pickle.c here is:


reduce_func = PyDict_GetItemWithError(st->dispatch_table,
  (PyObject *)type);

The problem is that type is NULL. It tries to get the attribute tp_hash and
it segfaults.
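(Aside: at the Python level, the protocol pickle wants from an iterator is just a `__reduce__` that returns a callable plus its arguments. A pure-Python sketch with illustrative names, independent of the NULL-type problem:)

```python
import pickle

class KeyIter:
    # Minimal picklable iterator: __reduce__ tells pickle how to
    # rebuild it (constructor + arguments), preserving the position.
    def __init__(self, keys, pos=0):
        self._keys = list(keys)
        self._pos = pos

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._keys):
            raise StopIteration
        key = self._keys[self._pos]
        self._pos += 1
        return key

    def __reduce__(self):
        return (KeyIter, (self._keys, self._pos))

it = KeyIter(["a", "b", "c"])
next(it)  # advance once
clone = pickle.loads(pickle.dumps(it))
print(list(clone))  # ['b', 'c'] -- the position survived pickling
```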

I tried to change the header of the type to:

PyVarObject_HEAD_INIT(&PyType_Type, 0)

This way it works but, as known, it does not compile on Windows.

The strange fact is that pickling the main type works, even if the type is
NULL, as suggested for a custom type. This is the main type:

PyTypeObject PyFrozenDict_Type = {
PyVarObject_HEAD_INIT(NULL, 0)
"frozendict." FROZENDICT_CLASS_NAME,/* tp_name */
sizeof(PyFrozenDictObject), /* tp_basicsize */
0,  /* tp_itemsize */
(destructor)dict_dealloc,   /* tp_dealloc */
0,  /* tp_vectorcall_offset */
0,  /* tp_getattr */
0,  /* tp_setattr */
0,  /* tp_as_async */
(reprfunc)frozendict_repr,  /* tp_repr */
&frozendict_as_number,  /* tp_as_number */
&frozendict_as_sequence,/* tp_as_sequence */
&frozendict_as_mapping, /* tp_as_mapping */
(hashfunc)frozendict_hash,  /* tp_hash */
0,  /* tp_call */
0,  /* tp_str */
PyObject_GenericGetAttr,/* tp_getattro */
0,  /* tp_setattro */
0,  /* tp_as_buffer */
Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC
| Py_TPFLAGS_BASETYPE
| _Py_TPFLAGS_MATCH_SELF
| Py_TPFLAGS_MAPPING,   /* tp_flags */
frozendict_doc,  

Re: Why operations between dict views return a set and not a frozenset?

2022-01-05 Thread Marco Sulla
On Wed, 5 Jan 2022 at 14:16, Chris Angelico  wrote:
> That's an entirely invisible optimization, but it's more than just
> "frozenset is faster than set". It's that a frozenset or tuple can be
> stored as a function's constants, which is a massive difference.

Can you explain this?

> In fact, the two data types are virtually identical in performance once 
> created [...]

This is really strange, since in theory frozenset should not have to
check, on every cycle, whether it has been mutated during iteration.
So the speed should be noticeably faster. Maybe frozenset was not
optimised because the use case is quite narrow and it would add
potentially useless C code? Furthermore, the more code, the more
memory consumption and the less speed. I have to check setobject.c.

> Function positional arguments aren't interchangeable, so it makes
> sense to have them as a tuple.

You are wrong, since kwargs is a dict. Indeed I proposed to use
frozendict for kwargs, and Guido said that it's a pity that this would
break a lot of existing Python code :D, since the fact that args is
_immutable_ while kwargs is not has always bothered him.
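That is easy to verify:

```python
def f(*args, **kwargs):
    # Positional arguments arrive as an immutable tuple,
    # keyword arguments as a mutable dict.
    return type(args).__name__, type(kwargs).__name__

print(f(1, 2, a=3))  # ('tuple', 'dict')
```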

Anyway, I'm starting to think that neither set nor frozenset are good
for dict items:

(venv_3_10) marco@buzz:~$ python
Python 3.10.0 (heads/3.10-dirty:f6e8b80d20, Nov 18 2021, 19:16:18)
[GCC 10.1.1 20200718] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> a = {1: 2}
>>> b = {3: []}
>>> a | b
{1: 2, 3: []}
>>> a.items() | b.items()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
>>>


Re: Why operations between dict views return a set and not a frozenset?

2022-01-05 Thread Marco Sulla
On Wed, 5 Jan 2022 at 00:54, Chris Angelico  wrote:
> That's because a tuple is the correct data type when returning two
> distinct items. It's not a list that has two elements in it; it's a
> tuple of (key, value). Immutability is irrelevant.

Immutability is irrelevant; speed is not. A tuple is faster than a
list and more compact. Also frozenset is faster than set. Indeed
CPython internally optimises a

for x in {1, 2, 3}

transforming the set into a frozenset for speed. That's why tuple is
usually preferred. I expected the same for frozenset.
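A rough way to see the tuple/list gap: the tuple literal is folded into a single constant, while the list is rebuilt on every evaluation (exact numbers vary by machine):

```python
from timeit import timeit

# Constant tuple vs. list rebuilt on each evaluation.
t_tuple = timeit("(1, 2, 3)", number=1_000_000)
t_list = timeit("[1, 2, 3]", number=1_000_000)
print(t_tuple < t_list)  # loading a constant beats building a list
```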

> Got any examples of variable-length sequences?

function positional args are tuples, for example.

> Usually a tuple is a
> structure, not just a sequence.

eh? Are you talking about the underlying C code?

> If something is just returning a
> sequence, it'll most often return a dedicated sequence type (like
> range in Py3) or a list (like lots of things in Py2).

Python 2 is now obsolete; I don't think it is relevant to the discussion.

About your sentence: yes, usually a dedicated view, sequence or
generator is returned, but tuples too are used quite often. A list is
returned only sporadically, from what I remember.


Re: Why operations between dict views return a set and not a frozenset?

2022-01-04 Thread Marco Sulla
On Tue, 4 Jan 2022 at 19:38, Chris Angelico  wrote:
> [...] should the keys view be considered
> frozen or not? Remember the set of keys can change (when the
> underlying dict changes).

Well, also the items can change, but they are returned as tuples with
2 elements.

It seems to me that the stdlib, when something should return a
sequence, prefers to return a tuple. So I expected the same preference
for frozenset over set.

> It's not difficult to construct a frozenset from a set.

This sentence has the commutative property :)
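Both directions are one call, which is the whole point:

```python
a = {1: 2}
c = {1: 2, 3: 4}

s = c.keys() - a.keys()  # the view operation returns a plain set
fs = frozenset(s)        # set -> frozenset
back = set(fs)           # frozenset -> set, just as easy
print(s, fs == s, back == s)  # {3} True True
```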


Why operations between dict views return a set and not a frozenset?

2022-01-04 Thread Marco Sulla
$ python
Python 3.10.0 (heads/3.10-dirty:f6e8b80d20, Nov 18 2021, 19:16:18)
[GCC 10.1.1 20200718] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> a = {1:2}
>>> c = {1:2, 3:4}
>>> c.keys() - a.keys()
{3}
>>>


Why not frozenset({3})?


Re: ModuleNotFoundError: No module named 'DistUtilsExtra'

2022-01-02 Thread Marco Sulla
https://askubuntu.com/questions/584857/distutilsextra-problem

On Sun, 2 Jan 2022 at 18:52, hongy...@gmail.com  wrote:
>
> On Ubuntu 20.04.3 LTS, I try to install pdfarranger [1] as follows but failed:
>
> $ sudo apt-get install python3-pip python3-distutils-extra \
>   python3-wheel python3-gi 
> python3-gi-cairo \
>   gir1.2-gtk-3.0 gir1.2-poppler-0.18 
> python3-setuptools
> $ git clone https://github.com/pdfarranger/pdfarranger.git pdfarranger.git
> $ cd pdfarranger.git
> $ pyenv shell 3.8.3
> $ pyenv virtualenv --system-site-packages pdfarranger
> $ pyenv shell pdfarranger
> $ pip install -U pip
> $ ./setup.py build
> Traceback (most recent call last):
>   File "./setup.py", line 24, in <module>
> from DistUtilsExtra.command import (
> ModuleNotFoundError: No module named 'DistUtilsExtra'
>
>
> See the following for the package list installed in this virtualenv:
>
> $ pip list
> PackageVersion
> -- 
> pip21.3.1
> pyfiglet   0.8.post1
> setuptools 41.2.0
> vtk9.0.20200612
>
> Any hints for fixing this problem? Also see here [2-3] for relevant 
> discussions.
>
> [1] https://github.com/pdfarranger/pdfarranger
> [2] https://github.com/pdfarranger/pdfarranger/issues/604
> [3] 
> https://discuss.python.org/t/modulenotfounderror-no-module-named-distutilsextra/12834
>
> Regards,
> HZ
> --
> https://mail.python.org/mailman/listinfo/python-list


Who wrote Py_UNREACHABLE?

2022-01-02 Thread Marco Sulla
#if defined(RANDALL_WAS_HERE)
#  define Py_UNREACHABLE() \
Py_FatalError( \
"If you're seeing this, the code is in what I thought was\n" \
"an unreachable state.\n\n" \
"I could give you advice for what to do, but honestly, why\n" \
"should you trust me?  I clearly screwed this up.  I'm writing\n" \
"a message that should never appear, yet I know it will\n" \
"probably appear someday.\n\n" \
"On a deep level, I know I'm not up to this task.\n" \
"I'm so sorry.\n" \
"https://xkcd.com/2200")
#elif defined(Py_DEBUG)
#  define Py_UNREACHABLE() \
Py_FatalError( \
"We've reached an unreachable state. Anything is possible.\n" \
"The limits were in our heads all along. Follow your dreams.\n" \
"https://xkcd.com/2200")

etc


Re: How to implement freelists in dict 3.10 for previous versions?

2022-01-01 Thread Marco Sulla
Ooookay, I suppose I have to study a little the thing :D

On Thu, 30 Dec 2021 at 07:59, Inada Naoki  wrote:
>
> On Wed, Dec 29, 2021 at 7:25 PM Marco Sulla
>  wrote:
> >
> > I noticed that now freelists in dict use _Py_dict_state. I suppose
> > this is done for thread safety.
> >
>
> Some core-dev are working on per-interpreter GIL. But it is not done yet.
> So you don't need to follow it soon. Your extension module will work
> well in Python 3.11.
>
> > I would implement it also for a C extension that uses CPython < 3.10.
> > How can I achieve this?
>
> See PyModule_GetState() to have per-interpreter module state instead
> of static variables.
> https://docs.python.org/3/c-api/module.html#c.PyModule_GetState
>
>
> --
> Inada Naoki  


How to make a type of a C extension compatible with mypy

2022-01-01 Thread Marco Sulla
I created a type in a C extension, that is an immutable dict. If I do:

a: mydict[str, str]

it works. But it doesn't work with mypy, as signalled to me by a user:

https://github.com/Marco-Sulla/python-frozendict/issues/39

How can I make it work? I don't know what he means by annotating
methods, and furthermore I suppose I can't do this in C.
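As far as I understand, the usual route is a hand-written stub shipped next to the extension (plus a py.typed marker, per PEP 561). A minimal sketch with illustrative names:

```python
# mydict.pyi -- hand-written stub sketch for an immutable mapping
# implemented in C (names here are illustrative, not the real API).
from typing import Iterator, Mapping, TypeVar

K = TypeVar("K")
V = TypeVar("V")

class mydict(Mapping[K, V]):
    def __getitem__(self, key: K) -> V: ...
    def __iter__(self) -> Iterator[K]: ...
    def __len__(self) -> int: ...
```

With the stub on mypy's search path, an annotation like `a: mydict[str, str]` should type-check as well.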


Re: recover pickled data: pickle data was truncated

2022-01-01 Thread Marco Sulla
I agree with Barry. You can create a folder or a file with a
pseudo-random name. I recommend using str(uuid.uuid4())
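Combined with Barry's rename trick below, a sketch of a corruption-safe writer (illustrative names):

```python
import json
import os
import tempfile
import uuid

def atomic_write_json(data, path):
    # Write to a uniquely named temp file in the same directory, then
    # rename over the target. os.replace is atomic, so a reader never
    # sees a half-written file.
    tmp = "%s.%s.tmp" % (path, uuid.uuid4())
    with open(tmp, "w") as f:
        json.dump(data, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)

target = os.path.join(tempfile.mkdtemp(), "state.json")
atomic_write_json({"count": 1}, target)
with open(target) as f:
    print(json.load(f))  # {'count': 1}
```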

On Sat, 1 Jan 2022 at 14:11, Barry  wrote:
>
>
>
> > On 31 Dec 2021, at 17:53, iMath  wrote:
> >
> > On Thursday, December 30, 2021 at 03:13:21 UTC+8, the following was written:
> >>> On Wed, 29 Dec 2021 at 18:33, iMath  wrote:
> >>> But I found the size of the file of the shelve data didn't change much, 
> >>> so I guess the data are still in it , I just wonder any way to recover my 
> >>> data.
> >> I agree with Barry, Chris and Avi. IMHO your data is lost. Unpickling
> >> it by hand is a harsh work and maybe unreliable.
> >>
> >> Is there any reason you can't simply add a semaphore to avoid writing
> >> at the same time and re-run the code and regenerate the data?
> >
> > Thanks for your replies! I didn't have a sense of adding a semaphore on 
> > writing to pickle data before, so  corrupted the data.
> > Since my data was colleted in the daily usage, so cannot re-run the code 
> > and regenerate the data.
> > In order to avoid corrupting my data again and the complicity of using  a 
> > semaphore, now I am using json text to store my data.
>
> That will not fix the problem. You will end up with corrupt json.
>
> If you have one writer and one read then may be you can use the fact that a 
> rename is atomic.
>
> Writer does this:
> 1. Creat new json file in the same folder but with a tmp name
> 2. Rename the file from its tmp name to the public name.
>
> The read will just read the public name.
>
> I am not sure what happens in your world if the writer runs a second time 
> before the data is read.
>
> In that case you need to create a queue of files to be read.
>
> But if the problem is two process racing against each other you MUST use 
> locking.
> It cannot be avoided for robust operations.
>
> Barry
>
>
> > --
> > https://mail.python.org/mailman/listinfo/python-list
>
> --
> https://mail.python.org/mailman/listinfo/python-list


Re: builtins.TypeError: catching classes that do not inherit from BaseException is not allowed

2021-12-31 Thread Marco Sulla
It was already done: https://pypi.org/project/tail-recursive/
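For the record, the immediate TypeError in the quoted traceback below comes from TailRecurseException not deriving from BaseException: since Python 3, only BaseException subclasses can be raised or caught. A minimal corrected sketch of the same recipe:

```python
import sys

class TailRecurseException(BaseException):
    # Deriving from BaseException is the fix for the TypeError.
    def __init__(self, args, kwargs):
        self.args = args
        self.kwargs = kwargs

def tail_call_optimized(g):
    # Raises an exception when the function would become its own
    # grandparent, and catches it to rewind the stack.
    def func(*args, **kwargs):
        f = sys._getframe()
        if f.f_back and f.f_back.f_back \
                and f.f_back.f_back.f_code == f.f_code:
            raise TailRecurseException(args, kwargs)
        while True:
            try:
                return g(*args, **kwargs)
            except TailRecurseException as e:
                args = e.args
                kwargs = e.kwargs
    func.__doc__ = g.__doc__
    return func

@tail_call_optimized
def factorial(n, acc=1):
    "calculate a factorial"
    if n == 0:
        return acc
    return factorial(n - 1, n * acc)

print(factorial(10))  # 3628800
# factorial(2000) also works, well past the default recursion limit.
```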

On Thu, 30 Dec 2021 at 16:00, hongy...@gmail.com  wrote:
>
> I try to compute the factorial of a large number with tail-recursion 
> optimization decorator in Python3. The following code snippet is converted 
> from the code snippet given here [1] by the following steps:
>
> $ pyenv shell datasci
> $ python --version
> Python 3.9.1
> $ pip install 2to3
> $ 2to3 -w this-script.py
>
> ```
> # This program shows off a python decorator(
> # which implements tail call optimization. It
> # does this by throwing an exception if it is
> # its own grandparent, and catching such
> # exceptions to recall the stack.
>
> import sys
>
> class TailRecurseException:
>   def __init__(self, args, kwargs):
> self.args = args
> self.kwargs = kwargs
>
> def tail_call_optimized(g):
>   """
>   This function decorates a function with tail call
>   optimization. It does this by throwing an exception
>   if it is its own grandparent, and catching such
>   exceptions to fake the tail call optimization.
>
>   This function fails if the decorated
>   function recurses in a non-tail context.
>   """
>   def func(*args, **kwargs):
> f = sys._getframe()
> if f.f_back and f.f_back.f_back \
> and f.f_back.f_back.f_code == f.f_code:
>   raise TailRecurseException(args, kwargs)
> else:
>   while 1:
> try:
>   return g(*args, **kwargs)
> except TailRecurseException as e:
>   args = e.args
>   kwargs = e.kwargs
>   func.__doc__ = g.__doc__
>   return func
>
> @tail_call_optimized
> def factorial(n, acc=1):
>   "calculate a factorial"
>   if n == 0:
> return acc
>   return factorial(n-1, n*acc)
>
> print(factorial(1))
> # prints a big, big number,
> # but doesn't hit the recursion limit.
>
> @tail_call_optimized
> def fib(i, current = 0, next = 1):
>   if i == 0:
> return current
>   else:
> return fib(i - 1, next, current + next)
>
> print(fib(1))
> # also prints a big number,
> # but doesn't hit the recursion limit.
> ```
> However, when I try to test the above script, the following error will be 
> triggered:
> ```
> $ python this-script.py
> Traceback (most recent call last):
>   File "/home/werner/this-script.py", line 32, in func
> return g(*args, **kwargs)
>   File "/home/werner/this-script.py", line 44, in factorial
> return factorial(n-1, n*acc)
>   File "/home/werner/this-script.py", line 28, in func
> raise TailRecurseException(args, kwargs)
> TypeError: exceptions must derive from BaseException
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
>   File "/home/werner/this-script.py", line 46, in <module>
> print(factorial(1))
>   File "/home/werner/this-script.py", line 33, in func
> except TailRecurseException as e:
> TypeError: catching classes that do not inherit from BaseException is not 
> allowed
> ```
>
> Any hints for fixing this problem will be highly appreciated.
>
> [1]  https://stackoverflow.com/q/27417874
>
> Regards,
> HZ
> --
> https://mail.python.org/mailman/listinfo/python-list


Re: recover pickled data: pickle data was truncated

2021-12-29 Thread Marco Sulla
On Wed, 29 Dec 2021 at 18:33, iMath  wrote:
> But I found the size of the file of the shelve data didn't change much, so I 
> guess the data are still in it , I just wonder any way to recover my data.

I agree with Barry, Chris and Avi. IMHO your data is lost. Unpickling
it by hand is a harsh work and maybe unreliable.

Is there any reason you can't simply add a semaphore to avoid writing
at the same time and re-run the code and regenerate the data?


Re: What's the public API alternative to _PyObject_GC_IS_TRACKED()?

2021-12-29 Thread Marco Sulla
On Wed, 29 Dec 2021 at 12:11, Dieter Maurer  wrote:
>
> Marco Sulla wrote at 2021-12-29 11:59 +0100:
> >On Wed, 29 Dec 2021 at 09:12, Dieter Maurer  wrote:
> >> `MutableMapping` is a so called abstract base class (--> `abc`).
> >>
> >> It uses the `__subclass_check__` (and `__instance_check__`) of
> >> `abc.ABCMeta` to ensure `issubclass(dict, MutableMapping)`.
> >> Those can be customized by overriding `MutableMapping.__subclasshook__`
> >> to ensure that your `frozendict` class (and their subclasses)
> >> are not considered subclasses of `MutableMapping`.
> >
> >It does not work:
> > ...
> >>>> issubclass(fd, Mm)
> >True
>
> There is a cache involved. The `issubclass` above,
> brings your `fd` in the `Mn`'s subclass cache.

It works, thank you! I had to put it before

Mapping.register(frozendict)
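A self-contained toy version of the working sequence (hook first, then register), with an illustrative stand-in class:

```python
from collections.abc import Mapping, MutableMapping

class MyFrozen(dict):
    # Toy stand-in for frozendict: a dict subclass that should NOT
    # be considered a MutableMapping.
    def __setitem__(self, key, value):
        raise TypeError("immutable")

@classmethod
def _my_subclasshook(cls, subclass):
    if subclass is MyFrozen:
        return False
    return NotImplemented

# Order matters: install the hook before any issubclass() call caches
# a result, then register the class as a (non-mutable) Mapping.
MutableMapping.__subclasshook__ = _my_subclasshook
Mapping.register(MyFrozen)

print(issubclass(MyFrozen, MutableMapping))  # False
print(issubclass(MyFrozen, Mapping))         # True
```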


Re: What's the public API alternative to _PyObject_GC_IS_TRACKED()?

2021-12-29 Thread Marco Sulla
On Wed, 29 Dec 2021 at 09:12, Dieter Maurer  wrote:
> `MutableMapping` is a so called abstract base class (--> `abc`).
>
> It uses the `__subclass_check__` (and `__instance_check__`) of
> `abc.ABCMeta` to ensure `issubclass(dict, MutableMapping)`.
> Those can be customized by overriding `MutableMapping.__subclasshook__`
> to ensure that your `frozendict` class (and their subclasses)
> are not considered subclasses of `MutableMapping`.

It does not work:

$ python
Python 3.10.0 (heads/3.10-dirty:f6e8b80d20, Nov 18 2021, 19:16:18)
[GCC 10.1.1 20200718] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import frozendict
>>> frozendict.c_ext
False
>>> from frozendict import frozendict as fd
>>> from collections.abc import MutableMapping as Mm
>>> issubclass(fd, Mm)
True
>>> @classmethod
... def _my_subclasshook(klass, subclass):
... if subclass == fd:
... return False
... return NotImplemented
...
>>> @classmethod
... def _my_subclasshook(klass, subclass):
... print(subclass)
... if subclass == fd:
... return False
... return NotImplemented
...
>>> Mm.__subclasshook__ = _my_subclasshook
>>> issubclass(fd, Mm)
True
>>> issubclass(tuple, Mm)






False
>>>


How to implement freelists in dict 3.10 for previous versions?

2021-12-29 Thread Marco Sulla
I noticed that now freelists in dict use _Py_dict_state. I suppose
this is done for thread safety.

I would implement it also for a C extension that uses CPython < 3.10.
How can I achieve this?


Re: What's the public API alternative to _PyObject_GC_IS_TRACKED()?

2021-12-29 Thread Marco Sulla
On Wed, 29 Dec 2021 at 10:06, Dieter Maurer  wrote:
>
> Are you sure you need to implement your type in C at all?

It's already implemented, and, in some cases, is faster than dict:

https://github.com/Marco-Sulla/python-frozendict#benchmarks

PS: I'm doing a refactoring that speeds up creation even further,
making it almost as fast as dict.


Re: What's the public API alternative to _PyObject_GC_IS_TRACKED()?

2021-12-29 Thread Marco Sulla
On second thought, I think I'll do this for the pure py version. But I
will definitely not do this for the C extension, since it's anyway
strange that an immutable mapping inherits from a mutable one! I've
done it in the pure py version only for a matter of speed.

On Wed, 29 Dec 2021 at 09:24, Marco Sulla  wrote:
>
> On Wed, 29 Dec 2021 at 09:12, Dieter Maurer  wrote:
> >
> > Marco Sulla wrote at 2021-12-29 08:08 +0100:
> > >On Wed, 29 Dec 2021 at 00:03, Dieter Maurer  wrote:
> > >> Why do you not derive from `dict` and override its mutating methods
> > >> (to raise a type error after initialization is complete)?
> > >
> > >I've done this for the pure py version, for speed. But in this way,
> > >frozendict results to be a subclass of MutableMapping.
> >
> > `MutableMapping` is a so called abstract base class (--> `abc`).
> >
> > It uses the `__subclass_check__` (and `__instance_check__`) of
> > `abc.ABCMeta` to ensure `issubclass(dict, MutableMapping)`.
> > Those can be customized by overriding `MutableMapping.__subclasshook__`
> > to ensure that your `frozendict` class (and their subclasses)
> > are not considered subclasses of `MutableMapping`.
>
> Emh. Too hacky for me too, sorry :D

