Am I banned from Discuss forum?
I was banned from the mailing list and the Discuss forum for a very long time. Too long, IMHO, but I paid my dues. Now this is my state in the forum:

- I haven't posted anything disrespectful in the last months
- I'm limited to three posts per thread, but only in some threads
- Some random posts of mine are hidden and must be restored manually by moderators
- I opened a thread proposing a new section called Brainstorming. It was closed without a reason.
- I can't post links
- Two discussions I posted in the Ideas section were moved to Help, without a single line of explanation.

If I'm not welcome, I want to be publicly banned with a good reason, or at least a reason.
-- https://mail.python.org/mailman/listinfo/python-list
Re: How to generate a .pyi file for a C Extension using stubgen
On Fri, 29 Jul 2022 at 23:23, Barry wrote:
>
> > On 29 Jul 2022, at 19:33, Marco Sulla wrote:
> >
> > I tried to follow the instructions here:
> >
> > https://mypy.readthedocs.io/en/stable/stubgen.html
> >
> > but the instructions about creating a stub for a C Extension are a little
> > mysterious. I tried to use it on the .so file without luck.
>
> It says that stubgen works on .py files not .so files.
> You will need to write the .pyi for your .so manually.
>
> The docs could do with splitting the need for .pyi for .so
> away from the stubgen description.

But it says:

"Mypy includes the stubgen tool that can automatically generate stub files (.pyi files) for Python modules and C extension modules."

I tried stubgen -m modulename, but it generates very little code.
-- https://mail.python.org/mailman/listinfo/python-list
How to generate a .pyi file for a C Extension using stubgen
I tried to follow the instructions here: https://mypy.readthedocs.io/en/stable/stubgen.html but the instructions about creating a stub for a C Extension are a little mysterious. I tried to use it on the .so file without luck. -- https://mail.python.org/mailman/listinfo/python-list
Re: Why I fail so bad to check for memory leak with this code?
On Fri, 22 Jul 2022 at 09:00, Barry wrote:
> With code as complex as python's there will be memory allocations that occur
> that will not be directly related to the python code you test.
>
> To put it another way there is noise in your memory allocation signal.
>
> Usually the signal of a memory leak is very clear, as you noticed.
>
> For rare leaks I would use a tool like valgrind.

Thank you all, but I needed a simple decorator to automate the memory leak (and segfault) tests. I think this version is good enough; I hope it can be useful to someone:

def trace(iterations=100):
    def decorator(func):
        def wrapper():
            print(
                f"Loops: {iterations} - Evaluating: {func.__name__}",
                flush=True
            )
            tracemalloc.start()
            snapshot1 = tracemalloc.take_snapshot().filter_traces(
                (tracemalloc.Filter(True, __file__), )
            )

            for i in range(iterations):
                func()

            gc.collect()
            snapshot2 = tracemalloc.take_snapshot().filter_traces(
                (tracemalloc.Filter(True, __file__), )
            )
            top_stats = snapshot2.compare_to(snapshot1, 'lineno')
            tracemalloc.stop()

            for stat in top_stats:
                if stat.count_diff * 100 > iterations:
                    raise ValueError(f"stat: {stat}")

        return wrapper

    return decorator

If the decorated function fails, you can try to raise the iterations parameter. I found that in my cases I sometimes needed a value of 200 or 300.
-- https://mail.python.org/mailman/listinfo/python-list
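The idea behind the decorator can be shown with a minimal, self-contained sketch (my own illustration, not code from the thread): take a tracemalloc snapshot before and after many iterations, and a deliberate leak shows up as net allocation growth.

```python
import gc
import tracemalloc

leaked = []  # a deliberate "leak": objects kept alive across calls


def leaky():
    leaked.append(bytearray(1024))


tracemalloc.start()
snapshot1 = tracemalloc.take_snapshot()

for _ in range(100):
    leaky()

gc.collect()
snapshot2 = tracemalloc.take_snapshot()
top_stats = snapshot2.compare_to(snapshot1, "lineno")
tracemalloc.stop()

# Net growth across all traced lines; the 100 KiB we leaked on
# purpose dominates the noise Barry mentions.
growth = sum(stat.size_diff for stat in top_stats)
print(growth)
```

Without a deliberate leak, `growth` tends to hover near zero (the "noise"), which is why the decorator in the thread compares `count_diff` against the iteration count instead of demanding exactly zero.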
Re: Why I fail so bad to check for memory leak with this code?
I've done this other simple test:

#!/usr/bin/env python3

import tracemalloc
import gc
import pickle

tracemalloc.start()
snapshot1 = tracemalloc.take_snapshot().filter_traces(
    (tracemalloc.Filter(True, __file__), )
)

for i in range(1000):
    pickle.dumps(iter([]))

gc.collect()
snapshot2 = tracemalloc.take_snapshot().filter_traces(
    (tracemalloc.Filter(True, __file__), )
)
top_stats = snapshot2.compare_to(snapshot1, 'lineno')
tracemalloc.stop()

for stat in top_stats:
    print(stat)

The result is:

/home/marco/sources/test.py:14: size=3339 B (+3339 B), count=63 (+63), average=53 B
/home/marco/sources/test.py:9: size=464 B (+464 B), count=1 (+1), average=464 B
/home/marco/sources/test.py:10: size=456 B (+456 B), count=1 (+1), average=456 B
/home/marco/sources/test.py:13: size=28 B (+28 B), count=1 (+1), average=28 B

It seems that, after 10 million loops, only 63 allocations leak, totalling only ~3 KB. It seems to me that we can't call it a leak, no? Probably pickle needs a lot more cycles to be sure there's actually a real leakage.
-- https://mail.python.org/mailman/listinfo/python-list
Re: Why I fail so bad to check for memory leak with this code?
This naive code shows no leak:

import resource
import pickle

c = 0

while True:
    pickle.dumps(iter([]))

    if (c % 1) == 0:
        max_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        print(f"iteration: {c}, max rss: {max_rss} kb")

    c += 1
-- https://mail.python.org/mailman/listinfo/python-list
Re: Why I fail so bad to check for memory leak with this code?
On Thu, 21 Jul 2022 at 22:28, MRAB wrote:
>
> It's something to do with pickling iterators because it still occurs
> when I reduce func_76 to:
>
> @trace
> def func_76():
>     pickle.dumps(iter([]))

It's very strange. I found a bunch of true memory leaks with this decorator. It seems to be reliable. It works correctly with pickle alone and with iter alone, but not when pickling iterators.
-- https://mail.python.org/mailman/listinfo/python-list
Why I fail so bad to check for memory leak with this code?
I tried to check for memory leaks in a bunch of functions of mine using a simple decorator. It works, but it fails with this code, returning a random count_diff at every run. Why?

import tracemalloc
import gc
import functools
from uuid import uuid4
import pickle


def getUuid():
    return str(uuid4())


def trace(func):
    @functools.wraps(func)
    def inner():
        tracemalloc.start()
        snapshot1 = tracemalloc.take_snapshot().filter_traces(
            (tracemalloc.Filter(True, __file__), )
        )

        for i in range(100):
            func()

        gc.collect()
        snapshot2 = tracemalloc.take_snapshot().filter_traces(
            (tracemalloc.Filter(True, __file__), )
        )
        top_stats = snapshot2.compare_to(snapshot1, 'lineno')
        tracemalloc.stop()

        for stat in top_stats:
            if stat.count_diff > 3:
                raise ValueError(f"count_diff: {stat.count_diff}")

    return inner


dict_1 = {getUuid(): i for i in range(1000)}

@trace
def func_76():
    pickle.dumps(iter(dict_1))

func_76()
-- https://mail.python.org/mailman/listinfo/python-list
Re: Subtract n months from datetime
The package arrow has a simple shift method for months, weeks etc https://arrow.readthedocs.io/en/latest/#replace-shift -- https://mail.python.org/mailman/listinfo/python-list
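For those who prefer to stay in the stdlib, month shifting can be sketched with datetime and calendar (my own sketch; `shift_months` is a hypothetical helper, not part of arrow or the standard library). The day is clamped to the last valid day of the target month, which is also how such libraries typically behave:

```python
import calendar
import datetime


def shift_months(d, months):
    # Months counted from year 0, shifted forward or backward
    m = d.month - 1 + months
    year = d.year + m // 12
    month = m % 12 + 1
    # Clamp the day: e.g. March 31 minus 1 month becomes February 28 (or 29)
    day = min(d.day, calendar.monthrange(year, month)[1])
    return d.replace(year=year, month=month, day=day)


print(shift_months(datetime.date(2022, 3, 31), -1))  # 2022-02-28
print(shift_months(datetime.date(2022, 1, 15), -2))  # 2021-11-15
```

Python's `//` and `%` on negative numbers make the year/month arithmetic work for backward shifts without special-casing.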
Re: tail
On Wed, 18 May 2022 at 23:32, Cameron Simpson wrote:
>
> On 17May2022 22:45, Marco Sulla wrote:
> >Well, I've done a benchmark.
> >>>> timeit.timeit("tail('/home/marco/small.txt')", globals={"tail":tail}, number=10)
> >1.5963431186974049
> >>>> timeit.timeit("tail('/home/marco/lorem.txt')", globals={"tail":tail}, number=10)
> >2.5240604374557734
> >>>> timeit.timeit("tail('/home/marco/lorem.txt', chunk_size=1000)", globals={"tail":tail}, number=10)
> >1.8944984432309866
>
> This suggests that the file size does not dominate your runtime.

Yes, this is what I wanted to test and it seems good.

> Ah.
> _Or_ that there are similar numbers of newlines vs text in the files so
> reading similar amounts of data from the end. If the "line density" of
> the files were similar you would hope that the runtimes would be
> similar.

No, well, small.txt has very short lines. lorem.txt is a lorem ipsum, so really long lines. Indeed I get better results tuning chunk_size. Anyway, also with the default value the performance is not bad at all.

> >But the time of Linux tail surprises me:
> >
> >marco@buzz:~$ time tail lorem.txt
> >[text]
> >
> >real    0m0.004s
> >user    0m0.003s
> >sys     0m0.001s
> >
> >It's strange that it's so slow. I thought it was because it decodes
> >and prints the result, but I timed
>
> You're measuring different things. timeit() tries hard to measure just
> the code snippet you provide. It doesn't measure the startup cost of the
> whole python interpreter. Try:
>
> time python3 your-tail-prog.py /home/marco/lorem.txt

Well, I'll try it, but isn't it a bit unfair to compare Python startup with C?

> BTW, does your `tail()` print output? If not, again not measuring the
> same thing.
> [...]
> Also: does tail(1) do character set / encoding stuff? Does your Python
> code do that? Might be apples and oranges.

Well, as I wrote, I also timed

timeit.timeit("print(tail('/home/marco/lorem.txt').decode('utf-8'))", globals={"tail":tail}, number=10)

and I got ~36 seconds.

> If you have the source of tail(1) to hand, consider getting to the core
> and measuring `time()` immediately before and immediately after the
> central tail operation and printing the result.

IMHO this is a very good idea, but I have to find the time(). Ahah. Emh.
-- https://mail.python.org/mailman/listinfo/python-list
Re: tail
Well, I've done a benchmark.

>>> timeit.timeit("tail('/home/marco/small.txt')", globals={"tail":tail}, number=10)
1.5963431186974049
>>> timeit.timeit("tail('/home/marco/lorem.txt')", globals={"tail":tail}, number=10)
2.5240604374557734
>>> timeit.timeit("tail('/home/marco/lorem.txt', chunk_size=1000)", globals={"tail":tail}, number=10)
1.8944984432309866

small.txt is a text file of 1.3 KB. lorem.txt is a lorem ipsum of 1.2 GB. It seems the performance is good, thanks to the chunk suggestion.

But the time of Linux tail surprises me:

marco@buzz:~$ time tail lorem.txt
[text]

real    0m0.004s
user    0m0.003s
sys     0m0.001s

It's strange that mine is so slow in comparison. I thought it was because it decodes and prints the result, but I timed

timeit.timeit("print(tail('/home/marco/lorem.txt').decode('utf-8'))", globals={"tail":tail}, number=10)

and I got ~36 seconds. It seems quite strange to me. Maybe I got the benchmarks wrong at some point?
-- https://mail.python.org/mailman/listinfo/python-list
Re: tail
On Fri, 13 May 2022 at 12:49, <2qdxy4rzwzuui...@potatochowder.com> wrote:
>
> On 2022-05-13 at 12:16:57 +0200,
> Marco Sulla wrote:
>
> > On Fri, 13 May 2022 at 00:31, Cameron Simpson wrote:
> > [...]
> > > This is nearly the worst "specification" I have ever seen.
> > You're lucky. I've seen much worse (or none at all).
>
> At least with *no* documentation, the source code stands for itself.

So I did well not to include one in the first place. I think that after 100 posts about tail, chunks etc. it was clear what that stuff was about and how to use it.

Speaking about more serious things, so far I've done a test with:

* a file that does not end with \n
* a file that ends with \n (after Stefan's test)
* a file with more than 10 lines
* a file with less than 10 lines

It seemed to work. I've only to benchmark it. I suppose I have to test with at least a 1 GB file, a big lorem ipsum, and do an unequal comparison with Linux tail. I'll do it when I have time, so Chris will no longer be angry with me.
-- https://mail.python.org/mailman/listinfo/python-list
Re: tail
On Fri, 13 May 2022 at 00:31, Cameron Simpson wrote:
> On 12May2022 19:48, Marco Sulla wrote:
> >On Thu, 12 May 2022 at 00:50, Stefan Ram wrote:
> >> There's no spec/doc, so one can't even test it.
> >
> >Excuse me, you're very right.
> >
> >"""
> >A function that "tails" the file. If you don't know what that means,
> >google "man tail"
> >
> >filepath: the file path of the file to be "tailed"
> >n: the numbers of lines "tailed"
> >chunk_size: oh don't care, use it as is
>
> This is nearly the worst "specification" I have ever seen.

You're lucky. I've seen much worse (or none at all).
-- https://mail.python.org/mailman/listinfo/python-list
Re: tail
Thank you very much. This helped me to improve the function:

import os

_lf = b"\n"
_err_n = "Parameter n must be a positive integer number"
_err_chunk_size = "Parameter chunk_size must be a positive integer number"


def tail(filepath, n=10, chunk_size=100):
    if (n <= 0):
        raise ValueError(_err_n)

    if (n % 1 != 0):
        raise ValueError(_err_n)

    if (chunk_size <= 0):
        raise ValueError(_err_chunk_size)

    if (chunk_size % 1 != 0):
        raise ValueError(_err_chunk_size)

    n_chunk_size = n * chunk_size
    pos = os.stat(filepath).st_size
    chunk_line_pos = -1
    newlines_to_find = n
    first_step = True

    with open(filepath, "rb") as f:
        text = bytearray()

        while pos != 0:
            pos -= n_chunk_size

            if pos < 0:
                pos = 0

            f.seek(pos)
            chars = f.read(n_chunk_size)
            text[0:0] = chars
            search_pos = n_chunk_size

            while search_pos != -1:
                chunk_line_pos = chars.rfind(_lf, 0, search_pos)

                if first_step and chunk_line_pos == search_pos - 1:
                    newlines_to_find += 1

                first_step = False

                if chunk_line_pos != -1:
                    newlines_to_find -= 1

                if newlines_to_find == 0:
                    break

                search_pos = chunk_line_pos

            if newlines_to_find == 0:
                break

    return bytes(text[chunk_line_pos+1:])

On Thu, 12 May 2022 at 20:29, Stefan Ram wrote:
> I am not aware of a definition of "line" above,
> but the PLR says:
>
> |A physical line is a sequence of characters terminated
> |by an end-of-line sequence.
>
> . So 10 lines should have 10 end-of-line sequences.

Maybe. Maybe not. What if the file ends with no newline?
-- https://mail.python.org/mailman/listinfo/python-list
Re: tail
On Thu, 12 May 2022 at 00:50, Stefan Ram wrote:
>
> Marco Sulla writes:
> >def tail(filepath, n=10, chunk_size=100):
> >    if (n <= 0):
> >        raise ValueError(_err_n)
> ...
>
> There's no spec/doc, so one can't even test it.

Excuse me, you're very right.

"""
A function that "tails" the file. If you don't know what that means,
google "man tail"

filepath: the file path of the file to be "tailed"
n: the number of lines "tailed"
chunk_size: oh don't care, use it as is
"""
-- https://mail.python.org/mailman/listinfo/python-list
Re: tail
On Wed, 11 May 2022 at 22:09, Chris Angelico wrote:
>
> Have you actually checked those three, or do you merely suppose them to be true?

I only suppose, as I said. I should do some benchmarks and some other tests, and, frankly, I don't want to. I don't want to because I'm quite sure the implementation is fast, since it reads by chunks and caches them. I'm not sure it's 100% free of bugs, but the concept is very simple, since it simply mimics the *nix tail, so it should be reliable.

> > I'd very much like to see a CPython implementation of that function. It
> > could be a method of a file object opened in binary mode, and *only* in
> > binary mode.
> >
> > What do you think about it?
>
> Still not necessary. You can simply have it in your own toolkit. Why
> should it be part of the core language?

Why not?

> How much benefit would it be to anyone else?

I suppose that every programmer, at least once in their life, has done a tail.

> All the same assumptions are still there, so it still isn't general

It's general. It mimics the *nix tail. I can't think of a more general way to implement a tail.

> I don't understand why this wants to be in the standard library.

Well, the answer is really simple: I needed it, and if I had found it in the stdlib, I would have used it instead of writing the first horrible function. Furthermore, tail is such a useful tool that I suppose many others are interested, based on this quick Google search:

https://www.google.com/search?q=python+tail

A highly voted question on Stackoverflow, many other Stackoverflow questions, a package that seems to do exactly the same thing, that is, mimic *nix tail, and a blog post about how to tail in Python. Furthermore, if you search python tail pypi, you can find a bunch of other packages:

https://www.google.com/search?q=python+tail+pypi

It seems the subject is quite popular, and I can't imagine otherwise.
-- https://mail.python.org/mailman/listinfo/python-list
Re: tail
On Mon, 9 May 2022 at 23:15, Dennis Lee Bieber wrote:
>
> On Mon, 9 May 2022 21:11:23 +0200, Marco Sulla declaimed the following:
>
> >Nevertheless, tail is a fundamental tool in *nix. It's fast and
> >reliable. Also the tail command can't handle different encodings?
>
> Based upon
> https://github.com/coreutils/coreutils/blob/master/src/tail.c the ONLY
> thing tail looks at is single byte "\n". It does not handle other line
> endings, and appears to perform BINARY I/O, not text I/O. It does nothing
> for bytes that are not "\n". Split multi-byte encodings are irrelevant
> since, if it does not find enough "\n" bytes in the buffer (chunk) it reads
> another binary chunk and seeks for additional "\n" bytes. Once it finds the
> desired amount, it is synchronized on the byte following the "\n" (which,
> for multi-byte encodings might be a NUL, but in any event, should be a safe
> location for subsequent I/O).
>
> Interpretation of encoding appears to fall to the console driver
> configuration when displaying the bytes output by tail.

Ok, I understand.
This should be a Python implementation of *nix tail:

import os

_lf = b"\n"
_err_n = "Parameter n must be a positive integer number"
_err_chunk_size = "Parameter chunk_size must be a positive integer number"


def tail(filepath, n=10, chunk_size=100):
    if (n <= 0):
        raise ValueError(_err_n)

    if (n % 1 != 0):
        raise ValueError(_err_n)

    if (chunk_size <= 0):
        raise ValueError(_err_chunk_size)

    if (chunk_size % 1 != 0):
        raise ValueError(_err_chunk_size)

    n_chunk_size = n * chunk_size
    pos = os.stat(filepath).st_size
    chunk_line_pos = -1
    lines_not_found = n

    with open(filepath, "rb") as f:
        text = bytearray()

        while pos != 0:
            pos -= n_chunk_size

            if pos < 0:
                pos = 0

            f.seek(pos)
            chars = f.read(n_chunk_size)
            text[0:0] = chars
            search_pos = n_chunk_size

            while search_pos != -1:
                chunk_line_pos = chars.rfind(_lf, 0, search_pos)

                if chunk_line_pos != -1:
                    lines_not_found -= 1

                if lines_not_found == 0:
                    break

                search_pos = chunk_line_pos

            if lines_not_found == 0:
                break

    return bytes(text[chunk_line_pos+1:])

The function opens the file in binary mode and searches only for b"\n". It returns the last n lines of the file as bytes.

I suppose this function is fast. It reads the bytes from the file in chunks and stores them in a bytearray, prepending them to it. The final result is read from the bytearray and converted to bytes (to be consistent with the read method).

I suppose the function is reliable. The file is opened in binary mode and only b"\n" is searched as line end, as *nix tail (and Python readline in binary mode) do. And bytes are returned. The caller can use them as is, or convert them to a string using the encoding it wants, or do whatever its imagination can think of :)

Finally, it seems to me the function is quite simple.

If all my affirmations are true, the three obstacles written by Chris should be passed.

I'd very much like to see a CPython implementation of that function. It could be a method of a file object opened in binary mode, and *only* in binary mode.

What do you think about it?
-- https://mail.python.org/mailman/listinfo/python-list
Re: tail
On Mon, 9 May 2022 at 19:53, Chris Angelico wrote:
>
> On Tue, 10 May 2022 at 03:47, Marco Sulla wrote:
> >
> > On Mon, 9 May 2022 at 07:56, Cameron Simpson wrote:
> > >
> > > The point here is that text is a very different thing. Because you
> > > cannot seek to an absolute number of characters in an encoding with
> > > variable sized characters. _If_ you did a seek to an arbitrary number
> > > you can end up in the middle of some character. And there are encodings
> > > where you cannot inspect the data to find a character boundary in the
> > > byte stream.
> >
> > Ooook, now I understand what you and Barry mean. I suppose there's no
> > reliable way to tail a big file opened in text mode with a decent
> > performance.
> >
> > Anyway, the previous-previous function I posted worked only for files
> > opened in binary mode, and I suppose it's reliable, since it searches
> > only for b"\n", as readline() in binary mode does.
>
> It's still fundamentally impossible to solve this in a general way, so
> the best way to do things will always be to code for *your* specific
> use-case. That means that this doesn't belong in the stdlib or core
> language, but in your own toolkit.

Nevertheless, tail is a fundamental tool in *nix. It's fast and reliable. Besides, can't the tail command handle different encodings?
-- https://mail.python.org/mailman/listinfo/python-list
Re: tail
On Mon, 9 May 2022 at 07:56, Cameron Simpson wrote:
>
> The point here is that text is a very different thing. Because you
> cannot seek to an absolute number of characters in an encoding with
> variable sized characters. _If_ you did a seek to an arbitrary number
> you can end up in the middle of some character. And there are encodings
> where you cannot inspect the data to find a character boundary in the
> byte stream.

Ooook, now I understand what you and Barry mean. I suppose there's no reliable way to tail a big file opened in text mode with a decent performance.

Anyway, the previous-previous function I posted worked only for files opened in binary mode, and I suppose it's reliable, since it searches only for b"\n", as readline() in binary mode does.
-- https://mail.python.org/mailman/listinfo/python-list
Re: tail
On Sun, 8 May 2022 at 22:34, Barry wrote:
>
> > On 8 May 2022, at 20:48, Marco Sulla wrote:
> >
> > On Sun, 8 May 2022 at 20:31, Barry Scott wrote:
> >>
> >>> On 8 May 2022, at 17:05, Marco Sulla wrote:
> >>>
> >>> def tail(filepath, n=10, newline=None, encoding=None, chunk_size=100):
> >>>     n_chunk_size = n * chunk_size
> >>
> >> Why use tiny chunks? You can read 4KiB as fast as 100 bytes as it's
> >> typically the smallest size the file system will allocate.
> >> I tend to read in multiples of MiB as it's near instant.
> >
> > Well, I tested on a little file, a list of my preferred pizzas, so
>
> Try it on a very big file.

I'm not saying it's a good idea, it's only the value that I needed for my tests. Anyway, it's not a problem with big files. The problem is with files with long lines.

> >> In text mode you can only seek to a value returned from f.tell(), otherwise
> >> the behaviour is undefined.
> >
> > Why? I don't see any recommendation about it in the docs:
> > https://docs.python.org/3/library/io.html#io.IOBase.seek
>
> What does adding 1 to a pos mean?
> If it's binary it means 1 byte further down the file, but in text mode it may
> need to move the point 1, 2 or 3 bytes down the file.

Emh. I re-quote:

seek(offset, whence=SEEK_SET)
    Change the stream position to the given byte offset.

And so on. No mention of differences between text and binary mode.

> >> You have on limit on the amount of data read.
> >
> > I explained that previously. Anyway, chunk_size is small, so it's not
> > a great problem.
>
> Typo, I meant you have no limit.
>
> You read all the data till the end of the file, which might be megabytes of
> data.

Yes, I already explained why and how it could be optimized. I quote myself:

Shortly, the file is always opened in text mode. The file is read at the end in bigger and bigger chunks, until the file is finished or all the lines are found.

Why? Because in encodings that have more than 1 byte per character, reading a chunk of n bytes, then reading the previous chunk, can eventually split the character between the chunks in two distinct bytes.

I think one can read chunk by chunk and test the chunk junction problem. I suppose the code will be faster this way. Anyway, it seems that this trick is quite fast anyway and it's a lot simpler.
-- https://mail.python.org/mailman/listinfo/python-list
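The chunk-boundary problem described above is easy to demonstrate (my own sketch, not code from the thread): a multi-byte UTF-8 character split across two chunks cannot be decoded from either half alone.

```python
data = "è".encode("utf-8")  # a 2-byte UTF-8 character: b'\xc3\xa8'
first, second = data[:1], data[1:]  # simulate a chunk boundary in the middle

# Decoding the trailing half alone fails: the lead byte is in the other chunk.
try:
    first.decode("utf-8")
    split_ok = True
except UnicodeDecodeError:
    split_ok = False

print(split_ok)                          # the lone lead byte is not valid UTF-8
print((first + second).decode("utf-8"))  # rejoined, it decodes fine
```

This is why the text-mode version reads from `pos` to the end of the file on every step instead of decoding each chunk independently.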
Re: tail
On Sun, 8 May 2022 at 22:02, Chris Angelico wrote:
>
> Absolutely not. As has been stated multiple times in this thread, a
> fully general approach is extremely complicated, horrifically
> unreliable, and hopelessly inefficient.

Well, my implementation is quite general now. It's not complicated or inefficient. About reliability, I can't say anything without a test case.

> The ONLY way to make this sort
> of thing any good whatsoever is to know your own use-case and code to
> exactly that. Given the size of files you're working with, for
> instance, a simple approach of just reading the whole file would make
> far more sense than the complex seeking you're doing. For reading a
> multi-gigabyte file, the choices will be different.

Apart from the fact that it's very, very simple to optimize for small files: this is, IMHO, a premature optimization. The code is quite fast even if the file is small. Can it be faster? Of course, but it depends on the use case. Every optimization in CPython must pass the benchmark suite test. If there's little or no gain, the optimization is usually rejected.

> No, this does NOT belong in the core language.

I respect your opinion, but IMHO you think the task is more complicated than it really is. It seems to me that the method can be quite simple and fast.
-- https://mail.python.org/mailman/listinfo/python-list
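For the record, the "simple approach of just reading the whole file" that Chris mentions can be sketched in a few lines (my own illustration; `tail_simple` is a hypothetical name). `collections.deque` with `maxlen` keeps only the last n lines while streaming the file once:

```python
import tempfile
from collections import deque


def tail_simple(filepath, n=10):
    # Reads the whole file line by line; simple and correct,
    # but O(file size) rather than reading only the tail chunks.
    with open(filepath, "rb") as f:
        return b"".join(deque(f, maxlen=n))


# Tiny demonstration with a temporary 20-line file
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"".join(b"line %d\n" % i for i in range(20)))

last_three = tail_simple(tmp.name, n=3)
print(last_three)  # b'line 17\nline 18\nline 19\n'
```

For small files this is hard to beat for clarity; the chunked seek-from-the-end approach only pays off when the file is much larger than its tail.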
Re: tail
On Sun, 8 May 2022 at 20:31, Barry Scott wrote:
>
> > On 8 May 2022, at 17:05, Marco Sulla wrote:
> >
> > def tail(filepath, n=10, newline=None, encoding=None, chunk_size=100):
> >     n_chunk_size = n * chunk_size
>
> Why use tiny chunks? You can read 4KiB as fast as 100 bytes as it's typically
> the smallest size the file system will allocate.
> I tend to read in multiples of MiB as it's near instant.

Well, I tested on a little file, a list of my preferred pizzas, so

> >     pos = os.stat(filepath).st_size
>
> You cannot mix POSIX API with text mode.
> pos is in bytes from the start of the file.
> Text mode will be in code points. bytes != code points.
>
> >     chunk_line_pos = -1
> >     lines_not_found = n
> >
> >     with open(filepath, newline=newline, encoding=encoding) as f:
> >         text = ""
> >
> >         hard_mode = False
> >
> >         if newline == None:
> >             newline = _lf
> >         elif newline == "":
> >             hard_mode = True
> >
> >         if hard_mode:
> >             while pos != 0:
> >                 pos -= n_chunk_size
> >
> >                 if pos < 0:
> >                     pos = 0
> >
> >                 f.seek(pos)
>
> In text mode you can only seek to a value returned from f.tell(), otherwise
> the behaviour is undefined.

Why? I don't see any recommendation about it in the docs:
https://docs.python.org/3/library/io.html#io.IOBase.seek

> >                 text = f.read()
>
> You have on limit on the amount of data read.

I explained that previously. Anyway, chunk_size is small, so it's not a great problem.

> >                 lf_after = False
> >
> >                 for i, char in enumerate(reversed(text)):
>
> Simply use text.rindex('\n') or text.rfind('\n') for speed.

I can't use them when I have to find both \n or \r. So I preferred to simplify the code and use the for cycle every time. Take into mind anyway that this is a prototype for a Python C API implementation (builtin I hope, or a C extension if not).

> > Shortly, the file is always opened in text mode. The file is read at the end
> > in bigger and bigger chunks, until the file is finished or all the lines are
> > found.
>
> It will fail if the contents is not ASCII.

Why?

> > Why? Because in encodings that have more than 1 byte per character, reading
> > a chunk of n bytes, then reading the previous chunk, can eventually split
> > the character between the chunks in two distinct bytes.
>
> No it cannot. Text mode only knows how to return code points. Now if you are
> in binary it could be split, but you are not in binary mode so it cannot.

From the docs:

seek(offset, whence=SEEK_SET)
    Change the stream position to the given byte offset.

> > Do you think there are chances to get this function as a method of the file
> > object in CPython? The method for a file object opened in bytes mode is
> > simpler, since there's no encoding and newline is only \n in that case.
>
> State your requirements. Then see if your implementation meets them.

The method should return the last n lines from a file object. If the file object is in text mode, the newline parameter must be honored. If the file object is in binary mode, a newline is always b"\n", to be consistent with readline.

I suppose the current implementation of tail satisfies the requirements for text mode. The previous one satisfied binary mode.

Anyway, apart from my implementation, I'm curious whether you think a tail method is worth adding to the builtin file objects in CPython.
-- https://mail.python.org/mailman/listinfo/python-list
Re: tail
I think I've _almost_ found a simpler, general way:

import os

_lf = "\n"
_cr = "\r"


def tail(filepath, n=10, newline=None, encoding=None, chunk_size=100):
    n_chunk_size = n * chunk_size
    pos = os.stat(filepath).st_size
    chunk_line_pos = -1
    lines_not_found = n

    with open(filepath, newline=newline, encoding=encoding) as f:
        text = ""

        hard_mode = False

        if newline == None:
            newline = _lf
        elif newline == "":
            hard_mode = True

        if hard_mode:
            while pos != 0:
                pos -= n_chunk_size

                if pos < 0:
                    pos = 0

                f.seek(pos)
                text = f.read()
                lf_after = False

                for i, char in enumerate(reversed(text)):
                    if char == _lf:
                        lf_after = True
                    elif char == _cr:
                        lines_not_found -= 1
                        newline_size = 2 if lf_after else 1
                        lf_after = False
                    elif lf_after:
                        lines_not_found -= 1
                        newline_size = 1
                        lf_after = False

                    if lines_not_found == 0:
                        chunk_line_pos = len(text) - 1 - i + newline_size
                        break

                if lines_not_found == 0:
                    break
        else:
            while pos != 0:
                pos -= n_chunk_size

                if pos < 0:
                    pos = 0

                f.seek(pos)
                text = f.read()

                for i, char in enumerate(reversed(text)):
                    if char == newline:
                        lines_not_found -= 1

                        if lines_not_found == 0:
                            chunk_line_pos = len(text) - 1 - i + len(newline)
                            break

                if lines_not_found == 0:
                    break

    if chunk_line_pos == -1:
        chunk_line_pos = 0

    return text[chunk_line_pos:]

Shortly, the file is always opened in text mode. The file is read at the end in bigger and bigger chunks, until the file is finished or all the lines are found.

Why? Because in encodings that have more than 1 byte per character, reading a chunk of n bytes, then reading the previous chunk, can eventually split the character between the chunks in two distinct bytes.

I think one can read chunk by chunk and test the chunk junction problem. I suppose the code will be faster this way. Anyway, it seems that this trick is quite fast anyway and it's a lot simpler.

The final result is read from the chunk, and not from the file, so there's no problem of misalignment of bytes and text.

Furthermore, the builtin encoding parameter is used, so this should work with all the encodings (untested).

Furthermore, a newline parameter can be specified, as in open(). If it's equal to the empty string, things are a little more complicated; anyway, I suppose the code is clear. It's untested too. I only tested with a UTF-8 Linux file.

Do you think there are chances to get this function as a method of the file object in CPython? The method for a file object opened in bytes mode is simpler, since there's no encoding and newline is only \n in that case.
-- https://mail.python.org/mailman/listinfo/python-list
Re: tail
On Sat, 7 May 2022 at 19:02, MRAB wrote:
>
> On 2022-05-07 17:28, Marco Sulla wrote:
> > On Sat, 7 May 2022 at 16:08, Barry wrote:
> >> You need to handle the file in bin mode and do the handling of line
> >> endings and encodings yourself. It's not that hard for the cases you
> >> wanted.
> >
> > >>> "\n".encode("utf-16")
> > b'\xff\xfe\n\x00'
> > >>> "".encode("utf-16")
> > b'\xff\xfe'
> > >>> "a\nb".encode("utf-16")
> > b'\xff\xfea\x00\n\x00b\x00'
> > >>> "\n".encode("utf-16").lstrip("".encode("utf-16"))
> > b'\n\x00'
> >
> > Can I use the last trick to get the encoding of a LF or a CR in any
> > encoding?
>
> In the case of UTF-16, it's 2 bytes per code unit, but those 2 bytes
> could be little-endian or big-endian.
>
> As you didn't specify which you wanted, it defaulted to little-endian
> and added a BOM (U+FEFF).
>
> If you specify which endianness you want with "utf-16le" or "utf-16be",
> it won't add the BOM:
>
> >>> # Little-endian.
> >>> "\n".encode("utf-16le")
> b'\n\x00'
> >>> # Big-endian.
> >>> "\n".encode("utf-16be")
> b'\x00\n'

Well, ok, but I need a generic method to get LF and CR for any encoding a user can input. Do you think that

"\n".encode(encoding).lstrip("".encode(encoding))

is good for any encoding? Furthermore, is there a way to get the encoding of an opened file object?
-- https://mail.python.org/mailman/listinfo/python-list
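The BOM-stripping trick from the thread can be checked against a few encodings (my own sketch; `encoded_newline` is a hypothetical helper). Encoding an empty string yields just the BOM, if any, so lstrip'ing those bytes from an encoded "\n" leaves the newline's raw byte sequence:

```python
def encoded_newline(encoding):
    # "".encode(enc) is the BOM alone (or b""); strip those bytes
    # from the front of an encoded "\n" to get the bare newline bytes.
    bom = "".encode(encoding)
    return "\n".encode(encoding).lstrip(bom)


for enc in ("utf-8", "utf-16", "utf-16le", "utf-16be", "utf-32le"):
    print(enc, encoded_newline(enc))
```

This relies on the newline's first byte never coinciding with a BOM byte, which holds for the common Unicode encodings but is worth keeping in mind as an assumption.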
Re: tail
On Sat, 7 May 2022 at 16:08, Barry wrote:
> You need to handle the file in bin mode and do the handling of line endings
> and encodings yourself. It’s not that hard for the cases you wanted.

>>> "\n".encode("utf-16")
b'\xff\xfe\n\x00'
>>> "".encode("utf-16")
b'\xff\xfe'
>>> "a\nb".encode("utf-16")
b'\xff\xfea\x00\n\x00b\x00'
>>> "\n".encode("utf-16").lstrip("".encode("utf-16"))
b'\n\x00'

Can I use the last trick to get the encoded form of a LF or a CR in any encoding?
-- https://mail.python.org/mailman/listinfo/python-list
Re: tail
On Sat, 7 May 2022 at 01:03, Dennis Lee Bieber wrote:
>
> Windows also uses \r\n for the EOL marker, but Python's I/O system
> condenses that to just \n internally (for TEXT mode) -- so using the
> length of a string so read to compute a file position may be off-by-one
> for each EOL in the string.

So there's no way to reliably read lines in reverse in text mode using seek and read, and the only option is readlines?
-- https://mail.python.org/mailman/listinfo/python-list
Re: tail
I have a little problem. I tried to extend the tail function, so it can read lines from the bottom of a file object opened in text mode. The problem is it does not work: it gets a starting position that is lower than expected by 3 characters, so the first line is read only for 2 chars, and the last line is missing.

import os

_lf = "\n"
_cr = "\r"

_lf_ord = ord(_lf)

def tail(f, n=10, chunk_size=100):
    n_chunk_size = n * chunk_size
    pos = os.stat(f.fileno()).st_size
    chunk_line_pos = -1
    lines_not_found = n
    binary_mode = "b" in f.mode
    lf = _lf_ord if binary_mode else _lf

    while pos != 0:
        pos -= n_chunk_size

        if pos < 0:
            pos = 0

        f.seek(pos)
        chars = f.read(n_chunk_size)

        for i, char in enumerate(reversed(chars)):
            if char == lf:
                lines_not_found -= 1

                if lines_not_found == 0:
                    chunk_line_pos = len(chars) - i - 1
                    print(chunk_line_pos, i)
                    break

        if lines_not_found == 0:
            break

    line_pos = pos + chunk_line_pos + 1
    f.seek(line_pos)
    res = b"" if binary_mode else ""

    for i in range(n):
        res += f.readline()

    return res

Maybe the problem is 1 char != 1 byte?
-- https://mail.python.org/mailman/listinfo/python-list
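The closing guess in the post above is easy to check: in text mode, seek() positions are byte offsets while read(n) counts characters, so arithmetic mixing the two drifts as soon as any character encodes to more than one byte (or a \r\n is collapsed to \n). A minimal demonstration (the sample string is mine; a three-character drift like the one reported could come from exactly three multi-byte characters):

```python
s = "héllo\n" * 3
encoded = s.encode("utf-8")

# "é" takes two bytes in UTF-8, so character counts and byte
# offsets disagree by one per occurrence: three in total here.
print(len(s))        # 18 characters
print(len(encoded))  # 21 bytes
assert len(encoded) - len(s) == 3
```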
Re: tail
On Mon, 2 May 2022 at 00:20, Cameron Simpson wrote:
>
> On 01May2022 18:55, Marco Sulla wrote:
> >Something like this is OK?
> [...]
> >def tail(f):
> >    chunk_size = 100
> >    size = os.stat(f.fileno()).st_size
>
> I think you want os.fstat().

It's the same from py 3.3

> >    chunk_line_pos = -1
> >    pos = 0
> >
> >    for pos in positions:
> >        f.seek(pos)
> >        chars = f.read(chunk_size)
> >        chunk_line_pos = chars.rfind(b"\n")
> >
> >        if chunk_line_pos != -1:
> >            break
>
> Normal text files _end_ in a newline. I'd expect this to stop immediately
> at the end of the file.

I think it's correct. The last line in this case is an empty bytes.

> >    if chunk_line_pos == -1:
> >        nbytes = pos
> >        pos = 0
> >        f.seek(pos)
> >        chars = f.read(nbytes)
> >        chunk_line_pos = chars.rfind(b"\n")
>
> I presume this is because unless you're very lucky, 0 will not be a
> position in the range(). I'd be inclined to avoid duplicating this code
> and special case and instead maybe make the range unbounded and do
> something like this:
>
>     if pos < 0:
>         pos = 0
>     ... seek/read/etc ...
>     if pos == 0:
>         break
>
> around the for-loop body.

Yes, I was not very happy to duplicate the code... I have to think about it.

> Seems sane. I haven't tried to run it.

Thank you ^^
-- https://mail.python.org/mailman/listinfo/python-list
Re: tail
Ok, I suppose \n and \r are enough:

readline(size=-1, /)
    Read and return one line from the stream. If size is specified, at
    most size bytes will be read.

    The line terminator is always b'\n' for binary files; for text files,
    the newline argument to open() can be used to select the line
    terminator(s) recognized.

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
    [...]
    newline controls how universal newlines mode works (it only applies
    to text mode). It can be None, '', '\n', '\r', and '\r\n'
-- https://mail.python.org/mailman/listinfo/python-list
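The quoted newline parameter is easy to see in action: with newline='' no translation happens on reading, while the default (None) collapses \r\n to \n. A quick check (file path is a throwaway temp file):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "newlines.txt")
with open(path, "wb") as f:
    f.write(b"one\r\ntwo\n")

with open(path, newline="") as f:   # '': recognize all endings, no translation
    assert f.read() == "one\r\ntwo\n"

with open(path) as f:               # None: translate \r\n (and \r) to \n
    assert f.read() == "one\ntwo\n"
```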
Re: tail
On Mon, 2 May 2022 at 18:31, Stefan Ram wrote:
>
> |The Unicode standard defines a number of characters that
> |conforming applications should recognize as line terminators:[7]
> |
> |LF:    Line Feed, U+000A
> |VT:    Vertical Tab, U+000B
> |FF:    Form Feed, U+000C
> |CR:    Carriage Return, U+000D
> |CR+LF: CR (U+000D) followed by LF (U+000A)
> |NEL:   Next Line, U+0085
> |LS:    Line Separator, U+2028
> |PS:    Paragraph Separator, U+2029
> |
> Wikipedia "Newline".

Should I suppose that other encodings may have more line ending chars?
-- https://mail.python.org/mailman/listinfo/python-list
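For what it's worth, str.splitlines() already recognizes the whole Unicode list quoted above, while universal-newlines readline() only treats \n, \r and \r\n as terminators. The difference matters for a reverse reader:

```python
s = "a\nb\vc\fd\re\x85f\u2028g\u2029h"

# splitlines() honors all the Unicode line boundaries...
assert s.splitlines() == list("abcdefgh")

# ...but a plain split on "\n" (like readline in text mode, which
# only knows \n/\r/\r\n) sees just one terminator in this string.
assert len(s.split("\n")) == 2
```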
Re: new sorting algorithm
I suppose you should write to python-...@python.org , or in https://discuss.python.org/ under the section Core development -- https://mail.python.org/mailman/listinfo/python-list
Re: tail
Is something like this OK?

import os

def tail(f):
    chunk_size = 100
    size = os.stat(f.fileno()).st_size

    positions = iter(range(size, -1, -chunk_size))
    next(positions)

    chunk_line_pos = -1
    pos = 0

    for pos in positions:
        f.seek(pos)
        chars = f.read(chunk_size)
        chunk_line_pos = chars.rfind(b"\n")

        if chunk_line_pos != -1:
            break

    if chunk_line_pos == -1:
        nbytes = pos
        pos = 0
        f.seek(pos)
        chars = f.read(nbytes)
        chunk_line_pos = chars.rfind(b"\n")

    if chunk_line_pos == -1:
        line_pos = pos
    else:
        line_pos = pos + chunk_line_pos + 1

    f.seek(line_pos)
    return f.readline()

This is simply for one line and for utf8.
-- https://mail.python.org/mailman/listinfo/python-list
Re: tail
On Sun, 24 Apr 2022 at 11:21, Roel Schroeven wrote:
> dn schreef op 24/04/2022 om 0:04:
> > Disagreeing with @Chris in the sense that I use tail very frequently,
> > and usually in the context of server logs - but I'm talking about the
> > Linux implementation, not Python code!
>
> If I understand Marco correctly, what he wants is to read the lines from
> bottom to top, i.e. tac instead of tail, despite his subject.
> I use tail very frequently too, but tac is something I almost never use.

Well, the inverse reader is only a secondary suggestion. I suppose a tail is much more useful.
-- https://mail.python.org/mailman/listinfo/python-list
Re: tail
On Sun, 24 Apr 2022 at 00:19, Cameron Simpson wrote:
> An approach I think you both may have missed: mmap the file and use
> mmap.rfind(b'\n') to locate line delimiters.
> https://docs.python.org/3/library/mmap.html#mmap.mmap.rfind

Ah, I've played very little with mmap; I didn't know about this. So I suppose you can locate the newline and at that point read the line without using chunks?
-- https://mail.python.org/mailman/listinfo/python-list
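A minimal sketch of the mmap idea suggested above (function name and details are mine; it assumes a non-empty file, since mmap can't map zero-length files on some platforms):

```python
import mmap

def tail_mmap(path, n=1):
    # Locate the last n lines by scanning backwards with mmap.rfind;
    # no chunk bookkeeping needed.
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            end = len(mm)
            if end and mm[end - 1:end] == b"\n":
                end -= 1  # ignore the newline that terminates the file
            pos = end
            for _ in range(n):
                pos = mm.rfind(b"\n", 0, pos)
                if pos == -1:
                    break  # fewer than n lines: return everything
            return mm[pos + 1:end]
```

For example, on a file containing b"a\nb\nc\n", tail_mmap(path, 2) returns b"b\nc".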
Re: tail
On Sat, 23 Apr 2022 at 23:18, Chris Angelico wrote:
> Ah. Well, then, THAT is why it's inefficient: you're seeking back one
> single byte at a time, then reading forwards. That is NOT going to
> play nicely with file systems or buffers.
>
> Compare reading line by line over the file with readlines() and you'll
> see how abysmal this is.
>
> If you really only need one line (which isn't what your original post
> suggested), I would recommend starting with a chunk that is likely to
> include a full line, and expanding the chunk until you have that
> newline. Much more efficient than one byte at a time.

Well, I would like to have a sort of tail, so to generalise to more than one line. But I think that once you have a good algorithm for one line, you can repeat it N times. I understand that you can read a chunk instead of a single byte, so when the newline is found you can return all the cached chunks concatenated. But will this make the search for the start of the line faster? I suppose you still have to read byte by byte (or more, if you're using utf16 etc.) and see if there's a newline.
-- https://mail.python.org/mailman/listinfo/python-list
Re: tail
On Sat, 23 Apr 2022 at 23:00, Chris Angelico wrote:
>
> > > This is quite inefficient in general.
> >
> > Why inefficient? I think that readlines() will be much slower, not
> > only more time consuming.
>
> It depends on which is more costly: reading the whole file (cost
> depends on size of file) or reading chunks and splitting into lines
> (cost depends on how well you guess at chunk size). If the lines are
> all *precisely* the same number of bytes each, you can pick a chunk
> size and step backwards with near-perfect efficiency (it's still
> likely to be less efficient than reading a file forwards, on most file
> systems, but it'll be close); but if you have to guess, adjust, and
> keep going, then you lose efficiency there.

Emh, why chunks? My function simply reads byte by byte and compares it to b"\n". When it finds it, it stops and does a readline():

import os

def tail(filepath):
    """
    @author Marco Sulla
    @date May 31, 2016
    """
    try:
        filepath.is_file
        fp = str(filepath)
    except AttributeError:
        fp = filepath

    with open(fp, "rb") as f:
        size = os.stat(fp).st_size
        start_pos = 0 if size - 1 < 0 else size - 1

        if start_pos != 0:
            f.seek(start_pos)
            char = f.read(1)

            if char == b"\n":
                start_pos -= 1
                f.seek(start_pos)

        if start_pos == 0:
            f.seek(start_pos)
        else:
            for pos in range(start_pos, -1, -1):
                f.seek(pos)
                char = f.read(1)

                if char == b"\n":
                    break

        return f.readline()

This is only for one line and in utf8, but it can be generalised.
-- https://mail.python.org/mailman/listinfo/python-list
Re: tail
On Sat, 23 Apr 2022 at 20:59, Chris Angelico wrote:
>
> On Sun, 24 Apr 2022 at 04:37, Marco Sulla wrote:
> >
> > What about introducing a method for text streams that reads the lines
> > from the bottom? Java also has a ReversedLinesFileReader in Apache
> > Commons IO.
>
> It's fundamentally difficult to get precise. In general, there are
> three steps to reading the last N lines of a file:
>
> 1) Find out the size of the file (currently, if it's being grown)
> 2) Seek to the end of the file, minus some threshold that you hope
> will contain a number of lines
> 3) Read from there to the end of the file, split it into lines, and
> keep the last N
>
> Reading the preceding N lines is basically a matter of repeating the
> same exercise, but instead of "end of the file", use the byte position
> of the line you last read.
>
> The problem is, seeking around in a file is done by bytes, not
> characters. So if you know for sure that you can resynchronize
> (possible with UTF-8, not possible with some other encodings), then
> you can do this, but it's probably best to build it yourself (opening
> the file in binary mode).

Well, indeed I have an implementation that does more or less what you described, for utf8 only. The only difference is that I just started from the end of the file minus 1. I'm just wondering if this would be useful in the stdlib. I think it's not too difficult to generalise for every encoding.

> This is quite inefficient in general.

Why inefficient? I think that readlines() will be much slower, not only more time consuming.
-- https://mail.python.org/mailman/listinfo/python-list
Re: Receive a signal when waking or suspending?
I don't know how to do it in Python, but maybe you can create a script that writes to a named pipe and read it from Python? https://askubuntu.com/questions/226278/run-script-on-wakeup
-- https://mail.python.org/mailman/listinfo/python-list
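A sketch of that idea, with a thread standing in for the external wake-up script (the FIFO path and event name are made up; POSIX-only, since it uses os.mkfifo):

```python
import os
import tempfile
import threading

# A named pipe as the channel between a system hook and Python.
fifo = os.path.join(tempfile.mkdtemp(), "power-events")
os.mkfifo(fifo)

def fake_hook():
    # Stand-in for a wakeup script that would do: echo wakeup > $fifo
    with open(fifo, "w") as f:
        f.write("wakeup\n")

threading.Thread(target=fake_hook).start()

# open() on a FIFO blocks until the writer side connects.
with open(fifo) as f:
    event = f.readline().strip()

print(event)  # wakeup
```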
tail
What about introducing a method for text streams that reads the lines from the bottom? Java also has a ReversedLinesFileReader, in Apache Commons IO.
-- https://mail.python.org/mailman/listinfo/python-list
Re: Suggestion for Linux Distro (from PSA: Linux vulnerability)
On Sat, 16 Apr 2022 at 17:14, Peter J. Holzer wrote: > > On 2022-04-16 16:49:17 +0200, Marco Sulla wrote: > > Furthermore, you didn't answer my simple question: why does the > > security update package contain metadata about Debian patches, if the > > Ubuntu security team did not benefit from Debian security patches but > > only from internal work? > > It DOES NOT contain metadata about Debian patches. You are > misinterpreting the name "debian". The directory has this name because > the tools (dpkg, quilt, etc.) were originally written by the Debian team > for the Debian distribution. Ubuntu uses the same tools. They didn't > bother to rename the directory (why should they?), so the directory is > still called "debian" on Ubuntu (and yes I know this because I've built > numerous .deb packages on Ubuntu systems). Ah ok, now I understand. Sorry for the confusion. -- https://mail.python.org/mailman/listinfo/python-list
Re: Suggestion for Linux Distro (from PSA: Linux vulnerability)
On Sat, 16 Apr 2022 at 10:15, Peter J. Holzer wrote:
> It doesn't (or at least you can't conclude that from the evidence you
> posted).
>
> There is a subdirectory called "debian" in the build directory of every
> .deb package. This is true on Debian, Ubuntu and every other
> distribution which uses the .deb package format. This directory is
> required by the build tools and it contains all the data (e.g. build
> instructions, dependencies, patches, description, extra documentation)
> which was added by the packager. The name of the directory does not
> imply that any of the files there was created by Debian. I have built
> quite a few packages myself and I'm not a member of the Debian team.

Actually I don't care if the package was made by Debian. I'm sure it was not, since the Ubuntu packages use a different terminology in their versions. For example, the git package is version 2.17.1-1ubuntu0.10.

The important fact is that it seems quite evident that the Ubuntu team uses Debian patches to release their security updates: the release notes are public and worldwide, made by a professional company, not by an amateur. Furthermore, I checked all the security updates my system received since we started this discussion, and all of them have release notes that mention security patches made by Debian. Only the security updates carry this information. Is it an amazing coincidence? I suppose not.

Furthermore, you didn't answer my simple question: why does the security update package contain metadata about Debian patches, if the Ubuntu security team did not benefit from Debian security patches but only from internal work? I suppose I have to answer myself: because the patch applied by Ubuntu _is_ actually a Debian patch.

The more interesting fact is that the security updates I checked seem to be nothing but applications of Debian patches. So it seems that the work of the Ubuntu security team consists only of applying Debian security patches. 
If so, probably Debian is really more secure than Ubuntu, since I don't know if Ubuntu applies all of the security patches made by Debian. -- https://mail.python.org/mailman/listinfo/python-list
Re: Why does datetime.timedelta only have the attributes 'days' and 'seconds'?
On Thu, 14 Apr 2022 at 19:16, MRAB wrote: > > When you're working only with dates, timedelta not having a 'days' > attribute would be annoying, especially when you consider that a day is > usually 24 hours, but sometimes 23 or 25 hours (DST). I agree. Furthermore, timedelta is, well, a time delta, not a date with a timezone. How could a timedelta take into account DST, leap seconds etc? About the initial question, I think it's a good question. -- https://mail.python.org/mailman/listinfo/python-list
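As an aside, the days attribute under discussion exists because timedelta normalizes every component to exactly three fields, and stays deliberately timezone-agnostic:

```python
from datetime import timedelta

# timedelta stores only days, seconds and microseconds; everything
# else is normalized into those three on construction.
td = timedelta(hours=50, minutes=30)
assert (td.days, td.seconds, td.microseconds) == (2, 9000, 0)

# Being a plain duration, it knows nothing of DST or leap seconds:
assert timedelta(days=1) == timedelta(hours=24)
```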
Re: Suggestion for Linux Distro (from PSA: Linux vulnerability)
On Wed, 13 Apr 2022 at 20:05, Peter J. Holzer wrote:
>
> On 2022-04-12 21:03:00 +0200, Marco Sulla wrote:
> > On Tue, 29 Mar 2022 at 00:10, Peter J. Holzer wrote:
> > > They are about a year apart, so they will usually contain different
> > > versions of most packages right from the start. So the Ubuntu and
> > > Debian security teams probably can't benefit much from each other.
> >
> > Well, this is what my updater on Lubuntu says to me today:
> >
> > Changes for tcpdump versions:
> > Installed version: 4.9.3-0ubuntu0.18.04.1
> > Available version: 4.9.3-0ubuntu0.18.04.2
> >
> > Version 4.9.3-0ubuntu0.18.04.2:
> >
> >   * SECURITY UPDATE: buffer overflow in read_infile
> >     - debian/patches/CVE-2018-16301.patch: Add check of
> >       file size before allocating and reading content in
> >       tcpdump.c and netdissect-stdinc.h.
> >     - CVE-2018-16301
> >   * SECURITY UPDATE: resource exhaustion with big packets
> >     - debian/patches/CVE-2020-8037.patch: Add a limit to the
> >       amount of space that can be allocated when reading the
> >       packet.
> >     - CVE-2020-8037
> >
> > I use an LTS version. So it seems that Ubuntu benefits from Debian
> > security patches.
>
> Why do you think so? Because the release notes mention
> debian/patches/*.patch?

Of course.

> This may be an artefact of the build process. The build tools for .deb
> packages expect all kinds of meta-data to live in a subdirectory called
> "debian", even on non-debian systems. This includes patches, at least if
> the maintainer is using quilt (which AFAIK is currently the recommended
> tool for that purpose).

And why does the security update package contain metadata about Debian patches, if the Ubuntu security team did not benefit from Debian security patches but only from internal work?

> OTOH tcpdump would be one of the those packages where Ubuntu could use a
> Debian patch directly [...]

It doesn't seem so. 
This is a fresh new security update:

Changes for git versions:
Installed version: 1:2.17.1-1ubuntu0.9
Available version: 1:2.17.1-1ubuntu0.10

Version 1:2.17.1-1ubuntu0.10:

  * SECURITY UPDATE: Run commands in diff users
    - debian/patches/CVE-2022-24765-*.patch: fix GIT_CEILING_DIRECTORIES;
      add an owner check for the top-level-directory; add a function to
      determine whether a path is owned by the current user in patch.c,
      t/t0060-path-utils.sh, setup.c, compat/mingw.c, compat/mingw.h,
      git-compat-util.hi, config.c, config.h.
    - CVE-2022-24765

I checked packages.debian.org and git 2.17 was never on Debian:

Package git

stretch (oldoldstable) (vcs): fast, scalable, distributed revision control system
    1:2.11.0-3+deb9u7: amd64 arm64 armel armhf i386 mips mips64el mipsel ppc64el s390x
stretch-backports (vcs): fast, scalable, distributed revision control system
    1:2.20.1-1~bpo9+1: amd64 arm64 armel armhf i386 mips mips64el mipsel ppc64el s390x
buster (oldstable) (vcs): fast, scalable, distributed revision control system
    1:2.20.1-2+deb10u3: amd64 arm64 armel armhf i386 mips mips64el mipsel ppc64el s390x

etc.

https://packages.debian.org/search?keywords=git
-- https://mail.python.org/mailman/listinfo/python-list
Re: Suggestion for Linux Distro (from PSA: Linux vulnerability)
On Tue, 29 Mar 2022 at 00:10, Peter J. Holzer wrote:
> They are about a year apart, so they will usually contain different
> versions of most packages right from the start. So the Ubuntu and Debian
> security teams probably can't benefit much from each other.

Well, this is what my updater on Lubuntu says to me today:

Changes for tcpdump versions:
Installed version: 4.9.3-0ubuntu0.18.04.1
Available version: 4.9.3-0ubuntu0.18.04.2

Version 4.9.3-0ubuntu0.18.04.2:

  * SECURITY UPDATE: buffer overflow in read_infile
    - debian/patches/CVE-2018-16301.patch: Add check of
      file size before allocating and reading content in
      tcpdump.c and netdissect-stdinc.h.
    - CVE-2018-16301
  * SECURITY UPDATE: resource exhaustion with big packets
    - debian/patches/CVE-2020-8037.patch: Add a limit to the
      amount of space that can be allocated when reading the
      packet.
    - CVE-2020-8037

I use an LTS version. So it seems that Ubuntu benefits from Debian security patches. Not sure about the contrary.
-- https://mail.python.org/mailman/listinfo/python-list
Re: dict.get_deep()
On Sun, 3 Apr 2022 at 21:46, Peter J. Holzer wrote:
>
> > data.get_deep("users", 0, "address", "street", default="second star")
>
> Yep. Did that, too. Plus pass the final result through a function before
> returning it.

I didn't understand. Have you added a func parameter?

> I'm not sure whether I considered this when I wrote it, but a function
> has the advantage of working with every class which can be indexed. A
> method must be implemented on any class (so at least dict and list to be
> useful).

You're right, but where to put it? I don't know if an iterableutil package exists. If included in the stdlib, I don't know where to put it. In collections maybe?

PS: if you're interested, here is my implementation:

    def get_deep(self, *args, default=_sentinel):
        r"""
        Get a nested element of the dictionary.

        The method accepts multiple arguments or a single one. If a
        single argument is passed, it must be an iterable. This
        represents the keys or indexes of the nested element.

        The method first tries to get the value v1 of the dict using the
        first key. If it finds v1 and there's no other key, v1 is
        returned. Otherwise, the method tries to retrieve the value from
        v1 associated with the second key/index, and so on. If at any
        point, for any reason, the value can't be retrieved, the
        `default` parameter is returned if specified. Otherwise, a
        KeyError or an IndexError is raised.
        """

        if len(args) == 1:
            single = True
            it_tpm = args[0]

            try:
                len(it_tpm)
                it = it_tpm
            except Exception:
                # maybe it's a generator
                try:
                    it = tuple(it_tpm)
                except Exception:
                    err = (
                        f"`{self.get_deep.__name__}` called with a single " +
                        "argument supports only iterables"
                    )
                    raise TypeError(err) from None
        else:
            it = args
            single = False

        if not it:
            if single:
                raise ValueError(
                    f"`{self.get_deep.__name__}` argument is empty"
                )
            else:
                raise TypeError(
                    f"`{self.get_deep.__name__}` expects at least one argument"
                )

        obj = self

        for k in it:
            try:
                obj = obj[k]
            except (KeyError, IndexError) as e:
                if default is _sentinel:
                    raise e from None

                return default

        return obj
-- https://mail.python.org/mailman/listinfo/python-list
Re: dict.get_deep()
On Sun, 3 Apr 2022 at 18:57, Dieter Maurer wrote:
> You know you can easily implement this yourself -- in your own
> `dict` subclass.

Well, of course, but the question is whether such a method deserves to be builtin, in a world imbued with JSON. I suppose your answer is no.
-- https://mail.python.org/mailman/listinfo/python-list
Re: dict.get_deep()
On Sun, 3 Apr 2022 at 16:59, Kirill Ratkin via Python-list wrote:
>
> Hi Marco.
>
> Recently I met the same issue. A service I integrated with was
> documented badly and sent ... unpredictable jsons.
>
> And pattern matching helped me in a first solution. (later I switched to
> Pydantic models)
>
> For your example I'd make a match rule for the key path you need. For
> example:
>
> data = {"users": [{"address": {"street": "Baker"}}]}
>
> match data:
>     case {"users": [{"address": {"street": street}}]}:
>         print(f"street: {street}")
>     case _:
>         print("unsupported message structure")

Hi. I think your solution is very brilliant, but I'm a bit allergic to pattern matching... :D Maybe it's me, but I find it really strange and "magical".
-- https://mail.python.org/mailman/listinfo/python-list
dict.get_deep()
A proposal. Very often dicts are used as a deeply nested carrier of data, usually decoded from JSON. Sometimes I need to get some of this data, something like this:

data["users"][0]["address"]["street"]

What about something like this instead?

data.get_deep("users", 0, "address", "street")

And also, instead of this:

try:
    result = data["users"][0]["address"]["street"]
except (KeyError, IndexError):
    result = "second star"

write this:

data.get_deep("users", 0, "address", "street", default="second star")

?
-- https://mail.python.org/mailman/listinfo/python-list
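A rough sketch of the proposed semantics as a free function (the function name and sentinel are mine, for illustration only):

```python
_SENTINEL = object()

def get_deep(data, *keys, default=_SENTINEL):
    # Walk the nested structure key by key; on the first failed
    # lookup, return the default if given, else re-raise.
    for key in keys:
        try:
            data = data[key]
        except (KeyError, IndexError):
            if default is _SENTINEL:
                raise
            return default
    return data

data = {"users": [{"address": {"street": "Baker"}}]}
assert get_deep(data, "users", 0, "address", "street") == "Baker"
assert get_deep(data, "users", 1, default="second star") == "second star"
```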
Re: Suggestion for Linux Distro (from PSA: Linux vulnerability)
On Thu, 31 Mar 2022 at 18:38, Cecil Westerhof via Python-list wrote:
> Most people think that
> Ubuntu is that also, because it is based on Debian. But Ubuntu wants
> to also provide the newest versions of software, and this will affect
> the stability and security negatively.

I think you're referring to the fact that Ubuntu releases a new stable version every 6 months, while Debian does every 2 years. This is true, but Ubuntu also releases an LTS every 2 years. You can install an LTS and change the options so you'll upgrade the system only when a new LTS comes out. Furthermore, you're not forced to upgrade immediately; you can do it when the LTS reaches its end of life. On the other hand, you can live on the edge with Debian too: you can install the unstable branch.

Furthermore, there's the company factor. According to Google, Debian has about 1k devs, while Ubuntu only about 250. But these devs work full time on Ubuntu, and they are paid. I'm not sure this is an unimportant point. For what I know, historically the distros with the reputation of being more stable are distros maintained by companies, Red Hat and Gentoo for example.

About stability and security, I can't disagree. But I suppose the people that use the unstable version of some Linux distro are useful for testing and reporting bugs, security ones included. So they contribute to the stable versions, and I think we have to be grateful to these "pioneers".
-- https://mail.python.org/mailman/listinfo/python-list
Re: Temporally disabling buffering
Dirty suggestion: stderr?

On Thu, 31 Mar 2022 at 18:38, Cecil Westerhof via Python-list wrote:
>
> In Python when the output of a script is going to a pipe stdout is
> buffered. When sending output to tee that is very inconvenient.
>
> We can set PYTHONUNBUFFERED, but then stdout is always unbuffered.
>
> On Linux we can do:
> PYTHONUNBUFFERED=T script.py | tee script.log
>
> Now the output is only unbuffered for the current run and buffered for
> other runs where the output goes to a pipe.
>
> --
> Cecil Westerhof
> Senior Software Engineer
> LinkedIn: http://www.linkedin.com/in/cecilwesterhof
-- https://mail.python.org/mailman/listinfo/python-list
Re: Suggestion for Linux Distro (from PSA: Linux vulnerability)
On Tue, 29 Mar 2022 at 00:10, Peter J. Holzer wrote:
> They are about a year apart, so they will usually contain different
> versions of most packages right from the start. So the Ubuntu and Debian
> security teams probably can't benefit much from each other.

Are you sure? Since the LTS versions of Debian and Ubuntu last 5 years, I suppose the versions of the packages should overlap at some point in the past.
-- https://mail.python.org/mailman/listinfo/python-list
Re: Best practice for caching hash
On Wed, 16 Mar 2022 at 09:11, Chris Angelico wrote:
> Caching the hash of a
> string is very useful; caching the hash of a tuple, not so much; again
> quoting from the CPython source code:
>
> /* Tests have shown that it's not worth to cache the hash value, see
>    https://bugs.python.org/issue9685 */

This is really interesting. Unluckily I can't use the pyperformance benchmarks: I should use code that uses frozendict, and I suppose that's really hard... Anyway, this discourages me from continuing to store the unhashable outcome, since I should also store the error message. Storing only the hash when the object is hashable is much cheaper, and maybe the extra field is not such a problem, since a dict consumes more space than a tuple:

>>> sys.getsizeof({})
64
>>> sys.getsizeof(())
40
>>> sys.getsizeof({1:1})
232
>>> sys.getsizeof((1,))
48

> I don't know what use-cases frozendicts have, but I would
> suspect that if they are used at all, they'll often be used in cases
> where their keys are identical (for instance, the __dict__ of an
> immutable object type, where the keys are extremely stable across
> objects of the same type).

Well, I tried to implement them as dicts with shared keys, but I abandoned it when Inada optimized dict(another_dict), where another_dict is a compact dict. Since it's compact, you have "only" to memcpy the entries (oversimplification). I tried to do the same trick for the sparse dict structure, but memcpying the keys and the values was not enough: I had to incref every value *two* times, and this slowed down the creation a lot. So I decided to move to the compact structure.
-- https://mail.python.org/mailman/listinfo/python-list
Re: Best practice for caching hash
On Wed, 16 Mar 2022 at 00:59, Chris Angelico wrote:
>
> (Though it's a little confusing; a frozendict has to have nothing but
> immutable objects, yet it permits them to be unhashable?

It can have mutable objects. For example, a key k can have a list v as value. You can modify v, but you can't assign another value w to the key k. It's the same with tuples, as you said: an index i can contain a list l. Since it's a tuple, you can't set another object at the index i, but you can modify the list l.
-- https://mail.python.org/mailman/listinfo/python-list
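The tuple analogy is easy to demonstrate:

```python
t = ([1, 2],)

t[0].append(3)            # fine: the list inside can be mutated
assert t == ([1, 2, 3],)

try:
    t[0] = []             # but rebinding the slot is forbidden
except TypeError:
    pass

try:
    hash(t)               # and a tuple holding a mutable item is unhashable
except TypeError as e:
    assert "unhashable" in str(e)
```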
Re: Best practice for caching hash
On Wed, 16 Mar 2022 at 00:42, Cameron Simpson wrote:
>
> Is it sensible to compute the hash only from the immutable parts?
> Bearing in mind that usually you need an equality function as well and
> it may have the same stability issues.
[...]
> In that case I would be inclined to never raise TypeError at all. I'd
> compute the hash entirely from the keys of the dict and compute equality
> in the normal fashion: identical keys and then equal corresponding
> values. That removes the requirement that values be immutable and/or
> hashable.

Well, I followed PEP 416, so I allowed mutable types for values, as tuple does. Also, a tuple is hashable only if all its values are hashable. The equality function is the same as dict's, with a little modification: I do not check the hash in equality. I could add that: if both hashes are calculated, different from -1, and they differ, False is returned.

> > In this case I currently cache the value -1. The subsequent calls to
> > __hash__() will check if the value is -1. If so, a TypeError is
> > immediately raised.
>
> This will also make these values behave badly in dicts/sets, as they all
> hash to the same bucket.

Not sure I understand. If the hash is -1, the object is not hashable, so it can't be a member of a dict or set.

> You could, you know, cache the original exception.

I thought about it :) What prevented me is that it's another Py_ssize_t to store in memory.
-- https://mail.python.org/mailman/listinfo/python-list
Re: Best practice for caching hash
On Sat, 12 Mar 2022 at 22:37, <2qdxy4rzwzuui...@potatochowder.com> wrote: > Once hashing an object fails, why would an application try again? I can > see an application using a hashable value in a hashable situation again > and again and again (i.e., taking advantage of the cache), but what's > the use case for *repeatedly* trying to use an unhashable value again > and again and again (i.e., taking advantage of a cached failure)? Honestly? Don't know. Maybe because the object is passed to different functions and all of them independently test the hashability? I'm clutching at straws. -- https://mail.python.org/mailman/listinfo/python-list
Re: Suggestion for Linux Distro (from PSA: Linux vulnerability)
On Mon, 14 Mar 2022 at 18:33, Loris Bennett wrote: > I am not sure how different the two situations are. Ubuntu is > presumably relying on the Debian security team as well as other > volunteers and at least one company, namely Canonical. So do you think that Canonical contributes to the LTS security team of Debian? It could be. In this perspective, there should be little difference between Debian and Ubuntu. Debian 11 with XFCE is really tempting... -- https://mail.python.org/mailman/listinfo/python-list
Best practice for caching hash
I have a custom immutable object, and I added a cache for its hash value. The problem is the object can be composed of mutable or immutable objects, so the hash can raise TypeError.

In this case I currently cache the value -1. The subsequent calls to __hash__() will check if the value is -1. If so, a TypeError is immediately raised.

The problem is that the first time I get an error with details, for example:

TypeError: unhashable type: 'list'

The subsequent times I simply raise a generic error:

TypeError

Ok, I can improve it by raising, for example, "TypeError: not all values are hashable". But do you think this is acceptable? Now I'm thinking about it and it seems a little hacky to me.

Furthermore, in the C extension I have to define another field in the struct, ma_hash_calculated, to track whether the hash value is cached, since there's no bogus value I can use in the cache field, ma_hash, to signal this. If I didn't cache unhashable values, -1 could be used to signal that ma_hash contains no cached value.

So if I do not cache when the object is unhashable, I save a little memory per object (one int) and I get a better error message every time. On the other hand, if I leave things as they are, testing the unhashability of the object multiple times is faster. The code:

try:
    hash(o)
except TypeError:
    pass

executes in nanoseconds if called more than one time, even if o is not hashable. Not sure if this is a big advantage. What do you think about it?

Here is the code: https://github.com/Marco-Sulla/python-frozendict/blob/35611f4cd869383678104dc94f82aa636c20eb24/frozendict/src/3_10/frozendictobject.c#L652-L697
-- https://mail.python.org/mailman/listinfo/python-list
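A pure-Python sketch of the strategy being weighed (the class and names are mine; the real implementation is the C code linked above). This version caches the unhashable outcome behind a sentinel and raises the generic message discussed:

```python
_UNHASHABLE = object()

class FrozenBox:
    # Hypothetical immutable container, illustrating caching of both
    # the hash value and the "unhashable" failure.
    def __init__(self, *values):
        self._values = values
        self._hash = None  # not computed yet

    def __hash__(self):
        if self._hash is _UNHASHABLE:
            # Cached failure: the detailed message ("unhashable type:
            # 'list'") is lost; only a generic one survives.
            raise TypeError("not all values are hashable")
        if self._hash is None:
            try:
                self._hash = hash(self._values)
            except TypeError:
                self._hash = _UNHASHABLE
                raise TypeError("not all values are hashable") from None
        return self._hash

box = FrozenBox(1, "a")
assert hash(box) == hash((1, "a"))  # computed once, then cached
```

Dropping the `_UNHASHABLE` branch gives the alternative: one less state to store and a detailed error every time, at the cost of recomputing the failure on each call.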
Re: Suggestion for Linux Distro (from PSA: Linux vulnerability)
On Fri, 11 Mar 2022 at 19:10, Michael Torrie wrote: > Both Debian stable and Ubuntu LTS state they have a five year support > life cycle. Yes, but it seems that official security support in Debian ends after three years: "Debian LTS is not handled by the Debian security team, but by a separate group of volunteers and companies interested in making it a success" https://wiki.debian.org/LTS This is the only problem for me. -- https://mail.python.org/mailman/listinfo/python-list
Re: Suggestion for Linux Distro (from PSA: Linux vulnerability)
On Fri, 11 Mar 2022 at 06:38, Dan Stromberg wrote: > That's an attribute of your desktop environment, not the Linux distribution. > > EG: I'm using Debian with Cinnamon, which does support ctrl-alt-t. Never used Cinnamon. It comes from Mint, right? > Some folks say the desktop environment matters more than the distribution, > when choosing what OS to install. Yes, it's important. I switched from Ubuntu to Xubuntu (then Lubuntu) when Ubuntu started using Unity. I liked GNOME 2 and KDE prior to Plasma. They were simple, lightweight and effective. I found these qualities in XFCE and LXDE. Anyway, I think I'll not install Debian, because its LTS releases are not long enough for me. I don't know if there's a distro based on Debian that has long LTS support, apart from Ubuntu. -- https://mail.python.org/mailman/listinfo/python-list
Re: Suggestion for Linux Distro (from PSA: Linux vulnerability)
On Thu, 10 Mar 2022 at 14:13, Jack Dangler wrote: > or why not get a cloud desktop running whatever distro you want and you > don't have to do anything Three reasons: privacy, speed, price. Not in this order. On Thu, 10 Mar 2022 at 15:20, Chris Angelico wrote: > Very easy. I use Debian with Xfce, and it's an easy thing to add > shortcuts - even dynamically I used Xubuntu for a long time. I like Xfce. On Thu, 10 Mar 2022 at 16:35, Loris Bennett wrote: > The shortcuts are properties of the desktop environment. You could just > install LXDE/LXQt on Debian if that's what you're used to from Lubuntu. I tried LXQt on my desktop. Very disappointed. The OS Update interface is just an "alert". LXDE unluckily is no longer developed. > Of course, if you're too old and lazy to set up a shortcut, you might > also be too old and lazy to install a different desktop environment ;-) Okay, I'm lazy for boring things :D PS: Is it just my impression or is there a plebiscite for Debian? -- https://mail.python.org/mailman/listinfo/python-list
Suggestion for Linux Distro (from PSA: Linux vulnerability)
On Thu, 10 Mar 2022 at 04:50, Michael Torrie wrote: > > On 3/9/22 13:05, Marco Sulla wrote: > > So my laziness pays. I use only LTS distros, and I update only when > > there are security updates. > > PS: any suggestions for a new LTS distro? My Lubuntu is reaching its > > end-of-life. I prefer lightweight debian-like distros. > > Maybe Debian itself? I tried Debian on a VM, but I found it a bit too bare-bones. A little example: it does not have the shortcut ctrl+alt+t to open a terminal that Ubuntu has. I'm quite sure it's simple to add, but I'm getting old and lazy... -- https://mail.python.org/mailman/listinfo/python-list
Re: Could frozendict or frozenmap be of some use for PEP 683 (Immortal objects)?
On Wed, 9 Mar 2022 at 23:28, Martin Di Paola wrote: > Think in the immutable strings (str). What would happen with a program > that does heavy parsing? I imagine that it will generate thousands of > little strings. If those are immortal, the program will fill its memory > very quickly as the GC will not reclaim their memory. Well, as far as I know immortality was also suggested for interned strings. If I understood well, the problem with "normal" strings is that they are not really immutable in CPython. They have cache etc. Also frozendict caches hash, but that cache can be easily removed. -- https://mail.python.org/mailman/listinfo/python-list
Could frozendict or frozenmap be of some use for PEP 683 (Immortal objects)?
As per the title: dict can't be an immortal object, but hashable frozendict and frozenmap can. I think this could increase their usefulness. Another advantage: frozen dataclasses would be really immutable if they could use a frozen(dict|map) instead of a dict as __dict__. -- https://mail.python.org/mailman/listinfo/python-list
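[Editor's note] A quick demonstration of the point about frozen dataclasses: since an instance's __dict__ is a plain dict, "frozen" instances can still be mutated by going around the dataclass's __setattr__ guard (the Point class is made up for illustration):

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Point:
    x: int


p = Point(1)

try:
    p.x = 2                  # blocked by the frozen-dataclass machinery
except FrozenInstanceError:
    pass

p.__dict__["x"] = 2          # ...but the underlying __dict__ is mutable
print(p.x)  # → 2
```

An immutable mapping as __dict__ would close this loophole, which is what the post argues.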
Re: PSA: Linux vulnerability
So my laziness pays. I use only LTS distros, and I update only when there are security updates. PS: any suggestions for a new LTS distro? My Lubuntu is reaching its end-of-life. I prefer lightweight debian-like distros. On Tue, 8 Mar 2022 at 19:56, Ethan Furman wrote: > > https://arstechnica.com/information-technology/2022/03/linux-has-been-bitten-by-its-most-high-severity-vulnerability-in-years/ > > -- > ~Ethan~ > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
Re: Cpython: when to incref before insertdict
On Sun, 6 Mar 2022 at 03:20, Inada Naoki wrote: > In general, when reference is borrowed from a caller, the reference is > available during the API. > But merge_dict borrows reference of key/value from other dict, not caller. > [...] > Again, insertdict takes the reference. So _PyDict_FromKeys() **does** > INCREF before calling insertdict, when key/value is borrowed > reference. > https://github.com/python/cpython/blob/6927632492cbad86a250aa006c1847e03b03e70b/Objects/dictobject.c#L2287-L2290 > https://github.com/python/cpython/blob/6927632492cbad86a250aa006c1847e03b03e70b/Objects/dictobject.c#L2309-L2311 > > On the other hand, slow path uses PyIter_Next() which returns strong > reference. So no need to INCREF it. Thank you Inada, these points make things clear to me now. (PS: dictobject.c will change a lot in 3.11... sigh :D) -- https://mail.python.org/mailman/listinfo/python-list
Re: virtualenv and make DESTDIR=
On Sat, 5 Mar 2022 at 17:36, Barry Scott wrote: > Note: you usually cannot use pip when building an RPM with mock as the > network is disabled inside the build for > security reasons. Couldn't he download the packages beforehand and run pip on the local files? -- https://mail.python.org/mailman/listinfo/python-list
Cpython: when to incref before insertdict
I noticed that some functions inside dictobject.c that call insertdict or PyDict_SetItem do an incref of key and value before the call, and a decref after it. An example is dict_merge. Other functions, such as _PyDict_FromKeys, don't do an incref before. When is an incref of key and value needed before insertdict, and when is it not? And why is the incref needed? -- https://mail.python.org/mailman/listinfo/python-list
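[Editor's note] The contract behind the question (a dict keeps its own strong reference to every key and value it stores, which insertdict must account for) can be observed from Python with the CPython-specific sys.getrefcount:

```python
import sys

x = object()
baseline = sys.getrefcount(x)

d = {}
d["key"] = x
# The dict took its own strong reference to the value
# (in C, PyDict_SetItem does the INCREF for the caller).
assert sys.getrefcount(x) == baseline + 1

del d
# The dict released its reference when it was destroyed.
assert sys.getrefcount(x) == baseline
```

The C-level question is only *who* performs that INCREF: PyDict_SetItem does it internally, while the lower-level insertdict assumes the caller hands over (or has already taken) the reference.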
Re: Error installing requirements
Maybe you compiled Python 2.7 by hand, David? It happened to me when I tried to compile Python without zlib headers installed on my OS. Don't know how it can be done on Windows. -- https://mail.python.org/mailman/listinfo/python-list
Re: How to solve the given problem?
Narshad, I propose you post your questions to StackOverflow. I'm sure they will be very happy. -- https://mail.python.org/mailman/listinfo/python-list
Re: Global VS Local Subroutines
I agree with Chris. I don't know if it was already written: if you want a local function for speed reasons, you can use the classic approach of wrapping your code in a main() function. -- https://mail.python.org/mailman/listinfo/python-list
Re: How do you log in your projects?
On Wed, 9 Feb 2022 at 20:40, Martin Di Paola wrote: > > If the logs are meant to be read by my users I log high level messages, > specially before parts that can take a while (like the classic > "Loading..."). ? Logs are not intended to be read by end users. Logs are primarily used to understand what the code is doing in a production environment. They could also be used to gather metrics data. Why should you log to give a message instead of simply using a print? > For exceptions I print the message but not the traceback. Why? Traceback is vital to understand what and where the problem is. I think you're confusing logs with messages. The stack trace can be logged (I would say must), but the end user generally sees a vague message with some hints, unless the program is used internally only. -- https://mail.python.org/mailman/listinfo/python-list
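[Editor's note] The distinction drawn above (log the full traceback for operators, show the user only a short message) can be sketched with the stdlib logging module; load_config and the paths are made up for illustration:

```python
import logging

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def load_config(path):
    # Hypothetical operation that fails, just to produce a traceback.
    raise FileNotFoundError(path)


try:
    load_config("/etc/myapp.conf")
except FileNotFoundError:
    # logger.exception logs at ERROR level *and* appends the full traceback,
    # which goes to the log for whoever operates the program...
    logger.exception("failed to load configuration")
    # ...while the end user only gets a vague, friendly hint.
    print("Could not start: configuration problem. See the log for details.")
```

This matches the post: the stack trace is logged, the end user sees a short message.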
Re: How do you log in your projects?
These are a lot of questions. I hope we're not off topic. I don't know if mine are best practices. I can tell you what I try to do. On Tue, 8 Feb 2022 at 15:15, Lars Liedtke wrote: > - On a line per line basis? on a function/method basis? I usually log the start and end of functions. I could also log inside a branch or in other parts of the function/method. > - Do you use decorators to mark beginnings and ends of methods/functions > in log files? No, since I put the function parameters in the first log. But I think that such a decorator is not bad. > - Which kind of variable contents do you write into your logfiles? Of > course you shouldn't leak secrets... Well, all the data that is useful to understand what the code is doing. It's better to repeat the essential data that identifies a specific call in all the logs of the function, so that if it is called simultaneously by several clients you can distinguish them. > - How do you decide, which kind of log message goes into which level? It depends on the importance, the verbosity and the frequency of the logs. > - How do you prevent logging cluttering your actual code? I have the opposite problem: I should log more. So I can't answer your question. -- https://mail.python.org/mailman/listinfo/python-list
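[Editor's note] A minimal sketch of the pattern described above — log start and end of a function, and repeat the identifying data on every line so simultaneous calls can be told apart. The function and its arguments are invented for illustration:

```python
import logging

logging.basicConfig(
    format="%(asctime)s %(levelname)s %(message)s",
    level=logging.DEBUG,
)
logger = logging.getLogger(__name__)


def process_order(order_id, amount):
    # order_id is repeated in every log line of this call, so that
    # interleaved logs from concurrent calls can be distinguished.
    logger.debug("process_order start: order_id=%s amount=%s", order_id, amount)
    total = amount * 2  # placeholder for the real work
    logger.debug("process_order end: order_id=%s total=%s", order_id, total)
    return total
```

A decorator could emit the start/end lines automatically, but as the post notes, it would not know which parameters identify the call.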
Re: Waht do you think about my repeated_timer class
You could add a __del__ that calls stop :) On Wed, 2 Feb 2022 at 21:23, Cecil Westerhof via Python-list wrote: > > I need (sometimes) to repeatedly execute a function. For this I wrote > the below class. What do you think about it? > from threading import Timer > > > > class repeated_timer(object): > def __init__(self, fn, interval, start = False): > if not callable(fn): > raise TypeError('{} is not a function'.format(fn)) > self._fn = fn > self._check_interval(interval) > self._interval = interval > self._timer = None > self._is_running = False > if start: > self.start() > > def _check_interval(self, interval): > if not type(interval) in [int, float]: > raise TypeError('{} is not numeric'.format(interval)) > if interval <= 0: > raise ValueError('{} is not greater as 0'.format(interval)) > > def _next(self): > self._timer = Timer(self._interval, self._run) > self._timer.start() > > def _run(self): > self._next() > self._fn() > > def set_interval(self, interval): > self._check_interval(interval) > self._interval = interval > > def start(self): > if not self._is_running: > self._next() > self._is_running = True > > def stop(self): > if self._is_running: > self._timer.cancel() > self._timer = None > self._is_running = False > > -- > Cecil Westerhof > Senior Software Engineer > LinkedIn: http://www.linkedin.com/in/cecilwesterhof > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
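[Editor's note] A self-contained variant of the quoted class with the suggested __del__ added (names slightly changed; a sketch, not a drop-in replacement):

```python
from threading import Timer


class RepeatedTimer:
    def __init__(self, fn, interval):
        if not callable(fn):
            raise TypeError(f"{fn!r} is not callable")
        if not isinstance(interval, (int, float)) or interval <= 0:
            raise ValueError(f"{interval!r} is not a positive number")
        self._fn = fn
        self._interval = interval
        self._timer = None
        self._is_running = False

    def _next(self):
        self._timer = Timer(self._interval, self._run)
        self._timer.start()

    def _run(self):
        self._next()
        self._fn()

    def start(self):
        if not self._is_running:
            self._next()
            self._is_running = True

    def stop(self):
        if self._is_running:
            self._timer.cancel()
            self._timer = None
            self._is_running = False

    def __del__(self):
        # Best-effort cleanup: cancel the pending Timer when the object is
        # collected, so no stray thread keeps rescheduling itself.
        if getattr(self, "_is_running", False):
            self.stop()
```

Note that __del__ is only best-effort: CPython calls it promptly thanks to reference counting, but a reference cycle can delay it, so an explicit stop() remains the reliable way to shut the timer down.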
Re: Why dict.setdefault() has value as optional?
On Wed, 2 Feb 2022 at 14:34, Lars Liedtke wrote: > > This is a quite philosophical queston if you look at it in general: > "What value do you give a variable, that is not set?" Maybe I expressed my question badly. My existential doubt is why setdefault has an optional parameter for the value and not a required parameter. I'm not asking why the default is None. -- https://mail.python.org/mailman/listinfo/python-list
Why dict.setdefault() has value as optional?
Just out of curiosity: why does dict.setdefault() have a default parameter that, well, has a default value (None)? I used setdefault in the past, but I always specified a value. What's the use case of setting None by default? -- https://mail.python.org/mailman/listinfo/python-list
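[Editor's note] For reference, the signature is dict.setdefault(key, default=None). One use case for the implicit None is marking a key as "seen" without caring about its value:

```python
d = {}

# With the default omitted, None is stored and returned.
assert d.setdefault("seen") is None
assert d == {"seen": None}

# The more common form, with an explicit default.
groups = {}
groups.setdefault("a", []).append(1)
groups.setdefault("a", []).append(2)
assert groups == {"a": [1, 2]}
```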
Re: Segfault after deepcopy in a C extension
and I had to Py_INCREF(memo)! Thank you A LOT! On Mon, 31 Jan 2022 at 23:01, Chris Angelico wrote: > On Tue, 1 Feb 2022 at 08:54, Marco Sulla > wrote: > > PyObject* d = PyDict_New(); > > args = PyTuple_New(2); > > PyTuple_SET_ITEM(args, 0, d); > > PyTuple_SET_ITEM(args, 1, memo); > > Py_DECREF(d); > > > > https://docs.python.org/3/c-api/tuple.html#c.PyTuple_SET_ITEM > > SET_ITEM steals a reference, so you'll need to not also decref the > dict yourself. > > ChrisA > -- > https://mail.python.org/mailman/listinfo/python-list > -- https://mail.python.org/mailman/listinfo/python-list
Segfault after deepcopy in a C extension
Well, this is more or less what I'm trying to do. I have an immutable object. I would like copy.deepcopy() to return the object itself if it's hashable. If not, it must return a deepcopy of it. So I tried to implement a __deepcopy__ for the object. It segfaults if the object is not hashable, and I don't understand why. gdb gives me an incomprehensible backtrace. So I tried the old "print at each line", but the segfault does not happen in the function. It happens when I quit the REPL or when I try to inspect the deepcopy. What can I do to further debug it? If someone is interested, this is the code:

PyObject* frozendict_deepcopy(PyObject* self, PyObject* memo) {
    if (PyAnyFrozenDict_CheckExact(self)) {
        frozendict_hash(self);

        if (PyErr_Occurred()) {
            PyErr_Clear();
        } else {
            Py_INCREF(self);
            return self;
        }
    }

    if (! PyAnyFrozenDict_Check(self)) {
        Py_RETURN_NOTIMPLEMENTED;
    }

    PyObject* d = PyDict_New();

    if (d == NULL) {
        return NULL;
    }

    PyObject* copy_module_name = NULL;
    PyObject* copy_module = NULL;
    PyObject* deepcopy_fun = NULL;
    PyObject* args = NULL;
    PyObject* res = NULL;

    if (PyDict_Merge(d, self, 1)) {
        goto end;
    }

    copy_module_name = PyUnicode_FromString("copy");

    if (copy_module_name == NULL) {
        goto end;
    }

    copy_module = PyImport_Import(copy_module_name);

    if (copy_module == NULL) {
        goto end;
    }

    deepcopy_fun = PyObject_GetAttrString(copy_module, "deepcopy");

    if (deepcopy_fun == NULL) {
        goto end;
    }

    args = PyTuple_New(2);

    if (args == NULL) {
        goto end;
    }

    PyTuple_SET_ITEM(args, 0, d);
    PyTuple_SET_ITEM(args, 1, memo);
    res = PyObject_CallObject(deepcopy_fun, args);

end:
    Py_XDECREF(args);
    Py_XDECREF(deepcopy_fun);
    Py_XDECREF(copy_module);
    Py_XDECREF(copy_module_name);
    Py_DECREF(d);
    return res;
}

-- https://mail.python.org/mailman/listinfo/python-list
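[Editor's note] In pure Python, the intended semantics of that __deepcopy__ look roughly like this; Frozen is a toy stand-in for the real C type, invented for illustration:

```python
import copy


class Frozen:
    """Toy immutable mapping, hashable only if all its values are."""

    def __init__(self, data):
        self._data = dict(data)

    def __hash__(self):
        # Raises TypeError if any value is unhashable.
        return hash(frozenset(self._data.items()))

    def __deepcopy__(self, memo):
        try:
            hash(self)                 # hashable => deeply immutable
        except TypeError:
            # Unhashable contents: fall back to a real deep copy.
            return Frozen(copy.deepcopy(self._data, memo))
        return self                    # safe to return the object itself


a = Frozen({1: 2})
assert copy.deepcopy(a) is a           # hashable: same object

b = Frozen({1: []})
assert copy.deepcopy(b) is not b       # unhashable: a real copy
```

The C version must additionally keep every reference count straight, which is where the segfault hides (as the follow-up message found, the memo argument needed a Py_INCREF before being placed in the tuple with PyTuple_SET_ITEM, which steals a reference).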
Re: Pandas or Numpy
On Mon, 24 Jan 2022 at 05:37, Dennis Lee Bieber wrote: > Note that the comparison warns that /indexing/ in pandas can be slow. > If your manipulation is always "apply operationX to columnY" it should be > okay -- but "apply operationX to the nth row of columnY", and repeat for > other rows, is going to be slow. In my small way, I can confirm. In one of my previous jobs, we used numpy and Pandas. Writing code in Pandas is quick, but they had just realised that it was really slow, and they were trying to convert as much Pandas code as possible to numpy. Furthermore, I saw that they were so accustomed to Pandas that they used it for everything, even for simple csv creation, where the built-in csv module is enough. -- https://mail.python.org/mailman/listinfo/python-list
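[Editor's note] For the last point: writing a small csv needs nothing more than the stdlib (an in-memory buffer is used here so the example is self-contained; a real file object works the same way):

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["name", "value"])
writer.writerow(["answer", 42])

print(buf.getvalue())
```

No third-party dependency, no DataFrame construction overhead, just the rows you asked for.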
Re: Why operations between dict views return a set and not a frozenset?
Thank you a lot for letting me understand :)

On Tue, 11 Jan 2022 at 22:09, Peter J. Holzer wrote:
> On 2022-01-11 19:49:20 +0100, Marco Sulla wrote:
> > I think this is what you mean:
> >
> > >>> dis.dis("for _ in {1, 2}: pass")
> >   1           0 SETUP_LOOP              12 (to 14)
> >               2 LOAD_CONST               3 (frozenset({1, 2}))
> >               4 GET_ITER
> >         >>    6 FOR_ITER                 4 (to 12)
> >               8 STORE_NAME               0 (_)
> >              10 JUMP_ABSOLUTE            6
> >         >>   12 POP_BLOCK
> >         >>   14 LOAD_CONST               2 (None)
> >              16 RETURN_VALUE
> > >>> a = {1, 2}
> > >>> dis.dis("for _ in a: pass")
> >   1           0 SETUP_LOOP              12 (to 14)
> >               2 LOAD_NAME                0 (a)
> >               4 GET_ITER
> >         >>    6 FOR_ITER                 4 (to 12)
> >               8 STORE_NAME               1 (_)
> >              10 JUMP_ABSOLUTE            6
> >         >>   12 POP_BLOCK
> >         >>   14 LOAD_CONST               0 (None)
> >              16 RETURN_VALUE
>
> I think you have omitted the part that Chris was hinting at.
>
> >>> dis.dis("a = {1, 2};\nfor _ in a: pass")
>   1           0 LOAD_CONST               0 (1)
>               2 LOAD_CONST               1 (2)
>               4 BUILD_SET                2
>               6 STORE_NAME               0 (a)
>
>   2           8 LOAD_NAME                0 (a)
>              10 GET_ITER
>         >>   12 FOR_ITER                 4 (to 18)
>              14 STORE_NAME               1 (_)
>              16 JUMP_ABSOLUTE           12
>         >>   18 LOAD_CONST               2 (None)
>              20 RETURN_VALUE
>
> Now compare
>
>               2 LOAD_CONST               3 (frozenset({1, 2}))
>
> with
>
>   1           0 LOAD_CONST               0 (1)
>               2 LOAD_CONST               1 (2)
>               4 BUILD_SET                2
>
> and you see the difference between using a frozenset as a constant and
> building a set at runtime.
>
> hp
>
> --
>    _  | Peter J. Holzer    | Story must make more sense than reality.
> |_|_) |                    |
> | |   | h...@hjp.at         | -- Charles Stross, "Creative writing
> __/   | http://www.hjp.at/ |    challenge!"
> --
> https://mail.python.org/mailman/listinfo/python-list

-- https://mail.python.org/mailman/listinfo/python-list
Doc or example about conda custom build?
Sorry for being maybe a little OT. I tried to get help from other Conda users, from chat and from the mailing list without success. I would add a custom build on my conda package. Is there somewhere a doc or an example about it? (Specifically, I want to pass a custom parameter to the setup.py that lets me package only the pure py version of the code.) -- https://mail.python.org/mailman/listinfo/python-list
Re: Pickle segfaults with custom type
Found. I simply forgot: if (PyType_Ready(_Type) < 0) { goto fail; } in the frozendict_exec function for the module. On Fri, 7 Jan 2022 at 20:27, Marco Sulla wrote: > I have a custom implementation of dict using a C extension. All works but > the pickling of views and iter types. Python segfaults if I try to pickle > them. > > For example, I have: > > > static PyTypeObject PyFrozenDictIterKey_Type = { > PyVarObject_HEAD_INIT(NULL, 0) > "frozendict.keyiterator", /* tp_name */ > sizeof(dictiterobject), /* tp_basicsize */ > 0, /* tp_itemsize */ > /* methods */ > (destructor)dictiter_dealloc, /* tp_dealloc */ > 0, /* tp_vectorcall_offset */ > 0, /* tp_getattr */ > 0, /* tp_setattr */ > 0, /* tp_as_async */ > 0, /* tp_repr */ > 0, /* tp_as_number */ > 0, /* tp_as_sequence */ > 0, /* tp_as_mapping */ > PyObject_HashNotImplemented,/* tp_hash */ > 0, /* tp_call */ > 0, /* tp_str */ > PyObject_GenericGetAttr,/* tp_getattro */ > 0, /* tp_setattro */ > 0, /* tp_as_buffer */ > Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC,/* tp_flags */ > 0, /* tp_doc */ > (traverseproc)dictiter_traverse,/* tp_traverse */ > 0, /* tp_clear */ > 0, /* tp_richcompare */ > 0, /* tp_weaklistoffset */ > PyObject_SelfIter, /* tp_iter */ > (iternextfunc)frozendictiter_iternextkey, /* tp_iternext */ > dictiter_methods, /* tp_methods */ > 0, > }; > > This is the backtrace I get with gdb: > > #0 PyObject_Hash (v=0x7f043ce15540 ) at > ../cpython_3_10/Objects/object.c:788 > #1 0x0048611c in PyDict_GetItemWithError (op=0x7f043e1f4900, > key=key@entry=0x7f043ce15540 ) > at ../cpython_3_10/Objects/dictobject.c:1520 > #2 0x7f043ce227f6 in save (self=self@entry=0x7f043d8507d0, > obj=obj@entry=0x7f043e1fb0b0, pers_save=pers_save@entry=0) > at /home/marco/sources/cpython_3_10/Modules/_pickle.c:4381 > #3 0x7f043ce2534d in dump (self=self@entry=0x7f043d8507d0, > obj=obj@entry=0x7f043e1fb0b0) at > /home/marco/sources/cpython_3_10/Modules/_pickle.c:4515 > #4 0x7f043ce2567f in _pickle_dumps_impl (module=, > buffer_callback=, 
fix_imports=, > protocol=, > obj=0x7f043e1fb0b0) at > /home/marco/sources/cpython_3_10/Modules/_pickle.c:1203 > #5 _pickle_dumps (module=, args=, > nargs=, kwnames=) > at /home/marco/sources/cpython_3_10/Modules/clinic/_pickle.c.h:619 > > and so on. The problematic part is in the second frame. Indeed the code of > _pickle.c here is: > > > reduce_func = PyDict_GetItemWithError(st->dispatch_table, > (PyObject *)type); > > The problem is that type is NULL. It tries to get the attribute tp_hash > and it segfaults. > > I tried to change the header of the type to: > > PyVarObject_HEAD_INIT(_Type, 0) > > This way it works but, as known, it does not compile on Windows. > > The strange fact is that pickling the main type works, even if the type is > NULL, as suggested for a custom type. This is the main type: > > PyTypeObject PyFrozenDict_Type = { > PyVarObject_HEAD_INIT(NULL, 0) > "frozendict." FROZENDICT_CLASS_NAME,/* tp_name */ > sizeof(PyFrozenDictObject), /* tp_basicsize */ > 0, /* tp_itemsize */ > (destructor)dict_dealloc, /* tp_dealloc */ > 0, /* tp_vectorcall_offset */ > 0, /* tp_getattr */ > 0, /* tp_setattr */ > 0, /* tp_as_async */ > (reprfunc)frozendict_repr, /* tp_repr */ > _as_number, /* tp_as_number */ > _as_sequence,
Re: Why operations between dict views return a set and not a frozenset?
Ok... so I suppose, since you're inviting me to use dis and look at the bytecode, that you are talking about constants in assembly, so const in C? Sorry for the confusion, I'm not so skilled in C and I know nearly nothing about assembly. Furthermore, I had never looked at the bytecode of any language before, so I simply didn't understand you. I think this is what you mean:

>>> dis.dis("for _ in {1, 2}: pass")
  1           0 SETUP_LOOP              12 (to 14)
              2 LOAD_CONST               3 (frozenset({1, 2}))
              4 GET_ITER
        >>    6 FOR_ITER                 4 (to 12)
              8 STORE_NAME               0 (_)
             10 JUMP_ABSOLUTE            6
        >>   12 POP_BLOCK
        >>   14 LOAD_CONST               2 (None)
             16 RETURN_VALUE
>>> a = {1, 2}
>>> dis.dis("for _ in a: pass")
  1           0 SETUP_LOOP              12 (to 14)
              2 LOAD_NAME                0 (a)
              4 GET_ITER
        >>    6 FOR_ITER                 4 (to 12)
              8 STORE_NAME               1 (_)
             10 JUMP_ABSOLUTE            6
        >>   12 POP_BLOCK
        >>   14 LOAD_CONST               0 (None)
             16 RETURN_VALUE

On Tue, 11 Jan 2022 at 01:05, Chris Angelico wrote: > On Tue, Jan 11, 2022 at 10:26 AM Marco Sulla > wrote: > > > > On Wed, 5 Jan 2022 at 23:02, Chris Angelico wrote: > > > > > > On Thu, Jan 6, 2022 at 8:01 AM Marco Sulla < marco.sulla.pyt...@gmail.com> wrote: > > > > > > > > On Wed, 5 Jan 2022 at 14:16, Chris Angelico > wrote: > > > > > That's an entirely invisible optimization, but it's more than just > > > > > "frozenset is faster than set". It's that a frozenset or tuple can be > > > > > stored as a function's constants, which is a massive difference. > > > > > > > > Can you explain this? > > > > > > Play around with dis.dis and timeit. > > > > ? I don't understand. You're talking about function constants. What > > are they? I can't dig deep into something if I can't know what it is. > > Maybe are you talking about function default values for parameters? > > No, I'm talking about constants. Every function has them. > > > Of course. You can use a proxy and slow down almost everything much > > more. Or you can simply create a version of the mutable object with > > fewer methods, as more or less frozenset is. 
I checked the > > implementation, no fast iteration is implemented. I do not understand > > why in `for x in {1, 2, 3}` the set is substituted by a frozenset. > > Constants. Like I said, play around with dis.dis, and explore what's > already happening. A set can't be a constant, a frozenset can be. > Constants are way faster than building from scratch. > > Explore. Play around. I'm not going to try to explain everything in detail. > > If you're delving into the details of the C implementation of the > dictionary, I would have expected you'd already be familiar with the > way that functions behave. > > ChrisA > -- > https://mail.python.org/mailman/listinfo/python-list > -- https://mail.python.org/mailman/listinfo/python-list
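[Editor's note] The substitution being discussed can be checked directly; this is CPython-specific behaviour of its bytecode optimizer, which folds a literal set of constants into a frozenset constant so the loop does a single LOAD_CONST instead of rebuilding the set on every execution:

```python
import dis

code = compile("for x in {1, 2, 3}: pass", "<demo>", "exec")

# The set literal was folded into a frozenset constant at compile time.
assert any(isinstance(c, frozenset) for c in code.co_consts)

dis.dis(code)  # the disassembly shows a LOAD_CONST with frozenset({1, 2, 3})
```

A plain set cannot be stored as a code-object constant precisely because it is mutable; frozenset can, which is the "function constants" point made in the thread.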
Re: Why operations between dict views return a set and not a frozenset?
On Wed, 5 Jan 2022 at 23:02, Chris Angelico wrote: > > On Thu, Jan 6, 2022 at 8:01 AM Marco Sulla > wrote: > > > > On Wed, 5 Jan 2022 at 14:16, Chris Angelico wrote: > > > That's an entirely invisible optimization, but it's more than just > > > "frozenset is faster than set". It's that a frozenset or tuple can be > > > stored as a function's constants, which is a massive difference. > > > > Can you explain this? > > Play around with dis.dis and timeit. ? I don't understand. You're talking about function constants. What are they? I can't dig deep into something if I can't know what it is. Maybe are you talking about function default values for parameters? > > > Function positional arguments aren't interchangeable, so it makes > > > sense to have them as a tuple. > > > > You are wrong, since kwarg is a dict. Indeed I proposed to use > > frozendict for kwargs, and Guido said that it's a pity that this will > > break a lot of existing Python code :D, since the fact that args is > > _immutable_ and kwargs not always bothered him. > > Excuse me? I mentioned kwargs in the part that you removed from the > quote, and the part you're quoting explicitly says "positional > arguments". Ok, I quote also the other part: > (Function *keyword* arguments, on the other hand, are different; as > long as the mapping from keys to values is maintained, you can remove > some of them and pass the rest on, without fundamentally changing > their meaning.) First of all, I repeat, Guido said (more or less) that in a perfect world, kwargs are immutable. Or maybe I did not understand what he said, maybe he said that in a perfect world also args are mutable. But I suppose it's more probable the first hypothesis :D Secondly, you can also get the args from a function, transform it in a list, change something and pass it unpacked to another function. You will not change the meaning of the tuple, since, well, you copied it in another mutable object. The original object is untouched. 
I perfectly agree that, in the majority of cases, returning an immutable vs a mutable are a matter of... sense? Meaning? Ok, I perfectly agree. But IMHO there are many cases in which immutable objects are used for a matter of speed, and I bet that args is one of them. > > Anyway, I'm starting to think that neither set nor frozenset are good > > for dict items: > > > > (venv_3_10) marco@buzz:~$ python > > Python 3.10.0 (heads/3.10-dirty:f6e8b80d20, Nov 18 2021, 19:16:18) > > [GCC 10.1.1 20200718] on linux > > Type "help", "copyright", "credits" or "license" for more information. > > >>> a = {1: 2} > > >>> b = {3: []} > > >>> a | b > > {1: 2, 3: []} > > >>> a.items() | b.items() > > Traceback (most recent call last): > > File "", line 1, in > > TypeError: unhashable type: 'list' > > >>> > > Well yes. Only dict keys can be considered to be set-like. This is not true. It's at least from Python 3.6, and I think also before, that almost the full Set API was added to both keys and items view. Items indeed are a sort of set, in a mathematical sense, since any pair (key, value) is unique, even if value is mutable. > I don't > know WHAT you think you're trying to do here, but if you ever thought > of set operations on dict values, you may want to completely rethink > what you're doing. set ops on values? Never said that :) I said that currently you can operate on item views with set operators. This is a fact. I also said that, since py sets accept only hashable objects, maybe another ad-hoc object should be used for the result of the items operations. But maybe the change isn't worth the additional trouble. Indeed I didn't know about the new set methods and operations on dict views until I explored dictobject.c > Performance is not an automatic result of immutability. That simply > isn't how it works. Of course. You can use a proxy and slow down almost everything much more. 
Or you can simply create a version of the mutable object with fewer methods, as more or less frozenset is. I checked the implementation, no fast iteration is implemented. I do not understand why in `for x in {1, 2, 3}` the set is substituted by a frozenset. -- https://mail.python.org/mailman/listinfo/python-list
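[Editor's note] To restate the thread's observation as a runnable example: keys views always behave as sets, and items views do too, but only while every value is hashable:

```python
a = {1: 2}
b = {3: []}

# Keys views always support set operators.
assert a.keys() | b.keys() == {1, 3}

# Items views do as well, producing a set of (key, value) tuples...
assert a.items() | a.items() == {(1, 2)}

# ...unless some value is unhashable, because the tuples must go into a set.
try:
    a.items() | b.items()
except TypeError as e:
    assert "unhashable" in str(e)
```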
Pickle segfaults with custom type
I have a custom implementation of dict using a C extension. All works but the pickling of the view and iter types. Python segfaults if I try to pickle them. For example, I have:

static PyTypeObject PyFrozenDictIterKey_Type = {
    PyVarObject_HEAD_INIT(NULL, 0)
    "frozendict.keyiterator",                   /* tp_name */
    sizeof(dictiterobject),                     /* tp_basicsize */
    0,                                          /* tp_itemsize */
    /* methods */
    (destructor)dictiter_dealloc,               /* tp_dealloc */
    0,                                          /* tp_vectorcall_offset */
    0,                                          /* tp_getattr */
    0,                                          /* tp_setattr */
    0,                                          /* tp_as_async */
    0,                                          /* tp_repr */
    0,                                          /* tp_as_number */
    0,                                          /* tp_as_sequence */
    0,                                          /* tp_as_mapping */
    PyObject_HashNotImplemented,                /* tp_hash */
    0,                                          /* tp_call */
    0,                                          /* tp_str */
    PyObject_GenericGetAttr,                    /* tp_getattro */
    0,                                          /* tp_setattro */
    0,                                          /* tp_as_buffer */
    Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC,    /* tp_flags */
    0,                                          /* tp_doc */
    (traverseproc)dictiter_traverse,            /* tp_traverse */
    0,                                          /* tp_clear */
    0,                                          /* tp_richcompare */
    0,                                          /* tp_weaklistoffset */
    PyObject_SelfIter,                          /* tp_iter */
    (iternextfunc)frozendictiter_iternextkey,   /* tp_iternext */
    dictiter_methods,                           /* tp_methods */
    0,
};

This is the backtrace I get with gdb:

#0  PyObject_Hash (v=0x7f043ce15540 ) at ../cpython_3_10/Objects/object.c:788
#1  0x0048611c in PyDict_GetItemWithError (op=0x7f043e1f4900, key=key@entry=0x7f043ce15540 ) at ../cpython_3_10/Objects/dictobject.c:1520
#2  0x7f043ce227f6 in save (self=self@entry=0x7f043d8507d0, obj=obj@entry=0x7f043e1fb0b0, pers_save=pers_save@entry=0) at /home/marco/sources/cpython_3_10/Modules/_pickle.c:4381
#3  0x7f043ce2534d in dump (self=self@entry=0x7f043d8507d0, obj=obj@entry=0x7f043e1fb0b0) at /home/marco/sources/cpython_3_10/Modules/_pickle.c:4515
#4  0x7f043ce2567f in _pickle_dumps_impl (module=, buffer_callback=, fix_imports=, protocol=, obj=0x7f043e1fb0b0) at /home/marco/sources/cpython_3_10/Modules/_pickle.c:1203
#5  _pickle_dumps (module=, args=, nargs=, kwnames=) at /home/marco/sources/cpython_3_10/Modules/clinic/_pickle.c.h:619

and so on. 
The problematic part is in the second frame. Indeed the code of _pickle.c here is:

reduce_func = PyDict_GetItemWithError(st->dispatch_table, (PyObject *)type);

The problem is that type is NULL. It tries to get the attribute tp_hash and it segfaults. I tried to change the header of the type to:

PyVarObject_HEAD_INIT(_Type, 0)

This way it works but, as known, it does not compile on Windows. The strange fact is that pickling the main type works, even if the type is NULL, as suggested for a custom type. This is the main type:

PyTypeObject PyFrozenDict_Type = {
    PyVarObject_HEAD_INIT(NULL, 0)
    "frozendict." FROZENDICT_CLASS_NAME,        /* tp_name */
    sizeof(PyFrozenDictObject),                 /* tp_basicsize */
    0,                                          /* tp_itemsize */
    (destructor)dict_dealloc,                   /* tp_dealloc */
    0,                                          /* tp_vectorcall_offset */
    0,                                          /* tp_getattr */
    0,                                          /* tp_setattr */
    0,                                          /* tp_as_async */
    (reprfunc)frozendict_repr,                  /* tp_repr */
    _as_number,                                 /* tp_as_number */
    _as_sequence,                               /* tp_as_sequence */
    _as_mapping,                                /* tp_as_mapping */
    (hashfunc)frozendict_hash,                  /* tp_hash */
    0,                                          /* tp_call */
    0,                                          /* tp_str */
    PyObject_GenericGetAttr,                    /* tp_getattro */
    0,                                          /* tp_setattro */
    0,                                          /* tp_as_buffer */
    Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC | Py_TPFLAGS_BASETYPE
        | _Py_TPFLAGS_MATCH_SELF | Py_TPFLAGS_MAPPING,  /* tp_flags */
    frozendict_doc,
Re: Why operations between dict views return a set and not a frozenset?
On Wed, 5 Jan 2022 at 14:16, Chris Angelico wrote: > That's an entirely invisible optimization, but it's more than just > "frozenset is faster than set". It's that a frozenset or tuple can be > stored as a function's constants, which is a massive difference. Can you explain this? > In fact, the two data types are virtually identical in performance once > created [...] This is really strange, since in theory frozenset should not have to check whether it was mutated during iteration, on each cycle. So iteration should be noticeably faster. Maybe frozenset was not optimised because the use case is quite small and it would add potentially useless C code? Furthermore, the more code, the more memory consumption and the less speed. I have to check setobject.c. > Function positional arguments aren't interchangeable, so it makes > sense to have them as a tuple. You are wrong, since kwargs is a dict. Indeed I proposed to use frozendict for kwargs, and Guido said that it's a pity that this would break a lot of existing Python code :D, since the fact that args is _immutable_ and kwargs is not has always bothered him. Anyway, I'm starting to think that neither set nor frozenset are good for dict items: (venv_3_10) marco@buzz:~$ python Python 3.10.0 (heads/3.10-dirty:f6e8b80d20, Nov 18 2021, 19:16:18) [GCC 10.1.1 20200718] on linux Type "help", "copyright", "credits" or "license" for more information. >>> a = {1: 2} >>> b = {3: []} >>> a | b {1: 2, 3: []} >>> a.items() | b.items() Traceback (most recent call last): File "", line 1, in TypeError: unhashable type: 'list' >>> -- https://mail.python.org/mailman/listinfo/python-list
Re: Why operations between dict views return a set and not a frozenset?
On Wed, 5 Jan 2022 at 00:54, Chris Angelico wrote:
> That's because a tuple is the correct data type when returning two
> distinct items. It's not a list that has two elements in it; it's a
> tuple of (key, value). Immutability is irrelevant.

Immutability is irrelevant, but speed is not. A tuple is faster than a list and more compact. Likewise, frozenset is faster than set. Indeed CPython internally optimises

for x in {1, 2, 3}

transforming the set into a frozenset, for a matter of speed. That's why tuple is usually preferred. I expected the same for frozenset.

> Got any examples of variable-length sequences?

Function positional args are tuples, for example.

> Usually a tuple is a
> structure, not just a sequence.

Eh? Are you talking about the underlying C code?

> If something is just returning a
> sequence, it'll most often return a dedicated sequence type (like
> range in Py3) or a list (like lots of things in Py2).

Python 2 is now obsolete; I don't think it's relevant to the discussion. About your sentence: yes, usually a dedicated view, sequence or generator is returned, but tuples are widely used too. A list is returned only sporadically, from what I remember.
-- 
https://mail.python.org/mailman/listinfo/python-list
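The set-to-frozenset rewrite mentioned above can be observed without benchmarks: the compiler stores the frozenset among the code object's constants (the AST optimizer does this for `for` loops in CPython 3.8+; the peephole optimizer did it earlier for membership tests). A small check:

```python
# The set literal never exists at runtime: the compiler folds it
# into a frozenset constant of the code object.
code = compile("for x in {1, 2, 3}: pass", "<example>", "exec")
assert any(c == frozenset({1, 2, 3}) for c in code.co_consts)

# The same folding happens for membership tests.
code2 = compile("x in {1, 2, 3}", "<example>", "eval")
assert any(isinstance(c, frozenset) for c in code2.co_consts)
```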
Re: Why operations between dict views return a set and not a frozenset?
On Tue, 4 Jan 2022 at 19:38, Chris Angelico wrote:
> [...] should the keys view be considered
> frozen or not? Remember the set of keys can change (when the
> underlying dict changes).

Well, the items can change too, but they are returned as tuples with 2 elements. It seems to me that, when something in the stdlib should return a sequence, it prefers to return a tuple. So I expected the same preference for frozenset over set.

> It's not difficult to construct a frozenset from a set.

This sentence has the commutative property :)
-- 
https://mail.python.org/mailman/listinfo/python-list
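For reference, the conversion the quoted sentence talks about, in both directions, plus a check that view operations really hand back a plain mutable set:

```python
d = {1: "a", 2: "b", 3: "c"}
e = {2: "b"}

diff = d.keys() - e.keys()
assert diff == {1, 3}
assert type(diff) is set        # a mutable set, not a frozenset

frozen = frozenset(diff)        # set -> frozenset
thawed = set(frozen)            # frozenset -> set: the other direction
assert frozen == thawed
```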
Why operations between dict views return a set and not a frozenset?
$ python
Python 3.10.0 (heads/3.10-dirty:f6e8b80d20, Nov 18 2021, 19:16:18) [GCC 10.1.1 20200718] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> a = {1: 2}
>>> c = {1: 2, 3: 4}
>>> c.keys() - a.keys()
{3}
>>>

Why not frozenset({3})?
-- 
https://mail.python.org/mailman/listinfo/python-list
Re: ModuleNotFoundError: No module named 'DistUtilsExtra'
https://askubuntu.com/questions/584857/distutilsextra-problem

On Sun, 2 Jan 2022 at 18:52, hongy...@gmail.com wrote:
>
> On Ubuntu 20.04.3 LTS, I try to install pdfarranger [1] as follows but failed:
>
> $ sudo apt-get install python3-pip python3-distutils-extra \
>     python3-wheel python3-gi python3-gi-cairo \
>     gir1.2-gtk-3.0 gir1.2-poppler-0.18 python3-setuptools
> $ git clone https://github.com/pdfarranger/pdfarranger.git pdfarranger.git
> $ cd pdfarranger.git
> $ pyenv shell 3.8.3
> $ pyenv virtualenv --system-site-packages pdfarranger
> $ pyenv shell pdfarranger
> $ pip install -U pip
> $ ./setup.py build
> Traceback (most recent call last):
>   File "./setup.py", line 24, in <module>
>     from DistUtilsExtra.command import (
> ModuleNotFoundError: No module named 'DistUtilsExtra'
>
> See the following for the package list installed in this virtualenv:
>
> $ pip list
> Package    Version
> ---------- ------------
> pip        21.3.1
> pyfiglet   0.8.post1
> setuptools 41.2.0
> vtk        9.0.20200612
>
> Any hints for fixing this problem? Also see here [2-3] for relevant discussions.
>
> [1] https://github.com/pdfarranger/pdfarranger
> [2] https://github.com/pdfarranger/pdfarranger/issues/604
> [3] https://discuss.python.org/t/modulenotfounderror-no-module-named-distutilsextra/12834
>
> Regards,
> HZ
> --
> https://mail.python.org/mailman/listinfo/python-list
-- 
https://mail.python.org/mailman/listinfo/python-list
Who wrote Py_UNREACHABLE?
#if defined(RANDALL_WAS_HERE)
#  define Py_UNREACHABLE() \
    Py_FatalError( \
        "If you're seeing this, the code is in what I thought was\n" \
        "an unreachable state.\n\n" \
        "I could give you advice for what to do, but honestly, why\n" \
        "should you trust me? I clearly screwed this up. I'm writing\n" \
        "a message that should never appear, yet I know it will\n" \
        "probably appear someday.\n\n" \
        "On a deep level, I know I'm not up to this task.\n" \
        "I'm so sorry.\n" \
        "https://xkcd.com/2200")
#elif defined(Py_DEBUG)
#  define Py_UNREACHABLE() \
    Py_FatalError( \
        "We've reached an unreachable state. Anything is possible.\n" \
        "The limits were in our heads all along. Follow your dreams.\n" \
        "https://xkcd.com/2200")

etc
-- 
https://mail.python.org/mailman/listinfo/python-list
Re: How to implement freelists in dict 3.10 for previous versions?
Ooookay, I suppose I have to study the thing a little :D

On Thu, 30 Dec 2021 at 07:59, Inada Naoki wrote:
>
> On Wed, Dec 29, 2021 at 7:25 PM Marco Sulla wrote:
> >
> > I noticed that now freelists in dict use _Py_dict_state. I suppose
> > this is done for thread safety.
>
> Some core-dev are working on per-interpreter GIL. But it is not done yet.
> So you don't need to follow it soon. Your extension module will work
> well in Python 3.11.
>
> > I would implement it also for a C extension that uses CPython < 3.10.
> > How can I achieve this?
>
> See PyModule_GetState() to have per-interpreter module state instead
> of static variables.
> https://docs.python.org/3/c-api/module.html#c.PyModule_GetState
>
> --
> Inada Naoki
-- 
https://mail.python.org/mailman/listinfo/python-list
How to make a type of a C extension compatible with mypy
I created a type in a C extension that is an immutable dict. If I do:

a: mydict[str, str]

it works. But it doesn't work with mypy, as signalled to me by a user:

https://github.com/Marco-Sulla/python-frozendict/issues/39

How can I make it work? I don't know what he means by annotating methods, and furthermore I suppose I can't do this in C.
-- 
https://mail.python.org/mailman/listinfo/python-list
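For what it's worth, mypy never executes the C extension, so the subscript has to be declared in a stub file (a .pyi shipped next to the module, plus a py.typed marker in the package). A minimal sketch of what such a stub could look like; mydict and every name below are hypothetical placeholders, and a real stub would need the full method list:

```python
# Sketch of a stub (mydict.pyi) for an immutable dict written in C.
# Subclassing Mapping gives mypy the generic [K, V] subscript and the
# read-only mapping protocol; the C implementation itself is untouched.
from typing import Iterator, Mapping, TypeVar

K = TypeVar("K")
V = TypeVar("V")

class mydict(Mapping[K, V]):
    def __getitem__(self, key: K) -> V: ...
    def __iter__(self) -> Iterator[K]: ...
    def __len__(self) -> int: ...

# In a real .pyi only the `...` bodies appear; this file happens to be
# valid runtime Python too, so the subscript works here as well:
alias = mydict[str, str]
```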
Re: recover pickled data: pickle data was truncated
I agree with Barry. You can create a folder or a file with a pseudo-random name. I recommend using str(uuid.uuid4()).

On Sat, 1 Jan 2022 at 14:11, Barry wrote:
>
> > On 31 Dec 2021, at 17:53, iMath wrote:
> >
> > On Thursday, 30 Dec 2021 at 03:13:21 UTC+8, wrote:
> >>> On Wed, 29 Dec 2021 at 18:33, iMath wrote:
> >>> But I found the size of the file of the shelve data didn't change much,
> >>> so I guess the data are still in it, I just wonder any way to recover my
> >>> data.
> >> I agree with Barry, Chris and Avi. IMHO your data is lost. Unpickling
> >> it by hand is harsh work and maybe unreliable.
> >>
> >> Is there any reason you can't simply add a semaphore to avoid writing
> >> at the same time and re-run the code and regenerate the data?
> >
> > Thanks for your replies! I didn't have a sense of adding a semaphore on
> > writing to pickle data before, so corrupted the data.
> > Since my data was collected in daily usage, I cannot re-run the code
> > and regenerate the data.
> > In order to avoid corrupting my data again and the complexity of using a
> > semaphore, now I am using json text to store my data.
>
> That will not fix the problem. You will end up with corrupt json.
>
> If you have one writer and one reader then maybe you can use the fact that a
> rename is atomic.
>
> Writer does this:
> 1. Create a new json file in the same folder but with a tmp name.
> 2. Rename the file from its tmp name to the public name.
>
> The reader will just read the public name.
>
> I am not sure what happens in your world if the writer runs a second time
> before the data is read.
> In that case you need to create a queue of files to be read.
>
> But if the problem is two processes racing against each other you MUST use
> locking.
> It cannot be avoided for robust operations.
>
> Barry
>
> > --
> > https://mail.python.org/mailman/listinfo/python-list
>
> --
> https://mail.python.org/mailman/listinfo/python-list
-- 
https://mail.python.org/mailman/listinfo/python-list
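Barry's write-then-rename recipe, sketched with a uuid-based temp name as suggested above; the function name and paths are made up:

```python
import json
import os
import tempfile
import uuid

def atomic_write_json(data, path):
    # 1. Write to a uniquely named temp file in the *same* directory
    #    (same filesystem, so the rename below stays atomic).
    tmp = "%s.%s.tmp" % (path, uuid.uuid4())
    with open(tmp, "w") as f:
        json.dump(data, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes hit the disk first
    # 2. Atomically replace the public name; a reader sees either the
    #    old complete file or the new complete file, never a torn one.
    os.replace(tmp, path)

# Demo: write into a scratch directory and read the data back.
tmpdir = tempfile.mkdtemp()
target = os.path.join(tmpdir, "data.json")
atomic_write_json({"a": 1}, target)
with open(target) as f:
    roundtrip = json.load(f)
```

This protects one writer against concurrent readers; two concurrent writers still need locking, exactly as Barry says.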
Re: builtins.TypeError: catching classes that do not inherit from BaseException is not allowed
It was already done: https://pypi.org/project/tail-recursive/

On Thu, 30 Dec 2021 at 16:00, hongy...@gmail.com wrote:
>
> I try to compute the factorial of a large number with a tail-recursion
> optimization decorator in Python 3. The following code snippet is converted
> from the code snippet given here [1] by the following steps:
>
> $ pyenv shell datasci
> $ python --version
> Python 3.9.1
> $ pip install 2to3
> $ 2to3 -w this-script.py
>
> ```
> # This program shows off a python decorator
> # which implements tail call optimization. It
> # does this by throwing an exception if it is
> # its own grandparent, and catching such
> # exceptions to recall the stack.
>
> import sys
>
> class TailRecurseException:
>     def __init__(self, args, kwargs):
>         self.args = args
>         self.kwargs = kwargs
>
> def tail_call_optimized(g):
>     """
>     This function decorates a function with tail call
>     optimization. It does this by throwing an exception
>     if it is its own grandparent, and catching such
>     exceptions to fake the tail call optimization.
>
>     This function fails if the decorated
>     function recurses in a non-tail context.
>     """
>     def func(*args, **kwargs):
>         f = sys._getframe()
>         if f.f_back and f.f_back.f_back \
>             and f.f_back.f_back.f_code == f.f_code:
>             raise TailRecurseException(args, kwargs)
>         else:
>             while 1:
>                 try:
>                     return g(*args, **kwargs)
>                 except TailRecurseException as e:
>                     args = e.args
>                     kwargs = e.kwargs
>     func.__doc__ = g.__doc__
>     return func
>
> @tail_call_optimized
> def factorial(n, acc=1):
>     "calculate a factorial"
>     if n == 0:
>         return acc
>     return factorial(n-1, n*acc)
>
> print(factorial(1))
> # prints a big, big number,
> # but doesn't hit the recursion limit.
>
> @tail_call_optimized
> def fib(i, current = 0, next = 1):
>     if i == 0:
>         return current
>     else:
>         return fib(i - 1, next, current + next)
>
> print(fib(1))
> # also prints a big number,
> # but doesn't hit the recursion limit.
> ```
>
> However, when I try to test the above script, the following error will be
> triggered:
>
> ```
> $ python this-script.py
> Traceback (most recent call last):
>   File "/home/werner/this-script.py", line 32, in func
>     return g(*args, **kwargs)
>   File "/home/werner/this-script.py", line 44, in factorial
>     return factorial(n-1, n*acc)
>   File "/home/werner/this-script.py", line 28, in func
>     raise TailRecurseException(args, kwargs)
> TypeError: exceptions must derive from BaseException
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
>   File "/home/werner/this-script.py", line 46, in <module>
>     print(factorial(1))
>   File "/home/werner/this-script.py", line 33, in func
>     except TailRecurseException as e:
> TypeError: catching classes that do not inherit from BaseException is not
> allowed
> ```
>
> Any hints for fixing this problem will be highly appreciated.
>
> [1] https://stackoverflow.com/q/27417874
>
> Regards,
> HZ
> --
> https://mail.python.org/mailman/listinfo/python-list
-- 
https://mail.python.org/mailman/listinfo/python-list
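The traceback says it all: in Python 3 anything raised or caught must derive from BaseException, and the 2to3-converted snippet left TailRecurseException as a plain class. A sketch of the quoted recipe with only that fixed; the attribute names are renamed here because Exception already gives meaning to .args:

```python
import sys

class TailRecurseException(Exception):  # must derive from BaseException
    def __init__(self, args, kwargs):
        self.call_args = args        # renamed: Exception.args is taken
        self.call_kwargs = kwargs

def tail_call_optimized(g):
    # Same grandparent trick as in the quoted recipe: if the decorated
    # function is calling itself through us, unwind via an exception
    # and restart the loop in the outermost frame.
    def func(*args, **kwargs):
        f = sys._getframe()
        if f.f_back and f.f_back.f_back \
                and f.f_back.f_back.f_code == f.f_code:
            raise TailRecurseException(args, kwargs)
        while True:
            try:
                return g(*args, **kwargs)
            except TailRecurseException as e:
                args = e.call_args
                kwargs = e.call_kwargs
    func.__doc__ = g.__doc__
    return func

@tail_call_optimized
def factorial(n, acc=1):
    "calculate a factorial"
    if n == 0:
        return acc
    return factorial(n - 1, n * acc)
```

With the Exception base class in place, factorial can recurse far past the default recursion limit, since the stack depth stays bounded.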
Re: recover pickled data: pickle data was truncated
On Wed, 29 Dec 2021 at 18:33, iMath wrote:
> But I found the size of the file of the shelve data didn't change much, so I
> guess the data are still in it, I just wonder any way to recover my data.

I agree with Barry, Chris and Avi. IMHO your data is lost. Unpickling it by hand is harsh work and maybe unreliable.

Is there any reason you can't simply add a semaphore to avoid writing at the same time, and re-run the code to regenerate the data?
-- 
https://mail.python.org/mailman/listinfo/python-list
Re: What's the public API alternative to _PyObject_GC_IS_TRACKED()?
On Wed, 29 Dec 2021 at 12:11, Dieter Maurer wrote:
>
> Marco Sulla wrote at 2021-12-29 11:59 +0100:
> > On Wed, 29 Dec 2021 at 09:12, Dieter Maurer wrote:
> >> `MutableMapping` is a so called abstract base class (--> `abc`).
> >>
> >> It uses the `__subclass_check__` (and `__instance_check__`) of
> >> `abc.ABCMeta` to ensure `issubclass(dict, MutableMapping)`.
> >> Those can be customized by overriding `MutableMapping.__subclasshook__`
> >> to ensure that your `frozendict` class (and their subclasses)
> >> are not considered subclasses of `MutableMapping`.
> >
> > It does not work:
> > ...
> > >>> issubclass(fd, Mm)
> > True
>
> There is a cache involved. The `issubclass` above
> brings your `fd` into `Mm`'s subclass cache.

It works, thank you! I had to put it before Mapping.register(frozendict).
-- 
https://mail.python.org/mailman/listinfo/python-list
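A pure-Python sketch of the working order: install the hook before the first issubclass() call, because ABCMeta caches the answer of every check. FrozenDict here is a hypothetical stand-in for the real C type:

```python
from collections.abc import MutableMapping

class FrozenDict(dict):
    # Hypothetical stand-in: a dict subclass meant to be immutable.
    def __setitem__(self, key, value):
        raise TypeError("FrozenDict is immutable")

@classmethod
def _subclasshook(cls, subclass):
    # Veto FrozenDict before ABCMeta consults its registry (where
    # dict, and therefore its subclasses, would test positive).
    if cls is MutableMapping and issubclass(subclass, FrozenDict):
        return False
    return NotImplemented

# Install the hook BEFORE any issubclass(FrozenDict, MutableMapping)
# call; once a True answer lands in the ABC cache, the hook is never
# asked again for that class.
MutableMapping.__subclasshook__ = _subclasshook
```

Ordinary dicts keep going through the registry, so only the vetoed class loses its MutableMapping status.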
Re: What's the public API alternative to _PyObject_GC_IS_TRACKED()?
On Wed, 29 Dec 2021 at 09:12, Dieter Maurer wrote:
> `MutableMapping` is a so called abstract base class (--> `abc`).
>
> It uses the `__subclass_check__` (and `__instance_check__`) of
> `abc.ABCMeta` to ensure `issubclass(dict, MutableMapping)`.
> Those can be customized by overriding `MutableMapping.__subclasshook__`
> to ensure that your `frozendict` class (and their subclasses)
> are not considered subclasses of `MutableMapping`.

It does not work:

$ python
Python 3.10.0 (heads/3.10-dirty:f6e8b80d20, Nov 18 2021, 19:16:18) [GCC 10.1.1 20200718] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import frozendict
>>> frozendict.c_ext
False
>>> from frozendict import frozendict as fd
>>> from collections.abc import MutableMapping as Mm
>>> issubclass(fd, Mm)
True
>>> @classmethod
... def _my_subclasshook(klass, subclass):
...     if subclass == fd:
...         return False
...     return NotImplemented
... 
>>> @classmethod
... def _my_subclasshook(klass, subclass):
...     print(subclass)
...     if subclass == fd:
...         return False
...     return NotImplemented
... 
>>> Mm.__subclasshook__ = _my_subclasshook
>>> issubclass(fd, Mm)
True
>>> issubclass(tuple, Mm)
False
>>> 
-- 
https://mail.python.org/mailman/listinfo/python-list
How to implement freelists in dict 3.10 for previous versions?
I noticed that freelists in dict now use _Py_dict_state. I suppose this is done for thread safety.

I would like to implement this also in a C extension that supports CPython < 3.10. How can I achieve this?
-- 
https://mail.python.org/mailman/listinfo/python-list
Re: What's the public API alternative to _PyObject_GC_IS_TRACKED()?
On Wed, 29 Dec 2021 at 10:06, Dieter Maurer wrote:
>
> Are you sure you need to implement your type in C at all?

It's already implemented and, in some cases, it is faster than dict:

https://github.com/Marco-Sulla/python-frozendict#benchmarks

PS: I'm doing a refactoring that speeds up creation even further, making it almost as fast as dict.
-- 
https://mail.python.org/mailman/listinfo/python-list
Re: What's the public API alternative to _PyObject_GC_IS_TRACKED()?
On second thought, I think I'll do this for the pure-Python version. But I will definitely not do it for the C extension, since it's strange anyway that an immutable mapping inherits from a mutable one! I did it in the pure-Python version only for a matter of speed.

On Wed, 29 Dec 2021 at 09:24, Marco Sulla wrote:
>
> On Wed, 29 Dec 2021 at 09:12, Dieter Maurer wrote:
> >
> > Marco Sulla wrote at 2021-12-29 08:08 +0100:
> > > On Wed, 29 Dec 2021 at 00:03, Dieter Maurer wrote:
> > >> Why do you not derive from `dict` and override its mutating methods
> > >> (to raise a type error after initialization is complete)?
> > >
> > > I've done this for the pure py version, for speed. But in this way,
> > > frozendict results to be a subclass of MutableMapping.
> >
> > `MutableMapping` is a so called abstract base class (--> `abc`).
> >
> > It uses the `__subclass_check__` (and `__instance_check__`) of
> > `abc.ABCMeta` to ensure `issubclass(dict, MutableMapping)`.
> > Those can be customized by overriding `MutableMapping.__subclasshook__`
> > to ensure that your `frozendict` class (and their subclasses)
> > are not considered subclasses of `MutableMapping`.
>
> Emh. Too hacky for me too, sorry :D
-- 
https://mail.python.org/mailman/listinfo/python-list