RE: Comparing caching strategies

2023-02-16 Thread avi.e.gross
I am less interested in the choice of names than the pro and con of when these 
Roaring bitmaps are worth using and when they are not.

It is a bit like discussing whether various compression techniques are worth 
using as the storage or memory costs can be weighed against the CPU or 
transient memory costs of compressing and uncompressing.  The value tends to 
depend on many factors and there may even be times you want to store the data 
in multiple data structures with each optimized to store that kind or amount of 
data.

Roaring bitmaps claim to be an improvement not only over uncompressed 
structures but some other compressed versions but my reading shows it may be 
limited to some uses. Bitsets in general seem to be useful only for a largely 
contiguous set of integers where each sequential bit represents whether the nth 
integer above the lowest is in the set or not. Of course, properly set up, this 
makes Unions and Intersections and some other operations fairly efficient. But 
sets are not the same as dictionaries and often you are storing other data 
types than smaller integers.

Many normal compression techniques can require lots of time to uncompress to 
find anything. My impression is that Roaring Bitmaps is a tad like the pkzip 
software that tries various compression techniques on each file and chooses 
whatever seems to work better on each one. That takes extra time when zipping, 
but when unzipping a file, it goes directly to the method used to compress it 
as the type is in a header and just expands it one way.

My impression is that Roaring bitmaps breaks up the range of integers into 
smaller chunks and depending on what is being stored in that chunk, may leave 
it as an uncompressed bitmap, or a list of the sparse contents, or other 
storage methods and can search each version fairly quickly. 

So, I have no doubt it works great for some applications such as treating 
social security numbers as integers. It likely would be overkill to store 
something like the components of an IP address between 0 and 255 inclusive.

But having said that, there may well be non-integer data that can be mapped 
into and out of integers. As an example, airports or radio stations have names 
like LAX or WPIX. If you limit yourself to ASCII letters then every one of them 
can be stored as a 32-bit integer, perhaps with some padding. Of course for 
such fairly simple data, some might choose to place the data in a balanced tree 
structure and get reasonable search speed.

I am curious about the size of some of these structures but obviously it 
depends. Are they stored on disk in this form too?


-Original Message-
From: Python-list  On 
Behalf Of MRAB
Sent: Thursday, February 16, 2023 11:24 PM
To: python-list@python.org
Subject: Re: Comparing caching strategies

On 2023-02-14 22:20, Rob Cliffe via Python-list wrote:
> On 11/02/2023 00:39, Dino wrote:
>> First off, a big shout out to Peter J. Holzer, who mentioned roaring 
>> bitmaps a few days ago and led me to quite a discovery.
>>
> I was intrigued to hear about roaring bitmaps and discover they really 
> were a thing (not a typo as I suspected at first).
> What next, I wonder?
>   argumentative arrays
>   chattering classes (on second thoughts, we have those already)
>   dancing dictionaries
>   garrulous generators
>   laughing lists
>   piping pipelines
>   singing strings
>   speaking sets
>   stuttering sorts
>   talking tuples
>   whistling walruses?

babbling bytestrings?

> The future awaits [pun not intended] ...

--
https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Comparing caching strategies

2023-02-16 Thread Chris Angelico
On Fri, 17 Feb 2023 at 15:28, MRAB  wrote:
>
> On 2023-02-14 22:20, Rob Cliffe via Python-list wrote:
> > On 11/02/2023 00:39, Dino wrote:
> >> First off, a big shout out to Peter J. Holzer, who mentioned roaring
> >> bitmaps a few days ago and led me to quite a discovery.
> >>
> > I was intrigued to hear about roaring bitmaps and discover they really
> > were a thing (not a typo as I suspected at first).
> > What next, I wonder?
> >   argumentative arrays
> >   chattering classes (on second thoughts, we have those already)
> >   dancing dictionaries
> >   garrulous generators
> >   laughing lists
> >   piping pipelines
> >   singing strings
> >   speaking sets
> >   stuttering sorts
> >   talking tuples
> >   whistling walruses?
>
> babbling bytestrings?
>

Excited exceptions. I'm sure there's a particle physics crossover
waiting for this one.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Comparing caching strategies

2023-02-16 Thread MRAB

On 2023-02-14 22:20, Rob Cliffe via Python-list wrote:

On 11/02/2023 00:39, Dino wrote:
First off, a big shout out to Peter J. Holzer, who mentioned roaring 
bitmaps a few days ago and led me to quite a discovery.



I was intrigued to hear about roaring bitmaps and discover they really
were a thing (not a typo as I suspected at first).
What next, I wonder?
      argumentative arrays
      chattering classes (on second thoughts, we have those already)
      dancing dictionaries
      garrulous generators
      laughing lists
      piping pipelines
      singing strings
      speaking sets
      stuttering sorts
      talking tuples
      whistling walruses?


babbling bytestrings?


The future awaits [pun not intended] ...


--
https://mail.python.org/mailman/listinfo/python-list


Re: Comparing caching strategies

2023-02-16 Thread Rob Cliffe via Python-list

On 11/02/2023 00:39, Dino wrote:
First off, a big shout out to Peter J. Holzer, who mentioned roaring 
bitmaps a few days ago and led me to quite a discovery.


I was intrigued to hear about roaring bitmaps and discover they really 
were a thing (not a typo as I suspected at first).

What next, I wonder?
    argumentative arrays
    chattering classes (on second thoughts, we have those already)
    dancing dictionaries
    garrulous generators
    laughing lists
    piping pipelines
    singing strings
    speaking sets
    stuttering sorts
    talking tuples
    whistling walruses?
The future awaits [pun not intended] ...
--
https://mail.python.org/mailman/listinfo/python-list


Re: LRU cache

2023-02-16 Thread Weatherby,Gerard
I think this does the trick:

https://gist.github.com/Gerardwx/c60d200b4db8e7864cb3342dd19d41c9


#!/usr/bin/env python3
import collections
import random
from typing import Hashable, Any, Optional, Dict, Tuple


class LruCache:
"""Dictionary like storage of most recently inserted values"""

def __init__(self, size: int = 1000):
""":param size number of cached entries"""
assert isinstance(size, int)
self.size = size
self.insert_counter = 0
   self.oldest = 0
self._data : Dict[Hashable,Tuple[Any,int]]= {} # store values and age 
index
self._lru: Dict[int, Hashable] = {} # age counter dictionary

def insert(self, key: Hashable, value: Any) -> None:
"""Insert into dictionary"""
existing = self._data.get(key, None)
self._data[key] = (value, self.insert_counter)
self._lru[self.insert_counter] = key
if existing is not None:
self._lru.pop(existing[1], None)  # remove old counter value, if it 
exists
self.insert_counter += 1
if (sz := len(self._data)) > self.size:  # is cache full?
assert sz == self.size + 1
while (
key := self._lru.get(self.oldest, None)) is None:  # index may not 
be present, if value was reinserted
self.oldest += 1
del self._data[key]  # remove oldest key / value from dictionary
del self._lru[self.oldest]
self.oldest += 1  # next oldest index
assert len(self._lru) == len(self._data)

def get(self, key: Hashable) -> Optional[Any]:
"""Get value or return None if not in cache"""
if (tpl := self._data.get(key, None)) is not None:
return tpl[0]
return None


if __name__ == "__main__":
CACHE_SIZE = 1000
TEST_SIZE = 1_000_000
cache = LruCache(size=CACHE_SIZE)

all = []
for i in range(TEST_SIZE):
all.append(random.randint(-5000, 5000))

summary = collections.defaultdict(int)
for value in all:
cache.insert(value, value * value)
summary[value] += 1
smallest = TEST_SIZE
largest = -TEST_SIZE
total = 0
for value, count in summary.items():
smallest = min(smallest, count)
largest = max(largest, count)
total += count
avg = total / len(summary)
print(f"{len(summary)} values occurrences range from {smallest} to 
{largest}, average {avg:.1f}")

recent = set()  # recent most recent entries
for i in range(len(all) - 1, -1, -1):  # loop backwards to get the most 
recent entries
value = all[i]
if len(recent) < CACHE_SIZE:
recent.add(value)
if value in recent:
if (r := cache.get(value)) != value * value:
raise ValueError(f"Cache missing recent {value} {r}")
else:
if cache.get(value) != None:
raise ValueError(f"Cache includes old {value}")

From: Python-list  on 
behalf of Dino 
Date: Wednesday, February 15, 2023 at 3:07 PM
To: python-list@python.org 
Subject: Re: LRU cache
*** Attention: This is an external email. Use caution responding, opening 
attachments or clicking on links. ***

Thank you Mats, Avi and Chris

btw, functools.lru_cache seems rather different from what I need, but
maybe I am missing something. I'll look closer.

On 2/14/2023 7:36 PM, Mats Wichmann wrote:
> On 2/14/23 15:07, Dino wrote:
>>

--
https://urldefense.com/v3/__https://mail.python.org/mailman/listinfo/python-list__;!!Cn_UX_p3!jb3Gr2BFAPLJ2YuI5rFdJUtalqWcijhxHAfdmCI3afnLFDdcekALxDYAQwpE1L_JlJBBJ-BB3BuLdoSE$
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Precision Tail-off?

2023-02-16 Thread Peter Pearson
On Tue, 14 Feb 2023 11:17:20 +, Oscar Benjamin wrote:
> On Tue, 14 Feb 2023 at 07:12, Stephen Tucker  wrote:
[snip]
>> I have just produced the following log in IDLE (admittedly, in Python
>> 2.7.10 and, yes I know that it has been superseded).
>>
>> It appears to show a precision tail-off as the supplied float gets bigger.
[snip]
>>
>> For your information, the first 20 significant figures of the cube root in
>> question are:
>>49793385921817447440
>>
>> Stephen Tucker.
>> --
>> >>> 123.456789 ** (1.0 / 3.0)
>> 4.979338592181744
>> >>> 1234567890. ** (1.0 / 3.0)
>> 49793385921817.36
>
> You need to be aware that 1.0/3.0 is a float that is not exactly equal
> to 1/3 ...
[snip]
> SymPy again:
>
> In [37]: a, x = symbols('a, x')
>
> In [38]: print(series(a**x, x, Rational(1, 3), 2))
> a**(1/3) + a**(1/3)*(x - 1/3)*log(a) + O((x - 1/3)**2, (x, 1/3))
>
> You can see that the leading relative error term from x being not
> quite equal to 1/3 is proportional to the log of the base. You should
> expect this difference to grow approximately linearly as you keep
> adding more zeros in the base.

Marvelous.  Thank you.


-- 
To email me, substitute nowhere->runbox, invalid->com.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python: How to use the 'trace' module programmatically?

2023-02-16 Thread Peter Slížik
Gerard - I did not use the filtering options. Thank you for bringing them
to my attention.

Barry - thank you for the insight.

Now the tracing works as expected. I'm not sure why it didn't work
before... Maybe the program redirected stdout?

Thank you guys,
Peter

On Thu, Feb 16, 2023 at 9:56 AM Barry  wrote:

>
>
> On 15 Feb 2023, at 17:23, Peter Slížik  wrote:
>
> Hello,
>
> I'm trying to analyze complex Python code. For some specific reasons, I
> decided to use tracing instead of a debugger.
>
> The first thing I tried was:
>
> python -m trace -t /path/to/file.py
>
> The output of this command turned out to be completely useless. The reason
> is that there was a thread running in the background, doing some work
> every *0.1
> s* and this generated thousands of lines of tracing information. The useful
> information (a reaction to my interaction with app GUI) scrolled away in a
> blink.
>
> For this reason, I decided to limit the scope of tracing. I did the
> following.
>
> The original code:
>
> def caller():
>print("I'm the caller.")
>callee()
> def callee():
>print("Callee here.")
>
> Code modified for tracing:
>
> import trace
>
> tracer = trace.Tracer(
>count=0,
>trace=1,
> )
> def outer():
>print("I'm the caller.")
>tracer.runfunc(inner)
>
>
>
> The docs show that you need to do either add the outfile to trace
> or generate and write the report after runfunc returns.
>
> I have not tested this, just read the docs out of curiosity
> Here https://docs.python.org/3/library/trace.html
>
> Barry
>
> def inner():
>print("Callee here.")
>
> Now I launched the program and the tracer did not generate any output. I
> was hoping that this would provide complete tracing information, but only
> for the limited scope of inner().
>
> No success with tracer.run() either.
>
> What I was able to do, when I set count=1, I was able to catch the coverage
> data with tracer.results() and write them to a file. But the tracing
> information was not generated even in this case.
>
> Am I doing anything wrong?
>
> Peter
> --
> https://mail.python.org/mailman/listinfo/python-list
>
>
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python: How to use the 'trace' module programmatically?

2023-02-16 Thread Barry


> On 15 Feb 2023, at 17:23, Peter Slížik  wrote:
> 
> Hello,
> 
> I'm trying to analyze complex Python code. For some specific reasons, I
> decided to use tracing instead of a debugger.
> 
> The first thing I tried was:
> 
> python -m trace -t /path/to/file.py
> 
> The output of this command turned out to be completely useless. The reason
> is that there was a thread running in the background, doing some work
> every *0.1
> s* and this generated thousands of lines of tracing information. The useful
> information (a reaction to my interaction with app GUI) scrolled away in a
> blink.
> 
> For this reason, I decided to limit the scope of tracing. I did the
> following.
> 
> The original code:
> 
> def caller():
>print("I'm the caller.")
>callee()
> def callee():
>print("Callee here.")
> 
> Code modified for tracing:
> 
> import trace
> 
> tracer = trace.Tracer(
>count=0,
>trace=1,
> )
> def outer():
>print("I'm the caller.")
>tracer.runfunc(inner)


The docs show that you need to do either add the outfile to trace
or generate and write the report after runfunc returns.

I have not tested this, just read the docs out of curiosity
Here https://docs.python.org/3/library/trace.html

Barry

> def inner():
>print("Callee here.")
> 
> Now I launched the program and the tracer did not generate any output. I
> was hoping that this would provide complete tracing information, but only
> for the limited scope of inner().
> 
> No success with tracer.run() either.
> 
> What I was able to do, when I set count=1, I was able to catch the coverage
> data with tracer.results() and write them to a file. But the tracing
> information was not generated even in this case.
> 
> Am I doing anything wrong?
> 
> Peter
> -- 
> https://mail.python.org/mailman/listinfo/python-list
> 
-- 
https://mail.python.org/mailman/listinfo/python-list