Re: Can you help me with this simple memoization example?

2024-03-31 Thread marc nicole via Python-list
Thanks for the first comment, which I incorporated.
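
(i.e., letting sum iterate directly over args[0], for example:

    sum_key_arr = sum(args[0]) / len(args[0])

for the "avg" case, and sum(args[0]) for the "sum" case)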

but when you say "You can't use a list as a key, but you can use a tuple as
a key, provided that the elements of the tuple are also immutable."

does that mean the sum of the array is not suitable to use as a key, as I
do? Which tuple should I use to refer to the underlying list value, as you
suggest?
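
Do you mean something like the following (a minimal sketch, assuming the
list elements are numbers and therefore hashable)?

    key = tuple(args[0])  # freeze the list into a hashable tuple
    if key not in cache:
        cache[key] = f(*args)
    return cache[key]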

Is everything else in my code OK?

Thanks

On Sun, 31 Mar 2024 at 01:44, MRAB via Python-list wrote:

> On 2024-03-31 00:09, marc nicole via Python-list wrote:
> > I am creating a memoization example with a function that adds up /
> > averages the elements of an array and compares it with the cached ones
> > to retrieve them in case they are already stored.
> >
> > In addition, I want to store a result only if it differs considerably
> > from the cached ones (passes a threshold, e.g. 50 below).
> >
> > I created an example using a decorator to do so. The results using the
> > decorator are slightly faster than without the memoization, which is
> > OK, but is the logic of the decorator correct? Can anybody tell me?
> >
> > My code is attached below:
> >
> >
> >
> > import time
> >
> >
> > def memoize(f):
> >     cache = {}
> >
> >     def g(*args):
> >         if args[1] == "avg":
> >             sum_key_arr = sum(list(args[0])) / len(list(args[0]))
>
> 'list' will iterate over args[0] to make a list, and 'sum' will iterate
> over that list.
>
> It would be simpler to just let 'sum' iterate over args[0].
>
> >         elif args[1] == "sum":
> >             sum_key_arr = sum(list(args[0]))
> >         if sum_key_arr not in cache:
> >             # key in dict cannot be an array so I use the sum of the
> >             # array as the key
> >             for key, value in cache.items():
>
> You can't use a list as a key, but you can use a tuple as a key,
> provided that the elements of the tuple are also immutable.
>
> >                 if abs(sum_key_arr - key) <= 50:
> >                     # threshold is large here so that all values are
> >                     # approximated!
> >                     # print('approximated')
> >                     return cache[key]
> >                 else:
> >                     # print('not approximated')
> >                     cache[sum_key_arr] = f(args[0], args[1])
> >                     return cache[sum_key_arr]
> >
> >     return g
> >
> >
> > @memoize
> > def aggregate(dict_list_arr, operation):
> >     if operation == "avg":
> >         return sum(list(dict_list_arr)) / len(list(dict_list_arr))
> >     if operation == "sum":
> >         return sum(list(dict_list_arr))
> >     return None
> >
> >
> > t = time.time()
> > for i in range(200, 15000):
> >     res = aggregate(list(range(i)), "avg")
> >
> > elapsed = time.time() - t
> > print(res)
> > print(elapsed)
>
>
> --
> https://mail.python.org/mailman/listinfo/python-list
>
-- 
https://mail.python.org/mailman/listinfo/python-list


Can you help me with this simple memoization example?

2024-03-30 Thread marc nicole via Python-list
I am creating a memoization example with a function that adds up / averages
the elements of an array and compares it with the cached ones to retrieve
them in case they are already stored.

In addition, I want to store a result only if it differs considerably from
the cached ones (passes a threshold, e.g. 50 below).

I created an example using a decorator to do so. The results using the
decorator are slightly faster than without the memoization, which is OK,
but is the logic of the decorator correct? Can anybody tell me?

My code is attached below:



import time


def memoize(f):
    cache = {}

    def g(*args):
        if args[1] == "avg":
            sum_key_arr = sum(list(args[0])) / len(list(args[0]))
        elif args[1] == "sum":
            sum_key_arr = sum(list(args[0]))
        if sum_key_arr not in cache:
            # key in dict cannot be an array so I use the sum of the
            # array as the key
            for key, value in cache.items():
                if abs(sum_key_arr - key) <= 50:
                    # threshold is large here so that all values are
                    # approximated!
                    # print('approximated')
                    return cache[key]
                else:
                    # print('not approximated')
                    cache[sum_key_arr] = f(args[0], args[1])
                    return cache[sum_key_arr]

    return g


@memoize
def aggregate(dict_list_arr, operation):
    if operation == "avg":
        return sum(list(dict_list_arr)) / len(list(dict_list_arr))
    if operation == "sum":
        return sum(list(dict_list_arr))
    return None


t = time.time()
for i in range(200, 15000):
    res = aggregate(list(range(i)), "avg")

elapsed = time.time() - t
print(res)
print(elapsed)
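
For what it's worth, here is a minimal sketch of what the decorator could
look like with three gaps closed: in the code above the empty-cache case
falls through and returns None, an exact hit on sum_key_arr also returns
None, and "avg" and "sum" results share the same keys. This is only an
illustration, not necessarily the intended behaviour:

def memoize(f):
    cache = {}

    def g(*args):
        if args[1] == "avg":
            sum_key_arr = sum(args[0]) / len(args[0])
        elif args[1] == "sum":
            sum_key_arr = sum(args[0])
        else:
            raise ValueError("unknown operation: %r" % (args[1],))
        # reuse any cached result for the same operation whose key is
        # within the threshold
        for op, key_sum in cache:
            if op == args[1] and abs(sum_key_arr - key_sum) <= 50:
                return cache[(op, key_sum)]
        # nothing close enough was cached: compute and store
        cache[(args[1], sum_key_arr)] = f(*args)
        return cache[(args[1], sum_key_arr)]

    return g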
-- 
https://mail.python.org/mailman/listinfo/python-list


How to create a binary tree hierarchy given a list of elements as its leaves

2024-01-28 Thread marc nicole via Python-list
So I am trying to build a binary tree hierarchy given numerical elements
serving as its leaves (the last level of the tree to build). From the
leaves I want to randomly create a name for the next level up of the
hierarchy and assign it to the children elements. For example: if the
elements input are `0,1,2,3`, then I would first create 4 elements (say,
by randomly giving each a label composed of a letter and a number); then
for the second level (iteration) I assign each of `0,1` to a random name
label (e.g. `b1`) and `2,3` to another label (`b2`); then for the third
level I assign a parent label `c1` to each of `b1` and `b2`.

An illustration of the example is the following tree:


[image: tree_exp.PNG]

For this I use numpy's `array_split()` to get the chunks of arrays based
on the iteration needs.
For example, to get the first iteration's arrays I use
`np.array_split(input, (input.size // k))`, where `k` is an even number.
In order to assign a parent node to the children, the parent's array range
should enclose the children's. For example, to assign the parent node with
label `a1` to children `b1` and `b2`, with ranges [0,1] and [2,3]
respectively, the parent should have the range [0,3].

All is fine until a certain iteration (k=4) returns a parent with range
[0,8], which overlaps the children's ranges and therefore cannot be their
parent.

My question is: how do I evenly partition such arrays in a binary way and
create such a binary tree so that for k=4 the first range is [0,7] instead
of [0,8]? (See also the sketch after the code below.)

My code is the following:

#!/usr/bin/python
# -*- coding: utf-8 -*-
import string
import random
import numpy as np


def generate_numbers_list_until_number(stop_number):
    if str(stop_number).isnumeric():
        return np.arange(stop_number)
    else:
        raise TypeError('Input should be a number!')


def generate_node_label():
    return random.choice(string.ascii_lowercase) \
        + str(random.randint(0, 10))


def main():
    data = generate_numbers_list_until_number(100)
    k = 1
    hierarchies = []
    cells_arrays = np.array_split(data, data.size // k)
    print(cells_arrays)
    used_node_hierarchy_name = []
    node_hierarchy_name = [generate_node_label()
                           for _ in range(0, len(cells_arrays))]
    used_node_hierarchy_name.extend(node_hierarchy_name)
    while len(node_hierarchy_name) > 1:
        k = k * 2

        # bug here in the following line

        cells_arrays = list(map(lambda x: [x[0], x[-1]],
                                np.array_split(data, data.size // k)))
        print(cells_arrays)
        node_hierarchy_name = []

        # node hierarchy names should not be redundant in another level

        for _ in range(0, len(cells_arrays)):
            node_name = generate_node_label()
            while node_name in used_node_hierarchy_name:
                node_name = generate_node_label()
            node_hierarchy_name.append(node_name)
        used_node_hierarchy_name.extend(node_hierarchy_name)
        print(used_node_hierarchy_name)
        hierarchies.append(list(zip(node_hierarchy_name, cells_arrays)))
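
A sketch of one way to get non-overlapping parent ranges: build the ranges
bottom-up by pairing consecutive nodes instead of re-splitting the whole
array, so that a parent covers exactly the union of its two children (e.g.
[0,3] and [4,7] give [0,7]). The helper name here is hypothetical, not
part of the code above:

def build_range_hierarchy(leaves):
    # level 0: one (start, end) range per leaf
    level = [(leaf, leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level) - 1, 2):
            left, right = level[i], level[i + 1]
            # the parent range is exactly the union of its children
            next_level.append((left[0], right[1]))
        if len(level) % 2:  # odd node out: carry it up unchanged
            next_level.append(level[-1])
        level = next_level
        levels.append(level)
    return levels

# build_range_hierarchy(range(8))[1] -> [(0, 1), (2, 3), (4, 5), (6, 7)]
# build_range_hierarchy(range(8))[2] -> [(0, 3), (4, 7)]  # never [0, 8]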
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to replace a cell value with each of its contour cells and yield the corresponding datasets separately in a list according to a Pandas-way?

2024-01-21 Thread marc nicole via Python-list
> >                 target_original_cells_coordinates[idx_cell][0],
> >             ] = dataframe_original.iloc[
> >                 dataset_index_values[idx_cell][1],
> >                 dataset_index_values[idx_cell][0]
> >             ]
> >             all_datasets.append(dataframe_cpy)
> >         all_datasets_final.append(all_datasets)
> >     return all_datasets_final
> >
> >
> > def main():
> >     zipf_dataset = create_zipf_distribution()
> >
> >     target_cells = select_target_values(zipf_dataset, 5)
> >     print(target_cells)
> >     contour_cells = select_contours(target_cells)
> >     print(contour_cells)
> >     target_cells_with_contour = apply_contours(target_cells,
> >                                                contour_cells)
> >     datasets = create_possible_datasets(zipf_dataset,
> >                                         target_cells_with_contour)
> >     print(datasets)
> >
> >
> > main()
> >
> > On Sun, 21 Jan 2024 at 16:33, Thomas Passin via Python-list
> > <python-list@python.org> wrote:
> >
> > On 1/21/2024 7:37 AM, marc nicole via Python-list wrote:
> >  > Hello,
> >  >
> >  > I have an initial dataframe with a random list of target cells
> >  > (each cell being identified by a couple (x,y)).
> >  > I want to yield four different dataframes, each containing the
> >  > value of one of the contour (surrounding) cells of each specified
> >  > target cell.
> >  >
> >  > The surrounding cells to consider for a specific target cell are:
> >  > (x-1,y), (x,y-1), (x+1,y), (x,y+1). Specifically, I randomly
> >  > choose 1 to 4 cells from these and consider them for replacement
> >  > of the target cell.
> >  >
> >  > I want to do that through a pandas-specific approach without
> >  > having to define the contour cells separately and then apply the
> >  > changes on the dataframe
> >
> > 1. Why do you want a Pandas-specific approach?  Many people would
> > rather keep code independent of special libraries if possible;
> >
> > 2. How big can these collections of target cells be, roughly
> > speaking? The size could make a big difference in picking a design;
> >
> > 3. You really should work on formatting code for this list.  Your
> > code below is very complex and would take a lot of work to reformat
> > to the point where it is readable, especially with the nearly
> > impenetrable arguments in some places.  Probably all that is needed
> > is to replace all tabs by (say) three spaces, and to make sure you
> > intentionally break lines well before they might get word-wrapped.
> > Here is one example I have reformatted (I hope I got this right):
> >
> > list_tuples_idx_cells_all_datasets = list(filter(
> >     lambda x: utils_tuple_list_not_contain_nan(x),
> >     [list(tuples) for tuples in list(
> >         itertools.product(*target_cells_with_contour))
> >     ]))
> >
> > 4. As an aside, it doesn't look like you need to convert all those
> > sequences and iterators to lists all over the place;
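
(For example, the same filter could be written as a plain comprehension
with no intermediate list() calls; a sketch, assuming
utils_tuple_list_not_contain_nan accepts any sequence of tuples:

    list_tuples_idx_cells_all_datasets = [
        list(tuples)
        for tuples in itertools.product(*target_cells_with_contour)
        if utils_tuple_list_not_contain_nan(tuples)
    ]
)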
> >
> >
> >  > (but rather using an all in one approach):
> >  > for now I have written this example which I think is not Pandas
> >  > specific:
> > [snip]
> >
> > --
> > https://mail.python.org/mailman/listinfo/python-list
> >
>
> --
> https://mail.python.org/mailman/listinfo/python-list
>
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to replace a cell value with each of its contour cells and yield the corresponding datasets separately in a list according to a Pandas-way?

2024-01-21 Thread marc nicole via Python-list
Thanks for the reply,

I think using a Pandas (or a Numpy) approach would optimize the execution
of the program.

Target cells could be up to 10% of the size of the dataset; a good example
to start with would have from 10 to 100 values.

Let me know your thoughts, here's a reproducible example which I formatted:



from itertools import product

import pandas as pd
from numpy import random


def select_target_values(dataframe, number_of_target_values):
target_cells = []
for _ in range(number_of_target_values):
row_x = random.randint(0, len(dataframe.columns) - 1)
col_y = random.randint(0, len(dataframe) - 1)
target_cells.append((row_x, col_y))
return target_cells


def select_contours(target_cells):
contour_coordinates = [(0, 1), (1, 0), (0, -1), (-1, 0)]
contour_cells = []
for target_cell in target_cells:
# random contour count for each cell
contour_cells_count = random.randint(1, 4)
try:
contour_cells.append(
[
tuple(
map(
lambda i, j: i + j,
(target_cell[0], target_cell[1]),
contour_coordinates[iteration_],
)
)
for iteration_ in range(contour_cells_count)
]
)
except IndexError:
continue
return contour_cells


def create_zipf_distribution():
zipf_dist = random.zipf(2, size=(50, 5)).reshape((50, 5))

zipf_distribution_dataset = pd.DataFrame(zipf_dist).round(3)

return zipf_distribution_dataset


def apply_contours(target_cells, contour_cells):
target_cells_with_contour = []
# create one single list of cells
for idx, target_cell in enumerate(target_cells):
target_cell_with_contour = [target_cell]
target_cell_with_contour.extend(contour_cells[idx])
target_cells_with_contour.append(target_cell_with_contour)
return target_cells_with_contour


def create_possible_datasets(dataframe, target_cells_with_contour):
all_datasets_final = []
dataframe_original = dataframe.copy()

list_tuples_idx_cells_all_datasets = list(
filter(
lambda x: x,
[list(tuples) for tuples in
list(product(*target_cells_with_contour))],
)
)
target_original_cells_coordinates = list(
map(
lambda x: x[0],
[
target_and_contour_cell
for target_and_contour_cell in target_cells_with_contour
],
)
)
for dataset_index_values in list_tuples_idx_cells_all_datasets:
all_datasets = []
for idx_cell in range(len(dataset_index_values)):
dataframe_cpy = dataframe.copy()
dataframe_cpy.iat[
target_original_cells_coordinates[idx_cell][1],
target_original_cells_coordinates[idx_cell][0],
] = dataframe_original.iloc[
dataset_index_values[idx_cell][1],
dataset_index_values[idx_cell][0]
]
all_datasets.append(dataframe_cpy)
all_datasets_final.append(all_datasets)
return all_datasets_final
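
# (A side note: the map/lambda above that builds
# target_original_cells_coordinates reduces to a plain comprehension,
# since each entry is just the first cell of each group:
#
#     target_original_cells_coordinates = [
#         cells[0] for cells in target_cells_with_contour
#     ]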


def main():
zipf_dataset = create_zipf_distribution()

target_cells = select_target_values(zipf_dataset, 5)
print(target_cells)
contour_cells = select_contours(target_cells)
print(contour_cells)
target_cells_with_contour = apply_contours(target_cells, contour_cells)
datasets = create_possible_datasets(zipf_dataset,
target_cells_with_contour)
print(datasets)


main()
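
One caveat in the example above: select_contours can produce coordinates
that fall outside the frame, and its try/except IndexError never fires,
since adding the coordinate tuples raises nothing. A minimal guard could
look like the sketch below (in_bounds is a hypothetical helper, and
select_contours would need the dataframe passed in so it can filter each
candidate list with it):

def in_bounds(cell, dataframe):
    # cell is (x, y), with x indexing columns and y indexing rows,
    # matching how the coordinates are used with .iat above
    x, y = cell
    return 0 <= x < len(dataframe.columns) and 0 <= y < len(dataframe)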

On Sun, 21 Jan 2024 at 16:33, Thomas Passin via Python-list
<python-list@python.org> wrote:

> On 1/21/2024 7:37 AM, marc nicole via Python-list wrote:
> > Hello,
> >
> > I have an initial dataframe with a random list of target cells (each
> > cell being identified by a couple (x,y)).
> > I want to yield four different dataframes, each containing the value
> > of one of the contour (surrounding) cells of each specified target
> > cell.
> >
> > The surrounding cells to consider for a specific target cell are:
> > (x-1,y), (x,y-1), (x+1,y), (x,y+1). Specifically, I randomly choose
> > 1 to 4 cells from these and consider them for replacement of the
> > target cell.
> >
> > I want to do that through a pandas-specific approach without having
> > to define the contour cells separately and then apply the changes on
> > the dataframe
>
> 1. Why do you want a Pandas-specific approach?  Many people would rather
> keep code independent of special libraries if possible;
>
> 2. How big can these collections of target cells be, roughly speaking?
> The size could make a big difference in picking a design;

How to replace a cell value with each of its contour cells and yield the corresponding datasets separately in a list according to a Pandas-way?

2024-01-21 Thread marc nicole via Python-list
Hello,

I have an initial dataframe with a random list of target cells (each cell
being identified by a couple (x,y)).
I want to yield four different dataframes, each containing the value of
one of the contour (surrounding) cells of each specified target cell.

The surrounding cells to consider for a specific target cell are: (x-1,y),
(x,y-1), (x+1,y), (x,y+1). Specifically, I randomly choose 1 to 4 cells
from these and consider them for replacement of the target cell.

I want to do that through a pandas-specific approach without having to
define the contour cells separately and then apply the changes on the
dataframe (but rather using an all-in-one approach):
for now I have written this example, which I think is not Pandas specific:


def select_target_values(dataframe, number_of_target_values):
    target_cells = []
    for _ in range(number_of_target_values):
        row_x = random.randint(0, len(dataframe.columns) - 1)
        col_y = random.randint(0, len(dataframe) - 1)
        target_cells.append((row_x, col_y))
    return target_cells


def select_contours(target_cells):
    contour_coordinates = [(0, 1), (1, 0), (0, -1), (-1, 0)]
    contour_cells = []
    for target_cell in target_cells:
        # random contour count for each cell
        contour_cells_count = random.randint(1, 4)
        try:
            contour_cells.append(
                [tuple(map(lambda i, j: i + j,
                           (target_cell[0], target_cell[1]),
                           contour_coordinates[iteration_]))
                 for iteration_ in range(contour_cells_count)])
        except IndexError:
            continue
    return contour_cells


def apply_contours(target_cells, contour_cells):
    target_cells_with_contour = []
    # create one single list of cells
    for idx, target_cell in enumerate(target_cells):
        target_cell_with_contour = [target_cell]
        target_cell_with_contour.extend(contour_cells[idx])
        target_cells_with_contour.append(target_cell_with_contour)
    return target_cells_with_contour


def create_possible_datasets(dataframe, target_cells_with_contour):
    all_datasets_final = []
    dataframe_original = dataframe.copy()
    # check for nans
    list_tuples_idx_cells_all_datasets = list(filter(
        lambda x: utils_tuple_list_not_contain_nan(x),
        [list(tuples) for tuples in
         list(itertools.product(*target_cells_with_contour))]))
    target_original_cells_coordinates = list(map(
        lambda x: x[0],
        [target_and_contour_cell for target_and_contour_cell in
         target_cells_with_contour]))
    for dataset_index_values in list_tuples_idx_cells_all_datasets:
        all_datasets = []
        for idx_cell in range(len(dataset_index_values)):
            dataframe_cpy = dataframe.copy()
            dataframe_cpy.iat[
                target_original_cells_coordinates[idx_cell][1],
                target_original_cells_coordinates[idx_cell][0]] = \
                dataframe_original.iloc[
                    dataset_index_values[idx_cell][1],
                    dataset_index_values[idx_cell][0]]
            all_datasets.append(dataframe_cpy)
        all_datasets_final.append(all_datasets)
    return all_datasets_final


If you have a better Pandas approach (unifying all these methods into one
that makes use of dataframe methods only), please let me know.

thanks!
-- 
https://mail.python.org/mailman/listinfo/python-list


best tool to extract domain hierarchy from a dimension in an OLAP dataset (csv)

2024-01-13 Thread marc nicole via Python-list
Hi all,

I have a CSV OLAP dataset, and I want to extract the domain hierarchies
from each of its dimensions.

Could anybody recommend a Python tool that can manage this properly?

Thanks
-- 
https://mail.python.org/mailman/listinfo/python-list


cubes library docs are not accurate, first example failing unexpectedly

2023-06-08 Thread marc nicole via Python-list
Hello to All,

I want to create a cube from a CSV data file and perform an aggregation on
it; the code is below:

from sqlalchemy import create_engine
from cubes.tutorial.sql import create_table_from_csv
from cubes import Workspace, Cell, browser
import data

if __name__ == '__main__':
    engine = create_engine('sqlite:///data.sqlite')
    create_table_from_csv(engine,
                          "../data/data.csv",
                          table_name="irbd_balance",
                          fields=[
                              ("category", "string"),
                              ("category_label", "string"),
                              ("subcategory", "string"),
                              ("subcategory_label", "string"),
                              ("line_item", "string"),
                              ("year", "integer"),
                              ("amount", "integer")],
                          create_id=True)
    print("done. file data.sqlite created")

    workspace = Workspace()
    workspace.register_default_store("sql", url="sqlite:///data.sqlite")
    workspace.import_model("../model.json")

    cube = workspace.cube("irbd_balance")

    browser = workspace.browser("irbd_balance")

    cell = Cell(cube)
    result = browser.aggregate(cell, drilldown=["year"])
    for record in result.drilldown:
        print(record)

The tutorial and the library are available here:

https://pythonhosted.org/cubes/tutorial.html
The error stack is:

result = browser.aggregate(cell, drilldown=["year"])
  File "C:\Users\path\venv\lib\site-packages\cubes\browser.py", line 145, in aggregate
    result = self.provide_aggregate(cell,
  File "C:\path\venv\lib\site-packages\cubes\sql\browser.py", line 400, in provide_aggregate
    (statement, labels) = self.aggregation_statement(cell,
  File "C:\path\venv\lib\site-packages\cubes\sql\browser.py", line 532, in aggregation_statement
    raise ArgumentError("List of aggregates should not be empty")
cubes.errors.ArgumentError: List of aggregates should not be empty

It seems the tutorial contains some typos.

Any idea how to fix this? Otherwise, is there another OLAP cubes library
for Python that has great docs?
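
One hypothesis: the error says the cube resolves no aggregates, so maybe
the tutorial's model.json needs an explicit aggregates section. A hedged
sketch of what may be missing, assuming the "amount" measure the tutorial
declares:

    "measures": [{"name": "amount"}],
    "aggregates": [
        {"name": "amount_sum", "function": "sum", "measure": "amount"},
        {"name": "record_count", "function": "count"}
    ]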
-- 
https://mail.python.org/mailman/listinfo/python-list