Re: Can you help me with this simple memoization example?
Thanks for the first comment, which I incorporated. But when you say "You can't use a list as a key, but you can use a tuple as a key, provided that the elements of the tuple are also immutable", does that mean the sum of the array is not suitable to use as a key, as I do? Which tuple should I use to refer to the underlying list value, as you suggest? Is anything else in my code OK? Thanks

Le dim. 31 mars 2024 à 01:44, MRAB via Python-list a écrit :

> On 2024-03-31 00:09, marc nicole via Python-list wrote:
> > I am creating a memoization example with a function that adds up /
> > averages the elements of an array and compares it with the cached ones
> > to retrieve them in case they are already stored.
> >
> > In addition, I want to store only if the result of the function differs
> > considerably (passes a threshold, e.g. 50 below).
> >
> > I created an example using a decorator to do so. The results using the
> > decorator are slightly faster than without the memoization, which is
> > OK, but is the logic of the decorator correct? Can anybody tell me?
> >
> > My code is attached below:
> >
> >     def g(*args):
> >         if args[1] == "avg":
> >             sum_key_arr = sum(list(args[0])) / len(list(args[0]))
>
> 'list' will iterate over args[0] to make a list, and 'sum' will iterate
> over that list.
>
> It would be simpler to just let 'sum' iterate over args[0].
>
> >         elif args[1] == "sum":
> >             sum_key_arr = sum(list(args[0]))
> >         if sum_key_arr not in cache:
> >             for (
> >                 key,
> >                 value,
> >             ) in (
> >                 cache.items()
> >             ):  # key in dict cannot be an array so I use the sum of
> >                 # the array as the key
>
> You can't use a list as a key, but you can use a tuple as a key,
> provided that the elements of the tuple are also immutable.
>
> [snip: the rest of the quoted code appears in full in the original
> message below]
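A minimal sketch of what MRAB suggests — keying the cache on the actual contents rather than on their sum — assuming the inputs are flat sequences of numbers:

def memoize(f):
    cache = {}

    def g(arr, operation):
        # A tuple of the elements is hashable, so it can serve as a dict
        # key; unlike the sum, two different arrays never collide.
        key = (tuple(arr), operation)
        if key not in cache:
            cache[key] = f(arr, operation)
        return cache[key]

    return g

Note that this gives exact memoization; the threshold-based approximation in the original post is a separate concern and would still need its own lookup strategy.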
Can you help me with this simple memoization example?
I am creating a memoization example with a function that adds up / averages the elements of an array and compares it with the cached ones to retrieve them in case they are already stored. In addition, I want to store a result only if it differs considerably from the cached ones (passes a threshold, e.g. 50 below).

I created an example using a decorator to do so. The results using the decorator are slightly faster than without the memoization, which is OK, but is the logic of the decorator correct? Can anybody tell me? My code is attached below:

import time


def memoize(f):
    cache = {}

    def g(*args):
        if args[1] == "avg":
            sum_key_arr = sum(list(args[0])) / len(list(args[0]))
        elif args[1] == "sum":
            sum_key_arr = sum(list(args[0]))
        if sum_key_arr not in cache:
            for (
                key,
                value,
            ) in (
                cache.items()
            ):  # key in dict cannot be an array so I use the sum of the
                # array as the key
                if (
                    abs(sum_key_arr - key) <= 50
                ):  # threshold is great here so that all values are
                    # approximated!
                    # print('approximated')
                    return cache[key]
            else:
                # print('not approximated')
                cache[sum_key_arr] = f(args[0], args[1])
        return cache[sum_key_arr]

    return g


@memoize
def aggregate(dict_list_arr, operation):
    if operation == "avg":
        return sum(list(dict_list_arr)) / len(list(dict_list_arr))
    if operation == "sum":
        return sum(list(dict_list_arr))
    return None


t = time.time()
for i in range(200, 15000):
    res = aggregate(list(range(i)), "avg")

elapsed = time.time() - t
print(res)
print(elapsed)
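For the threshold logic itself, scanning cache.items() makes every cache miss cost O(n). A sketch of one alternative that quantizes the summary value into buckets so the lookup stays a single dict access — assuming a fixed threshold of 50 and accepting that two summaries slightly less than 50 apart can still land in different buckets (the function name is illustrative):

THRESHOLD = 50


def memoize_approx(f):
    cache = {}

    def g(arr, operation):
        summary = sum(arr) / len(arr) if operation == "avg" else sum(arr)
        # Round the summary to the nearest multiple of THRESHOLD so that
        # nearby summaries share one bucket; a single dict access then
        # replaces the scan over cache.items().
        bucket = (operation, round(summary / THRESHOLD))
        if bucket not in cache:
            cache[bucket] = f(arr, operation)
        return cache[bucket]

    return g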
How to create a binary tree hierarchy given a list of elements as its leaves
So I am trying to build a binary tree hierarchy given numerical elements serving as its leaves (the last level of the tree to build). From the leaves I want to randomly create a name for the next higher level of the hierarchy and assign it to the children elements.

For example: if the elements inputted are `0,1,2,3`, then I would first create 4 elements (giving each a random label composed of a letter and a number), then for the second level (iteration) I assign each of `0,1` to a random name label (e.g. `b1`) and `2,3` to another label (`b2`), then for the third level I assign a parent label `c1` to each of `b1` and `b2`. An illustration of the example is the following tree:

[image: tree_exp.PNG]

For this I use numpy's `array_split()` to get the chunks of arrays based on the iteration needs. For example, to get the first iteration arrays I use `np.array_split(input, (input.size // k))` where `k` is an even number.

In order to assign a parent node to the children, the parent's array range should enclose the children's. For example, to assign the parent node with label `a1` to children `b1` and `b2` with ranges respectively [0,1] and [2,3], the parent should have the range [0,3]. All is fine until a certain iteration (k=4) returns a parent with range [0,8], which overlaps the children's ranges and therefore cannot be their parent.

My question is how to evenly partition such arrays in a binary way and create such a binary tree so as to obtain, for k=4, the first range [0,7] instead of [0,8]? My code is the following:

#!/usr/bin/python
# -*- coding: utf-8 -*-
import string
import random

import numpy as np


def generate_numbers_list_until_number(stop_number):
    if str(stop_number).isnumeric():
        return np.arange(stop_number)
    else:
        raise TypeError('Input should be a number!')


def generate_node_label():
    return random.choice(string.ascii_lowercase) \
        + str(random.randint(0, 10))


def main():
    data = generate_numbers_list_until_number(100)
    k = 1
    hierarchies = []
    cells_arrays = np.array_split(data, data.size // k)
    print(cells_arrays)
    used_node_hierarchy_name = []
    node_hierarchy_name = [generate_node_label()
                           for _ in range(0, len(cells_arrays))]
    used_node_hierarchy_name.extend(node_hierarchy_name)
    while len(node_hierarchy_name) > 1:
        k = k * 2
        # bug here in the following line
        cells_arrays = list(map(lambda x: [x[0], x[-1]],
                                np.array_split(data, data.size // k)))
        print(cells_arrays)
        node_hierarchy_name = []
        # node hierarchy names should not be redundant in another level
        for _ in range(0, len(cells_arrays)):
            node_name = generate_node_label()
            while node_name in used_node_hierarchy_name:
                node_name = generate_node_label()
            node_hierarchy_name.append(node_name)
        used_node_hierarchy_name.extend(node_hierarchy_name)
        print(used_node_hierarchy_name)
        hierarchies.append(list(zip(node_hierarchy_name, cells_arrays)))
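One way to avoid the overlap entirely is to build the tree bottom-up, pairing adjacent nodes so that a parent's range is exactly the union of its two children's ranges; re-splitting the whole data array at each level is what produces ranges that disagree with the previous level. A sketch of the bottom-up approach, assuming the leaves are consecutive integers (the labelling scheme here is illustrative):

import itertools


def build_hierarchy(num_leaves):
    labels = itertools.count()
    # Level 0: one node per leaf, each covering the range [i, i].
    level = [("n%d" % next(labels), (i, i)) for i in range(num_leaves)]
    levels = [level]
    while len(level) > 1:
        parents = []
        # Pair adjacent nodes; a parent's range is exactly the union of
        # its two children's ranges, so no overlap can occur.
        for j in range(0, len(level) - 1, 2):
            low = level[j][1][0]
            high = level[j + 1][1][1]
            parents.append(("n%d" % next(labels), (low, high)))
        if len(level) % 2 == 1:
            # An unpaired trailing node is promoted to the next level.
            parents.append(level[-1])
        levels.append(parents)
        level = parents
    return levels

With 8 leaves this yields a top-level range of [0, 7], and every level partitions the leaves without overlap.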
Re: How to replace a cell value with each of its contour cells and yield the corresponding datasets separately in a list, in a Pandas way?
[snip: quoted code from marc's reply, reproduced in full in the next message]

Le dim. 21 janv. 2024 à 16:33, Thomas Passin via Python-list
<python-list@python.org> a écrit :

> On 1/21/2024 7:37 AM, marc nicole via Python-list wrote:
> > Hello,
> >
> > I have an initial dataframe with a random list of target cells (each
> > cell being identified with a couple (x,y)).
> > I want to yield four different dataframes each containing the value
> > of one of the contour (surrounding) cells of each specified target cell.
> >
> > The surrounding cells to consider for a specific target cell are:
> > (x-1,y), (x,y-1), (x+1,y), (x,y+1); specifically I randomly choose
> > 1 to 4 cells from these and consider them for replacement of the
> > target cell.
> >
> > I want to do that through a pandas-specific approach without having
> > to define the contour cells separately and then apply the changes on
> > the dataframe
>
> 1. Why do you want a Pandas-specific approach? Many people would rather
> keep code independent of special libraries if possible;
>
> 2. How big can these collections of target cells be, roughly speaking?
> The size could make a big difference in picking a design;
>
> 3. You really should work on formatting code for this list. Your code
> below is very complex and would take a lot of work to reformat to the
> point where it is readable, especially with the nearly impenetrable
> arguments in some places. Probably all that is needed is to replace all
> tabs by (say) three spaces, and to make sure you intentionally break
> lines well before they might get word-wrapped. Here is one example I
> have reformatted (I hope I got this right):
>
> list_tuples_idx_cells_all_datasets = list(filter(
>     lambda x: utils_tuple_list_not_contain_nan(x),
>     [list(tuples) for tuples in list(
>        itertools.product(*target_cells_with_contour))
>     ]))
>
> 4. As an aside, it doesn't look like you need to convert all those
> sequences and iterators to lists all over the place;
>
> > (but rather using an all in one approach):
> > for now I have written this example which I think is not Pandas
> > specific:
> [snip]
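For what it's worth, point 4 can be illustrated on that same expression: itertools.product() already yields tuples lazily, so the inner list() wrappers add nothing. A sketch, with a stand-in for the nan-check helper from the original post:

import math
from itertools import product


def tuples_contain_nan(tuples):
    # Stand-in for the original utils_tuple_list_not_contain_nan helper.
    return any(isinstance(v, float) and math.isnan(v)
               for cell in tuples for v in cell)


def candidate_index_sets(target_cells_with_contour):
    # product() iterates directly over the input sequences; only the
    # final result needs to be materialized as lists.
    return [list(tuples)
            for tuples in product(*target_cells_with_contour)
            if not tuples_contain_nan(tuples)]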
Re: How to replace a cell value with each of its contour cells and yield the corresponding datasets separately in a list, in a Pandas way?
Thanks for the reply. I think using a Pandas (or a NumPy) approach would optimize the execution of the program.

Target cells could be up to 10% the size of the dataset; a good example to start with would have from 10 to 100 values.

Let me know your thoughts. Here's a reproducible example, which I formatted:

from numpy import random
from itertools import product
import pandas as pd
import numpy as np


def select_target_values(dataframe, number_of_target_values):
    target_cells = []
    for _ in range(number_of_target_values):
        row_x = random.randint(0, len(dataframe.columns) - 1)
        col_y = random.randint(0, len(dataframe) - 1)
        target_cells.append((row_x, col_y))
    return target_cells


def select_contours(target_cells):
    contour_coordinates = [(0, 1), (1, 0), (0, -1), (-1, 0)]
    contour_cells = []
    for target_cell in target_cells:
        # random contour count for each cell
        contour_cells_count = random.randint(1, 4)
        try:
            contour_cells.append(
                [
                    tuple(
                        map(
                            lambda i, j: i + j,
                            (target_cell[0], target_cell[1]),
                            contour_coordinates[iteration_],
                        )
                    )
                    for iteration_ in range(contour_cells_count)
                ]
            )
        except IndexError:
            continue
    return contour_cells


def create_zipf_distribution():
    zipf_dist = random.zipf(2, size=(50, 5)).reshape((50, 5))
    zipf_distribution_dataset = pd.DataFrame(zipf_dist).round(3)
    return zipf_distribution_dataset


def apply_contours(target_cells, contour_cells):
    target_cells_with_contour = []
    # create one single list of cells
    for idx, target_cell in enumerate(target_cells):
        target_cell_with_contour = [target_cell]
        target_cell_with_contour.extend(contour_cells[idx])
        target_cells_with_contour.append(target_cell_with_contour)
    return target_cells_with_contour


def create_possible_datasets(dataframe, target_cells_with_contour):
    all_datasets_final = []
    dataframe_original = dataframe.copy()
    list_tuples_idx_cells_all_datasets = list(
        filter(
            lambda x: x,
            [list(tuples) for tuples in list(product(*target_cells_with_contour))],
        )
    )
    target_original_cells_coordinates = list(
        map(
            lambda x: x[0],
            [
                target_and_contour_cell
                for target_and_contour_cell in target_cells_with_contour
            ],
        )
    )
    for dataset_index_values in list_tuples_idx_cells_all_datasets:
        all_datasets = []
        for idx_cell in range(len(dataset_index_values)):
            dataframe_cpy = dataframe.copy()
            dataframe_cpy.iat[
                target_original_cells_coordinates[idx_cell][1],
                target_original_cells_coordinates[idx_cell][0],
            ] = dataframe_original.iloc[
                dataset_index_values[idx_cell][1],
                dataset_index_values[idx_cell][0]
            ]
            all_datasets.append(dataframe_cpy)
        all_datasets_final.append(all_datasets)
    return all_datasets_final


def main():
    zipf_dataset = create_zipf_distribution()

    target_cells = select_target_values(zipf_dataset, 5)
    print(target_cells)
    contour_cells = select_contours(target_cells)
    print(contour_cells)
    target_cells_with_contour = apply_contours(target_cells, contour_cells)
    datasets = create_possible_datasets(zipf_dataset, target_cells_with_contour)
    print(datasets)


main()

Le dim. 21 janv. 2024 à 16:33, Thomas Passin via Python-list
<python-list@python.org> a écrit :

> On 1/21/2024 7:37 AM, marc nicole via Python-list wrote:
> > Hello,
> >
> > I have an initial dataframe with a random list of target cells (each
> > cell being identified with a couple (x,y)).
> > I want to yield four different dataframes each containing the value
> > of one of the contour (surrounding) cells of each specified target cell.
> >
> > The surrounding cells to consider for a specific target cell are:
> > (x-1,y), (x,y-1), (x+1,y), (x,y+1); specifically I randomly choose
> > 1 to 4 cells from these and consider them for replacement of the
> > target cell.
> >
> > I want to do that through a pandas-specific approach without having
> > to define the contour cells separately and then apply the changes on
> > the dataframe
>
> 1. Why do you want a Pandas-specific approach? Many people would rather
> keep code independent of special libraries if possible;
>
> 2. How big can these collections of target cells be, roughly speaking?
> The size could make a big difference in picking a design;
> [snip]
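If execution speed is the motivation, one option is to do the per-variant copying at the NumPy level and only wrap the result in a DataFrame at the end, since plain ndarray copies are cheaper than DataFrame.copy(). A rough sketch of the idea for a single target cell, assuming integer positions as in the example above (the function name is made up, and bounds checks are omitted):

import numpy as np
import pandas as pd


def single_replacement_variants(dataframe, target, contours):
    # target and contours are (x, y) pairs as in the example above;
    # NumPy indexes rows first, hence the (y, x) order below.
    base = dataframe.to_numpy()
    variants = []
    for cx, cy in contours:
        arr = base.copy()  # cheap ndarray copy instead of DataFrame.copy()
        arr[target[1], target[0]] = base[cy, cx]
        variants.append(pd.DataFrame(arr, columns=dataframe.columns))
    return variants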
How to replace a cell value with each of its contour cells and yield the corresponding datasets separately in a list, in a Pandas way?
Hello,

I have an initial dataframe with a random list of target cells (each cell being identified with a couple (x,y)). I want to yield four different dataframes each containing the value of one of the contour (surrounding) cells of each specified target cell.

The surrounding cells to consider for a specific target cell are: (x-1,y), (x,y-1), (x+1,y), (x,y+1); specifically I randomly choose 1 to 4 cells from these and consider them for replacement of the target cell.

I want to do that through a pandas-specific approach without having to define the contour cells separately and then apply the changes on the dataframe (but rather using an all-in-one approach). For now I have written this example, which I think is not Pandas-specific:

def select_target_values(dataframe, number_of_target_values):
    target_cells = []
    for _ in range(number_of_target_values):
        row_x = random.randint(0, len(dataframe.columns) - 1)
        col_y = random.randint(0, len(dataframe) - 1)
        target_cells.append((row_x, col_y))
    return target_cells


def select_contours(target_cells):
    contour_coordinates = [(0, 1), (1, 0), (0, -1), (-1, 0)]
    contour_cells = []
    for target_cell in target_cells:
        # random contour count for each cell
        contour_cells_count = random.randint(1, 4)
        try:
            contour_cells.append(
                [tuple(map(lambda i, j: i + j,
                           (target_cell[0], target_cell[1]),
                           contour_coordinates[iteration_]))
                 for iteration_ in range(contour_cells_count)])
        except IndexError:
            continue
    return contour_cells


def apply_contours(target_cells, contour_cells):
    target_cells_with_contour = []
    # create one single list of cells
    for idx, target_cell in enumerate(target_cells):
        target_cell_with_contour = [target_cell]
        target_cell_with_contour.extend(contour_cells[idx])
        target_cells_with_contour.append(target_cell_with_contour)
    return target_cells_with_contour


def create_possible_datasets(dataframe, target_cells_with_contour):
    all_datasets_final = []
    dataframe_original = dataframe.copy()
    # check for nans
    list_tuples_idx_cells_all_datasets = list(
        filter(lambda x: utils_tuple_list_not_contain_nan(x),
               [list(tuples) for tuples in list(
                   itertools.product(*target_cells_with_contour))]))
    target_original_cells_coordinates = list(
        map(lambda x: x[0],
            [target_and_contour_cell
             for target_and_contour_cell in target_cells_with_contour]))
    for dataset_index_values in list_tuples_idx_cells_all_datasets:
        all_datasets = []
        for idx_cell in range(len(dataset_index_values)):
            dataframe_cpy = dataframe.copy()
            dataframe_cpy.iat[
                target_original_cells_coordinates[idx_cell][1],
                target_original_cells_coordinates[idx_cell][0]
            ] = dataframe_original.iloc[
                dataset_index_values[idx_cell][1],
                dataset_index_values[idx_cell][0]]
            all_datasets.append(dataframe_cpy)
        all_datasets_final.append(all_datasets)
    return all_datasets_final

If you have a better Pandas approach (unifying all these methods into one that makes use of dataframe methods only) please let me know. Thanks!
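As a rough illustration of what a more pandas-native formulation could look like: DataFrame.shift moves the whole frame at once, so each of the four contour directions can be read in a single vectorized step. A sketch under the simplifying assumption that one dataset per direction is wanted (rather than a random 1-to-4 choice per cell), with NaN appearing for contours that fall outside the frame:

import pandas as pd


def contour_datasets(df, target_cells):
    # Shifting the frame by one row/column aligns every cell with one of
    # its four neighbours, for all cells at once.
    shifts = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    datasets = []
    for drow, dcol in shifts:
        shifted = df.shift(periods=drow, axis=0).shift(periods=dcol, axis=1)
        variant = df.copy()
        for x, y in target_cells:
            variant.iat[y, x] = shifted.iat[y, x]
        datasets.append(variant)
    return datasets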
best tool to extract domain hierarchy from a dimension in an OLAP dataset (csv)
Hi all, I have a CSV OLAP dataset from which I want to extract the domain hierarchy of each of its dimensions. Could anybody recommend a Python tool that could manage this properly? Thanks
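In the meantime, a dimension's domain hierarchy can often be inferred directly with pandas by testing which column functionally determines which: parent -> child is a plausible hierarchy edge when every child value maps back to exactly one parent value. A small sketch, assuming the CSV has one column per candidate level (the file and column names are made up):

import pandas as pd

df = pd.read_csv("olap_data.csv")


def determines(df, parent, child):
    # True if each value of `child` co-occurs with exactly one value of
    # `parent`, i.e. `child` functionally determines `parent`, which is
    # what a level-to-parent relation in a hierarchy looks like.
    return (df.groupby(child)[parent].nunique() == 1).all()


print(determines(df, "category", "subcategory"))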
cubes library docs are not accurate, first example failing unexpectedly
Hello to All,

I want to create a cube from a csv data file and to perform an aggregation on it. The code is below:

from sqlalchemy import create_engine
from cubes.tutorial.sql import create_table_from_csv
from cubes import Workspace, Cell, browser
import data

if __name__ == '__main__':
    engine = create_engine('sqlite:///data.sqlite')
    create_table_from_csv(engine,
                          "../data/data.csv",
                          table_name="irbd_balance",
                          fields=[
                              ("category", "string"),
                              ("category_label", "string"),
                              ("subcategory", "string"),
                              ("subcategory_label", "string"),
                              ("line_item", "string"),
                              ("year", "integer"),
                              ("amount", "integer")],
                          create_id=True
                          )
    print("done. file data.sqlite created")
    workspace = Workspace()
    workspace.register_default_store("sql", url="sqlite:///data.sqlite")
    workspace.import_model("../model.json")
    cube = workspace.cube("irbd_balance")
    browser = workspace.browser("irbd_balance")
    cell = Cell(cube)
    result = browser.aggregate(cell, drilldown=["year"])
    for record in result.drilldown:
        print(record)

The tutorial and the library are available here:
https://pythonhosted.org/cubes/tutorial.html

The error stack is:

    result = browser.aggregate(cell, drilldown=["year"])
  File "C:\Users\path\venv\lib\site-packages\cubes\browser.py", line 145, in aggregate
    result = self.provide_aggregate(cell,
  File "C:\path\venv\lib\site-packages\cubes\sql\browser.py", line 400, in provide_aggregate
    (statement, labels) = self.aggregation_statement(cell,
  File "C:\path\venv\lib\site-packages\cubes\sql\browser.py", line 532, in aggregation_statement
    raise ArgumentError("List of aggregates should not be empty")
cubes.errors.ArgumentError: List of aggregates should not be empty

It seems the tutorial contains some typos. Any idea how to fix this? Otherwise, is there a better OLAP cubes library for Python that has good docs?
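I can't verify the tutorial's model.json, but the error says the cube ends up with no aggregates, which suggests the bundled model predates the current cubes release. In cubes 1.x a cube can declare its aggregates explicitly in the model; a guess at what the cube entry inside the model's "cubes" list might need to contain (untested, names follow the table definition above):

{
    "name": "irbd_balance",
    "measures": [{"name": "amount"}],
    "aggregates": [
        {"name": "amount_sum", "function": "sum", "measure": "amount"},
        {"name": "record_count", "function": "count"}
    ]
}

With at least one aggregate defined, browser.aggregate(cell, drilldown=["year"]) should have something to compute; if that assumption is wrong, comparing the tutorial's model.json against the model reference for the installed cubes version would be the next step.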