I'm working on an NLP project and I got bitten by unreasonably slow behaviour in Python while operating on small amounts of numbers.
I have the following code:

```python
import random, time
from functools import reduce

def trainPerceptron(perceptron, data):
    learningRate = 0.002
    weights = perceptron['weights']
    error = 0
    for chunk in data:
        input = chunk['input']
        output = chunk['output']

        # 12x slower than equivalent JS
        sum_ = 0
        for key in input:
            v = weights[key]
            sum_ += v

        # 20x slower than equivalent JS
        #sum_ = reduce(lambda acc, key: acc + weights[key], input)

        actualOutput = sum_ if sum_ > 0 else 0

        expectedOutput = 1 if output == perceptron['id'] else 0
        currentError = expectedOutput - actualOutput
        if currentError:
            error += currentError ** 2
            change = currentError * learningRate
            for key in input:
                weights[key] += change

    return error

# Build mock data structure
data = [{
    'input': random.sample(range(0, 5146), 10),
    'output': 0
} for _ in range(11514)]

perceptrons = [{
    'id': i,
    'weights': [0.0] * 5147,
} for i in range(60)]  # simulate 60 perceptrons

# Simulate NLU
for i in range(151):  # 151 epochs (0..150)
    hrstart = time.perf_counter()
    for perceptron in perceptrons:
        trainPerceptron(perceptron, data)
    hrend = time.perf_counter()
    print(f'Epoch {i} - Time for training: {int((hrend - hrstart) * 1000)}ms')
```

Running it on my M1 MBP I get the following numbers:

```
Epoch 0 - Time for training: 199ms
Epoch 1 - Time for training: 198ms
Epoch 2 - Time for training: 199ms
Epoch 3 - Time for training: 197ms
Epoch 4 - Time for training: 199ms
...
Epoch 146 - Time for training: 198ms
Epoch 147 - Time for training: 200ms
Epoch 148 - Time for training: 198ms
Epoch 149 - Time for training: 198ms
Epoch 150 - Time for training: 198ms
```

Each epoch takes around 200ms, which is unreasonably slow given the small amount of numbers being processed.
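As a side note, one thing I tried is pushing the inner summation into C. This is a simplified micro-benchmark, not the full training loop: `input_keys` stands in for one chunk's `input`, and the three helper names are mine. A generator passed to `sum()` still runs bytecode per element, while `map()` with a C-implemented bound method does not, so the variants may behave differently:

```python
import random
import timeit

weights = [0.0] * 5147
input_keys = random.sample(range(5146), 10)  # stand-in for one chunk's 'input'

def loop_sum():
    # the original pattern: explicit loop, one bytecode dispatch per step
    s = 0
    for key in input_keys:
        v = weights[key]
        s += v
    return s

def builtin_sum():
    # sum() over a generator: the loop moves to C, but the generator
    # still executes Python bytecode for each element
    return sum(weights[key] for key in input_keys)

def map_sum():
    # map() with the C-implemented list.__getitem__: no Python-level
    # frame is entered per element
    return sum(map(weights.__getitem__, input_keys))

for fn in (loop_sum, builtin_sum, map_sum):
    t = timeit.timeit(fn, number=200_000)
    print(f'{fn.__name__}: {t:.3f}s')
```

All three compute the same value; whether the `map()` variant wins for only 10 keys is exactly the kind of thing that needs measuring rather than guessing.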
I profiled the code with `cProfile` in order to find out what is going on:

```
         655306 function calls (655274 primitive calls) in 59.972 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      3/1    0.000    0.000   59.972   59.972 {built-in method builtins.exec}
        1    0.005    0.005   59.972   59.972 poc-small.py:1(<module>)
     9060   59.850    0.007   59.850    0.007 poc-small.py:4(trainPerceptron)
        1    0.006    0.006    0.112    0.112 poc-small.py:34(<listcomp>)
    11514    0.039    0.000    0.106    0.000 random.py:382(sample)
   115232    0.034    0.000    0.047    0.000 random.py:235(_randbelow_with_getrandbits)
    11548    0.002    0.000    0.012    0.000 {built-in method builtins.isinstance}
    11514    0.002    0.000    0.010    0.000 <frozen abc>:117(__instancecheck__)
   183616    0.010    0.000    0.010    0.000 {method 'getrandbits' of '_random.Random' objects}
    11514    0.002    0.000    0.008    0.000 {built-in method _abc._abc_instancecheck}
    11514    0.002    0.000    0.006    0.000 <frozen abc>:121(__subclasscheck__)
   115140    0.005    0.000    0.005    0.000 {method 'add' of 'set' objects}
    11514    0.003    0.000    0.004    0.000 {built-in method _abc._abc_subclasscheck}
   115232    0.004    0.000    0.004    0.000 {method 'bit_length' of 'int' objects}
      151    0.003    0.000    0.003    0.000 {built-in method builtins.print}
```

This wasn't too helpful, so I tried with [line_profiler][1]:

```
Timer unit: 1e-06 s

Total time: 55.2079 s
File: poc-small.py
Function: trainPerceptron at line 4

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     4                                           @profile
     5                                           def trainPerceptron(perceptron, data):
     6      1214        301.0      0.2      0.0      learningRate = 0.002
     7      1214        255.0      0.2      0.0      weights = perceptron['weights']
     8      1214        114.0      0.1      0.0      error = 0
     9  13973840    1742427.0      0.1      3.2      for chunk in data:
    10  13973840    1655043.0      0.1      3.0          input = chunk['input']
    11  13973840    1487543.0      0.1      2.7          output = chunk['output']
    12
    13                                                   # 12x slower than equivalent JS
    14  13973840    1210755.0      0.1      2.2          sum_ = 0
    15 139738397   13821056.0      0.1     25.0          for key in input:
    16 139738397   13794656.0      0.1     25.0              v = weights[key]
    17 139738396   14942692.0      0.1     27.1              sum_ += v
    18
    19                                                   # 20x slower than equivalent JS
    20                                                   #sum_ = reduce(lambda acc, key: acc + weights[key], input)
    21
    22  13973839    1618273.0      0.1      2.9          actualOutput = sum_ if sum_ > 0 else 0
    23
    24  13973839    1689194.0      0.1      3.1          expectedOutput = 1 if output == perceptron['id'] else 0
    25  13973839    1365346.0      0.1      2.5          currentError = expectedOutput - actualOutput
    26  13732045    1211916.0      0.1      2.2          if currentError:
    27    241794      38375.0      0.2      0.1              error += currentError ** 2
    28    241794      25377.0      0.1      0.0              change = currentError * learningRate
    29   2417940     271237.0      0.1      0.5              for key in input:
    30   2417940     332890.0      0.1      0.6                  weights[key] += change
    31
    32      1213        405.0      0.3      0.0      return error
```

This shows that the three lines adding the numbers take most of the runtime budget (around 77%):

```python
for key in input:
    v = weights[key]
    sum_ += v
```

I thought about throwing numpy at the problem to speed it up, but `input` is very small, which means that the overhead of calling numpy would hurt performance more than the gains obtained by doing the math with it. In any case, adding numbers shouldn't be that slow, which makes me believe something weird is going on with Python.

In order to confirm my theory, I ported the code to JavaScript. This is the result:

```js
function trainPerceptron(perceptron, data) {
  const learningRate = 0.002;
  const weights = perceptron['weights'];
  let error = 0;
  for (const chunk of data) {
    const input = chunk['input'];
    const output = chunk['output'];

    const sum = input.reduce((acc, key) => acc + weights[key], 0);
    const actualOutput = sum > 0 ? sum : 0;

    const expectedOutput = output === perceptron['id'] ? 1 : 0;
    const currentError = expectedOutput - actualOutput;
    if (currentError) {
      error += currentError ** 2;
      const change = currentError * learningRate;
      for (const key of input) {  // `of` iterates values; `in` would yield array indices
        weights[key] += change;
      }
    }
  }
  return error;
}

// Build mock data structure
const data = new Array(11514);
for (let i = 0; i < data.length; i++) {
  const inputSet = new Set();
  while (inputSet.size < 10) {
    inputSet.add(Math.floor(Math.random() * 5146));
  }
  const input = Array.from(inputSet);
  data[i] = { input: input, output: 0 };
}

const perceptrons = Array.from({ length: 60 }, (_, i) => ({
  id: i,
  weights: Array.from({ length: 5147 }, () => 0.0),
}));  // simulate 60 perceptrons

// Simulate NLU
for (let i = 0; i < 151; i++) {  // 151 epochs (0..150)
  const hrstart = performance.now();
  for (const perceptron of perceptrons) {
    trainPerceptron(perceptron, data);
  }
  const hrend = performance.now();
  console.log(`Epoch ${i} - Time for training: ${Math.floor(hrend - hrstart)}ms`);
}
```

When I run the JS code I get the following numbers:

```
Epoch 0 - Time for training: 30ms
Epoch 1 - Time for training: 18ms
Epoch 2 - Time for training: 17ms
Epoch 3 - Time for training: 17ms
Epoch 4 - Time for training: 17ms
...
Epoch 147 - Time for training: 17ms
Epoch 148 - Time for training: 17ms
Epoch 149 - Time for training: 17ms
Epoch 150 - Time for training: 17ms
```

These numbers confirm my theory: Python is being unreasonably slow. Any idea why, or what exactly is making it perform so poorly?

Runtime details:
MacOS Ventura 13.2.1 (22D68)
Macbook Pro M1 Pro 32GB
Python 3.11.0 (native Apple Silicon)

Regards!

[1]: https://github.com/pyutils/line_profiler
-- 
https://mail.python.org/mailman/listinfo/python-list
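P.S. For completeness, this is the kind of numpy batching I considered and dismissed. It's only a sketch with shapes matching my mock data (the variable names are made up): stacking all chunks' `input` keys into one index matrix up front pays numpy's per-call overhead once per epoch instead of once per chunk. Note it only vectorizes the forward sums; the weight updates in `trainPerceptron` are sequential and depend on each other, so a fully equivalent vectorization is harder than this:

```python
import numpy as np

rng = np.random.default_rng(0)
# all chunks' 'input' keys stacked into one (n_chunks, keys_per_chunk) matrix
idx = rng.integers(0, 5146, size=(11514, 10))
weights = np.zeros(5147)

# fancy indexing gathers every chunk's weights at once; one numpy call
# replaces 11514 Python-level inner loops
sums = weights[idx].sum(axis=1)
print(sums.shape)  # (11514,)
```

Whether this amortization beats the pure-Python loop for inputs of size 10 is exactly what I was unsure about.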