Re: C is it always faster than nump?

Avi Gross via Python-list Fri, 25 Feb 2022 20:40:31 -0800

Yes, Chris, C is real, as a somewhat abstract concept. There is a whole slew of 
variations, one for each time it is released anew with changes, and at various 
times people have built actual compilers that each implement some subset of 
what is possible, and not necessarily in quite the same way.

As you gathered, I am saying that comparing languages is not as effective as 
comparing implementations or, even better, specific programs on specific data. 
And yet, you can still get odd results if you cherry-pick what to test. 
Consider a sorting algorithm that rapidly checks if the data is already sorted, 
and if so, does not bother sorting it. It will quite possibly be the fastest 
one in a comparison if the data is chosen to be already in order! But on many 
other sets of data it will have wasted some time checking if it is in order 
while other algorithms have started sorting it!
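
To make that concrete, here is a minimal sketch in Python. The function names 
are hypothetical, and the fallback is simply the built-in sorted(). It only 
shows that the pre-check saves the whole sort on ordered input and costs an 
extra pass on everything else.

```python
def is_sorted(items):
    """Return True if the list is already in non-decreasing order (one pass)."""
    return all(a <= b for a, b in zip(items, items[1:]))


def sort_with_precheck(items):
    """Hypothetical sort that skips the real work when the input is ordered."""
    if is_sorted(items):
        return list(items)   # nothing to do beyond making a copy
    return sorted(items)     # the checking pass above was wasted effort
```

Benchmark this only on sorted lists and it looks unbeatable; benchmark it on 
shuffled lists and it is always a little behind plain sorted().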

Bad example, maybe, but there are better ones. Consider an algorithm that does 
no checking for one of many errors that can happen. It does not see if the 
arguments it gets are within expected ranges of types or values. It does not 
intercept attempts to divide by zero and much more. Another algorithm is quite 
bulletproof and thus has lots more code and maybe runs much slower. Is it 
shocking if it tests slower? But the other code may end up failing faster in 
the field and need a rewrite.
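
A small sketch of the two styles, using a mean() function as a stand-in. Both 
function names are made up for illustration, and the checks shown are just one 
possible choice of what "bulletproof" might mean.

```python
def mean_unchecked(values):
    """Fast path: assumes a non-empty sequence of numbers and checks nothing."""
    return sum(values) / len(values)


def mean_checked(values):
    """Slower path: validates the input before doing the same arithmetic."""
    if len(values) == 0:
        raise ValueError("cannot take the mean of an empty sequence")
    for v in values:
        # bool is a subclass of int, so reject it explicitly
        if isinstance(v, bool) or not isinstance(v, (int, float)):
            raise TypeError(f"expected a number, got {type(v).__name__}")
    return sum(values) / len(values)
```

On good data both return the same answer and the unchecked one wins any 
timing test. On an empty list the unchecked one dies with a ZeroDivisionError 
from deep inside, while the checked one says what was wrong.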

A really fair comparison is often really hard. Languages are abstract and 
sometimes a new implementation makes a huge change.

Take interpreted languages, including Python and R, that specify all kinds of 
functions that may at first be written within the language itself. Someone may 
implement a function like sum() (just an example) recursively: the sum of a 
long list of items is the first item added to the sum of the slightly shorter 
list of remaining items. It stops when the final recursive sum is about to be 
called with no remaining arguments. Clearly this implementation may be a tad slow. But 
does Python require this version of sum() or will it allow any version that can 
be called the same way and returns the same results every time? Does it even 
matter if the function is written in C or C++ or FORTRAN or even assembler of 
some kind, as long as it is placed in an accessible library and there is some 
interface that allows you to make the call in python notation and it is fed to 
the function in the way it requires, and similarly deals with returned values? 
A wrapper, sort of.
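
As a rough sketch of that naive version (recursive_sum is a name I made up; 
the built-in sum() it is compared against is implemented in C in CPython):

```python
def recursive_sum(items):
    """Sum defined as the first item plus the sum of the remaining items."""
    if not items:                # no remaining arguments: stop
        return 0
    return items[0] + recursive_sum(items[1:])


# Kept short on purpose: CPython's default recursion limit is about 1000.
data = list(range(500))
assert recursive_sum(data) == sum(data)
```

Both are called the same way and return the same result here, which is all 
the caller can see. Timing the two with the timeit module should show the gap, 
though the exact numbers will depend on the machine and Python version.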

The use of such a shortcut is not against the spirit of the language. You can 
still specify you want the sum() function from some module, or write your own. 
This is true most places. I remember way back when early UNIX shells did 
silly things like call /bin/echo to do trivial things, or call an external 
program to do something as trivial as i=i+1. Then they started building in 
such functionality and your shell scripts suddenly sped up a lot. A 
non-programmer I once worked for wrote some truly humongous shell scripts that 
brought the machines they ran on to their knees. Those machines were remote, 
in places like Japan, and the scripts ran during their day-time. Collecting 
billing data from all over by running a pipeline with 9 processes per line/row 
was a bit much.

At first I sped it up quite a bit by using newer built-in features like the 
ones I described, or doing more with fewer elements in pipelines. But I saw how 
much of the slowness was caused by using the wrong tools for the job, when 
there were programs designed to analyze data in various ways.

I replaced almost all of it with an AWK script that sped things up by many 
orders of magnitude. And, yes, AWK was not as fast as C, but it was more 
trivial to program in for this need, as it had so many needed aspects built in 
or happening automagically.

Would we do the entire project differently today? Definitely. All the billing 
records would not be sitting in an assortment of flat files all over the place 
but rather be fed into some database that made retrieval of all kinds of 
reports straightforward without needing to write much code at all.

How many modules or "packages" were once written largely using the language and 
then gradually "improved" by replacing parts, especially slower parts, with 
external content as we have been discussing? In a sense, some Python 
applications written for older versions of Python may now be running faster, 
as newer versions have improved some of the "same" code, while the user sees 
it running on the same language, Python.

-----Original Message-----
From: Chris Angelico <ros...@gmail.com>
To: python-list@python.org <python-list@python.org>
Sent: Fri, Feb 25, 2022 2:58 pm
Subject: Re: C is it always faster than nump?

On Sat, 26 Feb 2022 at 06:44, Avi Gross via Python-list
<python-list@python.org> wrote:
>
> I agree with Richard.
>
> Some people may be confused and think c is the speed of light and 
> relativistically speaking, nothing can be faster. (OK, just joking. The uses 
> of the same letter of the alphabet are not at all related. One is named for 
> the language that came after the one named B, while the other may be short 
> for celeritas meaning speed.)
>
> There is no such thing as C. C does nothing. It is a combination of a 
> language specification and some pieces of software called compilers that 
> implement it well or less well.
>

Uhh, that's taking it a little bit TOO far.... I agree with your
point, but saying that there's no such thing as C is slightly unfair
:)

> There is such a thing as a PROGRAM. A program completely written in C is a 
> thing. It can run fast or slow based on a combination of how it was written 
> and on what data it operates on, which hardware and OS and so on. AND some of 
> it may likely be running code from libraries written in other languages like 
> FORTRAN that get linked into it in some way at compile time or runtime, and 
> hooks into the local OS and so on.
>
> So your program written supposedly in pure C, may run faster or slower. If 
> you program a "sort" algorithm in C, it may matter if it is an implementation 
> of a merge sort or a bubble sort or ...
>

More specifically: You're benchmarking a particular *implementation*
of a particular *algorithm*. Depending on what you're trying to
demonstrate, either could be significant.

Performance testing between two things written in C is a huge job.
Performance testing across languages has a strong tendency to be
meaningless (like benchmarking Python's integers against JavaScript's
numbers).

> As noted, numpy is largely written in C. It may well be optimized in some 
> places but there are constraints that may well make it hard to optimize 
> compared to some other implementation without those constraints. In 
> particular, it interfaces with standard Python data structures at times such 
> as when initializing from a Python List, or List of Lists, or needing to hold 
> on to various attributes so it can be converted back, or things I am not even 
> aware of.
>

(Fortran)

In theory, summing a Numpy array should be incredibly fast, but in
practice, there's a lot of variation, and it can be quite surprising.
For instance, integers are faster than floats, everyone knows that.
And it's definitely faster to sum smaller integers than larger ones.

rosuav@sikorsky:~$ python3 -m timeit -s 'import numpy; x = numpy.array(range(1000000), dtype=numpy.float64)' 'numpy.sum(x)'
1000 loops, best of 5: 325 usec per loop
rosuav@sikorsky:~$ python3 -m timeit -s 'import numpy; x = numpy.array(range(1000000), dtype=numpy.int64)' 'numpy.sum(x)'
500 loops, best of 5: 551 usec per loop
rosuav@sikorsky:~$ python3 -m timeit -s 'import numpy; x = numpy.array(range(1000000), dtype=numpy.int32)' 'numpy.sum(x)'
500 loops, best of 5: 680 usec per loop

... Or not.

Summing arrays isn't necessarily the best test of numpy anyway, but as
you can see, testing is an incredibly difficult thing to get right.
The easiest thing to prove is that you have no idea how to prove
anything usefully, and most of us achieve that every time :)

ChrisA

> So, I suspect it may well be possible to make a pure C library similar to 
> numpy in many ways but that can only be used within a C program that only 
> uses native C data structures. It also is possible to write such a program 
> that is horribly slow. And it is possible to write a less complex version of 
> numpy that does not support some current numpy functionality and overall runs 
> much faster on what it does support.
>
> I do wonder at the reason numpy and pandas and lots of other modules have to 
> exist. Other languages like R made design choices that built in ideas of 
> vectorization from the start. Python has lots of object-oriented 
> extensibility that can allow you to create interpreted code that may easily 
> extend it in areas to have some similar features. You can create an 
> array-like data structure that holds only one object type and is extended so 
> adding two together (or multiplying) ends up doing it componentwise. But 
> attempts to do some such things often run into problems as they tend to be 
> slow. So numpy was not written in python, mostly, albeit it could have been 
> even more impressive if it took advantage of more pythonic abilities, at a 
> cost.
>
> But now that numpy is in C, pretty much, it is somewhat locked in when and if 
> other things in Python change.
>
> The reality is that many paradigms carried too far end up falling short.
>
>
> -----Original Message-----
> From: Richard Damon <rich...@damon-family.org>
> To: python-list@python.org
> Sent: Fri, Feb 25, 2022 1:48 pm
> Subject: Re: C is it always faster than nump?
>
>
> On 2/25/22 4:12 AM, BELAHCENE Abdelkader wrote:
> > Hi,
> > a lot of people think that C (or C++) is faster than python, yes I agree,
> > but I think that's not the case with numpy, I believe numpy is faster than
> > C, at least in some cases.
> >
> My understanding is that numpy is written in C, so for it to be faster
> than C, you are saying that C is faster than C.
>
> The key point is that numpy was written by skilled programmers who
> carefully optimized their code to be as fast as possible for the major
> cases. Thus it is quite possible for the numpy code to be faster in C
> than code written by a person without that level of care and effort.
>
> There are similar packages available for many languages, including C/C++
> to let mere mortals get efficient numerical processing.
>
> --
> Richard Damon
>
>
> --
> https://mail.python.org/mailman/listinfo/python-list
>
