[Trisquel-users] Re : Lightweight Browser

2018-02-03 Thread lcerf

Thank you for the discussion.  :-)


[Trisquel-users] Re : Lightweight Browser

2018-02-03 Thread lcerf
Here is a shell script, "fact.sh", that reads integers from standard input  
and factorizes them:

#!/bin/sh
while read nb
do
    factor "$nb"
done

Calling it with twice the same integer and measuring the overall run time:
$ printf '531691198313966348700354693990401\n531691198313966348700354693990401\n' | /usr/bin/time -f %U ./fact.sh
531691198313966348700354693990401: 2305843009213693951 2305843009213693951
531691198313966348700354693990401: 2305843009213693951 2305843009213693951

185.00
The result of the first call of 'factor' was not cached.  As a consequence,  
the exact same computation is done twice.


The kernel cannot know that, given a same number, 'factor' will return the  
same factorization.  It cannot know that, for some reason, 'factor' will  
likely be called on an integer that it recently factorized.  The programmer  
knows.  She implements a cache of the last 1000 calls to 'factor' (the worst  
cache ever, I know):

#!/bin/sh
TMP=`mktemp -t fact.sh.XXXXXX`
trap 'rm -f "$TMP" "$TMP.1"' 0
while read nb
do
    if ! grep -m 1 "^$nb:" "$TMP"
    then
        factor "$nb"
    fi | tee "$TMP.1"
    head -n 999 "$TMP" >> "$TMP.1"
    mv "$TMP.1" "$TMP"
done
And our execution on twice the same integer is about twice as fast:
$ printf '531691198313966348700354693990401\n531691198313966348700354693990401\n' | /usr/bin/time -f %U ./fact.sh
531691198313966348700354693990401: 2305843009213693951 2305843009213693951
531691198313966348700354693990401: 2305843009213693951 2305843009213693951

92.71

It does not look like there is what you call "a higher (strategic) reason"  
here.  Just a costly function that is frequently called with the same  
arguments and whose result only depends on these arguments.  A quite common  
situation.


[Trisquel-users] Re : Lightweight Browser

2018-02-03 Thread lcerf
You see, we are back to the subtleties between grand design and tactical  
design choices. It really depends on for which purpose you allocate RAM.


I agree.  There is no reason to re-implement what the kernel does (probably  
better).


If it is for *direct* data caching, then it is both "selfish" (as I've  
explained in previous post) and inefficient too, as the kernel makes more  
efficient use of free RAM. But, on the other hand, if it is related to  
*indirect* caching, that employs a rather complicated algorithm beyond that  
of kernel's idea of data caching, then it may be worthwhile.


The algorithm does not have to be complicated.  It is just that the kernel  
cannot take initiatives that require application-level knowledge (such as the  
fact that a function will often be called with the same arguments).  The  
programmer has to do the work in that case.


[Trisquel-users] Re : Lightweight Browser

2018-02-03 Thread lcerf
Etymologies are not definitions.  The source you cite says so on its front  
page:


Etymologies are not definitions; they're explanations of what our words meant
https://www.etymonline.com

In the case of "scarce", the page you show says "c. 1300".  So, that is what  
"scarce" meant circa 1300.  Today, it has a different meaning.


[Trisquel-users] Re : Lightweight Browser

2018-02-03 Thread lcerf
And also because DRAM is accessed page-wise. Changing page is much more  
expensive than accessing data on the same (already selected) page.


Yes, there is that too.  And accessing recent pages is fast thanks to yet  
another cache, the translation lookaside buffer:  
https://en.wikipedia.org/wiki/Translation_lookaside_buffer


This, along with onpon4's similar views, is overlooking a basic fact: That  
the kernel is already using free memory for data caching. A user space  
program attempting to do its own data caching is a grave error (a bug, in  
essence) because it tries to take the kernel's job on itself, rather  
selfishly.


The kernel cannot know a costly function will be frequently called with the  
same arguments and will always return the same value given the same arguments  
(i.e., does not depend on anything but its arguments).  A cache at the  
application-level is not reimplementing the caches at system-level.


And how would you really check that from within a user space program and take  
necessary steps?


I am not suggesting that the program should do that.  I am only saying that  
there is no benefit in choosing "lightweight applications" and always having  
much free RAM.  That it is a waste of RAM.  If you always have much free RAM,  
you had better choose applications that require more memory to be faster.


The obvious thing to do is that, you must allocate no more RAM than you  
really need, and leave the rest (deciding what to do with free RAM) to the  
kernel.


An implementation strategy that minimizes the space requirements ("no more  
RAM than you really need") will usually be slower than alternatives that  
require more space.  As with the one-million-line examples I gave to heyjoe.   
Or with the examples on  
https://en.wikipedia.org/wiki/Space%E2%80%93time_tradeoff


[Trisquel-users] Re : Lightweight Browser

2018-02-02 Thread lcerf

Scarce means restricted in quantity.

No, it does not.  It means "insufficient to satisfy the need or demand":  
http://www.dictionary.com/browse/scarce


When your system does not swap, the amount of RAM is sufficient to satisfy  
the need or demand.  There is no scarcity of RAM.


One more time (as with the word "freedom"), you are rewriting the dictionary  
so as not to admit you are wrong.  You are ready to play dumb too:


Reading/writing 1GB of RAM is slower than reading/writing 1KB of RAM.

Thank you Captain Obvious.

You are also grossly distorting what we write (so that your wrong statements  
about the sequentiality of RAM or its fragmentation eating up CPU cycles or  
... do not look too stupid in comparison?):


Which implies that one should fill up the whole available RAM just to  
print(7) and that won't affect performance + will add a benefit, which is  
nonsense.


And then you accuse us of derailing the discussion:

The space-time trade-off has absolutely nothing to do with where all this  
started. (...) Then the whole discussion went into some unsolicited mini  
lecturing


By the way, sorry for using arguments, giving examples, etc.

It is easy to verify who derailed the "whole discussion" because he does not  
want to admit he is wrong: just go up the hierarchy of posts.  It starts with  
you writing:


It is possible to optimize performance through about:config settings (turn  
off disk cache, tune mem cache size and others).

https://trisquel.info/forum/lightweight-browser#comment-127383

Me replying:

Caches are improving performances when you revisit a site.
https://trisquel.info/forum/lightweight-browser#comment-127396

And onpon4 adding the amount of RAM, which is not scarce nowadays, as the  
only limitation to my statement, which can be generalized to many other  
programming techniques that reduce time requirements by using more space:


Exactly. I don't think a lot of people understand that increased RAM and hard  
disk consumption is often done intentionally to improve performance. The only  
way reducing RAM consumption will ever help performance is if you're using so  
much RAM that it's going into swap, and very few people have so little RAM  
that that's going to happen.

https://trisquel.info/forum/lightweight-browser#comment-127400

Then you start saying we are wrong.  Onpon4 and I are still talking about why  
programs eating most of your RAM (but not more) are usually faster than the  
so-called "lightweight" alternatives and how, in particular, caching improves  
performance.  In contrast, and although you stayed on-topic at the beginning  
(e.g., claiming that "caching in RAM is not a performance benefit per se"),  
you now pretend that "the space-time trade-off has absolutely nothing to do  
with where all this started"  and that "Reading/writing 1GB of RAM is slower  
than reading/writing 1KB of RAM" is a relevant argument to close the "whole  
discussion".


Also, earlier, you were trying to question onpon4's skills, starting a  
sentence with "I don't know what your programming experience is but".  Kind  
of funny from somebody who believes the Web could be broadcast to every user.   
FYI, both onpon4 and I are programmers.


[Trisquel-users] Re : Lightweight Browser

2018-02-02 Thread lcerf

Resources are always scarce (limited) and should be used responsibly.

They are always limited.  They are not always scarce.  In the case of memory,  
as long as you do not reach the (limited) amount of RAM you have, there is  
no penalty.  And by using more memory, a program can be made faster.


I pointed you to https://en.wikipedia.org/wiki/Space%E2%80%93time_tradeoff  
for real-world examples.  But I can explain it on a basic example too.


Let us say you have a file with one million lines and a program repeatedly  
needs to read specific lines, identified by their line numbers.  The program  
can access the disk every time it needs a line.  That strategy uses as little  
memory as possible.  It is constant, i.e., it does not depend on the size of  
the file.  But the program is slow.  Storing the whole file in RAM makes the  
program several orders of magnitude faster... unless there is not enough free  
space in RAM to store the file.  Let us take that second case and imagine  
that it often happens that a same line must be reread, whereas most of the  
lines are never read.  The program can implement a cache.  It will keep in  
RAM the last lines that were accessed so that rereading a line that was  
recently read is fast (no disk access).  The cache uses some memory to speed  
up the program.  As long as the size of the cache does not exceed the amount of  
available RAM, the larger the cache, the faster the program.
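Such a cache can be sketched in the same style as "fact.sh" above (a sketch, not a benchmarked implementation; the file name "big.txt" and the function name get_line are made up, and GNU grep and sed are assumed):

```shell
#!/bin/sh
# Sketch of the line cache described above.  Assumptions: the file is
# "big.txt"; GNU grep (-m) and sed are available; the names are made up.
# get_line prints line $1 of big.txt, keeping the 1000 most recently
# fetched lines in a temporary cache file, as fact.sh does for
# factorizations.
FILE=big.txt
TMP=`mktemp -t linecache.XXXXXX`
trap 'rm -f "$TMP" "$TMP.1"' 0
get_line () {
    if ! grep -m 1 "^$1:" "$TMP"          # cache hit: no read of $FILE
    then
        printf '%s:%s\n' "$1" "$(sed -n "${1}p" "$FILE")"
    fi | tee "$TMP.1"                     # new cache: fetched line on top...
    head -n 999 "$TMP" >> "$TMP.1"        # ...then the 999 most recent others
    mv "$TMP.1" "$TMP"
}
```

Rereading a recently read line then costs a search over at most 1000 cached pairs instead of a pass over the million-line file.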


Going on with our one-million-line file to give another example of a  
trade-off between time and space: let us imagine 95% of the lines are  
actually the same, a default "line".  If there is enough free memory, the  
fastest implementation strategy remains to have an array of size one million  
so that the program can access any line in constant-time.  If there is not  
enough free memory, the program can store the sole pairs (line number, line)  
where "line" is not the default one, i.e., 5% of the lines.  After ordering  
those pairs by line number, a binary search can return any line in a  
time that grows logarithmically with the number of non-default lines (if the  
line number was not stored, the default line, stored once, is returned).   
That strategy is slower (logarithmic-time vs. constant-time) than the one  
using an array of size one million, if there is enough free space to store  
such an array.
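A sketch of that second strategy, in the same shell style (the pair file "pairs.txt", the function name get_line and the default line "default" are made up for the illustration):

```shell
#!/bin/sh
# Sketch of the sparse strategy described above.  Assumptions: the
# non-default pairs are stored in "pairs.txt" as "number:line", sorted
# by number; the default line is the string "default"; all names are
# made up for the illustration.
get_line () {
    lo=1
    hi=`wc -l < pairs.txt`
    while [ "$lo" -le "$hi" ]
    do
        mid=$(( (lo + hi) / 2 ))
        entry=`sed -n "${mid}p" pairs.txt`   # probe the middle pair
        num=${entry%%:*}
        if [ "$num" -eq "$1" ]
        then
            echo "$entry"                    # non-default line found
            return
        elif [ "$num" -lt "$1" ]
        then
            lo=$(( mid + 1 ))
        else
            hi=$(( mid - 1 ))
        fi
    done
    printf '%s:default\n' "$1"               # line number not stored
}
```

The number of probes grows logarithmically with the number of non-default pairs.  (Each sed probe is itself linear in the pair file here; a real implementation would keep the pairs in an in-memory array so that a probe takes constant time.)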


In the end, the fastest implementation is the one that uses the most space  
while remaining below the amount of available RAM.  It is that strategy that  
you want.  Not the one that uses as little memory as possible.


You need free RAM for handling new processes and peak loads.

That is correct.  And it is an important point for server systems.  On  
desktop systems, processes do not pop up from nowhere and you usually do not  
want to do many things at the same time.


Python is an interpreted language and you don't know how the interpreter  
handles the data internally. A valid test would be near the hardware level,  
perhaps in assembler.


You certainly run far more Python than assembler.  So, for a real-life  
comparison, Python makes more sense than assembler.  The same holds for  
programs that take into account the specifics of your hardware:  
you run far more generic code (unless we are talking about supercomputing).


I was talking about the algorithmic memory fragmentation which results in  
extra CPU cycles.


No, it does not, because it is *random-access* memory.  As I was  
writing in my other post, RAM fragmentation is only a problem when you run  
short of RAM: there are free blocks but, because they are not contiguous, the  
kernel cannot allocate them to store a large "object" (for example a large  
array).  It therefore has to swap.


Running a browser like Firefox on 512MB resulted in swapping. If you assume  
that software should be bloated and incompatible with older hardware, you  
will never create a lightweight program.


There can be space-efficient (and probably time-inefficient, given the  
trade-offs as in my examples above) programs for systems with little RAM,  
where the faster (but more space-consuming) program would lead to swapping.   
For systems with enough RAM, the space-consuming programs are faster and that  
is what users want: a fast program.  It makes sense that programmers consider  
what most users have, several GB of RAM nowadays, when it comes to designing  
their programs to be as fast as possible.


Your tests show that 'dd' is faster with a cache.  It is what onpon4 and I  
keep on telling you: by storing more data you can get a faster program.


[Trisquel-users] Re : Lightweight Browser

2018-02-02 Thread lcerf
I don't know what your programming experience is but your expectations of  
efficiency are contrary to the basic programming principle: that a program  
should use only as much memory as it actually needs for completing the task  
and that memory usage should be optimized.


You can often get a faster program by using more memory.  See  
https://en.wikipedia.org/wiki/Space%E2%80%93time_tradeoff for examples.  As  
long as the system does not swap, it is the way to go.


Occupying as much RAM as possible just because there is free RAM is  
meaningless.


Storing in memory data that will never be needed again is, of course, stupid.  
We are not talking about that.


RAM access is sequential.

You know that RAM means "Random-access memory", don't you?  The access is not  
sequential.  Manipulating data that is sequentially stored in RAM is faster  
because of CPU cache and sequential prefetching:  
https://en.wikipedia.org/wiki/Cache_prefetching


The idea of CPU cache is, well, that of caching: keeping a copy of recent  
data/software closer to the CPU because it may have to be accessed again  
soon.  The same idea, at another level, that a program can implement to be  
faster: keeping data in main memory instead of recomputing it.


Are you also arguing that having free space in the CPU cache has benefits?

On a system with more memory (e.g. 16GB) you can keep more data cached in RAM  
but that doesn't mean that programs should simply occupy lots or all because  
there is plenty of it and/or because RAM is faster than HDD.


It kind of means that.  You want fast programs, don't you?  If that can be  
achieved by taking more memory, the program will indeed be faster, unless the  
memory requirements become so huge that the system swaps.  So, I ask you  
again: "How often does your system run out of RAM?".  If the answer is  
"rarely", then choosing programs with higher memory requirements may be a  
good idea: they can be faster than lightweight alternatives.


It is more time consuming to manage scattered memory blocks and thousands of  
pointers than reading a whole block at once.


It is not because of fragmentation (which only becomes a problem when the  
system swaps: free RAM cannot be allocated because too little is available in  
contiguous blocks).  It is because of sequential prefetching in the CPU  
cache.  Not an argument against caching.  Quite the opposite.


[Trisquel-users] Re : Lightweight Browser

2018-02-01 Thread lcerf
As in, a browser with large memory footprint and heavy CPU usage will usually  
also have large package download size and more complex user interface. There  
is a strong correlation between them.


That may be, in practice, the "rule" for Web browsers (I am not sure).  That  
is not true in general.


While you might be able to dig up an exception, it would still be an  
exception to the rule, which I was talking about in the first place.


The program I work on (pattern mining, nothing to do with Web browsers) is a  
650 kB binary which can easily use gigabytes of RAM (it depends on the input  
data) and, with such a memory consumption, it can take 100% CPU for seconds  
or for hours (it depends on the parameters it is given).  One of  
the parameters actually controls a trade-off between space and time (a  
threshold to decide whether the data should be stored in a dense way or in a  
sparse way).


For instance, one may reduce memory consumption by 30% at the cost of 30%  
higher CPU usage (just for the sake of example, no pedantry please) but a bad  
design can boost both CPU and RAM usage 200%.


I am not sure what you call design.  Design includes choosing a solution with  
a good trade-off between CPU usage and memory usage.


The gain/loss is usually not fixed.  It depends on the size of the data at  
input.  The choice is often between two algorithms with different time and  
space complexities (in big O notation), i.e., the percentage is  
asymptotically infinite.  You may say theory does not matter... but it does.   
There are popular computing techniques (e.g., dynamic programming) that  
precisely aim to get a smaller time complexity against a higher space  
complexity.
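As a toy shell illustration of such a technique (memoization, the simplest form of dynamic programming; the function name fib and its cache file are made up for the example): a naive recursive Fibonacci recomputes the same subproblems exponentially many times, whereas storing every computed value means each subproblem is solved only once.

```shell
#!/bin/sh
# Toy illustration: memoization, the simplest form of dynamic
# programming.  fib stores every computed value in a cache file, so
# each subproblem is solved once (a linear number of computations)
# instead of exponentially many times.  Names are made up.
MEMO=`mktemp -t fib.XXXXXX`
trap 'rm -f "$MEMO"' 0
fib () {
    cached=`grep -m 1 "^$1:" "$MEMO"`
    if [ -n "$cached" ]
    then
        echo "${cached#*:}"          # cached: no recomputation
    elif [ "$1" -le 1 ]
    then
        echo "$1"
    else
        f1=`fib $(($1 - 1))`
        f2=`fib $(($1 - 2))`
        echo "$1:$((f1 + f2))" >> "$MEMO"
        echo $((f1 + f2))
    fi
}
```

The cache file plays the role of the dynamic-programming table: more space, asymptotically less time.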


https://en.wikipedia.org/wiki/Space%E2%80%93time_tradeoff gives many other  
examples.  Including one that deals with Web browsers, about rendering SVG  
every time the page changes or creating a raster version of the SVG.  It is  
far from 30% here: SVG is orders of magnitude smaller in size but it takes  
orders of magnitude more time to render it.


Likewise a full featured software can use 400% more resources than a  
barebones one.


A feature that you do not use should not take significantly more resources.


[Trisquel-users] Re : Lightweight Browser

2018-02-01 Thread lcerf
In essence #2 and #4 are redundant - they are by-products of #1 and #3 to  
large extent.


No, they are not.  #1 is about RAM consumption, #2 about disk consumption,  
#3 about (CPU) time consumption and #4 about the human-computer interface.


So, in practice, there is really only one definition of "lightweight" which  
entails *both* CPU usage and memory footprint. They usually both go up and  
down (also directly affecting #2 and #4) depending on design perfection and  
functionality span.


It is not true.  There is often a choice to be made between storing data or  
repeatedly recomputing it, i.e., a trade-off between (CPU) time and  
(memory) space.


[Trisquel-users] Re : Lightweight Browser

2018-02-01 Thread lcerf
If you constantly allocate and deallocate huge amounts of memory this is an  
overhead. So caching in RAM is not a performance benefit per se.


Yes, it is.  It is about *not* deallocating recent data that may have to be  
computed/accessed again, unless there is a shortage of free memory.


Starting a new program requires free memory. If all (or most) memory is  
already full, this will cause swapping. You need to have enough free memory.


Onpon4 did not say otherwise.  She also rightly said that "there is zero  
benefit to having RAM free that you're not using".  How often does your  
system run out of RAM?


[Trisquel-users] Re : Lightweight Browser

2018-02-01 Thread lcerf

Caches are improving performances when you revisit a site.