Re: [ccache] [PATCH] Use bash for test.sh.

2012-11-12 Thread Andrew Stubbs

On 11/11/12 11:31, Eitan Adler wrote:

> Mike,
>
> http://www.technollama.co.uk/wordpress/wp-content/uploads/2011/05/obvious-troll.jpg

Insulting respected members of the Linux community will get you nowhere. 
I realise that some might call you the same, and BSD also, so you should 
know better.

> Andrew,
>
> I don't think we ever solved the problem you saw? What error did you
> get with what shell?


It does seem that just using bash is not sufficient to solve this 
problem. Joel has declared that it must work on Solaris /bin/sh, so work 
there it must. Not that I have any means to test that.


Anyway, here's the transcript showing the problem:

/tmp/ccache$ ./test.sh
starting testsuite base
starting testsuite link
gcc: error trying to exec 'cc1': execvp: No such file or directory
gcc: error trying to exec 'cc1': execvp: No such file or directory
SUITE: link, TEST: CCACHE_CPP2 - Files differ: reference_test1.o != test1.o

cache directory                     /tmp/ccache/testdir.18802/.ccache
primary config                      /tmp/ccache/testdir.18802/ccache.conf
secondary config      (readonly)
cache hit (direct)                     0
cache hit (preprocessed)               4
cache miss                             3
called for link                        2
called for preprocessing               1
multiple source files                  1
compiler produced stdout               1
couldn't find the compiler             1
bad compiler arguments                 1
unsupported source language            1
unsupported compiler option            1
output to a non-regular file           1
no input file                          1
files in cache                         3
cache size                          12.3 kB
max cache size                       5.0 GB
TEST FAILED
Test data and log file have been left in testdir.18802

I've deliberately moved the sources to /tmp to side-step any possible 
problems caused by calling test.sh from another directory, but it made 
no visible difference.


And now again with bash:

/tmp/ccache$ bash ./test.sh
starting testsuite base
starting testsuite link
starting testsuite hardlink
starting testsuite cpp2
starting testsuite nlevels4
starting testsuite nlevels1
starting testsuite basedir
starting testsuite direct
starting testsuite compression
starting testsuite readonly
starting testsuite extrafiles
starting testsuite cleanup
starting testsuite pch
starting testsuite upgrade
starting testsuite prefix
test done - OK

checkbashisms does indeed return nothing.

Running sh -x test.sh shows that the gcc command producing the error:

+ CCACHE_DISABLE=1 gcc -c test1.c -o reference_test1.o -O -O
gcc: error trying to exec 'cc1': execvp: No such file or directory

I don't understand what's wrong with that command. gcc isn't supposed to 
rely on the PATH to find cc1, but presumably it's something environmental.


I'm pretty sure I did not see this problem in the default shell provided 
by Ubuntu 12.04, so it's either a dash bug, or some kind of subtle 
incompatibility.


Andrew
___
ccache mailing list
ccache@lists.samba.org
https://lists.samba.org/mailman/listinfo/ccache


[ccache] BSDiff for cache objects

2012-11-12 Thread Bogdan Harjoc
I just did a quick search, and couldn't find discussions on the idea of
caching compiled objects as binary diffs from other existing objects.

Basically, before writing a new object file, ccache could find a similar
object in the cache (based on object-code or source-code hashes for
example) and store the delta (using bsdiff, xdelta, ...) instead of the
complete file. For most minor source code changes, the savings should be
worth the extra effort.

Alternatively, a compact operation could be run periodically, that
compresses the cache using the same approach.

My question is whether ccache's real-world use would benefit from a feature
like this. I can put together a test that looks through people's .ccache
and reports how many good bsdiff candidates there are, and what the
savings would be.

Any opinions ?


Re: [ccache] BSDiff for cache objects

2012-11-12 Thread Jürgen Buchmüller
On Monday, 12.11.2012, at 13:49 +0200, Bogdan Harjoc wrote:
> Basically, before writing a new object file, ccache could find a similar
> object in the cache (based on object-code or source-code hashes for
> example)

The main goal of most hashes is to give very distinct results for
even small changes in the input data, which is why there is not really
an algorithm to compare two files' similarity based on hashes.

Similarity of two files would have to be calculated based on something
that currently isn't available - AFAICT. The savings in size are
probably less important than the expected performance loss for
building deltas of source and/or object files.

Juergen




Re: [ccache] BSDiff for cache objects

2012-11-12 Thread Bogdan Harjoc
On Mon, Nov 12, 2012 at 2:30 PM, Jürgen Buchmüller <pullm...@t-online.de> wrote:

> On Monday, 12.11.2012, at 13:49 +0200, Bogdan Harjoc wrote:
> > Basically, before writing a new object file, ccache could find a similar
> > object in the cache (based on object-code or source-code hashes for
> > example)
>
> The main goal of most hashes is to give very distinct results for
> even small changes in the input data, which is why there is not really
> an algorithm to compare two files' similarity based on hashes.


I should have been more specific. I meant block-hashes, like rsync and
bsdiff do:
http://www.samba.org/~tridge/phd_thesis.pdf

> The savings in size are
> probably less important than the expected performance loss for
> building deltas of source and/or object files.


My concern as well. But an offline ccache-compact that runs every 24h or
so, possibly only creating the 100 hashes once for every new file, should
be pretty fast. And applying a bspatch requires a bunzip2 and going through
a list of INSERT/ADD instructions. It can probably be approximated to just
bunzip2. There is also xdelta which is faster.
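
To illustrate the block-hash idea, here is a minimal shell sketch. It is not rsync's or bsdiff's actual algorithm: it checksums fixed-size blocks at fixed offsets only (rsync uses a rolling checksum so that shifted blocks are found too) and counts how many block checksums two files share. The helper names block_sums and shared_blocks, the 4-byte block size and the sample data are invented for the example, and mktemp -d is assumed to be available.

```sh
#!/bin/sh
# block_sums FILE SIZE: print one checksum per SIZE-byte block of FILE, sorted.
block_sums() {
    bs_dir=$(mktemp -d) || return 1
    split -b "$2" "$1" "$bs_dir/blk."
    for bs_blk in "$bs_dir"/blk.*; do
        [ -f "$bs_blk" ] || continue      # empty input: split wrote nothing
        cksum < "$bs_blk"
    done | LC_ALL=C sort -u
    rm -rf "$bs_dir"
}

# shared_blocks A B SIZE: count block checksums that occur in both files.
shared_blocks() {
    sb_dir=$(mktemp -d) || return 1
    block_sums "$1" "$3" > "$sb_dir/a"
    block_sums "$2" "$3" > "$sb_dir/b"
    sb_n=$(LC_ALL=C comm -12 "$sb_dir/a" "$sb_dir/b" | wc -l)
    rm -rf "$sb_dir"
    echo $((sb_n))
}

# Two 16-byte inputs that differ only in their third 4-byte block.
demo=$(mktemp -d)
printf 'AAAABBBBCCCCDDDD' > "$demo/old"
printf 'AAAABBBBXXXXDDDD' > "$demo/new"
shared=$(shared_blocks "$demo/old" "$demo/new" 4)
echo "shared blocks: $shared of 4"
rm -rf "$demo"
```

A whole-file hash would only report that the two inputs differ; the per-block view shows that three of the four blocks are identical, which is the kind of signal a delta-candidate search needs.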


Re: [ccache] BSDiff for cache objects

2012-11-12 Thread Bogdan Harjoc
On Mon, Nov 12, 2012 at 3:39 PM, Andrew Stubbs <a...@codesourcery.com> wrote:

> On 12/11/12 11:49, Bogdan Harjoc wrote:
>
> > Alternatively, a compact operation could be run periodically, that
> > compresses the cache using the same approach.
>
> Is cache size/capacity a very big issue for you?


No but there is room for improvement. This could be optional, like a
CCACHE_COMPRESS that saves 99% instead of 40% when I routinely recompile 20
kernel branches, for example (v2.6.x, 3.0.x, 3.4.x, -git, -next, -ubuntu,
etc).

> Or how about a larger cache limit? If you don't have much disk space then
> combining that with CCACHE_HARDLINK might provide a useful saving?
> (Although compression must be disabled and your cache must be on the same
> file-system as your build.)
>
> My opinion is that ccache is a space-speed tradeoff that moves the savings
> balance toward speed-savings and away from space-savings. For the most
> part, users are fine with that, indeed want that, and modifications that
> move that balance back toward space-saving aren't interesting.


Adjusting cache limits to available space works, of course. This is the
kind of reply I was asking for -- what size/speed constraints do ccache
users typically face.

> This idea is nice, but seems like it will reduce performance on both
> cache-hit and cache-miss, like regular compression, but even more so,
> especially on cache-miss.


The offline approach won't affect cache-miss at all. In cache-hit cases, it
will add the cost of applying the patch.

> That said, if we can have our cake and eat it then bring it on. (It's a
> shame the hard-link feature isn't completely safe, or it would do just
> that.)


I'll run some tests on my .ccache dir and post results once I have them.

Cheers,
Bogdan


Re: [ccache] BSDiff for cache objects

2012-11-12 Thread Andrew Stubbs

On 12/11/12 14:08, Bogdan Harjoc wrote:

> No but there is room for improvement. This could be optional, like a
> CCACHE_COMPRESS that saves 99% instead of 40% when I routinely recompile 20
> kernel branches, for example (v2.6.x, 3.0.x, 3.4.x, -git, -next, -ubuntu,
> etc).


I realise that the more diverged the branches are, the fewer exact cache 
hits you will get, but there still should be a lot of overlap here. 
However, if you're building the branches in different directories and 
using absolute paths then you might be missing out on potential cache 
sharing (without any binary differences).


Are you familiar with CCACHE_BASEDIR?

Andrew
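
For readers unfamiliar with it: CCACHE_BASEDIR makes ccache rewrite absolute paths that begin with the given directory into paths relative to the current working directory before hashing, so identical sources built in different checkouts can share cache entries. A sketch of how it might be set, where the directory /home/user/src and the checkout names are purely hypothetical:

```sh
#!/bin/sh
# Hypothetical layout: every kernel branch is checked out under /home/user/src
# (for example /home/user/src/linux-3.0 and /home/user/src/linux-3.4).
CCACHE_BASEDIR=/home/user/src
export CCACHE_BASEDIR

# Compiles started inside either checkout now present relative paths to
# ccache's hash instead of two different absolute ones.
echo "CCACHE_BASEDIR=$CCACHE_BASEDIR"
```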



Re: [ccache] [PATCH] Use bash for test.sh.

2012-11-12 Thread Eitan Adler
On 12 November 2012 06:03, Andrew Stubbs <a...@codesourcery.com> wrote:
> Running sh -x test.sh shows that the gcc command producing the error:
>
> + CCACHE_DISABLE=1 gcc -c test1.c -o reference_test1.o -O -O
> gcc: error trying to exec 'cc1': execvp: No such file or directory
>
> I don't understand what's wrong with that command. gcc isn't supposed to
> rely on the PATH to find cc1, but presumably it's something environmental.

Can you get a ktrace (or strace) of what gcc is doing with and without CCACHE ?




-- 
Eitan Adler


Re: [ccache] [PATCH] Use bash for test.sh.

2012-11-12 Thread Eitan Adler
On 12 November 2012 13:00, Eitan Adler <li...@eitanadler.com> wrote:
> On 12 November 2012 06:03, Andrew Stubbs <a...@codesourcery.com> wrote:
> > Running sh -x test.sh shows that the gcc command producing the error:
> >
> > + CCACHE_DISABLE=1 gcc -c test1.c -o reference_test1.o -O -O
> > gcc: error trying to exec 'cc1': execvp: No such file or directory
> >
> > I don't understand what's wrong with that command. gcc isn't supposed to
> > rely on the PATH to find cc1, but presumably it's something environmental.
>
> Can you get a ktrace (or strace) of what gcc is doing with and without CCACHE ?

Also, does gcc exhibit problems on other test programs outside of test.sh ?


-- 
Eitan Adler


Re: [ccache] [PATCH] Use bash for test.sh.

2012-11-12 Thread Mike Frysinger
On Sunday 11 November 2012 06:31:14 Eitan Adler wrote:
> On 11 November 2012 00:46, Mike Frysinger <vap...@gentoo.org> wrote:
> > On Saturday 10 November 2012 00:41:52 Eitan Adler wrote:
> > > On 10 November 2012 00:41, Mike Frysinger wrote:
> > > > if the script is written in bash and is intended to be, then
> > > > /bin/bash is the correct answer.
> > >
> > > Absolutely false. /usr/local/bin or /opt/bin might be the correct
> > > location.
> >
> > if you have a crap system where bash isn't installed with /bin/bash, then
> > you already have a ton of problems with existing software.  forcing
> > stupid behavior on everyone to cater to broken systems is wrong.
>
> http://www.technollama.co.uk/wordpress/wp-content/uploads/2011/05/obvious-troll.jpg

yes, when people tell you forcing asinine behavior is wrong, you label them 
trolls.  i guess that's how you win arguments.
-mike




Re: [ccache] [PATCH] Use bash for test.sh.

2012-11-12 Thread Mike Frysinger
On Saturday 10 November 2012 05:08:40 Joel Rosdahl wrote:
> On 10 November 2012 06:45, Mike Frysinger <vap...@gentoo.org> wrote:
> > i see old style portable code in there that could easily be modernized to
> > recent POSIX
>
> Please don't strive to do that. Solaris's /bin/sh isn't POSIX.

autoconf searches well known paths to locate an up-to-date shell.  my limited 
understanding is that Solaris stores modern tools somewhere in /usr/.  would 
you be amenable to having the script re-exec itself via those so we can 
update things ?
-mike




Re: [ccache] [PATCH] Use bash for test.sh.

2012-11-12 Thread Mike Frysinger
On Monday 12 November 2012 06:03:37 Andrew Stubbs wrote:
> Running sh -x test.sh shows that the gcc command producing the error:
>
> + CCACHE_DISABLE=1 gcc -c test1.c -o reference_test1.o -O -O
> gcc: error trying to exec 'cc1': execvp: No such file or directory
>
> I don't understand what's wrong with that command. gcc isn't supposed to
> rely on the PATH to find cc1, but presumably it's something environmental.

it relies on argv[0] to locate its internal tools.  if you change that command 
to `env CCACHE_DISABLE=1 ...`, does it work better ?
-mike
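
For reference, a small sketch contrasting the two invocation forms discussed in this message. It does not reproduce the cc1 failure; it only shows that both forms hand the variable to the child process and that neither leaves it set in the calling shell. With the env form the shell runs env(1), which then execs the command. The probe command is a stand-in for gcc and is not taken from test.sh.

```sh
#!/bin/sh
unset CCACHE_DISABLE    # start from a clean environment for the demo

# Stand-in for gcc: a child process that reports what it was given.
probe='echo "${CCACHE_DISABLE:-unset}"'

# Form seen in the sh -x trace: the shell itself places the variable in
# the environment of this one command.
prefix_result=$(CCACHE_DISABLE=1 sh -c "$probe")

# Form suggested in the message: env(1) sets the variable, then execs
# the command.
env_result=$(env CCACHE_DISABLE=1 sh -c "$probe")

# Neither form leaves the variable set in the calling shell.
after=${CCACHE_DISABLE:-unset}

echo "prefix form: $prefix_result"
echo "env form:    $env_result"
echo "afterwards:  $after"
```

Swapping the probe for the real gcc command line from the trace would be the actual experiment being asked for.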




Re: [ccache] BSDiff for cache objects

2012-11-12 Thread Bogdan Harjoc
Initial results from a small .ccache (3.0) dir:

- 6476 objects
- 300MB
- probably about 500-1000 compiles/recompiles of around 100 small to large
projects

The test was:
1. Find the candidates for compression, based on: objdump -t | grep " g "
(defined symbols). If two objects had at least 4 symbols defined, and 85%
of them were identical, the files were selected for compression.
2. Run bsdiff on the selected pairs of files, and collect the total raw
size, and the resulting compressed size.
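
The selection rule in step 1 could be sketched roughly as follows. This is not the attached get-symbols.sh or find-similar code: it assumes each object's defined symbols have already been written to a text file, one name per line, and the helper name percent_common and the sample symbol names are invented for illustration. mktemp -d is assumed to be available.

```sh
#!/bin/sh
# percent_common A B: percentage of the symbols listed in file A that also
# appear in file B. Prints 0 when either file lists fewer than 4 symbols.
percent_common() {
    pc_dir=$(mktemp -d) || return 1
    LC_ALL=C sort -u "$1" > "$pc_dir/a"
    LC_ALL=C sort -u "$2" > "$pc_dir/b"
    pc_a=$(( $(wc -l < "$pc_dir/a") ))
    pc_b=$(( $(wc -l < "$pc_dir/b") ))
    pc_common=$(( $(LC_ALL=C comm -12 "$pc_dir/a" "$pc_dir/b" | wc -l) ))
    rm -rf "$pc_dir"
    if [ "$pc_a" -lt 4 ] || [ "$pc_b" -lt 4 ]; then
        echo 0
    else
        echo $(( 100 * pc_common / pc_a ))
    fi
}

# Two made-up symbol lists sharing 6 of 7 names.
work=$(mktemp -d)
printf '%s\n' main parse emit init cleanup usage version > "$work/obj1.syms"
printf '%s\n' main parse emit init cleanup usage debug   > "$work/obj2.syms"

pct=$(percent_common "$work/obj1.syms" "$work/obj2.syms")
if [ "$pct" -ge 85 ]; then
    verdict="candidate"
else
    verdict="skip"
fi
echo "$pct% of symbols shared: $verdict"
rm -rf "$work"
```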

The results are:
4459 out of 6476 files compressed (6099674 -> 629795 bytes)

So roughly 90% compression rate, for a random .ccache folder.
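
As a quick check of that figure, the saving follows directly from the two byte counts reported by the run. A small sketch using integer shell arithmetic (the helper name savings_percent is made up for the example):

```sh
#!/bin/sh
# savings_percent RAW DELTA: whole-percent space saved by storing DELTA bytes
# instead of RAW bytes (integer division, so the result is rounded down).
savings_percent() {
    echo $(( 100 * ($1 - $2) / $1 ))
}

saved=$(savings_percent 6099674 629795)
echo "saved: $saved%"
```

With the numbers from this run the result is 89, which matches the "roughly 90%" stated above.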

I attached sources for the test (first run ./get-symbols.sh, then
./find-similar).

Will post results for a more favorable scenario (multiple builds of
different versions of the same project) if time permits.

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <dirent.h>
#include <string.h>
#include <stdlib.h>
#include <limits.h>
#include <ctype.h>

const char *BSDIFF = "~/build/bsdiff-4.3/bsdiff";

// One defined symbol of an object file.
struct Func {
	Func *next;

	char *name;
};

// One cached object and its symbol list.
struct Obj {
	Obj *next;

	char *name;
	char *path;
	int nfuncs;
	Func *funcs;
	
	Obj *match;	// best delta candidate found so far
	int common;	// number of symbols shared with match
};

const int HASH_SIZE = 19031;
Obj *objs[HASH_SIZE] = {};
// Bucket index taken from the first bytes of the object name.
#define NAME_HASH(name) ((*(unsigned *)(name)) % HASH_SIZE)

// Read the symbol list stored at base/namesuff into a new Obj.
Obj *read_obj(const char *base, int a, int b, const char *namesuff)
{
	Obj *o = new Obj;
	o->next = 0;
	o->name = new char[64];
	o->path = new char[128];
	o->nfuncs = 0;
	o->funcs = 0;
	o->common = 0;
	o->match = 0;
	
	int nc = snprintf(o->name, 64, "%x%x%s", a, b, namesuff);
	if (nc <= 0 || nc >= 64) abort();

	snprintf(o->path, 128, "%s/%s", base, namesuff);
	FILE *f = fopen(o->path, "rb");
	if (!f)
		{ perror(o->path); abort(); }
	for (;;) {
		static char line[PATH_MAX];
		if (!fgets(line, PATH_MAX, f))
			break;

		// strip trailing whitespace, including the newline
		for (int l=strlen(line); l && isspace((unsigned char)line[l-1]); line[--l]=0);

		Func **h = &o->funcs;
		
		Func *fn = new Func;
		fn->name = strdup(line);
		
		// push at the head, so the list ends up in reverse input order
		fn->next = *h;
		*h = fn;

		o->nfuncs++;
	}

	fclose(f);

	return o;
}

// Count the symbols o1 and o2 share; remember o2 if it is o1's best match.
void compare(Obj *o1, Obj *o2)
{
	/*
	if (strcmp(o1->name, "978911edd002afb6ecb9611659327e3e-3475537") ||
		strcmp(o2->name, "4794bbcacf3bc42cdf1a44cf89523949-3429991"))
		return;
	*/

	if (o1->nfuncs < 4 || o2->nfuncs < 4) return;
	int r = 100*o1->nfuncs/o2->nfuncs;
	if (r < 80 || r > 125) return;
	if (!strcmp(o1->name, o2->name)) return;

	Func *f1 = o2->funcs;
	Func *f2 = o1->funcs;
	
	int m=0;
	while (f1) {
		int r = 0;
		while (f2 && (r = strcmp(f1->name, f2->name)) < 0) // reversed since we pushed fns at the head
			f2 = f2->next;

		if (f2 && r==0) // match
			m++;

		f1 = f1->next;
	}
	
	if (m > o1->common) {
		o1->common = m;
		o1->match = o2;
	}
}

int main()
{
	int nobjs=0;

	// Load the symbol list of every object under strings/.ccache/<a>/<b>.
	for (int a=0; a<0x10; a++)
		for (int b=0; b<0x10; b++) {
			char base[] = "strings/.ccache/1/2";
			sprintf(base, "strings/.ccache/%x/%x", a, b);
			DIR *dh = opendir(base);
			if (!dh) continue;

			for (;;) {
				struct dirent *de = readdir(dh);
				if (!de)
					break;
				if (strchr(de->d_name, '.'))
					continue;

				Obj *o = read_obj(base, a, b, de->d_name);

				Obj **h = &objs[NAME_HASH(o->name)];
				o->next = *h;
				*h = o;

				nobjs++;
			}

			closedir(dh);
		}
	
	int total_raw=0;
	int total_bsd=0;
	int nbsd=0;

	for (int h1=0; h1<HASH_SIZE; h1++) {
		printf("\r%d%%", h1*100/HASH_SIZE);

		for (Obj *o1=objs[h1]; o1; o1=o1->next) {
			// Look for a match in all the objects that are after it in our hash.
			// If we get a match, it means we can store o1 as a delta from o2.

			for (int h2=h1; h2<HASH_SIZE; h2++) {
				Obj *o2 = (h2==h1) ? o1 : objs[h2];

				for (; o2; o2=o2->next)
					compare(o1, o2);
			}

			if (o1->nfuncs && 100*o1->common/o1->nfuncs >= 85) {
				char cmd[PATH_MAX];
				snprintf(cmd, PATH_MAX, "%s %s %s tmp.bsd", BSDIFF, o1->path, o1->match->path);
				if (system(cmd))
					{ printf("%s\n", cmd); abort(); }

				struct stat st;
				if (stat(o1->path, &st))
					{ printf("%s\n", o1->path); abort(); }
				total_raw += st.st_size;

				if (stat("tmp.bsd", &st))
					{ printf("%s (tmp.bsd)\n", o1->path); abort(); }
				total_bsd += st.st_size;

				nbsd++;
			}
		}
	}

	remove("tmp.bsd");

	printf("\n%d files compressed (%d -> %d bytes)\n", nbsd, total_raw, total_bsd);
	
	return 0;
}


get-symbols.sh
Description: Bourne shell script


Re: [ccache] [PATCH] Use bash for test.sh.

2012-11-12 Thread Eitan Adler
On 12 November 2012 13:11, Eitan Adler <li...@eitanadler.com> wrote:
> On 12 November 2012 13:04, Mike Frysinger <vap...@gentoo.org> wrote:
> > yes, when people tell you forcing asinine behavior is wrong, you label them
> > trolls.  i guess that's how you win arguments.
>
> Claiming that systems without /bin/bash are crap shows a level of
> naivete that only someone new to the open source world has. It was a
> choice between actual incompetence (unlikely) or pretend incompetence
> (a troll).
>
> On to the substance instead of my mistaken ad hominem:
>
> 1) Even on systems with a binary called /bin/bash, using #!/bin/bash is
>    wrong
> 2) Many systems don't ship with bash at all for licensing, technical,
>    or preference reasons
> 3) Most operating systems that ship bash don't ship it in /bin
>
> The only correct behaviors are using
>
> #!/usr/bin/env bash to find the bash binary
> or
> #!/bin/sh  which is mandated to exist by POSIX

actually one more valid behavior:

#!/bin/sh
[ -z "$BASH" ] && bash "$0"

though this is far less common and I'm not sure if it has any
positives or negatives versus the 'env' approach.

-- 
Eitan Adler
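
A slightly fuller sketch of the re-exec idiom from the message above, under a few assumptions that are not from the thread: it uses exec so the original sh process does not carry on once bash has finished, passes the script's arguments along with "$@", and only re-execs when bash is actually on PATH and $0 names a script file. The variable shell_kind exists only to make the outcome visible.

```sh
#!/bin/sh
# Restart under bash when we are not already running in it. The extra checks
# keep the script usable where bash is missing or $0 is not a script file.
if [ -z "${BASH:-}" ] && command -v bash >/dev/null 2>&1 && [ -f "$0" ]; then
    exec bash "$0" "$@"
fi

if [ -n "${BASH:-}" ]; then
    shell_kind=bash
else
    shell_kind=sh   # fallback: stay within the POSIX subset from here on
fi
echo "running under: $shell_kind"
```

Compared with the env approach, this keeps a #!/bin/sh first line and decides at run time, at the cost of the script body being parsed by sh up to the point of the exec.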


Re: [ccache] [PATCH] Use bash for test.sh.

2012-11-12 Thread Paul Smith
On Mon, 2012-11-12 at 13:11 -0500, Eitan Adler wrote:
> #!/bin/sh  which is mandated to exist by POSIX

Actually, unless there's been a change, POSIX doesn't mandate that the
POSIX shell appear as /bin/sh.

Unfortunately, this means that systems are free to provide definitively
non-POSIX /bin/sh and still be allowed to paint themselves with the
veneer of compliance (yes I'm looking at you Solaris!!), since there is
a POSIX shell somewhere (else) on the system.
