Re: [Chicken-users] c-string return question

2011-10-14 Thread Jörg F . Wittenberger

Sorry John & Felix,

I must have been overworked.

Some sleep already made me aware that the memory in question is indeed
clobbered.  The c-pointer section in the FFI manual is just not clear
about that.  (And somehow I must have convinced myself that C_mpointer
would already copy out the memory, which is obviously not the case.)

Given these facts, the c-pointer type is much less interesting now.
I'll avoid it from now on.

/Jörg





___
Chicken-users mailing list
Chicken-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-users


Re: [Chicken-users] c-string return question

2011-10-14 Thread Felix
> My point is, that this code is valid wrt. the manual and apparently
> valid given my understanding of what the C code tries to do:
> make it possible to return on stack strings.

Can you point out the relevant manual section?

> For me it ends up like this:
> 
> ---
> #define return(x) C_cblock C_r = (C_mpointer(&C_a,(void*)(x))); goto C_ret; C_cblockend
> static C_word C_fcall stub26(C_word C_buf,C_word C_a0) C_regparm;
> C_regparm static C_word C_fcall stub26(C_word C_buf,C_word C_a0){
> C_word C_r=C_SCHEME_UNDEFINED,*C_a=(C_word*)C_buf;
> unsigned int ch=(unsigned int )C_num_to_unsigned_int(C_a0);
> static unsigned char off[6]={0xFC,0xF8,0xF0,0xE0,0xC0,0x00};
>  int size=5; C_char buf[7];
>  buf[6]='\0';
>  if (ch < 0x80) {
>    buf[5]=ch;
>  } else {
>    buf[size--]=(ch&0x3F)|0x80; ch=ch>>6;
>    while (ch) { buf[size--]=(ch&0x3F)|0x80; ch=ch>>6; }
>    /* Write the size information into the first byte */
>    ++size;
>    buf[size]=off[size]|buf[size];
>  }
>  return(buf+size);
> 
> C_ret:
> #undef return
> 
> return C_r;}
> ---
> 
> to be called like this:
> 
> -
> t3=C_a_i_bytevector(&a,1,C_fix(3));
> t4=C_i_foreign_unsigned_integer_argumentp(t2);
> t5=stub26(t3,t4);
> C_trace("##sys#peek-c-string");
> t6=*((C_word*)lf[5]+1);
> ((C_proc4)(void*)(*((C_word*)t6+1)))(4,t6,t1,t5,C_fix(0));}
> -
> 
> But somehow the "C_proc4" receives clobbered memory.

"stub26" is called as a normal C function and the returned buffer
will be clobbered if it points to (nonstatic) heap data.
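A minimal plain-C sketch of Felix's point, assuming only the standard library: a wrapper that stores a pointer (as C_mpointer does with the stub's return value) records the address and nothing else, so the bytes must be copied into longer-lived storage before the stub returns. `pointer_box`, `box_pointer` and `stub_copy_out` are hypothetical names, not CHICKEN API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical analogue of a pointer wrapper: it records the address only,
   never the bytes behind it. */
typedef struct { const void *addr; } pointer_box;

static pointer_box box_pointer(const void *p) {
  pointer_box b = { p };
  return b;
}

/* Hypothetical stub: builds its result in a local array (which dies when
   the function returns) and copies the bytes into caller-owned storage. */
static int stub_copy_out(char *out, size_t outlen) {
  char local[4] = "abc";
  if (outlen < sizeof local) return -1;
  memcpy(out, local, sizeof local);   /* copy the bytes, not the address */
  return 0;
}
```

If the box sees a later change to the original array, no copy was ever made; only the explicit `memcpy` gives the result a lifetime independent of the stub's frame.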


cheers,
felix



Re: [Chicken-users] c-string return question

2011-10-13 Thread Jörg F . Wittenberger

On Oct 13 2011, Alan Post wrote:

> On Thu, Oct 13, 2011 at 02:46:30PM -0400, John Cowan wrote:
> > Alan Post scripsit:
> >
> > > It does make the routine non-reentrant.  Does that matter here?
> >
> > I don't see how.  This routine is called from Chicken, and the string
> > gets copied into a Chicken string right away.
> >
> > I suppose you might want to shut off interrupts.
>
> Right!  I was laboring under the illusion of posix threads.

Hm, I'm working in the presence of posix threads; it's just that so far
there is only one chicken thread for me.

Which might change, as I said.

But shutting off interrupts is totally irrelevant here.
We are talking about the generated code as seen within the C function.
Interrupts are checked at their beginning.







Re: [Chicken-users] c-string return question

2011-10-13 Thread Jörg F . Wittenberger

On Oct 13 2011, John Cowan wrote:

> Alan Post scripsit:
>
> > It does make the routine non-reentrant.  Does that matter here?
>
> I don't see how.  This routine is called from Chicken, and the string
> gets copied into a Chicken string right away.
>
> I suppose you might want to shut off interrupts.


Come on.  When I consider such low-level things, I'm
not caught in the cage of the application at hand.

It might very well be that I want one day to run two chicken
threads in one process.  So far there is no promise that this might
work.  But the declarations in chicken core look already as if one
could try to do that.

I don't want to accidentally create a stupid test case for the fact
that there is no provision (I can't even imagine any) for code
inside foreign-lambda* to be always thread local...
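For the thread-safety half of this concern, one option is C11 `_Thread_local`, which gives each thread its own copy of a static buffer. A minimal sketch, with `u_plus` as a hypothetical helper; it does not address reentrancy within one thread (a nested call or signal handler still overwrites the buffer), and whether it fits code inside foreign-lambda* is an assumption, not something this thread establishes.

```c
#include <stdio.h>

/* Hypothetical helper: the buffer has static lifetime, but each thread
   gets its own instance (C11 _Thread_local), so threads cannot clobber
   each other's result. Calls within one thread still share it. */
static const char *u_plus(unsigned long cp) {
  static _Thread_local char buf[16];
  snprintf(buf, sizeof buf, "U+%04lX", cp);
  return buf;
}
```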

I'd rather keep a test case for the temporarily dysfunctional but
good API for returning on-stack strings.

If it were not for QA with respect to Chicken, my simplest solution
would be to just use the equivalent definition that went into Chicken.
But, that's *not* the point, you see.







Re: [Chicken-users] c-string return question

2011-10-13 Thread Jörg F . Wittenberger

On Oct 13 2011, John Cowan wrote:

> Jörg F. Wittenberger scripsit:
>
> > So I'll stick with the test case and remove the "static" keyword from
> > the buffer definition once I have an updated gcc in my production
> > environment.
>
> "Program testing can be used to show the presence of bugs, but never to
> show their absence!" --Edsger Dijkstra
>
> And this is especially true for Heisenbugs like this.  Keep the 'static'
> permanently: it's safe and it costs essentially nothing.

Except for the reentrancy/thread safety issue, that is!








Re: [Chicken-users] c-string return question

2011-10-13 Thread Jörg F . Wittenberger

John,

it's not my intention to argue about the merits of the way the
foreign-lambda* I posted has been written.
(If I had to do so, I would argue that using a dynamic "buf"
would be better style: it is less sensitive to being [re]used in a
multi-threaded or reentrant environment.)

My point is that this code is valid with respect to the manual, and
apparently valid given my understanding of what the C code tries to do:
make it possible to return on-stack strings.

The compiler output could be much simpler if there were a restriction
that c-string would always point to heap or static memory.
(Maybe a c-string-static type would be an idea to distinguish
the complex case and the simple one?  Not sure.)

The situation is that more recent gcc versions do fine with this
one (and break elsewhere), while those shipped with some common Linux
distributions fail to work.

On Oct 13 2011, John Cowan wrote:

> I looked at all instances of 'define return' and at most they seem to
> copy pointers:

That's what they do.  They arrange things for ##sys#peek-c-string
to find the C string.

For me it ends up like this:

---
#define return(x) C_cblock C_r = (C_mpointer(&C_a,(void*)(x))); goto C_ret; C_cblockend
static C_word C_fcall stub26(C_word C_buf,C_word C_a0) C_regparm;
C_regparm static C_word C_fcall stub26(C_word C_buf,C_word C_a0){
C_word C_r=C_SCHEME_UNDEFINED,*C_a=(C_word*)C_buf;
unsigned int ch=(unsigned int )C_num_to_unsigned_int(C_a0);
static unsigned char off[6]={0xFC,0xF8,0xF0,0xE0,0xC0,0x00};

 int size=5; C_char buf[7];
 buf[6]='\0';
 if (ch < 0x80) {
   buf[5]=ch;
 } else {
   buf[size--]=(ch&0x3F)|0x80; ch=ch>>6;
   while (ch) { buf[size--]=(ch&0x3F)|0x80; ch=ch>>6; }
   /* Write the size information into the first byte */
   ++size;
   buf[size]=off[size]|buf[size];
 }
 return(buf+size);

C_ret:
#undef return

return C_r;}
---

to be called like this:

-
t3=C_a_i_bytevector(&a,1,C_fix(3));
t4=C_i_foreign_unsigned_integer_argumentp(t2);
t5=stub26(t3,t4);
C_trace("##sys#peek-c-string");
t6=*((C_word*)lf[5]+1);
((C_proc4)(void*)(*((C_word*)t6+1)))(4,t6,t1,t5,C_fix(0));}
-

But somehow the "C_proc4" receives clobbered memory.

I don't see why.

/Jörg







Re: [Chicken-users] c-string return question

2011-10-13 Thread Alan Post
On Thu, Oct 13, 2011 at 02:46:30PM -0400, John Cowan wrote:
> Alan Post scripsit:
> 
> > It does make the routine non-reentrant.  Does that matter here?
> 
> I don't see how.  This routine is called from Chicken, and the string
> gets copied into a Chicken string right away.
> 
> I suppose you might want to shut off interrupts.
> 

Right!  I was laboring under the illusion of posix threads.

-Alan
-- 
.i ma'a lo bradi cu penmi gi'e du



Re: [Chicken-users] c-string return question

2011-10-13 Thread John Cowan
Alan Post scripsit:

> It does make the routine non-reentrant.  Does that matter here?

I don't see how.  This routine is called from Chicken, and the string
gets copied into a Chicken string right away.

I suppose you might want to shut off interrupts.

-- 
John Cowan    http://ccil.org/~cowan    co...@ccil.org
SAXParserFactory [is] a hideous, evil monstrosity of a class that should
be hung, shot, beheaded, drawn and quartered, burned at the stake,
buried in unconsecrated ground, dug up, cremated, and the ashes tossed
in the Tiber while the complete cast of Wicked sings "Ding dong, the
witch is dead."  --Elliotte Rusty Harold on xml-dev



Re: [Chicken-users] c-string return question

2011-10-13 Thread Alan Post
On Thu, Oct 13, 2011 at 02:07:04PM -0400, John Cowan wrote:
> Jörg F. Wittenberger scripsit:
> 
> > So I'll stick with the test case and remove the "static" keyword from
> > the buffer definition once I have an updated gcc in my production
> > environment.
> 
> "Program testing can be used to show the presence of bugs, but never to
> show their absence!" --Edsger Dijkstra
> 
> And this is especially true for Heisenbugs like this.  Keep the 'static'
> permanently: it's safe and it costs essentially nothing.
> 

It does make the routine non-reentrant.  Does that matter here?
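A minimal sketch of what non-reentrant means here, using a hypothetical `hex_byte` helper with a static buffer: every call returns the same address, so a second call overwrites the first result. That is harmless when the caller copies the string immediately, as John points out in his reply, and a problem whenever two callers can interleave.

```c
#include <stdio.h>

/* Hypothetical helper with a static result buffer, like the
   'static C_uchar buf[7]' fix: the storage outlives the call,
   but every call shares the same storage. */
static const char *hex_byte(unsigned v) {
  static char buf[3];
  snprintf(buf, sizeof buf, "%02X", v & 0xFFu);
  return buf;
}
```

Keeping the first pointer and calling again shows the sharing: both pointers are equal and the first result now reads "12".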

-Alan
-- 
.i ma'a lo bradi cu penmi gi'e du



Re: [Chicken-users] c-string return question

2011-10-13 Thread John Cowan
Jörg F. Wittenberger scripsit:

> So I'll stick with the test case and remove the "static" keyword from
> the buffer definition once I have an updated gcc in my production
> environment.

"Program testing can be used to show the presence of bugs, but never to
show their absence!" --Edsger Dijkstra

And this is especially true for Heisenbugs like this.  Keep the 'static'
permanently: it's safe and it costs essentially nothing.

-- 
My confusion is rapidly waxing  John Cowan
For XML Schema's too taxing:     co...@ccil.org
I'd use DTDs                     http://www.ccil.org/~cowan
If they had local trees --
I think I best switch to RELAX NG.



Re: [Chicken-users] c-string return question

2011-10-13 Thread John Cowan
Jörg F. Wittenberger scripsit:

> (Watch out for C_cblock and C_cblockend #defines in chicken.h , which
> depend on the C compiler in use.)

Normally, they are ({ and }) respectively, the
GNU C extension for statement expressions (see
http://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html ). In C++ mode,
they compile as "do{" and "}while(0)" instead.  In neither case do they
do anything to the stack.
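A minimal sketch of the return-redirection pattern discussed in this thread. `SKETCH_CBLOCK`/`SKETCH_CBLOCKEND` are hypothetical stand-ins for C_cblock/C_cblockend, modelled on the description above rather than copied from chicken.h, and `STUB_RETURN` stands in for the stub's local `return` macro. It illustrates why the trick cannot rescue a stack buffer: the block saves the pointer value and jumps to a single exit, but nothing the pointer refers to is copied.

```c
#include <stddef.h>

/* Hypothetical stand-ins, following the description above: a GNU C
   statement expression, or do{...}while(0) elsewhere. */
#if defined(__GNUC__) && !defined(__cplusplus)
#  define SKETCH_CBLOCK    ({
#  define SKETCH_CBLOCKEND })
#else
#  define SKETCH_CBLOCK    do {
#  define SKETCH_CBLOCKEND } while (0)
#endif

static const char *pick(int n) {
  const char *r = NULL;
  /* Divert the body's returns into a block that saves the value and
     jumps to one exit. Only the pointer value is saved. */
#define STUB_RETURN(x) SKETCH_CBLOCK r = (x); goto stub_ret; SKETCH_CBLOCKEND
  if (n == 0) STUB_RETURN("zero");   /* string literals have static storage */
  STUB_RETURN("nonzero");
stub_ret:
#undef STUB_RETURN
  return r;
}
```

Returning string literals is safe here only because they live in static storage; passing a local array to `STUB_RETURN` would save a pointer that dangles once `pick` returns.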

> It does a local #define return(x) to insert a block wherein it saves
> the to-be-returned string *before* the actual return statement is seen
> by the C compiler.

I looked at all instances of 'define return' and at most they seem to
copy pointers: they don't copy the chars that are pointed to.  That is
what matters here: one way or another, this code returns a pointer to
garbage outside the current stack.

> the trick as deployed in the Chicken source does not work under
> certain C compilers.

Since it's still not valid C despite the trick, that's no surprise.

An alternative approach to using a static string, overkill in this case,
is to malloc() the result string and declare the result type to be
c-string* rather than c-string.
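A minimal sketch of the malloc approach, assuming the result type is declared as c-string*, which per the CHICKEN manual makes CHICKEN free() the returned pointer after copying it into a Scheme string. `two_byte_utf8_heap` is a hypothetical helper that handles only two-byte UTF-8 forms, to keep it short; in plain C the caller does the free().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stub body: build the result in a stack buffer, then hand
   back a heap copy that survives the return. */
static char *two_byte_utf8_heap(unsigned ch) {
  unsigned char buf[3];
  char *s;
  if (ch < 0x80 || ch >= 0x800) return NULL;   /* sketch: 2-byte forms only */
  buf[0] = (unsigned char)(0xC0 | (ch >> 6));
  buf[1] = (unsigned char)(0x80 | (ch & 0x3F));
  buf[2] = '\0';
  s = malloc(sizeof buf);
  if (s != NULL) memcpy(s, buf, sizeof buf);   /* heap copy, caller frees */
  return s;
}
```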

-- 
Only do what only you can do.   John Cowan 
  --Edsger W. Dijkstra's advice
to a student in search of a thesis



Re: [Chicken-users] c-string return question

2011-10-13 Thread Jörg F . Wittenberger

On Oct 13 2011, Jim Ursetto wrote:

> On Oct 13, 2011, at 11:02 AM, Jörg F. Wittenberger wrote:
>
> > ages ago I wrote these simple lines:
>
> Out of curiosity, would this suit your purposes instead:
>
> (##sys#char->utf8-string (integer->char x))


Looks good.

I did not notice that this made it into the chicken core since
I wrote my code.

NB:  The code I posted is actually a good test case for the
c-string return value in the chicken FFI.
This code was actually converted to C from an equivalent Scheme
implementation (good enough by all counts for the actual purpose
at hand) to learn about Chicken's c-string return handling.

So I'll stick with the test case and remove the "static" keyword
from the buffer definition once I have an updated gcc in my
production environment.

Have Fun

/Jörg







Re: [Chicken-users] c-string return question

2011-10-13 Thread Alan Post
On Thu, Oct 13, 2011 at 07:24:12PM +0200, Jörg F. Wittenberger wrote:
> IMHO the moral of the story: Never trust your C compiler too much.
> 

I've had to get more familiar with gcc's -f flag, as the years have
gone by.  '-fno-strict-aliasing' is one that I've personally needed
(and chicken requires too, I believe) for some time now, and
variously I've had to turn those on and off based on writing C that
was a little too comfortable with the underlying machine
architecture.
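For readers who have not met the strict-aliasing issue, a minimal sketch, assuming IEEE 754 single-precision floats: reading a float's bits through a uint32_t pointer violates C's aliasing rules, which is the kind of code -fno-strict-aliasing keeps working, while copying the bytes with memcpy is well defined. `float_bits` is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* The aliasing-violating version would be
       uint32_t bits = *(uint32_t *)&f;   -- undefined behaviour
   and may miscompile under -fstrict-aliasing (enabled at -O2 in gcc).
   memcpy expresses the same reinterpretation legally. */
static uint32_t float_bits(float f) {
  uint32_t bits;
  _Static_assert(sizeof(uint32_t) == sizeof(float), "float must be 32 bits");
  memcpy(&bits, &f, sizeof bits);
  return bits;
}
```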

A favorite trick of mine, for instance:

  struct string {
    size_t string_size;
    char   string_buffer[1]; /* note the single character string */
  };

Where I then malloc 'sizeof(struct string)+strlen(str)' all as one
block of memory and write the string past the end of the struct.[1]
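The same one-allocation layout can be written with a C99 flexible array member, which avoids indexing past a declared bound of [1]. A minimal sketch; `string_new` is a hypothetical constructor. With a flexible array the NUL needs an explicit +1, since the array contributes no storage to sizeof(struct string).

```c
#include <stdlib.h>
#include <string.h>

struct string {
  size_t string_size;
  char   string_buffer[];   /* C99 flexible array member */
};

/* Hypothetical constructor: one malloc holds the header and the text. */
static struct string *string_new(const char *str) {
  size_t len = strlen(str);
  struct string *s = malloc(sizeof *s + len + 1);   /* +1 for the NUL */
  if (s == NULL) return NULL;
  s->string_size = len;
  memcpy(s->string_buffer, str, len + 1);           /* copies the NUL too */
  return s;
}
```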

You might find a wonderful playground of debugging potential if you
try this code fiddling with your -f options: start with the ones
that get defined with -O3, particularly those that aren't defined
in -O2.

-Alan

1: this stores both the size of the string and an extra character
   for the terminating NUL, which I do on purpose.
-- 
.i ma'a lo bradi cu penmi gi'e du



Re: [Chicken-users] c-string return question

2011-10-13 Thread Jörg F . Wittenberger

On Oct 13 2011, Jörg F. Wittenberger wrote:

> Recently this code began to return garbage under gcc 4.4.5
> on amd64 and ARM, though more reliably on ARM.


I forgot some marginal thing you might want to know just in case:

With gcc 4.4.5 (as in current debian stable) you really, really
don't want to compile C code as produced by Chicken with
gcc -O3 !!

This works for me for small test programs so far.

But with a 50k LoC program it runs into all sorts of errors.
Just deleting the .o files and recompiling the same C code gives
me a working executable.

This triggers memories of my recent observation
http://lists.nongnu.org/archive/html/chicken-users/2011-10/msg00067.html
This one came up under gcc 4.5.2 (as in current Ubuntu).

IMHO the moral of the story: Never trust your C compiler too much.

The latter would be a case of the newer compiler producing code,
from perfect C source, that valgrind complains about.
(Which does not exclude the chance that valgrind is wrong.
I just don't believe that.)

I'm afraid these facts are almost off-topic here.
Unrelated except for the collateral damage that Chicken compiles
to C, which is not exactly an executable format on most machines.  ;-)


/Jörg






Re: [Chicken-users] c-string return question

2011-10-13 Thread Jim Ursetto
On Oct 13, 2011, at 11:02 AM, Jörg F. Wittenberger wrote:

> ages ago I wrote these simple lines:

Out of curiosity, would this suit your purposes instead:

(##sys#char->utf8-string (integer->char x))


Re: [Chicken-users] c-string return question

2011-10-13 Thread Jörg F . Wittenberger

On Oct 13 2011, John Cowan wrote:

> Jörg F. Wittenberger scripsit:
>
> > (define integer->utf8string
> >  (foreign-lambda*
> >   c-string ((unsigned-integer ch))
> >   "static C_uchar off[6]={0xFC,0xF8,0xF0,0xE0,0xC0,0x00};
> >  int size=5; C_uchar buf[7];
> >  buf[6]='\\0';
> >  if (ch < 0x80) {
> >    buf[5]=ch;
> >  } else {
> >    buf[size--]=(ch&0x3F)|0x80; ch=ch>>6;
> >    while (ch) { buf[size--]=(ch&0x3F)|0x80; ch=ch>>6; }
> >    /* Write the size information into the first byte */
> >    ++size;
> >    buf[size]=off[size]|buf[size];
> >  }
> >  return(buf+size);
> > "))
>
> This code is not good C, because it returns a pointer into a stack
> frame which has already been exited.  It may just so happen that
> there is still a correct value there, but there are no guarantees.
> I'd guess that the corruption happens when there is a minor GC.
> See http://c-faq.com/~scs/cclass/int/sx5.html .

Wait!

The Chicken manual does not mention this restriction.  For a reason.

When you read the expanded C code that Chicken produces,
you will find that it goes through some magic to make sure
this restriction does not apply.

(Watch out for the C_cblock and C_cblockend #defines in chicken.h,
which depend on the C compiler in use.)

It does a local #define return(x) to insert a block wherein it
saves the to-be-returned string *before* the actual return statement
is seen by the C compiler.

> > 1.) static C_uchar buf[7];
> >     ^^^^^^
> >     does the trick.
>
> That's absolutely the Right Thing.  You are now returning a pointer to
> the static data region, which will always be available.

Not exactly.  While your explanation of my reasoning for circumventing
the non-working situation is correct, it means that the trick as
deployed in the Chicken source does not work with certain C compilers.


Best Regards

/Jörg







Re: [Chicken-users] c-string return question

2011-10-13 Thread John Cowan
Jörg F. Wittenberger scripsit:

> (define integer->utf8string
>  (foreign-lambda*
>   c-string ((unsigned-integer ch))
>   "static C_uchar off[6]={0xFC,0xF8,0xF0,0xE0,0xC0,0x00};
>  int size=5; C_uchar buf[7];
>  buf[6]='\\0';
>  if (ch < 0x80) {
>    buf[5]=ch;
>  } else {
>    buf[size--]=(ch&0x3F)|0x80; ch=ch>>6;
>    while (ch) { buf[size--]=(ch&0x3F)|0x80; ch=ch>>6; }
>    /* Write the size information into the first byte */
>    ++size;
>    buf[size]=off[size]|buf[size];
>  }
>  return(buf+size);
> "))

This code is not good C, because it returns a pointer into a stack
frame which has already been exited.  It may just so happen that
there is still a correct value there, but there are no guarantees.
I'd guess that the corruption happens when there is a minor GC.
See http://c-faq.com/~scs/cclass/int/sx5.html .

> 1.) static C_uchar buf[7];
>     ^^^^^^
>     does the trick.

That's absolutely the Right Thing.  You are now returning a pointer to
the static data region, which will always be available.

-- 
John Cowan  co...@ccil.org   http://ccil.org/~cowan
Consider the matter of Analytic Philosophy.  Dennett and Bennett are well-known.
Dennett rarely or never cites Bennett, so Bennett rarely or never cites Dennett.
There is also one Dummett.  By their works shall ye know them.  However, just as
no trinities have fourth persons (Zeppo Marx notwithstanding), Bummett is hardly
known by his works.  Indeed, Bummett does not exist.  It is part of the function
of this and other e-mail messages, therefore, to do what they can to create him.



[Chicken-users] c-string return question

2011-10-13 Thread Jörg F . Wittenberger

Hi,

ages ago I wrote these simple lines:

(define integer->utf8string
 (foreign-lambda*
  c-string ((unsigned-integer ch))
  "static C_uchar off[6]={0xFC,0xF8,0xF0,0xE0,0xC0,0x00};
 int size=5; C_uchar buf[7];
 buf[6]='\\0';
 if (ch < 0x80) {
   buf[5]=ch;
 } else {
   buf[size--]=(ch&0x3F)|0x80; ch=ch>>6;
   while (ch) { buf[size--]=(ch&0x3F)|0x80; ch=ch>>6; }
   /* Write the size information into the first byte */
   ++size;
   buf[size]=off[size]|buf[size];
 }
 return(buf+size);
"))

This happened to work every day, for years, on at least i386, amd64
and ARM.

Recently this code began to return garbage under gcc 4.4.5
on amd64 and ARM, though more reliably on ARM.

However: no clear test case available:

When I write the above definition plus some test code
(define xx (integer->utf8string 160))
(display (char->integer (string-ref xx 0)))
into its own file, I have so far been unable to make it return
garbage.
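Independent of the lifetime question, a test program might also exercise boundary code points. Tracing the loop above by hand, a code point whose top chunk does not fit in the lead byte seems to come out wrong: U+0800, for example, yields the two bytes E0 80 rather than E0 A0 80, and U+10000 yields a three-byte sequence. The sketch below is not Jörg's code but a common alternative that picks the length from the code-point range first and writes into a caller-supplied buffer, so it has no static state and no dangling pointer. `utf8_encode` is a hypothetical name; it stops at U+10FFFF and does not reject surrogates.

```c
#include <stddef.h>

/* Encode code point ch as UTF-8 into out (5 bytes: up to 4 data bytes
   plus NUL). Returns the number of data bytes, or 0 if ch > 0x10FFFF. */
static size_t utf8_encode(unsigned long ch, unsigned char out[5]) {
  size_t n, i;
  if (ch < 0x80)           { out[0] = (unsigned char)ch;                  n = 1; }
  else if (ch < 0x800)     { out[0] = (unsigned char)(0xC0 | (ch >> 6));  n = 2; }
  else if (ch < 0x10000)   { out[0] = (unsigned char)(0xE0 | (ch >> 12)); n = 3; }
  else if (ch <= 0x10FFFF) { out[0] = (unsigned char)(0xF0 | (ch >> 18)); n = 4; }
  else return 0;
  /* Continuation bytes carry 6 bits each, most significant first. */
  for (i = 1; i < n; i++)
    out[i] = (unsigned char)(0x80 | ((ch >> (6 * (n - 1 - i))) & 0x3F));
  out[n] = '\0';
  return n;
}
```

For U+00A0 (the 160 used in the test above) this gives C2 A0, the same bytes the original loop produces.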

It does return garbage when I compile this code, as the only
foreign function, together with the SSAX parser in its own module
(and link it into a larger program).

Maybe it's helpful to know how I escaped:
1.) static C_uchar buf[7];
   ^^
   does the trick.

2.) AND so does adding a for-loop right before the return, which
   prints a hex output of "buf" to stderr!  (Instead of the static
   declaration.  So it's rather obviously a gcc issue.)

As far as I understand the C code into which Chicken expands this
function, I'd say: that one is correct.

Plus: so far I have gcc 4.5.2 on my dev machine.  There I have
never been able to reproduce this case.

Maybe it's helpful to know what can go wrong.

Best Regards

/Jörg



