[E-devel] Attn: SIMD tweakers. :) Need some testing help please.

2006-01-07 Thread Tres Melton
 be included where it is needed
especially where the posix_memalign() might not be available (it was
written from scratch without looking at any other code so it is not
copyright encumbered in any way).  The disadvantage is that the posix
code is much faster, better tested, and can be freed with free().  I
have not attempted to optimize mine at all and won't unless the code is
wanted here or in X.
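
For readers without the tarball, here is a minimal sketch of what a hand-rolled aligned allocator of this kind can look like. It is not the code from the message; the names aligned_alloc_sketch() and aligned_free_sketch() are made up for illustration. It over-allocates with malloc(), rounds the address up, and stashes the raw pointer just below the aligned block, which is exactly why the result cannot be handed to plain free().

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not from simd-tester): returns `size` bytes whose
 * address is a multiple of `alignment`.  `alignment` must be a power of
 * two and at least sizeof(void *). */
static void *aligned_alloc_sketch(size_t alignment, size_t size)
{
    unsigned char *raw;
    uintptr_t start, aligned;

    if (alignment < sizeof(void *) || (alignment & (alignment - 1)) != 0)
        return NULL;
    if (size > SIZE_MAX - alignment - sizeof(void *))
        return NULL;                      /* request would overflow */

    /* Room for the payload, the worst-case padding and one saved pointer. */
    raw = malloc(size + alignment + sizeof(void *));
    if (raw == NULL)
        return NULL;

    start   = (uintptr_t)raw + sizeof(void *);
    aligned = (start + alignment - 1) & ~(uintptr_t)(alignment - 1);

    /* Remember what malloc() really returned, right below the block. */
    ((void **)aligned)[-1] = raw;
    return (void *)aligned;
}

/* Must be used instead of free(): recovers the raw malloc() pointer. */
static void aligned_free_sketch(void *p)
{
    if (p != NULL)
        free(((void **)p)[-1]);
}
```

A caller would pair aligned_alloc_sketch(16, nbytes) with aligned_free_sketch(), never with free().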


Timing:
The timing on these routines is not based on time at all.  It is based
on CPU cycles consumed by the routines.  Since the routines can be
preempted we are really looking for the best case scenario.  If you
want really accurate cycle counts you either need to reboot to single
user mode or run the code as root and clear the interrupts prior to the
shading routines. (Don't forget to issue an sti when you return. :)  It
is this possible preemption that explains negative reductions and most
of the other weird output.  The timing crap is in time-it.[ch] and
utilizes a small amount of inline assembly.  If anyone has an alternate
timing infrastructure I'd be interested in a macro that flips between 2
or 3 different ones.
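
A rough sketch of the "best case" approach described above, assuming GCC or Clang. This is not time-it.[ch]; read_cycles() and best_case_cycles() are illustrative names. On x86 it reads the time stamp counter through the __rdtsc() intrinsic, elsewhere it falls back to clock_gettime(), and it keeps the smallest of several runs so that a run that got preempted is simply discarded. Note that on newer CPUs the TSC ticks at a fixed reference rate rather than the core clock, so treat the numbers as relative.

```c
#include <stdint.h>

/* Flip between two counter sources at compile time. */
#if defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>
static uint64_t read_cycles(void)
{
    return __rdtsc();                     /* x86 time stamp counter */
}
#else
#include <time.h>
static uint64_t read_cycles(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);  /* nanoseconds, not cycles */
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}
#endif

/* Run fn(arg) `runs` times and return the smallest count seen.
 * Returns UINT64_MAX if runs <= 0. */
static uint64_t best_case_cycles(void (*fn)(void *), void *arg, int runs)
{
    uint64_t best = UINT64_MAX;
    int i;

    for (i = 0; i < runs; i++) {
        uint64_t t0 = read_cycles();
        uint64_t t1;

        fn(arg);
        t1 = read_cycles();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}
```

Taking the minimum rather than the average is the design choice here: preemption can only add cycles, so the smallest sample is the closest to the uninterrupted cost.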


Needed:
Michael also wanted to know the Resident Stack Size.  How should I go
about retrieving it and when is the best time to do it?  I also need help
figuring out why the aligned memory moves fail on x86 in my 32 bit chroot
jail.  It might be that the memory chunk made available in a 32 bit
chroot jail on a 64 bit processor is not 16 byte aligned.  (I hope so,
anyway.  I might leave aligned memory access enabled by a macro, as it is
now, in the code that I finally submit to MEJ.  Adjust it in cmod.h.)


CC People (and why):
MEJ:    It is the Eterm shading routines that I'm testing with.
Raster: You wanted to see some statistics on performance.  Specifically
how fast this is over the internal x86 fixup microcode.
Tiago:  As the person that found the alignment issues in imlib2
originally.
John:   As the author of the imlib2 code that has an alignment issue.  I
hope to dive into that code shortly and hope you have the time
to work with me on it.  (Not yet, but soon)
vapier: Because weird code turns him on for some reason and he is
Gentoo's maintainer for Eterm/Enlightenment.  
hparker:
dang:   My Gentoo AMD64 mentors.  :p

Licensing:
I'm releasing this code as copyright by me and others with all my
rights reserved for the next month or so just so that other versions
don't appear and confuse things (I will gladly accept patches though).
If Michael or Raster want to adopt it into a testing branch of their
projects then they may have it to license however they please; if not
then I will release the portions that I wrote next month under their
preferred license: the BSD.  (I prefer the GPL)  If you do play with the
code anyway then be nice and set the major version number to 0
(TEST_VERSION_MAJOR in tst.c) for me please.

Thanks all, for helping out in Operation: Aggravate MEJ!  Muu ha ha ha
ha.  He hates duplicated code.  :-)   (For those newer to the list than
I, MEJ is really a good sport about accepting outside code -- look at
Escreen.  But how many of us look forward to the prospect of having 16
different functions written in one of C, assembly, or inline assembly to
accomplish the single job of shading a background image?  :/  )  And
please ignore all the crap that is still in the code.  I'll skim it down
once I know more about the test results.  MEJ: I know we have different
coding styles; I will make a concerted effort to adjust my code to your
style for everything that I submit to you so don't freak on me yet.
This should have most everything to test aligned/unaligned memory in
SSE2 on x86 & x86-64 as well as to profile the older MMX code that
Willem wrote.  I'm very tired ATM so I'm sure I missed something
critical.  Sorry.  I'll get to it after some sleep.

The code is being hosted here (thanks Gentoo & Mr. Parker):
http://dev.gentoo.org/~hparker/simd-tester.tar.gz

Cheers,
-- 
Tres Melton
IRC & Gentoo: RiverRat




Re: [E-devel] Eterm stuff. (I'm back :)

2006-01-04 Thread Tres Melton
On Wed, 2006-01-04 at 16:20 -0500, Michael Jennings wrote:
> On Monday, 02 January 2006, at 04:28:52 (-0700),
> Tres Melton wrote:
> > Patch 1)
> Applied.
> > Patch 2)
> I think you're correct.  Applied.
> > Patch 3)
> Applied.
> > Patch 4)
> Applied, and copyright dates have been changed.  :-)

Thanks.  :)

> > I have a patch that works here that checks for alignment or not and
> > then calls the existing unaligned routine or a different aligned
> > routine but that might be disruptive so I want to submit that as
> > part of a much larger set.  I've included things like cache
> > prefetching, x86 SSE2 and I have a couple of instructions left in
> > translating sse2 to sse so that x86 can use sse which should double
> > their shading speeds.
>
> Sounds good.  You're not referring to the 15/16 patch you sent, right?

The 15/16 patch that I just submitted is the last bit that can be
applied without restructuring the checks for, and macros defining, the
SIMD routines.  It 'should' work (it does here) but the optimal solution
is to guarantee that the data submitted is already aligned (so we don't
have to check like this patch does).  That can't be done in Eterm as
Eterm doesn't allocate the data storage; X allocates the storage and
that is where any more improvements need to be made.  You know how I dip
my toe before jumping in, well I just got on their mailing lists and
joined their IRC channel; it could be a while before I get it into X.
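
To make the run-time check concrete, here is a small sketch of the test such a patch has to perform. It is an illustration, not the macro from the patch: SKETCH_ALIGNMENT and rows_are_aligned() are hypothetical names, and the logic is an assumption about the intent. Every scanline is aligned only if both the start of the data and the bytes-per-line stride are multiples of the alignment, so the two values are OR-ed together before masking.

```c
#include <stdint.h>

#define SKETCH_ALIGNMENT 16   /* hypothetical stand-in for ETERM_ALIGNMENT */

/* Returns 1 when every row of the image starts on an aligned address:
 * the first row does (data) and each later row is a whole number of
 * aligned steps further on (bpl).  Returns 0 otherwise. */
static int rows_are_aligned(const void *data, int bpl)
{
    uintptr_t bits = (uintptr_t)data | (uintptr_t)bpl;

    return (bits & (SKETCH_ALIGNMENT - 1)) == 0;
}
```

A dispatcher would call the aligned routine when rows_are_aligned() returns 1 and the unaligned routine otherwise.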

> > That is going to create a problem with the HAVE_SSE macros as it
> > will now be valid on both x86 and x86-64.  Anyway, I'm going to be
> > releasing a testing program that tests the different things and, as
> > Raster requested, some profiling code so we know what we are
> > actually gaining.
>
> If needed, both configure-time and run-time checks for MMX/SSE/SSE2
> can be added.

I have thought about making generic function pointers like
shade_ximage_15() and then point them to the appropriate C/MMX/SSE
routines at runtime but that isn't a great solution as some of the code
will not compile/assemble without hardware support.  Compile time checks
are going to be necessary and I don't really see any reason to have them
switchable at runtime other than to compare the two (my test code will
enable that when I finish it) so I think the best solution is going to
be some compile time checks and just use macros like I did with the
aligned/unaligned functions to maneuver the code path to the correct
code.  Ultimately I'd like to have simple calls in the pixmap code
without all the HAVE_SSE crap and then let the macros sort it out.  To
that end, I would like to add a cmod.h file (or whatever you think it
should be named) that has all the function prototypes in it as well as
all the macro logic to the tree so that it becomes completely
transparent everywhere except that file.  I have included a rough draft
of the macro structure for your comments.

> It would probably be better to commit and see how it goes. :)

Everything that I have submitted prior to this email I consider ready to
be committed.  The first set I think can go into stable pretty quick and
the last one (15/16 aligned/unaligned) seems right to me but I don't
want to be responsible for hosing anyone's Eterm.  :)   This is all I
can submit until the profiling testing code is out and I get some
feedback.  I would like your thoughts on this submission but it is
nowhere near ready for a commit of any kind.

> Michael

Best Regards,
-- 
Tres Melton
IRC & Gentoo: RiverRat


#include <stdio.h>

#define ETERM_ARCH_UNKNOWN	1
#define ETERM_ARCH_x86		201
#define ETERM_ARCH_x86_64	301

#define ETERM_SIMD_UNKNOWN	1
#define ETERM_SIMD_MMX		201
#define ETERM_SIMD_MMX_PLUS	202
#define ETERM_SIMD_SSE		301
#define ETERM_SIMD_SSE2 	302
#define ETERM_SIMD_SSE3		303


#define ETERM_ARCH  ETERM_ARCH_x86
#define ETERM_SIMD  ETERM_SIMD_SSE
#define ETERM_ALIGNMENT 16



/*  Report which code path the macros above select.  */
int test1( void )
{
#if ( defined ETERM_ARCH ) && ( ETERM_ARCH == ETERM_ARCH_x86 )
#  if   ( defined ETERM_SIMD ) && ( ETERM_SIMD == ETERM_SIMD_MMX )
  printf( "x86 MMX Routines.\n" );
#  elif ( defined ETERM_SIMD ) && ( ETERM_SIMD == ETERM_SIMD_SSE )
  printf( "x86 SSE Routines.\n" );
#  elif ( defined ETERM_SIMD ) && ( ETERM_SIMD == ETERM_SIMD_SSE2 )
  printf( "x86 SSE2 Routines.\n" );
#  else
  printf( "C Routines.\n" );
//  C routines
#  endif
#elif   ( defined ETERM_ARCH ) && ( ETERM_ARCH == ETERM_ARCH_x86_64 )
#  if   ( defined ETERM_SIMD ) && ( ETERM_SIMD == ETERM_SIMD_SSE2 )
  printf( "x86-64 SSE2 Routines.\n" );
#  else /*  Other, lesser, combinations make no sense  */
  printf( "C Routines.\n" );
//  C routines
#  endif
#else
  printf( "C Routines.\n" );
//  C routines
#endif
  return 1;
}


int main( void )
{
  return( test1());
}





[E-devel] Eterm with safe aligned movs.

2006-01-03 Thread Tres Melton
Michael,

Remember me, I'm back!  :)  I recall you stating the last time
you saw me: "Speaking of someone that likes to fuck with shit..."  Well
I aim to please.  I have a staggered set of patches that are coming that
rework the entire SIMD engine.  If you're still reading and haven't
opened an email reply for a point by point shoot-down-in-flames then I
might have a chance. :)

I'm getting ready to explore that alignment issue that came up a
few months ago within imlib2.  (Do we have a mail archive that stores   
stuff older than a few months? Sourceforge is disappointing in that
regard.)  On that note I've been digging into the various code bases as
much as the processor's internals to figure it out.

The 15/16 bpp and the 32 bpp shading routines work in fundamentally
different ways.  The 15/16 bpp routines read in a pixel and make two more
copies for the three colors: red, green, and blue.  They then mask out
the other colors in each copy, shade/tint each color with its modifier
and then blend the colors back into a single pixel.  The 32 bpp routines
read in a pixel of three 8 bit colors and expand them to 16 bit values
in place, do the math and then truncate them back.  That is why the
32 bpp routines only read in 64 bits at a time in SSE mode.  Each of
those two 32 bit pixels, with three 8 bit colors and maybe an 8 bit alpha
channel, expands to a 64 bit pixel with 16 bit values internally.  After
it is tinted or shaded it is truncated back to four 8 bit values per
pixel and the two 32 bit pixels are then written back as a 64 bit chunk.
Therefore there is no need for using aligned/unaligned 128 bit memory
moves in 32 bpp mode.  This is also what creates the rounding error
differences relative to the C routines.  C does the temporary work in 32
bit integers instead of 16 bits inside the SIMD engine.
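
The two strategies can be sketched one pixel at a time in plain C. This is only an illustration, not the Eterm code: the function names are invented, the modifiers are assumed to run from 0 to 256 with 256 meaning "unchanged", and the 32 bpp layout is assumed to be alpha in the top byte followed by red, green and blue.

```c
#include <stdint.h>

/* 16 bpp (5-6-5): isolate each color, scale it, blend the three back. */
static uint16_t shade_pixel_565(uint16_t px, unsigned rm, unsigned gm, unsigned bm)
{
    unsigned r = (px >> 11) & 0x1f;   /* top 5 bits    */
    unsigned g = (px >> 5)  & 0x3f;   /* middle 6 bits */
    unsigned b =  px        & 0x1f;   /* low 5 bits    */

    r = (r * rm) >> 8;
    g = (g * gm) >> 8;
    b = (b * bm) >> 8;

    return (uint16_t)((r << 11) | (g << 5) | b);
}

/* 32 bpp: widen each 8 bit channel to 16 bits, scale, truncate back.
 * 255 * 256 still fits in 16 bits, which is why 16 bit lanes are enough. */
static uint32_t shade_pixel_8888(uint32_t px, unsigned rm, unsigned gm, unsigned bm)
{
    uint16_t r = (uint16_t)((px >> 16) & 0xff);
    uint16_t g = (uint16_t)((px >> 8)  & 0xff);
    uint16_t b = (uint16_t)( px        & 0xff);

    r = (uint16_t)((r * rm) >> 8);
    g = (uint16_t)((g * gm) >> 8);
    b = (uint16_t)((b * bm) >> 8);

    /* The top byte (alpha or padding) is passed through untouched. */
    return (px & 0xff000000u) | ((uint32_t)r << 16) | ((uint32_t)g << 8) | b;
}
```

The SIMD versions do the same arithmetic on several pixels per register; the scalar form just makes the mask, scale and recombine steps visible.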

Anyway, this presumes that you've applied the first set of patches
from the other day.  This should be added to the "needs lots of testing"
branch of the code.  I've tested it here and all is good.  The only
thing that would be better than this patch is one that guaranteed the
data was aligned.  That involves digging into X, which I'm doing, but
with great trepidation as that monster is a bit intimidating.  Anyway,
patches for guaranteed alignment will go to X for XImage data.  The
patches for x86/SSE are going to first get released as a test program to
see if I can get some widespread testing as I don't have the hardware.
I'm sure you'll see that on the list prior to me incorporating them into
Eterm and submitting them to you.  Feel free to beep me on IRC if there
is anything else.

Cheers,
-- 
Tres Melton
IRC & Gentoo: RiverRat
Index: eterm/Eterm/src/pixmap.c
===
RCS file: /cvsroot/enlightenment/eterm/Eterm/src/pixmap.c,v
retrieving revision 1.115
diff -u -b -B -u -r1.115 pixmap.c
--- eterm/Eterm/src/pixmap.c	22 Dec 2005 23:31:33 -	1.115
+++ eterm/Eterm/src/pixmap.c	3 Jan 2006 22:34:31 -
@@ -66,9 +66,25 @@
 extern void shade_ximage_32_mmx(void *data, int bpl, int w, int h, int rm, int gm, int bm);
 
 /* Assembler routines for 64 bit cpu with sse2 */
-extern void shade_ximage_15_sse2(void *data, int bpl, int w, int h, int rm, int gm, int bm);
-extern void shade_ximage_16_sse2(void *data, int bpl, int w, int h, int rm, int gm, int bm);
-extern void shade_ximage_32_sse2(void *data, int bpl, int w, int h, int rm, int gm, int bm);
+extern void shade_ximage_15_sse2_A(void *data, int bpl, int w, int h, int rm, int gm, int bm );
+extern void shade_ximage_15_sse2_U(void *data, int bpl, int w, int h, int rm, int gm, int bm );
+extern void shade_ximage_16_sse2_A(void *data, int bpl, int w, int h, int rm, int gm, int bm );
+extern void shade_ximage_16_sse2_U(void *data, int bpl, int w, int h, int rm, int gm, int bm );
+
+#define shade_ximage_15_sse2( data, bpl, w, h, rm, gm, bm )   \
+{ \
+  (((long) ( data ))  ((long) ( bpl ))  ((long) ( ETERM_ALIGNMENT - 1 ))) ? \
+shade_ximage_15_sse2_U((data), (bpl), (w), (h), (rm), (gm), (bm))   : \
+shade_ximage_15_sse2_A((data), (bpl), (w), (h), (rm), (gm), (bm));\
+}
+
+#define shade_ximage_16_sse2( data, bpl, w, h, rm, gm, bm )   \
+{ \
+  (((long) ( data ))  ((long) ( bpl ))  ((long) ( ETERM_ALIGNMENT - 1 ))) ? \
+shade_ximage_16_sse2_U((data), (bpl), (w), (h), (rm), (gm), (bm))   : \
+shade_ximage_16_sse2_A((data), (bpl), (w), (h), (rm), (gm), (bm));\
+}
+
 
 #ifdef PIXMAP_SUPPORT
 static Imlib_Border bord_none = { 0, 0, 0, 0 };
Index: eterm/Eterm/src/sse2_cmod.c
===
RCS file: /cvsroot/enlightenment/eterm/Eterm/src/sse2_cmod.c,v
retrieving revision 1.1
diff -u -b -B -u -r1.1 sse2_cmod.c
--- eterm/Eterm/src/sse2_cmod.c

Re: [E-devel] Eterm with safe aligned movs. (Supersedes previous)

2006-01-03 Thread Tres Melton
On Tue, 2006-01-03 at 15:49 -0700, Tres Melton wrote:
> Michael,
>
Crap,  Wrong damn tree.  Try this instead of the last one please.


-- 
Tres Melton
IRC & Gentoo: RiverRat
Index: eterm/Eterm/src/pixmap.c
===
RCS file: /cvsroot/enlightenment/eterm/Eterm/src/pixmap.c,v
retrieving revision 1.115
diff -u -b -B -u -r1.115 pixmap.c
--- eterm/Eterm/src/pixmap.c	22 Dec 2005 23:31:33 -	1.115
+++ eterm/Eterm/src/pixmap.c	3 Jan 2006 22:59:40 -
@@ -66,10 +66,30 @@
 extern void shade_ximage_32_mmx(void *data, int bpl, int w, int h, int rm, int gm, int bm);
 
 /* Assembler routines for 64 bit cpu with sse2 */
-extern void shade_ximage_15_sse2(void *data, int bpl, int w, int h, int rm, int gm, int bm);
-extern void shade_ximage_16_sse2(void *data, int bpl, int w, int h, int rm, int gm, int bm);
+#ifdef HAVE_SSE2
+extern void shade_ximage_15_sse2_A(void *data, int bpl, int w, int h, int rm, int gm, int bm );
+extern void shade_ximage_15_sse2_U(void *data, int bpl, int w, int h, int rm, int gm, int bm );
+extern void shade_ximage_16_sse2_A(void *data, int bpl, int w, int h, int rm, int gm, int bm );
+extern void shade_ximage_16_sse2_U(void *data, int bpl, int w, int h, int rm, int gm, int bm );
 extern void shade_ximage_32_sse2(void *data, int bpl, int w, int h, int rm, int gm, int bm);
 
+#define ETERM_ALIGNMENT 16
+
+#define shade_ximage_15_sse2( data, bpl, w, h, rm, gm, bm )   \
+{ \
+  (((long) ( data ))  ((long) ( bpl ))  ((long) ( ETERM_ALIGNMENT - 1 ))) ? \
+shade_ximage_15_sse2_U((data), (bpl), (w), (h), (rm), (gm), (bm))   : \
+shade_ximage_15_sse2_A((data), (bpl), (w), (h), (rm), (gm), (bm));\
+}
+
+#define shade_ximage_16_sse2( data, bpl, w, h, rm, gm, bm )   \
+{ \
+  (((long) ( data ))  ((long) ( bpl ))  ((long) ( ETERM_ALIGNMENT - 1 ))) ? \
+shade_ximage_16_sse2_U((data), (bpl), (w), (h), (rm), (gm), (bm))   : \
+shade_ximage_16_sse2_A((data), (bpl), (w), (h), (rm), (gm), (bm));\
+}
+#endif
+
 #ifdef PIXMAP_SUPPORT
 static Imlib_Border bord_none = { 0, 0, 0, 0 };
 #endif
Index: eterm/Eterm/src/sse2_cmod.c
===
RCS file: /cvsroot/enlightenment/eterm/Eterm/src/sse2_cmod.c,v
retrieving revision 1.1
diff -u -b -B -u -r1.1 sse2_cmod.c
--- eterm/Eterm/src/sse2_cmod.c	14 Jun 2005 19:39:01 -	1.1
+++ eterm/Eterm/src/sse2_cmod.c	3 Jan 2006 22:59:41 -
@@ -94,7 +88,7 @@
 
 #ifdef HAVE_SSE2
 
-void shade_ximage_15_sse2( volatile void *data, volatile int bpl, volatile int w, volatile int h, volatile int rm, volatile int gm, volatile int bm )
+void shade_ximage_15_sse2_U( volatile void *data, volatile int bpl, volatile int w, volatile int h, volatile int rm, volatile int gm, volatile int bm )
 {
   __asm__ __volatile__ (
 	.align 16  \n\t   /* SIMD instructions should be aligned on 16 byte (128 bit) boundraries for performance reasons.*/
@@ -269,7 +263,7 @@
 }
 
 
-void shade_ximage_16_sse2( volatile void *data, volatile int bpl, volatile int w, volatile int h, volatile int rm, volatile int gm, volatile int bm )
+void shade_ximage_16_sse2_U( volatile void *data, volatile int bpl, volatile int w, volatile int h, volatile int rm, volatile int gm, volatile int bm )
 {
   __asm__ __volatile__ (
 	.align 16  \n\t   /* SIMD instructions should be aligned on 16 byte (128 bit) boundraries for performance reasons.*/
@@ -447,6 +441,359 @@
   );	/*  End of Assembly  */
 }
 
+void shade_ximage_15_sse2_A( volatile void *data, volatile int bpl, volatile int w, volatile int h, volatile int rm, volatile int gm, volatile int bm )
+{
+  __asm__ __volatile__ (
+	.align 16  \n\t   /* SIMD instructions should be aligned on 16 byte (128 bit) boundraries for performance reasons.*/
+	leaq -14(%%rsi, %%rbx, 2), %%rsi\n\t	/* Load the stack index register with a pointer to data + ( width * bytes/pixel ) -6		*/
+	negq %%rbx			\n\t	/* Negate the width to that we can increment the counter	*/
+	jz 10f\n\t	/* Jump to end if the line count is zero			*/
+	movd %[red_mod], %%xmm5	\n\t	/* Load the color modifiers into mmx registers			*/
+	movd %[green_mod], %%xmm6	\n\t	/*  */
+	movd %[blue_mod], %%xmm7	\n\t	/*  */
+	punpcklwd %%xmm5, %%xmm5	\n\t	/* Unpack and Interleave low words.  From A64_128bit_Media_Programming (p. 380)			*/
+	punpcklwd %%xmm6, %%xmm6	\n\t	/* Duplicate the bottom 16 bits into the next 16 bits (both operands are the same)		*/
+	punpcklwd %%xmm7, %%xmm7	\n\t
+	punpckldq %%xmm5, %%xmm5	\n\t	/* Unpack and Interleave low double words.  From A64_128bit_Media_Programming (p. 376)		*/
+	punpckldq %%xmm6, %%xmm6	\n\t	/* Duplicate the bottom 32 bits into the next 32 bits (both operands

[E-devel] Eterm stuff. (I'm back :)

2006-01-02 Thread Tres Melton
Michael,

I've been getting into this alignment issue a lot deeper and there are
some fundamental issues in X that need exploring and I'm using the Eterm
shading routines to play with them.  I have a number of patches that
I've been working on so I might as well get started.

Patch 1)
#ifdef HAVE_SSE2  XImage * __attribute__ ((aligned (16))) ximg;
The first problem with this patch is that glibc can't guarantee anything
more than 8 bytes of alignment.  For more alignment you need to use
posix_memalign().  The second issue is that this will align the pointer
to the data but not the data; and in reality it isn't ximg that needs to
be aligned but ximage->data, and that allocation is done in X.  I'm
looking into getting that to align correctly for x86-64 right now.
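
The pointer-versus-data distinction can be shown with a few lines of C. This is a sketch, not code from Eterm or X; alloc_pixels_aligned() is a hypothetical name. An aligned attribute on a pointer variable only places the variable itself; the buffer it points at is aligned only if the allocator says so, which is what posix_memalign() is for.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns `bytes` of storage whose address is a
 * multiple of 16, or NULL on failure.  Memory obtained this way is
 * released with plain free(). */
static void *alloc_pixels_aligned(size_t bytes)
{
    void *p = NULL;

    /* posix_memalign() returns 0 on success and an error number
     * otherwise; it does not set errno. */
    if (posix_memalign(&p, 16, bytes) != 0)
        return NULL;
    return p;
}
```

Since the XImage data in question is allocated inside X, a helper like this cannot be dropped into Eterm directly; it only shows what the allocation side would have to do.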

Patch 2)
I'm not positive about this one but it seems right to me.  Once you
shade it there should be no reason to shade it again.

Patch 3)
There was talk about imlib2 routines failing on aligned memory moves and
the temporary solution was to change the aligned moves to unaligned.
That worried me a bit when I wrote these so I did everything with
unaligned moves.  Since a move between two registers is always aligned,
I changed the register-to-register moves from movdqu to movdqa.

Patch 4)
This is just copyright update and comment cleanup.  (And don't forget to
update the copyright dates in pixmap.c as it was a year behind
yesterday; now it's two years out of date.  :)

I have a patch that works here that checks for alignment or not and
then calls the existing unaligned routine or a different aligned routine
but that might be disruptive so I want to submit that as part of a much
larger set.  I've included things like cache prefetching, x86 SSE2 and I
have a couple of instructions left in translating sse2 to sse so that
x86 can use sse which should double their shading speeds.

That is going to create a problem with the HAVE_SSE macros as it will
now be valid on both x86 and x86-64.  Anyway, I'm going to be releasing
a testing program that tests the different things and, as Raster
requested, some profiling code so we know what we are actually gaining.

Anyway, the three patches above should be safe to just apply (and
double check the second one please).  And just a heads up on the coming
stuff.  :)

Happy New Year,
-- 
Tres Melton
IRC & Gentoo: RiverRat
? eterm/Eterm/configure.in-message.patch
Index: eterm/Eterm/src/pixmap.c
===
RCS file: /cvsroot/enlightenment/eterm/Eterm/src/pixmap.c,v
retrieving revision 1.115
diff -u -b -B -u -r1.115 pixmap.c
--- eterm/Eterm/src/pixmap.c	22 Dec 2005 23:31:33 -	1.115
+++ eterm/Eterm/src/pixmap.c	2 Jan 2006 10:15:25 -
@@ -1748,14 +1748,7 @@
 void
 colormod_trans(Pixmap p, imlib_t *iml, GC gc, unsigned short w, unsigned short h)
 {
-
-#ifdef HAVE_SSE2
-XImage * __attribute__ ((aligned (16))) ximg;
-#elif defined HAVE_MMX
-XImage * __attribute__ ((aligned (8))) ximg;
-#else
 XImage *ximg;
-#endif
 register unsigned long i;
 
 #if 0
? eterm/Eterm/configure.in-message.patch
Index: eterm/Eterm/src/pixmap.c
===
RCS file: /cvsroot/enlightenment/eterm/Eterm/src/pixmap.c,v
retrieving revision 1.115
diff -u -b -B -u -r1.115 pixmap.c
--- eterm/Eterm/src/pixmap.c	22 Dec 2005 23:31:33 -	1.115
+++ eterm/Eterm/src/pixmap.c	2 Jan 2006 10:23:01 -
@@ -1887,6 +1880,7 @@
 if (ximg->bits_per_pixel != 32) {
 D_PIXMAP(("Rendering 24 bit\n"));
 shade_ximage_24(ximg->data, ximg->bytes_per_line, w, h, rm, gm, bm);
+break;
 }
 /* drop */
 case 32:
Index: eterm/Eterm/src/sse2_cmod.c
===
RCS file: /cvsroot/enlightenment/eterm/Eterm/src/sse2_cmod.c,v
retrieving revision 1.1
diff -u -b -B -u -r1.1 sse2_cmod.c
--- eterm/Eterm/src/sse2_cmod.c	14 Jun 2005 19:39:01 -	1.1
+++ eterm/Eterm/src/sse2_cmod.c	2 Jan 2006 10:47:22 -
@@ -126,8 +126,8 @@
 	jns 3f\n\t
 	2:\n\t	/* Start of the inner loop (pixels 8 at a time -- 8 * 16 = 128bits/xmm register )		*/
 	movdqu (%%rsi, %%rcx, 2), %%xmm0\n\t	/* Load the 16 bits of the pixel (5 bits for red, 6 bits for green, 5 bits for blue)		*/
-	movdqu %%xmm0, %%xmm1		\n\t	/* Create a copy of the pixel for the green color		*/
-	movdqu %%xmm0, %%xmm2		\n\t	/* Create a copy of the pixel for the blue color		*/
+	movdqa %%xmm0, %%xmm1		\n\t	/* Create a copy of the pixel for the green color		*/
+	movdqa %%xmm0, %%xmm2		\n\t	/* Create a copy of the pixel for the blue color		*/
 	psrlw $5, %%xmm1		\n\t	/* Packed Shift Right Logical Words*/
 		/* From A64_128bit_Media_Programming (p. 347)			*/
 		/* Shifts the blue off of the green color			*/
@@ -191,8 +191,8 @@
 	jns 8f\n\t
 	7:\n\t
 	movdqu (%%rsi, %%rcx, 2), %%xmm0\n\t
-	movdqu

[E-devel] 64 bit cleanliness Imlib2 funckyness (patch)

2005-09-28 Thread Tres Melton
vapier sent me on a mission to find out why imlib2's code, newly enabled
for amd64, segfaults on his machine.  I can confirm that it crashes on
mine too.  I found a chunk of code that assumes 32 bit pointers and this
patch corrects that behavior.
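
As a side note on the cast itself, a sketch of a pointer-width-safe version of this kind of check follows. IS_ALIGNED_32_SKETCH is a made-up name and its body is an assumption about what such a macro does, not imlib2's definition. uintptr_t from <stdint.h> is wide enough for a pointer wherever it is provided, whereas long happens to match on LP64 systems such as Linux/amd64 but is only 32 bits on 64 bit Windows.

```c
#include <stdint.h>

/* Hypothetical macro: true when p is aligned on a 32 bit (4 byte)
 * boundary.  The pointer is converted to an integer type that is
 * guaranteed wide enough to hold it. */
#define IS_ALIGNED_32_SKETCH(p) ((((uintptr_t)(p)) & 0x3u) == 0)
```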

Anyway the program feh crashes when you try to open a menu with the
mouse and it has been traced to the --enable-amd64 configure option.  My
current system (CVS is current) configuration is:


imlib2 1.2.1.006

Configuration Options Summary:
Image Loaders:
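  JPEG....................: yes
  PNG.....................: yes
  TIFF....................: yes
  GIF.....................: yes
  ZLIB....................: yes
  BZIP2...................: no
  ID3.....................: yes
Use MMX for extra speed...: no
Use AMD64 for extra speed.: yes
Installation Path.........: /usr
Compilation...............: make
Installation..............: make install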


Sorry Mike, the bug seems to be deeper than just this.

Cheers,

-- 
Tres Melton
IRC & Gentoo: RiverRat
Index: libs/imlib2/src/lib/rgba.c
===
RCS file: /cvsroot/enlightenment/e17/libs/imlib2/src/lib/rgba.c,v
retrieving revision 1.1
diff -u -b -B -r1.1 rgba.c
--- libs/imlib2/src/lib/rgba.c	1 Nov 2004 09:45:31 -	1.1
+++ libs/imlib2/src/lib/rgba.c	28 Sep 2005 08:45:44 -
@@ -2854,7 +2854,7 @@
w = width;
h = height;
 
-   if (IS_ALIGNED_32((int)dest))
+   if (IS_ALIGNED_32((long)dest))
  {
 if (IS_MULTIPLE_2(width))
   {
@@ -2924,7 +2924,7 @@
w = width + dx;
h = height + dy;
 
-   if (IS_ALIGNED_32((int)dest))
+   if (IS_ALIGNED_32((long)dest))
  {
 if (IS_MULTIPLE_2(width))
   {
@@ -2996,7 +2996,7 @@
w = width;
h = height;
 
-   if (IS_ALIGNED_32((int)dest))
+   if (IS_ALIGNED_32((long)dest))
  {
 if (IS_MULTIPLE_2(width))
   {
@@ -3066,7 +3066,7 @@
w = width + dx;
h = height + dy;
 
-   if (IS_ALIGNED_32((int)dest))
+   if (IS_ALIGNED_32((long)dest))
  {
 if (IS_MULTIPLE_2(width))
   {
@@ -3138,7 +3138,7 @@
w = width;
h = height;
 
-   if (IS_ALIGNED_32((int)dest))
+   if (IS_ALIGNED_32((long)dest))
  {
 if (IS_MULTIPLE_2(width))
   {
@@ -3208,7 +3208,7 @@
w = width + dx;
h = height + dy;
 
-   if (IS_ALIGNED_32((int)dest))
+   if (IS_ALIGNED_32((long)dest))
  {
 if (IS_MULTIPLE_2(width))
   {
@@ -3280,7 +3280,7 @@
w = width;
h = height;
 
-   if (IS_ALIGNED_32((int)dest))
+   if (IS_ALIGNED_32((long)dest))
  {
 if (IS_MULTIPLE_2(width))
   {
@@ -3350,7 +3350,7 @@
w = width + dx;
h = height + dy;
 
-   if (IS_ALIGNED_32((int)dest))
+   if (IS_ALIGNED_32((long)dest))
  {
 if (IS_MULTIPLE_2(width))
   {
@@ -3421,7 +3421,7 @@
w = width;
h = height;
 
-   if (IS_ALIGNED_32((int)dest))
+   if (IS_ALIGNED_32((long)dest))
  {
 if (IS_MULTIPLE_4(width))
   {
@@ -3465,7 +3465,7 @@
  {
 for (y = 0; y < h; y++)
   {
- for (x = 0; ((x < w) && (!(IS_ALIGNED_32((int)dest)))); x++)
+ for (x = 0; ((x < w) && (!(IS_ALIGNED_32((long)dest)))); x++)
{
   WRITE1_RGBA_RGB332(src, dest);
}
@@ -3518,7 +3518,7 @@
w = width + dx;
h = height + dy;
 
-   if (IS_ALIGNED_32((int)dest))
+   if (IS_ALIGNED_32((long)dest))
  {
 if (IS_MULTIPLE_4(width))
   {
@@ -3563,7 +3563,7 @@
 for (y = dy; y < h; y++)
   {
  w = width + dx;
- for (x = dx; ((x < w) && (!(IS_ALIGNED_32((int)dest)))); x++)
+ for (x = dx; ((x < w) && (!(IS_ALIGNED_32((long)dest)))); x++)
{
   WRITE1_RGBA_RGB332_DITHER(src, dest);
}
@@ -3597,7 +3597,7 @@
w = width;
h = height;
 
-   if (IS_ALIGNED_32((int)dest))
+   if (IS_ALIGNED_32((long)dest))
  {
 if (IS_MULTIPLE_4(width))
   {
@@ -3641,7 +3641,7 @@
  {
 for (y = 0; y < h; y++)
   {
- for (x = 0; ((x < w) && (!(IS_ALIGNED_32((int)dest)))); x++)
+ for (x = 0; ((x < w) && (!(IS_ALIGNED_32((long)dest)))); x++)
{
   WRITE1_RGBA_RGB666(src, dest);
}
@@ -3694,7 +3694,7 @@
w = width + dx;
h = height + dy;
 
-   if (IS_ALIGNED_32((int)dest))
+   if (IS_ALIGNED_32((long)dest))
  {
 if (IS_MULTIPLE_4(width))
   {
@@ -3739,7 +3739,7 @@
 for (y = dy; y < h; y++)
   {
  w = width + dx;
- for (x = dx; ((x < w) && (!(IS_ALIGNED_32((int)dest)))); x++)
+ for (x = dx; ((x < w) && (!(IS_ALIGNED_32((long)dest)))); x

Re: [E-devel] patch - imlib2 blend in AMD64

2005-08-23 Thread Tres Melton
Somehow the CC of the following never made it to the list.  Here it is again.


On Tue, 2005-08-23 at 11:30 +0900, Carsten Haitzler wrote:
> On Mon, 22 Aug 2005 19:43:58 + Tiago Victor Gehring
> [EMAIL PROTECTED] babbled:

Lots of people said. and then raster said:

> actually do tests - you may find the unaligned copies not that much slower as
> traditionally x86 hw has always done the fixups for unaligned read/writes in
> hardware and thus the overhead is fairly small.

Tests are needed and so is some discussion.  In relation to the above
topics:


Mornin' all,


Okay, here's the deal.  I'm going to talk about some stupid shit that
everyone already knows and then you guys can call me an idiot, jerky,
or whatever.  Raster et al., please correct me if I am wrong, and my
apologies to everyone for the review.

To review, okay, the problem is that the hardware needs to be accounted
for.  It is impossible to load a single byte from RAM into cache, just
like you can't read a single byte from the hard disk.  We all know that
you can read( ?, ?, 1 ) but we also know that when the kernel gets the
call it reads a block, returns the character asked for and holds the rest
in a buffer.  The same thing happens in hardware.  The chips have all the
wires coming into them and their controller chip watches them and, upon
the proper signal, ALE, it will respond.  The Address Latch Enable (ALE)
means that the CPU is asking for the memory at this address.  It is
actually a wire on the cpu/mobo: the voltage hits 5V (in my day, now
~2.5), and that current hits the memory controller chip and causes it to
start its cycle (ANDed with the timer wire so it starts at the next
clock tick).  That memory controller interconnects RAM and L[123]_Cache
and they all have to deal with RAM in chunks/pages/lines/etc.  The
number of bytes in a chunk is always a function of the number of
wires internally/externally (e.g. 386sx = 32bit chip on 16bit bus).  Add
all this shit up, and basically what you have is memory that is byte
addressable on the software level but much more complex on the hardware
level.

When the CPU asks for a byte of memory the controller will return a
line that contains the byte in question.  When the CPU asks for a value
that is multiple bytes it will be returned in one or more cache lines of
memory.  If the CPU can be assured that the data in question is aligned
with the same alignment as the storage location then it can
manipulate the wires once to move the data.  If it is not so aligned
then it must load the data in 2+ chunks.  The plus is mooted by the
instruction set (it doesn't handle data types larger than the register
size to/from the SIMD core).  This data move can be done in a single
tick (to/from registers and L1 cache) so it is a waste of a cycle to
check for alignment because you could have moved the unaligned data by
then.  The two cycles are contingent upon the fix-ups that raster
mentioned above being efficient and I'm not sure exactly how the
hardware does it.  It could roll the address of the source wire to
achieve alignment or simply take two cycles and move the data in pieces.
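
The "one or more cache lines" point can be made concrete with a little address arithmetic. This is a sketch under a stated assumption: the 64 byte line size in SKETCH_CACHE_LINE is typical for current x86 parts but is not taken from the message, and crosses_cache_line() is an illustrative name.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CACHE_LINE 64   /* assumed line size in bytes */

/* Returns 1 if an access of n bytes starting at p touches more than one
 * cache line, 0 otherwise (including n == 0). */
static int crosses_cache_line(const void *p, size_t n)
{
    uintptr_t first = (uintptr_t)p / SKETCH_CACHE_LINE;
    uintptr_t last  = ((uintptr_t)p + n - 1) / SKETCH_CACHE_LINE;

    return n != 0 && first != last;
}
```

With a 64 byte line, a 16 byte load from a 16 byte aligned address always stays within one line, while an unaligned one may straddle two, which is where the extra work comes from.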

It is therefore appropriate for us to check once at the beginning of
the image.  The preferable solution is to guarantee alignment upon entry;
otherwise we are going to have to use unaligned memory moves or have two
pieces of code.  In order to achieve an alignment guarantee we need to
control how *image is created and ensure that it is a pointer that never
trips a test like: if ( image % alignment ) then do unaligned_stuff.
This is a function of the compiler and other things and can be
accomplished with the aligned attribute (__attribute__ ((aligned))) in C
and the .align directive in asm with the GNU tools.  I haven't
investigated all of the possibilities here so I know that the facilities
exist but am not entirely positive of the calling convention nor
implementation.  Anyway, all of the image_load type of functions (or the
one image_create, or whatever it's named, that is called by all others)
need to be rewritten to ensure alignment and we would still need to check
for the possibility of a user created image that gives an unaligned
pointer to the pixels.  In order to avoid the SIGSEGV we would have to
check for alignment and maybe call a function with unaligned moves,
re-align the data, or error out in that case.
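
One common middle ground between "two pieces of code" and "always unaligned" is to check once and split the buffer into an unaligned head, an aligned body and a leftover tail. The sketch below only does the bookkeeping; the struct and function names are invented for illustration and nothing here is taken from imlib2.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte counts for the three parts of a buffer. */
struct span_split {
    size_t head;   /* bytes before the first aligned address       */
    size_t body;   /* whole aligned blocks, a multiple of alignment */
    size_t tail;   /* bytes left over after the last whole block    */
};

/* Split n bytes starting at p.  `alignment` must be a power of two. */
static struct span_split split_for_alignment(const void *p, size_t n, size_t alignment)
{
    struct span_split s;
    size_t misalign = (size_t)((uintptr_t)p & (alignment - 1));

    s.head = misalign ? alignment - misalign : 0;
    if (s.head > n)
        s.head = n;                 /* buffer ends before alignment is reached */
    s.body = ((n - s.head) / alignment) * alignment;
    s.tail = n - s.head - s.body;
    return s;
}
```

The head and tail would go through a scalar or unaligned path and the body through the aligned SIMD path, so the alignment test runs once per buffer instead of once per move.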

I'm not real familiar with the imlib2 code and, more importantly, how
it is used, so that is why I'm mentioning things like this.  For those
of you that know the internals, what do you propose?  The "works for
all" solution is to just use unaligned memory accesses.  The "faster
than all others" solution is going to need fully aligned memory,
pre-fetched caches (already in there), and most of all predictability.

Comments, please...

Cheers,
The River Rat




Re: [E-devel] patch - imlib2 blend in AMD64

2005-08-23 Thread Tres Melton
On Tue, 2005-08-23 at 11:30 +0900, Carsten Haitzler wrote:
 On Mon, 22 Aug 2005 19:43:58 + Tiago Victor Gehring
 [EMAIL PROTECTED] babbled:

Lots of people said. and then raster said:

 actually do tests - you may find the unaligned copies  not that much slower as
 traditionally x86 hw has always done the fixups for unaligned read/writes in
 hardware and thus the overhead is fairly small.

Tests are needed and so is some discussion.  In relation to the above
topics:


Mornin' all,


Okay, here's the deal.  I'm going to talk about some stupid shit that
everyone already knows and then you guys can call me an idiot, jerky,
or whatever.  Raster et al., please correct me if I am wrong, and my
apologies to everyone for the review.

Okay, to review: the problem is that the hardware needs to be accounted
for.  It is impossible to load a single byte from RAM into cache, just
like you can't read a single byte from the hard disk.  We all know that
you can
read( ?, ?, 1 ) but we also know that when the kernel gets the call it
reads a block, returns the character asked for and holds the rest in a
buffer.  That is what happens in hardware.  The chips have all the wires
coming into them and their controller chip watches them and upon the
proper signal, ALE, it will respond.  The Address Latch Enable (ALE)
means that the CPU is asking for the memory at this address and is
actually a wire on the cpu/mobo and the voltage hits 5V (in my day, now
~2.5) and that current hits the memory controller chip and causes it to
start its cycle (ANDed with the timer wire so it starts at the next
clock tick).  That memory controller interconnects RAM and L[123]_Cache
and they all have to deal with ram in chunks/pages/lines/etc and the
number of bytes in a chunk is always a function based on the number of
wires internally/externally (ie. 386sx = 32bit chip on 16bit bus).  Add
all this shit up, and basically what you have is memory that is byte
addressable on the software level but much more complex on the hardware
level.  

When the CPU asks for a byte of memory the controller will return a
line that contains the byte in question.  When the CPU asks for a value
that is multiple bytes it will be returned in one or more cache lines of
memory.  If the CPU can be assured that the data in question is aligned
the same way as the storage location then it can manipulate the wires
once to move the data.  If it is not so aligned then it must load the
data in 2+ chunks.  The plus is mooted by the
instruction set (it doesn't handle data types larger than the register
size to/from the SIMD core).  This data move can be done in a single
tick (to/from registers and L1 cache) so it is a waste of a cycle to
check for alignment because you could have moved the unaligned data by
then.  The two cycles are contingent upon the fix-ups that raster
mentioned above being efficient and I'm not sure exactly how the
hardware does it.  It could roll the address of the source wire to
achieve alignment or simply take two cycles and move the data in pieces.
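
The "one chunk versus 2+ chunks" point can be put into numbers.  A
minimal sketch, assuming a 64 byte cache line (an assumption; the real
line size depends on the CPU) and a hypothetical helper name
lines_touched():

```c
#include <stddef.h>
#include <stdint.h>

/* Number of cache lines touched by an access of `size` bytes starting at
 * address `addr`.  `line` is the cache line size in bytes; it is a
 * parameter because the real value depends on the processor. */
static size_t lines_touched(uintptr_t addr, size_t size, size_t line)
{
    if (size == 0)
        return 0;
    /* index of the last line touched minus index of the first, plus one */
    return (size_t)((addr + size - 1) / line - addr / line + 1);
}
```

With a 64 byte line, a 16 byte load at offset 48 stays inside one line
while the same load at offset 56 spans two.  A 16 byte load from a
16 byte aligned address can never span two lines, because 64 is a
multiple of 16.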

It is therefore appropriate for us to check once at the beginning of
the image.  The preferable solution is to guarantee alignment upon entry;
otherwise we are going to have to use unaligned memory moves or have two
pieces of code.  In order to achieve an alignment guarantee we need to
control how *image is created and ensure that it is a pointer that never
trips a test like if ( image % alignment ) then do unaligned_stuff.  This
is a function of the compiler and other things and can be accomplished
with the __attribute__ ((aligned)) attribute in C and the .align
directive in asm with the GNU tools.  I haven't investigated all of the
possibilities here so I know that the facilities exist but am not
entirely positive of the calling convention nor implementation.  Anyway,
all of the image_load type of functions (or the one image_create, or
whatever it's named, that is called by all others) need to be rewritten
to ensure alignment and we would still need to check for the possibility
of a user created image that gives an unaligned pointer to the pixels.
In order to avoid the SIGSEGV we would have to check for alignment and
maybe call a function with unaligned moves, re-align the data, or error
out in that case.
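
The "check once at the beginning of the image" idea could look roughly
like the sketch below.  None of these names come from imlib2:
is_aligned(), pick_shader(), shade_aligned() and shade_unaligned() are
hypothetical, and the two shade functions are empty stand-ins for the
MOVDQA and MOVDQU code paths.

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero when p sits on an `alignment`-byte boundary.
 * `alignment` must be a power of two. */
static int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

typedef void (*shade_fn)(uint32_t *pixels, size_t count);

/* Hypothetical stand-ins for the aligned and unaligned asm routines. */
static void shade_aligned(uint32_t *pixels, size_t count)
{
    (void)pixels; (void)count;
}

static void shade_unaligned(uint32_t *pixels, size_t count)
{
    (void)pixels; (void)count;
}

/* Decide once per image.  Every row starts on a 16 byte boundary only
 * if the base pointer is aligned AND the row stride is a multiple of 16. */
static shade_fn pick_shader(const uint32_t *pixels, size_t row_bytes)
{
    if (is_aligned(pixels, 16) && row_bytes % 16 == 0)
        return shade_aligned;
    return shade_unaligned;
}
```

The cost is one test per image instead of one per move, and a user
created image with an odd pointer simply lands on the slower path
instead of faulting.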

I'm not real familiar with the imlib2 code, and more importantly, how
it is used, so that is why I'm mentioning things like this.  For those
of you that know the internals, what do you propose?  The works-for-all
solution is to just use unaligned memory accesses.  The
faster-than-all-others solution is going to need fully aligned memory,
pre-fetched caches (already in there), and most of all predictability.

Comments, please...

Cheers,
The River Rat




Re: [E-devel] patch - imlib2 blend in AMD64

2005-08-22 Thread Tres Melton
On Mon, 2005-08-22 at 07:29 +, Tiago Victor Gehring wrote:
 Hi,
 regarding the problem I mentionted about the new amd64 optimized
 functions in imlib2, I think I found the problem, has something to do
 with the fact that memory was not aligned in some (SSE2 128 bit) MOV
 operations - ie, I just changed a couple of MOVDQA to MOVDQU in file
 amd64_blend.S, treating memory as unaligned; 
 Now if this has some other side effects (speed?) I don't know, but for
 me it worked now...
 
 Cheers,
 Tiago Gehring
 
This is a poor solution in terms of speed.  The correct solution is to
ensure that the memory is properly aligned.  For the time being it
should be left that way (I noticed that raster committed the move
unaligned data change).  I have spoken with vapier (briefly) about it
and am hoping to force the memory to be aligned on 128 bit boundaries.
This will impact the stack size of the code and a few other things that
I want to look into before offering a patch.  A couple of hints:

SSE data should be aligned on 16 byte (128 bit) boundaries.
MMX data should be aligned on  8 byte ( 64 bit) boundaries.

ASM:
  .align 16

C:
  Image * __attribute__ ((aligned (16))) image;
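
One caveat on the C hint: as far as I can tell, the attribute on a
pointer declaration affects the pointer variable, not the memory it
points at, so heap pixel data still needs an aligning allocator.  A
minimal sketch, assuming posix_memalign() is available;
alloc_pixels_aligned() and scratch_row are hypothetical names:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* expose posix_memalign() in strict modes */
#endif
#include <stdint.h>
#include <stdlib.h>

/* Static storage: here the GNU attribute aligns the array itself. */
static uint32_t scratch_row[1024] __attribute__ ((aligned (16)));

/* Heap storage: allocate `count` 32-bit pixels on a 16 byte boundary.
 * Returns NULL on failure; release the memory with free(). */
static uint32_t *alloc_pixels_aligned(size_t count)
{
    void *p = NULL;

    if (posix_memalign(&p, 16, count * sizeof(uint32_t)) != 0)
        return NULL;
    return p;
}
```

An image_create style function that allocates through something like
this gives the blend code its 16 byte guarantee without changing how
callers free the buffer.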

Regards,
RiverRat
-- 
Tres



___
enlightenment-devel mailing list
enlightenment-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/enlightenment-devel


[E-devel] Growing & Shrinking Eterms in E-16.7.2 & E-16.CVS

2005-08-12 Thread Tres Melton
MEJ,

I submitted a bug report to kwo yesterday about growing and shrinking
Eterms in E-16.8 (also E-16.7.2) but he peeked at it and thinks that it
is an Eterm issue.  I do too.  Anyway, I'm sorry to report that I'm
suffering from 'growing Eterm windows' again.  At the end of this email
is the command that I use to start the Eterm.  I can send its config if
you want.  I have Ctrl-Left and Ctrl-Right setup to move to the virtual
desktop to the left and right respectively.  When I have the Eterm open
on one desktop and the mouse in it and I switch desktops (via hot-keys)
to another one the Eterm seems to grow/shrink.  This is a borderless
Eterm with pop-up scroll bars and my desktop is 1152*864*24.  As always,
I'll test whatever you want but you might have to walk me through it.

The problem seems to only be present with borderless Eterms that have
pop-up scroll bars.  Changing one of those two values seems to fix
things.


Cheers,
RiverRat

---

CUT_CHARS=;:\ \'[]{}()

ETERM_OPTIONS= --buttonbar off --scrollbar-type motif --trans --itrans
--cmod 130 32 --border-width 4 --save-lines 4096 --scrollbar-popup
--borderless 

Eterm --geometry 188x65+1152+864 $ETERM_OPTIONS --cut-chars $CUT_CHARS






Re: [E-devel] CPUfreq and conservative governor

2005-07-26 Thread Tres Melton
On Tue, 2005-07-26 at 12:16 +0200, FORT Yannick wrote:
 While shifting from kernel 2.6.11 to 2.6.12, i notice that a new 
 governor for cpuscaling appears : conservative, that is less 
 configurable than ondemand but totally optimized for laptops.

I apologize for commenting without knowing much about this issue from
the enlightenment perspective, but does this have anything to do with
the CONFIG_HZ that appeared in Linux-2.6.13-rc3, I think?  If so then
the issue is going to be much more difficult to resolve than you think.
The kernel absolutely refuses to export its real Hz to user space.  For
more information please see the two bug reports below.

http://bugs.gentoo.org/show_bug.cgi?id=90090
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=151763

The problem is that Linus doesn't see a need to report this value to
user space.  To be fair I think it was at one time, long ago, but that
certain user space apps were broken.  The ones that used a hard coded
value of 100.  I have been bouncing email back and forth with four
kernel developers in preparation for taking the issue to the LKML and
would appreciate any more information that you have on the issue.  The
last email I have is from Robert Love and he explicitly asked me to
"Explain why gprof needs to know the _actual_ timing tick."  As you can
see this isn't the only place that is affected.  But it is only one of
three.  (procps and gprof being the other two).

Any further examples of things that need the value would be appreciated
as would any more detail on this issue that anyone can provide.

 The problem is the cpufreq doesn't known this governor, and nothing is 
 shown in the menu letting user choose its governor, and i've no 
 programing skills to create a patch for it, but i think this is easy for 
   nearly anyone reading this mailing list
 
 Wouldn't be great to display at least the governor name in this menu 
 when the governor is unknown, if someday kernel developpers add 2 
 governors in one version, this module will be hard to use :/
 
 
Best Regards,
-- 
Tres



---
SF.Net email is sponsored by: Discover Easy Linux Migration Strategies
from IBM. Find simple to follow Roadmaps, straightforward articles,
informative Webcasts and more! Get everything you need to get up to
speed, fast. http://ads.osdn.com/?ad_id=7477alloc_id=16492op=click


[E-devel] List configuration question.

2005-07-22 Thread Tres Melton
Is there any way to have the FROM: field changed to point to the list?
If you forget to change the TO: field when replying it ends up going to
the individual and not to the list.  I have been bitten by this a number
of times and have seen others cuss and scream as well.  The latest
example is Edward Presutti's [E-devel] Monitor Module Patch 2 and
2-r1 emails.  If not I'll continue to try to implant new usage into my
stubborn brain.  :)

-- 
Tres





Re: [E-devel] List configuration question.

2005-07-22 Thread Tres Melton
On Fri, 2005-07-22 at 14:48 +0100, Simon Poole wrote:
 The current behaviour is correct.  If you want to send messages to the 
 list, you should be using Reply to all.
 
 On Fri, 2005-07-22 at 03:27 -0700, David Sharp wrote: 
  
  just use your mailer's reply to all feature (wow, i had to remember
  to do it myself as i was typing this..). this will CC list as well as
  send to the original sender.
  

So I get two copies of everything?  Just seems like a waste. Okay
though.

Regards,
-- 
Tres





Re: [E-devel] Re: [e-users] Re: [e-users] Exposé for E17

2005-07-22 Thread Tres Melton
On Fri, 2005-07-22 at 19:32 +0200, Dènis Riedijk wrote:
 Well, I guess we could take a look at how e16.8 does it with the pagers...
 As far as I can remember, the pagers in e16 were constantly updated.
 
An issue with the pagers in E-16.8 came up not too long ago while playing
Doom3.  It seems that if the pager is set to make miniature snaps and
continuously scan the screen (and isn't shaded) then Doom became
unresponsive.  I discussed this issue with kwo and he adjusted the
pagers but I believe that he said that it was hard to determine what a
GL app is doing to the display.  Anyway, in the course of the discussion
he informed me that he had re-written the pager code between E-16.7.2
and E-16.8.CVS so you might have to look into both source trees.

Regards,
-- 
Tres





Re: [E-devel] List configuration question.

2005-07-22 Thread Tres Melton
On Fri, 2005-07-22 at 13:42 -0700, David Sharp wrote:
 I've never gotten two copies of anything. I always figured the list
 mailer is smart enough not to send to people already listed in the To:
 or CC: fields.

Well, it's not.  I have two copies of your email here so it must be your
mail reader.  Currently, I'm using Evolution so I'll dig through the
configs after I finish mucking around with my hardware tonight.

Cheers,
-- 
Tres





[E-devel] (no subject)

2005-07-20 Thread Tres Melton





   Project: Eterm: CVS
https://sourceforge.net/cvs/?group_id=212

On sourceforge.net


Under the section for Anonymous CVS Access it says web-based CVS
repository viewer.  I know it says it right after ..to see which
modules are available.. but your eye gets drawn to the highlighted
phrase and that link goes to the cvs repository for the www site:
http://cvs.sourceforge.net/viewcvs.py/eterm/


RiverRat on #Edevelop,
-- 
Tres





Re: [E-devel] Please ignore the parent, WWW site issues for Eterm/SourceForge

2005-07-20 Thread Tres Melton
Please ignore the parent email, I was collecting information for an
email and accidentally had the control down when I hit return.  Sorry.

Basically, go to http://sourceforge.net (login or not), search for
Eterm, click on the Eterm project, click on the CVS link at the top of
the page, and click on either the web based CVS viewer in Anonymous CVS
Access (link below) or the Browse CVS Repository on the right and you go
to a CVS viewer for the WWW site and not the Eterm project source (link
below).

The only way that I found the Eterm code was by going through the
Enlightenment project.
http://cvs.sourceforge.net/viewcvs.py/enlightenment/eterm/Eterm/src/

Shouldn't one be able to get to Eterm's CVS source tree on
SourceForge.net through the Eterm project page?  If I've overlooked
something then please tell me but I think this is wrong behavior.
(TM)  :-)

Regards,

On Wed, 2005-07-20 at 03:56 -0600, Tres Melton wrote:

Project: Eterm: CVS
 https://sourceforge.net/cvs/?group_id=212
 
 On sourceforge.net
 
 
 Under the section for Anonymous CVS Access it says web-based CVS
 repository viewer.  I know it says it right after ..to see which
 modules are available.. but your eye gets drawn to the highlighted
 phrase and that link goes to the cvs repository for the www site:
 http://cvs.sourceforge.net/viewcvs.py/eterm/

RiverRat on #Edevelop,
-- 
Tres





Re: [E-devel] Eterm accentuation

2005-07-17 Thread Tres Melton
On Sun, 2005-07-17 at 15:44 -0400, Michael Jennings wrote:
 On Sunday, 17 July 2005, at 19:28:30 (+0100),
 losted wrote:
 
 Hi, i recently compiled Eterm 0.9.3 under a NetBSD 2.0.2 system, i
  use PT encoding (ISO-8859-1) and accentueted characteres are often used
  mainly when i'm using text-mode editors, irc text-mode clients, and etc.
 What is happening, is that Eterm does not accept anyway special char
  like 'é' or whatever, the Font itself suports, i'm able to copy some
  text with accentueted characters from somewhere, and paste it under
  Eterm and it works, so what is really hapening? things work nice with
  other aplications and terminals like rxvt, xterm and etc.
 Can someone tell me what is hapening and if theres a way to correct it ?
 
 Known bug fixed in CVS.

Somebody in #gentoo was having problems with their CPU load shooting
over 90% in Eterm when using an alternate character set and posted the
bug to the E-dev mailing list.  He told me that someone told him that it
was a known problem and left it at that.  Is this the same bug?  If so,
that's great that it is now fixed.  

Have a good day,
-- 
Tres





Re: [E-devel] Re: Patches to Eterm

2005-07-17 Thread Tres Melton
On Tue, 2005-06-14 at 15:45 -0400, Michael Jennings wrote:

I don't know how I missed this one, sorry.  I just performed a fresh
checkout and everything seems fine.  I had tested it earlier with a cvs
up too though.  I did 'try' and make a patch for the ./configure message
that might be an improvement but I didn't really know what I was doing.
I just glued some stuff together and guessed.  I know there have been
issues with the autoFUCK stuff (as you so eloquently put it in the flame
fest) so just toss it if it's screwy.

 On Wednesday, 18 May 2005, at 01:10:13 (-0600),
 Tres Melton wrote:
 
  The second patch I actually recommend that you do not apply.  It
...
  if(!((rm^0x100)|(gm^0x100)|(bm^0x100)))

Is there already a check for all 3 modifiers == 256 (as above) in
another function or are you just going to let it shade to the same color
(lots of CPU cycles but does nothing)?
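
For anyone reading along, the quoted test works because x ^ 0x100 is
zero only when x is 256, and OR-ing the three results is zero only when
all three are.  A minimal sketch of wrapping it in a macro;
SHADE_IS_NOOP and shade_channel() are hypothetical names, and the
multiply-and-shift formula is my assumption about what a modifier of
256 means (100%), not code taken from Eterm:

```c
#include <stdint.h>

/* True only when all three colour modifiers equal 0x100 (256), i.e. the
 * shade would leave every pixel unchanged. */
#define SHADE_IS_NOOP(rm, gm, bm) \
    (!(((rm) ^ 0x100) | ((gm) ^ 0x100) | ((bm) ^ 0x100)))

/* Scale one 8-bit channel by m/256, saturating at 255.
 * Assumed formula, for illustration only. */
static uint8_t shade_channel(uint8_t c, unsigned m)
{
    unsigned v = (c * m) >> 8;

    return (uint8_t)(v > 255 ? 255 : v);
}
```

Checking SHADE_IS_NOOP(rm, gm, bm) before entering the pixel loop lets
the caller skip the whole image when the shade would be an identity.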

 Both patches have been applied.  I used a macro to maintain
 readability.

Nice and clean there.

  If you would like, I would be willing to maintain the shading
  routines.  I don't know cvs well enough to commit anything but you
  can bounce anything that gets submitted off of me and I'll double
  check them; I feel pretty comfortable in that code now.  I tried to
...
 As I do not have a P4 box or an x86_64 box, please test the latest CVS
 stuff to make sure nothing was merged incorrectly.  Also, if you'd
 like to maintain the shading stuff, please do. :)

Everything's golden here.  I read the devel list almost daily but if I
miss something you can send it to me directly (or tell someone else to)
and I'd be happy to deal with any of the shading code maintenance.

  If you tell me where Eterm reads the background from e and sets its
  initial geometry I'd also like to fix the problem of the pop-up
  scrollbar taking up room when it is not active at startup.  (I sent
  you pics of the problem awhile back.)
 
 The initial window creation and sizing is done in windows.c in
 Create_Windows()

Thank you.  I'll look into the scrollbar issue when I get a chance.

 On Monday, 06 June 2005, at 11:28:28 (-0400),
 John Ellson wrote:
...
  Tres indicated that adding -mpreferred-stack-boundary=16 might
  still be beneficial on x86_64, but that it might consume extra space
  at run time.  I'm not in a position to make that call, but if you
  agree its a good idea I can probably make the configure.in change.
 
 Has anyone done any testing on this in terms of additional RSS being
 needed?

RSS?  Resident Stack Size?  No.  How would I do that?  From what I've
read though, it is almost always desirable to have the cache lines and
the stack base aligned.  Do you know what the line size is on a Pentium
II/III/IV L1 cache?  On x86-64 it is an invalid op to push anything
besides a 64 bit value (32 bit systems can push 8/16/32 bit values) so
the most that is lost there is a single element (aligned to 128 bits).
Where I'm still not clear, and not for a lack of research, is on the
possibility of forcing stack alignment on one file's functions and not
the entire program's.  The -mpreferred-stack-boundary=?? wasn't added to
CFLAGS so right now we should be in the same place we were.  I'll look into
this when I finish the stuff in the next paragraph.
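
On the "how would I do that?" question: RSS usually stands for resident
set size rather than stack size.  Assuming that is what was meant, one
way to read the peak value from inside the program is getrusage().  A
minimal sketch; peak_rss() is a hypothetical helper, and whether
ru_maxrss is filled in, and in which unit, depends on the kernel
(kilobytes on current Linux):

```c
#include <sys/resource.h>

/* Peak resident set size of the calling process, or -1 on failure.
 * Reported in kilobytes on current Linux kernels; other systems may
 * use a different unit or leave the field at zero. */
static long peak_rss(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_maxrss;
}
```

Calling it once after the shading runs, for builds with and without the
stack boundary flag, would give two numbers to compare.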

I've started porting the code to SSE & SSE2 for x86 and adding some
profiling code with some interesting results but I'm not in any hurry
here.  Pre-populating the cache helped a lot and I'm playing with
non-temporal writes now.  :)  If I do submit something it will be cleaned
up and hopefully a single file for each arch (x86 supporting the best
option of MMX, SSE, & SSE2) and will have profiling numbers to justify
the patches.  Possibly a single file for both archs.  This is the first
code I ever did for SIMD and it was kinda fun.  :-}

If you reply could you please start another thread as I forget to go
more than a week back in time all too often.

 Thanks,
 Michael

Thank you,
-- 
Tres
--- configure.in	2005-06-14 13:39:00.0 -0600
+++ configure.in.mine	2005-07-17 21:22:24.0 -0600
@@ -518,13 +518,19 @@
   i*86)
   grep mmx /proc/cpuinfo >/dev/null 2>&1 && HAVE_MMX=yes
   ;;
+  x86_64)
+  HAVE_MMX_MESG="MMX unavailable on x86-64"
+  ;;
+  * )
+  HAVE_MMX_MESG="MMX not detected"
+  ;;
   esac
   ])
 if test x$HAVE_MMX = xyes; then
 AC_MSG_RESULT([yes (32-bit)])
 AC_DEFINE(HAVE_MMX, , [Define for 32-bit MMX support.])
 else
-AC_MSG_RESULT([no (no MMX detected)])
+AC_MSG_RESULT([no ($HAVE_MMX_MESG)])
 fi
 AM_CONDITIONAL(HAVE_MMX, test x$HAVE_MMX = xyes)
 


Re: [E-devel] e-16.8 cvs bug: slightly broken focus behavior

2005-07-05 Thread Tres Melton
On Tue, 2005-07-05 at 19:05 -0400, Mike Frysinger wrote:
 i prefer to use the 'focus follows mouse click' behavior but ive noticed a 
 quirk in using mouse bindings with it ...
 
 for example, say i have two windows open, Eterm and Gimp ... Eterm currently 
 has the focus (it gets key strokes, uses the window border has the 'active' 
 color, and is on top) ... if i alt+left click the Gimp window to move it 
 around, it is raised to the top most level, but it is not set as active 
 (Eterm still receives my key strokes and the window border has the 'active' 
 color) ... as soon as i just left click Gimp (no keyboard modifiers), it is 
 properly set as the active window
 -mike

I believe kwo is still out till the end of the week but he may be
checking his email.  What are you saying you would like the behavior to
be?  An Alt+Left-click:

1)  raises and focuses the window for movement.

2)  moves the window without bringing it to the top.

Personally I prefer focus follows mouse (sloppily) but I really like the
ability to have a window focused without bringing it to the top.  If you
were to choose option 1 then that would be impossible (except maybe w/
Alt-Tab selection??).  If you prefer option 2 then I'll look into xlib
and see what that entails.  Either way I'll probably wait for kwo to
weigh in on the way it is supposed to work.  We might need to add an
additional option to dis/enable that behavior.

What if you push Alt, press the left button, and release the Alt.
Should the window still be movable until you release the left button or
should the movement be aborted?  In light of the last issue you posted
and the way that kwo patched it I would have to guess that movement
stops with either Alt-key-release OR Left-button-up.  Would you agree?
This all relates to the grab-keyboard/mouse stuff in the main event
loop.

-- 
Tres





Re: [E-devel] Patch for e-16.8 cvs alt+tab bug

2005-06-27 Thread Tres Melton
On Sun, 2005-06-26 at 04:01 -0400, Mike Frysinger wrote:
 finally found a way to reproduce this :)
 
 synced up earlier today, so my running version should be pretty up-to-date ...
 
 if you hold alt+tab and then right click a window, the alt+tab list gets 
 frozen in the middle of the screen ... only way to make it go away is to 
 restart e :/
 -mike

Patch attached.

-- 
Tres
Index: menus.c
===
RCS file: /cvsroot/enlightenment/e16/e/src/menus.c,v
retrieving revision 1.200
diff -u -r1.200 menus.c
--- menus.c	9 Jun 2005 18:28:13 -	1.200
+++ menus.c	27 Jun 2005 07:12:48 -
@@ -2197,6 +2198,7 @@
 	  }
 	else if (!strcmp(prm, named))
 	  {
+ WarpFocusHide();
 	 SoundPlay(SOUND_MENU_SHOW);
 	 MenusShowNamed(p);
 	  }
Index: warp.c
===
RCS file: /cvsroot/enlightenment/e16/e/src/warp.c,v
retrieving revision 1.65
diff -u -r1.65 warp.c
--- warp.c	25 May 2005 21:42:59 -	1.65
+++ warp.c	27 Jun 2005 07:12:49 -
@@ -210,7 +210,7 @@
EFlush();
 }
 
-static void
+void
 WarpFocusHide(void)
 {
int i;


Re: [E-devel] Re: Patches to Eterm

2005-06-21 Thread Tres Melton
Stephen Horner mailed me privately (and asked me to forward to list),

---8

I think I just did what you did, and accidentally emailed you instead of the
list. Damnit! Anyways I looked and just found out that my $MAILDIR/sent is
b0rked, so if you could bounce the email I sent concerning the altivec
optimizations (if you did get it) to the list, I'd be a happy hacker indeed :)

Stephen

ps - do you know any powerpc assembly? 

---8

On Mon, 2005-06-20 at 23:09 -0700, Stephen Horner wrote:
  Damn.  I sent this privately to Mike because I forgot to change the
  address.  I don't know why some people's email leaves out the list but
  sorry Mike.  For the list:

 This would be a good thing for me as well, being that I currently only have a
 PIII Coppermine machine. If you need a box to test the code on, and haven't yet
 found someone to assist you with your code, feel free to give me a holler.
 
 On another note, I'm curious if there is anyone here on the list that has a
 powerpc32 or 64 bit machine and uses Eterm? I'm curious if anyone here uses such
 glorious (^_^) hardware, and could use the altivec version of these
 optimizations. If not, I'll direct my powerpc studies on another project.
 
 Thanks
 Stephen 
 
Interesting that you mention the PPC as a new comment in my code is:

/*  The challenge is now on for the PowerPC gurus to adapt this code to use
 *  the PowerPC's Altivec SIMD engine to achieve the same performance 
 *  increases.  PA-RISC has the MAX extensions, the Alpha has MVI, the SGI
 *  has MDMX, and the UltraSPARC has VIS.  Good luck guys!  :-)
 *
 *  P.S.  Sorry MEJ.  :-/
 */


I've been tinkering with this for a little while and ported the x86/MMX
engine to run w/ SSE2 (actually all it does is use all 128 bits of the
registers so it was pretty easy) but there are a couple of instructions
that I used that deal with the upper 64 bits that aren't available in
the SSE1 instruction set.  I'm still looking into that.

This isn't a high priority for me because Eterm's maintainer, Michael
Jennings (Mej), wasn't too excited about yet another code base to
maintain.  Nor has he chimed in on any of these discussions.  The main
reason that I did the original port was that we x86-64 users didn't have
access to any enhanced asm routines so it made a big difference for me.
But people with x86/SSE can still use the original MMX stuff written by
Willem Jan-Monsuwe and get a good deal of speed up.  On the other hand
this type of stuff is interesting so I'm going to do some stuff if for
no other reason than to learn.  However, before I submit a change set to
Mej I want to have a thorough patch that handles all the cases that are
going to be, or could be, handled:

Arch      Inst-set   Status
------------------------------------------------------
All       C          Done by Mej.  The profiling base.
x86       MMX        Done by Willem
x86-64    SSE2       Done by me
x86       SSE2       Done but not submitted
x86       SSE        In progress
x86       3DNow      ???
PPC       Altivec    Offered by Stephen Horner???
Alpha     MVI        ???
SGI       MDMX       ???
U-Sparc   VIS        ???
PA-RISC   MAX        ???

And most importantly I want to justify the changes with some custom
built profiling code and time the SIMD stuff against the C code.  Using
the C stuff for a base should enable meaningful benchmarks across
different speed processors, after all I don't want to profile the speed
of the processors but the speed _increases_ within the processor
families by using their Single_Instruction_Multiple_Data ops.  Provided
I can get some good numbers and someone can help me come up with a good
way to modify the auto-tools nightmare into a coherent and easily
maintainable processor interrogation system then we can collectively
submit the change set to Mr. Jennings.  The auto-tools stuff is a
weakness for me as John Ellson had to be talked into doing it for the
x86-64 SSE2 stuff.  And hopefully we won't irritate Mr. Jennings too much
by redoing a newly committed patch.  ;-)
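
The "C code as the base" comparison could be reported as a percent
reduction per routine.  A minimal, portable sketch: it uses the standard
clock() call rather than cycle counters, and best_ticks(),
percent_reduction() and the toy shade_c() baseline are all hypothetical
names, not the real test program:

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef void (*shade_fn)(uint32_t *pixels, size_t count);

/* Toy C baseline: halve every 8-bit channel of each 32-bit pixel. */
static void shade_c(uint32_t *pixels, size_t count)
{
    for (size_t i = 0; i < count; i++)
        pixels[i] = (pixels[i] >> 1) & 0x7f7f7f7fu;
}

/* Best (smallest) CPU time over `runs` calls.  Taking the minimum
 * filters out runs that were preempted part way through. */
static clock_t best_ticks(shade_fn fn, uint32_t *pixels, size_t count,
                          int runs)
{
    clock_t best = 0;

    for (int i = 0; i < runs; i++) {
        clock_t start = clock();
        fn(pixels, count);
        clock_t spent = clock() - start;
        if (i == 0 || spent < best)
            best = spent;
    }
    return best;
}

/* Percent reduction of `candidate` relative to `baseline`; negative
 * when the candidate was slower than the C code. */
static double percent_reduction(double baseline, double candidate)
{
    return 100.0 * (baseline - candidate) / baseline;
}
```

Because each result is a ratio against the C routine on the same
machine, the numbers stay comparable between processors of different
speeds.  clock() is coarse, so the image has to be large enough for the
baseline to register more than zero ticks.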

I have a fairly comprehensive test program that has some timing stuff
in it (needs improvement) that I intend to hand out to the people that
have volunteered to test so that I can ask them to benchmark the various
things.  It also verifies correctness of the code.  I also have a patch
to gdb that enables inspection of the xmm registers as hexadecimal
rather than floats.  And I have found that a few of the instructions in
the original code can be replaced with others that do the same thing but
with different calls.  Those need to be benchmarked to see if they
helped or hurt.  The biggest place that I have squeezed more performance
from is by pre-populating the cache, adding memory fences, and using
non-temporal memory reads and writes.  Again, these different code sets
are going to need benchmarking on different processor cores to see which
ones are 

Re: [E-devel] Intermittent focus issue

2005-06-17 Thread Tres Melton
On Fri, 2005-06-17 at 00:29 -0700, eric richardson wrote:

Another user whose email doesn't contain the list's address.

 * Tres Melton ([EMAIL PROTECTED]) wrote:
 
  Which version of e16?  e16.8 or e16.7.2?  I use lots of mozilla windows
  w/ lots of tabs and have not seen this issue (in either e16).
 
 [eWorld ([EMAIL PROTECTED])-([Fri June 17 12:17am])]
 ~: /usr/enlightenment/bin/enlightenment --version
 Enlightenment Version: 0.16.7.2
 Last updated on: $Date: 2004/11/13 11:21:53 $
 
 I'd notice it lots of times where firefox would open a new window and I
 couldn't get back to being able to type in the original one until the
 new window was closed.  Same thing in 0.17.  I really do tend to blame
 something in mozilla itself, though.

I use e-16.7.2 mostly and am testing e-16.8, and I have never noticed
this with Mozilla (someone else blamed the common code base that Mozilla
& Firefox share).  A friend was over last week and commented that I had
twelve Mozilla windows open with over 100 different tabs in them, so I
would say that it is Firefox and not Mozilla.  FYI, there are some very
significant changes between e-16.7.2 & e-16.8, but e-16.8 has some
really cool new features, namely composite.

 --
 eric richardson -- http://ericrichardson.com -- http://blogdowntown.com
-- 
Tres



---
SF.Net email is sponsored by: Discover Easy Linux Migration Strategies
from IBM. Find simple to follow Roadmaps, straightforward articles,
informative Webcasts and more! Get everything you need to get up to
speed, fast. http://ads.osdn.com/?ad_id=7477alloc_id=16492op=click
___
enlightenment-devel mailing list
enlightenment-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/enlightenment-devel


Re: [E-devel] Re: Patches to Eterm

2005-06-14 Thread Tres Melton
On Tue, 2005-06-14 at 22:50 -0500, Edward Presutti wrote:
 Is it possible to enable Eterm's SSE2 support on P4 class processors? I've
 got a Northwood class P4 3.2 w/ SSE2, but the new makefile looks for
 X86_64 architecture to determine whether or not to enable SSE2. Is this
 patch specific to the X86_64 architecture or will it work on P4 class
 processors?
 
 Thanks,
 Ed

As the author of those patches I can tell you that the problem is in
two places.  The first is the MMX/SSE instructions and those will work
on whichever processor supports them.  The second set of issues are
things like loop control, pointers, and function parameters.

The original author of the MMX stuff, Willem Jan Monsuwe, wrote the
entire thing in assembly specifically for x86 which annoyed me, a proud
owner of an AMD64, enough to port the routines to x86-64.  In the
process of porting I decided to use the extra 64 bits in the xmm
registers that are present in SSE.  That was actually the easy part.
The harder part was porting the surrounding control to x86-64 so the
short answer to your question is an unfortunate no.  I think that some
of the newer P4s are actually 64 bit, or EM64T, and those processors
should work as I took care not to use functions or registers unique to
AMD64 but the control structures are definitely designed for a 64 bit
processor.  

If you are interested in getting the SSE stuff to work with a 32 bit
processor with SSE2 I'd actually suggest that you start with the MMX
routines, mmx_cmod.S, and add the few changes that were made to the SIMD
instructions.  They are relatively few and revolve around reading twice
the data in, writing twice the data out, and adjusting the counters to
reflect that they are now handling twice the pixels per pass that the
original MMX routines did.  There are also a few places that have to
duplicate the color modifiers in the lower 64 bits into the upper 64
bits too.  The sse_cmod.c file is actually inline assembly instead of
pure assembly but it should be easy enough to compare the two, extract
the stuff relevant to SSE, and add it to the mmx_cmod.S file.  My
reasons for moving to inline assembly are detailed in the comments at
the top of sse_cmod.c.  I'd also suggest that you ask Mr. Jennings, Eterm's
author/maintainer, about his feelings since I know he wasn't overly
enthusiastic about adding another code base to maintain to his
considerable workload.  When I started I tried to make the port using a
series of #ifdef's but that quickly became overwhelming as the resulting
code had more preprocessor code than C code.  It should be easy enough
to understand what is going on in my code as it is excessively
commented.  
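
A plain-C sketch of the counter adjustment described above, assuming a hypothetical per-pixel function shade_one that stands in for the real shading. The main loop consumes a fixed number of pixels per pass and whatever is left over is handled one pixel at a time. In real SSE2 code the inner block would be one set of xmm loads, multiplies and stores rather than a loop; this only illustrates the loop control.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the real per-pixel shading: halves the value. */
static uint16_t shade_one(uint16_t px)
{
    return (uint16_t)(px >> 1);
}

/* Shade `count` pixels, `per_pass` at a time, then finish one by one.
 * Returns the number of full passes taken by the main loop. */
static size_t shade_line(uint16_t *px, size_t count, size_t per_pass)
{
    assert(per_pass > 0);
    size_t passes = 0;
    size_t i = 0;

    /* Main loop: with SIMD this whole block is one load/modify/store. */
    for (; count - i >= per_pass; i += per_pass) {
        for (size_t k = 0; k < per_pass; k++)
            px[i + k] = shade_one(px[i + k]);
        passes++;
    }

    /* Tail: fewer than per_pass pixels remain. */
    for (; i < count; i++)
        px[i] = shade_one(px[i]);

    return passes;
}
```

Going from 4 to 8 pixels per pass only changes per_pass here; the tail loop is unchanged, which is why the counter changes in the port are small.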

Good Luck and I hope that helps,
The RiverRat

-- 
Tres





Re: [E-devel] Re: libast from Eterm CVS fails to build with gcc4 on x86_64

2005-06-10 Thread Tres Melton
On Wed, 2005-06-08 at 19:20 -0700, Stephen Horner wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 On 18:00, Mon 06 Jun 05, Tres Melton wrote:
  Well, the pages I was reading used the nop trick but it looks like a
  better solution has been presented.  I almost forgot that integers are
  still 32 bit on x86-64, which explains the movl instead of movq.  For
  those interested the gcc -S is (the #APP/#NO_APP is gcc's way of
  marking inline asm):
  
  - 8 --
  #APP
  startit:
  #NO_APP
  movl    $10, -20(%rbp)
  #APP
  stopit:
  #NO_APP
  - 8 --
 
 Forgive me if this seems obvious to most, but i'm unclear on exactly what you
 are saying here about #APP and #NO_APP in regards to labelling assembly code.
 I'm very new to assembly, and this sounded too interesting for me to just say
 to myself "meh i'll prolly learn it later" . . . ^_^ Also I was curious why,
 if you simply intend to label a code block before assembly, you don't just
 asm( "pants on" ); if asm() allows for such a thing ( couldn't find
 "man 2 asm" lol ). At any rate, thanks in advance.
 

First, any asm( [statements] : [outputs] : [inputs] : [clobbers] );
block that gcc encounters will be placed between #APP and #NO_APP in the
assembly output.  Second, Mike's startit: and stopit: are labels (jump
destinations); as such they must be one word (no spaces, dashes, etc.),
and they consume no space in the final code (only within the
intermediate files).
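
To make the four-part form concrete, here is a small sketch using GCC's extended asm with an empty template, so that it does not depend on any particular instruction set. The function names are illustrative only, and the __asm__ spelling is used so that it also compiles in strict ISO mode.

```c
#include <assert.h>

/* asm( template : outputs : inputs : clobbers );
 * The template is empty, so no instruction is emitted.  The constraints
 * still tell gcc that `v` is read and written in a register ("+r") and
 * that memory may have changed ("memory"). */
static int through_asm(int v)
{
    __asm__ volatile ("" : "+r" (v) : /* no inputs */ : "memory");
    return v;
}

/* Tiny self-check: the value must come back unchanged. */
static void check_through_asm(void)
{
    assert(through_asm(10) == 10);
}
```

A label such as startit: would go in the template string, which is why it has to be a single word that the assembler accepts.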

HTH,
-- 
Tres





Re: [E-devel] Eterm SSE2 patch for x86_64

2005-06-06 Thread Tres Melton
On Mon, 2005-06-06 at 11:28 -0400, John Ellson wrote:

 All credit for the sse2_cmod.c code goes to Tres.   I just did the easy 
 bits.

Thanks, but the real credit goes to Willem Monsuwe [EMAIL PROTECTED] for
writing the original MMX code.  All I did was expand it to use all 128
bits of the xmm registers via SSE2 and make it inline so that it can
handle whatever optimizations are thrown at gcc.  It should be twice as
fast as the original MMX since it processes twice as many pixels at
once.  

Thanks for the work John,
-- 
Tres





Re: [E-devel] libast from Eterm CVS fails to build with gcc4 on x86_64

2005-06-02 Thread Tres Melton
Has anyone actually looked into why this is failing?

By the rules of Discrete Math/Boolean Algebra:

something xor something = 0
something_else xor 0 = something_else

so

valA ^ ( valA ^ valB ) = valB

If that is not working then you may have found a compiler bug (probably
in the register allocator).  The appropriate output code should resemble
this (AT&T syntax: op src, dest):

mov var_a, %r1
mov var_b, %r2
mov %r1, %r3
xor %r2, %r3
xor %r3, %r1
xor %r3, %r2
mov %r1, var_a
mov %r2, var_b

or optimized:

mov var_a, %r1
xor var_b, %r1
xor %r1, var_a
xor %r1, var_b

or load the values, manipulate the values, store the values.  If
something happens in the middle of those steps, like the registers are
needed elsewhere (and not saved), then it will fail.  The above code has
two memory fetches and two memory writes.  Is it really faster than:

mov var_a, %r1
mov var_b, %r2
mov %r2, var_a
mov %r1, var_b

without the type casting:

#define BINSWAP(a, b)  ((a) ^= (b) ^= (a) ^= (b))

seems to rely on many compiler optimizations that aren't clearly
documented and may be defined differently for different architectures,
not to mention how different compilers will choose to deal with it.
Hence the failure on x86-64.  For those curious compiling:

#define BINSWAP(a, b) \
   (((long) (a)) ^= ((long) (b)) ^= ((long) (a)) ^= ((long) (b)))

int main( void )
{
  long a = 3;
  long b = 8;

  asm( "noop;noop;noop" );
  BINSWAP(a,b);
  asm( "noop;noop;noop" );

}

yields:

noop;noop;noop
movq    -16(%rbp), %rdx
leaq    -8(%rbp), %rax
xorq    %rdx, (%rax)
movq    -8(%rbp), %rdx
leaq    -16(%rbp), %rax
xorq    %rdx, (%rax)
movq    -16(%rbp), %rdx
leaq    -8(%rbp), %rax
xorq    %rdx, (%rax)
noop;noop;noop

If you enable -O[123] then you will need to use the values a & b before
and after the BINSWAP call or they will be optimized away.  And using
immediate values like I did will cause the compiler to simply set the
different registers that are used to access them in reverse order.  In
other words the swap gets optimized out.  The above code is without -O
and is clearly more complicated (by more than double) than it needs to
be.
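
For comparison, a sketch of the same XOR identity written as three separate statements.  This avoids the cast on the left-hand side of an assignment (a GCC extension that gcc 4 no longer accepts) and the repeated modification of `a` within a single expression, which C leaves undefined.  xor_swap is an illustrative name, not the fix that went into libast.

```c
#include <assert.h>

/* Swap two longs with XOR, one assignment per statement.
 * Since a ^ (a ^ b) == b, the values trade places after three steps. */
static void xor_swap(long *a, long *b)
{
    assert(a != 0 && b != 0);
    if (a == b)
        return;      /* x ^ x is 0, so swapping an object with itself would clear it */
    *a ^= *b;        /* a = a0 ^ b0 */
    *b ^= *a;        /* b = b0 ^ (a0 ^ b0) = a0 */
    *a ^= *b;        /* a = (a0 ^ b0) ^ a0 = b0 */
}
```

Whether this beats a plain swap through a temporary is exactly the kind of thing that needs to be measured rather than assumed.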

Just my $0.02,
-- 
Tres





Re: [E-devel] libast from Eterm CVS fails to build with gcc4 on x86_64

2005-06-02 Thread Tres Melton
On Thu, 2005-06-02 at 21:04 -0400, John Ellson wrote:
 Its not the xor thats failing.   Its the cast of the LHS of the 
 assignments.

I understand now.  Sorry for the confusion.

-- 
Tres





Re: [E-devel] 2 report bug in E16.8

2005-05-20 Thread Tres Melton
On Fri, 2005-05-20 at 21:57 +0200, Kim Woelders wrote:
 Tres Melton wrote:
  I used to have a problem with borderless Eterm's shrinking when
  changing back and forth in virtual/multiple desktops.  Those seem to
  have fixed themselves.  But, I still suffer from windows that grow
  while switching desktops.  I don't know enough about the way that
  Eterm and Enlightenment interact to have a clue where to start
  looking for the problem though.
  
 What? That should definitely not happen. If it does, it should be e16's
 fault exclusively. I use a *lot* of Eterm's and I don't ever see this
 happen. If you can describe a test scenario I'll take a look at it.

Before I get you too excited let me confirm that it happens with the
latest cvs's of Eterm/Enlightenment and then figure out exactly when it
happens and what to do to replicate it.  I, too, use many Eterms and am
very anal about them being in certain places and certain sizes and get
aggravated when they move or change.  8-)

  Let me know if I can test anything for you.
  
 Bug reports are most welcome, however, I consider e16 < 0.16.8 (CVS)
 history :)

I have a couple of issues to sort out on my machine to get it up to
e-16. (a live cvs build for Gentoo), mostly due to unrelated things.

 It looks like you have dug yourself quite deeply into Eterm, and I think
 there is a problem that might be related to (not caused by!) the stuff
 you are doing.

Actually, I don't know that much about Eterm.  I know graphics so I
ported the shading routines to SSE2 on AMD64.

 The problem happens with Eterm's with pure transparency, i.e. -O only.

It has been my casual observation that it happens with borderless Eterms
as I have never observed it with a bordered Eterm.  It also seems to be
related to changing desktops w/o a focused window to/from desktops with
a focused Eterm, which depends on where the mouse is.

 After an e16 restart, Eterm will often complain about the background
 pixmap being invalid. I think this is a bug in Eterm, and if you are
 interested I'll come up with the gory details of what I think happens,
 and what I think could be done about it.

All of my Eterms are transparent and I never restart enlightenment.  It
used to have a bad habit of moving all of my windows to the upper left
virtual desktop (fixed now) so I got used to avoiding restarts.  I am
interested, but, as I said above, I'd like to confirm that the problem
exists with the latest version of e 16 before you spend any of your
time.  I can say that I rarely change desktops by sliding my mouse off
the edge or clicking on the pager; I use hot keys instead.

 /Kim
-- 
Tres





[E-devel] Patches to Eterm

2005-05-18 Thread Tres Melton
Michael,

15bpp saturation shading:  mostly readability changes like adding a shift
of 0.  (The compiler optimizes them out, I checked.)  Also an attempt to
change the spacing/indentation to be more like your style, and a fix for
an error that I made by leaving one too many bits set in green. (oops! :-)

16bpp saturation shading:  same as above w/o an error to correct.  Move
the int declarations back into the inner loop after verifying that the
compiler moves them out on its own.

32bpp unsaturated shading:  remove the unneeded type casting (unsigned
char).

24bpp unsaturated shading:  remove the unneeded temp variables r, g, and
b.  In reality all of the r, g, and b variables that are still needed
could be replaced with a single temp since I patched everything to
process the colors one at a time. (For efficiency in memory access two
should be used for 15/16bpp to avoid indexing into the array three
times; b is currently used for that as it is the last color processed.)

The second patch I actually recommend that you do not apply.  It does
greatly speed up the check but costs readability.  It is only executed
once per window to shade so the speedup may not be worth it but I'll
leave the decision up to you.  Basically, checking that each color
modifier is less than 256 is the same as checking that no bits from bit 8
(counting from 0) upward are set.  OR them all together, shift off the
irrelevant bits and see what's left.  You could also check to see if all
three color modifiers are equal to 256 and bail out early if you wanted
to:  if(!((rm^0x100)|(gm^0x100)|(bm^0x100)))
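
The two bit tricks above can be sketched as small helper functions.  The function names are hypothetical and the modifiers are assumed to be non-negative ints; only the expressions themselves come from the description.

```c
#include <assert.h>

/* Nonzero if any modifier is 256 or more, i.e. has bit 8 or higher set.
 * One OR and one shift replace three separate comparisons. */
static int any_modifier_ge_256(int rm, int gm, int bm)
{
    assert(rm >= 0 && gm >= 0 && bm >= 0);
    return ((rm | gm | bm) >> 8) != 0;
}

/* Nonzero if all three modifiers are exactly 256 (0x100).  Each XOR is
 * zero only for an exact match, so the OR is zero only if all match. */
static int all_modifiers_are_256(int rm, int gm, int bm)
{
    return !((rm ^ 0x100) | (gm ^ 0x100) | (bm ^ 0x100));
}
```

Since a modifier of 256 multiplies by 256 and is then shifted right by 8, the second test identifies the case where shading would leave every pixel unchanged.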

When I say that I checked how the compiler does something I mean that I
wrote small little programs and compiled them and then looked at the
assembly output.  It would be too tedious to look at the assembly output
of a large file like pixmap.c.

If you would like, I would be willing to maintain the shading routines.
I don't know cvs well enough to commit anything but you can bounce
anything that gets submitted off of me and I'll double check them; I
feel pretty comfortable in that code now.  I tried to adjust the
auto-magic stuff for sse2 but I don't know it well enough yet.  I
manually linked Eterm to use the sse routines on my machine and they
work here.  Please feel free to remove all my notes from the top of the
sse2_cmod.c file and any other crap you want to before merging it.  Let
me know if you want me to test the merged code.

Please double check the 15bpp C saturation shading routine just in case
I had another synapse misfire.

If you tell me where Eterm reads the background from e and sets its
initial geometry I'd also like to fix the problem of the pop-up
scrollbar taking up room when it is not active at startup.  (I sent you
pics of the problem awhile back.)

When it comes to the terminal side, I've always considered that a black
art so there isn't much I could do there even if I wanted to.  Actually
I think it works pretty damn great and doesn't need anything else.  If
there is any other graphics stuff or even assembly that you want help
with let me know -- other than that I'm going to start spending my free
time trying to code on E17 by adding SSE2 optimizations to evas.

Thanks for all your help and I hope I wasn't too much of a pain in the
ass.  I think I finished everything that I set out to do.

Regards,
-- 
Tres
Index: eterm/Eterm/src/pixmap.c
===
RCS file: /cvsroot/enlightenment/eterm/Eterm/src/pixmap.c,v
retrieving revision 1.112
diff -u -r1.112 pixmap.c
--- eterm/Eterm/src/pixmap.c	10 May 2005 18:59:50 -	1.112
+++ eterm/Eterm/src/pixmap.c	18 May 2005 06:03:40 -
@@ -1579,12 +1579,12 @@
 int r, g, b;
 
 b = ((DATA16 *) ptr)[x];
-r = ( (b >> 10 )* rm ) >> 8;
-r = ( r > 0x001f ) ? 0xfc00 : ( r << 10 );
-g = (((b >>  5 ) & 0x003f ) * gm ) >> 8;
-g = ( g > 0x001f ) ? 0x03e0 : ( g << 5 );
-b = (( b & 0x001f ) * bm ) >> 8;
-b = ( b > 0x001f ) ? 0x001f : b;
+r = (((b >> 10) & 0x001f ) * rm) >> 8;
+r = (r > 0x001f) ? 0x7c00 : (r << 10);
+g = (((b >>  5) & 0x001f ) * gm) >> 8;
+g = (g > 0x001f) ? 0x03e0 : (g << 5);
+b = (((b >>  0) & 0x001f ) * bm) >> 8;
+b = (b > 0x001f) ? 0x001f : (b << 0);
 ((DATA16 *) ptr)[x] = (r|g|b);
 }
 ptr += bpl;
@@ -1618,15 +1618,16 @@
 }
 } else {
 for (y = h; --y >= 0;) {
-int r, g, b;
 for (x = -w; x < 0; x++) {
+int r, g, b;
+
 b = ((DATA16 *) ptr)[x];
-r = ( (b >> 11 )* rm ) >> 8;
-		r = ( r > 0x001f ) ? 0xf800 : ( r << 11 );
-g = (((b >>  5 ) & 0x003f ) * gm ) >> 8;
-		g = ( g > 0x003f ) ? 0x07e0 : ( g << 5 );
-b = (( b & 0x001f ) * bm ) >> 8;
-		b = ( b > 0x001f ) ? 0x001f : 

Re: [E-devel] Eterm-0.9.3 + deadkeys not working right ?

2005-05-09 Thread Tres Melton
Mike / Tobias,

I don't use deadkeys myself, being American and only speaking
programming languages, but after a few minutes of Googling I would guess
that it started here:

Commit by mej  ::  eterm/Eterm/ (ChangeLog configure.in): 
Mon Apr 18 16:00:22 2005 Michael Jennings (mej)
Remove unused NO_XLOCALE crap and do it right.

and in configure.in:

# check if we need X_LOCALE definition
AC_CHECK_LIB(X11, _Xsetlocale, AC_DEFINE(X_LOCALE, , [X locale.]),
AC_DEFINE(NO_XLOCALE, , [No X locale.]))

# For multibyte selection handling
#if test $MULTICHAR_ENCODING != none; then
  AC_CHECK_LIB(Xmu, XmuInternAtom)
#fi

and from the XFree86-4.3.0 Changelog:
http://www.hupo.org.cn/docs/linuxdoc/XFree86-4.3.0/CHANGELOG

952. A more complete set of dead accent/space compose sequences, add
  Multi_key slash for letters with a stroke, and add some
  combos for exponent characters, katakana voiced sounds, etc to
  the en_US.UTF-8 compose file (#5646, David Monniaux).

The bottom line is that an X_LOCALE needs to be specified so that things
know what to do with deadkey sequences.  The same sequence can generate
different things in different locales.  I would ask what X_LOCALE is
set to and try changing it.  You (Mike) know more about this automagical
crap than I do, but I'd bet money (a little) that the problem is related
to MeJ's change on 18 April 2005.

Hope that helps.

BTW, do you know how to get gdb to display the contents of a xmm
register on AMD64 in hex?  info all-registers and variants only show
the values as floats.  (The registers are shared between the SSE and x87
stuff but I'm not using x87).  And I tried gdb ver. 6.0-r1 in Gentoo,
6.3 from gnu.org and the latest cvs.


-- 
Tres





Re: [E-devel] GPL on eclair?

2005-05-07 Thread Tres Melton
On Sat, 2005-05-07 at 04:17 -0400, Mike Frysinger wrote:
 On Saturday 07 May 2005 03:53 am, Mike Frysinger wrote:
  i just see it as an annoying issue where you suddenly cant assume that you
  can copy & paste code from any old random e17 app into any other random e17
  app
 
 after a quick chat with Simon, why not have a general rule with e17:
 - libraries are BSD
 - apps can be GPL/whatever, but you should add an exception clause that says 
 that e17 devs can import code from your GPL/whatever app into their app 
 without having to worry about stupid licensing conflicts
 
  side note, doing a quick check on all the COPYING files, these are GPL:
  e17/libs/esmart
  e17/libs/epsilon
  e17/proto/exml
 
 since these are libraries and GPL-2 sucks for libraries, perhaps this was
 just a copy & paste error and these really should be BSD ?
 -mike
 
 

This is kinda important to me as well.  I am finishing up a SSE2 port of
Eterm's shading routines and wanted to use the GPL license for them.  I
didn't even realize that E/Eterm uses the BSD license.  I would prefer
the code stay open with the GPL but if MeJ insists then I suppose the
BSD license will have to do.  It's his baby after all.  I just like
things to go fast!  ;-)

-- 
Tres





[E-devel] Fix for 15bpp C shading routine and a small issue with the 15bpp MMX saturated shading routine

2005-05-07 Thread Tres Melton
Eterm devs,

PATCHES:

1.  Like I suspected the 15bpp w/ saturation C shading routine flips on
too many bits in adjacent colors just like the 16bpp C routine did.
This patch corrects that behavior.

2.  I'm not sure how 15bpp is defined.  If the highest bit should always
be zero then there is a bug in the 15bpp MMX shading routines.  I
noticed it comparing the output of my new 15bpp shading routine to the
old one.  If the highest bit (bit 15) is ALWAYS, and WILL ALWAYS BE,
ignored then this is not an issue.  If not then the 15bpp MMX shading
routine leaves overflow from the red color modification in the leftmost
bit and it should be cleared.  I have attached a patch to do just that.
This is only an issue in the saturation section, since mathematically
red can never overflow without saturation.
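
As an illustration of both points, here is a sketch of a 15bpp saturating shade for a single pixel in C.  It masks each channel to 5 bits before multiplying so that neighbouring colors cannot bleed into each other, and it clamps red to 0x7c00 so the top bit stays clear.  It is modelled on the description above and is not the attached patch; shade15 is a made-up name and the pixel layout (red in bits 10-14, green in 5-9, blue in 0-4) is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Shade one 15bpp pixel.  A modifier of 256 leaves a channel unchanged
 * because the product is shifted right by 8; larger values brighten. */
static uint16_t shade15(uint16_t px, int rm, int gm, int bm)
{
    assert(rm >= 0 && gm >= 0 && bm >= 0);

    /* Isolate each 5-bit channel, then scale it. */
    int r = (((px >> 10) & 0x1f) * rm) >> 8;
    int g = (((px >>  5) & 0x1f) * gm) >> 8;
    int b = (( px        & 0x1f) * bm) >> 8;

    /* Saturate: anything above the 5-bit maximum becomes the maximum,
     * already placed in the channel's final position. */
    r = (r > 0x1f) ? 0x7c00 : (r << 10);
    g = (g > 0x1f) ? 0x03e0 : (g << 5);
    b = (b > 0x1f) ? 0x001f : b;

    return (uint16_t)(r | g | b);
}
```

With a red clamp of 0xfc00 instead of 0x7c00 a saturated red would also set bit 15, which is the kind of stray bit the second patch is about.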

NOTES:
The status of the x86_64 port of the MMX routines is kinda dead.  While
doing the port it occurred to me that all 64bit processors will have at
least SSE2 with 128bit Multi-Media registers and that there is no reason
that I shouldn't take advantage of that.  As a result the *NEW* SSE2
port of the shading routines is as follows:

1.  The 15bpp SSE2 shading routines are complete and verified to shade
identically to the 15bpp (patched) C shading routines.  They shade 8
pixels per pass until pixels_remaining_for_line / 8 = 0 and then shade
one at a time.  This is twice as many pixels per pass as the MMX
routines and we should see a corresponding speed improvement as well.

2.  The 16bpp SSE2 shading routines are complete and verified to shade
identically to the 16bpp (patched) C shading routines.  The same
performance boost as 15bpp mode has been included.

3.  The 32bpp routine is currently working with the 64 bit MMX registers
and processing one pixel at a time.  I hope to convert it to use the
full 128 bits and process two pixels at a time.  That is the max as room
for overflow is needed (see note below).  This will more than double the
complexity of this routine but also double its performance.

4.  The 24bpp routine is still under investigation.  There is not a 24bpp
MMX shading routine but that isn't the problem.  The problem is moving
24 bits of data into a processor's register and zero padding the
remainder of the pixel to a byte boundary of 2^n (where n is
non-negative and whole).  24 bits = 3 bytes and there is no 'n' that
works directly.  The only solution is to read a byte at a time.  That's
three reads and three writes for each pixel.  That is actually what the
C routine does by manipulating the three unsigned chars.  Once each
pixel is loaded the shading is identical to the 32bpp routines but the
overhead of unpacking the 24 bits into 32 and then repacking is not
looking to be worth it, especially if after all of that work we can only
process two pixels at a time.  I attempted a work around that reads the
data 32 bits at a time and simply writes the top most 8 bits back out
when storing the other 24 bits of the pixel.  If anybody has any
suggestions I've overlooked on this topic then _PLEASE_ speak up.
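
For reference, the byte-at-a-time approach described in item 4 looks roughly like this in C.  This is a sketch and not the routine in pixmap.c: the names scale8 and shade24 are hypothetical, and which byte holds which color is left open (m0, m1, m2 apply to the first, second and third byte of each pixel).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scale one 8-bit channel by an 8.8 modifier and saturate at 255.
 * The int intermediate gives the headroom that a byte cannot. */
static uint8_t scale8(uint8_t c, int m)
{
    int v = (c * m) >> 8;
    return (uint8_t)(v > 0xff ? 0xff : v);
}

/* Shade `count` packed 24bpp pixels in place: three byte reads and
 * three byte writes per pixel, with no padding between pixels. */
static void shade24(uint8_t *p, size_t count, int m0, int m1, int m2)
{
    assert(m0 >= 0 && m1 >= 0 && m2 >= 0);
    for (size_t i = 0; i < count; i++, p += 3) {
        p[0] = scale8(p[0], m0);
        p[1] = scale8(p[1], m1);
        p[2] = scale8(p[2], m2);
    }
}
```

Because the stride is 3 bytes, nothing beyond the last pixel is touched, which is the property a 32-bit read/write workaround has to preserve by hand.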

Things to note (maybe for the Eterm man page under --cmod):  All of the
colors of all of the pixels need some room for overflow during the
intermediate steps of the shading.  Although no hard errors will occur
strange behavior will happen when the color * modifier exceeds the
temporary storage.  For 15 & 16 bpp mode overflow bits are:

15bpp   5 bits red      5 bits green    5 bits blue
        3 bit overflow  3 bit overflow  3 bit overflow
16bpp   5 bits red      6 bits green    5 bits blue
        3 bit overflow  2 bit overflow  3 bit overflow

This is true for all the shading routines: C, MMX, and SSE2.  In 24 & 32
bpp modes the color consumes the entire byte and so a word is used for
the intermediate values.  Therefore each color of each pixel has a full
8 bits for overflow.  The colors are still condensed back to 8 bits upon
completion though.  It is impossible to use a couple of bits from the
alpha channel for overflow as the working size must be byte aligned and
the first size above 8 bits is 16.

While lurking on the #gentoo-dev channel I noticed some of the devs
bitching about the register allocator in gcc (v. 4 I think).  The MMX
routines expect the register allocator to behave a certain way and will
bitch loudly (or SEG_FAULT) if its behavior changes.  (The incoming
parameters will be in unpredictable locations).  To avoid any problems
with this issue I have opted to write the SSE2 routines using inline
assembly.  Even if I had written it in pure assembly combining it with
the mmx_cmod.S would have required more #ifdefs than code.  Sorry
Mej!  :-/  I started to do it that way for you but if you saw the code
you'd flip.  Much more detailed info is in the comments at the top of
the new file and will be submitted soon.

Is there a way to look at Eterm-0.9.4/src/pixmap.c without getting the
entire CVS tree?  A link to a web page with the latest pixmap.c source
in CVS 

Re: [E-devel] GPL on eclair?

2005-05-07 Thread Tres Melton
On Sun, 2005-05-08 at 02:08 +0900, Carsten Haitzler wrote:
 On Sat, 07 May 2005 09:14:46 -0600 Tres Melton [EMAIL PROTECTED] babbled:
 
  On Sat, 2005-05-07 at 04:17 -0400, Mike Frysinger wrote:
   On Saturday 07 May 2005 03:53 am, Mike Frysinger wrote:
    i just see it as an annoying issue where you suddenly cant assume that
    you can copy & paste code from any old random e17 app into any other
    random e17 app
   
   after a quick chat with Simon, why not have a general rule with e17:
   - libraries are BSD
   - apps can be GPL/whatever, but you should add an exception clause that
   says that e17 devs can import code from your GPL/whatever app into
   their app without having to worry about stupid licensing conflicts

But if the E developers import your code into a BSD project then won't
it become BSD?  Otherwise that seems like a good plan.

  This is kinda important to me as well.  I am finishing up a SSE2 port of
  Eterm's shading routines and wanted to use the GPL license for them.  I
  didn't even realize that E/Eterm uses the BSD license.  I would prefer
  the code stay open with the GPL but if MeJ insists then I suppose the
  BSD license will have to do.  It's his baby after all.  I just like
  things to go fast!  ;-)
 
 well you could make them GPL - but mej can just not accept the patches as 
 they would taint eterm's existing license making it gpl. bsd guarantees that 
 THAT code stays open - but if people can steal it - if i want to steal that 
 code and put it into some closed proprietary project - you would never know. 
 it can be reformatted, and at the end of the day its an algorithm. there are 
 only so many ways you can write a fast routine to do a fairly narrow scoped 
 task. if that was the case someone would have claimed copyright infringement 
 on for (i = 0; i < n; i++) a long time ago :)

I take your point about code theft.  It is a good one but there are
exceptions to that.  What's that guy's name in Germany that runs
gpl-violations.org, Harald Welte?  He also wrote most of the iptables
code.  He has gotten a number of companies to comply with the GPL and
post their code.  Every once in awhile hell does freeze over.  :-/

If for(i = 0; i < n; i++) was patented would it be owned by Brian
Kernighan or Dennis Ritchie?

 anyway - i can understand what you mean - but even if it were gpl you 
 couldn't practically find instances of it in closed code :(. you will know 
 your code will be public and free in eterm's code and available and able to 
 be re-used with very few restrictions, but not more limitations than that.
 
 basically if someone submits patches to code - they are implicitly agreeing 
 to the existing copyright license unless they ask for a change or re-license 
 their patches and code. if they are licensed differently the chances of them 
 being used drop dramatically to somewhere about 0 :(

As I stated, Eterm is Mej's baby and he can have my modifications any way
that he wants them.  That doesn't mean that I can't hope for a license
change to the GPL though.  My SSE2 modifications were originally
patterned after the MMX extensions by Willem Monsuwe [EMAIL PROTECTED]
anyway.  They were changed to run on x86_64, use all 128 bits of the
SSE2 %xmm registers, and converted to inline assembly to avoid problems
with a changing register allocator in gcc.  They are hardly recognizable
now and if I worked for M$ Mr. Bill would claim ownership but I'm coding
for the benefit of everyone and I believe in giving credit where credit
is due.  Cheers to Willem!  ;-)  I still wish there was some way to
guarantee that my work wouldn't end up in the hands of a morally
impaired company without at least getting a paycheck.  I might be old
fashioned but I like to get kissed before I get screwed.  Your points
are well taken but I still don't want to be the one to write M$' next
TCP/IP stack.

-- 
Tres



---
This SF.Net email is sponsored by: NEC IT Guy Games.
Get your fingers limbered up and give it your best shot. 4 great events, 4
opportunities to win big! Highest score wins.NEC IT Guy Games. Play to
win an NEC 61 plasma display. Visit http://www.necitguy.com/?r=20
___
enlightenment-devel mailing list
enlightenment-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/enlightenment-devel


[E-devel] Eterm's 64 bit MMX extensions

2005-05-03 Thread Tres Melton
Michael,
I've read an enormous amount in the last two weeks about assembly code,
mmx extensions, and GNU/Linux and the GCC tools that make it all work.
I now feel comfortable stating that I know what is going on.  The inline
assembly code was a necessary intermediate step for me to understand
exactly what was going on inside the processor during the shading
routines.  The code that I'm using now uses the exact same mmx
operations as the 32 bit stuff.  The difference is pointers and
registers are now 64bits but integers and data remained 32 bits.  That
complicated moving the data into and out of the processor's registers.
There are two ways to load a 32bit number into a 64bit register: with
zero padding or sign extension.  This is usually handled by the compiler
but we are in assembly code here and not C.  The function prototype:

shade_ximage_16(void *data, int bpl, int w, int h, int rm, int gm, int bm)

uses integers.  But the routines that handle the data treat the values
as if they were unsigned (zero padded).  Obviously bpl, w, and h cannot
ever be negative but can the color modifiers ever be?  The way that I
read the documentation they cannot and that is how I've coded the
routines.  If I'm wrong then please let me know before I finalize the
patches.
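
To make the two widening rules concrete, here is a minimal C sketch.  It
is not taken from the Eterm sources; the helper names widen_signed and
widen_zero are made up for illustration.  The two results only differ
when the 32 bit value is negative, which is why the sign of rm, gm, and
bm matters.

```c
#include <stdint.h>

/* Hypothetical helper: widen a 32 bit int to 64 bits with sign
 * extension, i.e. the upper 32 bits are copies of the sign bit. */
static uint64_t widen_signed(int32_t v)
{
    return (uint64_t)(int64_t)v;
}

/* Hypothetical helper: widen a 32 bit int to 64 bits with zero
 * padding, i.e. the upper 32 bits are always 0. */
static uint64_t widen_zero(int32_t v)
{
    return (uint64_t)(uint32_t)v;
}
```

For a non-negative modifier such as 256 both helpers return 256.  For -1
the sign extended result has all 64 bits set while the zero padded
result is 0xFFFFFFFF.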

The real question that I have for you is, in what form would you like
the patches?  1)  Inline assembly code.  2)  A separate asm file for the
MMX_64 stuff.  3)  Combining the 32 bit and 64 bit assembly stuff in a
single file, either A) with a single #if/#elif/#endif holding two almost
identical copies of the code, each w/ the needed modifications, B) with
a bunch of #if/#elif/#endif blocks sprinkled throughout the code, or
C) with a number of #defines that handle either 32 or 64 bit code?

The 3 ways to put it all in the mmx_cmod.S file:
A)
#if HAVE_MMX
/*  All of the 32 bit code  */
#elif HAVE_MMX_64
/*  All of the 64 bit code  */
#endif

B)
#if HAVE_MMX
leal (%esi, %ebx, 4), %esi
#elif HAVE_MMX_64
leaq (%rsi, %rbx, 4), %rsi
#endif

C)
#if HAVE_MMX
#  define SI esi
#  define BX ebx
#  define LEA leal
#elif HAVE_MMX_64
#  define SI rsi
#  define BX rbx
#  define LEA leaq
#endif
LEA (%SI, %BX, 4), %SI
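
For comparison, option 1 (inline assembly) combined with B-style
conditionals might look roughly like the sketch below.  This is only an
illustration under assumptions: the function name advance_pixels is
hypothetical, it keys off the compiler's __x86_64__ and __i386__ macros
rather than HAVE_MMX and HAVE_MMX_64, and it relies on GCC-style
extended asm.  It performs the same "pointer plus 4 times index" step as
the lea lines above and falls back to plain C elsewhere.

```c
/* Hypothetical helper: advance p by n 32 bit pixels (n * 4 bytes). */
static unsigned char *advance_pixels(unsigned char *p, long n)
{
#if defined(__GNUC__) && defined(__x86_64__)
    /* 64 bit pointers and index: use the q suffixed form. */
    __asm__("leaq (%0,%1,4), %0" : "+r"(p) : "r"(n));
#elif defined(__GNUC__) && defined(__i386__)
    /* 32 bit pointers and index: use the l suffixed form. */
    __asm__("leal (%0,%1,4), %0" : "+r"(p) : "r"(n));
#else
    /* Portable fallback with the same result. */
    p += n * 4;
#endif
    return p;
}
```

Letting the compiler pick the registers through the "r" constraints
avoids hard coding %esi/%rsi, at the cost of depending on the compiler's
inline asm syntax.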

My preference would be a separate mmx_cmod_64.S file but it is your
baby so I'll do it however you want it.  I see that you committed a
change to the cvs tree for the issue of the HAVE_MMX_64 macro definition
so that's done but if we add a new mmx_cmod_64.S (or mmx_cmod_64.c for
inline asm) file then the makefile will have to be updated.  I get
Makefiles but not Makefile.am as much.  It will have to be conditional
so that MMX_SRCS = mmx_cmod.S | mmx_cmod_64.S depending upon whether we
HAVE_MMX or HAVE_MMX_64.  I'm hoping to con you or vapier into helping
me there.  Assuming that all of my previous assumptions are correct the
only question left is what format you want the additions in.  (Oh, and
do you want me to strip out the comments that I had to add to understand
the code; I don't need them anymore but it would have been nice to have
them there when I started!)

Best Regards,
-- 
Tres





Re: [E-devel] x86_64 MMX for 32 bpp Eterm shading

2005-04-30 Thread Tres Melton
Sorry about no subject on the original.
-- 
Tres





[E-devel] Eterm patches to fix and optimize it.

2005-04-26 Thread Tres Melton
I have attached some patches to Eterm.

1.  eterm-0.9.3-endian.patch
Fix the big/little endian problem that is really the cause of the
strange blue visuals.  The pixmap.c code expects WORDS_BIGENDIAN to be
undefined on little endian machines.  libast/sysdef.h.in causes that
macro to get set to 0.  This patch makes the proper segment get compiled
whether WORDS_BIGENDIAN is defined to 0 or undefined.

2.  eterm-0.9.3-revert-pixmap-colmod.patch
This patch will revert the original fix for the blue tint.  It
shouldn't have even done anything on my AMD64 because it is a little
endian machine.  It is my bet that this patch would actually break a
SPARC or other big endian machine.  Was it ever tested on such hardware?

3.  eterm-0.9.3-shade_24_mmx.patch
4.  eterm-0.9.3-shade_24_mmx-2.patch
The first one adds an MMX optimized assembly routine for shading an
Eterm w/ 24 bit color.  The second modifies pixmap.c to use the new
routine.  I don't have the hardware to test this one with so someone
with an x86 (not AMD64 or EM64T) and 24 bit color should test it.

5.  eterm-0.9.3-shade_32_optimize1.patch
This is a patch that MeJ wanted that optimizes the 32 bit non-MMX
shading routines.  The premise of the patch is to read a color,
manipulate that color, store that color, and then go on to the next
color.  This will allow the value for a color to stay in the CPU's
registers throughout its manipulation and because it is in a double loop
and executed for each pixel in the image it should increase performance.
As an added bonus it saves three integers from the stack.

6.  eterm-0.9.3-shade_32_optimize2.patch
The same type of thing as #5 but it is not quite as clean because the
value of the original color is used twice and I didn't want to perform
two memory fetches so I had to keep the temporary variables r, g, b.

7.  eterm-0.9.3-shade_24_optimize1.patch
This is pretty much the same as #5 except for 24 bit color.

8.  eterm-0.9.3-shade_24_optimize2.patch
This is pretty much the same as #6 except for 24 bit color.
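
The read, manipulate, store premise behind #5 through #8 can be
sketched in C as follows.  This is a simplified illustration, not the
code from the patches: the names shade_channel and shade_32_sketch are
hypothetical, and the little endian byte layout (blue at offset 0, green
at 1, red at 2) is an assumption.  The clamp expression is the one used
in pixmap.c.

```c
#include <stdint.h>

/* Hypothetical helper: scale one 8 bit channel by m/256 and clamp
 * the result to 255.  Assumes m is not negative. */
static uint8_t shade_channel(uint8_t c, int m)
{
    int v = (c * m) >> 8;
    v |= (!(v >> 8) - 1);   /* becomes all ones if v needs more than 8 bits */
    return (uint8_t)v;
}

/* Shade a 32 bpp image in place, one channel at a time, so each
 * value is read once, modified, and written straight back. */
static void shade_32_sketch(void *data, int bpl, int w, int h,
                            int rm, int gm, int bm)
{
    uint8_t *ptr = data;
    int x, y;

    for (y = 0; y < h; y++) {
        for (x = 0; x < w * 4; x += 4) {
            ptr[x + 2] = shade_channel(ptr[x + 2], rm);   /* red   */
            ptr[x + 1] = shade_channel(ptr[x + 1], gm);   /* green */
            ptr[x + 0] = shade_channel(ptr[x + 0], bm);   /* blue  */
        }
        ptr += bpl;   /* bpl is the number of bytes per line */
    }
}
```

A modifier of 256 leaves a channel unchanged, 128 halves it, and
anything that would overflow 8 bits saturates at 255.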

Next I would like to start submitting some hooks for the
shade_ximage_??_mmx_64 routines that I am working on.  Some of the line
numbers in the patch files might be off as these hooks are already in my
code.  I think that they take 17 lines at the top of the file.  I did
edit the patch files to remove the stuff that wasn't relevant to each
particular patch.  Let me know if I've done this correctly and how to
fix it if I haven't.  I will confess that I need sleep (@ 5:45 am) but I
compiled after all of them and tested the parts that I have hardware for
so everything should work (famous last words).  I made all of the
patches from the src directory.

-- 
Tres
--- pixmap.orig.c	2005-04-26 02:34:25.0 -0600
+++ pixmap.c	2005-04-26 02:42:17.0 -0600
@@ -1620,6 +1640,16 @@
 }
 }
 
+/*  WORDS_BIGENDIAN gets defined to 0 in libast/sysdef.h but the following   */
+/*    code considers undefined to be false and accepts any definition as     */
+/*    being true.  Changing it here, in this way, makes it the only place    */
+/*    that any change needs to be made.                                      */
+# ifdef WORDS_BIGENDIAN
+#   if WORDS_BIGENDIAN == 0
+#     undef WORDS_BIGENDIAN
+#     warning WORDS_BIGENDIAN is defined but 0.  Undefining WORDS_BIGENDIAN.
+#   endif
+# endif
 /* RGB 32 */
 static void
 shade_ximage_32(void *data, int bpl, int w, int h, int rm, int gm, int bm)
Patch taken from upstream cvs to fix funky blue tinting.

--- src/pixmap.c
+++ src/pixmap.c
@@ -1649,12 +1649,12 @@
 int r, g, b;
 
 # ifdef WORDS_BIGENDIAN
-r = (ptr[x + 6] * rm) >> 8;
-g = (ptr[x + 5] * gm) >> 8;
-b = (ptr[x + 4] * bm) >> 8;
-ptr[x + 6] = r;
-ptr[x + 5] = g;
-ptr[x + 4] = b;
+r = (ptr[x + 1] * rm) >> 8;
+g = (ptr[x + 2] * gm) >> 8;
+b = (ptr[x + 3] * bm) >> 8;
+ptr[x + 1] = r;
+ptr[x + 2] = g;
+ptr[x + 3] = b;
 # else
 r = (ptr[x + 2] * rm) >> 8;
 g = (ptr[x + 1] * gm) >> 8;
@@ -1672,9 +1672,9 @@
 int r, g, b;
 
 # ifdef WORDS_BIGENDIAN
-r = (ptr[x + 6] * rm) >> 8;
-g = (ptr[x + 5] * gm) >> 8;
-b = (ptr[x + 4] * bm) >> 8;
+r = (ptr[x + 1] * rm) >> 8;
+g = (ptr[x + 2] * gm) >> 8;
+b = (ptr[x + 3] * bm) >> 8;
 # else
 r = (ptr[x + 2] * rm) >> 8;
 g = (ptr[x + 1] * gm) >> 8;
@@ -1684,9 +1684,9 @@
 g |= (!(g >> 8) - 1);
 b |= (!(b >> 8) - 1);
 # ifdef WORDS_BIGENDIAN
-ptr[x + 6] = r;
-ptr[x + 5] = g;
-ptr[x + 4] = b;
+ptr[x + 1] = r;
+ptr[x + 2] = g;
+

Re: [E-devel] Re: Eterm patches to fix and optimize it.

2005-04-26 Thread Tres Melton
I just checked out the new code and noticed that you basically did what
I did in 1. and 2.  I think your solution is a bit more readable.  

On Tue, 2005-04-26 at 13:59 -0400, Michael Jennings wrote:
 On Tuesday, 26 April 2005, at 05:46:39 (-0600),
 Tres Melton wrote:
 
  3.  eterm-0.9.3-shade_24_mmx.patch
  4.  eterm-0.9.3-shade_24_mmx-2.patch

 Doesn't work here.  I get the following assembler errors:
 
 mmx_cmod.S:524: Error: expecting scale factor of 1, 2, 4, or 8: got `3)'

Ok, 3. was an act of stupidity.  I forgot that lea? uses (base_ptr,
index, sizeof) where sizeof is required to be 2^n.  I had guessed that
with the wider registers on 64 bit machines it could also be 16, but as
the assembler says it can only be 1, 2, 4, or 8.  None of that
stuff will even compile in 64 bit mode so I was guessing and pulled a
stupid.  I might try again sometime but I would have to do some math
prior to reading the 24 bit pixel into a 32 bit number and before writing
it back out.  I did ensure that writing 32 bits for a 24 bit pixel didn't
hose any data, though.  If I get some other hardware I'll try again so
that when I do submit something I can be sure that it works and that I'm
not wasting your time.  For now, I suggest dropping 3. and 4. and that we
not speak of them again (sorry).
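
The math in question presumably boils down to forming 3*x without a
scale factor of 3, for example as x + 2*x, which a base plus index times
2 address can express.  The C sketch below shows that arithmetic.  The
helper names are hypothetical, and unlike the approach described above
(a 32 bit access per 24 bit pixel) it deliberately uses byte-wise loads
and stores so that the neighbouring pixel is never touched.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte offset of 24 bit pixel x: 3*x computed as x + 2*x. */
static size_t pixel24_offset(size_t x)
{
    return x + (x << 1);
}

/* Read pixel x into a 32 bit number (lowest byte first). */
static uint32_t load_pixel24(const uint8_t *row, size_t x)
{
    const uint8_t *p = row + pixel24_offset(x);
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16);
}

/* Write the low 24 bits of v back to pixel x, leaving byte 3 alone. */
static void store_pixel24(uint8_t *row, size_t x, uint32_t v)
{
    uint8_t *p = row + pixel24_offset(x);
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
}
```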

  5.  eterm-0.9.3-shade_32_optimize1.patch
  6.  eterm-0.9.3-shade_32_optimize2.patch
  7.  eterm-0.9.3-shade_24_optimize1.patch
  8.  eterm-0.9.3-shade_24_optimize2.patch
 
 All applied and committed.
 
  Next I would like to start submitting some hooks for the
  shade_ximage_??_mmx_64 routines that I am working on.  Some of the
  line numbers in the patch files might be off as these hooks are
  already in my code.  I think that they take 17 lines at the top of
  the file.  I did edit the patch files to remove the stuff that
  wasn't relevant to each particular patch.  Let me know if I've done
  this correctly and how to fix it if I haven't.  I will confess that
  I need sleep (@ 5:45 am) but I compiled after all of them and tested
  the parts that I have hardware for so everything should work (famous
  last words).  I made all of the patches from the src directory.
 
 Changes in line numbers are not a problem, but if you could make the
 patch from the top-level directory (cvs/eterm/Eterm/), that would be
 great. :)
 
 Thanks,
 Michael

I'll do my best but I'm still learning cvs, diff, patch, etc.  The
other coding that I've done was part of a small group where cvs wasn't
as important and we could always meet around the lunch table.

Thanks,

-- 
Tres



---
SF.Net email is sponsored by: Tell us your software development plans!
Take this survey and enter to win a one-year sub to SourceForge.net
Plus IDC's 2005 look-ahead and a copy of this survey
Click here to start!  http://www.idcswdc.com/cgi-bin/survey?id=105hix
___
enlightenment-devel mailing list
enlightenment-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/enlightenment-devel


Re: [E-devel] Eterm amd64 / 64 bit fix

2005-04-26 Thread Tres Melton
This was actually my patch but vapier (Mike Frysinger) submitted it
through Gentoo for me.  It is not a 64/32 bit thing so much as it is a
32 bits/pixel thing.  I'm not as familiar with the code as I'd like to
be but this code snippet seems hard coded for 32bpp.  Has this been
tried using other color depths?  If icon_data is a pointer to a 16bpp
image and we divide it by CARD32 won't the nelements parameter be half
of what it should be?

Just asking.  :-)

On Tue, 2005-04-26 at 13:26 -0400, Michael Jennings wrote:
 On Thursday, 21 April 2005, at 18:17:26 (-0400),
 Mike Frysinger wrote:
 
  the patch i committed applied against src/command.h and that's it
  ... i was just checking other things to see if similar fixes could
  be used elsewhere and i noticed pixmap.c ... so i'm asking for
  someone else to verify that this code is 32bit/64bit clean since i
  know nothing of X functions
 
 Your patch did not change the definition of CARD32.  It is still 32
 bits.  Rather, it changed the structure for MWM hints to use CARD64
 and INT64 instead of CARD32 and INT32.
 
 Michael
 
-- 
Tres





Re: [E-devel] Re: Eterm's latest patch

2005-04-26 Thread Tres Melton
The reply to the top half of the email (deleted) is in another email to
the list.

On Tue, 2005-04-26 at 12:00 -0400, Michael Jennings wrote:
  Another issue that needs to get resolved in the ./configure script
  is that it enables MMX on my AMD64.  That's fine, I have MMX, but
  the code in mmx_cmod.S is not compatible with a 64 bit chip.  The
  proper thing to do at the moment is to disable MMX on AMD64.  Eterm
  won't compile without doing that via ./configure --disable-mmx.
  That should be automatic.  There is also a HAVE_MMX macro that
  enables/disables certain things.  I propose adding a HAVE_MMX_64
  macro for using the MMX extensions on AMD64 (and EM64T).
 
 I agree.  Do you have any suggestions for detecting x86_64 as opposed
 to i386?

The only way that I know of detecting x86_64 is the way I did for the
WMHints on AMD64: #ifdef LONG64.  That won't work while we are
configuring though.  As I said, I'm new to some of these tools and
configure is one of them.  Any ideas here people?  Please chime in.

  I'm working on other things for Eterm right now including an MMX
  port to AMD64.  I have been in touch with, Willem-Jan Monsuwe
  [EMAIL PROTECTED], the original author of the code and it sounds
  like he did it just for fun when MMX first came out and is no longer
  interested in it.  I sent a preliminary mmx_64_cmod.S file and hope
  he looks it over.  If I don't hear back from him or he doesn't want
  to get involved I'll try and finish the port myself.  It has been a
  while since I've coded in Asm so it might take some time for me to
  complete the port.
 
 No hurry, though I'm sure the folks on Opteron systems would love MMX
 support. :)
 
 I've received the patches you've sent so far.  I'm going to get
 through them as quickly as I can, but my Real Job(tm) is keeping me
 pretty busy too, so please be patient. :)  But keep the patches
 coming!

Great!  No problem.

 Michael
 
 -- 
 Michael Jennings (a.k.a. KainX)  http://www.kainx.org/  [EMAIL PROTECTED]
 n + 1, Inc., http://www.nplus1.net/   Author, Eterm (www.eterm.org)
 ---
  As your attorney, it is my duty to inform you that it is not
   important that you understand what I'm doing or why you're paying
   me so much money. What's important is that you continue to do so.
   -- Hunter S. Thompson's Attorney

-- 
Tres





Re: [E-devel] Re: Eterm's latest patch

2005-04-26 Thread Tres Melton
vapier,
The C code is not the issue.  The problem is that we want
the ./configure script to automatically detect an x86_64 platform and
disable the MMX extensions.  There are a couple of places in pixmap.c
where your suggestion will work but it will miss the fact that we want
the Makefile to be built in a way so that it will never try to assemble
mmx_cmod.S on x86_64 for now and later assemble mmx_cmod.S on x86 and
mmx_64_cmod.S on x86_64.  The linking issues need to get resolved as
well; which, if any, MMX routines get linked into the executable?  These
things _must_ be dealt with prior to the invocation of the compiler;
these issues determine _how_ the compiler (and linker) should be
invoked.  Your trick of using #ifdef __x86_64__ will get used in the C
code to determine which function to call to actually do the shading
but that is only after we have determined the system type and written a
proper Makefile.  Please take note of how the mmx_cmod.S file is never
assembled if ./configure is invoked with --disable-mmx.  We are trying to
do that automatically.
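
On the C side, the selection could eventually look something like the
sketch below.  It is only an illustration: shade_backend is a
hypothetical name, and it assumes that ./configure defines HAVE_MMX or
HAVE_MMX_64 only when the matching assembly file is actually built.  The
__i386__ and __x86_64__ checks are the compiler macros vapier suggested.

```c
/* Hypothetical helper: report which shading implementation the
 * build selected.  Real code would call the routine instead. */
static const char *shade_backend(void)
{
#if defined(HAVE_MMX_64) && defined(__x86_64__)
    return "mmx_64";   /* 64 bit MMX routines were assembled */
#elif defined(HAVE_MMX) && defined(__i386__)
    return "mmx";      /* 32 bit MMX routines were assembled */
#else
    return "c";        /* plain C shading routines */
#endif
}
```

Without either HAVE_ macro defined this always falls through to the
plain C routines, which matches what --disable-mmx gives today.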

This is an area where you have much more experience than I (as a Gentoo
developer) so any other insights you have would be greatly appreciated.
Please keep in mind that although I use and _LOVE_ Gentoo this is an E
development list so USE flags are not an option.  The solution needs to
be portable across many different GNU/Linux/X/E distributions.

On Tue, 2005-04-26 at 19:15 -0400, Mike Frysinger wrote:
 On Tuesday 26 April 2005 07:08 pm, Tres Melton wrote:
  On Tue, 2005-04-26 at 12:00 -0400, Michael Jennings wrote:
Another issue that needs to get resolved in the ./configure script
is that it enables MMX on my AMD64.  That's fine, I have MMX, but
the code in mmx_cmod.S is not compatible with a 64 bit chip.  The
proper thing to do at the moment is to disable MMX on AMD64.  Eterm
won't compile without doing that via ./configure --disable-mmx.
That should be automatic.  There is also a HAVE_MMX macro that
enables/disables certain things.  I propose adding a HAVE_MMX_64
macro for using the MMX extensions on AMD64 (and EM64T).
  
   I agree.  Do you have any suggestions for detecting x86_64 as opposed
   to i386?
 
  The only way that I know of detecting x86_64 is the way I did for the
  WMHints on AMD64: #ifdef LONG64.  That won't work while we are
  configuring though.  As I said, I'm new to some of these tools and
  configure is one of them.  Any ideas here people?  Please chime in.
 
 you could just use
 #ifdef __i386__
 and
 #ifdef __x86_64__
 in the C code and everything should work out ?
 -mike
 
 
-- 
Tres


