> It begs the question why the bounds check is inside the loop.
> Can't you check against shape?

There is no alternative.  In g{x where g is a vector and ** it doesn't know
that g is the grade of x **, it has to check every element of g for index
error.

> I'm aware of branch mis predict costs.

Then you should be able to solve the minimum of a vector puzzle. :-)







On Wed, Mar 5, 2014 at 3:04 PM, Yike Lu <yikelu.h...@gmail.com> wrote:

> I'm aware of branch mis predict costs.
>
> It begs the question why the bounds check is inside the loop. Can't you
> check against shape?
> On Mar 5, 2014 4:57 PM, "Roger Hui" <rogerhui.can...@gmail.com> wrote:
>
> > Compare the two:
> >
> > Sorting (fixed typos!):
> >
> > I4 count[M];
> > memset(count,0x00,sizeof(count));
> > for(i=0;i<n;++i)count[*x++]++;
> > for(i=0;i<M;++i){m=count[i]; for(j=0;j<m;++j)*z++=i;}
> >
> >
> > Indexing:
> >
> > #define ASSERT(p,stmt)  {if(!(p)){stmt;}
> > for(i=0;i<n;++i){j=*g++; ASSERT(0<=j&&j<n,INDEX_ERROR); *z++=x[j];}
> >
> >
> > Branching in modern CPUs is relatively
> > expensive<https://en.wikipedia.org/wiki/Branch_predictor>.
> >  Apparently the rest of the world is catching up to APL/J.  See also
> > Knuth's Two Notes on
> > Notation<http://arxiv.org/PS_cache/math/pdf/9205/9205211v1.pdf> (he
> > calls it Iverson's convention).
> >
> > In the sort program there isn't a branch in sight, other than possibly in
> > the looping, and the C compiler is pretty good at that.  In the indexing,
> > you have an if statement, in which (in this case) 100% of the time the
> > branch is not taken.  A branch which is 100% (or 0%? I forget) not taken
> is
> > particularly slow.
> >
> >
> >
> >
> >
> > On Wed, Mar 5, 2014 at 2:41 PM, Yike Lu <yikelu.h...@gmail.com> wrote:
> >
> > > Roger, why is index lookup so slow in the g2 case (see my message
> > earlier)?
> > > Are you really saying that bounds checking makes it so that the lookup
> is
> > > only as fast as the sort?
> > >
> > >
> > > On Wed, Mar 5, 2014 at 4:35 PM, Roger Hui <rogerhui.can...@gmail.com>
> > > wrote:
> > >
> > > > > I4 count[M];
> > > > > memset(count,0x00,sizeof(count));
> > > > > for(i=0;i<n;++i)count[*x++]=1;
> > > >
> > > > More typos:
> > > >
> > > >   for(i=0;i<n;++i)count[*x++]++;
> > > >
> > > >
> > > >
> > > >
> > > > On Wed, Mar 5, 2014 at 2:19 PM, Roger Hui <rogerhui.can...@gmail.com
> >
> > > > wrote:
> > > >
> > > > > Good answers.  For /:~x vs. g{x, the explanations are:
> > > > >
> > > > >    - Indexing must check for index error.  Sorting does not.
> > > > >    - Indexing uses random read access over a large chunk of memory
> > > (i.e.
> > > > >    x).  Sort does not.
> > > > >
> > > > > A more detailed explanation:  To sort over a small known universe
> > (and
> > > > > characters definitely qualify), you basically compute #/.~x (the
> > > ordering
> > > > > is wrong, but you get the idea).  In C:
> > > > >
> > > > > I4 count[M];
> > > > > memset(count,0x00,sizeof(count));
> > > > > for(i=0;i<n;++i)count[*x++]=1;
> > > > >
> > > > >
> > > > > This is lightning fast on modern CPUs: sequential read access and
> no
> > > > > branch prediction fails.  (And the ordering is correct.)  Once
> having
> > > the
> > > > > counts, as Henry said, you can do count#a. or in C:
> > > > >
> > > > > for(i=0;i<M;++i){m=count[j]; for(j=0;j<m;++j)*z++=i;}
> > > > >
> > > > >
> > > > > Also lightning fast with very localized reads.
> > > > >
> > > > > It's ironic that in school sorting is an application with heavy
> > > emphasis
> > > > > on comparisons, counting # of comparisons, etc.  In the method
> above,
> > > > there
> > > > > is not a single comparison involving x.  I once told someone that I
> > can
> > > > > sort 4-byte integers and 8-byte IEEE floats in linear time.  He
> > looked
> > > at
> > > > > me like I was crazy, presumably remembering from school that
> sorting
> > > was
> > > > > PROVEN to take n log n comparisons.
> > > > >
> > > > > As for why sorting is faster than grading, see
> > > > > http://www.jsoftware.com/jwiki/Essays/Sorting_versus_Grading
> > > > >
> > > > > Now, for those of you who know C (or other scalar language), is
> > there a
> > > > > faster way to find the minimum of a vector of small integers x
> > (2-byte
> > > > > integers, say) than the following:
> > > > >
> > > > > min=-32768;
> > > > > for(i=0;i<n;++i){if(min>*x)min=*x; ++x;}
> > > > >
> > > > >
> > > > > I know an alternative which is 70% faster.  No fancy SSE
> > instructions.
> > > >  No
> > > > > multicore.  No loop unrolling.
> > > > >
> > > > >
> > > > >
> > > >
> ----------------------------------------------------------------------
> > > > For information about J forums see
> http://www.jsoftware.com/forums.htm
> > > >
> > > ----------------------------------------------------------------------
> > > For information about J forums see http://www.jsoftware.com/forums.htm
> > >
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
> >
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
>
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Reply via email to