> it avoids any branching in the loop.  And no "comparisons".

    I1 b[M];
    memset(b,0x00,sizeof(b));
    for(i=0;i<n;++i)b[*x++]=1;
    for(i=0;i<M;++i)if(b[i])min=i;

Well, there is branching and there is comparison, but it's in the M loop
rather than the n loop.  M is small and fixed (65536 for 2-bytes) whereas n
can be millions, billions.  The J models are:

    1 i.~ 1 x}M$0  NB. minimum
    1 i:~ 1 x}M$0  NB. maximum





On Wed, Mar 5, 2014 at 4:05 PM, Roger Hui <rogerhui.can...@gmail.com> wrote:

> Zehr gut.  It's not just that it's O(n) (the conventional if(min>*x)min=*x
> is also O(n)) but it avoids any branching in the loop.  And no
> "comparisons".
>
> Bonus question:  Suppose M is really large and the cost of setting count
> to 0 is prohibitive.  How can you avoid that cost?  (Not saying it's
> related to finding the min or the max).
>
>
>
>
> On Wed, Mar 5, 2014 at 3:27 PM, Peter B. Kessler <
> peter.b.kess...@oracle.com> wrote:
>
>> For the minimum in a small universe, use your "broken" sorting code (:-)
>>
>>
>>     I4 count[M];
>>     memset(count,0x00,sizeof(count));
>>     for(i=0;i<n;++i)count[*x++]=1;
>>
>> (O(n)) and then look for the first (lowest) 1 in count (also O(n)).
>>
>>                         ... peter
>>
>>
>> On 03/05/14 14:19, Roger Hui wrote:
>>
>>> Good answers.  For /:~x vs. g{x, the explanations are:
>>>
>>>     - Indexing must check for index error.  Sorting does not.
>>>     - Indexing uses random read access over a large chunk of memory (i.e.
>>>
>>>     x).  Sort does not.
>>>
>>> A more detailed explanation:  To sort over a small known universe (and
>>> characters definitely qualify), you basically compute #/.~x (the ordering
>>> is wrong, but you get the idea).  In C:
>>>
>>> I4 count[M];
>>> memset(count,0x00,sizeof(count));
>>> for(i=0;i<n;++i)count[*x++]=1;
>>>
>>>
>>> This is lightning fast on modern CPUs: sequential read access and no
>>> branch
>>> prediction fails.  (And the ordering is correct.)  Once having the
>>> counts,
>>> as Henry said, you can do count#a. or in C:
>>>
>>> for(i=0;i<M;++i){m=count[j]; for(j=0;j<m;++j)*z++=i;}
>>>
>>>
>>> Also lightning fast with very localized reads.
>>>
>>> It's ironic that in school sorting is an application with heavy emphasis
>>> on
>>> comparisons, counting # of comparisons, etc.  In the method above, there
>>> is
>>> not a single comparison involving x.  I once told someone that I can sort
>>> 4-byte integers and 8-byte IEEE floats in linear time.  He looked at me
>>> like I was crazy, presumably remembering from school that sorting was
>>> PROVEN to take n log n comparisons.
>>>
>>> As for why sorting is faster than grading, see
>>> http://www.jsoftware.com/jwiki/Essays/Sorting_versus_Grading
>>>
>>> Now, for those of you who know C (or other scalar language), is there a
>>> faster way to find the minimum of a vector of small integers x (2-byte
>>> integers, say) than the following:
>>>
>>> min=-32768;
>>> for(i=0;i<n;++i){if(min>*x)min=*x; ++x;}
>>>
>>>
>>> I know an alternative which is 70% faster.  No fancy SSE instructions.
>>>  No
>>> multicore.  No loop unrolling.
>>> ----------------------------------------------------------------------
>>> For information about J forums see http://www.jsoftware.com/forums.htm
>>>
>>>  ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm
>>
>
>
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Reply via email to