Re: [Jprogramming] Find nth duplicate in vector

Raul Miller Thu, 27 Jan 2022 05:05:33 -0800

Here's how I would modify my implementation to produce an empty result
when a non-duplicate is referenced.


   d=: 1 _1 2 3 4 2 5 6 3 8 10 3 2
   f=: {{y ({~,]) I.</\(* * x = ])+/\(i.@#~:i.~)y }}

   0 f d

   1 f d
2 5
   2 f d
3 8
   3 f d
3 11
   4 f d
2 12
   5 f d

Here, instead of using x i.~ ... to find the relevant duplicate index,
I am using I. </\ (* * x = ]) ...

Breaking down that expression, consider:

   2 = 0 0 0 0 0 1 1 1 2 2 2 3 4
0 0 0 0 0 0 0 0 1 1 1 0 0
   </\ 0 0 0 0 0 0 0 0 1 1 1 0 0
0 0 0 0 0 0 0 0 1 0 0 0 0

So, we're finding all positions starting at the indicated duplicate and
stopping before the next duplicate, then refining that to the first
such position (which is the duplicate we want).
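
As a sketch of how those steps fit together on the original data, here
is the same pipeline evaluated one stage at a time. The names m and c
are introduced only for this illustration; they are not part of f.

```j
   d=: 1 _1 2 3 4 2 5 6 3 8 10 3 2
   m=: (i.@# ~: i.~) d      NB. 1 where an item has appeared earlier
   c=: +/\ m                NB. running count of duplicates seen so far
   c
0 0 0 0 0 1 1 1 2 2 2 3 4
   I. </\ 2 = c             NB. first position where the count reaches 2
8
   d ({~ , ]) 8             NB. value at that position, then the position
3 8
```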

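The * * x = ] part is needed because there is no duplicate 0:
multiplying by the signum of the running count zeroes out the leading
positions, where no duplicate has been seen yet.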

   *0 0 0 0 0 1 1 1 2 2 2 3 4
0 0 0 0 0 1 1 1 1 1 1 1 1
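
A small illustration of what the mask changes, assuming the running
count is held in a name c (introduced only for this example). Without
the mask, a left argument of 0 would match the leading zeros and select
index 0; with it, nothing is selected.

```j
   c=: 0 0 0 0 0 1 1 1 2 2 2 3 4
   I. </\ 0 = c             NB. no mask: wrongly picks index 0
0
   # I. </\ (* * 0 = ]) c   NB. with the mask: no index is selected
0
```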

Anyways, I. here will either return a list of a single index (which
basically gives behavior like the previous version) or a list
containing no indices (which gives an empty result).
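
For example, counting the items in each result shows the two cases. The
left argument 8 is the one that raised an index error with the previous
version.

```j
   d=: 1 _1 2 3 4 2 5 6 3 8 10 3 2
   f=: {{y ({~,]) I.</\(* * x = ])+/\(i.@#~:i.~)y }}
   # 2 f d                  NB. one index found: value and index
2
   # 5 f d                  NB. only four duplicates exist: empty
0
   # 8 f d                  NB. empty rather than an index error
0
```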

(Note also that if we were working with extremely long lists, and if
the duplicates we were looking for were typically at the beginning of
those long lists, it would be statistically advantageous to do a
little more work up front to avoid scanning the entire list.)
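
One possible shape for that extra work, as a rough sketch only: apply f
to a prefix of the list and double the prefix until a result appears or
the whole list has been examined. This relies on the duplicate count at
a position depending only on the items before it, so a hit inside a
prefix has the same index as in the full list. The name fe, the test
list big, and the starting length 1024 are all assumptions made for this
example.

```j
   d=: 1 _1 2 3 4 2 5 6 3 8 10 3 2
   f=: {{y ({~,]) I.</\(* * x = ])+/\(i.@#~:i.~)y }}
   fe=: {{
  n=. 1024 <. # y               NB. 1024 is an arbitrary starting prefix length
  r=. x f n {. y
  while. (0 = # r) *. n < # y do.
    n=. (# y) <. 2 * n          NB. double the prefix, capped at the full length
    r=. x f n {. y
  end.
  r
}}
   big=: d , 1000000 $ 0
   2 fe big
3 8
   (2 fe big) -: 2 f big
1
```

When the duplicate is not near the front, each doubling scans its prefix
again, so this trades some repeated work for the chance to stop early.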

I hope this helps.

FYI,


--
Raul

On Thu, Jan 27, 2022 at 6:28 AM 'Mike Day' via Programming
<[email protected]> wrote:
>
> Perhaps I should point out that my verb,  h (see below),  uses a lot of space 
> for a large list,  especially if there are few repeats,  as it’s using an 
> outer product.
>
> I later produced an even more verbose effort which runs in less space on a 
> 10^6 long vector, but takes 3-4x the space & time of Raul’s f,  so it would 
> be nice to get that one right.
>
> Cheers,
>
> Mike
>
> Sent from my iPad
>
> > On 26 Jan 2022, at 22:36, Mike Day <[email protected]> wrote:
> >
> > Unfortunately,  Pawel wants 2 f d to be 3 11.  However, I find that 3 f d 
> > IS 3 11.
> > Other results are a bit strange, too:
> >    4 f  d
> > 2 12
> >    8 f  d
> > |index error: f
> > |   y    ({~,])x i.~+/\(i.@#~:i.~)y
> >
> > I wasn’t going to post my effort,  but it might interest Pawel.  This 
> > version works on the slightly more intuitive (for me at least, here) origin 
> > 1 value of “x”:
> >
> >    g =:  ] ({~ , ]) (i.~ >./@:(+/\"1)@:(~. =/ ]))
> >
> >    2 g d
> > 2 5
> >    3 g d
> > 3 11
> > >    4 g d  NB. also not error-checked, though!
> > |index error: g
> > |   4   g d
> >
> > This is a quick get-around to act as Pawel asks, and to give an empty 
> > result rather than an error if nothing satisfies the left argument:
> >
> > >    h =: (g~ >:)~ :: ''
> >    2 h d
> > 3 11
> >    5 h d
> >
> >    #5 h d
> > 0
> >
> > FWIW,
> >
> > Mike
> >
> > Sent from my iPad
> >
> >> On 26 Jan 2022, at 21:00, Raul Miller <[email protected]> wrote:
> >>
> >> Here's a variation that works:
> >>
> >>   d=: 1 _1 2 3 4 2 5 6 3 8 10 3 2
> >>   f=: {{y ({~,]) x i.~ +/\(i.@#~:i.~)y }}
> >>   1 f d
> >> 2 5
> >>   2 f d
> >> 3 8
> >>
> >> The phrase (i.@# ~: i.~) finds the locations of duplicates
> >>
> >>   (i.@#~:i.~) 1 _1 2 3 4 2 5 6 3 8 10 3 2
> >> 0 0 0 0 0 1 0 0 1 0 0 1 1
> >>
> >> And, +/\ computes a running sum
> >>   +/\(i.@#~:i.~) d
> >> 0 0 0 0 0 1 1 1 2 2 2 3 4
> >>
> >> With this, we can find the index of the first occurrence of a
> >> duplicate count number using x i.~ ...
> >>
> >> Once we have the index of a duplicate, we can return that index and
> >> the corresponding value from the list.
> >>
> >> --
> >> Raul
> >>
> >>> On Wed, Jan 26, 2022 at 3:49 PM Pawel Jakubas <[email protected]> 
> >>> wrote:
> >>>
> >>> It should be of course
> >>>  1 f d
> >>> 2 5
> >>>
> >>> Would be great if you could decompose your solution and the idea behind 
> >>> the
> >>> solution. Many thanks.
> >>>
> >>> Cheers,
> >>> Pawel
> >>> ----------------------------------------------------------------------
> >>> For information about J forums see http://www.jsoftware.com/forums.htm
> >> ----------------------------------------------------------------------
> >> For information about J forums see http://www.jsoftware.com/forums.htm
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
