Re: [Jprogramming] Find nth duplicate in vector

Raul Miller Thu, 27 Jan 2022 10:11:19 -0800

Well, ... the implementation posted by xash is very nice, but in
http://jsoftware.com/pipermail/programming/2022-January/059790.html
Pawel Jakubas specified that the value appears first, and the index
appears second, and that the indices start with 1 for the first value.


Also, somewhere along the line, was the suggestion that the result of
f should be empty rather than throwing an error if there was no such
duplicate value.

FYI,

-- 
Raul


On Thu, Jan 27, 2022 at 12:50 PM Hauke Rehr <[email protected]> wrote:
>
> I wonder why there’s still so much traffic on this thread
> (okay, I’m to blame for quite some of it)
> I thought xash published the best correct solution.
> Am I wrong?
>
> Am 27.01.22 um 18:42 schrieb 'Mike Day' via Programming:
> > Pawel Jakubas has moved on to something else in his correspondence, 
> > presumably related to
> > this topic.
> >
> > Anyway, I think we need his adjudication, as I don't think Raul Miller's f 
> > is doing what PJ asked
> > for.
> >
> > Hauke Rehr's proposal is
> > hrf =: {{ (,~ {&y) {. I. (>: x) e."1 +/\ =/~ y }}
> >
> >
> > My latest version, mdf, is long-winded but perhaps reasonably efficient for 
> > large arguments;
> >
> > probably not an important consideration.
> >
> >
> > Consider: (better in fixed width font - which I can't get this end!)
> >
> > (,:~i.@#)d NB. example and indices
> > 0 1 2 3 4 5 6 7 8 9 10 11 12
> > 1 _1 2 3 4 2 5 6 3 8 10 3 2
> >
> >
> > hrf&d each _1 0 1 2 3 4 NB. not error-checking
> > +---+---+---+----+---+---+
> > |1 0|1 0|2 5|3 11|1 0|1 0|
> > +---+---+---+----+---+---+
> > mdf&d each _1 0 1 2 3 4
> > ++---+---+----+++
> > ||1 0|2 5|3 11|||
> > ++---+---+----+++
> > f&d each _1 0 1 2 3 4 NB. Raul Miller's verb, f
> > +++---+---+----+----+
> > |||2 5|3 8|3 11|2 12|
> > +++---+---+----+----+
> >
> >
> > I _think_ Pawel asked for the first x-th repeated value (among all 
> > x-repeats), where repeat x
> >
> > requires at least x+1 mentions.
> >
> > So 2 5 is correct - the 1st repeat of 2 is at 5, while that of 3 is at 8, 
> > and 5 < 8.
> >
> > And I think 3 11 is the required result for x = 2 - the 2nd repeat of 3 is 
> > at 11, while that of 2 is at 12, and 11 < 12.
> >
> >
> > Sorry if that's wrong!
> >
> >
> > FWIW, mdf is pretty long-winded. Its speed would be compromised if the
> >
> > the value of x, the left-hand-side, is highish, as there's an explicit loop 
> > with x + 1 entries.
> >
> >
> > mdf =: {{
> > irepvals =. ''
> > NB. return empty if there are no x-repeats
> > r =. '' NB. empty default
> > nn =. >: x
> > if. 0 = nn =. x do.
> > r =. 0,~ {.y NB. not much to do, as must it be first value, at 0!
> > elseif. # repvals =. (~. #~ (nn<#)/.~ ) d do. NB. find instances, if any, 
> > of values with sufficient repeats
> > min =. <: <./ y NB. value to substitute for unwanted early occurrences of 
> > repeated values
> > for. i. 0 >. >: nn do.
> > y =. min irepvals } y NB. substitute early occurrences with irrelevant 
> > values
> > irepvals =. y i. repvals NB. get next indices
> > end.
> > end.
> >
> > if. # irepvals do.
> > r =. y ({~,] ) <./ irepvals NB. find the lowest relevant index and return 
> > required pair
> > else.
> > r NB. return prepared default
> > end.
> > }}
> >
> > Cheers,
> >
> >
> > Mike
> >
> >
> > On 27/01/2022 13:04, Raul Miller wrote:
> >> Here's how I would modify my implementation to produce an empty result
> >> when a non-duplicate is referenced.
> >>
> >> d=: 1 _1 2 3 4 2 5 6 3 8 10 3 2
> >> f=: {{y ({~,]) I.</\(* * x = ])+/\(i.@#~:i.~)y }}
> >>
> >> 0 f d
> >>
> >> 1 f d
> >> 2 5
> >> 2 f d
> >> 3 8
> >> 3 f d
> >> 3 11
> >> 4 f d
> >> 2 12
> >> 5 f d
> >>
> >> Here, instead of using x i. ... to find the relevant duplicate index,
> >> I am using I. </\ (* * x = ]) ...
> >>
> >> Breaking down that expression, consider:
> >>
> >> 2 = 0 0 0 0 0 1 1 1 2 2 2 3 4
> >> 0 0 0 0 0 0 0 0 1 1 1 0 0
> >> </\ 0 0 0 0 0 0 0 0 1 1 1 0 0
> >> 0 0 0 0 0 0 0 0 1 0 0 0 0
> >>
> >> So, we're finding all values starting at the indicated duplicate and
> >> stopping before the next duplicate, then refining that to the first
> >> location (which is the duplicate we want).
> >>
> >> The * * x... part is because there is no duplicate 0.
> >>
> >> *0 0 0 0 0 1 1 1 2 2 2 3 4
> >> 0 0 0 0 0 1 1 1 1 1 1 1 1
> >>
> >> Anyways, I. here will either return a list of a single index (which
> >> basically gives behavior like the previous version) or a list
> >> containing no indices (which gives an empty result).
> >>
> >> (Note also that if we were working with extremely long lists, and if
> >> the duplicates we were looking for were typically at the beginning of
> >> those long lists, it would be statistically advantageous to do a
> >> little more work up front to avoid scanning the entire list.)
> >>
> >> I hope this helps.
> >>
> >> FYI,
> >>
> >>
> >> --
> >> Raul
> >>
> >> On Thu, Jan 27, 2022 at 6:28 AM 'Mike Day' via Programming
> >> <[email protected]> wrote:
> >>> Perhaps I should point out that my verb, h (see below), uses a lot of 
> >>> space for a large list, especially if there are few repeats, as it’s 
> >>> using an outer product.
> >>>
> >>> I later produced an even more verbose effort which runs in less space on 
> >>> a 10^6 long vector, but takes 3-4x the space & time of Raul’s f, so it 
> >>> would be nice to get that one right.
> >>>
> >>> Cheers,
> >>>
> >>> Mike
> >>>
> >>> Sent from my iPad
> >>>
> >>>> On 26 Jan 2022, at 22:36, Mike Day <[email protected]> wrote:
> >>>>
> >>>> Unfortunately, Pawel wants 2 f d to be 3 11. However, I find that 3 f d 
> >>>> IS 3 11.
> >>>> Other results are a bit strange, too:
> >>>> 4 f d
> >>>> 2 12
> >>>> 8 f d
> >>>> |index error: f
> >>>> | y ({~,])x i.~+/\(i.@#~:i.~)y
> >>>>
> >>>> I wasn’t going to post my effort, but it might interest Pawel. This 
> >>>> version works on the slightly more intuitive (for me at least, here) 
> >>>> origin 1 value of “x”:
> >>>>
> >>>> g =: ] ({~ , ]) (i.~ >./@:(+/\"1)@:(~. =/ ]))
> >>>>
> >>>> 2 g d
> >>>> 2 5
> >>>> 3 g d
> >>>> 3 11
> >>>> 4 g d. NB. also not error-checked, though!
> >>>> |index error: g
> >>>> | 4 g d
> >>>>
> >>>> This is a quick get-around to act as Pawel asks, and to give an empty 
> >>>> result rather than an error if nothing satisfies the left argument:
> >>>>
> >>>> h =: (g~ >:)~ :: ‘’
> >>>> 2 h d
> >>>> 3 11
> >>>> 5 h d
> >>>>
> >>>> #5 h d
> >>>> 0
> >>>>
> >>>> FWIW,
> >>>>
> >>>> Mike
> >>>>
> >>>> Sent from my iPad
> >>>>
> >>>>> On 26 Jan 2022, at 21:00, Raul Miller <[email protected]> wrote:
> >>>>>
> >>>>> Here's a variation that works:
> >>>>>
> >>>>> d=: 1 _1 2 3 4 2 5 6 3 8 10 3 2
> >>>>> f=: {{y ({~,]) x i.~ +/\(i.@#~:i.~)y }}
> >>>>> 1 f d
> >>>>> 2 5
> >>>>> 2 f d
> >>>>> 3 8
> >>>>>
> >>>>> The phrase (i.@# ~: i.~) finds the locations of duplicates
> >>>>>
> >>>>> (i.@#~:i.~) 1 _1 2 3 4 2 5 6 3 8 10 3 2
> >>>>> 0 0 0 0 0 1 0 0 1 0 0 1 1
> >>>>>
> >>>>> And, +/\ computes a running sum
> >>>>> +/\(i.@#~:i.~) d
> >>>>> 0 0 0 0 0 1 1 1 2 2 2 3 4
> >>>>>
> >>>>> With this, we can find the index of the first occurrence of a
> >>>>> duplicate count number using x i.~ ...
> >>>>>
> >>>>> Once we have the index of a duplicate, we can return that index and
> >>>>> the corresponding value from the list.
> >>>>>
> >>>>> --
> >>>>> Raul
> >>>>>
> >>>>>> On Wed, Jan 26, 2022 at 3:49 PM Pawel Jakubas 
> >>>>>> <[email protected]> wrote:
> >>>>>>
> >>>>>> It should be of course
> >>>>>> 1 f d
> >>>>>> 2 5
> >>>>>>
> >>>>>> Would be great if you could decompose your solution and the idea 
> >>>>>> behind the
> >>>>>> solution. Many thanks.
> >>>>>>
> >>>>>> Cheers,
> >>>>>> Pawel
> >>>>>> ----------------------------------------------------------------------
> >>>>>> For information about J forums see http://www.jsoftware.com/forums.htm
> >>>>> ----------------------------------------------------------------------
> >>>>> For information about J forums see http://www.jsoftware.com/forums.htm
> >>> ----------------------------------------------------------------------
> >>> For information about J forums see http://www.jsoftware.com/forums.htm
> >> ----------------------------------------------------------------------
> >> For information about J forums see http://www.jsoftware.com/forums.htm
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
>
> --
> ----------------------
> mail written using NEO
> neo-layout.org
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Re: [Jprogramming] Find nth duplicate in vector

Reply via email to