Re: [Jprogramming] Find nth duplicate in vector

'Mike Day' via Programming Thu, 27 Jan 2022 09:43:10 -0800

Pawel Jakubas has moved on to something else in his correspondence, presumably 
related to
this topic.


Anyway, I think we need his adjudication, as I don't think Raul Miller's f is 
doing what PJ asked
for.

Hauke Rehr's proposal is
hrf =: {{ (,~ {&y) {. I. (>: x) e."1 +/\ =/~ y }}


My latest version, mdf, is long-winded but perhaps reasonably efficient for 
large arguments;

probably not an important consideration.


Consider: (better in fixed width font - which I can't get this end!)

(,:~i.@#)d NB. example and indices
0 1 2 3 4 5 6 7 8 9 10 11 12
1 _1 2 3 4 2 5 6 3 8 10 3 2


hrf&d each _1 0 1 2 3 4 NB. not error-checking
+---+---+---+----+---+---+
|1 0|1 0|2 5|3 11|1 0|1 0|
+---+---+---+----+---+---+
mdf&d each _1 0 1 2 3 4
++---+---+----+++
||1 0|2 5|3 11|||
++---+---+----+++
f&d each _1 0 1 2 3 4 NB. Raul Miller's verb, f
+++---+---+----+----+
|||2 5|3 8|3 11|2 12|
+++---+---+----+----+


I _think_ Pawel asked for the first x-th repeated value (among all x-repeats), 
where repeat x

requires at least x+1 mentions.

So 2 5 is correct - the 1st repeat of 2 is at 5, while that of 3 is at 8, and 5 
< 8.

And I think 3 11 is the required result for x = 2 - the 2nd repeat of 3 is at 
11, while that of 2 is at 12, and 11 < 12.


Sorry if that's wrong!


FWIW, mdf is pretty long-winded. Its speed would be compromised if the

the value of x, the left-hand-side, is highish, as there's an explicit loop 
with x + 1 entries.


mdf =: {{
irepvals =. ''
NB. return empty if there are no x-repeats
r =. '' NB. empty default
nn =. >: x
if. 0 = nn =. x do.
r =. 0,~ {.y NB. not much to do, as must it be first value, at 0!
elseif. # repvals =. (~. #~ (nn<#)/.~ ) d do. NB. find instances, if any, of 
values with sufficient repeats
min =. <: <./ y NB. value to substitute for unwanted early occurrences of 
repeated values
for. i. 0 >. >: nn do.
y =. min irepvals } y NB. substitute early occurrences with irrelevant values
irepvals =. y i. repvals NB. get next indices
end.
end.

if. # irepvals do.
r =. y ({~,] ) <./ irepvals NB. find the lowest relevant index and return 
required pair
else.
r NB. return prepared default
end.
}}

Cheers,


Mike


On 27/01/2022 13:04, Raul Miller wrote:
> Here's how I would modify my implementation to produce an empty result
> when a non-duplicate is referenced.
>
> d=: 1 _1 2 3 4 2 5 6 3 8 10 3 2
> f=: {{y ({~,]) I.</\(* * x = ])+/\(i.@#~:i.~)y }}
>
> 0 f d
>
> 1 f d
> 2 5
> 2 f d
> 3 8
> 3 f d
> 3 11
> 4 f d
> 2 12
> 5 f d
>
> Here, instead of using x i. ... to find the relevant duplicate index,
> I am using I. </\ (* * x = ]) ...
>
> Breaking down that expression, consider:
>
> 2 = 0 0 0 0 0 1 1 1 2 2 2 3 4
> 0 0 0 0 0 0 0 0 1 1 1 0 0
> </\ 0 0 0 0 0 0 0 0 1 1 1 0 0
> 0 0 0 0 0 0 0 0 1 0 0 0 0
>
> So, we're finding all values starting at the indicated duplicate and
> stopping before the next duplicate, then refining that to the first
> location (which is the duplicate we want).
>
> The * * x... part is because there is no duplicate 0.
>
> *0 0 0 0 0 1 1 1 2 2 2 3 4
> 0 0 0 0 0 1 1 1 1 1 1 1 1
>
> Anyways, I. here will either return a list of a single index (which
> basically gives behavior like the previous version) or a list
> containing no indices (which gives an empty result).
>
> (Note also that if we were working with extremely long lists, and if
> the duplicates we were looking for were typically at the beginning of
> those long lists, it would be statistically advantageous to do a
> little more work up front to avoid scanning the entire list.)
>
> I hope this helps.
>
> FYI,
>
>
> --
> Raul
>
> On Thu, Jan 27, 2022 at 6:28 AM 'Mike Day' via Programming
> <[email protected]> wrote:
>> Perhaps I should point out that my verb, h (see below), uses a lot of space 
>> for a large list, especially if there are few repeats, as it’s using an 
>> outer product.
>>
>> I later produced an even more verbose effort which runs in less space on a 
>> 10^6 long vector, but takes 3-4x the space & time of Raul’s f, so it would 
>> be nice to get that one right.
>>
>> Cheers,
>>
>> Mike
>>
>> Sent from my iPad
>>
>>> On 26 Jan 2022, at 22:36, Mike Day <[email protected]> wrote:
>>>
>>> Unfortunately, Pawel wants 2 f d to be 3 11. However, I find that 3 f d IS 
>>> 3 11.
>>> Other results are a bit strange, too:
>>> 4 f d
>>> 2 12
>>> 8 f d
>>> |index error: f
>>> | y ({~,])x i.~+/\(i.@#~:i.~)y
>>>
>>> I wasn’t going to post my effort, but it might interest Pawel. This version 
>>> works on the slightly more intuitive (for me at least, here) origin 1 value 
>>> of “x”:
>>>
>>> g =: ] ({~ , ]) (i.~ >./@:(+/\"1)@:(~. =/ ]))
>>>
>>> 2 g d
>>> 2 5
>>> 3 g d
>>> 3 11
>>> 4 g d. NB. also not error-checked, though!
>>> |index error: g
>>> | 4 g d
>>>
>>> This is a quick get-around to act as Pawel asks, and to give an empty 
>>> result rather than an error if nothing satisfies the left argument:
>>>
>>> h =: (g~ >:)~ :: ‘’
>>> 2 h d
>>> 3 11
>>> 5 h d
>>>
>>> #5 h d
>>> 0
>>>
>>> FWIW,
>>>
>>> Mike
>>>
>>> Sent from my iPad
>>>
>>>> On 26 Jan 2022, at 21:00, Raul Miller <[email protected]> wrote:
>>>>
>>>> Here's a variation that works:
>>>>
>>>> d=: 1 _1 2 3 4 2 5 6 3 8 10 3 2
>>>> f=: {{y ({~,]) x i.~ +/\(i.@#~:i.~)y }}
>>>> 1 f d
>>>> 2 5
>>>> 2 f d
>>>> 3 8
>>>>
>>>> The phrase (i.@# ~: i.~) finds the locations of duplicates
>>>>
>>>> (i.@#~:i.~) 1 _1 2 3 4 2 5 6 3 8 10 3 2
>>>> 0 0 0 0 0 1 0 0 1 0 0 1 1
>>>>
>>>> And, +/\ computes a running sum
>>>> +/\(i.@#~:i.~) d
>>>> 0 0 0 0 0 1 1 1 2 2 2 3 4
>>>>
>>>> With this, we can find the index of the first occurrence of a
>>>> duplicate count number using x i.~ ...
>>>>
>>>> Once we have the index of a duplicate, we can return that index and
>>>> the corresponding value from the list.
>>>>
>>>> --
>>>> Raul
>>>>
>>>>> On Wed, Jan 26, 2022 at 3:49 PM Pawel Jakubas <[email protected]> 
>>>>> wrote:
>>>>>
>>>>> It should be of course
>>>>> 1 f d
>>>>> 2 5
>>>>>
>>>>> Would be great if you could decompose your solution and the idea behind 
>>>>> the
>>>>> solution. Many thanks.
>>>>>
>>>>> Cheers,
>>>>> Pawel
>>>>> ----------------------------------------------------------------------
>>>>> For information about J forums see http://www.jsoftware.com/forums.htm
>>>> ----------------------------------------------------------------------
>>>> For information about J forums see http://www.jsoftware.com/forums.htm
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Re: [Jprogramming] Find nth duplicate in vector

Reply via email to