Re: [Jprogramming] Find nth duplicate in vector

Hauke Rehr Thu, 27 Jan 2022 09:50:22 -0800

I wonder why there’s still so much traffic on this thread
(okay, I’m to blame for quite some of it)
I thought xash published the best correct solution.
Am I wrong?


Am 27.01.22 um 18:42 schrieb 'Mike Day' via Programming:

Pawel Jakubas has moved on to something else in his correspondence, presumably 
related to
this topic.

Anyway, I think we need his adjudication, as I don't think Raul Miller's f is 
doing what PJ asked
for.

Hauke Rehr's proposal is
hrf =: {{ (,~ {&y) {. I. (>: x) e."1 +/\ =/~ y }}


My latest version, mdf, is long-winded but perhaps reasonably efficient for 
large arguments;

probably not an important consideration.


Consider: (better in fixed width font - which I can't get this end!)

(,:~i.@#)d NB. example and indices
0 1 2 3 4 5 6 7 8 9 10 11 12
1 _1 2 3 4 2 5 6 3 8 10 3 2


hrf&d each _1 0 1 2 3 4 NB. not error-checking
+---+---+---+----+---+---+
|1 0|1 0|2 5|3 11|1 0|1 0|
+---+---+---+----+---+---+
mdf&d each _1 0 1 2 3 4
++---+---+----+++
||1 0|2 5|3 11|||
++---+---+----+++
f&d each _1 0 1 2 3 4 NB. Raul Miller's verb, f
+++---+---+----+----+
|||2 5|3 8|3 11|2 12|
+++---+---+----+----+


I _think_ Pawel asked for the first x-th repeated value (among all x-repeats), 
where repeat x

requires at least x+1 mentions.

So 2 5 is correct - the 1st repeat of 2 is at 5, while that of 3 is at 8, and 5 
< 8.

And I think 3 11 is the required result for x = 2 - the 2nd repeat of 3 is at 11, 
while that of 2 is at 12, and 11 < 12.


Sorry if that's wrong!


FWIW, mdf is pretty long-winded. Its speed would be compromised if the

the value of x, the left-hand-side, is highish, as there's an explicit loop 
with x + 1 entries.


mdf =: {{
irepvals =. ''
NB. return empty if there are no x-repeats
r =. '' NB. empty default
nn =. >: x
if. 0 = nn =. x do.
r =. 0,~ {.y NB. not much to do, as must it be first value, at 0!
elseif. # repvals =. (~. #~ (nn<#)/.~ ) d do. NB. find instances, if any, of 
values with sufficient repeats
min =. <: <./ y NB. value to substitute for unwanted early occurrences of 
repeated values
for. i. 0 >. >: nn do.
y =. min irepvals } y NB. substitute early occurrences with irrelevant values
irepvals =. y i. repvals NB. get next indices
end.
end.

if. # irepvals do.
r =. y ({~,] ) <./ irepvals NB. find the lowest relevant index and return 
required pair
else.
r NB. return prepared default
end.
}}

Cheers,


Mike


On 27/01/2022 13:04, Raul Miller wrote:

Here's how I would modify my implementation to produce an empty result
when a non-duplicate is referenced.

d=: 1 _1 2 3 4 2 5 6 3 8 10 3 2
f=: {{y ({~,]) I.</\(* * x = ])+/\(i.@#~:i.~)y }}

0 f d

1 f d
2 5
2 f d
3 8
3 f d
3 11
4 f d
2 12
5 f d

Here, instead of using x i. ... to find the relevant duplicate index,
I am using I. </\ (* * x = ]) ...

Breaking down that expression, consider:

2 = 0 0 0 0 0 1 1 1 2 2 2 3 4
0 0 0 0 0 0 0 0 1 1 1 0 0
</\ 0 0 0 0 0 0 0 0 1 1 1 0 0
0 0 0 0 0 0 0 0 1 0 0 0 0

So, we're finding all values starting at the indicated duplicate and
stopping before the next duplicate, then refining that to the first
location (which is the duplicate we want).

The * * x... part is because there is no duplicate 0.

*0 0 0 0 0 1 1 1 2 2 2 3 4
0 0 0 0 0 1 1 1 1 1 1 1 1

Anyways, I. here will either return a list of a single index (which
basically gives behavior like the previous version) or a list
containing no indices (which gives an empty result).

(Note also that if we were working with extremely long lists, and if
the duplicates we were looking for were typically at the beginning of
those long lists, it would be statistically advantageous to do a
little more work up front to avoid scanning the entire list.)

I hope this helps.

FYI,


--
Raul

On Thu, Jan 27, 2022 at 6:28 AM 'Mike Day' via Programming
<[email protected]> wrote:

Perhaps I should point out that my verb, h (see below), uses a lot of space for 
a large list, especially if there are few repeats, as it’s using an outer 
product.

I later produced an even more verbose effort which runs in less space on a 10^6 
long vector, but takes 3-4x the space & time of Raul’s f, so it would be nice 
to get that one right.

Cheers,

Mike

Sent from my iPad

On 26 Jan 2022, at 22:36, Mike Day <[email protected]> wrote:

Unfortunately, Pawel wants 2 f d to be 3 11. However, I find that 3 f d IS 3 11.
Other results are a bit strange, too:
4 f d
2 12
8 f d
|index error: f
| y ({~,])x i.~+/\(i.@#~:i.~)y

I wasn’t going to post my effort, but it might interest Pawel. This version 
works on the slightly more intuitive (for me at least, here) origin 1 value of 
“x”:

g =: ] ({~ , ]) (i.~ >./@:(+/\"1)@:(~. =/ ]))

2 g d
2 5
3 g d
3 11
4 g d. NB. also not error-checked, though!
|index error: g
| 4 g d

This is a quick get-around to act as Pawel asks, and to give an empty result 
rather than an error if nothing satisfies the left argument:

h =: (g~ >:)~ :: ‘’
2 h d
3 11
5 h d

#5 h d
0

FWIW,

Mike

Sent from my iPad

On 26 Jan 2022, at 21:00, Raul Miller <[email protected]> wrote:

Here's a variation that works:

d=: 1 _1 2 3 4 2 5 6 3 8 10 3 2
f=: {{y ({~,]) x i.~ +/\(i.@#~:i.~)y }}
1 f d
2 5
2 f d
3 8

The phrase (i.@# ~: i.~) finds the locations of duplicates

(i.@#~:i.~) 1 _1 2 3 4 2 5 6 3 8 10 3 2
0 0 0 0 0 1 0 0 1 0 0 1 1

And, +/\ computes a running sum
+/\(i.@#~:i.~) d
0 0 0 0 0 1 1 1 2 2 2 3 4

With this, we can find the index of the first occurrence of a
duplicate count number using x i.~ ...

Once we have the index of a duplicate, we can return that index and
the corresponding value from the list.

--
Raul

On Wed, Jan 26, 2022 at 3:49 PM Pawel Jakubas <[email protected]> wrote:

It should be of course
1 f d
2 5

Would be great if you could decompose your solution and the idea behind the
solution. Many thanks.

Cheers,
Pawel
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm


--
----------------------
mail written using NEO
neo-layout.org
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Re: [Jprogramming] Find nth duplicate in vector

Reply via email to