I would be content to use
(i.#v)=i.~v
as the definition for nub etc.
The big question to me is, Can f/. be implemented so that it uses the
same partition as is induced by (i.#v)=i.~v ?
It feels as if yes... calculate the partition p using i.~, and then make
a pass through the result, front to back, replacing element i{p with
(i{p){p .
This has cost O(n)... and could be avoided for integers and f/.!.0 . Is
there an easy way to tell whether tolerant i.~ encountered items that
were tolerantly equal but not intolerantly equal?
Henry Rich
On 10/16/2015 5:47 PM, Roger Hui wrote:
I agree that there are bugs, but fixing them requires some thinking. As
Bill Lam surmised, the problem ultimately can be attributed to tolerant
comparison, tolerant equality in particular.
The J version I am using in this message is the following:
9!:14 ''
j602/2008-03-03/16:45
but the bugs/problems exist in all J versions, going as far back and as far
forward as you care to go. The tolerance I will use is the default in a
session:
] t=: 9!:18 ''
5.68434e_14
t - 2^_44 NB. i.e. t is exactly 2^_44
0
but any non-zero tolerance will have similar bugs/problems.
Tolerance makes = non-transitive (a=b and b=c, but a may not equal c):
a=: 1-t
b=: 1
c=: 1+t
a=b
1
b=c
1
a=c
0
Non-transitivity is OK when tolerant equality is used for its intended
purpose, namely compensating for the vicissitudes of floating point
representation and floating point computations. It is OK if you are
comparing scalar numbers, but causes problems when it is used with
insufficient care to deal with sets and arrays. Non-transitivity
invalidates some properties, important properties, that we take for
granted; properties that we assume without being conscious that we are
assuming them.
The primitives that are affected are as follows. (There may be a few other
more minor ones.)
i. dyad based on =
~. monad based on = or i.
~: monad based on = or i.
= dyad based on ~. =
f/. dyad based on i.
The dictionary entry on the monad ~. has it "correct", among other possible
correct definitions:
More precisely, the nub is found by selecting the leading item, suppressing
from the argument all items tolerantly equal to it, selecting the next
remaining item, and so on.
I am loath to use this in the implementation because it is O(n*2). More
importantly, it is "correct" only if you believe that nub or uniqueness makes
sense when = is non-transitive. For example, the following verb implements
the dictionary definition for vectors:
nub=: 3 : 0
z=. ''
while. *#y do.
z=. z,{.y
y=. y#~y~:{.y
end.
z
)
When = is transitive, as it is when tolerance is not involved, "nub" is
fine:
nub 'mississippi'
misp
nub 3 1 4 1 5 9 2 6 5 3 5 8 9 7 9
3 1 4 5 9 2 6 8 7
But if tolerant equality is involved:
X=: 1 + (t%2) * ?.1600$100
# nub X
28
# nub /:~X
34
# nub \:~X
34
# nub X {~ ?~#X
27
# nub X {~ ?~#X
28
/:~ ~. 3 : '# nub X {~ ?~ #X' "0 i. 1e4
23 24 25 26 27 28 29 30 31 32
/:~ ~. 3 : '# nub X {~ ?~ #X' "0 i. 3e4
23 24 25 26 27 28 29 30 31 32
That is, if you permute X, the number of "unique items" changes. The last
two expressions show that the number of unique items can range from 23 to
32 when you permute X. Note also that 34 can be the number of unique items
but does not come up as a result in 3e4 trial random permutations. The
problem is not the particular definition in the dictionary or its
implementation above, but with ANY definition of nub based on tolerant and
therefore non-transitive equality.
To discuss the bugs that Henry discussed, I'll use the following array.
(Similar to the one Henry used.)
v=: 1+(t%2)*i:3
~. v
1
~: v
1 0 0 0 0 0 0
= v
1 1 0 0 0 0 0
v </. v
┌───┬───┬─┬─┬─┐
│1 1│1 1│1│1│1│
└───┴───┴─┴─┴─┘
#/.~ v
2 1
The problem here is that the "idiom" (i.#v)=i.~v is used or assumed
for computing
the nub or the nubsieve. The same idiom has been used in APL since time
immemorial for such purposes, but is not valid or, at the very least, gives
misleading results, when transitivity does not hold.
i.~ v
0 0 1 1 2 3 4
(i.#v) = i.~ v
1 0 0 0 0 0 0
~:v
1 0 0 0 0 0 0
v #~ (i.#v)=i.~v
1
~.v
1
(v#~(i.#v)=i.~v) =/ v
1 1 0 0 0 0 0
=v
1 1 0 0 0 0 0
I believe that x i. y is correct, the index of the first item in x that is
tolerantly equal to an item of y, modeled as follows:
ix=: #@[ - +/ @ (+./\) @ (=/~)
ix~ v
0 0 1 1 2 3 4
i.~ v
0 0 1 1 2 3 4
But using the result of i. in other set type things such as nub, key, etc.
requires more analysis and care. So what can be done?
a. Use zero tolerance. This can be done via the fit conjunction, =!.0,
~:!.0, etc., without doing anything drastic (or anything at all) to the
system.
b. For the primitives in question, on an argument that uses tolerant equality,
pre-process the argument with i.~ and, if necessary, post-process by
selecting from the original argument. The result i.~ are representatives
for the original items; comparisons on these representatives are
transitive. For example:
] v1=: i.~ v NB. representatives of v
0 0 1 1 2 3 4
(i.#v1) = i.~ v1
1 0 1 0 1 1 1
~: v1 NB. AKA ~:v
1 0 1 0 1 1 1
v #~ ~:v1 NB. AKA ~.v, post-processing by selecting from v
1 1 1 1 1
(t%2) %~ _1 + v #~ ~:v1
_3 _1 1 2 3
(~.v1)=/v1 NB. AKA =v
1 1 0 0 0 0 0
0 0 1 1 0 0 0
0 0 0 0 1 0 0
0 0 0 0 0 1 0
0 0 0 0 0 0 1
#/.~ v1 NB. AKA #/.~v
2 2 1 1 1
What about the "number of unique items" under permutation example?
X1=: i.~ X NB. representatives of X
/:~ ~. 3 : '# nub X1 {~ ?~ #X1' "0 i. 3e4
34
On Sat, Oct 10, 2015 at 11:40 PM, bill lam <[email protected]> wrote:
I think the issue is tolerance, it works for 0-tolerance.
eg.
(#,{.)/.!.0~ tb
1 1
1 1
1 1
1 1
1 1
1 1
1 1
Сб, 10 окт 2015, Henry Rich написал(а):
B Jonas reported that ~. is inconsistent with the definition:
tb=. 1+2e_14*i:3
~: tb
1 0 0 0 0 0 0
=tb
1 1 1 0 0 0 0
You could fix that in the docs. But the problem is much more serious:
the
partitioning induced by ~. does not match that induced by u/. . That
these
be identical is a canon of J, relied on by code whose name is legion.
~. tb
1
tb </. tb
+-----+-+-+-+-+
|1 1 1|1|1|1|1|
+-----+-+-+-+-+
What!? Only 1 unique item, but 4 partitions???
The special code for #/. shares the problem:
#/.~ tb
3
only 3 items from a list of 7!
The interpreter itself relies on the equivalence, and is let down:
(#,{.)/.~ tb
|length error
| (#,{.)/.~tb
there shouldn't be any error.
IMO this will be the first bug for us to fix, if Jsoftware ever creates a
proper open-source setup that allows user contributions to get into the
official release.
Henry Rich
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
--
regards,
====================================================
GPG key 1024D/4434BAB3 2008-08-24
gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm