The point is that you can not use (i.#v)=i.~v because you can get some strange results, results that you can not justify as being correct. That is, wrong results. For example:
t=: 9!:18 '' v=: 1+(t%2)*i:3 nubsieve =: i.@# = i.~ nub =: nubsieve # ] selfclassify =: nub =/ ] nubsieve v 1 0 0 0 0 0 0 nub v 1 selfclassify v 1 1 0 0 0 0 0 nubsieve v is wrong because there are items in v not tolerantly equal to the first item. Therefore nub v is wrong. Therefore selfclassify v is wrong. As well, +/,selfclassify v should be #v . In contrast, if we do the preprocessing (in the interpreter), v1=: i.~ v NB. preprocessing nubsieve v1 1 0 1 0 1 1 1 (nubsieve v1) # v NB. AKA nub v 1 1 1 1 1 selfclassify v1 1 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 In my previous message I did not specify how the preprocessing idea would affect key. The dictionary defines key as: x u/.y ←→ (=x) u@# y . With the preprocessing idea, this becomes (=i.~x) u@# y . For example: (=i.~v) <@# v ┌───┬───┬─┬─┬─┐ │1 1│1 1│1│1│1│ └───┴───┴─┴─┴─┘ (=i.~v) <@# i.#v ┌───┬───┬─┬─┬─┐ │0 1│2 3│4│5│6│ └───┴───┴─┴─┴─┘ (=v1) <@# i.#v ┌───┬───┬─┬─┬─┐ │0 1│2 3│4│5│6│ └───┴───┴─┴─┴─┘ So just to be clear, the preprocessing i.~x happens in the interpreter, when it detects that you are applying one of these set/array operations that requires tolerant equal. It's the way I propose to define the set operations on arguments requiring tolerant comparison, and the way I propose to fix the bugs. On Fri, Oct 16, 2015 at 8:48 PM, Henry Rich <[email protected]> wrote: > I would be content to use > > (i.#v)=i.~v > > as the definition for nub etc. > > The big question to me is, Can f/. be implemented so that it uses the same > partition as is induced by (i.#v)=i.~v ? > > It feels as if yes... calculate the partition p using i.~, and then make a > pass through the result, front to back, replacing element i{p with (i{p){p . > > This has cost O(n)... and could be avoided for integers and f/.!.0 . Is > there an easy way to tell whether tolerant i.~ encountered items that were > tolerantly equal but not intolerantly equal? > > Henry Rich > > > On 10/16/2015 5:47 PM, Roger Hui wrote: > >> I agree that there are bugs, but fixing them requires some thinking. As >> Bill Lam surmised, the problem ultimately can be attributed to tolerant >> comparison, tolerant equality in particular. >> >> The J version I am using in this message is the following: >> >> 9!:14 '' >> j602/2008-03-03/16:45 >> >> but the bugs/problems exist in all J versions, going as far back and as >> far >> forward as you care to go. The tolerance I will use is the default in a >> session: >> >> ] t=: 9!:18 '' >> 5.68434e_14 >> >> t - 2^_44 NB. i.e. t is exactly 2^_44 >> 0 >> >> but any non-zero tolerance will have similar bugs/problems. >> >> Tolerance makes = non-transitive (a=b and b=c, but a may not equal c): >> >> a=: 1-t >> b=: 1 >> c=: 1+t >> a=b >> 1 >> b=c >> 1 >> a=c >> 0 >> >> Non-transitivity is OK when tolerant equality is used for its intended >> purpose, namely compensating for the vicissitudes of floating point >> representation and floating point computations. It is OK if you are >> comparing scalar numbers, but causes problems when it is used with >> insufficient care to deal with sets and arrays. Non-transitivity >> invalidates some properties, important properties, that we take for >> granted; properties that we assume without being conscious that we are >> assuming them. >> >> The primitives that are affected are as follows. (There may be a few >> other >> more minor ones.) >> >> i. dyad based on = >> ~. monad based on = or i. >> ~: monad based on = or i. >> = dyad based on ~. = >> f/. dyad based on i. >> >> The dictionary entry on the monad ~. has it "correct", among other >> possible >> correct definitions: >> >> More precisely, the nub is found by selecting the leading item, >> suppressing >> from the argument all items tolerantly equal to it, selecting the next >> remaining item, and so on. >> >> >> I am loath to use this in the implementation because it is O(n*2). More >> importantly, it is "correct" only if you believe that nub or uniqueness >> makes >> sense when = is non-transitive. For example, the following verb >> implements >> the dictionary definition for vectors: >> >> nub=: 3 : 0 >> z=. '' >> while. *#y do. >> z=. z,{.y >> y=. y#~y~:{.y >> end. >> z >> ) >> >> When = is transitive, as it is when tolerance is not involved, "nub" is >> fine: >> >> nub 'mississippi' >> misp >> nub 3 1 4 1 5 9 2 6 5 3 5 8 9 7 9 >> 3 1 4 5 9 2 6 8 7 >> >> But if tolerant equality is involved: >> >> X=: 1 + (t%2) * ?.1600$100 >> # nub X >> 28 >> # nub /:~X >> 34 >> # nub \:~X >> 34 >> # nub X {~ ?~#X >> 27 >> # nub X {~ ?~#X >> 28 >> /:~ ~. 3 : '# nub X {~ ?~ #X' "0 i. 1e4 >> 23 24 25 26 27 28 29 30 31 32 >> /:~ ~. 3 : '# nub X {~ ?~ #X' "0 i. 3e4 >> 23 24 25 26 27 28 29 30 31 32 >> >> That is, if you permute X, the number of "unique items" changes. The last >> two expressions show that the number of unique items can range from 23 to >> 32 when you permute X. Note also that 34 can be the number of unique >> items >> but does not come up as a result in 3e4 trial random permutations. The >> problem is not the particular definition in the dictionary or its >> implementation above, but with ANY definition of nub based on tolerant and >> therefore non-transitive equality. >> >> To discuss the bugs that Henry discussed, I'll use the following array. >> (Similar to the one Henry used.) >> >> v=: 1+(t%2)*i:3 >> ~. v >> 1 >> ~: v >> 1 0 0 0 0 0 0 >> = v >> 1 1 0 0 0 0 0 >> v </. v >> ┌───┬───┬─┬─┬─┐ >> │1 1│1 1│1│1│1│ >> └───┴───┴─┴─┴─┘ >> #/.~ v >> 2 1 >> >> The problem here is that the "idiom" (i.#v)=i.~v is used or assumed >> for computing >> the nub or the nubsieve. The same idiom has been used in APL since time >> immemorial for such purposes, but is not valid or, at the very least, >> gives >> misleading results, when transitivity does not hold. >> >> i.~ v >> 0 0 1 1 2 3 4 >> >> (i.#v) = i.~ v >> 1 0 0 0 0 0 0 >> ~:v >> 1 0 0 0 0 0 0 >> >> v #~ (i.#v)=i.~v >> 1 >> ~.v >> 1 >> >> (v#~(i.#v)=i.~v) =/ v >> 1 1 0 0 0 0 0 >> =v >> 1 1 0 0 0 0 0 >> >> I believe that x i. y is correct, the index of the first item in x that is >> tolerantly equal to an item of y, modeled as follows: >> >> ix=: #@[ - +/ @ (+./\) @ (=/~) >> ix~ v >> 0 0 1 1 2 3 4 >> i.~ v >> 0 0 1 1 2 3 4 >> >> But using the result of i. in other set type things such as nub, key, etc. >> requires more analysis and care. So what can be done? >> >> a. Use zero tolerance. This can be done via the fit conjunction, =!.0, >> ~:!.0, etc., without doing anything drastic (or anything at all) to the >> system. >> >> b. For the primitives in question, on an argument that uses tolerant >> equality, >> pre-process the argument with i.~ and, if necessary, post-process by >> selecting from the original argument. The result i.~ are representatives >> for the original items; comparisons on these representatives are >> transitive. For example: >> >> ] v1=: i.~ v NB. representatives of v >> 0 0 1 1 2 3 4 >> >> (i.#v1) = i.~ v1 >> 1 0 1 0 1 1 1 >> ~: v1 NB. AKA ~:v >> 1 0 1 0 1 1 1 >> >> v #~ ~:v1 NB. AKA ~.v, post-processing by selecting from v >> 1 1 1 1 1 >> (t%2) %~ _1 + v #~ ~:v1 >> _3 _1 1 2 3 >> >> (~.v1)=/v1 NB. AKA =v >> 1 1 0 0 0 0 0 >> 0 0 1 1 0 0 0 >> 0 0 0 0 1 0 0 >> 0 0 0 0 0 1 0 >> 0 0 0 0 0 0 1 >> >> #/.~ v1 NB. AKA #/.~v >> 2 2 1 1 1 >> >> What about the "number of unique items" under permutation example? >> >> X1=: i.~ X NB. representatives of X >> /:~ ~. 3 : '# nub X1 {~ ?~ #X1' "0 i. 3e4 >> 34 >> >> >> >> >> On Sat, Oct 10, 2015 at 11:40 PM, bill lam <[email protected]> wrote: >> >> I think the issue is tolerance, it works for 0-tolerance. >>> eg. >>> (#,{.)/.!.0~ tb >>> 1 1 >>> 1 1 >>> 1 1 >>> 1 1 >>> 1 1 >>> 1 1 >>> 1 1 >>> >>> Сб, 10 окт 2015, Henry Rich написал(а): >>> >>>> B Jonas reported that ~. is inconsistent with the definition: >>>> >>>> tb=. 1+2e_14*i:3 >>>> >>>> ~: tb >>>> >>>> 1 0 0 0 0 0 0 >>>> >>>> =tb >>>> >>>> 1 1 1 0 0 0 0 >>>> >>>> >>>> You could fix that in the docs. But the problem is much more serious: >>>> >>> the >>> >>>> partitioning induced by ~. does not match that induced by u/. . That >>>> >>> these >>> >>>> be identical is a canon of J, relied on by code whose name is legion. >>>> >>>> >>>> ~. tb >>>> >>>> 1 >>>> >>>> tb </. tb >>>> >>>> +-----+-+-+-+-+ >>>> >>>> |1 1 1|1|1|1|1| >>>> >>>> +-----+-+-+-+-+ >>>> >>>> >>>> What!? Only 1 unique item, but 4 partitions??? >>>> >>>> >>>> >>>> >>>> The special code for #/. shares the problem: >>>> >>>> >>>> #/.~ tb >>>> >>>> 3 >>>> >>>> >>>> only 3 items from a list of 7! >>>> >>>> >>>> >>>> The interpreter itself relies on the equivalence, and is let down: >>>> >>>> (#,{.)/.~ tb >>>> >>>> |length error >>>> >>>> | (#,{.)/.~tb >>>> >>>> >>>> there shouldn't be any error. >>>> >>>> >>>> >>>> IMO this will be the first bug for us to fix, if Jsoftware ever creates >>>> a >>>> proper open-source setup that allows user contributions to get into the >>>> official release. >>>> >>>> Henry Rich >>>> >>>> ---------------------------------------------------------------------- >>>> For information about J forums see http://www.jsoftware.com/forums.htm >>>> >>> -- >>> regards, >>> ==================================================== >>> GPG key 1024D/4434BAB3 2008-08-24 >>> gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3 >>> gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3 >>> ---------------------------------------------------------------------- >>> For information about J forums see http://www.jsoftware.com/forums.htm >>> >>> ---------------------------------------------------------------------- >> For information about J forums see http://www.jsoftware.com/forums.htm >> > > ---------------------------------------------------------------------- > For information about J forums see http://www.jsoftware.com/forums.htm > ---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm
