Re: [Jprogramming] order

Elijah Stone Sat, 09 Apr 2022 02:27:09 -0700

On Sat, 9 Apr 2022, Raul Miller wrote:

People will say that certain algorithms, such as hashing, are highly efficient. But these assertions are quite often not accompanied by adequate benchmarking on large datasets. And, these approaches often have inefficient worst case behavior.

Hashing is O(1) (or, if you prefer, O(#y) for ~.y, same as sorting). A sufficiently smart(tm) hash function will avoid inordinate collision rates, so I am not sure what worst case behaviour you are referring to.

If you are able to share your 1e8-sized dataset where sorting before removing duplicates was faster than using ~., that would be great. Maybe I can make it faster :)

For me, ~. on a vector of 1e8 integers is always faster than /:~ alone, regardless of the duplication rate, but the specifics depend on the shape and type of your data (matrix? boxed? float? note that /: is intolerant, but i. et al. are not by default).
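
A rough way to try this comparison yourself, as a sketch only: the size n, the value range, and the names y, tnub, tsort and tboth below are arbitrary choices for illustration, not the dataset under discussion. 6!:2 reports the time in seconds taken to execute a sentence; results will vary with machine, J version, and data type.

```j
NB. Hypothetical benchmark sketch; n and the value range are assumptions.
n =: 1e7                   NB. number of items; reduce if memory is tight
y =: ? n $ 1000000         NB. random integers in [0,1e6), so many duplicates

tnub  =: 6!:2 '~. y'       NB. seconds for the in-order nub
tsort =: 6!:2 '/:~ y'      NB. seconds for the sort alone
tboth =: 6!:2 '~. /:~ y'   NB. seconds for sort followed by nub

echo tnub , tsort , tboth
```

Changing the second argument of $ changes the duplication rate, and replacing y with a float vector or a matrix exercises the other cases mentioned above.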

(I will note that the algorithm I am thinking of would only be useful for large datasets--it is useless for small ones--and it has a distinct advantage over sorting in that domain.)

All that aside, though, I think the original question has value even if it turns out that an in-order ~. is always as fast as an out-of-order one. _In general_, putting items in a certain order creates information, and it may be possible to profit by avoiding the need to create that information.
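
To make the point about order concrete, here is a small illustration (the vector y is made up for the example). Both results hold the same items; they differ only in the order information they carry, and a consumer that needs just the set of distinct items uses neither ordering.

```j
y =: 3 1 3 2 1             NB. example data, chosen arbitrarily

~. y                       NB. 3 1 2 : items in order of first occurrence
~. /:~ y                   NB. 1 2 3 : items in sorted order

(/:~ ~. y) -: ~. /:~ y     NB. 1 : same items once the order is discarded
```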


It occurs to me that there is a cop-out method: use !., as in ~.!.1. This has precedent in i.!.1, but I do not like it. And I think the method can also be used for -., resulting in ambiguities: should x -.!.1 y mean the order of the result does not matter, or that the arguments are sorted? (I know there is currently no -.!.1, but it could plausibly exist.)

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
