Re: Complexity nomenclature

2016-03-19 Thread Nordlöw via Digitalmars-d
On Friday, 4 December 2015 at 01:27:42 UTC, Andrei Alexandrescu 
wrote:
These complexities must be reflected in the name of the 
primitives. Or perhaps it's okay to conflate a couple of them. 
I know names are something we're awfully good at discussing 
:o). Destroy!



Andrei


I'm currently sitting in a software sector with some of the 
highest demands on software stability and predictability there 
are. A key issue here is time-deterministic execution. I've seen 
no language that addresses this problem. What about adding yet 
another code qualifier, say @deterministic, that either


- forbids non-time-deterministic execution paths such as 
`if-statements`, or, perhaps better,
- makes the compiler generate code that strives to be as 
deterministic as possible, for instance executing all branches of 
a conditional statement,


and of course checking this transitively over function calls. 
This would have to take into account direct or indirect 
recursion, which might become complicated.


I'm of course also aware of the inherent variations in microcode 
execution time on most CPU architectures of today. Despite this 
fact, safety-critical systems (in, for instance, aerospace) use 
these architectures and need tools that do as well as currently 
possible, at least at the source-code level.
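
As a hedged illustration of what such a qualifier might accept 
(the @deterministic marker is hypothetical, not part of D today; 
a checker tool would have to verify it):

// Hypothetical marker attribute; D has no such qualifier today.
enum deterministic = true;

// Branch-free min(x, 100): both "paths" cost the same on most
// hardware, so no data-dependent timing from an if-statement.
@deterministic int clampTo100(int x)
{
    immutable mask = (x - 100) >> 31; // all ones if x < 100, else zero
    return (x & mask) | (100 & ~mask);
}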


Destroy!


Re: Complexity nomenclature

2016-03-19 Thread Nordlöw via Digitalmars-d
On Saturday, 5 December 2015 at 20:48:16 UTC, Andrei Alexandrescu 
wrote:
Then what do we do with more complex relationships like O((m + 
n) log n) etc.


What about a compile-time expression tree similar to how, for 
instance, `BaseUnit`, `ScaledUnit` (and the suggested 
`ShiftedUnit` and `LinearUnit`) are defined and used in David 
Nadlinger's std.units?
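
A minimal sketch of what such a tree might look like (the node 
names are hypothetical, merely echoing the std.units style):

// Hypothetical node types for a compile-time complexity expression tree:
struct Atom(string name) {}   // a free variable, e.g. "n"
struct Sum(A, B) {}           // A + B
struct Product(A, B) {}       // A * B
struct Log(A) {}              // log(A)

// O((m + n) log n), encoded purely in the type system:
alias Cost = Product!(Sum!(Atom!"m", Atom!"n"), Log!(Atom!"n"));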


Re: Complexity nomenclature

2015-12-10 Thread Jonny via Digitalmars-d
On Monday, 7 December 2015 at 16:15:46 UTC, Andrei Alexandrescu 
wrote:

On 12/07/2015 09:43 AM, Timon Gehr wrote:


1-2) BigO("1")
3) BigO("count")
4) BigO("distance(first,last)")
5) BigO("ilist.size()")

There's also this:
On 12/06/2015 12:55 AM, Andrei Alexandrescu wrote:

Well you'd need multiple terms if you want to say things like,
"inserting a range into an array is O(array[].walkLength +
r.walkLength)." When you insert a range in a binary search 
tree, the

complexity would be O(log(array[].walkLength) * r.walkLength).


BigO("array[].walkLength + r.walkLength");
BigO("log(array[].walkLength) * r.walkLength");


These are still expressible without a DSL: 
BigO(Atom("array[].walkLength") + Atom("r.walkLength")) etc.


Somewhat independently of DSL or not: At this point I'm unclear 
whether supporting free variables with arbitrary names is a 
good thing. The key to unleashing the power of BigO is to 
compute it when combining functions, and the names seem to be 
specific to the function and therefore not easy to combine.



Andrei


There are many "operations" that can be carried out on methods.

Many are composable and decomposable in the exact same way and 
can be generalized as an algorithm:


let f, f1, f2, etc. be functions,

if f = f1 + f2
then L(f) = L(f1) + L(f2)


Hence if L is a linear operator, this always holds. Addition (+) 
is linear, and BigO is linear. Composing linear operators is 
linear (but composition doesn't have an inverse the way addition 
does).



If D supported some generic way to handle such linear operators 
acting on methods, with calls to other methods serving to 
decompose the method, one could do many cool things quite easily 
in D. BigO would just be one.



e.g.,

void foo()
{
    bar1();
    bar2();
    for (auto i = 1; i < 100; i++)
        for (auto j = 1; j < 100; j++)
            bar3();
}

then we can see L(foo) = L(bar1) + L(bar2) + X(n^2)*L(bar3)

(Here, X(n^2) is because we have a doubly nested for loop, which 
multiplies the complexity by n^2 in big-O notation. To be more 
precise, we should refactor all the code inside the function, 
i.e.,


void foo()
{
    bar1();
    bar2();
}

void fooTemp()
{
    for (auto i = 1; i < 100; i++)
        for (auto j = 1; j < 100; j++)
            bar3();
}

which then gives the more precise decomposition as

L(foo) = L(bar1) + L(bar2) + L(fooTemp)


The main point of all this is that L can be many things. If one 
has the complete hierarchical call stack (with root = Main), then 
L can be seen as recursively working on the tree.


In this case, addition would rather be "insertion into the tree". 
We could easily check relations such as whether foo is called by, 
calls, or is disjoint from bar.
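
A minimal sketch of such a tree and one relation query (assuming 
an acyclic call tree; the recursion cases mentioned above would 
need a visited set):

// A node of the hierarchical call tree.
struct CallNode
{
    string name;
    CallNode*[] callees;
}

// Does a transitively call b?
bool calls(const(CallNode)* a, string b)
{
    foreach (c; a.callees)
        if (c.name == b || calls(c, b))
            return true;
    return false;
}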



We can, as already stated, implement complexity analysis 
automatically:


BigO - The sup of the complexity of a function, assuming all 
unknowns (loops, user input, unknown sub-function calls) have 
maximal complexity.
bigO - The sup of the complexity of a function, assuming all 
unknowns have 0 complexity.

LittleO - The inf of the complexity
littleO - ...


The point is, by simply implementing a decomposable structure on 
the "call hierarchy", one can implement all kinds of cool stuff. 
If it is exposed to D at run time, then even more amazing things 
can happen.


e.g., one could implement a real-time profiler. L would be an 
"operator" that simply records the current ticks when first 
called and when returning. The "addition" is really just code 
that does the computations on the differences it is getting.
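
A hedged sketch of that profiling operator using Phobos' 
StopWatch (the timed wrapper and totals table are invented names 
for illustration):

import core.time : Duration;
import std.datetime.stopwatch : AutoStart, StopWatch;

Duration[string] totals; // accumulated wall time per function

// Wrap a call: record ticks on entry and exit; the "addition"
// just accumulates the measured differences.
auto timed(alias f, Args...)(Args args)
{
    auto sw = StopWatch(AutoStart.yes);
    scope (exit)
    {
        enum name = __traits(identifier, f);
        totals[name] = totals.get(name, Duration.zero) + sw.peek();
    }
    return f(args);
}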


Imagine hitting some hot key and your current app showing a tree 
(something like a fractal tree) where each node is a function 
call and subnodes are the functions that the function calls 
("uses").


Color the tree based on the use rate (counting how many times the 
function is called, possibly accumulated hierarchically), the 
execution time, or even BigO. You'll then see a visual of where 
the program is most active. It might look something like the heat 
distribution map of the earth.


I realize it is more complex to implement than I'm making it out 
to be, but it's basically tree traversal and hook stuff.


Once you start allowing meta coding, the power becomes magnified:

e.g. (pseudo-code)

// Define a generic meta-code function composer (the + in the
// algorithm) for BigO

@function!composer!BigO(functions)
{
    let f1, ..., fn = functions
    return function!composer!BigO(f1) + ... +
           function!composer!BigO(fn)

    // Would need to separately parse each fi so that it is
    // composed only of calls to functions or non-calls (the loop
    // stuff I mentioned above)
}

// The linear operator BigO that returns the complexity of a
// function f

@function!composer!BigO(f)
{
    return complexity(f);
}


// Explicitly specify the complexity for foo
void foo()
[function!composer!BigO() { return 0; }]
{
    ...
}




The above represents a mock way to allow the programmer to "hook" 
into the compiler's call

Re: Complexity nomenclature

2015-12-08 Thread ZombineDev via Digitalmars-d
On Monday, 7 December 2015 at 16:15:46 UTC, Andrei Alexandrescu 
wrote:


These are still expressible without a DSL: 
BigO(Atom("array[].walkLength") + Atom("r.walkLength")) etc.




We can remove the use of strings if we tag walkLength with BigO:

// this:
BigO(Atom("array[].walkLength") + Atom("r.walkLength"))

// becomes this:
@(complexity!(array[].walkLength) + complexity!(r.walkLength))

Or if we rename complexity to bigO:

@(bigO!(array[].walkLength) + bigO!(r.walkLength))
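
A hedged sketch of how such a template could be wired up (the 
Complexity carrier struct and the bigO lookup are assumptions for 
illustration, not existing Phobos API):

import std.traits : getUDAs;

// Hypothetical carrier for a complexity annotation.
struct Complexity { string expr; }

// Look up the @Complexity UDA attached to a symbol.
template bigO(alias f)
{
    enum bigO = getUDAs!(f, Complexity)[0];
}

@Complexity("n") void linearThing(size_t n) {}

static assert(bigO!linearThing.expr == "n");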



Re: Complexity nomenclature

2015-12-08 Thread Jonny via Digitalmars-d
On Friday, 4 December 2015 at 01:27:42 UTC, Andrei Alexandrescu 
wrote:
Consider the collections universe. So we have an imperative 
primitive like:


c.insertAfter(r, x)

where c is a collection, r is a range previously extracted from 
c, and x is a value convertible to collection's element type. 
The expression imperatively inserts x in the collection right 
after r.


Now this primitive may have three complexities:

* linear in the length of r (e.g. c is a singly-linked list)

* linear in the number of elements after r in the collection 
(e.g. c is an array)


* constant (c is a doubly-linked list)

These complexities must be reflected in the name of the 
primitives. Or perhaps it's okay to conflate a couple of them. 
I know names are something we're awfully good at discussing 
:o). Destroy!



Andrei


Why not create the capability for the user to query the complexity:

writeln(c.insertAfter(r, null).complexity);

If all algorithms implemented such a feature, one could have a 
compile-time argument to display the complexity of the algorithms 
along with profiling.


Complexity is not important except when profiling. It's also 
generally static, depending on the types rather than the values 
of the types.
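
A hedged sketch of that static flavor (insertAfterComplexity is 
an invented name, and the strings are only documentation):

import std.container : Array, DList, SList;

// Complexity keyed on the container type, known at compile time.
enum insertAfterComplexity(C : SList!T, T) = "O(walkLength(r))";
enum insertAfterComplexity(C : Array!T, T) = "O(n - position)";
enum insertAfterComplexity(C : DList!T, T) = "O(1)";

pragma(msg, insertAfterComplexity!(DList!int)); // prints O(1)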




Re: Complexity nomenclature

2015-12-07 Thread Timon Gehr via Digitalmars-d

On 12/07/2015 02:46 PM, Andrei Alexandrescu wrote:

On 12/7/15 5:14 AM, Timon Gehr wrote:

D does not allow overloading of syntax to the extent necessary to make
similar things really pleasant in the long run, and it has been
repeatedly argued that this is a good thing; that custom parsing should
be used instead. It is easy to align the parser with D syntax. Anyway,
syntax is not the problem here, and the implementation can be downgraded
to not support parsing and/or proper names at any point.


I fail to see how no parens after log or "^" in lieu of "^^" would make a
positive difference.


Neither have I attempted to show anything like that. All arguments in 
this debate are obvious and boring, so no need to discuss it.



What would be a few examples of things that won't
work pleasantly enough?
...


You gave this example:
http://en.cppreference.com/w/cpp/container/forward_list/insert_after

1-2) BigO("1")
3) BigO("count")
4) BigO("distance(first,last)")
5) BigO("ilist.size()")

There's also this:
On 12/06/2015 12:55 AM, Andrei Alexandrescu wrote:

Well you'd need multiple terms if you want to say things like,
"inserting a range into an array is O(array[].walkLength +
r.walkLength)." When you insert a range in a binary search tree, the
complexity would be O(log(array[].walkLength) * r.walkLength).


BigO("array[].walkLength + r.walkLength");
BigO("log(array[].walkLength) * r.walkLength");




Re: Complexity nomenclature

2015-12-07 Thread Andrei Alexandrescu via Digitalmars-d

On 12/7/15 5:14 AM, Timon Gehr wrote:

D does not allow overloading of syntax to the extent necessary to make
similar things really pleasant in the long run, and it has been
repeatedly argued that this is a good thing; that custom parsing should
be used instead. It is easy to align the parser with D syntax. Anyway,
syntax is not the problem here, and the implementation can be downgraded
to not support parsing and/or proper names at any point.


I fail to see how no parens after log or "^" in lieu of "^^" would make a 
positive difference. What would be a few examples of things that won't 
work pleasantly enough?


I'm not sure whether the DSL argument is well applied here. Clearly 
using D expressions for e.g. regex or SQL syntax would be at best 
avoided in favor of a DSL. In this case we're defining an algebra over 
restricted expressions, which are a subset of the usual mathematical 
expressions that D's syntax already emulates.


Looks like a debate on whether to choose one standard language vs. an 
obscure one (in this case invented ad-hoc) is starting. This is deja vu 
all over again :o).


I hope you won't mind if I give your idea a slightly different angle.


Andrei



Re: Complexity nomenclature

2015-12-07 Thread Andrei Alexandrescu via Digitalmars-d

On 12/07/2015 09:43 AM, Timon Gehr wrote:


1-2) BigO("1")
3) BigO("count")
4) BigO("distance(first,last)")
5) BigO("ilist.size()")

There's also this:
On 12/06/2015 12:55 AM, Andrei Alexandrescu wrote:

Well you'd need multiple terms if you want to say things like,
"inserting a range into an array is O(array[].walkLength +
r.walkLength)." When you insert a range in a binary search tree, the
complexity would be O(log(array[].walkLength) * r.walkLength).


BigO("array[].walkLength + r.walkLength");
BigO("log(array[].walkLength) * r.walkLength");


These are still expressible without a DSL: 
BigO(Atom("array[].walkLength") + Atom("r.walkLength")) etc.


Somewhat independently of DSL or not: At this point I'm unclear whether 
supporting free variables with arbitrary names is a good thing. The key 
to unleashing the power of BigO is to compute it when combining 
functions, and the names seem to be specific to the function and 
therefore not easy to combine.



Andrei



Re: Complexity nomenclature

2015-12-07 Thread Timon Gehr via Digitalmars-d

On 12/07/2015 02:07 AM, Andrei Alexandrescu wrote:

On 12/06/2015 07:50 PM, Timon Gehr wrote:

On 12/06/2015 08:48 PM, Andrei Alexandrescu wrote:

>The next step up the expressiveness scale would be to have a
>sum-of-products representation.
>
>Proof of concept (disclaimer: hacked together in the middle of the
>night, and not tested thoroughly):
>
>http://dpaste.dzfl.pl/d1512905accd
>
>I think this general approach is probably close to the sweet spot. ...
>

Brilliant! ...


I have noticed another thing. The comparison operator is an
underapproximation (it sometimes returns NaN when ordering would
actually be possible).

E.g., O(n·m) ⊆ O(n²+m²), because n·m ≤ n²+m².

Interesting. It would be nice if the final version had a complete
decision procedure for ⊆.


I think it's okay to leave N^^2 + N1 and N + N1^^2 unordered. -- Andrei


Yes, certainly. By complete, I mean it orders everything that can be 
ordered.


Re: Complexity nomenclature

2015-12-07 Thread Timon Gehr via Digitalmars-d

On 12/07/2015 02:05 AM, Andrei Alexandrescu wrote:

On 12/06/2015 06:21 PM, Timon Gehr wrote:

The implementation of power on BigO given does not actually work in
general (there should have been an enforcement that makes sure there is
only one summand). I'll think about whether and how it can be made to
work for multiple summands.


No matter, you may always use runtime assertions - after all it's all
during compilation. That's the beauty of it!
...


Even better: I was wrong when I claimed it does not work.
For 0<=x, it actually works as-is:

O((f+g)^x) = O(max(f,g)^x) = O(max(f^x,g^x)) = O(f^x+g^x).


Define global constants K, N, and N1 through N7. K is constant
complexity; all others are free variables.

Then complexities are regular D expressions, e.g. BigO(N^2 * log(N)).

On the phone, sorry for typos.


I generally tend to avoid DSLs based solely on operator overloading, as
they don't always work and hence are awkward to evolve. Here, the
biggest current nuisance is that the variable names are not very
descriptive.


The bright side is you get to standardize names. If you limit names to
K, N, and N1 through N7 then you can always impose on APIs the meaning
of these.
...


Well, how?


Another bright side is people don't need to learn a new, juuust slightly
different grammar for complexity expressions - just use D. For example
the grammar you defined allows log n without parens - what's the
priority of log compared to power etc?


There is no priority. The parser explicitly rejects log n^x. :-)


Why is power ^ in this notation and ^^ in D?


Because ^ is taken for bitwise xor in D due to backwards-compatibility 
constraints.



All these differences without a distinction are gratuitous.
Just use D.
...


D does not allow overloading of syntax to the extent necessary to make 
similar things really pleasant in the long run, and it has been 
repeatedly argued that this is a good thing; that custom parsing should 
be used instead. It is easy to align the parser with D syntax. Anyway, 
syntax is not the problem here, and the implementation can be downgraded 
to not support parsing and/or proper names at any point.




Re: Complexity nomenclature

2015-12-07 Thread Timon Gehr via Digitalmars-d

On 12/07/2015 05:15 PM, Andrei Alexandrescu wrote:

On 12/07/2015 09:43 AM, Timon Gehr wrote:


1-2) BigO("1")
3) BigO("count")
4) BigO("distance(first,last)")
5) BigO("ilist.size()")

There's also this:
On 12/06/2015 12:55 AM, Andrei Alexandrescu wrote:

Well you'd need multiple terms if you want to say things like,
"inserting a range into an array is O(array[].walkLength +
r.walkLength)." When you insert a range in a binary search tree, the
complexity would be O(log(array[].walkLength) * r.walkLength).


BigO("array[].walkLength + r.walkLength");
BigO("log(array[].walkLength) * r.walkLength");


These are still expressible without a DSL:
BigO(Atom("array[].walkLength") + Atom("r.walkLength")) etc.
...


This just goes from one string to two strings and adds some noise on top 
of it. Also, now, what is D and what is in a string is arbitrary.


BigO(Var.array[].walkLength + Var.r.walkLength);


Somewhat independently of DSL or not:


Yes, notation does not matter at this stage.


At this point I'm unclear whether
supporting free variables with arbitrary names is a good thing.


Restricting the set of names to 8 predefined ones does not simplify 
anything. It just means that a mapping of predefined names to real names 
has to be carefully managed manually and clashes have to be avoided, all 
while limiting the value of BigO as documentation.



The key
to unleashing the power of BigO is to compute it when combining
functions, and the names seem to be specific to the function and
therefore not easy to combine.
...


Just substitute something else for those names to get rid of them. That 
is how passing of parameters is handled. Probably we should get some 
actual examples where combination should work to see what could be done.


If you have:

void foo(int n,int m) @BigO("n*m^2"){ ... }

Something like this is easy to implement:

// compute runtime bound from bounds on parameters:
// first expression passed is O(x^3) and second expression passed
// is O(log(y)^(1/2)).
enum bound = getRuntimeBound!foo(BigO("x^3"),BigO("log(y)^(1/2)"));
static assert(bound == BigO("x^3*log(y)"));

Note how the end user does not need to worry much about names.


Re: Complexity nomenclature

2015-12-06 Thread Timon Gehr via Digitalmars-d

On 12/06/2015 08:59 AM, Ola Fosheim Grøstad wrote:

On Sunday, 6 December 2015 at 03:24:24 UTC, Timon Gehr wrote:

Some upper bounds are incomparable, so there would not be a total
order. But that is not a problem.


It is a problem in all cases, as you usually don't have an optimal bound.


Are you complaining that the given implementation does not support 
'min', or what are you trying to say here?



And with your approach that will most certainly be guaranteed to happen.


How is that specific to my approach? I only showed a more expressive 
BigO implementation.



Comparing bounds does not mean you are comparing running time.
...


BigO represents a set of functions. Comparing BigO checks for subset 
inclusion.



O(1) implies O(f(x)), O(N) implies O(N^2).



For our purposes, O(f) = { g | ∃c. ∀x⃗. |g(x⃗)| ≤ c·|f(x⃗)|+c }.

O(1) ⊆ O(f(x)), O(N) ⊆ O(N²).

<= checks ⊆.
== checks =.

(The final implementation should use exact fractions instead of doubles.)
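
A hedged sketch of the subset check for the simplest case, single 
monomials, under the usual assumption that all variables are ≥ 1 
(the proof-of-concept implementation elsewhere in the thread is 
more general):

// One monomial n1^e1 * n2^e2 * ..., exponents keyed by variable name.
struct Monomial { double[string] exps; }

// a ⊆ b iff every variable's exponent in a is at most that in b.
bool subsetOf(Monomial a, Monomial b)
{
    foreach (v, e; a.exps)
        if (e > b.exps.get(v, 0.0))
            return false;
    return true;
}

unittest
{
    auto n  = Monomial(["n": 1.0]);           // O(n)
    auto n2 = Monomial(["n": 2.0]);           // O(n^2)
    auto nm = Monomial(["n": 1.0, "m": 1.0]); // O(n*m)
    assert(subsetOf(n, n2));   // O(n) ⊆ O(n^2)
    assert(!subsetOf(n2, n));
    assert(!subsetOf(nm, n2)); // term-wise test cannot order these
}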


You can also get tighter bounds for specific input models.



Yes, you can.



Re: Complexity nomenclature

2015-12-06 Thread Timon Gehr via Digitalmars-d

On 12/06/2015 03:39 PM, Timon Gehr wrote:


For our purposes, O(f) = { g | ∃c. ∀x⃗. |g(x⃗)| ≤ c·|f(x⃗)|+c }.


Hm, this does not actually work too well. E.g., we want

O(n+m) ⊆ O(n log m + m log n).

This breaks down with that definition if we e.g. fix m=1 and let n vary.

Anyway, I think this can be fixed.


Re: Complexity nomenclature

2015-12-06 Thread Ola Fosheim Grøstad via Digitalmars-d

On Sunday, 6 December 2015 at 03:24:24 UTC, Timon Gehr wrote:
Some upper bounds are incomparable, so there would not be a 
total order. But that is not a problem.


It is a problem in all cases, as you usually don't have an 
optimal bound. And with your approach that will most certainly be 
guaranteed to happen. Comparing bounds does not mean you are 
comparing running time.


O(1) implies O(f(x)), O(N) implies O(N^2).

You can also get tighter bounds for specific input models.



Re: Complexity nomenclature

2015-12-06 Thread Timon Gehr via Digitalmars-d

On 12/06/2015 08:48 PM, Andrei Alexandrescu wrote:

>The next step up the expressiveness scale would be to have a
>sum-of-products representation.
>
>Proof of concept (disclaimer: hacked together in the middle of the
>night, and not tested thoroughly):
>
>http://dpaste.dzfl.pl/d1512905accd
>
>I think this general approach is probably close to the sweet spot. ...
>

Brilliant! ...


I have noticed another thing. The comparison operator is an 
underapproximation (it sometimes returns NaN when ordering would 
actually be possible).


E.g., O(n·m) ⊆ O(n²+m²), because n·m ≤ n²+m².

Interesting. It would be nice if the final version had a complete 
decision procedure for ⊆.


Re: Complexity nomenclature

2015-12-06 Thread Andrei Alexandrescu via Digitalmars-d

On 12/06/2015 06:21 PM, Timon Gehr wrote:

The implementation of power on BigO given does not actually work in
general (there should have been an enforcement that makes sure there is
only one summand). I'll think about whether and how it can be made to
work for multiple summands.


No matter, you may always use runtime assertions - after all it's all 
during compilation. That's the beauty of it!



Define global constants K, N, and N1 through N7. K is constant complexity;
all others are free variables.

Then complexities are regular D expressions, e.g. BigO(N^2 * log(N)).

On the phone, sorry for typos.


I generally tend to avoid DSLs based solely on operator overloading, as
they don't always work and hence are awkward to evolve. Here, the
biggest current nuisance is that the variable names are not very
descriptive.


The bright side is you get to standardize names. If you limit names to 
K, N, and N1 through N7 then you can always impose on APIs the meaning 
of these.


Another bright side is people don't need to learn a new, juuust slightly 
different grammar for complexity expressions - just use D. For example 
the grammar you defined allows log n without parens - what's the 
priority of log compared to power etc? Why is power ^ in this notation 
and ^^ in D? All these differences without a distinction are gratuitous. 
Just use D.



If we'll go with a log(BigO) function, we possibly want to
make BigO closed under log without approximating iterated logarithms:

struct Term{
    Variable n;
    Fraction[] exponents; // exponents[0] is exponent of n,
                          // exponents[1] is exponent of log n,
                          // exponents[2] is exponent of log log n, ...
}

Then O(log(f^c*g^d)) = O(log(f)+log(g)) = O(log(f+g)) [1], and hence
every BigO has a well-defined logarithm.


[1] O(log(f+g)) ⊆ O(log(f*g)) = O(log(f)+log(g)).
 O(log(f)+log(g)) ⊆ O(max(log(f),log(g)))
  = O(log(max(f,g)))
  ⊆ O(log(f+g)).


Yah, the stuff under log must be restricted. Here's the grammar I'm 
toying with:


Atom ::= K | N | N1 | ... | N7
SimpleExp ::= SimpleTerm ('+' SimpleTerm)*
SimpleTerm ::= SimpleFactor ('*' SimpleFactor)*
SimpleFactor ::= Atom ('^^' double)? | '(' SimpleExp ')'
BigO ::= Term ('+' Term)*
Term ::= SimpleFactor ('*' 'log' '(' SimpleExp ')' ('^^' double)?)?

(I used regex notations for "optional" and "zero or more".)

This is expressible with D's native operations (so no need for custom 
parsing) and covers, I think, what we need. It could be further 
simplified if we move some of the grammar's restrictions to runtime 
(e.g. no need for SimpleXxx, some expressions can be forced to be simple 
during "runtime").



Andrei



Re: Complexity nomenclature

2015-12-06 Thread Timon Gehr via Digitalmars-d

On 12/06/2015 08:48 PM, Andrei Alexandrescu wrote:

>
>The next step up the expressiveness scale would be to have a
>sum-of-products representation.
>
>Proof of concept (disclaimer: hacked together in the middle of the
>night, and not tested thoroughly):
>
>http://dpaste.dzfl.pl/d1512905accd
>
>I think this general approach is probably close to the sweet spot. ...
>

Brilliant! My wife has a work emergency so I've been with the kids all day,
but here's what can be done to make this simpler.

Use D parsing and eliminate the whole parsing routine. Add, multiply, and
power are all defined, so you only need log of BigO.
...


The implementation of power on BigO given does not actually work in 
general (there should have been an enforcement that makes sure there is 
only one summand). I'll think about whether and how it can be made to 
work for multiple summands.



Define global constants K, N, and N1 through N7. K is constant complexity;
all others are free variables.

Then complexities are regular D expressions, e.g. BigO(N^2 * log(N)).

On the phone, sorry for typos.


I generally tend to avoid DSLs based solely on operator overloading, as 
they don't always work and hence are awkward to evolve. Here, the 
biggest current nuisance is that the variable names are not very 
descriptive. If we'll go with a log(BigO) function, we possibly want to 
make BigO closed under log without approximating iterated logarithms:


struct Term{
    Variable n;
    Fraction[] exponents; // exponents[0] is exponent of n,
                          // exponents[1] is exponent of log n,
                          // exponents[2] is exponent of log log n, ...
}

Then O(log(f^c*g^d)) = O(log(f)+log(g)) = O(log(f+g)) [1], and hence 
every BigO has a well-defined logarithm.



[1] O(log(f+g)) ⊆ O(log(f*g)) = O(log(f)+log(g)).
O(log(f)+log(g)) ⊆ O(max(log(f),log(g)))
 = O(log(max(f,g)))
 ⊆ O(log(f+g)).


Re: Complexity nomenclature

2015-12-06 Thread Timon Gehr via Digitalmars-d

On 12/06/2015 10:35 PM, Ola Fosheim Grøstad wrote:

On Sunday, 6 December 2015 at 14:39:05 UTC, Timon Gehr wrote:

Are you complaining that the given implementation does not support
'min', or what are you trying to say here?


I am saying that comparing bounds is not the same as comparing running
time of implemented algorithms.


Yes, that's what you said later down the post. It's completely unrelated 
to the sentence you claimed was false.




Re: Complexity nomenclature

2015-12-06 Thread Ola Fosheim Grøstad via Digitalmars-d

On Sunday, 6 December 2015 at 23:49:00 UTC, Timon Gehr wrote:
Yes, that's what you said later down the post. It's completely 
unrelated to the sentence you claimed was false.


I would assume that there would have to be a use case for 
something added to a standard library. Based on the presented use 
case, it is like using the classifications "younger than 100" and 
"younger than 16", applying them randomly to individuals of the 
same age, and using the classifications to make decisions about 
whether they should be allowed to see adult movies or not. 
Putting an ordering on the classification is in that case useless.


Re: Complexity nomenclature

2015-12-06 Thread Andrei Alexandrescu via Digitalmars-d

On 12/06/2015 07:50 PM, Timon Gehr wrote:

On 12/06/2015 08:48 PM, Andrei Alexandrescu wrote:

>The next step up the expressiveness scale would be to have a
>sum-of-products representation.
>
>Proof of concept (disclaimer: hacked together in the middle of the
>night, and not tested thoroughly):
>
>http://dpaste.dzfl.pl/d1512905accd
>
>I think this general approach is probably close to the sweet spot. ...
>

Brilliant! ...


I have noticed another thing. The comparison operator is an
underapproximation (it sometimes returns NaN when ordering would
actually be possible).

E.g., O(n·m) ⊆ O(n²+m²), because n·m ≤ n²+m².

Interesting. It would be nice if the final version had a complete
decision procedure for ⊆.


I think it's okay to leave N^^2 + N1 and N + N1^^2 unordered. -- Andrei


Re: Complexity nomenclature

2015-12-06 Thread Andrei Alexandrescu via Digitalmars-d
Timon Gehr  wrote:
> On 12/05/2015 09:48 PM, Andrei Alexandrescu wrote:
>> On 12/04/2015 10:24 PM, Timon Gehr wrote:
>>> void foo(@(BigO.linear) int n,@(BigO.linear) int m);
>>> 
>>> But UDAs for parameters are not supported.
>> 
>> That's actually pretty neat and easy to work around like this:
>> 
>> void foo(int n, int m) @(BigOParam!2(BigO.linear, BigO.linear);
>> 
>> In fact I went through the implementation but soon I hit a wall: what's
>> the _relationship_ between the two growths? It may be the sum O(m + n)
>> but also the product O(m * n). So the operator must be encoded as well.
>> 
>> Then what do we do with more complex relationships like O((m + n) log n)
>> etc.
>> 
>> Then once you get to some reasonable formula, what's the ordering on top
>> of these complexities? Probably difficult.
>> ...
> 
> Some upper bounds are incomparable, so there would not be a total order. 
> But that is not a problem.
> 
>> I gave up on this for the time being. Ideas welcome.
>> ...
> 
> The next step up the expressiveness scale would be to have a 
> sum-of-products representation.
> 
> Proof of concept (disclaimer: hacked together in the middle of the 
> night, and not tested thoroughly):
> 
> http://dpaste.dzfl.pl/d1512905accd
> 
> I think this general approach is probably close to the sweet spot. (The 
> implementation is not feature-complete yet, though. It would be nice if 
> it supported automatically computing a new asymptotic runtime bound from 
> asymptotic bounds on the arguments.)
> 

Brilliant! My wife has a work emergency so I've been with the kids all day,
but here's what can be done to make this simpler. 

Use D parsing and eliminate the whole parsing routine. Add, multiply, and
power are all defined, so you only need log of BigO.

Define global constants K, N, and N1 through N7. K is constant complexity;
all others are free variables.

Then complexities are regular D expressions, e.g. BigO(N^2 * log(N)).

On the phone, sorry for typos.


Re: Complexity nomenclature

2015-12-06 Thread Ola Fosheim Grøstad via Digitalmars-d

On Sunday, 6 December 2015 at 14:39:05 UTC, Timon Gehr wrote:
Are you complaining that the given implementation does not 
support 'min', or what are you trying to say here?


I am saying that comparing bounds is not the same as comparing 
the running time of implemented algorithms. Insertion sort is 
both O(n^2) and O(n^3), but if you run it on a sorted array where 
each element has been swapped with neighbouring elements 16 
times, then it is O(N).
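
For reference, a plain insertion sort; the O(N) claim rests on 
its adaptivity: if every element is at most k positions from its 
final place, the inner loop shifts at most k times per element, 
so the whole run is O(k*N) = O(N) for fixed k.

void insertionSort(int[] a)
{
    foreach (i; 1 .. a.length)
    {
        immutable x = a[i];
        size_t j = i;
        // Shift larger elements right; on nearly sorted input
        // this loop takes only a bounded number of steps.
        for (; j > 0 && a[j - 1] > x; --j)
            a[j] = a[j - 1];
        a[j] = x;
    }
}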


So these derived bounds are too loose to be useful; generic 
algorithms cannot make use of them beyond the trivial case.


BigO represents a set of functions. Comparing BigO checks for 
subset inclusion.


But what can you use it for? When you compose algorithms and even 
run an optimizer over them, combining an O(N^2) with an O(N) can 
turn into O(1). You need advanced compiler support for this to be 
valuable.



You can also get tighter bounds for specific input models.



Yes, you can.


Exactly, and when you compose/combine algorithms you often end up 
constraining the input model.




Re: Complexity nomenclature

2015-12-05 Thread BLM768 via Digitalmars-d

On Sunday, 6 December 2015 at 03:30:51 UTC, Timon Gehr wrote:

log(x^2) = 2 log x.


Why do log rules have to make everything so simple? ;)


Re: Complexity nomenclature

2015-12-05 Thread Dmitry Olshansky via Digitalmars-d

On 05-Dec-2015 01:48, Andrei Alexandrescu wrote:

On 12/04/2015 03:43 PM, Walter Bright wrote:

On 12/4/2015 10:49 AM, Dmitry Olshansky wrote:

Was vaguely terrified reading this whole thread until hitting this
gem. Seems
like a creative use for UDA.


Yeah, I think it puts UDA on the map!

Instead of string arguments, though, I suggest enums.


"Even better".

http://dpaste.dzfl.pl/50340e189f92


Better still - no more strings:

http://dpaste.dzfl.pl/5ed2b3ad3043



That code is actually remarkably complete, with algebra and remarkable
constants.

Pretty crazy, yo. Thanks all for the great ideas. D is a remarkable
language.


Andrei



--
Dmitry Olshansky


Re: Complexity nomenclature

2015-12-05 Thread Andrei Alexandrescu via Digitalmars-d

On 12/05/2015 04:52 PM, BLM768 wrote:


Well, we could see how CAS libraries handle this kind of stuff (if there
_is_ one which handles it)...


CAS = ?


Really, though, with the more complex algorithms, you start running into
the kinds of issues noted in the first reply to this article:
http://www.perlmonks.org/?node_id=573138

Besides, is anyone actually going to specify that they need an algorithm
that is O(log(n) + m^2) or whatever? Or would a function just take _two_
algorithms, assert that each is O(log(n)) and O(m^2), respectively, and
then compose them itself? The complexities depend on the types of
container anyway, so in general, you should get only multi-term big-O
notation when you're using multiple containers (or Cartesian products of
a container with itself or something like that). Can't we just make sure
one container's insert() and the other container's sort() work within a
certain complexity?


Well you'd need multiple terms if you want to say things like, 
"inserting a range into an array is O(array[].walkLength + 
r.walkLength)." When you insert a range in a binary search tree, the 
complexity would be O(log(array[].walkLength) * r.walkLength).


Part of the art here, I feel, is knowing when to stop going too crazy 
about this. At this point we have a nice, round system that's simple to 
understand and use. Making it more complex should come only with a 
corresponding growth in power.



Andrei



Re: Complexity nomenclature

2015-12-05 Thread Timon Gehr via Digitalmars-d

On 12/06/2015 12:55 AM, Andrei Alexandrescu wrote:

On 12/05/2015 04:52 PM, BLM768 wrote:


Well, we could see how CAS libraries handle this kind of stuff (if there
_is_ one which handles it)...


CAS = ?
...


Computer Algebra System, I assume.


Re: Complexity nomenclature

2015-12-05 Thread tsbockman via Digitalmars-d
On Saturday, 5 December 2015 at 23:55:16 UTC, Andrei Alexandrescu 
wrote:

On 12/05/2015 04:52 PM, BLM768 wrote:


Well, we could see how CAS libraries handle this kind of stuff 
(if there

_is_ one which handles it)...


CAS = ?


https://en.wikipedia.org/wiki/Computer_algebra_system


Re: Complexity nomenclature

2015-12-05 Thread Andrei Alexandrescu via Digitalmars-d

On 12/04/2015 10:24 PM, Timon Gehr wrote:

void foo(@(BigO.linear) int n,@(BigO.linear) int m);

But UDAs for parameters are not supported.


That's actually pretty neat and easy to work around like this:

void foo(int n, int m) @(BigOParam!2(BigO.linear, BigO.linear);

In fact I went through the implementation but soon I hit a wall: what's 
the _relationship_ between the two growths? It may be the sum O(m + n) 
but also the product O(m * n). So the operator must be encoded as well.


Then what do we do with more complex relationships like O((m + n) log n) 
etc.


Then once you get to some reasonable formula, what's the ordering on top 
of these complexities? Probably difficult.


I gave up on this for the time being. Ideas welcome.


Andrei



Re: Complexity nomenclature

2015-12-05 Thread BLM768 via Digitalmars-d
On Saturday, 5 December 2015 at 20:48:16 UTC, Andrei Alexandrescu 
wrote:

On 12/04/2015 10:24 PM, Timon Gehr wrote:
In fact I went through the implementation but soon I hit a 
wall: what's the _relationship_ between the two growths? It may 
be the sum O(m + n) but also the product O(m * n). So the 
operator must be encoded as well.


Then what do we do with more complex relationships like O((m + 
n) log n) etc.


Then once you get to some reasonable formula, what's the 
ordering on top of these complexities? Probably difficult.


I gave up on this for the time being. Ideas welcome.

Andrei


Well, we could see how CAS libraries handle this kind of stuff 
(if there _is_ one which handles it)...


Really, though, with the more complex algorithms, you start 
running into the kinds of issues noted in the first reply to this 
article: http://www.perlmonks.org/?node_id=573138


Besides, is anyone actually going to specify that they need an 
algorithm that is O(log(n) + m^2) or whatever? Or would a 
function just take _two_ algorithms, assert that each is 
O(log(n)) and O(m^2), respectively, and then compose them itself? 
The complexities depend on the types of container anyway, so in 
general, you should get only multi-term big-O notation when 
you're using multiple containers (or Cartesian products of a 
container with itself or something like that). Can't we just make 
sure one container's insert() and the other container's sort() 
work within a certain complexity?


Re: Complexity nomenclature

2015-12-05 Thread H. S. Teoh via Digitalmars-d
On Sat, Dec 05, 2015 at 09:52:08PM +, BLM768 via Digitalmars-d wrote:
> On Saturday, 5 December 2015 at 20:48:16 UTC, Andrei Alexandrescu wrote:
> >On 12/04/2015 10:24 PM, Timon Gehr wrote:
> >In fact I went through the implementation but soon I hit a wall:
> >what's the _relationship_ between the two growths? It may be the sum
> >O(m + n) but also the product O(m * n). So the operator must be
> >encoded as well.
> >
> >Then what do we do with more complex relationships like O((m + n) log
> >n) etc.
> >
> >Then once you get to some reasonable formula, what's the ordering on
> >top of these complexities? Probably difficult.
> >
> >I gave up on this for the time being. Ideas welcome.
> >
> >Andrei
> 
> Well, we could see how CAS libraries handle this kind of stuff (if
> there _is_ one which handles it)...
> 
> Really, though, with the more complex algorithms, you start running
> into the kinds of issues noted in the first reply to this article:
> http://www.perlmonks.org/?node_id=573138
> 
> Besides, is anyone actually going to specify that they need an
> algorithm that is O(log(n) + m^2) or whatever? Or would a function
> just take _two_ algorithms, assert that each is O(log(n)) and O(m^2),
> respectively, and then compose them itself? The complexities depend on
> the types of container anyway, so in general, you should get only
> multi-term big-O notation when you're using multiple containers (or
> Cartesian products of a container with itself or something like that).
> Can't we just make sure one container's insert() and the other
> container's sort() work within a certain complexity?

Multi-term complexities arise from trivial graph algorithms. They are
not limited to the use of multiple containers. More precisely, the
multiple terms arise because of the structure of the graph (being
composed of nodes and edges); what the algorithms add are functions on
nodes and edges.

Unfortunately, once you have more than a single variable in your
functions, the big-O expressions rapidly become inextricably complex,
and can no longer map to nice abstractions like the reals + infinities +
infinitesimals linear ordering. For example, two graph algorithms may
be, respectively, O(V^2 + E) and O(V + E^2); it's difficult to judge
based on that which one is "superior".


T

-- 
They say that "guns don't kill people, people kill people." Well I think the 
gun helps. If you just stood there and yelled BANG, I don't think you'd kill 
too many people. -- Eddie Izzard, Dressed to Kill


Re: Complexity nomenclature

2015-12-05 Thread Timon Gehr via Digitalmars-d

On 12/05/2015 09:48 PM, Andrei Alexandrescu wrote:

On 12/04/2015 10:24 PM, Timon Gehr wrote:

void foo(@(BigO.linear) int n,@(BigO.linear) int m);

But UDAs for parameters are not supported.


That's actually pretty neat and easy to work around like this:

void foo(int n, int m) @(BigOParam!2(BigO.linear, BigO.linear);

In fact I went through the implementation but soon I hit a wall: what's
the _relationship_ between the two growths? It may be the sum O(m + n)
but also the product O(m * n). So the operator must be encoded as well.

Then what do we do with more complex relationships like O((m + n) log n)
etc.

Then once you get to some reasonable formula, what's the ordering on top
of these complexities? Probably difficult.
...


Some upper bounds are incomparable, so there would not be a total order. 
But that is not a problem.



I gave up on this for the time being. Ideas welcome.
...


The next step up the expressiveness scale would be to have a 
sum-of-products representation.


Proof of concept (disclaimer: hacked together in the middle of the 
night, and not tested thoroughly):


http://dpaste.dzfl.pl/d1512905accd

I think this general approach is probably close to the sweet spot. (The 
implementation is not feature-complete yet, though. It would be nice if 
it supported automatically computing a new asymptotic runtime bound from 
asymptotic bounds on the arguments.)


Re: Complexity nomenclature

2015-12-05 Thread BLM768 via Digitalmars-d

On Saturday, 5 December 2015 at 22:56:23 UTC, H. S. Teoh wrote:
Multi-term complexities arise from trivial graph algorithms. 
They are not limited to the use of multiple containers. More 
precisely, the multiple terms arise because of the structure of 
the graph (being composed of nodes and edges); what the 
algorithms add are functions on nodes and edges.


True. I don't expect that very many people would want to specify 
an algorithmic complexity for those operations, though. It seems 
much more rigidly defined than for lists, arrays, etc. The 
question is not really about whether "complex complexities" will 
exist but whether the user has a practical reason to care about 
specifying them.


Unfortunately, once you have more than a single variable in 
your functions, the big-O expressions rapidly become 
inextricably complex, and can no longer map to nice 
abstractions like the reals + infinities + infinitesimals 
linear ordering. For example, two graph algorithms may be, 
respectively, O(V^2 + E) and O(V + E^2); it's difficult to 
judge based on that which one is "superior".


Well, if one variable is "capped" (or at least expected to be 
"small") it's easy enough to eliminate that term. Beyond that, 
there isn't necessarily a single ordering of big-O expressions, 
but many of them can be ordered relative to a single variable. 
For instance, O(n + m^2) is less complex than O(n^2 + m) with 
respect to n (and vice versa for m). It's trickier when 
expressions are more deeply nested, but if you select one term 
(in this case, n), set all the others to be constant (since none 
of them depends on n), and evaluate the resulting expression 
tree, you should get something half-usable. Some simplifying 
rules may be useful. For instance, log(x^2) should approach 
log(x) as x approaches infinity, I think. (My calculus is a bit 
rusty.)


Re: Complexity nomenclature

2015-12-05 Thread Timon Gehr via Digitalmars-d

On 12/06/2015 03:47 AM, BLM768 wrote:

log(x^2) should approach log(x) as x approaches infinity, I think. (My
calculus is a bit rusty.)


log(x^2) = 2 log x.


Re: Complexity nomenclature

2015-12-04 Thread Ola Fosheim Grøstad via Digitalmars-d

On Friday, 4 December 2015 at 06:05:55 UTC, Tofu Ninja wrote:
Also maybe a simpler idea would just be to annotate the 
operations with their complexity using UDAs. That way things 
that really care about the complexity can get it, and those who 
don't can ignore it. It has the benefit of being self 
documenting as well.


Yes, but can you export it reliably in generically composed 
functions?


Say you have a loop of N iterations calling a function that 
involves M operations, you want to export N*M, but then the 
caller needs to know what N and M are referring to, so you need 
some standard way of getting there. (e.g. N could be number of 
edges and M could be number of nodes or something).




Re: Complexity nomenclature

2015-12-04 Thread Jakob Ovrum via Digitalmars-d

On Friday, 4 December 2015 at 06:05:55 UTC, Tofu Ninja wrote:
When would a generic algorithm even need to know the complexity 
of the container?


A generic algorithm that uses exponential time when given the 
"wrong" inputs is a hidden danger. This can easily happen when 
composing with linear algorithms. This naming scheme, also 
employed by std.container, prevents this.


Also maybe a simpler idea would just be to annotate the 
operations with their complexity using UDAs. That way things 
that really care about the complexity can get it, and those who 
don't can ignore it. It has the benefit of being self 
documenting as well.


If you don't care about the complexity, why are you using 
collections? Just use T[] and T[U]. Surely choosing a data 
structure is all about complexity (barring some details like 
optimizing for cache locality or small data structures).





Re: Complexity nomenclature

2015-12-04 Thread ZombineDev via Digitalmars-d

On Friday, 4 December 2015 at 06:05:55 UTC, Tofu Ninja wrote:
On Friday, 4 December 2015 at 03:37:10 UTC, Andrei Alexandrescu 
wrote:

On 12/3/15 10:29 PM, Jack Stouffer wrote:
On Friday, 4 December 2015 at 02:21:12 UTC, Andrei 
Alexandrescu wrote:

On 12/03/2015 09:10 PM, Idan Arye wrote:
The complexities of the operations is a property of the 
data structure
being used. If each collection type will have it's own set 
of method
names based on the complexity of operations on it, we won't 
be able to
have templated functions that operate on any kind of 
collection(or at
the very least, these functions will be really tedious to 
code).


Your premise is right but you reach the negation of the 
correct

conclusion. -- Andrei


How so? If a singly linked list and a doubly linked list have 
two
different method names for the same operation, then they 
cannot be

easily templated.


Took me a while to figure. There's a hierarchy of operations, 
e.g. if a collection implements insert, it automatically 
implements linearInsert. And so on. The collections framework 
provides these defaults, so client code that needs quick 
insert uses insert, whereas code that's fine with a linear 
upper bound uses linearInsert and captures both.


Another way to look at it: in STL container-independent code 
is near impossible because different containers use the same 
signature for operations that are different (either wrt 
iterator invalidation or complexity). My design avoids that by 
giving distinct operations distinct names.



Andrei


This sounds really complicated to use? What are the benefits?
When would a generic algorithm even need to know the complexity 
of the container?


Also maybe a simpler idea would just be to annotate the 
operations with their complexity using UDAs. That way things 
that really care about the complexity can get it, and those who 
don't can ignore it. It has the benefit of being self 
documenting as well.


Ranges are a good example - they provide only the operations they 
can efficiently implement. For example, forward ranges could 
provide random access, but only at the cost of linear running time.


std.container provides the `elem in container` operation only if 
it can be implemented in logarithmic time or faster.


The fact that you can't use opIn for DList or Array is very good, 
because it prevents you from writing inefficient generic code. 
Your algorithm will manifest that it can only work with 
containers that provide efficient operations, and because of that 
you can be sure what its time complexity will be.


You should choose a specific data structure only because you can 
efficiently implement the algorithm you have in mind with it.


One of the worst examples of the opposite is the .Count extension 
method in .NET (which happens to have the same name as the .Count 
member function of ICollection - one of the most used, often 
implicitly, interfaces), which has linear running time. The most 
horrible thing I have seen!


Re: Complexity nomenclature

2015-12-04 Thread Ola Fosheim Grøstad via Digitalmars-d

On Friday, 4 December 2015 at 01:55:15 UTC, Timon Gehr wrote:
awfully popular outside of actual complexity theory papers. I 
don't really understand what calling it 'complexity' actually 
buys though.


It will make static arrays look very good. All operations are 
O(1).


Re: Complexity nomenclature

2015-12-04 Thread Ola Fosheim Grøstad via Digitalmars-d

On Friday, 4 December 2015 at 09:09:35 UTC, ZombineDev wrote:
and space complexity (the correctness is given). Otherwise, 
designing large systems becomes impossible, because all large 
systems have hard performance requirements.


I am sorry to say this, but hard performance requirements require 
O(1) on everything.


Big-Oh tells you essentially very little if it is more than O(1). 
Quick sort and insertion sort are O(N^2), bucket sort is O(N). 
That does not make bucket sort faster than quick sort or even 
insertion sort in a specific case, as it is dominated by the 
number of possible values.





Re: Complexity nomenclature

2015-12-04 Thread Ola Fosheim Grøstad via Digitalmars-d
On Friday, 4 December 2015 at 09:17:07 UTC, Ola Fosheim Grøstad 
wrote:

On Friday, 4 December 2015 at 09:09:35 UTC, ZombineDev wrote:
and space complexity (the correctness is given). Otherwise, 
designing large systems becomes impossible, because all large 
systems have hard performance requirements.


I am sorry to say this, but hard performance requirements 
require O(1) on everything.


And even then you cannot assume that it is real time as the 
following is O(1):


if (full_moon()) sleep(1000);

So you can have an O(1) algorithm that occasionally triggers an 
expensive constant-time/memory path that causes big problems in a 
running system. You need a hard upper bound measured in cycles.




Re: Complexity nomenclature

2015-12-04 Thread Andrea Fontana via Digitalmars-d

On Friday, 4 December 2015 at 01:37:33 UTC, Jack Stouffer wrote:
On Friday, 4 December 2015 at 01:27:42 UTC, Andrei Alexandrescu 
wrote:
These complexities must be reflected in the name of the 
primitives.
I don't see why. IMO, names should convey what the function 
does, not how it does it.


I agree. It sounds like a bad idea.

And who knows, maybe someone will discover a more efficient way 
to implement "insertAfter" or the collection itself... Should we 
then change its name?








Re: Complexity nomenclature

2015-12-04 Thread Ola Fosheim Grøstad via Digitalmars-d

On Friday, 4 December 2015 at 10:35:33 UTC, ZombineDev wrote:
However, here we're talking about APIs for containers and 
algorithms, and I think that putting the complexity in the 
function signature is better than only using abstract data 
structures, because it makes you more aware of the consequences 
of the choice of data structures and algorithms that you make. 
If we can figure out a way to include even more information, it 
may be even better.


Well, if the algorithm can choose between containers, it might be 
interesting to query the characteristics of a set of operations 
on each container. I'm not sure the method name is the best way 
to do it, although I guess having a generic find(x) and 
find_instantly(x) would be OK, if the latter version basically 
means less than 200 cycles in all cases. If it means less than 
2000 cycles, I think it is not so useful.


Maybe it is possible to do the inference as static analysis and, 
in addition, make it possible for libraries to express various 
guarantees. Since it is worst case, it would be acceptable for a 
failed analysis to result in "unbounded".




Re: Complexity nomenclature

2015-12-04 Thread ZombineDev via Digitalmars-d

On Friday, 4 December 2015 at 07:50:19 UTC, Minas Mina wrote:

On Friday, 4 December 2015 at 03:46:39 UTC, Walter Bright wrote:

On 12/3/2015 5:27 PM, Andrei Alexandrescu wrote:

Now this primitive may have three complexities:

* linear in the length of r (e.g. c is a singly-linked list)

* linear in the number of elements after r in the collection 
(e.g. c is an array)


* constant (c is a doubly-linked list)

These complexities must be reflected in the name of the 
primitives. Or perhaps
it's okay to conflate a couple of them. I know names are 
something we're awfully

good at discussing :o). Destroy!


Are you sure there are only 3 complexities? What if it's a 
self-balancing tree?


I'm also not sure it is a good idea to put the complexity in 
the name of the primitive. Do any other algorithm libraries do 
that?


I agree -- putting the complexity (eh, running time) in the 
primitive's name seems like a bad idea.


I believe that stability is a more important "feature" of a 
sorting algorithm than its running time, because you can rely on 
it in your own algorithms (I use it for implementing delta 
compression) and it affects the output.


Running time is still a good guarantee to have, but whether 
`sort` runs in O(n^2) or O(n log n) is not going to affect what 
your output looks like!


That's wrong. Imagine a sort method that calls an API server to 
sort the numbers. Given a good internet connection, this 
implementation detail will not affect the correctness of the 
result. But I'm sure that no sane person would want to use this 
method. When the only job of a function is to implement an 
algorithm, then the only thing you should care about is the time 
and space complexity (the correctness is given). Otherwise, 
designing large systems becomes impossible, because all large 
systems have hard performance requirements.


Re: Complexity nomenclature

2015-12-04 Thread ZombineDev via Digitalmars-d
On Friday, 4 December 2015 at 09:17:07 UTC, Ola Fosheim Grøstad 
wrote:

On Friday, 4 December 2015 at 09:09:35 UTC, ZombineDev wrote:
and space complexity (the correctness is given). Otherwise, 
designing large systems becomes impossible, because all large 
systems have hard performance requirements.


I am sorry to say this, but hard performance requirements 
require O(1) on everything.


Big-Oh tells you essentially very little if it is more than 
O(1). Quick sort and insertion sort are O(N^2), bucket sort is 
O(N). That does not make bucket sort faster than quick sort or 
even insertion sort in a specific case, as it is dominated by 
the number of possible values.


Well, I agree, but I didn't say hard real-time, only performance 
requirements that are hard to achieve because the system is 
large, and because it would cost even more if the system were 
larger (slower).


Re: Complexity nomenclature

2015-12-04 Thread Jakob Ovrum via Digitalmars-d

On Friday, 4 December 2015 at 09:51:05 UTC, tn wrote:
"I just want to insert an element. I don't care how long it 
takes. Why do I need to specify that it should be linear?"


In my opinion, there should be "constantInsert", 
"linearInsert", etc. for those who care about the complexity, 
and "insert" should be always available.


In the current std.container, linearInsert aliases to the 
sublinear insert function when it's provided. If you don't care 
about running time, just use linearInsert.
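
A minimal sketch of that aliasing pattern (the container shells 
are stand-ins for illustration, not std.container's actual code):

struct FastContainer(T)
{
    void insert(T x) { /* sub-linear insert */ }
    alias linearInsert = insert; // the fast insert trivially
                                 // satisfies the linear bound too
}

struct SlowContainer(T)
{
    void linearInsert(T x) { /* O(n) insert only */ }
}

// Generic code that is fine with a linear bound captures both:
void addAll(C, T)(ref C c, T[] xs)
{
    foreach (x; xs)
        c.linearInsert(x);
}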




Re: Complexity nomenclature

2015-12-04 Thread Ola Fosheim Grøstad via Digitalmars-d

On Friday, 4 December 2015 at 09:33:42 UTC, ZombineDev wrote:
Well, I agree, but I didn't say hard real-time, only 
performance requirements that are hard to achieve because the 
system is large, and because it would cost even more if the 
system were larger (slower).


Yes, if you are writing for large scalable systems, then it is 
interesting to know what the ceiling is, but then you probably 
also want something that is no worse than O(log N). Say, for a 
chat service, you'd know that the more expensive to develop 
solution can scale to 10 million users (O(1)), and the less 
expensive to develop solution will become very expensive to run 
when you hit 1 million (O(log N)).


So Big-Oh can be useful when you work with very large sizes and 
large variations, but in most cases I think it can be misleading 
without a good description of the actual implementation.


Another very useful measure in real time is the amount of 
variation between repeated calls. Say if there is a guarantee of 
<50% variation, you can measure actual running time and add 50% 
headroom on the hardware requirements. Some algorithms have 
fixed-size tables they need to clear out every once in a while, 
and that can be very bad.




Re: Complexity nomenclature

2015-12-04 Thread tn via Digitalmars-d
On Friday, 4 December 2015 at 03:37:10 UTC, Andrei Alexandrescu 
wrote:

On 12/3/15 10:29 PM, Jack Stouffer wrote:
On Friday, 4 December 2015 at 02:21:12 UTC, Andrei 
Alexandrescu wrote:

On 12/03/2015 09:10 PM, Idan Arye wrote:
The complexities of the operations is a property of the data 
structure
being used. If each collection type will have it's own set 
of method
names based on the complexity of operations on it, we won't 
be able to
have templated functions that operate on any kind of 
collection(or at
the very least, these functions will be really tedious to 
code).


Your premise is right but you reach the negation of the 
correct

conclusion. -- Andrei


How so? If a singly linked list and a doubly linked list have 
two
different method names for the same operation, then they 
cannot be

easily templated.


Took me a while to figure. There's a hierarchy of operations, 
e.g. if a collection implements insert, it automatically 
implements linearInsert. And so on. The collections framework 
provides these defaults, so client code that needs quick insert 
uses insert, whereas code that's fine with a linear upper bound 
uses linearInsert and captures both.


"I just want to insert an element. I don't care how long it 
takes. Why do I need to specify that it should be linear?"


In my opinion, there should be "constantInsert", "linearInsert", 
etc. for those who care about the complexity, and "insert" should 
always be available.


Re: Complexity nomenclature

2015-12-04 Thread ZombineDev via Digitalmars-d
On Friday, 4 December 2015 at 09:50:15 UTC, Ola Fosheim Grøstad 
wrote:

On Friday, 4 December 2015 at 09:33:42 UTC, ZombineDev wrote:
Well, I agree, but I didn't say hard real-time, only 
performance requirements that are hard to achieve because the 
system is large, and because it would cost even more if the 
system were larger (slower).


Yes, if you are writing for large scalable systems then it is 
interesting to know what the ceiling is, but then you probably 
also want something that is no worse than O(log N). Say for a 
chat service you'd know that the more expensive to develop 
solution can scale to 10 million users (O(1)), and the less 
expensive to develop solution will become very expensive to run 
when you hit 1 million (O(log N)).


So Big-Oh can be useful when you work with very large sizes and 
large variations, but in most cases I think it can be 
misleading without a good description of the actual 
implementation.


Another very useful measure in real time is the amount of 
variation between repeated calls. Say if there is a guarantee 
of <50% variation, you can measure actual running time and add 
50% headroom on the hardware requirements. Some algorithms have 
some fixed size tables they need to clear out every once in a 
while and that can be very bad.


I strongly agree with what you said earlier that typical 
complexity analysis is too naive a model for real-world systems. 
Very often (e.g. due to the nature of real hardware) your time 
may be dominated by constants. Or a function which looks like 
O(1) is sometimes O(n), due to hidden complexity. This (the fact 
that constants matter) is especially true when you aim for O(1). 
And yes, variation of execution time is also very important.


However, here we're talking about API for containers and 
algorithms and I think that putting the complexity in the 
function signature is better than only using abstract data 
structures, because it makes you more aware of the consequences 
of the choice of data structures and algorithms that you make. 
If we can figure out a way to put in even more information, it 
may be even better.


Re: Complexity nomenclature

2015-12-04 Thread Jonathan M Davis via Digitalmars-d

On Friday, 4 December 2015 at 14:51:40 UTC, CraigDillabaugh wrote:
My personal preference would be not to have the complexity in 
the names, as I prefer shorter/concise names. Typically when I 
am writing code using containers of any sort I will check the 
documentation to determine what the cost of the operations I 
need is and base my choice off of that. I would think (hope) 
most others do this too.  However, I don't have a strong 
objection to what is being proposed.


std.container puts linear and/or stable in some of its names and 
then creates an alias to whichever one makes sense as the default 
where the alias doesn't have linear or stable in the name. e.g. 
linearRemove becomes remove via an alias.


But actually using something like stableLinearRemove in code (or 
stable.linear.remove) becomes hideous _very_ quickly. No one is 
going to like it, and it really only benefits generic code, which 
most container code really isn't (and given how different 
containers tend to be in the complexity of their operations, I 
question that it even makes sense to use containers generically 
in many cases if you actually care about efficiency). Usually the 
genericity is achieved via iterators or ranges, not via the code 
that actually operates on the container.


So, while I applaud Andrei's attempt to make it so that 
containers can be used generically where appropriate (and having 
a consistent API across containers is a big win even if they're 
never used generically), I do fear that attempts to put the 
complexity and stability of the functions into the API are going 
to make it so unwieldy that it's useless (or at least, very 
un-user-friendly). Obviously, we'll have to wait and see what he 
comes up with, but what almost everyone is going to want is 
remove and insert and the like, not stable.linear.insert or 
stableLinearInsert or any variant on that.


- Jonathan M Davis


Re: Complexity nomenclature

2015-12-04 Thread Jonathan M Davis via Digitalmars-d
On Friday, 4 December 2015 at 09:17:07 UTC, Ola Fosheim Grøstad 
wrote:

On Friday, 4 December 2015 at 09:09:35 UTC, ZombineDev wrote:
and space complexity (the correctness is given). Otherwise, 
designing large systems becomes impossible, because all large 
systems have hard performance requirements.


I am sorry to say this, but hard performance requirements 
require O(1) on everything.


Only when dealing with an arbitrary number of elements. O(n^2) 
could be fine if you know for a fact that you're always dealing 
with a very small number of elements. And some algorithms with 
worse complexity are actually faster than algorithms with lower 
complexity when dealing with a small number of elements - 
particularly since with a small number of elements, the 
coefficients can matter a lot more than n. So, really, to know 
which algorithm is appropriate, you need to know how it's 
actually going to be used in the program in question and how 
different algorithms perform there. O(1) is definitely better, 
but it's not necessarily required.


But yes, if you're dealing with an arbitrarily large number of 
elements, anything worse than O(1) isn't going to cut it if you 
have hard performance requirements.


Big-Oh tells you essentially very little if it is more than 
O(1). Quick sort and insertion sort are O(N^2), bucket sort is 
O(N). That does not make bucket sort faster than quick sort or 
even insertion sort on a specific case as it is dominated by 
the number of possible values.


Yeah. Really, Big-Oh tells you something about the worst case 
with an arbitrary number of elements. What actually happens in 
the program depends heavily on the number of elements which are 
actually involved. With large numbers of elements, Big-Oh starts 
mattering, but if the number of elements isn't large, then 
coefficients and minor implementation details of an algorithm 
start mattering a lot more than its conceptual worst case.


Still, in general, it's better to be aware of algorithmic 
complexity and favor algorithms with lower complexity until you 
need to optimize the code based on your specific use case and 
determine that a higher complexity algorithm actually performs 
better in that specific use case.


- Jonathan M Davis


Re: Complexity nomenclature

2015-12-04 Thread tn via Digitalmars-d

On Friday, 4 December 2015 at 09:57:48 UTC, Jakob Ovrum wrote:

On Friday, 4 December 2015 at 09:51:05 UTC, tn wrote:
"I just want to insert an element. I don't care how long it 
takes. Why do I need to specify that it should be linear?"


In my opinion, there should be "constantInsert", 
"linearInsert", etc. for those who care about the complexity, 
and "insert" should always be available.


In the current std.container, linearInsert aliases to the 
sublinear insert function when it's provided. If you don't care 
about running time, just use linearInsert.


I understand this.

However, my point is that it seems backwards if
1) When I don't care about the complexity, then I need to specify 
one (e.g. linearInsert).
2) When I care and want a constant complexity, then I don't 
specify one (e.g. use insert instead of constantInsert).


In addition, if a new container is added that only provides O(n 
log n) insert, then my "don't care" code that uses linearInsert 
does not support the new container without changes.




Re: Complexity nomenclature

2015-12-04 Thread CraigDillabaugh via Digitalmars-d
On Friday, 4 December 2015 at 13:58:50 UTC, Andrei Alexandrescu 
wrote:

On 12/03/2015 10:46 PM, Walter Bright wrote:

On 12/3/2015 5:27 PM, Andrei Alexandrescu wrote:

Now this primitive may have three complexities:



Looks exaggerated, innit? The fact of the matter is people 
choose collections based on the complexity of their operations 
all the time. "I need to insert and remove at the front, so 
I'll use a list here." Or: "I need random access, I'll use a 
vector" etc.



Andrei


My personal preference would be not to have the complexity in the 
names, as I prefer shorter/concise names. Typically when I am 
writing code using containers of any sort I will check the 
documentation to determine what the cost of the operations I need 
is and base my choice off of that. I would think (hope) most 
others do this too.  However, I don't have a strong objection to 
what is being proposed.


Would you have an insertLogarithmic (insertInverseAckerman :o) 
too?






Re: Complexity nomenclature

2015-12-04 Thread Jonathan M Davis via Digitalmars-d
On Friday, 4 December 2015 at 13:59:41 UTC, Andrei Alexandrescu 
wrote:

On 12/04/2015 01:05 AM, Tofu Ninja wrote:
Also maybe a simpler idea would just be to annotate the 
operations with their complexity with UDAs.


Great idea, will keep it in mind. Thanks! -- Andrei


Yeah. If the primary reason to have stable and linear and whatnot 
in the names is so that introspection can be used on them, then 
UDAs will work fine for that. Where it's stickier is if you're 
just calling them in generic code, since then you have no 
protection against the complexity changing when the container 
being used changes. But pretty much no one is going to be wanting 
to use stuff like stableLinearInsert or stable.linear.insert in 
their code instead of insert. So, while having the complexity be 
part of the API has some serious advantages, it's not 
user-friendly for normal use, whereas the UDAs put you somewhere 
in between not having the complexity in the API at all and 
forcing everyone to type out the complexity every time they use a 
function.


- Jonathan M Davis


Re: Complexity nomenclature

2015-12-04 Thread Andrei Alexandrescu via Digitalmars-d

On 12/04/2015 09:50 AM, tn wrote:

However, my point is that it seems backwards if
1) When I don't care about the complexity, then I need to specify one
(e.g. linearInsert).
2) When I care and want a constant complexity, then I don't specify one
(e.g. use insert instead of constantInsert).


When complexity information is not present, the name of the function may 
carry an implicit and documented assumption of complexity bounds. -- Andrei


Re: Complexity nomenclature

2015-12-04 Thread Andrei Alexandrescu via Digitalmars-d

On 12/04/2015 01:05 AM, Tofu Ninja wrote:

When would a generic algorithm even need to know the complexity of the
container?


Only always :o). -- Andrei


Re: Complexity nomenclature

2015-12-04 Thread Andrei Alexandrescu via Digitalmars-d

On 12/04/2015 01:05 AM, Tofu Ninja wrote:

Also maybe a simpler idea would just be to annotate the operations
with their complexity with UDAs.


Great idea, will keep it in mind. Thanks! -- Andrei


Re: Complexity nomenclature

2015-12-04 Thread Andrei Alexandrescu via Digitalmars-d

On 12/03/2015 10:46 PM, Walter Bright wrote:

On 12/3/2015 5:27 PM, Andrei Alexandrescu wrote:

Now this primitive may have three complexities:

* linear in the length of r (e.g. c is a singly-linked list)

* linear in the number of elements after r in the collection (e.g. c
is an array)

* constant (c is a doubly-linked list)

These complexities must be reflected in the name of the primitives. Or
perhaps
it's okay to conflate a couple of them. I know names are something
we're awfully
good at discussing :o). Destroy!


Are you sure there are only 3 complexities? What if it's a
self-balancing tree?

I'm also not sure it is a good idea to put the complexity in the name of
the primitive. Do any other algorithm libraries do that?


Looks exaggerated, innit? The fact of the matter is people choose 
collections based on the complexity of their operations all the time. "I 
need to insert and remove at the front, so I'll use a list here." Or: "I 
need random access, I'll use a vector" etc.


Not designing nomenclature for complexity leads to the usual design 
disasters such as providing a list with the same indexing operator as an 
array.


Turning that on its head, if we want to allow people to design 
collection-independent code and mix and match collections based only on 
their APIs, those APIs must reflect the complexity of operations.


Collection-independent code is big in my design btw.

One idea is to have the very name reflect the operation. I'm thinking 
"insert" to mean "scoot over elements at the tail and insert the new 
element in the gap" and "link" to mean "create a new node and link it 
within this collection without scooting over anything". Then the 
relative costs will be implied.


Another idea is to only classify operations as "quick" (expected O(1) up 
to O(log n)) and "not quick" (everything worse). Then we only have two 
categories, and "quick" will be in the name of the respective methods.



Andrei



Re: Complexity nomenclature

2015-12-04 Thread Andrei Alexandrescu via Digitalmars-d

On 12/04/2015 04:51 AM, tn wrote:

"I just want to insert an element. I don't care how long it takes. Why
do I need to specify that it should be linear?"


Oh but you do care.


In my opinion, there should be "constantInsert", "linearInsert", etc.
for those who care about the complexity, and "insert" should always be
available.


What would be the complexity assumption of "insert"?


Andrei


Re: Complexity nomenclature

2015-12-04 Thread Andrei Alexandrescu via Digitalmars-d

On 12/04/2015 03:41 AM, Jakob Ovrum wrote:

On Friday, 4 December 2015 at 06:05:55 UTC, Tofu Ninja wrote:

When would a generic algorithm even need to know the complexity of the
container?


A generic algorithm that uses exponential time when given the "wrong"
inputs is a hidden danger. This can easily happen when composing with
linear algorithms. This naming scheme, also employed by std.container,
prevents this.


Also maybe a simpler idea would just be to annotate the operations
with their complexity with UDAs. That way things that really care
about the complexity can get it, and those who don't can ignore it. It
has the benefit of being self documenting as well.


If you don't care about the complexity, why are you using collections?
Just use T[] and T[U]. Surely choosing a data structure is all about
complexity (barring some details like optimizing for cache locality or
small data structures).


WORD! -- Andrei



Re: Complexity nomenclature

2015-12-04 Thread Andrei Alexandrescu via Digitalmars-d

On 12/04/2015 02:50 AM, Minas Mina wrote:

I agree -- putting the complexity (eh, running time) in the primitive's
name seems like a bad idea.

I believe that stability is a more important "feature" of a sorting
algorithm than its running time, because you can make inferences about
it and rely on them in your own algorithms (I use it for implementing
delta compression), and it affects the output.

Running time is still a good guarantee to have, but whether `sort` runs
in O(n^2) or O(n log n) is not going to affect how your output will look!


Sort is almost always assumed to take n log n time. The problem is when 
you combine simpler operations, e.g. inserting elements at the front in 
a loop.


Andrei



Re: Complexity nomenclature

2015-12-04 Thread Andrei Alexandrescu via Digitalmars-d

On 12/04/2015 04:27 AM, Andrea Fontana wrote:

And who knows, maybe someone will discover a more efficient way to
implement "insertAfter" or the collection itself... We should change its
name?


Of course. The old name will still work because, remember, there's a 
hierarchy of operations; those that promise less automatically forward 
to those that promise more. -- Andrei
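
A minimal sketch of that forwarding, with hypothetical container 
types; a container that can keep the stronger promise simply seeds 
the weaker name with an alias:

struct DList(T)   // doubly-linked list: constant-time insert
{
    void insert(Range, E)(Range r, E x) { /* O(1) link-in */ }
    alias linearInsert = insert;  // stronger promise satisfies the weaker one
}

struct Array(T)   // array: only the linear promise is on offer
{
    void linearInsert(Range, E)(Range r, E x) { /* O(n) scoot-and-place */ }
}

Code that is fine with a linear upper bound calls linearInsert and 
accepts both containers; code that needs quick insert calls insert 
and compiles only against DList.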




Re: Complexity nomenclature

2015-12-04 Thread tn via Digitalmars-d
On Friday, 4 December 2015 at 14:08:08 UTC, Andrei Alexandrescu 
wrote:

On 12/04/2015 04:51 AM, tn wrote:
"I just want to insert an element. I don't care how long it 
takes. Why

do I need to specify that it should be linear?"


Oh but you do care.


I don't, if I have a small container of constant size (or bounded 
by a small constant). Say, less than 10 elements.


Or perhaps I just want to quickly make some prototype.

And if the other part of my algorithm is exponential or even 
superexponential, then it is going to dominate anyway. Or should 
there be also baseTwoExponentialInsert etc. for the case in which 
I don't care as long as the complexity is at most O(2^n)?


In my opinion, there should be "constantInsert", 
"linearInsert", etc. for those who care about the complexity, 
and "insert" should always be available.


What would be the complexity assumption of "insert"?


None. So in a sense that would be an alias of whateverInsert or 
idontcareInsert or infinityInsert.




Re: Complexity nomenclature

2015-12-04 Thread Andrei Alexandrescu via Digitalmars-d

On 12/04/2015 01:05 AM, Tofu Ninja wrote:

Also maybe a simpler idea would just be to annotate the operations
with their complexity with UDAs. That way things that really care about
the complexity can get it, and those who don't can ignore it. It has the
benefit of being self documenting as well.


Well look at what the cat dragged in:

http://dpaste.dzfl.pl/711aecacc450

That's quite promising. The code is very preliminary and uses strings 
for typical complexity values (e.g. "constant", "linear", and later 
"loglinear" etc). I need to see how to integrate this whole idea.


Also an unpleasant problem is overloading - when present, user code 
needs to specify which overload they're referring to.


Anyhow, this is really interesting. Thanks Tofu.


Andrei



Re: Complexity nomenclature

2015-12-04 Thread ref2401 via Digitalmars-d
On Friday, 4 December 2015 at 16:06:25 UTC, Andrei Alexandrescu 
wrote:

Well look at what the cat dragged in:

http://dpaste.dzfl.pl/711aecacc450

That's quite promising. The code is very preliminary and uses 
strings for typical complexity values (e.g. "constant", 
"linear", and later "loglinear" etc). I need to see how to 
integrate this whole idea.


Also an unpleasant problem is overloading - when present, user 
code needs to specify which overload they're referring to.


Anyhow, this is really interesting. Thanks Tofu.


Andrei


Wow! It looks really great.



Re: Complexity nomenclature

2015-12-04 Thread Ola Fosheim Grøstad via Digitalmars-d
On Friday, 4 December 2015 at 15:33:56 UTC, Jonathan M Davis 
wrote:
Only when dealing with an arbitrary number of elements. O(n^2) 
could be fine if you know for a fact that you're always dealing 
with a very small number of elements.


I think there was a misunderstanding about notation here. If we 
know that we use a small number of elements, then all operations 
are O(1) by definition, for any small constant.


different algorithms perform there. O(1) is definitely better, 
but it's not necessarily required.


Well, O(1) isn't better, it is just mandatory for anything real 
time. E.g.: you fix your buffer to 1 second of audio and that is 
44100 samples, then filling it is O(44100) == O(1).


If I am doing a scanline renderer for 2048 pixels, then an 
insertion sort is faster than merge sort, because most 
line segments only move a few pixels per scanline. But here the 
problem is that there might be a 0.1% chance that most lines are 
almost horizontal in a variety of angles and then it breaks down, 
so that will be very limiting, but I still can figure out the 
hard maximum time required to complete, so it is still O(1) and 
tractable as a real time operation.


If I accept arbitrary input, then I no longer can meet real time 
requirements; I cannot get to O(1) and therefore cannot put a 
fixed limit on the maximum time required to complete.


And that is really the only thing O(1) says: you can put a fixed 
limit on the time needed to complete the operation at compile 
time.


actually involved. With large numbers of elements, Big-Oh 
starts mattering, but if the number of elements isn't large, 
then coefficients and minor implementation details of an 
algorithm start mattering a lot more than its conceptual worst 
case.


Yes, but most collections of entities are not very large by 
nature. Usually, when you try to solve a problem faster you break 
it down by useful segmentation and clusters (like age, kind, 
proximity).


Complexity is mostly for thinking about what options you have 
when designing an algorithm. The cookbook/library approach to 
complexity is rather limited since they were not written for the 
specifics of the problem.


Still, in general, it's better to be aware of algorithmic 
complexity and favor algorithms with lower complexity until you 
need to optimize the code based on your specific use case and 
determine that a higher complexity algorithm actually performs 
better in that specific use case.


Be aware of it, yes. Although in real code it is often best to 
just start out with something flexible like a dynamic array, and 
optimize when you know which operations are required and what 
operations you never will need. That often changes over time.


I find that I am able to solve most of my real time problems with 
arrays. When I need hard requirements, I often end up using very 
simple data structures like fixed-size circular buffers with a 
carefully calculated headroom (number of mostly unused elements) 
or linked lists of chunks or something really easy to reason 
about.
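
For illustration, a minimal sketch of such a buffer; the type and 
its headroom policy are hypothetical, not from any library:

struct Ring(T, size_t N)
{
    T[N] buf;
    size_t head, count;

    bool push(T x)  // O(1), never allocates; fails when headroom is gone
    {
        if (count == N) return false;
        buf[(head + count) % N] = x;
        ++count;
        return true;
    }

    T pop()  // O(1)
    {
        assert(count > 0);
        auto x = buf[head];
        head = (head + 1) % N;
        --count;
        return x;
    }
}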


During initialisation of the program it often does not matter. So 
if I use linear search and it completes in 100ms then fine. I 
don't care if there is a delay of 100ms at startup.


For batch things done in Python, I just use whatever is easy; 
with current machines most things complete in less than a few 
seconds anyway.


Making libraries easy to use is by far the most important 
parameter, I think. What would be cool is if the compiler could 
adjust the implementation of collections based on profiling!




Re: Complexity nomenclature

2015-12-04 Thread Minas Mina via Digitalmars-d
On Friday, 4 December 2015 at 22:48:03 UTC, Andrei Alexandrescu 
wrote:

On 12/04/2015 03:43 PM, Walter Bright wrote:

On 12/4/2015 10:49 AM, Dmitry Olshansky wrote:
Was vaguely terrified reading this whole thread until hitting 
this

gem. Seems
like a creative use for UDA.


Yeah, I think it puts UDA on the map!

Instead of string arguments, though, I suggest enums.


"Even better".

http://dpaste.dzfl.pl/50340e189f92

That code is actually remarkably complete, with algebra and 
remarkable constants.


Pretty crazy, yo. Thanks all for the great ideas. D is a 
remarkable language.



Andrei


Looks good :)


Re: Complexity nomenclature

2015-12-04 Thread Timon Gehr via Digitalmars-d

On 12/04/2015 09:03 PM, Andrei Alexandrescu wrote:

>
>Your cases then become:
>
>O(1): expBase=1, polyExp=0, logExp=0;
>O(log n): expBase=1, polyExp=0, logExp=1;
>O((log n)^k): expBase=1, polyExp=0, logExp=k;
>O(sqrt n): expBase=1, polyExp=Fraction(1,2), logExp=0;
>O(n): expBase=1, polyExp=1, logExp=0;
>O(n * log n): expBase=1, polyExp=1, logExp=1;
>O(n * (log n)^k): expBase=1, polyExp=1, logExp=k;
>O(n^k): expBase=1, polyExp=k, logExp=0;
>O(2^n): expBase=2, polyExp=0, logExp=0;
>
>Combinations just work, especially n^(1/2) * log n.
>
>
>

>Nice, I'll continue with this. Thanks! I won't use expBase because no
>operation leads to it; instead I'll just lump everything superpoly into one
>"fast growing" category.



This is probably sensible in the context of std.container.
Note that some containers only give amortized guarantees. E.g. in C++, 
calling push_back n times on a vector that starts out empty is O(n), but 
the worst case for a single push_back is Ω(n).
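
A sketch of where the amortized bound comes from, assuming the usual 
doubling policy: n appends copy at most 1 + 2 + 4 + ... + n < 2n 
elements in total, so the whole sequence is O(n), yet the single 
append that triggers a reallocation is Ω(n) on its own.

struct Vec(T)
{
    T[] data;
    size_t len;

    void pushBack(T x)
    {
        if (len == data.length)
            data.length = data.length ? 2 * data.length : 1; // Ω(len) copy
        data[len++] = x;                                      // O(1) otherwise
    }
}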


Another question is how widely applicable BigO should become beyond 
std.container. E.g. some runtime bounds have multiple parameters.




Re: Complexity nomenclature

2015-12-04 Thread Jonathan M Davis via Digitalmars-d

On Friday, 4 December 2015 at 18:21:41 UTC, Walter Bright wrote:

On 12/4/2015 7:11 AM, Jonathan M Davis wrote:
std.container puts linear and/or stable in some of its names 
and then creates an
alias to whichever one makes sense as the default where the 
alias doesn't have
linear or stable in the name. e.g. linearRemove becomes remove 
via an alias.


Excessive use of aliases is a problem in and of itself - for 
example, trying to use a debugger with it, or examining the 
object code.


Oh, I agree. But if we end up with stuff like stableLinearRemove 
all over the place, it's better to have the aliases than not. 
However, it's far better IMHO to avoid having all of those long 
names in the first place.


I suggested in the pseudo namespaces thread using template 
parameters to express characteristics, as in:


remove!(stable, linear)

with sensible defaults so most of the time the user would just 
use:


remove

Generic code could use the parameters. Another alternative is 
to have attributes for the algorithm:


@stable @linear remove() { ... }

and then generic code could test for those attributes.


Either of those would be better than stableLinearRemove or 
stable.linear.remove. The UDAs would be more documentation 
friendly, though being able to pass around template arguments 
could be valuable for the cases where you're trying to enforce 
specific complexity requirements.


- Jonathan M Davis


Re: Complexity nomenclature

2015-12-04 Thread Walter Bright via Digitalmars-d

On 12/3/2015 10:05 PM, Tofu Ninja wrote:

Also maybe a simpler idea would just be to annotate the operations with
their complexity with UDAs. That way things that really care about the
complexity can get it, and those who don't can ignore it. It has the benefit of
being self documenting as well.


Dang, you beat me to it!


Re: Complexity nomenclature

2015-12-04 Thread BLM768 via Digitalmars-d

On Friday, 4 December 2015 at 18:21:41 UTC, Walter Bright wrote:
I suggested in the pseudo namespaces thread using template 
parameters to express characteristics, as in:


remove!(stable, linear)

with sensible defaults so most of the time the user would just 
use:


remove


The nice thing about this is that it can be easy to specify which 
complexities an operation supports.


---

void remove(alias complexity, T)(List!T list, size_t index)
    if (complexity >= Complexity.linear); // or however complexity objects work...

List!int l;
//...
l.remove(3);
l.remove!(Complexity.polynomial(2))(3);
l.remove!(Complexity.constant)(3); // fails; there's no template
// specialization for this case because constant < linear.





Re: Complexity nomenclature

2015-12-04 Thread Walter Bright via Digitalmars-d

On 12/4/2015 10:49 AM, Dmitry Olshansky wrote:

Was vaguely terrified reading this whole thread until hitting this gem. Seems
like a creative use for UDA.


Yeah, I think it puts UDA on the map!

Instead of string arguments, though, I suggest enums.


Re: Complexity nomenclature

2015-12-04 Thread Andrei Alexandrescu via Digitalmars-d
I'll get back to this (on the phone) but this is incorrect:

> sqrt really belongs under poly, as far as asymptotic behaviour is
> concerned.

Fractional powers are sublinear. And sqrt times sqrt is linear, which is
important.


Andrei


Re: Complexity nomenclature

2015-12-04 Thread Andrei Alexandrescu via Digitalmars-d
Timon Gehr  wrote:
> On 12/04/2015 07:37 PM, Andrei Alexandrescu wrote:
>> Following up on this, I thought I'd define a little algebra 
>> It will be closed (no arbitrary expressions)
>> ...
> 
> I think that the exponents matter. E.g. n^1.5 vs n^2.
> The algebra can be made more expressive while simplifying its 
> implementation.
> 
>> 
>> The idea is to allow functions to compute their own complexity in terms
>> of complexities of their respective primitives.
>> 
>> Here's the algebra.
>> 
>> Terms:
>> 
>> 1 = O(1)
>> log = O(log n)
>> plog = O((log n)^k)
>> sqrt = O(sqrt n)
>> lin = O(n)
>> linlog = O(n * log n)
>> linplog = O(n * (log n)^k)
>> poly = O(n^k)
>> exp = O(2^n)
> 
> Example alternative:
> 
> struct BigO{
>     Fraction logExp;
>     Fraction polyExp;
>     Fraction expBase;
> 
>     int opCmp(BigO rhs){
>         return tuple(expBase,polyExp,logExp)
>             .opCmp(tuple(rhs.expBase,rhs.polyExp,rhs.logExp));
>     }
>     BigO opBinary(string op:"*")(BigO rhs){
>         return BigO(logExp+rhs.logExp,polyExp+rhs.polyExp,expBase*rhs.expBase);
>     }
>     // ...
> }
> 
> Your cases then become:
> 
> O(1): expBase=1, polyExp=0, logExp=0;
> O(log n): expBase=1, polyExp=0, logExp=1;
> O((log n)^k): expBase=1, polyExp=0, logExp=k;
> O(sqrt n): expBase=1, polyExp=Fraction(1,2), logExp=0;
> O(n): expBase=1, polyExp=1, logExp=0;
> O(n * log n): expBase=1, polyExp=1, logExp=1;
> O(n * (log n)^k): expBase=1, polyExp=1, logExp=k;
> O(n^k): expBase=1, polyExp=k, logExp=0;
> O(2^n): expBase=2, polyExp=0, logExp=0;
> 
> Combinations just work, especially n^(1/2) * log n.
> 
> 
> 

Nice, I'll continue with this. Thanks! I won't use expBase because no
operation leads to it; instead I'll just lump everything superpoly into one
"fast growing" category.



Re: Complexity nomenclature

2015-12-04 Thread H. S. Teoh via Digitalmars-d
On Fri, Dec 04, 2015 at 05:48:03PM -0500, Andrei Alexandrescu via Digitalmars-d 
wrote:
> On 12/04/2015 03:43 PM, Walter Bright wrote:
> >On 12/4/2015 10:49 AM, Dmitry Olshansky wrote:
> >>Was vaguely terrified reading this whole thread until hitting this
> >>gem. Seems like a creative use for UDA.
> >
> >Yeah, I think it puts UDA on the map!
> >
> >Instead of string arguments, though, I suggest enums.
> 
> "Even better".
> 
> http://dpaste.dzfl.pl/50340e189f92
> 
> That code is actually remarkably complete, with algebra and remarkable
> constants.
> 
> Pretty crazy, yo. Thanks all for the great ideas. D is a remarkable
> language.
[...]

Nice, very nice!

But ... opMul? I thought that was deprecated? Shouldn't we be using
opBinary!"*" instead?


T

-- 
Food and laptops don't mix.


Re: Complexity nomenclature

2015-12-04 Thread H. S. Teoh via Digitalmars-d
On Fri, Dec 04, 2015 at 08:03:17PM +, Andrei Alexandrescu via Digitalmars-d 
wrote:
> I'll get back to this (on the phone) but this is incorrect:
> 
> sqrt really belongs under poly, as far as asymptotic behaviour is
> concerned.

OK, I guess by poly you mean n^k for k>1.  I was lumping everything
under polynomial for k>0. The reason is the homogeneity for
all values of k>0 when it comes to the algebra of complexities.
Obviously, for performance-measuring purposes, k=1 is a significant
landmark.


> Fractional powers are sublinear.

Wrong, O(n^(4/3)) is a fractional power, but it's not sublinear.


> And sqrt times sqrt is linear which is important.
[...]

But it's just a special case of the general multiplicative->additive
rule. Everyone knows 1/2 + 1/2 = 1; it doesn't seem to warrant special
treatment above, say, 1/3 + 2/3 = 1, or any of the other endless
identities involving 1.


T

-- 
Meat: euphemism for dead animal. -- Flora


Re: Complexity nomenclature

2015-12-04 Thread Andrei Alexandrescu via Digitalmars-d

On 12/04/2015 03:43 PM, Walter Bright wrote:

On 12/4/2015 10:49 AM, Dmitry Olshansky wrote:

Was vaguely terrified reading this whole thread until hitting this
gem. Seems
like a creative use for UDA.


Yeah, I think it puts UDA on the map!

Instead of string arguments, though, I suggest enums.


"Even better".

http://dpaste.dzfl.pl/50340e189f92

That code is actually remarkably complete, with algebra and remarkable 
constants.


Pretty crazy, yo. Thanks all for the great ideas. D is a remarkable 
language.



Andrei


Re: Complexity nomenclature

2015-12-04 Thread Steven Schveighoffer via Digitalmars-d

On 12/4/15 5:48 PM, Andrei Alexandrescu wrote:

On 12/04/2015 03:43 PM, Walter Bright wrote:

On 12/4/2015 10:49 AM, Dmitry Olshansky wrote:

Was vaguely terrified reading this whole thread until hitting this
gem. Seems
like a creative use for UDA.


Yeah, I think it puts UDA on the map!

Instead of string arguments, though, I suggest enums.


"Even better".

http://dpaste.dzfl.pl/50340e189f92

That code is actually remarkably complete, with algebra and remarkable
constants.

Pretty crazy, yo. Thanks all for the great ideas. D is a remarkable
language.


This looks very excellent, nice work!

-Steve



Re: Complexity nomenclature

2015-12-04 Thread Andrei Alexandrescu via Digitalmars-d

On 12/04/2015 04:40 PM, Timon Gehr wrote:

Another question is how widely applicable BigO should become beyond
std.container.


I'm thinking we could add complexity annotations to range functions. For 
example the complexity of map is linear etc.



E.g. some runtime bounds have multiple parameters.


Yah, I was wondering how necessary that'd be. One problem is how to 
align parameters of different algorithms, for example say one is n * m 
and the other is m log n. What if they swap the order of m and n?


I guess some reasonable convention should take care of this. Otherwise, 
we'll need to use a hashtable mapping names to (exp, logExp) tuples.



Andrei




Re: Complexity nomenclature

2015-12-04 Thread Walter Bright via Digitalmars-d

On 12/4/2015 2:48 PM, Andrei Alexandrescu wrote:

On 12/04/2015 03:43 PM, Walter Bright wrote:

On 12/4/2015 10:49 AM, Dmitry Olshansky wrote:

Was vaguely terrified reading this whole thread until hitting this
gem. Seems
like a creative use for UDA.


Yeah, I think it puts UDA on the map!

Instead of string arguments, though, I suggest enums.


"Even better".

http://dpaste.dzfl.pl/50340e189f92

That code is actually remarkably complete, with algebra and remarkable 
constants.

Pretty crazy, yo. Thanks all for the great ideas. D is a remarkable language.



This is very similar to the SI units program Oskar Linde wrote back in 2006.

Anyhow, this is very cool and should:

1. have an article written about it
2. get its own Phobos module



Re: Complexity nomenclature

2015-12-04 Thread BLM768 via Digitalmars-d
On Friday, 4 December 2015 at 22:17:21 UTC, Jonathan M Davis 
wrote:
Either of those would be better than stableLinearRemove or 
stable.linear.remove. The UDAs would be more documentation 
friendly, though being able to pass around template arguments 
could be valuable for the cases where you're trying to enforce 
specific complexity requirements.


It should be possible to unify the template and UDA forms. 
Assuming a template supportsComplexity(func, complexity) that 
determines whether a function or method has the specified 
complexity UDA (or a "faster" one), it's not hard to implement 
something like this:


void removeWithComplexity(alias complexity, T)(T collection, size_t index)
    if (supportsComplexity!(T.remove, complexity))
{
    collection.remove(index);
}

List!int list;
//...
list.removeWithComplexity(Complexity.linear, 3);

---

Of course, this implementation has issues with collection 
operations which are defined as UFCS functions (as opposed to 
methods), but it should be possible to get around that issue.





Re: Complexity nomenclature

2015-12-04 Thread BLM768 via Digitalmars-d

On Saturday, 5 December 2015 at 00:13:02 UTC, BLM768 wrote:

list.removeWithComplexity(Complexity.linear, 3);


Uh, I mean list.removeWithComplexity!(Complexity.linear)(3).



Re: Complexity nomenclature

2015-12-04 Thread Timon Gehr via Digitalmars-d

On 12/05/2015 04:24 AM, Timon Gehr wrote:




If only products should be expressible,


But I think not. O(n+m) is common.


Re: Complexity nomenclature

2015-12-04 Thread Timon Gehr via Digitalmars-d

On 12/04/2015 11:57 PM, Andrei Alexandrescu wrote:

On 12/04/2015 04:40 PM, Timon Gehr wrote:

Another question is how widely applicable BigO should become beyond
std.container.


I'm thinking we could add complexity annotations to range functions. For
example the complexity of map is linear etc.
...


It depends on the mapped function.


E.g. some runtime bounds have multiple parameters.


Yah, I was wondering how necessary that'd be. One problem is how to
align parameters of different algorithms, for example say one is n * m
and the other is m log n. What if they swap the order of m and n?

I guess some reasonable convention should take care of this. Otherwise,
we'll need to use a hashtable mapping names to (exp, logExp) tuples.


Andrei



If only products should be expressible, this would maybe be reasonable 
as well:


void foo(@(BigO.linear) int n, @(BigO.linear) int m);

But UDAs for parameters are not supported.



Re: Complexity nomenclature

2015-12-04 Thread Jack Stouffer via Digitalmars-d
On Friday, 4 December 2015 at 16:06:25 UTC, Andrei Alexandrescu 
wrote:

Well look at what the cat dragged in:

http://dpaste.dzfl.pl/711aecacc450


This is a great compromise; I (and everyone else probably) would 
love to see a prototype of this when it's ready.


Re: Complexity nomenclature

2015-12-04 Thread Walter Bright via Digitalmars-d

On 12/4/2015 7:11 AM, Jonathan M Davis wrote:

On Friday, 4 December 2015 at 14:51:40 UTC, CraigDillabaugh wrote:

My personal preference would be not to have the complexity in the names, as I
prefer shorter/concise names. Typically when I am writing code using
containers of any sort I will check the documentation to determine what the
cost of the operations I need is and base my choice off of that. I would think
(hope) most others do this too.  However, I don't have a strong objection to
what is being proposed.


std.container puts linear and/or stable in some of its names and then creates an
alias to whichever one makes sense as the default where the alias doesn't have
linear or stable in the name. e.g. linearRemove becomes remove via an alias.


Excessive use of aliases is a problem in and of itself - for example, trying to 
use a debugger with it, or examining the object code.




But actually using something like stableLinearRemove in code (or
stable.linear.remove) becomes hideous _very_ quickly. No one is going to like
it,


I agree. Having such long names consisting of repeated use of the same words is 
a problem.


I suggested in the pseudo namespaces thread using template parameters to express 
characteristics, as in:


remove!(stable, linear)

with sensible defaults so most of the time the user would just use:

remove

Generic code could use the parameters. Another alternative is to have attributes 
for the algorithm:


@stable @linear remove() { ... }

and then generic code could test for those attributes.
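
A sketch of how generic code might test such attributes, assuming 
hypothetical marker types for @stable and @linear; std.traits.hasUDA 
does the introspection:

import std.traits : hasUDA;

struct stable {}  // hypothetical marker UDAs
struct linear {}

struct List(T)
{
    @stable @linear void remove(size_t index) { /* ... */ }
}

// Accept any container whose remove is marked stable:
void removeStably(C)(ref C c, size_t index)
    if (hasUDA!(__traits(getMember, C, "remove"), stable))
{
    c.remove(index);
}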


Re: Complexity nomenclature

2015-12-04 Thread Andrei Alexandrescu via Digitalmars-d

On 12/04/2015 01:37 PM, Andrei Alexandrescu wrote:

       | 1       log     plog    sqrt    lin     linlog  poly    exp
-------+------------------------------------------------------------
1      | 1       log     plog    sqrt    lin     linlog  poly    exp
log    | log     plog    plog    ?       linlog  ?       poly    exp
plog   | plog    plog    plog    ?       linplog linplog poly    exp
sqrt   | sqrt    ?       ?       lin     poly    poly    poly    exp
lin    | lin     linlog  linplog poly    poly    poly    poly    exp
linlog | linlog  linplog linplog poly    poly    poly    poly    exp
poly   | poly    poly    poly    poly    poly    poly    poly    exp
exp    | exp     exp     exp     exp     exp     exp     exp     exp

I've left a few question marks for the tricky cases. Ideas?


The question mark at log * linlog is spurious; should be replaced with 
linplog.


So the tricky cases are sqrt * log and sqrt * plog.


Andrei



Re: Complexity nomenclature

2015-12-04 Thread Dmitry Olshansky via Digitalmars-d

On 04-Dec-2015 19:06, Andrei Alexandrescu wrote:

On 12/04/2015 01:05 AM, Tofu Ninja wrote:

Also maybe a simpler idea would just be to annotate the operations
with their complexity with UDAs. That way things that really care
the complexity can get it, and those who don't can ignore it. It has the
benefit of being self documenting as well.


Well look at what the cat dragged in:

http://dpaste.dzfl.pl/711aecacc450


Was vaguely terrified reading this whole thread until hitting this gem. 
Seems like a creative use for UDA.




That's quite promising. The code is very preliminary and uses strings
for typical complexity values (e.g. "constant", "linear", and later
"loglinear" etc). I need to see how to integrate this whole idea.

Also an unpleasant problem is overloading - when present, user code
needs to specify which overload they're referring to.

Anyhow, this is really interesting. Thanks Tofu.


Andrei




--
Dmitry Olshansky


Re: Complexity nomenclature

2015-12-04 Thread Andrei Alexandrescu via Digitalmars-d

On 12/04/2015 11:06 AM, Andrei Alexandrescu wrote:

On 12/04/2015 01:05 AM, Tofu Ninja wrote:

Also maybe a simpler idea would just be to annotate the operations
with their complexity with UDAs. That way things that really care about
the complexity can get it, and those who don't can ignore it. It has the
benefit of being self documenting as well.


Well look at what the cat dragged in:

http://dpaste.dzfl.pl/711aecacc450


Following up on this, I thought I'd define a little algebra for 
complexities. It will be closed (no arbitrary expressions) and will 
support only the usually encountered complexities.


The idea is to allow functions to compute their own complexity in terms 
of complexities of their respective primitives.


Here's the algebra.

Terms:

1 = O(1)
log = O(log n)
plog = O((log n)^k)
sqrt = O(sqrt n)
lin = O(n)
linlog = O(n * log n)
linplog = O(n * (log n)^k)
poly = O(n^k)
exp = O(2^n)

Ordering:

Terms above are in increasing order.

Summation:

x + y = max(x, y)

Product:

       | 1       log     plog    sqrt    lin     linlog  poly    exp
-------+------------------------------------------------------------
1      | 1       log     plog    sqrt    lin     linlog  poly    exp
log    | log     plog    plog    ?       linlog  ?       poly    exp
plog   | plog    plog    plog    ?       linplog linplog poly    exp
sqrt   | sqrt    ?       ?       lin     poly    poly    poly    exp
lin    | lin     linlog  linplog poly    poly    poly    poly    exp
linlog | linlog  linplog linplog poly    poly    poly    poly    exp
poly   | poly    poly    poly    poly    poly    poly    poly    exp
exp    | exp     exp     exp     exp     exp     exp     exp     exp

I've left a few question marks for the tricky cases. Ideas?


Andrei
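
A minimal sketch of this closed algebra as an ordered enum, where 
summation is just max; only a few illustrative cells of the product 
table are filled in here, the rest would follow the table above:

enum Term { one, log, plog, sqrt, lin, linlog, linplog, poly, exp }

Term add(Term x, Term y) { return x > y ? x : y; }  // x + y = max(x, y)

Term mul(Term x, Term y)
{
    if (x == Term.one) return y;                     // 1 is the identity
    if (y == Term.one) return x;
    if (x == Term.exp || y == Term.exp) return Term.exp;
    if (x == Term.lin && y == Term.log) return Term.linlog;
    // ... remaining cells as in the table above
    assert(0, "cell not filled in for this sketch");
}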



Re: Complexity nomenclature

2015-12-04 Thread Timon Gehr via Digitalmars-d

On 12/04/2015 07:37 PM, Andrei Alexandrescu wrote:

Following up on this, I thought I'd define a little algebra 
It will be closed (no arbitrary expressions)
...


I think that the exponents matter. E.g. n^1.5 vs n^2.
The algebra can be made more expressive while simplifying its 
implementation.




The idea is to allow functions to compute their own complexity in terms
of complexities of their respective primitives.

Here's the algebra.

Terms:

1 = O(1)
log = O(log n)
plog = O((log n)^k)
sqrt = O(sqrt n)
lin = O(n)
linlog = O(n * log n)
linplog = O(n * (log n)^k)
poly = O(n^k)
exp = O(2^n)


Example alternative:

struct BigO{
    Fraction logExp;
    Fraction polyExp;
    Fraction expBase;

    int opCmp(BigO rhs){
        return tuple(expBase,polyExp,logExp)
            .opCmp(tuple(rhs.expBase,rhs.polyExp,rhs.logExp));
    }
    BigO opBinary(string op:"*")(BigO rhs){
        return BigO(logExp+rhs.logExp,polyExp+rhs.polyExp,expBase*rhs.expBase);
    }
    // ...
}

Your cases then become:

O(1): expBase=1, polyExp=0, logExp=0;
O(log n): expBase=1, polyExp=0, logExp=1;
O((log n)^k): expBase=1, polyExp=0, logExp=k;
O(sqrt n): expBase=1, polyExp=Fraction(1,2), logExp=0;
O(n): expBase=1, polyExp=1, logExp=0;
O(n * log n): expBase=1, polyExp=1, logExp=1;
O(n * (log n)^k): expBase=1, polyExp=1, logExp=k;
O(n^k): expBase=1, polyExp=k, logExp=0;
O(2^n): expBase=2, polyExp=0, logExp=0;

Combinations just work, especially n^(1/2) * log n.
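
For instance, assuming a Fraction(num, den) rational type, the 
sqrt-times-log case combines as:

auto sqrtN = BigO(Fraction(0, 1), Fraction(1, 2), Fraction(1, 1)); // O(sqrt n)
auto logN  = BigO(Fraction(1, 1), Fraction(0, 1), Fraction(1, 1)); // O(log n)
auto both  = sqrtN * logN;  // logExp = 1, polyExp = 1/2: O(sqrt n * log n)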




Re: Complexity nomenclature

2015-12-04 Thread H. S. Teoh via Digitalmars-d
On Fri, Dec 04, 2015 at 01:37:06PM -0500, Andrei Alexandrescu via Digitalmars-d 
wrote:
> On 12/04/2015 11:06 AM, Andrei Alexandrescu wrote:
[...]
> Here's the algebra.
> 
> Terms:
> 
> 1 = O(1)
> log = O(log n)
> plog = O((log n)^k)
> sqrt = O(sqrt n)
> lin = O(n)
> linlog = O(n * log n)
> linplog = O(n * (log n)^k)
> poly = O(n^k)
> exp = O(2^n)
> 
> Ordering:
> 
> Terms above are in increasing order.
> 
> Summation:
> 
> x + y = max(x, y)
> 
> Product:
> 
>        | 1       log     plog    sqrt    lin     linlog  poly    exp
> -------+------------------------------------------------------------
> 1      | 1       log     plog    sqrt    lin     linlog  poly    exp
> log    | log     plog    plog    ?       linlog  ?       poly    exp
> plog   | plog    plog    plog    ?       linplog linplog poly    exp
> sqrt   | sqrt    ?       ?       lin     poly    poly    poly    exp
> lin    | lin     linlog  linplog poly    poly    poly    poly    exp
> linlog | linlog  linplog linplog poly    poly    poly    poly    exp
> poly   | poly    poly    poly    poly    poly    poly    poly    exp
> exp    | exp     exp     exp     exp     exp     exp     exp     exp
> 
> I've left a few question marks for the tricky cases. Ideas?
[...]

sqrt really belongs under poly, as far as asymptotic behaviour is
concerned. Basically, O(n^j) * O(n^k) for any real j, k > 0 is
asymptotically equal to O(n^(j+k)). Furthermore, O(O(n^j)^k) =
O(n^(j*k)). So you can treat polynomial complexities as a field of real
numbers, where + on the O(...) terms behaves like max(), * behaves like
+, and function composition behaves like ^.

Then exp behaves like an "infinite real", where exp > O(n^k) for all
real k>0. Its inverse, log, therefore, behaves like an "infinitesimal",
where O((log n)^k) for all real k>0 is less than O(n^k) for all real
k>0. (Yes, even O(n^(1/100)) will eventually outgrow O(log n).)

The log powers behave like "intermediate infinitesimals", where you have
O((log n)^j) < O((log n)^k) for all positive real j < k.

So these O-terms behave approximately like real numbers (poly) enhanced
with infinite quantities (exp and its derivations) and infinitesimal
quantities (log and its derivations); they follow the usual laws of
arithmetic. Therefore,

O(n^k) * O(log n)

can be treated as the equivalent of a real number k + an infinitesimal
L, such that k < (k+L) < k+j for all real j>0. In other words,

O(n) < O(n * log n) < O(n^(1+k))  for all k>0.

(Yes, even O(n^1.001) will eventually outgrow O(n*log n). O(log
n) behaves like an infinitesimal.)

The nice thing about this is that you can simplify a lot of complicated
O(...) expressions just by applying algebra as described above on the
analogous {real + infinities + infinitesimals} system. Ordering
relations are preserved (for the most part -- this only breaks down with
pathological cases like O(sin n) which is unlikely to be ever
encountered). Also, function composition between poly and non-poly
complexities will generally be non-commutative, and mostly will break
the + = max analogy. (But it seems unlikely, in real-world algorithms,
to ever need to worry about the intricacies of exponential-time
algorithms, since generally they are to be avoided where possible.)

So you can get a (mostly) closed algebra just by mapping poly to the
real numbers, and then adding:

L = infinitesimal quantity corresponding to O(log n)
E = infinite quantity corresponding to exp

Various operations inside O(...) are thus mapped:

+ inside O(...) = max(...)
* inside O(...) = + between mapped quantities
O(f(g(n))) = O(f(n)) * O(g(n))
O(1) = 0

Example: is O(n^3*log(n)) better than O(n^2*(log(n))^3)?
Answer:
O(n^3*log(n)) maps to 3 + L, and O(n^2*(log(n))^3) maps to 2 +
L^3. Since L^3 is infinitesimal, (2 + L^3) is very close to 2,
whereas (3 + L) lies slightly above 3. Therefore O(n^3*log(n))
is definitely faster-growing than O(n^2*(log(n))^3).

This algebra leads to various interesting questions like:

- Is there an intermediate complexity that lies between poly and exp?
  Yes: since exp is mapped to the infinite quantity E, it follows that
  E/2 is still an infinite quantity (even though it's strictly less than
  E). E/2 corresponds to E*1/2, which is the composition of exp with
  sqrt, so it follows that O(n^k) < O(e^sqrt(n)) < O(e^n) for all real
  k>0. (There are, of course, many other possibilities.)

- Similarly, the log powers O((log n)^k) are *always* slower-growing
  than any polynomial complexity, because L*k, being still
  infinitesimal, will always < j for all real j>0. So even
  O((log n)^1) will not outgrow O(n^1.1). Multiplying with a
  poly function preserves this relationship:

O(n^j * (log n)^k) --> j + L*k < j + m, for all real j,m>0, so
O(n^j * (log n)^k) < O(n^(j+m)) for all j,m>0.

  Basically you're just displacing the mapped values by a 

Re: Complexity nomenclature

2015-12-03 Thread Jack Stouffer via Digitalmars-d
On Friday, 4 December 2015 at 02:21:12 UTC, Andrei Alexandrescu 
wrote:

On 12/03/2015 09:10 PM, Idan Arye wrote:
The complexities of the operations are a property of the data 
structure being used. If each collection type will have its own 
set of method names based on the complexity of operations on it, 
we won't be able to have templated functions that operate on any 
kind of collection (or at the very least, these functions will 
be really tedious to code).


Your premise is right but you reach the negation of the correct 
conclusion. -- Andrei


How so? If a singly linked list and a doubly linked list have two 
different method names for the same operation, then they cannot 
be easily templated.


Re: Complexity nomenclature

2015-12-03 Thread Idan Arye via Digitalmars-d
On Friday, 4 December 2015 at 01:27:42 UTC, Andrei Alexandrescu 
wrote:
I know names are something we're awfully good at discussing 
:o). Destroy!



Andrei


I find it ironic that this thread has moved to discuss how to 
name complexity/running-time...


Re: Complexity nomenclature

2015-12-03 Thread Tofu Ninja via Digitalmars-d
On Friday, 4 December 2015 at 03:37:10 UTC, Andrei Alexandrescu 
wrote:

On 12/3/15 10:29 PM, Jack Stouffer wrote:
On Friday, 4 December 2015 at 02:21:12 UTC, Andrei 
Alexandrescu wrote:

On 12/03/2015 09:10 PM, Idan Arye wrote:
The complexities of the operations are a property of the data 
structure being used. If each collection type will have its own 
set of method names based on the complexity of operations on it, 
we won't be able to have templated functions that operate on any 
kind of collection (or at the very least, these functions will 
be really tedious to code).


Your premise is right but you reach the negation of the correct 
conclusion. -- Andrei


How so? If a singly linked list and a doubly linked list have two 
different method names for the same operation, then they cannot be 
easily templated.


Took me a while to figure. There's a hierarchy of operations, 
e.g. if a collection implements insert, it automatically 
implements linearInsert. And so on. The collections framework 
provides these defaults, so client code that needs quick insert 
uses insert, whereas code that's fine with a linear upper bound 
uses linearInsert and captures both.


Another way to look at it: in STL container-independent code is 
near impossible because different containers use the same 
signature for operations that are different (either wrt 
iterator invalidation or complexity). My design avoids that by 
giving distinct operations distinct names.



Andrei


This sounds really complicated to use. What are the benefits?
When would a generic algorithm even need to know the complexity 
of the container?


Also maybe a simpler idea would just be to annotate the 
operations with their complexity with UDAs. That way things that 
really care about the complexity can get it, and those who don't 
can ignore it. It has the benefit of being self documenting as 
well.
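
A minimal sketch of the idea; the @complexity carrier below is 
hypothetical, not an existing Phobos attribute:

struct complexity { string bigO; }

struct Array(T)
{
    @complexity("O(1)") void pushBack(T x)         { /* append at the end */ }
    @complexity("O(n)") void insert(size_t i, T x) { /* scoot and place */ }
}

Code that cares can read the annotation via 
__traits(getAttributes, Array!int.insert); everyone else keeps 
calling insert as before.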


Re: Complexity nomenclature

2015-12-03 Thread Timon Gehr via Digitalmars-d

On 12/04/2015 03:36 AM, Andrei Alexandrescu wrote:

On 12/03/2015 09:27 PM, Timon Gehr wrote:

On 12/04/2015 03:19 AM, Andrei Alexandrescu wrote:

On 12/03/2015 08:55 PM, Timon Gehr wrote:

I don't really understand what calling it 'complexity' actually buys
though.


Instant understanding by everyone. -- Andrei


Well, no. "running time" is actually more descriptive.


http://en.cppreference.com/w/cpp/container/forward_list/insert_after

Andrei


http://www.merriam-webster.com/dictionary/literally


Re: Complexity nomenclature

2015-12-03 Thread Walter Bright via Digitalmars-d

On 12/3/2015 5:27 PM, Andrei Alexandrescu wrote:

Now this primitive may have three complexities:

* linear in the length of r (e.g. c is a singly-linked list)

* linear in the number of elements after r in the collection (e.g. c is an 
array)

* constant (c is a doubly-linked list)

These complexities must be reflected in the name of the primitives. Or perhaps
it's okay to conflate a couple of them. I know names are something we're awfully
good at discussing :o). Destroy!


Are you sure there are only 3 complexities? What if it's a self-balancing tree?

I'm also not sure it is a good idea to put the complexity in the name of the 
primitive. Do any other algorithm libraries do that?




Re: Complexity nomenclature

2015-12-03 Thread Andrei Alexandrescu via Digitalmars-d

On 12/3/15 10:29 PM, Jack Stouffer wrote:

On Friday, 4 December 2015 at 02:21:12 UTC, Andrei Alexandrescu wrote:

On 12/03/2015 09:10 PM, Idan Arye wrote:

The complexities of the operations are a property of the data structure
being used. If each collection type will have its own set of method
names based on the complexity of operations on it, we won't be able to
have templated functions that operate on any kind of collection (or at
the very least, these functions will be really tedious to code).


Your premise is right but you reach the negation of the correct
conclusion. -- Andrei


How so? If a singly linked list and a doubly linked list have two
different method names for the same operation, then they cannot be
easily templated.


Took me a while to figure. There's a hierarchy of operations, e.g. if a 
collection implements insert, it automatically implements linearInsert. 
And so on. The collections framework provides these defaults, so client 
code that needs quick insert uses insert, whereas code that's fine with 
a linear upper bound uses linearInsert and captures both.


Another way to look at it: in STL container-independent code is near 
impossible because different containers use the same signature for 
operations that are different (either wrt iterator invalidation or 
complexity). My design avoids that by giving distinct operations 
distinct names.



Andrei



Re: Complexity nomenclature

2015-12-03 Thread Minas Mina via Digitalmars-d

On Friday, 4 December 2015 at 03:46:39 UTC, Walter Bright wrote:

On 12/3/2015 5:27 PM, Andrei Alexandrescu wrote:

Now this primitive may have three complexities:

* linear in the length of r (e.g. c is a singly-linked list)

* linear in the number of elements after r in the collection 
(e.g. c is an array)


* constant (c is a doubly-linked list)

These complexities must be reflected in the name of the 
primitives. Or perhaps
it's okay to conflate a couple of them. I know names are 
something we're awfully

good at discussing :o). Destroy!


Are you sure there are only 3 complexities? What if it's a 
self-balancing tree?


I'm also not sure it is a good idea to put the complexity in 
the name of the primitive. Do any other algorithm libraries do 
that?


I agree -- putting the complexity (eh, running time) in the 
primitive's name seems like a bad idea.


I believe that stability is a more important "feature" of a 
sorting algorithm than its running time, because you can make 
inferences about it and rely on them in your own algorithms (I use 
it for implementing delta compression), and it affects the output.


Running time is still a good guarantee to have, but whether `sort` 
runs in O(n^2) or O(n log n) is not going to affect how your output 
will look!


Re: Complexity nomenclature

2015-12-03 Thread Timon Gehr via Digitalmars-d

On 12/04/2015 02:27 AM, Andrei Alexandrescu wrote:

Consider the collections universe. So we have an imperative primitive like:

c.insertAfter(r, x)

where c is a collection, r is a range previously extracted from c, and x
is a value convertible to collection's element type. The expression
imperatively inserts x in the collection right after r.

Now this primitive may have three complexities:

* linear in the length of r (e.g. c is a singly-linked list)

* linear in the number of elements after r in the collection (e.g. c is
an array)

* constant (c is a doubly-linked list)

These complexities must be reflected in the name of the primitives. Or
perhaps it's okay to conflate a couple of them. I know names are
something we're awfully good at discussing :o). Destroy!


Andrei


Regarding nomenclature, it would be awesome if we could call this 
"running time" instead of "complexity". Algorithms have running times 
and memory usages. Problems have (time and space) complexities. I know 
that calling running time 'complexity' is awfully popular outside of 
actual complexity theory papers. I don't really understand what calling 
it 'complexity' actually buys though.


Complexity nomenclature

2015-12-03 Thread Andrei Alexandrescu via Digitalmars-d

Consider the collections universe. So we have an imperative primitive like:

c.insertAfter(r, x)

where c is a collection, r is a range previously extracted from c, and x 
is a value convertible to collection's element type. The expression 
imperatively inserts x in the collection right after r.


Now this primitive may have three complexities:

* linear in the length of r (e.g. c is a singly-linked list)

* linear in the number of elements after r in the collection (e.g. c is 
an array)


* constant (c is a doubly-linked list)

These complexities must be reflected in the name of the primitives. Or 
perhaps it's okay to conflate a couple of them. I know names are 
something we're awfully good at discussing :o). Destroy!



Andrei


Re: Complexity nomenclature

2015-12-03 Thread Jack Stouffer via Digitalmars-d
On Friday, 4 December 2015 at 01:27:42 UTC, Andrei Alexandrescu 
wrote:
These complexities must be reflected in the name of the 
primitives.


I don't see why. IMO, names should convey what the function does, 
not how it does it. Complexity is usually put in the function 
documentation in Phobos when it's not constant, especially for 
range-based ones; I don't see a reason to change that.


Re: Complexity nomenclature

2015-12-03 Thread Idan Arye via Digitalmars-d
On Friday, 4 December 2015 at 01:27:42 UTC, Andrei Alexandrescu 
wrote:
Consider the collections universe. So we have an imperative 
primitive like:


c.insertAfter(r, x)

where c is a collection, r is a range previously extracted from 
c, and x is a value convertible to collection's element type. 
The expression imperatively inserts x in the collection right 
after r.


Now this primitive may have three complexities:

* linear in the length of r (e.g. c is a singly-linked list)

* linear in the number of elements after r in the collection 
(e.g. c is an array)


* constant (c is a doubly-linked list)

These complexities must be reflected in the name of the 
primitives. Or perhaps it's okay to conflate a couple of them. 
I know names are something we're awfully good at discussing 
:o). Destroy!



Andrei


The complexities of the operations are a property of the data 
structure being used. If each collection type will have its own 
set of method names based on the complexity of operations on it, 
we won't be able to have templated functions that operate on any 
kind of collection (or at the very least, these functions will be 
really tedious to code).

