Re: [EM] Fast Condorcet-Kemeny calculation times, clarification of NP-hardness issue

2012-03-31 Thread Kristofer Munsterhjelm

On 03/31/2012 08:24 AM, Richard Fobes wrote:


So, wrapping up this explanation:

If the Condorcet-Kemeny problem were in the field of encryption, then of
course only an exact solution would be relevant.

But the Condorcet-Kemeny problem is an optimization problem -- or it can
be regarded as a sorting problem -- where the goal is to check various
sequences and find the one (or ones in the case of ties) that move the
biggest pairwise counts into the upper-right triangular area of a
matrix, while moving the smallest pairwise counts into the lower-left
triangular area.


I think the argument against that can be quite easily stated like this:

- If there exists an election profile where exhaustive Kemeny gives a 
different result (winner) than VoteFair, then VoteFair isn't Kemeny.
- On the other hand, if there exists no such profile, then VoteFair 
solves an NP-complete problem in the cryptographic sense.


Hence, either VoteFair isn't Kemeny or it proves that P = NP, which 
would be huge.


Now, if you're saying that VoteFair isn't Kemeny but it's close enough 
for all practical purposes, then that's another matter. Or if VoteFair, 
like my integer linear programming implementation, is exponential time 
but the constants mean the actual runtime doesn't become ugly until you 
have 20-30 candidates, then that would also work.



Speaking of which, I'm still looking forward to Warren supplying a
40-candidate or 50-candidate case (as ballot preferences, not pairwise
counts because they might not correlate with a real ranking scenario)
that he thinks would take a long time to calculate, and I'll be happy to
measure the calculation time. And I'll share the sorted pairwise counts
in matrix form so that anyone can visually verify that the full ranking
sequence is correct, and that if there is a deserving winner then that
candidate is correctly ranked in first place.


You can turn any pairwise matrix into a set of preferences. If every 
pairwise count is even and positive, then you just make n/2 people 
having ABCD (for four candidates) and n/2 have ABDC to get a 
pairwise count, for AB, of n. I think you can do similar things for the 
more general case, and if you have equal-rank and truncation, it becomes 
easier still.


The other problem is that I'm not aware of any trapdoor function for 
Kemeny. That is, while Warren can make a 50-candidate case which takes a 
very long time for say, my ILP implementation, there's no way to tell, 
if you answer oh, the ordering is AWXQRDF..., whether that it is 
actually the optimum. If we had a trapdoor function, that'd let us 
generate a hard Kemeny instance where the optimal ordering is already 
known -- and then you could check if you got the same thing out of VoteFair.


So if VoteFair is a good approximation, it might give something with a 
good Kemeny score (just as Ranked Pairs would, or Schulze), but it 
wouldn't give the optimum - and we'd have no way of knowing, short of 
running the hard instance through Kemeny itself and comparing the result.


(The upshot of which, I guess, is that I really should get off my behind 
and do some Monte Carlo to see if I can find a ballot set where VoteFair 
gives a different winner than something I know finds the Kemeny optimum. 
But I've been busy.)



Election-Methods mailing list - see http://electorama.com/em for list info


Re: [EM] Fast Condorcet-Kemeny calculation times, clarification of NP-hardness issue

2012-03-31 Thread Jameson Quinn
2012/3/31, Kristofer Munsterhjelm km_el...@lavabit.com:
 On 03/31/2012 08:24 AM, Richard Fobes wrote:

 So, wrapping up this explanation:

 If the Condorcet-Kemeny problem were in the field of encryption, then of
 course only an exact solution would be relevant.

 But the Condorcet-Kemeny problem is an optimization problem -- or it can
 be regarded as a sorting problem -- where the goal is to check various
 sequences and find the one (or ones in the case of ties) that move the
 biggest pairwise counts into the upper-right triangular area of a
 matrix, while moving the smallest pairwise counts into the lower-left
 triangular area.

 I think the argument against that can be quite easily stated like this:

 - If there exists an election profile where exhaustive Kemeny gives a
 different result (winner) than VoteFair, then VoteFair isn't Kemeny.
 - On the other hand, if there exists no such profile, then VoteFair
 solves an NP-complete problem in the cryptographic sense.

 Hence, either VoteFair isn't Kemeny or it proves that P = NP, which
 would be huge.

Clearly, we must assume the former. In that case, we have two
possibilities for implications.

- VoteFair ≠ Kemeny with some unknown random probability p. In that
case, no matter how low p is, we can basically never trust VoteFair to
be Kemeny.
- There is some significant, knowable portion of the domain where
VoteFair = Kemeny. For instance, VoteFair = Kemeny if the Smith set
has fewer than 10 members or something like that. In that case, we
could consider a Smith set of over 10 members to be comparable to a
tie under a normal polytime voting system, and accept that we'll get
essentially random results in that case.

Since K-Y is independent of Smith-dominated alternatives (ISDA), I
think the latter is more likely to be the case. So, Richard: instead
of making verbal arguments about traveling salesman cases or visual
inspection of matrices, your job is to give a rigorous proof that
VoteFair = Kemeny for a Smith set of up to N candidates, where N4
(the largest suspected Smith set from real-world election/polling
data).

Jameson

Election-Methods mailing list - see http://electorama.com/em for list info


[EM] Fast Condorcet-Kemeny calculation times, clarification of NP-hardness issue

2012-03-04 Thread Richard Fobes
Finally, after reading the articles cited by Warren Smith (listed at the 
bottom of this reply) plus some related articles, I can reply to his 
insistence that Condorcet-Kemeny calculations take too long to 
calculate.  Also, this reply addresses the same claim that appears in 
Wikipedia both in the Kemeny-Young method article and in the 
comparison table within the Wikipedia Voting systems article (in the 
polynomial time column that Markus Schulze added).


One source of confusion is that Warren, and perhaps others, regard the 
Condorcet-Kemeny problem as a decision problem that only has a yes 
or no answer.  This view is suggested by Warren's reference (below and 
in other messages) to the problem as being NP-complete, which only 
applies to decision problems.  Although it is possible to formulate a 
decision problem based on one or more specified characteristics of the 
Condorcet-Kemeny method, that is a different problem than the 
Condorcet-Kemeny problem.


In the real world of elections, the Condorcet-Kemeny problem is to 
calculate a ranking of all choices (e.g. candidates) that maximizes the 
sequence score (or minimizes the Kemeny score).


Clearly the Condorcet-Kemeny problem is an optimization problem, not a 
decision problem (and not a search problem).  It is an optimization 
problem because we have a way to measure how closely the solution 
reaches its goal.


(For contrast, consider the NP-hard subset sum problem in which the 
goal is to determine whether a specified list of integers contains a 
subset that can be added and/or subtracted to yield zero.  Any subset 
either sums to zero or it doesn't sum to zero.  This makes it easy to 
formulate the related decision (yes/no) problem that asks whether such a 
subset exists for a given set of numbers.)


Because the Condorcet-Kemeny problem is an optimization problem, the 
solution to the Condorcet-Kemeny problem can be an approximation.  If 
this approach is used, it becomes relevant to ask how closely the 
approximation reaches the ranking that has the highest sequence score. 
Yet even this question -- of how close? -- is not a decision problem 
(because it goes beyond a yes or no answer).


Keeping in mind that VoteFair popularity ranking calculations are 
mathematically equivalent to the Condorcet-Kemeny method, my claim is 
that VoteFair popularity ranking calculations yield, at the least, the 
same top-ranked choice, and the same few top-ranked choices, as the 
solution produced by examining every sequence score -- except (and this 
is the important part) in cases where the voter preferences are so 
convoluted that any top-ranked choice and any few top-ranked choices 
would be controversial.  As one academic paper elegantly put it: 
garbage in, garbage out.


More specifically, here is a set of claims that more rigorously state 
the above ambiguous claim.


Claim 1: For _some_ _instances_, a polynomial-time calculation can 
identify the full ranking that produces the highest Condorcet-Kemeny 
sequence score.


Claim 2: For _some_ _instances_, a polynomial-time calculation can rank 
the top most-popular candidates/choices and this partial ranking will be 
the same as the top portion of the full ranking as determined by 
identifying the highest Condorcet-Kemeny sequence score.


Claim 3: For the _remaining_ _instances_ (not covered in Claims 1 and 
2), an approximation of the full Condorcet-Kemeny ranking can be 
calculated in polynomial time.


Claim 4: For any cases in which the top-ranked candidate/choice 
according to the VoteFair popularity ranking algorithm differs from the 
top-ranked candidate/choice according to a full calculation of all 
sequence scores, the outcome of a runoff election between the two 
candidates/choices would be difficult to predict.


As done in the academic literature, I am excluding the cases in which 
more than one sequence has the same highest sequence score.


To help clarify the validity of these claims, I'll use an analogy.

Consider a special case of the rigorously studied Traveling Salesman 
Problem (TSP), which is NP-hard to solve.  (The TSP also can be 
expressed as a decision problem, in which case the decision problem is 
NP-complete, but that variation is not the problem discussed here.)


The special case -- which I will refer to as the non-returning Traveling 
Salesman Problem -- is that we want to know which city the salesman 
visits first, and we want to know, with successively less interest, 
which city the salesman visits second, third, and so on.  Additionally, 
for this special case, we specify that the cities to be visited are 
roughly located between a beginning point B and and ending point E.


To make this special case mathematically equivalent to the normal 
Traveling Salesman Problem in which the salesman returns to the starting 
city, we create a path of closely spaced cities (labeled + below) that 
lead back to the starting city B.


Here is a diagram of this problem.  Remember 

Re: [EM] Fast Condorcet-Kemeny calculation times, clarification of NP-hardness issue

2012-03-04 Thread Warren Smith
On Sun, Mar 4, 2012 at 3:44 PM, Richard Fobes
electionmeth...@votefair.org wrote:
 Finally, after reading the articles cited by Warren Smith (listed at the
 bottom of this reply) plus some related articles, I can reply to his
 insistence that Condorcet-Kemeny calculations take too long to calculate.
  Also, this reply addresses the same claim that appears in Wikipedia both in
 the Kemeny-Young method article and in the comparison table within the
 Wikipedia Voting systems article (in the polynomial time column that
 Markus Schulze added).

 One source of confusion is that Warren, and perhaps others, regard the
 Condorcet-Kemeny problem as a decision problem that only has a yes or
 no answer.  This view is suggested by Warren's reference (below and in
 other messages) to the problem as being NP-complete, which only applies to
 decision problems.  Although it is possible to formulate a decision problem
 based on one or more specified characteristics of the Condorcet-Kemeny
 method, that is a different problem than the Condorcet-Kemeny problem.

--the optimization problem is at least as hard as the decision
problem.You are erroneously creating the impression I somehow
was unaware of this, or that you somehow have here got some new
insight.  Neither is true.



 In the real world of elections, the Condorcet-Kemeny problem is to calculate
 a ranking of all choices (e.g. candidates) that maximizes the sequence score
 (or minimizes the Kemeny score).

 Clearly the Condorcet-Kemeny problem is an optimization problem, not a
 decision problem (and not a search problem).  It is an optimization problem
 because we have a way to measure how closely the solution reaches its goal.

 (For contrast, consider the NP-hard subset sum problem in which the goal
 is to determine whether a specified list of integers contains a subset that
 can be added and/or subtracted to yield zero.  Any subset either sums to
 zero or it doesn't sum to zero.  This makes it easy to formulate the related
 decision (yes/no) problem that asks whether such a subset exists for a given
 set of numbers.)




 Because the Condorcet-Kemeny problem is an optimization problem, the
 solution to the Condorcet-Kemeny problem can be an approximation.  If this
 approach is used, it becomes relevant to ask how closely the approximation
 reaches the ranking that has the highest sequence score. Yet even this
 question -- of how close? -- is not a decision problem (because it goes
 beyond a yes or no answer).

 Keeping in mind that VoteFair popularity ranking calculations are
 mathematically equivalent to the Condorcet-Kemeny method, my claim is that
 VoteFair popularity ranking calculations yield, at the least, the same
 top-ranked choice, and the same few top-ranked choices, as the solution
 produced by examining every sequence score -- except (and this is the
 important part) in cases where the voter preferences are so convoluted that
 any top-ranked choice and any few top-ranked choices would be controversial.
  As one academic paper elegantly put it: garbage in, garbage out.

 More specifically, here is a set of claims that more rigorously state the
 above ambiguous claim.

 Claim 1: For _some_ _instances_, a polynomial-time calculation can identify
 the full ranking that produces the highest Condorcet-Kemeny sequence score.

--oh whoo-whee.   Here's another claim: for SOME planets, I can
readily find a million dollars in gold piled up right next to me.

 Claim 2: For _some_ _instances_, a polynomial-time calculation can rank the
 top most-popular candidates/choices and this partial ranking will be the
 same as the top portion of the full ranking as determined by identifying the
 highest Condorcet-Kemeny sequence score.

 Claim 3: For the _remaining_ _instances_ (not covered in Claims 1 and 2), an
 approximation of the full Condorcet-Kemeny ranking can be calculated in
 polynomial time.

--what kind of approximation?  I can find an approximation to
a million dollars in gold, namely, 1 penny.

 Claim 4: For any cases in which the top-ranked candidate/choice according to
 the VoteFair popularity ranking algorithm differs from the top-ranked
 candidate/choice according to a full calculation of all sequence scores, the
 outcome of a runoff election between the two candidates/choices would be
 difficult to predict.

 As done in the academic literature, I am excluding the cases in which more
 than one sequence has the same highest sequence score.

--I'm not sure what that meant, but it sounds like garbage too.

 To help clarify the validity of these claims, I'll use an analogy.

 Consider a special case of the rigorously studied Traveling Salesman Problem
 (TSP), which is NP-hard to solve.  (The TSP also can be expressed as a
 decision problem, in which case the decision problem is NP-complete, but
 that variation is not the problem discussed here.)

 The special case -- which I will refer to as the non-returning Traveling
 Salesman Problem -- is that we want to 

Re: [EM] Fast Condorcet-Kemeny calculation times, clarification of NP-hardness issue

2012-03-04 Thread Andrew Myers

On 3/4/12 5:44 PM, Warren Smith wrote:

On Sun, Mar 4, 2012 at 3:44 PM, Richard Fobes
electionmeth...@votefair.org  wrote:

Finally, after reading the articles cited by Warren Smith (listed at the
bottom of this reply) plus some related articles, I can reply to his
insistence that Condorcet-Kemeny calculations take too long to calculate.
  Also, this reply addresses the same claim that appears in Wikipedia both in
the Kemeny-Young method article and in the comparison table within the
Wikipedia Voting systems article (in the polynomial time column that
Markus Schulze added).

One source of confusion is that Warren, and perhaps others, regard the
Condorcet-Kemeny problem as a decision problem that only has a yes or
no answer.  This view is suggested by Warren's reference (below and in
other messages) to the problem as being NP-complete, which only applies to
decision problems.  Although it is possible to formulate a decision problem
based on one or more specified characteristics of the Condorcet-Kemeny
method, that is a different problem than the Condorcet-Kemeny problem.

--the optimization problem is at least as hard as the decision
problem.You are erroneously creating the impression I somehow
was unaware of this, or that you somehow have here got some new
insight.  Neither is true.
I might try to say all this in a more friendly way than Warren does, but 
he is 100% right about all the technical issues here. This is basic 
computer science. Nothing fancy and no judgment calls are involved.


-- Andrew

attachment: andru.vcf
Election-Methods mailing list - see http://electorama.com/em for list info