#15481: Words.__contains__ returns wrong answers
---------------------------------+-------------------------
       Reporter:  ncohen         |        Owner:
           Type:  defect         |       Status:  new
       Priority:  major          |    Milestone:  sage-5.13
      Component:  combinatorics  |   Resolution:
       Keywords:                 |    Merged in:
        Authors:                 |    Reviewers:
Report Upstream:  N/A            |  Work issues:
         Branch:                 |       Commit:
   Dependencies:                 |     Stopgaps:
---------------------------------+-------------------------

Comment (by ncohen):

 Yoooooooooooooooo !!

 > First, I want to say that I think all these problems could be solved or
 get more clean by first finishing the ticket at #12224.

 Wow. And it is only 2 years old !!!

 > It is a big patch, but I am ready to put time on it to continue the
 review. Vincent : what's the status of it? Do you think we will manage to
 get your great improvement in Sage anytime soon?

 Yeah. Like this months ?.. Those are wrong results, guys !

 > The default alphabet when using just `Word('12')` is the set of Python
 object, because we wanted any list to be turned into a word:

 {{{
 sage: Word('12') in Words('12')
 False
 }}}

 Who cares ? Anybody expects to see "True", there. The question is "how",
 not what it should return.

 > {{{
 > sage: Word('12').parent().alphabet()
 > Set of Python objects of type 'object'
 > }}}

 Very cool. Now if it makes Sage return answers like that, it should be
 discarded as a bad design choice.

 > Of course, by default, we could have computed the alphabet from the
 letters appearing in the word, but this is not efficient when the word is
 very long or when the word is infinite.

 Well, I see no problem in having sage raise a "IDontKnowException" when
 asked to tell if an infinite (i.e. lazy) word belongs to a specific set of
 words.

 But once more, I don't care whether something is slow or fast. The default
 behaviour should be mathematically correct, and you are free to add one
 thousand parameters if you want to make it faster at the cost of
 correction. For as long as there are warnings everywhere. Those results
 are mathematically wrong, do we at least agree on that ?

 > So, we decided it is to the user to specify the alphabet.

 Very cool. Now 1) the user has no way to know that this is what he is
 expected to do 2) the answer raised by the default behaviour are still
 wrong.

 Come on guys ! Let's even do that : if the alphabet is not explicitly
 given by the user, you HAVE to return mathematically sound answers. If the
 user gives the alphabet explicitly, then you can do whatever you want (add
 warnings everywhere) to return more complicated and faster results if you
 need. But you can't make word equality depend on something which is NOT
 canonical and NOT explicitly given.

 > In short, the user may specify the alphabet. And this fixes the above
 problem:
 >
 > {{{
 > sage: Word('12', alphabet=['1','2']) in Words('12')
 > True
 > sage: Word('12', alphabet=['1','2', '3']) in Words('123')
 > True
 > }}}

 Guys. You are comparing WORDS. Not pairs "word, alphabet". So compare
 words, unless explicitly asked to do something different.

 > > {{{
 > > sage: Words(2,5).random_element() in Words(2, finite=False)
 > > True
 >
 > I agree, this is a bug: a finite word must not be in a set of infinite
 words.

 Yeah, that's because of the class hierarchy. It's not very easy to work
 with `:-/`

 > I agree this is a bug: it seems the `infinite` argument and sometimes
 the `finite` argument is not taken into account:
 >
 > {{{
 > sage: Words(2,infinite=True)
 > Words over {1, 2}
 > sage: Words(2,infinite=False)
 > Words over {1, 2}
 > sage: Words(2,finite=True)
 > Words over {1, 2}
 > sage: Words(2,finite=False)
 > Infinite Words over {1, 2}
 > }}}

 Note that #15479 fixed something here. Just adding a word to the
 `__repr__` function.

 > > {{{
 > > sage: Words('123')('121212') in Words(2)
 > > False
 >
 > This is OK. The alphabet of integers and of str are considered
 different.

 Oh. Right. Tricky, but right. You can have two words that are displayed as
 "123" but the letters of one are integers, the others are letters. Hmmm...
 Tricky, but right I guess.

 Okay, I continued reading the other answers and we have to make something
 clear : when you say that something is a word, it is a word. It is not "a
 word defined on an alphabet". If you want to have an object defined by
 both a word and an alphabet, this thing has no right to be displayed as
 "word: 123". It has to be displayed as "word: 123 on
 alphabet=['1','2','3']". So you can add whatever class you think are
 useful for you, for sage, for everybody on earth but what you define as a
 word is expected to be a word, and nothing else.
 And equality of words if perfectly defined. And a set of words is
 perfectly define. So there is no problem.

 If you think that the mathematical definitions do not yield sufficiently
 fast algorithm that's a problem indeed, and you can fix it however you
 want for as long as nobody can be led to misunderstand the results given.
 But a word is a sequence of letter, period. A pair "Word, Alphabet" is
 something else. But sage users have no way to guess that they compare
 alphabets when they compare words, and they do not have to expect it
 anyway because comparing words and comparing sets of words is CLEAR. So
 let's make Sage return true answers first, then there will be no problem
 to make things faster.

 > Yes, this is a choice we made. Maybe it is wrong. We decided `w in W` if
 and only if `W is the parent of w`. It makes the code more efficient but
 also less flexible...

 IT MAKES THE RESULTS **INCORRECT**.

 > Testing if some words is in some set of words appears so often in the
 code that we decided to make it like that. I think the `__contains__`
 could be changed to be more flexible because we don't have to use the
 `__contains__` to test for the parent of some words.
 >
 > > {{{
 > > sage: words.FibonacciWord() in Words([0,1,2])
 > > False
 > > }}}
 >
 > Same comment as above: `w in W` if and only if `W is the parent of w`.
 Should this be fixed?

 Give this to a colleague of yours who works on words and does not use
 Sage, ask what he thinks. His answer is what Sage should answer. And it's
 crystal clear.

 > {{{
 > sage: words.FibonacciWord() in Words([0,1])
 > True
 > sage: words.FibonacciWord() in Words([0,1,2])
 > False
 > }}}

 Come on guy... "Words" are sets, and you are telling me that if `A`
 contains `w` and `B` is larger than `A` I can not infer that `B` contains
 `w` ? honestly ?

 > So, in summary, things must be fixed according to
 >
 >  - the behavior of the input `finite=False`, etc. and
 >  - the behavior of `__contains__`.

 Word equality and containment should be word equality and containment. The
 alphabets are NOT involved in any of these things, and Sage should not
 involve them either to decide whether something is true or false. Of
 course you can cache the alphabet USED by a word if you need, or whatever
 suits you to make things faster, but all these operations are defined
 mathematically and you just don't return wrong answers because it makes
 the algorithms faster.

 If you need to have something representing a word and an alphabet, this
 thing's `__repr__` should indicate this information. If only to say "word:
 123 defined on an alphabet" without explicitly giving the alphabet. And no
 "word" will ever be equal to a "word defined on an alphabet".

 > So, what should `__contains__` be?
 >
 > 1. It seems you would like:
 >
 > {{{
 > #!python
 > def __contains__(self, word):
 >     A = self.alphabet()
 >     return all(letter in A for letter in word)
 > }}}
 >
 > This is much too long computations for words we know the alphabet. Also,
 it doesn't work for infinite words. But if it is what the user expect, why
 not?

 Indeed. That's the only mathematical truth. Note that you can cache the
 set of letters used in a word if you think it helps.

 > 2. Maybe something more wise could be:
 >
 > {{{
 > #!python
 > def __contains__(self, word):
 >     A = self.alphabet()
 >     B = word.parent().alphabet()
 >     return B is subset of A
 > }}}
 >
 > This will make code more efficient, but the user must provide an
 alphabet to the word. Thus, this won't work (if the default alphabet of
 Word remains the set of all Python objects):

 That is NOT word containment. That is not even the equality of pairs
 "(word, alphabet)". Right now, you have this :

 {{{
 sage: w1=Words('123')('12')
 sage: w2=Words('12345')('12')
 sage: w1 == w2
 True
 sage: w1 in Words('123')
 True
 sage: w2 in Words('123')
 False
 }}}

 How can you expect ANYBODY to agree with that ?

 > 3. For reference, the actual code is something like this :
 >
 > {{{
 > #!python
 > def __contains__(self, word):
 >     A = self.alphabet()
 >     B = word.parent().alphabet()
 >     return B is A
 > }}}

 I can't even understand how a code like that can be 1) written 2)
 reviewed.

 Nathann

--
Ticket URL: <http://trac.sagemath.org/ticket/15481#comment:3>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica, 
and MATLAB

-- 
You received this message because you are subscribed to the Google Groups 
"sage-trac" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sage-trac.
For more options, visit https://groups.google.com/groups/opt_out.

Reply via email to