thiemowmde created this task.
thiemowmde added projects: Wikidata, Lexicographical data, Wikidata-Sprint.
Herald added a subscriber: Aklapper.

TASK DESCRIPTION

With https://gerrit.wikimedia.org/r/345292 I attempted to:

  1. simply remove code that already started hurting us because of easy to confuse parameters, and
  2. work towards a higher goal of having a set of a few very simple, clean, and easy to use constructors.

I believe the frustration in the patch comments are partly caused by me failing to make 2. transparent. I did not expected the need to explain 2. in advance. I imagined 1. to be enough, and 2. only being relevant in future patches.

What I did now is counting all current usages of the Lexeme constructor. I found only one non-empty constructor in production code (in the deserializer). All other constructor calls are exclusively used in tests, with these combinations of parameters:

  • ID only: 9 usages
  • Lemmas only: 3
  • Lemmas + ID: 4
  • Lexical category only: 1
  • Lexical category + ID: 2
  • Language only: 1
  • Language + ID: 2
  • Lexical category + language only: 3
  • Lexical category + language + ID: 7
  • All: 3 usages

I suggest to reduce the default constructor to a single, still optional ItemId $id = null parameter. This covers the two most relevant use-cases. Currently all entity types have a nullable ID parameter as the first one. I find it nice to keep this symmetry, and would even expect it.

We may add one or two more optional parameters to the default constructor, but this is already highly disputable. Which set of parameters would that be?

  • ID + forms + senses might make sense because this is the rough, basic structure of a Lexeme without going into any detail.
  • ID + lemmas appear to make sense because a Lexeme without at least one lemma is useless in production, similar to an Item with no label. But this is barely needed in tests.
  • ID + statements feels like it doesn't make much sense, because forms and senses have statements too.

So I suggest to stick to ID only for now, and possibly add parameters for forms and senses when both are implemented.

I suggest two static constructors with different guarantees:

  • newFromLanguageAndLexicalCategory( ItemId $language, ItemId $lexicalCategory )
  • newFromIdLanguageAndLexicalCategory( ItemId $id, ItemId $language, ItemId $lexicalCategory )

The second one guarantees that the Lexeme is completely initialized and LexemeContent::isValid will consider it valid. The first is the same, but with no ID for convenience in tests. I suggest to not merge these two use-cases into one with an optional ID parameter. Note that no parameter in this suggestion is nullable/optional. Also note that this suggestion includes changing the order of language and lexical category.

I suggest to use setters for everything else, including the one use-case in production.


TASK DETAIL
https://phabricator.wikimedia.org/T162809

EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: thiemowmde
Cc: daniel, Aleksey_WMDE, Jonas, WMDE-leszek, thiemowmde, Aklapper, QZanden, Izno, Wikidata-bugs, aude, Darkdadaah, Mbch331
_______________________________________________
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs

Reply via email to