I believe this concludes the investigation. The next step would be to create the actual implementation, and one of its requirements would be to somehow "batch" the fetching of data for all lexemes on the page, to minimize the number of DB queries.
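
To illustrate what "batching" could mean here, the following is a minimal sketch (not the planned implementation) in Python with SQLite. It collects the IDs of all lexemes on a page and fetches their lemmas with one `IN (...)` query instead of one query per lexeme. The table `lexeme_lemma`, its columns, and the function `fetch_lemmas_batched` are hypothetical, since no schema has been drafted yet. The `chunk_size` parameter is an assumption that keeps each query under the database's limit on bound parameters.

```python
import sqlite3
from typing import Dict, Iterable, List, Tuple

# Hypothetical schema, for illustration only; the real schema is not drafted yet.
SCHEMA = """
CREATE TABLE lexeme_lemma (
    lexeme_id INTEGER NOT NULL,
    language_code TEXT NOT NULL,
    lemma TEXT NOT NULL
);
"""


def fetch_lemmas_batched(
    conn: sqlite3.Connection,
    lexeme_ids: Iterable[int],
    chunk_size: int = 500,
) -> Dict[int, List[Tuple[str, str]]]:
    """Fetch (language_code, lemma) pairs for all given lexemes.

    Issues one SELECT per chunk of IDs (a single query for a typical page)
    instead of one SELECT per lexeme.
    """
    ids = sorted(set(lexeme_ids))  # deduplicate lexemes referenced several times
    result: Dict[int, List[Tuple[str, str]]] = {i: [] for i in ids}
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        placeholders = ",".join("?" * len(chunk))  # e.g. "?,?,?"
        query = (
            "SELECT lexeme_id, language_code, lemma FROM lexeme_lemma "
            f"WHERE lexeme_id IN ({placeholders}) "
            "ORDER BY lexeme_id, language_code"
        )
        for lexeme_id, language_code, lemma in conn.execute(query, chunk):
            result[lexeme_id].append((language_code, lemma))
    return result
```

Lexemes without any stored lemma still get an entry with an empty list. That way the caller can tell "fetched, nothing found" apart from "not requested".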

Regarding the estimated table sizes: a very rough but safe (conservative) estimate, IMO, is that the lemma table would have about 10 times as many rows as there are lexemes, and the item reference table (language and lexical category) would have exactly as many rows as there are existing lexemes.
More accurate and thorough estimates can be provided once we have an actual draft of the DB schema (I don't consider the proof-of-concept code to be one) and have discussed it with people with more DB expertise.
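
The rough estimate above amounts to the following arithmetic, sketched in Python. The function name and the table keys are hypothetical labels. The factor of 10 lemma rows per lexeme is the conservative assumption stated above, not a measured value.

```python
from typing import Dict


def estimate_row_counts(num_lexemes: int, lemmas_per_lexeme: int = 10) -> Dict[str, int]:
    """Rough, conservative row-count estimate for the (hypothetical) tables.

    lemma:          about `lemmas_per_lexeme` rows per lexeme (assumed upper bound of 10)
    item_reference: exactly one row (language + lexical category) per lexeme
    """
    if num_lexemes < 0:
        raise ValueError("num_lexemes must be non-negative")
    return {
        "lemma": lemmas_per_lexeme * num_lexemes,
        "item_reference": num_lexemes,
    }
```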