Hi there!

Do you think that the regex

  https?://(\S+\.)+\S+(/\S*)?
  <https://regex101.com/?regex=https?://(\S+\.)+\S+(/\S*)?>

can be improved? I think it can.

For instance, why require the protocol? For most use cases (just clicking the
link), it isn't useful, and arguably it isn't part of the website itself.
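
Making it optional would be a small change; roughly something like this
(untested, just to illustrate the idea):

  (https?://)?(\S+\.)+\S+(/\S*)?

Though, without a required scheme, the pattern matches almost any dotted
token, which ties into the next point.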

Also, the regex is quite lax. That's not a big issue in itself, but a stricter
one could prevent duplicate URLs from being added by bots and data imports.
Aside from the http/https question, constraints like these:

- warn against URLs ending in /
- warn against uppercase letters

would help data imports from external databases, as well as batch edits done
with bots, avoid adding duplicate values.
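
To make that concrete, here is a rough sketch (Python) of the kind of
normalization an import script or bot could apply before adding a value.
Everything here is illustrative (normalize_url is not an existing
pywikibot/Wikidata helper); it lowercases only the scheme and host, since
paths can be case-sensitive, and strips trailing slashes:

  import re
  from urllib.parse import urlsplit, urlunsplit

  # The current format constraint, anchored for a full-string check.
  URL_RE = re.compile(r'https?://(\S+\.)+\S+(/\S*)?\Z')

  def normalize_url(url):
      """Lowercase scheme and host, drop trailing slashes, so the same
      website always ends up in one canonical stored form."""
      parts = urlsplit(url)
      normalized = urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                               parts.path.rstrip('/'), parts.query,
                               parts.fragment))
      if not URL_RE.match(normalized):
          raise ValueError('value does not look like a URL: %r' % url)
      return normalized

  print(normalize_url('HTTP://Example.ORG/'))        # http://example.org
  print(normalize_url('https://example.org/Wiki/'))  # https://example.org/Wiki

The constraint itself could then stay as a warning; normalization like this on
the import side is what would actually prevent the duplicates.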

Regards,
Victor