Re: [Wikidata] Wikibase as a decentralized perspective for Wikidata

Imre Samu Thu, 29 Nov 2018 08:23:02 -0800

> More specifically, at OSM that's the only Q-numbers people are aware of.

I would like to share my use case (sorry if it is sometimes off-topic).

I am:
- a member of Wikimédia Magyarország Egyesület (Wikimedia Hungary)
- an OSM meetup organizer
- in my mind: 'Q' == Wikidata; 'Q' == Quality (but this is a false
association)
- experienced in working with data warehousing / relational databases

For me, the Q/P prefix is like Hungarian notation
(https://en.wikipedia.org/wiki/Hungarian_notation):

  "Hungarian notation aims to remedy this by providing the programmer with
  explicit knowledge of each variable's data type."

But now I am not sure:
- What is the real meaning of the Q/P prefix: Wikidata or Wikibase?

I am involved in some open geodata projects:
#1. adding Wikidata ID concordances to Natural Earth (this is my work):
    https://www.naturalearthdata.com/blog/miscellaneous/natural-earth-v4-1-0-release-notes/
#2. adding Wikidata ID concordances to https://whosonfirst.org/ (Who's On
    First is a gazetteer of places)
#3. OSM

At first, I tried SPARQL with the Wikidata Query Service.
My experience:
- more and more data (e.g. Q486972, human settlement) -> more timeouts in
  my complex geo queries
  (a lot of farms were imported in the Netherlands area, so I have to limit
  the search radius; ...)
- the data changes all the time, so it is hard to write and validate
  complex code.
After a few months, I learned that for heavy data users the Wikidata Query
Service is sometimes not ideal (but it is good for light queries!).

So now I am loading the "Wikidata JSON dump" into a Postgres/PostGIS
database, and I am writing complex code in SQL (jaro_winkler distance, geo
distance, detecting Cebuano imports, ranking multiple candidates for
matching). Finally I can control the performance of the system (no
timeouts), and I have reproducible results.
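The matching step described above (string similarity plus geographic distance, then ranking the candidates) can be sketched outside the database. This is a minimal Python sketch, not the SQL used in this workflow: the candidate record shape (`id`, `label`, `lat`, `lon`), the 50 km cutoff and the score formula (name similarity scaled down linearly with distance) are illustrative assumptions.

```python
import math


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching within this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    m1, m2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not m2[j] and s2[j] == ch:
                m1[i] = m2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    out_of_order, k = 0, 0
    for i in range(len1):
        if m1[i]:
            while not m2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order / 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    """Jaro-Winkler: boosts the Jaro score for a common prefix (max 4 chars)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance on a spherical Earth (mean radius 6371 km)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(a)))


def rank_candidates(name, lat, lon, candidates, max_km=50.0):
    """Return (score, id, distance_km) tuples, best match first."""
    ranked = []
    for c in candidates:
        d = haversine_km(lat, lon, c["lat"], c["lon"])
        if d > max_km:  # too far away to be the same place
            continue
        sim = jaro_winkler(name.lower(), c["label"].lower())
        ranked.append((sim * (1 - d / max_km), c["id"], d))
    return sorted(ranked, reverse=True)
```

Filtering by distance before comparing names keeps the number of string comparisons small, which matters when a single name has many same-named candidates.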

For example, here is a simple SQL snippet of mine; you can see a lot of
P/Q codes inside, and, as you can expect, by now I know a lot of Q/P codes
by heart!
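select
  wd_id
 ,wd_label
 -- P625 (coordinate location) qualified by P518 (applies to part) = Q1233637
 ,get_wdcqv_globecoordinate(data,'P625','P518','Q1233637') as river_mouth
 -- the same lookup with P518 = Q7376362
 ,get_wdcqv_globecoordinate(data,'P625','P518','Q7376362') as river_source
from wd.wdx
where wd_id='Q626';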
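`get_wdcqv_globecoordinate` is a helper of this workflow whose definition is not shown in this message. As a rough illustration of what such a lookup has to do against one entity from the Wikidata JSON dump, here is a hypothetical Python equivalent (the name `qualified_coordinate` is made up): it walks the `P625` (coordinate location) statements and returns the first one whose `P518` (applies to part) qualifier points at the requested item. It assumes the current JSON format, where item values carry an `id` field; skipping deprecated statements is also an assumption of this sketch.

```python
from typing import Optional, Tuple


def qualified_coordinate(entity: dict, prop: str, qual_prop: str,
                         qual_item: str) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) of the first `prop` statement whose
    `qual_prop` qualifier has the item `qual_item` as its value."""
    for stmt in entity.get("claims", {}).get(prop, []):
        if stmt.get("rank") == "deprecated":
            continue
        main = stmt.get("mainsnak", {})
        if main.get("snaktype") != "value":  # skip "somevalue"/"novalue"
            continue
        for qual in stmt.get("qualifiers", {}).get(qual_prop, []):
            if qual.get("snaktype") != "value":
                continue
            if qual["datavalue"]["value"].get("id") == qual_item:
                coord = main["datavalue"]["value"]
                return coord["latitude"], coord["longitude"]
    return None
```

Called as `qualified_coordinate(entity, "P625", "P518", "Q1233637")`, it mirrors the `river_mouth` column of the query above.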

And now the "Natural Earth" tables look like this (relational database):
+-------------+------------+-----------+
|    name     | wikidataid | iata_code |
+-------------+------------+-----------+
| Birsa Munda | Q598231    | IXR       |
| Barnaul     | Q1858312   | BAX       |
| Bareilly    | Q2788745   |           |
+-------------+------------+-----------+

This is my current workflow.

But my real nightmare will start if other databases start using the Q/P
prefix, for example if other airport-related databases start using
Wikibase with Q codes:
- http://ourairports.com/
- https://www.flightradar24.com/data/airports
- https://www.airnav.com/airports/

Then every airport would have at least 4 different Q codes!
And in the future, I would have to check for errors in a spreadsheet like
this (and sometimes I can't see the header):
+-------------+------------+-----------+-------------+-----------+-----------+
|    name     | wikidataid | iata_code | ourairports | flightR24 | AirNav    |
+-------------+------------+-----------+-------------+-----------+-----------+
| Birsa Munda | Q598231    | IXR       | Q325324     | Q973      | Q1        |
| Barnaul     | Q1858312   | BAX       | Q42         | Q1        | Q8312     |
| Bareilly    | Q2788745   |           | Q1          | Q31       | Q45       |
+-------------+------------+-----------+-------------+-----------+-----------+

Q1 everywhere, with different meanings.

And what if some users want to add the new airport IDs back to Wikidata
(linking databases)? Why not.
So in the future, if I check https://www.wikidata.org/wiki/Q598231, I will
see a lot of different Q codes:
  Ourairports   Q325324
  FlightR24     Q973
  AirNav        Q1

And sometimes it is very hard to explain to new contributors that
Q1 (AirNav) =/= Q1 (Wikidata).

If I see any database or spreadsheet with a Q code in it, my current
expectation is that it is a Wikidata code. :)
Just check (Q28 is Hungary on Wikidata):
https://github.com/search?q=Q28+hungary&type=Code

So my current opinion:
- please don't use Q/P prefixes in any new/other databases!

For me, unlearning a lot of Q/P values is hard, so the more experience I
gain with the Wikidata data model, the less I want to use any other
Wikibase systems with similar Q/P prefixes.

My other pain point is the "Wikidata JSON dump"; a little more information
would be a big help for me:

For detecting the data quality of items:
- last modification datetime
- last modification user type (anonym_user, new_user, experienced_user,
  bot)
- edit counts by user type, for example: { anonym_user=2, new_user=0,
  experienced_user=0, bot=15 }
Info about the Wikidata life cycle:
- Wikidata redirects / deletions (now: only in the .ttl files)
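To make the request concrete, here is a hypothetical Python sketch of how the suggested per-item summary could be derived from an item's revision history. The `Revision` record, its `user_type` field and the function name are assumptions for illustration: where the revision data would come from, and the rule that decides whether an editor is "new" or "experienced", are left open.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import List

# User categories as listed in the request above.
USER_TYPES = ("anonym_user", "new_user", "experienced_user", "bot")


@dataclass
class Revision:  # hypothetical record, one per edit of an item
    timestamp: datetime
    user_type: str  # one of USER_TYPES


def quality_summary(revisions: List[Revision]) -> dict:
    """Summarize an item's edit history into the fields requested above."""
    counts = Counter(r.user_type for r in revisions)
    summary = {"edit_counts": {t: counts.get(t, 0) for t in USER_TYPES}}
    if revisions:
        last = max(revisions, key=lambda r: r.timestamp)
        summary["last_modified"] = last.timestamp.isoformat()
        summary["last_user_type"] = last.user_type
    return summary
```

For an item with 15 bot edits and 2 anonymous edits, `edit_counts` comes out as the `{ anonym_user=2, new_user=0, experienced_user=0, bot=15 }` example from the list.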

I know I am not a typical user ... and my problems are not a priority yet.

imho:

Integrating Wikidata IDs into other databases has already started (OSM,
Natural Earth, Who's On First, ...), and it needs some guidelines/support
for these cases before it is too late. Probably the current practice (OSM,
Natural Earth, Who's On First, ...) is not optimal.
A few months ago, I learned an extremely painful lesson:
https://phabricator.wikimedia.org/T202676#4533486
quote>>>

- "Q" does not mean "wikidata.org". It means "item" and is used by all
  Wikibase installations so far.
- "Retroactively "reserving" the letter "Q" to be exclusively used by
  wikidata.org can't work. It was never meant to be like this, and there is
  no mechanism for this."

- "Q" only means "wikidata.org" to users who know about wikidata.org. These
  users should not have a problem understanding that the moment an OSM
  Wikibase installation exists, "osm:Q1" refers to this installation.

<<<<quote

So now I am totally confused.

Probably my current practice is a "bad practice"? :(
Should the "Natural Earth" Wikidata integrations add a "wd:" prefix
everywhere? But maybe it is too late to change:
+-------------+---------------+-----------+
|    name     |   wikidataid  | iata_code |
+-------------+---------------+-----------+
| Birsa Munda | wd:Q598231    | IXR       |
| Barnaul     | wd:Q1858312   | BAX       |
| Bareilly    | wd:Q2788745   |           |
+-------------+---------------+-----------+
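A namespace prefix makes the ambiguity explicit: a bare `Q1` cannot be resolved, while `wd:Q1` and a hypothetical `osm:Q1` can. Here is a minimal Python sketch of reading such prefixed IDs. `wd:` expands to the Wikidata entity URI base `http://www.wikidata.org/entity/` (the same expansion the Wikidata Query Service uses for `wd:`); the prefix table, the function names and the choice to reject bare IDs unless a default namespace is given are assumptions of this sketch.

```python
import re
from typing import Dict, Optional, Tuple

# Known namespaces; only "wd" is real here, others would be added per database.
PREFIXES: Dict[str, str] = {"wd": "http://www.wikidata.org/entity/"}

# Optional "prefix:" followed by an entity ID such as Q598231 or P625.
ID_RE = re.compile(r"^(?:(?P<ns>[a-z][a-z0-9]*):)?(?P<local>[QPL][1-9][0-9]*)$")


def parse_id(text: str) -> Tuple[Optional[str], str]:
    """Split "wd:Q598231" into ("wd", "Q598231"); a bare ID gives (None, id)."""
    m = ID_RE.match(text.strip())
    if m is None:
        raise ValueError(f"not an entity ID: {text!r}")
    return m.group("ns"), m.group("local")


def to_uri(text: str, default_ns: Optional[str] = None) -> str:
    """Expand a prefixed ID to a full URI; bare IDs need an explicit default."""
    ns, local = parse_id(text)
    ns = ns or default_ns
    if ns is None:
        raise ValueError(f"ambiguous bare ID {text!r}: which Wikibase?")
    if ns not in PREFIXES:
        raise KeyError(f"unknown namespace {ns!r}")
    return PREFIXES[ns] + local
```

With a prefix on every value, the `wikidataid` column stays unambiguous even if it ends up in the same spreadsheet as IDs from other Wikibase installations; existing bare values can still be read by passing `default_ns="wd"`.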

This is my retrospective; thank you for reading.

best,
  Imre

On Thu, 29 Nov 2018 at 7:17, Yuri Astrakhan <[email protected]> wrote:

> On Thu, Nov 29, 2018 at 12:51 AM Federico Leva (Nemo) <[email protected]>
> wrote:
>
>> Yuri Astrakhan, 29/11/18 04:14:
>> > The "Q" prefix has a strong identity in itself.  Anyone will instantly
>> > say - yes, it's a Wikidata identifier
>>
>> But that's because most people only know one Wikibase installation, not
>> the other way around.
>>
>
> Of course! More specifically, at OSM that's the only Q-numbers people are
> aware of. All other ID systems do not have nearly the same level of
> recognition.  It would be silly to wait for government agencies to switch
> to the Q-numbers too, right?  Or to wait for 5-10 years until (and IF!) Q
> numbers become more common at other projects that are large enough to
> become well known, and use that potential future as a justification to not
> use a much more convenient system for the next 10 years.  The cost of that
> 10 years of "wait and see" is a significant user confusion.
> _______________________________________________
> Wikidata mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>

