[Wikidata-bugs] [Maniphest] [Commented On] T244341: Wikibase RDF dump: stop using blank nodes for encoding SomeValue and OWL constraints

2020-04-30 Thread Pfps
Pfps added a comment.


  I don't understand why it was considered necessary to make a breaking change 
to the RDF dump to improve WDQS performance when there is a solution that does 
not make a breaking change to the dump.

TASK DETAIL
  https://phabricator.wikimedia.org/T244341

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Pfps
Cc: Multichill, Pfps, Mmarx, Dipsacus_fullonum, Luitzen, VladimirAlexiev, 
Lea_Lacroix_WMDE, Jheald, Daniel_Mietchen, mkroetzsch, Denny, 
Lucas_Werkmeister_WMDE, Aklapper, dcausse, CBogen, darthmon_wmde, Viztor, 
94rain, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, Snowolf, aude, Tobias1984, Manybubbles, Shizhao, 
Mbch331, Rxy
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T244341: Wikibase RDF dump: stop using blank nodes for encoding SomeValue and OWL constraints

2020-04-30 Thread dcausse
dcausse added a comment.


  @Multichill the discussion seems to have stalled. Thanks to Peter, the pros 
and cons have now been well summarized. I also understand that part of the 
misunderstanding around this change came from a lack of clarity about why we 
require a breaking change like this. I hope that has been addressed in the 
linked discussion.
  Do you have additional comments to make here? Thanks!



[Wikidata-bugs] [Maniphest] [Commented On] T244341: Wikibase RDF dump: stop using blank nodes for encoding SomeValue and OWL constraints

2020-04-17 Thread Multichill
Multichill added a comment.


  This needs community consensus before moving forward.



[Wikidata-bugs] [Maniphest] [Commented On] T244341: Wikibase RDF dump: stop using blank nodes for encoding SomeValue and OWL constraints

2020-04-17 Thread Pfps
Pfps added a comment.


  I added some technical content on this issue to 
https://www.wikidata.org/wiki/Wikidata:Contact_the_development_team/Query_Service_and_search#Blank_node_deprecation_in_WDQS_&_Wikibase_RDF_model



[Wikidata-bugs] [Maniphest] [Commented On] T244341: Wikibase RDF dump: stop using blank nodes for encoding SomeValue and OWL constraints

2020-04-17 Thread dcausse
dcausse added a comment.


  In T244341#6064237, @Dipsacus_fullonum wrote:
  
  > Many queries use the optimizer hint `hint:Prior hint:rangeSafe true.` when 
e.g. comparing date or number values with constants in a filter, as suggested at 
https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/query_optimization#Fixed_values_and_ranges.
 Is there a risk that such queries will fail or give wrong results when 
`somevalue` values become IRIs, and thus the values will be of different types?
  
  I cannot tell for sure; anything that involves query optimization via hints 
is by nature extremely fragile. But I believe that these kinds of queries will 
remain as dangerous as they were before the switch.



[Wikidata-bugs] [Maniphest] [Commented On] T244341: Wikibase RDF dump: stop using blank nodes for encoding SomeValue and OWL constraints

2020-04-16 Thread Dipsacus_fullonum
Dipsacus_fullonum added a comment.


  Many queries use the optimizer hint `hint:Prior hint:rangeSafe true.` when 
e.g. comparing date or number values with constants in a filter, as suggested at 
https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/query_optimization#Fixed_values_and_ranges.
 Is there a risk that such queries will fail or give wrong results when 
`somevalue` values become IRIs, and thus the values will be of different types?



[Wikidata-bugs] [Maniphest] [Commented On] T244341: Wikibase RDF dump: stop using blank nodes for encoding SomeValue and OWL constraints

2020-04-16 Thread Dipsacus_fullonum
Dipsacus_fullonum added a comment.


  In T244341#6062871, @dcausse wrote:
  
  > What we will implement internally for the isSomeValue function won't be 
doing exactly `STRSTARTS( STR(?o), 'http://www.wikidata.org/prop/somevalue/' )` 
but will use Blazegraph's vocabulary and inlining facilities; I'm not sure if 
this answers your question, though.
  
  Yes, thank you. I was wondering if it is better (faster) to use `isLiteral` 
than `wikibase:isSomeValue` where possible.
  
  BTW, `isNumeric` can also be used to test if a value is numeric or a blank 
node, and `lang` can be used to test if a value is a monolingual text or a 
blank node. These should also still work.



[Wikidata-bugs] [Maniphest] [Commented On] T244341: Wikibase RDF dump: stop using blank nodes for encoding SomeValue and OWL constraints

2020-04-16 Thread dcausse
dcausse added a comment.


  In T244341#6062795, @Dipsacus_fullonum wrote:
  
  > Yes, `isLiteral` should still work for properties where the real values 
are literals. Without knowing the internal workings of Blazegraph I would guess 
that it is more efficient than `STRSTARTS( STR(?o), 
'http://www.wikidata.org/prop/somevalue/' )`. Maybe that could be used in some 
way?
  
  What we will implement internally for the isSomeValue function won't be doing 
exactly `STRSTARTS( STR(?o), 'http://www.wikidata.org/prop/somevalue/' )` but 
will use Blazegraph's vocabulary and inlining facilities; I'm not sure if this 
answers your question, though.



[Wikidata-bugs] [Maniphest] [Commented On] T244341: Wikibase RDF dump: stop using blank nodes for encoding SomeValue and OWL constraints

2020-04-16 Thread Dipsacus_fullonum
Dipsacus_fullonum added a comment.


  Yes,  `isLiteral` should still work for properties where the real values are 
literals. Without knowing the internal workings of Blazegraph I would guess 
that it is more efficient than `STRSTARTS( STR(?o), 
'http://www.wikidata.org/prop/somevalue/' ) `. Maybe that could be used in some 
way?



[Wikidata-bugs] [Maniphest] [Commented On] T244341: Wikibase RDF dump: stop using blank nodes for encoding SomeValue and OWL constraints

2020-04-16 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment.


  `isLiteral` should still work, right? Blank nodes aren’t literals, the 
replacement IRIs won’t be literals either, no change.
  
  `isIRI` and `datatype` is a good point, though – such queries will have to be 
updated.
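  
  The point above can be illustrated with a small sketch. It is only a toy 
model, not how Blazegraph evaluates these functions: the `Term` class, the 
`looks_real` helper and the placeholder suffix `abc` are all hypothetical names 
introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """Toy RDF term: kind is "bnode", "iri" or "literal"."""
    kind: str
    value: str


def looks_real(term: Term, real_kind: str) -> bool:
    """Mimic a query that treats 'has the expected kind' as 'is a real value'.

    real_kind="literal" stands for an isLiteral(?v) test on a literal-valued
    property; real_kind="iri" stands for an isIRI(?v) test on an item-valued
    property.
    """
    return term.kind == real_kind


# The same "unknown value" before and after the proposed change.
# The suffix "abc" is a made-up placeholder id.
SOME_BEFORE = Term("bnode", "b0")
SOME_AFTER = Term("iri", "http://www.wikidata.org/prop/somevalue/abc")

# isLiteral-style test: somevalue is rejected both before and after.
LITERAL_TEST = (looks_real(SOME_BEFORE, "literal"), looks_real(SOME_AFTER, "literal"))

# isIRI-style test: somevalue is rejected before but accepted after,
# so queries relying on it would need to be updated.
IRI_TEST = (looks_real(SOME_BEFORE, "iri"), looks_real(SOME_AFTER, "iri"))
```

  In this model the `isLiteral`-style check gives the same answer before and 
after the switch, while the `isIRI`-style check starts accepting the 
placeholder as if it were a real value.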



[Wikidata-bugs] [Maniphest] [Commented On] T244341: Wikibase RDF dump: stop using blank nodes for encoding SomeValue and OWL constraints

2020-04-16 Thread Dipsacus_fullonum
Dipsacus_fullonum added a comment.


  Here is an example where `isLiteral` is used to test that a value isn't 
somevalue: 
https://stackoverflow.com/questions/53102725/make-filtering-people-by-birthyear-and-deathyear-criteria-more-performative-in-s



[Wikidata-bugs] [Maniphest] [Commented On] T244341: Wikibase RDF dump: stop using blank nodes for encoding SomeValue and OWL constraints

2020-04-16 Thread Dipsacus_fullonum
Dipsacus_fullonum added a comment.


  You should be aware that the functions `isIRI` or `isLiteral` (depending on 
property type) and `datatype` can also be used, and probably **are** used, to 
test whether a value is somevalue or a real value.



[Wikidata-bugs] [Maniphest] [Commented On] T244341: Wikibase RDF dump: stop using blank nodes for encoding SomeValue and OWL constraints

2020-03-05 Thread dcausse
dcausse added a comment.


  @Luitzen thanks for bringing this up, but I haven't included this in the 
possible solutions because:
  
  - this feature does not seem to be fully integrated/finished/tested: while I 
was able to tell Blazegraph to store some specific bnode ids, I was never able 
to fully control what the id was. Sesame still seemed to generate its own id 
depending on the API being used (see 
https://jira.blazegraph.com/browse/BLZG-1915)
  - blank nodes are not allowed in DELETE/DELETE DATA SPARQL statements (even 
in Blazegraph with this option enabled), so I fear that low-level Blazegraph 
integration would be needed to take advantage of this option.
  - it's Blazegraph-specific



[Wikidata-bugs] [Maniphest] [Commented On] T244341: Wikibase RDF dump: stop using blank nodes for encoding SomeValue and OWL constraints

2020-03-04 Thread Luitzen
Luitzen added a comment.


  In order to make it possible to update the graph without querying, you could 
probably adapt/tailor the 
`com.bigdata.rdf.store.AbstractTripleStore.Options.STORE_BLANK_NODES` 
Blazegraph option.



[Wikidata-bugs] [Maniphest] [Commented On] T244341: Wikibase RDF dump: stop using blank nodes for encoding SomeValue and OWL constraints

2020-03-02 Thread VladimirAlexiev
VladimirAlexiev added a comment.


  I've done a lot of work with GLAM data that often includes "unknown" for 
creator.
  Getty ULAN has a whole slew of "unknowns" 
http://vocab.getty.edu/doc/#ULAN_Hierarchy_and_Classes (note: the counts are 
several years old; I imagine there are a few thousand more of those now):
  
  - 500355043 Unidentified Named People includes things like "the master of 
painting X"
  - 500125081 Unknown People by Culture includes things like "unknown Egyptian" 
(to be used in situations like "unknown creator, but Egyptian culture"). We've 
modeled those as gvp:UnknownPersonConcept and groups (schema:Organization) but 
users still think of them as "persons".
  - Further, there are things like "unknown but from the circle of Rembrandt" 
or "unknown but copy after Rembrandt" etc, about 20 varieties of them, see 
https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Visual_arts/Item_structure#Attribution_Qualifiers
 and 
https://www.wikidata.org/wiki/Wikidata:Property_proposal/Attribution_Qualifier
  
  Despite the special value "unknown", actual WD usage shows there are 62k 
creator|author using the item wd:Q4233718 Anonymous: https://w.wiki/JVr.
  
  I think the two special values are unfortunate because:
  
  - they introduce special patterns that someone writing a query needs to cater 
for. Eg I couldn't remember the Novalue syntax to compare the query above to 
one that uses Novalue
  - they don't reflect the real-life complexity needed in some cases
  - they can't be fitted easily in faceted search interfaces or semantic search 
UIs: one needs special coding for these special values.
  
  Coming from CIDOC CRM, I also used to worry about the ontological impurity of 
"makes two unrelated unknown values equal" and "find entities which share the 
same value". But in practical terms, people would like to be able to search for 
"anonymous" and "unknown Egyptian" and are smart enough to understand that even 
if "anonymous" may have the most items in a collection, that doesn't make him 
the most prolific creator of all times.
  
  Cheers!



[Wikidata-bugs] [Maniphest] [Commented On] T244341: Wikibase RDF dump: stop using blank nodes for encoding SomeValue and OWL constraints

2020-02-19 Thread Jheald
Jheald added a comment.


  @Lucas_Werkmeister_WMDE  The qualifier "stated as" (`p1932`) is currently 
used on 6.6 million statements.  I couldn't get a query to complete to count 
how many of those statements have an object that's a blank node.  My guess 
might be on the order of about 10,000 but that's just a number pulled out of 
the air, not based on anything.  Could be a *lot* more, if this mechanism has 
been used eg for scientific papers with unmatched editors, publishers, etc.
  
  (Maybe it will be easier to count under a new approach?)
  
  The number of cases of “we know the value but can’t represent it” may soon be 
much bigger on Commons though, where the pattern is being used as part of an 
idiom for creators that don't have a Wikidata item, but are known -- including 
creators known only by their wiki user-names.  The number of those cases -- eg 
self-taken pictures, self-made diagrams etc -- would probably go into the 
millions, once it's systematically applied.



[Wikidata-bugs] [Maniphest] [Commented On] T244341: Wikibase RDF dump: stop using blank nodes for encoding SomeValue and OWL constraints

2020-02-19 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment.


  Sure: 
https://www.wikidata.org/wiki/Q4115189#Q4115189$7d68afee-408d-1c1e-946b-43d8d37a17b5



[Wikidata-bugs] [Maniphest] [Commented On] T244341: Wikibase RDF dump: stop using blank nodes for encoding SomeValue and OWL constraints

2020-02-18 Thread dcausse
dcausse added a comment.


  In T244341#5893723, @Lucas_Werkmeister_WMDE wrote:
  
  > Well, I’d like to see what the IRIs for unknown value in qualifiers and 
references look like before we move ahead with this plan.
  
  Sure, I tried to add some but could not find my way around the UI. Could you 
try to update the sandbox item so that we can have a look?



[Wikidata-bugs] [Maniphest] [Commented On] T244341: Wikibase RDF dump: stop using blank nodes for encoding SomeValue and OWL constraints

2020-02-18 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment.


  Well, I’d like to see what the IRIs for unknown value in qualifiers and 
references look like before we move ahead with this plan.
  
  I’m also not yet sold on the rename from “unknown value” to “some value” in 
this more user-facing location. @Jheald, I’m aware that the snak type is 
//also// used to encode “we know the value but can’t represent it”, but do you 
have a source for how common this is?
  
  (Also, the snak type is `somevalue` as one word, so to me `isSomevalue` would 
make more sense than `isSomeValue`.)



[Wikidata-bugs] [Maniphest] [Commented On] T244341: Wikibase RDF dump: stop using blank nodes for encoding SomeValue and OWL constraints

2020-02-18 Thread dcausse
dcausse added a comment.


  To move this forward I propose the following plan:
  
  1. add a `wikibase:isSomeValue` custom function, configurable to work as a 
proxy for either `isBlank()` or `STRSTARTS( STR(?o), 
'http://www.wikidata.org/prop/somevalue/' )`, and announce it
  2. instead of changing the RDF representation generated by Wikibase, add a 
new option to the updater/munger to transform blank nodes (on the fly) into IRI 
placeholders
  3. set up a test instance of the query service using this proposal and ask 
for feedback
  4. if no major blockers are encountered, announce that the RDF representation 
is about to change
  5. start emitting deprecation warnings when seeing `isBlank`
  6. after a deprecation period, activate placeholder IRIs everywhere
  7. change the Wikibase RDF representation
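  
  Steps 1 and 2 of the plan can be sketched roughly as follows. This is a toy 
model under stated assumptions, not the actual Blazegraph or updater code: the 
classes `BNode`, `IRI` and `Literal`, the functions `is_some_value` and 
`to_placeholder`, and the choice of a hash as the IRI suffix are all 
hypothetical. Only the prefix `http://www.wikidata.org/prop/somevalue/` comes 
from the plan itself.

```python
from dataclasses import dataclass
import hashlib

# Prefix quoted in step 1 of the plan.
SOMEVALUE_PREFIX = "http://www.wikidata.org/prop/somevalue/"


@dataclass(frozen=True)
class BNode:
    label: str


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


def is_some_value(term, use_placeholder_iris: bool) -> bool:
    """Hypothetical stand-in for wikibase:isSomeValue (step 1).

    The flag models the "configurable" part: the same call works whether
    the store still holds blank nodes or already holds placeholder IRIs.
    """
    if use_placeholder_iris:
        # Roughly STRSTARTS(STR(?o), SOMEVALUE_PREFIX), restricted to IRIs.
        return isinstance(term, IRI) and term.value.startswith(SOMEVALUE_PREFIX)
    # Roughly isBlank(?o).
    return isinstance(term, BNode)


def to_placeholder(term):
    """Hypothetical munger step (step 2): rewrite a blank node as an IRI.

    Assumption: any stable suffix would do; a hash of the label is used here
    purely for illustration.
    """
    if isinstance(term, BNode):
        suffix = hashlib.sha1(term.label.encode("utf-8")).hexdigest()
        return IRI(SOMEVALUE_PREFIX + suffix)
    return term
```

  A query written against `is_some_value` keeps working across the switch, 
which is the reason for announcing the function before changing the data.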



[Wikidata-bugs] [Maniphest] [Commented On] T244341: Wikibase RDF dump: stop using blank nodes for encoding SomeValue and OWL constraints

2020-02-18 Thread dcausse
dcausse added a comment.


  In T244341#5890517, @Lucas_Werkmeister_WMDE wrote:
  
  >> I haven't checked but I hope that at most one blank node can be attached 
to the same subject/predicate, if not this makes the sync algorithm a bit more 
complex.
  >
  > At least currently, this is not the case. I added a second “partner: 
unknown value” statement to the sandbox item, and now 
`wd:Q4115189 wdt:P451 ?v` produces two blank nodes as result.
  
  Thanks for checking. This makes the diff process and the update query a bit 
more complex, as we now need to track the number of blank nodes attached to a 
particular subject/predicate. As for the update query, I believe this is still 
possible with:
  
    DELETE { ?s ?p ?o }
    WHERE {
      SELECT ?s ?p ?o {
        wd:Q4115189 wdt:P451 ?o .
        FILTER(isBlank(?o))
        ?s ?p ?o
      } LIMIT 1 # number of blank-node triples to delete
    }
  
  But overall this makes updating a triple with a blank node a completely 
separate operation that cannot be batched with operations like `INSERT DATA` or 
`DELETE DATA`.
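  
  The bookkeeping described here could look roughly like the sketch below. It 
is an illustration only, not the updater's code: `surplus_blank_nodes` and 
`delete_query` are hypothetical helpers, and it assumes the `LIMIT` in the 
subquery bounds how many blank-node triples get removed.

```python
from collections import Counter


def surplus_blank_nodes(stored: Counter, wanted: Counter) -> dict:
    """Per (subject, predicate), how many blank-node objects the store has in excess.

    Both counters map (subject, predicate) to a number of blank-node objects.
    """
    return {
        key: stored[key] - wanted[key]
        for key in stored
        if stored[key] > wanted[key]
    }


def delete_query(subject: str, predicate: str, count: int) -> str:
    """Build a DELETE like the one above, removing `count` blank-node triples."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return (
        "DELETE { ?s ?p ?o }\n"
        "WHERE {\n"
        "  SELECT ?s ?p ?o {\n"
        f"    {subject} {predicate} ?o .\n"
        "    FILTER(isBlank(?o))\n"
        "    ?s ?p ?o\n"
        f"  }} LIMIT {count}\n"
        "}"
    )
```

  Each surplus entry needs its own query, which illustrates why these updates 
cannot simply be folded into one `INSERT DATA` or `DELETE DATA` batch.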
  
  > Once we stop using blank nodes for OWL constraints, though, I believe you 
can at least assume that blank nodes are never the subject of a triple – would 
that help? (I feel like this ought to eliminate the need for a full isomorphism 
check from your quote.)
  
  Indeed, this helps, as does the fact that for SomeValue all blank nodes are 
unique: even for the same statement, the "SomeValue" blank nodes used as wdt 
and ps are currently different.
  
  From the point of view of a "simple diff operation" this is a fortunate 
situation, as it makes the update process simpler in the scenario where we 
decline this task and stick with blank nodes. If we decide to move forward with 
IRI placeholders, the objects of the wdt and ps predicates of the same 
statement will become identical for SomeValue.



[Wikidata-bugs] [Maniphest] [Commented On] T244341: Wikibase RDF dump: stop using blank nodes for encoding SomeValue and OWL constraints

2020-02-17 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment.


  > I haven't checked but I hope that at most one blank node can be attached to 
the same subject/predicate, if not this makes the sync algorithm a bit more 
complex.
  
  At least currently, this is not the case. I added a second “partner: unknown 
value” statement to the sandbox item, and now `wd:Q4115189 wdt:P451 ?v` 
produces two blank nodes as result.
  
  Once we stop using blank nodes for OWL constraints, though, I believe you can 
at least assume that blank nodes are never the subject of a triple – would that 
help? (I feel like this ought to eliminate the need for a full isomorphism 
check from your quote.)
