dcausse added a comment.
I suspect the problem is that the `claims` field is an array of complex types, so
it can potentially be huge, and generating its string representation with
`f"{claims}"` might cause excessive memory usage and is, I believe, a very slow
operation.
I would look into ways to avoid serializing it as a string, and instead either
iterate over the object representation (I suspect a Row?) to do your filtering,
or ask Hive to do a lateral view on the mainSnak (which I believe is what you're
looking for?):
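As a rough illustration of filtering on the object itself rather than on its string form, here is a minimal sketch. It assumes each claim exposes `.mainSnak.property` via attribute access (as a Spark Row would, if that is indeed what `claims` holds); the `has_property` helper and the `SimpleNamespace` stand-ins are hypothetical. Besides skipping the string rendering, comparing the field directly avoids false matches that a substring search on `f"{claims}"` could produce (for example `P31` matching inside `P310`).

```python
from types import SimpleNamespace
from typing import Any, Iterable


def has_property(claims: Iterable[Any], prop: str) -> bool:
    """Return True if any claim's mainSnak uses `prop`.

    Hypothetical helper: works on any objects exposing `.mainSnak.property`
    (assumed to match the Row layout), so the array is never turned into
    one big string.
    """
    # any() stops at the first match instead of rendering the whole array
    return any(claim.mainSnak.property == prop for claim in claims)


# Hypothetical stand-ins for two claim Rows; the real schema has more fields.
claims = [
    SimpleNamespace(mainSnak=SimpleNamespace(property="P31")),
    SimpleNamespace(mainSnak=SimpleNamespace(property="P279")),
]
print(has_property(claims, "P31"))
print(has_property(claims, "P17"))
```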
select id, claims_ex.mainSnak.property, claims_ex.mainSnak.dataValue
from wmf.wikidata_entity
lateral view explode(claims) claims_explode as claims_ex
where snapshot = '2023-07-24' AND claims_ex.mainSnak.property = 'P31'
limit 1;
OK
id         property  datavalue
Q38488724  P31       {"typ":"wikibase-entityid","value":"{\"entity-type\":\"item\",\"numeric-id\":13442814,\"id\":\"Q13442814\"}"}
Q37619467  P31       {"typ":"wikibase-entityid","value":"{\"entity-type\":\"item\",\"numeric-id\":13442814,\"id\":\"Q13442814\"}"}
Q38738598  P31       {"typ":"wikibase-entityid","value":"{\"entity-type\":\"item\",\"numeric-id\":13442814,\"id\":\"Q13442814\"}"}
Q37797268  P31       {"typ":"wikibase-entityid","value":"{\"entity-type\":\"item\",\"numeric-id\":13442814,\"id\":\"Q13442814\"}"}
Q38708632  P31       {"typ":"wikibase-entityid","value":"{\"entity-type\":\"item\",\"numeric-id\":13442814,\"id\":\"Q13442814\"}"}
Q37781259  P31       {"typ":"wikibase-entityid","value":"{\"entity-type\":\"item\",\"numeric-id\":13442814,\"id\":\"Q13442814\"}"}
Q39051969  P31       {"typ":"wikibase-entityid","value":"{\"entity-type\":\"item\",\"numeric-id\":13442814,\"id\":\"Q13442814\"}"}
Q37373175  P31       {"typ":"wikibase-entityid","value":"{\"entity-type\":\"item\",\"numeric-id\":13442814,\"id\":\"Q13442814\"}"}
Q38327391  P31       {"typ":"wikibase-entityid","value":"{\"entity-type\":\"item\",\"numeric-id\":5,\"id\":\"Q5\"}"}
Q37598817  P31       {"typ":"wikibase-entityid","value":"{\"entity-type\":\"item\",\"numeric-id\":13442814,\"id\":\"Q13442814\"}"}
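Then adding yet another filter on `claims_ex.mainSnak.dataValue.value =
'{"entity-type":"item","numeric-id":13442814,"id":"Q13442814"}'` //should//
work (the output above shows that `dataValue` is a struct whose `value` field
holds that JSON string).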
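To make the effect of the lateral view plus both filters concrete, here is a plain-Python sketch of the same logic: each entity row is expanded into one row per claim, then rows are kept only when the property is `P31` and the `value` string decodes to the target item. The nested-dict layout and the `explode_claims` / `instances_of` helpers are hypothetical stand-ins modeled on the output above, not the real table schema. Because `value` is itself a JSON string, the sketch decodes it rather than comparing raw strings, which would depend on key order and spacing.

```python
import json
from typing import Iterator


def explode_claims(entities: list[dict]) -> Iterator[tuple[str, dict]]:
    """Emulate `lateral view explode(claims)`: one (id, claim) pair per element."""
    for entity in entities:
        for claim in entity["claims"]:
            yield entity["id"], claim


def instances_of(entities: list[dict], target: str) -> list[str]:
    """Return ids of entities that have a P31 mainSnak pointing at `target`."""
    hits = []
    for entity_id, claim in explode_claims(entities):
        snak = claim["mainSnak"]
        if snak["property"] != "P31":  # the property filter from the query
            continue
        data = snak.get("dataValue")
        if not data:  # assumed: snaks without a value carry no dataValue
            continue
        # `value` is itself a JSON string, so decode it before comparing ids
        value = json.loads(data["value"])
        if isinstance(value, dict) and value.get("id") == target:
            hits.append(entity_id)
    return hits


# Hypothetical rows shaped after the query output above
entities = [
    {"id": "Q38488724", "claims": [{"mainSnak": {"property": "P31", "dataValue": {
        "typ": "wikibase-entityid",
        "value": '{"entity-type":"item","numeric-id":13442814,"id":"Q13442814"}'}}}]},
    {"id": "Q38327391", "claims": [{"mainSnak": {"property": "P31", "dataValue": {
        "typ": "wikibase-entityid",
        "value": '{"entity-type":"item","numeric-id":5,"id":"Q5"}'}}}]},
]
print(instances_of(entities, "Q13442814"))  # ['Q38488724']
```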
If you're looking for every place where Q13442814 could be used inside a claim
(not only in the mainSnak), then I'm not sure how to do that without unfolding
this deep data structure.
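One way that unfolding could look, sketched in plain Python over a hypothetical nested-dict representation of a single claim: the search descends into every dict and list, and decodes strings that look like JSON objects so ids embedded in `value` strings are found too. The `mentions_item` helper, the `qualifiers` layout and the `P642` property id are illustrative assumptions, not the actual table schema.

```python
import json
from typing import Any


def mentions_item(node: Any, target: str) -> bool:
    """Recursively search a nested claim structure for an item id.

    Hypothetical helper: descends into dicts and lists, and decodes
    strings that look like JSON objects (such as dataValue `value`).
    """
    if isinstance(node, dict):
        if node.get("id") == target:
            return True
        return any(mentions_item(v, target) for v in node.values())
    if isinstance(node, list):
        return any(mentions_item(v, target) for v in node)
    if isinstance(node, str) and node.startswith("{"):
        try:
            return mentions_item(json.loads(node), target)
        except json.JSONDecodeError:
            return False
    return False


# Hypothetical claim: mainSnak points at Q5, a qualifier points at Q13442814
claim = {
    "mainSnak": {"property": "P31", "dataValue": {
        "typ": "wikibase-entityid",
        "value": '{"entity-type":"item","numeric-id":5,"id":"Q5"}'}},
    # assumed qualifier layout, for illustration only
    "qualifiers": [{"property": "P642", "dataValue": {
        "typ": "wikibase-entityid",
        "value": '{"entity-type":"item","numeric-id":13442814,"id":"Q13442814"}'}}],
}
print(mentions_item(claim, "Q13442814"))              # True
print(mentions_item(claim["mainSnak"], "Q13442814"))  # False
```

This visits every node of every claim, so on the full table it would be noticeably more expensive than the mainSnak-only filter.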
TASK DETAIL
https://phabricator.wikimedia.org/T342111
To: AndrewTavis_WMDE, dcausse
Cc: mpopov, JAllemandou, Lydia_Pintscher, dcausse, Gehel, dr0ptp4kt,
AndrewTavis_WMDE, Aklapper, Manuel, Danny_Benjafield_WMDE, Astuthiodit_1,
karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi,
Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS,
Wikidata-bugs, aude, Mbch331
_______________________________________________
Wikidata-bugs mailing list -- [email protected]