dcausse added a comment.

  I suspect that because the `claims` field is an array of complex types, it 
can be huge, and generating its string representation with `f"{claims}"` is 
likely to use excessive memory and, I believe, to be very slow.
  I would look into ways to avoid serializing it as a string: either iterate 
over the object representation (I suspect a Row?) to do your filtering, or ask 
Hive to do a lateral view on the mainSnak (which I believe is what you're 
looking for?):
  
    select id, claims_ex.mainSnak.property, claims_ex.mainSnak.dataValue
    from wmf.wikidata_entity lateral view explode(claims) claims_explode as claims_ex
    where snapshot = '2023-07-24' AND claims_ex.mainSnak.property = 'P31'
    limit 1;
  
    OK
    id  property        datavalue
    Q38488724   P31     {"typ":"wikibase-entityid","value":"{\"entity-type\":\"item\",\"numeric-id\":13442814,\"id\":\"Q13442814\"}"}
    Q37619467   P31     {"typ":"wikibase-entityid","value":"{\"entity-type\":\"item\",\"numeric-id\":13442814,\"id\":\"Q13442814\"}"}
    Q38738598   P31     {"typ":"wikibase-entityid","value":"{\"entity-type\":\"item\",\"numeric-id\":13442814,\"id\":\"Q13442814\"}"}
    Q37797268   P31     {"typ":"wikibase-entityid","value":"{\"entity-type\":\"item\",\"numeric-id\":13442814,\"id\":\"Q13442814\"}"}
    Q38708632   P31     {"typ":"wikibase-entityid","value":"{\"entity-type\":\"item\",\"numeric-id\":13442814,\"id\":\"Q13442814\"}"}
    Q37781259   P31     {"typ":"wikibase-entityid","value":"{\"entity-type\":\"item\",\"numeric-id\":13442814,\"id\":\"Q13442814\"}"}
    Q39051969   P31     {"typ":"wikibase-entityid","value":"{\"entity-type\":\"item\",\"numeric-id\":13442814,\"id\":\"Q13442814\"}"}
    Q37373175   P31     {"typ":"wikibase-entityid","value":"{\"entity-type\":\"item\",\"numeric-id\":13442814,\"id\":\"Q13442814\"}"}
    Q38327391   P31     {"typ":"wikibase-entityid","value":"{\"entity-type\":\"item\",\"numeric-id\":5,\"id\":\"Q5\"}"}
    Q37598817   P31     {"typ":"wikibase-entityid","value":"{\"entity-type\":\"item\",\"numeric-id\":13442814,\"id\":\"Q13442814\"}"}
  
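  Then adding another filter on `claims_ex.mainSnak.dataValue.value = 
'{"entity-type":"item","numeric-id":13442814,"id":"Q13442814"}'` //should// 
work. Note the `.value`: in the output above, `dataValue` is a struct with 
`typ` and `value` fields, and the JSON string lives in `value`.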
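
  For the other route mentioned above (iterating over the object 
representation instead of serializing it), here is a minimal Python sketch. It 
assumes each claim exposes `mainSnak.property` and a `mainSnak.dataValue` 
struct with `typ` and `value` fields, as the Hive output above suggests, and 
that `value` holds a JSON string. The function name `has_main_snak` and the 
sample data are hypothetical; plain dicts stand in for Spark Rows, which also 
support `row["field"]` access.

```python
import json


def has_main_snak(claims, property_id, entity_id):
    """Return True if any claim's mainSnak uses property_id and points at entity_id."""
    for claim in claims or []:
        snak = claim["mainSnak"]
        if snak["property"] != property_id:
            continue
        data_value = snak["dataValue"]
        # Assumption: snaks without a value carry no dataValue struct.
        if data_value is None or data_value["typ"] != "wikibase-entityid":
            continue
        # Assumption: `value` is a JSON string, as in the query output.
        if json.loads(data_value["value"]).get("id") == entity_id:
            return True
    return False


# Hypothetical sample mirroring the shape of the Hive output.
sample_claims = [
    {
        "mainSnak": {
            "property": "P31",
            "dataValue": {
                "typ": "wikibase-entityid",
                "value": '{"entity-type":"item","numeric-id":13442814,"id":"Q13442814"}',
            },
        }
    }
]

print(has_main_snak(sample_claims, "P31", "Q13442814"))
```

  Comparing the parsed `id` rather than the raw JSON string avoids depending on 
key order or whitespace inside `value`.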
  
  If you're looking for all possible places where Q13442814 could be used 
inside a claim (not only as the mainSnak), then I'm not sure how to do that 
without unfolding this deep data structure.
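
  One way to unfold the structure outside Hive is a generic recursive walk. 
This is only a sketch under assumptions: the function name `mentions_entity` 
is hypothetical, the `qualifiers` field in the sample is illustrative rather 
than taken from the actual schema, and embedded JSON strings are parsed on the 
fly. It does not avoid visiting the whole claim, so it may still be slow on 
large entities.

```python
import json


def mentions_entity(node, entity_id):
    """Return True if entity_id appears as an exact string anywhere inside node."""
    if isinstance(node, dict):
        return any(mentions_entity(v, entity_id) for v in node.values())
    if isinstance(node, (list, tuple)):
        # Spark Row objects are tuple subclasses, so they would land here too.
        return any(mentions_entity(v, entity_id) for v in node)
    if isinstance(node, str):
        if node == entity_id:
            return True
        # Strings that look like JSON (e.g. dataValue.value) are parsed and walked.
        if node.startswith(("{", "[")):
            try:
                return mentions_entity(json.loads(node), entity_id)
            except ValueError:
                return False
    return False


# Hypothetical claim: the mainSnak points at Q5, a qualifier points at Q13442814.
sample_claim = {
    "mainSnak": {
        "property": "P31",
        "dataValue": {
            "typ": "wikibase-entityid",
            "value": '{"entity-type":"item","numeric-id":5,"id":"Q5"}',
        },
    },
    "qualifiers": [
        {
            "property": "P642",
            "dataValue": {
                "typ": "wikibase-entityid",
                "value": '{"entity-type":"item","numeric-id":13442814,"id":"Q13442814"}',
            },
        }
    ],
}

print(mentions_entity(sample_claim, "Q13442814"))
```

  The match is on exact string equality, so a shorter ID such as `Q1` does not 
match `Q13442814`.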

TASK DETAIL
  https://phabricator.wikimedia.org/T342111

To: AndrewTavis_WMDE, dcausse
Cc: mpopov, JAllemandou, Lydia_Pintscher, dcausse, Gehel, dr0ptp4kt, 
AndrewTavis_WMDE, Aklapper, Manuel, Danny_Benjafield_WMDE, Astuthiodit_1, 
karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, 
Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, 
Wikidata-bugs, aude, Mbch331