AndrewTavis_WMDE added a comment.
I'll check these with @Manuel later today, but here are the metrics I'm getting from the `20230717` snapshot of `discovery.wikibase_rdf` (note that I don't have access to later snapshots because of the permission restrictions documented in T342416 <https://phabricator.wikimedia.org/T342416>):

    get_num_str_with_commas(total_triples)
    # 15,043,483,216

    total_sa_triples = total_sa_direct_triples + total_sa_val_triples + total_sa_ref_triples
    # 7,188,746,257 + 200,337 + 332,476,964
    get_num_str_with_commas(total_sa_triples)
    # 7,521,423,558

    percent_sa_triples = round(total_sa_triples / total_triples * 100, 4)
    percent_sa_triples
    # 49.9979

    total_only_sa_triples = total_sa_direct_triples + total_only_sa_val_triples + total_only_sa_ref_triples
    # 7,188,746,257 + 13,651 + 332,466,067
    get_num_str_with_commas(total_only_sa_triples)
    # 7,521,225,975

    percent_only_sa_triples = round(total_only_sa_triples / total_triples * 100, 4)
    percent_only_sa_triples
    # 49.9966

I ended up using PySpark so that I could follow @dcausse's example as closely as I could :) Should I upload the finished notebook to people.wikimedia.org?

TASK DETAIL
  https://phabricator.wikimedia.org/T342111

To: AndrewTavis_WMDE
Cc: mpopov, JAllemandou, Lydia_Pintscher, dcausse, Gehel, dr0ptp4kt, AndrewTavis_WMDE, Aklapper, Manuel, Danny_Benjafield_WMDE, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
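
The totals and percentages quoted in the comment can be re-derived from the component counts alone, without access to the Spark table. The following is a minimal sketch in plain Python, not the notebook's actual code: the counts are copied from the comment, `get_num_str_with_commas` is an assumed stand-in for the notebook helper of the same name (its real implementation is not shown), and `percent_of_total` is a hypothetical helper introduced here for illustration.

```python
# Counts copied from the comment (20230717 data from discovery.wikibase_rdf).
total_triples = 15_043_483_216
total_sa_direct_triples = 7_188_746_257
total_sa_val_triples = 200_337
total_sa_ref_triples = 332_476_964
total_only_sa_val_triples = 13_651
total_only_sa_ref_triples = 332_466_067


def get_num_str_with_commas(n: int) -> str:
    """Assumed stand-in for the notebook helper: format with thousands separators."""
    return f"{n:,}"


def percent_of_total(part: int, total: int) -> float:
    """Hypothetical helper: share of `total` as a percentage, rounded to 4 places."""
    return round(part / total * 100, 4)


# Sum the direct, value and reference components, as in the comment.
total_sa_triples = (
    total_sa_direct_triples + total_sa_val_triples + total_sa_ref_triples
)
total_only_sa_triples = (
    total_sa_direct_triples + total_only_sa_val_triples + total_only_sa_ref_triples
)

percent_sa_triples = percent_of_total(total_sa_triples, total_triples)
percent_only_sa_triples = percent_of_total(total_only_sa_triples, total_triples)

print(get_num_str_with_commas(total_sa_triples), percent_sa_triples)
print(get_num_str_with_commas(total_only_sa_triples), percent_only_sa_triples)
```

Running this reproduces the sums and percentages reported in the comment, which suggests that the quoted component counts and totals are at least internally consistent.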
