Michaelcochez added a comment.
An update on the current status, mainly regarding the index file.

First, I made a mistake in my earlier response: the file is much smaller than I wrote. The binary version is currently around 75 MB (not 1.5 GB).

Progress:

- We implemented serialization using protocol buffers. Initial experiments seem promising. Storing and loading appear to be only slightly slower than with the native format. The on-disk size grew from 75 MB to 250 MB. However, with bzip2 compression the protocol buffer version can be squeezed into 51 MB, so that seems to go well.
- While rewriting the serialization code, I noticed that it was hard to maintain in a separate project. Hence, I integrated a minimal version of the index creation into the codebase.
- The previous index creation used the RDF dump as its data source. The new version uses the JSON dump. That has several benefits, mainly with regard to the preprocessing steps that were needed (or rather, avoiding them).
- The Go version has been bumped from 1.17 to 1.18.

These changes are not merged into main yet. Development is ongoing in https://github.com/martaannaj/RecommenderServer/tree/protobuffer_serialization

The following still needs to be done before merging:

- test coverage for the new serialization format
- checking whether gokart can somehow be used with the latest Go. It does not support the new generics capabilities of Go, but most of its functionality is covered by the other checking tools we use.
TASK DETAIL https://phabricator.wikimedia.org/T301471
