Hi all,

I am working with Market Exchange Format (MEX). A quick explanation: it is a method to save high-dimensional sparse matrix values into smaller files. It includes 3 files:
- row names and indexes file: (name_r1,1) (name_r2,2)
- column names and indexes file: (name_c1,1) (name_c2,2)
- values file: coordinates (indexes) of rows and columns and the value in the matrix, e.g. 1,2,5 (the value in row 1, col 2 is 5)

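To make the three-file layout concrete, here is a minimal sketch in plain Python. It assumes each names file holds one `name,index` pair per line and the values file holds one `row,col,value` triplet per line, as in the examples above. The helper names `load_names` and `label_values` are made up for illustration, and the in-memory lists stand in for the real files.

```python
import csv


def load_names(lines):
    """Build an index -> name lookup from lines such as 'name_r1,1'."""
    return {int(index): name for name, index in csv.reader(lines)}


def label_values(value_lines, row_names, col_names):
    """Yield (row_name, col_name, value) for each 'row,col,value' line."""
    for row, col, value in csv.reader(value_lines):
        yield row_names[int(row)], col_names[int(col)], float(value)


# Tiny in-memory stand-ins for the three files (assumed layout).
rows = load_names(["name_r1,1", "name_r2,2"])
cols = load_names(["name_c1,1", "name_c2,2"])
entries = list(label_values(["1,2,5"], rows, cols))
print(entries)
```

Only the two small names files need to fit in memory here; the values are consumed one line at a time.
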
The values file is large and zipped to save space. I am reading the values file line by line, keeping only the entries I am interested in based on the coordinate values, and saving them to a Python dataframe. Of course, it takes forever.

Is there a way to do it with Apache Beam? I know how to read the values from a zip file and use a DoFn to filter the relevant values. The sink part is not clear to me. What would be the best way to sink the data easily?

Please let me know what you think.

Best,
--
Eila, Founder & CEO
Check out ORT’s new blog <https://www.orielresearch.com/blog>
Linkedin <https://www.linkedin.com/company/oriel-research-therapeutics/>
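One possible shape for the read, filter, and sink steps, as a minimal sketch rather than a definitive answer. It assumes the values file is gzip-compressed with one `row,col,value` line per entry, and that the wanted row and column indexes fit in memory as sets. The function names, paths, and the CSV header are hypothetical. Apache Beam is imported inside `run` only so that the small helpers can be tried without Beam installed.

```python
def parse_triplet(line):
    """Yield (row, col, value) from a 'row,col,value' line; skip anything else."""
    parts = line.strip().split(",")
    if len(parts) != 3:
        return
    try:
        yield int(parts[0]), int(parts[1]), float(parts[2])
    except ValueError:
        return  # header or malformed line: emit nothing


def is_wanted(entry, wanted_rows, wanted_cols):
    """Keep an entry only if both of its coordinates are of interest."""
    row, col, _ = entry
    return row in wanted_rows and col in wanted_cols


def to_csv_line(entry):
    """Format an entry back into a 'row,col,value' text line."""
    row, col, value = entry
    return f"{row},{col},{value:g}"


def run(values_path, output_prefix, wanted_rows, wanted_cols):
    """Read the values file, filter it, and sink the result as sharded CSV."""
    import apache_beam as beam  # lazy import, see the note above

    with beam.Pipeline() as pipeline:
        (
            pipeline
            # Compression is auto-detected from the extension (e.g. .gz).
            | "ReadValues" >> beam.io.ReadFromText(values_path)
            | "Parse" >> beam.FlatMap(parse_triplet)
            | "KeepWanted" >> beam.Filter(is_wanted, wanted_rows, wanted_cols)
            | "ToCsv" >> beam.Map(to_csv_line)
            | "Write" >> beam.io.WriteToText(
                output_prefix,
                file_name_suffix=".csv",
                header="row,col,value",
            )
        )
```

A call might look like `run("values.csv.gz", "filtered/part", {1, 7}, {2})`, with both paths being placeholders. The filtered output is written as one or more CSV shards, which should be small enough to load into a dataframe afterwards. If the file is a true `.zip` archive rather than gzip, the read step would probably need to be replaced, since this sketch relies on the text source's built-in decompression.
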
