Hi all,

I am working with the Market Exchange Format (MEX).
A quick explanation:
it is a way to store the values of a high-dimensional sparse matrix in
smaller files.
It consists of 3 files:
- row names and indexes file: (name_r1,1) (name_r2,2)
- column names and indexes file: (name_c1,1) (name_c2,2)
- values file: the coordinates (row and column indexes) and the value of
each stored entry in the matrix, e.g.
1,2,5 (the value in row 1, col 2 is 5)
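
To make the layout concrete, here is a minimal sketch that loads three tiny
in-memory stand-ins for the files. It assumes comma-separated lines exactly
as in the examples above and integer values; the sample contents and the
helper names (read_names, read_values) are made up for illustration, and
real MEX files may use a different delimiter or carry header lines.

```python
import csv
import io

# Made-up sample contents for the three files (hypothetical names and values).
ROWS_TXT = "name_r1,1\nname_r2,2\n"
COLS_TXT = "name_c1,1\nname_c2,2\n"
VALUES_TXT = "1,2,5\n2,1,7\n"


def read_names(handle):
    """Map each 1-based index to its name, from lines like 'name_r1,1'."""
    return {int(index): name for name, index in csv.reader(handle)}


def read_values(handle):
    """Map each (row, col) coordinate to its value, from lines like '1,2,5'."""
    return {(int(r), int(c)): int(v) for r, c, v in csv.reader(handle)}


row_names = read_names(io.StringIO(ROWS_TXT))
col_names = read_names(io.StringIO(COLS_TXT))
values = read_values(io.StringIO(VALUES_TXT))

# Only the listed entries are stored; every other cell is implicitly zero.
for (r, c), v in sorted(values.items()):
    print(f"{row_names[r]} x {col_names[c]} = {v}")
```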

The values file is large and zipped to save space.
I am reading the values file line by line, keeping only the entries that
I am interested in based on their coordinate values, and saving them to a
Python dataframe. Of course, it takes forever.

Is there a way to do this with Apache Beam?
I know how to read the values from a zip file and use a DoFn to filter the
relevant values. The sink part is not clear to me. What would be the best
way to write the filtered data to a sink easily?
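
One possible shape for such a pipeline, offered as a sketch rather than a
recommendation: read the compressed text, parse and filter it, and write
the survivors with the built-in text sink. It assumes a gzip-compressed
input (.gz) with comma-separated integers; the helper names, the wanted
index sets, and the paths passed to run() are hypothetical. As far as I
know, ReadFromText does not read .zip archives, so a zip file would need to
be recompressed or read another way.

```python
WANTED_ROWS = {1, 3}  # hypothetical row indexes of interest
WANTED_COLS = {2}     # hypothetical column indexes of interest


def parse_line(line):
    """Turn a line such as '1,2,5' into (row, col, value)."""
    row, col, value = line.strip().split(",")
    return int(row), int(col), int(value)


def is_wanted(record, rows, cols):
    """Keep only records whose coordinates are in the wanted sets."""
    row, col, _ = record
    return row in rows and col in cols


def to_csv(record):
    """Format a record as one CSV line for the text sink."""
    return ",".join(str(field) for field in record)


def run(input_pattern, output_prefix):
    # Imported here so the helpers above can be tested without Beam installed.
    import apache_beam as beam

    with beam.Pipeline() as pipeline:
        (
            pipeline
            # By default the compression is inferred from the file extension.
            | "Read" >> beam.io.ReadFromText(input_pattern)
            | "Parse" >> beam.Map(parse_line)
            # Extra arguments to Filter are passed on to is_wanted.
            | "Filter" >> beam.Filter(is_wanted, WANTED_ROWS, WANTED_COLS)
            | "Format" >> beam.Map(to_csv)
            # Produces sharded files such as <prefix>-00000-of-00001.csv.
            | "Write" >> beam.io.WriteToText(
                output_prefix,
                file_name_suffix=".csv",
                header="row,col,value",
            )
        )
```

A call such as run("values.csv.gz", "filtered/values") would then leave
plain CSV shards that are small enough to load into a dataframe afterwards.
Writing text is only one option; the filtered records could equally go to
another Beam sink if the downstream step prefers a different format.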

Please let me know what you think.

Best,
-- 

Eila, Founder & CEO
