Yeah, I figure this is a browser JavaScript limitation: anything with access to native zip libraries on the machine should be able to implement this fairly cheaply. I'm surprised that browsers don't yet expose C++ zip/unzip APIs to JavaScript. In my recent investigations, jszip, pako, etc. all fall over when unzipping more than 500 MB (and are slow).
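As a rough illustration of why this is cheap outside the browser, here is a minimal Python sketch that inflates a compressed stream chunk by chunk using the standard library's native zlib binding. It rests on a few assumptions: zlib is only a stand-in here (Arrow's own buffer compression uses LZ4 frame or ZSTD, which are not assumed to be available in the standard library), and `decompress_stream`, the chunk size, and the sample payload are hypothetical names and values chosen for the example.

```python
import io
import zlib


def decompress_stream(src, dst, chunk_size=64 * 1024):
    """Inflate a zlib stream from src into dst, reading one chunk at a time.

    Returns the number of decompressed bytes written to dst.
    """
    inflater = zlib.decompressobj()
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        data = inflater.decompress(chunk)
        dst.write(data)
        total += len(data)
    # Write whatever the decompressor is still holding once input is exhausted.
    tail = inflater.flush()
    dst.write(tail)
    return total + len(tail)


# Hypothetical payload standing in for a large file.
payload = b"arrow record batch bytes " * 100_000
compressed = zlib.compress(payload)

out = io.BytesIO()
written = decompress_stream(io.BytesIO(compressed), out)
```

The compressed input is read in fixed-size chunks rather than loaded in one go, which is the part the pure-JavaScript libraries seemed to struggle with on large files.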
On Fri, 18 Dec 2020 at 04:26, Jacob Quinn <[email protected]> wrote:

>> Today, I think only C++ (and libraries that bind to it) have compression
>> implemented. I think a new PR for java was just opened in the last few
>> days.
>>
>
> Note the Julia implementation (Arrow.jl) supports compressing when writing
> and decompressing when reading. (Not that it really helps for the
> javascript side of things here, but just wanted to point it out as the
> Julia code is relatively new to the arrow project).
>
> On Thu, Dec 17, 2020 at 2:10 PM Andrew Clancy <[email protected]> wrote:
>
>> Yep - that's where I was expecting it!
>> These guys appear to implement decompression using pako:
>> https://github.com/usnistgov/jsfive - might be a good route to look
>> into.
>>
>>
>>
>> On Thu, 17 Dec 2020 at 19:19, Micah Kornfield <[email protected]>
>> wrote:
>>
>>> I don't know the support for the compression codecs in Javascript, but i
>>> don't think anyone has attempted to implement them.
>>>
>>> I couldn't find the compression feature listed on the library status
>>> docs [1].
>>>
>>> But we should add a line item for it. Today, I think only C++ (and
>>> libraries that bind to it) have compression implemented. I think a new PR
>>> for java was just opened in the last few days.
>>>
>>> [1] https://arrow.apache.org/docs/status.html
>>>
>>> On Thu, Dec 17, 2020 at 10:10 AM Andrew Clancy <[email protected]> wrote:
>>>
>>>> So, I figured out the issue here - I had to remove compression from the
>>>> pyarrow feather.write_feather(compression='uncompressed'). Is there
>>>> any way to read a compressed feather file in arrow js?
>>>> See the comment under the first answer here:
>>>> https://stackoverflow.com/questions/64629670/how-to-write-a-pandas-dataframe-to-arrow-file/64648955#64648955
>>>> I couldn't find anything in the arrow docs or notebooks on this - I'm
>>>> assuming that's related to javascript compression libraries being so
>>>> limited.
>>>>
>>>>
>>>> On Mon, 14 Dec 2020 at 21:32, Andrew Clancy <[email protected]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I have a simple feather file created via a pandas to_feather with a
>>>>> datetime64[ns] column, and cannot get timestamps in javascript
>>>>> [email protected]
>>>>>
>>>>> See this notebook:
>>>>> https://observablehq.com/@nite/apache-arrow-timestamp-investigation
>>>>>
>>>>> I'm guessing I'm missing something, has anyone got any suggestions, or
>>>>> decent examples of reading a file created in pandas? I've seen examples
>>>>> for [email protected] where dates are stored as an array of 2 ints.
>>>>>
>>>>> File was created with:
>>>>>
>>>>> import pandas as pd
>>>>> df = pd.read_parquet('sample.parquet')
>>>>> df.to_feather('sample-seconds.feather')
>>>>>
>>>>> Final Q: I'm assuming this is the best place for this question?
>>>>> Happy to post elsewhere if there's any other forums, or if this should be
>>>>> a JIRA ticket?
>>>>>
>>>>> Thanks!
>>>>> Andy
>>>>>
>>>>
