Possibly, although updating an embedding will likely change every value in
the dataset. That seems to call for file versioning and metadata about the
process that generated it.
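
A minimal sketch of what "file versioning plus metadata about the process" could look like, assuming a plain sidecar JSON file next to each version of the data. The function names, the record fields, and the "hypothetical-embedder" model name are made up for illustration; this is not how OpenML, Arrow, or Delta Lake do it.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_version_record(data_path: Path, version: int, generated_by: dict) -> Path:
    """Write a sidecar JSON record describing one version of a data file.

    `generated_by` is free-form metadata about the producing process
    (hypothetical fields: model name, seed, code revision, ...).
    """
    record = {
        "file": data_path.name,
        "version": version,
        "sha256": file_sha256(data_path),
        "generated_by": generated_by,
    }
    record_path = data_path.with_name(f"{data_path.name}.v{version}.json")
    record_path.write_text(json.dumps(record, indent=2, sort_keys=True))
    return record_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "embeddings.bin"

        # Version 1: original embedding.
        data.write_bytes(b"embedding produced by run 1")
        first = write_version_record(data, 1, {"model": "hypothetical-embedder", "seed": 0})

        # Version 2: the embedding was recomputed, so the whole file changed.
        data.write_bytes(b"embedding produced by run 2")
        second = write_version_record(data, 2, {"model": "hypothetical-embedder", "seed": 1})

        for record_path in (first, second):
            print(record_path.read_text())
```

Since a recomputed embedding changes essentially every value, a row-level diff says little; the hash only tells you that the file changed, and the `generated_by` record tells you why.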
> Thanks, you may mention me as a contributor to the blog post if you'd like!
>
Done ;).
Thanks again,
Joaquin
nd run git diff.
DeltaLake would help here, but again, it seems that it only 'tracks' Spark
operations done directly on the file?
Thanks!
Joaquin
PS. Nick, would you like to be mentioned as a contributor in the blog post?
Your comments helped a lot to improve it ;).
On Tue, Jun 30, 2020 at 6:4
r" will invoke the same code
> paths as the Arrow protocol file reader
>
> - Wes
>
> On Thu, Jun 20, 2019 at 4:12 PM Joaquin Vanschoren
> wrote:
> >
> > Thank you all for your very detailed answers! I also read in other threads
> > that the 1.0.0 release m
Thank you all for your very detailed answers! I also read in other threads
that the 1.0.0 release might be coming somewhere this fall? I'm really
looking forward to that.
@Wes: will there be any practical difference between Feather and Arrow
after the 1.0.0 release? Is it just an alias? What would
them: https://github.com/apache/arrow/blob/master/site/faq.md
>
> Neal
>
> On Wed, Jun 12, 2019 at 3:39 AM Joaquin Vanschoren <
> joaquin.vanscho...@gmail.com> wrote:
>
> > Dear all,
> >
> > Thanks for creating Arrow! I'm part of OpenML.org, an open sourc
Dear all,
Thanks for creating Arrow! I'm part of OpenML.org, an open source
initiative/platform for sharing machine learning datasets and models. We
are currently storing data in either ARFF or Parquet, but are looking into
whether e.g. Feather or a mix of Feather and Parquet could be the new