Announcing Hyperspace v0.3.0 - an indexing subsystem for Apache Spark™

Terry Kim Tue, 17 Nov 2020 16:46:29 -0800

Hi,

We are happy to announce that Hyperspace v0.3.0 - an indexing subsystem for
Apache Spark™ - has just been released
<https://github.com/microsoft/hyperspace/releases/tag/v0.3.0>!

Here are some of the highlights:

- Mutable dataset support: Hyperspace v0.3.0 supports mutable datasets,
where users can append or delete source data.
- Hybrid scan: Prior to v0.3.0, any change to the original dataset
content required a full refresh to make the index usable again, which
could be a costly operation. With Hybrid scan, the existing index can be
used along with newly appended and/or deleted source files, without an
explicit refresh operation. Please check out the Hybrid Scan doc
<https://microsoft.github.io/hyperspace/docs/ug-mutable-dataset/#hybrid-scan>
for more detail.
- Incremental refresh: v0.3.0 introduces an "incremental" mode to
refresh indexes. In this mode, index files are created only for the newly
appended source files; deleted source files are also handled by removing
them from the existing index files. Please check out the Incremental
Refresh doc
<https://microsoft.github.io/hyperspace/docs/ug-mutable-dataset/#refresh-index---incremental-mode>
for more detail.
- Optimize index: The number of index files can grow with incremental
refreshes, possibly degrading performance. The new "optimizeIndex" API
optimizes existing indexes by merging index files to create an optimal
number of files. Please check out the Optimize Index doc
<https://microsoft.github.io/hyperspace/docs/ug-optimize-index/> for
more detail.
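
To make the ideas behind the highlights above a bit more concrete, here is a toy model in plain Python. It is only an illustration and is not Hyperspace's implementation or API: an "index" is modeled as a list of index files, each of which is just the set of source file names it covers, and all function names and the max_sources_per_file parameter are hypothetical.

```python
# Toy model only: NOT Hyperspace's implementation or API.
# An index is a list of "index files"; each one is the set of source
# file names whose rows it covers. All names here are hypothetical.

def diff_source_files(indexed, current):
    """Return (appended, deleted) source files relative to the index."""
    return current - indexed, indexed - current


def hybrid_scan_plan(index_files, current):
    """Plan a query that reuses a stale index without refreshing it."""
    indexed = set().union(*index_files)
    appended, deleted = diff_source_files(indexed, current)
    return {
        "from_index": indexed - deleted,  # still served by the index
        "from_source": appended,          # read directly from new files
        "exclude": deleted,               # index rows to filter out
    }


def incremental_refresh(index_files, current):
    """Index only appended files; drop deleted files from existing ones."""
    indexed = set().union(*index_files)
    appended, deleted = diff_source_files(indexed, current)
    # Remove deleted source files from existing index files, and discard
    # any index file that becomes empty as a result.
    refreshed = [f - deleted for f in index_files]
    refreshed = [f for f in refreshed if f]
    # Create one new index file covering just the appended source files.
    if appended:
        refreshed.append(set(appended))
    return refreshed


def optimize(index_files, max_sources_per_file):
    """Greedily merge small index files to reduce the file count."""
    merged = []
    for f in sorted(index_files, key=len):
        if merged and len(merged[-1]) + len(f) <= max_sources_per_file:
            merged[-1] = merged[-1] | f
        else:
            merged.append(set(f))
    return merged


index = [{"a.parquet", "b.parquet"}]
current = {"b.parquet", "c.parquet"}  # a.parquet deleted, c.parquet appended

plan = hybrid_scan_plan(index, current)
refreshed = incremental_refresh(index, current)
compacted = optimize(refreshed, max_sources_per_file=4)
```

In this sketch, each incremental refresh adds at most one new index file, which is why the file count grows over time and a later merge step becomes useful. The greedy merge is an arbitrary choice for the example; the actual criteria used by "optimizeIndex" are described in the linked doc.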

We would like to thank the community for the great feedback and all those
who contributed to this release.

Thanks,
Terry Kim on behalf of the Hyperspace team
