Hi,

We are happy to announce that Hyperspace v0.3.0 - an indexing subsystem for
Apache Spark™ - has just been released
<https://github.com/microsoft/hyperspace/releases/tag/v0.3.0>!

Here are some of the highlights:

   - Mutable dataset support: Hyperspace v0.3.0 supports mutable datasets,
   where users can append or delete source data.
      - Hybrid scan: Prior to v0.3.0, any change to the original dataset
      content required a full refresh to make the index usable again, which
      could be a costly operation. With Hybrid scan, the existing index can
      be utilized along with newly appended and/or deleted source files,
      without an explicit refresh operation. Please check out the Hybrid
      Scan doc
      <https://microsoft.github.io/hyperspace/docs/ug-mutable-dataset/#hybrid-scan>
      for more detail.
      - Incremental refresh: v0.3.0 introduces an "incremental" mode to
      refresh indexes. In this mode, index files are created only for the
      newly appended source files; deleted source files are also handled by
      removing their entries from the existing index files. Please check out
      the Incremental Refresh doc
      <https://microsoft.github.io/hyperspace/docs/ug-mutable-dataset/#refresh-index---incremental-mode>
      for more detail.
   - Optimize index: The number of index files can increase due to
   incremental refreshes, possibly degrading performance. The new
   "optimizeIndex" API optimizes the existing indexes by merging index files
   to create an optimal number of files. Please check out the Optimize
   Index doc
   <https://microsoft.github.io/hyperspace/docs/ug-optimize-index/> for
   more detail.
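
To make the three features above a bit more concrete, here is a small toy
model in plain Python. It is only a sketch of the file-level bookkeeping
described in the highlights, not Hyperspace's implementation or API: the
names ToyIndex, hybrid_scan, incremental_refresh and optimize are
hypothetical, and an "index file" is modeled as a dict from source file name
to the rows indexed from it.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    # Hypothetical model: each index file maps source file name -> indexed rows.
    index_files: list = field(default_factory=list)

    def indexed_sources(self):
        return {src for f in self.index_files for src in f}


def diff_sources(index, source):
    """Return (appended, deleted) source files relative to the index."""
    indexed = index.indexed_sources()
    return set(source) - indexed, indexed - set(source)


def hybrid_scan(index, source):
    """Read without refreshing: index rows, minus deleted files, plus appended files."""
    appended, deleted = diff_sources(index, source)
    rows = [r for f in index.index_files
            for src, rs in f.items() if src not in deleted
            for r in rs]
    rows += [r for src in sorted(appended) for r in source[src]]
    return sorted(rows)


def incremental_refresh(index, source):
    """Index only appended files; drop entries that came from deleted files."""
    appended, deleted = diff_sources(index, source)
    kept = [{s: rs for s, rs in f.items() if s not in deleted}
            for f in index.index_files]
    index.index_files = [f for f in kept if f]
    if appended:
        # One new index file per refresh, so the file count grows over time.
        index.index_files.append({s: list(source[s]) for s in sorted(appended)})


def optimize(index):
    """Merge all index files into one (a stand-in for 'an optimal number')."""
    merged = {}
    for f in index.index_files:
        merged.update(f)
    index.index_files = [merged] if merged else []


# The index was built when the dataset held a.parquet and b.parquet.
source = {"a.parquet": [1, 2], "b.parquet": [3]}
index = ToyIndex([{"a.parquet": [1, 2], "b.parquet": [3]}])

# The dataset then changes: one file deleted, one appended.
del source["b.parquet"]
source["c.parquet"] = [4, 5]

rows_before_refresh = hybrid_scan(index, source)   # stale index still usable
incremental_refresh(index, source)
files_after_refresh = len(index.index_files)
optimize(index)
files_after_optimize = len(index.index_files)
rows_after_optimize = hybrid_scan(index, source)
```

In this toy run the hybrid scan returns the up-to-date rows from the stale
index, the incremental refresh leaves two index files, and optimize merges
them back into one. Please refer to the docs linked above for the actual
APIs and configuration.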

We would like to thank the community for the great feedback and all those
who contributed to this release.

Thanks,
Terry Kim on behalf of the Hyperspace team
