## PostgreSQL database anonymization and synthetic data generation tool

The v0.2.x releases (v0.2.0 through v0.2.5) mark major milestones, significantly
expanding Greenmask's functionality and turning it into a simple, extensible,
and reliable solution for database security, data anonymization, and everyday
operations. Our goal is to build a core system that serves as the foundation for
comprehensive dynamic staging environments and robust data security.

These updates introduce new features such as database subsetting, pgzip 
support, restoration in topological order, and refactored transformers, greatly 
enhancing Greenmask's flexibility to meet diverse business needs. They also 
include numerous fixes and improvements.

## Greenmask Overview

Greenmask is a powerful open-source utility designed for logical database
backups, anonymization, synthetic data generation, and restoration. It is
stateless and does not require any changes to your database schema. It is
designed to be fast, reliable, highly customizable, and backward-compatible with
existing PostgreSQL utilities.

It is perfect for:

* **Backup and Restoration**: Streamline daily tasks such as logical backups,
restoring tables after truncation, or replacing `pg_dump` and `pg_restore` with
ease.
* **Anonymization and Data Masking**: Simplify staging environment setup and 
analytical tasks by anonymizing and transforming backups, ensuring consistent, 
secure data for faster

Greenmask on [GitHub](https://github.com/GreenmaskIO/greenmask)

## Notable changes

* PostgreSQL 17 support - the ported library has been revised to support PostgreSQL 17.

* [Database Subset](https://docs.greenmask.io/v0.2.5/database_subset/) - a new
feature that lets you define a subset of the database and thereby scale down the
dump size ([#110](https://github.com/GreenmaskIO/greenmask/issues/110)). It is
versatile and especially useful for testing and development environments. It
supports:

    * References with [NULL values](https://docs.greenmask.io/v0.2.5/database_subset#references-with-null-values) - Greenmask generates a `LEFT JOIN` query for FK references that contain NULL values, so those rows are included in the subset.
    * [Virtual references](https://docs.greenmask.io/v0.2.5/database_subset#virtual-references) (virtual foreign keys) - define a logical FK in Greenmask that is used in the subset dependency graph. A virtual reference can be defined for a column or an expression, allowing you to extract the value from JSON and similar types.
    * [Circular references](https://docs.greenmask.io/v0.2.5/database_subset#circular-reference) - Greenmask automatically resolves circular dependencies in the subset by generating a recursive query. The query includes integrity checks that ensure the data gathered from circular dependencies is consistent.
    * Full documentation coverage, including [troubleshooting](https://docs.greenmask.io/v0.2.5/database_subset#troubleshooting) and [examples](https://docs.greenmask.io/v0.2.5/database_subset#example-dump-a-subset-of-the-database).
    * FKs and PKs that span more than one column (or expression).
    * **Resolution of multiple cycles in one strongly connected component (SCC)** - Greenmask generates a recursive query for the SCC whether it contains a single cycle or several, making the subset system universal for any database schema.
    * **Polymorphic relationships** - you can define a [virtual reference for a table with polymorphic references](https://docs.greenmask.io/v0.2.5/database_subset/#polymorphic-references) using the `polymorphic_exprs` attribute, and Greenmask will generate a subset for such tables.

* [Transformation conditions](https://docs.greenmask.io/v0.2.5/built_in_transformers/transformation_condition/) - execute a defined transformation only if a specified condition is met ([#133](https://github.com/GreenmaskIO/greenmask/pull/133)).
* [Transformation inheritance](https://docs.greenmask.io/v0.2.5/built_in_transformers/transformation_inheritance/) - transformation inheritance for partitioned tables and for tables linked by foreign keys. Define a transformation once and apply it to all related tables ([#229]).
* **pgzip** support for faster [compression](https://docs.greenmask.io/v0.2.5/commands/dump#pgzip-compression) and [decompression](https://docs.greenmask.io/v0.2.5/commands//restore#pgzip-decompression) - setting `--pgzip` can speed up the dump and restoration processes through parallel compression. In some tests, dump and restore operations ran up to 5x faster.
* [Restoration in topological order](https://docs.greenmask.io/v0.2.5/commands/restore/#restoration-in-topological-order) - this option ensures that dependent tables are not restored until the tables they depend on have been restored. This is useful when you want to be notified of errors as soon as possible without waiting for the entire table to be restored.
* [Insert format](https://docs.greenmask.io/v0.2.5/commands/restore#inserts-and-error-handling) restoration - for a more flexible restoration process, Greenmask now supports restoring data in the `INSERT` format. It generates the `INSERT` statements from the `COPY` records in the dump. You do not need to re-dump your data to use this feature; it is enabled in the `restore` command. New features related to the `INSERT` format:

    * Generate `INSERT` statements with the `ON CONFLICT DO NOTHING` clause if the `--on-conflict-do-nothing` flag is set.
    * **[Error exclusion list](https://docs.greenmask.io/v0.2.5/configuration/#restoration-error-exclusion)** in the config to skip certain errors and continue inserting subsequent rows from the dump.
    * Use case: **incremental dump and restoration** of logical data. For example, if you want to periodically insert data from another source into an existing database, this feature can be combined with database subsetting and transformations to bring the target database up to date.

* [Restore data batching](https://docs.greenmask.io/v0.2.5/commands/restore#restore-data-batching) ([#173](https://github.com/GreenmaskIO/greenmask/pull/174)) - by default, the COPY protocol returns an error only on transaction commit. To override this behavior, use the `--batch-size` flag to specify the number of rows to insert in a single batch during the COPY command. This is useful when you want more control over transaction size and commits.
* Introduced the `keep_null` parameter for the `RandomPerson` transformer ([#162](https://github.com/GreenmaskIO/greenmask/pull/162)).

* [Introduced dynamic parameters in the transformers](https://docs.greenmask.io/v0.2.5/built_in_transformers/dynamic_parameters/)
    * Most transformers now support dynamic parameters where applicable.
    * Dynamic parameters are strictly typed. If you need to cast values to another type, Greenmask provides templates and predefined cast functions accessible via `cast_to`. These functions cover frequent operations such as `UnixTimestampToDate` and `IntToBool`.
* The transformation logic has been significantly refactored, making 
transformers more customizable and flexible than before.
* [Introduced transformation engines](https://docs.greenmask.io/v0.2.5/built_in_transformers/transformation_engines)
    * `random` - generates transformer values using pseudo-random algorithms.
    * `hash` - generates transformer values using hash functions. Currently, it uses `sha3` hash functions, which are secure but relatively slow. In the stable release, there will be an option to choose between `sha3` and `SipHash`.

* [Introduced templating for static parameter values](https://docs.greenmask.io/v0.2.5/built_in_transformers/parameters_templating)

* [Dump retention management](https://docs.greenmask.io/v0.2.5/commands/delete) - introduced retention parameters ([#201](https://github.com/GreenmaskIO/greenmask/pull/201)) for the `delete` command, along with two new dump statuses: failed and in progress. A dump is considered failed if it lacks a "done" heartbeat or if its last heartbeat timestamp is older than 30 minutes. The `delete` command now supports the following retention parameters:
    * `--dry-run`: Runs the deletion in test mode with verbose output, without actually deleting anything.
    * `--before-date 2024-08-27T23:50:54+00:00`: Deletes dumps older than the specified date. The date must be provided in RFC3339Nano format, for example: `2021-01-01T00:00:00Z`.
    * `--retain-recent 10`: Retains the N most recent dumps, where N is specified by the user.
    * `--retain-for 1w2d3h4m5s6ms7us8ns`: Retains dumps for the specified duration. The format supports weeks (w), days (d), hours (h), minutes (m), seconds (s), milliseconds (ms), microseconds (us), and nanoseconds (ns).
    * `--prune-failed`: Prunes (removes) all dumps that have failed.
    * `--prune-unsafe`: Prunes dumps with "unknown-or-failed" statuses. This
option only works in conjunction with `--prune-failed`.
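
The speed-up from the `--pgzip` option described above comes from compressing several blocks of data at once instead of compressing the whole stream sequentially on one core. The Python sketch below illustrates only that general idea and is not how Greenmask or the pgzip library work internally: as a simplifying assumption, it compresses each fixed-size block as a separate gzip member in a thread pool and concatenates the members, which the gzip format (RFC 1952) permits and which `gzip.decompress` reads back as one stream. `BLOCK_SIZE` and `parallel_gzip` are hypothetical names.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor

# Hypothetical block size; real tools tune this value.
BLOCK_SIZE = 1 << 20  # 1 MiB


def parallel_gzip(data: bytes, workers: int = 4) -> bytes:
    """Compress `data` block by block in parallel.

    Each block becomes an independent gzip member; concatenated members
    form a valid multi-member gzip stream (RFC 1952).
    """
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    # zlib releases the GIL while compressing, so threads can use several cores.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so members are joined in the right order.
        members = list(pool.map(gzip.compress, blocks))
    return b"".join(members)


if __name__ == "__main__":
    payload = b"id\tname\n" + b"1\tAlice\n" * 500_000
    compressed = parallel_gzip(payload)
    # gzip.decompress reads all members back as a single stream.
    assert gzip.decompress(compressed) == payload
    print(f"{len(payload)} bytes -> {len(compressed)} bytes")
```

Splitting the input into independent members costs a little compression ratio, because every block starts with an empty dictionary; that is the usual price of this simple form of parallelism.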
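
Restoration in topological order, as described above, means ordering tables so that every table referenced by a foreign key is restored before the tables that reference it. One standard way to compute such an order is Kahn's algorithm, sketched below in Python. This is not Greenmask's code: the `restore_order` function and the table names are hypothetical, and the sketch simply raises an error when the foreign-key graph contains a cycle (including a self-referencing table), since no topological order exists in that case.

```python
from collections import deque


def restore_order(deps: dict[str, set[str]]) -> list[str]:
    """Order tables so every table comes after the tables it references.

    `deps` maps a table to the set of tables its foreign keys point to.
    Uses Kahn's algorithm; raises ValueError if the graph has a cycle
    (a self-referencing table also counts as a cycle here).
    """
    tables = set(deps) | {ref for refs in deps.values() for ref in refs}
    # Number of referenced tables not yet placed in the order.
    pending = {t: len(deps.get(t, ())) for t in tables}
    # Reverse edges: which tables reference a given table.
    referenced_by: dict[str, list[str]] = {t: [] for t in tables}
    for table, refs in deps.items():
        for ref in refs:
            referenced_by[ref].append(table)

    # Sorting keeps the output deterministic for equal-priority tables.
    ready = deque(sorted(t for t, n in pending.items() if n == 0))
    order: list[str] = []
    while ready:
        table = ready.popleft()
        order.append(table)
        for child in sorted(referenced_by[table]):
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)

    if len(order) != len(tables):
        raise ValueError("foreign-key cycle: no topological order exists")
    return order


if __name__ == "__main__":
    deps = {
        "orders": {"customers", "products"},
        "order_items": {"orders", "products"},
        "customers": set(),
        "products": set(),
    }
    print(restore_order(deps))
    # ['customers', 'products', 'orders', 'order_items']
```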
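
The `INSERT` restoration mode described above builds statements from the `COPY` records already in the dump. As a rough illustration of that conversion (not Greenmask's implementation), the Python sketch below turns one line of PostgreSQL's `COPY` text format into an `INSERT` statement and optionally appends `ON CONFLICT DO NOTHING`. It assumes the default text format (tab delimiter, `\N` for NULL), decodes only the common backslash escapes, and assumes `standard_conforming_strings` is on; `copy_line_to_insert` and the `users` table are hypothetical. Production code would normally use bind parameters rather than building SQL literals by hand.

```python
from typing import Optional

# Backslash escapes decoded by this sketch. Octal (\123) and hex (\x1F)
# escapes from the COPY text format are NOT handled here.
_ESCAPES = {"b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t", "v": "\v"}


def decode_field(raw: str) -> Optional[str]:
    """Decode one COPY text-format field; a field that is exactly \\N is NULL."""
    if raw == "\\N":
        return None
    out = []
    i = 0
    while i < len(raw):
        if raw[i] == "\\" and i + 1 < len(raw):
            nxt = raw[i + 1]
            # Known letters map to control characters; any other escaped
            # character (including a second backslash) is taken literally.
            out.append(_ESCAPES.get(nxt, nxt))
            i += 2
        else:
            out.append(raw[i])
            i += 1
    return "".join(out)


def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: Optional[str]) -> str:
    """Render a value as an SQL string literal (standard_conforming_strings=on)."""
    if value is None:
        return "NULL"
    return "'" + value.replace("'", "''") + "'"


def copy_line_to_insert(table: str, columns: list[str], line: str,
                        on_conflict_do_nothing: bool = False,
                        schema: str = "public") -> str:
    """Build one INSERT statement from one COPY text-format line."""
    values = [decode_field(f) for f in line.rstrip("\n").split("\t")]
    if len(values) != len(columns):
        raise ValueError(f"expected {len(columns)} fields, got {len(values)}")
    stmt = "INSERT INTO {}.{} ({}) VALUES ({})".format(
        quote_ident(schema),
        quote_ident(table),
        ", ".join(quote_ident(c) for c in columns),
        ", ".join(quote_literal(v) for v in values),
    )
    if on_conflict_do_nothing:
        stmt += " ON CONFLICT DO NOTHING"
    return stmt + ";"


if __name__ == "__main__":
    row = "1\tO'Brien\t\\N\n"  # id, name, and a NULL note, as COPY writes them
    print(copy_line_to_insert("users", ["id", "name", "note"], row,
                              on_conflict_do_nothing=True))
    # INSERT INTO "public"."users" ("id", "name", "note") VALUES ('1', 'O''Brien', NULL) ON CONFLICT DO NOTHING;
```

Values are emitted as untyped string literals, which PostgreSQL coerces to the target column types on insert.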
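
The practical difference between the `random` and `hash` transformation engines described above is determinism: a hash-based transformer derives its output from the input value, so the same input always yields the same masked output, which can keep values consistent across tables and dumps. The Python sketch below shows that idea with SHA3-256 from `hashlib`. The salt, the `hash_to_range` function, and the mapping to an integer range are illustrative assumptions, not Greenmask's actual algorithm.

```python
import hashlib


def hash_to_range(value: str, low: int, high: int,
                  salt: bytes = b"example-salt") -> int:
    """Deterministically map `value` to an integer in [low, high].

    The same (salt, value) pair always yields the same result. The salt is a
    hypothetical secret: without one, outputs for small input domains could
    be reversed by hashing every candidate.
    """
    if low > high:
        raise ValueError("low must not exceed high")
    digest = hashlib.sha3_256(salt + value.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer and fold it into the range;
    # the modulo bias is negligible when the range is much smaller than 2**64.
    n = int.from_bytes(digest[:8], "big")
    return low + n % (high - low + 1)


if __name__ == "__main__":
    # Same input, same output, so a masked value stays consistent across runs.
    first = hash_to_range("alice@example.com", 18, 90)
    second = hash_to_range("alice@example.com", 18, 90)
    print(first, first == second)
```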
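
The `--retain-for` value described above combines integer amounts with unit suffixes. As an illustration of how such a string can be interpreted (a hypothetical parser, not Greenmask's code), the Python sketch below converts it to nanoseconds. It assumes each component is a non-negative integer followed by one of the listed units, and it tries `ms`, `us`, and `ns` before the single-letter units so that `5ms` is not misread as five minutes.

```python
import re
from datetime import timedelta

# Nanoseconds per unit, following the units listed for --retain-for.
_UNIT_NS = {
    "w": 7 * 24 * 3600 * 10**9,
    "d": 24 * 3600 * 10**9,
    "h": 3600 * 10**9,
    "m": 60 * 10**9,
    "s": 10**9,
    "ms": 10**6,
    "us": 10**3,
    "ns": 1,
}
# Two-letter units are listed first so "5ms" is read as milliseconds,
# not as 5 minutes followed by a stray "s".
_UNITS = "ms|us|ns|w|d|h|m|s"
_COMPONENT = re.compile(rf"(\d+)({_UNITS})")
_WHOLE = re.compile(rf"(?:\d+(?:{_UNITS}))+")


def parse_retention(spec: str) -> int:
    """Convert a duration string such as '1w2d3h' to nanoseconds."""
    if not _WHOLE.fullmatch(spec):
        raise ValueError(f"invalid duration: {spec!r}")
    return sum(int(n) * _UNIT_NS[unit] for n, unit in _COMPONENT.findall(spec))


if __name__ == "__main__":
    print(parse_retention("1w2d3h4m5s6ms7us8ns"))  # 788645006007008
    # timedelta has microsecond resolution, so drop the sub-microsecond part.
    print(timedelta(microseconds=parse_retention("1w2d") // 1000))  # 9 days, 0:00:00
```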

#### Releases

* [v0.2.0](https://github.com/GreenmaskIO/greenmask/releases/tag/v0.2.0)
* [v0.2.1](https://github.com/GreenmaskIO/greenmask/releases/tag/v0.2.1)
* [v0.2.2](https://github.com/GreenmaskIO/greenmask/releases/tag/v0.2.2)
* [v0.2.3](https://github.com/GreenmaskIO/greenmask/releases/tag/v0.2.3)
* [v0.2.4](https://github.com/GreenmaskIO/greenmask/releases/tag/v0.2.4)
* [v0.2.5](https://github.com/GreenmaskIO/greenmask/releases/tag/v0.2.5)

## Links

Feel free to reach out to us if you have any questions or need assistance:

* [Greenmask repository](https://github.com/GreenmaskIO/greenmask)
* [Documentation](https://docs.greenmask.io/)
* [Greenmask Roadmap](https://github.com/orgs/GreenmaskIO/projects/6)
* [Discord](https://discord.gg/tAJegUKSTB)
* [Telegram](https://t.me/greenmask_community)
* [Email](mailto:supp...@greenmask.io)
* [Twitter](https://twitter.com/GreenmaskIO)
