Hey David,

You might consider using the source-available tool Deephaven (
https://deephaven.io). It has a rich set of features: all sorts of joins (
https://deephaven.io/core/docs/conceptual/choose-joins/), the ability to
create logical columns which are only materialized when fetched (see
https://deephaven.io/core/docs/reference/table-operations/select/update-view/),
and much more. The code is on GitHub (
https://github.com/deephaven/deephaven-core), and Deephaven has an optional
rich web-based user interface and uses Arrow Flight for transport. There's a
Slack community, which gets attention from the dev team (
https://deephaven.io/slack). You can even try out a live demo with some
pre-made queries to get a feel for whether or not the tool is what you're
looking for (look for the green `Try Demo` button on the right side of the
website's header).

Nate

On Wed, Oct 25, 2023 at 12:00 PM Lee, David (PAG) <[email protected]>
wrote:

> Here's my ideal use case scenario:
>
> Create multiple datasets mapped to different file directories.
> Create more datasets by logically generating additional computed columns
> using expressions
> Create joined dataset by joining datasets
> Finally run a Scanner on the joined dataset to start materialization.
>
> Pyarrow.Dataset.filter supports adding a @filter, but it doesn't have a
> @columns argument.
> Pyarrow.Dataset.Scanner supports both @filter and @columns, but I don't
> want to create interim copies of data in memory.
>
> Simplified example:
> Given a table that captures locale values like 'en-US', 'en-GB', 'fr-CA',
> etc.,
> I want to use a pyarrow logical expression to split this into language and
> country so I end up with:
> Language: 'en', 'en', 'fr', ..
> Country: 'US', 'GB', 'CA', ..
> I then want to join Country to a Country dataset which contains Country
> and Country_Name
> Language: 'en', 'en', 'fr', ..
> Country: 'US', 'GB', 'CA', ..
> Country_Name: 'USA', 'Great Britain', 'Canada', ..
>
> Basically can a dataset handle "logical" column projection to avoid
> physical materialization in memory?