Hey David,

You might consider using Deephaven (https://deephaven.io). It has a rich set of features: all sorts of joins (https://deephaven.io/core/docs/conceptual/choose-joins/), the ability to create logical columns that are only materialized when fetched (see https://deephaven.io/core/docs/reference/table-operations/select/update-view/), and much more. It's source-available (https://github.com/deephaven/deephaven-core), has an optional rich web-based user interface, and uses Arrow Flight for transport. There's a Slack community that gets attention from the dev team (https://deephaven.io/slack). You can even try a live demo with some pre-made queries to get a feel for whether the tool is what you're looking for (look for the green `Try Demo` button on the right side of the website's header).
Nate

On Wed, Oct 25, 2023 at 12:00 PM Lee, David (PAG) wrote:

> Here's my ideal use case scenario..
>
> Create multiple datasets mapped to different file directories.
> Create more datasets by logically generating additional computed columns
> using expressions
> Create joined dataset by joining datasets
> Finally run a Scanner on the joined dataset to start materialization..
>
> Pyarrow.Dataset.filter supports adding a @filter, but it doesn't have a
> @columns argument.
> Pyarrow.Dataset.Scanner supports both @filter and @columns, but I don't
> want to create interim copies of data in memory.
>
> Simplified example:
> Given a table that captures locale values like 'en-US', 'en-GB', 'fr-CA',
> etc..
> I want to use a pyarrow logical expression to split this into language and
> country so I end up with:
> Language: 'en', 'en', 'fr', ..
> Country: 'US', 'GB', 'CA', ..
> I then want to join Country to a Country dataset which contains Country
> and Country_Name
> Language: 'en', 'en', 'fr', ..
> Country: 'US', 'GB', 'CA', ..
> Country_Name: 'USA', 'Great Britain', 'Canada', ..
>
> Basically can a dataset handle "logical" column projection to avoid
> physical materialization in memory?
>
>
> This message may contain information that is confidential or privileged.
> If you are not the intended recipient, please advise the sender immediately
> and delete this message. See
> http://www.blackrock.com/corporate/compliance/email-disclaimers for
> further information. Please refer to
> http://www.blackrock.com/corporate/compliance/privacy-policy for more
> information about BlackRock’s Privacy Policy.
>
>
> For a list of BlackRock's office addresses worldwide, see
> http://www.blackrock.com/corporate/about-us/contacts-locations.
>
> © 2023 BlackRock, Inc. All rights reserved.
>
