If you're using open table formats, Delta Lake has a "generated columns" feature, which lets you define a column with a formula over other columns in the table.
https://docs.databricks.com/en/delta/generated-columns.html
https://delta.io/blog/2023-04-12-delta-lake-generated-columns/

Cheers,
Kevin

On Thu, Aug 29, 2024 at 2:04 PM Jacek Pliszka <[email protected]> wrote:

> Hi!
>
> Another option would be converting to an arrow-backed pandas table and
> using a dataframe query method. Other libraries like DuckDB most
> likely offer similar options.
>
> BR
>
> J
>
> On Thu, Aug 29, 2024 at 02:54, Felipe Oliveira Carvalho
> <[email protected]> wrote:
> >
> > You can build `compute::Expression` instances [1] and use them in
> > different contexts like scanning datasets [2] and producing Substrait
> > plans [3] that you can execute.
> >
> > But you have to write your own parser and define the scope and
> > semantics of the operations you would support.
> >
> > [1] https://github.com/apache/arrow/blob/main/cpp/src/arrow/compute/expression.h#L45
> > [2] https://github.com/apache/arrow/blob/main/cpp/examples/arrow/dataset_documentation_example.cc#L266
> > [3] https://github.com/apache/arrow/blob/main/cpp/src/arrow/engine/substrait/relation.h#L55
> >
> > --
> > Felipe
> >
> > On Wed, Aug 28, 2024 at 1:11 AM Surya Kiran Gullapalli
> > <[email protected]> wrote:
> >>
> >> Hello all,
> >> Let's say I've a table containing 3 columns 'A', 'B', and 'C'. Is it
> >> possible to create a 4th column 'D' using a formula (like (A+B)/C)?
> >>
> >> I know I can manually create them using compute functions, but is it
> >> possible to parse a formula like the above and compute the column on
> >> the fly at runtime?
> >>
> >> Any pointers are greatly appreciated.
> >>
> >> Thanks,
> >> Surya
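
For illustration, here is a rough sketch of the "write your own parser and define the scope of the operations" idea from the thread. It is not Arrow code. It borrows Python's standard `ast` module as the parser and uses plain Python lists as a stand-in for Arrow arrays. The names `evaluate_formula` and `_BINARY_OPS` are made up for this example. Only `+`, `-`, `*`, `/`, column names, and numeric literals are accepted, and everything else is rejected.

```python
import ast
import operator

# Whitelisted binary operators; any other syntax in the formula is rejected.
# (Hypothetical table for this sketch; with pyarrow one could presumably map
# these to pyarrow.compute.add / subtract / multiply / divide instead.)
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate_formula(formula, columns):
    """Evaluate an arithmetic formula element-wise over equal-length columns.

    `columns` maps column names to lists of numbers (a stand-in for a table).
    """
    lengths = {len(values) for values in columns.values()}
    if len(lengths) != 1:
        raise ValueError("all columns must have the same length")
    n_rows = lengths.pop()

    def visit(node):
        # Binary operation: evaluate both sides, then combine row by row.
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            op = _BINARY_OPS[type(node.op)]
            left, right = visit(node.left), visit(node.right)
            return [op(a, b) for a, b in zip(left, right)]
        # Bare identifier: look it up as a column name.
        if isinstance(node, ast.Name):
            if node.id not in columns:
                raise KeyError(f"unknown column: {node.id}")
            return list(columns[node.id])
        # Numeric literal: broadcast it to the table length.
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return [node.value] * n_rows
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    tree = ast.parse(formula, mode="eval")
    return visit(tree.body)


table = {"A": [1, 2, 3], "B": [3, 4, 5], "C": [2, 3, 4]}
table["D"] = evaluate_formula("(A + B) / C", table)
print(table["D"])  # [2.0, 2.0, 2.0]
```

Walking the tree against an explicit whitelist, rather than calling `eval`, is what keeps the supported scope small and explicit: function calls, attribute access, and the like raise `ValueError` instead of running.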
