If you're using open table formats, Delta Lake has a "generated columns"
feature, which lets you define a column by a formula over other columns
in the same table.

https://docs.databricks.com/en/delta/generated-columns.html
https://delta.io/blog/2023-04-12-delta-lake-generated-columns/

Cheers,
Kevin

On Thu, Aug 29, 2024 at 2:04 PM Jacek Pliszka <[email protected]>
wrote:

> Hi!
>
> Another option would be converting to an Arrow-backed pandas DataFrame
> and using a DataFrame method such as `eval` or `query`. Other libraries
> like DuckDB most likely offer similar options.
>
> BR
>
> J
>
> On Thu, Aug 29, 2024 at 02:54, Felipe Oliveira Carvalho
> <[email protected]> wrote:
> >
> > You can build `compute::Expression` instances [1] and use them in
> > different contexts, like scanning datasets [2] and producing Substrait
> > plans [3] that you can execute.
> >
> > But you have to write your own parser and define the scope and semantics
> > of the operations you would support.
> >
> > [1] https://github.com/apache/arrow/blob/main/cpp/src/arrow/compute/expression.h#L45
> > [2] https://github.com/apache/arrow/blob/main/cpp/examples/arrow/dataset_documentation_example.cc#L266
> > [3] https://github.com/apache/arrow/blob/main/cpp/src/arrow/engine/substrait/relation.h#L55
> >
> > --
> > Felipe
> >
> > On Wed, Aug 28, 2024 at 1:11 AM Surya Kiran Gullapalli
> > <[email protected]> wrote:
> >>
> >> Hello all,
> >> Let's say I have a table containing 3 columns 'A', 'B', and 'C'. Is it
> >> possible to create a 4th column 'D' using a formula (like (A+B)/C)?
> >>
> >> I know I can manually create them using compute functions, but is it
> >> possible to parse a formula like the above and compute the column on
> >> the fly at runtime?
> >>
> >> Any pointers are greatly appreciated.
> >>
> >> Thanks,
> >> Surya
>
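
To make the "write your own parser" suggestion quoted above a bit more
concrete, here is a minimal sketch in Python. It is not Arrow's API: it
borrows Python's own `ast` module as the formula parser and evaluates over
plain lists, and the names `eval_formula` and `_BIN_OPS` are made up for
illustration. The supported scope is deliberately tiny (column names,
numeric constants, and + - * /), and anything else is rejected.

```python
import ast
import operator

# Binary operators this sketch supports; everything else is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def eval_formula(formula, columns):
    """Evaluate an arithmetic formula element-wise over equal-length columns.

    `columns` maps a column name to a list of numbers (hypothetical layout,
    standing in for a real table).
    """
    lengths = {len(values) for values in columns.values()}
    if len(lengths) != 1:
        raise ValueError("need at least one column, all of the same length")
    n_rows = lengths.pop()

    def visit(node):
        # Binary operation: evaluate both sides, then combine row by row.
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            op = _BIN_OPS[type(node.op)]
            left, right = visit(node.left), visit(node.right)
            return [op(l, r) for l, r in zip(left, right)]
        # Bare name: look it up as a column.
        if isinstance(node, ast.Name):
            if node.id not in columns:
                raise KeyError(f"unknown column: {node.id}")
            return list(columns[node.id])
        # Numeric literal: broadcast it to every row.
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return [node.value] * n_rows
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    tree = ast.parse(formula, mode="eval")
    return visit(tree.body)


table = {"A": [1.0, 2.0, 3.0], "B": [3.0, 4.0, 5.0], "C": [2.0, 3.0, 4.0]}
table["D"] = eval_formula("(A + B) / C", table)
print(table["D"])  # [2.0, 2.0, 2.0]
```

With Arrow data, the same tree walk could presumably call compute functions
such as `pyarrow.compute.add` and `pyarrow.compute.divide` instead of the
list comprehensions. Semantics such as division by zero and nulls are left
undefined here, which is exactly the part you would have to decide yourself.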
