Hi guys,

I have written a couple of custom UDFs (specifically WEEK() and WEEKYEAR(),
which extract the week number and the week-based year from timestamps).
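For readers unfamiliar with the two values, here is a minimal sketch of what WEEK() and WEEKYEAR() compute. It is plain Java using java.time, not a Drill UDF, and it is not my actual function. It assumes ISO-8601 week numbering (weeks start on Monday, and week 1 contains the year's first Thursday), that the timestamp arrives as epoch milliseconds, and that it should be interpreted in UTC. The class name WeekSketch is hypothetical.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.temporal.IsoFields;

// Hypothetical illustration only: ISO-8601 week and week-based year from epoch millis.
public final class WeekSketch {
    private WeekSketch() {}

    // Assumption: the timestamp is epoch milliseconds and is read as a UTC date.
    private static LocalDate toUtcDate(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).atOffset(ZoneOffset.UTC).toLocalDate();
    }

    // What WEEK() returns under the ISO-8601 assumption: 1..53.
    public static int isoWeek(long epochMillis) {
        return toUtcDate(epochMillis).get(IsoFields.WEEK_OF_WEEK_BASED_YEAR);
    }

    // What WEEKYEAR() returns: the year that owns the ISO week,
    // which can differ from the calendar year near January 1.
    public static int isoWeekYear(long epochMillis) {
        return toUtcDate(epochMillis).get(IsoFields.WEEK_BASED_YEAR);
    }

    public static void main(String[] args) {
        long ts = 1609459200000L; // 2021-01-01T00:00:00Z, a Friday
        System.out.println(isoWeek(ts) + " " + isoWeekYear(ts)); // prints "53 2020"
    }
}
```

The January 1, 2021 example shows why both functions are needed: that date falls in week 53 of week-based year 2020, so WEEK() alone would be ambiguous.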

I timed two queries on approximately 11 million records stored in Parquet files:

select count(*) from `table` group by extract(day from `timestamp`)

750 ms

select count(*) from `table` group by week(`timestamp`)

2100 ms

The code for my WEEK() function is not far from the source code of the
built-in EXTRACT(DAY) function. Furthermore, even if I copy the exact
EXTRACT(DAY) code into my UDF, it shows the same performance penalty
(roughly 2.8x slower).

My question is: why would a UDF be so much slower than the equivalent
built-in function? Is this by design, or is there something I'm missing?

I'm happy to attach the function's source code if that helps.
