[jira] [Commented] (DRILL-7668) Allow Time Bucket Function to Accept Floats and Timestamps
[ https://issues.apache.org/jira/browse/DRILL-7668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17081350#comment-17081350 ]

ASF GitHub Bot commented on DRILL-7668:
---------------------------------------

asfgit commented on pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Allow Time Bucket Function to Accept Floats and Timestamps
> ----------------------------------------------------------
>
>                 Key: DRILL-7668
>                 URL: https://issues.apache.org/jira/browse/DRILL-7668
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Functions - Drill
>    Affects Versions: 1.18.0
>            Reporter: Charles Givre
>            Assignee: Charles Givre
>            Priority: Minor
>              Labels: ready-to-commit
>             Fix For: 1.18.0
>
> Drill has a function `time_bucket()` which facilitates time series analysis.
> This PR extends this function to accept `FLOAT8` and `TIMESTAMPS` as input.
> Floats are typically not used for timestamps, however in the event that the
> data is coming from imperfect files, the numbers may be read as floats and
> hence require casting in queries. This PR makes this easier.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (DRILL-7668) Allow Time Bucket Function to Accept Floats and Timestamps
[ https://issues.apache.org/jira/browse/DRILL-7668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17076622#comment-17076622 ]

ASF GitHub Bot commented on DRILL-7668:
---------------------------------------

cgivre commented on issue #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040#issuecomment-610011770

Thanks @vvysotskyi for the review. Commits squashed.
[jira] [Commented] (DRILL-7668) Allow Time Bucket Function to Accept Floats and Timestamps
[ https://issues.apache.org/jira/browse/DRILL-7668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17076600#comment-17076600 ]

ASF GitHub Bot commented on DRILL-7668:
---------------------------------------

cgivre commented on pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040#discussion_r404339305

## File path: contrib/udfs/src/main/java/org/apache/drill/exec/udfs/TimeBucketFunctions.java ##
@@ -97,9 +99,86 @@ public void eval() {
       long timestamp = inputDate.value;

       // Get the interval in milliseconds
-      long intervalToAdd = interval.value;
+      long groupByInterval = interval.value;

-      out.value = timestamp - (timestamp % intervalToAdd);
+      out.value = timestamp - (timestamp % groupByInterval);
+    }
+  }
+
+  /**
+   * This function is used for facilitating time series analysis by creating buckets of time intervals. See
+   * https://blog.timescale.com/blog/simplified-time-series-analytics-using-the-time_bucket-function/ for usage. The function takes two arguments:
+   * 1. The timestamp (as a Drill timestamp)
+   * 2. The desired bucket interval IN milliseconds
+   *
+   * The function returns a BIGINT of the nearest time bucket.
+   */
+  @FunctionTemplate(name = "time_bucket",
+    scope = FunctionTemplate.FunctionScope.SIMPLE,
+    nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
+  public static class TimestampTimeBucketFunction implements DrillSimpleFunc {
+
+    @Param
+    TimeStampHolder inputDate;
+
+    @Param
+    BigIntHolder interval;
+
+    @Output
+    TimeStampHolder out;
+
+    @Override
+    public void setup() {
+    }
+
+    @Override
+    public void eval() {
+      // Get the timestamp in milliseconds
+      long timestamp = inputDate.value;
+
+      // Get the interval in milliseconds
+      long groupByInterval = interval.value;
+
+      java.time.Instant instant = java.time.Instant.ofEpochMilli(timestamp - (timestamp % groupByInterval));

Review comment:
Done.
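The bucketing expression in the hunk above, `timestamp - (timestamp % groupByInterval)`, can be tried outside Drill. The following is only a minimal sketch: the class `TimeBucketSketch` and its methods are hypothetical and not part of Drill, and truncating a `FLOAT8` input to a `long` before bucketing is an assumption about how a floating-point variant might behave, not something this thread shows.

```java
// Hypothetical helper, not part of Drill: illustrates the bucketing arithmetic only.
final class TimeBucketSketch {

  // Rounds a non-negative epoch value down to the start of its bucket,
  // mirroring `timestamp - (timestamp % groupByInterval)` from the diff.
  // For negative inputs Java's % rounds toward zero rather than down.
  static long bucket(long timestamp, long interval) {
    return timestamp - (timestamp % interval);
  }

  // Assumed handling of floating-point input: truncate to long, then bucket.
  static long bucketDouble(double timestamp, long interval) {
    return bucket((long) timestamp, interval);
  }

  public static void main(String[] args) {
    long thirtyMinutes = 30L * 60L * 1000L;
    // 2020-03-27 01:33:53.845 UTC falls into the bucket starting at 01:30:00 UTC.
    System.out.println(bucket(1585272833845L, thirtyMinutes)); // 1585272600000
    // A value read as a double, bucketed in units of 30.
    System.out.println(bucketDouble(1451606775.5, 30L)); // 1451606760
  }
}
```

Because the arithmetic works on raw epoch values, no time zone is involved at this stage.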
[jira] [Commented] (DRILL-7668) Allow Time Bucket Function to Accept Floats and Timestamps
[ https://issues.apache.org/jira/browse/DRILL-7668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17076058#comment-17076058 ]

ASF GitHub Bot commented on DRILL-7668:
---------------------------------------

vvysotskyi commented on pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040#discussion_r403849368

## File path: contrib/udfs/src/main/java/org/apache/drill/exec/udfs/TimeBucketFunctions.java ##
@@ -97,9 +99,86 @@ public void eval() {
       long timestamp = inputDate.value;

       // Get the interval in milliseconds
-      long intervalToAdd = interval.value;
+      long groupByInterval = interval.value;

-      out.value = timestamp - (timestamp % intervalToAdd);
+      out.value = timestamp - (timestamp % groupByInterval);
+    }
+  }
+
+  /**
+   * This function is used for facilitating time series analysis by creating buckets of time intervals. See
+   * https://blog.timescale.com/blog/simplified-time-series-analytics-using-the-time_bucket-function/ for usage. The function takes two arguments:
+   * 1. The timestamp (as a Drill timestamp)
+   * 2. The desired bucket interval IN milliseconds
+   *
+   * The function returns a BIGINT of the nearest time bucket.
+   */
+  @FunctionTemplate(name = "time_bucket",
+    scope = FunctionTemplate.FunctionScope.SIMPLE,
+    nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
+  public static class TimestampTimeBucketFunction implements DrillSimpleFunc {
+
+    @Param
+    TimeStampHolder inputDate;
+
+    @Param
+    BigIntHolder interval;
+
+    @Output
+    TimeStampHolder out;
+
+    @Override
+    public void setup() {
+    }
+
+    @Override
+    public void eval() {
+      // Get the timestamp in milliseconds
+      long timestamp = inputDate.value;
+
+      // Get the interval in milliseconds
+      long groupByInterval = interval.value;
+
+      java.time.Instant instant = java.time.Instant.ofEpochMilli(timestamp - (timestamp % groupByInterval));

Review comment:
Looks like creating the `Instant` here may also be omitted.
[jira] [Commented] (DRILL-7668) Allow Time Bucket Function to Accept Floats and Timestamps
[ https://issues.apache.org/jira/browse/DRILL-7668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17076000#comment-17076000 ]

ASF GitHub Bot commented on DRILL-7668:
---------------------------------------

cgivre commented on issue #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040#issuecomment-609528521

@vvysotskyi Thanks for the review. I made the requested changes to the PR. If this is ok, I'll squash commits.
[jira] [Commented] (DRILL-7668) Allow Time Bucket Function to Accept Floats and Timestamps
[ https://issues.apache.org/jira/browse/DRILL-7668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17075995#comment-17075995 ]

ASF GitHub Bot commented on DRILL-7668:
---------------------------------------

cgivre commented on pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040#discussion_r403792240

## File path: contrib/udfs/src/main/java/org/apache/drill/exec/udfs/TimeBucketFunctions.java ##
@@ -97,9 +99,88 @@ public void eval() {
       long timestamp = inputDate.value;

       // Get the interval in milliseconds
-      long intervalToAdd = interval.value;
+      long groupByInterval = interval.value;

-      out.value = timestamp - (timestamp % intervalToAdd);
+      out.value = timestamp - (timestamp % groupByInterval);
+    }
+  }
+
+  /**
+   * This function is used for facilitating time series analysis by creating buckets of time intervals. See
+   * https://blog.timescale.com/blog/simplified-time-series-analytics-using-the-time_bucket-function/ for usage. The function takes two arguments:
+   * 1. The timestamp (as a Drill timestamp)
+   * 2. The desired bucket interval IN milliseconds
+   *
+   * The function returns a BIGINT of the nearest time bucket.
+   */
+  @FunctionTemplate(name = "time_bucket",
+    scope = FunctionTemplate.FunctionScope.SIMPLE,
+    nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
+  public static class TimestampTimeBucketFunction implements DrillSimpleFunc {
+
+    @Param
+    TimeStampHolder inputDate;
+
+    @Param
+    BigIntHolder interval;
+
+    @Output
+    TimeStampHolder out;
+
+    @Override
+    public void setup() {
+    }
+
+    @Override
+    public void eval() {
+      // Get the timestamp in milliseconds
+      long timestamp = inputDate.value;
+
+      // Get the interval in milliseconds
+      long groupByInterval = interval.value;
+
+      java.time.Instant instant = java.time.Instant.ofEpochMilli(timestamp - (timestamp % groupByInterval));
+      java.time.LocalDateTime localDate = instant.atZone(java.time.ZoneId.of("UTC")).toLocalDateTime();
+
+      out.value = localDate.atZone(java.time.ZoneId.of("UTC")).toInstant().toEpochMilli();

Review comment:
Nope. Not sure why I did that, but I simplified and tested this with a few different timezones on my computer.
[jira] [Commented] (DRILL-7668) Allow Time Bucket Function to Accept Floats and Timestamps
[ https://issues.apache.org/jira/browse/DRILL-7668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17075994#comment-17075994 ]

ASF GitHub Bot commented on DRILL-7668:
---------------------------------------

cgivre commented on pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040#discussion_r403792155

## File path: contrib/udfs/src/test/java/org/apache/drill/exec/udfs/TestTimeBucketFunction.java ##
@@ -92,6 +94,28 @@ public void testTimeBucket() throws Exception {
       .go();
   }

+  @Test
+  public void testDoubleTimeBucket() throws Exception {
+    String query = "SELECT time_bucket(CAST(1451606760 AS DOUBLE), 30) AS high FROM (values(1))";
+    testBuilder()
+      .sqlQuery(query)
+      .ordered()
+      .baselineColumns("high")
+      .baselineValues(145140L)
+      .go();
+  }
+
+  @Test
+  public void testTimeBucketTimestamp() throws Exception {
+    String query = "SELECT time_bucket(CAST(1585272833845 AS TIMESTAMP), 30) AS high FROM (values(1))";

Review comment:
Removed.
[jira] [Commented] (DRILL-7668) Allow Time Bucket Function to Accept Floats and Timestamps
[ https://issues.apache.org/jira/browse/DRILL-7668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17075889#comment-17075889 ]

ASF GitHub Bot commented on DRILL-7668:
---------------------------------------

vvysotskyi commented on pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040#discussion_r403733469

## File path: contrib/udfs/src/test/java/org/apache/drill/exec/udfs/TestTimeBucketFunction.java ##
@@ -92,6 +94,28 @@ public void testTimeBucket() throws Exception {
       .go();
   }

+  @Test
+  public void testDoubleTimeBucket() throws Exception {
+    String query = "SELECT time_bucket(CAST(1451606760 AS DOUBLE), 30) AS high FROM (values(1))";
+    testBuilder()
+      .sqlQuery(query)
+      .ordered()
+      .baselineColumns("high")
+      .baselineValues(145140L)
+      .go();
+  }
+
+  @Test
+  public void testTimeBucketTimestamp() throws Exception {
+    String query = "SELECT time_bucket(CAST(1585272833845 AS TIMESTAMP), 30) AS high FROM (values(1))";

Review comment:
Could you please replace it with the string representation? It looks friendlier and makes the source value easier to understand:

```suggestion
    String query = "SELECT time_bucket(timestamp '2020-03-27 01:33:53.845', 30) AS high";
```

Also, there is no need to specify `FROM (values(1))` if it is not used.
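To check that the suggested literal and the original epoch value describe the same moment, the literal can be parsed as a UTC wall-clock time with plain `java.time`. This is a rough sketch only: the class name `TimestampLiteralSketch` is hypothetical, and reading the literal as UTC is an assumption made for the illustration rather than a statement about how Drill parses `timestamp '...'`.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Drill or its test framework.
final class TimestampLiteralSketch {

  // Parses "yyyy-MM-dd HH:mm:ss.SSS" and interprets it as a UTC wall-clock time.
  static long utcMillis(String literal) {
    DateTimeFormatter format = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");
    return LocalDateTime.parse(literal, format)
        .toInstant(ZoneOffset.UTC)
        .toEpochMilli();
  }

  public static void main(String[] args) {
    // The literal from the suggestion maps back to the value used in the CAST.
    System.out.println(utcMillis("2020-03-27 01:33:53.845")); // 1585272833845
  }
}
```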
[jira] [Commented] (DRILL-7668) Allow Time Bucket Function to Accept Floats and Timestamps
[ https://issues.apache.org/jira/browse/DRILL-7668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17075873#comment-17075873 ]

ASF GitHub Bot commented on DRILL-7668:
---------------------------------------

vvysotskyi commented on issue #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040#issuecomment-609444297

@cgivre, @arina-ielchiieva, sorry for the delay, I missed an email about this PR earlier.
[jira] [Commented] (DRILL-7668) Allow Time Bucket Function to Accept Floats and Timestamps
[ https://issues.apache.org/jira/browse/DRILL-7668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17075872#comment-17075872 ]

ASF GitHub Bot commented on DRILL-7668:
---------------------------------------

vvysotskyi commented on pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040#discussion_r403725422

## File path: contrib/udfs/src/main/java/org/apache/drill/exec/udfs/TimeBucketFunctions.java ##
@@ -97,9 +99,88 @@ public void eval() {
       long timestamp = inputDate.value;

       // Get the interval in milliseconds
-      long intervalToAdd = interval.value;
+      long groupByInterval = interval.value;

-      out.value = timestamp - (timestamp % intervalToAdd);
+      out.value = timestamp - (timestamp % groupByInterval);
+    }
+  }
+
+  /**
+   * This function is used for facilitating time series analysis by creating buckets of time intervals. See
+   * https://blog.timescale.com/blog/simplified-time-series-analytics-using-the-time_bucket-function/ for usage. The function takes two arguments:
+   * 1. The timestamp (as a Drill timestamp)
+   * 2. The desired bucket interval IN milliseconds
+   *
+   * The function returns a BIGINT of the nearest time bucket.
+   */
+  @FunctionTemplate(name = "time_bucket",
+    scope = FunctionTemplate.FunctionScope.SIMPLE,
+    nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
+  public static class TimestampTimeBucketFunction implements DrillSimpleFunc {
+
+    @Param
+    TimeStampHolder inputDate;
+
+    @Param
+    BigIntHolder interval;
+
+    @Output
+    TimeStampHolder out;
+
+    @Override
+    public void setup() {
+    }
+
+    @Override
+    public void eval() {
+      // Get the timestamp in milliseconds
+      long timestamp = inputDate.value;
+
+      // Get the interval in milliseconds
+      long groupByInterval = interval.value;
+
+      java.time.Instant instant = java.time.Instant.ofEpochMilli(timestamp - (timestamp % groupByInterval));
+      java.time.LocalDateTime localDate = instant.atZone(java.time.ZoneId.of("UTC")).toLocalDateTime();
+
+      out.value = localDate.atZone(java.time.ZoneId.of("UTC")).toInstant().toEpochMilli();

Review comment:
@cgivre, could you please explain what happens here? First you calculate the required milliseconds, then you create an `Instant` from them, convert it to a `LocalDateTime` in the `UTC` time zone, convert that to a `ZonedDateTime`, convert that to an `Instant`, and finally convert back to milliseconds. Are all these transformations required? Usually, UDFs shouldn't apply a time zone to the values they handle.
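The chain of conversions questioned above can be reproduced with plain `java.time` to see what it does to an epoch value. This is a minimal sketch under the assumption that both zone conversions use `UTC`, as in the quoted hunk; the class name `UtcRoundTripSketch` is hypothetical and not part of Drill.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

// Hypothetical helper, not part of Drill: replays the conversion chain from the hunk.
final class UtcRoundTripSketch {

  // millis -> Instant -> LocalDateTime at UTC -> ZonedDateTime at UTC -> Instant -> millis
  static long roundTrip(long epochMillis) {
    Instant instant = Instant.ofEpochMilli(epochMillis);
    LocalDateTime localDate = instant.atZone(ZoneId.of("UTC")).toLocalDateTime();
    return localDate.atZone(ZoneId.of("UTC")).toInstant().toEpochMilli();
  }

  public static void main(String[] args) {
    long bucketStart = 1585272600000L; // 2020-03-27 01:30:00 UTC
    // Converting to UTC local time and back with the same zone changes nothing.
    System.out.println(roundTrip(bucketStart) == bucketStart); // true
  }
}
```

Since the same zone is applied in both directions, the round trip returns its input, which is consistent with the reviewer's point that the intermediate objects can be dropped.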
[jira] [Commented] (DRILL-7668) Allow Time Bucket Function to Accept Floats and Timestamps
[ https://issues.apache.org/jira/browse/DRILL-7668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17075813#comment-17075813 ]

ASF GitHub Bot commented on DRILL-7668:
---------------------------------------

arina-ielchiieva commented on issue #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040#issuecomment-609407971

@vvysotskyi is this PR ready to be merged?
[jira] [Commented] (DRILL-7668) Allow Time Bucket Function to Accept Floats and Timestamps
[ https://issues.apache.org/jira/browse/DRILL-7668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071899#comment-17071899 ]

ASF GitHub Bot commented on DRILL-7668:
---------------------------------------

cgivre commented on issue #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040#issuecomment-606700977

@vvysotskyi Thanks for checking this. I made a small fix and ran the unit tests on my local machine with the system set to different timezones, and they passed. Please let me know if this solves the issue and I'll squash commits.
Thanks
[jira] [Commented] (DRILL-7668) Allow Time Bucket Function to Accept Floats and Timestamps
[ https://issues.apache.org/jira/browse/DRILL-7668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071545#comment-17071545 ]

ASF GitHub Bot commented on DRILL-7668:
---------------------------------------

vvysotskyi commented on issue #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040#issuecomment-606196720

@cgivre, this pull request introduces a unit test failure:

```
[ERROR] org.apache.drill.exec.udfs.TestTimeBucketFunction.testTimeBucketTimestamp  Time elapsed: 2.584 s  <<< ERROR!
java.lang.Exception: at position 0 column '`high`' mismatched values, expected: 2020-03-27T01:30(LocalDateTime) but received 2020-03-26T18:30(LocalDateTime)

Expected Records near verification failure:
Record Number: 0 { `high` : 2020-03-27T01:30, }

Actual Records near verification failure:
Record Number: 0 { `high` : 2020-03-26T18:30, }

For query: SELECT time_bucket(CAST(1585272833845 AS TIMESTAMP), 30) AS high FROM (values(1))
	at org.apache.drill.exec.udfs.TestTimeBucketFunction.testTimeBucketTimestamp(TestTimeBucketFunction.java:116)
Caused by: java.lang.Exception: at position 0 column '`high`' mismatched values, expected: 2020-03-27T01:30(LocalDateTime) but received 2020-03-26T18:30(LocalDateTime)

Expected Records near verification failure:
Record Number: 0 { `high` : 2020-03-27T01:30, }

Actual Records near verification failure:
Record Number: 0 { `high` : 2020-03-26T18:30, }

	at org.apache.drill.exec.udfs.TestTimeBucketFunction.testTimeBucketTimestamp(TestTimeBucketFunction.java:116)
Caused by: java.lang.Exception: at position 0 column '`high`' mismatched values, expected: 2020-03-27T01:30(LocalDateTime) but received 2020-03-26T18:30(LocalDateTime)
	at org.apache.drill.exec.udfs.TestTimeBucketFunction.testTimeBucketTimestamp(TestTimeBucketFunction.java:116)
```

Looks like it may be a timezone-dependent failure.
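The seven-hour gap between the expected and received values in the log above is what one would see if the same epoch value were rendered as a `LocalDateTime` in two different zones. The sketch below illustrates that effect only. The zone `America/Los_Angeles` and the bucket start `1585272600000` (01:30 UTC) are assumptions chosen to match the numbers in the log, since the thread does not say which zone the failing machine used, and `ZoneRenderingSketch` is a hypothetical class, not part of Drill.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

// Hypothetical helper, not part of Drill: shows zone-dependent rendering of one instant.
final class ZoneRenderingSketch {

  // Renders an epoch-millisecond value as wall-clock time in the given zone.
  static LocalDateTime render(long epochMillis, String zone) {
    return LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), ZoneId.of(zone));
  }

  public static void main(String[] args) {
    long bucketStart = 1585272600000L; // assumed bucket start, 2020-03-27 01:30:00 UTC
    System.out.println(render(bucketStart, "UTC"));                 // 2020-03-27T01:30
    System.out.println(render(bucketStart, "America/Los_Angeles")); // 2020-03-26T18:30
  }
}
```

A test baseline built from a local wall-clock value would therefore only match on machines whose default zone agrees with the one used to compute it.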
[jira] [Commented] (DRILL-7668) Allow Time Bucket Function to Accept Floats and Timestamps
[ https://issues.apache.org/jira/browse/DRILL-7668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071216#comment-17071216 ]

ASF GitHub Bot commented on DRILL-7668:
---
vvysotskyi commented on issue #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040#issuecomment-606196720

@cgivre, this pull request introduces a unit test failure:
```
[ERROR] org.apache.drill.exec.udfs.TestTimeBucketFunction.testTimeBucketTimestamp  Time elapsed: 2.584 s  <<< ERROR!
java.lang.Exception:
at position 0 column '`high`' mismatched values, expected: 2020-03-27T01:30(LocalDateTime) but received 2020-03-26T18:30(LocalDateTime)

Expected Records near verification failure:
Record Number: 0 { `high` : 2020-03-27T01:30, }

Actual Records near verification failure:
Record Number: 0 { `high` : 2020-03-26T18:30, }

For query: SELECT time_bucket(CAST(1585272833845 AS TIMESTAMP), 30) AS high FROM (values(1))
	at org.apache.drill.exec.udfs.TestTimeBucketFunction.testTimeBucketTimestamp(TestTimeBucketFunction.java:116)
Caused by: java.lang.Exception:
at position 0 column '`high`' mismatched values, expected: 2020-03-27T01:30(LocalDateTime) but received 2020-03-26T18:30(LocalDateTime)

Expected Records near verification failure:
Record Number: 0 { `high` : 2020-03-27T01:30, }

Actual Records near verification failure:
Record Number: 0 { `high` : 2020-03-26T18:30, }

	at org.apache.drill.exec.udfs.TestTimeBucketFunction.testTimeBucketTimestamp(TestTimeBucketFunction.java:116)
Caused by: java.lang.Exception: at position 0 column '`high`' mismatched values, expected: 2020-03-27T01:30(LocalDateTime) but received 2020-03-26T18:30(LocalDateTime)
	at org.apache.drill.exec.udfs.TestTimeBucketFunction.testTimeBucketTimestamp(TestTimeBucketFunction.java:116)
```
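The expected and received values in the failure above differ by exactly seven hours. One reading, which is an assumption and not something the thread confirms, is that the same bucketed instant was rendered in two different time zones. The sketch below uses only the JDK (no Drill classes) and assumes a 30-minute bucket expressed in milliseconds; the failing query passes the literal `30`, and how that argument is interpreted is not shown in this message. The class and method names are hypothetical.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class TimeBucketZoneDemo {
    // Truncate an epoch-millisecond value down to the start of its bucket,
    // using the same arithmetic as the patch under review.
    static long bucket(long epochMillis, long intervalMillis) {
        return epochMillis - (epochMillis % intervalMillis);
    }

    // Render an epoch-millisecond value as a LocalDateTime in the given zone.
    static LocalDateTime render(long epochMillis, ZoneId zone) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), zone);
    }

    public static void main(String[] args) {
        long input = 1585272833845L;           // literal taken from the failing query
        long thirtyMinutes = 30L * 60 * 1000;  // assumption: 30-minute bucket in ms
        long bucketStart = bucket(input, thirtyMinutes);

        // Same instant, two renderings.
        System.out.println(render(bucketStart, ZoneOffset.UTC));                    // 2020-03-27T01:30
        System.out.println(render(bucketStart, ZoneId.of("America/Los_Angeles"))); // 2020-03-26T18:30
    }
}
```

If this reading is right, the bucket arithmetic itself agrees in both cases and only the zone used to build the `LocalDateTime` differs between the expected value and the machine running the test.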
[jira] [Commented] (DRILL-7668) Allow Time Bucket Function to Accept Floats and Timestamps
[ https://issues.apache.org/jira/browse/DRILL-7668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068681#comment-17068681 ]

ASF GitHub Bot commented on DRILL-7668:
---
cgivre commented on issue #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040#issuecomment-604990458

@paul-rogers Thank you very much for the review.
@arina-ielchiieva Commits are squashed. We should be ready to commit.
[jira] [Commented] (DRILL-7668) Allow Time Bucket Function to Accept Floats and Timestamps
[ https://issues.apache.org/jira/browse/DRILL-7668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068653#comment-17068653 ]

ASF GitHub Bot commented on DRILL-7668:
---
arina-ielchiieva commented on issue #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040#issuecomment-604981406

@cgivre please squash the commits.
[jira] [Commented] (DRILL-7668) Allow Time Bucket Function to Accept Floats and Timestamps
[ https://issues.apache.org/jira/browse/DRILL-7668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068192#comment-17068192 ]

ASF GitHub Bot commented on DRILL-7668:
---
cgivre commented on pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040#discussion_r398987931

## File path: contrib/udfs/src/main/java/org/apache/drill/exec/udfs/TimeBucketFunctions.java ##
@@ -97,9 +99,85 @@ public void eval() {
       long timestamp = inputDate.value;
 
       // Get the interval in milliseconds
-      long intervalToAdd = interval.value;
+      long groupByInterval = interval.value;
 
-      out.value = timestamp - (timestamp % intervalToAdd);
+      out.value = timestamp - (timestamp % groupByInterval);
+    }
+  }
+
+  /**
+   * This function is used for facilitating time series analysis by creating buckets of time intervals. See
+   * https://blog.timescale.com/blog/simplified-time-series-analytics-using-the-time_bucket-function/ for usage. The function takes two arguments:
+   * 1. The timestamp (as a Drill timestamp)
+   * 2. The desired bucket interval IN milliseconds
+   *
+   * The function returns a BIGINT of the nearest time bucket.
+   */
+  @FunctionTemplate(name = "time_bucket",
+    scope = FunctionTemplate.FunctionScope.SIMPLE,
+    nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
+  public static class TimestampTimeBucketFunction implements DrillSimpleFunc {
+
+    @Param
+    TimeStampHolder inputDate;
+
+    @Param
+    BigIntHolder interval;
+
+    @Output
+    BigIntHolder out;

Review comment:
Fixed
[jira] [Commented] (DRILL-7668) Allow Time Bucket Function to Accept Floats and Timestamps
[ https://issues.apache.org/jira/browse/DRILL-7668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068186#comment-17068186 ]

ASF GitHub Bot commented on DRILL-7668:
---
cgivre commented on pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040#discussion_r398982654

## File path: contrib/udfs/src/main/java/org/apache/drill/exec/udfs/TimeBucketFunctions.java ##
@@ -97,9 +99,85 @@ public void eval() {
       long timestamp = inputDate.value;
 
       // Get the interval in milliseconds
-      long intervalToAdd = interval.value;
+      long groupByInterval = interval.value;
 
-      out.value = timestamp - (timestamp % intervalToAdd);
+      out.value = timestamp - (timestamp % groupByInterval);
+    }
+  }
+
+  /**
+   * This function is used for facilitating time series analysis by creating buckets of time intervals. See
+   * https://blog.timescale.com/blog/simplified-time-series-analytics-using-the-time_bucket-function/ for usage. The function takes two arguments:
+   * 1. The timestamp (as a Drill timestamp)
+   * 2. The desired bucket interval IN milliseconds
+   *
+   * The function returns a BIGINT of the nearest time bucket.
+   */
+  @FunctionTemplate(name = "time_bucket",
+    scope = FunctionTemplate.FunctionScope.SIMPLE,
+    nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
+  public static class TimestampTimeBucketFunction implements DrillSimpleFunc {
+
+    @Param
+    TimeStampHolder inputDate;
+
+    @Param
+    BigIntHolder interval;
+
+    @Output
+    BigIntHolder out;

Review comment:
I was thinking the same thing actually... of course right after I submitted. I'll convert to TS.
[jira] [Commented] (DRILL-7668) Allow Time Bucket Function to Accept Floats and Timestamps
[ https://issues.apache.org/jira/browse/DRILL-7668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068136#comment-17068136 ]

ASF GitHub Bot commented on DRILL-7668:
---
paul-rogers commented on pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040#discussion_r398954740

## File path: contrib/udfs/src/main/java/org/apache/drill/exec/udfs/TimeBucketFunctions.java ##
@@ -97,9 +99,85 @@ public void eval() {
       long timestamp = inputDate.value;
 
       // Get the interval in milliseconds
-      long intervalToAdd = interval.value;
+      long groupByInterval = interval.value;
 
-      out.value = timestamp - (timestamp % intervalToAdd);
+      out.value = timestamp - (timestamp % groupByInterval);
+    }
+  }
+
+  /**
+   * This function is used for facilitating time series analysis by creating buckets of time intervals. See
+   * https://blog.timescale.com/blog/simplified-time-series-analytics-using-the-time_bucket-function/ for usage. The function takes two arguments:
+   * 1. The timestamp (as a Drill timestamp)
+   * 2. The desired bucket interval IN milliseconds
+   *
+   * The function returns a BIGINT of the nearest time bucket.
+   */
+  @FunctionTemplate(name = "time_bucket",
+    scope = FunctionTemplate.FunctionScope.SIMPLE,
+    nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
+  public static class TimestampTimeBucketFunction implements DrillSimpleFunc {
+
+    @Param
+    TimeStampHolder inputDate;
+
+    @Param
+    BigIntHolder interval;
+
+    @Output
+    BigIntHolder out;

Review comment:
Sorry; I wonder if the `TimeStamp` version should emit a timestamp. The interval is a valid ms-since-the-epoch number; seems to make sense to truncate a TS to another TS.
[jira] [Commented] (DRILL-7668) Allow Time Bucket Function to Accept Floats and Timestamps
[ https://issues.apache.org/jira/browse/DRILL-7668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067926#comment-17067926 ]

ASF GitHub Bot commented on DRILL-7668:
---
cgivre commented on pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040#discussion_r398790952

## File path: contrib/udfs/src/main/java/org/apache/drill/exec/udfs/TimeBucketFunctions.java ##
@@ -102,4 +104,80 @@ public void eval() {
       out.value = timestamp - (timestamp % intervalToAdd);
     }
   }
+
+  /**
+   * This function is used for facilitating time series analysis by creating buckets of time intervals. See
+   * https://blog.timescale.com/blog/simplified-time-series-analytics-using-the-time_bucket-function/ for usage. The function takes two arguments:
+   * 1. The timestamp (as a Drill timestamp)
+   * 2. The desired bucket interval IN milliseconds
+   *
+   * The function returns a BIGINT of the nearest time bucket.
+   */
+  @FunctionTemplate(name = "time_bucket",
+    scope = FunctionTemplate.FunctionScope.SIMPLE,
+    nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
+  public static class TimestampTimeBucketFunction implements DrillSimpleFunc {
+
+    @Param
+    TimeStampHolder inputDate;
+
+    @Param
+    BigIntHolder interval;
+
+    @Output
+    BigIntHolder out;
+
+    @Override
+    public void setup() {
+    }
+
+    @Override
+    public void eval() {
+      // Get the timestamp in milliseconds
+      long timestamp = inputDate.value;
+
+      // Get the interval in milliseconds
+      long intervalToAdd = interval.value;
+
+      out.value = timestamp - (timestamp % intervalToAdd);
+    }
+  }
+
+  /**
+   * This function is used for facilitating time series analysis by creating buckets of time intervals. See
+   * https://blog.timescale.com/blog/simplified-time-series-analytics-using-the-time_bucket-function/ for usage. The function takes two arguments:
+   * 1. The timestamp (as a Drill timestamp)
+   * 2. The desired bucket interval IN milliseconds
+   *
+   * The function returns a BIGINT of the nearest time bucket.
+   */
+  @FunctionTemplate(name = "time_bucket",
+    scope = FunctionTemplate.FunctionScope.SIMPLE,
+    nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
+  public static class DoubleTimeBucketFunction implements DrillSimpleFunc {
+
+    @Param
+    Float8Holder inputDate;
+
+    @Param
+    BigIntHolder interval;
+
+    @Output
+    BigIntHolder out;
+
+    @Override
+    public void setup() {
+    }
+
+    @Override
+    public void eval() {
+      // Get the timestamp in milliseconds
+      long timestamp = (long)inputDate.value;

Review comment:
Fixed
[jira] [Commented] (DRILL-7668) Allow Time Bucket Function to Accept Floats and Timestamps
[ https://issues.apache.org/jira/browse/DRILL-7668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067928#comment-17067928 ]

ASF GitHub Bot commented on DRILL-7668:
---
cgivre commented on pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040#discussion_r398791038

## File path: contrib/udfs/src/main/java/org/apache/drill/exec/udfs/TimeBucketFunctions.java ##
@@ -102,4 +104,80 @@ public void eval() {
       out.value = timestamp - (timestamp % intervalToAdd);
     }
   }
+
+  /**
+   * This function is used for facilitating time series analysis by creating buckets of time intervals. See
+   * https://blog.timescale.com/blog/simplified-time-series-analytics-using-the-time_bucket-function/ for usage. The function takes two arguments:
+   * 1. The timestamp (as a Drill timestamp)
+   * 2. The desired bucket interval IN milliseconds
+   *
+   * The function returns a BIGINT of the nearest time bucket.
+   */
+  @FunctionTemplate(name = "time_bucket",
+    scope = FunctionTemplate.FunctionScope.SIMPLE,
+    nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
+  public static class TimestampTimeBucketFunction implements DrillSimpleFunc {
+
+    @Param
+    TimeStampHolder inputDate;
+
+    @Param
+    BigIntHolder interval;
+
+    @Output
+    BigIntHolder out;
+
+    @Override
+    public void setup() {
+    }
+
+    @Override
+    public void eval() {
+      // Get the timestamp in milliseconds
+      long timestamp = inputDate.value;
+
+      // Get the interval in milliseconds
+      long intervalToAdd = interval.value;

Review comment:
Fixed
[jira] [Commented] (DRILL-7668) Allow Time Bucket Function to Accept Floats and Timestamps
[ https://issues.apache.org/jira/browse/DRILL-7668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067842#comment-17067842 ]

ASF GitHub Bot commented on DRILL-7668:
---
paul-rogers commented on pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040#discussion_r398731150

## File path: contrib/udfs/src/main/java/org/apache/drill/exec/udfs/TimeBucketFunctions.java ##
@@ -102,4 +104,80 @@ public void eval() {
       out.value = timestamp - (timestamp % intervalToAdd);
     }
   }
+
+  /**
+   * This function is used for facilitating time series analysis by creating buckets of time intervals. See
+   * https://blog.timescale.com/blog/simplified-time-series-analytics-using-the-time_bucket-function/ for usage. The function takes two arguments:
+   * 1. The timestamp (as a Drill timestamp)
+   * 2. The desired bucket interval IN milliseconds
+   *
+   * The function returns a BIGINT of the nearest time bucket.
+   */
+  @FunctionTemplate(name = "time_bucket",
+    scope = FunctionTemplate.FunctionScope.SIMPLE,
+    nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
+  public static class TimestampTimeBucketFunction implements DrillSimpleFunc {
+
+    @Param
+    TimeStampHolder inputDate;
+
+    @Param
+    BigIntHolder interval;
+
+    @Output
+    BigIntHolder out;
+
+    @Override
+    public void setup() {
+    }
+
+    @Override
+    public void eval() {
+      // Get the timestamp in milliseconds
+      long timestamp = inputDate.value;
+
+      // Get the interval in milliseconds
+      long intervalToAdd = interval.value;

Review comment:
This is not an interval to "add" so much as a group-by interval.
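To illustrate why "group-by interval" describes the value better than "interval to add", here is a small standalone sketch. It is not Drill code: the class name, helper names, and sample timestamps are made up for illustration, and it only reuses the `timestamp - (timestamp % interval)` arithmetic from the diff. Nothing is added to any timestamp; the interval only decides which inputs collapse onto the same bucket key.

```java
import java.util.Map;
import java.util.TreeMap;

public class GroupByIntervalDemo {
    // Same arithmetic as the reviewed eval(): truncate down to the bucket start.
    static long bucket(long timestamp, long groupByInterval) {
        return timestamp - (timestamp % groupByInterval);
    }

    // Count how many timestamps fall into each bucket, keyed by bucket start.
    static Map<Long, Integer> countPerBucket(long[] timestamps, long groupByInterval) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long ts : timestamps) {
            counts.merge(bucket(ts, groupByInterval), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] timestamps = {1000L, 1250L, 1999L, 2000L, 2750L}; // hypothetical ms values
        System.out.println(countPerBucket(timestamps, 1000L));   // {1000=3, 2000=2}
    }
}
```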
[jira] [Commented] (DRILL-7668) Allow Time Bucket Function to Accept Floats and Timestamps
[ https://issues.apache.org/jira/browse/DRILL-7668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067841#comment-17067841 ]

ASF GitHub Bot commented on DRILL-7668:
---
paul-rogers commented on pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040#discussion_r398732279

## File path: contrib/udfs/src/main/java/org/apache/drill/exec/udfs/TimeBucketFunctions.java ##
@@ -102,4 +104,80 @@ public void eval() {
       out.value = timestamp - (timestamp % intervalToAdd);
     }
   }
+
+  /**
+   * This function is used for facilitating time series analysis by creating buckets of time intervals. See
+   * https://blog.timescale.com/blog/simplified-time-series-analytics-using-the-time_bucket-function/ for usage. The function takes two arguments:
+   * 1. The timestamp (as a Drill timestamp)
+   * 2. The desired bucket interval IN milliseconds
+   *
+   * The function returns a BIGINT of the nearest time bucket.
+   */
+  @FunctionTemplate(name = "time_bucket",
+    scope = FunctionTemplate.FunctionScope.SIMPLE,
+    nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
+  public static class TimestampTimeBucketFunction implements DrillSimpleFunc {
+
+    @Param
+    TimeStampHolder inputDate;
+
+    @Param
+    BigIntHolder interval;
+
+    @Output
+    BigIntHolder out;
+
+    @Override
+    public void setup() {
+    }
+
+    @Override
+    public void eval() {
+      // Get the timestamp in milliseconds
+      long timestamp = inputDate.value;
+
+      // Get the interval in milliseconds
+      long intervalToAdd = interval.value;
+
+      out.value = timestamp - (timestamp % intervalToAdd);
+    }
+  }
+
+  /**
+   * This function is used for facilitating time series analysis by creating buckets of time intervals. See
+   * https://blog.timescale.com/blog/simplified-time-series-analytics-using-the-time_bucket-function/ for usage. The function takes two arguments:
+   * 1. The timestamp (as a Drill timestamp)
+   * 2. The desired bucket interval IN milliseconds
+   *
+   * The function returns a BIGINT of the nearest time bucket.
+   */
+  @FunctionTemplate(name = "time_bucket",
+    scope = FunctionTemplate.FunctionScope.SIMPLE,
+    nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
+  public static class DoubleTimeBucketFunction implements DrillSimpleFunc {
+
+    @Param
+    Float8Holder inputDate;
+
+    @Param
+    BigIntHolder interval;
+
+    @Output
+    BigIntHolder out;
+
+    @Override
+    public void setup() {
+    }
+
+    @Override
+    public void eval() {
+      // Get the timestamp in milliseconds
+      long timestamp = (long)inputDate.value;

Review comment:
`Math.round(inputDate.value)`. We would want 4. to be treated as 5.
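The difference between the cast in the diff and the suggested `Math.round` can be shown with plain Java. The sample value below is hypothetical (the number in the review comment appears truncated and is not reconstructed here); the class and method names are made up for illustration.

```java
public class RoundVsCastDemo {
    // A narrowing cast from double to long drops the fractional part (rounds toward zero).
    static long truncate(double value) {
        return (long) value;
    }

    // Math.round(double) returns the closest long, with ties rounding toward positive infinity.
    static long roundToLong(double value) {
        return Math.round(value);
    }

    public static void main(String[] args) {
        double almostFive = 4.999999; // hypothetical float-encoded timestamp value
        System.out.println(truncate(almostFive));    // 4
        System.out.println(roundToLong(almostFive)); // 5
    }
}
```

For timestamps that were integers in the source data but picked up floating-point noise when read as `FLOAT8`, rounding recovers the intended millisecond where truncation can land one millisecond low.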
[jira] [Commented] (DRILL-7668) Allow Time Bucket Function to Accept Floats and Timestamps
[ https://issues.apache.org/jira/browse/DRILL-7668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067688#comment-17067688 ]

ASF GitHub Bot commented on DRILL-7668:
---
cgivre commented on pull request #2040: DRILL-7668: Allow Time Bucket Function to Accept Floats and Timestamps
URL: https://github.com/apache/drill/pull/2040

# [DRILL-7668](https://issues.apache.org/jira/browse/DRILL-7668): Allow Time Bucket Function to Accept Floats and Timestamps

## Description
Drill has a function `time_bucket()` which facilitates time series analysis. This PR extends this function to accept `FLOAT8` and `TIMESTAMP` values as input. Floats are typically not used for timestamps; however, if the data comes from imperfect files, the numbers may be read as floats and hence require casting in queries. This PR makes this easier.

## Documentation
The `time_bucket()` function now accepts a `timestamp` or `float8` as the time argument.

## Testing
Added two unit tests for the new data types.