qiaojialin commented on a change in pull request #1828: URL: https://github.com/apache/iotdb/pull/1828#discussion_r542324978
########## File path: docs/UserGuide/Operation Manual/UDF User Defined Function.md ########## @@ -0,0 +1,416 @@ +<!-- + + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +--> + + + +# UDF (User Defined Function) + +IoTDB provides a variety of built-in functions to meet your computing needs, and you can also create user defined functions to meet more computing needs. + +This document describes how to write, register and use a UDF. Review comment: ```suggestion This document describes how to write, register and use an UDF. ``` ########## File path: docs/UserGuide/Operation Manual/UDF User Defined Function.md ########## @@ -0,0 +1,416 @@ +<!-- + + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +--> + + + +# UDF (User Defined Function) + +IoTDB provides a variety of built-in functions to meet your computing needs, and you can also create user defined functions to meet more computing needs. + +This document describes how to write, register and use a UDF. + + + +## UDF Types + +In IoTDB, you can expand two types of UDF: + +| UDF Class | Description | +| --------------------------------------------------- | ------------------------------------------------------------ | +| UDTF(User Defined Timeseries Generating Function) | This type of function can take **multiple** time series as input, and output **one** time series, which can have any number of data points. | +| UDAF(User Defined Aggregation Function) | Under development, please stay tuned. | + + + +## UDF Development Dependencies + +If you use [Maven](http://search.maven.org/), you can search for the development dependencies listed below from the [Maven repository](http://search.maven.org/) . Please note that you must select the same dependency version as the target IoTDB server version for development. + +``` xml +<dependency> + <groupId>org.apache.iotdb</groupId> + <artifactId>iotdb-server</artifactId> + <version>0.12.0-SNAPSHOT</version> + <scope>provided</scope> +</dependency> +<dependency> + <groupId>org.apache.iotdb</groupId> + <artifactId>tsfile</artifactId> + <version>0.12.0-SNAPSHOT</version> + <scope>provided</scope> +</dependency> Review comment: You'd better check why we need tsfile after adding server. ########## File path: server/src/main/java/org/apache/iotdb/db/conf/IoTDBConfig.java ########## @@ -760,6 +760,28 @@ // the authorizer provider class which extends BasicAuthorizer private String authorizerProvider = "org.apache.iotdb.db.auth.authorizer.LocalFileAuthorizer"; + /** + * Used to estimate the memory usage of text fields in a UDF query. It is recommended to set this + * value to be slightly larger than the average length of all text records. + */ + private int udfInitialByteArrayLengthForMemoryControl = 48; + + /** + * How much memory may be used in ONE UDF query (in MB). + * <p> + * The upper limit is 20% of allocated memory for read. + * <p> + * udfMemoryBudgetInMB = udfReaderMemoryBudgetInMB + udfTransformerMemoryBudgetInMB + + * udfCollectorMemoryBudgetInMB + */ + private float udfMemoryBudgetInMB = (float) Math.min(300f, 0.2 * allocateMemoryForRead); Review comment: You allocate 0.2 of read memory to udf, how does query perceive this? ########## File path: server/src/main/java/org/apache/iotdb/db/query/udf/api/customizer/config/UDFConfigurations.java ########## @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.iotdb.db.query.udf.api.customizer.config; + +import org.apache.iotdb.db.exception.query.QueryProcessException; +import org.apache.iotdb.tsfile.file.metadata.enums.TSDataType; + +public abstract class UDFConfigurations { Review comment: ```suggestion public abstract class AbstractUDFConfigurations { ``` ########## File path: server/src/main/java/org/apache/iotdb/db/query/udf/service/UDFRegistrationService.java ########## @@ -0,0 +1,308 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.iotdb.db.query.udf.service; + +import java.io.BufferedReader; +import java.io.File; +import java.io.FileReader; +import java.io.IOException; +import java.lang.reflect.InvocationTargetException; +import java.util.HashMap; +import java.util.Map.Entry; +import java.util.concurrent.ConcurrentHashMap; +import java.util.concurrent.locks.ReentrantReadWriteLock; +import org.apache.commons.io.FileUtils; +import org.apache.iotdb.db.conf.IoTDBDescriptor; +import org.apache.iotdb.db.engine.fileSystem.SystemFileFactory; +import org.apache.iotdb.db.exception.StartupException; +import org.apache.iotdb.db.exception.UDFRegistrationException; +import org.apache.iotdb.db.exception.query.QueryProcessException; +import org.apache.iotdb.db.query.udf.api.UDF; +import org.apache.iotdb.db.query.udf.core.context.UDFContext; +import org.apache.iotdb.db.service.IService; +import org.apache.iotdb.db.service.ServiceType; +import org.apache.iotdb.db.utils.TestOnly; +import org.apache.iotdb.tsfile.fileSystem.FSFactoryProducer; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class UDFRegistrationService implements IService { + + private static final Logger logger = LoggerFactory.getLogger(UDFRegistrationService.class); + + private static final String ULOG_FILE_DIR = + IoTDBDescriptor.getInstance().getConfig().getSystemDir() + + File.separator + "udf" + File.separator; + private static final String LOG_FILE_NAME = ULOG_FILE_DIR + "ulog.txt"; + private static final String TEMPORARY_LOG_FILE_NAME = LOG_FILE_NAME + ".tmp"; + + private final ConcurrentHashMap<String, UDFRegistrationInformation> registrationInformation; + + private final ReentrantReadWriteLock lock; + private UDFLogWriter temporaryLogWriter; + + private UDFClassLoader udfClassLoader; + + private UDFRegistrationService() { + registrationInformation = new ConcurrentHashMap<>(); + lock = new ReentrantReadWriteLock(); + } + + @SuppressWarnings("squid:S3776") // Suppress high Cognitive Complexity warning + public void register(String functionName, String className, boolean isTemporary, + boolean writeToTemporaryLogFile) throws UDFRegistrationException { + UDFRegistrationInformation information = registrationInformation.get(functionName); + if (information != null) { + if (information.getClassName().equals(className)) { + String errorMessage; + if (information.isTemporary() == isTemporary) { + errorMessage = String + .format("UDF %s(%s) has already been registered successfully.", + functionName, className); + } else { + errorMessage = String.format( + "Failed to register %sTEMPORARY UDF %s(%s), because a %sTEMPORARY UDF %s(%s) with the same function name and the class name has already been registered.", + isTemporary ? "" : "non-", functionName, className, + information.isTemporary() ? "" : "non-", information.getFunctionName(), + information.getClassName()); Review comment: The if-else could be eliminated since the error message is generated according to information.isTemporary() ########## File path: server/src/main/java/org/apache/iotdb/db/qp/executor/PlanExecutor.java ########## @@ -668,6 +697,33 @@ private QueryDataSet processShowFlushTaskInfo() { return listDataSet; } + private QueryDataSet processShowFunctions(ShowFunctionsPlan showPlan) { + ListDataSet listDataSet = new ListDataSet( + Arrays.asList( + new PartialPath(COLUMN_FUNCTION_NAME, false), + new PartialPath(COLUMN_FUNCTION_CLASS, false) + // new PartialPath(COLUMN_FUNCTION_TEMPORARY, false) + ), + Arrays.asList( + TSDataType.TEXT, + TSDataType.TEXT + // TSDataType.BOOLEAN Review comment: remove or add ########## File path: server/src/main/java/org/apache/iotdb/db/qp/constant/DatetimeUtils.java ########## @@ -477,6 +477,46 @@ public static long convertDatetimeStrToLong(String str, ZoneOffset offset, int d return getInstantWithPrecision(str, timestampPrecision); } + /** + * convert duration string to time value. + * + * @param duration represent duration string like: 12d8m9ns, 1y1mo, etc. + * @return time in milliseconds, microseconds, or nanoseconds depending on the profile + */ + public static long convertDurationStrToLong(String duration) { + String timestampPrecision = IoTDBDescriptor.getInstance().getConfig().getTimestampPrecision(); + return convertDurationStrToLong(duration, timestampPrecision); + } + + /** + * convert duration string to time value. + * + * @param duration represent duration string like: 12d8m9ns, 1y1mo, etc. + * @return time in milliseconds, microseconds, or nanoseconds depending on the profile + */ + public static long convertDurationStrToLong(String duration, String timestampPrecision) { Review comment: merge master and the mo is supported ########## File path: server/src/main/java/org/apache/iotdb/db/qp/executor/PlanExecutor.java ########## @@ -668,6 +697,33 @@ private QueryDataSet processShowFlushTaskInfo() { return listDataSet; } + private QueryDataSet processShowFunctions(ShowFunctionsPlan showPlan) { + ListDataSet listDataSet = new ListDataSet( + Arrays.asList( + new PartialPath(COLUMN_FUNCTION_NAME, false), + new PartialPath(COLUMN_FUNCTION_CLASS, false) + // new PartialPath(COLUMN_FUNCTION_TEMPORARY, false) Review comment: remove this ########## File path: server/src/main/java/org/apache/iotdb/db/conf/IoTDBConstant.java ########## @@ -86,6 +86,10 @@ private IoTDBConstant() { public static final String COLUMN_CANCELLED = "cancelled"; public static final String COLUMN_DONE = "done"; + public static final String COLUMN_FUNCTION_NAME = "UDF name"; + public static final String COLUMN_FUNCTION_CLASS = "class name"; + public static final String COLUMN_FUNCTION_TEMPORARY = "temporary"; Review comment: this is not used ########## File path: server/src/main/java/org/apache/iotdb/db/exception/UDFRegistrationException.java ########## @@ -0,0 +1,31 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.iotdb.db.exception; + +public class UDFRegistrationException extends StorageEngineException { Review comment: extends QueryProcessException? ########## File path: docs/UserGuide/Operation Manual/UDF User Defined Function.md ########## @@ -0,0 +1,416 @@ +<!-- + + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +--> + + + +# UDF (User Defined Function) + +IoTDB provides a variety of built-in functions to meet your computing needs, and you can also create user defined functions to meet more computing needs. + +This document describes how to write, register and use a UDF. + + + +## UDF Types + +In IoTDB, you can expand two types of UDF: + +| UDF Class | Description | +| --------------------------------------------------- | ------------------------------------------------------------ | +| UDTF(User Defined Timeseries Generating Function) | This type of function can take **multiple** time series as input, and output **one** time series, which can have any number of data points. | +| UDAF(User Defined Aggregation Function) | Under development, please stay tuned. | + + + +## UDF Development Dependencies + +If you use [Maven](http://search.maven.org/), you can search for the development dependencies listed below from the [Maven repository](http://search.maven.org/) . Please note that you must select the same dependency version as the target IoTDB server version for development. + +``` xml +<dependency> + <groupId>org.apache.iotdb</groupId> + <artifactId>iotdb-server</artifactId> + <version>0.12.0-SNAPSHOT</version> + <scope>provided</scope> +</dependency> +<dependency> + <groupId>org.apache.iotdb</groupId> + <artifactId>tsfile</artifactId> + <version>0.12.0-SNAPSHOT</version> + <scope>provided</scope> +</dependency> +``` + + + +## UDTF(User Defined Timeseries Generating Function) + +To write a UDTF, you need to inherit the `org.apache.iotdb.db.query.udf.api.UDTF` class, and at least implement the `beforeStart` method and a `transform` method. + +The following table shows all the interfaces available for user implementation. + +| Interface definition | Description | Required to Implement | +| :----------------------------------------------------------- | :----------------------------------------------------------- | ----------------------------------------------------- | +| `void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception` | The initialization method to call the user-defined initialization behavior before a UDTF processes the input data. Every time a user executes a UDTF query, the framework will construct a new UDF instance, and `beforeStart` will be called. | Required | +| `void beforeDestroy() ` | This method is called by the framework after the last input data is processed, and will only be called once in the life cycle of each UDF instance. | Optional | +| `void transform(Row row, PointCollector collector) throws Exception` | This method is called by the framework. This data processing method will be called when you choose to use the `RowByRowAccessStrategy` strategy (set in `beforeStart`) to consume raw data. Input data is passed in by `Row`, and the transformation result should be output by `PointCollector`. You need to call the data collection method provided by `collector` to determine the output data. | Required to implement at least one `transform` method | +| `void transform(RowWindow rowWindow, PointCollector collector) throws Exception` | This method is called by the framework. This data processing method will be called when you choose to use the `SlidingSizeWindowAccessStrategy` or `SlidingTimeWindowAccessStrategy` strategy (set in `beforeStart`) to consume raw data. Input data is passed in by `RowWindow`, and the transformation result should be output by `PointCollector`. You need to call the data collection method provided by `collector` to determine the output data. | Required to implement at least one `transform` method | + +Note that every time the framework executes a UDTF query, a new UDF instance will be constructed. When the query ends, the corresponding instance will be destroyed. Therefore, the internal data of the instances in different UDTF queries (even in the same SQL statement) are isolated. You can maintain some state data in the UDTF without considering the influence of concurrency and other factors. + +The usage of each interface will be described in detail below. + + + +### `void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception` + +This method is mainly used to customize UDTF. In this method, the user can do the following things: + +1. Use UDFParameters to get the time series paths and parse key-value pair attributes entered by the user. +2. Set the strategy to access the raw data and set the output data type in UDTFConfigurations. +3. Create resources, such as establishing external connections, opening files, etc. + + + + +#### `UDFParameters` + +`UDFParameters` is used to parse UDF parameters in SQL statements (the part in parentheses after the UDF function name in SQL). The input parameters have two parts. The first part is the paths (measurements) of the time series that the UDF needs to process, and the second part is the key-value pair attributes for customization. Only the second part can be empty. + + +Example: + +``` sql +SELECT UDF(s1, s2, 'key1'='iotdb', 'key2'='123.45') FROM root.sg.d; +``` + +Usage: + +``` java +void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception { + // parameters + for (PartialPath path : parameters.getPaths()) { + // do something + } + String stringValue = parameters.getString("key1"); // iotdb + Float floatValue = parameters.getFloat("key2"); // 123.45 + Double doubleValue = parameters.getDouble("key3"); // null + int intValue = parameters.getIntOrDefault("key4", 678); // 678 + // do something + + // configurations + // ... +} +``` + + + +#### `UDTFConfigurations` + +You must use `UDTFConfigurations` to specify the strategy used by UDF to access raw data and the type of output sequence. + +Usage: + +``` java +void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception { + // parameters + // ... + + // configurations + configurations + .setAccessStrategy(new RowByRowAccessStrategy()) + .setOutputDataType(TSDataType.INT32); +} +``` + +The `setAccessStrategy` method is used to set the UDF's strategy for accessing the raw data, and the `setOutputDataType` method is used to set the data type of the output sequence. + + + +##### `setAccessStrategy` + +Note that the raw data access strategy you set here determines which `transform` method the framework will call. Please implement the `transform` method corresponding to the raw data access strategy. Of course, you can also dynamically decide which strategy to set based on the attribute parameters parsed by `UDFParameters`. Therefore, two `transform` methods are also allowed to be implemented in one UDF. + +The following are the strategies you can set: + +| Interface definition | Description | The `transform` Method to Call | +| :-------------------------------- | :----------------------------------------------------------- | ------------------------------------------------------------ | +| `RowByRowAccessStrategy` | Process raw data row by row. The framework calls the `transform` method once for each row of raw data input. When UDF has only one input sequence, a row of input is one data point in the input sequence. When UDF has multiple input sequences, one row of input is a result record of the raw query (aligned by time) on these input sequences. (In a row, there may be a column with a value of `null`, but not all of them are `null`) | `void transform(Row row, PointCollector collector) throws Exception` | +| `SlidingTimeWindowAccessStrategy` | Process a batch of data in a fixed time interval each time. We call the container of a data batch a window. The framework calls the `transform` method once for each raw data input window. There may be multiple rows of data in a window, and each row is a result record of the raw query (aligned by time) on these input sequences. (In a row, there may be a column with a value of `null`, but not all of them are `null`) | `void transform(RowWindow rowWindow, PointCollector collector) throws Exception` | +| `SlidingSizeWindowAccessStrategy` | The raw data is processed batch by batch, and each batch contains a fixed number of raw data rows (except the last batch). We call the container of a data batch a window. The framework calls the `transform` method once for each raw data input window. There may be multiple rows of data in a window, and each row is a result record of the raw query (aligned by time) on these input sequences. (In a row, there may be a column with a value of `null`, but not all of them are `null`) | `void transform(RowWindow rowWindow, PointCollector collector) throws Exception` | + + + +`RowByRowAccessStrategy`: The construction of `RowByRowAccessStrategy` does not require any parameters. + + + +`SlidingTimeWindowAccessStrategy`: `SlidingTimeWindowAccessStrategy` has many constructors, you can pass 3 types of parameters to them: + +- Parameter 1: The display window on the time axis +- Parameter 2: Time interval for dividing the time axis (should be positive) +- Parameter 3: Time sliding step (not required to be greater than or equal to the time interval, but must be a positive number) + +The first type of parameters are optional. If the parameters are not provided, the beginning time of the display window will be set to the same as the minimum timestamp of the query result set, and the ending time of the display window will be set to the same as the maximum timestamp of the query result set. + +The sliding step parameter is also optional. If the parameter is not provided, the sliding step will be set to the same as the time interval for dividing the time axis. + +The relationship between the three types of parameters can be seen in the figure below. Please see the Javadoc for more details. + +<div style="text-align: center;"><img style="width:100%; max-width:800px; max-height:600px; margin-left:auto; margin-right:auto; display:block;" src="https://user-images.githubusercontent.com/30497621/99787878-47b51480-2b5b-11eb-8ed3-84088c5c30f7.png"></div> + +Note that the actual time interval of some of the last time windows may be less than the specified time interval parameter. In addition, there may be cases where the number of data rows in some time windows is 0. In these cases, the framework will also call the `transform` method for the empty windows. + + + +`SlidingSizeWindowAccessStrategy`: `SlidingSizeWindowAccessStrategy` has many constructors, you can pass 2 types of parameters to them: + +* Parameter 1: Window size. This parameter specifies the number of data rows contained in a data processing window. Note that the number of data rows in some of the last time windows may be less than the specified number of data rows. +* Parameter 2: Sliding step. This parameter means the number of rows between the first point of the next window and the first point of the current window. (This parameter is not required to be greater than or equal to the window size, but must be a positive number) + +The sliding step parameter is optional. If the parameter is not provided, the sliding step will be set to the same as the window size. + +Please see the Javadoc for more details. + + + +##### `setOutputDataType` + +Note that the type of output sequence you set here determines the type of data that the `PointCollector` can actually receive in the `transform` method. The relationship between the output data type set in `setOutputDataType` and the actual data output type that `PointCollector` can receive is as follows: + +| Output Data Type Set in `setOutputDataType` | Data Type that `PointCollector` Can Receive | +| :------------------------------------------ | :----------------------------------------------------------- | +| `INT32` | `int` | +| `INT64` | `long` | +| `FLOAT` | `float` | +| `DOUBLE` | `double` | +| `BOOLEAN` | `boolean` | +| `TEXT` | `java.lang.String` and `org.apache.iotdb.tsfile.utils.Binary` | + + + +### `void beforeDestroy() ` + +The method for terminating a UDF. + +This method is called by the framework. For a UDF instance, `beforeDestroy` will be called after the last record is processed. In the entire life cycle of the instance, `beforeDestroy` will only be called once. + + + +### `void transform(Row row, PointCollector collector) throws Exception` + +You need to implement this method when you specify the strategy of UDF to read the original data as `RowByRowAccessStrategy`. + +This method processes the raw data one row at a time. The raw data is input from `Row` and output by `PointCollector`. You can output any number of data points in one `transform` method call. It should be noted that the type of output data points must be the same as you set in the `beforeStart` method, and the timestamps of output data points must be strictly monotonically increasing. + +The following is a complete UDF example that implements the `void transform(Row row, PointCollector collector) throws Exception` method. It is an adder that receives two columns of time series as input. When two data points in a row are not `null`, this UDF will output the algebraic sum of these two data points. + +``` java +import org.apache.iotdb.db.query.udf.api.UDTF; +import org.apache.iotdb.db.query.udf.api.access.Row; +import org.apache.iotdb.db.query.udf.api.collector.PointCollector; +import org.apache.iotdb.db.query.udf.api.customizer.config.UDTFConfigurations; +import org.apache.iotdb.db.query.udf.api.customizer.parameter.UDFParameters; +import org.apache.iotdb.db.query.udf.api.customizer.strategy.RowByRowAccessStrategy; +import org.apache.iotdb.tsfile.file.metadata.enums.TSDataType; + +public class Adder implements UDTF { + + @Override + public void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) { + configurations + .setOutputDataType(TSDataType.INT64) + .setAccessStrategy(new RowByRowAccessStrategy()); + } + + @Override + public void transform(Row row, PointCollector collector) throws Exception { + if (row.isNull(0) || row.isNull(1)) { + return; + } + collector.putLong(row.getTime(), row.getLong(0) + row.getLong(1)); + } +} +``` + + + +### `void transform(RowWindow rowWindow, PointCollector collector) throws Exception` + +You need to implement this method when you specify the strategy of UDF to read the original data as `SlidingTimeWindowAccessStrategy` or `SlidingSizeWindowAccessStrategy`. + +This method processes a batch of data in a fixed number of rows or a fixed time interval each time, and we call the container containing this batch of data a window. The raw data is input from `RowWindow` and output by `PointCollector`. `RowWindow` can help you access a batch of `Row`, it provides a set of interfaces for random access and iterative access to this batch of `Row`. You can output any number of data points in one `transform` method call. It should be noted that the type of output data points must be the same as you set in the `beforeStart` method, and the timestamps of output data points must be strictly monotonically increasing. + +Below is a complete UDF example that implements the `void transform(RowWindow rowWindow, PointCollector collector) throws Exception` method. It is a counter that receives any number of time series as input, and its function is to count and output the number of data rows in each time window within a specified time range. + +```java +import java.io.IOException; +import org.apache.iotdb.db.query.udf.api.UDTF; +import org.apache.iotdb.db.query.udf.api.access.RowWindow; +import org.apache.iotdb.db.query.udf.api.collector.PointCollector; +import org.apache.iotdb.db.query.udf.api.customizer.config.UDTFConfigurations; +import org.apache.iotdb.db.query.udf.api.customizer.parameter.UDFParameters; +import org.apache.iotdb.db.query.udf.api.customizer.strategy.SlidingTimeWindowAccessStrategy; +import org.apache.iotdb.tsfile.file.metadata.enums.TSDataType; + +public class Counter implements UDTF { + + @Override + public void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) { + configurations + .setOutputDataType(TSDataType.INT32) + .setAccessStrategy(new SlidingTimeWindowAccessStrategy( + parameters.getLong("time_interval"), + parameters.getLong("sliding_step"), + parameters.getLong("display_window_begin"), + parameters.getLong("display_window_end"))); + } + + @Override + public void transform(RowWindow rowWindow, PointCollector collector) { + if (rowWindow.windowSize() != 0) { + collector.putInt(rowWindow.getRow(0).getTime(), rowWindow.windowSize()); + } + } +} +``` + + + +## Maven Project Example + +If you use Maven, you can build your own UDF project referring to our **udf-example** module. You can find the project [here](https://github.com/apache/iotdb/tree/master/example/udf). + + + +## UDF Registration + +The process of registering a UDF in IoTDB is as follows: + +1. Implement a complete UDF class, assuming the full class name of this class is `org.apache.iotdb.udf.ExampleUDTF`. +2. Package your project into a JAR. If you use Maven to manage your project, you can refer to the Maven project example above. +3. Place the JAR package in the directory `iotdb-server-0.12.0-SNAPSHOT/lib` . +4. Register the UDF with the SQL statement, assuming that the name given to the UDF is `example`. + +The following shows the SQL syntax of how to register a UDF. + +```sql +CREATE FUNCTION <UDF-NAME> AS <UDF-CLASS-FULL-PATHNAME> +``` + +Here is an example: + +```sql +CREATE FUNCTION example AS "org.apache.iotdb.udf.ExampleUDTF" +``` + +Since UDF instances are dynamically loaded through reflection technology, you do not need to restart the server during the UDF registration process. + +Note: Please ensure that the function name given to the UDF is different from all built-in function names. A UDF with the same name as a built-in function can be registered, but cannot be called. Review comment: it's better to throw an exception when registered ########## File path: server/src/main/java/org/apache/iotdb/db/query/udf/api/access/RowIterator.java ########## @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.iotdb.db.query.udf.api.access; + +import java.io.IOException; + +public interface RowIterator { + + /** + * Returns {@code true} if the iteration has more rows. + * + * @return {@code true} if the iteration has more rows + */ + boolean hasNextRow(); + + /** + * Returns the next row in the iteration. + * <p> + * Note that the Row instance returned by this method each time is the same instance. In other + * words, calling {@code next()} will only change the member variables inside the Row instance, + * but will not generate a new Row instance. + * + * @return the next element in the iteration + * @throws IOException if any I/O errors occur + */ + Row next() throws IOException; + + /** + * Resets the iteration. Review comment: ```suggestion * Reset the iteration. ``` ########## File path: server/src/main/java/org/apache/iotdb/db/qp/executor/PlanExecutor.java ########## @@ -668,6 +697,33 @@ private QueryDataSet processShowFlushTaskInfo() { return listDataSet; } + private QueryDataSet processShowFunctions(ShowFunctionsPlan showPlan) { + ListDataSet listDataSet = new ListDataSet( + Arrays.asList( + new PartialPath(COLUMN_FUNCTION_NAME, false), + new PartialPath(COLUMN_FUNCTION_CLASS, false) + // new PartialPath(COLUMN_FUNCTION_TEMPORARY, false) + ), + Arrays.asList( + TSDataType.TEXT, + TSDataType.TEXT + // TSDataType.BOOLEAN + ) + ); + for (UDFRegistrationInformation info : UDFRegistrationService.getInstance() + .getRegistrationInformation()) { + if (showPlan.showTemporary() && !info.isTemporary()) { + continue; + } + RowRecord rowRecord = new RowRecord(0); // ignore timestamp + rowRecord.addField(Binary.valueOf(info.getFunctionName()), TSDataType.TEXT); + rowRecord.addField(Binary.valueOf(info.getClassName()), TSDataType.TEXT); + // rowRecord.addField(info.isTemporary(), TSDataType.BOOLEAN); Review comment: same ########## File path: docs/UserGuide/Operation Manual/UDF User Defined Function.md ########## @@ -0,0 +1,416 @@ +<!-- + + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +--> + + + +# UDF (User Defined Function) + +IoTDB provides a variety of built-in functions to meet your computing needs, and you can also create user defined functions to meet more computing needs. + +This document describes how to write, register and use a UDF. + + + +## UDF Types + +In IoTDB, you can expand two types of UDF: + +| UDF Class | Description | +| --------------------------------------------------- | ------------------------------------------------------------ | +| UDTF(User Defined Timeseries Generating Function) | This type of function can take **multiple** time series as input, and output **one** time series, which can have any number of data points. | +| UDAF(User Defined Aggregation Function) | Under development, please stay tuned. | + + + +## UDF Development Dependencies + +If you use [Maven](http://search.maven.org/), you can search for the development dependencies listed below from the [Maven repository](http://search.maven.org/) . Please note that you must select the same dependency version as the target IoTDB server version for development. + +``` xml +<dependency> + <groupId>org.apache.iotdb</groupId> + <artifactId>iotdb-server</artifactId> + <version>0.12.0-SNAPSHOT</version> + <scope>provided</scope> +</dependency> +<dependency> + <groupId>org.apache.iotdb</groupId> + <artifactId>tsfile</artifactId> + <version>0.12.0-SNAPSHOT</version> + <scope>provided</scope> +</dependency> +``` + + + +## UDTF(User Defined Timeseries Generating Function) + +To write a UDTF, you need to inherit the `org.apache.iotdb.db.query.udf.api.UDTF` class, and at least implement the `beforeStart` method and a `transform` method. + +The following table shows all the interfaces available for user implementation. + +| Interface definition | Description | Required to Implement | +| :----------------------------------------------------------- | :----------------------------------------------------------- | ----------------------------------------------------- | +| `void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception` | The initialization method to call the user-defined initialization behavior before a UDTF processes the input data. Every time a user executes a UDTF query, the framework will construct a new UDF instance, and `beforeStart` will be called. | Required | +| `void beforeDestroy() ` | This method is called by the framework after the last input data is processed, and will only be called once in the life cycle of each UDF instance. | Optional | +| `void transform(Row row, PointCollector collector) throws Exception` | This method is called by the framework. This data processing method will be called when you choose to use the `RowByRowAccessStrategy` strategy (set in `beforeStart`) to consume raw data. Input data is passed in by `Row`, and the transformation result should be output by `PointCollector`. You need to call the data collection method provided by `collector` to determine the output data. | Required to implement at least one `transform` method | +| `void transform(RowWindow rowWindow, PointCollector collector) throws Exception` | This method is called by the framework. This data processing method will be called when you choose to use the `SlidingSizeWindowAccessStrategy` or `SlidingTimeWindowAccessStrategy` strategy (set in `beforeStart`) to consume raw data. Input data is passed in by `RowWindow`, and the transformation result should be output by `PointCollector`. You need to call the data collection method provided by `collector` to determine the output data. | Required to implement at least one `transform` method | + +Note that every time the framework executes a UDTF query, a new UDF instance will be constructed. When the query ends, the corresponding instance will be destroyed. Therefore, the internal data of the instances in different UDTF queries (even in the same SQL statement) are isolated. You can maintain some state data in the UDTF without considering the influence of concurrency and other factors. + +The usage of each interface will be described in detail below. + + + +### `void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception` + +This method is mainly used to customize UDTF. In this method, the user can do the following things: + +1. Use UDFParameters to get the time series paths and parse key-value pair attributes entered by the user. +2. Set the strategy to access the raw data and set the output data type in UDTFConfigurations. +3. Create resources, such as establishing external connections, opening files, etc. + + + + +#### `UDFParameters` + +`UDFParameters` is used to parse UDF parameters in SQL statements (the part in parentheses after the UDF function name in SQL). The input parameters have two parts. The first part is the paths (measurements) of the time series that the UDF needs to process, and the second part is the key-value pair attributes for customization. Only the second part can be empty. + + +Example: + +``` sql +SELECT UDF(s1, s2, 'key1'='iotdb', 'key2'='123.45') FROM root.sg.d; +``` + +Usage: + +``` java +void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception { + // parameters + for (PartialPath path : parameters.getPaths()) { + // do something + } + String stringValue = parameters.getString("key1"); // iotdb + Float floatValue = parameters.getFloat("key2"); // 123.45 + Double doubleValue = parameters.getDouble("key3"); // null + int intValue = parameters.getIntOrDefault("key4", 678); // 678 + // do something + + // configurations + // ... +} +``` + + + +#### `UDTFConfigurations` + +You must use `UDTFConfigurations` to specify the strategy used by UDF to access raw data and the type of output sequence. + +Usage: + +``` java +void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception { + // parameters + // ... + + // configurations + configurations + .setAccessStrategy(new RowByRowAccessStrategy()) + .setOutputDataType(TSDataType.INT32); +} +``` + +The `setAccessStrategy` method is used to set the UDF's strategy for accessing the raw data, and the `setOutputDataType` method is used to set the data type of the output sequence. + + + +##### `setAccessStrategy` + +Note that the raw data access strategy you set here determines which `transform` method the framework will call. Please implement the `transform` method corresponding to the raw data access strategy. Of course, you can also dynamically decide which strategy to set based on the attribute parameters parsed by `UDFParameters`. Therefore, two `transform` methods are also allowed to be implemented in one UDF. + +The following are the strategies you can set: + +| Interface definition | Description | The `transform` Method to Call | +| :-------------------------------- | :----------------------------------------------------------- | ------------------------------------------------------------ | +| `RowByRowAccessStrategy` | Process raw data row by row. The framework calls the `transform` method once for each row of raw data input. When UDF has only one input sequence, a row of input is one data point in the input sequence. When UDF has multiple input sequences, one row of input is a result record of the raw query (aligned by time) on these input sequences. (In a row, there may be a column with a value of `null`, but not all of them are `null`) | `void transform(Row row, PointCollector collector) throws Exception` | +| `SlidingTimeWindowAccessStrategy` | Process a batch of data in a fixed time interval each time. We call the container of a data batch a window. The framework calls the `transform` method once for each raw data input window. There may be multiple rows of data in a window, and each row is a result record of the raw query (aligned by time) on these input sequences. (In a row, there may be a column with a value of `null`, but not all of them are `null`) | `void transform(RowWindow rowWindow, PointCollector collector) throws Exception` | +| `SlidingSizeWindowAccessStrategy` | The raw data is processed batch by batch, and each batch contains a fixed number of raw data rows (except the last batch). We call the container of a data batch a window. The framework calls the `transform` method once for each raw data input window. There may be multiple rows of data in a window, and each row is a result record of the raw query (aligned by time) on these input sequences. (In a row, there may be a column with a value of `null`, but not all of them are `null`) | `void transform(RowWindow rowWindow, PointCollector collector) throws Exception` | + + + +`RowByRowAccessStrategy`: The construction of `RowByRowAccessStrategy` does not require any parameters. + + + +`SlidingTimeWindowAccessStrategy`: `SlidingTimeWindowAccessStrategy` has many constructors, you can pass 3 types of parameters to them: + +- Parameter 1: The display window on the time axis +- Parameter 2: Time interval for dividing the time axis (should be positive) +- Parameter 3: Time sliding step (not required to be greater than or equal to the time interval, but must be a positive number) + +The first type of parameters are optional. If the parameters are not provided, the beginning time of the display window will be set to the same as the minimum timestamp of the query result set, and the ending time of the display window will be set to the same as the maximum timestamp of the query result set. + +The sliding step parameter is also optional. If the parameter is not provided, the sliding step will be set to the same as the time interval for dividing the time axis. + +The relationship between the three types of parameters can be seen in the figure below. Please see the Javadoc for more details. + +<div style="text-align: center;"><img style="width:100%; max-width:800px; max-height:600px; margin-left:auto; margin-right:auto; display:block;" src="https://user-images.githubusercontent.com/30497621/99787878-47b51480-2b5b-11eb-8ed3-84088c5c30f7.png"></div> + +Note that the actual time interval of some of the last time windows may be less than the specified time interval parameter. In addition, there may be cases where the number of data rows in some time windows is 0. In these cases, the framework will also call the `transform` method for the empty windows. + + + +`SlidingSizeWindowAccessStrategy`: `SlidingSizeWindowAccessStrategy` has many constructors, you can pass 2 types of parameters to them: + +* Parameter 1: Window size. This parameter specifies the number of data rows contained in a data processing window. Note that the number of data rows in some of the last time windows may be less than the specified number of data rows. +* Parameter 2: Sliding step. This parameter means the number of rows between the first point of the next window and the first point of the current window. (This parameter is not required to be greater than or equal to the window size, but must be a positive number) + +The sliding step parameter is optional. If the parameter is not provided, the sliding step will be set to the same as the window size. + +Please see the Javadoc for more details. + + + +##### `setOutputDataType` + +Note that the type of output sequence you set here determines the type of data that the `PointCollector` can actually receive in the `transform` method. The relationship between the output data type set in `setOutputDataType` and the actual data output type that `PointCollector` can receive is as follows: + +| Output Data Type Set in `setOutputDataType` | Data Type that `PointCollector` Can Receive | +| :------------------------------------------ | :----------------------------------------------------------- | +| `INT32` | `int` | +| `INT64` | `long` | +| `FLOAT` | `float` | +| `DOUBLE` | `double` | +| `BOOLEAN` | `boolean` | +| `TEXT` | `java.lang.String` and `org.apache.iotdb.tsfile.utils.Binary` | + + + +### `void beforeDestroy() ` + +The method for terminating a UDF. + +This method is called by the framework. For a UDF instance, `beforeDestroy` will be called after the last record is processed. In the entire life cycle of the instance, `beforeDestroy` will only be called once. + + + +### `void transform(Row row, PointCollector collector) throws Exception` + +You need to implement this method when you specify the strategy of UDF to read the original data as `RowByRowAccessStrategy`. + +This method processes the raw data one row at a time. The raw data is input from `Row` and output by `PointCollector`. You can output any number of data points in one `transform` method call. It should be noted that the type of output data points must be the same as you set in the `beforeStart` method, and the timestamps of output data points must be strictly monotonically increasing. + +The following is a complete UDF example that implements the `void transform(Row row, PointCollector collector) throws Exception` method. It is an adder that receives two columns of time series as input. When two data points in a row are not `null`, this UDF will output the algebraic sum of these two data points. + +``` java +import org.apache.iotdb.db.query.udf.api.UDTF; +import org.apache.iotdb.db.query.udf.api.access.Row; +import org.apache.iotdb.db.query.udf.api.collector.PointCollector; +import org.apache.iotdb.db.query.udf.api.customizer.config.UDTFConfigurations; +import org.apache.iotdb.db.query.udf.api.customizer.parameter.UDFParameters; +import org.apache.iotdb.db.query.udf.api.customizer.strategy.RowByRowAccessStrategy; +import org.apache.iotdb.tsfile.file.metadata.enums.TSDataType; + +public class Adder implements UDTF { + + @Override + public void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) { + configurations + .setOutputDataType(TSDataType.INT64) + .setAccessStrategy(new RowByRowAccessStrategy()); + } + + @Override + public void transform(Row row, PointCollector collector) throws Exception { + if (row.isNull(0) || row.isNull(1)) { + return; + } + collector.putLong(row.getTime(), row.getLong(0) + row.getLong(1)); + } +} +``` + + + +### `void transform(RowWindow rowWindow, PointCollector collector) throws Exception` + +You need to implement this method when you specify the strategy of UDF to read the original data as `SlidingTimeWindowAccessStrategy` or `SlidingSizeWindowAccessStrategy`. + +This method processes a batch of data in a fixed number of rows or a fixed time interval each time, and we call the container containing this batch of data a window. The raw data is input from `RowWindow` and output by `PointCollector`. `RowWindow` can help you access a batch of `Row`, it provides a set of interfaces for random access and iterative access to this batch of `Row`. You can output any number of data points in one `transform` method call. It should be noted that the type of output data points must be the same as you set in the `beforeStart` method, and the timestamps of output data points must be strictly monotonically increasing. + +Below is a complete UDF example that implements the `void transform(RowWindow rowWindow, PointCollector collector) throws Exception` method. It is a counter that receives any number of time series as input, and its function is to count and output the number of data rows in each time window within a specified time range. + +```java +import java.io.IOException; +import org.apache.iotdb.db.query.udf.api.UDTF; +import org.apache.iotdb.db.query.udf.api.access.RowWindow; +import org.apache.iotdb.db.query.udf.api.collector.PointCollector; +import org.apache.iotdb.db.query.udf.api.customizer.config.UDTFConfigurations; +import org.apache.iotdb.db.query.udf.api.customizer.parameter.UDFParameters; +import org.apache.iotdb.db.query.udf.api.customizer.strategy.SlidingTimeWindowAccessStrategy; +import org.apache.iotdb.tsfile.file.metadata.enums.TSDataType; + +public class Counter implements UDTF { + + @Override + public void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) { + configurations + .setOutputDataType(TSDataType.INT32) + .setAccessStrategy(new SlidingTimeWindowAccessStrategy( + parameters.getLong("time_interval"), + parameters.getLong("sliding_step"), + parameters.getLong("display_window_begin"), + parameters.getLong("display_window_end"))); + } + + @Override + public void transform(RowWindow rowWindow, PointCollector collector) { + if (rowWindow.windowSize() != 0) { + collector.putInt(rowWindow.getRow(0).getTime(), rowWindow.windowSize()); + } + } +} +``` + + + +## Maven Project Example + +If you use Maven, you can build your own UDF project referring to our **udf-example** module. You can find the project [here](https://github.com/apache/iotdb/tree/master/example/udf). + + + +## UDF Registration + +The process of registering a UDF in IoTDB is as follows: + +1. Implement a complete UDF class, assuming the full class name of this class is `org.apache.iotdb.udf.ExampleUDTF`. +2. Package your project into a JAR. If you use Maven to manage your project, you can refer to the Maven project example above. +3. Place the JAR package in the directory `iotdb-server-0.12.0-SNAPSHOT/lib` . +4. Register the UDF with the SQL statement, assuming that the name given to the UDF is `example`. + +The following shows the SQL syntax of how to register a UDF. + +```sql +CREATE FUNCTION <UDF-NAME> AS <UDF-CLASS-FULL-PATHNAME> +``` + +Here is an example: + +```sql +CREATE FUNCTION example AS "org.apache.iotdb.udf.ExampleUDTF" Review comment: 'example' looks strange, using my-udf is better.. ########## File path: server/src/main/java/org/apache/iotdb/db/qp/physical/crud/UDTFPlan.java ########## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.iotdb.db.qp.physical.crud; + +import java.time.ZoneId; +import java.util.ArrayList; +import java.util.Collection; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import org.apache.iotdb.db.exception.query.QueryProcessException; +import org.apache.iotdb.db.qp.logical.Operator; +import org.apache.iotdb.db.query.udf.core.context.UDFContext; +import org.apache.iotdb.db.query.udf.core.executor.UDTFExecutor; + +public class UDTFPlan extends RawDataQueryPlan implements UDFPlan { + + protected final ZoneId zoneId; + + protected Map<String, UDTFExecutor> columnName2Executor = new HashMap<>(); + protected Map<Integer, UDTFExecutor> originalOutputColumnIndex2Executor = new HashMap<>(); + + protected List<String> datasetOutputColumnIndex2UdfColumnName = new ArrayList<>(); Review comment: add some example and javadoc ########## File path: server/src/main/java/org/apache/iotdb/db/query/udf/core/reader/LayerPointReader.java ########## @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.iotdb.db.query.udf.core.reader; + +import java.io.IOException; +import org.apache.iotdb.db.exception.query.QueryProcessException; +import org.apache.iotdb.tsfile.file.metadata.enums.TSDataType; +import org.apache.iotdb.tsfile.utils.Binary; + +public interface LayerPointReader { + + boolean next() throws QueryProcessException, IOException; + + void readyForNext(); Review comment: check this method ########## File path: server/src/main/java/org/apache/iotdb/db/qp/physical/crud/UDFPlan.java ########## @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.iotdb.db.qp.physical.crud; + +import java.util.List; +import org.apache.iotdb.db.exception.metadata.MetadataException; +import org.apache.iotdb.db.exception.query.QueryProcessException; +import org.apache.iotdb.db.query.udf.core.context.UDFContext; + +public interface UDFPlan { + + void constructUdfExecutors(List<UDFContext> udfContexts) + throws QueryProcessException, MetadataException; + + void initializeUdfExecutor(long queryId, float collectorMemoryBudgetInMb) + throws QueryProcessException; Review comment: Add some javadoc, I do not know which should be called first ########## File path: server/src/main/java/org/apache/iotdb/db/query/udf/api/customizer/strategy/SlidingSizeWindowAccessStrategy.java ########## @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.iotdb.db.query.udf.api.customizer.strategy; + +import org.apache.iotdb.db.exception.query.QueryProcessException; +import org.apache.iotdb.db.query.udf.api.UDTF; +import org.apache.iotdb.db.query.udf.api.access.RowWindow; +import org.apache.iotdb.db.query.udf.api.collector.PointCollector; +import org.apache.iotdb.db.query.udf.api.customizer.config.UDTFConfigurations; +import org.apache.iotdb.db.query.udf.api.customizer.parameter.UDFParameters; + +/** + * Used in {@link UDTF#beforeStart(UDFParameters, UDTFConfigurations)}. + * <p> + * When the access strategy of a UDTF is set to an instance of this class, the method {@link + * UDTF#transform(RowWindow, PointCollector)} of the UDTF will be called to transform the original + * data. You need to override the method in your own UDTF class. + * <p> + * Sliding size window is a kind of size-based window. Except for the last call, each call of the + * method {@link UDTF#transform(RowWindow, PointCollector)} processes a window with {@code + * windowSize} rows (aligned by time) of the original data and can generate any number of data + * points. + * <p> + * Sample code: + * <pre>{@code + * @Override + * public void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) { + * configurations + * .setOutputDataType(TSDataType.INT32) + * .setAccessStrategy(new SlidingSizeWindowAccessStrategy(10000)); // window size + * }</pre> + * + * @see UDTF + * @see UDTFConfigurations + */ +public class SlidingSizeWindowAccessStrategy implements AccessStrategy { + + private final int windowSize; + private final int slidingStep; + + /** + * Constructor. You need to specify the number of rows in each sliding size window (except for the + * last window) and the sliding step to the next window. + * + * @param windowSize the number of rows in each sliding size window (0 < windowSize) + * @param slidingStep the number of rows between the first point of the next window and the first + * point of the current window (0 < slidingStep) Review comment: The 'between' looks strange. For example, 1,2,3,4,5 is a window, 6,7,8,9,10 is a window. number of rows between 1 and 6 is 4 or 5? ########## File path: server/src/main/java/org/apache/iotdb/db/conf/IoTDBConfig.java ########## @@ -760,6 +760,28 @@ // the authorizer provider class which extends BasicAuthorizer private String authorizerProvider = "org.apache.iotdb.db.auth.authorizer.LocalFileAuthorizer"; + /** + * Used to estimate the memory usage of text fields in a UDF query. It is recommended to set this + * value to be slightly larger than the average length of all text records. + */ + private int udfInitialByteArrayLengthForMemoryControl = 48; + + /** + * How much memory may be used in ONE UDF query (in MB). + * <p> + * The upper limit is 20% of allocated memory for read. + * <p> + * udfMemoryBudgetInMB = udfReaderMemoryBudgetInMB + udfTransformerMemoryBudgetInMB + + * udfCollectorMemoryBudgetInMB + */ + private float udfMemoryBudgetInMB = (float) Math.min(300f, 0.2 * allocateMemoryForRead); Review comment: maybe you need to change chunkmeta_chunk_timeseriesmeta_free_memory_proportion ########## File path: server/src/main/java/org/apache/iotdb/db/query/udf/service/UDFClassLoader.java ########## @@ -0,0 +1,58 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.iotdb.db.query.udf.service; + +import java.io.File; +import java.io.IOException; +import java.net.URL; +import java.net.URLClassLoader; +import java.util.HashSet; +import java.util.Set; +import org.apache.commons.io.FileUtils; +import org.apache.iotdb.db.engine.fileSystem.SystemFileFactory; + +public class UDFClassLoader extends URLClassLoader { Review comment: add a javadoc to explain why the fileSet only expands ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
