[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF
Github user asfgit closed the pull request at: https://github.com/apache/tajo/pull/929 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
Github user jihoonson commented on the pull request: https://github.com/apache/tajo/pull/929#issuecomment-218371850 +1 the latest patch looks good to me. @eminency thank you for your work!
Github user eminency commented on the pull request: https://github.com/apache/tajo/pull/929#issuecomment-214145976 @jihoonson The problem you mentioned and the null-handling problem are both fixed.
Github user jihoonson commented on the pull request: https://github.com/apache/tajo/pull/929#issuecomment-211224626 Hi @eminency, I hit this warning message while starting up Tajo. This is weird because I configured the path to the Hive UDF directory as below. Would you check it?

```
# in tajo-site.xml
tajo.function.hive.code-dir
${TAJO_HOME}/tajo-core-tests/src/test/resources/hiveudf/

# in TajoCli
default> select my_upper('abcd');
ERROR: function does not exist: my_upper(text)

# in the master log
...
2016-04-18 15:15:19,987 WARN org.apache.tajo.engine.function.hiveudf.HiveFunctionLoader: Hive UDF directory doesn't exist
...
```
Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/929#discussion_r60009836

--- Diff: tajo-project/pom.xml ---
@@ -1007,6 +1006,12 @@
+ org.apache.hive
+ hive-exec
+ ${hive.version}
+ provided
--- End diff --

I missed this. Please ignore the above comments.
Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/929#discussion_r60009769

--- Diff: tajo-plan/pom.xml ---
@@ -173,6 +173,10 @@
  hadoop-common
+ org.apache.hive
+ hive-exec
--- End diff --

The dependency scope should be provided.
Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/929#discussion_r60009569

--- Diff: tajo-core/pom.xml ---
@@ -252,6 +252,10 @@
  hadoop-yarn-server-common
  provided
+ org.apache.hive
+ hive-exec
--- End diff --

The dependency scope should be provided.
Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/929#discussion_r60009547

--- Diff: tajo-core-tests/pom.xml ---
@@ -323,6 +323,10 @@
+ org.apache.hive
+ hive-exec
--- End diff --

The dependency scope should be test.
Github user eminency commented on the pull request: https://github.com/apache/tajo/pull/929#issuecomment-201157356 @jihoonson I fixed some of what you suggested; please read my comments. In addition, loading from HDFS might be better separated into another issue, since it would take too much work to fix here.
Github user eminency commented on a diff in the pull request: https://github.com/apache/tajo/pull/929#discussion_r56941559

--- Diff: tajo-plan/src/main/java/org/apache/tajo/plan/function/HiveFunctionInvoke.java ---
@@ -0,0 +1,113 @@

```java
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.tajo.plan.function;

import org.apache.hadoop.io.*;
import org.apache.tajo.catalog.FunctionDesc;
import org.apache.tajo.common.TajoDataTypes;
import org.apache.tajo.datum.Datum;
import org.apache.tajo.exception.TajoInternalError;
import org.apache.tajo.exception.TajoRuntimeException;
import org.apache.tajo.function.UDFInvocationDesc;
import org.apache.tajo.storage.Tuple;
import org.apache.tajo.util.WritableTypeConverter;

import java.io.IOException;
import java.lang.reflect.Constructor;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;

public class HiveFunctionInvoke extends FunctionInvoke implements Cloneable {
  private Object instance = null;
  private Method evalMethod = null;
  private Writable[] params;

  public HiveFunctionInvoke(FunctionDesc desc) {
    super(desc);
    params = new Writable[desc.getParamTypes().length];
  }

  @Override
  public void init(FunctionInvokeContext context) throws IOException {
    UDFInvocationDesc udfDesc = functionDesc.getInvocation().getUDF();

    URL[] urls = new URL[] { new URL(udfDesc.getPath()) };
    URLClassLoader loader = new URLClassLoader(urls);

    try {
      Class udfclass = loader.loadClass(udfDesc.getName());
      evalMethod = getEvaluateMethod(functionDesc.getParamTypes(), udfclass);
    } catch (ClassNotFoundException e) {
      throw new TajoInternalError(e);
    }
  }

  private Method getEvaluateMethod(TajoDataTypes.DataType[] paramTypes, Class clazz) {
    Constructor constructor = clazz.getConstructors()[0];

    try {
      instance = constructor.newInstance();
    } catch (InstantiationException | IllegalAccessException | InvocationTargetException e) {
      throw new TajoInternalError(e);
    }

    for (Method m : clazz.getMethods()) {
      if (m.getName().equals("evaluate")) {
```
--- End diff --

To do that, the Tajo type should be converted to a Writable type, but the Writable type cannot be fixed.
For example, a Writable can be subclassed, so comparing Tajo types is more robust.
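The subtyping point above can be illustrated with plain reflection (a minimal sketch; `Text` and `SubText` here are hypothetical stand-in classes, not Hadoop's actual Writables):

```java
// Why exact Class equality is fragile when a parameter type can be
// subclassed: the declared type and the runtime type no longer compare
// equal, but assignability still holds.
class Text {}
class SubText extends Text {}

public class TypeCompareDemo {
    public static void main(String[] args) {
        Class<?> declared = Text.class;     // type declared by evaluate()
        Class<?> actual = SubText.class;    // type of a value at runtime

        System.out.println(declared.equals(actual));           // false
        System.out.println(declared.isAssignableFrom(actual)); // true
    }
}
```

Matching on Tajo's own type descriptors sidesteps this, since those are compared by value rather than by Java class identity.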
Github user eminency commented on a diff in the pull request: https://github.com/apache/tajo/pull/929#discussion_r56941033

--- Diff: tajo-plan/src/main/java/org/apache/tajo/plan/function/HiveFunctionInvoke.java ---

```java
    for (Method m : clazz.getMethods()) {
      if (m.getName().equals("evaluate")) {
```
--- End diff --

What a good point.
Github user eminency commented on a diff in the pull request: https://github.com/apache/tajo/pull/929#discussion_r56934161

--- Diff: tajo-core/src/main/java/org/apache/tajo/engine/function/hiveudf/HiveFunctionLoader.java ---
@@ -0,0 +1,161 @@

```java
/* ... ASF license header ... */

package org.apache.tajo.engine.function.hiveudf;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.udf.UDFType;
import org.apache.hadoop.io.Writable;
import org.apache.tajo.catalog.FunctionDesc;
import org.apache.tajo.catalog.FunctionDescBuilder;
import org.apache.tajo.catalog.proto.CatalogProtos;
import org.apache.tajo.common.TajoDataTypes;
import org.apache.tajo.conf.TajoConf;
import org.apache.tajo.exception.TajoInternalError;
import org.apache.tajo.function.UDFInvocationDesc;
import org.apache.tajo.util.WritableTypeConverter;
import org.reflections.Reflections;
import org.reflections.util.ConfigurationBuilder;

import java.io.IOException;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.*;

public class HiveFunctionLoader {
  public static Optional loadHiveUDFs(TajoConf conf) {
    ArrayList funcList = new ArrayList<>();
    String udfdir = conf.getVar(TajoConf.ConfVars.HIVE_UDF_DIR);

    try {
      Path udfPath = new Path(udfdir);
      FileSystem fs = udfPath.getFileSystem(conf);

      if (!fs.isDirectory(udfPath)) {
        return Optional.empty();
      }

      // loop each jar file
      for (FileStatus fstatus : fs.listStatus(udfPath, (Path path) -> path.getName().endsWith(".jar"))) {

        URL[] urls = new URL[]{new URL("jar:" + fstatus.getPath().toUri().toURL() + "!/")};

        // extract and register UDF's descendants (legacy Hive UDF form)
        Set udfClasses = getSubclassesFromJarEntry(urls, UDF.class);
        if (udfClasses != null) {
          buildFunctionsFromUDF(udfClasses, funcList, "jar:" + urls[0].getPath());
        }
      }
    } catch (IOException e) {
      throw new TajoInternalError(e);
    }

    return Optional.of(funcList);
  }

  private static Set getSubclassesFromJarEntry(URL[] urls, Class targetCls) {
    Reflections refl = new Reflections(new ConfigurationBuilder().
        setUrls(urls).
        addClassLoader(new URLClassLoader(urls)));

    return refl.getSubTypesOf(targetCls);
  }

  static void buildFunctionsFromUDF(Set classes, List list, String jarurl) {
    for (Class clazz : classes) {
      String[] names;
      String value = null, extended = null;

      Description desc = clazz.getAnnotation(Description.class);

      // Check @Description annotation (if exists)
      if (desc != null) {
        names = desc.name().split(",");
        for (int i = 0; i
```
Github user eminency commented on a diff in the pull request: https://github.com/apache/tajo/pull/929#discussion_r55786236

--- Diff: tajo-common/pom.xml ---
@@ -245,6 +249,12 @@
  powermock-api-mockito
  test
+ org.apache.hive
--- End diff --

I will move it into tajo-plan.
Github user eminency commented on a diff in the pull request: https://github.com/apache/tajo/pull/929#discussion_r55785562

--- Diff: tajo-catalog/tajo-catalog-common/src/main/java/org/apache/tajo/catalog/FunctionDescBuilder.java ---
@@ -0,0 +1,105 @@

```java
/* ... ASF license header ... */

package org.apache.tajo.catalog;

import org.apache.tajo.catalog.proto.CatalogProtos.FunctionType;
import org.apache.tajo.common.TajoDataTypes.DataType;
import org.apache.tajo.function.*;

public class FunctionDescBuilder {
```
--- End diff --

Actually, I found one main difference. PB's builder has a nested structure: it must build the children's PBs first and then build its own PB, so it is complex to use. The builder I made receives all the information it needs at the same level, then builds the FunctionDesc, including the child objects, at once, so it does not have to consider the object hierarchy. Which one do you think is better?
Github user eminency commented on a diff in the pull request: https://github.com/apache/tajo/pull/929#discussion_r55643624

--- Diff: tajo-catalog/tajo-catalog-common/src/main/java/org/apache/tajo/catalog/FunctionDescBuilder.java ---

```java
public class FunctionDescBuilder {
```
--- End diff --

I think it was just inconvenient, and I didn't consider Proto.Builder. It'd be better to use it.
Github user jihoonson commented on the pull request: https://github.com/apache/tajo/pull/929#issuecomment-194132955 Hi @eminency, I left some comments. In addition, I found the below error when starting a Tajo cluster. I put the Hive UDF jar file on HDFS and specified it in tajo-site.xml.

```
2016-03-09 14:34:06,613 FATAL org.apache.tajo.master.TajoMaster: Error starting TajoMaster
org.apache.tajo.exception.TajoInternalError: internal error: unknown protocol: hdfs
    at org.apache.tajo.engine.function.hiveudf.HiveFunctionLoader.loadHiveUDFs(HiveFunctionLoader.java:70)
    at org.apache.tajo.master.TajoMaster.loadFunctions(TajoMaster.java:229)
    at org.apache.tajo.master.TajoMaster.serviceInit(TajoMaster.java:181)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.tajo.master.TajoMaster.main(TajoMaster.java:628)
```
Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/929#discussion_r55476149

--- Diff: tajo-plan/src/main/java/org/apache/tajo/plan/function/HiveFunctionInvoke.java ---

```java
    return null;
```
--- End diff --

If the 'evaluate' method is not found, an exception should be thrown. How about adding a new exception?
Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/929#discussion_r55475892

--- Diff: tajo-plan/src/main/java/org/apache/tajo/plan/function/HiveFunctionInvoke.java ---

```java
    for (Method m : clazz.getMethods()) {
      if (m.getName().equals("evaluate")) {
```
--- End diff --

Class.getMethod(String name, Class... parameterTypes) can simplify this for loop.
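The suggestion above amounts to a one-call lookup (a minimal sketch; `MyUpper` is a hypothetical stand-in for a Hive UDF, not Tajo or Hive code):

```java
import java.lang.reflect.Method;

public class GetMethodDemo {
    // Hypothetical UDF-style class with an evaluate method.
    public static class MyUpper {
        public String evaluate(String s) { return s.toUpperCase(); }
    }

    public static void main(String[] args) throws Exception {
        // getMethod() resolves the overload from the parameter types in one
        // call, replacing the loop over getMethods(), and throws
        // NoSuchMethodException when no matching method exists.
        Method m = MyUpper.class.getMethod("evaluate", String.class);
        System.out.println(m.invoke(new MyUpper(), "abcd")); // ABCD
    }
}
```

One caveat: getMethod matches parameter types exactly rather than by assignability, which is relevant to the earlier discussion about Writable parameter types being subclassable.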
Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/929#discussion_r55475795

--- Diff: tajo-core/src/main/java/org/apache/tajo/master/TajoMaster.java ---
@@ -215,12 +216,48 @@ public void serviceInit(Configuration conf) throws Exception {

```java
    LOG.info("Tajo Master is initialized.");
  }

  private Collection loadFunctions() throws IOException, AmbiguousFunctionException {
    List functionList = new ArrayList<>(FunctionLoader.loadBuiltinFunctions().values());

    HashMap funcSet = new HashMap<>();

    for (FunctionDesc desc : functionList) {
      funcSet.put(desc.hashCodeWithoutType(), desc);
    }

    checkUDFduplicateAndMerge(FunctionLoader.loadUserDefinedFunctions(systemConf), funcSet, functionList);
    checkUDFduplicateAndMerge(HiveFunctionLoader.loadHiveUDFs(systemConf), funcSet, functionList);

    return functionList;
  }

  /**
   * Checks duplicates between pre-loaded functions and UDFs, and merges them into funcList.
   *
   * @param udfs UDF list
   * @param funcSet set of pre-loaded functions used to match signatures
   * @param funcList list to be merged
   * @throws AmbiguousFunctionException
   */
  private void checkUDFduplicateAndMerge(Optional udfs, HashMap funcSet, List funcList)
```
--- End diff --

It would be better if you add a test for this method.
Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/929#discussion_r55475465

--- Diff: tajo-core/src/main/java/org/apache/tajo/engine/function/hiveudf/HiveFunctionLoader.java ---
Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/929#discussion_r55474452

--- Diff: tajo-core/src/main/java/org/apache/tajo/engine/function/hiveudf/HiveFunctionLoader.java ---

```java
      // loop over each jar file
      for (FileStatus fstatus : fs.listStatus(udfPath, (Path path) -> path.getName().endsWith(".jar"))) {
        URL[] urls = new URL[]{new URL("jar:" + fstatus.getPath().toUri().toURL() + "!/")};

        // extract and register UDF's descendants (legacy Hive UDF form)
        Set<Class<? extends UDF>> udfClasses = getSubclassesFromJarEntry(urls, UDF.class);
```

--- End diff --

Since getSubclassesFromJarEntry() can take an array of URLs, it would be better to pull the getSubclassesFromJarEntry() and buildFunctionsFromUDF() calls out of the for loop.
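The refactoring suggested here can be sketched with plain JDK calls: collect every jar URL up front, then hand the whole array to a single scan. `JarUrlCollector` and the local-filesystem walk below are hypothetical stand-ins for Tajo's HDFS-based `listStatus` loop, not the patch's actual code.

```java
import java.net.URL;
import java.nio.file.*;
import java.util.*;

public class JarUrlCollector {
    // Gather "jar:" URLs for every .jar in a directory so that a single
    // class-loader scan can cover all of them at once, instead of scanning
    // one jar per loop iteration.
    public static URL[] collectJarUrls(Path dir) throws Exception {
        List<URL> urls = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path jar : jars) {
                urls.add(new URL("jar:" + jar.toUri().toURL() + "!/"));
            }
        }
        return urls.toArray(new URL[0]);
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("udfjars");
        Files.createFile(dir.resolve("a.jar"));
        Files.createFile(dir.resolve("b.txt")); // ignored: not a jar
        System.out.println(collectJarUrls(dir).length);
    }
}
```

With the URLs collected first, `getSubclassesFromJarEntry(urls, UDF.class)` needs to run only once for the whole directory.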
Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/929#discussion_r55474019

--- Diff: tajo-core-tests/src/test/java/org/apache/tajo/engine/function/hiveudf/TestHiveFunctionLoader.java ---

```java
package org.apache.tajo.engine.function.hiveudf;

import com.google.common.base.Preconditions;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.tajo.TajoTestingCluster;
import org.apache.tajo.catalog.CatalogService;
import org.apache.tajo.catalog.CatalogUtil;
import org.apache.tajo.catalog.FunctionDesc;
import org.apache.tajo.catalog.proto.CatalogProtos;
import org.apache.tajo.client.TajoClient;
import org.apache.tajo.common.TajoDataTypes;
import org.apache.tajo.conf.TajoConf;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

import static org.junit.Assert.assertEquals;

import java.net.URL;
import java.sql.ResultSet;
import java.util.*;

public class TestHiveFunctionLoader {
```

--- End diff --

IMO, it would be better to write this test class using ExprTestBase; please refer to TestPythonFunctions. In addition, we need separate tests for HiveFunctionLoader and for Hive UDF execution.
Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/929#discussion_r55473420

--- Diff: tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java ---

```java
    PYTHON_CODE_DIR("tajo.function.python.code-dir", ""),
    PYTHON_CONTROLLER_LOG_DIR("tajo.function.python.controller.log-dir", ""),

    // Hive UDF
    HIVE_UDF_DIR("hive.udf.dir", "./lib/hiveudf"),
```

--- End diff --

How about changing to tajo.function.hive.code-dir?
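Under the renamed key (which later messages in this thread show in actual use), the directory would be set in tajo-site.xml along these lines; the value path below is only an example:

```xml
<property>
  <name>tajo.function.hive.code-dir</name>
  <value>/opt/tajo/lib/hiveudf</value>
</property>
```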
Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/929#discussion_r55472456

--- Diff: tajo-common/src/main/java/org/apache/tajo/util/WritableTypeConverter.java ---

```java
package org.apache.tajo.util;

import org.apache.hadoop.hive.serde2.io.DateWritable;
import org.apache.hadoop.hive.serde2.io.TimestampWritable;
import org.apache.hadoop.io.*;
import org.apache.tajo.common.TajoDataTypes.Type;
import org.apache.tajo.common.TajoDataTypes.DataType;
import org.apache.tajo.datum.*;
import org.apache.tajo.exception.NotImplementedException;
import org.apache.tajo.exception.TajoRuntimeException;
import org.apache.tajo.exception.UnsupportedDataTypeException;
import org.apache.tajo.util.datetime.DateTimeConstants;
import org.apache.tajo.util.datetime.DateTimeUtil;
import org.reflections.ReflectionUtils;

import java.sql.Timestamp;
import java.util.Set;

public class WritableTypeConverter {

  public static DataType convertWritableToTajoType(Class<? extends Writable> writableClass) {
    if (writableClass == null)
      return null;

    DataType.Builder builder = DataType.newBuilder();
    Set<Class<?>> parents = ReflectionUtils.getAllSuperTypes(writableClass);

    if (writableClass == ByteWritable.class || parents.contains(ByteWritable.class)) {
      return builder.setType(Type.INT1).build();
    }
    if (writableClass == ShortWritable.class || parents.contains(ShortWritable.class)) {
      return builder.setType(Type.INT2).build();
    }
    if (writableClass == IntWritable.class || parents.contains(IntWritable.class)) {
      return builder.setType(Type.INT4).build();
    }
    if (writableClass == LongWritable.class || parents.contains(LongWritable.class)) {
      return builder.setType(Type.INT8).build();
    }
    if (writableClass == Text.class || parents.contains(Text.class)) {
      return builder.setType(Type.TEXT).build();
    }
    if (writableClass == FloatWritable.class || parents.contains(FloatWritable.class)) {
      return builder.setType(Type.FLOAT4).build();
    }
    if (writableClass == DoubleWritable.class || parents.contains(DoubleWritable.class)) {
      return builder.setType(Type.FLOAT8).build();
    }
    if (writableClass == DateWritable.class || parents.contains(DateWritable.class)) {
      return builder.setType(Type.DATE).build();
    }
    if (writableClass == TimestampWritable.class || parents.contains(TimestampWritable.class)) {
      return builder.setType(Type.TIMESTAMP).build();
    }
    if (writableClass == BytesWritable.class || parents.contains(BytesWritable.class)) {
      return builder.setType(Type.VARBINARY).build();
    }

    throw new TajoRuntimeException(new NotImplementedException(writableClass.getSimpleName()));
  }

  public static Writable convertDatum2Writable(Datum value) {
```

--- End diff --

The Char type is missing.
Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/929#discussion_r55472450

--- Diff: tajo-common/src/main/java/org/apache/tajo/util/WritableTypeConverter.java ---

```java
  public static DataType convertWritableToTajoType(Class<? extends Writable> writableClass) {
```

--- End diff --

The Char type is missing here as well.
Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/929#discussion_r55472204

--- Diff: tajo-core/src/main/java/org/apache/tajo/engine/function/hiveudf/HiveFunctionLoader.java ---

```java
      Path udfPath = new Path(udfdir);
      FileSystem fs = udfPath.getFileSystem(conf);

      if (!fs.isDirectory(udfPath)) {
```

--- End diff --

It would be good to write some logs here to let users know what the problem is.
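A minimal sketch of the suggested warning, using JDK logging and local paths as stand-ins for Tajo's logger and HDFS API (the class and method names here are illustrative, not the patch's code):

```java
import java.nio.file.*;
import java.util.*;
import java.util.logging.Logger;

public class UdfDirCheck {
    private static final Logger LOG = Logger.getLogger(UdfDirCheck.class.getName());

    // Warn (rather than silently bail out) when the configured Hive UDF
    // directory is missing, so users can diagnose misconfigured paths.
    static Optional<List<String>> load(Path udfDir) {
        if (!Files.isDirectory(udfDir)) {
            LOG.warning("Hive UDF directory doesn't exist: " + udfDir);
            return Optional.empty();
        }
        return Optional.of(new ArrayList<>());
    }

    public static void main(String[] args) {
        // Missing directory: logs a warning and returns an empty Optional.
        System.out.println(load(Paths.get("/no/such/dir")).isPresent());
    }
}
```

This matches the symptom seen later in the thread, where the master log's "Hive UDF directory doesn't exist" warning was the only clue to a path problem.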
Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/929#discussion_r55472123

--- Diff: tajo-common/pom.xml ---

```xml
      <artifactId>powermock-api-mockito</artifactId>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hive</groupId>
```

--- End diff --

This can increase the size of the Tajo client. How about moving it to other modules like tajo-core and tajo-plan?
Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/929#discussion_r55471853

--- Diff: tajo-catalog/tajo-catalog-common/src/main/java/org/apache/tajo/catalog/FunctionDescBuilder.java ---

```java
package org.apache.tajo.catalog;

import org.apache.tajo.catalog.proto.CatalogProtos.FunctionType;
import org.apache.tajo.common.TajoDataTypes.DataType;
import org.apache.tajo.function.*;

public class FunctionDescBuilder {
```

--- End diff --

It is seemingly similar to FunctionDescProto.Builder. Do you have any reason for adding a new class instead of using PB's builder?
Github user jihoonson commented on the pull request: https://github.com/apache/tajo/pull/929#issuecomment-193074302

Sorry for the late review. I left some comments.
Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/929#discussion_r55159105

--- Diff: tajo-plan/src/main/java/org/apache/tajo/plan/function/HiveFunctionInvoke.java ---

```java
package org.apache.tajo.plan.function;

import org.apache.hadoop.io.*;
import org.apache.tajo.catalog.FunctionDesc;
import org.apache.tajo.common.TajoDataTypes;
import org.apache.tajo.datum.Datum;
import org.apache.tajo.exception.TajoInternalError;
import org.apache.tajo.function.UDFInvocationDesc;
import org.apache.tajo.storage.Tuple;
import org.apache.tajo.util.WritableTypeConverter;

import java.io.IOException;
import java.lang.reflect.Constructor;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;

public class HiveFunctionInvoke extends FunctionInvoke implements Cloneable {
  private Object instance = null;
  private Method evalMethod = null;
  private Writable[] params;

  public HiveFunctionInvoke(FunctionDesc desc) {
    super(desc);
    params = new Writable[desc.getParamTypes().length];
  }

  @Override
  public void init(FunctionInvokeContext context) throws IOException {
    UDFInvocationDesc udfDesc = functionDesc.getInvocation().getUDF();

    URL[] urls = new URL[] { new URL(udfDesc.getPath()) };
    URLClassLoader loader = new URLClassLoader(urls);

    try {
      Class<?> udfclass = loader.loadClass(udfDesc.getName());
      evalMethod = getEvaluateMethod(functionDesc.getParamTypes(), udfclass);
    } catch (ClassNotFoundException e) {
      e.printStackTrace();
```

--- End diff --

Throwing an exception is required.
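The fix the reviewer asks for is to propagate the failure instead of swallowing it with printStackTrace(). A self-contained sketch (the helper class and message text are illustrative, not the patch's actual code):

```java
import java.io.IOException;

public class UdfClassLoading {
    // Wrap ClassNotFoundException in the method's declared IOException so the
    // caller sees the failure, instead of printStackTrace() and a silent null.
    static Class<?> loadUdfClass(ClassLoader loader, String name) throws IOException {
        try {
            return loader.loadClass(name);
        } catch (ClassNotFoundException e) {
            throw new IOException("Cannot load Hive UDF class: " + name, e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Loading a class that exists succeeds as before.
        System.out.println(loadUdfClass(UdfClassLoading.class.getClassLoader(),
            "java.lang.String").getName()); // prints "java.lang.String"
    }
}
```

Keeping the original exception as the cause preserves the stack trace that printStackTrace() used to show, while still failing the query.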
Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/929#discussion_r55157997

--- Diff: tajo-core/src/main/java/org/apache/tajo/engine/function/hiveudf/HiveFunctionLoader.java ---

```java
public class HiveFunctionLoader {
  public static Collection<FunctionDesc> loadHiveUDFs(TajoConf conf) {
    List<FunctionDesc> funcList = new ArrayList<>();
    String udfdir = conf.getVar(TajoConf.ConfVars.HIVE_UDF_DIR);

    try {
      Path udfPath = new Path(udfdir);
      FileSystem fs = udfPath.getFileSystem(conf);

      if (!fs.isDirectory(udfPath)) {
        return null;
```

--- End diff --

How about using Optional?
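The Optional-returning signature, which a later revision of the patch adopts, can be illustrated with a plain-JDK sketch (the loader method and function names below are hypothetical):

```java
import java.util.*;

public class UdfLoaderOptional {
    // Returning Optional.empty() instead of null makes the "UDF directory
    // missing" case explicit at every call site.
    static Optional<List<String>> loadFunctions(boolean dirExists) {
        if (!dirExists) {
            return Optional.empty();
        }
        return Optional.of(Collections.singletonList("my_upper"));
    }

    public static void main(String[] args) {
        // Callers register functions only when present; no null checks needed.
        loadFunctions(true).ifPresent(funcs ->
            System.out.println("registered " + funcs.size())); // prints "registered 1"
        System.out.println(loadFunctions(false).isPresent()); // prints "false"
    }
}
```

Compared with returning null, the Optional forces every caller to handle the empty case, which is exactly the kind of silent-skip bug the thread later debugged.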
Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/929#discussion_r55157850

--- Diff: tajo-common/src/main/java/org/apache/tajo/util/WritableTypeConverter.java ---

```java
  public static Writable convertDatum2Writable(Datum value) {
    switch (value.type()) {
      case INT1: return new ByteWritable(value.asByte());
      case INT2: return new ShortWritable(value.asInt2());
      case INT4: return new IntWritable(value.asInt4());
      case INT8: return new LongWritable(value.asInt8());

      case FLOAT4: return new FloatWritable(value.asFloat4());
      case FLOAT8: return new DoubleWritable(value.asFloat8());

      // NOTE: value should be a DateDatum
      case DATE: return new DateWritable(value.asInt4() - DateTimeConstants.UNIX_EPOCH_JDATE);

      // NOTE: value should be a TimestampDatum
      case TIMESTAMP:
        TimestampWritable result = new TimestampWritable();
        result.setTime(DateTimeUtil.julianTimeToJavaTime(value.asInt8()));
        return result;

      case TEXT: return new Text(value.asChars());
      case VARBINARY: return new BytesWritable(value.asByteArray());
    }

    throw new
```
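The DATE branch above subtracts UNIX_EPOCH_JDATE because Tajo stores dates as Julian day numbers while Hadoop's DateWritable stores days since 1970-01-01. A standalone check of that arithmetic (2440588 is the Julian day number of the Unix epoch, playing the role of Tajo's DateTimeConstants.UNIX_EPOCH_JDATE):

```java
public class JulianDayConversion {
    // Julian day number of 1970-01-01.
    static final int UNIX_EPOCH_JDATE = 2440588;

    // Convert a Julian day number (Tajo's internal DATE form) to
    // days-since-epoch (DateWritable's form).
    static int julianToEpochDays(int julianDay) {
        return julianDay - UNIX_EPOCH_JDATE;
    }

    public static void main(String[] args) {
        System.out.println(julianToEpochDays(2440588)); // 1970-01-01 -> prints 0
        System.out.println(julianToEpochDays(2440589)); // 1970-01-02 -> prints 1
    }
}
```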
Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/929#discussion_r55157868

--- Diff: tajo-common/src/test/java/org/apache/tajo/util/TestWritableTypeConverter.java ---

```java
package org.apache.tajo.util;

import org.apache.hadoop.hive.serde2.io.DateWritable;
import org.apache.hadoop.hive.serde2.io.TimestampWritable;
import org.apache.hadoop.io.Writable;
import org.apache.tajo.datum.DateDatum;
import org.apache.tajo.datum.Datum;
import org.apache.tajo.datum.TimestampDatum;
import org.apache.tajo.util.datetime.DateTimeUtil;
import org.apache.tajo.util.datetime.TimeMeta;
import org.junit.Test;

import static org.junit.Assert.*;

public class TestWritableTypeConverter {
```

--- End diff --

It would be better if you add some negative tests.
Github user eminency commented on the pull request: https://github.com/apache/tajo/pull/929#issuecomment-188273376

@jihoonson I applied most of your suggestions. Could you check them again?
Github user eminency commented on a diff in the pull request: https://github.com/apache/tajo/pull/929#discussion_r53128977

Diff: tajo-core/src/main/java/org/apache/tajo/worker/TajoWorker.java

```diff
@@ -200,6 +200,7 @@ public void serviceInit(Configuration conf) throws Exception {
     historyReader = new HistoryReader(workerContext.getWorkerName(), this.systemConf);
     FunctionLoader.loadUserDefinedFunctions(systemConf);
+    HiveFunctionLoader.loadHiveUDFs(systemConf);
```

Comment: If a duplicate exists, the code is already designed to raise AmbiguousFunctionException, so this point can be skipped.
Github user eminency commented on a diff in the pull request: https://github.com/apache/tajo/pull/929#discussion_r52709548

Diff: tajo-plan/src/main/java/org/apache/tajo/plan/function/HiveFunctionInvoke.java (new file; Apache license header omitted here)

```java
package org.apache.tajo.plan.function;

import org.apache.hadoop.io.*;
import org.apache.tajo.catalog.FunctionDesc;
import org.apache.tajo.datum.Datum;
import org.apache.tajo.exception.TajoInternalError;
import org.apache.tajo.function.UDFInvocationDesc;
import org.apache.tajo.storage.Tuple;
import org.apache.tajo.util.WritableTypeConverter;

import java.io.IOException;
import java.lang.reflect.Constructor;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;

public class HiveFunctionInvoke extends FunctionInvoke implements Cloneable {
  private Object instance = null;
  private Method evalMethod = null;

  public HiveFunctionInvoke(FunctionDesc desc) {
    super(desc);
  }

  @Override
  public void init(FunctionInvokeContext context) throws IOException {
    UDFInvocationDesc udfDesc = functionDesc.getInvocation().getUDF();

    URL[] urls = new URL[] { new URL(udfDesc.getPath()) };
    URLClassLoader loader = new URLClassLoader(urls);

    try {
      Class udfclass = loader.loadClass(udfDesc.getName());
      evalMethod = getEvaluateMethod(udfclass);
    } catch (ClassNotFoundException e) {
      e.printStackTrace();
    }
  }

  private Method getEvaluateMethod(Class clazz) {
    Constructor constructor = clazz.getConstructors()[0];

    try {
      instance = constructor.newInstance();
    } catch (InstantiationException|IllegalAccessException|InvocationTargetException e) {
      throw new TajoInternalError(e);
    }

    for (Method m : clazz.getMethods()) {
      if (m.getName().equals("evaluate")) {
        return m;
      }
    }

    return null;
  }

  @Override
  public Datum eval(Tuple tuple) {
    Datum resultDatum;
    Writable[] params = new Writable[tuple.size()];
```

Comment (on the `params` allocation): Good idea!
Github user jihoonson commented on the pull request: https://github.com/apache/tajo/pull/929#issuecomment-179742455

Hi @eminency, thanks for your work. It seems good overall; I left some comments. In addition, would you briefly evaluate the performance of a Hive UDF call? I wonder how large its overhead is.
Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/929#discussion_r51850151

Diff: tajo-common/src/main/java/org/apache/tajo/util/WritableTypeConverter.java (new file; Apache license header omitted here)

```java
package org.apache.tajo.util;

import org.apache.hadoop.io.*;
import org.apache.tajo.common.TajoDataTypes.Type;
import org.apache.tajo.common.TajoDataTypes.DataType;
import org.apache.tajo.datum.*;
import org.apache.tajo.exception.NotImplementedException;
import org.apache.tajo.exception.TajoRuntimeException;
import org.reflections.ReflectionUtils;

import java.util.Set;

public class WritableTypeConverter {

  public static DataType convertWritableToTajoType(Class writableClass) {
    if (writableClass == null)
      return null;

    DataType.Builder builder = DataType.newBuilder();
    Set<Class<?>> parents = ReflectionUtils.getAllSuperTypes(writableClass);

    if (writableClass == ByteWritable.class || parents.contains(ByteWritable.class)) {
      return builder.setType(Type.INT1).build();
    }
    if (writableClass == ShortWritable.class || parents.contains(ShortWritable.class)) {
      return builder.setType(Type.INT2).build();
    }
    if (writableClass == IntWritable.class || parents.contains(IntWritable.class)) {
      return builder.setType(Type.INT4).build();
    }
    if (writableClass == LongWritable.class || parents.contains(LongWritable.class)) {
      return builder.setType(Type.INT8).build();
    }
    if (writableClass == Text.class || parents.contains(Text.class)) {
      return builder.setType(Type.TEXT).build();
    }
    if (writableClass == FloatWritable.class || parents.contains(FloatWritable.class)) {
      return builder.setType(Type.FLOAT4).build();
    }
    if (writableClass == DoubleWritable.class || parents.contains(DoubleWritable.class)) {
      return builder.setType(Type.FLOAT8).build();
    }
    if (writableClass == BytesWritable.class || parents.contains(BytesWritable.class)) {
      return builder.setType(Type.VARBINARY).build();
    }

    throw new TajoRuntimeException(new NotImplementedException(writableClass.getSimpleName()));
  }

  public static Writable convertDatum2Writable(Datum value) {
    switch (value.type()) {
      case INT1: return new ByteWritable(value.asByte());
      case INT2: return new ShortWritable(value.asInt2());
      case INT4: return new IntWritable(value.asInt4());
      case INT8: return new LongWritable(value.asInt8());

      case FLOAT4: return new FloatWritable(value.asFloat4());
      case FLOAT8: return new DoubleWritable(value.asFloat8());

      case TEXT: return new Text(value.asChars());
      case VARBINARY: return new BytesWritable(value.asByteArray());
```

Comment (on the `switch` cases): We need to support date, time, timestamp, and boolean types as well.
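The missing branches the reviewer names would map to Hadoop's `BooleanWritable` and Hive's `DateWritable`/`TimestampWritable` (the latter two are imported by the test file earlier in this thread). The dispatch shape can be sketched with JDK-only stand-ins so it runs standalone; the enum and wrapper types here are illustrative, not Tajo's:

```java
import java.time.Instant;
import java.time.LocalDate;

public class DatumToWritableSketch {
  enum TajoType { BOOLEAN, DATE, TIMESTAMP }

  // Stand-in for convertDatum2Writable's missing branches. In the real
  // converter these would return BooleanWritable, DateWritable, and
  // TimestampWritable respectively.
  static Object convert(TajoType type, Object value) {
    switch (type) {
      case BOOLEAN:   return Boolean.valueOf((boolean) value);   // ~ new BooleanWritable(datum.asBool())
      case DATE:      return LocalDate.ofEpochDay((long) value); // ~ new DateWritable(daysSinceEpoch)
      case TIMESTAMP: return Instant.ofEpochMilli((long) value); // ~ new TimestampWritable(...)
      default: throw new UnsupportedOperationException(type.name());
    }
  }

  public static void main(String[] args) {
    System.out.println(convert(TajoType.BOOLEAN, true));  // true
    System.out.println(convert(TajoType.DATE, 0L));       // 1970-01-01
  }
}
```

One subtlety the real code must handle: Tajo and Hive may use different epoch bases and time zones for date/timestamp values, so the branch bodies need explicit unit conversion, not just wrapping.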
Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/929#discussion_r51849874

Diff: tajo-plan/src/main/java/org/apache/tajo/plan/function/HiveFunctionInvoke.java (the same new-file diff is quoted in full elsewhere in this thread; the comment is on the `eval()` excerpt below)

```java
  @Override
  public Datum eval(Tuple tuple) {
    Datum resultDatum;
    Writable[] params = new Writable[tuple.size()];
```

Comment: How about making this `params` a member variable? Its contents can be overwritten per `eval()` call instead of creating a new one.
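The suggestion above is a standard allocation-hoisting optimization for per-row hot paths. A minimal, self-contained sketch of the idea (names are illustrative, not the patch's actual code):

```java
// Sketch of the reviewer's suggestion: allocate the parameter buffer once
// and overwrite it on every call, instead of a fresh array per eval().
public class ReusableBufferSketch {
  private final Object[] params;  // member, sized once to the function's arity

  ReusableBufferSketch(int arity) {
    this.params = new Object[arity];
  }

  Object eval(Object[] tuple) {
    // Overwrite in place: no per-row allocation, so no GC pressure in
    // loops that call eval() once per row.
    for (int i = 0; i < tuple.length; i++) {
      params[i] = tuple[i];
    }
    return params[0]; // stand-in for invoking the UDF with params
  }

  public static void main(String[] args) {
    ReusableBufferSketch f = new ReusableBufferSketch(2);
    System.out.println(f.eval(new Object[]{"a", "b"})); // a
    System.out.println(f.eval(new Object[]{"c", "d"})); // c (same array reused)
  }
}
```

The trade-off is that the buffer is no longer safe to share across threads or to hand out to callers that retain it, which is fine for a per-task function instance.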
Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/929#discussion_r51850040

Diff: tajo-core/src/main/java/org/apache/tajo/engine/function/hiveudf/HiveFunctionLoader.java (new file; Apache license header omitted here)

```java
package org.apache.tajo.engine.function.hiveudf;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.udf.UDFType;
import org.apache.tajo.catalog.FunctionDesc;
import org.apache.tajo.catalog.FunctionDescBuilder;
import org.apache.tajo.catalog.proto.CatalogProtos;
import org.apache.tajo.common.TajoDataTypes;
import org.apache.tajo.conf.TajoConf;
import org.apache.tajo.exception.TajoInternalError;
import org.apache.tajo.function.UDFInvocationDesc;
import org.apache.tajo.util.WritableTypeConverter;
import org.reflections.Reflections;
import org.reflections.util.ConfigurationBuilder;

import java.io.IOException;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.*;

public class HiveFunctionLoader {
  public static Collection<FunctionDesc> loadHiveUDFs(TajoConf conf) {
    ArrayList<FunctionDesc> funcList = new ArrayList<>();
    String udfdir = conf.getVar(TajoConf.ConfVars.HIVE_UDF_DIR);

    try {
      FileSystem localFS = FileSystem.getLocal(conf);
```

Comment (on the `FileSystem.getLocal` call): Hive basically assumes that the Jar files are stored on HDFS. We also need to support it.
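The fix the reviewer asks for would typically replace `FileSystem.getLocal(conf)` with Hadoop's `FileSystem.get(uri, conf)`, which resolves a `LocalFileSystem` or `DistributedFileSystem` from the path's URI scheme. A JDK-only sketch of that scheme-based dispatch (the helper and scheme set are assumptions for illustration):

```java
import java.net.URI;

public class UdfJarLocationSketch {
  // Decide where a UDF jar lives from its URI scheme. The actual patch
  // would delegate this to Hadoop's FileSystem.get(uri, conf) instead of
  // hard-coding the local filesystem.
  static String storageFor(String jarPath) {
    URI uri = URI.create(jarPath);
    String scheme = uri.getScheme();
    if (scheme == null || scheme.equals("file")) return "local";
    if (scheme.equals("hdfs")) return "hdfs";
    throw new IllegalArgumentException("unsupported scheme: " + scheme);
  }

  public static void main(String[] args) {
    System.out.println(storageFor("file:///opt/udf/my-udf.jar"));           // local
    System.out.println(storageFor("hdfs://namenode:8020/udf/my-udf.jar"));  // hdfs
    System.out.println(storageFor("/opt/udf/my-udf.jar"));                  // local (no scheme)
  }
}
```

Treating a scheme-less path as local matches the common Hadoop convention of falling back to `fs.defaultFS` only when the deployment says so; the loader would then list jars via the resolved `FileSystem` rather than `java.io.File`.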
Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/929#discussion_r51850654

Diff: tajo-core/src/main/java/org/apache/tajo/worker/TajoWorker.java

```diff
@@ -200,6 +200,7 @@ public void serviceInit(Configuration conf) throws Exception {
     historyReader = new HistoryReader(workerContext.getWorkerName(), this.systemConf);
     FunctionLoader.loadUserDefinedFunctions(systemConf);
+    HiveFunctionLoader.loadHiveUDFs(systemConf);
```

Comment: It seems that TajoWorker also needs to check for duplicated functions, as TajoMaster does. In addition, it would be better if the function-loading routine were consistent no matter where it is called from.
Github user eminency commented on the pull request: https://github.com/apache/tajo/pull/929#issuecomment-179781185

I missed many things... I will make the code handle them soon. The UDF call performance test will also be done ASAP. Thanks @jihoonson.
Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/929#discussion_r51849575

Diff: tajo-plan/src/main/java/org/apache/tajo/plan/function/HiveFunctionInvoke.java (the same new-file diff is quoted in full elsewhere in this thread; the comment is on the `getEvaluateMethod()` excerpt below)

```java
  private Method getEvaluateMethod(Class clazz) {
    Constructor constructor = clazz.getConstructors()[0];

    try {
      instance = constructor.newInstance();
    } catch (InstantiationException|IllegalAccessException|InvocationTargetException e) {
      throw new TajoInternalError(e);
    }

    for (Method m : clazz.getMethods()) {
      if (m.getName().equals("evaluate")) {
        return m;
```

Comment (on the `return m;`): If there are several functions of the same name which accept different arguments, this looks like it returns a wrong function.
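The bug the reviewer points out is that matching on name alone picks an arbitrary overload. The usual fix is to select by parameter types with `Class.getMethod(name, paramTypes...)`. A self-contained sketch with a hypothetical two-overload UDF:

```java
import java.lang.reflect.Method;

public class OverloadSelectionSketch {
  // Hypothetical Hive-style UDF with two evaluate() overloads; the loop in
  // the diff would return whichever one getMethods() happens to yield first.
  public static class MyUdf {
    public String evaluate(String s) { return s.toUpperCase(); }
    public Integer evaluate(Integer i) { return i + 1; }
  }

  public static void main(String[] args) throws Exception {
    MyUdf udf = new MyUdf();
    // Select the overload by the actual argument types instead of by name only.
    Method byString = MyUdf.class.getMethod("evaluate", String.class);
    Method byInt = MyUdf.class.getMethod("evaluate", Integer.class);
    System.out.println(byString.invoke(udf, "abcd")); // ABCD
    System.out.println(byInt.invoke(udf, 3));         // 4
  }
}
```

In the loader this means deriving the `Class[]` of parameter types from the Tajo function signature (via the Writable-type mapping) before resolving `evaluate`, rather than scanning `getMethods()` for a name match.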
GitHub user eminency opened a pull request: https://github.com/apache/tajo/pull/929

TAJO-1686: Allow Tajo to use Hive UDF

This patch consists of the parts below.

Core Logic
1. **TajoMaster.java / HiveFunctionLoader.java**: in charge of finding Hive UDFs and registering them.
2. **HiveFunctionInvoke.java**: to call Hive UDFs.

Utility
1. **WritableTypeConverter.java**: utility class to convert Writable types to Datum and vice versa.
2. **FunctionDescBuilder.java**: helper/builder to build FunctionDesc more conveniently.

and others.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/eminency/tajo hive_udf

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tajo/pull/929.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #929

Commits (all authored by Jongyoung Park):

    8e2cafb21f6d149dc3bec7a3c7a845d54e678672  2015-10-23T08:21:31Z  base
    070f3610442199edaa55e7f76c5343c549a66a01  2015-10-27T07:41:41Z  Method name refactoring
    7efc29eefe6d78837cc6a817d1f1c68d0f550fce  2015-10-27T07:55:08Z  Remove a simple method
    3a803f19a7758b18d5a592a4f44d95c071cf6f14  2015-10-29T03:21:41Z  get UDF classes
    0fdfa0903f5d4922bf5d1c484013ec079770dede  2015-10-29T03:22:07Z  refactor for Java 8
    1cd0d5a04b88d6fc2b7bf11f7c85089a7eef375b  2015-10-29T03:23:03Z  test for local
    98f65a6001b4619e12ee34bc95a13c367b5ad1a9  2015-10-29T07:11:12Z  name and deterministic
    6a69fdd2eab72380535e5b959e26ced1b55b811a  2015-10-29T07:11:28Z  variable name refactoring
    a0fe3b21114ab67f710585485891ac787782e964  2015-10-29T07:29:55Z  remove useless enum
    6dcd24b537a0b245a079ef8db1c411a0a7563b19  2015-10-29T07:31:32Z  Revert "remove useless enum" (reverts a8739670ffb688b3d2c2ea5b8e8422d21059cf92)
    055a6db84feeb185c279b1b41e61190ede0ca155  2015-10-29T07:32:30Z  Remove useless enum
    64899ae7bdb6b2dad951c3d98d9282ecc83bd4cc  2015-11-09T02:58:20Z  Weird variable name changed
    935cd2891d6810d466c12838cf0498f94eb47bb0  2015-11-09T05:32:48Z  analyzeUDF test
    6bc4e1eba76f242ee26c0dff2143dff819accb1b  2015-11-09T05:33:08Z  test code missed
    e2c0d719fa5c707fa08775b756aff45eb7cd19c6  2015-11-09T06:03:31Z  License added
    e675e15067fbf1effdafb3bfa0fc76da895f6a36  2015-11-09T06:12:19Z  deterministic added and test modified
    6df6409852fd324c9d83b02fbd0d077f3ee08d27  2015-11-09T08:40:09Z  deterministic test
    110c5882eba1780e323412c4339b11154ad62ea8  2015-11-10T07:59:37Z  More test and FunctionDescBuilder added
    2142e3a0dca128c64c81dc22c8fe437ac60f4167  2015-11-11T09:28:12Z  test refactoring
    5fd2c9187ac137fd34f0e03d8bf59d8db9cc2ef9  2015-11-11T09:30:04Z  setup sequence refined
    b4a7015d485e253195c29657373519aad12880fd  2015-11-12T08:57:40Z  Method names are changed
    96a8c72a5f59079a8f782b861f970bf77e588e97  2015-11-12T08:59:33Z  Test problem fixed
    96b9294495d27cb098147765db7841816273aa27  2015-11-16T03:28:52Z  AmbiguousFunctionException in loading function
    2b43dc1dd8fff8c6aee0d4badcaae4d10abfc345  2015-11-16T05:36:56Z  Basic finding test
    fd69123f7e0972f584c82f8bb5ccb9e783151972  2015-11-16T06:08:03Z  process synonyms
    b31a9180e19018b40db9599cc3f1f7c1512c143b  2015-11-16T06:28:24Z  process description and example