[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF

2016-05-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/tajo/pull/929


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF

2016-05-11 Thread jihoonson
Github user jihoonson commented on the pull request:

https://github.com/apache/tajo/pull/929#issuecomment-218371850
  
+1. The latest patch looks good to me.
@eminency, thank you for your work!




[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF

2016-04-24 Thread eminency
Github user eminency commented on the pull request:

https://github.com/apache/tajo/pull/929#issuecomment-214145976
  
@jihoonson 
The problem you mentioned and the null-handling problem are both fixed.




[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF

2016-04-18 Thread jihoonson
Github user jihoonson commented on the pull request:

https://github.com/apache/tajo/pull/929#issuecomment-211224626
  
Hi @eminency, I encountered this warning message while starting up Tajo. This is 
strange because I configured the path to the Hive UDF directory as shown below. Would you check it?

```
# in tajo-site.xml
<property>
  <name>tajo.function.hive.code-dir</name>
  <value>${TAJO_HOME}/tajo-core-tests/src/test/resources/hiveudf/</value>
</property>

# in TajoCli
default> select my_upper('abcd');
ERROR: function does not exist: my_upper(text)

# in the master log
...
2016-04-18 15:15:19,987 WARN org.apache.tajo.engine.function.hiveudf.HiveFunctionLoader: Hive UDF directory doesn't exist
...
```




[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF

2016-04-18 Thread jihoonson
Github user jihoonson commented on a diff in the pull request:

https://github.com/apache/tajo/pull/929#discussion_r60009836
  
--- Diff: tajo-project/pom.xml ---
@@ -1007,6 +1006,12 @@
   
 
   
+      <dependency>
+        <groupId>org.apache.hive</groupId>
+        <artifactId>hive-exec</artifactId>
+        <version>${hive.version}</version>
+        <scope>provided</scope>
--- End diff --

I missed this. Please ignore the above comments.




[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF

2016-04-18 Thread jihoonson
Github user jihoonson commented on a diff in the pull request:

https://github.com/apache/tajo/pull/929#discussion_r60009769
  
--- Diff: tajo-plan/pom.xml ---
@@ -173,6 +173,10 @@
   hadoop-common
 
 
+      <groupId>org.apache.hive</groupId>
+      <artifactId>hive-exec</artifactId>
--- End diff --

The dependency scope should be provided.
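For reference, the resulting entry would follow the standard Maven form; the coordinates and the ${hive.version} property below are taken from the tajo-project pom quoted earlier in this thread, and the exact indentation/placement in each module's pom may differ:

```xml
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>${hive.version}</version>
  <scope>provided</scope>
</dependency>
```

With the provided scope, hive-exec is available at compile time but is not bundled into Tajo's runtime distribution.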




[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF

2016-04-18 Thread jihoonson
Github user jihoonson commented on a diff in the pull request:

https://github.com/apache/tajo/pull/929#discussion_r60009569
  
--- Diff: tajo-core/pom.xml ---
@@ -252,6 +252,10 @@
   hadoop-yarn-server-common
   provided
 
+    <dependency>
+      <groupId>org.apache.hive</groupId>
+      <artifactId>hive-exec</artifactId>
--- End diff --

The dependency scope should be provided.




[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF

2016-04-18 Thread jihoonson
Github user jihoonson commented on a diff in the pull request:

https://github.com/apache/tajo/pull/929#discussion_r60009547
  
--- Diff: tajo-core-tests/pom.xml ---
@@ -323,6 +323,10 @@
 
   
 
+    <dependency>
+      <groupId>org.apache.hive</groupId>
+      <artifactId>hive-exec</artifactId>
--- End diff --

The dependency scope should be test.




[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF

2016-03-25 Thread eminency
Github user eminency commented on the pull request:

https://github.com/apache/tajo/pull/929#issuecomment-201157356
  
@jihoonson 
I have addressed some of your suggestions; please see my comments.
In addition, loading UDFs from HDFS might be better split into a separate issue, 
since it would require too much work to fix here.




[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF

2016-03-22 Thread eminency
Github user eminency commented on a diff in the pull request:

https://github.com/apache/tajo/pull/929#discussion_r56941559
  
--- Diff: 
tajo-plan/src/main/java/org/apache/tajo/plan/function/HiveFunctionInvoke.java 
---
@@ -0,0 +1,113 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tajo.plan.function;
+
+import org.apache.hadoop.io.*;
+import org.apache.tajo.catalog.FunctionDesc;
+import org.apache.tajo.common.TajoDataTypes;
+import org.apache.tajo.datum.Datum;
+import org.apache.tajo.exception.TajoInternalError;
+import org.apache.tajo.exception.TajoRuntimeException;
+import org.apache.tajo.function.UDFInvocationDesc;
+import org.apache.tajo.storage.Tuple;
+import org.apache.tajo.util.WritableTypeConverter;
+
+import java.io.IOException;
+import java.lang.reflect.Constructor;
+import java.lang.reflect.InvocationTargetException;
+import java.lang.reflect.Method;
+import java.net.URL;
+import java.net.URLClassLoader;
+
+public class HiveFunctionInvoke extends FunctionInvoke implements 
Cloneable {
+  private Object instance = null;
+  private Method evalMethod = null;
+  private Writable [] params;
+
+  public HiveFunctionInvoke(FunctionDesc desc) {
+super(desc);
+params = new Writable[desc.getParamTypes().length];
+  }
+
+  @Override
+  public void init(FunctionInvokeContext context) throws IOException {
+UDFInvocationDesc udfDesc = functionDesc.getInvocation().getUDF();
+
+URL [] urls = new URL [] { new URL(udfDesc.getPath()) };
+URLClassLoader loader = new URLClassLoader(urls);
+
+try {
+  Class<?> udfclass = loader.loadClass(udfDesc.getName());
+  evalMethod = getEvaluateMethod(functionDesc.getParamTypes(), 
udfclass);
+} catch (ClassNotFoundException e) {
+  throw new TajoInternalError(e);
+}
+  }
+
+  private Method getEvaluateMethod(TajoDataTypes.DataType [] paramTypes, Class<?> clazz) {
+Constructor constructor = clazz.getConstructors()[0];
+
+try {
+  instance = constructor.newInstance();
+} catch 
(InstantiationException|IllegalAccessException|InvocationTargetException e) {
+  throw new TajoInternalError(e);
+}
+
+for (Method m: clazz.getMethods()) {
+  if (m.getName().equals("evaluate")) {
--- End diff --

To do that, the Tajo type would have to be converted to a Writable type, but the 
Writable type cannot be pinned down exactly; for example, the actual argument may 
be a subclass of the expected Writable.

So comparing Tajo types is more robust.
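The subclassing point can be illustrated with plain classes standing in for Writable types (Text and FancyText below are made-up stand-ins, not Hadoop's classes): an exact class comparison rejects a subclass argument, while an assignability check accepts it, which is one reason exact Writable matching is fragile.

```java
public class WritableMatchSketch {
    // Made-up stand-ins for Writable classes; FancyText extends Text the way a
    // custom Writable might extend a standard one.
    public static class Text {}
    public static class FancyText extends Text {}

    // Exact class comparison: rejects subclasses.
    public static boolean exactMatch(Class<?> declared, Class<?> actual) {
        return declared.equals(actual);
    }

    // Assignability check: accepts subclasses.
    public static boolean assignableMatch(Class<?> declared, Class<?> actual) {
        return declared.isAssignableFrom(actual);
    }

    public static void main(String[] args) {
        System.out.println(exactMatch(Text.class, FancyText.class));      // false
        System.out.println(assignableMatch(Text.class, FancyText.class)); // true
    }
}
```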




[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF

2016-03-22 Thread eminency
Github user eminency commented on a diff in the pull request:

https://github.com/apache/tajo/pull/929#discussion_r56941033
  
--- Diff: 
tajo-plan/src/main/java/org/apache/tajo/plan/function/HiveFunctionInvoke.java 
---
@@ -0,0 +1,113 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tajo.plan.function;
+
+import org.apache.hadoop.io.*;
+import org.apache.tajo.catalog.FunctionDesc;
+import org.apache.tajo.common.TajoDataTypes;
+import org.apache.tajo.datum.Datum;
+import org.apache.tajo.exception.TajoInternalError;
+import org.apache.tajo.exception.TajoRuntimeException;
+import org.apache.tajo.function.UDFInvocationDesc;
+import org.apache.tajo.storage.Tuple;
+import org.apache.tajo.util.WritableTypeConverter;
+
+import java.io.IOException;
+import java.lang.reflect.Constructor;
+import java.lang.reflect.InvocationTargetException;
+import java.lang.reflect.Method;
+import java.net.URL;
+import java.net.URLClassLoader;
+
+public class HiveFunctionInvoke extends FunctionInvoke implements 
Cloneable {
+  private Object instance = null;
+  private Method evalMethod = null;
+  private Writable [] params;
+
+  public HiveFunctionInvoke(FunctionDesc desc) {
+super(desc);
+params = new Writable[desc.getParamTypes().length];
+  }
+
+  @Override
+  public void init(FunctionInvokeContext context) throws IOException {
+UDFInvocationDesc udfDesc = functionDesc.getInvocation().getUDF();
+
+URL [] urls = new URL [] { new URL(udfDesc.getPath()) };
+URLClassLoader loader = new URLClassLoader(urls);
+
+try {
+  Class<?> udfclass = loader.loadClass(udfDesc.getName());
+  evalMethod = getEvaluateMethod(functionDesc.getParamTypes(), 
udfclass);
+} catch (ClassNotFoundException e) {
+  throw new TajoInternalError(e);
+}
+  }
+
+  private Method getEvaluateMethod(TajoDataTypes.DataType [] paramTypes, Class<?> clazz) {
+Constructor constructor = clazz.getConstructors()[0];
+
+try {
+  instance = constructor.newInstance();
+} catch 
(InstantiationException|IllegalAccessException|InvocationTargetException e) {
+  throw new TajoInternalError(e);
+}
+
+for (Method m: clazz.getMethods()) {
+  if (m.getName().equals("evaluate")) {
--- End diff --

That's a good point.




[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF

2016-03-21 Thread eminency
Github user eminency commented on a diff in the pull request:

https://github.com/apache/tajo/pull/929#discussion_r56934161
  
--- Diff: 
tajo-core/src/main/java/org/apache/tajo/engine/function/hiveudf/HiveFunctionLoader.java
 ---
@@ -0,0 +1,161 @@
+/***
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tajo.engine.function.hiveudf;
+
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.ql.exec.Description;
+import org.apache.hadoop.hive.ql.exec.UDF;
+import org.apache.hadoop.hive.ql.udf.UDFType;
+import org.apache.hadoop.io.Writable;
+import org.apache.tajo.catalog.FunctionDesc;
+import org.apache.tajo.catalog.FunctionDescBuilder;
+import org.apache.tajo.catalog.proto.CatalogProtos;
+import org.apache.tajo.common.TajoDataTypes;
+import org.apache.tajo.conf.TajoConf;
+import org.apache.tajo.exception.TajoInternalError;
+import org.apache.tajo.function.UDFInvocationDesc;
+import org.apache.tajo.util.WritableTypeConverter;
+import org.reflections.Reflections;
+import org.reflections.util.ConfigurationBuilder;
+
+import java.io.IOException;
+import java.lang.reflect.Method;
+import java.net.URL;
+import java.net.URLClassLoader;
+import java.util.*;
+
+public class HiveFunctionLoader {
+  public static Optional<List<FunctionDesc>> loadHiveUDFs(TajoConf conf) {
+ArrayList<FunctionDesc> funcList = new ArrayList<>();
+String udfdir = conf.getVar(TajoConf.ConfVars.HIVE_UDF_DIR);
+
+try {
+  Path udfPath = new Path(udfdir);
+  FileSystem fs = udfPath.getFileSystem(conf);
+
+  if (!fs.isDirectory(udfPath)) {
+return Optional.empty();
+  }
+
+  // loop each jar file
+  for (FileStatus fstatus : fs.listStatus(udfPath, (Path path) -> 
path.getName().endsWith(".jar"))) {
+
+URL[] urls = new URL[]{new URL("jar:" + 
fstatus.getPath().toUri().toURL() + "!/")};
+
+// extract and register UDF's descendants (legacy Hive UDF form)
+Set<Class<? extends UDF>> udfClasses = getSubclassesFromJarEntry(urls, UDF.class);
+if (udfClasses != null) {
+  buildFunctionsFromUDF(udfClasses, funcList, 
"jar:"+urls[0].getPath());
+}
+  }
+} catch (IOException e) {
+  throw new TajoInternalError(e);
+}
+
+return Optional.of(funcList);
+  }
+
+  private static <T> Set<Class<? extends T>> getSubclassesFromJarEntry(URL[] urls, Class<T> targetCls) {
+Reflections refl = new Reflections(new ConfigurationBuilder().
+setUrls(urls).
+addClassLoader(new URLClassLoader(urls)));
+
+return refl.getSubTypesOf(targetCls);
+  }
+
+  static void buildFunctionsFromUDF(Set<Class<? extends UDF>> classes, List<FunctionDesc> list, String jarurl) {
+for (Class<? extends UDF> clazz: classes) {
+  String [] names;
+  String value = null, extended = null;
+
+  Description desc = clazz.getAnnotation(Description.class);
+
+  // Check @Description annotation (if exists)
+  if (desc != null) {
+names = desc.name().split(",");
+for (int i=0; i

[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF

2016-03-10 Thread eminency
Github user eminency commented on a diff in the pull request:

https://github.com/apache/tajo/pull/929#discussion_r55786236
  
--- Diff: tajo-common/pom.xml ---
@@ -245,6 +249,12 @@
   powermock-api-mockito
   test
 
+    <dependency>
+      <groupId>org.apache.hive</groupId>
--- End diff --

I will move it into tajo-plan.




[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF

2016-03-10 Thread eminency
Github user eminency commented on a diff in the pull request:

https://github.com/apache/tajo/pull/929#discussion_r55785562
  
--- Diff: 
tajo-catalog/tajo-catalog-common/src/main/java/org/apache/tajo/catalog/FunctionDescBuilder.java
 ---
@@ -0,0 +1,105 @@
+/***
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tajo.catalog;
+
+import org.apache.tajo.catalog.proto.CatalogProtos.FunctionType;
+import org.apache.tajo.common.TajoDataTypes.DataType;
+import org.apache.tajo.function.*;
+
+public class FunctionDescBuilder {
--- End diff --

Actually, I found a key difference. PB's builder has a nested structure: it must 
build the children's PB objects first, and only then build its own, so it is 
complex to use.

The builder I wrote receives all of the needed information at the same level and 
then builds the FunctionDesc, including its child objects, in one step, so the 
caller does not have to think about the object hierarchy.

Which one do you think is better?
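For illustration, the flat approach described here can be sketched as below; FunctionDesc and Builder are simplified stand-ins chosen for this example, not the actual Tajo classes:

```java
public class FlatBuilderSketch {
    // Simplified stand-in for Tajo's FunctionDesc; the fields are illustrative.
    public static class FunctionDesc {
        public final String name;
        public final String returnType;
        public final String[] paramTypes;
        FunctionDesc(String name, String returnType, String[] paramTypes) {
            this.name = name;
            this.returnType = returnType;
            this.paramTypes = paramTypes;
        }
    }

    // Flat builder: all information is supplied at one level, and the nested
    // result object is assembled in a single build() call, so the caller never
    // deals with child builders.
    public static class Builder {
        private String name;
        private String returnType;
        private String[] paramTypes = new String[0];

        public Builder setName(String n) { name = n; return this; }
        public Builder setReturnType(String t) { returnType = t; return this; }
        public Builder setParams(String... p) { paramTypes = p; return this; }
        public FunctionDesc build() { return new FunctionDesc(name, returnType, paramTypes); }
    }

    public static void main(String[] args) {
        FunctionDesc d = new Builder()
                .setName("my_upper").setReturnType("text").setParams("text").build();
        System.out.println(d.name); // my_upper
    }
}
```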




[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF

2016-03-09 Thread eminency
Github user eminency commented on a diff in the pull request:

https://github.com/apache/tajo/pull/929#discussion_r55643624
  
--- Diff: 
tajo-catalog/tajo-catalog-common/src/main/java/org/apache/tajo/catalog/FunctionDescBuilder.java
 ---
@@ -0,0 +1,105 @@
+/***
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tajo.catalog;
+
+import org.apache.tajo.catalog.proto.CatalogProtos.FunctionType;
+import org.apache.tajo.common.TajoDataTypes.DataType;
+import org.apache.tajo.function.*;
+
+public class FunctionDescBuilder {
--- End diff --

I think it was just inconvenient, and I hadn't considered Proto.Builder.
It would be better to use it.




[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF

2016-03-08 Thread jihoonson
Github user jihoonson commented on the pull request:

https://github.com/apache/tajo/pull/929#issuecomment-194132955
  
Hi @eminency, I left some comments.
In addition, I got the error below when starting a Tajo cluster. I put the 
Hive UDF jar file on HDFS and specified its path in tajo-site.xml.

```
2016-03-09 14:34:06,613 FATAL org.apache.tajo.master.TajoMaster: Error 
starting TajoMaster
org.apache.tajo.exception.TajoInternalError: internal error: unknown 
protocol: hdfs
at 
org.apache.tajo.engine.function.hiveudf.HiveFunctionLoader.loadHiveUDFs(HiveFunctionLoader.java:70)
at 
org.apache.tajo.master.TajoMaster.loadFunctions(TajoMaster.java:229)
at 
org.apache.tajo.master.TajoMaster.serviceInit(TajoMaster.java:181)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.tajo.master.TajoMaster.main(TajoMaster.java:628)
```




[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF

2016-03-08 Thread jihoonson
Github user jihoonson commented on a diff in the pull request:

https://github.com/apache/tajo/pull/929#discussion_r55476149
  
--- Diff: 
tajo-plan/src/main/java/org/apache/tajo/plan/function/HiveFunctionInvoke.java 
---
@@ -0,0 +1,113 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tajo.plan.function;
+
+import org.apache.hadoop.io.*;
+import org.apache.tajo.catalog.FunctionDesc;
+import org.apache.tajo.common.TajoDataTypes;
+import org.apache.tajo.datum.Datum;
+import org.apache.tajo.exception.TajoInternalError;
+import org.apache.tajo.exception.TajoRuntimeException;
+import org.apache.tajo.function.UDFInvocationDesc;
+import org.apache.tajo.storage.Tuple;
+import org.apache.tajo.util.WritableTypeConverter;
+
+import java.io.IOException;
+import java.lang.reflect.Constructor;
+import java.lang.reflect.InvocationTargetException;
+import java.lang.reflect.Method;
+import java.net.URL;
+import java.net.URLClassLoader;
+
+public class HiveFunctionInvoke extends FunctionInvoke implements 
Cloneable {
+  private Object instance = null;
+  private Method evalMethod = null;
+  private Writable [] params;
+
+  public HiveFunctionInvoke(FunctionDesc desc) {
+super(desc);
+params = new Writable[desc.getParamTypes().length];
+  }
+
+  @Override
+  public void init(FunctionInvokeContext context) throws IOException {
+UDFInvocationDesc udfDesc = functionDesc.getInvocation().getUDF();
+
+URL [] urls = new URL [] { new URL(udfDesc.getPath()) };
+URLClassLoader loader = new URLClassLoader(urls);
+
+try {
+  Class<?> udfclass = loader.loadClass(udfDesc.getName());
+  evalMethod = getEvaluateMethod(functionDesc.getParamTypes(), 
udfclass);
+} catch (ClassNotFoundException e) {
+  throw new TajoInternalError(e);
+}
+  }
+
+  private Method getEvaluateMethod(TajoDataTypes.DataType [] paramTypes, Class<?> clazz) {
+Constructor constructor = clazz.getConstructors()[0];
+
+try {
+  instance = constructor.newInstance();
+} catch 
(InstantiationException|IllegalAccessException|InvocationTargetException e) {
+  throw new TajoInternalError(e);
+}
+
+for (Method m: clazz.getMethods()) {
+  if (m.getName().equals("evaluate")) {
+Class<?> [] methodParamTypes = m.getParameterTypes();
+if (checkParamTypes(methodParamTypes, paramTypes)) {
+  return m;
+}
+  }
+}
+
+return null;
--- End diff --

If no matching 'evaluate' method is found, an exception should be thrown instead 
of returning null. How about adding a new exception?
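A minimal sketch of the suggested fail-fast behavior; UndefinedEvaluateMethodException and the Sample class are hypothetical names chosen for illustration, not part of the patch:

```java
import java.lang.reflect.Method;
import java.util.Arrays;

public class EvaluateLookupSketch {
    // Hypothetical exception name; the actual patch would define its own type.
    public static class UndefinedEvaluateMethodException extends RuntimeException {
        public UndefinedEvaluateMethodException(String msg) { super(msg); }
    }

    // Made-up stand-in for a Hive UDF class.
    public static class Sample {
        public int evaluate(int x) { return x + 1; }
    }

    public static Method findEvaluate(Class<?> clazz, Class<?>... paramTypes) {
        for (Method m : clazz.getMethods()) {
            if (m.getName().equals("evaluate")
                    && Arrays.equals(m.getParameterTypes(), paramTypes)) {
                return m;
            }
        }
        // Fail fast instead of returning null.
        throw new UndefinedEvaluateMethodException(
                "no evaluate() matching the given parameter types in " + clazz.getName());
    }

    public static void main(String[] args) {
        System.out.println(findEvaluate(Sample.class, int.class).getName()); // evaluate
        try {
            findEvaluate(String.class, String.class); // String has no evaluate()
        } catch (UndefinedEvaluateMethodException e) {
            System.out.println("missing");
        }
    }
}
```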




[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF

2016-03-08 Thread jihoonson
Github user jihoonson commented on a diff in the pull request:

https://github.com/apache/tajo/pull/929#discussion_r55475892
  
--- Diff: 
tajo-plan/src/main/java/org/apache/tajo/plan/function/HiveFunctionInvoke.java 
---
@@ -0,0 +1,113 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tajo.plan.function;
+
+import org.apache.hadoop.io.*;
+import org.apache.tajo.catalog.FunctionDesc;
+import org.apache.tajo.common.TajoDataTypes;
+import org.apache.tajo.datum.Datum;
+import org.apache.tajo.exception.TajoInternalError;
+import org.apache.tajo.exception.TajoRuntimeException;
+import org.apache.tajo.function.UDFInvocationDesc;
+import org.apache.tajo.storage.Tuple;
+import org.apache.tajo.util.WritableTypeConverter;
+
+import java.io.IOException;
+import java.lang.reflect.Constructor;
+import java.lang.reflect.InvocationTargetException;
+import java.lang.reflect.Method;
+import java.net.URL;
+import java.net.URLClassLoader;
+
+public class HiveFunctionInvoke extends FunctionInvoke implements 
Cloneable {
+  private Object instance = null;
+  private Method evalMethod = null;
+  private Writable [] params;
+
+  public HiveFunctionInvoke(FunctionDesc desc) {
+super(desc);
+params = new Writable[desc.getParamTypes().length];
+  }
+
+  @Override
+  public void init(FunctionInvokeContext context) throws IOException {
+UDFInvocationDesc udfDesc = functionDesc.getInvocation().getUDF();
+
+URL [] urls = new URL [] { new URL(udfDesc.getPath()) };
+URLClassLoader loader = new URLClassLoader(urls);
+
+try {
+  Class<?> udfclass = loader.loadClass(udfDesc.getName());
+  evalMethod = getEvaluateMethod(functionDesc.getParamTypes(), 
udfclass);
+} catch (ClassNotFoundException e) {
+  throw new TajoInternalError(e);
+}
+  }
+
+  private Method getEvaluateMethod(TajoDataTypes.DataType [] paramTypes, Class<?> clazz) {
+Constructor constructor = clazz.getConstructors()[0];
+
+try {
+  instance = constructor.newInstance();
+} catch 
(InstantiationException|IllegalAccessException|InvocationTargetException e) {
+  throw new TajoInternalError(e);
+}
+
+for (Method m: clazz.getMethods()) {
+  if (m.getName().equals("evaluate")) {
--- End diff --

Class.getMethod(String name, Class<?>... parameterTypes) can simplify this 
for loop.
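A minimal sketch of this suggestion, using a made-up MyUpperUDF class in place of a class loaded from a user's jar. Note that getMethod matches parameter classes exactly, so the Tajo-to-Writable type mapping must already have produced the exact classes to pass in:

```java
import java.lang.reflect.Method;

public class GetMethodSketch {
    // Made-up UDF class standing in for a class loaded from a user's jar.
    public static class MyUpperUDF {
        public String evaluate(String s) { return s.toUpperCase(); }
    }

    // Direct lookup with Class.getMethod, replacing the loop over getMethods().
    public static String callEvaluate(String arg) throws Exception {
        Method m = MyUpperUDF.class.getMethod("evaluate", String.class);
        return (String) m.invoke(new MyUpperUDF(), arg);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(callEvaluate("abcd")); // ABCD
    }
}
```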




[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF

2016-03-08 Thread jihoonson
Github user jihoonson commented on a diff in the pull request:

https://github.com/apache/tajo/pull/929#discussion_r55475795
  
--- Diff: tajo-core/src/main/java/org/apache/tajo/master/TajoMaster.java ---
@@ -215,12 +216,48 @@ public void serviceInit(Configuration conf) throws 
Exception {
 LOG.info("Tajo Master is initialized.");
   }
 
+  private Collection<FunctionDesc> loadFunctions() throws IOException, AmbiguousFunctionException {
+List<FunctionDesc> functionList = new ArrayList<>(FunctionLoader.loadBuiltinFunctions().values());
+
+HashMap<Integer, FunctionDesc> funcSet = new HashMap<>();
+
+for (FunctionDesc desc: functionList) {
+  funcSet.put(desc.hashCodeWithoutType(), desc);
+}
+
+
checkUDFduplicateAndMerge(FunctionLoader.loadUserDefinedFunctions(systemConf), 
funcSet, functionList);
+checkUDFduplicateAndMerge(HiveFunctionLoader.loadHiveUDFs(systemConf), 
funcSet, functionList);
+
+return functionList;
+  }
+
+  /**
+   * Checks for duplicates between pre-loaded functions and UDFs, and merges them into funcList.
+   *
+   * @param udfs UDF list
+   * @param funcSet set for pre-loaded functions to match signature
+   * @param funcList list to be merged
+   * @throws AmbiguousFunctionException
+   */
+  private void checkUDFduplicateAndMerge(Optional<List<FunctionDesc>> udfs, HashMap<Integer, FunctionDesc> funcSet, List<FunctionDesc> funcList)
--- End diff --

It would be better if you add a test for this method.




[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF

2016-03-08 Thread jihoonson
Github user jihoonson commented on a diff in the pull request:

https://github.com/apache/tajo/pull/929#discussion_r55475465
  
--- Diff: 
tajo-core/src/main/java/org/apache/tajo/engine/function/hiveudf/HiveFunctionLoader.java
 ---
@@ -0,0 +1,161 @@
+/***
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tajo.engine.function.hiveudf;
+
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.ql.exec.Description;
+import org.apache.hadoop.hive.ql.exec.UDF;
+import org.apache.hadoop.hive.ql.udf.UDFType;
+import org.apache.hadoop.io.Writable;
+import org.apache.tajo.catalog.FunctionDesc;
+import org.apache.tajo.catalog.FunctionDescBuilder;
+import org.apache.tajo.catalog.proto.CatalogProtos;
+import org.apache.tajo.common.TajoDataTypes;
+import org.apache.tajo.conf.TajoConf;
+import org.apache.tajo.exception.TajoInternalError;
+import org.apache.tajo.function.UDFInvocationDesc;
+import org.apache.tajo.util.WritableTypeConverter;
+import org.reflections.Reflections;
+import org.reflections.util.ConfigurationBuilder;
+
+import java.io.IOException;
+import java.lang.reflect.Method;
+import java.net.URL;
+import java.net.URLClassLoader;
+import java.util.*;
+
+public class HiveFunctionLoader {
+  public static Optional<List<FunctionDesc>> loadHiveUDFs(TajoConf conf) {
+    ArrayList<FunctionDesc> funcList = new ArrayList<>();
+    String udfdir = conf.getVar(TajoConf.ConfVars.HIVE_UDF_DIR);
+
+try {
+  Path udfPath = new Path(udfdir);
+  FileSystem fs = udfPath.getFileSystem(conf);
+
+  if (!fs.isDirectory(udfPath)) {
+return Optional.empty();
+  }
+
+  // loop each jar file
+  for (FileStatus fstatus : fs.listStatus(udfPath, (Path path) -> 
path.getName().endsWith(".jar"))) {
+
+        URL[] urls = new URL[]{new URL("jar:" + fstatus.getPath().toUri().toURL() + "!/")};
+
+        // extract and register UDF's descendants (legacy Hive UDF form)
+        Set<Class<? extends UDF>> udfClasses = getSubclassesFromJarEntry(urls, UDF.class);
+        if (udfClasses != null) {
+          buildFunctionsFromUDF(udfClasses, funcList, "jar:" + urls[0].getPath());
+        }
+  }
+} catch (IOException e) {
+  throw new TajoInternalError(e);
+}
+
+return Optional.of(funcList);
+  }
+
+  private static <T> Set<Class<? extends T>> getSubclassesFromJarEntry(URL[] urls, Class<T> targetCls) {
+    Reflections refl = new Reflections(new ConfigurationBuilder().
+        setUrls(urls).
+        addClassLoader(new URLClassLoader(urls)));
+
+    return refl.getSubTypesOf(targetCls);
+  }
+
+  static void buildFunctionsFromUDF(Set<Class<? extends UDF>> classes, List<FunctionDesc> list, String jarurl) {
+for (Class clazz: classes) {
+  String [] names;
+  String value = null, extended = null;
+
+  Description desc = clazz.getAnnotation(Description.class);
+
+  // Check @Description annotation (if exists)
+  if (desc != null) {
+names = desc.name().split(",");
+for (int i=0; i

[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF

2016-03-08 Thread jihoonson
Github user jihoonson commented on a diff in the pull request:

https://github.com/apache/tajo/pull/929#discussion_r55474452
  
--- Diff: 
tajo-core/src/main/java/org/apache/tajo/engine/function/hiveudf/HiveFunctionLoader.java
 ---
@@ -0,0 +1,161 @@
+/***
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tajo.engine.function.hiveudf;
+
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.ql.exec.Description;
+import org.apache.hadoop.hive.ql.exec.UDF;
+import org.apache.hadoop.hive.ql.udf.UDFType;
+import org.apache.hadoop.io.Writable;
+import org.apache.tajo.catalog.FunctionDesc;
+import org.apache.tajo.catalog.FunctionDescBuilder;
+import org.apache.tajo.catalog.proto.CatalogProtos;
+import org.apache.tajo.common.TajoDataTypes;
+import org.apache.tajo.conf.TajoConf;
+import org.apache.tajo.exception.TajoInternalError;
+import org.apache.tajo.function.UDFInvocationDesc;
+import org.apache.tajo.util.WritableTypeConverter;
+import org.reflections.Reflections;
+import org.reflections.util.ConfigurationBuilder;
+
+import java.io.IOException;
+import java.lang.reflect.Method;
+import java.net.URL;
+import java.net.URLClassLoader;
+import java.util.*;
+
+public class HiveFunctionLoader {
+  public static Optional loadHiveUDFs(TajoConf conf) {
+ArrayList funcList = new ArrayList<>();
+String udfdir = conf.getVar(TajoConf.ConfVars.HIVE_UDF_DIR);
+
+try {
+  Path udfPath = new Path(udfdir);
+  FileSystem fs = udfPath.getFileSystem(conf);
+
+  if (!fs.isDirectory(udfPath)) {
+return Optional.empty();
+  }
+
+  // loop each jar file
+  for (FileStatus fstatus : fs.listStatus(udfPath, (Path path) -> 
path.getName().endsWith(".jar"))) {
+
+        URL[] urls = new URL[]{new URL("jar:" + fstatus.getPath().toUri().toURL() + "!/")};
+
+        // extract and register UDF's descendants (legacy Hive UDF form)
+        Set<Class<? extends UDF>> udfClasses = getSubclassesFromJarEntry(urls, UDF.class);
--- End diff --

Since getSubclassesFromJarEntry() can take an array of URLs, it would be
better to pull the getSubclassesFromJarEntry() and buildFunctionsFromUDF()
calls out of the for loop.




[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF

2016-03-08 Thread jihoonson
Github user jihoonson commented on a diff in the pull request:

https://github.com/apache/tajo/pull/929#discussion_r55474019
  
--- Diff: 
tajo-core-tests/src/test/java/org/apache/tajo/engine/function/hiveudf/TestHiveFunctionLoader.java
 ---
@@ -0,0 +1,149 @@
+/***
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tajo.engine.function.hiveudf;
+
+import com.google.common.base.Preconditions;
+import org.apache.hadoop.hive.ql.exec.UDF;
+import org.apache.tajo.TajoTestingCluster;
+import org.apache.tajo.catalog.CatalogService;
+import org.apache.tajo.catalog.CatalogUtil;
+import org.apache.tajo.catalog.FunctionDesc;
+import org.apache.tajo.catalog.proto.CatalogProtos;
+import org.apache.tajo.client.TajoClient;
+import org.apache.tajo.common.TajoDataTypes;
+import org.apache.tajo.conf.TajoConf;
+import org.junit.After;
+import org.junit.Before;
+import org.junit.Test;
+
+import static org.junit.Assert.assertEquals;
+
+import java.net.URL;
+import java.sql.ResultSet;
+import java.util.*;
+
+public class TestHiveFunctionLoader {
--- End diff --

IMO, it would be better to write this test class using ExprTestBase. Please
refer to TestPythonFunctions.
In addition, we need to separate the tests for HiveFunctionLoader from those
for Hive UDF execution.




[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF

2016-03-08 Thread jihoonson
Github user jihoonson commented on a diff in the pull request:

https://github.com/apache/tajo/pull/929#discussion_r55473420
  
--- Diff: tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java ---
@@ -309,6 +309,9 @@ public static int setDateOrder(int dateOrder) {
 PYTHON_CODE_DIR("tajo.function.python.code-dir", ""),
 PYTHON_CONTROLLER_LOG_DIR("tajo.function.python.controller.log-dir", 
""),
 
+// HIVE UDF
+HIVE_UDF_DIR("hive.udf.dir", "./lib/hiveudf"),
--- End diff --

How about changing to tajo.function.hive.code-dir?




[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF

2016-03-08 Thread jihoonson
Github user jihoonson commented on a diff in the pull request:

https://github.com/apache/tajo/pull/929#discussion_r55472456
  
--- Diff: 
tajo-common/src/main/java/org/apache/tajo/util/WritableTypeConverter.java ---
@@ -0,0 +1,128 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tajo.util;
+
+import org.apache.hadoop.hive.serde2.io.DateWritable;
+import org.apache.hadoop.hive.serde2.io.TimestampWritable;
+import org.apache.hadoop.io.*;
+import org.apache.tajo.common.TajoDataTypes.Type;
+import org.apache.tajo.common.TajoDataTypes.DataType;
+import org.apache.tajo.datum.*;
+import org.apache.tajo.exception.NotImplementedException;
+import org.apache.tajo.exception.TajoRuntimeException;
+import org.apache.tajo.exception.UnsupportedDataTypeException;
+import org.apache.tajo.util.datetime.DateTimeConstants;
+import org.apache.tajo.util.datetime.DateTimeUtil;
+import org.reflections.ReflectionUtils;
+
+import java.sql.Timestamp;
+import java.util.Set;
+
+public class WritableTypeConverter {
+
+  public static DataType convertWritableToTajoType(Class writableClass) {
+if (writableClass == null)
+  return null;
+
+DataType.Builder builder = DataType.newBuilder();
+    Set<Class<?>> parents = ReflectionUtils.getAllSuperTypes(writableClass);
+
+if (writableClass == ByteWritable.class || 
parents.contains(ByteWritable.class)) {
+  return builder.setType(Type.INT1).build();
+}
+if (writableClass == ShortWritable.class || 
parents.contains(ShortWritable.class)) {
+  return builder.setType(Type.INT2).build();
+}
+if (writableClass == IntWritable.class || 
parents.contains(IntWritable.class)) {
+  return builder.setType(Type.INT4).build();
+}
+if (writableClass == LongWritable.class || 
parents.contains(LongWritable.class)) {
+  return builder.setType(Type.INT8).build();
+}
+if (writableClass == Text.class || parents.contains(Text.class)) {
+  return builder.setType(Type.TEXT).build();
+}
+if (writableClass == FloatWritable.class || 
parents.contains(FloatWritable.class)) {
+  return builder.setType(Type.FLOAT4).build();
+}
+if (writableClass == DoubleWritable.class || 
parents.contains(DoubleWritable.class)) {
+  return builder.setType(Type.FLOAT8).build();
+}
+if (writableClass == DateWritable.class || 
parents.contains(DateWritable.class)) {
+  return builder.setType(Type.DATE).build();
+}
+if (writableClass == TimestampWritable.class || 
parents.contains(TimestampWritable.class)) {
+  return builder.setType(Type.TIMESTAMP).build();
+}
+if (writableClass == BytesWritable.class || 
parents.contains(BytesWritable.class)) {
+  return builder.setType(Type.VARBINARY).build();
+}
+
+throw new TajoRuntimeException(new 
NotImplementedException(writableClass.getSimpleName()));
+  }
+
+  public static Writable convertDatum2Writable(Datum value) {
--- End diff --

The CHAR type is missing.
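The converter's inheritance check can be reproduced with a plain superclass walk. A sketch (the PR uses `ReflectionUtils.getAllSuperTypes()`, which also collects interfaces; this stdlib version walks superclasses only, and `Number` stands in for a `Writable` base class):

```java
import java.util.HashSet;
import java.util.Set;

public class SuperTypeSketch {
    // Walk the superclass chain of cls -- a stdlib stand-in for the
    // ReflectionUtils.getAllSuperTypes() call in convertWritableToTajoType().
    static Set<Class<?>> allSuperclasses(Class<?> cls) {
        Set<Class<?>> parents = new HashSet<>();
        for (Class<?> c = cls.getSuperclass(); c != null; c = c.getSuperclass()) {
            parents.add(c);
        }
        return parents;
    }

    // Classify like the converter: match the class itself or any inherited
    // parent (Number here plays the role of e.g. IntWritable).
    static String classify(Class<?> cls) {
        if (cls == Number.class || allSuperclasses(cls).contains(Number.class)) {
            return "numeric";
        }
        return "other";
    }

    public static void main(String[] args) {
        System.out.println(classify(Long.class));   // numeric (Long extends Number)
        System.out.println(classify(String.class)); // other
    }
}
```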




[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF

2016-03-08 Thread jihoonson
Github user jihoonson commented on a diff in the pull request:

https://github.com/apache/tajo/pull/929#discussion_r55472450
  
--- Diff: 
tajo-common/src/main/java/org/apache/tajo/util/WritableTypeConverter.java ---
@@ -0,0 +1,128 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tajo.util;
+
+import org.apache.hadoop.hive.serde2.io.DateWritable;
+import org.apache.hadoop.hive.serde2.io.TimestampWritable;
+import org.apache.hadoop.io.*;
+import org.apache.tajo.common.TajoDataTypes.Type;
+import org.apache.tajo.common.TajoDataTypes.DataType;
+import org.apache.tajo.datum.*;
+import org.apache.tajo.exception.NotImplementedException;
+import org.apache.tajo.exception.TajoRuntimeException;
+import org.apache.tajo.exception.UnsupportedDataTypeException;
+import org.apache.tajo.util.datetime.DateTimeConstants;
+import org.apache.tajo.util.datetime.DateTimeUtil;
+import org.reflections.ReflectionUtils;
+
+import java.sql.Timestamp;
+import java.util.Set;
+
+public class WritableTypeConverter {
+
+  public static DataType convertWritableToTajoType(Class writableClass) {
--- End diff --

The CHAR type is missing.




[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF

2016-03-08 Thread jihoonson
Github user jihoonson commented on a diff in the pull request:

https://github.com/apache/tajo/pull/929#discussion_r55472204
  
--- Diff: 
tajo-core/src/main/java/org/apache/tajo/engine/function/hiveudf/HiveFunctionLoader.java
 ---
@@ -0,0 +1,161 @@
+/***
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tajo.engine.function.hiveudf;
+
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.ql.exec.Description;
+import org.apache.hadoop.hive.ql.exec.UDF;
+import org.apache.hadoop.hive.ql.udf.UDFType;
+import org.apache.hadoop.io.Writable;
+import org.apache.tajo.catalog.FunctionDesc;
+import org.apache.tajo.catalog.FunctionDescBuilder;
+import org.apache.tajo.catalog.proto.CatalogProtos;
+import org.apache.tajo.common.TajoDataTypes;
+import org.apache.tajo.conf.TajoConf;
+import org.apache.tajo.exception.TajoInternalError;
+import org.apache.tajo.function.UDFInvocationDesc;
+import org.apache.tajo.util.WritableTypeConverter;
+import org.reflections.Reflections;
+import org.reflections.util.ConfigurationBuilder;
+
+import java.io.IOException;
+import java.lang.reflect.Method;
+import java.net.URL;
+import java.net.URLClassLoader;
+import java.util.*;
+
+public class HiveFunctionLoader {
+  public static Optional<List<FunctionDesc>> loadHiveUDFs(TajoConf conf) {
+    ArrayList<FunctionDesc> funcList = new ArrayList<>();
+    String udfdir = conf.getVar(TajoConf.ConfVars.HIVE_UDF_DIR);
+
+try {
+  Path udfPath = new Path(udfdir);
+  FileSystem fs = udfPath.getFileSystem(conf);
+
+  if (!fs.isDirectory(udfPath)) {
--- End diff --

It would be good to log something here so users know what the problem is.




[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF

2016-03-08 Thread jihoonson
Github user jihoonson commented on a diff in the pull request:

https://github.com/apache/tajo/pull/929#discussion_r55472123
  
--- Diff: tajo-common/pom.xml ---
@@ -245,6 +249,12 @@
      <artifactId>powermock-api-mockito</artifactId>
      <scope>test</scope>
    </dependency>
+    <dependency>
+      <groupId>org.apache.hive</groupId>
--- End diff --

This can increase the size of the Tajo client. How about moving it to other
modules like tajo-core and tajo-plan?




[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF

2016-03-08 Thread jihoonson
Github user jihoonson commented on a diff in the pull request:

https://github.com/apache/tajo/pull/929#discussion_r55471853
  
--- Diff: 
tajo-catalog/tajo-catalog-common/src/main/java/org/apache/tajo/catalog/FunctionDescBuilder.java
 ---
@@ -0,0 +1,105 @@
+/***
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tajo.catalog;
+
+import org.apache.tajo.catalog.proto.CatalogProtos.FunctionType;
+import org.apache.tajo.common.TajoDataTypes.DataType;
+import org.apache.tajo.function.*;
+
+public class FunctionDescBuilder {
--- End diff --

It seems similar to FunctionDescProto.Builder. Do you have any reason for
adding a new class instead of using PB's builder?



[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF

2016-03-06 Thread jihoonson
Github user jihoonson commented on the pull request:

https://github.com/apache/tajo/pull/929#issuecomment-193074302
  
Sorry for the late review. I left some comments.




[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF

2016-03-06 Thread jihoonson
Github user jihoonson commented on a diff in the pull request:

https://github.com/apache/tajo/pull/929#discussion_r55159105
  
--- Diff: 
tajo-plan/src/main/java/org/apache/tajo/plan/function/HiveFunctionInvoke.java 
---
@@ -0,0 +1,112 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tajo.plan.function;
+
+import org.apache.hadoop.io.*;
+import org.apache.tajo.catalog.FunctionDesc;
+import org.apache.tajo.common.TajoDataTypes;
+import org.apache.tajo.datum.Datum;
+import org.apache.tajo.exception.TajoInternalError;
+import org.apache.tajo.function.UDFInvocationDesc;
+import org.apache.tajo.storage.Tuple;
+import org.apache.tajo.util.WritableTypeConverter;
+
+import java.io.IOException;
+import java.lang.reflect.Constructor;
+import java.lang.reflect.InvocationTargetException;
+import java.lang.reflect.Method;
+import java.net.URL;
+import java.net.URLClassLoader;
+
+public class HiveFunctionInvoke extends FunctionInvoke implements 
Cloneable {
+  private Object instance = null;
+  private Method evalMethod = null;
+  private Writable [] params;
+
+  public HiveFunctionInvoke(FunctionDesc desc) {
+super(desc);
+params = new Writable[desc.getParamTypes().length];
+  }
+
+  @Override
+  public void init(FunctionInvokeContext context) throws IOException {
+UDFInvocationDesc udfDesc = functionDesc.getInvocation().getUDF();
+
+URL [] urls = new URL [] { new URL(udfDesc.getPath()) };
+URLClassLoader loader = new URLClassLoader(urls);
+
+try {
+  Class udfclass = loader.loadClass(udfDesc.getName());
+  evalMethod = getEvaluateMethod(functionDesc.getParamTypes(), 
udfclass);
+} catch (ClassNotFoundException e) {
+  e.printStackTrace();
--- End diff --

Throwing an exception is required here instead of just printing the stack trace.
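For reference, the `loadClass` + `getMethod("evaluate", ...)` + `invoke` pattern from the quoted `init()` can be sketched with the stdlib alone. A local class stands in for a UDF that would really come from a jar via `URLClassLoader`:

```java
import java.lang.reflect.Method;

public class ReflectInvokeSketch {
    // Local stand-in for a legacy Hive UDF class loaded from a jar.
    public static class MyUpper {
        public String evaluate(String s) { return s.toUpperCase(); }
    }

    public static void main(String[] args) throws Exception {
        // In HiveFunctionInvoke.init() the class name and loader come from
        // the UDFInvocationDesc and a URLClassLoader over the UDF jar; here
        // we use the current loader with a known class name.
        Class<?> udfClass = Class.forName(MyUpper.class.getName());
        Object instance = udfClass.getDeclaredConstructor().newInstance();
        Method eval = udfClass.getMethod("evaluate", String.class);
        System.out.println(eval.invoke(instance, "tajo")); // TAJO
    }
}
```

A `ClassNotFoundException` at the `Class.forName` step is exactly the failure the reviewer wants propagated rather than swallowed.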




[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF

2016-03-06 Thread jihoonson
Github user jihoonson commented on a diff in the pull request:

https://github.com/apache/tajo/pull/929#discussion_r55157997
  
--- Diff: 
tajo-core/src/main/java/org/apache/tajo/engine/function/hiveudf/HiveFunctionLoader.java
 ---
@@ -0,0 +1,161 @@
+/***
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tajo.engine.function.hiveudf;
+
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.ql.exec.Description;
+import org.apache.hadoop.hive.ql.exec.UDF;
+import org.apache.hadoop.hive.ql.udf.UDFType;
+import org.apache.hadoop.io.Writable;
+import org.apache.tajo.catalog.FunctionDesc;
+import org.apache.tajo.catalog.FunctionDescBuilder;
+import org.apache.tajo.catalog.proto.CatalogProtos;
+import org.apache.tajo.common.TajoDataTypes;
+import org.apache.tajo.conf.TajoConf;
+import org.apache.tajo.exception.TajoInternalError;
+import org.apache.tajo.function.UDFInvocationDesc;
+import org.apache.tajo.util.WritableTypeConverter;
+import org.reflections.Reflections;
+import org.reflections.util.ConfigurationBuilder;
+
+import java.io.IOException;
+import java.lang.reflect.Method;
+import java.net.URL;
+import java.net.URLClassLoader;
+import java.util.*;
+
+public class HiveFunctionLoader {
+  public static Collection<FunctionDesc> loadHiveUDFs(TajoConf conf) {
+    ArrayList<FunctionDesc> funcList = new ArrayList<>();
+    String udfdir = conf.getVar(TajoConf.ConfVars.HIVE_UDF_DIR);
+
+try {
+  Path udfPath = new Path(udfdir);
+  FileSystem fs = udfPath.getFileSystem(conf);
+
+  if (!fs.isDirectory(udfPath)) {
+return null;
--- End diff --

How about using Optional?
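A sketch of the suggested change: returning `Optional.empty()` instead of `null` for the missing-directory case (local `Files.isDirectory` stands in for the Hadoop call; `loadFrom` and `"dummy_udf"` are illustrative):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

public class OptionalSketch {
    // Returning Optional.empty() instead of null makes the "no UDF directory"
    // case explicit at the call site.
    static Optional<List<String>> loadFrom(Path dir) {
        if (!Files.isDirectory(dir)) {
            return Optional.empty();
        }
        return Optional.of(List.of("dummy_udf"));
    }

    public static void main(String[] args) {
        // Caller can no longer forget the null check; it must unwrap.
        List<String> funcs = loadFrom(Path.of("/no/such/dir")).orElse(List.of());
        System.out.println(funcs.size()); // 0
    }
}
```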




[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF

2016-03-06 Thread jihoonson
Github user jihoonson commented on a diff in the pull request:

https://github.com/apache/tajo/pull/929#discussion_r55157850
  
--- Diff: 
tajo-common/src/main/java/org/apache/tajo/util/WritableTypeConverter.java ---
@@ -0,0 +1,127 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tajo.util;
+
+import org.apache.hadoop.hive.serde2.io.DateWritable;
+import org.apache.hadoop.hive.serde2.io.TimestampWritable;
+import org.apache.hadoop.io.*;
+import org.apache.tajo.common.TajoDataTypes.Type;
+import org.apache.tajo.common.TajoDataTypes.DataType;
+import org.apache.tajo.datum.*;
+import org.apache.tajo.exception.NotImplementedException;
+import org.apache.tajo.exception.TajoRuntimeException;
+import org.apache.tajo.util.datetime.DateTimeConstants;
+import org.apache.tajo.util.datetime.DateTimeUtil;
+import org.reflections.ReflectionUtils;
+
+import java.sql.Timestamp;
+import java.util.Set;
+
+public class WritableTypeConverter {
+
+  public static DataType convertWritableToTajoType(Class writableClass) {
+if (writableClass == null)
+  return null;
+
+DataType.Builder builder = DataType.newBuilder();
+    Set<Class<?>> parents = ReflectionUtils.getAllSuperTypes(writableClass);
+
+if (writableClass == ByteWritable.class || 
parents.contains(ByteWritable.class)) {
+  return builder.setType(Type.INT1).build();
+}
+if (writableClass == ShortWritable.class || 
parents.contains(ShortWritable.class)) {
+  return builder.setType(Type.INT2).build();
+}
+if (writableClass == IntWritable.class || 
parents.contains(IntWritable.class)) {
+  return builder.setType(Type.INT4).build();
+}
+if (writableClass == LongWritable.class || 
parents.contains(LongWritable.class)) {
+  return builder.setType(Type.INT8).build();
+}
+if (writableClass == Text.class || parents.contains(Text.class)) {
+  return builder.setType(Type.TEXT).build();
+}
+if (writableClass == FloatWritable.class || 
parents.contains(FloatWritable.class)) {
+  return builder.setType(Type.FLOAT4).build();
+}
+if (writableClass == DoubleWritable.class || 
parents.contains(DoubleWritable.class)) {
+  return builder.setType(Type.FLOAT8).build();
+}
+if (writableClass == DateWritable.class || 
parents.contains(DateWritable.class)) {
+  return builder.setType(Type.DATE).build();
+}
+if (writableClass == TimestampWritable.class || 
parents.contains(TimestampWritable.class)) {
+  return builder.setType(Type.TIMESTAMP).build();
+}
+if (writableClass == BytesWritable.class || 
parents.contains(BytesWritable.class)) {
+  return builder.setType(Type.VARBINARY).build();
+}
+
+throw new TajoRuntimeException(new 
NotImplementedException(writableClass.getSimpleName()));
+  }
+
+  public static Writable convertDatum2Writable(Datum value) {
+switch(value.type()) {
+  case INT1: return new ByteWritable(value.asByte());
+  case INT2: return new ShortWritable(value.asInt2());
+  case INT4: return new IntWritable(value.asInt4());
+  case INT8: return new LongWritable(value.asInt8());
+
+  case FLOAT4: return new FloatWritable(value.asFloat4());
+  case FLOAT8: return new DoubleWritable(value.asFloat8());
+
+  // NOTE: value should be DateDatum
+  case DATE: return new DateWritable(value.asInt4() - 
DateTimeConstants.UNIX_EPOCH_JDATE);
+
+  // NOTE: value should be TimestampDatum
+  case TIMESTAMP:
+TimestampWritable result = new TimestampWritable();
+result.setTime(DateTimeUtil.julianTimeToJavaTime(value.asInt8()));
+return result;
+
+  case TEXT: return new Text(value.asChars());
+  case VARBINARY: return new BytesWritable(value.asByteArray());
+}
+
+throw new 

[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF

2016-03-06 Thread jihoonson
Github user jihoonson commented on a diff in the pull request:

https://github.com/apache/tajo/pull/929#discussion_r55157868
  
--- Diff: tajo-common/src/test/java/org/apache/tajo/util/TestWritableTypeConverter.java ---
@@ -0,0 +1,65 @@
+/*
+ *  Licensed to the Apache Software Foundation (ASF) under one
+ *  or more contributor license agreements.  See the NOTICE file
+ *  distributed with this work for additional information
+ *  regarding copyright ownership.  The ASF licenses this file
+ *  to you under the Apache License, Version 2.0 (the
+ *  "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ */
+
+package org.apache.tajo.util;
+
+import org.apache.hadoop.hive.serde2.io.DateWritable;
+import org.apache.hadoop.hive.serde2.io.TimestampWritable;
+import org.apache.hadoop.io.Writable;
+import org.apache.tajo.datum.DateDatum;
+import org.apache.tajo.datum.Datum;
+import org.apache.tajo.datum.TimestampDatum;
+import org.apache.tajo.util.datetime.DateTimeUtil;
+import org.apache.tajo.util.datetime.TimeMeta;
+import org.junit.Test;
+
+import static org.junit.Assert.*;
+
+public class TestWritableTypeConverter {
--- End diff --

It would be better if you add some negative tests.
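A negative test of that kind could look like the following sketch in plain Java. The converter here is a tiny stand-in map, not Tajo's `WritableTypeConverter`, and the class and type names are only illustrative:

```java
import java.util.Map;

// Sketch of a negative test: an unsupported input class should raise an
// exception rather than silently mis-convert. The converter is a stand-in.
public class NegativeConversionTest {
  static final Map<Class<?>, String> SUPPORTED = Map.of(
      Integer.class, "INT4",
      Long.class, "INT8",
      String.class, "TEXT");

  static String convert(Class<?> cls) {
    String type = SUPPORTED.get(cls);
    if (type == null) {
      throw new UnsupportedOperationException("not implemented: " + cls.getSimpleName());
    }
    return type;
  }

  public static void main(String[] args) {
    // Positive case: a supported class maps to a type name.
    System.out.println(convert(Long.class));
    // Negative case: the unsupported class must throw.
    try {
      convert(Double.class);
      throw new AssertionError("expected an exception");
    } catch (UnsupportedOperationException expected) {
      System.out.println("rejected: " + expected.getMessage());
    }
  }
}
```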




[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF

2016-02-24 Thread eminency
Github user eminency commented on the pull request:

https://github.com/apache/tajo/pull/929#issuecomment-188273376
  
@jihoonson 
I applied most of your suggestions.
Could you check them again?




[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF

2016-02-16 Thread eminency
Github user eminency commented on a diff in the pull request:

https://github.com/apache/tajo/pull/929#discussion_r53128977
  
--- Diff: tajo-core/src/main/java/org/apache/tajo/worker/TajoWorker.java ---
@@ -200,6 +200,7 @@ public void serviceInit(Configuration conf) throws Exception {
     historyReader = new HistoryReader(workerContext.getWorkerName(), this.systemConf);
 
     FunctionLoader.loadUserDefinedFunctions(systemConf);
+    HiveFunctionLoader.loadHiveUDFs(systemConf);
--- End diff --

If any duplicate exists, it is already designed to raise 
AmbiguousFunctionException.
So this point might be skipped.
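The duplicate check being referred to can be sketched with plain JDK collections. `FunctionRegistry` and the `IllegalStateException` below are stand-ins for Tajo's catalog and its `AmbiguousFunctionException`, and the signature strings are made up:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of duplicate-signature detection: registering the same
// function signature twice raises an error instead of overwriting.
public class FunctionRegistry {
  private final Map<String, String> functions = new HashMap<>();

  public void register(String signature, String className) {
    if (functions.containsKey(signature)) {
      throw new IllegalStateException("ambiguous function: " + signature);
    }
    functions.put(signature, className);
  }

  public static void main(String[] args) {
    FunctionRegistry reg = new FunctionRegistry();
    reg.register("upper(text)", "MyUpper");
    try {
      reg.register("upper(text)", "OtherUpper");  // duplicate: must fail
    } catch (IllegalStateException expected) {
      System.out.println(expected.getMessage());
    }
  }
}
```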




[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF

2016-02-11 Thread eminency
Github user eminency commented on a diff in the pull request:

https://github.com/apache/tajo/pull/929#discussion_r52709548
  
--- Diff: tajo-plan/src/main/java/org/apache/tajo/plan/function/HiveFunctionInvoke.java ---
@@ -0,0 +1,95 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tajo.plan.function;
+
+import org.apache.hadoop.io.*;
+import org.apache.tajo.catalog.FunctionDesc;
+import org.apache.tajo.datum.Datum;
+import org.apache.tajo.exception.TajoInternalError;
+import org.apache.tajo.function.UDFInvocationDesc;
+import org.apache.tajo.storage.Tuple;
+import org.apache.tajo.util.WritableTypeConverter;
+
+import java.io.IOException;
+import java.lang.reflect.Constructor;
+import java.lang.reflect.InvocationTargetException;
+import java.lang.reflect.Method;
+import java.net.URL;
+import java.net.URLClassLoader;
+
+public class HiveFunctionInvoke extends FunctionInvoke implements Cloneable {
+  private Object instance = null;
+  private Method evalMethod = null;
+
+  public HiveFunctionInvoke(FunctionDesc desc) {
+    super(desc);
+  }
+
+  @Override
+  public void init(FunctionInvokeContext context) throws IOException {
+    UDFInvocationDesc udfDesc = functionDesc.getInvocation().getUDF();
+
+    URL [] urls = new URL [] { new URL(udfDesc.getPath()) };
+    URLClassLoader loader = new URLClassLoader(urls);
+
+    try {
+      Class<?> udfclass = loader.loadClass(udfDesc.getName());
+      evalMethod = getEvaluateMethod(udfclass);
+    } catch (ClassNotFoundException e) {
+      e.printStackTrace();
+    }
+  }
+
+  private Method getEvaluateMethod(Class<?> clazz) {
+    Constructor<?> constructor = clazz.getConstructors()[0];
+
+    try {
+      instance = constructor.newInstance();
+    } catch (InstantiationException|IllegalAccessException|InvocationTargetException e) {
+      throw new TajoInternalError(e);
+    }
+
+    for (Method m: clazz.getMethods()) {
+      if (m.getName().equals("evaluate")) {
+        return m;
+      }
+    }
+
+    return null;
+  }
+
+  @Override
+  public Datum eval(Tuple tuple) {
+    Datum resultDatum;
+    Writable [] params = new Writable[tuple.size()];
--- End diff --

Good idea!




[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF

2016-02-04 Thread jihoonson
Github user jihoonson commented on the pull request:

https://github.com/apache/tajo/pull/929#issuecomment-179742455
  
Hi @eminency, thanks for your work. It seems good overall.
I left some comments. In addition, would you briefly evaluate the 
performance of Hive UDF call? I wonder how large its overhead is.
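The overhead in question comes mostly from the reflective `Method.invoke()` used in `HiveFunctionInvoke`. A rough, self-contained micro-benchmark sketch (not a rigorous JMH run, and not Tajo code; the `add` method is only illustrative) would be:

```java
import java.lang.reflect.Method;

// Rough comparison of a direct call versus a reflective Method.invoke()
// of the same trivial method, to get a feel for the per-call overhead.
public class InvokeOverhead {
  public static long add(long a, long b) { return a + b; }

  public static void main(String[] args) throws Exception {
    Method m = InvokeOverhead.class.getMethod("add", long.class, long.class);
    int n = 1_000_000;
    long sink = 0;

    long t0 = System.nanoTime();
    for (int i = 0; i < n; i++) sink += add(i, i);          // direct calls
    long direct = System.nanoTime() - t0;

    t0 = System.nanoTime();
    for (int i = 0; i < n; i++) {
      sink += (Long) m.invoke(null, (long) i, (long) i);    // reflective calls
    }
    long reflective = System.nanoTime() - t0;

    System.out.printf("direct: %d ns, reflective: %d ns (sink=%d)%n",
        direct, reflective, sink);
  }
}
```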




[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF

2016-02-04 Thread jihoonson
Github user jihoonson commented on a diff in the pull request:

https://github.com/apache/tajo/pull/929#discussion_r51850151
  
--- Diff: tajo-common/src/main/java/org/apache/tajo/util/WritableTypeConverter.java ---
@@ -0,0 +1,103 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tajo.util;
+
+import org.apache.hadoop.io.*;
+import org.apache.tajo.common.TajoDataTypes.Type;
+import org.apache.tajo.common.TajoDataTypes.DataType;
+import org.apache.tajo.datum.*;
+import org.apache.tajo.exception.NotImplementedException;
+import org.apache.tajo.exception.TajoRuntimeException;
+import org.reflections.ReflectionUtils;
+
+import java.util.Set;
+
+public class WritableTypeConverter {
+
+  public static DataType convertWritableToTajoType(Class<? extends Writable> writableClass) {
+    if (writableClass == null)
+      return null;
+
+    DataType.Builder builder = DataType.newBuilder();
+    Set<Class<?>> parents = ReflectionUtils.getAllSuperTypes(writableClass);
+
+    if (writableClass == ByteWritable.class || parents.contains(ByteWritable.class)) {
+      return builder.setType(Type.INT1).build();
+    }
+    if (writableClass == ShortWritable.class || parents.contains(ShortWritable.class)) {
+      return builder.setType(Type.INT2).build();
+    }
+    if (writableClass == IntWritable.class || parents.contains(IntWritable.class)) {
+      return builder.setType(Type.INT4).build();
+    }
+    if (writableClass == LongWritable.class || parents.contains(LongWritable.class)) {
+      return builder.setType(Type.INT8).build();
+    }
+    if (writableClass == Text.class || parents.contains(Text.class)) {
+      return builder.setType(Type.TEXT).build();
+    }
+    if (writableClass == FloatWritable.class || parents.contains(FloatWritable.class)) {
+      return builder.setType(Type.FLOAT4).build();
+    }
+    if (writableClass == DoubleWritable.class || parents.contains(DoubleWritable.class)) {
+      return builder.setType(Type.FLOAT8).build();
+    }
+    if (writableClass == BytesWritable.class || parents.contains(BytesWritable.class)) {
+      return builder.setType(Type.VARBINARY).build();
+    }
+
+    throw new TajoRuntimeException(new NotImplementedException(writableClass.getSimpleName()));
+  }
+
+  public static Writable convertDatum2Writable(Datum value) {
+    switch(value.type()) {
+      case INT1: return new ByteWritable(value.asByte());
+      case INT2: return new ShortWritable(value.asInt2());
+      case INT4: return new IntWritable(value.asInt4());
+      case INT8: return new LongWritable(value.asInt8());
+
+      case FLOAT4: return new FloatWritable(value.asFloat4());
+      case FLOAT8: return new DoubleWritable(value.asFloat8());
+
+      case TEXT: return new Text(value.asChars());
+      case VARBINARY: return new BytesWritable(value.asByteArray());
--- End diff --

We need to support date, time, timestamp, and boolean types as well.
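For the DATE case, the arithmetic later adopted by the patch subtracts the Julian day number of the Unix epoch from Tajo's stored Julian day. A minimal sketch of that conversion, assuming the constant 2440588 (the Julian day of 1970-01-01, which Tajo keeps as `DateTimeConstants.UNIX_EPOCH_JDATE`):

```java
import java.time.LocalDate;

// Sketch of the DATE conversion arithmetic: Tajo stores dates as Julian day
// numbers, while Hive's DateWritable stores days since the Unix epoch.
public class DateConversion {
  static final int UNIX_EPOCH_JDATE = 2440588;  // Julian day of 1970-01-01

  static long julianDayToEpochDay(int julianDay) {
    return julianDay - UNIX_EPOCH_JDATE;
  }

  public static void main(String[] args) {
    // Round-trip: build the Julian day for a date, convert, and recover it.
    int julianDay = UNIX_EPOCH_JDATE + (int) LocalDate.of(2016, 2, 4).toEpochDay();
    System.out.println(LocalDate.ofEpochDay(julianDayToEpochDay(julianDay)));
  }
}
```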




[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF

2016-02-04 Thread jihoonson
Github user jihoonson commented on a diff in the pull request:

https://github.com/apache/tajo/pull/929#discussion_r51849874
  
--- Diff: tajo-plan/src/main/java/org/apache/tajo/plan/function/HiveFunctionInvoke.java ---
@@ -0,0 +1,95 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tajo.plan.function;
+
+import org.apache.hadoop.io.*;
+import org.apache.tajo.catalog.FunctionDesc;
+import org.apache.tajo.datum.Datum;
+import org.apache.tajo.exception.TajoInternalError;
+import org.apache.tajo.function.UDFInvocationDesc;
+import org.apache.tajo.storage.Tuple;
+import org.apache.tajo.util.WritableTypeConverter;
+
+import java.io.IOException;
+import java.lang.reflect.Constructor;
+import java.lang.reflect.InvocationTargetException;
+import java.lang.reflect.Method;
+import java.net.URL;
+import java.net.URLClassLoader;
+
+public class HiveFunctionInvoke extends FunctionInvoke implements Cloneable {
+  private Object instance = null;
+  private Method evalMethod = null;
+
+  public HiveFunctionInvoke(FunctionDesc desc) {
+    super(desc);
+  }
+
+  @Override
+  public void init(FunctionInvokeContext context) throws IOException {
+    UDFInvocationDesc udfDesc = functionDesc.getInvocation().getUDF();
+
+    URL [] urls = new URL [] { new URL(udfDesc.getPath()) };
+    URLClassLoader loader = new URLClassLoader(urls);
+
+    try {
+      Class<?> udfclass = loader.loadClass(udfDesc.getName());
+      evalMethod = getEvaluateMethod(udfclass);
+    } catch (ClassNotFoundException e) {
+      e.printStackTrace();
+    }
+  }
+
+  private Method getEvaluateMethod(Class<?> clazz) {
+    Constructor<?> constructor = clazz.getConstructors()[0];
+
+    try {
+      instance = constructor.newInstance();
+    } catch (InstantiationException|IllegalAccessException|InvocationTargetException e) {
+      throw new TajoInternalError(e);
+    }
+
+    for (Method m: clazz.getMethods()) {
+      if (m.getName().equals("evaluate")) {
+        return m;
+      }
+    }
+
+    return null;
+  }
+
+  @Override
+  public Datum eval(Tuple tuple) {
+    Datum resultDatum;
+    Writable [] params = new Writable[tuple.size()];
--- End diff --

How about making this ```params``` a member variable? Its contents can be 
overwritten per ```eval()``` call instead of creating a new array each time.
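The suggested reuse pattern, as a minimal stand-alone sketch (the class and method names are hypothetical, not from the patch):

```java
// Allocate the argument array once and overwrite its slots on every call,
// rather than allocating a fresh array per eval() invocation.
public class ReusableArgs {
  private final Object[] params;   // reused across calls

  public ReusableArgs(int arity) {
    this.params = new Object[arity];
  }

  public Object[] fill(Object... values) {
    for (int i = 0; i < params.length; i++) {
      params[i] = values[i];       // overwrite in place, no new allocation
    }
    return params;
  }

  public static void main(String[] args) {
    ReusableArgs buf = new ReusableArgs(2);
    Object[] first = buf.fill("a", "b");
    Object[] second = buf.fill("c", "d");
    System.out.println(first == second);  // same array instance both times
  }
}
```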




[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF

2016-02-04 Thread jihoonson
Github user jihoonson commented on a diff in the pull request:

https://github.com/apache/tajo/pull/929#discussion_r51850040
  
--- Diff: tajo-core/src/main/java/org/apache/tajo/engine/function/hiveudf/HiveFunctionLoader.java ---
@@ -0,0 +1,154 @@
+/***
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tajo.engine.function.hiveudf;
+
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.ql.exec.Description;
+import org.apache.hadoop.hive.ql.exec.UDF;
+import org.apache.hadoop.hive.ql.udf.UDFType;
+import org.apache.tajo.catalog.FunctionDesc;
+import org.apache.tajo.catalog.FunctionDescBuilder;
+import org.apache.tajo.catalog.proto.CatalogProtos;
+import org.apache.tajo.common.TajoDataTypes;
+import org.apache.tajo.conf.TajoConf;
+import org.apache.tajo.exception.TajoInternalError;
+import org.apache.tajo.function.UDFInvocationDesc;
+import org.apache.tajo.util.WritableTypeConverter;
+import org.reflections.Reflections;
+import org.reflections.util.ConfigurationBuilder;
+
+import java.io.IOException;
+import java.lang.reflect.Method;
+import java.net.URL;
+import java.net.URLClassLoader;
+import java.util.*;
+
+public class HiveFunctionLoader {
+  public static Collection<FunctionDesc> loadHiveUDFs(TajoConf conf) {
+    ArrayList<FunctionDesc> funcList = new ArrayList<>();
+    String udfdir = conf.getVar(TajoConf.ConfVars.HIVE_UDF_DIR);
+
+    try {
+      FileSystem localFS = FileSystem.getLocal(conf);
--- End diff --

Hive basically assumes that the Jar files are stored on HDFS. We also need 
to support it.
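The scheme-based dispatch this implies can be sketched with plain `java.net.URI`. In the actual Hadoop API it would amount to `FileSystem.get(path.toUri(), conf)` instead of `FileSystem.getLocal(conf)`; the class below is only illustrative:

```java
import java.net.URI;

// Resolve the UDF jar directory against its URI scheme, so that both local
// paths and HDFS paths can be handled instead of assuming the local FS.
public class UdfDirScheme {
  static String describe(String dir) {
    URI uri = URI.create(dir);
    String scheme = uri.getScheme() == null ? "file" : uri.getScheme();
    switch (scheme) {
      case "file": return "local filesystem";
      case "hdfs": return "HDFS";
      default:     return "other: " + scheme;
    }
  }

  public static void main(String[] args) {
    System.out.println(describe("file:///opt/tajo/hiveudfs"));
    System.out.println(describe("hdfs://namenode:8020/udfs"));
    System.out.println(describe("/opt/tajo/hiveudfs"));  // no scheme: local
  }
}
```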




[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF

2016-02-04 Thread jihoonson
Github user jihoonson commented on a diff in the pull request:

https://github.com/apache/tajo/pull/929#discussion_r51850654
  
--- Diff: tajo-core/src/main/java/org/apache/tajo/worker/TajoWorker.java ---
@@ -200,6 +200,7 @@ public void serviceInit(Configuration conf) throws Exception {
     historyReader = new HistoryReader(workerContext.getWorkerName(), this.systemConf);
 
     FunctionLoader.loadUserDefinedFunctions(systemConf);
+    HiveFunctionLoader.loadHiveUDFs(systemConf);
--- End diff --

It seems that TajoWorker also needs to check duplicated functions as 
TajoMaster. In addition, it will be better if the routine to load functions is 
consistent no matter where it is called from.




[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF

2016-02-04 Thread eminency
Github user eminency commented on the pull request:

https://github.com/apache/tajo/pull/929#issuecomment-179781185
  
I missed many things... I will make the code handle them soon.
UDF call test also will be done ASAP.
Thanks @jihoonson .




[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF

2016-02-04 Thread jihoonson
Github user jihoonson commented on a diff in the pull request:

https://github.com/apache/tajo/pull/929#discussion_r51849575
  
--- Diff: tajo-plan/src/main/java/org/apache/tajo/plan/function/HiveFunctionInvoke.java ---
@@ -0,0 +1,95 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tajo.plan.function;
+
+import org.apache.hadoop.io.*;
+import org.apache.tajo.catalog.FunctionDesc;
+import org.apache.tajo.datum.Datum;
+import org.apache.tajo.exception.TajoInternalError;
+import org.apache.tajo.function.UDFInvocationDesc;
+import org.apache.tajo.storage.Tuple;
+import org.apache.tajo.util.WritableTypeConverter;
+
+import java.io.IOException;
+import java.lang.reflect.Constructor;
+import java.lang.reflect.InvocationTargetException;
+import java.lang.reflect.Method;
+import java.net.URL;
+import java.net.URLClassLoader;
+
+public class HiveFunctionInvoke extends FunctionInvoke implements Cloneable {
+  private Object instance = null;
+  private Method evalMethod = null;
+
+  public HiveFunctionInvoke(FunctionDesc desc) {
+    super(desc);
+  }
+
+  @Override
+  public void init(FunctionInvokeContext context) throws IOException {
+    UDFInvocationDesc udfDesc = functionDesc.getInvocation().getUDF();
+
+    URL [] urls = new URL [] { new URL(udfDesc.getPath()) };
+    URLClassLoader loader = new URLClassLoader(urls);
+
+    try {
+      Class<?> udfclass = loader.loadClass(udfDesc.getName());
+      evalMethod = getEvaluateMethod(udfclass);
+    } catch (ClassNotFoundException e) {
+      e.printStackTrace();
+    }
+  }
+
+  private Method getEvaluateMethod(Class<?> clazz) {
+    Constructor<?> constructor = clazz.getConstructors()[0];
+
+    try {
+      instance = constructor.newInstance();
+    } catch (InstantiationException|IllegalAccessException|InvocationTargetException e) {
+      throw new TajoInternalError(e);
+    }
+
+    for (Method m: clazz.getMethods()) {
+      if (m.getName().equals("evaluate")) {
+        return m;
--- End diff --

If there are several ```evaluate``` overloads accepting different 
arguments, this looks like it could return the wrong one.
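A stricter lookup that matches the argument types, sketched with plain JDK reflection (`OverloadResolver` and `SampleUdf` are hypothetical, not the patch's code):

```java
import java.lang.reflect.Method;

// Pick the overload whose parameter types accept the given argument classes,
// instead of returning the first method named "evaluate".
public class OverloadResolver {
  // Example target with two same-named overloads.
  public static class SampleUdf {
    public String evaluate(String s) { return s.toUpperCase(); }
    public Long evaluate(Long a, Long b) { return a + b; }
  }

  public static Method resolve(Class<?> clazz, String name, Class<?>... argTypes) {
    for (Method m : clazz.getMethods()) {
      if (!m.getName().equals(name)) continue;
      Class<?>[] params = m.getParameterTypes();
      if (params.length != argTypes.length) continue;
      boolean ok = true;
      for (int i = 0; i < params.length; i++) {
        if (!params[i].isAssignableFrom(argTypes[i])) { ok = false; break; }
      }
      if (ok) return m;    // first overload compatible with the arguments
    }
    return null;           // no matching overload
  }

  public static void main(String[] args) throws Exception {
    Method one = resolve(SampleUdf.class, "evaluate", String.class);
    Method two = resolve(SampleUdf.class, "evaluate", Long.class, Long.class);
    System.out.println(one.invoke(new SampleUdf(), "tajo"));   // TAJO
    System.out.println(two.invoke(new SampleUdf(), 1L, 2L));   // 3
  }
}
```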




[GitHub] tajo pull request: TAJO-1686: Allow Tajo to use Hive UDF

2016-01-06 Thread eminency
GitHub user eminency opened a pull request:

https://github.com/apache/tajo/pull/929

TAJO-1686: Allow Tajo to use Hive UDF

This patch consists of parts below.

 Core Logic 
1. **TajoMaster.java / HiveFunctionLoader.java** : in charge of finding 
Hive UDFs and registering them.
2. **HiveFunctionInvoke.java** : to call Hive UDF

 Utility
1. **WritableTypeConverter.java** : utility class, to convert from Writable 
type to Datum and vice versa.
2. **FunctionDescBuilder.java** : helper/builder to build FunctionDesc more 
conveniently.

and others.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/eminency/tajo hive_udf

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/tajo/pull/929.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #929


commit 8e2cafb21f6d149dc3bec7a3c7a845d54e678672
Author: Jongyoung Park 
Date:   2015-10-23T08:21:31Z

base

commit 070f3610442199edaa55e7f76c5343c549a66a01
Author: Jongyoung Park 
Date:   2015-10-27T07:41:41Z

Method name refactoring

commit 7efc29eefe6d78837cc6a817d1f1c68d0f550fce
Author: Jongyoung Park 
Date:   2015-10-27T07:55:08Z

Remove a simple method

commit 3a803f19a7758b18d5a592a4f44d95c071cf6f14
Author: Jongyoung Park 
Date:   2015-10-29T03:21:41Z

get UDF classes

commit 0fdfa0903f5d4922bf5d1c484013ec079770dede
Author: Jongyoung Park 
Date:   2015-10-29T03:22:07Z

refactor for Java 8

commit 1cd0d5a04b88d6fc2b7bf11f7c85089a7eef375b
Author: Jongyoung Park 
Date:   2015-10-29T03:23:03Z

test for local

commit 98f65a6001b4619e12ee34bc95a13c367b5ad1a9
Author: Jongyoung Park 
Date:   2015-10-29T07:11:12Z

name and deterministic

commit 6a69fdd2eab72380535e5b959e26ced1b55b811a
Author: Jongyoung Park 
Date:   2015-10-29T07:11:28Z

variable name refactoring

commit a0fe3b21114ab67f710585485891ac787782e964
Author: Jongyoung Park 
Date:   2015-10-29T07:29:55Z

remove useless enum

commit 6dcd24b537a0b245a079ef8db1c411a0a7563b19
Author: Jongyoung Park 
Date:   2015-10-29T07:31:32Z

Revert "remove useless enum"

This reverts commit a8739670ffb688b3d2c2ea5b8e8422d21059cf92.

commit 055a6db84feeb185c279b1b41e61190ede0ca155
Author: Jongyoung Park 
Date:   2015-10-29T07:32:30Z

Remove useless enum

commit 64899ae7bdb6b2dad951c3d98d9282ecc83bd4cc
Author: Jongyoung Park 
Date:   2015-11-09T02:58:20Z

Weird variable name changed

commit 935cd2891d6810d466c12838cf0498f94eb47bb0
Author: Jongyoung Park 
Date:   2015-11-09T05:32:48Z

analyzeUDF test

commit 6bc4e1eba76f242ee26c0dff2143dff819accb1b
Author: Jongyoung Park 
Date:   2015-11-09T05:33:08Z

test code missed

commit e2c0d719fa5c707fa08775b756aff45eb7cd19c6
Author: Jongyoung Park 
Date:   2015-11-09T06:03:31Z

License added

commit e675e15067fbf1effdafb3bfa0fc76da895f6a36
Author: Jongyoung Park 
Date:   2015-11-09T06:12:19Z

deterministic added and test modified

commit 6df6409852fd324c9d83b02fbd0d077f3ee08d27
Author: Jongyoung Park 
Date:   2015-11-09T08:40:09Z

deterministic test

commit 110c5882eba1780e323412c4339b11154ad62ea8
Author: Jongyoung Park 
Date:   2015-11-10T07:59:37Z

More test and FunctionDescBuilder added

commit 2142e3a0dca128c64c81dc22c8fe437ac60f4167
Author: Jongyoung Park 
Date:   2015-11-11T09:28:12Z

test refactoring

commit 5fd2c9187ac137fd34f0e03d8bf59d8db9cc2ef9
Author: Jongyoung Park 
Date:   2015-11-11T09:30:04Z

setup sequence refined

commit b4a7015d485e253195c29657373519aad12880fd
Author: Jongyoung Park 
Date:   2015-11-12T08:57:40Z

Method names are changed

commit 96a8c72a5f59079a8f782b861f970bf77e588e97
Author: Jongyoung Park 
Date:   2015-11-12T08:59:33Z

Test problem fixed

commit 96b9294495d27cb098147765db7841816273aa27
Author: Jongyoung Park 
Date:   2015-11-16T03:28:52Z

AmbiguousFunctionException in loading function

commit 2b43dc1dd8fff8c6aee0d4badcaae4d10abfc345
Author: Jongyoung Park 
Date:   2015-11-16T05:36:56Z

Basic finding test

commit fd69123f7e0972f584c82f8bb5ccb9e783151972
Author: Jongyoung Park 
Date:   2015-11-16T06:08:03Z

process synonyms

commit b31a9180e19018b40db9599cc3f1f7c1512c143b
Author: Jongyoung Park 
Date:   2015-11-16T06:28:24Z

process description and example

commit