Not sure if there is a function for that but I wrote a UDF to do so - https://github.com/chandeepsingh/Hive-UDFs <https://github.com/chandeepsingh/Hive-UDFs>
hive> ADD JAR hive-udfs-1.0-uber.jar; Added [hive-udfs-1.0-uber.jar] to class path Added resources: [hive-udfs-1.0-uber.jar] hive> CREATE TEMPORARY FUNCTION array_dedup AS 'com.hive.udfs.UdfArrayDeDup'; OK Time taken: 0.015 seconds hive> SELECT array_dedup(array("blah","blah","blah")) from table1 limit 1; OK ["blah"] Time taken: 0.502 seconds, Fetched: 1 row(s) Here is the code: package com.hive.udfs; /** * * @author chandeepsingh * Remove duplicates from an array */ import java.util.ArrayList; import org.apache.hadoop.hive.ql.exec.UDFArgumentException; import org.apache.hadoop.hive.ql.metadata.HiveException; import org.apache.hadoop.hive.ql.udf.generic.GenericUDF; import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector; import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils; import java.util.HashSet; import java.util.List; public class UdfArrayDeDup extends GenericUDF { ListObjectInspector arrayOI = null; @Override public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException { arrayOI = (ListObjectInspector) arguments[0]; return ObjectInspectorUtils.getStandardObjectInspector(arrayOI); } @Override public Object evaluate(DeferredObject[] arguments) throws HiveException { List<?> myArr = (List<?>) ObjectInspectorUtils.copyToStandardObject(arguments[0].get(), arrayOI); HashSet<Object> myHashSet = new HashSet<>(); myHashSet.addAll(myArr); if (myHashSet != null) { return new ArrayList<>(myHashSet); } else { return null; } } @Override public String getDisplayString(String[] input) { return new String(); } } > On Mar 13, 2016, at 1:30 AM, Rex X <dnsr...@gmail.com> wrote: > > For the first question, is there any way to use "set" instead of an "array" > to dedupe all elements? > > "select array(1,1)" will return "[1,1]", not "[1]". > > > > On Sat, Mar 12, 2016 at 5:26 PM, Rex X <dnsr...@gmail.com > <mailto:dnsr...@gmail.com>> wrote: > Thank you, Chandeep. Yes, my first problem solved. > How about the second one? Is there any way to append an element to an > existing array? > > > > On Sat, Mar 12, 2016 at 5:10 PM, Chandeep Singh <c...@chandeep.com > <mailto:c...@chandeep.com>> wrote: > If you only want the array while you’re querying table1 your example should > work. If you want to add AB to the table you’ll probably need to create a new > table by selecting everything you need from table1. > > hive> select * from table1 limit 1; > OK > temp1 temp2 temp3 > > hive> select f1, array(f2, f3) AS AB from table1 limit 1; > OK > temp1 [“temp2”,"temp3"] > > >> On Mar 13, 2016, at 12:33 AM, Rex X <dnsr...@gmail.com >> <mailto:dnsr...@gmail.com>> wrote: >> >> How to make the following work? >> >> 1. combine columns A and B to make one array as a new column AB. Both column >> A and B are string types. >> >> select >> string_columnA, >> string_columnB, >> array(string_columnA, string_columnB) as AB >> from Table1; >> >> 2. append columnA to an existing array-type column B >> >> select >> string_columnA, >> array_columnB, >> array_flatmerge(string_columnA, array_columnB) as AB >> from Table2; >> >> In fact, I should say "set" instead of "array" above, since I expect no >> duplicates. >> >> Any idea? >> > > >