Re: Date format - any easier way

Philip Tromans Tue, 15 May 2012 08:24:30 -0700

I knocked up the following when we were experimenting with Hive. I've been
meaning to go and tidy it up for a while, but using it with a separator of
"" (empty string) should have the desired effect. (Obviously the UDF throws
an exception if the array is empty, been meaning to fix that for a while...)


Cheers,

Phil.

import java.util.List;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector.Category;
import
org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import
org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

@Description(name = "implode", value = "_FUNC_(list,separator) - joins the
elements of list together, separated by the given string."
+ " Returns a string. [Add an example here]")
public class GenericUDFImplode extends GenericUDF {
private ListObjectInspector listOI = null;
private PrimitiveObjectInspector glueOI = null;

@Override
public Object evaluate(DeferredObject[] params) throws HiveException {
return join(listOI.getList(params[0].get()),
glueOI.getPrimitiveJavaObject(params[1].get()).toString());
}

private String join(List<?> list, String separator) {
if (list == null) {
return null;
}
if (list.size() == 0) {
return "";
}

StringBuffer buf = new StringBuffer();

for (Object o : list) {
buf.append(o);
buf.append(separator);
}

return buf.substring(0, buf.length() - separator.length());
}

@Override
public String getDisplayString(String[] args) {
return "implode(" + args[0] + "," + args[1] + ")";
}

@Override
public ObjectInspector initialize(ObjectInspector[] params)
throws UDFArgumentException {
if (params[0].getCategory() != Category.LIST) {
throw new UDFArgumentException("Expected: List as argument 1 to implode()");
}
if (params[1].getCategory() != Category.PRIMITIVE) {
throw new UDFArgumentException("Expected: Primitive as argument 2 to
implode()");
}

listOI = (ListObjectInspector) params[0];
glueOI = (PrimitiveObjectInspector) params[1];

return PrimitiveObjectInspectorFactory.javaStringObjectInspector;
}

}


On 15 May 2012 15:33, Nitin Pawar <nitinpawar...@gmail.com> wrote:

> I will write an UDF for array concatenation and upload on GIT if anyone
> does not have it already
>
>
> On Tue, May 15, 2012 at 7:24 PM, Zoltán Tóth-Czifra <
> zoltan.tothczi...@softonic.com> wrote:
>
>>  Matt, thanks!
>>
>>  Luckily the order of the parts of the date is correct (reordering them
>> would bet he same craziness).
>>
>>  Finally it is:
>>
>>  regexp_replace(
>> date_sub(
>> to_date(
>> from_unixtime(
>> unix_timestamp()
>> )
>> ), 1
>> ), "[-]", ""
>> )
>>
>>  Nitin, concat apparently doesn't take arrays, and I did not find any
>> other way to join arrays in HQL. However, it would be very handy.
>>
>>  Thanks guys!
>>
>>  ------------------------------
>> *From:* Tucker, Matt [matt.tuc...@disney.com]
>> *Sent:* Tuesday, May 15, 2012 3:33 PM
>>
>> *To:* user@hive.apache.org
>> *Subject:* RE: Date format - any easier way
>>
>>   What about wrapping it in regexp_replace(…, “[-]”, “”) ?  It may not
>> be the cleanest, but I’d recommend passing variables from the shell :)
>>
>>
>>
>> Matt Tucker
>>
>>
>>
>> *From:* Zoltán Tóth-Czifra [mailto:zoltan.tothczi...@softonic.com]
>> *Sent:* Tuesday, May 15, 2012 9:27 AM
>> *To:* user@hive.apache.org
>> *Subject:* RE: Date format - any easier way
>>
>>
>>
>> Nitin,
>>
>>
>>
>> Thank you. As you see below I know and use this function. My problem is
>> that it doesn't give YYYYMMDD format, but YYYY-MM-DD instead, and
>> formatting is not trivial as you can see it too.
>>
>>
>>  ------------------------------
>>
>> *From:* Nitin Pawar [nitinpawar...@gmail.com]
>> *Sent:* Tuesday, May 15, 2012 3:24 PM
>> *To:* user@hive.apache.org
>> *Subject:* Re: Date format - any easier way
>>
>> you may want to have a look at this function
>>
>>
>>
>> date_sub(string startdate, int days)
>>
>> Subtract a number of days to startdate: date_sub('2008-12-31', 1) =
>> '2008-12-30'
>>
>>
>>
>> On Tue, May 15, 2012 at 6:41 PM, Zoltán Tóth-Czifra <
>> zoltan.tothczi...@softonic.com> wrote:
>>
>> Hi guys,
>>
>>
>>
>> Thanks you very much in advance for your help.
>>
>>
>>
>> My problem in short is getting the date for yesterday in a YYYYMMDD
>> format. As I use this format for partitions, I need this format in quite
>> some queries.
>>
>>
>>
>> So far I have this:
>>
>>
>>
>> concat(
>>
>> year( date_sub( to_date( from_unixtime( unix_timestamp() ) ), 1 ) ),
>>
>> CASE
>>
>> WHEN month( date_sub( to_date( from_unixtime( unix_timestamp() ) ), 1 ) )
>> < 10
>>
>> THEN concat( '0', month( date_sub( to_date( from_unixtime(
>> unix_timestamp() ) ), 1 ) ) )
>>
>> ELSE trim( month( date_sub( to_date( from_unixtime( unix_timestamp() ) ),
>> 1 ) ) )
>>
>> END,
>>
>> CASE
>>
>> WHEN day( date_sub( to_date( from_unixtime( unix_timestamp() ) ), 1 ) ) <
>> 10
>>
>> THEN concat( '0', day( date_sub( to_date( from_unixtime( unix_timestamp()
>> ) ), 1 ) ) )
>>
>> ELSE trim(day( date_sub( to_date( from_unixtime( unix_timestamp() ) ), 1
>> ) ) )
>>
>> END
>>
>> );
>>
>>
>>
>>
>>
>> ...but it seems to be a bit crazy, especially if you have to repeat it in
>> hundreds of queries. Is there any other (better) way to get this format
>> from yesterday? - there has to be. As I can't use local user variables nor
>> macros whatsoever, I need to repeat myself a lot here. If there is no other
>> way, probably I need to change my partitions.
>>
>>
>>
>> Any ideas are appreciated. Thank you!
>>
>>
>>
>> Zoltan
>>
>>
>>
>>
>>
>> --
>> Nitin Pawar
>>
>
>
>
> --
> Nitin Pawar
>
>

Re: Date format - any easier way

Reply via email to