Spark SQL UDF returning a list?
Hi,

Can a UDF return a list of values that can be used in a WHERE clause? Something like:

    sqlCtx.registerFunction("myudf", { Array(1, 2, 3) })
    val sql = "select doc_id, doc_value from doc_table where doc_id in myudf()"

This does not work:

    Exception in thread "main" java.lang.RuntimeException: [1.57] failure: ``('' expected but identifier myudf found

I also tried returning a List of Ints; that did not work either. Is there a way to write a UDF that returns a list?

Thanks
-Jerry
Re: Spark SQL UDF returning a list?
Hi,

On Wed, Dec 3, 2014 at 4:31 PM, Jerry Raj jerry@gmail.com wrote:
> Exception in thread "main" java.lang.RuntimeException: [1.57] failure: ``('' expected but identifier myudf found
>
> I also tried returning a List of Ints, that did not work either. Is there a way to write a UDF that returns a list?

You seem to be hitting a parser limitation before your function is even called. The message says that an opening parenthesis is required at that position, and I am afraid you won't get around this whatever function you write... (maybe the HiveContext provides a possibility, though).

Tobias
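One possible way around the parser limitation described above is to compute the values in ordinary Scala and splice them into the query text as a literal list, so that the parser sees the opening parenthesis it expects after "in". This is only a minimal sketch: the object name InListQuery and the helper allowedIds are hypothetical, and the table and column names are taken from the original question.

```scala
object InListQuery {
  // Hypothetical stand-in for the values the UDF was meant to return.
  def allowedIds(): Seq[Int] = Seq(1, 2, 3)

  // Render the values as a parenthesised literal list, the form the
  // parser expects after IN.
  val inList: String = allowedIds().mkString("(", ", ", ")")

  val sql: String =
    s"select doc_id, doc_value from doc_table where doc_id in $inList"

  // With a SQLContext in scope (named sqlCtx in the original post), the
  // string could then be passed on, e.g. sqlCtx.sql(InListQuery.sql).
}
```

Splicing values into the query text like this is only reasonable for trusted numeric values; string values would need quoting and escaping.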
RE: Spark SQL UDF returning a list?
Yes, I agree, and it may also be semantically ambiguous: a list of objects vs. a list containing a single List object.

I've also tested this, and it seems that:

a. There is a bug in registerFunction, which doesn't support a UDF without arguments. (I just created a PR for this: https://github.com/apache/spark/pull/3595)
b. It expects the function's return type to be immutable.Seq[XX] for a List, immutable.Map[X, X] for a Map, scala.Product for a Struct, and only Array[Byte] for binary. Array[_] in general is not supported.

Cheng Hao

From: Tobias Pfeiffer [mailto:t...@preferred.jp]
Sent: Thursday, December 4, 2014 9:05 AM
To: Jerry Raj
Cc: user
Subject: Re: Spark SQL UDF returning a list?
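To illustrate the two points in Cheng Hao's message, here is a minimal sketch of a function shaped to match them: it takes one (ignored) argument, so it does not depend on zero-argument registration, and it returns an immutable.Seq rather than an Array. The object name UdfReturnShapes and the value names are hypothetical, and the registration call is shown only as a comment because it needs a live SQLContext.

```scala
import scala.collection.immutable

object UdfReturnShapes {
  // One dummy argument avoids the zero-argument registration problem;
  // the return type is immutable.Seq, the shape reported as supported.
  val idsAsSeq: Int => immutable.Seq[Int] = _ => immutable.Seq(1, 2, 3)

  // Array[Int] is the shape reported as unsupported, so an existing
  // array is converted to a List (an immutable.Seq) before returning.
  val idsAsArray: Array[Int] = Array(1, 2, 3)
  val converted: immutable.Seq[Int] = idsAsArray.toList

  // Registration, assuming a SQLContext named sqlCtx as in the
  // original post:
  //   sqlCtx.registerFunction("myudf", UdfReturnShapes.idsAsSeq)
}
```

Even with a supported return type, the parser limitation discussed earlier in the thread still applies to "doc_id in myudf()", so this only addresses the return-type side of the question.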