RE: And.eval short circuiting

2015-09-15 Thread Zack Sampson
I see. We're having problems with code like this (forgive my noob scala):

val df = Seq(("moose", "ice"), (null, "fire")).toDF("animals", "elements")
df
  .filter($"animals".rlike(".*"))
  .filter(callUDF({ (value: String) => value.length > 2 }, BooleanType, $"animals"))
  .collect()

This code throws an NPE because:
* Catalyst combines the filters with an AND
* the first filter returns null on the row where animals is null
* the second filter then tries to read the length of that null

This feels weird. Reading that code, I wouldn't expect null to be passed to the 
second filter. Even weirder is that if you call collect() after the first 
filter you won't see nulls, and if you write the data to disk and reread it, 
the NPE won't happen.

It's bewildering! Is this the intended behavior?
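
For what it's worth, making the UDF null-safe does avoid the NPE. A minimal
sketch (the udf helper from org.apache.spark.sql.functions and the explicit
null guard are our own workaround, not a claim about the intended behavior):

import org.apache.spark.sql.functions.udf

// Guard against null explicitly so the predicate never dereferences a null value.
val longerThanTwo = udf { (value: String) => value != null && value.length > 2 }

val df = Seq(("moose", "ice"), (null, "fire")).toDF("animals", "elements")
df
  .filter($"animals".rlike(".*"))
  .filter(longerThanTwo($"animals"))
  .collect()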

From: Reynold Xin [r...@databricks.com]
Sent: Monday, September 14, 2015 10:14 PM
To: Zack Sampson
Cc: dev@spark.apache.org
Subject: Re: And.eval short circuiting

rxin=# select null and true;
 ?column?
----------
 
(1 row)

rxin=# select null and false;
 ?column?
----------
 f
(1 row)


null and false should return false.
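
The same three-valued semantics are easy to check from a spark-shell. A minimal
sketch (assumes sqlContext.implicits._ is in scope; the column names are made up):

// AND over nullable booleans: false wins over null, so the right side
// must still be evaluated even when the left side is null.
val flags = Seq[(java.lang.Boolean, java.lang.Boolean)](
  (null, true), (null, false), (true, null), (false, null)
).toDF("l", "r")

flags.select($"l", $"r", ($"l" && $"r").as("l_and_r")).show()
// (null, true)  -> null
// (null, false) -> false
// (true, null)  -> null
// (false, null) -> false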


On Mon, Sep 14, 2015 at 9:12 PM, Zack Sampson 
<zsamp...@palantir.com> wrote:
It seems like And.eval can avoid calculating right.eval if left.eval returns 
null. Is there a reason it's written like it is?


override def eval(input: Row): Any = {
  val l = left.eval(input)
  if (l == false) {
    false
  } else {
    val r = right.eval(input)
    if (r == false) {
      false
    } else {
      if (l != null && r != null) {
        true
      } else {
        null
      }
    }
  }
}



And.eval short circuiting

2015-09-14 Thread Zack Sampson
It seems like And.eval can avoid calculating right.eval if left.eval returns 
null. Is there a reason it's written like it is?


override def eval(input: Row): Any = {
  val l = left.eval(input)
  if (l == false) {
    false
  } else {
    val r = right.eval(input)
    if (r == false) {
      false
    } else {
      if (l != null && r != null) {
        true
      } else {
        null
      }
    }
  }
}


RE: When to expect UTF8String?

2015-06-12 Thread Zack Sampson
We are using Expression for two things.

1. Custom aggregators that do map-side combine.

2. UDFs with more than 22 arguments (not supported by ScalaUdf), and to avoid 
wrapping a Java function interface in one of 22 different Scala function 
interfaces depending on the number of parameters.

Are there methods we can use to convert to/from the internal representation in 
these cases?
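
For concreteness, this is roughly the conversion we mean, sketched inside eval
(the import, the child reference, and the assumption that UTF8String is the only
internal string type we'll see are all our guesses, not confirmed API):

import org.apache.spark.sql.types.UTF8String

override def eval(input: Row): Any = {
  val raw = child.eval(input)
  val s: String = raw match {
    case null => null
    case u: UTF8String => u.toString // internal representation -> java.lang.String
    case str: String => str          // already external, e.g. in local tests
  }
  // ... our UDF/aggregation logic over s ...
  s != null && s.length > 2
}

The reverse direction (java.lang.String back to the internal type, for
string-valued expressions) is the part we're least sure about.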

From: Michael Armbrust [mich...@databricks.com]
Sent: Thursday, June 11, 2015 9:05 PM
To: Zack Sampson
Cc: dev@spark.apache.org
Subject: Re: When to expect UTF8String?

Through the DataFrame API, users should never see UTF8String.

Expression (and any class in the catalyst package) is considered internal and 
so uses the internal representation of various types.  Which type we use here 
is not stable across releases.

Is there a reason you aren't defining a UDF instead?
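
i.e. something along these lines, where the function body only ever sees
external types like java.lang.String (a minimal sketch; the function name,
table, and column are made up):

// Register once, then call it from SQL or expression strings; the internal
// string representation never surfaces here.
sqlContext.udf.register("longerThanTwo", (s: String) => s != null && s.length > 2)

sqlContext.sql("SELECT name FROM people WHERE longerThanTwo(name)")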

On Thu, Jun 11, 2015 at 8:08 PM, zsampson 
<zsamp...@palantir.com> wrote:
I'm hoping for some clarity about when to expect String vs UTF8String when
using the Java DataFrames API.

In upgrading to Spark 1.4, I'm dealing with a lot of errors where what was
once a String is now a UTF8String. The comments in the file and the related
commit message indicate that maybe it should be internal to SparkSQL's
implementation.

However, when I add a column containing a custom subclass of Expression, the
row passed to the eval method contains instances of UTF8String. Ditto for
AggregateFunction.update. Is this expected? If so, when should I generally
know to deal with UTF8String objects?



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/When-to-expect-UTF8String-tp12710.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.



-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org