Wes McKinney created SPARK-13946:
------------------------------------

Summary: PySpark DataFrames allows you to silently use aggregate expressions derived from different table expressions
Key: SPARK-13946
URL: https://issues.apache.org/jira/browse/SPARK-13946
Project: Spark
Issue Type: Bug
Components: PySpark
Reporter: Wes McKinney
In my opinion, the following code should raise an exception rather than silently discard the predicate. The column sdf2.foo comes from the filtered DataFrame sdf2, but it is aggregated over the unfiltered sdf. As a result, the bar > 0 filter is ignored and the count covers all 1,000,000 rows:

{code}
import numpy as np
import pandas as pd
from pyspark.sql import functions as F

# sqlContext is the SQLContext predefined in the PySpark shell
df = pd.DataFrame({'foo': np.random.randn(1000000),
                   'bar': np.random.randn(1000000)})
sdf = sqlContext.createDataFrame(df)
sdf2 = sdf[sdf.bar > 0]
sdf.agg(F.count(sdf2.foo)).show()
# +----------+
# |count(foo)|
# +----------+
# |   1000000|
# +----------+
{code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
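A minimal sketch of the check this report asks for. It uses hypothetical toy classes `Frame` and `Col` in plain Python. These are not PySpark's API and not how Spark actually resolves columns. Each column reference remembers the frame it was taken from, and the aggregate raises an error instead of silently resolving the column name against a different frame.

```python
# Hypothetical toy model, NOT PySpark: a column reference records the frame it
# came from, and count() rejects columns that belong to a different frame.
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Row = Dict[str, float]


@dataclass(frozen=True)
class Col:
    name: str
    origin: "Frame"  # the frame this column reference was taken from


class Frame:
    def __init__(self, rows: List[Row], parent: Optional["Frame"] = None):
        self.rows = rows
        self.parent = parent  # frame this one was derived from, if any

    def __getattr__(self, name: str) -> Col:
        # Mimic attribute-style column access such as sdf.foo.
        # Only called for names that are not real attributes or methods.
        if name.startswith("_"):
            raise AttributeError(name)
        return Col(name, self)

    def where(self, pred: Callable[[Row], bool]) -> "Frame":
        # Return a new, filtered frame that remembers its parent.
        return Frame([r for r in self.rows if pred(r)], parent=self)

    def count(self, col: Col) -> int:
        # The proposed behavior: fail loudly when the column was derived from
        # another frame, instead of resolving it by name against self.
        if col.origin is not self:
            raise ValueError(
                f"column {col.name!r} belongs to a different DataFrame"
            )
        return sum(1 for r in self.rows if r.get(col.name) is not None)


if __name__ == "__main__":
    sdf = Frame([{"foo": 1.0, "bar": 0.5}, {"foo": 2.0, "bar": -0.5}])
    sdf2 = sdf.where(lambda r: r["bar"] > 0)
    print(sdf2.count(sdf2.foo))  # counts only rows where bar > 0
    try:
        sdf.count(sdf2.foo)  # the mixed-frame case from the report
    except ValueError as exc:
        print("rejected:", exc)
```

In real PySpark the unambiguous way to get the filtered count is to aggregate over the same DataFrame the column came from, e.g. `sdf2.agg(F.count(sdf2.foo))`.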