Xiangrui Meng created SPARK-16164:
-------------------------------------

             Summary: Filter pushdown should keep the ordering in the logical plan
                 Key: SPARK-16164
                 URL: https://issues.apache.org/jira/browse/SPARK-16164
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.0.0
            Reporter: Xiangrui Meng


[~cmccubbin] reported a bug when using StringIndexer in an ML pipeline with 
additional filters. It seems that filter pushdown changes the ordering in the 
logical plan. I'm not sure whether we should treat this as a bug.

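{code}
import org.apache.spark.ml.feature.StringIndexer
import spark.implicits._  // assumes a SparkSession named `spark`, as in spark-shell

// Fit the indexer on the labels "0", "1", "2".
val df1 = (0 until 3).map(_.toString).toDF
val indexer = new StringIndexer()
  .setInputCol("value")
  .setOutputCol("idx")
  .setHandleInvalid("skip")  // skip rows whose label was not seen during fit
  .fit(df1)
// df2 also contains the labels "3" and "4", which the indexer has not seen.
val df2 = (0 until 5).map(_.toString).toDF
val predictions = indexer.transform(df2)
// Additional filter on the indexer's output column.
predictions.where('idx > 2).show()
{code}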
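
One way to check whether pushdown reorders the filters is to compare the analyzed and optimized logical plans for the same query. The following is only a sketch: it assumes a local SparkSession created by hand (the app name is arbitrary), and the plans it prints will depend on the Spark version in use.

{code}
import org.apache.spark.ml.feature.StringIndexer
import org.apache.spark.sql.SparkSession

// Assumption: a local session built for illustration; in spark-shell, `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("SPARK-16164-plan-check").getOrCreate()
import spark.implicits._

val indexer = new StringIndexer()
  .setInputCol("value")
  .setOutputCol("idx")
  .setHandleInvalid("skip")
  .fit((0 until 3).map(_.toString).toDF)
val filtered = indexer.transform((0 until 5).map(_.toString).toDF).where('idx > 2)

// Plan as written by the user, before any optimizer rules run.
println(filtered.queryExecution.analyzed)
// Plan after optimization, including filter pushdown.
println(filtered.queryExecution.optimizedPlan)
{code}

Calling filtered.explain(true) prints the same logical plans together with the physical plan.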



