[jira] [Updated] (SPARK-16164) Filter pushdown should keep the ordering in the logical plan
[ https://issues.apache.org/jira/browse/SPARK-16164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng updated SPARK-16164:
----------------------------------
    Assignee: Dongjoon Hyun

> Filter pushdown should keep the ordering in the logical plan
> ------------------------------------------------------------
>
>                 Key: SPARK-16164
>                 URL: https://issues.apache.org/jira/browse/SPARK-16164
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Xiangrui Meng
>            Assignee: Dongjoon Hyun
>             Fix For: 2.0.1, 2.1.0
>
> [~cmccubbin] reported a bug when he used StringIndexer in an ML pipeline with
> additional filters. It seems that during filter pushdown, we changed the
> ordering in the logical plan. I'm not sure whether we should treat this as a
> bug.
> {code}
> val df1 = (0 until 3).map(_.toString).toDF
> val indexer = new StringIndexer()
>   .setInputCol("value")
>   .setOutputCol("idx")
>   .setHandleInvalid("skip")
>   .fit(df1)
> val df2 = (0 until 5).map(_.toString).toDF
> val predictions = indexer.transform(df2)
> predictions.show() // this is okay
> predictions.where('idx > 2).show() // this will throw an exception
> {code}
> Please see the notebook at
> https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1233855/2159162931615821/588180/latest.html
> for error messages.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-16164) Filter pushdown should keep the ordering in the logical plan
[ https://issues.apache.org/jira/browse/SPARK-16164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng updated SPARK-16164:
----------------------------------
    Description: 
[~cmccubbin] reported a bug when he used StringIndexer in an ML pipeline with additional filters. It seems that during filter pushdown, we changed the ordering in the logical plan. I'm not sure whether we should treat this as a bug.

{code}
val df1 = (0 until 3).map(_.toString).toDF
val indexer = new StringIndexer()
  .setInputCol("value")
  .setOutputCol("idx")
  .setHandleInvalid("skip")
  .fit(df1)
val df2 = (0 until 5).map(_.toString).toDF
val predictions = indexer.transform(df2)
predictions.show() // this is okay
predictions.where('idx > 2).show() // this will throw an exception
{code}

Please see the notebook at https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1233855/2159162931615821/588180/latest.html for error messages.

  was:
[~cmccubbin] reported a bug when he used StringIndexer in an ML pipeline with additional filters. It seems that during filter pushdown, we changed the ordering in the logical plan. I'm not sure whether we should treat this as a bug.

{code}
val df1 = (0 until 3).map(_.toString).toDF
val indexer = new StringIndexer()
  .setInputCol("value")
  .setOutputCol("idx")
  .setHandleInvalid("skip")
  .fit(df1)
val df2 = (0 until 5).map(_.toString).toDF
val predictions = indexer.transform(df2)
predictions.where('idx > 2).show()
{code}

Please see the notebook at https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1233855/2159162931615821/588180/latest.html for error messages.

> Filter pushdown should keep the ordering in the logical plan
> ------------------------------------------------------------
>
>                 Key: SPARK-16164
>                 URL: https://issues.apache.org/jira/browse/SPARK-16164
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Xiangrui Meng
[jira] [Updated] (SPARK-16164) Filter pushdown should keep the ordering in the logical plan
[ https://issues.apache.org/jira/browse/SPARK-16164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng updated SPARK-16164:
----------------------------------
    Issue Type: Bug  (was: Improvement)

> Filter pushdown should keep the ordering in the logical plan
> ------------------------------------------------------------
>
>                 Key: SPARK-16164
>                 URL: https://issues.apache.org/jira/browse/SPARK-16164
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Xiangrui Meng
>
> [~cmccubbin] reported a bug when he used StringIndexer in an ML pipeline with
> additional filters. It seems that during filter pushdown, we changed the
> ordering in the logical plan. I'm not sure whether we should treat this as a
> bug.
> {code}
> val df1 = (0 until 3).map(_.toString).toDF
> val indexer = new StringIndexer()
>   .setInputCol("value")
>   .setOutputCol("idx")
>   .setHandleInvalid("skip")
>   .fit(df1)
> val df2 = (0 until 5).map(_.toString).toDF
> val predictions = indexer.transform(df2)
> predictions.where('idx > 2).show()
> {code}
[jira] [Updated] (SPARK-16164) Filter pushdown should keep the ordering in the logical plan
[ https://issues.apache.org/jira/browse/SPARK-16164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng updated SPARK-16164:
----------------------------------
    Description: 
[~cmccubbin] reported a bug when he used StringIndexer in an ML pipeline with additional filters. It seems that during filter pushdown, we changed the ordering in the logical plan. I'm not sure whether we should treat this as a bug.

{code}
val df1 = (0 until 3).map(_.toString).toDF
val indexer = new StringIndexer()
  .setInputCol("value")
  .setOutputCol("idx")
  .setHandleInvalid("skip")
  .fit(df1)
val df2 = (0 until 5).map(_.toString).toDF
val predictions = indexer.transform(df2)
predictions.where('idx > 2).show()
{code}

Please see the notebook at https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1233855/2159162931615821/588180/latest.html for error messages.

  was:
[~cmccubbin] reported a bug when he used StringIndexer in an ML pipeline with additional filters. It seems that during filter pushdown, we changed the ordering in the logical plan. I'm not sure whether we should treat this as a bug.

{code}
val df1 = (0 until 3).map(_.toString).toDF
val indexer = new StringIndexer()
  .setInputCol("value")
  .setOutputCol("idx")
  .setHandleInvalid("skip")
  .fit(df1)
val df2 = (0 until 5).map(_.toString).toDF
val predictions = indexer.transform(df2)
predictions.where('idx > 2).show()
{code}

> Filter pushdown should keep the ordering in the logical plan
> ------------------------------------------------------------
>
>                 Key: SPARK-16164
>                 URL: https://issues.apache.org/jira/browse/SPARK-16164
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Xiangrui Meng
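
The report's suspicion that filter ordering matters can be illustrated without Spark. The following is a minimal sketch, not Spark's code. It assumes that setHandleInvalid("skip") behaves like a filter that drops unseen labels before the index lookup, and that the lookup throws on an unseen label. Every name in it (FilterOrderingSketch, labelToIndex, index, isSeen, idxGreaterThan2, and) is hypothetical and chosen only for illustration.

```scala
object FilterOrderingSketch {
  // Hypothetical stand-in for a StringIndexer fitted on "0", "1", "2".
  val labelToIndex: Map[String, Double] = Map("0" -> 0.0, "1" -> 1.0, "2" -> 2.0)

  // Index lookup that throws for labels not seen during fitting (assumed behavior).
  def index(label: String): Double =
    labelToIndex.getOrElse(label, throw new NoSuchElementException(s"Unseen label: $label"))

  // Child predicate: the assumed "skip" filter that drops unseen labels.
  val isSeen: String => Boolean = label => labelToIndex.contains(label)

  // Parent predicate: the user's filter, which needs the index lookup.
  val idxGreaterThan2: String => Boolean = label => index(label) > 2

  // Conjunction that evaluates `first` before `second` and short-circuits.
  def and(first: String => Boolean, second: String => Boolean): String => Boolean =
    row => first(row) && second(row)

  // Same input as df2 in the report: "0" through "4".
  val rows: Seq[String] = (0 until 5).map(_.toString)

  // Child predicate first: unseen labels are dropped before the lookup runs.
  def childFirst: Seq[String] = rows.filter(and(isSeen, idxGreaterThan2))

  // Parent predicate first: the lookup runs on "3" and throws.
  def parentFirst: scala.util.Try[Seq[String]] =
    scala.util.Try(rows.filter(and(idxGreaterThan2, isSeen)))

  def main(args: Array[String]): Unit = {
    println(childFirst.isEmpty)    // true: no seen label has an index above 2
    println(parentFirst.isFailure) // true: the lookup was evaluated on an unseen label
  }
}
```

Under these assumptions, the two conjunctions are logically equivalent but differ in which rows reach the throwing lookup, which is why an optimizer that merges or reorders filters would need to keep the child predicate ahead of the parent predicate.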