[jira] [Commented] (SPARK-21390) Dataset filter api inconsistency
[ https://issues.apache.org/jira/browse/SPARK-21390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085315#comment-16085315 ]

Gheorghe Gheorghe commented on SPARK-21390:
-------------------------------------------

Thanks for looking into it. I've also tested it with Spark 2.1.0 for Hadoop 2.7, with the same result: not working. To answer your question: yes, it may be specific to case classes, because
{code:java}
filterCondition.contains(true)
{code}
also works fine.

> Dataset filter api inconsistency
> --------------------------------
>
>                 Key: SPARK-21390
>                 URL: https://issues.apache.org/jira/browse/SPARK-21390
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.1
>            Reporter: Gheorghe Gheorghe
>            Priority: Minor
>
> Hello everybody,
>
> I've encountered a strange situation with the spark-shell.
>
> When I run the code below in my IDE, the second test case prints the expected count of 1. However, when I run the same code in the spark-shell, the second test case returns a count of 0.
>
> I've made sure that I'm running Scala 2.11.8 and Spark 2.0.1 in both my IDE and the spark-shell.
>
> {code:java}
> import org.apache.spark.sql.Dataset
>
> case class SomeClass(field1: String, field2: String)
>
> val filterCondition: Seq[SomeClass] = Seq(SomeClass("00", "01"))
>
> // Test 1
> val filterMe1: Dataset[SomeClass] = Seq(SomeClass("00", "01")).toDS
> println("Works fine! " + filterMe1.filter(filterCondition.contains(_)).count)
>
> // Test 2
> case class OtherClass(field1: String, field2: String)
>
> val filterMe2 = Seq(OtherClass("00", "01"), OtherClass("00", "02")).toDS
> println("Fail, count should return 1: " + filterMe2.filter(x => filterCondition.contains(SomeClass(x.field1, x.field2))).count)
> {code}
>
> Note that if I transform the dataset first, I get 1 back as expected.
>
> {code:java}
> println(filterMe2.map(x => SomeClass(x.field1, x.field2)).filter(filterCondition.contains(_)).count)
> {code}
>
> Is this a bug? I can see that this filter function has been marked as experimental:
> https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Dataset.html#filter(scala.Function1)

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
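Besides the map-based transformation shown in the report, a possible workaround (a sketch only, not verified against all Spark versions; it assumes the same {{filterCondition}} and {{filterMe2}} definitions as in the report) is to avoid constructing the case class inside the filter closure and compare the fields directly:

{code:java}
// Workaround sketch: compare the String fields directly instead of
// building a SomeClass instance inside the closure. Assumes
// filterCondition: Seq[SomeClass] and filterMe2: Dataset[OtherClass]
// are defined exactly as in the report.
val matched = filterMe2.filter { x =>
  filterCondition.exists(c => c.field1 == x.field1 && c.field2 == x.field2)
}
println(matched.count)
{code}

If the inconsistency is tied to how case classes defined in the spark-shell are captured into closures, comparing plain String fields may sidestep it — but that is an assumption, not a confirmed diagnosis.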
[jira] [Updated] (SPARK-21390) Dataset filter api inconsistency
[ https://issues.apache.org/jira/browse/SPARK-21390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gheorghe Gheorghe updated SPARK-21390:
--------------------------------------
    Description: 
Hello everybody,

I've encountered a strange situation with the spark-shell.

When I run the code below in my IDE, the second test case prints the expected count of 1. However, when I run the same code in the spark-shell, the second test case returns a count of 0.

I've made sure that I'm running Scala 2.11.8 and Spark 2.0.1 in both my IDE and the spark-shell.

{code:java}
import org.apache.spark.sql.Dataset

case class SomeClass(field1: String, field2: String)

val filterCondition: Seq[SomeClass] = Seq(SomeClass("00", "01"))

// Test 1
val filterMe1: Dataset[SomeClass] = Seq(SomeClass("00", "01")).toDS
println("Works fine! " + filterMe1.filter(filterCondition.contains(_)).count)

// Test 2
case class OtherClass(field1: String, field2: String)

val filterMe2 = Seq(OtherClass("00", "01"), OtherClass("00", "02")).toDS
println("Fail, count should return 1: " + filterMe2.filter(x => filterCondition.contains(SomeClass(x.field1, x.field2))).count)
{code}

Note that if I transform the dataset first, I get 1 back as expected.

{code:java}
println(filterMe2.map(x => SomeClass(x.field1, x.field2)).filter(filterCondition.contains(_)).count)
{code}

Is this a bug? I can see that this filter function has been marked as experimental:
https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Dataset.html#filter(scala.Function1)

  was:
Hello everybody,

I've encountered a strange situation with the spark-shell.

When I run the code below in my IDE, the second test case prints the expected count of 1. However, when I run the same code in the spark-shell, the second test case returns a count of 0.

I've made sure that I'm running Scala 2.11.8 and Spark 2.0.1 in both my IDE and the spark-shell.

{code:java}
import org.apache.spark.sql.Dataset

case class SomeClass(field1: String, field2: String)

val filterCondition: Seq[SomeClass] = Seq(SomeClass("00", "01"))

// Test 1
val filterMe1: Dataset[SomeClass] = Seq(SomeClass("00", "01")).toDS
println("Works fine! " + filterMe1.filter(filterCondition.contains(_)).count)

// Test 2
case class OtherClass(field1: String, field2: String)

val filterMe2 = Seq(OtherClass("00", "01"), OtherClass("00", "02")).toDS
println("Fail, count should return 1: " + filterMe2.filter(x => filterCondition.contains(SomeClass(x.field1, x.field2))).count)
{code}

Note that if I do this, it prints 1 as expected.

{code:java}
println(filterMe2.map(x => SomeClass(x.field1, x.field2)).filter(filterCondition.contains(_)).count)
{code}

Is this a bug? I can see that this filter function has been marked as experimental:
https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Dataset.html#filter(scala.Function1)
[jira] [Updated] (SPARK-21390) Dataset filter api inconsistency
[ https://issues.apache.org/jira/browse/SPARK-21390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gheorghe Gheorghe updated SPARK-21390:
--------------------------------------
    Description: 
Hello everybody,

I've encountered a strange situation with the spark-shell.

When I run the code below in my IDE, the second test case prints the expected count of 1. However, when I run the same code in the spark-shell, the second test case returns a count of 0.

I've made sure that I'm running Scala 2.11.8 and Spark 2.0.1 in both my IDE and the spark-shell.

{code:java}
import org.apache.spark.sql.Dataset

case class SomeClass(field1: String, field2: String)

val filterCondition: Seq[SomeClass] = Seq(SomeClass("00", "01"))

// Test 1
val filterMe1: Dataset[SomeClass] = Seq(SomeClass("00", "01")).toDS
println("Works fine! " + filterMe1.filter(filterCondition.contains(_)).count)

// Test 2
case class OtherClass(field1: String, field2: String)

val filterMe2 = Seq(OtherClass("00", "01"), OtherClass("00", "02")).toDS
println("Fail, count should return 1: " + filterMe2.filter(x => filterCondition.contains(SomeClass(x.field1, x.field2))).count)
{code}

Note that if I do this, it prints 1 as expected.

{code:java}
println(filterMe2.map(x => SomeClass(x.field1, x.field2)).filter(filterCondition.contains(_)).count)
{code}

Is this a bug? I can see that this filter function has been marked as experimental:
https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Dataset.html#filter(scala.Function1)

  was:
Hello everybody,

I've encountered a strange situation with Spark 2.0.1 in the spark-shell.

When I run the code below in my IDE, I get the expected result of 1 in the second test case. However, when I run the same code in the spark-shell, the second test case returns 0.

I've made sure that I'm running Scala 2.11.8 and Spark 2.0.1 in both my IDE and the spark-shell.

{code:java}
import org.apache.spark.sql.Dataset

case class SomeClass(field1: String, field2: String)

val filterCondition: Seq[SomeClass] = Seq(SomeClass("00", "01"))

// Test 1
val filterMe1: Dataset[SomeClass] = Seq(SomeClass("00", "01")).toDS
println("Works fine! " + filterMe1.filter(filterCondition.contains(_)).count)

// Test 2
case class OtherClass(field1: String, field2: String)

val filterMe2 = Seq(OtherClass("00", "01"), OtherClass("00", "02")).toDS
println("Fail, count should return 1: " + filterMe2.filter(x => filterCondition.contains(SomeClass(x.field1, x.field2))).count)
{code}

Note that if I do this, it prints 1 as expected.

{code:java}
println(filterMe2.map(x => SomeClass(x.field1, x.field2)).filter(filterCondition.contains(_)).count)
{code}

Is this a bug? I can see that this filter function has been marked as experimental:
https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Dataset.html#filter(scala.Function1)
[jira] [Created] (SPARK-21390) Dataset filter api inconsistency
Gheorghe Gheorghe created SPARK-21390:
--------------------------------------

             Summary: Dataset filter api inconsistency
                 Key: SPARK-21390
                 URL: https://issues.apache.org/jira/browse/SPARK-21390
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.0.1
            Reporter: Gheorghe Gheorghe
            Priority: Minor


Hello everybody,

I've encountered a strange situation with Spark 2.0.1 in the spark-shell.

When I run the code below in my IDE, I get the expected result of 1 in the second test case. However, when I run the same code in the spark-shell, the second test case returns 0.

I've made sure that I'm running Scala 2.11.8 and Spark 2.0.1 in both my IDE and the spark-shell.

{code:java}
import org.apache.spark.sql.Dataset

case class SomeClass(field1: String, field2: String)

val filterCondition: Seq[SomeClass] = Seq(SomeClass("00", "01"))

// Test 1
val filterMe1: Dataset[SomeClass] = Seq(SomeClass("00", "01")).toDS
println("Works fine! " + filterMe1.filter(filterCondition.contains(_)).count)

// Test 2
case class OtherClass(field1: String, field2: String)

val filterMe2 = Seq(OtherClass("00", "01"), OtherClass("00", "02")).toDS
println("Fail, count should return 1: " + filterMe2.filter(x => filterCondition.contains(SomeClass(x.field1, x.field2))).count)
{code}

Note that if I do this, it prints 1 as expected.

{code:java}
println(filterMe2.map(x => SomeClass(x.field1, x.field2)).filter(filterCondition.contains(_)).count)
{code}

Is this a bug? I can see that this filter function has been marked as experimental:
https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Dataset.html#filter(scala.Function1)