[jira] [Commented] (SPARK-21390) Dataset filter api inconsistency

2017-07-13 Thread Gheorghe Gheorghe (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085315#comment-16085315
 ] 

Gheorghe Gheorghe commented on SPARK-21390:
---

Thanks for looking into it. I've also tested with Spark 2.1.0 for Hadoop 2.7 
and got the same results: it still doesn't work. 
To answer your question, yes, it may well be specific to case classes, because 
if you do 
{code:java}
filterCondition.contains(true)
{code}
it also works fine.
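
To double-check that the Seq lookup itself is fine, here is a quick sketch of 
my own (reusing SomeClass and filterCondition from the snippet quoted below); 
plain Scala equality on the driver, outside any Spark closure, behaves as 
expected:
{code:java}
// Sketch: plain case-class equality on the driver, outside any Spark closure.
// SomeClass and filterCondition are as defined in the issue's snippet.
val probe = SomeClass("00", "01")
println(filterCondition.contains(probe))  // prints true
{code}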

> Dataset filter api inconsistency
> 
>
> Key: SPARK-21390
> URL: https://issues.apache.org/jira/browse/SPARK-21390
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1
>Reporter: Gheorghe Gheorghe
>Priority: Minor
>
> Hello everybody, 
> I've encountered a strange situation with the spark-shell. When I run the 
> code below in my IDE, the second test case prints the expected count of 1. 
> However, when I run the same code in the spark-shell, the second test case 
> returns a count of 0. I've made sure that I'm running Scala 2.11.8 and 
> Spark 2.0.1 in both my IDE and the spark-shell. 
> {code:java}
>   import org.apache.spark.sql.Dataset
>   case class SomeClass(field1: String, field2: String)
>   val filterCondition: Seq[SomeClass] = Seq(SomeClass("00", "01"))
>   // Test 1
>   val filterMe1: Dataset[SomeClass] = Seq(SomeClass("00", "01")).toDS
>   println("Works fine! " + filterMe1.filter(filterCondition.contains(_)).count)
>   // Test 2
>   case class OtherClass(field1: String, field2: String)
>   val filterMe2 = Seq(OtherClass("00", "01"), OtherClass("00", "02")).toDS
>   println("Fail, count should return 1: " +
>     filterMe2.filter(x => filterCondition.contains(SomeClass(x.field1, x.field2))).count)
> {code}
> Note: if I transform the dataset first, I get 1 back as expected.
> {code:java}
>   println(filterMe2.map(x => SomeClass(x.field1, x.field2))
>     .filter(filterCondition.contains(_)).count)
> {code}
> Is this a bug? I can see that this filter function has been marked as 
> experimental: 
> https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Dataset.html#filter(scala.Function1)
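
One more shape that I would expect to work, though this sketch is mine and I 
have not verified it against this bug: express the same filter as a left-semi 
join, which stays in the untyped API and avoids filter(Function1) entirely.
{code:java}
// Sketch: untyped, join-based equivalent of the failing filter.
// filterMe2 and filterCondition are as defined in the snippet above.
val allowed = filterCondition.toDS
val result = filterMe2.join(
  allowed,
  filterMe2("field1") === allowed("field1") && filterMe2("field2") === allowed("field2"),
  "leftsemi")
println(result.count)  // expected: 1
{code}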





[jira] [Updated] (SPARK-21390) Dataset filter api inconsistency

2017-07-12 Thread Gheorghe Gheorghe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gheorghe Gheorghe updated SPARK-21390:
--
Description: 
Hello everybody, 

I've encountered a strange situation with the spark-shell.
When I run the code below in my IDE, the second test case prints the expected 
count of 1. However, when I run the same code in the spark-shell, the second 
test case returns a count of 0. 
I've made sure that I'm running Scala 2.11.8 and Spark 2.0.1 in both my IDE 
and the spark-shell.


{code:java}
  import org.apache.spark.sql.Dataset

  case class SomeClass(field1: String, field2: String)

  val filterCondition: Seq[SomeClass] = Seq(SomeClass("00", "01"))

  // Test 1
  val filterMe1: Dataset[SomeClass] = Seq(SomeClass("00", "01")).toDS

  println("Works fine! " + filterMe1.filter(filterCondition.contains(_)).count)

  // Test 2
  case class OtherClass(field1: String, field2: String)

  val filterMe2 = Seq(OtherClass("00", "01"), OtherClass("00", "02")).toDS

  println("Fail, count should return 1: " +
    filterMe2.filter(x => filterCondition.contains(SomeClass(x.field1, x.field2))).count)
{code}

Note: if I transform the dataset first, I get 1 back as expected.
{code:java}
  println(filterMe2.map(x => SomeClass(x.field1, x.field2))
    .filter(filterCondition.contains(_)).count)
{code}

Is this a bug? I can see that this filter function has been marked as 
experimental: 
https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Dataset.html#filter(scala.Function1)

  was:
Hello everybody, 

I've encountered a strange situation with the spark-shell.
When I run the code below in my IDE, the second test case prints the expected 
count of 1. However, when I run the same code in the spark-shell, the second 
test case returns a count of 0. 
I've made sure that I'm running Scala 2.11.8 and Spark 2.0.1 in both my IDE 
and the spark-shell.


{code:java}
  import org.apache.spark.sql.Dataset

  case class SomeClass(field1: String, field2: String)

  val filterCondition: Seq[SomeClass] = Seq(SomeClass("00", "01"))

  // Test 1
  val filterMe1: Dataset[SomeClass] = Seq(SomeClass("00", "01")).toDS

  println("Works fine! " + filterMe1.filter(filterCondition.contains(_)).count)

  // Test 2
  case class OtherClass(field1: String, field2: String)

  val filterMe2 = Seq(OtherClass("00", "01"), OtherClass("00", "02")).toDS

  println("Fail, count should return 1: " +
    filterMe2.filter(x => filterCondition.contains(SomeClass(x.field1, x.field2))).count)
{code}

Note that if I do this, it prints 1 as expected.
{code:java}
  println(filterMe2.map(x => SomeClass(x.field1, x.field2))
    .filter(filterCondition.contains(_)).count)
{code}

Is this a bug? I can see that this filter function has been marked as 
experimental: 
https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Dataset.html#filter(scala.Function1)






[jira] [Updated] (SPARK-21390) Dataset filter api inconsistency

2017-07-12 Thread Gheorghe Gheorghe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gheorghe Gheorghe updated SPARK-21390:
--
Description: 
Hello everybody, 

I've encountered a strange situation with the spark-shell.
When I run the code below in my IDE, the second test case prints the expected 
count of 1. However, when I run the same code in the spark-shell, the second 
test case returns a count of 0. 
I've made sure that I'm running Scala 2.11.8 and Spark 2.0.1 in both my IDE 
and the spark-shell.


{code:java}
  import org.apache.spark.sql.Dataset

  case class SomeClass(field1: String, field2: String)

  val filterCondition: Seq[SomeClass] = Seq(SomeClass("00", "01"))

  // Test 1
  val filterMe1: Dataset[SomeClass] = Seq(SomeClass("00", "01")).toDS

  println("Works fine! " + filterMe1.filter(filterCondition.contains(_)).count)

  // Test 2
  case class OtherClass(field1: String, field2: String)

  val filterMe2 = Seq(OtherClass("00", "01"), OtherClass("00", "02")).toDS

  println("Fail, count should return 1: " +
    filterMe2.filter(x => filterCondition.contains(SomeClass(x.field1, x.field2))).count)
{code}

Note that if I do this, it prints 1 as expected.
{code:java}
  println(filterMe2.map(x => SomeClass(x.field1, x.field2))
    .filter(filterCondition.contains(_)).count)
{code}

Is this a bug? I can see that this filter function has been marked as 
experimental: 
https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Dataset.html#filter(scala.Function1)

  was:
Hello everybody, 

I've encountered a strange situation with Spark 2.0.1 in the spark-shell. 
When I run the code below in my IDE, the second test case returns 1 as 
expected. However, when I run the same code in the spark-shell, the second 
test case returns 0. 
I've made sure that I'm running Scala 2.11.8 and Spark 2.0.1 in both my IDE 
and the spark-shell.


{code:java}
  import org.apache.spark.sql.Dataset

  case class SomeClass(field1: String, field2: String)

  val filterCondition: Seq[SomeClass] = Seq(SomeClass("00", "01"))

  // Test 1
  val filterMe1: Dataset[SomeClass] = Seq(SomeClass("00", "01")).toDS

  println("Works fine! " + filterMe1.filter(filterCondition.contains(_)).count)

  // Test 2
  case class OtherClass(field1: String, field2: String)

  val filterMe2 = Seq(OtherClass("00", "01"), OtherClass("00", "02")).toDS

  println("Fail, count should return 1: " +
    filterMe2.filter(x => filterCondition.contains(SomeClass(x.field1, x.field2))).count)
{code}

Note that if I do this, it prints 1 as expected.
{code:java}
  println(filterMe2.map(x => SomeClass(x.field1, x.field2))
    .filter(filterCondition.contains(_)).count)
{code}

Is this a bug? I can see that this filter function has been marked as 
experimental: 
https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Dataset.html#filter(scala.Function1)








[jira] [Created] (SPARK-21390) Dataset filter api inconsistency

2017-07-12 Thread Gheorghe Gheorghe (JIRA)
Gheorghe Gheorghe created SPARK-21390:
-

 Summary: Dataset filter api inconsistency
 Key: SPARK-21390
 URL: https://issues.apache.org/jira/browse/SPARK-21390
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.1
Reporter: Gheorghe Gheorghe
Priority: Minor


Hello everybody, 

I've encountered a strange situation with Spark 2.0.1 in the spark-shell. 
When I run the code below in my IDE, the second test case returns 1 as 
expected. However, when I run the same code in the spark-shell, the second 
test case returns 0. 
I've made sure that I'm running Scala 2.11.8 and Spark 2.0.1 in both my IDE 
and the spark-shell.


{code:java}
  import org.apache.spark.sql.Dataset

  case class SomeClass(field1: String, field2: String)

  val filterCondition: Seq[SomeClass] = Seq(SomeClass("00", "01"))

  // Test 1
  val filterMe1: Dataset[SomeClass] = Seq(SomeClass("00", "01")).toDS

  println("Works fine! " + filterMe1.filter(filterCondition.contains(_)).count)

  // Test 2
  case class OtherClass(field1: String, field2: String)

  val filterMe2 = Seq(OtherClass("00", "01"), OtherClass("00", "02")).toDS

  println("Fail, count should return 1: " +
    filterMe2.filter(x => filterCondition.contains(SomeClass(x.field1, x.field2))).count)
{code}
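
For running the snippet above in an IDE rather than the spark-shell, some 
session setup is needed first; this is only an illustrative sketch (the 
builder settings are mine), since spark-shell provides the session and the 
implicits automatically.
{code:java}
import org.apache.spark.sql.SparkSession

// Minimal session setup assumed for an IDE run; spark-shell already
// provides `spark` and imports spark.implicits._ (which enables .toDS).
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("SPARK-21390-repro")
  .getOrCreate()
import spark.implicits._
{code}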

Note that if I do this, it prints 1 as expected.
{code:java}
  println(filterMe2.map(x => SomeClass(x.field1, x.field2))
    .filter(filterCondition.contains(_)).count)
{code}

Is this a bug? I can see that this filter function has been marked as 
experimental: 
https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Dataset.html#filter(scala.Function1)
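
For completeness, one more variant I would expect to behave (sketch only, I 
have not verified it in the shell): compare tuples of the raw fields, so that 
no case class is constructed inside the closure at all.
{code:java}
// Sketch: avoid constructing a case class inside the closure by
// comparing tuples of the raw fields instead.
val keep: Seq[(String, String)] = filterCondition.map(c => (c.field1, c.field2))
println(filterMe2.filter(x => keep.contains((x.field1, x.field2))).count)
{code}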


