[jira] [Commented] (SPARK-14280) Update change-version.sh and pom.xml to add Scala 2.12 profiles

2017-01-10 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816398#comment-15816398
 ] 

Jakob Odersky commented on SPARK-14280:
---

Twitter Chill for Scala 2.12 is finally out, and I'm pleased to say that the 
Spark REPL now builds and runs on the latest version of Scala without any 
snapshot dependencies.

> Update change-version.sh and pom.xml to add Scala 2.12 profiles
> ---
>
> Key: SPARK-14280
> URL: https://issues.apache.org/jira/browse/SPARK-14280
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Project Infra
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>
> The following instructions will be kept quasi-up-to-date and are the best 
> starting point for building a Spark snapshot with Scala 2.12.0-M4:
> * Check out https://github.com/JoshRosen/spark/tree/build-for-2.12.
> * Install dependencies:
> ** chill: check out https://github.com/twitter/chill/pull/253 and run 
> {{sbt ++2.12.0-M4 publishLocal}}
> * Run {{./dev/change-scala-version.sh 2.12.0-M4}}
> * To compile Spark, run {{build/sbt -Dscala-2.12}}






[jira] [Updated] (SPARK-14519) Cross-publish Kafka for Scala 2.12

2016-12-12 Thread Jakob Odersky (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Odersky updated SPARK-14519:
--
Summary: Cross-publish Kafka for Scala 2.12  (was: Cross-publish Kafka for 
Scala 2.12.0-M4)

> Cross-publish Kafka for Scala 2.12
> --
>
> Key: SPARK-14519
> URL: https://issues.apache.org/jira/browse/SPARK-14519
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Project Infra
>Reporter: Josh Rosen
>
> In order to build the streaming Kafka connector, we need to publish Kafka for 
> Scala 2.12.0-M4. Someone should file an issue against the Kafka project and 
> work with their developers to figure out what will block their upgrade / 
> release.






[jira] [Commented] (SPARK-17647) SQL LIKE does not handle backslashes correctly

2016-12-09 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15736107#comment-15736107
 ] 

Jakob Odersky commented on SPARK-17647:
---

I rebased the PR and resolved the conflict. However, there is still the 
incompatibility issue with the SQL ANTLR parser. I discuss it in my last 
[two comments | 
https://github.com/apache/spark/pull/15398#issuecomment-255917940 ] and propose 
a few solutions. Any feedback is welcome!

> SQL LIKE does not handle backslashes correctly
> --
>
> Key: SPARK-17647
> URL: https://issues.apache.org/jira/browse/SPARK-17647
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Xiangrui Meng
>  Labels: correctness
>
> Try the following in SQL shell:
> {code}
> select '' like '%\\%';
> {code}
> It returned false, which is wrong.
> cc: [~yhuai] [~joshrosen]
> A false-negative considered previously:
> {code}
> select '' rlike '.*.*';
> {code}
> It returned true, which is correct if we assume that the pattern is treated 
> as a Java string but not raw string.






[jira] [Commented] (SPARK-14280) Update change-version.sh and pom.xml to add Scala 2.12 profiles

2016-12-05 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15723814#comment-15723814
 ] 

Jakob Odersky commented on SPARK-14280:
---

You're welcome to pull the changes back into your repo, of course; however, I'd 
also gladly continue working on adding 2.12 support!

Btw, how should this kind of all-or-nothing change be integrated into Spark? I 
don't want to open a pull request for a half-baked feature, but I also feel that 
continuing to pile features onto this branch will result in a huge changeset 
that is impossible to review.

> Update change-version.sh and pom.xml to add Scala 2.12 profiles
> ---
>
> Key: SPARK-14280
> URL: https://issues.apache.org/jira/browse/SPARK-14280
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Project Infra
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>
> The following instructions will be kept quasi-up-to-date and are the best 
> starting point for building a Spark snapshot with Scala 2.12.0-M4:
> * Check out https://github.com/JoshRosen/spark/tree/build-for-2.12.
> * Install dependencies:
> ** chill: check out https://github.com/twitter/chill/pull/253 and run 
> {{sbt ++2.12.0-M4 publishLocal}}
> * Run {{./dev/change-scala-version.sh 2.12.0-M4}}
> * To compile Spark, run {{build/sbt -Dscala-2.12}}






[jira] [Commented] (SPARK-14280) Update change-version.sh and pom.xml to add Scala 2.12 profiles

2016-12-05 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15723807#comment-15723807
 ] 

Jakob Odersky commented on SPARK-14280:
---

Hi [~joshrosen],

I rebased your initial work onto the latest master and upgraded dependencies. 
You can see the changes here 
https://github.com/apache/spark/compare/master...jodersky:scala-2.12

There were a few merge conflicts, mostly related to dependency version 
mismatches. I tried to resolve the conflicts cleanly; however, since I also had 
to take into account libraries that were only recently built for 2.12, it is 
possible that some of your changes in the pom.xml were lost.

There are still quite a few dependency issues with the latest Scala versions, 
but core still builds :)

> Update change-version.sh and pom.xml to add Scala 2.12 profiles
> ---
>
> Key: SPARK-14280
> URL: https://issues.apache.org/jira/browse/SPARK-14280
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Project Infra
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>
> The following instructions will be kept quasi-up-to-date and are the best 
> starting point for building a Spark snapshot with Scala 2.12.0-M4:
> * Check out https://github.com/JoshRosen/spark/tree/build-for-2.12.
> * Install dependencies:
> ** chill: check out https://github.com/twitter/chill/pull/253 and run 
> {{sbt ++2.12.0-M4 publishLocal}}
> * Run {{./dev/change-scala-version.sh 2.12.0-M4}}
> * To compile Spark, run {{build/sbt -Dscala-2.12}}






[jira] [Comment Edited] (SPARK-14222) Cross-publish jackson-module-scala for Scala 2.12

2016-11-03 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634143#comment-15634143
 ] 

Jakob Odersky edited comment on SPARK-14222 at 11/3/16 8:33 PM:


Thanks Sean, but I realized that the dependency is in fact not yet published 
for 2.12.0 final; the package I linked is from a different org.

There's a pull request for a release here: 
https://github.com/FasterXML/jackson-module-scala/pull/294


was (Author: jodersky):
Thanks Sean, however I realized that the dependency is in fact not yet 
published for 2.12.0 final. The package I linked is from a different org, oops

> Cross-publish jackson-module-scala for Scala 2.12
> -
>
> Key: SPARK-14222
> URL: https://issues.apache.org/jira/browse/SPARK-14222
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>
> In order to build Spark against Scala 2.12, we need to either remove our 
> jackson-module-scala dependency or cross-publish Jackson for Scala 2.12. 
> Personally, I'd prefer to remove it because I don't think we make extensive 
> use of it and because I'm not a huge fan of the implicit mapping between case 
> classes and JSON wire formats (the extra verbosity required by other 
> approaches is a feature, IMO, rather than a bug because it makes it much 
> harder to accidentally break wire compatibility).






[jira] [Comment Edited] (SPARK-14222) Cross-publish jackson-module-scala for Scala 2.12

2016-11-03 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634143#comment-15634143
 ] 

Jakob Odersky edited comment on SPARK-14222 at 11/3/16 8:30 PM:


Thanks Sean, but I realized that the dependency is in fact not yet published 
for 2.12.0 final; the package I linked is from a different org, oops


was (Author: jodersky):
Thanks Sean, however I realized that the dependency is in fact not yet 
published for 2.12.0 final. The package I linked is from a different org

> Cross-publish jackson-module-scala for Scala 2.12
> -
>
> Key: SPARK-14222
> URL: https://issues.apache.org/jira/browse/SPARK-14222
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>
> In order to build Spark against Scala 2.12, we need to either remove our 
> jackson-module-scala dependency or cross-publish Jackson for Scala 2.12. 
> Personally, I'd prefer to remove it because I don't think we make extensive 
> use of it and because I'm not a huge fan of the implicit mapping between case 
> classes and JSON wire formats (the extra verbosity required by other 
> approaches is a feature, IMO, rather than a bug because it makes it much 
> harder to accidentally break wire compatibility).






[jira] [Commented] (SPARK-14222) Cross-publish jackson-module-scala for Scala 2.12

2016-11-03 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634143#comment-15634143
 ] 

Jakob Odersky commented on SPARK-14222:
---

Thanks Sean, but I realized that the dependency is in fact not yet published 
for 2.12.0 final; the package I linked is from a different org.

> Cross-publish jackson-module-scala for Scala 2.12
> -
>
> Key: SPARK-14222
> URL: https://issues.apache.org/jira/browse/SPARK-14222
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>
> In order to build Spark against Scala 2.12, we need to either remove our 
> jackson-module-scala dependency or cross-publish Jackson for Scala 2.12. 
> Personally, I'd prefer to remove it because I don't think we make extensive 
> use of it and because I'm not a huge fan of the implicit mapping between case 
> classes and JSON wire formats (the extra verbosity required by other 
> approaches is a feature, IMO, rather than a bug because it makes it much 
> harder to accidentally break wire compatibility).






[jira] [Commented] (SPARK-14222) Cross-publish jackson-module-scala for Scala 2.12

2016-11-03 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634117#comment-15634117
 ] 

Jakob Odersky commented on SPARK-14222:
---

A newer version of the module (version 2.8.4) is now available for Scala 2.12: 
http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22jackson-module-scala_2.12%22. 
Can we upgrade Spark's dependency (Spark currently uses 2.6.5)?
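
For reference, a sketch (not the actual build change) of what the upgraded 
dependency would look like in sbt terms, using FasterXML's standard coordinates; 
the pom.xml change would be the equivalent groupId/artifactId/version bump:

{code}
// The %% operator appends the Scala binary version, so this resolves to
// jackson-module-scala_2.12 when the build is run with Scala 2.12.
libraryDependencies += "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.8.4"
{code}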

> Cross-publish jackson-module-scala for Scala 2.12
> -
>
> Key: SPARK-14222
> URL: https://issues.apache.org/jira/browse/SPARK-14222
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>
> In order to build Spark against Scala 2.12, we need to either remove our 
> jackson-module-scala dependency or cross-publish Jackson for Scala 2.12. 
> Personally, I'd prefer to remove it because I don't think we make extensive 
> use of it and because I'm not a huge fan of the implicit mapping between case 
> classes and JSON wire formats (the extra verbosity required by other 
> approaches is a feature, IMO, rather than a bug because it makes it much 
> harder to accidentally break wire compatibility).






[jira] [Commented] (SPARK-14220) Build and test Spark against Scala 2.12

2016-11-03 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634027#comment-15634027
 ] 

Jakob Odersky commented on SPARK-14220:
---

At least most dependencies will probably make 2.12 builds available, now that 
it is considered binary-stable.

> Build and test Spark against Scala 2.12
> ---
>
> Key: SPARK-14220
> URL: https://issues.apache.org/jira/browse/SPARK-14220
> Project: Spark
>  Issue Type: Umbrella
>  Components: Build, Project Infra
>Reporter: Josh Rosen
>Priority: Blocker
>
> This umbrella JIRA tracks the requirements for building and testing Spark 
> against the current Scala 2.12 milestone.






[jira] [Comment Edited] (SPARK-14220) Build and test Spark against Scala 2.12

2016-11-03 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634027#comment-15634027
 ] 

Jakob Odersky edited comment on SPARK-14220 at 11/3/16 7:54 PM:


At least most dependencies will probably make 2.12 builds available, now that 
it is considered binary-stable. The closure cleaning and byte code manipulation 
stuff is a whole different story though...


was (Author: jodersky):
at least most dependencies will probably make 2.12 builds available, now that 
it is considered binary-stable

> Build and test Spark against Scala 2.12
> ---
>
> Key: SPARK-14220
> URL: https://issues.apache.org/jira/browse/SPARK-14220
> Project: Spark
>  Issue Type: Umbrella
>  Components: Build, Project Infra
>Reporter: Josh Rosen
>Priority: Blocker
>
> This umbrella JIRA tracks the requirements for building and testing Spark 
> against the current Scala 2.12 milestone.






[jira] [Commented] (SPARK-14220) Build and test Spark against Scala 2.12

2016-11-03 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15633762#comment-15633762
 ] 

Jakob Odersky commented on SPARK-14220:
---

Scala 2.12 was just officially announced :)

> Build and test Spark against Scala 2.12
> ---
>
> Key: SPARK-14220
> URL: https://issues.apache.org/jira/browse/SPARK-14220
> Project: Spark
>  Issue Type: Umbrella
>  Components: Build, Project Infra
>Reporter: Josh Rosen
>Priority: Blocker
>
> This umbrella JIRA tracks the requirements for building and testing Spark 
> against the current Scala 2.12 milestone.






[jira] [Commented] (SPARK-18018) Specify alternate escape character in 'LIKE' expression

2016-10-19 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15590136#comment-15590136
 ] 

Jakob Odersky commented on SPARK-18018:
---

I've started a very early prototype 
[here|https://github.com/jodersky/spark/compare/SPARK-17647...jodersky:escape].
It's still very much a work in progress, and I'm currently pondering whether to 
make {{like}} expressions ternary or to include the {{escape}} option in a new 
{{pattern}} expression. Any feedback is welcome!

> Specify alternate escape character in 'LIKE' expression
> ---
>
> Key: SPARK-18018
> URL: https://issues.apache.org/jira/browse/SPARK-18018
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Jakob Odersky
>
> Spark currently uses the backslash character (\) to escape patterns in 
> 'LIKE' expressions.
> Other RDBMS ([MS|https://msdn.microsoft.com/en-us/library/ms179859.aspx], 
> [Oracle|https://docs.oracle.com/cd/B12037_01/server.101/b10759/conditions016.htm],
>  
> [DB2|http://www.ibm.com/support/knowledgecenter/SSEPEK_11.0.0/sqlref/src/tpc/db2z_likepredicate.html],
>  
> [MySQL|http://dev.mysql.com/doc/refman/5.7/en/string-comparison-functions.html],
>  
> [PostgreSQL|https://www.postgresql.org/docs/9.0/static/functions-matching.html],
>  [Hive|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF]) 
> support specifying an alternate escape character with an extended syntax of 
> the `LIKE` operator.
> The syntax is the same in all above mentioned systems and is described as 
> follows:
> {code}
> expression LIKE pattern [ESCAPE escapeChar]
> {code}
> where {{escapeChar}} is a single-character expression that will replace the 
> backslash as escape character.
> Adding this extended syntax to Spark SQL would be a usability improvement for 
> users coming from other systems.






[jira] [Created] (SPARK-18018) Specify alternate escape character in 'LIKE' expression

2016-10-19 Thread Jakob Odersky (JIRA)
Jakob Odersky created SPARK-18018:
-

 Summary: Specify alternate escape character in 'LIKE' expression
 Key: SPARK-18018
 URL: https://issues.apache.org/jira/browse/SPARK-18018
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Jakob Odersky


Spark currently uses the backslash character (\) to escape patterns in 
'LIKE' expressions.
Other RDBMS ([MS|https://msdn.microsoft.com/en-us/library/ms179859.aspx], 
[Oracle|https://docs.oracle.com/cd/B12037_01/server.101/b10759/conditions016.htm],
 
[DB2|http://www.ibm.com/support/knowledgecenter/SSEPEK_11.0.0/sqlref/src/tpc/db2z_likepredicate.html],
 
[MySQL|http://dev.mysql.com/doc/refman/5.7/en/string-comparison-functions.html],
 
[PostgreSQL|https://www.postgresql.org/docs/9.0/static/functions-matching.html],
 [Hive|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF]) 
support specifying an alternate escape character with an extended syntax of the 
`LIKE` operator.

The syntax is the same in all above mentioned systems and is described as 
follows:
{code}
expression LIKE pattern [ESCAPE escapeChar]
{code}
where {{escapeChar}} is a single-character expression that will replace the 
backslash as escape character.

Adding this extended syntax to Spark SQL would be a usability improvement for 
users coming from other systems.
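
For illustration, a sketch of how the proposed syntax behaves in the systems 
listed above (standard semantics only; not valid in Spark SQL until the 
extension is added, and '!' is an arbitrary choice of escape character):

{code}
-- '!' is declared as the escape character, so '!%' matches a literal '%'
-- while the surrounding '%' wildcards keep their usual meaning.
SELECT 'The rate is 50%' LIKE '%50!%%' ESCAPE '!';  -- true
SELECT 'The rate is 505' LIKE '%50!%%' ESCAPE '!';  -- false
{code}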






[jira] [Commented] (SPARK-17368) Scala value classes create encoder problems and break at runtime

2016-10-17 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15582882#comment-15582882
 ] 

Jakob Odersky commented on SPARK-17368:
---

[~arisofala...@gmail.com] Let me explain the fix for what I initially thought 
was impossible.
Value classes do have a class representation for compatibility with Java, and 
although this has a slight overhead compared to the primitive counterpart, 
Catalyst will mostly negate that overhead by providing its own encoders and 
operators on serialized objects. This means that any Dataset operations that 
allow user-defined functions (e.g. {{map}}, {{filter}}, etc.) will work with the 
class representation instead of the wrapped value.
Regarding the availability of encoders: while we cannot create type classes that 
apply only to value classes without resorting to macros (an implicit for 
{{AnyVal}} would also apply to primitive types), this fix adds value-class 
support to existing encoders. E.g. you can define your value class as a case 
class and have a working encoder out of the box.
Unfortunately there is no way to statically verify that the wrapped value is 
also encodable, but encoders in general perform "deep inspection" at runtime.

> Scala value classes create encoder problems and break at runtime
> 
>
> Key: SPARK-17368
> URL: https://issues.apache.org/jira/browse/SPARK-17368
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 1.6.2, 2.0.0
> Environment: JDK 8 on MacOS
> Scala 2.11.8
> Spark 2.0.0
>Reporter: Aris Vlasakakis
>Assignee: Jakob Odersky
> Fix For: 2.1.0
>
>
> Using Scala value classes as the inner type for Datasets breaks in Spark 2.0 
> and 1.6.X.
> This simple Spark 2 application demonstrates that the code will compile, but 
> will break at runtime with the error. The value class is of course 
> *FeatureId*, as it extends AnyVal.
> {noformat}
> Exception in thread "main" java.lang.RuntimeException: Error while encoding: 
> java.lang.RuntimeException: Couldn't find v on int
> assertnotnull(input[0, int, true], top level non-flat input object).v AS v#0
> +- assertnotnull(input[0, int, true], top level non-flat input object).v
>+- assertnotnull(input[0, int, true], top level non-flat input object)
>   +- input[0, int, true]".
> at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:279)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421)
> {noformat}
> Test code for Spark 2.0.0:
> {noformat}
> import org.apache.spark.sql.{Dataset, SparkSession}
> object BreakSpark {
>   case class FeatureId(v: Int) extends AnyVal
>   def main(args: Array[String]): Unit = {
> val seq = Seq(FeatureId(1), FeatureId(2), FeatureId(3))
> val spark = SparkSession.builder.getOrCreate()
> import spark.implicits._
> spark.sparkContext.setLogLevel("warn")
> val ds: Dataset[FeatureId] = spark.createDataset(seq)
> println(s"BREAK HERE: ${ds.count}")
>   }
> }
> {noformat}






[jira] [Commented] (SPARK-15577) Java can't import DataFrame type alias

2016-10-10 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15564302#comment-15564302
 ] 

Jakob Odersky commented on SPARK-15577:
---

This cleaning up of JIRAs is really good to see :) Considering that Spark 2.0 
has already shipped with the type alias, I think it is safe to close this 
ticket. We can always reopen it if necessary.

> Java can't import DataFrame type alias
> --
>
> Key: SPARK-15577
> URL: https://issues.apache.org/jira/browse/SPARK-15577
> Project: Spark
>  Issue Type: Improvement
>  Components: Java API, SQL
>Affects Versions: 2.0.0
>Reporter: holdenk
>
> After SPARK-13244, all Java code needs to be updated to use Dataset 
> instead of DataFrame as we used a type alias. Should we consider adding a 
> DataFrame to the Java API which just extends Dataset for compatibility?
> cc [~liancheng] ?






[jira] [Commented] (SPARK-15577) Java can't import DataFrame type alias

2016-10-10 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15563910#comment-15563910
 ] 

Jakob Odersky commented on SPARK-15577:
---

This was considered and the trade-offs were actively discussed, but ultimately 
the type alias was chosen over subclassing.
I think the main argument in favor of aliasing was to avoid incompatibilities 
in future libraries, e.g. a utility function is written to accept a 
{{DataFrame}}, but I want to pass in a {{Dataset\[Row\]}}.

[This email thread| 
http://apache-spark-developers-list.1001551.n3.nabble.com/discuss-DataFrame-vs-Dataset-in-Spark-2-0-td16445.html]
 contains the whole discussion.
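
As a small illustration, a sketch assuming only that {{DataFrame}} is the type 
alias for {{Dataset\[Row\]}} defined in the {{org.apache.spark.sql}} package 
object:

{code}
import org.apache.spark.sql.{DataFrame, Dataset, Row}

object AliasCompat {
  // A utility written against the familiar DataFrame name...
  def describe(df: DataFrame): Long = df.count()

  // ...still accepts a Dataset[Row], because DataFrame is merely an alias
  // (type DataFrame = Dataset[Row]), not a separate subclass.
  def use(ds: Dataset[Row]): Long = describe(ds)
}
{code}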

> Java can't import DataFrame type alias
> --
>
> Key: SPARK-15577
> URL: https://issues.apache.org/jira/browse/SPARK-15577
> Project: Spark
>  Issue Type: Improvement
>  Components: Java API, SQL
>Affects Versions: 2.0.0
>Reporter: holdenk
>
> After SPARK-13244, all Java code needs to be updated to use Dataset 
> instead of DataFrame as we used a type alias. Should we consider adding a 
> DataFrame to the Java API which just extends Dataset for compatibility?
> cc [~liancheng] ?






[jira] [Comment Edited] (SPARK-15577) Java can't import DataFrame type alias

2016-10-10 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15563910#comment-15563910
 ] 

Jakob Odersky edited comment on SPARK-15577 at 10/10/16 11:41 PM:
--

This was considered and the trade-offs were actively discussed, but ultimately 
the type alias was chosen over subclassing.
I think a principal argument in favor of aliasing was to avoid 
incompatibilities in future libraries, e.g. a utility function is written to 
accept a {{DataFrame}}, but I want to pass in a {{Dataset\[Row\]}}.

[This email thread| 
http://apache-spark-developers-list.1001551.n3.nabble.com/discuss-DataFrame-vs-Dataset-in-Spark-2-0-td16445.html]
 contains the whole discussion.


was (Author: jodersky):
This was considered and trade-offs were actively discussed, but ultimately the 
type alias was chosen over sub classing.
I think the main argument in favor of aliasing was to avoid incompatibilities 
in future libraries, i.e. there is utility function was written to accept a 
{{DataFrame}}, however I want to pass in a {{Dataset\[Row\]}}.

[This email thread| 
http://apache-spark-developers-list.1001551.n3.nabble.com/discuss-DataFrame-vs-Dataset-in-Spark-2-0-td16445.html]
 contains the whole discussion

> Java can't import DataFrame type alias
> --
>
> Key: SPARK-15577
> URL: https://issues.apache.org/jira/browse/SPARK-15577
> Project: Spark
>  Issue Type: Improvement
>  Components: Java API, SQL
>Affects Versions: 2.0.0
>Reporter: holdenk
>
> After SPARK-13244, all Java code needs to be updated to use Dataset 
> instead of DataFrame as we used a type alias. Should we consider adding a 
> DataFrame to the Java API which just extends Dataset for compatibility?
> cc [~liancheng] ?






[jira] [Commented] (SPARK-17647) SQL LIKE does not handle backslashes correctly

2016-10-06 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553517#comment-15553517
 ] 

Jakob Odersky commented on SPARK-17647:
---

Xiao pointed me to this issue; I can take a look at it.

> SQL LIKE does not handle backslashes correctly
> --
>
> Key: SPARK-17647
> URL: https://issues.apache.org/jira/browse/SPARK-17647
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Xiangrui Meng
>  Labels: correctness
>
> Try the following in SQL shell:
> {code}
> select '' like '%\\%';
> {code}
> It returned false, which is wrong.
> cc: [~yhuai] [~joshrosen]
> A false-negative considered previously:
> {code}
> select '' rlike '.*.*';
> {code}
> It returned true, which is correct if we assume that the pattern is treated 
> as a Java string but not raw string.






[jira] [Commented] (SPARK-16264) Allow the user to use operators on the received DataFrame

2016-09-15 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15494587#comment-15494587
 ] 

Jakob Odersky commented on SPARK-16264:
---

I just came across this issue through a comment in the ForeachSink. I 
understand why Sinks would be better off not knowing about the type of 
QueryExecution; however, I'm not quite sure what you mean by "having something 
similar to foreachwriter". Is the idea to have only a single foreach sink and 
expose all custom user sinks as foreach writers?

> Allow the user to use operators on the received DataFrame
> -
>
> Key: SPARK-16264
> URL: https://issues.apache.org/jira/browse/SPARK-16264
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Shixiong Zhu
>
> Currently a Sink cannot apply any operators on the given DataFrame because a 
> new DataFrame created by an operator will use QueryExecution rather than 
> IncrementalExecution.
> There are two options to fix this:
> 1. Merge IncrementalExecution into QueryExecution so that QueryExecution can 
> also deal with streaming operators.
> 2. Make Dataset operators inherit the QueryExecution (IncrementalExecution is 
> just a subclass of QueryExecution) from its parent.






[jira] [Commented] (SPARK-14221) Cross-publish Chill for Scala 2.12

2016-09-09 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478428#comment-15478428
 ] 

Jakob Odersky commented on SPARK-14221:
---

I just saw that chill already [has a pending PR to upgrade to Kryo 
4.0.0|https://github.com/twitter/chill/pull/258]

> Cross-publish Chill for Scala 2.12
> --
>
> Key: SPARK-14221
> URL: https://issues.apache.org/jira/browse/SPARK-14221
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>
> We need to cross-publish Chill in order to build against Scala 2.12.
> Upstream issue: https://github.com/twitter/chill/issues/252
> I tried building and testing {{chill-scala}} against 2.12.0-M3 and ran into 
> multiple failed tests due to issues with Java8 lambda serialization (similar 
> to https://github.com/EsotericSoftware/kryo/issues/215), so this task will be 
> slightly more involved then just bumping the dependencies in the Chill build.






[jira] [Comment Edited] (SPARK-14221) Cross-publish Chill for Scala 2.12

2016-09-09 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15477903#comment-15477903
 ] 

Jakob Odersky edited comment on SPARK-14221 at 9/9/16 6:30 PM:
---

[~joshrosen]'s upstream PR requires Kryo 3.1, a version that was in the works 
when the PR got created but that was never published (and is hence a major 
blocker for Chill). Instead, Kryo went straight to version 4.0.0 (see changes 
[here|https://github.com/EsotericSoftware/kryo#new-in-release-400]).

Would a transitive dependency on Kryo 4.0.0 be acceptable in Spark? Of course, 
updating the Kryo version in Chill in order to support Scala 2.12 will also 
need discussion upstream.


was (Author: jodersky):
[~joshrosen]'s upstream PR requires Kryo 3.1, a version that was in the works 
when the PR got created but that was never published. Instead, Kryo went 
straight to version 4.0.0 (see changes 
[here|https://github.com/EsotericSoftware/kryo#new-in-release-400]).

Would a transitive dependency on Kryo 4.0.0 be acceptable in Spark? Of course, 
updating the Kryo version in Chill, in order to support Scala 2.12 will also 
need discussion upstream.

> Cross-publish Chill for Scala 2.12
> --
>
> Key: SPARK-14221
> URL: https://issues.apache.org/jira/browse/SPARK-14221
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>
> We need to cross-publish Chill in order to build against Scala 2.12.
> Upstream issue: https://github.com/twitter/chill/issues/252
> I tried building and testing {{chill-scala}} against 2.12.0-M3 and ran into 
> multiple failed tests due to issues with Java8 lambda serialization (similar 
> to https://github.com/EsotericSoftware/kryo/issues/215), so this task will be 
> slightly more involved then just bumping the dependencies in the Chill build.






[jira] [Commented] (SPARK-14221) Cross-publish Chill for Scala 2.12

2016-09-09 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15477903#comment-15477903
 ] 

Jakob Odersky commented on SPARK-14221:
---

[~joshrosen]'s upstream PR requires Kryo 3.1, a version that was in the works 
when the PR got created but that was never published. Instead, Kryo went 
straight to version 4.0.0 (see changes 
[here|https://github.com/EsotericSoftware/kryo#new-in-release-400]).

Would a transitive dependency on Kryo 4.0.0 be acceptable in Spark? Of course, 
updating the Kryo version in Chill in order to support Scala 2.12 will also 
need discussion upstream.

> Cross-publish Chill for Scala 2.12
> --
>
> Key: SPARK-14221
> URL: https://issues.apache.org/jira/browse/SPARK-14221
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>
> We need to cross-publish Chill in order to build against Scala 2.12.
> Upstream issue: https://github.com/twitter/chill/issues/252
> I tried building and testing {{chill-scala}} against 2.12.0-M3 and ran into 
> multiple failed tests due to issues with Java8 lambda serialization (similar 
> to https://github.com/EsotericSoftware/kryo/issues/215), so this task will be 
> slightly more involved then just bumping the dependencies in the Chill build.






[jira] [Commented] (SPARK-17368) Scala value classes create encoder problems and break at runtime

2016-09-07 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471428#comment-15471428
 ] 

Jakob Odersky commented on SPARK-17368:
---

Hmm, you're right; my assumption of using value classes only at the very 
beginning and at the end was too naive.

[~srowen], how likely do you think it is that we can include a meta-encoder in 
Spark? It could be included in the form of an optional import. Since the 
existing encoders/ScalaReflection framework already use runtime reflection, my 
guess is that adding compile-time reflection will not be too difficult.

> Scala value classes create encoder problems and break at runtime
> 
>
> Key: SPARK-17368
> URL: https://issues.apache.org/jira/browse/SPARK-17368
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 1.6.2, 2.0.0
> Environment: JDK 8 on MacOS
> Scala 2.11.8
> Spark 2.0.0
>Reporter: Aris Vlasakakis
>
> Using Scala value classes as the inner type for Datasets breaks in Spark 2.0 
> and 1.6.X.
> This simple Spark 2 application demonstrates that the code will compile, but 
> will break at runtime with the error. The value class is of course 
> *FeatureId*, as it extends AnyVal.
> {noformat}
> Exception in thread "main" java.lang.RuntimeException: Error while encoding: 
> java.lang.RuntimeException: Couldn't find v on int
> assertnotnull(input[0, int, true], top level non-flat input object).v AS v#0
> +- assertnotnull(input[0, int, true], top level non-flat input object).v
>+- assertnotnull(input[0, int, true], top level non-flat input object)
>   +- input[0, int, true]".
> at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:279)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421)
> {noformat}
> Test code for Spark 2.0.0:
> {noformat}
> import org.apache.spark.sql.{Dataset, SparkSession}
> object BreakSpark {
>   case class FeatureId(v: Int) extends AnyVal
>   def main(args: Array[String]): Unit = {
> val seq = Seq(FeatureId(1), FeatureId(2), FeatureId(3))
> val spark = SparkSession.builder.getOrCreate()
> import spark.implicits._
> spark.sparkContext.setLogLevel("warn")
> val ds: Dataset[FeatureId] = spark.createDataset(seq)
> println(s"BREAK HERE: ${ds.count}")
>   }
> }
> {noformat}






[jira] [Comment Edited] (SPARK-17368) Scala value classes create encoder problems and break at runtime

2016-09-06 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15468833#comment-15468833
 ] 

Jakob Odersky edited comment on SPARK-17368 at 9/6/16 10:57 PM:


So I thought about this a bit more, and although it is possible to support 
value classes, I currently see two main issues that make it cumbersome:

1. Catalyst (the engine behind Datasets) generates and compiles, at runtime, 
the code that will represent the actual computation. The fact that this code is 
Java, together with the fact that value classes don't have runtime 
representations, will require changes in the implementation of Encoders (see my 
experimental branch 
[here|https://github.com/apache/spark/compare/master...jodersky:value-classes]).

2. The larger problem of the two is how encoders for value classes will be made 
accessible. Currently, encoders are exposed as type classes, and there is 
unfortunately no way to create type classes that apply only to classes 
extending AnyVal (you could create an encoder for AnyVals, but it would also 
apply to any primitive type and you would get implicit resolution conflicts). 
Requiring explicit encoders for value classes may work; however, you would 
still have no compile-time safety, as access to a value class's inner val 
happens at runtime and may hence fail if it is not encodable.

The cleanest solution would be to use metaprogramming: it would guarantee 
"encodability" at compile time and could easily complement the current API. 
Unfortunately, however, I don't think it could be included in Spark in the near 
future, as the current metaprogramming solutions in Scala are either too new 
(scala.meta) or on their way to being deprecated (the current experimental 
Scala macros). (I have been wanting to experiment with meta encoders for a 
while though, so maybe I'll try putting together an external library for that.)

How inconvenient is it to extract the wrapped value before creating a dataset 
and to re-wrap your final results?


was (Author: jodersky):
So I thought about this a bit more and although it is possible to support value 
classes, I currently see two main issues that make it cumbersome:

1. Catalyst (the engine behind Datasets) generates and compiles code during 
runtime, that will represent the actual computation. This code being Java, 
together with the fact that value classes don't have runtime representations, 
will require changes in the implementation of Encoders (see my experimental 
branch here).

2. The largest problem of both is how will encoders for value classes be 
accessible? Currently, encoders are exposed as type classes and there is 
unfortunately no way to create type classes for classes extending AnyVal (you 
could create an encoder for AnyVals, however that would also apply to any 
primitive type and you would get implicit resolution conflicts). Requiring 
explicit encoders for value classes may work, however you would still have no 
compile-time safety, as accessing of a value class' inner val will occur during 
runtime and may hence fail if it is not encodable.

The cleanest solution would be to use meta programming: it would guarantee 
"encodability" during compile-time and could easily complement the current API. 
Unfortunately however, I don't think it could be included in Spark in the near 
future as the current meta programming solutions in Scala are either too new 
(scala.meta) or on their way to being deprecated (the current experimental 
scala macros). (I have been wanting to experiment with meta encoders for a 
while though, so maybe I'll try putting together an external library for that)

How inconvenient is it to extract the wrapped value before creating a dataset 
and re-wrapping your final results?

> Scala value classes create encoder problems and break at runtime
> 
>
> Key: SPARK-17368
> URL: https://issues.apache.org/jira/browse/SPARK-17368
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 1.6.2, 2.0.0
> Environment: JDK 8 on MacOS
> Scala 2.11.8
> Spark 2.0.0
>Reporter: Aris Vlasakakis
>
> Using Scala value classes as the inner type for Datasets breaks in Spark 2.0 
> and 1.6.X.
> This simple Spark 2 application demonstrates that the code will compile, but 
> will break at runtime with the error. The value class is of course 
> *FeatureId*, as it extends AnyVal.
> {noformat}
> Exception in thread "main" java.lang.RuntimeException: Error while encoding: 
> java.lang.RuntimeException: Couldn't find v on int
> assertnotnull(input[0, int, true], top level non-flat input object).v AS v#0
> +- assertnotnull(input[0, int, true], top level non-flat input object).v
>+- assertnotnull(input[0, int, true], top level non-flat input object)
>   +- input[0, 

[jira] [Commented] (SPARK-17368) Scala value classes create encoder problems and break at runtime

2016-09-06 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15468833#comment-15468833
 ] 

Jakob Odersky commented on SPARK-17368:
---

So I thought about this a bit more, and although it is possible to support 
value classes, I currently see two main issues that make it cumbersome:

1. Catalyst (the engine behind Datasets) generates and compiles, at runtime, 
the code that will represent the actual computation. The fact that this code is 
Java, together with the fact that value classes don't have runtime 
representations, will require changes in the implementation of Encoders (see my 
experimental branch here).

2. The larger problem of the two is how encoders for value classes will be made 
accessible. Currently, encoders are exposed as type classes, and there is 
unfortunately no way to create type classes that apply only to classes 
extending AnyVal (you could create an encoder for AnyVals, but it would also 
apply to any primitive type and you would get implicit resolution conflicts). 
Requiring explicit encoders for value classes may work; however, you would 
still have no compile-time safety, as access to a value class's inner val 
happens at runtime and may hence fail if it is not encodable.

The cleanest solution would be to use metaprogramming: it would guarantee 
"encodability" at compile time and could easily complement the current API. 
Unfortunately, however, I don't think it could be included in Spark in the near 
future, as the current metaprogramming solutions in Scala are either too new 
(scala.meta) or on their way to being deprecated (the current experimental 
Scala macros). (I have been wanting to experiment with meta encoders for a 
while though, so maybe I'll try putting together an external library for that.)

How inconvenient is it to extract the wrapped value before creating a dataset 
and to re-wrap your final results?
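
To make that question concrete, here is a minimal sketch of the unwrap/re-wrap 
workaround, reusing the {{FeatureId}} value class from the issue below (the 
object name is arbitrary):

{code}
import org.apache.spark.sql.{Dataset, SparkSession}

object UnwrapWorkaround {
  case class FeatureId(v: Int) extends AnyVal

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.getOrCreate()
    import spark.implicits._

    val ids = Seq(FeatureId(1), FeatureId(2), FeatureId(3))
    // Work on the plain wrapped values, for which encoders already exist...
    val ds: Dataset[Int] = spark.createDataset(ids.map(_.v))
    // ...and re-wrap only when the final results come back to the driver.
    val result: Array[FeatureId] = ds.filter(_ > 1).collect().map(FeatureId(_))
    result.foreach(println)
  }
}
{code}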

> Scala value classes create encoder problems and break at runtime
> 
>
> Key: SPARK-17368
> URL: https://issues.apache.org/jira/browse/SPARK-17368
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 1.6.2, 2.0.0
> Environment: JDK 8 on MacOS
> Scala 2.11.8
> Spark 2.0.0
>Reporter: Aris Vlasakakis
>
> Using Scala value classes as the inner type for Datasets breaks in Spark 2.0 
> and 1.6.X.
> This simple Spark 2 application demonstrates that the code will compile, but 
> will break at runtime with the error. The value class is of course 
> *FeatureId*, as it extends AnyVal.
> {noformat}
> Exception in thread "main" java.lang.RuntimeException: Error while encoding: 
> java.lang.RuntimeException: Couldn't find v on int
> assertnotnull(input[0, int, true], top level non-flat input object).v AS v#0
> +- assertnotnull(input[0, int, true], top level non-flat input object).v
>+- assertnotnull(input[0, int, true], top level non-flat input object)
>   +- input[0, int, true]".
> at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:279)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421)
> {noformat}
> Test code for Spark 2.0.0:
> {noformat}
> import org.apache.spark.sql.{Dataset, SparkSession}
> object BreakSpark {
>   case class FeatureId(v: Int) extends AnyVal
>   def main(args: Array[String]): Unit = {
> val seq = Seq(FeatureId(1), FeatureId(2), FeatureId(3))
> val spark = SparkSession.builder.getOrCreate()
> import spark.implicits._
> spark.sparkContext.setLogLevel("warn")
> val ds: Dataset[FeatureId] = spark.createDataset(seq)
> println(s"BREAK HERE: ${ds.count}")
>   }
> }
> {noformat}






[jira] [Commented] (SPARK-17368) Scala value classes create encoder problems and break at runtime

2016-09-02 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15459707#comment-15459707
 ] 

Jakob Odersky commented on SPARK-17368:
---

Yeah, macros would be awesome; something with scala.meta would be neat :) In 
the meantime, it occurred to me that Catalyst uses ClassTags to do reflection 
in lots of places. These are generated at compile time, so it might yet be 
possible to support value classes.
A quick test showed me that value classes can be detected and their parameters 
accessed. Getting a schema for such a case is trivial; I'll see about adding 
encoders next!

> Scala value classes create encoder problems and break at runtime
> 
>
> Key: SPARK-17368
> URL: https://issues.apache.org/jira/browse/SPARK-17368
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 1.6.2, 2.0.0
> Environment: JDK 8 on MacOS
> Scala 2.11.8
> Spark 2.0.0
>Reporter: Aris Vlasakakis
>
> Using Scala value classes as the inner type for Datasets breaks in Spark 2.0 
> and 1.6.X.
> This simple Spark 2 application demonstrates that the code will compile, but 
> will break at runtime with the error. The value class is of course 
> *FeatureId*, as it extends AnyVal.
> {noformat}
> Exception in thread "main" java.lang.RuntimeException: Error while encoding: 
> java.lang.RuntimeException: Couldn't find v on int
> assertnotnull(input[0, int, true], top level non-flat input object).v AS v#0
> +- assertnotnull(input[0, int, true], top level non-flat input object).v
>+- assertnotnull(input[0, int, true], top level non-flat input object)
>   +- input[0, int, true]".
> at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:279)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421)
> {noformat}
> Test code for Spark 2.0.0:
> {noformat}
> import org.apache.spark.sql.{Dataset, SparkSession}
> object BreakSpark {
>   case class FeatureId(v: Int) extends AnyVal
>   def main(args: Array[String]): Unit = {
> val seq = Seq(FeatureId(1), FeatureId(2), FeatureId(3))
> val spark = SparkSession.builder.getOrCreate()
> import spark.implicits._
> spark.sparkContext.setLogLevel("warn")
> val ds: Dataset[FeatureId] = spark.createDataset(seq)
> println(s"BREAK HERE: ${ds.count}")
>   }
> }
> {noformat}






[jira] [Commented] (SPARK-17368) Scala value classes create encoder problems and break at runtime

2016-09-02 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15459587#comment-15459587
 ] 

Jakob Odersky commented on SPARK-17368:
---

I'm currently taking a look at this, but my first analysis is not very 
positive: considering that value classes are pure compile-time constructs, I 
don't think it is possible to do anything with them through reflection, which 
Catalyst assumes.
Here's a relevant blog post: 
http://tech.kinja.com/scala-value-classes-and-reflection-here-be-dragons-1527846740
I'll check it out in a bit more detail, but I fear that we'll have to resolve 
this as a won't-fix and not support value classes in Datasets :(

> Scala value classes create encoder problems and break at runtime
> 
>
> Key: SPARK-17368
> URL: https://issues.apache.org/jira/browse/SPARK-17368
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 1.6.2, 2.0.0
> Environment: JDK 8 on MacOS
> Scala 2.11.8
> Spark 2.0.0
>Reporter: Aris Vlasakakis
>
> Using Scala value classes as the inner type for Datasets breaks in Spark 2.0 
> and 1.6.X.
> This simple Spark 2 application demonstrates that the code will compile, but 
> will break at runtime with the error. The value class is of course 
> *FeatureId*, as it extends AnyVal.
> {noformat}
> Exception in thread "main" java.lang.RuntimeException: Error while encoding: 
> java.lang.RuntimeException: Couldn't find v on int
> assertnotnull(input[0, int, true], top level non-flat input object).v AS v#0
> +- assertnotnull(input[0, int, true], top level non-flat input object).v
>+- assertnotnull(input[0, int, true], top level non-flat input object)
>   +- input[0, int, true]".
> at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:279)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421)
> {noformat}
> Test code for Spark 2.0.0:
> {noformat}
> import org.apache.spark.sql.{Dataset, SparkSession}
> object BreakSpark {
>   case class FeatureId(v: Int) extends AnyVal
>   def main(args: Array[String]): Unit = {
> val seq = Seq(FeatureId(1), FeatureId(2), FeatureId(3))
> val spark = SparkSession.builder.getOrCreate()
> import spark.implicits._
> spark.sparkContext.setLogLevel("warn")
> val ds: Dataset[FeatureId] = spark.createDataset(seq)
> println(s"BREAK HERE: ${ds.count}")
>   }
> }
> {noformat}






[jira] [Commented] (SPARK-17367) Cannot define value classes in REPL

2016-09-02 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15457862#comment-15457862
 ] 

Jakob Odersky commented on SPARK-17367:
---

You're absolutely correct; it is a Scala issue. I raised it here as well 
though, since the {{-Yrepl-class-based}} option was originally created for 
Spark (the standard object-wrapping behaviour had issues with the 
ClosureCleaner and serialization, IIRC) and was contributed back to Scala.

Should I close the issue?

> Cannot define value classes in REPL
> ---
>
> Key: SPARK-17367
> URL: https://issues.apache.org/jira/browse/SPARK-17367
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Reporter: Jakob Odersky
>
> It is currently not possible to define a class extending `AnyVal` in the 
> REPL. The underlying reason is the {{-Yrepl-class-based}} option used by 
> Spark Shell.
> The report here is more of an FYI for anyone stumbling upon the problem, see 
> the upstream issue [https://issues.scala-lang.org/browse/SI-9910] for any 
> progress.
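
A minimal reproduction sketch (the class name is arbitrary): under the 
class-based wrapping the definition becomes a nested local class, which value 
classes do not allow, so spark-shell rejects it with an error along the lines 
of "value class may not be a local class", while the same line compiles fine in 
a regular Scala source file.

{code}
// Typed into spark-shell (which uses -Yrepl-class-based): rejected.
// The same definition in a plain .scala file compiles without problems.
class Meters(val value: Double) extends AnyVal
{code}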






[jira] [Comment Edited] (SPARK-17368) Scala value classes create encoder problems and break at runtime

2016-09-01 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15456966#comment-15456966
 ] 

Jakob Odersky edited comment on SPARK-17368 at 9/1/16 11:48 PM:


FYI, the issue also occurs for top-level value classes (i.e. {{FeatureId}} 
defined outside of {{object BreakSpark}}).

Please also be aware that the given example will *not compile* in the Spark 
shell. See the related issue https://issues.apache.org/jira/browse/SPARK-17367 
regarding the definition of value classes in the REPL.


was (Author: jodersky):
FYI the issue also occurs for top-level value classes (i.e. {{FeatureId}} 
defined outside of {{object BreakSpark}})

> Scala value classes create encoder problems and break at runtime
> 
>
> Key: SPARK-17368
> URL: https://issues.apache.org/jira/browse/SPARK-17368
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 1.6.2, 2.0.0
> Environment: JDK 8 on MacOS
> Scala 2.11.8
> Spark 2.0.0
>Reporter: Aris Vlasakakis
>
> Using Scala value classes as the inner type for Datasets breaks in Spark 2.0 
> and 1.6.X.
> This simple Spark 2 application demonstrates that the code will compile, but 
> will break at runtime with the error. The value class is of course 
> *FeatureId*, as it extends AnyVal.
> {noformat}
> Exception in thread "main" java.lang.RuntimeException: Error while encoding: 
> java.lang.RuntimeException: Couldn't find v on int
> assertnotnull(input[0, int, true], top level non-flat input object).v AS v#0
> +- assertnotnull(input[0, int, true], top level non-flat input object).v
>+- assertnotnull(input[0, int, true], top level non-flat input object)
>   +- input[0, int, true]".
> at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:279)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421)
> {noformat}
> Test code for Spark 2.0.0:
> {noformat}
> import org.apache.spark.sql.{Dataset, SparkSession}
> object BreakSpark {
>   case class FeatureId(v: Int) extends AnyVal
>   def main(args: Array[String]): Unit = {
> val seq = Seq(FeatureId(1), FeatureId(2), FeatureId(3))
> val spark = SparkSession.builder.getOrCreate()
> import spark.implicits._
> spark.sparkContext.setLogLevel("warn")
> val ds: Dataset[FeatureId] = spark.createDataset(seq)
> println(s"BREAK HERE: ${ds.count}")
>   }
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17368) Scala value classes create encoder problems and break at runtime

2016-09-01 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15456966#comment-15456966
 ] 

Jakob Odersky commented on SPARK-17368:
---

FYI the issue also occurs for top-level value classes (i.e. {{FeatureId}} 
defined outside of {{object BreakSpark}})

> Scala value classes create encoder problems and break at runtime
> 
>
> Key: SPARK-17368
> URL: https://issues.apache.org/jira/browse/SPARK-17368
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 1.6.2, 2.0.0
> Environment: JDK 8 on MacOS
> Scala 2.11.8
> Spark 2.0.0
>Reporter: Aris Vlasakakis
>
> Using Scala value classes as the inner type for Datasets breaks in Spark 2.0 
> and 1.6.X.
> This simple Spark 2 application demonstrates that the code will compile, but 
> will break at runtime with the error. The value class is of course 
> *FeatureId*, as it extends AnyVal.
> {noformat}
> Exception in thread "main" java.lang.RuntimeException: Error while encoding: 
> java.lang.RuntimeException: Couldn't find v on int
> assertnotnull(input[0, int, true], top level non-flat input object).v AS v#0
> +- assertnotnull(input[0, int, true], top level non-flat input object).v
>+- assertnotnull(input[0, int, true], top level non-flat input object)
>   +- input[0, int, true]".
> at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:279)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421)
> at 
> org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421)
> {noformat}
> Test code for Spark 2.0.0:
> {noformat}
> import org.apache.spark.sql.{Dataset, SparkSession}
> object BreakSpark {
>   case class FeatureId(v: Int) extends AnyVal
>   def main(args: Array[String]): Unit = {
> val seq = Seq(FeatureId(1), FeatureId(2), FeatureId(3))
> val spark = SparkSession.builder.getOrCreate()
> import spark.implicits._
> spark.sparkContext.setLogLevel("warn")
> val ds: Dataset[FeatureId] = spark.createDataset(seq)
> println(s"BREAK HERE: ${ds.count}")
>   }
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-17367) Cannot define value classes in REPL

2016-09-01 Thread Jakob Odersky (JIRA)
Jakob Odersky created SPARK-17367:
-

 Summary: Cannot define value classes in REPL
 Key: SPARK-17367
 URL: https://issues.apache.org/jira/browse/SPARK-17367
 Project: Spark
  Issue Type: Bug
  Components: Spark Shell
Reporter: Jakob Odersky


It is currently not possible to define a class extending `AnyVal` in the REPL. 
The underlying reason is the {{-Yrepl-class-based}} option used by Spark Shell.

The report here is more of an FYI for anyone stumbling upon the problem; see 
the upstream issue [https://issues.scala-lang.org/browse/SI-9910] for any 
progress.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-17103) Can not define class variable in repl

2016-08-17 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15424228#comment-15424228
 ] 

Jakob Odersky edited comment on SPARK-17103 at 8/17/16 5:28 PM:


That's true, the spark repl is basically just a thin wrapper around the scala 
repl, with custom initialization and settings. One of the settings, 
"-Yrepl-class-based", has caused issues previously and seems to be the culprit 
here again (I can reproduce the issue by running a normal scala repl with said 
setting enabled).

I'll check this out tomorrow, but my first intuition is that it's an upstream 
bug.
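For reference, the report's snippet can be replayed outside of Spark with only 
the flag in question (a sketch, assuming a Scala 2.11 installation):
{code}
// Start a plain Scala REPL with the option that spark-shell sets:
//   scala -Yrepl-class-based
// and paste the snippet from the report:
import java.io.File
class Test { val f = new File(".") }
// In the class-based REPL this is reported to fail with "not found: type File",
// while the default object-based REPL accepts it.
{code}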


was (Author: jodersky):
That's true, the spark repl is basically just a thin wrapper around the scala 
repl, with custom initialization and settings. One of the settings, 
"-Yrepl-class-based", has caused issues previously and seems to be the culprit 
here again (I can reproduce the issue by running a normal scala repl with 
{{-Yrepl-class-based}} enabled).

I'll check this out tomorrow, but my first intuition is that it's an upstream 
bug.

> Can not define class variable in repl
> -
>
> Key: SPARK-17103
> URL: https://issues.apache.org/jira/browse/SPARK-17103
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 2.0.0
>Reporter: Jeff Zhang
>
> I can not execute the following code in spark 2.0 repl, but it succeeds in 
> scala 2.11 repl
> spark 2.0 repl
> {code}
> scala> import java.io.File
> import java.io.File
> scala> class Test {val f=new File(".")}
> :11: error: not found: type File
>class Test {val f=new File(".")}
> {code}
> scala 2.11 repl
> {code}
> scala> import java.io.File
> import java.io.File
> scala> class Test { val f=new File(".")}
> defined class Test
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-17103) Can not define class variable in repl

2016-08-17 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15424228#comment-15424228
 ] 

Jakob Odersky edited comment on SPARK-17103 at 8/17/16 9:53 AM:


That's true, the spark repl is basically just a thin wrapper around the scala 
repl, with custom initialization and settings. One of the settings, 
"-Yrepl-class-based", has caused issues previously and seems to be the culprit 
here again (I can reproduce the issue by running a normal scala repl with 
{{-Yrepl-class-based}} enabled).

I'll check this out tomorrow, but my first intuition is that it's an upstream 
bug.


was (Author: jodersky):
That's true, the spark repl is basically just a thin wrapper around the scala 
repl, with custom initialization and settings. One of the settings, 
"-Yrepl-class-based", has caused issues previously and seems to be the culprit 
here again (I can reproduce the issue by running a normal scala repl with 
{{-Yrepl-class-based}} enabled).

> Can not define class variable in repl
> -
>
> Key: SPARK-17103
> URL: https://issues.apache.org/jira/browse/SPARK-17103
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 2.0.0
>Reporter: Jeff Zhang
>
> I can not execute the following code in spark 2.0 repl, but it succeeds in 
> scala 2.11 repl
> spark 2.0 repl
> {code}
> scala> import java.io.File
> import java.io.File
> scala> class Test {val f=new File(".")}
> :11: error: not found: type File
>class Test {val f=new File(".")}
> {code}
> scala 2.11 repl
> {code}
> scala> import java.io.File
> import java.io.File
> scala> class Test { val f=new File(".")}
> defined class Test
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-17103) Can not define class variable in repl

2016-08-17 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15424228#comment-15424228
 ] 

Jakob Odersky edited comment on SPARK-17103 at 8/17/16 9:52 AM:


That's true, the spark repl is basically just a thin wrapper around the scala 
repl, with custom initialization and settings. One of the settings, 
"-Yrepl-class-based", has caused issues previously and seems to be the culprit 
here again (I can reproduce the issue by running a normal scala repl with 
{{-Yrepl-class-based}} enabled).


was (Author: jodersky):
That's true, the spark repl is basically just a thin wrapper around the scala 
repl, with custom initialization and settings. One of the settings, 
"-Yrepl-class-based", has caused issues previously and seems to be the culprit 
here again (I can reproduce the issue by running a normal scala repl with 
{{-Yrepl-class-based}} enabled).

There is one option that is set by spark and has caused previous issues

> Can not define class variable in repl
> -
>
> Key: SPARK-17103
> URL: https://issues.apache.org/jira/browse/SPARK-17103
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 2.0.0
>Reporter: Jeff Zhang
>
> I can not execute the following code in spark 2.0 repl, but it succeeds in 
> scala 2.11 repl
> spark 2.0 repl
> {code}
> scala> import java.io.File
> import java.io.File
> scala> class Test {val f=new File(".")}
> :11: error: not found: type File
>class Test {val f=new File(".")}
> {code}
> scala 2.11 repl
> {code}
> scala> import java.io.File
> import java.io.File
> scala> class Test { val f=new File(".")}
> defined class Test
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17103) Can not define class variable in repl

2016-08-17 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15424228#comment-15424228
 ] 

Jakob Odersky commented on SPARK-17103:
---

That's true, the spark repl is basically just a thin wrapper around the scala 
repl, with custom initialization and settings. One of the settings, 
"-Yrepl-class-based", has caused issues previously and seems to be the culprit 
here again (I can reproduce the issue by running a normal scala repl with 
{{-Yrepl-class-based}} enabled).

There is one option that is set by spark and has caused previous issues

> Can not define class variable in repl
> -
>
> Key: SPARK-17103
> URL: https://issues.apache.org/jira/browse/SPARK-17103
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 2.0.0
>Reporter: Jeff Zhang
>
> I can not execute the following code in spark 2.0 repl, but it succeeds in 
> scala 2.11 repl
> spark 2.0 repl
> {code}
> scala> import java.io.File
> import java.io.File
> scala> class Test {val f=new File(".")}
> :11: error: not found: type File
>class Test {val f=new File(".")}
> {code}
> scala 2.11 repl
> {code}
> scala> import java.io.File
> import java.io.File
> scala> class Test { val f=new File(".")}
> defined class Test
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17095) Latex and Scala doc do not play nicely

2016-08-16 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15423602#comment-15423602
 ] 

Jakob Odersky commented on SPARK-17095:
---

Since this bug also occurs when there are no opening braces (a }}} anywhere in 
the doc is sufficient), I think this is an issue with scaladoc itself. I would 
recommend creating a bug report on the Scala tracker 
https://issues.scala-lang.org/secure/Dashboard.jspa.
Ideally, code blocks could be delimited by an arbitrary number of opening 
symbols matched by the same number of closing symbols (e.g. you could use 
{{{{ and }}}} (4 braces) to delimit code that itself contains }}} (3 braces)).
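To make the clash concrete, here is a sketch of the kind of doc comment that 
triggers it (the method name is made up for illustration):
{code}
/**
 * Normal density constant, rendered by MathJax:
 *
 *   \( \frac{1}{\sqrt{2\pi\sigma^{2}}} \)
 *
 * The trailing "}}}" in the LaTeX above is what scaladoc interprets as the
 * closing delimiter of a triple-brace code block, garbling the rendered page.
 * The workaround mentioned in the description writes it as
 *   \( \frac{1}{\sqrt{2\pi\sigma^{2}}\,} \)
 * which inserts a small whitespace before the last brace.
 */
def normalizationConstant(sigma: Double): Double = ???
{code}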

> Latex and Scala doc do not play nicely
> --
>
> Key: SPARK-17095
> URL: https://issues.apache.org/jira/browse/SPARK-17095
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Seth Hendrickson
>Priority: Minor
>  Labels: starter
>
> In Latex, it is common to find "}}}" when closing several expressions at 
> once. [SPARK-16822|https://issues.apache.org/jira/browse/SPARK-16822] added 
> Mathjax to render Latex equations in scaladoc. However, when scala doc sees 
> "}}}" or "{{{" it treats it as a special character for code block. This 
> results in some very strange output.
> A poor workaround is to use "}}\,}" in latex which inserts a small 
> whitespace. This is not ideal, and we can hopefully find a better solution. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15014) Spark Shell could use Ammonite Shell

2016-05-24 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299168#comment-15299168
 ] 

Jakob Odersky commented on SPARK-15014:
---

You might still have some issues with classloaders; I didn't think of that at 
first.

> Spark Shell could use Ammonite Shell
> 
>
> Key: SPARK-15014
> URL: https://issues.apache.org/jira/browse/SPARK-15014
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Shell
>Affects Versions: 1.6.1
> Environment: All
>Reporter: John-Michael Reed
>Priority: Minor
>  Labels: shell, shell-script
>
> Lihaoyi has an enhanced Scala Shell called Ammonite. 
> https://github.com/lihaoyi/Ammonite
> Users of Ammonite shell have tried to use it with Apache Spark. 
> https://github.com/lihaoyi/Ammonite/issues/382
> Spark Shell does not work with Ammonite Shell, but I want it to because the 
> Ammonite REPL offers enhanced auto-complete, pretty printing, and other 
> features. See http://www.lihaoyi.com/Ammonite/#Ammonite-REPL



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15014) Spark Shell could use Ammonite Shell

2016-05-24 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299118#comment-15299118
 ] 

Jakob Odersky commented on SPARK-15014:
---

spark-shell is a very thin wrapper around the standard scala repl (with spark 
dependencies). It does some configuration and exposes a spark context and some 
imports; almost everything is implemented in these two files: 

- 
https://github.com/apache/spark/blob/master/repl/scala-2.11/src/main/scala/org/apache/spark/repl/Main.scala
- 
https://github.com/apache/spark/blob/master/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoop.scala.

I don't know much about ammonite, but as a workaround could you use spark as a 
standalone program in your shell? Just add the spark dependencies and create a 
spark context manually.
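A rough sketch of that workaround (Spark 1.6-era API; the master and app name 
are arbitrary), assuming the spark-core artifact is already on the shell's 
classpath:
{code}
import org.apache.spark.{SparkConf, SparkContext}

// Build the context yourself instead of relying on the one spark-shell exposes.
val conf = new SparkConf()
  .setMaster("local[*]")
  .setAppName("spark-from-a-plain-shell")
val sc = new SparkContext(conf)

// Quick sanity check that the context works.
println(sc.parallelize(1 to 10).sum())

sc.stop()
{code}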

> Spark Shell could use Ammonite Shell
> 
>
> Key: SPARK-15014
> URL: https://issues.apache.org/jira/browse/SPARK-15014
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Shell
>Affects Versions: 1.6.1
> Environment: All
>Reporter: John-Michael Reed
>Priority: Minor
>  Labels: shell, shell-script
>
> Lihaoyi has an enhanced Scala Shell called Ammonite. 
> https://github.com/lihaoyi/Ammonite
> Users of Ammonite shell have tried to use it with Apache Spark. 
> https://github.com/lihaoyi/Ammonite/issues/382
> Spark Shell does not work with Ammonite Shell, but I want it to because the 
> Ammonite REPL offers enhanced auto-complete, pretty printing, and other 
> features. See http://www.lihaoyi.com/Ammonite/#Ammonite-REPL



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13581) LibSVM throws MatchError

2016-05-18 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15289738#comment-15289738
 ] 

Jakob Odersky commented on SPARK-13581:
---

I can't reproduce it anymore either

> LibSVM throws MatchError
> 
>
> Key: SPARK-13581
> URL: https://issues.apache.org/jira/browse/SPARK-13581
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Jakob Odersky
>Assignee: Jeff Zhang
>Priority: Critical
>
> When running an action on a DataFrame obtained by reading from a libsvm file 
> a MatchError is thrown, however doing the same on a cached DataFrame works 
> fine.
> {code}
> val df = 
> sqlContext.read.format("libsvm").load("../data/mllib/sample_libsvm_data.txt") 
> //file is in spark repository
> df.select(df("features")).show() //MatchError
> df.cache()
> df.select(df("features")).show() //OK
> {code}
> The exception stack trace is the following:
> {code}
> scala.MatchError: 1.0 (of class java.lang.Double)
> [info]at 
> org.apache.spark.mllib.linalg.VectorUDT.serialize(Vectors.scala:207)
> [info]at 
> org.apache.spark.mllib.linalg.VectorUDT.serialize(Vectors.scala:192)
> [info]at 
> org.apache.spark.sql.catalyst.CatalystTypeConverters$UDTConverter.toCatalystImpl(CatalystTypeConverters.scala:142)
> [info]at 
> org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102)
> [info]at 
> org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:401)
> [info]at 
> org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$2.apply(ExistingRDD.scala:59)
> [info]at 
> org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$2.apply(ExistingRDD.scala:56)
> {code}
> This issue first appeared in commit {{1dac964c1}}, in PR 
> [#9595|https://github.com/apache/spark/pull/9595] fixing SPARK-11622.
> [~jeffzhang], do you have any insight of what could be going on?
> cc [~iyounus]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-13581) LibSVM throws MatchError

2016-05-18 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15289738#comment-15289738
 ] 

Jakob Odersky edited comment on SPARK-13581 at 5/18/16 8:26 PM:


I can't reproduce it anymore either. Should I close it as "fixed" or invalid?


was (Author: jodersky):
I can't reproduce it anymore either

> LibSVM throws MatchError
> 
>
> Key: SPARK-13581
> URL: https://issues.apache.org/jira/browse/SPARK-13581
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Jakob Odersky
>Assignee: Jeff Zhang
>Priority: Critical
>
> When running an action on a DataFrame obtained by reading from a libsvm file 
> a MatchError is thrown, however doing the same on a cached DataFrame works 
> fine.
> {code}
> val df = 
> sqlContext.read.format("libsvm").load("../data/mllib/sample_libsvm_data.txt") 
> //file is in spark repository
> df.select(df("features")).show() //MatchError
> df.cache()
> df.select(df("features")).show() //OK
> {code}
> The exception stack trace is the following:
> {code}
> scala.MatchError: 1.0 (of class java.lang.Double)
> [info]at 
> org.apache.spark.mllib.linalg.VectorUDT.serialize(Vectors.scala:207)
> [info]at 
> org.apache.spark.mllib.linalg.VectorUDT.serialize(Vectors.scala:192)
> [info]at 
> org.apache.spark.sql.catalyst.CatalystTypeConverters$UDTConverter.toCatalystImpl(CatalystTypeConverters.scala:142)
> [info]at 
> org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102)
> [info]at 
> org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:401)
> [info]at 
> org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$2.apply(ExistingRDD.scala:59)
> [info]at 
> org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$2.apply(ExistingRDD.scala:56)
> {code}
> This issue first appeared in commit {{1dac964c1}}, in PR 
> [#9595|https://github.com/apache/spark/pull/9595] fixing SPARK-11622.
> [~jeffzhang], do you have any insight of what could be going on?
> cc [~iyounus]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14519) Cross-publish Kafka for Scala 2.12.0-M4

2016-04-26 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259100#comment-15259100
 ] 

Jakob Odersky commented on SPARK-14519:
---

That sounds reasonable; however, should the parent JIRA then still be marked as 
a blocker for 2.0?

> Cross-publish Kafka for Scala 2.12.0-M4
> ---
>
> Key: SPARK-14519
> URL: https://issues.apache.org/jira/browse/SPARK-14519
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Project Infra
>Reporter: Josh Rosen
>
> In order to build the streaming Kafka connector, we need to publish Kafka for 
> Scala 2.12.0-M4. Someone should file an issue against the Kafka project and 
> work with their developers to figure out what will block their upgrade / 
> release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14146) Imported implicits can't be found in Spark REPL in some cases

2016-04-26 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259079#comment-15259079
 ] 

Jakob Odersky commented on SPARK-14146:
---

The reason this fails is that spark-shell sets the `-Yrepl-class-based` option 
of the scala REPL. I'm looking into this.

> Imported implicits can't be found in Spark REPL in some cases
> -
>
> Key: SPARK-14146
> URL: https://issues.apache.org/jira/browse/SPARK-14146
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 2.0.0
>Reporter: Wenchen Fan
>
> {code}
> class I(i: Int) {
>   def double: Int = i * 2
> }
> class Context {
>   implicit def toI(i: Int): I = new I(i)
> }
> val c = new Context
> import c._
> // OK
> 1.double
> // Fail
> class A; 1.double
> {code}
> The above code snippets can work in Scala REPL however.
> This will affect our Dataset functionality, for example:
> {code}
> class A; Seq(1 -> "a").toDS() // fail
> {code}
> or in paste mode:
> {code}
> :paste
> class A
> Seq(1 -> "a").toDS() // fail
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14519) Cross-publish Kafka for Scala 2.12.0-M4

2016-04-26 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258974#comment-15258974
 ] 

Jakob Odersky commented on SPARK-14519:
---

From a reply in the mailing list archive (14/4/2016):
{quote}
If no-one else beats me to it, I intend to start this conversation after
0.10.0.0 is released (probably 1 to 2 months away depending on how RCs go).
{quote}
What should we do? Scala 2.12 support is a blocker for Spark 2.0, which is 
planned to enter code freeze in a week.

> Cross-publish Kafka for Scala 2.12.0-M4
> ---
>
> Key: SPARK-14519
> URL: https://issues.apache.org/jira/browse/SPARK-14519
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Project Infra
>Reporter: Josh Rosen
>
> In order to build the streaming Kafka connector, we need to publish Kafka for 
> Scala 2.12.0-M4. Someone should file an issue against the Kafka project and 
> work with their developers to figure out what will block their upgrade / 
> release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14417) Cleanup Scala deprecation warnings once we drop 2.10.X

2016-04-26 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258883#comment-15258883
 ] 

Jakob Odersky commented on SPARK-14417:
---

I suggested that Arun add the JIRA ID to the title and close the issue. That 
way his work will stay readily available from here when Scala 2.10 support is 
dropped.

> Cleanup Scala deprecation warnings once we drop 2.10.X
> --
>
> Key: SPARK-14417
> URL: https://issues.apache.org/jira/browse/SPARK-14417
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: holdenk
>Priority: Minor
>
> While a previous issue addressed many of the deprecation warnings, since we 
> didn't want to introduce scala version specific code there are a number of 
> deprecation warnings we can't easily fix. Once we drop Scala 2.10 we should 
> go back and cleanup these remaining issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14511) Publish our forked genjavadoc for 2.12.0-M4 or stop using a forked version

2016-04-26 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258702#comment-15258702
 ] 

Jakob Odersky commented on SPARK-14511:
---

The release is out and the PR has been submitted.

> Publish our forked genjavadoc for 2.12.0-M4 or stop using a forked version
> --
>
> Key: SPARK-14511
> URL: https://issues.apache.org/jira/browse/SPARK-14511
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Project Infra
>Reporter: Josh Rosen
>
> Before we can move to 2.12, we need to publish our forked genjavadoc for 
> 2.12.0-M4 (or 2.12 final) or stop using a forked version of the plugin.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14511) Publish our forked genjavadoc for 2.12.0-M4 or stop using a forked version

2016-04-25 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257217#comment-15257217
 ] 

Jakob Odersky commented on SPARK-14511:
---

Update: an issue was discovered during release-testing upstream. I just 
submitted a fix for it, tested against Akka and Spark.
Javadoc in Spark emits a few error messages, however these were already present 
previously and do not affect the final, generated documentation.
I'll get back when the release is out

> Publish our forked genjavadoc for 2.12.0-M4 or stop using a forked version
> --
>
> Key: SPARK-14511
> URL: https://issues.apache.org/jira/browse/SPARK-14511
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Project Infra
>Reporter: Josh Rosen
>
> Before we can move to 2.12, we need to publish our forked genjavadoc for 
> 2.12.0-M4 (or 2.12 final) or stop using a forked version of the plugin.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10001) Allow Ctrl-C in spark-shell to kill running job

2016-04-20 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251304#comment-15251304
 ] 

Jakob Odersky commented on SPARK-10001:
---

FYI, I took up the issue (previous pr #8216)

> Allow Ctrl-C in spark-shell to kill running job
> ---
>
> Key: SPARK-10001
> URL: https://issues.apache.org/jira/browse/SPARK-10001
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Shell
>Affects Versions: 1.4.1
>Reporter: Cheolsoo Park
>Priority: Minor
>
> Hitting Ctrl-C in spark-sql (and other tools like presto) cancels any running 
> job and starts a new input line on the prompt. It would be nice if 
> spark-shell also can do that. Otherwise, in case a user submits a job, say he 
> made a mistake, and wants to cancel it, he needs to exit the shell and 
> re-login to continue his work. Re-login can be a pain especially in Spark on 
> yarn, since it takes a while to allocate AM container and initial executors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14511) Publish our forked genjavadoc for 2.12.0-M4 or stop using a forked version

2016-04-18 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246336#comment-15246336
 ] 

Jakob Odersky commented on SPARK-14511:
---

cf https://github.com/typesafehub/genjavadoc/issues/73
I can create a PR with the dependency updates once upstream releases

> Publish our forked genjavadoc for 2.12.0-M4 or stop using a forked version
> --
>
> Key: SPARK-14511
> URL: https://issues.apache.org/jira/browse/SPARK-14511
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Project Infra
>Reporter: Josh Rosen
>
> Before we can move to 2.12, we need to publish our forked genjavadoc for 
> 2.12.0-M4 (or 2.12 final) or stop using a forked version of the plugin.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7992) Hide private classes/objects in in generated Java API doc

2016-04-15 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242725#comment-15242725
 ] 

Jakob Odersky commented on SPARK-7992:
--

[~mengxr], The PR is finally in! Let's hope upstream makes a release soon.

> Hide private classes/objects in in generated Java API doc
> -
>
> Key: SPARK-7992
> URL: https://issues.apache.org/jira/browse/SPARK-7992
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Documentation
>Affects Versions: 1.4.0
>Reporter: Xiangrui Meng
>
> After SPARK-5610, we found that private classes/objects still show up in the 
> generated Java API doc, e.g., under `org.apache.spark.api.r` we can see
> {code}
> BaseRRDD
> PairwiseRRDD
> RRDD
> SpecialLengths
> StringRRDD
> {code}
> We should update genjavadoc to hide those private classes/methods. The best 
> approach is to find a good mapping from Scala private to Java, and merge it 
> into the main genjavadoc repo. A WIP PR is at 
> https://github.com/typesafehub/genjavadoc/pull/47.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7992) Hide private classes/objects in in generated Java API doc

2016-03-28 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15215315#comment-15215315
 ] 

Jakob Odersky commented on SPARK-7992:
--

[~mengxr], I just submitted [another 
PR|https://github.com/typesafehub/genjavadoc/pull/71] to the genjavadoc 
project. Once accepted, the original functionality should be straight-forward 
to merge.

> Hide private classes/objects in in generated Java API doc
> -
>
> Key: SPARK-7992
> URL: https://issues.apache.org/jira/browse/SPARK-7992
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Documentation
>Affects Versions: 1.4.0
>Reporter: Xiangrui Meng
>
> After SPARK-5610, we found that private classes/objects still show up in the 
> generated Java API doc, e.g., under `org.apache.spark.api.r` we can see
> {code}
> BaseRRDD
> PairwiseRRDD
> RRDD
> SpecialLengths
> StringRRDD
> {code}
> We should update genjavadoc to hide those private classes/methods. The best 
> approach is to find a good mapping from Scala private to Java, and merge it 
> into the main genjavadoc repo. A WIP PR is at 
> https://github.com/typesafehub/genjavadoc/pull/47.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-7992) Hide private classes/objects in in generated Java API doc

2016-03-23 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15209280#comment-15209280
 ] 

Jakob Odersky edited comment on SPARK-7992 at 3/23/16 10:16 PM:


Hey [~mengxr],
you caught me in a very busy time last week and I'm afraid to say that I 
completely forgot about this. I just took up the issue. Take a look at my 
comment on the PR thread https://github.com/typesafehub/genjavadoc/pull/47. 


was (Author: jodersky):
Hey Xiangrui,
you caught me in a very busy time last week and I'm afraid to say that I 
completely forgot about this. I just took up the issue. Take a look at my 
comment on the PR thread https://github.com/typesafehub/genjavadoc/pull/47. 

> Hide private classes/objects in in generated Java API doc
> -
>
> Key: SPARK-7992
> URL: https://issues.apache.org/jira/browse/SPARK-7992
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Documentation
>Affects Versions: 1.4.0
>Reporter: Xiangrui Meng
>
> After SPARK-5610, we found that private classes/objects still show up in the 
> generated Java API doc, e.g., under `org.apache.spark.api.r` we can see
> {code}
> BaseRRDD
> PairwiseRRDD
> RRDD
> SpecialLengths
> StringRRDD
> {code}
> We should update genjavadoc to hide those private classes/methods. The best 
> approach is to find a good mapping from Scala private to Java, and merge it 
> into the main genjavadoc repo. A WIP PR is at 
> https://github.com/typesafehub/genjavadoc/pull/47.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7992) Hide private classes/objects in in generated Java API doc

2016-03-23 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15209280#comment-15209280
 ] 

Jakob Odersky commented on SPARK-7992:
--

Hey Xiangrui,
you caught me in a very busy time last week and I'm afraid to say that I 
completely forgot about this. I just took up the issue. Take a look at my 
comment on the PR thread https://github.com/typesafehub/genjavadoc/pull/47. 

> Hide private classes/objects in in generated Java API doc
> -
>
> Key: SPARK-7992
> URL: https://issues.apache.org/jira/browse/SPARK-7992
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Documentation
>Affects Versions: 1.4.0
>Reporter: Xiangrui Meng
>
> After SPARK-5610, we found that private classes/objects still show up in the 
> generated Java API doc, e.g., under `org.apache.spark.api.r` we can see
> {code}
> BaseRRDD
> PairwiseRRDD
> RRDD
> SpecialLengths
> StringRRDD
> {code}
> We should update genjavadoc to hide those private classes/methods. The best 
> approach is to find a good mapping from Scala private to Java, and merge it 
> into the main genjavadoc repo. A WIP PR is at 
> https://github.com/typesafehub/genjavadoc/pull/47.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7992) Hide private classes/objects in in generated Java API doc

2016-03-19 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197877#comment-15197877
 ] 

Jakob Odersky commented on SPARK-7992:
--

I'll check it out

> Hide private classes/objects in in generated Java API doc
> -
>
> Key: SPARK-7992
> URL: https://issues.apache.org/jira/browse/SPARK-7992
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Documentation
>Affects Versions: 1.4.0
>Reporter: Xiangrui Meng
>
> After SPARK-5610, we found that private classes/objects still show up in the 
> generated Java API doc, e.g., under `org.apache.spark.api.r` we can see
> {code}
> BaseRRDD
> PairwiseRRDD
> RRDD
> SpecialLengths
> StringRRDD
> {code}
> We should update genjavadoc to hide those private classes/methods. The best 
> approach is to find a good mapping from Scala private to Java, and merge it 
> into the main genjavadoc repo. A WIP PR is at 
> https://github.com/typesafehub/genjavadoc/pull/47.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13929) Use Scala reflection for UDFs

2016-03-16 Thread Jakob Odersky (JIRA)
Jakob Odersky created SPARK-13929:
-

 Summary: Use Scala reflection for UDFs
 Key: SPARK-13929
 URL: https://issues.apache.org/jira/browse/SPARK-13929
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Jakob Odersky
Priority: Minor


{{ScalaReflection}} uses native Java reflection for User Defined Types which 
would fail if such types are not plain Scala classes that map 1:1 to Java.

Consider the following extract (from here 
https://github.com/apache/spark/blob/92024797a4fad594b5314f3f3be5c6be2434de8a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L376
 ):
{code}
case t if Utils.classIsLoadable(className) &&
  Utils.classForName(className).isAnnotationPresent(classOf[SQLUserDefinedType]) =>
  val udt = Utils.classForName(className)
    .getAnnotation(classOf[SQLUserDefinedType]).udt().newInstance()
  // ...
{code}

If {{t}}'s runtime class is actually synthetic (something that doesn't exist in 
Java and hence uses a dollar sign internally), such as nested classes or 
package objects, the above code will fail.

Currently there are no known use cases of synthetic user-defined types (hence 
the minor priority); however, it would be best practice to remove plain Java 
reflection and rely on Scala reflection instead.
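As a small illustration of the difference between the two reflection APIs (the 
names are made up for the example):
{code}
object Outer {
  case class Inner(x: Int)
}

// Java reflection reports the synthetic, dollar-sign name:
println(classOf[Outer.Inner].getName)             // e.g. "...Outer$Inner"

// Scala reflection reports the logical name instead:
import scala.reflect.runtime.universe._
println(typeOf[Outer.Inner].typeSymbol.fullName)  // e.g. "...Outer.Inner"
{code}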



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13118) Support for classes defined in package objects

2016-03-16 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15196928#comment-15196928
 ] 

Jakob Odersky commented on SPARK-13118:
---

Update: there actually was an issue with inner classes (or package objects, or 
any other synthetic class containing a dollar sign); however, it only occurs 
when the type is wrapped in Option.
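A sketch of the shape that triggers it (package and class names are made up; a 
{{spark}} session with its implicits is assumed to be in scope):
{code}
// In a compiled source file:
package object po {
  case class Item(x: Int)
}

// In a Dataset (import spark.implicits._ assumed):
val ok  = Seq(po.Item(1)).toDS()           // plain case class from a package object: works
val bad = Seq(Option(po.Item(1))).toDS()   // Option-wrapped: reported to hit the lookup failure
{code}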

> Support for classes defined in package objects
> --
>
> Key: SPARK-13118
> URL: https://issues.apache.org/jira/browse/SPARK-13118
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Michael Armbrust
>
> When you define a class inside of a package object, the name ends up being 
> something like {{org.mycompany.project.package$MyClass}}.  However, when 
> reflect on this we try and load {{org.mycompany.project.MyClass}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13118) Support for classes defined in package objects

2016-03-15 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15196463#comment-15196463
 ] 

Jakob Odersky commented on SPARK-13118:
---

Should I remove the JIRA ID from my existing PR?

> Support for classes defined in package objects
> --
>
> Key: SPARK-13118
> URL: https://issues.apache.org/jira/browse/SPARK-13118
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Michael Armbrust
>
> When you define a class inside of a package object, the name ends up being 
> something like {{org.mycompany.project.package$MyClass}}.  However, when 
> reflect on this we try and load {{org.mycompany.project.MyClass}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13118) Support for classes defined in package objects

2016-03-14 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194465#comment-15194465
 ] 

Jakob Odersky commented on SPARK-13118:
---

Sure, I'll submit a PR with the test

> Support for classes defined in package objects
> --
>
> Key: SPARK-13118
> URL: https://issues.apache.org/jira/browse/SPARK-13118
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Michael Armbrust
>
> When you define a class inside of a package object, the name ends up being 
> something like {{org.mycompany.project.package$MyClass}}.  However, when 
> reflect on this we try and load {{org.mycompany.project.MyClass}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13118) Support for classes defined in package objects

2016-03-14 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194163#comment-15194163
 ] 

Jakob Odersky commented on SPARK-13118:
---

[~marmbrus], what's the issue at hand? Creating a simple test:
{code}
package object packageobject {
  case class Container(x: Int)
}
test("Package objects") {
import packageobject._
val ds = Seq(Container(1)).toDS()
checkDataset(ds, Container(1))
}
{code}
works without an issue. I might be testing something completely irrelevant but 
I can't quite make out the issue from the description.

> Support for classes defined in package objects
> --
>
> Key: SPARK-13118
> URL: https://issues.apache.org/jira/browse/SPARK-13118
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Michael Armbrust
>
> When you define a class inside of a package object, the name ends up being 
> something like {{org.mycompany.project.package$MyClass}}.  However, when 
> reflect on this we try and load {{org.mycompany.project.MyClass}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13118) Support for classes defined in package objects

2016-03-14 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193968#comment-15193968
 ] 

Jakob Odersky commented on SPARK-13118:
---

If I recall correctly, I couldn't reproduce the issue. I'll have another shot 
at it though

> Support for classes defined in package objects
> --
>
> Key: SPARK-13118
> URL: https://issues.apache.org/jira/browse/SPARK-13118
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Michael Armbrust
>
> When you define a class inside of a package object, the name ends up being 
> something like {{org.mycompany.project.package$MyClass}}.  However, when 
> reflect on this we try and load {{org.mycompany.project.MyClass}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13581) LibSVM throws MatchError

2016-02-29 Thread Jakob Odersky (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Odersky updated SPARK-13581:
--
Description: 
When running an action on a DataFrame obtained by reading from a libsvm file a 
MatchError is thrown, however doing the same on a cached DataFrame works fine.
{code}
val df = 
sqlContext.read.format("libsvm").load("../data/mllib/sample_libsvm_data.txt") 
//file is in spark repository

df.select(df("features")).show() //MatchError

df.cache()
df.select(df("features")).show() //OK
{code}

The exception stack trace is the following:
{code}
scala.MatchError: 1.0 (of class java.lang.Double)
[info]  at org.apache.spark.mllib.linalg.VectorUDT.serialize(Vectors.scala:207)
[info]  at org.apache.spark.mllib.linalg.VectorUDT.serialize(Vectors.scala:192)
[info]  at 
org.apache.spark.sql.catalyst.CatalystTypeConverters$UDTConverter.toCatalystImpl(CatalystTypeConverters.scala:142)
[info]  at 
org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102)
[info]  at 
org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:401)
[info]  at 
org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$2.apply(ExistingRDD.scala:59)
[info]  at 
org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$2.apply(ExistingRDD.scala:56)
{code}

This issue first appeared in commit {{1dac964c1}}, in PR 
[#9595|https://github.com/apache/spark/pull/9595] fixing SPARK-11622.

[~jeffzhang], do you have any insight of what could be going on?

cc [~iyounus]

  was:
When running an action on a DataFrame obtained by reading from a libsvm file a 
MatchError is thrown, however doing the same on a cached DataFrame works fine.
{code}
val df = 
sqlContext.read.format("libsvm").load("../data/mllib/sample_libsvm_data.txt") 
//file is

df.select(df("features")).show() //MatchError

df.cache()
df.select(df("features")).show() //OK
{code}

The exception stack trace is the following:
{code}
scala.MatchError: 1.0 (of class java.lang.Double)
[info]  at org.apache.spark.mllib.linalg.VectorUDT.serialize(Vectors.scala:207)
[info]  at org.apache.spark.mllib.linalg.VectorUDT.serialize(Vectors.scala:192)
[info]  at 
org.apache.spark.sql.catalyst.CatalystTypeConverters$UDTConverter.toCatalystImpl(CatalystTypeConverters.scala:142)
[info]  at 
org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102)
[info]  at 
org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:401)
[info]  at 
org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$2.apply(ExistingRDD.scala:59)
[info]  at 
org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$2.apply(ExistingRDD.scala:56)
{code}

This issue first appeared in commit {{1dac964c1}}, in PR 
[#9595|https://github.com/apache/spark/pull/9595] fixing SPARK-11622.

[~jeffzhang], do you have any insight of what could be going on?

cc [~iyounus]


> LibSVM throws MatchError
> 
>
> Key: SPARK-13581
> URL: https://issues.apache.org/jira/browse/SPARK-13581
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Jakob Odersky
>Assignee: Jeff Zhang
>Priority: Minor
>
> When running an action on a DataFrame obtained by reading from a libsvm file 
> a MatchError is thrown, however doing the same on a cached DataFrame works 
> fine.
> {code}
> val df = 
> sqlContext.read.format("libsvm").load("../data/mllib/sample_libsvm_data.txt") 
> //file is in spark repository
> df.select(df("features")).show() //MatchError
> df.cache()
> df.select(df("features")).show() //OK
> {code}
> The exception stack trace is the following:
> {code}
> scala.MatchError: 1.0 (of class java.lang.Double)
> [info]at 
> org.apache.spark.mllib.linalg.VectorUDT.serialize(Vectors.scala:207)
> [info]at 
> org.apache.spark.mllib.linalg.VectorUDT.serialize(Vectors.scala:192)
> [info]at 
> org.apache.spark.sql.catalyst.CatalystTypeConverters$UDTConverter.toCatalystImpl(CatalystTypeConverters.scala:142)
> [info]at 
> org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102)
> [info]at 
> org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:401)
> [info]at 
> org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$2.apply(ExistingRDD.scala:59)
> [info]at 
> 

[jira] [Commented] (SPARK-13581) LibSVM throws MatchError

2016-02-29 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173058#comment-15173058
 ] 

Jakob Odersky commented on SPARK-13581:
---

It's in the Spark repository: "data/mllib/sample_libsvm_data.txt"

> LibSVM throws MatchError
> 
>
> Key: SPARK-13581
> URL: https://issues.apache.org/jira/browse/SPARK-13581
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Jakob Odersky
>Assignee: Jeff Zhang
>Priority: Minor
>
> When running an action on a DataFrame obtained by reading from a libsvm file 
> a MatchError is thrown, however doing the same on a cached DataFrame works 
> fine.
> {code}
> val df = 
> sqlContext.read.format("libsvm").load("../data/mllib/sample_libsvm_data.txt") 
> //file is
> df.select(df("features")).show() //MatchError
> df.cache()
> df.select(df("features")).show() //OK
> {code}
> The exception stack trace is the following:
> {code}
> scala.MatchError: 1.0 (of class java.lang.Double)
> [info]at 
> org.apache.spark.mllib.linalg.VectorUDT.serialize(Vectors.scala:207)
> [info]at 
> org.apache.spark.mllib.linalg.VectorUDT.serialize(Vectors.scala:192)
> [info]at 
> org.apache.spark.sql.catalyst.CatalystTypeConverters$UDTConverter.toCatalystImpl(CatalystTypeConverters.scala:142)
> [info]at 
> org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102)
> [info]at 
> org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:401)
> [info]at 
> org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$2.apply(ExistingRDD.scala:59)
> [info]at 
> org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$2.apply(ExistingRDD.scala:56)
> {code}
> This issue first appeared in commit {{1dac964c1}}, in PR 
> [#9595|https://github.com/apache/spark/pull/9595] fixing SPARK-11622.
> [~jeffzhang], do you have any insight of what could be going on?
> cc [~iyounus]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13581) LibSVM throws MatchError

2016-02-29 Thread Jakob Odersky (JIRA)
Jakob Odersky created SPARK-13581:
-

 Summary: LibSVM throws MatchError
 Key: SPARK-13581
 URL: https://issues.apache.org/jira/browse/SPARK-13581
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Jakob Odersky
Priority: Minor


When running an action on a DataFrame obtained by reading from a libsvm file a 
MatchError is thrown, however doing the same on a cached DataFrame works fine.
{code}
val df = 
sqlContext.read.format("libsvm").load("../data/mllib/sample_libsvm_data.txt") 
//file is

df.select(df("features")).show() //MatchError

df.cache()
df.select(df("features")).show() //OK
{code}

The exception stack trace is the following:
{code}
scala.MatchError: 1.0 (of class java.lang.Double)
[info]  at org.apache.spark.mllib.linalg.VectorUDT.serialize(Vectors.scala:207)
[info]  at org.apache.spark.mllib.linalg.VectorUDT.serialize(Vectors.scala:192)
[info]  at 
org.apache.spark.sql.catalyst.CatalystTypeConverters$UDTConverter.toCatalystImpl(CatalystTypeConverters.scala:142)
[info]  at 
org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102)
[info]  at 
org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:401)
[info]  at 
org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$2.apply(ExistingRDD.scala:59)
[info]  at 
org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$2.apply(ExistingRDD.scala:56)
{code}

This issue first appeared in commit {{1dac964c1}}, in PR 
[#9595|https://github.com/apache/spark/pull/9595] fixing SPARK-11622.

[~jeffzhang], do you have any insight of what could be going on?

cc [~iyounus]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-7768) Make user-defined type (UDT) API public

2016-02-26 Thread Jakob Odersky (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Odersky reopened SPARK-7768:
--

> Make user-defined type (UDT) API public
> ---
>
> Key: SPARK-7768
> URL: https://issues.apache.org/jira/browse/SPARK-7768
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Xiangrui Meng
>Priority: Critical
>
> As the demand for UDTs increases beyond sparse/dense vectors in MLlib, it 
> would be nice to make the UDT API public in 1.5.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-7768) Make user-defined type (UDT) API public

2016-02-25 Thread Jakob Odersky (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Odersky closed SPARK-7768.

Resolution: Fixed

> Make user-defined type (UDT) API public
> ---
>
> Key: SPARK-7768
> URL: https://issues.apache.org/jira/browse/SPARK-7768
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Xiangrui Meng
>Priority: Critical
>
> As the demand for UDTs increases beyond sparse/dense vectors in MLlib, it 
> would be nice to make the UDT API public in 1.5.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7768) Make user-defined type (UDT) API public

2016-02-25 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168155#comment-15168155
 ] 

Jakob Odersky commented on SPARK-7768:
--

[~marmbrus]
UDTs are public now (in Scala at least), can this JIRA be closed?

> Make user-defined type (UDT) API public
> ---
>
> Key: SPARK-7768
> URL: https://issues.apache.org/jira/browse/SPARK-7768
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Xiangrui Meng
>Priority: Critical
>
> As the demand for UDTs increases beyond sparse/dense vectors in MLlib, it 
> would be nice to make the UDT API public in 1.5.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-12878) Dataframe fails with nested User Defined Types

2016-02-25 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15160119#comment-15160119
 ] 

Jakob Odersky edited comment on SPARK-12878 at 2/25/16 10:22 PM:
-

I just tried your example and get a slightly different exception:

{{java.lang.ClassCastException: B cannot be cast to 
org.apache.spark.sql.catalyst.InternalRow}} (B as opposed to BoxedUnit)

However I actually don't understand why this worked in 1.5.2 in the first 
place. Consider the following extract from your snippet:
{code}
case A(list) =>
  val row = new GenericMutableRow(1)
  row.update(0, new GenericArrayData(list.map(_.asInstanceOf[Any]).toArray))
  row
{code}
although {{list}} is a collection of B elements in this case, I don't think that 
the individual Bs are serialized according to the definition in BUDT.
I would assume you are solely responsible for the serialization and would have 
to call something like {{list.map(BUDT.serialize(_))}} to convert any child 
elements to an "SQL Datum" (I'm not sure exactly what that is, but it's the term 
the docs use: 
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.types.UserDefinedType).

Maybe someone with more knowledge on the topic ([~marmbrus] [~cloud_fan]) can 
clarify what's going on?
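
For illustration, here is a rough sketch of what I mean, assuming the {{A}}, 
{{B}} and {{BUDT}} definitions (and their imports) from the issue description 
quoted below; this is untested and only meant to show where the child 
serialization would happen:
{code}
// Hypothetical variant of AUDT: the only change is that every child B is
// serialized through BUDT instead of being stored as a plain JVM object.
class AUDT extends UserDefinedType[A] {
  override def sqlType: DataType = StructType(Seq(StructField("list",
    ArrayType(BUDT, containsNull = false), nullable = true)))

  override def userClass: Class[A] = classOf[A]

  override def serialize(obj: Any): Any = obj match {
    case A(list) =>
      val row = new GenericMutableRow(1)
      // convert each child element with its own UDT before wrapping the array
      row.update(0, new GenericArrayData(list.map(b => BUDT.serialize(b)).toArray))
      row
  }

  override def deserialize(datum: Any): A = datum match {
    case row: InternalRow =>
      // deserialize each child back through BUDT
      new A(row.getArray(0).toArray[AnyRef](BUDT).map(BUDT.deserialize).toSeq)
  }
}
{code}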


was (Author: jodersky):
I just tried your example and get a slightly different exception:

{{java.lang.ClassCastException: B cannot be cast to 
org.apache.spark.sql.catalyst.InternalRow}} (B as opposed to BoxedUnit)

However I actually don't understand why this worked in 1.5.2 in the first 
place. Consider the following extract from your snippet:
{code}
case A(list) =>
  val row = new GenericMutableRow(1)
  row.update(0, new GenericArrayData(list.map(_.asInstanceOf[Any]).toArray))
  row
{code}
although `list` is a collection of elements B in this case, I don't think that 
the individual Bs are serialized according to the definition in BUDT.
I would assume you are solely responsible for the serialization and would have 
to call something like {{list.map(BUDT.serialize(_))}} to convert any child 
elements to an "SQL Datum" (not sure what that is but the docs say it, 
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.types.UserDefinedType)

Maybe someone with more knowledge on the topic can clarify what's going on?

> Dataframe fails with nested User Defined Types
> --
>
> Key: SPARK-12878
> URL: https://issues.apache.org/jira/browse/SPARK-12878
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Joao
>Priority: Blocker
>
> Spark 1.6.0 crashes when using nested User Defined Types in a Dataframe. 
> In version 1.5.2 the code below worked just fine:
> import org.apache.spark.{SparkConf, SparkContext}
> import org.apache.spark.sql.catalyst.InternalRow
> import org.apache.spark.sql.catalyst.expressions.GenericMutableRow
> import org.apache.spark.sql.types._
> @SQLUserDefinedType(udt = classOf[AUDT])
> case class A(list:Seq[B])
> class AUDT extends UserDefinedType[A] {
>   override def sqlType: DataType = StructType(Seq(StructField("list", 
> ArrayType(BUDT, containsNull = false), nullable = true)))
>   override def userClass: Class[A] = classOf[A]
>   override def serialize(obj: Any): Any = obj match {
> case A(list) =>
>   val row = new GenericMutableRow(1)
>   row.update(0, new 
> GenericArrayData(list.map(_.asInstanceOf[Any]).toArray))
>   row
>   }
>   override def deserialize(datum: Any): A = {
> datum match {
>   case row: InternalRow => new A(row.getArray(0).toArray(BUDT).toSeq)
> }
>   }
> }
> object AUDT extends AUDT
> @SQLUserDefinedType(udt = classOf[BUDT])
> case class B(text:Int)
> class BUDT extends UserDefinedType[B] {
>   override def sqlType: DataType = StructType(Seq(StructField("num", 
> IntegerType, nullable = false)))
>   override def userClass: Class[B] = classOf[B]
>   override def serialize(obj: Any): Any = obj match {
> case B(text) =>
>   val row = new GenericMutableRow(1)
>   row.setInt(0, text)
>   row
>   }
>   override def deserialize(datum: Any): B = {
> datum match {  case row: InternalRow => new B(row.getInt(0))  }
>   }
> }
> object BUDT extends BUDT
> object Test {
>   def main(args:Array[String]) = {
> val col = Seq(new A(Seq(new B(1), new B(2))),
>   new A(Seq(new B(3), new B(4))))
> val sc = new SparkContext(new 
> SparkConf().setMaster("local[1]").setAppName("TestSpark"))
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> import sqlContext.implicits._
> val df = sc.parallelize(1 to 2 zip col).toDF("id","b")
> df.select("b").show()
> df.collect().foreach(println)
>   }
> }
> In the new version (1.6.0) I needed to include the following import:
> import 

[jira] [Commented] (SPARK-10712) JVM crashes with spark.sql.tungsten.enabled = true

2016-02-25 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167911#comment-15167911
 ] 

Jakob Odersky commented on SPARK-10712:
---

Any news on this? Is it still an issue?

> JVM crashes with spark.sql.tungsten.enabled = true
> --
>
> Key: SPARK-10712
> URL: https://issues.apache.org/jira/browse/SPARK-10712
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.0
> Environment: 1 node - Linux, 64GB ram, 8 core
>Reporter: Mauro Pirrone
>Priority: Critical
>
> When turning on tungsten, I get the following error when executing a 
> query/job with a few joins. When tungsten is turned off, the error does not 
> appear. Also note that tungsten works for me in other cases.
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7ffadaf59200, pid=7598, tid=140710015645440
> #
> # JRE version: Java(TM) SE Runtime Environment (8.0_45-b14) (build 
> 1.8.0_45-b14)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.45-b02 mixed mode 
> linux-amd64 compressed oops)
> # Problematic frame:
> # V  [libjvm.so+0x7eb200]
> #
> # Core dump written. Default location: //core or core.7598 (max size 100 
> kB). To ensure a full core dump, try "ulimit -c unlimited" before starting 
> Java again
> #
> # An error report file with more information is saved as:
> # //hs_err_pid7598.log
> Compiled method (nm)   44403 10436 n 0   sun.misc.Unsafe::copyMemory 
> (native)
>  total in heap  [0x7ffac6b49290,0x7ffac6b495f8] = 872
>  relocation [0x7ffac6b493b8,0x7ffac6b49400] = 72
>  main code  [0x7ffac6b49400,0x7ffac6b495f8] = 504
> Compiled method (nm)   44403 10436 n 0   sun.misc.Unsafe::copyMemory 
> (native)
>  total in heap  [0x7ffac6b49290,0x7ffac6b495f8] = 872
>  relocation [0x7ffac6b493b8,0x7ffac6b49400] = 72
>  main code  [0x7ffac6b49400,0x7ffac6b495f8] = 504
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.java.com/bugreport/crash.jsp
> #
> ---  T H R E A D  ---
> Current thread (0x7ff7902e7800):  JavaThread "broadcast-hash-join-1" 
> daemon [_thread_in_vm, id=16548, stack(0x7ff66bd98000,0x7ff66be99000)]
> siginfo: si_signo: 11 (SIGSEGV), si_code: 2 (SEGV_ACCERR), si_addr: 
> 0x00069f572b10
> Registers:
> RAX=0x00069f672b08, RBX=0x7ff7902e7800, RCX=0x000394132140, 
> RDX=0xfffe0004
> RSP=0x7ff66be97048, RBP=0x7ff66be970a0, RSI=0x000394032148, 
> RDI=0x00069f572b10
> R8 =0x7ff66be970d0, R9 =0x0028, R10=0x7ff79cc0e1e7, 
> R11=0x7ff79cc0e198
> R12=0x7ff66be970c0, R13=0x7ff66be970d0, R14=0x0028, 
> R15=0x30323048
> RIP=0x7ff7b0dae200, EFLAGS=0x00010282, CSGSFS=0xe033, 
> ERR=0x0004
>   TRAPNO=0x000e
> Top of Stack: (sp=0x7ff66be97048)
> 0x7ff66be97048:   7ff7b1042b1a 7ff7902e7800
> 0x7ff66be97058:   7ff7 7ff7902e7800
> 0x7ff66be97068:   7ff7902e7800 7ff7ad2846a0
> 0x7ff66be97078:   7ff7897048d8 
> 0x7ff66be97088:   7ff66be97110 7ff66be971f0
> 0x7ff66be97098:   7ff7902e7800 7ff66be970f0
> 0x7ff66be970a8:   7ff79cc0e261 0010
> 0x7ff66be970b8:   000390c04048 00066f24fac8
> 0x7ff66be970c8:   7ff7902e7800 000394032120
> 0x7ff66be970d8:   7ff7902e7800 7ff66f971af0
> 0x7ff66be970e8:   7ff7902e7800 7ff66be97198
> 0x7ff66be970f8:   7ff79c9d4c4d 7ff66a454b10
> 0x7ff66be97108:   7ff79c9d4c4d 0010
> 0x7ff66be97118:   7ff7902e5a90 0028
> 0x7ff66be97128:   7ff79c9d4760 000394032120
> 0x7ff66be97138:   30323048 7ff66be97160
> 0x7ff66be97148:   00066f24fac8 000390c04048
> 0x7ff66be97158:   7ff66be97158 7ff66f978eeb
> 0x7ff66be97168:   7ff66be971f0 7ff66f9791c8
> 0x7ff66be97178:   7ff668e90c60 7ff66f978f60
> 0x7ff66be97188:   7ff66be97110 7ff66be971b8
> 0x7ff66be97198:   7ff66be97238 7ff79c9d4c4d
> 0x7ff66be971a8:   0010 
> 0x7ff66be971b8:   38363130 38363130
> 0x7ff66be971c8:   0028 7ff66f973388
> 0x7ff66be971d8:   000394032120 30323048
> 0x7ff66be971e8:   000665823080 00066f24fac8
> 0x7ff66be971f8:   7ff66be971f8 7ff66f973357
> 0x7ff66be97208:   7ff66be97260 7ff66f976fe0
> 0x7ff66be97218:    7ff66f973388
> 0x7ff66be97228:   7ff66be971b8 7ff66be97248
> 

[jira] [Commented] (SPARK-13118) Support for classes defined in package objects

2016-02-24 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163902#comment-15163902
 ] 

Jakob Odersky commented on SPARK-13118:
---

Ah, I just realized the context of this issue; it's part of the Dataset API 
super-ticket.

> Support for classes defined in package objects
> --
>
> Key: SPARK-13118
> URL: https://issues.apache.org/jira/browse/SPARK-13118
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Michael Armbrust
>
> When you define a class inside of a package object, the name ends up being 
> something like {{org.mycompany.project.package$MyClass}}.  However, when we 
> reflect on this, we try to load {{org.mycompany.project.MyClass}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13118) Support for classes defined in package objects

2016-02-24 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163755#comment-15163755
 ] 

Jakob Odersky commented on SPARK-13118:
---

Hi Michael,
what's the concrete issue you encounter? Is it a (de-)serialization bug?
I ran a simple test with DataFrames containing classes defined in package 
objects and everything worked out fine.

I also quickly checked {{o.a.s.sql.catalyst.ScalaReflection}}, but it seems that 
type names are always accessed via native Scala reflection utilities.
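
For reference, this is roughly the test I ran (a minimal sketch; the package 
and class names are made up):
{code}
// file pkgtest/package.scala: a case class defined inside a package object
package object pkgtest {
  case class Record(id: Int, name: String)
}

// in a test or spark-shell session (sc and sqlContext already available):
//   import sqlContext.implicits._
//   import pkgtest._
//   val df = sc.parallelize(Seq(Record(1, "a"), Record(2, "b"))).toDF()
//   df.show()   // printed both rows as expected
{code}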

> Support for classes defined in package objects
> --
>
> Key: SPARK-13118
> URL: https://issues.apache.org/jira/browse/SPARK-13118
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Michael Armbrust
>
> When you define a class inside of a package object, the name ends up being 
> something like {{org.mycompany.project.package$MyClass}}.  However, when we 
> reflect on this, we try to load {{org.mycompany.project.MyClass}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-12878) Dataframe fails with nested User Defined Types

2016-02-24 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15160119#comment-15160119
 ] 

Jakob Odersky edited comment on SPARK-12878 at 2/24/16 7:16 PM:


I just tried your example and get a slightly different exception:

{{java.lang.ClassCastException: B cannot be cast to 
org.apache.spark.sql.catalyst.InternalRow}} (B as opposed to BoxedUnit)

However I actually don't understand why this worked in 1.5.2 in the first 
place. Consider the following extract from your snippet:
{code}
case A(list) =>
  val row = new GenericMutableRow(1)
  row.update(0, new GenericArrayData(list.map(_.asInstanceOf[Any]).toArray))
  row
{code}
although `list` is a collection of elements B in this case, I don't think that 
the individual Bs are serialized according to the definition in BUDT.
I would assume you are solely responsible for the serialization and would have 
to call something like {{list.map(BUDT.serialize(_))}} to convert any child 
elements to an "SQL Datum" (not sure what that is but the docs say it, 
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.types.UserDefinedType)

Maybe someone with more knowledge on the topic can clarify what's going on?


was (Author: jodersky):
I just tried your example and get a slightly different exception:

{{java.lang.ClassCastException: B cannot be cast to 
org.apache.spark.sql.catalyst.InternalRow}} (B as opposed to BoxedUnit)

However I actually don't understand why this worked in 1.5.2 in the first 
place. Consider the following extract from your snippet:
{code}
case A(list) =>
  val row = new GenericMutableRow(1)
  row.update(0, new GenericArrayData(list.map(_.asInstanceOf[Any]).toArray))
  row
{code}
although `list` is a collection of elements B in this case, I don't think that 
the individual B's are serialized according to the definition in BUDT.
I would assume you are solely responsible for the serialization and would have 
to call something like {{list.map(BUDT.serialize(_))}} to convert any child 
elements to an "SQL Datum" (not sure what that is but the docs say it, 
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.types.UserDefinedType)

Maybe someone with more knowledge on the topic can clarify what's going on?

> Dataframe fails with nested User Defined Types
> --
>
> Key: SPARK-12878
> URL: https://issues.apache.org/jira/browse/SPARK-12878
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Joao
>Priority: Blocker
>
> Spark 1.6.0 crashes when using nested User Defined Types in a Dataframe. 
> In version 1.5.2 the code below worked just fine:
> import org.apache.spark.{SparkConf, SparkContext}
> import org.apache.spark.sql.catalyst.InternalRow
> import org.apache.spark.sql.catalyst.expressions.GenericMutableRow
> import org.apache.spark.sql.types._
> @SQLUserDefinedType(udt = classOf[AUDT])
> case class A(list:Seq[B])
> class AUDT extends UserDefinedType[A] {
>   override def sqlType: DataType = StructType(Seq(StructField("list", 
> ArrayType(BUDT, containsNull = false), nullable = true)))
>   override def userClass: Class[A] = classOf[A]
>   override def serialize(obj: Any): Any = obj match {
> case A(list) =>
>   val row = new GenericMutableRow(1)
>   row.update(0, new 
> GenericArrayData(list.map(_.asInstanceOf[Any]).toArray))
>   row
>   }
>   override def deserialize(datum: Any): A = {
> datum match {
>   case row: InternalRow => new A(row.getArray(0).toArray(BUDT).toSeq)
> }
>   }
> }
> object AUDT extends AUDT
> @SQLUserDefinedType(udt = classOf[BUDT])
> case class B(text:Int)
> class BUDT extends UserDefinedType[B] {
>   override def sqlType: DataType = StructType(Seq(StructField("num", 
> IntegerType, nullable = false)))
>   override def userClass: Class[B] = classOf[B]
>   override def serialize(obj: Any): Any = obj match {
> case B(text) =>
>   val row = new GenericMutableRow(1)
>   row.setInt(0, text)
>   row
>   }
>   override def deserialize(datum: Any): B = {
> datum match {  case row: InternalRow => new B(row.getInt(0))  }
>   }
> }
> object BUDT extends BUDT
> object Test {
>   def main(args:Array[String]) = {
> val col = Seq(new A(Seq(new B(1), new B(2))),
>   new A(Seq(new B(3), new B(4))))
> val sc = new SparkContext(new 
> SparkConf().setMaster("local[1]").setAppName("TestSpark"))
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> import sqlContext.implicits._
> val df = sc.parallelize(1 to 2 zip col).toDF("id","b")
> df.select("b").show()
> df.collect().foreach(println)
>   }
> }
> In the new version (1.6.0) I needed to include the following import:
> import 

[jira] [Comment Edited] (SPARK-12878) Dataframe fails with nested User Defined Types

2016-02-24 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15160119#comment-15160119
 ] 

Jakob Odersky edited comment on SPARK-12878 at 2/24/16 7:15 PM:


I just tried your example and get a slightly different exception:

{{java.lang.ClassCastException: B cannot be cast to 
org.apache.spark.sql.catalyst.InternalRow}} (B as opposed to BoxedUnit)

However I actually don't understand why this worked in 1.5.2 in the first 
place. Consider the following extract from your snippet:
{code}
case A(list) =>
  val row = new GenericMutableRow(1)
  row.update(0, new GenericArrayData(list.map(_.asInstanceOf[Any]).toArray))
  row
{code}
although `list` is a collection of elements B in this case, I don't think that 
the individual B's are serialized according to the definition in BUDT.
I would assume you are solely responsible for the serialization and would have 
to call something like {{list.map(BUDT.serialize(_))}} to convert any child 
elements to an "SQL Datum" (not sure what that is but the docs say it, 
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.types.UserDefinedType)

Maybe someone with more knowledge on the topic can clarify what's going on?


was (Author: jodersky):
I just tried your example and get a slightly different exception:

{{java.lang.ClassCastException: B cannot be cast to 
org.apache.spark.sql.catalyst.InternalRow}} (B as opposed to BoxedUnit)

However I actually don't understand why this worked in 1.5.2 in the first 
place. Consider the following extract from your snippet:
{code}
case A(list) =>
  val row = new GenericMutableRow(1)
  row.update(0, new GenericArrayData(list.map(_.asInstanceOf[Any]).toArray))
  row
{code}
although `list` is of type B in this case, I don't think that the B's are 
serialized according to the definition in BUDT.
I would assume you are solely responsible for the serialization and would have 
to call something like {{list.map(BUDT.serialize(_))}} to convert any child 
elements to an "SQL Datum" (not sure what that is but the docs say it, 
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.types.UserDefinedType)

Maybe someone with more knowledge on the topic can clarify what's going on?

> Dataframe fails with nested User Defined Types
> --
>
> Key: SPARK-12878
> URL: https://issues.apache.org/jira/browse/SPARK-12878
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Joao
>Priority: Blocker
>
> Spark 1.6.0 crashes when using nested User Defined Types in a Dataframe. 
> In version 1.5.2 the code below worked just fine:
> import org.apache.spark.{SparkConf, SparkContext}
> import org.apache.spark.sql.catalyst.InternalRow
> import org.apache.spark.sql.catalyst.expressions.GenericMutableRow
> import org.apache.spark.sql.types._
> @SQLUserDefinedType(udt = classOf[AUDT])
> case class A(list:Seq[B])
> class AUDT extends UserDefinedType[A] {
>   override def sqlType: DataType = StructType(Seq(StructField("list", 
> ArrayType(BUDT, containsNull = false), nullable = true)))
>   override def userClass: Class[A] = classOf[A]
>   override def serialize(obj: Any): Any = obj match {
> case A(list) =>
>   val row = new GenericMutableRow(1)
>   row.update(0, new 
> GenericArrayData(list.map(_.asInstanceOf[Any]).toArray))
>   row
>   }
>   override def deserialize(datum: Any): A = {
> datum match {
>   case row: InternalRow => new A(row.getArray(0).toArray(BUDT).toSeq)
> }
>   }
> }
> object AUDT extends AUDT
> @SQLUserDefinedType(udt = classOf[BUDT])
> case class B(text:Int)
> class BUDT extends UserDefinedType[B] {
>   override def sqlType: DataType = StructType(Seq(StructField("num", 
> IntegerType, nullable = false)))
>   override def userClass: Class[B] = classOf[B]
>   override def serialize(obj: Any): Any = obj match {
> case B(text) =>
>   val row = new GenericMutableRow(1)
>   row.setInt(0, text)
>   row
>   }
>   override def deserialize(datum: Any): B = {
> datum match {  case row: InternalRow => new B(row.getInt(0))  }
>   }
> }
> object BUDT extends BUDT
> object Test {
>   def main(args:Array[String]) = {
> val col = Seq(new A(Seq(new B(1), new B(2))),
>   new A(Seq(new B(3), new B(4))))
> val sc = new SparkContext(new 
> SparkConf().setMaster("local[1]").setAppName("TestSpark"))
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> import sqlContext.implicits._
> val df = sc.parallelize(1 to 2 zip col).toDF("id","b")
> df.select("b").show()
> df.collect().foreach(println)
>   }
> }
> In the new version (1.6.0) I needed to include the following import:
> import org.apache.spark.sql.catalyst.expressions.GenericMutableRow
> 

[jira] [Commented] (SPARK-12878) Dataframe fails with nested User Defined Types

2016-02-23 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15160119#comment-15160119
 ] 

Jakob Odersky commented on SPARK-12878:
---

I just tried your example and get a slightly different exception:

{{java.lang.ClassCastException: B cannot be cast to 
org.apache.spark.sql.catalyst.InternalRow}} (B as opposed to BoxedUnit)

However I actually don't understand why this worked in 1.5.2 in the first 
place. Consider the following extract from your snippet:
{code}
case A(list) =>
  val row = new GenericMutableRow(1)
  row.update(0, new GenericArrayData(list.map(_.asInstanceOf[Any]).toArray))
  row
{code}
although `list` is of type B in this case, I don't think that the B's are 
serialized according to the definition in BUDT.
I would assume you are solely responsible for the serialization and would have 
to call something like {{list.map(BUDT.serialize(_))}} to convert any child 
elements to an "SQL Datum" (not sure what that is but the docs say it, 
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.types.UserDefinedType)

Maybe someone with more knowledge on the topic can clarify what's going on?

> Dataframe fails with nested User Defined Types
> --
>
> Key: SPARK-12878
> URL: https://issues.apache.org/jira/browse/SPARK-12878
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Joao
>Priority: Blocker
>
> Spark 1.6.0 crashes when using nested User Defined Types in a Dataframe. 
> In version 1.5.2 the code below worked just fine:
> import org.apache.spark.{SparkConf, SparkContext}
> import org.apache.spark.sql.catalyst.InternalRow
> import org.apache.spark.sql.catalyst.expressions.GenericMutableRow
> import org.apache.spark.sql.types._
> @SQLUserDefinedType(udt = classOf[AUDT])
> case class A(list:Seq[B])
> class AUDT extends UserDefinedType[A] {
>   override def sqlType: DataType = StructType(Seq(StructField("list", 
> ArrayType(BUDT, containsNull = false), nullable = true)))
>   override def userClass: Class[A] = classOf[A]
>   override def serialize(obj: Any): Any = obj match {
> case A(list) =>
>   val row = new GenericMutableRow(1)
>   row.update(0, new 
> GenericArrayData(list.map(_.asInstanceOf[Any]).toArray))
>   row
>   }
>   override def deserialize(datum: Any): A = {
> datum match {
>   case row: InternalRow => new A(row.getArray(0).toArray(BUDT).toSeq)
> }
>   }
> }
> object AUDT extends AUDT
> @SQLUserDefinedType(udt = classOf[BUDT])
> case class B(text:Int)
> class BUDT extends UserDefinedType[B] {
>   override def sqlType: DataType = StructType(Seq(StructField("num", 
> IntegerType, nullable = false)))
>   override def userClass: Class[B] = classOf[B]
>   override def serialize(obj: Any): Any = obj match {
> case B(text) =>
>   val row = new GenericMutableRow(1)
>   row.setInt(0, text)
>   row
>   }
>   override def deserialize(datum: Any): B = {
> datum match {  case row: InternalRow => new B(row.getInt(0))  }
>   }
> }
> object BUDT extends BUDT
> object Test {
>   def main(args:Array[String]) = {
> val col = Seq(new A(Seq(new B(1), new B(2))),
>   new A(Seq(new B(3), new B(4))))
> val sc = new SparkContext(new 
> SparkConf().setMaster("local[1]").setAppName("TestSpark"))
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> import sqlContext.implicits._
> val df = sc.parallelize(1 to 2 zip col).toDF("id","b")
> df.select("b").show()
> df.collect().foreach(println)
>   }
> }
> In the new version (1.6.0) I needed to include the following import:
> import org.apache.spark.sql.catalyst.expressions.GenericMutableRow
> However, Spark crashes in runtime:
> 16/01/18 14:36:22 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
> java.lang.ClassCastException: scala.runtime.BoxedUnit cannot be cast to 
> org.apache.spark.sql.catalyst.InternalRow
>   at 
> org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getStruct(rows.scala:51)
>   at 
> org.apache.spark.sql.catalyst.expressions.GenericMutableRow.getStruct(rows.scala:248)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$1.apply(basicOperators.scala:51)
>   at 
> org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$1.apply(basicOperators.scala:49)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>   at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
>   at 

[jira] [Comment Edited] (SPARK-12422) Binding Spark Standalone Master to public IP fails

2016-02-23 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159904#comment-15159904
 ] 

Jakob Odersky edited comment on SPARK-12422 at 2/24/16 12:16 AM:
-

This blocker issue is quite old now; can you or anyone else still reproduce it?
I tried it in a non-docker environment (Debian 9) and everything worked fine 
(Spark versions 1.5.2 and 1.6.0).


was (Author: jodersky):
This blocker issue is quite old now, can you still reproduce it?
I tried it in a non-docker environment (Debian 9) and everything worked fine 
(Spark versions 1.5.2 and 1.6.0).

> Binding Spark Standalone Master to public IP fails
> --
>
> Key: SPARK-12422
> URL: https://issues.apache.org/jira/browse/SPARK-12422
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 1.5.2
> Environment: Fails on direct deployment on Mac OSX and also in Docker 
> Environment (running on OSX or Ubuntu)
>Reporter: Bennet Jeutter
>Priority: Blocker
>
> The start of the Spark Standalone Master fails, when the host specified 
> equals the public IP address. For example I created a Docker Machine with 
> public IP 192.168.99.100, then I run:
> /usr/spark/bin/spark-class org.apache.spark.deploy.master.Master -h 
> 192.168.99.100
> It'll fail with:
> Exception in thread "main" java.net.BindException: Failed to bind to: 
> /192.168.99.100:7093: Service 'sparkMaster' failed after 16 retries!
>   at 
> org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
>   at 
> akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:393)
>   at 
> akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:389)
>   at scala.util.Success$$anonfun$map$1.apply(Try.scala:206)
>   at scala.util.Try$.apply(Try.scala:161)
>   at scala.util.Success.map(Try.scala:206)
>   at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
>   at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
>   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
>   at 
> akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
>   at 
> akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
>   at 
> akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
>   at 
> akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
>   at 
> scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
>   at 
> akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90)
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> So I thought oh well, lets just bind to the local IP and access it via public 
> IP - this doesn't work, it will give:
> dropping message [class akka.actor.ActorSelectionMessage] for non-local 
> recipient [Actor[akka.tcp://sparkMaster@192.168.99.100:7077/]] arriving at 
> [akka.tcp://sparkMaster@192.168.99.100:7077] inbound addresses are 
> [akka.tcp://sparkMaster@spark-master:7077]
> So there is currently no possibility to run all this... related stackoverflow 
> issues:
> * 
> http://stackoverflow.com/questions/31659228/getting-java-net-bindexception-when-attempting-to-start-spark-master-on-ec2-node
> * 
> http://stackoverflow.com/questions/33768029/access-apache-spark-standalone-master-via-ip



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12422) Binding Spark Standalone Master to public IP fails

2016-02-23 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159904#comment-15159904
 ] 

Jakob Odersky commented on SPARK-12422:
---

This blocker issue is quite old now; can you still reproduce it?
I tried it in a non-docker environment (Debian 9) and everything worked fine 
(Spark versions 1.5.2 and 1.6.0).

> Binding Spark Standalone Master to public IP fails
> --
>
> Key: SPARK-12422
> URL: https://issues.apache.org/jira/browse/SPARK-12422
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 1.5.2
> Environment: Fails on direct deployment on Mac OSX and also in Docker 
> Environment (running on OSX or Ubuntu)
>Reporter: Bennet Jeutter
>Priority: Blocker
>
> The start of the Spark Standalone Master fails, when the host specified 
> equals the public IP address. For example I created a Docker Machine with 
> public IP 192.168.99.100, then I run:
> /usr/spark/bin/spark-class org.apache.spark.deploy.master.Master -h 
> 192.168.99.100
> It'll fail with:
> Exception in thread "main" java.net.BindException: Failed to bind to: 
> /192.168.99.100:7093: Service 'sparkMaster' failed after 16 retries!
>   at 
> org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
>   at 
> akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:393)
>   at 
> akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:389)
>   at scala.util.Success$$anonfun$map$1.apply(Try.scala:206)
>   at scala.util.Try$.apply(Try.scala:161)
>   at scala.util.Success.map(Try.scala:206)
>   at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
>   at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
>   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
>   at 
> akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
>   at 
> akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
>   at 
> akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
>   at 
> akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
>   at 
> scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
>   at 
> akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90)
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> So I thought oh well, lets just bind to the local IP and access it via public 
> IP - this doesn't work, it will give:
> dropping message [class akka.actor.ActorSelectionMessage] for non-local 
> recipient [Actor[akka.tcp://sparkMaster@192.168.99.100:7077/]] arriving at 
> [akka.tcp://sparkMaster@192.168.99.100:7077] inbound addresses are 
> [akka.tcp://sparkMaster@spark-master:7077]
> So there is currently no possibility to run all this... related stackoverflow 
> issues:
> * 
> http://stackoverflow.com/questions/31659228/getting-java-net-bindexception-when-attempting-to-start-spark-master-on-ec2-node
> * 
> http://stackoverflow.com/questions/33768029/access-apache-spark-standalone-master-via-ip



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13172) Stop using RichException.getStackTrace it is deprecated

2016-02-08 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138447#comment-15138447
 ] 

Jakob Odersky commented on SPARK-13172:
---

Cool, thanks for the snippet! I agree, the first approach looks a lot better.

> Stop using RichException.getStackTrace it is deprecated
> ---
>
> Key: SPARK-13172
> URL: https://issues.apache.org/jira/browse/SPARK-13172
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: holdenk
>Priority: Trivial
>
> Throwable getStackTrace is the recommended alternative.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13172) Stop using RichException.getStackTrace it is deprecated

2016-02-08 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137575#comment-15137575
 ] 

Jakob Odersky commented on SPARK-13172:
---

I would suggest taking a similar approach to what the Scala library does: 
https://github.com/scala/scala/blob/v2.11.7/src/library/scala/runtime/RichException.scala#L1,
 that is, just call mkString on the stack trace.

Using e.printStackTrace is not as flexible: it doesn't give you a string and, as 
far as I know, it prints to stderr with no option to redirect.
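
Concretely, something along these lines (just a sketch, not tied to any 
particular call site in Spark):
{code}
// build a String from a Throwable's stack trace, mirroring what the deprecated
// RichException.getStackTraceString did internally
def stackTraceToString(e: Throwable): String =
  e.getStackTrace.mkString("", "\n", "\n")
{code}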

> Stop using RichException.getStackTrace it is deprecated
> ---
>
> Key: SPARK-13172
> URL: https://issues.apache.org/jira/browse/SPARK-13172
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: holdenk
>Priority: Trivial
>
> Throwable getStackTrace is the recommended alternative.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-13172) Stop using RichException.getStackTrace it is deprecated

2016-02-08 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137575#comment-15137575
 ] 

Jakob Odersky edited comment on SPARK-13172 at 2/8/16 8:03 PM:
---

I would suggest taking a similar approach to what the Scala library does: 
https://github.com/scala/scala/blob/v2.11.7/src/library/scala/runtime/RichException.scala#L16,
 that is, just call mkString on the stack trace.

Using e.printStackTrace is not as flexible: it doesn't give you a string and, as 
far as I know, it prints to stderr with no option to redirect.


was (Author: jodersky):
I would suggest taking similar approach to what the Scala library does: 
https://github.com/scala/scala/blob/v2.11.7/src/library/scala/runtime/RichException.scala#L1,
 that is just call mkString on the stack trace.

Using e.printStackTrace is not as flexible, it doesn't give you a string and as 
far as I know it prints to stderr with no option to redirect.

> Stop using RichException.getStackTrace it is deprecated
> ---
>
> Key: SPARK-13172
> URL: https://issues.apache.org/jira/browse/SPARK-13172
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: holdenk
>Priority: Trivial
>
> Throwable getStackTrace is the recommended alternative.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13171) Update promise & future to Promise and Future as the old ones are deprecated

2016-02-08 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137795#comment-15137795
 ] 

Jakob Odersky commented on SPARK-13171:
---

This is very strange. Are you sure it has something to do with the changes 
introduced by my PR? As mentioned previously, the only effective change between 
future() and Future.apply() is one less indirection. The only potentially 
visible changes would be for code that relies on reflection or does some macro 
magic.
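
For reference, the kind of change the PR makes is essentially the following (a 
schematic sketch, not actual Spark code; Scala 2.11):
{code}
object FutureApiSketch {
  import scala.concurrent.{future, Future}
  import scala.concurrent.ExecutionContext.Implicits.global

  // old spelling: scala.concurrent.future, deprecated since 2.11,
  // which simply forwards to Future.apply
  val old = future { 41 + 1 }

  // new spelling, semantically identical
  val updated = Future { 41 + 1 }
}
{code}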

> Update promise & future to Promise and Future as the old ones are deprecated
> 
>
> Key: SPARK-13171
> URL: https://issues.apache.org/jira/browse/SPARK-13171
> Project: Spark
>  Issue Type: Sub-task
>Reporter: holdenk
>Assignee: Jakob Odersky
>Priority: Trivial
> Fix For: 2.0.0
>
>
> We use the promise and future functions on the concurrent object, both of 
> which have been deprecated in 2.11 . The full traits are present in Scala 
> 2.10 as well so this should be a safe migration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13171) Update promise & future to Promise and Future as the old ones are deprecated

2016-02-06 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15135979#comment-15135979
 ] 

Jakob Odersky commented on SPARK-13171:
---

[~tedyu]

On GitHub, [~smilegator] wrote:

> BTW, I also hit similar issues without this merge in my local environment. 
> Thus, I do not know what is the root causes of this problem

Have you tried running the tests before the future merge?

> Update promise & future to Promise and Future as the old ones are deprecated
> 
>
> Key: SPARK-13171
> URL: https://issues.apache.org/jira/browse/SPARK-13171
> Project: Spark
>  Issue Type: Sub-task
>Reporter: holdenk
>Assignee: Jakob Odersky
>Priority: Trivial
> Fix For: 2.0.0
>
>
> We use the promise and future functions on the concurrent object, both of 
> which have been deprecated in 2.11 . The full traits are present in Scala 
> 2.10 as well so this should be a safe migration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13171) Update promise & future to Promise and Future as the old ones are deprecated

2016-02-06 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15135976#comment-15135976
 ] 

Jakob Odersky commented on SPARK-13171:
---

I can't see any reason why the above change should impact anything at all.

The biggest change is an additional indirection. Could it be that Hive does 
some funky reflection or macro stuff?

> Update promise & future to Promise and Future as the old ones are deprecated
> 
>
> Key: SPARK-13171
> URL: https://issues.apache.org/jira/browse/SPARK-13171
> Project: Spark
>  Issue Type: Sub-task
>Reporter: holdenk
>Assignee: Jakob Odersky
>Priority: Trivial
> Fix For: 2.0.0
>
>
> We use the promise and future functions on the concurrent object, both of 
> which have been deprecated in 2.11 . The full traits are present in Scala 
> 2.10 as well so this should be a safe migration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13176) Ignore deprecation warning for ProcessBuilder lines_!

2016-02-05 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15135222#comment-15135222
 ] 

Jakob Odersky commented on SPARK-13176:
---

One of the places the process API is used is in creating symlinks. Since Spark 
requires at least Java 1.7, we can drop the use of external commands and rely 
on the java.nio.file.Files API instead.
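
For example, symlink creation could look something like this (a sketch only; 
the exact helper name and error handling in Spark's utilities may differ):
{code}
import java.io.File
import java.nio.file.Files

// create (or replace) a symbolic link `dst` pointing at `src`,
// using java.nio.file.Files instead of shelling out to `ln -sf`
def symlink(src: File, dst: File): Unit = {
  if (dst.exists()) dst.delete()
  Files.createSymbolicLink(dst.toPath, src.toPath)
}
{code}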

> Ignore deprecation warning for ProcessBuilder lines_!
> -
>
> Key: SPARK-13176
> URL: https://issues.apache.org/jira/browse/SPARK-13176
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: holdenk
>Priority: Trivial
>
> The replacement,  stream_! & lineStream_! is not present in 2.10 API.
> Note @SupressWarnings for deprecation doesn't appear to work 
> https://issues.scala-lang.org/browse/SI-7934 so suppressing the warnings 
> might involve wrapping or similar.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13176) Ignore deprecation warning for ProcessBuilder lines_!

2016-02-05 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15135254#comment-15135254
 ] 

Jakob Odersky commented on SPARK-13176:
---

The PR I submitted uses {{java.nio.file.Files}}; it does not fix the underlying 
problem of ignoring specific deprecation warnings.

> Ignore deprecation warning for ProcessBuilder lines_!
> -
>
> Key: SPARK-13176
> URL: https://issues.apache.org/jira/browse/SPARK-13176
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: holdenk
>Priority: Trivial
>
> The replacement,  stream_! & lineStream_! is not present in 2.10 API.
> Note @SupressWarnings for deprecation doesn't appear to work 
> https://issues.scala-lang.org/browse/SI-7934 so suppressing the warnings 
> might involve wrapping or similar.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13208) Replace Pair with tuples

2016-02-04 Thread Jakob Odersky (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Odersky updated SPARK-13208:
--
Priority: Trivial  (was: Major)

> Replace Pair with tuples
> 
>
> Key: SPARK-13208
> URL: https://issues.apache.org/jira/browse/SPARK-13208
> Project: Spark
>  Issue Type: Sub-task
>  Components: Examples, Spark Core, SQL, Streaming
>Reporter: Jakob Odersky
>Priority: Trivial
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13208) Replace Pair with tuples

2016-02-04 Thread Jakob Odersky (JIRA)
Jakob Odersky created SPARK-13208:
-

 Summary: Replace Pair with tuples
 Key: SPARK-13208
 URL: https://issues.apache.org/jira/browse/SPARK-13208
 Project: Spark
  Issue Type: Sub-task
Reporter: Jakob Odersky






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13176) Ignore deprecation warning for ProcessBuilder lines_!

2016-02-04 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131985#comment-15131985
 ] 

Jakob Odersky commented on SPARK-13176:
---

A possible workaround would be to filter out warnings in sbt, similar to what 
is done here: https://github.com/apache/spark/pull/9128/files. Deprecation 
warnings could either be checked against a "whitelist" or, alternatively, the 
resulting classfiles could be inspected for the presence of some special 
annotations.

Note however that both solutions are hacks and will not work with Maven.

> Ignore deprecation warning for ProcessBuilder lines_!
> -
>
> Key: SPARK-13176
> URL: https://issues.apache.org/jira/browse/SPARK-13176
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: holdenk
>Priority: Trivial
>
> The replacement,  stream_! & lineStream_! is not present in 2.10 API.
> Note @SupressWarnings for deprecation doesn't appear to work 
> https://issues.scala-lang.org/browse/SI-7934 so suppressing the warnings 
> might involve wrapping or similar.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-12990) Fatal warnings on @transient parameters (Scala 2.11)

2016-01-26 Thread Jakob Odersky (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Odersky resolved SPARK-12990.
---
Resolution: Duplicate

Fixed in 00026fa9912ecee5637f1e7dd222f977f31f6766

> Fatal warnings on @transient parameters (Scala 2.11)
> 
>
> Key: SPARK-12990
> URL: https://issues.apache.org/jira/browse/SPARK-12990
> Project: Spark
>  Issue Type: Bug
>  Components: Build, SQL
>Reporter: Jakob Odersky
>Priority: Critical
>
> Two new classes in {{sql.execution.datasources}}, {{CSVOptions}} and 
> {{JSONOptions}} break the Scala 2.11 build due to unnecessary @transient 
> annotations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12990) Fatal warnings on @transient parameters (Scala 2.11)

2016-01-25 Thread Jakob Odersky (JIRA)
Jakob Odersky created SPARK-12990:
-

 Summary: Fatal warnings on @transient parameters (Scala 2.11)
 Key: SPARK-12990
 URL: https://issues.apache.org/jira/browse/SPARK-12990
 Project: Spark
  Issue Type: Bug
  Components: Build, SQL
Reporter: Jakob Odersky


Two new classes in {{sql.execution.datasources}}, {{CSVOptions}} and 
{{JSONOptions}} break the Scala 2.11 build due to unnecessary @transient 
annotations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12958) Map accumulator in spark

2016-01-25 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116461#comment-15116461
 ] 

Jakob Odersky commented on SPARK-12958:
---

I agree that it's very specific; maybe it would make sense to include a more 
generic version of {{MapAccumulator}} instead, something like 
{{MapAccumulator[A, B]}}, where B has to have an implicit AccumulatorParam[B] 
itself?
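
Roughly what I have in mind (a sketch against the Spark 1.x accumulator API, 
untested; the class name is made up):
{code}
import org.apache.spark.AccumulatorParam

// Hypothetical generic map accumulator: values of type B are merged per key
// using whatever AccumulatorParam[B] is available implicitly.
class MapAccumulatorParam[A, B](implicit bParam: AccumulatorParam[B])
    extends AccumulatorParam[Map[A, B]] {

  override def zero(initialValue: Map[A, B]): Map[A, B] = Map.empty

  override def addInPlace(m1: Map[A, B], m2: Map[A, B]): Map[A, B] =
    m2.foldLeft(m1) { case (acc, (k, v)) =>
      // merge per-key values with the value type's own AccumulatorParam
      acc.updated(k, acc.get(k).map(bParam.addInPlace(_, v)).getOrElse(v))
    }
}

// usage (sc is a SparkContext, rdd some RDD):
//   import org.apache.spark.SparkContext._   // implicit AccumulatorParam[Int]
//   val metrics = sc.accumulator(Map.empty[String, Int])(new MapAccumulatorParam[String, Int])
//   rdd.foreach(x => metrics += Map("processed" -> 1))
//   metrics.value   // aggregated map, available on the driver
{code}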

> Map accumulator in spark
> 
>
> Key: SPARK-12958
> URL: https://issues.apache.org/jira/browse/SPARK-12958
> Project: Spark
>  Issue Type: Wish
>  Components: Spark Core
>Reporter: Souri
>Priority: Minor
>
> Spark by default supports accumulators of Int,Long,Double,Float.
> It would be good if we can have a Map accumulator where each executor can 
> just add key->value pairs and driver can have access to the aggregated value 
> for each key in the map.
> In this way, it would also be easier to use accumulators for various metrics. 
> We can define metrics at runtime as the map can take any string key.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12990) Fatal warnings on @transient parameters (Scala 2.11)

2016-01-25 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116409#comment-15116409
 ] 

Jakob Odersky commented on SPARK-12990:
---

marking as critical since it breaks the build

> Fatal warnings on @transient parameters (Scala 2.11)
> 
>
> Key: SPARK-12990
> URL: https://issues.apache.org/jira/browse/SPARK-12990
> Project: Spark
>  Issue Type: Bug
>  Components: Build, SQL
>Reporter: Jakob Odersky
>Priority: Critical
>
> Two new classes in {{sql.execution.datasources}}, {{CSVOptions}} and 
> {{JSONOptions}} break the Scala 2.11 build due to unnecessary @transient 
> annotations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12990) Fatal warnings on @transient parameters (Scala 2.11)

2016-01-25 Thread Jakob Odersky (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Odersky updated SPARK-12990:
--
Priority: Critical  (was: Major)

> Fatal warnings on @transient parameters (Scala 2.11)
> 
>
> Key: SPARK-12990
> URL: https://issues.apache.org/jira/browse/SPARK-12990
> Project: Spark
>  Issue Type: Bug
>  Components: Build, SQL
>Reporter: Jakob Odersky
>Priority: Critical
>
> Two new classes in {{sql.execution.datasources}}, {{CSVOptions}} and 
> {{JSONOptions}} break the Scala 2.11 build due to unnecessary @transient 
> annotations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12801) The DataFrame.rdd not return same result

2016-01-13 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15096867#comment-15096867
 ] 

Jakob Odersky commented on SPARK-12801:
---

I can't reproduce this either

> The DataFrame.rdd not return same result
> 
>
> Key: SPARK-12801
> URL: https://issues.apache.org/jira/browse/SPARK-12801
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 1.5.2
> Environment: 3 servers of centos7, cluster mode
>Reporter: Joseph Sun
>
> run spark-shell and type in the following code:
> > import org.apache.spark.sql.types._
> > val schema = StructType(StructField("id",IntegerType,true)::Nil)
> > val rdd = sc.parallelize((0 to 1)).map(Row(_))
> > val df = sqlContext.createDataFrame(rdd,schema)
> > df.registerTempTable("test")
> > sqlContext.cacheTable("test")
> > sqlContext.sql("select *  from test limit 2").collect()
> show Array[org.apache.spark.sql.Row] = Array([0], [1]) 
> > sqlContext.sql("select *  from test limit 2").rdd.collect()
> run the code a few more times; the result is not consistent.
> some times the result is : Array[org.apache.spark.sql.Row] = Array([0], [1])
> or: Array[org.apache.spark.sql.Row] = Array([2500], [2501])
> why?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12777) Dataset fields can't be Scala tuples

2016-01-13 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15097327#comment-15097327
 ] 

Jakob Odersky commented on SPARK-12777:
---

I get the same error in the Spark shell; however, everything works in a plain 
application (as shown in the listing):
{code}
import org.apache.spark._
import org.apache.spark.sql._

case class Test(v: (Int, Int))

object Main {

  val conf = new SparkConf().setMaster("local").setAppName("testbench")
  val sc = new SparkContext(conf)
  val sqlContext = new SQLContext(sc)

  def main(args: Array[String]): Unit = {
    import sqlContext.implicits._

    val rdd = sc.parallelize(
      Seq(
        Test((1,2)),
        Test((3,4))))
    val ds = sqlContext.createDataset(rdd).toDS
    ds.show
  }
}
{code}

> Dataset fields can't be Scala tuples
> 
>
> Key: SPARK-12777
> URL: https://issues.apache.org/jira/browse/SPARK-12777
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0, 1.6.1, 2.0.0
>Reporter: Chris Jansen
>
> Datasets can't seem to handle Scala tuples as fields of case classes.
> {code}
> Seq((1,2), (3,4)).toDS().show() //works
> {code}
> When including a tuple as a field, the code fails:
> {code}
> case class Test(v: (Int, Int))
> Seq(Test((1,2)), Test((3,4))).toDS().show //fails
> {code}
> {code}
>   UnresolvedException: : Invalid call to dataType on unresolved object, tree: 
> 'name  (unresolved.scala:59)
>  
> org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.dataType(unresolved.scala:59)
>  
> org.apache.spark.sql.catalyst.expressions.GetStructField.org$apache$spark$sql$catalyst$expressions$GetStructField$$field$lzycompute(complexTypeExtractors.scala:107)
>  
> org.apache.spark.sql.catalyst.expressions.GetStructField.org$apache$spark$sql$catalyst$expressions$GetStructField$$field(complexTypeExtractors.scala:107)
>  
> org.apache.spark.sql.catalyst.expressions.GetStructField$$anonfun$toString$1.apply(complexTypeExtractors.scala:111)
>  
> org.apache.spark.sql.catalyst.expressions.GetStructField$$anonfun$toString$1.apply(complexTypeExtractors.scala:111)
>  
> org.apache.spark.sql.catalyst.expressions.GetStructField.toString(complexTypeExtractors.scala:111)
>  
> org.apache.spark.sql.catalyst.expressions.Expression.toString(Expression.scala:217)
>  
> org.apache.spark.sql.catalyst.expressions.Expression.toString(Expression.scala:217)
>  
> org.apache.spark.sql.catalyst.expressions.If.toString(conditionalExpressions.scala:76)
>  
> org.apache.spark.sql.catalyst.expressions.Expression.toString(Expression.scala:217)
>  
> org.apache.spark.sql.catalyst.expressions.Alias.toString(namedExpressions.scala:155)
>  
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$argString$1.apply(TreeNode.scala:385)
>  
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$argString$1.apply(TreeNode.scala:381)
>  org.apache.spark.sql.catalyst.trees.TreeNode.argString(TreeNode.scala:388)
>  org.apache.spark.sql.catalyst.trees.TreeNode.simpleString(TreeNode.scala:391)
>  
> org.apache.spark.sql.catalyst.plans.QueryPlan.simpleString(QueryPlan.scala:172)
>  
> org.apache.spark.sql.catalyst.trees.TreeNode.generateTreeString(TreeNode.scala:441)
>  org.apache.spark.sql.catalyst.trees.TreeNode.treeString(TreeNode.scala:396)
>  
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$5.apply(RuleExecutor.scala:118)
>  
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$5.apply(RuleExecutor.scala:119)
>  org.apache.spark.Logging$class.logDebug(Logging.scala:62)
>  
> org.apache.spark.sql.catalyst.rules.RuleExecutor.logDebug(RuleExecutor.scala:44)
>  
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:115)
>  
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:72)
>  
> org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:72)
>  
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.resolve(ExpressionEncoder.scala:253)
>  org.apache.spark.sql.Dataset.<init>(Dataset.scala:78)
>  org.apache.spark.sql.Dataset.<init>(Dataset.scala:89)
>  org.apache.spark.sql.SQLContext.createDataset(SQLContext.scala:507)
>  
> org.apache.spark.sql.SQLImplicits.localSeqToDatasetHolder(SQLImplicits.scala:80)
> {code}
> When providing a type alias, the code fails in a different way:
> {code}
> type TwoInt = (Int, Int)
> case class Test(v: TwoInt)
> Seq(Test((1,2)), Test((3,4))).toDS().show //fails
> {code}
> {code}
>   NoSuchElementException: : head of empty list  (ScalaReflection.scala:504)
>  
> org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$extractorFor$1.apply(ScalaReflection.scala:504)
>  
> 

[jira] [Created] (SPARK-12816) Schema generation for type aliases does not work

2016-01-13 Thread Jakob Odersky (JIRA)
Jakob Odersky created SPARK-12816:
-

 Summary: Schema generation for type aliases does not work
 Key: SPARK-12816
 URL: https://issues.apache.org/jira/browse/SPARK-12816
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: Jakob Odersky


Related to the second part of SPARK-12777.
Assume the following:

{code}
case class Container[A](a: A)
type IntContainer = Container[Int]
{code}

Generating a schema with 
{code}org.apache.spark.sql.catalyst.ScalaReflection.schemaFor[IntContainer]{code}
 fails miserably with {{NoSuchElementException: : head of empty list  
(ScalaReflection.scala:504)}} (the same exception as described in the related 
issues)

Since {{schemaFor}} is called whenever a schema is implicitly needed, 
{{Datasets}} cannot be created from certain aliased types.
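
A minimal reproduction sketch, assuming a Spark 1.6 shell (the dealiasing 
explanation in the code comment is a guess, not something taken from the stack 
trace):
{code}
import org.apache.spark.sql.catalyst.ScalaReflection

case class Container[A](a: A)
type IntContainer = Container[Int]

// Throws NoSuchElementException: head of empty list (ScalaReflection.scala:504),
// presumably because the alias is not dealiased before the case-class constructor
// parameters are looked up via reflection.
ScalaReflection.schemaFor[IntContainer]
{code}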



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12777) Dataset fields can't be Scala tuples

2016-01-13 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15097431#comment-15097431
 ] 

Jakob Odersky commented on SPARK-12777:
---

Concerning the problem with type aliases, I can reproduce it both inside the 
Spark shell and inside a standalone program. See issue SPARK-12816 and the 
related PR.

> Dataset fields can't be Scala tuples
> 
>
> Key: SPARK-12777
> URL: https://issues.apache.org/jira/browse/SPARK-12777
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0, 1.6.1, 2.0.0
>Reporter: Chris Jansen
>
> Datasets can't seem to handle Scala tuples as fields of case classes.
> {code}
> Seq((1,2), (3,4)).toDS().show() //works
> {code}
> When including a tuple as a field, the code fails:
> {code}
> case class Test(v: (Int, Int))
> Seq(Test((1,2)), Test((3,4))).toDS().show //fails
> {code}
> {code}
>   UnresolvedException: : Invalid call to dataType on unresolved object, tree: 
> 'name  (unresolved.scala:59)
>  
> org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.dataType(unresolved.scala:59)
>  
> org.apache.spark.sql.catalyst.expressions.GetStructField.org$apache$spark$sql$catalyst$expressions$GetStructField$$field$lzycompute(complexTypeExtractors.scala:107)
>  
> org.apache.spark.sql.catalyst.expressions.GetStructField.org$apache$spark$sql$catalyst$expressions$GetStructField$$field(complexTypeExtractors.scala:107)
>  
> org.apache.spark.sql.catalyst.expressions.GetStructField$$anonfun$toString$1.apply(complexTypeExtractors.scala:111)
>  
> org.apache.spark.sql.catalyst.expressions.GetStructField$$anonfun$toString$1.apply(complexTypeExtractors.scala:111)
>  
> org.apache.spark.sql.catalyst.expressions.GetStructField.toString(complexTypeExtractors.scala:111)
>  
> org.apache.spark.sql.catalyst.expressions.Expression.toString(Expression.scala:217)
>  
> org.apache.spark.sql.catalyst.expressions.Expression.toString(Expression.scala:217)
>  
> org.apache.spark.sql.catalyst.expressions.If.toString(conditionalExpressions.scala:76)
>  
> org.apache.spark.sql.catalyst.expressions.Expression.toString(Expression.scala:217)
>  
> org.apache.spark.sql.catalyst.expressions.Alias.toString(namedExpressions.scala:155)
>  
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$argString$1.apply(TreeNode.scala:385)
>  
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$argString$1.apply(TreeNode.scala:381)
>  org.apache.spark.sql.catalyst.trees.TreeNode.argString(TreeNode.scala:388)
>  org.apache.spark.sql.catalyst.trees.TreeNode.simpleString(TreeNode.scala:391)
>  
> org.apache.spark.sql.catalyst.plans.QueryPlan.simpleString(QueryPlan.scala:172)
>  
> org.apache.spark.sql.catalyst.trees.TreeNode.generateTreeString(TreeNode.scala:441)
>  org.apache.spark.sql.catalyst.trees.TreeNode.treeString(TreeNode.scala:396)
>  
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$5.apply(RuleExecutor.scala:118)
>  
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$5.apply(RuleExecutor.scala:119)
>  org.apache.spark.Logging$class.logDebug(Logging.scala:62)
>  
> org.apache.spark.sql.catalyst.rules.RuleExecutor.logDebug(RuleExecutor.scala:44)
>  
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:115)
>  
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:72)
>  
> org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:72)
>  
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.resolve(ExpressionEncoder.scala:253)
>  org.apache.spark.sql.Dataset.<init>(Dataset.scala:78)
>  org.apache.spark.sql.Dataset.<init>(Dataset.scala:89)
>  org.apache.spark.sql.SQLContext.createDataset(SQLContext.scala:507)
>  
> org.apache.spark.sql.SQLImplicits.localSeqToDatasetHolder(SQLImplicits.scala:80)
> {code}
> When providing a type alias, the code fails in a different way:
> {code}
> type TwoInt = (Int, Int)
> case class Test(v: TwoInt)
> Seq(Test((1,2)), Test((3,4))).toDS().show //fails
> {code}
> {code}
>   NoSuchElementException: : head of empty list  (ScalaReflection.scala:504)
>  
> org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$extractorFor$1.apply(ScalaReflection.scala:504)
>  
> org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$extractorFor$1.apply(ScalaReflection.scala:502)
>  
> org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$extractorFor(ScalaReflection.scala:502)
>  
> org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$extractorFor$1.apply(ScalaReflection.scala:509)
>  
> 

[jira] [Created] (SPARK-12761) Clean up duplicated code in scala 2.11 repl.Main

2016-01-11 Thread Jakob Odersky (JIRA)
Jakob Odersky created SPARK-12761:
-

 Summary: Clean up duplicated code in scala 2.11 repl.Main
 Key: SPARK-12761
 URL: https://issues.apache.org/jira/browse/SPARK-12761
 Project: Spark
  Issue Type: Improvement
  Components: Spark Shell
Reporter: Jakob Odersky
Priority: Trivial


There is duplicate code in 
{{/repl/scala-2.11/src/main/scala/org/apache/spark/repl/Main.scala}}. According 
to git blame, I moved the "settings" val to a method-local val; however, due to 
a subsequent merge it was reintroduced as a global val.

Cf 
https://github.com/apache/spark/blame/master/repl/scala-2.11/src/main/scala/org/apache/spark/repl/Main.scala,
 line 33.
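
For illustration only, a sketch of the pattern being described (hypothetical 
names, not the actual {{repl/scala-2.11}} sources):
{code}
import scala.tools.nsc.Settings

object Main {
  val settings = new Settings()     // global definition reintroduced by the merge

  def main(args: Array[String]): Unit = {
    val settings = new Settings()   // method-local definition from the earlier cleanup
    // ... with both in place, the value is effectively defined twice
  }
}
{code}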



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12761) Clean up duplicated code in scala 2.11 repl.Main

2016-01-11 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092964#comment-15092964
 ] 

Jakob Odersky commented on SPARK-12761:
---

@vanzin, I think this is your area of expertise

> Clean up duplicated code in scala 2.11 repl.Main
> 
>
> Key: SPARK-12761
> URL: https://issues.apache.org/jira/browse/SPARK-12761
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Shell
>Reporter: Jakob Odersky
>Priority: Trivial
>
> There is duplicate code in 
> {{/repl/scala-2.11/src/main/scala/org/apache/spark/repl/Main.scala}}. 
> According to git blame, I moved the "settings" val to a method-local val; 
> however, due to a subsequent merge it was reintroduced as a global val.
> Cf 
> https://github.com/apache/spark/blame/master/repl/scala-2.11/src/main/scala/org/apache/spark/repl/Main.scala,
>  line 33.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4257) Spark master can only be accessed by hostname

2016-01-07 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15088405#comment-15088405
 ] 

Jakob Odersky commented on SPARK-4257:
--

The way I interpret the documentation, {{-h HOST, --host HOST   Hostname to 
listen on}} requires a hostNAME, whereas SPARK_MASTER_IP is the actual IP (of 
the hostname).
Although the documentation is somewhat ambiguous, I think what you are 
experiencing is expected behavior.
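
A sketch of what this means in practice (hypothetical host values, assuming a 
standalone master listening on port 7077):
{code}
import org.apache.spark.{SparkConf, SparkContext}

// Connect with the same host *name* the master was started with,
// rather than its raw IP address.
val conf = new SparkConf()
  .setAppName("hostname-vs-ip")
  .setMaster("spark://master-node.example.com:7077") // matches the master's hostname
  // .setMaster("spark://10.0.0.5:7077")             // may be refused if the master registered by name
val sc = new SparkContext(conf)
{code}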

> Spark master can only be accessed by hostname
> -
>
> Key: SPARK-4257
> URL: https://issues.apache.org/jira/browse/SPARK-4257
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Davies Liu
>Priority: Critical
>
> After sbin/start-all.sh, the spark shell can not connect to standalone master 
> by spark://IP:7077, it works if replace IP by hostname.
> In the docs[1], it says use `spark://IP:PORT` to connect to master.
> [1] http://spark.apache.org/docs/latest/spark-standalone.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-4257) Spark master can only be accessed by hostname

2016-01-07 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15088405#comment-15088405
 ] 

Jakob Odersky edited comment on SPARK-4257 at 1/7/16 11:46 PM:
---

The way I interpret the documentation, "-h HOST, --host HOST  Hostname to 
listen on" requires a hostNAME, whereas SPARK_MASTER_IP is the actual IP (of the 
hostname).
Although the documentation is somewhat ambiguous, I think what you are 
experiencing is expected behavior.


was (Author: jodersky):
The way I interpret the documentation {{-h HOST, --host HOST  Hostname to 
listen on}} requires a hostNAME whereas SPARK_MASTER_IP is the actual ip (of 
the hostname).
Although the documentation is somewhat ambiguous, I think what you are 
experiencing is expected behavior.

> Spark master can only be accessed by hostname
> -
>
> Key: SPARK-4257
> URL: https://issues.apache.org/jira/browse/SPARK-4257
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Davies Liu
>Priority: Critical
>
> After sbin/start-all.sh, the spark shell can not connect to standalone master 
> by spark://IP:7077, it works if replace IP by hostname.
> In the docs[1], it says use `spark://IP:PORT` to connect to master.
> [1] http://spark.apache.org/docs/latest/spark-standalone.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-12648) UDF with Option[Double] throws ClassCastException

2016-01-06 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15086388#comment-15086388
 ] 

Jakob Odersky edited comment on SPARK-12648 at 1/6/16 10:22 PM:


In spark-shell:
{code}
val df = sc.parallelize(List(("a", Some(4D)), ("b", None))).toDF("name", "weight")
df: org.apache.spark.sql.DataFrame = [name: string, weight: double]
{code}
You're getting a DataFrame containing plain doubles, not optional doubles. Not 
100% sure, but I'm guessing that creating a DataFrame from Option types is 
syntactic sugar to avoid using nulls in client code: Spark then flattens the 
Option into a nullable double column, so None becomes null and Some(x) becomes x.
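
One possible workaround, as an untested sketch (assuming Spark 1.6, and that a 
boxed {{java.lang.Double}} is accepted as the UDF parameter type): take the 
nullable boxed value and rebuild the {{Option}} inside the UDF body.
{code}
import org.apache.spark.sql.functions.udf

// Untested sketch: java.lang.Double is nullable, so Option(d) turns null into None.
val addTwo = udf((d: java.lang.Double) => Option(d).map(_ + 2.0))
df.withColumn("plusTwo", addTwo(df("weight"))).show()
{code}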


was (Author: jodersky):
In spark-shell:
{code}
val df = sc.parallelize(List(("a", Some(4D)), ("b", None))).toDF("name", "weight")
df: org.apache.spark.sql.DataFrame = [name: string, weight: double]
{/code}
You're getting a DataFrame containing doubles, not optional doubles. Not 100% 
sure, but I'm guessing that creating a DataFrame from Option types is syntactic 
sugar to avoid using nulls in client code. Spark then optimizes the option 
types to nullable or default values.

> UDF with Option[Double] throws ClassCastException
> -
>
> Key: SPARK-12648
> URL: https://issues.apache.org/jira/browse/SPARK-12648
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Mikael Valot
>
> I can write a UDF that returns an Option[Double], and the DataFrame's 
> schema is correctly inferred to be a nullable double. 
> However, I cannot seem to write a UDF that takes an Option as an 
> argument:
> import org.apache.spark.sql.SQLContext
> import org.apache.spark.{SparkContext, SparkConf}
> val conf = new SparkConf().setMaster("local[4]").setAppName("test")
> val sc = new SparkContext(conf)
> val sqlc = new SQLContext(sc)
> import sqlc.implicits._
> val df = sc.parallelize(List(("a", Some(4D)), ("b", None))).toDF("name", 
> "weight")
> import org.apache.spark.sql.functions._
> val addTwo = udf((d: Option[Double]) => d.map(_+2)) 
> df.withColumn("plusTwo", addTwo(df("weight"))).show()
> =>
> 2016-01-05T14:41:52 Executor task launch worker-0 ERROR 
> org.apache.spark.executor.Executor Exception in task 0.0 in stage 1.0 (TID 1)
> java.lang.ClassCastException: java.lang.Double cannot be cast to scala.Option
>   at $line14.$read$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$1.apply(<console>:18) 
> ~[na:na]
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
>  Source) ~[na:na]
>   at 
> org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$1.apply(basicOperators.scala:51)
>  ~[spark-sql_2.10-1.6.0.jar:1.6.0]
>   at 
> org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$1.apply(basicOperators.scala:49)
>  ~[spark-sql_2.10-1.6.0.jar:1.6.0]
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) 
> ~[scala-library-2.10.5.jar:na]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


