[jira] [Commented] (SPARK-14280) Update change-version.sh and pom.xml to add Scala 2.12 profiles
[ https://issues.apache.org/jira/browse/SPARK-14280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816398#comment-15816398 ] Jakob Odersky commented on SPARK-14280: --- Twitter Chill for Scala 2.12 is finally out, and I'm pleased to say that the Spark REPL now builds and runs on the latest version of Scala without any snapshot dependencies. > Update change-version.sh and pom.xml to add Scala 2.12 profiles > --- > > Key: SPARK-14280 > URL: https://issues.apache.org/jira/browse/SPARK-14280 > Project: Spark > Issue Type: Sub-task > Components: Build, Project Infra >Reporter: Josh Rosen >Assignee: Josh Rosen > > The following instructions will be kept quasi-up-to-date and are the best > starting point for building a Spark snapshot with Scala 2.12.0-M4: > * Check out https://github.com/JoshRosen/spark/tree/build-for-2.12. > * Install dependencies: > ** chill: check out https://github.com/twitter/chill/pull/253 and run > {{sbt ++2.12.0-M4 publishLocal}} > * Run {{./dev/change-scala-version.sh 2.12.0-M4}} > * To compile Spark, run {{build/sbt -Dscala-2.12}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14519) Cross-publish Kafka for Scala 2.12
[ https://issues.apache.org/jira/browse/SPARK-14519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Odersky updated SPARK-14519: -- Summary: Cross-publish Kafka for Scala 2.12 (was: Cross-publish Kafka for Scala 2.12.0-M4) > Cross-publish Kafka for Scala 2.12 > -- > > Key: SPARK-14519 > URL: https://issues.apache.org/jira/browse/SPARK-14519 > Project: Spark > Issue Type: Sub-task > Components: Build, Project Infra >Reporter: Josh Rosen > > In order to build the streaming Kafka connector, we need to publish Kafka for > Scala 2.12.0-M4. Someone should file an issue against the Kafka project and > work with their developers to figure out what will block their upgrade / > release.
[jira] [Commented] (SPARK-17647) SQL LIKE does not handle backslashes correctly
[ https://issues.apache.org/jira/browse/SPARK-17647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15736107#comment-15736107 ] Jakob Odersky commented on SPARK-17647: --- I rebased the PR and resolved the conflict. However, there is still the incompatibility issue with the SQL ANTLR parser. I discuss it in my last [two comments|https://github.com/apache/spark/pull/15398#issuecomment-255917940] and propose a few solutions. Any feedback is welcome! > SQL LIKE does not handle backslashes correctly > -- > > Key: SPARK-17647 > URL: https://issues.apache.org/jira/browse/SPARK-17647 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Xiangrui Meng > Labels: correctness > > Try the following in SQL shell: > {code} > select '' like '%\\%'; > {code} > It returned false, which is wrong. > cc: [~yhuai] [~joshrosen] > A false-negative considered previously: > {code} > select '' rlike '.*.*'; > {code} > It returned true, which is correct if we assume that the pattern is treated > as a Java string but not raw string.
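To see why returning false is wrong here, consider a minimal escape-aware matcher (a Python sketch of the intended LIKE semantics, not Spark's actual Scala implementation): in the pattern {{%\\%}}, the first backslash escapes the second, so the pattern means "contains a literal backslash".

```python
import re

def sql_like(value: str, pattern: str) -> bool:
    """Sketch of SQL LIKE semantics, with backslash as the escape character."""
    regex, i = [], 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\" and i + 1 < len(pattern):
            # escaped character: match it literally (so \\ means one literal backslash)
            regex.append(re.escape(pattern[i + 1]))
            i += 2
        elif c == "%":
            regex.append(".*")  # % matches any sequence of characters
            i += 1
        elif c == "_":
            regex.append(".")   # _ matches exactly one character
            i += 1
        else:
            regex.append(re.escape(c))
            i += 1
    return re.fullmatch("".join(regex), value) is not None

# The pattern '%\\%' means "contains a literal backslash", so matching it
# against a string consisting of a single backslash should return true:
print(sql_like("\\", "%\\\\%"))  # prints: True
```

Under these semantics the query in the issue should return true, which is what the fix aims for.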
[jira] [Commented] (SPARK-14280) Update change-version.sh and pom.xml to add Scala 2.12 profiles
[ https://issues.apache.org/jira/browse/SPARK-14280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15723814#comment-15723814 ] Jakob Odersky commented on SPARK-14280: --- You're welcome to pull the changes back into your repo of course; however, I'd also gladly continue working on adding 2.12 support! Btw, how should this kind of all-or-nothing change be integrated into Spark? I don't want to make a pull request for some half-baked feature, but I also feel that continuing to pile features onto this branch will result in a huge changeset that is impossible to review. > Update change-version.sh and pom.xml to add Scala 2.12 profiles > --- > > Key: SPARK-14280 > URL: https://issues.apache.org/jira/browse/SPARK-14280 > Project: Spark > Issue Type: Sub-task > Components: Build, Project Infra >Reporter: Josh Rosen >Assignee: Josh Rosen > > The following instructions will be kept quasi-up-to-date and are the best > starting point for building a Spark snapshot with Scala 2.12.0-M4: > * Check out https://github.com/JoshRosen/spark/tree/build-for-2.12. > * Install dependencies: > ** chill: check out https://github.com/twitter/chill/pull/253 and run > {{sbt ++2.12.0-M4 publishLocal}} > * Run {{./dev/change-scala-version.sh 2.12.0-M4}} > * To compile Spark, run {{build/sbt -Dscala-2.12}}
[jira] [Commented] (SPARK-14280) Update change-version.sh and pom.xml to add Scala 2.12 profiles
[ https://issues.apache.org/jira/browse/SPARK-14280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15723807#comment-15723807 ] Jakob Odersky commented on SPARK-14280: --- Hi [~joshrosen], I rebased your initial work onto the latest master and upgraded dependencies. You can see the changes here: https://github.com/apache/spark/compare/master...jodersky:scala-2.12 There were a few merge conflicts, often related to dependency version mismatches. I tried to resolve the conflicts cleanly; however, since I also had to take into account libraries that were only recently built for 2.12, some of your changes in the pom.xml may have been lost. There are still quite a few dependency issues with the latest Scala versions, but core still builds :) > Update change-version.sh and pom.xml to add Scala 2.12 profiles > --- > > Key: SPARK-14280 > URL: https://issues.apache.org/jira/browse/SPARK-14280 > Project: Spark > Issue Type: Sub-task > Components: Build, Project Infra >Reporter: Josh Rosen >Assignee: Josh Rosen > > The following instructions will be kept quasi-up-to-date and are the best > starting point for building a Spark snapshot with Scala 2.12.0-M4: > * Check out https://github.com/JoshRosen/spark/tree/build-for-2.12. > * Install dependencies: > ** chill: check out https://github.com/twitter/chill/pull/253 and run > {{sbt ++2.12.0-M4 publishLocal}} > * Run {{./dev/change-scala-version.sh 2.12.0-M4}} > * To compile Spark, run {{build/sbt -Dscala-2.12}}
[jira] [Comment Edited] (SPARK-14222) Cross-publish jackson-module-scala for Scala 2.12
[ https://issues.apache.org/jira/browse/SPARK-14222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634143#comment-15634143 ] Jakob Odersky edited comment on SPARK-14222 at 11/3/16 8:33 PM: Thanks Sean; however, I realized that the dependency is in fact not yet published for 2.12.0 final. The package I linked is from a different org. There's a ticket for a release here: https://github.com/FasterXML/jackson-module-scala/pull/294 was (Author: jodersky): Thanks Sean, however I realized that the dependency is in fact not yet published for 2.12.0 final. The package I linked is from a different org, oops > Cross-publish jackson-module-scala for Scala 2.12 > - > > Key: SPARK-14222 > URL: https://issues.apache.org/jira/browse/SPARK-14222 > Project: Spark > Issue Type: Sub-task > Components: Build >Reporter: Josh Rosen >Assignee: Josh Rosen > > In order to build Spark against Scala 2.12, we need to either remove our > jackson-module-scala dependency or cross-publish Jackson for Scala 2.12. > Personally, I'd prefer to remove it because I don't think we make extensive > use of it and because I'm not a huge fan of the implicit mapping between case > classes and JSON wire formats (the extra verbosity required by other > approaches is a feature, IMO, rather than a bug because it makes it much > harder to accidentally break wire compatibility).
[jira] [Comment Edited] (SPARK-14222) Cross-publish jackson-module-scala for Scala 2.12
[ https://issues.apache.org/jira/browse/SPARK-14222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634143#comment-15634143 ] Jakob Odersky edited comment on SPARK-14222 at 11/3/16 8:30 PM: Thanks Sean; however, I realized that the dependency is in fact not yet published for 2.12.0 final. The package I linked is from a different org, oops was (Author: jodersky): Thanks Sean, however I realized that the dependency is in fact not yet published for 2.12.0 final. The package I linked is from a different org > Cross-publish jackson-module-scala for Scala 2.12 > - > > Key: SPARK-14222 > URL: https://issues.apache.org/jira/browse/SPARK-14222 > Project: Spark > Issue Type: Sub-task > Components: Build >Reporter: Josh Rosen >Assignee: Josh Rosen > > In order to build Spark against Scala 2.12, we need to either remove our > jackson-module-scala dependency or cross-publish Jackson for Scala 2.12. > Personally, I'd prefer to remove it because I don't think we make extensive > use of it and because I'm not a huge fan of the implicit mapping between case > classes and JSON wire formats (the extra verbosity required by other > approaches is a feature, IMO, rather than a bug because it makes it much > harder to accidentally break wire compatibility).
[jira] [Commented] (SPARK-14222) Cross-publish jackson-module-scala for Scala 2.12
[ https://issues.apache.org/jira/browse/SPARK-14222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634143#comment-15634143 ] Jakob Odersky commented on SPARK-14222: --- Thanks Sean; however, I realized that the dependency is in fact not yet published for 2.12.0 final. The package I linked is from a different org > Cross-publish jackson-module-scala for Scala 2.12 > - > > Key: SPARK-14222 > URL: https://issues.apache.org/jira/browse/SPARK-14222 > Project: Spark > Issue Type: Sub-task > Components: Build >Reporter: Josh Rosen >Assignee: Josh Rosen > > In order to build Spark against Scala 2.12, we need to either remove our > jackson-module-scala dependency or cross-publish Jackson for Scala 2.12. > Personally, I'd prefer to remove it because I don't think we make extensive > use of it and because I'm not a huge fan of the implicit mapping between case > classes and JSON wire formats (the extra verbosity required by other > approaches is a feature, IMO, rather than a bug because it makes it much > harder to accidentally break wire compatibility).
[jira] [Commented] (SPARK-14222) Cross-publish jackson-module-scala for Scala 2.12
[ https://issues.apache.org/jira/browse/SPARK-14222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634117#comment-15634117 ] Jakob Odersky commented on SPARK-14222: --- A newer version of the module (version 2.8.4) is now available for Scala 2.12: http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22jackson-module-scala_2.12%22. Can we upgrade Spark's dependency (currently Spark uses 2.6.5)? > Cross-publish jackson-module-scala for Scala 2.12 > - > > Key: SPARK-14222 > URL: https://issues.apache.org/jira/browse/SPARK-14222 > Project: Spark > Issue Type: Sub-task > Components: Build >Reporter: Josh Rosen >Assignee: Josh Rosen > > In order to build Spark against Scala 2.12, we need to either remove our > jackson-module-scala dependency or cross-publish Jackson for Scala 2.12. > Personally, I'd prefer to remove it because I don't think we make extensive > use of it and because I'm not a huge fan of the implicit mapping between case > classes and JSON wire formats (the extra verbosity required by other > approaches is a feature, IMO, rather than a bug because it makes it much > harder to accidentally break wire compatibility).
[jira] [Commented] (SPARK-14220) Build and test Spark against Scala 2.12
[ https://issues.apache.org/jira/browse/SPARK-14220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634027#comment-15634027 ] Jakob Odersky commented on SPARK-14220: --- At least most dependencies will probably make 2.12 builds available, now that it is considered binary-stable > Build and test Spark against Scala 2.12 > --- > > Key: SPARK-14220 > URL: https://issues.apache.org/jira/browse/SPARK-14220 > Project: Spark > Issue Type: Umbrella > Components: Build, Project Infra >Reporter: Josh Rosen >Priority: Blocker > > This umbrella JIRA tracks the requirements for building and testing Spark > against the current Scala 2.12 milestone.
[jira] [Comment Edited] (SPARK-14220) Build and test Spark against Scala 2.12
[ https://issues.apache.org/jira/browse/SPARK-14220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634027#comment-15634027 ] Jakob Odersky edited comment on SPARK-14220 at 11/3/16 7:54 PM: At least most dependencies will probably make 2.12 builds available, now that it is considered binary-stable. The closure cleaning and byte code manipulation stuff is a whole different story though... was (Author: jodersky): at least most dependencies will probably make 2.12 builds available, now that it is considered binary-stable > Build and test Spark against Scala 2.12 > --- > > Key: SPARK-14220 > URL: https://issues.apache.org/jira/browse/SPARK-14220 > Project: Spark > Issue Type: Umbrella > Components: Build, Project Infra >Reporter: Josh Rosen >Priority: Blocker > > This umbrella JIRA tracks the requirements for building and testing Spark > against the current Scala 2.12 milestone.
[jira] [Commented] (SPARK-14220) Build and test Spark against Scala 2.12
[ https://issues.apache.org/jira/browse/SPARK-14220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15633762#comment-15633762 ] Jakob Odersky commented on SPARK-14220: --- Scala 2.12 was just officially announced :) > Build and test Spark against Scala 2.12 > --- > > Key: SPARK-14220 > URL: https://issues.apache.org/jira/browse/SPARK-14220 > Project: Spark > Issue Type: Umbrella > Components: Build, Project Infra >Reporter: Josh Rosen >Priority: Blocker > > This umbrella JIRA tracks the requirements for building and testing Spark > against the current Scala 2.12 milestone.
[jira] [Commented] (SPARK-18018) Specify alternate escape character in 'LIKE' expression
[ https://issues.apache.org/jira/browse/SPARK-18018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15590136#comment-15590136 ] Jakob Odersky commented on SPARK-18018: --- I've started a very early prototype [here|https://github.com/jodersky/spark/compare/SPARK-17647...jodersky:escape]. It's still very much WIP, and I'm currently pondering whether to make {{like}} expressions ternary or to include the {{escape}} option in a new {{pattern}} expression. Any feedback is welcome! > Specify alternate escape character in 'LIKE' expression > --- > > Key: SPARK-18018 > URL: https://issues.apache.org/jira/browse/SPARK-18018 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Jakob Odersky > > Spark currently uses the backslash character (\) to escape patterns in > 'LIKE' expressions. > Other RDBMS ([MS|https://msdn.microsoft.com/en-us/library/ms179859.aspx], > [Oracle|https://docs.oracle.com/cd/B12037_01/server.101/b10759/conditions016.htm], > > [DB2|http://www.ibm.com/support/knowledgecenter/SSEPEK_11.0.0/sqlref/src/tpc/db2z_likepredicate.html], > > [MySQL|http://dev.mysql.com/doc/refman/5.7/en/string-comparison-functions.html], > > [PostgreSQL|https://www.postgresql.org/docs/9.0/static/functions-matching.html], > [Hive|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF]) > support specifying an alternate escape character with an extended syntax of > the {{LIKE}} operator. > The syntax is the same in all above mentioned systems and is described as > follows: > {code} > expression LIKE pattern [ESCAPE escapeChar] > {code} > where {{escapeChar}} is a single-character expression that will replace the > backslash as escape character. > Adding this extension to Spark SQL would be a usability improvement for users > coming from other systems.
[jira] [Created] (SPARK-18018) Specify alternate escape character in 'LIKE' expression
Jakob Odersky created SPARK-18018: - Summary: Specify alternate escape character in 'LIKE' expression Key: SPARK-18018 URL: https://issues.apache.org/jira/browse/SPARK-18018 Project: Spark Issue Type: Improvement Components: SQL Reporter: Jakob Odersky Spark currently uses the backslash character (\) to escape patterns in 'LIKE' expressions. Other RDBMS ([MS|https://msdn.microsoft.com/en-us/library/ms179859.aspx], [Oracle|https://docs.oracle.com/cd/B12037_01/server.101/b10759/conditions016.htm], [DB2|http://www.ibm.com/support/knowledgecenter/SSEPEK_11.0.0/sqlref/src/tpc/db2z_likepredicate.html], [MySQL|http://dev.mysql.com/doc/refman/5.7/en/string-comparison-functions.html], [PostgreSQL|https://www.postgresql.org/docs/9.0/static/functions-matching.html], [Hive|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF]) support specifying an alternate escape character with an extended syntax of the {{LIKE}} operator. The syntax is the same in all above mentioned systems and is described as follows: {code} expression LIKE pattern [ESCAPE escapeChar] {code} where {{escapeChar}} is a single-character expression that will replace the backslash as escape character. Adding this extension to Spark SQL would be a usability improvement for users coming from other systems.
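The extended syntax described above can be sketched as follows (a Python illustration of the semantics, with a hypothetical helper name; the actual implementation would live in Catalyst's Scala code):

```python
import re

def like_to_regex(pattern: str, escape: str = "\\") -> str:
    """Translate a SQL LIKE pattern to an anchored regex, with a
    configurable escape character, as in 'LIKE pattern ESCAPE escapeChar'."""
    if len(escape) != 1:
        raise ValueError("escapeChar must be a single character")
    out, i = [], 0
    while i < len(pattern):
        c = pattern[i]
        if c == escape and i + 1 < len(pattern):
            out.append(re.escape(pattern[i + 1]))  # escaped char matches literally
            i += 2
        elif c == "%":
            out.append(".*")  # % matches any sequence of characters
            i += 1
        elif c == "_":
            out.append(".")   # _ matches exactly one character
            i += 1
        else:
            out.append(re.escape(c))
            i += 1
    return "^" + "".join(out) + "$"

# expression LIKE '50!%' ESCAPE '!'  -- the '!' escapes '%', so the pattern
# matches the literal string "50%":
print(bool(re.fullmatch(like_to_regex("50!%", escape="!"), "50%")))  # prints: True
```

The default escape argument preserves the current backslash behavior, so the ESCAPE clause is a pure extension.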
[jira] [Commented] (SPARK-17368) Scala value classes create encoder problems and break at runtime
[ https://issues.apache.org/jira/browse/SPARK-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15582882#comment-15582882 ] Jakob Odersky commented on SPARK-17368: --- [~arisofala...@gmail.com] Let me explain the fix to what I initially thought was impossible. Value classes do have a class representation for compatibility with Java, and although this has a slight overhead compared to the primitive counterpart, Catalyst will mostly negate that overhead by providing its own encoders and operators on serialized objects. This means that any operations on datasets that allow user-defined functions (e.g. {{map}}, {{filter}}, etc.) will work with the class representation instead of the wrapped value. Regarding the availability of encoders: while we cannot create type classes that apply only to value classes (an implicit for {{AnyVal}} will also be applied to primitive types) without resorting to macros, this fix adds value class support to existing encoders. E.g. you can define your value class as a case class and have a working encoder out-of-the-box. Unfortunately there is no way to statically verify that the wrapped value is also encodable, but encoders in general will perform "deep inspection" during runtime. > Scala value classes create encoder problems and break at runtime > > > Key: SPARK-17368 > URL: https://issues.apache.org/jira/browse/SPARK-17368 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 1.6.2, 2.0.0 > Environment: JDK 8 on MacOS > Scala 2.11.8 > Spark 2.0.0 >Reporter: Aris Vlasakakis >Assignee: Jakob Odersky > Fix For: 2.1.0 > > > Using Scala value classes as the inner type for Datasets breaks in Spark 2.0 > and 1.6.X. > This simple Spark 2 application demonstrates that the code will compile, but > will break at runtime with the error. The value class is of course > *FeatureId*, as it extends AnyVal.
> {noformat} > Exception in thread "main" java.lang.RuntimeException: Error while encoding: > java.lang.RuntimeException: Couldn't find v on int > assertnotnull(input[0, int, true], top level non-flat input object).v AS v#0 > +- assertnotnull(input[0, int, true], top level non-flat input object).v >+- assertnotnull(input[0, int, true], top level non-flat input object) > +- input[0, int, true]". > at > org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:279) > at > org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421) > at > org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421) > {noformat} > Test code for Spark 2.0.0: > {noformat} > import org.apache.spark.sql.{Dataset, SparkSession} > object BreakSpark { > case class FeatureId(v: Int) extends AnyVal > def main(args: Array[String]): Unit = { > val seq = Seq(FeatureId(1), FeatureId(2), FeatureId(3)) > val spark = SparkSession.builder.getOrCreate() > import spark.implicits._ > spark.sparkContext.setLogLevel("warn") > val ds: Dataset[FeatureId] = spark.createDataset(seq) > println(s"BREAK HERE: ${ds.count}") > } > } > {noformat}
[jira] [Commented] (SPARK-15577) Java can't import DataFrame type alias
[ https://issues.apache.org/jira/browse/SPARK-15577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15564302#comment-15564302 ] Jakob Odersky commented on SPARK-15577: --- This cleaning of JIRAs is really good to see :) Considering that Spark 2.0 has already shipped with the type alias, I think it is safe to close this ticket. We can always reopen it if necessary. > Java can't import DataFrame type alias > -- > > Key: SPARK-15577 > URL: https://issues.apache.org/jira/browse/SPARK-15577 > Project: Spark > Issue Type: Improvement > Components: Java API, SQL >Affects Versions: 2.0.0 >Reporter: holdenk > > After SPARK-13244, all Java code needs to be updated to use Dataset > instead of DataFrame as we used a type alias. Should we consider adding a > DataFrame to the Java API which just extends Dataset for compatibility? > cc [~liancheng] ?
[jira] [Commented] (SPARK-15577) Java can't import DataFrame type alias
[ https://issues.apache.org/jira/browse/SPARK-15577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15563910#comment-15563910 ] Jakob Odersky commented on SPARK-15577: --- This was considered and trade-offs were actively discussed, but ultimately the type alias was chosen over subclassing. I think the main argument in favor of aliasing was to avoid incompatibilities in future libraries, i.e. a utility function is written to accept a {{DataFrame}}, but I want to pass in a {{Dataset\[Row\]}}. [This email thread|http://apache-spark-developers-list.1001551.n3.nabble.com/discuss-DataFrame-vs-Dataset-in-Spark-2-0-td16445.html] contains the whole discussion > Java can't import DataFrame type alias > -- > > Key: SPARK-15577 > URL: https://issues.apache.org/jira/browse/SPARK-15577 > Project: Spark > Issue Type: Improvement > Components: Java API, SQL >Affects Versions: 2.0.0 >Reporter: holdenk > > After SPARK-13244, all Java code needs to be updated to use Dataset > instead of DataFrame as we used a type alias. Should we consider adding a > DataFrame to the Java API which just extends Dataset for compatibility? > cc [~liancheng] ?
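The compatibility argument for aliasing can be illustrated outside of Scala as well; in a rough Python analogy (all names here are illustrative stand-ins, not real Spark API), an alias introduces no new type, so APIs written against either name accept values of the other with no conversion or subclass relationship:

```python
from typing import Any, Dict, List

# Rough analogy to Scala's `type DataFrame = Dataset[Row]`:
# DataFrame is the *same* type as Dataset[Row], not a subclass of it.
Row = Dict[str, Any]
Dataset = List            # stand-in for a generic Dataset[T]
DataFrame = Dataset[Row]  # a type alias, not a new class

def num_rows(df: DataFrame) -> int:
    # a utility written against DataFrame...
    return len(df)

ds: Dataset[Row] = [{"a": 1}, {"a": 2}]
print(num_rows(ds))  # ...accepts a Dataset[Row] unchanged; prints: 2
```

With subclassing, a library returning a {{Dataset\[Row\]}} would force callers expecting a {{DataFrame}} to convert; with the alias, the two are interchangeable by definition.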
[jira] [Comment Edited] (SPARK-15577) Java can't import DataFrame type alias
[ https://issues.apache.org/jira/browse/SPARK-15577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15563910#comment-15563910 ] Jakob Odersky edited comment on SPARK-15577 at 10/10/16 11:41 PM: -- This was considered and trade-offs were actively discussed, but ultimately the type alias was chosen over subclassing. I think a principal argument in favor of aliasing was to avoid incompatibilities in future libraries, i.e. a utility function is written to accept a {{DataFrame}}, but I want to pass in a {{Dataset\[Row\]}}. [This email thread|http://apache-spark-developers-list.1001551.n3.nabble.com/discuss-DataFrame-vs-Dataset-in-Spark-2-0-td16445.html] contains the whole discussion was (Author: jodersky): This was considered and trade-offs were actively discussed, but ultimately the type alias was chosen over sub classing. I think the main argument in favor of aliasing was to avoid incompatibilities in future libraries, i.e. there is utility function was written to accept a {{DataFrame}}, however I want to pass in a {{Dataset\[Row\]}}. [This email thread| http://apache-spark-developers-list.1001551.n3.nabble.com/discuss-DataFrame-vs-Dataset-in-Spark-2-0-td16445.html] contains the whole discussion > Java can't import DataFrame type alias > -- > > Key: SPARK-15577 > URL: https://issues.apache.org/jira/browse/SPARK-15577 > Project: Spark > Issue Type: Improvement > Components: Java API, SQL >Affects Versions: 2.0.0 >Reporter: holdenk > > After SPARK-13244, all Java code needs to be updated to use Dataset > instead of DataFrame as we used a type alias. Should we consider adding a > DataFrame to the Java API which just extends Dataset for compatibility? > cc [~liancheng] ?
[jira] [Commented] (SPARK-17647) SQL LIKE does not handle backslashes correctly
[ https://issues.apache.org/jira/browse/SPARK-17647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553517#comment-15553517 ] Jakob Odersky commented on SPARK-17647: --- Xiao pointed me to this issue; I can take a look at it > SQL LIKE does not handle backslashes correctly > -- > > Key: SPARK-17647 > URL: https://issues.apache.org/jira/browse/SPARK-17647 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Xiangrui Meng > Labels: correctness > > Try the following in SQL shell: > {code} > select '' like '%\\%'; > {code} > It returned false, which is wrong. > cc: [~yhuai] [~joshrosen] > A false-negative considered previously: > {code} > select '' rlike '.*.*'; > {code} > It returned true, which is correct if we assume that the pattern is treated > as a Java string but not raw string.
[jira] [Commented] (SPARK-16264) Allow the user to use operators on the received DataFrame
[ https://issues.apache.org/jira/browse/SPARK-16264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15494587#comment-15494587 ] Jakob Odersky commented on SPARK-16264: --- I just came across this issue through a comment in the ForeachSink. I understand why Sinks would be better off not knowing about the type of QueryExecution; however, I'm not quite sure what you mean by "having something similar to foreachwriter". Is the idea to have only a single foreach sink and expose all custom user sinks as foreach writers? > Allow the user to use operators on the received DataFrame > - > > Key: SPARK-16264 > URL: https://issues.apache.org/jira/browse/SPARK-16264 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Shixiong Zhu > > Currently Sink cannot apply any operators on the given DataFrame because a new > DataFrame created by the operator will use QueryExecution rather than > IncrementalExecution. > There are two options to fix this one: > 1. Merge IncrementalExecution into QueryExecution so that QueryExecution can > also deal with streaming operators. > 2. Make Dataset operators inherit the QueryExecution (IncrementalExecution is > just a subclass of QueryExecution) from its parent.
[jira] [Commented] (SPARK-14221) Cross-publish Chill for Scala 2.12
[ https://issues.apache.org/jira/browse/SPARK-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478428#comment-15478428 ] Jakob Odersky commented on SPARK-14221: --- I just saw that chill already [has a pending PR to upgrade to Kryo 4.0.0|https://github.com/twitter/chill/pull/258] > Cross-publish Chill for Scala 2.12 > -- > > Key: SPARK-14221 > URL: https://issues.apache.org/jira/browse/SPARK-14221 > Project: Spark > Issue Type: Sub-task > Components: Build >Reporter: Josh Rosen >Assignee: Josh Rosen > > We need to cross-publish Chill in order to build against Scala 2.12. > Upstream issue: https://github.com/twitter/chill/issues/252 > I tried building and testing {{chill-scala}} against 2.12.0-M3 and ran into > multiple failed tests due to issues with Java8 lambda serialization (similar > to https://github.com/EsotericSoftware/kryo/issues/215), so this task will be > slightly more involved than just bumping the dependencies in the Chill build.
[jira] [Comment Edited] (SPARK-14221) Cross-publish Chill for Scala 2.12
[ https://issues.apache.org/jira/browse/SPARK-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15477903#comment-15477903 ] Jakob Odersky edited comment on SPARK-14221 at 9/9/16 6:30 PM: --- [~joshrosen]'s upstream PR requires Kryo 3.1, a version that was in the works when the PR got created but that was never published (and is hence a major blocker for Chill). Instead, Kryo went straight to version 4.0.0 (see changes [here|https://github.com/EsotericSoftware/kryo#new-in-release-400]). Would a transitive dependency on Kryo 4.0.0 be acceptable in Spark? Of course, updating the Kryo version in Chill, in order to support Scala 2.12, will also need discussion upstream. was (Author: jodersky): [~joshrosen]'s upstream PR requires Kryo 3.1, a version that was in the works when the PR got created but that was never published. Instead, Kryo went straight to version 4.0.0 (see changes [here|https://github.com/EsotericSoftware/kryo#new-in-release-400]). Would a transitive dependency on Kryo 4.0.0 be acceptable in Spark? Of course, updating the Kryo version in Chill, in order to support Scala 2.12 will also need discussion upstream. > Cross-publish Chill for Scala 2.12 > -- > > Key: SPARK-14221 > URL: https://issues.apache.org/jira/browse/SPARK-14221 > Project: Spark > Issue Type: Sub-task > Components: Build >Reporter: Josh Rosen >Assignee: Josh Rosen > > We need to cross-publish Chill in order to build against Scala 2.12. > Upstream issue: https://github.com/twitter/chill/issues/252 > I tried building and testing {{chill-scala}} against 2.12.0-M3 and ran into > multiple failed tests due to issues with Java8 lambda serialization (similar > to https://github.com/EsotericSoftware/kryo/issues/215), so this task will be > slightly more involved than just bumping the dependencies in the Chill build.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14221) Cross-publish Chill for Scala 2.12
[ https://issues.apache.org/jira/browse/SPARK-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15477903#comment-15477903 ] Jakob Odersky commented on SPARK-14221: --- [~joshrosen]'s upstream PR requires Kryo 3.1, a version that was in the works when the PR was created but that was never published. Instead, Kryo went straight to version 4.0.0 (see changes [here|https://github.com/EsotericSoftware/kryo#new-in-release-400]). Would a transitive dependency on Kryo 4.0.0 be acceptable in Spark? Of course, updating the Kryo version in Chill in order to support Scala 2.12 will also need discussion upstream. > Cross-publish Chill for Scala 2.12 > -- > > Key: SPARK-14221 > URL: https://issues.apache.org/jira/browse/SPARK-14221 > Project: Spark > Issue Type: Sub-task > Components: Build >Reporter: Josh Rosen >Assignee: Josh Rosen > > We need to cross-publish Chill in order to build against Scala 2.12. > Upstream issue: https://github.com/twitter/chill/issues/252 > I tried building and testing {{chill-scala}} against 2.12.0-M3 and ran into > multiple failed tests due to issues with Java 8 lambda serialization (similar > to https://github.com/EsotericSoftware/kryo/issues/215), so this task will be > slightly more involved than just bumping the dependencies in the Chill build. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17368) Scala value classes create encoder problems and break at runtime
[ https://issues.apache.org/jira/browse/SPARK-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471428#comment-15471428 ] Jakob Odersky commented on SPARK-17368: --- Hmm, you're right, my assumption of using value classes only at the beginning and at the end was too naive. [~srowen], how likely do you think it is that we can include a meta-encoder in Spark? It could be included in the form of an optional import. Since the existing encoders/ScalaReflection framework already uses runtime reflection, my guess is that adding compile-time reflection will not be too difficult. > Scala value classes create encoder problems and break at runtime > > > Key: SPARK-17368 > URL: https://issues.apache.org/jira/browse/SPARK-17368 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 1.6.2, 2.0.0 > Environment: JDK 8 on MacOS > Scala 2.11.8 > Spark 2.0.0 >Reporter: Aris Vlasakakis > > Using Scala value classes as the inner type for Datasets breaks in Spark 2.0 > and 1.6.X. > This simple Spark 2 application demonstrates that the code will compile, but > will break at runtime with the error. The value class is of course > *FeatureId*, as it extends AnyVal. > {noformat} > Exception in thread "main" java.lang.RuntimeException: Error while encoding: > java.lang.RuntimeException: Couldn't find v on int > assertnotnull(input[0, int, true], top level non-flat input object).v AS v#0 > +- assertnotnull(input[0, int, true], top level non-flat input object).v >+- assertnotnull(input[0, int, true], top level non-flat input object) > +- input[0, int, true]". 
> at > org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:279) > at > org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421) > at > org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421) > {noformat} > Test code for Spark 2.0.0: > {noformat} > import org.apache.spark.sql.{Dataset, SparkSession} > object BreakSpark { > case class FeatureId(v: Int) extends AnyVal > def main(args: Array[String]): Unit = { > val seq = Seq(FeatureId(1), FeatureId(2), FeatureId(3)) > val spark = SparkSession.builder.getOrCreate() > import spark.implicits._ > spark.sparkContext.setLogLevel("warn") > val ds: Dataset[FeatureId] = spark.createDataset(seq) > println(s"BREAK HERE: ${ds.count}") > } > } > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-17368) Scala value classes create encoder problems and break at runtime
[ https://issues.apache.org/jira/browse/SPARK-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15468833#comment-15468833 ] Jakob Odersky edited comment on SPARK-17368 at 9/6/16 10:57 PM: So I thought about this a bit more, and although it is possible to support value classes, I currently see two main issues that make it cumbersome: 1. Catalyst (the engine behind Datasets) generates and compiles code at runtime that represents the actual computation. Since this code is Java, and value classes have no runtime representation, supporting them will require changes in the implementation of Encoders (see my experimental branch [here|https://github.com/apache/spark/compare/master...jodersky:value-classes]). 2. The larger problem of the two is how encoders for value classes would be made accessible. Currently, encoders are exposed as type classes, and there is unfortunately no way to create type classes for classes extending AnyVal (you could create an encoder for AnyVals; however, that would also apply to every primitive type and you would get implicit resolution conflicts). Requiring explicit encoders for value classes may work; however, you would still have no compile-time safety, as access to a value class's inner val occurs at runtime and may hence fail if it is not encodable. The cleanest solution would be to use metaprogramming: it would guarantee "encodability" at compile time and could easily complement the current API. Unfortunately, however, I don't think it could be included in Spark in the near future, as the current metaprogramming solutions in Scala are either too new (scala.meta) or on their way to being deprecated (the current experimental Scala macros). (I have been wanting to experiment with meta encoders for a while though, so maybe I'll try putting together an external library for that.) How inconvenient is it to extract the wrapped value before creating a dataset and re-wrapping your final results? 
was (Author: jodersky): So I thought about this a bit more and although it is possible to support value classes, I currently see two main issues that make it cumbersome: 1. Catalyst (the engine behind Datasets) generates and compiles code during runtime, that will represent the actual computation. This code being Java, together with the fact that value classes don't have runtime representations, will require changes in the implementation of Encoders (see my experimental branch here). 2. The largest problem of both is how will encoders for value classes be accessible? Currently, encoders are exposed as type classes and there is unfortunately no way to create type classes for classes extending AnyVal (you could create an encoder for AnyVals, however that would also apply to any primitive type and you would get implicit resolution conflicts). Requiring explicit encoders for value classes may work, however you would still have no compile-time safety, as accessing of a value class' inner val will occur during runtime and may hence fail if it is not encodable. The cleanest solution would be to use meta programming: it would guarantee "encodability" during compile-time and could easily complement the current API. Unfortunately however, I don't think it could be included in Spark in the near future as the current meta programming solutions in Scala are either too new (scala.meta) or on their way to being deprecated (the current experimental scala macros). (I have been wanting to experiment with meta encoders for a while though, so maybe I'll try putting together an external library for that) How inconvenient is it to extract the wrapped value before creating a dataset and re-wrapping your final results? 
> Scala value classes create encoder problems and break at runtime > > > Key: SPARK-17368 > URL: https://issues.apache.org/jira/browse/SPARK-17368 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 1.6.2, 2.0.0 > Environment: JDK 8 on MacOS > Scala 2.11.8 > Spark 2.0.0 >Reporter: Aris Vlasakakis > > Using Scala value classes as the inner type for Datasets breaks in Spark 2.0 > and 1.6.X. > This simple Spark 2 application demonstrates that the code will compile, but > will break at runtime with the error. The value class is of course > *FeatureId*, as it extends AnyVal. > {noformat} > Exception in thread "main" java.lang.RuntimeException: Error while encoding: > java.lang.RuntimeException: Couldn't find v on int > assertnotnull(input[0, int, true], top level non-flat input object).v AS v#0 > +- assertnotnull(input[0, int, true], top level non-flat input object).v >+- assertnotnull(input[0, int, true], top level non-flat input object) > +- input[0,
[jira] [Commented] (SPARK-17368) Scala value classes create encoder problems and break at runtime
[ https://issues.apache.org/jira/browse/SPARK-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15468833#comment-15468833 ] Jakob Odersky commented on SPARK-17368: --- So I thought about this a bit more, and although it is possible to support value classes, I currently see two main issues that make it cumbersome: 1. Catalyst (the engine behind Datasets) generates and compiles code at runtime that represents the actual computation. Since this code is Java, and value classes have no runtime representation, supporting them will require changes in the implementation of Encoders (see my experimental branch here). 2. The larger problem of the two is how encoders for value classes would be made accessible. Currently, encoders are exposed as type classes, and there is unfortunately no way to create type classes for classes extending AnyVal (you could create an encoder for AnyVals; however, that would also apply to every primitive type and you would get implicit resolution conflicts). Requiring explicit encoders for value classes may work; however, you would still have no compile-time safety, as access to a value class's inner val occurs at runtime and may hence fail if it is not encodable. The cleanest solution would be to use metaprogramming: it would guarantee "encodability" at compile time and could easily complement the current API. Unfortunately, however, I don't think it could be included in Spark in the near future, as the current metaprogramming solutions in Scala are either too new (scala.meta) or on their way to being deprecated (the current experimental Scala macros). (I have been wanting to experiment with meta encoders for a while though, so maybe I'll try putting together an external library for that.) How inconvenient is it to extract the wrapped value before creating a dataset and re-wrapping your final results? 
> Scala value classes create encoder problems and break at runtime > > > Key: SPARK-17368 > URL: https://issues.apache.org/jira/browse/SPARK-17368 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 1.6.2, 2.0.0 > Environment: JDK 8 on MacOS > Scala 2.11.8 > Spark 2.0.0 >Reporter: Aris Vlasakakis > > Using Scala value classes as the inner type for Datasets breaks in Spark 2.0 > and 1.6.X. > This simple Spark 2 application demonstrates that the code will compile, but > will break at runtime with the error. The value class is of course > *FeatureId*, as it extends AnyVal. > {noformat} > Exception in thread "main" java.lang.RuntimeException: Error while encoding: > java.lang.RuntimeException: Couldn't find v on int > assertnotnull(input[0, int, true], top level non-flat input object).v AS v#0 > +- assertnotnull(input[0, int, true], top level non-flat input object).v >+- assertnotnull(input[0, int, true], top level non-flat input object) > +- input[0, int, true]". > at > org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:279) > at > org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421) > at > org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421) > {noformat} > Test code for Spark 2.0.0: > {noformat} > import org.apache.spark.sql.{Dataset, SparkSession} > object BreakSpark { > case class FeatureId(v: Int) extends AnyVal > def main(args: Array[String]): Unit = { > val seq = Seq(FeatureId(1), FeatureId(2), FeatureId(3)) > val spark = SparkSession.builder.getOrCreate() > import spark.implicits._ > spark.sparkContext.setLogLevel("warn") > val ds: Dataset[FeatureId] = spark.createDataset(seq) > println(s"BREAK HERE: ${ds.count}") > } > } > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
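The unwrap/re-wrap workaround asked about in the comment above can be sketched as follows. This is only an illustrative sketch, not a committed recommendation: it reuses the {{FeatureId}} class from the report and assumes a local {{SparkSession}}:

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

object UnwrapWorkaround {
  case class FeatureId(v: Int) extends AnyVal

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").getOrCreate()
    import spark.implicits._

    val ids = Seq(FeatureId(1), FeatureId(2), FeatureId(3))

    // Unwrap to the underlying Int before creating the Dataset, so only
    // the existing primitive encoder is needed:
    val ds: Dataset[Int] = spark.createDataset(ids.map(_.v))

    // Run the computation on the primitive representation...
    val doubled = ds.map(_ * 2)

    // ...and re-wrap into the value class only when collecting results:
    val result: Seq[FeatureId] = doubled.collect().toSeq.map(FeatureId.apply)
    println(result)
  }
}
```

Since the wrapper exists only at compile time anyway, the unwrap and re-wrap steps are essentially free; the cost is purely ergonomic.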
[jira] [Commented] (SPARK-17368) Scala value classes create encoder problems and break at runtime
[ https://issues.apache.org/jira/browse/SPARK-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15459707#comment-15459707 ] Jakob Odersky commented on SPARK-17368: --- Yeah macros would be awesome, something with Scala.meta would be neat :) In the mean time it occurred to me that Catalyst uses ClassTags to do reflection in lots of places. These are generated during compile-time, so it might just yet be possible to support value classes. A quick test showed me that value classes can be detected and their parameters accessed. Getting a Schema for such a case is trivial, I'll see about adding encoders next! > Scala value classes create encoder problems and break at runtime > > > Key: SPARK-17368 > URL: https://issues.apache.org/jira/browse/SPARK-17368 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 1.6.2, 2.0.0 > Environment: JDK 8 on MacOS > Scala 2.11.8 > Spark 2.0.0 >Reporter: Aris Vlasakakis > > Using Scala value classes as the inner type for Datasets breaks in Spark 2.0 > and 1.6.X. > This simple Spark 2 application demonstrates that the code will compile, but > will break at runtime with the error. The value class is of course > *FeatureId*, as it extends AnyVal. > {noformat} > Exception in thread "main" java.lang.RuntimeException: Error while encoding: > java.lang.RuntimeException: Couldn't find v on int > assertnotnull(input[0, int, true], top level non-flat input object).v AS v#0 > +- assertnotnull(input[0, int, true], top level non-flat input object).v >+- assertnotnull(input[0, int, true], top level non-flat input object) > +- input[0, int, true]". 
> at > org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:279) > at > org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421) > at > org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421) > {noformat} > Test code for Spark 2.0.0: > {noformat} > import org.apache.spark.sql.{Dataset, SparkSession} > object BreakSpark { > case class FeatureId(v: Int) extends AnyVal > def main(args: Array[String]): Unit = { > val seq = Seq(FeatureId(1), FeatureId(2), FeatureId(3)) > val spark = SparkSession.builder.getOrCreate() > import spark.implicits._ > spark.sparkContext.setLogLevel("warn") > val ds: Dataset[FeatureId] = spark.createDataset(seq) > println(s"BREAK HERE: ${ds.count}") > } > } > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
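The compile-time detection described in the comment above can be illustrated with plain scala-reflect, without Spark. A rough sketch, assuming Scala 2.11 with scala-reflect on the classpath:

```scala
import scala.reflect.runtime.universe._

object DetectValueClass extends App {
  case class FeatureId(v: Int) extends AnyVal

  val tpe = typeOf[FeatureId]

  // The type's symbol knows whether it is a derived value class
  // (i.e. extends AnyVal)...
  println(tpe.typeSymbol.asClass.isDerivedValueClass)

  // ...and the primary constructor exposes the single wrapped parameter,
  // which is enough information to derive a schema for the underlying type:
  val wrapped = tpe.decls.collectFirst {
    case m: MethodSymbol if m.isPrimaryConstructor => m.paramLists.head.head
  }
  println(wrapped.map(p => (p.name, p.typeSignature)))
}
```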
[jira] [Commented] (SPARK-17368) Scala value classes create encoder problems and break at runtime
[ https://issues.apache.org/jira/browse/SPARK-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15459587#comment-15459587 ] Jakob Odersky commented on SPARK-17368: --- I'm currently taking a look at this, but my first analysis is not very positive: considering that value classes are pure compile-time constructs, I think it isn't possible to do anything with them through reflection, which Catalyst assumes. Here's a relevant blog post: http://tech.kinja.com/scala-value-classes-and-reflection-here-be-dragons-1527846740 I'll check it out in a bit more detail, but I fear that we'll have to resolve this as a won't-fix and not support value classes in Datasets :( > Scala value classes create encoder problems and break at runtime > > > Key: SPARK-17368 > URL: https://issues.apache.org/jira/browse/SPARK-17368 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 1.6.2, 2.0.0 > Environment: JDK 8 on MacOS > Scala 2.11.8 > Spark 2.0.0 >Reporter: Aris Vlasakakis > > Using Scala value classes as the inner type for Datasets breaks in Spark 2.0 > and 1.6.X. > This simple Spark 2 application demonstrates that the code will compile, but > will break at runtime with the error. The value class is of course > *FeatureId*, as it extends AnyVal. > {noformat} > Exception in thread "main" java.lang.RuntimeException: Error while encoding: > java.lang.RuntimeException: Couldn't find v on int > assertnotnull(input[0, int, true], top level non-flat input object).v AS v#0 > +- assertnotnull(input[0, int, true], top level non-flat input object).v >+- assertnotnull(input[0, int, true], top level non-flat input object) > +- input[0, int, true]". 
> at > org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:279) > at > org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421) > at > org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421) > {noformat} > Test code for Spark 2.0.0: > {noformat} > import org.apache.spark.sql.{Dataset, SparkSession} > object BreakSpark { > case class FeatureId(v: Int) extends AnyVal > def main(args: Array[String]): Unit = { > val seq = Seq(FeatureId(1), FeatureId(2), FeatureId(3)) > val spark = SparkSession.builder.getOrCreate() > import spark.implicits._ > spark.sparkContext.setLogLevel("warn") > val ds: Dataset[FeatureId] = spark.createDataset(seq) > println(s"BREAK HERE: ${ds.count}") > } > } > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
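The "here be dragons" limitation is easy to see without Spark: a value class is erased to its underlying type in compiled method signatures, so Java reflection, which Catalyst's generated code relies on, never sees the wrapper. A minimal sketch (names are illustrative):

```scala
object ValueClassErasure extends App {
  case class FeatureId(v: Int) extends AnyVal

  def lookup(id: FeatureId): Int = id.v

  // In the compiled class, `lookup` takes a plain int: at runtime there is
  // no FeatureId instance to find a field `v` on, which matches the
  // "Couldn't find v on int" error in the report.
  val m = getClass.getMethods.find(_.getName == "lookup").get
  println(m.getParameterTypes.toList)
}
```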
[jira] [Commented] (SPARK-17367) Cannot define value classes in REPL
[ https://issues.apache.org/jira/browse/SPARK-17367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15457862#comment-15457862 ] Jakob Odersky commented on SPARK-17367: --- You're absolutely correct, it is a Scala issue. I raised it here as well though, since the -Yrepl-class-based option was originally created for Spark (the standard object-wrapping behaviour had issues with the ClosureCleaner and serialization IIRC) and was contributed back to Scala. Should I close the issue? > Cannot define value classes in REPL > --- > > Key: SPARK-17367 > URL: https://issues.apache.org/jira/browse/SPARK-17367 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Reporter: Jakob Odersky > > It is currently not possible to define a class extending `AnyVal` in the > REPL. The underlying reason is the {{-Yrepl-class-based}} option used by > Spark Shell. > The report here is more of an FYI for anyone stumbling upon the problem, see > the upstream issue [https://issues.scala-lang.org/browse/SI-9910] for any > progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-17368) Scala value classes create encoder problems and break at runtime
[ https://issues.apache.org/jira/browse/SPARK-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15456966#comment-15456966 ] Jakob Odersky edited comment on SPARK-17368 at 9/1/16 11:48 PM: FYI the issue also occurs for top-level value classes (i.e. {{FeatureId}} defined outside of {{object BreakSpark}}) Please also be aware that the given example will *not compile* in a spark shell. See this related issue https://issues.apache.org/jira/browse/SPARK-17367 regarding the definition of value classes in the REPL. was (Author: jodersky): FYI the issue also occurs for top-level value classes (i.e. {{FeatureId}} defined outside of {{object BreakSpark}}) > Scala value classes create encoder problems and break at runtime > > > Key: SPARK-17368 > URL: https://issues.apache.org/jira/browse/SPARK-17368 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 1.6.2, 2.0.0 > Environment: JDK 8 on MacOS > Scala 2.11.8 > Spark 2.0.0 >Reporter: Aris Vlasakakis > > Using Scala value classes as the inner type for Datasets breaks in Spark 2.0 > and 1.6.X. > This simple Spark 2 application demonstrates that the code will compile, but > will break at runtime with the error. The value class is of course > *FeatureId*, as it extends AnyVal. > {noformat} > Exception in thread "main" java.lang.RuntimeException: Error while encoding: > java.lang.RuntimeException: Couldn't find v on int > assertnotnull(input[0, int, true], top level non-flat input object).v AS v#0 > +- assertnotnull(input[0, int, true], top level non-flat input object).v >+- assertnotnull(input[0, int, true], top level non-flat input object) > +- input[0, int, true]". 
> at > org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:279) > at > org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421) > at > org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421) > {noformat} > Test code for Spark 2.0.0: > {noformat} > import org.apache.spark.sql.{Dataset, SparkSession} > object BreakSpark { > case class FeatureId(v: Int) extends AnyVal > def main(args: Array[String]): Unit = { > val seq = Seq(FeatureId(1), FeatureId(2), FeatureId(3)) > val spark = SparkSession.builder.getOrCreate() > import spark.implicits._ > spark.sparkContext.setLogLevel("warn") > val ds: Dataset[FeatureId] = spark.createDataset(seq) > println(s"BREAK HERE: ${ds.count}") > } > } > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17368) Scala value classes create encoder problems and break at runtime
[ https://issues.apache.org/jira/browse/SPARK-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15456966#comment-15456966 ] Jakob Odersky commented on SPARK-17368: --- FYI the issue also occurs for top-level value classes (i.e. {{FeatureId}} defined outside of {{object BreakSpark}}) > Scala value classes create encoder problems and break at runtime > > > Key: SPARK-17368 > URL: https://issues.apache.org/jira/browse/SPARK-17368 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 1.6.2, 2.0.0 > Environment: JDK 8 on MacOS > Scala 2.11.8 > Spark 2.0.0 >Reporter: Aris Vlasakakis > > Using Scala value classes as the inner type for Datasets breaks in Spark 2.0 > and 1.6.X. > This simple Spark 2 application demonstrates that the code will compile, but > will break at runtime with the error. The value class is of course > *FeatureId*, as it extends AnyVal. > {noformat} > Exception in thread "main" java.lang.RuntimeException: Error while encoding: > java.lang.RuntimeException: Couldn't find v on int > assertnotnull(input[0, int, true], top level non-flat input object).v AS v#0 > +- assertnotnull(input[0, int, true], top level non-flat input object).v >+- assertnotnull(input[0, int, true], top level non-flat input object) > +- input[0, int, true]". 
> at > org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:279) > at > org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421) > at > org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421) > {noformat} > Test code for Spark 2.0.0: > {noformat} > import org.apache.spark.sql.{Dataset, SparkSession} > object BreakSpark { > case class FeatureId(v: Int) extends AnyVal > def main(args: Array[String]): Unit = { > val seq = Seq(FeatureId(1), FeatureId(2), FeatureId(3)) > val spark = SparkSession.builder.getOrCreate() > import spark.implicits._ > spark.sparkContext.setLogLevel("warn") > val ds: Dataset[FeatureId] = spark.createDataset(seq) > println(s"BREAK HERE: ${ds.count}") > } > } > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-17367) Cannot define value classes in REPL
Jakob Odersky created SPARK-17367: - Summary: Cannot define value classes in REPL Key: SPARK-17367 URL: https://issues.apache.org/jira/browse/SPARK-17367 Project: Spark Issue Type: Bug Components: Spark Shell Reporter: Jakob Odersky It is currently not possible to define a class extending `AnyVal` in the REPL. The underlying reason is the {{-Yrepl-class-based}} option used by Spark Shell. The report here is more of an FYI for anyone stumbling upon the problem, see the upstream issue [https://issues.scala-lang.org/browse/SI-9910] for any progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
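For anyone looking for the symptom, the failure looks roughly like this (transcript from a Scala 2.11 REPL started with the flag in question; the exact error text may differ between versions):

```scala
// $ scala -Yrepl-class-based
scala> case class FeatureId(v: Int) extends AnyVal
<console>:11: error: value class may not be a member of another class
       case class FeatureId(v: Int) extends AnyVal
                  ^
```

The flag wraps each REPL line in a class instead of an object, and Scala forbids value classes from being members of a class, hence the error.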
[jira] [Comment Edited] (SPARK-17103) Can not define class variable in repl
[ https://issues.apache.org/jira/browse/SPARK-17103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15424228#comment-15424228 ] Jakob Odersky edited comment on SPARK-17103 at 8/17/16 5:28 PM: That's true, the spark repl is basically just a thin wrapper around the scala repl, with custom initialization and settings. One of the settings, "-Yrepl-class-based", has caused issues previously and seems to be the culprit here again (I can reproduce the issue by running a normal scala repl with said setting enabled). I'll check this out tomorrow, but my first intuition is that it's an upstream bug. was (Author: jodersky): That's true, the spark repl is basically just a thin wrapper around the scala repl, with custom initialization and settings. One of the settings, "-Yrepl-class-based", has caused issues previously and seems to be the culprit here again (I can reproduce the issue by running a normal scala repl with -Yrepl-class-based{/code} enabled). I'll check this out tomorrow, but my first intuition is that it's an upstream bug. > Can not define class variable in repl > - > > Key: SPARK-17103 > URL: https://issues.apache.org/jira/browse/SPARK-17103 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 2.0.0 >Reporter: Jeff Zhang > > I can not execute the following code in spark 2.0 repl, but it succeeds in > scala 2.11 repl > spark 2.0 repl > {code} > scala> import java.io.File > import java.io.File > scala> class Test {val f=new File(".")} > :11: error: not found: type File >class Test {val f=new File(".")} > {code} > scala 2.11 repl > {code} > scala> import java.io.File > import java.io.File > scala> class Test { val f=new File(".")} > defined class Test > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-17103) Can not define class variable in repl
[ https://issues.apache.org/jira/browse/SPARK-17103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15424228#comment-15424228 ] Jakob Odersky edited comment on SPARK-17103 at 8/17/16 9:53 AM: That's true, the spark repl is basically just a thin wrapper around the scala repl, with custom initialization and settings. One of the settings, "-Yrepl-class-based", has caused issues previously and seems to be the culprit here again (I can reproduce the issue by running a normal scala repl with {{-Yrepl-class-based}} enabled). I'll check this out tomorrow, but my first intuition is that it's an upstream bug. was (Author: jodersky): That's true, the spark repl is basically just a thin wrapper around the scala repl, with custom initialization and settings. One of the settings, "-Yrepl-class-based", has caused issues previously and seems to be the culprit here again (I can reproduce the issue by running a normal scala repl with -Yrepl-class-based{/code} enabled). > Can not define class variable in repl > - > > Key: SPARK-17103 > URL: https://issues.apache.org/jira/browse/SPARK-17103 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 2.0.0 >Reporter: Jeff Zhang > > I can not execute the following code in spark 2.0 repl, but it succeeds in > scala 2.11 repl > spark 2.0 repl > {code} > scala> import java.io.File > import java.io.File > scala> class Test {val f=new File(".")} > :11: error: not found: type File >class Test {val f=new File(".")} > {code} > scala 2.11 repl > {code} > scala> import java.io.File > import java.io.File > scala> class Test { val f=new File(".")} > defined class Test > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-17103) Can not define class variable in repl
[ https://issues.apache.org/jira/browse/SPARK-17103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15424228#comment-15424228 ] Jakob Odersky edited comment on SPARK-17103 at 8/17/16 9:52 AM: That's true, the spark repl is basically just a thin wrapper around the scala repl, with custom initialization and settings. One of the settings, "-Yrepl-class-based", has caused issues previously and seems to be the culprit here again (I can reproduce the issue by running a normal scala repl with {{-Yrepl-class-based}} enabled). was (Author: jodersky): That's true, the spark repl is basically just a thin wrapper around the scala repl, with custom initialization and settings. One of the settings, "-Yrepl-class-based", has caused issues previously and seems to be the culprit here again (I can reproduce the issue by running a normal scala repl with -Yrepl-class-based{/code} enabled). There is one option that is set by spark and has caused previous issues > Can not define class variable in repl > - > > Key: SPARK-17103 > URL: https://issues.apache.org/jira/browse/SPARK-17103 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 2.0.0 >Reporter: Jeff Zhang > > I can not execute the following code in spark 2.0 repl, but it succeeds in > scala 2.11 repl > spark 2.0 repl > {code} > scala> import java.io.File > import java.io.File > scala> class Test {val f=new File(".")} > :11: error: not found: type File >class Test {val f=new File(".")} > {code} > scala 2.11 repl > {code} > scala> import java.io.File > import java.io.File > scala> class Test { val f=new File(".")} > defined class Test > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17103) Can not define class variable in repl
[ https://issues.apache.org/jira/browse/SPARK-17103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15424228#comment-15424228 ] Jakob Odersky commented on SPARK-17103: --- That's true, the spark repl is basically just a thin wrapper around the scala repl, with custom initialization and settings. One of the settings, "-Yrepl-class-based", has caused issues previously and seems to be the culprit here again (I can reproduce the issue by running a normal scala repl with {{-Yrepl-class-based}} enabled). > Can not define class variable in repl > - > > Key: SPARK-17103 > URL: https://issues.apache.org/jira/browse/SPARK-17103 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 2.0.0 >Reporter: Jeff Zhang > > I can not execute the following code in spark 2.0 repl, but it succeeds in > scala 2.11 repl > spark 2.0 repl > {code} > scala> import java.io.File > import java.io.File > scala> class Test {val f=new File(".")} > :11: error: not found: type File >class Test {val f=new File(".")} > {code} > scala 2.11 repl > {code} > scala> import java.io.File > import java.io.File > scala> class Test { val f=new File(".")} > defined class Test > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17095) Latex and Scala doc do not play nicely
[ https://issues.apache.org/jira/browse/SPARK-17095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15423602#comment-15423602 ] Jakob Odersky commented on SPARK-17095: --- Since this bug also occurs when there are no opening braces (}}} anywhere in the doc is sufficient), I think this is an issue with scaladoc itself. I would recommend creating a bug report on the scala tracker https://issues.scala-lang.org/secure/Dashboard.jspa. Ideally, code blocks could be delimited with an arbitrary number of opening symbols followed by the same number of closing symbols (e.g. you could use {{{{ (4 braces) to delimit code that itself contains }}} (3 braces)). > Latex and Scala doc do not play nicely > -- > > Key: SPARK-17095 > URL: https://issues.apache.org/jira/browse/SPARK-17095 > Project: Spark > Issue Type: Improvement > Components: Documentation >Reporter: Seth Hendrickson >Priority: Minor > Labels: starter > > In Latex, it is common to find "}}}" when closing several expressions at > once. [SPARK-16822|https://issues.apache.org/jira/browse/SPARK-16822] added > Mathjax to render Latex equations in scaladoc. However, when scala doc sees > "}}}" or "{{{" it treats it as a special character for code block. This > results in some very strange output. > A poor workaround is to use "}}\,}" in latex which inserts a small > whitespace. This is not ideal, and we can hopefully find a better solution. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15014) Spark Shell could use Ammonite Shell
[ https://issues.apache.org/jira/browse/SPARK-15014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299168#comment-15299168 ] Jakob Odersky commented on SPARK-15014: --- You might still have some issues with classloaders; I didn't think of that at first. > Spark Shell could use Ammonite Shell > > > Key: SPARK-15014 > URL: https://issues.apache.org/jira/browse/SPARK-15014 > Project: Spark > Issue Type: Improvement > Components: Spark Shell >Affects Versions: 1.6.1 > Environment: All >Reporter: John-Michael Reed >Priority: Minor > Labels: shell, shell-script > > Lihaoyi has an enhanced Scala Shell called Ammonite. > https://github.com/lihaoyi/Ammonite > Users of Ammonite shell have tried to use it with Apache Spark. > https://github.com/lihaoyi/Ammonite/issues/382 > Spark Shell does not work with Ammonite Shell, but I want it to because the > Ammonite REPL offers enhanced auto-complete, pretty printing, and other > features. See http://www.lihaoyi.com/Ammonite/#Ammonite-REPL -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15014) Spark Shell could use Ammonite Shell
[ https://issues.apache.org/jira/browse/SPARK-15014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299118#comment-15299118 ] Jakob Odersky commented on SPARK-15014: --- spark-shell is a very thin wrapper around the standard Scala REPL (with Spark dependencies). It does some configuration and exposes a Spark context and some imports; almost everything is implemented in these two files: - https://github.com/apache/spark/blob/master/repl/scala-2.11/src/main/scala/org/apache/spark/repl/Main.scala - https://github.com/apache/spark/blob/master/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoop.scala. I don't know much about Ammonite, but as a workaround could you use Spark as a standalone program in your shell? Just add the Spark dependencies and create a Spark context manually. > Spark Shell could use Ammonite Shell > > > Key: SPARK-15014 > URL: https://issues.apache.org/jira/browse/SPARK-15014 > Project: Spark > Issue Type: Improvement > Components: Spark Shell >Affects Versions: 1.6.1 > Environment: All >Reporter: John-Michael Reed >Priority: Minor > Labels: shell, shell-script > > Lihaoyi has an enhanced Scala Shell called Ammonite. > https://github.com/lihaoyi/Ammonite > Users of Ammonite shell have tried to use it with Apache Spark. > https://github.com/lihaoyi/Ammonite/issues/382 > Spark Shell does not work with Ammonite Shell, but I want it to because the > Ammonite REPL offers enhanced auto-complete, pretty printing, and other > features. See http://www.lihaoyi.com/Ammonite/#Ammonite-REPL -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13581) LibSVM throws MatchError
[ https://issues.apache.org/jira/browse/SPARK-13581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15289738#comment-15289738 ] Jakob Odersky commented on SPARK-13581: --- I can't reproduce it anymore either > LibSVM throws MatchError > > > Key: SPARK-13581 > URL: https://issues.apache.org/jira/browse/SPARK-13581 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Jakob Odersky >Assignee: Jeff Zhang >Priority: Critical > > When running an action on a DataFrame obtained by reading from a libsvm file > a MatchError is thrown, however doing the same on a cached DataFrame works > fine. > {code} > val df = > sqlContext.read.format("libsvm").load("../data/mllib/sample_libsvm_data.txt") > //file is in spark repository > df.select(df("features")).show() //MatchError > df.cache() > df.select(df("features")).show() //OK > {code} > The exception stack trace is the following: > {code} > scala.MatchError: 1.0 (of class java.lang.Double) > [info]at > org.apache.spark.mllib.linalg.VectorUDT.serialize(Vectors.scala:207) > [info]at > org.apache.spark.mllib.linalg.VectorUDT.serialize(Vectors.scala:192) > [info]at > org.apache.spark.sql.catalyst.CatalystTypeConverters$UDTConverter.toCatalystImpl(CatalystTypeConverters.scala:142) > [info]at > org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102) > [info]at > org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:401) > [info]at > org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$2.apply(ExistingRDD.scala:59) > [info]at > org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$2.apply(ExistingRDD.scala:56) > {code} > This issue first appeared in commit {{1dac964c1}}, in PR > [#9595|https://github.com/apache/spark/pull/9595] fixing SPARK-11622. 
> [~jeffzhang], do you have any insight of what could be going on? > cc [~iyounus] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
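The {{scala.MatchError: 1.0 (of class java.lang.Double)}} in the stack trace above is the signature of a non-exhaustive pattern match being handed a runtime type it has no case for. As a hedged, self-contained sketch of that general failure mode (a hypothetical serializer, not Spark's actual {{VectorUDT}} code):

```scala
// Hypothetical serializer illustrating the failure mode: a pattern
// match with no case for Double throws scala.MatchError at runtime.
object MatchErrorDemo {
  def serialize(obj: Any): String = obj match {
    case s: String => "string:" + s
    // no case for Double, so serialize(1.0) throws
    // scala.MatchError: 1.0 (of class java.lang.Double)
  }

  // returns true when the expected MatchError is thrown
  def failsOnDouble: Boolean =
    try { serialize(1.0); false }
    catch { case _: scala.MatchError => true }
}
```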
[jira] [Comment Edited] (SPARK-13581) LibSVM throws MatchError
[ https://issues.apache.org/jira/browse/SPARK-13581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15289738#comment-15289738 ] Jakob Odersky edited comment on SPARK-13581 at 5/18/16 8:26 PM: I can't reproduce it anymore either. Should I close it as "fixed" or invalid? was (Author: jodersky): I can't reproduce it anymore either > LibSVM throws MatchError > > > Key: SPARK-13581 > URL: https://issues.apache.org/jira/browse/SPARK-13581 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Jakob Odersky >Assignee: Jeff Zhang >Priority: Critical > > When running an action on a DataFrame obtained by reading from a libsvm file > a MatchError is thrown, however doing the same on a cached DataFrame works > fine. > {code} > val df = > sqlContext.read.format("libsvm").load("../data/mllib/sample_libsvm_data.txt") > //file is in spark repository > df.select(df("features")).show() //MatchError > df.cache() > df.select(df("features")).show() //OK > {code} > The exception stack trace is the following: > {code} > scala.MatchError: 1.0 (of class java.lang.Double) > [info]at > org.apache.spark.mllib.linalg.VectorUDT.serialize(Vectors.scala:207) > [info]at > org.apache.spark.mllib.linalg.VectorUDT.serialize(Vectors.scala:192) > [info]at > org.apache.spark.sql.catalyst.CatalystTypeConverters$UDTConverter.toCatalystImpl(CatalystTypeConverters.scala:142) > [info]at > org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102) > [info]at > org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:401) > [info]at > org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$2.apply(ExistingRDD.scala:59) > [info]at > org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$2.apply(ExistingRDD.scala:56) > {code} > This issue first appeared in commit 
{{1dac964c1}}, in PR > [#9595|https://github.com/apache/spark/pull/9595] fixing SPARK-11622. > [~jeffzhang], do you have any insight of what could be going on? > cc [~iyounus] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14519) Cross-publish Kafka for Scala 2.12.0-M4
[ https://issues.apache.org/jira/browse/SPARK-14519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259100#comment-15259100 ] Jakob Odersky commented on SPARK-14519: --- That sounds reasonable; however, should the parent JIRA then still be marked as a blocker for 2.0? > Cross-publish Kafka for Scala 2.12.0-M4 > --- > > Key: SPARK-14519 > URL: https://issues.apache.org/jira/browse/SPARK-14519 > Project: Spark > Issue Type: Sub-task > Components: Build, Project Infra >Reporter: Josh Rosen > > In order to build the streaming Kafka connector, we need to publish Kafka for > Scala 2.12.0-M4. Someone should file an issue against the Kafka project and > work with their developers to figure out what will block their upgrade / > release. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14146) Imported implicits can't be found in Spark REPL in some cases
[ https://issues.apache.org/jira/browse/SPARK-14146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259079#comment-15259079 ] Jakob Odersky commented on SPARK-14146: --- The reason this fails is that spark-shell sets the `-Yrepl-class-based` flag of the Scala REPL. I'm looking into this. > Imported implicits can't be found in Spark REPL in some cases > - > > Key: SPARK-14146 > URL: https://issues.apache.org/jira/browse/SPARK-14146 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 2.0.0 >Reporter: Wenchen Fan > > {code} > class I(i: Int) { > def double: Int = i * 2 > } > class Context { > implicit def toI(i: Int): I = new I(i) > } > val c = new Context > import c._ > // OK > 1.double > // Fail > class A; 1.double > {code} > The above code snippets can work in Scala REPL however. > This will affect our Dataset functionality, for example: > {code} > class A; Seq(1 -> "a").toDS() // fail > {code} > or in paste mode: > {code} > :paste > class A > Seq(1 -> "a").toDS() // fail > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
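The snippet in the report is valid Scala on its own; it only breaks under the class-based REPL wrapping. A self-contained version of the same implicit-conversion pattern, wrapped in a demo object (the object name is mine; classes {{I}} and {{Context}} are from the report):

```scala
import scala.language.implicitConversions

// The pattern from the report, compiled as a plain program: an
// implicit conversion brought into scope from an instance's members.
class I(i: Int) { def double: Int = i * 2 }
class Context { implicit def toI(i: Int): I = new I(i) }

object ImplicitDemo {
  def run(): Int = {
    val c = new Context
    import c._   // imports the implicit conversion toI
    1.double     // resolves via toI: new I(1).double
  }
}
```

Outside the REPL this compiles and runs without issue, which supports the diagnosis that the `-Yrepl-class-based` wrapping, not the language feature, is at fault.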
[jira] [Commented] (SPARK-14519) Cross-publish Kafka for Scala 2.12.0-M4
[ https://issues.apache.org/jira/browse/SPARK-14519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258974#comment-15258974 ] Jakob Odersky commented on SPARK-14519: --- From a reply in the mailing list archive (14/4/2016): {quote} If no-one else beats me to it, I intend to start this conversation after 0.10.0.0 is released (probably 1 to 2 months away depending on how RCs go). {quote} What should we do? Scala 2.12 support is a blocker for Spark 2.0, which is planned to enter code freeze in a week. > Cross-publish Kafka for Scala 2.12.0-M4 > --- > > Key: SPARK-14519 > URL: https://issues.apache.org/jira/browse/SPARK-14519 > Project: Spark > Issue Type: Sub-task > Components: Build, Project Infra >Reporter: Josh Rosen > > In order to build the streaming Kafka connector, we need to publish Kafka for > Scala 2.12.0-M4. Someone should file an issue against the Kafka project and > work with their developers to figure out what will block their upgrade / > release. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14417) Cleanup Scala deprecation warnings once we drop 2.10.X
[ https://issues.apache.org/jira/browse/SPARK-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258883#comment-15258883 ] Jakob Odersky commented on SPARK-14417: --- I suggested that Arun add the JIRA in the title and close the issue. That way his work will stay readily available from here, when Scala 2.10 support is dropped. > Cleanup Scala deprecation warnings once we drop 2.10.X > -- > > Key: SPARK-14417 > URL: https://issues.apache.org/jira/browse/SPARK-14417 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: holdenk >Priority: Minor > > While a previous issue addressed many of the deprecation warnings, since we > didn't want to introduce scala version specific code there are a number of > deprecation warnings we can't easily fix. Once we drop Scala 2.10 we should > go back and cleanup these remaining issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14511) Publish our forked genjavadoc for 2.12.0-M4 or stop using a forked version
[ https://issues.apache.org/jira/browse/SPARK-14511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258702#comment-15258702 ] Jakob Odersky commented on SPARK-14511: --- The release is out and a PR has been submitted. > Publish our forked genjavadoc for 2.12.0-M4 or stop using a forked version > -- > > Key: SPARK-14511 > URL: https://issues.apache.org/jira/browse/SPARK-14511 > Project: Spark > Issue Type: Sub-task > Components: Build, Project Infra >Reporter: Josh Rosen > > Before we can move to 2.12, we need to publish our forked genjavadoc for > 2.12.0-M4 (or 2.12 final) or stop using a forked version of the plugin. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14511) Publish our forked genjavadoc for 2.12.0-M4 or stop using a forked version
[ https://issues.apache.org/jira/browse/SPARK-14511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257217#comment-15257217 ] Jakob Odersky commented on SPARK-14511: --- Update: an issue was discovered during release-testing upstream. I just submitted a fix for it, tested against Akka and Spark. Javadoc in Spark emits a few error messages; however, these were already present before and do not affect the final generated documentation. I'll get back when the release is out. > Publish our forked genjavadoc for 2.12.0-M4 or stop using a forked version > -- > > Key: SPARK-14511 > URL: https://issues.apache.org/jira/browse/SPARK-14511 > Project: Spark > Issue Type: Sub-task > Components: Build, Project Infra >Reporter: Josh Rosen > > Before we can move to 2.12, we need to publish our forked genjavadoc for > 2.12.0-M4 (or 2.12 final) or stop using a forked version of the plugin. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10001) Allow Ctrl-C in spark-shell to kill running job
[ https://issues.apache.org/jira/browse/SPARK-10001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251304#comment-15251304 ] Jakob Odersky commented on SPARK-10001: --- FYI, I took up the issue (previous pr #8216) > Allow Ctrl-C in spark-shell to kill running job > --- > > Key: SPARK-10001 > URL: https://issues.apache.org/jira/browse/SPARK-10001 > Project: Spark > Issue Type: Improvement > Components: Spark Shell >Affects Versions: 1.4.1 >Reporter: Cheolsoo Park >Priority: Minor > > Hitting Ctrl-C in spark-sql (and other tools like presto) cancels any running > job and starts a new input line on the prompt. It would be nice if > spark-shell also can do that. Otherwise, in case a user submits a job, say he > made a mistake, and wants to cancel it, he needs to exit the shell and > re-login to continue his work. Re-login can be a pain especially in Spark on > yarn, since it takes a while to allocate AM container and initial executors. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14511) Publish our forked genjavadoc for 2.12.0-M4 or stop using a forked version
[ https://issues.apache.org/jira/browse/SPARK-14511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246336#comment-15246336 ] Jakob Odersky commented on SPARK-14511: --- cf https://github.com/typesafehub/genjavadoc/issues/73 I can create a PR with the dependency updates once upstream releases > Publish our forked genjavadoc for 2.12.0-M4 or stop using a forked version > -- > > Key: SPARK-14511 > URL: https://issues.apache.org/jira/browse/SPARK-14511 > Project: Spark > Issue Type: Sub-task > Components: Build, Project Infra >Reporter: Josh Rosen > > Before we can move to 2.12, we need to publish our forked genjavadoc for > 2.12.0-M4 (or 2.12 final) or stop using a forked version of the plugin. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7992) Hide private classes/objects in generated Java API doc
[ https://issues.apache.org/jira/browse/SPARK-7992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242725#comment-15242725 ] Jakob Odersky commented on SPARK-7992: -- [~mengxr], The PR is finally in! Let's hope upstream makes a release soon. > Hide private classes/objects in in generated Java API doc > - > > Key: SPARK-7992 > URL: https://issues.apache.org/jira/browse/SPARK-7992 > Project: Spark > Issue Type: Improvement > Components: Build, Documentation >Affects Versions: 1.4.0 >Reporter: Xiangrui Meng > > After SPARK-5610, we found that private classes/objects still show up in the > generated Java API doc, e.g., under `org.apache.spark.api.r` we can see > {code} > BaseRRDD > PairwiseRRDD > RRDD > SpecialLengths > StringRRDD > {code} > We should update genjavadoc to hide those private classes/methods. The best > approach is to find a good mapping from Scala private to Java, and merge it > into the main genjavadoc repo. A WIP PR is at > https://github.com/typesafehub/genjavadoc/pull/47. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7992) Hide private classes/objects in generated Java API doc
[ https://issues.apache.org/jira/browse/SPARK-7992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15215315#comment-15215315 ] Jakob Odersky commented on SPARK-7992: -- [~mengxr], I just submitted [another PR|https://github.com/typesafehub/genjavadoc/pull/71] to the genjavadoc project. Once accepted, the original functionality should be straight-forward to merge. > Hide private classes/objects in in generated Java API doc > - > > Key: SPARK-7992 > URL: https://issues.apache.org/jira/browse/SPARK-7992 > Project: Spark > Issue Type: Improvement > Components: Build, Documentation >Affects Versions: 1.4.0 >Reporter: Xiangrui Meng > > After SPARK-5610, we found that private classes/objects still show up in the > generated Java API doc, e.g., under `org.apache.spark.api.r` we can see > {code} > BaseRRDD > PairwiseRRDD > RRDD > SpecialLengths > StringRRDD > {code} > We should update genjavadoc to hide those private classes/methods. The best > approach is to find a good mapping from Scala private to Java, and merge it > into the main genjavadoc repo. A WIP PR is at > https://github.com/typesafehub/genjavadoc/pull/47. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-7992) Hide private classes/objects in generated Java API doc
[ https://issues.apache.org/jira/browse/SPARK-7992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15209280#comment-15209280 ] Jakob Odersky edited comment on SPARK-7992 at 3/23/16 10:16 PM: Hey [~mengxr], you caught me in a very busy time last week and I'm afraid to say that I completely forgot about this. I just took up the issue. Take a look at my comment on the PR thread https://github.com/typesafehub/genjavadoc/pull/47. was (Author: jodersky): Hey Xiangrui, you caught me in a very busy time last week and I'm afraid to say that I completely forgot about this. I just took up the issue. Take a look at my comment on the PR thread https://github.com/typesafehub/genjavadoc/pull/47. > Hide private classes/objects in in generated Java API doc > - > > Key: SPARK-7992 > URL: https://issues.apache.org/jira/browse/SPARK-7992 > Project: Spark > Issue Type: Improvement > Components: Build, Documentation >Affects Versions: 1.4.0 >Reporter: Xiangrui Meng > > After SPARK-5610, we found that private classes/objects still show up in the > generated Java API doc, e.g., under `org.apache.spark.api.r` we can see > {code} > BaseRRDD > PairwiseRRDD > RRDD > SpecialLengths > StringRRDD > {code} > We should update genjavadoc to hide those private classes/methods. The best > approach is to find a good mapping from Scala private to Java, and merge it > into the main genjavadoc repo. A WIP PR is at > https://github.com/typesafehub/genjavadoc/pull/47. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7992) Hide private classes/objects in generated Java API doc
[ https://issues.apache.org/jira/browse/SPARK-7992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15209280#comment-15209280 ] Jakob Odersky commented on SPARK-7992: -- Hey Xiangrui, you caught me in a very busy time last week and I'm afraid to say that I completely forgot about this. I just took up the issue. Take a look at my comment on the PR thread https://github.com/typesafehub/genjavadoc/pull/47. > Hide private classes/objects in in generated Java API doc > - > > Key: SPARK-7992 > URL: https://issues.apache.org/jira/browse/SPARK-7992 > Project: Spark > Issue Type: Improvement > Components: Build, Documentation >Affects Versions: 1.4.0 >Reporter: Xiangrui Meng > > After SPARK-5610, we found that private classes/objects still show up in the > generated Java API doc, e.g., under `org.apache.spark.api.r` we can see > {code} > BaseRRDD > PairwiseRRDD > RRDD > SpecialLengths > StringRRDD > {code} > We should update genjavadoc to hide those private classes/methods. The best > approach is to find a good mapping from Scala private to Java, and merge it > into the main genjavadoc repo. A WIP PR is at > https://github.com/typesafehub/genjavadoc/pull/47. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7992) Hide private classes/objects in generated Java API doc
[ https://issues.apache.org/jira/browse/SPARK-7992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197877#comment-15197877 ] Jakob Odersky commented on SPARK-7992: -- I'll check it out > Hide private classes/objects in in generated Java API doc > - > > Key: SPARK-7992 > URL: https://issues.apache.org/jira/browse/SPARK-7992 > Project: Spark > Issue Type: Improvement > Components: Build, Documentation >Affects Versions: 1.4.0 >Reporter: Xiangrui Meng > > After SPARK-5610, we found that private classes/objects still show up in the > generated Java API doc, e.g., under `org.apache.spark.api.r` we can see > {code} > BaseRRDD > PairwiseRRDD > RRDD > SpecialLengths > StringRRDD > {code} > We should update genjavadoc to hide those private classes/methods. The best > approach is to find a good mapping from Scala private to Java, and merge it > into the main genjavadoc repo. A WIP PR is at > https://github.com/typesafehub/genjavadoc/pull/47. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13929) Use Scala reflection for UDFs
Jakob Odersky created SPARK-13929: - Summary: Use Scala reflection for UDFs Key: SPARK-13929 URL: https://issues.apache.org/jira/browse/SPARK-13929 Project: Spark Issue Type: Bug Components: SQL Reporter: Jakob Odersky Priority: Minor {{ScalaReflection}} uses native Java reflection for User Defined Types, which would fail if such types are not plain Scala classes that map 1:1 to Java. Consider the following extract (from here https://github.com/apache/spark/blob/92024797a4fad594b5314f3f3be5c6be2434de8a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L376 ): {code} case t if Utils.classIsLoadable(className) && Utils.classForName(className).isAnnotationPresent(classOf[SQLUserDefinedType]) => val udt = Utils.classForName(className).getAnnotation(classOf[SQLUserDefinedType]).udt().newInstance() //... {code} If {{t}}'s runtime class is actually synthetic (something that doesn't exist in Java and hence uses a dollar sign internally), such as nested classes or package objects, the above code will fail. Currently there are no known use-cases of synthetic user-defined types (hence the minor priority); however, it would be best practice to remove plain Java reflection and rely on Scala reflection instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
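As a hedged illustration of the naming issue described above, using a class nested in an object, which (like a class in a package object) compiles to a JVM name containing a dollar sign; the demo object names here are hypothetical:

```scala
// A class nested in an object gets a '$' in its JVM class name
// (package objects behave the same way, compiling to `package$`),
// which name-based Java reflection lookups can trip over.
object Wrapper {
  case class Inner(x: Int)
}

object SyntheticNameDemo {
  // e.g. "Wrapper$Inner" -- note the synthetic dollar sign,
  // which is absent from the source-level name Wrapper.Inner
  def innerClassName: String = classOf[Wrapper.Inner].getName
}
```

A lookup built from the source-level name (e.g. `Class.forName("Wrapper.Inner")`) would miss this class, which is the kind of mismatch Scala reflection avoids.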
[jira] [Commented] (SPARK-13118) Support for classes defined in package objects
[ https://issues.apache.org/jira/browse/SPARK-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15196928#comment-15196928 ] Jakob Odersky commented on SPARK-13118: --- Update: there actually was an issue with inner classes (or package objects, or any other synthetic class containing a dollar sign); however, it only occurs when the type is wrapped in an Option. > Support for classes defined in package objects > -- > > Key: SPARK-13118 > URL: https://issues.apache.org/jira/browse/SPARK-13118 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.6.0 >Reporter: Michael Armbrust > > When you define a class inside of a package object, the name ends up being > something like {{org.mycompany.project.package$MyClass}}. However, when > reflect on this we try and load {{org.mycompany.project.MyClass}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13118) Support for classes defined in package objects
[ https://issues.apache.org/jira/browse/SPARK-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15196463#comment-15196463 ] Jakob Odersky commented on SPARK-13118: --- Should I remove the JIRA ID from my existing PR? > Support for classes defined in package objects > -- > > Key: SPARK-13118 > URL: https://issues.apache.org/jira/browse/SPARK-13118 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.6.0 >Reporter: Michael Armbrust > > When you define a class inside of a package object, the name ends up being > something like {{org.mycompany.project.package$MyClass}}. However, when > reflect on this we try and load {{org.mycompany.project.MyClass}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13118) Support for classes defined in package objects
[ https://issues.apache.org/jira/browse/SPARK-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194465#comment-15194465 ] Jakob Odersky commented on SPARK-13118: --- Sure, I'll submit a PR with the test > Support for classes defined in package objects > -- > > Key: SPARK-13118 > URL: https://issues.apache.org/jira/browse/SPARK-13118 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 1.6.0 >Reporter: Michael Armbrust > > When you define a class inside of a package object, the name ends up being > something like {{org.mycompany.project.package$MyClass}}. However, when > reflect on this we try and load {{org.mycompany.project.MyClass}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13118) Support for classes defined in package objects
[ https://issues.apache.org/jira/browse/SPARK-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194163#comment-15194163 ] Jakob Odersky commented on SPARK-13118: --- [~marmbrus], what's the issue at hand? Creating a simple test: {code} package object packageobject { case class Container(x: Int) } test("Package objects") { import packageobject._ val ds = Seq(Container(1)).toDS() checkDataset(ds, Container(1)) } {code} works without an issue. I might be testing something completely irrelevant but I can't quite make out the issue from the description. > Support for classes defined in package objects > -- > > Key: SPARK-13118 > URL: https://issues.apache.org/jira/browse/SPARK-13118 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 1.6.0 >Reporter: Michael Armbrust > > When you define a class inside of a package object, the name ends up being > something like {{org.mycompany.project.package$MyClass}}. However, when > reflect on this we try and load {{org.mycompany.project.MyClass}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13118) Support for classes defined in package objects
[ https://issues.apache.org/jira/browse/SPARK-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193968#comment-15193968 ] Jakob Odersky commented on SPARK-13118: --- If I recall correctly, I couldn't reproduce the issue. I'll have another shot at it though > Support for classes defined in package objects > -- > > Key: SPARK-13118 > URL: https://issues.apache.org/jira/browse/SPARK-13118 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 1.6.0 >Reporter: Michael Armbrust > > When you define a class inside of a package object, the name ends up being > something like {{org.mycompany.project.package$MyClass}}. However, when > reflect on this we try and load {{org.mycompany.project.MyClass}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13581) LibSVM throws MatchError
[ https://issues.apache.org/jira/browse/SPARK-13581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Odersky updated SPARK-13581: -- Description: When running an action on a DataFrame obtained by reading from a libsvm file a MatchError is thrown, however doing the same on a cached DataFrame works fine. {code} val df = sqlContext.read.format("libsvm").load("../data/mllib/sample_libsvm_data.txt") //file is in spark repository df.select(df("features")).show() //MatchError df.cache() df.select(df("features")).show() //OK {code} The exception stack trace is the following: {code} scala.MatchError: 1.0 (of class java.lang.Double) [info] at org.apache.spark.mllib.linalg.VectorUDT.serialize(Vectors.scala:207) [info] at org.apache.spark.mllib.linalg.VectorUDT.serialize(Vectors.scala:192) [info] at org.apache.spark.sql.catalyst.CatalystTypeConverters$UDTConverter.toCatalystImpl(CatalystTypeConverters.scala:142) [info] at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102) [info] at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:401) [info] at org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$2.apply(ExistingRDD.scala:59) [info] at org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$2.apply(ExistingRDD.scala:56) {code} This issue first appeared in commit {{1dac964c1}}, in PR [#9595|https://github.com/apache/spark/pull/9595] fixing SPARK-11622. [~jeffzhang], do you have any insight of what could be going on? cc [~iyounus] was: When running an action on a DataFrame obtained by reading from a libsvm file a MatchError is thrown, however doing the same on a cached DataFrame works fine. 
{code} val df = sqlContext.read.format("libsvm").load("../data/mllib/sample_libsvm_data.txt") //file is df.select(df("features")).show() //MatchError df.cache() df.select(df("features")).show() //OK {code} The exception stack trace is the following: {code} scala.MatchError: 1.0 (of class java.lang.Double) [info] at org.apache.spark.mllib.linalg.VectorUDT.serialize(Vectors.scala:207) [info] at org.apache.spark.mllib.linalg.VectorUDT.serialize(Vectors.scala:192) [info] at org.apache.spark.sql.catalyst.CatalystTypeConverters$UDTConverter.toCatalystImpl(CatalystTypeConverters.scala:142) [info] at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102) [info] at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:401) [info] at org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$2.apply(ExistingRDD.scala:59) [info] at org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$2.apply(ExistingRDD.scala:56) {code} This issue first appeared in commit {{1dac964c1}}, in PR [#9595|https://github.com/apache/spark/pull/9595] fixing SPARK-11622. [~jeffzhang], do you have any insight of what could be going on? cc [~iyounus] > LibSVM throws MatchError > > > Key: SPARK-13581 > URL: https://issues.apache.org/jira/browse/SPARK-13581 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Jakob Odersky >Assignee: Jeff Zhang >Priority: Minor > > When running an action on a DataFrame obtained by reading from a libsvm file > a MatchError is thrown, however doing the same on a cached DataFrame works > fine. 
> {code} > val df = > sqlContext.read.format("libsvm").load("../data/mllib/sample_libsvm_data.txt") > //file is in spark repository > df.select(df("features")).show() //MatchError > df.cache() > df.select(df("features")).show() //OK > {code} > The exception stack trace is the following: > {code} > scala.MatchError: 1.0 (of class java.lang.Double) > [info]at > org.apache.spark.mllib.linalg.VectorUDT.serialize(Vectors.scala:207) > [info]at > org.apache.spark.mllib.linalg.VectorUDT.serialize(Vectors.scala:192) > [info]at > org.apache.spark.sql.catalyst.CatalystTypeConverters$UDTConverter.toCatalystImpl(CatalystTypeConverters.scala:142) > [info]at > org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102) > [info]at > org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:401) > [info]at > org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$2.apply(ExistingRDD.scala:59) > [info]at >
[jira] [Commented] (SPARK-13581) LibSVM throws MatchError
[ https://issues.apache.org/jira/browse/SPARK-13581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173058#comment-15173058 ] Jakob Odersky commented on SPARK-13581: --- It's in spark "data/mllib/sample_libsvm_data.txt" > LibSVM throws MatchError > > > Key: SPARK-13581 > URL: https://issues.apache.org/jira/browse/SPARK-13581 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Jakob Odersky >Assignee: Jeff Zhang >Priority: Minor > > When running an action on a DataFrame obtained by reading from a libsvm file > a MatchError is thrown, however doing the same on a cached DataFrame works > fine. > {code} > val df = > sqlContext.read.format("libsvm").load("../data/mllib/sample_libsvm_data.txt") > //file is > df.select(df("features")).show() //MatchError > df.cache() > df.select(df("features")).show() //OK > {code} > The exception stack trace is the following: > {code} > scala.MatchError: 1.0 (of class java.lang.Double) > [info]at > org.apache.spark.mllib.linalg.VectorUDT.serialize(Vectors.scala:207) > [info]at > org.apache.spark.mllib.linalg.VectorUDT.serialize(Vectors.scala:192) > [info]at > org.apache.spark.sql.catalyst.CatalystTypeConverters$UDTConverter.toCatalystImpl(CatalystTypeConverters.scala:142) > [info]at > org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102) > [info]at > org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:401) > [info]at > org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$2.apply(ExistingRDD.scala:59) > [info]at > org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$2.apply(ExistingRDD.scala:56) > {code} > This issue first appeared in commit {{1dac964c1}}, in PR > [#9595|https://github.com/apache/spark/pull/9595] fixing SPARK-11622. 
> [~jeffzhang], do you have any insight of what could be going on? > cc [~iyounus]
[jira] [Created] (SPARK-13581) LibSVM throws MatchError
Jakob Odersky created SPARK-13581: - Summary: LibSVM throws MatchError Key: SPARK-13581 URL: https://issues.apache.org/jira/browse/SPARK-13581 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: Jakob Odersky Priority: Minor When running an action on a DataFrame obtained by reading from a libsvm file a MatchError is thrown, however doing the same on a cached DataFrame works fine. {code} val df = sqlContext.read.format("libsvm").load("../data/mllib/sample_libsvm_data.txt") //file is in spark repository df.select(df("features")).show() //MatchError df.cache() df.select(df("features")).show() //OK {code} The exception stack trace is the following: {code} scala.MatchError: 1.0 (of class java.lang.Double) [info] at org.apache.spark.mllib.linalg.VectorUDT.serialize(Vectors.scala:207) [info] at org.apache.spark.mllib.linalg.VectorUDT.serialize(Vectors.scala:192) [info] at org.apache.spark.sql.catalyst.CatalystTypeConverters$UDTConverter.toCatalystImpl(CatalystTypeConverters.scala:142) [info] at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102) [info] at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:401) [info] at org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$2.apply(ExistingRDD.scala:59) [info] at org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$2.apply(ExistingRDD.scala:56) {code} This issue first appeared in commit {{1dac964c1}}, in PR [#9595|https://github.com/apache/spark/pull/9595] fixing SPARK-11622. [~jeffzhang], do you have any insight of what could be going on? cc [~iyounus]
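The failure reported in this ticket is a plain Scala {{MatchError}}: {{VectorUDT.serialize}} pattern-matches on the expected MLlib vector types but receives a raw {{Double}}. The mechanism can be sketched without any Spark dependency (the {{Vec}}/{{serialize}} names below are hypothetical stand-ins, not the actual MLlib code):

```scala
// Spark-free sketch of the failure mode in this ticket: a serialize
// method whose pattern match only covers the expected vector types throws
// scala.MatchError when handed anything else (here, a raw Double).
object MatchErrorSketch {
  sealed trait Vec
  case class Dense(values: Array[Double]) extends Vec

  // Hypothetical stand-in for VectorUDT.serialize; NOT the actual MLlib code.
  def serialize(obj: Any): Array[Double] = obj match {
    case Dense(vs) => vs
  }

  def main(args: Array[String]): Unit = {
    println(serialize(Dense(Array(1.0))).length) // a Vec serializes fine
    try serialize(1.0) // a Double does not: scala.MatchError, as in the report
    catch { case e: MatchError => println(s"MatchError: ${e.getMessage}") }
  }
}
```

Caching the DataFrame takes a different conversion path that bypasses this serializer, which is presumably why the cached variant in the report succeeds.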
[jira] [Reopened] (SPARK-7768) Make user-defined type (UDT) API public
[ https://issues.apache.org/jira/browse/SPARK-7768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Odersky reopened SPARK-7768: -- > Make user-defined type (UDT) API public > --- > > Key: SPARK-7768 > URL: https://issues.apache.org/jira/browse/SPARK-7768 > Project: Spark > Issue Type: New Feature > Components: SQL >Reporter: Xiangrui Meng >Priority: Critical > > As the demand for UDTs increases beyond sparse/dense vectors in MLlib, it > would be nice to make the UDT API public in 1.5.
[jira] [Closed] (SPARK-7768) Make user-defined type (UDT) API public
[ https://issues.apache.org/jira/browse/SPARK-7768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Odersky closed SPARK-7768. Resolution: Fixed > Make user-defined type (UDT) API public > --- > > Key: SPARK-7768 > URL: https://issues.apache.org/jira/browse/SPARK-7768 > Project: Spark > Issue Type: New Feature > Components: SQL >Reporter: Xiangrui Meng >Priority: Critical > > As the demand for UDTs increases beyond sparse/dense vectors in MLlib, it > would be nice to make the UDT API public in 1.5.
[jira] [Commented] (SPARK-7768) Make user-defined type (UDT) API public
[ https://issues.apache.org/jira/browse/SPARK-7768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168155#comment-15168155 ] Jakob Odersky commented on SPARK-7768: -- [~marmbrus] UDTs are public now (in Scala at least); can this JIRA be closed? > Make user-defined type (UDT) API public > --- > > Key: SPARK-7768 > URL: https://issues.apache.org/jira/browse/SPARK-7768 > Project: Spark > Issue Type: New Feature > Components: SQL >Reporter: Xiangrui Meng >Priority: Critical > > As the demand for UDTs increases beyond sparse/dense vectors in MLlib, it > would be nice to make the UDT API public in 1.5.
[jira] [Comment Edited] (SPARK-12878) Dataframe fails with nested User Defined Types
[ https://issues.apache.org/jira/browse/SPARK-12878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15160119#comment-15160119 ] Jakob Odersky edited comment on SPARK-12878 at 2/25/16 10:22 PM: - I just tried your example and get a slightly different exception: {{java.lang.ClassCastException: B cannot be cast to org.apache.spark.sql.catalyst.InternalRow}} (B as opposed to BoxedUnit) However I actually don't understand why this worked in 1.5.2 in the first place. Consider the following extract from your snippet: {code} case A(list) => val row = new GenericMutableRow(1) row.update(0, new GenericArrayData(list.map(_.asInstanceOf[Any]).toArray)) row {code} although `list` is a collection of elements B in this case, I don't think that the individual Bs are serialized according to the definition in BUDT. I would assume you are solely responsible for the serialization and would have to call something like {{list.map(BUDT.serialize(_))}} to convert any child elements to an "SQL Datum" (not sure what that is but the docs say it, http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.types.UserDefinedType) Maybe someone with more knowledge ([~marmbrus] [~cloud_fan]) on the topic can clarify what's going on? was (Author: jodersky): I just tried your example and get a slightly different exception: {{java.lang.ClassCastException: B cannot be cast to org.apache.spark.sql.catalyst.InternalRow}} (B as opposed to BoxedUnit) However I actually don't understand why this worked in 1.5.2 in the first place. Consider the following extract from your snippet: {code} case A(list) => val row = new GenericMutableRow(1) row.update(0, new GenericArrayData(list.map(_.asInstanceOf[Any]).toArray)) row {code} although `list` is a collection of elements B in this case, I don't think that the individual Bs are serialized according to the definition in BUDT. 
I would assume you are solely responsible for the serialization and would have to call something like {{list.map(BUDT.serialize(_))}} to convert any child elements to an "SQL Datum" (not sure what that is but the docs say it, http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.types.UserDefinedType) Maybe someone with more knowledge on the topic can clarify what's going on? > Dataframe fails with nested User Defined Types > -- > > Key: SPARK-12878 > URL: https://issues.apache.org/jira/browse/SPARK-12878 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Joao >Priority: Blocker > > Spark 1.6.0 crashes when using nested User Defined Types in a Dataframe. > In version 1.5.2 the code below worked just fine: > import org.apache.spark.{SparkConf, SparkContext} > import org.apache.spark.sql.catalyst.InternalRow > import org.apache.spark.sql.catalyst.expressions.GenericMutableRow > import org.apache.spark.sql.types._ > @SQLUserDefinedType(udt = classOf[AUDT]) > case class A(list:Seq[B]) > class AUDT extends UserDefinedType[A] { > override def sqlType: DataType = StructType(Seq(StructField("list", > ArrayType(BUDT, containsNull = false), nullable = true))) > override def userClass: Class[A] = classOf[A] > override def serialize(obj: Any): Any = obj match { > case A(list) => > val row = new GenericMutableRow(1) > row.update(0, new > GenericArrayData(list.map(_.asInstanceOf[Any]).toArray)) > row > } > override def deserialize(datum: Any): A = { > datum match { > case row: InternalRow => new A(row.getArray(0).toArray(BUDT).toSeq) > } > } > } > object AUDT extends AUDT > @SQLUserDefinedType(udt = classOf[BUDT]) > case class B(text:Int) > class BUDT extends UserDefinedType[B] { > override def sqlType: DataType = StructType(Seq(StructField("num", > IntegerType, nullable = false))) > override def userClass: Class[B] = classOf[B] > override def serialize(obj: Any): Any = obj match { > case B(text) => > val row = new 
GenericMutableRow(1) > row.setInt(0, text) > row > } > override def deserialize(datum: Any): B = { > datum match { case row: InternalRow => new B(row.getInt(0)) } > } > } > object BUDT extends BUDT > object Test { > def main(args:Array[String]) = { > val col = Seq(new A(Seq(new B(1), new B(2))), > new A(Seq(new B(3), new B(4 > val sc = new SparkContext(new > SparkConf().setMaster("local[1]").setAppName("TestSpark")) > val sqlContext = new org.apache.spark.sql.SQLContext(sc) > import sqlContext.implicits._ > val df = sc.parallelize(1 to 2 zip col).toDF("id","b") > df.select("b").show() > df.collect().foreach(println) > } > } > In the new version (1.6.0) I needed to include the following import: > import
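The remedy suggested in the comment above, explicitly serializing each child element instead of casting it to {{Any}}, can be sketched without any Spark dependency (the {{serializeA}}/{{serializeB}} functions below are hypothetical stand-ins for {{AUDT.serialize}}/{{BUDT.serialize}}, not the actual API):

```scala
// Spark-free sketch of explicit nested serialization: the outer container's
// serializer recursively serializes its children, so no raw B instances are
// left where serialized "rows" are expected (the cause of the cast failure).
object NestedSerializationSketch {
  case class B(num: Int)
  case class A(list: Seq[B])

  // Hypothetical stand-ins for BUDT.serialize and AUDT.serialize.
  def serializeB(b: B): Seq[Any] = Seq(b.num)   // B -> one-field "row"
  def serializeA(a: A): Seq[Any] =
    Seq(a.list.map(serializeB))                 // children serialized explicitly

  def main(args: Array[String]): Unit =
    println(serializeA(A(Seq(B(1), B(2)))))     // nested "rows", no raw Bs
}
```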
[jira] [Commented] (SPARK-10712) JVM crashes with spark.sql.tungsten.enabled = true
[ https://issues.apache.org/jira/browse/SPARK-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167911#comment-15167911 ] Jakob Odersky commented on SPARK-10712: --- Any news on this? Is it still an issue? > JVM crashes with spark.sql.tungsten.enabled = true > -- > > Key: SPARK-10712 > URL: https://issues.apache.org/jira/browse/SPARK-10712 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.0 > Environment: 1 node - Linux, 64GB ram, 8 core >Reporter: Mauro Pirrone >Priority: Critical > > When turning on tungsten, I get the following error when executing a > query/job with a few joins. When tungsten is turned off, the error does not > appear. Also note that tungsten works for me in other cases. > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x7ffadaf59200, pid=7598, tid=140710015645440 > # > # JRE version: Java(TM) SE Runtime Environment (8.0_45-b14) (build > 1.8.0_45-b14) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.45-b02 mixed mode > linux-amd64 compressed oops) > # Problematic frame: > # V [libjvm.so+0x7eb200] > # > # Core dump written. Default location: //core or core.7598 (max size 100 > kB). 
To ensure a full core dump, try "ulimit -c unlimited" before starting > Java again > # > # An error report file with more information is saved as: > # //hs_err_pid7598.log > Compiled method (nm) 44403 10436 n 0 sun.misc.Unsafe::copyMemory > (native) > total in heap [0x7ffac6b49290,0x7ffac6b495f8] = 872 > relocation [0x7ffac6b493b8,0x7ffac6b49400] = 72 > main code [0x7ffac6b49400,0x7ffac6b495f8] = 504 > Compiled method (nm) 44403 10436 n 0 sun.misc.Unsafe::copyMemory > (native) > total in heap [0x7ffac6b49290,0x7ffac6b495f8] = 872 > relocation [0x7ffac6b493b8,0x7ffac6b49400] = 72 > main code [0x7ffac6b49400,0x7ffac6b495f8] = 504 > # > # If you would like to submit a bug report, please visit: > # http://bugreport.java.com/bugreport/crash.jsp > # > --- T H R E A D --- > Current thread (0x7ff7902e7800): JavaThread "broadcast-hash-join-1" > daemon [_thread_in_vm, id=16548, stack(0x7ff66bd98000,0x7ff66be99000)] > siginfo: si_signo: 11 (SIGSEGV), si_code: 2 (SEGV_ACCERR), si_addr: > 0x00069f572b10 > Registers: > RAX=0x00069f672b08, RBX=0x7ff7902e7800, RCX=0x000394132140, > RDX=0xfffe0004 > RSP=0x7ff66be97048, RBP=0x7ff66be970a0, RSI=0x000394032148, > RDI=0x00069f572b10 > R8 =0x7ff66be970d0, R9 =0x0028, R10=0x7ff79cc0e1e7, > R11=0x7ff79cc0e198 > R12=0x7ff66be970c0, R13=0x7ff66be970d0, R14=0x0028, > R15=0x30323048 > RIP=0x7ff7b0dae200, EFLAGS=0x00010282, CSGSFS=0xe033, > ERR=0x0004 > TRAPNO=0x000e > Top of Stack: (sp=0x7ff66be97048) > 0x7ff66be97048: 7ff7b1042b1a 7ff7902e7800 > 0x7ff66be97058: 7ff7 7ff7902e7800 > 0x7ff66be97068: 7ff7902e7800 7ff7ad2846a0 > 0x7ff66be97078: 7ff7897048d8 > 0x7ff66be97088: 7ff66be97110 7ff66be971f0 > 0x7ff66be97098: 7ff7902e7800 7ff66be970f0 > 0x7ff66be970a8: 7ff79cc0e261 0010 > 0x7ff66be970b8: 000390c04048 00066f24fac8 > 0x7ff66be970c8: 7ff7902e7800 000394032120 > 0x7ff66be970d8: 7ff7902e7800 7ff66f971af0 > 0x7ff66be970e8: 7ff7902e7800 7ff66be97198 > 0x7ff66be970f8: 7ff79c9d4c4d 7ff66a454b10 > 0x7ff66be97108: 7ff79c9d4c4d 0010 > 
0x7ff66be97118: 7ff7902e5a90 0028 > 0x7ff66be97128: 7ff79c9d4760 000394032120 > 0x7ff66be97138: 30323048 7ff66be97160 > 0x7ff66be97148: 00066f24fac8 000390c04048 > 0x7ff66be97158: 7ff66be97158 7ff66f978eeb > 0x7ff66be97168: 7ff66be971f0 7ff66f9791c8 > 0x7ff66be97178: 7ff668e90c60 7ff66f978f60 > 0x7ff66be97188: 7ff66be97110 7ff66be971b8 > 0x7ff66be97198: 7ff66be97238 7ff79c9d4c4d > 0x7ff66be971a8: 0010 > 0x7ff66be971b8: 38363130 38363130 > 0x7ff66be971c8: 0028 7ff66f973388 > 0x7ff66be971d8: 000394032120 30323048 > 0x7ff66be971e8: 000665823080 00066f24fac8 > 0x7ff66be971f8: 7ff66be971f8 7ff66f973357 > 0x7ff66be97208: 7ff66be97260 7ff66f976fe0 > 0x7ff66be97218: 7ff66f973388 > 0x7ff66be97228: 7ff66be971b8 7ff66be97248 >
[jira] [Commented] (SPARK-13118) Support for classes defined in package objects
[ https://issues.apache.org/jira/browse/SPARK-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163902#comment-15163902 ] Jakob Odersky commented on SPARK-13118: --- Ah, I just realized the context of this issue: it's part of the Dataset API super-ticket. > Support for classes defined in package objects > -- > > Key: SPARK-13118 > URL: https://issues.apache.org/jira/browse/SPARK-13118 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 1.6.0 >Reporter: Michael Armbrust > > When you define a class inside of a package object, the name ends up being > something like {{org.mycompany.project.package$MyClass}}. However, when > reflect on this we try and load {{org.mycompany.project.MyClass}}.
[jira] [Commented] (SPARK-13118) Support for classes defined in package objects
[ https://issues.apache.org/jira/browse/SPARK-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163755#comment-15163755 ] Jakob Odersky commented on SPARK-13118: --- Hi Michael, what's the concrete issue you encounter? Is it a (de-)serialization bug? I ran a simple test with DataFrames containing classes defined in package objects and everything worked fine. I also quickly checked {{o.a.s.sql.catalyst.ScalaReflection}}, but it seems that type names are always accessed via native Scala reflection utilities. > Support for classes defined in package objects > -- > > Key: SPARK-13118 > URL: https://issues.apache.org/jira/browse/SPARK-13118 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 1.6.0 >Reporter: Michael Armbrust > > When you define a class inside of a package object, the name ends up being > something like {{org.mycompany.project.package$MyClass}}. However, when > reflect on this we try and load {{org.mycompany.project.MyClass}}.
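The name mangling the ticket describes is easy to observe with plain JVM reflection. A minimal sketch (the {{demo}} package is hypothetical):

```scala
// A class defined inside a package object is compiled as a member of the
// synthetic `package` object, so its runtime (binary) name contains
// `package$`, matching the org.mycompany.project.package$MyClass shape
// described in the ticket.
package object demo {
  case class Container(x: Int)
}

object PackageObjectNames {
  def main(args: Array[String]): Unit =
    // prints something like "demo.package$Container", not "demo.Container"
    println(demo.Container(1).getClass.getName)
}
```

Loading the class by its source-level name {{demo.Container}} therefore fails; the reflective lookup has to use the binary name.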
[jira] [Comment Edited] (SPARK-12878) Dataframe fails with nested User Defined Types
[ https://issues.apache.org/jira/browse/SPARK-12878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15160119#comment-15160119 ] Jakob Odersky edited comment on SPARK-12878 at 2/24/16 7:16 PM: I just tried your example and get a slightly different exception: {{java.lang.ClassCastException: B cannot be cast to org.apache.spark.sql.catalyst.InternalRow}} (B as opposed to BoxedUnit) However I actually don't understand why this worked in 1.5.2 in the first place. Consider the following extract from your snippet: {code} case A(list) => val row = new GenericMutableRow(1) row.update(0, new GenericArrayData(list.map(_.asInstanceOf[Any]).toArray)) row {code} although `list` is a collection of elements B in this case, I don't think that the individual Bs are serialized according to the definition in BUDT. I would assume you are solely responsible for the serialization and would have to call something like {{list.map(BUDT.serialize(_))}} to convert any child elements to an "SQL Datum" (not sure what that is but the docs say it, http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.types.UserDefinedType) Maybe someone with more knowledge on the topic can clarify what's going on? was (Author: jodersky): I just tried your example and get a slightly different exception: {{java.lang.ClassCastException: B cannot be cast to org.apache.spark.sql.catalyst.InternalRow}} (B as opposed to BoxedUnit) However I actually don't understand why this worked in 1.5.2 in the first place. Consider the following extract from your snippet: {code} case A(list) => val row = new GenericMutableRow(1) row.update(0, new GenericArrayData(list.map(_.asInstanceOf[Any]).toArray)) row {code} although `list` is a collection of elements B in this case, I don't think that the individual B's are serialized according to the definition in BUDT. 
I would assume you are solely responsible for the serialization and would have to call something like {{list.map(BUDT.serialize(_))}} to convert any child elements to an "SQL Datum" (not sure what that is but the docs say it, http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.types.UserDefinedType) Maybe someone with more knowledge on the topic can clarify what's going on? > Dataframe fails with nested User Defined Types > -- > > Key: SPARK-12878 > URL: https://issues.apache.org/jira/browse/SPARK-12878 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Joao >Priority: Blocker > > Spark 1.6.0 crashes when using nested User Defined Types in a Dataframe. > In version 1.5.2 the code below worked just fine: > import org.apache.spark.{SparkConf, SparkContext} > import org.apache.spark.sql.catalyst.InternalRow > import org.apache.spark.sql.catalyst.expressions.GenericMutableRow > import org.apache.spark.sql.types._ > @SQLUserDefinedType(udt = classOf[AUDT]) > case class A(list:Seq[B]) > class AUDT extends UserDefinedType[A] { > override def sqlType: DataType = StructType(Seq(StructField("list", > ArrayType(BUDT, containsNull = false), nullable = true))) > override def userClass: Class[A] = classOf[A] > override def serialize(obj: Any): Any = obj match { > case A(list) => > val row = new GenericMutableRow(1) > row.update(0, new > GenericArrayData(list.map(_.asInstanceOf[Any]).toArray)) > row > } > override def deserialize(datum: Any): A = { > datum match { > case row: InternalRow => new A(row.getArray(0).toArray(BUDT).toSeq) > } > } > } > object AUDT extends AUDT > @SQLUserDefinedType(udt = classOf[BUDT]) > case class B(text:Int) > class BUDT extends UserDefinedType[B] { > override def sqlType: DataType = StructType(Seq(StructField("num", > IntegerType, nullable = false))) > override def userClass: Class[B] = classOf[B] > override def serialize(obj: Any): Any = obj match { > case B(text) => > val row = new 
GenericMutableRow(1) > row.setInt(0, text) > row > } > override def deserialize(datum: Any): B = { > datum match { case row: InternalRow => new B(row.getInt(0)) } > } > } > object BUDT extends BUDT > object Test { > def main(args:Array[String]) = { > val col = Seq(new A(Seq(new B(1), new B(2))), > new A(Seq(new B(3), new B(4 > val sc = new SparkContext(new > SparkConf().setMaster("local[1]").setAppName("TestSpark")) > val sqlContext = new org.apache.spark.sql.SQLContext(sc) > import sqlContext.implicits._ > val df = sc.parallelize(1 to 2 zip col).toDF("id","b") > df.select("b").show() > df.collect().foreach(println) > } > } > In the new version (1.6.0) I needed to include the following import: > import
[jira] [Comment Edited] (SPARK-12878) Dataframe fails with nested User Defined Types
[ https://issues.apache.org/jira/browse/SPARK-12878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15160119#comment-15160119 ] Jakob Odersky edited comment on SPARK-12878 at 2/24/16 7:15 PM: I just tried your example and get a slightly different exception: {{java.lang.ClassCastException: B cannot be cast to org.apache.spark.sql.catalyst.InternalRow}} (B as opposed to BoxedUnit) However I actually don't understand why this worked in 1.5.2 in the first place. Consider the following extract from your snippet: {code} case A(list) => val row = new GenericMutableRow(1) row.update(0, new GenericArrayData(list.map(_.asInstanceOf[Any]).toArray)) row {code} although `list` is a collection of elements B in this case, I don't think that the individual B's are serialized according to the definition in BUDT. I would assume you are solely responsible for the serialization and would have to call something like {{list.map(BUDT.serialize(_))}} to convert any child elements to an "SQL Datum" (not sure what that is but the docs say it, http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.types.UserDefinedType) Maybe someone with more knowledge on the topic can clarify what's going on? was (Author: jodersky): I just tried your example and get a slightly different exception: {{java.lang.ClassCastException: B cannot be cast to org.apache.spark.sql.catalyst.InternalRow}} (B as opposed to BoxedUnit) However I actually don't understand why this worked in 1.5.2 in the first place. Consider the following extract from your snippet: {code} case A(list) => val row = new GenericMutableRow(1) row.update(0, new GenericArrayData(list.map(_.asInstanceOf[Any]).toArray)) row {code} although `list` is of type B in this case, I don't think that the B's are serialized according to the definition in BUDT. 
I would assume you are solely responsible for the serialization and would have to call something like {{list.map(BUDT.serialize(_))}} to convert any child elements to an "SQL Datum" (not sure what that is but the docs say it, http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.types.UserDefinedType) Maybe someone with more knowledge on the topic can clarify what's going on? > Dataframe fails with nested User Defined Types > -- > > Key: SPARK-12878 > URL: https://issues.apache.org/jira/browse/SPARK-12878 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Joao >Priority: Blocker > > Spark 1.6.0 crashes when using nested User Defined Types in a Dataframe. > In version 1.5.2 the code below worked just fine: > import org.apache.spark.{SparkConf, SparkContext} > import org.apache.spark.sql.catalyst.InternalRow > import org.apache.spark.sql.catalyst.expressions.GenericMutableRow > import org.apache.spark.sql.types._ > @SQLUserDefinedType(udt = classOf[AUDT]) > case class A(list:Seq[B]) > class AUDT extends UserDefinedType[A] { > override def sqlType: DataType = StructType(Seq(StructField("list", > ArrayType(BUDT, containsNull = false), nullable = true))) > override def userClass: Class[A] = classOf[A] > override def serialize(obj: Any): Any = obj match { > case A(list) => > val row = new GenericMutableRow(1) > row.update(0, new > GenericArrayData(list.map(_.asInstanceOf[Any]).toArray)) > row > } > override def deserialize(datum: Any): A = { > datum match { > case row: InternalRow => new A(row.getArray(0).toArray(BUDT).toSeq) > } > } > } > object AUDT extends AUDT > @SQLUserDefinedType(udt = classOf[BUDT]) > case class B(text:Int) > class BUDT extends UserDefinedType[B] { > override def sqlType: DataType = StructType(Seq(StructField("num", > IntegerType, nullable = false))) > override def userClass: Class[B] = classOf[B] > override def serialize(obj: Any): Any = obj match { > case B(text) => > val row = new 
GenericMutableRow(1) > row.setInt(0, text) > row > } > override def deserialize(datum: Any): B = { > datum match { case row: InternalRow => new B(row.getInt(0)) } > } > } > object BUDT extends BUDT > object Test { > def main(args:Array[String]) = { > val col = Seq(new A(Seq(new B(1), new B(2))), > new A(Seq(new B(3), new B(4 > val sc = new SparkContext(new > SparkConf().setMaster("local[1]").setAppName("TestSpark")) > val sqlContext = new org.apache.spark.sql.SQLContext(sc) > import sqlContext.implicits._ > val df = sc.parallelize(1 to 2 zip col).toDF("id","b") > df.select("b").show() > df.collect().foreach(println) > } > } > In the new version (1.6.0) I needed to include the following import: > import org.apache.spark.sql.catalyst.expressions.GenericMutableRow >
[jira] [Commented] (SPARK-12878) Dataframe fails with nested User Defined Types
[ https://issues.apache.org/jira/browse/SPARK-12878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15160119#comment-15160119 ] Jakob Odersky commented on SPARK-12878: --- I just tried your example and get a slightly different exception: {{java.lang.ClassCastException: B cannot be cast to org.apache.spark.sql.catalyst.InternalRow}} (B as opposed to BoxedUnit) However I actually don't understand why this worked in 1.5.2 in the first place. Consider the following extract from your snippet: {code} case A(list) => val row = new GenericMutableRow(1) row.update(0, new GenericArrayData(list.map(_.asInstanceOf[Any]).toArray)) row {code} although `list` is of type B in this case, I don't think that the B's are serialized according to the definition in BUDT. I would assume you are solely responsible for the serialization and would have to call something like {{list.map(BUDT.serialize(_))}} to convert any child elements to an "SQL Datum" (not sure what that is but the docs say it, http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.types.UserDefinedType) Maybe someone with more knowledge on the topic can clarify what's going on? > Dataframe fails with nested User Defined Types > -- > > Key: SPARK-12878 > URL: https://issues.apache.org/jira/browse/SPARK-12878 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Joao >Priority: Blocker > > Spark 1.6.0 crashes when using nested User Defined Types in a Dataframe. 
> In version 1.5.2 the code below worked just fine: > import org.apache.spark.{SparkConf, SparkContext} > import org.apache.spark.sql.catalyst.InternalRow > import org.apache.spark.sql.catalyst.expressions.GenericMutableRow > import org.apache.spark.sql.types._ > @SQLUserDefinedType(udt = classOf[AUDT]) > case class A(list:Seq[B]) > class AUDT extends UserDefinedType[A] { > override def sqlType: DataType = StructType(Seq(StructField("list", > ArrayType(BUDT, containsNull = false), nullable = true))) > override def userClass: Class[A] = classOf[A] > override def serialize(obj: Any): Any = obj match { > case A(list) => > val row = new GenericMutableRow(1) > row.update(0, new > GenericArrayData(list.map(_.asInstanceOf[Any]).toArray)) > row > } > override def deserialize(datum: Any): A = { > datum match { > case row: InternalRow => new A(row.getArray(0).toArray(BUDT).toSeq) > } > } > } > object AUDT extends AUDT > @SQLUserDefinedType(udt = classOf[BUDT]) > case class B(text:Int) > class BUDT extends UserDefinedType[B] { > override def sqlType: DataType = StructType(Seq(StructField("num", > IntegerType, nullable = false))) > override def userClass: Class[B] = classOf[B] > override def serialize(obj: Any): Any = obj match { > case B(text) => > val row = new GenericMutableRow(1) > row.setInt(0, text) > row > } > override def deserialize(datum: Any): B = { > datum match { case row: InternalRow => new B(row.getInt(0)) } > } > } > object BUDT extends BUDT > object Test { > def main(args:Array[String]) = { > val col = Seq(new A(Seq(new B(1), new B(2))), > new A(Seq(new B(3), new B(4 > val sc = new SparkContext(new > SparkConf().setMaster("local[1]").setAppName("TestSpark")) > val sqlContext = new org.apache.spark.sql.SQLContext(sc) > import sqlContext.implicits._ > val df = sc.parallelize(1 to 2 zip col).toDF("id","b") > df.select("b").show() > df.collect().foreach(println) > } > } > In the new version (1.6.0) I needed to include the following import: > import 
org.apache.spark.sql.catalyst.expressions.GenericMutableRow > However, Spark crashes in runtime: > 16/01/18 14:36:22 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0) > java.lang.ClassCastException: scala.runtime.BoxedUnit cannot be cast to > org.apache.spark.sql.catalyst.InternalRow > at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getStruct(rows.scala:51) > at > org.apache.spark.sql.catalyst.expressions.GenericMutableRow.getStruct(rows.scala:248) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown > Source) > at > org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$1.apply(basicOperators.scala:51) > at > org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$1.apply(basicOperators.scala:49) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$$anon$10.next(Iterator.scala:312) > at
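The explicit-serialization point in the comment above can be illustrated without Spark at all. The following plain-Scala sketch (the {{Serializer}} trait and the {{ASer}}/{{BSer}} objects are made-up stand-ins for the UDTs in the snippet, not Spark API) shows why the parent serializer must call the child serializer itself:

```scala
// Minimal stand-in for a UDT: converts a user object to a "catalyst" form.
trait Serializer[T] { def serialize(t: T): Any }

case class B(num: Int)
case class A(list: Seq[B])

object BSer extends Serializer[B] {
  def serialize(b: B): Any = Seq(b.num) // a row with a single int field
}

object ASer extends Serializer[A] {
  // The framework does not recurse for us: if we stored a.list directly,
  // the raw B instances would leak into the serialized form. The parent
  // must invoke the child serializer itself, as in list.map(BUDT.serialize(_)).
  def serialize(a: A): Any = Seq(a.list.map(b => BSer.serialize(b)))
}

val serialized = ASer.serialize(A(Seq(B(1), B(2))))
```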
[jira] [Comment Edited] (SPARK-12422) Binding Spark Standalone Master to public IP fails
[ https://issues.apache.org/jira/browse/SPARK-12422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159904#comment-15159904 ] Jakob Odersky edited comment on SPARK-12422 at 2/24/16 12:16 AM: - This blocker issue is quite old now; can you or anyone else still reproduce it? I tried it in a non-docker environment (Debian 9) and everything worked fine (Spark versions 1.5.2 and 1.6.0). was (Author: jodersky): This blocker issue is quite old now, can you still reproduce it? I tried it in a non-docker environment (Debian 9) and everything worked fine (Spark versions 1.5.2 and 1.6.0). > Binding Spark Standalone Master to public IP fails > -- > > Key: SPARK-12422 > URL: https://issues.apache.org/jira/browse/SPARK-12422 > Project: Spark > Issue Type: Bug > Components: Deploy >Affects Versions: 1.5.2 > Environment: Fails on direct deployment on Mac OSX and also in Docker > Environment (running on OSX or Ubuntu) >Reporter: Bennet Jeutter >Priority: Blocker > > The start of the Spark Standalone Master fails, when the host specified > equals the public IP address. For example I created a Docker Machine with > public IP 192.168.99.100, then I run: > /usr/spark/bin/spark-class org.apache.spark.deploy.master.Master -h > 192.168.99.100 > It'll fail with: > Exception in thread "main" java.net.BindException: Failed to bind to: > /192.168.99.100:7093: Service 'sparkMaster' failed after 16 retries! 
> at > org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272) > at > akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:393) > at > akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:389) > at scala.util.Success$$anonfun$map$1.apply(Try.scala:206) > at scala.util.Try$.apply(Try.scala:161) > at scala.util.Success.map(Try.scala:206) > at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235) > at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235) > at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) > at > akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55) > at > akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91) > at > akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91) > at > akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91) > at > scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72) > at > akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90) > at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) > at > akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397) > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > So I thought oh well, lets just bind to the local IP and access it via public > IP - this doesn't work, it will give: > dropping message [class akka.actor.ActorSelectionMessage] for non-local > recipient [Actor[akka.tcp://sparkMaster@192.168.99.100:7077/]] arriving at > 
[akka.tcp://sparkMaster@192.168.99.100:7077] inbound addresses are > [akka.tcp://sparkMaster@spark-master:7077] > So there is currently no possibility to run all this... related stackoverflow > issues: > * > http://stackoverflow.com/questions/31659228/getting-java-net-bindexception-when-attempting-to-start-spark-master-on-ec2-node > * > http://stackoverflow.com/questions/33768029/access-apache-spark-standalone-master-via-ip -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12422) Binding Spark Standalone Master to public IP fails
[ https://issues.apache.org/jira/browse/SPARK-12422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159904#comment-15159904 ] Jakob Odersky commented on SPARK-12422: --- This blocker issue is quite old now; can you still reproduce it? I tried it in a non-docker environment (Debian 9) and everything worked fine (Spark versions 1.5.2 and 1.6.0). > Binding Spark Standalone Master to public IP fails > -- > > Key: SPARK-12422 > URL: https://issues.apache.org/jira/browse/SPARK-12422 > Project: Spark > Issue Type: Bug > Components: Deploy >Affects Versions: 1.5.2 > Environment: Fails on direct deployment on Mac OSX and also in Docker > Environment (running on OSX or Ubuntu) >Reporter: Bennet Jeutter >Priority: Blocker > > The start of the Spark Standalone Master fails, when the host specified > equals the public IP address. For example I created a Docker Machine with > public IP 192.168.99.100, then I run: > /usr/spark/bin/spark-class org.apache.spark.deploy.master.Master -h > 192.168.99.100 > It'll fail with: > Exception in thread "main" java.net.BindException: Failed to bind to: > /192.168.99.100:7093: Service 'sparkMaster' failed after 16 retries! 
> at > org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272) > at > akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:393) > at > akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:389) > at scala.util.Success$$anonfun$map$1.apply(Try.scala:206) > at scala.util.Try$.apply(Try.scala:161) > at scala.util.Success.map(Try.scala:206) > at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235) > at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235) > at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) > at > akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55) > at > akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91) > at > akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91) > at > akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91) > at > scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72) > at > akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90) > at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) > at > akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397) > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > So I thought oh well, lets just bind to the local IP and access it via public > IP - this doesn't work, it will give: > dropping message [class akka.actor.ActorSelectionMessage] for non-local > recipient [Actor[akka.tcp://sparkMaster@192.168.99.100:7077/]] arriving at > 
[akka.tcp://sparkMaster@192.168.99.100:7077] inbound addresses are > [akka.tcp://sparkMaster@spark-master:7077] > So there is currently no possibility to run all this... related stackoverflow > issues: > * > http://stackoverflow.com/questions/31659228/getting-java-net-bindexception-when-attempting-to-start-spark-master-on-ec2-node > * > http://stackoverflow.com/questions/33768029/access-apache-spark-standalone-master-via-ip -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13172) Stop using RichException.getStackTrace it is deprecated
[ https://issues.apache.org/jira/browse/SPARK-13172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138447#comment-15138447 ] Jakob Odersky commented on SPARK-13172: --- Cool, thanks for the snippet! I agree, the first approach looks a lot better > Stop using RichException.getStackTrace it is deprecated > --- > > Key: SPARK-13172 > URL: https://issues.apache.org/jira/browse/SPARK-13172 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Reporter: holdenk >Priority: Trivial > > Throwable getStackTrace is the recommended alternative. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13172) Stop using RichException.getStackTrace it is deprecated
[ https://issues.apache.org/jira/browse/SPARK-13172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137575#comment-15137575 ] Jakob Odersky commented on SPARK-13172: --- I would suggest taking a similar approach to what the Scala library does: https://github.com/scala/scala/blob/v2.11.7/src/library/scala/runtime/RichException.scala#L1, that is, just call {{mkString}} on the stack trace. Using {{e.printStackTrace}} is not as flexible: it doesn't give you a string and, as far as I know, it prints to stderr with no option to redirect. > Stop using RichException.getStackTrace it is deprecated > --- > > Key: SPARK-13172 > URL: https://issues.apache.org/jira/browse/SPARK-13172 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Reporter: holdenk >Priority: Trivial > > Throwable getStackTrace is the recommended alternative. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
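As a minimal illustration of the suggestion (plain Scala, not Spark's actual utility code):

```scala
// Build a string from the stack trace, as RichException did, instead of
// e.printStackTrace(), which writes to stderr and returns Unit.
def stackTraceString(e: Throwable): String =
  e.getStackTrace.mkString("", "\n", "\n")

val s = stackTraceString(new RuntimeException("boom"))
```

The resulting string can then be handed to whatever logger or writer the caller prefers, which is exactly the flexibility {{printStackTrace}} lacks.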
[jira] [Comment Edited] (SPARK-13172) Stop using RichException.getStackTrace it is deprecated
[ https://issues.apache.org/jira/browse/SPARK-13172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137575#comment-15137575 ] Jakob Odersky edited comment on SPARK-13172 at 2/8/16 8:03 PM: --- I would suggest taking similar approach to what the Scala library does: https://github.com/scala/scala/blob/v2.11.7/src/library/scala/runtime/RichException.scala#L16, that is just call mkString on the stack trace. Using e.printStackTrace is not as flexible, it doesn't give you a string and as far as I know it prints to stderr with no option to redirect. was (Author: jodersky): I would suggest taking similar approach to what the Scala library does: https://github.com/scala/scala/blob/v2.11.7/src/library/scala/runtime/RichException.scala#L1, that is just call mkString on the stack trace. Using e.printStackTrace is not as flexible, it doesn't give you a string and as far as I know it prints to stderr with no option to redirect. > Stop using RichException.getStackTrace it is deprecated > --- > > Key: SPARK-13172 > URL: https://issues.apache.org/jira/browse/SPARK-13172 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Reporter: holdenk >Priority: Trivial > > Throwable getStackTrace is the recommended alternative. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13171) Update promise & future to Promise and Future as the old ones are deprecated
[ https://issues.apache.org/jira/browse/SPARK-13171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137795#comment-15137795 ] Jakob Odersky commented on SPARK-13171: --- This is very strange; are you sure it has something to do with the changes introduced by my PR? As mentioned previously, the only effective change between {{future()}} and {{Future.apply()}} is one less level of indirection. The only potentially visible changes would be for code that relies on reflection or does some macro magic. > Update promise & future to Promise and Future as the old ones are deprecated > > > Key: SPARK-13171 > URL: https://issues.apache.org/jira/browse/SPARK-13171 > Project: Spark > Issue Type: Sub-task >Reporter: holdenk >Assignee: Jakob Odersky >Priority: Trivial > Fix For: 2.0.0 > > > We use the promise and future functions on the concurrent object, both of > which have been deprecated in 2.11 . The full traits are present in Scala > 2.10 as well so this should be a safe migration. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
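For context, the migration in question is a few characters of plain Scala; the deprecated lowercase {{future}} helper is just sugar for what is written here:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Before (deprecated in Scala 2.11): val f = future { 21 * 2 }
// After: Future.apply, which is what `future` forwarded to anyway.
val f = Future { 21 * 2 }
val result = Await.result(f, 10.seconds)
```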
[jira] [Commented] (SPARK-13171) Update promise & future to Promise and Future as the old ones are deprecated
[ https://issues.apache.org/jira/browse/SPARK-13171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15135979#comment-15135979 ] Jakob Odersky commented on SPARK-13171: --- [~tedyu] On GitHub, [~smilegator] wrote: > BTW, I also hit similar issues without this merge in my local environment. > Thus, I do not know what is the root causes of this problem Have you tried running the tests before the future merge? > Update promise & future to Promise and Future as the old ones are deprecated > > > Key: SPARK-13171 > URL: https://issues.apache.org/jira/browse/SPARK-13171 > Project: Spark > Issue Type: Sub-task >Reporter: holdenk >Assignee: Jakob Odersky >Priority: Trivial > Fix For: 2.0.0 > > > We use the promise and future functions on the concurrent object, both of > which have been deprecated in 2.11 . The full traits are present in Scala > 2.10 as well so this should be a safe migration. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13171) Update promise & future to Promise and Future as the old ones are deprecated
[ https://issues.apache.org/jira/browse/SPARK-13171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15135976#comment-15135976 ] Jakob Odersky commented on SPARK-13171: --- I can't see any reason why the above change should impact anything at all. The biggest change is one less level of indirection: the deprecated {{future}} method simply forwards to {{Future.apply}}. Could it be that Hive does some funky reflection or macro stuff? > Update promise & future to Promise and Future as the old ones are deprecated > > > Key: SPARK-13171 > URL: https://issues.apache.org/jira/browse/SPARK-13171 > Project: Spark > Issue Type: Sub-task >Reporter: holdenk >Assignee: Jakob Odersky >Priority: Trivial > Fix For: 2.0.0 > > > We use the promise and future functions on the concurrent object, both of > which have been deprecated in 2.11 . The full traits are present in Scala > 2.10 as well so this should be a safe migration. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13176) Ignore deprecation warning for ProcessBuilder lines_!
[ https://issues.apache.org/jira/browse/SPARK-13176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15135222#comment-15135222 ] Jakob Odersky commented on SPARK-13176: --- One of the places the process API is used is in creating symlinks. Since Spark requires at least Java 1.7, we can drop the use of external commands and rely on the {{java.nio.file.Files}} API instead. > Ignore deprecation warning for ProcessBuilder lines_! > - > > Key: SPARK-13176 > URL: https://issues.apache.org/jira/browse/SPARK-13176 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Reporter: holdenk >Priority: Trivial > > The replacement, stream_! & lineStream_! is not present in 2.10 API. > Note @SupressWarnings for deprecation doesn't appear to work > https://issues.scala-lang.org/browse/SI-7934 so suppressing the warnings > might involve wrapping or similar. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
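A sketch of the suggested replacement, assuming a POSIX filesystem where symlink creation is permitted:

```scala
import java.nio.file.Files

// Create a symlink with the JDK 7+ API instead of shelling out to `ln -s`.
val target = Files.createTempFile("spark-target", ".txt")
val link   = target.resolveSibling("spark-link-" + System.nanoTime)
Files.createSymbolicLink(link, target)

val ok = Files.isSymbolicLink(link)

// Clean up the scratch files.
Files.deleteIfExists(link)
Files.deleteIfExists(target)
```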
[jira] [Commented] (SPARK-13176) Ignore deprecation warning for ProcessBuilder lines_!
[ https://issues.apache.org/jira/browse/SPARK-13176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15135254#comment-15135254 ] Jakob Odersky commented on SPARK-13176: --- The PR I submitted uses {{java.nio.file.Files}}; it does not fix the underlying problem of ignoring specific deprecation warnings. > Ignore deprecation warning for ProcessBuilder lines_! > - > > Key: SPARK-13176 > URL: https://issues.apache.org/jira/browse/SPARK-13176 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Reporter: holdenk >Priority: Trivial > > The replacement, stream_! & lineStream_! is not present in 2.10 API. > Note @SupressWarnings for deprecation doesn't appear to work > https://issues.scala-lang.org/browse/SI-7934 so suppressing the warnings > might involve wrapping or similar. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13208) Replace Pair with tuples
[ https://issues.apache.org/jira/browse/SPARK-13208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Odersky updated SPARK-13208: -- Priority: Trivial (was: Major) > Replace Pair with tuples > > > Key: SPARK-13208 > URL: https://issues.apache.org/jira/browse/SPARK-13208 > Project: Spark > Issue Type: Sub-task > Components: Examples, Spark Core, SQL, Streaming >Reporter: Jakob Odersky >Priority: Trivial > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13208) Replace Pair with tuples
Jakob Odersky created SPARK-13208: - Summary: Replace Pair with tuples Key: SPARK-13208 URL: https://issues.apache.org/jira/browse/SPARK-13208 Project: Spark Issue Type: Sub-task Reporter: Jakob Odersky -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
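Since {{Pair}} is a deprecated alias for {{Tuple2}}, the migration described in this issue is mechanical; a minimal illustration:

```scala
// Deprecated style: val p = Pair(1, "one")
// Replacement: the tuple literal, which constructs the same Tuple2.
val p = (1, "one")

// Tuples also destructure directly, with no Pair involved.
val (num, word) = p
```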
[jira] [Commented] (SPARK-13176) Ignore deprecation warning for ProcessBuilder lines_!
[ https://issues.apache.org/jira/browse/SPARK-13176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131985#comment-15131985 ] Jakob Odersky commented on SPARK-13176: --- A possible workaround would be to filter out warnings in sbt, similar to what is done here https://github.com/apache/spark/pull/9128/files. Deprecation warnings could either be checked against a "whitelist" or, alternatively, the resulting classfiles could be inspected for the presence of some special annotations. Note however that both solutions are a hack and will not work with maven. > Ignore deprecation warning for ProcessBuilder lines_! > - > > Key: SPARK-13176 > URL: https://issues.apache.org/jira/browse/SPARK-13176 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Reporter: holdenk >Priority: Trivial > > The replacement, stream_! & lineStream_! is not present in 2.10 API. > Note @SupressWarnings for deprecation doesn't appear to work > https://issues.scala-lang.org/browse/SI-7934 so suppressing the warnings > might involve wrapping or similar. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
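The whitelist part of the idea can be sketched independently of the build tool ({{violations}} is a hypothetical helper; the sbt wiring, which is the hacky part, is not shown):

```scala
// Keep only deprecation warnings that do not match an allowed pattern;
// a build wrapper would fail the build if this returns a non-empty list.
def violations(warnings: Seq[String], whitelist: Seq[String]): Seq[String] =
  warnings
    .filter(_.contains("is deprecated"))
    .filterNot(w => whitelist.exists(w.contains))

val warnings = Seq(
  "method lines_! in trait ProcessBuilder is deprecated",
  "method future in package concurrent is deprecated",
  "unused import")

// Whitelist the ProcessBuilder warning; the `future` one still fails.
val bad = violations(warnings, whitelist = Seq("lines_!"))
```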
[jira] [Resolved] (SPARK-12990) Fatal warnings on @transient parameters (Scala 2.11)
[ https://issues.apache.org/jira/browse/SPARK-12990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Odersky resolved SPARK-12990. --- Resolution: Duplicate Fixed in 00026fa9912ecee5637f1e7dd222f977f31f6766 > Fatal warnings on @transient parameters (Scala 2.11) > > > Key: SPARK-12990 > URL: https://issues.apache.org/jira/browse/SPARK-12990 > Project: Spark > Issue Type: Bug > Components: Build, SQL >Reporter: Jakob Odersky >Priority: Critical > > Two new classes in {{sql.execution.datasources}}, {{CSVOptions}} and > {{JSONOptions}} break the Scala 2.11 build due to unnecessary @transient > annotations. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-12990) Fatal warnings on @transient parameters (Scala 2.11)
Jakob Odersky created SPARK-12990: - Summary: Fatal warnings on @transient parameters (Scala 2.11) Key: SPARK-12990 URL: https://issues.apache.org/jira/browse/SPARK-12990 Project: Spark Issue Type: Bug Components: Build, SQL Reporter: Jakob Odersky Two new classes in {{sql.execution.datasources}}, {{CSVOptions}} and {{JSONOptions}} break the Scala 2.11 build due to unnecessary @transient annotations. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
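For illustration, a pure-Scala sketch of the kind of fix involved (the {{Options}} class here is hypothetical, not the actual {{CSVOptions}}/{{JSONOptions}} code): {{@transient}} needs an actual field to attach to, such as a {{lazy val}} recomputed after deserialization, rather than a plain constructor parameter, where it has no target and triggers a warning that is fatal under {{-Xfatal-warnings}}.

```scala
// @transient on a field: the regex is not serialized but lazily rebuilt
// from `pattern` after deserialization. Placing @transient on a plain
// constructor parameter instead would have no valid annotation target.
class Options(val pattern: String) extends Serializable {
  @transient lazy val regex: scala.util.matching.Regex = pattern.r
}

val matched = new Options("ab+").regex.findFirstIn("xabbby")
```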
[jira] [Commented] (SPARK-12958) Map accumulator in spark
[ https://issues.apache.org/jira/browse/SPARK-12958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116461#comment-15116461 ] Jakob Odersky commented on SPARK-12958: --- I agree that it's very specific; maybe it would make sense to include a more generic version of {{MapAccumulator}}. Something like {{MapAccumulator[A, B]}}, where {{B}} has to have an implicit {{AccumulatorParam[B]}} itself? > Map accumulator in spark > > > Key: SPARK-12958 > URL: https://issues.apache.org/jira/browse/SPARK-12958 > Project: Spark > Issue Type: Wish > Components: Spark Core >Reporter: Souri >Priority: Minor > > Spark by default supports accumulators of Int,Long,Double,Float. > It would be good if we can have a Map accumulator where each executor can > just add key->value pairs and driver can have access to the aggregated value > for each key in the map. > In this way, it would also be easier to use accumulators for various metrics. > We can define metrics at runtime as the map can take any string key. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
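The generic version could look something like a per-key merge, where the value type's own accumulation rule plays the role an implicit {{AccumulatorParam[B]}} would (pure-Scala sketch; {{addInPlace}} mirrors the method name on {{AccumulatorParam}}, but this is not Spark code):

```scala
// Merge two maps, combining values for duplicate keys with `merge` --
// the job an implicit AccumulatorParam[B] would do for the value type.
def addInPlace[A, B](m1: Map[A, B], m2: Map[A, B])(merge: (B, B) => B): Map[A, B] =
  m2.foldLeft(m1) { case (acc, (k, v)) =>
    acc.updated(k, acc.get(k).fold(v)(merge(_, v)))
  }

// Executors would contribute partial maps; the driver sees the merged view.
val merged = addInPlace(Map("a" -> 1, "b" -> 2), Map("b" -> 10, "c" -> 3))(_ + _)
```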
[jira] [Commented] (SPARK-12990) Fatal warnings on @transient parameters (Scala 2.11)
[ https://issues.apache.org/jira/browse/SPARK-12990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116409#comment-15116409 ] Jakob Odersky commented on SPARK-12990: --- marking as critical since it breaks the build > Fatal warnings on @transient parameters (Scala 2.11) > > > Key: SPARK-12990 > URL: https://issues.apache.org/jira/browse/SPARK-12990 > Project: Spark > Issue Type: Bug > Components: Build, SQL >Reporter: Jakob Odersky >Priority: Critical > > Two new classes in {{sql.execution.datasources}}, {{CSVOptions}} and > {{JSONOptions}} break the Scala 2.11 build due to unnecessary @transient > annotations. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12990) Fatal warnings on @transient parameters (Scala 2.11)
[ https://issues.apache.org/jira/browse/SPARK-12990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Odersky updated SPARK-12990: -- Priority: Critical (was: Major) > Fatal warnings on @transient parameters (Scala 2.11) > > > Key: SPARK-12990 > URL: https://issues.apache.org/jira/browse/SPARK-12990 > Project: Spark > Issue Type: Bug > Components: Build, SQL >Reporter: Jakob Odersky >Priority: Critical > > Two new classes in {{sql.execution.datasources}}, {{CSVOptions}} and > {{JSONOptions}} break the Scala 2.11 build due to unnecessary @transient > annotations. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12801) The DataFrame.rdd not return same result
[ https://issues.apache.org/jira/browse/SPARK-12801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15096867#comment-15096867 ] Jakob Odersky commented on SPARK-12801: --- I can't reproduce this either > The DataFrame.rdd not return same result > > > Key: SPARK-12801 > URL: https://issues.apache.org/jira/browse/SPARK-12801 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 1.5.2 > Environment: 3 servers of centos7, cluster mode >Reporter: Joseph Sun > > run spark-shell and typeing following codes. > > import org.apache.spark.sql.types._ > > val schema = StructType(StructField("id",IntegerType,true)::Nil) > > val rdd = sc.parallelize((0 to 1)).map(Row(_)) > > val df = sqlContext.createDataFrame(rdd,schema) > > df.registerTempTable("test") > > sqlContext.cacheTable("test") > > sqlContext.sql("select * from test limit 2").collect() > show Array[org.apache.spark.sql.Row] = Array([0], [1]) > > sqlContext.sql("select * from test limit 2").rdd.collect() > run the code one more times,the result is not consistent. > some times the result is : Array[org.apache.spark.sql.Row] = Array([0], [1]) > or: Array[org.apache.spark.sql.Row] = Array([2500], [2501]) > why? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12777) Dataset fields can't be Scala tuples
[ https://issues.apache.org/jira/browse/SPARK-12777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15097327#comment-15097327 ] Jakob Odersky commented on SPARK-12777: --- I get the same error in the Spark shell; however, everything works in a plain application (as shown in the listing): {code} import org.apache.spark._ import org.apache.spark.sql._ case class Test(v: (Int, Int)) object Main { val conf = new SparkConf().setMaster("local").setAppName("testbench") val sc = new SparkContext(conf) val sqlContext = new SQLContext(sc) def main(args: Array[String]): Unit = { import sqlContext.implicits._ val rdd = sc.parallelize(Seq(Test((1,2)), Test((3,4)))) val ds = sqlContext.createDataset(rdd) ds.show } } {code} > Dataset fields can't be Scala tuples > > > Key: SPARK-12777 > URL: https://issues.apache.org/jira/browse/SPARK-12777 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0, 1.6.1, 2.0.0 >Reporter: Chris Jansen > > Datasets can't seem to handle scala tuples as fields of case classes in > datasets. 
> {code} > Seq((1,2), (3,4)).toDS().show() //works > {code} > When including a tuple as a field, the code fails: > {code} > case class Test(v: (Int, Int)) > Seq(Test((1,2)), Test((3,4)).toDS().show //fails > {code} > {code} > UnresolvedException: : Invalid call to dataType on unresolved object, tree: > 'name (unresolved.scala:59) > > org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.dataType(unresolved.scala:59) > > org.apache.spark.sql.catalyst.expressions.GetStructField.org$apache$spark$sql$catalyst$expressions$GetStructField$$field$lzycompute(complexTypeExtractors.scala:107) > > org.apache.spark.sql.catalyst.expressions.GetStructField.org$apache$spark$sql$catalyst$expressions$GetStructField$$field(complexTypeExtractors.scala:107) > > org.apache.spark.sql.catalyst.expressions.GetStructField$$anonfun$toString$1.apply(complexTypeExtractors.scala:111) > > org.apache.spark.sql.catalyst.expressions.GetStructField$$anonfun$toString$1.apply(complexTypeExtractors.scala:111) > > org.apache.spark.sql.catalyst.expressions.GetStructField.toString(complexTypeExtractors.scala:111) > > org.apache.spark.sql.catalyst.expressions.Expression.toString(Expression.scala:217) > > org.apache.spark.sql.catalyst.expressions.Expression.toString(Expression.scala:217) > > org.apache.spark.sql.catalyst.expressions.If.toString(conditionalExpressions.scala:76) > > org.apache.spark.sql.catalyst.expressions.Expression.toString(Expression.scala:217) > > org.apache.spark.sql.catalyst.expressions.Alias.toString(namedExpressions.scala:155) > > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$argString$1.apply(TreeNode.scala:385) > > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$argString$1.apply(TreeNode.scala:381) > org.apache.spark.sql.catalyst.trees.TreeNode.argString(TreeNode.scala:388) > org.apache.spark.sql.catalyst.trees.TreeNode.simpleString(TreeNode.scala:391) > > org.apache.spark.sql.catalyst.plans.QueryPlan.simpleString(QueryPlan.scala:172) > > 
org.apache.spark.sql.catalyst.trees.TreeNode.generateTreeString(TreeNode.scala:441) > org.apache.spark.sql.catalyst.trees.TreeNode.treeString(TreeNode.scala:396) > > org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$5.apply(RuleExecutor.scala:118) > > org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$5.apply(RuleExecutor.scala:119) > org.apache.spark.Logging$class.logDebug(Logging.scala:62) > > org.apache.spark.sql.catalyst.rules.RuleExecutor.logDebug(RuleExecutor.scala:44) > > org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:115) > > org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:72) > > org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:72) > > org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.resolve(ExpressionEncoder.scala:253) > org.apache.spark.sql.Dataset.(Dataset.scala:78) > org.apache.spark.sql.Dataset.(Dataset.scala:89) > org.apache.spark.sql.SQLContext.createDataset(SQLContext.scala:507) > > org.apache.spark.sql.SQLImplicits.localSeqToDatasetHolder(SQLImplicits.scala:80) > {code} > When providing a type alias, the code fails in a different way: > {code} > type TwoInt = (Int, Int) > case class Test(v: TwoInt) > Seq(Test((1,2)), Test((3,4)).toDS().show //fails > {code} > {code} > NoSuchElementException: : head of empty list (ScalaReflection.scala:504) > > org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$extractorFor$1.apply(ScalaReflection.scala:504) > >
[jira] [Created] (SPARK-12816) Schema generation for type aliases does not work
Jakob Odersky created SPARK-12816:
-------------------------------------

             Summary: Schema generation for type aliases does not work
                 Key: SPARK-12816
                 URL: https://issues.apache.org/jira/browse/SPARK-12816
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
            Reporter: Jakob Odersky


Related to the second part of SPARK-12777.

Assume the following:
{code}
case class Container[A](a: A)
type IntContainer = Container[Int]
{code}
Generating a schema with {code}org.apache.spark.sql.catalyst.ScalaReflection.schemaFor[IntContainer]{code} fails miserably with {{NoSuchElementException: : head of empty list (ScalaReflection.scala:504)}} (the same exception as described in the related issues).

Since {{schemaFor}} is called whenever a schema is implicitly needed, {{Datasets}} cannot be created from certain aliased types.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
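For context, a minimal sketch (mine, not from the ticket) of why aliases can trip up reflection-based schema derivation: under {{scala.reflect}}, an alias is its own {{Type}} until it is explicitly dealiased, so a structural pattern match written against {{Container[Int]}} can fall through for {{IntContainer}}. Plain Scala, no Spark required:

{code}
import scala.reflect.runtime.universe._

case class Container[A](a: A)

object AliasDemo {
  type IntContainer = Container[Int]

  def main(args: Array[String]): Unit = {
    val aliased = typeOf[IntContainer]
    // The alias carries its own symbol; matches against Container[Int]'s
    // TypeRef may not fire until the alias is resolved.
    println(aliased)
    // `dealias` (Scala 2.11+; `normalize` on 2.10) resolves the alias to
    // its underlying type, Container[Int].
    println(aliased.dealias)
  }
}
{code}

If {{schemaFor}} matches on the raw {{TypeRef}} without such a dealias step, that would be consistent with the exception above, but this is my reading, not a confirmed diagnosis from the thread.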
[jira] [Commented] (SPARK-12777) Dataset fields can't be Scala tuples
[ https://issues.apache.org/jira/browse/SPARK-12777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15097431#comment-15097431 ]

Jakob Odersky commented on SPARK-12777:
---------------------------------------

Concerning the problem with type aliases, I can reproduce it both inside a Spark shell and inside a standalone program. See issue SPARK-12816 and the related PR.

> Dataset fields can't be Scala tuples
> ------------------------------------
>
>                 Key: SPARK-12777
>                 URL: https://issues.apache.org/jira/browse/SPARK-12777
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.0, 1.6.1, 2.0.0
>            Reporter: Chris Jansen
>
> Datasets can't seem to handle Scala tuples as fields of case classes in datasets.
> {code}
> Seq((1,2), (3,4)).toDS().show() //works
> {code}
> When including a tuple as a field, the code fails:
> {code}
> case class Test(v: (Int, Int))
> Seq(Test((1,2)), Test((3,4))).toDS().show() //fails
> {code}
> {code}
> UnresolvedException: : Invalid call to dataType on unresolved object, tree: 'name (unresolved.scala:59)
> org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.dataType(unresolved.scala:59)
> org.apache.spark.sql.catalyst.expressions.GetStructField.org$apache$spark$sql$catalyst$expressions$GetStructField$$field$lzycompute(complexTypeExtractors.scala:107)
> org.apache.spark.sql.catalyst.expressions.GetStructField.org$apache$spark$sql$catalyst$expressions$GetStructField$$field(complexTypeExtractors.scala:107)
> org.apache.spark.sql.catalyst.expressions.GetStructField$$anonfun$toString$1.apply(complexTypeExtractors.scala:111)
> org.apache.spark.sql.catalyst.expressions.GetStructField$$anonfun$toString$1.apply(complexTypeExtractors.scala:111)
> org.apache.spark.sql.catalyst.expressions.GetStructField.toString(complexTypeExtractors.scala:111)
> org.apache.spark.sql.catalyst.expressions.Expression.toString(Expression.scala:217)
> org.apache.spark.sql.catalyst.expressions.Expression.toString(Expression.scala:217)
> org.apache.spark.sql.catalyst.expressions.If.toString(conditionalExpressions.scala:76)
> org.apache.spark.sql.catalyst.expressions.Expression.toString(Expression.scala:217)
> org.apache.spark.sql.catalyst.expressions.Alias.toString(namedExpressions.scala:155)
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$argString$1.apply(TreeNode.scala:385)
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$argString$1.apply(TreeNode.scala:381)
> org.apache.spark.sql.catalyst.trees.TreeNode.argString(TreeNode.scala:388)
> org.apache.spark.sql.catalyst.trees.TreeNode.simpleString(TreeNode.scala:391)
> org.apache.spark.sql.catalyst.plans.QueryPlan.simpleString(QueryPlan.scala:172)
> org.apache.spark.sql.catalyst.trees.TreeNode.generateTreeString(TreeNode.scala:441)
> org.apache.spark.sql.catalyst.trees.TreeNode.treeString(TreeNode.scala:396)
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$5.apply(RuleExecutor.scala:118)
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$5.apply(RuleExecutor.scala:119)
> org.apache.spark.Logging$class.logDebug(Logging.scala:62)
> org.apache.spark.sql.catalyst.rules.RuleExecutor.logDebug(RuleExecutor.scala:44)
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:115)
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:72)
> org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:72)
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.resolve(ExpressionEncoder.scala:253)
> org.apache.spark.sql.Dataset.<init>(Dataset.scala:78)
> org.apache.spark.sql.Dataset.<init>(Dataset.scala:89)
> org.apache.spark.sql.SQLContext.createDataset(SQLContext.scala:507)
> org.apache.spark.sql.SQLImplicits.localSeqToDatasetHolder(SQLImplicits.scala:80)
> {code}
> When providing a type alias, the code fails in a different way:
> {code}
> type TwoInt = (Int, Int)
> case class Test(v: TwoInt)
> Seq(Test((1,2)), Test((3,4))).toDS().show() //fails
> {code}
> {code}
> NoSuchElementException: : head of empty list (ScalaReflection.scala:504)
> org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$extractorFor$1.apply(ScalaReflection.scala:504)
> org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$extractorFor$1.apply(ScalaReflection.scala:502)
> org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$extractorFor(ScalaReflection.scala:502)
> org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$extractorFor$1.apply(ScalaReflection.scala:509)
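A workaround sketch for the tuple-field case (my suggestion, not from the thread, and untested against the affected versions): wrap the pair in a dedicated case class, which the encoder derivation handles without hitting the unresolved-attribute path.

{code}
// Hypothetical workaround: replace the tuple field with a named case class.
case class Pair(a: Int, b: Int)
case class Test(v: Pair)

// In a Spark shell (with sqlContext.implicits._ in scope on 1.6):
// Seq(Test(Pair(1, 2)), Test(Pair(3, 4))).toDS().show()
{code}

The trade-off is an extra type definition, but field names ({{a}}, {{b}}) also end up in the schema instead of the positional {{_1}}/{{_2}}.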
[jira] [Created] (SPARK-12761) Clean up duplicated code in scala 2.11 repl.Main
Jakob Odersky created SPARK-12761:
-------------------------------------

             Summary: Clean up duplicated code in scala 2.11 repl.Main
                 Key: SPARK-12761
                 URL: https://issues.apache.org/jira/browse/SPARK-12761
             Project: Spark
          Issue Type: Improvement
          Components: Spark Shell
            Reporter: Jakob Odersky
            Priority: Trivial


There is duplicate code in {{/repl/scala-2.11/src/main/scala/org/apache/spark/repl/Main.scala}}.
According to git blame, I moved the "settings" val to a method-local val; however, due to a subsequent merge it was reintroduced as a global val.
Cf. https://github.com/apache/spark/blame/master/repl/scala-2.11/src/main/scala/org/apache/spark/repl/Main.scala, line 33.
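The cleanup being asked for has roughly this shape (an illustrative sketch only; the identifiers are hypothetical and do not reproduce the actual file):

{code}
object Main {
  // Before: a module-level val duplicating the method-local one below;
  // the fix is to delete this and keep the single local definition.
  // val settings = new GenericRunnerSettings(s => Console.err.println(s))

  def main(args: Array[String]): Unit = {
    // After: `settings` lives where it is used, so there is one definition
    // and no REPL state initialized at object-load time.
    val settings = new GenericRunnerSettings(s => Console.err.println(s))
    settings.usejavacp.value = true
    // ... hand `settings` to the interpreter loop ...
  }
}
{code}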
[jira] [Commented] (SPARK-12761) Clean up duplicated code in scala 2.11 repl.Main
[ https://issues.apache.org/jira/browse/SPARK-12761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092964#comment-15092964 ]

Jakob Odersky commented on SPARK-12761:
---------------------------------------

@vanzin, I think this is your area of expertise

> Clean up duplicated code in scala 2.11 repl.Main
> ------------------------------------------------
>
>                 Key: SPARK-12761
>                 URL: https://issues.apache.org/jira/browse/SPARK-12761
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Shell
>            Reporter: Jakob Odersky
>            Priority: Trivial
>
> There is duplicate code in {{/repl/scala-2.11/src/main/scala/org/apache/spark/repl/Main.scala}}.
> According to git blame, I moved the "settings" val to a method-local val; however, due to a subsequent merge it was reintroduced as a global val.
> Cf. https://github.com/apache/spark/blame/master/repl/scala-2.11/src/main/scala/org/apache/spark/repl/Main.scala, line 33.
[jira] [Commented] (SPARK-4257) Spark master can only be accessed by hostname
[ https://issues.apache.org/jira/browse/SPARK-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15088405#comment-15088405 ]

Jakob Odersky commented on SPARK-4257:
--------------------------------------

The way I interpret the documentation, {{-h HOST, --host HOST   Hostname to listen on}} requires a hostNAME, whereas SPARK_MASTER_IP is the actual IP (of the hostname). Although the documentation is somewhat ambiguous, I think what you are experiencing is expected behavior.

> Spark master can only be accessed by hostname
> ---------------------------------------------
>
>                 Key: SPARK-4257
>                 URL: https://issues.apache.org/jira/browse/SPARK-4257
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.2.0
>            Reporter: Davies Liu
>            Priority: Critical
>
> After sbin/start-all.sh, the spark shell can not connect to the standalone master
> by spark://IP:7077; it works if the IP is replaced by the hostname.
> In the docs [1], it says to use `spark://IP:PORT` to connect to the master.
> [1] http://spark.apache.org/docs/latest/spark-standalone.html
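To illustrate the hostname-vs-IP distinction being discussed (host and IP values below are made up; a sketch of standalone-mode usage, not output from the reporter's setup):

{code}
# The master binds to, and advertises itself under, a *hostname*:
./sbin/start-master.sh --host master-node     # or via SPARK_MASTER_IP

# Clients must then address the master exactly as it advertised itself:
./bin/spark-shell --master spark://master-node:7077     # connects
./bin/spark-shell --master spark://192.168.1.23:7077    # may be refused,
#   even if 192.168.1.23 is the address master-node resolves to
{code}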
[jira] [Comment Edited] (SPARK-4257) Spark master can only be accessed by hostname
[ https://issues.apache.org/jira/browse/SPARK-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15088405#comment-15088405 ]

Jakob Odersky edited comment on SPARK-4257 at 1/7/16 11:46 PM:
---------------------------------------------------------------

The way I interpret the documentation, "-h HOST, --host HOST   Hostname to listen on" requires a hostNAME, whereas SPARK_MASTER_IP is the actual IP (of the hostname). Although the documentation is somewhat ambiguous, I think what you are experiencing is expected behavior.


was (Author: jodersky):
The way I interpret the documentation {{-h HOST, --host HOSTHostname to listen on}} requires a hostNAME whereas SPARK_MASTER_IP is the actual ip (of the hostname). Although the documentation is somewhat ambiguous, I think what you are experiencing is expected behavior.

> Spark master can only be accessed by hostname
> ---------------------------------------------
>
>                 Key: SPARK-4257
>                 URL: https://issues.apache.org/jira/browse/SPARK-4257
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.2.0
>            Reporter: Davies Liu
>            Priority: Critical
>
> After sbin/start-all.sh, the spark shell can not connect to the standalone master
> by spark://IP:7077; it works if the IP is replaced by the hostname.
> In the docs [1], it says to use `spark://IP:PORT` to connect to the master.
> [1] http://spark.apache.org/docs/latest/spark-standalone.html
[jira] [Comment Edited] (SPARK-12648) UDF with Option[Double] throws ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-12648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15086388#comment-15086388 ]

Jakob Odersky edited comment on SPARK-12648 at 1/6/16 10:22 PM:
----------------------------------------------------------------

In spark-shell:
{code}
val df = sc.parallelize(List(("a", Some(4D)), ("b", None))).toDF("name", "weight")
df: org.apache.spark.sql.DataFrame = [name: string, weight: double]
{code}
You're getting a DataFrame containing doubles, not optional doubles. Not 100% sure, but I'm guessing that creating a DataFrame from Option types is syntactic sugar to avoid using nulls in client code. Spark then optimizes the option types to nullable or default values.


was (Author: jodersky):
In spark-shell:
{code}
val df = sc.parallelize(List(("a", Some(4D)), ("b", None))).toDF("name", "weight")
df: org.apache.spark.sql.DataFrame = [name: string, weight: double]
{/code}
You're getting a DataFrame containing doubles, not optional doubles. Not 100% sure, but I'm guessing that creating a DataFrame from Option types is syntactic sugar to avoid using nulls in client code. Spark then optimizes the option types to nullable or default values.

> UDF with Option[Double] throws ClassCastException
> -------------------------------------------------
>
>                 Key: SPARK-12648
>                 URL: https://issues.apache.org/jira/browse/SPARK-12648
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.0
>            Reporter: Mikael Valot
>
> I can write a UDF that returns an Option[Double], and the DataFrame's
> schema is correctly inferred to be a nullable double.
> However, I cannot seem to be able to write a UDF that takes an Option as an argument:
> {code}
> import org.apache.spark.sql.SQLContext
> import org.apache.spark.{SparkContext, SparkConf}
> val conf = new SparkConf().setMaster("local[4]").setAppName("test")
> val sc = new SparkContext(conf)
> val sqlc = new SQLContext(sc)
> import sqlc.implicits._
> val df = sc.parallelize(List(("a", Some(4D)), ("b", None))).toDF("name", "weight")
> import org.apache.spark.sql.functions._
> val addTwo = udf((d: Option[Double]) => d.map(_+2))
> df.withColumn("plusTwo", addTwo(df("weight"))).show()
> {code}
> =>
> {code}
> 2016-01-05T14:41:52 Executor task launch worker-0 ERROR org.apache.spark.executor.Executor Exception in task 0.0 in stage 1.0 (TID 1)
> java.lang.ClassCastException: java.lang.Double cannot be cast to scala.Option
> 	at $line14.$read$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$1.apply(<console>:18) ~[na:na]
> 	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source) ~[na:na]
> 	at org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$1.apply(basicOperators.scala:51) ~[spark-sql_2.10-1.6.0.jar:1.6.0]
> 	at org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$1.apply(basicOperators.scala:49) ~[spark-sql_2.10-1.6.0.jar:1.6.0]
> 	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) ~[scala-library-2.10.5.jar:na]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
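A common workaround for the UDF-argument limitation (my sketch, not from this thread, and untested against 1.6): accept the boxed {{java.lang.Double}}, which may be null when the column value is missing, and lift it into an {{Option}} inside the UDF body.

{code}
import org.apache.spark.sql.functions.udf

// Spark hands the UDF a (possibly null) boxed Double for a nullable double
// column; Option(...) turns null into None, recovering Option semantics.
val addTwo = udf { (d: java.lang.Double) =>
  Option(d).map(_ + 2): Option[Double]
}

// df.withColumn("plusTwo", addTwo(df("weight"))).show()
{code}

This sidesteps the ClassCastException because nothing is ever cast to {{scala.Option}} by the generated projection code; the null handling happens in user code.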