[jira] [Comment Edited] (SPARK-7286) Precedence of operator not behaving properly
[ https://issues.apache.org/jira/browse/SPARK-7286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014647#comment-15014647 ]

Jakob Odersky edited comment on SPARK-7286 at 11/19/15 10:52 PM:

I just realized that <> would also have a different precedence than ===. That pretty much limits our options to 1 or 3.

was (Author: jodersky):
I just realized that <> would also have a different precedence than ===

> Precedence of operator not behaving properly
>
> Key: SPARK-7286
> URL: https://issues.apache.org/jira/browse/SPARK-7286
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.3.1
> Environment: Linux
> Reporter: DevilJetha
> Priority: Critical
>
> The precedence of the operators (especially with !== and &&) in Dataframe Columns seems to be messed up.
> Example snippet:
> .where( $"col1" === "val1" && ($"col2" !== "val2") ) works fine,
> whereas .where( $"col1" === "val1" && $"col2" !== "val2" )
> evaluates as ( $"col1" === "val1" && $"col2" ) !== "val2"

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11288) Specify the return type for UDF in Scala
[ https://issues.apache.org/jira/browse/SPARK-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011855#comment-15011855 ]

Jakob Odersky commented on SPARK-11288:

Tell me if I'm missing the point, but you can also pass type parameters explicitly when calling udfs:
{code}
df.udf[ReturnType, Arg1, Arg2]((arg1, arg2) => ret)
{code}

> Specify the return type for UDF in Scala
>
> Key: SPARK-11288
> URL: https://issues.apache.org/jira/browse/SPARK-11288
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Reporter: Davies Liu
>
> The return type is figured out from the function signature; maybe it's not what the user wants. For example, the default DecimalType is (38, 18), but the user may want (38, 0).
> The older, deprecated callUDF can do that; we should figure out a way to support that.
> cc [~marmbrus]
[jira] [Created] (SPARK-11832) Spark shell does not work from sbt with scala 2.11
Jakob Odersky created SPARK-11832:

Summary: Spark shell does not work from sbt with Scala 2.11
Key: SPARK-11832
URL: https://issues.apache.org/jira/browse/SPARK-11832
Project: Spark
Issue Type: Bug
Components: Spark Shell
Reporter: Jakob Odersky
Priority: Minor

Using Scala 2.11, running the spark shell task from within sbt fails; running it from a distribution, however, works.

h3. Steps to reproduce
# change the Scala version: {{dev/change-scala-version.sh 2.11}}
# start sbt: {{build/sbt -Dscala-2.11}}
# run the shell task: {{sparkShell}}

h3. Stacktrace
{code}
Failed to initialize compiler: object scala in compiler mirror not found.
** Note that as of 2.8 scala does not assume use of the java classpath.
** For the old behavior pass -usejavacp to scala, or if using a Settings
** object programmatically, settings.usejavacp.value = true.
Exception in thread "main" java.lang.NullPointerException
	at scala.reflect.internal.SymbolTable.exitingPhase(SymbolTable.scala:256)
	at scala.tools.nsc.interpreter.IMain$Request.x$20$lzycompute(IMain.scala:894)
	at scala.tools.nsc.interpreter.IMain$Request.x$20(IMain.scala:893)
	at scala.tools.nsc.interpreter.IMain$Request.importsPreamble$lzycompute(IMain.scala:893)
	at scala.tools.nsc.interpreter.IMain$Request.importsPreamble(IMain.scala:893)
	at scala.tools.nsc.interpreter.IMain$Request$Wrapper.preamble(IMain.scala:915)
	at scala.tools.nsc.interpreter.IMain$CodeAssembler$$anonfun$apply$23.apply(IMain.scala:1325)
	at scala.tools.nsc.interpreter.IMain$CodeAssembler$$anonfun$apply$23.apply(IMain.scala:1324)
	at scala.tools.nsc.util.package$.stringFromWriter(package.scala:64)
	at scala.tools.nsc.interpreter.IMain$CodeAssembler$class.apply(IMain.scala:1324)
	at scala.tools.nsc.interpreter.IMain$Request$Wrapper.apply(IMain.scala:906)
	at scala.tools.nsc.interpreter.IMain$Request.compile$lzycompute(IMain.scala:995)
	at scala.tools.nsc.interpreter.IMain$Request.compile(IMain.scala:990)
	at scala.tools.nsc.interpreter.IMain.compile(IMain.scala:577)
	at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
	at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:563)
	at scala.tools.nsc.interpreter.ILoop.reallyInterpret$1(ILoop.scala:802)
	at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:836)
	at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:694)
	at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:404)
	at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply$mcZ$sp(SparkILoop.scala:39)
	at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:38)
	at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:38)
	at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:213)
	at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:38)
	at org.apache.spark.repl.SparkILoop.loadFiles(SparkILoop.scala:94)
	at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:922)
	at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:911)
	at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:911)
	at scala.reflect.internal.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:97)
	at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:911)
	at org.apache.spark.repl.Main$.main(Main.scala:49)
	at org.apache.spark.repl.Main.main(Main.scala)
{code}

h3. Workaround
In {{repl/scala-2.11/src/main/scala/org/apache/spark/repl/Main.scala}}, append {{s.usejavacp.value = true}} to the repl settings.

I haven't looked into the details of {{scala.tools.nsc.Settings}}; maybe someone has an idea of what's going on.
Also, to be clear, this bug only affects Scala 2.11 from within sbt; calling spark-shell from a distribution, or from anywhere using Scala 2.10, works.
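As a sketch of what the workaround amounts to, the compiler's {{Settings}} object can be configured programmatically before the interpreter starts. This is only illustrative (it requires the scala-compiler artifact on the classpath and mirrors, not reproduces, the change the issue suggests for {{Main.scala}}):

```scala
// Programmatic equivalent of the -usejavacp command-line flag:
// tell the interpreter to use the JVM's classpath instead of a
// separately supplied one. Requires the scala-compiler dependency.
import scala.tools.nsc.Settings

val s = new Settings()
s.usejavacp.value = true
```

The error message in the stack trace above points at exactly this setting, which is why appending it to the repl settings makes the shell start.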
[jira] [Commented] (SPARK-7286) Precedence of operator not behaving properly
[ https://issues.apache.org/jira/browse/SPARK-7286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012643#comment-15012643 ]

Jakob Odersky commented on SPARK-7286:

Going through the code, I saw that catalyst also defines !== in its DSL, so it seems this operator has quite widespread usage. Would deprecating it in favor of something else be a viable option?

> Precedence of operator not behaving properly
[jira] [Commented] (SPARK-7286) Precedence of operator not behaving properly
[ https://issues.apache.org/jira/browse/SPARK-7286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012532#comment-15012532 ]

Jakob Odersky commented on SPARK-7286:

The problem is that !== is recognized as an assignment operator (according to §6.12.4 of the Scala specification, http://www.scala-lang.org/docu/files/ScalaReference.pdf) and thus has lower precedence than any other operator. A potential fix could be to rename !== to =!=.

> Precedence of operator not behaving properly
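The precedence difference can be demonstrated in plain Scala, independent of Spark. The {{Expr}} class below is a hypothetical stand-in for Spark's {{Column}} whose {{repr}} records how the parser grouped the expression:

```scala
// Minimal sketch (not Spark's Column class): repr shows parenthesization.
case class Expr(repr: String) {
  def ===(other: Expr): Expr = Expr(s"(${repr} === ${other.repr})")
  def !==(other: Expr): Expr = Expr(s"(${repr} !== ${other.repr})")
  def =!=(other: Expr): Expr = Expr(s"(${repr} =!= ${other.repr})")
  def &&(other: Expr): Expr  = Expr(s"(${repr} && ${other.repr})")
}

val c1 = Expr("c1"); val v1 = Expr("v1")
val c2 = Expr("c2"); val v2 = Expr("v2")

// !== ends in '=' and is treated as an assignment operator (§6.12.4),
// so it binds more loosely than &&:
val unparenthesized = c1 === v1 && c2 !== v2
println(unparenthesized.repr) // (((c1 === v1) && c2) !== v2)

// =!= starts with '=', giving it the same precedence level as ===,
// which binds tighter than &&:
val renamed = c1 === v1 && c2 =!= v2
println(renamed.repr) // ((c1 === v1) && (c2 =!= v2))
```

This is why renaming the operator, rather than documenting the pitfall, actually fixes the reported mis-parse.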
[jira] [Comment Edited] (SPARK-7286) Precedence of operator not behaving properly
[ https://issues.apache.org/jira/browse/SPARK-7286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012532#comment-15012532 ]

Jakob Odersky edited comment on SPARK-7286 at 11/19/15 1:16 AM:

The problem is that !== is recognized as an assignment operator (according to §6.12.4 of the Scala specification, http://www.scala-lang.org/docu/files/ScalaReference.pdf) and thus has lower precedence than any other operator. A potential fix could be to rename !== to =!=.

was (Author: jodersky):
The problem is that !== is recognized as an assignment operator (according to §6.12.4 of the scala specification http://www.scala-lang.org/docu/files/ScalaReference.pdf) and thus has lower precedence than any other operator. A potential fix could be to rename !== to =!=

> Precedence of operator not behaving properly
[jira] [Commented] (SPARK-11832) Spark shell does not work from sbt with scala 2.11
[ https://issues.apache.org/jira/browse/SPARK-11832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012372#comment-15012372 ]

Jakob Odersky commented on SPARK-11832:

I'm on to something: it seems as though sbt actually passes a '-usejavacp' argument to the repl, but the Scala 2.11 spark-shell implementation ignores all arguments. I'm working on a fix.

> Spark shell does not work from sbt with scala 2.11
[jira] [Commented] (SPARK-9875) Do not evaluate foreach and foreachPartition with count
[ https://issues.apache.org/jira/browse/SPARK-9875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009891#comment-15009891 ]

Jakob Odersky commented on SPARK-9875:

I'm not sure I understand the issue. Are you trying to force running {{func}} on every partition to achieve some kind of side effect?

> Do not evaluate foreach and foreachPartition with count
>
> Key: SPARK-9875
> URL: https://issues.apache.org/jira/browse/SPARK-9875
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Reporter: Zoltán Zvara
> Priority: Minor
>
> It is evident that the summation inside count will result in an overhead, which would be nice to remove from the current execution.
> {{self.mapPartitions(func).count() # Force evaluation}} @ {{rdd.py}}
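For side effects per partition, the Scala API already offers an action that avoids the count aggregation the issue complains about. The sketch below assumes a live {{SparkContext}} named {{sc}} and the spark-core dependency, so it is illustrative rather than standalone-runnable:

```scala
// foreachPartition is itself an action: it forces evaluation of the
// per-partition function without mapPartitions(...).count()'s summation.
val rdd = sc.parallelize(1 to 100, numSlices = 4)

rdd.foreachPartition { iter =>
  // side effect per partition, e.g. writing to an external sink
  println(s"processed a partition of ${iter.length} elements")
}
```

This is the Scala analogue of what the quoted PySpark line in {{rdd.py}} forces via {{count()}}.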
[jira] [Commented] (SPARK-11765) Avoid assign UI port between browser unsafe ports (or just 4045: lockd)
[ https://issues.apache.org/jira/browse/SPARK-11765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007740#comment-15007740 ]

Jakob Odersky commented on SPARK-11765:

I think adding a "blacklist" of ports could lead to confusing debugging experiences with no real value gained. You can always explicitly set the web UI's port with the configuration parameter {{spark.ui.port}}, as explained here: http://spark.apache.org/docs/latest/configuration.html

> Avoid assign UI port between browser unsafe ports (or just 4045: lockd)
>
> Key: SPARK-11765
> URL: https://issues.apache.org/jira/browse/SPARK-11765
> Project: Spark
> Issue Type: Improvement
> Reporter: Jungtaek Lim
> Priority: Minor
>
> Spark UI port starts on 4040, and UI port is incremented by 1 for every confliction.
> In our use case, we have some drivers running at the same time, which makes UI port to be assigned to 4045, which is treated as an unsafe port for Chrome and Mozilla.
> http://src.chromium.org/viewvc/chrome/trunk/src/net/base/net_util.cc?view=markup
> http://www-archive.mozilla.org/projects/netlib/PortBanning.html#portlist
> We would like to avoid assigning UI to these ports, or just avoid assigning UI port to 4045, which is too close to the default port.
> If we'd like to accept this idea, I'm happy to work on it.
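Setting {{spark.ui.port}} explicitly, as the comment suggests, looks like the following. This is a configuration sketch (the app name is illustrative, and it needs spark-core to actually run):

```scala
// Pin the web UI to a specific port via the documented spark.ui.port
// setting, steering clear of the browser-unsafe 4045.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("my-app")          // illustrative name
  .set("spark.ui.port", "4050")  // any free, browser-safe port
```

The same setting can also be passed on the command line, e.g. via {{--conf spark.ui.port=4050}} to spark-submit.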
[jira] [Commented] (SPARK-11688) UDF's doesn't work when it has a default arguments
[ https://issues.apache.org/jira/browse/SPARK-11688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003600#comment-15003600 ]

Jakob Odersky commented on SPARK-11688:

Registering a UDF requires a function (an instance of FunctionX); however, only defs support default parameters. Let me illustrate: {{hasSubstring _}} is equivalent to {{(x: String, y: String, z: Int) => hasSubstring(x, y, z)}}, which is only syntactic sugar for
{code}
new Function3[String, String, Int, Long] {
  def apply(x: String, y: String, z: Int) = hasSubstring(x, y, z)
}
{code}
Therefore, the error is expected, since you are trying to call a Function3 with only two parameters.

With the current API, and without trying some macro-magic, I see no way of enabling default parameters for UDFs. Maybe changing the register API to something like {{register((X, Y, Z) => R, defaults)}} could work, where {{defaults}} would supply the arguments to any non-specified parameters when the UDF is called. However, this could also lead to some very subtle errors, as any substituted default parameters would have the value specified during registration, potentially different from a default parameter specified in a corresponding def declaration.

> UDF's doesn't work when it has a default arguments
>
> Key: SPARK-11688
> URL: https://issues.apache.org/jira/browse/SPARK-11688
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: M Bharat lal
> Priority: Minor
>
> Use case:
>
> Suppose we have a function which accepts three parameters (string, subString and frmIndex, which has a default value of 0):
> def hasSubstring(string: String, subString: String, frmIndex: Int = 0): Long = string.indexOf(subString, frmIndex)
> The above function works perfectly if I don't pass the frmIndex parameter:
> scala> hasSubstring("Scala", "la")
> res0: Long = 3
> But when I register the above function as a UDF (successfully registered) and call it without passing the frmIndex parameter, I get the below exception:
> scala> val df = sqlContext.createDataFrame(Seq(("scala","Spark","MLlib"),("abc", "def", "gfh"))).toDF("c1", "c2", "c3")
> df: org.apache.spark.sql.DataFrame = [c1: string, c2: string, c3: string]
> scala> df.show
> +-----+-----+-----+
> |   c1|   c2|   c3|
> +-----+-----+-----+
> |scala|Spark|MLlib|
> |  abc|  def|  gfh|
> +-----+-----+-----+
> scala> sqlContext.udf.register("hasSubstring", hasSubstring _ )
> res3: org.apache.spark.sql.UserDefinedFunction = UserDefinedFunction(,LongType,List())
> scala> val result = df.as("i0").withColumn("subStringIndex", callUDF("hasSubstring", $"i0.c1", lit("la")))
> org.apache.spark.sql.AnalysisException: undefined function hasSubstring;
> at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2$$anonfun$1.apply(hiveUDFs.scala:58)
> at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2$$anonfun$1.apply(hiveUDFs.scala:58)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2.apply(hiveUDFs.scala:57)
> at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2.apply(hiveUDFs.scala:53)
> at scala.util.Try.getOrElse(Try.scala:77)
> at org.apache.spark.sql.hive.HiveFunctionRegistry.lookupFunction(hiveUDFs.scala:53)
> at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5$$anonfun$applyOrElse$23.apply(Analyzer.scala:490)
> at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5$$anonfun$applyOrElse$23.apply(Analyzer.scala:490)
> at org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:48)
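The eta-expansion behavior described in the comment above can be demonstrated in plain Scala, without Spark:

```scala
// A def can carry a default parameter:
def hasSubstring(string: String, subString: String, frmIndex: Int = 0): Long =
  string.indexOf(subString, frmIndex)

// Calling the def directly honors the default:
val direct = hasSubstring("Scala", "la")
println(direct) // 3

// `hasSubstring _` eta-expands to a Function3 value; the default is lost,
// because a FunctionN's arity is fixed at N:
val f: (String, String, Int) => Long = hasSubstring _
println(f("Scala", "la", 0)) // 3
// f("Scala", "la")  // does not compile: a Function3 always takes 3 arguments
```

Since UDF registration only ever sees the Function3 value, the default cannot survive registration.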
[jira] [Commented] (SPARK-11688) UDF's doesn't work when it has a default arguments
[ https://issues.apache.org/jira/browse/SPARK-11688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003603#comment-15003603 ]

Jakob Odersky commented on SPARK-11688:

As a workaround, you could register your function twice:
{code}
sqlContext.udf.register("hasSubstring", (x: String, y: String) => hasSubstring(x, y)) // this will use the default argument
sqlContext.udf.register("hasSubstringIndex", (x: String, y: String, z: Int) => hasSubstring(x, y, z))
{code}
and call them accordingly.

> UDF's doesn't work when it has a default arguments
Re: Why there's no api for SparkContext#textFiles to support multiple inputs ?
Hey Jeff,

Do you mean reading from multiple text files? In that case, as a workaround, you can use the RDD#union() (or ++) method to concatenate multiple RDDs. For example:

val lines1 = sc.textFile("file1")
val lines2 = sc.textFile("file2")
val rdd = lines1 union lines2

regards,
--Jakob

On 11 November 2015 at 01:20, Jeff Zhang wrote:
> Although user can use the hdfs glob syntax to support multiple inputs, sometimes it is not convenient to do that. Not sure why there's no api of SparkContext#textFiles. It should be easy to implement that. I'd love to create a ticket and contribute for that if there's no other consideration that I don't know.
>
> --
> Best Regards
>
> Jeff Zhang
>
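The helper the thread is asking about can be sketched on top of union. Note that {{textFiles}} here is hypothetical (it is not an actual SparkContext method), and the sketch needs a live {{SparkContext}} plus the spark-core dependency, so it won't run standalone:

```scala
// Hypothetical SparkContext#textFiles, built by unioning one RDD per path.
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

def textFiles(sc: SparkContext, paths: Seq[String]): RDD[String] =
  paths.map(sc.textFile(_)).reduce(_ union _)

// Usage: val lines = textFiles(sc, Seq("file1", "file2", "file3"))
```

As an aside, the underlying Hadoop input format also accepts comma-separated path lists, so something like {{sc.textFile("file1,file2")}} may already cover simple cases.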
Re: Status of 2.11 support?
Hi Sukant,

Regarding the first point: when building Spark during my daily work, I always use Scala 2.11 and have only run into build problems once. Assuming a working build, I have never had any issues with the resulting artifacts.

More generally, however, I would advise you to go with Scala 2.11 under all circumstances. Scala 2.10 has reached end-of-life and, from what I make out of your question, you have the opportunity to switch to a newer technology, so why stay with legacy? Furthermore, Scala 2.12 will be coming out early next year, so I reckon that Spark will switch to Scala 2.11 by default pretty soon*.

regards,
--Jakob

*I'm myself pretty new to the Spark community, so please don't take my word on it as gospel

On 11 November 2015 at 15:25, Ted Yu wrote:
> For #1, the published jars are usable.
> However, you should build from source for your specific combination of profiles.
>
> Cheers
>
> On Wed, Nov 11, 2015 at 3:22 PM, shajra-cogscale <sha...@cognitivescale.com> wrote:
>
>> Hi,
>>
>> My company isn't using Spark in production yet, but we are using a bit of Scala. There are a few people who have wanted to be conservative and keep our Scala at 2.10 in the event we start using Spark. There are others who want to move to 2.11 with the idea that by the time we're using Spark it will be more or less 2.11-ready.
>>
>> It's hard to make a strong judgement on these kinds of things without getting some community feedback.
>>
>> Looking through the internet I saw:
>>
>> 1) There's advice to build 2.11 packages from source -- but also published jars to Maven Central for 2.11. Are these jars on Maven Central usable, and is the advice to build from source outdated?
>>
>> 2) There's a note that the JDBC RDD isn't 2.11-compliant. This is okay for us, but is there anything else to worry about?
>>
>> It would be nice to get some answers to those questions as well as any other feedback from maintainers or anyone that's used Spark with Scala 2.11 beyond simple examples.
>>
>> Thanks,
>> Sukant
>>
>> --
>> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Status-of-2-11-support-tp25362.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
Re: Spark Packages Configuration Not Found
As another, general question: are spark packages the go-to way of extending Spark functionality? In my specific use case I would like to start Spark (be it spark-shell or other) and hook into the listener API. Since I wasn't able to find much documentation about spark packages, I was wondering whether they are still actively being developed.

thanks,
--Jakob

On 10 November 2015 at 14:58, Jakob Odersky <joder...@gmail.com> wrote:
> (accidental keyboard-shortcut sent the message)
> ... spark-shell from the spark 1.5.2 binary distribution.
> Also, running "spPublishLocal" has the same effect.
>
> thanks,
> --Jakob
>
> On 10 November 2015 at 14:55, Jakob Odersky <joder...@gmail.com> wrote:
>
>> Hi,
>> I ran into an error trying to run spark-shell with an external package that I built and published locally using the spark-package sbt plugin (https://github.com/databricks/sbt-spark-package).
>>
>> To my understanding, spark packages can be published simply as maven artifacts, yet after running "publishLocal" in my package project (https://github.com/jodersky/spark-paperui), the following command
>>
>>   spark-shell --packages ch.jodersky:spark-paperui-server_2.10:0.1-SNAPSHOT
>>
>> gives an error:
>>
>> :: UNRESOLVED DEPENDENCIES ::
>> :: ch.jodersky#spark-paperui-server_2.10;0.1: configuration not found in ch.jodersky#spark-paperui-server_2.10;0.1: 'default'. It was required from org.apache.spark#spark-submit-parent;1.0 default
>>
>> :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
>> Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: ch.jodersky#spark-paperui-server_2.10;0.1: configuration not found in ch.jodersky#spark-paperui-server_2.10;0.1: 'default'. It was required from org.apache.spark#spark-submit-parent;1.0 default]
>> at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1011)
>> at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:286)
>> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:153)
>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:12
>>
>> Do I need to include some default configuration? If so, where and how should I do it? All other packages I looked at had no such thing.
>>
>> Btw, I am using spark-shell from a
>>
>
Re: Slow stage?
Hi Simone,

I'm afraid I don't have an answer to your question. However, I noticed the DAG figures in the attachment. How did you generate these? I am myself working on a project in which I am trying to generate visual representations of the Spark scheduler DAG. If such a tool already exists, I would greatly appreciate any pointers.

thanks,
--Jakob

On 9 November 2015 at 13:52, Simone Franzini wrote:
> Hi all,
>
> I have a complex Spark job that is broken up in many stages.
> I have a couple of stages that are particularly slow: each task takes around 6 - 7 minutes. This stage is fairly complex, as you can see from the attached DAG. However, by construction, each of the outer joins will have only 0 or 1 record on each side.
> It seems to me that this stage is really slow. However, the execution timeline shows that almost 100% of the time is spent in actual execution time, not reading/writing to/from disk or in other overheads.
> Does this make any sense? I.e. is it just that these operations are slow (and note that task size in terms of data seems small)?
> Is the pattern of operations in the DAG good, or is it terribly suboptimal? If so, how could it be improved?
>
>
> Simone Franzini, PhD
>
> http://www.linkedin.com/in/simonefranzini
Re: Spark Packages Configuration Not Found
(accidental keyboard-shortcut sent the message) ... spark-shell from the spark 1.5.2 binary distribution. Also, running "spPublishLocal" has the same effect. thanks, --Jakob On 10 November 2015 at 14:55, Jakob Odersky <joder...@gmail.com> wrote: > Hi, > I ran into an error trying to run spark-shell with an external package > that I built and published locally > using the spark-package sbt plugin ( > https://github.com/databricks/sbt-spark-package). > > To my understanding, spark packages can be published simply as maven > artifacts, yet after running "publishLocal" in my package project ( > https://github.com/jodersky/spark-paperui), the following command > > spark-shell --packages ch.jodersky:spark-paperui-server_2.10:0.1-SNAPSHOT > > gives an error: > > :: > > :: UNRESOLVED DEPENDENCIES :: > > :: > > :: ch.jodersky#spark-paperui-server_2.10;0.1: configuration not > found in ch.jodersky#spark-paperui-server_2.10;0.1: 'default'. It was > required from org.apache.spark#spark-submit-parent;1.0 default > > :: > > > :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS > Exception in thread "main" java.lang.RuntimeException: [unresolved > dependency: ch.jodersky#spark-paperui-server_2.10;0.1: configuration not > found in ch.jodersky#spark-paperui-server_2.10;0.1: 'default'. It was > required from org.apache.spark#spark-submit-parent;1.0 default] > at > org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1011) > at > org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:286) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:153) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:12 > > Do I need to include some default configuration? If so, where and how > should I do it? All other packages I looked at had no such thing. > > Btw, I am using spark-shell from a > >
Spark Packages Configuration Not Found
Hi, I ran into an error trying to run spark-shell with an external package that I built and published locally using the spark-package sbt plugin ( https://github.com/databricks/sbt-spark-package). To my understanding, spark packages can be published simply as maven artifacts, yet after running "publishLocal" in my package project ( https://github.com/jodersky/spark-paperui), the following command spark-shell --packages ch.jodersky:spark-paperui-server_2.10:0.1-SNAPSHOT gives an error: :: :: UNRESOLVED DEPENDENCIES :: :: :: ch.jodersky#spark-paperui-server_2.10;0.1: configuration not found in ch.jodersky#spark-paperui-server_2.10;0.1: 'default'. It was required from org.apache.spark#spark-submit-parent;1.0 default :: :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: ch.jodersky#spark-paperui-server_2.10;0.1: configuration not found in ch.jodersky#spark-paperui-server_2.10;0.1: 'default'. It was required from org.apache.spark#spark-submit-parent;1.0 default] at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1011) at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:286) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:153) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:12 Do I need to include some default configuration? If so, where and how should I do it? All other packages I looked at had no such thing. Btw, I am using spark-shell from a
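For what it's worth, one avenue to investigate (purely a guess, not a confirmed fix): sbt's publishLocal emits ivy-style metadata, and the 'default' configuration the resolver complains about may simply be absent from that ivy.xml. Forcing maven-style metadata, or publishing to the local maven repository, sidesteps ivy configurations entirely. A hypothetical build.sbt fragment:

```scala
// Hypothetical build.sbt fragment (assumption: the resolver chokes on the
// ivy.xml configurations of the locally published artifact). With
// publishMavenStyle a pom is generated, which always exposes the standard
// 'default' configuration that spark-submit's resolver requests.
publishMavenStyle := true
// Alternatively, sbt's built-in publishM2 task pushes the artifact to
// ~/.m2/repository in maven layout, bypassing the ivy metadata altogether.
```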
Re: State of the Build
Reposting to the list... Thanks for all the feedback everyone, I get a clearer picture of the reasoning and implications now. Koert, according to your post in this thread http://apache-spark-developers-list.1001551.n3.nabble.com/Master-build-fails-tt14895.html#a15023, it is apparently very easy to change the maven resolution mechanism to the ivy one. Patrick, would this not help with the problems you described? On 5 November 2015 at 23:23, Patrick Wendell <pwend...@gmail.com> wrote: > Hey Jakob, > > The builds in Spark are largely maintained by me, Sean, and Michael > Armbrust (for SBT). For historical reasons, Spark supports both a Maven and > SBT build. Maven is the build of reference for packaging Spark and is used > by many downstream packagers and to build all Spark releases. SBT is more > often used by developers. Both builds inherit from the same pom files (and > rely on the same profiles) to minimize maintenance complexity of Spark's > very complex dependency graph. > > If you are looking to make contributions that help with the build, I am > happy to point you towards some things that are consistent maintenance > headaches. There are two major pain points right now that I'd be thrilled > to see fixes for: > > 1. SBT relies on a different dependency conflict resolution strategy than > maven - causing all kinds of headaches for us. I have heard that newer > versions of SBT can (maybe?) use Maven as a dependency resolver instead of > Ivy. This would make our life so much better if it were possible, either by > virtue of upgrading SBT or somehow doing this ourselves. > > 2. We don't have a great way of auditing the net effect of dependency > changes when people make them in the build. I am working on a fairly clunky > patch to do this here: > > https://github.com/apache/spark/pull/8531 > > It could be done much more nicely using SBT, but only provided (1) is > solved. 
> > Doing a major overhaul of the sbt build to decouple it from pom files, I'm > not sure that's the best place to start, given that we need to continue to > support maven - the coupling is intentional. But getting involved in the > build in general would be completely welcome. > > - Patrick > > On Thu, Nov 5, 2015 at 10:53 PM, Sean Owen <so...@cloudera.com> wrote: > >> Maven isn't 'legacy', or supported for the benefit of third parties. >> SBT had some behaviors / problems that Maven didn't relative to what >> Spark needs. SBT is a development-time alternative only, and partly >> generated from the Maven build. >> >> On Fri, Nov 6, 2015 at 1:48 AM, Koert Kuipers <ko...@tresata.com> wrote: >> > People who do upstream builds of spark (think bigtop and hadoop >> distros) are >> > used to legacy systems like maven, so maven is the default build. I >> don't >> > think it will change. >> > >> > Any improvements for the sbt build are of course welcome (it is still >> used >> > by many developers), but i would not do anything that increases the >> burden >> > of maintaining two build systems. >> > >> > On Nov 5, 2015 18:38, "Jakob Odersky" <joder...@gmail.com> wrote: >> >> >> >> Hi everyone, >> >> in the process of learning Spark, I wanted to get an overview of the >> >> interaction between all of its sub-projects. I therefore decided to >> have a >> >> look at the build setup and its dependency management. >> >> Since I am alot more comfortable using sbt than maven, I decided to >> try to >> >> port the maven configuration to sbt (with the help of automated tools). >> >> This led me to a couple of observations and questions on the build >> system >> >> design: >> >> >> >> First, currently, there are two build systems, maven and sbt. Is there >> a >> >> preferred tool (or future direction to one)? >> >> >> >> Second, the sbt build also uses maven "profiles" requiring the use of >> >> specific commandline parameters when starting sbt. 
Furthermore, since >> it >> >> relies on maven poms, dependencies to the scala binary version (_2.xx) >> are >> >> hardcoded and require running an external script when switching >> versions. >> >> Sbt could leverage built-in constructs to support cross-compilation and >> >> emulate profiles with configurations and new build targets. This would >> >> remove external state from the build (in that no extra steps need to be >> >> performed in a particular order to generate artifacts for a new >> >> configuration) and therefore improve stability and build >> reproducibility >> >> (maybe even build performance). I was wondering if implementing such >> >> functionality for the sbt build would be welcome? >> >> >> >> thanks, >> >> --Jakob >> >> - >> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org >> For additional commands, e-mail: dev-h...@spark.apache.org >> >> >
State of the Build
Hi everyone, in the process of learning Spark, I wanted to get an overview of the interaction between all of its sub-projects. I therefore decided to have a look at the build setup and its dependency management. Since I am a lot more comfortable using sbt than maven, I decided to try to port the maven configuration to sbt (with the help of automated tools). This led me to a couple of observations and questions on the build system design: First, currently, there are two build systems, maven and sbt. Is there a preferred tool (or future direction to one)? Second, the sbt build also uses maven "profiles" requiring the use of specific commandline parameters when starting sbt. Furthermore, since it relies on maven poms, dependencies to the scala binary version (_2.xx) are hardcoded and require running an external script when switching versions. Sbt could leverage built-in constructs to support cross-compilation and emulate profiles with configurations and new build targets. This would remove external state from the build (in that no extra steps need to be performed in a particular order to generate artifacts for a new configuration) and therefore improve stability and build reproducibility (maybe even build performance). I was wondering if implementing such functionality for the sbt build would be welcome? thanks, --Jakob
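To make the second point concrete, here is a rough sketch of sbt's built-in cross-building (version numbers are illustrative; this is not Spark's actual build definition):

```scala
// Hypothetical build.sbt fragment: declaring the Scala versions as a build
// axis replaces an external script such as change-version-to-2.11.sh.
scalaVersion := "2.10.4"
crossScalaVersions := Seq("2.10.4", "2.11.7")
// Prefixing a task with '+' (e.g. "+package" or "+publishLocal") runs it
// once per listed Scala version, deriving the _2.xx artifact suffix
// automatically instead of hardcoding it in pom files.
```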
Re: Turn off logs in spark-sql shell
[repost to mailing list, ok I gotta really start hitting that reply-to-all-button] Hi, Spark uses Log4j which unfortunately does not support fine-grained configuration over the command line. Therefore some configuration file editing will have to be done (unless you want to configure Loggers programmatically, which however would require editing spark-sql). Nevertheless, there seems to be a kind of "trick" where you can substitute Java system properties in the log4j configuration file. See this stackoverflow answer for details http://stackoverflow.com/a/31208461/917519. After editing the properties file, you can then start spark-sql with: bin/spark-sql --conf "spark.driver.extraJavaOptions=-Dmy.logger.threshold=OFF" this is untested but I hope it helps, --Jakob On 15 October 2015 at 22:56, Muhammad Ahsan wrote: > Hello Everyone! > > I want to know how to turn off logging during starting *spark-sql shell* > without changing log4j configuration files. In normal spark-shell I can use > the following commands > > import org.apache.log4j.Loggerimport org.apache.log4j.Level > Logger.getLogger("org").setLevel(Level.OFF)Logger.getLogger("akka").setLevel(Level.OFF) > > > Thanks > > -- > Thanks > > Muhammad Ahsan > >
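The substitution trick could look roughly like this (the property name my.logger.threshold is invented here, matching the command line in the message above):

```properties
# Hypothetical log4j.properties fragment: the root threshold is read from a
# JVM system property instead of being hardcoded in the file.
log4j.rootCategory=${my.logger.threshold}, console
```

Passing -Dmy.logger.threshold=OFF through spark.driver.extraJavaOptions then silences the root logger per invocation, without editing the file each time.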
Re: Insight into Spark Packages
[repost to mailing list] I don't know much about packages, but have you heard about the sbt-spark-package plugin? Looking at the code, specifically https://github.com/databricks/sbt-spark-package/blob/master/src/main/scala/sbtsparkpackage/SparkPackagePlugin.scala, might give you insight on the details about package creation. Package submission is implemented in https://github.com/databricks/sbt-spark-package/blob/master/src/main/scala/sbtsparkpackage/SparkPackageHttp.scala At a quick first overview, it seems packages are bundled as maven artifacts and then posted to "http://spark-packages.org/api/submit-release". Hope this helps for your last question. On 16 October 2015 at 08:43, jeff saremi wrote: > I'm looking for any form of documentation on Spark Packages > Specifically, what happens when one issues a command like the following: > > > $SPARK_HOME/bin/spark-shell --packages RedisLabs:spark-redis:0.1.0 > > > Something like an architecture diagram. > What happens when this package gets submitted? > Does this need to be done each time? > Is that package downloaded each time? > Is there a persistent cache on the server (master i guess)? > Can these packages be installed offline with no Internet connectivity? > How does a package get created? > > and so on and so forth >
Re: Building with SBT and Scala 2.11
[Repost to mailing list] Hey, Sorry about the typo, I of course meant hadoop-2.6, not 2.11. I suspect something bad happened with my Ivy cache, since when reverting back to scala 2.10, I got a very strange IllegalStateException (something something IvyNode, I can't remember the details). Killing the cache made 2.10 work at least, I'll retry with 2.11 Thx for your help On Oct 14, 2015 6:52 AM, "Ted Yu" <yuzhih...@gmail.com> wrote: > Adrian: > Likely you were using maven. > > Jakob's report was with sbt. > > Cheers > > On Tue, Oct 13, 2015 at 10:05 PM, Adrian Tanase <atan...@adobe.com> wrote: > >> Do you mean hadoop-2.4 or 2.6? not sure if this is the issue but I'm also >> compiling the 1.5.1 version with scala 2.11 and hadoop 2.6 and it works. >> >> -adrian >> >> Sent from my iPhone >> >> On 14 Oct 2015, at 03:53, Jakob Odersky <joder...@gmail.com> wrote: >> >> I'm having trouble compiling Spark with SBT for Scala 2.11. The command I >> use is: >> >> dev/change-version-to-2.11.sh >> build/sbt -Pyarn -Phadoop-2.11 -Dscala-2.11 >> >> followed by >> >> compile >> >> in the sbt shell. >> >> The error I get specifically is: >> >> spark/core/src/main/scala/org/apache/spark/rpc/netty/NettyRpcEnv.scala:308: >> no valid targets for annotation on value conf - it is discarded unused. You >> may specify targets with meta-annotations, e.g. @(transient @param) >> [error] private[netty] class NettyRpcEndpointRef(@transient conf: >> SparkConf) >> [error] >> >> However I am also getting a large amount of deprecation warnings, making >> me wonder if I am supplying some incompatible/unsupported options to sbt. I >> am using Java 1.8 and the latest Spark master sources. >> Does someone know if I am doing anything wrong or is the sbt build broken? >> >> thanks for your help, >> --Jakob >> >> >
[jira] [Commented] (SPARK-11110) Scala 2.11 build fails due to compiler errors
[ https://issues.apache.org/jira/browse/SPARK-11110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14957721#comment-14957721 ] Jakob Odersky commented on SPARK-11110: --- exactly what I got, I'll take a look at it > Scala 2.11 build fails due to compiler errors > - > > Key: SPARK-11110 > URL: https://issues.apache.org/jira/browse/SPARK-11110 > Project: Spark > Issue Type: Bug > Components: Build >Reporter: Patrick Wendell > > Right now the 2.11 build is failing due to compiler errors in SBT (though not > in Maven). I have updated our 2.11 compile test harness to catch this. > https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Compile/job/Spark-Master-Scala211-Compile/1667/consoleFull > {code} > [error] > /home/jenkins/workspace/Spark-Master-Scala211-Compile/core/src/main/scala/org/apache/spark/rpc/netty/NettyRpcEnv.scala:308: > no valid targets for annotation on value conf - it is discarded unused. You > may specify targets with meta-annotations, e.g. @(transient @param) > [error] private[netty] class NettyRpcEndpointRef(@transient conf: SparkConf) > [error] > {code} > This is one error, but there may be others past this point (the compile fails > fast). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-11122) Fatal warnings in sbt are not displayed as such
Jakob Odersky created SPARK-11122: - Summary: Fatal warnings in sbt are not displayed as such Key: SPARK-11122 URL: https://issues.apache.org/jira/browse/SPARK-11122 Project: Spark Issue Type: Bug Components: Build Reporter: Jakob Odersky The sbt script treats warnings (except dependency warnings) as errors, however there is no visual difference between errors and fatal warnings, thus leading to very confusing debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-11094) Test runner script fails to parse Java version.
Jakob Odersky created SPARK-11094: - Summary: Test runner script fails to parse Java version. Key: SPARK-11094 URL: https://issues.apache.org/jira/browse/SPARK-11094 Project: Spark Issue Type: Bug Components: Tests Environment: Debian testing Reporter: Jakob Odersky Priority: Minor Running {{dev/run-tests}} fails when the local Java version has an extra string appended to the version. For example, in Debian Stretch (currently testing distribution), {{java -version}} yields "1.8.0_66-internal" where the extra part "-internal" causes the script to fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
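A sketch of the kind of normalization the runner script could apply (the sed expression here is illustrative, not the actual patch):

```shell
# Strip vendor suffixes such as "-internal" before comparing versions:
# keep only the leading "major.minor" portion of the JDK version string.
version_str="1.8.0_66-internal"
clean=$(printf '%s' "$version_str" | sed -E 's/^([0-9]+\.[0-9]+).*/\1/')
echo "$clean"
```

Applied to "1.8.0_66-internal" this yields "1.8", which a numeric comparison can handle.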
[jira] [Created] (SPARK-11092) Add source URLs to API documentation.
Jakob Odersky created SPARK-11092: - Summary: Add source URLs to API documentation. Key: SPARK-11092 URL: https://issues.apache.org/jira/browse/SPARK-11092 Project: Spark Issue Type: Documentation Components: Build, Documentation Reporter: Jakob Odersky Priority: Trivial It would be nice to have source URLs in the Spark scaladoc, similar to the standard library (e.g. http://www.scala-lang.org/api/current/index.html#scala.collection.immutable.List). The fix should be really simple, just adding a line to the sbt unidoc settings. I'll use the github repo url "https://github.com/apache/spark/tree/v${version}/${FILE_PATH}". Feel free to tell me if I should use something else as base url. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
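For reference, the scaladoc side of such a change might look like the following sbt fragment (the setting keys are a sketch and may differ in Spark's unidoc setup; note that scaladoc itself spells the placeholder €{FILE_PATH} rather than ${FILE_PATH}):

```scala
// Hypothetical sbt fragment: wire scaladoc's -doc-source-url option to the
// GitHub tree of the release tag, so every documented symbol links back to
// its source file.
scalacOptions in (Compile, doc) ++= Seq(
  "-doc-source-url",
  s"https://github.com/apache/spark/tree/v${version.value}/€{FILE_PATH}.scala"
)
```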
[jira] [Updated] (SPARK-11092) Add source URLs to API documentation.
[ https://issues.apache.org/jira/browse/SPARK-11092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Odersky updated SPARK-11092: -- Description: It would be nice to have source URLs in the Spark scaladoc, similar to the standard library (e.g. http://www.scala-lang.org/api/current/index.html#scala.collection.immutable.List). The fix should be really simple, just adding a line to the sbt unidoc settings. I'll use the github repo url "https://github.com/apache/spark/tree/v${version}/${FILE_PATH}". Feel free to tell me if I should use something else as base url. was: It would be nice to have source URLs in the Spark scaladoc, similar to the standard library (e.g. http://www.scala-lang.org/api/current/index.html#scala.collection.immutable.List). The fix should be really simple, just adding a line to the sbt unidoc settings. I'll use the github repo url "https://github.com/apache/spark/tree/v${version}/${FILE_PATH}". Feel free to tell me if I should use something else as base url. > Add source URLs to API documentation. > - > > Key: SPARK-11092 > URL: https://issues.apache.org/jira/browse/SPARK-11092 > Project: Spark > Issue Type: Documentation > Components: Build, Documentation >Reporter: Jakob Odersky >Priority: Trivial > > It would be nice to have source URLs in the Spark scaladoc, similar to the > standard library (e.g. > http://www.scala-lang.org/api/current/index.html#scala.collection.immutable.List). > The fix should be really simple, just adding a line to the sbt unidoc > settings. > I'll use the github repo url > "https://github.com/apache/spark/tree/v${version}/${FILE_PATH}". Feel free > to tell me if I should use something else as base url. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11092) Add source URLs to API documentation.
[ https://issues.apache.org/jira/browse/SPARK-11092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14955996#comment-14955996 ] Jakob Odersky commented on SPARK-11092: --- I can't set the assignee field, though I'd like to resolve this issue. > Add source URLs to API documentation. > - > > Key: SPARK-11092 > URL: https://issues.apache.org/jira/browse/SPARK-11092 > Project: Spark > Issue Type: Documentation > Components: Build, Documentation > Reporter: Jakob Odersky >Priority: Trivial > > It would be nice to have source URLs in the Spark scaladoc, similar to the > standard library (e.g. > http://www.scala-lang.org/api/current/index.html#scala.collection.immutable.List). > The fix should be really simple, just adding a line to the sbt unidoc > settings. > I'll use the github repo url > "https://github.com/apache/spark/tree/v${version}/${FILE_PATH}". Feel free > to tell me if I should use something else as base url. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-11092) Add source URLs to API documentation.
[ https://issues.apache.org/jira/browse/SPARK-11092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Odersky updated SPARK-11092: -- Description: It would be nice to have source URLs in the Spark scaladoc, similar to the standard library (e.g. http://www.scala-lang.org/api/current/index.html#scala.collection.immutable.List). The fix should be really simple, just adding a line to the sbt unidoc settings. I'll use the github repo url bq. https://github.com/apache/spark/tree/v${version}/${FILE_PATH} Feel free to tell me if I should use something else as base url. was: It would be nice to have source URLs in the Spark scaladoc, similar to the standard library (e.g. http://www.scala-lang.org/api/current/index.html#scala.collection.immutable.List). The fix should be really simple, just adding a line to the sbt unidoc settings. I'll use the github repo url "https://github.com/apache/spark/tree/v${version}/${FILE_PATH}". Feel free to tell me if I should use something else as base url. > Add source URLs to API documentation. > - > > Key: SPARK-11092 > URL: https://issues.apache.org/jira/browse/SPARK-11092 > Project: Spark > Issue Type: Documentation > Components: Build, Documentation > Reporter: Jakob Odersky >Priority: Trivial > > It would be nice to have source URLs in the Spark scaladoc, similar to the > standard library (e.g. > http://www.scala-lang.org/api/current/index.html#scala.collection.immutable.List). > The fix should be really simple, just adding a line to the sbt unidoc > settings. > I'll use the github repo url > bq. https://github.com/apache/spark/tree/v${version}/${FILE_PATH} > Feel free to tell me if I should use something else as base url. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
Re: Spark Event Listener
the path of the source file defining the event API is `core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala` On 13 October 2015 at 16:29, Jakob Odersky <joder...@gmail.com> wrote: > Hi, > I came across the spark listener API while checking out possible UI > extensions recently. I noticed that all events inherit from a sealed trait > `SparkListenerEvent` and that a SparkListener has a corresponding > `onEventXXX(event)` method for every possible event. > > Considering that events inherit from a sealed trait and thus all events > are known during compile-time, what is the rationale of using specific > methods for every event rather than a single method that would let a client > pattern match on the type of event? > > I don't know the internals of the pattern matcher, but again, considering > events are sealed, I reckon that matching performance should not be an > issue. > > thanks, > --Jakob >
Spark Event Listener
Hi, I came across the spark listener API while checking out possible UI extensions recently. I noticed that all events inherit from a sealed trait `SparkListenerEvent` and that a SparkListener has a corresponding `onEventXXX(event)` method for every possible event. Considering that events inherit from a sealed trait and thus all events are known during compile-time, what is the rationale of using specific methods for every event rather than a single method that would let a client pattern match on the type of event? I don't know the internals of the pattern matcher, but again, considering events are sealed, I reckon that matching performance should not be an issue. thanks, --Jakob
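To make the question concrete, a toy version of the single-method alternative (the event types here are invented stand-ins, not Spark's actual listener API):

```scala
// Toy sealed hierarchy standing in for SparkListenerEvent.
sealed trait Event
final case class JobStart(id: Int) extends Event
final case class JobEnd(id: Int) extends Event

// One entry point instead of one onEventXXX callback per case; because
// Event is sealed, the compiler can warn on non-exhaustive matches.
def onEvent(event: Event): String = event match {
  case JobStart(id) => s"job $id started"
  case JobEnd(id)   => s"job $id ended"
}
```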
Building with SBT and Scala 2.11
I'm having trouble compiling Spark with SBT for Scala 2.11. The command I use is: dev/change-version-to-2.11.sh build/sbt -Pyarn -Phadoop-2.11 -Dscala-2.11 followed by compile in the sbt shell. The error I get specifically is: spark/core/src/main/scala/org/apache/spark/rpc/netty/NettyRpcEnv.scala:308: no valid targets for annotation on value conf - it is discarded unused. You may specify targets with meta-annotations, e.g. @(transient @param) [error] private[netty] class NettyRpcEndpointRef(@transient conf: SparkConf) [error] However I am also getting a large amount of deprecation warnings, making me wonder if I am supplying some incompatible/unsupported options to sbt. I am using Java 1.8 and the latest Spark master sources. Does someone know if I am doing anything wrong or is the sbt build broken? thanks for your help, --Jakob
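For context, the compiler's own hint can be reproduced in isolation; a minimal self-contained sketch (the class name is invented) showing the meta-annotation form it suggests:

```scala
import scala.annotation.meta.param

// A plain `@transient conf` on a constructor parameter is discarded (the
// "no valid targets" error above); @(transient @param) pins the annotation
// to the constructor parameter explicitly.
class EndpointRefSketch(@(transient @param) conf: String) extends Serializable {
  def describe: String = s"ref(conf=$conf)"
}
```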
Live UI
Hi everyone, I am just getting started working on spark and was thinking of a first way to contribute whilst still trying to wrap my head around the codebase. Exploring the web UI, I noticed it is a classic request-response website, requiring manual refresh to get the latest data. I think it would be great to have a "live" website where data would be displayed real-time without the need to hit the refresh button. I would be very interested in contributing this feature if it is acceptable. Specifically, I was thinking of using websockets with a ScalaJS front-end. Please let me know if this design would be welcome or if it introduces unwanted dependencies, I'll be happy to discuss this further in detail. thanks for your feedback, --Jakob
[jira] [Commented] (SPARK-10876) display total application time in spark history UI
[ https://issues.apache.org/jira/browse/SPARK-10876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14951419#comment-14951419 ] Jakob Odersky commented on SPARK-10876: --- I'm not sure what you mean. The UI already has a "Duration" field for every job. > display total application time in spark history UI > -- > > Key: SPARK-10876 > URL: https://issues.apache.org/jira/browse/SPARK-10876 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 1.5.1 >Reporter: Thomas Graves > > The history file has an application start and application end events. It > would be nice if we could use these to display the total run time for the > application in the history UI. > Could be displayed similar to "Total Uptime" for a running application. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-10876) display total application time in spark history UI
[ https://issues.apache.org/jira/browse/SPARK-10876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Odersky updated SPARK-10876: -- Comment: was deleted (was: I'm not sure what you mean. The UI already has a "Duration" field for every job.) > display total application time in spark history UI > -- > > Key: SPARK-10876 > URL: https://issues.apache.org/jira/browse/SPARK-10876 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 1.5.1 >Reporter: Thomas Graves > > The history file has an application start and application end events. It > would be nice if we could use these to display the total run time for the > application in the history UI. > Could be displayed similar to "Total Uptime" for a running application. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10876) display total application time in spark history UI
[ https://issues.apache.org/jira/browse/SPARK-10876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14951434#comment-14951434 ] Jakob Odersky commented on SPARK-10876: --- Do you mean to display the total run time of uncompleted apps? Completed apps already have a "Duration" field > display total application time in spark history UI > -- > > Key: SPARK-10876 > URL: https://issues.apache.org/jira/browse/SPARK-10876 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 1.5.1 >Reporter: Thomas Graves > > The history file has an application start and application end events. It > would be nice if we could use these to display the total run time for the > application in the history UI. > Could be displayed similar to "Total Uptime" for a running application. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10306) sbt hive/update issue
[ https://issues.apache.org/jira/browse/SPARK-10306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14951362#comment-14951362 ] Jakob Odersky commented on SPARK-10306: --- Same issue here > sbt hive/update issue > - > > Key: SPARK-10306 > URL: https://issues.apache.org/jira/browse/SPARK-10306 > Project: Spark > Issue Type: Bug > Components: Build >Reporter: holdenk >Priority: Trivial > > Running sbt hive/update sometimes results in the error "impossible to get > artifacts when data has not been loaded. IvyNode = > org.scala-lang#scala-library;2.10.3" which is unfortunate since it is always > evicted by 2.10.4 currently. An easy (but maybe not super clean) solution > would be adding 2.10.3 as a dependency which will then get evicted. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org