[jira] [Comment Edited] (SPARK-7286) Precedence of operator not behaving properly

2015-11-19 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014647#comment-15014647
 ] 

Jakob Odersky edited comment on SPARK-7286 at 11/19/15 10:52 PM:
-

I just realized that <> would also have a different precedence than ===

That pretty much limits our options to 1 or 3


was (Author: jodersky):
I just realized that <> would also have a different precedence than ===

> Precedence of operator not behaving properly
> 
>
> Key: SPARK-7286
> URL: https://issues.apache.org/jira/browse/SPARK-7286
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.3.1
> Environment: Linux
>Reporter: DevilJetha
>Priority: Critical
>
> The precedence of the operators ( especially with !== and && ) in Dataframe 
> Columns seems to be messed up.
> Example Snippet
> .where( $"col1" === "val1" && ($"col2"  !== "val2")  ) works fine.
> whereas .where( $"col1" === "val1" && $"col2"  !== "val2"  )
> evaluates as ( $"col1" === "val1" && $"col2" ) !== "val2"






[jira] [Commented] (SPARK-11288) Specify the return type for UDF in Scala

2015-11-18 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011855#comment-15011855
 ] 

Jakob Odersky commented on SPARK-11288:
---

Tell me if I'm missing the point, but you can also pass type parameters 
explicitly when calling udfs:
{code}
udf[ReturnType, Arg1, Arg2]((arg1, arg2) => ret) // i.e. org.apache.spark.sql.functions.udf
{code}

> Specify the return type for UDF in Scala
> 
>
> Key: SPARK-11288
> URL: https://issues.apache.org/jira/browse/SPARK-11288
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Davies Liu
>
> The return type is figured out from the function signature, maybe it's not 
> that user want, for example, the default DecimalType is (38, 18), user may 
> want (38, 0).
> The older deprecated one callUDF can do that, we should figure out  a way to 
> support that.
> cc [~marmbrus]






[jira] [Created] (SPARK-11832) Spark shell does not work from sbt with scala 2.11

2015-11-18 Thread Jakob Odersky (JIRA)
Jakob Odersky created SPARK-11832:
-

 Summary: Spark shell does not work from sbt with scala 2.11
 Key: SPARK-11832
 URL: https://issues.apache.org/jira/browse/SPARK-11832
 Project: Spark
  Issue Type: Bug
  Components: Spark Shell
Reporter: Jakob Odersky
Priority: Minor


Using Scala 2.11, running the Spark shell task from within sbt fails; however, 
running it from a distribution works.

h3. Steps to reproduce
# change scala version {{dev/change-scala-version.sh 2.11}}
# start sbt {{build/sbt -Dscala-2.11}}
# run shell task {{sparkShell}}

h3. Stacktrace
{code}
Failed to initialize compiler: object scala in compiler mirror not found.
** Note that as of 2.8 scala does not assume use of the java classpath.
** For the old behavior pass -usejavacp to scala, or if using a Settings
** object programmatically, settings.usejavacp.value = true.
Exception in thread "main" java.lang.NullPointerException
at 
scala.reflect.internal.SymbolTable.exitingPhase(SymbolTable.scala:256)
at 
scala.tools.nsc.interpreter.IMain$Request.x$20$lzycompute(IMain.scala:894)
at scala.tools.nsc.interpreter.IMain$Request.x$20(IMain.scala:893)
at 
scala.tools.nsc.interpreter.IMain$Request.importsPreamble$lzycompute(IMain.scala:893)
at 
scala.tools.nsc.interpreter.IMain$Request.importsPreamble(IMain.scala:893)
at 
scala.tools.nsc.interpreter.IMain$Request$Wrapper.preamble(IMain.scala:915)
at 
scala.tools.nsc.interpreter.IMain$CodeAssembler$$anonfun$apply$23.apply(IMain.scala:1325)
at 
scala.tools.nsc.interpreter.IMain$CodeAssembler$$anonfun$apply$23.apply(IMain.scala:1324)
at scala.tools.nsc.util.package$.stringFromWriter(package.scala:64)
at 
scala.tools.nsc.interpreter.IMain$CodeAssembler$class.apply(IMain.scala:1324)
at 
scala.tools.nsc.interpreter.IMain$Request$Wrapper.apply(IMain.scala:906)
at 
scala.tools.nsc.interpreter.IMain$Request.compile$lzycompute(IMain.scala:995)
at scala.tools.nsc.interpreter.IMain$Request.compile(IMain.scala:990)
at scala.tools.nsc.interpreter.IMain.compile(IMain.scala:577)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:563)
at scala.tools.nsc.interpreter.ILoop.reallyInterpret$1(ILoop.scala:802)
at 
scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:836)
at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:694)
at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:404)
at 
org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply$mcZ$sp(SparkILoop.scala:39)
at 
org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:38)
at 
org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:38)
at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:213)
at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:38)
at org.apache.spark.repl.SparkILoop.loadFiles(SparkILoop.scala:94)
at 
scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:922)
at 
scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:911)
at 
scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:911)
at 
scala.reflect.internal.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:97)
at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:911)
at org.apache.spark.repl.Main$.main(Main.scala:49)
at org.apache.spark.repl.Main.main(Main.scala)
{code}

h3. Workaround
In {{repl/scala-2.11/src/main/scala/org/apache/spark/repl/Main.scala}}, append 
{{s.usejavacp.value = true}} to the REPL settings.
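For reference, a minimal sketch of what the workaround amounts to (this is not 
the actual contents of Main.scala; the settings object is simply called {{s}} here):
{code}
import scala.tools.nsc.Settings

val s = new Settings()
// force the REPL onto the java classpath, equivalent to passing -usejavacp
s.usejavacp.value = true
{code}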

I haven't looked into the details of {{scala.tools.nsc.Settings}}; maybe 
someone has an idea of what's going on.
Also, to be clear, this bug only affects Scala 2.11 from within sbt; calling 
spark-shell from a distribution, or from anywhere using Scala 2.10, works.






[jira] [Commented] (SPARK-7286) Precedence of operator not behaving properly

2015-11-18 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012643#comment-15012643
 ] 

Jakob Odersky commented on SPARK-7286:
--

Going through the code, I saw that Catalyst also defines !== in its DSL, so it 
seems this operator has quite widespread usage.
Would deprecating it in favor of something else be a viable option?

> Precedence of operator not behaving properly
> 
>
> Key: SPARK-7286
> URL: https://issues.apache.org/jira/browse/SPARK-7286
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.3.1
> Environment: Linux
>Reporter: DevilJetha
>Priority: Critical
>
> The precedence of the operators ( especially with !== and && ) in Dataframe 
> Columns seems to be messed up.
> Example Snippet
> .where( $"col1" === "val1" && ($"col2"  !== "val2")  ) works fine.
> whereas .where( $"col1" === "val1" && $"col2"  !== "val2"  )
> evaluates as ( $"col1" === "val1" && $"col2" ) !== "val2"






[jira] [Commented] (SPARK-7286) Precedence of operator not behaving properly

2015-11-18 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012532#comment-15012532
 ] 

Jakob Odersky commented on SPARK-7286:
--

The problem is that !== is recognized as an assignment operator (according to 
§6.12.4 of the scala specification 
http://www.scala-lang.org/docu/files/ScalaReference.pdf) and thus has lower 
precedence than any other operator.
A potential fix could be to rename !== to =!=
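To make the precedence difference concrete, here is a minimal sketch in plain 
Scala (a toy {{Expr}} class, not Spark's {{Column}}):
{code}
case class Expr(repr: String) {
  def ===(other: Expr) = Expr(s"($repr === ${other.repr})")
  def !==(other: Expr) = Expr(s"($repr !== ${other.repr})")
  def =!=(other: Expr) = Expr(s"($repr =!= ${other.repr})")
  def &&(other: Expr)  = Expr(s"($repr && ${other.repr})")
}
val (a, b, c, d) = (Expr("a"), Expr("b"), Expr("c"), Expr("d"))

// !== ends in '=' and is thus an assignment operator with the lowest precedence:
a === b && c !== d   // groups as ((a === b) && c) !== d
// =!= starts with '=' and keeps the same precedence level as ===:
a === b && c =!= d   // groups as (a === b) && (c =!= d)
{code}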

> Precedence of operator not behaving properly
> 
>
> Key: SPARK-7286
> URL: https://issues.apache.org/jira/browse/SPARK-7286
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.3.1
> Environment: Linux
>Reporter: DevilJetha
>Priority: Critical
>
> The precedence of the operators ( especially with !== and && ) in Dataframe 
> Columns seems to be messed up.
> Example Snippet
> .where( $"col1" === "val1" && ($"col2"  !== "val2")  ) works fine.
> whereas .where( $"col1" === "val1" && $"col2"  !== "val2"  )
> evaluates as ( $"col1" === "val1" && $"col2" ) !== "val2"






[jira] [Comment Edited] (SPARK-7286) Precedence of operator not behaving properly

2015-11-18 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012532#comment-15012532
 ] 

Jakob Odersky edited comment on SPARK-7286 at 11/19/15 1:16 AM:


The problem is that !== is recognized as an assignment operator (according to 
§6.12.4 of the scala specification 
http://www.scala-lang.org/docu/files/ScalaReference.pdf) and thus has lower 
precedence than any other operator.
A potential fix could be to rename !== to
 =!=


was (Author: jodersky):
The problem is that !== is recognized as an assignment operator (according to 
§6.12.4 of the scala specification 
http://www.scala-lang.org/docu/files/ScalaReference.pdf) and thus has lower 
precedence than any other operator.
A potential fix could be to rename !== to =!=

> Precedence of operator not behaving properly
> 
>
> Key: SPARK-7286
> URL: https://issues.apache.org/jira/browse/SPARK-7286
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.3.1
> Environment: Linux
>Reporter: DevilJetha
>Priority: Critical
>
> The precedence of the operators ( especially with !== and && ) in Dataframe 
> Columns seems to be messed up.
> Example Snippet
> .where( $"col1" === "val1" && ($"col2"  !== "val2")  ) works fine.
> whereas .where( $"col1" === "val1" && $"col2"  !== "val2"  )
> evaluates as ( $"col1" === "val1" && $"col2" ) !== "val2"






[jira] [Commented] (SPARK-11832) Spark shell does not work from sbt with scala 2.11

2015-11-18 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012372#comment-15012372
 ] 

Jakob Odersky commented on SPARK-11832:
---

I'm on to something: it seems that sbt actually passes a '-usejavacp' argument 
to the REPL; however, the Spark shell's Scala 2.11 implementation ignores 
all arguments. I'm working on a fix.
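The direction I'm exploring (a rough sketch only, not the final patch) is to have 
the 2.11 {{Main}} forward its command-line arguments to the compiler settings 
instead of dropping them:
{code}
// `args` stands for the REPL main method's argument array
val settings = new scala.tools.nsc.GenericRunnerSettings(msg => Console.err.println(msg))
settings.processArguments(args.toList, processAll = true)
{code}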

> Spark shell does not work from sbt with scala 2.11
> --
>
> Key: SPARK-11832
> URL: https://issues.apache.org/jira/browse/SPARK-11832
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>    Reporter: Jakob Odersky
>Priority: Minor
>
> Using Scala 2.11, running the spark shell task from within sbt fails, however 
> running it from a distribution works.
> h3. Steps to reproduce
> # change scala version {{dev/change-scala-version.sh 2.11}}
> # start sbt {{build/sbt -Dscala-2.11}}
> # run shell task {{sparkShell}}
> h3. Stacktrace
> {code}
> Failed to initialize compiler: object scala in compiler mirror not found.
> ** Note that as of 2.8 scala does not assume use of the java classpath.
> ** For the old behavior pass -usejavacp to scala, or if using a Settings
> ** object programmatically, settings.usejavacp.value = true.
> Exception in thread "main" java.lang.NullPointerException
>   at 
> scala.reflect.internal.SymbolTable.exitingPhase(SymbolTable.scala:256)
>   at 
> scala.tools.nsc.interpreter.IMain$Request.x$20$lzycompute(IMain.scala:894)
>   at scala.tools.nsc.interpreter.IMain$Request.x$20(IMain.scala:893)
>   at 
> scala.tools.nsc.interpreter.IMain$Request.importsPreamble$lzycompute(IMain.scala:893)
>   at 
> scala.tools.nsc.interpreter.IMain$Request.importsPreamble(IMain.scala:893)
>   at 
> scala.tools.nsc.interpreter.IMain$Request$Wrapper.preamble(IMain.scala:915)
>   at 
> scala.tools.nsc.interpreter.IMain$CodeAssembler$$anonfun$apply$23.apply(IMain.scala:1325)
>   at 
> scala.tools.nsc.interpreter.IMain$CodeAssembler$$anonfun$apply$23.apply(IMain.scala:1324)
>   at scala.tools.nsc.util.package$.stringFromWriter(package.scala:64)
>   at 
> scala.tools.nsc.interpreter.IMain$CodeAssembler$class.apply(IMain.scala:1324)
>   at 
> scala.tools.nsc.interpreter.IMain$Request$Wrapper.apply(IMain.scala:906)
>   at 
> scala.tools.nsc.interpreter.IMain$Request.compile$lzycompute(IMain.scala:995)
>   at scala.tools.nsc.interpreter.IMain$Request.compile(IMain.scala:990)
>   at scala.tools.nsc.interpreter.IMain.compile(IMain.scala:577)
>   at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
>   at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:563)
>   at scala.tools.nsc.interpreter.ILoop.reallyInterpret$1(ILoop.scala:802)
>   at 
> scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:836)
>   at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:694)
>   at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:404)
>   at 
> org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply$mcZ$sp(SparkILoop.scala:39)
>   at 
> org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:38)
>   at 
> org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:38)
>   at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:213)
>   at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:38)
>   at org.apache.spark.repl.SparkILoop.loadFiles(SparkILoop.scala:94)
>   at 
> scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:922)
>   at 
> scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:911)
>   at 
> scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:911)
>   at 
> scala.reflect.internal.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:97)
>   at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:911)
>   at org.apache.spark.repl.Main$.main(Main.scala:49)
>   at org.apache.spark.repl.Main.main(Main.scala)
> {code}
> h3. Workaround
> In {{repl/scala-2.11/src/main/scala/org/apache/spark/repl/Main.scala}}, 
> append to the repl settings {{s.usejavacp.value = true}}
> I haven't looked into the details of {{scala.tools.nsc.Settings}}, maybe 
> someone has an idea of what's going on.
> Also, to be clear, this bug only affects scala 2.11 from within sbt; calling 
> spark-shell from a distribution or from anywhere using scala 2.10 works.






[jira] [Commented] (SPARK-9875) Do not evaluate foreach and foreachPartition with count

2015-11-17 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009891#comment-15009891
 ] 

Jakob Odersky commented on SPARK-9875:
--

I'm not sure I understand the issue. Are you trying to force running {{func}} 
on every partition to achieve some kind of side effect?

> Do not evaluate foreach and foreachPartition with count
> ---
>
> Key: SPARK-9875
> URL: https://issues.apache.org/jira/browse/SPARK-9875
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Reporter: Zoltán Zvara
>Priority: Minor
>
> It is evident, that the summation inside count will result in an overhead, 
> which would be nice to remove from the current execution.
> {{self.mapPartitions(func).count()  # Force evaluation}} @{{rdd.py}}






[jira] [Commented] (SPARK-11765) Avoid assign UI port between browser unsafe ports (or just 4045: lockd)

2015-11-16 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007740#comment-15007740
 ] 

Jakob Odersky commented on SPARK-11765:
---

I think adding a "blacklist" of ports could lead to confusing debugging 
experiences with no real value gained.
You can always explicitly set the web UI's port with the configuration 
parameter {{spark.ui.port}}, as explained here 
http://spark.apache.org/docs/latest/configuration.html
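For example (4050 is just an arbitrary free port):
{code}
import org.apache.spark.SparkConf

// programmatically, before the SparkContext is created:
val conf = new SparkConf().set("spark.ui.port", "4050")

// or equivalently on the command line:
//   spark-submit --conf spark.ui.port=4050 ...
{code}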

> Avoid assign UI port between browser unsafe ports (or just 4045: lockd)
> ---
>
> Key: SPARK-11765
> URL: https://issues.apache.org/jira/browse/SPARK-11765
> Project: Spark
>  Issue Type: Improvement
>Reporter: Jungtaek Lim
>Priority: Minor
>
> Spark UI port starts on 4040, and UI port is incremented by 1 for every 
> confliction.
> In our use case, we have some drivers running at the same time, which makes 
> UI port to be assigned to 4045, which is treated to unsafe port for chrome 
> and mozilla.
> http://src.chromium.org/viewvc/chrome/trunk/src/net/base/net_util.cc?view=markup
> http://www-archive.mozilla.org/projects/netlib/PortBanning.html#portlist
> We would like to avoid assigning UI to these ports, or just avoid assigning 
> UI port to 4045 which is too close to default port.
> If we'd like to accept this idea, I'm happy to work on it.






[jira] [Commented] (SPARK-11688) UDF's doesn't work when it has a default arguments

2015-11-12 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003600#comment-15003600
 ] 

Jakob Odersky commented on SPARK-11688:
---

Registering a UDF requires a function (an instance of FunctionX); however, only 
defs support default parameters. Let me illustrate:

{{hasSubstring _}} is equivalent to {{(x: String, y: String, z: Int) => 
hasSubstring(x, y, z)}}, which is only syntactic sugar for

{code}
new Function3[String, String, Int, Long] {
  def apply(x: String, y: String, z: Int): Long = hasSubstring(x, y, z)
}
{code}

Therefore, the error is expected since you are trying to call a Function3 with 
only two parameters.
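To see why the default argument is lost, here is a minimal, Spark-free sketch of 
the eta-expansion:
{code}
def hasSubstring(string: String, subString: String, frmIndex: Int = 0): Long =
  string.indexOf(subString, frmIndex)

val f = hasSubstring _   // f is a Function3[String, String, Int, Long]
// f("Scala", "la")      // does not compile: a Function3 has no default arguments
f("Scala", "la", 0)      // OK: all three arguments supplied
{code}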

With the current API, and without trying some macro magic, I see no way of 
enabling default parameters for UDFs. Maybe changing the register API to 
something like register((X, Y, Z) => R, defaults) could work, where defaults 
would supply the arguments for any unspecified parameters when the UDF is 
called. However, this could also lead to some very subtle errors, as any 
substituted default parameter would have the value specified during 
registration, potentially different from a default parameter specified in a 
corresponding def declaration.

> UDF's doesn't work when it has a default arguments
> --
>
> Key: SPARK-11688
> URL: https://issues.apache.org/jira/browse/SPARK-11688
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: M Bharat lal
>Priority: Minor
>
> Use case:
> 
> Suppose we have a function which accepts three parameters (string, subString 
> and frmIndex which has 0 default value )
> def hasSubstring(string:String, subString:String, frmIndex:Int = 0): Long = 
> string.indexOf(subString, frmIndex)
> above function works perfectly if I dont pass frmIndex parameter
> scala> hasSubstring("Scala", "la")
> res0: Long = 3
> But, when I register the above function as UDF (successfully registered) and 
> call the same without  passing frmIndex parameter got the below exception
> scala> val df  = 
> sqlContext.createDataFrame(Seq(("scala","Spark","MLlib"),("abc", "def", 
> "gfh"))).toDF("c1", "c2", "c3")
> df: org.apache.spark.sql.DataFrame = [c1: string, c2: string, c3: string]
> scala> df.show
> +-+-+-+
> |   c1|   c2|   c3|
> +-+-+-+
> |scala|Spark|MLlib|
> |  abc|  def|  gfh|
> +-+-+-+
> scala> sqlContext.udf.register("hasSubstring", hasSubstring _ )
> res3: org.apache.spark.sql.UserDefinedFunction = 
> UserDefinedFunction(,LongType,List())
> scala> val result = df.as("i0").withColumn("subStringIndex", 
> callUDF("hasSubstring", $"i0.c1", lit("la")))
> org.apache.spark.sql.AnalysisException: undefined function hasSubstring;
>   at 
> org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2$$anonfun$1.apply(hiveUDFs.scala:58)
>   at 
> org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2$$anonfun$1.apply(hiveUDFs.scala:58)
>   at scala.Option.getOrElse(Option.scala:120)
>   at 
> org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2.apply(hiveUDFs.scala:57)
>   at 
> org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2.apply(hiveUDFs.scala:53)
>   at scala.util.Try.getOrElse(Try.scala:77)
>   at 
> org.apache.spark.sql.hive.HiveFunctionRegistry.lookupFunction(hiveUDFs.scala:53)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5$$anonfun$applyOrElse$23.apply(Analyzer.scala:490)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5$$anonfun$applyOrElse$23.apply(Analyzer.scala:490)
>   at 
> org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:48)






[jira] [Commented] (SPARK-11688) UDF's doesn't work when it has a default arguments

2015-11-12 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003603#comment-15003603
 ] 

Jakob Odersky commented on SPARK-11688:
---

As a workaround, you could register your function twice:

{{sqlContext.udf.register("hasSubstring", (x: String, y: String) => 
hasSubstring(x, y) ) //this will use the default argument}}
{{sqlContext.udf.register("hasSubstringIndex", (x: String, y: String, z: Int) 
=> hasSubstring(x, y, z) )}}

and call them accordingly

> UDF's doesn't work when it has a default arguments
> --
>
> Key: SPARK-11688
> URL: https://issues.apache.org/jira/browse/SPARK-11688
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: M Bharat lal
>Priority: Minor
>
> Use case:
> 
> Suppose we have a function which accepts three parameters (string, subString 
> and frmIndex which has 0 default value )
> def hasSubstring(string:String, subString:String, frmIndex:Int = 0): Long = 
> string.indexOf(subString, frmIndex)
> above function works perfectly if I dont pass frmIndex parameter
> scala> hasSubstring("Scala", "la")
> res0: Long = 3
> But, when I register the above function as UDF (successfully registered) and 
> call the same without  passing frmIndex parameter got the below exception
> scala> val df  = 
> sqlContext.createDataFrame(Seq(("scala","Spark","MLlib"),("abc", "def", 
> "gfh"))).toDF("c1", "c2", "c3")
> df: org.apache.spark.sql.DataFrame = [c1: string, c2: string, c3: string]
> scala> df.show
> +-+-+-+
> |   c1|   c2|   c3|
> +-+-+-+
> |scala|Spark|MLlib|
> |  abc|  def|  gfh|
> +-+-+-+
> scala> sqlContext.udf.register("hasSubstring", hasSubstring _ )
> res3: org.apache.spark.sql.UserDefinedFunction = 
> UserDefinedFunction(,LongType,List())
> scala> val result = df.as("i0").withColumn("subStringIndex", 
> callUDF("hasSubstring", $"i0.c1", lit("la")))
> org.apache.spark.sql.AnalysisException: undefined function hasSubstring;
>   at 
> org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2$$anonfun$1.apply(hiveUDFs.scala:58)
>   at 
> org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2$$anonfun$1.apply(hiveUDFs.scala:58)
>   at scala.Option.getOrElse(Option.scala:120)
>   at 
> org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2.apply(hiveUDFs.scala:57)
>   at 
> org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2.apply(hiveUDFs.scala:53)
>   at scala.util.Try.getOrElse(Try.scala:77)
>   at 
> org.apache.spark.sql.hive.HiveFunctionRegistry.lookupFunction(hiveUDFs.scala:53)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5$$anonfun$applyOrElse$23.apply(Analyzer.scala:490)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5$$anonfun$applyOrElse$23.apply(Analyzer.scala:490)
>   at 
> org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:48)






Re: Why there's no api for SparkContext#textFiles to support multiple inputs ?

2015-11-11 Thread Jakob Odersky
Hey Jeff,
Do you mean reading from multiple text files? In that case, as a
workaround, you can use the RDD#union() (or ++) method to concatenate
multiple RDDs. For example:

val lines1 = sc.textFile("file1")
val lines2 = sc.textFile("file2")

val rdd = lines1 union lines2
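If you have many files, SparkContext#union, which takes a whole Seq of RDDs,
avoids chaining them one by one. A rough sketch:

val files = Seq("file1", "file2", "file3")
val rdd = sc.union(files.map(path => sc.textFile(path)))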

regards,
--Jakob

On 11 November 2015 at 01:20, Jeff Zhang  wrote:

> Although user can use the hdfs glob syntax to support multiple inputs. But
> sometimes, it is not convenient to do that. Not sure why there's no api
> of SparkContext#textFiles. It should be easy to implement that. I'd love to
> create a ticket and contribute for that if there's no other consideration
> that I don't know.
>
> --
> Best Regards
>
> Jeff Zhang
>


Re: Status of 2.11 support?

2015-11-11 Thread Jakob Odersky
Hi Sukant,

Regarding the first point: when building Spark during my daily work, I
always use Scala 2.11 and have only run into build problems once. Assuming
a working build, I have never had any issues with the resulting artifacts.

More generally, however, I would advise you to go with Scala 2.11 under all
circumstances. Scala 2.10 has reached end-of-life and, from what I make out
of your question, you have the opportunity to switch to a newer technology,
so why stay with legacy? Furthermore, Scala 2.12 will be coming out early
next year, so I reckon that Spark will switch to Scala 2.11 by default
pretty soon*.

regards,
--Jakob

*I'm pretty new to the Spark community myself, so please don't take my word
on it as gospel


On 11 November 2015 at 15:25, Ted Yu  wrote:

> For #1, the published jars are usable.
> However, you should build from source for your specific combination of
> profiles.
>
> Cheers
>
> On Wed, Nov 11, 2015 at 3:22 PM, shajra-cogscale <
> sha...@cognitivescale.com> wrote:
>
>> Hi,
>>
>> My company isn't using Spark in production yet, but we are using a bit of
>> Scala.  There's a few people who have wanted to be conservative and keep
>> our
>> Scala at 2.10 in the event we start using Spark.  There are others who
>> want
>> to move to 2.11 with the idea that by the time we're using Spark it will
>> be
>> more or less 2.11-ready.
>>
>> It's hard to make a strong judgement on these kinds of things without
>> getting some community feedback.
>>
>> Looking through the internet I saw:
>>
>> 1) There's advice to build 2.11 packages from source -- but also published
>> jars to Maven Central for 2.11.  Are these jars on Maven Central usable
>> and
>> the advice to build from source outdated?
>>
>> 2)  There's a note that the JDBC RDD isn't 2.11-compliant.  This is okay
>> for
>> us, but is there anything else to worry about?
>>
>> It would be nice to get some answers to those questions as well as any
>> other
>> feedback from maintainers or anyone that's used Spark with Scala 2.11
>> beyond
>> simple examples.
>>
>> Thanks,
>> Sukant
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Status-of-2-11-support-tp25362.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>
>


Re: Spark Packages Configuration Not Found

2015-11-11 Thread Jakob Odersky
As another, more general question: are Spark packages the go-to way of extending
Spark functionality? In my specific use case I would like to start Spark
(be it spark-shell or otherwise) and hook into the listener API.
Since I wasn't able to find much documentation about Spark packages, I was
wondering whether they are still actively being developed.

thanks,
--Jakob

On 10 November 2015 at 14:58, Jakob Odersky <joder...@gmail.com> wrote:

> (accidental keyboard-shortcut sent the message)
> ... spark-shell from the spark 1.5.2 binary distribution.
> Also, running "spPublishLocal" has the same effect.
>
> thanks,
> --Jakob
>
> On 10 November 2015 at 14:55, Jakob Odersky <joder...@gmail.com> wrote:
>
>> Hi,
>> I ran into in error trying to run spark-shell with an external package
>> that I built and published locally
>> using the spark-package sbt plugin (
>> https://github.com/databricks/sbt-spark-package).
>>
>> To my understanding, spark packages can be published simply as maven
>> artifacts, yet after running "publishLocal" in my package project (
>> https://github.com/jodersky/spark-paperui), the following command
>>
>>park-shell --packages
>> ch.jodersky:spark-paperui-server_2.10:0.1-SNAPSHOT
>>
>> gives an error:
>>
>> ::
>>
>> ::  UNRESOLVED DEPENDENCIES ::
>>
>> ::
>>
>> :: ch.jodersky#spark-paperui-server_2.10;0.1: configuration not
>> found in ch.jodersky#spark-paperui-server_2.10;0.1: 'default'. It was
>> required from org.apache.spark#spark-submit-parent;1.0 default
>>
>> ::
>>
>>
>> :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
>> Exception in thread "main" java.lang.RuntimeException: [unresolved
>> dependency: ch.jodersky#spark-paperui-server_2.10;0.1: configuration not
>> found in ch.jodersky#spark-paperui-server_2.10;0.1: 'default'. It was
>> required from org.apache.spark#spark-submit-parent;1.0 default]
>> at
>> org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1011)
>> at
>> org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:286)
>> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:153)
>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:12
>>
>> Do I need to include some default configuration? If so where and how
>> should I do it? All other packages I looked at had no such thing.
>>
>> Btw, I am using spark-shell from a
>>
>>
>


Re: Slow stage?

2015-11-11 Thread Jakob Odersky
Hi Simone,
I'm afraid I don't have an answer to your question. However, I noticed the
DAG figures in the attachment. How did you generate these? I am myself
working on a project in which I am trying to generate visual
representations of the Spark scheduler DAG. If such a tool already exists,
I would greatly appreciate any pointers.

thanks,
--Jakob

On 9 November 2015 at 13:52, Simone Franzini  wrote:

> Hi all,
>
> I have a complex Spark job that is broken up in many stages.
> I have a couple of stages that are particularly slow: each task takes
> around 6 - 7 minutes. This stage is fairly complex as you can see from the
> attached DAG. However, by construction each of the outer joins will have
> only 0 or 1 record on each side.
> It seems to me that this stage is really slow. However, the execution
> timeline shows that almost 100% of the time is spent in actual execution
> time not reading/writing to/from disk or in other overheads.
> Does this make any sense? I.e. is it just that these operations are slow
> (and notice task size in term of data seems small)?
> Is the pattern of operations in the DAG good or is it terribly suboptimal?
> If so, how could it be improved?
>
>
> Simone Franzini, PhD
>
> http://www.linkedin.com/in/simonefranzini
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>


Re: Spark Packages Configuration Not Found

2015-11-10 Thread Jakob Odersky
(accidental keyboard-shortcut sent the message)
... spark-shell from the spark 1.5.2 binary distribution.
Also, running "spPublishLocal" has the same effect.

thanks,
--Jakob

On 10 November 2015 at 14:55, Jakob Odersky <joder...@gmail.com> wrote:

> Hi,
> I ran into in error trying to run spark-shell with an external package
> that I built and published locally
> using the spark-package sbt plugin (
> https://github.com/databricks/sbt-spark-package).
>
> To my understanding, spark packages can be published simply as maven
> artifacts, yet after running "publishLocal" in my package project (
> https://github.com/jodersky/spark-paperui), the following command
>
>park-shell --packages ch.jodersky:spark-paperui-server_2.10:0.1-SNAPSHOT
>
> gives an error:
>
> ::
>
> ::  UNRESOLVED DEPENDENCIES ::
>
> ::
>
> :: ch.jodersky#spark-paperui-server_2.10;0.1: configuration not
> found in ch.jodersky#spark-paperui-server_2.10;0.1: 'default'. It was
> required from org.apache.spark#spark-submit-parent;1.0 default
>
> ::
>
>
> :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
> Exception in thread "main" java.lang.RuntimeException: [unresolved
> dependency: ch.jodersky#spark-paperui-server_2.10;0.1: configuration not
> found in ch.jodersky#spark-paperui-server_2.10;0.1: 'default'. It was
> required from org.apache.spark#spark-submit-parent;1.0 default]
> at
> org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1011)
> at
> org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:286)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:153)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:12
>
> Do I need to include some default configuration? If so where and how
> should I do it? All other packages I looked at had no such thing.
>
> Btw, I am using spark-shell from a
>
>


Spark Packages Configuration Not Found

2015-11-10 Thread Jakob Odersky
Hi,
I ran into an error trying to run spark-shell with an external package that
I built and published locally
using the spark-package sbt plugin (
https://github.com/databricks/sbt-spark-package).

To my understanding, spark packages can be published simply as maven
artifacts, yet after running "publishLocal" in my package project (
https://github.com/jodersky/spark-paperui), the following command

   spark-shell --packages ch.jodersky:spark-paperui-server_2.10:0.1-SNAPSHOT

gives an error:

::

::  UNRESOLVED DEPENDENCIES ::

::

:: ch.jodersky#spark-paperui-server_2.10;0.1: configuration not
found in ch.jodersky#spark-paperui-server_2.10;0.1: 'default'. It was
required from org.apache.spark#spark-submit-parent;1.0 default

::


:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [unresolved
dependency: ch.jodersky#spark-paperui-server_2.10;0.1: configuration not
found in ch.jodersky#spark-paperui-server_2.10;0.1: 'default'. It was
required from org.apache.spark#spark-submit-parent;1.0 default]
at
org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1011)
at
org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:286)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:153)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:12

Do I need to include some default configuration? If so where and how should
I do it? All other packages I looked at had no such thing.

Btw, I am using spark-shell from a


Re: State of the Build

2015-11-06 Thread Jakob Odersky
Reposting to the list...

Thanks for all the feedback, everyone; I get a clearer picture of the
reasoning and implications now.

Koert, according to your post in this thread
http://apache-spark-developers-list.1001551.n3.nabble.com/Master-build-fails-tt14895.html#a15023,
it is apparently very easy to change the maven resolution mechanism to the
ivy one.
Patrick, would this not help with the problems you described?

On 5 November 2015 at 23:23, Patrick Wendell <pwend...@gmail.com> wrote:

> Hey Jakob,
>
> The builds in Spark are largely maintained by me, Sean, and Michael
> Armbrust (for SBT). For historical reasons, Spark supports both a Maven and
> SBT build. Maven is the build of reference for packaging Spark and is used
> by many downstream packagers and to build all Spark releases. SBT is more
> often used by developers. Both builds inherit from the same pom files (and
> rely on the same profiles) to minimize maintenance complexity of Spark's
> very complex dependency graph.
>
> If you are looking to make contributions that help with the build, I am
> happy to point you towards some things that are consistent maintenance
> headaches. There are two major pain points right now that I'd be thrilled
> to see fixes for:
>
> 1. SBT relies on a different dependency conflict resolution strategy than
> maven - causing all kinds of headaches for us. I have heard that newer
> versions of SBT can (maybe?) use Maven as a dependency resolver instead of
> Ivy. This would make our life so much better if it were possible, either by
> virtue of upgrading SBT or somehow doing this ourselves.
>
> 2. We don't have a great way of auditing the net effect of dependency
> changes when people make them in the build. I am working on a fairly clunky
> patch to do this here:
>
> https://github.com/apache/spark/pull/8531
>
> It could be done much more nicely using SBT, but only provided (1) is
> solved.
>
> Doing a major overhaul of the sbt build to decouple it from pom files, I'm
> not sure that's the best place to start, given that we need to continue to
> support maven - the coupling is intentional. But getting involved in the
> build in general would be completely welcome.
>
> - Patrick
>
> On Thu, Nov 5, 2015 at 10:53 PM, Sean Owen <so...@cloudera.com> wrote:
>
>> Maven isn't 'legacy', or supported for the benefit of third parties.
>> SBT had some behaviors / problems that Maven didn't relative to what
>> Spark needs. SBT is a development-time alternative only, and partly
>> generated from the Maven build.
>>
>> On Fri, Nov 6, 2015 at 1:48 AM, Koert Kuipers <ko...@tresata.com> wrote:
>> > People who do upstream builds of spark (think bigtop and hadoop
>> distros) are
>> > used to legacy systems like maven, so maven is the default build. I
>> don't
>> > think it will change.
>> >
>> > Any improvements for the sbt build are of course welcome (it is still
>> used
>> > by many developers), but i would not do anything that increases the
>> burden
>> > of maintaining two build systems.
>> >
>> > On Nov 5, 2015 18:38, "Jakob Odersky" <joder...@gmail.com> wrote:
>> >>
>> >> Hi everyone,
>> >> in the process of learning Spark, I wanted to get an overview of the
>> >> interaction between all of its sub-projects. I therefore decided to
>> have a
>> >> look at the build setup and its dependency management.
>> >> Since I am alot more comfortable using sbt than maven, I decided to
>> try to
>> >> port the maven configuration to sbt (with the help of automated tools).
>> >> This led me to a couple of observations and questions on the build
>> system
>> >> design:
>> >>
>> >> First, currently, there are two build systems, maven and sbt. Is there
>> a
>> >> preferred tool (or future direction to one)?
>> >>
>> >> Second, the sbt build also uses maven "profiles" requiring the use of
>> >> specific commandline parameters when starting sbt. Furthermore, since
>> it
>> >> relies on maven poms, dependencies to the scala binary version (_2.xx)
>> are
>> >> hardcoded and require running an external script when switching
>> versions.
>> >> Sbt could leverage built-in constructs to support cross-compilation and
>> >> emulate profiles with configurations and new build targets. This would
>> >> remove external state from the build (in that no extra steps need to be
>> >> performed in a particular order to generate artifacts for a new
>> >> configuration) and therefore improve stability and build
>> reproducibility
>> >> (maybe even build performance). I was wondering if implementing such
>> >> functionality for the sbt build would be welcome?
>> >>
>> >> thanks,
>> >> --Jakob
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> For additional commands, e-mail: dev-h...@spark.apache.org
>>
>>
>


State of the Build

2015-11-05 Thread Jakob Odersky
Hi everyone,
in the process of learning Spark, I wanted to get an overview of the
interaction between all of its sub-projects. I therefore decided to have a
look at the build setup and its dependency management.
Since I am a lot more comfortable using sbt than maven, I decided to try to
port the maven configuration to sbt (with the help of automated tools).
This led me to a couple of observations and questions on the build system
design:

First, currently, there are two build systems, maven and sbt. Is there a
preferred tool (or future direction to one)?

Second, the sbt build also uses maven "profiles" requiring the use of
specific commandline parameters when starting sbt. Furthermore, since it
relies on maven poms, dependencies to the scala binary version (_2.xx) are
hardcoded and require running an external script when switching versions.
Sbt could leverage built-in constructs to support cross-compilation and
emulate profiles with configurations and new build targets. This would
remove external state from the build (in that no extra steps need to be
performed in a particular order to generate artifacts for a new
configuration) and therefore improve stability and build reproducibility
(maybe even build performance). I was wondering if implementing such
functionality for the sbt build would be welcome?
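As an illustration of the built-in constructs I have in mind, sbt's
cross-building support looks roughly like this (the version numbers are just
placeholders):

crossScalaVersions := Seq("2.10.5", "2.11.7")
// running `+compile` in sbt then builds against every listed Scala version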

thanks,
--Jakob


Re: Turn off logs in spark-sql shell

2015-10-16 Thread Jakob Odersky
[repost to mailing list, ok I gotta really start hitting that
reply-to-all-button]

Hi,
Spark uses Log4j, which unfortunately does not support fine-grained
configuration over the command line, so some configuration-file editing will
have to be done (unless you want to configure loggers programmatically, which
however would require editing spark-sql).
Nevertheless, there is a kind of "trick" where you can substitute Java system
properties in the log4j configuration file; see this stackoverflow answer for
details: http://stackoverflow.com/a/31208461/917519.
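For example, a log4j.properties along these lines (an untested sketch; the
property name my.logger.threshold is arbitrary and would always have to be
supplied, otherwise the level is left empty):

# conf/log4j.properties
log4j.rootCategory=${my.logger.threshold}, console
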
After editing the properties file, you can then start spark-sql with:

bin/spark-sql --conf
"spark.driver.extraJavaOptions=-Dmy.logger.threshold=OFF"

this is untested, but I hope it helps,
--Jakob

On 15 October 2015 at 22:56, Muhammad Ahsan 
wrote:

> Hello Everyone!
>
> I want to know how to turn off logging during starting *spark-sql shell*
> without changing log4j configuration files. In normal spark-shell I can use
> the following commands
>
> import org.apache.log4j.Logger
> import org.apache.log4j.Level
> Logger.getLogger("org").setLevel(Level.OFF)
> Logger.getLogger("akka").setLevel(Level.OFF)
>
>
> Thanks
>
> --
> Thanks
>
> Muhammad Ahsan
>
>


Re: Insight into Spark Packages

2015-10-16 Thread Jakob Odersky
[repost to mailing list]

I don't know much about packages, but have you heard about the
sbt-spark-package plugin?
Looking at the code, specifically
https://github.com/databricks/sbt-spark-package/blob/master/src/main/scala/sbtsparkpackage/SparkPackagePlugin.scala,
might give you insight on the details about package creation. Package
submission is implemented in
https://github.com/databricks/sbt-spark-package/blob/master/src/main/scala/sbtsparkpackage/SparkPackageHttp.scala

At a quick first overview, it seems packages are bundled as maven artifacts
and then posted to "http://spark-packages.org/api/submit-release".

Hope this helps for your last question

On 16 October 2015 at 08:43, jeff saremi  wrote:

> I'm looking for any form of documentation on Spark Packages
> Specifically, what happens when one issues a command like the following:
>
>
> $SPARK_HOME/bin/spark-shell --packages RedisLabs:spark-redis:0.1.0
>
>
> Something like an architecture diagram.
> What happens when this package gets submitted?
> Does this need to be done each time?
> Is that package downloaded each time?
> Is there a persistent cache on the server (master i guess)?
> Can these packages be installed offline with no Internet connectivity?
> How does a package get created?
>
> and so on and so forth
>


Re: Building with SBT and Scala 2.11

2015-10-14 Thread Jakob Odersky
[Repost to mailing list]

Hey,
Sorry about the typo, I of course meant hadoop-2.6, not 2.11.
I suspect something bad happened with my Ivy cache, since when reverting
back to Scala 2.10, I got a very strange IllegalStateException (something
about IvyNode, I can't remember the details).
Killing the cache made 2.10 work at least; I'll retry with 2.11.

Thx for your help
On Oct 14, 2015 6:52 AM, "Ted Yu" <yuzhih...@gmail.com> wrote:

> Adrian:
> Likely you were using maven.
>
> Jakob's report was with sbt.
>
> Cheers
>
> On Tue, Oct 13, 2015 at 10:05 PM, Adrian Tanase <atan...@adobe.com> wrote:
>
>> Do you mean hadoop-2.4 or 2.6? not sure if this is the issue but I'm also
>> compiling the 1.5.1 version with scala 2.11 and hadoop 2.6 and it works.
>>
>> -adrian
>>
>> Sent from my iPhone
>>
>> On 14 Oct 2015, at 03:53, Jakob Odersky <joder...@gmail.com> wrote:
>>
>> I'm having trouble compiling Spark with SBT for Scala 2.11. The command I
>> use is:
>>
>> dev/change-version-to-2.11.sh
>> build/sbt -Pyarn -Phadoop-2.11 -Dscala-2.11
>>
>> followed by
>>
>> compile
>>
>> in the sbt shell.
>>
>> The error I get specifically is:
>>
>> spark/core/src/main/scala/org/apache/spark/rpc/netty/NettyRpcEnv.scala:308:
>> no valid targets for annotation on value conf - it is discarded unused. You
>> may specify targets with meta-annotations, e.g. @(transient @param)
>> [error] private[netty] class NettyRpcEndpointRef(@transient conf:
>> SparkConf)
>> [error]
>>
>> However I am also getting a large amount of deprecation warnings, making
>> me wonder if I am supplying some incompatible/unsupported options to sbt. I
>> am using Java 1.8 and the latest Spark master sources.
>> Does someone know if I am doing anything wrong or is the sbt build broken?
>>
>> thanks for you help,
>> --Jakob
>>
>>
>


[jira] [Commented] (SPARK-11110) Scala 2.11 build fails due to compiler errors

2015-10-14 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14957721#comment-14957721
 ] 

Jakob Odersky commented on SPARK-11110:
---

exactly what I got, I'll take a look at it

> Scala 2.11 build fails due to compiler errors
> -
>
> Key: SPARK-11110
> URL: https://issues.apache.org/jira/browse/SPARK-11110
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Reporter: Patrick Wendell
>
> Right now the 2.11 build is failing due to compiler errors in SBT (though not 
> in Maven). I have updated our 2.11 compile test harness to catch this.
> https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Compile/job/Spark-Master-Scala211-Compile/1667/consoleFull
> {code}
> [error] 
> /home/jenkins/workspace/Spark-Master-Scala211-Compile/core/src/main/scala/org/apache/spark/rpc/netty/NettyRpcEnv.scala:308:
>  no valid targets for annotation on value conf - it is discarded unused. You 
> may specify targets with meta-annotations, e.g. @(transient @param)
> [error] private[netty] class NettyRpcEndpointRef(@transient conf: SparkConf)
> [error] 
> {code}
> This is one error, but there may be others past this point (the compile fails 
> fast).






[jira] [Created] (SPARK-11122) Fatal warnings in sbt are not displayed as such

2015-10-14 Thread Jakob Odersky (JIRA)
Jakob Odersky created SPARK-11122:
-

 Summary: Fatal warnings in sbt are not displayed as such
 Key: SPARK-11122
 URL: https://issues.apache.org/jira/browse/SPARK-11122
 Project: Spark
  Issue Type: Bug
  Components: Build
Reporter: Jakob Odersky


The sbt script treats warnings (except dependency warnings) as errors; however, 
there is no visual difference between errors and fatal warnings, leading to very 
confusing debugging.






[jira] [Created] (SPARK-11094) Test runner script fails to parse Java version.

2015-10-13 Thread Jakob Odersky (JIRA)
Jakob Odersky created SPARK-11094:
-

 Summary: Test runner script fails to parse Java version.
 Key: SPARK-11094
 URL: https://issues.apache.org/jira/browse/SPARK-11094
 Project: Spark
  Issue Type: Bug
  Components: Tests
 Environment: Debian testing
Reporter: Jakob Odersky
Priority: Minor


Running {{dev/run-tests}} fails when the local Java version has an extra string 
appended to the version number.
For example, in Debian Stretch (currently the testing distribution), {{java 
-version}} yields "1.8.0_66-internal", where the extra part "-internal" causes 
the script to fail.






[jira] [Created] (SPARK-11092) Add source URLs to API documentation.

2015-10-13 Thread Jakob Odersky (JIRA)
Jakob Odersky created SPARK-11092:
-

 Summary: Add source URLs to API documentation.
 Key: SPARK-11092
 URL: https://issues.apache.org/jira/browse/SPARK-11092
 Project: Spark
  Issue Type: Documentation
  Components: Build, Documentation
Reporter: Jakob Odersky
Priority: Trivial


It would be nice to have source URLs in the Spark scaladoc, similar to the 
standard library (e.g. 
http://www.scala-lang.org/api/current/index.html#scala.collection.immutable.List).

The fix should be really simple, just adding a line to the sbt unidoc settings.
I'll use the github repo url  
"https://github.com/apache/spark/tree/v${version}/${FILE_PATH;). Feel free to 
tell me if I should use something else as base url.






[jira] [Updated] (SPARK-11092) Add source URLs to API documentation.

2015-10-13 Thread Jakob Odersky (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Odersky updated SPARK-11092:
--
Description: 
It would be nice to have source URLs in the Spark scaladoc, similar to the 
standard library (e.g. 
http://www.scala-lang.org/api/current/index.html#scala.collection.immutable.List).

The fix should be really simple, just adding a line to the sbt unidoc settings.
I'll use the github repo url  
"https://github.com/apache/spark/tree/v${version}/${FILE_PATH};). Feel free to 
tell me if I should use something else as base url.

  was:
It would be nice to have source URLs in the Spark scaladoc, similar to the 
standard library (e.g. 
http://www.scala-lang.org/api/current/index.html#scala.collection.immutable.List).

The fix should be really simple, just adding a line to the sbt unidoc settings.
I'll use the github repo url  
"https://github.com/apache/spark/tree/v${version}/${FILE_PATH;). Feel free to 
tell me if I should use something else as base url.


> Add source URLs to API documentation.
> -
>
> Key: SPARK-11092
> URL: https://issues.apache.org/jira/browse/SPARK-11092
> Project: Spark
>  Issue Type: Documentation
>  Components: Build, Documentation
>Reporter: Jakob Odersky
>Priority: Trivial
>
> It would be nice to have source URLs in the Spark scaladoc, similar to the 
> standard library (e.g. 
> http://www.scala-lang.org/api/current/index.html#scala.collection.immutable.List).
> The fix should be really simple, just adding a line to the sbt unidoc 
> settings.
> I'll use the github repo url  
> "https://github.com/apache/spark/tree/v${version}/${FILE_PATH};). Feel free 
> to tell me if I should use something else as base url.






[jira] [Commented] (SPARK-11092) Add source URLs to API documentation.

2015-10-13 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14955996#comment-14955996
 ] 

Jakob Odersky commented on SPARK-11092:
---

I can't set the assignee field, though I'd like to resolve this issue.

> Add source URLs to API documentation.
> -
>
> Key: SPARK-11092
> URL: https://issues.apache.org/jira/browse/SPARK-11092
> Project: Spark
>  Issue Type: Documentation
>  Components: Build, Documentation
>    Reporter: Jakob Odersky
>Priority: Trivial
>
> It would be nice to have source URLs in the Spark scaladoc, similar to the 
> standard library (e.g. 
> http://www.scala-lang.org/api/current/index.html#scala.collection.immutable.List).
> The fix should be really simple, just adding a line to the sbt unidoc 
> settings.
> I'll use the github repo url  
> "https://github.com/apache/spark/tree/v${version}/${FILE_PATH};). Feel free 
> to tell me if I should use something else as base url.






[jira] [Updated] (SPARK-11092) Add source URLs to API documentation.

2015-10-13 Thread Jakob Odersky (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Odersky updated SPARK-11092:
--
Description: 
It would be nice to have source URLs in the Spark scaladoc, similar to the 
standard library (e.g. 
http://www.scala-lang.org/api/current/index.html#scala.collection.immutable.List).

The fix should be really simple, just adding a line to the sbt unidoc settings.
I'll use the github repo url 
bq. https://github.com/apache/spark/tree/v${version}/${FILE_PATH}
Feel free to tell me if I should use something else as base url.

  was:
It would be nice to have source URLs in the Spark scaladoc, similar to the 
standard library (e.g. 
http://www.scala-lang.org/api/current/index.html#scala.collection.immutable.List).

The fix should be really simple, just adding a line to the sbt unidoc settings.
I'll use the github repo url  
"https://github.com/apache/spark/tree/v${version}/${FILE_PATH};). Feel free to 
tell me if I should use something else as base url.


> Add source URLs to API documentation.
> -
>
> Key: SPARK-11092
> URL: https://issues.apache.org/jira/browse/SPARK-11092
> Project: Spark
>  Issue Type: Documentation
>  Components: Build, Documentation
>    Reporter: Jakob Odersky
>Priority: Trivial
>
> It would be nice to have source URLs in the Spark scaladoc, similar to the 
> standard library (e.g. 
> http://www.scala-lang.org/api/current/index.html#scala.collection.immutable.List).
> The fix should be really simple, just adding a line to the sbt unidoc 
> settings.
> I'll use the github repo url 
> bq. https://github.com/apache/spark/tree/v${version}/${FILE_PATH}
> Feel free to tell me if I should use something else as base url.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: Spark Event Listener

2015-10-13 Thread Jakob Odersky
the path of the source file defining the event API is
`core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala`

On 13 October 2015 at 16:29, Jakob Odersky <joder...@gmail.com> wrote:

> Hi,
> I came across the spark listener API while checking out possible UI
> extensions recently. I noticed that all events inherit from a sealed trait
> `SparkListenerEvent` and that a SparkListener has a corresponding
> `onEventXXX(event)` method for every possible event.
>
> Considering that events inherit from a sealed trait and thus all events
> are known during compile-time, what is the rationale of using specific
> methods for every event rather than a single method that would let a client
> pattern match on the type of event?
>
> I don't know the internals of the pattern matcher, but again, considering
> events are sealed, I reckon that matching performance should not be an
> issue.
>
> thanks,
> --Jakob
>


Spark Event Listener

2015-10-13 Thread Jakob Odersky
Hi,
I came across the spark listener API while checking out possible UI
extensions recently. I noticed that all events inherit from a sealed trait
`SparkListenerEvent` and that a SparkListener has a corresponding
`onEventXXX(event)` method for every possible event.

Considering that events inherit from a sealed trait and thus all events are
known during compile-time, what is the rationale of using specific methods
for every event rather than a single method that would let a client pattern
match on the type of event?
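
To make the question concrete, here is a rough sketch of the single-method shape I have in mind (the trait and class names below are made up for illustration; only the event types are real):

import org.apache.spark.scheduler.{SparkListenerEvent, SparkListenerJobEnd, SparkListenerJobStart}

// Hypothetical single-callback listener; SparkListener itself does not look
// like this today.
trait UnifiedSparkListener {
  def onEvent(event: SparkListenerEvent): Unit
}

// A client would simply match on the event type; since SparkListenerEvent is
// sealed, all possible cases are known at compile time.
class LoggingListener extends UnifiedSparkListener {
  def onEvent(event: SparkListenerEvent): Unit = event match {
    case start: SparkListenerJobStart => println(s"job ${start.jobId} started")
    case end: SparkListenerJobEnd     => println(s"job ${end.jobId} finished")
    case _                            => // ignore everything else
  }
}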

I don't know the internals of the pattern matcher, but again, considering
events are sealed, I reckon that matching performance should not be an
issue.

thanks,
--Jakob


Building with SBT and Scala 2.11

2015-10-13 Thread Jakob Odersky
I'm having trouble compiling Spark with SBT for Scala 2.11. The command I
use is:

dev/change-version-to-2.11.sh
build/sbt -Pyarn -Phadoop-2.11 -Dscala-2.11

followed by

compile

in the sbt shell.

The error I get specifically is:

spark/core/src/main/scala/org/apache/spark/rpc/netty/NettyRpcEnv.scala:308:
no valid targets for annotation on value conf - it is discarded unused. You
may specify targets with meta-annotations, e.g. @(transient @param)
[error] private[netty] class NettyRpcEndpointRef(@transient conf: SparkConf)
[error]

However, I am also getting a large number of deprecation warnings, making me
wonder if I am supplying some incompatible/unsupported options to sbt. I am
using Java 1.8 and the latest Spark master sources.
Does someone know if I am doing anything wrong or is the sbt build broken?
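
For reference, the meta-annotation form the warning suggests would look roughly like this (just a sketch of the suggested syntax; I have not verified it against the actual class, which has more parameters and a superclass call):

import scala.annotation.meta.param
import org.apache.spark.SparkConf

// Pins @transient explicitly to the constructor parameter so the annotation
// is no longer silently discarded.
class NettyRpcEndpointRefSketch(@(transient @param) conf: SparkConf)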

thanks for your help,
--Jakob


Live UI

2015-10-12 Thread Jakob Odersky
Hi everyone,
I am just getting started working on spark and was thinking of a first way
to contribute whilst still trying to wrap my head around the codebase.

Exploring the web UI, I noticed it is a classic request-response website,
requiring manual refresh to get the latest data.
I think it would be great to have a "live" website where data would be
displayed real-time without the need to hit the refresh button. I would be
very interested in contributing this feature if it is acceptable.

Specifically, I was thinking of using websockets with a ScalaJS front-end.
Please let me know if this design would be welcome or if it introduces
unwanted dependencies, I'll be happy to discuss this further in detail.

thanks for your feedback,
--Jakob


[jira] [Commented] (SPARK-10876) display total application time in spark history UI

2015-10-09 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14951419#comment-14951419
 ] 

Jakob Odersky commented on SPARK-10876:
---

I'm not sure what you mean. The UI already has a "Duration" field for every job.

> display total application time in spark history UI
> --
>
> Key: SPARK-10876
> URL: https://issues.apache.org/jira/browse/SPARK-10876
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 1.5.1
>Reporter: Thomas Graves
>
> The history file has application start and application end events.  It 
> would be nice if we could use these to display the total run time for the 
> application in the history UI.
> Could be displayed similar to "Total Uptime" for a running application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-10876) display total application time in spark history UI

2015-10-09 Thread Jakob Odersky (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Odersky updated SPARK-10876:
--
Comment: was deleted

(was: I'm not sure what you mean. The UI already has a "Duration" field for 
every job.)

> display total application time in spark history UI
> --
>
> Key: SPARK-10876
> URL: https://issues.apache.org/jira/browse/SPARK-10876
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 1.5.1
>Reporter: Thomas Graves
>
> The history file has application start and application end events.  It 
> would be nice if we could use these to display the total run time for the 
> application in the history UI.
> Could be displayed similar to "Total Uptime" for a running application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10876) display total application time in spark history UI

2015-10-09 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14951434#comment-14951434
 ] 

Jakob Odersky commented on SPARK-10876:
---

Do you mean to display the total run time of uncompleted apps? Completed apps 
already have a "Duration" field

> display total application time in spark history UI
> --
>
> Key: SPARK-10876
> URL: https://issues.apache.org/jira/browse/SPARK-10876
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 1.5.1
>Reporter: Thomas Graves
>
> The history file has application start and application end events.  It 
> would be nice if we could use these to display the total run time for the 
> application in the history UI.
> Could be displayed similar to "Total Uptime" for a running application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10306) sbt hive/update issue

2015-10-09 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14951362#comment-14951362
 ] 

Jakob Odersky commented on SPARK-10306:
---

Same issue here
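
The workaround sketched in the description below would amount to something like this in the sbt build definition (untested, just for illustration):
{code}
// Declare the evicted 2.10.3 scala-library explicitly so Ivy has its metadata
// loaded; sbt's eviction still resolves to 2.10.4.
libraryDependencies += "org.scala-lang" % "scala-library" % "2.10.3"
{code}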

> sbt hive/update issue
> -
>
> Key: SPARK-10306
> URL: https://issues.apache.org/jira/browse/SPARK-10306
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Reporter: holdenk
>Priority: Trivial
>
> Running sbt hive/update sometimes results in the error "impossible to get 
> artifacts when data has not been loaded. IvyNode = 
> org.scala-lang#scala-library;2.10.3" which is unfortunate since it is always 
> evicted by 2.10.4 currently. An easy (but maybe not super clean) solution 
> would be adding 2.10.3 as a dependency which will then get evicted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


