[jira] [Commented] (SPARK-16037) use by-position resolution when inserting into a Hive table

2016-06-18 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338367#comment-15338367
 ] 

Apache Spark commented on SPARK-16037:
--

User 'yhuai' has created a pull request for this issue:
https://github.com/apache/spark/pull/13766

> use by-position resolution when inserting into a Hive table
> --
>
> Key: SPARK-16037
> URL: https://issues.apache.org/jira/browse/SPARK-16037
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
> Fix For: 2.0.0
>
>
> INSERT INTO TABLE src SELECT 1, 2 AS c, 3 AS b;
> The result is 1, 3, 2 for the Hive table, which is wrong.
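
For context, a minimal spark-shell sketch of the behavior described above (the 
src schema is assumed here to match the sibling tickets; it is not taken from 
this report):

{code}
// assumed table: CREATE TABLE src (a INT, b INT, c INT) stored as a Hive table
sql("INSERT INTO TABLE src SELECT 1, 2 AS c, 3 AS b")
sql("SELECT * FROM src").show()
// by-position resolution (desired): a=1, b=2, c=3
// by-name resolution against the aliases c and b (the bug): a=1, b=3, c=2
{code}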






[jira] [Commented] (SPARK-16034) Checks the partition columns when calling dataFrame.write.mode("append").saveAsTable

2016-06-18 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338368#comment-15338368
 ] 

Apache Spark commented on SPARK-16034:
--

User 'yhuai' has created a pull request for this issue:
https://github.com/apache/spark/pull/13766

> Checks the partition columns when calling 
> dataFrame.write.mode("append").saveAsTable
> 
>
> Key: SPARK-16034
> URL: https://issues.apache.org/jira/browse/SPARK-16034
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Sean Zhong
>Assignee: Sean Zhong
> Fix For: 2.0.0
>
>
> Suppose we have defined a partitioned table:
> {code}
> CREATE TABLE src (a INT, b INT, c INT)
> USING PARQUET
> PARTITIONED BY (a, b);
> {code}
> We should check the partition columns when appending DataFrame data to 
> existing table: 
> {code}
> val df = Seq((1, 2, 3)).toDF("a", "b", "c")
> df.write.partitionBy("b", "a").mode("append").saveAsTable("src")
> {code}
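
For illustration, a minimal sketch of such a check (a hypothetical standalone 
helper, not the actual patch): on append, the requested partitioning must match 
the existing table's partition columns, including their order.

{code}
// hypothetical helper, for illustration only
def checkPartitionColumns(existing: Seq[String], requested: Seq[String]): Unit = {
  require(existing.map(_.toLowerCase) == requested.map(_.toLowerCase),
    s"Specified partitioning (${requested.mkString(", ")}) does not match " +
      s"the existing table's partitioning (${existing.mkString(", ")})")
}

checkPartitionColumns(existing = Seq("a", "b"), requested = Seq("b", "a"))
// throws IllegalArgumentException, surfacing exactly the mismatch in the example above
{code}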






[jira] [Commented] (SPARK-16036) better error message if the number of columns in SELECT clause doesn't match the table schema

2016-06-18 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338366#comment-15338366
 ] 

Apache Spark commented on SPARK-16036:
--

User 'yhuai' has created a pull request for this issue:
https://github.com/apache/spark/pull/13766

> better error message if the number of columns in SELECT clause doesn't match 
> the table schema
> -
>
> Key: SPARK-16036
> URL: https://issues.apache.org/jira/browse/SPARK-16036
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
> Fix For: 2.0.0
>
>
> INSERT INTO TABLE src PARTITION(b=2, c=3) SELECT 4, 5, 6;






[jira] [Updated] (SPARK-15722) Wrong data when CTAS specifies schema

2016-06-18 Thread Rekha Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rekha Joshi updated SPARK-15722:

Description: 
{code}
scala> sql("CREATE TABLE boxes (width INT, length INT, height INT) USING CSV")
scala> (1 to 3).map { i => (i, i * 2, i * 3) }.toDF("height", "length", 
"width").write.insertInto("boxes")
scala> spark.table("boxes").show()
+-----+------+------+
|width|length|height|
+-----+------+------+
|    1|     2|     3|
|    2|     4|     6|
|    3|     6|     9|
+-----+------+------+
scala> sql("CREATE TABLE blocks (name STRING, age INT) AS SELECT * FROM boxes")
scala> spark.table("boxes").show()
+----+---+
|name|age|
+----+---+
|   1|  2|
|   2|  4|
|   3|  6|
+----+---+
{code}
The columns don't even match in types.

  was:
{code}
scala> sql("CREATE TABLE boxes (width INT, length INT, height INT) USING CSV")
scala> (1 to 3).map { i => (i, i * 2, i * 3) }.toDF("height", "length", 
"width").write.insertInto("boxes")
scala> spark.table("boxes").show()
+-----+------+------+
|width|length|height|
+-----+------+------+
|    1|     2|     3|
|    2|     4|     6|
|    3|     6|     9|
+-----+------+------+
scala> sql("CREATE TABLE blocks (name STRING, age INT) AS SELECT * FROM boxes")
scala> spark.table("students").show()
+----+---+
|name|age|
+----+---+
|   1|  2|
|   2|  4|
|   3|  6|
+----+---+
{code}
The columns don't even match in types.


> Wrong data when CTAS specifies schema
> -
>
> Key: SPARK-15722
> URL: https://issues.apache.org/jira/browse/SPARK-15722
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
> Fix For: 2.0.0
>
>
> {code}
> scala> sql("CREATE TABLE boxes (width INT, length INT, height INT) USING CSV")
> scala> (1 to 3).map { i => (i, i * 2, i * 3) }.toDF("height", "length", 
> "width").write.insertInto("boxes")
> scala> spark.table("boxes").show()
> +-----+------+------+
> |width|length|height|
> +-----+------+------+
> |    1|     2|     3|
> |    2|     4|     6|
> |    3|     6|     9|
> +-----+------+------+
> scala> sql("CREATE TABLE blocks (name STRING, age INT) AS SELECT * FROM 
> boxes")
> scala> spark.table("boxes").show()
> +----+---+
> |name|age|
> +----+---+
> |   1|  2|
> |   2|  4|
> |   3|  6|
> +----+---+
> {code}
> The columns don't even match in types.






[jira] [Assigned] (SPARK-16052) Add CollapseRepartitionBy optimizer

2016-06-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16052:


Assignee: (was: Apache Spark)

> Add CollapseRepartitionBy optimizer
> ---
>
> Key: SPARK-16052
> URL: https://issues.apache.org/jira/browse/SPARK-16052
> Project: Spark
>  Issue Type: Improvement
>  Components: Optimizer
>Reporter: Dongjoon Hyun
>
> This issue adds a new optimizer, `CollapseRepartitionBy`.
> **Before**
> {code}
> scala> spark.range(10).repartition(1, $"id").repartition(1, $"id").explain
> == Physical Plan ==
> Exchange hashpartitioning(id#0L, 1)
> +- Exchange hashpartitioning(id#0L, 1)
>+- *Range (0, 10, splits=8)
> {code}
> **After**
> {code}
> scala> spark.range(10).repartition(1, $"id").repartition(1, $"id").explain
> == Physical Plan ==
> Exchange hashpartitioning(id#0L, 1)
> +- *Range (0, 10, splits=8)
> {code}






[jira] [Assigned] (SPARK-16052) Add CollapseRepartitionBy optimizer

2016-06-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16052:


Assignee: Apache Spark

> Add CollapseRepartitionBy optimizer
> ---
>
> Key: SPARK-16052
> URL: https://issues.apache.org/jira/browse/SPARK-16052
> Project: Spark
>  Issue Type: Improvement
>  Components: Optimizer
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>
> This issue adds a new optimizer, `CollapseRepartitionBy`.
> **Before**
> {code}
> scala> spark.range(10).repartition(1, $"id").repartition(1, $"id").explain
> == Physical Plan ==
> Exchange hashpartitioning(id#0L, 1)
> +- Exchange hashpartitioning(id#0L, 1)
>+- *Range (0, 10, splits=8)
> {code}
> **After**
> {code}
> scala> spark.range(10).repartition(1, $"id").repartition(1, $"id").explain
> == Physical Plan ==
> Exchange hashpartitioning(id#0L, 1)
> +- *Range (0, 10, splits=8)
> {code}






[jira] [Commented] (SPARK-16052) Add CollapseRepartitionBy optimizer

2016-06-18 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338309#comment-15338309
 ] 

Apache Spark commented on SPARK-16052:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/13765

> Add CollapseRepartitionBy optimizer
> ---
>
> Key: SPARK-16052
> URL: https://issues.apache.org/jira/browse/SPARK-16052
> Project: Spark
>  Issue Type: Improvement
>  Components: Optimizer
>Reporter: Dongjoon Hyun
>
> This issue adds a new optimizer, `CollapseRepartitionBy`.
> **Before**
> {code}
> scala> spark.range(10).repartition(1, $"id").repartition(1, $"id").explain
> == Physical Plan ==
> Exchange hashpartitioning(id#0L, 1)
> +- Exchange hashpartitioning(id#0L, 1)
>+- *Range (0, 10, splits=8)
> {code}
> **After**
> {code}
> scala> spark.range(10).repartition(1, $"id").repartition(1, $"id").explain
> == Physical Plan ==
> Exchange hashpartitioning(id#0L, 1)
> +- *Range (0, 10, splits=8)
> {code}






[jira] [Created] (SPARK-16052) Add CollapseRepartitionBy optimizer

2016-06-18 Thread Dongjoon Hyun (JIRA)
Dongjoon Hyun created SPARK-16052:
-

 Summary: Add CollapseRepartitionBy optimizer
 Key: SPARK-16052
 URL: https://issues.apache.org/jira/browse/SPARK-16052
 Project: Spark
  Issue Type: Improvement
  Components: Optimizer
Reporter: Dongjoon Hyun


This issue adds a new optimizer, `CollapseRepartitionBy`.

**Before**
{code}
scala> spark.range(10).repartition(1, $"id").repartition(1, $"id").explain
== Physical Plan ==
Exchange hashpartitioning(id#0L, 1)
+- Exchange hashpartitioning(id#0L, 1)
   +- *Range (0, 10, splits=8)
{code}

**After**
{code}
scala> spark.range(10).repartition(1, $"id").repartition(1, $"id").explain
== Physical Plan ==
Exchange hashpartitioning(id#0L, 1)
+- *Range (0, 10, splits=8)
{code}
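
A minimal sketch of what such a rule could look like (illustrative only, not the 
code in the linked pull request; it assumes the Spark 2.0-era 
RepartitionByExpression signature):

{code}
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, RepartitionByExpression}
import org.apache.spark.sql.catalyst.rules.Rule

// Collapse two adjacent repartition-by-expression nodes into the outer one,
// since the outer exchange determines the final partitioning anyway.
object CollapseRepartitionBy extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    case outer @ RepartitionByExpression(_, RepartitionByExpression(_, grandchild, _), _) =>
      outer.copy(child = grandchild)
  }
}
{code}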






[jira] [Assigned] (SPARK-16024) column comment is ignored for datasource table

2016-06-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16024:


Assignee: Apache Spark

> column comment is ignored for datasource table
> --
>
> Key: SPARK-16024
> URL: https://issues.apache.org/jira/browse/SPARK-16024
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>
> CREATE TABLE src(a INT COMMENT 'bla') USING parquet.
> When we describe table, the column comment is not there.






[jira] [Assigned] (SPARK-16024) column comment is ignored for datasource table

2016-06-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16024:


Assignee: (was: Apache Spark)

> column comment is ignored for datasource table
> --
>
> Key: SPARK-16024
> URL: https://issues.apache.org/jira/browse/SPARK-16024
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Wenchen Fan
>
> CREATE TABLE src(a INT COMMENT 'bla') USING parquet.
> When we describe table, the column comment is not there.






[jira] [Commented] (SPARK-16024) column comment is ignored for datasource table

2016-06-18 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338286#comment-15338286
 ] 

Apache Spark commented on SPARK-16024:
--

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/13764

> column comment is ignored for datasource table
> --
>
> Key: SPARK-16024
> URL: https://issues.apache.org/jira/browse/SPARK-16024
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Wenchen Fan
>
> CREATE TABLE src(a INT COMMENT 'bla') USING parquet.
> When we describe table, the column comment is not there.






[jira] [Updated] (SPARK-15973) Fix GroupedData Documentation

2016-06-18 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-15973:

Assignee: Josh Howes

> Fix GroupedData Documentation
> -
>
> Key: SPARK-15973
> URL: https://issues.apache.org/jira/browse/SPARK-15973
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Vladimir Feinberg
>Assignee: Josh Howes
>Priority: Trivial
> Fix For: 2.0.0
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> (1)
> {{GroupedData.pivot}} documentation uses {{//}} instead of {{#}} for doctest 
> Python comments, which messes up formatting in the documentation as well as 
> the doctests themselves.
> A PR resolving this should probably resolve the other places this happens in 
> pyspark.
> (2)
> Simple aggregation functions which take column names {{cols}} as varargs 
> arguments show up in documentation with the argument {{args}}, but their 
> documentation refers to {{cols}}.
> The discrepancy is caused by an annotation, {{df_varargs_api}}, which 
> produces a temporary function with arguments {{args}} instead of {{cols}}, 
> creating the confusing documentation.
> (3)
> The {{pyspark.sql.GroupedData}} object refers to the Java object it wraps as 
> the member variable {{self._jdf}}, which is exactly the same name that 
> {{pyspark.sql.DataFrame}} uses for its wrapped object.
> The acronym is incorrect, standing for "Java DataFrame" instead of what 
> should be "Java GroupedData". As such, the name should be changed to 
> {{self._jgd}}; in fact, in the {{DataFrame.groupBy}} implementation, the 
> Java object is referred to as {{jgd}}.






[jira] [Assigned] (SPARK-16051) Add `read.orc/write.orc` to SparkR

2016-06-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16051:


Assignee: Apache Spark

> Add `read.orc/write.orc` to SparkR
> --
>
> Key: SPARK-16051
> URL: https://issues.apache.org/jira/browse/SPARK-16051
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>
> This issue adds `read.orc/write.orc` to SparkR for API parity.






[jira] [Assigned] (SPARK-16051) Add `read.orc/write.orc` to SparkR

2016-06-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16051:


Assignee: (was: Apache Spark)

> Add `read.orc/write.orc` to SparkR
> --
>
> Key: SPARK-16051
> URL: https://issues.apache.org/jira/browse/SPARK-16051
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Reporter: Dongjoon Hyun
>
> This issue adds `read.orc/write.orc` to SparkR for API parity.






[jira] [Commented] (SPARK-16051) Add `read.orc/write.orc` to SparkR

2016-06-18 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338187#comment-15338187
 ] 

Apache Spark commented on SPARK-16051:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/13763

> Add `read.orc/write.orc` to SparkR
> --
>
> Key: SPARK-16051
> URL: https://issues.apache.org/jira/browse/SPARK-16051
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Reporter: Dongjoon Hyun
>
> This issue adds `read.orc/write.orc` to SparkR for API parity.






[jira] [Created] (SPARK-16051) Add `read.orc/write.orc` to SparkR

2016-06-18 Thread Dongjoon Hyun (JIRA)
Dongjoon Hyun created SPARK-16051:
-

 Summary: Add `read.orc/write.orc` to SparkR
 Key: SPARK-16051
 URL: https://issues.apache.org/jira/browse/SPARK-16051
 Project: Spark
  Issue Type: Improvement
  Components: SparkR
Reporter: Dongjoon Hyun


This issue adds `read.orc/write.orc` to SparkR for API parity.






[jira] [Commented] (SPARK-16024) column comment is ignored for datasource table

2016-06-18 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338148#comment-15338148
 ] 

Xiao Li commented on SPARK-16024:
-

{noformat}
  test("desc table for parquet data source table") {
val tabName = "tab1"
withTable(tabName) {
  sql(s"CREATE TABLE $tabName(a int comment 'test') USING parquet ")

  checkAnswer(
sql(s"DESC $tabName").select("comment"),
Row("test")
  )
}
  }
{noformat}

I tried both catalogs (in-memory catalog and hive metastore). The above test 
case can pass in both cases. Could you explain a little bit more about the 
exact scenario?

Thanks!

> column comment is ignored for datasource table
> --
>
> Key: SPARK-16024
> URL: https://issues.apache.org/jira/browse/SPARK-16024
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Wenchen Fan
>
> CREATE TABLE src(a INT COMMENT 'bla') USING parquet.
> When we describe table, the column comment is not there.






[jira] [Commented] (SPARK-6814) Support sorting for any data type in SparkR

2016-06-18 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338147#comment-15338147
 ] 

Dongjoon Hyun commented on SPARK-6814:
--

Hi, [~shivaram].
Since the SparkR RDD API is hidden from users now, can we simply close this issue?

> Support sorting for any data type in SparkR
> ---
>
> Key: SPARK-6814
> URL: https://issues.apache.org/jira/browse/SPARK-6814
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Reporter: Shivaram Venkataraman
>Priority: Critical
>
> I get various "return status == 0 is false" and "unimplemented type" errors 
> trying to get data out of any rdd with top() or collect(). The errors are not 
> consistent. I think spark is installed properly because some operations do 
> work. I apologize if I'm missing something easy or not providing the right 
> diagnostic info – I'm new to SparkR, and this seems to be the only resource 
> for SparkR issues.
> Some logs:
> {code}
> Browse[1]> top(estep.rdd, 1L)
> Error in order(unlist(part, recursive = FALSE), decreasing = !ascending) : 
>   unimplemented type 'list' in 'orderVector1'
> Calls: do.call ... Reduce ->  -> func -> FUN -> FUN -> order
> Execution halted
> 15/02/13 19:11:57 ERROR Executor: Exception in task 0.0 in stage 14.0 (TID 14)
> org.apache.spark.SparkException: R computation failed with
>  Error in order(unlist(part, recursive = FALSE), decreasing = !ascending) : 
>   unimplemented type 'list' in 'orderVector1'
> Calls: do.call ... Reduce ->  -> func -> FUN -> FUN -> order
> Execution halted
>   at edu.berkeley.cs.amplab.sparkr.BaseRRDD.compute(RRDD.scala:69)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
>   at org.apache.spark.scheduler.Task.run(Task.scala:54)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> 15/02/13 19:11:57 WARN TaskSetManager: Lost task 0.0 in stage 14.0 (TID 14, 
> localhost): org.apache.spark.SparkException: R computation failed with
>  Error in order(unlist(part, recursive = FALSE), decreasing = !ascending) : 
>   unimplemented type 'list' in 'orderVector1'
> Calls: do.call ... Reduce ->  -> func -> FUN -> FUN -> order
> Execution halted
> edu.berkeley.cs.amplab.sparkr.BaseRRDD.compute(RRDD.scala:69)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
> org.apache.spark.scheduler.Task.run(Task.scala:54)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
> 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> java.lang.Thread.run(Thread.java:745)
> {code}






[jira] [Commented] (SPARK-16024) column comment is ignored for datasource table

2016-06-18 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338128#comment-15338128
 ] 

Xiao Li commented on SPARK-16024:
-

Thanks! In Spark 2.0, the simplest solution is to put {{comment}} into the 
{{metadata}} of {{StructField}}. However, in the long term, I think we need to 
consolidate {{StructField}} and {{CatalogColumn}}.
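
A minimal sketch of that idea (illustrative only, not the eventual patch):

{code}
import org.apache.spark.sql.types.{IntegerType, MetadataBuilder, StructField, StructType}

// carry the column comment in StructField metadata so the data source schema keeps it
val commented = new MetadataBuilder().putString("comment", "bla").build()
val schema = StructType(Seq(
  StructField("a", IntegerType, nullable = true, metadata = commented)))

// a DESCRIBE-style listing could then read the comment back out of the metadata
schema.fields.map(f => f.name -> f.metadata.getString("comment"))
{code}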

> column comment is ignored for datasource table
> --
>
> Key: SPARK-16024
> URL: https://issues.apache.org/jira/browse/SPARK-16024
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Wenchen Fan
>
> CREATE TABLE src(a INT COMMENT 'bla') USING parquet.
> When we describe table, the column comment is not there.






[jira] [Commented] (SPARK-16024) column comment is ignored for datasource table

2016-06-18 Thread Wenchen Fan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338104#comment-15338104
 ] 

Wenchen Fan commented on SPARK-16024:
-

yea go ahead, thanks!

> column comment is ignored for datasource table
> --
>
> Key: SPARK-16024
> URL: https://issues.apache.org/jira/browse/SPARK-16024
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Wenchen Fan
>
> CREATE TABLE src(a INT COMMENT 'bla') USING parquet.
> When we describe table, the column comment is not there.






[jira] [Commented] (SPARK-16024) column comment is ignored for datasource table

2016-06-18 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338096#comment-15338096
 ] 

Xiao Li commented on SPARK-16024:
-

:) Found a more serious bug in JSON handling while reading the related code.

> column comment is ignored for datasource table
> --
>
> Key: SPARK-16024
> URL: https://issues.apache.org/jira/browse/SPARK-16024
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Wenchen Fan
>
> CREATE TABLE src(a INT COMMENT 'bla') USING parquet.
> When we describe table, the column comment is not there.






[jira] [Assigned] (SPARK-14926) OneVsRest labelMetadata uses incorrect name

2016-06-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14926:


Assignee: Apache Spark

> OneVsRest labelMetadata uses incorrect name
> ---
>
> Key: SPARK-14926
> URL: https://issues.apache.org/jira/browse/SPARK-14926
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 1.4.1, 1.5.2, 1.6.1, 2.0.0
>Reporter: Joseph K. Bradley
>Assignee: Apache Spark
>Priority: Trivial
>
> OneVsRestModel applies {{labelMetadata}} to the output column, but the 
> metadata could contain the wrong name.  The attribute name should be modified 
> to match {{predictionCol}}.
> Here is the relevant location: 
> [[https://github.com/apache/spark/blob/2a3d39f48b1a7bb462e17e80e243bbc0a94d802e/mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala#L200]]






[jira] [Assigned] (SPARK-14926) OneVsRest labelMetadata uses incorrect name

2016-06-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14926:


Assignee: (was: Apache Spark)

> OneVsRest labelMetadata uses incorrect name
> ---
>
> Key: SPARK-14926
> URL: https://issues.apache.org/jira/browse/SPARK-14926
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 1.4.1, 1.5.2, 1.6.1, 2.0.0
>Reporter: Joseph K. Bradley
>Priority: Trivial
>
> OneVsRestModel applies {{labelMetadata}} to the output column, but the 
> metadata could contain the wrong name.  The attribute name should be modified 
> to match {{predictionCol}}.
> Here is the relevant location: 
> [[https://github.com/apache/spark/blob/2a3d39f48b1a7bb462e17e80e243bbc0a94d802e/mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala#L200]]






[jira] [Commented] (SPARK-14926) OneVsRest labelMetadata uses incorrect name

2016-06-18 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338092#comment-15338092
 ] 

Apache Spark commented on SPARK-14926:
--

User 'josh-howes' has created a pull request for this issue:
https://github.com/apache/spark/pull/13762

> OneVsRest labelMetadata uses incorrect name
> ---
>
> Key: SPARK-14926
> URL: https://issues.apache.org/jira/browse/SPARK-14926
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 1.4.1, 1.5.2, 1.6.1, 2.0.0
>Reporter: Joseph K. Bradley
>Priority: Trivial
>
> OneVsRestModel applies {{labelMetadata}} to the output column, but the 
> metadata could contain the wrong name.  The attribute name should be modified 
> to match {{predictionCol}}.
> Here is the relevant location: 
> [[https://github.com/apache/spark/blob/2a3d39f48b1a7bb462e17e80e243bbc0a94d802e/mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala#L200]]






[jira] [Created] (SPARK-16050) Flaky Test: Complete aggregation with Console sink

2016-06-18 Thread Burak Yavuz (JIRA)
Burak Yavuz created SPARK-16050:
---

 Summary: Flaky Test: Complete aggregation with Console sink 
 Key: SPARK-16050
 URL: https://issues.apache.org/jira/browse/SPARK-16050
 Project: Spark
  Issue Type: Test
  Components: SQL, Streaming
Reporter: Burak Yavuz
Priority: Critical


Please refer to the multiple failures in the last day:

https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7/1018/consoleFull

https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7/1017/consoleFull

https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7/1018/consoleFull







[jira] [Commented] (SPARK-16006) Attempting to write empty DataFrame with no fields throws non-intuitive exception

2016-06-18 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338083#comment-15338083
 ] 

Dongjoon Hyun commented on SPARK-16006:
---

Hi, [~tdas].
The PR is updated, could you review again?

> Attempting to write empty DataFrame with no fields throws non-intuitive 
> exception
> ---
>
> Key: SPARK-16006
> URL: https://issues.apache.org/jira/browse/SPARK-16006
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Tathagata Das
>Priority: Minor
>
> Attempting to write an empty DataFrame created with 
> {{sparkSession.emptyDataFrame}}, e.g. 
> {{sparkSession.emptyDataFrame.write.text("p")}}, fails with the following 
> exception:
> {code}
> org.apache.spark.sql.AnalysisException: Cannot use all columns for partition 
> columns;
>   at 
> org.apache.spark.sql.execution.datasources.PartitioningUtils$.validatePartitionColumn(PartitioningUtils.scala:355)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:435)
>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:213)
>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:196)
>   at org.apache.spark.sql.DataFrameWriter.text(DataFrameWriter.scala:525)
>   ... 48 elided
> {code}
> This is because the number of fields equals the number of partitioning 
> columns (both are 0) at 
> org.apache.spark.sql.execution.datasources.PartitioningUtils$.validatePartitionColumn(PartitioningUtils.scala:355).
> This is a non-intuitive error message; a better one would be "Cannot write 
> dataset with no fields".
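
A tiny sketch of the friendlier check suggested above (a hypothetical helper, 
not the actual fix):

{code}
import org.apache.spark.sql.DataFrame

// fail fast with a clear message before partition-column validation ever runs
def assertWritable(df: DataFrame): Unit =
  require(df.schema.nonEmpty, "Cannot write dataset with no fields")
{code}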






[jira] [Resolved] (SPARK-16034) Checks the partition columns when calling dataFrame.write.mode("append").saveAsTable

2016-06-18 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-16034.
--
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 13749
[https://github.com/apache/spark/pull/13749]

> Checks the partition columns when calling 
> dataFrame.write.mode("append").saveAsTable
> 
>
> Key: SPARK-16034
> URL: https://issues.apache.org/jira/browse/SPARK-16034
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Sean Zhong
>Assignee: Sean Zhong
> Fix For: 2.0.0
>
>
> Suppose we have defined a partitioned table:
> {code}
> CREATE TABLE src (a INT, b INT, c INT)
> USING PARQUET
> PARTITIONED BY (a, b);
> {code}
> We should check the partition columns when appending DataFrame data to 
> existing table: 
> {code}
> val df = Seq((1, 2, 3)).toDF("a", "b", "c")
> df.write.partitionBy("b", "a").mode("append").saveAsTable("src")
> {code}






[jira] [Resolved] (SPARK-16037) use by-position resolution when inserting into a Hive table

2016-06-18 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-16037.
--
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 13754
[https://github.com/apache/spark/pull/13754]

> use by-position resolution when inserting into a Hive table
> --
>
> Key: SPARK-16037
> URL: https://issues.apache.org/jira/browse/SPARK-16037
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
> Fix For: 2.0.0
>
>
> INSERT INTO TABLE src SELECT 1, 2 AS c, 3 AS b;
> The result is 1, 3, 2 for the Hive table, which is wrong.






[jira] [Resolved] (SPARK-16036) better error message if the number of columns in SELECT clause doesn't match the table schema

2016-06-18 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-16036.
--
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 13754
[https://github.com/apache/spark/pull/13754]

> better error message if the number of columns in SELECT clause doesn't match 
> the table schema
> -
>
> Key: SPARK-16036
> URL: https://issues.apache.org/jira/browse/SPARK-16036
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
> Fix For: 2.0.0
>
>
> INSERT INTO TABLE src PARTITION(b=2, c=3) SELECT 4, 5, 6;






[jira] [Updated] (SPARK-16034) Checks the partition columns when calling dataFrame.write.mode("append").saveAsTable

2016-06-18 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-16034:
-
Assignee: Sean Zhong

> Checks the partition columns when calling 
> dataFrame.write.mode("append").saveAsTable
> 
>
> Key: SPARK-16034
> URL: https://issues.apache.org/jira/browse/SPARK-16034
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Sean Zhong
>Assignee: Sean Zhong
>
> Suppose we have defined a partitioned table:
> {code}
> CREATE TABLE src (a INT, b INT, c INT)
> USING PARQUET
> PARTITIONED BY (a, b);
> {code}
> We should check the partition columns when appending DataFrame data to 
> existing table: 
> {code}
> val df = Seq((1, 2, 3)).toDF("a", "b", "c")
> df.write.partitionBy("b", "a").mode("append").saveAsTable("src")
> {code}






[jira] [Commented] (SPARK-16032) Audit semantics of various insertion operations related to partitioned tables

2016-06-18 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338066#comment-15338066
 ] 

Yin Huai commented on SPARK-16032:
--

We will attach the report here.

> Audit semantics of various insertion operations related to partitioned tables
> -
>
> Key: SPARK-16032
> URL: https://issues.apache.org/jira/browse/SPARK-16032
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Cheng Lian
>Assignee: Wenchen Fan
>Priority: Blocker
>
> We found that the semantics of various insertion operations related to 
> partitioned tables can be inconsistent. This is an umbrella ticket for all 
> related tickets.






[jira] [Created] (SPARK-16049) Make InsertIntoTable's expectedColumns support case-insensitive resolution properly

2016-06-18 Thread Yin Huai (JIRA)
Yin Huai created SPARK-16049:


 Summary: Make InsertIntoTable's expectedColumns support 
case-insensitive resolution properly
 Key: SPARK-16049
 URL: https://issues.apache.org/jira/browse/SPARK-16049
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Yin Huai


Right now, InsertIntoTable's expectedColumns uses {{contains}} to find static 
partitioning columns. When the analyzer is case-insensitive, the 
initialization of this lazy val does not work as expected.
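
A minimal sketch of the idea (illustrative only, not the actual patch): match 
the static partition column names with the analyzer's resolver rather than a 
case-sensitive {{Seq.contains}}.

{code}
import org.apache.spark.sql.catalyst.analysis.{Resolver, caseInsensitiveResolution}

// hypothetical helper: does `column` name one of the static partition keys,
// honoring the analyzer's case-sensitivity setting?
def isStaticPartitionColumn(
    partitionKeys: Seq[String],
    column: String,
    resolver: Resolver = caseInsensitiveResolution): Boolean =
  partitionKeys.exists(resolver(_, column))
{code}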






[jira] [Commented] (SPARK-16024) column comment is ignored for datasource table

2016-06-18 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338024#comment-15338024
 ] 

Xiao Li commented on SPARK-16024:
-

Let me work on it?

> column comment is ignored for datasource table
> --
>
> Key: SPARK-16024
> URL: https://issues.apache.org/jira/browse/SPARK-16024
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Wenchen Fan
>
> CREATE TABLE src(a INT COMMENT 'bla') USING parquet.
> When we describe table, the column comment is not there.






[jira] [Updated] (SPARK-16047) Sort by status and id fields in Executors table

2016-06-18 Thread Jacek Laskowski (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacek Laskowski updated SPARK-16047:

Description: 
With multiple executors with the same ID the default sorting *seems* to be by 
ID (descending) first and status (alphabetically ascending).

I'd like webUI to sort the Executors table by status first (with Active first) 
followed by ID (ascending with driver being the last one).

  was:
With multiple executors with the same ID the default sorting *seems* to be by 
ID (descending) first and status (alphabetically ascending).

I'd like to sort the table by status first (with Active first) followed by ID 
(ascending with driver being the last one).


> Sort by status and id fields in Executors table
> ---
>
> Key: SPARK-16047
> URL: https://issues.apache.org/jira/browse/SPARK-16047
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 2.0.0
>Reporter: Jacek Laskowski
>Priority: Minor
> Attachments: spark-webui-executors.png
>
>
> With multiple executors with the same ID the default sorting *seems* to be by 
> ID (descending) first and status (alphabetically ascending).
> I'd like webUI to sort the Executors table by status first (with Active 
> first) followed by ID (ascending with driver being the last one).






[jira] [Updated] (SPARK-16048) spark-shell unresponsive after "FetchFailedException: java.lang.UnsupportedOperationException: Unsupported shuffle manager" with YARN and spark.shuffle.service.enabled

2016-06-18 Thread Jacek Laskowski (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacek Laskowski updated SPARK-16048:

Description: 
With Spark on YARN and the external shuffle service enabled, a 
{{java.lang.UnsupportedOperationException: Unsupported shuffle manager: 
org.apache.spark.shuffle.sort.SortShuffleManager}} exception makes spark-shell 
unresponsive.

{code}
$ YARN_CONF_DIR=hadoop-conf ./bin/spark-shell --master yarn -c 
spark.shuffle.service.enabled=true --deploy-mode client -c 
spark.scheduler.mode=FAIR --num-executors 2
...
Spark context Web UI available at http://192.168.1.9:4040
Spark context available as 'sc' (master = yarn, app id = 
application_1466255040841_0002).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0-SNAPSHOT
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_92)
Type in expressions to have them evaluated.
Type :help for more information.

scala> sc.parallelize(0 to 4, 1).map(n => (n % 2, n)).groupByKey.map(n => { 
Thread.sleep(5 * 1000); n }).count
org.apache.spark.SparkException: Job aborted due to stage failure: ResultStage 
1 (count at :25) has failed the maximum allowable number of times: 4. 
Most recent failure reason: org.apache.spark.shuffle.FetchFailedException: 
java.lang.UnsupportedOperationException: Unsupported shuffle manager: 
org.apache.spark.shuffle.sort.SortShuffleManager
at 
org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getBlockData(ExternalShuffleBlockResolver.java:191)
at 
org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.handleMessage(ExternalShuffleBlockHandler.java:85)
at 
org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.receive(ExternalShuffleBlockHandler.java:72)
at 
org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:159)
at 
org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:107)
at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:119)
at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
at 
io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at 
io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at 
io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at 
org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
at java.lang.Thread.run(Thread.java:745)

at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:357)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:332)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:54)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at 

[jira] [Created] (SPARK-16048) spark-shell unresponsive after "FetchFailedException: java.lang.UnsupportedOperationException: Unsupported shuffle manager" with YARN and spark.shuffle.service.enabled

2016-06-18 Thread Jacek Laskowski (JIRA)
Jacek Laskowski created SPARK-16048:
---

 Summary: spark-shell unresponsive after "FetchFailedException: 
java.lang.UnsupportedOperationException: Unsupported shuffle manager" with YARN 
and spark.shuffle.service.enabled
 Key: SPARK-16048
 URL: https://issues.apache.org/jira/browse/SPARK-16048
 Project: Spark
  Issue Type: Bug
  Components: Shuffle, Spark Shell, YARN
Affects Versions: 2.0.0
Reporter: Jacek Laskowski


With Spark on YARN and the external shuffle service enabled, a 
{{java.lang.UnsupportedOperationException: Unsupported shuffle manager: 
org.apache.spark.shuffle.sort.SortShuffleManager}} exception makes spark-shell 
unresponsive.

{quote}
$ YARN_CONF_DIR=hadoop-conf ./bin/spark-shell --master yarn -c 
spark.shuffle.service.enabled=true --deploy-mode client -c 
spark.scheduler.mode=FAIR --num-executors 2
...
Spark context Web UI available at http://192.168.1.9:4040
Spark context available as 'sc' (master = yarn, app id = 
application_1466255040841_0002).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0-SNAPSHOT
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_92)
Type in expressions to have them evaluated.
Type :help for more information.

scala> sc.parallelize(0 to 4, 1).map(n => (n % 2, n)).groupByKey.map(n => { 
Thread.sleep(5 * 1000); n }).count
org.apache.spark.SparkException: Job aborted due to stage failure: ResultStage 
1 (count at :25) has failed the maximum allowable number of times: 4. 
Most recent failure reason: org.apache.spark.shuffle.FetchFailedException: 
java.lang.UnsupportedOperationException: Unsupported shuffle manager: 
org.apache.spark.shuffle.sort.SortShuffleManager
at 
org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getBlockData(ExternalShuffleBlockResolver.java:191)
at 
org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.handleMessage(ExternalShuffleBlockHandler.java:85)
at 
org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.receive(ExternalShuffleBlockHandler.java:72)
at 
org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:159)
at 
org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:107)
at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:119)
at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
at 
io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at 
io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at 
io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at 
org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
at java.lang.Thread.run(Thread.java:745)

at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:357)
at 

[jira] [Updated] (SPARK-16047) Sort by status and id fields in Executors table

2016-06-18 Thread Jacek Laskowski (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacek Laskowski updated SPARK-16047:

Attachment: spark-webui-executors.png

Current default sorting

> Sort by status and id fields in Executors table
> ---
>
> Key: SPARK-16047
> URL: https://issues.apache.org/jira/browse/SPARK-16047
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 2.0.0
>Reporter: Jacek Laskowski
>Priority: Minor
> Attachments: spark-webui-executors.png
>
>
> With multiple executors with the same ID the default sorting *seems* to be by 
> ID (descending) first and status (alphabetically ascending).
> I'd like to sort the table by status first (with Active first) followed by ID 
> (ascending with driver being the last one).






[jira] [Created] (SPARK-16047) Sort by status and id fields in Executors table

2016-06-18 Thread Jacek Laskowski (JIRA)
Jacek Laskowski created SPARK-16047:
---

 Summary: Sort by status and id fields in Executors table
 Key: SPARK-16047
 URL: https://issues.apache.org/jira/browse/SPARK-16047
 Project: Spark
  Issue Type: Improvement
  Components: Web UI
Affects Versions: 2.0.0
Reporter: Jacek Laskowski
Priority: Minor


With multiple executors with the same ID the default sorting *seems* to be by 
ID (descending) first and status (alphabetically ascending).

I'd like to sort the table by status first (with Active first) followed by ID 
(ascending with driver being the last one).






[jira] [Updated] (SPARK-16046) Add Spark SQL Dataset Tutorial

2016-06-18 Thread Pedro Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pedro Rodriguez updated SPARK-16046:

Description: 
Issue to update the Spark SQL guide to provide more content around using 
Datasets. This would expand the Creating Datasets section of the Spark SQL 
documentation.

Goals
1. Add more examples of column access via $ and `
2. Add examples of aggregates
3. Add examples of using Spark SQL functions

What else would be useful to have?

  was:
Issue to update the Spark SQL guide to provide more content around using 
Datasets. This would expand the Creating Datasets section of the Spark SQL 
documentation.

Goals
1. Add more examples of column access via $ and `
2. Add examples of aggregates
3. Add examples of using Spark SQL functions

What else would be useful to have


> Add Spark SQL Dataset Tutorial
> --
>
> Key: SPARK-16046
> URL: https://issues.apache.org/jira/browse/SPARK-16046
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, SQL
>Affects Versions: 2.0.0
>Reporter: Pedro Rodriguez
>
> Issue to update the Spark SQL guide to provide more content around using 
> Datasets. This would expand the Creating Datasets section of the Spark SQL 
> documentation.
> Goals
> 1. Add more examples of column access via $ and `
> 2. Add examples of aggregates
> 3. Add examples of using Spark SQL functions
> What else would be useful to have?






[jira] [Commented] (SPARK-16046) Add Spark SQL Dataset Tutorial

2016-06-18 Thread Pedro Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15337869#comment-15337869
 ] 

Pedro Rodriguez commented on SPARK-16046:
-

I would like to take on this issue and will base work off of 
https://issues.apache.org/jira/browse/SPARK-15863

> Add Spark SQL Dataset Tutorial
> --
>
> Key: SPARK-16046
> URL: https://issues.apache.org/jira/browse/SPARK-16046
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, SQL
>Affects Versions: 2.0.0
>Reporter: Pedro Rodriguez
>
> Issue to update the Spark SQL guide to provide more content around using 
> Datasets. This would expand the Creating Datasets section of the Spark SQL 
> documentation.
> Goals
> 1. Add more examples of column access via $ and `
> 2. Add examples of aggregates
> 3. Add examples of using Spark SQL functions
> What else would be useful to have






[jira] [Updated] (SPARK-16046) Add Spark SQL Dataset Tutorial

2016-06-18 Thread Pedro Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pedro Rodriguez updated SPARK-16046:

Component/s: SQL
 Documentation

> Add Spark SQL Dataset Tutorial
> --
>
> Key: SPARK-16046
> URL: https://issues.apache.org/jira/browse/SPARK-16046
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, SQL
>Affects Versions: 2.0.0
>Reporter: Pedro Rodriguez
>
> Issue to update the Spark SQL guide to provide more content around using 
> Datasets. This would expand the Creating Datasets section of the Spark SQL 
> documentation.
> Goals
> 1. Add more examples of column access via $ and `
> 2. Add examples of aggregates
> 3. Add examples of using Spark SQL functions
> What else would be useful to have






[jira] [Created] (SPARK-16046) Add Spark SQL Dataset Tutorial

2016-06-18 Thread Pedro Rodriguez (JIRA)
Pedro Rodriguez created SPARK-16046:
---

 Summary: Add Spark SQL Dataset Tutorial
 Key: SPARK-16046
 URL: https://issues.apache.org/jira/browse/SPARK-16046
 Project: Spark
  Issue Type: Documentation
Affects Versions: 2.0.0
Reporter: Pedro Rodriguez


Issue to update the Spark SQL guide to provide more content around using 
Datasets. This would expand the Creating Datasets section of the Spark SQL 
documentation.

Goals
1. Add more examples of column access via $ and `
2. Add examples of aggregates
3. Add examples of using Spark SQL functions

What else would be useful to have?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12197) Kryo's Avro Serializer add support for dynamic schemas using SchemaRepository

2016-06-18 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15337838#comment-15337838
 ] 

Apache Spark commented on SPARK-12197:
--

User 'RotemShaul' has created a pull request for this issue:
https://github.com/apache/spark/pull/13761

> Kryo's Avro Serializer add support for dynamic schemas using SchemaRepository
> -
>
> Key: SPARK-12197
> URL: https://issues.apache.org/jira/browse/SPARK-12197
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.5.0
>Reporter: Rotem Shaul
>  Labels: avro, kryo, schema, serialization
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> The original problem: serializing GenericRecords in Spark Core incurs very 
> high overhead, because the schema is serialized with every record (whereas in 
> the actual input data on HDFS it is stored once per file).
> The extended problem: Spark 1.5 introduced the ability to register Avro 
> schemas ahead of time using SparkConf. This solution is partial, as some 
> applications may not know exactly which schemas they are going to read ahead 
> of time.
> Extended solution:
> Add a schema repository to the Serializer. Assuming the generic records carry 
> a schemaId, it is possible to extract it dynamically from the records being 
> read and serialize only the schemaId.
> Upon deserialization the schema repository is queried once again.
> The local caching mechanism remains intact, so each Task queries the schema 
> repository only once per schemaId.
> The existing static registration of schemas remains in place, as it is more 
> efficient when the schemas are known ahead of time.
> New flow for serializing a generic record:
> 1) Check the pre-registered schema list; if the schema is found, serialize 
> only its fingerprint.
> 2) If not found and a schema repo has been set, attempt to extract the 
> schemaId from the record and check whether the repo contains that id. If so, 
> serialize only the schema id.
> 3) If no schema repo is set or the schemaId is not found in the repo, 
> compress and send the entire schema.
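
A minimal sketch of that flow (the {{SchemaRepository}} trait and the 
{{schemaId}} field below are illustrative assumptions, not the API of the 
actual pull request):

{code}
import org.apache.avro.Schema
import org.apache.avro.generic.GenericRecord

// Hypothetical repository interface; a real deployment would back this with a
// schema registry service.
trait SchemaRepository {
  def contains(schemaId: Long): Boolean
  def getSchema(schemaId: Long): Schema
}

class AvroSchemaResolutionSketch(
    preRegistered: Map[Long, Schema],   // fingerprints registered via SparkConf
    repo: Option[SchemaRepository]) {

  // Assumed convention: records carry their schema id in a "schemaId" field.
  private def schemaIdOf(record: GenericRecord): Option[Long] =
    Option(record.get("schemaId")).map(_.toString.toLong)

  /** Decide what to write for the schema part of a record. */
  def schemaToWrite(record: GenericRecord, fingerprint: Long): Either[Long, Schema] =
    if (preRegistered.contains(fingerprint)) {
      Left(fingerprint)                          // 1) pre-registered: write the fingerprint only
    } else {
      schemaIdOf(record).filter(id => repo.exists(_.contains(id))) match {
        case Some(id) => Left(id)                // 2) found in the repo: write the schema id only
        case None     => Right(record.getSchema) // 3) fall back to the (compressed) full schema
      }
    }
}
{code}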



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12947) Spark with Swift throws EOFException when reading parquet file

2016-06-18 Thread Ovidiu Marcu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15337808#comment-15337808
 ] 

Ovidiu Marcu commented on SPARK-12947:
--

Hi, did you file an issue with Ceph for the errors you point out here?

> Spark with Swift throws EOFException when reading parquet file
> --
>
> Key: SPARK-12947
> URL: https://issues.apache.org/jira/browse/SPARK-12947
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
> Environment: Spark 1.6.0-SNAPSHOT
>Reporter: Sam Stoelinga
>
> I'm using Swift as the underlying storage for my Spark jobs, but it sometimes 
> throws EOFExceptions for some parts of the data.
> Another user has hit the same issue: 
> http://stackoverflow.com/questions/32400137/spark-swift-integration-parquet
> Code to reproduce:
> {code}
> val features = sqlContext.read.parquet(featurePath)
> // Flatten the features into the array exploded
> val exploded = 
> features.select(explode(features("features"))).toDF("features")
> val kmeans = new KMeans()
>   .setK(k)
>   .setFeaturesCol("features")
>   .setPredictionCol("prediction")
> val model = kmeans.fit(exploded)
> {code}
> {{features}} is a DataFrame with 2 columns: 
> image: String, features: Array[Vector]
> {{exploded}} is a DataFrame with a single column:
> features: Vector
> The following exception is shown when running takeSample on a large dataset 
> saved as a Parquet file (~1+ GB):
> java.io.EOFException
>   at java.io.DataInputStream.readFully(DataInputStream.java:197)
>   at java.io.DataInputStream.readFully(DataInputStream.java:169)
>   at 
> org.apache.parquet.hadoop.ParquetFileReader$ConsecutiveChunkList.readAll(ParquetFileReader.java:756)
>   at 
> org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:494)
>   at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:127)
>   at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:208)
>   at 
> org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:201)
>   at 
> org.apache.spark.rdd.SqlNewHadoopRDD$$anon$1.hasNext(SqlNewHadoopRDD.scala:168)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>   at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:350)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$zip$1$$anonfun$apply$30$$anon$1.hasNext(RDD.scala:827)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>   at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1563)
>   at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1119)
>   at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1119)
>   at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1840)
>   at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1840)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>   at org.apache.spark.scheduler.Task.run(Task.scala:88)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-14533) RowMatrix.computeCovariance inaccurate when values are very large

2016-06-18 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-14533:
--
Target Version/s:   (was: 2.0.0)

> RowMatrix.computeCovariance inaccurate when values are very large
> -
>
> Key: SPARK-14533
> URL: https://issues.apache.org/jira/browse/SPARK-14533
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 1.6.1, 2.0.0
>Reporter: Sean Owen
>Assignee: Sean Owen
>Priority: Minor
>
> The following code will produce a Pearson correlation that's quite different 
> from 0, sometimes outside [-1,1] or even NaN:
> {code}
> val a = RandomRDDs.normalRDD(sc, 10, 10).map(_ + 10.0)
> val b = RandomRDDs.normalRDD(sc, 10, 10).map(_ + 10.0)
> val p = Statistics.corr(a, b, method = "pearson")
> {code}
> This is a "known issue" to some degree, given how Cov(X,Y) is calculated in 
> {{RowMatrix.getCovariance}}, as Cov(X,Y) = E[XY] - E[X]E[Y]. The easier and 
> more accurate approach involves just centering the input before computing the 
> Gramian, but this would be inefficient for sparse data.
> However, for dense data -- which includes the code paths that compute 
> correlations -- this approach is quite sensible. This would improve accuracy 
> for the dense row case, at least.
> Also, the mean column values computed in this method can be computed more 
> simply and accurately from {{computeColumnSummaryStatistics()}}
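
For illustration, a small plain-Scala sketch (not Spark code) of why the 
E[XY] - E[X]E[Y] formulation loses precision when the column means are large, 
while centering first stays accurate:

{code}
// Two independent columns with a huge mean (1e7), so the true covariance is ~0.
val rng = new scala.util.Random(42)
val n = 100000
val x = Array.fill(n)(1e7 + rng.nextGaussian())
val y = Array.fill(n)(1e7 + rng.nextGaussian())

val meanX = x.sum / n
val meanY = y.sum / n

// Unstable: two huge, nearly equal numbers are subtracted (catastrophic cancellation).
val covUnstable = x.zip(y).map { case (a, b) => a * b }.sum / n - meanX * meanY

// Stable: center first, then average the products of the deviations.
val covCentered = x.zip(y).map { case (a, b) => (a - meanX) * (b - meanY) }.sum / n

// covCentered stays close to 0; covUnstable can be off by far more.
println(f"unstable: $covUnstable%.6f  centered: $covCentered%.6f")
{code}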



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15909) PySpark classpath uri incorrectly set

2016-06-18 Thread Liam Fisk (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15337774#comment-15337774
 ] 

Liam Fisk commented on SPARK-15909:
---

Cluster mode isn't used here; I have a Mesos cluster (and am therefore in 
client mode, as you said).

In client mode, the remote Mesos executors need to be able to retrieve any 
dependencies, and they can't do that if they are attempting to contact 
localhost.

The bug here is that the behaviour on startup is completely different from the 
behaviour within the REPL. If I stop the Spark context, clone the config, and 
construct a new Spark context, it will no longer work.

> PySpark classpath uri incorrectly set
> -
>
> Key: SPARK-15909
> URL: https://issues.apache.org/jira/browse/SPARK-15909
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.6.1
>Reporter: Liam Fisk
>
> PySpark behaves differently if the SparkContext is created within the REPL 
> (vs initialised by the shell).
> My conf/spark-env.sh file contains:
> {code}
> #!/bin/bash
> export SPARK_LOCAL_IP=172.20.30.158
> export LIBPROCESS_IP=172.20.30.158
> export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so
> {code}
> And when running pyspark it will correctly initialize my SparkContext. 
> However, when I run:
> {code}
> from pyspark import SparkContext, SparkConf
> sc.stop()
> conf = (
> SparkConf()
> .setMaster("mesos://zk://foo:2181/mesos")
> .setAppName("Jupyter PySpark")
> )
> sc = SparkContext(conf=conf)
> {code}
> my _spark.driver.uri_ and URL classpath will point to localhost (preventing 
> my Mesos cluster from accessing the appropriate files).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-14533) RowMatrix.computeCovariance inaccurate when values are very large

2016-06-18 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-14533:
--
Target Version/s: 2.0.0  (was: 1.6.2, 2.0.0)

> RowMatrix.computeCovariance inaccurate when values are very large
> -
>
> Key: SPARK-14533
> URL: https://issues.apache.org/jira/browse/SPARK-14533
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 1.6.1, 2.0.0
>Reporter: Sean Owen
>Assignee: Sean Owen
>Priority: Minor
>
> The following code will produce a Pearson correlation that's quite different 
> from 0, sometimes outside [-1,1] or even NaN:
> {code}
> val a = RandomRDDs.normalRDD(sc, 10, 10).map(_ + 10.0)
> val b = RandomRDDs.normalRDD(sc, 10, 10).map(_ + 10.0)
> val p = Statistics.corr(a, b, method = "pearson")
> {code}
> This is a "known issue" to some degree, given how Cov(X,Y) is calculated in 
> {{RowMatrix.getCovariance}}, as Cov(X,Y) = E[XY] - E[X]E[Y]. The easier and 
> more accurate approach involves just centering the input before computing the 
> Gramian, but this would be inefficient for sparse data.
> However, for dense data -- which includes the code paths that compute 
> correlations -- this approach is quite sensible. This would improve accuracy 
> for the dense row case, at least.
> Also, the mean column values computed in this method can be computed more 
> simply and accurately from {{computeColumnSummaryStatistics()}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15893) spark.createDataFrame raises an exception in Spark 2.0 tests on Windows

2016-06-18 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-15893.
---
  Resolution: Duplicate
Target Version/s:   (was: 2.0.0)

Same issue; there's a bit broader discussion in the other JIRA.

> spark.createDataFrame raises an exception in Spark 2.0 tests on Windows
> ---
>
> Key: SPARK-15893
> URL: https://issues.apache.org/jira/browse/SPARK-15893
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.0.0
>Reporter: Alexander Ulanov
>
> spark.createDataFrame raises an exception in Spark 2.0 tests on Windows
> For example, LogisticRegressionSuite fails at Line 46:
> Exception encountered when invoking run on a nested suite - 
> java.net.URISyntaxException: Relative path in absolute URI: 
> file:C:/dev/spark/external/flume-assembly/spark-warehouse
> java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative 
> path in absolute URI: 
> file:C:/dev/spark/external/flume-assembly/spark-warehouse
>   at org.apache.hadoop.fs.Path.initialize(Path.java:206)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:172)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.makeQualifiedPath(SessionCatalog.scala:109)
> Another example, DataFrameSuite raises:
> java.net.URISyntaxException: Relative path in absolute URI: 
> file:C:/dev/spark/external/flume-assembly/spark-warehouse
> java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative 
> path in absolute URI: 
> file:C:/dev/spark/external/flume-assembly/spark-warehouse
>   at org.apache.hadoop.fs.Path.initialize(Path.java:206)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:172)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-6817) DataFrame UDFs in R

2016-06-18 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-6817.
--
Resolution: Done

> DataFrame UDFs in R
> ---
>
> Key: SPARK-6817
> URL: https://issues.apache.org/jira/browse/SPARK-6817
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Reporter: Shivaram Venkataraman
>
> This depends on some internal interface of Spark SQL, should be done after 
> merging into Spark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15521) Add high level APIs based on dapply and gapply for easier usage

2016-06-18 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-15521:
--
Issue Type: Improvement  (was: Sub-task)
Parent: (was: SPARK-6817)

> Add high level APIs based on dapply and gapply for easier usage
> ---
>
> Key: SPARK-15521
> URL: https://issues.apache.org/jira/browse/SPARK-15521
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Reporter: Sun Rui
>
> dapply() and gapply() on SparkDataFrame are two basic functions. For easier 
> use by users in the R community, some high-level functions can be added on 
> top of them.
> Candidates are:
> http://exposurescience.org/heR.doc/library/heR.Misc/html/dapply.html
> http://exposurescience.org/heR.doc/library/stats/html/aggregate.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16012) add gapplyCollect() for SparkDataFrame

2016-06-18 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-16012:
--
Issue Type: Improvement  (was: Sub-task)
Parent: (was: SPARK-6817)

> add gapplyCollect() for SparkDataFrame
> --
>
> Key: SPARK-16012
> URL: https://issues.apache.org/jira/browse/SPARK-16012
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Affects Versions: 1.6.1
>Reporter: Sun Rui
>
> Add a new API method called gapplyCollect() for SparkDataFrame. It does 
> gapply on a SparkDataFrame and collects the result back to R. Compared to 
> gapply() + collect(), gapplyCollect() offers a performance optimization as 
> well as programming convenience, since no schema needs to be provided.
> This is similar to dapplyCollect().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6817) DataFrame UDFs in R

2016-06-18 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15337728#comment-15337728
 ] 

Sean Owen commented on SPARK-6817:
--

No, the best thing is just bulk-changing the issues to stand-alone issues. I 
can do that.

> DataFrame UDFs in R
> ---
>
> Key: SPARK-6817
> URL: https://issues.apache.org/jira/browse/SPARK-6817
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Reporter: Shivaram Venkataraman
>
> This depends on some internal interface of Spark SQL, should be done after 
> merging into Spark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12923) Optimize successive dapply() calls in SparkR

2016-06-18 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-12923:
--
Issue Type: Improvement  (was: Sub-task)
Parent: (was: SPARK-6817)

> Optimize successive dapply() calls in SparkR
> 
>
> Key: SPARK-12923
> URL: https://issues.apache.org/jira/browse/SPARK-12923
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Affects Versions: 1.6.0
>Reporter: Sun Rui
>
> For consecutive dapply() calls on the same DataFrame, optimize them to launch 
> the R worker once instead of multiple times, for better performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15984) WARN message "o.a.h.y.s.resourcemanager.rmapp.RMAppImpl: The specific max attempts: 0 for application: 8 is invalid" when starting application on YARN

2016-06-18 Thread Jacek Laskowski (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15337693#comment-15337693
 ] 

Jacek Laskowski commented on SPARK-15984:
-

The problem is that I am *not* changing Spark at all, and it gives the warning 
by default. If Spark triggers this warning out of the box, it would arguably be 
better to play nicer with YARN. I could fix it easily if I were told it changes 
nothing else in Spark; I don't know, which is why I reported it (since it's 
only a warning anyway).

> WARN message "o.a.h.y.s.resourcemanager.rmapp.RMAppImpl: The specific max 
> attempts: 0 for application: 8 is invalid" when starting application on YARN
> --
>
> Key: SPARK-15984
> URL: https://issues.apache.org/jira/browse/SPARK-15984
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 2.0.0
>Reporter: Jacek Laskowski
>Priority: Minor
>
> When executing {{spark-shell}} on Spark on YARN 2.7.2 on Mac OS as follows:
> {code}
> YARN_CONF_DIR=hadoop-conf ./bin/spark-shell --master yarn -c 
> spark.shuffle.service.enabled=true --deploy-mode client -c 
> spark.scheduler.mode=FAIR
> {code}
> it ends up with the following WARN in the logs:
> {code}
> 2016-06-16 08:33:05,308 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new 
> applicationId: 8
> 2016-06-16 08:33:07,305 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: The specific 
> max attempts: 0 for application: 8 is invalid, because it is out of the range 
> [1, 2]. Use the global max attempts instead.
> {code}
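
Not a confirmed fix, but one thing to try is setting the documented 
{{spark.yarn.maxAppAttempts}} property explicitly, so the value handed to the 
ResourceManager falls inside YARN's allowed range; a sketch:

{code}
// Sketch only: pass an explicit max-attempts value within
// [1, yarn.resourcemanager.am.max-attempts] instead of relying on the default.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.scheduler.mode", "FAIR")
  .set("spark.yarn.maxAppAttempts", "1")
{code}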



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16012) add gapplyCollect() for SparkDataFrame

2016-06-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16012:


Assignee: (was: Apache Spark)

> add gapplyCollect() for SparkDataFrame
> --
>
> Key: SPARK-16012
> URL: https://issues.apache.org/jira/browse/SPARK-16012
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 1.6.1
>Reporter: Sun Rui
>
> Add a new API method called gapplyCollect() for SparkDataFrame. It does 
> gapply on a SparkDataFrame and collects the result back to R. Compared to 
> gapply() + collect(), gapplyCollect() offers a performance optimization as 
> well as programming convenience, since no schema needs to be provided.
> This is similar to dapplyCollect().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16012) add gapplyCollect() for SparkDataFrame

2016-06-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16012:


Assignee: Apache Spark

> add gapplyCollect() for SparkDataFrame
> --
>
> Key: SPARK-16012
> URL: https://issues.apache.org/jira/browse/SPARK-16012
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 1.6.1
>Reporter: Sun Rui
>Assignee: Apache Spark
>
> Add a new API method called gapplyCollect() for SparkDataFrame. It does 
> gapply on a SparkDataFrame and collects the result back to R. Compared to 
> gapply() + collect(), gapplyCollect() offers a performance optimization as 
> well as programming convenience, since no schema needs to be provided.
> This is similar to dapplyCollect().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16012) add gapplyCollect() for SparkDataFrame

2016-06-18 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15337669#comment-15337669
 ] 

Apache Spark commented on SPARK-16012:
--

User 'NarineK' has created a pull request for this issue:
https://github.com/apache/spark/pull/13760

> add gapplyCollect() for SparkDataFrame
> --
>
> Key: SPARK-16012
> URL: https://issues.apache.org/jira/browse/SPARK-16012
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 1.6.1
>Reporter: Sun Rui
>
> Add a new API method called gapplyCollect() for SparkDataFrame. It does 
> gapply on a SparkDataFrame and collects the result back to R. Compared to 
> gapply() + collect(), gapplyCollect() offers a performance optimization as 
> well as programming convenience, since no schema needs to be provided.
> This is similar to dapplyCollect().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16045) Spark 2.0 ML.feature: doc update for stopwords and binarizer

2016-06-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16045:


Assignee: Apache Spark

> Spark 2.0 ML.feature: doc update for stopwords and binarizer
> 
>
> Key: SPARK-16045
> URL: https://issues.apache.org/jira/browse/SPARK-16045
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: yuhao yang
>Assignee: Apache Spark
>Priority: Minor
>
> 2.0 Audit: Update document for StopWordsRemover (load stop words) and 
> Binarizer (support of Vector)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16045) Spark 2.0 ML.feature: doc update for stopwords and binarizer

2016-06-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16045:


Assignee: (was: Apache Spark)

> Spark 2.0 ML.feature: doc update for stopwords and binarizer
> 
>
> Key: SPARK-16045
> URL: https://issues.apache.org/jira/browse/SPARK-16045
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: yuhao yang
>Priority: Minor
>
> 2.0 Audit: Update document for StopWordsRemover (load stop words) and 
> Binarizer (support of Vector)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16045) Spark 2.0 ML.feature: doc update for stopwords and binarizer

2016-06-18 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15337625#comment-15337625
 ] 

Apache Spark commented on SPARK-16045:
--

User 'hhbyyh' has created a pull request for this issue:
https://github.com/apache/spark/pull/13375

> Spark 2.0 ML.feature: doc update for stopwords and binarizer
> 
>
> Key: SPARK-16045
> URL: https://issues.apache.org/jira/browse/SPARK-16045
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: yuhao yang
>Priority: Minor
>
> 2.0 Audit: Update document for StopWordsRemover (load stop words) and 
> Binarizer (support of Vector)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16045) Spark 2.0 ML.feature: doc update for stopwords and binarizer

2016-06-18 Thread yuhao yang (JIRA)
yuhao yang created SPARK-16045:
--

 Summary: Spark 2.0 ML.feature: doc update for stopwords and 
binarizer
 Key: SPARK-16045
 URL: https://issues.apache.org/jira/browse/SPARK-16045
 Project: Spark
  Issue Type: Improvement
  Components: ML
Reporter: yuhao yang
Priority: Minor


2.0 Audit: Update document for StopWordsRemover (load stop words) and Binarizer 
(support of Vector)
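
A rough sketch of the two behaviours the updated documentation would need to 
show (assuming the spark-shell {{spark}} session; column names and data are 
placeholders):

{code}
import org.apache.spark.ml.feature.{Binarizer, StopWordsRemover}
import org.apache.spark.ml.linalg.Vectors

// StopWordsRemover: load one of the bundled stop-word lists explicitly.
val english = StopWordsRemover.loadDefaultStopWords("english")
val remover = new StopWordsRemover()
  .setInputCol("raw")
  .setOutputCol("filtered")
  .setStopWords(english)

// Binarizer: in 2.0 the input column may be a Vector as well as a Double.
val df = spark.createDataFrame(Seq(
  (0, Vectors.dense(0.1, 0.8, 0.2))
)).toDF("id", "features")
val binarizer = new Binarizer()
  .setInputCol("features")
  .setOutputCol("binarized")
  .setThreshold(0.5)
binarizer.transform(df).show(false)
{code}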




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16044) input_file_name() returns empty strings in data sources based on NewHadoopRDD.

2016-06-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16044:


Assignee: Apache Spark

> input_file_name() returns empty strings in data sources based on NewHadoopRDD.
> --
>
> Key: SPARK-16044
> URL: https://issues.apache.org/jira/browse/SPARK-16044
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>
> The issue is that the {{input_file_name()}} function does not return file 
> paths when data sources use {{NewHadoopRDD}}. This is currently only 
> supported for {{FileScanRDD}} and {{HadoopRDD}}.
> To be clear, this does not affect Spark's internal data sources, because none 
> of them currently use {{NewHadoopRDD}}.
> However, several external data sources do. For example:
> spark-redshift - 
> [here|https://github.com/databricks/spark-redshift/blob/cba5eee1ab79ae8f0fa9e668373a54d2b5babf6b/src/main/scala/com/databricks/spark/redshift/RedshiftRelation.scala#L149]
> spark-xml - 
> [here|https://github.com/databricks/spark-xml/blob/master/src/main/scala/com/databricks/spark/xml/util/XmlFile.scala#L39-L47]
> Currently, using this function shows the output below:
> {code}
> +-----------------+
> |input_file_name()|
> +-----------------+
> |                 |
> |                 |
> |                 |
> |                 |
> |                 |
> |                 |
> |                 |
> |                 |
> |                 |
> |                 |
> |                 |
> +-----------------+
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16044) input_file_name() returns empty strings in data sources based on NewHadoopRDD.

2016-06-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16044:


Assignee: (was: Apache Spark)

> input_file_name() returns empty strings in data sources based on NewHadoopRDD.
> --
>
> Key: SPARK-16044
> URL: https://issues.apache.org/jira/browse/SPARK-16044
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Hyukjin Kwon
>
> The issue is that the {{input_file_name()}} function does not return file 
> paths when data sources use {{NewHadoopRDD}}. This is currently only 
> supported for {{FileScanRDD}} and {{HadoopRDD}}.
> To be clear, this does not affect Spark's internal data sources, because none 
> of them currently use {{NewHadoopRDD}}.
> However, several external data sources do. For example:
> spark-redshift - 
> [here|https://github.com/databricks/spark-redshift/blob/cba5eee1ab79ae8f0fa9e668373a54d2b5babf6b/src/main/scala/com/databricks/spark/redshift/RedshiftRelation.scala#L149]
> spark-xml - 
> [here|https://github.com/databricks/spark-xml/blob/master/src/main/scala/com/databricks/spark/xml/util/XmlFile.scala#L39-L47]
> Currently, using this function shows the output below:
> {code}
> +-----------------+
> |input_file_name()|
> +-----------------+
> |                 |
> |                 |
> |                 |
> |                 |
> |                 |
> |                 |
> |                 |
> |                 |
> |                 |
> |                 |
> |                 |
> +-----------------+
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16044) input_file_name() returns empty strings in data sources based on NewHadoopRDD.

2016-06-18 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15337624#comment-15337624
 ] 

Apache Spark commented on SPARK-16044:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/13759

> input_file_name() returns empty strings in data sources based on NewHadoopRDD.
> --
>
> Key: SPARK-16044
> URL: https://issues.apache.org/jira/browse/SPARK-16044
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Hyukjin Kwon
>
> The issue is that the {{input_file_name()}} function does not return file 
> paths when data sources use {{NewHadoopRDD}}. This is currently only 
> supported for {{FileScanRDD}} and {{HadoopRDD}}.
> To be clear, this does not affect Spark's internal data sources, because none 
> of them currently use {{NewHadoopRDD}}.
> However, several external data sources do. For example:
> spark-redshift - 
> [here|https://github.com/databricks/spark-redshift/blob/cba5eee1ab79ae8f0fa9e668373a54d2b5babf6b/src/main/scala/com/databricks/spark/redshift/RedshiftRelation.scala#L149]
> spark-xml - 
> [here|https://github.com/databricks/spark-xml/blob/master/src/main/scala/com/databricks/spark/xml/util/XmlFile.scala#L39-L47]
> Currently, using this function shows the output below:
> {code}
> +-----------------+
> |input_file_name()|
> +-----------------+
> |                 |
> |                 |
> |                 |
> |                 |
> |                 |
> |                 |
> |                 |
> |                 |
> |                 |
> |                 |
> |                 |
> +-----------------+
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16043) Prepare GenericArrayData implementation specialized for a primitive array

2016-06-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16043:


Assignee: (was: Apache Spark)

> Prepare GenericArrayData implementation specialized for a primitive array
> -
>
> Key: SPARK-16043
> URL: https://issues.apache.org/jira/browse/SPARK-16043
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Kazuaki Ishizaki
>
> There is a TODO in the GenericArrayData class to eliminate boxing/unboxing 
> for a primitive array (described 
> [here|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala#L31]).
> It would be good to prepare a GenericArrayData implementation specialized for 
> primitive arrays, to eliminate boxing/unboxing in terms of both runtime 
> memory footprint and performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16044) input_file_name() returns empty strings in data sources based on NewHadoopRDD.

2016-06-18 Thread Hyukjin Kwon (JIRA)
Hyukjin Kwon created SPARK-16044:


 Summary: input_file_name() returns empty strings in data sources 
based on NewHadoopRDD.
 Key: SPARK-16044
 URL: https://issues.apache.org/jira/browse/SPARK-16044
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Hyukjin Kwon


The issue is that the {{input_file_name()}} function does not return file paths 
when data sources use {{NewHadoopRDD}}. This is currently only supported for 
{{FileScanRDD}} and {{HadoopRDD}}.

To be clear, this does not affect Spark's internal data sources, because none 
of them currently use {{NewHadoopRDD}}.

However, several external data sources do. For example:
 
spark-redshift - 
[here|https://github.com/databricks/spark-redshift/blob/cba5eee1ab79ae8f0fa9e668373a54d2b5babf6b/src/main/scala/com/databricks/spark/redshift/RedshiftRelation.scala#L149]
spark-xml - 
[here|https://github.com/databricks/spark-xml/blob/master/src/main/scala/com/databricks/spark/xml/util/XmlFile.scala#L39-L47]

Currently, using this function shows the output below:

{code}
+-----------------+
|input_file_name()|
+-----------------+
|                 |
|                 |
|                 |
|                 |
|                 |
|                 |
|                 |
|                 |
|                 |
|                 |
|                 |
+-----------------+
{code}
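
For comparison, a sketch of the behaviour with a built-in file source (the 
path below is a placeholder):

{code}
// input_file_name() is populated for FileScanRDD/HadoopRDD-based sources,
// but comes back as empty strings for NewHadoopRDD-based external sources.
import org.apache.spark.sql.functions.input_file_name

val df = spark.read.text("/tmp/some-text-files")   // placeholder path
df.select(input_file_name()).show(false)           // shows a file path per row here
{code}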



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16043) Prepare GenericArrayData implementation specialized for a primitive array

2016-06-18 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15337617#comment-15337617
 ] 

Apache Spark commented on SPARK-16043:
--

User 'kiszk' has created a pull request for this issue:
https://github.com/apache/spark/pull/13758

> Prepare GenericArrayData implementation specialized for a primitive array
> -
>
> Key: SPARK-16043
> URL: https://issues.apache.org/jira/browse/SPARK-16043
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Kazuaki Ishizaki
>
> There is a TODO in the GenericArrayData class to eliminate boxing/unboxing 
> for a primitive array (described 
> [here|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala#L31]).
> It would be good to prepare a GenericArrayData implementation specialized for 
> primitive arrays, to eliminate boxing/unboxing in terms of both runtime 
> memory footprint and performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16043) Prepare GenericArrayData implementation specialized for a primitive array

2016-06-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16043:


Assignee: Apache Spark

> Prepare GenericArrayData implementation specialized for a primitive array
> -
>
> Key: SPARK-16043
> URL: https://issues.apache.org/jira/browse/SPARK-16043
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Kazuaki Ishizaki
>Assignee: Apache Spark
>
> There is a TODO in the GenericArrayData class to eliminate boxing/unboxing 
> for a primitive array (described 
> [here|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala#L31]).
> It would be good to prepare a GenericArrayData implementation specialized for 
> primitive arrays, to eliminate boxing/unboxing in terms of both runtime 
> memory footprint and performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16022) Input size is different when I use 1 or 3 nodes but the shuffle size remains +- equal, do you know why?

2016-06-18 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15337615#comment-15337615
 ] 

Sean Owen commented on SPARK-16022:
---

The u...@spark.apache.org mailing list http://spark.apache.org/community.html

> Input size is different when I use 1 or 3 nodes but the shuffle size remains 
> +- equal, do you know why?
> --
>
> Key: SPARK-16022
> URL: https://issues.apache.org/jira/browse/SPARK-16022
> Project: Spark
>  Issue Type: Test
>Reporter: jon
>
> I run some queries on Spark with just one node and then with 3 nodes, and in 
> the Spark :4040 UI I see something that I don't understand.
> For example, after executing a query with 3 nodes and checking the results in 
> the Spark UI, the "Input" tab shows 2.8 GB, so Spark read 2.8 GB from Hadoop. 
> The same query with just one node in local mode shows 7.3 GB, so Spark read 
> 7.3 GB from Hadoop. Shouldn't these values be equal?
> The shuffle size stays roughly the same with one node vs 3, so why doesn't 
> the input size stay the same? The same amount of data must be read from HDFS, 
> so I don't understand.
> Do you know?
> Single node:
> Input: 7.3 GB
> Shuffle read: 208.1 kb
> Shuffle write: 208.1 kb
> 3 nodes:
> Input: 2.8 GB
> Shuffle read: 193.3 kb
> Shuffle write: 208.1 kb



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16040) spark.mllib PIC document extra line of reference

2016-06-18 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-16040:
--
Priority: Trivial  (was: Minor)

OK, this does not need a JIRA

> spark.mllib PIC document extra line of reference
> 
>
> Key: SPARK-16040
> URL: https://issues.apache.org/jira/browse/SPARK-16040
> Project: Spark
>  Issue Type: Documentation
>Reporter: Miao Wang
>Priority: Trivial
>
> In the 2.0 document, the line "A full example that produces the experiment 
> described in the PIC paper can be found under examples/." is redundant. 
> There is already "Find full example code at 
> "examples/src/main/scala/org/apache/spark/examples/mllib/PowerIterationClusteringExample.scala"
>  in the Spark repo.".
> We should remove the first line, which would be consistent with the other 
> documents.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16043) Prepare GenericArrayData implementation specialized for a primitive array

2016-06-18 Thread Kazuaki Ishizaki (JIRA)
Kazuaki Ishizaki created SPARK-16043:


 Summary: Prepare GenericArrayData implementation specialized for a 
primitive array
 Key: SPARK-16043
 URL: https://issues.apache.org/jira/browse/SPARK-16043
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Kazuaki Ishizaki


There is a TODO in the GenericArrayData class to eliminate boxing/unboxing for 
a primitive array (described 
[here|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala#L31]).

It would be good to prepare a GenericArrayData implementation specialized for 
primitive arrays, to eliminate boxing/unboxing in terms of both runtime memory 
footprint and performance.
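
For illustration only (this is not Spark's {{ArrayData}} API), a sketch of why 
a primitive-specialized holder avoids the boxing:

{code}
// GenericArrayData stores Array[Any], so primitive ints are boxed to java.lang.Integer.
class GenericIntHolder(values: Array[Any]) {
  def getInt(i: Int): Int = values(i).asInstanceOf[Int]   // unboxes on every access
}

// A specialization keeps the primitives unboxed, shrinking the memory footprint
// and avoiding box/unbox work on every read.
class PrimitiveIntHolder(values: Array[Int]) {
  def getInt(i: Int): Int = values(i)                     // direct primitive access
}

val boxed     = new GenericIntHolder(Array[Any](1, 2, 3))
val primitive = new PrimitiveIntHolder(Array(1, 2, 3))
assert(boxed.getInt(1) == primitive.getInt(1))
{code}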





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15973) Fix GroupedData Documentation

2016-06-18 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-15973.
-
   Resolution: Fixed
Fix Version/s: 2.0.0

> Fix GroupedData Documentation
> -
>
> Key: SPARK-15973
> URL: https://issues.apache.org/jira/browse/SPARK-15973
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Vladimir Feinberg
>Priority: Trivial
> Fix For: 2.0.0
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> (1)
> The {{GroupedData.pivot}} documentation uses {{//}} instead of {{#}} for 
> doctest Python comments, which messes up formatting in the documentation as 
> well as the doctests themselves.
> A PR resolving this should probably also resolve the other places this 
> happens in PySpark.
> (2)
> Simple aggregation functions which take column names {{cols}} as varargs 
> arguments show up in the documentation with the argument {{args}}, but their 
> documentation refers to {{cols}}.
> The discrepancy is caused by an annotation, {{df_varargs_api}}, which 
> produces a temporary function with arguments {{args}} instead of {{cols}}, 
> creating the confusing documentation.
> (3)
> The {{pyspark.sql.GroupedData}} object refers to the Java object it wraps as 
> the member variable {{self._jdf}}, which is exactly the same name that 
> {{pyspark.sql.DataFrame}} uses when referring to its object.
> The acronym is incorrect, standing for "Java DataFrame" instead of what 
> should be "Java GroupedData". As such, the name should be changed to 
> {{self._jgd}}; in fact, in the {{DataFrame.groupBy}} implementation, the 
> Java object is referred to as exactly {{jgd}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16025) Document OFF_HEAP storage level in 2.0

2016-06-18 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-16025:
--
Priority: Minor  (was: Major)

> Document OFF_HEAP storage level in 2.0
> --
>
> Key: SPARK-16025
> URL: https://issues.apache.org/jira/browse/SPARK-16025
> Project: Spark
>  Issue Type: Documentation
>Reporter: Eric Liang
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16023) Move InMemoryRelation to its own file

2016-06-18 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-16023:
--
Issue Type: Improvement  (was: Bug)

> Move InMemoryRelation to its own file
> -
>
> Key: SPARK-16023
> URL: https://issues.apache.org/jira/browse/SPARK-16023
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Minor
> Fix For: 2.0.0
>
>
> Just to make InMemoryTableScanExec a little smaller and more readable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-16023) Move InMemoryRelation to its own file

2016-06-18 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-16023.
-
   Resolution: Fixed
Fix Version/s: 2.0.0

> Move InMemoryRelation to its own file
> -
>
> Key: SPARK-16023
> URL: https://issues.apache.org/jira/browse/SPARK-16023
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Minor
> Fix For: 2.0.0
>
>
> Just to make InMemoryTableScanExec a little smaller and more readable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16042) Eliminate nullcheck code at projection for an array type

2016-06-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16042:


Assignee: Apache Spark

> Eliminate nullcheck code at projection for an array type
> 
>
> Key: SPARK-16042
> URL: https://issues.apache.org/jira/browse/SPARK-16042
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Kazuaki Ishizaki
>Assignee: Apache Spark
>
> When we run a Spark program with a projection over an array type, a null 
> check is generated for each call that writes an element of the array. If we 
> know at compilation time that none of the elements can be {{null}}, we can 
> eliminate the null-check code.
> {code}
> val df = sparkContext.parallelize(Seq(1.0, 2.0), 1).toDF("v")
> df.selectExpr("Array(v + 2.2, v + 3.3)").collect
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16042) Eliminate nullcheck code at projection for an array type

2016-06-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16042:


Assignee: (was: Apache Spark)

> Eliminate nullcheck code at projection for an array type
> 
>
> Key: SPARK-16042
> URL: https://issues.apache.org/jira/browse/SPARK-16042
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Kazuaki Ishizaki
>
> When we run a Spark program with a projection over an array type, a null 
> check is generated for each call that writes an element of the array. If we 
> know at compilation time that none of the elements can be {{null}}, we can 
> eliminate the null-check code.
> {code}
> val df = sparkContext.parallelize(Seq(1.0, 2.0), 1).toDF("v")
> df.selectExpr("Array(v + 2.2, v + 3.3)").collect
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16042) Eliminate nullcheck code at projection for an array type

2016-06-18 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15337594#comment-15337594
 ] 

Apache Spark commented on SPARK-16042:
--

User 'kiszk' has created a pull request for this issue:
https://github.com/apache/spark/pull/13757

> Eliminate nullcheck code at projection for an array type
> 
>
> Key: SPARK-16042
> URL: https://issues.apache.org/jira/browse/SPARK-16042
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Kazuaki Ishizaki
>
> When we run a Spark program with a projection over an array type, a null 
> check is generated for each call that writes an element of the array. If we 
> know at compilation time that none of the elements can be {{null}}, we can 
> eliminate the null-check code.
> {code}
> val df = sparkContext.parallelize(Seq(1.0, 2.0), 1).toDF("v")
> df.selectExpr("Array(v + 2.2, v + 3.3)").collect
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16042) Eliminate nullcheck code at projection for an array type

2016-06-18 Thread Kazuaki Ishizaki (JIRA)
Kazuaki Ishizaki created SPARK-16042:


 Summary: Eliminate nullcheck code at projection for an array type
 Key: SPARK-16042
 URL: https://issues.apache.org/jira/browse/SPARK-16042
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Kazuaki Ishizaki


When we run a Spark program with a projection over an array type, a null check 
is generated for each call that writes an element of the array. If we know at 
compilation time that none of the elements can be {{null}}, we can eliminate 
the null-check code.

{code}
val df = sparkContext.parallelize(Seq(1.0, 2.0), 1).toDF("v")
df.selectExpr("Array(v + 2.2, v + 3.3)").collect
{code}
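
A sketch of the schema-level knowledge this would presumably rely on: 
{{ArrayType.containsNull}}. When it is false, the generated writer should not 
need per-element null checks.

{code}
import org.apache.spark.sql.types._

// The array type carries a containsNull flag that records, at analysis time,
// whether any element of the array can be null.
val mayContainNulls   = ArrayType(DoubleType, containsNull = true)
val neverContainsNull = ArrayType(DoubleType, containsNull = false)

println(mayContainNulls.containsNull, neverContainsNull.containsNull)  // (true,false)
{code}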



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org