[jira] [Updated] (SPARK-15535) Remove code for TRUNCATE TABLE ... COLUMN

2016-05-25 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15535:
--
Priority: Minor  (was: Major)

> Remove code for TRUNCATE TABLE ... COLUMN
> -
>
> Key: SPARK-15535
> URL: https://issues.apache.org/jira/browse/SPARK-15535
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Minor
>
> This was never supported in the first place. Also Hive doesn't support it: 
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15534) TRUNCATE TABLE should throw exceptions, not logError

2016-05-25 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15534:
--
Priority: Minor  (was: Major)

> TRUNCATE TABLE should throw exceptions, not logError
> 
>
> Key: SPARK-15534
> URL: https://issues.apache.org/jira/browse/SPARK-15534
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Minor
>
> If the table to truncate doesn't exist, throw an exception!
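The change this issue asks for is a small error-handling pattern: fail fast with an exception instead of logging the error and returning as if the command succeeded. A minimal Python sketch of the pattern; `AnalysisException`, `truncate_table`, and the dict-based catalog are hypothetical stand-ins, not Spark's actual API:

```python
class AnalysisException(Exception):
    """Stand-in for Spark's analysis-time error type."""

def truncate_table(catalog, table_name):
    # Fail fast: raise instead of merely logging, so callers of
    # TRUNCATE TABLE see the failure instead of a silent no-op.
    if table_name not in catalog:
        raise AnalysisException(
            "Table to truncate does not exist: %s" % table_name)
    catalog[table_name].clear()
```

The design point is that a DDL command's caller has no way to react to a `logError`; an exception propagates to the user.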







[jira] [Updated] (SPARK-15536) Disallow TRUNCATE TABLE with external tables

2016-05-25 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15536:
--
Assignee: Suresh Thalamati  (was: Andrew Or)

> Disallow TRUNCATE TABLE with external tables
> 
>
> Key: SPARK-15536
> URL: https://issues.apache.org/jira/browse/SPARK-15536
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Suresh Thalamati
>
> Otherwise we might accidentally delete existing data.
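The guard this issue proposes can be sketched in a few lines. The dict-based table metadata below is hypothetical; Spark's real check would read the catalog's table-type metadata instead:

```python
class AnalysisException(Exception):
    pass

def truncate_table(table):
    # `table` is a hypothetical metadata dict standing in for the
    # catalog's table entry.
    if table["type"] == "EXTERNAL":
        # An external table points at data Spark does not own;
        # truncating it would delete files the user may still need.
        raise AnalysisException(
            "Operation not allowed: TRUNCATE TABLE on external tables")
    table["rows"] = []
```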






[jira] [Created] (SPARK-15536) Disallow TRUNCATE TABLE with external tables

2016-05-25 Thread Andrew Or (JIRA)
Andrew Or created SPARK-15536:
-

 Summary: Disallow TRUNCATE TABLE with external tables
 Key: SPARK-15536
 URL: https://issues.apache.org/jira/browse/SPARK-15536
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Andrew Or
Assignee: Andrew Or


Otherwise we might accidentally delete existing data.






[jira] [Created] (SPARK-15535) Remove code for TRUNCATE TABLE ... COLUMN

2016-05-25 Thread Andrew Or (JIRA)
Andrew Or created SPARK-15535:
-

 Summary: Remove code for TRUNCATE TABLE ... COLUMN
 Key: SPARK-15535
 URL: https://issues.apache.org/jira/browse/SPARK-15535
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Andrew Or
Assignee: Andrew Or


This was never supported in the first place. Also Hive doesn't support it: 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL






[jira] [Created] (SPARK-15534) TRUNCATE TABLE should throw exceptions, not logError

2016-05-25 Thread Andrew Or (JIRA)
Andrew Or created SPARK-15534:
-

 Summary: TRUNCATE TABLE should throw exceptions, not logError
 Key: SPARK-15534
 URL: https://issues.apache.org/jira/browse/SPARK-15534
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Andrew Or
Assignee: Andrew Or


If the table to truncate doesn't exist, throw an exception!






[jira] [Resolved] (SPARK-15345) SparkSession's conf doesn't take effect when there's already an existing SparkContext

2016-05-25 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15345.
---
Resolution: Fixed
  Assignee: Jeff Zhang  (was: Reynold Xin)

> SparkSession's conf doesn't take effect when there's already an existing 
> SparkContext
> -
>
> Key: SPARK-15345
> URL: https://issues.apache.org/jira/browse/SPARK-15345
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Reporter: Piotr Milanowski
>Assignee: Jeff Zhang
>Priority: Blocker
> Fix For: 2.0.0
>
>
> I am working with branch-2.0; Spark is compiled with Hive support (-Phive and 
> -Phive-thriftserver).
> I am trying to access databases using this snippet:
> {code}
> from pyspark.sql import HiveContext
> hc = HiveContext(sc)
> hc.sql("show databases").collect()
> [Row(result='default')]
> {code}
> This means that Spark doesn't find any of the databases specified in the 
> configuration.
> Using the same configuration (i.e. hive-site.xml and core-site.xml) in Spark 
> 1.6 and running the above snippet, I can list the existing databases.
> When run in DEBUG mode, this is what Spark (2.0) prints out:
> {code}
> 16/05/16 12:17:47 INFO SparkSqlParser: Parsing command: show databases
> 16/05/16 12:17:47 DEBUG SimpleAnalyzer: 
> === Result of Batch Resolution ===
> !'Project [unresolveddeserializer(createexternalrow(if (isnull(input[0, 
> string])) null else input[0, string].toString, 
> StructField(result,StringType,false)), result#2) AS #3]   Project 
> [createexternalrow(if (isnull(result#2)) null else result#2.toString, 
> StructField(result,StringType,false)) AS #3]
>  +- LocalRelation [result#2]  
>   
>  +- LocalRelation [result#2]
> 
> 16/05/16 12:17:47 DEBUG ClosureCleaner: +++ Cleaning closure  
> (org.apache.spark.sql.Dataset$$anonfun$53) +++
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + declared fields: 2
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  public static final long 
> org.apache.spark.sql.Dataset$$anonfun$53.serialVersionUID
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  private final 
> org.apache.spark.sql.types.StructType 
> org.apache.spark.sql.Dataset$$anonfun$53.structType$1
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + declared methods: 2
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  public final java.lang.Object 
> org.apache.spark.sql.Dataset$$anonfun$53.apply(java.lang.Object)
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  public final java.lang.Object 
> org.apache.spark.sql.Dataset$$anonfun$53.apply(org.apache.spark.sql.catalyst.InternalRow)
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + inner classes: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + outer classes: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + outer objects: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + populating accessed fields because 
> this is the starting closure
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + fields accessed by starting 
> closure: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + there are no enclosing objects!
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  +++ closure  
> (org.apache.spark.sql.Dataset$$anonfun$53) is now cleaned +++
> 16/05/16 12:17:47 DEBUG ClosureCleaner: +++ Cleaning closure  
> (org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$javaToPython$1)
>  +++
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + declared fields: 1
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  public static final long 
> org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$javaToPython$1.serialVersionUID
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + declared methods: 2
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  public final java.lang.Object 
> org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$javaToPython$1.apply(java.lang.Object)
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  public final 
> org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler 
> org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$javaToPython$1.apply(scala.collection.Iterator)
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + inner classes: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + outer classes: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + outer objects: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + populating accessed fields because 
> this is the starting closure
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + fields accessed by starting 
> closure: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + there are no enclosing objects!
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  +++ closure  
> (org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$javaToPython$1)
>  is now cleaned +++
> 1
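A toy model of the behavior this issue describes: once a context/session already exists, configuration passed to a later get-or-create call can be silently ignored, which would explain why the reporter's Hive settings never took effect. All names below are illustrative, not Spark's actual classes:

```python
class Session:
    """Hypothetical get-or-create singleton, not Spark's API."""
    _active = None

    def __init__(self, conf):
        self.conf = dict(conf)

    @classmethod
    def get_or_create(cls, conf):
        # If a session already exists it is returned unchanged; the conf
        # passed here is silently dropped -- the behavior being reported.
        if cls._active is None:
            cls._active = cls(conf)
        return cls._active
```

The second caller gets the first caller's configuration back, with no warning that its own settings were discarded.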

[jira] [Updated] (SPARK-15345) SparkSession's conf doesn't take effect when there's already an existing SparkContext

2016-05-25 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15345:
--
Assignee: Reynold Xin  (was: Jeff Zhang)

> SparkSession's conf doesn't take effect when there's already an existing 
> SparkContext
> -
>
> Key: SPARK-15345
> URL: https://issues.apache.org/jira/browse/SPARK-15345
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Reporter: Piotr Milanowski
>Assignee: Reynold Xin
>Priority: Blocker
> Fix For: 2.0.0
>
>
> I am working with branch-2.0; Spark is compiled with Hive support (-Phive and 
> -Phive-thriftserver).
> I am trying to access databases using this snippet:
> {code}
> from pyspark.sql import HiveContext
> hc = HiveContext(sc)
> hc.sql("show databases").collect()
> [Row(result='default')]
> {code}
> This means that Spark doesn't find any of the databases specified in the 
> configuration.
> Using the same configuration (i.e. hive-site.xml and core-site.xml) in Spark 
> 1.6 and running the above snippet, I can list the existing databases.
> When run in DEBUG mode, this is what Spark (2.0) prints out:
> {code}
> 16/05/16 12:17:47 INFO SparkSqlParser: Parsing command: show databases
> 16/05/16 12:17:47 DEBUG SimpleAnalyzer: 
> === Result of Batch Resolution ===
> !'Project [unresolveddeserializer(createexternalrow(if (isnull(input[0, 
> string])) null else input[0, string].toString, 
> StructField(result,StringType,false)), result#2) AS #3]   Project 
> [createexternalrow(if (isnull(result#2)) null else result#2.toString, 
> StructField(result,StringType,false)) AS #3]
>  +- LocalRelation [result#2]  
>   
>  +- LocalRelation [result#2]
> 
> 16/05/16 12:17:47 DEBUG ClosureCleaner: +++ Cleaning closure  
> (org.apache.spark.sql.Dataset$$anonfun$53) +++
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + declared fields: 2
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  public static final long 
> org.apache.spark.sql.Dataset$$anonfun$53.serialVersionUID
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  private final 
> org.apache.spark.sql.types.StructType 
> org.apache.spark.sql.Dataset$$anonfun$53.structType$1
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + declared methods: 2
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  public final java.lang.Object 
> org.apache.spark.sql.Dataset$$anonfun$53.apply(java.lang.Object)
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  public final java.lang.Object 
> org.apache.spark.sql.Dataset$$anonfun$53.apply(org.apache.spark.sql.catalyst.InternalRow)
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + inner classes: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + outer classes: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + outer objects: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + populating accessed fields because 
> this is the starting closure
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + fields accessed by starting 
> closure: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + there are no enclosing objects!
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  +++ closure  
> (org.apache.spark.sql.Dataset$$anonfun$53) is now cleaned +++
> 16/05/16 12:17:47 DEBUG ClosureCleaner: +++ Cleaning closure  
> (org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$javaToPython$1)
>  +++
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + declared fields: 1
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  public static final long 
> org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$javaToPython$1.serialVersionUID
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + declared methods: 2
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  public final java.lang.Object 
> org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$javaToPython$1.apply(java.lang.Object)
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  public final 
> org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler 
> org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$javaToPython$1.apply(scala.collection.Iterator)
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + inner classes: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + outer classes: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + outer objects: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + populating accessed fields because 
> this is the starting closure
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + fields accessed by starting 
> closure: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + there are no enclosing objects!
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  +++ closure  
> (org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$javaToPython$1)
>  is now cleaned +++
> 16/05/16 12:17:47 DEBUG Cl

[jira] [Resolved] (SPARK-15511) Dropping data source table succeeds but throws exception

2016-05-24 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15511.
---
Resolution: Not A Problem
  Assignee: Andrew Or

If you run into this issue again, just delete $SPARK_HOME/metastore_db.

> Dropping data source table succeeds but throws exception
> 
>
> Key: SPARK-15511
> URL: https://issues.apache.org/jira/browse/SPARK-15511
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>
> If the catalog is backed by Hive:
> {code}
> scala> sql("CREATE TABLE boxes (width INT, length INT, height INT) USING CSV")
> {code}
> {code}
> scala> sql("DROP TABLE boxes")
> 16/05/24 13:30:50 WARN DropTableCommand: 
> org.apache.spark.sql.AnalysisException: Path does not exist: 
> file:/user/hive/warehouse/boxes;
> com.google.common.util.concurrent.UncheckedExecutionException: 
> org.apache.spark.sql.AnalysisException: Path does not exist: 
> file:/user/hive/warehouse/boxes;
>   at 
> com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4882)
>   at 
> com.google.common.cache.LocalCache$LocalLoadingCache.apply(LocalCache.java:4898)
>   at 
> org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:170)
> ...
> Caused by: org.apache.spark.sql.AnalysisException: Path does not exist: 
> file:/user/hive/warehouse/boxes;
>   at 
> org.apache.spark.sql.execution.datasources.DataSource$$anonfun$12.apply(DataSource.scala:317)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource$$anonfun$12.apply(DataSource.scala:306)
>   at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>   at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at 
> scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
>   at scala.collection.immutable.List.flatMap(List.scala:344)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:306)
>   at 
> org.apache.spark.sql.hive.HiveMetastoreCatalog$$anon$1.load(HiveMetastoreCatalog.scala:133)
>   at 
> org.apache.spark.sql.hive.HiveMetastoreCatalog$$anon$1.load(HiveMetastoreCatalog.scala:69)
> {code}
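The stack trace shows DROP TABLE trying to resolve the relation (and its data path) before dropping it, so a missing warehouse directory surfaces as an exception even though the drop itself succeeds. A sketch of a tolerant drop; the helper names and dict-based catalog are hypothetical, not Spark's internals:

```python
class AnalysisException(Exception):
    pass

def resolve_relation(path, filesystem):
    # Stand-in for the data-source resolution step that raised
    # "Path does not exist" for the missing warehouse directory.
    if path not in filesystem:
        raise AnalysisException("Path does not exist: %s" % path)
    return filesystem[path]

def drop_table(catalog, filesystem, name):
    # Tolerant DROP: the relation lookup may fail if the data path is
    # already gone, but the metadata entry should still be removed.
    try:
        resolve_relation(catalog[name], filesystem)
    except AnalysisException:
        pass  # data already missing; dropping the metadata is still fine
    del catalog[name]
```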






[jira] [Updated] (SPARK-15511) Dropping data source table succeeds but throws exception

2016-05-24 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15511:
--
Description: 
If the catalog is backed by Hive:

{code}
scala> sql("CREATE TABLE boxes (width INT, length INT, height INT) USING CSV")
{code}

{code}
scala> sql("DROP TABLE boxes")
16/05/24 13:30:50 WARN DropTableCommand: 
org.apache.spark.sql.AnalysisException: Path does not exist: 
file:/user/hive/warehouse/boxes;
com.google.common.util.concurrent.UncheckedExecutionException: 
org.apache.spark.sql.AnalysisException: Path does not exist: 
file:/user/hive/warehouse/boxes;
at 
com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4882)
at 
com.google.common.cache.LocalCache$LocalLoadingCache.apply(LocalCache.java:4898)
at 
org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:170)
...
Caused by: org.apache.spark.sql.AnalysisException: Path does not exist: 
file:/user/hive/warehouse/boxes;
at 
org.apache.spark.sql.execution.datasources.DataSource$$anonfun$12.apply(DataSource.scala:317)
at 
org.apache.spark.sql.execution.datasources.DataSource$$anonfun$12.apply(DataSource.scala:306)
at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.List.foreach(List.scala:381)
at 
scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.immutable.List.flatMap(List.scala:344)
at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:306)
at 
org.apache.spark.sql.hive.HiveMetastoreCatalog$$anon$1.load(HiveMetastoreCatalog.scala:133)
at 
org.apache.spark.sql.hive.HiveMetastoreCatalog$$anon$1.load(HiveMetastoreCatalog.scala:69)
{code}

  was:
{code}
scala> sql("CREATE TABLE boxes (width INT, length INT, height INT) USING CSV")
{code}

{code}
scala> sql("DROP TABLE boxes")
16/05/24 13:30:50 WARN DropTableCommand: 
org.apache.spark.sql.AnalysisException: Path does not exist: 
file:/user/hive/warehouse/boxes;
com.google.common.util.concurrent.UncheckedExecutionException: 
org.apache.spark.sql.AnalysisException: Path does not exist: 
file:/user/hive/warehouse/boxes;
at 
com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4882)
at 
com.google.common.cache.LocalCache$LocalLoadingCache.apply(LocalCache.java:4898)
at 
org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:170)
...
Caused by: org.apache.spark.sql.AnalysisException: Path does not exist: 
file:/user/hive/warehouse/boxes;
at 
org.apache.spark.sql.execution.datasources.DataSource$$anonfun$12.apply(DataSource.scala:317)
at 
org.apache.spark.sql.execution.datasources.DataSource$$anonfun$12.apply(DataSource.scala:306)
at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.List.foreach(List.scala:381)
at 
scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.immutable.List.flatMap(List.scala:344)
at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:306)
at 
org.apache.spark.sql.hive.HiveMetastoreCatalog$$anon$1.load(HiveMetastoreCatalog.scala:133)
at 
org.apache.spark.sql.hive.HiveMetastoreCatalog$$anon$1.load(HiveMetastoreCatalog.scala:69)
{code}


> Dropping data source table succeeds but throws exception
> 
>
> Key: SPARK-15511
> URL: https://issues.apache.org/jira/browse/SPARK-15511
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>
> If the catalog is backed by Hive:
> {code}
> scala> sql("CREATE TABLE boxes (width INT, length INT, height INT) USING CSV")
> {code}
> {code}
> scala> sql("DROP TABLE boxes")
> 16/05/24 13:30:50 WARN DropTableCommand: 
> org.apache.spark.sql.AnalysisException: Path does not exist: 
> file:/user/hive/warehouse/boxes;
> com.google.common.util.concurrent.UncheckedExecutionException: 
> org.apache.spark.sql.AnalysisException: Path does not exist: 
> file:/user/hive/warehouse/boxes;
>   at 
> com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4882)
>   at 
> com.google.common.cache.LocalCache$LocalLoadingCache.apply(LocalCache.java:4898)
>   at 
> org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:170)
> 

[jira] [Created] (SPARK-15511) Dropping data source table succeeds but throws exception

2016-05-24 Thread Andrew Or (JIRA)
Andrew Or created SPARK-15511:
-

 Summary: Dropping data source table succeeds but throws exception
 Key: SPARK-15511
 URL: https://issues.apache.org/jira/browse/SPARK-15511
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Andrew Or


{code}
scala> sql("CREATE TABLE boxes (width INT, length INT, height INT) USING CSV")
{code}

{code}
scala> sql("DROP TABLE boxes")
16/05/24 13:30:50 WARN DropTableCommand: 
org.apache.spark.sql.AnalysisException: Path does not exist: 
file:/user/hive/warehouse/boxes;
com.google.common.util.concurrent.UncheckedExecutionException: 
org.apache.spark.sql.AnalysisException: Path does not exist: 
file:/user/hive/warehouse/boxes;
at 
com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4882)
at 
com.google.common.cache.LocalCache$LocalLoadingCache.apply(LocalCache.java:4898)
at 
org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:170)
...
Caused by: org.apache.spark.sql.AnalysisException: Path does not exist: 
file:/user/hive/warehouse/boxes;
at 
org.apache.spark.sql.execution.datasources.DataSource$$anonfun$12.apply(DataSource.scala:317)
at 
org.apache.spark.sql.execution.datasources.DataSource$$anonfun$12.apply(DataSource.scala:306)
at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.List.foreach(List.scala:381)
at 
scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.immutable.List.flatMap(List.scala:344)
at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:306)
at 
org.apache.spark.sql.hive.HiveMetastoreCatalog$$anon$1.load(HiveMetastoreCatalog.scala:133)
at 
org.apache.spark.sql.hive.HiveMetastoreCatalog$$anon$1.load(HiveMetastoreCatalog.scala:69)
{code}






[jira] [Resolved] (SPARK-15388) spark sql "CREATE FUNCTION" throws exception with hive 1.2.1

2016-05-24 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15388.
---
  Resolution: Fixed
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> spark sql "CREATE FUNCTION" throws exception with hive 1.2.1
> 
>
> Key: SPARK-15388
> URL: https://issues.apache.org/jira/browse/SPARK-15388
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Yang Wang
>Assignee: Yang Wang
> Fix For: 2.0.0
>
>
> spark.sql("CREATE FUNCTION MY_FUNCTION_1 AS 
> 'com.haizhi.bdp.udf.UDFGetGeoCode'") throws 
> org.apache.spark.sql.AnalysisException. 
> I was using Hive version 1.2.1.
> Full stack trace is as follows:
>  Exception in thread "main" org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> MetaException(message:NoSuchObjectException(message:Function 
> bdp.GET_GEO_CODE does not exist));
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:71)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog.functionExists(HiveExternalCatalog.scala:323)
>  at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.functionExists(SessionCatalog.scala:712)
>  at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.createFunction(SessionCatalog.scala:663)
>  at 
> org.apache.spark.sql.execution.command.CreateFunction.run(functions.scala:68)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:57)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:55)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:69)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
>  at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>  at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
>  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
>  at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:85)
>  at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:85)
>  at org.apache.spark.sql.Dataset.<init>(Dataset.scala:187)
>  at org.apache.spark.sql.Dataset.<init>(Dataset.scala:168)
>  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:63)
>  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:541)
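The trace shows `functionExists` propagating the metastore's NoSuchObjectException out of CREATE FUNCTION instead of treating it as "the function does not exist". A sketch of translating the exception into a boolean; `MetastoreClient` and the method names are hypothetical stand-ins for the Hive client:

```python
class NoSuchObjectException(Exception):
    pass

class MetastoreClient:
    """Hypothetical stand-in for the Hive metastore client, which
    signals a missing function with an exception rather than a null."""
    def __init__(self, functions):
        self._functions = set(functions)

    def get_function(self, name):
        if name not in self._functions:
            raise NoSuchObjectException(name)
        return name

def function_exists(client, name):
    # The fix: turn the metastore's "no such object" exception into
    # False instead of letting it escape from CREATE FUNCTION.
    try:
        client.get_function(name)
        return True
    except NoSuchObjectException:
        return False
```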






[jira] [Updated] (SPARK-15388) spark sql "CREATE FUNCTION" throws exception with hive 1.2.1

2016-05-24 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15388:
--
Assignee: Yang Wang

> spark sql "CREATE FUNCTION" throws exception with hive 1.2.1
> 
>
> Key: SPARK-15388
> URL: https://issues.apache.org/jira/browse/SPARK-15388
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Yang Wang
>Assignee: Yang Wang
> Fix For: 2.0.0
>
>
> spark.sql("CREATE FUNCTION MY_FUNCTION_1 AS 
> 'com.haizhi.bdp.udf.UDFGetGeoCode'") throws 
> org.apache.spark.sql.AnalysisException. 
> I was using Hive version 1.2.1.
> Full stack trace is as follows:
>  Exception in thread "main" org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> MetaException(message:NoSuchObjectException(message:Function 
> bdp.GET_GEO_CODE does not exist));
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:71)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog.functionExists(HiveExternalCatalog.scala:323)
>  at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.functionExists(SessionCatalog.scala:712)
>  at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.createFunction(SessionCatalog.scala:663)
>  at 
> org.apache.spark.sql.execution.command.CreateFunction.run(functions.scala:68)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:57)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:55)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:69)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
>  at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>  at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
>  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
>  at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:85)
>  at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:85)
>  at org.apache.spark.sql.Dataset.<init>(Dataset.scala:187)
>  at org.apache.spark.sql.Dataset.<init>(Dataset.scala:168)
>  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:63)
>  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:541)






[jira] [Commented] (SPARK-15450) Clean up SparkSession builder for python

2016-05-24 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297800#comment-15297800
 ] 

Andrew Or commented on SPARK-15450:
---

Actually I already have some ideas for this one.

> Clean up SparkSession builder for python
> 
>
> Key: SPARK-15450
> URL: https://issues.apache.org/jira/browse/SPARK-15450
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>
> This is the sister JIRA for SPARK-15075. Today we use 
> `SQLContext.getOrCreate` in our builder. Instead we should just have a real 
> `SparkSession.getOrCreate` and use that in our builder.
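The proposed shape can be sketched as a builder that delegates to a session-level get-or-create rather than reaching down to `SQLContext.getOrCreate`. Class and method names below are illustrative only:

```python
class SessionSketch:
    """Toy session with a builder, modeling the proposed delegation."""
    _active = None

    class Builder:
        def __init__(self):
            self._options = {}

        def config(self, key, value):
            self._options[key] = value
            return self

        def get_or_create(self):
            # Delegate to the session-level getOrCreate instead of a
            # SQLContext-level one, as the issue proposes.
            return SessionSketch.get_or_create(self._options)

    def __init__(self, options):
        self.conf = dict(options)

    @classmethod
    def get_or_create(cls, options):
        if cls._active is None:
            cls._active = cls(options)
        return cls._active
```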






[jira] [Resolved] (SPARK-15464) Replace SQLContext and SparkContext with SparkSession using builder pattern in python testsuites

2016-05-23 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15464.
---
  Resolution: Fixed
Assignee: Weichen Xu
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Replace SQLContext and SparkContext with SparkSession using builder pattern 
> in python testsuites
> 
>
> Key: SPARK-15464
> URL: https://issues.apache.org/jira/browse/SPARK-15464
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib, SQL, Tests
>Affects Versions: 2.0.0
>Reporter: Weichen Xu
>Assignee: Weichen Xu
>Priority: Minor
>  Labels: test
> Fix For: 2.0.0
>
>
> In the Python scripts, several places still use SQLContext and create a 
> SparkContext directly; these should be replaced with SparkSession using 
> SparkSession.builder.






[jira] [Resolved] (SPARK-15311) Disallow DML on Non-temporary Tables when Using In-Memory Catalog

2016-05-23 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15311.
---
  Resolution: Fixed
Assignee: Xiao Li
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Disallow DML on Non-temporary Tables when Using In-Memory Catalog
> -
>
> Key: SPARK-15311
> URL: https://issues.apache.org/jira/browse/SPARK-15311
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>Assignee: Xiao Li
> Fix For: 2.0.0
>
>
> So far, when using In-Memory Catalog, we allow DDL operations for 
> non-temporary tables. However, the corresponding DML operations are not 
> supported. Thus, we need to issue exceptions in this case.
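The fix direction can be sketched as follows; this is a hypothetical simplification, not Spark's actual InMemoryCatalog, but it shows the fail-fast behavior the issue asks for:

```python
class InMemoryCatalogSketch:
    """Hypothetical simplification; not Spark's actual InMemoryCatalog."""

    def create_table(self, name):
        # DDL on non-temporary tables is supported.
        return "created %s" % name

    def load_table(self, name):
        # DML is not supported, so fail fast instead of silently no-op'ing.
        raise NotImplementedError(
            "loadTable is not supported with the in-memory catalog")

catalog = InMemoryCatalogSketch()
catalog.create_table("src")
try:
    catalog.load_table("src")
except NotImplementedError:
    pass  # the exception the issue asks for
```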






[jira] [Resolved] (SPARK-15488) Possible Accumulator bug causing OneVsRestSuite to be flaky

2016-05-23 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15488.
---
  Resolution: Fixed
Assignee: Liang-Chi Hsieh
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Possible Accumulator bug causing OneVsRestSuite to be flaky
> ---
>
> Key: SPARK-15488
> URL: https://issues.apache.org/jira/browse/SPARK-15488
> Project: Spark
>  Issue Type: Bug
>  Components: ML, Spark Core
>Affects Versions: 2.0.0
> Environment: Jenkins: branch-2.0, maven build, Hadoop 2.6
>Reporter: Joseph K. Bradley
>Assignee: Liang-Chi Hsieh
> Fix For: 2.0.0
>
>
> OneVsRestSuite has been slightly flaky recently.  The failure happens in the 
> use of {{Range.par}}, which executes concurrent jobs which use the same 
> DataFrame.  This sometimes causes failures from 
> {{java.util.ConcurrentModificationException}}.
> It appears the failure is from {{InMemoryRelation.batchStats}} being 
> accessed.  Since that is an instance of {{Accumulable}}, I'm guessing the bug 
> is from recent Accumulator changes.
> Stack trace from this test run.
> * links: [https://spark-tests.appspot.com/test-logs/125719479] and 
> [https://spark-tests.appspot.com/builds/spark-master-test-maven-hadoop-2.6/993]
> {code}
>   java.util.ConcurrentModificationException
>   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
>   at java.util.ArrayList$Itr.next(ArrayList.java:851)
>   at 
> java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1042)
>   at 
> scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at 
> org.apache.spark.sql.execution.columnar.InMemoryRelation.computeSizeInBytes(InMemoryTableScanExec.scala:90)
>   at 
> org.apache.spark.sql.execution.columnar.InMemoryRelation.statistics(InMemoryTableScanExec.scala:113)
>   at 
> org.apache.spark.sql.execution.columnar.InMemoryRelation.statisticsToBePropagated(InMemoryTableScanExec.scala:97)
>   at 
> org.apache.spark.sql.execution.columnar.InMemoryRelation.withOutput(InMemoryTableScanExec.scala:191)
>   at 
> org.apache.spark.sql.execution.CacheManager$$anonfun$useCachedData$1$$anonfun$applyOrElse$1.apply(CacheManager.scala:144)
>   at 
> org.apache.spark.sql.execution.CacheManager$$anonfun$useCachedData$1$$anonfun$applyOrElse$1.apply(CacheManager.scala:144)
>   at scala.Option.map(Option.scala:146)
>   at 
> org.apache.spark.sql.execution.CacheManager$$anonfun$useCachedData$1.applyOrElse(CacheManager.scala:144)
>   at 
> org.apache.spark.sql.execution.CacheManager$$anonfun$useCachedData$1.applyOrElse(CacheManager.scala:141)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:265)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:265)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:68)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:264)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:270)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:270)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:307)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>   at 
> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
>   at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
>   at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
>   at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
>   at scala.collection.AbstractIterator.to(Iterator.scala:1336)
>   at 
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
>   at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
>   at 
> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
>   at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
> 

[jira] [Resolved] (SPARK-15279) Disallow ROW FORMAT and STORED AS (parquet | orc | avro etc.)

2016-05-23 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15279.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> Disallow ROW FORMAT and STORED AS (parquet | orc | avro etc.)
> -
>
> Key: SPARK-15279
> URL: https://issues.apache.org/jira/browse/SPARK-15279
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
> Fix For: 2.0.0
>
>
> They are both potentially conflicting ways that allow you to specify the 
> SerDe. Unfortunately, we can't just get rid of ROW FORMAT because it may be 
> used with TEXTFILE or RCFILE. For other file formats, we should fail fast 
> wherever possible.






[jira] [Updated] (SPARK-15397) 'locate' UDF got different result with boundary value case compared to Hive engine

2016-05-23 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15397:
--
Summary: 'locate' UDF got different result with boundary value case 
compared to Hive engine  (was: [Spark][SQL] 'locate' UDF got different result 
with boundary value case compared to Hive engine)

> 'locate' UDF got different result with boundary value case compared to Hive 
> engine
> --
>
> Key: SPARK-15397
> URL: https://issues.apache.org/jira/browse/SPARK-15397
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0, 1.6.1, 2.0.0
>Reporter: Yi Zhou
>
> Spark SQL:
> select locate("abc", "abc", 1);
> 0
> Hive:
> select locate("abc", "abc", 1);
> 1
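For reference, Hive's 1-based locate semantics can be sketched in Python. This is a hypothetical helper for comparison, not either engine's actual implementation:

```python
def locate(substr, s, pos=1):
    # Reference semantics, 1-based as in Hive: position of the first
    # occurrence of substr in s at or after pos, or 0 if absent.
    idx = s.find(substr, pos - 1)
    return idx + 1 if idx >= 0 else 0

assert locate("abc", "abc", 1) == 1  # Hive's answer; Spark returned 0 here
assert locate("b", "abcb", 3) == 4
assert locate("z", "abc", 1) == 0
```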






[jira] [Updated] (SPARK-15397) 'locate' UDF got different result with boundary value case compared to Hive engine

2016-05-23 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15397:
--
Assignee: Adrian Wang

> 'locate' UDF got different result with boundary value case compared to Hive 
> engine
> --
>
> Key: SPARK-15397
> URL: https://issues.apache.org/jira/browse/SPARK-15397
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0, 1.6.1, 2.0.0
>Reporter: Yi Zhou
>Assignee: Adrian Wang
>
> Spark SQL:
> select locate("abc", "abc", 1);
> 0
> Hive:
> select locate("abc", "abc", 1);
> 1






[jira] [Resolved] (SPARK-15477) HiveContext is private[hive] and not accessible to users.

2016-05-23 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15477.
---
Resolution: Not A Bug

> HiveContext is private[hive] and not accessible to users. 
> --
>
> Key: SPARK-15477
> URL: https://issues.apache.org/jira/browse/SPARK-15477
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Doug Balog
>
> In 2.0, org.apache.spark.sql.hive.HiveContext was marked deprecated but should 
> still be accessible from user programs. It is not, since it is marked 
> `private[hive]`.






[jira] [Commented] (SPARK-15477) HiveContext is private[hive] and not accessible to users.

2016-05-23 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15296763#comment-15296763
 ] 

Andrew Or commented on SPARK-15477:
---

What part of the code makes you think it's private[hive]?
https://github.com/apache/spark/blob/branch-2.0/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala

> HiveContext is private[hive] and not accessible to users. 
> --
>
> Key: SPARK-15477
> URL: https://issues.apache.org/jira/browse/SPARK-15477
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Doug Balog
>
> In 2.0, org.apache.spark.sql.hive.HiveContext was marked deprecated but should 
> still be accessible from user programs. It is not, since it is marked 
> `private[hive]`.






[jira] [Resolved] (SPARK-15456) PySpark Shell fails to create SparkContext if HiveConf not found

2016-05-20 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15456.
---
  Resolution: Fixed
Assignee: Bryan Cutler
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> PySpark Shell fails to create SparkContext if HiveConf not found
> 
>
> Key: SPARK-15456
> URL: https://issues.apache.org/jira/browse/SPARK-15456
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
> Fix For: 2.0.0
>
>
> When starting the PySpark shell, if HiveConf is not available the shell falls 
> back to creating a SparkSession from a SparkContext. This is attempted with 
> the variable {{sc}}, which hasn't been initialized yet.
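The shape of the bug and its fix can be sketched as follows. This is a hypothetical simplification of the shell-startup logic, not the actual pyspark/shell.py code:

```python
def build_session(hive_conf_available, create_context, create_session):
    # The bug was that the fallback branch referenced `sc` before any
    # SparkContext had been created; the fix is to create it first.
    if hive_conf_available:
        return create_session(hive=True, context=None)
    sc = create_context()                             # create the context...
    return create_session(hive=False, context=sc)     # ...then wrap it

session = build_session(
    hive_conf_available=False,
    create_context=lambda: "sc",
    create_session=lambda hive, context: (hive, context),
)
assert session == (False, "sc")
```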






[jira] [Created] (SPARK-15450) Clean up SparkSession builder for python

2016-05-20 Thread Andrew Or (JIRA)
Andrew Or created SPARK-15450:
-

 Summary: Clean up SparkSession builder for python
 Key: SPARK-15450
 URL: https://issues.apache.org/jira/browse/SPARK-15450
 Project: Spark
  Issue Type: Bug
  Components: PySpark, SQL
Affects Versions: 2.0.0
Reporter: Andrew Or
Assignee: Andrew Or


This is the sister JIRA for SPARK-15075. Today we use `SQLContext.getOrCreate` 
in our builder. Instead we should just have a real `SparkSession.getOrCreate` 
and use that in our builder.






[jira] [Commented] (SPARK-15345) SparkSession's conf doesn't take effect when there's already an existing SparkContext

2016-05-20 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293899#comment-15293899
 ] 

Andrew Or commented on SPARK-15345:
---

The python part should be resolved by SPARK-15417, 
https://github.com/apache/spark/pull/13203

> SparkSession's conf doesn't take effect when there's already an existing 
> SparkContext
> -
>
> Key: SPARK-15345
> URL: https://issues.apache.org/jira/browse/SPARK-15345
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 2.0.0
>Reporter: Piotr Milanowski
>Assignee: Reynold Xin
>Priority: Blocker
> Fix For: 2.0.0
>
>
> I am working with branch-2.0; Spark is compiled with Hive support (-Phive and 
> -Phive-thriftserver).
> I am trying to access databases using this snippet:
> {code}
> from pyspark.sql import HiveContext
> hc = HiveContext(sc)
> hc.sql("show databases").collect()
> [Row(result='default')]
> {code}
> This means that Spark doesn't find any of the databases specified in the 
> configuration. Using the same configuration (i.e. hive-site.xml and 
> core-site.xml) in Spark 1.6 and launching the above snippet, I can print out 
> the existing databases.
> When run in DEBUG mode this is what spark (2.0) prints out:
> {code}
> 16/05/16 12:17:47 INFO SparkSqlParser: Parsing command: show databases
> 16/05/16 12:17:47 DEBUG SimpleAnalyzer: 
> === Result of Batch Resolution ===
> !'Project [unresolveddeserializer(createexternalrow(if (isnull(input[0, 
> string])) null else input[0, string].toString, 
> StructField(result,StringType,false)), result#2) AS #3]   Project 
> [createexternalrow(if (isnull(result#2)) null else result#2.toString, 
> StructField(result,StringType,false)) AS #3]
>  +- LocalRelation [result#2]  
>   
>  +- LocalRelation [result#2]
> 
> 16/05/16 12:17:47 DEBUG ClosureCleaner: +++ Cleaning closure  
> (org.apache.spark.sql.Dataset$$anonfun$53) +++
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + declared fields: 2
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  public static final long 
> org.apache.spark.sql.Dataset$$anonfun$53.serialVersionUID
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  private final 
> org.apache.spark.sql.types.StructType 
> org.apache.spark.sql.Dataset$$anonfun$53.structType$1
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + declared methods: 2
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  public final java.lang.Object 
> org.apache.spark.sql.Dataset$$anonfun$53.apply(java.lang.Object)
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  public final java.lang.Object 
> org.apache.spark.sql.Dataset$$anonfun$53.apply(org.apache.spark.sql.catalyst.InternalRow)
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + inner classes: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + outer classes: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + outer objects: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + populating accessed fields because 
> this is the starting closure
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + fields accessed by starting 
> closure: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + there are no enclosing objects!
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  +++ closure  
> (org.apache.spark.sql.Dataset$$anonfun$53) is now cleaned +++
> 16/05/16 12:17:47 DEBUG ClosureCleaner: +++ Cleaning closure  
> (org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$javaToPython$1)
>  +++
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + declared fields: 1
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  public static final long 
> org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$javaToPython$1.serialVersionUID
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + declared methods: 2
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  public final java.lang.Object 
> org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$javaToPython$1.apply(java.lang.Object)
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  public final 
> org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler 
> org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$javaToPython$1.apply(scala.collection.Iterator)
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + inner classes: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + outer classes: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + outer objects: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + populating accessed fields because 
> this is the starting closure
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + fields accessed by starting 
> closure: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + there are no enclosing objects!
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  +++ clos

[jira] [Updated] (SPARK-15417) Failed to enable hive support in PySpark shell

2016-05-20 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15417:
--
Summary: Failed to enable hive support in PySpark shell  (was: Failed to 
enable HiveSupport in PySpark)

> Failed to enable hive support in PySpark shell
> --
>
> Key: SPARK-15417
> URL: https://issues.apache.org/jira/browse/SPARK-15417
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>Assignee: Andrew Or
>Priority: Blocker
> Fix For: 2.0.0
>
>
> Unable to use the Hive metastore in the PySpark shell. Tried both HiveContext 
> and SparkSession; both failed, always falling back to the in-memory catalog.
> Method 1: Using SparkSession
> {noformat}
> >>> from pyspark.sql import SparkSession
> >>> spark = SparkSession.builder.enableHiveSupport().getOrCreate()
> >>> spark.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
> DataFrame[]
> >>> spark.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' 
> >>> INTO TABLE src")
> Traceback (most recent call last):
>   File "", line 1, in 
>   File 
> "/Users/xiaoli/IdeaProjects/sparkDelivery/python/pyspark/sql/session.py", 
> line 494, in sql
> return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
>   File 
> "/Users/xiaoli/IdeaProjects/sparkDelivery/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py",
>  line 933, in __call__
>   File 
> "/Users/xiaoli/IdeaProjects/sparkDelivery/python/pyspark/sql/utils.py", line 
> 57, in deco
> return f(*a, **kw)
>   File 
> "/Users/xiaoli/IdeaProjects/sparkDelivery/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py",
>  line 312, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o21.sql.
> : java.lang.UnsupportedOperationException: loadTable is not implemented
> at 
> org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.loadTable(InMemoryCatalog.scala:297)
> at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.loadTable(SessionCatalog.scala:280)
> at org.apache.spark.sql.execution.command.LoadData.run(tables.scala:263)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:57)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:55)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:69)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:85)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:85)
> at org.apache.spark.sql.Dataset.(Dataset.scala:187)
> at org.apache.spark.sql.Dataset.(Dataset.scala:168)
> at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:63)
> at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:541)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
> at py4j.Gateway.invoke(Gateway.java:280)
> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)
> at py4j.commands.CallCommand.execute(CallCommand.java:79)
> at py4j.GatewayConnection.run(GatewayConnection.java:211)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Method 2: Using HiveContext: 
> {noformat}
> >>> from pyspark.sql import HiveContext
> >>> sqlContext = HiveContext(sc)
> >>> sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
> DataFrame[]
> >>> sqlContext.sql("LOAD DATA LOCAL INPATH 
> >>> 'examples/src/main/resources/kv1.txt' INTO TABLE src")
> Traceback (most recent call last):
>   File "", line 1, in 
>   File 
> "/Users/xiaoli/IdeaProjects/sparkDelivery/python/pyspark/sql/context.py", 
> line 346, in sql
> return self.sparkSession.sql(sqlQuery)
>   File 
> "/Users/xiaoli/IdeaProjects/sparkDelivery/python/pyspark/sql/session.py", 
> line 494, in sql
> return DataFrame(self._jsparkSession.sql(sqlQuery), self._w

[jira] [Resolved] (SPARK-15417) Failed to enable HiveSupport in PySpark

2016-05-19 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15417.
---
  Resolution: Fixed
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Failed to enable HiveSupport in PySpark
> ---
>
> Key: SPARK-15417
> URL: https://issues.apache.org/jira/browse/SPARK-15417
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>Assignee: Andrew Or
>Priority: Blocker
> Fix For: 2.0.0
>
>
> Unable to use the Hive metastore in the PySpark shell. Tried both HiveContext 
> and SparkSession; both failed, always falling back to the in-memory catalog.
> Method 1: Using SparkSession
> {noformat}
> >>> from pyspark.sql import SparkSession
> >>> spark = SparkSession.builder.enableHiveSupport().getOrCreate()
> >>> spark.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
> DataFrame[]
> >>> spark.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' 
> >>> INTO TABLE src")
> Traceback (most recent call last):
>   File "", line 1, in 
>   File 
> "/Users/xiaoli/IdeaProjects/sparkDelivery/python/pyspark/sql/session.py", 
> line 494, in sql
> return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
>   File 
> "/Users/xiaoli/IdeaProjects/sparkDelivery/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py",
>  line 933, in __call__
>   File 
> "/Users/xiaoli/IdeaProjects/sparkDelivery/python/pyspark/sql/utils.py", line 
> 57, in deco
> return f(*a, **kw)
>   File 
> "/Users/xiaoli/IdeaProjects/sparkDelivery/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py",
>  line 312, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o21.sql.
> : java.lang.UnsupportedOperationException: loadTable is not implemented
> at 
> org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.loadTable(InMemoryCatalog.scala:297)
> at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.loadTable(SessionCatalog.scala:280)
> at org.apache.spark.sql.execution.command.LoadData.run(tables.scala:263)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:57)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:55)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:69)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:85)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:85)
> at org.apache.spark.sql.Dataset.(Dataset.scala:187)
> at org.apache.spark.sql.Dataset.(Dataset.scala:168)
> at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:63)
> at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:541)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
> at py4j.Gateway.invoke(Gateway.java:280)
> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)
> at py4j.commands.CallCommand.execute(CallCommand.java:79)
> at py4j.GatewayConnection.run(GatewayConnection.java:211)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Method 2: Using HiveContext: 
> {noformat}
> >>> from pyspark.sql import HiveContext
> >>> sqlContext = HiveContext(sc)
> >>> sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
> DataFrame[]
> >>> sqlContext.sql("LOAD DATA LOCAL INPATH 
> >>> 'examples/src/main/resources/kv1.txt' INTO TABLE src")
> Traceback (most recent call last):
>   File "", line 1, in 
>   File 
> "/Users/xiaoli/IdeaProjects/sparkDelivery/python/pyspark/sql/context.py", 
> line 346, in sql
> return self.sparkSession.sql(sqlQuery)
>   File 
> "/Users/xiaoli/IdeaProjects/sparkDelivery/python/pyspark/sql/session.py", 
> line 494, in sql
> return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
>   File 
> "/Users/xiaoli/Id

[jira] [Resolved] (SPARK-15421) Table and Database property values need to be validated

2016-05-19 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15421.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> Table and Database property values need to be validated
> ---
>
> Key: SPARK-15421
> URL: https://issues.apache.org/jira/browse/SPARK-15421
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
> Fix For: 2.0.0
>
>
> When we parse DDLs involving table or database properties, we need to 
> validate the values.
> E.g. if we alter a database's property without providing a value:
> {code}
> ALTER DATABASE my_db SET DBPROPERTIES('some_key')
> {code}
> Then we'll ignore it with Hive, but override the property with the in-memory 
> catalog. Inconsistencies like these arise because we don't validate the 
> property values.
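The missing validation can be sketched as follows; `validate_db_properties` is a hypothetical helper, not Spark's actual code:

```python
def validate_db_properties(props):
    # Reject keys supplied without a value, so Hive and the in-memory
    # catalog behave consistently instead of silently diverging.
    for key, value in props.items():
        if value is None or value == "":
            raise ValueError("property '%s' has no value" % key)
    return props

validate_db_properties({"some_key": "some_value"})  # accepted
try:
    # Corresponds to: ALTER DATABASE my_db SET DBPROPERTIES('some_key')
    validate_db_properties({"some_key": None})
except ValueError:
    pass  # rejected up front, instead of catalog-dependent behavior
```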






[jira] [Created] (SPARK-15421) Table and Database property values need to be validated

2016-05-19 Thread Andrew Or (JIRA)
Andrew Or created SPARK-15421:
-

 Summary: Table and Database property values need to be validated
 Key: SPARK-15421
 URL: https://issues.apache.org/jira/browse/SPARK-15421
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Andrew Or
Assignee: Andrew Or


When we parse DDLs involving table or database properties, we need to validate 
the values.

E.g. if we alter a database's property without providing a value:
{code}
ALTER DATABASE my_db SET DBPROPERTIES('some_key')
{code}

Then we'll ignore it with Hive, but override the property with the in-memory 
catalog. Inconsistencies like these arise because we don't validate the 
property values.






[jira] [Commented] (SPARK-15417) Failed to enable HiveSupport in PySpark

2016-05-19 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15292159#comment-15292159
 ] 

Andrew Or commented on SPARK-15417:
---

Good catch, I have a patch to fix this.

> Failed to enable HiveSupport in PySpark
> ---
>
> Key: SPARK-15417
> URL: https://issues.apache.org/jira/browse/SPARK-15417
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>Priority: Blocker
>
> Unable to use the Hive metastore in the PySpark shell. Tried both HiveContext 
> and SparkSession; both failed, always falling back to the in-memory catalog.
> Method 1: Using SparkSession
> {noformat}
> >>> from pyspark.sql import SparkSession
> >>> spark = SparkSession.builder.enableHiveSupport().getOrCreate()
> >>> spark.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
> DataFrame[]
> >>> spark.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' 
> >>> INTO TABLE src")
> Traceback (most recent call last):
>   File "", line 1, in 
>   File 
> "/Users/xiaoli/IdeaProjects/sparkDelivery/python/pyspark/sql/session.py", 
> line 494, in sql
> return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
>   File 
> "/Users/xiaoli/IdeaProjects/sparkDelivery/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py",
>  line 933, in __call__
>   File 
> "/Users/xiaoli/IdeaProjects/sparkDelivery/python/pyspark/sql/utils.py", line 
> 57, in deco
> return f(*a, **kw)
>   File 
> "/Users/xiaoli/IdeaProjects/sparkDelivery/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py",
>  line 312, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o21.sql.
> : java.lang.UnsupportedOperationException: loadTable is not implemented
> at 
> org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.loadTable(InMemoryCatalog.scala:297)
> at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.loadTable(SessionCatalog.scala:280)
> at org.apache.spark.sql.execution.command.LoadData.run(tables.scala:263)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:57)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:55)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:69)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:85)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:85)
> at org.apache.spark.sql.Dataset.(Dataset.scala:187)
> at org.apache.spark.sql.Dataset.(Dataset.scala:168)
> at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:63)
> at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:541)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
> at py4j.Gateway.invoke(Gateway.java:280)
> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)
> at py4j.commands.CallCommand.execute(CallCommand.java:79)
> at py4j.GatewayConnection.run(GatewayConnection.java:211)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Method 2: Using HiveContext: 
> {noformat}
> >>> from pyspark.sql import HiveContext
> >>> sqlContext = HiveContext(sc)
> >>> sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
> DataFrame[]
> >>> sqlContext.sql("LOAD DATA LOCAL INPATH 
> >>> 'examples/src/main/resources/kv1.txt' INTO TABLE src")
> Traceback (most recent call last):
>   File "", line 1, in 
>   File 
> "/Users/xiaoli/IdeaProjects/sparkDelivery/python/pyspark/sql/context.py", 
> line 346, in sql
> return self.sparkSession.sql(sqlQuery)
>   File 
> "/Users/xiaoli/IdeaProjects/sparkDelivery/python/pyspark/sql/session.py", 
> line 494, in sql
> return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
>   File 
> "/Users/xiaoli/IdeaProjects/sparkDelivery/python/lib/py4j-0.10.1-src.

[jira] [Assigned] (SPARK-15417) Failed to enable HiveSupport in PySpark

2016-05-19 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or reassigned SPARK-15417:
-

Assignee: Andrew Or

> Failed to enable HiveSupport in PySpark
> ---
>
> Key: SPARK-15417
> URL: https://issues.apache.org/jira/browse/SPARK-15417
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>Assignee: Andrew Or
>Priority: Blocker
>
> Unable to use Hive meta-store in pyspark shell. Tried both HiveContext and 
> SparkSession. Both failed. It always uses in-memory catalog.
> Method 1: Using SparkSession
> {noformat}
> >>> from pyspark.sql import SparkSession
> >>> spark = SparkSession.builder.enableHiveSupport().getOrCreate()
> >>> spark.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
> DataFrame[]
> >>> spark.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' 
> >>> INTO TABLE src")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File 
> "/Users/xiaoli/IdeaProjects/sparkDelivery/python/pyspark/sql/session.py", 
> line 494, in sql
> return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
>   File 
> "/Users/xiaoli/IdeaProjects/sparkDelivery/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py",
>  line 933, in __call__
>   File 
> "/Users/xiaoli/IdeaProjects/sparkDelivery/python/pyspark/sql/utils.py", line 
> 57, in deco
> return f(*a, **kw)
>   File 
> "/Users/xiaoli/IdeaProjects/sparkDelivery/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py",
>  line 312, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o21.sql.
> : java.lang.UnsupportedOperationException: loadTable is not implemented
> at 
> org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.loadTable(InMemoryCatalog.scala:297)
> at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.loadTable(SessionCatalog.scala:280)
> at org.apache.spark.sql.execution.command.LoadData.run(tables.scala:263)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:57)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:55)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:69)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:85)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:85)
> at org.apache.spark.sql.Dataset.<init>(Dataset.scala:187)
> at org.apache.spark.sql.Dataset.<init>(Dataset.scala:168)
> at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:63)
> at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:541)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
> at py4j.Gateway.invoke(Gateway.java:280)
> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)
> at py4j.commands.CallCommand.execute(CallCommand.java:79)
> at py4j.GatewayConnection.run(GatewayConnection.java:211)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Method 2: Using HiveContext: 
> {noformat}
> >>> from pyspark.sql import HiveContext
> >>> sqlContext = HiveContext(sc)
> >>> sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
> DataFrame[]
> >>> sqlContext.sql("LOAD DATA LOCAL INPATH 
> >>> 'examples/src/main/resources/kv1.txt' INTO TABLE src")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File 
> "/Users/xiaoli/IdeaProjects/sparkDelivery/python/pyspark/sql/context.py", 
> line 346, in sql
> return self.sparkSession.sql(sqlQuery)
>   File 
> "/Users/xiaoli/IdeaProjects/sparkDelivery/python/pyspark/sql/session.py", 
> line 494, in sql
> return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
>   File 
> "/Users/xiaoli/IdeaProjects/sparkDelivery/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py",
>  line

[jira] [Resolved] (SPARK-15392) The default value of size estimation is not good

2016-05-19 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15392.
---
  Resolution: Fixed
Target Version/s: 2.0.0

> The default value of size estimation is not good
> 
>
> Key: SPARK-15392
> URL: https://issues.apache.org/jira/browse/SPARK-15392
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Davies Liu
>Assignee: Davies Liu
> Fix For: 2.0.0
>
>
> We use autoBroadcastJoinThreshold + 1L as the default size estimate, which 
> is not good in 2.0: we now calculate the size based on the size of the 
> schema, so the estimate can fall below autoBroadcastJoinThreshold if you 
> have a SELECT on top of a DataFrame created from an RDD.
> We should use an even bigger default value, for example, MaxLong.
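The interaction described above can be illustrated with a small self-contained sketch. This is plain Python with hypothetical numbers; `project_size` is an illustrative stand-in for a 2.0-style estimator, not Spark's actual code:

```python
# Illustrative simulation of why defaulting the size estimate to
# autoBroadcastJoinThreshold + 1 is fragile. All names and numbers here
# are hypothetical; this is not Spark code.

THRESHOLD = 10 * 1024 * 1024  # e.g. spark.sql.autoBroadcastJoinThreshold

def project_size(child_size: int, child_row_bytes: int, projected_row_bytes: int) -> int:
    # A schema-based estimator scales the child's size estimate by the
    # ratio of the projected schema width to the child schema width.
    return child_size * projected_row_bytes // child_row_bytes

default_estimate = THRESHOLD + 1  # the "just above the threshold" default

# A SELECT of one 8-byte column out of a 24-byte row shrinks the estimate:
after_select = project_size(default_estimate, child_row_bytes=24, projected_row_bytes=8)

print(after_select <= THRESHOLD)  # True: the pruned plan now looks broadcastable
```

With a much larger default such as MaxLong, the scaled-down estimate stays far above the threshold, which is what the issue proposes.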






[jira] [Resolved] (SPARK-15317) JobProgressListener takes a huge amount of memory with iterative DataFrame program in local, standalone

2016-05-19 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15317.
---
  Resolution: Fixed
Assignee: Shixiong Zhu
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> JobProgressListener takes a huge amount of memory with iterative DataFrame 
> program in local, standalone
> ---
>
> Key: SPARK-15317
> URL: https://issues.apache.org/jira/browse/SPARK-15317
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
> Environment: Spark 2.0, local mode + standalone mode on MacBook Pro 
> OSX 10.9
>Reporter: Joseph K. Bradley
>Assignee: Shixiong Zhu
> Fix For: 2.0.0
>
> Attachments: cc_traces.txt, compare-1.6-10Kpartitions.png, 
> compare-2.0-10Kpartitions.png, compare-2.0-16partitions.png, 
> dump-standalone-2.0-1of4.png, dump-standalone-2.0-2of4.png, 
> dump-standalone-2.0-3of4.png, dump-standalone-2.0-4of4.png
>
>
> h2. TL;DR
> Running a small test locally, I found JobProgressListener consuming a huge 
> amount of memory.  There are many tasks being run, but it is still 
> surprising.  Summary, with details below:
> * Spark app: series of DataFrame joins
> * Issue: GC
> * Heap dump shows JobProgressListener taking 150 - 400MB, depending on the 
> Spark mode/version
> h2. Reproducing this issue
> h3.  With more complex code
> The code which fails:
> * Here is a branch with the code snippet which fails: 
> [https://github.com/jkbradley/spark/tree/18836174ab190d94800cc247f5519f3148822dce]
> ** This is based on Spark commit hash: 
> bb1362eb3b36b553dca246b95f59ba7fd8adcc8a
> * Look at {{CC.scala}}, which implements connected components using 
> DataFrames: 
> [https://github.com/jkbradley/spark/blob/18836174ab190d94800cc247f5519f3148822dce/mllib/src/main/scala/org/apache/spark/ml/CC.scala]
> In the spark shell, run:
> {code}
> import org.apache.spark.ml.CC
> import org.apache.spark.sql.SQLContext
> val sqlContext = SQLContext.getOrCreate(sc)
> CC.runTest(sqlContext)
> {code}
> I have attached a file {{cc_traces.txt}} with the stack traces from running 
> {{runTest}}.  Note that I sometimes had to run {{runTest}} twice to cause the 
> fatal exception.  This includes a trace for 1.6, which should run without 
> modifications to {{CC.scala}}.  These traces are from running in local mode.
> I used {{jmap}} to dump the heap:
> * local mode with 2.0: JobProgressListener took about 397 MB
> * standalone mode with 2.0: JobProgressListener took about 171 MB  (See 
> attached screenshots from MemoryAnalyzer)
> Both 1.6 and 2.0 exhibit this issue.  2.0 ran faster, and the issue 
> (JobProgressListener allocation) seems more severe with 2.0, though it could 
> just be that 2.0 makes more progress and runs more jobs.
> h3. With simpler code
> I ran this with master (~Spark 2.0):
> {code}
> val data = spark.range(0, 1, 1, 1)
> data.cache().count()
> {code}
> The resulting heap dump:
> * 78MB for {{scala.tools.nsc.interpreter.ILoop$ILoopInterpreter}}
> * 58MB for {{org.apache.spark.ui.jobs.JobProgressListener}}
> * 80MB for {{io.netty.buffer.PoolChunk}}






[jira] [Resolved] (SPARK-15387) SessionCatalog in SimpleAnalyzer does not need to make database directory.

2016-05-19 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15387.
---
  Resolution: Fixed
Assignee: Kousuke Saruta
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> SessionCatalog in SimpleAnalyzer does not need to make database directory.
> --
>
> Key: SPARK-15387
> URL: https://issues.apache.org/jira/browse/SPARK-15387
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
> Fix For: 2.0.0
>
>
> After SPARK-15093 was fixed, we are forced to create /user/hive/warehouse 
> whenever SimpleAnalyzer is used, but SimpleAnalyzer may not need the 
> directory.






[jira] [Resolved] (SPARK-15300) Can't remove a block if it's under evicting

2016-05-19 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15300.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> Can't remove a block if it's under evicting
> ---
>
> Key: SPARK-15300
> URL: https://issues.apache.org/jira/browse/SPARK-15300
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Davies Liu
>Assignee: Davies Liu
> Fix For: 2.0.0
>
>
> {code}
> 16/04/15 12:17:05 INFO ContextCleaner: Cleaned shuffle 94
> 16/04/15 12:17:05 INFO ContextCleaner: Cleaned accumulator 1433121
> 16/04/15 12:17:05 INFO ContextCleaner: Cleaned accumulator 1433122
> 16/04/15 12:17:05 INFO ContextCleaner: Cleaned accumulator 1433123
> 16/04/15 12:17:05 INFO BlockManagerInfo: Removed broadcast_629_piece0 on 
> 10.0.164.43:39651 in memory (size: 23.4 KB, free: 15.8 GB)
> 16/04/15 12:17:05 ERROR BlockManagerSlaveEndpoint: Error in removing block 
> broadcast_631_piece0
> java.lang.IllegalStateException: Task -1024 has already locked 
> broadcast_631_piece0 for writing
>   at 
> org.apache.spark.storage.BlockInfoManager.lockForWriting(BlockInfoManager.scala:232)
>   at 
> org.apache.spark.storage.BlockManager.removeBlock(BlockManager.scala:1286)
>   at 
> org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$1.apply$mcZ$sp(BlockManagerSlaveEndpoint.scala:47)
>   at 
> org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$1.apply(BlockManagerSlaveEndpoint.scala:46)
>   at 
> org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$1.apply(BlockManagerSlaveEndpoint.scala:46)
>   at 
> org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$1.apply(BlockManagerSlaveEndpoint.scala:82)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> 16/04/15 12:17:05 INFO BlockManagerInfo: Removed broadcast_626_piece0 on 
> 10.0.164.43:39651 in memory (size: 23.4 KB, free: 15.8 GB)
> 16/04/15 12:17:05 INFO ContextCleaner: Cleaned accumulator 1433124
> 16/04/15 12:17:05 INFO BlockManagerInfo: Removed broadcast_627_piece0 on 
> 10.0.164.43:39651 in memory (size: 23.3 KB, free: 15.8 GB)
> 16/04/15 12:17:05 INFO ContextCleaner: Cleaned accumulator 1433125
> {code}






[jira] [Updated] (SPARK-15357) Cooperative spilling should check consumer memory mode

2016-05-16 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15357:
--
Description: 
In TaskMemoryManager.java:
{code}
for (MemoryConsumer c: consumers) {
  if (c != consumer && c.getUsed() > 0) {
    try {
      long released = c.spill(required - got, consumer);
      if (released > 0 && mode == tungstenMemoryMode) {
        got += memoryManager.acquireExecutionMemory(required - got,
          taskAttemptId, mode);
        if (got >= required) {
          break;
        }
      }
    } catch(...) { ... }
  }
}
{code}
Currently, when non-tungsten consumers acquire execution memory, they may force 
other tungsten consumers to spill and then NOT use the freed memory. A better 
way to do this is to incorporate the memory mode in the consumer itself and 
spill only those with matching memory modes.

  was:
In TaskMemoryManager.java:
{code}
for (MemoryConsumer c: consumers) {
  if (c != consumer && c.getUsed() > 0) {
    try {
      long released = c.spill(required - got, consumer);
      if (released > 0 && mode == tungstenMemoryMode) {
        logger.debug("Task {} released {} from {} for {}", taskAttemptId,
          Utils.bytesToString(released), c, consumer);
        got += memoryManager.acquireExecutionMemory(required - got,
          taskAttemptId, mode);
        if (got >= required) {
          break;
        }
      }
    } catch (IOException e) { ... }
  }
}
{code}
Currently, when non-tungsten consumers acquire execution memory, they may force 
other tungsten consumers to spill and then NOT use the freed memory. A better 
way to do this is to incorporate the memory mode in the consumer itself and 
spill only those with matching memory modes.


> Cooperative spilling should check consumer memory mode
> --
>
> Key: SPARK-15357
> URL: https://issues.apache.org/jira/browse/SPARK-15357
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Andrew Or
>
> In TaskMemoryManager.java:
> {code}
> for (MemoryConsumer c: consumers) {
>   if (c != consumer && c.getUsed() > 0) {
>     try {
>       long released = c.spill(required - got, consumer);
>       if (released > 0 && mode == tungstenMemoryMode) {
>         got += memoryManager.acquireExecutionMemory(required - got,
>           taskAttemptId, mode);
>         if (got >= required) {
>           break;
>         }
>       }
>     } catch(...) { ... }
>   }
> }
> {code}
> Currently, when non-tungsten consumers acquire execution memory, they may 
> force other tungsten consumers to spill and then NOT use the freed memory. A 
> better way to do this is to incorporate the memory mode in the consumer 
> itself and spill only those with matching memory modes.






[jira] [Created] (SPARK-15357) Cooperative spilling should check consumer memory mode

2016-05-16 Thread Andrew Or (JIRA)
Andrew Or created SPARK-15357:
-

 Summary: Cooperative spilling should check consumer memory mode
 Key: SPARK-15357
 URL: https://issues.apache.org/jira/browse/SPARK-15357
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.6.0
Reporter: Andrew Or


In TaskMemoryManager.java:
{code}
for (MemoryConsumer c: consumers) {
  if (c != consumer && c.getUsed() > 0) {
    try {
      long released = c.spill(required - got, consumer);
      if (released > 0 && mode == tungstenMemoryMode) {
        logger.debug("Task {} released {} from {} for {}", taskAttemptId,
          Utils.bytesToString(released), c, consumer);
        got += memoryManager.acquireExecutionMemory(required - got,
          taskAttemptId, mode);
        if (got >= required) {
          break;
        }
      }
    } catch (IOException e) { ... }
  }
}
{code}
Currently, when non-tungsten consumers acquire execution memory, they may force 
other tungsten consumers to spill and then NOT use the freed memory. A better 
way to do this is to incorporate the memory mode in the consumer itself and 
spill only those with matching memory modes.
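A minimal simulation of the proposed fix (hypothetical names in plain Python, not Spark's `TaskMemoryManager` API): filter spill candidates by memory mode so the freed memory is actually usable by the requester.

```python
from dataclasses import dataclass

@dataclass
class Consumer:
    name: str
    mode: str   # "ON_HEAP" or "OFF_HEAP"
    used: int

    def spill(self, needed: int) -> int:
        # Release up to `needed` bytes of this consumer's memory.
        released = min(self.used, needed)
        self.used -= released
        return released

def acquire_by_spilling(consumers, requester, required):
    """Spill *other* consumers of the same memory mode until `required`
    bytes are freed (or the candidates run out)."""
    got = 0
    for c in consumers:
        if c is requester or c.used == 0:
            continue
        if c.mode != requester.mode:  # the check this issue proposes
            continue
        got += c.spill(required - got)
        if got >= required:
            break
    return got

off_heap = Consumer("a", "OFF_HEAP", 100)
on_heap = Consumer("b", "ON_HEAP", 100)
requester = Consumer("c", "ON_HEAP", 0)

freed = acquire_by_spilling([off_heap, on_heap, requester], requester, 50)
print(freed, off_heap.used, on_heap.used)  # 50 100 50: only "b" spilled
```

Without the mode check, the off-heap consumer would have been forced to spill first even though its freed memory cannot satisfy an on-heap request.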






[jira] [Resolved] (SPARK-14684) Verification of partition specs in SessionCatalog

2016-05-12 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-14684.
---
  Resolution: Fixed
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Verification of partition specs in SessionCatalog
> -
>
> Key: SPARK-14684
> URL: https://issues.apache.org/jira/browse/SPARK-14684
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>Assignee: Xiao Li
> Fix For: 2.0.0
>
>
> When attempting to drop partitions of a table, if the user provides an 
> unknown column, Hive will drop all the partitions of the table, which is 
> likely not intended. E.g.
> {code}
> ALTER TABLE my_tab DROP PARTITION (ds='2008-04-09', unknownCol='12')
> {code}
> We should verify that the columns provided in the specs are actually 
> partitioned columns.
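The verification could look roughly like the following sketch (hypothetical helper names; Spark's actual implementation lives in SessionCatalog and differs in detail):

```python
def check_partition_spec(spec: dict, partition_columns: set) -> dict:
    # Reject any spec that names a column that is not a partition column,
    # instead of letting it fall through to the metastore.
    unknown = sorted(k for k in spec if k not in partition_columns)
    if unknown:
        raise ValueError(
            f"{unknown} are not partition columns "
            f"(partition columns: {sorted(partition_columns)})")
    return spec

# The dangerous example from the issue:
#   ALTER TABLE my_tab DROP PARTITION (ds='2008-04-09', unknownCol='12')
try:
    check_partition_spec({"ds": "2008-04-09", "unknownCol": "12"}, {"ds"})
except ValueError as err:
    print(err)  # the command is rejected before any partition is touched
```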






[jira] [Resolved] (SPARK-15277) Checking Partition Spec Existence Before Dropping

2016-05-12 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15277.
---
  Resolution: Fixed
Assignee: Xiao Li
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Checking Partition Spec Existence Before Dropping
> -
>
> Key: SPARK-15277
> URL: https://issues.apache.org/jira/browse/SPARK-15277
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>Assignee: Xiao Li
> Fix For: 2.0.0
>
>
> Currently, we start dropping partitions before we have finished checking 
> that all of the given partition specs exist. If one partition spec does not 
> exist, we simply stop processing the command, so some partitions may have 
> been dropped while others have not. We should check existence first, before 
> dropping any partition.
> If any failure happens after we start to drop the partitions, we should log 
> an error message indicating which partitions have been dropped and which 
> have not.
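A check-then-act sketch of the proposed behavior (hypothetical names in plain Python; the real fix lives in Spark's DDL command implementation):

```python
def drop_partitions(existing: set, specs: list) -> list:
    """Verify that every requested partition exists before dropping any,
    so a bad spec cannot leave the table half-dropped."""
    missing = [s for s in specs if s not in existing]
    if missing:
        raise ValueError(f"no such partitions: {missing}")
    dropped = []
    for s in specs:
        existing.discard(s)
        dropped.append(s)
    return dropped

parts = {"ds=2016-05-11", "ds=2016-05-12"}
try:
    drop_partitions(parts, ["ds=2016-05-11", "ds=2016-05-13"])
except ValueError as err:
    print(err)          # rejected up front; nothing was dropped
print(sorted(parts))    # both partitions still present
```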






[jira] [Updated] (SPARK-14684) Verification of partition specs in SessionCatalog

2016-05-12 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-14684:
--
Description: 
When attempting to drop partitions of a table, if the user provides an unknown 
column, Hive will drop all the partitions of the table, which is likely not 
intended. E.g.
{code}
ALTER TABLE my_tab DROP PARTITION (ds='2008-04-09', unknownCol='12')
{code}
We should verify that the columns provided in the specs are actually 
partitioned columns.

  was:When users input an invalid partition spec, we might not be able to 
catch it and issue an error message. Sometimes this can have disastrous 
results. For example, previously, when we altered a table and dropped a 
partition with an invalid spec, it could drop all of the partitions due to a 
bug/defect in the Hive Metastore API. 


> Verification of partition specs in SessionCatalog
> -
>
> Key: SPARK-14684
> URL: https://issues.apache.org/jira/browse/SPARK-14684
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>
> When attempting to drop partitions of a table, if the user provides an 
> unknown column, Hive will drop all the partitions of the table, which is 
> likely not intended. E.g.
> {code}
> ALTER TABLE my_tab DROP PARTITION (ds='2008-04-09', unknownCol='12')
> {code}
> We should verify that the columns provided in the specs are actually 
> partitioned columns.






[jira] [Commented] (SPARK-15289) SQL test compilation error from merge conflict

2016-05-12 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15281678#comment-15281678
 ] 

Andrew Or commented on SPARK-15289:
---

Done, thanks for the ping.

> SQL test compilation error from merge conflict
> --
>
> Key: SPARK-15289
> URL: https://issues.apache.org/jira/browse/SPARK-15289
> Project: Spark
>  Issue Type: Bug
>  Components: Build, SQL
>Affects Versions: 2.0.0
>Reporter: Piotr Milanowski
>Assignee: Andrew Or
>Priority: Blocker
> Fix For: 2.0.0
>
>
> Spark build fails during SQL build. Concerns commit 
> 6b69b8c0c778f4cba2b281fe3ad225dc922f82d6, but also earlier ones; build works 
> e.g. for commit c6d23b6604e85bcddbd1fb6a2c1c3edbfd2be2c1. 
> Run with command:
> ./dev/make-distribution.sh -Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver 
> -Dhadoop.version=2.6.0 -DskipTests
> Result:
> {code}
> [error] 
> /home/bpol0421/various/spark/sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:282:
>  not found: value sparkSession
> [error] val dbString = CatalogImpl.makeDataset(Seq(db), 
> sparkSession).showString(10)
> [error] ^
> [error] 
> /home/bpol0421/various/spark/sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:283:
>  not found: value sparkSession
> [error] val tableString = CatalogImpl.makeDataset(Seq(table), 
> sparkSession).showString(10)
> [error]   ^
> [error] 
> /home/bpol0421/various/spark/sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:284:
>  not found: value sparkSession
> [error] val functionString = CatalogImpl.makeDataset(Seq(function), 
> sparkSession).showString(10)
> [error] ^
> [error] 
> /home/bpol0421/various/spark/sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:285:
>  not found: value sparkSession
> [error] val columnString = CatalogImpl.makeDataset(Seq(column), 
> sparkSession).showString(10)
> [error] ^
> {code}






[jira] [Resolved] (SPARK-15289) SQL test compilation error from merge conflict

2016-05-12 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15289.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> SQL test compilation error from merge conflict
> --
>
> Key: SPARK-15289
> URL: https://issues.apache.org/jira/browse/SPARK-15289
> Project: Spark
>  Issue Type: Bug
>  Components: Build, SQL
>Affects Versions: 2.0.0
>Reporter: Piotr Milanowski
>Assignee: Andrew Or
>Priority: Blocker
> Fix For: 2.0.0
>
>
> Spark build fails during SQL build. Concerns commit 
> 6b69b8c0c778f4cba2b281fe3ad225dc922f82d6, but also earlier ones; build works 
> e.g. for commit c6d23b6604e85bcddbd1fb6a2c1c3edbfd2be2c1. 
> Run with command:
> ./dev/make-distribution.sh -Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver 
> -Dhadoop.version=2.6.0 -DskipTests
> Result:
> {code}
> [error] 
> /home/bpol0421/various/spark/sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:282:
>  not found: value sparkSession
> [error] val dbString = CatalogImpl.makeDataset(Seq(db), 
> sparkSession).showString(10)
> [error] ^
> [error] 
> /home/bpol0421/various/spark/sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:283:
>  not found: value sparkSession
> [error] val tableString = CatalogImpl.makeDataset(Seq(table), 
> sparkSession).showString(10)
> [error]   ^
> [error] 
> /home/bpol0421/various/spark/sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:284:
>  not found: value sparkSession
> [error] val functionString = CatalogImpl.makeDataset(Seq(function), 
> sparkSession).showString(10)
> [error] ^
> [error] 
> /home/bpol0421/various/spark/sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:285:
>  not found: value sparkSession
> [error] val columnString = CatalogImpl.makeDataset(Seq(column), 
> sparkSession).showString(10)
> [error] ^
> {code}






[jira] [Resolved] (SPARK-15264) Spark 2.0 CSV Reader: NPE on Blank Column Names

2016-05-11 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15264.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> Spark 2.0 CSV Reader: NPE on Blank Column Names
> ---
>
> Key: SPARK-15264
> URL: https://issues.apache.org/jira/browse/SPARK-15264
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Bill Chambers
>Assignee: Bill Chambers
> Fix For: 2.0.0
>
>
> When you read in a CSV file that starts with blank column names, the read 
> fails when you specify that you want a header.
> Pull request coming shortly.






[jira] [Resolved] (SPARK-15274) CSV default column names should be consistent

2016-05-11 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15274.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> CSV default column names should be consistent
> -
>
> Key: SPARK-15274
> URL: https://issues.apache.org/jira/browse/SPARK-15274
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Bill Chambers
> Fix For: 2.0.0
>
>
> If a column name is not provided, Spark SQL usually uses the convention 
> "_c0", "_c1" etc., but when reading in CSV files without headers, we use "C0" 
> and "C1". This is inconsistent and we should fix it by Spark 2.0.






[jira] [Resolved] (SPARK-15276) CREATE TABLE with LOCATION should imply EXTERNAL

2016-05-11 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15276.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> CREATE TABLE with LOCATION should imply EXTERNAL
> 
>
> Key: SPARK-15276
> URL: https://issues.apache.org/jira/browse/SPARK-15276
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
> Fix For: 2.0.0
>
>
> If the user runs `CREATE TABLE some_table ... LOCATION /some/path`, the 
> result is still a managed table even though the table's data is stored at 
> /some/path. The problem is that when we drop the table we'll also delete 
> the data at /some/path, which could cause problems if /some/path contains 
> pre-existing data.
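The proposed rule can be stated in a couple of lines. This is a sketch of the policy only, with made-up names, not Spark's code:

```python
def table_type(has_location_clause: bool, declared_external: bool) -> str:
    # Proposed behavior: a LOCATION clause implies EXTERNAL, so dropping
    # the table never deletes data at a user-supplied path.
    return "EXTERNAL" if (declared_external or has_location_clause) else "MANAGED"

print(table_type(has_location_clause=True, declared_external=False))   # EXTERNAL
print(table_type(has_location_clause=False, declared_external=False))  # MANAGED
```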






[jira] [Created] (SPARK-15279) Disallow ROW FORMAT and STORED AS (parquet | orc | avro etc.)

2016-05-11 Thread Andrew Or (JIRA)
Andrew Or created SPARK-15279:
-

 Summary: Disallow ROW FORMAT and STORED AS (parquet | orc | avro 
etc.)
 Key: SPARK-15279
 URL: https://issues.apache.org/jira/browse/SPARK-15279
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Andrew Or
Assignee: Andrew Or


These are two potentially conflicting ways to specify the SerDe. 
Unfortunately, we can't just get rid of ROW FORMAT because it may be used with 
TEXTFILE or RCFILE. For other file formats, we should fail fast wherever 
possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15275) CatalogTable should store sort ordering for sorted columns

2016-05-11 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15275:
--
Summary: CatalogTable should store sort ordering for sorted columns  (was: 
[SQL] CatalogTable should store sort ordering for sorted columns)

> CatalogTable should store sort ordering for sorted columns
> --
>
> Key: SPARK-15275
> URL: https://issues.apache.org/jira/browse/SPARK-15275
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Tejas Patil
>Priority: Trivial
>
> For bucketed tables in Hive, one can also add a constraint about column 
> sortedness along with the ordering.
> As per the spec in [0], the CREATE TABLE statement can include a SORT 
> ordering as well:
>   [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], 
> ...)] INTO num_buckets BUCKETS]
> See [1] for an example. 
> [0] : 
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable
> [1] : 
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-BucketedSortedTables
> Currently CatalogTable does not store any information about the sort ordering 
> and just has the names of the sorted columns.
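What storing the ordering might look like can be sketched as follows (a hypothetical shape for illustration only, not the actual CatalogTable API; `BucketSpec` and its fields are invented names):

```python
from dataclasses import dataclass, field

@dataclass
class BucketSpec:
    """Hypothetical sketch: bucketing metadata that also records sort direction."""
    num_buckets: int
    bucket_columns: list
    # Each entry pairs a column name with its sort direction, e.g. ("id", "ASC"),
    # instead of storing only the names of the sorted columns as today.
    sort_columns: list = field(default_factory=list)

spec = BucketSpec(8, ["user_id"], [("login_time", "DESC")])
```
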



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15275) CatalogTable should store sort ordering for sorted columns

2016-05-11 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15275:
--
Priority: Major  (was: Trivial)

> CatalogTable should store sort ordering for sorted columns
> --
>
> Key: SPARK-15275
> URL: https://issues.apache.org/jira/browse/SPARK-15275
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Tejas Patil
>Assignee: Tejas Patil
>
> For bucketed tables in Hive, one can also add a constraint about column 
> sortedness along with the ordering.
> As per the spec in [0], the CREATE TABLE statement can include a SORT 
> ordering as well:
>   [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], 
> ...)] INTO num_buckets BUCKETS]
> See [1] for an example. 
> [0] : 
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable
> [1] : 
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-BucketedSortedTables
> Currently CatalogTable does not store any information about the sort ordering 
> and just has the names of the sorted columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15275) CatalogTable should store sort ordering for sorted columns

2016-05-11 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15275:
--
Assignee: Tejas Patil

> CatalogTable should store sort ordering for sorted columns
> --
>
> Key: SPARK-15275
> URL: https://issues.apache.org/jira/browse/SPARK-15275
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Tejas Patil
>Assignee: Tejas Patil
>Priority: Trivial
>
> For bucketed tables in Hive, one can also add a constraint about column 
> sortedness along with the ordering.
> As per the spec in [0], the CREATE TABLE statement can include a SORT 
> ordering as well:
>   [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], 
> ...)] INTO num_buckets BUCKETS]
> See [1] for an example. 
> [0] : 
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable
> [1] : 
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-BucketedSortedTables
> Currently CatalogTable does not store any information about the sort ordering 
> and just has the names of the sorted columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15275) CatalogTable should store sort ordering for sorted columns

2016-05-11 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15275:
--
Affects Version/s: (was: 1.6.1)
   2.0.0

> CatalogTable should store sort ordering for sorted columns
> --
>
> Key: SPARK-15275
> URL: https://issues.apache.org/jira/browse/SPARK-15275
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Tejas Patil
>Assignee: Tejas Patil
>
> For bucketed tables in Hive, one can also add a constraint about column 
> sortedness along with the ordering.
> As per the spec in [0], the CREATE TABLE statement can include a SORT 
> ordering as well:
>   [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], 
> ...)] INTO num_buckets BUCKETS]
> See [1] for an example. 
> [0] : 
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable
> [1] : 
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-BucketedSortedTables
> Currently CatalogTable does not store any information about the sort ordering 
> and just has the names of the sorted columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15276) CREATE TABLE with LOCATION should imply EXTERNAL

2016-05-11 Thread Andrew Or (JIRA)
Andrew Or created SPARK-15276:
-

 Summary: CREATE TABLE with LOCATION should imply EXTERNAL
 Key: SPARK-15276
 URL: https://issues.apache.org/jira/browse/SPARK-15276
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Andrew Or
Assignee: Andrew Or


If the user runs `CREATE TABLE some_table ... LOCATION /some/path`, then this 
will still be a managed table even though the table's data is stored at 
/some/path. The problem is that when we drop the table, we'll also delete the 
data at /some/path. This could cause problems if /some/path contains existing data.
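The proposed rule can be sketched as a small decision helper (a hypothetical illustration, not Spark's parser or catalog code; the function name is invented):

```python
def resolve_table_type(has_external_keyword: bool, has_location: bool) -> str:
    """Sketch of the proposed rule: a user-supplied LOCATION implies EXTERNAL,
    so dropping the table never deletes pre-existing data at that path."""
    if has_external_keyword or has_location:
        return "EXTERNAL"
    return "MANAGED"
```
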



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15264) Spark 2.0 CSV Reader: Error on Blank Column Names

2016-05-11 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15264:
--
Assignee: Bill Chambers

> Spark 2.0 CSV Reader: Error on Blank Column Names
> -
>
> Key: SPARK-15264
> URL: https://issues.apache.org/jira/browse/SPARK-15264
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Bill Chambers
>Assignee: Bill Chambers
>
> When you read in a CSV file that starts with blank column names, the read 
> fails when you specify that you want a header.
> Pull request coming shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15264) Spark 2.0 CSV Reader: NPE on Blank Column Names

2016-05-11 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15264:
--
Summary: Spark 2.0 CSV Reader: NPE on Blank Column Names  (was: Spark 2.0 
CSV Reader: Error on Blank Column Names)

> Spark 2.0 CSV Reader: NPE on Blank Column Names
> ---
>
> Key: SPARK-15264
> URL: https://issues.apache.org/jira/browse/SPARK-15264
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Bill Chambers
>Assignee: Bill Chambers
>
> When you read in a CSV file that starts with blank column names, the read 
> fails when you specify that you want a header.
> Pull request coming shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15264) Spark 2.0 CSV Reader: Error on Blank Column Names

2016-05-11 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15264:
--
Target Version/s: 2.0.0

> Spark 2.0 CSV Reader: Error on Blank Column Names
> -
>
> Key: SPARK-15264
> URL: https://issues.apache.org/jira/browse/SPARK-15264
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Bill Chambers
>Assignee: Bill Chambers
>
> When you read in a CSV file that starts with blank column names, the read 
> fails when you specify that you want a header.
> Pull request coming shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15274) CSV default column names should be consistent

2016-05-11 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15274:
--
Assignee: Bill Chambers

> CSV default column names should be consistent
> -
>
> Key: SPARK-15274
> URL: https://issues.apache.org/jira/browse/SPARK-15274
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Bill Chambers
>
> If a column name is not provided, Spark SQL usually uses the convention 
> "_c0", "_c1" etc., but when reading in CSV files without headers, we use "C0" 
> and "C1". This is inconsistent and we should fix it by Spark 2.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15274) CSV default column names should be consistent

2016-05-11 Thread Andrew Or (JIRA)
Andrew Or created SPARK-15274:
-

 Summary: CSV default column names should be consistent
 Key: SPARK-15274
 URL: https://issues.apache.org/jira/browse/SPARK-15274
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Andrew Or


If a column name is not provided, Spark SQL usually uses the convention "_c0", 
"_c1" etc., but when reading in CSV files without headers, we use "C0" and 
"C1". This is inconsistent and we should fix it by Spark 2.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13566) Deadlock between MemoryStore and BlockManager

2016-05-11 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280615#comment-15280615
 ] 

Andrew Or commented on SPARK-13566:
---

[~ekeddy] This only happens with the unified memory manager, so you could 
switch back to the static memory manager by setting 
`spark.memory.useLegacyMode` to true. You may observe a decrease in performance 
if you do that, however.

> Deadlock between MemoryStore and BlockManager
> -
>
> Key: SPARK-13566
> URL: https://issues.apache.org/jira/browse/SPARK-13566
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager, Spark Core
>Affects Versions: 1.6.0
> Environment: Spark 1.6.0 hadoop2.2.0 jdk1.8.0_65 centOs 6.2
>Reporter: cen yuhai
>Assignee: cen yuhai
> Fix For: 1.6.2
>
>
> ===
> "block-manager-slave-async-thread-pool-1":
> at org.apache.spark.storage.MemoryStore.remove(MemoryStore.scala:216)
> - waiting to lock <0x0005895b09b0> (a 
> org.apache.spark.memory.UnifiedMemoryManager)
> at 
> org.apache.spark.storage.BlockManager.removeBlock(BlockManager.scala:1114)
> - locked <0x00058ed6aae0> (a org.apache.spark.storage.BlockInfo)
> at 
> org.apache.spark.storage.BlockManager$$anonfun$removeBroadcast$2.apply(BlockManager.scala:1101)
> at 
> org.apache.spark.storage.BlockManager$$anonfun$removeBroadcast$2.apply(BlockManager.scala:1101)
> at scala.collection.immutable.Set$Set2.foreach(Set.scala:94)
> at 
> org.apache.spark.storage.BlockManager.removeBroadcast(BlockManager.scala:1101)
> at 
> org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply$mcI$sp(BlockManagerSlaveEndpoint.scala:65)
> at 
> org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply(BlockManagerSlaveEndpoint.scala:65)
> at 
> org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply(BlockManagerSlaveEndpoint.scala:65)
> at 
> org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$1.apply(BlockManagerSlaveEndpoint.scala:84)
> at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
> at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> "Executor task launch worker-10":
> at 
> org.apache.spark.storage.BlockManager.dropFromMemory(BlockManager.scala:1032)
> - waiting to lock <0x00059a0988b8> (a 
> org.apache.spark.storage.BlockInfo)
> at 
> org.apache.spark.storage.BlockManager.dropFromMemory(BlockManager.scala:1009)
> at 
> org.apache.spark.storage.MemoryStore$$anonfun$evictBlocksToFreeSpace$2.apply(MemoryStore.scala:460)
> at 
> org.apache.spark.storage.MemoryStore$$anonfun$evictBlocksToFreeSpace$2.apply(MemoryStore.scala:449)
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15262) race condition in killing an executor and reregistering an executor

2016-05-11 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15262:
--
Target Version/s: 1.6.2, 2.0.0

> race condition in killing an executor and reregistering an executor
> ---
>
> Key: SPARK-15262
> URL: https://issues.apache.org/jira/browse/SPARK-15262
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1
>Reporter: Shixiong Zhu
>
> There is a race condition when killing an executor and reregistering an 
> executor happen at the same time. Here are the execution steps to reproduce it.
> 1. master finds a worker is dead
> 2. master tells driver to remove the executor
> 3. driver removes the executor
> 4. BlockManagerMasterEndpoint removes the block manager
> 5. executor finds it's not registered via heartbeat
> 6. executor sends a re-register block manager request
> 7. block manager is re-registered
> 8. executor is killed by worker
> 9. CoarseGrainedSchedulerBackend ignores onDisconnected as this address is 
> not in the executor list
> 10. BlockManagerMasterEndpoint.blockManagerInfo contains dead block managers
> As BlockManagerMasterEndpoint.blockManagerInfo contains some dead block 
> managers, when we unpersist an RDD, remove a broadcast, or clean a shuffle 
> block via an RPC endpoint of a dead block manager, we will get 
> ClosedChannelException.
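One way to avoid keeping dead entries is to reject re-registration from executors the master no longer tracks. A simplified sketch under that assumption (not the actual BlockManagerMasterEndpoint code; names are invented):

```python
def handle_reregister(executor_id: str, live_executors: set,
                      block_manager_info: dict) -> bool:
    """Sketch: only accept a block manager re-registration if the executor is
    still in the live set; otherwise the entry would outlive its executor."""
    if executor_id not in live_executors:
        return False  # executor was already removed (steps 1-4 above)
    block_manager_info[executor_id] = "registered"
    return True
```
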



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15249) Use FunctionResource instead of (String, String) in CreateFunction and CatalogFunction for resource

2016-05-10 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15249.
---
  Resolution: Fixed
Assignee: Sandeep Singh
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Use FunctionResource instead of (String, String) in CreateFunction and 
> CatalogFunction for resource
> ---
>
> Key: SPARK-15249
> URL: https://issues.apache.org/jira/browse/SPARK-15249
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Sandeep Singh
>Assignee: Sandeep Singh
>Priority: Minor
> Fix For: 2.0.0
>
>
> Use FunctionResource instead of (String, String) in CreateFunction and 
> CatalogFunction for resource
> see the TODOs here:
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala#L36
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala#L42



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15257) Require CREATE EXTERNAL TABLE to specify LOCATION

2016-05-10 Thread Andrew Or (JIRA)
Andrew Or created SPARK-15257:
-

 Summary: Require CREATE EXTERNAL TABLE to specify LOCATION
 Key: SPARK-15257
 URL: https://issues.apache.org/jira/browse/SPARK-15257
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Andrew Or
Assignee: Andrew Or


Right now when the user runs `CREATE EXTERNAL TABLE` without specifying 
`LOCATION`, the table will still be created in the warehouse directory, but its 
metadata won't be deleted even when the user drops the table! This is a 
problem. We should require the user to also specify `LOCATION`.

Note: This does not apply to `CREATE EXTERNAL TABLE ... USING`, which is not 
yet supported.
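The proposed check can be sketched as follows (a hypothetical helper for illustration, not Spark's actual validation code):

```python
from typing import Optional

def validate_create_external_table(is_external: bool,
                                   location: Optional[str]) -> None:
    """Sketch of the proposed rule: EXTERNAL without LOCATION is an error."""
    if is_external and location is None:
        raise ValueError("CREATE EXTERNAL TABLE must specify LOCATION")
```
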



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-14857) Table/Database Name Validation in SessionCatalog

2016-05-10 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-14857:
--
Assignee: Xiao Li

> Table/Database Name Validation in SessionCatalog
> 
>
> Key: SPARK-14857
> URL: https://issues.apache.org/jira/browse/SPARK-14857
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>
> We need to validate the database/table names before storing this information 
> in `ExternalCatalog`. 
> For example, if users use backticks to quote table/database names 
> containing illegal characters, these names are allowed by the Spark parser, but 
> the Hive metastore does not allow them. We need to catch them in SessionCatalog 
> and issue an appropriate error message.
> ```
> CREATE TABLE `tab:1`  ...
> ```
> This PR enforces the name rules of Spark SQL for `table`/`database`/`view`: 
> they `can only contain alphanumeric and underscore characters.` Unlike 
> Hive, we allow names that start with an underscore. 
> The validation of function/column names will be done in a separate JIRA.
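The stated rule (alphanumeric and underscore characters only, with a leading underscore allowed) can be sketched with a regex — an illustration only, not the actual SessionCatalog code:

```python
import re

# Rule from the description: names may only contain alphanumeric and
# underscore characters; unlike Hive, a leading underscore is allowed.
_VALID_NAME = re.compile(r"^[A-Za-z0-9_]+$")

def is_valid_name(name: str) -> bool:
    return bool(_VALID_NAME.match(name))
```
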



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-14603) SessionCatalog needs to check if a metadata operation is valid

2016-05-10 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-14603.
---
   Resolution: Fixed
 Assignee: Xiao Li
Fix Version/s: 2.0.0

> SessionCatalog needs to check if a metadata operation is valid
> --
>
> Key: SPARK-14603
> URL: https://issues.apache.org/jira/browse/SPARK-14603
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Yin Huai
>Assignee: Xiao Li
>Priority: Critical
> Fix For: 2.0.0
>
>
> Since we cannot really trust the underlying external catalog to throw 
> exceptions when there is an invalid metadata operation, let's do it in 
> SessionCatalog. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-14684) Verification of partition specs in SessionCatalog

2016-05-10 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-14684:
--
Assignee: Xiao Li

> Verification of partition specs in SessionCatalog
> -
>
> Key: SPARK-14684
> URL: https://issues.apache.org/jira/browse/SPARK-14684
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>
> When users input an invalid partition spec, we might not be able to catch it 
> and issue an error message. Sometimes, this can cause disastrous results. 
> For example, previously, when we altered a table and dropped a partition with 
> an invalid spec, it could drop all the partitions due to a bug/defect in the 
> Hive Metastore API. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15037) Use SparkSession instead of SQLContext in testsuites

2016-05-10 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15037:
--
Component/s: SQL

> Use SparkSession instead of SQLContext in testsuites
> 
>
> Key: SPARK-15037
> URL: https://issues.apache.org/jira/browse/SPARK-15037
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Reporter: Dongjoon Hyun
>Assignee: Sandeep Singh
> Fix For: 2.0.0
>
>
> This issue aims to update the existing testsuites to use `SparkSession` 
> instead of `SQLContext` since `SQLContext` exists just for backward 
> compatibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15037) Use SparkSession instead of SQLContext in testsuites

2016-05-10 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15037:
--
Component/s: Tests

> Use SparkSession instead of SQLContext in testsuites
> 
>
> Key: SPARK-15037
> URL: https://issues.apache.org/jira/browse/SPARK-15037
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Reporter: Dongjoon Hyun
>Assignee: Sandeep Singh
> Fix For: 2.0.0
>
>
> This issue aims to update the existing testsuites to use `SparkSession` 
> instead of `SQLContext` since `SQLContext` exists just for backward 
> compatibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15037) Use SparkSession instead of SQLContext in testsuites

2016-05-10 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15037.
---
  Resolution: Fixed
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Use SparkSession instead of SQLContext in testsuites
> 
>
> Key: SPARK-15037
> URL: https://issues.apache.org/jira/browse/SPARK-15037
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Reporter: Dongjoon Hyun
>Assignee: Sandeep Singh
> Fix For: 2.0.0
>
>
> This issue aims to update the existing testsuites to use `SparkSession` 
> instead of `SQLContext` since `SQLContext` exists just for backward 
> compatibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15236) No way to disable Hive support in REPL

2016-05-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15236:
--
Component/s: Spark Shell

> No way to disable Hive support in REPL
> --
>
> Key: SPARK-15236
> URL: https://issues.apache.org/jira/browse/SPARK-15236
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell, SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>
> If you built Spark with Hive classes, there's no switch to flip to start a 
> new `spark-shell` using the InMemoryCatalog. The only thing you can do now is 
> to rebuild Spark again. That is quite inconvenient.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15236) No way to disable Hive support in REPL

2016-05-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15236:
--
Assignee: (was: Andrew Or)

> No way to disable Hive support in REPL
> --
>
> Key: SPARK-15236
> URL: https://issues.apache.org/jira/browse/SPARK-15236
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>
> If you built Spark with Hive classes, there's no switch to flip to start a 
> new `spark-shell` using the InMemoryCatalog. The only thing you can do now is 
> to rebuild Spark again. That is quite inconvenient.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15236) No way to disable Hive support in REPL

2016-05-09 Thread Andrew Or (JIRA)
Andrew Or created SPARK-15236:
-

 Summary: No way to disable Hive support in REPL
 Key: SPARK-15236
 URL: https://issues.apache.org/jira/browse/SPARK-15236
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Andrew Or
Assignee: Andrew Or


If you built Spark with Hive classes, there's no switch to flip to start a new 
`spark-shell` using the InMemoryCatalog. The only thing you can do now is to 
rebuild Spark again. That is quite inconvenient.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15234) spark.catalog.listDatabases.show() is not formatted correctly

2016-05-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or reassigned SPARK-15234:
-

Assignee: Andrew Or

> spark.catalog.listDatabases.show() is not formatted correctly
> -
>
> Key: SPARK-15234
> URL: https://issues.apache.org/jira/browse/SPARK-15234
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>
> {code}
> scala> spark.catalog.listDatabases.show()
> ++---+---+
> |name|description|locationUri|
> ++---+---+
> |Database[name='de...|
> |Database[name='my...|
> |Database[name='so...|
> ++---+---+
> {code}
> It's because org.apache.spark.sql.catalog.Database is not a case class!
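The root cause — a plain class carries no structured field metadata for display, while a case class does — can be illustrated with a Python analogy (an analogy only; the actual fix is in Scala):

```python
from dataclasses import dataclass

class PlainDatabase:                      # analogous to a non-case class:
    def __init__(self, name, description, location_uri):
        self.name = name                  # fields exist, but the default
        self.description = description    # repr exposes no per-field
        self.location_uri = location_uri  # structure for display

@dataclass
class StructuredDatabase:                 # analogous to a Scala case class:
    name: str                             # fields are declared, so generic
    description: str                      # tooling can lay them out column
    location_uri: str                     # by column
```
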



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15234) spark.catalog.listDatabases.show() is not formatted correctly

2016-05-09 Thread Andrew Or (JIRA)
Andrew Or created SPARK-15234:
-

 Summary: spark.catalog.listDatabases.show() is not formatted 
correctly
 Key: SPARK-15234
 URL: https://issues.apache.org/jira/browse/SPARK-15234
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Andrew Or


{code}
scala> spark.catalog.listDatabases.show()
++---+---+
|name|description|locationUri|
++---+---+
|Database[name='de...|
|Database[name='my...|
|Database[name='so...|
++---+---+
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15234) spark.catalog.listDatabases.show() is not formatted correctly

2016-05-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15234:
--
Description: 
{code}
scala> spark.catalog.listDatabases.show()
++---+---+
|name|description|locationUri|
++---+---+
|Database[name='de...|
|Database[name='my...|
|Database[name='so...|
++---+---+
{code}

It's because org.apache.spark.sql.catalog.Database is not a case class!

  was:
{code}
scala> spark.catalog.listDatabases.show()
++---+---+
|name|description|locationUri|
++---+---+
|Database[name='de...|
|Database[name='my...|
|Database[name='so...|
++---+---+
{code}


> spark.catalog.listDatabases.show() is not formatted correctly
> -
>
> Key: SPARK-15234
> URL: https://issues.apache.org/jira/browse/SPARK-15234
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>
> {code}
> scala> spark.catalog.listDatabases.show()
> ++---+---+
> |name|description|locationUri|
> ++---+---+
> |Database[name='de...|
> |Database[name='my...|
> |Database[name='so...|
> ++---+---+
> {code}
> It's because org.apache.spark.sql.catalog.Database is not a case class!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14021) Support custom context derived from HiveContext for SparkSQLEnv

2016-05-09 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15276820#comment-15276820
 ] 

Andrew Or commented on SPARK-14021:
---

Closing as Won't Fix because the issue is outdated after HiveContext was 
removed.

> Support custom context derived from HiveContext for SparkSQLEnv
> ---
>
> Key: SPARK-14021
> URL: https://issues.apache.org/jira/browse/SPARK-14021
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Adrian Wang
>
> This is to create a custom context for the commands bin/spark-sql and 
> sbin/start-thriftserver. Any context derived from HiveContext is 
> acceptable. Users need to configure the class name of the custom context in 
> the config spark.sql.context.class, and make sure the class is on the 
> classpath. This provides a more elegant way for infrastructure teams to apply 
> custom configurations and changes.






[jira] [Resolved] (SPARK-14021) Support custom context derived from HiveContext for SparkSQLEnv

2016-05-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-14021.
---
Resolution: Won't Fix

> Support custom context derived from HiveContext for SparkSQLEnv
> ---
>
> Key: SPARK-14021
> URL: https://issues.apache.org/jira/browse/SPARK-14021
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Adrian Wang
>
> This is to create a custom context for the commands bin/spark-sql and 
> sbin/start-thriftserver. Any context derived from HiveContext is 
> acceptable. Users need to configure the class name of the custom context in 
> the config spark.sql.context.class, and make sure the class is on the 
> classpath. This provides a more elegant way for infrastructure teams to apply 
> custom configurations and changes.






[jira] [Resolved] (SPARK-10653) Remove unnecessary things from SparkEnv

2016-05-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-10653.
---
  Resolution: Fixed
Assignee: Alex Bozarth
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Remove unnecessary things from SparkEnv
> ---
>
> Key: SPARK-10653
> URL: https://issues.apache.org/jira/browse/SPARK-10653
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.0
>Reporter: Andrew Or
>Assignee: Alex Bozarth
> Fix For: 2.0.0
>
>
> As of the writing of this message, there are at least two things that can be 
> removed from it:
> {code}
> @DeveloperApi
> class SparkEnv (
> val executorId: String,
> private[spark] val rpcEnv: RpcEnv,
> val serializer: Serializer,
> val closureSerializer: Serializer,
> val cacheManager: CacheManager,
> val mapOutputTracker: MapOutputTracker,
> val shuffleManager: ShuffleManager,
> val broadcastManager: BroadcastManager,
> val blockTransferService: BlockTransferService, // this one can go
> val blockManager: BlockManager,
> val securityManager: SecurityManager,
> val httpFileServer: HttpFileServer,
> val sparkFilesDir: String, // this one maybe? It's only used in 1 place.
> val metricsSystem: MetricsSystem,
> val shuffleMemoryManager: ShuffleMemoryManager,
> val executorMemoryManager: ExecutorMemoryManager, // this can go
> val outputCommitCoordinator: OutputCommitCoordinator,
> val conf: SparkConf) extends Logging {
>   ...
> }
> {code}
> We should avoid adding to this infinite list of things in SparkEnv's 
> constructors if they're not needed.






[jira] [Resolved] (SPARK-15210) Add missing @DeveloperApi annotation in sql.types

2016-05-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15210.
---
  Resolution: Fixed
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Add missing @DeveloperApi annotation in sql.types
> -
>
> Key: SPARK-15210
> URL: https://issues.apache.org/jira/browse/SPARK-15210
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: zhengruifeng
>Assignee: zhengruifeng
>Priority: Trivial
> Fix For: 2.0.0
>
>
> The @DeveloperApi annotations for {{AbstractDataType}}, {{MapType}} and 
> {{UserDefinedType}} are missing.






[jira] [Resolved] (SPARK-15166) Move hive-specific conf setting from SparkSession

2016-05-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15166.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> Move hive-specific conf setting from SparkSession
> -
>
> Key: SPARK-15166
> URL: https://issues.apache.org/jira/browse/SPARK-15166
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Minor
> Fix For: 2.0.0
>
>







[jira] [Updated] (SPARK-15210) Add missing @DeveloperApi annotation in sql.types

2016-05-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15210:
--
Assignee: zhengruifeng

> Add missing @DeveloperApi annotation in sql.types
> -
>
> Key: SPARK-15210
> URL: https://issues.apache.org/jira/browse/SPARK-15210
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: zhengruifeng
>Assignee: zhengruifeng
>Priority: Trivial
> Fix For: 2.0.0
>
>
> The @DeveloperApi annotations for {{AbstractDataType}}, {{MapType}} and 
> {{UserDefinedType}} are missing.






[jira] [Resolved] (SPARK-15220) Add hyperlink to "running application" and "completed application"

2016-05-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15220.
---
  Resolution: Fixed
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Add hyperlink to "running application" and "completed application"
> --
>
> Key: SPARK-15220
> URL: https://issues.apache.org/jira/browse/SPARK-15220
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Reporter: Mao, Wei
>Priority: Minor
> Fix For: 2.0.0
>
>
> Add hyperlinks to "running application" and "completed application", so users 
> can jump to the application table directly. In my environment, I set up 1000+ 
> workers and it's painful to scroll down past the worker list.






[jira] [Resolved] (SPARK-15067) YARN executors are launched with fixed perm gen size

2016-05-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15067.
---
  Resolution: Fixed
Assignee: Sean Owen
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> YARN executors are launched with fixed perm gen size
> 
>
> Key: SPARK-15067
> URL: https://issues.apache.org/jira/browse/SPARK-15067
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.6.0, 1.6.1
>Reporter: Renato Falchi Brandão
>Assignee: Sean Owen
>Priority: Minor
> Fix For: 2.0.0
>
>
> It is impossible to change the executors' max perm gen size using the property 
> "spark.executor.extraJavaOptions" when you are running on YARN.
> When the JVM option "-XX:MaxPermSize" is set through the property 
> "spark.executor.extraJavaOptions", Spark puts it properly in the shell command 
> that will start the JVM container but, at the end of the command, it sets 
> this option again using a fixed value of 256m, as you can see in the log I've 
> extracted:
> 2016-04-30 17:20:12 INFO  ExecutorRunnable:58 -
> ===
> YARN executor launch context:
>   env:
> CLASSPATH -> 
> {{PWD}}{{PWD}}/__spark__.jar$HADOOP_CONF_DIR/usr/hdp/current/hadoop-client/*/usr/hdp/current/hadoop-client/lib/*/usr/hdp/current/hadoop-hdfs-client/*/usr/hdp/current/hadoop-hdfs-client/lib/*/usr/hdp/current/hadoop-yarn-client/*/usr/hdp/current/hadoop-yarn-client/lib/*/usr/hdp/mr-framework/hadoop/share/hadoop/mapreduce/*:/usr/hdp/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:/usr/hdp/mr-framework/hadoop/share/hadoop/common/*:/usr/hdp/mr-framework/hadoop/share/hadoop/common/lib/*:/usr/hdp/mr-framework/hadoop/share/hadoop/yarn/*:/usr/hdp/mr-framework/hadoop/share/hadoop/yarn/lib/*:/usr/hdp/mr-framework/hadoop/share/hadoop/hdfs/*:/usr/hdp/mr-framework/hadoop/share/hadoop/hdfs/lib/*:/usr/hdp/current/hadoop/lib/hadoop-lzo-0.6.0.jar:/etc/hadoop/conf/secure
> SPARK_LOG_URL_STDERR -> 
> http://x0668sl.x.br:8042/node/containerlogs/container_1456962126505_329993_01_02/h_loadbd/stderr?start=-4096
> SPARK_YARN_STAGING_DIR -> .sparkStaging/application_1456962126505_329993
> SPARK_YARN_CACHE_FILES_FILE_SIZES -> 191719054,166
> SPARK_USER -> h_loadbd
> SPARK_YARN_CACHE_FILES_VISIBILITIES -> PUBLIC,PUBLIC
> SPARK_YARN_MODE -> true
> SPARK_YARN_CACHE_FILES_TIME_STAMPS -> 1459806496093,1459808508343
> SPARK_LOG_URL_STDOUT -> 
> http://x0668sl.x.br:8042/node/containerlogs/container_1456962126505_329993_01_02/h_loadbd/stdout?start=-4096
> SPARK_YARN_CACHE_FILES -> 
> hdfs://x/user/datalab/hdp/spark/lib/spark-assembly-1.6.0.2.3.4.1-10-hadoop2.7.1.2.3.4.1-10.jar#__spark__.jar,hdfs://tlvcluster/user/datalab/hdp/spark/conf/hive-site.xml#hive-site.xml
>   command:
> {{JAVA_HOME}}/bin/java -server -XX:OnOutOfMemoryError='kill %p' -Xms6144m 
> -Xmx6144m '-XX:+PrintGCDetails' '-XX:MaxPermSize=1024M' 
> '-XX:+PrintGCTimeStamps' -Djava.io.tmpdir={{PWD}}/tmp 
> '-Dspark.akka.timeout=30' '-Dspark.driver.port=62875' 
> '-Dspark.rpc.askTimeout=30' '-Dspark.rpc.lookupTimeout=30' 
> -Dspark.yarn.app.container.log.dir= -XX:MaxPermSize=256m 
> org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url 
> spark://CoarseGrainedScheduler@10.125.81.42:62875 --executor-id 1 --hostname 
> x0668sl.x.br --cores 1 --app-id application_1456962126505_329993 
> --user-class-path file:$PWD/__app__.jar 1> /stdout 2> 
> /stderr
> Analyzing the code, it is possible to see that all the options set in the 
> property "spark.executor.extraJavaOptions" are enclosed, one by one, in 
> single quotes (ExecutorRunnable.scala:151) before the launcher decides 
> whether a default value has to be provided for the option 
> "-XX:MaxPermSize" (ExecutorRunnable.scala:202).
> This decision is made by examining all the options set and looking for a 
> string starting with the value "-XX:MaxPermSize" 
> (CommandBuilderUtils.java:328). If that value is not found, the default value 
> is set.
> An option wrapped in single quotes no longer starts with that string, so it 
> will never be found and a default value will always be provided.
> A possible solution is to change the source code of CommandBuilderUtils.java 
> at line 328:
> From-> if (arg.startsWith("-XX:MaxPermSize="))
> To-> if (arg.indexOf("-XX:MaxPermSize=") > -1)
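The mismatch between the quoting step and the startsWith check can be reproduced in isolation. The literal below assumes the single-quote wrapping described in the report:

```scala
// After the launcher wraps each extraJavaOption in single quotes, the
// startsWith test at CommandBuilderUtils.java:328 can no longer see the
// user's -XX:MaxPermSize option, but an indexOf test still can.
val quotedArg = "'-XX:MaxPermSize=1024M'" // option as it looks after quoting

val foundByStartsWith = quotedArg.startsWith("-XX:MaxPermSize=")
val foundByIndexOf    = quotedArg.indexOf("-XX:MaxPermSize=") > -1

println(foundByStartsWith) // false -> launcher appends the fixed 256m default
println(foundByIndexOf)    // true  -> user's 1024M would be honored
```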






[jira] [Resolved] (SPARK-15225) Replace SQLContext with SparkSession in Encoder documentation

2016-05-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15225.
---
  Resolution: Fixed
Assignee: Liang-Chi Hsieh
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Replace SQLContext with SparkSession in Encoder documentation
> -
>
> Key: SPARK-15225
> URL: https://issues.apache.org/jira/browse/SPARK-15225
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Liang-Chi Hsieh
>Assignee: Liang-Chi Hsieh
>Priority: Minor
> Fix For: 2.0.0
>
>
> Encoder's doc mentions sqlContext.implicits._. We should use 
> sparkSession.implicits._ instead now.






[jira] [Updated] (SPARK-15223) spark.executor.logs.rolling.maxSize wrongly referred to as spark.executor.logs.rolling.size.maxBytes

2016-05-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15223:
--
Assignee: Philipp Hoffmann

> spark.executor.logs.rolling.maxSize wrongly referred to as 
> spark.executor.logs.rolling.size.maxBytes
> 
>
> Key: SPARK-15223
> URL: https://issues.apache.org/jira/browse/SPARK-15223
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 1.6.1
>Reporter: Philipp Hoffmann
>Assignee: Philipp Hoffmann
>Priority: Minor
> Fix For: 1.6.2, 2.0.0
>
>
> The configuration setting {{spark.executor.logs.rolling.size.maxBytes}} was 
> changed to {{spark.executor.logs.rolling.maxSize}} in 1.4 or so. There is 
> however still a reference in the documentation using the old name.






[jira] [Updated] (SPARK-15223) spark.executor.logs.rolling.maxSize wrongly referred to as spark.executor.logs.rolling.size.maxBytes

2016-05-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15223:
--
Target Version/s: 1.6.2, 2.0.0  (was: 2.0.0)

> spark.executor.logs.rolling.maxSize wrongly referred to as 
> spark.executor.logs.rolling.size.maxBytes
> 
>
> Key: SPARK-15223
> URL: https://issues.apache.org/jira/browse/SPARK-15223
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 1.6.1
>Reporter: Philipp Hoffmann
>Assignee: Philipp Hoffmann
>Priority: Minor
> Fix For: 1.6.2, 2.0.0
>
>
> The configuration setting {{spark.executor.logs.rolling.size.maxBytes}} was 
> changed to {{spark.executor.logs.rolling.maxSize}} in 1.4 or so. There is 
> however still a reference in the documentation using the old name.






[jira] [Updated] (SPARK-15223) spark.executor.logs.rolling.maxSize wrongly referred to as spark.executor.logs.rolling.size.maxBytes

2016-05-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15223:
--
Fix Version/s: 1.6.2

> spark.executor.logs.rolling.maxSize wrongly referred to as 
> spark.executor.logs.rolling.size.maxBytes
> 
>
> Key: SPARK-15223
> URL: https://issues.apache.org/jira/browse/SPARK-15223
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 1.6.1
>Reporter: Philipp Hoffmann
>Assignee: Philipp Hoffmann
>Priority: Minor
> Fix For: 1.6.2, 2.0.0
>
>
> The configuration setting {{spark.executor.logs.rolling.size.maxBytes}} was 
> changed to {{spark.executor.logs.rolling.maxSize}} in 1.4 or so. There is 
> however still a reference in the documentation using the old name.






[jira] [Resolved] (SPARK-15223) spark.executor.logs.rolling.maxSize wrongly referred to as spark.executor.logs.rolling.size.maxBytes

2016-05-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15223.
---
  Resolution: Fixed
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> spark.executor.logs.rolling.maxSize wrongly referred to as 
> spark.executor.logs.rolling.size.maxBytes
> 
>
> Key: SPARK-15223
> URL: https://issues.apache.org/jira/browse/SPARK-15223
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 1.6.1
>Reporter: Philipp Hoffmann
>Priority: Trivial
> Fix For: 2.0.0
>
>
> The configuration setting {{spark.executor.logs.rolling.size.maxBytes}} was 
> changed to {{spark.executor.logs.rolling.maxSize}} in 1.4 or so. There is 
> however still a reference in the documentation using the old name.






[jira] [Updated] (SPARK-15223) spark.executor.logs.rolling.maxSize wrongly referred to as spark.executor.logs.rolling.size.maxBytes

2016-05-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15223:
--
Priority: Minor  (was: Trivial)

> spark.executor.logs.rolling.maxSize wrongly referred to as 
> spark.executor.logs.rolling.size.maxBytes
> 
>
> Key: SPARK-15223
> URL: https://issues.apache.org/jira/browse/SPARK-15223
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 1.6.1
>Reporter: Philipp Hoffmann
>Priority: Minor
> Fix For: 2.0.0
>
>
> The configuration setting {{spark.executor.logs.rolling.size.maxBytes}} was 
> changed to {{spark.executor.logs.rolling.maxSize}} in 1.4 or so. There is 
> however still a reference in the documentation using the old name.






[jira] [Resolved] (SPARK-15093) create/delete/rename directory for InMemoryCatalog operations if needed

2016-05-09 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15093.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> create/delete/rename directory for InMemoryCatalog operations if needed
> ---
>
> Key: SPARK-15093
> URL: https://issues.apache.org/jira/browse/SPARK-15093
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
> Fix For: 2.0.0
>
>







[jira] [Resolved] (SPARK-13566) Deadlock between MemoryStore and BlockManager

2016-05-06 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-13566.
---
   Resolution: Fixed
Fix Version/s: 1.6.2

> Deadlock between MemoryStore and BlockManager
> -
>
> Key: SPARK-13566
> URL: https://issues.apache.org/jira/browse/SPARK-13566
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager, Spark Core
>Affects Versions: 1.6.0
> Environment: Spark 1.6.0 hadoop2.2.0 jdk1.8.0_65 centOs 6.2
>Reporter: cen yuhai
>Assignee: cen yuhai
> Fix For: 1.6.2
>
>
> ===
> "block-manager-slave-async-thread-pool-1":
> at org.apache.spark.storage.MemoryStore.remove(MemoryStore.scala:216)
> - waiting to lock <0x0005895b09b0> (a 
> org.apache.spark.memory.UnifiedMemoryManager)
> at 
> org.apache.spark.storage.BlockManager.removeBlock(BlockManager.scala:1114)
> - locked <0x00058ed6aae0> (a org.apache.spark.storage.BlockInfo)
> at 
> org.apache.spark.storage.BlockManager$$anonfun$removeBroadcast$2.apply(BlockManager.scala:1101)
> at 
> org.apache.spark.storage.BlockManager$$anonfun$removeBroadcast$2.apply(BlockManager.scala:1101)
> at scala.collection.immutable.Set$Set2.foreach(Set.scala:94)
> at 
> org.apache.spark.storage.BlockManager.removeBroadcast(BlockManager.scala:1101)
> at 
> org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply$mcI$sp(BlockManagerSlaveEndpoint.scala:65)
> at 
> org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply(BlockManagerSlaveEndpoint.scala:65)
> at 
> org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply(BlockManagerSlaveEndpoint.scala:65)
> at 
> org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$1.apply(BlockManagerSlaveEndpoint.scala:84)
> at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
> at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> "Executor task launch worker-10":
> at 
> org.apache.spark.storage.BlockManager.dropFromMemory(BlockManager.scala:1032)
> - waiting to lock <0x00059a0988b8> (a 
> org.apache.spark.storage.BlockInfo)
> at 
> org.apache.spark.storage.BlockManager.dropFromMemory(BlockManager.scala:1009)
> at 
> org.apache.spark.storage.MemoryStore$$anonfun$evictBlocksToFreeSpace$2.apply(MemoryStore.scala:460)
> at 
> org.apache.spark.storage.MemoryStore$$anonfun$evictBlocksToFreeSpace$2.apply(MemoryStore.scala:449)
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
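The two stack traces above acquire the same pair of monitors in opposite order, which is the classic deadlock condition. A small order-inversion check (lock names abstracted from the traces; not Spark code) makes the cycle explicit:

```scala
// Lock acquisition order per thread, abstracted from the stack traces:
// removeBroadcast locks the BlockInfo, then waits on the UnifiedMemoryManager;
// evictBlocksToFreeSpace holds the UnifiedMemoryManager, then waits on the BlockInfo.
val removeBroadcastOrder = Seq("BlockInfo", "UnifiedMemoryManager")
val evictBlocksOrder     = Seq("UnifiedMemoryManager", "BlockInfo")

// Every (earlier, later) pair of locks a thread can hold simultaneously.
def orderedPairs(order: Seq[String]): Set[(String, String)] =
  (for {
    i <- order.indices
    j <- (i + 1) until order.length
  } yield (order(i), order(j))).toSet

// Deadlock is possible when some pair is acquired in opposite orders.
def hasInversion(a: Seq[String], b: Seq[String]): Boolean =
  orderedPairs(a).exists { case (x, y) => orderedPairs(b).contains((y, x)) }

println(hasInversion(removeBroadcastOrder, evictBlocksOrder)) // true
```

Enforcing a single global acquisition order for the two monitors is the standard way to rule out this cycle.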






[jira] [Resolved] (SPARK-15167) Add public catalog implementation method to SparkSession

2016-05-06 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15167.
---
Resolution: Won't Fix

> Add public catalog implementation method to SparkSession
> 
>
> Key: SPARK-15167
> URL: https://issues.apache.org/jira/browse/SPARK-15167
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>
> Right now there's no way to check whether a given SparkSession has Hive 
> support. You can do `spark.conf.get("spark.sql.catalogImplementation")` but 
> that's supposed to be hidden from the user.
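The workaround mentioned can be sketched with a plain Map standing in for spark.conf (no SparkSession is constructed here); the conf key is the one named in the description, and the "in-memory" default is an assumption:

```scala
// "hive" vs "in-memory" is how the catalog implementation is encoded in the
// spark.sql.catalogImplementation conf; the helper below only illustrates
// the lookup, with a Map in place of the real RuntimeConfig.
def hasHiveSupport(conf: Map[String, String]): Boolean =
  conf.getOrElse("spark.sql.catalogImplementation", "in-memory") == "hive"

println(hasHiveSupport(Map("spark.sql.catalogImplementation" -> "hive"))) // true
println(hasHiveSupport(Map.empty))                                        // false
```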






[jira] [Updated] (SPARK-13566) Deadlock between MemoryStore and BlockManager

2016-05-05 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-13566:
--
Assignee: cen yuhai

> Deadlock between MemoryStore and BlockManager
> -
>
> Key: SPARK-13566
> URL: https://issues.apache.org/jira/browse/SPARK-13566
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager, Spark Core
>Affects Versions: 1.6.0
> Environment: Spark 1.6.0 hadoop2.2.0 jdk1.8.0_65 centOs 6.2
>Reporter: cen yuhai
>Assignee: cen yuhai
>
> ===
> "block-manager-slave-async-thread-pool-1":
> at org.apache.spark.storage.MemoryStore.remove(MemoryStore.scala:216)
> - waiting to lock <0x0005895b09b0> (a 
> org.apache.spark.memory.UnifiedMemoryManager)
> at 
> org.apache.spark.storage.BlockManager.removeBlock(BlockManager.scala:1114)
> - locked <0x00058ed6aae0> (a org.apache.spark.storage.BlockInfo)
> at 
> org.apache.spark.storage.BlockManager$$anonfun$removeBroadcast$2.apply(BlockManager.scala:1101)
> at 
> org.apache.spark.storage.BlockManager$$anonfun$removeBroadcast$2.apply(BlockManager.scala:1101)
> at scala.collection.immutable.Set$Set2.foreach(Set.scala:94)
> at 
> org.apache.spark.storage.BlockManager.removeBroadcast(BlockManager.scala:1101)
> at 
> org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply$mcI$sp(BlockManagerSlaveEndpoint.scala:65)
> at 
> org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply(BlockManagerSlaveEndpoint.scala:65)
> at 
> org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply(BlockManagerSlaveEndpoint.scala:65)
> at 
> org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$1.apply(BlockManagerSlaveEndpoint.scala:84)
> at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
> at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> "Executor task launch worker-10":
> at 
> org.apache.spark.storage.BlockManager.dropFromMemory(BlockManager.scala:1032)
> - waiting to lock <0x00059a0988b8> (a 
> org.apache.spark.storage.BlockInfo)
> at 
> org.apache.spark.storage.BlockManager.dropFromMemory(BlockManager.scala:1009)
> at 
> org.apache.spark.storage.MemoryStore$$anonfun$evictBlocksToFreeSpace$2.apply(MemoryStore.scala:460)
> at 
> org.apache.spark.storage.MemoryStore$$anonfun$evictBlocksToFreeSpace$2.apply(MemoryStore.scala:449)
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)






[jira] [Resolved] (SPARK-15152) Scaladoc and Code style Improvements

2016-05-05 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15152.
---
  Resolution: Fixed
Assignee: Jacek Laskowski
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Scaladoc and Code style Improvements
> 
>
> Key: SPARK-15152
> URL: https://issues.apache.org/jira/browse/SPARK-15152
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, ML, Spark Core, SQL, YARN
>Affects Versions: 2.0.0
>Reporter: Jacek Laskowski
>Assignee: Jacek Laskowski
>Priority: Minor
> Fix For: 2.0.0
>
>
> While doing code reviews for the Spark Notes I found many places with typos 
> and incorrect code style.






[jira] [Created] (SPARK-15167) Add public catalog implementation method to SparkSession

2016-05-05 Thread Andrew Or (JIRA)
Andrew Or created SPARK-15167:
-

 Summary: Add public catalog implementation method to SparkSession
 Key: SPARK-15167
 URL: https://issues.apache.org/jira/browse/SPARK-15167
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Andrew Or
Assignee: Andrew Or


Right now there's no way to check whether a given SparkSession has Hive 
support. You can do `spark.conf.get("spark.sql.catalogImplementation")` but 
that's supposed to be hidden from the user.






[jira] [Updated] (SPARK-15166) Move hive-specific conf setting from SparkSession

2016-05-05 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15166:
--
Summary: Move hive-specific conf setting from SparkSession  (was: Move 
hive-specific conf setting to HiveSharedState)

> Move hive-specific conf setting from SparkSession
> -
>
> Key: SPARK-15166
> URL: https://issues.apache.org/jira/browse/SPARK-15166
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Minor
>







[jira] [Created] (SPARK-15166) Move hive-specific conf setting to HiveSharedState

2016-05-05 Thread Andrew Or (JIRA)
Andrew Or created SPARK-15166:
-

 Summary: Move hive-specific conf setting to HiveSharedState
 Key: SPARK-15166
 URL: https://issues.apache.org/jira/browse/SPARK-15166
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Andrew Or
Assignee: Andrew Or
Priority: Minor









[jira] [Resolved] (SPARK-14893) Re-enable HiveSparkSubmitSuite SPARK-8489 test after HiveContext is removed

2016-05-05 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-14893.
---
   Resolution: Fixed
 Assignee: Dilip Biswal
Fix Version/s: 2.0.0

> Re-enable HiveSparkSubmitSuite SPARK-8489 test after HiveContext is removed
> ---
>
> Key: SPARK-14893
> URL: https://issues.apache.org/jira/browse/SPARK-14893
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Dilip Biswal
> Fix For: 2.0.0
>
>
> The test was disabled in https://github.com/apache/spark/pull/12585. To 
> re-enable it we need to rebuild the jar using the updated source code.






[jira] [Updated] (SPARK-15158) Too aggressive logging in SizeBasedRollingPolicy?

2016-05-05 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15158:
--
Assignee: Kai Wang

> Too aggressive logging in SizeBasedRollingPolicy?
> -
>
> Key: SPARK-15158
> URL: https://issues.apache.org/jira/browse/SPARK-15158
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.1
>Reporter: Kai Wang
>Assignee: Kai Wang
>Priority: Trivial
> Fix For: 2.0.0
>
>
> The questionable line is this: 
> https://github.com/apache/spark/blob/3e27940a19e7bab448f1af11d2065ecd1ec66197/core/src/main/scala/org/apache/spark/util/logging/RollingPolicy.scala#L116
> This will output a message *whenever* anything is logged at the executor 
> level, like the following:
> SizeBasedRollingPolicy:59 83 + 140796 > 1048576
> SizeBasedRollingPolicy:59 83 + 140879 > 1048576
> SizeBasedRollingPolicy:59 83 + 140962 > 1048576
> ...
> This seems too aggressive. Should this at least be downgraded to debug level?
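The questionable line sits in the policy's size check, which runs on every append, so an INFO-level message there floods the executor logs. A minimal sketch of the proposed fix, written here as a loose Python analogue of the Scala class (class and method names are illustrative, not Spark's actual API):

```python
import logging

logger = logging.getLogger("SizeBasedRollingPolicy")

class SizeBasedRollingPolicy:
    """Rolls a log file over once the written bytes would exceed a size cap.

    Illustrative analogue of Spark's Scala class: the size comparison runs
    on *every* append, so the message belongs at DEBUG, not INFO.
    """

    def __init__(self, max_size_bytes: int):
        self.max_size_bytes = max_size_bytes
        self.bytes_written = 0

    def should_rollover(self, next_chunk_size: int) -> bool:
        rollover = self.bytes_written + next_chunk_size > self.max_size_bytes
        # Downgraded to DEBUG: this fires for every logged chunk.
        logger.debug("%d + %d > %d", self.bytes_written, next_chunk_size,
                     self.max_size_bytes)
        return rollover

    def bytes_logged(self, size: int) -> None:
        self.bytes_written += size
```

With the guard at DEBUG, the per-append check still runs but is silent at the default log level.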






[jira] [Resolved] (SPARK-9926) Parallelize file listing for partitioned Hive table

2016-05-05 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-9926.
--
  Resolution: Fixed
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Parallelize file listing for partitioned Hive table
> ---
>
> Key: SPARK-9926
> URL: https://issues.apache.org/jira/browse/SPARK-9926
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.4.1, 1.5.0
>Reporter: Cheolsoo Park
>Assignee: Ryan Blue
> Fix For: 2.0.0
>
>
> In Spark SQL, short queries like {{select * from table limit 10}} run very 
> slowly against partitioned Hive tables because of file listing. In 
> particular, if a large number of partitions are scanned on storage like S3, 
> the queries run extremely slowly. Here are some example benchmarks in my 
> environment-
> * Parquet-backed Hive table
> * Partitioned by dateint and hour
> * Stored on S3
> ||\# of partitions||\# of files||runtime||query||
> |1|972|30 secs|select * from nccp_log where dateint=20150601 and hour=0 limit 10;|
> |24|13646|6 mins|select * from nccp_log where dateint=20150601 limit 10;|
> |240|136222|1 hour|select * from nccp_log where dateint>=20150601 and dateint<=20150610 limit 10;|
> The problem is that {{TableReader}} constructs a separate HadoopRDD per Hive 
> partition path and groups them into a UnionRDD. Then, all the input files are
> listed sequentially. In other tools such as Hive and Pig, this can be solved 
> by setting 
> [mapreduce.input.fileinputformat.list-status.num-threads|https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml]
>  high. But in Spark, since each HadoopRDD lists only one partition path, 
> setting this property doesn't help.
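The underlying idea of the fix is the same as Hadoop's num-threads setting: issue the (potentially slow, e.g. S3) listing calls concurrently instead of one partition directory at a time. A hedged sketch of that pattern, using only the Python standard library (the function name and signature are invented for illustration; Spark's actual fix lives in its Hive table-scan code):

```python
import os
from concurrent.futures import ThreadPoolExecutor

def list_partition_files(partition_dirs, num_threads=8):
    """List files under many partition directories in parallel.

    Mirrors the intent of
    mapreduce.input.fileinputformat.list-status.num-threads:
    overlap the per-partition listing I/O across a thread pool.
    """
    def list_one(d):
        return [os.path.join(d, f) for f in sorted(os.listdir(d))]

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # map preserves partition order while overlapping the I/O.
        results = pool.map(list_one, partition_dirs)
        return [path for files in results for path in files]
```

For listing calls dominated by network latency (as on S3), wall-clock time scales roughly with partitions divided by threads rather than with the partition count.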






[jira] [Resolved] (SPARK-15158) Too aggressive logging in SizeBasedRollingPolicy?

2016-05-05 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15158.
---
  Resolution: Fixed
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Too aggressive logging in SizeBasedRollingPolicy?
> -
>
> Key: SPARK-15158
> URL: https://issues.apache.org/jira/browse/SPARK-15158
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.1
>Reporter: Kai Wang
>Priority: Trivial
> Fix For: 2.0.0
>
>
> The questionable line is this: 
> https://github.com/apache/spark/blob/3e27940a19e7bab448f1af11d2065ecd1ec66197/core/src/main/scala/org/apache/spark/util/logging/RollingPolicy.scala#L116
> This will output a message *whenever* anything is logged at the executor 
> level, like the following:
> SizeBasedRollingPolicy:59 83 + 140796 > 1048576
> SizeBasedRollingPolicy:59 83 + 140879 > 1048576
> SizeBasedRollingPolicy:59 83 + 140962 > 1048576
> ...
> This seems too aggressive. Should this at least be downgraded to debug level?






[jira] [Resolved] (SPARK-15134) Indent SparkSession builder patterns and update binary_classification_metrics_example.py

2016-05-05 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15134.
---
  Resolution: Fixed
Assignee: Dongjoon Hyun
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Indent SparkSession builder patterns and update 
> binary_classification_metrics_example.py
> 
>
> Key: SPARK-15134
> URL: https://issues.apache.org/jira/browse/SPARK-15134
> Project: Spark
>  Issue Type: Task
>  Components: Examples
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 2.0.0
>
>
> This issue addresses the comments in SPARK-15031 and also fixes java-linter 
> errors.
> - Use multiline format in SparkSession builder patterns.
> - Update `binary_classification_metrics_example.py` to use `SparkSession`.
> - Fix Java Linter errors (in SPARK-13745, SPARK-15031, and so far)






[jira] [Resolved] (SPARK-15135) Make sure SparkSession thread safe

2016-05-05 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15135.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> Make sure SparkSession thread safe
> --
>
> Key: SPARK-15135
> URL: https://issues.apache.org/jira/browse/SPARK-15135
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
> Fix For: 2.0.0
>
>
> Fixed non-thread-safe classes used by SparkSession.





