[jira] [Commented] (SPARK-15579) SparkUI: Storage page is empty even if things are cached
[ https://issues.apache.org/jira/browse/SPARK-15579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302930#comment-15302930 ]

Andrew Or commented on SPARK-15579:
-----------------------------------

I tried this on 0f61d6efb45b9ee94fa663f67c4489fbdae2eded, which is literally the latest commit as of the writing of this message.

> SparkUI: Storage page is empty even if things are cached
> --------------------------------------------------------
>
> Key: SPARK-15579
> URL: https://issues.apache.org/jira/browse/SPARK-15579
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, Web UI
> Affects Versions: 2.0.0
> Reporter: Andrew Or
>
> scala> sc.parallelize(1 to 1, 5000).cache().count()
> SparkUI storage page is empty.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15579) SparkUI: Storage page is empty even if things are cached
[ https://issues.apache.org/jira/browse/SPARK-15579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-15579:
------------------------------
Description:
scala> sc.parallelize(1 to 1, 5000).cache().count()
SparkUI storage page is empty.
[jira] [Created] (SPARK-15579) SparkUI: Storage page is empty even if things are cached
Andrew Or created SPARK-15579:
---------------------------------

Summary: SparkUI: Storage page is empty even if things are cached
Key: SPARK-15579
URL: https://issues.apache.org/jira/browse/SPARK-15579
Project: Spark
Issue Type: Bug
Components: Spark Core, Web UI
Affects Versions: 2.0.0
Reporter: Andrew Or
[jira] [Resolved] (SPARK-15552) Remove unnecessary private[sql] methods in SparkSession
[ https://issues.apache.org/jira/browse/SPARK-15552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-15552.
-------------------------------
Resolution: Fixed
Fix Version/s: 2.0.0

> Remove unnecessary private[sql] methods in SparkSession
> -------------------------------------------------------
>
> Key: SPARK-15552
> URL: https://issues.apache.org/jira/browse/SPARK-15552
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Reynold Xin
> Assignee: Reynold Xin
> Fix For: 2.0.0
>
> SparkSession has a list of unnecessary private[sql] methods. These methods
> cause some trouble because private[sql] doesn't apply in Java. In the cases
> where they are easy to remove, we can simply remove them.
[jira] [Updated] (SPARK-15576) Add back hive tests blacklisted by SPARK-15539
[ https://issues.apache.org/jira/browse/SPARK-15576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-15576:
------------------------------
Assignee: (was: Andrew Or)

> Add back hive tests blacklisted by SPARK-15539
> ----------------------------------------------
>
> Key: SPARK-15576
> URL: https://issues.apache.org/jira/browse/SPARK-15576
> Project: Spark
> Issue Type: Bug
> Components: SQL, Tests
> Affects Versions: 2.0.0
> Reporter: Andrew Or
>
> These were removed from HiveCompatibilitySuite. They should be added back to
> HiveQuerySuite.
[jira] [Updated] (SPARK-15520) SparkSession builder in python should also allow overriding confs of existing sessions
[ https://issues.apache.org/jira/browse/SPARK-15520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-15520:
------------------------------
Assignee: Eric Liang

> SparkSession builder in python should also allow overriding confs of existing
> sessions
> ------------------------------------------------------------------------------
>
> Key: SPARK-15520
> URL: https://issues.apache.org/jira/browse/SPARK-15520
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Eric Liang
> Assignee: Eric Liang
> Fix For: 2.0.0
>
> This is a leftover TODO from the SparkSession cleanup in this PR:
> https://github.com/apache/spark/pull/13200
[jira] [Resolved] (SPARK-15520) SparkSession builder in python should also allow overriding confs of existing sessions
[ https://issues.apache.org/jira/browse/SPARK-15520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-15520.
-------------------------------
Resolution: Fixed
Fix Version/s: 2.0.0
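The semantics SPARK-15520 asks for can be modeled without Spark at all. The sketch below is plain Python, not the pyspark API: every class and method name here is a hypothetical stand-in that only illustrates the requested behavior, namely that `config()` values passed to a builder are applied even when a session already exists, instead of being silently dropped.

```python
# Simplified, illustrative model of "builder confs override an existing
# session" (SPARK-15520). Names are hypothetical stand-ins, NOT the real
# pyspark SparkSession.builder API.

class Session:
    _instance = None  # process-wide singleton, like the active session

    def __init__(self):
        self.conf = {}

class Builder:
    def __init__(self):
        self._options = {}

    def config(self, key, value):
        self._options[key] = value
        return self  # chainable, like the real builder pattern

    def get_or_create(self):
        if Session._instance is None:
            Session._instance = Session()
        # The key behavior: apply builder options to a PRE-EXISTING
        # session too, rather than ignoring them.
        Session._instance.conf.update(self._options)
        return Session._instance

first = Builder().config("spark.app.name", "a").get_or_create()
second = Builder().config("spark.executor.memory", "2g").get_or_create()
assert first is second                               # same underlying session
assert second.conf["spark.executor.memory"] == "2g"  # conf still applied
```

The same singleton is returned both times, but the second builder's options still land on it, which is the fix this ticket tracked.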
[jira] [Resolved] (SPARK-15539) DROP TABLE should throw exceptions, not logError
[ https://issues.apache.org/jira/browse/SPARK-15539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-15539.
-------------------------------
Resolution: Fixed
Fix Version/s: 2.0.0

> DROP TABLE should throw exceptions, not logError
> ------------------------------------------------
>
> Key: SPARK-15539
> URL: https://issues.apache.org/jira/browse/SPARK-15539
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Andrew Or
> Assignee: Andrew Or
> Priority: Minor
> Fix For: 2.0.0
>
> Same as SPARK-15534 but for DROP TABLE
[jira] [Created] (SPARK-15576) Add back hive tests blacklisted by SPARK-15539
Andrew Or created SPARK-15576:
---------------------------------

Summary: Add back hive tests blacklisted by SPARK-15539
Key: SPARK-15576
URL: https://issues.apache.org/jira/browse/SPARK-15576
Project: Spark
Issue Type: Bug
Components: SQL, Tests
Affects Versions: 2.0.0
Reporter: Andrew Or
Assignee: Andrew Or

These were removed from HiveCompatibilitySuite. They should be added back to HiveQuerySuite.
[jira] [Commented] (SPARK-15506) only one notebook can define a UDF; java.sql.SQLException: Another instance of Derby may have already booted the database
[ https://issues.apache.org/jira/browse/SPARK-15506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302359#comment-15302359 ]

Andrew Davidson commented on SPARK-15506:
-----------------------------------------

Hi Jeff

Here is how I start the notebook server. I believe Spark uses Jupyter: $SPARK_ROOT/bin/pyspark

Can you tell me where I can find out more about the configuration details?

I do not think the issue is multiple users. I discovered the bug while running two notebooks on my local machine, i.e. I was running both notebooks. It seems like each notebook server needs its own database?

Kind regards

Andy

p.s. Even in our data center I start the notebook server the same way. I am the only data scientist.

> only one notebook can define a UDF; java.sql.SQLException: Another instance
> of Derby may have already booted the database
> ---------------------------------------------------------------------------
>
> Key: SPARK-15506
> URL: https://issues.apache.org/jira/browse/SPARK-15506
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 1.6.1
> Environment: Mac OS X El Capitan
> Python 3.4.2
> Reporter: Andrew Davidson
>
> I am using a sqlContext to create dataframes. I noticed that if I open up and
> run 'notebook a' and 'a' defines a udf, then I will not be able to open a
> second notebook that also defines a udf unless I shut down notebook a first.
> In the second notebook I get a big long stack trace. The problem seems to be:
>
> Caused by: java.sql.SQLException: Another instance of Derby may have already
> booted the database
> /Users/andrewdavidson/workSpace/bigPWSWorkspace/dataScience/notebooks/gnip/metastore_db.
> at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
> at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source)
> at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
> at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
> ... 86 more
>
> Here is the complete stack trace:
>
> Kind regards
>
> Andy
>
> You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly
> ---------------------------------------------------------------------------
> Py4JJavaError Traceback (most recent call last)
> in ()
> 16 #fooUDF = udf(lambda arg : "aedwip")
> 17
> ---> 18 paddedStrUDF = udf(lambda zipInt : str(zipInt).zfill(5))
> /Users/andrewdavidson/workSpace/spark/spark-1.6.1-bin-hadoop2.6/python/pyspark/sql/functions.py in udf(f, returnType)
>    1595 [Row(slen=5), Row(slen=3)]
>    1596 """
> -> 1597 return UserDefinedFunction(f, returnType)
>    1598
>    1599 blacklist = ['map', 'since', 'ignore_unicode_prefix']
> /Users/andrewdavidson/workSpace/spark/spark-1.6.1-bin-hadoop2.6/python/pyspark/sql/functions.py in __init__(self, func, returnType, name)
>    1556 self.returnType = returnType
>    1557 self._broadcast = None
> -> 1558 self._judf = self._create_judf(name)
>    1559
>    1560 def _create_judf(self, name):
> /Users/andrewdavidson/workSpace/spark/spark-1.6.1-bin-hadoop2.6/python/pyspark/sql/functions.py in _create_judf(self, name)
>    1567 pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command, self)
>    1568 ctx = SQLContext.getOrCreate(sc)
> -> 1569 jdt = ctx._ssql_ctx.parseDataType(self.returnType.json())
>    1570 if name is None:
>    1571 name = f.__name__ if hasattr(f, '__name__') else f.__class__.__name__
> /Users/andrewdavidson/workSpace/spark/spark-1.6.1-bin-hadoop2.6/python/pyspark/sql/context.py in _ssql_ctx(self)
>     681 try:
>     682 if not hasattr(self, '_scala_HiveContext'):
> --> 683 self._scala_HiveContext = self._get_hive_ctx()
>     684 return self._scala_HiveContext
>     685 except Py4JError as e:
> /Users/andrewdavidson/workSpace/spark/spark-1.6.1-bin-hadoop2.6/python/pyspark/sql/context.py in _get_hive_ctx(self)
>     690
>     691 def _get_hive_ctx(self):
> --> 692 return self._jvm.HiveContext(self._jsc.sc())
>     693
>     694 def refreshTable(self, tableName):
> /Users/andrewdavidson/workSpace/spark/spark-1.6.1-bin-hadoop2.6/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
>    1062 answer = self._gateway_client.send_command(command)
>    1063 return_value = get_return_value(
> -> 1064 answer, self._gateway_client, None, self._fqn)
>    1065
>    1066 for temp_arg in temp_args:
> /Users/andrewdavidson/workSpace/spark/spark-1.6.1-bin-hadoop2.6/python/pyspark/sql/utils.py
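The "Another instance of Derby may have already booted the database" error in the report above comes from Derby's single-booter rule: the embedded metastore lives in `metastore_db` under the current working directory, and only one JVM may boot it at a time. The sketch below is a simplified pure-Python model of that collision, not Derby code; the `O_EXCL` create of a `db.lck` file stands in for Derby's actual locking protocol, which differs in detail.

```python
import os
import tempfile

# Simplified model: two notebook servers launched from the SAME working
# directory fight over one metastore_db; launched from DIFFERENT
# directories, each gets its own. "db.lck" mimics Derby's lock file, but
# the O_EXCL mechanism here is an illustrative stand-in only.

def boot_metastore(workdir):
    """Try to 'boot' the metastore under workdir; True on success."""
    db_dir = os.path.join(workdir, "metastore_db")
    os.makedirs(db_dir, exist_ok=True)
    lock = os.path.join(db_dir, "db.lck")
    try:
        fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        return True   # first booter wins
    except FileExistsError:
        return False  # "Another instance of Derby may have already booted"

shared = tempfile.mkdtemp()
assert boot_metastore(shared) is True    # notebook A boots fine
assert boot_metastore(shared) is False   # notebook B, same cwd: collision

other = tempfile.mkdtemp()
assert boot_metastore(other) is True     # notebook B from its own cwd works
```

In the thread's terms, the practical workaround this suggests is to start each notebook server from its own working directory, so each one gets a separate `metastore_db`.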
[jira] [Assigned] (SPARK-15536) Disallow TRUNCATE TABLE with external tables and views
[ https://issues.apache.org/jira/browse/SPARK-15536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or reassigned SPARK-15536:
---------------------------------
Assignee: Andrew Or (was: Suresh Thalamati)

> Disallow TRUNCATE TABLE with external tables and views
> ------------------------------------------------------
>
> Key: SPARK-15536
> URL: https://issues.apache.org/jira/browse/SPARK-15536
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Andrew Or
> Assignee: Andrew Or
>
> Otherwise we might accidentally delete existing data.
[jira] [Updated] (SPARK-15538) Truncate table does not work on data source table
[ https://issues.apache.org/jira/browse/SPARK-15538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-15538:
------------------------------
Affects Version/s: 2.0.0
Target Version/s: 2.0.0

> Truncate table does not work on data source table
> -------------------------------------------------
>
> Key: SPARK-15538
> URL: https://issues.apache.org/jira/browse/SPARK-15538
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Suresh Thalamati
> Assignee: Andrew Or
> Priority: Minor
>
> Truncate table does not seem to work on data source tables.
> Repro:
> {code}
> val df = Seq((1, "john", "CA"), (2, "Mike", "NY"), (3, "Robert", "CA")).toDF("id", "name", "state")
> df.write.format("parquet").partitionBy("state").saveAsTable("emp")
>
> scala> sql("truncate table emp")
> res8: org.apache.spark.sql.DataFrame = []
>
> scala> sql("select * from emp").show() // FileNotFoundException
> {code}
[jira] [Updated] (SPARK-15536) Disallow TRUNCATE TABLE with external tables and views
[ https://issues.apache.org/jira/browse/SPARK-15536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-15536:
------------------------------
Summary: Disallow TRUNCATE TABLE with external tables and views (was: Disallow TRUNCATE TABLE with external tables)
[jira] [Updated] (SPARK-15538) Truncate table does not work on data source table
[ https://issues.apache.org/jira/browse/SPARK-15538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-15538:
------------------------------
Description:
Truncate table does not seem to work on data source tables.
Repro:
{code}
val df = Seq((1, "john", "CA"), (2, "Mike", "NY"), (3, "Robert", "CA")).toDF("id", "name", "state")
df.write.format("parquet").partitionBy("state").saveAsTable("emp")

scala> sql("truncate table emp")
res8: org.apache.spark.sql.DataFrame = []

scala> sql("select * from emp").show() // FileNotFoundException
{code}

was:
Truncate table does not seems to work on data source table. It returns success without any error, but table is not truncated.
Repro:
{code}
val df = Seq((1, "john", "CA"), (2, "Mike", "NY"), (3, "Robert", "CA")).toDF("id", "name", "state")
df.write.format("parquet").partitionBy("state").saveAsTable("emp")

scala> sql("truncate table emp")
res8: org.apache.spark.sql.DataFrame = []

scala> sql("select * from emp").show() // FileNotFoundException
{code}
[jira] [Updated] (SPARK-15538) Truncate table does not work on data source table
[ https://issues.apache.org/jira/browse/SPARK-15538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-15538:
------------------------------
Description:
Truncate table does not seem to work on data source tables.
Repro:
{code}
val df = Seq((1, "john", "CA"), (2, "Mike", "NY"), (3, "Robert", "CA")).toDF("id", "name", "state")
df.write.format("parquet").partitionBy("state").saveAsTable("emp")

scala> sql("truncate table emp")
res8: org.apache.spark.sql.DataFrame = []

scala> sql("select * from emp").show() // FileNotFoundException
{code}

was:
Truncate table does not seem to work on data source tables. It returns success without any error, but the table is not truncated.
Repro:
{code}
val df = Seq((1, "john", "CA"), (2, "Mike", "NY"), (3, "Robert", "CA")).toDF("id", "name", "state")
df.write.format("parquet").partitionBy("state").saveAsTable("emp")

scala> sql("truncate table emp")
res8: org.apache.spark.sql.DataFrame = []

scala> sql("select * from emp").show
+---+------+-----+
| id|  name|state|
+---+------+-----+
|  3|Robert|   CA|
|  1|  john|   CA|
|  2|  Mike|   NY|
+---+------+-----+
{code}
The select should have returned no results.
By scanning through the code I found that some of the other DDL commands, like LOAD DATA and SHOW PARTITIONS, are not allowed for data source tables and raise an error. It might be good to throw an error until truncate table works with data source tables as well.
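For readers unfamiliar with `partitionBy`, the repro's `saveAsTable` call lays the rows out on disk grouped by the `state` column, one Hive-style `column=value` directory per distinct value. The sketch below is plain Python, not Spark: it only models which partition directory each row lands in, which is the on-disk state that TRUNCATE TABLE is expected to clear.

```python
# Plain-Python model (not Spark) of how partitionBy("state") groups the
# repro's rows into partition directories. Directory names follow the
# Hive-style "column=value" convention used by Spark data source tables.

rows = [(1, "john", "CA"), (2, "Mike", "NY"), (3, "Robert", "CA")]

layout = {}
for rid, name, state in rows:
    # Each distinct partition value becomes its own directory under the table.
    layout.setdefault(f"state={state}", []).append((rid, name))

assert layout == {
    "state=CA": [(1, "john"), (3, "Robert")],
    "state=NY": [(2, "Mike")],
}
```

TRUNCATE TABLE emp should remove the data files under every one of these partition directories; the bug report says that for data source tables it returned success without actually doing so.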
[jira] [Updated] (SPARK-15538) Truncate table does not work on data source table
[ https://issues.apache.org/jira/browse/SPARK-15538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-15538:
------------------------------
Summary: Truncate table does not work on data source table (was: Truncate table does not work on data source table , and does not raise error either.)
[jira] [Assigned] (SPARK-15538) Truncate table does not work on data source table
[ https://issues.apache.org/jira/browse/SPARK-15538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or reassigned SPARK-15538:
---------------------------------
Assignee: Andrew Or (was: Suresh Thalamati)
[jira] [Created] (SPARK-15539) DROP TABLE should throw exceptions, not logError
Andrew Or created SPARK-15539:
---------------------------------

Summary: DROP TABLE should throw exceptions, not logError
Key: SPARK-15539
URL: https://issues.apache.org/jira/browse/SPARK-15539
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.0.0
Reporter: Andrew Or
Assignee: Andrew Or
Priority: Minor

Same as SPARK-15534 but for DROP TABLE
[jira] [Updated] (SPARK-15538) Truncate table does not work on data source table , and does not raise error either.
[ https://issues.apache.org/jira/browse/SPARK-15538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-15538:
------------------------------
Assignee: Suresh Thalamati
[jira] [Updated] (SPARK-15535) Remove code for TRUNCATE TABLE ... COLUMN
[ https://issues.apache.org/jira/browse/SPARK-15535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-15535:
------------------------------
Priority: Minor (was: Major)

> Remove code for TRUNCATE TABLE ... COLUMN
> -----------------------------------------
>
> Key: SPARK-15535
> URL: https://issues.apache.org/jira/browse/SPARK-15535
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Andrew Or
> Assignee: Andrew Or
> Priority: Minor
>
> This was never supported in the first place. Also Hive doesn't support it:
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
[jira] [Updated] (SPARK-15534) TRUNCATE TABLE should throw exceptions, not logError
[ https://issues.apache.org/jira/browse/SPARK-15534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-15534:
------------------------------
Priority: Minor (was: Major)

> TRUNCATE TABLE should throw exceptions, not logError
> ----------------------------------------------------
>
> Key: SPARK-15534
> URL: https://issues.apache.org/jira/browse/SPARK-15534
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Andrew Or
> Assignee: Andrew Or
> Priority: Minor
>
> If the table to truncate doesn't exist, throw an exception!
[jira] [Updated] (SPARK-15536) Disallow TRUNCATE TABLE with external tables
[ https://issues.apache.org/jira/browse/SPARK-15536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-15536:
------------------------------
Assignee: Suresh Thalamati (was: Andrew Or)
[jira] [Created] (SPARK-15536) Disallow TRUNCATE TABLE with external tables
Andrew Or created SPARK-15536:
---------------------------------

Summary: Disallow TRUNCATE TABLE with external tables
Key: SPARK-15536
URL: https://issues.apache.org/jira/browse/SPARK-15536
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.0.0
Reporter: Andrew Or
Assignee: Andrew Or

Otherwise we might accidentally delete existing data.
[jira] [Created] (SPARK-15535) Remove code for TRUNCATE TABLE ... COLUMN
Andrew Or created SPARK-15535:
---------------------------------

Summary: Remove code for TRUNCATE TABLE ... COLUMN
Key: SPARK-15535
URL: https://issues.apache.org/jira/browse/SPARK-15535
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.0.0
Reporter: Andrew Or
Assignee: Andrew Or

This was never supported in the first place. Also Hive doesn't support it: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
[jira] [Created] (SPARK-15534) TRUNCATE TABLE should throw exceptions, not logError
Andrew Or created SPARK-15534:
---------------------------------

Summary: TRUNCATE TABLE should throw exceptions, not logError
Key: SPARK-15534
URL: https://issues.apache.org/jira/browse/SPARK-15534
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.0.0
Reporter: Andrew Or
Assignee: Andrew Or

If the table to truncate doesn't exist, throw an exception!
[jira] [Resolved] (SPARK-15345) SparkSession's conf doesn't take effect when there's already an existing SparkContext
[ https://issues.apache.org/jira/browse/SPARK-15345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-15345.
-------------------------------
Resolution: Fixed
Assignee: Jeff Zhang (was: Reynold Xin)

> SparkSession's conf doesn't take effect when there's already an existing
> SparkContext
> -------------------------------------------------------------------------
>
> Key: SPARK-15345
> URL: https://issues.apache.org/jira/browse/SPARK-15345
> Project: Spark
> Issue Type: Bug
> Components: PySpark, SQL
> Reporter: Piotr Milanowski
> Assignee: Jeff Zhang
> Priority: Blocker
> Fix For: 2.0.0
>
> I am working with branch-2.0; spark is compiled with hive support (-Phive and
> -Phive-thriftserver).
> I am trying to access databases using this snippet:
> {code}
> from pyspark.sql import HiveContext
> hc = HiveContext(sc)
> hc.sql("show databases").collect()
> [Row(result='default')]
> {code}
> This means that spark doesn't find any databases specified in configuration.
> Using the same configuration (i.e. hive-site.xml and core-site.xml) in spark
> 1.6, and launching the above snippet, I can print out existing databases.
> When run in DEBUG mode this is what spark (2.0) prints out:
> {code}
> 16/05/16 12:17:47 INFO SparkSqlParser: Parsing command: show databases
> 16/05/16 12:17:47 DEBUG SimpleAnalyzer:
> === Result of Batch Resolution ===
> !'Project [unresolveddeserializer(createexternalrow(if (isnull(input[0, string])) null else input[0, string].toString, StructField(result,StringType,false)), result#2) AS #3]   Project [createexternalrow(if (isnull(result#2)) null else result#2.toString, StructField(result,StringType,false)) AS #3]
> +- LocalRelation [result#2]   +- LocalRelation [result#2]
>
> 16/05/16 12:17:47 DEBUG ClosureCleaner: +++ Cleaning closure (org.apache.spark.sql.Dataset$$anonfun$53) +++
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + declared fields: 2
> 16/05/16 12:17:47 DEBUG ClosureCleaner: public static final long org.apache.spark.sql.Dataset$$anonfun$53.serialVersionUID
> 16/05/16 12:17:47 DEBUG ClosureCleaner: private final org.apache.spark.sql.types.StructType org.apache.spark.sql.Dataset$$anonfun$53.structType$1
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + declared methods: 2
> 16/05/16 12:17:47 DEBUG ClosureCleaner: public final java.lang.Object org.apache.spark.sql.Dataset$$anonfun$53.apply(java.lang.Object)
> 16/05/16 12:17:47 DEBUG ClosureCleaner: public final java.lang.Object org.apache.spark.sql.Dataset$$anonfun$53.apply(org.apache.spark.sql.catalyst.InternalRow)
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + inner classes: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + outer classes: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + outer objects: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + populating accessed fields because this is the starting closure
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + fields accessed by starting closure: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + there are no enclosing objects!
> 16/05/16 12:17:47 DEBUG ClosureCleaner: +++ closure (org.apache.spark.sql.Dataset$$anonfun$53) is now cleaned +++
> 16/05/16 12:17:47 DEBUG ClosureCleaner: +++ Cleaning closure (org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$javaToPython$1) +++
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + declared fields: 1
> 16/05/16 12:17:47 DEBUG ClosureCleaner: public static final long org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$javaToPython$1.serialVersionUID
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + declared methods: 2
> 16/05/16 12:17:47 DEBUG ClosureCleaner: public final java.lang.Object org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$javaToPython$1.apply(java.lang.Object)
> 16/05/16 12:17:47 DEBUG ClosureCleaner: public final org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$javaToPython$1.apply(scala.collection.Iterator)
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + inner classes: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + outer classes: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + outer objects: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + populating accessed fields because this is the starting closure
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + fields accessed by starting closure: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + there are no enclosing objects!
> 16/05/16 12:17:47 DEBUG ClosureCleaner: +++ closure (org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$javaToPython$1) is now cleaned +++
[jira] [Updated] (SPARK-15345) SparkSession's conf doesn't take effect when there's already an existing SparkContext
[ https://issues.apache.org/jira/browse/SPARK-15345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-15345: -- Assignee: Reynold Xin (was: Jeff Zhang) > SparkSession's conf doesn't take effect when there's already an existing > SparkContext > - > > Key: SPARK-15345 > URL: https://issues.apache.org/jira/browse/SPARK-15345 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Reporter: Piotr Milanowski >Assignee: Reynold Xin >Priority: Blocker > Fix For: 2.0.0 > > > I am working with branch-2.0; Spark is compiled with Hive support (-Phive and > -Phive-thriftserver). > I am trying to access databases using this snippet: > {code} > from pyspark.sql import HiveContext > hc = HiveContext(sc) > hc.sql("show databases").collect() > [Row(result='default')] > {code} > This means that Spark doesn't find any databases specified in the configuration. > Using the same configuration (i.e. hive-site.xml and core-site.xml) in Spark > 1.6 and launching the above snippet, I can print out the existing databases.
> When run in DEBUG mode this is what spark (2.0) prints out: > {code} > 16/05/16 12:17:47 INFO SparkSqlParser: Parsing command: show databases > 16/05/16 12:17:47 DEBUG SimpleAnalyzer: > === Result of Batch Resolution === > !'Project [unresolveddeserializer(createexternalrow(if (isnull(input[0, > string])) null else input[0, string].toString, > StructField(result,StringType,false)), result#2) AS #3] Project > [createexternalrow(if (isnull(result#2)) null else result#2.toString, > StructField(result,StringType,false)) AS #3] > +- LocalRelation [result#2] > > +- LocalRelation [result#2] > > 16/05/16 12:17:47 DEBUG ClosureCleaner: +++ Cleaning closure > (org.apache.spark.sql.Dataset$$anonfun$53) +++ > 16/05/16 12:17:47 DEBUG ClosureCleaner: + declared fields: 2 > 16/05/16 12:17:47 DEBUG ClosureCleaner: public static final long > org.apache.spark.sql.Dataset$$anonfun$53.serialVersionUID > 16/05/16 12:17:47 DEBUG ClosureCleaner: private final > org.apache.spark.sql.types.StructType > org.apache.spark.sql.Dataset$$anonfun$53.structType$1 > 16/05/16 12:17:47 DEBUG ClosureCleaner: + declared methods: 2 > 16/05/16 12:17:47 DEBUG ClosureCleaner: public final java.lang.Object > org.apache.spark.sql.Dataset$$anonfun$53.apply(java.lang.Object) > 16/05/16 12:17:47 DEBUG ClosureCleaner: public final java.lang.Object > org.apache.spark.sql.Dataset$$anonfun$53.apply(org.apache.spark.sql.catalyst.InternalRow) > 16/05/16 12:17:47 DEBUG ClosureCleaner: + inner classes: 0 > 16/05/16 12:17:47 DEBUG ClosureCleaner: + outer classes: 0 > 16/05/16 12:17:47 DEBUG ClosureCleaner: + outer objects: 0 > 16/05/16 12:17:47 DEBUG ClosureCleaner: + populating accessed fields because > this is the starting closure > 16/05/16 12:17:47 DEBUG ClosureCleaner: + fields accessed by starting > closure: 0 > 16/05/16 12:17:47 DEBUG ClosureCleaner: + there are no enclosing objects! 
> 16/05/16 12:17:47 DEBUG ClosureCleaner: +++ closure > (org.apache.spark.sql.Dataset$$anonfun$53) is now cleaned +++ > 16/05/16 12:17:47 DEBUG ClosureCleaner: +++ Cleaning closure > (org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$javaToPython$1) > +++ > 16/05/16 12:17:47 DEBUG ClosureCleaner: + declared fields: 1 > 16/05/16 12:17:47 DEBUG ClosureCleaner: public static final long > org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$javaToPython$1.serialVersionUID > 16/05/16 12:17:47 DEBUG ClosureCleaner: + declared methods: 2 > 16/05/16 12:17:47 DEBUG ClosureCleaner: public final java.lang.Object > org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$javaToPython$1.apply(java.lang.Object) > 16/05/16 12:17:47 DEBUG ClosureCleaner: public final > org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler > org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$javaToPython$1.apply(scala.collection.Iterator) > 16/05/16 12:17:47 DEBUG ClosureCleaner: + inner classes: 0 > 16/05/16 12:17:47 DEBUG ClosureCleaner: + outer classes: 0 > 16/05/16 12:17:47 DEBUG ClosureCleaner: + outer objects: 0 > 16/05/16 12:17:47 DEBUG ClosureCleaner: + populating accessed fields because > this is the starting closure > 16/05/16 12:17:47 DEBUG ClosureCleaner: + fields accessed by starting > closure: 0 > 16/05/16 12:17:47 DEBUG ClosureCleaner: + there are no enclosing objects! > 16/05/16 12:17:47 DEBUG ClosureCleaner: +++ closure > (org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$javaToPython$1) > is now cleaned +++ > 16/05/16 12:17:47 DEBUG
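For anyone trying to pin down the conf-propagation issue above without a full Hive setup, the failure mode is the classic getOrCreate singleton pattern: once a context exists, a later builder's settings are silently dropped. Below is a toy model of that pattern (an illustration only, not Spark's actual implementation; the class name and conf key are made up for the example):

```python
# Toy model of the getOrCreate pattern behind SPARK-15345: once a context
# exists, a second caller's conf is silently ignored.
# Illustration only; this is NOT Spark's actual implementation.

class ToyContext:
    _active = None  # module-level singleton, like an active SparkContext

    def __init__(self, conf):
        self.conf = dict(conf)

    @classmethod
    def get_or_create(cls, conf):
        # If a context is already running, its existing conf wins and the
        # caller's conf is dropped -- the behavior reported in this issue.
        if cls._active is None:
            cls._active = cls(conf)
        return cls._active

first = ToyContext.get_or_create({"hive.metastore.uris": "thrift://metastore:9083"})
second = ToyContext.get_or_create({"hive.metastore.uris": "thrift://other:9083"})
print(second.conf["hive.metastore.uris"])  # still the first value
```

This is consistent with the report: the HiveContext created in the shell picks up whatever conf the pre-existing SparkContext was started with, not the one passed later.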
[jira] [Resolved] (SPARK-15511) Dropping data source table succeeds but throws exception
[ https://issues.apache.org/jira/browse/SPARK-15511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15511. --- Resolution: Not A Problem Assignee: Andrew Or If you run into this issue again, just delete $SPARK_HOME/metastore_db > Dropping data source table succeeds but throws exception > > > Key: SPARK-15511 > URL: https://issues.apache.org/jira/browse/SPARK-15511 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Andrew Or >Assignee: Andrew Or > > If the catalog is backed by Hive: > {code} > scala> sql("CREATE TABLE boxes (width INT, length INT, height INT) USING CSV") > {code} > {code} > scala> sql("DROP TABLE boxes") > 16/05/24 13:30:50 WARN DropTableCommand: > org.apache.spark.sql.AnalysisException: Path does not exist: > file:/user/hive/warehouse/boxes; > com.google.common.util.concurrent.UncheckedExecutionException: > org.apache.spark.sql.AnalysisException: Path does not exist: > file:/user/hive/warehouse/boxes; > at > com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4882) > at > com.google.common.cache.LocalCache$LocalLoadingCache.apply(LocalCache.java:4898) > at > org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:170) > ... 
> Caused by: org.apache.spark.sql.AnalysisException: Path does not exist: > file:/user/hive/warehouse/boxes; > at > org.apache.spark.sql.execution.datasources.DataSource$$anonfun$12.apply(DataSource.scala:317) > at > org.apache.spark.sql.execution.datasources.DataSource$$anonfun$12.apply(DataSource.scala:306) > at > scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241) > at > scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241) > at scala.collection.immutable.List.foreach(List.scala:381) > at > scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241) > at scala.collection.immutable.List.flatMap(List.scala:344) > at > org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:306) > at > org.apache.spark.sql.hive.HiveMetastoreCatalog$$anon$1.load(HiveMetastoreCatalog.scala:133) > at > org.apache.spark.sql.hive.HiveMetastoreCatalog$$anon$1.load(HiveMetastoreCatalog.scala:69) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
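The resolution comment above suggests deleting $SPARK_HOME/metastore_db. A minimal sketch of that cleanup, demonstrated here against a throwaway temp directory (the helper name is ours; note that Derby may also create metastore_db under the current working directory rather than $SPARK_HOME, so check both before using this for real):

```python
# Sketch of the suggested workaround: remove the stale local Derby
# metastore directory. Demonstrated on a temporary directory so nothing
# real is deleted; point `spark_home` at the actual location first.
import os
import shutil
import tempfile

def remove_stale_metastore(spark_home: str) -> bool:
    """Delete <spark_home>/metastore_db if present; return True if removed."""
    path = os.path.join(spark_home, "metastore_db")
    if os.path.isdir(path):
        shutil.rmtree(path)
        return True
    return False

# Demonstration against a throwaway directory:
spark_home = tempfile.mkdtemp()
os.mkdir(os.path.join(spark_home, "metastore_db"))
print(remove_stale_metastore(spark_home))  # True: directory was removed
print(remove_stale_metastore(spark_home))  # False: nothing left to remove
```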
[jira] [Updated] (SPARK-15511) Dropping data source table succeeds but throws exception
[ https://issues.apache.org/jira/browse/SPARK-15511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-15511: -- Description: If the catalog is backed by Hive: {code} scala> sql("CREATE TABLE boxes (width INT, length INT, height INT) USING CSV") {code} {code} scala> sql("DROP TABLE boxes") 16/05/24 13:30:50 WARN DropTableCommand: org.apache.spark.sql.AnalysisException: Path does not exist: file:/user/hive/warehouse/boxes; com.google.common.util.concurrent.UncheckedExecutionException: org.apache.spark.sql.AnalysisException: Path does not exist: file:/user/hive/warehouse/boxes; at com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4882) at com.google.common.cache.LocalCache$LocalLoadingCache.apply(LocalCache.java:4898) at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:170) ... Caused by: org.apache.spark.sql.AnalysisException: Path does not exist: file:/user/hive/warehouse/boxes; at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$12.apply(DataSource.scala:317) at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$12.apply(DataSource.scala:306) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241) at scala.collection.immutable.List.foreach(List.scala:381) at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241) at scala.collection.immutable.List.flatMap(List.scala:344) at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:306) at org.apache.spark.sql.hive.HiveMetastoreCatalog$$anon$1.load(HiveMetastoreCatalog.scala:133) at org.apache.spark.sql.hive.HiveMetastoreCatalog$$anon$1.load(HiveMetastoreCatalog.scala:69) {code} was: {code} scala> sql("CREATE TABLE boxes (width INT, length INT, height INT) USING CSV") {code} {code} scala> sql("DROP TABLE boxes") 
16/05/24 13:30:50 WARN DropTableCommand: org.apache.spark.sql.AnalysisException: Path does not exist: file:/user/hive/warehouse/boxes; com.google.common.util.concurrent.UncheckedExecutionException: org.apache.spark.sql.AnalysisException: Path does not exist: file:/user/hive/warehouse/boxes; at com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4882) at com.google.common.cache.LocalCache$LocalLoadingCache.apply(LocalCache.java:4898) at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:170) ... Caused by: org.apache.spark.sql.AnalysisException: Path does not exist: file:/user/hive/warehouse/boxes; at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$12.apply(DataSource.scala:317) at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$12.apply(DataSource.scala:306) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241) at scala.collection.immutable.List.foreach(List.scala:381) at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241) at scala.collection.immutable.List.flatMap(List.scala:344) at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:306) at org.apache.spark.sql.hive.HiveMetastoreCatalog$$anon$1.load(HiveMetastoreCatalog.scala:133) at org.apache.spark.sql.hive.HiveMetastoreCatalog$$anon$1.load(HiveMetastoreCatalog.scala:69) {code} > Dropping data source table succeeds but throws exception > > > Key: SPARK-15511 > URL: https://issues.apache.org/jira/browse/SPARK-15511 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Andrew Or > > If the catalog is backed by Hive: > {code} > scala> sql("CREATE TABLE boxes (width INT, length INT, height INT) USING CSV") > {code} > {code} > scala> sql("DROP TABLE boxes") > 16/05/24 13:30:50 WARN DropTableCommand: > 
org.apache.spark.sql.AnalysisException: Path does not exist: > file:/user/hive/warehouse/boxes; > com.google.common.util.concurrent.UncheckedExecutionException: > org.apache.spark.sql.AnalysisException: Path does not exist: > file:/user/hive/warehouse/boxes; > at > com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4882) > at > com.google.common.cache.LocalCache$LocalLoadingCache.apply(LocalCache.java:4898) > at > org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:170) >
[jira] [Created] (SPARK-15511) Dropping data source table succeeds but throws exception
Andrew Or created SPARK-15511: - Summary: Dropping data source table succeeds but throws exception Key: SPARK-15511 URL: https://issues.apache.org/jira/browse/SPARK-15511 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: Andrew Or {code} scala> sql("CREATE TABLE boxes (width INT, length INT, height INT) USING CSV") {code} {code} scala> sql("DROP TABLE boxes") 16/05/24 13:30:50 WARN DropTableCommand: org.apache.spark.sql.AnalysisException: Path does not exist: file:/user/hive/warehouse/boxes; com.google.common.util.concurrent.UncheckedExecutionException: org.apache.spark.sql.AnalysisException: Path does not exist: file:/user/hive/warehouse/boxes; at com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4882) at com.google.common.cache.LocalCache$LocalLoadingCache.apply(LocalCache.java:4898) at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:170) ... Caused by: org.apache.spark.sql.AnalysisException: Path does not exist: file:/user/hive/warehouse/boxes; at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$12.apply(DataSource.scala:317) at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$12.apply(DataSource.scala:306) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241) at scala.collection.immutable.List.foreach(List.scala:381) at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241) at scala.collection.immutable.List.flatMap(List.scala:344) at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:306) at org.apache.spark.sql.hive.HiveMetastoreCatalog$$anon$1.load(HiveMetastoreCatalog.scala:133) at org.apache.spark.sql.hive.HiveMetastoreCatalog$$anon$1.load(HiveMetastoreCatalog.scala:69) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, 
e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-15388) spark sql "CREATE FUNCTION" throws exception with hive 1.2.1
[ https://issues.apache.org/jira/browse/SPARK-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15388. --- Resolution: Fixed Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > spark sql "CREATE FUNCTION" throws exception with hive 1.2.1 > > > Key: SPARK-15388 > URL: https://issues.apache.org/jira/browse/SPARK-15388 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Yang Wang >Assignee: Yang Wang > Fix For: 2.0.0 > > > spark.sql("CREATE FUNCTION MY_FUNCTION_1 AS > 'com.haizhi.bdp.udf.UDFGetGeoCode'") throws > org.apache.spark.sql.AnalysisException. > I was using hive whose version is 1.2.1 > Full stack trace is as follows: > Exception in thread "main" org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: > MetaException(message:NoSuchObjectException(message:Function > bdp.GET_GEO_CODE does not exist)); > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:71) > at > org.apache.spark.sql.hive.HiveExternalCatalog.functionExists(HiveExternalCatalog.scala:323) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.functionExists(SessionCatalog.scala:712) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.createFunction(SessionCatalog.scala:663) > at > org.apache.spark.sql.execution.command.CreateFunction.run(functions.scala:68) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:57) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:55) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:69) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115) > at > 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:85) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:85) > at org.apache.spark.sql.Dataset.(Dataset.scala:187) > at org.apache.spark.sql.Dataset.(Dataset.scala:168) > at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:63) > at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:541) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15388) spark sql "CREATE FUNCTION" throws exception with hive 1.2.1
[ https://issues.apache.org/jira/browse/SPARK-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-15388: -- Assignee: Yang Wang > spark sql "CREATE FUNCTION" throws exception with hive 1.2.1 > > > Key: SPARK-15388 > URL: https://issues.apache.org/jira/browse/SPARK-15388 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Yang Wang >Assignee: Yang Wang > Fix For: 2.0.0 > > > spark.sql("CREATE FUNCTION MY_FUNCTION_1 AS > 'com.haizhi.bdp.udf.UDFGetGeoCode'") throws > org.apache.spark.sql.AnalysisException. > I was using hive whose version is 1.2.1 > Full stack trace is as follows: > Exception in thread "main" org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: > MetaException(message:NoSuchObjectException(message:Function > bdp.GET_GEO_CODE does not exist)); > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:71) > at > org.apache.spark.sql.hive.HiveExternalCatalog.functionExists(HiveExternalCatalog.scala:323) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.functionExists(SessionCatalog.scala:712) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.createFunction(SessionCatalog.scala:663) > at > org.apache.spark.sql.execution.command.CreateFunction.run(functions.scala:68) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:57) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:55) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:69) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136) > at > 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:85) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:85) > at org.apache.spark.sql.Dataset.(Dataset.scala:187) > at org.apache.spark.sql.Dataset.(Dataset.scala:168) > at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:63) > at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:541) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15506) only one notebook can define a UDF; java.sql.SQLException: Another instance of Derby may have already booted the database
Andrew Davidson created SPARK-15506: --- Summary: only one notebook can define a UDF; java.sql.SQLException: Another instance of Derby may have already booted the database Key: SPARK-15506 URL: https://issues.apache.org/jira/browse/SPARK-15506 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.6.1 Environment: Mac OS X El Capitan, Python 3.4.2 Reporter: Andrew Davidson I am using a sqlContext to create DataFrames. I noticed that if I open and run 'notebook a', and 'a' defines a UDF, I will not be able to open a second notebook that also defines a UDF unless I shut down notebook a first. In the second notebook I get a long stack trace. The problem seems to be: Caused by: java.sql.SQLException: Another instance of Derby may have already booted the database /Users/andrewdavidson/workSpace/bigPWSWorkspace/dataScience/notebooks/gnip/metastore_db. at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source) at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source) at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source) at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source) ... 86 more Here is the complete stack trace. Kind regards, Andy You must build Spark with Hive.
Export 'SPARK_HIVE=true' and run build/sbt assembly --- Py4JJavaError Traceback (most recent call last) in () 16 #fooUDF = udf(lambda arg : "aedwip") 17 ---> 18 paddedStrUDF = udf(lambda zipInt : str(zipInt).zfill(5)) /Users/andrewdavidson/workSpace/spark/spark-1.6.1-bin-hadoop2.6/python/pyspark/sql/functions.py in udf(f, returnType) 1595 [Row(slen=5), Row(slen=3)] 1596 """ -> 1597 return UserDefinedFunction(f, returnType) 1598 1599 blacklist = ['map', 'since', 'ignore_unicode_prefix'] /Users/andrewdavidson/workSpace/spark/spark-1.6.1-bin-hadoop2.6/python/pyspark/sql/functions.py in __init__(self, func, returnType, name) 1556 self.returnType = returnType 1557 self._broadcast = None -> 1558 self._judf = self._create_judf(name) 1559 1560 def _create_judf(self, name): /Users/andrewdavidson/workSpace/spark/spark-1.6.1-bin-hadoop2.6/python/pyspark/sql/functions.py in _create_judf(self, name) 1567 pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command, self) 1568 ctx = SQLContext.getOrCreate(sc) -> 1569 jdt = ctx._ssql_ctx.parseDataType(self.returnType.json()) 1570 if name is None: 1571 name = f.__name__ if hasattr(f, '__name__') else f.__class__.__name__ /Users/andrewdavidson/workSpace/spark/spark-1.6.1-bin-hadoop2.6/python/pyspark/sql/context.py in _ssql_ctx(self) 681 try: 682 if not hasattr(self, '_scala_HiveContext'): --> 683 self._scala_HiveContext = self._get_hive_ctx() 684 return self._scala_HiveContext 685 except Py4JError as e: /Users/andrewdavidson/workSpace/spark/spark-1.6.1-bin-hadoop2.6/python/pyspark/sql/context.py in _get_hive_ctx(self) 690 691 def _get_hive_ctx(self): --> 692 return self._jvm.HiveContext(self._jsc.sc()) 693 694 def refreshTable(self, tableName): /Users/andrewdavidson/workSpace/spark/spark-1.6.1-bin-hadoop2.6/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py in __call__(self, *args) 1062 answer = self._gateway_client.send_command(command) 1063 return_value = get_return_value( -> 1064 answer, 
self._gateway_client, None, self._fqn) 1065 1066 for temp_arg in temp_args: /Users/andrewdavidson/workSpace/spark/spark-1.6.1-bin-hadoop2.6/python/pyspark/sql/utils.py in deco(*a, **kw) 43 def deco(*a, **kw): 44 try: ---> 45 return f(*a, **kw) 46 except py4j.protocol.Py4JJavaError as e: 47 s = e.java_exception.toString() /Users/andrewdavidson/workSpace/spark/spark-1.6.1-bin-hadoop2.6/python/lib/py4j-0.9-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name) 306 raise Py4JJavaError( 307 "An error occurred while calling {0}{1}{2}.\n". --> 308 format(target_id, ".", name), value) 309 else: 310 raise Py4JError( Py4JJavaError: An error occurred while calling None.org.apache.spark.sql.hive.HiveContext. : java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient at
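One way to narrow down a report like this: the UDF body in the traceback is plain Python, so its logic can be verified without starting a HiveContext at all, and therefore without contending for the Derby lock that only one process may hold at a time:

```python
# The UDF body from the report, checked as plain Python. Registering it
# with udf() is what forces HiveContext creation and hits the Derby lock;
# the padding logic itself needs no Spark at all.
padded = lambda zip_int: str(zip_int).zfill(5)

print(padded(94110))  # '94110'
print(padded(7031))   # '07031'
print(padded(1))      # '00001'
```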
[jira] [Commented] (SPARK-15450) Clean up SparkSession builder for python
[ https://issues.apache.org/jira/browse/SPARK-15450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297800#comment-15297800 ] Andrew Or commented on SPARK-15450: --- Actually I already have some ideas for this one. > Clean up SparkSession builder for python > > > Key: SPARK-15450 > URL: https://issues.apache.org/jira/browse/SPARK-15450 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 2.0.0 >Reporter: Andrew Or >Assignee: Andrew Or > > This is the sister JIRA for SPARK-15075. Today we use > `SQLContext.getOrCreate` in our builder. Instead we should just have a real > `SparkSession.getOrCreate` and use that in our builder. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-15464) Replace SQLContext and SparkContext with SparkSession using builder pattern in python testsuites
[ https://issues.apache.org/jira/browse/SPARK-15464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15464. --- Resolution: Fixed Assignee: Weichen Xu Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Replace SQLContext and SparkContext with SparkSession using builder pattern > in python testsuites > > > Key: SPARK-15464 > URL: https://issues.apache.org/jira/browse/SPARK-15464 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib, SQL, Tests >Affects Versions: 2.0.0 >Reporter: Weichen Xu >Assignee: Weichen Xu >Priority: Minor > Labels: test > Fix For: 2.0.0 > > > In several Python scripts, SQLContext is still used and SparkContext is > created directly; these should be replaced with SparkSession via > SparkSession.builder. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-15311) Disallow DML on Non-temporary Tables when Using In-Memory Catalog
[ https://issues.apache.org/jira/browse/SPARK-15311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15311. --- Resolution: Fixed Assignee: Xiao Li Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Disallow DML on Non-temporary Tables when Using In-Memory Catalog > - > > Key: SPARK-15311 > URL: https://issues.apache.org/jira/browse/SPARK-15311 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li >Assignee: Xiao Li > Fix For: 2.0.0 > > > So far, when using In-Memory Catalog, we allow DDL operations for > non-temporary tables. However, the corresponding DML operations are not > supported. Thus, we need to issue exceptions in this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-15488) Possible Accumulator bug causing OneVsRestSuite to be flaky
[ https://issues.apache.org/jira/browse/SPARK-15488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15488. --- Resolution: Fixed Assignee: Liang-Chi Hsieh Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Possible Accumulator bug causing OneVsRestSuite to be flaky > --- > > Key: SPARK-15488 > URL: https://issues.apache.org/jira/browse/SPARK-15488 > Project: Spark > Issue Type: Bug > Components: ML, Spark Core >Affects Versions: 2.0.0 > Environment: Jenkins: branch-2.0, maven build, Hadoop 2.6 >Reporter: Joseph K. Bradley >Assignee: Liang-Chi Hsieh > Fix For: 2.0.0 > > > OneVsRestSuite has been slightly flaky recently. The failure happens in the > use of {{Range.par}}, which executes concurrent jobs which use the same > DataFrame. This sometimes causes failures from > {{java.util.ConcurrentModificationException}}. > It appears the failure is from {{InMemoryRelation.batchStats}} being > accessed. Since that is an instance of {{Accumulable}}, I'm guessing the bug > is from recent Accumulator changes. > Stack trace from this test run. 
> * links: [https://spark-tests.appspot.com/test-logs/125719479] and > [https://spark-tests.appspot.com/builds/spark-master-test-maven-hadoop-2.6/993] > {code} > java.util.ConcurrentModificationException > at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901) > at java.util.ArrayList$Itr.next(ArrayList.java:851) > at > java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1042) > at > scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43) > at scala.collection.Iterator$class.foreach(Iterator.scala:893) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > at scala.collection.AbstractIterable.foreach(Iterable.scala:54) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.AbstractTraversable.map(Traversable.scala:104) > at > org.apache.spark.sql.execution.columnar.InMemoryRelation.computeSizeInBytes(InMemoryTableScanExec.scala:90) > at > org.apache.spark.sql.execution.columnar.InMemoryRelation.statistics(InMemoryTableScanExec.scala:113) > at > org.apache.spark.sql.execution.columnar.InMemoryRelation.statisticsToBePropagated(InMemoryTableScanExec.scala:97) > at > org.apache.spark.sql.execution.columnar.InMemoryRelation.withOutput(InMemoryTableScanExec.scala:191) > at > org.apache.spark.sql.execution.CacheManager$$anonfun$useCachedData$1$$anonfun$applyOrElse$1.apply(CacheManager.scala:144) > at > org.apache.spark.sql.execution.CacheManager$$anonfun$useCachedData$1$$anonfun$applyOrElse$1.apply(CacheManager.scala:144) > at scala.Option.map(Option.scala:146) > at > org.apache.spark.sql.execution.CacheManager$$anonfun$useCachedData$1.applyOrElse(CacheManager.scala:144) > at > org.apache.spark.sql.execution.CacheManager$$anonfun$useCachedData$1.applyOrElse(CacheManager.scala:141) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:265) > at > 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:265) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:68) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:264) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:270) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:270) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:307) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) > at scala.collection.Iterator$class.foreach(Iterator.scala:893) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) > at > scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48) > at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310) > at scala.collection.AbstractIterator.to(Iterator.scala:1336) > at > scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302) > at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336) > at > scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289) > at scala.collection.AbstractIterator.toArray(Iterator.scala:1336) >
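The failure class described above, one job iterating a shared collection while a concurrent job mutates it, can be reproduced in miniature. Java throws java.util.ConcurrentModificationException; the closest single-threaded Python analogue is mutating a dict during iteration. This is an illustration of the error class only and is unrelated to Spark's actual code:

```python
# Minimal analogue of a concurrent-modification failure: Python's dict
# iterator detects a size change mid-iteration and raises RuntimeError,
# much as Java's fail-fast iterators raise ConcurrentModificationException.
d = {"a": 1}
failed = False
try:
    for key in d:
        d["b"] = 2  # mutation during iteration
except RuntimeError as e:
    failed = True
    print("iteration failed:", e)
```

In the Spark case the mutation came from `Range.par` running concurrent jobs against the same cached DataFrame, so the fix has to make the shared accumulator state safe rather than avoid the iteration.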
[jira] [Resolved] (SPARK-15279) Disallow ROW FORMAT and STORED AS (parquet | orc | avro etc.)
[ https://issues.apache.org/jira/browse/SPARK-15279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15279. --- Resolution: Fixed Fix Version/s: 2.0.0 > Disallow ROW FORMAT and STORED AS (parquet | orc | avro etc.) > - > > Key: SPARK-15279 > URL: https://issues.apache.org/jira/browse/SPARK-15279 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Andrew Or >Assignee: Andrew Or > Fix For: 2.0.0 > > > They are both potentially conflicting ways that allow you to specify the > SerDe. Unfortunately, we can't just get rid of ROW FORMAT because it may be > used with TEXTFILE or RCFILE. For other file formats, we should fail fast > wherever possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15397) 'locate' UDF got different result with boundary value case compared to Hive engine
[ https://issues.apache.org/jira/browse/SPARK-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-15397: -- Summary: 'locate' UDF got different result with boundary value case compared to Hive engine (was: [Spark][SQL] 'locate' UDF got different result with boundary value case compared to Hive engine) > 'locate' UDF got different result with boundary value case compared to Hive > engine > -- > > Key: SPARK-15397 > URL: https://issues.apache.org/jira/browse/SPARK-15397 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0, 1.6.1, 2.0.0 >Reporter: Yi Zhou > > Spark SQL: > select locate("abc", "abc", 1); > 0 > Hive: > select locate("abc", "abc", 1); > 1 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
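The disagreement above is exactly at the boundary where the substring starts at position {{pos}}. The Hive-compatible contract for {{locate(substr, str, pos)}} (1-based start position, 1-based result, 0 when not found) can be sketched in plain Python; this is an illustration of the expected semantics, not Spark's or Hive's actual implementation:

```python
def locate(substr, s, pos=1):
    """1-based position of the first occurrence of substr in s at or
    after position pos; 0 if not found. Hive-compatible sketch."""
    if pos < 1:
        return 0  # assumption: treat non-positive start positions as "not found"
    idx = s.find(substr, pos - 1)  # convert the 1-based pos to a 0-based index
    return idx + 1                 # find's -1 maps to 0; hits map back to 1-based

print(locate("abc", "abc", 1))  # → 1, the boundary case from this JIRA
print(locate("b", "abc"))       # → 2
print(locate("z", "abc"))       # → 0
```

The buggy Spark behavior corresponds to an off-by-one in the start-position conversion, which makes a match beginning exactly at {{pos}} report 0.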
[jira] [Updated] (SPARK-15397) 'locate' UDF got different result with boundary value case compared to Hive engine
[ https://issues.apache.org/jira/browse/SPARK-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-15397: -- Assignee: Adrian Wang > 'locate' UDF got different result with boundary value case compared to Hive > engine > -- > > Key: SPARK-15397 > URL: https://issues.apache.org/jira/browse/SPARK-15397 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0, 1.6.1, 2.0.0 >Reporter: Yi Zhou >Assignee: Adrian Wang > > Spark SQL: > select locate("abc", "abc", 1); > 0 > Hive: > select locate("abc", "abc", 1); > 1 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-15477) HiveContext is private[hive] and not accessible to users.
[ https://issues.apache.org/jira/browse/SPARK-15477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15477. --- Resolution: Not A Bug > HiveContext is private[hive] and not accessible to users. > -- > > Key: SPARK-15477 > URL: https://issues.apache.org/jira/browse/SPARK-15477 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Doug Balog > > In 2.0, org.apache.spark.sql.hive.HiveContext was marked deprecated but should > still be accessible from user programs. It is not, since it is marked as > `private[hive]`
[jira] [Commented] (SPARK-15477) HiveContext is private[hive] and not accessible to users.
[ https://issues.apache.org/jira/browse/SPARK-15477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296763#comment-15296763 ] Andrew Or commented on SPARK-15477: --- What part of the code makes you think it's private[hive]? https://github.com/apache/spark/blob/branch-2.0/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala > HiveContext is private[hive] and not accessible to users. > -- > > Key: SPARK-15477 > URL: https://issues.apache.org/jira/browse/SPARK-15477 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Doug Balog > > In 2.0, org.apache.spark.sql.hive.HiveContext was marked deprecated but should > still be accessible from user programs. It is not, since it is marked as > `private[hive]`
[jira] [Resolved] (SPARK-15456) PySpark Shell fails to create SparkContext if HiveConf not found
[ https://issues.apache.org/jira/browse/SPARK-15456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15456. --- Resolution: Fixed Assignee: Bryan Cutler Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > PySpark Shell fails to create SparkContext if HiveConf not found > > > Key: SPARK-15456 > URL: https://issues.apache.org/jira/browse/SPARK-15456 > Project: Spark > Issue Type: Bug > Components: PySpark >Reporter: Bryan Cutler >Assignee: Bryan Cutler > Fix For: 2.0.0 > > > When starting the PySpark shell, if HiveConf is not available the shell will > fall back to creating a SparkSession from a SparkContext. This is attempted > with the variable {{sc}}, which hasn't been initialized yet.
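The bug class here is an ordering problem: the fallback path referenced {{sc}} before anything had assigned it. A minimal sketch of the corrected startup ordering follows; the class names are stand-ins, not the actual pyspark shell.py code:

```python
# Hypothetical sketch of PySpark shell startup ordering (not the real
# shell.py). The key point: the fallback branch must create the
# SparkContext before building a SparkSession from it; referencing
# `sc` first raises a NameError.

class SparkContext:
    """Stand-in for pyspark.SparkContext."""
    def __init__(self):
        self.app_name = "PySparkShell"

class SparkSession:
    """Stand-in for pyspark.sql.SparkSession."""
    def __init__(self, sc):
        self.sparkContext = sc

def start_shell(hive_conf_available):
    if hive_conf_available:
        sc = SparkContext()
        spark = SparkSession(sc)  # the real shell would enable Hive support here
    else:
        # Fixed order: initialize sc first, then derive the session from it.
        # The buggy version built the session from a not-yet-defined `sc`.
        sc = SparkContext()
        spark = SparkSession(sc)
    return sc, spark

sc, spark = start_shell(hive_conf_available=False)
print(spark.sparkContext.app_name)  # → PySparkShell
```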
[jira] [Created] (SPARK-15450) Clean up SparkSession builder for python
Andrew Or created SPARK-15450: - Summary: Clean up SparkSession builder for python Key: SPARK-15450 URL: https://issues.apache.org/jira/browse/SPARK-15450 Project: Spark Issue Type: Bug Components: PySpark, SQL Affects Versions: 2.0.0 Reporter: Andrew Or Assignee: Andrew Or This is the sister JIRA for SPARK-15075. Today we use `SQLContext.getOrCreate` in our builder. Instead we should just have a real `SparkSession.getOrCreate` and use that in our builder.
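The getOrCreate pattern the JIRA asks for is a lazily created singleton: return the existing session if one is alive, otherwise create it. A minimal illustrative sketch, not Spark's actual implementation:

```python
# Illustrative singleton getOrCreate (hypothetical, not pyspark's code).
import threading

class SparkSession:
    _instance = None
    _lock = threading.Lock()

    def __init__(self, app_name="default"):
        self.app_name = app_name

    @classmethod
    def get_or_create(cls, app_name="default"):
        # Create the session once; later calls return the existing one,
        # which is why their options (like app_name here) are ignored.
        with cls._lock:
            if cls._instance is None:
                cls._instance = cls(app_name)
            return cls._instance

a = SparkSession.get_or_create("first")
b = SparkSession.get_or_create("second")  # returns the existing session
print(a is b)       # → True
print(b.app_name)   # → first
```

This also illustrates the gotcha tracked in SPARK-15345: options passed to a later getOrCreate call silently don't take effect when a session already exists.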
[jira] [Commented] (SPARK-15345) SparkSession's conf doesn't take effect when there's already an existing SparkContext
[ https://issues.apache.org/jira/browse/SPARK-15345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293899#comment-15293899 ] Andrew Or commented on SPARK-15345: --- The Python part should be resolved by SPARK-15417, https://github.com/apache/spark/pull/13203 > SparkSession's conf doesn't take effect when there's already an existing > SparkContext > - > > Key: SPARK-15345 > URL: https://issues.apache.org/jira/browse/SPARK-15345 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 2.0.0 >Reporter: Piotr Milanowski >Assignee: Reynold Xin >Priority: Blocker > Fix For: 2.0.0 > > > I am working with branch-2.0; Spark is compiled with Hive support (-Phive and > -Phive-thriftserver). > I am trying to access databases using this snippet: > {code} > from pyspark.sql import HiveContext > hc = HiveContext(sc) > hc.sql("show databases").collect() > [Row(result='default')] > {code} > This means that Spark doesn't find any databases specified in configuration. > Using the same configuration (i.e. hive-site.xml and core-site.xml) in Spark > 1.6, and launching the above snippet, I can print out existing databases. 
> When run in DEBUG mode this is what spark (2.0) prints out: > {code} > 16/05/16 12:17:47 INFO SparkSqlParser: Parsing command: show databases > 16/05/16 12:17:47 DEBUG SimpleAnalyzer: > === Result of Batch Resolution === > !'Project [unresolveddeserializer(createexternalrow(if (isnull(input[0, > string])) null else input[0, string].toString, > StructField(result,StringType,false)), result#2) AS #3] Project > [createexternalrow(if (isnull(result#2)) null else result#2.toString, > StructField(result,StringType,false)) AS #3] > +- LocalRelation [result#2] > > +- LocalRelation [result#2] > > 16/05/16 12:17:47 DEBUG ClosureCleaner: +++ Cleaning closure > (org.apache.spark.sql.Dataset$$anonfun$53) +++ > 16/05/16 12:17:47 DEBUG ClosureCleaner: + declared fields: 2 > 16/05/16 12:17:47 DEBUG ClosureCleaner: public static final long > org.apache.spark.sql.Dataset$$anonfun$53.serialVersionUID > 16/05/16 12:17:47 DEBUG ClosureCleaner: private final > org.apache.spark.sql.types.StructType > org.apache.spark.sql.Dataset$$anonfun$53.structType$1 > 16/05/16 12:17:47 DEBUG ClosureCleaner: + declared methods: 2 > 16/05/16 12:17:47 DEBUG ClosureCleaner: public final java.lang.Object > org.apache.spark.sql.Dataset$$anonfun$53.apply(java.lang.Object) > 16/05/16 12:17:47 DEBUG ClosureCleaner: public final java.lang.Object > org.apache.spark.sql.Dataset$$anonfun$53.apply(org.apache.spark.sql.catalyst.InternalRow) > 16/05/16 12:17:47 DEBUG ClosureCleaner: + inner classes: 0 > 16/05/16 12:17:47 DEBUG ClosureCleaner: + outer classes: 0 > 16/05/16 12:17:47 DEBUG ClosureCleaner: + outer objects: 0 > 16/05/16 12:17:47 DEBUG ClosureCleaner: + populating accessed fields because > this is the starting closure > 16/05/16 12:17:47 DEBUG ClosureCleaner: + fields accessed by starting > closure: 0 > 16/05/16 12:17:47 DEBUG ClosureCleaner: + there are no enclosing objects! 
> 16/05/16 12:17:47 DEBUG ClosureCleaner: +++ closure > (org.apache.spark.sql.Dataset$$anonfun$53) is now cleaned +++ > 16/05/16 12:17:47 DEBUG ClosureCleaner: +++ Cleaning closure > (org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$javaToPython$1) > +++ > 16/05/16 12:17:47 DEBUG ClosureCleaner: + declared fields: 1 > 16/05/16 12:17:47 DEBUG ClosureCleaner: public static final long > org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$javaToPython$1.serialVersionUID > 16/05/16 12:17:47 DEBUG ClosureCleaner: + declared methods: 2 > 16/05/16 12:17:47 DEBUG ClosureCleaner: public final java.lang.Object > org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$javaToPython$1.apply(java.lang.Object) > 16/05/16 12:17:47 DEBUG ClosureCleaner: public final > org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler > org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$javaToPython$1.apply(scala.collection.Iterator) > 16/05/16 12:17:47 DEBUG ClosureCleaner: + inner classes: 0 > 16/05/16 12:17:47 DEBUG ClosureCleaner: + outer classes: 0 > 16/05/16 12:17:47 DEBUG ClosureCleaner: + outer objects: 0 > 16/05/16 12:17:47 DEBUG ClosureCleaner: + populating accessed fields because > this is the starting closure > 16/05/16 12:17:47 DEBUG ClosureCleaner: + fields accessed by starting > closure: 0 > 16/05/16 12:17:47 DEBUG ClosureCleaner: + there are no enclosing objects! > 16/05/16 12:17:47 DEBUG ClosureCleaner: +++ closure >
[jira] [Updated] (SPARK-15417) Failed to enable hive support in PySpark shell
[ https://issues.apache.org/jira/browse/SPARK-15417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-15417: -- Summary: Failed to enable hive support in PySpark shell (was: Failed to enable HiveSupport in PySpark) > Failed to enable hive support in PySpark shell > -- > > Key: SPARK-15417 > URL: https://issues.apache.org/jira/browse/SPARK-15417 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li >Assignee: Andrew Or >Priority: Blocker > Fix For: 2.0.0 > > > Unable to use Hive meta-store in pyspark shell. Tried both HiveContext and > SparkSession. Both failed. It always uses in-memory catalog. > Method 1: Using SparkSession > {noformat} > >>> from pyspark.sql import SparkSession > >>> spark = SparkSession.builder.enableHiveSupport().getOrCreate() > >>> spark.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)") > DataFrame[] > >>> spark.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' > >>> INTO TABLE src") > Traceback (most recent call last): > File "", line 1, in > File > "/Users/xiaoli/IdeaProjects/sparkDelivery/python/pyspark/sql/session.py", > line 494, in sql > return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped) > File > "/Users/xiaoli/IdeaProjects/sparkDelivery/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", > line 933, in __call__ > File > "/Users/xiaoli/IdeaProjects/sparkDelivery/python/pyspark/sql/utils.py", line > 57, in deco > return f(*a, **kw) > File > "/Users/xiaoli/IdeaProjects/sparkDelivery/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", > line 312, in get_return_value > py4j.protocol.Py4JJavaError: An error occurred while calling o21.sql. 
> : java.lang.UnsupportedOperationException: loadTable is not implemented > at > org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.loadTable(InMemoryCatalog.scala:297) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.loadTable(SessionCatalog.scala:280) > at org.apache.spark.sql.execution.command.LoadData.run(tables.scala:263) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:57) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:55) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:69) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:85) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:85) > at org.apache.spark.sql.Dataset.(Dataset.scala:187) > at org.apache.spark.sql.Dataset.(Dataset.scala:168) > at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:63) > at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:541) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237) > at 
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) > at py4j.Gateway.invoke(Gateway.java:280) > at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128) > at py4j.commands.CallCommand.execute(CallCommand.java:79) > at py4j.GatewayConnection.run(GatewayConnection.java:211) > at java.lang.Thread.run(Thread.java:745) > {noformat} > Method 2: Using HiveContext: > {noformat} > >>> from pyspark.sql import HiveContext > >>> sqlContext = HiveContext(sc) > >>> sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)") > DataFrame[] > >>> sqlContext.sql("LOAD DATA LOCAL INPATH > >>> 'examples/src/main/resources/kv1.txt' INTO TABLE src") > Traceback (most recent call last): > File "", line 1, in > File > "/Users/xiaoli/IdeaProjects/sparkDelivery/python/pyspark/sql/context.py", > line 346, in sql > return self.sparkSession.sql(sqlQuery) > File > "/Users/xiaoli/IdeaProjects/sparkDelivery/python/pyspark/sql/session.py", > line 494, in sql > return DataFrame(self._jsparkSession.sql(sqlQuery),
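The failure mode above (an in-memory catalog despite {{enableHiveSupport()}}) comes down to a builder option not reaching the session. A sketch of the mechanism, with hypothetical class names rather than pyspark's real builder code; the config key {{spark.sql.catalogImplementation}} is the one Spark 2.0 uses to select the catalog:

```python
# Hypothetical builder sketch (not pyspark's implementation).
# enableHiveSupport() is just sugar for setting a config key; the bug
# class here is a builder that drops accumulated options when the
# session is created, so the in-memory catalog stays selected.

class Builder:
    def __init__(self):
        self._options = {}

    def config(self, key, value):
        self._options[key] = value
        return self  # chainable, like the real builder

    def enableHiveSupport(self):
        # In Spark 2.0 this sets spark.sql.catalogImplementation=hive.
        return self.config("spark.sql.catalogImplementation", "hive")

    def getOrCreate(self):
        # For the fix to work, the options must actually reach the
        # session; returning them here lets us verify that they do.
        return dict(self._options)

opts = Builder().enableHiveSupport().getOrCreate()
print(opts["spark.sql.catalogImplementation"])  # → hive
```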
[jira] [Resolved] (SPARK-15417) Failed to enable HiveSupport in PySpark
[ https://issues.apache.org/jira/browse/SPARK-15417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15417. --- Resolution: Fixed Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Failed to enable HiveSupport in PySpark > --- > > Key: SPARK-15417 > URL: https://issues.apache.org/jira/browse/SPARK-15417 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li >Assignee: Andrew Or >Priority: Blocker > Fix For: 2.0.0 > > > Unable to use Hive meta-store in pyspark shell. Tried both HiveContext and > SparkSession. Both failed. It always uses in-memory catalog.
[jira] [Resolved] (SPARK-15421) Table and Database property values need to be validated
[ https://issues.apache.org/jira/browse/SPARK-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15421. --- Resolution: Fixed Fix Version/s: 2.0.0 > Table and Database property values need to be validated > --- > > Key: SPARK-15421 > URL: https://issues.apache.org/jira/browse/SPARK-15421 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Andrew Or >Assignee: Andrew Or > Fix For: 2.0.0 > > > When we parse DDLs involving table or database properties, we need to > validate the values. > E.g. if we alter a database's property without providing a value: > {code} > ALTER DATABASE my_db SET DBPROPERTIES('some_key') > {code} > Then we'll ignore it with Hive, but override the property with the in-memory > catalog. Inconsistencies like these arise because we don't validate the > property values. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15421) Table and Database property values need to be validated
Andrew Or created SPARK-15421: - Summary: Table and Database property values need to be validated Key: SPARK-15421 URL: https://issues.apache.org/jira/browse/SPARK-15421 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: Andrew Or Assignee: Andrew Or When we parse DDLs involving table or database properties, we need to validate the values. E.g. if we alter a database's property without providing a value: {code} ALTER DATABASE my_db SET DBPROPERTIES('some_key') {code} Then we'll ignore it with Hive, but override the property with the in-memory catalog. Inconsistencies like these arise because we don't validate the property values. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
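The "fail fast" validation the JIRA calls for can be sketched as a small helper; this is an illustration of the desired behavior, not Spark's parser code, and the helper name is hypothetical:

```python
# Hypothetical validation helper: reject DBPROPERTIES/TBLPROPERTIES
# entries that were given without a value, instead of silently ignoring
# them (Hive) or overriding the property (in-memory catalog).

def validate_db_properties(props):
    """props: dict of property key -> value; None means no value was given."""
    for key, value in props.items():
        if value is None:
            raise ValueError(
                f"Operation not allowed: key '{key}' appeared without a value")
    return props

# A well-formed ALTER DATABASE ... SET DBPROPERTIES('k'='v') passes:
print(validate_db_properties({"some_key": "some_value"}))

# ...while DBPROPERTIES('some_key') with no value fails fast at parse time:
try:
    validate_db_properties({"some_key": None})
except ValueError:
    print("rejected")  # → rejected
```

Failing at parse time keeps the Hive and in-memory catalogs consistent, since neither ever sees the malformed property.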
[jira] [Commented] (SPARK-15417) Failed to enable HiveSupport in PySpark
[ https://issues.apache.org/jira/browse/SPARK-15417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292159#comment-15292159 ] Andrew Or commented on SPARK-15417: --- Good catch, I have a patch to fix this. > Failed to enable HiveSupport in PySpark > --- > > Key: SPARK-15417 > URL: https://issues.apache.org/jira/browse/SPARK-15417 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li >Priority: Blocker > > Unable to use Hive meta-store in pyspark shell. Tried both HiveContext and > SparkSession. Both failed. It always uses in-memory catalog.
[jira] [Assigned] (SPARK-15417) Failed to enable HiveSupport in PySpark
[ https://issues.apache.org/jira/browse/SPARK-15417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or reassigned SPARK-15417: - Assignee: Andrew Or > Failed to enable HiveSupport in PySpark > --- > > Key: SPARK-15417 > URL: https://issues.apache.org/jira/browse/SPARK-15417 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li >Assignee: Andrew Or >Priority: Blocker > > Unable to use Hive meta-store in pyspark shell. Tried both HiveContext and > SparkSession. Both failed. It always uses in-memory catalog.
[jira] [Resolved] (SPARK-15392) The default value of size estimation is not good
[ https://issues.apache.org/jira/browse/SPARK-15392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15392. --- Resolution: Fixed Target Version/s: 2.0.0 > The default value of size estimation is not good > > > Key: SPARK-15392 > URL: https://issues.apache.org/jira/browse/SPARK-15392 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Davies Liu >Assignee: Davies Liu > Fix For: 2.0.0 > > > We use autoBroadcastJoinThreshold + 1L as the default value of size > estimation, which is not good in 2.0, because we calculate the size based > on the size of the schema, so the estimation could be less than > autoBroadcastJoinThreshold if you have a SELECT on top of a DataFrame > created from an RDD. > We should use an even bigger default value, for example, MaxLong.
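A toy illustration of why threshold + 1 is too small a default (the numbers and the row-width rescaling are a simplified assumption about how the planner scales size estimates, not Spark's exact formula): once a projection narrows the schema, a proportionally rescaled estimate can drop back under the broadcast threshold, making an unknown-size relation look broadcastable.

```python
# Toy numbers, hypothetical scenario: a plan of unknown size gets the
# default estimate, then a SELECT of one narrow column rescales it by
# the ratio of row widths.

threshold = 10 * 1024 * 1024       # autoBroadcastJoinThreshold (10 MB)
default_estimate = threshold + 1   # the old default for unknown-size plans

full_row_bytes = 100               # estimated row width of the full scan
projected_bytes = 4                # row width after projecting one column

projected_estimate = default_estimate * projected_bytes // full_row_bytes
print(projected_estimate < threshold)  # → True: wrongly looks broadcastable

# With Long.MaxValue as the default, the rescaled estimate stays huge:
MAX_LONG = 2**63 - 1
print(MAX_LONG * projected_bytes // full_row_bytes > threshold)  # → True
```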
[jira] [Resolved] (SPARK-15317) JobProgressListener takes a huge amount of memory with iterative DataFrame program in local, standalone
[ https://issues.apache.org/jira/browse/SPARK-15317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15317. --- Resolution: Fixed Assignee: Shixiong Zhu Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > JobProgressListener takes a huge amount of memory with iterative DataFrame > program in local, standalone > --- > > Key: SPARK-15317 > URL: https://issues.apache.org/jira/browse/SPARK-15317 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 > Environment: Spark 2.0, local mode + standalone mode on MacBook Pro > OSX 10.9 >Reporter: Joseph K. Bradley >Assignee: Shixiong Zhu > Fix For: 2.0.0 > > Attachments: cc_traces.txt, compare-1.6-10Kpartitions.png, > compare-2.0-10Kpartitions.png, compare-2.0-16partitions.png, > dump-standalone-2.0-1of4.png, dump-standalone-2.0-2of4.png, > dump-standalone-2.0-3of4.png, dump-standalone-2.0-4of4.png > > > h2. TL;DR > Running a small test locally, I found JobProgressListener consuming a huge > amount of memory. There are many tasks being run, but it is still > surprising. Summary, with details below: > * Spark app: series of DataFrame joins > * Issue: GC > * Heap dump shows JobProgressListener taking 150 - 400MB, depending on the > Spark mode/version > h2. Reproducing this issue > h3. 
With more complex code > The code which fails: > * Here is a branch with the code snippet which fails: > [https://github.com/jkbradley/spark/tree/18836174ab190d94800cc247f5519f3148822dce] > ** This is based on Spark commit hash: > bb1362eb3b36b553dca246b95f59ba7fd8adcc8a > * Look at {{CC.scala}}, which implements connected components using > DataFrames: > [https://github.com/jkbradley/spark/blob/18836174ab190d94800cc247f5519f3148822dce/mllib/src/main/scala/org/apache/spark/ml/CC.scala] > In the spark shell, run: > {code} > import org.apache.spark.ml.CC > import org.apache.spark.sql.SQLContext > val sqlContext = SQLContext.getOrCreate(sc) > CC.runTest(sqlContext) > {code} > I have attached a file {{cc_traces.txt}} with the stack traces from running > {{runTest}}. Note that I sometimes had to run {{runTest}} twice to cause the > fatal exception. This includes a trace for 1.6, which should run without > modifications to {{CC.scala}}. These traces are from running in local mode. > I used {{jmap}} to dump the heap: > * local mode with 2.0: JobProgressListener took about 397 MB > * standalone mode with 2.0: JobProgressListener took about 171 MB (See > attached screenshots from MemoryAnalyzer) > Both 1.6 and 2.0 exhibit this issue. 2.0 ran faster, and the issue > (JobProgressListener allocation) seems more severe with 2.0, though it could > just be that 2.0 makes more progress and runs more jobs. > h3. With simpler code > I ran this with master (~Spark 2.0): > {code} > val data = spark.range(0, 1, 1, 1) > data.cache().count() > {code} > The resulting heap dump: > * 78MB for {{scala.tools.nsc.interpreter.ILoop$ILoopInterpreter}} > * 58MB for {{org.apache.spark.ui.jobs.JobProgressListener}} > * 80MB for {{io.netty.buffer.PoolChunk}}
[jira] [Resolved] (SPARK-15387) SessionCatalog in SimpleAnalyzer does not need to make database directory.
[ https://issues.apache.org/jira/browse/SPARK-15387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15387. --- Resolution: Fixed Assignee: Kousuke Saruta Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > SessionCatalog in SimpleAnalyzer does not need to make database directory. > -- > > Key: SPARK-15387 > URL: https://issues.apache.org/jira/browse/SPARK-15387 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > Fix For: 2.0.0 > > > After SPARK-15093 is fixed, we are forced to create /user/hive/warehouse when > SimpleAnalyzer is used, but SimpleAnalyzer may not need the directory.
[jira] [Resolved] (SPARK-15300) Can't remove a block if it's under evicting
[ https://issues.apache.org/jira/browse/SPARK-15300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15300. --- Resolution: Fixed Fix Version/s: 2.0.0 > Can't remove a block if it's under evicting > --- > > Key: SPARK-15300 > URL: https://issues.apache.org/jira/browse/SPARK-15300 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Davies Liu >Assignee: Davies Liu > Fix For: 2.0.0 > > > {code} > 16/04/15 12:17:05 INFO ContextCleaner: Cleaned shuffle 94 > 16/04/15 12:17:05 INFO ContextCleaner: Cleaned accumulator 1433121 > 16/04/15 12:17:05 INFO ContextCleaner: Cleaned accumulator 1433122 > 16/04/15 12:17:05 INFO ContextCleaner: Cleaned accumulator 1433123 > 16/04/15 12:17:05 INFO BlockManagerInfo: Removed broadcast_629_piece0 on > 10.0.164.43:39651 in memory (size: 23.4 KB, free: 15.8 GB) > 16/04/15 12:17:05 ERROR BlockManagerSlaveEndpoint: Error in removing block > broadcast_631_piece0 > java.lang.IllegalStateException: Task -1024 has already locked > broadcast_631_piece0 for writing > at > org.apache.spark.storage.BlockInfoManager.lockForWriting(BlockInfoManager.scala:232) > at > org.apache.spark.storage.BlockManager.removeBlock(BlockManager.scala:1286) > at > org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$1.apply$mcZ$sp(BlockManagerSlaveEndpoint.scala:47) > at > org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$1.apply(BlockManagerSlaveEndpoint.scala:46) > at > org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$1.apply(BlockManagerSlaveEndpoint.scala:46) > at > org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$1.apply(BlockManagerSlaveEndpoint.scala:82) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) > at > 
scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 16/04/15 12:17:05 INFO BlockManagerInfo: Removed broadcast_626_piece0 on > 10.0.164.43:39651 in memory (size: 23.4 KB, free: 15.8 GB) > 16/04/15 12:17:05 INFO ContextCleaner: Cleaned accumulator 1433124 > 16/04/15 12:17:05 INFO BlockManagerInfo: Removed broadcast_627_piece0 on > 10.0.164.43:39651 in memory (size: 23.3 KB, free: 15.8 GB) > 16/04/15 12:17:05 INFO ContextCleaner: Cleaned accumulator 1433125 > {code}
[jira] [Updated] (SPARK-15357) Cooperative spilling should check consumer memory mode
[ https://issues.apache.org/jira/browse/SPARK-15357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-15357: -- Description: In TaskMemoryManager.java: {code} for (MemoryConsumer c: consumers) { if (c != consumer && c.getUsed() > 0) { try { long released = c.spill(required - got, consumer); if (released > 0 && mode == tungstenMemoryMode) { got += memoryManager.acquireExecutionMemory(required - got, taskAttemptId, mode); if (got >= required) { break; } } } catch(...) { ... } } } } {code} Currently, when non-tungsten consumers acquire execution memory, they may force other tungsten consumers to spill and then NOT use the freed memory. A better way to do this is to incorporate the memory mode in the consumer itself and spill only those with matching memory modes. was: In TaskMemoryManager.java: {code} for (MemoryConsumer c: consumers) { if (c != consumer && c.getUsed() > 0) { try { long released = c.spill(required - got, consumer); if (released > 0 && mode == tungstenMemoryMode) { logger.debug("Task {} released {} from {} for {}", taskAttemptId, Utils.bytesToString(released), c, consumer); got += memoryManager.acquireExecutionMemory(required - got, taskAttemptId, mode); if (got >= required) { break; } } } catch (IOException e) { ... } } } {code} Currently, when non-tungsten consumers acquire execution memory, they may force other tungsten consumers to spill and then NOT use the freed memory. A better way to do this is to incorporate the memory mode in the consumer itself and spill only those with matching memory modes. 
> Cooperative spilling should check consumer memory mode > -- > > Key: SPARK-15357 > URL: https://issues.apache.org/jira/browse/SPARK-15357 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Andrew Or > > In TaskMemoryManager.java: > {code} > for (MemoryConsumer c: consumers) { > if (c != consumer && c.getUsed() > 0) { > try { > long released = c.spill(required - got, consumer); > if (released > 0 && mode == tungstenMemoryMode) { > got += memoryManager.acquireExecutionMemory(required - got, > taskAttemptId, mode); > if (got >= required) { > break; > } > } > } catch(...) { ... } > } > } > } > {code} > Currently, when non-tungsten consumers acquire execution memory, they may > force other tungsten consumers to spill and then NOT use the freed memory. A > better way to do this is to incorporate the memory mode in the consumer > itself and spill only those with matching memory modes.
[jira] [Created] (SPARK-15357) Cooperative spilling should check consumer memory mode
Andrew Or created SPARK-15357: - Summary: Cooperative spilling should check consumer memory mode Key: SPARK-15357 URL: https://issues.apache.org/jira/browse/SPARK-15357 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.6.0 Reporter: Andrew Or In TaskMemoryManager.java: {code} for (MemoryConsumer c: consumers) { if (c != consumer && c.getUsed() > 0) { try { long released = c.spill(required - got, consumer); if (released > 0 && mode == tungstenMemoryMode) { logger.debug("Task {} released {} from {} for {}", taskAttemptId, Utils.bytesToString(released), c, consumer); got += memoryManager.acquireExecutionMemory(required - got, taskAttemptId, mode); if (got >= required) { break; } } } catch (IOException e) { ... } } } {code} Currently, when non-tungsten consumers acquire execution memory, they may force other tungsten consumers to spill and then NOT use the freed memory. A better way to do this is to incorporate the memory mode in the consumer itself and spill only those with matching memory modes.
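The proposed fix can be sketched outside of Spark as a minimal, hypothetical simulation. This is not Spark's actual TaskMemoryManager (the class, field, and method names below are illustrative only): each consumer carries its memory mode, and only consumers whose mode matches the requester's are asked to spill, so freed memory is always usable by the requester.

```java
import java.util.Arrays;
import java.util.List;

public class SpillSketch {
    enum MemoryMode { ON_HEAP, OFF_HEAP }

    // Hypothetical stand-in for Spark's MemoryConsumer.
    static class Consumer {
        final String name;
        final MemoryMode mode;
        long used;
        Consumer(String name, MemoryMode mode, long used) {
            this.name = name; this.mode = mode; this.used = used;
        }
        // Pretend to spill up to `needed` bytes and return how much was freed.
        long spill(long needed) {
            long released = Math.min(needed, used);
            used -= released;
            return released;
        }
    }

    /** Free `required` bytes by spilling only consumers with a matching memory mode. */
    static long acquire(List<Consumer> consumers, Consumer requester, long required) {
        long got = 0;
        for (Consumer c : consumers) {
            // The mode check is the point of the JIRA: never force a
            // consumer in the other memory mode to spill uselessly.
            if (c != requester && c.used > 0 && c.mode == requester.mode) {
                got += c.spill(required - got);
                if (got >= required) break;
            }
        }
        return got;
    }

    public static void main(String[] args) {
        Consumer onHeap = new Consumer("onHeap", MemoryMode.ON_HEAP, 100);
        Consumer offHeap = new Consumer("offHeap", MemoryMode.OFF_HEAP, 100);
        Consumer requester = new Consumer("requester", MemoryMode.ON_HEAP, 0);
        long got = acquire(Arrays.asList(onHeap, offHeap), requester, 50);
        // Only the on-heap consumer is spilled; the off-heap one is untouched.
        System.out.println(got + " " + onHeap.used + " " + offHeap.used); // 50 50 100
    }
}
```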
[jira] [Resolved] (SPARK-14684) Verification of partition specs in SessionCatalog
[ https://issues.apache.org/jira/browse/SPARK-14684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14684. --- Resolution: Fixed Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Verification of partition specs in SessionCatalog > - > > Key: SPARK-14684 > URL: https://issues.apache.org/jira/browse/SPARK-14684 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li >Assignee: Xiao Li > Fix For: 2.0.0 > > > When attempting to drop partitions of a table, if the user provides an > unknown column, Hive will drop all the partitions of the table, which is > likely not intended. E.g. > {code} > ALTER TABLE my_tab DROP PARTITION (ds='2008-04-09', unknownCol='12') > {code} > We should verify that the columns provided in the specs are actually > partition columns.
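The verification described above can be sketched as follows. This is an illustrative, self-contained check, not Spark's actual SessionCatalog code; the method and parameter names are assumptions for the example:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class PartitionSpecCheck {
    /**
     * Reject a partition spec that mentions any column that is not a
     * partition column of the table. Without such a check, Hive may
     * interpret the spec loosely and drop every partition.
     */
    static void requirePartitionColumns(Map<String, String> spec, Set<String> partitionCols) {
        for (String col : spec.keySet()) {
            if (!partitionCols.contains(col)) {
                throw new IllegalArgumentException(
                    col + " is not a partition column; partition columns are " + partitionCols);
            }
        }
    }

    public static void main(String[] args) {
        Set<String> partCols = new HashSet<>(Arrays.asList("ds", "hr"));
        Map<String, String> spec = new LinkedHashMap<>();
        spec.put("ds", "2008-04-09");
        spec.put("unknownCol", "12");
        try {
            requirePartitionColumns(spec, partCols);
        } catch (IllegalArgumentException e) {
            // The command fails fast instead of dropping all partitions.
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```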
[jira] [Resolved] (SPARK-15277) Checking Partition Spec Existence Before Dropping
[ https://issues.apache.org/jira/browse/SPARK-15277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15277. --- Resolution: Fixed Assignee: Xiao Li Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Checking Partition Spec Existence Before Dropping > - > > Key: SPARK-15277 > URL: https://issues.apache.org/jira/browse/SPARK-15277 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li >Assignee: Xiao Li > Fix For: 2.0.0 > > > Currently, we start dropping partitions before we finish checking that all > of the partition specs exist. If one partition spec does not exist, we just > stop processing the command, so some partitions may already have been > dropped while others have not. We should check existence first, before > dropping any partition. > If any failure happens after we start to drop the partitions, we should log > an error message indicating which partitions have been dropped and which > have not.
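The check-then-drop pattern described above can be sketched as a small, self-contained example. This is not Spark's actual catalog code; the two-phase structure (validate everything, then mutate) is the point:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;

public class DropPartitions {
    /**
     * Drop the requested partitions, but only after verifying that every
     * one of them exists. This guarantees the command never leaves the
     * table half-dropped when a spec is wrong.
     */
    static List<String> dropAll(Set<String> existing, List<String> toDrop) {
        // Phase 1: verify every requested partition exists BEFORE mutating anything.
        for (String p : toDrop) {
            if (!existing.contains(p)) {
                throw new NoSuchElementException(
                    "partition " + p + " does not exist; nothing was dropped");
            }
        }
        // Phase 2: drop, remembering what was dropped so a later failure
        // can be reported precisely.
        List<String> dropped = new ArrayList<>();
        for (String p : toDrop) {
            existing.remove(p);
            dropped.add(p);
        }
        return dropped;
    }

    public static void main(String[] args) {
        Set<String> parts = new HashSet<>(Arrays.asList("ds=1", "ds=2"));
        try {
            dropAll(parts, Arrays.asList("ds=1", "ds=3"));
        } catch (NoSuchElementException e) {
            // Both partitions survive: no partial drop happened.
            System.out.println(parts.size());
        }
    }
}
```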
[jira] [Updated] (SPARK-14684) Verification of partition specs in SessionCatalog
[ https://issues.apache.org/jira/browse/SPARK-14684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-14684: -- Description: When attempting to drop partitions of a table, if the user provides an unknown column, Hive will drop all the partitions of the table, which is likely not intended. E.g. {code} ALTER TABLE my_tab DROP PARTITION (ds='2008-04-09', unknownCol='12') {code} We should verify that the columns provided in the specs are actually partition columns. was: When users input an invalid partition spec, we might not catch it and issue an error message. Sometimes this can have disastrous results. For example, previously, when we altered a table and dropped a partition with an invalid spec, it could drop all the partitions due to a bug/defect in the Hive Metastore API. > Verification of partition specs in SessionCatalog > - > > Key: SPARK-14684 > URL: https://issues.apache.org/jira/browse/SPARK-14684 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li >Assignee: Xiao Li > > When attempting to drop partitions of a table, if the user provides an > unknown column, Hive will drop all the partitions of the table, which is > likely not intended. E.g. > {code} > ALTER TABLE my_tab DROP PARTITION (ds='2008-04-09', unknownCol='12') > {code} > We should verify that the columns provided in the specs are actually > partition columns.
[jira] [Commented] (SPARK-15289) SQL test compilation error from merge conflict
[ https://issues.apache.org/jira/browse/SPARK-15289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281678#comment-15281678 ] Andrew Or commented on SPARK-15289: --- Done, thanks for the ping. > SQL test compilation error from merge conflict > -- > > Key: SPARK-15289 > URL: https://issues.apache.org/jira/browse/SPARK-15289 > Project: Spark > Issue Type: Bug > Components: Build, SQL >Affects Versions: 2.0.0 >Reporter: Piotr Milanowski >Assignee: Andrew Or >Priority: Blocker > Fix For: 2.0.0 > > > Spark build fails during SQL build. Concerns commit > 6b69b8c0c778f4cba2b281fe3ad225dc922f82d6, but also earlier ones; build works > e.g. for commit c6d23b6604e85bcddbd1fb6a2c1c3edbfd2be2c1. > Run with command: > ./dev/make-distribution.sh -Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver > -Dhadoop.version=2.6.0 -DskipTests > Result: > {code} > [error] > /home/bpol0421/various/spark/sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:282: > not found: value sparkSession > [error] val dbString = CatalogImpl.makeDataset(Seq(db), > sparkSession).showString(10) > [error] ^ > [error] > /home/bpol0421/various/spark/sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:283: > not found: value sparkSession > [error] val tableString = CatalogImpl.makeDataset(Seq(table), > sparkSession).showString(10) > [error] ^ > [error] > /home/bpol0421/various/spark/sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:284: > not found: value sparkSession > [error] val functionString = CatalogImpl.makeDataset(Seq(function), > sparkSession).showString(10) > [error] ^ > [error] > /home/bpol0421/various/spark/sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:285: > not found: value sparkSession > [error] val columnString = CatalogImpl.makeDataset(Seq(column), > sparkSession).showString(10) > [error] ^ > {code}
[jira] [Resolved] (SPARK-15289) SQL test compilation error from merge conflict
[ https://issues.apache.org/jira/browse/SPARK-15289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15289. --- Resolution: Fixed Fix Version/s: 2.0.0 > SQL test compilation error from merge conflict > -- > > Key: SPARK-15289 > URL: https://issues.apache.org/jira/browse/SPARK-15289 > Project: Spark > Issue Type: Bug > Components: Build, SQL >Affects Versions: 2.0.0 >Reporter: Piotr Milanowski >Assignee: Andrew Or >Priority: Blocker > Fix For: 2.0.0 > > > Spark build fails during SQL build. Concerns commit > 6b69b8c0c778f4cba2b281fe3ad225dc922f82d6, but also earlier ones; build works > e.g. for commit c6d23b6604e85bcddbd1fb6a2c1c3edbfd2be2c1. > Run with command: > ./dev/make-distribution.sh -Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver > -Dhadoop.version=2.6.0 -DskipTests > Result: > {code} > [error] > /home/bpol0421/various/spark/sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:282: > not found: value sparkSession > [error] val dbString = CatalogImpl.makeDataset(Seq(db), > sparkSession).showString(10) > [error] ^ > [error] > /home/bpol0421/various/spark/sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:283: > not found: value sparkSession > [error] val tableString = CatalogImpl.makeDataset(Seq(table), > sparkSession).showString(10) > [error] ^ > [error] > /home/bpol0421/various/spark/sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:284: > not found: value sparkSession > [error] val functionString = CatalogImpl.makeDataset(Seq(function), > sparkSession).showString(10) > [error] ^ > [error] > /home/bpol0421/various/spark/sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:285: > not found: value sparkSession > [error] val columnString = CatalogImpl.makeDataset(Seq(column), > sparkSession).showString(10) > [error] ^ > {code}
[jira] [Resolved] (SPARK-15264) Spark 2.0 CSV Reader: NPE on Blank Column Names
[ https://issues.apache.org/jira/browse/SPARK-15264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15264. --- Resolution: Fixed Fix Version/s: 2.0.0 > Spark 2.0 CSV Reader: NPE on Blank Column Names > --- > > Key: SPARK-15264 > URL: https://issues.apache.org/jira/browse/SPARK-15264 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Bill Chambers >Assignee: Bill Chambers > Fix For: 2.0.0 > > > When you read in a CSV file that starts with blank column names, the read > fails if you specify that you want a header. > Pull request coming shortly.
[jira] [Resolved] (SPARK-15274) CSV default column names should be consistent
[ https://issues.apache.org/jira/browse/SPARK-15274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15274. --- Resolution: Fixed Fix Version/s: 2.0.0 > CSV default column names should be consistent > - > > Key: SPARK-15274 > URL: https://issues.apache.org/jira/browse/SPARK-15274 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Andrew Or >Assignee: Bill Chambers > Fix For: 2.0.0 > > > If a column name is not provided, Spark SQL usually uses the convention > "_c0", "_c1" etc., but when reading in CSV files without headers, we use "C0" > and "C1". This is inconsistent and we should fix it by Spark 2.0.
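The convention the issue asks the CSV reader to follow can be shown with a tiny sketch. This is illustrative, not Spark's code; it just generates default names in the `_c0`, `_c1`, ... style that Spark SQL uses elsewhere:

```java
public class DefaultColumnNames {
    /** Default names for unnamed columns in the _c0, _c1, ... convention. */
    static String[] defaultNames(int n) {
        String[] names = new String[n];
        for (int i = 0; i < n; i++) {
            names[i] = "_c" + i;
        }
        return names;
    }

    public static void main(String[] args) {
        // A headerless three-column CSV would get these names.
        System.out.println(String.join(",", defaultNames(3))); // _c0,_c1,_c2
    }
}
```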
[jira] [Resolved] (SPARK-15276) CREATE TABLE with LOCATION should imply EXTERNAL
[ https://issues.apache.org/jira/browse/SPARK-15276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15276. --- Resolution: Fixed Fix Version/s: 2.0.0 > CREATE TABLE with LOCATION should imply EXTERNAL > > > Key: SPARK-15276 > URL: https://issues.apache.org/jira/browse/SPARK-15276 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Andrew Or >Assignee: Andrew Or > Fix For: 2.0.0 > > > If the user runs `CREATE TABLE some_table ... LOCATION /some/path`, then this > will still be a managed table even though the table's data is stored at > /some/path. The problem is that when we drop the table we'll also delete the > data at /some/path. This could cause problems if /some/path contains existing > data.
[jira] [Created] (SPARK-15279) Disallow ROW FORMAT and STORED AS (parquet | orc | avro etc.)
Andrew Or created SPARK-15279: - Summary: Disallow ROW FORMAT and STORED AS (parquet | orc | avro etc.) Key: SPARK-15279 URL: https://issues.apache.org/jira/browse/SPARK-15279 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: Andrew Or Assignee: Andrew Or They are both potentially conflicting ways that allow you to specify the SerDe. Unfortunately, we can't just get rid of ROW FORMAT because it may be used with TEXTFILE or RCFILE. For other file formats, we should fail fast wherever possible.
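The fail-fast behavior the issue proposes can be sketched as a standalone validation. This is an assumption-laden illustration, not Spark's parser: the method name, the exact set of self-describing formats, and the error wording are all made up for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class SerdeConflictCheck {
    // File formats whose SerDe is fixed by the format itself (illustrative set).
    private static final Set<String> SELF_DESCRIBING =
        new HashSet<>(Arrays.asList("parquet", "orc", "avro"));

    /**
     * Fail fast when ROW FORMAT is combined with a file format that already
     * determines its own SerDe. ROW FORMAT stays legal for TEXTFILE/RCFILE.
     */
    static void check(String storedAs, boolean hasRowFormat) {
        if (hasRowFormat && SELF_DESCRIBING.contains(storedAs.toLowerCase(Locale.ROOT))) {
            throw new IllegalArgumentException(
                "ROW FORMAT is not allowed with STORED AS " + storedAs +
                "; it is only meaningful for formats like TEXTFILE or RCFILE");
        }
    }

    public static void main(String[] args) {
        check("textfile", true);  // fine: ROW FORMAT + TEXTFILE is legal
        try {
            check("parquet", true);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```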
[jira] [Updated] (SPARK-15275) CatalogTable should store sort ordering for sorted columns
[ https://issues.apache.org/jira/browse/SPARK-15275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-15275: -- Summary: CatalogTable should store sort ordering for sorted columns (was: [SQL] CatalogTable should store sort ordering for sorted columns) > CatalogTable should store sort ordering for sorted columns > -- > > Key: SPARK-15275 > URL: https://issues.apache.org/jira/browse/SPARK-15275 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Tejas Patil >Priority: Trivial > > For bucketed tables in Hive, one can also add constraint about column > sortedness along with ordering. > As per the spec in [0], CREATE TABLE statement can allow SORT ordering as > well: > [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], > ...)] INTO num_buckets BUCKETS] > See [1] for example. > [0] : > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable > [1] : > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-BucketedSortedTables > Currently CatalogTable does not store any information about the sort ordering > and just has the names of the sorted columns.
[jira] [Updated] (SPARK-15275) CatalogTable should store sort ordering for sorted columns
[ https://issues.apache.org/jira/browse/SPARK-15275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-15275: -- Priority: Major (was: Trivial) > CatalogTable should store sort ordering for sorted columns > -- > > Key: SPARK-15275 > URL: https://issues.apache.org/jira/browse/SPARK-15275 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Tejas Patil >Assignee: Tejas Patil > > For bucketed tables in Hive, one can also add constraint about column > sortedness along with ordering. > As per the spec in [0], CREATE TABLE statement can allow SORT ordering as > well: > [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], > ...)] INTO num_buckets BUCKETS] > See [1] for example. > [0] : > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable > [1] : > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-BucketedSortedTables > Currently CatalogTable does not store any information about the sort ordering > and just has the names of the sorted columns.
[jira] [Updated] (SPARK-15275) CatalogTable should store sort ordering for sorted columns
[ https://issues.apache.org/jira/browse/SPARK-15275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-15275: -- Assignee: Tejas Patil > CatalogTable should store sort ordering for sorted columns > -- > > Key: SPARK-15275 > URL: https://issues.apache.org/jira/browse/SPARK-15275 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Tejas Patil >Assignee: Tejas Patil >Priority: Trivial > > For bucketed tables in Hive, one can also add constraint about column > sortedness along with ordering. > As per the spec in [0], CREATE TABLE statement can allow SORT ordering as > well: > [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], > ...)] INTO num_buckets BUCKETS] > See [1] for example. > [0] : > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable > [1] : > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-BucketedSortedTables > Currently CatalogTable does not store any information about the sort ordering > and just has the names of the sorted columns.
[jira] [Updated] (SPARK-15275) CatalogTable should store sort ordering for sorted columns
[ https://issues.apache.org/jira/browse/SPARK-15275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-15275: -- Affects Version/s: (was: 1.6.1) 2.0.0 > CatalogTable should store sort ordering for sorted columns > -- > > Key: SPARK-15275 > URL: https://issues.apache.org/jira/browse/SPARK-15275 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Tejas Patil >Assignee: Tejas Patil > > For bucketed tables in Hive, one can also add constraint about column > sortedness along with ordering. > As per the spec in [0], CREATE TABLE statement can allow SORT ordering as > well: > [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], > ...)] INTO num_buckets BUCKETS] > See [1] for example. > [0] : > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable > [1] : > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-BucketedSortedTables > Currently CatalogTable does not store any information about the sort ordering > and just has the names of the sorted columns.
[jira] [Created] (SPARK-15276) CREATE TABLE with LOCATION should imply EXTERNAL
Andrew Or created SPARK-15276: - Summary: CREATE TABLE with LOCATION should imply EXTERNAL Key: SPARK-15276 URL: https://issues.apache.org/jira/browse/SPARK-15276 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: Andrew Or Assignee: Andrew Or If the user runs `CREATE TABLE some_table ... LOCATION /some/path`, then this will still be a managed table even though the table's data is stored at /some/path. The problem is that when we drop the table we'll also delete the data at /some/path. This could cause problems if /some/path contains existing data.
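The rule this issue proposes can be captured in a one-line decision. The sketch below is illustrative only (not Spark's catalog code): a user-supplied LOCATION forces the table to be external, so dropping it never deletes pre-existing data at that path.

```java
public class TableTypeDecision {
    enum TableType { MANAGED, EXTERNAL }

    /**
     * Proposed rule: the table is external if the user wrote EXTERNAL
     * explicitly OR supplied a LOCATION clause. Only managed tables have
     * their data deleted on DROP TABLE.
     */
    static TableType decide(boolean externalKeyword, boolean hasLocation) {
        return (externalKeyword || hasLocation) ? TableType.EXTERNAL : TableType.MANAGED;
    }

    public static void main(String[] args) {
        // CREATE TABLE t ... LOCATION '/some/path' -> EXTERNAL under the proposal.
        System.out.println(decide(false, true)); // EXTERNAL
        // Plain CREATE TABLE t ... -> MANAGED as before.
        System.out.println(decide(false, false)); // MANAGED
    }
}
```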
[jira] [Updated] (SPARK-15264) Spark 2.0 CSV Reader: Error on Blank Column Names
[ https://issues.apache.org/jira/browse/SPARK-15264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-15264: -- Assignee: Bill Chambers > Spark 2.0 CSV Reader: Error on Blank Column Names > - > > Key: SPARK-15264 > URL: https://issues.apache.org/jira/browse/SPARK-15264 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Bill Chambers >Assignee: Bill Chambers > > When you read in a CSV file that starts with blank column names, the read > fails if you specify that you want a header. > Pull request coming shortly.
[jira] [Updated] (SPARK-15264) Spark 2.0 CSV Reader: NPE on Blank Column Names
[ https://issues.apache.org/jira/browse/SPARK-15264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-15264: -- Summary: Spark 2.0 CSV Reader: NPE on Blank Column Names (was: Spark 2.0 CSV Reader: Error on Blank Column Names) > Spark 2.0 CSV Reader: NPE on Blank Column Names > --- > > Key: SPARK-15264 > URL: https://issues.apache.org/jira/browse/SPARK-15264 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Bill Chambers >Assignee: Bill Chambers > > When you read in a CSV file that starts with blank column names, the read > fails if you specify that you want a header. > Pull request coming shortly.
[jira] [Updated] (SPARK-15264) Spark 2.0 CSV Reader: Error on Blank Column Names
[ https://issues.apache.org/jira/browse/SPARK-15264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-15264: -- Target Version/s: 2.0.0 > Spark 2.0 CSV Reader: Error on Blank Column Names > - > > Key: SPARK-15264 > URL: https://issues.apache.org/jira/browse/SPARK-15264 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Bill Chambers >Assignee: Bill Chambers > > When you read in a CSV file that starts with blank column names, the read > fails if you specify that you want a header. > Pull request coming shortly.
[jira] [Updated] (SPARK-15274) CSV default column names should be consistent
[ https://issues.apache.org/jira/browse/SPARK-15274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-15274: -- Assignee: Bill Chambers > CSV default column names should be consistent > - > > Key: SPARK-15274 > URL: https://issues.apache.org/jira/browse/SPARK-15274 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Andrew Or >Assignee: Bill Chambers > > If a column name is not provided, Spark SQL usually uses the convention > "_c0", "_c1" etc., but when reading in CSV files without headers, we use "C0" > and "C1". This is inconsistent and we should fix it by Spark 2.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15274) CSV default column names should be consistent
Andrew Or created SPARK-15274: - Summary: CSV default column names should be consistent Key: SPARK-15274 URL: https://issues.apache.org/jira/browse/SPARK-15274 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: Andrew Or If a column name is not provided, Spark SQL usually uses the convention "_c0", "_c1" etc., but when reading in CSV files without headers, we use "C0" and "C1". This is inconsistent and we should fix it by Spark 2.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
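The inconsistency above is purely a naming-convention mismatch. As a minimal Python sketch of the convention the issue asks for (the function name is hypothetical, not from Spark's code):

```python
def default_column_names(n: int, prefix: str = "_c") -> list:
    """Generate default column names following the Spark SQL
    convention ("_c0", "_c1", ...) rather than the old CSV
    reader's "C0", "C1" style."""
    return ["{}{}".format(prefix, i) for i in range(n)]

print(default_column_names(3))  # ['_c0', '_c1', '_c2']
```

With the fix, the CSV reader would fall back to the same `_c0`-style names the rest of Spark SQL uses when no header is present.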
[jira] [Commented] (SPARK-13566) Deadlock between MemoryStore and BlockManager
[ https://issues.apache.org/jira/browse/SPARK-13566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280615#comment-15280615 ] Andrew Or commented on SPARK-13566: --- [~ekeddy] This only happens with the unified memory manager, so you could switch back to the static memory manager by setting `spark.memory.useLegacyMode` to true. You may observe a decrease in performance if you do that, however. > Deadlock between MemoryStore and BlockManager > - > > Key: SPARK-13566 > URL: https://issues.apache.org/jira/browse/SPARK-13566 > Project: Spark > Issue Type: Bug > Components: Block Manager, Spark Core >Affects Versions: 1.6.0 > Environment: Spark 1.6.0 hadoop2.2.0 jdk1.8.0_65 centOs 6.2 >Reporter: cen yuhai >Assignee: cen yuhai > Fix For: 1.6.2 > > > === > "block-manager-slave-async-thread-pool-1": > at org.apache.spark.storage.MemoryStore.remove(MemoryStore.scala:216) > - waiting to lock <0x0005895b09b0> (a > org.apache.spark.memory.UnifiedMemoryManager) > at > org.apache.spark.storage.BlockManager.removeBlock(BlockManager.scala:1114) > - locked <0x00058ed6aae0> (a org.apache.spark.storage.BlockInfo) > at > org.apache.spark.storage.BlockManager$$anonfun$removeBroadcast$2.apply(BlockManager.scala:1101) > at > org.apache.spark.storage.BlockManager$$anonfun$removeBroadcast$2.apply(BlockManager.scala:1101) > at scala.collection.immutable.Set$Set2.foreach(Set.scala:94) > at > org.apache.spark.storage.BlockManager.removeBroadcast(BlockManager.scala:1101) > at > org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply$mcI$sp(BlockManagerSlaveEndpoint.scala:65) > at > org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply(BlockManagerSlaveEndpoint.scala:65) > at > org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply(BlockManagerSlaveEndpoint.scala:65) > at > 
org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$1.apply(BlockManagerSlaveEndpoint.scala:84) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > "Executor task launch worker-10": > at > org.apache.spark.storage.BlockManager.dropFromMemory(BlockManager.scala:1032) > - waiting to lock <0x00059a0988b8> (a > org.apache.spark.storage.BlockInfo) > at > org.apache.spark.storage.BlockManager.dropFromMemory(BlockManager.scala:1009) > at > org.apache.spark.storage.MemoryStore$$anonfun$evictBlocksToFreeSpace$2.apply(MemoryStore.scala:460) > at > org.apache.spark.storage.MemoryStore$$anonfun$evictBlocksToFreeSpace$2.apply(MemoryStore.scala:449) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
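The workaround suggested in the comment above is a single configuration flag. A sketch of how it might be passed at submit time (the application class and jar name are placeholders):

```shell
# Revert to the pre-1.6 static memory manager to avoid the
# MemoryStore/BlockManager deadlock, at a possible performance cost.
spark-submit \
  --conf spark.memory.useLegacyMode=true \
  --class com.example.MyApp \
  my-app.jar
```

The same setting can equally be placed in `spark-defaults.conf`.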
[jira] [Updated] (SPARK-15262) race condition in killing an executor and reregistering an executor
[ https://issues.apache.org/jira/browse/SPARK-15262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-15262: -- Target Version/s: 1.6.2, 2.0.0 > race condition in killing an executor and reregistering an executor > --- > > Key: SPARK-15262 > URL: https://issues.apache.org/jira/browse/SPARK-15262 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.1 >Reporter: Shixiong Zhu > > There is a race condition when killing an executor and reregistering an > executor happen at the same time. Here are the steps to reproduce it. > 1. the master finds a worker is dead > 2. the master tells the driver to remove the executor > 3. the driver removes the executor > 4. BlockManagerMasterEndpoint removes the block manager > 5. the executor finds it's not registered via heartbeat > 6. the executor sends a reregister-block-manager request > 7. the block manager is registered > 8. the executor is killed by the worker > 9. CoarseGrainedSchedulerBackend ignores onDisconnected as this address is > not in the executor list > 10. BlockManagerMasterEndpoint.blockManagerInfo contains dead block managers > As BlockManagerMasterEndpoint.blockManagerInfo contains some dead block > managers, when we unpersist an RDD, remove a broadcast, or clean a shuffle > block via an RPC endpoint of a dead block manager, we will get > ClosedChannelException. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-15249) Use FunctionResource instead of (String, String) in CreateFunction and CatalogFunction for resource
[ https://issues.apache.org/jira/browse/SPARK-15249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15249. --- Resolution: Fixed Assignee: Sandeep Singh Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Use FunctionResource instead of (String, String) in CreateFunction and > CatalogFunction for resource > --- > > Key: SPARK-15249 > URL: https://issues.apache.org/jira/browse/SPARK-15249 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Sandeep Singh >Assignee: Sandeep Singh >Priority: Minor > Fix For: 2.0.0 > > > Use FunctionResource instead of (String, String) in CreateFunction and > CatalogFunction for resource > see: TODO's here > https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala#L36 > https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala#L42 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15257) Require CREATE EXTERNAL TABLE to specify LOCATION
Andrew Or created SPARK-15257: - Summary: Require CREATE EXTERNAL TABLE to specify LOCATION Key: SPARK-15257 URL: https://issues.apache.org/jira/browse/SPARK-15257 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: Andrew Or Assignee: Andrew Or Right now when the user runs `CREATE EXTERNAL TABLE` without specifying `LOCATION`, the table will still be created in the warehouse directory, but its metadata won't be deleted even when the user drops the table! This is a problem. We should require the user to also specify `LOCATION`. Note: This doesn't apply to `CREATE EXTERNAL TABLE ... USING`, which is not yet supported. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
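Under the proposed rule, every external table would carry an explicit location. A sketch of the accepted and rejected forms (the table name, schema, and path are placeholders):

```sql
-- Accepted: external table with an explicit LOCATION
CREATE EXTERNAL TABLE my_table (id INT, name STRING)
LOCATION '/user/hive/warehouse/my_table';

-- Rejected under this proposal: no LOCATION given, so the table
-- would silently land in the warehouse directory
CREATE EXTERNAL TABLE my_table (id INT, name STRING);
```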
[jira] [Updated] (SPARK-14857) Table/Database Name Validation in SessionCatalog
[ https://issues.apache.org/jira/browse/SPARK-14857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-14857: -- Assignee: Xiao Li > Table/Database Name Validation in SessionCatalog > > > Key: SPARK-14857 > URL: https://issues.apache.org/jira/browse/SPARK-14857 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li >Assignee: Xiao Li > > We need to validate the database/table names before storing this information > in `ExternalCatalog`. > For example, if users use backticks to quote table/database names > containing illegal characters, these names are allowed by the Spark parser, but > the Hive metastore does not allow them. We need to catch them in SessionCatalog > and issue an appropriate error message. > ``` > CREATE TABLE `tab:1` ... > ``` > This PR enforces the name rules of Spark SQL for `table`/`database`/`view`: > names may only contain alphanumeric and underscore characters. Unlike > Hive, we allow names starting with underscore characters. > The validation of function/column names will be done in a separate JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
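The rule stated above (alphanumeric and underscore only, leading underscores allowed) can be sketched as a small Python validator; the function name and regex are illustrative, not Spark's actual implementation:

```python
import re

# Names may contain only alphanumeric and underscore characters;
# unlike Hive, a leading underscore is allowed.
_NAME_RE = re.compile(r"[A-Za-z0-9_]+")

def is_valid_name(name: str) -> bool:
    """Return True if `name` is a legal table/database name under the rule."""
    return _NAME_RE.fullmatch(name) is not None

print(is_valid_name("_tab1"))  # True: leading underscore is fine
print(is_valid_name("tab:1"))  # False: ':' is an illegal character
```

A name like `` `tab:1` `` would thus be rejected in SessionCatalog even though the parser accepts the backtick-quoted form.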
[jira] [Resolved] (SPARK-14603) SessionCatalog needs to check if a metadata operation is valid
[ https://issues.apache.org/jira/browse/SPARK-14603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14603. --- Resolution: Fixed Assignee: Xiao Li Fix Version/s: 2.0.0 > SessionCatalog needs to check if a metadata operation is valid > -- > > Key: SPARK-14603 > URL: https://issues.apache.org/jira/browse/SPARK-14603 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai >Assignee: Xiao Li >Priority: Critical > Fix For: 2.0.0 > > > Since we cannot really trust if the underlying external catalog can throw > exceptions when there is an invalid metadata operation, let's do it in > SessionCatalog. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14684) Verification of partition specs in SessionCatalog
[ https://issues.apache.org/jira/browse/SPARK-14684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-14684: -- Assignee: Xiao Li > Verification of partition specs in SessionCatalog > - > > Key: SPARK-14684 > URL: https://issues.apache.org/jira/browse/SPARK-14684 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li >Assignee: Xiao Li > > When users input an invalid partition spec, we might not catch it and issue > an error message. Sometimes this can have disastrous results. > For example, previously, altering a table and dropping a partition with an > invalid spec could drop all the partitions, due to a defect in the Hive > Metastore API. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15037) Use SparkSession instead of SQLContext in testsuites
[ https://issues.apache.org/jira/browse/SPARK-15037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-15037: -- Component/s: SQL > Use SparkSession instead of SQLContext in testsuites > > > Key: SPARK-15037 > URL: https://issues.apache.org/jira/browse/SPARK-15037 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Reporter: Dongjoon Hyun >Assignee: Sandeep Singh > Fix For: 2.0.0 > > > This issue aims to update the existing testsuites to use `SparkSession` > instead of `SQLContext` since `SQLContext` exists just for backward > compatibility. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15037) Use SparkSession instead of SQLContext in testsuites
[ https://issues.apache.org/jira/browse/SPARK-15037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-15037: -- Component/s: Tests > Use SparkSession instead of SQLContext in testsuites > > > Key: SPARK-15037 > URL: https://issues.apache.org/jira/browse/SPARK-15037 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Reporter: Dongjoon Hyun >Assignee: Sandeep Singh > Fix For: 2.0.0 > > > This issue aims to update the existing testsuites to use `SparkSession` > instead of `SQLContext` since `SQLContext` exists just for backward > compatibility. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-15037) Use SparkSession instead of SQLContext in testsuites
[ https://issues.apache.org/jira/browse/SPARK-15037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15037. --- Resolution: Fixed Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Use SparkSession instead of SQLContext in testsuites > > > Key: SPARK-15037 > URL: https://issues.apache.org/jira/browse/SPARK-15037 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Reporter: Dongjoon Hyun >Assignee: Sandeep Singh > Fix For: 2.0.0 > > > This issue aims to update the existing testsuites to use `SparkSession` > instead of `SQLContext` since `SQLContext` exists just for backward > compatibility. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15236) No way to disable Hive support in REPL
[ https://issues.apache.org/jira/browse/SPARK-15236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-15236: -- Component/s: Spark Shell > No way to disable Hive support in REPL > -- > > Key: SPARK-15236 > URL: https://issues.apache.org/jira/browse/SPARK-15236 > Project: Spark > Issue Type: Bug > Components: Spark Shell, SQL >Affects Versions: 2.0.0 >Reporter: Andrew Or > > If you built Spark with Hive classes, there's no switch to flip to start a > new `spark-shell` using the InMemoryCatalog. The only thing you can do now is > to rebuild Spark again. That is quite inconvenient. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15236) No way to disable Hive support in REPL
[ https://issues.apache.org/jira/browse/SPARK-15236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-15236: -- Assignee: (was: Andrew Or) > No way to disable Hive support in REPL > -- > > Key: SPARK-15236 > URL: https://issues.apache.org/jira/browse/SPARK-15236 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Andrew Or > > If you built Spark with Hive classes, there's no switch to flip to start a > new `spark-shell` using the InMemoryCatalog. The only thing you can do now is > to rebuild Spark again. That is quite inconvenient. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15236) No way to disable Hive support in REPL
Andrew Or created SPARK-15236: - Summary: No way to disable Hive support in REPL Key: SPARK-15236 URL: https://issues.apache.org/jira/browse/SPARK-15236 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: Andrew Or Assignee: Andrew Or If you built Spark with Hive classes, there's no switch to flip to start a new `spark-shell` using the InMemoryCatalog. The only thing you can do now is to rebuild Spark again. That is quite inconvenient. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-15234) spark.catalog.listDatabases.show() is not formatted correctly
[ https://issues.apache.org/jira/browse/SPARK-15234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or reassigned SPARK-15234: - Assignee: Andrew Or > spark.catalog.listDatabases.show() is not formatted correctly > - > > Key: SPARK-15234 > URL: https://issues.apache.org/jira/browse/SPARK-15234 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Andrew Or >Assignee: Andrew Or > > {code} > scala> spark.catalog.listDatabases.show() > ++---+---+ > |name|description|locationUri| > ++---+---+ > |Database[name='de...| > |Database[name='my...| > |Database[name='so...| > ++---+---+ > {code} > It's because org.apache.spark.sql.catalog.Database is not a case class! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15234) spark.catalog.listDatabases.show() is not formatted correctly
Andrew Or created SPARK-15234: - Summary: spark.catalog.listDatabases.show() is not formatted correctly Key: SPARK-15234 URL: https://issues.apache.org/jira/browse/SPARK-15234 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: Andrew Or {code} scala> spark.catalog.listDatabases.show() ++---+---+ |name|description|locationUri| ++---+---+ |Database[name='de...| |Database[name='my...| |Database[name='so...| ++---+---+ {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15234) spark.catalog.listDatabases.show() is not formatted correctly
[ https://issues.apache.org/jira/browse/SPARK-15234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-15234: -- Description: {code} scala> spark.catalog.listDatabases.show() ++---+---+ |name|description|locationUri| ++---+---+ |Database[name='de...| |Database[name='my...| |Database[name='so...| ++---+---+ {code} It's because org.apache.spark.sql.catalog.Database is not a case class! was: {code} scala> spark.catalog.listDatabases.show() ++---+---+ |name|description|locationUri| ++---+---+ |Database[name='de...| |Database[name='my...| |Database[name='so...| ++---+---+ {code} > spark.catalog.listDatabases.show() is not formatted correctly > - > > Key: SPARK-15234 > URL: https://issues.apache.org/jira/browse/SPARK-15234 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Andrew Or > > {code} > scala> spark.catalog.listDatabases.show() > ++---+---+ > |name|description|locationUri| > ++---+---+ > |Database[name='de...| > |Database[name='my...| > |Database[name='so...| > ++---+---+ > {code} > It's because org.apache.spark.sql.catalog.Database is not a case class! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
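The root cause noted above (Database is not a case class, so `show()` only sees its opaque `toString`) can be illustrated with a rough Python analogy: a structured record type exposes its fields so a table printer can split them into columns, while a plain class only offers one string. The class and field names mirror the JIRA output but this is not Spark code:

```python
from dataclasses import dataclass, fields, astuple

@dataclass
class Database:
    # Fields correspond to the column headers shown by listDatabases.show()
    name: str
    description: str
    locationUri: str

db = Database("default", "default database", "file:/tmp/warehouse")
print([f.name for f in fields(db)])  # ['name', 'description', 'locationUri']
print(astuple(db))                   # one well-formed row per record
```

With only a plain class, the printer would have nothing but `str(db)` to put in the first column, which is exactly the truncated `Database[name='de...` output in the report.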
[jira] [Commented] (SPARK-14021) Support custom context derived from HiveContext for SparkSQLEnv
[ https://issues.apache.org/jira/browse/SPARK-14021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276820#comment-15276820 ] Andrew Or commented on SPARK-14021: --- Closing as Won't Fix because the issue is outdated after HiveContext was removed. > Support custom context derived from HiveContext for SparkSQLEnv > --- > > Key: SPARK-14021 > URL: https://issues.apache.org/jira/browse/SPARK-14021 > Project: Spark > Issue Type: New Feature > Components: SQL >Reporter: Adrian Wang > > This is to create a custom context for the commands bin/spark-sql and > sbin/start-thriftserver. Any context derived from HiveContext is > acceptable. Users need to configure the class name of the custom context in > the config spark.sql.context.class, and make sure the class is on the classpath. This > provides a more elegant way to make custom configurations and changes for the > infrastructure team. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14021) Support custom context derived from HiveContext for SparkSQLEnv
[ https://issues.apache.org/jira/browse/SPARK-14021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14021. --- Resolution: Won't Fix > Support custom context derived from HiveContext for SparkSQLEnv > --- > > Key: SPARK-14021 > URL: https://issues.apache.org/jira/browse/SPARK-14021 > Project: Spark > Issue Type: New Feature > Components: SQL >Reporter: Adrian Wang > > This is to create a custom context for the commands bin/spark-sql and > sbin/start-thriftserver. Any context derived from HiveContext is > acceptable. Users need to configure the class name of the custom context in > the config spark.sql.context.class, and make sure the class is on the classpath. This > provides a more elegant way to make custom configurations and changes for the > infrastructure team. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-10653) Remove unnecessary things from SparkEnv
[ https://issues.apache.org/jira/browse/SPARK-10653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-10653. --- Resolution: Fixed Assignee: Alex Bozarth Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Remove unnecessary things from SparkEnv > --- > > Key: SPARK-10653 > URL: https://issues.apache.org/jira/browse/SPARK-10653 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0 >Reporter: Andrew Or >Assignee: Alex Bozarth > Fix For: 2.0.0 > > > As of the writing of this message, there are at least two things that can be > removed from it: > {code} > @DeveloperApi > class SparkEnv ( > val executorId: String, > private[spark] val rpcEnv: RpcEnv, > val serializer: Serializer, > val closureSerializer: Serializer, > val cacheManager: CacheManager, > val mapOutputTracker: MapOutputTracker, > val shuffleManager: ShuffleManager, > val broadcastManager: BroadcastManager, > val blockTransferService: BlockTransferService, // this one can go > val blockManager: BlockManager, > val securityManager: SecurityManager, > val httpFileServer: HttpFileServer, > val sparkFilesDir: String, // this one maybe? It's only used in 1 place. > val metricsSystem: MetricsSystem, > val shuffleMemoryManager: ShuffleMemoryManager, > val executorMemoryManager: ExecutorMemoryManager, // this can go > val outputCommitCoordinator: OutputCommitCoordinator, > val conf: SparkConf) extends Logging { > ... > } > {code} > We should avoid adding to this infinite list of things in SparkEnv's > constructors if they're not needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-15210) Add missing @DeveloperApi annotation in sql.types
[ https://issues.apache.org/jira/browse/SPARK-15210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15210. --- Resolution: Fixed Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Add missing @DeveloperApi annotation in sql.types > - > > Key: SPARK-15210 > URL: https://issues.apache.org/jira/browse/SPARK-15210 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: zhengruifeng >Assignee: zhengruifeng >Priority: Trivial > Fix For: 2.0.0 > > > @DeveloperApi annotation for {{AbstractDataType}} {{MapType}} > {{UserDefinedType}} are missing. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-15166) Move hive-specific conf setting from SparkSession
[ https://issues.apache.org/jira/browse/SPARK-15166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15166. --- Resolution: Fixed Fix Version/s: 2.0.0 > Move hive-specific conf setting from SparkSession > - > > Key: SPARK-15166 > URL: https://issues.apache.org/jira/browse/SPARK-15166 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Andrew Or >Assignee: Andrew Or >Priority: Minor > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15210) Add missing @DeveloperApi annotation in sql.types
[ https://issues.apache.org/jira/browse/SPARK-15210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-15210: -- Assignee: zhengruifeng > Add missing @DeveloperApi annotation in sql.types > - > > Key: SPARK-15210 > URL: https://issues.apache.org/jira/browse/SPARK-15210 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: zhengruifeng >Assignee: zhengruifeng >Priority: Trivial > Fix For: 2.0.0 > > > @DeveloperApi annotation for {{AbstractDataType}} {{MapType}} > {{UserDefinedType}} are missing. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-15220) Add hyperlink to "running application" and "completed application"
[ https://issues.apache.org/jira/browse/SPARK-15220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15220. --- Resolution: Fixed Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Add hyperlink to "running application" and "completed application" > -- > > Key: SPARK-15220 > URL: https://issues.apache.org/jira/browse/SPARK-15220 > Project: Spark > Issue Type: Improvement > Components: Web UI >Reporter: Mao, Wei >Priority: Minor > Fix For: 2.0.0 > > > Add hyperlinks to "running application" and "completed application", so users > can jump to the application table directly. In my environment, I set up 1000+ > workers and it's painful to scroll down to skip the worker list. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-15067) YARN executors are launched with fixed perm gen size
[ https://issues.apache.org/jira/browse/SPARK-15067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15067. --- Resolution: Fixed Assignee: Sean Owen Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > YARN executors are launched with fixed perm gen size > > > Key: SPARK-15067 > URL: https://issues.apache.org/jira/browse/SPARK-15067 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.6.0, 1.6.1 >Reporter: Renato Falchi Brandão >Assignee: Sean Owen >Priority: Minor > Fix For: 2.0.0 > > > It is impossible to change the executors' max perm gen size using the property > "spark.executor.extraJavaOptions" when you are running on YARN. > When the JVM option "-XX:MaxPermSize" is set through the property > "spark.executor.extraJavaOptions", Spark puts it properly in the shell command > that will start the JVM container but, at the end of the command, it sets > this option again using a fixed value of 256m, as you can see in the log I've > extracted: > 2016-04-30 17:20:12 INFO ExecutorRunnable:58 - > === > YARN executor launch context: > env: > CLASSPATH -> > {{PWD}}{{PWD}}/__spark__.jar$HADOOP_CONF_DIR/usr/hdp/current/hadoop-client/*/usr/hdp/current/hadoop-client/lib/*/usr/hdp/current/hadoop-hdfs-client/*/usr/hdp/current/hadoop-hdfs-client/lib/*/usr/hdp/current/hadoop-yarn-client/*/usr/hdp/current/hadoop-yarn-client/lib/*/usr/hdp/mr-framework/hadoop/share/hadoop/mapreduce/*:/usr/hdp/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:/usr/hdp/mr-framework/hadoop/share/hadoop/common/*:/usr/hdp/mr-framework/hadoop/share/hadoop/common/lib/*:/usr/hdp/mr-framework/hadoop/share/hadoop/yarn/*:/usr/hdp/mr-framework/hadoop/share/hadoop/yarn/lib/*:/usr/hdp/mr-framework/hadoop/share/hadoop/hdfs/*:/usr/hdp/mr-framework/hadoop/share/hadoop/hdfs/lib/*:/usr/hdp/current/hadoop/lib/hadoop-lzo-0.6.0.jar:/etc/hadoop/conf/secure > SPARK_LOG_URL_STDERR -> 
http://x0668sl.x.br:8042/node/containerlogs/container_1456962126505_329993_01_02/h_loadbd/stderr?start=-4096 > SPARK_YARN_STAGING_DIR -> .sparkStaging/application_1456962126505_329993 > SPARK_YARN_CACHE_FILES_FILE_SIZES -> 191719054,166 > SPARK_USER -> h_loadbd > SPARK_YARN_CACHE_FILES_VISIBILITIES -> PUBLIC,PUBLIC > SPARK_YARN_MODE -> true > SPARK_YARN_CACHE_FILES_TIME_STAMPS -> 1459806496093,1459808508343 > SPARK_LOG_URL_STDOUT -> > http://x0668sl.x.br:8042/node/containerlogs/container_1456962126505_329993_01_02/h_loadbd/stdout?start=-4096 > SPARK_YARN_CACHE_FILES -> > hdfs://x/user/datalab/hdp/spark/lib/spark-assembly-1.6.0.2.3.4.1-10-hadoop2.7.1.2.3.4.1-10.jar#__spark__.jar,hdfs://tlvcluster/user/datalab/hdp/spark/conf/hive-site.xml#hive-site.xml > command: > {{JAVA_HOME}}/bin/java -server -XX:OnOutOfMemoryError='kill %p' -Xms6144m > -Xmx6144m '-XX:+PrintGCDetails' '-XX:MaxPermSize=1024M' > '-XX:+PrintGCTimeStamps' -Djava.io.tmpdir={{PWD}}/tmp > '-Dspark.akka.timeout=30' '-Dspark.driver.port=62875' > '-Dspark.rpc.askTimeout=30' '-Dspark.rpc.lookupTimeout=30' > -Dspark.yarn.app.container.log.dir= -XX:MaxPermSize=256m > org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url > spark://CoarseGrainedScheduler@10.125.81.42:62875 --executor-id 1 --hostname > x0668sl.x.br --cores 1 --app-id application_1456962126505_329993 > --user-class-path file:$PWD/__app__.jar 1> /stdout 2> > /stderr > Analyzing the code is possible to see that all the options set in the > property "spark.executor.extraJavaOptions" are enclosed, one by one, in > single quotes (ExecutorRunnable.scala:151) before the launcher take the > decision if a default value has to be provided or not for the option > "-XX:MaxPermSize" (ExecutorRunnable.scala:202). > This decision is taken examining all the options set and looking for a string > starting with the value "-XX:MaxPermSize" (CommandBuilderUtils.java:328). If > that value is not found, the default value is set. 
> Since every option passed this way is enclosed in single quotes, a prefix > check will never find it, so the default value is always provided. > A possible solution is to change the source code of CommandBuilderUtils.java at > line 328: > From-> if (arg.startsWith("-XX:MaxPermSize=")) > To-> if (arg.indexOf("-XX:MaxPermSize=") > -1) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
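The failure mode described above (a quoted argument defeating a prefix check, while a containment check still matches) can be demonstrated with a short Python sketch; the helper function is illustrative, not the actual Java code in CommandBuilderUtils:

```python
def has_max_perm_size(args):
    """Sketch of the proposed fix: look for "-XX:MaxPermSize=" anywhere
    in each argument instead of only at its start, so options wrapped
    in single quotes (as executor args are) still match."""
    return any("-XX:MaxPermSize=" in arg for arg in args)

# ExecutorRunnable wraps each extraJavaOptions entry in single quotes:
quoted = ["'-XX:MaxPermSize=1024M'"]
print(quoted[0].startswith("-XX:MaxPermSize="))  # False: prefix check misses it
print(has_max_perm_size(quoted))                 # True: containment check finds it
```

This is why the launch command in the log carries both the user's `'-XX:MaxPermSize=1024M'` and the appended default `-XX:MaxPermSize=256m`.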