[jira] [Commented] (SPARK-15072) Remove SparkSession.withHiveSupport
[ https://issues.apache.org/jira/browse/SPARK-15072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270155#comment-15270155 ] Sagar commented on SPARK-15072: --- [~techaddict] Yes, it fails because assembly/assembly was removed. The test is ignored right now; does that mean it is no longer being considered? > Remove SparkSession.withHiveSupport > --- > > Key: SPARK-15072 > URL: https://issues.apache.org/jira/browse/SPARK-15072 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Sandeep Singh > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-15107) Allow running test cases with different iterations in micro-benchmark util
[ https://issues.apache.org/jira/browse/SPARK-15107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-15107. - Resolution: Fixed Fix Version/s: 2.0.0 > Allow running test cases with different iterations in micro-benchmark util > -- > > Key: SPARK-15107 > URL: https://issues.apache.org/jira/browse/SPARK-15107 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > Fix For: 2.0.0 > >
[jira] [Commented] (SPARK-13946) PySpark DataFrames allows you to silently use aggregate expressions derived from different table expressions
[ https://issues.apache.org/jira/browse/SPARK-13946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270151#comment-15270151 ] Niranjan Molkeri` commented on SPARK-13946: --- Hi, I ran the following code. {noformat} import numpy as np import pandas as pd from pyspark import SparkContext from pyspark.sql import SQLContext sc = SparkContext(appName="fooAPP") sqlContext = SQLContext(sc) df = pd.DataFrame({'foo': np.random.randn(100),'bar': np.random.randn(100)}) sdf = sqlContext.createDataFrame(df) sdf2 = sdf[sdf.bar > 0] #sdf.agg(F.count(sdf2.foo)).show() sdfCount = sdf.count() sdf2Count = sdf2.count() {noformat} sdf.count() returns 100, and sdf2.count() returns around 50 on average. Can you tell me what "F" refers to in {noformat} sdf.agg(F.count(sdf2.foo)).show() {noformat} so that I can test further and look into the issue? Thank you. > PySpark DataFrames allows you to silently use aggregate expressions derived > from different table expressions > > > Key: SPARK-13946 > URL: https://issues.apache.org/jira/browse/SPARK-13946 > Project: Spark > Issue Type: Bug > Components: PySpark >Reporter: Wes McKinney > > In my opinion, this code should raise an exception rather than silently > discarding the predicate: > {code} > import numpy as np > import pandas as pd > df = pd.DataFrame({'foo': np.random.randn(100), >'bar': np.random.randn(100)}) > sdf = sqlContext.createDataFrame(df) > sdf2 = sdf[sdf.bar > 0] > sdf.agg(F.count(sdf2.foo)).show() > +--+ > |count(foo)| > +--+ > | 100| > +--+ > {code}
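For context on the question above: `F` is the conventional alias for PySpark's functions module, i.e. `from pyspark.sql import functions as F`. The reported surprise is that the filter predicate is silently dropped when an expression from the filtered DataFrame is used in an aggregate on the unfiltered one. Plain pandas, which the snippet already imports, honors the filter; a minimal sketch of the expected semantics using only numpy and pandas:

```python
import numpy as np
import pandas as pd

# Build the same frame as in the report, seeded for reproducibility.
np.random.seed(0)
df = pd.DataFrame({'foo': np.random.randn(100), 'bar': np.random.randn(100)})

# Apply the predicate, then count the filtered column: pandas honors the filter.
filtered = df[df.bar > 0]
full_count = int(df['foo'].count())            # 100
filtered_count = int(filtered['foo'].count())  # roughly half, never 100 here

assert full_count == 100
assert 0 < filtered_count < 100
```

This is the behavior the issue argues Spark should either replicate or reject with an exception, rather than silently returning the unfiltered count.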
[jira] [Commented] (SPARK-14817) ML, Graph, R 2.0 QA: Programming guide update and migration guide
[ https://issues.apache.org/jira/browse/SPARK-14817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270123#comment-15270123 ] Felix Cheung commented on SPARK-14817: -- Perhaps SPARK-12071 should be included as well? > ML, Graph, R 2.0 QA: Programming guide update and migration guide > - > > Key: SPARK-14817 > URL: https://issues.apache.org/jira/browse/SPARK-14817 > Project: Spark > Issue Type: Sub-task > Components: Documentation, GraphX, ML, MLlib, SparkR >Reporter: Joseph K. Bradley > > Before the release, we need to update the MLlib, GraphX, and SparkR > Programming Guides. Updates will include: > * Add migration guide subsection. > ** Use the results of the QA audit JIRAs and [SPARK-13448]. > * Check phrasing, especially in main sections (for outdated items such as "In > this release, ...") > For MLlib, we will make the DataFrame-based API (spark.ml) front-and-center, > to make it clear the RDD-based API is the older, maintenance-mode one. > * No docs for spark.mllib will be deleted; they will just be reorganized and > put in a subsection. > * If spark.ml docs are less complete, or if spark.ml docs say "refer to the > spark.mllib docs for details," then we should copy those details to the > spark.ml docs. This per-feature work can happen under [SPARK-14815]. > * This big reorganization should be done *after* docs are added for each > feature (to minimize merge conflicts).
[jira] [Commented] (SPARK-14385) Use FunctionIdentifier in FunctionRegistry/SessionCatalog
[ https://issues.apache.org/jira/browse/SPARK-14385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270121#comment-15270121 ] Niranjan Molkeri` commented on SPARK-14385: --- Hi, I would like to take a look at this problem. Could you give me further details on how to proceed with this bug? Thank you. > Use FunctionIdentifier in FunctionRegistry/SessionCatalog > - > > Key: SPARK-14385 > URL: https://issues.apache.org/jira/browse/SPARK-14385 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Andrew Or > > Right now it's confusing what's a qualified name or not. There's little > type-safety in this corner of the code.
[jira] [Commented] (SPARK-15072) Remove SparkSession.withHiveSupport
[ https://issues.apache.org/jira/browse/SPARK-15072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270122#comment-15270122 ] Sandeep Singh commented on SPARK-15072: --- [~snanda] The first build/sbt invocation will fail because assembly/assembly was removed. Secondly, we don't need to fix this in this PR, since the test is ignored right now. > Remove SparkSession.withHiveSupport > --- > > Key: SPARK-15072 > URL: https://issues.apache.org/jira/browse/SPARK-15072 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Sandeep Singh > Fix For: 2.0.0 > >
[jira] [Commented] (SPARK-14539) Fetching delegation tokens in Hive-Thriftserver fails when hive.server2.enable.doAs = True
[ https://issues.apache.org/jira/browse/SPARK-14539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270113#comment-15270113 ] Niranjan Molkeri` commented on SPARK-14539: --- Hi, could you let me know which Hive version you are using? > Fetching delegation tokens in Hive-Thriftserver fails when > hive.server2.enable.doAs = True > -- > > Key: SPARK-14539 > URL: https://issues.apache.org/jira/browse/SPARK-14539 > Project: Spark > Issue Type: Bug >Affects Versions: 1.6.0, 1.6.1 >Reporter: Trystan Leftwich > > Similar to https://issues.apache.org/jira/browse/SPARK-13478 > When you are running Hive Thriftserver and have hive.server2.enable.doAs = > True you will get > {code} > javax.security.sasl.SaslException: GSS initiate failed [Caused by > GSSException: No valid credentials provided (Mechanism level: Failed to find > any Kerberos tgt)] > ... > {code}
[jira] [Commented] (SPARK-14813) ML 2.0 QA: API: Python API coverage
[ https://issues.apache.org/jira/browse/SPARK-14813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270107#comment-15270107 ] holdenk commented on SPARK-14813: - While starting this audit, I noticed that a number of params are missing, but I'm assuming this is expected until Spark 2.1 (see https://issues.apache.org/jira/browse/SPARK-10931 ) > ML 2.0 QA: API: Python API coverage > --- > > Key: SPARK-14813 > URL: https://issues.apache.org/jira/browse/SPARK-14813 > Project: Spark > Issue Type: Sub-task > Components: Documentation, ML, PySpark >Reporter: Joseph K. Bradley > > For new public APIs added to MLlib, we need to check the generated HTML doc > and compare the Scala & Python versions. We need to track: > * Inconsistency: Do class/method/parameter names match? > * Docs: Is the Python doc missing or just a stub? We want the Python doc to > be as complete as the Scala doc. > * API breaking changes: These should be very rare but are occasionally either > necessary (intentional) or accidental. These must be recorded and added in > the Migration Guide for this release. > ** Note: If the API change is for an Alpha/Experimental/DeveloperApi > component, please note that as well. > * Missing classes/methods/parameters: We should create to-do JIRAs for > functionality missing from Python, to be added in the next release cycle. > Please use a *separate* JIRA (linked below) for this list of to-do items. > UPDATE: This only needs to cover spark.ml since spark.mllib is going into > maintenance mode.
[jira] [Commented] (SPARK-10931) PySpark ML Models should contain Param values
[ https://issues.apache.org/jira/browse/SPARK-10931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270105#comment-15270105 ] holdenk commented on SPARK-10931: - So, just to be certain: for https://issues.apache.org/jira/browse/SPARK-14813 we won't try to resolve the params not being present in the models? > PySpark ML Models should contain Param values > - > > Key: SPARK-10931 > URL: https://issues.apache.org/jira/browse/SPARK-10931 > Project: Spark > Issue Type: Improvement > Components: ML, PySpark >Reporter: Joseph K. Bradley > > PySpark spark.ml Models are generally wrappers around Java objects and do not > even contain Param values. This JIRA is for copying the Param values from > the Estimator to the model. > This can likely be solved by modifying Estimator.fit to copy Param values, > but should also include proper unit tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
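The fix sketched in the issue description, copying Param values from the Estimator onto the Model during fit(), can be illustrated in plain Python. The class and attribute names below are hypothetical illustrations, not the actual pyspark.ml API:

```python
class Estimator:
    """Toy estimator holding user-set params, e.g. {'maxIter': 10}."""
    def __init__(self, **params):
        self.params = dict(params)

    def fit(self, data):
        model = Model()
        # The proposed fix: propagate the estimator's params to the model,
        # so the fitted model is self-describing.
        model.params = dict(self.params)
        return model


class Model:
    """Toy fitted model; starts with no params, like the reported behavior."""
    def __init__(self):
        self.params = {}


est = Estimator(maxIter=10, regParam=0.1)
model = est.fit(data=[1, 2, 3])
assert model.params == {'maxIter': 10, 'regParam': 0.1}
```

The same idea in pyspark.ml would live inside `Estimator.fit`, plus unit tests, as the issue description suggests.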
[jira] [Commented] (SPARK-15110) SparkR - Implement repartitionByColumn on DataFrame
[ https://issues.apache.org/jira/browse/SPARK-15110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270104#comment-15270104 ] Apache Spark commented on SPARK-15110: -- User 'NarineK' has created a pull request for this issue: https://github.com/apache/spark/pull/12887 > SparkR - Implement repartitionByColumn on DataFrame > --- > > Key: SPARK-15110 > URL: https://issues.apache.org/jira/browse/SPARK-15110 > Project: Spark > Issue Type: New Feature > Components: SparkR >Reporter: Narine Kokhlikyan > > Implement repartitionByColumn on DataFrame. > This will allow us to run R functions on each partition identified by column > groups with dapply() method.
[jira] [Assigned] (SPARK-15110) SparkR - Implement repartitionByColumn on DataFrame
[ https://issues.apache.org/jira/browse/SPARK-15110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15110: Assignee: Apache Spark > SparkR - Implement repartitionByColumn on DataFrame > --- > > Key: SPARK-15110 > URL: https://issues.apache.org/jira/browse/SPARK-15110 > Project: Spark > Issue Type: New Feature > Components: SparkR >Reporter: Narine Kokhlikyan >Assignee: Apache Spark > > Implement repartitionByColumn on DataFrame. > This will allow us to run R functions on each partition identified by column > groups with dapply() method.
[jira] [Assigned] (SPARK-15110) SparkR - Implement repartitionByColumn on DataFrame
[ https://issues.apache.org/jira/browse/SPARK-15110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15110: Assignee: (was: Apache Spark) > SparkR - Implement repartitionByColumn on DataFrame > --- > > Key: SPARK-15110 > URL: https://issues.apache.org/jira/browse/SPARK-15110 > Project: Spark > Issue Type: New Feature > Components: SparkR >Reporter: Narine Kokhlikyan > > Implement repartitionByColumn on DataFrame. > This will allow us to run R functions on each partition identified by column > groups with dapply() method.
[jira] [Updated] (SPARK-15110) SparkR - Implement repartitionByColumn on DataFrame
[ https://issues.apache.org/jira/browse/SPARK-15110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Narine Kokhlikyan updated SPARK-15110: -- Description: Implement repartitionByColumn on DataFrame. This will allow us to run R functions on each partition identified by column groups with dapply() method. was: Implement repartitionByColumn on DataFrame. This will allow us to run R functions on each partition with dapply() method. > SparkR - Implement repartitionByColumn on DataFrame > --- > > Key: SPARK-15110 > URL: https://issues.apache.org/jira/browse/SPARK-15110 > Project: Spark > Issue Type: New Feature > Components: SparkR >Reporter: Narine Kokhlikyan > > Implement repartitionByColumn on DataFrame. > This will allow us to run R functions on each partition identified by column > groups with dapply() method.
[jira] [Created] (SPARK-15110) SparkR - Implement repartitionByColumn on DataFrame
Narine Kokhlikyan created SPARK-15110: - Summary: SparkR - Implement repartitionByColumn on DataFrame Key: SPARK-15110 URL: https://issues.apache.org/jira/browse/SPARK-15110 Project: Spark Issue Type: New Feature Components: SparkR Reporter: Narine Kokhlikyan Implement repartitionByColumn on DataFrame. This will allow us to run R functions on each partition with dapply() method.
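The proposed behavior, partitioning rows by the values of a column and then running a function over each partition (as dapply() would), can be sketched in plain Python. The helper names below are hypothetical illustrations of the semantics, not the SparkR API:

```python
from collections import defaultdict

def repartition_by_column(rows, key):
    """Group rows into partitions, one per distinct value of `key`."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[key]].append(row)
    return list(parts.values())

def dapply_like(partitions, fn):
    """Apply `fn` independently to each partition, as dapply() would."""
    return [fn(p) for p in partitions]

rows = [{"g": "a", "v": 1}, {"g": "b", "v": 2}, {"g": "a", "v": 3}]
parts = repartition_by_column(rows, "g")          # two partitions: g=a, g=b
sums = dapply_like(parts, lambda p: sum(r["v"] for r in p))
assert sorted(sums) == [2, 4]
```

In Spark the repartitioning would of course shuffle data across executors rather than build in-memory lists; this only illustrates the per-column-group contract the feature provides.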
[jira] [Commented] (SPARK-11148) Unable to create views
[ https://issues.apache.org/jira/browse/SPARK-11148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270069#comment-15270069 ] Yin Huai commented on SPARK-11148: -- Hi [~lunendl], we have cut the 2.0 branch and we are in QA period right now. Based on our schedule (https://cwiki.apache.org/confluence/display/SPARK/Wiki+Homepage), early June is a good estimation. > Unable to create views > -- > > Key: SPARK-11148 > URL: https://issues.apache.org/jira/browse/SPARK-11148 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.1 > Environment: Ubuntu 14.04 > Spark-1.5.1-bin-hadoop2.6 > (I don't have Hadoop or Hive installed) > Start spark-all.sh and thriftserver with mysql jar driver >Reporter: Lunen >Priority: Critical > Fix For: 2.0.0 > > > I am unable to create views within spark SQL. > Creating tables without specifying the column names works, e.g. > CREATE TABLE trade2 > USING org.apache.spark.sql.jdbc > OPTIONS ( > url "jdbc:mysql://192.168.30.191:3318/?user=root", > dbtable "database.trade", > driver "com.mysql.jdbc.Driver" > ); > Creating tables with datatypes gives an error: > CREATE TABLE trade2( > COL1 timestamp, > COL2 STRING, > COL3 STRING) > USING org.apache.spark.sql.jdbc > OPTIONS ( > url "jdbc:mysql://192.168.30.191:3318/?user=root", > dbtable "database.trade", > driver "com.mysql.jdbc.Driver" > ); > Error: org.apache.spark.sql.AnalysisException: > org.apache.spark.sql.execution.datasources.jdbc.DefaultSource does not allow > user-specified schemas.; SQLState: null ErrorCode: 0 > Trying to create a VIEW from the table that was created. (The select statement > below returns data) > CREATE VIEW viewtrade as Select Col1 from trade2; > Error: org.apache.spark.sql.execution.QueryExecutionException: FAILED: > SemanticException [Error 10004]: Line 1:30 Invalid table alias or column > reference 'Col1': (possible column names are: col) > SQLState: null > ErrorCode: 0
[jira] [Assigned] (SPARK-15109) Accept Dataset[_] in joins
[ https://issues.apache.org/jira/browse/SPARK-15109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15109: Assignee: Apache Spark (was: Reynold Xin) > Accept Dataset[_] in joins > -- > > Key: SPARK-15109 > URL: https://issues.apache.org/jira/browse/SPARK-15109 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Apache Spark > Fix For: 2.0.0 > >
[jira] [Commented] (SPARK-15109) Accept Dataset[_] in joins
[ https://issues.apache.org/jira/browse/SPARK-15109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270068#comment-15270068 ] Apache Spark commented on SPARK-15109: -- User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/12886 > Accept Dataset[_] in joins > -- > > Key: SPARK-15109 > URL: https://issues.apache.org/jira/browse/SPARK-15109 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > Fix For: 2.0.0 > >
[jira] [Assigned] (SPARK-15109) Accept Dataset[_] in joins
[ https://issues.apache.org/jira/browse/SPARK-15109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15109: Assignee: Reynold Xin (was: Apache Spark) > Accept Dataset[_] in joins > -- > > Key: SPARK-15109 > URL: https://issues.apache.org/jira/browse/SPARK-15109 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > Fix For: 2.0.0 > >
[jira] [Commented] (SPARK-13269) Expose more executor stats in stable status API
[ https://issues.apache.org/jira/browse/SPARK-13269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270067#comment-15270067 ] Andrew Or commented on SPARK-13269: --- Oops, this was actually already done in SPARK-14069. Closing this as a duplicate. > Expose more executor stats in stable status API > --- > > Key: SPARK-13269 > URL: https://issues.apache.org/jira/browse/SPARK-13269 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Andrew Or > Fix For: 2.0.0 > > > Currently the stable status API is quite limited; it exposes only a small > subset of the things exposed by JobProgressListener. It is useful for very > high level querying but falls short when the developer wants to build an > application on top of Spark with more integration. > In this issue I propose that we expose at least two things: > - Which executors are running tasks, and > - Which executors cached how much in memory and on disk > The goal is not to expose exactly these two things, but to expose something > that would allow the developer to learn about them. These concepts are very > much fundamental in Spark's design so there's almost no chance that they will > go away in the future.
[jira] [Resolved] (SPARK-13269) Expose more executor stats in stable status API
[ https://issues.apache.org/jira/browse/SPARK-13269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-13269. --- Resolution: Fixed Assignee: Wenchen Fan Fix Version/s: 2.0.0 > Expose more executor stats in stable status API > --- > > Key: SPARK-13269 > URL: https://issues.apache.org/jira/browse/SPARK-13269 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Andrew Or >Assignee: Wenchen Fan > Fix For: 2.0.0 > > > Currently the stable status API is quite limited; it exposes only a small > subset of the things exposed by JobProgressListener. It is useful for very > high level querying but falls short when the developer wants to build an > application on top of Spark with more integration. > In this issue I propose that we expose at least two things: > - Which executors are running tasks, and > - Which executors cached how much in memory and on disk > The goal is not to expose exactly these two things, but to expose something > that would allow the developer to learn about them. These concepts are very > much fundamental in Spark's design so there's almost no chance that they will > go away in the future.
[jira] [Created] (SPARK-15109) Accept Dataset[_] in joins
Reynold Xin created SPARK-15109: --- Summary: Accept Dataset[_] in joins Key: SPARK-15109 URL: https://issues.apache.org/jira/browse/SPARK-15109 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin
[jira] [Commented] (SPARK-15108) Function is Not Found when Describe Permanent UDTF
[ https://issues.apache.org/jira/browse/SPARK-15108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270039#comment-15270039 ] Apache Spark commented on SPARK-15108: -- User 'gatorsmile' has created a pull request for this issue: https://github.com/apache/spark/pull/12885 > Function is Not Found when Describe Permanent UDTF > -- > > Key: SPARK-15108 > URL: https://issues.apache.org/jira/browse/SPARK-15108 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li > > When Describe UDTF, it returns a wrong result. The command is unable to find > the function, which has been created and cataloged in the catalog but not in > the functionRegistry.
[jira] [Assigned] (SPARK-15108) Function is Not Found when Describe Permanent UDTF
[ https://issues.apache.org/jira/browse/SPARK-15108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15108: Assignee: Apache Spark > Function is Not Found when Describe Permanent UDTF > -- > > Key: SPARK-15108 > URL: https://issues.apache.org/jira/browse/SPARK-15108 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li >Assignee: Apache Spark > > When Describe UDTF, it returns a wrong result. The command is unable to find > the function, which has been created and cataloged in the catalog but not in > the functionRegistry.
[jira] [Assigned] (SPARK-15108) Function is Not Found when Describe Permanent UDTF
[ https://issues.apache.org/jira/browse/SPARK-15108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15108: Assignee: (was: Apache Spark) > Function is Not Found when Describe Permanent UDTF > -- > > Key: SPARK-15108 > URL: https://issues.apache.org/jira/browse/SPARK-15108 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li > > When Describe UDTF, it returns a wrong result. The command is unable to find > the function, which has been created and cataloged in the catalog but not in > the functionRegistry.
[jira] [Updated] (SPARK-15108) Function is Not Found when Describe Permanent UDTF
[ https://issues.apache.org/jira/browse/SPARK-15108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-15108: Summary: Function is Not Found when Describe Permanent UDTF (was: Function is Not Found when Describe Permanent UDF) > Function is Not Found when Describe Permanent UDTF > -- > > Key: SPARK-15108 > URL: https://issues.apache.org/jira/browse/SPARK-15108 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li > > When Describe UDF, it returns a wrong result. The command is unable to find > the function, which has been created and cataloged in the catalog but not in > the functionRegistry.
[jira] [Updated] (SPARK-15108) Function is Not Found when Describe Permanent UDTF
[ https://issues.apache.org/jira/browse/SPARK-15108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-15108: Description: When Describe UDTF, it returns a wrong result. The command is unable to find the function, which has been created and cataloged in the catalog but not in the functionRegistry. (was: When Describe UDF, it returns a wrong result. The command is unable to find the function, which has been created and cataloged in the catalog but not in the functionRegistry.) > Function is Not Found when Describe Permanent UDTF > -- > > Key: SPARK-15108 > URL: https://issues.apache.org/jira/browse/SPARK-15108 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li > > When Describe UDTF, it returns a wrong result. The command is unable to find > the function, which has been created and cataloged in the catalog but not in > the functionRegistry.
[jira] [Created] (SPARK-15108) Function is Not Found when Describe Permanent UDF
Xiao Li created SPARK-15108: --- Summary: Function is Not Found when Describe Permanent UDF Key: SPARK-15108 URL: https://issues.apache.org/jira/browse/SPARK-15108 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: Xiao Li When Describe UDF, it returns a wrong result. The command is unable to find the function, which has been created and cataloged in the catalog but not in the functionRegistry.
[jira] [Commented] (SPARK-15089) kafka-spark consumer with SSL problem
[ https://issues.apache.org/jira/browse/SPARK-15089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270030#comment-15270030 ] JasonChang commented on SPARK-15089: Hi Sean, yes, the broker works with SSL. I ran a plain Kafka consumer and it works, but the kafka-spark consumer is not working. {code} public void consume(String topic, BiConsumer<String, String> callback) { Properties props = new Properties(); props.put("bootstrap.servers", kafkaHosts); props.put("key.deserializer", org.apache.kafka.common.serialization.StringDeserializer.class); props.put("value.deserializer", org.apache.kafka.common.serialization.StringDeserializer.class); props.put("group.id", group); props.put("security.protocol", "SSL"); props.put("ssl.truststore.location", "/opt/cert/client.truststore.jks"); props.put("ssl.truststore.password", "password"); props.put("ssl.keystore.location", "/opt/cert/keystore.jks"); props.put("ssl.keystore.password", "password"); props.put("ssl.key.password", "password"); try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) { consumer.subscribe(Arrays.asList(topic)); while (!stopped.get()) { ConsumerRecords<String, String> records = consumer.poll(100); for (ConsumerRecord<String, String> record : records) { System.out.println("<<< " + record.key() + ", " + record.value()); callback.accept(record.key(), record.value()); } } System.out.println("Finishing subscription to topic " + topic); } } {code} > kafka-spark consumer with SSL problem > - > > Key: SPARK-15089 > URL: https://issues.apache.org/jira/browse/SPARK-15089 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.6.1 >Reporter: JasonChang > > I am not sure Spark Streaming supports SSL. > I tried to add params to kafkaParams, but it does not work. > {code} > JavaStreamingContext jsc = new JavaStreamingContext(sparkConf, new > Duration(1)); > Set<String> topicmap = new HashSet<String>(); > topicmap.add(kafkaTopic); > Map<String, String> kafkaParams = new HashMap<String, String>(); > kafkaParams.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, server_url); > 
kafkaParams.put("security.protocol", "SSL"); > kafkaParams.put("ssl.keystore.type", "JKS"); > kafkaParams.put("ssl.keystore.location", "/opt/cert/keystore.jks"); > kafkaParams.put("ssl.keystore.password ", "password"); > kafkaParams.put("ssl.key.password", "password"); > kafkaParams.put("ssl.truststore.type", "JKS"); > kafkaParams.put("ssl.truststore.location", "/opt/cert/client.truststore.jks"); > kafkaParams.put("ssl.truststore.password", "password"); > kafkaParams.put(ConsumerConfig.GROUP_ID_CONFIG, kafkaTopic); > JavaPairInputDStream<String, String> stream = > KafkaUtils.createDirectStream(jsc, > String.class, > String.class, > StringDecoder.class, > StringDecoder.class, > kafkaParams, > topicmap > ); > JavaDStream<String> lines = stream.map(new Function<Tuple2<String, String>, > String>() { > public String call(Tuple2<String, String> tuple2) { > return tuple2._2(); > } > }); > {code} > {code} > Exception in thread "main" org.apache.spark.SparkException: > java.io.EOFException: Received -1 when reading from channel, socket has > likely been closed. > at > org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$checkErrors$1.apply(KafkaCluster.scala:366) > at > org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$checkErrors$1.apply(KafkaCluster.scala:366) > at scala.util.Either.fold(Either.scala:97) > at > org.apache.spark.streaming.kafka.KafkaCluster$.checkErrors(KafkaCluster.scala:365) > at > org.apache.spark.streaming.kafka.KafkaUtils$.getFromOffsets(KafkaUtils.scala:222) > at > org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:484) > at > org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:607) > at > org.apache.spark.streaming.kafka.KafkaUtils.createDirectStream(KafkaUtils.scala) > {code}
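One detail worth noting in the quoted reproduction: the key in {{kafkaParams.put("ssl.keystore.password ", "password");}} carries a trailing space, so Kafka would never see that property. It may just be a transcription artifact, but it would itself break the configuration. A quick, library-free sanity check for stray whitespace in config keys (the helper name is hypothetical):

```python
def find_malformed_keys(params):
    """Return config keys that carry leading or trailing whitespace."""
    return [key for key in params if key != key.strip()]

# Mirror a few of the reported kafkaParams entries.
params = {
    "security.protocol": "SSL",
    "ssl.keystore.location": "/opt/cert/keystore.jks",
    "ssl.keystore.password ": "password",  # trailing space, as in the report
    "ssl.truststore.location": "/opt/cert/client.truststore.jks",
}

assert find_malformed_keys(params) == ["ssl.keystore.password "]
```

Such silently ignored keys are easy to miss because Kafka logs unknown properties at most as warnings.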
[jira] [Commented] (SPARK-15072) Remove SparkSession.withHiveSupport
[ https://issues.apache.org/jira/browse/SPARK-15072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270032#comment-15270032 ] Sagar commented on SPARK-15072: --- This helps to build test.jar:
$ ./build/sbt -Pyarn -Phadoop-2.3 -Pkinesis-asl -Phive-thriftserver -Phive package assembly/assembly streaming-kafka-assembly/assembly streaming-flume-assembly/assembly streaming-mqtt-assembly/assembly streaming-mqtt/test:assembly streaming-kinesis-asl-assembly/assembly
$ cd sql/hive/src/test/resources/regression-test-SPARK-8489/
$ scalac -classpath ~/spark/assembly/target/scala-2.11/spark-assembly-2.0.0-SNAPSHOT-hadoop2.3.0.jar Main.scala MyCoolClass.scala
$ rm test.jar
$ jar cvf test.jar *.class
$ cd ~/spark
$ ~/bin/spark-submit --conf spark.ui.enabled=false --conf spark.master.rest.enabled=false --driver-java-options -Dderby.system.durability=test --class Main sql/hive/src/test/resources/regression-test-SPARK-8489/test.jar
Let me know if you are still working on it.
> Remove SparkSession.withHiveSupport
> ---
>
> Key: SPARK-15072
> URL: https://issues.apache.org/jira/browse/SPARK-15072
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
>Reporter: Reynold Xin
>Assignee: Sandeep Singh
> Fix For: 2.0.0
>
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15032) When we create a new JDBC session, we may need to create a new session of executionHive
[ https://issues.apache.org/jira/browse/SPARK-15032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270018#comment-15270018 ] Sagar commented on SPARK-15032: --- You are right! It is safer to create a new session of executionHive when creating a JDBC session, but I think the problem is that doing so terminates the executionHive process. Let me know if you have figured out another way; I can work on it. > When we create a new JDBC session, we may need to create a new session of > executionHive > --- > > Key: SPARK-15032 > URL: https://issues.apache.org/jira/browse/SPARK-15032 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Yin Huai >Priority: Critical > > Right now, we only use executionHive in thriftserver. When we create a new > jdbc session, we probably need to create a new session of executionHive. I am > not sure what will break if we leave the code as is. But, I feel it will be > safer to create a new session of executionHive. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15063) filtering and joining back doesn't work
[ https://issues.apache.org/jira/browse/SPARK-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270012#comment-15270012 ] Sagar commented on SPARK-15063: --- Can you elaborate on what else is required beyond creating a new df for each filter?
> filtering and joining back doesn't work
> ---
>
> Key: SPARK-15063
> URL: https://issues.apache.org/jira/browse/SPARK-15063
> Project: Spark
> Issue Type: Bug
>Affects Versions: 1.6.1
>Reporter: Neville Kadwa
>
> I'm trying to filter and join to do a simple pivot but getting very odd results.
> {quote} {noformat}
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> import sqlContext.implicits._
> val people = Array((1, "sam"), (2, "joe"), (3, "sally"), (4, "joanna"))
> val accounts = Array(
>   (1, "checking", 100.0),
>   (1, "savings", 300.0),
>   (2, "savings", 1000.0),
>   (3, "carloan", 12000.0),
>   (3, "checking", 400.0)
> )
> val t1 = sc.makeRDD(people).toDF("uid", "name")
> val t2 = sc.makeRDD(accounts).toDF("uid", "type", "amount")
> val t2c = t2.filter(t2("type") <=> "checking")
> val t2s = t2.filter(t2("type") <=> "savings")
> t1.
>   join(t2c, t1("uid") <=> t2c("uid"), "left").
>   join(t2s, t1("uid") <=> t2s("uid"), "left").
>   take(10)
> {noformat} {quote}
> The results are wrong:
> {quote} {noformat}
> Array(
>   [1,sam,1,checking,100.0,1,savings,300.0],
>   [1,sam,1,checking,100.0,2,savings,1000.0],
>   [2,joe,null,null,null,null,null,null],
>   [3,sally,3,checking,400.0,1,savings,300.0],
>   [3,sally,3,checking,400.0,2,savings,1000.0],
>   [4,joanna,null,null,null,null,null,null]
> )
> {noformat} {quote}
> The way I can force it to work properly is to create a new df for each filter:
> {quote} {noformat}
> val t2a = sc.makeRDD(accounts).toDF("uid", "type", "amount")
> val t2s = t2a.filter(t2a("type") <=> "savings")
> t1.
>   join(t2c, t1("uid") <=> t2c("uid"), "left").
>   join(t2s, t1("uid") <=> t2s("uid"), "left").
>   take(10)
> {noformat} {quote}
> The results are right:
> {quote} {noformat}
> Array(
>   [1,sam,1,checking,100.0,1,savings,300.0],
>   [2,joe,null,null,null,2,savings,1000.0],
>   [3,sally,3,checking,400.0,null,null,null],
>   [4,joanna,null,null,null,null,null,null]
> )
> {noformat} {quote}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
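To pin down the expected semantics in the report above, the two left joins can be re-implemented over the same arrays in plain Scala (no Spark), with `Option`/`None` standing in for SQL null. This reference sketch reproduces the "right" results quoted in the issue:

```scala
// Plain-Scala reference (no Spark) for the two left joins in the report above.
val people   = Seq((1, "sam"), (2, "joe"), (3, "sally"), (4, "joanna"))
val accounts = Seq((1, "checking", 100.0), (1, "savings", 300.0),
                   (2, "savings", 1000.0), (3, "carloan", 12000.0), (3, "checking", 400.0))

// Index the accounts of one type by uid (each uid has at most one row per type here).
def byType(t: String): Map[Int, (Int, String, Double)] =
  accounts.filter(_._2 == t).map(a => a._1 -> a).toMap

val checking = byType("checking")
val savings  = byType("savings")

// Left-join people with checking, then with savings; None plays the role of null.
val joined = people.map { case (uid, name) =>
  (uid, name, checking.get(uid), savings.get(uid))
}
```

Any answer the DataFrame version gives that differs from this (such as joe gaining a checking row) indicates the two filtered references resolved to the same underlying attributes.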
[jira] [Commented] (SPARK-15086) Update Java API once the Scala one is finalized
[ https://issues.apache.org/jira/browse/SPARK-15086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270007#comment-15270007 ] Sagar commented on SPARK-15086: --- To update the Java API once the Scala one is finalized, please provide more information on what else this includes so I can make it work. > Update Java API once the Scala one is finalized > --- > > Key: SPARK-15086 > URL: https://issues.apache.org/jira/browse/SPARK-15086 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Reporter: Reynold Xin > Fix For: 2.0.0 > > > We should make sure we update the Java API once the Scala one is finalized. > This includes adding the equivalent API in Java as well as deprecating the > old ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15107) Allow running test cases with different iterations in micro-benchmark util
[ https://issues.apache.org/jira/browse/SPARK-15107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270003#comment-15270003 ] Apache Spark commented on SPARK-15107: -- User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/12884 > Allow running test cases with different iterations in micro-benchmark util > -- > > Key: SPARK-15107 > URL: https://issues.apache.org/jira/browse/SPARK-15107 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-15107) Allow running test cases with different iterations in micro-benchmark util
[ https://issues.apache.org/jira/browse/SPARK-15107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15107: Assignee: Reynold Xin (was: Apache Spark) > Allow running test cases with different iterations in micro-benchmark util > -- > > Key: SPARK-15107 > URL: https://issues.apache.org/jira/browse/SPARK-15107 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-15107) Allow running test cases with different iterations in micro-benchmark util
[ https://issues.apache.org/jira/browse/SPARK-15107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15107: Assignee: Apache Spark (was: Reynold Xin) > Allow running test cases with different iterations in micro-benchmark util > -- > > Key: SPARK-15107 > URL: https://issues.apache.org/jira/browse/SPARK-15107 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Reynold Xin >Assignee: Apache Spark > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15107) Allow running test cases with different iterations in micro-benchmark util
Reynold Xin created SPARK-15107: --- Summary: Allow running test cases with different iterations in micro-benchmark util Key: SPARK-15107 URL: https://issues.apache.org/jira/browse/SPARK-15107 Project: Spark Issue Type: Bug Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14645) non local Python resource doesn't work with Mesos cluster mode
[ https://issues.apache.org/jira/browse/SPARK-14645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-14645: -- Assignee: Timothy Chen > non local Python resource doesn't work with Mesos cluster mode > -- > > Key: SPARK-14645 > URL: https://issues.apache.org/jira/browse/SPARK-14645 > Project: Spark > Issue Type: Bug >Reporter: Timothy Chen >Assignee: Timothy Chen > Fix For: 2.0.0 > > > Currently SparkSubmit explicitly disallows non-local Python resources for > cluster mode with Mesos, even though this is actually supported. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14414) Make error messages consistent across DDLs
[ https://issues.apache.org/jira/browse/SPARK-14414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14414. --- Resolution: Fixed Fix Version/s: 2.0.0 > Make error messages consistent across DDLs > -- > > Key: SPARK-14414 > URL: https://issues.apache.org/jira/browse/SPARK-14414 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.0.0 >Reporter: Andrew Or >Assignee: Andrew Or > Fix For: 2.0.0 > > > There are many different error messages right now when the user tries to run > something that's not supported. We might throw AnalysisException or > ParseException or NoSuchFunctionException etc. We should make all of these > consistent before 2.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-15097) Import fails for someDataset.sqlContext.implicits._
[ https://issues.apache.org/jira/browse/SPARK-15097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15097. --- Resolution: Fixed Assignee: Koert Kuipers Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Import fails for someDataset.sqlContext.implicits._ > --- > > Key: SPARK-15097 > URL: https://issues.apache.org/jira/browse/SPARK-15097 > Project: Spark > Issue Type: Bug > Components: SQL > Environment: spark-2.0.0-SNAPSHOT >Reporter: koert kuipers >Assignee: Koert Kuipers > Fix For: 2.0.0 > > > with the introduction of SparkSession SQLContext changed from being a lazy > val to a def inside Dataset. however this is troublesome if you want to do: > import someDataset.sqlContext.implicits._ > you get this error: > stable identifier required, but someDataset.sqlContext.implicits found. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
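The "stable identifier required" error above comes from Scala's rule that you may only import through vals and objects, never through a def. The names below (`SQLCtx`, `DatasetLike`) are invented purely for the demo; only the val-vs-def distinction matters:

```scala
// Invented stand-ins for the demo; not Spark's real types.
class SQLCtx { object implicits { val marker: Int = 42 } }
class DatasetLike { def sqlContext: SQLCtx = new SQLCtx }

val ds = new DatasetLike
// import ds.sqlContext.implicits._   // rejected: "stable identifier required"
val sqlContext = ds.sqlContext        // bind the result of the def to a val
import sqlContext.implicits._         // compiles: sqlContext is now a stable path
```

The same pattern (`val sqlContext = someDataset.sqlContext; import sqlContext.implicits._`) works around the issue in user code whenever the member is a def rather than a val.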
[jira] [Resolved] (SPARK-15084) Use builder pattern to create SparkSession in PySpark
[ https://issues.apache.org/jira/browse/SPARK-15084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15084. --- Resolution: Fixed Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Use builder pattern to create SparkSession in PySpark > - > > Key: SPARK-15084 > URL: https://issues.apache.org/jira/browse/SPARK-15084 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 2.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun > Fix For: 2.0.0 > > > This is a Python port of SPARK-15052. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14645) non local Python resource doesn't work with Mesos cluster mode
[ https://issues.apache.org/jira/browse/SPARK-14645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14645. --- Resolution: Fixed Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > non local Python resource doesn't work with Mesos cluster mode > -- > > Key: SPARK-14645 > URL: https://issues.apache.org/jira/browse/SPARK-14645 > Project: Spark > Issue Type: Bug >Reporter: Timothy Chen > Fix For: 2.0.0 > > > Currently SparkSubmit explicitly disallows non-local Python resources for > cluster mode with Mesos, even though this is actually supported. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14422) Improve handling of optional configs in SQLConf
[ https://issues.apache.org/jira/browse/SPARK-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-14422: -- Assignee: Sandeep Singh > Improve handling of optional configs in SQLConf > --- > > Key: SPARK-14422 > URL: https://issues.apache.org/jira/browse/SPARK-14422 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Marcelo Vanzin >Assignee: Sandeep Singh >Priority: Minor > Fix For: 2.0.0 > > > As Michael showed here: > https://github.com/apache/spark/pull/12119/files/69aa1a005cc7003ab62d6dfcdef42181b053eaed#r58634150 > Handling of optional configs in SQLConf is a little sub-optimal right now. We > should clean that up. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14422) Improve handling of optional configs in SQLConf
[ https://issues.apache.org/jira/browse/SPARK-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14422. --- Resolution: Fixed Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Improve handling of optional configs in SQLConf > --- > > Key: SPARK-14422 > URL: https://issues.apache.org/jira/browse/SPARK-14422 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Marcelo Vanzin >Priority: Minor > Fix For: 2.0.0 > > > As Michael showed here: > https://github.com/apache/spark/pull/12119/files/69aa1a005cc7003ab62d6dfcdef42181b053eaed#r58634150 > Handling of optional configs in SQLConf is a little sub-optimal right now. We > should clean that up. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
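For readers outside the linked PR discussion, the shape of the cleanup is roughly this: an optional config entry should surface as `Option[T]` rather than a sentinel default. The sketch below is a hypothetical API for illustration only, not Spark's actual `SQLConf`/`ConfigEntry` types:

```scala
// Hypothetical optional-config entry -- illustration only, not the real SQLConf API.
final case class OptionalConf[T](key: String, parse: String => T) {
  // Absent keys become None; no sentinel default value is ever needed.
  def readFrom(settings: Map[String, String]): Option[T] = settings.get(key).map(parse)
}

val warehouseDir = OptionalConf("spark.sql.warehouse.dir", (s: String) => s)
val maxBytes     = OptionalConf("spark.sql.files.maxPartitionBytes", (s: String) => s.toLong)
```

The point of the design is that callers must pattern-match or map over the `Option`, so "unset" can never be confused with a real value.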
[jira] [Assigned] (SPARK-15106) Add package documentation for ML and remove BETA from Scala & Java for ML pipeline API.
[ https://issues.apache.org/jira/browse/SPARK-15106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15106: Assignee: (was: Apache Spark) > Add package documentation for ML and remove BETA from Scala & Java for ML > pipeline API. > --- > > Key: SPARK-15106 > URL: https://issues.apache.org/jira/browse/SPARK-15106 > Project: Spark > Issue Type: Improvement > Components: Documentation, ML, PySpark >Reporter: holdenk > > As part of the audit (SPARK-14813) I noticed we don't have a package > definition for PySpark ML and Scaladoc / Javadoc mention "BETA" which should > be going away now that we are deprecating MLLib. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15106) Add package documentation for ML and remove BETA from Scala & Java for ML pipeline API.
[ https://issues.apache.org/jira/browse/SPARK-15106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269901#comment-15269901 ] Apache Spark commented on SPARK-15106: -- User 'holdenk' has created a pull request for this issue: https://github.com/apache/spark/pull/12883 > Add package documentation for ML and remove BETA from Scala & Java for ML > pipeline API. > --- > > Key: SPARK-15106 > URL: https://issues.apache.org/jira/browse/SPARK-15106 > Project: Spark > Issue Type: Improvement > Components: Documentation, ML, PySpark >Reporter: holdenk > > As part of the audit (SPARK-14813) I noticed we don't have a package > definition for PySpark ML and Scaladoc / Javadoc mention "BETA" which should > be going away now that we are deprecating MLLib. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-15106) Add package documentation for ML and remove BETA from Scala & Java for ML pipeline API.
[ https://issues.apache.org/jira/browse/SPARK-15106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15106: Assignee: Apache Spark > Add package documentation for ML and remove BETA from Scala & Java for ML > pipeline API. > --- > > Key: SPARK-15106 > URL: https://issues.apache.org/jira/browse/SPARK-15106 > Project: Spark > Issue Type: Improvement > Components: Documentation, ML, PySpark >Reporter: holdenk >Assignee: Apache Spark > > As part of the audit (SPARK-14813) I noticed we don't have a package > definition for PySpark ML and Scaladoc / Javadoc mention "BETA" which should > be going away now that we are deprecating MLLib. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15106) Add package documentation for ML and remove BETA from Scala & Java for ML pipeline API.
holdenk created SPARK-15106: --- Summary: Add package documentation for ML and remove BETA from Scala & Java for ML pipeline API. Key: SPARK-15106 URL: https://issues.apache.org/jira/browse/SPARK-15106 Project: Spark Issue Type: Improvement Components: Documentation, ML, PySpark Reporter: holdenk As part of the audit (SPARK-14813) I noticed we don't have a package definition for PySpark ML and Scaladoc / Javadoc mention "BETA" which should be going away now that we are deprecating MLLib. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14813) ML 2.0 QA: API: Python API coverage
[ https://issues.apache.org/jira/browse/SPARK-14813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269827#comment-15269827 ] holdenk commented on SPARK-14813: - I'm happy to start doing a first pass on this later on this week if no one else is interested. <3 PySpark > ML 2.0 QA: API: Python API coverage > --- > > Key: SPARK-14813 > URL: https://issues.apache.org/jira/browse/SPARK-14813 > Project: Spark > Issue Type: Sub-task > Components: Documentation, ML, PySpark >Reporter: Joseph K. Bradley > > For new public APIs added to MLlib, we need to check the generated HTML doc > and compare the Scala & Python versions. We need to track: > * Inconsistency: Do class/method/parameter names match? > * Docs: Is the Python doc missing or just a stub? We want the Python doc to > be as complete as the Scala doc. > * API breaking changes: These should be very rare but are occasionally either > necessary (intentional) or accidental. These must be recorded and added in > the Migration Guide for this release. > ** Note: If the API change is for an Alpha/Experimental/DeveloperApi > component, please note that as well. > * Missing classes/methods/parameters: We should create to-do JIRAs for > functionality missing from Python, to be added in the next release cycle. > Please use a *separate* JIRA (linked below) for this list of to-do items. > UPDATE: This only needs to cover spark.ml since spark.mllib is going into > maintenance mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14772) Python ML Params.copy treats uid, paramMaps differently than Scala
[ https://issues.apache.org/jira/browse/SPARK-14772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269824#comment-15269824 ] holdenk commented on SPARK-14772: - I can take a look at this if no one else is working on it and it planned for 2.0 > Python ML Params.copy treats uid, paramMaps differently than Scala > -- > > Key: SPARK-14772 > URL: https://issues.apache.org/jira/browse/SPARK-14772 > Project: Spark > Issue Type: Bug > Components: ML, PySpark >Reporter: Joseph K. Bradley > > In PySpark, {{ml.param.Params.copy}} does not quite match the Scala > implementation: > * It does not copy the UID > * It does not respect the difference between defaultParamMap and paramMap. > This is an issue with {{_copyValues}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15096) LogisticRegression MultiClassSummarizer numClasses can fail if no valid labels are found
[ https://issues.apache.org/jira/browse/SPARK-15096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269776#comment-15269776 ] Miao Wang commented on SPARK-15096: --- If nobody is working on this one, I will work on this one now. Thanks! Miao > LogisticRegression MultiClassSummarizer numClasses can fail if no valid > labels are found > > > Key: SPARK-15096 > URL: https://issues.apache.org/jira/browse/SPARK-15096 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 2.0.0 >Reporter: Joseph K. Bradley >Priority: Minor > > LogisticRegression.train calls labelSummarizer.histogram before it calls > labelSummarizer.countInvalid: > [https://github.com/apache/spark/blob/f5623b460224ce363316c63f5d28947215078fc5/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala#L292] > But if there are no valid labels, it is possible to get an Exception from > empty.max when numClasses is called here: > [https://github.com/apache/spark/blob/f5623b460224ce363316c63f5d28947215078fc5/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala#L751] > Proposed fix: We should fix numClasses to throw a better exception: > [https://github.com/apache/spark/blob/f5623b460224ce363316c63f5d28947215078fc5/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala#L747] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
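A minimal sketch of the proposed fix (plain Scala, not the actual MultiClassSummarizer code): guard the empty case with a descriptive exception instead of letting the bare `empty.max` error escape.

```scala
// labelCounts maps a label index to how many valid rows carried that label.
def numClasses(labelCounts: Map[Int, Long]): Int =
  if (labelCounts.isEmpty)
    throw new IllegalArgumentException(
      "No valid labels found; cannot determine numClasses.")
  else labelCounts.keys.max + 1
```

With this guard the caller sees a message naming the real problem (no valid labels) rather than an `UnsupportedOperationException` from an empty collection.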
[jira] [Commented] (SPARK-14817) ML, Graph, R 2.0 QA: Programming guide update and migration guide
[ https://issues.apache.org/jira/browse/SPARK-14817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269775#comment-15269775 ] Xin Ren commented on SPARK-14817: - ok, I'll start looking for new APIs. So just create new tickets under SPARK-14815? > ML, Graph, R 2.0 QA: Programming guide update and migration guide > - > > Key: SPARK-14817 > URL: https://issues.apache.org/jira/browse/SPARK-14817 > Project: Spark > Issue Type: Sub-task > Components: Documentation, GraphX, ML, MLlib, SparkR >Reporter: Joseph K. Bradley > > Before the release, we need to update the MLlib, GraphX, and SparkR > Programming Guides. Updates will include: > * Add migration guide subsection. > ** Use the results of the QA audit JIRAs and [SPARK-13448]. > * Check phrasing, especially in main sections (for outdated items such as "In > this release, ...") > For MLlib, we will make the DataFrame-based API (spark.ml) front-and-center, > to make it clear the RDD-based API is the older, maintenance-mode one. > * No docs for spark.mllib will be deleted; they will just be reorganized and > put in a subsection. > * If spark.ml docs are less complete, or if spark.ml docs say "refer to the > spark.mllib docs for details," then we should copy those details to the > spark.ml docs. This per-feature work can happen under [SPARK-14815]. > * This big reorganization should be done *after* docs are added for each > feature (to minimize merge conflicts). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15101) Audit: ml.clustering and ml.recommendation
[ https://issues.apache.org/jira/browse/SPARK-15101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269768#comment-15269768 ] Miao Wang commented on SPARK-15101: --- [~josephkb] I want to know how to work on these kinds of JIRAs. For example, if multiple examples and docs are missing, shall I just file a single PR for this JIRA? Or shall I just report the missing parts and update the JIRA with further subtasks? Thanks! Miao > Audit: ml.clustering and ml.recommendation > -- > > Key: SPARK-15101 > URL: https://issues.apache.org/jira/browse/SPARK-15101 > Project: Spark > Issue Type: Documentation > Components: Documentation, ML >Reporter: Joseph K. Bradley > > Audit this sub-package for new algorithms which do not have corresponding > sections & examples in the user guide. > See parent issue for more details. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14900) spark.ml classification metrics should include accuracy
[ https://issues.apache.org/jira/browse/SPARK-14900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14900: Assignee: (was: Apache Spark) > spark.ml classification metrics should include accuracy > --- > > Key: SPARK-14900 > URL: https://issues.apache.org/jira/browse/SPARK-14900 > Project: Spark > Issue Type: New Feature > Components: ML >Reporter: Joseph K. Bradley >Priority: Minor > > To compute "accuracy" (0/1 classification accuracy), users can use > {{precision}} in MulticlassMetrics and > MulticlassClassificationEvaluator.metricName. We should also support > "accuracy" directly as an alias to help users familiar with that name. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14900) spark.ml classification metrics should include accuracy
[ https://issues.apache.org/jira/browse/SPARK-14900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14900: Assignee: Apache Spark > spark.ml classification metrics should include accuracy > --- > > Key: SPARK-14900 > URL: https://issues.apache.org/jira/browse/SPARK-14900 > Project: Spark > Issue Type: New Feature > Components: ML >Reporter: Joseph K. Bradley >Assignee: Apache Spark >Priority: Minor > > To compute "accuracy" (0/1 classification accuracy), users can use > {{precision}} in MulticlassMetrics and > MulticlassClassificationEvaluator.metricName. We should also support > "accuracy" directly as an alias to help users familiar with that name. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14900) spark.ml classification metrics should include accuracy
[ https://issues.apache.org/jira/browse/SPARK-14900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269754#comment-15269754 ] Apache Spark commented on SPARK-14900: -- User 'wangmiao1981' has created a pull request for this issue: https://github.com/apache/spark/pull/12882 > spark.ml classification metrics should include accuracy > --- > > Key: SPARK-14900 > URL: https://issues.apache.org/jira/browse/SPARK-14900 > Project: Spark > Issue Type: New Feature > Components: ML >Reporter: Joseph K. Bradley >Priority: Minor > > To compute "accuracy" (0/1 classification accuracy), users can use > {{precision}} in MulticlassMetrics and > MulticlassClassificationEvaluator.metricName. We should also support > "accuracy" directly as an alias to help users familiar with that name. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
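For clarity on what the requested alias computes: multiclass 0/1 accuracy is simply the fraction of predictions that equal their labels, which is the same quantity `MulticlassMetrics` currently exposes as {{precision}}. A plain-Scala sketch, assuming predictions and labels arrive paired:

```scala
// 0/1 multiclass accuracy over (prediction, label) pairs.
def accuracy(predictionsAndLabels: Seq[(Double, Double)]): Double =
  predictionsAndLabels.count { case (p, l) => p == l }.toDouble /
    predictionsAndLabels.size
```

Exposing "accuracy" as an alias therefore costs nothing at runtime; it only adds the familiar name.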
[jira] [Commented] (SPARK-13269) Expose more executor stats in stable status API
[ https://issues.apache.org/jira/browse/SPARK-13269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269738#comment-15269738 ] Alex Bozarth commented on SPARK-13269: -- Hey [~andrewor14], I was interested in this and took a look at the two examples you gave and am a bit confused about what exactly you want. You can currently see the used memory, used disk space, and active task count for each executor by calling /applications/[app-id]/executors or (in code) getting the ExecutorSummary class for each executor and checking activeTask, memoryUsed, and diskUsed. Are these numbers different from what you were interested in surfacing? I was also unsure of how those examples relate to JobProgressListener. > Expose more executor stats in stable status API > --- > > Key: SPARK-13269 > URL: https://issues.apache.org/jira/browse/SPARK-13269 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Andrew Or > > Currently the stable status API is quite limited; it exposes only a small > subset of the things exposed by JobProgressListener. It is useful for very > high level querying but falls short when the developer wants to build an > application on top of Spark with more integration. > In this issue I propose that we expose at least two things: > - Which executors are running tasks, and > - Which executors cached how much in memory and on disk > The goal is not to expose exactly these two things, but to expose something > that would allow the developer to learn about them. These concepts are very > much fundamental in Spark's design so there's almost no chance that they will > go away in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15095) Drop binary mode in ThriftServer
[ https://issues.apache.org/jira/browse/SPARK-15095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269719#comment-15269719 ] Apache Spark commented on SPARK-15095: -- User 'davies' has created a pull request for this issue: https://github.com/apache/spark/pull/12881 > Drop binary mode in ThriftServer > > > Key: SPARK-15095 > URL: https://issues.apache.org/jira/browse/SPARK-15095 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Davies Liu >Assignee: Davies Liu > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15105) Remove HiveSessionHook from ThriftServer
Davies Liu created SPARK-15105: -- Summary: Remove HiveSessionHook from ThriftServer Key: SPARK-15105 URL: https://issues.apache.org/jira/browse/SPARK-15105 Project: Spark Issue Type: Bug Components: SQL Reporter: Davies Liu Assignee: Davies Liu -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15102) remove delegation token from ThriftServer
[ https://issues.apache.org/jira/browse/SPARK-15102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269700#comment-15269700 ] Apache Spark commented on SPARK-15102: -- User 'davies' has created a pull request for this issue: https://github.com/apache/spark/pull/12878 > remove delegation token from ThriftServer > - > > Key: SPARK-15102 > URL: https://issues.apache.org/jira/browse/SPARK-15102 > Project: Spark > Issue Type: Sub-task >Reporter: Davies Liu >Assignee: Davies Liu > Fix For: 2.0.0 > > > This feature is only useful for Hadoop -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-15104) Bad spacing in log line
[ https://issues.apache.org/jira/browse/SPARK-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-15104. - Resolution: Fixed Assignee: Andrew Ash Fix Version/s: 2.0.0 > Bad spacing in log line > --- > > Key: SPARK-15104 > URL: https://issues.apache.org/jira/browse/SPARK-15104 > Project: Spark > Issue Type: Bug >Affects Versions: 1.6.1 >Reporter: Andrew Ash >Assignee: Andrew Ash >Priority: Minor > Fix For: 2.0.0 > > > {noformat}INFO [2016-05-03 21:18:51,477] > org.apache.spark.scheduler.TaskSetManager: Starting task 0.0 in stage 101.0 > (TID 7029, localhost, partition 0,NODE_LOCAL, 1894 bytes){noformat} > Should have a space before "NODE_LOCAL" -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
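The report boils down to a one-character fix in the format string. A minimal before/after sketch in plain Python, paraphrasing the TaskSetManager message rather than quoting Spark's actual Scala code:

```python
partition, locality = 0, "NODE_LOCAL"

# Before: no space after the comma, producing "partition 0,NODE_LOCAL"
buggy = "partition %d,%s" % (partition, locality)

# After: a space before the locality level, as the report requests
fixed = "partition %d, %s" % (partition, locality)
```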
[jira] [Resolved] (SPARK-15102) remove delegation token from ThriftServer
[ https://issues.apache.org/jira/browse/SPARK-15102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-15102. - Resolution: Fixed Fix Version/s: 2.0.0 > remove delegation token from ThriftServer > - > > Key: SPARK-15102 > URL: https://issues.apache.org/jira/browse/SPARK-15102 > Project: Spark > Issue Type: Sub-task >Reporter: Davies Liu >Assignee: Davies Liu > Fix For: 2.0.0 > > > This feature is only useful for Hadoop -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-15104) Bad spacing in log line
[ https://issues.apache.org/jira/browse/SPARK-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15104: Assignee: Apache Spark > Bad spacing in log line > --- > > Key: SPARK-15104 > URL: https://issues.apache.org/jira/browse/SPARK-15104 > Project: Spark > Issue Type: Bug >Affects Versions: 1.6.1 >Reporter: Andrew Ash >Assignee: Apache Spark >Priority: Minor > > {noformat}INFO [2016-05-03 21:18:51,477] > org.apache.spark.scheduler.TaskSetManager: Starting task 0.0 in stage 101.0 > (TID 7029, localhost, partition 0,NODE_LOCAL, 1894 bytes){noformat} > Should have a space before "NODE_LOCAL" -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15104) Bad spacing in log line
[ https://issues.apache.org/jira/browse/SPARK-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269648#comment-15269648 ] Apache Spark commented on SPARK-15104: -- User 'ash211' has created a pull request for this issue: https://github.com/apache/spark/pull/12880 > Bad spacing in log line > --- > > Key: SPARK-15104 > URL: https://issues.apache.org/jira/browse/SPARK-15104 > Project: Spark > Issue Type: Bug >Affects Versions: 1.6.1 >Reporter: Andrew Ash >Priority: Minor > > {noformat}INFO [2016-05-03 21:18:51,477] > org.apache.spark.scheduler.TaskSetManager: Starting task 0.0 in stage 101.0 > (TID 7029, localhost, partition 0,NODE_LOCAL, 1894 bytes){noformat} > Should have a space before "NODE_LOCAL" -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-15104) Bad spacing in log line
[ https://issues.apache.org/jira/browse/SPARK-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15104: Assignee: (was: Apache Spark) > Bad spacing in log line > --- > > Key: SPARK-15104 > URL: https://issues.apache.org/jira/browse/SPARK-15104 > Project: Spark > Issue Type: Bug >Affects Versions: 1.6.1 >Reporter: Andrew Ash >Priority: Minor > > {noformat}INFO [2016-05-03 21:18:51,477] > org.apache.spark.scheduler.TaskSetManager: Starting task 0.0 in stage 101.0 > (TID 7029, localhost, partition 0,NODE_LOCAL, 1894 bytes){noformat} > Should have a space before "NODE_LOCAL" -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15104) Bad spacing in log line
Andrew Ash created SPARK-15104: -- Summary: Bad spacing in log line Key: SPARK-15104 URL: https://issues.apache.org/jira/browse/SPARK-15104 Project: Spark Issue Type: Bug Affects Versions: 1.6.1 Reporter: Andrew Ash Priority: Minor {noformat}INFO [2016-05-03 21:18:51,477] org.apache.spark.scheduler.TaskSetManager: Starting task 0.0 in stage 101.0 (TID 7029, localhost, partition 0,NODE_LOCAL, 1894 bytes){noformat} Should have a space before "NODE_LOCAL" -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-9466) Flaky test: org.apache.spark.sql.hive.thriftserver.CliSuite
[ https://issues.apache.org/jira/browse/SPARK-9466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin closed SPARK-9466. -- Resolution: Auto Closed > Flaky test: org.apache.spark.sql.hive.thriftserver.CliSuite > --- > > Key: SPARK-9466 > URL: https://issues.apache.org/jira/browse/SPARK-9466 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Yin Huai > Labels: flaky-test > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-12008) Spark hive security authorization doesn't work as Apache hive's
[ https://issues.apache.org/jira/browse/SPARK-12008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin closed SPARK-12008. --- Resolution: Invalid Marking this as invalid since these are unsupported for now. We might add support explicitly for these features in the future. > Spark hive security authorization doesn't work as Apache hive's > --- > > Key: SPARK-12008 > URL: https://issues.apache.org/jira/browse/SPARK-12008 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.2 >Reporter: pin_zhang > > Spark's Hive security authorization is not consistent with Apache Hive's, > given the same hive-site.xml: > > hive.security.authorization.enabled > true > > > hive.security.authorization.manager > org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory > > > hive.security.authenticator.manager > org.apache.hadoop.hive.ql.security.SessionStateUserAuthenticator > > > hive.server2.enable.doAs > true > > 1. Run Spark's start-thriftserver.sh; running SQL hits an exception: >SQL standards based authorization should not be enabled from hive > cli. Instead the use of storage based authorization in hive metastore is > recommended. >Set hive.security.authorization.enabled=false to disable authz within cli > 2. Start start-thriftserver.sh with the Hive configurations instead: > ./start-thriftserver.sh --conf > hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory > --conf > hive.security.authenticator.manager=org.apache.hadoop.hive.ql.security.SessionStateUserAuthenticator > 3. Connect with Beeline as userA and create table tableA. > 4. Connect with Beeline as userB and truncate tableA: > A) In Apache Hive, truncating the table raises an exception: > Error while compiling statement: FAILED: HiveAccessControlException > Permission denied: Principal [name=userB, type=USER] does not have following > privileges for operation TRUNCATETABLE [[OBJECT OWNERSHIP] on Object > [type=TABLE_OR_VIEW, name=default.tablea]] (state=42000,code=4) > B) In Spark's Hive support, any user that can connect can truncate, as > long as the Spark user has privileges. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
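The hive-site.xml fragment quoted in the report above lost its markup in transit; reconstructed with standard Hadoop property tags (the tags are assumed, since only the names and values survive in the quote):

```xml
<property>
  <name>hive.security.authorization.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hive.security.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory</value>
</property>
<property>
  <name>hive.security.authenticator.manager</name>
  <value>org.apache.hadoop.hive.ql.security.SessionStateUserAuthenticator</value>
</property>
<property>
  <name>hive.server2.enable.doAs</name>
  <value>true</value>
</property>
```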
[jira] [Commented] (SPARK-15103) Add support for batch jobs correctly inferring partitions from data written with file stream sink
[ https://issues.apache.org/jira/browse/SPARK-15103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269632#comment-15269632 ] Apache Spark commented on SPARK-15103: -- User 'tdas' has created a pull request for this issue: https://github.com/apache/spark/pull/12879 > Add support for batch jobs correctly inferring partitions from data written > with file stream sink > - > > Key: SPARK-15103 > URL: https://issues.apache.org/jira/browse/SPARK-15103 > Project: Spark > Issue Type: Sub-task > Components: SQL, Streaming >Reporter: Tathagata Das >Assignee: Tathagata Das > > File Stream Sink writes the list of written files in a log. StreamFileCatalog > reads that list of files for processing. However, StreamFileCatalog does > not infer partitioning like HDFSFileCatalog. > This JIRA is to enable that. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
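Partition inference of the kind HDFSFileCatalog performs walks the written file paths and parses Hive-style key=value directory segments. A simplified illustration in plain Python (not Spark's actual implementation; the directory layout below is hypothetical):

```python
def infer_partitions(path):
    """Parse Hive-style key=value segments from a file path."""
    parts = {}
    for segment in path.split("/"):
        if "=" in segment:
            key, _, value = segment.partition("=")
            parts[key] = value
    return parts

# e.g. a file the file stream sink wrote under a partitioned layout
parts = infer_partitions("out/year=2016/month=05/part-00000.parquet")
```

A batch job reading the sink's output would apply this to every path listed in the sink's log to recover the partition columns.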
[jira] [Assigned] (SPARK-15103) Add support for batch jobs correctly inferring partitions from data written with file stream sink
[ https://issues.apache.org/jira/browse/SPARK-15103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15103: Assignee: Tathagata Das (was: Apache Spark) > Add support for batch jobs correctly inferring partitions from data written > with file stream sink > - > > Key: SPARK-15103 > URL: https://issues.apache.org/jira/browse/SPARK-15103 > Project: Spark > Issue Type: Sub-task > Components: SQL, Streaming >Reporter: Tathagata Das >Assignee: Tathagata Das > > File Stream Sink writes the list of written files in a log. StreamFileCatalog > reads that list of files for processing. However, StreamFileCatalog does > not infer partitioning like HDFSFileCatalog. > This JIRA is to enable that. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-15103) Add support for batch jobs correctly inferring partitions from data written with file stream sink
[ https://issues.apache.org/jira/browse/SPARK-15103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15103: Assignee: Apache Spark (was: Tathagata Das) > Add support for batch jobs correctly inferring partitions from data written > with file stream sink > - > > Key: SPARK-15103 > URL: https://issues.apache.org/jira/browse/SPARK-15103 > Project: Spark > Issue Type: Sub-task > Components: SQL, Streaming >Reporter: Tathagata Das >Assignee: Apache Spark > > File Stream Sink writes the list of written files in a log. StreamFileCatalog > reads that list of files for processing. However, StreamFileCatalog does > not infer partitioning like HDFSFileCatalog. > This JIRA is to enable that. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-12066) spark sql throw java.lang.ArrayIndexOutOfBoundsException when use table.* with join
[ https://issues.apache.org/jira/browse/SPARK-12066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin closed SPARK-12066. --- Resolution: Cannot Reproduce Closing as cannot reproduce for now. > spark sql throw java.lang.ArrayIndexOutOfBoundsException when use table.* > with join > - > > Key: SPARK-12066 > URL: https://issues.apache.org/jira/browse/SPARK-12066 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.0, 1.5.2 > Environment: linux >Reporter: Ricky Yang > > java.lang.ArrayIndexOutOfBoundsException is thrown when I use the following Spark > SQL on Spark standalone or YARN. > The SQL: > select ta.* > from bi_td.dm_price_seg_td tb > join bi_sor.sor_ord_detail_tf ta > on 1 = 1 > where ta.sale_dt = '20140514' > and ta.sale_price >= tb.pri_from > and ta.sale_price < tb.pri_to limit 10 ; > But the result is correct when selecting a column instead of * as follows: > select ta.sale_dt > from bi_td.dm_price_seg_td tb > join bi_sor.sor_ord_detail_tf ta > on 1 = 1 > where ta.sale_dt = '20140514' > and ta.sale_price >= tb.pri_from > and ta.sale_price < tb.pri_to limit 10 ; > The standalone version is 1.4.0 and the Spark-on-YARN version is 1.5.2. > Error log: > > 15/11/30 14:19:59 ERROR SparkSQLDriver: Failed in [select ta.* > from bi_td.dm_price_seg_td tb > join bi_sor.sor_ord_detail_tf ta > on 1 = 1 > where ta.sale_dt = '20140514' > and ta.sale_price >= tb.pri_from > and ta.sale_price < tb.pri_to limit 10 ] > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in > stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 > (TID 3, namenode2-sit.cnsuning.com): java.lang.ArrayIndexOutOfBoundsException > Driver stacktrace: > at > org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283) > > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1271) > > at > 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1270) > > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) > at > org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1270) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697) > > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697) > > at scala.Option.foreach(Option.scala:236) > at > org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697) > > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1496) > > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458) > > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447) > > at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) > at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:1824) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:1837) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:1850) > at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:215) > at > org.apache.spark.sql.execution.Limit.executeCollect(basicOperators.scala:207) > at > org.apache.spark.sql.hive.HiveContext$QueryExecution.stringResult(HiveContext.scala:587) > > at > org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:63) > > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:308) > > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311) > at 
org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409) > at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:166) > > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:674) > > at
[jira] [Commented] (SPARK-15037) Use SparkSession instead of SQLContext in testsuites
[ https://issues.apache.org/jira/browse/SPARK-15037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269620#comment-15269620 ] Dongjoon Hyun commented on SPARK-15037: --- Sure. Go ahead if you want. This is still blocked by SPARK-15084. But I think you can start with Spark/Java. I am working on SPARK-15084 and SPARK-15031 (for examples). Thanks. > Use SparkSession instead of SQLContext in testsuites > > > Key: SPARK-15037 > URL: https://issues.apache.org/jira/browse/SPARK-15037 > Project: Spark > Issue Type: Sub-task >Reporter: Dongjoon Hyun > > This issue aims to update the existing testsuites to use `SparkSession` > instead of `SQLContext` since `SQLContext` exists just for backward > compatibility. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-15056) Parse Unsupported Sampling Syntax and Issue Better Exceptions
[ https://issues.apache.org/jira/browse/SPARK-15056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hovell resolved SPARK-15056. --- Resolution: Fixed Assignee: Xiao Li > Parse Unsupported Sampling Syntax and Issue Better Exceptions > - > > Key: SPARK-15056 > URL: https://issues.apache.org/jira/browse/SPARK-15056 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li >Assignee: Xiao Li >Priority: Minor > > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Sampling > Compared with the current Spark parser, two extra syntaxes are > supported in Hive for sampling: > 1. In ON clauses, rand() indicates sampling on the entire row instead of an > individual column. > 2. Users can specify the total length to be read in block_sample. > We need to parse and capture them, and issue a better error message for these > unsupported features. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15103) Add support for batch jobs correctly inferring partitions from data written with file stream sink
Tathagata Das created SPARK-15103: - Summary: Add support for batch jobs correctly inferring partitions from data written with file stream sink Key: SPARK-15103 URL: https://issues.apache.org/jira/browse/SPARK-15103 Project: Spark Issue Type: Sub-task Components: SQL, Streaming Reporter: Tathagata Das Assignee: Tathagata Das File Stream Sink writes the list of written files in a log. StreamFileCatalog reads that list of files for processing. However, StreamFileCatalog does not infer partitioning like HDFSFileCatalog. This JIRA is to enable that. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-13971) Implicit group by with distinct modifier on having raises an unexpected error
[ https://issues.apache.org/jira/browse/SPARK-13971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-13971. - Resolution: Fixed Fix Version/s: 2.0.0 > Implicit group by with distinct modifier on having raises an unexpected error > - > > Key: SPARK-13971 > URL: https://issues.apache.org/jira/browse/SPARK-13971 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 > Environment: Spark standalone mode installed on CentOS 7 >Reporter: Javier Pérez > Fix For: 2.0.0 > > > 1. Start the Thrift server > 2. Connect with beeline > 3. Perform the following query over a simple table: > SELECT COUNT(DISTINCT field1) FROM test_table HAVING COUNT(DISTINCT field1) = > 3 > TRACE: > ERROR SparkExecuteStatementOperation: Error running hive query: > org.apache.hive.service.cli.HiveSQLException: > org.apache.spark.sql.AnalysisException: resolved attribute(s) > gid#13616,field1#13617 missing from > field1#13612,field2#13611,field2#13608,field3#13610,field4#13613,field5#13609 > in operator !Expand [List(null, 0, if ((gid#13616 = 1)) field1#13617 else > null),List(field2#13608, 1, null)], [field2#13619,gid#13618,if ((gid = 1)) > field1 else null#13620]; > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:246) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:154) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:151) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) > at > 
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:164) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
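To see what the reporter expected, the query's intended semantics can be sketched in plain Python: a single implicit global group, COUNT(DISTINCT field1), then the HAVING filter applied to that one group (the sample values are hypothetical):

```python
# Hypothetical rows of test_table; field1 has three distinct values
rows = [{"field1": v} for v in ("a", "b", "a", "c")]

# COUNT(DISTINCT field1) over the whole table (the implicit global group by)
distinct_count = len({r["field1"] for r in rows})

# HAVING COUNT(DISTINCT field1) = 3: emit the count only if the group passes
result = [distinct_count] if distinct_count == 3 else []
```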
[jira] [Resolved] (SPARK-14973) The CrossValidator and TrainValidationSplit miss the seed when saving and loading
[ https://issues.apache.org/jira/browse/SPARK-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley resolved SPARK-14973. --- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 12825 [https://github.com/apache/spark/pull/12825] > The CrossValidator and TrainValidationSplit miss the seed when saving and > loading > - > > Key: SPARK-14973 > URL: https://issues.apache.org/jira/browse/SPARK-14973 > Project: Spark > Issue Type: Bug > Components: ML, PySpark >Reporter: Xusen Yin >Assignee: Xusen Yin > Fix For: 2.0.0 > > > The CrossValidator and TrainValidationSplit miss the seed when saving and > loading. Need to fix both Spark side code and test suite. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
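The bug class here is easy to illustrate: if persistence serializes every param except the seed, loading cannot restore it. A toy round-trip in plain Python (illustrative only, not Spark's ML persistence code; the param names are hypothetical):

```python
import json

def save_params(params, include_seed):
    """Serialize estimator params; dropping 'seed' mimics the reported bug."""
    to_write = dict(params)
    if not include_seed:
        to_write.pop("seed", None)
    return json.dumps(to_write)

def load_params(blob):
    return json.loads(blob)

params = {"numFolds": 3, "seed": 42}
buggy = load_params(save_params(params, include_seed=False))  # seed is lost
fixed = load_params(save_params(params, include_seed=True))   # seed survives
```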
[jira] [Updated] (SPARK-15102) remove delegation token from ThriftServer
[ https://issues.apache.org/jira/browse/SPARK-15102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-15102: Issue Type: Sub-task (was: Bug) Parent: SPARK-14987 > remove delegation token from ThriftServer > - > > Key: SPARK-15102 > URL: https://issues.apache.org/jira/browse/SPARK-15102 > Project: Spark > Issue Type: Sub-task >Reporter: Davies Liu >Assignee: Davies Liu > > This feature is only useful for Hadoop -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14973) The CrossValidator and TrainValidationSplit miss the seed when saving and loading
[ https://issues.apache.org/jira/browse/SPARK-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-14973: -- Shepherd: Joseph K. Bradley > The CrossValidator and TrainValidationSplit miss the seed when saving and > loading > - > > Key: SPARK-14973 > URL: https://issues.apache.org/jira/browse/SPARK-14973 > Project: Spark > Issue Type: Bug > Components: ML, PySpark >Reporter: Xusen Yin >Assignee: Xusen Yin > > The CrossValidator and TrainValidationSplit miss the seed when saving and > loading. Need to fix both Spark side code and test suite. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15095) Drop binary mode in ThriftServer
[ https://issues.apache.org/jira/browse/SPARK-15095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-15095: Issue Type: Sub-task (was: Bug) Parent: SPARK-14987 > Drop binary mode in ThriftServer > > > Key: SPARK-15095 > URL: https://issues.apache.org/jira/browse/SPARK-15095 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Davies Liu >Assignee: Davies Liu > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14973) The CrossValidator and TrainValidationSplit miss the seed when saving and loading
[ https://issues.apache.org/jira/browse/SPARK-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-14973: -- Assignee: Xusen Yin > The CrossValidator and TrainValidationSplit miss the seed when saving and > loading > - > > Key: SPARK-14973 > URL: https://issues.apache.org/jira/browse/SPARK-14973 > Project: Spark > Issue Type: Bug > Components: ML, PySpark >Reporter: Xusen Yin >Assignee: Xusen Yin > > The CrossValidator and TrainValidationSplit miss the seed when saving and > loading. Need to fix both Spark side code and test suite. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-10152) Support Init script for hive-thriftserver
[ https://issues.apache.org/jira/browse/SPARK-10152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin closed SPARK-10152. --- Resolution: Won't Fix > Support Init script for hive-thriftserver > - > > Key: SPARK-10152 > URL: https://issues.apache.org/jira/browse/SPARK-10152 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Navis >Priority: Trivial > > If some queries could be executed on the Thrift server during the initialization > stage (mostly for registering functions or macros), things would be much > easier. > Not a big feature for Spark, but hopefully someone can make use of this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15102) remove delegation token from ThriftServer
Davies Liu created SPARK-15102: -- Summary: remove delegation token from ThriftServer Key: SPARK-15102 URL: https://issues.apache.org/jira/browse/SPARK-15102 Project: Spark Issue Type: Bug Reporter: Davies Liu Assignee: Davies Liu This feature is only useful for Hadoop -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-15095) Drop binary mode in ThriftServer
[ https://issues.apache.org/jira/browse/SPARK-15095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-15095. - Resolution: Fixed Fix Version/s: 2.0.0 > Drop binary mode in ThriftServer > > > Key: SPARK-15095 > URL: https://issues.apache.org/jira/browse/SPARK-15095 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Davies Liu >Assignee: Davies Liu > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15100) Audit: ml.feature
Joseph K. Bradley created SPARK-15100: - Summary: Audit: ml.feature Key: SPARK-15100 URL: https://issues.apache.org/jira/browse/SPARK-15100 Project: Spark Issue Type: Documentation Components: Documentation, ML Reporter: Joseph K. Bradley Audit this sub-package for new algorithms which do not have corresponding sections & examples in the user guide. See parent issue for more details. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14817) ML, Graph, R 2.0 QA: Programming guide update and migration guide
[ https://issues.apache.org/jira/browse/SPARK-14817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269588#comment-15269588 ] Joseph K. Bradley commented on SPARK-14817: --- [~BenFradet] [~iamshrek] [~podongfeng] [~wm624] If you'd like to begin, could you please help with the initial audit tasks on [SPARK-14815]? That will let us identify missing programming guide items which we need to add. Thank you! > ML, Graph, R 2.0 QA: Programming guide update and migration guide > - > > Key: SPARK-14817 > URL: https://issues.apache.org/jira/browse/SPARK-14817 > Project: Spark > Issue Type: Sub-task > Components: Documentation, GraphX, ML, MLlib, SparkR >Reporter: Joseph K. Bradley > > Before the release, we need to update the MLlib, GraphX, and SparkR > Programming Guides. Updates will include: > * Add migration guide subsection. > ** Use the results of the QA audit JIRAs and [SPARK-13448]. > * Check phrasing, especially in main sections (for outdated items such as "In > this release, ...") > For MLlib, we will make the DataFrame-based API (spark.ml) front-and-center, > to make it clear the RDD-based API is the older, maintenance-mode one. > * No docs for spark.mllib will be deleted; they will just be reorganized and > put in a subsection. > * If spark.ml docs are less complete, or if spark.ml docs say "refer to the > spark.mllib docs for details," then we should copy those details to the > spark.ml docs. This per-feature work can happen under [SPARK-14815]. > * This big reorganization should be done *after* docs are added for each > feature (to minimize merge conflicts). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15101) Audit: ml.clustering and ml.recommendation
Joseph K. Bradley created SPARK-15101:
-----------------------------------------

             Summary: Audit: ml.clustering and ml.recommendation
                 Key: SPARK-15101
                 URL: https://issues.apache.org/jira/browse/SPARK-15101
             Project: Spark
          Issue Type: Documentation
          Components: Documentation, ML
            Reporter: Joseph K. Bradley

Audit this sub-package for new algorithms which do not have corresponding sections & examples in the user guide. See parent issue for more details.
[jira] [Created] (SPARK-15098) Audit: ml.classification
Joseph K. Bradley created SPARK-15098:
-----------------------------------------

             Summary: Audit: ml.classification
                 Key: SPARK-15098
                 URL: https://issues.apache.org/jira/browse/SPARK-15098
             Project: Spark
          Issue Type: Documentation
          Components: Documentation, ML
            Reporter: Joseph K. Bradley

Audit this sub-package for new algorithms which do not have corresponding sections & examples in the user guide. See parent issue for more details.
[jira] [Created] (SPARK-15099) Audit: ml.regression
Joseph K. Bradley created SPARK-15099:
-----------------------------------------

             Summary: Audit: ml.regression
                 Key: SPARK-15099
                 URL: https://issues.apache.org/jira/browse/SPARK-15099
             Project: Spark
          Issue Type: Documentation
          Components: Documentation, ML
            Reporter: Joseph K. Bradley

Audit this sub-package for new algorithms which do not have corresponding sections & examples in the user guide. See parent issue for more details.
[jira] [Commented] (SPARK-14815) ML, Graph, R 2.0 QA: Update user guide for new features & APIs
[ https://issues.apache.org/jira/browse/SPARK-14815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269581#comment-15269581 ]

Joseph K. Bradley commented on SPARK-14815:
-------------------------------------------

I'll go ahead and create subtasks for auditing various parts of the API. Please check those subpackages to see if the user guide is missing sections.

> ML, Graph, R 2.0 QA: Update user guide for new features & APIs
> --------------------------------------------------------------
>
>                 Key: SPARK-14815
>                 URL: https://issues.apache.org/jira/browse/SPARK-14815
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Documentation, GraphX, ML, MLlib, SparkR
>            Reporter: Joseph K. Bradley
>
> Check the user guide vs. a list of new APIs (classes, methods, data members) to see what items require updates to the user guide.
> For each feature missing user guide doc:
> * Create a JIRA for that feature, and assign it to the author of the feature
> * Link it to (a) the original JIRA which introduced that feature ("related to") and (b) to this JIRA ("requires").
>
> For MLlib:
> * This task does not include major reorganizations for the programming guide; that will be under [SPARK-14817].
> * We should now begin copying algorithm details from the spark.mllib guide to spark.ml as needed, rather than just linking back to the corresponding algorithms in the spark.mllib user guide.
>
> If you would like to work on this task, please comment, and we can create & link JIRAs for parts of this work (which should be broken into pieces for this larger 2.0 release).
[jira] [Updated] (SPARK-14808) Spark MLlib, GraphX, SparkR 2.0 QA umbrella
[ https://issues.apache.org/jira/browse/SPARK-14808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph K. Bradley updated SPARK-14808:
--------------------------------------
    Description:

This JIRA lists tasks for the next Spark release's QA period for MLlib, GraphX, and SparkR. The list below gives an overview of what is involved, and the corresponding JIRA issues are linked below that.

h2. API
* Check binary API compatibility for Scala/Java
* Audit new public APIs (from the generated html doc)
** Scala
** Java compatibility
** Python coverage
** R
* Check Experimental, DeveloperApi tags

h2. Algorithms and performance
*Performance*
* _List any other missing performance tests from spark-perf here_
* perf-tests for transformers (SPARK-2838)
* MultilayerPerceptron (SPARK-11911)

h2. Documentation and example code
* For new algorithms, create JIRAs for updating the user guide sections & examples
* Update Programming Guide
* Update website

    was: (identical, except the "Documentation and example code" section previously read:)
* For new algorithms, create JIRAs for updating the user guide
* For major components, create JIRAs for example code
* Update Programming Guide
* Update website

> Spark MLlib, GraphX, SparkR 2.0 QA umbrella
> -------------------------------------------
>
>                 Key: SPARK-14808
>                 URL: https://issues.apache.org/jira/browse/SPARK-14808
>             Project: Spark
>          Issue Type: Umbrella
>          Components: Documentation, GraphX, ML, MLlib, SparkR
>            Reporter: Joseph K. Bradley
>            Assignee: Joseph K. Bradley
>            Priority: Critical
[jira] [Closed] (SPARK-14809) Examples: Check for new APIs requiring example code in 2.0
[ https://issues.apache.org/jira/browse/SPARK-14809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph K. Bradley closed SPARK-14809.
-------------------------------------
    Resolution: Duplicate

This used to be relevant when examples & the user guide were separate, but it can now be contained within [SPARK-14815].

> Examples: Check for new APIs requiring example code in 2.0
> ----------------------------------------------------------
>
>                 Key: SPARK-14809
>                 URL: https://issues.apache.org/jira/browse/SPARK-14809
>             Project: Spark
>          Issue Type: Sub-task
>          Components: GraphX, ML, MLlib, SparkR
>            Reporter: Joseph K. Bradley
>            Priority: Minor
>
> Audit list of new features added to MLlib, GraphX & SparkR, and see which major items are missing example code (in the examples folder). We do not need examples for everything, only for major items such as new algorithms.
> For any such items:
> * Create a JIRA for that feature, and assign it to the author of the feature (or yourself if interested).
> * Link it to (a) the original JIRA which introduced that feature ("related to") and (b) to this JIRA ("requires").
[jira] [Updated] (SPARK-14809) Examples: Check for new APIs requiring example code in 2.0
[ https://issues.apache.org/jira/browse/SPARK-14809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph K. Bradley updated SPARK-14809:
--------------------------------------
    Issue Type: Documentation  (was: Sub-task)
        Parent: (was: SPARK-14808)

> Examples: Check for new APIs requiring example code in 2.0
> ----------------------------------------------------------
>
>                 Key: SPARK-14809
>                 URL: https://issues.apache.org/jira/browse/SPARK-14809
>             Project: Spark
>          Issue Type: Documentation
>          Components: GraphX, ML, MLlib, SparkR
>            Reporter: Joseph K. Bradley
>            Priority: Minor
[jira] [Resolved] (SPARK-15073) Make SparkSession constructors private
[ https://issues.apache.org/jira/browse/SPARK-15073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin resolved SPARK-15073.
---------------------------------
    Resolution: Fixed

> Make SparkSession constructors private
> --------------------------------------
>
>                 Key: SPARK-15073
>                 URL: https://issues.apache.org/jira/browse/SPARK-15073
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>            Assignee: Andrew Or
>             Fix For: 2.0.0
>
> So users have to use the Builder pattern.
[jira] [Assigned] (SPARK-15097) Import fails for someDataset.sqlContext.implicits._
[ https://issues.apache.org/jira/browse/SPARK-15097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-15097:
------------------------------------
    Assignee:     (was: Apache Spark)

> Import fails for someDataset.sqlContext.implicits._
> ---------------------------------------------------
>
>                 Key: SPARK-15097
>                 URL: https://issues.apache.org/jira/browse/SPARK-15097
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>         Environment: spark-2.0.0-SNAPSHOT
>            Reporter: koert kuipers
>
> With the introduction of SparkSession, sqlContext changed from being a lazy val to a def inside Dataset. However, this is troublesome if you want to do:
> import someDataset.sqlContext.implicits._
> You get this error:
> stable identifier required, but someDataset.sqlContext.implicits found.
[jira] [Commented] (SPARK-15097) Import fails for someDataset.sqlContext.implicits._
[ https://issues.apache.org/jira/browse/SPARK-15097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269548#comment-15269548 ]

Apache Spark commented on SPARK-15097:
--------------------------------------

User 'koertkuipers' has created a pull request for this issue:
https://github.com/apache/spark/pull/12877

> Import fails for someDataset.sqlContext.implicits._
> ---------------------------------------------------
>
>                 Key: SPARK-15097
>                 URL: https://issues.apache.org/jira/browse/SPARK-15097
[jira] [Assigned] (SPARK-15097) Import fails for someDataset.sqlContext.implicits._
[ https://issues.apache.org/jira/browse/SPARK-15097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-15097:
------------------------------------
    Assignee: Apache Spark

> Import fails for someDataset.sqlContext.implicits._
> ---------------------------------------------------
>
>                 Key: SPARK-15097
>                 URL: https://issues.apache.org/jira/browse/SPARK-15097
>            Assignee: Apache Spark
[jira] [Resolved] (SPARK-11316) coalesce doesn't handle UnionRDD with partial locality properly
[ https://issues.apache.org/jira/browse/SPARK-11316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Davies Liu resolved SPARK-11316.
--------------------------------
       Resolution: Fixed
    Fix Version/s: 2.0.0

Issue resolved by pull request 11327
[https://github.com/apache/spark/pull/11327]

> coalesce doesn't handle UnionRDD with partial locality properly
> ---------------------------------------------------------------
>
>                 Key: SPARK-11316
>                 URL: https://issues.apache.org/jira/browse/SPARK-11316
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.5.1
>            Reporter: Thomas Graves
>            Assignee: Thomas Graves
>            Priority: Critical
>             Fix For: 2.0.0
>
> I haven't fully debugged this yet, but I'm reporting what I'm seeing and what I think might be going on.
> I have a graph processing job that is seeing a huge slowdown in setupGroups in the location iterator, where it's getting the preferred locations for the coalesce. It is coalescing from 2400 down to 1200 partitions, and the calculation had run for 17+ hours when I killed it, so I don't know the total time.
> The job does an isEmpty call, a bunch of other transformations, then a coalesce (which is where it takes so long), other transformations, and finally a count to trigger execution.
> It appears that there is only one node that it finds in the setupGroups call, and to get to that node it first has to go through the while loop:
> {code}
> while (numCreated < targetLen && tries < expectedCoupons2) {
> {code}
> where expectedCoupons2 is around 19000. It finds very few or none in this loop.
> Then it does the second loop:
> {code}
> while (numCreated < targetLen) { // if we don't have enough partition groups, create duplicates
>   var (nxt_replica, nxt_part) = rotIt.next()
>   val pgroup = PartitionGroup(nxt_replica)
>   groupArr += pgroup
>   groupHash.getOrElseUpdate(nxt_replica, ArrayBuffer()) += pgroup
>   var tries = 0
>   while (!addPartToPGroup(nxt_part, pgroup) && tries < targetLen) { // ensure at least one part
>     nxt_part = rotIt.next()._2
>     tries += 1
>   }
>   numCreated += 1
> }
> {code}
> The outer loop runs 1200 times and the inner while loop can run up to 1200 times per outer iteration, i.e. 1200*1200 iterations in the worst case. This is taking a very long time.
> The user can work around the issue by adding a count() call shortly after the isEmpty call, before the coalesce is called. I also tried putting a take(1) right before the isEmpty call, and it seems to work around the issue as well; the job took 1 hour with the take vs. a few minutes with the count().
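The quadratic blow-up described above can be modeled in a few lines. This is a hypothetical plain-Python sketch of the duplicate-group creation step, not Spark's actual CoalescedRDD code: when partitions cannot be matched to preferred locations, the inner placement loop can run up to targetLen times for each of the targetLen groups.

```python
def setup_groups_cost(target_len):
    """Worst-case iteration count of the duplicate-creation step:
    an outer loop creates target_len partition groups, and for each
    group an inner loop may try up to target_len partitions."""
    iterations = 0
    for _ in range(target_len):      # one pass per partition group to create
        tries = 0
        while tries < target_len:    # inner placement attempts, worst case
            tries += 1
            iterations += 1
    return iterations

# The report coalesces 2400 partitions down to 1200:
print(setup_groups_cost(1200))  # prints 1440000 (1200 * 1200)
```

This is why the reporter's workaround helps: materializing the RDD first (count() or take(1)) changes which locality information setupGroups sees, so the worst-case path is avoided.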
[jira] [Resolved] (SPARK-14521) StackOverflowError in Kryo when executing TPC-DS
[ https://issues.apache.org/jira/browse/SPARK-14521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Davies Liu resolved SPARK-14521.
--------------------------------
       Resolution: Fixed
    Fix Version/s: 2.0.0

Issue resolved by pull request 12598
[https://github.com/apache/spark/pull/12598]

> StackOverflowError in Kryo when executing TPC-DS
> ------------------------------------------------
>
>                 Key: SPARK-14521
>                 URL: https://issues.apache.org/jira/browse/SPARK-14521
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Rajesh Balamohan
>            Priority: Critical
>             Fix For: 2.0.0
>
> Build details: Spark build from master branch (Apr-10)
> DataSet: TPC-DS at 200 GB scale in Parq format stored in hive.
> Client: $SPARK_HOME/bin/beeline
> Query: TPC-DS Query27
> spark.sql.sources.fileScan=true (this is the default value anyways)
> Exception:
> {noformat}
> Exception in thread "broadcast-exchange-0" java.lang.StackOverflowError
>     at com.esotericsoftware.kryo.util.DefaultClassResolver.writeName(DefaultClassResolver.java:108)
>     at com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:99)
>     at com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:517)
>     at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:622)
>     at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100)
>     at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40)
>     at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
>     at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80)
>     at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
>     at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
>     at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80)
>     at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
>     at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
>     ... (the same CollectionSerializer.write / Kryo.writeObject / ObjectField.write / FieldSerializer.write cycle repeats until the stack overflows)
> {noformat}
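The repeating cycle in the trace above is characteristic of recursive, depth-first serialization: each nested collection adds a stack frame, so a deep enough object graph overflows the stack regardless of heap size. The same failure mode can be reproduced with Python's pickle, shown here purely as an illustration of the general mechanism, not of Kryo itself:

```python
import pickle

# Build a chain of nested lists far deeper than the default recursion limit.
deep = None
for _ in range(100_000):
    deep = [deep]

try:
    pickle.dumps(deep)       # depth-first walk of the object graph
    overflowed = False
except RecursionError:       # Python's analog of Kryo's StackOverflowError
    overflowed = True

print(overflowed)  # True
```

In the JVM the usual mitigations are a larger thread stack (-Xss) or flattening the structure being serialized; which of these the linked pull request applies is not stated in this report.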
[jira] [Resolved] (SPARK-14234) Executor crashes for TaskRunner thread interruption
[ https://issues.apache.org/jira/browse/SPARK-14234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shixiong Zhu resolved SPARK-14234.
----------------------------------
       Resolution: Fixed
         Assignee: Devaraj K
    Fix Version/s: 2.0.0

> Executor crashes for TaskRunner thread interruption
> ---------------------------------------------------
>
>                 Key: SPARK-14234
>                 URL: https://issues.apache.org/jira/browse/SPARK-14234
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>            Reporter: Devaraj K
>            Assignee: Devaraj K
>             Fix For: 2.0.0
>
> If the TaskRunner thread gets interrupted while running due to task kill or any other reason, the interrupted thread will try to update the task status as part of the exception handling and fails with the below exception. This happens from the statusUpdate calls in all of these catch blocks; below are the corresponding exceptions for each catch case.
> {code:title=Executor.scala|borderStyle=solid}
> case _: TaskKilledException | _: InterruptedException if task.killed =>
>   ..
> case cDE: CommitDeniedException =>
>   ..
> case t: Throwable =>
>   ..
> {code}
> {code:xml}
> 16/03/29 17:32:33 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-2,5,main]
> java.lang.Error: java.nio.channels.ClosedByInterruptException
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1151)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:744)
> Caused by: java.nio.channels.ClosedByInterruptException
>     at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
>     at java.nio.channels.Channels$WritableByteChannelImpl.write(Channels.java:460)
>     at org.apache.spark.util.SerializableBuffer$$anonfun$writeObject$1.apply(SerializableBuffer.scala:49)
>     at org.apache.spark.util.SerializableBuffer$$anonfun$writeObject$1.apply(SerializableBuffer.scala:47)
>     at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1204)
>     at org.apache.spark.util.SerializableBuffer.writeObject(SerializableBuffer.scala:47)
>     at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
>     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
>     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
>     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
>     at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
>     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
>     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
>     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
>     at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
>     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
>     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
>     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
>     at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
>     at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43)
>     at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
>     at org.apache.spark.rpc.netty.NettyRpcEnv.serialize(NettyRpcEnv.scala:253)
>     at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:192)
>     at org.apache.spark.rpc.netty.NettyRpcEndpointRef.send(NettyRpcEnv.scala:513)
>     at org.apache.spark.executor.CoarseGrainedExecutorBackend.statusUpdate(CoarseGrainedExecutorBackend.scala:135)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     ... 2 more
> {code}
> {code:xml}
> 16/03/29 08:00:29 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-4,5,main]
> java.lang.Error: java.nio.channels.ClosedByInterruptException
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1151)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at
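The failure pattern in the traces above is a secondary error during error handling: the interrupt closes the executor's channel, and the catch block then tries to report task status over that same broken channel, so the recovery path itself throws and crashes the worker thread. A minimal, purely illustrative Python sketch of the pattern follows; the class and method names here are hypothetical stand-ins, not Spark APIs.

```python
class StatusChannel:
    """Hypothetical stand-in for the executor's RPC channel."""
    def __init__(self):
        self.closed = False

    def send(self, msg):
        if self.closed:
            raise ConnectionError("channel closed by interrupt")

def run_task(channel):
    try:
        channel.closed = True           # simulate the interrupt closing the channel
        raise InterruptedError("task killed")
    except InterruptedError:
        # The handler reuses the now-broken channel, so a second,
        # unhandled error escapes -- this is what crashed the executor.
        channel.send("TASK_KILLED")

try:
    run_task(StatusChannel())
except ConnectionError as e:
    print("secondary failure:", e)
```

The robust pattern is to treat the status update itself as fallible, e.g. wrap it in its own try/except so a reporting failure cannot escalate past the original error.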
[jira] [Comment Edited] (SPARK-15037) Use SparkSession instead of SQLContext in testsuites
[ https://issues.apache.org/jira/browse/SPARK-15037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269475#comment-15269475 ]

Sandeep Singh edited comment on SPARK-15037 at 5/3/16 8:22 PM:
---------------------------------------------------------------

[~dongjoon] If it's ok with you, I can work on this one.

was (Author: techaddict): cc: [~rxin] [~dongjoon] If you want I can work on this one.

> Use SparkSession instead of SQLContext in testsuites
> ----------------------------------------------------
>
>                 Key: SPARK-15037
>                 URL: https://issues.apache.org/jira/browse/SPARK-15037
>             Project: Spark
>          Issue Type: Sub-task
>            Reporter: Dongjoon Hyun
>
> This issue aims to update the existing testsuites to use `SparkSession` instead of `SQLContext`, since `SQLContext` exists just for backward compatibility.