[jira] [Commented] (SPARK-7099) Floating point literals cannot be specified using exponent
[ https://issues.apache.org/jira/browse/SPARK-7099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711678#comment-14711678 ] Ryan Pham commented on SPARK-7099: -- We've moved to Spark 1.3.1, but it seems like the exponent format is still not supported. The select works when the number is written out in full.
SELECT cdescription179858030, cbigintcol823807900, cintcol1455799299, csmallintcol2049749987, ctinyintcol1324387732 FROM TABLE_1 WHERE (cbigintcol823807900 = 1E6)
15/08/25 10:44:40 ERROR AbstractFunctionalTests: java.lang.RuntimeException: [1.179] failure: ``)'' expected but identifier E6 found
Floating point literals cannot be specified using exponent -- Key: SPARK-7099 URL: https://issues.apache.org/jira/browse/SPARK-7099 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.2.1 Environment: Windows, Linux, Mac OS X Reporter: Peter Hagelund Priority: Minor Floating point literals cannot be expressed in scientific notation using an exponent, e.g. 1.23E4. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
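The parse failure above is a tokenization issue: `1E6` is split into the number `1` and the identifier `E6` before the expression grammar ever sees it. As a rough illustration (plain Python, not Spark's actual Scala parser), a numeric-literal pattern that accepts an optional exponent would classify these tokens as follows:

```python
import re

# Hedged sketch of a numeric-literal pattern with an optional exponent;
# Spark's real SQL lexer is written in Scala and differs in detail.
NUMBER = re.compile(r"""
    [0-9]+ (\.[0-9]*)?     # integer part, optional fraction
    ([eE] [+-]? [0-9]+)?   # optional exponent such as E6 or e-3
    $""", re.VERBOSE)

def is_numeric_literal(tok):
    return NUMBER.match(tok) is not None

print(is_numeric_literal("1E6"))     # True: accepted as one numeric token
print(is_numeric_literal("1.23E4"))  # True
print(is_numeric_literal("E6"))      # False: bare identifier
```

With a lexer rule like this, `1E6` is consumed as a single literal instead of a number followed by the identifier `E6`.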
[jira] [Assigned] (SPARK-10230) LDA public API should use docConcentration
[ https://issues.apache.org/jira/browse/SPARK-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10230: Assignee: Apache Spark LDA public API should use docConcentration -- Key: SPARK-10230 URL: https://issues.apache.org/jira/browse/SPARK-10230 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Feynman Liang Assignee: Apache Spark Priority: Minor {{alpha}} is provided as an alias to {{docConcentration}} because it is commonly used in literature. However, we should prefer {{docConcentration}} since it is unambiguous what we mean. The public API currently uses {{ {get,set}OptimizeAlpha}} but should instead use {{ {get,set}OptimizeDocConcentration}}. We should also probably deprecate any public API's using {{alpha}} directly and refer users to the corresponding {{docConcentration}} methods.
[jira] [Commented] (SPARK-3655) Support sorting of values in addition to keys (i.e. secondary sort)
[ https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711668#comment-14711668 ] Koert Kuipers commented on SPARK-3655: -- Oh, that's no good: I am using Guava without even declaring a dependency... Let me see if there is an alternative to using Guava for this. Support sorting of values in addition to keys (i.e. secondary sort) --- Key: SPARK-3655 URL: https://issues.apache.org/jira/browse/SPARK-3655 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 1.1.0, 1.2.0 Reporter: koert kuipers Assignee: Koert Kuipers Now that Spark has a sort-based shuffle, can we expect a secondary sort soon? There are some use cases where getting a sorted iterator of values per key is helpful.
[jira] [Created] (SPARK-10231) Update @Since annotation for mllib.classification
Xiangrui Meng created SPARK-10231: - Summary: Update @Since annotation for mllib.classification Key: SPARK-10231 URL: https://issues.apache.org/jira/browse/SPARK-10231 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib Affects Versions: 1.5.0 Reporter: Xiangrui Meng Assignee: Xiangrui Meng Priority: Minor Some public methods are missing @Since tags, and some versions are not correct.
[jira] [Commented] (SPARK-5456) Decimal Type comparison issue
[ https://issues.apache.org/jira/browse/SPARK-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711690#comment-14711690 ] Brandon Bradley commented on SPARK-5456: I'm still experiencing this in 1.4.0 and 1.4.1. I believe the fix for it should be in 1.4.1. Decimal Type comparison issue - Key: SPARK-5456 URL: https://issues.apache.org/jira/browse/SPARK-5456 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.2.0, 1.3.0 Reporter: Kuldeep Assignee: Adrian Wang Priority: Blocker Fix For: 1.3.2, 1.4.0 Not quite able to figure this out, but here is a JUnit test to reproduce it, in JavaAPISuite.java:
{code:title=DecimalBug.java}
@Test
public void decimalQueryTest() {
    List<Row> decimalTable = new ArrayList<Row>();
    decimalTable.add(RowFactory.create(new BigDecimal(1), new BigDecimal(2)));
    decimalTable.add(RowFactory.create(new BigDecimal(3), new BigDecimal(4)));
    JavaRDD<Row> rows = sc.parallelize(decimalTable);
    List<StructField> fields = new ArrayList<StructField>(7);
    fields.add(DataTypes.createStructField("a", DataTypes.createDecimalType(), true));
    fields.add(DataTypes.createStructField("b", DataTypes.createDecimalType(), true));
    sqlContext.applySchema(rows.rdd(), DataTypes.createStructType(fields)).registerTempTable("foo");
    Assert.assertEquals(sqlContext.sql("select * from foo where a > 0").collectAsList(), decimalTable);
}
{code}
Fails with java.lang.ClassCastException: java.math.BigDecimal cannot be cast to org.apache.spark.sql.types.Decimal
[jira] [Updated] (SPARK-10227) sbt build on Scala 2.11 fails
[ https://issues.apache.org/jira/browse/SPARK-10227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-10227: -- Shepherd: Sean Owen Affects Version/s: 1.5.0 Target Version/s: (was: 1.5.0) Hm, I thought we zapped most or all of those, or else we wouldn't be able to build a 1.5 release candidate for Scala 2.11: https://repository.apache.org/content/repositories/orgapachespark-1137/org/apache/spark/ I wonder if it's only the SBT build that is set to fail on warnings? Cleaning these up would be the fastest solution anyway. Are you in a position to propose a PR? I can work with you on that, as I have done a fair bit of warning cleanup over time. sbt build on Scala 2.11 fails - Key: SPARK-10227 URL: https://issues.apache.org/jira/browse/SPARK-10227 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.5.0 Reporter: Luc Bourlier Scala 2.11 produces additional warnings compared to Scala 2.10, and with the addition of 'fatal warnings' to the sbt build, the current {{trunk}} (and {{branch-1.5}}) fails to build with sbt on Scala 2.11. Most of the warnings are about the {{@transient}} annotation not being set on relevant elements, and a few point to potential bugs.
[jira] [Updated] (SPARK-4223) Support * (meaning all users) as part of the acls
[ https://issues.apache.org/jira/browse/SPARK-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-4223: - Assignee: Zhuo Liu Support * (meaning all users) as part of the acls - Key: SPARK-4223 URL: https://issues.apache.org/jira/browse/SPARK-4223 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.2.0 Reporter: Thomas Graves Assignee: Zhuo Liu Currently we support setting view and modify acls but you have to specify a list of users. It would be nice to support * meaning all users have access.
[jira] [Created] (SPARK-10228) Integer overflow in VertexRDDImpl.count
Robin Cheng created SPARK-10228: --- Summary: Integer overflow in VertexRDDImpl.count Key: SPARK-10228 URL: https://issues.apache.org/jira/browse/SPARK-10228 Project: Spark Issue Type: Bug Components: GraphX Affects Versions: 1.4.1 Reporter: Robin Cheng VertexRDDImpl overrides RDD.count() but aggregates Int instead of Long:
{code}
/** The number of vertices in the RDD. */
override def count(): Long = {
  partitionsRDD.map(_.size).reduce(_ + _)
}
{code}
This causes Pregel to stop iterating when the number of messages is negative, giving incorrect results.
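Why aggregating `Int` breaks: partition sizes whose sum exceeds 2^31 - 1 wrap around to a negative number under 32-bit two's-complement arithmetic. A quick illustration (Python sketch simulating Scala/Java `Int` addition; illustrative partition sizes, not from the report):

```python
def add_int32(a, b):
    # Simulate Scala/Java 32-bit Int addition with two's-complement wraparound.
    s = (a + b) & 0xFFFFFFFF
    return s - 0x100000000 if s >= 0x80000000 else s

partition_sizes = [1_500_000_000, 1_500_000_000]  # 3 billion vertices total

total = 0
for size in partition_sizes:
    total = add_int32(total, size)  # reduce over Int, as in the buggy code

print(total)                  # -1294967296: the wrapped 32-bit sum
print(sum(partition_sizes))   # 3000000000: the correct Long-style sum
```

The fix presumably amounts to widening before reducing, along the lines of `partitionsRDD.map(_.size.toLong).reduce(_ + _)` in the Scala code above.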
[jira] [Commented] (SPARK-10229) Wrong jline dependency when compiled against Scala 2.11
[ https://issues.apache.org/jira/browse/SPARK-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711630#comment-14711630 ] Sean Owen commented on SPARK-10229: --- Repeat of https://issues.apache.org/jira/browse/SPARK-10037 ? Are you using -Dscala-2.11? Wrong jline dependency when compiled against Scala 2.11 --- Key: SPARK-10229 URL: https://issues.apache.org/jira/browse/SPARK-10229 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.5.0 Reporter: Cheng Lian Priority: Blocker Scala migrated to the official jline in 2.11.0-M4, so the scala-specific fork of jline is gone and you can just depend on the official jline. The nonexistent org.scala-lang:jline:2.11.7 artifact is causing build failure when Spark is built against Scala 2.11.
[jira] [Commented] (SPARK-4223) Support * (meaning all users) as part of the acls
[ https://issues.apache.org/jira/browse/SPARK-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711568#comment-14711568 ] Sean Owen commented on SPARK-4223: -- Done, added as Contributor and assigned Support * (meaning all users) as part of the acls - Key: SPARK-4223 URL: https://issues.apache.org/jira/browse/SPARK-4223 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.2.0 Reporter: Thomas Graves Assignee: Zhuo Liu Currently we support setting view and modify acls but you have to specify a list of users. It would be nice to support * meaning all users have access.
[jira] [Commented] (SPARK-10227) sbt build on Scala 2.11 fails
[ https://issues.apache.org/jira/browse/SPARK-10227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711634#comment-14711634 ] Luc Bourlier commented on SPARK-10227: -- I am working on a PR right now. sbt build on Scala 2.11 fails - Key: SPARK-10227 URL: https://issues.apache.org/jira/browse/SPARK-10227 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.5.0 Reporter: Luc Bourlier Scala 2.11 produces additional warnings compared to Scala 2.10, and with the addition of 'fatal warnings' to the sbt build, the current {{trunk}} (and {{branch-1.5}}) fails to build with sbt on Scala 2.11. Most of the warnings are about the {{@transient}} annotation not being set on relevant elements, and a few point to potential bugs.
[jira] [Commented] (SPARK-10219) Error when additional options provided as variable in write.df
[ https://issues.apache.org/jira/browse/SPARK-10219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711642#comment-14711642 ] Shivaram Venkataraman commented on SPARK-10219: --- I think that's happening because `mode` is actually an argument name that is taken in by the write.df method -- so I am not sure you need option=mode; just mode=mode or mode="append" should work? Error when additional options provided as variable in write.df -- Key: SPARK-10219 URL: https://issues.apache.org/jira/browse/SPARK-10219 Project: Spark Issue Type: Bug Components: R Affects Versions: 1.4.0 Environment: SparkR shell Reporter: Samuel Alexander Labels: spark-shell, sparkR Opened a SparkR shell. Created a df using:
df <- jsonFile(sqlContext, "examples/src/main/resources/people.json")
Assigned a variable like below:
mode <- "append"
When write.df was called using the statement below, the mentioned error occurred:
write.df(df, source="org.apache.spark.sql.parquet", path="par_path", option=mode)
Error in writeType(con, type) : Unsupported type for serialization name
Whereas when "append" is passed directly, i.e. not via the mode variable as below, everything works fine:
write.df(df, source="org.apache.spark.sql.parquet", path="par_path", option="append")
Note: For parquet it is not needed to have option. But we are using the Spark Salesforce package (http://spark-packages.org/package/springml/spark-salesforce) which requires additional options to be passed.
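The comment above hinges on how named arguments are matched: `mode` is a declared parameter of `write.df`, so passing it as `option=mode` routes the value somewhere else entirely. A Python analogy of the same mechanics (the `write_df` signature here is hypothetical, not SparkR's actual code):

```python
def write_df(df, source=None, mode="error", **options):
    # 'mode' is a declared parameter; any other keyword lands in **options,
    # which a data source may or may not know how to handle.
    return {"source": source, "mode": mode, "options": options}

mode = "append"

# Matches the declared parameter -- the save mode is set as intended:
print(write_df("df", source="parquet", mode=mode))
# Goes into **options under the key 'option'; 'mode' keeps its default:
print(write_df("df", source="parquet", option=mode))
```

In both languages the argument's *name*, not the variable holding the value, decides where it binds, which is why `mode=mode` works while `option=mode` does not.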
[jira] [Commented] (SPARK-10229) Wrong jline dependency when compiled against Scala 2.11
[ https://issues.apache.org/jira/browse/SPARK-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711646#comment-14711646 ] Cheng Lian commented on SPARK-10229: Sorry, I was using {{-Pscala-2.11}}. Thanks for clarification! Wrong jline dependency when compiled against Scala 2.11 --- Key: SPARK-10229 URL: https://issues.apache.org/jira/browse/SPARK-10229 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.5.0 Reporter: Cheng Lian Priority: Blocker Scala migrated to the official jline in 2.11.0-M4, so the scala-specific fork of jline is gone and you can just depend on the official jline. The nonexistent org.scala-lang:jline:2.11.7 artifact is causing build failure when Spark is built against Scala 2.11.
[jira] [Assigned] (SPARK-10230) LDA public API should use docConcentration
[ https://issues.apache.org/jira/browse/SPARK-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10230: Assignee: (was: Apache Spark) LDA public API should use docConcentration -- Key: SPARK-10230 URL: https://issues.apache.org/jira/browse/SPARK-10230 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Feynman Liang Priority: Minor {{alpha}} is provided as an alias to {{docConcentration}} because it is commonly used in literature. However, we should prefer {{docConcentration}} since it is unambiguous what we mean. The public API currently uses {{ {get,set}OptimizeAlpha}} but should instead use {{ {get,set}OptimizeDocConcentration}}. We should also probably deprecate any public API's using {{alpha}} directly and refer users to the corresponding {{docConcentration}} methods.
[jira] [Commented] (SPARK-10230) LDA public API should use docConcentration
[ https://issues.apache.org/jira/browse/SPARK-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711691#comment-14711691 ] Apache Spark commented on SPARK-10230: -- User 'feynmanliang' has created a pull request for this issue: https://github.com/apache/spark/pull/8422 LDA public API should use docConcentration -- Key: SPARK-10230 URL: https://issues.apache.org/jira/browse/SPARK-10230 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Feynman Liang Priority: Minor {{alpha}} is provided as an alias to {{docConcentration}} because it is commonly used in literature. However, we should prefer {{docConcentration}} since it is unambiguous what we mean. The public API currently uses {{ {get,set}OptimizeAlpha}} but should instead use {{ {get,set}OptimizeDocConcentration}}. We should also probably deprecate any public API's using {{alpha}} directly and refer users to the corresponding {{docConcentration}} methods.
[jira] [Created] (SPARK-10227) sbt build on Scala 2.11 fails
Luc Bourlier created SPARK-10227: Summary: sbt build on Scala 2.11 fails Key: SPARK-10227 URL: https://issues.apache.org/jira/browse/SPARK-10227 Project: Spark Issue Type: Bug Components: Spark Core Reporter: Luc Bourlier Scala 2.11 produces additional warnings compared to Scala 2.10, and with the addition of 'fatal warnings' to the sbt build, the current {{trunk}} (and {{branch-1.5}}) fails to build with sbt on Scala 2.11. Most of the warnings are about the {{@transient}} annotation not being set on relevant elements, and a few point to potential bugs.
[jira] [Commented] (SPARK-10230) LDA public API should use docConcentration
[ https://issues.apache.org/jira/browse/SPARK-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711660#comment-14711660 ] Feynman Liang commented on SPARK-10230: --- Working on this LDA public API should use docConcentration -- Key: SPARK-10230 URL: https://issues.apache.org/jira/browse/SPARK-10230 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Feynman Liang Priority: Minor {{alpha}} is provided as an alias to {{docConcentration}} because it is commonly used in literature. However, we should prefer {{docConcentration}} since it is unambiguous what we mean. The public API currently uses {{ {get,set}OptimizeAlpha}} but should instead use {{ {get,set}OptimizeDocConcentration}}. We should also probably deprecate any public API's using {{alpha}} directly and refer users to the corresponding {{docConcentration}} methods.
[jira] [Created] (SPARK-10230) LDA public API should use docConcentration
Feynman Liang created SPARK-10230: - Summary: LDA public API should use docConcentration Key: SPARK-10230 URL: https://issues.apache.org/jira/browse/SPARK-10230 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Feynman Liang Priority: Minor {{alpha}} is provided as an alias to {{docConcentration}} because it is commonly used in literature. However, we should prefer {{docConcentration}} since it is unambiguous what we mean. The public API currently uses {{ {get,set}OptimizeAlpha}} but should instead use {{ {get,set}OptimizeDocConcentration}}. We should also probably deprecate any public API's using {{alpha}} directly and refer users to the corresponding {{docConcentration}} methods.
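The deprecation path described above (keep `alpha` working for literature familiarity while steering users to `docConcentration`) is a standard alias pattern. A hedged Python sketch of the idea; the real API is Scala and the method names below are illustrative, not Spark's:

```python
import warnings

class LDA:
    """Illustrative model class with a deprecated parameter alias."""

    def __init__(self):
        self._doc_concentration = -1.0  # illustrative default

    def set_doc_concentration(self, value):
        # The unambiguous, preferred name.
        self._doc_concentration = value
        return self

    def set_alpha(self, value):
        # Alias kept so existing code and literature-style usage still work,
        # but callers are warned to migrate.
        warnings.warn("set_alpha is deprecated; use set_doc_concentration",
                      DeprecationWarning)
        return self.set_doc_concentration(value)
```

Both setters mutate the same field, so behavior is identical; only the preferred entry point changes.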
[jira] [Resolved] (SPARK-10198) Turn off Hive verifyPartitionPath by default
[ https://issues.apache.org/jira/browse/SPARK-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-10198. -- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 8404 [https://github.com/apache/spark/pull/8404] Turn off Hive verifyPartitionPath by default Key: SPARK-10198 URL: https://issues.apache.org/jira/browse/SPARK-10198 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.0, 1.5.0 Reporter: Michael Armbrust Assignee: Michael Armbrust Priority: Blocker Fix For: 1.5.0 I've seen several cases in production where this option either causes us to fail reading valid tables, or incorrectly returns no results. It also invalidates our new metastore partition pruning feature. Since there is not much time to dig into the root cause, I propose we turn it off by default for Spark 1.5.
[jira] [Resolved] (SPARK-8531) Update ML user guide for MinMaxScaler
[ https://issues.apache.org/jira/browse/SPARK-8531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley resolved SPARK-8531. -- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 7211 [https://github.com/apache/spark/pull/7211] Update ML user guide for MinMaxScaler - Key: SPARK-8531 URL: https://issues.apache.org/jira/browse/SPARK-8531 Project: Spark Issue Type: Documentation Components: ML Affects Versions: 1.5.0 Reporter: yuhao yang Assignee: yuhao yang Priority: Minor Fix For: 1.5.0
[jira] [Commented] (SPARK-10231) Update @Since annotation for mllib.classification
[ https://issues.apache.org/jira/browse/SPARK-10231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711682#comment-14711682 ] Apache Spark commented on SPARK-10231: -- User 'mengxr' has created a pull request for this issue: https://github.com/apache/spark/pull/8421 Update @Since annotation for mllib.classification - Key: SPARK-10231 URL: https://issues.apache.org/jira/browse/SPARK-10231 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib Affects Versions: 1.5.0 Reporter: Xiangrui Meng Assignee: Xiangrui Meng Priority: Minor Some public methods are missing @Since tags, and some versions are not correct.
[jira] [Resolved] (SPARK-10188) Pyspark CrossValidator with RMSE selects incorrect model
[ https://issues.apache.org/jira/browse/SPARK-10188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noel Smith resolved SPARK-10188. Resolution: Fixed Pyspark CrossValidator with RMSE selects incorrect model Key: SPARK-10188 URL: https://issues.apache.org/jira/browse/SPARK-10188 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.5.0 Reporter: Noel Smith Pyspark {{CrossValidator}} is giving incorrect results when selecting estimators using RMSE as an evaluation metric. In the example below, it should be selecting the {{LinearRegression}} estimator with zero regularization, as that gives the most accurate result, but instead it selects the one with the largest. Probably related to: SPARK-10097
{code}
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.regression import LinearRegression
from pyspark.ml.tuning import ParamGridBuilder, CrossValidator, CrossValidatorModel
from pyspark.ml.feature import Binarizer
from pyspark.mllib.linalg import Vectors
from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)

# Label = 2 * feature
train = sqlContext.createDataFrame([
    (Vectors.dense([10.0]), 20.0),
    (Vectors.dense([100.0]), 200.0),
    (Vectors.dense([1000.0]), 2000.0)] * 10,
    ["features", "label"])

test = sqlContext.createDataFrame([
    (Vectors.dense([1000.0]),)], ["features"])

# Expected prediction 2000.0
print LinearRegression(regParam=0.0).fit(train).transform(test).collect()    # Predicts 2000.0 (perfect)
print LinearRegression(regParam=100.0).fit(train).transform(test).collect()  # Predicts 1869.31
print LinearRegression(regParam=100.0).fit(train).transform(test).collect()  # 741.08 (worst)

# Cross-validation
lr = LinearRegression()
rmse_eval = RegressionEvaluator(metricName="rmse")
grid = (ParamGridBuilder()
    .addGrid(lr.regParam, [0.0, 100.0, 100.0])
    .build())
cv = CrossValidator(estimator=lr, estimatorParamMaps=grid, evaluator=rmse_eval)
cv_model = cv.fit(train)
cv_model.bestModel.transform(test).collect()  # Predicts 741.08 (i.e. worst model selected)
{code}
One workaround for users would be to add a wrapper around the selected evaluator to invert the metric:
{code}
class InvertedEvaluator(Evaluator):
    def __init__(self, evaluator):
        super(InvertedEvaluator, self).__init__()
        self.evaluator = evaluator

    def _evaluate(self, dataset):
        return -self.evaluator.evaluate(dataset)

invertedEvaluator = InvertedEvaluator(RegressionEvaluator(metricName="rmse"))
{code}
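The behavior reported above is consistent with the selector maximizing the metric regardless of its direction: RMSE is an error, so the largest value is the worst model. A small sketch of why the metric's direction matters when picking the best parameter set (illustrative scores, not from the actual run):

```python
# Hypothetical cross-validation scores: regParam -> RMSE (lower is better).
rmse_by_param = {0.0: 0.0, 100.0: 130.7}

# A selector that always maximizes picks the *worst* model for error metrics:
wrong_choice = max(rmse_by_param, key=rmse_by_param.get)

# Error metrics must be minimized (or negated before maximizing, which is
# exactly what the InvertedEvaluator workaround above does):
right_choice = min(rmse_by_param, key=rmse_by_param.get)

print(wrong_choice, right_choice)  # 100.0 0.0
```

Negating the metric turns minimization into maximization, which is why wrapping the evaluator is a viable stopgap until the selector respects the metric's direction.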
[jira] [Commented] (SPARK-3655) Support sorting of values in addition to keys (i.e. secondary sort)
[ https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711701#comment-14711701 ] Koert Kuipers commented on SPARK-3655: -- Great. We have stress tested it with millions of records per key (and only 1.5g of ram per executor) to make sure there was no hidden assumption that data needs to fit in memory somehow, and it worked fine. Seems the shuffle-based sort keeps its promise... Support sorting of values in addition to keys (i.e. secondary sort) --- Key: SPARK-3655 URL: https://issues.apache.org/jira/browse/SPARK-3655 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 1.1.0, 1.2.0 Reporter: koert kuipers Assignee: Koert Kuipers Now that Spark has a sort-based shuffle, can we expect a secondary sort soon? There are some use cases where getting a sorted iterator of values per key is helpful.
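A secondary sort -- a sorted iterator of values per key -- can be emulated locally by sorting on a composite (key, value) and then grouping. A minimal single-machine Python sketch of the idea; the distributed version relies on Spark's sort-based shuffle so values never need to fit in memory per key:

```python
from itertools import groupby
from operator import itemgetter

pairs = [("b", 3), ("a", 2), ("b", 1), ("a", 5), ("b", 2)]

# Sort by the composite (key, value) so each key's values come out in order...
pairs.sort(key=itemgetter(0, 1))

# ...then group by key to get a sorted iterator of values per key.
grouped = {k: [v for _, v in g] for k, g in groupby(pairs, key=itemgetter(0))}

print(grouped)  # {'a': [2, 5], 'b': [1, 2, 3]}
```

Note that `groupby` only merges *adjacent* runs, which is why the composite sort must happen first; that is also the essential trick behind shuffle-based secondary sort.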
[jira] [Commented] (SPARK-3533) Add saveAsTextFileByKey() method to RDDs
[ https://issues.apache.org/jira/browse/SPARK-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711291#comment-14711291 ] Jason Hubbard commented on SPARK-3533: -- Spark SQL has the ability to write to multiple file locations already SPARK-3007. I'm not recommending converting your RDD to DataFrame just to write to multiple locations, but it might be beneficial for them to share the same mechanism. One current limitation of the Spark SQL implementation is that each split will open a new Writer for each hive partition, and if there are a lot of hive partitions spread across the splits then it will cause many small files and possibly degrade performance because of memory usage. Add saveAsTextFileByKey() method to RDDs Key: SPARK-3533 URL: https://issues.apache.org/jira/browse/SPARK-3533 Project: Spark Issue Type: Improvement Components: PySpark, Spark Core Affects Versions: 1.1.0 Reporter: Nicholas Chammas Users often have a single RDD of key-value pairs that they want to save to multiple locations based on the keys. For example, say I have an RDD like this: {code} a = sc.parallelize(['Nick', 'Nancy', 'Bob', 'Ben', 'Frankie']).keyBy(lambda x: x[0]) a.collect() [('N', 'Nick'), ('N', 'Nancy'), ('B', 'Bob'), ('B', 'Ben'), ('F', 'Frankie')] a.keys().distinct().collect() ['B', 'F', 'N'] {code} Now I want to write the RDD out to different paths depending on the keys, so that I have one output directory per distinct key. Each output directory could potentially have multiple {{part-}} files, one per RDD partition. So the output would look something like: {code} /path/prefix/B [/part-1, /part-2, etc] /path/prefix/F [/part-1, /part-2, etc] /path/prefix/N [/part-1, /part-2, etc] {code} Though it may be possible to do this with some combination of {{saveAsNewAPIHadoopFile()}}, {{saveAsHadoopFile()}}, and the {{MultipleTextOutputFormat}} output format class, it isn't straightforward. 
It's not clear if it's even possible at all in PySpark. Please add a {{saveAsTextFileByKey()}} method or something similar to RDDs that makes it easy to save RDDs out to multiple locations at once.
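The directory layout the request describes can be sketched locally: bucket records by key, then write one part file per key directory. A minimal Python sketch using a hypothetical `save_by_key` helper (this is not the proposed RDD API, just an illustration of the desired output layout):

```python
import os
import tempfile
from collections import defaultdict

def save_by_key(pairs, prefix):
    # Group values by key, then write one part file per key directory,
    # mirroring the /path/prefix/<key>/part-* layout described above.
    buckets = defaultdict(list)
    for k, v in pairs:
        buckets[k].append(v)
    for k, values in buckets.items():
        d = os.path.join(prefix, str(k))
        os.makedirs(d, exist_ok=True)
        with open(os.path.join(d, "part-00000"), "w") as f:
            f.write("\n".join(values))
    return sorted(buckets)

pairs = [("N", "Nick"), ("N", "Nancy"), ("B", "Bob"),
         ("B", "Ben"), ("F", "Frankie")]
out = tempfile.mkdtemp()
keys = save_by_key(pairs, out)
print(keys)  # ['B', 'F', 'N']
```

In the distributed setting each RDD partition would write its own `part-N` file per key directory, which is what `MultipleTextOutputFormat` arranges on the Hadoop side.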
[jira] [Commented] (SPARK-3655) Support sorting of values in addition to keys (i.e. secondary sort)
[ https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711382#comment-14711382 ] Koert Kuipers commented on SPARK-3655: -- Glad to hear it worked well. Totally agree that a Guava dependency mismatch is a pain. spark-sorted does not have a dependency on Guava. Could it be that one of your other dependencies uses Guava? Support sorting of values in addition to keys (i.e. secondary sort) --- Key: SPARK-3655 URL: https://issues.apache.org/jira/browse/SPARK-3655 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 1.1.0, 1.2.0 Reporter: koert kuipers Assignee: Koert Kuipers Now that Spark has a sort-based shuffle, can we expect a secondary sort soon? There are some use cases where getting a sorted iterator of values per key is helpful.
[jira] [Comment Edited] (SPARK-10226) Error occurred in SparkSQL when using !=
[ https://issues.apache.org/jira/browse/SPARK-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711410#comment-14711410 ] Herman van Hovell edited comment on SPARK-10226 at 8/25/15 3:11 PM: Apparently most databases support this: http://stackoverflow.com/questions/723195/should-i-use-or-for-not-equal-in-tsql I wouldn't call this a bug though. It is more of an improvement. was (Author: hvanhovell): Apparently most databases support this: http://stackoverflow.com/questions/723195/should-i-use-or-for-not-equal-in-tsql Error occured in SparkSQL when using != Key: SPARK-10226 URL: https://issues.apache.org/jira/browse/SPARK-10226 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.0 Reporter: wangwei DataSource: src/main/resources/kv1.txt SQL: 1. create table src(id string, name string); 2. load data local inpath '${SparkHome}/examples/src/main/resources/kv1.txt' into table src; 3. select count( * ) from src where id != '0'; [ERROR] Could not expand event java.lang.IllegalArgumentException: != 0;: event not found at jline.console.ConsoleReader.expandEvents(ConsoleReader.java:779) at jline.console.ConsoleReader.finishBuffer(ConsoleReader.java:631) at jline.console.ConsoleReader.accept(ConsoleReader.java:2019) at jline.console.ConsoleReader.readLine(ConsoleReader.java:2666) at jline.console.ConsoleReader.readLine(ConsoleReader.java:2269) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:231) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:666) at 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:178) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:203) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:118) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
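Editor's note: the {{[ERROR] Could not expand event}} above comes from JLine, not from the SQL parser. The CLI's {{ConsoleReader.expandEvents}} performs bash-style history expansion, so the {{!}} in {{!=}} is read as an event designator that matches nothing in history. The following is a minimal standalone mimic of that behavior (hypothetical code written for illustration, not JLine's actual implementation):

```java
// Sketch: why a bash-style history expander chokes on "!=".
// A '!' followed by a non-whitespace character is treated as a reference to
// a previous command ("event"); "!=" names no history entry, so expansion
// fails with IllegalArgumentException, mirroring the error in this issue.
import java.util.List;

public class EventExpansionDemo {
    // Returns the line unchanged if it contains no event reference; throws
    // if a '!'-prefixed token names an event missing from history.
    static String expandEvents(String line, List<String> history) {
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '!' && i + 1 < line.length()
                    && !Character.isWhitespace(line.charAt(i + 1))) {
                // The token up to the next whitespace is the event reference.
                int end = i + 1;
                while (end < line.length() && !Character.isWhitespace(line.charAt(end))) end++;
                String event = line.substring(i, end);
                // A real shell would search history here; "!=" never matches.
                throw new IllegalArgumentException(event + ": event not found");
            }
        }
        return line;
    }

    public static void main(String[] args) {
        try {
            expandEvents("select count(*) from src where id != '0';", List.of());
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

JLine 2 exposes {{ConsoleReader.setExpandEvents(false)}} to disable this expansion entirely; whether the pull requests referenced in the follow-up comments take that route or handle {{!=}} elsewhere is not shown here.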
[jira] [Commented] (SPARK-10183) Expose the SparkR backend api
[ https://issues.apache.org/jira/browse/SPARK-10183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711417#comment-14711417 ] Amos commented on SPARK-10183: -- Pushed code up last night to link R to a running spark context. I'm not able to post a link from my iPhone but it's in elbamos/incubator-Zeppelin, branch reinterpreter, and the code you'll care about is in classes RContext (which does the work), RBackendHelper (because the Backend is private) and RStatics. Expose the SparkR backend api - Key: SPARK-10183 URL: https://issues.apache.org/jira/browse/SPARK-10183 Project: Spark Issue Type: Improvement Components: SparkR Reporter: Amos Priority: Minor The Backend class is currently scoped to the api.r package. I'm accessing it, for the Zeppelin project, so I can start SparkR against an already-running spark context. To do this I've had to create a helper class withing api.r. It would be better if the backend were exposed. It isn't a tremendous amount of functionality - create a backend, start it, stop it. (If we want to be really clever, it could also be passed a spark context and make that available to R clients, facilitate passing rdd's back and forth, etc. I'll be pushing code that does some of that to Zeppelin in a day or two if that helps.) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10055) San Francisco Crime Classification
[ https://issues.apache.org/jira/browse/SPARK-10055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711298#comment-14711298 ] Kai Sasaki commented on SPARK-10055: I submitted an initial version for this competition. Although the score is not good, I found several points while using the Spark ML API. Some of these may simply reflect my lack of knowledge of Spark ML, so if they can already be solved with existing code, please let me know. * There does not seem to be a {{Transformer}} that can cast column types. In this case, {{X}} and {{Y}} are Strings by default when read by [spark-csv|http://spark-packages.org/package/databricks/spark-csv]. In order to apply {{StandardScaler}} to {{X}} and {{Y}}, they must be numeric types, and I cannot do that cast with a Spark ML {{Transformer}}. Fortunately, {{spark-csv}} can infer the schema types by reading all the data once, but when a reading library has no such option, I think it would be better to be able to cast column types inside a Spark ML pipeline. * {{StringIndexer}} orders its labels by frequency, but this competition requires the output in alphabetical order, so some extra code is needed to convert frequency-ordered labels to alphabetical order. * {{StandardScaler}} only accepts vector data as input. In this case, I want to scale {{X}} and {{Y}}, but since they are plain doubles, they first have to be assembled into a feature vector. Is there a case for applying {{StandardScaler}} to plain Int or Double columns, or must such data always be assembled into a feature vector before scaling? The code is [here|https://github.com/Lewuathe/kaggle-jobs/blob/master/src/main/scala/com/lewuathe/SfCrimeClassification.scala]. Thank you. 
San Francisco Crime Classification -- Key: SPARK-10055 URL: https://issues.apache.org/jira/browse/SPARK-10055 Project: Spark Issue Type: Sub-task Components: ML Reporter: Xiangrui Meng Assignee: Xusen Yin Apply ML pipeline API to San Francisco Crime Classification (https://www.kaggle.com/c/sf-crime). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
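Editor's note: the second bullet in the comment above (frequency-ordered {{StringIndexer}} labels vs. the required alphabetical column order) only needs a small index remapping. A sketch in plain Java, independent of Spark, assuming the fitted model's labels are available as an array in the frequency order {{StringIndexer}} assigned (as the model's {{labels}} member exposes them):

```java
// Build a remapping from StringIndexer's frequency-based indices to
// alphabetical indices: remap[i] is where the label with frequency index i
// belongs in alphabetical order. No Spark dependency; labels are illustrative.
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class LabelReorder {
    /** freqOrdered[i] is the label that StringIndexer assigned index i. */
    static int[] freqToAlphaIndex(String[] freqOrdered) {
        String[] alpha = freqOrdered.clone();
        Arrays.sort(alpha); // alphabetical target order
        Map<String, Integer> alphaIndex = new HashMap<>();
        for (int i = 0; i < alpha.length; i++) alphaIndex.put(alpha[i], i);
        int[] remap = new int[freqOrdered.length];
        for (int i = 0; i < freqOrdered.length; i++) remap[i] = alphaIndex.get(freqOrdered[i]);
        return remap;
    }

    public static void main(String[] args) {
        // Hypothetical categories: LARCENY most frequent, so it got index 0.
        String[] labels = {"LARCENY", "ASSAULT", "VANDALISM"};
        System.out.println(Arrays.toString(freqToAlphaIndex(labels))); // [1, 0, 2]
    }
}
```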
[jira] [Assigned] (SPARK-10226) Error occured in SparkSQL when using !=
[ https://issues.apache.org/jira/browse/SPARK-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10226: Assignee: Apache Spark Error occured in SparkSQL when using != Key: SPARK-10226 URL: https://issues.apache.org/jira/browse/SPARK-10226 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.0 Reporter: wangwei Assignee: Apache Spark DataSource: src/main/resources/kv1.txt SQL: 1. create table src(id string, name string); 2. load data local inpath '${SparkHome}/examples/src/main/resources/kv1.txt' into table src; 3. select count(*) from src where id != '0'; [ERROR] Could not expand event java.lang.IllegalArgumentException: != 0;: event not found at jline.console.ConsoleReader.expandEvents(ConsoleReader.java:779) at jline.console.ConsoleReader.finishBuffer(ConsoleReader.java:631) at jline.console.ConsoleReader.accept(ConsoleReader.java:2019) at jline.console.ConsoleReader.readLine(ConsoleReader.java:2666) at jline.console.ConsoleReader.readLine(ConsoleReader.java:2269) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:231) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:666) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:178) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:203) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:118) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: 
issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10215) Div of Decimal returns null
[ https://issues.apache.org/jira/browse/SPARK-10215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711327#comment-14711327 ] Yi Zhou commented on SPARK-10215: - This issue causes cases involving the 'decimal' type to fail, so hopefully it can be fixed in Spark 1.5.0. Thanks in advance! Div of Decimal returns null --- Key: SPARK-10215 URL: https://issues.apache.org/jira/browse/SPARK-10215 Project: Spark Issue Type: Bug Components: SQL Reporter: Cheng Hao Priority: Blocker {code} val d = Decimal(1.12321) val df = Seq((d, 1)).toDF("a", "b") df.selectExpr("b * a / b").collect() => Array(Row(null)) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
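Editor's note: the computation itself is numerically unproblematic. The same round trip in plain {{java.math.BigDecimal}} is exact, which supports the reading that the null comes from Spark's precision/scale handling of the intermediate Decimal result rather than from the arithmetic. A minimal sketch (plain Java, not Spark code):

```java
// b * a / b should return a exactly when enough precision is available.
// DECIMAL128 (34 significant digits) is ample for these operands.
import java.math.BigDecimal;
import java.math.MathContext;

public class DecimalDivDemo {
    static BigDecimal roundTrip(BigDecimal a, BigDecimal b) {
        return b.multiply(a).divide(b, MathContext.DECIMAL128);
    }

    public static void main(String[] args) {
        BigDecimal a = new BigDecimal("1.12321");
        System.out.println(roundTrip(a, new BigDecimal("1"))); // 1.12321, not null
    }
}
```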
[jira] [Comment Edited] (SPARK-10226) Error occured in SparkSQL when using !=
[ https://issues.apache.org/jira/browse/SPARK-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711400#comment-14711400 ] Herman van Hovell edited comment on SPARK-10226 at 8/25/15 3:05 PM: In what SQL dialect is {{!=}} a valid symbol for {{not equals}}? I thought pretty much all SQL environments use {{<>}} for this. was (Author: hvanhovell): In what SQL dialect is {{!=}} a valid symbol {{not equals}}? I thought pretty much all SQL environments use {{<>}} for this. Error occured in SparkSQL when using != Key: SPARK-10226 URL: https://issues.apache.org/jira/browse/SPARK-10226 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.0 Reporter: wangwei DataSource: src/main/resources/kv1.txt SQL: 1. create table src(id string, name string); 2. load data local inpath '${SparkHome}/examples/src/main/resources/kv1.txt' into table src; 3. select count( * ) from src where id != '0'; [ERROR] Could not expand event java.lang.IllegalArgumentException: != 0;: event not found at jline.console.ConsoleReader.expandEvents(ConsoleReader.java:779) at jline.console.ConsoleReader.finishBuffer(ConsoleReader.java:631) at jline.console.ConsoleReader.accept(ConsoleReader.java:2019) at jline.console.ConsoleReader.readLine(ConsoleReader.java:2666) at jline.console.ConsoleReader.readLine(ConsoleReader.java:2269) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:231) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:666) at 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:178) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:203) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:118) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10226) Error occured in SparkSQL when using !=
[ https://issues.apache.org/jira/browse/SPARK-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711363#comment-14711363 ] Apache Spark commented on SPARK-10226: -- User 'small-wang' has created a pull request for this issue: https://github.com/apache/spark/pull/8420 Error occured in SparkSQL when using != Key: SPARK-10226 URL: https://issues.apache.org/jira/browse/SPARK-10226 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.0 Reporter: wangwei DataSource: src/main/resources/kv1.txt SQL: 1. create table src(id string, name string); 2. load data local inpath '${SparkHome}/examples/src/main/resources/kv1.txt' into table src; 3. select count( * ) from src where id != '0'; [ERROR] Could not expand event java.lang.IllegalArgumentException: != 0;: event not found at jline.console.ConsoleReader.expandEvents(ConsoleReader.java:779) at jline.console.ConsoleReader.finishBuffer(ConsoleReader.java:631) at jline.console.ConsoleReader.accept(ConsoleReader.java:2019) at jline.console.ConsoleReader.readLine(ConsoleReader.java:2666) at jline.console.ConsoleReader.readLine(ConsoleReader.java:2269) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:231) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:666) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:178) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:203) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:118) at 
org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10226) Error occured in SparkSQL when using !=
[ https://issues.apache.org/jira/browse/SPARK-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711410#comment-14711410 ] Herman van Hovell commented on SPARK-10226: --- Apparently most databases support this: http://stackoverflow.com/questions/723195/should-i-use-or-for-not-equal-in-tsql Error occured in SparkSQL when using != Key: SPARK-10226 URL: https://issues.apache.org/jira/browse/SPARK-10226 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.0 Reporter: wangwei DataSource: src/main/resources/kv1.txt SQL: 1. create table src(id string, name string); 2. load data local inpath '${SparkHome}/examples/src/main/resources/kv1.txt' into table src; 3. select count( * ) from src where id != '0'; [ERROR] Could not expand event java.lang.IllegalArgumentException: != 0;: event not found at jline.console.ConsoleReader.expandEvents(ConsoleReader.java:779) at jline.console.ConsoleReader.finishBuffer(ConsoleReader.java:631) at jline.console.ConsoleReader.accept(ConsoleReader.java:2019) at jline.console.ConsoleReader.readLine(ConsoleReader.java:2666) at jline.console.ConsoleReader.readLine(ConsoleReader.java:2269) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:231) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:666) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:178) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:203) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:118) at 
org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6951) History server slow startup if the event log directory is large
[ https://issues.apache.org/jira/browse/SPARK-6951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711292#comment-14711292 ] Thomas Graves commented on SPARK-6951: -- Sorry, I was wrong. I just went and tested it, and it does start up fairly quickly now. I'm having problems with it getting stuck reading large files, which is a separate issue. History server slow startup if the event log directory is large --- Key: SPARK-6951 URL: https://issues.apache.org/jira/browse/SPARK-6951 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 1.3.0 Reporter: Matt Cheah I started my history server, then navigated to the web UI where I expected to be able to view some completed applications, but the webpage was not available. It turned out that the History Server was not finished parsing all of the event logs in the event log directory that I had specified. I had accumulated a lot of event logs from months of running Spark, so it would have taken a very long time for the History Server to crunch through them all. I purged the event log directory and started from scratch, and the UI loaded immediately. We should have a pagination strategy or parse the directory lazily to avoid needing to wait after starting the history server. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10226) Error occured in SparkSQL when using !=
[ https://issues.apache.org/jira/browse/SPARK-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711351#comment-14711351 ] Apache Spark commented on SPARK-10226: -- User 'small-wang' has created a pull request for this issue: https://github.com/apache/spark/pull/8419 Error occured in SparkSQL when using != Key: SPARK-10226 URL: https://issues.apache.org/jira/browse/SPARK-10226 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.0 Reporter: wangwei DataSource: src/main/resources/kv1.txt SQL: 1. create table src(id string, name string); 2. load data local inpath '${SparkHome}/examples/src/main/resources/kv1.txt' into table src; 3. select count( * ) from src where id != '0'; [ERROR] Could not expand event java.lang.IllegalArgumentException: != 0;: event not found at jline.console.ConsoleReader.expandEvents(ConsoleReader.java:779) at jline.console.ConsoleReader.finishBuffer(ConsoleReader.java:631) at jline.console.ConsoleReader.accept(ConsoleReader.java:2019) at jline.console.ConsoleReader.readLine(ConsoleReader.java:2666) at jline.console.ConsoleReader.readLine(ConsoleReader.java:2269) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:231) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:666) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:178) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:203) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:118) at 
org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10226) Error occured in SparkSQL when using !=
[ https://issues.apache.org/jira/browse/SPARK-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711400#comment-14711400 ] Herman van Hovell commented on SPARK-10226: --- In what SQL dialect is {{!=}} a valid symbol {{not equals}}? I thought pretty much all SQL environments use {{<>}} for this. Error occured in SparkSQL when using != Key: SPARK-10226 URL: https://issues.apache.org/jira/browse/SPARK-10226 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.0 Reporter: wangwei DataSource: src/main/resources/kv1.txt SQL: 1. create table src(id string, name string); 2. load data local inpath '${SparkHome}/examples/src/main/resources/kv1.txt' into table src; 3. select count( * ) from src where id != '0'; [ERROR] Could not expand event java.lang.IllegalArgumentException: != 0;: event not found at jline.console.ConsoleReader.expandEvents(ConsoleReader.java:779) at jline.console.ConsoleReader.finishBuffer(ConsoleReader.java:631) at jline.console.ConsoleReader.accept(ConsoleReader.java:2019) at jline.console.ConsoleReader.readLine(ConsoleReader.java:2666) at jline.console.ConsoleReader.readLine(ConsoleReader.java:2269) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:231) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:666) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:178) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:203) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:118) at 
org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3655) Support sorting of values in addition to keys (i.e. secondary sort)
[ https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711435#comment-14711435 ] Nick Xie commented on SPARK-3655: - It is in your api/java/GroupSorted.scala line 8: import com.google.common.collect.{ Ordering => GuavaOrdering } . line 29: private implicit def ordering[K]: Ordering[K] = comparatorToOrdering(GuavaOrdering.natural.asInstanceOf[Comparator[K]]) Support sorting of values in addition to keys (i.e. secondary sort) --- Key: SPARK-3655 URL: https://issues.apache.org/jira/browse/SPARK-3655 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 1.1.0, 1.2.0 Reporter: koert kuipers Assignee: Koert Kuipers Now that spark has a sort based shuffle, can we expect a secondary sort soon? There are some use cases where getting a sorted iterator of values per key is helpful. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3655) Support sorting of values in addition to keys (i.e. secondary sort)
[ https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711451#comment-14711451 ] Nick Xie commented on SPARK-3655: - For the record, the data file is 25 million rows and about 3000 unique keys, so that's about 8000 records on average to be sorted per key on the timestamp. Support sorting of values in addition to keys (i.e. secondary sort) --- Key: SPARK-3655 URL: https://issues.apache.org/jira/browse/SPARK-3655 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 1.1.0, 1.2.0 Reporter: koert kuipers Assignee: Koert Kuipers Now that spark has a sort based shuffle, can we expect a secondary sort soon? There are some use cases where getting a sorted iterator of values per key is helpful. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
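Editor's note: the use case described above (per key, iterate values sorted by timestamp) can be stated in a few lines of plain Java. This in-memory sketch only illustrates the contract; the point of a shuffle-based secondary sort is precisely to obtain this result without materializing each group in memory, so none of this is Spark's implementation:

```java
// Secondary sort, conceptually: group records by key, then sort the values
// within each key by timestamp. The record type and data are illustrative.
import java.util.*;
import java.util.stream.Collectors;

public class SecondarySortDemo {
    record Event(String key, long timestamp) {}

    static Map<String, List<Long>> sortedValuesPerKey(List<Event> events) {
        return events.stream().collect(Collectors.groupingBy(
                Event::key,
                TreeMap::new, // deterministic key order for display
                Collectors.mapping(Event::timestamp,
                        Collectors.collectingAndThen(Collectors.toList(), ts -> {
                            Collections.sort(ts); // the "secondary" sort
                            return ts;
                        }))));
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event("a", 30), new Event("b", 10),
                new Event("a", 10), new Event("a", 20));
        System.out.println(sortedValuesPerKey(events)); // {a=[10, 20, 30], b=[10]}
    }
}
```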
[jira] [Assigned] (SPARK-10226) Error occured in SparkSQL when using !=
[ https://issues.apache.org/jira/browse/SPARK-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10226: Assignee: (was: Apache Spark) Error occured in SparkSQL when using != Key: SPARK-10226 URL: https://issues.apache.org/jira/browse/SPARK-10226 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.0 Reporter: wangwei DataSource: src/main/resources/kv1.txt SQL: 1. create table src(id string, name string); 2. load data local inpath '${SparkHome}/examples/src/main/resources/kv1.txt' into table src; 3. select count(*) from src where id != '0'; [ERROR] Could not expand event java.lang.IllegalArgumentException: != 0;: event not found at jline.console.ConsoleReader.expandEvents(ConsoleReader.java:779) at jline.console.ConsoleReader.finishBuffer(ConsoleReader.java:631) at jline.console.ConsoleReader.accept(ConsoleReader.java:2019) at jline.console.ConsoleReader.readLine(ConsoleReader.java:2666) at jline.console.ConsoleReader.readLine(ConsoleReader.java:2269) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:231) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:666) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:178) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:203) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:118) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: 
issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10226) Error occured in SparkSQL when using !=
[ https://issues.apache.org/jira/browse/SPARK-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711302#comment-14711302 ] Apache Spark commented on SPARK-10226: -- User 'small-wang' has created a pull request for this issue: https://github.com/apache/spark/pull/8418 Error occured in SparkSQL when using != Key: SPARK-10226 URL: https://issues.apache.org/jira/browse/SPARK-10226 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.0 Reporter: wangwei DataSource: src/main/resources/kv1.txt SQL: 1. create table src(id string, name string); 2. load data local inpath '${SparkHome}/examples/src/main/resources/kv1.txt' into table src; 3. select count(*) from src where id != '0'; [ERROR] Could not expand event java.lang.IllegalArgumentException: != 0;: event not found at jline.console.ConsoleReader.expandEvents(ConsoleReader.java:779) at jline.console.ConsoleReader.finishBuffer(ConsoleReader.java:631) at jline.console.ConsoleReader.accept(ConsoleReader.java:2019) at jline.console.ConsoleReader.readLine(ConsoleReader.java:2666) at jline.console.ConsoleReader.readLine(ConsoleReader.java:2269) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:231) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:666) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:178) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:203) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:118) at 
org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4223) Support * (meaning all users) as part of the acls
[ https://issues.apache.org/jira/browse/SPARK-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711384#comment-14711384 ] Thomas Graves commented on SPARK-4223: -- [~srowen] [~rxin] do one of you have permissions to give 'zhuoliu' committer access so we can assign this jira to him? Support * (meaning all users) as part of the acls - Key: SPARK-4223 URL: https://issues.apache.org/jira/browse/SPARK-4223 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.2.0 Reporter: Thomas Graves Currently we support setting view and modify acls but you have to specify a list of users. It would be nice to support * meaning all users have access. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4223) Support * (meaning all users) as part of the acls
[ https://issues.apache.org/jira/browse/SPARK-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711574#comment-14711574 ] Zhuo Liu commented on SPARK-4223: - Thank you! [~sowen] [~tgraves] Support * (meaning all users) as part of the acls - Key: SPARK-4223 URL: https://issues.apache.org/jira/browse/SPARK-4223 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.2.0 Reporter: Thomas Graves Assignee: Zhuo Liu Currently we support setting view and modify acls but you have to specify a list of users. It would be nice to support * meaning all users have access. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10229) Wrong jline dependency when compiled against Scala 2.11
Cheng Lian created SPARK-10229: -- Summary: Wrong jline dependency when compiled against Scala 2.11 Key: SPARK-10229 URL: https://issues.apache.org/jira/browse/SPARK-10229 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.5.0 Reporter: Cheng Lian Priority: Blocker Scala migrated to the official jline in 2.11.0-M4, so the scala-specific fork of jline is gone and you can just depend on the official jline. The nonexistent org.scala-lang:jline:2.11.7 artifact is causing build failure when Spark is built against Scala 2.11. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-10228) Integer overflow in VertexRDDImpl.count
[ https://issues.apache.org/jira/browse/SPARK-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-10228. --- Resolution: Duplicate Always best to look at master first, since you'd see it was already fixed: https://github.com/apache/spark/blame/9e952ecbce670e9b532a1c664a4d03b66e404112/graphx/src/main/scala/org/apache/spark/graphx/impl/VertexRDDImpl.scala https://issues.apache.org/jira/browse/SPARK-3190 Integer overflow in VertexRDDImpl.count --- Key: SPARK-10228 URL: https://issues.apache.org/jira/browse/SPARK-10228 Project: Spark Issue Type: Bug Components: GraphX Affects Versions: 1.4.1 Reporter: Robin Cheng VertexRDDImpl overrides RDD.count() but aggregates Int instead of Long: /** The number of vertices in the RDD. */ override def count(): Long = { partitionsRDD.map(_.size).reduce(_ + _) } This causes Pregel to stop iterating when the number of messages is negative, giving incorrect results. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
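Editor's note: the failure mode is easy to reproduce outside Spark. Summing partition sizes as {{Int}} wraps past 2^31 - 1 and goes negative, which is what makes Pregel's message count look negative; widening to {{Long}} before summing (the approach the fix apparently takes) avoids it. A standalone demonstration:

```java
// Int vs. Long accumulation of per-partition sizes, as in the count() above.
public class CountOverflowDemo {
    // Int arithmetic, as in the buggy override: silently wraps past 2^31 - 1.
    static int intSum(int[] partitionSizes) {
        int total = 0;
        for (int size : partitionSizes) total += size;
        return total;
    }

    // Each size is widened to long before adding, so no wraparound occurs.
    static long longSum(int[] partitionSizes) {
        long total = 0L;
        for (int size : partitionSizes) total += size;
        return total;
    }

    public static void main(String[] args) {
        int[] sizes = {2_000_000_000, 2_000_000_000}; // ~4 billion vertices total
        System.out.println(intSum(sizes));  // -294967296: a negative "count"
        System.out.println(longSum(sizes)); // 4000000000
    }
}
```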
[jira] [Updated] (SPARK-10226) Error occured in SparkSQL when using !=
[ https://issues.apache.org/jira/browse/SPARK-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangwei updated SPARK-10226: Description:
DataSource: src/main/resources/kv1.txt
SQL:
1. create table src(id string, name string);
2. load data local inpath '${SparkHome}/examples/src/main/resources/kv1.txt' into table src;
3. select count(*) from src where id != '0';
[ERROR] Could not expand event
{code}
java.lang.IllegalArgumentException: != 0;: event not found
	at jline.console.ConsoleReader.expandEvents(ConsoleReader.java:779)
	at jline.console.ConsoleReader.finishBuffer(ConsoleReader.java:631)
	at jline.console.ConsoleReader.accept(ConsoleReader.java:2019)
	at jline.console.ConsoleReader.readLine(ConsoleReader.java:2666)
	at jline.console.ConsoleReader.readLine(ConsoleReader.java:2269)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:231)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:666)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:178)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:118)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
{code}
was:
DataSource: src/main/resources/kv1.txt
SQL:
1. create table src(id string, name string);
2. load data local inpath '${SparkHome}/examples/src/main/resources/kv1.txt' into table src;
3. select count(*) from src where id != '0';
[ERROR] Could not expand event
{code}
java.lang.IllegalArgumentException: != 0;: event not found
	at jline.console.ConsoleReader.expandEvents(ConsoleReader.java:779)
	at jline.console.ConsoleReader.finishBuffer(ConsoleReader.java:631)
	at jline.console.ConsoleReader.accept(ConsoleReader.java:2019)
	at jline.console.ConsoleReader.readLine(ConsoleReader.java:2666)
	at jline.console.ConsoleReader.readLine(ConsoleReader.java:2269)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:231)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:666)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:178)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:118)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
{code}
Error occurred in SparkSQL when using != Key: SPARK-10226 URL: https://issues.apache.org/jira/browse/SPARK-10226 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.0 Reporter: wangwei
DataSource: src/main/resources/kv1.txt
SQL:
1. create table src(id string, name string);
2. load data local inpath '${SparkHome}/examples/src/main/resources/kv1.txt' into table src;
3. select count(*) from src where id != '0';
[ERROR] Could not expand event
{code}
java.lang.IllegalArgumentException: != 0;: event not found
	at jline.console.ConsoleReader.expandEvents(ConsoleReader.java:779)
	at jline.console.ConsoleReader.finishBuffer(ConsoleReader.java:631)
	at jline.console.ConsoleReader.accept(ConsoleReader.java:2019)
	at jline.console.ConsoleReader.readLine(ConsoleReader.java:2666)
	at jline.console.ConsoleReader.readLine(ConsoleReader.java:2269)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:231)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at
{code}
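The failure comes from jline's shell-style history expansion, not from the SQL parser: the `!` in `!=` is treated as a history event reference before the query ever reaches Spark. A rough pure-Scala approximation of the trigger (the real check lives in `jline.console.ConsoleReader.expandEvents`, and jline 2.x exposes `setExpandEvents(false)` to turn expansion off; the helper below is only a crude stand-in):

```scala
// Crude stand-in for jline's history-expansion trigger: a line is
// suspect if it contains a '!'. The real logic is more involved,
// but this is enough to see why "!=" trips it.
def looksLikeHistoryEvent(line: String): Boolean = line.contains('!')

val query = "select count(*) from src where id != '0';"
println(looksLikeHistoryEvent(query)) // true: "!=" is misread as an event
```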
[jira] [Commented] (SPARK-3655) Support sorting of values in addition to keys (i.e. secondary sort)
[ https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711367#comment-14711367 ] Nick Xie commented on SPARK-3655: - It worked really well on the cluster. :-) I did notice that it had a dependency on Google Guava classes. Any way to get rid of this dependency? Guava dependency mismatches are a pain across Spark and Hadoop versions. Support sorting of values in addition to keys (i.e. secondary sort) --- Key: SPARK-3655 URL: https://issues.apache.org/jira/browse/SPARK-3655 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 1.1.0, 1.2.0 Reporter: koert kuipers Assignee: Koert Kuipers Now that Spark has a sort-based shuffle, can we expect a secondary sort soon? There are some use cases where getting a sorted iterator of values per key is helpful. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
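For small data the desired semantics can be sketched in plain Scala (this in-memory version is only an illustration of the contract; the point of the feature is to push the per-key ordering into the sort-based shuffle instead of sorting each group in memory):

```scala
// Toy illustration of secondary-sort semantics: after grouping by
// key, the values for each key come back in sorted order.
val data = Seq(("a", 3), ("b", 2), ("a", 1), ("a", 2))
val sortedPerKey: Map[String, Seq[Int]] =
  data.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).sorted }
println(sortedPerKey("a")) // List(1, 2, 3)
```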
[jira] [Resolved] (SPARK-10229) Wrong jline dependency when compiled against Scala 2.11
[ https://issues.apache.org/jira/browse/SPARK-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-10229. Resolution: Not A Problem I was using {{-Pscala-2.11}} since {{scala-2.11}} is a POM profile. But it should be {{-Dscala-2.11}}. Wrong jline dependency when compiled against Scala 2.11 --- Key: SPARK-10229 URL: https://issues.apache.org/jira/browse/SPARK-10229 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.5.0 Reporter: Cheng Lian Priority: Blocker Scala migrated to the official jline in 2.11.0-M4, so the scala-specific fork of jline is gone and you can just depend on the official jline. The nonexistent org.scala-lang:jline:2.11.7 artifact is causing build failure when Spark is built against Scala 2.11. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10253) Remove Guava dependencies in MLlib java tests
[ https://issues.apache.org/jira/browse/SPARK-10253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712443#comment-14712443 ] Feynman Liang commented on SPARK-10253: --- Working on this Remove Guava dependencies in MLlib java tests - Key: SPARK-10253 URL: https://issues.apache.org/jira/browse/SPARK-10253 Project: Spark Issue Type: Improvement Components: ML, MLlib Reporter: Feynman Liang Priority: Minor Many tests depend on Google Guava's {{Lists.newArrayList}} when {{java.util.Arrays.asList}} could be used instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10253) Remove Guava dependencies in MLlib java tests
Feynman Liang created SPARK-10253: - Summary: Remove Guava dependencies in MLlib java tests Key: SPARK-10253 URL: https://issues.apache.org/jira/browse/SPARK-10253 Project: Spark Issue Type: Improvement Components: ML, MLlib Reporter: Feynman Liang Priority: Minor Many tests depend on Google Guava's {{Lists.newArrayList}} when {{java.util.Arrays.asList}} could be used instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
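The replacement is mechanical; seen from Scala, the JDK call has the same shape as the Guava one used in the Java tests:

```scala
import java.util.Arrays

// Guava: Lists.newArrayList("a", "b", "c")  -- pulls in an extra dependency
// JDK:   Arrays.asList("a", "b", "c")       -- available everywhere
val xs: java.util.List[String] = Arrays.asList("a", "b", "c")
println(xs.size()) // 3
```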
[jira] [Created] (SPARK-10273) Add @since annotation to pyspark.mllib.feature
Xiangrui Meng created SPARK-10273: - Summary: Add @since annotation to pyspark.mllib.feature Key: SPARK-10273 URL: https://issues.apache.org/jira/browse/SPARK-10273 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib, PySpark Reporter: Xiangrui Meng Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10274) Add @since annotation to pyspark.mllib.fpm
Xiangrui Meng created SPARK-10274: - Summary: Add @since annotation to pyspark.mllib.fpm Key: SPARK-10274 URL: https://issues.apache.org/jira/browse/SPARK-10274 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib, PySpark Reporter: Xiangrui Meng Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10269) Add @since annotation to pyspark.mllib.classification
Xiangrui Meng created SPARK-10269: - Summary: Add @since annotation to pyspark.mllib.classification Key: SPARK-10269 URL: https://issues.apache.org/jira/browse/SPARK-10269 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib, PySpark Reporter: Xiangrui Meng Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10270) Add/Replace some Java friendly DataFrame API
Cheng Hao created SPARK-10270: - Summary: Add/Replace some Java friendly DataFrame API Key: SPARK-10270 URL: https://issues.apache.org/jira/browse/SPARK-10270 Project: Spark Issue Type: Improvement Components: SQL Reporter: Cheng Hao Currently in DataFrame, we have APIs like:
{code}
def join(right: DataFrame, usingColumns: Seq[String]): DataFrame
def dropDuplicates(colNames: Seq[String]): DataFrame
def dropDuplicates(colNames: Array[String]): DataFrame
{code}
Those APIs are not so friendly to Java programmers; change them to:
{code}
def join(right: DataFrame, usingColumns: String*): DataFrame
def dropDuplicates(colNames: String*): DataFrame
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
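A hedged sketch of why the varargs form helps (hypothetical methods, not the actual DataFrame source): a Scala `String*` parameter compiles to a Java-visible `String...` method, so Java callers can pass column names directly instead of constructing a Scala `Seq`:

```scala
object Api {
  // Seq-based shape: awkward from Java, which has no Seq literal.
  def dropDuplicatesSeq(colNames: Seq[String]): String = colNames.mkString(",")
  // Varargs shape: callable as dropDuplicates("a", "b") from Scala,
  // and as a String... method from Java.
  def dropDuplicates(colNames: String*): String = colNames.mkString(",")
}
println(Api.dropDuplicates("a", "b")) // a,b
```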
[jira] [Created] (SPARK-10277) Add @since annotation to pyspark.mllib.regression
Xiangrui Meng created SPARK-10277: - Summary: Add @since annotation to pyspark.mllib.regression Key: SPARK-10277 URL: https://issues.apache.org/jira/browse/SPARK-10277 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib, PySpark Reporter: Xiangrui Meng Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10276) Add @since annotation to pyspark.mllib.recommendation
Xiangrui Meng created SPARK-10276: - Summary: Add @since annotation to pyspark.mllib.recommendation Key: SPARK-10276 URL: https://issues.apache.org/jira/browse/SPARK-10276 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib, PySpark Reporter: Xiangrui Meng Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10279) Add @since annotation to pyspark.mllib.util
Xiangrui Meng created SPARK-10279: - Summary: Add @since annotation to pyspark.mllib.util Key: SPARK-10279 URL: https://issues.apache.org/jira/browse/SPARK-10279 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib, PySpark Reporter: Xiangrui Meng Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10271) Add @since annotation to pyspark.mllib.clustering
Xiangrui Meng created SPARK-10271: - Summary: Add @since annotation to pyspark.mllib.clustering Key: SPARK-10271 URL: https://issues.apache.org/jira/browse/SPARK-10271 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib, PySpark Reporter: Xiangrui Meng Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10275) Add @since annotation to pyspark.mllib.random
Xiangrui Meng created SPARK-10275: - Summary: Add @since annotation to pyspark.mllib.random Key: SPARK-10275 URL: https://issues.apache.org/jira/browse/SPARK-10275 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib, PySpark Reporter: Xiangrui Meng Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10272) Add @since annotation to pyspark.mllib.evaluation
Xiangrui Meng created SPARK-10272: - Summary: Add @since annotation to pyspark.mllib.evaluation Key: SPARK-10272 URL: https://issues.apache.org/jira/browse/SPARK-10272 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib, PySpark Reporter: Xiangrui Meng Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10278) Add @since annotation to pyspark.mllib.tree
Xiangrui Meng created SPARK-10278: - Summary: Add @since annotation to pyspark.mllib.tree Key: SPARK-10278 URL: https://issues.apache.org/jira/browse/SPARK-10278 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib, PySpark Reporter: Xiangrui Meng Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8360) Streaming DataFrames
[ https://issues.apache.org/jira/browse/SPARK-8360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712377#comment-14712377 ] Adrian Wang commented on SPARK-8360: https://github.com/intel-bigdata/spark-streamingsql Our streaming SQL project is highly related to this JIRA ticket. Streaming DataFrames Key: SPARK-8360 URL: https://issues.apache.org/jira/browse/SPARK-8360 Project: Spark Issue Type: Umbrella Components: SQL, Streaming Reporter: Reynold Xin Umbrella ticket to track what's needed to make streaming DataFrames a reality. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9964) PySpark DataFrameReader accept RDD of String for JSON
[ https://issues.apache.org/jira/browse/SPARK-9964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9964: --- Assignee: Apache Spark PySpark DataFrameReader accept RDD of String for JSON - Key: SPARK-9964 URL: https://issues.apache.org/jira/browse/SPARK-9964 Project: Spark Issue Type: New Feature Components: PySpark, SQL Reporter: Joseph K. Bradley Assignee: Apache Spark Priority: Minor It would be nice (but not necessary) for the PySpark DataFrameReader to accept an RDD of Strings (like the Scala version does) for JSON, rather than only taking a path. If this JIRA is accepted, it should probably be duplicated to cover the other input types (not just JSON). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9964) PySpark DataFrameReader accept RDD of String for JSON
[ https://issues.apache.org/jira/browse/SPARK-9964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712409#comment-14712409 ] Apache Spark commented on SPARK-9964: - User 'yanboliang' has created a pull request for this issue: https://github.com/apache/spark/pull/8444 PySpark DataFrameReader accept RDD of String for JSON - Key: SPARK-9964 URL: https://issues.apache.org/jira/browse/SPARK-9964 Project: Spark Issue Type: New Feature Components: PySpark, SQL Reporter: Joseph K. Bradley Priority: Minor It would be nice (but not necessary) for the PySpark DataFrameReader to accept an RDD of Strings (like the Scala version does) for JSON, rather than only taking a path. If this JIRA is accepted, it should probably be duplicated to cover the other input types (not just JSON). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9964) PySpark DataFrameReader accept RDD of String for JSON
[ https://issues.apache.org/jira/browse/SPARK-9964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9964: --- Assignee: (was: Apache Spark) PySpark DataFrameReader accept RDD of String for JSON - Key: SPARK-9964 URL: https://issues.apache.org/jira/browse/SPARK-9964 Project: Spark Issue Type: New Feature Components: PySpark, SQL Reporter: Joseph K. Bradley Priority: Minor It would be nice (but not necessary) for the PySpark DataFrameReader to accept an RDD of Strings (like the Scala version does) for JSON, rather than only taking a path. If this JIRA is accepted, it should probably be duplicated to cover the other input types (not just JSON). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10238) Update @Since annotation for mllib.linalg
[ https://issues.apache.org/jira/browse/SPARK-10238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712410#comment-14712410 ] DB Tsai commented on SPARK-10238: - Resolved in master and branch 1.5 Update @Since annotation for mllib.linalg - Key: SPARK-10238 URL: https://issues.apache.org/jira/browse/SPARK-10238 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib Affects Versions: 1.5.0 Reporter: Xiangrui Meng Assignee: Xiangrui Meng Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10281) Add @since annotation to pyspark.ml.clustering
Xiangrui Meng created SPARK-10281: - Summary: Add @since annotation to pyspark.ml.clustering Key: SPARK-10281 URL: https://issues.apache.org/jira/browse/SPARK-10281 Project: Spark Issue Type: Sub-task Components: Documentation, ML, PySpark Reporter: Xiangrui Meng Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10280) Add @since annotation to pyspark.ml.classification
Xiangrui Meng created SPARK-10280: - Summary: Add @since annotation to pyspark.ml.classification Key: SPARK-10280 URL: https://issues.apache.org/jira/browse/SPARK-10280 Project: Spark Issue Type: Sub-task Components: Documentation, ML, PySpark Reporter: Xiangrui Meng Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10285) Add @since annotation to pyspark.ml.util
Xiangrui Meng created SPARK-10285: - Summary: Add @since annotation to pyspark.ml.util Key: SPARK-10285 URL: https://issues.apache.org/jira/browse/SPARK-10285 Project: Spark Issue Type: Sub-task Components: Documentation, ML, PySpark Reporter: Xiangrui Meng Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10284) Add @since annotation to pyspark.ml.tuning
Xiangrui Meng created SPARK-10284: - Summary: Add @since annotation to pyspark.ml.tuning Key: SPARK-10284 URL: https://issues.apache.org/jira/browse/SPARK-10284 Project: Spark Issue Type: Sub-task Components: Documentation, ML, PySpark Reporter: Xiangrui Meng Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10282) Add @since annotation to pyspark.ml.recommendation
Xiangrui Meng created SPARK-10282: - Summary: Add @since annotation to pyspark.ml.recommendation Key: SPARK-10282 URL: https://issues.apache.org/jira/browse/SPARK-10282 Project: Spark Issue Type: Sub-task Components: Documentation, ML, PySpark Reporter: Xiangrui Meng Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10283) Add @since annotation to pyspark.ml.regression
Xiangrui Meng created SPARK-10283: - Summary: Add @since annotation to pyspark.ml.regression Key: SPARK-10283 URL: https://issues.apache.org/jira/browse/SPARK-10283 Project: Spark Issue Type: Sub-task Components: Documentation, ML, PySpark Reporter: Xiangrui Meng Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10286) Add @since annotation to pyspark.ml.param and pyspark.ml.*
Xiangrui Meng created SPARK-10286: - Summary: Add @since annotation to pyspark.ml.param and pyspark.ml.* Key: SPARK-10286 URL: https://issues.apache.org/jira/browse/SPARK-10286 Project: Spark Issue Type: Sub-task Components: Documentation, ML, PySpark Reporter: Xiangrui Meng Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10269) Add @since annotation to pyspark.mllib.classification
[ https://issues.apache.org/jira/browse/SPARK-10269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-10269: -- Target Version/s: 1.6.0 Add @since annotation to pyspark.mllib.classification - Key: SPARK-10269 URL: https://issues.apache.org/jira/browse/SPARK-10269 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib, PySpark Reporter: Xiangrui Meng Priority: Minor Labels: starter -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10272) Add @since annotation to pyspark.mllib.evaluation
[ https://issues.apache.org/jira/browse/SPARK-10272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-10272: -- Target Version/s: 1.6.0 Add @since annotation to pyspark.mllib.evaluation - Key: SPARK-10272 URL: https://issues.apache.org/jira/browse/SPARK-10272 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib, PySpark Reporter: Xiangrui Meng Priority: Minor Labels: starter -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10271) Add @since annotation to pyspark.mllib.clustering
[ https://issues.apache.org/jira/browse/SPARK-10271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-10271: -- Target Version/s: 1.6.0 Add @since annotation to pyspark.mllib.clustering - Key: SPARK-10271 URL: https://issues.apache.org/jira/browse/SPARK-10271 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib, PySpark Reporter: Xiangrui Meng Priority: Minor Labels: starter -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10273) Add @since annotation to pyspark.mllib.feature
[ https://issues.apache.org/jira/browse/SPARK-10273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-10273: -- Target Version/s: 1.6.0 Add @since annotation to pyspark.mllib.feature -- Key: SPARK-10273 URL: https://issues.apache.org/jira/browse/SPARK-10273 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib, PySpark Reporter: Xiangrui Meng Priority: Minor Labels: starter -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10287) After processing a query using JSON data, Spark SQL continuously refreshes metadata of the table
Yin Huai created SPARK-10287: Summary: After processing a query using JSON data, Spark SQL continuously refreshes metadata of the table Key: SPARK-10287 URL: https://issues.apache.org/jira/browse/SPARK-10287 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.0 Reporter: Yin Huai Priority: Blocker
{code}
val df = sqlContext.read.format("json").load(aPartitionedJsonData)
val columnStr = df.schema.map(_.name).mkString(",")
println(s"columns: $columnStr")
val hash = df
  .selectExpr(s"hash($columnStr) as hashValue")
  .groupBy()
  .sum("hashValue")
  .head()
  .getLong(0)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10287) After processing a query using JSON data, Spark SQL continuously refreshes metadata of the table
[ https://issues.apache.org/jira/browse/SPARK-10287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-10287: - Description: I have a partitioned json table with around 2000 partitions.
{code}
val df = sqlContext.read.format("json").load(aPartitionedJsonData)
val columnStr = df.schema.map(_.name).mkString(",")
println(s"columns: $columnStr")
val hash = df
  .selectExpr(s"hash($columnStr) as hashValue")
  .groupBy()
  .sum("hashValue")
  .head()
  .getLong(0)
{code}
was:
{code}
val df = sqlContext.read.format("json").load(aPartitionedJsonData)
val columnStr = df.schema.map(_.name).mkString(",")
println(s"columns: $columnStr")
val hash = df
  .selectExpr(s"hash($columnStr) as hashValue")
  .groupBy()
  .sum("hashValue")
  .head()
  .getLong(0)
{code}
After processing a query using JSON data, Spark SQL continuously refreshes metadata of the table Key: SPARK-10287 URL: https://issues.apache.org/jira/browse/SPARK-10287 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.0 Reporter: Yin Huai Priority: Blocker I have a partitioned json table with around 2000 partitions.
{code}
val df = sqlContext.read.format("json").load(aPartitionedJsonData)
val columnStr = df.schema.map(_.name).mkString(",")
println(s"columns: $columnStr")
val hash = df
  .selectExpr(s"hash($columnStr) as hashValue")
  .groupBy()
  .sum("hashValue")
  .head()
  .getLong(0)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10220) org.apache.spark.sql.jdbc.JDBCRDD could not parse mysql table column named reserved word
[ https://issues.apache.org/jira/browse/SPARK-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10220: Assignee: (was: Apache Spark) org.apache.spark.sql.jdbc.JDBCRDD could not parse mysql table column named reserved word Key: SPARK-10220 URL: https://issues.apache.org/jira/browse/SPARK-10220 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.3.0 Reporter: fang fang chen Attachments: SPARK-10220.patch Reproduce steps:
{code}
var options: HashMap[String, String] = new HashMap
options.put("driver", "com.mysql.jdbc.Driver")
options.put("url", url_total)
options.put("dbtable", table) // one column named desc
options.put("lowerBound", lower_bound.toString())
options.put("upperBound", upper_bound.toString())
options.put("numPartitions", partitions.toString())
options.put("partitionColumn", "id")
val jdbcDF = sqlContext.load("jdbc", options)
jdbcDF.save("output")
{code}
Exception:
{code}
15/08/24 19:02:34 ERROR executor.Executor: Exception in task 0.3 in stage 0.0 (TID 3)
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'desc,warning_stat,money_limit,real_name,region_lv1,region_lv2,region_lv3,region_' at line 1
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10220) org.apache.spark.sql.jdbc.JDBCRDD could not parse mysql table column named reserved word
[ https://issues.apache.org/jira/browse/SPARK-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712394#comment-14712394 ] Apache Spark commented on SPARK-10220: -- User 'ffchenAtCloudera' has created a pull request for this issue: https://github.com/apache/spark/pull/8443 org.apache.spark.sql.jdbc.JDBCRDD could not parse mysql table column named reserved word Key: SPARK-10220 URL: https://issues.apache.org/jira/browse/SPARK-10220 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.3.0 Reporter: fang fang chen Attachments: SPARK-10220.patch Reproduce steps:
{code}
var options: HashMap[String, String] = new HashMap
options.put("driver", "com.mysql.jdbc.Driver")
options.put("url", url_total)
options.put("dbtable", table) // one column named desc
options.put("lowerBound", lower_bound.toString())
options.put("upperBound", upper_bound.toString())
options.put("numPartitions", partitions.toString())
options.put("partitionColumn", "id")
val jdbcDF = sqlContext.load("jdbc", options)
jdbcDF.save("output")
{code}
Exception:
{code}
15/08/24 19:02:34 ERROR executor.Executor: Exception in task 0.3 in stage 0.0 (TID 3)
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'desc,warning_stat,money_limit,real_name,region_lv1,region_lv2,region_lv3,region_' at line 1
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
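The usual fix is to quote identifiers when generating the SELECT, so reserved words like `desc` survive; a minimal sketch with a hypothetical helper (MySQL quotes identifiers with backticks; the names below are illustrative, not the actual JDBCRDD code):

```scala
// Hypothetical helper: wrap each column name in backticks so MySQL
// reserved words like `desc` parse as identifiers. A literal backtick
// inside a name is escaped by doubling it.
def quoteMySql(col: String): String = "`" + col.replace("`", "``") + "`"

val columns = Seq("id", "desc", "real_name")
val sql = s"SELECT ${columns.map(quoteMySql).mkString(", ")} FROM t"
println(sql) // SELECT `id`, `desc`, `real_name` FROM t
```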
[jira] [Assigned] (SPARK-10254) Remove Guava dependencies in spark.ml.feature
[ https://issues.apache.org/jira/browse/SPARK-10254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10254: Assignee: Apache Spark Remove Guava dependencies in spark.ml.feature - Key: SPARK-10254 URL: https://issues.apache.org/jira/browse/SPARK-10254 Project: Spark Issue Type: Improvement Components: ML Reporter: Feynman Liang Assignee: Apache Spark -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10254) Remove Guava dependencies in spark.ml.feature
[ https://issues.apache.org/jira/browse/SPARK-10254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10254: Assignee: (was: Apache Spark) Remove Guava dependencies in spark.ml.feature - Key: SPARK-10254 URL: https://issues.apache.org/jira/browse/SPARK-10254 Project: Spark Issue Type: Improvement Components: ML Reporter: Feynman Liang -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10254) Remove Guava dependencies in spark.ml.feature
[ https://issues.apache.org/jira/browse/SPARK-10254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712444#comment-14712444 ] Apache Spark commented on SPARK-10254: -- User 'feynmanliang' has created a pull request for this issue: https://github.com/apache/spark/pull/8445 Remove Guava dependencies in spark.ml.feature - Key: SPARK-10254 URL: https://issues.apache.org/jira/browse/SPARK-10254 Project: Spark Issue Type: Improvement Components: ML Reporter: Feynman Liang -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10255) Remove Guava dependencies in spark.ml.param
Feynman Liang created SPARK-10255: - Summary: Remove Guava dependencies in spark.ml.param Key: SPARK-10255 URL: https://issues.apache.org/jira/browse/SPARK-10255 Project: Spark Issue Type: Improvement Components: ML Reporter: Feynman Liang Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10202) Specify schema during KMeansModel.save to avoid reflection
[ https://issues.apache.org/jira/browse/SPARK-10202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712456#comment-14712456 ] Vinod KC commented on SPARK-10202: -- I'm working on this Specify schema during KMeansModel.save to avoid reflection -- Key: SPARK-10202 URL: https://issues.apache.org/jira/browse/SPARK-10202 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Feynman Liang Priority: Minor [KMeansModel.save|https://github.com/apache/spark/blob/f5b028ed2f1ad6de43c8b50ebf480e1b6c047035/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeansModel.scala#L110] currently infers a schema from a case class when the schema is known and should be manually provided. See parent JIRA for rationale.
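The "specify schema during save" tickets all share one shape: the schema never changes, yet it is rediscovered via runtime reflection on every save. A Spark-free miniature of the trade-off, with illustrative names (the real fix writes out a Spark SQL StructType rather than the strings used here):

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SchemaSketch {
    // Stand-in for the case class the save path currently reflects over.
    static class ClusterRow {
        int id;
        double[] point;
    }

    // Reflection-based: walks the fields at runtime on every save,
    // even though they are fixed at compile time.
    public static List<String> inferredSchema() {
        List<String> fields = new ArrayList<>();
        for (Field f : ClusterRow.class.getDeclaredFields()) {
            fields.add(f.getName() + ":" + f.getType().getSimpleName());
        }
        return fields;
    }

    // Explicit: the schema is known statically, so state it once.
    public static List<String> explicitSchema() {
        return Arrays.asList("id:int", "point:double[]");
    }
}
```

Both paths yield the same schema; the explicit one just skips the reflection machinery and makes the on-disk format visible in the source.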
[jira] [Commented] (SPARK-10226) Error occurred in SparkSQL when using !=
[ https://issues.apache.org/jira/browse/SPARK-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712457#comment-14712457 ] wangwei commented on SPARK-10226: - I tested the case in Spark 1.4 and found that the exclamation mark works, so != is supported in SparkSQL. Error occurred in SparkSQL when using != Key: SPARK-10226 URL: https://issues.apache.org/jira/browse/SPARK-10226 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.0 Reporter: wangwei DataSource: src/main/resources/kv1.txt SQL:
1. create table src(id string, name string);
2. load data local inpath '${SparkHome}/examples/src/main/resources/kv1.txt' into table src;
3. select count( * ) from src where id != '0';
[ERROR] Could not expand event
java.lang.IllegalArgumentException: != 0;: event not found
at jline.console.ConsoleReader.expandEvents(ConsoleReader.java:779)
at jline.console.ConsoleReader.finishBuffer(ConsoleReader.java:631)
at jline.console.ConsoleReader.accept(ConsoleReader.java:2019)
at jline.console.ConsoleReader.readLine(ConsoleReader.java:2666)
at jline.console.ConsoleReader.readLine(ConsoleReader.java:2269)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:231)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:666)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:178)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:118)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
[jira] [Commented] (SPARK-10203) Specify schema during GLMClassificationModel.save to avoid reflection
[ https://issues.apache.org/jira/browse/SPARK-10203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712458#comment-14712458 ] Vinod KC commented on SPARK-10203: -- I'm working on this Specify schema during GLMClassificationModel.save to avoid reflection - Key: SPARK-10203 URL: https://issues.apache.org/jira/browse/SPARK-10203 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Feynman Liang Priority: Minor [GLMClassificationModel.save|https://github.com/apache/spark/blob/3c0156899dc1ec1f7dfe6d7c8af47fa6dc7d00bf/mllib/src/main/scala/org/apache/spark/mllib/classification/impl/GLMClassificationModel.scala#L38] currently infers a schema from a case class when the schema is known and should be manually provided. See parent JIRA for rationale.
[jira] [Commented] (SPARK-10206) Specify schema during IsotonicRegression.save to avoid reflection
[ https://issues.apache.org/jira/browse/SPARK-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712462#comment-14712462 ] Vinod KC commented on SPARK-10206: -- I'm working on this Specify schema during IsotonicRegression.save to avoid reflection - Key: SPARK-10206 URL: https://issues.apache.org/jira/browse/SPARK-10206 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Feynman Liang Priority: Minor [IsotonicRegression.save|https://github.com/apache/spark/blob/f5b028ed2f1ad6de43c8b50ebf480e1b6c047035/mllib/src/main/scala/org/apache/spark/mllib/regression/IsotonicRegression.scala#L184] currently infers a schema from a case class when the schema is known and should be manually provided. See parent JIRA for rationale.
[jira] [Commented] (SPARK-10204) Specify schema during NaiveBayes.save to avoid reflection
[ https://issues.apache.org/jira/browse/SPARK-10204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712459#comment-14712459 ] Vinod KC commented on SPARK-10204: -- I'm working on this Specify schema during NaiveBayes.save to avoid reflection - Key: SPARK-10204 URL: https://issues.apache.org/jira/browse/SPARK-10204 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Feynman Liang Priority: Minor [NaiveBayes.save|https://github.com/apache/spark/blob/f5b028ed2f1ad6de43c8b50ebf480e1b6c047035/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala#L181] currently infers a schema from a case class when the schema is known and should be manually provided. See parent JIRA for rationale.
[jira] [Assigned] (SPARK-10257) Remove Guava dependencies in spark.mllib JavaTests
[ https://issues.apache.org/jira/browse/SPARK-10257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10257: Assignee: Apache Spark Remove Guava dependencies in spark.mllib JavaTests -- Key: SPARK-10257 URL: https://issues.apache.org/jira/browse/SPARK-10257 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Feynman Liang Assignee: Apache Spark Priority: Minor
[jira] [Assigned] (SPARK-10257) Remove Guava dependencies in spark.mllib JavaTests
[ https://issues.apache.org/jira/browse/SPARK-10257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10257: Assignee: (was: Apache Spark) Remove Guava dependencies in spark.mllib JavaTests -- Key: SPARK-10257 URL: https://issues.apache.org/jira/browse/SPARK-10257 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Feynman Liang Priority: Minor
[jira] [Resolved] (SPARK-10243) Update @Since annotation for mllib.tree
[ https://issues.apache.org/jira/browse/SPARK-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-10243. --- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 8442 [https://github.com/apache/spark/pull/8442] Update @Since annotation for mllib.tree --- Key: SPARK-10243 URL: https://issues.apache.org/jira/browse/SPARK-10243 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib Affects Versions: 1.5.0 Reporter: Xiangrui Meng Assignee: Xiangrui Meng Priority: Minor Fix For: 1.5.0
[jira] [Assigned] (SPARK-10104) Consolidate different forms of table identifiers
[ https://issues.apache.org/jira/browse/SPARK-10104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10104: Assignee: (was: Apache Spark) Consolidate different forms of table identifiers Key: SPARK-10104 URL: https://issues.apache.org/jira/browse/SPARK-10104 Project: Spark Issue Type: Improvement Components: SQL Reporter: Yin Huai Right now, we have QualifiedTableName, TableIdentifier, and Seq[String] to represent table identifiers. We should have only one form, and TableIdentifier looks like the best choice because it provides methods to get the table name and database name and to return the unquoted or quoted string. There will be TODOs referencing SPARK-10104; those places need to be updated.
[jira] [Commented] (SPARK-10129) math function: stddev_samp
[ https://issues.apache.org/jira/browse/SPARK-10129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712439#comment-14712439 ] Yanbo Liang commented on SPARK-10129: - I'm working on it. math function: stddev_samp -- Key: SPARK-10129 URL: https://issues.apache.org/jira/browse/SPARK-10129 Project: Spark Issue Type: New Feature Components: SQL Reporter: Davies Liu Use the STDDEV_SAMP function to return the sample standard deviation (the square root of the sample variance). http://www-01.ibm.com/support/knowledgecenter/SSPT3X_3.0.0/com.ibm.swg.im.infosphere.biginsights.bigsql.doc/doc/bsql_stdev_samp.html
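For reference, STDDEV_SAMP is the square root of the sample variance: the sum of squared deviations from the mean, divided by n - 1 (Bessel's correction) rather than n. A minimal Java sketch of the computation, independent of whatever form the Spark SQL implementation takes:

```java
public class StddevSampSketch {
    // Sample standard deviation: sqrt( sum((x - mean)^2) / (n - 1) ).
    public static double stddevSamp(double[] xs) {
        int n = xs.length;
        if (n < 2) {
            throw new IllegalArgumentException("STDDEV_SAMP needs at least two values");
        }
        double mean = 0.0;
        for (double x : xs) mean += x;   // first pass: mean
        mean /= n;
        double ss = 0.0;
        for (double x : xs) ss += (x - mean) * (x - mean);  // second pass: squared deviations
        return Math.sqrt(ss / (n - 1));
    }
}
```

For example, stddevSamp(new double[]{1, 2, 3}) is 1.0: the mean is 2, the squared deviations sum to 2, and 2 / (3 - 1) = 1. (A production aggregate would use a single-pass streaming formulation such as Welford's algorithm instead of two passes.)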
[jira] [Assigned] (SPARK-10255) Remove Guava dependencies in spark.ml.param
[ https://issues.apache.org/jira/browse/SPARK-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10255: Assignee: (was: Apache Spark) Remove Guava dependencies in spark.ml.param --- Key: SPARK-10255 URL: https://issues.apache.org/jira/browse/SPARK-10255 Project: Spark Issue Type: Improvement Components: ML Reporter: Feynman Liang Priority: Minor
[jira] [Assigned] (SPARK-10255) Remove Guava dependencies in spark.ml.param
[ https://issues.apache.org/jira/browse/SPARK-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10255: Assignee: Apache Spark Remove Guava dependencies in spark.ml.param --- Key: SPARK-10255 URL: https://issues.apache.org/jira/browse/SPARK-10255 Project: Spark Issue Type: Improvement Components: ML Reporter: Feynman Liang Assignee: Apache Spark Priority: Minor
[jira] [Created] (SPARK-10256) Remove Guava dependencies in spark.ml.classification
Feynman Liang created SPARK-10256: - Summary: Remove Guava dependencies in spark.ml.classification Key: SPARK-10256 URL: https://issues.apache.org/jira/browse/SPARK-10256 Project: Spark Issue Type: Improvement Components: ML Reporter: Feynman Liang Priority: Minor
[jira] [Commented] (SPARK-10255) Remove Guava dependencies in spark.ml.param
[ https://issues.apache.org/jira/browse/SPARK-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712447#comment-14712447 ] Apache Spark commented on SPARK-10255: -- User 'feynmanliang' has created a pull request for this issue: https://github.com/apache/spark/pull/8446 Remove Guava dependencies in spark.ml.param --- Key: SPARK-10255 URL: https://issues.apache.org/jira/browse/SPARK-10255 Project: Spark Issue Type: Improvement Components: ML Reporter: Feynman Liang Priority: Minor
[jira] [Commented] (SPARK-10205) Specify schema during PowerIterationClustering.save to avoid reflection
[ https://issues.apache.org/jira/browse/SPARK-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712461#comment-14712461 ] Vinod KC commented on SPARK-10205: -- I'm working on this Specify schema during PowerIterationClustering.save to avoid reflection --- Key: SPARK-10205 URL: https://issues.apache.org/jira/browse/SPARK-10205 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Feynman Liang Priority: Minor [PowerIterationClustering.save|https://github.com/apache/spark/blob/f5b028ed2f1ad6de43c8b50ebf480e1b6c047035/mllib/src/main/scala/org/apache/spark/mllib/clustering/PowerIterationClustering.scala#L82] currently infers a schema from a case class when the schema is known and should be manually provided. See parent JIRA for rationale.
[jira] [Commented] (SPARK-10211) Specify schema during MatrixFactorizationModel.save to avoid reflection
[ https://issues.apache.org/jira/browse/SPARK-10211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712466#comment-14712466 ] Vinod KC commented on SPARK-10211: -- I'm working on this Specify schema during MatrixFactorizationModel.save to avoid reflection --- Key: SPARK-10211 URL: https://issues.apache.org/jira/browse/SPARK-10211 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Feynman Liang Priority: Minor [MatrixFactorizationModel.save|https://github.com/apache/spark/blob/f5b028ed2f1ad6de43c8b50ebf480e1b6c047035/mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala#L361] currently infers a schema from a RDD of tuples when the schema is known and should be manually provided. See parent JIRA for rationale.