[GitHub] spark pull request: [SPARK-11884] Drop multiple columns in the Dat...
Github user tedyu commented on the pull request: https://github.com/apache/spark/pull/9862#issuecomment-162783007 I can send out another PR if other people think that variant is needed. This PR has been closed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12170] [SparkR] Deprecate the JAVA-spec...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10189#issuecomment-162782886
**[Test build #47312 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47312/consoleFull)** for PR 10189 at commit [`fa88169`](https://github.com/apache/spark/commit/fa88169fb8f95943c7da1ffbe8e10bb7cee4312a).
 * This patch **fails R style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12170] [SparkR] Deprecate the JAVA-spec...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10189#issuecomment-162782894 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12195] [SQL] Adding BigDecimal, Date an...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/10188#discussion_r46917559

    --- Diff: sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java ---
    @@ -386,6 +389,20 @@ public void testNestedTupleEncoder() {
       }

       @Test
    +  public void testTypeEncoder() {
    --- End diff --

Sure. Thank you! Let me change it now.
[GitHub] spark pull request: [SPARK-12197] [SparkCore] Kryo & Avro - Suppor...
GitHub user RotemShaul opened a pull request: https://github.com/apache/spark/pull/10190

    [SPARK-12197] [SparkCore] Kryo & Avro - Support Schema Repo

    Adding ability for efficient and dynamic generic records serialization

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/RotemShaul/spark AvroSchemaRepo

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10190.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #10190

commit 62b4b6d5789175f4d38c63d68b8c9f7f141ac17b
Author: rotems
Date: 2015-12-01T22:33:01Z

    SPARK-12080: Kryo - Support multiple user registrators

commit 6924a86e18814f5d7afe3188ed224bf894f6a03f
Author: rotems
Date: 2015-12-02T08:27:37Z

    SPARK-12080: Kryo - Support multiple user registrators. Updated configuration documentation.

commit ca00a134eb2e3803fcd1e032be505ef8c0735ed1
Author: rotems
Date: 2015-12-02T09:09:00Z

    SPARK-12080: Kryo - Support multiple user registrators. Changing String to Char

commit 29e519a36b9ed0a664cd71bcc561d088f25ba2c0
Author: rotems
Date: 2015-12-02T09:32:32Z

    SPARK-12080: Kryo - Support multiple user registrators. Improving style

commit 6a4cb9bcd59e5f7f229f908406004e2d859552fd
Author: rotems
Date: 2015-12-03T23:50:02Z

    SPARK-12080: Kryo - Support multiple user registrators. Doc typo

commit 1e9f50fce8beb39ddfd1a07d3e82dd097f095ea4
Author: rotems
Date: 2015-12-08T07:24:02Z

    SPARK-12197: Kryo - Support Avro SchemaRepo

commit 61201416c42980810534d1bdbafb3378a8133ac1
Author: rotems
Date: 2015-12-08T07:42:47Z

    Merge branch 'master' into AvroSchemaRepo
[GitHub] spark pull request: [SPARK-12153][MLlib]add support of arbitrary l...
Github user ygcao commented on a diff in the pull request: https://github.com/apache/spark/pull/10152#discussion_r46923499

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
    @@ -534,8 +588,13 @@ class Word2VecModel private[spark] (
         // Need not divide with the norm of the given vector since it is constant.
         val cosVec = cosineVec.map(_.toDouble)
         var ind = 0
    +    var vecNorm = 1f
    +    if (norm)
    +      vecNorm = blas.snrm2(vectorSize, fVector, 1)
    --- End diff --

Glad to see that another pull request asked for the same normalization. I think my change is more backward compatible, and it leaves users the choice between speed and normalized metrics for further operations.
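The trade-off described in this comment can be illustrated with a small standalone sketch. It is not the code from the pull request: `CosineSketch`, `score` and `normalizeQuery` are hypothetical names, and plain Scala collections stand in for the BLAS call (`snrm2`) used in the diff. It assumes, for illustration only, that each candidate vector is always divided by its own norm, while dividing by the query norm is optional because that factor is the same for every candidate and so does not change the ranking.

```scala
object CosineSketch {
  /** Euclidean (L2) norm of a vector. */
  def norm(v: Array[Float]): Double =
    math.sqrt(v.map(x => x.toDouble * x.toDouble).sum)

  /** Dot product; both vectors must have the same length. */
  def dot(a: Array[Float], b: Array[Float]): Double = {
    require(a.length == b.length, "vectors must have the same length")
    a.zip(b).map { case (x, y) => x.toDouble * y.toDouble }.sum
  }

  /**
   * Score a candidate against a query vector.
   * normalizeQuery = false skips the query norm (hypothetical flag, mirroring
   * the `norm` switch in the diff): the ranking over candidates is unchanged,
   * but the scores are no longer true cosine similarities.
   */
  def score(query: Array[Float], candidate: Array[Float], normalizeQuery: Boolean): Double = {
    val candidateNorm = norm(candidate)
    val queryNorm = if (normalizeQuery) norm(query) else 1.0
    if (candidateNorm == 0.0 || queryNorm == 0.0) 0.0
    else dot(query, candidate) / (candidateNorm * queryNorm)
  }
}
```

With the flag off, scores are scaled by the query norm, so they are only comparable within a single query; with it on, they fall in [-1, 1] and can be compared across queries.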
[GitHub] spark pull request: [SPARK-11605] [MLlib] ML 1.6 QA: API: Java com...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10102#issuecomment-162797184 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-11605] [MLlib] ML 1.6 QA: API: Java com...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10102#issuecomment-162797185 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47316/ Test FAILed.
[GitHub] spark pull request: [SPARK-11605] [MLlib] ML 1.6 QA: API: Java com...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10102#issuecomment-162797178
**[Test build #47316 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47316/consoleFull)** for PR 10102 at commit [`5077aa7`](https://github.com/apache/spark/commit/5077aa7a8cadd7d14c1c1696876e23e0fd501f54).
 * This patch **fails Java style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `public class JavaQuantileDiscretizerExample`
[GitHub] spark pull request: [SPARK-12153][MLlib]add support of arbitrary l...
Github user ygcao commented on a diff in the pull request: https://github.com/apache/spark/pull/10152#discussion_r46921737

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
    @@ -469,7 +469,32 @@ class Word2VecModel private[spark] (
         this(Word2VecModel.buildWordIndex(model), Word2VecModel.buildWordVectors(model))
       }

    -  private def cosineSimilarity(v1: Array[Float], v2: Array[Float]): Double = {
    +  /**
    +   * get cosineSimilarity of two word. assumed to be from the vocabulary and used after model built, otherwise will error out
    +   * @param v1 one word from the vocabulary
    +   * @param v2 the other word from the vocabulary
    +   * @return the cosinesimilarity score in the vector space of the given two words
    +   */
    +  def cosineSimilarity(v1: String, v2: String): Double = {
    +    return cosineSimilarity(getVectors(v1), getVectors(v2))
    +  }
    +
    +  /**
    +   * get the built vocabulary from the input
    +   * @return a map of word to its index
    +   */
    +  def getVocabulary:Map[String,Int]={
    --- End diff --

getVocabulary and getwordvectors are useful when you need to join or iterate over the built vectors in batch, while getVectors is only useful for looking up the vector of one specific known word, and it throws an exception when the word is out of vocabulary. The usages are very different. I will look into the DataFrame version to see whether it covers the batch use case, and then decide whether we can work around this without adding a getter here.
[GitHub] spark pull request: [SPARK-12170] [SparkR] Deprecate the JAVA-spec...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10189#issuecomment-162791608 **[Test build #47314 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47314/consoleFull)** for PR 10189 at commit [`588acaf`](https://github.com/apache/spark/commit/588acafb786f46e22309cb55020d3de5d94f8453).
[GitHub] spark pull request: [MESOS]Add documentation about submitting Spar...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10086#issuecomment-162801479 **[Test build #47318 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47318/consoleFull)** for PR 10086 at commit [`952f116`](https://github.com/apache/spark/commit/952f1163254eddf1a3dd0b36761a7f7f0978964f).
[GitHub] spark pull request: Branch 1.5
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/10098
[GitHub] spark pull request: [SPARK-10259] [ML] Add @since annotation to ml...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/8534#issuecomment-162803003 Merged into master and branch-1.6. Thanks!
[GitHub] spark pull request: [SPARK-12197] [SparkCore] Kryo & Avro - Suppor...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10190#issuecomment-162802941 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-12146] [SparkR] SparkR jsonFile should ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10145#issuecomment-162793797 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12146] [SparkR] SparkR jsonFile should ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10145#issuecomment-162793798 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47313/ Test PASSed.
[GitHub] spark pull request: [SPARK-12146] [SparkR] SparkR jsonFile should ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10145#issuecomment-162793714
**[Test build #47313 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47313/consoleFull)** for PR 10145 at commit [`06ae53d`](https://github.com/apache/spark/commit/06ae53dfb7db7f1276d4ccf16160e85b285c3864).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12170] [SparkR] Deprecate the JAVA-spec...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10189#issuecomment-162796584 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47314/ Test PASSed.
[GitHub] spark pull request: [SPARK-12170] [SparkR] Deprecate the JAVA-spec...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10189#issuecomment-162796525
**[Test build #47314 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47314/consoleFull)** for PR 10189 at commit [`588acaf`](https://github.com/apache/spark/commit/588acafb786f46e22309cb55020d3de5d94f8453).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11605] [MLlib] ML 1.6 QA: API: Java com...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10102#issuecomment-162796594 **[Test build #47316 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47316/consoleFull)** for PR 10102 at commit [`5077aa7`](https://github.com/apache/spark/commit/5077aa7a8cadd7d14c1c1696876e23e0fd501f54).
[GitHub] spark pull request: [SPARK-11619][SQL] cannot use UDTF in DataFram...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/9981#discussion_r46920864

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala ---
    @@ -73,6 +73,10 @@ class JsonFunctionsSuite extends QueryTest with SharedSQLContext {
         checkAnswer(
           df.select($"key", functions.json_tuple($"jstring", "f1", "f2", "f3", "f4", "f5")),
           expected)
    +
    +    checkAnswer(
    +      df.selectExpr("key", "json_tuple(jstring, 'f1', 'f2', 'f3', 'f4', 'f5')"),
    --- End diff --

@cloud-fan Thanks a lot for trying this in Hive. Wenchen, I searched for "lateral view" in the Spark code and didn't find a test case. I wanted to debug it to study it further. Also, Wenchen, I made a change in the elementTypes computation, as follows:

    override def elementTypes: Seq[(DataType, Boolean, String)] = fieldExpressions.zipWithIndex.map {
      case (l @ Literal(value, _), idx) if value.toString() != "null" => (StringType, true, value.toString())
      case (_, idx) => (StringType, true, s"c$idx")
    }

I can now see the alias names correctly. I am not sure whether this is the right change, however. Do you have any thoughts? Thank you.
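The naming rule proposed in this comment (use the literal's text as the output column name, otherwise fall back to a positional `c<index>` name) can be tried out in isolation. The sketch below is only an illustration under stated assumptions: `AliasSketch`, `Expr`, `Lit`, `Ref` and `outputNames` are hypothetical stand-ins, not Catalyst classes, and the guard checks for a `null` value directly rather than comparing `toString()` with `"null"` as the snippet above does.

```scala
object AliasSketch {
  // Hypothetical stand-ins for expression nodes; not Spark's Catalyst types.
  sealed trait Expr
  final case class Lit(value: Any) extends Expr   // a literal field-name argument
  final case class Ref(name: String) extends Expr // any non-literal expression

  /** Output column names: the literal's text when available, else c<index>. */
  def outputNames(fields: Seq[Expr]): Seq[String] =
    fields.zipWithIndex.map {
      case (Lit(value), _) if value != null => value.toString
      case (_, idx) => s"c$idx"
    }
}
```

Under this rule `outputNames(Seq(Lit("f1"), Ref("x")))` names the first column after the requested field and the second one positionally, which differs from the Hive behaviour reported elsewhere in the thread, where a bare `json_tuple` call yields `c0`.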
[GitHub] spark pull request: [SPARK-12170] [SparkR] Deprecate the JAVA-spec...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10189#issuecomment-162796583 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11551][DOC][Example]Replace example cod...
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/10002#issuecomment-162799563 @mengxr Yes, I'll submit a follow-up pr to refine it.
[GitHub] spark pull request: [SPARK-5682][Core] Add encrypted shuffle in sp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-162799727 **[Test build #47317 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47317/consoleFull)** for PR 8880 at commit [`0e847a6`](https://github.com/apache/spark/commit/0e847a66ef7ec51bec6ae540645e2456b257f7f5).
[GitHub] spark pull request: [SPARK-12195] [SQL] Adding BigDecimal, Date an...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10188#issuecomment-162799660
**[Test build #47310 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47310/consoleFull)** for PR 10188 at commit [`968392b`](https://github.com/apache/spark/commit/968392be6be2e29aa3208938aa202bee8ed88c81).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12196][Core]Store blocks in storage dev...
GitHub user yucai opened a pull request: https://github.com/apache/spark/pull/10192

    [SPARK-12196][Core]Store blocks in storage devices with hierarchy way

    https://issues.apache.org/jira/browse/SPARK-12196

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yucai/spark hierarchy_store

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10192.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #10192

commit 2f478ef559736dc5b1fb0964615c514b2b43a510
Author: yucai
Date: 2015-12-08T00:53:04Z

    Store blocks in storage devices with hierarchy way
[GitHub] spark pull request: [SPARK-11619][SQL] cannot use UDTF in DataFram...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/9981#discussion_r46919957

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala ---
    @@ -73,6 +73,10 @@ class JsonFunctionsSuite extends QueryTest with SharedSQLContext {
         checkAnswer(
           df.select($"key", functions.json_tuple($"jstring", "f1", "f2", "f3", "f4", "f5")),
           expected)
    +
    +    checkAnswer(
    +      df.selectExpr("key", "json_tuple(jstring, 'f1', 'f2', 'f3', 'f4', 'f5')"),
    --- End diff --

I checked with hive, `select json_tuple('{"a":1}', 'a');`, the output column is `c0`, which is different from when the UDTF is in lateral view.
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/9057#issuecomment-162797991 @mengxr @selvinsource As we discussed, I don't think PMML has good support for multinomial naive Bayes, because we cannot fit a multinomial naive Bayes model into PMML and still get correct prediction results. I plan to remove the support for multinomial NB here and throw an `IllegalArgumentException`.
[GitHub] spark pull request: [MESOS]Add documentation about submitting Spar...
Github user tnachen commented on the pull request: https://github.com/apache/spark/pull/10086#issuecomment-162799550 @dragos just updated the docs
[GitHub] spark pull request: [SPARK-12153][MLlib]add support of arbitrary l...
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/10152#issuecomment-162800260

@jkbradley as far as I can see:

The input to `fit` is `dataset: RDD[Iterable[String]]`. However, in [L273](https://github.com/apache/spark/pull/10152/files#diff-88f4b62c382b26ef8e856b23f5167ccdL273), we create `val words = dataset.flatMap(x => x)`, flattening out the RDD of sentences to an RDD of words, i.e. `RDD[Iterable[String]] -> RDD[String]`.

Then in [L285](https://github.com/apache/spark/pull/10152/files#diff-88f4b62c382b26ef8e856b23f5167ccdL285), `words` is used in `mapPartitions` (as opposed to `dataset`). In the `mapPartitions` block, the behaviour of `next` is to advance through the iterator (which is now the flatMapped stream of words, *not* a stream of sentences), and have a hard cut off to create each sentence `Array[Int]` after `MAX_SENTENCE_LENGTH` is reached.

So the way I read the current code, it indeed does just treat the input as a stream of words, discards sentence boundaries, and uses a hard 1000 word limit on "sentences". Let me know if I missed something here.

This is in fact matching what the Google impl does (from my quick look through [the C code](http://word2vec.googlecode.com/svn/trunk/word2vec.c), e.g. L373-405 and L70 in `ReadWord`). So purely technically the current code is "correct" as it matches the original, but I'm not sure if it was intentional or not to use `words` in the `mapPartitions` block.

But to me it's an open question whether this approach, or keeping sentence structure / boundaries, is better.
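The two behaviours contrasted in this comment can be sketched with plain Scala collections. This is a simplified illustration, not the Spark implementation: `SentenceChunkSketch`, `hardCut` and `keepBoundaries` are hypothetical names, ordinary sequences of strings stand in for the RDD and the `Array[Int]` word indices, and the default limit of 1000 is taken from the comment above.

```scala
object SentenceChunkSketch {
  val MaxSentenceLength = 1000

  /**
   * Behaviour as read in the comment: flatten all sentences into one word
   * stream, then cut a new "sentence" every maxLen words, ignoring the
   * original sentence boundaries.
   */
  def hardCut(sentences: Seq[Seq[String]], maxLen: Int = MaxSentenceLength): Seq[Seq[String]] =
    sentences.flatten.grouped(maxLen).toList

  /**
   * Alternative raised as an open question: keep sentence boundaries and
   * split only the sentences that are longer than maxLen.
   */
  def keepBoundaries(sentences: Seq[Seq[String]], maxLen: Int = MaxSentenceLength): Seq[Seq[String]] =
    sentences.flatMap(sentence => sentence.grouped(maxLen).toList)
}
```

With a small limit the difference is easy to see: for the input `Seq(Seq("a", "b", "c"), Seq("d"))` and `maxLen = 2`, `hardCut` pairs `"c"` with `"d"` across the sentence boundary, while `keepBoundaries` never mixes words from different sentences.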
[GitHub] spark pull request: [SPARK-10259] [ML] Add @since annotation to ml...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/8534
[GitHub] spark pull request: [SPARK-12198] [SparkR] SparkR support read.par...
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/10191

    [SPARK-12198] [SparkR] SparkR support read.parquet and deprecate parquetFile

    SparkR support read.parquet and deprecate parquetFile.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yanboliang/spark spark-12198

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10191.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #10191

commit e507c31385730af2de663497d88df42c33f6ac46
Author: Yanbo Liang
Date: 2015-12-08T07:42:36Z

    SparkR support read.parquet and deprecate parquetFile
[GitHub] spark pull request: [SPARK-11884] Drop multiple columns in the Dat...
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/9862#issuecomment-162781596 There are two drop variants for a single column:
```
def drop(colName: String)
def drop(col: Column)
```
But there is only one drop variant that accepts multiple column names. Why is there no version that accepts multiple Columns?
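To make the question about the missing overload concrete, here is a minimal sketch of what a multi-Column `drop` could look like next to the existing variants. It is not Spark code: `Col` and `Frame` are toy stand-ins introduced only for illustration, and the `drop(col: Col, cols: Col*)` signature is an assumption about one possible shape of the variant, not something the PR defines.

```scala
object DropVariantsSketch {
  // Toy stand-ins, NOT Spark classes: a column is just a name,
  // and a frame is just an ordered list of column names.
  final case class Col(name: String)

  final case class Frame(columns: Seq[String]) {
    // The two single-column variants quoted in the comment.
    def drop(colName: String): Frame = Frame(columns.filterNot(_ == colName))
    def drop(col: Col): Frame = drop(col.name)

    // The multi-name variant: fold the single-name drop over all names.
    def drop(colNames: String*): Frame =
      colNames.foldLeft(this)((frame, name) => frame.drop(name))

    // Hypothetical multi-Column variant the comment asks about.
    // Requiring a first Col keeps it distinct from the String* overload.
    def drop(col: Col, cols: Col*): Frame =
      (col +: cols).foldLeft(this)((frame, c) => frame.drop(c))
  }
}
```

With this shape, `Frame(Seq("a", "b", "c")).drop(Col("a"), Col("c"))` leaves only `b`, mirroring what the multi-name variant does for strings.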
[GitHub] spark pull request: [SPARK-12146] [SparkR] SparkR jsonFile should ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10145#issuecomment-162787774 **[Test build #47313 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47313/consoleFull)** for PR 10145 at commit [`06ae53d`](https://github.com/apache/spark/commit/06ae53dfb7db7f1276d4ccf16160e85b285c3864).
[GitHub] spark pull request: [SPARK-12170] [SparkR] Deprecate the JAVA-spec...
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/10189#issuecomment-162787574 @sun-rui Got it, thanks for the kind reminder.
[GitHub] spark pull request: [SPARK-11619][SQL] cannot use UDTF in DataFram...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/9981#discussion_r46918098
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala ---
@@ -73,6 +73,10 @@ class JsonFunctionsSuite extends QueryTest with SharedSQLContext {
     checkAnswer(
       df.select($"key", functions.json_tuple($"jstring", "f1", "f2", "f3", "f4", "f5")),
       expected)
+
+    checkAnswer(
+      df.selectExpr("key", "json_tuple(jstring, 'f1', 'f2', 'f3', 'f4', 'f5')"),
--- End diff --
@yhuai Hello Yin, I debugged the code a little and am trying to understand it. In the json_tuple function in jsonExpression.scala, we compute the elementTypes as follows:
    override def elementTypes: Seq[(DataType, Boolean, String)] = fieldExpressions.zipWithIndex.map { case (_, idx) => (StringType, true, s"c$idx") }
This name is then used when building the generator output in makeGeneratorOutput() in Analyzer. Do you think we should change this?
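The naming behaviour described in the comment above can be reproduced without Spark. The sketch below keeps only the `zipWithIndex` step from the quoted `elementTypes` definition. The object name and the use of plain strings in place of Catalyst expressions are assumptions made for illustration.

```scala
object JsonTupleNamesSketch {
  // Plain strings standing in for the field-name expressions passed to json_tuple.
  val fieldExpressions: Seq[String] = Seq("f1", "f2", "f3", "f4", "f5")

  // Same shape as the quoted code: the requested field is ignored and
  // each output column is named after its position.
  val elementNames: Seq[String] =
    fieldExpressions.zipWithIndex.map { case (_, idx) => s"c$idx" }
}
```

Under this logic the output columns come out as `c0` through `c4` regardless of which JSON fields were requested, which is the naming the comment asks about.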
[GitHub] spark pull request: [SPARK-12195] [SQL] Adding BigDecimal, Date an...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10188#issuecomment-162780584 **[Test build #47311 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47311/consoleFull)** for PR 10188 at commit [`8c51de4`](https://github.com/apache/spark/commit/8c51de4bb64c064c03058096413e5365f4dfed3a).
[GitHub] spark pull request: [SPARK-12170] [SparkR] Deprecate the JAVA-spec...
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/10189#issuecomment-162780940 @yanboliang, thanks for the PR. But for the SparkR RDD API, how to evolve it is still under discussion. So maybe we need more discussion before submitting PRs for all RDD-related JIRA issues (those JIRAs are for tracking purposes for now). @shivaram, @felixcheung.
[GitHub] spark pull request: [SPARK-12170] [SparkR] Deprecate the JAVA-spec...
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/10189 [SPARK-12170] [SparkR] Deprecate the JAVA-specific deserialized storage levels Deprecate the JAVA-specific deserialized storage levels. You can merge this pull request into a Git repository by running: $ git pull https://github.com/yanboliang/spark spark-12170 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10189.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10189 commit fa88169fb8f95943c7da1ffbe8e10bb7cee4312a Author: Yanbo Liang Date: 2015-12-08T06:00:18Z Deprecate the JAVA-specific deserialized storage levels
[GitHub] spark pull request: [SPARK-12159] [ML] Add user guide section for ...
Github user BenFradet commented on the pull request: https://github.com/apache/spark/pull/10166#issuecomment-162795582 @jkbradley Thanks for reviewing, will take those comments into account.
[GitHub] spark pull request: [SPARK-11551][DOC][Example]Replace example cod...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/10002#discussion_r46921864
--- Diff: docs/ml-features.md ---
@@ -794,39 +411,7 @@ dctDf.select("featuresDCT").show(3)
 Refer to the [DCT Java docs](api/java/org/apache/spark/ml/feature/DCT.html) for more details on the API.
 
-{% highlight java %}
-import java.util.Arrays;
-
-import org.apache.spark.api.java.JavaRDD;
-import org.apache.spark.api.java.JavaSparkContext;
-import org.apache.spark.ml.feature.DCT;
-import org.apache.spark.mllib.linalg.Vector;
-import org.apache.spark.mllib.linalg.VectorUDT;
-import org.apache.spark.mllib.linalg.Vectors;
-import org.apache.spark.sql.DataFrame;
-import org.apache.spark.sql.Row;
-import org.apache.spark.sql.RowFactory;
-import org.apache.spark.sql.SQLContext;
-import org.apache.spark.sql.types.Metadata;
-import org.apache.spark.sql.types.StructField;
-import org.apache.spark.sql.types.StructType;
-
-JavaRDD<Row> data = jsc.parallelize(Arrays.asList(
-  RowFactory.create(Vectors.dense(0.0, 1.0, -2.0, 3.0)),
-  RowFactory.create(Vectors.dense(-1.0, 2.0, 4.0, -7.0)),
-  RowFactory.create(Vectors.dense(14.0, -2.0, -5.0, 1.0))
-));
-StructType schema = new StructType(new StructField[] {
-  new StructField("features", new VectorUDT(), false, Metadata.empty()),
-});
-DataFrame df = jsql.createDataFrame(data, schema);
-DCT dct = new DCT()
-  .setInputCol("features")
-  .setOutputCol("featuresDCT")
-  .setInverse(false);
-DataFrame dctDf = dct.transform(df);
-dctDf.select("featuresDCT").show(3);
-{% endhighlight %}
+{% include_example java/org/apache/spark/examples/ml/JavaDCTExample.java %}}
--- End diff --
minor: remove the extra `}` at the end. You can submit a follow-up PR to fix it.
[GitHub] spark pull request: [SPARK-5682][Core] Add encrypted shuffle in sp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-162802294 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47317/ Test FAILed.
[GitHub] spark pull request: [SPARK-12153][MLlib]add support of arbitrary l...
Github user ygcao commented on a diff in the pull request: https://github.com/apache/spark/pull/10152#discussion_r46922797
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -281,16 +280,17 @@ class Word2Vec extends Serializable with Logging {
     val expTable = sc.broadcast(createExpTable())
     val bcVocab = sc.broadcast(vocab)
     val bcVocabHash = sc.broadcast(vocabHash)
-
-    val sentences: RDD[Array[Int]] = words.mapPartitions { iter =>
+    //each partition is a collection of sentences, will be translated into arrays of Index integer
+    val sentences: RDD[Array[Int]] = dataset.mapPartitions { iter =>
       new Iterator[Array[Int]] {
         def hasNext: Boolean = iter.hasNext
 
         def next(): Array[Int] = {
           val sentence = ArrayBuilder.make[Int]
           var sentenceLength = 0
-          while (iter.hasNext && sentenceLength < MAX_SENTENCE_LENGTH) {
-            val word = bcVocabHash.value.get(iter.next())
+          //do translation of each word into its index in the vocabulary, not constraint by fixed length anymore
+          for (wd <- iter.next()) {
--- End diff --
I think my change is better understood now. The problem is not so much having a maxSentenceLength. The problem with the original version is that it discards natural sentences: it flatMaps the input and then cuts the document into chunks of maxSentenceLength (except for the last one, which can be shorter), so no sentence boundaries are respected. This change is designed to use the sentence structure of the input. As for the max sentence length limit, I don't really see a need for it. The worst case is that the entire document is one sentence, and it will still work, since each document is small enough to fit in memory and the skip-gram model is built around a size-constrained sliding window rather than around the sentence or document. Respecting sentences keeps the sliding window from taking context words for a target word from adjacent sentences, which could be irrelevant. Agreed, people may use word2vec in very different scenarios, so we can offer both behaviours as a choice. I'll make it an option rather than argue about which one is absolutely correct, although I don't see a good justification for using fixed-size chunks instead of the input's structure.
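The point about the sliding window crossing sentence boundaries can be illustrated with a small, Spark-free sketch. It assumes a fixed symmetric window (real word2vec implementations typically vary the effective window), and `contextPairs` and the sample sentences are hypothetical names and data used only for this illustration.

```scala
object ContextWindowSketch {
  // All (target, context) pairs whose positions are at most `window` apart
  // within one word sequence.
  def contextPairs(words: Seq[String], window: Int): Seq[(String, String)] =
    for {
      i <- words.indices
      j <- math.max(0, i - window) to math.min(words.length - 1, i + window)
      if j != i
    } yield (words(i), words(j))

  // Two unrelated "sentences" (hypothetical data).
  val sentences: Seq[Seq[String]] = Seq(Seq("a", "b"), Seq("x", "y"))

  // Boundary-respecting: pairs are generated separately for each sentence.
  val perSentence: Seq[(String, String)] =
    sentences.flatMap(s => contextPairs(s, 1))

  // Flattened: the window slides over the concatenated word stream, so the
  // last word of one sentence and the first word of the next become context
  // for each other.
  val flattened: Seq[(String, String)] =
    contextPairs(sentences.flatten, 1)
}
```

Here `flattened` contains the cross-sentence pairs `("b", "x")` and `("x", "b")`, while `perSentence` does not, which is the effect the comment wants to avoid.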
[GitHub] spark pull request: [SPARK-10647][MESOS] Fix zookeeper dir with me...
Github user tnachen commented on a diff in the pull request: https://github.com/apache/spark/pull/10057#discussion_r46922802
--- Diff: core/src/main/scala/org/apache/spark/deploy/mesos/MesosClusterDispatcher.scala ---
@@ -50,7 +50,7 @@ private[mesos] class MesosClusterDispatcher(
   extends Logging {
 
   private val publicAddress = Option(conf.getenv("SPARK_PUBLIC_DNS")).getOrElse(args.host)
-  private val recoveryMode = conf.get("spark.mesos.deploy.recoveryMode", "NONE").toUpperCase()
+  private val recoveryMode = conf.get("spark.deploy.recoveryMode", "NONE").toUpperCase()
--- End diff --
I'm still wondering about this as well. I think keeping some backward compatibility makes sense, at least for one version. Let me add a warning message when the old property is set, and use its value in that case as well.
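One way to read the backward-compatibility plan in the comment above is sketched below. It uses a plain `Map[String, String]` in place of `SparkConf`, and the `resolve` helper, the warning text, and the choice to let the new key win when both are set are all assumptions, not what the PR necessarily implements.

```scala
object RecoveryModeSketch {
  val NewKey = "spark.deploy.recoveryMode"
  val OldKey = "spark.mesos.deploy.recoveryMode"

  // Returns the resolved recovery mode and, if the old key is present,
  // a warning message the caller could log.
  def resolve(conf: Map[String, String]): (String, Option[String]) = {
    val warning =
      conf.get(OldKey).map(_ => s"$OldKey is deprecated, please use $NewKey instead")
    // Assumption: prefer the new key, fall back to the old one, default to NONE.
    val mode = conf.get(NewKey).orElse(conf.get(OldKey)).getOrElse("NONE").toUpperCase()
    (mode, warning)
  }
}
```

For example, `RecoveryModeSketch.resolve(Map("spark.mesos.deploy.recoveryMode" -> "zookeeper"))` yields the mode `ZOOKEEPER` together with a deprecation warning, so existing configurations keep working for a release.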
[GitHub] spark pull request: [SPARK-5682][Core] Add encrypted shuffle in sp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-162802293 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-5682][Core] Add encrypted shuffle in sp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-162802287 **[Test build #47317 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47317/consoleFull)** for PR 8880 at commit [`0e847a6`](https://github.com/apache/spark/commit/0e847a66ef7ec51bec6ae540645e2456b257f7f5). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11958] [SPARK-11957] [ML] [Doc] SQLTran...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/10006
[GitHub] spark pull request: [SPARK-11958] [SPARK-11957] [ML] [Doc] SQLTran...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/10006#issuecomment-162804040 Merged into master and branch-1.6. Thanks!
[GitHub] spark pull request: [SPARK-12170] [SparkR] Deprecate the JAVA-spec...
Github user felixcheung commented on the pull request: https://github.com/apache/spark/pull/10189#issuecomment-162790306 Yes, in fact we have been instructed not to work on the Spark 2.0 release yet.
[GitHub] spark pull request: [SPARK-11593][SQL] Replace catalyst converter ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9565#issuecomment-162795966 **[Test build #47315 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47315/consoleFull)** for PR 9565 at commit [`693a6fe`](https://github.com/apache/spark/commit/693a6fed9fdb8002d4e099bd106ec1b44dbdc86d).
[GitHub] spark pull request: [SPARK-11551][DOC][Example]Replace example cod...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/10002
[GitHub] spark pull request: [SPARK-12195] [SQL] Adding BigDecimal, Date an...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10188#issuecomment-162799792 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47310/ Test PASSed.
[GitHub] spark pull request: [SPARK-11551][DOC][Example]Replace example cod...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/10002#issuecomment-162799717 Merged into master and branch-1.6. Thanks!
[GitHub] spark pull request: [SPARK-12195] [SQL] Adding BigDecimal, Date an...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10188#issuecomment-162799790 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12160][MLLIB] Use SQLContext.getOrCreat...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/10183#issuecomment-162801697 LGTM. Merged into branch-1.5. Thanks!
[GitHub] spark pull request: [SPARK-11593][SQL] Replace catalyst converter ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9565#issuecomment-162801792 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47315/ Test FAILed.
[GitHub] spark pull request: [SPARK-11593][SQL] Replace catalyst converter ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9565#issuecomment-162801753 **[Test build #47315 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47315/consoleFull)** for PR 9565 at commit [`693a6fe`](https://github.com/apache/spark/commit/693a6fed9fdb8002d4e099bd106ec1b44dbdc86d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11593][SQL] Replace catalyst converter ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9565#issuecomment-162801790 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-11395][SPARKR] Support over and window ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10094#issuecomment-162781291 **[Test build #47303 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47303/consoleFull)** for PR 10094 at commit [`ef08092`](https://github.com/apache/spark/commit/ef08092f3e49fe95cf13412a6bbf1d9404d4becf). * This patch passes all tests. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11395][SPARKR] Support over and window ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10094#issuecomment-162781378 Build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11395][SPARKR] Support over and window ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10094#issuecomment-162781379 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47303/ Test PASSed.
[GitHub] spark pull request: [SPARK-12170] [SparkR] Deprecate the JAVA-spec...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10189#issuecomment-162782245 **[Test build #47312 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47312/consoleFull)** for PR 10189 at commit [`fa88169`](https://github.com/apache/spark/commit/fa88169fb8f95943c7da1ffbe8e10bb7cee4312a).
[GitHub] spark pull request: [SPARK-12170] [SparkR] Deprecate the JAVA-spec...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10189#issuecomment-162782896 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47312/ Test FAILed.
[GitHub] spark pull request: [SPARK-12195] [SQL] Adding BigDecimal, Date an...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10188#issuecomment-162802436 **[Test build #47311 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47311/consoleFull)** for PR 10188 at commit [`8c51de4`](https://github.com/apache/spark/commit/8c51de4bb64c064c03058096413e5365f4dfed3a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12195] [SQL] Adding BigDecimal, Date an...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10188#issuecomment-162802522 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47311/ Test PASSed.
[GitHub] spark pull request: [SPARK-1537] [YARN] [WiP] Add history provider...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5423#issuecomment-162537898 **[Test build #47267 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47267/consoleFull)** for PR 5423 at commit [`6dac1bb`](https://github.com/apache/spark/commit/6dac1bb1d48b89a0bab9facfba95e73061f0f2a3).
[GitHub] spark pull request: [SPARK-11315] [YARN] WiP Add YARN extension se...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8744#issuecomment-162540970 **[Test build #47268 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47268/consoleFull)** for PR 8744 at commit [`c2add7b`](https://github.com/apache/spark/commit/c2add7bbda0f79dc04c843854f01de48ab7c100a).
[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/10089#issuecomment-162562042 LGTM
[GitHub] spark pull request: [SPARK-12153][MLlib]add support of arbitrary l...
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/10152#issuecomment-162526464
Pinging @mengxr @MechCoder @jkbradley (and I think @Ishiihara was the original author of Word2Vec?)
So let's focus this PR on making the max sentence size configurable, if this is desirable?
Looking a bit deeper, the sentence structure of the input is essentially discarded in https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala#L273. This dates back to the original implementation, and it does match the original Google implementation, which treats end-of-line as a word boundary and then imposes a `MAX_SENTENCE_LENGTH` of 1000 when processing the word stream.
It's interesting to note that e.g. Gensim's implementation respects the sentence structure of the input data (https://github.com/piskvorky/gensim/blob/develop/gensim/models/word2vec.py#L120). Deeplearning4j seems to do the same.
Thinking about it, it does seem a little strange to me to discard sentence boundaries. It does make sense for very large text corpora. But Word2Vec is more general than that, and can be applied e.g. in recommendation settings, where the boundary between "sentences" (say, each a "user activity history") is more patently "discontinuous".
Thoughts? On the face of it we can leave the implementation as is (as it is true to the original), optionally making the max sentence length a configurable param. Or we can look at using the "sentence" structure of the input data (perhaps making the behaviour configurable between this and the original impl).
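The two behaviours discussed in this comment can be contrasted with a small sketch. It uses plain Scala collections rather than Spark RDDs, and the names `SentenceChunking`, `flattenAndChunk` and `chunkPerSentence` are hypothetical, introduced only for illustration; this is not the code in `Word2Vec.scala`.

```scala
object SentenceChunking {
  // Original-style behaviour (as described above): ignore the input's sentence
  // boundaries and cut the flattened word stream every maxLen words.
  def flattenAndChunk(sentences: Seq[Seq[String]], maxLen: Int): List[Seq[String]] =
    sentences.flatten.grouped(maxLen).toList

  // Boundary-respecting behaviour: each input sentence is split on its own,
  // so words from different sentences never share a chunk.
  def chunkPerSentence(sentences: Seq[Seq[String]], maxLen: Int): List[Seq[String]] =
    sentences.toList.flatMap(_.grouped(maxLen))
}
```

With two input "sentences" `Seq("a", "b", "c")` and `Seq("d")` and a maximum length of 2, the first variant puts `c` and `d` into the same chunk, while the second keeps them apart. In a recommendation setting that would be the difference between mixing two users' histories and keeping them separate.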
[GitHub] spark pull request: [SPARK-7727] [SQL] Avoid inner classes in Rule...
GitHub user stephankessler opened a pull request: https://github.com/apache/spark/pull/10174
[SPARK-7727] [SQL] Avoid inner classes in RuleExecutor
Moved (case) classes Strategy, Once, FixedPoint and Batch to the companion object. This is necessary if we want the Optimizer to be easily extendable in the following sense: usually a user wants to add additional rules and just reuse the ones that are already there. However, inner classes made that impossible, since the code did not compile. This change allows easy extension of existing Optimizers; see the DefaultOptimizerExtendableSuite for a corresponding test case.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/stephankessler/spark SPARK-7727
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10174.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10174
commit c48fd5f9efa970c856caf0fb52062597450965b0
Author: Stephan Kessler
Date: 2015-12-07T07:38:31Z
* SPARK-7727 Avoid inner classes in RuleExecutor. Moved (case) classes Strategy, Once, FixedPoint and Batch to the companion object. This allows easy extension of existing Optimizers; see the DefaultOptimizerExtendableSuite for a corresponding test case.
commit fb4988ab16b04753c8e8272de7227a325d22fc4c
Author: Stephan Kessler
Date: 2015-12-07T13:56:25Z
Mod. SQLContext to access changed class locations
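A minimal sketch of why the move to the companion object helps, under the assumption that the compile problem comes from Scala's path-dependent types. All names here (`CompanionSketch`, `InnerExecutor`, `OuterExecutor`, `DefaultOptimizer`, `ExtendedOptimizer`) are hypothetical, heavily simplified stand-ins and not Spark's actual `RuleExecutor` or `Optimizer`.

```scala
object CompanionSketch {
  // Before: Batch is an inner class, so its type depends on the enclosing instance.
  // For two executors a and b, a.Batch and b.Batch are different types, and
  // a.batches cannot be used where a Seq[b.Batch] is expected.
  abstract class InnerExecutor {
    case class Batch(name: String, rules: Seq[String])
    def batches: Seq[Batch]
  }

  // After: Batch lives in the companion object and is one type shared by all executors.
  object OuterExecutor {
    case class Batch(name: String, rules: Seq[String])
  }
  abstract class OuterExecutor {
    def batches: Seq[OuterExecutor.Batch]
  }

  object DefaultOptimizer extends OuterExecutor {
    val batches: Seq[OuterExecutor.Batch] =
      Seq(OuterExecutor.Batch("Defaults", Seq("ruleA")))
  }

  object ExtendedOptimizer extends OuterExecutor {
    // Reusing another executor's batches type-checks because Batch is not path-dependent.
    val batches: Seq[OuterExecutor.Batch] =
      DefaultOptimizer.batches :+ OuterExecutor.Batch("Extra", Seq("ruleB"))
  }
}
```

The extended optimizer simply appends its own batch to the default ones, which is the kind of reuse the pull request description asks for.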
[GitHub] spark pull request: [SPARK-10299][ML] word2vec should allow users ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/8513#issuecomment-162527770 Feel free to do whatever.
[GitHub] spark pull request: [SPARK-11155][Web UI] Stage summary json shoul...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/10107#discussion_r46826348
--- Diff: core/src/test/scala/org/apache/spark/status/api/v1/AllStagesResourceSuite.scala ---
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.status.api.v1
+
+import java.util.Date
+
+import scala.collection.mutable.HashMap
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.scheduler.{StageInfo, TaskInfo, TaskLocality}
+import org.apache.spark.ui.jobs.UIData.{StageUIData, TaskUIData}
+
+
+class AllStagesResourceSuite extends SparkFunSuite {
+
+  def getFirstTaskLaunchTime(taskLaunchTimes: Seq[Long]): Option[Date] = {
+    val tasks = taskLaunchTimes.zipWithIndex.map { case (time, idx) =>
+      idx.toLong -> new TaskUIData(
+        new TaskInfo(idx, idx, 1, time, "", "", TaskLocality.ANY, false), None, None)
+    }.toMap
+    val stageUiData = new StageUIData()
+    stageUiData.taskData = mapToHashmap(tasks)
--- End diff --
Sorry, I hadn't realized that it needed to be a HashMap; I thought any map would do. This is really super minor, but rather than having a helper method, I'd just build the HashMap in the first place, the same way you are doing now:
```scala
val tasks = new HashMap[Long, TaskUIData]
taskLaunchTimes.zipWithIndex.foreach { case (time, idx) =>
  tasks(idx.toLong) = new TaskUIData(
    new TaskInfo(idx, idx, 1, time, "", "", TaskLocality.ANY, false), None, None)
}
```
[GitHub] spark pull request: [SPARK-11593][SQL] Replace catalyst converter ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9565#issuecomment-162552952 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47269/ Test FAILed.
[GitHub] spark pull request: [SPARK-11593][SQL] Replace catalyst converter ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9565#issuecomment-162552814 **[Test build #47269 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47269/consoleFull)** for PR 9565 at commit [`26b4d85`](https://github.com/apache/spark/commit/26b4d8583980f671513c7d4d532260bce5b86667). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11593][SQL] Replace catalyst converter ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9565#issuecomment-162552947 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-7727] [SQL] Avoid inner classes in Rule...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10174#issuecomment-162534062 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-11155][Web UI] Stage summary json shoul...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/10107#discussion_r46826421
--- Diff: core/src/test/scala/org/apache/spark/status/api/v1/AllStagesResourceSuite.scala ---
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.status.api.v1
+
+import java.util.Date
+
+import scala.collection.mutable.HashMap
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.scheduler.{StageInfo, TaskInfo, TaskLocality}
+import org.apache.spark.ui.jobs.UIData.{StageUIData, TaskUIData}
+
+
+class AllStagesResourceSuite extends SparkFunSuite {
+
+  def getFirstTaskLaunchTime(taskLaunchTimes: Seq[Long]): Option[Date] = {
+    val tasks = taskLaunchTimes.zipWithIndex.map { case (time, idx) =>
+      idx.toLong -> new TaskUIData(
+        new TaskInfo(idx, idx, 1, time, "", "", TaskLocality.ANY, false), None, None)
+    }.toMap
+    val stageUiData = new StageUIData()
+    stageUiData.taskData = mapToHashmap(tasks)
+    val status = StageStatus.ACTIVE
+    val stageInfo = new StageInfo(
+      1, 1, "stage 1", 10, Seq.empty, Seq.empty, "details abc", Seq.empty)
+    val stageData = AllStagesResource.stageUiToStageData(status, stageInfo, stageUiData, false)
+
+    stageData.firstTaskLaunchedTime
+  }
+
+  def mapToHashmap(original: Map[Long, TaskUIData]): HashMap[Long, TaskUIData] = {
+    val map = new HashMap[Long, TaskUIData]
+    original.foreach { e => map.put(e._1, e._2) }
+
+    return map
+  }
+
+  test("test firstTaskLaunchedTime, there are no tasks") {
--- End diff --
You don't need to put "test" in the test name; it's understood.
[GitHub] spark pull request: [SPARK-11593][SQL] Replace catalyst converter ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9565#issuecomment-162543995 **[Test build #47269 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47269/consoleFull)** for PR 9565 at commit [`26b4d85`](https://github.com/apache/spark/commit/26b4d8583980f671513c7d4d532260bce5b86667).
[GitHub] spark pull request: [SPARK-1537] [YARN] [WiP] Add history provider...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5423#issuecomment-162571320 **[Test build #47267 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47267/consoleFull)** for PR 5423 at commit [`6dac1bb`](https://github.com/apache/spark/commit/6dac1bb1d48b89a0bab9facfba95e73061f0f2a3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11315] [YARN] WiP Add YARN extension se...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8744#issuecomment-162584115 **[Test build #47268 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47268/consoleFull)** for PR 8744 at commit [`c2add7b`](https://github.com/apache/spark/commit/c2add7bbda0f79dc04c843854f01de48ab7c100a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-1537] [YARN] [WiP] Add history provider...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5423#issuecomment-162602716 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47270/ Test PASSed.
[GitHub] spark pull request: [SPARK-1537] [YARN] [WiP] Add history provider...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5423#issuecomment-162602714 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12149] [Web UI] Executor UI improvement...
Github user ajbozarth commented on a diff in the pull request: https://github.com/apache/spark/pull/10154#discussion_r46852150 --- Diff: core/src/main/scala/org/apache/spark/ui/exec/ExecutorsPage.scala --- @@ -128,11 +152,36 @@ private[ui] class ExecutorsPage( {Utils.bytesToString(diskUsed)} - {info.activeTasks} - {info.failedTasks} - {info.completedTasks} + 0) { + "background:hsla(120, 100%, 25%, " + activeTasksAlpha + ");color:white" --- End diff -- You are correct, it does not work in this syntax; I tried.
[GitHub] spark pull request: [SPARK-1537] [YARN] [WiP] Add history provider...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5423#issuecomment-162602535 **[Test build #47270 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47270/consoleFull)** for PR 5423 at commit [`0adb70a`](https://github.com/apache/spark/commit/0adb70a1240ea4f6fb7f5f78fc54d2facf7ddbef). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12149] [Web UI] Executor UI improvement...
Github user ajbozarth commented on a diff in the pull request: https://github.com/apache/spark/pull/10154#discussion_r46852090
--- Diff: core/src/main/scala/org/apache/spark/ui/exec/ExecutorsPage.scala ---
@@ -117,6 +119,28 @@ private[ui] class ExecutorsPage(
     val maximumMemory = info.maxMemory
     val memoryUsed = info.memoryUsed
     val diskUsed = info.diskUsed
+
+    // Determine Color Opacity from 0.5-1
+    var activeTasksAlpha = 1.0
+    var failedTasksAlpha = 1.0
+    var completedTasksAlpha = 1.0
+    var totalDurationAlpha = 1.0
+    if (info.totalCores > 0) {
+      // activeTasks range from 0 to all cores
+      activeTasksAlpha = (info.activeTasks.toDouble / info.totalCores) * 0.5 + 0.5
+    }
+    if (info.totalTasks > 0) {
--- End diff --
I wanted to do that, but the style guide wasn't clear on whether it was allowed. I'll update it since it's cleaner. And I'll add the 1.0 cap; I left it off originally because the code was getting busy and hard to read.
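One way the opacity calculation with the promised 1.0 cap might look is sketched below. It only assumes what the quoted diff shows (a ratio scaled into the 0.5 to 1 range, defaulting to 1.0); the object and method names `OpacitySketch` and `alpha` are hypothetical and not taken from the pull request.

```scala
object OpacitySketch {
  // Scales count / total into an opacity between 0.5 and 1.0 and caps it at 1.0.
  // When total is not positive the opacity stays at 1.0, matching the defaults in the diff.
  def alpha(count: Int, total: Int): Double =
    if (total > 0) math.min(count.toDouble / total * 0.5 + 0.5, 1.0) else 1.0
}
```

Written as a single expression, the same helper could serve the active, failed and completed task cells without the mutable `var`s, which may be the cleaner form the comment refers to.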
[GitHub] spark pull request: [Spark-10625] [SQL] Spark SQL JDBC read/write ...
Github user tribbloid commented on the pull request: https://github.com/apache/spark/pull/8785#issuecomment-162604858 @srowen yeah, I'll reply shortly
[GitHub] spark pull request: [SPARK-1537] [YARN] [WiP] Add history provider...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5423#issuecomment-162571699 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47267/ Test PASSed.
[GitHub] spark pull request: [SPARK-1537] [YARN] [WiP] Add history provider...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5423#issuecomment-162571693 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-7889] [CORE] WiP HistoryServer to refre...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6935#issuecomment-162580841 **[Test build #47271 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47271/consoleFull)** for PR 6935 at commit [`fd5e282`](https://github.com/apache/spark/commit/fd5e2825cedc3a3c5c2224369f4f773cdeee7142).
[GitHub] spark pull request: [SPARK-12159] [ML] Add user guide section for ...
Github user BenFradet commented on the pull request: https://github.com/apache/spark/pull/10166#issuecomment-162566145 Hey @holdenk, thanks for reviewing. Do you mean regarding the StringIndexer#setHandleInvalid method? If so, yes, that'd be a good addition. However, I'm not sure whether I should include it in this JIRA/PR or create another; input welcome.
[GitHub] spark pull request: [SPARK-11315] [YARN] WiP Add YARN extension se...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8744#issuecomment-162584734 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47268/ Test PASSed.
[GitHub] spark pull request: [SPARK-12026] [MLlib] ChiSqTest gets slower an...
Github user squito commented on the pull request: https://github.com/apache/spark/pull/10146#issuecomment-162569175 lgtm
[GitHub] spark pull request: [SPARK-11593][SQL] Replace catalyst converter ...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/9565#issuecomment-162569257 Weird. These tests pass on my server.
[GitHub] spark pull request: [SPARK-1537] [YARN] [WiP] Add history provider...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5423#issuecomment-162566075 **[Test build #47270 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47270/consoleFull)** for PR 5423 at commit [`0adb70a`](https://github.com/apache/spark/commit/0adb70a1240ea4f6fb7f5f78fc54d2facf7ddbef).
[GitHub] spark pull request: [SPARK-11884] Drop multiple columns in the Dat...
Github user ted-yu commented on the pull request: https://github.com/apache/spark/pull/9862#issuecomment-162591780 @marmbrus What do you think?
[GitHub] spark pull request: [SPARK-12149] [Web UI] Executor UI improvement...
Github user ajbozarth commented on a diff in the pull request: https://github.com/apache/spark/pull/10154#discussion_r46852254 --- Diff: core/src/main/scala/org/apache/spark/ui/exec/ExecutorsPage.scala --- @@ -128,11 +152,36 @@ private[ui] class ExecutorsPage( {Utils.bytesToString(diskUsed)} - {info.activeTasks} - {info.failedTasks} - {info.completedTasks} + 0) { + "background:hsla(120, 100%, 25%, " + activeTasksAlpha + ");color:white" +} else { + "" +} + }>{info.activeTasks} + 0) { + "background:hsla(0, 100%, 50%, " + failedTasksAlpha + ");color:white" --- End diff -- That is actually how HSLa is designed, and it seemed to be an area of contention when it was first defined.
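The reviewer's question is not part of this excerpt. Assuming it concerned why the green cell uses 25% lightness while the red cell uses 50%, the sketch below converts both HSL triples from the diff to RGB with the standard chroma-based formula. In HSL, 50% lightness gives the fully saturated colour and lower values darken it. The names `HslSketch` and `hslToRgb` are hypothetical and not part of Spark.

```scala
object HslSketch {
  // Converts HSL (hue in degrees, saturation and lightness in [0, 1]) to 8-bit RGB
  // using the standard chroma-based conversion.
  def hslToRgb(h: Double, s: Double, l: Double): (Int, Int, Int) = {
    val c = (1 - math.abs(2 * l - 1)) * s          // chroma
    val hp = ((h % 360) + 360) % 360 / 60          // hue sector in [0, 6)
    val x = c * (1 - math.abs(hp % 2 - 1))         // second-largest component
    val (r, g, b) =
      if (hp < 1) (c, x, 0.0)
      else if (hp < 2) (x, c, 0.0)
      else if (hp < 3) (0.0, c, x)
      else if (hp < 4) (0.0, x, c)
      else if (hp < 5) (x, 0.0, c)
      else (c, 0.0, x)
    val m = l - c / 2                              // lightness offset added to each channel
    def to8(v: Double): Int = math.round((v + m) * 255).toInt
    (to8(r), to8(g), to8(b))
  }
}
```

Under this conversion `hsla(120, 100%, 25%, a)` is a dark green and `hsla(0, 100%, 50%, a)` is pure red, so the two lightness values are not directly comparable as "brightness" settings.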
[GitHub] spark pull request: [SPARK-11593][SQL] Replace catalyst converter ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9565#issuecomment-162446652 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47259/ Test FAILed.
[GitHub] spark pull request: [SPARK-11593][SQL] Replace catalyst converter ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9565#issuecomment-162446599 **[Test build #47259 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47259/consoleFull)** for PR 9565 at commit [`f806755`](https://github.com/apache/spark/commit/f80675504a34d5bc0b830b5d25eb2f7f9a637d74). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11932][STREAMING] Partition previous Tr...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9988#issuecomment-162450972 **[Test build #47262 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47262/consoleFull)** for PR 9988 at commit [`fd6b83e`](https://github.com/apache/spark/commit/fd6b83e617c357949bfa44e135c0ba63e1502b29).