[GitHub] spark pull request: [EC2] [SPARK-6600] Open ports in ec2/spark_ec2...
Github user florianverhein commented on the pull request: https://github.com/apache/spark/pull/5257#issuecomment-87570414 Some things to think about: - Do we want an option for this (e.g. as for ganglia)? I haven't done this as I think it would be confusing at the moment, since a user would assume that the option would enable the hdfs nfs gateway on the cluster. However as far as I'm aware, spark-ec2 doesn't do this yet (#6601). So I think it would be better if the option were added as part of that work. - Further, since the ports are opened to the authorized address, I don't see a problem in having this done by default for now. I have tested this with a spark-ec2 cluster running the gateway (i.e. with these settings, I can mount the hdfs on my local machine - which is really handy!) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6604][PySpark]Specify ip of python serv...
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/5256#issuecomment-87570358 ok to test
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-87572136 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29390/ Test FAILed.
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-87577323 [Test build #29391 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29391/consoleFull) for PR 5178 at commit [`6c5c1d4`](https://github.com/apache/spark/commit/6c5c1d4d143e4806edd6cf747b84c56f992f14a9).
[GitHub] spark pull request: [SPARK-5203][SQL] fix union with different dec...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4004#issuecomment-87570883 [Test build #29388 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29388/consoleFull) for PR 4004 at commit [`e6614e8`](https://github.com/apache/spark/commit/e6614e828473633847f50927b090e33480699486). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-87571847 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-87566631 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29385/ Test FAILed.
[GitHub] spark pull request: [spark-sql] a better exception message than s...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/5235#issuecomment-87572423 Thanks. I'm going to merge this into master and branch-1.3.
[GitHub] spark pull request: [spark-sql] a better exception message than s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5235#issuecomment-87571212 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29386/ Test PASSed.
[GitHub] spark pull request: Master
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5258#issuecomment-87579377 Can one of the admins verify this patch?
[GitHub] spark pull request: Master
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5258#issuecomment-87581081 Did you mean to submit this as a pull request?
[GitHub] spark pull request: Specify ip of python server socket
GitHub user Sephiroth-Lin opened a pull request: https://github.com/apache/spark/pull/5256 Specify ip of python server socket The driver currently starts a server socket bound to a wildcard IP; binding to 127.0.0.1 is more reasonable, as the socket is only used by the local Python process. /cc @davies You can merge this pull request into a Git repository by running: $ git pull https://github.com/Sephiroth-Lin/spark SPARK-6604 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5256.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5256 commit c88bee9819eef5a8091357d6a239e9ab61da0050 Author: unknown l00251...@hghy1l002515991.china.huawei.com Date: 2015-03-30T06:21:07Z Specify ip of python server socket
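The binding change described in the PR above can be sketched in plain Python (a toy illustration, not the actual PySpark code): binding the server socket to the loopback address restricts connections to local processes, whereas the wildcard address accepts connections on every interface.

```python
import socket

# Binding to the loopback address means only processes on this machine can
# connect; binding to "" (i.e. 0.0.0.0) would accept remote connections too.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free ephemeral port
server.listen(1)
host, port = server.getsockname()
print(host, port)
server.close()
```

A local client would then connect with `socket.create_connection((host, port))`; a client on another machine could not reach the socket at all.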
[GitHub] spark pull request: [EC2] [SPARK-6600] Open ports in ec2/spark_ec2...
GitHub user florianverhein opened a pull request: https://github.com/apache/spark/pull/5257 [EC2] [SPARK-6600] Open ports in ec2/spark_ec2.py to allow HDFS NFS gateway Authorizes incoming access to master on the ports required to use the hadoop hdfs nfs gateway from outside the cluster. You can merge this pull request into a Git repository by running: $ git pull https://github.com/florianverhein/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5257.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5257 commit 72a586a68491608a32cbd5e83d0268cba8b1c18a Author: Florian Verhein florian.verh...@gmail.com Date: 2015-03-30T04:23:40Z [EC2] [SPARK-6600] initial impl
[GitHub] spark pull request: [SPARK-6604][PySpark]Specify ip of python serv...
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/5256#issuecomment-87571479 Jenkins, test this please.
[GitHub] spark pull request: Master
GitHub user nbawzl2004 opened a pull request: https://github.com/apache/spark/pull/5258 Master You can merge this pull request into a Git repository by running: $ git pull https://github.com/sparkmatrix/spark-matrix master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5258.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5258 commit 25168a787d04ac8d01dc11d24834d66567f9c602 Author: Xiaoran Xu xiaora...@ucla.edu Date: 2015-02-24T23:04:03Z Schedule commit cad7dc008f6997f7aa2e9f5e1e343431057743ba Author: Xiaoran Xu xiaora...@ucla.edu Date: 2015-02-24T23:06:00Z Schedule commit d1cc6113db4508256e53fd341b8961329fc1df3e Author: netpaladinx xiaora...@ucla.edu Date: 2015-03-04T14:28:10Z Merge remote-tracking branch 'upstream/master' commit e2594264324083f7fa5a6fa123de4377686b46ba Author: netpaladinx xiaora...@ucla.edu Date: 2015-03-04T19:07:51Z Coding analysis for BlockMatrix.scala commit 37d1fbf5cf80051e7160eb195a655873b94d7eac Author: netpaladinx xiaora...@ucla.edu Date: 2015-03-04T19:14:36Z Update README.md commit b89fe0beb99671539933180ab72f41714173e8db Author: John Davis j...@foobox.com Date: 2015-03-05T15:44:28Z Fixed indentation commit 0b3a9a63c10b474bb8d3ccefe7b03289b65fcc8c Author: John Davis j...@foobox.com Date: 2015-03-05T17:30:36Z Update comments_BlockMatrix.txt commit 5946c2a366e4ccf68b38b7733757f0a209f61f23 Author: John Davis j...@foobox.com Date: 2015-03-10T15:41:27Z Update comments_BlockMatrix.txt commit 871289df3ce9fd1cf7239341c44e02b143ab0105 Author: netpaladinx xiaora...@ucla.edu Date: 2015-03-11T03:33:13Z new comments commit 8bd1594c6e1a6ae74914580dcec7ffe43340b265 Author: netpaladinx xiaora...@ucla.edu Date: 2015-03-11T05:12:40Z new comment file written in markdown commit bd418a342a5032288003d192d961ac3532e6cefd Author: netpaladinx xiaora...@ucla.edu Date: 2015-03-11T05:14:34Z README.md commit 
4074e83ca4f574316f6c161f84d0ba8063434ea8 Author: netpaladinx xiaora...@ucla.edu Date: 2015-03-11T05:16:36Z new comments commit 7f8de1cf8d71aadf52eb055d983ee8c4cc431e46 Author: netpaladinx xiaora...@ucla.edu Date: 2015-03-11T05:18:59Z new comments commit 4e292ff06d79a86e14db1f79b6beadc3282f8739 Author: netpaladinx xiaora...@ucla.edu Date: 2015-03-11T14:40:08Z Add comments in BlockMatrix.md commit a90b01abdb81107c266a2b0a896beffef848bff7 Author: netpaladinx xiaora...@ucla.edu Date: 2015-03-11T18:47:33Z new Blockmatrix.md commit e58fedc5c54858696a1ee378c55e14d8f02f4850 Author: netpaladinx xiaora...@ucla.edu Date: 2015-03-12T03:49:19Z new Blockmatrix.md commit fe33902ad79ae0d0408d297251e2c84b6e875c64 Author: netpaladinx xiaora...@ucla.edu Date: 2015-03-12T03:52:33Z new Blockmatrix.md commit 0b8d67a7b4e46f5efd876b25df70641c6e932912 Author: netpaladinx xiaora...@ucla.edu Date: 2015-03-12T03:55:01Z new Blockmatrix.md commit e56d450e4f0ed3bcdd1e19fb93c9cea80e0010b3 Author: netpaladinx xiaora...@ucla.edu Date: 2015-03-12T06:39:06Z new Blockmatrix.md commit 4126bc19db6d4d254d4ce4d7b0cf9e7819520d45 Author: netpaladinx xiaora...@ucla.edu Date: 2015-03-12T06:40:55Z new Blockmatrix.md commit c76608b767da56f129fa783df3bbe2e0a7157d32 Author: netpaladinx xiaora...@ucla.edu Date: 2015-03-12T06:42:31Z new Blockmatrix.md commit 24154fd4ac7a3a0a58352da8905eb39ef873a96f Author: netpaladinx xiaora...@ucla.edu Date: 2015-03-12T06:44:55Z new Blockmatrix.md commit f3f1804a7607e37ff7af5edabec9c5dd3603dbe2 Author: netpaladinx xiaora...@ucla.edu Date: 2015-03-12T06:48:20Z new Blockmatrix.md commit f1a3a1470c2c0174767329d49b5f81b7a6186b26 Author: netpaladinx xiaora...@ucla.edu Date: 2015-03-12T07:54:29Z new Blockmatrix.md commit a5a25e9fdb74386aed77da1f90473acbad752c8c Author: BruinBear ljy1...@gmail.com Date: 2015-03-24T04:48:45Z started reading on rowmatrix commit 060c882bebca1b0b557df7101810d891562bacb3 Author: netpaladinx xiaora...@ucla.edu Date: 2015-03-26T11:49:44Z new file added commit 
b9841d5a191851ef5a55843b7d96d014304a0e50 Author: Zhengliang Wu nbawzl2...@gmail.com Date: 2015-03-30T07:26:44Z add_IndexedRowMatrix.md
[GitHub] spark pull request: [SPARK-6119][SQL] DataFrame support for missin...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5248#issuecomment-87578913 [Test build #29392 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29392/consoleFull) for PR 5248 at commit [`914a374`](https://github.com/apache/spark/commit/914a3743801c7e1637fb43ef841d2d76fc3e4ce7).
[GitHub] spark pull request: [EC2] [SPARK-6600] Open ports in ec2/spark_ec2...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5257#issuecomment-87568532 [Test build #29389 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29389/consoleFull) for PR 5257 at commit [`72a586a`](https://github.com/apache/spark/commit/72a586a68491608a32cbd5e83d0268cba8b1c18a).
[GitHub] spark pull request: [SPARK-6604][PySpark]Specify ip of python serv...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5256#issuecomment-87568005 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-5203][SQL] fix union with different dec...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4004#issuecomment-87570896 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29388/ Test PASSed.
[GitHub] spark pull request: [spark-sql] a better exception message than s...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/5235
[GitHub] spark pull request: Update start-slave.sh
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5260#issuecomment-87586704 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-6341][mllib] Upgrade breeze from 0.11.1...
Github user yu-iskw closed the pull request at: https://github.com/apache/spark/pull/5222
[GitHub] spark pull request: [SPARK-6341][mllib] Upgrade breeze from 0.11.1...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/5222#issuecomment-87589402 @mengxr and @srowen, alright. There was some delay because GitHub was under attack. I'm closing this PR. Thank you for your help.
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-87611637 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29391/ Test PASSed.
[GitHub] spark pull request: [CORE] [SPARK-6593] Provide option for HadoopR...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/5250#discussion_r27378646 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -246,6 +249,15 @@ class HadoopRDD[K, V]( } catch { case eof: EOFException => finished = true + case e: Exception => --- End diff -- Yes, but it calls into question when you would turn it on. You can't actually handle _just_ the situation you describe reliably even with `IOException`. I think this is band-aiding over an input problem that just isn't properly handled two more steps down the pipeline.
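The trade-off debated in this review — an option that swallows exceptions so a reader can skip corrupt records — can be illustrated with a generic, hypothetical Python sketch (the function name and flag are made up; this is not Spark's HadoopRDD):

```python
def read_records(lines, ignore_corrupt=False):
    """Parse each line as an int, optionally skipping unparseable records.

    Mirrors the concern raised above: with ignore_corrupt=True, *every*
    parse failure is silently dropped, including failures that may signal
    a real problem that should surface further down the pipeline.
    """
    for line in lines:
        try:
            yield int(line)
        except ValueError:
            if not ignore_corrupt:
                raise  # default: fail fast on bad input

data = ["1", "2", "oops", "4"]
print(list(read_records(data, ignore_corrupt=True)))  # [1, 2, 4]
```

With the flag off, the same input raises a `ValueError` at the third record, which is the fail-fast behavior the reviewer argues should remain the default.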
[GitHub] spark pull request: Accumulator deserialized twice because the Nar...
GitHub user suyanNone opened a pull request: https://github.com/apache/spark/pull/5259 Accumulator deserialized twice because the NarrowCoGroupSplitDep contains an rdd object. With code like the following, the accumulator is found to be deserialized twice. First: ``` task = ser.deserialize[Task[Any]](taskBytes, Thread.currentThread.getContextClassLoader) ``` Second: ``` val (rdd, dep) = ser.deserialize[(RDD[_], ShuffleDependency[_, _, _])]( ByteBuffer.wrap(taskBinary.value), Thread.currentThread.getContextClassLoader) ``` The first deserialization is not what is expected, because a ResultTask or ShuffleMapTask has a partition object. In class ``` CoGroupedRDD[K](@transient var rdds: Seq[RDD[_ <: Product2[K, _]]], part: Partitioner) ``` the CoGroupPartition may contain a CoGroupSplitDep: ``` NarrowCoGroupSplitDep( rdd: RDD[_], splitIndex: Int, var split: Partition ) extends CoGroupSplitDep ``` That NarrowCoGroupSplitDep pulls in the rdd object, which triggers the first deserialization. Example: ``` val acc1 = sc.accumulator(0, "test1") val acc2 = sc.accumulator(0, "test2") val rdd1 = sc.parallelize((1 to 10).toSeq, 3) val rdd2 = sc.parallelize((1 to 10).toSeq, 3) val combine1 = rdd1.map { case a => (a, 1) }.combineByKey(a => { acc1 += 1; a }, (a: Int, b: Int) => { a + b }, (a: Int, b: Int) => { a + b }, new HashPartitioner(3), mapSideCombine = false) val combine2 = rdd2.map { case a => (a, 1) }.combineByKey(a => { acc2 += 1; a }, (a: Int, b: Int) => { a + b }, (a: Int, b: Int) => { a + b }, new HashPartitioner(3), mapSideCombine = false) combine1.cogroup(combine2, new HashPartitioner(3)).count() ``` You can merge this pull request into a Git repository by running: $ git pull https://github.com/suyanNone/spark fix-acc Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5259.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5259 commit 2fde0669f62053d86adbbb37196fb161fb5ac1c8 Author: hushan hus...@xiaomi.com Date: 2015-03-30T08:05:02Z Fix twice deserialized accumulators with CoGroup
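The double-deserialization effect reported above can be mimicked in a toy Python sketch (illustrative only, not Spark's serializer): when the same object is reachable from two independently serialized payloads, any side effect in its deserialization hook runs once per payload — just as the accumulator is reconstructed both from the Task bytes and from the (rdd, dep) binary.

```python
import pickle

class Counter:
    """Tracks how many times instances of this class are deserialized."""
    deserialized = 0

    def __setstate__(self, state):
        self.__dict__.update(state)
        Counter.deserialized += 1  # side effect on every unpickle

c = Counter()
c.name = "acc"

# The same object ends up inside two independently pickled payloads,
# analogous to the accumulator being reachable from both serialized blobs.
payload_task = pickle.dumps(("task", c))
payload_rdd = pickle.dumps(("rdd", c))

pickle.loads(payload_task)
pickle.loads(payload_rdd)
print(Counter.deserialized)  # 2: the deserialization side effect ran twice
```

If the object were serialized only once and shared (as the PR aims to arrange by keeping the rdd out of NarrowCoGroupSplitDep's serialized form), the hook would run once.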
[GitHub] spark pull request: [SPARK-6553] [pyspark] Support functools.parti...
Github user ksonj commented on the pull request: https://github.com/apache/spark/pull/5206#issuecomment-87588299 I've added two tests for UDFs with partial functions and callable objects. Thanks for the hint, I'll open future PRs against `master` then.
[GitHub] spark pull request: [SPARK-6606][CORE]Accumulator deserialized twi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5259#issuecomment-87615549 [Test build #29393 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29393/consoleFull) for PR 5259 at commit [`2fde066`](https://github.com/apache/spark/commit/2fde0669f62053d86adbbb37196fb161fb5ac1c8). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class CoGroupedRDD[K](var rdds: Seq[RDD[_ <: Product2[K, _]]], part: Partitioner)` * This patch does not change any dependencies.
[GitHub] spark pull request: Update start-slave.sh
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5260#issuecomment-87623062 No, this is wrong. You can see that the variable is used a few lines later. Mind closing this PR? In the future, a more descriptive title than "Update start-slave.sh" is needed.
[GitHub] spark pull request: Master
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5258#issuecomment-87623213 Mind closing this PR?
[GitHub] spark pull request: [EC2] [SPARK-6600] Open ports in ec2/spark_ec2...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5257#issuecomment-87582560 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29389/ Test PASSed.
[GitHub] spark pull request: [EC2] [SPARK-6600] Open ports in ec2/spark_ec2...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5257#issuecomment-87582550 [Test build #29389 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29389/consoleFull) for PR 5257 at commit [`72a586a`](https://github.com/apache/spark/commit/72a586a68491608a32cbd5e83d0268cba8b1c18a). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: Update load-spark-env.sh
GitHub user raschild opened a pull request: https://github.com/apache/spark/pull/5261 Update load-spark-env.sh
Set the current dir path in $FWDIR, and do the same for $ASSEMBLY_DIR1 and $ASSEMBLY_DIR2; otherwise $SPARK_HOME is not visible from spark-env.sh -- no SPARK_HOME variable is assigned there. I am using the Spark 1.3.0 source code package and came across this when trying to start the master: sbin/start-master.sh
You can merge this pull request into a Git repository by running: $ git pull https://github.com/raschild/spark patch-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5261.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5261
commit b9babcdc7f178b93a44efccdff38dcd3bab9adbb Author: raschild rasch...@users.noreply.github.com Date: 2015-03-30T08:44:41Z Update load-spark-env.sh Set the current dir path in $FWDIR, and do the same for $ASSEMBLY_DIR1 and $ASSEMBLY_DIR2; otherwise $SPARK_HOME is not visible from spark-env.sh -- no SPARK_HOME variable is assigned there. I am using the Spark 1.3.0 source code package and came across this when trying to start the master: sbin/start-master.sh
[GitHub] spark pull request: Update load-spark-env.sh
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5261#issuecomment-87594029 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-6606][CORE]Accumulator deserialized twi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5259#issuecomment-87615560 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29393/
[GitHub] spark pull request: [SPARK-5563][mllib] online lda initial checkin
Github user hhbyyh commented on the pull request: https://github.com/apache/spark/pull/4419#issuecomment-87619172 Thanks for the informative feedback. And I sincerely like it when you tell me what's recommended and what should be changed.
# 1. First thing is the API. One great thing about Online LDA is that it can avoid loading the entire corpus, since it only needs to process one mini batch at a time. Thus I feel it's necessary to have an API that supports that usage. In the current version, a user can write code like
```
// corpus does not need to be ready before this
val onlineLDA = new OnlineLDAOptimizer(k, D, vocabSize)
for (i <- 1 to batchNumber) {
  val batch = // ... convert dynamically or read libsvm directly
  onlineLDA.submitMiniBatch(batch)
}
```
I think this will be especially necessary and helpful for larger data sets, since doc2vec at large scale is resource intensive. And having a stream of mini `documents: RDD[(Long, Vector)]` rather than an integrated corpus is a key factor in why OnlineLDA can handle larger datasets and be stream friendly. This is why I left the optimizer public. I'd like to know your opinions.
# 2. Builder pattern and parameter parity. Sure, it's doable. Originally I named `OnlineLDAOptimizer` just `OnlineLDA`, and then, since we talked about the optimizer framework, I changed it. If we can lock down the API, it will be pretty clear how to proceed with these details.
# 3. About scaling and correctness testing, can you please share a recommended dataset? Thanks a lot.
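The incremental pattern argued for above (feed mini batches one at a time, never materialize the whole corpus) can be sketched outside Spark. This is a hedged illustration: `OnlineOptimizer`, `submit_mini_batch`, and the running-mean internals are invented stand-ins for the LDA update; only the usage shape comes from the comment.

```python
# Hypothetical sketch of the mini-batch API shape discussed above.
# A running mean stands in for the actual LDA variational update.
class OnlineOptimizer:
    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def submit_mini_batch(self, batch):
        # Update running statistics from one batch; the full corpus is
        # never held in memory, which is the point of the online variant.
        for x in batch:
            self.count += 1
            self.mean += (x - self.mean) / self.count

opt = OnlineOptimizer()
# Batches can arrive (or be converted) dynamically, one at a time.
for batch in ([1.0, 2.0], [3.0], [4.0, 5.0]):
    opt.submit_mini_batch(batch)
print(opt.mean)  # → 3.0, the mean over all batches seen so far
```

The design point is that the optimizer's state, not the data, is what persists across iterations, which is why keeping the optimizer public makes the streaming usage possible.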
[GitHub] spark pull request: Update start-slave.sh
GitHub user josegom opened a pull request: https://github.com/apache/spark/pull/5260 Update start-slave.sh Add a comment on line 22. You can merge this pull request into a Git repository by running: $ git pull https://github.com/josegom/spark patch-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5260.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5260 commit db9da7459e7a8dd78c8b5e4b02c3c8d9f98299e3 Author: Jose Manuel Gomez jmgo...@stratio.com Date: 2015-03-30T08:15:08Z Update start-slave.sh Add a comment on line 22.
[GitHub] spark pull request: [SPARK-6606][CORE]Accumulator deserialized twi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5259#issuecomment-87585645 [Test build #29393 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29393/consoleFull) for PR 5259 at commit [`2fde066`](https://github.com/apache/spark/commit/2fde0669f62053d86adbbb37196fb161fb5ac1c8).
[GitHub] spark pull request: [SPARK-6199] [SQL] Support CTE in HiveContext ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4929#issuecomment-87587319 [Test build #29394 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29394/consoleFull) for PR 4929 at commit [`220b67d`](https://github.com/apache/spark/commit/220b67d511cd25d908d8408fa9c59d78d8ad0f9e).
[GitHub] spark pull request: [SPARK-6119][SQL] DataFrame support for missin...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5248#issuecomment-87607213 [Test build #29392 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29392/consoleFull) for PR 5248 at commit [`914a374`](https://github.com/apache/spark/commit/914a3743801c7e1637fb43ef841d2d76fc3e4ce7).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class AtLeastNNonNulls(n: Int, children: Seq[Expression]) extends Predicate `
* This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-6119][SQL] DataFrame support for missin...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5248#issuecomment-87607245 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29392/
[GitHub] spark pull request: [SPARK-6199] [SQL] Support CTE in HiveContext ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4929#issuecomment-87615844 [Test build #29394 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29394/consoleFull) for PR 4929 at commit [`220b67d`](https://github.com/apache/spark/commit/220b67d511cd25d908d8408fa9c59d78d8ad0f9e).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class With(child: LogicalPlan, cteRelations: Map[String, Subquery]) extends UnaryNode `
* This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-6199] [SQL] Support CTE in HiveContext ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4929#issuecomment-87615858 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29394/
[GitHub] spark pull request: [CORE] [SPARK-6593] Provide option for HadoopR...
Github user tigerquoll commented on a diff in the pull request: https://github.com/apache/spark/pull/5250#discussion_r27377569
--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
```
@@ -246,6 +249,15 @@ class HadoopRDD[K, V](
       } catch {
         case eof: EOFException =>
           finished = true
+        case e: Exception =>
```
--- End diff --
Having been on the receiving end of things, I know that the gzip module throws an IOException, but unfortunately I have no knowledge of what exceptions the Hadoop input modules throw, or whether they propagate exceptions up from other 3rd-party libraries. Catching such a broad exception is mitigated by the fact that this particular option defaults to off, and should only be enabled when you are trying to parse files that you know are corrupt. Given that situation, when you turn the option on, we should really try to finish processing files to the best of our ability, so I think in this case catching `Exception` might be appropriate.
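The pattern under discussion, a broad catch that is only active when the user opts in, can be sketched independently of the Spark/Hadoop code. This is a hedged illustration: `read_records` and `ignore_corrupt` are hypothetical names, and `int()` stands in for the real record decoding; only the flag-gated catch-and-skip semantics mirror the diff.

```python
# Sketch of an opt-in broad exception catch: by default a bad record
# fails fast; with the flag set, the record is skipped and processing
# continues to the end of the input. All names here are hypothetical.
def read_records(records, ignore_corrupt=False):
    good = []
    for raw in records:
        try:
            good.append(int(raw))  # stand-in for record decoding
        except Exception:
            if not ignore_corrupt:
                raise      # default behaviour: propagate, as before
            continue       # opted in: skip the corrupt record
    return good

print(read_records(["1", "2", "oops", "4"], ignore_corrupt=True))  # → [1, 2, 4]
```

Defaulting the flag to off preserves existing behaviour, which is the mitigation the comment relies on when justifying the breadth of the catch.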
[GitHub] spark pull request: [SPARK-6608] [SQL] Makes DataFrame.rdd a lazy ...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/5265#issuecomment-87662263 @rxin Is there a good reason that `DataFrame.rdd` has to be a function?
[GitHub] spark pull request: [SPARK-6528][ML] Add IDF transformer
GitHub user yinxusen opened a pull request: https://github.com/apache/spark/pull/5266 [SPARK-6528][ML] Add IDF transformer See [SPARK-6528](https://issues.apache.org/jira/browse/SPARK-6528). Add IDF transformer in ML package. You can merge this pull request into a Git repository by running: $ git pull https://github.com/yinxusen/spark SPARK-6528 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5266.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5266 commit 4802c6769d3b4c89faec3a8f0264ecd03117ceed Author: Xusen Yin yinxu...@gmail.com Date: 2015-03-30T09:37:11Z add IDF transformer and test suite commit 2aa4be0e1d7ce052f8c901c6d9462c611c3a920a Author: Xusen Yin yinxu...@gmail.com Date: 2015-03-30T12:51:32Z clean test suite
[GitHub] spark pull request: [CORE] [SPARK-6593] Provide option for HadoopR...
Github user tigerquoll commented on the pull request: https://github.com/apache/spark/pull/5250#issuecomment-87681300 If a user can write Scala code that appropriately deals with the problem, why can't they write Spark code to deal with it in parallel? Isn't this what Spark is about? Isn't this a problem that can be readily parallelised? Spark is being put forward as a data processing framework - bad data needs to be handled in some way better than just refusing to have anything to do with it. I believe parallelising your mentioned solution means adding to the public API, which takes time and consideration. The option was intended as a scoped, quick-fix solution to at least give users some ability to continue - the idea would be to retire the option once a new API was in place to gracefully deal with the problem. In regards to the option being presented to users as a fine thing to do when I don't believe it is - how about providing the information to the users and letting them choose for themselves? A good point about an option being a public API, though - what is the understanding about how stable options are? No real Experimental or DeveloperAPI tags are available here. Your proposed solution was the same one I ended up settling on when first confronted with the issue - but only after a number of frustrated attempts at getting Spark to do what I wanted. What you proposed, and what I did in the end, was to give up on using Spark and bash out some standalone code using Hadoop libraries to do the job, i.e. stop using Spark and use another tool that made my job easier. I felt that it didn't have to be this way.
[GitHub] spark pull request: [EC2] [SPARK-6600] Open ports in ec2/spark_ec2...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5257#issuecomment-87629549 I think this seems reasonable. I'll leave it open for comments for some time.
[GitHub] spark pull request: [SPARK-5750][SPARK-3441][SPARK-5836][CORE] Add...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/5074
[GitHub] spark pull request: [SPARK-6598] Python API for IDFModel
GitHub user Lewuathe opened a pull request: https://github.com/apache/spark/pull/5264 [SPARK-6598] Python API for IDFModel This is the sub-task of SPARK-6254. Wrapping IDFModel `idf` member function for pyspark. You can merge this pull request into a Git repository by running: $ git pull https://github.com/Lewuathe/spark SPARK-6598 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5264.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5264 commit 1dc522cab1bdfe55f8245c687ba6b866ca07853e Author: lewuathe lewua...@me.com Date: 2015-03-30T12:21:45Z [SPARK-6598] Python API for IDFModel
[GitHub] spark pull request: [SPARK-6608] [SQL] Makes DataFrame.rdd a lazy ...
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/5265 [SPARK-6608] [SQL] Makes DataFrame.rdd a lazy val Before 1.3.0, `SchemaRDD.id` works as a unique identifier of each `SchemaRDD`. In 1.3.0, unlike `SchemaRDD`, `DataFrame` is no longer an RDD, and `DataFrame.rdd` is actually a function which always returns a new RDD instance. Making `DataFrame.rdd` a lazy val should bring the unique identifier back. You can merge this pull request into a Git repository by running: $ git pull https://github.com/liancheng/spark spark-6608 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5265.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5265 commit 7f37d2142a388e5717ae2c3e89152c8c735904cc Author: Cheng Lian l...@databricks.com Date: 2015-03-30T12:34:32Z Makes DataFrame.rdd a lazy val
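The def-versus-lazy-val distinction the PR describes has a direct analogue in Python, shown below with `functools.cached_property`. This is an illustration of the identity argument only, not Spark's actual code; `EagerFrame` and `LazyFrame` are invented names.

```python
# A method recomputes on every call, so each access yields a fresh
# object (no stable identity); a cached property computes once and
# caches, like a Scala lazy val. Class names are hypothetical.
from functools import cached_property

class EagerFrame:
    def rdd(self):
        return object()          # new instance on every call

class LazyFrame:
    @cached_property
    def rdd(self):
        return object()          # computed once, then cached

eager = EagerFrame()
lazy = LazyFrame()
print(eager.rdd() is eager.rdd())  # → False: identity changes per call
print(lazy.rdd is lazy.rdd)        # → True: stable identity is restored
```

A stable identity matters for anything keyed on the underlying object, e.g. caching logic that checks whether a given RDD has already been persisted.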
[GitHub] spark pull request: [SPARK-6607][SQL] Aggregation attribute name i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5263#issuecomment-87661966 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29395/
[GitHub] spark pull request: [SPARK-6558] Utils.getCurrentUserName returns ...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/5229#issuecomment-87666829 we should pull this back into 1.3.1.
[GitHub] spark pull request: [SPARK-6528][ML] Add IDF transformer
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5266#issuecomment-87669352 [Test build #29400 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29400/consoleFull) for PR 5266 at commit [`2aa4be0`](https://github.com/apache/spark/commit/2aa4be0e1d7ce052f8c901c6d9462c611c3a920a).
[GitHub] spark pull request: [SPARK-6608] [SQL] Makes DataFrame.rdd a lazy ...
Github user petro-rudenko commented on the pull request: https://github.com/apache/spark/pull/5265#issuecomment-87670835 +1 for this, since for example [the caching logic from ml package](https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala#L64) doesn't work properly.
[GitHub] spark pull request: Update start-slave.sh
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5262#issuecomment-87679636 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29397/
[GitHub] spark pull request: Update start-slave.sh
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5262#issuecomment-87679590 [Test build #29397 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29397/consoleFull) for PR 5262 at commit [`453af8b`](https://github.com/apache/spark/commit/453af8ba57fa65b32469beb969707aec4b713ee2).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
* This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-6607][SQL] Aggregation attribute name i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5263#issuecomment-87661942 [Test build #29395 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29395/consoleFull) for PR 5263 at commit [`1de001d`](https://github.com/apache/spark/commit/1de001d375d06ec681a2ac4eb3a62f01310af21d).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
* This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-6608] [SQL] Makes DataFrame.rdd a lazy ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5265#issuecomment-87663657 [Test build #29399 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29399/consoleFull) for PR 5265 at commit [`7f37d21`](https://github.com/apache/spark/commit/7f37d2142a388e5717ae2c3e89152c8c735904cc).
[GitHub] spark pull request: [SPARK-5894][ML] Add polynomial mapper
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5245#issuecomment-87669636 [Test build #29401 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29401/consoleFull) for PR 5245 at commit [`b70e7e1`](https://github.com/apache/spark/commit/b70e7e1d0b96c74f4adbe4ebd76442756c072313).
[GitHub] spark pull request: [SPARK-6517][mllib] Implement the Algorithm of...
GitHub user yu-iskw opened a pull request: https://github.com/apache/spark/pull/5267 [SPARK-6517][mllib] Implement the Algorithm of Hierarchical Clustering I implemented a hierarchical clustering algorithm again. This PR doesn't include examples, documentation, or spark.ml APIs; I am going to send other PRs later. https://issues.apache.org/jira/browse/SPARK-6517
- This implementation is based on bisecting K-means clustering.
- It derives from @freeman-lab's implementation.
- The basic idea is not changed from the previous version (#2906).
- However, it is 1000x faster than the previous version through parallel processing.
Thank you for your great cooperation, RJ Nowling(@rnowling), Jeremy Freeman(@freeman-lab), Xiangrui Meng(@mengxr) and Sean Owen(@srowen). You can merge this pull request into a Git repository by running: $ git pull https://github.com/yu-iskw/spark new-hierarchical-clustering Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5267.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5267 commit af0f65bb4726315c076b827d37276616c5218010 Author: Yu ISHIKAWA yuu.ishik...@gmail.com Date: 2015-03-30T11:29:12Z [SPARK-6517][mllib] Implement the Algorithm of Hierarchical Clustering Thank you for your great cooperation, RJ Nowling(@rnowling), Jeremy Freeman(@freeman-lab), Xiangrui Meng(@mengxr) and Sean Owen(@srowen).
[GitHub] spark pull request: [SPARK-6598][MLLIB] Python API for IDFModel
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5264#issuecomment-87686584 [Test build #29398 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29398/consoleFull) for PR 5264 at commit [`1dc522c`](https://github.com/apache/spark/commit/1dc522cab1bdfe55f8245c687ba6b866ca07853e).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
* This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-6517][mllib] Implement the Algorithm of...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5267#issuecomment-87696607 [Test build #29403 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29403/consoleFull) for PR 5267 at commit [`3df7f11`](https://github.com/apache/spark/commit/3df7f1157c67135b5dde451b540fe30deb730c99).
[GitHub] spark pull request: [SPARK-5894][ML] Add polynomial mapper
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5245#issuecomment-87700160 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29401/ Test PASSed.
[GitHub] spark pull request: [SPARK-6517][mllib] Implement the Algorithm of...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5267#issuecomment-87686119 [Test build #29402 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29402/consoleFull) for PR 5267 at commit [`af0f65b`](https://github.com/apache/spark/commit/af0f65bb4726315c076b827d37276616c5218010).
[GitHub] spark pull request: [SPARK-6598][MLLIB] Python API for IDFModel
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5264#issuecomment-87686598 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29398/ Test PASSed.
[GitHub] spark pull request: [SPARK-4226] [SQL] Add Exists support for wher...
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/4812#discussion_r27392069

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---

```
@@ -619,10 +619,26 @@ https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C
       case Some(f) => nodeToRelation(f)
       case None => NoRelation
     }

+    val not = "(?i)not".r
+    val exists = "(?i)exists".r
+
     val withWhere = whereClause.map { whereNode =>
-      val Seq(whereExpr) = whereNode.getChildren.toSeq
-      Filter(nodeToExpr(whereExpr), relations)
+      val Seq(clause) = whereNode.getChildren.toSeq
+      clause match {
+        case Token(not(),
+               Token("TOK_SUBQUERY_EXPR",
+                 Token("TOK_SUBQUERY_OP", Token(exists(), Nil) :: Nil) ::
+                 subquery :: Nil) :: Nil) =>
+          Exists(relations, nodeToPlan(subquery), false)
+        case Token("TOK_SUBQUERY_EXPR",
+               Token("TOK_SUBQUERY_OP", Token(exists(), Nil) :: Nil) ::
+               subquery :: Nil) =>
+          Exists(relations, nodeToPlan(subquery), true)
+        // TODO add IN and NOT IN
+        case whereExpr =>
+          Filter(nodeToExpr(whereExpr), relations)
+      }
```

--- End diff --

It seems this does not support SQL with both predicates and EXISTS in the WHERE clause:

```
select * from src b
where (not exists (select a.key from src a
                   where b.value = a.value and a.key = b.key and a.value > 'val_2'))
  and key > 1;
```
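The case the reviewer flags is a WHERE clause that combines an EXISTS subquery with an ordinary predicate. A minimal standalone sketch of that query shape in SQLite (not HiveQL; the table contents and the `>` comparisons are assumptions, since the mail formatter stripped operators from the quoted query):

```python
import sqlite3

# Build a tiny stand-in for Hive's "src" table.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE src (key INTEGER, value TEXT);
    INSERT INTO src VALUES (1, 'val_1'), (2, 'val_2'), (3, 'val_3');
""")

# A WHERE clause mixing NOT EXISTS with an additional predicate --
# the combination the pattern match in the diff does not yet handle.
rows = con.execute("""
    SELECT * FROM src b
    WHERE NOT EXISTS (SELECT a.key FROM src a
                      WHERE b.value = a.value AND a.key = b.key
                        AND a.value > 'val_2')
      AND b.key > 1
""").fetchall()
print(rows)
```

Only the rows that satisfy both the NOT EXISTS condition and the extra `key` predicate survive, so a parser that pattern-matches the whole WHERE clause as a bare EXISTS subquery cannot represent this query.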
[GitHub] spark pull request: [SPARK-5894][ML] Add polynomial mapper
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5245#issuecomment-87700120

[Test build #29401 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29401/consoleFull) for PR 5245 at commit [`b70e7e1`](https://github.com/apache/spark/commit/b70e7e1d0b96c74f4adbe4ebd76442756c072313).

* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class PolynomialMapper extends UnaryTransformer[Vector, Vector, PolynomialMapper]`
* This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-6595][SQL] MetastoreRelation should be ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/5251
[GitHub] spark pull request: [CORE] [SPARK-6593] Provide option for HadoopR...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5250#issuecomment-87688106

You can use Spark to do this too, sure. Functions can call the HDFS API to check and delete files in parallel. Roughly:

```
sc.parallelize(fs.listStatus(...).map(_.getPath.toString)).map { pathStr =>
  val path = new Path(pathStr)
  val in = new GZIPInputStream(fs.open(path))
  try {
    in.read()
  } catch {
    case e: ZipException => fs.delete(path, false)
  } finally {
    in.close()
  }
}
```

I'm sure that's not 100% right, but you see the idea. I am not proposing that this become a Spark API. It seems like an application-specific piece of logic that can be written using Spark. I don't claim Scala + Spark + Hadoop is easy, but it is directly doable with these tools.

I think the point stands that this change does not help solve the problem directly, as the above does. It ignores the problem, which is sometimes a fine strategy, but at the cost of significant side effects. The side effects are the non-starter, to me. But the upside is that I think there is a direct solution too. Well, I've said enough, so it's time to let others weigh in.
[GitHub] spark pull request: Update start-slave.sh
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/5262
[GitHub] spark pull request: [SPARK-6607][SQL] Aggregation attribute name i...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/5263#issuecomment-87696048

This is a good point. Actually, all of these characters ` ,;{}()\n\t=` (note there is a space character at the beginning) can be problematic if they appear in field names, according to [`MessageTypeParser`][1]. However, personally I think simply replacing these characters with legitimate ones like brackets might be confusing. On the other hand, similar problems can be worked around easily by assigning an alias. So how about this:

1. Check all field names for invalid characters in `convertFromAttributes`
2. Throw an error message when any invalid character is found
3. In the error message, suggest that the user add an alias to the field explicitly

[1]: https://github.com/apache/incubator-parquet-mr/blob/b8f5d89e0f4347ce54cf680bd7dffc9bc02f876a/parquet-column/src/main/java/parquet/schema/MessageTypeParser.java#L46
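A minimal sketch of the suggested check, using the delimiter set quoted above; the function name and error wording here are made up for illustration, not Spark's actual code:

```python
# Characters MessageTypeParser treats as delimiters; a field name containing
# any of them cannot round-trip through a Parquet schema string.
PARQUET_DELIMITERS = set(" ,;{}()\n\t=")

def check_field_names(names):
    """Reject invalid field names and point the user at an alias instead."""
    for name in names:
        bad = sorted(PARQUET_DELIMITERS & set(name))
        if bad:
            raise ValueError(
                "field name %r contains invalid character(s) %r; "
                "give the column an explicit alias, e.g. SELECT MAX(a) AS max_a"
                % (name, "".join(bad))
            )

check_field_names(["key", "max_a"])  # valid names pass silently
```

A name such as `MAX(a)` would fail the check with a message telling the user to alias the aggregate, which is the behaviour proposed in steps 1-3.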
[GitHub] spark pull request: [SPARK-6608] [SQL] Makes DataFrame.rdd a lazy ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5265#issuecomment-87691639

[Test build #29399 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29399/consoleFull) for PR 5265 at commit [`7f37d21`](https://github.com/apache/spark/commit/7f37d2142a388e5717ae2c3e89152c8c735904cc).

* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
* This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-6608] [SQL] Makes DataFrame.rdd a lazy ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5265#issuecomment-87691658 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29399/ Test PASSed.
[GitHub] spark pull request: Update load-spark-env.sh
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5261#issuecomment-87628770

(You could give this a more specific title than "Update load-spark-env.sh".) This is borderline important enough for a JIRA, but I think we might consider it a minor add-on fix for SPARK-4924, maybe.

I'm not sure about this. For example, `spark-class` sources this script with `. $SPARK_HOME/bin/load-spark-env.sh`, and `pyspark` does similarly, so these have `SPARK_HOME` set. However, `run-example` uses `. $FWDIR/bin/load-spark-env.sh`, and scripts in `sbin` use `. $SPARK_PREFIX/bin/load-spark-env.sh`. Clearly they don't necessarily expect `SPARK_HOME`.

CC @vanzin since this used to refer to `FWDIR` actually: https://github.com/apache/spark/commit/517975d89d40a77c7186f488547eed11f79c1e97

The lines you reference don't exist in 1.3.0, though. Are you sure you're using 1.3.0?
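The pattern that makes such sourcing robust whether or not `SPARK_HOME` is set is plain POSIX parameter expansion. A tiny standalone sketch (the paths are stand-ins, not what the Spark scripts actually compute):

```shell
#!/bin/sh
# Stand-in for the directory a script would derive from its own $0 path.
FWDIR="/opt/spark"

# ${VAR:-default}: use SPARK_HOME if set and non-empty, else fall back to FWDIR.
unset SPARK_HOME
echo "${SPARK_HOME:-$FWDIR}"   # prints /opt/spark

SPARK_HOME="/custom/spark"
echo "${SPARK_HOME:-$FWDIR}"   # prints /custom/spark
```

A script using `. "${SPARK_HOME:-$FWDIR}"/bin/load-spark-env.sh` would then work from both the `bin` and `sbin` entry points.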
[GitHub] spark pull request: Update start-slave.sh
GitHub user josegom opened a pull request: https://github.com/apache/spark/pull/5262 Update start-slave.sh

Without this change, the error below happens when I execute sbin/start-all.sh:

    localhost: /spark-1.3/sbin/start-slave.sh: line 32: unexpected EOF while looking for matching `'
    localhost: /spark-1.3/sbin/start-slave.sh: line 33: syntax error: unexpected end of file

My operating system is Linux Mint 17.1 Rebecca.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/josegom/spark patch-2

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5262.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5262

commit 2c456bd66555646a60529571d313ad392c6bd1f2
Author: Jose Manuel Gomez jmgo...@stratio.com
Date: 2015-03-30T10:32:01Z

Update start-slave.sh

Without this change, the error below happens when I execute sbin/start-all.sh: localhost: /spark-1.3/sbin/start-slave.sh: line 32: unexpected EOF while looking for matching `' localhost: /spark-1.3/sbin/start-slave.sh: line 33: syntax error: unexpected end of file. My operating system is Linux Mint 17.1 Rebecca.
[GitHub] spark pull request: [SPARK-6607][SQL] Aggregation attribute name i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5263#issuecomment-87637278 [Test build #29395 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29395/consoleFull) for PR 5263 at commit [`1de001d`](https://github.com/apache/spark/commit/1de001d375d06ec681a2ac4eb3a62f01310af21d).
[GitHub] spark pull request: Update start-slave.sh
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5262#issuecomment-87630374 Can one of the admins verify this patch?
[GitHub] spark pull request: Update start-slave.sh
Github user josegom closed the pull request at: https://github.com/apache/spark/pull/5260
[GitHub] spark pull request: [SPARK-6597][Minor] Replace `input:checkbox` w...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5254#issuecomment-87632095 This sounds fine. Since the tests won't test it, have you had a chance to try the affected controls locally to verify they still work as expected? A manual test would be good to double-check.
[GitHub] spark pull request: Update start-slave.sh
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/5262#discussion_r27381022

--- Diff: sbin/start-slave.sh ---

```
@@ -19,7 +19,7 @@
 # Starts a slave on the machine this script is executed on.

-usage="Usage: start-slave.sh <worker#> <spark-master-URL> where <spark-master-URL> is like "spark://localhost:7077""
+usage="Usage: start-slave.sh <worker#> <spark-master-URL> where <spark-master-URL> is like spark://localhost:7077"
```

--- End diff --

Are you going to update this one? It's minor, but I think it's not worth even dealing with double quotes inside a quoted string.
[GitHub] spark pull request: Update start-slave.sh
Github user josegom commented on a diff in the pull request: https://github.com/apache/spark/pull/5262#discussion_r27383584

--- Diff: sbin/start-slave.sh ---

```
@@ -19,7 +19,7 @@
 # Starts a slave on the machine this script is executed on.

-usage="Usage: start-slave.sh <worker#> <spark-master-URL> where <spark-master-URL> is like "spark://localhost:7077""
+usage="Usage: start-slave.sh <worker#> <spark-master-URL> where <spark-master-URL> is like spark://localhost:7077"
```

--- End diff --

I changed the quotation marks in the correct place. Thanks.
[GitHub] spark pull request: Update start-slave.sh
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5262#issuecomment-87651382 [Test build #29397 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29397/consoleFull) for PR 5262 at commit [`453af8b`](https://github.com/apache/spark/commit/453af8ba57fa65b32469beb969707aec4b713ee2).
[GitHub] spark pull request: Update start-slave.sh
Github user josegom commented on a diff in the pull request: https://github.com/apache/spark/pull/5262#discussion_r27380158

--- Diff: sbin/start-slave.sh ---

```
@@ -19,7 +19,7 @@
 # Starts a slave on the machine this script is executed on.

-usage="Usage: start-slave.sh <worker#> <spark-master-URL> where <spark-master-URL> is like "spark://localhost:7077""
+usage="Usage: start-slave.sh <worker#> <spark-master-URL> where <spark-master-URL> is like spark://localhost:7077"
```

--- End diff --

OK, I have closed the other PR. Thanks.
[GitHub] spark pull request: [SPARK-6607][SQL] Aggregation attribute name i...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/5263 [SPARK-6607][SQL] Aggregation attribute name including special chars '(' and ')' should be replaced before generating Parquet schema

'(' and ')' are special characters used in Parquet schemas for type annotation. When we run an aggregation query, we obtain attribute names such as MAX(a). If we directly store the generated DataFrame as a Parquet file, it causes a failure when reading and parsing the stored schema string. Several methods could be adopted to solve this; this PR uses the simplest one: just replacing the attribute names before generating the Parquet schema based on these attributes. Another possible method might be modifying all aggregation expression names from func(column) to func[column].

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 parquet_aggregation_name

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5263.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5263

commit 1de001d375d06ec681a2ac4eb3a62f01310af21d
Author: Liang-Chi Hsieh vii...@gmail.com
Date: 2015-03-30T11:05:26Z

Replace special characters '(' and ')' of Parquet schema.
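The replacement approach the PR describes can be sketched standalone; the character set, replacement character, and function name below are illustrative assumptions, not the PR's actual code:

```python
import re

# Characters that are special in a Parquet schema string, so an attribute
# name like MAX(a) breaks parsing of the stored schema.
SPECIAL_CHARS = " ,;{}()\n\t="

def sanitize_column_name(name, repl="_"):
    """Replace Parquet-special characters so the schema string stays parseable."""
    return re.sub("[" + re.escape(SPECIAL_CHARS) + "]", repl, name)

print(sanitize_column_name("MAX(a)"))  # -> MAX_a_
```

The trade-off the PR discussion raises is visible here: the sanitized name no longer matches the original attribute name, which is why suggesting an explicit alias is floated as an alternative.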
[GitHub] spark pull request: [SPARK-4226] [SQL] Add Exists support for wher...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/4812#issuecomment-87654495 Hi @chenghao-intel, can you rebase this PR?
[GitHub] spark pull request: [SPARK-6598][MLLIB] Python API for IDFModel
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5264#issuecomment-87660274 [Test build #29398 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29398/consoleFull) for PR 5264 at commit [`1dc522c`](https://github.com/apache/spark/commit/1dc522cab1bdfe55f8245c687ba6b866ca07853e).
[GitHub] spark pull request: Update start-slave.sh
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/5262#discussion_r27379765

--- Diff: sbin/start-slave.sh ---

```
@@ -19,7 +19,7 @@
 # Starts a slave on the machine this script is executed on.

-usage="Usage: start-slave.sh <worker#> <spark-master-URL> where <spark-master-URL> is like "spark://localhost:7077""
+usage="Usage: start-slave.sh <worker#> <spark-master-URL> where <spark-master-URL> is like spark://localhost:7077"
```

--- End diff --

Ah I see, good catch. Actually, the quote in `spark` should just be removed, I think. You can close your other PR and push an update to this one and I'll merge.
[GitHub] spark pull request: [SPARK-6596] fix the instruction on building s...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/5253
[GitHub] spark pull request: [SQL] SPARK-6548 : Adding stddev to DataFrame ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5228#issuecomment-87638533

[Test build #29396 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29396/consoleFull) for PR 5228 at commit [`41a5768`](https://github.com/apache/spark/commit/41a5768b9eb5154b2f1af38199b3c121770a5367).

* This patch **fails RAT tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class StdDeviation(child: Expression)`
  * `case class StdDeviationFunction(expr: Expression, base: AggregateExpression)`
* This patch does not change any dependencies.
[GitHub] spark pull request: [SQL] SPARK-6548 : Adding stddev to DataFrame ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5228#issuecomment-87638517 [Test build #29396 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29396/consoleFull) for PR 5228 at commit [`41a5768`](https://github.com/apache/spark/commit/41a5768b9eb5154b2f1af38199b3c121770a5367).
[GitHub] spark pull request: [SQL] SPARK-6548 : Adding stddev to DataFrame ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5228#issuecomment-87638534 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29396/ Test FAILed.
[GitHub] spark pull request: Update start-slave.sh
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5262#issuecomment-87650270 LGTM as a hotfix for SPARK-6552.
[GitHub] spark pull request: [CORE] [SPARK-6593] Provide option for HadoopR...
Github user tigerquoll commented on the pull request: https://github.com/apache/spark/pull/5250#issuecomment-87650287

Hi Sean, thanks for your input - your views have helped me refine my thinking on the matter. I believe that if you take a purist's point of view, then yes, you can say the source of the problem is (likely) with the data producer and should be fixed at the data producer's end. The point is that this is a problem affecting many Spark users right now, and many users are not in control of the source system of the data they are analysing and are forced to 'make do' with what they have. You call this solution a band-aid - but many ETL solutions are band-aids - and providing this functionality is useful and serves a purpose for the end user.

Are you concerned that swallowing an exception could leave the Hadoop input libraries in an inconsistent state, causing more data corruption? This will not happen, because swallowing the exception triggers the immediate finish of the file-reading task, and no more data will be read by the task.

Are you concerned that swallowing an exception indicates that something has potentially gone wrong earlier in the Hadoop input read, and that previous data could have been corrupted? The user already knows this is potentially the case, because running the application without this option enabled caused the application to terminate in the first place.

The fact that we are being more permissive of potentially corrupt data is a show-stopper for this being default behaviour - but I'm not proposing it be default behaviour. I'm proposing it be a last-ditch option that an advanced user can knowingly enable when attempting to deal with corrupted data, with the understanding that their data could be made worse, but that most likely corrupt data will be omitted.
The alternative is to tell them that their data is not suitable for being loaded into spark and perhaps they should use another tool or tell the data system owner to fix their data feeds and get back to them with another data set some time in the future. I know which option I would prefer if given the choice - don't let perfect be the enemy of good. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
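The exception-swallowing semantics described above - the read error ends the task's input immediately rather than propagating - can be illustrated with a minimal wrapper. This is a hypothetical sketch of the behaviour, not the actual HadoopRDD patch; the function name and the choice of `IOError` as the swallowed exception type are assumptions for illustration:

```python
def read_until_error(records):
    """Yield records until the underlying reader raises, then stop.

    Mirrors the semantics argued for above: a read failure is treated
    as end-of-input for this task, so no further (possibly corrupt)
    data is consumed, rather than the whole job failing.
    """
    it = iter(records)
    while True:
        try:
            yield next(it)
        except StopIteration:
            return
        except IOError:
            # Swallow the read error: the task finishes with the
            # records it successfully read so far.
            return
```

Records read before the failure are kept; everything after the first error is silently dropped, which is exactly the trade-off being debated.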
[GitHub] spark pull request: Update start-slave.sh
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5262#issuecomment-87650305 ok to test
[GitHub] spark pull request: [CORE] [SPARK-6593] Provide option for HadoopR...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5250#issuecomment-87653690

I don't know whether corrupted gzip files are such a common problem, but I'm not sure that would change the logic about where to fix things. It is a problem with the preceding ETL process, yes. Something else needs to explicitly check and/or fix the input first if this is a problem.

I suppose my point too is that this change does not just address the proposed problem with gzip files. It treats any error as recoverable. It's nothing to do with inconsistent state. It's about presenting a successful result that is actually silently missing input, which might not even be deterministic. This seems far more problematic than reliably failing fast and, yes, making you fix your upstream process.

Hiding behind a flag only goes so far. It has to be documented (or else how many people does it help?). It becomes a code path that has to be supported for a long time. It is presented to users as a fine thing to do when I don't believe it is. It's not the good being the enemy of the perfect, but the dangerous being the enemy of the good.

This has nothing to do with telling people they can't use Spark, or that they have to fix an unfixable upstream process. This is about appropriately dealing with bad upstream data in the right place, and this is not how to do it. Specifically: why not write a process that just opens a stream on each input file in turn and tries to read a handful of bytes? If it fails, delete the file or do what you like with it. This is maybe 10 lines of code in your driver.
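The pre-validation suggested above - open each input file in turn, try to read a handful of bytes, and set aside any file that fails - might look like this in a Python driver. This is a sketch, not part of the PR: the function name is made up, it assumes locally readable gzip files, and a quick prefix read catches header corruption and early truncation but not damage late in the stream (a full validation would decompress to EOF):

```python
import gzip


def find_corrupt_gzip_files(paths):
    """Return the subset of `paths` that fail a quick gzip read check.

    Decompresses a small prefix of each file so a corrupt or
    non-gzip input fails here, up front, instead of mid-job inside
    a Spark task.
    """
    bad = []
    for path in paths:
        try:
            with gzip.open(path, "rb") as f:
                f.read(1024)  # read a handful of decompressed bytes
        except (OSError, EOFError):
            bad.append(path)
    return bad
```

The driver can then delete, quarantine, or log the returned paths before handing the surviving files to Spark, keeping the fail-fast behaviour of the job itself intact.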
[GitHub] spark pull request: [SPARK-6592] fix filter for scaladoc to genera...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5252#issuecomment-87706210 [Test build #29404 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29404/consoleFull) for PR 5252 at commit [`02098a4`](https://github.com/apache/spark/commit/02098a4667f251e7999c8f9cae3b3fa662513acb).
[GitHub] spark pull request: [SPARK-6595][SQL] MetastoreRelation should be ...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/5251#issuecomment-87704476 Merged to master and 1.3, thanks!
[GitHub] spark pull request: [SPARK-6608] [SQL] Makes DataFrame.rdd a lazy ...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/5265#issuecomment-87708936 @petro-rudenko Oops, that's a good catch!
[GitHub] spark pull request: [SPARK-6528][ML] Add IDF transformer
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5266#issuecomment-87708916 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29400/ Test PASSed.