[GitHub] spark pull request: [SPARK-4894][mllib] Added Bernoulli option to ...
Github user rnowling commented on the pull request: https://github.com/apache/spark/pull/4087#issuecomment-70446766 [~leahmcguire], Thanks for the patch! A few comments: 1. PySpark calls the Scala API for MLlib, so for API compatibility, we can't use enumerations on the public APIs. I suggest using a string for the train() functions but keeping the enumeration for the internal API. 2. Can you create a new JIRA for updating the PySpark MLlib NB API? I can post details on what needs to change there -- if you don't want to do the PR for that, I can. 3. The populateMatrix function is verbose. Breeze seems to support element-wise operations (https://github.com/scalanlp/breeze/wiki/Linear-Algebra-Cheat-Sheet), which might negate the need for the populateMatrix function. 4. Can you update the MLlib docs in docs/mllib-naive-bayes.md? Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
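To illustrate point 3 above, here is a minimal sketch of Breeze's element-wise operations (matrix names are made up for the example; this is not the PR's code, and operator spellings should be checked against the Breeze version Spark pins):

```scala
import breeze.linalg.DenseMatrix
import breeze.numerics.log

object BreezeElementwiseDemo {
  def main(args: Array[String]): Unit = {
    val counts = DenseMatrix((1.0, 2.0), (3.0, 4.0))

    // Element-wise natural log over the whole matrix, no explicit loops.
    val logCounts = log(counts)

    // Element-wise product of two matrices via the :* operator.
    val squared = counts :* counts

    println(logCounts)
    println(squared)
  }
}
```

Whole-matrix functions like `log(m)` and operators like `:*` are the kind of one-liners that could replace a hand-written populateMatrix loop.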
[GitHub] spark pull request: [SQL] fix typo in class description
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4100#issuecomment-70453298 [Test build #25746 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25746/consoleFull) for PR 4100 at commit [`b13b9d6`](https://github.com/apache/spark/commit/b13b9d6345df178e49fb1a5be6016008b0b08488). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-5217 Spark UI should report pending stag...
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/4043#issuecomment-70453703 @pwendell - patch updated to latest master.
[GitHub] spark pull request: [SPARK-5307] SerializationDebugger - take 2
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4098#issuecomment-70453833 [Test build #25740 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25740/consoleFull) for PR 4098 at commit [`b349b77`](https://github.com/apache/spark/commit/b349b77509229eee3ea5a7f3fbad6737b82d2e95). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `val elem = s"array (class $` * `val elem = s"externalizable object (class $` * `val elem = s"object (class $` * ` implicit class ObjectStreamClassMethods(val desc: ObjectStreamClass) extends AnyVal `
[GitHub] spark pull request: Bug fix for SPARK-5242: ec2/spark_ec2.py lauc...
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/4038#issuecomment-70442384 cc @shivaram I haven't had a chance to look at this more closely yet, and likely won't until next weekend.
[GitHub] spark pull request: [SPARK-5100][SQL] add thriftserver-ui support
Github user tianyi commented on a diff in the pull request: https://github.com/apache/spark/pull/3946#discussion_r23142124 --- Diff: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala --- @@ -384,4 +388,32 @@ class HiveThriftServer2Suite extends FunSuite with Logging { } } } + + test("SPARK-5100 monitor page") { --- End diff -- @JoshRosen, I have talked with @liancheng about the UISeleniumSuite. I did not add more complex web UI tests because we worried the tests would take too much time.
[GitHub] spark pull request: [SPARK-3650] Fix TriangleCount handling of rev...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2495#issuecomment-70447973 [Test build #25737 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25737/consoleFull) for PR 2495 at commit [`0461ed0`](https://github.com/apache/spark/commit/0461ed06a66966480a93085e41fdb0a620804222). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3650] Fix TriangleCount handling of rev...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2495#issuecomment-70447975 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25737/ Test FAILed.
[GitHub] spark pull request: [SPARK-5088] Use spark-class for running execu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3897#issuecomment-70448381 [Test build #25738 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25738/consoleFull) for PR 3897 at commit [`8232aa8`](https://github.com/apache/spark/commit/8232aa8b07a10cb6d1e07e8be49741585f1b4126). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4001][MLlib] adding apriori and fp-grow...
Github user jackylk commented on the pull request: https://github.com/apache/spark/pull/2847#issuecomment-70450975 Yes, I have tested the parallel FP-Growth algorithm using an open data set from http://fimi.ua.ac.be/data/; the performance test results can be found at https://issues.apache.org/jira/browse/SPARK-4001 All modifications are done except for the 7th (generic type), so please review the code for now. I am still considering whether it is worthwhile to implement the generic type, since it adds more complexity to the code.
[GitHub] spark pull request: [SPARK-4699][SQL] make caseSensitive configura...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3558#issuecomment-70456370 [Test build #25751 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25751/consoleFull) for PR 3558 at commit [`05b09a3`](https://github.com/apache/spark/commit/05b09a3c1008869571e438c12e8593def7ecdc2c). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4699][SQL] make caseSensitive configura...
Github user jackylk commented on the pull request: https://github.com/apache/spark/pull/3558#issuecomment-70456295 I have updated the code based on SPARK-3965 (SPARK-5168)
[GitHub] spark pull request: [SPARK-5100][SQL] add thriftserver-ui support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3946#issuecomment-70446535 [Test build #25735 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25735/consoleFull) for PR 3946 at commit [`daed3d1`](https://github.com/apache/spark/commit/daed3d126a5112d9e4e94fac7592ff804775ec05). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5278][SQL] complete the check of ambigu...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/4068#issuecomment-70447398 The problem is that currently the `GetField` class is an operation which picks the first field whose name equals the required `fieldName`, case-sensitively. As I said before, we will parse `a.b[0].c.d` into `GetField(GetField(GetItem(Unresolved(a.b), 0), c), d)`. For `a.b`, we can check anything we want before building `GetField`, but for the two outer `GetField`s we can only do the check in the `Analyzer` (or we could expose the `resolver` to `GetField`, but that's not recommended). So we need a way to indicate whether a `GetField` still needs analysis. For SPARK-3698, we can do this by searching for the required field case-sensitively: if that succeeds, we are done; if not, we still have a chance if the resolver is case-insensitive, so we can do the check in the `Analyzer` as @marmbrus did in https://github.com/apache/spark/pull/3724. For SPARK-5278 here, it's more complicated. It seems to me that the only way is adding a flag to `GetField`, or introducing `UnresolvedGetField`. What do you think? @marmbrus @liancheng
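The two-step lookup described in that comment can be sketched in plain Scala (hypothetical names, not Catalyst code): try an exact, case-sensitive match first, and only fall back to a case-insensitive match when the analyzer's resolver allows it.

```scala
// Hypothetical field type standing in for a struct field in a schema.
case class Field(name: String)

// Step 1: exact case-sensitive match.
// Step 2: case-insensitive fallback, only if the resolver permits it.
def resolveField(
    fields: Seq[Field],
    wanted: String,
    caseInsensitiveResolver: Boolean): Option[Field] =
  fields.find(_.name == wanted).orElse {
    if (caseInsensitiveResolver) fields.find(_.name.equalsIgnoreCase(wanted))
    else None
  }
```

The open question in the thread is where this fallback lives: a flag on `GetField`, or a separate `UnresolvedGetField` node that the `Analyzer` rewrites.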
[GitHub] spark pull request: [SQL] fix typo in class description
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4100#issuecomment-70452986 [Test build #25744 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25744/consoleFull) for PR 4100 at commit [`fcc8c85`](https://github.com/apache/spark/commit/fcc8c857aef468d1a86c085554a2a7184ff769a3). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4961] [CORE] Put HadoopRDD.getPartition...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3794#issuecomment-70452982 [Test build #25745 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25745/consoleFull) for PR 3794 at commit [`b535a53`](https://github.com/apache/spark/commit/b535a531ee853c29d63cda0154be54512740bc78). * This patch merges cleanly.
[GitHub] spark pull request: [SQL] fix typo in class description
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/4100#issuecomment-70457188 Thanks. Merging in master.
[GitHub] spark pull request: [SPARK-5282][mllib]: RowMatrix easily gets int...
Github user hhbyyh commented on the pull request: https://github.com/apache/spark/pull/4069#issuecomment-70441745 @srowen Would you mind taking another look? Thanks
[GitHub] spark pull request: [SPARK-4908][SQL]narrow the scope of synchroni...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/4001#issuecomment-70441804 `HiveShim.getCommandProcess` delegates to methods defined in `CommandProcessorFactory`, which tries to find a cached `Driver` object and initialize it. The underlying `Driver` cache map is synchronized. However, I'm not quite sure whether `Driver` is thread-safe. Also, `HiveServer2` actually creates a new `Driver` instance for every SQL statement and never caches them. Considering all the above, I'd agree that the risk is greater than the benefit. A better solution is to avoid using `HiveShim.getCommandProcess` (which caches `Driver` objects) and instead mimic what `HiveServer2` does, creating a new `Driver` instance for every SQL statement.
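The Driver-per-statement alternative described above might look roughly like this (a sketch only: it bypasses `CommandProcessorFactory`'s cache entirely, and the `Driver` method names are from the Hive 0.13-era API and should be verified against the shim in use):

```scala
import org.apache.hadoop.hive.conf.HiveConf
import org.apache.hadoop.hive.ql.Driver

// Build a private Driver per SQL statement, as HiveServer2 does, instead of
// sharing a possibly non-thread-safe cached instance across sessions.
def runStatement(conf: HiveConf, sql: String): Int = {
  val driver = new Driver(conf) // fresh instance, never cached
  try driver.run(sql).getResponseCode
  finally driver.close()        // release per-statement resources
}
```

Since no instance is shared, no synchronization around the statement execution path is needed at all.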
[GitHub] spark pull request: [SPARK-5088] Use spark-class for running execu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3897#issuecomment-70445416 [Test build #25734 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25734/consoleFull) for PR 3897 at commit [`932289f`](https://github.com/apache/spark/commit/932289f6d808932da9fa54c21b32c61efca5a18f). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5100][SQL] add thriftserver-ui support
Github user tianyi commented on the pull request: https://github.com/apache/spark/pull/3946#issuecomment-70446387 rebased from latest master.
[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.
Github user mfawzymkh commented on the pull request: https://github.com/apache/spark/pull/1813#issuecomment-70446580 Do we have an ETA for getting this pull request merged to master? The Guava shading issue is causing a problem for client libs that have a dependency on swift-service when Spark is compiled with hadoop-2.4.
[GitHub] spark pull request: [SPARK-4894][mllib] Added Bernoulli option to ...
Github user rnowling commented on a diff in the pull request: https://github.com/apache/spark/pull/4087#discussion_r23142812 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala --- @@ -75,9 +106,12 @@ class NaiveBayesModel private[mllib] ( * document classification. By making every vector a 0-1 vector, it can also be used as * Bernoulli NB ([[http://tinyurl.com/p7c96j6]]). The input feature values must be nonnegative. */ -class NaiveBayes private (private var lambda: Double) extends Serializable with Logging { +class NaiveBayes private (private var lambda: Double, + var model: NaiveBayesModels) extends Serializable with Logging { - def this() = this(1.0) + def this(lambda: Double) = this(lambda, NaiveBayesModels.Multinomial) + + def this() = this(1.0, NaiveBayesModels.Multinomial) --- End diff -- I suggest removing the default model for the internal API. Backwards compatibility only matters for the public API.
[GitHub] spark pull request: [SPARK-5100][SQL] add thriftserver-ui support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3946#issuecomment-70449375 [Test build #25733 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25733/consoleFull) for PR 3946 at commit [`14a461d`](https://github.com/apache/spark/commit/14a461dc3dc05b66bd8f6c4027e4c1a39a84d90d). * This patch **passes all tests**. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5100][SQL] add thriftserver-ui support
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3946#issuecomment-70449381 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25733/ Test PASSed.
[GitHub] spark pull request: [SPARK-5186] [MLLIB] Vector.equals and Vector....
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3997#issuecomment-70449484 [Test build #25739 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25739/consoleFull) for PR 3997 at commit [`0d9d130`](https://github.com/apache/spark/commit/0d9d13040e4d2730ec1c8ceaf5d8d48ead9d0bd8). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5307] SerializationDebugger - take 2
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/4098 [SPARK-5307] SerializationDebugger - take 2 This patch adds a SerializationDebugger that is used to add serialization path to a NotSerializableException. When a NotSerializableException is encountered, the debugger visits the object graph to find the path towards the object that cannot be serialized, and constructs information to help user to find the object. Compared with an earlier attempt, this one provides extra information including field names, array offsets, writeExternal calls, etc. An example serialization stack: ``` Serialization stack: -object not serializable (class: org.apache.spark.serializer.NotSerializable, value: org.apache.spark.serializer.NotSerializable@2c43caa4) -element of array (index: 0) -array (class [Ljava.lang.Object;, size 1) -field (class: org.apache.spark.serializer.SerializableArray, name: arrayField, type: class [Ljava.lang.Object;) -object (class org.apache.spark.serializer.SerializableArray, org.apache.spark.serializer.SerializableArray@193c5908) -writeExternal data -externalizable object (class org.apache.spark.serializer.ExternalizableClass, org.apache.spark.serializer.ExternalizableClass@320bdadc) ``` You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark SerializationDebugger Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4098.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4098 commit b349b77509229eee3ea5a7f3fbad6737b82d2e95 Author: Reynold Xin r...@databricks.com Date: 2015-01-19T05:55:01Z [SPARK-5307] SerializationDebugger to help debug NotSerializableException - take 2 This patch adds a SerializationDebugger that is used to add serialization path to a NotSerializableException. 
When a NotSerializableException is encountered, the debugger visits the object graph to find the path towards the object that cannot be serialized, and constructs information to help the user find the object. Compared with an earlier attempt, this one provides extra information including field names, array offsets, writeExternal calls, etc.
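A minimal reproduction of the situation the debugger targets (class names are made up for illustration; this is not the PR's test code): a serializable wrapper whose field is not serializable, serialized through plain Java serialization.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// A non-serializable class buried inside a serializable one.
class Inner                                       // no Serializable marker
class Wrapper(val inner: Inner) extends Serializable

object NotSerializableDemo {
  def main(args: Array[String]): Unit = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try out.writeObject(new Wrapper(new Inner))
    catch {
      // The stock JVM message names only the innermost class; the patch's
      // SerializationDebugger augments it with the path through Wrapper.inner.
      case e: NotSerializableException =>
        println("not serializable: " + e.getMessage)
    }
  }
}
```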
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user felixcheung commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-70450284 I've tested this PR but the result seems to be off. Parquet generated from Hive with timestamp values set by 'from_utc_timestamp('1970-01-01 08:00:00','PST')' What I see with this PR: scala t.take(10).foreach(println(_)) ... 15/01/18 22:06:41 INFO NewHadoopRDD: Input split: ParquetInputSplit{part: file:/users/x/parquetwithtimestamp start: 0 end: 25448 length: 25448 hosts: [] requestedSchema: message root { optional binary code (UTF8); optional binary description (UTF8); optional int32 total_emp; optional int32 salary; optional int96 timestamp; } readSupportMetadata: {org.apache.spark.sql.parquet.row.metadata={type:struct,fields:[{name:code,type:string,nullable:true,metadata:{}},{name:description,type:string,nullable:true,metadata:{}},{name:total_emp,type:integer,nullable:true,metadata:{}},{name:salary,type:integer,nullable:true,metadata:{}},{name:timestamp,type:timestamp,nullable:true,metadata:{}}]}, org.apache.spark.sql.parquet.row.requested_schema={type:struct,fields:[{name:code,type:string,nullable:true,metadata:{}},{name:description,type:string,nullable:true,metadata:{}},{name:total_emp,type:integer,nullable:true,metadata:{}},{name:salary,type:integer,nullable:true,metadata:{}},{name:timestamp,type:timestamp,nullable:true,metadata:{}}]}}} 15/01/18 22:06:41 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl 15/01/18 22:06:41 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 823 records. 15/01/18 22:06:41 INFO InternalParquetRecordReader: at row 0. reading next block 15/01/18 22:06:41 INFO CodecPool: Got brand-new decompressor [.snappy] 15/01/18 22:06:41 INFO InternalParquetRecordReader: block read in memory in 27 ms. 
row count = 823 [00-,All Occupations,134354250,40690,1974-01-07 17:58:00.08896] [11-,Management occupations,6003930,96150,1974-01-07 17:58:00.08896] Expected: 1970-01-01 08:00:00 Actual: 1974-01-07 17:58:00.08896 Any idea?
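For context on the skew reported above: the conventional Hive/Impala INT96 timestamp layout is 8 little-endian bytes of nanoseconds-within-day followed by a 4-byte Julian day, and a wrong Julian-day epoch or nanosecond scale in the decoder yields exactly this kind of offset. A sketch of the standard conversion (names are ours, not the PR's):

```scala
object Int96Timestamp {
  // Julian day number of the Unix epoch, 1970-01-01.
  val JulianDayOfUnixEpoch = 2440588L

  // Decode the two INT96 components into milliseconds since the Unix epoch.
  def toEpochMillis(julianDay: Int, nanosOfDay: Long): Long =
    (julianDay - JulianDayOfUnixEpoch) * 86400000L + nanosOfDay / 1000000L
}
```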
[GitHub] spark pull request: [SPARK-4001][MLlib] adding apriori and fp-grow...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2847#issuecomment-70450297 [Test build #25742 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25742/consoleFull) for PR 2847 at commit [`eb3e4ca`](https://github.com/apache/spark/commit/eb3e4ca0709696b6b2b8afd1cfc56a5a9f87555d). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5307] SerializationDebugger - take 2
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4098#issuecomment-70450293 [Test build #25741 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25741/consoleFull) for PR 4098 at commit [`572d0cb`](https://github.com/apache/spark/commit/572d0cbfdbbd816d45290b38c6c6c86d2447efdc). * This patch merges cleanly.
[GitHub] spark pull request: [SQL] fix typo in class description
GitHub user jackylk opened a pull request:

https://github.com/apache/spark/pull/4100

[SQL] fix typo in class description

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jackylk/spark patch-9

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4100.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #4100

commit fcc8c857aef468d1a86c085554a2a7184ff769a3
Author: Jacky Li jacky.li...@gmail.com
Date: 2015-01-19T06:52:57Z

[SQL] fix typo in class description
[GitHub] spark pull request: [SPARK-5278][SQL] complete the check of ambigu...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4068#issuecomment-70454202 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25747/ Test FAILed.
[GitHub] spark pull request: [SPARK-5297][Streaming] Fix Java file stream t...
GitHub user jerryshao opened a pull request:

https://github.com/apache/spark/pull/4101

[SPARK-5297][Streaming] Fix Java file stream type erasure problem

The current Java file stream doesn't support custom key/value types because the type information is lost; details can be seen in [SPARK-5297](https://issues.apache.org/jira/browse/SPARK-5297). Fix this problem by getting the correct `ClassTag` from `Class[_]`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jerryshao/apache-spark SPARK-5297

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4101.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #4101

commit 6c179f50bde9cfa46c6e2225313c992b231fb25f
Author: jerryshao saisai.s...@intel.com
Date: 2015-01-19T06:49:00Z
Fix Java fileInputStream type erasure problem

commit ec0131c1a2f4a4097d6d7b2f8a27d7abbf39b746
Author: jerryshao saisai.s...@intel.com
Date: 2015-01-19T07:12:35Z
Add Mima exclusion
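The approach the PR description mentions, recovering a `ClassTag` from a runtime `Class[_]`, can be sketched in a few lines. The helper name below is illustrative, not the PR's actual code:

```scala
import scala.reflect.ClassTag

// Hypothetical helper: given the Class[_] that a Java-facing API receives,
// build the ClassTag that ClassTag-requiring Scala code needs.
def classTagOf[K](clazz: Class[K]): ClassTag[K] = ClassTag(clazz)

// The recovered tag carries the runtime class, so it can be passed to any
// method that takes an implicit ClassTag[K].
val tag: ClassTag[String] = classTagOf(classOf[String])
println(tag.runtimeClass == classOf[String]) // true
```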
[GitHub] spark pull request: [SPARK-5278][SQL] complete the check of ambigu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4068#issuecomment-70454200 [Test build #25747 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25747/consoleFull) for PR 4068 at commit [`bfe069b`](https://github.com/apache/spark/commit/bfe069bfb3ac6e80fa82849b4e1dee90a606e731). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3586][streaming]Support nested director...
Github user wangxiaojing commented on the pull request: https://github.com/apache/spark/pull/2765#issuecomment-70437483 @tdas
[GitHub] spark pull request: [SPARK-4894][mllib] Added Bernoulli option to ...
Github user rnowling commented on a diff in the pull request: https://github.com/apache/spark/pull/4087#discussion_r23142620

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala ---

@@ -75,9 +106,12 @@ class NaiveBayesModel private[mllib] (
 * document classification. By making every vector a 0-1 vector, it can also be used as
 * Bernoulli NB ([[http://tinyurl.com/p7c96j6]]). The input feature values must be nonnegative.
 */
-class NaiveBayes private (private var lambda: Double) extends Serializable with Logging {
+class NaiveBayes private (private var lambda: Double,
+    var model: NaiveBayesModels) extends Serializable with Logging {

--- End diff --

Model should probably be a val, not a var.
[GitHub] spark pull request: [SPARK-5307] SerializationDebugger - take 2
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/4098#issuecomment-70449497 Link to the earlier attempt: https://github.com/apache/spark/pull/4093 by me and https://github.com/apache/spark/issues/3518 by @ilganeli
[GitHub] spark pull request: [SPARK-5022] [Sql] Change VectorUDT to object
GitHub user MechCoder opened a pull request:

https://github.com/apache/spark/pull/4099

[SPARK-5022] [Sql] Change VectorUDT to object

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/MechCoder/spark spark-5022

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4099.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #4099

commit 0014a59c0d0d263482208c16cc8601205fe565bf
Author: MechCoder manojkumarsivaraj...@gmail.com
Date: 2015-01-19T06:16:15Z

[SPARK-5022] Change VectorUDT to object
[GitHub] spark pull request: [SPARK-5022] [Sql] Change VectorUDT to object
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4099#issuecomment-70450637 cc @rxin I am unable to understand how to change this line `@SQLUserDefinedType(udt = classOf[VectorUDT])` . I tried doing `@SQLUserDefinedType(udt = VectorUDT.getClass)` Sorry if this seems dumb, because I'm relatively new.
[GitHub] spark pull request: SPARK-2630 Input data size of CoalescedRDD cou...
Github user ash211 commented on the pull request: https://github.com/apache/spark/pull/2310#issuecomment-70440912 Sounds good, I concur. Thanks!
[GitHub] spark pull request: [SPARK-5257] [MLlib] SparseVector indices must...
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4096#issuecomment-70441989 Alright, but maybe the documentation can be updated to note that the indices should be non-negative?
[GitHub] spark pull request: use defaultParallelism for defaultMinPartition...
Github user idanz commented on the pull request: https://github.com/apache/spark/pull/4094#issuecomment-70443024 I see. I don't want to repeat old discussions, so to be more pragmatic: the real problem for me is setting the partition size when using Spark SQL. My cluster uses 128MB blocks for HDFS, and when I use hiveContext.sql, it just takes the partitions at that size. This causes memory issues, so I wanted to use smaller partitions. However, the only way I found to do that requires setting mapred.map.tasks, which is an undocumented setting. Would you suggest opening a new ticket for this requirement? Thanks. Here are some links to prior discussions of this: - https://issues.apache.org/jira/browse/SPARK-822 - mesos/spark#718 https://github.com/mesos/spark/pull/718
[GitHub] spark pull request: [WIP][SPARK-4131][SQL] Writing data into the f...
Github user nieldomingo commented on the pull request: https://github.com/apache/spark/pull/2997#issuecomment-70443145 this would really help me
[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2634#issuecomment-70443890 @mengxr I have implemented several variants of Kullback-Leibler divergence in my separate GitHub repository https://github.com/derrickburns/generalized-kmeans-clustering. These variants are more efficient than the standard KL divergence, which is defined on R+^n, because they take advantage of extra knowledge of the domain. I have used these variants with much success (i.e. much faster running time) in my large-scale clustering runs.
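For reference, the standard discrete KL divergence on R+^n that the comment contrasts against can be sketched in a few lines. This is the generic textbook formulation, not code from the linked repository:

```scala
// Textbook KL divergence D(p || q) for discrete distributions.
// Convention: terms with p_i == 0 contribute nothing; q_i is assumed
// positive wherever p_i > 0.
def klDivergence(p: Array[Double], q: Array[Double]): Double =
  p.zip(q).map { case (pi, qi) =>
    if (pi == 0.0) 0.0 else pi * math.log(pi / qi)
  }.sum

println(klDivergence(Array(0.5, 0.5), Array(0.5, 0.5))) // 0.0 for identical distributions
```

The optimized variants in the repository exploit domain structure to avoid recomputing terms; the naive form above touches every component on every call.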
[GitHub] spark pull request: [SPARK-5100][SQL] add thriftserver-ui support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3946#issuecomment-70444727 [Test build #25733 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25733/consoleFull) for PR 3946 at commit [`14a461d`](https://github.com/apache/spark/commit/14a461dc3dc05b66bd8f6c4027e4c1a39a84d90d). * This patch **does not merge cleanly**.
[GitHub] spark pull request: [SPARK-5088] Use spark-class for running execu...
Github user jongyoul commented on the pull request: https://github.com/apache/spark/pull/3897#issuecomment-70444764 Rebase is not finished.
[GitHub] spark pull request: [SPARK-4894][mllib] Added Bernoulli option to ...
Github user rnowling commented on a diff in the pull request: https://github.com/apache/spark/pull/4087#discussion_r23142579

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala ---

@@ -32,28 +42,42 @@ import org.apache.spark.rdd.RDD
 * @param pi log of class priors, whose dimension is C, number of labels
 * @param theta log of class conditional probabilities, whose dimension is C-by-D,
 *              where D is number of features
+ * @param model The type of NB model to fit from the enumeration NaiveBayesModels, can be
+ *              Multinomial or Bernoulli
 */
+
 class NaiveBayesModel private[mllib] (
     val labels: Array[Double],
     val pi: Array[Double],
-    val theta: Array[Array[Double]]) extends ClassificationModel with Serializable {
-
-  private val brzPi = new BDV[Double](pi)
-  private val brzTheta = new BDM[Double](theta.length, theta(0).length)
+    val theta: Array[Array[Double]],
+    val model: NaiveBayesModels) extends ClassificationModel with Serializable {

-  {
-    // Need to put an extra pair of braces to prevent Scala treating `i` as a member.
+  def populateMatrix(arrayIn: Array[Array[Double]],
+      matrixIn: BDM[Double],
+      transformation: (Double) => Double = (x) => x) = {
     var i = 0
-    while (i < theta.length) {
+    while (i < arrayIn.length) {
       var j = 0
-      while (j < theta(i).length) {
-        brzTheta(i, j) = theta(i)(j)
+      while (j < arrayIn(i).length) {
+        matrixIn(i, j) = transformation(theta(i)(j))
         j += 1
       }
       i += 1
     }
   }
+  private val brzPi = new BDV[Double](pi)
+  private val brzTheta = new BDM[Double](theta.length, theta(0).length)
+  populateMatrix(theta, brzTheta)
+
+  private val brzNegTheta: Option[BDM[Double]] = model match {

--- End diff --

Why use an Option if this method is only called for Bernoulli anyway?
[GitHub] spark pull request: [SPARK-5088] Use spark-class for running execu...
Github user jongyoul commented on the pull request: https://github.com/apache/spark/pull/3897#issuecomment-70445944 retest this please.
[GitHub] spark pull request: [SPARK-5088] Use spark-class for running execu...
Github user jongyoul commented on the pull request: https://github.com/apache/spark/pull/3897#issuecomment-70455615 @mateiz I've rebased this PR and finished tests successfully. Merge this, please.
[GitHub] spark pull request: [SPARK-5278][SQL] complete the check of ambigu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4068#issuecomment-70455616 [Test build #25750 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25750/consoleFull) for PR 4068 at commit [`d8c1dc9`](https://github.com/apache/spark/commit/d8c1dc958148a4b052b387f5573b147cfd9385da). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5278][SQL] complete the check of ambigu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4068#issuecomment-70456466 [Test build #25750 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25750/consoleFull) for PR 4068 at commit [`d8c1dc9`](https://github.com/apache/spark/commit/d8c1dc958148a4b052b387f5573b147cfd9385da). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5278][SQL] complete the check of ambigu...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4068#issuecomment-70456470 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25750/ Test FAILed.
[GitHub] spark pull request: [SPARK-4894][mllib] Added Bernoulli option to ...
Github user rnowling commented on a diff in the pull request: https://github.com/apache/spark/pull/4087#discussion_r23142512

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala ---

@@ -32,28 +42,42 @@ import org.apache.spark.rdd.RDD
 * @param pi log of class priors, whose dimension is C, number of labels
 * @param theta log of class conditional probabilities, whose dimension is C-by-D,
 *              where D is number of features
+ * @param model The type of NB model to fit from the enumeration NaiveBayesModels, can be
+ *              Multinomial or Bernoulli
 */
+
 class NaiveBayesModel private[mllib] (
     val labels: Array[Double],
     val pi: Array[Double],
-    val theta: Array[Array[Double]]) extends ClassificationModel with Serializable {
-
-  private val brzPi = new BDV[Double](pi)
-  private val brzTheta = new BDM[Double](theta.length, theta(0).length)
+    val theta: Array[Array[Double]],
+    val model: NaiveBayesModels) extends ClassificationModel with Serializable {

-  {
-    // Need to put an extra pair of braces to prevent Scala treating `i` as a member.
+  def populateMatrix(arrayIn: Array[Array[Double]],

--- End diff --

This function seems excessive. Does the Breeze library support element-wise log/exp and addition/subtraction with matrices? If so, that would be cleaner and less verbose.
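The element-wise operations the reviewer has in mind can be sketched with Breeze directly. Variable names mirror the diff; this is a hypothetical simplification, not the patch's actual code:

```scala
import breeze.linalg.DenseMatrix
import breeze.numerics.{exp, log1p}

// Build the Breeze matrix from the row-major Array[Array[Double]] without
// a hand-written double loop.
val theta = Array(Array(-0.5, -1.0), Array(-2.0, -0.25))
val brzTheta = DenseMatrix.tabulate(theta.length, theta(0).length) { (i, j) => theta(i)(j) }

// For the Bernoulli model, log(1 - exp(theta)) can then be computed
// element-wise in one expression instead of via populateMatrix:
val brzNegTheta = log1p(-exp(brzTheta))
```

Because Breeze's numeric functions are UFuncs that map over matrices, both the identity copy and the log/exp transformation collapse to one-liners.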
[GitHub] spark pull request: [SPARK-4894][mllib] Added Bernoulli option to ...
Github user rnowling commented on a diff in the pull request: https://github.com/apache/spark/pull/4087#discussion_r23142533

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala ---

@@ -32,28 +42,42 @@ import org.apache.spark.rdd.RDD
 * @param pi log of class priors, whose dimension is C, number of labels
 * @param theta log of class conditional probabilities, whose dimension is C-by-D,
 *              where D is number of features
+ * @param model The type of NB model to fit from the enumeration NaiveBayesModels, can be
+ *              Multinomial or Bernoulli
 */
+
 class NaiveBayesModel private[mllib] (
     val labels: Array[Double],
     val pi: Array[Double],
-    val theta: Array[Array[Double]]) extends ClassificationModel with Serializable {
-
-  private val brzPi = new BDV[Double](pi)
-  private val brzTheta = new BDM[Double](theta.length, theta(0).length)
+    val theta: Array[Array[Double]],

--- End diff --

This should probably be converted to a Breeze matrix.
[GitHub] spark pull request: [SPARK-5088] Use spark-class for running execu...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3897#issuecomment-70445835 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25732/ Test PASSed.
[GitHub] spark pull request: [SPARK-5088] Use spark-class for running execu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3897#issuecomment-70445832 [Test build #25732 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25732/consoleFull) for PR 3897 at commit [`25f3617`](https://github.com/apache/spark/commit/25f3617182c8d4491a0545e3231dbdafe668c4a5). * This patch **passes all tests**. * This patch **does not merge cleanly**. * This patch adds the following public classes _(experimental)_: * ` command.setValue(scd $`
[GitHub] spark pull request: [SPARK-5100][SQL] add thriftserver-ui support
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3946#issuecomment-70446696 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25735/ Test FAILed.
[GitHub] spark pull request: [SPARK-5100][SQL] add thriftserver-ui support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3946#issuecomment-70446695 [Test build #25735 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25735/consoleFull) for PR 3946 at commit [`daed3d1`](https://github.com/apache/spark/commit/daed3d126a5112d9e4e94fac7592ff804775ec05). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3650] Fix TriangleCount handling of rev...
Github user ankurdave commented on the pull request: https://github.com/apache/spark/pull/2495#issuecomment-70447655 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-3650] Fix TriangleCount handling of rev...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2495#issuecomment-70447688 [Test build #25737 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25737/consoleFull) for PR 2495 at commit [`0461ed0`](https://github.com/apache/spark/commit/0461ed06a66966480a93085e41fdb0a620804222). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5088] Use spark-class for running execu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3897#issuecomment-70447740 [Test build #25734 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25734/consoleFull) for PR 3897 at commit [`932289f`](https://github.com/apache/spark/commit/932289f6d808932da9fa54c21b32c61efca5a18f). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5088] Use spark-class for running execu...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3897#issuecomment-70447746 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25734/ Test FAILed.
[GitHub] spark pull request: [SPARK-5088] Use spark-class for running execu...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3897#issuecomment-70452235 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25738/ Test PASSed.
[GitHub] spark pull request: [SPARK-5088] Use spark-class for running execu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3897#issuecomment-70452231 [Test build #25738 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25738/consoleFull) for PR 3897 at commit [`8232aa8`](https://github.com/apache/spark/commit/8232aa8b07a10cb6d1e07e8be49741585f1b4126). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5186] [MLLIB] Vector.equals and Vector....
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3997#issuecomment-70453518 [Test build #25739 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25739/consoleFull) for PR 3997 at commit [`0d9d130`](https://github.com/apache/spark/commit/0d9d13040e4d2730ec1c8ceaf5d8d48ead9d0bd8). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5186] [MLLIB] Vector.equals and Vector....
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3997#issuecomment-70453523 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25739/ Test PASSed.
[GitHub] spark pull request: [SPARK-5278][SQL] complete the check of ambigu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4068#issuecomment-70453627 [Test build #25747 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25747/consoleFull) for PR 4068 at commit [`bfe069b`](https://github.com/apache/spark/commit/bfe069bfb3ac6e80fa82849b4e1dee90a606e731). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-5217 Spark UI should report pending stag...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4043#issuecomment-70453629 [Test build #25748 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25748/consoleFull) for PR 4043 at commit [`3b11803`](https://github.com/apache/spark/commit/3b11803ae9b64acba2d64ad02d1e31d756783eaf). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4984][CORE][WEBUI] Adding a pop-up cont...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3819#issuecomment-70437399 Hmm, agreed, but I have not found an easy way to spot a truncated description. If we add `...` for truncated descriptions, we would have to consider window scaling and different screen sizes, which makes this more complex.
[GitHub] spark pull request: [SPARK-5088] Use spark-class for running execu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3897#issuecomment-70441182 [Test build #25732 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25732/consoleFull) for PR 3897 at commit [`25f3617`](https://github.com/apache/spark/commit/25f3617182c8d4491a0545e3231dbdafe668c4a5). * This patch **does not merge cleanly**.
[GitHub] spark pull request: [SPARK-5307] SerializationDebugger - take 2
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4098#issuecomment-70449764 [Test build #25740 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25740/consoleFull) for PR 4098 at commit [`b349b77`](https://github.com/apache/spark/commit/b349b77509229eee3ea5a7f3fbad6737b82d2e95). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5022] [Sql] Change VectorUDT to object
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4099#issuecomment-70451197 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25743/ Test FAILed.
[GitHub] spark pull request: [SPARK-5022] [Sql] Change VectorUDT to object
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4099#issuecomment-70451196 [Test build #25743 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25743/consoleFull) for PR 4099 at commit [`0014a59`](https://github.com/apache/spark/commit/0014a59c0d0d263482208c16cc8601205fe565bf). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4001][MLlib] adding apriori and fp-grow...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2847#issuecomment-70452828 [Test build #25742 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25742/consoleFull) for PR 2847 at commit [`eb3e4ca`](https://github.com/apache/spark/commit/eb3e4ca0709696b6b2b8afd1cfc56a5a9f87555d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4001][MLlib] adding apriori and fp-grow...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2847#issuecomment-70452833 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25742/ Test FAILed.
[GitHub] spark pull request: [SPARK-5307] SerializationDebugger - take 2
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4098#issuecomment-70453837 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25740/ Test PASSed.
[GitHub] spark pull request: [SPARK-5307] SerializationDebugger - take 2
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/4098#discussion_r23146186
--- Diff: core/src/test/scala/org/apache/spark/serializer/SerializationDebuggerSuite.scala ---
@@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.serializer
+
+import java.io.{ObjectOutput, ObjectInput}
+
+import org.scalatest.{BeforeAndAfterEach, FunSuite}
+
+
+class SerializationDebuggerSuite extends FunSuite with BeforeAndAfterEach {
+
+  import SerializationDebugger.find
+
+  override def beforeEach(): Unit = {
+    SerializationDebugger.enableDebugging = true
+  }
+
+  test("primitives, strings, and nulls") {
+    assert(find(1) === List.empty)
+    assert(find(1L) === List.empty)
+    assert(find(1.toShort) === List.empty)
+    assert(find(1.0) === List.empty)
+    assert(find("1") === List.empty)
+    assert(find(null) === List.empty)
+  }
+
+  test("primitive arrays") {
+    assert(find(Array[Int](1, 2)) === List.empty)
+    assert(find(Array[Long](1, 2)) === List.empty)
+  }
+
+  test("non-primitive arrays") {
+    assert(find(Array("aa", "bb")) === List.empty)
+    assert(find(Array(new SerializableClass1)) === List.empty)
+  }
+
+  test("serializable object") {
+    assert(find(new Foo(1, "b", 'c', 'd', null, null, null)) === List.empty)
+  }
+
+  test("nested arrays") {
+    val foo1 = new Foo(1, "b", 'c', 'd', null, null, null)
+    val foo2 = new Foo(1, "b", 'c', 'd', null, Array(foo1), null)
+    assert(find(new Foo(1, "b", 'c', 'd', null, Array(foo2), null)) === List.empty)
+  }
+
+  test("nested objects") {
+    val foo1 = new Foo(1, "b", 'c', 'd', null, null, null)
+    val foo2 = new Foo(1, "b", 'c', 'd', null, null, foo1)
+    assert(find(new Foo(1, "b", 'c', 'd', null, null, foo2)) === List.empty)
+  }
+
+  test("cycles (should not loop forever)") {
+    val foo1 = new Foo(1, "b", 'c', 'd', null, null, null)
+    foo1.g = foo1
+    assert(find(new Foo(1, "b", 'c', 'd', null, null, foo1)) === List.empty)
+  }
+
+  test("root object not serializable") {
+    val s = find(new NotSerializable)
+    assert(s.size === 1)
+    assert(s.head.contains("NotSerializable"))
+  }
+
+  test("array containing not serializable element") {
+    val s = find(new SerializableArray(Array(new NotSerializable)))
+    assert(s.size === 5)
+    assert(s(0).contains("NotSerializable"))
+    assert(s(1).contains("element of array"))
+    assert(s(2).contains("array"))
+    assert(s(3).contains("arrayField"))
+    assert(s(4).contains("SerializableArray"))
+  }
+
+  test("object containing not serializable field") {
+    val s = find(new SerializableClass2(new NotSerializable))
+    assert(s.size === 3)
+    assert(s(0).contains("NotSerializable"))
+    assert(s(1).contains("objectField"))
+    assert(s(2).contains("SerializableClass2"))
+  }
+
+  test("externalizable class writing out not serializable object") {
+    val s = find(new ExternalizableClass)
+    assert(s.size === 5)
+    assert(s(0).contains("NotSerializable"))
+    assert(s(1).contains("objectField"))
+    assert(s(2).contains("SerializableClass2"))
+    assert(s(3).contains("writeExternal"))
+    assert(s(4).contains("ExternalizableClass"))
+  }
+}
+
+
+class SerializableClass1 extends Serializable
+
+
+class SerializableClass2(val objectField: Object) extends Serializable
+
+
+class SerializableArray(val arrayField: Array[Object]) extends Serializable
+
+
+class ExternalizableClass extends java.io.Externalizable {
+  override def writeExternal(out: ObjectOutput): Unit = {
+    out.writeInt(1)
+    out.writeObject(new SerializableClass2(new NotSerializable))
+  }
+
+  override def readExternal(in: ObjectInput): Unit = {}
+}
+
+
+class Foo(
+    a: Int,
+    b:
[GitHub] spark pull request: [SPARK-5100][SQL] add thriftserver-ui support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3946#issuecomment-70447201 [Test build #25736 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25736/consoleFull) for PR 3946 at commit [`fb507df`](https://github.com/apache/spark/commit/fb507df555db0084ea7d91ae8a7167d0164480c0). * This patch merges cleanly.
[GitHub] spark pull request: use defaultParallelism for defaultMinPartition...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4094#issuecomment-70448190 Hey @idanz, first of all, we should add some comments to the code referencing SPARK-822, so that we don't go through this all over again for the core Spark API. Second, maybe we should have a configuration option in Spark SQL that allows you to tune this for input tables there (if that's doable). It would be more narrowly scoped to only Spark SQL.
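The narrowly scoped option suggested above could look roughly like the following sketch. Assumptions are flagged in the comments: the key name `spark.sql.minPartitions` is hypothetical, not a real Spark option; the core fallback `math.min(defaultParallelism, 2)` reflects the `defaultMinPartitions` behavior discussed in SPARK-822.

```scala
// Illustrative sketch only: "spark.sql.minPartitions" is a hypothetical key,
// used to show a SQL-scoped override falling back to the core default.
object MinPartitions {
  // Core Spark's defaultMinPartitions is math.min(defaultParallelism, 2).
  def coreDefault(defaultParallelism: Int): Int =
    math.min(defaultParallelism, 2)

  // A SQL-scoped setting, when present, takes precedence over the default.
  def resolve(conf: Map[String, String], defaultParallelism: Int): Int =
    conf.get("spark.sql.minPartitions").map(_.toInt)
      .getOrElse(coreDefault(defaultParallelism))
}
```

Scoping the key under `spark.sql.` would let SQL input tables be tuned without changing the core RDD API's behavior.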
[GitHub] spark pull request: [SPARK-5100][SQL] add thriftserver-ui support
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3946#issuecomment-70450761 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25736/ Test PASSed.
[GitHub] spark pull request: [SPARK-5100][SQL] add thriftserver-ui support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3946#issuecomment-70450760 [Test build #25736 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25736/consoleFull) for PR 3946 at commit [`fb507df`](https://github.com/apache/spark/commit/fb507df555db0084ea7d91ae8a7167d0164480c0). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5022] [Sql] Change VectorUDT to object
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4099#issuecomment-70450843 [Test build #25743 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25743/consoleFull) for PR 4099 at commit [`0014a59`](https://github.com/apache/spark/commit/0014a59c0d0d263482208c16cc8601205fe565bf). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5307] SerializationDebugger - take 2
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4098#issuecomment-70454498 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25741/ Test PASSed.
[GitHub] spark pull request: [SPARK-5307] SerializationDebugger - take 2
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4098#issuecomment-70454492 [Test build #25741 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25741/consoleFull) for PR 4098 at commit [`572d0cb`](https://github.com/apache/spark/commit/572d0cbfdbbd816d45290b38c6c6c86d2447efdc). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `val elem = s"array (class $` * `val elem = s"externalizable object (class $` * `val elem = s"object (class $` * ` implicit class ObjectStreamClassMethods(val desc: ObjectStreamClass) extends AnyVal `
[GitHub] spark pull request: [SPARK-5297][Streaming] Fix Java file stream t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4101#issuecomment-70454568 [Test build #25749 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25749/consoleFull) for PR 4101 at commit [`ec0131c`](https://github.com/apache/spark/commit/ec0131c1a2f4a4097d6d7b2f8a27d7abbf39b746). * This patch merges cleanly.
[GitHub] spark pull request: [SQL][Minor] Refactors deeply nested FP style ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4091#issuecomment-70400460 [Test build #25718 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25718/consoleFull) for PR 4091 at commit [`cd8860b`](https://github.com/apache/spark/commit/cd8860bf30ad99480794f85529cfcc7230ba01ee). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5186] [MLLIB] Vector.equals and Vector....
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3997#issuecomment-70400053 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25714/ Test PASSed.
[GitHub] spark pull request: [SPARK-5186] [MLLIB] Vector.equals and Vector....
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3997#issuecomment-70400050 [Test build #25714 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25714/consoleFull) for PR 3997 at commit [`93f0d46`](https://github.com/apache/spark/commit/93f0d461487f9582a6bc2a34f09179dbe8672d3d). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5307] SerializationDebugger to help deb...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4093#issuecomment-70400534 [Test build #25716 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25716/consoleFull) for PR 4093 at commit [`bde6512`](https://github.com/apache/spark/commit/bde6512a55765a48ca74f321068f9ab91516edae). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `out.stack.map(o => s"- $o (class $`
[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/4047#issuecomment-70434374 @hhbyyh Yes, please review the design doc linked from the JIRA. There is quite a bit of functionality which will not be in this initial PR.
[GitHub] spark pull request: [SQL][minor] Put DataTypes.java in java dir.
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4097
[GitHub] spark pull request: Spark 3883: SSL support for HttpServer and Akk...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3571#issuecomment-70435538 Hi @jacek-lewandowski, Thanks for bringing this up to date. I took a quick pass through and left some minor comments. Just to clarify: this only adds SSL support for internal HttpServer and Akka traffic, and not the Spark web UI? When we last discussed this in #2739, I think the idea was that SSLOptions could use namespaced configs in order to allow the web UI to use different SSL configurations than, say, Akka. I see that there's some namespace support built into this patch (the `ns` argument to `parse`); is this support sufficient to support HTTPS in the UI? Also, does it support scenarios where I want to enable SSL only for the UI or only for Akka? Settings like `spark.ssl.enabled` sound like they're systemwide settings, so we should think through how these might interact with different UI configurations, etc. I'm not asking you to implement SSL for the UI in this patch, but I'd like to make sure that the SSLOptions configuration code will be compatible with it. It would be great if you could add a short summary of this PR's changes to the PR description, since that description will become this PR's commit message. There's a big block comment at the top of `SecurityManager.scala` which should be updated to reflect this PR's changes (it currently says "We currently do not support SSL (https) ..."). It would also be great to add a small section to the security documentation (`docs/security.md`) to mention how to configure this. The documentation should mention the relevant Spark options, describe how/why someone would use the `useNodeLocalConf` setting, etc. It could also contain a pointer to external instructions for generating your own keystore / truststores, etc., since this isn't a trivial process. The new configuration options should also be documented in `docs/configuration.md` alongside the other security configurations.
In addition, the documentation should describe how the key stores are / aren't distributed depending on the choice of cluster manager. If this works in fundamentally different ways on different cluster managers, then the docs should make this clear so users know what to expect.
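The namespaced-config idea discussed above can be sketched as follows. This is a minimal illustration, not the patch's actual `SSLOptions` API: the key names (`spark.ssl.ui.enabled`, `spark.ssl.akka.enabled`) and the fallback rule are assumptions about how per-component settings might override a systemwide `spark.ssl.enabled`.

```scala
// Sketch of namespaced SSL settings: a per-component key such as
// "spark.ssl.ui.enabled" overrides the systemwide "spark.ssl.enabled".
// Key names and semantics are illustrative assumptions, not the real API.
object NamespacedSsl {
  def enabled(conf: Map[String, String], ns: String): Boolean =
    conf.get(s"spark.ssl.$ns.enabled")      // component-specific key wins
      .orElse(conf.get("spark.ssl.enabled")) // otherwise fall back to global
      .exists(_.toBoolean)                   // unset means disabled
}
```

With this shape, SSL could be enabled only for Akka (set `spark.ssl.akka.enabled=true`) or disabled for the UI while on everywhere else, which is the interaction the review asks to think through.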
[GitHub] spark pull request: SPARK-5270 [CORE] Elegantly check if RDD is em...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4074#discussion_r23139343

--- Diff: core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala ---
@@ -436,6 +436,12 @@ trait JavaRDDLike[T, This <: JavaRDDLike[T, This]] extends Serializable {

   def first(): T = rdd.first()

   /**
+   * @return true if and only if the RDD contains no elements at all. Note that an RDD
+   * may be empty even when it has at least 1 partition.
+   */
+  def isEmpty(): Boolean = rdd.isEmpty()
--- End diff --

Sounds good.
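The doc comment's point -- that partition count and emptiness are independent -- can be illustrated with a tiny stand-alone model in plain Scala (no Spark required; representing an RDD's data as `Seq[Seq[T]]` is purely illustrative, not how Spark implements `isEmpty`):

```scala
// Model an RDD's data as its partitions: the RDD is empty iff every partition is.
def isEmpty[T](partitions: Seq[Seq[T]]): Boolean = partitions.forall(_.isEmpty)

val onePartitionNoData   = Seq(Seq.empty[Int])        // 1 partition, 0 elements
val twoPartitionsOneElem = Seq(Seq(1), Seq.empty[Int]) // 2 partitions, 1 element

isEmpty(onePartitionNoData)   // true: non-zero partition count, still empty
isEmpty(twoPartitionsOneElem) // false
```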
[GitHub] spark pull request: SPARK-5270 [CORE] Elegantly check if RDD is em...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4074#issuecomment-70435986 LGTM @srowen - are you still working on it or is it good from your end? Will leave a bit of time for others to comment as well.
[GitHub] spark pull request: [SPARK-5249] Added type specific set functions...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4042#issuecomment-70436107 Okay - @AdamGS, thanks for sending this patch, but I think we'll pass on adding this API. Overall we're pretty conservative about adding APIs like this when there isn't a compelling reason.
[GitHub] spark pull request: [SPARK-5249] Added type specific set functions...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4042#issuecomment-70436115 Let's close this issue.
[GitHub] spark pull request: [SPARK-4920][UI]: back port the PR-3763 to bra...
Github user uncleGen closed the pull request at: https://github.com/apache/spark/pull/3768
[GitHub] spark pull request: use defaultParallelism for defaultMinPartition...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4094#issuecomment-70436485

Yeah, this has always been broken. What's even more confusing is what Hadoop actually does with this minSplits if you trace the code through Hadoop - I remember looking through it, and the logic on the Hadoop side is really complicated. @idanz can you create a JIRA for this? Also, can you explain what Hadoop is actually doing with this parameter - IIRC it's not as simple as it appears to be.

An issue with changing this is that we could cause behavior to change in a very unexpected way for Hadoop RDDs. Right now this is effectively a no-op because it is almost always set to 2. I've only seen it affect things when someone is running a file in local mode that really could have been processed with a single split. If we change it, it could affect user applications a bunch. For instance, in a large cluster it will cause all reads of Hadoop files to be split over # cores tasks, even if there is just a small amount of data in the file. That might not be desirable.

I wonder if we should just set it to 2 (i.e. hard-code it), add a note saying it's set this way for legacy reasons, and say that users should pass in their own minSplits when creating a hadoopRDD if they want to control the read splits.
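The "cap it at 2" option being weighed here can be sketched in a few lines of Scala. This is only an illustration of the trade-off, not the code from this PR; taking `defaultParallelism` as a plain parameter is a simplification:

```scala
// Sketch of the suggestion: cap the default minimum partitions at 2 instead of
// using full parallelism, so small files aren't split across every core.
def defaultMinPartitions(defaultParallelism: Int): Int =
  math.min(defaultParallelism, 2)

defaultMinPartitions(1)   // 1 -- local mode with a single core
defaultMinPartitions(200) // 2 -- large cluster: small files stay in few splits
```

The cap preserves today's effective behavior (almost always 2) while leaving users free to pass an explicit minSplits when they actually want more read splits.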
[GitHub] spark pull request: use defaultParallelism for defaultMinPartition...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4094#issuecomment-70436546 Here are some links to prior discussions of this: - https://issues.apache.org/jira/browse/SPARK-822 - https://github.com/mesos/spark/pull/718
[GitHub] spark pull request: [SPARK-5257] [MLlib] SparseVector indices must...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/4096#issuecomment-70436617 @MechCoder Similar to #3791, this will significantly hurt performance. Having nonnegative, ordered indices is a contract. If you want to ensure these properties, please use the factory method `Vectors.sparse(size, entries)` to construct a sparse vector. Do you mind closing this PR?
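The contract being defended here -- pay a one-time sorting cost in the factory method so the stored arrays can be trusted everywhere else -- can be sketched with a simplified stand-alone version. This is not MLlib's actual `Vectors.sparse` implementation, just an illustration of the idea:

```scala
// Build parallel (indices, values) arrays from possibly-unordered entries,
// validating the index range and sorting by index, so downstream code can
// rely on nonnegative, strictly ordered indices without re-checking them.
def sparse(size: Int, entries: Seq[(Int, Double)]): (Array[Int], Array[Double]) = {
  require(entries.forall { case (i, _) => i >= 0 && i < size },
    "indices must be in [0, size)")
  val sorted = entries.sortBy(_._1)
  (sorted.map(_._1).toArray, sorted.map(_._2).toArray)
}

val (indices, values) = sparse(5, Seq((3, 1.0), (0, 2.0)))
// indices: Array(0, 3), values: Array(2.0, 1.0)
```

Validating once at construction keeps the per-operation hot paths (dot products, axpy, etc.) free of bounds and ordering checks, which is exactly why adding checks inside the vector operations would hurt performance.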