[GitHub] spark pull request: [SPARK-5526][SQL] fix issue about cast to date
Github user viper-kun closed the pull request at: https://github.com/apache/spark/pull/4307 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-5574] use given name prefix in dir
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4344#issuecomment-72779281 LGTM too
[GitHub] spark pull request: [SPARK-4707][STREAMING] Reliable Kafka Receive...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/3655#issuecomment-72784543 ok to test.
[GitHub] spark pull request: [SPARK-2945][YARN][Doc]add doc for spark.execu...
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/4350#issuecomment-72785278 retest this please.
[GitHub] spark pull request: [SPARK-2945][YARN][Doc]add doc for spark.execu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4350#issuecomment-72785441 [Test build #26713 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26713/consoleFull) for PR 4350 at commit [`4c3913a`](https://github.com/apache/spark/commit/4c3913add23b39e4c5a5120d8a56917f972e1b4b). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5470][Core]use defaultClassLoader to lo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4258#issuecomment-72786916 [Test build #26707 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26707/consoleFull) for PR 4258 at commit [`73b719f`](https://github.com/apache/spark/commit/73b719f5bc7b69fca8d51cb8b991074dc92e50ed). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5380][GraphX] Solve an ArrayIndexOutOfB...
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/4176#issuecomment-72786951 I wonder if stopping the process is the best solution. If there is only one illegal entry in the last line, we need to retry loading the whole file, which is time-consuming. Another idea is to silently redirect illegal entries into a file or similar, and only report the number of redirected entries at the end of bulk loading. Moreover, I think it'd be better to add a new API to append these entries to an existing Graph, such as GraphOps.addEdges(edges: RDD[Edge[ED]]).
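The redirect idea above can be sketched in plain Scala. All names here are illustrative: the real loader works on RDDs, and `Edge` stands in for GraphX's `Edge`, so this is only a sketch of the partition-and-count approach, not the actual GraphLoader code.

```scala
import scala.util.Try

// Stand-in for org.apache.spark.graphx.Edge; parsing rules are illustrative.
case class Edge(src: Long, dst: Long)

// Split input lines into parsed edges and rejected (illegal) lines, so a
// single bad line does not force reloading the whole file.
def parseEdges(lines: Seq[String]): (Seq[Edge], Seq[String]) = {
  val parsed = lines.map { line =>
    val parts = line.trim.split("\\s+")
    // Try swallows malformed lines (missing fields, non-numeric ids).
    (line, Try(Edge(parts(0).toLong, parts(1).toLong)).toOption)
  }
  val edges   = parsed.collect { case (_, Some(e)) => e }
  val rejects = parsed.collect { case (l, None)    => l }
  (edges, rejects)
}

val (edges, rejects) = parseEdges(Seq("1 2", "3 4", "oops"))
// Only rejects.size needs to be reported at the end of bulk loading.
```
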
[GitHub] spark pull request: [SPARK-5426][SQL] Add SparkSQL Java API helper...
Github user kul commented on the pull request: https://github.com/apache/spark/pull/4243#issuecomment-72786838 @marmbrus Thanks for the review! Rebased against master and squashed into a new commit, renaming `schemaRDDOperations` to the more aptly named `dataFrameRDDOperations`.
[GitHub] spark pull request: [SPARK-5470][Core]use defaultClassLoader to lo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4258#issuecomment-72786922 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26707/ Test PASSed.
[GitHub] spark pull request: [SPARK-5583][SQL][WIP] Support unique join in ...
GitHub user scwf opened a pull request: https://github.com/apache/spark/pull/4354 [SPARK-5583][SQL][WIP] Support unique join in hive context

Support unique join in hive context. The basic idea is to transform a unique join into an outer join plus a filter in Spark SQL:

FROM UNIQUEJOIN [PRESERVE] T1 a (a.key), [PRESERVE] T2 b (b.key), [PRESERVE] T3 c (c.key) ...

If all the tables have the PRESERVE keyword => T1 full outer join T2 full outer join T3 ...
If none of the tables has the PRESERVE keyword => T1 inner join T2 inner join T3 ...
Otherwise => T = (T1 full outer join T2 full outer join T3 ...), then filter T, keeping the rows in which any preserved key field is not null.

For example:
1. T1 a (a.key), PRESERVE T2 b (b.key), PRESERVE T3 c (c.key) => we keep the row if b.key is not null or c.key is not null
2. T1 a (a.key), T2 b (b.key), PRESERVE T3 c (c.key) => we keep the row if c.key is not null

Correct me if I am wrong. Todos: add tests for this.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/scwf/spark unique-join Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4354.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4354 commit b7e89a94cbeddcb53aac779d4b9d7de2d94e0325 Author: wangfei wangf...@huawei.com Date: 2015-02-03T05:29:09Z support unique join in hive context
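To make the filter step concrete, here is a minimal sketch of the "keep rows with any preserved key non-null" rule over plain Scala collections. The actual change operates on Catalyst plans; `JoinedRow` and `uniqueJoinFilter` are hypothetical names used only for illustration.

```scala
// One Option per table's join key, as produced by the full outer join.
case class JoinedRow(keys: Seq[Option[Int]])

// preserve(i) is true when table i carried the PRESERVE keyword.
def uniqueJoinFilter(rows: Seq[JoinedRow], preserve: Seq[Boolean]): Seq[JoinedRow] =
  rows.filter { row =>
    // Keep the row if any preserved table's key is non-null.
    row.keys.zip(preserve).exists { case (key, p) => p && key.isDefined }
  }

// Example 1 above: T1 a (a.key), PRESERVE T2 b (b.key), PRESERVE T3 c (c.key)
val rows = Seq(
  JoinedRow(Seq(Some(1), Some(1), None)), // b.key non-null: kept
  JoinedRow(Seq(Some(2), None, None))     // no preserved key non-null: dropped
)
val kept = uniqueJoinFilter(rows, Seq(false, true, true))
```
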
[GitHub] spark pull request: [SQL] Use HiveContext's sessionState in HiveMe...
GitHub user yhuai opened a pull request: https://github.com/apache/spark/pull/4355 [SQL] Use HiveContext's sessionState in HiveMetastoreCatalog.hiveDefaultTableFilePath `client.getDatabaseCurrent` uses SessionState's local variable, which can be an issue. You can merge this pull request into a Git repository by running: $ git pull https://github.com/yhuai/spark defaultTablePath Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4355.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4355 commit 84a29e51b657f7b265f166a6ec25600c944cc440 Author: Yin Huai yh...@databricks.com Date: 2015-02-04T04:46:26Z Use HiveContext's sessionState instead of using SessionState's thread local variable.
[GitHub] spark pull request: [SPARK-4969][STREAMING][PYTHON] Add binaryReco...
Github user freeman-lab commented on the pull request: https://github.com/apache/spark/pull/3803#issuecomment-72793042 Thanks for the detailed look @tdas! Think I addressed both nits.
[GitHub] spark pull request: [SPARK-5498][SQL]fix bug when query the data w...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/4289#issuecomment-7280 In general I think the change looks reasonable to me, and we'd better use the Hive `ObjectConverter` directly; some of the code can then be cleaner.
[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-72779779 Why did you choose the parameters metadata.broker.list and bootstrap.servers as the required kafka params? I looked at the Kafka docs, and it says that for consumers, the necessary properties are zookeeper.connect and group.id. And intuitively the application is consuming, so the consumer configs should apply (not group.id, but zookeeper.connect). So our interface should also require zookeeper.connect and not the other two. Isn't it?
[GitHub] spark pull request: [SPARK-5470][Core]use defaultClassLoader to lo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4258#issuecomment-72780813 [Test build #26707 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26707/consoleFull) for PR 4258 at commit [`73b719f`](https://github.com/apache/spark/commit/73b719f5bc7b69fca8d51cb8b991074dc92e50ed). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-72782334 [Test build #26701 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26701/consoleFull) for PR 3798 at commit [`8c31855`](https://github.com/apache/spark/commit/8c31855cf6b7327c6b6611e715457ba15bb79355). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` class DeterministicKafkaInputDStreamCheckpointData extends DStreamCheckpointData(this) ` * `class KafkaCluster(val kafkaParams: Map[String, String]) extends Serializable ` * ` case class LeaderOffset(host: String, port: Int, offset: Long)` * `class KafkaRDDPartition(` * `trait HasOffsetRanges `
[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-72782343 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26701/ Test PASSed.
[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4066#issuecomment-72782219 After some more thought and testing, I don't know if it's safe to ignore task failures that are due to commits being denied, since doing so risks infinite rescheduling if all commits are denied. On the other hand, treating these as failures could lead to spurious job failures in cases where you have many copies of one slow, speculated task (the old behavior would treat these as successful task completions).
[GitHub] spark pull request: [SPARK-4939] revive offers periodically in Loc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4147#issuecomment-72782236 [Test build #576 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/576/consoleFull) for PR 4147 at commit [`33ac9bb`](https://github.com/apache/spark/commit/33ac9bb57f9e0e6a60e9ffd0eeeac7599aec8c49). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4795][Core] Redesign the primitive typ...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/3642#discussion_r24060542 --- Diff: graphx/src/test/scala/org/apache/spark/graphx/lib/ShortestPathsSuite.scala --- @@ -40,7 +40,7 @@ class ShortestPathsSuite extends FunSuite with LocalSparkContext { val graph = Graph.fromEdgeTuples(edges, 1) val landmarks = Seq(1, 4).map(_.toLong) val results = ShortestPaths.run(graph, landmarks).vertices.collect.map { -case (v, spMap) => (v, spMap.mapValues(_.get)) --- End diff -- If they are not ambiguous, I'd add the implicits back to make sure we never break. I added them back.
[GitHub] spark pull request: [SPARK-5388] Provide a stable application subm...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4216#issuecomment-72786346 [Test build #26716 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26716/consoleFull) for PR 4216 at commit [`792e112`](https://github.com/apache/spark/commit/792e1121dff43e69c84fe9cfff4fc1be61ba2af5). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` case class MasterStateResponse(` * `class LocalSparkCluster(` * ` * (4) the main class for the child` * ` case class BoundPortsResponse(actorPort: Int, webUIPort: Int, restPort: Option[Int])` * ` throw new SubmitRestMissingFieldException("Main class must be set in submit request.")` * `class SubmitRestProtocolException(message: String, cause: Exception = null)` * `class SubmitRestMissingFieldException(message: String) extends SubmitRestProtocolException(message)` * `abstract class SubmitRestProtocolMessage `
[GitHub] spark pull request: [SPARK-5388] Provide a stable application subm...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4216#issuecomment-72786347 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26716/ Test FAILed.
[GitHub] spark pull request: [WIP] [SPARK-4587] [mllib] ML model import/exp...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/4233#discussion_r24061275 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/classification/LogisticRegressionSuite.scala --- @@ -459,7 +461,41 @@ class LogisticRegressionSuite extends FunSuite with MLlibTestSparkContext with M // very steep curve in logistic function so that when we draw samples from distribution, it's // very easy to assign to another labels. However, this prediction result is consistent to R. validatePrediction(model.predict(validationRDD.map(_.features)).collect(), validationData, 0.47) + } + + test("model export/import") { --- End diff -- I'm reorganizing the code somewhat to make it easier to keep exporters for each version. It should be pretty maintainable and will allow for better testing.
[GitHub] spark pull request: [SPARK-4943][SPARK-5251][SQL] Allow table name...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4062#issuecomment-72786283 [Test build #26715 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26715/consoleFull) for PR 4062 at commit [`057d23e`](https://github.com/apache/spark/commit/057d23e2223f7dd0a2d5cde5e3b5f0d47df59059). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4943][SPARK-5251][SQL] Allow table name...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4062#issuecomment-72791305 [Test build #26715 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26715/consoleFull) for PR 4062 at commit [`057d23e`](https://github.com/apache/spark/commit/057d23e2223f7dd0a2d5cde5e3b5f0d47df59059). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5388] Provide a stable application subm...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4216#issuecomment-72791644 [Test build #26717 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26717/consoleFull) for PR 4216 at commit [`c643f64`](https://github.com/apache/spark/commit/c643f646ce2cab7fa76a69fa64f9c6a5320111d1). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` case class MasterStateResponse(` * `class LocalSparkCluster(` * ` * (4) the main class for the child` * ` case class BoundPortsResponse(actorPort: Int, webUIPort: Int, restPort: Option[Int])` * ` throw new SubmitRestMissingFieldException("Main class must be set in submit request.")` * `class SubmitRestProtocolException(message: String, cause: Exception = null)` * `class SubmitRestMissingFieldException(message: String) extends SubmitRestProtocolException(message)` * `abstract class SubmitRestProtocolMessage `
[GitHub] spark pull request: [SPARK-5498][SQL]fix bug when query the data w...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/4289#discussion_r24057600 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala --- @@ -315,9 +335,23 @@ private[hive] object HadoopTableReader extends HiveInspectors { } } +val partTblObjectInspectorConverter = ObjectInspectorConverters.getConverter( + deserializer.getObjectInspector, soi) + // Map each tuple to a row object iterator.map { value => - val raw = deserializer.deserialize(value) + val raw = convertdeserializer match { --- End diff -- In general, we'd better not do the pattern matching within the iterator; we can do it like `xx match { case xxx => iterator.map { ... } case yyy => iterator.map { ... } }`. For this case, as I showed above, if we pass the converter directly into `fillObject`, I don't think we need the pattern match here.
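The suggestion above (match once, then map) can be illustrated with a self-contained sketch; the `Converter` hierarchy here is hypothetical, not the actual TableReader types:

```scala
sealed trait Converter
case object Identity  extends Converter
case object Uppercase extends Converter

// The match is evaluated once per iterator, not once per record, so each
// branch returns a mapped iterator with no per-record dispatch cost.
def convertAll(converter: Converter, records: Iterator[String]): Iterator[String] =
  converter match {
    case Identity  => records                    // no conversion needed
    case Uppercase => records.map(_.toUpperCase) // conversion applied lazily
  }

val out = convertAll(Uppercase, Iterator("a", "b")).toList
```

Hoisting the match this way preserves the iterator's laziness while avoiding a branch on every row.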
[GitHub] spark pull request: [SPARK-4939] revive offers periodically in Loc...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/4147#issuecomment-72775916 @kayousterhout done
[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-72775833 [Test build #26701 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26701/consoleFull) for PR 3798 at commit [`8c31855`](https://github.com/apache/spark/commit/8c31855cf6b7327c6b6611e715457ba15bb79355). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-72777123 Ohh I meant createStream -> createDirectStream. I would have preferred something like createReceiverLessStream but that's a mouthful. I think direct is something that comes close without being a mouthful. Had not occurred to me until Patrick suggested it. And the underlying assumptions, I confess, are not super concrete. Some things like binary compatibility issues (e.g., do not use Scala traits with implemented methods) are fairly concrete, whereas things about API elegance (e.g. rdd.asInstanceOf[KafkaRDD] vs rdd.asInstanceOf[HasOffsetRanges]) are a little fuzzy, and opinions vary from person to person. Often what seems intuitive to me is not intuitive to someone else, even among the key committers like Patrick, Michael, Matei, etc. We usually argue about this in design docs, get as many eyeballs as possible, and try to reach a consensus. It is indeed a bit fuzzy, but it's all towards making the API that we *think* will be the best for developers.
[GitHub] spark pull request: [SPARK-4795][Core] Redesign the primitive typ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3642#issuecomment-72777210 [Test build #26704 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26704/consoleFull) for PR 3642 at commit [`914b2d6`](https://github.com/apache/spark/commit/914b2d6a65afe19b582b436ed1eb6501d5c16db3). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-72778614 [Test build #26706 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26706/consoleFull) for PR 3798 at commit [`59e29f6`](https://github.com/apache/spark/commit/59e29f61cd6a730eeea4e47a5316cbbe47615618). * This patch merges cleanly.
[GitHub] spark pull request: [WIP] [SPARK-5577] Python udf for DataFrame
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4351#issuecomment-72778536 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26703/ Test FAILed.
[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-72780349 High-level consumers connect to ZK. Simple consumers (which is what this is using) connect to brokers directly instead. See https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example I chose to accept either of the two existing means in Kafka of specifying a list of seed brokers, rather than making up yet a third way. On Tue, Feb 3, 2015 at 8:36 PM, Tathagata Das notificati...@github.com wrote: Why did you choose the parameters metadata.broker.list and bootstrap.servers as the required Kafka params? I looked at the Kafka docs, and it says that for consumers, the necessary properties are zookeeper.connect and group.id. And intuitively the application is consuming, so the consumer configs should apply (not group.id, but zookeeper.connect). So our interface should also require zookeeper.connect and not the other two. Isn't it?
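Both of Kafka's existing keys for seed brokers (metadata.broker.list and bootstrap.servers) take the same comma-separated host:port list, so accepting either is mostly a matter of which key to look up. A minimal sketch of that lookup (parseBrokers is a hypothetical helper, not Spark or Kafka API):

```scala
// Hedged sketch: accept either of Kafka's two seed-broker config keys,
// both of which use the "host1:port1,host2:port2" format.
def parseBrokers(kafkaParams: Map[String, String]): Seq[(String, Int)] = {
  val brokerList = kafkaParams
    .get("metadata.broker.list")
    .orElse(kafkaParams.get("bootstrap.servers"))
    .getOrElse(throw new IllegalArgumentException(
      "Must specify metadata.broker.list or bootstrap.servers"))
  brokerList.split(",").toSeq.map { hp =>
    val Array(host, port) = hp.trim.split(":")
    (host, port.toInt)
  }
}

println(parseBrokers(Map("metadata.broker.list" -> "broker1:9092,broker2:9092")))
```

This mirrors the trade-off in the thread: brokers are required because the simple consumer talks to them directly, whereas zookeeper.connect only matters for the high-level consumer.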
[GitHub] spark pull request: [SPARK-5578][SQL][DataFrame] Provide a conveni...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4345#issuecomment-72781888 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26698/ Test PASSed.
[GitHub] spark pull request: [SPARK-2827][GraphX]Add degree distribution op...
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/1767#issuecomment-72782993 What's the status of this patch? If it can be merged into master, I'll refactor the code and add unit tests.
[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-72784748 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26706/ Test PASSed.
[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-72784745 [Test build #26706 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26706/consoleFull) for PR 3798 at commit [`59e29f6`](https://github.com/apache/spark/commit/59e29f61cd6a730eeea4e47a5316cbbe47615618). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class DirectKafkaInputDStreamCheckpointData extends DStreamCheckpointData(this)` * `class KafkaCluster(val kafkaParams: Map[String, String]) extends Serializable` * `case class LeaderOffset(host: String, port: Int, offset: Long)` * `class KafkaRDDPartition(` * `trait HasOffsetRanges`
[GitHub] spark pull request: [SPARK-4707][STREAMING] Reliable Kafka Receive...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/3655#issuecomment-72784728 @harishreedharan This begs a higher-level question: whether the write-ahead log (which is probably the component most likely to fail) should have its own retries, independent of the receiver retrying.
[GitHub] spark pull request: [SPARK-5388] Provide a stable application subm...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4216#issuecomment-72786661 [Test build #26717 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26717/consoleFull) for PR 4216 at commit [`c643f64`](https://github.com/apache/spark/commit/c643f646ce2cab7fa76a69fa64f9c6a5320111d1). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-72787965 I think the simplest solution is to assign zookeeper.connect. But you are assigning it in KafkaCluster lines 338 - 345. So why is this warning being thrown?
[GitHub] spark pull request: [SPARK-4795][Core] Redesign the primitive typ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/3642#issuecomment-72788058 Ok, I'm going to merge this. Thanks for working on it.
[GitHub] spark pull request: [SPARK-2945][YARN][Doc]add doc for spark.execu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4350#issuecomment-72789745 [Test build #26713 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26713/consoleFull) for PR 4350 at commit [`4c3913a`](https://github.com/apache/spark/commit/4c3913add23b39e4c5a5120d8a56917f972e1b4b). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-2945][YARN][Doc]add doc for spark.execu...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4350#issuecomment-72789756 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26713/ Test FAILed.
[GitHub] spark pull request: [SPARK-4707][STREAMING] Reliable Kafka Receive...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3655#issuecomment-72789875 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26710/ Test PASSed.
[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-72789850 Hi @tdas, should we add an example to show users how to use this new Kafka API correctly?
[GitHub] spark pull request: [SQL] Use HiveContext's sessionState in HiveMe...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4355#issuecomment-72790535 [Test build #26724 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26724/consoleFull) for PR 4355 at commit [`84a29e5`](https://github.com/apache/spark/commit/84a29e51b657f7b265f166a6ec25600c944cc440). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-72791220 Holy crap! Don't bother about this at all. This can wait. I hope everything is okay. Take care and all the best! On Feb 3, 2015 8:45 PM, Cody Koeninger notificati...@github.com wrote: The warning is for metadata.broker.list, since it's not expected by the existing ConsumerConfig (it's used by other config classes). Couldn't get subclassing to work; the VerifiableProperties class it uses is very dependent on order of operations during construction. I think the simplest thing is a class that is constructed using kafkaParams and uses the static defaults from the ConsumerConfig object. I'm currently waiting in an ER with my child with a 105 fever, so won't be getting to it for a few hours to say the least. On Feb 3, 2015 10:15 PM, Tathagata Das notificati...@github.com wrote: I think the simplest solution is to assign zookeeper.connect. But you are assigning it in KafkaCluster lines 338 - 345. So why is this warning being thrown?
[GitHub] spark pull request: [SPARK-4943][SPARK-5251][SQL] Allow table name...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4062#issuecomment-72791312 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26715/ Test PASSed.
[GitHub] spark pull request: [SPARK-5583][SQL][WIP] Support unique join in ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/4354#issuecomment-72792534 Do you mind adding more inline comments? My worry is just complexity. If nobody uses this, it's going to be a bunch of code that exists just for the sake of supporting a thing in Hive. Do any other database systems support this unique join syntax (or something similar)?
[GitHub] spark pull request: [SPARK-5579][SQL][DataFrame] Support for proje...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/4348#issuecomment-72794234 select() and filter() in Python do not support this yet.
[GitHub] spark pull request: [SPARK-5579][SQL][DataFrame] Support for proje...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/4348#discussion_r24063817 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameImpl.scala --- @@ -179,10 +179,20 @@ private[sql] class DataFrameImpl protected[sql]( select((col +: cols).map(Column(_)) :_*) } + override def selectExpr(exprs: String*): DataFrame = { --- End diff -- I think this one could be merged into select(); a column is also a valid expression.
[GitHub] spark pull request: [SPARK-4969][STREAMING][PYTHON] Add binaryReco...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/3803#discussion_r24064149 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -671,7 +674,11 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli classOf[LongWritable], classOf[BytesWritable], conf=conf) -val data = br.map { case (k, v) => v.getBytes } +val data = br.map { case (k, v) => + val bytes = v.getBytes + assert(bytes.length == recordLength, "Byte array does not have correct length") + bytes --- End diff -- I meant: should the user be told that the system can throw an error when the records are not of the expected size? I don't have any strong feeling on this, just wondering.
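The assert in the diff above amounts to a per-record length check. A standalone sketch of that check (checkRecord is an illustrative helper, not the actual Spark code; it uses require so callers get the failure even with assertions disabled):

```scala
// Sketch of the fixed-length record check discussed above: each byte array
// read from the binary input must match the declared recordLength.
def checkRecord(bytes: Array[Byte], recordLength: Int): Array[Byte] = {
  require(bytes.length == recordLength,
    s"Byte array does not have correct length: got ${bytes.length}, expected $recordLength")
  bytes
}

// A well-sized record passes through unchanged; a wrong-sized one throws.
println(checkRecord(Array[Byte](1, 2, 3), 3).length)
```

Whether to surface this failure mode in the user-facing docs is exactly the open question in the comment above.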
[GitHub] spark pull request: [SPARK-2945][YARN][Doc]add doc for spark.execu...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4350#issuecomment-72778202 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26699/ Test FAILed.
[GitHub] spark pull request: [SPARK-2945][YARN][Doc]add doc for spark.execu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4350#issuecomment-72778196 [Test build #26699 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26699/consoleFull) for PR 4350 at commit [`4c3913a`](https://github.com/apache/spark/commit/4c3913add23b39e4c5a5120d8a56917f972e1b4b). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-72779615 Yeah, there's a weird distinction in Kafka between simple consumers and high-level consumers: they have a lot of common configuration parameters, but one of them talks directly to brokers and the other goes through ZK. I'll see if I can make a private subclass of ConsumerConfig to shut that warning up. On Tue, Feb 3, 2015 at 8:28 PM, Tathagata Das notificati...@github.com wrote: Hey Cody, I was trying it and I found an odd behavior. It was printing this repeatedly. 15/02/03 18:22:08 WARN VerifiableProperties: Property metadata.broker.list is not valid I was using this code. val kafkaParams = Map[String, String]("metadata.broker.list" -> brokerList) val lines = KafkaUtils.createNewStream[String, String, StringDecoder, StringDecoder]( ssc, kafkaParams, topicsSet) I chose metadata.broker.list from the code in KafkaCluster, because without that I was getting an exception from the KafkaCluster.
[GitHub] spark pull request: [SPARK-5470][Core]use defaultClassLoader to lo...
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/4258#issuecomment-72780420 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-5379][Streaming] Add awaitTerminationOr...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/4171#issuecomment-72785026 Also, could you update the Python API as well?
[GitHub] spark pull request: [SPARK-5379][Streaming] Add awaitTerminationOr...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/4171#issuecomment-72784927 Please add unit tests for this behavior! It should be in StreamingContextSuite.
[GitHub] spark pull request: [SPARK-5582] [history] Ignore empty log direct...
GitHub user vanzin opened a pull request: https://github.com/apache/spark/pull/4352 [SPARK-5582] [history] Ignore empty log directories. Empty log directories are not useful at the moment, but if one ends up showing in the log root, it breaks the code that checks for log directories. You can merge this pull request into a Git repository by running: $ git pull https://github.com/vanzin/spark SPARK-5582 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4352.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4352 commit 1a6a3d45c64276ab6cb14341e57cbc8d397a1afc Author: Marcelo Vanzin van...@cloudera.com Date: 2015-02-04T03:18:26Z [SPARK-5582] Fix exception when looking at empty directories. Empty log directories are not useful at the moment, but if one ends up showing in the log root, it breaks the code that checks for log directories.
[GitHub] spark pull request: [SPARK-5388] Provide a stable application subm...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4216#issuecomment-72786281 [Test build #26716 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26716/consoleFull) for PR 4216 at commit [`792e112`](https://github.com/apache/spark/commit/792e1121dff43e69c84fe9cfff4fc1be61ba2af5). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5426][SQL] Add SparkSQL Java API helper...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4243#issuecomment-72787085 [Test build #26718 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26718/consoleFull) for PR 4243 at commit [`2390fba`](https://github.com/apache/spark/commit/2390fba337e80eb63fa25b0c4fa6adc9945b6d2d). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5579][SQL][DataFrame] Support for proje...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4348#issuecomment-72788843 [Test build #26723 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26723/consoleFull) for PR 4348 at commit [`2baeef2`](https://github.com/apache/spark/commit/2baeef2f4035bad7aa829cf52fc338245f52fafd). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-72790044 The warning is for metadata.broker.list, since it's not expected by the existing ConsumerConfig (it's used by other config classes). Couldn't get subclassing to work; the VerifiableProperties class it uses is very dependent on order of operations during construction. I think the simplest thing is a class that is constructed using kafkaParams and uses the static defaults from the ConsumerConfig object. I'm currently waiting in an ER with my child with a 105 fever, so won't be getting to it for a few hours to say the least. On Feb 3, 2015 10:15 PM, Tathagata Das notificati...@github.com wrote: I think the simplest solution is to assign zookeeper.connect. But you are assigning it in KafkaCluster lines 338 - 345. So why is this warning being thrown?
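The workaround described above, a plain class built from kafkaParams that falls back to static defaults instead of subclassing ConsumerConfig, might look roughly like this sketch (class names and default values are illustrative, not Kafka's actual API):

```scala
// Hedged sketch of a config class constructed from kafkaParams that falls
// back to static defaults, avoiding the order-sensitive VerifiableProperties
// machinery inside Kafka's ConsumerConfig constructor.
object ConsumerDefaults {
  // Illustrative defaults; the real ones live on Kafka's ConsumerConfig object.
  val defaults: Map[String, String] = Map(
    "socket.timeout.ms" -> "30000",
    "fetch.message.max.bytes" -> "1048576")
}

class SimpleConsumerConfig(kafkaParams: Map[String, String]) {
  // User-supplied params win; otherwise fall back to the static defaults.
  def get(key: String): String =
    kafkaParams.getOrElse(key,
      ConsumerDefaults.defaults.getOrElse(key,
        throw new NoSuchElementException(key)))
}

val config = new SimpleConsumerConfig(Map("socket.timeout.ms" -> "10000"))
println(config.get("socket.timeout.ms"))       // user-supplied override
println(config.get("fetch.message.max.bytes")) // falls back to default
```

The point of the design is that unknown keys like metadata.broker.list simply ride along in kafkaParams without triggering validation warnings.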
[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4066#issuecomment-72790079 [Test build #26712 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26712/consoleFull) for PR 4066 at commit [`3969f5f`](https://github.com/apache/spark/commit/3969f5f27f85e1092c8271d575e23cc834ca9ffb). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class TaskCommitDenied(` * `class CommitDeniedException(` * ` class OutputCommitCoordinatorActor(outputCommitCoordinator: OutputCommitCoordinator)`
[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4066#issuecomment-72790083 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26712/
[GitHub] spark pull request: [SPARK-5582] [history] Ignore empty log direct...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4352#issuecomment-72790111 [Test build #26711 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26711/consoleFull) for PR 4352 at commit [`1a6a3d4`](https://github.com/apache/spark/commit/1a6a3d45c64276ab6cb14341e57cbc8d397a1afc). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5484] Checkpoint every 25 iterations in...
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/4273#issuecomment-72790567 How about adding a new configuration, e.g., spark.graphx.pregel.checkpoint.interval in SparkConf?
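The interval-based checkpointing under discussion can be sketched, outside of GraphX, as a plain loop that checkpoints every N iterations. The names `checkpoint_interval` and `checkpoint` below are illustrative stand-ins for the Pregel/SparkConf machinery, not the actual Spark API.

```python
def run_iterations(num_iterations, checkpoint_interval, checkpoint):
    """Run num_iterations supersteps, calling checkpoint(i) every
    checkpoint_interval iterations (0 disables checkpointing).

    checkpoint is a callable standing in for RDD/Graph checkpointing.
    Returns the list of iterations at which a checkpoint was taken.
    """
    taken = []
    for i in range(1, num_iterations + 1):
        # ... one Pregel superstep would happen here ...
        if checkpoint_interval > 0 and i % checkpoint_interval == 0:
            checkpoint(i)
            taken.append(i)
    return taken
```

With an interval of 25 over 100 iterations (the setting in the PR title), checkpoints fall at iterations 25, 50, 75, and 100; making the interval configurable only changes the second argument.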
[GitHub] spark pull request: [SPARK-5579][SQL][DataFrame] Support for proje...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4348#issuecomment-72796410 [Test build #26723 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26723/consoleFull) for PR 4348 at commit [`2baeef2`](https://github.com/apache/spark/commit/2baeef2f4035bad7aa829cf52fc338245f52fafd). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5579][SQL][DataFrame] Support for proje...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4348#issuecomment-72796417 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26723/
[GitHub] spark pull request: [WIP] [SPARK-5577] Python udf for DataFrame
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/4351 [WIP] [SPARK-5577] Python udf for DataFrame You can merge this pull request into a Git repository by running: $ git pull https://github.com/davies/spark python_udf Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4351.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4351

Commits (all by Davies Liu, dav...@databricks.com):

* 3ab26614b5278edce6e8571e5c51fe0b67e3124e (2015-02-03T08:08:00Z) add more tests for DataFrame
* 6040ba73431cc22d8d777555db6b35241275bdce (2015-02-03T09:09:36Z) fix docs
* 9ab78b4262961deafe0256c8c28d2911a4c07b0a (2015-02-03T09:10:54Z) Merge branch 'master' of github.com:apache/spark into fix_df (conflicts: sql/core/src/main/scala/org/apache/spark/sql/Column.scala)
* 78ebcfa6ba750e081f6b5c7b07c8d04f32c2d4d6 (2015-02-03T09:12:02Z) add sql_test.py in run_tests
* 35ccb9f5721266a3a25df7e5f6d4b2c98f5f18d5 (2015-02-03T09:23:16Z) fix build
* 8dd19a912e8595dddeec56fea964ab40b5b9f738 (2015-02-03T18:00:04Z) fix tests in python 2.6
* c052f6fe0aaaf688a8f08e0fe04abdeea8933448 (2015-02-03T18:44:36Z) Merge branch 'master' of github.com:apache/spark into fix_df
* 83c92fedc4f69dfff909d61899c906cea357498f (2015-02-03T20:21:08Z) address comments
* 467332cacca8754f04271a70bbaf15c8f2afd5c6 (2015-02-03T20:34:16Z) support string in cast()
* dd9919f115d3b8f4b66d213c4a57bc832ed8ed57 (2015-02-03T22:17:09Z) fix tests
* 1e4766485b20629a9cee12fc1c4751fc427cc569 (2015-02-04T01:24:15Z) Merge branch 'master' of github.com:apache/spark into python_udf
[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4066#issuecomment-72780049 [Test build #26700 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26700/consoleFull) for PR 4066 at commit [`97da5fe`](https://github.com/apache/spark/commit/97da5feb6fe49255afaac1dc9d5db1edf8c1ff42). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class TaskCommitDenied(` * `class CommitDeniedException(` * ` class OutputCommitCoordinatorActor(outputCommitCoordinator: OutputCommitCoordinator)`
[GitHub] spark pull request: [SPARK-4939] revive offers periodically in Loc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4147#issuecomment-72782830 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26702/
[GitHub] spark pull request: [SPARK-4939] revive offers periodically in Loc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4147#issuecomment-72782813 [Test build #26702 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26702/consoleFull) for PR 4147 at commit [`2acdf9d`](https://github.com/apache/spark/commit/2acdf9d1eb6034581eb33ef3df1c8fc652bf325a). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4795][Core] Redesign the primitive typ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3642#issuecomment-72783410 [Test build #26704 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26704/consoleFull) for PR 3642 at commit [`914b2d6`](https://github.com/apache/spark/commit/914b2d6a65afe19b582b436ed1eb6501d5c16db3). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: #SPARK-2808 update kafka to version 0.8.2
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/3631#issuecomment-72784310 Aah cool. However, 0.8.1 and 0.8.2 have pretty big changes between them, so let's merge this for the next release. We are already doing a lot of experimental Kafka stuff in this release (the feature merge window has closed).
[GitHub] spark pull request: [SPARK-3039] [BUILD] Spark assembly for new ha...
Github user medale commented on the pull request: https://github.com/apache/spark/pull/4315#issuecomment-72785613 The problem was that the Spark project hive-exec 0.13.1a depends on

```
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-mapred</artifactId>
  <version>${avro.version}</version>
</dependency>
```

(see http://central.maven.org/maven2/org/spark-project/hive/hive-exec/0.13.1a/hive-exec-0.13.1a.pom). Its parent defines avro.version as 1.7.5 (`<avro.version>1.7.5</avro.version>`, see http://central.maven.org/maven2/org/spark-project/hive/hive/0.13.1a/hive-0.13.1a.pom). The only places hive-exec is used as a dependency (per find . -name pom.xml | xargs grep hive-exec) are the main pom.xml (where we define it in the dependencyManagement section) and sql/hive/pom.xml (in actual dependencies). In sql/hive/pom.xml we also explicitly have a dependency on:

```
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-mapred</artifactId>
  <classifier>${avro.mapred.classifier}</classifier>
</dependency>
```

Therefore if we choose a profile that does not define avro.mapred.classifier, this field is left empty (see the main pom.xml: `<avro.mapred.classifier></avro.mapred.classifier>`), and we pull avro-mapred-1.7.6.jar (exactly the same as avro-mapred-1.7.6-hadoop1.jar), as it should be. If we choose a profile like hadoop-2.4, we set it to hadoop2 and pull avro-mapred-1.7.6-hadoop2.jar, as it should be:

```
<profile>
  <id>hadoop-2.4</id>
  <properties>
    <hadoop.version>2.4.0</hadoop.version>
    <protobuf.version>2.5.0</protobuf.version>
    <jets3t.version>0.9.0</jets3t.version>
    <hbase.version>0.98.7-hadoop2</hbase.version>
    <commons.math3.version>3.1.1</commons.math3.version>
    <avro.mapred.classifier>hadoop2</avro.mapred.classifier>
  </properties>
</profile>
```

However, with changes in 1.3.0-SNAPSHOT the avro-mapred scope is newly defined as:

```
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-mapred</artifactId>
  <version>${avro.version}</version>
  <classifier>${avro.mapred.classifier}</classifier>
  <scope>${hive.deps.scope}</scope>
</dependency>
```

That scope is `<hive.deps.scope>compile</hive.deps.scope>` in the main pom.xml, but `<hive.deps.scope>provided</hive.deps.scope>` in assembly/pom.xml and examples/pom.xml. Same for hive-exec. So competing avro-mapred classes will no longer be included in the spark-assembly.jar. They are not included on the Hadoop classpath (only Avro), so they need to be supplied by the job. That will be new for Avro users. But excluding the hive-exec dependency and explicitly specifying avro-mapred to be only 1.7.6 with the correct classifier will be necessary if anything like maven enforcer is ever run.
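The classifier behavior described above reduces to a small property-interpolation rule: an empty avro.mapred.classifier yields the plain jar, while the hadoop-2.4 profile's hadoop2 value yields the -hadoop2 jar. A hypothetical helper (not Maven itself) makes the rule concrete:

```python
def avro_mapred_jar(version, properties):
    """Return the avro-mapred jar name that would be resolved, given the
    effective Maven properties. A hypothetical stand-in for Maven's own
    ${...} interpolation of the <classifier> element."""
    classifier = properties.get("avro.mapred.classifier", "")
    if classifier:
        return "avro-mapred-%s-%s.jar" % (version, classifier)
    # Empty or missing classifier: the plain (hadoop1-equivalent) jar.
    return "avro-mapred-%s.jar" % version
```

With no profile active, avro_mapred_jar("1.7.6", {}) gives avro-mapred-1.7.6.jar; with the hadoop-2.4 profile's properties it gives avro-mapred-1.7.6-hadoop2.jar, matching the cases in the comment.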
[GitHub] spark pull request: [SPARK-4969][STREAMING][PYTHON] Add binaryReco...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/3803#discussion_r24061014 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---

```
@@ -671,7 +674,11 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
       classOf[LongWritable], classOf[BytesWritable], conf = conf)
-    val data = br.map { case (k, v) => v.getBytes }
+    val data = br.map { case (k, v) =>
+      val bytes = v.getBytes
+      assert(bytes.length == recordLength, "Byte array does not have correct length")
+      bytes
```

--- End diff -- nit: Is this something that the user should be made aware of in the docs?
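The check under discussion — that each binary record actually has the configured length — can be sketched outside Spark as a plain fixed-record-length reader. This is an illustrative helper, not the SparkContext.binaryRecords API itself:

```python
def binary_records(data, record_length):
    """Split a byte string into fixed-length records, asserting that
    nothing is left over (mirroring the length check in the diff)."""
    assert len(data) % record_length == 0, \
        "Byte array does not have correct length"
    return [data[i:i + record_length]
            for i in range(0, len(data), record_length)]
```

A 6-byte input with record_length=2 yields three records; a 5-byte input trips the assertion, which is exactly the condition the review comment suggests surfacing in the docs.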
[GitHub] spark pull request: [SPARK-5583][SQL][WIP] Support unique join in ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4354#issuecomment-72788451 [Test build #26721 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26721/consoleFull) for PR 4354 at commit [`015fe2f`](https://github.com/apache/spark/commit/015fe2f7fede76ef25102f1dc928ee5c57c6d167). * This patch **does not merge cleanly**.
[GitHub] spark pull request: [SPARK-5583][SQL][WIP] Support unique join in ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4354#issuecomment-72792001 [Test build #26722 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26722/consoleFull) for PR 4354 at commit [`dd34ebf`](https://github.com/apache/spark/commit/dd34ebf60295046cb1bc82862b6c0a86ce2f8837). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4969][STREAMING][PYTHON] Add binaryReco...
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/3803#discussion_r24063184 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala ---

```
@@ -210,6 +211,20 @@ class JavaStreamingContext(val ssc: StreamingContext) extends Closeable {
   }

   /**
    * :: Experimental ::
    *
    * Create an input stream that monitors a Hadoop-compatible filesystem
    * for new files and reads them as flat binary files with fixed record lengths,
    * yielding byte arrays
    * @param directory HDFS directory to monitor for new files
    * @param recordLength The length at which to split the records
    */
```

--- End diff -- Thanks, added!
[GitHub] spark pull request: [SPARK-5426][SQL] Add SparkSQL Java API helper...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4243#issuecomment-72792070 [Test build #26718 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26718/consoleFull) for PR 4243 at commit [`2390fba`](https://github.com/apache/spark/commit/2390fba337e80eb63fa25b0c4fa6adc9945b6d2d). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5426][SQL] Add SparkSQL Java API helper...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4243#issuecomment-72792072 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26718/
[GitHub] spark pull request: [SPARK-5583][SQL][WIP] Support unique join in ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4354#issuecomment-72792006 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26722/
[GitHub] spark pull request: [SPARK-5583][SQL][WIP] Support unique join in ...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/4354#issuecomment-72792978 It seems this is Hive-specific syntax, as far as I know...
[GitHub] spark pull request: [SPARK-4969][STREAMING][PYTHON] Add binaryReco...
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/3803#discussion_r24063473 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---

```
@@ -671,7 +674,11 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
       classOf[LongWritable], classOf[BytesWritable], conf = conf)
-    val data = br.map { case (k, v) => v.getBytes }
+    val data = br.map { case (k, v) =>
+      val bytes = v.getBytes
+      assert(bytes.length == recordLength, "Byte array does not have correct length")
+      bytes
```

--- End diff -- Do you mean something more than these notes we're adding? I just clarified the notes a bit to make it obvious the check is on the byte array.
[GitHub] spark pull request: [WIP] [SPARK-5577] Python udf for DataFrame
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/4351#discussion_r24063741 --- Diff: python/pyspark/sql.py ---

```
@@ -2263,18 +2263,6 @@ def subtract(self, other):
         return DataFrame(getattr(self._jdf, "except")(other._jdf), self.sql_ctx)

-    def sample(self, withReplacement, fraction, seed=None):
```

--- End diff -- there are two sample().
[GitHub] spark pull request: [SPARK-5579][SQL][DataFrame] Support for proje...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/4348#discussion_r24063992 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameImpl.scala ---

```
@@ -179,10 +179,20 @@ private[sql] class DataFrameImpl protected[sql](
     select((col +: cols).map(Column(_)) :_*)
   }

+  override def selectExpr(exprs: String*): DataFrame = {
```

--- End diff -- It should work in these cases with this implementation.

```
select('a', '`the name`', 'a + 1', 'min(b) * 3')
```
[GitHub] spark pull request: [SPARK-5379][Streaming] Add awaitTerminationOr...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4171#issuecomment-72796748 [Test build #26726 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26726/consoleFull) for PR 4171 at commit [`c9e660b`](https://github.com/apache/spark/commit/c9e660b4c8e4547a16c00364fa7baa2a40536345). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5379][Streaming] Add awaitTerminationOr...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/4171#issuecomment-72797695 LGTM, will merge when tests pass.
[GitHub] spark pull request: [SPARK-5574] use given name prefix in dir
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4344#issuecomment-72798456 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-5498][SQL]fix bug when query the data w...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/4289#discussion_r24057968 --- Diff: sql/hive/v0.12.0/src/main/scala/org/apache/spark/sql/hive/Shim12.scala ---

```
@@ -242,6 +242,11 @@ private[hive] object HiveShim {
     }
   }

+  // make getConvertedOI compatible between 0.12.0 and 0.13.1
+  def getConvertedOI(inputOI: ObjectInspector, outputOI: ObjectInspector): ObjectInspector = {
+    ObjectInspectorConverters.getConvertedOI(inputOI, outputOI, new java.lang.Boolean(true))
```

--- End diff -- Just `true`?
[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4066#issuecomment-72777921 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26695/
[GitHub] spark pull request: [SPARK-4939] revive offers periodically in Loc...
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/4147#issuecomment-72777970 LGTM; I'll merge this as soon as tests pass. @tdas @pwendell this is fine with me to merge into 1.2 (although I realize it won't make it until 1.2.2); does that seem ok with you?
[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4066#issuecomment-72777913

[Test build #26695 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26695/consoleFull) for PR 4066 at commit [`97da5fe`](https://github.com/apache/spark/commit/97da5feb6fe49255afaac1dc9d5db1edf8c1ff42).

* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class TaskCommitDenied(`
  * `class CommitDeniedException(`
  * `class OutputCommitCoordinatorActor(outputCommitCoordinator: OutputCommitCoordinator)`
[GitHub] spark pull request: [SPARK-5278][SQL] complete the check of ambigu...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4068#issuecomment-72778669 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26697/ Test PASSed.
[GitHub] spark pull request: [FIX][MLLIB] fix seed handling in Python GMM
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4349#issuecomment-72778764

[Test build #26696 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26696/consoleFull) for PR 4349 at commit [`3be5926`](https://github.com/apache/spark/commit/3be592612f9e4b5b6a1fbc2bf84ac006fa223bfb).

* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [FIX][MLLIB] fix seed handling in Python GMM
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4349#issuecomment-72778768 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26696/ Test PASSed.
[GitHub] spark pull request: [SPARK-5278][SQL] complete the check of ambigu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4068#issuecomment-72778662

[Test build #26697 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26697/consoleFull) for PR 4068 at commit [`340223d`](https://github.com/apache/spark/commit/340223d44fce76096e9952cc8aab9eb46ff9d1f8).

* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class Dsl(object):`
  * `class ExamplePointUDT(UserDefinedType):`
  * `class SQLTests(ReusedPySparkTestCase):`
  * `case class UnresolvedGetField(child: Expression, fieldName: String) extends UnaryExpression`
  * `case class GetField(child: Expression, field: StructField, ordinal: Int) extends UnaryExpression`
[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4066#issuecomment-72780055 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26700/ Test PASSed.