[GitHub] spark issue #15575: [SPARK-18038] [SQL] Move output partitioning definition ...

2016-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15575
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15575: [SPARK-18038] [SQL] Move output partitioning definition ...

2016-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15575
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67319/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15575: [SPARK-18038] [SQL] Move output partitioning definition ...

2016-10-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15575
  
**[Test build #67319 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67319/consoleFull)**
 for PR 15575 at commit 
[`2eee2ee`](https://github.com/apache/spark/commit/2eee2eea81d665706a76a5614d26b4401a07f127).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15541: [SPARK-17637][Scheduler]Packed scheduling for Spa...

2016-10-20 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15541#discussion_r84422357
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala 
---
@@ -0,0 +1,218 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+import scala.collection.mutable.ArrayBuffer
+import scala.collection.mutable.PriorityQueue
+import scala.util.Random
+
+import org.apache.spark.internal.{config, Logging}
+import org.apache.spark.SparkConf
+import org.apache.spark.util.Utils
+
+/** Tracks the current state of the workers with available cores and 
assigned task list. */
+class OfferState(val workOffer: WorkerOffer) {
+  /** The current remaining cores that can be allocated to tasks. */
+  var coresAvailable: Int = workOffer.cores
+  /** The list of tasks that are assigned to this WorkerOffer. */
+  val tasks = new ArrayBuffer[TaskDescription](coresAvailable)
+}
+
+/**
+ * TaskAssigner is the base class for all task assigner implementations, 
and can be
+ * extended to implement different task scheduling algorithms.
+ * Together with [[org.apache.spark.scheduler.TaskScheduler 
TaskScheduler]], TaskAssigner
+ * is used to assign tasks to workers with available cores. Internally, 
when TaskScheduler
+ * perform task assignment given available workers, it first sorts the 
candidate tasksets,
+ * and then for each taskset, it takes a number of rounds to request 
TaskAssigner for task
+ * assignment with different locality restrictions until there is either 
no qualified
+ * workers or no valid tasks to be assigned.
+ *
+ * TaskAssigner is responsible to maintain the worker availability state 
and task assignment
+ * information. The contract between 
[[org.apache.spark.scheduler.TaskScheduler TaskScheduler]]
+ * and TaskAssigner is as follows.
+ *
+ * First, TaskScheduler invokes construct() of TaskAssigner to initialize 
the its internal
+ * worker states at the beginning of resource offering.
+ *
+ * Second, before each round of task assignment for a taskset, 
TaskScheduler invoke the init()
--- End diff --

`invoke` -> `invokes`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15568: [SPARK-18028][SQL] simplify TableFileCatalog

2016-10-20 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/15568
  
@mallman , item 4 is a potential problem in the future. The current 
workflow is, we get the `MetastoreRelation` via 
`HiveMetastoreCatalog.lookupRelation`, which always lower case the database and 
table name. Then we construct `TableFileCatalog` and call 
`ExternalCatalog.listPartitionsByFilter` with the database and table name. So 
we won't have case senstivity problem here.

However, we may have a workflow which call `listPartitionsByFilter` 
directly with database and table name given by users. e.g. #15302. Then we need 
to care about case sensitivity.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14847: [SPARK-17254][SQL] Add StopAfter physical plan for the f...

2016-10-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14847
  
**[Test build #67324 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67324/consoleFull)**
 for PR 14847 at commit 
[`3c7c1b5`](https://github.com/apache/spark/commit/3c7c1b506b135a1b980d19d98aaa9e41c4fa9018).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15541: [SPARK-17637][Scheduler]Packed scheduling for Spa...

2016-10-20 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15541#discussion_r84422079
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala 
---
@@ -0,0 +1,218 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+import scala.collection.mutable.ArrayBuffer
+import scala.collection.mutable.PriorityQueue
+import scala.util.Random
+
+import org.apache.spark.internal.{config, Logging}
+import org.apache.spark.SparkConf
+import org.apache.spark.util.Utils
+
+/** Tracks the current state of the workers with available cores and 
assigned task list. */
+class OfferState(val workOffer: WorkerOffer) {
+  /** The current remaining cores that can be allocated to tasks. */
+  var coresAvailable: Int = workOffer.cores
+  /** The list of tasks that are assigned to this WorkerOffer. */
+  val tasks = new ArrayBuffer[TaskDescription](coresAvailable)
+}
+
+/**
+ * TaskAssigner is the base class for all task assigner implementations, 
and can be
+ * extended to implement different task scheduling algorithms.
+ * Together with [[org.apache.spark.scheduler.TaskScheduler 
TaskScheduler]], TaskAssigner
+ * is used to assign tasks to workers with available cores. Internally, 
when TaskScheduler
+ * perform task assignment given available workers, it first sorts the 
candidate tasksets,
--- End diff --

`perform` -> `performs`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15541: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-10-20 Thread zhzhan
Github user zhzhan commented on the issue:

https://github.com/apache/spark/pull/15541
  
@gatorsmile I didn't see your new comments 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15541: [SPARK-17637][Scheduler]Packed scheduling for Spa...

2016-10-20 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15541#discussion_r84421949
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala 
---
@@ -0,0 +1,218 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+import scala.collection.mutable.ArrayBuffer
+import scala.collection.mutable.PriorityQueue
+import scala.util.Random
+
+import org.apache.spark.internal.{config, Logging}
+import org.apache.spark.SparkConf
+import org.apache.spark.util.Utils
+
+/** Tracks the current state of the workers with available cores and 
assigned task list. */
+class OfferState(val workOffer: WorkerOffer) {
--- End diff --

Is this class private to `scheduler`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13891: [SPARK-6685][MLLIB]Use DSYRK to compute AtA in ALS

2016-10-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13891
  
**[Test build #67323 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67323/consoleFull)**
 for PR 13891 at commit 
[`dc4f4ba`](https://github.com/apache/spark/commit/dc4f4badba26635aa95c6ade5a589d4bd50ae886).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15568: [SPARK-18028][SQL] simplify TableFileCatalog

2016-10-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15568#discussion_r84421537
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/TableFileCatalog.scala
 ---
@@ -102,6 +95,13 @@ class TableFileCatalog(
   }
 
   override def inputFiles: Array[String] = allPartitions.inputFiles
+
+  override def equals(o: Any): Boolean = o match {
--- End diff --

Under hive context, we will cache the `LogicalRelation` for every data 
source table(including converted from hive), which means every table will 
always have a `TableFileCatalog` of same instance.

However, it's not true in sql core. We will re-construct the 
`TableFileCatalog` and `LogicalRelation` everytime we look up a table. Thus we 
may encounter cache miss even if the table is cached, because 
`TableFileCatalog` of difference instances never equal to each other.

Although it's not a real problem now, I think it's reasonable to follow 
`ListFileCatalong` and add the `equals` and `hashCode`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15319: [SPARK-17733][SQL] InferFiltersFromConstraints rule neve...

2016-10-20 Thread sameeragarwal
Github user sameeragarwal commented on the issue:

https://github.com/apache/spark/pull/15319
  
Thanks @jiangxb1987, this equivalence class approach looks pretty solid. 
I'll take a closer look tomorrow!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13891: [SPARK-6685][MLLIB]Use DSYRK to compute AtA in ALS

2016-10-20 Thread hqzizania
Github user hqzizania commented on the issue:

https://github.com/apache/spark/pull/13891
  
@yanboliang So sorry for my late response.

Some regression performance test results:
Datasets: using 
[genExplicitTestData](https://github.com/apache/spark/pull/13891/files#diff-2c1dd810bcfabf444a0ecdcd613e954aR393)
 to generate with numUsers = 2, numItems = 2000
Single-node cluster: 16 physical cores, 100GB memory
ALS: numUserBlocks = 30, numItemBlocks = 30
It will run 
[computeFactors](https://github.com/apache/spark/pull/13891/files#diff-be65dd1d6adc53138156641b610fcadaR1291)
 with 30 partitions in parallel.


ALS: rank = 1024
Computing time and used memory for computeFactors:

![image](https://cloud.githubusercontent.com/assets/9315372/19587668/cb0065a8-9792-11e6-8226-b5a448a8dc9a.png)


=
ALS: rank = 129 
Computing time for computeFactors:

![image](https://cloud.githubusercontent.com/assets/9315372/19587672/d0197d54-9792-11e6-86df-b260d165c8e6.png)



=
ALS: rank = 512
Computing time for computeFactors:

![image](https://cloud.githubusercontent.com/assets/9315372/19587680/d542783a-9792-11e6-95a2-c2001eb59853.png)



The results shows this patch makes it faster very much when rank is large, 
but we should reset the two threshold values of "doStack".
However, a following problem is that the unit test for this patch will take 
much time as rank must be larger than 1024. Should I just remove the unit test?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15576: [SPARK-17674][SPARKR] check for warning in test output

2016-10-20 Thread wangmiao1981
Github user wangmiao1981 commented on the issue:

https://github.com/apache/spark/pull/15576
  
Thanks for fixing this! I encountered this issue before. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15541: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-10-20 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15541
  
Accidentally, I deleted all my comments. You might need to check the emails 
to find all my comments. :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15579: Added support for extra command in front of spark.

2016-10-20 Thread sheepduke
Github user sheepduke commented on the issue:

https://github.com/apache/spark/pull/15579
  
This is rather useful sometimes because you wan tto add some extra tuning 
arguments like 'numactl'.
Otherwise it is not even possible to achieve that.

Yes it only works with YARN for now because we have requirements on it.

In future more features may be added to other facilities.

Any idea or potential improvement?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15575: [SPARK-18038] [SQL] Move output partitioning definition ...

2016-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15575
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15575: [SPARK-18038] [SQL] Move output partitioning definition ...

2016-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15575
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67318/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15575: [SPARK-18038] [SQL] Move output partitioning definition ...

2016-10-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15575
  
**[Test build #67318 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67318/consoleFull)**
 for PR 15575 at commit 
[`c5b6626`](https://github.com/apache/spark/commit/c5b6626c144a97054f74adc3158ceea6de3cea58).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15579: Added support for extra command in front of spark.

2016-10-20 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/15579
  
Please also review 
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark

Thanks a lot!



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15579: Added support for extra command in front of spark.

2016-10-20 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/15579
  
This seems really weird to do (and also only works in YARN).



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15576: [SPARK-17674][SPARKR] check for warning in test output

2016-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15576
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67321/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15576: [SPARK-17674][SPARKR] check for warning in test output

2016-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15576
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15576: [SPARK-17674][SPARKR] check for warning in test output

2016-10-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15576
  
**[Test build #67321 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67321/consoleFull)**
 for PR 15576 at commit 
[`1c9bd2e`](https://github.com/apache/spark/commit/1c9bd2ef353802132e33558a61725efdd8bd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15513: [SPARK-17963][SQL][Documentation] Add examples (extend) ...

2016-10-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15513
  
Oh, no. I will try to test each when writing the documentation. Please 
ignore minor incorrectness here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15513: [SPARK-17963][SQL][Documentation] Add examples (extend) ...

2016-10-20 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/15513
  
Do we have binary literals anyway?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15513: [SPARK-17963][SQL][Documentation] Add examples (extend) ...

2016-10-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15513
  
Then I will do as below:


**Literal only**

```
a string literal.
a numeric literal that defines ...
a binary literal that represents ... For example, ...
```

**Column**

```
a timestamp expression.
a binary expression that represents ...
a date expression that defines ... For example, ...
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15513: [SPARK-17963][SQL][Documentation] Add examples (extend) ...

2016-10-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15513
  
Sure, sounds great.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15579: Added support for extra command in front of spark.

2016-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15579
  
Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15579: Added support for extra command in front of spark...

2016-10-20 Thread sheepduke
GitHub user sheepduke opened a pull request:

https://github.com/apache/spark/pull/15579

Added support for extra command in front of spark.

## What changes were proposed in this pull request?

A minor functional change is added into yarn facility to make it possible 
for users to add some command prefix before spark. Such as "numactl 
--cpunodebind=1 ...". This makes it useful in some cases to do some 
optimization work.

## How was this patch tested?
Since it is only a minor functional patch, it is re-compiled and tested on 
our cluster with BigBench to make sure that it works properly.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sheepduke/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15579.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15579


commit a24aff91d59dd848cdd5802602b340db832e5507
Author: U-CCR\daianyue 
Date:   2016-10-21T03:15:46Z

Added support for extra command in front of spark.
Such as numactl etc.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15513: [SPARK-17963][SQL][Documentation] Add examples (extend) ...

2016-10-20 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/15513
  
How about "a string literal" vs "a string expression"?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15513: [SPARK-17963][SQL][Documentation] Add examples (extend) ...

2016-10-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15513
  
Also, will try to consolidate multiple usages and take out `_FUNC_` in 
extended part too.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15513: [SPARK-17963][SQL][Documentation] Add examples (extend) ...

2016-10-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15513
  
@rxin, How about this?

**Literal only**

```
a string type value.
a numeric type value that defines ...
a binary type value that defines ... For example, ...
```

**Column**

```
a timestamp type column/value.
a binary type column/value that represents ...
a date type column/value that represents ... For example, ...
```

(BTW, I am sure you meant to not mention about implicitly casting with 
knowing this but I am a little bit worried. This was actually taken after 
Oracle[1]. Will anyway try to take out this but I wanted to just note this just 
in case.)

[1] 
https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions011.htm#i82074
> any numeric datatype or any nonnumeric datatype that can be implicitly 
converted to a numeric datatype.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15551: [SPARK-18012][SQL] Simplify WriterContainer

2016-10-20 Thread tejasapatil
Github user tejasapatil commented on the issue:

https://github.com/apache/spark/pull/15551
  
@rxin : Thanks for notifying me.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15569: [SPARK-18029][SQL] PruneFileSourcePartitions should not ...

2016-10-20 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/15569
  
thanks for the review, merging to master!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15569: [SPARK-18029][SQL] PruneFileSourcePartitions shou...

2016-10-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15569


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15481: [SPARK-17929] [CORE] Fix deadlock when CoarseGrainedSche...

2016-10-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15481
  
**[Test build #67322 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67322/consoleFull)**
 for PR 15481 at commit 
[`7bf3bf8`](https://github.com/apache/spark/commit/7bf3bf8606f261e2eb1bebd3c6e3c3ff8600e140).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15481: [SPARK-17929] [CORE] Fix deadlock when CoarseGrainedSche...

2016-10-20 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/15481
  
retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15576: [SPARK-17674][SPARKR] check for warning in test output

2016-10-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15576
  
**[Test build #67321 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67321/consoleFull)**
 for PR 15576 at commit 
[`1c9bd2e`](https://github.com/apache/spark/commit/1c9bd2ef353802132e33558a61725efdd8bd).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15575: [SPARK-18038] [SQL] Move output partitioning definition ...

2016-10-20 Thread tejasapatil
Github user tejasapatil commented on the issue:

https://github.com/apache/spark/pull/15575
  
@yhuai : please see the table in this PRs description. I have added a 
`comment` (last column) for each entry to point out those cases.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15575: [SPARK-18038] [SQL] Move output partitioning definition ...

2016-10-20 Thread yhuai
Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/15575
  
> I felt that there are numerous places where child's output ordering could 
be used but the operators don't set it

Can you list them at here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15576: [SPARK-17674][SPARKR] check for warning in test output

2016-10-20 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/15576
  
Rebased. This should pass now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15576: [WIP][SPARK-17674][SPARKR] check for warning in test out...

2016-10-20 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/15576
  
Test failure is intentional, it's picking up the following warnings:

```
Warnings 
---
1. createDataFrame uses files for large objects (@test_sparkSQL.R#215) - 
Use Sepal_Length instead of Sepal.Length  as column name

2. createDataFrame uses files for large objects (@test_sparkSQL.R#215) - 
Use Sepal_Width instead of Sepal.Width  as column name

3. createDataFrame uses files for large objects (@test_sparkSQL.R#215) - 
Use Petal_Length instead of Petal.Length  as column name

4. createDataFrame uses files for large objects (@test_sparkSQL.R#215) - 
Use Petal_Width instead of Petal.Width  as column name
```

Which is fixed in PR #15560 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15560: [SPARKR] fix warnings

2016-10-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15560


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15578: Branch 2.0

2016-10-20 Thread wankunde
Github user wankunde closed the pull request at:

https://github.com/apache/spark/pull/15578


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15560: [SPARKR] fix warnings

2016-10-20 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/15560
  
merged to master and branch-2.0


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15575: [SPARK-18038] [SQL] Move output partitioning definition ...

2016-10-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15575
  
**[Test build #67320 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67320/consoleFull)**
 for PR 15575 at commit 
[`cfcd2a7`](https://github.com/apache/spark/commit/cfcd2a79231ed16c2a3e82551e87a6de888c2af6).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15575: [SPARK-18038] [SQL] Move output partitioning definition ...

2016-10-20 Thread tejasapatil
Github user tejasapatil commented on the issue:

https://github.com/apache/spark/pull/15575
  
Agree with the planner behavior described in the last few comments 
(relevant code : 
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala#L165)

I have updated the description of the PR with a table which shows the 
output partitioning and ordering for each implementation of `UnaryExecNode`. It 
will help with the review in case we are missing something / setting something 
wrong.

`BroadcastExchangeExec `, `CoalesceExec `, `CollectLimitExec `, 
`ExpandExec`, `TakeOrderedAndProjectExec` and `ShuffleExchange` do not use 
child's output partitioning.

I felt that there are numerous places where child's output ordering could 
be used but the operators don't set it (thus using default `Nil`). I won't made 
those modifications in this PR as I want to only do the refac in this diff.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15575: [SPARK-18038] [SQL] Move output partitioning definition ...

2016-10-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15575
  
**[Test build #67319 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67319/consoleFull)**
 for PR 15575 at commit 
[`2eee2ee`](https://github.com/apache/spark/commit/2eee2eea81d665706a76a5614d26b4401a07f127).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14079
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67317/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14079
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15577: [SPARK-18030][Tests] Adds more checks to collect ...

2016-10-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15577


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-10-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14079
  
**[Test build #67317 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67317/consoleFull)**
 for PR 14079 at commit 
[`6b3babc`](https://github.com/apache/spark/commit/6b3babc703668e32f3ec8d5026baa4f2b161a383).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class YarnSchedulerBackendSuite extends SparkFunSuite with 
MockitoSugar with LocalSparkContext `


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15577: [SPARK-18030][Tests] Adds more checks to collect more in...

2016-10-20 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/15577
  
Thanks! Merging to master and 2.0.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15513: [SPARK-17963][SQL][Documentation] Add examples (extend) ...

2016-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15513
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15513: [SPARK-17963][SQL][Documentation] Add examples (extend) ...

2016-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15513
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67314/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15513: [SPARK-17963][SQL][Documentation] Add examples (extend) ...

2016-10-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15513
  
**[Test build #67314 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67314/consoleFull)**
 for PR 15513 at commit 
[`7841860`](https://github.com/apache/spark/commit/7841860bf50fba8b31da774574ad9818ee678d85).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15513: [SPARK-17963][SQL][Documentation] Add examples (extend) ...

2016-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15513
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67315/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15513: [SPARK-17963][SQL][Documentation] Add examples (extend) ...

2016-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15513
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15513: [SPARK-17963][SQL][Documentation] Add examples (extend) ...

2016-10-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15513
  
**[Test build #67315 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67315/consoleFull)**
 for PR 15513 at commit 
[`01eecfe`](https://github.com/apache/spark/commit/01eecfe44c5edc3db0d25f43a7ae8f80ca07ac61).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15541: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-10-20 Thread zhzhan
Github user zhzhan commented on the issue:

https://github.com/apache/spark/pull/15541
  
@rxin Can you please take a look, and let me know if you have any concern?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15513: [SPARK-17963][SQL][Documentation] Add examples (extend) ...

2016-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15513
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15513: [SPARK-17963][SQL][Documentation] Add examples (extend) ...

2016-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15513
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67313/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15513: [SPARK-17963][SQL][Documentation] Add examples (extend) ...

2016-10-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15513
  
**[Test build #67313 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67313/consoleFull)**
 for PR 15513 at commit 
[`91d2ab5`](https://github.com/apache/spark/commit/91d2ab5174273623819a6b18f0d2d557c54603f7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15578: Branch 2.0

2016-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15578
  
Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15578: Branch 2.0

2016-10-20 Thread wankunde
GitHub user wankunde opened a pull request:

https://github.com/apache/spark/pull/15578

Branch 2.0



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wankunde/spark branch-2.0

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15578.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15578


commit 72d9fba26c19aae73116fd0d00b566967934c6fc
Author: WeichenXu 
Date:   2016-09-22T11:35:54Z

[SPARK-17281][ML][MLLIB] Add treeAggregateDepth parameter for 
AFTSurvivalRegression

## What changes were proposed in this pull request?

Add treeAggregateDepth parameter for AFTSurvivalRegression to keep 
consistent with LiR/LoR.

## How was this patch tested?

Existing tests.

Author: WeichenXu 

Closes #14851 from 
WeichenXu123/add_treeAggregate_param_for_survival_regression.

commit 8a02410a92429bff50d6ce082f873cea9e9fa91e
Author: Wenchen Fan 
Date:   2016-09-22T15:25:32Z

[SQL][MINOR] correct the comment of SortBasedAggregationIterator.safeProj

## What changes were proposed in this pull request?

This comment went stale long time ago, this PR fixes it according to my 
understanding.

## How was this patch tested?

N/A

Author: Wenchen Fan 

Closes #15095 from cloud-fan/update-comment.

commit 17b72d31e0c59711eddeb525becb8085930eadcc
Author: Dhruve Ashar 
Date:   2016-09-22T17:10:37Z

[SPARK-17365][CORE] Remove/Kill multiple executors together to reduce RPC 
call time.

## What changes were proposed in this pull request?
We are killing multiple executors together instead of iterating over 
expensive RPC calls to kill single executor.

## How was this patch tested?
Executed sample spark job to observe executors being killed/removed with 
dynamic allocation enabled.

Author: Dhruve Ashar 
Author: Dhruve Ashar 

Closes #15152 from dhruve/impr/SPARK-17365.

commit 9f24a17c59b1130d97efa7d313c06577f7344338
Author: Shivaram Venkataraman 
Date:   2016-09-22T18:52:42Z

Skip building R vignettes if Spark is not built

## What changes were proposed in this pull request?

When we build the docs separately we don't have the JAR files from the 
Spark build in
the same tree. As the SparkR vignettes need to launch a SparkContext to be 
built, we skip building them if JAR files don't exist

## How was this patch tested?

To test this we can run the following:
```
build/mvn -DskipTests -Psparkr clean
./R/create-docs.sh
```
You should see a line `Skipping R vignettes as Spark JARs not found` at the 
end

Author: Shivaram Venkataraman 

Closes #15200 from shivaram/sparkr-vignette-skip.

commit 85d609cf25c1da2df3cd4f5d5aeaf3cbcf0d674c
Author: Burak Yavuz 
Date:   2016-09-22T20:05:41Z

[SPARK-17613] S3A base paths with no '/' at the end return empty DataFrames

## What changes were proposed in this pull request?

Consider you have a bucket as `s3a://some-bucket`
and under it you have files:
```
s3a://some-bucket/file1.parquet
s3a://some-bucket/file2.parquet
```
Getting the parent path of `s3a://some-bucket/file1.parquet` yields
`s3a://some-bucket/` and the ListingFileCatalog uses this as the key in the 
hash map.

When catalog.allFiles is called, we use `s3a://some-bucket` (no slash at 
the end) to get the list of files, and we're left with an empty list!

This PR fixes this by adding a `/` at the end of the `URI` iff the given 
`Path` doesn't have a parent, i.e. is the root. This is a no-op if the path 
already had a `/` at the end, and is handled through the Hadoop Path, path 
merging semantics.

## How was this patch tested?

Unit test in `FileCatalogSuite`.

Author: Burak Yavuz 

Closes #15169 from brkyvz/SPARK-17613.

commit 3cdae0ff2f45643df7bc198cb48623526c7eb1a6
Author: Shixiong Zhu 
Date:   2016-09-22T21:26:45Z

[SPARK-17638][STREAMING] Stop JVM StreamingContext when the Python process 
is dead

## What changes were proposed in this pull request?

When the Python process is dead, the JVM StreamingContext is still running. 
Hence we will see a lot of Py4jException before the JVM process exits. It's 
better to stop the JVM StreamingContext to avoid those annoying logs.

## How was this patch tested?

Jenkins

Author: Shixiong Zhu 


[GitHub] spark issue #14957: [SPARK-4502][SQL]Support parquet nested struct pruning a...

2016-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14957
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67316/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14957: [SPARK-4502][SQL]Support parquet nested struct pruning a...

2016-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14957
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15575: [SPARK-18038] [SQL] Move output partitioning definition ...

2016-10-20 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/15575
  
Yeah. So I don't see any exception that an unary node, if it is not 
`ShuffleExchange`, can have an `outputPartitioning` other than 
`child.outputPartitioning`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14957: [SPARK-4502][SQL]Support parquet nested struct pruning a...

2016-10-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14957
  
**[Test build #67316 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67316/consoleFull)**
 for PR 14957 at commit 
[`23465ba`](https://github.com/apache/spark/commit/23465babd0f60db8a79e70f0589af2ec5bf360eb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15575: [SPARK-18038] [SQL] Move output partitioning definition ...

2016-10-20 Thread yhuai
Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/15575
  
Our planner decides if to add an `ShuffleExchange` by consider 
`outputPartitioning` and `requiredDistribution` together. If the 
`outputPartitioning` of the child does not satisfy the `requiredDistribution` 
of the parent, we will add a `ShuffleExchange` operator.

When we say `child`, it is literally the child of a node. It does not 
matter if the child is a `ShuffleExchange` or not. For a node, when its 
`outputPartitioning` is `child. outputPartitioning`, this node does not shuffle 
data. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15566: [SPARK-18026][SQL] should not always lowercase partition...

2016-10-20 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/15566
  
Sorry I was wrong. The root cause of SPARK-17990 is that, when we write 
data for a hive table, we ignore the truth that hive table is not 
case-preserving, and create the partition directory with the case-preserving 
partition columns.

I think it's not related to this PR and we should fix it with a new PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15556: [SPARK-18010][Core] Reduce work performed for bui...

2016-10-20 Thread vijoshi
Github user vijoshi commented on a diff in the pull request:

https://github.com/apache/spark/pull/15556#discussion_r84413566
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/ReplayListenerBus.scala ---
@@ -43,38 +43,56 @@ private[spark] class ReplayListenerBus extends 
SparkListenerBus with Logging {
* @param sourceName Filename (or other source identifier) from whence 
@logData is being read
* @param maybeTruncated Indicate whether log file might be truncated 
(some abnormal situations
*encountered, log file might not finished writing) or not
+   * @param eventsFilter Filter function to select JSON event strings in 
the log data stream that
+   *should be parsed and replayed. When not specified, all event 
strings in the log data
+   *are parsed and replayed.
*/
   def replay(
   logData: InputStream,
   sourceName: String,
-  maybeTruncated: Boolean = false): Unit = {
-var currentLine: String = null
-var lineNumber: Int = 1
+  maybeTruncated: Boolean = false,
+  eventsFilter: (String) => Boolean = 
ReplayListenerBus.SELECT_ALL_FILTER): Unit = {
 try {
-  val lines = Source.fromInputStream(logData).getLines()
-  while (lines.hasNext) {
-currentLine = lines.next()
+  val lineEntries = Source.fromInputStream(logData)
+.getLines()
+.zipWithIndex
+.filter(entry => eventsFilter(entry._1))
+
+  var entry: (String, Int) = ("", 0)
+
+  while (lineEntries.hasNext) {
 try {
-  postToAll(JsonProtocol.sparkEventFromJson(parse(currentLine)))
+  entry = lineEntries.next()
+  postToAll(JsonProtocol.sparkEventFromJson(parse(entry._1)))
 } catch {
   case jpe: JsonParseException =>
 // We can only ignore exception from last line of the file 
that might be truncated
-if (!maybeTruncated || lines.hasNext) {
+// the last entry may not be the very last line in the event 
log, but we treat it
+// as such in a best effort to replay the given input
+if (!maybeTruncated || lineEntries.hasNext) {
   throw jpe
 } else {
   logWarning(s"Got JsonParseException from log file 
$sourceName" +
-s" at line $lineNumber, the file might not have finished 
writing cleanly.")
+s" at line number ${entry._2}, the file might not have 
finished writing cleanly.")
 }
+
+  case e: Exception =>
--- End diff --

needed to have the line number at which failure happened, since we had that 
in the earlier implementation. the line/line no values are not in scope in the 
outer catch so can't get them logged there. open for a better way to do this if 
any.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13507: [SPARK-15765][SQL][Streaming] Make continuous Par...

2016-10-20 Thread lw-lin
Github user lw-lin closed the pull request at:

https://github.com/apache/spark/pull/13507


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13507: [SPARK-15765][SQL][Streaming] Make continuous Parquet wr...

2016-10-20 Thread lw-lin
Github user lw-lin commented on the issue:

https://github.com/apache/spark/pull/13507
  
I'm closing this in favor of SPARK-17924, thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15575: [SPARK-18038] [SQL] Move output partitioning definition ...

2016-10-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15575
  
**[Test build #67311 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67311/consoleFull)**
 for PR 15575 at commit 
[`f9a4420`](https://github.com/apache/spark/commit/f9a442080d73dd42682a20f0a1ee6e5844abb938).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15575: [SPARK-18038] [SQL] Move output partitioning definition ...

2016-10-20 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/15575
  
The query planner will inject exchange based on the required distribution 
for the child of `UnaryNodeExec`. Looks like all unary nodes have 
`child.outputPartitioning` as its output partition. I can't see an exception 
based on the fact that only `ShuffleExchange` will change `outputPartitioning`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15575: [SPARK-18038] [SQL] Move output partitioning definition ...

2016-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15575
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67311/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15575: [SPARK-18038] [SQL] Move output partitioning definition ...

2016-10-20 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/15575
  
@yhuai let me be more clear. It is actually fairly confusing to say "this 
only changes if the operator doesn't shuffle data". 

The thing is that we are relying on each operator's output partitioning to 
determine injecting exchanges. So is it safe to depend on "child", which is 
ambiguous in this context? Child could mean the current child before injecting 
exchange, or it could mean the injected exchange operator. Can the query 
planner incorrectly not injecting exchange because a plan node is deemed 
already having the correct output partitioning? Or is this always guaranteed to 
not happen due to certain ordering we enforce in planning.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15575: [SPARK-18038] [SQL] Move output partitioning definition ...

2016-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15575
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15575: [SPARK-18038] [SQL] Move output partitioning definition ...

2016-10-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15575
  
**[Test build #67318 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67318/consoleFull)**
 for PR 15575 at commit 
[`c5b6626`](https://github.com/apache/spark/commit/c5b6626c144a97054f74adc3158ceea6de3cea58).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15575: [SPARK-18038] [SQL] Move output partitioning defi...

2016-10-20 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15575#discussion_r84411683
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala ---
@@ -96,13 +95,15 @@ trait BaseLimitExec extends UnaryExecNode with 
CodegenSupport {
  */
 case class LocalLimitExec(limit: Int, child: SparkPlan) extends 
BaseLimitExec {
   override def outputOrdering: Seq[SortOrder] = child.outputOrdering
+  override def outputPartitioning: Partitioning = child.outputPartitioning
 }
 
 /**
  * Take the first `limit` elements of the child's single output partition.
  */
 case class GlobalLimitExec(limit: Int, child: SparkPlan) extends 
BaseLimitExec {
   override def requiredChildDistribution: List[Distribution] = AllTuples 
:: Nil
+  override def outputPartitioning: Partitioning = child.outputPartitioning
--- End diff --

Nit: also add an empty line here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15575: [SPARK-18038] [SQL] Move output partitioning defi...

2016-10-20 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15575#discussion_r84411628
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SortExec.scala ---
@@ -45,6 +45,8 @@ case class SortExec(
 
   override def outputOrdering: Seq[SortOrder] = sortOrder
 
+  override def outputPartitioning: Partitioning = child.outputPartitioning
--- End diff --

Could we add comments here? It can help others understand the code.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15481: [SPARK-17929] [CORE] Fix deadlock when CoarseGrainedSche...

2016-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15481
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67312/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15481: [SPARK-17929] [CORE] Fix deadlock when CoarseGrainedSche...

2016-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15481
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15481: [SPARK-17929] [CORE] Fix deadlock when CoarseGrainedSche...

2016-10-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15481
  
**[Test build #67312 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67312/consoleFull)**
 for PR 15481 at commit 
[`7bf3bf8`](https://github.com/apache/spark/commit/7bf3bf8606f261e2eb1bebd3c6e3c3ff8600e140).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15297: [SPARK-9862]Handling data skew

2016-10-20 Thread YuhuWang2002
Github user YuhuWang2002 commented on the issue:

https://github.com/apache/spark/pull/15297
  
@tgravescs :
Thank you for your response, when a single reduce task handling huge data, 
it's slowly and unstable. so we split one reduce task to multi- reduce task.
A single reduce task doing like A join B. we split to multi-task. task 1 
doing A1 join B,  task 2 dong A2 join B and so on.  A1 is a part of A which 
read from a range of maps output.  For spark sql, it is the A1 as a  separate 
partitions when processing. so it can use mutil-executor to run the task.  for 
dispersion the process pressure.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15575: [SPARK-18038] [SQL] Move output partitioning definition ...

2016-10-20 Thread yhuai
Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/15575
  
`outputPartitioning` of a node will only be changed if this node shuffles 
data. Right now, only `ShuffleExchange` shuffles data. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15577: [SPARK-18030][Tests] Adds more checks to collect more in...

2016-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15577
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67310/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15577: [SPARK-18030][Tests] Adds more checks to collect more in...

2016-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15577
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15577: [SPARK-18030][Tests] Adds more checks to collect more in...

2016-10-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15577
  
**[Test build #67310 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67310/consoleFull)**
 for PR 15577 at commit 
[`32a1eea`](https://github.com/apache/spark/commit/32a1eea1183dc8c26c6bfd778d4ed24b3e9eb856).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15575: [SPARK-18038] [SQL] Move output partitioning defi...

2016-10-20 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15575#discussion_r84410677
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala ---
@@ -96,13 +95,15 @@ trait BaseLimitExec extends UnaryExecNode with 
CodegenSupport {
  */
 case class LocalLimitExec(limit: Int, child: SparkPlan) extends 
BaseLimitExec {
   override def outputOrdering: Seq[SortOrder] = child.outputOrdering
+  override def outputPartitioning: Partitioning = child.outputPartitioning
 }
 
 /**
  * Take the first `limit` elements of the child's single output partition.
  */
 case class GlobalLimitExec(limit: Int, child: SparkPlan) extends 
BaseLimitExec {
   override def requiredChildDistribution: List[Distribution] = AllTuples 
:: Nil
+  override def outputPartitioning: Partitioning = child.outputPartitioning
--- End diff --

sorry. nvm. `child.outputPartitioning` is right.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15575: [SPARK-18038] [SQL] Move output partitioning defi...

2016-10-20 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15575#discussion_r84410707
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExec.scala 
---
@@ -103,6 +103,8 @@ case class WindowExec(
 
   override def outputOrdering: Seq[SortOrder] = child.outputOrdering
 
+  override def outputPartitioning: Partitioning = child.outputPartitioning
--- End diff --

This is right. This operator does not shuffle data.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15571: [SPARK-18034] Upgrade to MiMa 0.1.11 to fix flakiness

2016-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15571
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67307/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15575: [SPARK-18038] [SQL] Move output partitioning defi...

2016-10-20 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15575#discussion_r84410586
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala ---
@@ -96,13 +95,15 @@ trait BaseLimitExec extends UnaryExecNode with 
CodegenSupport {
  */
 case class LocalLimitExec(limit: Int, child: SparkPlan) extends 
BaseLimitExec {
   override def outputOrdering: Seq[SortOrder] = child.outputOrdering
+  override def outputPartitioning: Partitioning = child.outputPartitioning
 }
 
 /**
  * Take the first `limit` elements of the child's single output partition.
  */
 case class GlobalLimitExec(limit: Int, child: SparkPlan) extends 
BaseLimitExec {
   override def requiredChildDistribution: List[Distribution] = AllTuples 
:: Nil
+  override def outputPartitioning: Partitioning = child.outputPartitioning
--- End diff --

This should be `SinglePartition` since `requiredChildDistribution` is 
`AllTuples`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15571: [SPARK-18034] Upgrade to MiMa 0.1.11 to fix flakiness

2016-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15571
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15571: [SPARK-18034] Upgrade to MiMa 0.1.11 to fix flakiness

2016-10-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15571
  
**[Test build #67307 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67307/consoleFull)**
 for PR 15571 at commit 
[`c83faa5`](https://github.com/apache/spark/commit/c83faa5a95d9e9c61f1d92b9f883daf284f91501).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15575: [SPARK-18038] [SQL] Move output partitioning defi...

2016-10-20 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15575#discussion_r84410162
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SortExec.scala ---
@@ -45,6 +45,8 @@ case class SortExec(
 
   override def outputOrdering: Seq[SortOrder] = sortOrder
 
+  override def outputPartitioning: Partitioning = child.outputPartitioning
--- End diff --

No. ShuffleExchange is the only operator that can change 
`outputPartitioning`. `global` at here only influences the value of 
`requiredChildDistribution`, which later help the planner decide if to add a 
ShuffleExchange operator for range partitioning.   


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15566: [SPARK-18026][SQL] should not always lowercase partition...

2016-10-20 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/15566
  
Actually SPARK-17990 can be fixed in this PR with a little more work. Let 
me do it now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-20 Thread sethah
Github user sethah commented on the issue:

https://github.com/apache/spark/pull/9
  
Related thought: if the model holds a pointer to its initialModel, then it 
will be serialized and shipped along with the model at prediction time. This 
will be inefficient for large models and even if we cut the lineage, we 
needlessly double the size of the closure. 

It seems best not to have the initialModel in the model at all, but then it 
is an edge case for params, and it's still nice to know how the model was 
initialized at fit time. Thoughts?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



  1   2   3   4   5   6   7   >