[GitHub] spark pull request: [Build] Uploads HiveCompatibilitySuite logs
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/2993 [Build] Uploads HiveCompatibilitySuite logs In addition to unit-tests.log files, also upload failure output files generated by `HiveCompatibilitySuite` to Jenkins master. These files can be very helpful to debug Hive compatibility test failures. /cc @pwendell @marmbrus You can merge this pull request into a Git repository by running: $ git pull https://github.com/liancheng/spark upload-hive-compat-logs Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2993.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2993 commit 8e6247fd410ee55560e76a4387671a48d68edcd5 Author: Cheng Lian l...@databricks.com Date: 2014-10-29T03:47:31Z Uploads HiveCompatibilitySuite logs --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-611] [WIP] Display executor thread dump...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2944#issuecomment-60877916 [Test build #22430 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22430/consoleFull) for PR 2944 at commit [`f4ac1c1`](https://github.com/apache/spark/commit/f4ac1c1099019cc43525508545452e22eb677a70). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-611] [WIP] Display executor thread dump...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2944#issuecomment-60877993 I've rewritten this patch so that thread dumps are triggered on-demand using a new driver-to-executor RPC channel. There are a few hacks involved in setting this up, mostly to work around some limitations of our current executor registration mechanisms without resorting to heavy refactoring. Please take a look and let me know what you think. I'm going to leave a couple of comments on the diff to help explain the hackier parts of these changes. Also, we still might want to enable compression of the stacktrace RPCs, but I'll leave that to a separate PR (this would be a good starter JIRA task, BTW).
[GitHub] spark pull request: [SPARK-611] [WIP] Display executor thread dump...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2944#issuecomment-60878081 [Test build #22430 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22430/consoleFull) for PR 2944 at commit [`f4ac1c1`](https://github.com/apache/spark/commit/f4ac1c1099019cc43525508545452e22eb677a70). * This patch **fails to build**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class GetActorSystemHostPortForExecutor(executorId: String) extends ToBlockManagerMaster` * `class ThreadDumpPage(parent: ExecutorsTab) extends WebUIPage("threadDump")`
[GitHub] spark pull request: [Build] Uploads HiveCompatibilitySuite logs
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2993#issuecomment-60878064 [Test build #22431 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22431/consoleFull) for PR 2993 at commit [`8e6247f`](https://github.com/apache/spark/commit/8e6247fd410ee55560e76a4387671a48d68edcd5). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-611] [WIP] Display executor thread dump...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2944#issuecomment-60878083 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22430/ Test FAILed.
[GitHub] spark pull request: [SPARK-611] [WIP] Display executor thread dump...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/2944#discussion_r19521370 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerMasterActor.scala --- @@ -412,6 +415,17 @@ class BlockManagerMasterActor(val isLocal: Boolean, conf: SparkConf, listenerBus Seq.empty } } + + private def getActorSystemHostPortForExecutor(executorId: String): Option[(String, Int)] = { --- End diff -- This verbosely-named method is the big hack for enabling driver-to-executor RPC. Basically, I needed to have a way to address the remote `ExecutorActor` from the driver. Here, we rely on the fact that every executor registers a BlockManager actor with the BlockManagerMasterActor and that there is only one actor system per executor / driver.
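The addressing trick described in the comment above can be sketched outside Spark. This is a hypothetical illustration; the object name, the `sparkExecutor` actor-system name, and the `ExecutorActor` registration name are assumptions for the sketch, not confirmed Spark internals. The key idea is that, with one actor system per executor, a host/port pair recovered from the BlockManager registration is enough to build a remote actor URL:

```scala
// Hypothetical sketch: build the Akka-style URL used to address a remote
// executor's actor, given the host/port recovered from its BlockManager
// registration. All names here are illustrative assumptions.
object ExecutorActorAddress {
  // One actor system per executor, so host:port identifies it uniquely;
  // the thread-dump actor is assumed registered under a well-known name.
  def actorUrl(host: String, port: Int): String =
    s"akka.tcp://sparkExecutor@$host:$port/user/ExecutorActor"
}
```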
[GitHub] spark pull request: [SPARK-4129][MLlib] Performance tuning in Mult...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2992#issuecomment-60878201 [Test build #22432 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22432/consoleFull) for PR 2992 at commit [`2b5e882`](https://github.com/apache/spark/commit/2b5e8828a6db72adee10cfbdc71f07d372f43f90). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-3968 Use parquet-mr filter2 api in spark...
Github user saucam commented on a diff in the pull request: https://github.com/apache/spark/pull/2841#discussion_r19521440 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableOperations.scala --- @@ -423,10 +436,8 @@ private[parquet] class FilteringParquetRowInputFormat configuration: Configuration, footers: JList[Footer]): JList[ParquetInputSplit] = { -import FilteringParquetRowInputFormat.blockLocationCache - -val cacheMetadata = configuration.getBoolean(SQLConf.PARQUET_CACHE_METADATA, false) - +// Use task side strategy by default +val taskSideMetaData = configuration.getBoolean(ParquetInputFormat.TASK_SIDE_METADATA, true) --- End diff -- Yes, in Parquet the client-side metadata strategy is the default one, but as I mentioned earlier, don't we want the task-side strategy by default due to the inherent advantage of lower memory usage on the client side?
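The default-selection logic being debated reduces to a one-line config read. A minimal sketch, using a plain `Map` in place of a Hadoop `Configuration`; the key string mirrors parquet-mr's `parquet.task.side.metadata` but should be treated as an assumption here:

```scala
// Sketch: choose task-side metadata unless the user explicitly disables it.
// Defaulting to true gives the lower client-side memory usage argued for above.
object MetadataStrategy {
  def useTaskSideMetadata(conf: Map[String, String]): Boolean =
    conf.get("parquet.task.side.metadata").map(_.toBoolean).getOrElse(true)
}
```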
[GitHub] spark pull request: [SPARK-611] [WIP] Display executor thread dump...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/2944#discussion_r19521449 --- Diff: core/src/main/scala/org/apache/spark/ui/exec/ThreadDumpPage.scala --- @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ui.exec + +import javax.servlet.http.HttpServletRequest + +import scala.util.Try +import scala.xml.{Text, Node} + +import org.apache.spark.ui.{UIUtils, WebUIPage} + +class ThreadDumpPage(parent: ExecutorsTab) extends WebUIPage("threadDump") { + + private val sc = parent.sc + + def render(request: HttpServletRequest): Seq[Node] = { +val executorId = Option(request.getParameter("executorId")).getOrElse { + return Text(s"Missing executorId parameter") +} +val time = System.currentTimeMillis() +val maybeThreadDump = Try(sc.get.getExecutorThreadDump(executorId)) --- End diff -- This is a blocking call. From a high-performance HTTP server standpoint, this is probably a bad idea; it might improve throughput to handle this in some sort of future / continuation so that we don't starve the request handling threadpool while waiting on a remote RPC.
On the other hand, I don't think that we should over-engineer things right now; we can always move towards a fancier request handling strategy if we discover that we need it.
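The future/continuation idea mentioned above can be sketched as follows, with a stand-in `fetchDump` function in place of the real `getExecutorThreadDump` RPC. This only offloads the blocking call to a separate pool and bounds the wait; it is not a fully asynchronous handler:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object NonBlockingRender {
  // Run the blocking RPC off the request-handling thread and bound the wait,
  // so a slow executor cannot hold a servlet thread indefinitely.
  def render(fetchDump: () => String, timeout: FiniteDuration): Option[String] = {
    val dump = Future(fetchDump())
    try Some(Await.result(dump, timeout))
    catch { case _: Exception => None }
  }
}
```

A truly non-blocking design would complete the HTTP response from the future's callback instead of awaiting, which is the "fancier request handling strategy" the comment defers.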
[GitHub] spark pull request: [SPARK-3453] Netty-based BlockTransferService,...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2753#issuecomment-60878626 [Test build #22427 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22427/consoleFull) for PR 2753 at commit [`cadfd28`](https://github.com/apache/spark/commit/cadfd28f116f0dbca11e580a23caf82060bcf922). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3453] Netty-based BlockTransferService,...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2753#issuecomment-60878629 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22427/ Test FAILed.
[GitHub] spark pull request: [SPARK-611] [WIP] Display executor thread dump...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2944#issuecomment-60878689 [Test build #22433 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22433/consoleFull) for PR 2944 at commit [`bc1e675`](https://github.com/apache/spark/commit/bc1e675f0204e6f111cf7ec49cdb318150ecef54). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4109][CORE] Correctly deserialize Task....
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/2971#discussion_r19521525 --- Diff: core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala --- @@ -128,7 +128,7 @@ private[spark] class ShuffleMapTask( } override def readExternal(in: ObjectInput) { --- End diff -- I guess I missed this case in my PR.
[GitHub] spark pull request: [SPARK-4109][CORE] Correctly deserialize Task....
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/2971#discussion_r19521513 --- Diff: core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala --- @@ -128,7 +128,7 @@ private[spark] class ShuffleMapTask( } override def readExternal(in: ObjectInput) { --- End diff -- As long as you're modifying this code, mind tossing a `Utils.tryOrIOException` here so that any errors that occur here are reported properly? See #2932 for explanation / context.
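For context, a helper like the `Utils.tryOrIOException` requested above can be sketched as follows (a simplified illustration of the idea, not copied from Spark): `readExternal` is declared to throw `IOException`, so wrapping other throwables lets Java serialization surface them instead of obscuring them.

```scala
import java.io.IOException

object SerializationUtil {
  // Rethrow IOExceptions as-is; wrap anything else so callers of
  // readExternal/writeExternal see a properly reported IOException.
  def tryOrIOException[T](block: => T): T =
    try block
    catch {
      case e: IOException => throw e
      case e: Throwable   => throw new IOException(e)
    }
}
```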
[GitHub] spark pull request: [SPARK-4122][STREAMING] Add a library that can...
GitHub user harishreedharan opened a pull request: https://github.com/apache/spark/pull/2994 [SPARK-4122][STREAMING] Add a library that can write data back to Kafka ... ...from Spark Streaming. This adds a library that can write dstreams to Kafka. An implicit has also been added so users can call dstream.writeToKafka(..) You can merge this pull request into a Git repository by running: $ git pull https://github.com/harishreedharan/spark Kafka-output Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2994.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2994 commit f61d82f3152f8e6cf758fa349c8198289e0deae8 Author: Hari Shreedharan hshreedha...@apache.org Date: 2014-10-29T06:06:33Z [SPARK-4122][STREAMING] Add a library that can write data back to Kafka from Spark Streaming. This adds a library that can write dstreams to Kafka. An implicit has also been added so users can call dstream.writeToKafka(..)
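The `dstream.writeToKafka(..)` call described above is presumably added via Scala's implicit-enrichment pattern. A self-contained sketch of that pattern, with a toy `Record` type and a `send` function standing in for DStream and a Kafka producer; none of these names come from the PR itself:

```scala
object KafkaWriteSketch {
  final case class Record(value: String)

  // Enrich Seq[Record] with writeToKafka without modifying the type itself;
  // the real PR does the analogous enrichment for DStream.
  implicit class KafkaWriterOps(records: Seq[Record]) {
    def writeToKafka(send: String => Unit): Unit =
      records.foreach(r => send(r.value))
  }
}
```

Bringing `KafkaWriteSketch._` into scope makes `someRecords.writeToKafka(...)` compile as if the method were defined on the sequence, which is how a library can bolt output operations onto an existing stream type.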
[GitHub] spark pull request: [SPARK-4062][Streaming]Add ReliableKafkaReceiv...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2991#issuecomment-60878843 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22425/ Test FAILed.
[GitHub] spark pull request: [SPARK-4062][Streaming]Add ReliableKafkaReceiv...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2991#issuecomment-60878841 [Test build #22425 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22425/consoleFull) for PR 2991 at commit [`5cc4cb1`](https://github.com/apache/spark/commit/5cc4cb198662cb35008d9a2e46e320f75ce35a71). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-611] [WIP] Display executor thread dump...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2944#issuecomment-60878899 [Test build #22433 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22433/consoleFull) for PR 2944 at commit [`bc1e675`](https://github.com/apache/spark/commit/bc1e675f0204e6f111cf7ec49cdb318150ecef54). * This patch **fails to build**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class GetActorSystemHostPortForExecutor(executorId: String) extends ToBlockManagerMaster` * `class ThreadDumpPage(parent: ExecutorsTab) extends WebUIPage("threadDump")`
[GitHub] spark pull request: [SPARK-611] [WIP] Display executor thread dump...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2944#issuecomment-60878901 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22433/ Test FAILed.
[GitHub] spark pull request: [SPARK-4122][STREAMING] Add a library that can...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2994#issuecomment-60879163 [Test build #22434 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22434/consoleFull) for PR 2994 at commit [`f61d82f`](https://github.com/apache/spark/commit/f61d82f3152f8e6cf758fa349c8198289e0deae8). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4122][STREAMING] Add a library that can...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2994#issuecomment-60879456 [Test build #22435 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22435/consoleFull) for PR 2994 at commit [`372c749`](https://github.com/apache/spark/commit/372c749458e22ba1a9acd2badbfad51e4dda3968). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-3968 Use parquet-mr filter2 api in spark...
Github user saucam commented on a diff in the pull request: https://github.com/apache/spark/pull/2841#discussion_r19521851 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableOperations.scala --- @@ -460,29 +515,85 @@ private[parquet] class FilteringParquetRowInputFormat val status = fileStatuses.getOrElse(file, fs.getFileStatus(file)) val parquetMetaData = footer.getParquetMetadata val blocks = parquetMetaData.getBlocks - var blockLocations: Array[BlockLocation] = null - if (!cacheMetadata) { -blockLocations = fs.getFileBlockLocations(status, 0, status.getLen) - } else { -blockLocations = blockLocationCache.get(status, new Callable[Array[BlockLocation]] { - def call(): Array[BlockLocation] = fs.getFileBlockLocations(status, 0, status.getLen) -}) - } + totalRowGroups = totalRowGroups + blocks.size + val filteredBlocks = RowGroupFilter.filterRowGroups( +filter, +blocks, +parquetMetaData.getFileMetaData.getSchema) + rowGroupsDropped = rowGroupsDropped + (blocks.size - filteredBlocks.size) + + if (!filteredBlocks.isEmpty){ + var blockLocations: Array[BlockLocation] = null + if (!cacheMetadata) { +blockLocations = fs.getFileBlockLocations(status, 0, status.getLen) + } else { +blockLocations = blockLocationCache.get(status, new Callable[Array[BlockLocation]] { + def call(): Array[BlockLocation] = fs.getFileBlockLocations(status, 0, status.getLen) +}) + } + splits.addAll( +generateSplits.invoke( + null, + filteredBlocks, + blockLocations, + status, + readContext.getRequestedSchema.toString, + readContext.getReadSupportMetadata, + minSplitSize, + maxSplitSize).asInstanceOf[JList[ParquetInputSplit]]) +} +} +
+if (rowGroupsDropped > 0 && totalRowGroups > 0){ + val percentDropped = ((rowGroupsDropped/totalRowGroups.toDouble) * 100).toInt + logInfo(s"Dropping $rowGroupsDropped row groups that do not pass filter predicate " + s"($percentDropped %) !") +} +else { + logInfo("There were no row groups that could be dropped due to filter predicates") +} +splits + + } + + def getTaskSideSplits( +configuration: Configuration, +footers: JList[Footer], +maxSplitSize: JLong, +minSplitSize: JLong, +readContext: ReadContext): JList[ParquetInputSplit] = { + +val splits = mutable.ArrayBuffer.empty[ParquetInputSplit] + +// Ugly hack, stuck with it until PR: +// https://github.com/apache/incubator-parquet-mr/pull/17 +// is resolved +val generateSplits = + Class.forName("parquet.hadoop.TaskSideMetadataSplitStrategy") + .getDeclaredMethods.find(_.getName == "generateTaskSideMDSplits").getOrElse( + sys.error( + "Failed to reflectively invoke TaskSideMetadataSplitStrategy.generateTaskSideMDSplits")) +generateSplits.setAccessible(true) + +for (footer <- footers) { + val file = footer.getFile + val fs = file.getFileSystem(configuration) + val status = fileStatuses.getOrElse(file, fs.getFileStatus(file)) + val blockLocations = fs.getFileBlockLocations(status, 0, status.getLen) splits.addAll( --- End diff -- I am not sure I follow here; if globalMetadata is non-null, it means there is data, and hence splits would be generated by the generateSplits function, which takes the HDFS blocks to process as an argument? The generated splits are added to the splits to be returned later. splits.addAll( generateSplits.invoke( null, filteredBlocks, blockLocations, status, readContext.getRequestedSchema.toString, readContext.getReadSupportMetadata, minSplitSize, maxSplitSize).asInstanceOf[JList[ParquetInputSplit]]) }
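The reflective workaround in the diff above follows a general pattern: look up a method by name at runtime and force accessibility before invoking it. A generic sketch of that pattern, demonstrated here against `java.lang.String` rather than the Parquet class, purely for illustration:

```scala
object ReflectiveInvoke {
  // Find a method by name on a class and make it callable even if not public.
  // Fails loudly (like the sys.error in the diff) when the method is absent.
  def findMethod(className: String, methodName: String): java.lang.reflect.Method = {
    val m = Class.forName(className).getDeclaredMethods
      .find(_.getName == methodName)
      .getOrElse(sys.error(s"No method $methodName on $className"))
    m.setAccessible(true)
    m
  }
}
```

The diff pays this reflection cost only because the upstream parquet-mr method was not yet publicly accessible; once the linked PR lands, a direct call replaces it.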
[GitHub] spark pull request: SPARK-3968 Use parquet-mr filter2 api in spark...
Github user saucam commented on the pull request: https://github.com/apache/spark/pull/2841#issuecomment-60880619 Hi @mateiz, thanks for the suggestions, just a few points: 1. Need to know which strategy to keep as the default (currently we use a different one than the default in the parquet library). 2. This PR adds support for using the filter2 api from the parquet library, which supports row group filtering. Do we need to add tests to ensure that? Such test cases already exist in the parquet library: https://github.com/Parquet/parquet-mr/blob/parquet-1.6.0rc3/parquet-hadoop/src/test/java/parquet/filter2/compat/TestRowGroupFilter.java
[GitHub] spark pull request: [SPARK-3795] Heuristics for dynamically scalin...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2746#issuecomment-60880622 LGTM
[GitHub] spark pull request: [SPARK-1812] Adjust build system and tests to...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2615#issuecomment-60880659 [Test build #22436 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22436/consoleFull) for PR 2615 at commit [`95c2e8e`](https://github.com/apache/spark/commit/95c2e8e86f69f3b1d11aad04f6ad14b0ede1950a). * This patch merges cleanly.
[GitHub] spark pull request: Streaming KMeans [MLLIB][SPARK-3254]
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2942#issuecomment-60880665 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22428/ Test PASSed.
[GitHub] spark pull request: Streaming KMeans [MLLIB][SPARK-3254]
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2942#issuecomment-60880661 [Test build #22428 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22428/consoleFull) for PR 2942 at commit [`9f7aea9`](https://github.com/apache/spark/commit/9f7aea9eac3c64f646d1783909e0e2d155663399). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class StreamingKMeansModel(` * `class StreamingKMeans(`
[GitHub] spark pull request: [SPARK-3453] Netty-based BlockTransferService,...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2753#issuecomment-60880960 [Test build #22437 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22437/consoleFull) for PR 2753 at commit [`cadfd28`](https://github.com/apache/spark/commit/cadfd28f116f0dbca11e580a23caf82060bcf922). * This patch merges cleanly.
[GitHub] spark pull request: [WIP][SQL][SPARK-3839] Reimplement Left/Right ...
Github user Ishiihara commented on the pull request: https://github.com/apache/spark/pull/2723#issuecomment-60881119 @marmbrus All test failures have the same pattern: `select * from a right outer join b on condition1 join c on condition2`. With the extra join operation, the iterator of `a join b` somehow does not output all rows of the join result. However, `select * from a right outer join b on a.key = b.key` on its own returns the correct result. I am currently investigating the cause of the failures; please let me know if you have any ideas. Thanks!
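For reference, the semantics the failing tests check can be sketched with a naive right outer join (a hypothetical standalone sketch, not Spark SQL's hash-join implementation): every right-side row must appear at least once, whether or not a further join consumes the result.

```python
def right_outer_join(left, right, left_key, right_key):
    """Naive right outer join over lists of tuples.

    `left_key`/`right_key` extract the join keys; unmatched right-side
    rows pair with None instead of being dropped.
    """
    out = []
    for r in right:
        matches = [l for l in left if left_key(l) == right_key(r)]
        if matches:
            out.extend((l, r) for l in matches)
        else:
            out.append((None, r))
    return out

# a RIGHT OUTER JOIN b: the unmatched b-row (2, "bb") must survive,
# even when a second join consumes this iterator afterwards.
a = [(1, "a")]
b = [(1, "b"), (2, "bb")]
joined = right_outer_join(a, b, lambda l: l[0], lambda r: r[0])
```

The bug report is exactly that the chained-join case loses some of these rows, while the standalone right outer join does not.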
[GitHub] spark pull request: [SPARK-4129][MLlib] Performance tuning in Mult...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2992#issuecomment-60881601 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22429/ Test PASSed.
[GitHub] spark pull request: [SPARK-4129][MLlib] Performance tuning in Mult...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2992#issuecomment-60881653 [Test build #22432 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22432/consoleFull) for PR 2992 at commit [`2b5e882`](https://github.com/apache/spark/commit/2b5e8828a6db72adee10cfbdc71f07d372f43f90). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class DeferredObjectAdapter(oi: ObjectInspector) extends DeferredObject`
[GitHub] spark pull request: [SPARK-4129][MLlib] Performance tuning in Mult...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2992#issuecomment-60881594 [Test build #22429 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22429/consoleFull) for PR 2992 at commit [`ebe3e74`](https://github.com/apache/spark/commit/ebe3e74df70eb424aecc3170fc55008cfb6a76ec). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4129][MLlib] Performance tuning in Mult...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2992#issuecomment-60881656 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22432/ Test FAILed.
[GitHub] spark pull request: [SPARK-4062][Streaming]Add ReliableKafkaReceiv...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2991#issuecomment-60881849 [Test build #22438 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22438/consoleFull) for PR 2991 at commit [`09d57c5`](https://github.com/apache/spark/commit/09d57c5270cf876e52d49db702e2330c2b6a6e10). * This patch merges cleanly.
[GitHub] spark pull request: [Build] Uploads HiveCompatibilitySuite logs
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2993#issuecomment-60882504 [Test build #22431 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22431/consoleFull) for PR 2993 at commit [`8e6247f`](https://github.com/apache/spark/commit/8e6247fd410ee55560e76a4387671a48d68edcd5). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [Build] Uploads HiveCompatibilitySuite logs
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2993#issuecomment-60882509 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22431/ Test PASSed.
[GitHub] spark pull request: SPARK-3968 Use parquet-mr filter2 api in spark...
Github user saucam commented on a diff in the pull request: https://github.com/apache/spark/pull/2841#discussion_r19522625

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetFilters.scala ---
@@ -209,25 +221,25 @@ private[sql] object ParquetFilters {
        case _ => None
      }
    }
-    case p @ EqualTo(left: Literal, right: NamedExpression) if !right.nullable =>
+    case p @ EqualTo(left: Literal, right: NamedExpression) =>
      Some(createEqualityFilter(right.name, left, p))
-    case p @ EqualTo(left: NamedExpression, right: Literal) if !left.nullable =>
+    case p @ EqualTo(left: NamedExpression, right: Literal) =>
      Some(createEqualityFilter(left.name, right, p))
-    case p @ LessThan(left: Literal, right: NamedExpression) if !right.nullable =>
+    case p @ LessThan(left: Literal, right: NamedExpression) =>
      Some(createLessThanFilter(right.name, left, p))
-    case p @ LessThan(left: NamedExpression, right: Literal) if !left.nullable =>
+    case p @ LessThan(left: NamedExpression, right: Literal) =>
      Some(createLessThanFilter(left.name, right, p))
-    case p @ LessThanOrEqual(left: Literal, right: NamedExpression) if !right.nullable =>
+    case p @ LessThanOrEqual(left: Literal, right: NamedExpression) =>
      Some(createLessThanOrEqualFilter(right.name, left, p))
-    case p @ LessThanOrEqual(left: NamedExpression, right: Literal) if !left.nullable =>
+    case p @ LessThanOrEqual(left: NamedExpression, right: Literal) =>
      Some(createLessThanOrEqualFilter(left.name, right, p))
-    case p @ GreaterThan(left: Literal, right: NamedExpression) if !right.nullable =>
+    case p @ GreaterThan(left: Literal, right: NamedExpression) =>
      Some(createGreaterThanFilter(right.name, left, p))
-    case p @ GreaterThan(left: NamedExpression, right: Literal) if !left.nullable =>
+    case p @ GreaterThan(left: NamedExpression, right: Literal) =>
      Some(createGreaterThanFilter(left.name, right, p))
-    case p @ GreaterThanOrEqual(left: Literal, right: NamedExpression) if !right.nullable =>
+    case p @ GreaterThanOrEqual(left: Literal, right: NamedExpression) =>
      Some(createGreaterThanOrEqualFilter(right.name, left, p))
-    case p @ GreaterThanOrEqual(left: NamedExpression, right: Literal) if !left.nullable =>
+    case p @ GreaterThanOrEqual(left: NamedExpression, right: Literal) =>
--- End diff --

The nullable flag is set when the field is optional, so I'm adding tests for those cases.
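The diff above drops the `!nullable` guards, so comparisons on optional columns are also pushed down as Parquet filters. The shape of that translation (a hypothetical, simplified sketch in Python; not the actual Catalyst or parquet-mr API) is a case analysis on which side of the comparison holds the named column:

```python
from dataclasses import dataclass

@dataclass
class Column:
    """Stand-in for a named expression; `nullable` marks optional fields."""
    name: str
    nullable: bool = False

@dataclass
class EqualTo:
    """Stand-in for an equality predicate between a column and a literal."""
    left: object
    right: object

def to_filter(pred):
    """Translate a predicate into an (op, column, literal) filter triple.

    As in the patch, there is no `not column.nullable` guard here, so
    predicates on optional (nullable) columns are translated as well.
    """
    if isinstance(pred, EqualTo):
        if isinstance(pred.left, Column):
            return ("eq", pred.left.name, pred.right)
        if isinstance(pred.right, Column):
            return ("eq", pred.right.name, pred.left)
    return None  # unsupported: leave the predicate to be evaluated after the scan
```

Returning `None` for unsupported predicates mirrors the `case _ => None` fallback in the Scala code: anything not pushed down is simply evaluated after the scan.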
[GitHub] spark pull request: [SPARK-4124] [MLlib] [PySpark] simplify serial...
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/2995 [SPARK-4124] [MLlib] [PySpark] simplify serialization in MLlib Python API Create several helper functions that call the MLlib Java API, converting arguments to Java types and return values to Python objects automatically; this greatly simplifies serialization in the MLlib Python API. After this change, the MLlib Python API no longer needs to deal with serialization details, which makes it easier to add new APIs. cc @mengxr You can merge this pull request into a Git repository by running: $ git pull https://github.com/davies/spark cleanup Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2995.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2995 commit 731331fdafe9ce6e4bf24dc1e6667942e1e59587 Author: Davies Liu dav...@databricks.com Date: 2014-10-29T07:19:33Z simplify serialization in MLlib Python API
[GitHub] spark pull request: [SPARK-4129][MLlib] Performance tuning in Mult...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2992#issuecomment-60883389 [Test build #22440 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22440/consoleFull) for PR 2992 at commit [`b99db6c`](https://github.com/apache/spark/commit/b99db6caa0a5f2d6e69d5940b5c37e88914c5e36). * This patch merges cleanly.
[GitHub] spark pull request: [ SPARK-1812] Adjust build system and tests to...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2615#issuecomment-60883548 [Test build #22441 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22441/consoleFull) for PR 2615 at commit [`df2b19e`](https://github.com/apache/spark/commit/df2b19e1ef5bd462c02681d59d4fa4422c944ce4). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4124] [MLlib] [PySpark] simplify serial...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2995#issuecomment-60883555 [Test build #22439 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22439/consoleFull) for PR 2995 at commit [`731331f`](https://github.com/apache/spark/commit/731331fdafe9ce6e4bf24dc1e6667942e1e59587). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4122][STREAMING] Add a library that can...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2994#issuecomment-60883762 [Test build #22434 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22434/consoleFull) for PR 2994 at commit [`f61d82f`](https://github.com/apache/spark/commit/f61d82f3152f8e6cf758fa349c8198289e0deae8). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class KafkaWriter[T: ClassTag](@transient dstream: DStream[T]) `
[GitHub] spark pull request: [SPARK-4122][STREAMING] Add a library that can...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2994#issuecomment-60883766 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22434/ Test PASSed.
[GitHub] spark pull request: SPARK-4111 [MLlib] add regression metrics
Github user yanbohappy commented on the pull request: https://github.com/apache/spark/pull/2978#issuecomment-60883826 Renamed parameters and functions to be consistent with Spark naming rules. Deleted unused columns and made prediction the first column. Added an explanation and reference for r2Score and explained variance. Also made other code style fixes.
[GitHub] spark pull request: [ SPARK-1812] Adjust build system and tests to...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2615#issuecomment-60883856 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22441/ Test FAILed.
[GitHub] spark pull request: [ SPARK-1812] Adjust build system and tests to...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2615#issuecomment-60883853 [Test build #22441 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22441/consoleFull) for PR 2615 at commit [`df2b19e`](https://github.com/apache/spark/commit/df2b19e1ef5bd462c02681d59d4fa4422c944ce4). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3930] [SPARK-3933] Support fixed-precis...
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/2983#issuecomment-60884047 Jenkins, test this please
[GitHub] spark pull request: [SPARK-4122][STREAMING] Add a library that can...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2994#issuecomment-60884111 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22435/ Test PASSed.
[GitHub] spark pull request: [SPARK-4130][MLlib] Fixing libSVM parser bug w...
GitHub user jegonzal opened a pull request: https://github.com/apache/spark/pull/2996 [SPARK-4130][MLlib] Fixing libSVM parser bug with extra whitespace This simple patch filters out extra whitespace entries. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jegonzal/spark loadLibSVM Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2996.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2996 commit e028e8443ff38e3617a1bdd0a2a3f5ec9b42d980 Author: Joseph E. Gonzalez joseph.e.gonza...@gmail.com Date: 2014-10-29T07:13:56Z fixing whitespace bug in loadLibSVMFile when parsing libSVM files
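The bug being fixed: splitting a libSVM line on single spaces yields empty tokens whenever the line contains runs of whitespace, and those empty strings then fail to parse as `index:value` pairs. A minimal sketch of the filtering (a hypothetical standalone parser, not the actual `loadLibSVMFile` code):

```python
def parse_libsvm_line(line):
    """Parse one "label idx:val idx:val ..." line, tolerating extra whitespace.

    Filtering out empty tokens mirrors the patch: without it,
    "1  2:0.5" would yield an empty string between the label and "2:0.5".
    """
    tokens = [t for t in line.strip().split(" ") if t]
    label = float(tokens[0])
    indices, values = [], []
    for pair in tokens[1:]:
        idx, val = pair.split(":")
        indices.append(int(idx) - 1)  # libSVM indices are 1-based
        values.append(float(val))
    return label, indices, values
```

With the filter in place, extra spaces between features are harmless; without it, `float("")` or `"".split(":")` raises on the empty tokens.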
[GitHub] spark pull request: [SPARK-4122][STREAMING] Add a library that can...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2994#issuecomment-60884103 [Test build #22435 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22435/consoleFull) for PR 2994 at commit [`372c749`](https://github.com/apache/spark/commit/372c749458e22ba1a9acd2badbfad51e4dda3968). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class KafkaWriter[T: ClassTag](@transient dstream: DStream[T]) `
[GitHub] spark pull request: [SPARK-4130][MLlib] Fixing libSVM parser bug w...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2996#issuecomment-60884323 [Test build #22442 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22442/consoleFull) for PR 2996 at commit [`e028e84`](https://github.com/apache/spark/commit/e028e8443ff38e3617a1bdd0a2a3f5ec9b42d980). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3930] [SPARK-3933] Support fixed-precis...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2983#issuecomment-60884309 [Test build #22443 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22443/consoleFull) for PR 2983 at commit [`4ca62cd`](https://github.com/apache/spark/commit/4ca62cd306d96890a7a56da04710ae5548715c4a). * This patch merges cleanly.
[GitHub] spark pull request: [ SPARK-1812] Adjust build system and tests to...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2615#issuecomment-60884698 [Test build #22444 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22444/consoleFull) for PR 2615 at commit [`472bbcf`](https://github.com/apache/spark/commit/472bbcfe5082920ac97bc2e29faeae78764141c7). * This patch merges cleanly.
[GitHub] spark pull request: [ SPARK-1812] Adjust build system and tests to...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2615#issuecomment-60885003 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22444/ Test FAILed.
[GitHub] spark pull request: [ SPARK-1812] Adjust build system and tests to...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2615#issuecomment-60885001 [Test build #22444 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22444/consoleFull) for PR 2615 at commit [`472bbcf`](https://github.com/apache/spark/commit/472bbcfe5082920ac97bc2e29faeae78764141c7). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [ SPARK-1812] Adjust build system and tests to...
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/2615#issuecomment-60885128 Jenkins, retest this please. (I might get lucky this time; the compilation failure looks random.)
[GitHub] spark pull request: [SPARK-4122][STREAMING] Add a library that can...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/2994#discussion_r19523356 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaOutputWriter.scala --- @@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.streaming.kafka
+
+import java.util.Properties
+
+import scala.reflect.ClassTag
+
+import kafka.producer.{ProducerConfig, KeyedMessage, Producer}
+
+import org.apache.spark.rdd.RDD
+import org.apache.spark.streaming.dstream.DStream
+
+/**
+ * Import this object in this form:
+ * {{{
+ *   import org.apache.spark.streaming.kafka.KafkaWriter._
+ * }}}
+ *
+ * Once imported, the `writeToKafka` can be called on any [[DStream]] object in this form:
+ * {{{
+ *   dstream.writeToKafka(producerConfig, f)
+ * }}}
+ */
+object KafkaWriter {
+  import scala.language.implicitConversions
+  /**
+   * This implicit method allows the user to call dstream.writeToKafka(..)
+   * @param dstream - DStream to write to Kafka
+   * @tparam T - The type of the DStream
+   * @tparam K - The type of the key to serialize to
+   * @tparam V - The type of the value to serialize to
+   * @return
+   */
+  implicit def createKafkaOutputWriter[T: ClassTag, K, V](dstream: DStream[T]): KafkaWriter[T] = {
+    new KafkaWriter[T](dstream)
+  }
+}
+
+/**
+ *
+ * This class can be used to write data to Kafka from Spark Streaming. To write data to Kafka
+ * simply `import org.apache.spark.streaming.kafka.KafkaWriter._` in your application and call
+ * `dstream.writeToKafka(producerConf, func)`
+ *
+ * Here is an example:
+ * {{{
+ * // Adding this line allows the user to call dstream.writeDStreamToKafka(..)
+ * import org.apache.spark.streaming.kafka.KafkaWriter._
+ *
+ * class ExampleWriter {
+ *   val instream = ssc.queueStream(toBe)
+ *   val producerConf = new Properties()
+ *   producerConf.put("serializer.class", "kafka.serializer.DefaultEncoder")
+ *   producerConf.put("key.serializer.class", "kafka.serializer.StringEncoder")
+ *   producerConf.put("metadata.broker.list", "kafka.example.com:5545")
+ *   producerConf.put("request.required.acks", "1")
+ *   instream.writeToKafka(producerConf,
+ *     (x: String) => new KeyedMessage[String, String]("default", null, x))
+ *   ssc.start()
+ * }
+ *
+ * }}}
+ * @param dstream - The [[DStream]] to be written to Kafka
+ *
+ */
+class KafkaWriter[T: ClassTag](@transient dstream: DStream[T]) {
+
+  /**
+   * To write data from a DStream to Kafka, call this function after creating the DStream. Once
+   * the DStream is passed into this function, all data coming from the DStream is written out to
+   * Kafka. The properties instance takes the configuration required to connect to the Kafka
+   * brokers in the standard Kafka format. The serializerFunc is a function that converts each
+   * element of the RDD to a Kafka [[KeyedMessage]]. This closure should be serializable - so it
+   * should use only instances of Serializables.
+   * @param producerConfig The configuration that can be used to connect to Kafka
+   * @param serializerFunc The function to convert the data from the stream into Kafka
+   *                       [[KeyedMessage]]s.
+   * @tparam K The type of the key
+   * @tparam V The type of the value
+   *
+   */
+  def writeToKafka[K, V](producerConfig: Properties,
+      serializerFunc: T => KeyedMessage[K, V]): Unit = {
+
+    // Broadcast the producer to avoid sending it every time.
+    val broadcastedConfig = dstream.ssc.sc.broadcast(producerConfig)
+
+    def func = (rdd: RDD[T]) => {
+      rdd.foreachPartition(events => {
+        // The ForEachDStream runs the function locally on the driver. So the
+        // ProducerCache from the driver is likely to get serialized and
+        //
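The scaladoc in the patch above relies on the "enrichment" pattern: importing `KafkaWriter._` brings an implicit conversion into scope, which is what lets `writeToKafka` be called directly on a `DStream`. The same mechanism can be shown with a self-contained sketch; the names `SinkWriter`/`writeToSink`, and the use of `Seq` in place of `DStream`, are illustrative stand-ins and not part of the patch.

```scala
// Minimal sketch of the implicit-conversion enrichment pattern used by the
// patch: an implicit def wraps an existing type so a new method appears on it.
object SinkWriter {
  import scala.language.implicitConversions

  // Mirrors createKafkaOutputWriter: wraps any Seq[T] in a SinkWriter[T].
  implicit def createSinkWriter[T](data: Seq[T]): SinkWriter[T] =
    new SinkWriter[T](data)
}

class SinkWriter[T](data: Seq[T]) {
  // Mirrors writeToKafka: serialize each element and hand it to a "sink"
  // (here we simply return the serialized values).
  def writeToSink[M](serializerFunc: T => M): Seq[M] = data.map(serializerFunc)
}

object Demo {
  import SinkWriter._

  def main(args: Array[String]): Unit = {
    // After the import, writeToSink can be called directly on a Seq,
    // just as writeToKafka is called on a DStream in the patch.
    val out = Seq(1, 2, 3).writeToSink(x => s"msg-$x")
    println(out.mkString(","))  // msg-1,msg-2,msg-3
  }
}
```

The one-import usage mirrors how `dstream.writeToKafka(producerConf, func)` becomes available only after `import org.apache.spark.streaming.kafka.KafkaWriter._`.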
[GitHub] spark pull request: [ SPARK-1812] Adjust build system and tests to...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2615#issuecomment-60885269 [Test build #22445 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22445/consoleFull) for PR 2615 at commit [`472bbcf`](https://github.com/apache/spark/commit/472bbcfe5082920ac97bc2e29faeae78764141c7). * This patch merges cleanly.
[GitHub] spark pull request: [ SPARK-1812] Adjust build system and tests to...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2615#issuecomment-60885565 [Test build #22445 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22445/consoleFull) for PR 2615 at commit [`472bbcf`](https://github.com/apache/spark/commit/472bbcfe5082920ac97bc2e29faeae78764141c7). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [ SPARK-1812] Adjust build system and tests to...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2615#issuecomment-60885568 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22445/ Test FAILed.
[GitHub] spark pull request: [SPARK-4122][STREAMING] Add a library that can...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/2994#discussion_r19523702 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaOutputWriter.scala --- (quotes the same `KafkaOutputWriter.scala` diff as the r19523356 comment above; the comment body itself is truncated in this digest)
[GitHub] spark pull request: [SPARK-4122][STREAMING] Add a library that can...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/2994#discussion_r19523728 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaOutputWriter.scala --- @@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
--- End diff -- An empty line after the Apache header :).
[GitHub] spark pull request: [SPARK-3453] Netty-based BlockTransferService,...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2753#issuecomment-60886384 [Test build #22437 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22437/consoleFull) for PR 2753 at commit [`cadfd28`](https://github.com/apache/spark/commit/cadfd28f116f0dbca11e580a23caf82060bcf922). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3453] Netty-based BlockTransferService,...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2753#issuecomment-60886387 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22437/ Test PASSed.
[GitHub] spark pull request: [SPARK-4122][STREAMING] Add a library that can...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/2994#discussion_r19523818 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/ProducerCache.scala --- @@ -0,0 +1,34 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.streaming.kafka
+
+object ProducerCache {
+
+  private var producerOpt: Option[Any] = None
--- End diff -- Isn't this going to share one Producer across the entire JVM, and potentially unrelated applications?
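The concern raised above can be made concrete: a single `Option[Any]` slot is shared by every caller in the JVM, so the first configuration to arrive wins and later callers with different settings silently reuse its producer. One conventional alternative is to key the cache by configuration. The sketch below is a stdlib-only illustration, assuming a `FakeProducer` stand-in for the Kafka producer; none of these names come from the patch.

```scala
import scala.collection.concurrent.TrieMap

// Illustrative stand-in for a producer: remembers which config created it.
final class FakeProducer(val config: Map[String, String])

// JVM-wide single slot (the pattern questioned above): whichever config
// arrives first wins, and every later caller reuses that instance.
object SingleSlotCache {
  private var producerOpt: Option[FakeProducer] = None
  def getOrCreate(config: Map[String, String]): FakeProducer = synchronized {
    producerOpt.getOrElse {
      val p = new FakeProducer(config)
      producerOpt = Some(p)
      p
    }
  }
}

// Keying the cache by configuration gives each distinct config its own
// producer, so unrelated applications in one JVM no longer collide.
object KeyedCache {
  private val producers = TrieMap.empty[Map[String, String], FakeProducer]
  def getOrCreate(config: Map[String, String]): FakeProducer =
    producers.getOrElseUpdate(config, new FakeProducer(config))
}
```

With `SingleSlotCache`, a second caller passing a different broker list still gets the producer built from the first config; with `KeyedCache`, each config maps to its own instance.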
[GitHub] spark pull request: [SPARK-4062][Streaming]Add ReliableKafkaReceiv...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2991#issuecomment-60887196 [Test build #22438 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22438/consoleFull) for PR 2991 at commit [`09d57c5`](https://github.com/apache/spark/commit/09d57c5270cf876e52d49db702e2330c2b6a6e10). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4062][Streaming]Add ReliableKafkaReceiv...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2991#issuecomment-60887199 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22438/ Test PASSed.
[GitHub] spark pull request: [SPARK-3930] [SPARK-3933] Support fixed-precis...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2983#issuecomment-60887244 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22443/ Test FAILed.
[GitHub] spark pull request: [SPARK-3930] [SPARK-3933] Support fixed-precis...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2983#issuecomment-60887240 [Test build #22443 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22443/consoleFull) for PR 2983 at commit [`4ca62cd`](https://github.com/apache/spark/commit/4ca62cd306d96890a7a56da04710ae5548715c4a). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class UnscaledValue(child: Expression) extends UnaryExpression ` * `case class MakeDecimal(child: Expression, precision: Int, scale: Int) extends UnaryExpression ` * `case class MutableLiteral(var value: Any, dataType: DataType, nullable: Boolean = true)` * `case class PrecisionInfo(precision: Int, scale: Int)` * `case class DecimalType(precisionInfo: Option[PrecisionInfo]) extends FractionalType ` * `final class Decimal extends Ordered[Decimal] with Serializable ` * ` trait DecimalIsConflicted extends Numeric[Decimal] `
[GitHub] spark pull request: [SPARK-4124] [MLlib] [PySpark] simplify serial...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2995#issuecomment-60888203 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22439/ Test FAILed.
[GitHub] spark pull request: [SPARK-4124] [MLlib] [PySpark] simplify serial...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2995#issuecomment-60888200 [Test build #22439 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22439/consoleFull) for PR 2995 at commit [`731331f`](https://github.com/apache/spark/commit/731331fdafe9ce6e4bf24dc1e6667942e1e59587). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class JavaModelWrapper(object):` * `class JavaVectorTransformer(JavaModelWrapper, VectorTransformer):` * `class StandardScalerModel(JavaVectorTransformer):` * `class IDFModel(JavaVectorTransformer):` * `class Word2VecModel(JavaVectorTransformer):` * `class MatrixFactorizationModel(JavaModelWrapper):` * `class MultivariateStatisticalSummary(JavaModelWrapper):` * `class DecisionTreeModel(JavaModelWrapper):`
[GitHub] spark pull request: [SPARK-4130][MLlib] Fixing libSVM parser bug w...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2996#issuecomment-60888681 [Test build #22442 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22442/consoleFull) for PR 2996 at commit [`e028e84`](https://github.com/apache/spark/commit/e028e8443ff38e3617a1bdd0a2a3f5ec9b42d980). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4130][MLlib] Fixing libSVM parser bug w...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2996#issuecomment-60888689 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22442/ Test FAILed.
[GitHub] spark pull request: [SPARK-4129][MLlib] Performance tuning in Mult...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2992#issuecomment-60889264 [Test build #22440 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22440/consoleFull) for PR 2992 at commit [`b99db6c`](https://github.com/apache/spark/commit/b99db6caa0a5f2d6e69d5940b5c37e88914c5e36). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4129][MLlib] Performance tuning in Mult...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2992#issuecomment-60889271 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22440/ Test PASSed.
[GitHub] spark pull request: [ SPARK-1812] Adjust build system and tests to...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2615#issuecomment-60889710 **[Test build #22436 timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22436/consoleFull)** for PR 2615 at commit [`95c2e8e`](https://github.com/apache/spark/commit/95c2e8e86f69f3b1d11aad04f6ad14b0ede1950a) after a configured wait of `120m`.
[GitHub] spark pull request: [ SPARK-1812] Adjust build system and tests to...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2615#issuecomment-60889716 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22436/ Test FAILed.
[GitHub] spark pull request: Branch 1.1
Github user huozhanfeng commented on the pull request: https://github.com/apache/spark/pull/1824#issuecomment-60890031 @rxin I'm sorry, this was a mistaken operation; thanks for your help closing it.
[GitHub] spark pull request: [WIP][SPARK-4131][SQL] Writing data into the f...
GitHub user wangxiaojing opened a pull request: https://github.com/apache/spark/pull/2997 [WIP][SPARK-4131][SQL] Writing data into the filesystem from queries You can merge this pull request into a Git repository by running: $ git pull https://github.com/wangxiaojing/spark SPARK-4131 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2997.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2997 commit f406f49749d590f230c953194a8c36e760fc9460 Author: wangxiaojing u9j...@gmail.com Date: 2014-10-29T08:58:19Z add Token
[GitHub] spark pull request: [SPARK-4062][Streaming]Add ReliableKafkaReceiv...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2991#issuecomment-60892857 [Test build #22446 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22446/consoleFull) for PR 2991 at commit [`bb57d05`](https://github.com/apache/spark/commit/bb57d05b2e3579c9c3e59429918082937a99e87f). * This patch merges cleanly.
[GitHub] spark pull request: [WIP][SPARK-4131][SQL] Writing data into the f...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2997#issuecomment-60893115 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-1442] [SQL] window function implement
Github user wangxiaojing commented on a diff in the pull request: https://github.com/apache/spark/pull/2953#discussion_r19527141 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala --- @@ -845,6 +858,198 @@ private[hive] object HiveQl {
     throw new NotImplementedError(s"No parse rules for:\n ${dumpTree(a).toString} ")
   }
+  // store the window def of current sql
+  //use thread id as key to avoid mistake when muti sqls parse at the same time
+  protected val windowDefMap = new ConcurrentHashMap[Long,Map[String, Seq[ASTNode]]]()
+
+  // store the window spec of current sql
+  //use thread id as key to avoid mistake when muti sqls parse at the same time
--- End diff -- Space after //
[GitHub] spark pull request: [SPARK-1442] [SQL] window function implement
Github user wangxiaojing commented on a diff in the pull request: https://github.com/apache/spark/pull/2953#discussion_r19527153 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala --- @@ -845,6 +858,198 @@ private[hive] object HiveQl {
     throw new NotImplementedError(s"No parse rules for:\n ${dumpTree(a).toString} ")
   }
+  // store the window def of current sql
+  //use thread id as key to avoid mistake when muti sqls parse at the same time
--- End diff -- Space after //
[GitHub] spark pull request: [SPARK-1442] [SQL] window function implement
Github user wangxiaojing commented on a diff in the pull request: https://github.com/apache/spark/pull/2953#discussion_r19527190

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -845,6 +858,198 @@ private[hive] object HiveQl {
       throw new NotImplementedError(s"No parse rules for:\n ${dumpTree(a).toString} ")
   }
+  // store the window def of current sql
+  //use thread id as key to avoid mistake when muti sqls parse at the same time
+  protected val windowDefMap = new ConcurrentHashMap[Long, Map[String, Seq[ASTNode]]]()
+
+  // store the window spec of current sql
+  //use thread id as key to avoid mistake when muti sqls parse at the same time
+  protected val windowPartitionsMap = new ConcurrentHashMap[Long, ArrayBuffer[Node]]()
+
+  protected def initWindow() = {
+    windowDefMap.put(Thread.currentThread().getId, Map[String, Seq[ASTNode]]())
+    windowPartitionsMap.put(Thread.currentThread().getId, new ArrayBuffer[Node]())
+  }
+
+  protected def checkWindowDef(windowClause: Option[Node]) = {
+    var winDefs = windowDefMap.get(Thread.currentThread().getId)
+
+    windowClause match {
+      case Some(window) => window.getChildren.foreach {
+        case Token("TOK_WINDOWDEF", Token(alias, Nil) :: Token("TOK_WINDOWSPEC", ws) :: Nil) => {
+          winDefs += alias -> ws
+        }
+      }
+      case None => //do nothing
+    }
+
+    windowDefMap.put(Thread.currentThread().getId, winDefs)
+  }
+
+  protected def translateWindowSpec(windowSpec: Seq[ASTNode]): Seq[ASTNode] = {
+    windowSpec match {
+      case Token(alias, Nil) :: Nil => translateWindowSpec(getWindowSpec(alias))
+      case Token(alias, Nil) :: range => {
+        val (partitionClause :: rowsRange :: valueRange :: Nil) = getClauses(
+          Seq(
+            "TOK_PARTITIONINGSPEC",
+            "TOK_WINDOWRANGE",
+            "TOK_WINDOWVALUES"),
+          translateWindowSpec(getWindowSpec(alias)))
+        partitionClause match {
+          case Some(partition) => partition.asInstanceOf[ASTNode] :: range
+          case None => range
+        }
+      }
+      case e => e
+    }
+  }
+
+  protected def getWindowSpec(alias: String): Seq[ASTNode] = {
+    windowDefMap.get(Thread.currentThread().getId).getOrElse(
+      alias, sys.error("no window def for " + alias))
+  }
+
+  protected def addWindowPartitions(partition: Node) = {
+    var winPartitions = windowPartitionsMap.get(Thread.currentThread().getId)
+    winPartitions += partition
+    windowPartitionsMap.put(Thread.currentThread().getId, winPartitions)
+  }
+
+  protected def getWindowPartitions(): Seq[Node] = {
+    windowPartitionsMap.get(Thread.currentThread().getId).toSeq
+  }
+
+  protected def checkWindowPartitions(): Option[Seq[ASTNode]] = {
+    val partitionUnits = new ArrayBuffer[Seq[ASTNode]]()
+
+    getWindowPartitions.map {
+      case Token("TOK_PARTITIONINGSPEC", partition) => Some(partition)
+      case _ => None
+    }.foreach {
+      case Some(partition) => {
+        if (partitionUnits.isEmpty) partitionUnits += partition
+        else {
+          //only add different window partitions
+          try {
+            partition zip partitionUnits.head foreach {
+              case (l, r) => l checkEquals r
+            }
+          } catch {
+            case re: RuntimeException => partitionUnits += partition
+          }
+        }
+      }
+      case None => //do nothing
--- End diff --

Space after //
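As context for the thread-id bookkeeping discussed above, here is a stripped-down, runnable sketch of the pattern the diff uses. `ParserState` is a hypothetical name, and the specs are plain `String`s rather than the `Seq[ASTNode]` the real code stores:

```scala
import java.util.concurrent.ConcurrentHashMap

// Illustrative stand-in for the patch's per-thread parser state: window
// definitions are kept in a shared map keyed by the current thread's id, so
// concurrent parses never see each other's aliases.
object ParserState {
  private val windowDefMap = new ConcurrentHashMap[Long, Map[String, String]]()

  private def tid: Long = Thread.currentThread().getId

  // Reset this thread's state before parsing a new statement.
  def init(): Unit = windowDefMap.put(tid, Map.empty)

  // Record a named window definition for the current thread only.
  def addDef(alias: String, spec: String): Unit =
    windowDefMap.put(tid, windowDefMap.get(tid) + (alias -> spec))

  // Resolve an alias, failing loudly if it was never defined on this thread.
  def lookup(alias: String): String =
    windowDefMap.get(tid).getOrElse(alias, sys.error("no window def for " + alias))
}
```

A `ThreadLocal[Map[String, String]]` would express the same per-thread isolation without an explicit thread-id key, and without the risk of leaking entries for threads that never clean up.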
[GitHub] spark pull request: [SPARK-1442] [SQL] window function implement
Github user wangxiaojing commented on a diff in the pull request: https://github.com/apache/spark/pull/2953#discussion_r19527163

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -845,6 +858,198 @@ private[hive] object HiveQl {
       throw new NotImplementedError(s"No parse rules for:\n ${dumpTree(a).toString} ")
   }
+  // store the window def of current sql
+  //use thread id as key to avoid mistake when muti sqls parse at the same time
+  protected val windowDefMap = new ConcurrentHashMap[Long, Map[String, Seq[ASTNode]]]()
+
+  // store the window spec of current sql
+  //use thread id as key to avoid mistake when muti sqls parse at the same time
+  protected val windowPartitionsMap = new ConcurrentHashMap[Long, ArrayBuffer[Node]]()
+
+  protected def initWindow() = {
+    windowDefMap.put(Thread.currentThread().getId, Map[String, Seq[ASTNode]]())
+    windowPartitionsMap.put(Thread.currentThread().getId, new ArrayBuffer[Node]())
+  }
+
+  protected def checkWindowDef(windowClause: Option[Node]) = {
+    var winDefs = windowDefMap.get(Thread.currentThread().getId)
+
+    windowClause match {
+      case Some(window) => window.getChildren.foreach {
+        case Token("TOK_WINDOWDEF", Token(alias, Nil) :: Token("TOK_WINDOWSPEC", ws) :: Nil) => {
+          winDefs += alias -> ws
+        }
+      }
+      case None => //do nothing
--- End diff --

Space after //
[GitHub] spark pull request: [SPARK-1442] [SQL] window function implement
Github user wangxiaojing commented on a diff in the pull request: https://github.com/apache/spark/pull/2953#discussion_r19527181

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -845,6 +858,198 @@ private[hive] object HiveQl {
       throw new NotImplementedError(s"No parse rules for:\n ${dumpTree(a).toString} ")
   }
+  // store the window def of current sql
+  //use thread id as key to avoid mistake when muti sqls parse at the same time
+  protected val windowDefMap = new ConcurrentHashMap[Long, Map[String, Seq[ASTNode]]]()
+
+  // store the window spec of current sql
+  //use thread id as key to avoid mistake when muti sqls parse at the same time
+  protected val windowPartitionsMap = new ConcurrentHashMap[Long, ArrayBuffer[Node]]()
+
+  protected def initWindow() = {
+    windowDefMap.put(Thread.currentThread().getId, Map[String, Seq[ASTNode]]())
+    windowPartitionsMap.put(Thread.currentThread().getId, new ArrayBuffer[Node]())
+  }
+
+  protected def checkWindowDef(windowClause: Option[Node]) = {
+    var winDefs = windowDefMap.get(Thread.currentThread().getId)
+
+    windowClause match {
+      case Some(window) => window.getChildren.foreach {
+        case Token("TOK_WINDOWDEF", Token(alias, Nil) :: Token("TOK_WINDOWSPEC", ws) :: Nil) => {
+          winDefs += alias -> ws
+        }
+      }
+      case None => //do nothing
+    }
+
+    windowDefMap.put(Thread.currentThread().getId, winDefs)
+  }
+
+  protected def translateWindowSpec(windowSpec: Seq[ASTNode]): Seq[ASTNode] = {
+    windowSpec match {
+      case Token(alias, Nil) :: Nil => translateWindowSpec(getWindowSpec(alias))
+      case Token(alias, Nil) :: range => {
+        val (partitionClause :: rowsRange :: valueRange :: Nil) = getClauses(
+          Seq(
+            "TOK_PARTITIONINGSPEC",
+            "TOK_WINDOWRANGE",
+            "TOK_WINDOWVALUES"),
+          translateWindowSpec(getWindowSpec(alias)))
+        partitionClause match {
+          case Some(partition) => partition.asInstanceOf[ASTNode] :: range
+          case None => range
+        }
+      }
+      case e => e
+    }
+  }
+
+  protected def getWindowSpec(alias: String): Seq[ASTNode] = {
+    windowDefMap.get(Thread.currentThread().getId).getOrElse(
+      alias, sys.error("no window def for " + alias))
+  }
+
+  protected def addWindowPartitions(partition: Node) = {
+    var winPartitions = windowPartitionsMap.get(Thread.currentThread().getId)
+    winPartitions += partition
+    windowPartitionsMap.put(Thread.currentThread().getId, winPartitions)
+  }
+
+  protected def getWindowPartitions(): Seq[Node] = {
+    windowPartitionsMap.get(Thread.currentThread().getId).toSeq
+  }
+
+  protected def checkWindowPartitions(): Option[Seq[ASTNode]] = {
+    val partitionUnits = new ArrayBuffer[Seq[ASTNode]]()
+
+    getWindowPartitions.map {
+      case Token("TOK_PARTITIONINGSPEC", partition) => Some(partition)
+      case _ => None
+    }.foreach {
+      case Some(partition) => {
+        if (partitionUnits.isEmpty) partitionUnits += partition
+        else {
+          //only add different window partitions
--- End diff --

Space after //
[GitHub] spark pull request: [SPARK-1442] [SQL] window function implement
Github user wangxiaojing commented on a diff in the pull request: https://github.com/apache/spark/pull/2953#discussion_r19527196

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -845,6 +858,198 @@ private[hive] object HiveQl {
       throw new NotImplementedError(s"No parse rules for:\n ${dumpTree(a).toString} ")
   }
+  // store the window def of current sql
+  //use thread id as key to avoid mistake when muti sqls parse at the same time
+  protected val windowDefMap = new ConcurrentHashMap[Long, Map[String, Seq[ASTNode]]]()
+
+  // store the window spec of current sql
+  //use thread id as key to avoid mistake when muti sqls parse at the same time
+  protected val windowPartitionsMap = new ConcurrentHashMap[Long, ArrayBuffer[Node]]()
+
+  protected def initWindow() = {
+    windowDefMap.put(Thread.currentThread().getId, Map[String, Seq[ASTNode]]())
+    windowPartitionsMap.put(Thread.currentThread().getId, new ArrayBuffer[Node]())
+  }
+
+  protected def checkWindowDef(windowClause: Option[Node]) = {
+    var winDefs = windowDefMap.get(Thread.currentThread().getId)
+
+    windowClause match {
+      case Some(window) => window.getChildren.foreach {
+        case Token("TOK_WINDOWDEF", Token(alias, Nil) :: Token("TOK_WINDOWSPEC", ws) :: Nil) => {
+          winDefs += alias -> ws
+        }
+      }
+      case None => //do nothing
+    }
+
+    windowDefMap.put(Thread.currentThread().getId, winDefs)
+  }
+
+  protected def translateWindowSpec(windowSpec: Seq[ASTNode]): Seq[ASTNode] = {
+    windowSpec match {
+      case Token(alias, Nil) :: Nil => translateWindowSpec(getWindowSpec(alias))
+      case Token(alias, Nil) :: range => {
+        val (partitionClause :: rowsRange :: valueRange :: Nil) = getClauses(
+          Seq(
+            "TOK_PARTITIONINGSPEC",
+            "TOK_WINDOWRANGE",
+            "TOK_WINDOWVALUES"),
+          translateWindowSpec(getWindowSpec(alias)))
+        partitionClause match {
+          case Some(partition) => partition.asInstanceOf[ASTNode] :: range
+          case None => range
+        }
+      }
+      case e => e
+    }
+  }
+
+  protected def getWindowSpec(alias: String): Seq[ASTNode] = {
+    windowDefMap.get(Thread.currentThread().getId).getOrElse(
+      alias, sys.error("no window def for " + alias))
+  }
+
+  protected def addWindowPartitions(partition: Node) = {
+    var winPartitions = windowPartitionsMap.get(Thread.currentThread().getId)
+    winPartitions += partition
+    windowPartitionsMap.put(Thread.currentThread().getId, winPartitions)
+  }
+
+  protected def getWindowPartitions(): Seq[Node] = {
+    windowPartitionsMap.get(Thread.currentThread().getId).toSeq
+  }
+
+  protected def checkWindowPartitions(): Option[Seq[ASTNode]] = {
+    val partitionUnits = new ArrayBuffer[Seq[ASTNode]]()
+
+    getWindowPartitions.map {
+      case Token("TOK_PARTITIONINGSPEC", partition) => Some(partition)
+      case _ => None
+    }.foreach {
+      case Some(partition) => {
+        if (partitionUnits.isEmpty) partitionUnits += partition
+        else {
+          //only add different window partitions
+          try {
+            partition zip partitionUnits.head foreach {
+              case (l, r) => l checkEquals r
+            }
+          } catch {
+            case re: RuntimeException => partitionUnits += partition
+          }
+        }
+      }
+      case None => //do nothing
+    }
+
+    //check whether all window partitions are same, we just support same window partition now
--- End diff --

Space after //
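The `checkWindowPartitions` logic quoted above (collect the distinct `PARTITION BY` specs, then require that they all agree) can be sketched with plain string sequences standing in for `ASTNode`s. `checkSinglePartition` and `distinctPartitions` are hypothetical names, not from the patch:

```scala
// Collect the distinct partition specs seen so far; the patch only supports
// queries where every window function shares one PARTITION BY clause.
def distinctPartitions(specs: Seq[Seq[String]]): Seq[Seq[String]] =
  specs.foldLeft(Seq.empty[Seq[String]]) { (units, p) =>
    if (units.contains(p)) units else units :+ p
  }

// Mirror of the check: None when there are no windows, the single shared spec
// when all windows agree, and an error otherwise.
def checkSinglePartition(specs: Seq[Seq[String]]): Option[Seq[String]] =
  distinctPartitions(specs) match {
    case Seq()       => None
    case Seq(single) => Some(single)
    case _           => sys.error("only identical window partitions are supported")
  }
```

Equality on `String` replaces the `checkEquals`-and-catch dance in the real code, which compares AST subtrees.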
[GitHub] spark pull request: [SPARK-3826][SQL]enable hive-thriftserver to s...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/2685#issuecomment-60896714

A summary of what I've found with local testing:

- For master branch:
  - Check out a fresh copy, build the assembly jar first, then run `HashShuffleSuite`: pass
  - Check out a fresh copy, run `HashShuffleSuite` directly without building the assembly jar: fail
- For this PR: both approaches fail.

So I guess the problem should be related to Maven configurations.
[GitHub] spark pull request: [SPARK-1442] [SQL] window function implement
Github user wangxiaojing commented on a diff in the pull request: https://github.com/apache/spark/pull/2953#discussion_r19527811

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WindowFunction.scala ---
@@ -0,0 +1,353 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import java.util.HashMap
+
+import org.apache.spark.annotation.DeveloperApi
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans.physical.AllTuples
+import org.apache.spark.sql.catalyst.plans.physical.ClusteredDistribution
+import org.apache.spark.sql.catalyst.errors._
+import scala.collection.mutable.ArrayBuffer
+import org.apache.spark.util.collection.CompactBuffer
+import org.apache.spark.sql.catalyst.plans.physical.ClusteredDistribution
+import org.apache.spark.sql.catalyst.expressions.AttributeReference
+import org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection
+import org.apache.spark.sql.catalyst.expressions.Alias
+import org.apache.spark.sql.catalyst.types._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.plans.logical.SortPartitions
+
+/**
+ * :: DeveloperApi ::
+ * Groups input data by `partitionExpressions` and computes the `computeExpressions` for each
+ * group.
+ * @param partitionExpressions expressions that are evaluated to determine partition.
+ * @param functionExpressions expressions that are computed for each partition.
+ * @param child the input data source.
+ */
+@DeveloperApi
+case class WindowFunction(
+    partitionExpressions: Seq[Expression],
+    functionExpressions: Seq[NamedExpression],
+    child: SparkPlan)
+  extends UnaryNode {
+
+  override def requiredChildDistribution =
+    if (partitionExpressions == Nil) {
+      AllTuples :: Nil
+    } else {
+      ClusteredDistribution(partitionExpressions) :: Nil
+    }
+
+  // HACK: Generators don't correctly preserve their output through serializations so we grab
+  // out child's output attributes statically here.
+  private[this] val childOutput = child.output
+
+  override def output = functionExpressions.map(_.toAttribute)
+
+  /** A list of functions that need to be computed for each partition. */
+  private[this] val computeExpressions = new ArrayBuffer[AggregateExpression]
+
+  private[this] val otherExpressions = new ArrayBuffer[NamedExpression]
+
+  functionExpressions.foreach { sel =>
+    sel.collect {
+      case func: AggregateExpression => computeExpressions += func
+      case other: NamedExpression if (!other.isInstanceOf[Alias]) => otherExpressions += other
+    }
+  }
+
+  private[this] val functionAttributes = computeExpressions.map { func =>
+    func -> AttributeReference(s"funcResult:$func", func.dataType, func.nullable)() }
+
+  /** The schema of the result of all evaluations */
+  private[this] val resultAttributes =
+    otherExpressions.map(_.toAttribute) ++ functionAttributes.map(_._2)
+
+  private[this] val resultMap =
+    (otherExpressions.map { other => other -> other.toAttribute } ++ functionAttributes
+    ).toMap
+
+  private[this] val resultExpressions = functionExpressions.map { sel =>
+    sel.transform {
+      case e: Expression if resultMap.contains(e) => resultMap(e)
+    }
+  }
+
+  private[this] val sortExpressions =
+    if (child.isInstanceOf[SortPartitions]) {
+      child.asInstanceOf[SortPartitions].sortExpressions
+    } else if (child.isInstanceOf[Sort]) {
+      child.asInstanceOf[Sort].sortOrder
+    } else null
+
+  /** Creates a new function buffer for a partition. */
+  private[this] def newFunctionBuffer(): Array[AggregateFunction] = {
+    val buffer = new
[GitHub] spark pull request: [SPARK-1442] [SQL] window function implement
Github user wangxiaojing commented on a diff in the pull request: https://github.com/apache/spark/pull/2953#discussion_r19527805

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WindowFunction.scala ---
@@ -0,0 +1,353 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import java.util.HashMap
+
+import org.apache.spark.annotation.DeveloperApi
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans.physical.AllTuples
+import org.apache.spark.sql.catalyst.plans.physical.ClusteredDistribution
+import org.apache.spark.sql.catalyst.errors._
+import scala.collection.mutable.ArrayBuffer
+import org.apache.spark.util.collection.CompactBuffer
+import org.apache.spark.sql.catalyst.plans.physical.ClusteredDistribution
+import org.apache.spark.sql.catalyst.expressions.AttributeReference
+import org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection
+import org.apache.spark.sql.catalyst.expressions.Alias
+import org.apache.spark.sql.catalyst.types._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.plans.logical.SortPartitions
+
+/**
+ * :: DeveloperApi ::
+ * Groups input data by `partitionExpressions` and computes the `computeExpressions` for each
+ * group.
+ * @param partitionExpressions expressions that are evaluated to determine partition.
+ * @param functionExpressions expressions that are computed for each partition.
+ * @param child the input data source.
+ */
+@DeveloperApi
+case class WindowFunction(
+    partitionExpressions: Seq[Expression],
+    functionExpressions: Seq[NamedExpression],
+    child: SparkPlan)
+  extends UnaryNode {
+
+  override def requiredChildDistribution =
+    if (partitionExpressions == Nil) {
+      AllTuples :: Nil
+    } else {
+      ClusteredDistribution(partitionExpressions) :: Nil
+    }
+
+  // HACK: Generators don't correctly preserve their output through serializations so we grab
+  // out child's output attributes statically here.
+  private[this] val childOutput = child.output
+
+  override def output = functionExpressions.map(_.toAttribute)
+
+  /** A list of functions that need to be computed for each partition. */
+  private[this] val computeExpressions = new ArrayBuffer[AggregateExpression]
+
+  private[this] val otherExpressions = new ArrayBuffer[NamedExpression]
+
+  functionExpressions.foreach { sel =>
+    sel.collect {
+      case func: AggregateExpression => computeExpressions += func
+      case other: NamedExpression if (!other.isInstanceOf[Alias]) => otherExpressions += other
+    }
+  }
+
+  private[this] val functionAttributes = computeExpressions.map { func =>
+    func -> AttributeReference(s"funcResult:$func", func.dataType, func.nullable)() }
+
+  /** The schema of the result of all evaluations */
+  private[this] val resultAttributes =
+    otherExpressions.map(_.toAttribute) ++ functionAttributes.map(_._2)
+
+  private[this] val resultMap =
+    (otherExpressions.map { other => other -> other.toAttribute } ++ functionAttributes
+    ).toMap
+
+  private[this] val resultExpressions = functionExpressions.map { sel =>
+    sel.transform {
+      case e: Expression if resultMap.contains(e) => resultMap(e)
+    }
+  }
+
+  private[this] val sortExpressions =
+    if (child.isInstanceOf[SortPartitions]) {
+      child.asInstanceOf[SortPartitions].sortExpressions
+    } else if (child.isInstanceOf[Sort]) {
+      child.asInstanceOf[Sort].sortOrder
+    } else null
+
+  /** Creates a new function buffer for a partition. */
+  private[this] def newFunctionBuffer(): Array[AggregateFunction] = {
+    val buffer = new
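To make the operator's contract concrete, the following toy sketch (not code from the PR) evaluates one aggregate per partition and attaches the result to every input row, which is the essence of what `WindowFunction` does for an expression like `sum(x) OVER (PARTITION BY k)`. `InputRow` and `windowSum` are illustrative names:

```scala
// Toy model of a windowed aggregate: rows are grouped by the partition key,
// the aggregate is computed once per group, and every row in the group is
// paired with that group's result.
case class InputRow(key: String, value: Int)

def windowSum(rows: Seq[InputRow]): Seq[(InputRow, Int)] =
  rows.groupBy(_.key).valuesIterator.flatMap { part =>
    val total = part.map(_.value).sum  // the per-partition "compute expression"
    part.map(r => r -> total)          // same result repeated for each row
  }.toSeq
```

The real operator generalizes this by running an array of `AggregateFunction` buffers per partition and projecting the buffered results back through `resultExpressions`.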
[GitHub] spark pull request: [SPARK-1442] [SQL] window function implement
Github user wangxiaojing commented on a diff in the pull request: https://github.com/apache/spark/pull/2953#discussion_r19527873

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveWindowFunctionSuite.scala ---
@@ -0,0 +1,314 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import org.apache.spark.sql.hive._
+import org.apache.spark.sql.hive.test.TestHive
+import org.apache.spark.sql.hive.test.TestHive._
+import org.apache.spark.sql.{Row, SchemaRDD}
+
+class HiveWindowFunctionSuite extends HiveComparisonTest {
+
+  override def beforeAll() {
+    sql("DROP TABLE IF EXISTS part").collect()
+
+    sql(
+      """
+        |CREATE TABLE part(
+        |p_partkey INT,
+        |p_name STRING,
+        |p_mfgr STRING,
+        |p_brand STRING,
+        |p_type STRING,
+        |p_size INT,
+        |p_container STRING,
+        |p_retailprice DOUBLE,
+        |p_comment STRING
+        |)
+      """.stripMargin).collect()
+
+    //remove duplicate data in part_tiny.txt for hive bug
--- End diff --

Space after //
[GitHub] spark pull request: SPARK-3968 Use parquet-mr filter2 api in spark...
Github user saucam commented on the pull request: https://github.com/apache/spark/pull/2841#issuecomment-60897965

Added more tests for filtering on nullable columns
[GitHub] spark pull request: [SPARK-3611] Show number of cores for each exe...
Github user devldevelopment commented on the pull request: https://github.com/apache/spark/pull/2980#issuecomment-60898133

Thanks for the feedback guys, updated with changes. Removed the WIP as well.
[GitHub] spark pull request: [SPARK-3611] Show number of cores for each exe...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2980#issuecomment-60898397

[Test build #22447 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22447/consoleFull) for PR 2980 at commit [`50a1592`](https://github.com/apache/spark/commit/50a15921576862ae99df5542f2e4d3bb36253d1b).

* This patch merges cleanly.
[GitHub] spark pull request: SPARK-3968 Use parquet-mr filter2 api in spark...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2841#issuecomment-60898409

[Test build #22448 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22448/consoleFull) for PR 2841 at commit [`8282ba0`](https://github.com/apache/spark/commit/8282ba0752951fc9d7a7593c6f6c89815bb92b3a).

* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4003] [SQL] add 3 types for java SQL co...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2850#issuecomment-60900756

[Test build #22449 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22449/consoleFull) for PR 2850 at commit [`4c4292c`](https://github.com/apache/spark/commit/4c4292ccbd23cac1cb511ca9581bb9ac17ac037f).

* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4062][Streaming]Add ReliableKafkaReceiv...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2991#issuecomment-60901164

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22446/
[GitHub] spark pull request: [SPARK-4062][Streaming]Add ReliableKafkaReceiv...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2991#issuecomment-60901153

[Test build #22446 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22446/consoleFull) for PR 2991 at commit [`bb57d05`](https://github.com/apache/spark/commit/bb57d05b2e3579c9c3e59429918082937a99e87f).

* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4003] [SQL] add 3 types for java SQL co...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2850#issuecomment-60906435

[Test build #22449 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22449/consoleFull) for PR 2850 at commit [`4c4292c`](https://github.com/apache/spark/commit/4c4292ccbd23cac1cb511ca9581bb9ac17ac037f).

* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.