[GitHub] spark issue #13911: [SPARK-16215][SQL] Reduce runtime overhead of a program ...

2016-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13911
  
**[Test build #61256 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61256/consoleFull)**
 for PR 13911 at commit 
[`b1f6289`](https://github.com/apache/spark/commit/b1f6289d99445980f35a7e80127fb129517280d5).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13911: [SPARK-16215][SQL] Reduce runtime overhead of a p...

2016-06-25 Thread kiszk
GitHub user kiszk opened a pull request:

https://github.com/apache/spark/pull/13911

[SPARK-16215][SQL] Reduce runtime overhead of a program that writes a primitive array in DataFrame/Dataset

## What changes were proposed in this pull request?

This PR optimizes the generated projection code for primitive-type arrays. Although a primitive-type array requires no null checks and occupies a contiguous data region, the current generated code performs a null check and a copy for each element (at Lines 075-082 in the generated code before applying this PR). This PR:

1. Eliminates the null check for each array element
2. Performs a bulk data copy using ```Platform.copy```
3. Eliminates the primitive array allocation in ```GenericArrayData``` once https://github.com/apache/spark/pull/13758 is merged
4. Eliminates setting the sparse index for ```UnsafeArrayData``` once https://github.com/apache/spark/pull/13680 is merged

These are done in the helper method ```UnsafeArrayWrite.writePrimitiveArray()``` (at Line 075 in the generated code after applying this PR).

For now, 3 and 4 are not enabled, but the code is ready.
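The difference between the element-wise copy and the bulk copy can be sketched in plain Java. This is an illustrative analogy only: ```System.arraycopy``` stands in for Spark's internal ```Platform.copy```, and none of the class or method names below are Spark APIs:

```java
// Illustrative sketch: contrasts the per-element copy with null checks
// (what the generated code does before this PR) with a single bulk copy
// over a contiguous primitive region (what the PR enables).
public class BulkCopySketch {
    // Before: check each slot for null and copy one element at a time.
    static double[] copyPerElement(Double[] boxed) {
        double[] out = new double[boxed.length];
        for (int i = 0; i < boxed.length; i++) {
            if (boxed[i] == null) {
                out[i] = 0.0;       // per-element null handling
            } else {
                out[i] = boxed[i];  // per-element unboxing
            }
        }
        return out;
    }

    // After: a primitive array is known to contain no nulls and to be
    // contiguous, so the whole region is copied in one call.
    static double[] copyBulk(double[] primitive) {
        double[] out = new double[primitive.length];
        System.arraycopy(primitive, 0, out, 0, primitive.length);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(copyBulk(new double[] {1.1, 2.2})));
    }
}
```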


An example program
```scala
val df = sparkContext.parallelize(Seq(0.0d, 1.0d), 1).toDF
df.selectExpr("Array(value + 1.1d, value + 2.2d)").collect
```

Generated code before applying this PR
```java
/* 028 */   protected void processNext() throws java.io.IOException {
/* 029 */     while (inputadapter_input.hasNext()) {
/* 030 */       InternalRow inputadapter_row = (InternalRow) inputadapter_input.next();
/* 031 */       double inputadapter_value = inputadapter_row.getDouble(0);
/* 032 */
/* 033 */       final boolean project_isNull = false;
/* 034 */       this.project_values = new Object[2];
/* 035 */       double project_value1 = -1.0;
/* 036 */       project_value1 = inputadapter_value + 1.1D;
/* 037 */       if (false) {
/* 038 */         project_values[0] = null;
/* 039 */       } else {
/* 040 */         project_values[0] = project_value1;
/* 041 */       }
/* 042 */
/* 043 */       double project_value4 = -1.0;
/* 044 */       project_value4 = inputadapter_value + 2.2D;
/* 045 */       if (false) {
/* 046 */         project_values[1] = null;
/* 047 */       } else {
/* 048 */         project_values[1] = project_value4;
/* 049 */       }
/* 050 */
/* 051 */       final ArrayData project_value = new org.apache.spark.sql.catalyst.util.GenericArrayData(project_values);
/* 052 */       this.project_values = null;
/* 053 */       project_holder.reset();
/* 054 */
/* 055 */       project_rowWriter.zeroOutNullBytes();
/* 056 */
/* 057 */       if (project_isNull) {
/* 058 */         project_rowWriter.setNullAt(0);
/* 059 */       } else {
/* 060 */         // Remember the current cursor so that we can calculate how many bytes are
/* 061 */         // written later.
/* 062 */         final int project_tmpCursor = project_holder.cursor;
/* 063 */
/* 064 */         if (project_value instanceof UnsafeArrayData) {
/* 065 */           final int project_sizeInBytes = ((UnsafeArrayData) project_value).getSizeInBytes();
/* 066 */           // grow the global buffer before writing data.
/* 067 */           project_holder.grow(project_sizeInBytes);
/* 068 */           ((UnsafeArrayData) project_value).writeToMemory(project_holder.buffer, project_holder.cursor);
/* 069 */           project_holder.cursor += project_sizeInBytes;
/* 070 */
/* 071 */         } else {
/* 072 */           final int project_numElements = project_value.numElements();
/* 073 */           project_arrayWriter.initialize(project_holder, project_numElements, 8);
/* 074 */
/* 075 */           for (int project_index = 0; project_index < project_numElements; project_index++) {
/* 076 */             if (project_value.isNullAt(project_index)) {
/* 077 */               project_arrayWriter.setNullAt(project_index);
/* 078 */             } else {
/* 079 */               final double project_element = project_value.getDouble(project_index);
/* 080 */               project_arrayWriter.write(project_index, project_element);
/* 081 */             }
/* 082 */           }
/* 083 */
/* 084 */         }
/* 085 */
/* 086 */         project_rowWriter.setOffsetAndSize(0, project_tmpCursor, project_holder.cursor - project_tmpCursor);
/* 087 */         project_rowWriter.alignToWords(project_holder.cursor - project_tmpCursor);
/* 088 */       }
/* 089 */       project_result.setTotalSize(project_holder.totalSize());
/* 090 */       append(project_result);
/* 091 */       if (shouldStop()) return;
/* 092 */     }
/* 093 */   }
/* 094 */ }
```

Generated code after applying this PR
```java
/* 028 */
```

[GitHub] spark issue #13765: [SPARK-16052][SQL] Add `CollapseRepartitionBy` optimizer

2016-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13765
  
**[Test build #61255 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61255/consoleFull)**
 for PR 13765 at commit 
[`c3016f3`](https://github.com/apache/spark/commit/c3016f3f301e47148de780be1d6dfe8ae91b9f1c).





[GitHub] spark issue #13730: [SPARK-16006][SQL] Attemping to write empty DataFrame wi...

2016-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13730
  
**[Test build #61254 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61254/consoleFull)**
 for PR 13730 at commit 
[`ba9a529`](https://github.com/apache/spark/commit/ba9a5294cfa90003318526ee7b67533a73da56c4).





[GitHub] spark pull request #13624: [SPARK-15858][ML]: Fix calculating error by tree ...

2016-06-25 Thread mhmoudr
Github user mhmoudr commented on a diff in the pull request:

https://github.com/apache/spark/pull/13624#discussion_r68497920
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala ---
@@ -205,31 +205,29 @@ private[spark] object GradientBoostedTrees extends Logging {
       case _ => data
     }
 
-    val numIterations = trees.length
-    val evaluationArray = Array.fill(numIterations)(0.0)
-    val localTreeWeights = treeWeights
-
-    var predictionAndError = computeInitialPredictionAndError(
-      remappedData, localTreeWeights(0), trees(0), loss)
-
-    evaluationArray(0) = predictionAndError.values.mean()
-
     val broadcastTrees = sc.broadcast(trees)
-    (1 until numIterations).foreach { nTree =>
-      predictionAndError = remappedData.zip(predictionAndError).mapPartitions { iter =>
-        val currentTree = broadcastTrees.value(nTree)
-        val currentTreeWeight = localTreeWeights(nTree)
-        iter.map { case (point, (pred, error)) =>
-          val newPred = updatePrediction(point.features, pred, currentTree, currentTreeWeight)
-          val newError = loss.computeError(newPred, point.label)
-          (newPred, newError)
-        }
+    val localTreeWeights = treeWeights
+    val treesIndices = trees.indices
+
+    val dataCount = remappedData.count()
+    val evaluation = remappedData.map { point =>
+      treesIndices.map { idx => {
+        val prediction = broadcastTrees.value(idx)
+          .rootNode
+          .predictImpl(point.features)
+          .prediction
+        prediction * localTreeWeights(idx)
       }
-      evaluationArray(nTree) = predictionAndError.values.mean()
+      }
+      .scanLeft(0.0)(_ + _).drop(1)
--- End diff --

I was just relying on IntelliJ to adjust all the indentation; the only issue is 
that if I join it with the previous line, the second line will look even worse.
Are there any Scala style rules that could be applied automatically at build time 
so we can avoid getting into this? 
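For reference, the ```scanLeft(0.0)(_ + _).drop(1)``` step in the diff above turns the per-tree weighted predictions into running sums, so element i is the ensemble prediction after i+1 boosting iterations. A minimal sketch of that prefix-sum step (illustrative names only, not the Spark implementation; the real code runs per RDD row over broadcast trees):

```java
// Illustrative sketch of the prefix-sum performed by
// scanLeft(0.0)(_ + _).drop(1) in the Scala diff above.
public class PrefixSumSketch {
    // out[i] = weightedPredictions[0] + ... + weightedPredictions[i]
    static double[] runningSums(double[] weightedPredictions) {
        double[] out = new double[weightedPredictions.length];
        double acc = 0.0;
        for (int i = 0; i < weightedPredictions.length; i++) {
            acc += weightedPredictions[i];
            out[i] = acc;  // ensemble prediction after i+1 trees
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(runningSums(new double[] {0.5, 0.25, 0.25})));
    }
}
```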





[GitHub] spark issue #13886: [SPARK-16185] [SQL] Better Error Messages When Creating ...

2016-06-25 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/13886
  
@cloud-fan I submitted a PR for converting CTAS in Parquet to data source 
tables without Hive support. Could you also review that PR? Thanks! 





[GitHub] spark issue #13386: [SPARK-15646] [SQL] When spark.sql.hive.convertCTAS is t...

2016-06-25 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/13386
  
Just realized this PR introduced the original changes. Could you also 
review my PR: https://github.com/apache/spark/pull/13907?

When users create a table as a query with `STORED AS` or `ROW FORMAT` and 
`spark.sql.hive.convertCTAS` is set to true, we do not convert it to a data 
source table. I am wondering whether we can still convert tables specified 
in Parquet and ORC formats to data source tables? Thanks!





[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...

2016-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13758
  
Merged build finished. Test FAILed.





[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...

2016-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13758
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61247/
Test FAILed.





[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...

2016-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13758
  
**[Test build #61247 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61247/consoleFull)**
 for PR 13758 at commit 
[`65317e1`](https://github.com/apache/spark/commit/65317e101b4cc88231541d0398d4b404db3aaabe).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13908: [SPARK-16212][STREAMING][KAFKA] code cleanup from review...

2016-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13908
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61251/
Test PASSed.





[GitHub] spark issue #13908: [SPARK-16212][STREAMING][KAFKA] code cleanup from review...

2016-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13908
  
Merged build finished. Test PASSed.





[GitHub] spark issue #13908: [SPARK-16212][STREAMING][KAFKA] code cleanup from review...

2016-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13908
  
**[Test build #61251 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61251/consoleFull)**
 for PR 13908 at commit 
[`d8a1ba0`](https://github.com/apache/spark/commit/d8a1ba08591a8430487f5e7a786dcad2a6a3eb60).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...

2016-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13909
  
**[Test build #61250 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61250/consoleFull)**
 for PR 13909 at commit 
[`37e4ce2`](https://github.com/apache/spark/commit/37e4ce2d09b8233fdefc615296155a3ec5cb6eb6).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStreams to ...

2016-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11863
  
Merged build finished. Test FAILed.





[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...

2016-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13909
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61250/
Test FAILed.





[GitHub] spark issue #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStreams to ...

2016-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11863
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61252/
Test FAILed.





[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...

2016-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13909
  
Merged build finished. Test FAILed.





[GitHub] spark issue #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStreams to ...

2016-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/11863
  
**[Test build #61252 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61252/consoleFull)**
 for PR 11863 at commit 
[`0f15bd1`](https://github.com/apache/spark/commit/0f15bd138b7cdb5468d5515a1b0c02b09c60136a).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13907: [SPARK-16209] [SQL] Convert Hive Tables in PARQUET/ORC t...

2016-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13907
  
**[Test build #61253 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61253/consoleFull)**
 for PR 13907 at commit 
[`a9ce0d8`](https://github.com/apache/spark/commit/a9ce0d8342a2c3768823b4dd120fda0997b1c313).





[GitHub] spark issue #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStreams to ...

2016-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/11863
  
**[Test build #61252 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61252/consoleFull)**
 for PR 11863 at commit 
[`0f15bd1`](https://github.com/apache/spark/commit/0f15bd138b7cdb5468d5515a1b0c02b09c60136a).





[GitHub] spark issue #13908: [SPARK-16212][STREAMING][KAFKA] code cleanup from review...

2016-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13908
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61249/
Test PASSed.





[GitHub] spark issue #13908: [SPARK-16212][STREAMING][KAFKA] code cleanup from review...

2016-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13908
  
Merged build finished. Test PASSed.





[GitHub] spark issue #13908: [SPARK-16212][STREAMING][KAFKA] code cleanup from review...

2016-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13908
  
**[Test build #61249 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61249/consoleFull)**
 for PR 13908 at commit 
[`647c2af`](https://github.com/apache/spark/commit/647c2af2854da16f6e0c7c64a983e5b94923f8e1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13908: [SPARK-16212][STREAMING][KAFKA] code cleanup from review...

2016-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13908
  
Merged build finished. Test PASSed.





[GitHub] spark issue #13908: [SPARK-16212][STREAMING][KAFKA] code cleanup from review...

2016-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13908
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61248/
Test PASSed.





[GitHub] spark issue #13908: [SPARK-16212][STREAMING][KAFKA] code cleanup from review...

2016-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13908
  
**[Test build #61248 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61248/consoleFull)**
 for PR 13908 at commit 
[`576a0e4`](https://github.com/apache/spark/commit/576a0e41b96c90129dce1415ad8e609e3ddb4d19).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13910: make rdd count be n

2016-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13910
  
Can one of the admins verify this patch?





[GitHub] spark issue #13910: make rdd count be n

2016-06-25 Thread yanghaogn
Github user yanghaogn commented on the issue:

https://github.com/apache/spark/pull/13910
  
Since the denominator is n, the number of iterations should also be n.





[GitHub] spark pull request #13910: make rdd count be n

2016-06-25 Thread yanghaogn
GitHub user yanghaogn opened a pull request:

https://github.com/apache/spark/pull/13910

make rdd count be n

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)


## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)


(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yanghaogn/spark patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13910.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13910


commit f91e55eb6a9c81189292989a40d9a4e76dc9309c
Author: 杨浩 
Date:   2016-06-26T04:12:56Z

make rdd count be n







[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...

2016-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13909
  
**[Test build #61250 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61250/consoleFull)**
 for PR 13909 at commit 
[`37e4ce2`](https://github.com/apache/spark/commit/37e4ce2d09b8233fdefc615296155a3ec5cb6eb6).





[GitHub] spark issue #13908: [SPARK-16212][STREAMING][KAFKA] code cleanup from review...

2016-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13908
  
**[Test build #61251 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61251/consoleFull)**
 for PR 13908 at commit 
[`d8a1ba0`](https://github.com/apache/spark/commit/d8a1ba08591a8430487f5e7a786dcad2a6a3eb60).



[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...

2016-06-25 Thread kiszk
GitHub user kiszk opened a pull request:

https://github.com/apache/spark/pull/13909

[SPARK-16213][SQL] Reduce runtime overhead of a program that creates a 
primitive array in DataFrame

## What changes were proposed in this pull request?

This PR reduces the runtime overhead of a program that creates a primitive 
array in a DataFrame. The generated code performs a boxing operation in the 
assignment from an InternalRow to an ```Object[]``` temporary array (at Lines 040 
and 048 in the generated code before applying this PR). If we know that the type 
of the array elements is primitive, we apply the following optimizations:

1. Eliminate a pair of ```isNullAt()``` and a null assignment
2. Allocate a primitive array instead of ```Object[]``` (eliminate boxing 
operations)
3. Call ```GenericArrayData.allocate(project_values)``` to avoid 
[boxing](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala#L31)
 in constructor of ```GenericArrayData``` if 
https://github.com/apache/spark/pull/13758 is merged

An example program
```
val df = sparkContext.parallelize(Seq(0.0d, 1.0d), 1).toDF
df.selectExpr("Array(value + 1.1d, value + 2.2d)").show
```
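
The per-element boxing that optimization 2 above eliminates can be illustrated
outside Spark with a plain-Java sketch (the class and method names are
hypothetical, not Spark code):

```java
public class BoxingSketch {
    // Boxed path: each double element is wrapped in a java.lang.Double,
    // so every assignment allocates an object on the heap.
    static Object[] boxedArray() {
        Object[] boxed = new Object[2];
        boxed[0] = 1.1d;   // autoboxing allocates a Double
        boxed[1] = 2.2d;
        return boxed;
    }

    // Primitive path: values are stored inline in a contiguous data
    // region, with no per-element allocation. This layout is what allows
    // generated code to use a bulk copy instead of an element-wise loop.
    static double[] primitiveArray() {
        return new double[] {1.1d, 2.2d};
    }

    public static void main(String[] args) {
        System.out.println(boxedArray()[0] instanceof Double);  // true
        System.out.println(primitiveArray().length);            // 2
    }
}
```

The contiguous primitive layout is also the precondition for the
```Platform.copy``` bulk-copy mentioned in the companion PR.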


Generated code before applying this PR
```java
/* 018 */   public void init(int index, scala.collection.Iterator inputs[]) 
{
/* 019 */ partitionIndex = index;
/* 020 */ inputadapter_input = inputs[0];
/* 021 */ this.project_values = null;
/* 022 */ project_result = new UnsafeRow(1);
/* 023 */ this.project_holder = new 
org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(project_result, 
32);
/* 024 */ this.project_rowWriter = new 
org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(project_holder,
 1);
/* 025 */ this.project_arrayWriter = new 
org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter();
/* 026 */   }
/* 027 */
/* 028 */   protected void processNext() throws java.io.IOException {
/* 029 */ while (inputadapter_input.hasNext()) {
/* 030 */   InternalRow inputadapter_row = (InternalRow) 
inputadapter_input.next();
/* 031 */   double inputadapter_value = inputadapter_row.getDouble(0);
/* 032 */
/* 033 */   final boolean project_isNull = false;
/* 034 */   this.project_values = new Object[2];
/* 035 */   double project_value7 = -1.0;
/* 036 */   project_value7 = inputadapter_value + 1.1D;
/* 037 */   if (false) {
/* 038 */ project_values[0] = null;
/* 039 */   } else {
/* 040 */ project_values[0] = project_value7;
/* 041 */   }
/* 042 */
/* 043 */   double project_value10 = -1.0;
/* 044 */   project_value10 = inputadapter_value + 2.2D;
/* 045 */   if (false) {
/* 046 */ project_values[1] = null;
/* 047 */   } else {
/* 048 */ project_values[1] = project_value10;
/* 049 */   }
/* 050 */
/* 051 */   /* final ArrayData project_value = 
org.apache.spark.sql.catalyst.util.GenericArrayData.allocate(project_values); */
/* 052 */   final ArrayData project_value = new 
org.apache.spark.sql.catalyst.util.GenericArrayData(project_values);
/* 053 */   this.project_values = null;
/* 054 */   project_holder.reset();
/* 055 */
/* 056 */   project_rowWriter.zeroOutNullBytes();
/* 057 */
/* 058 */   if (project_isNull) {
/* 059 */ project_rowWriter.setNullAt(0);
/* 060 */   } else {
/* 061 */ // Remember the current cursor so that we can calculate 
how many bytes are
/* 062 */ // written later.
/* 063 */ final int project_tmpCursor = project_holder.cursor;
/* 064 */
/* 065 */ if (project_value instanceof UnsafeArrayData) {
/* 066 */   final int project_sizeInBytes = ((UnsafeArrayData) 
project_value).getSizeInBytes();
/* 067 */   // grow the global buffer before writing data.
/* 068 */   project_holder.grow(project_sizeInBytes);
/* 069 */   ((UnsafeArrayData) 
project_value).writeToMemory(project_holder.buffer, project_holder.cursor);
/* 070 */   project_holder.cursor += project_sizeInBytes;
/* 071 */
/* 072 */ } else {
/* 073 */   final int project_numElements = 
project_value.numElements();
/* 074 */   project_arrayWriter.initialize(project_holder, 
project_numElements, 8);
/* 075 */
/* 076 */   for (int project_index = 0; project_index < 
project_numElements; project_index++) {
/* 077 */ if (project_value.isNullAt(project_index)) {
/* 078 */   project_arrayWriter.setNullAt(project_index);
/* 079 */ } else {
/* 080 */   final double project_element = 
```

[GitHub] spark issue #13908: [SPARK-16212][STREAMING][KAFKA] code cleanup from review...

2016-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13908
  
**[Test build #61249 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61249/consoleFull)**
 for PR 13908 at commit 
[`647c2af`](https://github.com/apache/spark/commit/647c2af2854da16f6e0c7c64a983e5b94923f8e1).



[GitHub] spark issue #13908: [SPARK-16212][STREAMING][KAFKA] code cleanup from review...

2016-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13908
  
**[Test build #61248 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61248/consoleFull)**
 for PR 13908 at commit 
[`576a0e4`](https://github.com/apache/spark/commit/576a0e41b96c90129dce1415ad8e609e3ddb4d19).



[GitHub] spark pull request #13908: [SPARK-16212][STREAMING][KAFKA] code cleanup from...

2016-06-25 Thread koeninger
GitHub user koeninger opened a pull request:

https://github.com/apache/spark/pull/13908

[SPARK-16212][STREAMING][KAFKA] code cleanup from review feedback

## What changes were proposed in this pull request?
code cleanup in kafka-0-8 to match suggested changes for kafka-0-10 branch


## How was this patch tested?
unit tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/koeninger/spark-1 kafka-0-8-cleanup

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13908.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13908


commit 576a0e41b96c90129dce1415ad8e609e3ddb4d19
Author: cody koeninger 
Date:   2016-06-26T03:53:32Z

[SPARK-16212][STREAMING][KAFKA] code cleanup from review feedback on 0-10 
branch





[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...

2016-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13758
  
**[Test build #61247 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61247/consoleFull)**
 for PR 13758 at commit 
[`65317e1`](https://github.com/apache/spark/commit/65317e101b4cc88231541d0398d4b404db3aaabe).



[GitHub] spark issue #13885: [SPARK-16184][SPARKR] conf API for SparkSession

2016-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13885
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61245/
Test PASSed.



[GitHub] spark issue #13885: [SPARK-16184][SPARKR] conf API for SparkSession

2016-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13885
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13885: [SPARK-16184][SPARKR] conf API for SparkSession

2016-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13885
  
**[Test build #61245 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61245/consoleFull)**
 for PR 13885 at commit 
[`385645b`](https://github.com/apache/spark/commit/385645b9db9eb9468f07ac39144ef0a88af4830f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13904: [SPARKR] add csv tests

2016-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13904
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61246/
Test PASSed.



[GitHub] spark issue #13904: [SPARKR] add csv tests

2016-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13904
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13904: [SPARKR] add csv tests

2016-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13904
  
**[Test build #61246 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61246/consoleFull)**
 for PR 13904 at commit 
[`296c2d4`](https://github.com/apache/spark/commit/296c2d482b168b5e7e501452e57f2dbd3960).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...

2016-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13758
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61244/
Test FAILed.



[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...

2016-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13758
  
Merged build finished. Test FAILed.



[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...

2016-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13758
  
**[Test build #61244 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61244/consoleFull)**
 for PR 13758 at commit 
[`8ac0840`](https://github.com/apache/spark/commit/8ac08408e61f9ff605f1e4467384849cd5025e1c).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #9936: [SPARK-11938][ML] Expose numFeatures in all ML Prediction...

2016-06-25 Thread Lewuathe
Github user Lewuathe commented on the issue:

https://github.com/apache/spark/pull/9936
  
@vectorijk It seems difficult for me to handle this JIRA. Could you help me 
with it?
Thanks so much for taking care of it!



[GitHub] spark issue #13904: [SPARKR] add csv tests

2016-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13904
  
**[Test build #61246 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61246/consoleFull)**
 for PR 13904 at commit 
[`296c2d4`](https://github.com/apache/spark/commit/296c2d482b168b5e7e501452e57f2dbd3960).



[GitHub] spark issue #13885: [SPARK-16184][SPARKR] conf API for SparkSession

2016-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13885
  
**[Test build #61245 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61245/consoleFull)**
 for PR 13885 at commit 
[`385645b`](https://github.com/apache/spark/commit/385645b9db9eb9468f07ac39144ef0a88af4830f).



[GitHub] spark pull request #13885: [SPARK-16184][SPARKR] conf API for SparkSession

2016-06-25 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/13885#discussion_r68496462
  
--- Diff: R/pkg/R/SQLContext.R ---
@@ -110,11 +110,46 @@ infer_type <- function(x) {
   }
 }
 
-getDefaultSqlSource <- function() {
+#' Get Runtime Config from the current active SparkSession
+#'
+#' Get Runtime Config from the current active SparkSession.
+#' To change SparkSession Runtime Config, please see `sparkR.session()`.
+#'
+#' @param key (optional) The key of the config to get, if omitted, all 
config is returned
+#' @param defaultValue (optional) The default value of the config to 
return if the config is not
+#' set, if omitted, the call fails if the config key is not set
+#' @return a list of config values with keys as their names
+#' @rdname sparkR.conf
+#' @name sparkR.conf
+#' @export
+#' @examples
+#'\dontrun{
+#' sparkR.session()
+#' allConfigs <- sparkR.conf()
+#' masterValue <- unlist(sparkR.conf("spark.master"))
+#' namedConfig <- sparkR.conf("spark.executor.memory", "0g")
+#' }
+#' @note sparkR.conf since 2.0.0
+sparkR.conf <- function(key, defaultValue) {
   sparkSession <- getSparkSession()
-  conf <- callJMethod(sparkSession, "conf")
-  source <- callJMethod(conf, "get", "spark.sql.sources.default", 
"org.apache.spark.sql.parquet")
-  source
+  if (missing(key)) {
+m <- callJStatic("org.apache.spark.sql.api.r.SQLUtils", 
"getSessionConf", sparkSession)
+as.list(m, all.names = TRUE, sorted = TRUE)
+  } else {
+conf <- callJMethod(sparkSession, "conf")
+value <- if (missing(defaultValue)) {
+  callJMethod(conf, "get", key) # throws if key not found
--- End diff --

Done



[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...

2016-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13758
  
**[Test build #61244 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61244/consoleFull)**
 for PR 13758 at commit 
[`8ac0840`](https://github.com/apache/spark/commit/8ac08408e61f9ff605f1e4467384849cd5025e1c).



[GitHub] spark pull request #13888: [SPARK-16187] [ML] Implement util method for ML M...

2016-06-25 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request:

https://github.com/apache/spark/pull/13888#discussion_r68496003
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala ---
@@ -309,8 +309,8 @@ object MLUtils extends Logging {
   }
 
   /**
-   * Converts vector columns in an input Dataset to the 
[[org.apache.spark.ml.linalg.Vector]] type
-   * from the new [[org.apache.spark.mllib.linalg.Vector]] type under the 
`spark.ml` package.
+   * Converts vector columns in an input Dataset to the 
[[org.apache.spark.mllib.linalg.Vector]]
+   * type from the new [[org.apache.spark.ml.linalg.Vector]] type under 
the `spark.ml` package.
* @param dataset input dataset
--- End diff --

convertVectorColumnsFromML, it should be to mllib from ml, right?



[GitHub] spark issue #13907: [SPARK-16209] [SQL] Convert Hive Tables to Data Source T...

2016-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13907
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61243/
Test FAILed.



[GitHub] spark issue #13907: [SPARK-16209] [SQL] Convert Hive Tables to Data Source T...

2016-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13907
  
**[Test build #61243 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61243/consoleFull)**
 for PR 13907 at commit 
[`c4bde02`](https://github.com/apache/spark/commit/c4bde0217a5e6a31da15cc29dc552a198ed6ef21).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13907: [SPARK-16209] [SQL] Convert Hive Tables to Data Source T...

2016-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13907
  
Merged build finished. Test FAILed.



[GitHub] spark issue #13876: [SPARK-16174][SQL] Improve OptimizeIn optimizer to remov...

2016-06-25 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/13876
  
Hi, @rxin .
For this `OptimizeIn` PR, please let me know if we need further 
optimization.
Thank you always.



[GitHub] spark pull request #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimi...

2016-06-25 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13906#discussion_r68495438
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -1053,6 +1055,34 @@ object PruneFilters extends Rule[LogicalPlan] with 
PredicateHelper {
 }
 
 /**
+ * Collapse plans consisting of all empty local relations generated by 
[[PruneFilters]].
+ * Note that the ObjectProducer/Consumer and direct aggregations are the 
exceptions.
+ * {{{
+ *   SELECT a, b FROM t WHERE 1=0 GROUP BY a, b ORDER BY a, b ==> empty 
result
+ *   SELECT SUM(a) FROM t WHERE 1=0 GROUP BY a HAVING COUNT(*)>1 ORDER BY 
a (Not optimized)
+ * }}}
+ */
+object CollapseEmptyPlan extends Rule[LogicalPlan] with PredicateHelper {
+  private def isEmptyLocalRelation(plan: LogicalPlan): Boolean =
+plan.isInstanceOf[LocalRelation] && 
plan.asInstanceOf[LocalRelation].data.isEmpty
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+case x if x.isInstanceOf[ObjectProducer] || 
x.isInstanceOf[ObjectConsumer] => x
+
+// Case 1: If groupingExpressions contains all aggregation 
expressions, the result is empty.
+case a @ Aggregate(ge, ae, child) if isEmptyLocalRelation(child) && 
ae.forall(ge.contains(_)) =>
--- End diff --

Er, at first, I thought you meant line 1080 for `case p: LogicalPlan`,


https://github.com/apache/spark/pull/13906/files#diff-a636a87d8843eeccca90140be91d4fafR1080
 .

Did I understand your advice correctly?
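
The distinction the quoted doc comment draws (a grouped aggregate over an empty
relation yields no rows, while a global aggregate such as `COUNT(*)` still
yields one) can be sketched with plain Java streams; the class name is
hypothetical and this is not Catalyst code:

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class EmptyPlanSketch {
    // GROUP BY over an empty input produces no groups, hence an empty
    // result, so the whole plan can be collapsed to an empty relation.
    static Map<Integer, Long> groupedCount(List<Integer> relation) {
        return relation.stream()
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    // A global aggregate (COUNT) over the same empty input still produces
    // a single value, which is why such plans must be excluded from the rule.
    static long globalCount(List<Integer> relation) {
        return relation.stream().count();
    }

    public static void main(String[] args) {
        List<Integer> emptyRelation = Collections.emptyList();
        System.out.println(groupedCount(emptyRelation).isEmpty());  // true
        System.out.println(globalCount(emptyRelation));             // 0
    }
}
```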



[GitHub] spark pull request #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimi...

2016-06-25 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13906#discussion_r68495422
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -1053,6 +1055,34 @@ object PruneFilters extends Rule[LogicalPlan] with 
PredicateHelper {
 }
 
 /**
+ * Collapse plans consisting of all empty local relations generated by 
[[PruneFilters]].
+ * Note that the ObjectProducer/Consumer and direct aggregations are the 
exceptions.
+ * {{{
+ *   SELECT a, b FROM t WHERE 1=0 GROUP BY a, b ORDER BY a, b ==> empty 
result
+ *   SELECT SUM(a) FROM t WHERE 1=0 GROUP BY a HAVING COUNT(*)>1 ORDER BY 
a (Not optimized)
+ * }}}
+ */
+object CollapseEmptyPlan extends Rule[LogicalPlan] with PredicateHelper {
+  private def isEmptyLocalRelation(plan: LogicalPlan): Boolean =
+plan.isInstanceOf[LocalRelation] && 
plan.asInstanceOf[LocalRelation].data.isEmpty
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+case x if x.isInstanceOf[ObjectProducer] || 
x.isInstanceOf[ObjectConsumer] => x
+
+// Case 1: If groupingExpressions contains all aggregation 
expressions, the result is empty.
+case a @ Aggregate(ge, ae, child) if isEmptyLocalRelation(child) && 
ae.forall(ge.contains(_)) =>
--- End diff --

Thank you for review, @rxin .
I see. I will update this PR into whitelist approach.



[GitHub] spark issue #13907: [SPARK-16209] [SQL] Convert Hive Tables to Data Source T...

2016-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13907
  
**[Test build #61243 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61243/consoleFull)** for PR 13907 at commit [`c4bde02`](https://github.com/apache/spark/commit/c4bde0217a5e6a31da15cc29dc552a198ed6ef21).





[GitHub] spark pull request #13907: [SPARK-16209] [SQL] Convert Hive Tables to Data S...

2016-06-25 Thread gatorsmile
GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/13907

[SPARK-16209] [SQL] Convert Hive Tables to Data Source Tables for CREATE 
TABLE AS SELECT

## What changes were proposed in this pull request?
Currently, the table created by the following statement is a Hive table.
```SQL
CREATE TABLE t STORED AS parquet SELECT 1 as a, 1 as b
```
When users create a table as a query with `STORED AS` or `ROW FORMAT`, we will not convert it to a data source table even when `spark.sql.hive.convertCTAS` is set to `true`. Actually, for the parquet and orc formats, we can still convert them to data source tables even if the users use `STORED AS` or `ROW FORMAT`.

## How was this patch tested?
Added test cases for both ORC and PARQUET

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark storedAsParquet

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13907.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13907


commit 06e115ca886809a7b1fcd16e96bd1e9f493add79
Author: gatorsmile 
Date:   2016-06-25T23:05:19Z

fix

commit 2cf107d69a7a16500f17d03d034be43b3ac8cab3
Author: gatorsmile 
Date:   2016-06-25T23:06:42Z

Merge remote-tracking branch 'upstream/master' into storedAsParquet

commit c4bde0217a5e6a31da15cc29dc552a198ed6ef21
Author: gatorsmile 
Date:   2016-06-25T23:08:00Z

clean







[GitHub] spark pull request #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimi...

2016-06-25 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13906#discussion_r68495048
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1053,6 +1055,34 @@ object PruneFilters extends Rule[LogicalPlan] with PredicateHelper {
 }
 
 /**
+ * Collapse plans consisting of all empty local relations generated by [[PruneFilters]].
+ * Note that the ObjectProducer/Consumer and direct aggregations are the exceptions.
+ * {{{
+ *   SELECT a, b FROM t WHERE 1=0 GROUP BY a, b ORDER BY a, b ==> empty result
+ *   SELECT SUM(a) FROM t WHERE 1=0 GROUP BY a HAVING COUNT(*)>1 ORDER BY a (Not optimized)
+ * }}}
+ */
+object CollapseEmptyPlan extends Rule[LogicalPlan] with PredicateHelper {
+  private def isEmptyLocalRelation(plan: LogicalPlan): Boolean =
+    plan.isInstanceOf[LocalRelation] && plan.asInstanceOf[LocalRelation].data.isEmpty
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+    case x if x.isInstanceOf[ObjectProducer] || x.isInstanceOf[ObjectConsumer] => x
+
+    // Case 1: If groupingExpressions contains all aggregation expressions, the result is empty.
+    case a @ Aggregate(ge, ae, child) if isEmptyLocalRelation(child) && ae.forall(ge.contains(_)) =>
--- End diff --

this kind of blacklisting approach is too risky -- if we were to introduce 
a new logical node in the future, most likely we will forget to update this 
rule.
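The risk described above can be sketched with a toy tree rewrite. This is plain Python with invented node names, not Spark's actual Catalyst API: the whitelist variant only collapses node types it explicitly knows are safe, so a node type added later falls through unchanged instead of being silently mis-optimized.

```python
# Toy logical-plan nodes; names are hypothetical stand-ins for Catalyst nodes.
class Plan:
    def __init__(self, *children):
        self.children = list(children)

class LocalRelation(Plan):
    def __init__(self, rows=()):
        super().__init__()
        self.rows = list(rows)

class Project(Plan): pass
class Filter(Plan): pass
class SomeFutureNode(Plan): pass  # a node type introduced after the rule was written

# Whitelist: only node types known to yield no rows on empty input are collapsed.
COLLAPSIBLE = (Project, Filter)

def is_empty_local_relation(plan):
    return isinstance(plan, LocalRelation) and not plan.rows

def collapse_empty_plan(plan):
    # bottom-up transform, in the spirit of Catalyst's transformUp
    plan.children = [collapse_empty_plan(c) for c in plan.children]
    if isinstance(plan, COLLAPSIBLE) and plan.children and all(
            is_empty_local_relation(c) for c in plan.children):
        return LocalRelation()
    return plan  # unknown node types fall through untouched by default

empty = LocalRelation()
collapsed = collapse_empty_plan(Project(Filter(empty)))
untouched = collapse_empty_plan(SomeFutureNode(empty))
print(type(collapsed).__name__)   # LocalRelation
print(type(untouched).__name__)   # SomeFutureNode
```

With a blacklist, `SomeFutureNode` would have been collapsed unless someone remembered to add it to the exclusion list; with the whitelist, forgetting it merely costs an optimization opportunity.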






[GitHub] spark issue #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimizer

2016-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13906
  
Merged build finished. Test PASSed.





[GitHub] spark issue #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimizer

2016-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13906
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61242/
Test PASSed.





[GitHub] spark issue #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimizer

2016-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13906
  
**[Test build #61242 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61242/consoleFull)** for PR 13906 at commit [`7ddf449`](https://github.com/apache/spark/commit/7ddf449d39f22090bc8aa157fae12c79ba00928e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimizer

2016-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13906
  
**[Test build #61242 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61242/consoleFull)** for PR 13906 at commit [`7ddf449`](https://github.com/apache/spark/commit/7ddf449d39f22090bc8aa157fae12c79ba00928e).





[GitHub] spark pull request #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimi...

2016-06-25 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/13906

[SPARK-16208][SQL] Add `CollapseEmptyPlan` optimizer

## What changes were proposed in this pull request?

This PR adds a new logical optimizer, `CollapseEmptyPlan`, to collapse logical plans consisting of only empty LocalRelations. The only exception is the aggregation plan; for aggregation plans, only simple cases are considered for this optimization.

**Before**
```scala
scala> sql("select a from values (1,2) T(a,b) where 1=0 group by a,b having a>1 order by a,b").explain
== Physical Plan ==
*Project [a#11]
+- *Sort [a#11 ASC, b#12 ASC], true, 0
   +- Exchange rangepartitioning(a#11 ASC, b#12 ASC, 200)
      +- *HashAggregate(keys=[a#11, b#12], functions=[])
         +- Exchange hashpartitioning(a#11, b#12, 200)
            +- *HashAggregate(keys=[a#11, b#12], functions=[])
               +- LocalTableScan <empty>, [a#11, b#12]
```

**After**
```scala
scala> sql("select a from values (1,2) T(a,b) where 1=0 group by a,b having a>1 order by a,b").explain
== Physical Plan ==
LocalTableScan <empty>, [a#0]
```

## How was this patch tested?

Passes the Jenkins tests (including a new test suite).
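The semantics this optimizer relies on can be sanity-checked outside Spark. The sketch below uses Python's stdlib sqlite3 (not Spark SQL) to show both sides: a grouped query over an always-false filter returns no rows, so the whole plan can be replaced by an empty scan, while a global aggregate over the same filter still returns one row, which is why aggregations need special care.

```python
# Sanity check of the empty-plan semantics using sqlite3 as a stand-in engine.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
conn.execute("INSERT INTO t VALUES (1, 2)")

# Grouped + sorted query over WHERE 1=0: safe to collapse to an empty scan.
rows = conn.execute(
    "SELECT a FROM t WHERE 1=0 GROUP BY a, b ORDER BY a, b").fetchall()
print(rows)   # []

# Global aggregate over WHERE 1=0: still produces one row (NULL), so a naive
# collapse to an empty relation would change the result.
rows2 = conn.execute("SELECT SUM(a) FROM t WHERE 1=0").fetchall()
print(rows2)  # [(None,)]
```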

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-16208

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13906.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13906


commit 7ddf449d39f22090bc8aa157fae12c79ba00928e
Author: Dongjoon Hyun 
Date:   2016-06-25T09:18:44Z

[SPARK-16208][SQL] Add `CollapseEmptyPlan` optimizer







[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...

2016-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13758
  
Merged build finished. Test FAILed.





[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...

2016-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13758
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61241/
Test FAILed.





[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...

2016-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13758
  
**[Test build #61241 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61241/consoleFull)** for PR 13758 at commit [`9379865`](https://github.com/apache/spark/commit/9379865c8c960dc8952669944eeb452f3173804e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class PrimitiveArrayBenchmark extends BenchmarkBase `





[GitHub] spark issue #13860: [SPARK-16157] [SQL] Add New Methods for comments in Stru...

2016-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13860
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61240/
Test PASSed.





[GitHub] spark issue #13860: [SPARK-16157] [SQL] Add New Methods for comments in Stru...

2016-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13860
  
Merged build finished. Test PASSed.





[GitHub] spark issue #13860: [SPARK-16157] [SQL] Add New Methods for comments in Stru...

2016-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13860
  
**[Test build #61240 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61240/consoleFull)** for PR 13860 at commit [`b8e0511`](https://github.com/apache/spark/commit/b8e051113b353f4fb8502e763dd8e54d70c3a5af).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13680: [SPARK-15962][SQL] Introduce implementation with a dense...

2016-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13680
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61239/
Test PASSed.





[GitHub] spark issue #13680: [SPARK-15962][SQL] Introduce implementation with a dense...

2016-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13680
  
Merged build finished. Test PASSed.





[GitHub] spark issue #13680: [SPARK-15962][SQL] Introduce implementation with a dense...

2016-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13680
  
**[Test build #61239 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61239/consoleFull)** for PR 13680 at commit [`500e978`](https://github.com/apache/spark/commit/500e978b7ef10311da1141c97d834a58821d9c11).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStreams to ...

2016-06-25 Thread koeninger
Github user koeninger commented on the issue:

https://github.com/apache/spark/pull/11863
  
Sorry for the delayed reply, I had travel plans that had to be canceled due
to a family emergency (everyone's mostly ok).

1 + 2, I understand that preferred locations are not a guarantee.  Caching
should be a performance issue, not a correctness issue.  It's limited in
size so executors on the "wrong" hosts should eventually get phased out if
space is an issue.  I can add more comments as to the intention.

3 + 4, I agree that there should be an easy way for people to just specify
a list of topics without knowing about how Consumer works.  I agree that if
the convenience constructors are in KafkaUtils, the methods in the
companion objects for DirectKafkaInputDStream / KafkaRDD aren't necessary.

We can work out the specifics of the easy vs advanced api, but as long as
there's a way to get access to all of the Consumer behavior for advanced
users, I'm on board.  A few things I notice about the specifics of your
suggestion:

return type of DStream[(K, V)]:  This can't just be a tuple of (key, value),
at least for advanced users, because there's additional per-message
metadata, timestamp being a big one.  That's currently
ConsumerRecord[K,V].  If you need it to be a wrapped class that's fine as
long as it has the same fields, but that's another object instantiation for
each message.

TopicPartitionOffsets:  I agree with you that reducing overloads would be
good.  I agree with you that the old way we were using auto.offset.reset
was a little weird, because the simple consumer doesn't actually read that
parameter.  In this case however, the new consumer does read that parameter
and does use it to determine initial starting point (in the absence of a
specific seek to a specific offset).  Having more than one way to specify
the same thing is probably going to be more confusing, not less confusing.
I can probably come up with something that's a similar simple API, but it
may not look exactly like that.

So to summarize

- I'll start tweaking things
- Let me know if you think a wrapped class for ConsumerRecord is worth the
per-message overhead
- Let me know if you're 100% attached to TopicPartitionOffsets


On Fri, Jun 24, 2016 at 12:47 PM, Tathagata Das wrote:

> @koeninger  Ping!
>






[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...

2016-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13758
  
**[Test build #61241 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61241/consoleFull)** for PR 13758 at commit [`9379865`](https://github.com/apache/spark/commit/9379865c8c960dc8952669944eeb452f3173804e).





[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...

2016-06-25 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/13758
  
Jenkins, retest this please





[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...

2016-06-25 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/13758
  
@hvanhovell and @cloud-fan This ```GenericArrayData``` can be used in generated code for a program with a primitive array written in DataFrame or Dataset. I added a new DataFrame benchmark program and results. The performance improvements are up to 4.4x.

Here is a use case of this ```GenericArrayData``` in the generated code of this benchmark program.
```java
// source program
sc.parallelize(Seq(Array.fill[Double](1)(1)), 1).toDF.selectExpr("value[0]").count

// part of generated code
double[] value1 = ...
final ArrayData value = isNull ? null : GenericArrayData.allocate(value1);
```

Another use case is PR https://github.com/apache/spark/pull/13704, which can generate the following code. At Line 046, this PR can eliminate the data conversion between a primitive array and ```Array[Any]```. At Line 051, this PR can also eliminate a data conversion.
```java
/* 044 */   if (!inputadapter_isNull) {
/* 045 */ final double[] deserializetoobject_values = inputadapter_value.toDoubleArray();
/* 046 */ deserializetoobject_value1 = new GenericArrayData(deserializetoobject_values);
/* 047 */
/* 048 */   }
/* 049 */
/* 050 */   boolean deserializetoobject_isNull = deserializetoobject_isNull1;
/* 051 */   final double[] deserializetoobject_value = deserializetoobject_isNull ? null : (double[]) deserializetoobject_value1.toDoubleArray();
```

If https://github.com/apache/spark/pull/13680 is also merged, we can improve the performance of format conversion between ```UnsafeArrayData``` and ```GenericArrayData``` by using ```Platform.copy```.
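The bulk-copy idea behind ```Platform.copy``` can be illustrated in miniature. This is a rough pure-Python sketch, not Spark's actual off-heap copy: because a primitive (non-nullable) array occupies one contiguous data region with no per-element null bits, a single buffer copy can replace the element-by-element loop that a nullable element type would force.

```python
# Contrast per-element copying (with a null check each iteration) against
# one bulk copy of the contiguous underlying buffer.
from array import array

src = array("d", [1.0, 2.0, 3.0, 4.0])

# Per-element copy: one branch and one store per element.
dst_slow = array("d", [0.0] * len(src))
for i, v in enumerate(src):
    if v is not None:  # the check a nullable element type would require
        dst_slow[i] = v

# Bulk copy: the whole contiguous region in one shot, analogous to
# Platform.copy over the array's data region.
dst_fast = array("d", bytes(src))

print(dst_slow == dst_fast)  # True
```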





[GitHub] spark pull request #13905: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimi...

2016-06-25 Thread dongjoon-hyun
Github user dongjoon-hyun closed the pull request at:

https://github.com/apache/spark/pull/13905





[GitHub] spark issue #13860: [SPARK-16157] [SQL] Add New Methods for comments in Stru...

2016-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13860
  
**[Test build #61240 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61240/consoleFull)** for PR 13860 at commit [`b8e0511`](https://github.com/apache/spark/commit/b8e051113b353f4fb8502e763dd8e54d70c3a5af).





[GitHub] spark issue #13899: [SPARK-16196][SQL] Codegen in-memory scan with ColumnarB...

2016-06-25 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/13899
  
@andrewor14 Looks interesting.

I created two PRs that generate code similar to [your code](https://gist.github.com/andrewor14/7ce4c37a3c6bcd5cc2b6b16c861859e9). My PRs use the current ```ByteBuffer``` and support compression for primitive types. Do these PRs help you?
https://github.com/apache/spark/pull/11956
https://github.com/apache/spark/pull/12894

I am waiting for review.
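The kind of primitive-type compression kept by those PRs can be sketched with a toy run-length encoder. This is a hedged Python illustration; Spark's in-memory columnar cache has its own encoders (e.g. run-length and dictionary encoding) operating over ```ByteBuffer```s, not this code.

```python
# Toy run-length encoding for a primitive column: long runs of repeated
# values (common in sorted or low-cardinality columns) compress well.
def rle_encode(values):
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((v, 1))              # start a new run
    return runs

def rle_decode(runs):
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out

col = [7, 7, 7, 7, 3, 3, 9]
runs = rle_encode(col)
print(runs)                     # [(7, 4), (3, 2), (9, 1)]
print(rle_decode(runs) == col)  # True
```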






[GitHub] spark pull request #13885: [SPARK-16184][SPARKR] conf API for SparkSession

2016-06-25 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/13885#discussion_r68491306
  
--- Diff: R/pkg/R/SQLContext.R ---
@@ -110,11 +110,46 @@ infer_type <- function(x) {
   }
 }
 
-getDefaultSqlSource <- function() {
+#' Get Runtime Config from the current active SparkSession
+#'
+#' Get Runtime Config from the current active SparkSession.
+#' To change SparkSession Runtime Config, please see `sparkR.session()`.
+#'
+#' @param key (optional) The key of the config to get, if omitted, all config is returned
+#' @param defaultValue (optional) The default value of the config to return if the config is not
+#' set, if omitted, the call fails if the config key is not set
+#' @return a list of config values with keys as their names
+#' @rdname sparkR.conf
+#' @name sparkR.conf
+#' @export
+#' @examples
+#'\dontrun{
+#' sparkR.session()
+#' allConfigs <- sparkR.conf()
+#' masterValue <- unlist(sparkR.conf("spark.master"))
+#' namedConfig <- sparkR.conf("spark.executor.memory", "0g")
+#' }
+#' @note sparkR.conf since 2.0.0
+sparkR.conf <- function(key, defaultValue) {
   sparkSession <- getSparkSession()
-  conf <- callJMethod(sparkSession, "conf")
-  source <- callJMethod(conf, "get", "spark.sql.sources.default", "org.apache.spark.sql.parquet")
-  source
+  if (missing(key)) {
+    m <- callJStatic("org.apache.spark.sql.api.r.SQLUtils", "getSessionConf", sparkSession)
+    as.list(m, all.names = TRUE, sorted = TRUE)
+  } else {
+    conf <- callJMethod(sparkSession, "conf")
+    value <- if (missing(defaultValue)) {
+      callJMethod(conf, "get", key) # throws if key not found
--- End diff --

Instead of throwing a Java exception, can we catch it and throw a `stop` from the SparkR code? It's slightly more user-friendly.
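The suggested pattern is easy to sketch. Below is Python standing in for SparkR's R code, with hypothetical function names (`call_j_method_get` mimics `callJMethod(conf, "get", key)`): the raw backend exception is caught and re-raised as a readable, user-facing error, which is what `stop()` would do on the R side.

```python
# Stand-in for the JVM-side session conf reachable via the R backend.
CONF = {"spark.master": "local[*]"}

class JavaBackendError(Exception):
    """Stand-in for an exception propagated from the JVM backend."""

def call_j_method_get(key):
    # Hypothetical stand-in for callJMethod(conf, "get", key).
    if key not in CONF:
        raise JavaBackendError("java.util.NoSuchElementException: " + key)
    return CONF[key]

def spark_conf_get(key):
    try:
        return call_j_method_get(key)
    except JavaBackendError:
        # Analogous to catching the Java exception and calling R's stop()
        # with a friendly message instead of surfacing the raw stack trace.
        raise ValueError("Config '%s' is not set" % key) from None

print(spark_conf_get("spark.master"))  # local[*]
try:
    spark_conf_get("spark.missing")
except ValueError as e:
    print(e)  # Config 'spark.missing' is not set
```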





[GitHub] spark issue #13847: [SPARK-16135][SQL] Remove hashCode and euqals in ArrayBa...

2016-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13847
  
Merged build finished. Test PASSed.





[GitHub] spark issue #13847: [SPARK-16135][SQL] Remove hashCode and euqals in ArrayBa...

2016-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13847
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61237/
Test PASSed.





[GitHub] spark issue #13847: [SPARK-16135][SQL] Remove hashCode and euqals in ArrayBa...

2016-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13847
  
**[Test build #61237 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61237/consoleFull)** for PR 13847 at commit [`902fe5f`](https://github.com/apache/spark/commit/902fe5fdf3ef9379252b7e8f7eaeaeaa6dd8759d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...

2016-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13758
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61238/
Test FAILed.





[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...

2016-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13758
  
Merged build finished. Test FAILed.





[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...

2016-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13758
  
**[Test build #61238 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61238/consoleFull)** for PR 13758 at commit [`9379865`](https://github.com/apache/spark/commit/9379865c8c960dc8952669944eeb452f3173804e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class PrimitiveArrayBenchmark extends BenchmarkBase `





[GitHub] spark pull request #13860: [SPARK-16157] [SQL] Add New Methods for comments ...

2016-06-25 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/13860#discussion_r68491125
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructField.scala ---
@@ -51,4 +51,35 @@ case class StructField(
     ("nullable" -> nullable) ~
     ("metadata" -> metadata.jsonValue)
   }
+
+  /**
+   * Updates the StructField with a new comment value.
+   */
+  def withComment(comment: String): StructField = {
+    val newMetadata = new MetadataBuilder()
+      .withMetadata(metadata)
+      .putString("comment", comment)
+      .build()
+    copy(metadata = newMetadata)
+  }
+
+  /**
+   * Return the comment of this StructField.
+   */
+  def getComment(): Option[String] = {
+    if (metadata.contains("comment")) Option(metadata.getString("comment")) else None
+  }
+}
+
+object StructField {
+  def apply(
--- End diff --

uh, I see what you want. Let me change it. 
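
A note on the pattern in the diff above: `withComment` does not mutate the field; it builds a fresh metadata map and returns a copy of the `StructField`. A minimal sketch of that copy-on-write behavior in plain Python (no Spark dependency; `Field`, `with_comment`, and `get_comment` are illustrative stand-ins for the Scala API, not real PySpark methods):

```python
# Copy-on-write metadata, mimicking the StructField.withComment diff above.
from dataclasses import dataclass, field, replace
from typing import Optional


@dataclass(frozen=True)
class Field:
    name: str
    metadata: dict = field(default_factory=dict)

    def with_comment(self, comment: str) -> "Field":
        # Build a new metadata map instead of mutating the existing one.
        new_meta = {**self.metadata, "comment": comment}
        return replace(self, metadata=new_meta)

    def get_comment(self) -> Optional[str]:
        return self.metadata.get("comment")


f = Field("name")
g = f.with_comment("user display name")
assert f.get_comment() is None                  # original field untouched
assert g.get_comment() == "user display name"   # copy carries the comment
```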





[GitHub] spark issue #13680: [SPARK-15962][SQL] Introduce implementation with a dense...

2016-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13680
  
**[Test build #61239 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61239/consoleFull)** for PR 13680 at commit [`500e978`](https://github.com/apache/spark/commit/500e978b7ef10311da1141c97d834a58821d9c11).





[GitHub] spark issue #13680: [SPARK-15962][SQL] Introduce implementation with a dense...

2016-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13680
  
Merged build finished. Test PASSed.





[GitHub] spark issue #13680: [SPARK-15962][SQL] Introduce implementation with a dense...

2016-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13680
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61236/
Test PASSed.





[GitHub] spark issue #13680: [SPARK-15962][SQL] Introduce implementation with a dense...

2016-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13680
  
**[Test build #61236 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61236/consoleFull)** for PR 13680 at commit [`138810b`](https://github.com/apache/spark/commit/138810b11348b69b6965230031a8abacfbd9ad7c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13886: [SPARK-16185] [SQL] Better Error Messages When Creating ...

2016-06-25 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/13886
  
CREATE TABLE AS SELECT can be converted to a Create Data Source Table command when the following condition is true: `the statement does not have a user-specified file format or row format`. (Actually, when the file format is `Parquet`, we can still convert it; I will try to submit a PR for that case.)

However, the default value of the internal conf `spark.sql.hive.convertCTAS` is `false`, so we do not convert them even when it is possible. Maybe we can add a rule to do the conversion when users do not enable Hive support?





[GitHub] spark issue #13897: [SPARK-16195][SQL] Allow users to specify empty over cla...

2016-06-25 Thread dilipbiswal
Github user dilipbiswal commented on the issue:

https://github.com/apache/spark/pull/13897
  
Thanks a lot @hvanhovell @gatorsmile 





[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...

2016-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13758
  
**[Test build #61238 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61238/consoleFull)** for PR 13758 at commit [`9379865`](https://github.com/apache/spark/commit/9379865c8c960dc8952669944eeb452f3173804e).





[GitHub] spark issue #13847: [SPARK-16135][SQL] Remove hashCode and euqals in ArrayBa...

2016-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13847
  
**[Test build #61237 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61237/consoleFull)** for PR 13847 at commit [`902fe5f`](https://github.com/apache/spark/commit/902fe5fdf3ef9379252b7e8f7eaeaeaa6dd8759d).





[GitHub] spark pull request #13847: [SPARK-16135][SQL] Remove hashCode and euqals in ...

2016-06-25 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/13847#discussion_r68490096
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MapData.scala ---
@@ -19,6 +19,10 @@ package org.apache.spark.sql.catalyst.util
 
 import org.apache.spark.sql.types.DataType
 
+/**
+ * `MapData` should not implement `equals` and `hashCode` because the type cannot be used as join
--- End diff --

okay, how about this?





[GitHub] spark issue #13680: [SPARK-15962][SQL] Introduce implementation with a dense...

2016-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13680
  
**[Test build #61236 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61236/consoleFull)** for PR 13680 at commit [`138810b`](https://github.com/apache/spark/commit/138810b11348b69b6965230031a8abacfbd9ad7c).




