[GitHub] spark pull request #14203: [SPARK-16546][SQL][PySpark] update python datafra...

2016-07-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14203


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...

2016-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14158
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62364/
Test PASSed.





[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...

2016-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14158
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...

2016-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14158
  
**[Test build #62364 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62364/consoleFull)** for PR 14158 at commit [`41c2daa`](https://github.com/apache/spark/commit/41c2daa19a4b4dc340f6345e2624fd269565638b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14203: [SPARK-16546][SQL][PySpark] update python dataframe.drop

2016-07-14 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/14203
  
Thanks - merging in master.






[GitHub] spark issue #14215: [SPARK-16544][SQL][WIP] Support for conversion from comp...

2016-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14215
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62365/
Test PASSed.





[GitHub] spark issue #14215: [SPARK-16544][SQL][WIP] Support for conversion from comp...

2016-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14215
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14215: [SPARK-16544][SQL][WIP] Support for conversion from comp...

2016-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14215
  
**[Test build #62365 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62365/consoleFull)** for PR 14215 at commit [`b45f2ea`](https://github.com/apache/spark/commit/b45f2eae8417d9fdf1ecb8de7dd0a43a3d4c0fa8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14169: [SPARK-16515][SQL]set default record reader and writer f...

2016-07-14 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/14169
  
Are all script transforms broken? Don't we already have a test case that 
actually runs script transforms?





[GitHub] spark pull request #14090: [SPARK-16112][SparkR] Programming guide for gappl...

2016-07-14 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/14090#discussion_r70923795
  
--- Diff: docs/sparkr.md ---
@@ -316,6 +314,139 @@ head(ldf, 3)
 {% endhighlight %}
 
 
+ Run a given function on a large dataset grouping by input column(s) 
and using `gapply` or `gapplyCollect`
+
+# gapply
+Apply a function to each group of a `SparkDataFrame`. The function is applied
+to each group of the `SparkDataFrame` and should have only two parameters: the
+grouping key and an R `data.frame` corresponding to that key. The groups are
+chosen from the `SparkDataFrame`'s column(s). The output of the function should
+be a `data.frame`. The schema specifies the row format of the resulting
+`SparkDataFrame`; it must represent the R function's output schema in terms of
+Spark data types. The column names of the returned `data.frame` are set by the
+user. The data type mapping between R and Spark is given below.
+
+ Data type mapping between R and Spark
+
+  R            Spark
+  byte         byte
+  integer      integer
+  float        float
+  double       double
+  numeric      double
+  character    string
+  string       string
+  binary       binary
+  raw          binary
+  logical      boolean
+  timestamp    timestamp
+  date         date
+  array        array
+  list         array
+  map          map
+  env          map
+  struct
--- End diff --
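The gapply contract discussed in the quoted diff can be mimicked in plain Python. This is a hedged sketch, not SparkR's implementation: it groups rows by a key column and calls a user function with `(key, rows-for-that-key)`, mirroring gapply's `(key, data.frame)` signature. All names here are illustrative.

```python
# Pure-Python sketch of the gapply contract from the quoted docs:
# group rows by a key column, call fn(key, group_rows) once per group,
# and collect the per-group results.

def gapply_sketch(rows, key_col, fn):
    groups = {}
    for row in rows:
        groups.setdefault(row[key_col], []).append(row)
    # fn receives (grouping key, list of rows for that key),
    # analogous to gapply's (key, data.frame) parameters.
    return {key: fn(key, group) for key, group in sorted(groups.items())}

rows = [
    {"dept": "a", "salary": 10},
    {"dept": "b", "salary": 20},
    {"dept": "a", "salary": 30},
]

# Per-group aggregation: max salary per dept.
result = gapply_sketch(rows, "dept", lambda k, g: max(r["salary"] for r in g))
assert result == {"a": 30, "b": 20}
```

The user function sees only its own group, which is why the docs stress that its output `data.frame` must match the declared schema: each group's result is stitched back into one `SparkDataFrame`.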

Not really - as I mentioned, `getSQLDataType` looks at the schema; the 
method that looks at the R objects is in 
https://github.com/apache/spark/blob/2e4075e2ece9574100c79558cab054485e25c2ee/R/pkg/R/serialize.R#L84





[GitHub] spark pull request #14090: [SPARK-16112][SparkR] Programming guide for gappl...

2016-07-14 Thread NarineK
Github user NarineK commented on a diff in the pull request:

https://github.com/apache/spark/pull/14090#discussion_r70923645
  
--- Diff: docs/sparkr.md ---
@@ -316,6 +314,139 @@ head(ldf, 3)
 {% endhighlight %}
 
 
+ Run a given function on a large dataset grouping by input column(s) 
and using `gapply` or `gapplyCollect`
+
+# gapply
+Apply a function to each group of a `SparkDataFrame`. The function is applied
+to each group of the `SparkDataFrame` and should have only two parameters: the
+grouping key and an R `data.frame` corresponding to that key. The groups are
+chosen from the `SparkDataFrame`'s column(s). The output of the function should
+be a `data.frame`. The schema specifies the row format of the resulting
+`SparkDataFrame`; it must represent the R function's output schema in terms of
+Spark data types. The column names of the returned `data.frame` are set by the
+user. The data type mapping between R and Spark is given below.
+
+ Data type mapping between R and Spark
+
+  R            Spark
+  byte         byte
+  integer      integer
+  float        float
+  double       double
+  numeric      double
+  character    string
+  string       string
+  binary       binary
+  raw          binary
+  logical      boolean
+  timestamp    timestamp
+  date         date
+  array        array
+  list         array
+  map          map
+  env          map
+  struct
--- End diff --

Sounds good. For the mappings from 'POSIXct / POSIXlt' to 'timestamp' 
and 'Date' to 'date', do we need to update the 'getSQLDataType' method?

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala#L91






[GitHub] spark issue #14214: [SPARK-16545][SQL] Eliminate one unnecessary round of ph...

2016-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14214
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62362/
Test PASSed.





[GitHub] spark issue #14214: [SPARK-16545][SQL] Eliminate one unnecessary round of ph...

2016-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14214
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14214: [SPARK-16545][SQL] Eliminate one unnecessary round of ph...

2016-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14214
  
**[Test build #62362 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62362/consoleFull)** for PR 14214 at commit [`8ec635f`](https://github.com/apache/spark/commit/8ec635fe7403baf5149e3f6714872bf706b37cd7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14150: [SPARK-16494] [ML] Upgrade breeze version to 0.12

2016-07-14 Thread yanboliang
Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/14150
  
cc @srowen 





[GitHub] spark issue #14151: [SPARK-16496][SQL] Add wholetext as option for reading t...

2016-07-14 Thread ScrapCodes
Github user ScrapCodes commented on the issue:

https://github.com/apache/spark/pull/14151
  
@rxin Do you think it looks okay now ?





[GitHub] spark issue #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Structured...

2016-07-14 Thread ScrapCodes
Github user ScrapCodes commented on the issue:

https://github.com/apache/spark/pull/14087
  
@marmbrus Do you think this is useful ?





[GitHub] spark pull request #14090: [SPARK-16112][SparkR] Programming guide for gappl...

2016-07-14 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/14090#discussion_r70922863
  
--- Diff: docs/sparkr.md ---
@@ -316,6 +314,139 @@ head(ldf, 3)
 {% endhighlight %}
 
 
+ Run a given function on a large dataset grouping by input column(s) 
and using `gapply` or `gapplyCollect`
+
+# gapply
+Apply a function to each group of a `SparkDataFrame`. The function is applied
+to each group of the `SparkDataFrame` and should have only two parameters: the
+grouping key and an R `data.frame` corresponding to that key. The groups are
+chosen from the `SparkDataFrame`'s column(s). The output of the function should
+be a `data.frame`. The schema specifies the row format of the resulting
+`SparkDataFrame`; it must represent the R function's output schema in terms of
+Spark data types. The column names of the returned `data.frame` are set by the
+user. The data type mapping between R and Spark is given below.
+
+ Data type mapping between R and Spark
+
+  R            Spark
+  byte         byte
+  integer      integer
+  float        float
+  double       double
+  numeric      double
+  character    string
+  string       string
+  binary       binary
+  raw          binary
+  logical      boolean
+  timestamp    timestamp
+  date         date
+  array        array
+  list         array
+  map          map
+  env          map
+  struct
--- End diff --

And as you mentioned above, we can also change `date` to `Date` to be more 
specific. (Now that I think about it, it would be ideal to link these R types 
to their R help pages. For example, we can link to 
https://stat.ethz.ch/R-manual/R-devel/library/base/html/Dates.html for `Date` and 
https://stat.ethz.ch/R-manual/R-devel/library/base/html/DateTimeClasses.html 
for `POSIXct` / `POSIXlt`.)





[GitHub] spark issue #14216: [SPARK-16561][MLLib] fix multivarOnlineSummary min/max b...

2016-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14216
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14216: [SPARK-16561][MLLib] fix multivarOnlineSummary min/max b...

2016-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14216
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62366/
Test PASSed.





[GitHub] spark pull request #14090: [SPARK-16112][SparkR] Programming guide for gappl...

2016-07-14 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/14090#discussion_r70922747
  
--- Diff: docs/sparkr.md ---
@@ -316,6 +314,139 @@ head(ldf, 3)
 {% endhighlight %}
 
 
+ Run a given function on a large dataset grouping by input column(s) 
and using `gapply` or `gapplyCollect`
+
+# gapply
+Apply a function to each group of a `SparkDataFrame`. The function is applied
+to each group of the `SparkDataFrame` and should have only two parameters: the
+grouping key and an R `data.frame` corresponding to that key. The groups are
+chosen from the `SparkDataFrame`'s column(s). The output of the function should
+be a `data.frame`. The schema specifies the row format of the resulting
+`SparkDataFrame`; it must represent the R function's output schema in terms of
+Spark data types. The column names of the returned `data.frame` are set by the
+user. The data type mapping between R and Spark is given below.
+
+ Data type mapping between R and Spark
+
+  R            Spark
+  byte         byte
+  integer      integer
+  float        float
+  double       double
+  numeric      double
+  character    string
+  string       string
+  binary       binary
+  raw          binary
+  logical      boolean
+  timestamp    timestamp
+  date         date
+  array        array
+  list         array
+  map          map
+  env          map
+  struct
--- End diff --

We can remove map and struct. For timestamp, let's replace the R side of the 
table with `POSIXct` / `POSIXlt`.





[GitHub] spark issue #14216: [SPARK-16561][MLLib] fix multivarOnlineSummary min/max b...

2016-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14216
  
**[Test build #62366 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62366/consoleFull)** for PR 14216 at commit [`cbb104a`](https://github.com/apache/spark/commit/cbb104a4c48fc425517e5b68c67054b1dc4455dd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14169: [SPARK-16515][SQL]set default record reader and writer f...

2016-07-14 Thread chenghao-intel
Github user chenghao-intel commented on the issue:

https://github.com/apache/spark/pull/14169
  
HiveConf provides the default values 
`org.apache.hadoop.hive.ql.exec.TextRecordReader` and 
`org.apache.hadoop.hive.ql.exec.TextRecordWriter` for the keys 
`hive.script.recordreader` and `hive.script.recordwriter` respectively; 
however, SQLConf doesn't provide those keys, which means the default values 
will be null. This causes the backward incompatibility.
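The default-value gap described above can be sketched in plain Python. This is an illustration of the layered-config fallback behavior, not Spark's actual SQLConf implementation; the key and class names are copied from the comment, and `get_conf` is a hypothetical helper.

```python
# Minimal sketch of the two config layers described above.
# HiveConf ships defaults for the script-transform reader/writer keys;
# a conf layer without those defaults answers None ("null") for the same keys.

HIVE_DEFAULTS = {
    "hive.script.recordreader": "org.apache.hadoop.hive.ql.exec.TextRecordReader",
    "hive.script.recordwriter": "org.apache.hadoop.hive.ql.exec.TextRecordWriter",
}

def get_conf(user_settings, defaults, key):
    """Look up a key, falling back to the layer's registered defaults."""
    if key in user_settings:
        return user_settings[key]
    return defaults.get(key)  # None when the layer has no default for the key

# HiveConf-style lookup: the registered default kicks in.
assert get_conf({}, HIVE_DEFAULTS, "hive.script.recordreader") \
    == "org.apache.hadoop.hive.ql.exec.TextRecordReader"

# SQLConf-style lookup with no registered defaults: None, i.e. "null".
assert get_conf({}, {}, "hive.script.recordreader") is None
```

Registering the Hive defaults in the second layer (the fix this PR proposes) makes both lookups agree, restoring the old behavior.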






[GitHub] spark issue #14169: [SPARK-16515][SQL]set default record reader and writer f...

2016-07-14 Thread adrian-wang
Github user adrian-wang commented on the issue:

https://github.com/apache/spark/pull/14169
  
@rxin  In Spark 2.0, those conf values starting with "hive.", which have 
default values in HiveConf, cannot get their default values now.





[GitHub] spark pull request #14090: [SPARK-16112][SparkR] Programming guide for gappl...

2016-07-14 Thread NarineK
Github user NarineK commented on a diff in the pull request:

https://github.com/apache/spark/pull/14090#discussion_r70921996
  
--- Diff: docs/sparkr.md ---
@@ -316,6 +314,139 @@ head(ldf, 3)
 {% endhighlight %}
 
 
+ Run a given function on a large dataset grouping by input column(s) 
and using `gapply` or `gapplyCollect`
+
+# gapply
+Apply a function to each group of a `SparkDataFrame`. The function is applied
+to each group of the `SparkDataFrame` and should have only two parameters: the
+grouping key and an R `data.frame` corresponding to that key. The groups are
+chosen from the `SparkDataFrame`'s column(s). The output of the function should
+be a `data.frame`. The schema specifies the row format of the resulting
+`SparkDataFrame`; it must represent the R function's output schema in terms of
+Spark data types. The column names of the returned `data.frame` are set by the
+user. The data type mapping between R and Spark is given below.
+
+ Data type mapping between R and Spark
+
+  R            Spark
+  byte         byte
+  integer      integer
+  float        float
+  double       double
+  numeric      double
+  character    string
+  string       string
+  binary       binary
+  raw          binary
+  logical      boolean
+  timestamp    timestamp
+  date         date
+  array        array
+  list         array
+  map          map
+  env          map
+  struct
--- End diff --

Thanks for the explanation, @shivaram !
So I'll remove map, struct, and timestamp and leave the rest as is.
Does that sound fine?





[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...

2016-07-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14035
  
ping @mengxr and @yanboliang 





[GitHub] spark issue #14217: [SPARK-16562][SQL] Do not allow downcast in INT32 based ...

2016-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14217
  
**[Test build #62367 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62367/consoleFull)** for PR 14217 at commit [`97303c9`](https://github.com/apache/spark/commit/97303c97e990c12abebf309fe3ab9dd0fc31e515).





[GitHub] spark issue #14217: [SPARK-16562][SQL] Do not allow downcast in INT32 based ...

2016-07-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14217
  
cc @liancheng 





[GitHub] spark pull request #14217: [SPARK-16562][SQL] Do not allow downcast in INT32...

2016-07-14 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/14217

[SPARK-16562][SQL] Do not allow downcast in INT32 based types for normal 
Parquet reader

## What changes were proposed in this pull request?

Currently, the INT32-based types (`ShortType`, `ByteType`, `IntegerType`) can 
be downcast in any combination. For example, the code below:

```scala
val path = "/tmp/test.parquet"
val data = (1 to 4).map(Tuple1(_.toInt))
data.toDF("a").write.parquet(path)
val schema = StructType(StructField("a", ShortType, true) :: Nil)
spark.read.schema(schema).parquet(path).show()
```

works fine. This should not be allowed.

This only happens when vectorized reader is disabled.
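The width check this fix enforces can be illustrated with a small Python sketch: a requested read type may not be narrower than the type the data was written with. The type names and bit widths below mirror Spark's INT32-based types, but `check_read_type` itself is a hypothetical helper, not Spark's code.

```python
# Sketch of the downcast check: reading with the same or a wider type is
# allowed; reading with a narrower type is rejected.

WIDTH_BITS = {"ByteType": 8, "ShortType": 16, "IntegerType": 32}

def check_read_type(stored, requested):
    """Allow same-width or widening reads; reject downcasts."""
    if WIDTH_BITS[requested] < WIDTH_BITS[stored]:
        raise TypeError(f"cannot downcast {stored} to {requested}")
    return requested

assert check_read_type("IntegerType", "IntegerType") == "IntegerType"
assert check_read_type("ByteType", "IntegerType") == "IntegerType"  # widening is fine

try:
    check_read_type("IntegerType", "ShortType")  # the case from the PR description
    raise AssertionError("downcast should have been rejected")
except TypeError:
    pass
```

Rejecting the narrowing read up front surfaces a clear error instead of silently truncating values that do not fit the requested type.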

## How was this patch tested?

Unit test in `ParquetIOSuite`.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-16562

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14217.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14217


commit 97303c97e990c12abebf309fe3ab9dd0fc31e515
Author: hyukjinkwon 
Date:   2016-07-15T04:51:44Z

Do not allow downcast in INT32 based types for non-vectorized Parquet reader







[GitHub] spark issue #14214: [SPARK-16545][SQL] Eliminate one unnecessary round of ph...

2016-07-14 Thread mariobriggs
Github user mariobriggs commented on the issue:

https://github.com/apache/spark/pull/14214
  
What I tried to do as a 'side fix' was this:
  eliminate [1], since it was a lazy val.

  Move [2] out of the code path of the main thread, i.e. let the ListenerBus 
thread pay the penalty of producing the physical plan for logging (I was 
coming from a performance test scenario, so it allowed me to proceed :-) ). So 
the change was that SparkListenerSQLExecutionStart would take only 
QueryExecution as an input parameter, and not physicalPlanDescription & 
SparkPlanInfo. However, this cannot be the solution, since 
SparkListenerSQLExecutionStart is already a public API.

 [3] remains.

As you might have already noticed, ConsoleSink also suffers from the same 
problem as [2], and these are inside Dataset.withTypedCallback/withCallback, 
but it is only for debug purposes.





[GitHub] spark pull request #14090: [SPARK-16112][SparkR] Programming guide for gappl...

2016-07-14 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/14090#discussion_r70920785
  
--- Diff: docs/sparkr.md ---
@@ -316,6 +314,139 @@ head(ldf, 3)
 {% endhighlight %}
 
 
+ Run a given function on a large dataset, grouping by input column(s) and using `gapply` or `gapplyCollect`
+
+# gapply
+Apply a function to each group of a `SparkDataFrame`. The function is applied to each group of the `SparkDataFrame` and should have only two parameters: the grouping key and an R `data.frame` corresponding to
+that key. The groups are chosen from the `SparkDataFrame`'s column(s).
+The output of the function should be a `data.frame`. The schema specifies the row format of the resulting
+`SparkDataFrame`. It must represent the R function's output schema in terms of Spark data types. The column names of the returned `data.frame` are
+set by the user. The data type mapping between R and Spark is given below.
+
+ Data type mapping between R and Spark
+
+| R         | Spark     |
+|-----------|-----------|
+| byte      | byte      |
+| integer   | integer   |
+| float     | float     |
+| double    | double    |
+| numeric   | double    |
+| character | string    |
+| string    | string    |
+| binary    | binary    |
+| raw       | binary    |
+| logical   | boolean   |
+| timestamp | timestamp |
+| date      | date      |
+| array     | array     |
+| list      | array     |
+| map       | map       |
+| env       | map       |
+  struct
--- End diff --

That's a good point - so users can create a schema with `struct`, and that maps 
to a corresponding SQL type. But they can't create any R objects that 
will be parsed as `struct`. The main reason our schema is more flexible than 
our serialization / deserialization support is that the schema can also be used to, 
say, read JSON files or JDBC tables, etc.

For the use case here, where users are returning a `data.frame` from a UDF, I 
don't think there is any valid mapping for `struct` from R.
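The gapply contract discussed above (a function of grouping key and per-group data.frame, with per-group results concatenated into one frame) can be sketched in plain Python; `gapply_local`, `key_fn`, and the sample rows are illustrative stand-ins, not part of the SparkR API:

```python
from collections import defaultdict

def gapply_local(rows, key_fn, fn):
    """Group rows by key, apply fn(key, group) to each group, and
    concatenate the per-group results -- the shape of SparkR's gapply,
    minus distribution and schema enforcement."""
    groups = defaultdict(list)
    for row in rows:
        groups[key_fn(row)].append(row)
    out = []
    for key, grp in groups.items():
        out.extend(fn(key, grp))
    return out

rows = [{"k": 1, "v": 2}, {"k": 1, "v": 3}, {"k": 2, "v": 5}]
result = gapply_local(rows, lambda r: r["k"],
                      lambda k, g: [{"k": k, "total": sum(r["v"] for r in g)}])
```

In SparkR the returned frame's layout is pinned down by the user-supplied schema; the sketch simply trusts whatever each `fn` returns.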





[GitHub] spark issue #14045: [SPARK-16362][SQL] Support ArrayType and StructType in v...

2016-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14045
  
**[Test build #62363 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62363/consoleFull)**
 for PR 14045 at commit 
[`1788d4c`](https://github.com/apache/spark/commit/1788d4c3fb9d547390cdea2bcf28c597bee540d2).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14216: [SPARK-16561][MLLib] fix multivarOnlineSummary min/max b...

2016-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14216
  
**[Test build #62366 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62366/consoleFull)**
 for PR 14216 at commit 
[`cbb104a`](https://github.com/apache/spark/commit/cbb104a4c48fc425517e5b68c67054b1dc4455dd).





[GitHub] spark pull request #14090: [SPARK-16112][SparkR] Programming guide for gappl...

2016-07-14 Thread NarineK
Github user NarineK commented on a diff in the pull request:

https://github.com/apache/spark/pull/14090#discussion_r70920518
  
--- Diff: docs/sparkr.md ---
@@ -316,6 +314,139 @@ head(ldf, 3)
 {% endhighlight %}
 
 
+ Run a given function on a large dataset, grouping by input column(s) and using `gapply` or `gapplyCollect`
+
+# gapply
+Apply a function to each group of a `SparkDataFrame`. The function is applied to each group of the `SparkDataFrame` and should have only two parameters: the grouping key and an R `data.frame` corresponding to
+that key. The groups are chosen from the `SparkDataFrame`'s column(s).
+The output of the function should be a `data.frame`. The schema specifies the row format of the resulting
+`SparkDataFrame`. It must represent the R function's output schema in terms of Spark data types. The column names of the returned `data.frame` are
+set by the user. The data type mapping between R and Spark is given below.
+
+ Data type mapping between R and Spark
+
+| R         | Spark     |
+|-----------|-----------|
+| byte      | byte      |
+| integer   | integer   |
+| float     | float     |
+| double    | double    |
+| numeric   | double    |
+| character | string    |
+| string    | string    |
+| binary    | binary    |
+| raw       | binary    |
+| logical   | boolean   |
+| timestamp | timestamp |
+| date      | date      |
+| array     | array     |
+| list      | array     |
+| map       | map       |
+| env       | map       |
+  struct
--- End diff --

@shivaram, I've looked at the following list:

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala#L92

It is called when creating a schema's fields, and it has map, struct, 
timestamp, etc.





[GitHub] spark pull request #14216: [SPARK-16561][MLLib] fix multivarOnlineSummary mi...

2016-07-14 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request:

https://github.com/apache/spark/pull/14216

[SPARK-16561][MLLib] fix multivarOnlineSummary min/max bug

## What changes were proposed in this pull request?

Add a member vector `cnnz` to count the number of non-zero values in each dimension.
Replace `nnz` with `cnnz` when calculating min/max.

## How was this patch tested?

Existing test.
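The bug and fix can be illustrated outside Spark with a minimal online min/max tracker over sparse vectors (pure-Python sketch; the class and method names are hypothetical, not MLlib's MultivariateOnlineSummarizer API). A single global non-zero count cannot tell whether a *particular* dimension ever saw an implicit zero, so min/max can come out wrong; a per-dimension count `cnnz` fixes that:

```python
class OnlineMinMax:
    """Per-dimension online min/max over sparse vectors (illustrative
    sketch, not MLlib's API)."""

    def __init__(self, dim):
        self.n = 0                      # total samples seen
        self.cnnz = [0] * dim           # per-dimension non-zero counts
        self.cur_min = [float("inf")] * dim
        self.cur_max = [float("-inf")] * dim

    def add_sparse(self, indices, values):
        self.n += 1
        for i, v in zip(indices, values):
            self.cnnz[i] += 1
            self.cur_min[i] = min(self.cur_min[i], v)
            self.cur_max[i] = max(self.cur_max[i], v)

    def minimum(self):
        # A dimension with cnnz[i] < n also saw implicit zeros,
        # so 0 must participate in its min.
        return [m if c == self.n else min(m, 0.0)
                for m, c in zip(self.cur_min, self.cnnz)]

    def maximum(self):
        return [m if c == self.n else max(m, 0.0)
                for m, c in zip(self.cur_max, self.cnnz)]

s = OnlineMinMax(2)
s.add_sparse([0], [5.0])            # dimension 1 is implicitly 0 here
s.add_sparse([0, 1], [3.0, -2.0])
```

Dimension 1 only ever stored -2.0 explicitly, but because one sample left it at an implicit zero, its maximum must be 0.0, not -2.0.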

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/WeichenXu123/spark multivarOnlineSummary

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14216.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14216


commit cbb104a4c48fc425517e5b68c67054b1dc4455dd
Author: WeichenXu 
Date:   2016-07-12T05:08:42Z

improve multivarOnlineSummary




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14045: [SPARK-16362][SQL] Support ArrayType and StructType in v...

2016-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14045
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62363/
Test FAILed.





[GitHub] spark pull request #14090: [SPARK-16112][SparkR] Programming guide for gappl...

2016-07-14 Thread NarineK
Github user NarineK commented on a diff in the pull request:

https://github.com/apache/spark/pull/14090#discussion_r70920244
  
--- Diff: docs/sparkr.md ---
@@ -316,6 +314,139 @@ head(ldf, 3)
 {% endhighlight %}
 
 
+ Run a given function on a large dataset, grouping by input column(s) and using `gapply` or `gapplyCollect`
+
+# gapply
+Apply a function to each group of a `SparkDataFrame`. The function is applied to each group of the `SparkDataFrame` and should have only two parameters: the grouping key and an R `data.frame` corresponding to
+that key. The groups are chosen from the `SparkDataFrame`'s column(s).
+The output of the function should be a `data.frame`. The schema specifies the row format of the resulting
+`SparkDataFrame`. It must represent the R function's output schema in terms of Spark data types. The column names of the returned `data.frame` are
+set by the user. The data type mapping between R and Spark is given below.
+
+ Data type mapping between R and Spark
+
+| R         | Spark     |
+|-----------|-----------|
+| byte      | byte      |
+| integer   | integer   |
+| float     | float     |
+| double    | double    |
+| numeric   | double    |
+| character | string    |
+| string    | string    |
+| binary    | binary    |
+| raw       | binary    |
+| logical   | boolean   |
+| timestamp | timestamp |
+| date      | date      |
+| array     | array     |
+| list      | array     |
+| map       | map       |
+| env       | map       |
+  struct
--- End diff --

@felixcheung, I think according to the following mapping we expect 'date':

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala#L91

And it seems that there is a 'Date' class in base R. Do I understand correctly?





[GitHub] spark issue #14045: [SPARK-16362][SQL] Support ArrayType and StructType in v...

2016-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14045
  
Merged build finished. Test FAILed.





[GitHub] spark issue #14214: [SPARK-16545][SQL] Eliminate one unnecessary round of ph...

2016-07-14 Thread mariobriggs
Github user mariobriggs commented on the issue:

https://github.com/apache/spark/pull/14214
  
> [1] should not be eliminated in general;

  I don't understand the full internal aspects of IncrementalExecution, but 
my general thinking was that [1] can be eliminated because `executedPlan` is a 
lazy val on QueryExecution.

>[2] is eliminated by this patch, by replacing the queryExecution with 
incrementalExecution provided by [3];

If the goal is to keep it as minimal as possible for now and wait 
for SPARK-16264 (which, I was also thinking, is where it will have to finally wait 
for full resolution), why not keep [1] and make the change to [2] the simple case 
of changing 
[L52](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ForeachSink.scala#L52)
 to the following
 
``` new Dataset(data.sparkSession, data.queryExecution, 
implicitly[Encoder[T]]) ```

with no further changes required to your earlier code. Would it then be the case 
that the wrong physical plan is logged in SparkListenerSQLExecutionStart?
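The 'lazy val' argument above can be mimicked in Python with `functools.cached_property` (an illustrative stand-in, not Spark code): the expensive planning step runs at most once per instance, so a second access to an already-built plan adds no planning round:

```python
from functools import cached_property

class FakeQueryExecution:
    """Toy stand-in for Scala's `lazy val executedPlan`: the body runs
    at most once per instance, on first access."""

    def __init__(self):
        self.plans_built = 0

    @cached_property
    def executed_plan(self):
        # Expensive physical planning would happen here.
        self.plans_built += 1
        return "physical-plan"

qe = FakeQueryExecution()
first = qe.executed_plan
second = qe.executed_plan  # cached; planning does not run again
```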







[GitHub] spark issue #14215: [SPARK-16544][SQL][WIP] Support for conversion from comp...

2016-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14215
  
**[Test build #62365 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62365/consoleFull)**
 for PR 14215 at commit 
[`b45f2ea`](https://github.com/apache/spark/commit/b45f2eae8417d9fdf1ecb8de7dd0a43a3d4c0fa8).





[GitHub] spark issue #14215: [SPARK-16544][SQL][WIP] Support for conversion from comp...

2016-07-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14215
  
Hi @gatorsmile @dongjoon-hyun @liancheng, currently this handles upcasting only 
for `NumericType` (except `DecimalType`), and only for the non-vectorized reader.

Before proceeding further, I want to be sure that this approach looks good. 
Could I ask for some feedback, please?





[GitHub] spark pull request #14215: [SPARK-16544][SQL][WIP] Support for conversion fr...

2016-07-14 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/14215

[SPARK-16544][SQL][WIP] Support for conversion from compatible schema for 
Parquet data source when data types are not matched

## What changes were proposed in this pull request?

This PR adds schema compatibility for Parquet.

Currently, if the user-given schema is different from the Parquet schema, it 
throws an exception even when the user-given schema is compatible with the Parquet 
schema.

For example, executing the codes below:

```scala
val path = "/tmp/test.parquet"
val data = (1 to 4).map(Tuple1(_))
spark.createDataFrame(data).toDF("a").write.parquet(path)
val schema = StructType(StructField("a", LongType, true) :: Nil)
spark.read.schema(schema).parquet(path).show()
```

throws an exception as below:

```
org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in 
block 0 
...
```

This PR lets Parquet support this schema compatibility.

- [x] Schema compatibility for `NumericType` except `DecimalType`. 
- [ ] Schema compatibility for other `AtomicType`.
- [ ] Schema compatibility for vectorized reader.
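The compatibility rule at stake can be sketched as a lossless-widening check among numeric types (pure-Python sketch; the table and function names are illustrative, not Spark's internal API): a requested type is acceptable when every value of the file's type fits in it, so `IntegerType -> LongType` passes while the reverse does not.

```python
# Which numeric types can be widened losslessly (illustrative table,
# not Spark's internal API).
SAFE_WIDENINGS = {
    "ByteType": {"ShortType", "IntegerType", "LongType", "FloatType", "DoubleType"},
    "ShortType": {"IntegerType", "LongType", "FloatType", "DoubleType"},
    "IntegerType": {"LongType", "DoubleType"},
    "FloatType": {"DoubleType"},
    "LongType": set(),
    "DoubleType": set(),
}

def is_safe_upcast(file_type, requested_type):
    """True when reading data stored as file_type under requested_type
    cannot lose information (equal types, or a lossless widening)."""
    return (file_type == requested_type
            or requested_type in SAFE_WIDENINGS.get(file_type, set()))
```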

## How was this patch tested?

Unit tests in `ParquetIOSuite`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-16544

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14215.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14215


commit b45f2eae8417d9fdf1ecb8de7dd0a43a3d4c0fa8
Author: hyukjinkwon 
Date:   2016-07-15T03:37:45Z

Support for conversion from compatible schema for Parquet data source when 
data types are not matched







[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...

2016-07-14 Thread nblintao
Github user nblintao commented on the issue:

https://github.com/apache/spark/pull/14158
  
Updated by truncating long texts and adding a tooltip.
The detailed description and the screenshot at 
https://github.com/apache/spark/pull/14158#issue-165127460 are also updated.
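Truncation with a full-text tooltip, as described, typically keeps the complete string for the tooltip (e.g. a `title` attribute) and shows a shortened form inline; a minimal sketch (the helper name and length limit are illustrative, not the PR's actual code):

```python
def truncate_with_tooltip(text, limit=100):
    """Return (display_text, tooltip_text): the inline form is cut to
    `limit` characters with a trailing ellipsis; the tooltip keeps the
    whole string."""
    if len(text) <= limit:
        return text, text
    return text[: limit - 3] + "...", text

display, tooltip = truncate_with_tooltip("SELECT " + "x, " * 60 + "y FROM t", 40)
```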





[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...

2016-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14158
  
**[Test build #62364 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62364/consoleFull)**
 for PR 14158 at commit 
[`41c2daa`](https://github.com/apache/spark/commit/41c2daa19a4b4dc340f6345e2624fd269565638b).





[GitHub] spark issue #14045: [SPARK-16362][SQL][WIP] Support ArrayType and StructType...

2016-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14045
  
**[Test build #62363 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62363/consoleFull)**
 for PR 14045 at commit 
[`1788d4c`](https://github.com/apache/spark/commit/1788d4c3fb9d547390cdea2bcf28c597bee540d2).





[GitHub] spark issue #14214: [SPARK-16545][SQL] Eliminate one unnecessary round of ph...

2016-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14214
  
**[Test build #62362 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62362/consoleFull)**
 for PR 14214 at commit 
[`8ec635f`](https://github.com/apache/spark/commit/8ec635fe7403baf5149e3f6714872bf706b37cd7).





[GitHub] spark pull request #14214: [SPARK-16545][SQL] Eliminate one unnecessary roun...

2016-07-14 Thread lw-lin
GitHub user lw-lin opened a pull request:

https://github.com/apache/spark/pull/14214

[SPARK-16545][SQL] Eliminate one unnecessary round of physical planning in 
ForeachSink

## Problem

As reported by 
[SPARK-16545](https://issues.apache.org/jira/browse/SPARK-16545), in 
`ForeachSink` we initiate 3 rounds of physical planning.

Specifically:

[1] In `StreamExecution`, 
[lastExecution.executedPlan](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala#L369)

[2] In `ForeachSink`, 
[foreachPartition()](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ForeachSink.scala#L69)
 calls withNewExecutionId(..., **_queryExecution_**), which further calls 
[**_queryExecution_**.executedPlan](https://github.com/apache/spark/blob/9a5071996b968148f6b9aba12e0d3fe888d9acd8/sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala#L55)
 
[3] In `ForeachSink`, [val rdd = { ... incrementalExecution = new 
IncrementalExecution 
...}](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ForeachSink.scala#L53)

## What changes were proposed in this pull request?

[1] should not be eliminated in general;

**[2] is eliminated by this patch, by replacing the `queryExecution` with 
`incrementalExecution` provided by [3];**

[3] should be eliminated but cannot be done at this stage; let's revisit 
it when SPARK-16264 is resolved.


## How was this patch tested?

- checked manually that there are now only 2 rounds of physical planning in 
ForeachSink after this patch
- existing tests ensure it causes no regression


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lw-lin/spark physical-3x

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14214.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14214


commit 8ec635fe7403baf5149e3f6714872bf706b37cd7
Author: Liwei Lin 
Date:   2016-07-15T02:12:02Z

Fix foreachPartition







[GitHub] spark issue #14169: [SPARK-16515][SQL]set default record reader and writer f...

2016-07-14 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/14169
  
What do you mean by "Since Spark 2.0 has deleted those config keys from 
hive conf"?






[GitHub] spark issue #13990: [SPARK-16287][SQL] Implement str_to_map SQL function

2016-07-14 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/13990
  
@cloud-fan anything else, or is it good to merge?





[GitHub] spark issue #14203: [SPARK-16546][SQL][PySpark] update python dataframe.drop

2016-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14203
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14203: [SPARK-16546][SQL][PySpark] update python dataframe.drop

2016-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14203
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62361/
Test PASSed.





[GitHub] spark issue #14203: [SPARK-16546][SQL][PySpark] update python dataframe.drop

2016-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14203
  
**[Test build #62361 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62361/consoleFull)**
 for PR 14203 at commit 
[`3952ea0`](https://github.com/apache/spark/commit/3952ea059945b014323cbdba22766212bfe25b54).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #14203: [SPARK-16546][SQL][PySpark] update python datafra...

2016-07-14 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/14203#discussion_r70913944
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1416,13 +1416,25 @@ def drop(self, col):
 
 >>> df.join(df2, df.name == df2.name, 
'inner').drop(df2.name).collect()
 [Row(age=5, name=u'Bob', height=85)]
+
+>>> df.join(df2, df.name == df2.name, 'inner').drop(df2.name) \\
--- End diff --

@rxin Now I update the testcase to make it clearer. Thanks~





[GitHub] spark issue #14203: [SPARK-16546][SQL][PySpark] update python dataframe.drop

2016-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14203
  
**[Test build #62361 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62361/consoleFull)**
 for PR 14203 at commit 
[`3952ea0`](https://github.com/apache/spark/commit/3952ea059945b014323cbdba22766212bfe25b54).





[GitHub] spark pull request #14211: [SPARK-16557][SQL] Remove stale doc in sql/README...

2016-07-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14211





[GitHub] spark issue #14211: [SPARK-16557][SQL] Remove stale doc in sql/README.md

2016-07-14 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/14211
  
Merging in master/2.0.






[GitHub] spark issue #14210: [SPARK-16556] [SPARK-16559] [SQL] Fix Two Bugs in Bucket...

2016-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14210
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14210: [SPARK-16556] [SPARK-16559] [SQL] Fix Two Bugs in Bucket...

2016-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14210
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62359/
Test PASSed.





[GitHub] spark issue #14210: [SPARK-16556] [SPARK-16559] [SQL] Fix Two Bugs in Bucket...

2016-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14210
  
**[Test build #62359 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62359/consoleFull)**
 for PR 14210 at commit 
[`2d76a9f`](https://github.com/apache/spark/commit/2d76a9f1eb50aef1d8036fd59b315bfa401195b3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14154: [SPARK-16497][SQL] Don't throw an exception if drop non-...

2016-07-14 Thread lianhuiwang
Github user lianhuiwang commented on the issue:

https://github.com/apache/spark/pull/14154
  
OK, I close it. Thanks.





[GitHub] spark pull request #14154: [SPARK-16497][SQL] Don't throw an exception if dr...

2016-07-14 Thread lianhuiwang
Github user lianhuiwang closed the pull request at:

https://github.com/apache/spark/pull/14154





[GitHub] spark pull request #14201: [SPARK-14702] Make environment of SparkLauncher l...

2016-07-14 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/14201#discussion_r70912587
  
--- Diff: launcher/src/main/java/org/apache/spark/launcher/SparkLauncher.java ---
@@ -359,6 +364,82 @@ public SparkLauncher setVerbose(boolean verbose) {
   }
 
   /**
+   * Sets the working directory of the driver process.
+   * @param dir The directory to set as the driver's working directory.
+   * @return This launcher.
+   */
+  public SparkLauncher directory(File dir) {
+    builder.workingDir = dir;
+    return this;
+  }
+
+  /**
+   * Specifies that stderr in the driver should be redirected to stdout.
+   * @return This launcher.
+   */
+  public SparkLauncher redirectError() {
+    builder.redirectErrorStream = true;
+    return this;
+  }
+
+  /**
+   * Redirects error output to the specified Redirect.
+   * @param to The method of redirection.
+   * @return This launcher.
+   */
+  public SparkLauncher redirectError(ProcessBuilder.Redirect to) {
+    builder.errorStream = to;
+    return this;
+  }
+
+  /**
+   * Redirects standard output to the specified Redirect.
+   * @param to The method of redirection.
+   * @return This launcher.
+   */
+  public SparkLauncher redirectOutput(ProcessBuilder.Redirect to) {
+    builder.outputStream = to;
+    return this;
+  }
+
+  /**
+   * Redirects error output to the specified File.
+   * @param errFile The file to which stderr is written.
+   * @return This launcher.
+   */
+  public SparkLauncher redirectError(File errFile) {
+    builder.errorStream = ProcessBuilder.Redirect.to(errFile);
+    return this;
+  }
+
+  /**
+   * Redirects standard output to the specified File.
+   * @param outFile The file to which stdout is written.
+   * @return This launcher.
+   */
+  public SparkLauncher redirectOutput(File outFile) {
+    builder.outputStream = ProcessBuilder.Redirect.to(outFile);
+    return this;
+  }
+
+  /**
+   * Sets all output to be logged and redirected to a logger with the specified name.
+   * @param loggerName The name of the logger to log stdout and stderr.
+   * @return This launcher.
+   */
+  public SparkLauncher redirectToLog(String loggerName) {
+    try {
+      // NOTE: the below ordering is important, so builder.redirectToLog is only set to true iff
+      // the preceding put() finishes without exception.
+      builder.getEffectiveConfig().put(CHILD_PROCESS_LOGGER_NAME, loggerName);
--- End diff --

No, `getEffectiveConfig()` is updated whenever you modify the configuration 
(e.g. via `setConf`).
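The redirect methods in the diff above are thin wrappers over `java.lang.ProcessBuilder`. A minimal stand-alone sketch of the underlying JDK behavior being delegated to (the class name `RedirectDemo` is invented for illustration; this is not SparkLauncher itself):

```java
import java.io.File;
import java.nio.file.Files;

public class RedirectDemo {
    public static void main(String[] args) throws Exception {
        // "java -version" writes its banner to stderr, so it only lands in the
        // output file if the stderr->stdout merge happens before redirection.
        File out = File.createTempFile("proc-out", ".log");
        out.deleteOnExit();

        ProcessBuilder pb = new ProcessBuilder("java", "-version");
        pb.redirectErrorStream(true);                        // analogous to redirectError() above
        pb.redirectOutput(ProcessBuilder.Redirect.to(out));  // analogous to redirectOutput(File)
        pb.start().waitFor();

        String log = new String(Files.readAllBytes(out.toPath()));
        System.out.println(log.isEmpty() ? "nothing captured" : "captured java version banner");
    }
}
```

This also shows why the launcher can offer both `ProcessBuilder.Redirect` and `File` overloads: the `File` variants are just `Redirect.to(file)` applied internally.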





[GitHub] spark pull request #14201: [SPARK-14702] Make environment of SparkLauncher l...

2016-07-14 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/14201#discussion_r70912543
  
--- Diff: launcher/src/main/java/org/apache/spark/launcher/SparkLauncher.java ---
@@ -82,8 +83,12 @@
   /** Used internally to create unique logger names. */
   private static final AtomicInteger COUNTER = new AtomicInteger();
 
+  public static final ThreadFactory REDIRECTOR_FACTORY = new NamedThreadFactory("launcher-proc-%d");
--- End diff --

It doesn't need to be public. Package private (a.k.a. no modifier) is 
enough.





[GitHub] spark issue #14213: [SPARK-14817][ML][MLLIB][DOC] Made DataFrame-based API p...

2016-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14213
  
**[Test build #62360 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62360/consoleFull)**
 for PR 14213 at commit 
[`00c9941`](https://github.com/apache/spark/commit/00c9941b7c113afc1d7ab2a59b50c208f46e0cc9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14213: [SPARK-14817][ML][MLLIB][DOC] Made DataFrame-based API p...

2016-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14213
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62360/
Test PASSed.





[GitHub] spark issue #14213: [SPARK-14817][ML][MLLIB][DOC] Made DataFrame-based API p...

2016-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14213
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14213: [SPARK-14817][ML][MLLIB][DOC] Made DataFrame-based API p...

2016-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14213
  
**[Test build #62360 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62360/consoleFull)**
 for PR 14213 at commit 
[`00c9941`](https://github.com/apache/spark/commit/00c9941b7c113afc1d7ab2a59b50c208f46e0cc9).





[GitHub] spark pull request #14213: [SPARK-14817][ML][MLLIB][DOC] Made DataFrame-base...

2016-07-14 Thread jkbradley
GitHub user jkbradley opened a pull request:

https://github.com/apache/spark/pull/14213

[SPARK-14817][ML][MLLIB][DOC] Made DataFrame-based API primary in MLlib 
guide

## What changes were proposed in this pull request?

Made DataFrame-based API primary
* Spark doc menu bar and other places now link to ml-guide.html, not 
mllib-guide.html
* mllib-guide.html keeps RDD-specific list of features, with a link at the 
top redirecting people to ml-guide.html
* ml-guide.html includes a "maintenance mode" announcement about the 
RDD-based API
  * **Reviewers: please check this carefully**
* (minor) Titles for DF API no longer include "- spark.ml" suffix.  Titles 
for RDD API have "- RDD-based API" suffix
* Moved migration guide to ml-guide from mllib-guide
  * Also moved past guides from mllib-migration-guides to 
ml-migration-guides, with a redirect link on mllib-migration-guides
  * **Reviewers**: I did not change any of the content of the migration 
guides.

Reorganized DataFrame-based guide:
* ml-guide.html mimics the old mllib-guide.html page in terms of content: 
overview, migration guide, etc.
* Moved Pipeline description into ml-pipeline.html and moved tuning into 
ml-tuning.html
  * **Reviewers**: I did not change the content of these guides, except 
some intro text.
* Sidebar remains the same, but with pipeline and tuning sections added

Other:
* ml-classification-regression.html: Moved text about linear methods to new 
section in page

## How was this patch tested?

Generated docs locally

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jkbradley/spark ml-guide-2.0

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14213.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14213


commit 00c9941b7c113afc1d7ab2a59b50c208f46e0cc9
Author: Joseph K. Bradley 
Date:   2016-07-15T01:18:36Z

Reorganized MLlib Programming Guide to make DataFrame-based API the primary 
API







[GitHub] spark pull request #14129: [SPARK-16280][SQL][WIP] Implement histogram_numer...

2016-07-14 Thread tilumi
GitHub user tilumi reopened a pull request:

https://github.com/apache/spark/pull/14129

[SPARK-16280][SQL][WIP] Implement histogram_numeric SQL function

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)


## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)


(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tilumi/spark SPARK-16280

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14129.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14129


commit c286d187fdc51dcb3e9bb65a0e1b250ec3049391
Author: Lucas Yang 
Date:   2016-07-11T01:54:42Z

implement histogram_numeric SQL function

commit ced9954206ea921ec2213cbf3a5485054212ebad
Author: Lucas Yang 
Date:   2016-07-13T02:11:22Z

add histogram_numeric test

commit 7a91110ab70f707803edc8f0302a0459f3aee9fc
Author: Lucas Yang 
Date:   2016-07-13T11:17:23Z

add ImperativeNumericHistogram

commit a56e8836c28a9fc189f81173fbabed5332d6adee
Author: Lucas Yang 
Date:   2016-07-15T00:20:23Z

histogram benchmark

commit 62d44c12323fe9684814d1f681a2e7a29884b07d
Author: Lucas Yang 
Date:   2016-07-15T00:39:34Z

polish Benchmark_SPARK_16280

commit 2beadd1f37aab22341e01b75aa6c22bb032da35a
Author: Lucas Yang 
Date:   2016-07-15T01:25:37Z

polish Benchmark_SPARK_16280







[GitHub] spark pull request #14129: [SPARK-16280][SQL][WIP] Implement histogram_numer...

2016-07-14 Thread tilumi
Github user tilumi closed the pull request at:

https://github.com/apache/spark/pull/14129





[GitHub] spark issue #14129: [SPARK-16280][SQL][WIP] Implement histogram_numeric SQL ...

2016-07-14 Thread tilumi
Github user tilumi commented on the issue:

https://github.com/apache/spark/pull/14129
  
I implemented 3 kinds of histogram_numeric and the result is

  (10, 100)).map((pair) => {






[GitHub] spark issue #14211: [SPARK-16557][SQL] Remove stale doc in sql/README.md

2016-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14211
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62356/
Test PASSed.





[GitHub] spark issue #14211: [SPARK-16557][SQL] Remove stale doc in sql/README.md

2016-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14211
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14211: [SPARK-16557][SQL] Remove stale doc in sql/README.md

2016-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14211
  
**[Test build #62356 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62356/consoleFull)**
 for PR 14211 at commit 
[`e507177`](https://github.com/apache/spark/commit/e5071777f6c02a74395c83d5162aa3274ba136e4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #14196: [SPARK-16540][YARN][CORE] Avoid adding jars twice...

2016-07-14 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/14196#discussion_r70909029
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2409,9 +2409,9 @@ private[spark] object Utils extends Logging {
    * "spark.yarn.dist.jars" properties, while in other modes it returns the jar files pointed by
    * only the "spark.jars" property.
    */
-  def getUserJars(conf: SparkConf): Seq[String] = {
+  def getUserJars(conf: SparkConf, isShell: Boolean = false): Seq[String] = {
--- End diff --

Do I still need to update the docs, or maybe this can be done later?





[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14132
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62357/
Test PASSed.





[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14132
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14132
  
**[Test build #62357 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62357/consoleFull)**
 for PR 14132 at commit 
[`f77a0fa`](https://github.com/apache/spark/commit/f77a0fafbce0195133a9680c2b636222ba491e2b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14132
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14132
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62354/
Test PASSed.





[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14132
  
**[Test build #62354 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62354/consoleFull)**
 for PR 14132 at commit 
[`717f47a`](https://github.com/apache/spark/commit/717f47abb5b8574a611c4a256fde3e620fdce92b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14169: [SPARK-16515][SQL]set default record reader and writer f...

2016-07-14 Thread jameszhouyi
Github user jameszhouyi commented on the issue:

https://github.com/apache/spark/pull/14169
  
Hi Spark guys,
Could you please help review this PR so it can be merged into Spark 2.0.0? Thanks
in advance!

Best Regards,
Yi





[GitHub] spark issue #14210: [SPARK-16556] [SPARK-16559] [SQL] Fix Two Bugs in Bucket...

2016-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14210
  
Merged build finished. Test FAILed.





[GitHub] spark issue #14210: [SPARK-16556] [SPARK-16559] [SQL] Fix Two Bugs in Bucket...

2016-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14210
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62355/
Test FAILed.





[GitHub] spark issue #14210: [SPARK-16556] [SPARK-16559] [SQL] Fix Two Bugs in Bucket...

2016-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14210
  
**[Test build #62355 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62355/consoleFull)**
 for PR 14210 at commit 
[`680b6f0`](https://github.com/apache/spark/commit/680b6f0faa835eecb4cd7b9e6add4700fdfa809c).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14210: [SPARK-16556] [SQL] Fix Silent Ignorance of Bucket Speci...

2016-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14210
  
**[Test build #62359 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62359/consoleFull)**
 for PR 14210 at commit 
[`2d76a9f`](https://github.com/apache/spark/commit/2d76a9f1eb50aef1d8036fd59b315bfa401195b3).





[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-14 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/14132
  
The following is updated.
- Add more descriptions and test cases (about finding closest table and 
nested hint).
- Support no parameter hint like `/*+ INDEX */`.
- Generalize `hintStatement` rule.
- Simplify `withHints`.
- Move `toUpperCase` into Analyzer rule.





[GitHub] spark issue #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14079
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62352/
Test PASSed.





[GitHub] spark issue #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14079
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #14201: [SPARK-14702] Make environment of SparkLauncher l...

2016-07-14 Thread andreweduffy
Github user andreweduffy commented on a diff in the pull request:

https://github.com/apache/spark/pull/14201#discussion_r70905418
  
--- Diff: launcher/src/main/java/org/apache/spark/launcher/SparkLauncher.java ---
@@ -359,6 +364,82 @@ public SparkLauncher setVerbose(boolean verbose) {
   }
 
   /**
+   * Sets the working directory of the driver process.
+   * @param dir The directory to set as the driver's working directory.
+   * @return This launcher.
+   */
+  public SparkLauncher directory(File dir) {
+    builder.workingDir = dir;
+    return this;
+  }
+
+  /**
+   * Specifies that stderr in the driver should be redirected to stdout.
+   * @return This launcher.
+   */
+  public SparkLauncher redirectError() {
+    builder.redirectErrorStream = true;
+    return this;
+  }
+
+  /**
+   * Redirects error output to the specified Redirect.
+   * @param to The method of redirection.
+   * @return This launcher.
+   */
+  public SparkLauncher redirectError(ProcessBuilder.Redirect to) {
+    builder.errorStream = to;
+    return this;
+  }
+
+  /**
+   * Redirects standard output to the specified Redirect.
+   * @param to The method of redirection.
+   * @return This launcher.
+   */
+  public SparkLauncher redirectOutput(ProcessBuilder.Redirect to) {
+    builder.outputStream = to;
+    return this;
+  }
+
+  /**
+   * Redirects error output to the specified File.
+   * @param errFile The file to which stderr is written.
+   * @return This launcher.
+   */
+  public SparkLauncher redirectError(File errFile) {
+    builder.errorStream = ProcessBuilder.Redirect.to(errFile);
+    return this;
+  }
+
+  /**
+   * Redirects standard output to the specified File.
+   * @param outFile The file to which stdout is written.
+   * @return This launcher.
+   */
+  public SparkLauncher redirectOutput(File outFile) {
+    builder.outputStream = ProcessBuilder.Redirect.to(outFile);
+    return this;
+  }
+
+  /**
+   * Sets all output to be logged and redirected to a logger with the specified name.
+   * @param loggerName The name of the logger to log stdout and stderr.
+   * @return This launcher.
+   */
+  public SparkLauncher redirectToLog(String loggerName) {
+    try {
+      // NOTE: the ordering below is important, so builder.redirectToLog is only set
+      // to true iff the preceding put() finishes without exception.
+      builder.getEffectiveConfig().put(CHILD_PROCESS_LOGGER_NAME, loggerName);
--- End diff --

Should I also modify `startApplication` to read from `builder.conf`? It appears to use `builder.getEffectiveConfig()`, which, as far as I can tell, is sourced from a properties file.





[GitHub] spark issue #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14079
  
**[Test build #62352 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62352/consoleFull)** for PR 14079 at commit [`351a9a7`](https://github.com/apache/spark/commit/351a9a7e2893a0b90c57233d5e44a52c147bb2a8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14212: [SPARK-16558][Examples][MLlib] examples/mllib/LDAExample...

2016-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14212
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62358/
Test PASSed.





[GitHub] spark issue #14212: [SPARK-16558][Examples][MLlib] examples/mllib/LDAExample...

2016-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14212
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14212: [SPARK-16558][Examples][MLlib] examples/mllib/LDAExample...

2016-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14212
  
**[Test build #62358 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62358/consoleFull)** for PR 14212 at commit [`596aba6`](https://github.com/apache/spark/commit/596aba6c80bb2c9c5f90f6cdeb5a0c20e3590f55).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #14090: [SPARK-16112][SparkR] Programming guide for gappl...

2016-07-14 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14090#discussion_r70905195
  
--- Diff: docs/sparkr.md ---
@@ -316,6 +314,139 @@ head(ldf, 3)
 {% endhighlight %}
 
 
+#### Run a given function on a large dataset grouping by input column(s) and using `gapply` or `gapplyCollect`
+
+##### gapply
+Apply a function to each group of a `SparkDataFrame`. The function is applied to each group of the `SparkDataFrame` and should have only two parameters: the grouping key and an R `data.frame` corresponding to that key. The groups are chosen from the `SparkDataFrame`'s column(s).
+The output of the function should be a `data.frame`. The schema specifies the row format of the resulting `SparkDataFrame`. It must represent the R function's output schema in terms of Spark data types. The column names of the returned `data.frame` are set by the user. The data type mapping between R and Spark is given below.
+
+##### Data type mapping between R and Spark
+
+| R | Spark |
+|-----------|-----------|
+| byte | byte |
+| integer | integer |
+| float | float |
+| double | double |
+| numeric | double |
+| character | string |
+| string | string |
+| binary | binary |
+| raw | binary |
+| logical | boolean |
+| timestamp | timestamp |
+| date | date |
+| array | array |
+| list | array |
+| map | map |
+| env | map |
+| struct |
--- End diff --

I don't think `date` is a type either.





[GitHub] spark issue #14211: [SPARK-16557][SQL] Remove stale doc in sql/README.md

2016-07-14 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/14211
  
LGTM





[GitHub] spark pull request #14201: [SPARK-14702] Make environment of SparkLauncher l...

2016-07-14 Thread andreweduffy
Github user andreweduffy commented on a diff in the pull request:

https://github.com/apache/spark/pull/14201#discussion_r70904921
  
--- Diff: launcher/src/main/java/org/apache/spark/launcher/SparkLauncher.java ---
@@ -82,8 +83,12 @@
   /** Used internally to create unique logger names. */
   private static final AtomicInteger COUNTER = new AtomicInteger();
 
+  public static final ThreadFactory REDIRECTOR_FACTORY = new NamedThreadFactory("launcher-proc-%d");
--- End diff --

How do you think this should be shared without making it `public static` in either `SparkLauncher` or `ChildProcAppHandle`?
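For context, a thread factory with the naming scheme in the diff above is only a few lines. The sketch below is a hypothetical stand-in (Spark's internal `NamedThreadFactory` may differ in detail) showing how a `launcher-proc-%d` format string yields numbered daemon threads for the redirector workers.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a NamedThreadFactory; not Spark's actual implementation.
class NamedThreadFactory implements ThreadFactory {
    private final String nameFormat;
    private final AtomicLong count = new AtomicLong();

    NamedThreadFactory(String nameFormat) {
        this.nameFormat = nameFormat;
    }

    @Override
    public Thread newThread(Runnable r) {
        // Each new thread gets the next sequence number in its name.
        Thread t = new Thread(r, String.format(nameFormat, count.incrementAndGet()));
        t.setDaemon(true); // redirector threads should not keep the JVM alive
        return t;
    }
}

public class NamedThreadFactoryDemo {
    public static void main(String[] args) {
        ThreadFactory f = new NamedThreadFactory("launcher-proc-%d");
        System.out.println(f.newThread(() -> {}).getName());
        System.out.println(f.newThread(() -> {}).getName());
    }
}
```

Making the factory a shared constant (as the diff does) keeps the numbering sequence global across all launched child processes rather than per-launcher.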





[GitHub] spark pull request #14179: [SPARK-16055][SPARKR] warning added while using s...

2016-07-14 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14179#discussion_r70904015
  
--- Diff: R/pkg/R/sparkR.R ---
@@ -155,6 +155,9 @@ sparkR.sparkContext <- function(
 
   existingPort <- Sys.getenv("EXISTING_SPARKR_BACKEND_PORT", "")
   if (existingPort != "") {
+    if (sparkPackages != "") {
+      warning("--packages flag should be used with with spark-submit")
--- End diff --

@shivaram maybe it should, but `sparkR.session()` is already called in the SparkR shell, and calling it again with `sparkPackages` does nothing:
```
> sparkR.session(sparkPackages = "com.databricks:spark-avro_2.10:2.0.1")
Java ref type org.apache.spark.sql.SparkSession id 1
> read.df("", source = "avro")
16/07/14 23:55:43 ERROR RBackendHandler: loadDF on 
org.apache.spark.sql.api.r.SQLUtils failed
Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
  org.apache.spark.sql.AnalysisException: Failed to find data source: avro. 
Please use Spark package 
http://spark-packages.org/package/databricks/spark-avro;
```
@krishnakalyan3 something like "sparkPackages has no effect when using spark-submit or the sparkR shell; please use the --packages command line option instead"





[GitHub] spark issue #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14079
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14079
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62351/
Test PASSed.




