date:20170803

[GitHub] spark issue #18833: [SPARK-21625][SQL] sqrt(negative number) should be null.

2017-08-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18833
  
**[Test build #80200 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80200/testReport)**
 for PR 18833 at commit 
[`f346f5b`](https://github.com/apache/spark/commit/f346f5b240f20b653d4a2c6eaf11660f7f7ff98b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #18833: [SPARK-21625][SQL] sqrt(negative number) should b...

2017-08-03 Thread wangyum

GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/18833

[SPARK-21625][SQL] sqrt(negative number) should be null.

## What changes were proposed in this pull request?

This PR makes `sqrt(negative number)` return null, the same as Hive and MySQL.
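
To illustrate the proposed semantics, here is a minimal Python sketch. It is not the PR's implementation; `null_safe_sqrt` is a hypothetical helper, and `None` stands in for SQL NULL.

```python
import math
from typing import Optional


def null_safe_sqrt(x: Optional[float]) -> Optional[float]:
    """Return sqrt(x), or None (standing in for SQL NULL) when x is NULL or negative."""
    if x is None or x < 0:
        return None
    return math.sqrt(x)


print(null_safe_sqrt(4.0))   # 2.0
print(null_safe_sqrt(-1.0))  # None
```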

## How was this patch tested?

unit tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark SPARK-21625

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18833.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18833


commit f346f5b240f20b653d4a2c6eaf11660f7f7ff98b
Author: Yuming Wang 
Date:   2017-08-03T08:43:52Z

sqrt function deal with negative number.





[GitHub] spark issue #18832: [SPARK-21623][ML]fix RF doc

2017-08-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18832
  
**[Test build #80199 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80199/testReport)**
 for PR 18832 at commit 
[`83c7504`](https://github.com/apache/spark/commit/83c75043fee2a20f1eb6298bd2dab1259409c3ef).



[GitHub] spark issue #18808: [SPARK-21605][BUILD] Let IntelliJ IDEA correctly detect ...

2017-08-03 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18808
  
I hit the same issue! Thanks for fixing it!




[GitHub] spark pull request #18832: [SPARK-21623][ML]fix RF doc

2017-08-03 Thread mpjlu

GitHub user mpjlu opened a pull request:

https://github.com/apache/spark/pull/18832

[SPARK-21623][ML]fix RF doc

## What changes were proposed in this pull request?

The comments on parentStats in RF are wrong.
parentStats is not only used for the first iteration; it is used in all
iterations for unordered features.

## How was this patch tested?



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mpjlu/spark fixRFDoc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18832.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18832


commit 83c75043fee2a20f1eb6298bd2dab1259409c3ef
Author: Peng Meng 
Date:   2017-08-03T08:26:51Z

fix RF doc





[GitHub] spark issue #18821: [SPARK-21615][ML][MLlib][DOCS] Fix broken redirect in co...

2017-08-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18821
  
**[Test build #3875 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3875/testReport)**
 for PR 18821 at commit 
[`c253579`](https://github.com/apache/spark/commit/c253579fb03e2cfef24e422815ff806725049254).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #18811: [SPARK-21604][SQL] if the object extends Logging, i sugg...

2017-08-03 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/18811
  
Ah, I'm sorry I misled you a bit here, @zuotingbing. Yes, you found another
unused variable, but it's in code that is copied directly from Hive. I think
we should leave HiveSessionImplwithUGI untouched. It makes it easier to patch
if we don't have custom changes.

https://github.com/apache/hive/blob/master/service/src/java/org/apache/hive/service/cli/session/HiveSessionImplwithUGI.java

If you'll back that out, and this is the only other change you found, I'll
merge it.



[GitHub] spark issue #18815: [SPARK-21609][WEB-UI]In the Master ui add "log directory...

2017-08-03 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/18815
  
This won't necessarily be available to people viewing the UI, will it? It
could be an HDFS location.



[GitHub] spark issue #18821: [SPARK-21615][ML][MLlib][DOCS] Fix broken redirect in co...

2017-08-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18821
  
**[Test build #3875 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3875/testReport)**
 for PR 18821 at commit 
[`c253579`](https://github.com/apache/spark/commit/c253579fb03e2cfef24e422815ff806725049254).



[GitHub] spark issue #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties from s...

2017-08-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18668
  
**[Test build #80198 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80198/testReport)**
 for PR 18668 at commit 
[`c629cc4`](https://github.com/apache/spark/commit/c629cc49b6af146860b3d7cecdbe4760f347e8c8).



[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...

2017-08-03 Thread yaooqinn

Github user yaooqinn commented on a diff in the pull request:

https://github.com/apache/spark/pull/18668#discussion_r131074348
  
--- Diff: docs/configuration.md ---
@@ -2335,5 +2335,59 @@ The location of these configuration files varies 
across Hadoop versions, but
 a common location is inside of `/etc/hadoop/conf`. Some tools create
 configurations on-the-fly, but offer a mechanisms to download copies of 
them.
 
-To make these files visible to Spark, set `HADOOP_CONF_DIR` in 
`$SPARK_HOME/spark-env.sh`
+To make these files visible to Spark, set `HADOOP_CONF_DIR` in 
`$SPARK_HOME/conf/spark-env.sh`
 to a location containing the configuration files.
+
+# Custom Hadoop/Hive Configuration
+
+If your Spark Application interacting with Hadoop, Hive, or both, there 
are probably Hadoop/Hive
+configuration files in Spark's ClassPath.
--- End diff --

ok



[GitHub] spark issue #18808: [SPARK-21605][BUILD] Let IntelliJ IDEA correctly detect ...

2017-08-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18808
  
**[Test build #3874 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3874/testReport)**
 for PR 18808 at commit 
[`d690a03`](https://github.com/apache/spark/commit/d690a03fc3b735054433b362ef2539af412bc4ff).



[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-03 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/18749
  
Yes, this isn't RTC. It isn't even RTC with 1-vote consensus, because you 
rightly say that you could merge with no other votes if it were obviously not 
required. But whatever. We can pick the process we want, and the name isn't 
important. I agree with your take and I think that's already the current 
process.

This change was clearly reviewed by several other people, so I don't see an 
issue with merging. If the objection was, "you didn't literally write 'LGTM'" 
then no I don't agree that this is important, and not a convention I've seen 
observed consistently. Think about the implication: someone says "yeah this 
looks good" and then you block for a day begging them to come back and write 
'LGTM'? 



[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-03 Thread rxin

Github user rxin commented on the issue:

https://github.com/apache/spark/pull/18749
  
@HyukjinKwon you weren't a committer before :)



[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-03 Thread rxin

Github user rxin commented on the issue:

https://github.com/apache/spark/pull/18749
  
@srowen search for "RTC vs CTR (was: Concerning Sentry...)"

From Todd Lipcon:
```
I don't have incubator stats... nor do I have a good way to measure "most
active" or "most successful" projects in the ASF (seems that itself could
be a 'centithread'-worthy discussion). But a potential proxy could be the
number of stars on github:

https://github.com/search?utf8=%E2%9C%93=user%3Aapache=Repositories=searchresults
 (sort by number of stars)

Of the top ten:

Spark: RTC via github pull request
Storm: RTC (https://storm.apache.org/documentation/BYLAWS.html see "Code
Change")
Cassandra: RTC (based on my skimming the commit log which has "Reviewed by"
quite often)
CouchDB: RTC (http://couchdb.apache.org/bylaws.html see "RTC" section)
Kafka: RTC (based on "Reviewed by" showing up in recent commit logs)
Thrift: CTR
Mesos: RTC (based on reviewboard links in most of the recent commits)
Zookeeper: RTC (based on personal experience and comments above in this
thread)
Cordova: CTR (based on

https://github.com/apache/cordova-coho/blob/master/docs/committer-workflow.md
)
Hadoop: RTC (based on personal experience)

Briefly looking through the #11 through #30 projects I also see a
substantial number which operate on RTC (and others for which I don't know)

So, I don't think there's much evidence that RTC prevents a project from
becoming successful in the eyes of the developer community. Also worth
noting that several of these are relatively new TLPs (i.e. within the last
~3 years) whereas others are quite old but still active and successful.
```

BTW I didn't realize (and neither did Todd) that the ASF's definition of "RTC"
is a minimum three-vote consensus, which is pretty odd.

The convention has always been to find somebody who understands the area to
review and give an explicit LGTM. There are some exceptions, like tiny changes
(basically very similar to your paragraph on judgement calls).




[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-03 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18749
  
(BTW, just for clarification, I think anyone can use that approve feature. 
I did it before - `https://github.com/apache/spark/pull/17734`)



[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-03 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/18749
  
RTC means a vote happens for each change: 
https://www.apache.org/foundation/glossary.html#ReviewThenCommit
That's not what we do. What debate are you referring to?



[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-03 Thread rxin

Github user rxin commented on the issue:

https://github.com/apache/spark/pull/18749
  
Actually Sean, I disagree. Spark has always been review-then-commit, from the
days before it entered the ASF. In a huge debate last year within the ASF on RTC
vs CTR, Spark was cited as a prominent example. Sometimes there are tiny
patches where committers can use their own judgement and that don't
necessarily require somebody else to sign off.



[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...

2017-08-03 Thread yaooqinn

Github user yaooqinn commented on a diff in the pull request:

https://github.com/apache/spark/pull/18668#discussion_r131068575
  
--- Diff: docs/configuration.md ---
@@ -2335,5 +2335,59 @@ The location of these configuration files varies 
across Hadoop versions, but
 a common location is inside of `/etc/hadoop/conf`. Some tools create
 configurations on-the-fly, but offer a mechanisms to download copies of 
them.
 
-To make these files visible to Spark, set `HADOOP_CONF_DIR` in 
`$SPARK_HOME/spark-env.sh`
+To make these files visible to Spark, set `HADOOP_CONF_DIR` in 
`$SPARK_HOME/conf/spark-env.sh`
 to a location containing the configuration files.
+
+# Custom Hadoop/Hive Configuration
+
+If your Spark Application interacting with Hadoop, Hive, or both, there 
are probably Hadoop/Hive
+configuration files in Spark's ClassPath.
+
+In most cases, you may have more than one applications running and rely on 
some different Hadoop/Hive
--- End diff --

OK
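
For context on the `spark.hadoop.*` properties named in this PR's subject line, here is a minimal Python sketch of the general idea: entries carrying that prefix are treated as Hadoop settings, with the prefix stripped to recover the Hadoop key. It models the behaviour with plain dictionaries rather than Spark's own classes; `hadoop_conf_from_spark_props` is a hypothetical helper and `abc.def` is only a placeholder key.

```python
SPARK_HADOOP_PREFIX = "spark.hadoop."


def hadoop_conf_from_spark_props(spark_props: dict) -> dict:
    """Collect spark.hadoop.* entries and strip the prefix to get Hadoop keys."""
    return {
        key[len(SPARK_HADOOP_PREFIX):]: value
        for key, value in spark_props.items()
        if key.startswith(SPARK_HADOOP_PREFIX)
    }


props = {
    "spark.eventLog.enabled": "false",  # ordinary Spark property, ignored here
    "spark.hadoop.abc.def": "xyz",      # becomes Hadoop key "abc.def"
}
print(hadoop_conf_from_spark_props(props))  # {'abc.def': 'xyz'}
```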



[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...

2017-08-03 Thread yaooqinn

Github user yaooqinn commented on a diff in the pull request:

https://github.com/apache/spark/pull/18668#discussion_r131068501
  
--- Diff: docs/configuration.md ---
@@ -2335,5 +2335,59 @@ The location of these configuration files varies 
across Hadoop versions, but
 a common location is inside of `/etc/hadoop/conf`. Some tools create
 configurations on-the-fly, but offer a mechanisms to download copies of 
them.
 
-To make these files visible to Spark, set `HADOOP_CONF_DIR` in 
`$SPARK_HOME/spark-env.sh`
+To make these files visible to Spark, set `HADOOP_CONF_DIR` in 
`$SPARK_HOME/conf/spark-env.sh`
 to a location containing the configuration files.
+
+# Custom Hadoop/Hive Configuration
+
+If your Spark Application interacting with Hadoop, Hive, or both, there 
are probably Hadoop/Hive
+configuration files in Spark's ClassPath.
+
+In most cases, you may have more than one applications running and rely on 
some different Hadoop/Hive
+client side configurations. You can copy and modify `hdfs-site.xml`, 
`core-site.xml`, `yarn-site.xml`,
+`hive-site.xml` in Spark's ClassPath for each application, but it is not 
very convenient and these
+files are best to be shared with common properties to avoid hard-coding 
certain configurations.
+
+The better choice is to use spark hadoop properties in the form of 
`spark.hadoop.*`. 
+They can be considered as same as normal spark properties which can be set 
in `$SPARK_HOME/conf/spark-defalut.conf`
+
+In some cases, you may want to avoid hard-coding certain configurations in 
a `SparkConf`. For
+instance. Spark allows you to simply create an empty conf and set 
spark/spark hadoop properties.
+
+{% highlight scala %}
+val conf = new SparkConf().set("spark.hadoop.abc.def","xyz")
+val sc = new SparkContext(conf)
+{% endhighlight %}
+
+Also, you can modify or add configurations at runtime:
+{% highlight bash %}
+./bin/spark-submit \ 
+  --name "My app" \ 
+  --master local[4] \  
+  --conf spark.eventLog.enabled=false \ 
+  --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails 
-XX:+PrintGCTimeStamps" \ 
+  --conf spark.hadoop.abc.def=xyz
+  myApp.jar
+{% endhighlight %}
+
+## Typical Hadoop/Hive Configurations
+
+
+
+  spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version
+  1
+  
+The file output committer algorithm version, valid algorithm version 
number: 1 or 2.
+Version 2 may have better performance, but version 1 may handle 
failures better in certain situations,
+as per MAPREDUCE-4815 (https://issues.apache.org/jira/browse/MAPREDUCE-4815).
+  
+
+
+
+  spark.hadoop.fs.hdfs.impl.disable.cache
+  false
+  
+Don't cache 'hdfs' filesystem instances. Set true if HDFS Token Expiry 
in long-running spark applicaitons. HDFS-9276 (https://issues.apache.org/jira/browse/HDFS-9276).
--- End diff --

@gatorsmile I guess setting `fs.hdfs.impl.disable.cache` to true disables
caching of the DFS client instance, not the tokens, so `FileSystem.get()` will
always create a new one.
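
The caching behaviour discussed here can be modelled with a toy Python sketch. It is not Hadoop's implementation: `ToyFileSystem` and `get_filesystem` are hypothetical stand-ins for Hadoop's `FileSystem` and `FileSystem.get()`, the cache key is simplified to the URI scheme and authority, and the namenode address is made up. It only illustrates that with `fs.<scheme>.impl.disable.cache` set to true, each call yields a fresh instance instead of a shared cached one.

```python
from urllib.parse import urlparse


class ToyFileSystem:
    """Stand-in for a filesystem client; real clients hold connections and credentials."""

    def __init__(self, uri: str):
        self.uri = uri


_cache: dict = {}


def get_filesystem(uri: str, conf: dict) -> ToyFileSystem:
    """Return a cached client unless fs.<scheme>.impl.disable.cache is 'true'."""
    parsed = urlparse(uri)
    disable_key = f"fs.{parsed.scheme}.impl.disable.cache"
    if conf.get(disable_key, "false") == "true":
        return ToyFileSystem(uri)  # caching disabled: always a new instance
    cache_key = (parsed.scheme, parsed.netloc)
    if cache_key not in _cache:
        _cache[cache_key] = ToyFileSystem(uri)
    return _cache[cache_key]


uri = "hdfs://namenode:8020/data"  # hypothetical address
shared_a = get_filesystem(uri, {})
shared_b = get_filesystem(uri, {})
fresh = get_filesystem(uri, {"fs.hdfs.impl.disable.cache": "true"})
print(shared_a is shared_b)  # True: default behaviour reuses the cached instance
print(fresh is shared_a)     # False: disabling the cache creates a new one
```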



[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-03 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/18749
  
Formally, our model is "Commit Then Review". No approval or vote is 
required for any change, but changes can be retroactively vetoed. 
https://www.apache.org/foundation/glossary.html#CommitThenReview

Of course, we don't actually just commit things and then see what people 
think. But voting on every change is infeasible. It is a judgment call. My 
decision depends on things like: do I believe in good faith that any committer 
would be surprised if I merged this? are they likely to want to review first? 
am I confident enough to take on the consequence of fixing things if this 
causes a problem? Hence I leave changes open longer and ping more people the 
bigger the change is. Eventually, you get feedback from everyone who cares and 
tacit consent from anyone who hasn't commented.

Can you merge a change that nobody else has explicitly approved? Yes, but I
would only do so if it's trivial or if you have asked for input for a while and
nobody has given any.

"LGTM" is not a formal mechanism or vote, and that text doesn't say it is. 
It's there to help contributors understand that's a common way to indicate 
approval.

I didn't realize that only ASF members can use the "Approve/Comment/Request 
Changes" feature. I prefer it just because it renders more nicely and doesn't 
generate an email. I see no reason to prefer that or "LGTM" or "SGTM" or "+1" 
or whatever. Any clear indication of approval is equivalent; these are just 
English words, not a formal vote. I am happy to update the doc to that effect.




[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-03 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18749
  
I see. Thanks for the details and explanation.



[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...

2017-08-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18779
  
**[Test build #80197 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80197/testReport)**
 for PR 18779 at commit 
[`4fce4ab`](https://github.com/apache/spark/commit/4fce4ab3da0ce425b4ba1807d165b4ab05a812b7).



[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-03 Thread rxin

Github user rxin commented on the issue:

https://github.com/apache/spark/pull/18749
  
Ah OK. That's what we are discussing here. In the past it has always been
an explicit "LGTM". That was defined before GitHub even had the approval
feature. Now most committers are actually not ASF GitHub org members, so they
don't even have the permission to use this approve feature, and for whatever
legacy reason almost everybody I know (except Sean) uses an explicit LGTM. We
also have tooling that checks for the LGTM.

So rather than complicating tooling or changing everybody's behavior, I was 
suggesting to @srowen maybe he should just use LGTM.
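
As a rough illustration of what an LGTM check could look like, here is a hypothetical Python sketch. It is not the project's actual tooling; `has_lgtm` and the sample comments are invented for the example. It also shows why other approval phrases would be missed by such a check.

```python
import re

# Match "LGTM" as a standalone token, case-insensitively.
LGTM_PATTERN = re.compile(r"\bLGTM\b", re.IGNORECASE)


def has_lgtm(comments: list) -> bool:
    """Return True if any review comment contains an explicit LGTM."""
    return any(LGTM_PATTERN.search(body) for body in comments)


print(has_lgtm(["Looks fine overall.", "LGTM, merging to master."]))  # True
print(has_lgtm(["yeah this looks good", "SGTM"]))                     # False
```

A plain token match like this would also fire on a comment such as "not LGTM yet", so a real check would need more care.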




[GitHub] spark issue #18804: [SPARK-21599][SQL] Collecting column statistics for data...

2017-08-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18804
  
**[Test build #80196 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80196/testReport)**
 for PR 18804 at commit 
[`c1ab569`](https://github.com/apache/spark/commit/c1ab569f7960846262de20340e14cf3ad939c448).



[GitHub] spark issue #18804: [SPARK-21599][SQL] Collecting column statistics for data...

2017-08-03 Thread dilipbiswal

Github user dilipbiswal commented on the issue:

https://github.com/apache/spark/pull/18804
  
retest this please



[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-03 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18749
  
I was wondering whether other ways of approving a patch, such as SGTM or a
GitHub approval, are not allowed by rule. I usually base what I say on
documentation or references when talking to others, and I really do read the
documentation a lot, but this wasn't in my head. I am asking only because I am
curious and want to be sure I follow the rule.



[GitHub] spark issue #18828: [SPARK-21619][SQL] Fail the execution of canonicalized p...

2017-08-03 Thread rxin

Github user rxin commented on the issue:

https://github.com/apache/spark/pull/18828
  
Still looking into it, but the failure is related to reuse exchange and 
caching.




[GitHub] spark issue #18804: [SPARK-21599][SQL] Collecting column statistics for data...

2017-08-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18804
  
**[Test build #80195 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80195/testReport)**
 for PR 18804 at commit 
[`c1ab569`](https://github.com/apache/spark/commit/c1ab569f7960846262de20340e14cf3ad939c448).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #18831: [SPARK-21622][ML][SparkR] Support offset in SparkR GLM

2017-08-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18831
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80194/
Test FAILed.



[GitHub] spark issue #18831: [SPARK-21622][ML][SparkR] Support offset in SparkR GLM

2017-08-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18831
  
Merged build finished. Test FAILed.



[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-03 Thread rxin

Github user rxin commented on the issue:

https://github.com/apache/spark/pull/18749
  
What's your point? That you should be able to merge a PR without anybody 
reviewing it?



[GitHub] spark issue #18804: [SPARK-21599][SQL] Collecting column statistics for data...

2017-08-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18804
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80195/
Test FAILed.



[GitHub] spark issue #18831: [SPARK-21622][ML][SparkR] Support offset in SparkR GLM

2017-08-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18831
  
**[Test build #80194 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80194/testReport)**
 for PR 18831 at commit 
[`6ec068e`](https://github.com/apache/spark/commit/6ec068e5f48d393d539f4600bca3cbd1ea7d65a3).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #18804: [SPARK-21599][SQL] Collecting column statistics for data...

2017-08-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18804
  
Merged build finished. Test FAILed.



[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-03 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18749
  
I understand that Spark uses LGTM as a convention and that it is a good way 
to signal approval as a sign-off. What I meant is: is "LGTM" specifically, 
rather than any other method, required before merging?



[GitHub] spark issue #18628: [SPARK-18061][ThriftServer] Add spnego auth support for ...

2017-08-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18628
  
Merged build finished. Test PASSed.



[GitHub] spark issue #18628: [SPARK-18061][ThriftServer] Add spnego auth support for ...

2017-08-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18628
  
**[Test build #80193 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80193/testReport)**
 for PR 18628 at commit 
[`12565cd`](https://github.com/apache/spark/commit/12565cdc8a11761d3b6e383807a873496a8e7f0d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #18628: [SPARK-18061][ThriftServer] Add spnego auth support for ...

2017-08-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18628
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80193/
Test PASSed.



[GitHub] spark issue #18831: [SPARK-21622][ML][SparkR] Support offset in SparkR GLM

2017-08-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18831
  
**[Test build #80194 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80194/testReport)**
 for PR 18831 at commit 
[`6ec068e`](https://github.com/apache/spark/commit/6ec068e5f48d393d539f4600bca3cbd1ea7d65a3).



[GitHub] spark issue #18804: [SPARK-21599][SQL] Collecting column statistics for data...

2017-08-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18804
  
**[Test build #80195 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80195/testReport)**
 for PR 18804 at commit 
[`c1ab569`](https://github.com/apache/spark/commit/c1ab569f7960846262de20340e14cf3ad939c448).



[GitHub] spark pull request #18831: [SPARK-21622][ML][SparkR] Support offset in Spark...

2017-08-03 Thread actuaryzhang

GitHub user actuaryzhang opened a pull request:

https://github.com/apache/spark/pull/18831

[SPARK-21622][ML][SparkR] Support offset in SparkR GLM

## What changes were proposed in this pull request?
Support offset in SparkR GLM #16699 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/actuaryzhang/spark sparkROffset

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18831.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18831


commit 6ec068e5f48d393d539f4600bca3cbd1ea7d65a3
Author: actuaryzhang 
Date:   2017-08-03T06:37:41Z

add offset to SparkR





[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-03 Thread rxin

Github user rxin commented on the issue:

https://github.com/apache/spark/pull/18749
  
Yes.




[GitHub] spark issue #18628: [SPARK-18061][ThriftServer] Add spnego auth support for ...

2017-08-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18628
  
**[Test build #80193 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80193/testReport)**
 for PR 18628 at commit 
[`12565cd`](https://github.com/apache/spark/commit/12565cdc8a11761d3b6e383807a873496a8e7f0d).



[GitHub] spark issue #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with the im...

2017-08-03 Thread yanboliang

Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/18538
  
ok to test



[GitHub] spark issue #18830: [SPARK-21621][Core] Reset numRecordsWritten after DiskBl...

2017-08-03 Thread jerryshao

Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/18830
  
Nice catch, looks good to me.



[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-03 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18749
  
Hm, do you mean this line?

> Reviewers can indicate that a change looks suitable for merging with a 
comment such as: “I think this patch looks good”. Spark uses the LGTM 
convention for indicating the strongest level of technical sign-off on a patch: 
simply comment with the word “LGTM”. It specifically means: “I’ve 
looked at this thoroughly and take as much ownership as if I wrote the patch 
myself”. If you comment LGTM you will be expected to help with bugs or 
follow-up issues on the patch. Consistent, judicious use of LGTMs is a great 
way to gain credibility as a reviewer with the broader community.




[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...

2017-08-03 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18779
  
@gatorsmile 

scala> df.groupBy(lit(2)).agg(col("a")).queryExecution.logical
res6: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
'Aggregate [2], [2 AS 2#51, unresolvedalias('a, None)]
+- Project [_1#3 AS a#7, _2#4 AS b#8, _3#5 AS c#9]
   +- LocalRelation [_1#3, _2#4, _3#5]

scala> df.groupBy(lit(2)).agg(col("a")).queryExecution.analyzed
res7: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
Aggregate [a#7], [2 AS 2#59, a#7]
+- Project [_1#3 AS a#7, _2#4 AS b#8, _3#5 AS c#9]
   +- LocalRelation [_1#3, _2#4, _3#5]

The int literal `2` in the grouping expression is replaced with `a`.
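
The substitution visible in the two plans above can be mimicked with a small toy model. This is a sketch in plain Python, not Spark's analyzer code: the function `substitute_ordinals`, its `group_by_ordinal` switch, and the list-of-strings plan representation are all hypothetical. It assumes that an integer literal N in the grouping list is read as a 1-based position into the aggregate list, which is how the plans above behave (`2` resolves to the second output, `a`).

```python
def substitute_ordinals(grouping, aggregates, group_by_ordinal=True):
    """Toy model (hypothetical): replace int literals in `grouping` with
    the aggregate expression at that 1-based position."""
    if not group_by_ordinal:
        # Switch off: literals are kept as plain literals.
        return list(grouping)
    resolved = []
    for expr in grouping:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(expr, int) and not isinstance(expr, bool):
            if not 1 <= expr <= len(aggregates):
                raise ValueError(f"GROUP BY position {expr} is out of range")
            resolved.append(aggregates[expr - 1])
        else:
            resolved.append(expr)
    return resolved


# Mirrors df.groupBy(lit(2)).agg(col("a")): the aggregate list is [2 AS 2, a].
aggregates = ["2 AS 2", "a"]
print(substitute_ordinals([2], aggregates))         # ['a']
print(substitute_ordinals([2], aggregates, False))  # [2]
```

With `group_by_ordinal=False` the literal is left untouched, which is the behavior one would expect if ordinal substitution were not applied to this call.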



[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...

2017-08-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18779
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80191/
Test FAILed.



[GitHub] spark issue #18804: [SPARK-21599][SQL] Collecting column statistics for data...

2017-08-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18804
  
Merged build finished. Test FAILed.



[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...

2017-08-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18779
  
Merged build finished. Test FAILed.



[GitHub] spark issue #18804: [SPARK-21599][SQL] Collecting column statistics for data...

2017-08-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18804
  
**[Test build #80192 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80192/testReport)**
 for PR 18804 at commit 
[`7a8fa2c`](https://github.com/apache/spark/commit/7a8fa2cc6e339c19dfd68eb1dcc9e754c6bdc865).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #18804: [SPARK-21599][SQL] Collecting column statistics for data...

2017-08-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18804
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80192/
Test FAILed.



[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...

2017-08-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18779
  
**[Test build #80191 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80191/testReport)**
 for PR 18779 at commit 
[`c41f209`](https://github.com/apache/spark/commit/c41f209b619f118b237b7641de9d7cc457e4c5e0).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #18804: [SPARK-21599][SQL] Collecting column statistics for data...

2017-08-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18804
  
**[Test build #80192 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80192/testReport)**
 for PR 18804 at commit 
[`7a8fa2c`](https://github.com/apache/spark/commit/7a8fa2cc6e339c19dfd68eb1dcc9e754c6bdc865).



[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...

2017-08-03 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18779
  
When using the Dataset groupBy API with int literals as grouping 
expressions, do we filter this case out before substituting 
`UnresolvedOrdinals`? There seems to be no logic that prevents it.



[GitHub] spark issue #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties from s...

2017-08-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18668
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80186/
Test PASSed.



[GitHub] spark issue #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties from s...

2017-08-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18668
  
Merged build finished. Test PASSed.



[GitHub] spark issue #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties from s...

2017-08-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18668
  
**[Test build #80186 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80186/testReport)**
 for PR 18668 at commit 
[`10d624c`](https://github.com/apache/spark/commit/10d624cdeed4b14668e042bc5ad2591eea9817bb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...

2017-08-03 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18779
  
Do our Dataset APIs support `conf.groupByOrdinal`? If so, that would surprise 
me; `conf.groupByOrdinal` was introduced for the SQL APIs only.



[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-03 Thread rxin

Github user rxin commented on the issue:

https://github.com/apache/spark/pull/18749
  
It is documented: http://spark.apache.org/contributing.html

It's been the convention forever and it's also good to use one way rather 
than multiple, so I'd prefer us just using that ... until we have a compelling 
reason to change.




[GitHub] spark issue #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties from s...

2017-08-03 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18668
  
 @vanzin @zhzhan @tejasapatil Could you also help review the documentation 
and the fix? Thanks!



[GitHub] spark issue #18809: [SPARK-21602][R] Add map_keys and map_values functions t...

2017-08-03 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18809
  
Thank you @felixcheung and @actuaryzhang.



[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...

2017-08-03 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18668#discussion_r131060237
  
--- Diff: docs/configuration.md ---
@@ -2335,5 +2335,59 @@ The location of these configuration files varies across Hadoop versions, but
 a common location is inside of `/etc/hadoop/conf`. Some tools create
 configurations on-the-fly, but offer a mechanisms to download copies of them.
 
-To make these files visible to Spark, set `HADOOP_CONF_DIR` in `$SPARK_HOME/spark-env.sh`
+To make these files visible to Spark, set `HADOOP_CONF_DIR` in `$SPARK_HOME/conf/spark-env.sh`
 to a location containing the configuration files.
+
+# Custom Hadoop/Hive Configuration
+
+If your Spark Application interacting with Hadoop, Hive, or both, there are probably Hadoop/Hive
+configuration files in Spark's ClassPath.
+
+In most cases, you may have more than one applications running and rely on some different Hadoop/Hive
--- End diff --

`In most cases, you may have more than one applications running and rely on 
some different Hadoop/Hive`
->
`Multiple running applications might require different Hadoop/Hive client 
side configurations.`



[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...

2017-08-03 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18668#discussion_r131059952
  
--- Diff: docs/configuration.md ---
@@ -2335,5 +2335,59 @@ The location of these configuration files varies 
across Hadoop versions, but
 a common location is inside of `/etc/hadoop/conf`. Some tools create
 configurations on-the-fly, but offer a mechanisms to download copies of 
them.
 
-To make these files visible to Spark, set `HADOOP_CONF_DIR` in 
`$SPARK_HOME/spark-env.sh`
+To make these files visible to Spark, set `HADOOP_CONF_DIR` in 
`$SPARK_HOME/conf/spark-env.sh`
 to a location containing the configuration files.
+
+# Custom Hadoop/Hive Configuration
+
+If your Spark Application interacting with Hadoop, Hive, or both, there 
are probably Hadoop/Hive
+configuration files in Spark's ClassPath.
--- End diff --

`ClassPath ` -> `class path`



[GitHub] spark issue #18809: [SPARK-21602][R] Add map_keys and map_values functions t...

2017-08-03 Thread actuaryzhang

Github user actuaryzhang commented on the issue:

https://github.com/apache/spark/pull/18809
  
LGTM



[GitHub] spark pull request #18628: [SPARK-18061][ThriftServer] Add spnego auth suppo...

2017-08-03 Thread jerryshao

Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/18628#discussion_r131059886
  
--- Diff: 
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIService.scala
 ---
@@ -57,6 +59,19 @@ private[hive] class SparkSQLCLIService(hiveServer: 
HiveServer2, sqlContext: SQLC
 case e @ (_: IOException | _: LoginException) =>
   throw new ServiceException("Unable to login to kerberos with 
given principal/keytab", e)
   }
+
+  // Try creating spnego UGI if it is configured.
+  val principal = 
hiveConf.getVar(ConfVars.HIVE_SERVER2_SPNEGO_PRINCIPAL)
+  val keyTabFile = hiveConf.getVar(ConfVars.HIVE_SERVER2_SPNEGO_KEYTAB)
--- End diff --

I searched the code; it looks like no other place that uses 
`HiveConf.getVar` trims the result.



[GitHub] spark issue #18830: [SPARK-21621][Core] Reset numRecordsWritten after DiskBl...

2017-08-03 Thread ConeyLiu

Github user ConeyLiu commented on the issue:

https://github.com/apache/spark/pull/18830
  
@cloud-fan @vanzin Would you mind taking a look? Thanks a lot.



[GitHub] spark issue #18830: [SPARK-21621][Core] Reset numRecordsWritten after DiskBl...

2017-08-03 Thread ConeyLiu

Github user ConeyLiu commented on the issue:

https://github.com/apache/spark/pull/18830
  
As you can see at 
[L208](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/DiskBlockObjectWriter.scala#L208),
 when `revertPartialWritesAndClose` is called, the number of written records 
is decreased to 0.



[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...

2017-08-03 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18668#discussion_r131059673
  
--- Diff: docs/configuration.md ---
@@ -2335,5 +2335,59 @@ The location of these configuration files varies 
across Hadoop versions, but
 a common location is inside of `/etc/hadoop/conf`. Some tools create
 configurations on-the-fly, but offer a mechanisms to download copies of 
them.
 
-To make these files visible to Spark, set `HADOOP_CONF_DIR` in 
`$SPARK_HOME/spark-env.sh`
+To make these files visible to Spark, set `HADOOP_CONF_DIR` in 
`$SPARK_HOME/conf/spark-env.sh`
 to a location containing the configuration files.
+
+# Custom Hadoop/Hive Configuration
+
+If your Spark Application interacting with Hadoop, Hive, or both, there 
are probably Hadoop/Hive
--- End diff --

`Application ` -> `applications`



[GitHub] spark pull request #18824: [SPARK-21617][SQL] Store correct metadata in Hive...

2017-08-03 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18824#discussion_r131059637
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -413,7 +414,10 @@ private[hive] class HiveClientImpl(
 unsupportedFeatures += "partitioned view"
   }
 
-  val properties = Option(h.getParameters).map(_.asScala.toMap).orNull
+  val properties = 
Option(h.getParameters).map(_.asScala.toMap).getOrElse(Map())
+
+  val provider = 
properties.get(HiveExternalCatalog.DATASOURCE_PROVIDER)
+.orElse(Some(DDLUtils.HIVE_PROVIDER))
--- End diff --

Oh, never mind. It looks like we access the key `DATASOURCE_PROVIDER` in 
`table.properties` for that purpose, so this should be safe. In any case, we 
set `provider` for `CatalogTable` later, when restoring the table read from the 
metastore.

Another concern: previously we did not restore `provider` for a view; 
please refer to 
https://github.com/apache/spark/blob/f18b905f6cace7686ef169fda7de474079d0af23/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala#L682.
 With this change, we will set `provider` to `HIVE_PROVIDER` for views too.
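To make the behaviour under discussion concrete, here is a minimal sketch of the two `Option` patterns in the diff, written in plain Scala without any Spark dependency. The object name, the property key and the `"hive"` value are placeholders assumed for illustration; they stand in for `HiveExternalCatalog.DATASOURCE_PROVIDER` and `DDLUtils.HIVE_PROVIDER` and are not taken from the Spark source.

```scala
object ProviderSketch {
  // Placeholder values, assumed for illustration only.
  val ProviderKey = "example.datasource.provider"
  val HiveProvider = "hive"

  // Mirrors `Option(h.getParameters)...getOrElse(Map())`:
  // a null parameter map becomes an empty map instead of null.
  def properties(raw: Map[String, String]): Map[String, String] =
    Option(raw).getOrElse(Map.empty[String, String])

  // Mirrors `properties.get(KEY).orElse(Some(HIVE_PROVIDER))`:
  // the result is always defined, falling back to the Hive provider.
  def provider(props: Map[String, String]): Option[String] =
    props.get(ProviderKey).orElse(Some(HiveProvider))
}
```

Because the fallback always yields `Some(...)`, any table whose properties lack the key, which would include a view in this sketch, ends up with the Hive provider. That is the side effect raised in the comment above.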





[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-03 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18749
  
I think we should rather make https://spark-prs.appspot.com recognise 
GitHub approvals as well. I considered the GitHub approval as an approval for 
this patch.

BTW, is it currently documented anywhere that "LGTM" is required?



[GitHub] spark issue #18830: [SPARK-21621][Core] Reset numRecordsWritten after DiskBl...

2017-08-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18830
  
Can one of the admins verify this patch?



[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...

2017-08-03 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18668#discussion_r131059429
  
--- Diff: docs/configuration.md ---
@@ -2335,5 +2335,59 @@ The location of these configuration files varies 
across Hadoop versions, but
 a common location is inside of `/etc/hadoop/conf`. Some tools create
 configurations on-the-fly, but offer a mechanisms to download copies of 
them.
 
-To make these files visible to Spark, set `HADOOP_CONF_DIR` in 
`$SPARK_HOME/spark-env.sh`
+To make these files visible to Spark, set `HADOOP_CONF_DIR` in 
`$SPARK_HOME/conf/spark-env.sh`
 to a location containing the configuration files.
+
+# Custom Hadoop/Hive Configuration
+
+If your Spark Application interacting with Hadoop, Hive, or both, there 
are probably Hadoop/Hive
+configuration files in Spark's ClassPath.
+
+In most cases, you may have more than one applications running and rely on 
some different Hadoop/Hive
+client side configurations. You can copy and modify `hdfs-site.xml`, 
`core-site.xml`, `yarn-site.xml`,
+`hive-site.xml` in Spark's ClassPath for each application, but it is not 
very convenient and these
+files are best to be shared with common properties to avoid hard-coding 
certain configurations.
+
+The better choice is to use spark hadoop properties in the form of 
`spark.hadoop.*`. 
+They can be considered as same as normal spark properties which can be set 
in `$SPARK_HOME/conf/spark-defalut.conf`
+
+In some cases, you may want to avoid hard-coding certain configurations in 
a `SparkConf`. For
+instance. Spark allows you to simply create an empty conf and set 
spark/spark hadoop properties.
+
+{% highlight scala %}
+val conf = new SparkConf().set("spark.hadoop.abc.def","xyz")
+val sc = new SparkContext(conf)
+{% endhighlight %}
+
+Also, you can modify or add configurations at runtime:
+{% highlight bash %}
+./bin/spark-submit \ 
+  --name "My app" \ 
+  --master local[4] \  
+  --conf spark.eventLog.enabled=false \ 
+  --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails 
-XX:+PrintGCTimeStamps" \ 
+  --conf spark.hadoop.abc.def=xyz
+  myApp.jar
+{% endhighlight %}
+
+## Typical Hadoop/Hive Configurations
+
+
+
+  spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version
+  1
+  
+The file output committer algorithm version, valid algorithm version 
number: 1 or 2.
+Version 2 may have better performance, but version 1 may handle 
failures better in certain situations,
+as per https://issues.apache.org/jira/browse/MAPREDUCE-4815;>MAPREDUCE-4815.
+  
+
+
+
+  spark.hadoop.fs.hdfs.impl.disable.cache
+  false
+  
+Don't cache 'hdfs' filesystem instances. Set true if HDFS Token Expiry 
in long-running spark applicaitons.https://issues.apache.org/jira/browse/HDFS-9276;>HDFS-9276.
--- End diff --

`When true, HDFS instances do not cache delegation tokens. With the cached 
tokens, HDFS delegation token updates might fail in long-running Spark 
applications.`
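For readers following the `spark.hadoop.*` discussion: as I understand it, Spark copies every property with that prefix into the Hadoop configuration with the prefix stripped. The sketch below models only that mapping, using plain Scala maps; the object and method names are hypothetical and this is not the code Spark itself uses.

```scala
object SparkHadoopPrefixSketch {
  private val Prefix = "spark.hadoop."

  // Keep only `spark.hadoop.*` entries and drop the prefix,
  // so `spark.hadoop.a.b=c` becomes the Hadoop property `a.b=c`.
  def toHadoopProps(sparkProps: Map[String, String]): Map[String, String] =
    sparkProps.collect {
      case (key, value) if key.startsWith(Prefix) =>
        key.stripPrefix(Prefix) -> value
    }
}
```

Under this model, setting `spark.hadoop.fs.hdfs.impl.disable.cache=true` would surface as `fs.hdfs.impl.disable.cache=true` on the Hadoop side, while ordinary Spark properties such as `spark.eventLog.enabled` are left out.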



[GitHub] spark pull request #18830: [SPARK-21621][Core] Reset numRecordsWritten after...

2017-08-03 Thread ConeyLiu

GitHub user ConeyLiu opened a pull request:

https://github.com/apache/spark/pull/18830

[SPARK-21621][Core] Reset numRecordsWritten after 
DiskBlockObjectWriter.commitAndGet called

## What changes were proposed in this pull request?

We should reset `numRecordsWritten` to zero after 
`DiskBlockObjectWriter.commitAndGet` is called.
When `revertPartialWritesAndClose` is called, we decrease the written records 
in `ShuffleWriteMetrics`. Currently we decrease the written records to zero, 
which is wrong: we should only subtract the records written after the last 
`commitAndGet` call.
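The accounting described above can be illustrated with a toy model. This is a sketch under stated assumptions, not Spark's `DiskBlockObjectWriter`: the classes `ToyWriteMetrics` and `ToyWriter` are hypothetical, and only the method names echo the PR text.

```scala
// Toy stand-in for ShuffleWriteMetrics: a single running counter.
final class ToyWriteMetrics {
  private var records = 0L
  def incRecordsWritten(n: Long): Unit = records += n
  def decRecordsWritten(n: Long): Unit = records -= n
  def recordsWritten: Long = records
}

// Toy stand-in for the writer; it tracks records written since the last commit.
final class ToyWriter(metrics: ToyWriteMetrics) {
  private var numRecordsWritten = 0L

  def write(): Unit = {
    numRecordsWritten += 1
    metrics.incRecordsWritten(1)
  }

  // The reset proposed in the PR: committed records can no longer be reverted.
  def commitAndGet(): Unit = numRecordsWritten = 0

  // Subtract only the records written after the last commit.
  def revertPartialWrites(): Unit = {
    metrics.decRecordsWritten(numRecordsWritten)
    numRecordsWritten = 0
  }
}

object ToyWriterDemo {
  def run(): Long = {
    val metrics = new ToyWriteMetrics
    val writer = new ToyWriter(metrics)
    (1 to 3).foreach(_ => writer.write()) // three records, then commit
    writer.commitAndGet()
    (1 to 2).foreach(_ => writer.write()) // two uncommitted records
    writer.revertPartialWrites()
    metrics.recordsWritten
  }
}
```

In this model `ToyWriterDemo.run()` leaves the three committed records in the metrics. Without the reset in `commitAndGet`, the revert would subtract all five and the metric would drop to zero, which is the behaviour the PR describes as wrong.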

## How was this patch tested?
Modified existing test.

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ConeyLiu/spark DiskBlockObjectWriter

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18830.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18830


commit d82401d1771009e02a81152b70b4fa48ce077593
Author: Xianyang Liu 
Date:   2017-08-03T05:55:09Z

reset numRecordsWritten to zero after commitAndGet called





[GitHub] spark issue #18828: [SPARK-21619][SQL] Fail the execution of canonicalized p...

2017-08-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18828
  
Merged build finished. Test FAILed.



[GitHub] spark issue #18828: [SPARK-21619][SQL] Fail the execution of canonicalized p...

2017-08-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18828
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80190/
Test FAILed.



[GitHub] spark issue #18828: [SPARK-21619][SQL] Fail the execution of canonicalized p...

2017-08-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18828
  
**[Test build #80190 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80190/testReport)**
 for PR 18828 at commit 
[`785a569`](https://github.com/apache/spark/commit/785a5698eef1f398cbbf71d1805da353e28b341e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #18828: [SPARK-21619][SQL] Fail the execution of canonicalized p...

2017-08-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18828
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80187/
Test FAILed.



[GitHub] spark issue #18828: [SPARK-21619][SQL] Fail the execution of canonicalized p...

2017-08-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18828
  
**[Test build #80187 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80187/testReport)**
 for PR 18828 at commit 
[`785a569`](https://github.com/apache/spark/commit/785a5698eef1f398cbbf71d1805da353e28b341e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #18828: [SPARK-21619][SQL] Fail the execution of canonicalized p...

2017-08-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18828
  
Merged build finished. Test FAILed.



[GitHub] spark issue #18413: [SPARK-21205][SQL] pmod(number, 0) should be null.

2017-08-03 Thread wangyum

Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/18413
  
@HyukjinKwon Could you help review this?


