[GitHub] spark pull request: [SPARK-12153][SPARK-7617][MLlib]add support of...

2016-02-11 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/10152#discussion_r52576421
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -272,15 +285,14 @@ class Word2Vec extends Serializable with Logging {
 
   /**
* Computes the vector representation of each word in vocabulary.
-   * @param dataset an RDD of words
+   * @param dataset a RDD of sentences,
--- End diff --

we should use "an" here.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12982][SQL] Add table name validation i...

2016-02-11 Thread jayadevanmurali
Github user jayadevanmurali commented on the pull request:

https://github.com/apache/spark/pull/11051#issuecomment-182760091
  
@hvanhovell Thanks for the quick fix. I think we can retest this PR now. 
What do you think?





[GitHub] spark pull request: [SPARK-13279] Remove unnecessary duplicate che...

2016-02-11 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/11167#issuecomment-182778235
  
I don't quite see why this solves a lock problem. Should this be a set?





[GitHub] spark pull request: [SPARK-11714][Mesos] Make Spark on Mesos honor...

2016-02-11 Thread skonto
Github user skonto commented on the pull request:

https://github.com/apache/spark/pull/11157#issuecomment-182778238
  
Done. Git fetch didn't work for the test build. Could someone re-launch it?





[GitHub] spark pull request: [SPARK-13264][Doc] Removed multi-byte characte...

2016-02-11 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/11149#issuecomment-182781119
  
Merged to master





[GitHub] spark pull request: [SPARK-13264][Doc] Removed multi-byte characte...

2016-02-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/11149





[GitHub] spark pull request: [SPARK-12153][SPARK-7617][MLlib]add support of...

2016-02-11 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/10152#discussion_r52575653
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -272,15 +285,14 @@ class Word2Vec extends Serializable with Logging {
 
   /**
* Computes the vector representation of each word in vocabulary.
-   * @param dataset an RDD of words
+   * @param dataset a RDD of sentences,
--- End diff --

That's right, though RDD effectively starts with a vowel sound: 
arr-dee-dee. A native speaker would certainly say "an RDD" like "an hour". In a 
similar way, people disagree over "a SQL database" vs "an SQL database" but 
it's really a disagreement over whether you say "a _sequel_ database" or "an 
_ess-cyoo-ell_ database".





[GitHub] spark pull request: [SPARK-12915] [SQL] add SQL metrics for whole ...

2016-02-11 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/11170#issuecomment-182777260
  
cc @zsxwing 





[GitHub] spark pull request: [SPARK-11102] [SQL] Uninformative exception wh...

2016-02-11 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/9490#issuecomment-182785426
  
Please close this PR





[GitHub] spark pull request: [SPARK-12153][SPARK-7617][MLlib]add support of...

2016-02-11 Thread ygcao
Github user ygcao commented on a diff in the pull request:

https://github.com/apache/spark/pull/10152#discussion_r52574236
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -272,15 +285,14 @@ class Word2Vec extends Serializable with Logging {
 
   /**
* Computes the vector representation of each word in vocabulary.
-   * @param dataset an RDD of words
+   * @param dataset a RDD of sentences,
--- End diff --

This is an interesting topic. It seems 'r' is not a vowel and does not sound 
like a vowel either, so why 'an'?
I found this on the web: You use the article “a” before words that start 
with a consonant sound and “an” before words that start with a vowel sound.





[GitHub] spark pull request: [SPARK-12153][SPARK-7617][MLlib]add support of...

2016-02-11 Thread ygcao
Github user ygcao commented on a diff in the pull request:

https://github.com/apache/spark/pull/10152#discussion_r52574384
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -551,12 +551,17 @@ class Word2VecModel private[spark] (
   }
   ind += 1
 }
-wordList.zip(cosVec)
+var topResults = wordList.zip(cosVec)
   .toSeq
-  .sortBy(- _._2)
+  .sortBy(-_._2)
   .take(num + 1)
   .tail
-  .toArray
+if (vecNorm != 0.0f) {
+  topResults = topResults.map { case (word, cosVec) =>
--- End diff --

Good point!





[GitHub] spark pull request: [SPARK-10780][ML][WIP] Add initial model to km...

2016-02-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9#issuecomment-182772277
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-10780][ML][WIP] Add initial model to km...

2016-02-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9#issuecomment-182772167
  
**[Test build #51090 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51090/consoleFull)**
 for PR 9 at commit 
[`166a6ff`](https://github.com/apache/spark/commit/166a6fffcfb9ec8aacdcc91ce827450fca0e79d2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-10780][ML][WIP] Add initial model to km...

2016-02-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9#issuecomment-182772279
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51090/
Test PASSed.





[GitHub] spark pull request: [SPARK-13278][CORE] Launcher fails to start wi...

2016-02-11 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/11160#issuecomment-182779157
  
I think `SparkBuild.scala` has a similar computation that needs a similar 
treatment. Also `test("Kill process")` in `UtilsSuite.scala`.





[GitHub] spark pull request: [SPARK-13277][SQL] ANTLR ignores other rule us...

2016-02-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11168#issuecomment-182781741
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51089/
Test PASSed.





[GitHub] spark pull request: [SPARK-13277][SQL] ANTLR ignores other rule us...

2016-02-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11168#issuecomment-182758075
  
**[Test build #51089 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51089/consoleFull)**
 for PR 11168 at commit 
[`4cb9d2a`](https://github.com/apache/spark/commit/4cb9d2a0401d10277195c7853999cc89a0853abd).





[GitHub] spark pull request: [SPARK-13260][SQL] count(*) does not work with...

2016-02-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11169#issuecomment-182772770
  
**[Test build #51091 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51091/consoleFull)**
 for PR 11169 at commit 
[`b52e156`](https://github.com/apache/spark/commit/b52e1564b578cddb35931af2dd0e9c2c9d97b6f3).





[GitHub] spark pull request: [SPARK-13278][CORE] Launcher fails to start wi...

2016-02-11 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/11160#discussion_r52577418
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/CommandBuilderUtils.java ---
@@ -336,4 +334,18 @@ static void addPermGenSizeOpt(List cmd) {
 cmd.add("-XX:MaxPermSize=256m");
   }
 
+  /**
+   * Get the major version of the java.version string supplied.
+   */
+  static int javaMajorVersion(String javaVersion) {
+String[] version = javaVersion.split("[+.\\-]+");
--- End diff --

How about just splitting on non-numbers? It's all kind of a theoretical 
difference though. This looks OK.
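
To make the suggestion concrete, here is a minimal Scala sketch, not the code from this PR, that puts the delimiter-based split from the diff next to a split on runs of non-digits. The object name `JavaVersionSketch`, both helper names, and the sample version strings mentioned below are assumptions for illustration only.

```scala
object JavaVersionSketch {
  // Split as written in the diff: break on runs of '+', '.', or '-'.
  def splitOnDelimiters(javaVersion: String): Array[String] =
    javaVersion.split("[+.\\-]+")

  // Alternative raised in the comment (hypothetical helper): break on any run
  // of non-digit characters and keep only the non-empty numeric pieces.
  def splitOnNonDigits(javaVersion: String): Array[String] =
    javaVersion.split("[^0-9]+").filter(_.nonEmpty)
}
```

For a string such as "1.8.0_66", the first helper yields the pieces 1, 8, 0_66 and the second yields 1, 8, 0, 66, so the leading components agree and only the tail differs. That matches the remark that the difference is mostly theoretical.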





[GitHub] spark pull request: [SPARK-12915] [SQL] add SQL metrics for whole ...

2016-02-11 Thread davies
GitHub user davies opened a pull request:

https://github.com/apache/spark/pull/11170

[SPARK-12915]  [SQL] add SQL metrics for whole stage codegen

This PR adds SQL metrics for generated operators; the cost is about 0.2 
nanoseconds per row.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/davies/spark gen_metric

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11170.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11170


commit cf21f0538338dd14623b3aa8f93ad120182a0cd6
Author: Davies Liu 
Date:   2016-02-11T09:14:51Z

add SQL metrics for whole stage codegen







[GitHub] spark pull request: [SPARK-13074][Core] Add JavaSparkContext. getP...

2016-02-11 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/10978#issuecomment-182781589
  
Merged to master





[GitHub] spark pull request: [SPARK-13074][Core] Add JavaSparkContext. getP...

2016-02-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/10978





[GitHub] spark pull request: [SPARK-13277][SQL] ANTLR ignores other rule us...

2016-02-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11168#issuecomment-182781740
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-12915] [SQL] add SQL metrics for whole ...

2016-02-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11170#issuecomment-182781676
  
**[Test build #51092 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51092/consoleFull)**
 for PR 11170 at commit 
[`cf21f05`](https://github.com/apache/spark/commit/cf21f0538338dd14623b3aa8f93ad120182a0cd6).





[GitHub] spark pull request: [SPARK-12153][SPARK-7617][MLlib]add support of...

2016-02-11 Thread ygcao
Github user ygcao commented on a diff in the pull request:

https://github.com/apache/spark/pull/10152#discussion_r52573698
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -289,24 +301,20 @@ class Word2Vec extends Serializable with Logging {
 val expTable = sc.broadcast(createExpTable())
 val bcVocab = sc.broadcast(vocab)
 val bcVocabHash = sc.broadcast(vocabHash)
-
-val sentences: RDD[Array[Int]] = words.mapPartitions { iter =>
-  new Iterator[Array[Int]] {
-def hasNext: Boolean = iter.hasNext
-
-def next(): Array[Int] = {
-  val sentence = ArrayBuilder.make[Int]
-  var sentenceLength = 0
-  while (iter.hasNext && sentenceLength < MAX_SENTENCE_LENGTH) {
-val word = bcVocabHash.value.get(iter.next())
-word match {
-  case Some(w) =>
-sentence += w
-sentenceLength += 1
-  case None =>
-}
+// each partition is a collection of sentences,
+// will be translated into arrays of Index integer
+val sentences: RDD[Array[Int]] = dataset.mapPartitions { sentenceIter 
=>
+  // Each sentence will map to 0 or more Array[Int]
+  sentenceIter.flatMap { sentence => {
+  // Sentence of words, some of which map to a word index
+  val wordIndexes = sentence.flatMap(bcVocabHash.value.get)
+  if (wordIndexes.nonEmpty) {
--- End diff --

Will an empty iterator make `flatMap` skip it, just like it skips `None`?





[GitHub] spark pull request: [SPARK-13260][SQL] count(*) does not work with...

2016-02-11 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/11169

[SPARK-13260][SQL] count(*) does not work with CSV data source

https://issues.apache.org/jira/browse/SPARK-13260
This is a quick fix for `count(*)`.

When `requiredColumns` is empty, it currently returns 
`sqlContext.sparkContext.emptyRDD[Row]`, which does not carry the row count.

Just like the JSON data source, this PR lets the CSV data source count the rows 
without parsing each token.
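
The idea can be illustrated without Spark. The following is only a sketch under stated assumptions: `CountSketch`, `Row`, and `scan` are hypothetical names, plain Scala collections stand in for RDDs, and header handling and quoting are ignored. It is not the change made in this PR.

```scala
object CountSketch {
  // Hypothetical stand-in for a SQL row; not Spark's Row class.
  type Row = Seq[Any]

  // When no columns are required (as for count(*)), emit one empty row per
  // input line without tokenizing it, so the number of rows is preserved.
  // Otherwise split each line and project the requested column positions.
  def scan(lines: Seq[String], requiredColumns: Seq[Int]): Seq[Row] =
    if (requiredColumns.isEmpty) {
      lines.map(_ => Seq.empty[Any])
    } else {
      lines.map { line =>
        val tokens = line.split(",")
        requiredColumns.map(i => tokens(i))
      }
    }
}
```

Returning an empty collection in the first branch instead would make any count over the result zero, which is the symptom described above.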

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-13260

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11169.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11169


commit b52e1564b578cddb35931af2dd0e9c2c9d97b6f3
Author: hyukjinkwon 
Date:   2016-02-11T08:42:00Z

count(*) does not work with CSV data source







[GitHub] spark pull request: [SPARK-12153][SPARK-7617][MLlib]add support of...

2016-02-11 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/10152#discussion_r52575825
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -289,24 +301,20 @@ class Word2Vec extends Serializable with Logging {
 val expTable = sc.broadcast(createExpTable())
 val bcVocab = sc.broadcast(vocab)
 val bcVocabHash = sc.broadcast(vocabHash)
-
-val sentences: RDD[Array[Int]] = words.mapPartitions { iter =>
-  new Iterator[Array[Int]] {
-def hasNext: Boolean = iter.hasNext
-
-def next(): Array[Int] = {
-  val sentence = ArrayBuilder.make[Int]
-  var sentenceLength = 0
-  while (iter.hasNext && sentenceLength < MAX_SENTENCE_LENGTH) {
-val word = bcVocabHash.value.get(iter.next())
-word match {
-  case Some(w) =>
-sentence += w
-sentenceLength += 1
-  case None =>
-}
+// each partition is a collection of sentences,
+// will be translated into arrays of Index integer
+val sentences: RDD[Array[Int]] = dataset.mapPartitions { sentenceIter 
=>
+  // Each sentence will map to 0 or more Array[Int]
+  sentenceIter.flatMap { sentence => {
+  // Sentence of words, some of which map to a word index
+  val wordIndexes = sentence.flatMap(bcVocabHash.value.get)
+  if (wordIndexes.nonEmpty) {
--- End diff --

Yes, `flatMap` would flatten an empty iterator to nothing.
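
As a small illustration of that behavior, here is a self-contained sketch in plain Scala. The object name `FlatMapSketch`, the tiny `vocabHash` map (standing in for `bcVocabHash.value`), and the body of the `if` branch are assumptions, since the diff above is cut off at that point.

```scala
object FlatMapSketch {
  // Hypothetical vocabulary lookup standing in for bcVocabHash.value.
  val vocabHash: Map[String, Int] = Map("spark" -> 0, "rdd" -> 1)

  // Each sentence maps to zero or one Array[Int] here. Unknown words are
  // dropped because Map.get returns None for them, and a sentence with no
  // known words produces an empty Iterator, which flatMap flattens to nothing.
  def toIndexArrays(sentences: Iterator[Seq[String]]): Iterator[Array[Int]] =
    sentences.flatMap { sentence =>
      val wordIndexes = sentence.flatMap(vocabHash.get)
      if (wordIndexes.nonEmpty) Iterator(wordIndexes.toArray) else Iterator.empty
    }
}
```

With the inputs `Seq("spark", "rdd")`, `Seq("unknown")`, and an empty sentence, only the first one contributes an array to the result.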





[GitHub] spark pull request: [SPARK-13277][SQL] ANTLR ignores other rule us...

2016-02-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11168#issuecomment-182781527
  
**[Test build #51089 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51089/consoleFull)**
 for PR 11168 at commit 
[`4cb9d2a`](https://github.com/apache/spark/commit/4cb9d2a0401d10277195c7853999cc89a0853abd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13139][SQL][WIP] Create native DDL comm...

2016-02-11 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/11048#issuecomment-182784626
  
btw @viirya can we create an execution.commands package for this?






[GitHub] spark pull request: [SPARK-13295] [ ML, MLlib ] AFTSurvivalRegress...

2016-02-11 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/11179#issuecomment-183191566
  
@yanboliang Could you take a look?





[GitHub] spark pull request: [SPARK-13295] [ ML, MLlib ] AFTSurvivalRegress...

2016-02-11 Thread NarineK
GitHub user NarineK opened a pull request:

https://github.com/apache/spark/pull/11179

[SPARK-13295] [ ML, MLlib ] AFTSurvivalRegression.AFTAggregator 
improvements - Avoids creating new instances of arrays/vectors for each record

As also marked by a TODO in AFTAggregator.AFTAggregator.add(data: 
AFTPoint), a new array is created for the intercept value and concatenated 
with another array that contains the betas. The resulting array is 
converted into a dense vector, which in turn is converted into a Breeze 
vector.
This is expensive and not necessarily beautiful.

I've tried to solve the above-mentioned problem with a simple algebraic 
decomposition: keeping and treating the intercept independently.
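
A rough sketch of the kind of decomposition meant here, assuming the combined array is only needed for a dot product with the features. `InterceptSketch` and both method names are hypothetical, plain arrays stand in for the MLlib and Breeze vector types, and this is not the code in the patch.

```scala
object InterceptSketch {
  // Allocating variant: builds two new arrays for every record just to take
  // a dot product of (intercept, betas) with (1.0, features).
  def marginConcat(intercept: Double, betas: Array[Double], features: Array[Double]): Double = {
    val params = Array(intercept) ++ betas
    val xi = Array(1.0) ++ features
    params.zip(xi).map { case (p, x) => p * x }.sum
  }

  // Decomposed variant: intercept * 1.0 is just the intercept, so it can be
  // added separately and the remaining dot product needs no new arrays.
  def marginDecomposed(intercept: Double, betas: Array[Double], features: Array[Double]): Double = {
    require(betas.length == features.length, "betas and features must align")
    var sum = intercept
    var i = 0
    while (i < betas.length) {
      sum += betas(i) * features(i)
      i += 1
    }
    sum
  }
}
```

Both variants compute the same value; the second avoids the per-record allocations.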

Please let me know what you think and if you have any questions.

Thanks,
Narine


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/NarineK/spark survivaloptim

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11179.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11179


commit 8d443e9d7cd4b8b4cf7a4e14bec8287b7db6aff7
Author: Narine Kokhlikyan 
Date:   2016-02-12T02:42:08Z

Initial commit - AFTSurvivalRegression improvements







[GitHub] spark pull request: [SPARK-13221] [SQL] Fixing GroupingSets when A...

2016-02-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11100#issuecomment-183194619
  
**[Test build #51169 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51169/consoleFull)**
 for PR 11100 at commit 
[`79c11de`](https://github.com/apache/spark/commit/79c11de8954e137e134d3a8645b6936cd625f38e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13221] [SQL] Fixing GroupingSets when A...

2016-02-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11100#issuecomment-183195272
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51169/
Test PASSed.





[GitHub] spark pull request: [SPARK-10521][SQL] Utilize Docker for test DB2...

2016-02-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9893#issuecomment-183195289
  
**[Test build #51174 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51174/consoleFull)**
 for PR 9893 at commit 
[`fe79873`](https://github.com/apache/spark/commit/fe79873ef416f3fd4ca29b6970cc2991fb43d017).





[GitHub] spark pull request: [SPARK-13221] [SQL] Fixing GroupingSets when A...

2016-02-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11100#issuecomment-183195269
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-13295] [ ML, MLlib ] AFTSurvivalRegress...

2016-02-11 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/11179#issuecomment-183191594
  
ok to test





[GitHub] spark pull request: [SPARK-13295] [ ML, MLlib ] AFTSurvivalRegress...

2016-02-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11179#issuecomment-183197817
  
**[Test build #51173 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51173/consoleFull)**
 for PR 11179 at commit 
[`8d443e9`](https://github.com/apache/spark/commit/8d443e9d7cd4b8b4cf7a4e14bec8287b7db6aff7).





[GitHub] spark pull request: [SPARK-12153][SPARK-7617][MLlib]add support of...

2016-02-11 Thread ygcao
Github user ygcao commented on the pull request:

https://github.com/apache/spark/pull/10152#issuecomment-183197942
  
Addressed the new comments. I still kept the if statement, as I explained 
with sample code.
Reran the tests and the lint check. Jenkins should still be happy :fireworks: 





[GitHub] spark pull request: [WebUI][SPARK-7889] HistoryServer updates UI f...

2016-02-11 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/8#issuecomment-183198008
  
Just saw this got merged. I'm probably missing some context, but can 
somebody explain to me why something so conceptually simple leads to such a big 
patch?





[GitHub] spark pull request: SPARK-12729 PhantomReferences to replace Final...

2016-02-11 Thread zsxwing
Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/11140#issuecomment-183198830
  
retest this please





[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

2016-02-11 Thread zsxwing
Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/10838#issuecomment-183200242
  
Merging to master. Thanks, @redsanket





[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

2016-02-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/10838





[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

2016-02-11 Thread zsxwing
Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/10838#issuecomment-183200705
  
@redsanket what's your JIRA account name? I want to assign it to you.





[GitHub] spark pull request: SPARK-12729 PhantomReferences to replace Final...

2016-02-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11140#issuecomment-183201556
  
**[Test build #51175 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51175/consoleFull)**
 for PR 11140 at commit 
[`837252a`](https://github.com/apache/spark/commit/837252a74ec87e8f1ac07e80406bf0410c9088d7).





[GitHub] spark pull request: [SPARK-13295] [ ML, MLlib ] AFTSurvivalRegress...

2016-02-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11179#issuecomment-183205820
  
**[Test build #51173 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51173/consoleFull)**
 for PR 11179 at commit 
[`8d443e9`](https://github.com/apache/spark/commit/8d443e9d7cd4b8b4cf7a4e14bec8287b7db6aff7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13295] [ ML, MLlib ] AFTSurvivalRegress...

2016-02-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11179#issuecomment-183206177
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51173/
Test PASSed.





[GitHub] spark pull request: [SPARK-13294] [PROJECT INFRA] Don't build full...

2016-02-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11178#issuecomment-183206102
  
**[Test build #51176 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51176/consoleFull)**
 for PR 11178 at commit 
[`bef62eb`](https://github.com/apache/spark/commit/bef62ebb8ec5065061ff0ca49a4cb7e0182c47b6).





[GitHub] spark pull request: [SPARK-13295] [ ML, MLlib ] AFTSurvivalRegress...

2016-02-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11179#issuecomment-183206172
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-13294] [PROJECT INFRA] Don't build full...

2016-02-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11178#issuecomment-183211270
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-13294] [PROJECT INFRA] Don't build full...

2016-02-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11178#issuecomment-183211245
  
**[Test build #51176 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51176/consoleFull)**
 for PR 11178 at commit 
[`bef62eb`](https://github.com/apache/spark/commit/bef62ebb8ec5065061ff0ca49a4cb7e0182c47b6).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13294] [PROJECT INFRA] Don't build full...

2016-02-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11178#issuecomment-183211271
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51176/
Test FAILed.





[GitHub] spark pull request: [SPARK-10521][SQL] Utilize Docker for test DB2...

2016-02-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9893#issuecomment-183216221
  
**[Test build #51172 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51172/consoleFull)**
 for PR 9893 at commit 
[`e61ec6a`](https://github.com/apache/spark/commit/e61ec6a4a3b603d34c6f7de697d61ee559786337).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-10521][SQL] Utilize Docker for test DB2...

2016-02-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9893#issuecomment-183216521
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51172/
Test FAILed.





[GitHub] spark pull request: [SPARK-10521][SQL] Utilize Docker for test DB2...

2016-02-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9893#issuecomment-183216520
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: Added pygments.rb dependancy

2016-02-11 Thread amitdev
GitHub user amitdev opened a pull request:

https://github.com/apache/spark/pull/11180

Added pygments.rb dependancy

It looks like the pygments.rb gem is also required for the jekyll build to 
work. At least on Ubuntu/RHEL, I could not build without this dependency, so I 
added it to the steps.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/amitdev/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11180.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11180


commit f705e9bbe7f1e6a6393062c07e239b23ebf53ac8
Author: Amit Dev 
Date:   2016-02-12T07:43:13Z

Added pygments.rb dependancy







[GitHub] spark pull request: [Documentation] Added pygments.rb dependancy

2016-02-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11180#issuecomment-183219234
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-10521][SQL] Utilize Docker for test DB2...

2016-02-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9893#issuecomment-183219699
  
**[Test build #51174 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51174/consoleFull)**
 for PR 9893 at commit 
[`fe79873`](https://github.com/apache/spark/commit/fe79873ef416f3fd4ca29b6970cc2991fb43d017).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-10521][SQL] Utilize Docker for test DB2...

2016-02-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9893#issuecomment-183219843
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51174/
Test FAILed.





[GitHub] spark pull request: [SPARK-10521][SQL] Utilize Docker for test DB2...

2016-02-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9893#issuecomment-183219842
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12705] [SQL] push missing attributes fo...

2016-02-11 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/11153#issuecomment-183197064
  
LGTM





[GitHub] spark pull request: [SPARK-13295] [ ML, MLlib ] AFTSurvivalRegress...

2016-02-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11179#issuecomment-183193865
  
**[Test build #51170 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51170/consoleFull)**
 for PR 11179 at commit 
[`8d443e9`](https://github.com/apache/spark/commit/8d443e9d7cd4b8b4cf7a4e14bec8287b7db6aff7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13295] [ ML, MLlib ] AFTSurvivalRegress...

2016-02-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11179#issuecomment-183194055
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-12153][SPARK-7617][MLlib]add support of...

2016-02-11 Thread ygcao
Github user ygcao commented on a diff in the pull request:

https://github.com/apache/spark/pull/10152#discussion_r52708705
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -289,24 +301,20 @@ class Word2Vec extends Serializable with Logging {
 val expTable = sc.broadcast(createExpTable())
 val bcVocab = sc.broadcast(vocab)
 val bcVocabHash = sc.broadcast(vocabHash)
-
-val sentences: RDD[Array[Int]] = words.mapPartitions { iter =>
-  new Iterator[Array[Int]] {
-def hasNext: Boolean = iter.hasNext
-
-def next(): Array[Int] = {
-  val sentence = ArrayBuilder.make[Int]
-  var sentenceLength = 0
-  while (iter.hasNext && sentenceLength < MAX_SENTENCE_LENGTH) {
-val word = bcVocabHash.value.get(iter.next())
-word match {
-  case Some(w) =>
-sentence += w
-sentenceLength += 1
-  case None =>
-}
+// each partition is a collection of sentences,
+// will be translated into arrays of Index integer
+val sentences: RDD[Array[Int]] = dataset.mapPartitions { sentenceIter =>
+  // Each sentence will map to 0 or more Array[Int]
+  sentenceIter.flatMap { sentence => {
+  // Sentence of words, some of which map to a word index
+  val wordIndexes = sentence.flatMap(bcVocabHash.value.get)
+  if (wordIndexes.nonEmpty) {
--- End diff --

Sorry, I'm still not quite sure about this. I ran a test, and it turns out I 
was right :grinning: 
scala> val sentences=List("test sen 1","","testsen 2")
sentences: List[String] = List(test sen 1, "", testsen 2)

scala> val rdd=sc.parallelize(sentences)
rdd: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at parallelize at <console>:23

scala> val results=rdd.flatMap(sen=>sen.split(" ").grouped(1))
results: org.apache.spark.rdd.RDD[Array[String]] = MapPartitionsRDD[1] at flatMap at <console>:25

scala> results.collect
res0: Array[Array[String]] = Array(Array(test), Array(sen), Array(1), **Array("")**, Array(testsen), Array(2))

Without the if statement, we would produce empty entries, which could cause 
trouble in the following steps. I'd like to be on the safe side, and the if 
statement is cheap enough.
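
The behaviour discussed in this comment can be reproduced without Spark. The following is a minimal sketch using plain Scala collections; the object name `EmptySentenceCheck`, the toy vocabulary, and the `maxSentenceLength` value are illustrative assumptions and are not taken from the pull request. It shows that splitting an empty string yields an array holding one empty token, and that calling `grouped` on an empty array of word indexes yields no groups.

```scala
object EmptySentenceCheck {
  // Hypothetical toy vocabulary: word -> index (not taken from the PR).
  val vocabHash: Map[String, Int] = Map("test" -> 0, "sen" -> 1, "testsen" -> 2)

  // Hypothetical chunk size standing in for the maximum sentence length.
  val maxSentenceLength: Int = 2

  // "".split(" ") returns Array(""), i.e. one empty token, not an empty array.
  def tokens(sentence: String): Array[String] = sentence.split(" ")

  // Map each sentence to zero or more index arrays, dropping unknown words.
  def toIndexChunks(sentences: Seq[Seq[String]]): Seq[Array[Int]] =
    sentences.flatMap { sentence =>
      val wordIndexes = sentence.flatMap(w => vocabHash.get(w)).toArray
      // grouped on an empty array returns an empty iterator,
      // so a sentence with no known words contributes nothing here.
      wordIndexes.grouped(maxSentenceLength)
    }

  def main(args: Array[String]): Unit = {
    val raw = List("test sen 1", "", "testsen 2")
    val chunks = toIndexChunks(raw.map(s => tokens(s).toSeq))
    chunks.foreach(c => println(c.mkString("[", ", ", "]")))
  }
}
```

In this sketch the empty sentence contributes no chunk even without an explicit `nonEmpty` guard, because the empty token is not in the toy vocabulary. Whether the same holds for the actual patch depends on code that is not shown in this thread.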





[GitHub] spark pull request: [SPARK-13295] [ ML, MLlib ] AFTSurvivalRegress...

2016-02-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11179#issuecomment-183194060
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51170/
Test PASSed.





[GitHub] spark pull request: [SPARK-13196] [MLlib] Optimize the iterator in...

2016-02-11 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/11078#issuecomment-183194684
  
@hhbyyh Did you test it? `Iterator` is lazy. I think the new version would 
consume more memory because `modified` would store all the values.
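
To illustrate the laziness point, here is a minimal sketch in plain Scala. The names `LazyIteratorSketch`, `lazyDoubled`, and `eagerDoubled` are hypothetical and do not come from the patch under review. A `map` on an `Iterator` defers the work until elements are pulled, whereas copying into a buffer first holds every transformed value in memory at once.

```scala
import scala.collection.mutable.ArrayBuffer

object LazyIteratorSketch {
  // Counts how many elements have actually been transformed so far.
  var evaluated: Int = 0

  // Lazy: nothing is computed until the returned iterator is consumed.
  def lazyDoubled(values: Iterator[Int]): Iterator[Int] =
    values.map { v => evaluated += 1; v * 2 }

  // Eager: every transformed value is stored before any is returned.
  def eagerDoubled(values: Iterator[Int]): Iterator[Int] = {
    val modified = ArrayBuffer.empty[Int]
    values.foreach { v => evaluated += 1; modified += v * 2 }
    modified.iterator
  }

  def main(args: Array[String]): Unit = {
    evaluated = 0
    val lazyIt = lazyDoubled(Iterator(1, 2, 3))
    println(s"after lazy map: $evaluated evaluated")
    lazyIt.next()
    println(s"after one next(): $evaluated evaluated")

    evaluated = 0
    eagerDoubled(Iterator(1, 2, 3))
    println(s"after eager copy: $evaluated evaluated")
  }
}
```

The counter is only there to make the evaluation order visible; the memory cost of the eager variant comes from the buffer, which grows with the number of input elements.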


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12705] [SQL] push missing attributes fo...

2016-02-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/11153#discussion_r52707357
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -572,98 +572,64 @@ class Analyzer(
   // Skip sort with aggregate. This will be handled in ResolveAggregateFunctions
   case sa @ Sort(_, _, child: Aggregate) => sa
 
-  case s @ Sort(_, _, child) if !s.resolved && child.resolved =>
-val (newOrdering, missingResolvableAttrs) = collectResolvableMissingAttrs(s.order, child)
-
-if (missingResolvableAttrs.isEmpty) {
-  val unresolvableAttrs = s.order.filterNot(_.resolved)
-  logDebug(s"Failed to find $unresolvableAttrs in ${child.output.mkString(", ")}")
-  s // Nothing we can do here. Return original plan.
-} else {
-  // Add the missing attributes into projectList of Project/Window or
-  //   aggregateExpressions of Aggregate, if they are in the inputSet
-  //   but not in the outputSet of the plan.
-  val newChild = child transformUp {
-case p: Project =>
-  p.copy(projectList = p.projectList ++
-missingResolvableAttrs.filter((p.inputSet -- p.outputSet).contains))
-case w: Window =>
-  w.copy(projectList = w.projectList ++
-missingResolvableAttrs.filter((w.inputSet -- w.outputSet).contains))
-case a: Aggregate =>
-  val resolvableAttrs = missingResolvableAttrs.filter(a.groupingExpressions.contains)
-  val notResolvedAttrs = resolvableAttrs.filterNot(a.aggregateExpressions.contains)
-  val newAggregateExpressions = a.aggregateExpressions ++ notResolvedAttrs
-  a.copy(aggregateExpressions = newAggregateExpressions)
-case o => o
-  }
-
+  case s @ Sort(order, _, child) if !s.resolved && child.resolved =>
+val newOrder = order.map(resolveExpressionRecursively(_, child).asInstanceOf[SortOrder])
+val requiredAttrs = AttributeSet(newOrder).filter(_.resolved)
+val missingAttrs = requiredAttrs -- child.outputSet
+if (missingAttrs.nonEmpty) {
   // Add missing attributes and then project them away after the sort.
   Project(child.output,
-Sort(newOrdering, s.global, newChild))
+Sort(newOrder, s.global, addMissingAttr(child, missingAttrs)))
+} else if (newOrder != order) {
+  s.copy(order = newOrder)
+} else {
+  s
 }
 }
 
 /**
- * Traverse the tree until resolving the sorting attributes
- * Return all the resolvable missing sorting attributes
- */
-@tailrec
-private def collectResolvableMissingAttrs(
-ordering: Seq[SortOrder],
-plan: LogicalPlan): (Seq[SortOrder], Seq[Attribute]) = {
+  * Add the missing attributes into projectList of Project/Window or 
aggregateExpressions of
+  * Aggregate.
+  */
+private def addMissingAttr(plan: LogicalPlan, missingAttrs: 
AttributeSet): LogicalPlan = {
+  if (missingAttrs.isEmpty) {
+return plan
+  }
   plan match {
-// Only Windows and Project have projectList-like attribute.
-case un: UnaryNode if un.isInstanceOf[Project] || 
un.isInstanceOf[Window] =>
-  val (newOrdering, missingAttrs) = 
resolveAndFindMissing(ordering, un, un.child)
-  // If missingAttrs is non empty, that means we got it and return 
it;
-  // Otherwise, continue to traverse the tree.
-  if (missingAttrs.nonEmpty) {
-(newOrdering, missingAttrs)
-  } else {
-collectResolvableMissingAttrs(ordering, un.child)
-  }
+case p: Project =>
+  val missing = missingAttrs -- p.child.outputSet
+  Project(p.projectList ++ missingAttrs, addMissingAttr(p.child, 
missing))
+case w: Window =>
+  val missing = missingAttrs -- w.child.outputSet
+  w.copy(projectList = w.projectList ++ missingAttrs,
+child = addMissingAttr(w.child, missing))
 case a: Aggregate =>
-  val (newOrdering, missingAttrs) = 
resolveAndFindMissing(ordering, a, a.child)
-  // For Aggregate, all the order by columns must be specified in 
group by clauses
-  if (missingAttrs.nonEmpty &&
-  missingAttrs.forall(ar => 
a.groupingExpressions.exists(_.semanticEquals(ar {
-(newOrdering, missingAttrs)
-  } else {
-// If missingAttrs is empty, we are unable to 

[GitHub] spark pull request: [SPARK-10521][SQL] Utilize Docker for test DB2...

2016-02-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9893#issuecomment-183189627
  
**[Test build #51172 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51172/consoleFull)**
 for PR 9893 at commit 
[`e61ec6a`](https://github.com/apache/spark/commit/e61ec6a4a3b603d34c6f7de697d61ee559786337).





[GitHub] spark pull request: [SPARK-10521][SQL] Utilize Docker for test DB2...

2016-02-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9893#issuecomment-183189659
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51171/
Test FAILed.





[GitHub] spark pull request: [SPARK-10521][SQL] Utilize Docker for test DB2...

2016-02-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9893#issuecomment-183189658
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12705] [SQL] push missing attributes fo...

2016-02-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/11153#discussion_r52707329
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -572,98 +572,64 @@ class Analyzer(
       // Skip sort with aggregate. This will be handled in ResolveAggregateFunctions
       case sa @ Sort(_, _, child: Aggregate) => sa
 
-      case s @ Sort(_, _, child) if !s.resolved && child.resolved =>
-        val (newOrdering, missingResolvableAttrs) = collectResolvableMissingAttrs(s.order, child)
-
-        if (missingResolvableAttrs.isEmpty) {
-          val unresolvableAttrs = s.order.filterNot(_.resolved)
-          logDebug(s"Failed to find $unresolvableAttrs in ${child.output.mkString(", ")}")
-          s // Nothing we can do here. Return original plan.
-        } else {
-          // Add the missing attributes into projectList of Project/Window or
-          //   aggregateExpressions of Aggregate, if they are in the inputSet
-          //   but not in the outputSet of the plan.
-          val newChild = child transformUp {
-            case p: Project =>
-              p.copy(projectList = p.projectList ++
-                missingResolvableAttrs.filter((p.inputSet -- p.outputSet).contains))
-            case w: Window =>
-              w.copy(projectList = w.projectList ++
-                missingResolvableAttrs.filter((w.inputSet -- w.outputSet).contains))
-            case a: Aggregate =>
-              val resolvableAttrs = missingResolvableAttrs.filter(a.groupingExpressions.contains)
-              val notResolvedAttrs = resolvableAttrs.filterNot(a.aggregateExpressions.contains)
-              val newAggregateExpressions = a.aggregateExpressions ++ notResolvedAttrs
-              a.copy(aggregateExpressions = newAggregateExpressions)
-            case o => o
-          }
-
+      case s @ Sort(order, _, child) if !s.resolved && child.resolved =>
+        val newOrder = order.map(resolveExpressionRecursively(_, child).asInstanceOf[SortOrder])
+        val requiredAttrs = AttributeSet(newOrder).filter(_.resolved)
+        val missingAttrs = requiredAttrs -- child.outputSet
+        if (missingAttrs.nonEmpty) {
           // Add missing attributes and then project them away after the sort.
           Project(child.output,
-            Sort(newOrdering, s.global, newChild))
+            Sort(newOrder, s.global, addMissingAttr(child, missingAttrs)))
+        } else if (newOrder != order) {
+          s.copy(order = newOrder)
+        } else {
+          s
         }
     }
 
     /**
-     * Traverse the tree until resolving the sorting attributes
-     * Return all the resolvable missing sorting attributes
-     */
-    @tailrec
-    private def collectResolvableMissingAttrs(
-        ordering: Seq[SortOrder],
-        plan: LogicalPlan): (Seq[SortOrder], Seq[Attribute]) = {
+      * Add the missing attributes into projectList of Project/Window or aggregateExpressions of
+      * Aggregate.
+      */
+    private def addMissingAttr(plan: LogicalPlan, missingAttrs: AttributeSet): LogicalPlan = {
--- End diff --

It makes sense to me. Thanks!
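
For anyone reading this diff without the full file, here is a toy sketch of the idea behind the new `addMissingAttr`: push the attributes the sort needs down into the child projections, sort, then project them away again. The `Plan`, `Relation`, `Project`, and `Sort` types below are simplified stand-ins with string attributes, not Catalyst's classes, and the `Window`/`Aggregate` cases are left out. It assumes every missing attribute really exists further down the plan.

```scala
object MissingSortAttrs {
  // Hypothetical miniature plan tree; attributes are plain strings.
  sealed trait Plan { def output: Set[String] }
  final case class Relation(output: Set[String]) extends Plan
  final case class Project(projectList: Set[String], child: Plan) extends Plan {
    def output: Set[String] = projectList
  }
  final case class Sort(order: List[String], child: Plan) extends Plan {
    def output: Set[String] = child.output
  }

  // Widen each Project on the way down until the missing attributes are available.
  def addMissingAttr(plan: Plan, missing: Set[String]): Plan =
    if (missing.isEmpty) plan
    else plan match {
      case p: Project =>
        val stillMissing = missing -- p.child.output
        Project(p.projectList ++ missing, addMissingAttr(p.child, stillMissing))
      case other => other
    }

  // Mirrors the shape of the new Sort case: add what is missing, sort,
  // then restore the original output with an outer Project.
  def resolveSort(s: Sort): Plan = {
    val needed: Set[String] = s.order.toSet
    val missing = needed -- s.child.output
    if (missing.isEmpty) s
    else Project(s.child.output, Sort(s.order, addMissingAttr(s.child, missing)))
  }
}
```

Sorting `Project(a)` over a relation `(a, b)` by `b` widens the inner projection to `(a, b)` and wraps the sort in an outer `Project(a)`, so the query's output columns stay the same.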





[GitHub] spark pull request: [SPARK-13282][SQL] LogicalPlan toSql should ju...

2016-02-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11171#issuecomment-182792045
  
**[Test build #51093 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51093/consoleFull)**
 for PR 11171 at commit 
[`9fd34fc`](https://github.com/apache/spark/commit/9fd34fc1fa27c09cfa5426a53be85cbc5e0460c3).





[GitHub] spark pull request: [SPARK-10780][ML][WIP] Add initial model to km...

2016-02-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9#issuecomment-182758831
  
**[Test build #51090 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51090/consoleFull)**
 for PR 9 at commit 
[`166a6ff`](https://github.com/apache/spark/commit/166a6fffcfb9ec8aacdcc91ce827450fca0e79d2).





[GitHub] spark pull request: [SPARK-13252] [KAFKA] Bump up Kafka to 0.9.0.0

2016-02-11 Thread mariobriggs
Github user mariobriggs commented on the pull request:

https://github.com/apache/spark/pull/11143#issuecomment-182769749
  
FWIW, the [IBM Cloud Message Hub service](https://www.ng.bluemix.net/docs/services/MessageHub/index.html#messagehub050),
which is Kafka, has already moved to 0.9.0, so I support option 1 that
@markgrover suggests.





[GitHub] spark pull request: [SPARK-12414] [CORE] Remove closure serializer

2016-02-11 Thread srowen
Github user srowen closed the pull request at:

https://github.com/apache/spark/pull/11150





[GitHub] spark pull request: [SPARK-12982][SQL] Add table name validation i...

2016-02-11 Thread hvanhovell
Github user hvanhovell commented on the pull request:

https://github.com/apache/spark/pull/11051#issuecomment-182892635
  
@jayadevanmurali issuing `retest this please` should normally do the trick.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...

2016-02-11 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/11136#discussion_r52610253
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala
 ---
@@ -0,0 +1,472 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.regression
+
+import breeze.stats.distributions.{Gaussian => GD}
+
+import org.apache.spark.Logging
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.ml.PredictorParams
+import org.apache.spark.ml.feature.Instance
+import org.apache.spark.ml.optim._
+import org.apache.spark.ml.param._
+import org.apache.spark.ml.param.shared._
+import org.apache.spark.ml.util.Identifiable
+import org.apache.spark.mllib.linalg.{BLAS, Vector}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.{DataFrame, Row}
+import org.apache.spark.sql.functions._
+
+/**
+ * Params for Generalized Linear Regression.
+ */
+private[regression] trait GeneralizedLinearRegressionParams extends PredictorParams
+  with HasFitIntercept with HasMaxIter with HasTol with HasRegParam with HasWeightCol
+  with HasSolver with Logging {
+
+  /**
+   * Param for the name of family which is a description of the error distribution
+   * to be used in the model.
+   * Supported options: "gaussian", "binomial", "poisson" and "gamma".
+   * @group param
+   */
+  @Since("2.0.0")
+  final val family: Param[String] = new Param(this, "family",
+    "the name of family which is a description of the error distribution to be used in the model",
+    ParamValidators.inArray[String](GeneralizedLinearRegression.supportedFamilies.toArray))
+
+  /** @group getParam */
+  @Since("2.0.0")
+  def getFamily: String = $(family)
+
+  /**
+   * Param for the name of the model link function.
+   * Supported options: "identity", "log", "inverse", "logit", "probit", "cloglog" and "sqrt".
+   * @group param
+   */
+  @Since("2.0.0")
+  final val link: Param[String] = new Param(this, "link", "the name of the model link function",
+    ParamValidators.inArray[String](GeneralizedLinearRegression.supportedLinks.toArray))
+
+  /** @group getParam */
+  @Since("2.0.0")
+  def getLink: String = $(link)
+
+  @Since("2.0.0")
+  override def validateParams(): Unit = {
+    require(GeneralizedLinearRegression.supportedFamilyLinkPairs.contains($(family) -> $(link)),
+      s"Generalized Linear Regression with ${$(family)} family does not support ${$(link)} " +
+        s"link function.")
+  }
+}
+
+/**
+ * :: Experimental ::
+ *
+ * Fit a Generalized Linear Model ([[https://en.wikipedia.org/wiki/Generalized_linear_model]])
+ * specified by giving a symbolic description of the linear predictor and
+ * a description of the error distribution.
+ */
+@Experimental
+@Since("2.0.0")
+class GeneralizedLinearRegression @Since("2.0.0") (@Since("2.0.0") override val uid: String)
+  extends Regressor[Vector, GeneralizedLinearRegression, GeneralizedLinearRegressionModel]
+  with GeneralizedLinearRegressionParams with Logging {
+
+  @Since("2.0.0")
+  def this() = this(Identifiable.randomUID("genLinReg"))
+
+  /**
+   * Set the name of family which is a description of the error distribution
+   * to be used in the model.
+   * @group setParam
+   */
+  @Since("2.0.0")
+  def setFamily(value: String): this.type = set(family, value)
+
+  /**
+   * Set the name of the model link function.
+   * @group setParam
+   */
+  @Since("2.0.0")
+  def setLink(value: String): this.type = set(link, value)
+
+  /**
+   * Set if we should fit the intercept.
+   * Default is true.
+   * @group setParam
+   */
+  @Since("2.0.0")
+  def setFitIntercept(value: Boolean): this.type = set(fitIntercept, value)
+  

[GitHub] spark pull request: [SPARK-13124] [Web UI] Fixed CSS and JS issues...

2016-02-11 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/11038#issuecomment-182895149
  
+1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13124] [Web UI] Fixed CSS and JS issues...

2016-02-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/11038


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11701][SPARK-13054] dynamic allocation ...

2016-02-11 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/10951#issuecomment-182894494
  
Jenkins, test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...

2016-02-11 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/11136#discussion_r52611868
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala
 ---
@@ -0,0 +1,472 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.regression
+
+import breeze.stats.distributions.{Gaussian => GD}
+
+import org.apache.spark.Logging
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.ml.PredictorParams
+import org.apache.spark.ml.feature.Instance
+import org.apache.spark.ml.optim._
+import org.apache.spark.ml.param._
+import org.apache.spark.ml.param.shared._
+import org.apache.spark.ml.util.Identifiable
+import org.apache.spark.mllib.linalg.{BLAS, Vector}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.{DataFrame, Row}
+import org.apache.spark.sql.functions._
+
+/**
+ * Params for Generalized Linear Regression.
+ */
+private[regression] trait GeneralizedLinearRegressionParams extends PredictorParams
+  with HasFitIntercept with HasMaxIter with HasTol with HasRegParam with HasWeightCol
+  with HasSolver with Logging {
+
+  /**
+   * Param for the name of family which is a description of the error distribution
+   * to be used in the model.
+   * Supported options: "gaussian", "binomial", "poisson" and "gamma".
+   * @group param
+   */
+  @Since("2.0.0")
+  final val family: Param[String] = new Param(this, "family",
+    "the name of family which is a description of the error distribution to be used in the model",
+    ParamValidators.inArray[String](GeneralizedLinearRegression.supportedFamilies.toArray))
+
+  /** @group getParam */
+  @Since("2.0.0")
+  def getFamily: String = $(family)
+
+  /**
+   * Param for the name of the model link function.
+   * Supported options: "identity", "log", "inverse", "logit", "probit", "cloglog" and "sqrt".
+   * @group param
+   */
+  @Since("2.0.0")
+  final val link: Param[String] = new Param(this, "link", "the name of the model link function",
+    ParamValidators.inArray[String](GeneralizedLinearRegression.supportedLinks.toArray))
+
+  /** @group getParam */
+  @Since("2.0.0")
+  def getLink: String = $(link)
+
+  @Since("2.0.0")
+  override def validateParams(): Unit = {
+    require(GeneralizedLinearRegression.supportedFamilyLinkPairs.contains($(family) -> $(link)),
+      s"Generalized Linear Regression with ${$(family)} family does not support ${$(link)} " +
+        s"link function.")
+  }
+}
+
+/**
+ * :: Experimental ::
+ *
+ * Fit a Generalized Linear Model ([[https://en.wikipedia.org/wiki/Generalized_linear_model]])
+ * specified by giving a symbolic description of the linear predictor and
+ * a description of the error distribution.
+ */
+@Experimental
+@Since("2.0.0")
+class GeneralizedLinearRegression @Since("2.0.0") (@Since("2.0.0") override val uid: String)
+  extends Regressor[Vector, GeneralizedLinearRegression, GeneralizedLinearRegressionModel]
+  with GeneralizedLinearRegressionParams with Logging {
+
+  @Since("2.0.0")
+  def this() = this(Identifiable.randomUID("genLinReg"))
+
+  /**
+   * Set the name of family which is a description of the error distribution
+   * to be used in the model.
+   * @group setParam
+   */
+  @Since("2.0.0")
+  def setFamily(value: String): this.type = set(family, value)
+
+  /**
+   * Set the name of the model link function.
+   * @group setParam
+   */
+  @Since("2.0.0")
+  def setLink(value: String): this.type = set(link, value)
+
+  /**
+   * Set if we should fit the intercept.
+   * Default is true.
+   * @group setParam
+   */
+  @Since("2.0.0")
+  def setFitIntercept(value: Boolean): this.type = set(fitIntercept, value)
+  

[GitHub] spark pull request: [SPARK-12594] [SQL] Outer Join Elimination by ...

2016-02-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10567#issuecomment-182898263
  
**[Test build #51098 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51098/consoleFull)**
 for PR 10567 at commit 
[`e7fa63f`](https://github.com/apache/spark/commit/e7fa63f2581b77fdbb1437ed0bf21b8fde137db0).





[GitHub] spark pull request: [SPARK-13139][SQL][WIP] Create native DDL comm...

2016-02-11 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/11048#discussion_r52612493
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkQl.scala ---
@@ -52,7 +56,7 @@ private[sql] class SparkQl(conf: ParserConf = SimpleParserConf()) extends Cataly
           getClauses(Seq("TOK_CREATETABLE", "FORMATTED", "EXTENDED"), explainArgs)
         ExplainCommand(nodeToPlan(crtTbl), extended = extended.isDefined)
 
-      case Token("TOK_EXPLAIN", explainArgs) =>
+      case Token("TOK_EXPLAIN", explainArgs) if "TOK_QUERY" == explainArgs.head.text =>
--- End diff --

Why not `Token("TOK_EXPLAIN", Token("TOK_QUERY", query) :: explainArgs) =>`?
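
A small self-contained sketch of the difference between the two forms, in case it helps. `Node` and this `Token` extractor are hypothetical stand-ins written for the example, not the parser's real classes. The guard version calls `head` on the argument list and so throws on an empty list, while the nested pattern simply fails to match.

```scala
object ExplainMatch {
  // Hypothetical stand-in for a parser AST node.
  final case class Node(text: String, children: List[Node] = Nil)

  // Extractor in the style of `Token(text, children)`, so patterns can nest.
  object Token {
    def unapply(n: Node): Option[(String, List[Node])] = Some((n.text, n.children))
  }

  // Guard form, as in the diff: evaluates `args.head` even when `args` is empty.
  def withGuard(n: Node): String = n match {
    case Token("TOK_EXPLAIN", args) if "TOK_QUERY" == args.head.text => "explain query"
    case _ => "other"
  }

  // Nested-pattern form, as suggested: the cons pattern checks for a first child itself.
  def withNestedPattern(n: Node): String = n match {
    case Token("TOK_EXPLAIN", Token("TOK_QUERY", _) :: _) => "explain query"
    case _ => "other"
  }
}
```

Both forms agree on `TOK_EXPLAIN` nodes that have at least one child; they differ only on a childless `TOK_EXPLAIN`, where the guard throws `NoSuchElementException` and the nested pattern falls through to the next case.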





[GitHub] spark pull request: [SPARK-12792][SPARKR] Refactor RRDD to support...

2016-02-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10947#issuecomment-182890890
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51094/
Test PASSed.





[GitHub] spark pull request: [SPARK-12792][SPARKR] Refactor RRDD to support...

2016-02-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10947#issuecomment-182890889
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-13277][SQL] ANTLR ignores other rule us...

2016-02-11 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/11168#issuecomment-182895732
  
@hvanhovell there are two alternatives that can match the `tableProvider` rule:

    tableProvider
    tableOpts?
    (KW_AS selectStatementWithCTE)?

and

    (LPAREN columnNameTypeList RPAREN)?
    (p=tableProvider?)
    ...

Because `(LPAREN columnNameTypeList RPAREN)` is optional, the input
`KW_USING Identifier` matches both alternatives. ANTLR therefore emits the
warning, chooses alternative 1, and disables alternative 2 for that input.
This does not affect the functionality we need.
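
To make the ambiguity concrete, here is a toy model of the two alternatives. It is only a sketch: tokens are plain strings, `tableOpts`, the `KW_AS` clause, and the elided tail of alternative 2 are dropped, and picking the first matching alternative is a simplification of what ANTLR reports in the warning.

```scala
object AmbiguousAlternatives {
  type Tokens = List[String]

  // Strip an optional `KW_USING Identifier` prefix (the tableProvider part).
  private def optionalProvider(ts: Tokens): Tokens = ts match {
    case "KW_USING" :: "Identifier" :: rest => rest
    case other => other
  }

  // Strip an optional `LPAREN columnNameTypeList RPAREN` prefix.
  private def optionalColumnList(ts: Tokens): Tokens = ts match {
    case "LPAREN" :: "columnNameTypeList" :: "RPAREN" :: rest => rest
    case other => other
  }

  // Alternative 1 (simplified): tableProvider is mandatory.
  def alt1(ts: Tokens): Boolean = ts == List("KW_USING", "Identifier")

  // Alternative 2 (simplified): column list and tableProvider are both optional.
  def alt2(ts: Tokens): Boolean = optionalProvider(optionalColumnList(ts)).isEmpty

  // 1-based numbers of every alternative that accepts the input.
  def matching(ts: Tokens): List[Int] =
    List[Tokens => Boolean](alt1, alt2).zipWithIndex.collect {
      case (alt, i) if alt(ts) => i + 1
    }

  // When more than one alternative matches, the first one wins.
  def chosen(ts: Tokens): Option[Int] = matching(ts).headOption
}
```

For the input `KW_USING Identifier` both alternatives accept, so alternative 1 is chosen; an input that starts with the column list is only accepted by alternative 2, which is why disabling alternative 2 for the ambiguous input loses nothing.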





[GitHub] spark pull request: [SPARK-11701][SPARK-13054] dynamic allocation ...

2016-02-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10951#issuecomment-182902087
  
**[Test build #51099 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51099/consoleFull)**
 for PR 10951 at commit 
[`5fc19c7`](https://github.com/apache/spark/commit/5fc19c7b292365644e8e615227f2cfa0b211d261).





[GitHub] spark pull request: [SPARK-13221] [SQL] Fixing GroupingSets when A...

2016-02-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11100#issuecomment-182902581
  
**[Test build #51100 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51100/consoleFull)**
 for PR 11100 at commit 
[`e62c3d0`](https://github.com/apache/spark/commit/e62c3d0f908eb219798c958a6af731ce2750fbb8).





[GitHub] spark pull request: [SPARK-13277][SQL] ANTLR ignores other rule us...

2016-02-11 Thread hvanhovell
Github user hvanhovell commented on the pull request:

https://github.com/apache/spark/pull/11168#issuecomment-182913207
  
LGTM





[GitHub] spark pull request: [SPARK-12792][SPARKR] Refactor RRDD to support...

2016-02-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10947#issuecomment-182890645
  
**[Test build #51094 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51094/consoleFull)**
 for PR 10947 at commit 
[`e4d6b5f`](https://github.com/apache/spark/commit/e4d6b5fe233464a35524b3686d896351b0481f84).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13221] [SQL] Fixing GroupingSets when A...

2016-02-11 Thread aray
Github user aray commented on the pull request:

https://github.com/apache/spark/pull/11100#issuecomment-182891207
  
LGTM





[GitHub] spark pull request: [SPARK-12982][SQL] Add table name validation i...

2016-02-11 Thread hvanhovell
Github user hvanhovell commented on the pull request:

https://github.com/apache/spark/pull/11051#issuecomment-182891796
  
ok to test





[GitHub] spark pull request: [SPARK-11714][Mesos] Make Spark on Mesos honor...

2016-02-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11157#issuecomment-182893361
  
**[Test build #51096 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51096/consoleFull)**
 for PR 11157 at commit 
[`a4e575d`](https://github.com/apache/spark/commit/a4e575d79dbbe7ec8935932ff284ebb0164c9971).





[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...

2016-02-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11136#issuecomment-182897395
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51097/
Test FAILed.





[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...

2016-02-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11136#issuecomment-182897389
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-13139][SQL][WIP] Create native DDL comm...

2016-02-11 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/11048#discussion_r52611305
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/SparkQlSuite.scala ---
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import org.apache.spark.sql.catalyst.plans.PlanTest
+
+class SparkQlSuite extends PlanTest {
--- End diff --

We really should test the resulting plans here, and not wait for an 
`AnalysisException` to be thrown. I know this is a PITA, but it will save us a 
lot of headaches in the future.
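
A minimal sketch of what comparing resulting plans could look like, as opposed to only checking whether an exception is thrown. Everything here is an assumption made for illustration: `Plan`, `CreateDatabase`, `DropTable` and `TinyDdlParser` are hypothetical stand-ins, not the parser or plan classes from this PR.

```scala
// Hypothetical stand-ins for illustration only; none of these are Spark classes.
sealed trait Plan
final case class CreateDatabase(name: String, ifNotExists: Boolean) extends Plan
final case class DropTable(name: String, ifExists: Boolean) extends Plan

object TinyDdlParser {
  private val CreateDb = """(?i)CREATE\s+DATABASE\s+(IF\s+NOT\s+EXISTS\s+)?(\w+)""".r
  private val DropTbl = """(?i)DROP\s+TABLE\s+(IF\s+EXISTS\s+)?(\w+)""".r

  // Parses one statement into a plan node; an unmatched optional group is null.
  def parse(sql: String): Plan = sql.trim match {
    case CreateDb(guard, name) => CreateDatabase(name, ifNotExists = guard != null)
    case DropTbl(guard, name) => DropTable(name, ifExists = guard != null)
    case other => throw new IllegalArgumentException(s"Unsupported statement: $other")
  }
}

object PlanComparisonExample {
  def main(args: Array[String]): Unit = {
    // Compare the whole plan, so a wrong name or a dropped flag fails the test.
    assert(TinyDdlParser.parse("CREATE DATABASE IF NOT EXISTS db1") ==
      CreateDatabase("db1", ifNotExists = true))
    assert(TinyDdlParser.parse("drop table t1") == DropTable("t1", ifExists = false))
  }
}
```

With case classes, structural equality makes the expected plan easy to state, and a parser change that silently alters a flag shows up as a failed comparison rather than passing unnoticed.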





[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...

2016-02-11 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/11136#discussion_r52611593
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala
 ---
@@ -0,0 +1,472 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.regression
+
+import breeze.stats.distributions.{Gaussian => GD}
+
+import org.apache.spark.Logging
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.ml.PredictorParams
+import org.apache.spark.ml.feature.Instance
+import org.apache.spark.ml.optim._
+import org.apache.spark.ml.param._
+import org.apache.spark.ml.param.shared._
+import org.apache.spark.ml.util.Identifiable
+import org.apache.spark.mllib.linalg.{BLAS, Vector}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.{DataFrame, Row}
+import org.apache.spark.sql.functions._
+
+/**
+ * Params for Generalized Linear Regression.
+ */
+private[regression] trait GeneralizedLinearRegressionParams extends PredictorParams
+  with HasFitIntercept with HasMaxIter with HasTol with HasRegParam with HasWeightCol
+  with HasSolver with Logging {
+
+  /**
+   * Param for the name of family which is a description of the error distribution
+   * to be used in the model.
+   * Supported options: "gaussian", "binomial", "poisson" and "gamma".
+   * @group param
+   */
+  @Since("2.0.0")
+  final val family: Param[String] = new Param(this, "family",
+"the name of family which is a description of the error distribution to be used in the model",
+ParamValidators.inArray[String](GeneralizedLinearRegression.supportedFamilies.toArray))
+
+  /** @group getParam */
+  @Since("2.0.0")
+  def getFamily: String = $(family)
+
+  /**
+   * Param for the name of the model link function.
+   * Supported options: "identity", "log", "inverse", "logit", "probit", "cloglog" and "sqrt".
+   * @group param
+   */
+  @Since("2.0.0")
+  final val link: Param[String] = new Param(this, "link", "the name of the model link function",
+ParamValidators.inArray[String](GeneralizedLinearRegression.supportedLinks.toArray))
+
+  /** @group getParam */
+  @Since("2.0.0")
+  def getLink: String = $(link)
+
+  @Since("2.0.0")
+  override def validateParams(): Unit = {
+require(GeneralizedLinearRegression.supportedFamilyLinkPairs.contains($(family) -> $(link)),
--- End diff --

Good point! But we cannot check ```isSet(link)``` in the setter for 
family, because users may set family before they set link, and the check would 
then give the wrong result. We can check ```isSet(link)``` at the start of train.
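
A rough sketch of that idea, deferring the cross-parameter check until training starts so that setter order does not matter. This is not the Spark ML `Params` API: `GlmSettings`, `resolveForTraining`, and the contents of `supportedPairs` are hypothetical, and falling back to the family's canonical link when link is unset is an assumption made for the example.

```scala
// Hypothetical sketch; not the Spark ML Params API.
object GlmSettings {
  // Canonical link per family, used here as an assumed default when link is unset.
  val defaultLink: Map[String, String] = Map(
    "gaussian" -> "identity", "binomial" -> "logit",
    "poisson" -> "log", "gamma" -> "inverse")

  // Illustrative subset of allowed (family, link) pairs.
  val supportedPairs: Set[(String, String)] = Set(
    "gaussian" -> "identity", "gaussian" -> "log",
    "binomial" -> "logit", "binomial" -> "probit",
    "poisson" -> "log", "poisson" -> "sqrt",
    "gamma" -> "inverse", "gamma" -> "log")
}

class GlmSettings {
  private var family: String = "gaussian"
  private var link: Option[String] = None // None plays the role of !isSet(link)

  // Setters validate only their own value, never the combination.
  def setFamily(value: String): this.type = {
    require(GlmSettings.defaultLink.contains(value), s"Unknown family: $value")
    family = value
    this
  }

  def setLink(value: String): this.type = { link = Some(value); this }

  // Called at the start of training: resolve the link, then check the pair once.
  def resolveForTraining(): (String, String) = {
    val resolved = link.getOrElse(GlmSettings.defaultLink(family))
    require(GlmSettings.supportedPairs.contains(family -> resolved),
      s"Family $family does not support link $resolved")
    family -> resolved
  }
}
```

Because the pair is only checked in `resolveForTraining`, `setLink("log").setFamily("poisson")` and `setFamily("poisson").setLink("log")` behave the same way.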





[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...

2016-02-11 Thread yanboliang
Github user yanboliang commented on the pull request:

https://github.com/apache/spark/pull/11136#issuecomment-182905997
  
Jenkins, test this please.





[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

2016-02-11 Thread redsanket
Github user redsanket commented on the pull request:

https://github.com/apache/spark/pull/10838#issuecomment-182911828
  
@zsxwing rebased and changed ArrayBuffer to HashSet
@tgravescs might want to take a look at it one more time





[GitHub] spark pull request: [WebUI][SPARK-7889] HistoryServer updates UI f...

2016-02-11 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/8#discussion_r52614670
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
@@ -511,6 +545,14 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
   bus: ReplayListenerBus): Option[FsApplicationAttemptInfo] = {
 val logPath = eventLog.getPath()
 logInfo(s"Replaying log path: $logPath")
+// Note that the eventLog may have *increased* in size since when we grabbed the filestatus,
+// and when we read the file here.  That is OK -- it may result in an unnecessary refresh
+// when there is no update, but will not result in missing an update.  We *must* prevent
+// an error the other way -- if we report a size bigger (ie later) than the file that is
+// actually read, we may never refresh the app
+// we expect FileStatus to return the file size when it was initially created, but the api
+// is not explicit about this so lets be extra-safe.
+val eventLogLength = eventLog.getLen()
--- End diff --

Ah, I see. I expected it to behave that way but couldn't find any 
documentation that made it explicit. I guess you're saying it's 
guaranteed by the post-conditions for getFileStatus()? I've updated the 
comment now.
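
For reference, the ordering being discussed can be sketched on a local file. This uses `java.nio.file` rather than the Hadoop `FileStatus` API from the diff, and `LengthBeforeRead` and `snapshotThenRead` are hypothetical names. The assumption is an append-only log: if the length is recorded before the read, the recorded value can only be less than or equal to what was actually read.

```scala
import java.nio.file.{Files, Path}

// Hypothetical helper; a local-file analogue of recording the length before replaying.
object LengthBeforeRead {
  /** Returns (length recorded before the read, bytes actually read). */
  def snapshotThenRead(path: Path): (Long, Array[Byte]) = {
    // Record the size first. If the file grows afterwards, this under-reports,
    // which at worst causes one unnecessary refresh later.
    val recordedLength = Files.size(path)
    val bytes = Files.readAllBytes(path)
    (recordedLength, bytes)
  }
}
```

Reversing the two statements would allow the recorded length to exceed the bytes read, which is the case the code comment says must be prevented.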





[GitHub] spark pull request: [SPARK-12832][MESOS] mesos scheduler respect a...

2016-02-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10949#issuecomment-182912125
  
**[Test build #51103 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51103/consoleFull)**
 for PR 10949 at commit 
[`6c934bd`](https://github.com/apache/spark/commit/6c934bd23f2df13481262ac9506ae8ab1548027e).




