[GitHub] spark pull request: [SPARK-3226][MLLIB] update mllib dependencies

2014-08-26 Thread mengxr
GitHub user mengxr opened a pull request:

https://github.com/apache/spark/pull/2128

[SPARK-3226][MLLIB] update mllib dependencies

to mention the `-Pnetlib-lgpl` option. @atalwalkar

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mengxr/spark mllib-native

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2128.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2128


commit 4cbba570f02ee9a8b749ffad23e919a98b91380e
Author: Xiangrui Meng m...@databricks.com
Date:   2014-08-26T05:55:34Z

update mllib dependencies







[GitHub] spark pull request: [SPARK-3226][MLLIB] doc update for native libr...

2014-08-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2128#issuecomment-53379064
  
[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19199/consoleFull) for PR 2128 at commit [`4cbba57`](https://github.com/apache/spark/commit/4cbba570f02ee9a8b749ffad23e919a98b91380e).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-2871] [PySpark] add RDD.lookup(key)

2014-08-26 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/2093#issuecomment-53379531
  
Some quick benchmark:

from pyspark.rdd import MaxHeapQ
import heapq, random, timeit

l = range(113)
random.shuffle(l)

def take1():
    q = MaxHeapQ(100)
    for i in l:
        q.insert(i)
    return q.getElements()

def take2():
    return heapq.nsmallest(100, l)

# for S < N
print timeit.timeit("take1()", "from __main__ import *", number=100)
# 0.748146057129
print timeit.timeit("take2()", "from __main__ import *", number=100)
# 0.142593860626

# for N < S
l = range(80)
random.shuffle(l)
print timeit.timeit("take1()", "from __main__ import *", number=1000)
# 0.156821012497
print timeit.timeit("take2()", "from __main__ import *", number=1000)
# 0.00907206535339

Whenever S < N or N < S, nsmallest() is much faster than MaxHeapQ.
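For reference, the same bounded max-heap idea as a minimal Scala sketch (an illustration of the algorithm only, not pyspark's MaxHeapQ): keep the k smallest elements seen so far, evicting the current maximum whenever a smaller element arrives.

    import scala.collection.mutable

    // Sketch: return the k smallest elements of xs in ascending order.
    def takeSmallest(k: Int, xs: Iterator[Int]): Seq[Int] = {
      val heap = mutable.PriorityQueue.empty[Int]  // max-heap by default
      xs.foreach { x =>
        if (heap.size < k) heap.enqueue(x)
        else if (x < heap.head) { heap.dequeue(); heap.enqueue(x) }
      }
      heap.dequeueAll.reverse
    }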





[GitHub] spark pull request: [WIP][SPARK-3167] Handle special driver config...

2014-08-26 Thread andrewor14
GitHub user andrewor14 opened a pull request:

https://github.com/apache/spark/pull/2129

[WIP][SPARK-3167] Handle special driver configs in Windows

This is an attempt to bring the Windows scripts up to speed after recent 
splashing changes in #1845.

This is still WIP because there is an issue with getting 
`SparkSubmitDriverBootstrapper` to work. More specifically, the `SparkSubmit` 
subprocess is not picking up `stdin` from the console as expected, because 
there is now an extra layer of execution in this code path.
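For illustration, a minimal Scala sketch (hypothetical, not the actual bootstrapper code) of what has to happen in that extra layer: the child SparkSubmit JVM only sees the console's stdin if the parent forwards or inherits it explicitly.

    import java.lang.ProcessBuilder.Redirect

    // Hypothetical sketch: launch the real SparkSubmit JVM and inherit this
    // process's stdin/stdout/stderr so interactive input still reaches it.
    val builder = new ProcessBuilder("java", "org.apache.spark.deploy.SparkSubmit")
    builder.redirectInput(Redirect.INHERIT)
    builder.redirectOutput(Redirect.INHERIT)
    builder.redirectError(Redirect.INHERIT)
    builder.start().waitFor()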

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/andrewor14/spark windows-config

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2129.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2129


commit 83ebe601032867327988940073de4ee08a42c3fe
Author: Andrew Or andrewo...@gmail.com
Date:   2014-08-26T06:28:48Z

Parse special driver configs in Windows (broken)

Note that this is still currently broken. There is an issue with
using SparkSubmitDriverBootstrapper with windows; the stdin is not
being picked up properly by the SparkSubmit subprocess. This must
be fixed before the PR is merged.







[GitHub] spark pull request: [WIP][SPARK-3167] Handle special driver config...

2014-08-26 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2129#discussion_r16697858
  
--- Diff: bin/spark-class2.cmd ---
@@ -115,5 +125,27 @@ rem Figure out where java is.
 set RUNNER=java
 if not "x%JAVA_HOME%"=="x" set RUNNER=%JAVA_HOME%\bin\java
 
-%RUNNER% -cp %CLASSPATH% %JAVA_OPTS% %*
+rem In Spark submit client mode, the driver is launched in the same JVM as Spark submit itself.
+rem Here we must parse the properties file for relevant spark.driver.* configs before launching
+rem the driver JVM itself. Instead of handling this complexity in Bash, we launch a separate JVM
+rem to prepare the launch environment of this driver JVM.
+
+rem In this case, leave out the main class (org.apache.spark.deploy.SparkSubmit) and use our own.
+rem Leaving out the first argument is surprisingly difficult to do in Windows. Note that this must
+rem be done here because the Windows shift command does not work in a conditional block.
+set BOOTSTRAP_ARGS=
+shift
+:start_parse
+if "%~1" == "" goto end_parse
+set BOOTSTRAP_ARGS=%BOOTSTRAP_ARGS% %~1
+shift
+goto start_parse
+:end_parse
+
+if not [%SPARK_SUBMIT_BOOTSTRAP_DRIVER%] == [] (
+  set SPARK_CLASS=1
+  %RUNNER% org.apache.spark.deploy.SparkSubmitDriverBootstrapper %BOOTSTRAP_ARGS%
--- End diff --

This is not working yet. See commit message 
https://github.com/andrewor14/spark/commit/83ebe601032867327988940073de4ee08a42c3fe
 for more detail.





[GitHub] spark pull request: [SPARK-2886] Use more specific actor system na...

2014-08-26 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/1810#issuecomment-53380967
  
Thanks @mateiz, I have merged this into master and 1.1.





[GitHub] spark pull request: [SPARK-2839][MLlib] Stats Toolkit documentatio...

2014-08-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2123#issuecomment-53380977
  
[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19200/consoleFull) for PR 2123 at commit [`213fe3f`](https://github.com/apache/spark/commit/213fe3f31f708ff0ee56d56e36644b51c0bba56e).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-2886] Use more specific actor system na...

2014-08-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/1810





[GitHub] spark pull request: [SPARK-2839][MLlib] Stats Toolkit documentatio...

2014-08-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2123#issuecomment-53381105
  
[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19200/consoleFull) for PR 2123 at commit [`213fe3f`](https://github.com/apache/spark/commit/213fe3f31f708ff0ee56d56e36644b51c0bba56e).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `shift # Ignore main class (org.apache.spark.deploy.SparkSubmit) and use our own`
  * `"$FWDIR"/bin/spark-submit --class org.apache.spark.repl.Main "$@"`
  * `"$FWDIR"/bin/spark-submit --class org.apache.spark.repl.Main "$@"`
  * `case class SparkListenerTaskStart(stageId: Int, stageAttemptId: Int, taskInfo: TaskInfo)`
  * `In multiclass classification, all `$2^`
  * `public final class JavaDecisionTree `
  * `class KMeansModel (val clusterCenters: Array[Vector]) extends Serializable `
  * `class BoundedFloat(float):`
  * `class JoinedRow2 extends Row `
  * `class JoinedRow3 extends Row `
  * `class JoinedRow4 extends Row `
  * `class JoinedRow5 extends Row `
  * `class GenericRow(protected[sql] val values: Array[Any]) extends Row `
  * `abstract class MutableValue extends Serializable `
  * `final class MutableInt extends MutableValue `
  * `final class MutableFloat extends MutableValue `
  * `final class MutableBoolean extends MutableValue `
  * `final class MutableDouble extends MutableValue `
  * `final class MutableShort extends MutableValue `
  * `final class MutableLong extends MutableValue `
  * `final class MutableByte extends MutableValue `
  * `final class MutableAny extends MutableValue `
  * `final class SpecificMutableRow(val values: Array[MutableValue]) extends MutableRow `
  * `case class CountDistinct(expressions: Seq[Expression]) extends PartialAggregate `
  * `case class CollectHashSet(expressions: Seq[Expression]) extends AggregateExpression `
  * `case class CollectHashSetFunction(`
  * `case class CombineSetsAndCount(inputSet: Expression) extends AggregateExpression `
  * `case class CombineSetsAndCountFunction(`
  * `case class CountDistinctFunction(`
  * `case class MaxOf(left: Expression, right: Expression) extends Expression `
  * `case class NewSet(elementType: DataType) extends LeafExpression `
  * `case class AddItemToSet(item: Expression, set: Expression) extends Expression `
  * `case class CombineSets(left: Expression, right: Expression) extends BinaryExpression `
  * `case class CountSet(child: Expression) extends UnaryExpression `
  * `case class ExplainCommand(plan: LogicalPlan, extended: Boolean = false) extends Command `





[GitHub] spark pull request: [SPARK-3224] FetchFailed reduce stages should ...

2014-08-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2127#issuecomment-53381095
  
[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19198/consoleFull) for PR 2127 at commit [`1dd3eb5`](https://github.com/apache/spark/commit/1dd3eb5b849bb250f4251730c1afa722757b706b).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-2839][MLlib] Stats Toolkit documentatio...

2014-08-26 Thread brkyvz
Github user brkyvz closed the pull request at:

https://github.com/apache/spark/pull/2123





[GitHub] spark pull request: [WIP][SPARK-3167] Handle special driver config...

2014-08-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2129#issuecomment-53381288
  
[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19201/consoleFull) for PR 2129 at commit [`83ebe60`](https://github.com/apache/spark/commit/83ebe601032867327988940073de4ee08a42c3fe).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-2871] [PySpark] add RDD.lookup(key)

2014-08-26 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/2093#issuecomment-53381277
  
In PyPy 2.3 the result is reversed: MaxHeapQ is 3x faster than nsmallest(). There both are implemented in pure Python, and nsmallest() does more work than MaxHeapQ because it keeps the result stable.

Even if nsmallest() did not do a stable sort, MaxHeapQ would still be about 30% faster, because nsmallest() will try to call __le__ if there is no __lt__ (whereas MaxHeapQ will fail if an object has no __gt__).

BTW, PyPy does very well at optimizing these algorithms.





[GitHub] spark pull request: [SPARK-2839][MLlib] Stats Toolkit documentatio...

2014-08-26 Thread brkyvz
Github user brkyvz commented on the pull request:

https://github.com/apache/spark/pull/2123#issuecomment-53381256
  
??





[GitHub] spark pull request: [SPARK-2839][MLlib] Stats Toolkit documentatio...

2014-08-26 Thread brkyvz
GitHub user brkyvz reopened a pull request:

https://github.com/apache/spark/pull/2123

[SPARK-2839][MLlib] Stats Toolkit documentation updated

Documentation updated for the Statistics Toolkit of MLlib. @mengxr 
@atalwalkar

https://issues.apache.org/jira/browse/SPARK-2839

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/brkyvz/spark StatsLib-Docs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2123.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2123


commit fec4d9d8b9569000125e5e3778d8a5521f4f0b72
Author: Burak brk...@gmail.com
Date:   2014-08-26T01:44:43Z

[SPARK-2830][MLlib] Stats Toolkit documentation updated

commit 213fe3f31f708ff0ee56d56e36644b51c0bba56e
Author: Burak brk...@gmail.com
Date:   2014-08-26T06:28:13Z

[SPARK-2839][MLlib] Modifications made according to review







[GitHub] spark pull request: [SPARK-3226][MLLIB] doc update for native libr...

2014-08-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2128#issuecomment-53382153
  
[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19199/consoleFull) for PR 2128 at commit [`4cbba57`](https://github.com/apache/spark/commit/4cbba570f02ee9a8b749ffad23e919a98b91380e).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-2839][MLlib] Stats Toolkit documentatio...

2014-08-26 Thread brkyvz
Github user brkyvz closed the pull request at:

https://github.com/apache/spark/pull/2123





[GitHub] spark pull request: [SPARK-2839][MLlib] Stats Toolkit documentatio...

2014-08-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2130#issuecomment-53383014
  
[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19202/consoleFull) for PR 2130 at commit [`bfc6896`](https://github.com/apache/spark/commit/bfc68961c29979e5f0d577eddcd73328e0366f37).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in Stor...

2014-08-26 Thread uncleGen
GitHub user uncleGen opened a pull request:

https://github.com/apache/spark/pull/2131

[SPARK-3170][CORE][BUG]:RDD info loss in StorageTab and ExecutorTab

A completed stage only needs to remove its own partitions that are no longer cached. However, the StorageTab may lose some RDDs that are actually still cached. And not only the StorageTab: the ExecutorTab may also lose RDD info that has been overwritten by the last RDD in the same task.
1. StorageTab: when multiple stages run simultaneously, a completed stage will remove RDD info belonging to other stages that are still running.
2. ExecutorTab: the TaskContext may lose some updatedBlocks info for RDDs in a dependency chain, as in the following example:

    val r1 = sc.parallelize(...).cache()
    val r2 = r1.map(...).cache()
    val n = r2.count()

When r2 is counted, both r1 and r2 end up cached. So in CacheManager.getOrCompute the TaskContext should contain the updatedBlocks of both r1 and r2; currently updatedBlocks only contains the info for r2.
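A hypothetical sketch of the fix idea for (2), assuming updatedBlocks is an Option[Seq[(BlockId, BlockStatus)]] on TaskMetrics as in Spark 1.x, and with newlyCachedBlocks standing in for the blocks just cached by one getOrCompute call:

    // Accumulate block updates across the dependency chain instead of
    // overwriting them, so the info for r1 survives when r2 is cached.
    val previous = context.taskMetrics.updatedBlocks.getOrElse(Seq.empty)
    context.taskMetrics.updatedBlocks = Some(previous ++ newlyCachedBlocks)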

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uncleGen/spark master_ui_fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2131.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2131


commit c82ba82ae90c92244e63811f30e1aeb05608c57a
Author: uncleGen husty...@gmail.com
Date:   2014-08-26T06:54:04Z

Bug Fix: RDD info loss in StorageTab and ExecutorTab







[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in Stor...

2014-08-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2131#issuecomment-53383213
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-3170][CORE]: Bug Fix in Storage UI

2014-08-26 Thread uncleGen
Github user uncleGen commented on the pull request:

https://github.com/apache/spark/pull/2076#issuecomment-53383270
  
@andrewor14 @pwendell @srowen
As my branch is not up to date, I decided to close this one and submit a new PR. Please review it: https://github.com/apache/spark/pull/2131





[GitHub] spark pull request: [SPARK-3170][CORE]: Bug Fix in Storage UI

2014-08-26 Thread uncleGen
Github user uncleGen closed the pull request at:

https://github.com/apache/spark/pull/2076





[GitHub] spark pull request: [SPARK-2964] [SQL] Remove duplicated code from...

2014-08-26 Thread sarutak
Github user sarutak commented on the pull request:

https://github.com/apache/spark/pull/1886#issuecomment-53383594
  
Jenkins, retest this please.





[GitHub] spark pull request: [SPARK-2964] [SQL] Remove duplicated code from...

2014-08-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1886#issuecomment-53384111
  
[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19204/consoleFull) for PR 1886 at commit [`8ef8751`](https://github.com/apache/spark/commit/8ef8751da568481b4abf3c87601bc2a16115d7c3).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3106] Fix the race condition issue abou...

2014-08-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2019#issuecomment-53384128
  
[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19203/consoleFull) for PR 2019 at commit [`855c207`](https://github.com/apache/spark/commit/855c2076c34801c8228c8e3fa3e5dee30c82e853).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3228][Streaming]

2014-08-26 Thread Leolh
GitHub user Leolh opened a pull request:

https://github.com/apache/spark/pull/2132

[SPARK-3228][Streaming]

When I use DStream to save files to HDFS, it creates a directory and an empty file named _SUCCESS for each job run in the batch duration.
But if no data arrives from the source for a long time and the duration is very short (e.g. 10s), this creates a great many directories and empty files in HDFS.
I don't think that is necessary, so I want to modify DStream's saveAsObjectFiles and saveAsTextFiles methods to create the directory and files only when the RDD's partition count is > 0.
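A minimal sketch of the proposed guard (assuming the DStream API's foreachRDD and rddToFileName helpers; this is an illustration, not the actual patch):

    def saveAsTextFiles(prefix: String, suffix: String = ""): Unit = {
      val saveFunc = (rdd: RDD[T], time: Time) => {
        // Only touch HDFS when this batch actually produced data.
        if (rdd.partitions.size > 0) {
          rdd.saveAsTextFile(rddToFileName(prefix, suffix, time))
        }
      }
      this.foreachRDD(saveFunc)
    }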

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Leolh/spark spark-streaming

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2132.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2132


commit 7eca5c7b323f4ba0e83355e22d0508cfb9381880
Author: leo leo@leo.localdomain
Date:   2014-08-26T07:14:13Z

When DStream save RDD to hdfs , don't create directory and empty file if 
there are no data received from source in the batch duration .

commit 35678d22a97c059e319b2fe53be69c989a855674
Author: leo leo@leo.localdomain
Date:   2014-08-26T07:23:17Z

modify the code format







[GitHub] spark pull request: [WIP][SPARK-3167] Handle special driver config...

2014-08-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2129#issuecomment-53385409
  
[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19201/consoleFull) for PR 2129 at commit [`83ebe60`](https://github.com/apache/spark/commit/83ebe601032867327988940073de4ee08a42c3fe).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `rem In this case, leave out the main class (org.apache.spark.deploy.SparkSubmit) and use our own.`






[GitHub] spark pull request: [SPARK-3228][Streaming]

2014-08-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2132#issuecomment-53385634
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-2871] [PySpark] add RDD.lookup(key)

2014-08-26 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/2093#issuecomment-53386059
  
Looks like I understand what you mean now. I am still curious to benchmark the takeOrdered function with both approaches. There is definitely a 3x difference between the C and Python implementations (w/o PyPy). Can we use PyPy with Spark?





[GitHub] spark pull request: [SPARK-3106] Fix the race condition issue abou...

2014-08-26 Thread sarutak
Github user sarutak commented on the pull request:

https://github.com/apache/spark/pull/2019#issuecomment-53386362
  
In this PR, I want to resolve the following issues.

(1) Race condition between a thread invoking ConnectionManager#stop and threads invoking Connection#close

In this case, if the thread invoking ConnectionManager#stop evaluates `connectionsByKey -= connection.key` in ConnectionManager#removeConnection() after a thread invoking Connection#close has evaluated `k.cancel` or `channel.close` in Connection#close(), the warning message "All connections not cleaned up" appears, because by the time `connectionsByKey -= connection.key` is evaluated the key is already null.

(2) Race condition between a thread invoking SendingConnection#close directly and a thread invoking SendingConnection#close after invoking ReceivingConnection#close

In this case, if the thread invoking ReceivingConnection#close evaluates `!sendingConnectionOpt.isDefined` in ConnectionManager#removeConnection after the other thread has evaluated `connectionsById -= sendingConnectionManagerId` in ConnectionManager#removeConnection, `!sendingConnectionOpt.isDefined` is true and the error message "Corresponding SendingConnection to ${remoteConnectionManagerId} not found" appears.

(3) Race condition between a thread invoking ConnectionManager#run and threads invoking Connection#close

In this case, if the thread invoking ConnectionManager#run evaluates `!key.isValid` after threads invoking Connection#close have evaluated `key.cancel`, `!key.isValid` is true and an error message related to CancelledKeyException appears.
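All three are instances of the same check-then-act pattern. As an illustration only (not the actual patch), the kind of guard that closes such a race puts the validity check and the cancellation under one lock:

    import java.nio.channels.SelectionKey

    private val keyLock = new Object

    def close(key: SelectionKey): Unit = keyLock.synchronized {
      key.cancel()  // cannot interleave with the check below
    }

    def process(key: SelectionKey): Unit = keyLock.synchronized {
      if (key.isValid) {
        // ... handle I/O without risking CancelledKeyException ...
      }
    }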





[GitHub] spark pull request: [SPARK-3216] Spark-shell is broken in branch-1...

2014-08-26 Thread sarutak
Github user sarutak commented on the pull request:

https://github.com/apache/spark/pull/2122#issuecomment-53386852
  
Strange... the tests wouldn't start.
May I re-create the PR for this issue?





[GitHub] spark pull request: SPARK-2096 [SQL]: Correctly parse dot notation...

2014-08-26 Thread chuxi
Github user chuxi commented on the pull request:

https://github.com/apache/spark/pull/2082#issuecomment-53386890
  
Thank you, marmbrus, you are so nice. I am new here and have never posted a PR to an open-source project before. I will take your suggestions and modify my code to follow the Scala style.





[GitHub] spark pull request: [SPARK-2839][MLlib] Stats Toolkit documentatio...

2014-08-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2130#issuecomment-53387619
  
[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19202/consoleFull) for PR 2130 at commit [`bfc6896`](https://github.com/apache/spark/commit/bfc68961c29979e5f0d577eddcd73328e0366f37).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-2608] fix executor backend launch commo...

2014-08-26 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/1986#discussion_r16700565
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala ---
@@ -111,48 +113,43 @@ private[spark] class CoarseMesosSchedulerBackend(
 
   def createCommand(offer: Offer, numCores: Int): CommandInfo = {
     val environment = Environment.newBuilder()
-    val extraClassPath = conf.getOption("spark.executor.extraClassPath")
-    extraClassPath.foreach { cp =>
-      environment.addVariables(
-        Environment.Variable.newBuilder().setName("SPARK_CLASSPATH").setValue(cp).build())
-    }
+    val mesosCommand = CommandInfo.newBuilder()
+      .setEnvironment(environment)
+
+    val driverUrl = "akka.tcp://spark@%s:%s/user/%s".format(
+      conf.get("spark.driver.host"), conf.get("spark.driver.port"),
+      CoarseGrainedSchedulerBackend.ACTOR_NAME)
+    val args = Seq(driverUrl, offer.getSlaveId.getValue, offer.getHostname, numCores.toString)
     val extraJavaOpts = conf.getOption("spark.executor.extraJavaOptions")
+      .map(Utils.splitCommandString).getOrElse(Seq.empty)
 
-    val libraryPathOption = "spark.executor.extraLibraryPath"
-    val extraLibraryPath = conf.getOption(libraryPathOption).map(p => s"-Djava.library.path=$p")
-    val extraOpts = Seq(extraJavaOpts, extraLibraryPath).flatten.mkString(" ")
+    // Start executors with a few necessary configs for registering with the scheduler
+    val sparkJavaOpts = Utils.sparkJavaOpts(conf, SparkConf.isExecutorStartupConf)
+    val javaOpts = sparkJavaOpts ++ extraJavaOpts
 
-    sc.executorEnvs.foreach { case (key, value) =>
-      environment.addVariables(Environment.Variable.newBuilder()
-        .setName(key)
-        .setValue(value)
-        .build())
+    val classPathEntries = conf.getOption("spark.executor.extraClassPath").toSeq.flatMap { cp =>
+      cp.split(java.io.File.pathSeparator)
     }
-    val command = CommandInfo.newBuilder()
-      .setEnvironment(environment)
-    val driverUrl = "akka.tcp://spark@%s:%s/user/%s".format(
-      conf.get("spark.driver.host"),
-      conf.get("spark.driver.port"),
-      CoarseGrainedSchedulerBackend.ACTOR_NAME)
+    val libraryPathEntries =
+      conf.getOption("spark.executor.extraLibraryPath").toSeq.flatMap { cp =>
+        cp.split(java.io.File.pathSeparator)
+      }
+
+    val command = Command(
+      "org.apache.spark.executor.CoarseGrainedExecutorBackend", args, sc.executorEnvs,
--- End diff --

Only passing `sc.executorEnvs` to the `command` object here is not enough. We only use `command` with `CommandUtils.buildCommandSeq` below to generate the command line string, and in this case `command.environment` is only used to run `bin/compute-classpath` (see [here](https://github.com/apache/spark/blob/b21ae5bbb9baa966f69303a30659aa8bbb2098da/core/src/main/scala/org/apache/spark/deploy/worker/CommandUtils.scala#L69-L71)); it is not propagated to the target executor process.
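For comparison, a sketch of what propagating the variables through the CommandInfo itself could look like, mirroring the removed code above (names taken from the diff):

    // Add the executor env vars to the CommandInfo's Environment so the
    // launched executor process sees them directly.
    sc.executorEnvs.foreach { case (key, value) =>
      environment.addVariables(
        Environment.Variable.newBuilder().setName(key).setValue(value).build())
    }
    mesosCommand.setEnvironment(environment)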





[GitHub] spark pull request: [SPARK-3173][SQL] Timestamp support in the par...

2014-08-26 Thread chuxi
Github user chuxi commented on the pull request:

https://github.com/apache/spark/pull/2084#issuecomment-53387954
  
@marmbrus, I agree with you. Using CAST lets us avoid some tough design work. I know little about Hive; do you mean there is a CAST problem in HiveTypeCoercion? I will try to follow the code.

 





[GitHub] spark pull request: [SPARK-3150] Fix NullPointerException in in Sp...

2014-08-26 Thread tanyatik
Github user tanyatik commented on the pull request:

https://github.com/apache/spark/pull/2062#issuecomment-53388890
  
Yes, I did; this patch fixes the NPE and Spark restarts successfully.





[GitHub] spark pull request: [SPARK-3106] Fix the race condition issue abou...

2014-08-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2019#issuecomment-53388879
  
[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19203/consoleFull) for PR 2019 at commit [`855c207`](https://github.com/apache/spark/commit/855c2076c34801c8228c8e3fa3e5dee30c82e853).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3007][SQL]Add Dynamic Partition suppo...

2014-08-26 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/1919#issuecomment-53390280
  
Hi @marmbrus, I have updated the files related to the tests, and all tests passed on my machine. Would you please help verify this patch when you have time? :) I have written out the thinking behind the code. Thank you.
@rxin @liancheng





[GitHub] spark pull request: [SPARK-2964] [SQL] Remove duplicated code from...

2014-08-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1886#issuecomment-53391025
  
[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19204/consoleFull) for PR 1886 at commit [`8ef8751`](https://github.com/apache/spark/commit/8ef8751da568481b4abf3c87601bc2a16115d7c3).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `"$FWDIR"/bin/spark-submit --class $CLASS "$@"`
  * `"$FWDIR"/bin/spark-submit --class $CLASS "$@"`






[GitHub] spark pull request: [SPARK-2608] fix executor backend launch commo...

2014-08-26 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/1986#discussion_r16703237
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala ---
@@ -96,18 +99,39 @@ private[spark] class MesosSchedulerBackend(
 .setValue(value)
 .build())
 }
-    val command = CommandInfo.newBuilder()
+    val mesosCommand = CommandInfo.newBuilder()
       .setEnvironment(environment)
-    val uri = sc.conf.get("spark.executor.uri", null)
-    if (uri == null) {
-      command.setValue(new File(sparkHome, "/sbin/spark-executor").getCanonicalPath)
+
+    val extraJavaOpts = conf.getOption("spark.executor.extraJavaOptions")
+      .map(Utils.splitCommandString).getOrElse(Seq.empty)
+
+    // Start executors with a few necessary configs for registering with the scheduler
+    val sparkJavaOpts = Utils.sparkJavaOpts(conf, SparkConf.isExecutorStartupConf)
+    val javaOpts = sparkJavaOpts ++ extraJavaOpts
+
+    val classPathEntries = conf.getOption("spark.executor.extraClassPath").toSeq.flatMap { cp =>
+      cp.split(java.io.File.pathSeparator)
+    }
+    val libraryPathEntries =
+      conf.getOption("spark.executor.extraLibraryPath").toSeq.flatMap { cp =>
+        cp.split(java.io.File.pathSeparator)
+      }
+
+    val command = Command(
+      "org.apache.spark.executor.MesosExecutorBackend", Nil, sc.executorEnvs,
+      classPathEntries, libraryPathEntries, javaOpts)
--- End diff --

`PYTHONPATH` set by `sbin/spark-executor` is ignored here.





[GitHub] spark pull request: [SPARK-1777] Prevent OOMs from single partitio...

2014-08-26 Thread liyezhang556520
Github user liyezhang556520 commented on a diff in the pull request:

https://github.com/apache/spark/pull/1165#discussion_r16703247
  
--- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala ---
@@ -141,6 +193,93 @@ private class MemoryStore(blockManager: BlockManager, maxMemory: Long)
   }
 
   /**
+   * Unroll the given block in memory safely.
+   *
+   * The safety of this operation refers to avoiding potential OOM exceptions caused by
+   * unrolling the entirety of the block in memory at once. This is achieved by periodically
+   * checking whether the memory restrictions for unrolling blocks are still satisfied,
+   * stopping immediately if not. This check is a safeguard against the scenario in which
+   * there is not enough free memory to accommodate the entirety of a single block.
+   *
+   * This method returns either an array with the contents of the entire block or an iterator
+   * containing the values of the block (if the array would have exceeded available memory).
+   */
+  def unrollSafely(
+      blockId: BlockId,
+      values: Iterator[Any],
+      droppedBlocks: ArrayBuffer[(BlockId, BlockStatus)])
+    : Either[Array[Any], Iterator[Any]] = {
+
+    // Number of elements unrolled so far
+    var elementsUnrolled = 0
+    // Whether there is still enough memory for us to continue unrolling this block
+    var keepUnrolling = true
+    // Initial per-thread memory to request for unrolling blocks (bytes). Exposed for testing.
+    val initialMemoryThreshold = conf.getLong("spark.storage.unrollMemoryThreshold", 1024 * 1024)
+    // How often to check whether we need to request more memory
+    val memoryCheckPeriod = 16
+    // Memory currently reserved by this thread for this particular unrolling operation
+    var memoryThreshold = initialMemoryThreshold
+    // Memory to request as a multiple of current vector size
+    val memoryGrowthFactor = 1.5
+    // Previous unroll memory held by this thread, for releasing later (only at the very end)
+    val previousMemoryReserved = currentUnrollMemoryForThisThread
+    // Underlying vector for unrolling the block
+    var vector = new SizeTrackingVector[Any]
+
+    // Request enough memory to begin unrolling
+    keepUnrolling = reserveUnrollMemoryForThisThread(initialMemoryThreshold)
+
+    // Unroll this block safely, checking whether we have exceeded our threshold periodically
+    try {
+      while (values.hasNext && keepUnrolling) {
+        vector += values.next()
+        if (elementsUnrolled % memoryCheckPeriod == 0) {
+          // If our vector's size has exceeded the threshold, request more memory
+          val currentSize = vector.estimateSize()
+          if (currentSize >= memoryThreshold) {
+            val amountToRequest = (currentSize * (memoryGrowthFactor - 1)).toLong
+            // Hold the accounting lock, in case another thread concurrently puts a block that
+            // takes up the unrolling space we just ensured here
+            accountingLock.synchronized {
+              if (!reserveUnrollMemoryForThisThread(amountToRequest)) {
+                // If the first request is not granted, try again after ensuring free space
+                // If there is still not enough space, give up and drop the partition
+                val spaceToEnsure = maxUnrollMemory - currentUnrollMemory
+                if (spaceToEnsure > 0) {
+                  val result = ensureFreeSpace(blockId, spaceToEnsure)
+                  droppedBlocks ++= result.droppedBlocks
+                }
+                keepUnrolling = reserveUnrollMemoryForThisThread(amountToRequest)
+              }
+            }
+            // New threshold is currentSize * memoryGrowthFactor
+            memoryThreshold = currentSize + amountToRequest
+          }
+        }
+        elementsUnrolled += 1
+      }
+
+      if (keepUnrolling) {
+        // We successfully unrolled the entirety of this block
+        Left(vector.toArray)
+      } else {
+        // We ran out of space while unrolling the values for this block
+        Right(vector.iterator ++ values)
+      }
+
+    } finally {
+      // If we return an array, the values returned do not depend on the underlying vector and
+      // we can immediately free up space for other threads. Otherwise, if we return an iterator,
+      // we release the memory claimed by this thread later on when the task finishes.
+      if (keepUnrolling) {
+        val amountToRelease = currentUnrollMemoryForThisThread - previousMemoryReserved
+        releaseUnrollMemoryForThisThread(amountToRelease)
--- End diff --


[GitHub] spark pull request: [SPARK-2608] fix executor backend launch commo...

2014-08-26 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/1986#discussion_r16703504
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala ---
@@ -96,18 +99,39 @@ private[spark] class MesosSchedulerBackend(
 .setValue(value)
 .build())
 }
-    val command = CommandInfo.newBuilder()
+    val mesosCommand = CommandInfo.newBuilder()
       .setEnvironment(environment)
-    val uri = sc.conf.get("spark.executor.uri", null)
-    if (uri == null) {
-      command.setValue(new File(sparkHome, "/sbin/spark-executor").getCanonicalPath)
+
+    val extraJavaOpts = conf.getOption("spark.executor.extraJavaOptions")
+      .map(Utils.splitCommandString).getOrElse(Seq.empty)
+
+    // Start executors with a few necessary configs for registering with the scheduler
+    val sparkJavaOpts = Utils.sparkJavaOpts(conf, SparkConf.isExecutorStartupConf)
+    val javaOpts = sparkJavaOpts ++ extraJavaOpts
+
+    val classPathEntries = conf.getOption("spark.executor.extraClassPath").toSeq.flatMap { cp =>
+      cp.split(java.io.File.pathSeparator)
+    }
+    val libraryPathEntries =
+      conf.getOption("spark.executor.extraLibraryPath").toSeq.flatMap { cp =>
+        cp.split(java.io.File.pathSeparator)
+      }
+
+    val command = Command(
+      "org.apache.spark.executor.MesosExecutorBackend", Nil, sc.executorEnvs,
+      classPathEntries, libraryPathEntries, javaOpts)
--- End diff --

We should somehow set `PYTHONPATH` the way `sbin/spark-executor` does.
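A hypothetical sketch of doing that in createCommand (the py4j version is illustrative only; sparkHome as in the surrounding code):

    // Mimic sbin/spark-executor by exporting PYTHONPATH to the executor.
    val pythonPath = Seq(
      s"$sparkHome/python",
      s"$sparkHome/python/lib/py4j-0.8.1-src.zip"  // version assumed for illustration
    ).mkString(java.io.File.pathSeparator)
    environment.addVariables(
      Environment.Variable.newBuilder().setName("PYTHONPATH").setValue(pythonPath).build())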





[GitHub] spark pull request: [SPARK-3230][SQL] Fix udfs that return structs

2014-08-26 Thread marmbrus
GitHub user marmbrus opened a pull request:

https://github.com/apache/spark/pull/2133

[SPARK-3230][SQL] Fix udfs that return structs

We need to convert the case classes into Rows.
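As a minimal sketch of the conversion (the case class is hypothetical; GenericRow is the catalyst row class listed elsewhere in this thread):

    import org.apache.spark.sql.catalyst.expressions.GenericRow

    case class Location(lat: Double, lon: Double)  // hypothetical UDF result type

    // Flatten a case class (a Product) into a Row of its field values.
    def productToRow(p: Product) = new GenericRow(p.productIterator.toArray)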

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/marmbrus/spark structUdfs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2133.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2133


commit d8d0b769272d9a333b927e3ad78e6cfef4d49797
Author: Michael Armbrust mich...@databricks.com
Date:   2014-08-26T09:53:10Z

Fix udfs that return structs







[GitHub] spark pull request: [SPARK-2710] [SQL] Build SchemaRDD from a Jdbc...

2014-08-26 Thread chutium
Github user chutium commented on a diff in the pull request:

https://github.com/apache/spark/pull/1612#discussion_r16704875
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala ---
@@ -57,6 +61,8 @@ class JdbcRDD[T: ClassTag](
 mapRow: (ResultSet) => T = JdbcRDD.resultSetToObjectArray _)
   extends RDD[T](sc, Nil) with Logging {
 
+  private var schema: Seq[(String, Int, Boolean)] = null
--- End diff --

Yep, I tried to do it as you said before, but there is no public method or attribute to get the ResultSet or Statement from ```JdbcRDD``` in Spark core, so in ```JdbcResultSetRDD``` I have no idea how we can get the metadata from ```JdbcRDD```... Otherwise we could do something like ```jdbcRDD.head``` and get the metadata from the first row, but that may execute the whole query at plan phase.
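One hedged workaround sketch, assuming the driver evaluates the query eagerly enough to expose metadata: run it with an always-false predicate and read only the ResultSetMetaData, which maps onto the `Seq[(String, Int, Boolean)]` schema above.

    import java.sql.{DriverManager, ResultSetMetaData}

    // Fetch column name, JDBC type, and nullability without scanning the table.
    val conn = DriverManager.getConnection("jdbc:derby:target/JdbcSchemaRDDSuiteDb")
    val rs = conn.createStatement.executeQuery("SELECT DATA FROM FOO WHERE 1 = 0")
    val md = rs.getMetaData
    val schema: Seq[(String, Int, Boolean)] = (1 to md.getColumnCount).map { i =>
      (md.getColumnName(i), md.getColumnType(i),
        md.isNullable(i) == ResultSetMetaData.columnNullable)
    }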





[GitHub] spark pull request: [SPARK-3230][SQL] Fix udfs that return structs

2014-08-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2133#issuecomment-53399018
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19205/consoleFull)
 for   PR 2133 at commit 
[`d8d0b76`](https://github.com/apache/spark/commit/d8d0b769272d9a333b927e3ad78e6cfef4d49797).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-2710] [SQL] Build SchemaRDD from a Jdbc...

2014-08-26 Thread chutium
Github user chutium commented on a diff in the pull request:

https://github.com/apache/spark/pull/1612#discussion_r16705207
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JdbcResultSetRDDSuite.scala ---
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.jdbc
+
+import java.sql._
+
+import org.scalatest.BeforeAndAfter
+import org.apache.spark.sql.QueryTest
+import org.apache.spark.sql.test.TestSQLContext._
+
+class JdbcResultSetRDDSuite extends QueryTest with BeforeAndAfter {
+
+  before {
+    Class.forName("org.apache.derby.jdbc.EmbeddedDriver")
+    val conn = DriverManager.getConnection("jdbc:derby:target/JdbcSchemaRDDSuiteDb;create=true")
+    try {
+      val create = conn.createStatement
+      create.execute("""
+        CREATE TABLE FOO(
+          ID INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY (START WITH 1, INCREMENT BY 1),
+          DATA INTEGER
+        )""")
+      create.close()
+      val insert = conn.prepareStatement("INSERT INTO FOO(DATA) VALUES(?)")
+      (1 to 100).foreach { i =>
+        insert.setInt(1, i * 2)
+        insert.executeUpdate
+      }
+      insert.close()
+    } catch {
+      case e: SQLException if e.getSQLState == "X0Y32" =>
+        // table exists
+    } finally {
+      conn.close()
+    }
+  }
+
+  test("basic functionality") {
+    val jdbcResultSetRDD = jdbcResultSet("jdbc:derby:target/JdbcSchemaRDDSuiteDb", "SELECT DATA FROM FOO")
+    jdbcResultSetRDD.registerAsTable("foo")
+
+    checkAnswer(
+      sql("select count(*) from foo"),
+      100
+    )
+    checkAnswer(
+      sql("select sum(DATA) from foo"),
+      10100
+    )
+  }
+
+  after {
+    try {
+      DriverManager.getConnection("jdbc:derby:;shutdown=true")
+    } catch {
+      case se: SQLException if se.getSQLState == "XJ015" =>
--- End diff --

I also have no idea what this XJ015 is... I took it from 
https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/rdd/JdbcRDDSuite.scala
 :P





[GitHub] spark pull request: [SPARK-2710] [SQL] Build SchemaRDD from a Jdbc...

2014-08-26 Thread chutium
Github user chutium commented on a diff in the pull request:

https://github.com/apache/spark/pull/1612#discussion_r16705825
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala ---
@@ -81,8 +113,14 @@ class JdbcRDD[T: ClassTag](
       logInfo("statement fetch size set to: " + stmt.getFetchSize + " to force MySQL streaming ")
     }
 
-    stmt.setLong(1, part.lower)
-    stmt.setLong(2, part.upper)
+    val parameterCount = stmt.getParameterMetaData.getParameterCount
+    if (parameterCount > 0) {
--- End diff --

I am afraid they do not think it is a problem; from the original comment of 
JdbcRDD:
```
 * @param sql the text of the query.
 *   The query must contain two ? placeholders for parameters used to partition the results.
 *   E.g. "select title, author from books where ? <= id and id <= ?"
```

but I believe many users simply want to pull the whole table out of the RDBMS 
and then do some calculation in Spark's magic world... how many partitions get 
created does not matter, and in the normal use case the tables stored in the 
RDBMS are small, so these two ? placeholders for partitioning are not always 
necessary (see the sketch below).
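
For contrast, a sketch of the documented two-placeholder form (the Derby URL and FOO table are borrowed from the test suite earlier in this thread; all bounds are illustrative):

```scala
import java.sql.{DriverManager, ResultSet}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.JdbcRDD

def wholeTable(sc: SparkContext): JdbcRDD[Int] = {
  // The ? <= ID bounds exist purely so rows can be split across partitions.
  new JdbcRDD(sc,
    () => DriverManager.getConnection("jdbc:derby:target/JdbcSchemaRDDSuiteDb"),
    "SELECT ID, DATA FROM FOO WHERE ? <= ID AND ID <= ?",
    1L, 100L, 3,                    // lowerBound, upperBound, numPartitions
    (r: ResultSet) => r.getInt(2))  // mapRow
}
```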





[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

2014-08-26 Thread liyezhang556520
GitHub user liyezhang556520 opened a pull request:

https://github.com/apache/spark/pull/2134

[SPARK-3000][CORE] drop old blocks to disk in parallel when memory is no...

...t large enough for caching new blocks

Currently, dropping old blocks to make room for caching new blocks is handled 
by a single thread, which cannot fully utilize the disk throughput; if the 
total size of the blocks to be dropped is huge, the dropping takes a very long 
time. We need to process it in parallel. In this patch, the block-dropping 
operation is handled by multiple threads, and before dropping, each thread 
selects the blocks it will drop itself.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/liyezhang556520/spark spark-3000-v0.4.1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2134.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2134


commit 357dae839034490bf83b8fdadb413cdef32f2e8b
Author: Zhang, Liye liye.zh...@intel.com
Date:   2014-08-26T10:20:30Z

[SPARK-3000][CORE] drop old blocks to disk in parallel when memory is not 
large enough for caching new blocks

Currently, dropping old blocks to make room for caching new blocks is handled 
by a single thread, which cannot fully utilize the disk throughput; if the 
total size of the blocks to be dropped is huge, the dropping takes a very long 
time. We need to process it in parallel. In this patch, the block-dropping 
operation is handled by multiple threads, and before dropping, each thread 
selects the blocks it will drop itself. A sketch of the idea follows.
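
A minimal sketch of that scheme, independent of the BlockManager internals (`dropToDisk` is a stand-in for the real eviction call, not Spark API):

```scala
import java.util.concurrent.{ConcurrentLinkedQueue, CountDownLatch, Executors}

// Each worker thread claims block ids from a shared queue, so the selected
// blocks are disjoint and their disk writes can overlap.
def dropInParallel(toDrop: Seq[String], dropToDisk: String => Unit, threads: Int = 4): Unit = {
  val queue = new ConcurrentLinkedQueue[String]()
  toDrop.foreach(queue.add)
  val done = new CountDownLatch(threads)
  val pool = Executors.newFixedThreadPool(threads)
  (1 to threads).foreach { _ =>
    pool.execute(new Runnable {
      override def run(): Unit = {
        var id = queue.poll()
        while (id != null) {
          dropToDisk(id)
          id = queue.poll()
        }
        done.countDown()
      }
    })
  }
  done.await()
  pool.shutdown()
}
```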







[GitHub] spark pull request: [SPARK-3229] spark.shuffle.safetyFraction and ...

2014-08-26 Thread sarutak
GitHub user sarutak opened a pull request:

https://github.com/apache/spark/pull/2135

[SPARK-3229] spark.shuffle.safetyFraction and spark.storage.safetyFraction 
is not documented



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sarutak/spark SPARK-3229

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2135.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2135


commit fdcc6b7e8d547237c691ad8aa7ba8099349b6483
Author: Kousuke Saruta saru...@oss.nttdata.co.jp
Date:   2014-08-26T10:35:35Z

Added descriptions for spark.shuffle.memoryFraction and 
spark.storage.memoryFraction
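
For context, each safetyFraction multiplies with its memoryFraction into an effective cap; a sketch using what I believe were the 1.x defaults (0.2 and 0.8 for shuffle, 0.6 and 0.9 for storage):

```scala
// effective limit ~= executor heap * memoryFraction * safetyFraction
val heapBytes = Runtime.getRuntime.maxMemory
val shuffleLimit = (heapBytes * 0.2 * 0.8).toLong // shuffle.memoryFraction * shuffle.safetyFraction
val storageLimit = (heapBytes * 0.6 * 0.9).toLong // storage.memoryFraction * storage.safetyFraction
```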







[GitHub] spark pull request: [SPARK-3229] spark.shuffle.safetyFraction and ...

2014-08-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2135#issuecomment-53402599
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19206/consoleFull)
 for   PR 2135 at commit 
[`fdcc6b7`](https://github.com/apache/spark/commit/fdcc6b7e8d547237c691ad8aa7ba8099349b6483).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

2014-08-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2134#issuecomment-53402614
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19207/consoleFull)
 for   PR 2134 at commit 
[`357dae8`](https://github.com/apache/spark/commit/357dae839034490bf83b8fdadb413cdef32f2e8b).
 * This patch merges cleanly.





[GitHub] spark pull request: Geolocation to twitter stream

2014-08-26 Thread danjamker
Github user danjamker commented on the pull request:

https://github.com/apache/spark/pull/2098#issuecomment-53404090
  
Yes, they look like they are doing the same thing. 






[GitHub] spark pull request: [SPARK-3106] Fix the race condition issue abou...

2014-08-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2019#issuecomment-53405037
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19208/consoleFull)
 for   PR 2019 at commit 
[`4eee6c9`](https://github.com/apache/spark/commit/4eee6c9402efb862b015ea3a9203ebafb21592bc).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3230][SQL] Fix udfs that return structs

2014-08-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2133#issuecomment-53405791
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19205/consoleFull)
 for   PR 2133 at commit 
[`d8d0b76`](https://github.com/apache/spark/commit/d8d0b769272d9a333b927e3ad78e6cfef4d49797).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3229] spark.shuffle.safetyFraction and ...

2014-08-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2135#issuecomment-53407257
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19206/consoleFull)
 for   PR 2135 at commit 
[`fdcc6b7`](https://github.com/apache/spark/commit/fdcc6b7e8d547237c691ad8aa7ba8099349b6483).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

2014-08-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2134#issuecomment-53407378
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19207/consoleFull)
 for   PR 2134 at commit 
[`357dae8`](https://github.com/apache/spark/commit/357dae839034490bf83b8fdadb413cdef32f2e8b).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3131][SQL] Allow user to set parquet co...

2014-08-26 Thread chutium
Github user chutium commented on the pull request:

https://github.com/apache/spark/pull/2039#issuecomment-53408822
  
thanks @marmbrus, I changed the property name and the default codec





[GitHub] spark pull request: [SPARK-3131][SQL] Allow user to set parquet co...

2014-08-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2039#issuecomment-53409498
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19209/consoleFull)
 for   PR 2039 at commit 
[`2f44964`](https://github.com/apache/spark/commit/2f4496492c7d68cf90094c900aa8905b6d6f9241).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3106] Fix the race condition issue abou...

2014-08-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2019#issuecomment-53410681
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19208/consoleFull)
 for   PR 2019 at commit 
[`4eee6c9`](https://github.com/apache/spark/commit/4eee6c9402efb862b015ea3a9203ebafb21592bc).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-2608] fix executor backend launch commo...

2014-08-26 Thread scwf
Github user scwf commented on a diff in the pull request:

https://github.com/apache/spark/pull/1986#discussion_r16709350
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala ---
@@ -111,48 +113,43 @@ private[spark] class CoarseMesosSchedulerBackend(
 
   def createCommand(offer: Offer, numCores: Int): CommandInfo = {
     val environment = Environment.newBuilder()
-    val extraClassPath = conf.getOption("spark.executor.extraClassPath")
-    extraClassPath.foreach { cp =>
-      environment.addVariables(
-        Environment.Variable.newBuilder().setName("SPARK_CLASSPATH").setValue(cp).build())
-    }
+    val mesosCommand = CommandInfo.newBuilder()
+      .setEnvironment(environment)
+
+    val driverUrl = "akka.tcp://spark@%s:%s/user/%s".format(
+      conf.get("spark.driver.host"), conf.get("spark.driver.port"),
+      CoarseGrainedSchedulerBackend.ACTOR_NAME)
+    val args = Seq(driverUrl, offer.getSlaveId.getValue, offer.getHostname, numCores.toString)
     val extraJavaOpts = conf.getOption("spark.executor.extraJavaOptions")
+      .map(Utils.splitCommandString).getOrElse(Seq.empty)
 
-    val libraryPathOption = "spark.executor.extraLibraryPath"
-    val extraLibraryPath = conf.getOption(libraryPathOption).map(p => s"-Djava.library.path=$p")
-    val extraOpts = Seq(extraJavaOpts, extraLibraryPath).flatten.mkString(" ")
+    // Start executors with a few necessary configs for registering with the scheduler
+    val sparkJavaOpts = Utils.sparkJavaOpts(conf, SparkConf.isExecutorStartupConf)
+    val javaOpts = sparkJavaOpts ++ extraJavaOpts
 
-    sc.executorEnvs.foreach { case (key, value) =>
-      environment.addVariables(Environment.Variable.newBuilder()
-        .setName(key)
-        .setValue(value)
-        .build())
+    val classPathEntries = conf.getOption("spark.executor.extraClassPath").toSeq.flatMap { cp =>
+      cp.split(java.io.File.pathSeparator)
     }
-    val command = CommandInfo.newBuilder()
-      .setEnvironment(environment)
-    val driverUrl = "akka.tcp://spark@%s:%s/user/%s".format(
-      conf.get("spark.driver.host"),
-      conf.get("spark.driver.port"),
-      CoarseGrainedSchedulerBackend.ACTOR_NAME)
+    val libraryPathEntries =
+      conf.getOption("spark.executor.extraLibraryPath").toSeq.flatMap { cp =>
+        cp.split(java.io.File.pathSeparator)
+      }
+
+    val command = Command(
+      "org.apache.spark.executor.CoarseGrainedExecutorBackend", args, sc.executorEnvs,
--- End diff --

yes, here I should set the env on the CommandInfo so it propagates to the 
target executor process.





[GitHub] spark pull request: [SPARK-2871] [PySpark] add RDD.lookup(key)

2014-08-26 Thread mattf
Github user mattf commented on the pull request:

https://github.com/apache/spark/pull/2093#issuecomment-53415133
  
 @mattf While I was scanning down the whole file line by line in order to 
find all the issues related to preservesPartitioning, I reformatted lines at 
the same time when they did not look nice to me. It's a completely personal 
judgement, so maybe it does not make sense to others.
 
 It's not a good idea to do this kind of reformatting in a PR; I was also 
thinking of doing it as a separate PR, or not doing it at all if we have no 
compelling reason.
 
 Should I remove these unrelated changes?

if it were up to me, i'd say yes. it's not though, so i'll go with the flow.

i'm still trying to get a feel for what the spark community likes in its 
PRs and JIRAs.





[GitHub] spark pull request: [SPARK-2871] [PySpark] add RDD.lookup(key)

2014-08-26 Thread mattf
Github user mattf commented on the pull request:

https://github.com/apache/spark/pull/2093#issuecomment-53415478
  
 count() is supposed to be cheaper than collect(); we call count() instead of 
collect() to trigger the computation in Scala/Java, and it's better to keep 
the same style in Python.
 
 But in PySpark, count() depends on collect(), which will dump the results to 
disk and load them into Python. In the future this may change, and count() 
will return a number straight from the JVM.
 
 Right now there is no strong reason to change collect() to count(); revert it?

thank you for the explanation, it wasn't clear from the code. my 
preference is for isolated changes, so i'd suggest reverting and doing it 
separately. however, others may not agree. so i'd say at least add a comment 
about why count() is used -- someone might come along and change it back to 
collect() without knowing they shouldn't.





[GitHub] spark pull request: [SPARK-2871] [PySpark] add histgram() API

2014-08-26 Thread mattf
Github user mattf commented on the pull request:

https://github.com/apache/spark/pull/2091#issuecomment-53416722
  
lgtm





[GitHub] spark pull request: [SPARK-3131][SQL] Allow user to set parquet co...

2014-08-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2039#issuecomment-53418058
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19209/consoleFull)
 for   PR 2039 at commit 
[`2f44964`](https://github.com/apache/spark/commit/2f4496492c7d68cf90094c900aa8905b6d6f9241).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-911] allow efficient queries for a rang...

2014-08-26 Thread aaronjosephs
Github user aaronjosephs commented on the pull request:

https://github.com/apache/spark/pull/1381#issuecomment-53425210
  
After taking a look at this again I realized I should actually be using 
PartitionPruningRDD to avoid launching tasks on bad partitions
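
For reference, a sketch of how `PartitionPruningRDD` is used; the filter predicate here is illustrative (it receives a partition index and decides whether that partition is kept):

```scala
import org.apache.spark.rdd.{PartitionPruningRDD, RDD}

// Tasks are never launched for partitions the predicate rejects.
def pruneByIndex[T](rdd: RDD[T], keep: Int => Boolean): RDD[T] =
  PartitionPruningRDD.create(rdd, keep)
```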





[GitHub] spark pull request: [SPARK-3216] [SPARK-3232] Spark-shell is broke...

2014-08-26 Thread sarutak
GitHub user sarutak opened a pull request:

https://github.com/apache/spark/pull/2136

[SPARK-3216] [SPARK-3232] Spark-shell is broken in branch-1.0 / Backport 
SPARK-3006 into branch-1.0



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sarutak/spark SPARK-3216

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2136.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2136


commit 15cd980534650e120352ff97204966f18fcdf278
Author: Andrew Or andrewo...@gmail.com
Date:   2014-08-25T23:02:28Z

Fix spark-shell in branch-1.0

commit bbc722170fe04563c89516e6ec48b9588cb07176
Author: Kousuke Saruta saru...@oss.nttdata.co.jp
Date:   2014-08-26T13:42:07Z

Replace 4-space indentation with 2-space

commit af0517fdf078f3f2078233efbe7a19e2a1b3d32d
Author: Kousuke Saruta saru...@oss.nttdata.co.jp
Date:   2014-08-26T14:02:41Z

Modified spark-shell.cmd, backporting SPARK-3006







[GitHub] spark pull request: mllib: Clarify learning interfaces

2014-08-26 Thread BigCrunsh
GitHub user BigCrunsh opened a pull request:

https://github.com/apache/spark/pull/2137

mllib: Clarify learning interfaces

**Make threshold mandatory**
Currently, the output of ``predict`` for an example is either the score
or the class. This side-effect is caused by ``clearThreshold``. To
clarify that behaviour, three different types of predict (predictScore,
predictClass, predictProbability) were introduced; the threshold is no
longer optional.

**Clarify classification interfaces**
Currently, some functionality is spread over multiple models.
In order to clarify the structure and simplify the implementation of
more complex models (like multinomial logistic regression), two new
classes are introduced:
- BinaryClassificationModel: for all models that derive a binary
classification from a single weight vector. It comprises the thresholding
functionality to derive a prediction from a score, and basically captures
SVMModel and LogisticRegressionModel.
- ProbabilisticClassificationModel: this trait defines the interface for
models that return a calibrated confidence score (aka probability).

**Misc**
- some renaming
- add test for probabilistic output

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/soundcloud/spark mllib-improvements

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2137.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2137


commit b015b7a6bfe1db4cf57cce1e96c08904f0758100
Author: Christoph Sawade christ...@sawade.me
Date:   2014-08-22T20:38:40Z

Clarify learning interfaces

* Make threshold mandatory
Currently, the output of ``predict`` for an example is either the score
or the class. This side-effect is caused by ``clearThreshold``. To
clarify that behaviour, three different types of predict (predictScore,
predictClass, predictProbability) were introduced; the threshold is no
longer optional.

* Clarify classification interfaces
Currently, some functionality is spread over multiple models.
In order to clarify the structure and simplify the implementation of
more complex models (like multinomial logistic regression), two new
classes are introduced:
- BinaryClassificationModel: for all models that derive a binary
classification from a single weight vector. It comprises the thresholding
functionality to derive a prediction from a score, and basically captures
SVMModel and LogisticRegressionModel.
- ProbabilisticClassificationModel: this trait defines the interface for
models that return a calibrated confidence score (aka probability).

* Misc
- some renaming
- add test for probabilistic output
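
A sketch of what that split could look like; the trait and class names come from the description above, but the signatures are assumptions, not the actual patch:

```scala
import org.apache.spark.mllib.linalg.Vector

trait ClassificationModel {
  def predictScore(features: Vector): Double
  def predictClass(features: Vector): Double
}

// Models deriving a binary decision from a single weight vector plus a
// mandatory threshold (would cover SVMModel and LogisticRegressionModel).
abstract class BinaryClassificationModel(
    weights: Vector, intercept: Double, threshold: Double) extends ClassificationModel {
  override def predictScore(features: Vector): Double =
    weights.toArray.zip(features.toArray).map { case (w, x) => w * x }.sum + intercept
  override def predictClass(features: Vector): Double =
    if (predictScore(features) >= threshold) 1.0 else 0.0
}

// Models that can additionally return a calibrated confidence (probability).
trait ProbabilisticClassificationModel extends ClassificationModel {
  def predictProbability(features: Vector): Double
}
```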







[GitHub] spark pull request: [SPARK-2873] [SQL] using ExternalAppendOnlyMap...

2014-08-26 Thread guowei2
Github user guowei2 commented on the pull request:

https://github.com/apache/spark/pull/2029#issuecomment-53426380
  
```
import org.apache.spark.sql.catalyst.types.{IntegerType, DataType}
import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.execution._
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.SparkContext._
import org.apache.spark._
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.execution.OnHeapAggregate
import org.apache.spark.sql.catalyst.expressions.Alias
import org.apache.spark.sql.catalyst.expressions.BoundReference


object AggregateBenchMark extends App {

  val sc = new SparkContext(
    new SparkConf().setMaster("local").setAppName("agg-benchmark"))

  val dataType: DataType = IntegerType
  val aggExps = Seq(Alias(sum(BoundReference(1, dataType, true)), "sum")())
  val groupExps = Seq(BoundReference(0, dataType, true))
  val attributes = aggExps.map(_.toAttribute)
  val childPlan = rowsPlan(sc, attributes)

  def benchmarkOnHeap = {
    val begin = System.currentTimeMillis()
    OnHeapAggregate(false, groupExps, aggExps, childPlan).execute().foreach(_ => {})
    val end = System.currentTimeMillis()
    end - begin
  }

  def benchmarkExternal = {
    val begin = System.currentTimeMillis()
    ExternalAggregate(false, groupExps, aggExps, childPlan).execute().foreach(_ => {})
    val end = System.currentTimeMillis()
    end - begin
  }

  (1 to 5).map(_ => println("OnHeapAggregate time: " + benchmarkOnHeap))
  (1 to 5).map(_ => println("ExternalAggregate time: " + benchmarkExternal))

}

private[spark] class TestRDD(
    sc: SparkContext,
    numPartitions: Int) extends RDD[Row](sc, Nil) with Serializable {

  override def compute(split: Partition, context: TaskContext): Iterator[Row] = {
    new Iterator[Row] {
      var lines = 0
      override final def hasNext: Boolean = lines < 300
      override final def next(): Row = {
        lines += 1
        val row = new GenericMutableRow(2)
        row(0) = (math.random * 2000).toInt
        row(1) = (math.random * 50).toInt
        row.asInstanceOf[Row]
      }
    }
  }
  override def getPartitions = (0 until numPartitions).map(i => new Partition {
    override def index = i
  }).toArray
  override def getPreferredLocations(split: Partition): Seq[String] = Nil
  override def toString: String = "TestRDD " + id
}


case class rowsPlan(@transient val sc: SparkContext, attributes: Seq[Attribute]) extends LeafNode {

  override def output = attributes

  override def execute() = {
    new TestRDD(sc, 1).asInstanceOf[RDD[Row]]
  }
}
```





[GitHub] spark pull request: mllib: Clarify learning interfaces

2014-08-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2137#issuecomment-53426978
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-3216] [SPARK-3232] Spark-shell is broke...

2014-08-26 Thread sarutak
Github user sarutak commented on the pull request:

https://github.com/apache/spark/pull/2136#issuecomment-53427178
  
Jenkins, test this please.





[GitHub] spark pull request: [SPARK-3216] [SPARK-3232] Spark-shell is broke...

2014-08-26 Thread sarutak
Github user sarutak commented on the pull request:

https://github.com/apache/spark/pull/2136#issuecomment-53428221
  
@andrewor14 I also tried, but couldn't get Jenkins to work...





[GitHub] spark pull request: [SPARK-3173][SQL] Timestamp support in the par...

2014-08-26 Thread byF
Github user byF commented on the pull request:

https://github.com/apache/spark/pull/2084#issuecomment-53428397
  
@marmbrus `PromoteStrings` makes perfect sense; I've been having trouble 
running the tests on my machine, and I'll push the fix once I get the test working.





[GitHub] spark pull request: [SPARK-2873] [SQL] using ExternalAppendOnlyMap...

2014-08-26 Thread guowei2
Github user guowei2 commented on the pull request:

https://github.com/apache/spark/pull/2029#issuecomment-53433705
  
@marmbrus 

The benchmark result above is quite sad.
Once one spill happens, a batch of spills usually follows one after another.

The size of the AppendOnlyMap depends on the number of distinct keys, since 
values with the same key are merged.

I think using ExternalAppendOnlyMap is not a good way, for it is too 
expensive when records with the same key spill to disk over and over again.

Besides, a user can easily avoid OOM by raising 
spark.sql.shuffle.partitions to reduce the number of keys per partition (see 
the sketch below).

I think the logic of ExternalAppendOnlyMap should be optimized.

Join seems to have similar problems; meanwhile, putting both the left and the 
right table into an ExternalAppendOnlyMap is expensive too.
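
The workaround mentioned is a one-line config change; a sketch (the value 800 is illustrative, and I believe the default is 200):

```scala
import org.apache.spark.sql.SQLContext

// More, smaller hash partitions mean fewer distinct keys per aggregate map.
def raiseShufflePartitions(sqlContext: SQLContext): Unit =
  sqlContext.setConf("spark.sql.shuffle.partitions", "800")
```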





[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

2014-08-26 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/2126#issuecomment-53434178
  
I assume you are not using secure HDFS? Is this problem something caused 
inside of Mesos, like nested doAs calls? I'm not familiar with the mesos 
deploys.

You shouldn't need to set the HADOOP_USER_NAME variable, because the code is 
already creating a remote user and then doing a doAs with that user, which 
should set the user without the need for the env variable.

What is the debug statement returning for the user when you run into the 
problem?
 logDebug("running as user: " + user)
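
For reference, a sketch of the create-remote-user-plus-doAs pattern being described (a simplified shape, not Spark's exact code):

```scala
import java.security.PrivilegedExceptionAction
import org.apache.hadoop.security.UserGroupInformation

// Runs body as the given user; Hadoop FileSystem calls inside pick up this
// identity in simple-auth mode without needing HADOOP_USER_NAME.
def runAs[T](user: String)(body: => T): T = {
  val ugi = UserGroupInformation.createRemoteUser(user)
  ugi.doAs(new PrivilegedExceptionAction[T] {
    override def run(): T = body
  })
}
```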





[GitHub] spark pull request: [SPARK-3110][YARN] Add a ha mode in YARN mod...

2014-08-26 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/2024#issuecomment-53435471
  
I do like the idea of breaking things up into small patches, but I also like 
the idea of having a good understanding of how the feature will work together.

What happens if someone sets the -ha flag right now? Do the containers stay 
around forever? I assume at some point they will time out, but until then they 
will be wasting resources.

I don't think this small patch provides much as is, so it could be combined 
with more of the functionality. I would rather see this one closed and just 
combined with the next one, where we have more context.





[GitHub] spark pull request: SPARK-3177 : Yarn-alpha ClientBaseSuite Unit t...

2014-08-26 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/2111#issuecomment-53436623
  
Which versions of hadoop are you using when you say yarn-alpha vs yarn? 
hadoop 0.23 doesn't even contain DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH, so I 
assume you are talking about an early version of 2.x?





[GitHub] spark pull request: [SPARK-2608] fix executor backend launch commo...

2014-08-26 Thread scwf
Github user scwf commented on a diff in the pull request:

https://github.com/apache/spark/pull/1986#discussion_r16719740
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala ---
@@ -96,18 +99,39 @@ private[spark] class MesosSchedulerBackend(
         .setValue(value)
         .build())
     }
-    val command = CommandInfo.newBuilder()
+    val mesosCommand = CommandInfo.newBuilder()
       .setEnvironment(environment)
-    val uri = sc.conf.get("spark.executor.uri", null)
-    if (uri == null) {
-      command.setValue(new File(sparkHome, "/sbin/spark-executor").getCanonicalPath)
+
+    val extraJavaOpts = conf.getOption("spark.executor.extraJavaOptions")
+      .map(Utils.splitCommandString).getOrElse(Seq.empty)
+
+    // Start executors with a few necessary configs for registering with the scheduler
+    val sparkJavaOpts = Utils.sparkJavaOpts(conf, SparkConf.isExecutorStartupConf)
+    val javaOpts = sparkJavaOpts ++ extraJavaOpts
+
+    val classPathEntries = conf.getOption("spark.executor.extraClassPath").toSeq.flatMap { cp =>
+      cp.split(java.io.File.pathSeparator)
+    }
+    val libraryPathEntries =
+      conf.getOption("spark.executor.extraLibraryPath").toSeq.flatMap { cp =>
+        cp.split(java.io.File.pathSeparator)
+      }
+
+    val command = Command(
+      "org.apache.spark.executor.MesosExecutorBackend", Nil, sc.executorEnvs,
+      classPathEntries, libraryPathEntries, javaOpts)
--- End diff --

I am wondering whether it would be ok to set ```PYTHONPATH``` in the 
environment as follows:
``` scala
val environment = Environment.newBuilder()
sc.executorEnvs.foreach { case (key, value) =>
  environment.addVariables(Environment.Variable.newBuilder()
    .setName(key)
    .setValue(value)
    .build())
}
environment.addVariables(Environment.Variable.newBuilder()
  .setName("PYTHONPATH")
  .setValue(sys.env.getOrElse("PYTHONPATH", ""))
  .build())
```





[GitHub] spark pull request: [SPARK-3233] Executor never stops its SparkEnv,...

2014-08-26 Thread sarutak
GitHub user sarutak opened a pull request:

https://github.com/apache/spark/pull/2138

[SPARK-3233] Executor never stops its SparkEnv, BlockManager, 
ConnectionManager etc.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sarutak/spark SPARK-3233

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2138.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2138


commit e5ad9d3c070a8b9770c0eef23c9d570bffef43d5
Author: Kousuke Saruta saru...@oss.nttdata.co.jp
Date:   2014-08-26T15:18:01Z

Modified Executor to stop SparkEnv at the end of itself







[GitHub] spark pull request: [SPARK-3233] Executor never stops its SparkEnv,...

2014-08-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2138#issuecomment-53440646
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19212/consoleFull)
 for   PR 2138 at commit 
[`e5ad9d3`](https://github.com/apache/spark/commit/e5ad9d3c070a8b9770c0eef23c9d570bffef43d5).
 * This patch merges cleanly.





[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-08-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1983#issuecomment-53440660
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19213/consoleFull)
 for   PR 1983 at commit 
[`05a1c79`](https://github.com/apache/spark/commit/05a1c795deda2466698f536b58ef29909c157854).
 * This patch merges cleanly.





[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

2014-08-26 Thread jongyoul
Github user jongyoul commented on the pull request:

https://github.com/apache/spark/pull/2126#issuecomment-53443546
  
Yes, I'm not using secure HDFS, for a few reasons. Mesos is just a resource 
manager, so it doesn't care about the running program's user id. Mesos with 
the switch_user option changes the running program's id to the account that 
ran spark-submit, but that can cause another issue: every slave machine has to 
know the account id of whoever runs spark-submit. So Spark changes its user id 
itself, whatever the mesos switch_user option says.

HADOOP_USER_NAME is only valid in non-secure mode. In secure mode, that 
property is meaningless and we must use the switch_user option.

Around logDebug("running as user: " + user), the remote user is changed to 
SPARK_USER and the Spark application runs as that user. But HDFS does not work 
like that in non-secure mode: the user of the FileSystem is decided by the 
following steps - check whether HDFS runs in secure mode (KERBEROS) or not; if 
not in secure mode, check whether HADOOP_USER_NAME is set in System.getenv or 
System.getProperty; and finally, fall back to the system user 
(UserGroupInformation.commit()).

Spark on mesos runs against HDFS in non-secure mode, so the HDFS client uses 
the system user if HADOOP_USER_NAME is not set, and the system user is mesos' 
id, not SPARK_USER. Thus the HDFS user name of the driver running spark-submit 
is not the same as the HDFS client user name on the executors, which causes a 
permission problem.
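
A sketch of that resolution order in simple-auth mode (my paraphrase of the steps above, not UserGroupInformation's literal code):

```scala
// Effective HDFS user when security is off:
val effectiveUser: String =
  sys.env.get("HADOOP_USER_NAME")               // 1. environment variable
    .orElse(sys.props.get("HADOOP_USER_NAME"))  // 2. JVM system property
    .getOrElse(System.getProperty("user.name")) // 3. fall back to the OS user
```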





[GitHub] spark pull request: [SPARK-2889] Create Hadoop config objects cons...

2014-08-26 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1843#discussion_r16722717
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala ---
@@ -68,7 +68,26 @@ class SparkHadoopUtil extends Logging {
* Return an appropriate (subclass) of Configuration. Creating config 
can initializes some Hadoop
* subsystems.
*/
-  def newConfiguration(): Configuration = new Configuration()
+  def newConfiguration(conf: SparkConf): Configuration = {
--- End diff --

I know the whole deploy package is excluded from mima checks (because I 
added the exclude at @pwendell's request). How is it documented that these 
packages are private, if at all? Do we need explicit annotations in that case?

(http://spark.apache.org/docs/1.0.0/api/scala/#package does not list the 
package, so maybe that's it?)





[GitHub] spark pull request: [SPARK-3198] [SQL] Generate the expression id ...

2014-08-26 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/2114#discussion_r16722823
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala ---
@@ -38,7 +38,7 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] {
* Unlike `equals`, `id` can be used to differentiate distinct but 
structurally
* identical branches of a tree.
*/
-  val id = TreeNode.nextId()
+  @transient lazy val id = TreeNode.nextId()
--- End diff --

Yes, you're right, we need to think about the id usage, but currently this is 
a workaround for performance. I noticed that aggregation performance is not so 
good because a large number of `AggregateFunction` objects are created during 
execution on the slaves; as you know, `AggregateFunction` is a subclass of 
`TreeNode`, and the id generation here is actually the bottleneck in a 
multithreaded environment (because of the memory barrier on a multi-core 
system).

On the other hand, I don't think we have any logic that calls `eq` on an 
expression object during execution time on the slaves, directly or indirectly. 
Those calls are supposed to happen on the master (for example in logical plan 
analysis & optimization).

So I am OK with merging this PR as the quick fix, or with rewriting the code 
to get rid of the id entirely.
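
A sketch of the contention being described, assuming the id counter is a shared `AtomicLong` (the usual shape of such `nextId()` counters):

```scala
import java.util.concurrent.atomic.AtomicLong

object NodeIds {
  private val currentId = new AtomicLong(0)
  // Every construction on every core contends on this one counter.
  def nextId(): Long = currentId.getAndIncrement()
}

abstract class Node {
  // An eager `val id = NodeIds.nextId()` would pay the atomic increment for
  // every object, even on slaves that never read the id. Lazy + transient
  // defers the increment until the id is actually read and skips it on
  // deserialized copies.
  @transient lazy val id: Long = NodeIds.nextId()
}
```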





[GitHub] spark pull request: [SPARK-3197] [SQL] Reduce the Expression tree ...

2014-08-26 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/2113#discussion_r16722891
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala ---
@@ -105,17 +105,18 @@ case class Min(child: Expression) extends PartialAggregate with trees.UnaryNode[
 case class MinFunction(expr: Expression, base: AggregateExpression) extends AggregateFunction {
   def this() = this(null, null) // Required for serialization.
 
-  var currentMin: Any = _
+  @transient var currentMin: MutableLiteral = MutableLiteral(null, expr.dataType)
--- End diff --

Oh, yes, you're right, I will update this.





[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-08-26 Thread erikerlandson
Github user erikerlandson commented on the pull request:

https://github.com/apache/spark/pull/1964#issuecomment-53445323
  
@yu-iskw, I'm in favor of adopting a standardized distance metric class.
How best to proceed is a question of architecture and road map.   I'm 
interested in @mengxr 's opinions on where to take it from here, as it would 
have a nontrivial impact on MLLib interfaces.





[GitHub] spark pull request: [SPARK-3233] Executor never stops its SparkEnv,...

2014-08-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2138#issuecomment-53446686
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19212/consoleFull)
 for   PR 2138 at commit 
[`e5ad9d3`](https://github.com/apache/spark/commit/e5ad9d3c070a8b9770c0eef23c9d570bffef43d5).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

2014-08-26 Thread jongyoul
Github user jongyoul commented on the pull request:

https://github.com/apache/spark/pull/2126#issuecomment-53447464
  
These are debug logs for the two different versions.

HADOOP_USER_NAME is not set:
14/08/27 01:11:01 DEBUG UserGroupInformation: hadoop login
14/08/27 01:11:01 DEBUG UserGroupInformation: hadoop login commit
14/08/27 01:11:01 DEBUG UserGroupInformation: using local user:UnixPrincipal: hdfs
14/08/27 01:11:01 DEBUG UserGroupInformation: UGI loginUser:hdfs (auth:SIMPLE)

HADOOP_USER_NAME is set:
14/08/26 20:18:18 DEBUG SparkHadoopUtil: running as user: 1001079
14/08/26 20:18:18 DEBUG SparkHadoopUtil: running hadoop client as user: 1001079
14/08/26 20:18:18 DEBUG UserGroupInformation: hadoop login
14/08/26 20:18:18 DEBUG UserGroupInformation: hadoop login commit
14/08/26 20:18:18 DEBUG UserGroupInformation: UGI loginUser:1001079 (auth:SIMPLE)
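
As a sketch of what these logs imply (not part of this patch): under SIMPLE auth, Hadoop's login prefers the HADOOP_USER_NAME environment variable and only falls back to the local OS principal, which can be verified directly with the Hadoop API.

    import org.apache.hadoop.security.UserGroupInformation

    object LoginUserCheck {
      def main(args: Array[String]): Unit = {
        // With HADOOP_USER_NAME=1001079 in the environment this prints
        // "1001079"; without it, the OS user (e.g. "hdfs"), matching the
        // two logs above.
        println(UserGroupInformation.getLoginUser.getUserName)
      }
    }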





[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-08-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1983#issuecomment-53449444
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19213/consoleFull)
 for   PR 1983 at commit 
[`05a1c79`](https://github.com/apache/spark/commit/05a1c795deda2466698f536b58ef29909c157854).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class Document(docId: Int, content: Iterable[Int], var topics: Iterable[Int] = null,`






[GitHub] spark pull request: [SPARK-3233] Executor never stops its SparkEnv,...

2014-08-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2138#issuecomment-53450738
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19214/consoleFull)
 for   PR 2138 at commit 
[`6058a58`](https://github.com/apache/spark/commit/6058a58bdf670327252ef613e531f4ca734a097b).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-2871] [PySpark] add RDD.lookup(key)

2014-08-26 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/2093#issuecomment-53451126
  
@ScrapCodes I have run PySpark with PyPy successfully; I will send out a PR and some benchmarks later.





[GitHub] spark pull request: [SPARK-2889] Create Hadoop config objects cons...

2014-08-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1843#issuecomment-53451443
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19215/consoleFull)
 for   PR 1843 at commit 
[`3d345cb`](https://github.com/apache/spark/commit/3d345cba145cdaaaddea23708f7c825f4640).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-2871] [PySpark] add RDD.lookup(key)

2014-08-26 Thread mattf
Github user mattf commented on the pull request:

https://github.com/apache/spark/pull/2093#issuecomment-53451465
  
> Could we merge this? Maybe it can catch the last train for 1.1.

I understand, but it's not my call.

I'll probably have a stronger opinion in the coming weeks *smile*





[GitHub] spark pull request: [SPARK-2871] [PySpark] add RDD.lookup(key)

2014-08-26 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/2093#issuecomment-53450874
  
@mattf I have only been working on Spark recently; I am trying to follow the same process as everyone else, and I have made some mistakes along the way. I hope to do better, thanks.

There are many things that need to be done right now (especially for the 1.1 release); quality and process are both important things we should take care of.

Could we merge this? Maybe it can catch the last train for 1.1.





[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

2014-08-26 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/2126#issuecomment-53452530
  
What does the Spark log say the user is when it does the runAsUser: logDebug("running as user: " + user)? I assume the proper user, like 1001079?

Right, what commit() does after checking HADOOP_USER_NAME is look at the OS name, and this is where I thought the doAs would properly set it. Perhaps I'm mistaken.

To clarify, your setup is like this?
 - on Mesos the executors run as a super user like 'mesos'
 - the HDFS cluster is running as user 'hdfs'
 - when it does the runAsUser it switches to try to use the actual user (SPARK_USER), for example 'joe'

One other reason I ask about this is that it works fine on YARN. Running insecure HDFS and YARN as the user 'yarn' and then accessing HDFS as the actual user (joe) works fine; permissions are set properly. So I'm trying to figure out what the difference is with Mesos.
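
For context, here is a minimal sketch of the doAs pattern under discussion (a hypothetical wrapper, not Spark's runAsSparkUser itself): Hadoop filesystem calls made inside the doAs block are attributed to the remote UGI, e.g. 'joe', rather than the OS user running the executor.

    import java.security.PrivilegedExceptionAction
    import org.apache.hadoop.security.UserGroupInformation

    object RunAsSketch {
      // Run `body` as `user` from HDFS's point of view (SIMPLE auth).
      def runAs[T](user: String)(body: => T): T = {
        val ugi = UserGroupInformation.createRemoteUser(user)
        ugi.doAs(new PrivilegedExceptionAction[T] {
          override def run(): T = body
        })
      }

      def main(args: Array[String]): Unit = {
        runAs("joe") {
          // Any FileSystem calls here would carry the user "joe".
          println(UserGroupInformation.getCurrentUser.getUserName) // joe
        }
      }
    }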





[GitHub] spark pull request: [SPARK-2845] Add timestamps to block manager e...

2014-08-26 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/654#issuecomment-53452975
  
Ping.





[GitHub] spark pull request: Add line continuation for script to work w/ py...

2014-08-26 Thread mattf
GitHub user mattf opened a pull request:

https://github.com/apache/spark/pull/2139

Add line continuation for script to work w/ py2.7.5

Error was:

$ SPARK_HOME=$PWD/dist ./dev/create-release/generate-changelist.py
  File "./dev/create-release/generate-changelist.py", line 128
    if day < SPARK_REPO_CHANGE_DATE1 or
                                      ^
SyntaxError: invalid syntax

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mattf/spark master-fix-generate-changelist.py-0

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2139.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2139


commit 1380afa649f94ac44e19bd1cff87b0d8e9e888c0
Author: Matthew Farrellee m...@redhat.com
Date:   2014-08-26T16:58:42Z

Add line continuation for script to work w/ py2.7.5

Error was:

$ SPARK_HOME=$PWD/dist ./dev/create-release/generate-changelist.py
  File "./dev/create-release/generate-changelist.py", line 128
    if day < SPARK_REPO_CHANGE_DATE1 or
                                      ^
SyntaxError: invalid syntax







[GitHub] spark pull request: Add line continuation for script to work w/ py...

2014-08-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2139#issuecomment-53454019
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in Stor...

2014-08-26 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/2131#issuecomment-53454018
  
test this please





[GitHub] spark pull request: [SPARK-3106] Fix the race condition issue abou...

2014-08-26 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2019#discussion_r16726926
  
--- Diff: core/src/main/scala/org/apache/spark/network/ConnectionManager.scala ---
@@ -280,42 +280,46 @@ private[spark] class ConnectionManager(
 }
 
 while(!keyInterestChangeRequests.isEmpty) {
+  // Expect key interested in OP_ACCEPT is not change its interest
--- End diff --

Not sure I understand what the comment is trying to say.
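
One plausible reading of the comment, shown as a standalone NIO sketch (hypothetical, not the ConnectionManager code): the server channel's key is registered for OP_ACCEPT once and its interest set is never changed afterwards, unlike per-connection keys whose interest ops get toggled.

    import java.net.InetSocketAddress
    import java.nio.channels.{SelectionKey, Selector, ServerSocketChannel}

    object SelectorSketch extends App {
      val selector = Selector.open()
      val server = ServerSocketChannel.open()
      server.configureBlocking(false)
      server.socket().bind(new InetSocketAddress(0))

      // The accept key's interest is fixed at registration time...
      val acceptKey = server.register(selector, SelectionKey.OP_ACCEPT)
      assert(acceptKey.interestOps() == SelectionKey.OP_ACCEPT)

      // ...whereas a connected channel's key is the one that gets flipped,
      // e.g. key.interestOps(SelectionKey.OP_READ | SelectionKey.OP_WRITE).
      selector.close()
      server.close()
    }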





[GitHub] spark pull request: [SPARK-3229] spark.shuffle.safetyFraction and ...

2014-08-26 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/2135#issuecomment-53454566
  
@sarutak These configs are internal to Spark and are only relevant because of the way we do size estimation. They aren't intended to be documented or exposed to the public; with all the different fractions we have, exposing them would likely confuse users. If the user wishes, s/he can lower the relevant `memoryFraction` instead.
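
As a usage note (illustrative values; the property names are the public Spark 1.x ones): a user who wants to shift memory between shuffle and storage would tune the documented fractions on SparkConf, not the internal safety fractions.

    import org.apache.spark.{SparkConf, SparkContext}

    object MemoryFractionExample extends App {
      val conf = new SparkConf()
        .setAppName("memory-fraction-example")
        .setMaster("local[2]")
        // Public knobs (1.x defaults are 0.2 and 0.6 respectively):
        .set("spark.shuffle.memoryFraction", "0.3")
        .set("spark.storage.memoryFraction", "0.5")
      val sc = new SparkContext(conf)
      sc.stop()
    }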




