[GitHub] spark pull request: Kevincox spark timestamp

kevincox Wed, 19 Aug 2015 12:46:11 -0700

GitHub user kevincox opened a pull request:

    https://github.com/apache/spark/pull/8319


    Kevincox spark timestamp

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Shopify/spark kevincox-spark-timestamp

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/8319.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #8319
    
----
commit 6cf67a7780ffa5ffe6ace9c1efea94419c595459
Author: Dale <[email protected]>
Date:   2015-01-04T21:28:37Z

    [SPARK-4787] Stop SparkContext if a DAGScheduler init error occurs
    
    Author: Dale <[email protected]>
    
    Closes #3809 from tigerquoll/SPARK-4787 and squashes the following commits:
    
    5661e01 [Dale] [SPARK-4787] Ensure that call to stop() doesn't lose the 
exception by using a finally block.
    2172578 [Dale] [SPARK-4787] Stop context properly if an exception occurs 
during DAGScheduler initialization.

commit 37d7d5cc295956be636e4ef89ee03550d82c4808
Author: bilna <[email protected]>
Date:   2015-01-05T03:37:48Z

    [SPARK-4631] unit test for MQTT
    
    Please review the unit test for MQTT
    
    Author: bilna <[email protected]>
    Author: Bilna P <[email protected]>
    
    Closes #3844 from Bilna/master and squashes the following commits:
    
    acea3a3 [bilna] Adding dependency with scope test
    28681fa [bilna] Merge remote-tracking branch 'upstream/master'
    fac3904 [bilna] Correction in Indentation and coding style
    ed9db4c [bilna] Merge remote-tracking branch 'upstream/master'
    4b34ee7 [Bilna P] Update MQTTStreamSuite.scala
    04503cf [bilna] Added embedded broker service for mqtt test
    89d804e [bilna] Merge remote-tracking branch 'upstream/master'
    fc8eb28 [bilna] Merge remote-tracking branch 'upstream/master'
    4b58094 [Bilna P] Update MQTTStreamSuite.scala
    b1ac4ad [bilna] Added BeforeAndAfter
    5f6bfd2 [bilna] Added BeforeAndAfter
    e8b6623 [Bilna P] Update MQTTStreamSuite.scala
    5ca6691 [Bilna P] Update MQTTStreamSuite.scala
    8616495 [bilna] [SPARK-4631] unit test for MQTT

commit 3fc949771fc907b5b7ebd3c712fe9f5b310b3505
Author: Josh Rosen <[email protected]>
Date:   2015-01-05T04:26:18Z

    [SPARK-4835] Disable validateOutputSpecs for Spark Streaming jobs
    
    This patch disables output spec. validation for jobs launched through Spark 
Streaming, since this interferes with checkpoint recovery.
    
    Hadoop OutputFormats have a `checkOutputSpecs` method which performs 
certain checks prior to writing output, such as checking whether the output 
directory already exists.  SPARK-1100 added checks for FileOutputFormat, 
SPARK-1677 (#947) added a SparkConf configuration to disable these checks, and 
SPARK-2309 (#1088) extended these checks to run for all OutputFormats, not just 
FileOutputFormat.
    
    In Spark Streaming, we might have to re-process a batch during checkpoint 
recovery, so `save` actions may be called multiple times.  In addition to 
`DStream`'s own save actions, users might use `transform` or `foreachRDD` and 
call the `RDD` and `PairRDD` save actions.  When output spec. validation is 
enabled, the second calls to these actions will fail due to existing output.
    
    This patch automatically disables output spec. validation for jobs 
submitted by the Spark Streaming scheduler.  This is done by using Scala's 
`DynamicVariable` to propagate the bypass setting without having to mutate 
SparkConf or introduce a global variable.
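
As a rough illustration of the `DynamicVariable` approach described above, here
is a minimal sketch. The object and method names are hypothetical and are not
taken from the patch; only `scala.util.DynamicVariable` and its `withValue` and
`value` members are real library API.

```scala
import scala.util.DynamicVariable

// Hypothetical names throughout; this is not Spark's actual code.
object OutputSpecValidation {
  // Default: validation stays enabled unless a caller opts out for a scope.
  private val disabled = new DynamicVariable[Boolean](false)

  // Run `body` with validation bypassed; the previous value is restored
  // when `body` returns or throws.
  def withValidationDisabled[T](body: => T): T =
    disabled.withValue(true)(body)

  def isEnabled: Boolean = !disabled.value

  // Stand-in for a save action that checks the output directory first.
  def save(existing: Set[String], path: String): String =
    if (isEnabled && existing.contains(path))
      throw new IllegalStateException(s"Output directory $path already exists")
    else s"wrote $path"
}
```

The bypass only applies inside the `withValidationDisabled` block, so code
running outside it still sees the default and no shared configuration object
has to be mutated.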
    
    Author: Josh Rosen <[email protected]>
    
    Closes #3832 from JoshRosen/SPARK-4835 and squashes the following commits:
    
    36eaf35 [Josh Rosen] Add comment explaining use of transform() in test.
    6485cf8 [Josh Rosen] Add test case in Streaming; fix bug for transform()
    7b3e06a [Josh Rosen] Remove Streaming-specific setting to undo this change; 
update conf. guide
    bf9094d [Josh Rosen] Revise disableOutputSpecValidation() comment to not 
refer to Spark Streaming.
    e581d17 [Josh Rosen] Deduplicate isOutputSpecValidationEnabled logic.
    762e473 [Josh Rosen] [SPARK-4835] Disable validateOutputSpecs for Spark 
Streaming jobs.

commit 3dbf4f2100fb5dc2d7f8f68ea3147ceed47e6e9d
Author: zsxwing <[email protected]>
Date:   2015-01-05T05:03:17Z

    [SPARK-5067][Core] Use '===' to compare well-defined case class
    
    A simple fix would be adding `assert(e1.appId == e2.appId)` for 
`SparkListenerApplicationStart`. But actually we can use `===` for well-defined 
case classes directly. Therefore, instead of applying that simple fix, I use 
`===` to compare those well-defined case classes (all fields have implemented a 
correct `equals` method, such as primitive types).
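
To make the "well-defined case class" condition concrete, here is a small
sketch using plain `==`. The event types are hypothetical stand-ins, not
Spark's listener classes. The point is that the generated `equals` is only as
good as the `equals` of each field.

```scala
// Hypothetical event types for illustration; not Spark's listener classes.
object CaseClassEquality {
  // Every field has a value-based `equals`, so the generated `equals` is reliable.
  final case class AppStart(appName: String, appId: Option[String], time: Long)

  // Arrays compare by reference, so this case class is not "well-defined"
  // in the sense used above.
  final case class Payload(bytes: Array[Byte])

  val sameStart: Boolean =
    AppStart("app", Some("id-1"), 42L) == AppStart("app", Some("id-1"), 42L)
  val differentId: Boolean =
    AppStart("app", Some("id-1"), 42L) == AppStart("app", Some("id-2"), 42L)
  val samePayload: Boolean =
    Payload(Array[Byte](1, 2)) == Payload(Array[Byte](1, 2))
}
```

Here `sameStart` is true and `differentId` is false, while `samePayload` is
false even though the byte contents match, which is why whole-object comparison
is only safe for classes of the first kind.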
    
    Author: zsxwing <[email protected]>
    
    Closes #3886 from zsxwing/SPARK-5067 and squashes the following commits:
    
    0a51711 [zsxwing] Use '===' to compare well-defined case class

commit 5583c3bc16056ad1796a9a1b39bfc94993298a4e
Author: zsxwing <[email protected]>
Date:   2015-01-05T05:06:04Z

    [SPARK-5069][Core] Fix the race condition of TaskSchedulerImpl.dagScheduler
    
    It's not necessary to set `TaskSchedulerImpl.dagScheduler` in preStart. 
It's safe to set it after `initializeEventProcessActor()`.
    
    Author: zsxwing <[email protected]>
    
    Closes #3887 from zsxwing/SPARK-5069 and squashes the following commits:
    
    d95894f [zsxwing] Fix the race condition of TaskSchedulerImpl.dagScheduler

commit ed6dc94455ccdc020004afce2219d844be91a9fc
Author: zsxwing <[email protected]>
Date:   2015-01-05T05:09:21Z

    [SPARK-5083][Core] Fix a flaky test in TaskResultGetterSuite
    
    Because `sparkEnv.blockManager.master.removeBlock` is asynchronous, we need 
to make sure the block has already been removed before calling 
`super.enqueueSuccessfulTask`.
    
    Author: zsxwing <[email protected]>
    
    Closes #3894 from zsxwing/SPARK-5083 and squashes the following commits:
    
    d97c03d [zsxwing] Fix a flaky test in TaskResultGetterSuite

commit 136141c558d5c7e075f9db4dd48e76c2e2e41f9a
Author: zsxwing <[email protected]>
Date:   2015-01-05T05:18:33Z

    [SPARK-5074][Core] Fix a non-deterministic test failure
    
    Add `assert(sc.listenerBus.waitUntilEmpty(WAIT_TIMEOUT_MILLIS))` to make 
sure `sparkListener` receives the message.
    
    Author: zsxwing <[email protected]>
    
    Closes #3889 from zsxwing/SPARK-5074 and squashes the following commits:
    
    e61c198 [zsxwing] Fix a non-deterministic test failure

commit 2bcf38f02c6abd4d5fabf1b133735ad8cb4d2c86
Author: Varun Saxena <[email protected]>
Date:   2015-01-05T18:32:37Z

    [SPARK-4688] Have a single shared network timeout in Spark
    
    [SPARK-4688] Have a single shared network timeout in Spark
    
    Author: Varun Saxena <[email protected]>
    Author: varunsaxena <[email protected]>
    
    Closes #3562 from varunsaxena/SPARK-4688 and squashes the following commits:
    
    6e97f72 [Varun Saxena] [SPARK-4688] Single shared network timeout
    cd783a2 [Varun Saxena] SPARK-4688
    d6f8c29 [Varun Saxena] SCALA-4688
    9562b15 [Varun Saxena] SPARK-4688
    a75f014 [varunsaxena] SPARK-4688
    594226c [varunsaxena] SPARK-4688

commit 618d9d5c37db15bd8836b3230cb34ed4ecb13921
Author: WangTao <[email protected]>
Date:   2015-01-05T19:59:38Z

    [SPARK-5057] Log message in failed askWithReply attempts
    
    https://issues.apache.org/jira/browse/SPARK-5057
    
    Author: WangTao <[email protected]>
    Author: WangTaoTheTonic <[email protected]>
    
    Closes #3875 from WangTaoTheTonic/SPARK-5057 and squashes the following 
commits:
    
    1503487 [WangTao] use string interpolation
    706c8a7 [WangTaoTheTonic] log more messages

commit a81b62424ae2945be4d3597be7fac4e43540573f
Author: Jongyoul Lee <[email protected]>
Date:   2015-01-05T20:05:09Z

    [SPARK-4465] runAsSparkUser doesn't affect TaskRunner in Mesos environme...
    
    ...nt at all.
    
    - fixed a scope of runAsSparkUser from MesosExecutorDriver.run to 
MesosExecutorBackend.launchTask
    - See the Jira Issue for more details.
    
    Author: Jongyoul Lee <[email protected]>
    
    Closes #3741 from jongyoul/SPARK-4465 and squashes the following commits:
    
    46ad71e [Jongyoul Lee] [SPARK-4465] runAsSparkUser doesn't affect 
TaskRunner in Mesos environment at all. - Removed unused import
    3d6631f [Jongyoul Lee] [SPARK-4465] runAsSparkUser doesn't affect 
TaskRunner in Mesos environment at all. - Removed comments and adjusted 
indentations
    2343f13 [Jongyoul Lee] [SPARK-4465] runAsSparkUser doesn't affect 
TaskRunner in Mesos environment at all. - fixed a scope of runAsSparkUser from 
MesosExecutorDriver.run to MesosExecutorBackend.launchTask

commit dda9d6f9316c5fedd4289bdea9504a64c64c1a7d
Author: freeman <[email protected]>
Date:   2015-01-05T21:10:59Z

    [SPARK-5089][PYSPARK][MLLIB] Fix vector convert
    
    This is a small change addressing a potentially significant bug in how 
PySpark + MLlib handles non-float64 numpy arrays. The automatic conversion to 
`DenseVector` that occurs when passing RDDs to MLlib algorithms in PySpark 
should automatically upcast to float64s, but this wasn't actually happening. 
As a result, non-float64 arrays would be silently parsed inappropriately 
during SerDe, yielding erroneous results when running, for example, KMeans.
    
    The PR includes the fix, as well as a new test for the correct conversion 
behavior.
    
    davies
    
    Author: freeman <[email protected]>
    
    Closes #3902 from freeman-lab/fix-vector-convert and squashes the following 
commits:
    
    764db47 [freeman] Add a test for proper conversion behavior
    704f97e [freeman] Return array after changing type

commit 2ce91a3dc06d72f328174ed8b7846372d1b751e3
Author: Reynold Xin <[email protected]>
Date:   2015-01-05T23:19:53Z

    [SPARK-5093] Set spark.network.timeout to 120s consistently.
    
    Author: Reynold Xin <[email protected]>
    
    Closes #3903 from rxin/timeout-120 and squashes the following commits:
    
    7c2138e [Reynold Xin] [SPARK-5093] Set spark.network.timeout to 120s 
consistently.

commit dbd12d51d4f310b5205f4b008bd64c45dedbd181
Author: Reynold Xin <[email protected]>
Date:   2015-01-05T23:34:22Z

    [SPARK-5040][SQL] Support expressing unresolved attributes using 
$"attribute name" notation in SQL DSL.
    
    Author: Reynold Xin <[email protected]>
    
    Closes #3862 from rxin/stringcontext-attr and squashes the following 
commits:
    
    9b10f57 [Reynold Xin] Rename StrongToAttributeConversionHelper
    72121af [Reynold Xin] [SPARK-5040][SQL] Support expressing unresolved 
attributes using $"attribute name" notation in SQL DSL.

commit 6ef366009a7900e1a721be9aec7dbf089acc6a46
Author: Kostas Sakellis <[email protected]>
Date:   2015-01-06T07:26:33Z

    SPARK-4843 [YARN] Squash ExecutorRunnableUtil and ExecutorRunnable
    
    ExecutorRunnableUtil is a parent of ExecutorRunnable because of the 
yarn-alpha and yarn-stable split. Now that yarn-alpha is gone, this commit 
squashes the unnecessary hierarchy. The methods from ExecutorRunnableUtil are 
added as private.
    
    Author: Kostas Sakellis <[email protected]>
    
    Closes #3696 from ksakellis/kostas-spark-4843 and squashes the following 
commits:
    
    486716f [Kostas Sakellis] Moved prepareEnvironment call to after yarnConf 
declaration
    470e22e [Kostas Sakellis] Fixed indentation and renamed sparkConf variable
    9b1b1c9 [Kostas Sakellis] SPARK-4843 [YARN] Squash ExecutorRunnableUtil and 
ExecutorRunnable

commit a305931805237996b0ad6fcac4a48c55d4274d4d
Author: Josh Rosen <[email protected]>
Date:   2015-01-06T08:31:19Z

    [SPARK-1600] Refactor FileInputStream tests to remove Thread.sleep() calls 
and SystemClock usage
    
    This patch refactors Spark Streaming's FileInputStream tests to remove uses 
of Thread.sleep() and SystemClock, which should hopefully resolve some 
longstanding flakiness in these tests (see SPARK-1600).
    
    Key changes:
    
    - Modify FileInputDStream to use the scheduler's Clock instead of 
System.currentTimeMillis(); this allows it to be tested using ManualClock.
    - Fix a synchronization issue in ManualClock's `currentTime` method.
    - Add a StreamingTestWaiter class which allows callers to block until a 
certain number of batches have finished.
    - Change the FileInputStream tests so that files' modification times are 
manually set based off of ManualClock; this eliminates many Thread.sleep calls.
    - Update these tests to use the withStreamingContext fixture.
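
The idea behind the first and fourth points can be sketched as follows. This is
a simplified illustration with hypothetical classes (`FileSelector` and the
minimal `Clock` hierarchy here are assumptions, and Spark's own clock classes
differ in detail): a component that reads time through an injected clock can be
driven deterministically from a test.

```scala
// Hypothetical minimal versions; Spark's own Clock classes differ in detail.
trait Clock {
  def currentTime(): Long
}

object SystemClock extends Clock {
  def currentTime(): Long = System.currentTimeMillis()
}

final class ManualClock(private var time: Long = 0L) extends Clock {
  // Synchronized so a test thread advancing the clock and a worker thread
  // reading it see a consistent value.
  def currentTime(): Long = synchronized { time }
  def advance(millis: Long): Unit = synchronized { time += millis }
}

// Selects files whose modification time falls in the window ending at "now",
// where "now" comes from the injected clock rather than the system clock.
final class FileSelector(clock: Clock, windowMs: Long) {
  def select(modTimes: Map[String, Long]): Set[String] = {
    val now = clock.currentTime()
    modTimes.collect { case (name, t) if t <= now && t > now - windowMs => name }.toSet
  }
}
```

A test can then construct `new FileSelector(new ManualClock(1000L), 500L)`, set
modification times relative to the manual clock, and call `advance` instead of
sleeping.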
    
    Author: Josh Rosen <[email protected]>
    
    Closes #3801 from JoshRosen/SPARK-1600 and squashes the following commits:
    
    e4494f4 [Josh Rosen] Address a potential race when setting file 
modification times
    8340bd0 [Josh Rosen] Use set comparisons for output.
    0b9c252 [Josh Rosen] Fix some ManualClock usage problems.
    1cc689f [Josh Rosen] ConcurrentHashMap -> SynchronizedMap
    db26c3a [Josh Rosen] Use standard timeout in ScalaTest `eventually` blocks.
    3939432 [Josh Rosen] Rename StreamingTestWaiter to BatchCounter
    0b9c3a1 [Josh Rosen] Wait for checkpoint to complete
    863d71a [Josh Rosen] Remove Thread.sleep that was used to make task run 
slowly
    b4442c3 [Josh Rosen] batchTimeToSelectedFiles should be thread-safe
    15b48ee [Josh Rosen] Replace several TestWaiter methods w/ ScalaTest 
eventually.
    fffc51c [Josh Rosen] Revert "Remove last remaining sleep() call"
    dbb8247 [Josh Rosen] Remove last remaining sleep() call
    566a63f [Josh Rosen] Fix log message and comment typos
    da32f3f [Josh Rosen] Fix log message and comment typos
    3689214 [Josh Rosen] Merge remote-tracking branch 'origin/master' into 
SPARK-1600
    c8f06b1 [Josh Rosen] Remove Thread.sleep calls in FileInputStream 
CheckpointSuite test.
    d4f2d87 [Josh Rosen] Refactor file input stream tests to not rely on 
SystemClock.
    dda1403 [Josh Rosen] Add StreamingTestWaiter class.
    3c3efc3 [Josh Rosen] Synchronize `currentTime` in ManualClock
    a95ddc4 [Josh Rosen] Modify FileInputDStream to use Clock class.

commit 8373706a67a58e57132414490bdb7f20b9f5c76a
Author: kj-ki <[email protected]>
Date:   2015-01-06T17:49:37Z

    [Minor] Fix comments for GraphX 2D partitioning strategy
    
    The sum of vertices on matrix (v0 to v11) is 12. And, I think one same 
block overlaps in this strategy.
    
    This is a minor PR, so I didn't file it in JIRA.
    
    Author: kj-ki <[email protected]>
    
    Closes #3904 from kj-ki/fix-partitionstrategy-comments and squashes the 
following commits:
    
    79829d9 [kj-ki] Fix comments for 2D partitioning.

commit b491f13705545c98b0aaf932e15de9198422b0f1
Author: Sean Owen <[email protected]>
Date:   2015-01-06T20:02:08Z

    SPARK-4159 [CORE] Maven build doesn't run JUnit test suites
    
    This PR:
    
    - Reenables `surefire`, and copies config from `scalatest` (which is itself 
an old fork of `surefire`, so similar)
    - Tells `surefire` to test only Java tests
    - Enables `surefire` and `scalatest` for all children, and in turn 
eliminates some duplication.
    
    For me this causes the Scala and Java tests to be run once each, it seems, 
as desired. It doesn't affect the SBT build but works for Maven. I still need 
to verify that all of the Scala tests and Java tests are being run.
    
    Author: Sean Owen <[email protected]>
    
    Closes #3651 from srowen/SPARK-4159 and squashes the following commits:
    
    2e8a0af [Sean Owen] Remove specialized SPARK_HOME setting for REPL, YARN 
tests as it appears to be obsolete
    12e4558 [Sean Owen] Append to unit-test.log instead of overwriting, so that 
both surefire and scalatest output is preserved. Also standardize/correct 
comments a bit.
    e6f8601 [Sean Owen] Reenable Java tests by reenabling surefire with config 
cloned from scalatest; centralize test config in the parent

commit 8c1a588e52e54b713955c8a913775e40fd80d12e
Author: Travis Galoppo <[email protected]>
Date:   2015-01-06T21:57:42Z

    SPARK-5017 [MLlib] - Use SVD to compute determinant and inverse of 
covariance matrix
    
    MultivariateGaussian was calling both pinv() and det() on the covariance 
matrix, effectively performing two matrix decompositions.  Both values are now 
computed using the singular value decomposition. Both the pseudo-inverse and the 
pseudo-determinant are used to guard against singular matrices.
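
A sketch of how both quantities can come out of a single decomposition,
assuming the singular values have already been computed by some SVD routine
(not shown here). The function name and the tolerance parameter are
hypothetical and not taken from the patch.

```scala
object PseudoInverseSketch {
  // Input: singular values of the covariance matrix from one SVD.
  // Output: (log pseudo-determinant, reciprocal singular values), where
  // values at or below `tol` are treated as zero in both results.
  def fromSingularValues(s: Array[Double], tol: Double): (Double, Array[Double]) = {
    // Pseudo-determinant: product of the non-negligible singular values,
    // accumulated in log space.
    val logPseudoDet = s.filter(_ > tol).map(math.log).sum
    // Pseudo-inverse: invert the non-negligible singular values, zero the rest.
    val inverted = s.map(v => if (v > tol) 1.0 / v else 0.0)
    (logPseudoDet, inverted)
  }
}
```

The inverted values would then be recombined with the singular vectors to form
the pseudo-inverse; a singular covariance matrix simply contributes zeros
instead of causing a division by zero.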
    
    Author: Travis Galoppo <[email protected]>
    
    Closes #3871 from tgaloppo/spark-5017 and squashes the following commits:
    
    383b5b3 [Travis Galoppo] MultivariateGaussian - minor optimization in 
density calculation
    a5b8bc5 [Travis Galoppo] Added additional points to tests in test suite. 
Fixed comment in MultivariateGaussian
    629d9d0 [Travis Galoppo] Moved some test values from var to val.
    dc3d0f7 [Travis Galoppo] Catch potential exception calculating 
pseudo-determinant. Style improvements.
    d448137 [Travis Galoppo] Added test suite for MultivariateGaussian, 
including test for degenerate case.
    1989be0 [Travis Galoppo] SPARK-5017 - Fixed to use SVD to compute 
determinant and inverse of covariance matrix.  Previous code called both pinv() 
and det(), effectively performing two matrix decompositions. Additionally, the 
pinv() implementation in Breeze is known to fail for singular matrices.
    b4415ea [Travis Galoppo] Merge branch 'spark-5017' of 
https://github.com/tgaloppo/spark into spark-5017
    6f11b6d [Travis Galoppo] SPARK-5017 - Use SVD to compute determinant and 
inverse of covariance matrix. Code was calling both det() and pinv(), 
effectively performing two matrix decompositions. Furthermore, Breeze pinv() 
currently fails for singular matrices.
    fd9784c [Travis Galoppo] SPARK-5017 - Use SVD to compute determinant and 
inverse of covariance matrix

commit e8f748449e76792272321297f579ac6f1fadeffe
Author: Liang-Chi Hsieh <[email protected]>
Date:   2015-01-06T22:00:45Z

    [SPARK-5050][Mllib] Add unit test for sqdist
    
    Related to #3643. Follow the previous suggestion to add unit test for 
`sqdist` in `VectorsSuite`.
    
    Author: Liang-Chi Hsieh <[email protected]>
    
    Closes #3869 from viirya/sqdist_test and squashes the following commits:
    
    fb743da [Liang-Chi Hsieh] Modified for comment and fix bug.
    90a08f3 [Liang-Chi Hsieh] Modified for comment.
    39a3ca6 [Liang-Chi Hsieh] Take care of special case.
    b789f42 [Liang-Chi Hsieh] More proper unit test with random sparsity 
pattern.
    c36be68 [Liang-Chi Hsieh] Add unit test for sqdist.

commit 977cc3163fa51eaa76255606b7452fca7f6d01b8
Author: Liang-Chi Hsieh <[email protected]>
Date:   2015-01-07T05:23:31Z

    [SPARK-5099][Mllib] Simplify logistic loss function
    
    This is a minor PR: I think we can simply negate `margin` instead of 
subtracting `margin`.
    
    Mathematically, they are equal. But the modified equation is the common 
form of the logistic loss function and so is more readable. It also computes a 
more accurate value, as some quick tests show.
    
    Author: Liang-Chi Hsieh <[email protected]>
    
    Closes #3899 from viirya/logit_func and squashes the following commits:
    
    91a3860 [Liang-Chi Hsieh] Modified for comment.
    0aa51e4 [Liang-Chi Hsieh] Further simplified.
    72a295e [Liang-Chi Hsieh] Revert LogLoss back and add more considerations 
in Logistic Loss.
    a3f83ca [Liang-Chi Hsieh] Fix a bug.
    2bc5712 [Liang-Chi Hsieh] Simplify loss function.

commit 0e703eb42e58463612cb464c46b77bdc047759fb
Author: Masayoshi TSUZUKI <[email protected]>
Date:   2015-01-07T15:32:16Z

    [SPARK-2458] Make failed application log visible on History Server
    
    Enabled HistoryServer to show incomplete applications.
    We can see the log for incomplete applications by clicking the bottom link.
    
    Author: Masayoshi TSUZUKI <[email protected]>
    
    Closes #3467 from tsudukim/feature/SPARK-2458-2 and squashes the following 
commits:
    
    76205d2 [Masayoshi TSUZUKI] Fixed and added test code.
    29a04a9 [Masayoshi TSUZUKI] Merge branch 'master' of 
github.com:tsudukim/spark into feature/SPARK-2458-2
    f9ef854 [Masayoshi TSUZUKI] Added space between "if" and "(". Fixed 
"Incomplete" as capitalized in the web UI. Modified double negative variable 
name.
    9b465b0 [Masayoshi TSUZUKI] Modified typo and better implementation.
    3ed8a41 [Masayoshi TSUZUKI] Modified too long lines.
    08ea14d [Masayoshi TSUZUKI] [SPARK-2458] Make failed application log 
visible on History Server

commit 718750c22238f96306c0f05842bbe82e5fd75236
Author: huangzhaowei <[email protected]>
Date:   2015-01-07T14:10:42Z

    [YARN][SPARK-4929] Bug fix: fix the yarn-client code to support HA
    
    Currently, yarn-client exits immediately when an HA failover happens, no 
matter how many times the AM should retry.
    The reason may be that the default final status only considered the 
sys.exit case, so yarn-client cannot benefit from HA.
    So we should distinguish the default final status between client and 
cluster mode, because the SUCCEEDED status may cause HA to fail in client mode 
and UNDEFINED may cause an error report in cluster mode when using sys.exit.
    
    Author: huangzhaowei <[email protected]>
    
    Closes #3771 from SaintBacchus/YarnHA and squashes the following commits:
    
    c02bfcc [huangzhaowei] Improve the comment of the function 
'getDefaultFinalStatus'
    0e69924 [huangzhaowei] Bug fix: fix the yarn-client code to support HA

commit 84182f0400f14c0beaebdfefcaac10dd20d64c19
Author: WangTaoTheTonic <[email protected]>
Date:   2015-01-07T14:14:39Z

    [SPARK-2165][YARN]add support for setting maxAppAttempts in the 
ApplicationSubmissionContext
    
    ...xt
    
    https://issues.apache.org/jira/browse/SPARK-2165
    
    I still have 2 questions:
    * If this config is not set, should we use YARN's corresponding value or a 
default value (like 2) on the Spark side?
    * Is this the best config name? Or "spark.yarn.am.maxAttempts"?
    
    Author: WangTaoTheTonic <[email protected]>
    
    Closes #3878 from WangTaoTheTonic/SPARK-2165 and squashes the following 
commits:
    
    1416c83 [WangTaoTheTonic] use the name spark.yarn.maxAppAttempts
    202ac85 [WangTaoTheTonic] rephrase some
    afdfc99 [WangTaoTheTonic] more detailed description
    91562c6 [WangTaoTheTonic] add support for setting maxAppAttempts in the 
ApplicationSubmissionContext

commit d1e87b33e907a6b5316c4f364d0bd0fb5a1d9087
Author: DB Tsai <[email protected]>
Date:   2015-01-07T18:13:41Z

    [SPARK-5128][MLLib] Add common used log1pExp API in MLUtils
    
    When `x` is positive and large, computing `math.log(1 + math.exp(x))` will 
lead to arithmetic overflow. This will happen when `x > 709.78` which is not a 
very large number. It can be addressed by rewriting the formula into 
`x + math.log1p(math.exp(-x))` when `x > 0`.
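
The rewrite can be sketched as below. The enclosing object name is hypothetical,
and the branch for `x <= 0` uses `math.log1p(math.exp(x))`, which is an
assumption consistent with the formula above rather than a quote from the patch.

```scala
object NumericSketch {
  // Naive form: math.exp(x) overflows to Infinity for large positive x,
  // so the result is Infinity as well.
  def naive(x: Double): Double = math.log(1 + math.exp(x))

  // Stable form: for x > 0 use x + log1p(exp(-x)), so the exponential is
  // only ever taken of a non-positive number and cannot overflow.
  def log1pExp(x: Double): Double =
    if (x > 0) x + math.log1p(math.exp(-x)) else math.log1p(math.exp(x))
}
```

For moderate inputs the two forms agree to within rounding; for an input such
as 800.0 the naive form returns Infinity while the stable form returns a finite
value close to the input.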
    
    Author: DB Tsai <[email protected]>
    
    Closes #3915 from dbtsai/mathutil and squashes the following commits:
    
    bec6a84 [DB Tsai] remove empty line
    3239541 [DB Tsai] revert part of patch into another PR
    23144f3 [DB Tsai] doc
    49f3658 [DB Tsai] temp
    6c29ed3 [DB Tsai] formating
    f8447f9 [DB Tsai] address another overflow issue in gradientMultiplier in 
LOR gradient code
    64eefd0 [DB Tsai] first commit

commit 60fde12bc4e824c1447db69f92387f35e9b67331
Author: hushan[胡珊] <[email protected]>
Date:   2015-01-07T20:09:12Z

    [SPARK-5132][Core]Correct stage Attempt Id key in stageInfofromJson
    
    SPARK-5132:
    stageInfoToJson: Stage Attempt Id
    stageInfoFromJson: Attempt Id
    
    Author: hushan[胡珊] <[email protected]>
    
    Closes #3932 from suyanNone/json-stage and squashes the following commits:
    
    41419ab [hushan[胡珊]] Correct stage Attempt Id key in stageInfofromJson

commit 65c9e1022521053e130220802bbfddd1dba0733e
Author: zsxwing <[email protected]>
Date:   2015-01-08T07:01:30Z

    [SPARK-5126][Core] Verify Spark urls before creating Actors so that invalid 
urls can crash the process.
    
    Because `actorSelection` will return `deadLetters` for an invalid path, 
Worker keeps quiet for an invalid master URL. It's better to log an error so 
that people can find such problems quickly.
    
    This PR checks the URL before passing it to `actorSelection`, and throws 
and logs a SparkException for an invalid URL.
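
A fail-fast check of this kind might look like the sketch below. The object and
method names are hypothetical, and `IllegalArgumentException` stands in for
`SparkException` so the example stays self-contained. The validation rules
shown (scheme `spark`, a host, and an explicit port) are assumptions for
illustration and not necessarily the exact rules in the patch.

```scala
import java.net.{URI, URISyntaxException}

object SparkUrlSketch {
  // Returns (host, port) for a URL of the form "spark://host:port",
  // or throws IllegalArgumentException if the URL does not have that shape.
  def parse(url: String): (String, Int) = {
    val uri =
      try new URI(url)
      catch {
        case e: URISyntaxException =>
          throw new IllegalArgumentException(s"Invalid master URL: $url", e)
      }
    if (uri.getScheme != "spark" || uri.getHost == null || uri.getPort < 0) {
      throw new IllegalArgumentException(s"Invalid master URL: $url")
    }
    (uri.getHost, uri.getPort)
  }
}
```

Calling such a check before handing the address to the actor system turns a
silently ignored typo into an immediate, visible failure.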
    
    Author: zsxwing <[email protected]>
    
    Closes #3927 from zsxwing/SPARK-5126 and squashes the following commits:
    
    9d429ee [zsxwing] Create a utility method in Utils to parse Spark url; 
verify urls before creating Actors so that invalid urls can crash the process.
    8286e51 [zsxwing] Check the url before sending to Akka and log the error if 
the url is invalid

commit 536b82f9cb5535e57393eee401ebddad524aee26
Author: Shuo Xiang <[email protected]>
Date:   2015-01-08T07:22:37Z

    [SPARK-5116][MLlib] Add extractor for SparseVector and DenseVector
    
    Add extractor for SparseVector and DenseVector in MLlib to save some code 
while performing pattern matching on Vectors. For example, previously we might 
write:
    
        vec match {
          case dv: DenseVector =>
            val values = dv.values
            ...
          case sv: SparseVector =>
            val indices = sv.indices
            val values = sv.values
            val size = sv.size
            ...
        }
    
    With the extractor it becomes:
    
        vec match {
          case DenseVector(values) =>
            ...
          case SparseVector(size, indices, values) =>
            ...
        }
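
For readers unfamiliar with how such patterns are enabled, here is a minimal
sketch of extractor objects on ordinary (non-case) classes. `DenseVec`,
`SparseVec`, and `ExtractorDemo` are hypothetical stand-ins, not MLlib's
classes. The mechanism is a companion object with an `unapply` method.

```scala
// Hypothetical stand-ins; MLlib's real vector classes carry more functionality.
trait Vec

class DenseVec(val values: Array[Double]) extends Vec
object DenseVec {
  // Enables the pattern `case DenseVec(values) => ...`
  def unapply(dv: DenseVec): Option[Array[Double]] = Some(dv.values)
}

class SparseVec(val size: Int, val indices: Array[Int], val values: Array[Double]) extends Vec
object SparseVec {
  // Enables the pattern `case SparseVec(size, indices, values) => ...`
  def unapply(sv: SparseVec): Option[(Int, Array[Int], Array[Double])] =
    Some((sv.size, sv.indices, sv.values))
}

object ExtractorDemo {
  // Sum of the stored entries, written with the extractor patterns.
  def sum(vec: Vec): Double = vec match {
    case DenseVec(values) => values.sum
    case SparseVec(_, _, values) => values.sum
    case other => throw new IllegalArgumentException(s"Unsupported vector: $other")
  }
}
```

The type test and the field access happen in one pattern, which is where the
saving in the second form above comes from.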
    
    Author: Shuo Xiang <[email protected]>
    
    Closes #3919 from coderxiang/extractor and squashes the following commits:
    
    359e8d5 [Shuo Xiang] merge master
    ca5fc3e [Shuo Xiang] merge master
    0b1e190 [Shuo Xiang] use extractor for vectors in RowMatrix.scala
    e961805 [Shuo Xiang] use extractor for vectors in StandardScaler.scala
    c2bbdaf [Shuo Xiang] use extractor for vectors in IDF.scala
    8433922 [Shuo Xiang] use extractor for vectors in NaiveBayes.scala and 
Normalizer.scala
    d83c7ca [Shuo Xiang] use extractor for vectors in Vectors.scala
    5523dad [Shuo Xiang] Add extractor for SparseVector and DenseVector

commit 0114e817977782e2e9ae6eeb3d2719f5aa76148b
Author: Sandy Ryza <[email protected]>
Date:   2015-01-08T17:25:43Z

    SPARK-5087. [YARN] Merge yarn.Client and yarn.ClientBase
    
    Author: Sandy Ryza <[email protected]>
    
    Closes #3896 from sryza/sandy-spark-5087 and squashes the following commits:
    
    65611d0 [Sandy Ryza] Review feedback
    3294176 [Sandy Ryza] SPARK-5087. [YARN] Merge yarn.Client and 
yarn.ClientBase

commit 46dca8c79d6de431a8088f1346ddd500d91a7203
Author: Takeshi Yamamuro <[email protected]>
Date:   2015-01-08T17:55:12Z

    [SPARK-4917] Add a function to convert into a graph with canonical edges in 
GraphOps
    
    Convert bi-directional edges into uni-directional ones instead of 
'canonicalOrientation' in GraphLoader.edgeListFile.
    This function is useful when a graph is loaded as it is and then is 
transformed into one with canonical edges.
    It rewrites the vertex ids of edges so that srcIds are bigger than dstIds, 
and merges the duplicated edges.
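
The transformation can be sketched on plain collections as follows. This is an
illustration only: it does not use the GraphX API, the names are hypothetical,
and the orientation (larger id as source) simply follows the description above.

```scala
object CanonicalEdgesSketch {
  // Edges are (srcId, dstId, attr) triples. Each edge is reoriented so that
  // the larger id is the source, as described above, and edges that end up
  // with the same endpoints are combined with `merge`.
  def canonicalize[A](edges: Seq[(Long, Long, A)])(merge: (A, A) => A): Map[(Long, Long), A] =
    edges
      .map { case (src, dst, attr) => ((math.max(src, dst), math.min(src, dst)), attr) }
      .groupBy(_._1)
      .map { case (key, group) => key -> group.map(_._2).reduce(merge) }
}
```

For example, the edges 1 -> 2 and 2 -> 1 collapse into a single entry whose
attribute is the merge of the two originals.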
    
    Author: Takeshi Yamamuro <[email protected]>
    
    Closes #3760 from maropu/ConvertToCanonicalEdgesSpike and squashes the 
following commits:
    
    7f8b580 [Takeshi Yamamuro] Add a function to convert into a graph with 
canonical edges in GraphOps

commit 60b922795d0d6a5e0db96c11416804153e307810
Author: Zhang, Liye <[email protected]>
Date:   2015-01-08T18:40:26Z

    [SPARK-4989][CORE] avoid wrong eventlog conf cause cluster down in 
standalone mode
    
    When enabling the event log in standalone mode, a wrong configuration 
brings the standalone cluster down (the Master restarts and loses its 
connection with the workers).
    How to reproduce: give an invalid value to "spark.eventLog.dir", for 
example: spark.eventLog.dir=hdfs://tmp/logdir1, hdfs://tmp/logdir2. This 
throws an IllegalArgumentException, which causes the Master to restart, and 
the whole cluster becomes unavailable.
    
    Author: Zhang, Liye <[email protected]>
    
    Closes #3824 from liyezhang556520/wrongConf4Cluster and squashes the 
following commits:
    
    3c24d98 [Zhang, Liye] revert change with logwarning and exception for 
FileNotFoundException
    3c1ac2e [Zhang, Liye] change var to val
    a49c52f [Zhang, Liye] revert wrong modification
    12eee85 [Zhang, Liye] add more message in log and on webUI
    5c1fa33 [Zhang, Liye] catch exceptions when eventlog with wrong conf

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
