[jira] [Created] (SPARK-4617) Fix spark.yarn.applicationMaster.waitTries doc

2014-11-25 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4617: - Summary: Fix spark.yarn.applicationMaster.waitTries doc Key: SPARK-4617 URL: https://issues.apache.org/jira/browse/SPARK-4617 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-4352) Incorporate locality preferences in dynamic allocation requests

2014-11-24 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-4352: -- Description: Currently, achieving data locality in Spark is difficult u preferredNodeLocalityData

[jira] [Updated] (SPARK-4352) Incorporate locality preferences in dynamic allocation requests

2014-11-24 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-4352: -- Description: Currently, achieving data locality in Spark is difficult unless an application takes

[jira] [Assigned] (SPARK-4447) Remove layers of abstraction in YARN code no longer needed after dropping yarn-alpha

2014-11-24 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza reassigned SPARK-4447: - Assignee: Patrick Wendell Remove layers of abstraction in YARN code no longer needed after

[jira] [Assigned] (SPARK-4447) Remove layers of abstraction in YARN code no longer needed after dropping yarn-alpha

2014-11-24 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza reassigned SPARK-4447: - Assignee: Sandy Ryza (was: Patrick Wendell) Remove layers of abstraction in YARN code no

[jira] [Commented] (SPARK-4452) Shuffle data structures can starve others on the same thread for memory

2014-11-23 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14222572#comment-14222572 ] Sandy Ryza commented on SPARK-4452: --- [~tianshuo], I took a look at the patch, and the

[jira] [Created] (SPARK-4569) Rename externalSorting in Aggregator

2014-11-23 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4569: - Summary: Rename externalSorting in Aggregator Key: SPARK-4569 URL: https://issues.apache.org/jira/browse/SPARK-4569 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-1956) Enable shuffle consolidation by default

2014-11-21 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221453#comment-14221453 ] Sandy Ryza commented on SPARK-1956: --- This is of smaller importance now that sort-based

[jira] [Created] (SPARK-4550) In sort-based shuffle, store map outputs as serialized

2014-11-21 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4550: - Summary: In sort-based shuffle, store map outputs as serialized Key: SPARK-4550 URL: https://issues.apache.org/jira/browse/SPARK-4550 Project: Spark Issue Type:

[jira] [Updated] (SPARK-4550) In sort-based shuffle, store map outputs in serialized form

2014-11-21 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-4550: -- Summary: In sort-based shuffle, store map outputs in serialized form (was: In sort-based shuffle,

[jira] [Commented] (SPARK-4550) In sort-based shuffle, store map outputs in serialized form

2014-11-21 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221710#comment-14221710 ] Sandy Ryza commented on SPARK-4550: --- We don't, though it would allow us to be much more

[jira] [Commented] (SPARK-2089) With YARN, preferredNodeLocalityData isn't honored

2014-11-20 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220620#comment-14220620 ] Sandy Ryza commented on SPARK-2089: --- Another possible solution here is SPARK-4352.

[jira] [Commented] (SPARK-4452) Shuffle data structures can starve others on the same thread for memory

2014-11-18 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216933#comment-14216933 ] Sandy Ryza commented on SPARK-4452: --- One issue with a limits-by-object approach is that

[jira] [Commented] (SPARK-4452) Shuffle data structures can starve others on the same thread for memory

2014-11-18 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217340#comment-14217340 ] Sandy Ryza commented on SPARK-4452: --- [~matei] my point is not that forced spilling

[jira] [Created] (SPARK-4447) Remove layers of abstraction in YARN code no longer needed after dropping yarn-alpha

2014-11-17 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4447: - Summary: Remove layers of abstraction in YARN code no longer needed after dropping yarn-alpha Key: SPARK-4447 URL: https://issues.apache.org/jira/browse/SPARK-4447

[jira] [Updated] (SPARK-4447) Remove layers of abstraction in YARN code no longer needed after dropping yarn-alpha

2014-11-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-4447: -- Description: For example, YarnRMClient and YarnRMClientImpl can be merged YarnAllocator and

[jira] [Commented] (SPARK-4447) Remove layers of abstraction in YARN code no longer needed after dropping yarn-alpha

2014-11-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214411#comment-14214411 ] Sandy Ryza commented on SPARK-4447: --- Planning to work on this. Remove layers of

[jira] [Commented] (SPARK-4452) Enhance Sort-based Shuffle to avoid spilling small files

2014-11-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215018#comment-14215018 ] Sandy Ryza commented on SPARK-4452: --- I haven't thought the implications out fully, but

[jira] [Updated] (SPARK-4452) Enhance Sort-based Shuffle to avoid spilling small files

2014-11-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-4452: -- Affects Version/s: 1.1.0 Enhance Sort-based Shuffle to avoid spilling small files

[jira] [Commented] (SPARK-4452) Enhance Sort-based Shuffle to avoid spilling small files

2014-11-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215047#comment-14215047 ] Sandy Ryza commented on SPARK-4452: --- A third possible fix would be to have the shuffle

[jira] [Created] (SPARK-4456) Document why spilling depends on both elements read and memory used

2014-11-17 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4456: - Summary: Document why spilling depends on both elements read and memory used Key: SPARK-4456 URL: https://issues.apache.org/jira/browse/SPARK-4456 Project: Spark

[jira] [Created] (SPARK-4457) Document how to build for Hadoop versions greater than 2.4

2014-11-17 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4457: - Summary: Document how to build for Hadoop versions greater than 2.4 Key: SPARK-4457 URL: https://issues.apache.org/jira/browse/SPARK-4457 Project: Spark Issue

[jira] [Updated] (SPARK-4452) Shuffle data structures can starve others on the same thread for memory

2014-11-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-4452: -- Summary: Shuffle data structures can starve others on the same thread for memory (was: Enhance

[jira] [Commented] (SPARK-4452) Shuffle data structures can starve others on the same thread for memory

2014-11-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215269#comment-14215269 ] Sandy Ryza commented on SPARK-4452: --- Updated the title to reflect the specific problem.

[jira] [Commented] (SPARK-4452) Shuffle data structures can starve others on the same thread for memory

2014-11-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215436#comment-14215436 ] Sandy Ryza commented on SPARK-4452: --- [~andrewor14], IIUC, (2) shouldn't happen in

[jira] [Commented] (SPARK-4452) Shuffle data structures can starve others on the same thread for memory

2014-11-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215448#comment-14215448 ] Sandy Ryza commented on SPARK-4452: --- Ah, true. Shuffle data structures can starve

[jira] [Created] (SPARK-4375) assembly built with Maven is missing most of repl classes

2014-11-12 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4375: - Summary: assembly built with Maven is missing most of repl classes Key: SPARK-4375 URL: https://issues.apache.org/jira/browse/SPARK-4375 Project: Spark Issue

[jira] [Updated] (SPARK-4375) assembly built with Maven is missing most of repl classes

2014-11-12 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-4375: -- Description: In particular, the ones in the split scala-2.10/scala-2.11 directories aren't being added

[jira] [Updated] (SPARK-4375) Assembly built with Maven is missing most of repl classes

2014-11-12 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-4375: -- Summary: Assembly built with Maven is missing most of repl classes (was: assembly built with Maven is

[jira] [Commented] (SPARK-4375) Assembly built with Maven is missing most of repl classes

2014-11-12 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209388#comment-14209388 ] Sandy Ryza commented on SPARK-4375: --- This all makes sense to me. Will put up a patch.

[jira] [Commented] (SPARK-4338) Remove yarn-alpha support

2014-11-11 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206104#comment-14206104 ] Sandy Ryza commented on SPARK-4338: --- Planning to take a stab at this Remove yarn-alpha

[jira] [Created] (SPARK-4338) Remove yarn-alpha support

2014-11-11 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4338: - Summary: Remove yarn-alpha support Key: SPARK-4338 URL: https://issues.apache.org/jira/browse/SPARK-4338 Project: Spark Issue Type: Sub-task Components:

[jira] [Created] (SPARK-4352) Incorporate locality preferences in dynamic allocation requests

2014-11-11 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4352: - Summary: Incorporate locality preferences in dynamic allocation requests Key: SPARK-4352 URL: https://issues.apache.org/jira/browse/SPARK-4352 Project: Spark

[jira] [Commented] (SPARK-4290) Provide an equivalent functionality of distributed cache as MR does

2014-11-10 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14205346#comment-14205346 ] Sandy Ryza commented on SPARK-4290: --- SparkFiles.get needs to be called, but it will only

[jira] [Created] (SPARK-4337) Add ability to cancel pending requests to YARN

2014-11-10 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4337: - Summary: Add ability to cancel pending requests to YARN Key: SPARK-4337 URL: https://issues.apache.org/jira/browse/SPARK-4337 Project: Spark Issue Type:

[jira] [Commented] (SPARK-4267) Failing to launch jobs on Spark on YARN with Hadoop 2.5.0 or later

2014-11-07 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202319#comment-14202319 ] Sandy Ryza commented on SPARK-4267: --- Strange. Checked in the code and it seems like

[jira] [Commented] (SPARK-4280) In dynamic allocation, add option to never kill executors with cached blocks

2014-11-07 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202382#comment-14202382 ] Sandy Ryza commented on SPARK-4280: --- So it looks like the block IDs of broadcast

[jira] [Created] (SPARK-4280) In dynamic allocation, add option to never kill executors with cached blocks

2014-11-06 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4280: - Summary: In dynamic allocation, add option to never kill executors with cached blocks Key: SPARK-4280 URL: https://issues.apache.org/jira/browse/SPARK-4280 Project: Spark

[jira] [Commented] (SPARK-4280) In dynamic allocation, add option to never kill executors with cached blocks

2014-11-06 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200801#comment-14200801 ] Sandy Ryza commented on SPARK-4280: --- My thinking was that it would just be based on

[jira] [Commented] (SPARK-4290) Provide an equivalent functionality of distributed cache as MR does

2014-11-06 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201572#comment-14201572 ] Sandy Ryza commented on SPARK-4290: --- If you call SparkContext#addFile, the file will be

[jira] [Comment Edited] (SPARK-4214) With dynamic allocation, avoid outstanding requests for more executors than pending tasks need

2014-11-04 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196464#comment-14196464 ] Sandy Ryza edited comment on SPARK-4214 at 11/4/14 6:00 PM: We

[jira] [Commented] (SPARK-4214) With dynamic allocation, avoid outstanding requests for more executors than pending tasks need

2014-11-04 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196464#comment-14196464 ] Sandy Ryza commented on SPARK-4214: --- We can implement this in either a weak way or a

[jira] [Created] (SPARK-4227) Document external shuffle service

2014-11-04 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4227: - Summary: Document external shuffle service Key: SPARK-4227 URL: https://issues.apache.org/jira/browse/SPARK-4227 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-4230) Doc for spark.default.parallelism is incorrect

2014-11-04 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4230: - Summary: Doc for spark.default.parallelism is incorrect Key: SPARK-4230 URL: https://issues.apache.org/jira/browse/SPARK-4230 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-4230) Doc for spark.default.parallelism is incorrect

2014-11-04 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-4230: -- Description: The default default parallelism for shuffle transformations is actually the maximum

[jira] [Created] (SPARK-4214) With dynamic allocation, avoid outstanding requests for more executors than pending tasks need

2014-11-03 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4214: - Summary: With dynamic allocation, avoid outstanding requests for more executors than pending tasks need Key: SPARK-4214 URL: https://issues.apache.org/jira/browse/SPARK-4214

[jira] [Created] (SPARK-4175) Exception on stage page

2014-10-31 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4175: - Summary: Exception on stage page Key: SPARK-4175 URL: https://issues.apache.org/jira/browse/SPARK-4175 Project: Spark Issue Type: Bug Affects Versions: 1.2.0

[jira] [Commented] (SPARK-4016) Allow user to optionally show additional, advanced metrics in the UI

2014-10-31 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192604#comment-14192604 ] Sandy Ryza commented on SPARK-4016: --- It looks like after this change, stage-level

[jira] [Commented] (SPARK-4016) Allow user to optionally show additional, advanced metrics in the UI

2014-10-31 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192609#comment-14192609 ] Sandy Ryza commented on SPARK-4016: --- Also, it looks like this can cause an exception:

[jira] [Created] (SPARK-4178) Hadoop input metrics ignore bytes read in RecordReader instantiation

2014-10-31 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4178: - Summary: Hadoop input metrics ignore bytes read in RecordReader instantiation Key: SPARK-4178 URL: https://issues.apache.org/jira/browse/SPARK-4178 Project: Spark

[jira] [Commented] (SPARK-4178) Hadoop input metrics ignore bytes read in RecordReader instantiation

2014-10-31 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192773#comment-14192773 ] Sandy Ryza commented on SPARK-4178: --- Thanks [~kostas] for noticing this. Hadoop input

[jira] [Created] (SPARK-4136) Under dynamic allocation, cancel outstanding executor requests when pending task queue is empty

2014-10-29 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4136: - Summary: Under dynamic allocation, cancel outstanding executor requests when pending task queue is empty Key: SPARK-4136 URL: https://issues.apache.org/jira/browse/SPARK-4136

[jira] [Commented] (SPARK-3573) Dataset

2014-10-24 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183173#comment-14183173 ] Sandy Ryza commented on SPARK-3573: --- Is this still targeted for 1.2? Dataset ---

[jira] [Commented] (SPARK-1856) Standardize MLlib interfaces

2014-10-24 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183174#comment-14183174 ] Sandy Ryza commented on SPARK-1856: --- Is this work still targeted for 1.2? Standardize

[jira] [Commented] (SPARK-2926) Add MR-style (merge-sort) SortShuffleReader for sort-based shuffle

2014-10-22 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14179631#comment-14179631 ] Sandy Ryza commented on SPARK-2926: --- [~rxin] did you ever get a chance to try this out?

[jira] [Commented] (SPARK-3174) Provide elastic scaling within a Spark application

2014-10-14 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171024#comment-14171024 ] Sandy Ryza commented on SPARK-3174: --- bq. If I understand correctly, your concern with

[jira] [Comment Edited] (SPARK-3174) Provide elastic scaling within a Spark application

2014-10-14 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171024#comment-14171024 ] Sandy Ryza edited comment on SPARK-3174 at 10/14/14 3:03 PM: -

[jira] [Commented] (SPARK-3360) Add RowMatrix.multiply(Vector)

2014-10-14 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171238#comment-14171238 ] Sandy Ryza commented on SPARK-3360: --- bq. You don't need Vector.multiply(RowMatrix)

[jira] [Commented] (SPARK-1209) SparkHadoopUtil should not use package org.apache.hadoop

2014-10-13 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169846#comment-14169846 ] Sandy Ryza commented on SPARK-1209: --- Definitely worth changing, in my opinion. This has

[jira] [Created] (SPARK-3884) Don't set SPARK_SUBMIT_DRIVER_MEMORY if deploy mode is cluster

2014-10-09 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-3884: - Summary: Don't set SPARK_SUBMIT_DRIVER_MEMORY if deploy mode is cluster Key: SPARK-3884 URL: https://issues.apache.org/jira/browse/SPARK-3884 Project: Spark

[jira] [Updated] (SPARK-3884) If deploy mode is cluster, --driver-memory shouldn't apply to client JVM

2014-10-09 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-3884: -- Summary: If deploy mode is cluster, --driver-memory shouldn't apply to client JVM (was: Don't set

[jira] [Commented] (SPARK-3884) If deploy mode is cluster, --driver-memory shouldn't apply to client JVM

2014-10-09 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14165776#comment-14165776 ] Sandy Ryza commented on SPARK-3884: --- Accidentally assigned this to myself, but others

[jira] [Commented] (SPARK-3797) Run the shuffle service inside the YARN NodeManager as an AuxiliaryService

2014-10-07 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161636#comment-14161636 ] Sandy Ryza commented on SPARK-3797: --- Not necessarily opposed to this, but wanted to

[jira] [Created] (SPARK-3837) Warn when YARN is killing containers for exceeding memory limits

2014-10-07 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-3837: - Summary: Warn when YARN is killing containers for exceeding memory limits Key: SPARK-3837 URL: https://issues.apache.org/jira/browse/SPARK-3837 Project: Spark

[jira] [Updated] (SPARK-3682) Add helpful warnings to the UI

2014-10-07 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-3682: -- Attachment: SPARK-3682Design.pdf Posting an initial design Add helpful warnings to the UI

[jira] [Commented] (SPARK-3174) Provide elastic scaling within a Spark application

2014-10-07 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162685#comment-14162685 ] Sandy Ryza commented on SPARK-3174: --- bq. Maybe it makes sense to just call it

[jira] [Commented] (SPARK-3174) Provide elastic scaling within a Spark application

2014-10-06 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160937#comment-14160937 ] Sandy Ryza commented on SPARK-3174: --- Thanks for posting the detailed design, Andrew. A

[jira] [Updated] (SPARK-3797) Enable running shuffle service in separate process from executor

2014-10-06 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-3797: -- Description: This could either mean * Running the shuffle service inside the YARN NodeManager as an

[jira] [Commented] (SPARK-3174) Provide elastic scaling within a Spark application

2014-10-06 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160960#comment-14160960 ] Sandy Ryza commented on SPARK-3174: --- bq. for instance, lets say I do some ETL stuff

[jira] [Commented] (SPARK-3174) Provide elastic scaling within a Spark application

2014-10-06 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161048#comment-14161048 ] Sandy Ryza commented on SPARK-3174: --- Ah, misread. My opinion is that, for a first cut

[jira] [Updated] (SPARK-3797) Run the shuffle service inside the YARN NodeManager as an AuxiliaryService

2014-10-06 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-3797: -- Summary: Run the shuffle service inside the YARN NodeManager as an AuxiliaryService (was: Enable

[jira] [Updated] (SPARK-3797) Run the shuffle service inside the YARN NodeManager as an AuxiliaryService

2014-10-06 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-3797: -- Description: It's also worth considering running the shuffle service in a YARN container beside the

[jira] [Commented] (SPARK-3464) Graceful decommission of executors

2014-10-04 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14159252#comment-14159252 ] Sandy Ryza commented on SPARK-3464: --- Did you mean to resolve this as Fixed? Graceful

[jira] [Comment Edited] (SPARK-3464) Graceful decommission of executors

2014-10-04 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14159252#comment-14159252 ] Sandy Ryza edited comment on SPARK-3464 at 10/4/14 7:27 PM:

[jira] [Commented] (SPARK-3561) Native Hadoop/YARN integration for batch/ETL workloads

2014-10-03 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158571#comment-14158571 ] Sandy Ryza commented on SPARK-3561: --- I think there may be somewhat of a misunderstanding

[jira] [Updated] (SPARK-3561) Decouple Spark's API from its execution engine

2014-10-03 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-3561: -- Description: Currently Spark's API is tightly coupled with its backend execution engine. It could be

[jira] [Comment Edited] (SPARK-3561) Decouple Spark's API from its execution engine

2014-10-03 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158571#comment-14158571 ] Sandy Ryza edited comment on SPARK-3561 at 10/3/14 11:00 PM: -

[jira] [Resolved] (SPARK-3422) JavaAPISuite.getHadoopInputSplits isn't used anywhere

2014-09-25 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza resolved SPARK-3422. --- Resolution: Fixed JavaAPISuite.getHadoopInputSplits isn't used anywhere

[jira] [Commented] (SPARK-3693) Cached Hadoop RDD always return rows with the same value

2014-09-25 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148303#comment-14148303 ] Sandy Ryza commented on SPARK-3693: --- Spark's documentation actually makes a note of

[jira] [Updated] (SPARK-3682) Add helpful warnings to the UI

2014-09-25 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-3682: -- Description: Spark has a zillion configuration options and a zillion different things that can go

[jira] [Commented] (SPARK-3682) Add helpful warnings to the UI

2014-09-25 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148379#comment-14148379 ] Sandy Ryza commented on SPARK-3682: --- Oops, that should have read increased. When a task

[jira] [Created] (SPARK-3682) Add helpful warnings to the UI

2014-09-24 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-3682: - Summary: Add helpful warnings to the UI Key: SPARK-3682 URL: https://issues.apache.org/jira/browse/SPARK-3682 Project: Spark Issue Type: New Feature

[jira] [Resolved] (SPARK-2131) Collect per-task filesystem-bytes-read/written metrics

2014-09-24 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza resolved SPARK-2131. --- Resolution: Duplicate Collect per-task filesystem-bytes-read/written metrics

[jira] [Resolved] (SPARK-2142) Give better indicator of how GC cuts into task time

2014-09-23 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza resolved SPARK-2142. --- Resolution: Not a Problem I ran some tests that indicated that only stop-the-world GC time gets

[jira] [Commented] (SPARK-3468) WebUI Timeline-View feature

2014-09-23 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14145829#comment-14145829 ] Sandy Ryza commented on SPARK-3468: --- This looks like a really cool addition. WebUI

[jira] [Created] (SPARK-3642) Better document the nuances of shared variables

2014-09-22 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-3642: - Summary: Better document the nuances of shared variables Key: SPARK-3642 URL: https://issues.apache.org/jira/browse/SPARK-3642 Project: Spark Issue Type:

[jira] [Commented] (SPARK-3622) Provide a custom transformation that can output multiple RDDs

2014-09-22 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143908#comment-14143908 ] Sandy Ryza commented on SPARK-3622: --- Is this a duplicate of SPARK-2688? Provide a

[jira] [Commented] (SPARK-3577) Add task metric to report spill time

2014-09-21 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142524#comment-14142524 ] Sandy Ryza commented on SPARK-3577: --- No problem. Yeah, I agree that a spill time metric

[jira] [Commented] (SPARK-3612) Executor shouldn't quit if heartbeat message fails to reach the driver

2014-09-20 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142006#comment-14142006 ] Sandy Ryza commented on SPARK-3612: --- Yeah, we should catch this. Will post a patch.

[jira] [Created] (SPARK-3605) Typo in SchemaRDD JavaDoc

2014-09-19 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-3605: - Summary: Typo in SchemaRDD JavaDoc Key: SPARK-3605 URL: https://issues.apache.org/jira/browse/SPARK-3605 Project: Spark Issue Type: Bug Components: SQL

[jira] [Commented] (SPARK-3573) Dataset

2014-09-19 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141063#comment-14141063 ] Sandy Ryza commented on SPARK-3573: --- Currently SchemaRDD does depend on Catalyst. Are

[jira] [Updated] (SPARK-3560) In yarn-cluster mode, the same jars are distributed through multiple mechanisms.

2014-09-18 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-3560: -- Summary: In yarn-cluster mode, the same jars are distributed through multiple mechanisms. (was: In

[jira] [Commented] (SPARK-3573) Dataset

2014-09-18 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139958#comment-14139958 ] Sandy Ryza commented on SPARK-3573: --- Currently SchemaRDD lives inside SQL. Would we

[jira] [Commented] (SPARK-3560) In yarn-cluster mode, jars are distributed through multiple mechanisms.

2014-09-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136882#comment-14136882 ] Sandy Ryza commented on SPARK-3560: --- Right. I believe Min from LinkedIn who discovered

[jira] [Commented] (SPARK-3574) Shuffle finish time always reported as -1

2014-09-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138246#comment-14138246 ] Sandy Ryza commented on SPARK-3574: --- On it Shuffle finish time always reported as -1

[jira] [Commented] (SPARK-3577) Shuffle write time incorrect for sort-based shuffle

2014-09-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138245#comment-14138245 ] Sandy Ryza commented on SPARK-3577: --- On it Shuffle write time incorrect for sort-based

[jira] [Commented] (SPARK-3530) Pipeline and Parameters

2014-09-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138319#comment-14138319 ] Sandy Ryza commented on SPARK-3530: --- bq. Isn't the fit multiple models at once part a
