[jira] [Commented] (SPARK-704) ConnectionManager sometimes cannot detect loss of sending connections

2014-06-21 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039742#comment-14039742 ] Mridul Muralidharan commented on SPARK-704: --- If remote node goes

[jira] [Commented] (SPARK-2089) With YARN, preferredNodeLocalityData isn't honored

2014-06-20 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039236#comment-14039236 ] Mridul Muralidharan commented on SPARK-2089: [~pwendell] SplitInfo is

[jira] [Commented] (SPARK-2223) Building and running tests with maven is extremely slow

2014-06-20 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039217#comment-14039217 ] Mridul Muralidharan commented on SPARK-2223: [~tgraves] You could try run

Re: Java IO Stream Corrupted - Invalid Type AC?

2014-06-18 Thread Mridul Muralidharan
On Wed, Jun 18, 2014 at 6:19 PM, Surendranauth Hiraman wrote: > Patrick, > > My team is using shuffle consolidation but not speculation. We are also > using persist(DISK_ONLY) for caching. Use of shuffle consolidation is probably what is causing the issue. It would be a good idea to try again with th

[jira] [Commented] (SPARK-1353) IllegalArgumentException when writing to disk

2014-06-17 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033625#comment-14033625 ] Mridul Muralidharan commented on SPARK-1353: This is due to limitatio

Re: Big-Endian (IBM Power7) Spark Serialization issue

2014-06-16 Thread Mridul Muralidharan
In that case, does it work if you use snappy instead of lzf ? Regards, Mridul On Mon, Jun 16, 2014 at 7:34 AM, gchen wrote: > To anyone who is interested in this issue, the root cause is from a third > party code com.ning.compress.lzf.impl.UnsafeChunkEncoderBE class since they > have a broken

[jira] [Commented] (SPARK-2018) Big-Endian (IBM Power7) Spark Serialization issue

2014-06-11 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027808#comment-14027808 ] Mridul Muralidharan commented on SPARK-2018: Ah ! This is an interesting

[jira] [Commented] (SPARK-2089) With YARN, preferredNodeLocalityData isn't honored

2014-06-10 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026397#comment-14026397 ] Mridul Muralidharan commented on SPARK-2089: preferredNodeLocationData

[jira] [Commented] (SPARK-2064) web ui should not remove executors if they are dead

2014-06-07 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14021020#comment-14021020 ] Mridul Muralidharan commented on SPARK-2064: Ah, I assumed there w

[jira] [Commented] (SPARK-2064) web ui should not remove executors if they are dead

2014-06-07 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14021011#comment-14021011 ] Mridul Muralidharan commented on SPARK-2064: I am probably missing the in

[jira] [Commented] (SPARK-2064) web ui should not remove executors if they are dead

2014-06-07 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14021008#comment-14021008 ] Mridul Muralidharan commented on SPARK-2064: Unfortunately OOM is a very

[jira] [Commented] (SPARK-2064) web ui should not remove executors if they are dead

2014-06-07 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020936#comment-14020936 ] Mridul Muralidharan commented on SPARK-2064: It is 100 MB (or more) of me

[jira] [Commented] (SPARK-2064) web ui should not remove executors if they are dead

2014-06-07 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020789#comment-14020789 ] Mridul Muralidharan commented on SPARK-2064: Depending on how long a job

[jira] [Commented] (SPARK-2045) Sort-based shuffle implementation

2014-06-05 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019427#comment-14019427 ] Mridul Muralidharan commented on SPARK-2045: The plan Tom and I had wa

[jira] [Commented] (SPARK-2017) web ui stage page becomes unresponsive when the number of tasks is large

2014-06-05 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019394#comment-14019394 ] Mridul Muralidharan commented on SPARK-2017: Currently, for our jobs, I

[jira] [Comment Edited] (SPARK-1956) Enable shuffle consolidation by default

2014-05-29 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012296#comment-14012296 ] Mridul Muralidharan edited comment on SPARK-1956 at 5/29/14 11:4

[jira] [Commented] (SPARK-1956) Enable shuffle consolidation by default

2014-05-29 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012296#comment-14012296 ] Mridul Muralidharan commented on SPARK-1956: We have not taken a call on

[jira] [Commented] (SPARK-1956) Enable shuffle consolidation by default

2014-05-28 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011741#comment-14011741 ] Mridul Muralidharan commented on SPARK-1956: shuffle consolidation MUST

[jira] [Commented] (SPARK-1855) Provide memory-and-local-disk RDD checkpointing

2014-05-18 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001381#comment-14001381 ] Mridul Muralidharan commented on SPARK-1855: Currently, StorageL

[jira] [Commented] (SPARK-1855) Provide memory-and-local-disk RDD checkpointing

2014-05-18 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001377#comment-14001377 ] Mridul Muralidharan commented on SPARK-1855: Did not realize that

[jira] [Commented] (SPARK-1855) Provide memory-and-local-disk RDD checkpointing

2014-05-18 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001378#comment-14001378 ] Mridul Muralidharan commented on SPARK-1855: matei.zaha...@gmail.c

[jira] [Commented] (SPARK-1767) Prefer HDFS-cached replicas when scheduling data-local tasks

2014-05-18 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001376#comment-14001376 ] Mridul Muralidharan commented on SPARK-1767: Did not realize that

Re: [jira] [Created] (SPARK-1855) Provide memory-and-local-disk RDD checkpointing

2014-05-18 Thread Mridul Muralidharan
DISK_2 or construct your own StorageLevel with your own custom > replication factor. > > BTW you guys should probably have this discussion on the JIRA rather than the > dev list; I think the replies somehow ended up on the dev list. > > Matei > > On May 17, 2014, at 1:36

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-18 Thread Mridul Muralidharan
intaining backward > compatibility will be high. > > We just need to make an informed decision to live with that cost, not hand > wave it away. > > Regards > Mridul > >> there is anything apparent now that is expected to require such disruptive >> changes if we wer

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-18 Thread Mridul Muralidharan
ust need to make an informed decision to live with that cost, not hand wave it away. Regards Mridul > there is anything apparent now that is expected to require such disruptive > changes if we were to commit to the current release candidate as our > guaranteed 1.0.0 baseline. > >

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-17 Thread Mridul Muralidharan
May 17, 2014, at 10:32 AM, Kan Zhang wrote: > > > +1 on the running commentary here, non-binding of course :-) > > > > > > On Sat, May 17, 2014 at 8:44 AM, Andrew Ash > wrote: > > > >> +1 on the next release feeling more like a 0.10 than a 1.0 > >

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-17 Thread Mridul Muralidharan
cle Regards Mridul > If you can tell me about specific changes in the current release candidate > that occasion new arguments for why a 1.0 release is an unacceptable idea, > then I'm listening. > > > On Sat, May 17, 2014 at 11:59 AM, Mridul Muralidharan wrote: > > >

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-17 Thread Mridul Muralidharan
muzzle the discussion. Regards Mridul > issue, and what I am asking, is which pending bug fixes does anyone > anticipate will require breaking the public API guaranteed in rc9 > > > On Sat, May 17, 2014 at 9:44 AM, Mridul Muralidharan wrote: > > > We made incompatible api chan

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-17 Thread Mridul Muralidharan
ng changes, now would be a > good time to set me straight. > > > On Sat, May 17, 2014 at 4:26 AM, Mridul Muralidharan >wrote: > > > I had echoed similar sentiments a while back when there was a discussion > > around 0.10 vs 1.0 ... I would have preferred 0.10 to stabil

Re: [jira] [Created] (SPARK-1867) Spark Documentation Error causes java.lang.IllegalStateException: unread block data

2014-05-17 Thread Mridul Muralidharan
I suspect this is an issue we have fixed internally here as part of a larger change - the issue we fixed was not a config issue but bugs in spark. Unfortunately we plan to contribute this as part of 1.1 Regards, Mridul On 17-May-2014 4:09 pm, "sam (JIRA)" wrote: > sam created SPARK-1867: >

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-17 Thread Mridul Muralidharan
I had echoed similar sentiments a while back when there was a discussion around 0.10 vs 1.0 ... I would have preferred 0.10 to stabilize the api changes, add missing functionality, and go through a hardening release before 1.0 But the community preferred a 1.0 :-) Regards, Mridul On 17-May-2014 3:19

Re: [jira] [Created] (SPARK-1855) Provide memory-and-local-disk RDD checkpointing

2014-05-17 Thread Mridul Muralidharan
we have > enough memory to use. We need to investigate more to find a good > solution. -Xiangrui > > On Fri, May 16, 2014 at 4:00 PM, Mridul Muralidharan > wrote: > > Effectively this is persist without fault tolerance. > > Failure of any node means complete lack o

[jira] [Commented] (SPARK-1865) Improve behavior of cleanup of disk state

2014-05-17 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000716#comment-14000716 ] Mridul Muralidharan commented on SPARK-1865: You could also modify

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-17 Thread Mridul Muralidharan
Can you try moving your mapPartitions to another class/object which is referenced only after sc.addJar ? I would suspect the CNFEx (ClassNotFoundException) is coming while loading the class containing mapPartitions before addJar is executed. In general though, dynamic loading of classes means you use reflection to instantia
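A minimal sketch of the reflection approach, assuming the class is known only by name at runtime and that its jar is visible through the thread's context class loader by the time the call runs. `DynamicLoad` and `instantiate` are hypothetical names for illustration, not Spark APIs; the point is that nothing refers to the class statically, so it is resolved only when `instantiate` is called rather than when the enclosing class is loaded.

```scala
object DynamicLoad {
  // Hypothetical helper: load and instantiate a class by name at call time.
  // Uses the thread context class loader when one is set (assumption: this is
  // where classes from dynamically added jars become visible), else falls back
  // to the loader that loaded this object.
  def instantiate[T](className: String): T = {
    val loader = Option(Thread.currentThread().getContextClassLoader)
      .getOrElse(getClass.getClassLoader)
    Class.forName(className, true, loader) // throws ClassNotFoundException if absent
      .getDeclaredConstructor()             // assumes a public no-arg constructor
      .newInstance()
      .asInstanceOf[T]
  }
}
```

Callers would typically program against an interface that is on the compile-time classpath and cast the reflectively created instance to it.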

[jira] [Commented] (SPARK-1849) Broken UTF-8 encoded data gets character replacements and thus can't be "fixed"

2014-05-16 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000580#comment-14000580 ] Mridul Muralidharan commented on SPARK-1849: You are missing my p

Re: [VOTE] Release Apache Spark 1.0.0 (rc6)

2014-05-16 Thread Mridul Muralidharan
So was rc5 cancelled ? Did not see a note indicating that or why ... [1] - Mridul [1] could have easily missed it in the email storm though ! On Thu, May 15, 2014 at 1:32 AM, Patrick Wendell wrote: > Please vote on releasing the following candidate as Apache Spark version > 1.0.0! > > This pa

[jira] [Commented] (SPARK-1849) Broken UTF-8 encoded data gets character replacements and thus can't be "fixed"

2014-05-16 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000397#comment-14000397 ] Mridul Muralidharan commented on SPARK-1849: Looks like textFile is prob

Re: [jira] [Created] (SPARK-1855) Provide memory-and-local-disk RDD checkpointing

2014-05-16 Thread Mridul Muralidharan
Effectively this is persist without fault tolerance. Failure of any node means complete lack of fault tolerance. I would be very skeptical of truncating lineage if it is not reliable. On 17-May-2014 3:49 am, "Xiangrui Meng (JIRA)" wrote: > Xiangrui Meng created SPARK-1855: >

Re: [jira] [Created] (SPARK-1767) Prefer HDFS-cached replicas when scheduling data-local tasks

2014-05-15 Thread Mridul Muralidharan
Hi Sandy, I assume you are referring to caching added to datanodes via new caching api via NN ? (To preemptively mmap blocks). I have not looked in detail, but does NN tell us about this in block locations? If yes, we can simply make those process local instead of node local for executors on th

[jira] [Commented] (SPARK-1813) Add a utility to SparkConf that makes using Kryo really easy

2014-05-13 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996390#comment-13996390 ] Mridul Muralidharan commented on SPARK-1813: Writing a KryoRegistrator is

[jira] [Commented] (SPARK-1772) Spark executors do not successfully die on OOM

2014-05-12 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995600#comment-13995600 ] Mridul Muralidharan commented on SPARK-1772: BTW, for specific case of

Re: bug using kryo as closure serializer

2014-05-04 Thread Mridul Muralidharan
On a slightly related note (apologies Soren for hijacking the thread), Reynold, how much better is kryo from spark's usage point of view compared to the default java serialization (in general, not for closures) ? The numbers on kryo site are interesting, but since you have played the most with kryo

[jira] [Commented] (SPARK-1706) Allow multiple executors per worker in Standalone mode

2014-05-03 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988868#comment-13988868 ] Mridul Muralidharan commented on SPARK-1706: Oh my, this was supposed t

[jira] [Commented] (SPARK-1697) Driver error org.apache.spark.scheduler.TaskSetManager - Loss was due to java.io.FileNotFoundException

2014-05-03 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988807#comment-13988807 ] Mridul Muralidharan commented on SPARK-1697: I would suspect this is du

[jira] [Issue Comment Deleted] (SPARK-1606) spark-submit needs `--arg` for every application parameter

2014-05-03 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan updated SPARK-1606: --- Comment: was deleted (was: Crap, got to this too late. We really should not have

[jira] [Commented] (SPARK-1606) spark-submit needs `--arg` for every application parameter

2014-05-03 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988756#comment-13988756 ] Mridul Muralidharan commented on SPARK-1606: Crap, got to this too late

[jira] [Updated] (BOOKKEEPER-648) BasicJMSTest failed

2014-04-25 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/BOOKKEEPER-648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan updated BOOKKEEPER-648: --- Assignee: (was: Mridul Muralidharan) > BasicJMSTest fai

[jira] [Updated] (BOOKKEEPER-560) Create readme for hedwig-client-jms

2014-04-25 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/BOOKKEEPER-560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan updated BOOKKEEPER-560: --- Assignee: (was: Mridul Muralidharan) > Create readme for hed

[jira] [Resolved] (SPARK-1587) Fix thread leak in spark

2014-04-25 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan resolved SPARK-1587. Resolution: Fixed Fixed, https://github.com/apache/spark/pull/504 > Fix thr

[jira] [Commented] (SPARK-1586) Fix issues with spark development under windows

2014-04-25 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981321#comment-13981321 ] Mridul Muralidharan commented on SPARK-1586: Immediate issues fixed th

[jira] [Commented] (SPARK-1576) Passing of JAVA_OPTS to YARN on command line

2014-04-25 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981313#comment-13981313 ] Mridul Muralidharan commented on SPARK-1576: There is a misunderstanding

[jira] [Commented] (SPARK-1588) SPARK_JAVA_OPTS is not getting propagated

2014-04-23 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13978827#comment-13978827 ] Mridul Muralidharan commented on SPARK-1588: Apparently, SPARK_YARN_USER

Re: [jira] [Commented] (SPARK-1576) Passing of JAVA_OPTS to YARN on command line

2014-04-23 Thread Mridul Muralidharan
Sorry, I misread - I meant SPARK_JAVA_OPTS - not JAVA_OPTS. See here : https://issues.apache.org/jira/browse/SPARK-1588 Regards, Mridul On Wed, Apr 23, 2014 at 6:37 PM, Mridul Muralidharan wrote: > This breaks all existing jobs which are not using spark-submit. > The consensus was not to

Re: [jira] [Commented] (SPARK-1576) Passing of JAVA_OPTS to YARN on command line

2014-04-23 Thread Mridul Muralidharan
This breaks all existing jobs which are not using spark-submit. The consensus was not to break compatibility unless there was an overriding reason to do so On Apr 23, 2014 6:32 PM, "Thomas Graves (JIRA)" wrote: > > [ > https://issues.apache.org/jira/browse/SPARK-1576?page=com.atlassian.jira.p

[jira] [Commented] (SPARK-1588) SPARK_JAVA_OPTS is not getting propagated

2014-04-23 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13978095#comment-13978095 ] Mridul Muralidharan commented on SPARK-1588: Noticed this specificall

[jira] [Created] (SPARK-1588) SPARK_JAVA_OPTS is not getting propagated

2014-04-23 Thread Mridul Muralidharan (JIRA)
Mridul Muralidharan created SPARK-1588: -- Summary: SPARK_JAVA_OPTS is not getting propagated Key: SPARK-1588 URL: https://issues.apache.org/jira/browse/SPARK-1588 Project: Spark Issue

[jira] [Assigned] (SPARK-1586) Fix issues with spark development under windows

2014-04-23 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan reassigned SPARK-1586: -- Assignee: Mridul Muralidharan > Fix issues with spark development un

[jira] [Created] (SPARK-1587) Fix thread leak in spark

2014-04-23 Thread Mridul Muralidharan (JIRA)
Mridul Muralidharan created SPARK-1587: -- Summary: Fix thread leak in spark Key: SPARK-1587 URL: https://issues.apache.org/jira/browse/SPARK-1587 Project: Spark Issue Type: Bug

[jira] [Assigned] (SPARK-1587) Fix thread leak in spark

2014-04-23 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan reassigned SPARK-1587: -- Assignee: Mridul Muralidharan > Fix thread leak in sp

[jira] [Created] (SPARK-1586) Fix issues with spark development under windows

2014-04-23 Thread Mridul Muralidharan (JIRA)
Mridul Muralidharan created SPARK-1586: -- Summary: Fix issues with spark development under windows Key: SPARK-1586 URL: https://issues.apache.org/jira/browse/SPARK-1586 Project: Spark

Re: all values for a key must fit in memory

2014-04-21 Thread Mridul Muralidharan
nto memory at once. The ShuffledRDD is agnostic to what goes inside P. > > On Sun, Apr 20, 2014 at 11:36 AM, Mridul Muralidharan wrote: > >> An iterator does not imply data has to be memory resident. >> Think merge sort output as an iterator (disk backed). >> >> Tom is

Re: all values for a key must fit in memory

2014-04-20 Thread Mridul Muralidharan
An iterator does not imply data has to be memory resident. Think merge sort output as an iterator (disk backed). Tom is actually planning to work on something similar with me, hopefully this month or next. Regards, Mridul On Sun, Apr 20, 2014 at 11:46 PM, Sandy Ryza wrote: > Hey all, >
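To illustrate that an iterator need not be memory resident, here is a minimal sketch of merge-sort output exposed as an iterator: a lazy k-way merge over individually sorted runs. This is not Spark's implementation, and `MergeIterator` is a hypothetical name. Only the current head of each run is held at any time, so each input could just as well be a reader over a sorted spill file on disk.

```scala
import scala.collection.mutable

object MergeIterator {
  // Lazily merge individually sorted iterators into one sorted iterator.
  // Memory use is one buffered head per run, independent of total data size.
  def merge[T](runs: Seq[Iterator[T]])(implicit ord: Ordering[T]): Iterator[T] = {
    // PriorityQueue is a max-heap, so reverse the ordering to pop the smallest head.
    val byHead: Ordering[BufferedIterator[T]] =
      Ordering.by[BufferedIterator[T], T](_.head).reverse
    val heap = mutable.PriorityQueue.empty[BufferedIterator[T]](byHead)
    runs.map(_.buffered).filter(_.hasNext).foreach(r => heap.enqueue(r))

    new Iterator[T] {
      def hasNext: Boolean = heap.nonEmpty
      def next(): T = {
        if (heap.isEmpty) throw new NoSuchElementException("all runs exhausted")
        val run = heap.dequeue()
        val value = run.next()
        if (run.hasNext) heap.enqueue(run) // re-insert keyed on its new head
        value
      }
    }
  }
}
```

Each `next()` costs O(log k) for k runs; a consumer that streams the result never needs all values for a key in memory at once.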

[jira] [Commented] (SPARK-1476) 2GB limit in spark for blocks

2014-04-17 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972978#comment-13972978 ] Mridul Muralidharan commented on SPARK-1476: [~matei] We are having

[jira] [Commented] (SPARK-1524) TaskSetManager'd better not schedule tasks which has no preferred executorId using PROCESS_LOCAL in the first search process

2014-04-17 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972864#comment-13972864 ] Mridul Muralidharan commented on SPARK-1524: The expectation is to fall

[jira] [Commented] (SPARK-1453) Improve the way Spark on Yarn waits for executors before starting

2014-04-14 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968390#comment-13968390 ] Mridul Muralidharan commented on SPARK-1453: (d) becomes relevant in cas

[jira] [Commented] (SPARK-1476) 2GB limit in spark for blocks

2014-04-14 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968147#comment-13968147 ] Mridul Muralidharan commented on SPARK-1476: [~pwendell] IMO both

[jira] [Assigned] (SPARK-1476) 2GB limit in spark for blocks

2014-04-14 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan reassigned SPARK-1476: -- Assignee: Mridul Muralidharan > 2GB limit in spark for blo

[jira] [Comment Edited] (SPARK-1476) 2GB limit in spark for blocks

2014-04-13 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967854#comment-13967854 ] Mridul Muralidharan edited comment on SPARK-1476 at 4/13/14 2:4

[jira] [Commented] (SPARK-1476) 2GB limit in spark for blocks

2014-04-13 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967854#comment-13967854 ] Mridul Muralidharan commented on SPARK-1476: There are multiple issue

[jira] [Updated] (SPARK-1476) 2GB limit in spark for blocks

2014-04-11 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan updated SPARK-1476: --- Fix Version/s: 1.1.0 > 2GB limit in spark for blo

[jira] [Commented] (SPARK-1476) 2GB limit in spark for blocks

2014-04-11 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967419#comment-13967419 ] Mridul Muralidharan commented on SPARK-1476: WIP Proposal: - All refere

[jira] [Created] (SPARK-1476) 2GB limit in spark for blocks

2014-04-11 Thread Mridul Muralidharan (JIRA)
Mridul Muralidharan created SPARK-1476: -- Summary: 2GB limit in spark for blocks Key: SPARK-1476 URL: https://issues.apache.org/jira/browse/SPARK-1476 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-1453) Improve the way Spark on Yarn waits for executors before starting

2014-04-11 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967193#comment-13967193 ] Mridul Muralidharan commented on SPARK-1453: The timeout gets hit only

[jira] [Commented] (SPARK-542) Cache Miss when machine have multiple hostname

2014-04-11 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967185#comment-13967185 ] Mridul Muralidharan commented on SPARK-542: --- Spark uses only hostnames -

[jira] [Commented] (SPARK-1391) BlockManager cannot transfer blocks larger than 2G in size

2014-04-11 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966332#comment-13966332 ] Mridul Muralidharan commented on SPARK-1391: Another place where thi

Re: ephemeral storage level in spark ?

2014-04-05 Thread Mridul Muralidharan
, which is stored in a remote cluster or machines. And the > goal is to load the remote raw data only once? > > Haoyuan > > > On Sat, Apr 5, 2014 at 4:30 PM, Mridul Muralidharan >wrote: > > > Hi, > > > > We have a requirement to use a (potential) ephemeral

[jira] [Resolved] (SPARK-1393) fix computePreferredLocations signature to not depend on underlying implementation

2014-04-05 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan resolved SPARK-1393. Resolution: Fixed > fix computePreferredLocations signature to not depend

[jira] [Commented] (SPARK-1393) fix computePreferredLocations signature to not depend on underlying implementation

2014-04-05 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961278#comment-13961278 ] Mridul Muralidharan commented on SPARK-1393: Merged https://github.com/ap

ephemeral storage level in spark ?

2014-04-05 Thread Mridul Muralidharan
Hi, We have a requirement to use a (potential) ephemeral storage, which is not within the VM, which is strongly tied to a worker node. So source of truth for a block would still be within spark; but to actually do computation, we would need to copy data to external device (where it might lie aro

[jira] [Updated] (SPARK-1393) fix computePreferredLocations signature to not depend on underlying implementation

2014-04-02 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan updated SPARK-1393: --- Description: computePreferredLocations in core/src/main/scala/org/apache/spark

[jira] [Created] (SPARK-1393) fix computePreferredLocations signature to not depend on underlying implementation

2014-04-02 Thread Mridul Muralidharan (JIRA)
Mridul Muralidharan created SPARK-1393: -- Summary: fix computePreferredLocations signature to not depend on underlying implementation Key: SPARK-1393 URL: https://issues.apache.org/jira/browse/SPARK-1393

[jira] [Commented] (SPARK-1350) YARN ContainerLaunchContext should use cluster's JAVA_HOME

2014-03-31 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13955997#comment-13955997 ] Mridul Muralidharan commented on SPARK-1350: You mistook my question; I m

[jira] [Commented] (SPARK-1350) YARN ContainerLaunchContext should use cluster's JAVA_HOME

2014-03-31 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13955978#comment-13955978 ] Mridul Muralidharan commented on SPARK-1350: We will need a way to confi

Re: JIRA. github and asf updates

2014-03-29 Thread Mridul Muralidharan
>> You can unsubscribe yourself from any of these sources, right? >> >> - Patrick >> >> >> On Sat, Mar 29, 2014 at 11:05 AM, Mridul Muralidharan >> wrote: >> >>> Hi, >>> >>> So we are now receiving updates from three source

JIRA. github and asf updates

2014-03-29 Thread Mridul Muralidharan
Hi, So we are now receiving updates from three sources for each change to the PR. While each of them handles a corner case which the others might miss, it would be great if we could minimize the volume of duplicated communication. Regards, Mridul

Mailbomb from amplabs jenkins ?

2014-03-27 Thread Mridul Muralidharan
Got some 100 odd mails from jenkins (?) with "Can one of the admins verify this patch?" Part of upgrade or some other issue ? Significantly reduced the snr of my inbox ! Regards, Mridul

Re: Spark 0.9.1 release

2014-03-25 Thread Mridul Muralidharan
ly long running job (30 mins+) working on non trivial dataset will fail due to accumulated failures in spark. Regards, Mridul > > TD > > > > > On Tue, Mar 25, 2014 at 8:44 PM, Mridul Muralidharan wrote: > >> Forgot to mention this in the earlier request for PR'

Re: Spark 0.9.1 release

2014-03-25 Thread Mridul Muralidharan
ut not pulled into branch 0.9. I am >> not sure it is a good idea to pull that in. We can pull those changes later >> for 0.9.2 if required. >> >> TD >> >> >> >> >> On Tue, Mar 25, 2014 at 8:44 PM, Mridul Muralidharan > >wrote: >> >

Re: Spark 0.9.1 release

2014-03-25 Thread Mridul Muralidharan
Forgot to mention this in the earlier request for PR's. If there is another RC being cut, please add https://github.com/apache/spark/pull/159 to it too (if not done already !). Thanks, Mridul On Thu, Mar 20, 2014 at 5:37 AM, Tathagata Das wrote: > Hello everyone, > > Since the release of Spark

Re: Spark 0.9.1 release

2014-03-19 Thread Mridul Muralidharan
ected to be release around end of April (not too far > ;) ). > > TD > > > On Wed, Mar 19, 2014 at 5:57 PM, Mridul Muralidharan wrote: > >> Would be great if the garbage collection PR is also committed - if not >> the whole thing, at least the part to unpersist broadca

Re: Spark 0.9.1 release

2014-03-19 Thread Mridul Muralidharan
Would be great if the garbage collection PR is also committed - if not the whole thing, at least the part to unpersist broadcast variables explicitly would be great. Currently we are running with a custom impl which does something similar, and I would like to move to standard distribution for that.

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-28 Thread Mridul Muralidharan
;t work anymore if > we standardize on SBT. These have no obvious work around at this point > as far as I see. > > - Patrick > > On Wed, Feb 26, 2014 at 7:09 PM, Mridul Muralidharan wrote: >> On Feb 26, 2014 11:12 PM, "Patrick Wendell" wrote: >>> >&g

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-26 Thread Mridul Muralidharan
tions to using sbt or maven ! > >> Too many exclude versions, pinned versions, etc would just make things > >> unmanageable in future. > >> > >> > >> Regards, > >> Mridul > >> > >> > >> > >> > >> On W

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread Mridul Muralidharan
others. (#NelsonMandela) > >> On Feb 25, 2014, at 6:50 PM, Mridul Muralidharan wrote: >> >>> On Wed, Feb 26, 2014 at 5:31 AM, Patrick Wendell wrote: >>> Evan - this is a good thing to bring up. Wrt the shader plug-in - >>> right now we don't actuall

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread Mridul Muralidharan
On Wed, Feb 26, 2014 at 5:31 AM, Patrick Wendell wrote: > Evan - this is a good thing to bring up. Wrt the shader plug-in - > right now we don't actually use it for bytecode shading - we simply > use it for creating the uber jar with excludes (which sbt supports > just fine via assembly). Not re

Re: Preparing to provide a small text files input API in mllib

2014-02-24 Thread Mridul Muralidharan
Hi, I have not looked into why this would be needed, but given it is needed, I added a couple of comments to the PR. Overall, it looks promising. Regards, Mridul On Tue, Feb 25, 2014 at 8:05 AM, 尹绪森 wrote: > Hi community, > > As I moving forward to write a LDA (Latent Dirichlet Allocation) t

Re: Anyone wants to look at SPARK-1123?

2014-02-24 Thread Mridul Muralidharan
Curious, what was the issue ? - Mridul On Sun, Feb 23, 2014 at 11:41 PM, Nan Zhu wrote: > OK, I know where I was wrong > > > Best, > > -- > Nan Zhu > Sent with Sparrow (http://www.sparrowmailapp.com/?sig) > > > On Sunday, February 23, 2014 at 12:50 PM, Nan Zhu wrote: > >> String, it should be ge

Re: [DISCUSS] Extending public API

2014-02-23 Thread Mridul Muralidharan
e (sparkbank)? Curious to know how you'd decide what > should go where. > > Amandeep > > > On Feb 22, 2014, at 10:06 PM, Mridul Muralidharan > wrote: > > > > Hi, > > > > Over the past few months, I have seen a bunch of pull requests which > have

[DISCUSS] Extending public API

2014-02-22 Thread Mridul Muralidharan
Hi, Over the past few months, I have seen a bunch of pull requests which have extended spark api ... most commonly RDD itself. Most of them are either relatively niche case of specialization (which might not be useful for most cases) or idioms which can be expressed (sometimes with minor perf p

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-20 Thread Mridul Muralidharan
I am not sure if this is resolved now - but maven was better at building the assembly jars compared to sbt. To the point where I stopped using sbt due to unpredictable order in which it unjars the dependencies to create the assembled jar (we do have quite a lot of conflicting classes in our depende
