Re: [DISCUSS] [Spark confs] Making spark.jars conf take precedence over spark default classpath

2020-07-24 Thread Imran Rashid
Hi Nupur, Is what you're trying to do already possible via the spark.{driver,executor}.userClassPathFirst options? https://github.com/apache/spark/blob/b890fdc8df64f1d0b0f78b790d36be883e852b0d/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L853 On Wed, Jul 22, 2020 at 5:50 PM nupu
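A minimal sketch of how those two options could be passed on a spark-submit command line. The helper name, the jar names, and the idea of assembling the arguments in Python are illustrative assumptions; only the two `userClassPathFirst` configuration keys and the `--conf` / `--jars` flags come from Spark itself.

```python
from typing import List


def spark_submit_args(app_jar: str, extra_jars: List[str]) -> List[str]:
    """Build a spark-submit argument list that prefers user jars on the classpath.

    Hypothetical helper for illustration; it only assembles strings and
    does not launch anything.
    """
    return [
        "spark-submit",
        # Ask the driver and the executors to load user-supplied classes first.
        "--conf", "spark.driver.userClassPathFirst=true",
        "--conf", "spark.executor.userClassPathFirst=true",
        # Extra jars are passed as one comma-separated value.
        "--jars", ",".join(extra_jars),
        app_jar,
    ]


args = spark_submit_args("app.jar", ["a.jar", "b.jar"])
print(" ".join(args))
```

Whether this covers the original request depends on details of the thread that are not shown in this snippet.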

Re: [PSA] Apache Spark uses GitHub Actions to run the tests

2020-07-23 Thread Imran Rashid
Thanks for setting this up, Hyukjin. How do you re-trigger tests in github actions? Eg. there is a failure that appears to be some random infra thing or a flaky test, or maybe the tests were just run a while back so you want to get a fresh batch of tests. I think the old "Jenkins, retest this pl

Re: [DISCUSS] Amend the commiter guidelines on the subject of -1s & how we expect PR discussion to be treated.

2020-07-23 Thread Imran Rashid
Sure, that sounds good to me. +1 On Wed, Jul 22, 2020 at 1:50 PM Holden Karau wrote: > > > On Wed, Jul 22, 2020 at 7:39 AM Imran Rashid < iras...@apache.org > wrote: > >> Hi Holden, >> >> thanks for leading this discussion, I'm in favor in general. I h

Re: [DISCUSS] Amend the commiter guidelines on the subject of -1s & how we expect PR discussion to be treated.

2020-07-22 Thread Imran Rashid
Hi Holden, thanks for leading this discussion, I'm in favor in general. I have one specific question -- these two sections seem to contradict each other slightly: > If there is a -1 from a non-committer, multiple committers or the PMC should be consulted before moving forward. > >If the original

Re: [VOTE] Decommissioning SPIP

2020-07-01 Thread Imran Rashid
+1 I think this is going to be a really important feature for Spark and I'm glad to see Holden focusing on it. On Wed, Jul 1, 2020 at 8:38 PM Mridul Muralidharan wrote: > +1 > > Thanks, > Mridul > > On Wed, Jul 1, 2020 at 6:36 PM Hyukjin Kwon wrote: > >> +1 >> >> On Thu, Jul 2, 2020 at 10:08 AM, Marc

Re: Enabling fully disaggregated shuffle on Spark

2019-12-05 Thread Imran Rashid
> > As far as I'm aware, supportsRelocationOfSerializedObjects only means that > a given object can be moved around within a segment of serialized data. > (For example, certain object graphs with cycles or other unusual data > structures can be encoded but impose requirements on

Re: Enabling fully disaggregated shuffle on Spark

2019-12-04 Thread Imran Rashid
Hi Ben, in general everything you're proposing sounds reasonable. For me, at least, I'd need more details on most of the points before I fully understand them, but I'm definitely in favor of the general goal for making spark support fully disaggregated shuffle. Of course, I also want to make sur

Re: Is RDD thread safe?

2019-11-25 Thread Imran Rashid
I think Chang is right, but I also think this only comes up in limited scenarios. I initially thought it wasn't a bug, but after some more thought I have some concerns in light of the issues we've had w/ nondeterministic RDDs, eg. repartition(). Say I have code like this: val cachedRDD = sc.text

CVE-2019-10099: Apache Spark unencrypted data on local disk

2019-08-06 Thread Imran Rashid
Severity: Important Vendor: The Apache Software Foundation Versions affected: All Spark 1.x, Spark 2.0.x, Spark 2.1.x, and 2.2.x versions Spark 2.3.0 to 2.3.2 Description: Prior to Spark 2.3.3, in certain situations Spark would write user data to local disk unencrypted, even if spark.io.encryp

Re: [VOTE][SPARK-25299] SPIP: Shuffle Storage API

2019-06-14 Thread Imran Rashid
+1 (binding) I think this is a really important feature for spark. First, there is already a lot of interest in alternative shuffle storage in the community, from dynamic allocation in kubernetes, to even just improving stabilit

Re: [DISCUSS][SPARK-25299] SPIP: Shuffle storage API

2019-06-10 Thread Imran Rashid
t; > > > -Matt Cheah > > > > *From: *"Yifei Huang (PD)" > *Date: *Monday, May 13, 2019 at 1:04 PM > *To: *Mridul Muralidharan > *Cc: *Bo Yang , Ilan Filonenko , Imran > Rashid , Justin Uang , Liang > Tang , Marcelo Vanzin , Matei > Zaharia , Ma

Re: Resolving all JIRAs affecting EOL releases

2019-05-17 Thread Imran Rashid
+1, thanks for taking this on On Wed, May 15, 2019 at 7:26 PM Hyukjin Kwon wrote: > oh, wait. 'Incomplete' can still make sense in this way then. > Yes, I am good with 'Incomplete' too. > > On Thu, May 16, 2019 at 11:24 AM, Hyukjin Kwon wrote: > >> I actually recently used 'Incomplete' a bit when the

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-05-15 Thread Imran Rashid
sorry I am late to the discussion here -- the jira mentions using these extensions for dealing with shuffles, can you explain that part? I don't see how you would use this to change shuffle behavior at all. On Tue, May 14, 2019 at 10:59 AM Thomas graves wrote: > Thanks for replying, I'll extend

Re: Interesting implications of supporting Scala 2.13

2019-05-10 Thread Imran Rashid
+1 on making whatever api changes we can now for 3.0. I don't think that is making any commitments to supporting scala 2.13 in any specific version. We'll have to deal with all the other points you raised when we do cross that bridge, but hopefully those are things we can cover in a minor release

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-26 Thread Imran Rashid
ic resources, but it does feel to me like we >>>>>>>>> should >>>>>>>>> at least be considering doing that deeper redesign. >>>>>>>>> >>>>>>>>> On Thu, Mar 21, 2019 at 7:33 AM Tom Graves

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-20 Thread Imran Rashid
clear what is proposed? I like Felix's suggestion to switch to the new >> Heilmeier template, which helps clarify what is proposed and what is not. >> Then let's review the new SPIP and resume the vote. >> >> On Tue, Mar 5, 2019 at 7:54 AM Imran Rashid wrote:

Re: [VOTE] Release Apache Spark 2.4.1 (RC8)

2019-03-20 Thread Imran Rashid
Even if only PMC are able to veto a release, I believe all community members are encouraged to vote, even a -1, to express their opinions, right? I am -0.5 on the release because of SPARK-27112. It is not a regression, so in that sense I don't think it must hold the release. But it is fixing a p

Re: [build system] jenkins wedged again, rebooting master node

2019-03-19 Thread Imran Rashid
seems wedged again? sorry for the bad news Shane, thanks for all the work on fixing it On Mon, Mar 18, 2019 at 4:02 PM shane knapp wrote: > ok, i dug through the logs and noticed that rsyslogd was dropping messages > due to imuxsock being spammed by postfix... which i then tracked down to > our

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-05 Thread Imran Rashid
r questions then I think we should > resolve here or take the discussion of what a SPIP is to a different thread > and then come back to this, thoughts? > > Note there is a high level design for at least the core piece, which is > what people seem concerned with, already so includin

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-04 Thread Imran Rashid
On Sun, Mar 3, 2019 at 6:51 PM Xiangrui Meng wrote: > On Sun, Mar 3, 2019 at 10:20 AM Felix Cheung > wrote: > >> IMO upfront allocation is less useful. Specifically too expensive for >> large jobs. >> > > This is also an API/design discussion. > I agree with Felix -- this is more than just an A

scheduler braindump: architecture, gotchas, etc.

2019-02-04 Thread Imran Rashid
The scheduler has been pretty error-prone and hard to work on, and I feel like there may be a dwindling core of active experts. I'm sure it's very discouraging to folks trying to make what seem like simple changes, who then find they are in a rat's nest of complex issues they weren't expecting. But

Re: CVE-2018-11760: Apache Spark local privilege escalation vulnerability

2019-01-31 Thread Imran Rashid
I received some questions about what the exact change was which fixed the issue, and the PMC decided to post info in jira to make it easier for the community to track. The relevant details are all on https://issues.apache.org/jira/browse/SPARK-26802 On Mon, Jan 28, 2019 at 1:08 PM Imran Rashid

CVE-2018-11760: Apache Spark local privilege escalation vulnerability

2019-01-28 Thread Imran Rashid
Severity: Important Vendor: The Apache Software Foundation Versions affected: All Spark 1.x, Spark 2.0.x, and Spark 2.1.x versions Spark 2.2.0 to 2.2.2 Spark 2.3.0 to 2.3.1 Description: When using PySpark, it's possible for a different local user to connect to the Spark application and imperson

Re: Spark Scheduler - Task and job levels - How it Works?

2019-01-08 Thread Imran Rashid
Hi Miguel, On Sun, Jan 6, 2019 at 11:35 AM Miguel F. S. Vasconcelos < miguel.vasconce...@usp.br> wrote: > When an action is performed on an RDD, Spark sends it as a job to the > DAGScheduler; > The DAGScheduler computes the execution DAG based on the RDD's lineage, and > splits the job into stages

Re: proposal for expanded & consistent timestamp types

2018-12-11 Thread Imran Rashid
oposed already because Python > timestamps are basically LocalDateTime or OffsetDateTime. > > Li > > > > On Thu, Dec 6, 2018 at 11:03 AM Imran Rashid > wrote: > >> Hi, >> >> I'd like to discuss the future of timestamp support in Spark, in >> part

proposal for expanded & consistent timestamp types

2018-12-06 Thread Imran Rashid
Hi, I'd like to discuss the future of timestamp support in Spark, in particular with respect to handling timezones in different SQL types. In a nutshell: * There are at least 3 different ways of handling the timestamp type across timezone changes * We'd like Spark to clearly distinguish the 3 t

Re: [VOTE] SPARK 2.4.0 (RC4)

2018-10-22 Thread Imran Rashid
+1 No blockers and our internal tests are all passing. (I did file https://issues.apache.org/jira/browse/SPARK-25805, but this is just a minor issue with a flaky test) On Mon, Oct 22, 2018 at 12:42 PM Wenchen Fan wrote: > Please vote on releasing the following candidate as Apache Spark version

Re: [VOTE] SPARK 2.4.0 (RC3)

2018-10-15 Thread Imran Rashid
I just discovered https://issues.apache.org/jira/browse/SPARK-25738 with some more testing. I only marked it as critical, but it seems pretty bad -- I'll defer to others' opinions On Sat, Oct 13, 2018 at 4:15 PM Dongjoon Hyun wrote: > Yes. From my side, it's -1 for RC3. > > Bests, > Dongjoon. > > On

Re: [VOTE] SPARK 2.4.0 (RC3)

2018-10-10 Thread Imran Rashid
Sorry I had messed up my testing earlier, so I only just discovered https://issues.apache.org/jira/browse/SPARK-25704 I don't think this is a release blocker, because it's not a regression and there is a workaround, just fyi. On Wed, Oct 10, 2018 at 11:47 AM Wenchen Fan wrote: > Please vote on r

Re: python tests: any reason for a huge tests.py?

2018-09-12 Thread Imran Rashid
4 to eventually proceed without never ending merge conflicts with other changes that are also adding new tests. On Wed, Sep 5, 2018 at 1:27 PM Imran Rashid wrote: > I filed https://issues.apache.org/jira/browse/SPARK-25344 > > On Fri, Aug 24, 2018 at 11:57 AM Reynold Xin wrote: > >>

Re: python test infrastructure

2018-09-06 Thread Imran Rashid
I agree with this idea in general. https://issues.apache.org/jira/browse/SPARK-25359 > On Thu, Sep 6, 2018 at 5:41 AM, Imran Rashid wrote: > >> one more: seems like python/run-tests should have an option at least to >> not bail at the first failure: >> https://github.com/apach

Re: python test infrastructure

2018-09-05 Thread Imran Rashid
hether you *only* had a failure in that flaky test, or if there was some other real failure as well. On Wed, Sep 5, 2018 at 1:31 PM Imran Rashid wrote: > Hi all, > > More pyspark noob questions from me. I find it really hard to figure out > what versions of python I should be test

python test infrastructure

2018-09-05 Thread Imran Rashid
Hi all, More pyspark noob questions from me. I find it really hard to figure out what versions of python I should be testing and what is tested upstream. While I'd like to just know the answers to those questions, more importantly I'd like to make sure that info is visible somewhere so all devs c

Re: python tests: any reason for a huge tests.py?

2018-09-05 Thread Imran Rashid
I filed https://issues.apache.org/jira/browse/SPARK-25344 On Fri, Aug 24, 2018 at 11:57 AM Reynold Xin wrote: > We should break it. > > On Fri, Aug 24, 2018 at 9:53 AM Imran Rashid > wrote: > >> Hi, >> >> another question from looking more at python recently.

Re: [DISCUSS] move away from python doctests

2018-08-29 Thread Imran Rashid
(Also, maybe there are already good unit tests, and I just don't know where to find them, as Bryan Cutler pointed out for the bit of code I was originally asking about.) On Wed, Aug 29, 2018 at 3:26 PM Imran Rashid wrote: > Hi Li, > > yes that makes perfect sense. That more-or-le

Re: [DISCUSS] move away from python doctests

2018-08-29 Thread Imran Rashid
> > Does this make sense? > > Li > > On Wed, Aug 29, 2018 at 6:35 PM Imran Rashid > wrote: > >> Hi, >> >> I'd like to propose that we move away from such heavy reliance on >> doctests in python, and move towards more traditional unit tests. The ma

[DISCUSS] move away from python doctests

2018-08-29 Thread Imran Rashid
Hi, I'd like to propose that we move away from such heavy reliance on doctests in python, and move towards more traditional unit tests. The main reason is that it's hard to share test code in doc tests. For example, I was just looking at https://github.com/apache/spark/commit/82c18c240a6913a917df

[VOTE] SPIP: Executor Plugin (SPARK-24918)

2018-08-28 Thread Imran Rashid
There has been discussion on jira & the PR, all generally positive, so I'd like to call a vote for this spip. I'll start with my own +1. On Fri, Aug 3, 2018 at 11:59 AM Imran Rashid wrote: > I'd like to propose adding a plugin api for Executors, primarily for > inst

Re: no logging in pyspark code?

2018-08-27 Thread Imran Rashid
ase of print(_, file=sys.stderr) in a most recent > review. I agree that we should include logging for PySpark workers. > > On Mon, Aug 27, 2018 at 1:29 PM, Imran Rashid < > iras...@cloudera.com.invalid> wrote: > >> Another question on pyspark code -- how come there is no

no logging in pyspark code?

2018-08-27 Thread Imran Rashid
Another question on pyspark code -- how come there is no logging at all? does python logging have an unreasonable overhead, or is it impossible to configure or something? I'm really surprised nobody has ever wanted to be able to turn on some debug or trace logging in pyspark by just configuring a lo
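For context, a minimal sketch of what configurable debug logging looks like with Python's standard `logging` module. The logger name `pyspark.worker.example` and the message are hypothetical; nothing here reflects what PySpark actually ships.

```python
import io
import logging

# Hypothetical logger name, chosen only for this illustration.
logger = logging.getLogger("pyspark.worker.example")


def configure_debug_logging(stream) -> logging.Handler:
    """Attach a handler to the example logger and enable DEBUG output."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    # Keep the example isolated from any root logger configuration.
    logger.propagate = False
    return handler


buffer = io.StringIO()
configure_debug_logging(buffer)
logger.debug("deserialized %d records", 3)
captured = buffer.getvalue()
print(captured, end="")
```

With the level left at its default, the `debug` call is filtered before the message is formatted, which is the usual argument that disabled logging costs little.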

python tests: any reason for a huge tests.py?

2018-08-24 Thread Imran Rashid
Hi, another question from looking more at python recently. Is there any reason we've got a ton of tests in one humongous tests.py file, rather than breaking it out into smaller files? Having one huge file doesn't seem great for code organization, and it also makes the test parallelization in run

Re: best way to run one python test?

2018-08-20 Thread Imran Rashid
ic tests. For instance: > > SPARK_TESTING=1 ../bin/pyspark pyspark.sql.tests VectorizedUDFTests > > I have a partial fix for our testing script to support this way in my > local but couldn't have enough time to make a PR for it yet. > > > On Mon, Aug 20, 2018 at 11:08 AM, Imran Rashid
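As a generic illustration of selecting a single test class (not the PySpark test script itself), here is a sketch using only the standard `unittest` module. Both test classes are hypothetical stand-ins.

```python
import io
import unittest


class VectorizedExampleTests(unittest.TestCase):
    """Hypothetical stand-in for the one test class we want to run."""

    def test_addition(self):
        self.assertEqual(1 + 1, 2)


class OtherTests(unittest.TestCase):
    """A second class that should NOT run when we select only one."""

    def test_unrelated(self):
        self.assertTrue(True)


def run_single_class(test_class) -> unittest.TestResult:
    """Load and run the tests of exactly one TestCase class."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_class)
    # Send runner output to an in-memory buffer to keep the example quiet.
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    return runner.run(suite)


result = run_single_class(VectorizedExampleTests)
print(result.testsRun, result.wasSuccessful())
```

Running one class at a time like this is what makes a tight edit-and-test loop practical when the surrounding file holds many unrelated tests.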

best way to run one python test?

2018-08-19 Thread Imran Rashid
Hi, I haven't spent a lot of time working on the python side of spark before so I apologize if this is a basic question, but I'm trying to figure out the best way to run a small subset of python tests in a tight loop while developing. The closer I can get to sbt's "~test-only *FooSuite -- -z test-b

Re: [DISCUSS] Handling correctness/data loss jiras

2018-08-14 Thread Imran Rashid
+1 on what we should do. On Mon, Aug 13, 2018 at 3:06 PM, Tom Graves wrote: > > > I mean, what are concrete steps beyond saying this is a problem? That's > the important thing to discuss. > > Sorry I'm a bit confused by your statement but also think I agree. I > started this thread for this rea

Re: [DISCUSS] Handling correctness/data loss jiras

2018-08-13 Thread Imran Rashid
I don't think we've been great about backporting correctness issues. This is one example which comes to mind (not to point fingers, just the one I know of immediately): https://issues.apache.org/jira/browse/SPARK-23207 we also let another related issue slide for quite a while: https://issues.ap

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-08 Thread Imran Rashid
On Tue, Aug 7, 2018 at 8:39 AM, Wenchen Fan wrote: > > SPARK-23243 : > Shuffle+Repartition > on an RDD could lead to incorrect answers > It turns out to be a very complicated issue, there is no consensus about > what is the right fix yet. Likely

SPIP: Executor Plugin (SPARK-24918)

2018-08-03 Thread Imran Rashid
I'd like to propose adding a plugin api for Executors, primarily for instrumentation and debugging ( https://issues.apache.org/jira/browse/SPARK-24918). The changes are small, but as it's adding a new api, it might be spip-worthy. I mentioned it as well in a recent email I sent about memory monito

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-01 Thread Imran Rashid
I still would like to do more review on barrier mode changes, but from what I've seen so far I agree. I dunno if it'll really be ready for use, but it should not pose much risk for code which doesn't touch the new features. of course, every change has some risk, especially in the scheduler which ha

Re: code freeze and branch cut for Apache Spark 2.4

2018-07-31 Thread Imran Rashid
I'd like to add SPARK-24296, replicating large blocks over 2GB. It's been up for review for a while, and would end the 2GB block limit (well ... subject to a couple of caveats on SPARK-6235). On Mon, Jul 30, 2018 at 9:01 PM, Wenchen Fan wrote: > I went through the open JIRA tickets and here is a

offheap memory usage & netty configuration

2018-07-26 Thread Imran Rashid
I’ve been looking at where untracked memory is getting used in spark, especially offheap memory, and I’ve discovered some things I’d like to share with the community. Most of what I’ve learned has been about the way spark is using netty -- I’ll go into some more detail about that below. I’m also

Re: [VOTE] Spark 2.3.1 (RC1)

2018-05-17 Thread Imran Rashid
I just found https://issues.apache.org/jira/browse/SPARK-24309 which is pretty serious. I've marked it a blocker, I think it should go into 2.3.1. I'll also take a closer look comparing to the behavior of the old listener bus. On Thu, May 17, 2018 at 12:18 PM, Marcelo Vanzin wrote: > Wenchen r

Re: Fair scheduler pool leak

2018-04-09 Thread Imran Rashid
>> worker's scheduler pool, for operations that don't fit into the > >> >> driver. > >> >> > >> >> We decided to use these fair scheduler pools (w/ fair scheduling > >> >> across pools, FIFO per pool) instead of the default FIFO schedule

Re: Fair scheduler pool leak

2018-04-06 Thread Imran Rashid
Hi Matthias, This doesn't look possible now. It may be worth filing an improvement jira for. But I'm trying to understand what you're trying to do a little better. So you intentionally have each thread create a new unique pool when it submits a job? So that pool will just get the default poo

Re: spark-tests.appspot status?

2017-12-15 Thread Imran Rashid
> Most likely the job that uploads this stuff at databricks is broken. >> >> On Thu, Dec 14, 2017 at 12:41 PM, Imran Rashid >> wrote: >> >>> Hi, >>> >>> I was trying to look at some flaky tests and old jiras, and noticed that >>> spark-t

spark-tests.appspot status?

2017-12-14 Thread Imran Rashid
Hi, I was trying to look at some flaky tests and old jiras, and noticed that spark-tests.appspot.com is still live, but hasn't updated with any builds from the last 2 months. I was curious what the status is -- intentionally deprecated? just needs a restart? more dev work required? it's pretty

Re: Timestamp interoperability design doc available for review

2017-09-11 Thread Imran Rashid
I've posted a design doc on SPARK-12297, which builds on what Zoltan posted here earlier. It addresses the parquet issues and also considers current inconsistencies in timestamp behavior for spark across data formats and versions. I believe this incorporates all of the prior concerns and feedback

Re: Welcoming Saisai (Jerry) Shao as a committer

2017-08-31 Thread Imran Rashid
Congrats Jerry! On Mon, Aug 28, 2017 at 8:28 PM, Matei Zaharia wrote: > Hi everyone, > > The PMC recently voted to add Saisai (Jerry) Shao as a committer. Saisai > has been contributing to many areas of the project for a long time, so it’s > great to see him join. Join me in thanking and congrat

Re: SPIP: Spark on Kubernetes

2017-08-21 Thread Imran Rashid
Overall this looks like a good proposal. I do have some concerns which I'd like to discuss -- please understand I'm taking a "devil's advocate" stance here for discussion, not that I'm giving a -1. My primary concern is about testing and maintenance. My concerns might be addressed if the doc inc

Re: [VOTE] Apache Spark 2.2.0 (RC5)

2017-06-21 Thread Imran Rashid
-1 I'm sorry for discovering this so late, but I just filed https://issues.apache.org/jira/browse/SPARK-21165 which I think should be a blocker, it's a regression from 2.1 On Wed, Jun 21, 2017 at 1:43 PM, Nick Pentreath wrote: > As before, release looks good, all Scala, Python tests pass. R test

Re: SQL TIMESTAMP semantics vs. SPARK-18350

2017-05-27 Thread Imran Rashid
I had asked zoltan to bring this discussion to the dev list because I think it's a question that extends beyond a single jira (we can't figure out the semantics of timestamp in parquet if we don't know the overall goal of the timestamp type) and since it's a design question the entire community shou

Re: planning & discussion for larger scheduler changes

2017-03-29 Thread Imran Rashid
Thanks for the responses all. I may have worded my original email poorly -- I don't want to focus too much on SPARK-14649 and SPARK-13669 in particular, but more on how we should be approaching these changes. On Mon, Mar 27, 2017 at 9:01 PM, Kay Ousterhout wrote: > (1) I'm pretty hesitant to me

planning & discussion for larger scheduler changes

2017-03-24 Thread Imran Rashid
Kay and I were discussing some of the bigger scheduler changes getting proposed lately, and realized there is a broader discussion to have with the community, outside of any single jira. I'll start by sharing my initial thoughts, I know Kay has thoughts on this too, but it would be good to input

Re: Why two makeOffers in CoarseGrainedSchedulerBackend? Duplication?

2017-01-26 Thread Imran Rashid
://medium.com/@jaceklaskowski/ > Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark > Follow me at https://twitter.com/jaceklaskowski > > > On Thu, Jan 26, 2017 at 3:43 PM, Imran Rashid > wrote: > > one is used when exactly one task has finished -- that mean

Re: Why two makeOffers in CoarseGrainedSchedulerBackend? Duplication?

2017-01-26 Thread Imran Rashid
one is used when exactly one task has finished -- that means you now have free resources on just that one executor, so you only need to look for something to schedule on that one. the other one is used when you want to schedule everything you can across the entire cluster. For example, you have j
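The distinction described above can be sketched with a toy model. This is not Spark's `CoarseGrainedSchedulerBackend` code: the function, the core-count dictionary, and the task names are all hypothetical, and real scheduling also weighs locality and other constraints.

```python
from typing import Dict, List, Optional, Tuple


def make_offers(
    free_cores: Dict[str, int],
    pending_tasks: List[str],
    executor_id: Optional[str] = None,
) -> List[Tuple[str, str]]:
    """Assign pending tasks to free cores.

    With executor_id set, only that executor's resources are offered
    (the "one task just finished there" case). With executor_id left as
    None, every executor is offered (the cluster-wide case).
    """
    candidates = [executor_id] if executor_id is not None else sorted(free_cores)
    assignments = []
    for exec_id in candidates:
        while free_cores.get(exec_id, 0) > 0 and pending_tasks:
            task = pending_tasks.pop(0)
            free_cores[exec_id] -= 1
            assignments.append((task, exec_id))
    return assignments


free = {"exec-1": 1, "exec-2": 2}
pending = ["t0", "t1", "t2", "t3"]

# Cluster-wide offer: fill every free core we know about.
cluster_wide = make_offers(free, pending)

# A task finishes on exec-2, freeing one core there; only exec-2 needs a look.
free["exec-2"] += 1
single = make_offers(free, pending, executor_id="exec-2")
print(cluster_wide, single)
```

The single-executor path avoids rescanning executors whose resources have not changed, which is the point of having two entry points.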

Re: Spark Improvement Proposals

2017-01-03 Thread Imran Rashid
I'm also in favor of this. Thanks for your persistence Cody. My take on the specific issues Joseph mentioned: 1) voting vs. consensus -- I agree with the argument Ryan Blue made earlier for consensus: > Majority vs consensus: My rationale is that I don't think we want to consider a proposal app

Re: Why ShuffleMapTask has transient locs and preferredLocs?!

2017-01-03 Thread Imran Rashid
Hi Jacek, I'm not entirely sure I understand your question, but the reason preferredLocs can be transient is b/c that is used to define where the scheduler (on the driver) should prefer to assign the task. But no matter the value, the task could still get assigned anywhere. By the time that task

Re: DAGScheduler.handleJobCancellation uses jobIdToStageIds for verification while jobIdToActiveJob for lookup?

2016-10-13 Thread Imran Rashid
Hi Jacek, doesn't look like there is any good reason -- Mark Hamstra might know this best. Feel free to open a jira & pr for it, you can ping Mark, Kay Ousterhout, and me (@squito) for review. Imran On Thu, Oct 13, 2016 at 7:56 AM, Jacek Laskowski wrote: > Hi, > > Is there a reason why DAGSch

RFC / PRD: new executor & node blacklist mechanism (SPARK-8425)

2016-10-12 Thread Imran Rashid
Some new features are about to land in spark to improve Spark's ability to handle bad executors and nodes. These are some significant changes, and we'd like to gather more input from the community about it, especially folks that use *large clusters*. We've spent a lot of time discussing the right

apologies for flaky BlacklistIntegrationSuite

2016-06-06 Thread Imran Rashid
Hi all, just a heads up, I introduced a flaky test, BlacklistIntegrationSuite, a week ago or so. I *thought* I had solved the problems, but turns out there was more flakiness remaining. for now I've just turned the tests off, so if this has led to failures for you, just re-trigger your build

Re: Spark 2.0.0-preview artifacts still not available in Maven

2016-06-06 Thread Imran Rashid
I've been a bit on the fence on this, but I agree that Luciano gives a compelling reason for why we really should publish things to maven central. Sure we slightly increase the risk somebody refers to the preview release too late, but really that is their own fault. And I also agree with commen

Re: [VOTE] Removing module maintainer process

2016-05-23 Thread Imran Rashid
+1 (binding) On Mon, May 23, 2016 at 8:13 AM, Tom Graves wrote: > +1 (binding) > > Tom > > > On Sunday, May 22, 2016 7:34 PM, Matei Zaharia > wrote: > > > It looks like the discussion thread on this has only had positive replies, > so I'm going to call a VOTE. The proposal is to remove the main

Re: [DISCUSS] Removing or changing maintainer process

2016-05-19 Thread Imran Rashid
+1 (binding) on removal of maintainers I don't have a strong opinion yet on how to have a system for finding the right reviewers. I agree it would be nice to have something to help you find reviewers, though I'm a little skeptical of anything automatic. On Thu, May 19, 2016 at 10:34 AM, Matei Za

Re: java.lang.OutOfMemoryError: Unable to acquire bytes of memory

2016-04-14 Thread Imran Rashid
Hi Nezih, I just reported a somewhat similar issue, and I have a potential fix -- SPARK-14560, looks like you are already watching it :). You can try out that patch, you have to explicitly enable the change in behavior with "spark.shuffle.spillAfterRead=true". Honestly, I don't think these issue

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Imran Rashid
On Thu, Mar 17, 2016 at 2:55 PM, Cody Koeninger wrote: > Why would a PMC vote be necessary on every code deletion? > Certainly PMC votes are not necessary on *every* code deletion. I don't think there is a very clear rule on when such discussion is warranted, just a soft expectation that commit

Re: SPARK-13843 and future of streaming backends

2016-03-18 Thread Imran Rashid
On Fri, Mar 18, 2016 at 3:15 PM, Shane Curcuru wrote: > Question: why was the code removed from the Spark repo? What's the harm > in keeping it available here? Assuming the Spark PMC has no plan on releasing the code, why would we keep it in our codebase? It only makes the codebase harder to

Re: New Spark json endpoints

2015-09-17 Thread Imran Rashid
Hi Kevin, I think it would be great if you added this. It never got added in the first place b/c the original PR was already pretty bloated, and I just never got back to this. I agree with Reynold -- you shouldn't need to increase the version for just adding new endpoints (or even adding new field

Re: Spark runs into an Infinite loop even if the tasks are completed successfully

2015-08-13 Thread Imran Rashid
} > } > > partitionsArray > > > > > Thanks > Best Regards > > On Wed, Aug 12, 2015 at 10:57 PM, Imran Rashid > wrote: > >> yikes. >> >> Was this a one-time thing? Or does it happen consistently? can you turn >> on debug logging

Re: Spark runs into an Infinite loop even if the tasks are completed successfully

2015-08-12 Thread Imran Rashid
yikes. Was this a one-time thing? Or does it happen consistently? can you turn on debug logging for o.a.s.scheduler (dunno if it will help, but maybe ...) On Tue, Aug 11, 2015 at 8:59 AM, Akhil Das wrote: > Hi > > My Spark job (running in local[*] with spark 1.4.1) reads data from a > thrift

Re: Broadcast variable of size 1 GB fails with negative memory exception

2015-07-29 Thread Imran Rashid
Store&ensureFreeSpace(-2147483592) called with > curMem=6888, maxMem=92610625536& > 19177&INFO&MemoryStore&Block broadcast_2 stored as values in memory > (estimated size -2147483592.0 B, free 88.3 GB)& > Exception in thread "main" java.lang.IllegalAr

Re: Broadcast variable of size 1 GB fails with negative memory exception

2015-07-28 Thread Imran Rashid
Hi Mike, are you sure the size isn't off by 2x somehow? I just tried to reproduce with a simple test in BlockManagerSuite: test("large block") { store = makeBlockManager(4e9.toLong) val arr = new Array[Double](1 << 28) println(arr.size) val blockId = BlockId("rdd_3_10") val result =
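A worked arithmetic sketch of one way the negative size reported in this thread (-2147483592) could arise. It assumes, without confirmation from the thread, that a byte count was squeezed into a 32-bit signed integer; the 56-byte overhead is a hypothetical value picked only because it reproduces the logged number for an array of 1 << 28 doubles.

```python
def to_int32(value: int) -> int:
    """Wrap an arbitrary integer into the signed 32-bit range."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


# An array of 1 << 28 doubles at 8 bytes each is exactly 2**31 bytes (2 GiB),
# one more than the largest signed 32-bit value.
array_bytes = (1 << 28) * 8

# Hypothetical per-object overhead, chosen to match the log line.
overhead = 56

wrapped = to_int32(array_bytes + overhead)
print(array_bytes, wrapped)
```

Under that assumption the data would be about 2 GiB rather than 1 GB, which is consistent with the question of whether the size is off by 2x.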

Re: non-deprecation compiler warnings are upgraded to build errors now

2015-07-27 Thread Imran Rashid
Does scoverage work with the spark build in 2.11? That sounds like a big win On Sun, Jul 26, 2015 at 1:29 PM, Josh Rosen wrote: > Given that 2.11 may be more stringent with respect to warnings, we might > consider building with 2.11 instead of 2.10 in the pull request builder. > This would also

Re: enum-like types in Spark

2015-07-02 Thread Imran Rashid
. We can list the values in the JavaDoc and >> hope Scala will be able to correctly generate docs for Java enums in >> the future. -Xiangrui >> >> On Thu, Apr 9, 2015 at 10:59 AM, Imran Rashid >> wrote: >> > any update here? This is relevant for a currently ope

Re: OK to add committers active on JIRA to JIRA admin role?

2015-06-24 Thread Imran Rashid
+1 (partially b/c I would like jira admin myself) On Tue, Jun 23, 2015 at 3:47 AM, Sean Owen wrote: > There are some committers who are active on JIRA and sometimes need to > do things that require JIRA admin access -- in particular thinking of > adding a new person as "Contributor" in order to

Re: Stages with non-arithmetic numbering & Timing metrics in event logs

2015-06-11 Thread Imran Rashid
rrency, since I would like to determine why > our jobs have under-utilization and poor weak scaling efficiency. > > I will cc this thread over to the dev list. I did not cc them in case > my previous question was trivial---I didn't want to spam the list > unnecessarily, since I d

Re: Stages with non-arithmetic numbering & Timing metrics in event logs

2015-06-08 Thread Imran Rashid
Hi Mike, all good questions, let me take a stab at answering them: 1. Event Logs + Stages: It's normal for stages to get skipped if they are shuffle map stages, which get read multiple times. Eg., here's a little example program I wrote earlier to demonstrate this: "d3" doesn't need to be re-shu

flaky tests & scaled timeouts

2015-05-28 Thread Imran Rashid
Hi, I was just fixing a problem with too short a timeout on one of the unit tests I added (https://issues.apache.org/jira/browse/SPARK-7919), and I was wondering if this is a common problem w/ a lot of our flaky tests. It's really hard to know what to set the timeouts to -- you set the timeout so
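One way to read "scaled timeouts" is a single multiplier applied to every test timeout, so slower build machines can stretch all of them at once. The sketch below assumes that reading; the environment variable name `TEST_TIMEOUT_SCALE` and the helper are hypothetical and are not taken from the message or from Spark's test code.

```python
import os
from typing import Mapping, Optional


def scaled_timeout(base_seconds: float, env: Optional[Mapping[str, str]] = None) -> float:
    """Return base_seconds multiplied by a scale factor read from the environment.

    TEST_TIMEOUT_SCALE is a hypothetical variable name used for illustration.
    """
    env = os.environ if env is None else env
    factor = float(env.get("TEST_TIMEOUT_SCALE", "1.0"))
    if factor <= 0:
        raise ValueError("timeout scale factor must be positive")
    return base_seconds * factor


# A developer laptop keeps the tight default; a loaded CI worker triples it.
local = scaled_timeout(10, env={})
ci = scaled_timeout(10, env={"TEST_TIMEOUT_SCALE": "3"})
print(local, ci)
```

Tests then state the timeout that makes sense on a fast machine, and the environment decides how much slack to add.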

Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-20 Thread Imran Rashid
-1 discovered I accidentally removed master & worker json endpoints, will restore https://issues.apache.org/jira/browse/SPARK-7760 On Tue, May 19, 2015 at 11:10 AM, Patrick Wendell wrote: > Please vote on releasing the following candidate as Apache Spark version > 1.4.0! > > The tag to be voted

Re: Spark 1.3.1 / Hadoop 2.6 package has broken S3 access

2015-05-18 Thread Imran Rashid
On Fri, May 8, 2015 at 4:16 AM, Steve Loughran wrote: > Would there be a place in the code tree for some tests to run against > things like this? They're cloud integration tests rather than unit tests > and nobody would want them to be on by default, but it could be good for > regression testing

Re: Thanking Test Partners

2015-05-05 Thread Imran Rashid
+1 testing is super important, it'll be good to give recognition for it. On Mon, May 4, 2015 at 5:46 PM, Patrick Wendell wrote: > Hey All, > > Community testing during the QA window is an important part of the > release cycle in Spark. It helps us deliver higher quality releases by > vetting ou

Re: saveAsTextFile and tmp files generations in tasks

2015-04-15 Thread Imran Rashid
The temp file creation is controlled by a Hadoop OutputCommitter, which is FileOutputCommitter by default. It's used in SparkHadoopWriter (which in turn is used by PairRDDFunctions.saveAsHadoopDataset). You could change the output committer to not use tmp files (e.g. use this from Aaron Da

Re: [jira] [Commented] (SPARK-6889) Streamline contribution process with update to Contribution wiki, JIRA rules

2015-04-14 Thread Imran Rashid
These are great questions -- I dunno the answer to most of them, but I'll try to at least give my take on "What should be rejected and why?" For new features, I'm often really confused by our guidelines on what to include and what to exclude. Maybe we should ask that all new features make it clea

Re: Catching executor exception from executor in driver

2015-04-14 Thread Imran Rashid
(+dev) Hi Justin, short answer: no, there is no way to do that. I'm just guessing here, but I imagine this was done to eliminate serialization problems (e.g., what if we got an error trying to serialize the user exception to send from the executors back to the driver?). Though, actually that isn'

Re: Using memory mapped file for shuffle

2015-04-14 Thread Imran Rashid
That limit doesn't have anything to do with the amount of available memory. It's just a tuning parameter: one version is more efficient for smaller files, the other is better for bigger files. I suppose the comment is a little better in FileSegmentManagedBuffer: https://github.com/apache/spark
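
The trade-off described in this message can be pictured as a plain size check: small files are copied with an ordinary read, larger ones are memory-mapped. What follows is a minimal Python sketch, not Spark's implementation; the 2 MB cutoff and the names MEMORY_MAP_THRESHOLD and load_segment are assumptions chosen for illustration.

```python
import mmap
import os

MEMORY_MAP_THRESHOLD = 2 * 1024 * 1024  # hypothetical cutoff in bytes


def load_segment(path, threshold=MEMORY_MAP_THRESHOLD):
    """Return (strategy, data) for the file at path.

    Files smaller than threshold are copied with a plain read; larger
    files are memory-mapped. The choice depends only on the file size,
    not on how much memory is available.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        # An empty file cannot be memory-mapped, so it always takes
        # the plain-read path.
        if size == 0 or size < threshold:
            return "read", f.read()
        mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            # Copy the bytes out so the mapping can be closed here; a
            # real reader would keep the mapping open instead of copying.
            return "mmap", mapped[:]
        finally:
            mapped.close()
```

The threshold is a tuning knob in this sketch: moving it changes which strategy a given file gets, but never whether the file can be loaded.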

Re: enum-like types in Spark

2015-04-09 Thread Imran Rashid
. On Mon, Mar 23, 2015 at 4:50 PM, Imran Rashid wrote: > well, perhaps I overstated things a little, I wouldn't call it the > "official" solution, just a recommendation in the never-ending debate (and > the recommendation from folks with their hands on scala itself). &

Re: 1.3 Build Error with Scala-2.11

2015-04-07 Thread Imran Rashid
did you run dev/change-version-to-2.11.sh before compiling? When I ran this on current master, it mostly worked: dev/change-version-to-2.11.sh mvn -Pyarn -Phadoop-2.4 -Pscala-2.11 -DskipTests clean package There was a failure in building catalyst, but core built just fine for me. The error I g

Re: Spark config option 'expression language' feedback request

2015-04-02 Thread Imran Rashid
IMO, Spark's config is kind of a mess right now. I completely agree with Reynold that Spark's handling of config ought to be super-simple; it's not the kind of thing we want to put much effort into in Spark itself. It sounds so trivial that everyone wants to redo it, but then all these additional featu

Re: hadoop input/output format advanced control

2015-03-25 Thread Imran Rashid
on, together > with something like CombineFileInputFormat? > > On Tue, Mar 24, 2015 at 5:28 PM, Imran Rashid > wrote: > > > I think this would be a great addition, I totally agree that you need to > be > > able to set these at a finer context than just the SparkConte

Re: hadoop input/output format advanced control

2015-03-24 Thread Imran Rashid
I think this would be a great addition, I totally agree that you need to be able to set these at a finer context than just the SparkContext. Just to play devil's advocate, though -- the alternative is for you just subclass HadoopRDD yourself, or make a totally new RDD, and then you could expose wh

Re: enum-like types in Spark

2015-03-23 Thread Imran Rashid
nd of > > huge. > > > > I confess I'm swayed a bit back to Java enums, seeing what it > > involves. The hashCode() issue can be 'solved' with the hash of the > > String representation. > > > > On Mon, Mar 23, 2015 at 8:33 PM, Imran Rashid

Re: enum-like types in Spark

2015-03-23 Thread Imran Rashid
>> unpredictable results at times [1]. > >>> One of the reasons why we prevent enum's from being key : though it is > >>> highly possible users might depend on it internally and shoot > >>> themselves in the foot. > >>> > >>>
