Re: PSA: JIRA resolutions and meanings

2016-10-09 Thread Sean Owen
I added a variant on this text to https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ContributingtoJIRAMaintenance On Sat, Oct 8, 2016 at 10:09 AM Sean Owen wrote: > That flood of emails means several people (Xiao, Holden mostly

Re: Spark Improvement Proposals

2016-10-09 Thread Cody Koeninger
Yeah, I've looked at KIPs and Scala SIPs. I'm reluctant to use the Kafka structured streaming as an example because of the pre-existing conflict around it. If Michael or another committer wanted to put it forth as an example, I'd participate in good faith though. On Sun, Oct 9, 2016 at 5:07 PM,

Re: Spark Improvement Proposals

2016-10-09 Thread Matei Zaharia
Well, I think there are a few things here that don't make sense. First, why should only committers submit SIPs? Development in the project should be open to all contributors, whether they're committers or not. Second, I think unrealistic goals can be found just by inspecting the goals, and I'm

Re: Spark Improvement Proposals

2016-10-09 Thread Cody Koeninger
Only committers should formally submit SIPs because in an Apache project only committers have explicit political power. If a user can't find a committer willing to sponsor an SIP idea, they have no way to get the idea passed in any case. If I can't find a committer to sponsor this meta-SIP idea,

Re: Suggestion in README.md for guiding pull requests/JIRAs (probably about linking CONTRIBUTING.md or wiki)

2016-10-09 Thread Felix Cheung
Should we just link to https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark On Sun, Oct 9, 2016 at 10:09 AM -0700, "Hyukjin Kwon" > wrote: Thanks for confirming this, Sean. I filed this in

Re: Suggestion in README.md for guiding pull requests/JIRAs (probably about linking CONTRIBUTING.md or wiki)

2016-10-09 Thread Hyukjin Kwon
Thanks for confirming this, Sean. I filed this in https://issues.apache.org/jira/browse/SPARK-17840 I would appreciate it if anyone with better writing skills than mine would try to fix this. I don't want reviewers to have to spend effort correcting the grammar. On 10 Oct 2016 1:34 a.m.,

Re: Spark Improvement Proposals

2016-10-09 Thread Cody Koeninger
Here's my specific proposal (meta-proposal?) Spark Improvement Proposals (SIP) Background: The current problem is that design and implementation of large features are often done in private, before soliciting user feedback. When feedback is solicited, it is often as to detailed design

Suggestion in README.md for guiding pull requests/JIRAs (probably about linking CONTRIBUTING.md or wiki)

2016-10-09 Thread Hyukjin Kwon
Hi all, I just noticed the README.md (https://github.com/apache/spark) does not describe the steps or links to follow for creating a PR or JIRA directly. I know it is probably sensible to search Google for the contribution guides before trying to make a PR/JIRA, but I think it seems not

Re: PSA: JIRA resolutions and meanings

2016-10-09 Thread Cody Koeninger
That's awesome Sean, very clear. One minor thing: non-committers can't change the assigned field, as far as I know. On Oct 9, 2016 3:40 AM, "Sean Owen" wrote: I added a variant on this text to https://cwiki.apache.org/

Re: Spark Improvement Proposals

2016-10-09 Thread Ofir Manor
This is a great discussion! Maybe you could have a look at Kafka's process - it also uses Rejected Alternatives and I personally find it very clear actually (the link also leads to all KIPs): https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals Cody - maybe you could take

Re: Spark Improvement Proposals

2016-10-09 Thread Nicholas Chammas
On Sun, Oct 9, 2016 at 5:19 PM Cody Koeninger wrote: > Regarding name, if the SIP overlap is a concern, we can pick a different > name. > > My tongue in cheek suggestion would be > > Spark Lightweight Improvement process (SPARKLI) > If others share my minor concern about the

Re: Suggestion in README.md for guiding pull requests/JIRAs (probably about linking CONTRIBUTING.md or wiki)

2016-10-09 Thread Reynold Xin
GitHub already links to CONTRIBUTING.md, though of course a lot of people ignore that. One thing we can do is to add an explicit link to the wiki contributing page in the template (but note that even that introduces some overhead for every pull request). Aside from that, I am not sure if the other

Re: Suggestion in README.md for guiding pull requests/JIRAs (probably about linking CONTRIBUTING.md or wiki)

2016-10-09 Thread Reynold Xin
Actually let's move the discussion to the JIRA ticket, given there is a ticket. On Sun, Oct 9, 2016 at 5:36 PM, Reynold Xin wrote: > Github already links to CONTRIBUTING.md. -- of course, a lot of people > ignore that. One thing we can do is to add an explicit link to the

SPARK-17845 - window function frame boundary API

2016-10-09 Thread Reynold Xin
Hi all, I tried to use the window function DataFrame API this weekend and found it awkward to use, especially with respect to specifying frame boundaries. I wrote down some options here and am curious your thoughts. If you have suggestions on the API beyond what's already listed in the JIRA

Re: SPARK-17845 - window function frame boundary API

2016-10-09 Thread ayan guha
Hi Reynold, thanks for asking. I am from the SQL world and use Spark SQL with analytical functions pretty heavily. IMHO, Window.rowsBetween() as a function name looks fine. What I would propose would be: Window.rowsBetween(startFrom=UNBOUNDED,endTo=CURRENT_ROW,preceeding=0,following=0) startFrom,
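
For readers following the frame-boundary discussion, here is a minimal sketch of what a row-based frame means, written in plain Python rather than against Spark. The function `frame_sums` and the sentinels `UNBOUNDED_PRECEDING`, `UNBOUNDED_FOLLOWING`, and `CURRENT_ROW` are hypothetical names chosen for illustration; they are not the Spark API and not any of the options listed in the JIRA. The sketch assumes a single, already-ordered partition and treats both boundaries as inclusive offsets relative to the current row.

```python
import sys

# Hypothetical sentinels for this sketch only; not Spark's constants.
UNBOUNDED_PRECEDING = -sys.maxsize
UNBOUNDED_FOLLOWING = sys.maxsize
CURRENT_ROW = 0


def frame_sums(values, start, end):
    """Sum each row's frame in one ordered partition.

    `start` and `end` are inclusive row offsets relative to the
    current row (negative = preceding, positive = following).
    """
    n = len(values)
    result = []
    for i in range(n):
        lo = max(0, i + start)    # clamp to the first row of the partition
        hi = min(n - 1, i + end)  # clamp to the last row of the partition
        # An empty frame sums to 0 here; a SQL engine would typically return NULL.
        frame = values[lo:hi + 1] if hi >= lo else []
        result.append(sum(frame))
    return result


# Running total: everything from the start of the partition to the current row.
running = frame_sums([1, 2, 3, 4], UNBOUNDED_PRECEDING, CURRENT_ROW)

# Centered frame: one row before through one row after.
centered = frame_sums([1, 2, 3, 4], -1, 1)
```

With two offsets alone, the running total and the centered frame are both expressible, which is the part of the API that the named-boundary proposal above is trying to make more readable.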

Re: This Exception has been really hard to trace

2016-10-09 Thread kant kodali
Hi Reynold, Actually, I did that well before posting my question here. Thanks, kant On Sun, Oct 9, 2016 8:48 PM, Reynold Xin r...@databricks.com wrote: You should probably check with DataStax who build the Cassandra connector for Spark. On Sun, Oct 9, 2016 at 8:13 PM, kant kodali

This Exception has been really hard to trace

2016-10-09 Thread kant kodali
I tried SpanBy, but it looks like there is a strange error that happens no matter which way I try, like the one described here for the Java solution: http://qaoverflow.com/question/how-to-use-spanby-in-java/ java.lang.ClassCastException: cannot assign instance of

Re: This Exception has been really hard to trace

2016-10-09 Thread Reynold Xin
You should probably check with DataStax who build the Cassandra connector for Spark. On Sun, Oct 9, 2016 at 8:13 PM, kant kodali wrote: > > I tried SpanBy but look like there is a strange error that happening no > matter which way I try. Like the one here described for Java

Re: Spark Improvement Proposals

2016-10-09 Thread Nicholas Chammas
- Rejected strategies: I personally wouldn’t put this, because what’s the point of voting to reject a strategy before you’ve really begun designing and implementing something? What if you discover that the strategy is actually better when you start doing stuff? I would guess the point

Re: Spark Improvement Proposals

2016-10-09 Thread Cody Koeninger
If there's confusion there, the document is specifically what I'm proposing. The email is just by way of introduction. On Sun, Oct 9, 2016 at 3:47 PM, Nicholas Chammas wrote: > Oh, hmm… I guess I’m a little confused on the relation between Cody’s > email and the

Re: Spark Improvement Proposals

2016-10-09 Thread Matei Zaharia
Yup, this is the stuff that I found unclear. Thanks for clarifying here, but we should also clarify it in the writeup. In particular: - Goals needs to be about user-facing behavior ("people" is broad) - I'd rename Rejected Goals to Non-Goals. Otherwise someone will dig up one of these and say

Re: Spark Improvement Proposals

2016-10-09 Thread Nicholas Chammas
Oh, hmm… I guess I’m a little confused on the relation between Cody’s email and the document he linked to, which says: https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md#when SIPs should be used for significant user-facing or cross-cutting changes, not day-to-day

Re: Spark Improvement Proposals

2016-10-09 Thread Matei Zaharia
Yup, but the example you gave is for alternatives about *user-facing behavior*, not implementation. The current SIP doc describes "strategy" more as implementation strategy. I'm just saying there are different possible goals for these types of docs. BTW, PEPs and Scala SIPs focus primarily on

Re: Spark Improvement Proposals

2016-10-09 Thread Cody Koeninger
Regarding name, if the SIP overlap is a concern, we can pick a different name. My tongue in cheek suggestion would be Spark Lightweight Improvement process (SPARKLI) On Sun, Oct 9, 2016 at 4:14 PM, Cody Koeninger wrote: > So to focus the discussion on the specific strategy

Re: Spark Improvement Proposals

2016-10-09 Thread Cody Koeninger
Users instead of people, sure. Committers and contributors are (or at least should be) a subset of users. Non goals, sure. I don't care what the name is, but we need to clearly say e.g. 'no we are not maintaining compatibility with XYZ right now'. API, what I care most about is whether it allows

Re: Spark Improvement Proposals

2016-10-09 Thread Cody Koeninger
So to focus the discussion on the specific strategy I'm suggesting, documented at https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md "Goals: What must this allow people to do, that they can't currently?" Is it unclear that this is focusing specifically on