Re: YARN Shuffle service and its compatibility

2016-04-19 Thread Mark Grover
Great, thanks for confirming, Reynold. Appreciate it! On Tue, Apr 19, 2016 at 4:20 PM, Reynold Xin wrote: > I talked to Lianhui offline and he said it is not that big of a deal to > revert the patch. > > > On Tue, Apr 19, 2016 at 9:52 AM, Mark Grover wrote: > >> Thanks. >> >> I'm more than happ

Re: YARN Shuffle service and its compatibility

2016-04-19 Thread Reynold Xin
I talked to Lianhui offline and he said it is not that big of a deal to revert the patch. On Tue, Apr 19, 2016 at 9:52 AM, Mark Grover wrote: > Thanks. > > I'm more than happy to wait for more people to chime in here but I do feel > that most of us are leaning towards Option B anyways. So, I cr

Re: Possible deadlock in registering applications in the recovery mode

2016-04-19 Thread Niranda Perera
Hi Reynold, I have created a JIRA for this [1]. I have also created a PR for the same issue [2]. I would be very grateful if you could look into this, because this is a blocker in our Spark deployment, which uses a number of custom Spark extensions. Thanks, best [1] https://issues.apache.org/jira/bro

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Marcelo Vanzin
On Tue, Apr 19, 2016 at 11:21 AM, Ted Yu wrote: > Clarification: in my previous email, I was not talking > about spark-streaming-flume artifact or spark-streaming-kafka artifact. > I was talking about examples for these projects, such > as examples//src/main/python/streaming/flume_wordcount.py >

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Ted Yu
Clarification: in my previous email, I was not talking about spark-streaming-flume artifact or spark-streaming-kafka artifact. I was talking about examples for these projects, such as examples//src/main/python/streaming/flume_wordcount.py On Tue, Apr 19, 2016 at 11:10 AM, Marcelo Vanzin wrote:

Re: Organizing Spark ML example packages

2016-04-19 Thread Bryan Cutler
+1, adding some organization would make it easier for people to find a specific example On Mon, Apr 18, 2016 at 11:52 PM, Yanbo Liang wrote: > This sounds good to me, and it will make ML examples more neatly. > > 2016-04-14 5:28 GMT-07:00 Nick Pentreath : > >> Hey Spark devs >> >> I noticed that

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Marcelo Vanzin
On Tue, Apr 19, 2016 at 11:07 AM, Ted Yu wrote: > The same question can be asked w.r.t. examples for other projects, such as > flume > and kafka. > The main difference being that flume and kafka integration are part of Spark itself. HBase integration is not. > On Tue, Apr 19, 2016 at 11:01 A

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Ted Yu
The same question can be asked w.r.t. examples for other projects, such as flume and kafka. On Tue, Apr 19, 2016 at 11:01 AM, Marcin Tustin wrote: > Let's posit that the spark example is much better than what is available > in HBase. Why is that a reason to keep it within Spark? > > On Tue, Apr

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Marcin Tustin
Let's posit that the spark example is much better than what is available in HBase. Why is that a reason to keep it within Spark? On Tue, Apr 19, 2016 at 1:59 PM, Ted Yu wrote: > bq. HBase's current support, even if there are bugs or things that still > need to be done, is much better than the Sp

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Ted Yu
bq. HBase's current support, even if there are bugs or things that still need to be done, is much better than the Spark example In my opinion, a simple example that works is better than a buggy package. I hope before long the hbase-spark module in HBase can arrive at a state which we can advertis

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Marcelo Vanzin
You're completely missing my point. I'm saying that HBase's current support, even if there are bugs or things that still need to be done, is much better than the Spark example, which is basically a call to "SparkContext.hadoopRDD". Spark's example is not helpful in learning how to build an HBase a

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Ted Yu
There is an open JIRA for fixing the documentation: HBASE-15473. I would say the refguide link you provided should not be considered complete. Note it is marked as Blocker by Sean B. On Tue, Apr 19, 2016 at 10:43 AM, Marcelo Vanzin wrote: > You're entitled to your own opinions. > > While you

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Ted Yu
'bq.' is used in JIRA to quote what other people have said. On Tue, Apr 19, 2016 at 10:42 AM, Reynold Xin wrote: > Ted - what's the "bq" thing? Are you using some 3rd party (e.g. Atlassian) > syntax? They are not being rendered in email. > > > On Tue, Apr 19, 2016 at 10:41 AM, Ted Yu wrote: > >

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Marcelo Vanzin
You're entitled to your own opinions. While you're at it, here's some much better documentation, from the HBase project themselves, than what the Spark example provides: http://hbase.apache.org/book.html#spark On Tue, Apr 19, 2016 at 10:41 AM, Ted Yu wrote: > bq. it's actually in use right now i

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Reynold Xin
Ted - what's the "bq" thing? Are you using some 3rd party (e.g. Atlassian) syntax? They are not being rendered in email. On Tue, Apr 19, 2016 at 10:41 AM, Ted Yu wrote: > bq. it's actually in use right now in spite of not being in any upstream > HBase release > > If it is not in upstream, then

Re: RFC: Remote "HBaseTest" from examples?

2016-04-19 Thread Josh Rosen
+1; I think that it's preferable for code examples, especially third-party integration examples, to live outside of Spark. On Tue, Apr 19, 2016 at 10:29 AM Reynold Xin wrote: > Yea in general I feel examples that bring in a large amount of > dependencies should be outside Spark. > > > On Tue, Ap

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Ted Yu
bq. it's actually in use right now in spite of not being in any upstream HBase release If it is not in upstream, then it is not relevant for discussion on Apache mailing list. On Tue, Apr 19, 2016 at 10:38 AM, Marcelo Vanzin wrote: > Alright, if you prefer, I'll say "it's actually in use right

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Marcelo Vanzin
Alright, if you prefer, I'll say "it's actually in use right now in spite of not being in any upstream HBase release", and it's more useful than a single example file in the Spark repo for those who really want to integrate with HBase. Spark's example is really very trivial (just uses one of HBase

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Ted Yu
bq. create a separate tarball for them Probably another thread can be started for the above. I am fine with it. On Tue, Apr 19, 2016 at 10:34 AM, Marcelo Vanzin wrote: > On Tue, Apr 19, 2016 at 10:28 AM, Reynold Xin wrote: > > Yea in general I feel examples that bring in a large amount of > de

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Marcelo Vanzin
On Tue, Apr 19, 2016 at 10:28 AM, Reynold Xin wrote: > Yea in general I feel examples that bring in a large amount of dependencies > should be outside Spark. Another option to avoid the dependency problem is to not ship examples in the distribution, and maybe create a separate tarball for them; r

Re: RFC: Remote "HBaseTest" from examples?

2016-04-19 Thread Reynold Xin
Yea in general I feel examples that bring in a large amount of dependencies should be outside Spark. On Tue, Apr 19, 2016 at 10:15 AM, Marcelo Vanzin wrote: > Hey all, > > Two reasons why I think we should remove that from the examples: > > - HBase now has Spark integration in its own repo, so

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Ted Yu
bq. I wouldn't call it "incomplete". I would call it incomplete. Please see HBASE-15333 'Enhance the filter to handle short, integer, long, float and double', which is a bug fix. Please exclude the presence of the related module in vendor distros from this discussion. Thanks On Tue, Apr 19, 2016 at 1

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Marcelo Vanzin
On Tue, Apr 19, 2016 at 10:20 AM, Ted Yu wrote: > I want to note that the hbase-spark module in HBase is incomplete. Zhan has > several patches pending review. I wouldn't call it "incomplete". Lots of functionality is there, which doesn't mean new ones, or more efficient implementations of existi

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Ted Yu
Corrected typo in subject. I want to note that the hbase-spark module in HBase is incomplete. Zhan has several patches pending review. The hbase-spark module is currently only in the master branch, which would be released as 2.0. However, the release date for 2.0 is unclear, probably half a year from now.

RFC: Remote "HBaseTest" from examples?

2016-04-19 Thread Marcelo Vanzin
Hey all, Two reasons why I think we should remove that from the examples: - HBase now has Spark integration in its own repo, so that really should be the template for how to use HBase from Spark, making that example less useful, even misleading. - It brings up a lot of extra dependencies that ma

Re: YARN Shuffle service and its compatibility

2016-04-19 Thread Mark Grover
Thanks. I'm more than happy to wait for more people to chime in here, but I do feel that most of us are leaning towards Option B anyway. So, I created a JIRA (SPARK-14731) for reverting SPARK-12130 in Spark 2.0 and will file a PR shortly. Mark On Tue, Apr 19, 2016 at 7:44 AM, Tom Graves wrote: > It

Re: YARN Shuffle service and its compatibility

2016-04-19 Thread Mark Grover
On Tue, Apr 19, 2016 at 2:26 AM, Steve Loughran wrote: > > > On 18 Apr 2016, at 23:05, Marcelo Vanzin wrote: > > > > On Mon, Apr 18, 2016 at 2:02 PM, Reynold Xin > wrote: > >> The bigger problem is that it is much easier to maintain backward > >> compatibility rather than dictating forward comp

Re: YARN Shuffle service and its compatibility

2016-04-19 Thread Tom Graves
It would be nice if we could keep this compatible between 1.6 and 2.0, so I'm more for Option B at this point, since the change made seems minor and we can change the shuffle service to do it internally, as Marcelo mentioned. Then let's try to keep it compatible, but if there is a forcing function let's f

Question about storage memory in unified memory manager

2016-04-19 Thread Patrick Woody
Hey all, I had a question about the MemoryStore for the BlockManager with the unified memory manager vs. the legacy mode. In the unified format, I would expect the max size of the MemoryStore to be * * in the same way that when using the StaticMemoryManager it is * * . Instead it appea

Introduction to Spark workshop, May 9, New York

2016-04-19 Thread Rich Bowen
Hi, folks, I received the following request: --- The guy who was going to teach the Introduction to Spark workshop at Data Summit on May 9th has changed jobs and can no longer do the workshop. Know anybody in the New York area who could fill in? It's scheduled from 9 to 12 at the New Yo

Re: auto closing pull requests that have been inactive > 30 days?

2016-04-19 Thread Hyukjin Kwon
I agree with you, Sean, on almost everything. If enough feedback can be given, it might be okay to apply the rule since, as Reynold said, the two do not look mutually exclusive (although I still think there should be more committers, or committers should be more active, because the main reason for this seems to be that there are PRs not

Re: [spark.ml] Why is private class ColumnPruner?

2016-04-19 Thread Jacek Laskowski
Hi Yanbo, https://issues.apache.org/jira/browse/SPARK-14730 Thanks! Regards, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark http://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Tue, Apr 19, 2016 at 8:55 AM, Yanbo Liang wro

Re: more uniform exception handling?

2016-04-19 Thread Steve Loughran
On 18 Apr 2016, at 20:16, Reynold Xin wrote: Josh's pull request on rpc exception handling got me to think ... In my experience, there have been a few things related exceptions that created a lot of trouble for us in pro

Re: YARN Shuffle service and its compatibility

2016-04-19 Thread Steve Loughran
> On 18 Apr 2016, at 23:05, Marcelo Vanzin wrote: > > On Mon, Apr 18, 2016 at 2:02 PM, Reynold Xin wrote: >> The bigger problem is that it is much easier to maintain backward >> compatibility rather than dictating forward compatibility. For example, as >> Marcin said, if we come up with a sligh

Re: auto closing pull requests that have been inactive > 30 days?

2016-04-19 Thread Sean Owen
I support this. We used to do this, right? Anecdotally, from watching the stream most days, most stale PRs are, in descending order of frequency:
1. Probably not a good change, and not looked at (as a result)
2. Abandoned by submitter at some stage
3. Not an important change, not so bad, not reall

Re: more uniform exception handling?

2016-04-19 Thread Sean Owen
We already have SparkException, indeed. The ID is an interesting idea; simple to implement and might help disambiguate. Does it solve a lot of problems of this form? If something is squelching Exception or SparkException, the result will be the same. #2 is something we can sniff out with static ana