Re: hey spark developers! intro from shane knapp, devops engineer @ AMPLab

2014-09-02 Thread Christopher Nguyen
Welcome, Shane. As a former prof and eng dir at Google, I've been expecting this to be a first-class engineering college subject. I just didn't expect it to come through this route :-) So congrats, and I hope you represent the beginning of a great new trend at universities.

Re: CoHadoop Papers

2014-08-26 Thread Christopher Nguyen
Gary, do you mean Spark and HDFS separately, or Spark's use of HDFS? If the former, Spark does support copartitioning. If the latter, it's an HDFS scope that's outside of Spark. On that note, Hadoop does also make attempts to collocate data, e.g., rack awareness. I'm sure the paper makes useful c
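
To illustrate what copartitioning buys you, here is a minimal sketch in plain Python. It is not Spark code: `hash_partition` and `copartitioned_join` are hypothetical helpers made up for this example. The assumption is that both datasets are split with the same hash function and the same number of partitions, so matching keys always land in the same partition index and the join never has to look across partitions.

```python
def hash_partition(pairs, num_partitions):
    """Place each (key, value) pair in the partition chosen by hash(key)."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions


def copartitioned_join(left_parts, right_parts):
    """Join two datasets partition by partition, never across partitions."""
    if len(left_parts) != len(right_parts):
        raise ValueError("both sides must use the same number of partitions")
    joined = []
    for left, right in zip(left_parts, right_parts):
        # Build a lookup for this partition only.
        index = {}
        for key, value in right:
            index.setdefault(key, []).append(value)
        for key, left_value in left:
            for right_value in index.get(key, []):
                joined.append((key, (left_value, right_value)))
    return joined


# Illustrative data (hypothetical).
users = [(1, "ann"), (2, "bob"), (3, "cy")]
orders = [(1, "book"), (3, "pen"), (3, "ink")]

result = sorted(
    copartitioned_join(hash_partition(users, 4), hash_partition(orders, 4))
)
print(result)
```

In Spark itself the comparable step is partitioning both pair RDDs with the same partitioner (for example via `partitionBy`) before joining them; the sketch above only mimics that idea locally.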

Re: Welcoming two new committers

2014-08-08 Thread Christopher Nguyen
+1 Joey & Andrew :) -- Christopher T. Nguyen Co-founder & CEO, Adatao [ah-'DAY-tao] linkedin.com/in/ctnguyen On Thu, Aug 7, 2014 at 10:39 PM, Joseph Gonzalez wrote: > Hi Everyone, > > Thank you for inviting me to be a committer. I look forward to working > with everyone t

Re: "Dynamic variables" in Spark

2014-07-21 Thread Christopher Nguyen
Hi Neil, first off, I'm generally a sympathetic advocate for making changes to Spark internals to make it easier/better/faster/more awesome. In this case, I'm (a) not clear about what you're trying to accomplish, and (b) a bit worried about the proposed solution. On (a): it is stated that you wan

Re: Opiq for SParkSQL?

2014-06-07 Thread Christopher Nguyen
Yan, it looks like Julian did anticipate exactly this possibility: https://github.com/julianhyde/optiq/tree/master/spark Optiq is a cool project vision in terms of hiding various engines behind one consistent API. That said, from just the Spark perspective, I don't see a huge value add to layer

Re: Announcing Spark 1.0.0

2014-05-30 Thread Christopher Nguyen
Awesome work, Pat et al.! -- Christopher T. Nguyen Co-founder & CEO, Adatao linkedin.com/in/ctnguyen On Fri, May 30, 2014 at 3:12 AM, Patrick Wendell wrote: > I'm thrilled to announce the availability of Spark 1.0.0! Spark 1.0.0 > is a milestone release as the first in the

Re: LogisticRegression: Predicting continuous outcomes

2014-05-28 Thread Christopher Nguyen
Bharath, (apologies if you're already familiar with the theory): the proposed approach may or may not be appropriate depending on the overall transfer function in your data. In general, a single logistic regressor cannot approximate arbitrary non-linear functions (of linear combinations of the inpu
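
One way to see the limitation, as a rough sketch: the output of a single logistic unit is always monotone in the linear combination of its inputs, so it cannot follow a target that rises and then falls. The message above is truncated, so this is only an illustration of that one consequence; the weights, the grid, and the "bump" target below are assumptions chosen for the example.

```python
import math


def logistic(z, w, b):
    """Single logistic unit applied to a scalar linear combination z."""
    return 1.0 / (1.0 + math.exp(-(w * z + b)))


def is_monotone(values):
    """True if the sequence never changes direction."""
    pairs = list(zip(values, values[1:]))
    non_decreasing = all(a <= b for a, b in pairs)
    non_increasing = all(a >= b for a, b in pairs)
    return non_decreasing or non_increasing


# Grid of values for the linear combination (hypothetical range).
zs = [i / 10.0 for i in range(-50, 51)]

# A non-monotone target: rises up to z = 0, then falls.
bump = [math.exp(-z * z) for z in zs]

# Logistic curves for several weight/bias choices.
curves = {
    (w, b): [logistic(z, w, b) for z in zs]
    for w in (-3.0, -0.5, 0.5, 3.0)
    for b in (-1.0, 0.0, 1.0)
}

all_monotone = all(is_monotone(curve) for curve in curves.values())
bump_monotone = is_monotone(bump)
print(all_monotone, bump_monotone)
```

Every logistic curve on the grid stays monotone regardless of the weight and bias tried, while the bump target does not, which is why a single regressor cannot fit it.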

Re: can RDD be shared across mutil spark applications?

2014-05-17 Thread Christopher Nguyen
Qing Yang, Andy is correct in answering your direct question. At the same time, depending on your context, you may be able to apply a pattern where you turn the single Spark application into a service, and multiple clients of that service can indeed share access to the same RDDs. Several groups h
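
The service pattern can be sketched without Spark at all. In the plain-Python example below, `SharedDatasetService` is a hypothetical stand-in for the single long-running application: it owns the cached data, and several client threads query it through one entry point. The names, the loader, and the use of threads are all assumptions for illustration; the message is truncated before it says how those groups actually built theirs.

```python
import threading


class SharedDatasetService:
    """Hypothetical stand-in for one long-running app that owns cached data."""

    def __init__(self):
        self._datasets = {}
        self._lock = threading.Lock()
        self.load_count = 0

    def get_or_load(self, name, loader):
        # Load each named dataset at most once, then reuse it for every client.
        with self._lock:
            if name not in self._datasets:
                self._datasets[name] = loader()
                self.load_count += 1
            return self._datasets[name]

    def query(self, name, loader, fn):
        """Run a client-supplied function against the shared dataset."""
        return fn(self.get_or_load(name, loader))


def load_numbers():
    # Stand-in for an expensive load that should happen only once.
    return list(range(1, 101))


service = SharedDatasetService()
results = {}


def client(client_id, fn):
    results[client_id] = service.query("numbers", load_numbers, fn)


threads = [
    threading.Thread(target=client, args=("sum", sum)),
    threading.Thread(target=client, args=("max", max)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results, service.load_count)
```

Both clients see the same in-memory dataset, and the loader runs once, which is the property the pattern is after.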

Re: Announcing the official Spark Job Server repo

2014-03-19 Thread Christopher Nguyen
+1, Evan et al. -- Christopher T. Nguyen Co-founder & CEO, Adatao linkedin.com/in/ctnguyen On Tue, Mar 18, 2014 at 1:51 PM, Evan Chan wrote: > Dear Spark developers, > > Ooyala is happy to announce that we have pushed our official, Spark > 0.9.0 / Scala 2.10-compatible, jo