Re: Why does SortShuffleWriter write to disk always?

2015-05-02 Thread Pramod Biligiri
Thanks for the info. I agree, it makes sense the way it is designed. Pramod On Sat, May 2, 2015 at 10:37 PM, Mridul Muralidharan wrote: > I agree, this is better handled by the filesystem cache - not to > mention, being able to do zero copy writes. > > Regards, > Mridul > > On Sat, May 2, 2015

Re: createDataFrame allows column names as second param in Python not in Scala

2015-05-02 Thread Reynold Xin
Part of the reason is that it is really easy to just call toDF on Scala, and we already have a lot of createDataFrame functions. (You might find some of the cross-language differences confusing, but I'd argue most real users just stick to one language, and developers or trainers are the only ones
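For readers comparing the two APIs, here is a minimal sketch of the Scala idiom Reynold refers to, assuming Spark 1.3+ with the DataFrame API; the object name, app name and master URL are illustrative:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object ToDFSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("toDF-sketch").setMaster("local[2]"))
        val sqlContext = new SQLContext(sc)
        import sqlContext.implicits._

        val l = Seq(("Alice", 1))

        // Default column names, mirroring Python's Row(_1=u'Alice', _2=1):
        l.toDF().show()

        // Explicit names -- the Scala counterpart of
        // sqlContext.createDataFrame(l, ['name', 'age']) in Python:
        l.toDF("name", "age").show()

        sc.stop()
      }
    }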

Re: Why does SortShuffleWriter write to disk always?

2015-05-02 Thread Mridul Muralidharan
I agree, this is better handled by the filesystem cache - not to mention, being able to do zero copy writes. Regards, Mridul On Sat, May 2, 2015 at 10:26 PM, Reynold Xin wrote: > I've personally prototyped completely in-memory shuffle for Spark 3 times. > However, it is unclear how big of a gain

Submit & Kill Spark Application program programmatically from another application

2015-05-02 Thread Yijie Shen
Hi, I’ve posted this problem in user@spark but got no reply, so I moved it to dev@spark; sorry for the duplication. I am wondering if it is possible to submit, monitor & kill Spark applications from another service. I have written a service like this: parse user commands, translate them into understan
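One possible building block for this, not taken from the thread: the SparkLauncher API added in Spark 1.3 launches spark-submit as a child process that can be waited on or destroyed. A rough sketch, with every path, the master URL and the main class as placeholders:

    import org.apache.spark.launcher.SparkLauncher

    object SubmitAndKillSketch {
      def main(args: Array[String]): Unit = {
        // All paths, the master URL and the class name below are placeholders.
        val sparkSubmit = new SparkLauncher()
          .setSparkHome("/opt/spark")
          .setAppResource("/path/to/my-app.jar")
          .setMainClass("com.example.MyApp")
          .setMaster("yarn-cluster")
          .setConf("spark.executor.memory", "2g")
          .addAppArgs("--input", "/data/in")
          .launch()                          // returns a java.lang.Process

        // "Monitoring" here means watching the child spark-submit process;
        // destroy() would kill that process (cluster-side cleanup depends
        // on the deploy mode and cluster manager).
        val exitCode = sparkSubmit.waitFor()
        println(s"spark-submit exited with code $exitCode")
      }
    }

Killing an already-running application outright typically still goes through the cluster manager (for example `yarn application -kill <appId>` on YARN); SparkLauncher only manages the launcher process itself.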

Re: Why does SortShuffleWriter write to disk always?

2015-05-02 Thread Reynold Xin
I've personally prototyped completely in-memory shuffle for Spark 3 times. However, it is unclear how big of a gain it would be to put all of this in memory under newer file systems (ext4, xfs). If the shuffle data is small, it is still in the file system buffer cache anyway. Note that network
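One crude way to experiment with this trade-off (my suggestion, not from the thread) is to point spark.local.dir at a RAM-backed tmpfs mount such as /dev/shm, so shuffle files never reach a physical disk even before the buffer cache is considered; the paths below are assumptions that depend on the OS:

    import org.apache.spark.{SparkConf, SparkContext}

    object TmpfsShuffleSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("shuffle-on-tmpfs")
          .setMaster("local[4]")
          // /dev/shm is a RAM-backed tmpfs on most Linux distributions;
          // shuffle files written under it never hit a physical disk,
          // approximating an in-memory shuffle without touching
          // SortShuffleWriter itself.
          .set("spark.local.dir", "/dev/shm/spark-local")

        val sc = new SparkContext(conf)
        val groups = sc.parallelize(1 to 1000000)
          .map(i => (i % 100, 1))
          .reduceByKey(_ + _)   // forces a shuffle write
          .count()
        println(s"distinct keys: $groups")
        sc.stop()
      }
    }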

Re: [discuss] ending support for Java 6?

2015-05-02 Thread Mridul Muralidharan
Hi Shane, Since we are still maintaining support for jdk6, jenkins should be using jdk6 [1] to ensure we do not inadvertently use jdk7 or higher api, which breaks source-level compat. -source and -target are insufficient to ensure api usage is conformant with the minimum jdk version we are support
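To make the point concrete (an illustration of mine, not from the thread): the snippet below uses java.nio.file, which only exists in the JDK 7+ class library. Compiling it with -source/-target 1.6 on a JDK 7 build machine succeeds, but running it on a Java 6 JRE fails with NoClassDefFoundError, which is exactly what building or testing on the minimum JDK would catch. The file path is illustrative.

    import java.nio.file.{Files, Paths}

    object NeedsJdk7ClassLibrary {
      def main(args: Array[String]): Unit = {
        // java.nio.file.* was introduced in JDK 7. -source/-target 1.6
        // only restricts language features and bytecode version, not
        // which class-library APIs are referenced, so this compiles
        // cleanly against a JDK 7 classpath.
        val bytes = Files.readAllBytes(Paths.get("/etc/hostname"))
        println(new String(bytes, "UTF-8"))
      }
    }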

Re: [discuss] ending support for Java 6?

2015-05-02 Thread Koert Kuipers
i think i might be misunderstanding, but shouldn't java 6 currently be used in jenkins? On Sat, May 2, 2015 at 11:53 PM, shane knapp wrote: > that's kinda what we're doing right now, java 7 is the default/standard on > our jenkins. > > or, i vote we buy a butler's outfit for thomas and have a sec

Re: [discuss] ending support for Java 6?

2015-05-02 Thread shane knapp
that's kinda what we're doing right now, java 7 is the default/standard on our jenkins. or, i vote we buy a butler's outfit for thomas and have a second jenkins instance... ;) On Sat, May 2, 2015 at 1:09 PM, Mridul Muralidharan wrote: > We could build on minimum jdk we support for testing pr's

Re: What is the location in the source code of the computation of the elements in a map transformation?

2015-05-02 Thread Patrick Wendell
Maybe I can help a bit. What happens when you call .map(my func) is that you create a MapPartitionsRDD that has a reference to that closure in its compute() function. When a job is run (jobs are run as the result of RDD actions): https://github.com/apache/spark/blob/master/core/src/main/scala/org
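A deliberately simplified toy model of the mechanism Patrick describes, not the real Spark code (the real classes live under core/src/main/scala/org/apache/spark/rdd): map() only records the closure inside a new RDD, and the closure runs when a task finally asks that RDD to compute a partition.

    // Toy model only; names echo RDD/MapPartitionsRDD but none of this
    // is the actual Spark implementation.
    abstract class ToyRDD[T] {
      def compute(partition: Int): Iterator[T]

      // map() just captures the closure in a new ToyRDD; nothing runs yet.
      def map[U](f: T => U): ToyRDD[U] = {
        val parent = this
        new ToyRDD[U] {
          // The closure is applied lazily, element by element, when a
          // task pulls this partition's iterator.
          def compute(partition: Int): Iterator[U] =
            parent.compute(partition).map(f)
        }
      }
    }

    class ToySourceRDD(data: Seq[Seq[Int]]) extends ToyRDD[Int] {
      def compute(partition: Int): Iterator[Int] = data(partition).iterator
    }

    object ToyRDDDemo extends App {
      val source = new ToySourceRDD(Seq(Seq(1, 2, 3), Seq(4, 5)))
      val mapped = source.map(_ * 10).map(_ + 1) // closures captured, nothing computed
      println(mapped.compute(0).toList)          // List(11, 21, 31) -- work happens here
    }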

Re: [discuss] ending support for Java 6?

2015-05-02 Thread Ted Yu
+1 On Sat, May 2, 2015 at 1:09 PM, Mridul Muralidharan wrote: > We could build on minimum jdk we support for testing pr's - which will > automatically cause build failures in case code uses newer api ? > > Regards, > Mridul > > On Fri, May 1, 2015 at 2:46 PM, Reynold Xin wrote: > > It's really

Re: [discuss] ending support for Java 6?

2015-05-02 Thread Mridul Muralidharan
We could build on the minimum jdk we support for testing pr's - which would automatically cause build failures in case code uses a newer api? Regards, Mridul On Fri, May 1, 2015 at 2:46 PM, Reynold Xin wrote: > It's really hard to inspect API calls since none of us have the Java > standard library in

Re: Pandas' Shift in Dataframe

2015-05-02 Thread Olivier Girardot
To close this thread: rxin created a broader JIRA to handle window functions in DataFrames: https://issues.apache.org/jira/browse/SPARK-7322 Thanks everyone. On Wed, Apr 29, 2015 at 22:51, Olivier Girardot < o.girar...@lateral-thoughts.com> wrote: > To give you a broader idea of the current use
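For context, a sketch of the pandas-shift-style operation that SPARK-7322 enables via lag() over a window (window functions landed in Spark 1.4 and, in the 1.x line, require a HiveContext); the column names and sample data are made up:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.lag

    object ShiftViaLagSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("shift-via-lag").setMaster("local[2]"))
        // DataFrame window functions in Spark 1.x need a HiveContext.
        val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
        import sqlContext.implicits._

        val df = Seq((1, 10.0), (2, 12.5), (3, 11.0)).toDF("time", "value")

        // lag("value", 1) over an ordering mimics pandas' shift(1):
        // each row sees the previous row's value (null on the first row).
        val w = Window.orderBy("time")
        df.withColumn("value_prev", lag("value", 1).over(w)).show()

        sc.stop()
      }
    }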

Re: [discuss] ending support for Java 6?

2015-05-02 Thread Reynold Xin
It's really hard to inspect API calls since none of us have the Java standard library in our brain. The only way we can enforce this is to have it in Jenkins, and Tom you are currently our mini-Jenkins server :) Joking aside, looks like we should support Java 6 in 1.4, and in the release notes inc

What is the location in the source code of the computation of the elements in a map transformation?

2015-05-02 Thread Tom Hubregtsen
I am trying to understand the data and computation flow in Spark, and I believe I fairly understand the shuffle (both map and reduce side), but I do not get what happens to the computation from the map stages. I know all maps get pipelined onto the shuffle (when there is no other action in bet

createDataFrame allows column names as second param in Python not in Scala

2015-05-02 Thread Olivier Girardot
Hi everyone, SQLContext.createDataFrame behaves differently in Python and Scala. In Python:

    >>> l = [('Alice', 1)]
    >>> sqlContext.createDataFrame(l).collect()
    [Row(_1=u'Alice', _2=1)]
    >>> sqlContext.createDataFrame(l, ['name', 'age']).collect()
    [Row(name=u'Alice', age=1)]

and in Scala:

    scala> val data

Why does SortShuffleWriter write to disk always?

2015-05-02 Thread Pramod Biligiri
Hi, I was trying to see if I can make Spark avoid hitting the disk for small jobs, but I see that the SortShuffleWriter.write() always writes to disk. I found an older thread ( http://apache-spark-user-list.1001560.n3.nabble.com/How-does-shuffle-work-in-spark-td584.html) saying that it doesn't call