Thanks for the info. I agree, it makes sense the way it is designed.
Pramod
On Sat, May 2, 2015 at 10:37 PM, Mridul Muralidharan
wrote:
> I agree, this is better handled by the filesystem cache - not to
> mention, being able to do zero copy writes.
>
> Regards,
> Mridul
>
> On Sat, May 2, 2015
Part of the reason is that it is really easy to just call toDF in Scala,
and we already have a lot of createDataFrame functions.
(You might find some of the cross-language differences confusing, but I'd
argue most real users just stick to one language, and developers or
trainers are the only ones
I agree, this is better handled by the filesystem cache - not to
mention, being able to do zero copy writes.
Regards,
Mridul
On Sat, May 2, 2015 at 10:26 PM, Reynold Xin wrote:
> I've personally prototyped completely in-memory shuffle for Spark 3 times.
> However, it is unclear how big of a gain it would be to put all of these
> in memory, under newer file systems (ext4, xfs).
Hi,
I've posted this problem on user@spark but found no reply, so I've moved it to
dev@spark; sorry for the duplication.
I am wondering if it is possible to submit, monitor & kill spark applications
from another service.
I have written a service that does this:
parse user commands
translate them into understan
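A service like that typically shells out to spark-submit and keeps a handle on the child process. Below is a minimal sketch of that approach; the jar path, class name, and master URL are placeholder assumptions, not values from this thread. The process handle gives monitoring via poll() and killing via terminate().

```python
import subprocess

def build_submit_command(app_jar, main_class, master, app_args=()):
    """Build a spark-submit invocation. The values passed in here are
    placeholders for illustration, not from the original thread."""
    cmd = [
        "spark-submit",
        "--class", main_class,
        "--master", master,
        app_jar,
    ]
    cmd.extend(app_args)
    return cmd

def launch(cmd):
    """Start the application as a child process, so the calling service
    can poll() it to monitor liveness and terminate() it to kill it."""
    return subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

cmd = build_submit_command("my-app.jar", "com.example.Main", "local[2]")
print(cmd)
```

Note that this only covers client-mode submission; tracking apps submitted to a cluster manager would need the manager's own REST/UI endpoints.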
I've personally prototyped completely in-memory shuffle for Spark 3 times.
However, it is unclear how big of a gain it would be to put all of these in
memory, under newer file systems (ext4, xfs). If the shuffle data is small,
it is still in the filesystem buffer cache anyway. Note that network
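The buffer-cache effect described above can be seen outside Spark: a small file that was just written is normally re-read from the OS page cache rather than the device. A toy round-trip (the 64 KiB size is an arbitrary assumption, and this is plain Python, not Spark code):

```python
import os
import tempfile

# Write a small "shuffle file" and read it straight back. On ext4/xfs the
# read below is normally served from the page cache, not the disk, which
# is the effect described above.
payload = os.urandom(64 * 1024)  # 64 KiB, i.e. a "small" shuffle output

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(payload)
    path = f.name

with open(path, "rb") as f:
    readback = f.read()

os.unlink(path)
print(readback == payload)
```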
Hi Shane,
Since we are still maintaining support for jdk6, jenkins should be
using jdk6 [1] to ensure we do not inadvertently use jdk7-or-higher
apis, which break source-level compat.
-source and -target are insufficient to ensure api usage conforms to
the minimum jdk version we are supporting.
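`-source`/`-target` only control the language level; the compiler still links against whatever class library the build JDK ships. One common guard (shown here as a generic sketch, not necessarily the exact configuration Spark's build used) is the Animal Sniffer Maven plugin, which checks compiled classes against a published JDK 1.6 API signature:

```xml
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>animal-sniffer-maven-plugin</artifactId>
  <version>1.11</version>
  <configuration>
    <signature>
      <!-- Published signature of the Java 1.6 class library -->
      <groupId>org.codehaus.mojo.signature</groupId>
      <artifactId>java16</artifactId>
      <version>1.1</version>
    </signature>
  </configuration>
  <executions>
    <execution>
      <phase>process-classes</phase>
      <goals><goal>check</goal></goals>
    </execution>
  </executions>
</plugin>
```

The alternative is compiling with `-bootclasspath` pointing at an actual JDK 6 rt.jar, which fails the build at compile time instead of at check time.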
i think i might be misunderstanding, but shouldn't java 6 currently be used
in jenkins?
On Sat, May 2, 2015 at 11:53 PM, shane knapp wrote:
> that's kinda what we're doing right now, java 7 is the default/standard on
> our jenkins.
>
> or, i vote we buy a butler's outfit for thomas and have a second jenkins
> instance... ;)
that's kinda what we're doing right now, java 7 is the default/standard on
our jenkins.
or, i vote we buy a butler's outfit for thomas and have a second jenkins
instance... ;)
On Sat, May 2, 2015 at 1:09 PM, Mridul Muralidharan
wrote:
> We could build on minimum jdk we support for testing pr's
Maybe I can help a bit. What happens when you call .map(my func) is
that you create a MapPartitionsRDD that has a reference to that
closure in its compute() function. When a job is run (jobs are run as
the result of RDD actions):
https://github.com/apache/spark/blob/master/core/src/main/scala/org
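The pattern described above can be mimicked in a few lines; this is a deliberately tiny sketch, not Spark's actual classes. Each map() returns a new node that captures the closure, and nothing runs until an action walks the chain and calls compute():

```python
class ToyRDD:
    """Minimal stand-in for an RDD: holds data and builds a lazy chain."""
    def __init__(self, data):
        self.data = data

    def map(self, func):
        # Like MapPartitionsRDD: return a new node that holds a reference
        # to `func` in its compute path; nothing is evaluated yet.
        return ToyMappedRDD(self, func)

    def compute(self):
        return iter(self.data)

    def collect(self):  # the "action" that triggers evaluation
        return list(self.compute())


class ToyMappedRDD(ToyRDD):
    def __init__(self, parent, func):
        self.parent = parent
        self.func = func  # the captured closure

    def compute(self):
        # Apply the closure lazily over the parent's iterator, so chained
        # maps are pipelined in a single pass when collect() runs.
        return (self.func(x) for x in self.parent.compute())


rdd = ToyRDD([1, 2, 3]).map(lambda x: x + 1).map(lambda x: x * 10)
print(rdd.collect())  # → [20, 30, 40]; evaluation happens here
```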
+1
On Sat, May 2, 2015 at 1:09 PM, Mridul Muralidharan
wrote:
> We could build against the minimum jdk we support for testing PRs - which will
> automatically cause build failures in case code uses a newer api?
>
> Regards,
> Mridul
>
> On Fri, May 1, 2015 at 2:46 PM, Reynold Xin wrote:
> > It's really hard to inspect API calls since none of us have the Java
> > standard library in our brain.
We could build against the minimum jdk we support for testing PRs - which will
automatically cause build failures in case code uses a newer api?
Regards,
Mridul
On Fri, May 1, 2015 at 2:46 PM, Reynold Xin wrote:
> It's really hard to inspect API calls since none of us have the Java
> standard library in our brain.
To close this thread, rxin created a broader Jira to handle window functions
in DataFrames: https://issues.apache.org/jira/browse/SPARK-7322
Thanks everyone.
On Wed, Apr 29, 2015 at 22:51, Olivier Girardot <
o.girar...@lateral-thoughts.com> wrote:
> To give you a broader idea of the current use
It's really hard to inspect API calls since none of us have the Java
standard library in our brain. The only way we can enforce this is to have
it in Jenkins, and Tom you are currently our mini-Jenkins server :)
Joking aside, looks like we should support Java 6 in 1.4, and in the
release notes inc
I am trying to understand what the data and computation flow is in Spark, and
believe I understand the Shuffle fairly well (both map and reduce side), but I
do not get what happens to the computation from the map stages. I know all maps
get pipelined on the shuffle (when there is no other action in between
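To make the pipelining concrete, here is a toy sketch (plain Python generators, not Spark) of how consecutive narrow transformations can be fused into a single pass, so each element flows through every map step before the next element is read and no intermediate collection is materialized per stage:

```python
def pipeline(source, *funcs):
    """Fuse several map steps into one pass over the source."""
    trace = []  # record (step_index, input_value) to show the interleaving

    def run():
        for x in source:
            for i, f in enumerate(funcs):
                trace.append((i, x))
                x = f(x)
            yield x

    return list(run()), trace


result, trace = pipeline([1, 2], lambda x: x + 1, lambda x: x * 10)
print(result)     # → [20, 30]
print(trace[:2])  # element 1 passes through both steps before element 2 is read
```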
Hi everyone,
SQLContext.createDataFrame has different behaviour in Scala and Python:
>>> l = [('Alice', 1)]
>>> sqlContext.createDataFrame(l).collect()
[Row(_1=u'Alice', _2=1)]
>>> sqlContext.createDataFrame(l, ['name', 'age']).collect()
[Row(name=u'Alice', age=1)]
and in Scala :
scala> val data
Hi,
I was trying to see if I can make Spark avoid hitting the disk for small
jobs, but I see that SortShuffleWriter.write() always writes to disk. I
found an older thread (
http://apache-spark-user-list.1001560.n3.nabble.com/How-does-shuffle-work-in-spark-td584.html)
saying that it doesn't call
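For context on what a sort-based shuffle write produces, here is a toy sketch (not Spark's SortShuffleWriter; the byte-sum partitioner is purely illustrative): map output is bucketed by reduce partition, each bucket is sorted, and everything is concatenated into one buffer plus a per-partition offset index, which is why the write path ends at a file:

```python
import io

def toy_sort_shuffle_write(records, num_partitions):
    """Bucket (key, value) records by reduce partition, sort each bucket,
    and concatenate into one buffer with per-partition start offsets,
    mirroring the single-data-file-plus-index layout of a sort shuffle."""
    buckets = [[] for _ in range(num_partitions)]
    for key, value in records:
        # Deterministic toy partitioner (sum of key bytes), for illustration.
        buckets[sum(key.encode()) % num_partitions].append((key, value))

    data = io.BytesIO()
    offsets = []  # start offset of each partition's region in the "file"
    for bucket in buckets:
        offsets.append(data.tell())
        for key, value in sorted(bucket):
            data.write(f"{key}={value}\n".encode())
    return data.getvalue(), offsets


blob, index = toy_sort_shuffle_write([("a", 1), ("b", 2), ("a", 3)], 2)
print(index)  # one start offset per reduce partition
```

A reducer then needs only its offset range from the single file, which is the design that makes an all-in-memory variant a separate code path rather than a flag.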