Hi
What KeyClass and ValueClass are you trying to save as the keys/values of
your dataset?
On Sun, Feb 23, 2014 at 10:48 AM, Nan Zhu wrote:
> Hi, all
>
> I found the weird thing on saveAsNewAPIHadoopFile in
> PairRDDFunctions.scala when working on the other issue,
>
> saveAsNewAPIHadoopFile
Thanks Parviz, this looks great and good to see it getting updated. Look
forward to 0.9.0!
A perhaps stupid question: where does the KinesisWordCount example live?
Is that an Amazon example? I don't see it under the streaming
examples included in the Spark project. If it's a third-party exa
@fommil @mengxr I think it's always worth a shot at a license change. Scikit
learn devs have been successful before in getting such things over the line.
Assuming we can make that happen, what do folks think about MTJ vs Breeze vs
JBLAS + commons-math since these seem like the viable alternativ
> > >> >> related to fast and flexible large-scale data analysis
> > >> >> on clusters; and be it further RESOLVED, that the office
> > >> >> of "Vice President, Apache Spark" be and hereby is created,
> > >> >> the person holding such
Is Spark active in submitting anything for this?
-- Forwarded message --
From: Rich Bowen
Date: Mon, Jan 27, 2014 at 4:20 PM
Subject: Represent your project at ApacheCon
To: committ...@apache.org
Folks,
5 days from the end of the CFP, we have only 50 talks submitted. We need
t
If you want to spend the time running 50 iterations, you're better off
re-running 5x10 iterations with different random starts to get a better
local minimum.
Sent from Mailbox for iPhone
On Sun, Jan 26, 2014 at 9:59 AM, Matei Zaharia
wrote:
> I looked into this after I opened that JIRA and i
Agree that it should be fixed if possible. But why run ALS for 50 iterations?
It tends to pretty much converge (to within 0.001 or so RMSE) after 5-10
iterations, and even 20 is probably overkill.
On Sun, Jan 26, 2014 at 9:59 AM, Matei Zaharia
wrote:
> I looked into this afte
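The restart strategy suggested above can be sketched generically. Since this thread predates a stable ALS API, `train_als` below is a hypothetical stand-in for any training call that takes a random seed and reports an RMSE after a fixed number of iterations; only the restart pattern itself is the point.

```python
import random

def train_als(seed, iterations=10):
    # Hypothetical stand-in for an ALS training run: returns the RMSE
    # reached after `iterations` iterations from a seeded random start.
    rng = random.Random(seed)
    return 0.9 + 0.1 * rng.random()  # toy RMSE value for illustration

def best_of_restarts(n_restarts=5, iterations=10):
    # Instead of one long 50-iteration run, do several shorter runs from
    # different random starts and keep the best local minimum found.
    results = [(train_als(seed, iterations), seed) for seed in range(n_restarts)]
    best_rmse, best_seed = min(results)
    return best_rmse, best_seed

rmse, seed = best_of_restarts()
```

The same total work (5 runs x 10 iterations) explores five basins instead of polishing one, which is the trade-off being argued for in the thread.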
+1 fantastic news
On Fri, Jan 24, 2014 at 6:43 AM, Mridul Muralidharan
wrote:
> Great news !
> +1
> Regards,
> Mridul
> On Fri, Jan 24, 2014 at 4:15 AM, Matei Zaharia
> wrote:
>> Hi folks,
>>
>> We’ve been working on the transition to Apache for a while, and our
+1 for getOrElse
When I was new to Scala, I tended to use match with Option almost like
if/else statements. These days I try to use map/flatMap instead, with
getOrElse extensively, and I for one find it very intuitive.
I also agree that the fold syntax seems way less intuitive and I certain
>>>>> directory. I would love it if Spark used the Tanuki Service Wrapper,
>>>> which
>>>>> is widely-used for Java service daemons, supports retries,
>> installation
>>>> as
>>>>> init scripts that can be chkconfig'd, etc.
One third-party option that works nicely for the Hadoop project and its
related projects is http://search-hadoop.com, managed by Sematext. Perhaps we
can plead with Otis to add the Spark lists to search-spark.com, or to the
existing site?
Just throwing it out there as a potential solution to a
awesome if you could use either the hostname or the FQDN or the
> IP address in the Spark URL and not have Akka barf at you.
> I've been telling myself I'd look into these at some point but just haven't
> gotten around to them myself yet. Some day! I would prioritize
still needs a bit of
clean-up work, and I need to add the concept of "wrapper functions" to
deserialize classes that MsgPack can't handle out of the box.
N
On Fri, Nov 8, 2013 at 12:20 PM, Nick Pentreath
wrote:
> Wow Josh, that looks great. I
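The "wrapper functions" idea mentioned above, a pair of registered hooks that convert a class the serializer can't handle into something it can, and back again, can be sketched with the stdlib `json` module standing in for MsgPack (which isn't assumed to be installed here); `Point`, `wrap`, and `unwrap` are illustrative names, not anything from the actual patch.

```python
import json

class Point:
    # Example of a class the serializer can't handle out of the box.
    def __init__(self, x, y):
        self.x, self.y = x, y

def wrap(obj):
    # "Wrapper function" for encoding: turn an unsupported object into
    # a tagged dict the serializer does understand.
    if isinstance(obj, Point):
        return {"__point__": [obj.x, obj.y]}
    raise TypeError(f"cannot serialize {type(obj).__name__}")

def unwrap(d):
    # Matching hook for decoding: rebuild the original class from the tag.
    if "__point__" in d:
        return Point(*d["__point__"])
    return d

data = json.dumps(Point(1, 2), default=wrap)
p = json.loads(data, object_hook=unwrap)
```

msgpack-python exposes the same shape of extension point (`default=` on pack, `object_hook=` on unpack), so the pattern carries over directly.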
Or, if you're extremely ambitious, work on implementing Spark Streaming in Python.
On Thu, Dec 19, 2013 at 8:30 PM, Matei Zaharia
wrote:
> Hi Matt,
> If you want to get started looking at Spark, I recommend the following
> resources:
> - Our issue tracker at http://sp
? i.e., you no longer need to run gen-idea.
>
>
> On Sat, Dec 7, 2013 at 4:15 AM, Nick Pentreath wrote:
>
> > Hi Spark Devs,
> >
> > Hoping someone can help me out. No matter what I do, I cannot get
> Intellij
> > to build Spark from source. I am using IDEA
Whoohoo!
Great job everyone especially Prashant!
On Sat, Dec 14, 2013 at 10:59 AM, Patrick Wendell
wrote:
> Alright I just merged this in - so Spark is officially "Scala 2.10"
> from here forward.
> For reference I cut a new branch called scala-2.9 with the commi
- Successfully built via sbt/sbt assembly/assembly on Mac OS X, as well
as on a dev Ubuntu EC2 box
- Successfully tested via sbt/sbt test locally
- Successfully built and tested using mvn package locally
- I've tested my own Spark jobs (built against 0.8.0-incubating) on this
RC a
Hi Spark Devs,
Hoping someone can help me out. No matter what I do, I cannot get Intellij
to build Spark from source. I am using IDEA 13. I run sbt gen-idea and
everything seems to work fine.
When I try to build using IDEA, everything compiles but I get the error
below.
Have any of you come acr
Hi devs
I came across Dill (
http://trac.mystic.cacr.caltech.edu/project/pathos/wiki/dill) for Python
serialization. Was wondering if it may be a replacement to the cloudpickle
stuff (and remove that piece of code that needs to be maintained within
PySpark)?
Josh, have you looked into Dill? Any th
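For context on why PySpark carries cloudpickle (and why Dill is a candidate replacement): the stdlib pickle serializes functions by module-level name, so it refuses lambdas and interactively defined closures, which is exactly what PySpark ships to workers. A minimal demonstration of that gap, without importing dill itself:

```python
import pickle

# Plain pickle can't serialize a lambda: pickling functions works by
# looking them up by qualified name, and "<lambda>" can't be found.
# This is the gap that cloudpickle fills in PySpark, and that Dill
# also addresses.
f = lambda x: x + 1

try:
    pickle.dumps(f)
    lambda_picklable = True
except Exception:
    lambda_picklable = False
```

cloudpickle and Dill both work around this by serializing the function's code object and closure by value rather than by reference.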
Hi Spark Devs
An idea developed recently out of a scikit-learn mailing list discussion (
http://sourceforge.net/mailarchive/forum.php?thread_name=CAFvE7K5HGKYH9Myp7imrJ-nU%3DpJgeGqcCn3JC0m4MmGWZi35Hw%40mail.gmail.com&forum_name=scikit-learn-general)
to have a coding sprint around Strata in Feb, fo
CC'ing Spark Dev list
I have been thinking about this for quite a while and would really love to
see this happen.
Most of my pipeline ends up in Scala/Spark these days - which I love, but
it is partly because I am reliant on custom Hadoop input formats that are
just way easier to use from Scala/J
serializers based on each stage's
> > input and output formats (
> >
> https://github.com/JoshRosen/spark/blob/59b6b43916dc84fc8b83f22eb9ce13a27bc51ec0/python/pyspark/rdd.py#L42
> > ).
> >
> > At some point, I'd like to port my custom serializers
> the need for a delimiter by creating a PythonRDD from the newHadoopFile
> > JavaPairRDD and adding a new method to writeAsPickle (
> >
> >
> https://github.com/apache/incubator-spark/blob/master/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala#L224
> >
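The per-stage serializer idea quoted above, choosing a serializer based on a stage's input and output formats, can be sketched as a chooser that tries a fast but restricted format first and falls back to a general one. The names below (`choose_serializer`, `Record`) are illustrative, not PySpark's actual API; stdlib `marshal` and `pickle` stand in for the fast/general pair.

```python
import marshal
import pickle

def choose_serializer(sample):
    # Try the fast but restricted format first (marshal handles only
    # simple built-in types); fall back to general pickle otherwise.
    try:
        marshal.dumps(sample)
        return marshal.dumps, marshal.loads
    except ValueError:
        return pickle.dumps, pickle.loads

class Record:
    # A user-defined class marshal can't handle, forcing the fallback.
    def __init__(self, v):
        self.v = v

dumps, loads = choose_serializer([1, 2, 3])       # fast path chosen
p_dumps, p_loads = choose_serializer(Record(1))   # general fallback chosen

roundtrip = loads(dumps([1, 2, 3]))
```

Probing one sample element keeps the common case on the cheap serializer while still handling arbitrary objects correctly.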
Hi Spark Devs
If you could pick one language binding to add to Spark what would it be?
Probably Clojure or JRuby if JVM is of interest.
I'm quite excited about Julia as a language for scientific computing (
http://julialang.org). The Julia community have been very focused on things
like interop w
Hi Spark Devs
I was wondering what appetite there may be to add the ability for PySpark
users to create RDDs from (somewhat) arbitrary Hadoop InputFormats.
In my data pipeline for example, I'm currently just using Scala (partly
because I love it but also because I am heavily reliant on quite cust
There was another discussion on the old dev list about this:
https://groups.google.com/forum/#!msg/spark-developers/GL2_DwAeh5s/9rwQ3iDa2t4J
I tend to agree with having configuration sitting in JSON (or properties
files) and using the Typesafe Config library which can parse both.
Something I've u
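Typesafe Config itself is a JVM library, but its appeal here, one loader that accepts either JSON or Java-style properties syntax, is easy to sketch. `parse_config` and the sample key below are illustrative only:

```python
import json

def parse_config(text):
    # Accept either JSON or simple key=value properties syntax,
    # mimicking what the Typesafe Config library offers on the JVM.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        conf = {}
        for line in text.splitlines():
            line = line.strip()
            if line and not line.startswith("#"):
                key, _, value = line.partition("=")
                conf[key.strip()] = value.strip()
        return conf

json_conf = parse_config('{"spark.master": "local[4]"}')
props_conf = parse_config("spark.master = local[4]\n# a comment\n")
```

Both inputs yield the same dict, which is the point: downstream code never needs to know which file format the user chose.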
Is mLI available? Where is the repo located?
On Tue, Sep 10, 2013 at 10:45 PM, Gowtham N
wrote:
> It worked.
> I was using the old master for Spark, which I forked many days ago.
> On Tue, Sep 10, 2013 at 1:25 PM, Shivaram Venkataraman <
> shiva...@eecs.berkeley
for sufficiently large /skewed datasets. I
> guess I am interested in the GraphX release to replace reliance on Bagel.
> 5. if the task reformulation is accepted, there are further
> optimizations that could be applied to blocking -- but this
> implementation gets the gist of what I did in
Hi
I know everyone's pretty busy with getting 0.8.0 out, but as and when folks
have time it would be great to get your feedback on this PR adding support
for the 'implicit feedback' model variant to ALS:
https://github.com/apache/incubator-spark/pull/4
In particular any potential efficiency impro
Hi
I submitted my license agreement and account name request a while back, but
still haven't received any correspondence. Just wondering what I need to do
in order to follow this up?
Thanks
Nick
Quite interesting, and timely given current thinking around MLlib and MLI
http://orbi.ulg.ac.be/bitstream/2268/154357/1/paper.pdf
I do really like the way they have approached their API - and so far MLlib
seems to be following a (roughly) similar approach.
Interesting in particular they obviousl
used in practice, and it
>> would be great to add them to the MLI library (and perhaps also MLlib).
>>
>> -Ameet
>>
>>
>> On Thu, Jul 25, 2013 at 6:44 AM, Nick Pentreath
>> wrote:
>>
>>> Hi
>>>
>>> Ok, that all makes sense
contributing
> to it. MLI is a private repository right now, but we'll make it public
> soon though, and Evan Sparks or I will let you know when we do so.
>
> Thanks again for getting in touch with us!
>
> -Ameet
>
>
> On Wed, Jul 24, 2013 at 11:47 AM, Rey
Hi dev team
(Apologies for a long email!)
Firstly great news about the inclusion of MLlib into the Spark project!
I've been working on a concept and some code for a machine learning library
on Spark, and so of course there is a lot of overlap between MLlib and what
I've been doing.
I wanted to