Hi All,
I'm happy to announce the Spark 1.4.1 maintenance release.
We recommend all users on the 1.4 branch upgrade to
this release, which contains several important bug fixes.
Download Spark 1.4.1 - http://spark.apache.org/downloads.html
Release notes -
Hi All,
Today, I'm happy to announce SparkHub
(http://sparkhub.databricks.com), a service for the Apache Spark
community to easily find the most relevant Spark resources on the web.
SparkHub is a curated list of Spark news, videos and talks, package
releases, upcoming events around the world,
...@gmail.com wrote:
Ok so it is the case that small shuffles can be done without hitting any
disk. Is this the same case for the aux shuffle service in yarn? Can that be
done without hitting disk?
On Wed, Jun 10, 2015 at 9:17 PM, Patrick Wendell pwend...@gmail.com wrote:
In many cases the shuffle
Hi All,
I'm happy to announce the availability of Spark 1.4.0! Spark 1.4.0 is
the fifth release on the API-compatible 1.X line. It is Spark's
largest release ever, with contributions from 210 developers and more
than 1,000 commits!
A huge thanks go to all of the individuals and organizations
In many cases the shuffle will actually hit the OS buffer cache and
not ever touch spinning disk if it is a size that is less than memory
on the machine.
- Patrick
On Wed, Jun 10, 2015 at 5:06 PM, Corey Nolet cjno...@gmail.com wrote:
So with this... to help my understanding of Spark under the
Hi Deepak - please direct this to the user@ list. This list is for
development of Spark itself.
On Sun, Apr 26, 2015 at 12:42 PM, Deepak Gopalakrishnan
dgk...@gmail.com wrote:
Hello All,
I'm trying to process a 3.5GB file in standalone mode using Spark. I could
run my Spark job successfully on
Hi All,
I'm happy to announce the Spark 1.3.1 and 1.2.2 maintenance releases.
We recommend all users on the 1.3 and 1.2 Spark branches upgrade to
these releases, which contain several important bug fixes.
Download Spark 1.3.1 or 1.2.2:
http://spark.apache.org/downloads.html
Release notes:
Hey Jonathan,
Are you referring to disk space used for storing persisted RDD's? For
that, Spark does not bound the amount of data persisted to disk. It's
a similar story to how Spark's shuffle disk output works (and also
Hadoop and other frameworks make this assumption as well for their
shuffle
If you invoke this, you will get at-least-once semantics on failure.
For instance, if a machine dies in the middle of executing the foreach
for a single partition, that will be re-executed on another machine.
It could even fully complete on one machine, but the machine dies
immediately before
The source code should match the Spark commit
4aaf48d46d13129f0f9bdafd771dd80fe568a7dc. Do you see any differences?
On Fri, Mar 27, 2015 at 11:28 AM, Manoj Samel manojsamelt...@gmail.com wrote:
While looking into an issue, I noticed that the source displayed on the GitHub
site does not match the
I think we have a version of mapPartitions that allows you to tell
Spark the partitioning is preserved:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L639
We could also add a map function that does the same. Or you can just write
your map using an
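A minimal sketch of that overload, with made-up data; the flag tells Spark the
keys' partitioning is unchanged, so a later reduceByKey or join on the same
partitioner can skip a shuffle:

import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("preserve-partitioning-sketch").setMaster("local[*]"))
val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("c", 3)))
  .partitionBy(new HashPartitioner(4))

// Keys are untouched, so it is safe to assert the partitioning is preserved.
val doubled = pairs.mapPartitions(
  iter => iter.map { case (k, v) => (k, v * 2) },
  preservesPartitioning = true)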
Hey Jim,
Thanks for reporting this. Can you give a small end-to-end code
example that reproduces it? If so, we can definitely fix it.
- Patrick
On Tue, Mar 24, 2015 at 4:55 PM, Jim Carroll jimfcarr...@gmail.com wrote:
I have code that works under 1.2.1 but when I upgraded to 1.3.0 it fails to
Hey Yiannis,
If you just perform a count on each name, date pair... can it succeed?
If so, can you do a count and then order by to find the largest one?
I'm wondering if there is a single pathologically large group here that is
somehow causing OOM.
Also, to be clear, you are getting GC limit
Hi All,
I'm happy to announce the availability of Spark 1.3.0! Spark 1.3.0 is
the fourth release on the API-compatible 1.X line. It is Spark's
largest release ever, with contributions from 172 developers and more
than 1,000 commits!
Visit the release notes [1] to read about the new features, or
We don't support expressions or wildcards in that configuration. For
each application, the local directories need to be constant. If you
have users submitting different Spark applications, those can each set
spark.local.dirs.
- Patrick
On Wed, Mar 11, 2015 at 12:14 AM, Jianshi Huang
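A small sketch of the per-application setup, with placeholder paths (the
property name is spark.local.dir):

import org.apache.spark.{SparkConf, SparkContext}

// The value is a plain comma-separated list of constant paths; wildcards and
// expressions are not expanded.
val conf = new SparkConf()
  .setAppName("per-app-local-dirs")
  .set("spark.local.dir", "/data1/spark-tmp,/data2/spark-tmp")
val sc = new SparkContext(conf)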
You may need to add the -Phadoop-2.4 profile. When building or release
packages for Hadoop 2.4 we use the following flags:
-Phadoop-2.4 -Phive -Phive-thriftserver -Pyarn
- Patrick
On Thu, Mar 5, 2015 at 12:47 PM, Kelly, Jonathan jonat...@amazon.com wrote:
I confirmed that this has nothing to
I think we need to just update the docs, it is a bit unclear right
now. At the time, we worded it fairly sternly because we really
wanted people to use --jars when we deprecated SPARK_CLASSPATH. But
there are other types of deployments where there is a legitimate need
to augment the classpath
Added - thanks! I trimmed it down a bit to fit our normal description length.
On Mon, Jan 5, 2015 at 8:24 AM, Thomas Stone tho...@prediction.io wrote:
Please can we add PredictionIO to
https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark
PredictionIO
http://prediction.io/
I've added it, thanks!
On Fri, Feb 20, 2015 at 12:22 AM, Emre Sevinc emre.sev...@gmail.com wrote:
Hello,
Could you please add Big Industries to the Powered by Spark page at
https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark ?
Company Name: Big Industries
URL:
is exactly the issue: on my master node UI
at the bottom I can see the list of Completed Drivers all with ERROR
state...
Thanks,
Oleg
-Original Message-
From: Patrick Wendell [mailto:pwend...@gmail.com]
Sent: Monday, February 23, 2015 12:59 AM
To: Oleg Shirokikh
Cc: user
The map will start with a capacity of 64, but will grow to accommodate
new data. Are you using the groupBy operator in Spark or are you using
Spark SQL's group by? This usually happens if you are grouping or
aggregating in a way that doesn't sufficiently condense the data
created from each input
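To make the difference concrete, a small sketch with made-up data; the
aggregating version condenses values on the map side, while a plain group-by
ships every value across the shuffle:

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("condense-before-shuffle").setMaster("local[*]"))
val pairs = sc.parallelize(Seq(("a", 1), ("a", 5), ("b", 2)))

val condensed = pairs.reduceByKey(_ + _)   // combines per key before shuffling
val exploded  = pairs.groupByKey()         // moves every value; large groups strain memory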
I think there is a minor error here in that the first example needs a
tail after the seq:
df.map { row =>
  (row.getDouble(0), row.toSeq.tail.map(_.asInstanceOf[Double]))
}.toDataFrame("label", "features")
On Wed, Feb 11, 2015 at 7:46 PM, Michael Armbrust
mich...@databricks.com wrote:
It sounds like
Hey Jerry,
I think standalone mode will still add more features over time, but
the goal isn't really for it to become equivalent to what Mesos/YARN
are today. Or at least, I doubt Spark Standalone will ever attempt to
manage _other_ frameworks outside of Spark and become a general
purpose
Akhil,
Those are handled by ASF infrastructure, not anyone in the Spark
project. So this list is not the appropriate place to ask for help.
- Patrick
On Sat, Jan 17, 2015 at 12:56 AM, Akhil Das ak...@sigmoidanalytics.com wrote:
My mails to the mailing list are getting rejected, have opened a
It should appear in the page for any stage in which accumulators are updated.
On Wed, Jan 14, 2015 at 6:46 PM, Justin Yip yipjus...@prediction.io wrote:
Hello,
From accumulator documentation, it says that if the accumulator is named, it
will be displayed in the WebUI. However, I cannot find
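For reference, a minimal sketch of a named accumulator on the Spark 1.x API,
with made-up data; the name is what shows up on the stage page once a task
updates it:

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("named-accumulator").setMaster("local[*]"))

val badRecords = sc.accumulator(0, "bad records")   // the name appears in the web UI
sc.parallelize(Seq("ok", "", "ok", "")).foreach { line =>
  if (line.isEmpty) badRecords += 1
}
println(badRecords.value)   // 2 once the job has run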
What do you mean when you say the overhead of Spark shuffles starts to
accumulate? Could you elaborate more?
In newer versions of Spark shuffle data is cleaned up automatically
when an RDD goes out of scope. It is safe to remove shuffle data at
this point because the RDD can no longer be
Hey Eric,
I'm just curious - which specific features in 1.2 do you find most
help with usability? This is a theme we're focusing on for 1.3 as
well, so it's helpful to hear what makes a difference.
- Patrick
On Sun, Dec 28, 2014 at 1:36 AM, Eric Friedman
eric.d.fried...@gmail.com wrote:
Hi
Is it sufficient to set spark.hadoop.validateOutputSpecs to false?
http://spark.apache.org/docs/latest/configuration.html
- Patrick
On Wed, Dec 24, 2014 at 10:52 PM, Shao, Saisai saisai.s...@intel.com wrote:
Hi,
We have such requirements to save RDD output to HDFS with saveAsTextFile
like
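A minimal sketch of the suggested setting, with a placeholder output path;
disabling the check lets saveAsTextFile overwrite an existing directory, at
the cost of losing protection against accidental overwrites:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("overwrite-output")
  .set("spark.hadoop.validateOutputSpecs", "false")   // skip Hadoop's existing-directory check
val sc = new SparkContext(conf)

sc.parallelize(1 to 100).saveAsTextFile("hdfs:///tmp/output/daily")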
: Patrick Wendell [mailto:pwend...@gmail.com]
Sent: Thursday, December 25, 2014 3:22 PM
To: Shao, Saisai
Cc: user@spark.apache.org; d...@spark.apache.org
Subject: Re: Question on saveAsTextFile with overwrite option
Is it sufficient to set spark.hadoop.validateOutputSpecs to false?
http
I'm happy to announce the availability of Spark 1.2.0! Spark 1.2.0 is
the third release on the API-compatible 1.X line. It is Spark's
largest release ever, with contributions from 172 developers and more
than 1,000 commits!
This release brings operational and performance improvements in Spark
, 2014 at 12:57 AM, Patrick Wendell pwend...@gmail.com wrote:
The second choice is better. Once you call collect() you are pulling
all of the data onto a single node, you want to do most of the
processing in parallel on the cluster, which is what map() will do.
Ideally you'd try to summarize
Hey Manoj,
One proposal potentially of interest is the Spark Kernel project from
IBM - you should look for their. The interface in that project is more
of a remote REPL interface, i.e. you submit commands (as strings)
and get back results (as strings), but you don't have direct
programmatic
Yeah the main way to do this would be to have your own static cache of
connections. These could be using an object in Scala or just a static
variable in Java (for instance a set of connections that you can
borrow from).
- Patrick
On Thu, Dec 4, 2014 at 5:26 PM, Tobias Pfeiffer t...@preferred.jp
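A rough sketch of the static-cache idea, assuming a JDBC sink with a made-up
URL; the object is initialized once per executor JVM, so tasks reuse the
connection instead of opening a new one:

import java.sql.{Connection, DriverManager}
import org.apache.spark.{SparkConf, SparkContext}

object ConnectionPool {                       // hypothetical helper, not a Spark API
  lazy val connection: Connection =
    DriverManager.getConnection("jdbc:postgresql://db:5432/metrics")
}

val sc = new SparkContext(new SparkConf().setAppName("static-connection-cache").setMaster("local[*]"))
sc.parallelize(Seq("r1", "r2", "r3")).foreachPartition { records =>
  val conn = ConnectionPool.connection        // shared by all tasks on this executor
  records.foreach { r =>
    // write r using conn here
    ()
  }
}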
Thanks for flagging this. I reverted the relevant YARN fix in Spark
1.2 release. We can try to debug this in master.
On Thu, Dec 4, 2014 at 9:51 PM, Jianshi Huang jianshi.hu...@gmail.com wrote:
I created a ticket for this:
https://issues.apache.org/jira/browse/SPARK-4757
Jianshi
On Fri,
to bypass the error.
This was caused by a local change, so no impact on the 1.2 release.
-Original Message-
From: Patrick Wendell [mailto:pwend...@gmail.com]
Sent: Wednesday, November 26, 2014 8:17 AM
To: Judy Nash
Cc: Denny Lee; Cheng Lian; u...@spark.incubator.apache.org
Subject: Re
I recently posted instructions on loading Spark in Intellij from scratch:
https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-BuildingSparkinIntelliJIDEA
You need to do a few extra steps for the YARN project to work.
Also, for questions like this that
Hi Judy,
Are you somehow modifying Spark's classpath to include jars from
Hadoop and Hive that you have running on the machine? The issue seems
to be that you are somehow including a version of Hadoop that
references the original guava package. The Hadoop that is bundled in
the Spark jars should
/Preconditions.checkArgument:(ZLjava/lang/Object;)V
50: invokestatic #502 // Method
org/spark-project/guava/common/base/Preconditions.checkArgument:(ZLjava/lang/Object;)V
On Wed, Nov 26, 2014 at 11:08 AM, Patrick Wendell pwend...@gmail.com wrote:
Hi Judy,
Are you somehow
It looks like you are trying to directly import the toLocalIterator
function. You can't import functions, it should just appear as a
method of an existing RDD if you have one.
- Patrick
On Thu, Nov 13, 2014 at 10:21 PM, Deep Pradhan
pradhandeep1...@gmail.com wrote:
Hi,
I am using Spark 1.0.0
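A quick sketch for reference; toLocalIterator is a method on the RDD itself
and streams one partition at a time back to the driver:

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("local-iterator").setMaster("local[*]"))

val rdd = sc.parallelize(1 to 1000000)
rdd.toLocalIterator.take(10).foreach(println)   // no import needed; called on the RDD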
The doc build appears to be broken in master. We'll get it patched up
before the release:
https://issues.apache.org/jira/browse/SPARK-4326
On Tue, Nov 11, 2014 at 10:50 AM, Alessandro Baretta
alexbare...@gmail.com wrote:
Nichols and Patrick,
Thanks for your help, but, no, it still does not
Hi There,
Because Akka versions are not binary compatible with one another, it
might not be possible to integrate Play with Spark 1.1.0.
- Patrick
On Tue, Nov 11, 2014 at 8:21 AM, Akshat Aranya aara...@gmail.com wrote:
Hi,
Sorry if this has been asked before; I didn't find a satisfactory
Hey Cheng,
Right now we aren't using stable API's to communicate with the Hive
Metastore. We didn't want to drop support for Hive 0.12 so right now
we are using a shim layer to support compiling for 0.12 and 0.13. This
is very costly to maintain.
If Hive has a stable meta-data API for talking to
Hey Jim,
There are some experimental (unstable) API's that support running jobs
which might short-circuit:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L1126
This can be used for doing online aggregations like you are
describing. And in one
Hey Ryan,
I've found that filing issues with the Scala/Typesafe JIRA is pretty
helpful if the issue can be fully reproduced, and even sometimes
helpful if it can't. You can file bugs here:
https://issues.scala-lang.org/secure/Dashboard.jspa
The Spark SQL code in particular is typically the
It shows the amount of memory used to store RDD blocks, which are created
when you run .cache()/.persist() on an RDD.
On Wed, Oct 22, 2014 at 10:07 PM, Haopu Wang hw...@qilinsoft.com wrote:
Hi, please take a look at the attached screen-shot. I wonders what's the
Memory Used column mean.
I
IIRC - the random is seeded with the index, so it will always produce
the same result for the same index. Maybe I don't totally follow
though. Could you give a small example of how this might change the
RDD ordering in a way that you don't expect? In general repartition()
will not preserve the
Spark will need to connect both to the hive metastore and to all HDFS
nodes (NN and DN's). If that is all in place then it should work. In
this case it looks like maybe it can't connect to a datanode in HDFS
to get the raw data. Keep in mind that the performance might not be
very good if you are
Hey Grzegorz,
EMR is a service that is not maintained by the Spark community. So
this list isn't the right place to ask EMR questions.
- Patrick
On Thu, Sep 18, 2014 at 3:19 AM, Grzegorz Białek
grzegorz.bia...@codilime.com wrote:
Hi,
I would like to run a Spark application on Amazon EMR. I have
guess I would need to
override mapPartitions() directly within my RDD. Right?
On Tue, Sep 16, 2014 at 4:57 PM, Patrick Wendell pwend...@gmail.com wrote:
If each partition can fit in memory, you can do this using
mapPartitions and then building an inverse mapping within each
partition. You'd need
If each partition can fit in memory, you can do this using
mapPartitions and then building an inverse mapping within each
partition. You'd need to construct a hash map within each partition
yourself.
On Tue, Sep 16, 2014 at 4:27 PM, Akshat Aranya aara...@gmail.com wrote:
I have a use case where
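A minimal sketch of that approach, with made-up key/value types; each
partition builds its own inverse map, which assumes a single partition's map
fits in executor memory:

import scala.collection.mutable
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("invert-per-partition").setMaster("local[*]"))
val pairs = sc.parallelize(Seq("a" -> 1, "b" -> 2, "c" -> 3))

val inverted = pairs.mapPartitions { iter =>
  val m = mutable.HashMap.empty[Int, String]   // value -> key, built per partition
  iter.foreach { case (k, v) => m(v) = k }
  m.iterator
}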
Yeah that issue has been fixed by adding better docs, it just didn't make
it in time for the release:
https://github.com/apache/spark/blob/branch-1.1/make-distribution.sh#L54
On Thu, Sep 11, 2014 at 11:57 PM, Zhanfeng Huo huozhanf...@gmail.com
wrote:
resolved:
./make-distribution.sh --name
[moving to user@]
This would typically be accomplished with a union() operation. You
can't mutate an RDD in-place, but you can create a new RDD with a
union() which is an inexpensive operator.
On Fri, Sep 12, 2014 at 5:28 AM, Archit Thakur
archit279tha...@gmail.com wrote:
Hi,
We have a use
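A small sketch of the union approach, with placeholder paths; the original RDD
is never mutated, a new one is derived from it:

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("append-via-union").setMaster("local[*]"))

val existing = sc.textFile("hdfs:///events/2014-09-11")
val todays   = sc.textFile("hdfs:///events/2014-09-12")
val combined = existing.union(todays)   // cheap: no shuffle, just a new lineage node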
Hey SK,
Yeah, the documented format is the same (we expect users to add the
jar at the end) but the old spark-submit had a bug where it would
actually accept inputs that did not match the documented format. Sorry
if this was difficult to find!
- Patrick
On Fri, Sep 12, 2014 at 1:50 PM, SK
I am happy to announce the availability of Spark 1.1.0! Spark 1.1.0 is
the second release on the API-compatible 1.X line. It is Spark's
largest release ever, with contributions from 171 developers!
This release brings operational and performance improvements in Spark
core including a new
I would say that the first three are all used pretty heavily. Mesos
was the first one supported (long ago), the standalone is the
simplest and most popular today, and YARN is newer but growing a lot
in activity.
SIMR is not used as much... it was designed mostly for environments
where users had
Changing this is not supported; it is immutable, similar to other Spark
configuration settings.
On Wed, Sep 3, 2014 at 8:13 PM, 牛兆捷 nzjem...@gmail.com wrote:
Dear all:
Spark uses memory to cache RDD and the memory size is specified by
spark.storage.memoryFraction.
Once the Executor starts,
Yeah - each batch will produce a new RDD.
On Wed, Aug 27, 2014 at 3:33 PM, Soumitra Kumar
kumar.soumi...@gmail.com wrote:
Thanks.
Just to double check, rdd.id would be unique for a batch in a DStream?
On Wed, Aug 27, 2014 at 3:04 PM, Xiangrui Meng men...@gmail.com wrote:
You can use RDD
Hi All,
I want to invite users to submit to the Spark Powered By page. This page
is a great way for people to learn about Spark use cases. Since Spark
activity has increased a lot in the higher level libraries and people often
ask who uses each one, we'll include information about which
might land in Spark 1.2,
is that being tracked in https://issues.apache.org/jira/browse/SPARK-1823
or is there another ticket I should be following?
Thanks!
Andrew
On Tue, Aug 5, 2014 at 3:39 PM, Patrick Wendell pwend...@gmail.com
wrote:
Hi Jens,
Within a partition things will spill
Yep - that's correct. As an optimization we save the shuffle output and
re-use it if you execute a stage twice. So this can make A/B tests like
this a bit confusing.
- Patrick
On Friday, August 22, 2014, Nieyuan qiushuiwuh...@gmail.com wrote:
Because map-reduce tasks like join will save
Your rdd2 and rdd3 differ in two ways so it's hard to track the exact
effect of caching. In rdd3, in addition to the fact that rdd will be
cached, you are also doing a bunch of extra random number generation. So it
will be hard to isolate the effect of caching.
On Wed, Aug 20, 2014 at 7:48 AM,
For large objects, it will be more efficient to broadcast it. If your array
is small it won't really matter. How many centers do you have? Unless you
are finding that you have very large tasks (and Spark will print a warning
about this), it could be okay to just reference it directly.
On Wed,
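A rough sketch of the broadcast route, with placeholder centers; the array is
shipped once per executor instead of once per task:

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("broadcast-centers").setMaster("local[*]"))

val centers: Array[Double] = Array.fill(100)(0.5)   // placeholder data
val bcCenters = sc.broadcast(centers)

val nearest = sc.parallelize(Seq(0.1, 0.7, 0.9)).map { p =>
  val cs = bcCenters.value                          // read the broadcast copy on the executor
  cs.indices.minBy(i => math.abs(cs(i) - p))        // index of the closest center
}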
The reason is that some operators get pipelined into a single stage.
rdd.map(XX).filter(YY) - this executes in a single stage since there is no
data movement needed in between these operations.
If you call toDebugString on the final RDD it will give you some
information about the exact lineage.
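For example, a tiny sketch; the map and filter below run in one stage, and
toDebugString prints the lineage with its stage boundaries:

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("lineage-sketch").setMaster("local[*]"))

val result = sc.parallelize(1 to 100).map(_ * 2).filter(_ % 3 == 0)   // single pipelined stage
println(result.toDebugString)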
out sequentially on disk on one big file, you can call `sortByKey`
with a hashed suffix as well. The sort functions are externalized in Spark
1.1 (which is in pre-release).
- Patrick
On Tue, Aug 5, 2014 at 2:39 PM, Jens Kristian Geyti sp...@jkg.dk wrote:
Patrick Wendell wrote
In the latest
It seems possible that you are running out of memory unrolling a single
partition of the RDD. This is something that can cause your executor to
OOM, especially if the cache is close to being full so the executor doesn't
have much free memory left. How large are your executors? At the time of
BTW - the reason why the workaround could help is because when persisting
to DISK_ONLY, we explicitly avoid materializing the RDD partition in
memory... we just pass it through to disk
On Mon, Aug 4, 2014 at 1:10 AM, Patrick Wendell pwend...@gmail.com wrote:
It seems possible that you
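A minimal sketch of the workaround, with a placeholder input path; DISK_ONLY
streams each partition straight to disk, so nothing has to be unrolled in
memory:

import org.apache.spark.storage.StorageLevel
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("disk-only-cache").setMaster("local[*]"))

val big = sc.textFile("hdfs:///big/input")
big.persist(StorageLevel.DISK_ONLY)   // partitions go to disk, not the memory store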
For hortonworks, I believe it should work to just link against the
corresponding upstream version. I.e. just set the Hadoop version to 2.4.0
Does that work?
- Patrick
On Mon, Aug 4, 2014 at 12:13 AM, Ron's Yahoo! zlgonza...@yahoo.com.invalid
wrote:
Hi,
Not sure whose issue this is, but if
Are you directly caching files from Hadoop or are you doing some
transformation on them first? If you are doing a groupBy or some type of
transformation, then you could be causing data skew that way.
On Sun, Aug 3, 2014 at 1:19 PM, iramaraju iramar...@gmail.com wrote:
I am running spark 1.0.0,
If you want to customize the logging behavior - the simplest way is to copy
conf/log4j.properties.template to conf/log4j.properties. Then you can go and
modify the log level in there. The Spark shells should pick this up.
On Sun, Aug 3, 2014 at 6:16 AM, Sean Owen so...@cloudera.com wrote:
This is a Scala bug - I filed something upstream, hopefully they can fix it
soon and/or we can provide a work around:
https://issues.scala-lang.org/browse/SI-8772
- Patrick
On Fri, Aug 1, 2014 at 3:15 PM, Holden Karau hol...@pigscanfly.ca wrote:
Currently scala 2.10.2 can't be pulled in from
I've had intermittent access to the artifacts themselves, but for me the
directory listing always 404's.
I think if sbt hits a 404 on the directory, it sends a somewhat confusing
error message that it can't download the artifact.
- Patrick
On Fri, Aug 1, 2014 at 3:28 PM, Shivaram Venkataraman
All of the scripts we use to publish Spark releases are in the Spark
repo itself, so you could follow these as a guideline. The publishing
process in Maven is similar to in SBT:
https://github.com/apache/spark/blob/master/dev/create-release/create-release.sh#L65
On Mon, Jul 28, 2014 at 12:39 PM,
Adding new build modules is pretty high overhead, so if this is a case
where a small amount of duplicated code could get rid of the
dependency, that could also be a good short-term option.
- Patrick
On Mon, Jul 14, 2014 at 2:15 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
Yeah, I'd just add
I am happy to announce the availability of Spark 1.0.1! This release
includes contributions from 70 developers. Spark 1.0.1 includes fixes
across several areas of Spark, including the core API, PySpark, and
MLlib. It also includes new features in Spark's (alpha) SQL library,
including support for
There isn't currently a way to do this, but it will start dropping
older applications once more than 200 are stored.
On Wed, Jul 9, 2014 at 4:04 PM, Haopu Wang hw...@qilinsoft.com wrote:
Besides restarting the Master, is there any other way to clear the
Completed Applications in Master web UI?
It fulfills a few different functions. The main one is giving users a
way to inject Spark as a runtime dependency separately from their
program and make sure they get exactly the right version of Spark. So
a user can bundle an application and then use spark-submit to send it
to different types of
Hey Mikhail,
I think (hope?) the -em and -dm options were never in an official
Spark release. They were just in the master branch at some point. Did
you use these during a previous Spark release or were you just on
master?
- Patrick
On Wed, Jul 9, 2014 at 9:18 AM, Mikhail Strebkov
Hi There,
There is an issue with PySpark-on-YARN that requires users build with
Java 6. The issue has to do with how Java 6 and 7 package jar files
differently.
Can you try building spark with Java 6 and trying again?
- Patrick
On Fri, Jun 27, 2014 at 5:00 PM, sdeb sangha...@gmail.com wrote:
Hey There,
I'd like to start voting on this release shortly because there are a
few important fixes that have queued up. We're just waiting to fix an
akka issue. I'd guess we'll cut a vote in the next few days.
- Patrick
On Thu, Jun 19, 2014 at 10:47 AM, Mingyu Kim m...@palantir.com wrote:
Hi
I'll make a comment on the JIRA - thanks for reporting this, let's get
to the bottom of it.
On Thu, Jun 19, 2014 at 11:19 AM, Surendranauth Hiraman
suren.hira...@velos.io wrote:
I've created an issue for this but if anyone has any advice, please let me
know.
Basically, on about 10 GBs of
Hey Jeremy,
This is patched in the 1.0 and 0.9 branches of Spark. We're likely to
make a 1.0.1 release soon (this patch being one of the main reasons),
but if you are itching for this sooner, you can just checkout the head
of branch-1.0 and you will be able to use r3.XXX instances.
- Patrick
On
By the way, in case it's not clear, I mean our maintenance branches:
https://github.com/apache/spark/tree/branch-1.0
On Tue, Jun 17, 2014 at 8:35 PM, Patrick Wendell pwend...@gmail.com wrote:
Hey Jeremy,
This is patched in the 1.0 and 0.9 branches of Spark. We're likely to
make a 1.0.1
to lean in the
direction of Cassandra as the distributed data store...
On Wed, Jun 18, 2014 at 1:46 PM, Patrick Wendell pwend...@gmail.com wrote:
By the way, in case it's not clear, I mean our maintenance branches:
https://github.com/apache/spark/tree/branch-1.0
On Tue, Jun 17, 2014 at 8:35
These paths get passed directly to the Hadoop FileSystem API and I
think they support globbing out of the box. So AFAIK it should just
work.
On Tue, Jun 17, 2014 at 9:09 PM, MEETHU MATHEW meethu2...@yahoo.co.in wrote:
Hi Jianshi,
I have used wild card characters (*) in my program and it
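A minimal sketch with a placeholder path; the glob is expanded by the Hadoop
FileSystem API before the files are read:

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("glob-paths").setMaster("local[*]"))

val logs = sc.textFile("hdfs:///logs/2014/06/*/part-*")
println(logs.count())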
Out of curiosity - are you guys using speculation, shuffle
consolidation, or any other non-default option? If so that would help
narrow down what's causing this corruption.
On Tue, Jun 17, 2014 at 10:40 AM, Surendranauth Hiraman
suren.hira...@velos.io wrote:
Matt/Ryan,
Did you make any headway
If you run locally then Spark doesn't launch remote executors. However,
in this case you can set the memory with the --driver-memory flag to
spark-submit. Does that work?
- Patrick
On Mon, Jun 9, 2014 at 3:24 PM, Henggang Cui cuihengg...@gmail.com wrote:
Hi,
I'm trying to run the SimpleApp
Paul,
Could you give the version of Java that you are building with and the
version of Java you are running with? Are they the same?
Just off the cuff, I wonder if this is related to:
https://issues.apache.org/jira/browse/SPARK-1520
If it is, it could appear that certain functions are not in
Also I should add - thanks for taking time to help narrow this down!
On Sun, Jun 8, 2014 at 1:02 PM, Patrick Wendell pwend...@gmail.com wrote:
Paul,
Could you give the version of Java that you are building with and the
version of Java you are running with? Are they the same?
Just off
12:05
org/apache/spark/rdd/RDD$anonfun$saveAsTextFile$1.class
1560 06-08-14 12:05
org/apache/spark/rdd/RDD$anonfun$saveAsTextFile$2.class
Best.
-- Paul
—
p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
On Sun, Jun 8, 2014 at 1:02 PM, Patrick Wendell pwend
In 1.0+ you can just pass the --executor-memory flag to ./bin/spark-shell.
On Fri, Jun 6, 2014 at 12:32 AM, Oleg Proudnikov
oleg.proudni...@gmail.com wrote:
Thank you, Hassan!
On 6 June 2014 03:23, hassan hellfire...@gmail.com wrote:
just use -Dspark.executor.memory=
--
View this
They are forked and slightly modified for two reasons:
(a) Hive embeds a bunch of other dependencies in their published jars
such that it makes it really hard for other projects to depend on
them. If you look at the hive-exec jar they copy a bunch of other
dependencies directly into this jar. We
Hey, thanks a lot for reporting this. Do you mind making a JIRA with
the details so we can track it?
- Patrick
On Wed, Jun 4, 2014 at 9:24 AM, Marek Wiewiorka
marek.wiewio...@gmail.com wrote:
Exactly the same story - it used to work with 0.9.1 and does not work
anymore with 1.0.0.
I ran tests
Hey There,
This is only possible in Scala right now. However, this is almost
never needed since the core API is fairly flexible. I have the same
question as Andrew... what are you trying to do with your RDD?
- Patrick
On Wed, Jun 4, 2014 at 7:49 AM, Andrew Ash and...@andrewash.com wrote:
Just
Hey Chirag,
Those init scripts are part of the Cloudera Spark package (they are
not in the Spark project itself) so you might try e-mailing their
support lists directly.
- Patrick
On Wed, Jun 4, 2014 at 7:19 AM, chirag lakhani chirag.lakh...@gmail.com wrote:
I recently spun up an AWS cluster
Hey Jeremy,
The issue is that you are using one of the external libraries and
these aren't actually packaged with Spark on the cluster, so you need
to create an uber jar that includes them.
You can look at the example here (I recently did this for a kafka
project and the idea is the same):
Hey Sam,
You mentioned two problems here, did your VPC error message get fixed
or only the key permissions problem?
I noticed we had some report a similar issue with the VPC stuff a long
time back (but there is no real resolution here):
https://spark-project.atlassian.net/browse/SPARK-1166
If
You can set an arbitrary properties file by adding --properties-file
argument to spark-submit. It would be nice to have spark-submit also
look in SPARK_CONF_DIR as well by default. If you opened a JIRA for
that I'm sure someone would pick it up.
On Tue, Jun 3, 2014 at 7:47 AM, Eugen Cepoi
.
-Simon
On Sun, Jun 1, 2014 at 9:03 PM, Patrick Wendell pwend...@gmail.com
wrote:
As a debugging step, does it work if you use a single resource manager
with the key yarn.resourcemanager.address instead of using two named
resource managers? I wonder if somehow the YARN client can't
Hey There,
The issue was that the old behavior could cause users to silently
overwrite data, which is pretty bad, so to be conservative we decided
to enforce the same checks that Hadoop does.
This was documented by this JIRA:
https://issues.apache.org/jira/browse/SPARK-1100
-1677 is talking about
the same thing?
How about assigning it to me?
I think I missed the configuration part in my previous commit, though I
declared that in the PR description
Best,
--
Nan Zhu
On Monday, June 2, 2014 at 3:03 PM, Patrick Wendell wrote:
Hey There,
The issue