These paths get passed directly to the Hadoop FileSystem API and I
think they support globbing out-of-the-box. So AFAIK it should just
work.
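For example, something like this should pick up all matching files (a
sketch; the path is made up):

val logs = sc.textFile("hdfs:///logs/2014-06-*/part-*")   // Hadoop-style glob
logs.count()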
On Tue, Jun 17, 2014 at 9:09 PM, MEETHU MATHEW meethu2...@yahoo.co.in wrote:
Hi Jianshi,
I have used wild card characters (*) in my program and it
Out of curiosity - are you guys using speculation, shuffle
consolidation, or any other non-default option? If so that would help
narrow down what's causing this corruption.
On Tue, Jun 17, 2014 at 10:40 AM, Surendranauth Hiraman
suren.hira...@velos.io wrote:
Matt/Ryan,
Did you make any headway
I'll make a comment on the JIRA - thanks for reporting this, let's get
to the bottom of it.
On Thu, Jun 19, 2014 at 11:19 AM, Surendranauth Hiraman
suren.hira...@velos.io wrote:
I've created an issue for this but if anyone has any advice, please let me
know.
Basically, on about 10 GBs of
Hey There,
I'd like to start voting on this release shortly because there are a
few important fixes that have queued up. We're just waiting to fix an
akka issue. I'd guess we'll cut a vote in the next few days.
- Patrick
On Thu, Jun 19, 2014 at 10:47 AM, Mingyu Kim m...@palantir.com wrote:
Hi
Hi There,
There is an issue with PySpark-on-YARN that requires users to build with
Java 6. The issue has to do with how Java 6 and 7 package jar files
differently.
Can you try building spark with Java 6 and trying again?
- Patrick
On Fri, Jun 27, 2014 at 5:00 PM, sdeb sangha...@gmail.com wrote:
There isn't currently a way to do this, but it will start dropping
older applications once more than 200 are stored.
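If you want a different cutoff, I believe the relevant knob is
spark.deploy.retainedApplications (default 200), set on the Master,
e.g. in conf/spark-env.sh (a sketch):

export SPARK_MASTER_OPTS="-Dspark.deploy.retainedApplications=500"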
On Wed, Jul 9, 2014 at 4:04 PM, Haopu Wang hw...@qilinsoft.com wrote:
Besides restarting the Master, is there any other way to clear the
Completed Applications in Master web UI?
It fulfills a few different functions. The main one is giving users a
way to inject Spark as a runtime dependency separately from their
program and make sure they get exactly the right version of Spark. So
a user can bundle an application and then use spark-submit to send it
to different types of
Hey Mikhail,
I think (hope?) the -em and -dm options were never in an official
Spark release. They were just in the master branch at some point. Did
you use these during a previous Spark release or were you just on
master?
- Patrick
On Wed, Jul 9, 2014 at 9:18 AM, Mikhail Strebkov streb
I am happy to announce the availability of Spark 1.0.1! This release
includes contributions from 70 developers. Spark 1.0.1 includes fixes
across several areas of Spark, including the core API, PySpark, and
MLlib. It also includes new features in Spark's (alpha) SQL library,
including support for
Adding new build modules is pretty high overhead, so if this is a case
where a small amount of duplicated code could get rid of the
dependency, that could also be a good short-term option.
- Patrick
On Mon, Jul 14, 2014 at 2:15 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
Yeah, I'd just add
All of the scripts we use to publish Spark releases are in the Spark
repo itself, so you could follow these as a guideline. The publishing
process in Maven is similar to in SBT:
https://github.com/apache/spark/blob/master/dev/create-release/create-release.sh#L65
On Mon, Jul 28, 2014 at 12:39 PM,
Hi,
We would like to use Spark SQL to store data in Parquet format and then
query that data using Impala.
We've tried to come up with a solution and it is working but it doesn't
seem good. So I was wondering if you guys could tell us what is the
correct way to do this. We are using Spark 1.0
insert data from SparkSQL into a Parquet table which can be
directly queried by Impala?
Best regards,
Patrick
On 1 August 2014 16:18, Patrick McGloin mcgloin.patr...@gmail.com wrote:
Hi,
We would like to use Spark SQL to store data in Parquet format and then
query that data using Impala
This is a Scala bug - I filed something upstream, hopefully they can fix it
soon and/or we can provide a workaround:
https://issues.scala-lang.org/browse/SI-8772
- Patrick
On Fri, Aug 1, 2014 at 3:15 PM, Holden Karau hol...@pigscanfly.ca wrote:
Currently scala 2.10.2 can't be pulled in from
I've had intermittent access to the artifacts themselves, but for me the
directory listing always 404's.
I think if sbt hits a 404 on the directory, it sends a somewhat confusing
error message that it can't download the artifact.
- Patrick
On Fri, Aug 1, 2014 at 3:28 PM, Shivaram Venkataraman
of the best
practice for loading data into Parquet tables. Is the way we are doing the
Spark part correct in your opinion?
Best regards,
Patrick
On 1 August 2014 19:32, Michael Armbrust mich...@databricks.com wrote:
So is the only issue that impala does not see changes until you refresh
If you want to customize the logging behavior - the simplest way is to copy
conf/log4j.properties.template to conf/log4j.properties. Then you can go and
modify the log level in there. The spark shells should pick this up.
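For example (the WARN level is just an illustration):

cp conf/log4j.properties.template conf/log4j.properties
# then edit conf/log4j.properties, e.g.:
# log4j.rootCategory=WARN, console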
On Sun, Aug 3, 2014 at 6:16 AM, Sean Owen so...@cloudera.com wrote:
/spark/pull/1165
A (potential) workaround would be to first persist your data to disk, then
re-partition it, then cache it. I'm not 100% sure whether that will work
though.
val a = sc.textFile("s3n://some-path/*.json")
  .persist(StorageLevel.DISK_ONLY)
  .repartition(largerNrOfPartitions)  // i.e. some larger number of partitions
  .cache()
- Patrick
On Fri
BTW - the reason why the workaround could help is because when persisting
to DISK_ONLY, we explicitly avoid materializing the RDD partition in
memory... we just pass it through to disk
On Mon, Aug 4, 2014 at 1:10 AM, Patrick Wendell pwend...@gmail.com wrote:
It seems possible that you
For hortonworks, I believe it should work to just link against the
corresponding upstream version. I.e. just set the Hadoop version to 2.4.0
Does that work?
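Something like this should do it (a sketch of the Maven build; adjust
profiles to taste):

mvn -Dhadoop.version=2.4.0 -DskipTests clean package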
- Patrick
On Mon, Aug 4, 2014 at 12:13 AM, Ron's Yahoo! zlgonza...@yahoo.com.invalid
wrote:
Hi,
Not sure whose issue
Are you directly caching files from Hadoop or are you doing some
transformation on them first? If you are doing a groupBy or some type of
transformation, then you could be causing data skew that way.
On Sun, Aug 3, 2014 at 1:19 PM, iramaraju iramar...@gmail.com wrote:
I am running spark 1.0.0,
out sequentially on disk on one big file, you can call `sortByKey`
with a hashed suffix as well. The sort functions are externalized in Spark
1.1 (which is in pre-release).
- Patrick
On Tue, Aug 5, 2014 at 2:39 PM, Jens Kristian Geyti sp...@jkg.dk wrote:
Patrick Wendell wrote
In the latest
for a collection of
types I had.
Best regards,
Patrick
On 6 August 2014 07:58, Amit Kumar kumarami...@gmail.com wrote:
Hi All,
I am having some trouble trying to write generic code that uses sqlContext
and RDDs. Can you suggest what might be wrong?
class SparkTable[T : ClassTag](val
Your rdd2 and rdd3 differ in two ways so it's hard to track the exact
effect of caching. In rdd3, in addition to the fact that rdd will be
cached, you are also doing a bunch of extra random number generation. So it
will be hard to isolate the effect of caching.
On Wed, Aug 20, 2014 at 7:48 AM,
For large objects, it will be more efficient to broadcast it. If your array
is small it won't really matter. How many centers do you have? Unless you
are finding that you have very large tasks (and Spark will print a warning
about this), it could be okay to just reference it directly.
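A rough sketch of the broadcast approach (all names here are made up):

val centers = Array(1.0, 5.0, 9.0)              // toy centers
val bcCenters = sc.broadcast(centers)
val assigned = data.map { x =>
  bcCenters.value.minBy(c => math.abs(c - x))   // nearest center, read from the broadcast
}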
On Wed,
The reason is that some operators get pipelined into a single stage.
rdd.map(XX).filter(YY) - this executes in a single stage since there is no
data movement needed in between these operations.
If you call toDebugString on the final RDD it will give you some
information about the exact lineage.
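E.g. in the shell:

val rdd = sc.textFile("input.txt").map(_.length).filter(_ > 0)
println(rdd.toDebugString)   // map and filter show up pipelined into one stage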
Yep - that's correct. As an optimization we save the shuffle output and
re-use it if you execute a stage twice. So this can make A:B tests like
this a bit confusing.
- Patrick
On Friday, August 22, 2014, Nieyuan qiushuiwuh...@gmail.com wrote:
Because map-reduce tasks like join will save
Hey Andrew,
We might create a new JIRA for it, but it doesn't exist yet. We'll create
JIRAs for the major 1.2 issues at the beginning of September.
- Patrick
On Mon, Aug 25, 2014 at 8:53 AM, Andrew Ash and...@andrewash.com wrote:
Hi Patrick,
For the spilling within one key work you mention
:
https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark
- Patrick
Yeah - each batch will produce a new RDD.
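E.g., a quick way to see it (sketch):

dstream.foreachRDD { rdd =>
  println("batch RDD id: " + rdd.id)   // a different id for every batch
}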
On Wed, Aug 27, 2014 at 3:33 PM, Soumitra Kumar
kumar.soumi...@gmail.com wrote:
Thanks.
Just to double check, rdd.id would be unique for a batch in a DStream?
On Wed, Aug 27, 2014 at 3:04 PM, Xiangrui Meng men...@gmail.com wrote:
You can use RDD
Changing this is not supported; it is immutable, like other Spark
configuration settings.
On Wed, Sep 3, 2014 at 8:13 PM, 牛兆捷 nzjem...@gmail.com wrote:
Dear all:
Spark uses memory to cache RDD and the memory size is specified by
spark.storage.memoryFraction.
Once the Executor starts,
I would say that the first three are all used pretty heavily. Mesos
was the first one supported (long ago), the standalone is the
simplest and most popular today, and YARN is newer but growing a lot
in activity.
SIMR is not used as much... it was designed mostly for environments
where users had
, and congratulations!
- Patrick
[moving to user@]
This would typically be accomplished with a union() operation. You
can't mutate an RDD in-place, but you can create a new RDD with a
union() which is an inexpensive operator.
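For example (names are hypothetical):

val updated = existing.union(newRecords)   // no data movement, just combines lineage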
On Fri, Sep 12, 2014 at 5:28 AM, Archit Thakur
archit279tha...@gmail.com wrote:
Hi,
We have a use
Hey SK,
Yeah, the documented format is the same (we expect users to add the
jar at the end) but the old spark-submit had a bug where it would
actually accept inputs that did not match the documented format. Sorry
if this was difficult to find!
- Patrick
On Fri, Sep 12, 2014 at 1:50 PM, SK
Yeah that issue has been fixed by adding better docs, it just didn't make
it in time for the release:
https://github.com/apache/spark/blob/branch-1.1/make-distribution.sh#L54
On Thu, Sep 11, 2014 at 11:57 PM, Zhanfeng Huo huozhanf...@gmail.com
wrote:
resolved:
./make-distribution.sh --name
If each partition can fit in memory, you can do this using
mapPartitions and then building an inverse mapping within each
partition. You'd need to construct a hash map within each partition
yourself.
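A rough sketch, assuming an RDD[(Int, String)] where each partition's
inverse map fits in memory:

val inverted = pairs.mapPartitions { iter =>
  val m = scala.collection.mutable.HashMap[String, Int]()
  iter.foreach { case (k, v) => m(v) = k }   // build the inverse mapping per partition
  m.iterator                                 // emit the (value, key) pairs
}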
On Tue, Sep 16, 2014 at 4:27 PM, Akshat Aranya aara...@gmail.com wrote:
I have a use case where
...@gmail.com wrote:
Patrick,
If I understand this correctly, I won't be able to do this in the closure
provided to mapPartitions() because that's going to be stateless, in the
sense that a hash map that I create within the closure would only be useful
for one call of MapPartitionsRDD.compute(). I
Hey Grzegorz,
EMR is a service that is not maintained by the Spark community. So
this list isn't the right place to ask EMR questions.
- Patrick
On Thu, Sep 18, 2014 at 3:19 AM, Grzegorz Białek
grzegorz.bia...@codilime.com wrote:
Hi,
I would like to run Spark application on Amazon EMR. I have
doesn't find the class. Here is the command:
sudo ./spark-submit --class aac.main.SparkDriver --master
spark://localhost:7077 --jars AAC-assembly-1.0.jar aacApp_2.10-1.0.jar
Any pointers would be appreciated!
Best regards,
Patrick
FYI, in case anybody else has this problem, we switched to Spark 1.1
(outside CDH) and the same Spark application worked first time (once
recompiled with Spark 1.1 libs of course). I assume this is because Spark
1.1 is compiled with Hive.
On 29 September 2014 17:41, Patrick McGloin mcgloin.patr
IIRC - the random is seeded with the index, so it will always produce
the same result for the same index. Maybe I don't totally follow
though. Could you give a small example of how this might change the
RDD ordering in a way that you don't expect? In general repartition()
will not preserve the
Spark will need to connect both to the hive metastore and to all HDFS
nodes (NN and DN's). If that is all in place then it should work. In
this case it looks like maybe it can't connect to a datanode in HDFS
to get the raw data. Keep in mind that the performance might not be
very good if you are
do a mvn install first then (I
think) you can test sub-modules independently:
mvn test -pl streaming ...
- Patrick
On Wed, Oct 22, 2014 at 10:00 PM, Ryan Williams
ryan.blake.willi...@gmail.com wrote:
I started building Spark / running Spark tests this weekend and on maybe
5-10 occasions have run
It shows the amount of memory used to store RDD blocks, which are created
when you run .cache()/.persist() on an RDD.
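E.g., after something like this you should see that column go up on the
executors holding the blocks:

val cached = sc.textFile("input.txt").cache()
cached.count()   // materializes the cached blocks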
On Wed, Oct 22, 2014 at 10:07 PM, Haopu Wang hw...@qilinsoft.com wrote:
Hi, please take a look at the attached screen-shot. I wonders what's the
Memory Used column mean.
I
the following
error is logged by the worker who tries to use Akka Camel:
-- Forwarded message --
From: Patrick McGloin mcgloin.patr...@gmail.com
Date: 24 October 2014 15:09
Subject: Re: [akka-user] Akka Camel plus Spark Streaming
To: akka-u...@googlegroups.com
Hi Patrik,
Thanks
it is in the assembled jar file. Please see the mails below,
which I sent to the Akka group for details.
Is there something I am doing wrong? Is there a way to get the Akka
Cluster to load the reference.conf from Camel?
Any help greatly appreciated!
Best regards,
Patrick
On 27 October 2014 11:33, Patrick
/browse/SPARK-4114
This is a very important issue for Spark SQL, so I'd welcome comments
on that JIRA from anyone who is familiar with Hive/HCatalog internals.
- Patrick
On Mon, Oct 27, 2014 at 9:54 PM, Cheng, Hao hao.ch...@intel.com wrote:
Hi, all
I have some PRs blocked by hive upgrading
or two cases we've exposed functions that rely
on this:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala#L334
I would expect more robust support for online aggregation to show up
in a future version of Spark.
- Patrick
On Tue, Oct 28
The doc build appears to be broken in master. We'll get it patched up
before the release:
https://issues.apache.org/jira/browse/SPARK-4326
On Tue, Nov 11, 2014 at 10:50 AM, Alessandro Baretta
alexbare...@gmail.com wrote:
Nichols and Patrick,
Thanks for your help, but, no, it still does
Hi There,
Because Akka versions are not binary compatible with one another, it
might not be possible to integrate Play with Spark 1.1.0.
- Patrick
On Tue, Nov 11, 2014 at 8:21 AM, Akshat Aranya aara...@gmail.com wrote:
Hi,
Sorry if this has been asked before; I didn't find a satisfactory
It looks like you are trying to directly import the toLocalIterator
function. You can't import functions, it should just appear as a
method of an existing RDD if you have one.
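I.e.:

val it = myRdd.toLocalIterator   // a method on an RDD instance, not an importable function
it.take(5).foreach(println)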
- Patrick
On Thu, Nov 13, 2014 at 10:21 PM, Deep Pradhan
pradhandeep1...@gmail.com wrote:
Hi,
I am using Spark 1.0.0
Dear all,
Currently, I am running spark standalone cluster with ~100 nodes.
Multiple users can connect to the cluster by Spark-shell or PyShell.
However, I can't find an efficient way to control the resources among multiple
users.
I can set spark.deploy.defaultCores in the server side to
not do this.
- Patrick
On Wed, Nov 26, 2014 at 1:45 AM, Judy Nash
judyn...@exchange.microsoft.com wrote:
Looks like a config issue. I ran spark-pi job and still failing with the
same guava error
Command ran:
.\bin\spark-class.cmd org.apache.spark.deploy.SparkSubmit --class
50: invokestatic #502 // Method org/spark-project/guava/common/base/Preconditions.checkArgument:(ZLjava/lang/Object;)V
On Wed, Nov 26, 2014 at 11:08 AM, Patrick Wendell pwend...@gmail.com wrote:
Hi Judy,
Are you somehow
I recently posted instructions on loading Spark in Intellij from scratch:
https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-BuildingSparkinIntelliJIDEA
You need to do a few extra steps for the YARN project to work.
Also, for questions like this that
present it can cause issues.
On Sun, Nov 30, 2014 at 10:53 PM, Judy Nash
judyn...@exchange.microsoft.com wrote:
Thanks Patrick and Cheng for the suggestions.
The issue was Hadoop common jar was added to a classpath. After I removed
Hadoop common jar from both master and slave, I was able
Thanks for flagging this. I reverted the relevant YARN fix in the Spark
1.2 release. We can try to debug this in master.
On Thu, Dec 4, 2014 at 9:51 PM, Jianshi Huang jianshi.hu...@gmail.com wrote:
I created a ticket for this:
https://issues.apache.org/jira/browse/SPARK-4757
Jianshi
On Fri,
Yeah the main way to do this would be to have your own static cache of
connections. These could be using an object in Scala or just a static
variable in Java (for instance a set of connections that you can
borrow from).
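A rough sketch of the static-cache approach in Scala (the JDBC URL and
the SQL are placeholders):

object ConnectionCache {
  import java.sql.{Connection, DriverManager}
  // one connection per executor JVM, created lazily on first use
  lazy val conn: Connection = DriverManager.getConnection("jdbc:h2:mem:cache")
}

dstream.foreachRDD { rdd =>
  rdd.foreachPartition { records =>
    val stmt = ConnectionCache.conn.createStatement()   // connection reused across tasks
    records.foreach(r => stmt.executeUpdate(s"INSERT INTO sink VALUES ('$r')"))  // placeholder SQL
    stmt.close()
  }
}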
- Patrick
On Thu, Dec 4, 2014 at 5:26 PM, Tobias Pfeiffer t...@preferred.jp
.
- Patrick
On Fri, Dec 12, 2014 at 10:06 AM, Manoj Samel manojsamelt...@gmail.com wrote:
Thanks Marcelo.
Spark Gurus/Databricks team - do you have something in roadmap for such a
spark server ?
Thanks,
On Thu, Dec 11, 2014 at 5:43 PM, Marcelo Vanzin van...@cloudera.com wrote:
Oops, sorry
to produce a side effect and map for something that will
return a new dataset.
On Wed, Dec 17, 2014 at 5:43 AM, Gerard Maas gerard.m...@gmail.com wrote:
Patrick,
I was wondering why one would choose for rdd.map vs rdd.foreach to execute a
side-effecting function on an RDD.
-kr, Gerard.
On Sat, Dec 6
I'm happy to announce the availability of Spark 1.2.0! Spark 1.2.0 is
the third release on the API-compatible 1.X line. It is Spark's
largest release ever, with contributions from 172 developers and more
than 1,000 commits!
This release brings operational and performance improvements in Spark
Is it sufficient to set spark.hadoop.validateOutputSpecs to false?
http://spark.apache.org/docs/latest/configuration.html
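I.e., when constructing the context (or via --conf on spark-submit):

val conf = new SparkConf()
  .set("spark.hadoop.validateOutputSpecs", "false")   // lets saveAsTextFile overwrite existing output
val sc = new SparkContext(conf)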
- Patrick
On Wed, Dec 24, 2014 at 10:52 PM, Shao, Saisai saisai.s...@intel.com wrote:
Hi,
We have such requirements to save RDD output to HDFS with saveAsTextFile
like
alternatives. This is already pretty easy IMO.
- Patrick
On Wed, Dec 24, 2014 at 11:28 PM, Cheng, Hao hao.ch...@intel.com wrote:
I am wondering if we can provide more friendly API, other than configuration
for this purpose. What do you think Patrick?
Cheng Hao
-Original Message-
From
be referenced. If you are
seeing a large build up of shuffle data, it's possible you are
retaining references to older RDDs inadvertently. Could you explain
what your job is actually doing?
- Patrick
On Mon, Dec 22, 2014 at 2:36 PM, Ganelin, Ilya
ilya.gane...@capitalone.com wrote:
Hi all, I have a long running
Hey Eric,
I'm just curious - which specific features in 1.2 do you find most
help with usability? This is a theme we're focusing on for 1.3 as
well, so it's helpful to hear what makes a difference.
- Patrick
On Sun, Dec 28, 2014 at 1:36 AM, Eric Friedman
eric.d.fried...@gmail.com wrote:
Hi
Akhil,
Those are handled by ASF infrastructure, not anyone in the Spark
project. So this list is not the appropriate place to ask for help.
- Patrick
On Sat, Jan 17, 2015 at 12:56 AM, Akhil Das ak...@sigmoidanalytics.com wrote:
My mails to the mailing list are getting rejected, have opened
It should appear in the page for any stage in which accumulators are updated.
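E.g. (a sketch, using the 1.x API):

val acc = sc.accumulator(0, "records seen")   // the name is what makes it show in the UI
rdd.foreach(_ => acc += 1)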
On Wed, Jan 14, 2015 at 6:46 PM, Justin Yip yipjus...@prediction.io wrote:
Hello,
From accumulator documentation, it says that if the accumulator is named, it
will be displayed in the WebUI. However, I cannot find
partition.
- Patrick
On Wed, Feb 11, 2015 at 9:37 PM, fightf...@163.com fightf...@163.com wrote:
Hi,
We really don't have an adequate solution for this issue yet. Any applicable
analytical rules or hints would be appreciated.
Thanks,
Sun.
fightf...@163.com
From: fightf
Hadoop stack via something like YARN.
- Patrick
On Mon, Feb 2, 2015 at 12:24 AM, Shao, Saisai saisai.s...@intel.com wrote:
Hi all,
I have some questions about the future development of Spark's standalone
resource scheduler. We've heard some users have the requirements to have
multi-tenant
I think there is a minor error here in that the first example needs a
tail after the seq:
df.map { row =>
  (row.getDouble(0), row.toSeq.tail.map(_.asInstanceOf[Double]))
}.toDataFrame("label", "features")
On Wed, Feb 11, 2015 at 7:46 PM, Michael Armbrust
mich...@databricks.com wrote:
It sounds like
You may need to add the -Phadoop-2.4 profile. When building our release
packages for Hadoop 2.4 we use the following flags:
-Phadoop-2.4 -Phive -Phive-thriftserver -Pyarn
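I.e., the full command would be something like:

mvn -Phadoop-2.4 -Phive -Phive-thriftserver -Pyarn -DskipTests clean package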
- Patrick
On Thu, Mar 5, 2015 at 12:47 PM, Kelly, Jonathan jonat...@amazon.com wrote:
I confirmed that this has nothing
We don't support expressions or wildcards in that configuration. For
each application, the local directories need to be constant. If you
have users submitting different Spark applications, those can each set
spark.local.dirs.
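E.g., per application (a sketch; I believe the property key itself is
spark.local.dir, and the paths here are made up):

./bin/spark-submit --conf spark.local.dir=/mnt/disk1/tmp,/mnt/disk2/tmp ...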
- Patrick
On Wed, Mar 11, 2015 at 12:14 AM, Jianshi Huang jianshi.hu
Hi All,
I'm happy to announce the availability of Spark 1.3.0! Spark 1.3.0 is
the fourth release on the API-compatible 1.X line. It is Spark's
largest release ever, with contributions from 172 developers and more
than 1,000 commits!
Visit the release notes [1] to read about the new features, or
have to be on the
internal form, not the user visible form.
On Tue, Mar 24, 2015 at 12:25 PM, Patrick Woody patrick.woo...@gmail.com
wrote:
Hey all,
Currently looking into UDTs and I was wondering if it is reasonable to
add the ability to define an Ordering (or if this is possible, then how
Hey Jim,
Thanks for reporting this. Can you give a small end-to-end code
example that reproduces it? If so, we can definitely fix it.
- Patrick
On Tue, Mar 24, 2015 at 4:55 PM, Jim Carroll jimfcarr...@gmail.com wrote:
I have code that works under 1.2.1 but when I upgraded to 1.3.0 it fails
I think we need to just update the docs, it is a bit unclear right
now. At the time, we made it worded fairly sternly because we really
wanted people to use --jars when we deprecated SPARK_CLASSPATH. But
there are other types of deployments where there is a legitimate need
to augment the classpath
not starting correctly.
- Patrick
On Mon, Feb 23, 2015 at 1:13 AM, Oleg Shirokikh o...@solver.com wrote:
Patrick,
I haven't changed the configs much. I just executed ec2-script to create 1
master, 2 slaves cluster. Then I try to submit the jobs from remote machine
leaving all defaults configured
into, but past 255, you run into underlying
limitations of the JVM https://issues.scala-lang.org/browse/SI-7324).
Best,
Patrick
On Thu, Feb 26, 2015 at 11:58 AM, anamika gupta anamika.guo...@gmail.com
wrote:
Hi Patrick
Thanks a ton for your in-depth answer. The compilation error is now
Added - thanks! I trimmed it down a bit to fit our normal description length.
On Mon, Jan 5, 2015 at 8:24 AM, Thomas Stone tho...@prediction.io wrote:
Please can we add PredictionIO to
https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark
PredictionIO
http://prediction.io/
I've added it, thanks!
On Fri, Feb 20, 2015 at 12:22 AM, Emre Sevinc emre.sev...@gmail.com wrote:
Hello,
Could you please add Big Industries to the Powered by Spark page at
https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark ?
Company Name: Big Industries
URL:
reporting the result back to the driver.
This means you need to make sure the side-effects are idempotent, or
use some transactional locking. Spark's own output operations, such as
saving to Hadoop, use such mechanisms. For instance, in the case of
Hadoop it uses the OutputCommitter classes.
- Patrick
The source code should match the Spark commit
4aaf48d46d13129f0f9bdafd771dd80fe568a7dc. Do you see any differences?
On Fri, Mar 27, 2015 at 11:28 AM, Manoj Samel manojsamelt...@gmail.com wrote:
While looking into an issue, I noticed that the source displayed on the GitHub
site does not match the
an iterator.
- Patrick
On Thu, Mar 26, 2015 at 3:07 PM, Jonathan Coveney jcove...@gmail.com wrote:
This is just a deficiency of the api, imo. I agree: mapValues could
definitely be a function (K, V) => V1. The option isn't set by the function,
it's on the RDD. So you could look at the code and do
Hey all,
Currently looking into UDTs and I was wondering if it is reasonable to add
the ability to define an Ordering (or if this is possible, then how)?
Currently it will throw an error when non-Native types are used.
Thanks!
-Pat
warnings on the executors, not
the driver. Correct?
- Patrick
On Mon, Mar 23, 2015 at 10:21 AM, Martin Goodson mar...@skimlinks.com
wrote:
Have you tried to repartition() your original data to make more partitions
before you aggregate?
--
Martin Goodson | VP Data Science
(0)20 3397 1240
that would make sense.
- Patrick
On Mon, Apr 13, 2015 at 8:19 AM, Jonathan Coveney jcove...@gmail.com wrote:
I'm surprised that I haven't been able to find this via google, but I
haven't...
What is the setting that requests some amount of disk space for the
executors? Maybe I'm misunderstanding
Hi Deepak - please direct this to the user@ list. This list is for
development of Spark itself.
On Sun, Apr 26, 2015 at 12:42 PM, Deepak Gopalakrishnan
dgk...@gmail.com wrote:
Hello All,
I'm trying to process a 3.5GB file on standalone mode using spark. I could
run my spark job successfully on
: http://spark.apache.org/releases/spark-release-1-3-1.html
1.2.2: http://spark.apache.org/releases/spark-release-1-2-2.html
Comprehensive list of fixes:
1.3.1: http://s.apache.org/spark-1.3.1
1.2.2: http://s.apache.org/spark-1.2.2
Thanks to everyone who worked on these releases!
- Patrick
-images-td6752.html
Further, I'd like to have the imagery in HDFS rather than on the file
system to avoid I/O bottlenecks if possible!
Thanks for any ideas and advice!
-Patrick
the job
will fail if shuffle output exceeds memory.
- Patrick
On Wed, Jun 10, 2015 at 9:50 PM, Davies Liu dav...@databricks.com wrote:
If you have enough memory, you can put the temporary work directory in
tempfs (in memory file system).
On Wed, Jun 10, 2015 at 8:43 PM, Corey Nolet cjno
Hey all,
I've recently run into an issue where spark dynamicAllocation has asked for
-1 executors from YARN. Unfortunately, this raises an exception that kills
the executor-allocation thread and the application can't request more
resources.
Has anyone seen this before? It is spurious and the
In many cases the shuffle will actually hit the OS buffer cache and
not ever touch spinning disk if it is a size that is less than memory
on the machine.
- Patrick
On Wed, Jun 10, 2015 at 5:06 PM, Corey Nolet cjno...@gmail.com wrote:
So with this... to help my understanding of Spark under
Hey Sandy,
I'll test it out on 1.4. Do you have a bug number or PR that I could reference
as well?
Thanks!
-Pat
Sent from my iPhone
On Jun 13, 2015, at 11:38 AM, Sandy Ryza sandy.r...@cloudera.com wrote:
Hi Patrick,
I'm noticing that you're using Spark 1.3.1. We fixed a bug in dynamic
Hey all,
Is it possible to reliably get the version string of a Spark cluster prior
to trying to connect via the SparkContext on the client side? Most of the
errors I've seen on mismatched versions have been cryptic, so it would be
helpful if I could throw an exception earlier.
I know it is
To somewhat answer my own question - it looks like an empty request to the
rest API will throw an error which returns the version in JSON as well.
Still not ideal though. Would there be any objection to adding a simple
version endpoint to the API?
On Sat, Jul 4, 2015 at 4:00 PM, Patrick Woody
Hi All,
I'm happy to announce the availability of Spark 1.4.0! Spark 1.4.0 is
the fifth release on the API-compatible 1.X line. It is Spark's
largest release ever, with contributions from 210 developers and more
than 1,000 commits!
A huge thanks go to all of the individuals and organizations
--
*-Barak*
--
Patrick Lam
Institute for Quantitative Social Science, Harvard University
http://www.patricklam.org
How can I tell if it's the sample stream or full stream ?
Thanks
Sent from my iPhone
On Jul 23, 2015, at 4:17 PM, Enno Shioji
eshi...@gmail.commailto:eshi...@gmail.com wrote:
You are probably listening to the sample stream, and THEN filtering. This means
you listen to 1% of the twitter