Hi list,
We (Scrapinghub) are planning to deploy Spark on a 10+ node cluster, mainly
for processing data in HDFS and Kafka streaming. We are thinking of using
Mesos instead of YARN as the cluster resource manager, so we can use Docker
containers as executors and make deployment easier. But there
Hi,
We have tried to use Spark SQL to process some gzipped JSON-format log
files stored on S3 or HDFS, but the performance is very poor.
For example, here is the code that I run over 20 gzipped files (their total
size is 4GB compressed and ~40GB decompressed):
gzfile = 's3n://my-logs-bucke
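Since the snippet above is cut off, here is a minimal Scala sketch of the
kind of job described, assuming a hypothetical bucket path and the Spark
1.x API used elsewhere in this thread:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("gz-json-logs"))
    val sqlContext = new SQLContext(sc)

    // Gzip is not a splittable format: each .gz file is read by a single
    // task, so 20 input files cap the read stage at 20 parallel tasks no
    // matter how large the cluster is.
    val logs = sqlContext.read.json("s3n://example-logs-bucket/logs/*.gz")
    logs.registerTempTable("logs")
    sqlContext.sql("SELECT COUNT(*) FROM logs").show()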
Not sure where you see "0x7C6C105FFC8ED089". I think the release is signed
with the key https://people.apache.org/keys/committer/pwendell.asc.
I think this tutorial can be helpful:
http://www.apache.org/info/verification.html
On Mon, Jul 11, 2016 at 12:57 AM, Phil Steitz wrote:
> I can't seem
>
> at least links to the keys used to sign releases on the
> download page
+1 for that.
On Mon, Jul 11, 2016 at 3:35 AM, Phil Steitz wrote:
> On 7/10/16 10:57 AM, Shuai Lin wrote:
> > Not sure where you see " 0x7C6C105FFC8ED089". I
>
> That's the
I would suggest you run the Scala version of the example first, so you can
tell whether the problem is with the data you provided or with the Java
code.
On Mon, Jul 11, 2016 at 2:37 AM, Biplob Biswas wrote:
> Hi,
>
> I know i am asking again, but I tried running the same thing on mac as
>
I think there are two options for you:
First, you can set `--conf spark.mesos.executor.docker.image=
adolphlwq/mesos-for-spark-exector-image:1.6.0.beta2` in your spark-submit
args, so Mesos will launch the executor with your custom image (see the
sketch below).
Or you can remove the `local:` prefix in the --jars flag,
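For the first option, here is a minimal Scala sketch that applies the same
setting programmatically instead of via `--conf` on spark-submit (the Mesos
master URL is hypothetical):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setMaster("mesos://zk://mesos-master:2181/mesos") // hypothetical URL
      .setAppName("docker-executor-demo")
      // Same property as the --conf flag above: Mesos pulls this image and
      // starts the executor inside it.
      .set("spark.mesos.executor.docker.image",
           "adolphlwq/mesos-for-spark-exector-image:1.6.0.beta2")
    val sc = new SparkContext(conf)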
It was added in the not-yet-released 2.0.0 version:
https://issues.apache.org/jira/browse/SPARK-13036
https://github.com/apache/spark/commit/83302c3b
So I guess you need to wait for the 2.0 release (or use the current RC4).
On Wed, Jul 20, 2016 at 6:54 AM, Ajinkya Kale wrote:
> Is there a way to save a
Good summary! One more advantage of running Spark on Mesos: community
support. There is quite a big user base running Spark on Mesos, so if you
encounter a problem with your deployment, it's very likely you can get the
answer with a simple Google search, or by asking on the Spark/Mesos user
list. By
>
> An alternative behavior is to launch the job with the best resource offer
> Mesos is able to give
Michael has just given an excellent explanation of dynamic allocation
support in Mesos. But IIUC, what you want to achieve is something like this
(using RAM as an example): "Launch each executor wi
+1 for Jacek's suggestion
FWIW: another possible *hacky* way is to put a class in the
org.apache.spark.sql package so it can access
sparkSession.sharedState.cacheManager, then use Scala reflection to read
the cache manager's private `cachedData` field, which holds the list of
cached relations. A rough sketch is below.
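Here is that sketch, assuming a Spark 2.x layout where `cachedData` is a
Scala collection of CachedData entries (the field is private and its type
has changed across releases, so treat this as illustrative only):

    package org.apache.spark.sql

    // Lives in org.apache.spark.sql so the package-private sharedState and
    // cacheManager members are visible; the private `cachedData` field is
    // then read via runtime reflection (plain Java reflection suffices).
    object CachedRelations {
      def list(spark: SparkSession): Seq[Any] = {
        val cacheManager = spark.sharedState.cacheManager
        val field = cacheManager.getClass.getDeclaredField("cachedData")
        field.setAccessible(true)
        // Assumed to be an Iterable of CachedData entries in the targeted
        // Spark version; other versions store it differently.
        field.get(cacheManager).asInstanceOf[Iterable[Any]].toSeq
      }
    }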
Links that were helpful to me while learning about the Spark source code:
- Articles with "spark" tag in this blog:
http://hydronitrogen.com/tag/spark.html
- Jacek's "Mastering Apache Spark" GitBook:
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/
Hope those can help.
On Sat, Apr 8,