Hi list,
We (Scrapinghub) are planning to deploy Spark on a 10+ node cluster, mainly
for processing data in HDFS and Kafka streaming. We are thinking of using
Mesos instead of YARN as the cluster resource manager, so we can use Docker
containers as executors and make deployment easier. But there
Hi,
We have tried to use Spark SQL to process some gzipped JSON-format log
files stored on S3 or HDFS, but the performance is very poor.
For example, here is the code that I run over 20 gzipped files (their total
size is 4GB compressed and ~40GB decompressed):
gzfile = 's3n://my-logs-bucke
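Since the snippet above is cut off, here is a minimal Scala sketch of the
kind of job described, assuming a hypothetical bucket path and the Spark
1.x API used elsewhere in this thread:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("gz-json-logs"))
    val sqlContext = new SQLContext(sc)

    // Gzip is not a splittable format: each .gz file is read by a single
    // task, so 20 input files cap the read stage at 20 parallel tasks no
    // matter how large the cluster is.
    val logs = sqlContext.read.json("s3n://example-logs-bucket/logs/*.gz")
    logs.registerTempTable("logs")
    sqlContext.sql("SELECT COUNT(*) FROM logs").show()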
Not sure where you see "0x7C6C105FFC8ED089". I think the release is signed
with the key https://people.apache.org/keys/committer/pwendell.asc.
I think this tutorial can be helpful:
http://www.apache.org/info/verification.html
On Mon, Jul 11, 2016 at 12:57 AM, Phil Steitz wrote:
> I can't seem
>
> at least links to the keys used to sign releases on the
> download page
+1 for that.
On Mon, Jul 11, 2016 at 3:35 AM, Phil Steitz wrote:
> On 7/10/16 10:57 AM, Shuai Lin wrote:
> > Not sure where you see " 0x7C6C105FFC8ED089". I
>
> That's the
I would suggest you run the Scala version of the example first, so you can
tell whether the problem is with the data you provided or with the Java
code.
On Mon, Jul 11, 2016 at 2:37 AM, Biplob Biswas wrote:
> Hi,
>
> I know i am asking again, but I tried running the same thing on mac as
>
I think there are two options for you:
First, you can set `--conf spark.mesos.executor.docker.image=
adolphlwq/mesos-for-spark-exector-image:1.6.0.beta2` in your spark-submit
args, so Mesos will launch the executor with your custom image (see the
sketch below).
Or you can remove the `local:` prefix in the --jars flag,
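For the first option, here is a minimal Scala sketch that applies the same
setting programmatically instead of via `--conf` on spark-submit (the Mesos
master URL is hypothetical):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setMaster("mesos://zk://mesos-master:2181/mesos") // hypothetical URL
      .setAppName("docker-executor-demo")
      // Same property as the --conf flag above: Mesos pulls this image and
      // starts the executor inside it.
      .set("spark.mesos.executor.docker.image",
           "adolphlwq/mesos-for-spark-exector-image:1.6.0.beta2")
    val sc = new SparkContext(conf)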
It was added in the not-yet-released 2.0.0 version:
https://issues.apache.org/jira/browse/SPARK-13036
https://github.com/apache/spark/commit/83302c3b
So I guess you need to wait for the 2.0 release (or use the current RC4).
On Wed, Jul 20, 2016 at 6:54 AM, Ajinkya Kale wrote:
> Is there a way to save a
Good summary! One more advantage of running Spark on Mesos: community
support. There is quite a big user base running Spark on Mesos, so if you
encounter a problem with your deployment, it's very likely you can get the
answer with a simple Google search, or by asking on the Spark/Mesos user
list. By
>
> An alternative behavior is to launch the job with the best resource offer
> Mesos is able to give
Michael has just given an excellent explanation of dynamic allocation
support in Mesos. But IIUC, what you want to achieve is something like this
(using RAM as an example): "Launch each executor wi
+1 for Jacek's suggestion
FWIW: another possible *hacky* way is to put a class in the
org.apache.spark.sql package so it can access
sparkSession.sharedState.cacheManager, then use Scala reflection to read
the cache manager's private `cachedData` field, which holds the list of
cached relations. A rough sketch is below.
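Here is that sketch, assuming a Spark 2.x layout where `cachedData` is a
Scala collection of CachedData entries (the field is private and its type
has changed across releases, so treat this as illustrative only):

    package org.apache.spark.sql

    // Lives in org.apache.spark.sql so the package-private sharedState and
    // cacheManager members are visible; the private `cachedData` field is
    // then read via runtime reflection (plain Java reflection suffices).
    object CachedRelations {
      def list(spark: SparkSession): Seq[Any] = {
        val cacheManager = spark.sharedState.cacheManager
        val field = cacheManager.getClass.getDeclaredField("cachedData")
        field.setAccessible(true)
        // Assumed to be an Iterable of CachedData entries in the targeted
        // Spark version; other versions store it differently.
        field.get(cacheManager).asInstanceOf[Iterable[Any]].toSeq
      }
    }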
Links that were helpful to me while learning about the Spark source code:
- Articles with "spark" tag in this blog:
http://hydronitrogen.com/tag/spark.html
- Jacek's "Mastering Apache Spark" GitBook:
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/
Hope those can help.
On Sat, Apr 8,