Running Spark 2.2.1 with extra packages

2018-02-02 Thread Conconscious
Hi list, I have a Spark cluster with 3 nodes. I'm calling spark-shell with some packages to connect to AWS S3 and Cassandra:

spark-shell \
  --packages org.apache.hadoop:hadoop-aws:2.7.3,com.amazonaws:aws-java-sdk:1.7.4,datastax:spark-cassandra-connector:2.0.6-s_2.11 \
  --conf spark.cassandra.co
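
A minimal sketch of using both data sources once a shell like that is up. The truncated --conf flag above is left as-is; the credentials, bucket, keyspace, and table names below are placeholders, not details from the thread.

// Inside the spark-shell started above; `spark` and `sc` are provided by the shell.
// All names here are hypothetical placeholders.
sc.hadoopConfiguration.set("fs.s3a.access.key", "AKIA...")   // placeholder
sc.hadoopConfiguration.set("fs.s3a.secret.key", "...")       // placeholder

// S3 through the s3a:// scheme provided by hadoop-aws
val s3df = spark.read.json("s3a://some-bucket/some/prefix/")

// Cassandra through the DataStax connector's DataSource API
val cdf = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "ks", "table" -> "tbl"))
  .load()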

Custom build - missing images on MasterWebUI

2018-01-25 Thread Conconscious
Hi list, I'm trying to make a custom build of Spark, but in the end the Web UI has no images. Some help please. Built from:

git checkout v2.2.1
./dev/make-distribution.sh --name custom-spark --pip --tgz -Psparkr -Phadoop-2.7 -Dhadoop.version=2.7.3 -Phive -Phive-thriftserver -Pmesos -Pyarn -

Re: Spark querying C* in Scala

2018-01-23 Thread Conconscious
quot;true"))   .load()   .select("kafka") dfs.printSchema() Any way to put this schema in json? Thanks in advance On 22-01-2018 17:51, Sathish Kumaran Vairavelu wrote: > You have to register a Cassandra table in spark as dataframes > > > https://github.com/datastax/spa

Spark querying C* in Scala

2018-01-22 Thread Conconscious
Hi list, I have a Cassandra table with two fields: id bigint, kafka text. My goal is to read only the kafka field (which is JSON) and infer the schema. I have this skeleton code (not working):

sc.stop
import org.apache.spark._
import com.datastax.spark._
import org.apache.spark.sql.functions.ge
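
A sketch of one way to finish that skeleton in spark-shell, assuming Spark 2.2 (whose JSON reader accepts a Dataset[String]); the keyspace and table names are placeholders, not from the thread.

import spark.implicits._

// Read only the kafka column through the DataStax connector
val kafkaCol = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "ks", "table" -> "tbl"))   // placeholders
  .load()
  .select("kafka")

// Let the JSON reader infer the schema from the strings themselves
val parsed = spark.read.json(kafkaCol.as[String])
parsed.printSchema()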

Re: Python vs. Scala

2017-09-06 Thread Conconscious
Just run this test yourself and check the results; during the run, also watch a worker with top. Python:

import random

def inside(p):
    x, y = random.random(), random.random()
    return x * x + y * y < 1

def estimate_pi(num_samples):
    count = sc.parallelize(xrange(0, num_samples)).filte
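
For comparison, a Scala version of the same Monte Carlo estimate, a minimal sketch meant for spark-shell (so sc already exists); it mirrors the Python logic above rather than quoting anything from the thread.

// Sample points in the unit square and count how many fall inside the
// quarter circle; the ratio approximates pi/4.
def inside(i: Int): Boolean = {
  val x = math.random
  val y = math.random
  x * x + y * y < 1
}

def estimatePi(numSamples: Int): Double = {
  val count = sc.parallelize(0 until numSamples).filter(inside).count()
  4.0 * count / numSamples
}

println(estimatePi(10000000))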

json in Cassandra to RDDs

2017-07-01 Thread Conconscious
Hi list, I'm using Cassandra with only two fields (id, json). I'm using Spark to query the json. So far I can load a JSON file and query it, but not yet Cassandra or RDDs of the json field.

sc = spark.sparkContext
path = "/home/me/red50k.json"
redirectsDF = spark.read.json(path)
redirects
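
A sketch of the missing Cassandra-to-RDD step, written in Scala for consistency with the connector threads above (the session in the snippet is PySpark); keyspace, table, and view names are placeholders.

import com.datastax.spark.connector._   // adds cassandraTable to the SparkContext
import spark.implicits._

// Pull just the json column back as an RDD of raw JSON strings
val jsonRdd = sc.cassandraTable("ks", "tbl")   // placeholders
  .select("json")
  .map(_.getString("json"))

// Let spark.read.json infer a schema from the strings, then query with SQL
val df = spark.read.json(jsonRdd.toDS())
df.createOrReplaceTempView("redirects")
spark.sql("SELECT * FROM redirects LIMIT 10").show()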