Spark - Mesos HTTP API

2016-04-03 Thread John Omernik
Hey all, are there any plans to implement the Mesos HTTP API rather than native libs? The reason I ask is I am trying to run an application in a docker container (Zeppelin) that would use Spark connecting to Mesos, but I am finding that using the NATIVE_LIB from docker is difficult or would

Re: multiple splits fails

2016-04-03 Thread Mich Talebzadeh
I am afraid print(Integer.MAX_VALUE) does not return any lines! However, print(1000) does.

Re: multiple splits fails

2016-04-03 Thread Ted Yu
Since num is an Int, you can specify Integer.MAX_VALUE. FYI On Sun, Apr 3, 2016 at 10:58 AM, Mich Talebzadeh wrote: > Thanks Ted. > > As I see print() materializes the first 10 rows. On the other hand > print(n) will materialise n rows. > > How about if I wanted to

GenericRowWithSchema to case class

2016-04-03 Thread asethia
Hi, My Cassandra table has a custom user-defined type, for example: CREATE TYPE address ( addressline1 text, addressline2 text, city text, state text, country text, pincode text ); create table person ( id text, name text, addresses set<frozen<address>>, PRIMARY KEY (id)); val
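
A minimal sketch of one way to avoid handling GenericRowWithSchema by hand, assuming the DataStax spark-cassandra-connector is on the classpath; the keyspace name "test" and the case class names are illustrative assumptions, not taken from the question:

import com.datastax.spark.connector._

case class Address(addressline1: String, addressline2: String, city: String,
                   state: String, country: String, pincode: String)
case class Person(id: String, name: String, addresses: Set[Address])

// The connector can map rows, including UDT values, straight onto case classes.
val people = sc.cassandraTable[Person]("test", "person")
people.collect().foreach(println)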

Re: multiple splits fails

2016-04-03 Thread Mich Talebzadeh
Thanks Ted. As I see print() materializes the first 10 rows. On the other hand print(n) will materialise n rows. How about if I wanted to materialize all rows? Cheers

Re: multiple splits fails

2016-04-03 Thread Ted Yu
Mich: See the following method of DStream: /** Print the first num elements of each RDD generated in this DStream. This is an output operator, so this DStream will be registered as an output stream and there materialized. */ def print(num: Int): Unit = ssc.withScope { On Sun, Apr 3,
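
A small sketch of how print(num) behaves; the stream declared with ??? stands in for the Kafka DStream created further down this thread, and the numbers are illustrative:

val messages: org.apache.spark.streaming.dstream.DStream[(String, String)] = ???   // stand-in for the Kafka stream

messages.print()                // prints the first 10 elements of each batch
messages.print(1000)            // prints up to 1000 elements of each batch
// print(Int.MaxValue) printing nothing is likely an overflow: the implementation
// takes num + 1 elements, and Int.MaxValue + 1 wraps around to a negative number.
// To process every element rather than a prefix, use another output operation:
messages.foreachRDD(rdd => rdd.collect().foreach(println))   // collect() pulls the whole RDD to the driver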

Re: multiple splits fails

2016-04-03 Thread Mich Talebzadeh
Thanks Ted. This works: scala> val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topic) messages: org.apache.spark.streaming.dstream.InputDStream[(String, String)] = org.apache.spark.streaming.kafka.DirectKafkaInputDStream@3bfc0063 scala>
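
For completeness, a hedged sketch of the full direct-stream setup the snippet above comes from; the batch interval, broker address and topic name are placeholders, not values from the thread:

import _root_.kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val sparkConf = new SparkConf().setAppName("KafkaDirectSketch")
val ssc = new StreamingContext(sparkConf, Seconds(10))
val kafkaParams = Map[String, String]("metadata.broker.list" -> "localhost:9092")   // placeholder broker
val topics = Set("mytopic")                                                          // placeholder topic
// Each element of the resulting DStream is a (key, value) tuple of Strings.
val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topics)
messages.map(_._2).print()
ssc.start()
ssc.awaitTermination()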

Re: spark.driver.memory meaning

2016-04-03 Thread Carlile, Ken
Cool. My users tend to interact with the driver via iPython Notebook, so clearly I’ll have to leave (fairly significant amounts of) RAM for that. But I should be able to write a one-liner in spark-env.sh that determines whether it’s on a 128 GB or 256 GB node and have it size itself

RE: spark.driver.memory meaning

2016-04-03 Thread Yong Zhang
In standalone mode, it applies to the driver JVM process heap size. You should consider giving it enough memory because: 1) Any data you bring back to the driver is stored in it, e.g. RDD.collect or DF.show. 2) The driver also hosts a web UI for the application
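
A sketch of where spark.driver.memory can be set; the 8g value is an illustrative assumption. Note that when the driver JVM is already running (spark-shell, client deploy mode), setting it in SparkConf is too late; pass it via spark-submit --driver-memory or spark-defaults.conf instead:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("DriverMemorySketch")
  .set("spark.driver.memory", "8g")   // illustrative value; effective only if the driver has not started yet
val sc = new SparkContext(conf)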

Re: multiple splits fails

2016-04-03 Thread Ted Yu
bq. is not a member of (String, String) As shown above, contains shouldn't be applied directly on a tuple. Choose the element of the tuple and then apply contains on it. On Sun, Apr 3, 2016 at 7:54 AM, Mich Talebzadeh wrote: > Thank you gents. > > That should "\n"
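
In code, Ted's point looks roughly like this; the stream declared with ??? stands in for the (String, String) Kafka DStream from this thread:

val messages: org.apache.spark.streaming.dstream.DStream[(String, String)] = ???   // stand-in for the Kafka stream

// Fails to compile: value contains is not a member of (String, String)
// messages.filter(_.contains("ASE15"))

// Apply contains to the value element of the tuple instead:
val matching = messages.filter { case (_, value) => value.contains("ASE15") }
matching.print()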

spark.driver.memory meaning

2016-04-03 Thread Carlile, Ken
In the spark-env.sh example file, the comments indicate that spark.driver.memory is the memory for the master in YARN mode. None of that actually makes any sense… In any case, I’m using Spark in standalone mode, running the driver on a separate machine from the master. I have a few

Re: multiple splits fails

2016-04-03 Thread Mich Talebzadeh
Thank you gents. That should be "\n" as carriage return. OK, I am using Spark Streaming to analyse the messages. It does the streaming: import _root_.kafka.serializer.StringDecoder import org.apache.spark.SparkConf import org.apache.spark.streaming._ import org.apache.spark.streaming.kafka.KafkaUtils

Re: multiple splits fails

2016-04-03 Thread Ted Yu
bq. split"\t," splits the filter by carriage return Minor correction: "\t" denotes the tab character. On Sun, Apr 3, 2016 at 7:24 AM, Eliran Bivas wrote: > Hi Mich, > > 1. The first underscore in your filter call is referring to a line in the > file (as textFile() results in a

Re: multiple splits fails

2016-04-03 Thread Eliran Bivas
Hi Mich, 1. The first underscore in your filter call is referring to a line in the file (as textFile() results in a collection of strings). 2. You’re correct. No need for it. 3. filter expects a Boolean result, so you can merge your contains filters into one with an AND (&&) expression. 4.

Evicting a lower version of a library loaded in Spark Worker

2016-04-03 Thread Yuval.Itzchakov
My code uses "com.typesafe.config" in order to read configuration values. Currently, our code uses v1.3.0, whereas Spark uses 1.2.1 internally. When I initiate a job, the worker process invokes a method in my code but fails, because it's defined abstract in v1.2.1 whereas in v1.3.0 it is not. The
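
Two common workarounds, neither specific to this thread: shade your copy of the library when building the assembly jar, or experiment with spark.driver.userClassPathFirst / spark.executor.userClassPathFirst. A build.sbt sketch of the shading approach, assuming the sbt-assembly plugin (the shaded package name is arbitrary):

// build.sbt (sbt-assembly): relocate your com.typesafe.config classes so they
// cannot clash with the older copy that ships with Spark.
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.typesafe.config.**" -> "myshaded.config.@1").inAll
)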

Re: multiple splits fails

2016-04-03 Thread Mich Talebzadeh
Hi Eliran, Many thanks for your input on this. I thought about what I was trying to achieve so I rewrote the logic as follows: 1. Read the text file in 2. Filter out empty lines (well not really needed here) 3. Search for lines that contain "ASE 15" and further have sentence

Re: multiple splits fails

2016-04-03 Thread Eliran Bivas
Hi Mich, A few comments: When doing .filter(_ > "") you’re actually doing a lexicographic comparison and not filtering for empty lines (which could be achieved with _.nonEmpty or _.length > 0). I think that filtering with _.contains should be sufficient and the first filter can be omitted. As
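
Applied to the snippet in the original post below, the suggestion would look roughly like this (the file path is the one from that post):

val f = sc.textFile("/tmp/ASE15UpgradeGuide.txt")
// _ > "" is a lexicographic comparison; nonEmpty is the real intent, and in fact an
// empty line can never contain "ASE15", so a single contains filter is enough.
val hits = f.filter(line => line.nonEmpty && line.contains("ASE15"))
hits.count()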

multiple splits fails

2016-04-03 Thread Mich Talebzadeh
Hi, I am not sure this is the correct approach. Read a text file in: val f = sc.textFile("/tmp/ASE15UpgradeGuide.txt") Now I want to get rid of empty lines and filter only the lines that contain "ASE15": f.filter(_ > "").filter(_ contains("ASE15")). The above works but I am not sure whether I

Re: value saveToCassandra is not a member of org.apache.spark.streaming.dstream.DStream[(String, Int)]

2016-04-03 Thread Yuval.Itzchakov
You need to import com.datastax.spark.connector.streaming._ to have the methods available. https://github.com/datastax/spark-cassandra-connector/blob/master/doc/8_streaming.md
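
A minimal sketch of the resulting usage; the keyspace, table and column names are placeholders:

import com.datastax.spark.connector.SomeColumns
import com.datastax.spark.connector.streaming._   // adds saveToCassandra to DStream

val wordCounts: org.apache.spark.streaming.dstream.DStream[(String, Int)] = ???   // your stream
wordCounts.saveToCassandra("my_keyspace", "wordcount", SomeColumns("word", "count"))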

RDDs caching in typical machine learning use cases

2016-04-03 Thread Sergey
Hi Spark ML experts! Do you use RDD caching together with MLlib to speed up computation? I mean typical machine learning use cases: train-test split, train, evaluate, apply the model. Sergey.
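
A sketch of the usual pattern with the RDD-based MLlib API: cache the training split, because iterative optimizers make many passes over it. The data path, algorithm and split ratios are illustrative, not from the question:

import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.util.MLUtils

val data = MLUtils.loadLibSVMFile(sc, "/tmp/sample_libsvm_data.txt")   // placeholder path
val Array(training, test) = data.randomSplit(Array(0.7, 0.3), seed = 42L)
training.cache()   // reused on every optimizer iteration, so keep it in memory
val model = new LogisticRegressionWithLBFGS().setNumClasses(2).run(training)
val accuracy = test.map(p => (model.predict(p.features), p.label))
  .filter { case (pred, label) => pred == label }
  .count().toDouble / test.count()
training.unpersist()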

Random forest implementation details

2016-04-03 Thread Sergey
Hi! I’m playing with the random forest implementation in Apache Spark. First impression: it is not fast :-( Does somebody know how random forest is parallelized in Spark? I mean both fitting and predicting. Also, what do these parameters mean? I didn’t find documentation for them.
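
For reference, a sketch of the RDD-based MLlib call with its main parameters commented; all values here are illustrative. Roughly speaking, MLlib grows groups of trees at the same time and computes split statistics in distributed passes over the data partitions, while prediction is a plain map over the data with the trained model:

import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.RandomForest
import org.apache.spark.rdd.RDD

val trainingData: RDD[LabeledPoint] = ???   // your labelled training set
val model = RandomForest.trainClassifier(
  trainingData,
  numClasses = 2,                             // number of target classes
  categoricalFeaturesInfo = Map[Int, Int](),  // feature index -> arity for categorical features
  numTrees = 100,                             // more trees: better accuracy, longer training
  featureSubsetStrategy = "auto",             // features considered per split ("auto", "sqrt", "log2", ...)
  impurity = "gini",                          // split quality measure
  maxDepth = 5,                               // maximum depth of each tree
  maxBins = 32,                               // bins used to discretize continuous features
  seed = 42)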

Re: spark-shell failing but pyspark works

2016-04-03 Thread Mich Talebzadeh
This works fine for me: val sparkConf = new SparkConf(). setAppName("StreamTest"). setMaster("yarn-client"). set("spark.cores.max", "12"). set("spark.driver.allowMultipleContexts", "true"). set("spark.hadoop.validateOutputSpecs",