Re: How to use spark-on-k8s pod template?

2019-11-08 Thread David Mitchell
Are you using Spark 2.3 or above? See the documentation: https://spark.apache.org/docs/latest/running-on-kubernetes.html It looks like you do not need: --conf spark.kubernetes.driver.podTemplateFile='/spark-pod-template.yaml' \ --conf
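For reference, a minimal sketch of how a pod template is typically wired up, assuming a Spark release whose docs describe the podTemplateFile properties; the file contents and the container name are illustrative, not taken from the thread:

```yaml
# spark-pod-template.yaml -- hypothetical example; fields follow the
# standard Kubernetes Pod spec, and Spark merges it with its own settings
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: spark-driver
spec:
  containers:
    - name: spark-kubernetes-driver   # container name Spark uses for the driver
      resources:
        requests:
          memory: "2Gi"
```

```shell
# Hedged sketch of the corresponding spark-submit flags; the API server
# address and application jar are placeholders.
spark-submit \
  --master k8s://https://<api-server>:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.driver.podTemplateFile=/spark-pod-template.yaml \
  --conf spark.kubernetes.executor.podTemplateFile=/spark-pod-template.yaml \
  local:///opt/spark/examples/jars/my-app.jar
```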

Re: What benefits do we really get out of colocation?

2016-12-03 Thread David Mitchell
To get a node-local read from Spark to Cassandra, one has to use a read consistency level of LOCAL_ONE. For some use cases, this is not an option. For example, if you need a read consistency level of LOCAL_QUORUM, as many use cases demand, then you are not going to get a node-local read.
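As a sketch, the read consistency level can be set through the DataStax Spark Cassandra Connector's configuration; the property name below is from the connector's reference documentation, and the session setup is illustrative:

```scala
import org.apache.spark.sql.SparkSession

// Hedged sketch: LOCAL_ONE permits node-local reads, while LOCAL_QUORUM
// requires coordination across replicas, so colocation no longer
// guarantees the read is served from the local node.
val spark = SparkSession.builder()
  .appName("cassandra-read")
  .config("spark.cassandra.input.consistency.level", "LOCAL_ONE")
  .getOrCreate()
```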

Re: How to avoid Spark shuffle spill memory?

2015-10-06 Thread David Mitchell
Hi unk1102, Try adding more memory to your nodes. Are you running Spark in the cloud? If so, increase the memory on your servers. Do you have default parallelism set (spark.default.parallelism)? If so, unset it, and let Spark decide how many partitions to allocate. You can also try refactoring
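A minimal sketch of both suggestions, assuming the job is launched with spark-submit; the memory sizes and jar name are placeholders:

```shell
# Hedged sketch: give executors more memory, and do NOT pass
# spark.default.parallelism so Spark derives partition counts from the input.
spark-submit \
  --executor-memory 8g \
  --driver-memory 4g \
  my_job.jar
# If spark.default.parallelism is set in spark-defaults.conf, remove that
# line there rather than overriding it on the command line.
```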

Re: submit_spark_job_to_YARN

2015-08-30 Thread David Mitchell
Hi Ajay, Are you trying to save to your local file system or to HDFS? // This would save to HDFS under /user/hadoop/counter counter.saveAsTextFile("/user/hadoop/counter"); David On Sun, Aug 30, 2015 at 11:21 AM, Ajay Chander itsche...@gmail.com wrote: Hi Everyone, Recently we have installed
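To make the distinction explicit, a short sketch (assuming `counter` is an RDD, as in the snippet above; the paths are illustrative):

```scala
// Sketch: the URI scheme decides where the output lands. A path without a
// scheme resolves against fs.defaultFS, which is usually HDFS on a cluster.
counter.saveAsTextFile("hdfs:///user/hadoop/counter") // explicitly HDFS
counter.saveAsTextFile("file:///tmp/counter")         // local file system of each node
```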

Re: No. of Task vs No. of Executors

2015-07-18 Thread David Mitchell
This is likely due to data skew. If you are using key-value pairs, one key has a lot more records than the other keys. Do you have any groupBy operations? David On Tue, Jul 14, 2015 at 9:43 AM, shahid sha...@trialx.com wrote: hi I have a 10 node cluster i loaded the data onto hdfs, so
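A quick way to check for skew, sketched below; `pairs` is a hypothetical key-value RDD standing in for the poster's data:

```scala
// Sketch: count records per key to confirm skew, then prefer reduceByKey
// (which combines map-side) over groupByKey (which shuffles every record
// for a key to a single task).
val keyCounts = pairs.mapValues(_ => 1L).reduceByKey(_ + _)
keyCounts.top(10)(Ordering.by(_._2)).foreach(println) // heaviest keys

val sums = pairs.mapValues(_ => 1L).reduceByKey(_ + _) // shrinks the shuffle
```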

Re: Spark performance

2015-07-11 Thread David Mitchell
You can certainly query over 4 TB of data with Spark. However, you will get an answer in minutes or hours, not in milliseconds or seconds. OLTP databases are used for web applications, and typically return responses in milliseconds. Analytic databases tend to operate on large data sets, and

Re: spark sql - reading data from sql tables having space in column names

2015-06-02 Thread David Mitchell
I am having the same problem reading JSON. There does not seem to be a way of selecting a field whose name contains a space, such as "Executor Info" from the Spark logs. I suggest that we open a JIRA ticket to address this issue. On Jun 2, 2015 10:08 AM, ayan guha guha.a...@gmail.com wrote: I would think the
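For what it's worth, later Spark releases accept backtick-quoted identifiers for such columns; a hedged sketch, assuming a DataFrame `df` with the column in question:

```scala
// Sketch: backticks quote a column name containing a space.
df.select("`Executor Info`").show()

// Equivalent in SQL, assuming the DataFrame is registered as "logs":
// sqlContext.sql("SELECT `Executor Info` FROM logs")
```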

ORCFiles

2015-04-24 Thread David Mitchell
Does anyone know in which version of Spark there will be support for ORCFiles via spark.sql.hive? Will it be in 1.4? David

Re: Spark Release 1.3.0 DataFrame API

2015-03-15 Thread David Mitchell
14, 2015 at 5:33 PM, David Mitchell jdavidmitch...@gmail.com wrote: I am pleased with the release of the DataFrame API. However, I started playing with it, and neither of the two main examples in the documentation work: http://spark.apache.org/docs/1.3.0/sql-programming-guide.html

Spark Release 1.3.0 DataFrame API

2015-03-14 Thread David Mitchell
I am pleased with the release of the DataFrame API. However, I started playing with it, and neither of the two main examples in the documentation work: http://spark.apache.org/docs/1.3.0/sql-programming-guide.html Specifically: - Inferring the Schema Using Reflection - Programmatically
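For context, the reflection-based example in question looks roughly like the sketch below, paraphrased from the 1.3.0 guide; the file path and case class are the guide's illustrative ones, and `sc` is an existing SparkContext:

```scala
// Sketch of "Inferring the Schema Using Reflection" (Spark 1.3.0 guide).
case class Person(name: String, age: Int)

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._ // enables rdd.toDF()

val people = sc.textFile("examples/src/main/resources/people.txt")
  .map(_.split(","))
  .map(p => Person(p(0), p(1).trim.toInt))
  .toDF()
people.registerTempTable("people")
```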