Hi Darshan,
When you get an org.apache.spark.SparkException: Task not serializable
exception, it means that you are using a reference to an instance of a
non-serializable class inside a transformation.
Hope the following link helps.
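For instance, here is a minimal sketch of the pattern and a common fix (the class names are made up; `sc` is the SparkContext, e.g. the one spark-shell provides):

```scala
class Formatter { def clean(s: String): String = s.trim }   // NOT Serializable

val rdd = sc.parallelize(Seq(" a ", " b "))

// val f = new Formatter
// rdd.map(x => f.clean(x)).collect()   // Task not serializable: the closure captures `f`

// Fix: make the helper Serializable (or use a function value / an object's method)
class SafeFormatter extends Serializable { def clean(s: String): String = s.trim }
val g = new SafeFormatter
rdd.map(x => g.clean(x)).collect()      // works: `g` can be shipped to the executors
```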
Hello,
I am getting the famous serialization exception when running code like the
following (cut off here as in the original message):

```scala
val correctColNameUDF = udf(getNewColumnName(_: String, false: Boolean): String)

val charReference: DataFrame = thinLong.select("char_name_id", "char_name")
  .withColumn("columnNameInDimTable",
```
Thanks TD and Marco for the feedback.
The directory referenced by SPARK_LOCAL_DIRS did not exist. After creating
that directory, it worked.
This was the first time I was trying to run Spark on a standalone cluster, so
I missed it.
Thanks
On Fri, Feb 17, 2017 at 12:35 PM, Tathagata Das
One executor per Spark slave should be fine, right? I am not really sure what
benefit one would get by starting more executors (JVMs) on one node. At the
end of the day the JVM creates native/kernel threads through system calls, so
whether those threads are spawned by one or multiple processes, I don't see much
I found in some previous CDH versions that Spark starts only one executor
per Spark slave, and DECREASING the executor-cores in standalone makes the
total # of executors go up. Just my 2¢.
On Fri, Feb 17, 2017 at 5:20 PM, kant kodali wrote:
> Hi Satish,
>
> I am using spark
Hi Satish,
I am using Spark 2.0.2, and no, I have not passed those variables because I
didn't want to shoot in the dark. According to the documentation it looks
like SPARK_WORKER_CORES is the one that should do it. If not, can you
please explain how these variables interplay?
Have you tried passing --executor-cores or --total-executor-cores as
arguments, depending on the Spark version?
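For example, a standalone-mode submit might look like this (master URL, class, and jar are placeholders; --total-executor-cores applies to standalone/Mesos, --executor-cores to standalone/YARN):

```bash
spark-submit \
  --master spark://master-host:7077 \
  --executor-cores 4 \
  --total-executor-cores 16 \
  --class com.example.MyApp \
  myapp.jar
```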
From: kant kodali [mailto:kanth...@gmail.com]
Sent: Friday, February 17, 2017 5:03 PM
To: Alex Kozlov
Cc: user @spark
Subject: Re: question
Standalone.
On Fri, Feb 17, 2017 at 5:01 PM, Alex Kozlov wrote:
> What Spark mode are you running the program in?
>
> On Fri, Feb 17, 2017 at 4:55 PM, kant kodali wrote:
>
>> when I submit a job using spark shell I get something like this
>>
>> [Stage
What Spark mode are you running the program in?
On Fri, Feb 17, 2017 at 4:55 PM, kant kodali wrote:
> when I submit a job using spark shell I get something like this
>
> [Stage 0:>(36814 + 4) / 220129]
>
>
> Now all I want is I want to increase number of parallel
when I submit a job using spark shell I get something like this
[Stage 0:>(36814 + 4) / 220129]
Now all I want is to increase the number of parallel tasks running from
4 to 16, so I exported an env variable called SPARK_WORKER_CORES=16 in
conf/spark-env.sh. I thought that should do it.
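For anyone finding this later: SPARK_WORKER_CORES only changes how many cores a worker offers, and the worker has to be restarted to pick it up. A sketch, using the values from this thread:

```bash
# conf/spark-env.sh on each worker; restart the worker afterwards:
export SPARK_WORKER_CORES=16

# By default a standalone application takes all cores the workers offer;
# spark.cores.max (or --total-executor-cores) caps the total if set.
```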
Hi, Abdelfatah,
How do you read these files? spark.read.parquet or spark.sql?
Could you show some code?
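For comparison, the two usual entry points look like this (paths are hypothetical):

```scala
// 1) DataFrameReader; each "*" matches one directory level
val df1 = spark.read.parquet("/data/parquet/*/*")

// 2) SQL directly over files
val df2 = spark.sql("SELECT * FROM parquet.`/data/parquet/2017/02`")
```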
On Wed, Feb 15, 2017 at 8:47 PM, Ahmed Kamal Abdelfatah <
ahmed.abdelfa...@careem.com> wrote:
> Hi folks,
>
>
>
> How can I force spark sql to recursively get data stored in parquet format
>
I agree with Yong Zhang; perhaps Spark SQL with Hive could solve the problem:
http://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables
On Thu, Feb 16, 2017 at 12:42 AM, Yong Zhang wrote:
> If it works under hive, do you try just create the DF from
Hi, Basu,
If all columns are separated by the delimiter "\t", the CSV parser might be a
better choice.
For example:
```scala
spark.read
  .option("sep", "\t")
  .option("header", false)
  .option("inferSchema", true)
  .csv("/user/root/spark_demo/scala/data/Stations.txt")
```
How do I increase the readTimeoutMillis parameter in spark-shell? In the
middle of a Cassandra count, the job aborts with the following exception:
java.io.IOException: Exception during execution of SELECT count(*) FROM
"test"."hello" WHERE token("cid") > ? AND token("cid") <= ? ALLOW
FILTERING:
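A sketch of what I would try; the exact key depends on your connector version (in spark-cassandra-connector 2.0.x the driver's readTimeoutMillis is exposed as spark.cassandra.read.timeout_ms, in milliseconds), so check its reference docs:

```bash
spark-shell \
  --conf spark.cassandra.connection.host=127.0.0.1 \
  --conf spark.cassandra.read.timeout_ms=300000
```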
I would like to understand the Spark application UI's Executors tab values better.
Are the values for Input, Shuffle Read, and Shuffle Write sums of the values
for all tasks across all stages?
If yes, then it appears the value isn't much help while debugging.
Or am I missing the point of these
Seems like an issue with the HDFS you are using for checkpointing. It's not
able to write data properly.
On Thu, Feb 16, 2017 at 2:40 PM, shyla deshpande
wrote:
> Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException):
> File
>
There's a JIRA here: https://issues.apache.org/jira/browse/SPARK-11638
I haven't had time to look at it.
On Thu, Feb 16, 2017 at 11:00 AM, cherryii wrote:
> I'm getting errors when I try to run my docker container in bridge
> networking
> mode on mesos.
> Here is my spark
Not sure I follow your question. Do you want to use ALS or GraphX?
Thank You,
Irving Duran
On Fri, Feb 17, 2017 at 7:07 AM, balaji9058 wrote:
> Hi,
>
> Where can I find the ALS recommendation algorithm for large data sets?
>
> Please feel free to share your
Oops, I think that should fix it. I am going to try it now.
Great catch! I feel like an idiot.
On Fri, Feb 17, 2017 at 10:02 AM, Russell Spitzer wrote:
> Great catch Anastasios!
>
> On Fri, Feb 17, 2017 at 9:59 AM Anastasios Zouzias
> wrote:
>
>>
Great catch Anastasios!
On Fri, Feb 17, 2017 at 9:59 AM Anastasios Zouzias
wrote:
> Hey,
>
> Can you try with the 2.11 spark-cassandra-connector? You just reported
> that you use spark-cassandra-connector*_2.10*-2.0.0-RC1.jar
>
> Best,
> Anastasios
>
> On Fri, Feb 17, 2017 at
Hey,
Can you try with the 2.11 spark-cassandra-connector? You just reported that
you use spark-cassandra-connector*_2.10*-2.0.0-RC1.jar
Best,
Anastasios
On Fri, Feb 17, 2017 at 6:40 PM, kant kodali wrote:
> Hi,
>
>
> val df =
Hi,
```scala
val df = spark.read.format("org.apache.spark.sql.cassandra")
  .options(Map("table" -> "hello", "keyspace" -> "test"))
  .load()
```
This line works fine; I can see it actually pulled the table schema from
Cassandra. However, when I do df.count I get the error below.
I am using the following
It depends how you salt it. See slide 40 and onwards from a Spark Summit
talk here:
http://www.slideshare.net/cloudera/top-5-mistakes-to-avoid-when-writing-apache-spark-applications
The speakers use a mod-8 integer salt appended to the end of the key; the
salt that works best for you might be
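As a concrete sketch of that mod-8 idea (df, key, and value are placeholder names; the two-phase merge works for algebraic aggregates like sum or count):

```scala
import org.apache.spark.sql.functions._

// Append a random 0-7 salt so one hot key spreads over eight sub-keys.
val salted = df
  .withColumn("salt", (rand() * 8).cast("int"))
  .withColumn("saltedKey", concat_ws("_", col("key"), col("salt")))

// Aggregate per salted key first, then merge the partials per original key.
val partial = salted.groupBy("saltedKey", "key").agg(sum("value").as("partialSum"))
val result  = partial.groupBy("key").agg(sum("partialSum").as("total"))
```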
Hey Chris,
Thanks for your quick help. Actually the dataset had issues; the logic I
implemented was not wrong.
I did this:
1) *V.Imp* - creating rows by segregating columns after reading the
tab-delimited file, before converting into a DF:
val stati = stat.map(x =>
Hi,
Where can I find the ALS recommendation algorithm for large data sets?
Please feel free to share your ideas/algorithms/logic for building a
recommendation engine using Spark GraphX.
Thanks in advance.
Thanks,
Balaji
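If ALS is the direction you want, spark.ml ships a scalable implementation; a minimal sketch (`ratings` is a hypothetical DataFrame with userId, itemId, and rating columns; the block counts are the knobs for large inputs):

```scala
import org.apache.spark.ml.recommendation.ALS

val als = new ALS()
  .setUserCol("userId")
  .setItemCol("itemId")
  .setRatingCol("rating")
  .setRank(10)
  .setMaxIter(10)
  .setRegParam(0.1)
  .setNumUserBlocks(50)   // more blocks => more parallelism for big inputs
  .setNumItemBlocks(50)

val model  = als.fit(ratings)
val scored = model.transform(ratings)   // adds a "prediction" column
```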
Hi Aakash,
You can try this:
```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val header = Array("col1", "col2", "col3", "col4")
val schema = StructType(header.map(StructField(_, StringType, true)))
val statRow = stat.map(line =>
```
Hi All,
```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object Step1 {
  def main(args: Array[String]) = {
    val sparkConf = new SparkConf().setAppName("my-app")
    val sc = new SparkContext(sparkConf)
    val hiveSqlContext: HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
```