spark-shell - modes

2017-08-06 Thread karan alang
Hello all - I had a basic question about the modes in which spark-shell can be run. When I run the following command, does Spark run in local mode, i.e. outside of YARN and using only the local cores (since the '--master' option is missing)? ./bin/spark-shell --driver-memory 512m --executor-memory 512m
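
For context, a quick sketch of how the --master option selects the mode (these command lines are illustrative and assume a standard Spark distribution with no overriding spark.master in spark-defaults.conf):

```shell
# No --master given: spark-shell defaults to local mode using all local cores (local[*])
./bin/spark-shell --driver-memory 512m --executor-memory 512m

# Explicit local mode with 2 worker threads
./bin/spark-shell --master local[2] --driver-memory 512m --executor-memory 512m

# Run on YARN in client mode instead of locally
./bin/spark-shell --master yarn-client --driver-memory 512m --executor-memory 512m
```

So yes: with no --master on the command line and no spark.master in spark-defaults.conf, the shell runs locally, outside of YARN.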

Re: kafka setting, enable.auto.commit to false is being overridden and I lose data. Please help!

2017-08-06 Thread Cody Koeninger
If your complaint is about offsets being committed that you didn't expect... auto commit being false on executors shouldn't have anything to do with that. Executors shouldn't be auto-committing; that's why it's being overridden. What you've said and the code you posted aren't really enough to

Re: spark-shell - modes

2017-08-06 Thread karan alang
Update - it seems 'spark-shell' does not support yarn-cluster mode (I guess because it is an interactive shell). The only supported modes are yarn-client and local. Please let me know if my understanding is incorrect. Thanks! On Sun, Aug 6, 2017 at 10:07 AM, karan alang
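
That matches the documented behavior: an interactive shell needs its driver on the launching machine, so only client-side deploy modes work. Batch applications can still run the driver inside YARN via spark-submit. A sketch (the application class and jar names are placeholders, not from the thread):

```shell
# Interactive shell: the driver must run on the launching machine,
# so spark-shell supports only client mode on YARN
./bin/spark-shell --master yarn-client

# Batch applications can run the driver inside the YARN cluster with spark-submit
# (com.example.MyApp and myapp.jar are placeholders)
./bin/spark-submit --master yarn-cluster --class com.example.MyApp myapp.jar
```

On Spark 2.x the same choice is spelled --master yarn with --deploy-mode client or --deploy-mode cluster.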

Re: kafka setting, enable.auto.commit to false is being overridden and I lose data. Please help!

2017-08-06 Thread shyla deshpande
Thanks Cody for your response. All I want to do is commit the offsets only if I am successfully able to write to the Cassandra database. The line //save the rdd to Cassandra database is rdd.map { record => () }.saveToCassandra("keyspace1", "table1") What do you mean by Executors shouldn't be

Re: SPARK Issue in Standalone cluster

2017-08-06 Thread Marco Mistroni
Sengupta, further to this: if you try the following notebook in Databricks cloud, it will read a .csv file, write to a parquet file and read it again (just to count the number of rows stored). Please note that the path to the csv file might differ for you. So, what you will need to do is 1 -
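
A minimal standalone sketch of the read-csv / write-parquet / re-read-and-count round trip the notebook performs (the paths and the SparkSession setup here are assumptions for illustration, not the actual notebook code):

```scala
import org.apache.spark.sql.SparkSession

object CsvToParquetRoundTrip {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CsvToParquetRoundTrip")
      .getOrCreate()

    // Read the CSV file (adjust the path for your environment)
    val df = spark.read
      .option("header", "true")
      .csv("/path/to/input.csv")

    // Write it out as Parquet
    df.write.mode("overwrite").parquet("/path/to/output.parquet")

    // Read the Parquet back and count rows to verify the round trip
    val roundTrip = spark.read.parquet("/path/to/output.parquet")
    println(s"rows stored: ${roundTrip.count()}")

    spark.stop()
  }
}
```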

Re: SPARK Issue in Standalone cluster

2017-08-06 Thread Gourav Sengupta
Hi Marco, thanks a ton, I will surely use those alternatives. Regards, Gourav Sengupta On Sun, Aug 6, 2017 at 3:45 PM, Marco Mistroni wrote: > Sengupta > further to this, if you try the following notebook in databricks cloud, > it will read a .csv file , write to a

Re: kafka setting, enable.auto.commit to false is being overridden and I lose data. Please help!

2017-08-06 Thread Cody Koeninger
I mean that the kafka consumers running on the executors should not be automatically committing, because the fact that a message was read by the consumer has no bearing on whether it was actually successfully processed after reading. It sounds to me like you're confused about where code is
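
A sketch of what that looks like in the consumer configuration passed to the direct stream: the application sets enable.auto.commit to false, and Spark forces the same on the executor-side consumers regardless of what was passed (the broker address and group id below are placeholders):

```scala
// Typical kafkaParams for a Spark direct stream; Spark overrides
// enable.auto.commit to false on the executor-side consumers in any case.
val kafkaParams: Map[String, Object] = Map(
  "bootstrap.servers" -> "broker1:9092", // placeholder address
  "key.deserializer" -> "org.apache.kafka.common.serialization.StringDeserializer",
  "value.deserializer" -> "org.apache.kafka.common.serialization.StringDeserializer",
  "group.id" -> "my-consumer-group", // placeholder group id
  "auto.offset.reset" -> "latest",
  "enable.auto.commit" -> (false: java.lang.Boolean) // commit manually after processing
)

println(kafkaParams("enable.auto.commit"))
```

With auto commit off everywhere, offsets are committed explicitly (e.g. via commitAsync on the driver) only after the batch has actually been processed.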

How can I split a dataset into multiple datasets

2017-08-06 Thread Jone Zhang
val schema = StructType( Seq( StructField("app", StringType, nullable = true), StructField("server", StringType, nullable = true), StructField("file", StringType, nullable = true), StructField("...", StringType, nullable = true) ) ) val row =

Re: How can I split a dataset into multiple datasets

2017-08-06 Thread Deepak Sharma
This can be mapped as below: dataset.map(x => ((x(0), x(1), x(2)), x)). This works with a DataFrame of Rows, but I haven't tried it with a Dataset. Thanks Deepak On Mon, Aug 7, 2017 at 8:21 AM, Jone Zhang wrote: > val schema = StructType( > Seq( > StructField("app",
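
A self-contained sketch of the idea with plain Scala collections (the field names mirror the schema from the question; the Record case class and sample rows are invented for illustration). In Spark the equivalent keying is ds.groupByKey(r => (r.app, r.server)), or a ds.filter per key if you need genuinely separate Datasets:

```scala
// Hypothetical row type mirroring the schema in the question
case class Record(app: String, server: String, file: String)

val rows = Seq(
  Record("app1", "s1", "f1"),
  Record("app1", "s1", "f2"),
  Record("app2", "s2", "f3")
)

// Key each row by (app, server), then group: one sub-collection per key.
val byKey: Map[(String, String), Seq[Record]] =
  rows.groupBy(r => (r.app, r.server))

byKey.foreach { case (key, recs) => println(s"$key -> ${recs.size} rows") }
```

For writing each group to its own output, Dataset/DataFrame writers also support df.write.partitionBy("app", "server"), which produces one directory per key without splitting the dataset in memory.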

Re: kafka setting, enable.auto.commit to false is being overridden and I lose data. Please help!

2017-08-06 Thread shyla deshpande
rdd.map { record => () }.saveToCassandra("keyspace1", "table1") --> is running on the executors. stream1.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges) --> is running on the driver. Is this the reason why Kafka offsets are committed even when an exception is raised? If so, is there a way to

Re: kafka setting, enable.auto.commit to false is being overridden and I lose data. Please help!

2017-08-06 Thread shyla deshpande
Thanks again Cody. My understanding is that all the code inside foreachRDD runs on the driver, except for rdd.map { record => () }.saveToCassandra("keyspace1", "table1"). When the exception is raised, I was thinking I would not be committing the offsets, but the offsets are committed all the time
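
Note that saveToCassandra is an action: the tasks run on the executors, but the call itself blocks on the driver and throws there if the batch fails. One way to get commit-only-on-success is to let that exception skip the commit inside the same foreachRDD body. A sketch assuming the spark-cassandra-connector and the kafka010 direct stream API (the record => () mapping is kept from the thread and would need to produce real columns):

```scala
import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges}
import com.datastax.spark.connector._

stream1.foreachRDD { rdd =>
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  try {
    // Tasks run on the executors; this call throws on the driver if the write fails
    rdd.map { record => () }.saveToCassandra("keyspace1", "table1")

    // Runs on the driver; only reached if saveToCassandra succeeded
    stream1.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
  } catch {
    case e: Exception =>
      // Do not commit; the same offsets will be reprocessed on restart
      println(s"batch failed, offsets not committed: ${e.getMessage}")
  }
}
```

commitAsync is itself asynchronous and at-least-once, so the consumer side must still tolerate reprocessing of a failed batch after a restart.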

spark-shell not getting launched - Queue's AM resource limit exceeded.

2017-08-06 Thread karan alang
Hello - I have HDP 2.5.x and I'm trying to launch spark-shell. The ApplicationMaster gets launched, but YARN is not able to assign containers. *Command ->* ./bin/spark-shell --master yarn-client --driver-memory 512m --executor-memory 512m *Error ->* [Sun Aug 06 19:33:29 + 2017] Application is
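
The "Queue's AM resource limit exceeded" message usually means the capacity scheduler's cap on ApplicationMaster memory for the queue is too low to fit even one AM. One knob to check, as a config sketch for capacity-scheduler.xml (0.5 is an example value; the default is 0.1):

```xml
<!-- Fraction of the queue's capacity that ApplicationMasters may use -->
<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.5</value>
</property>
```

After changing it, refresh the queues with yarn rmadmin -refreshQueues (or restart the ResourceManager through Ambari on HDP).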