Re: Can this use-case be handled with spark-sql streaming and Cassandra?
What exactly is your requirement? Is the read-before-write mandatory? Are you maintaining state in Cassandra?

Regards
Prathmesh Ranaut
https://linkedin.com/in/prathmeshranaut

> On Aug 29, 2019, at 3:35 PM, Shyam P wrote:
>
> Thanks Aayush. For every record, do I need to get the data from the
> Cassandra table and update it? Otherwise it may not update the existing
> record.
>
> What is this datastax-spark-connector? Is that not a "Cassandra connector
> library written for Spark"? If not, how do we write one ourselves?
> Where and how do I start? Can you please guide me?
>
> Thank you.
> Shyam
>
>> On Thu, Aug 29, 2019 at 5:03 PM Aayush Ranaut wrote:
>>
>> Cassandra is upsert; you should be able to do what you need with a
>> single statement, unless you're looking to maintain counters.
>>
>> I'm not sure whether there is a Cassandra connector library written for
>> Spark Streaming, because we wrote one ourselves when we wanted to do the
>> same.
>>
>> Regards
>> Prathmesh Ranaut
>> https://linkedin.com/in/prathmeshranaut
>>
>>> On Aug 29, 2019, at 7:21 AM, Shyam P wrote:
>>>
>>> Hi,
>>> I need to do a PoC for a business use-case.
>>>
>>> Use case: I need to update a record in a Cassandra table if it exists.
>>>
>>> Will Spark Streaming support comparing each record and updating the
>>> existing Cassandra record?
>>>
>>> For each record received from a Kafka topic, I want to check whether it
>>> is already in Cassandra or not; if yes, update the record, else insert
>>> a new record.
>>>
>>> How can this be done using Spark Structured Streaming and Cassandra?
>>> Do you have any snippet or sample?
>>>
>>> Thank you,
>>> Shyam
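As a concrete starting point, here is a minimal sketch of the kind of per-batch write this thread is describing, assuming the DataStax spark-cassandra-connector, Spark 2.4+ (for foreachBatch), and hypothetical names (keyspace "ks", table "records", topic "events"). Because Cassandra writes are upserts keyed on the table's primary key, no read-before-write is needed:

    import org.apache.spark.sql.{DataFrame, SparkSession}

    // Hypothetical names throughout: keyspace "ks", table "records",
    // Kafka topic "events", broker localhost:9092.
    val spark = SparkSession.builder()
      .appName("kafka-to-cassandra-upsert")
      .config("spark.cassandra.connection.host", "127.0.0.1") // assumed host
      .getOrCreate()

    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "events")
      .load()
      .selectExpr("CAST(key AS STRING) AS id", "CAST(value AS STRING) AS payload")

    // foreachBatch reuses the connector's batch writer; each write is an
    // upsert on the table's primary key, so existing rows are overwritten
    // and new rows are inserted by the same statement.
    val query = stream.writeStream
      .foreachBatch { (batch: DataFrame, batchId: Long) =>
        batch.write
          .format("org.apache.spark.sql.cassandra")
          .options(Map("keyspace" -> "ks", "table" -> "records"))
          .mode("append")
          .save()
      }
      .start()

    query.awaitTermination()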
Re: Can this use-case be handled with spark-sql streaming and Cassandra?
Cassandra is upsert; you should be able to do what you need with a single statement, unless you're looking to maintain counters.

I'm not sure whether there is a Cassandra connector library written for Spark Streaming, because we wrote one ourselves when we wanted to do the same.

Regards
Prathmesh Ranaut
https://linkedin.com/in/prathmeshranaut

> On Aug 29, 2019, at 7:21 AM, Shyam P wrote:
>
> Hi,
> I need to do a PoC for a business use-case.
>
> Use case: I need to update a record in a Cassandra table if it exists.
>
> Will Spark Streaming support comparing each record and updating the
> existing Cassandra record?
>
> For each record received from a Kafka topic, I want to check whether it
> is already in Cassandra or not; if yes, update the record, else insert a
> new record.
>
> How can this be done using Spark Structured Streaming and Cassandra?
> Do you have any snippet or sample?
>
> Thank you,
> Shyam
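To illustrate the upsert point, a small sketch using the spark-cassandra-connector against a hypothetical table ks.users with primary key id: writing a row whose key already exists simply overwrites its non-key columns, so insert and update are the same statement.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("cassandra-upsert-demo")
      .config("spark.cassandra.connection.host", "127.0.0.1") // assumed host
      .getOrCreate()
    import spark.implicits._

    // Hypothetical table: CREATE TABLE ks.users (id int PRIMARY KEY, name text)
    def save(rows: Seq[(Int, String)]): Unit =
      rows.toDF("id", "name").write
        .format("org.apache.spark.sql.cassandra")
        .options(Map("keyspace" -> "ks", "table" -> "users"))
        .mode("append")
        .save()

    save(Seq((1, "alice")))  // inserts the row id=1
    save(Seq((1, "bob")))    // same key: overwrites, leaving id=1 -> "bob"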
Re: Long-Running Spark application doesn't clean old shuffle data correctly
This is the job of the ContextCleaner. There are a few properties you can tweak to see if that helps:

spark.cleaner.periodicGC.interval
spark.cleaner.referenceTracking
spark.cleaner.referenceTracking.blocking.shuffle

Regards
Prathmesh Ranaut

> On Jul 21, 2019, at 11:31 AM, Alex Landa wrote:
>
> Hi,
>
> We are running a long-running Spark application (which executes lots of
> quick jobs using our scheduler) on a Spark standalone cluster (2.4.0).
> We see that old shuffle files (a week old, for example) are not deleted
> during the execution of the application, which leads to out-of-disk-space
> errors on the executors.
> If we re-deploy the application, the Spark cluster takes care of the
> cleaning and deletes the old shuffle data (since we have
> -Dspark.worker.cleanup.enabled=true in the worker config).
> I don't want to re-deploy our app every week or two, but to be able to
> configure Spark to clean old shuffle data (as it should).
>
> How can I configure Spark to delete old shuffle data during the lifetime
> of the application (not after)?
>
> Thanks,
> Alex
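For illustration, a sketch of setting these properties when building the session; the values below are assumptions, not recommendations, and the same keys can be passed to spark-submit with --conf instead:

    import org.apache.spark.sql.SparkSession

    // Illustrative values only. The periodic GC forces a driver-side GC so
    // the ContextCleaner can detect unreferenced shuffles; reference
    // tracking must be enabled for shuffle files to be cleaned at all.
    // Note: shuffles are only cleaned once the RDDs that produced them are
    // no longer referenced on the driver.
    val spark = SparkSession.builder()
      .appName("long-running-app")
      .config("spark.cleaner.periodicGC.interval", "15min")
      .config("spark.cleaner.referenceTracking", "true")
      .config("spark.cleaner.referenceTracking.blocking.shuffle", "true")
      .getOrCreate()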
Re: Spark Write method not ignoring double quotes in the csv file
Question 2: You might be creating a DataFrame while reading the Parquet file, then trimming the column:

val df = spark.read.load("file.parquet")
df.select(rtrim(col("columnName")))

Regards
Prathmesh Ranaut
https://linkedin.com/in/prathmeshranaut

> On Jul 12, 2019, at 9:15 AM, anbutech wrote:
>
> Hello All, could you please help me fix the below questions?
>
> Question 1:
>
> I have tried the below options while writing the final data to a CSV
> file, to ignore double quotes in that CSV file. Nothing worked. I'm
> using Spark version 2.2 and Scala version 2.11.
>
> .option("quote", "\"")
> .option("escape", ":")
> .option("escape", "")
> .option("quote", "\u0000")
>
> Code:
>
> finaldataset
>   .repartition(numberOfPartitions)
>   .write
>   .mode(SaveMode.Overwrite)
>   .option("delimiter", "|")
>   .option("header", "true")
>   .csv("path")
>
> output_data.csv:
>
> field|field2|""|field4|field5|""|field6|""|field7
>
> I want to remove the double quotes in the CSV file while writing with
> the Spark write method. Are there any options available?
>
> Question 2: Is there any way to remove the trailing white spaces in the
> fields while reading a Parquet file?
>
> Thanks,
> Anbu
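A sketch of one possible fix for both questions. The emptyValue option exists only in Spark 2.4+, and the null-character quote trick is a common workaround on older versions rather than a documented guarantee; the names mirror the question, with a stand-in DataFrame so the snippet is self-contained:

    import org.apache.spark.sql.{SaveMode, SparkSession}
    import org.apache.spark.sql.functions.rtrim

    val spark = SparkSession.builder().appName("csv-quote-demo").getOrCreate()
    import spark.implicits._

    // Stand-in for the question's "finaldataset"; note the empty-string
    // fields, which the CSV writer would otherwise emit as "".
    val finaldataset = Seq(("a", "", "c")).toDF("field1", "field2", "field3")

    finaldataset
      .repartition(1)
      .write
      .mode(SaveMode.Overwrite)
      .option("delimiter", "|")
      .option("header", "true")
      .option("quote", "\u0000")    // null char: effectively disables quoting
      // .option("emptyValue", "")  // Spark 2.4+ alternative: unquoted empties
      .csv("path")

    // Question 2: trim trailing whitespace per column after the Parquet read.
    val df = spark.read.parquet("file.parquet")
    val trimmed = df.withColumn("columnName", rtrim(df("columnName")))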