Re: Error while merge in delta table

2023-05-12 Thread Farhan Misarwala
PySpark app which writes in parallel. It would help if you could reproduce this and share a code snippet here. All the best, Farhan. On Fri, May 12, 2023 at 10:17 AM Karthick Nk wrote: > Hi Farhan, Thank you for your response. I am using Databricks with 11.3.x-scala2.12.
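For context, a minimal sketch of the kind of Delta MERGE this thread is about, written against the delta-spark API. The table path, staging path, and join key ("id") are placeholders, not the original poster's code:

from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder paths and join key; the thread's scenario is several parallel
# writers running a MERGE like this against the same target table.
target = DeltaTable.forPath(spark, "/mnt/delta/target_table")
updates = spark.read.format("parquet").load("/mnt/staging/updates")

(target.alias("t")
    .merge(updates.alias("u"), "t.id = u.id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())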

Re: Error while merge in delta table

2023-05-11 Thread Farhan Misarwala
if this is the case. Thanks, Farhan. On Thu, May 11, 2023 at 2:54 PM Jacek Laskowski wrote: > Hi Karthick, Sorry to say it but there's not enough "data" to help you. There should be something more above or below this exception snippet you posted that could

Re: Spark JDBC errors out

2021-05-02 Thread Farhan Misarwala
say about this. Thanks for looking into it :) Regards, Farhan. On Fri, Apr 30, 2021 at 7:01 PM Mich Talebzadeh wrote: > Hi Farhan, I have used it successfully and it works. The only thing that potentially can cause this issue is the JDBC driver itself. Have you tried another

Re: Spark JDBC errors out

2021-04-30 Thread Farhan Misarwala
Hi Mich, I have tried this already; in my Java code I am using the same methods you are. I see the same error, where 'dbtable' or 'query' gets added as a connection property in the JDBC connection string for the source database, which is AAS in my case. Thanks, Farhan. On Fri, Apr 30, 2021
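For reference, a minimal PySpark sketch of a Spark JDBC read using the 'query' option. The poster's application is in Java, and the URL, driver class, and query below are placeholders, not the actual AAS connection details:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder connection details. The poster reports that 'dbtable'/'query'
# ends up as a connection property sent to the source, which the source rejects.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:sqlserver://host:1433;databaseName=mydb")
      .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
      .option("query", "SELECT col_a, col_b FROM some_table")
      .load())

df.show()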

Re: Record count query parallel processing in databricks spark delta lake

2020-01-19 Thread Farhan Misarwala
Hi Anbutech, If I am not mistaken, you are trying to read multiple DataFrames from around 150 different paths (in your case the Kafka topics) to count their records. You have all these paths stored in a CSV with columns year, month, day, and hour. Here is what I came up with; I have
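A minimal sketch of this approach, since the reply's own code is cut off in the snippet. The CSV location, column names, base path layout, and delta format are assumptions for illustration:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical CSV listing ~150 path components with header columns year, month, day, hour.
paths_df = spark.read.option("header", "true").csv("/mnt/config/paths.csv")

# Build each path and count its records; base path and storage format are placeholders.
counts = []
for row in paths_df.collect():
    path = "/mnt/data/topic/{}/{}/{}/{}".format(row["year"], row["month"], row["day"], row["hour"])
    counts.append((path, spark.read.format("delta").load(path).count()))

for path, n in counts:
    print(path, n)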

mllib kmeans produce 1 large and many extremely small clusters

2015-08-09 Thread farhan
I tried running mllib k-means with the 20newsgroups data set from sklearn. On a 5000-document data set I get one cluster with most of the documents, and the other clusters have just a handful of documents. #code newsgroups_train = fetch_20newsgroups(subset='train',random_state=1,remove=('headers',
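An assumed reconstruction of the setup being described, since the snippet cuts off mid-call. The full remove tuple, the TF-IDF feature pipeline, and the k value are illustrative, and this uses the DataFrame-based pyspark.ml KMeans rather than whatever exact MLlib call the poster used:

from sklearn.datasets import fetch_20newsgroups
from pyspark.sql import SparkSession
from pyspark.ml.feature import Tokenizer, HashingTF, IDF
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.getOrCreate()

# Assumed completion of the truncated fetch_20newsgroups call.
newsgroups_train = fetch_20newsgroups(
    subset='train', random_state=1,
    remove=('headers', 'footers', 'quotes'))

# Take ~5000 documents, as in the post, and move them into a Spark DataFrame.
docs = spark.createDataFrame(
    [(t,) for t in newsgroups_train.data[:5000]], ["text"])

# Simple TF-IDF features; parameter values are illustrative only.
tokens = Tokenizer(inputCol="text", outputCol="words").transform(docs)
tf = HashingTF(inputCol="words", outputCol="tf", numFeatures=10000).transform(tokens)
tfidf = IDF(inputCol="tf", outputCol="features").fit(tf).transform(tf)

# Cluster and inspect the per-cluster document counts (the skewed sizes reported).
model = KMeans(k=20, seed=1).fit(tfidf)
model.transform(tfidf).groupBy("prediction").count().show()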