Great news... thank you very much!
On Thu, Nov 8, 2018, 5:19 PM Stavros Kontopoulos
<stavros.kontopou...@lightbend.com> wrote:
> Awesome!
>
> On Thu, Nov 8, 2018 at 9:36 PM, Jules Damji wrote:
>
>> Indeed!
>>
>> Sent from my iPhone
>> Pardon the dumb thumb typos :)
>>
>> On Nov 8, 2018, at 11:31
Hello
SBT's incremental compilation has been a huge plus for building Spark + Scala
applications in SBT for some time. It seems Maven can also support
incremental compilation with the Zinc server. Given that, I am interested
in the community's experience -
1. Spark documentation says SBT is being used
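For reference, a minimal build.sbt sketch for a Spark application (the
version numbers are assumptions, not from the thread; sbt's incremental
compilation works out of the box, e.g. via ~compile):

// Minimal build.sbt (versions are placeholders; match your cluster)
name := "spark-app"
scalaVersion := "2.11.12"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.8" % "provided"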
Hello
Has anyone used Spark to solve minimum-cost flow problems? I
am quite new to combinatorial optimization algorithms, so any help,
suggestions, or libraries would be very appreciated.
Thanks
Swapnil
Ping... can someone please confirm whether this is an issue or not?
-
Swapnil
On Thu, Aug 31, 2017 at 12:27 PM, Swapnil Shinde
wrote:
> Hello All
>
> I am observing some strange results with the aggregateByKey API, which is
> implemented with combineByKey. Not sure if this is by d
Hello All
I am observing some strange results with the aggregateByKey API, which is
implemented with combineByKey. Not sure if this is by design or a bug.
I created this toy example, but the same problem can be observed on large
datasets as well -
case class ABC(key: Int, c1: Int, c2: Int)
case class AB
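For illustration, a hedged, self-contained sketch of the aggregateByKey
pattern the toy example presumably exercises (everything beyond the ABC case
class is an assumption). One classic source of "strange" results is mutating
the zero value in place, since Spark reuses it within a partition:

import org.apache.spark.{SparkConf, SparkContext}

object AggregateByKeyDemo {
  case class ABC(key: Int, c1: Int, c2: Int)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("aggDemo").setMaster("local[*]"))
    val data = sc.parallelize(Seq(ABC(1, 2, 3), ABC(1, 4, 5), ABC(2, 6, 7)))
    // Sum c1 and c2 per key. Keep the zero value immutable: mutating a
    // shared mutable zero in seqOp is a common cause of wrong aggregates.
    val sums = data
      .map(r => (r.key, (r.c1, r.c2)))
      .aggregateByKey((0, 0))(
        (acc, v) => (acc._1 + v._1, acc._2 + v._2), // seqOp: within a partition
        (a, b) => (a._1 + b._1, a._2 + b._2)        // combOp: across partitions
      )
    sums.collect().foreach(println) // expect (1,(6,8)) and (2,(6,7))
    sc.stop()
  }
}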
Hello
I am using Spark 2.0.1 and saw that the CSV file format stores output with
the job UUID in the file name.
https://github.com/apache/spark/blob/v2.0.1/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVRelation.scala#L191
I want to avoid the CSV writer putting the job UUID in the name. Is there any property
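One hedged workaround (a sketch, not a documented property): write as usual,
then rename the part files with the Hadoop FileSystem API. The directory path
and naming scheme below are assumptions:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Rename part-* outputs to simple sequential names after the write.
val fs = FileSystem.get(new Configuration())
val outDir = new Path("/tmp/csv-output") // assumed output directory
fs.listStatus(outDir)
  .map(_.getPath)
  .filter(_.getName.startsWith("part-"))
  .zipWithIndex
  .foreach { case (src, i) => fs.rename(src, new Path(outDir, s"part-$i.csv")) }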
, 2017 at 10:47 PM, cht liu wrote:
> Do you have Spark's fault tolerance mechanism enabled? When an RDD is
> checkpointed, a separate job is started at the end of the main job to
> write the checkpoint data to the file system for highly available
> persistence.
>
> 2017-03-08 2:45 GMT+08:00 Swapnil Shin
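To make the checkpointing behavior described above concrete, a minimal
sketch (the checkpoint directory and app name are assumptions):

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("ckptDemo").setMaster("local[*]"))
sc.setCheckpointDir("/tmp/spark-checkpoints") // assumed path; use HDFS on a cluster
val rdd = sc.parallelize(1 to 100).map(_ * 2)
rdd.checkpoint() // marked now; written by a separate job at the next action
rdd.count()      // triggers the computation plus the checkpoint-writing job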
Hello all
I have a Spark job that reads Parquet data and partitions it based on one
of the columns. I made sure the partitions are equally distributed and not skewed.
My code looks like this -
datasetA.write.partitionBy("column1").parquet(outputPath)
Execution plan - [inline screenshot omitted]
All tasks(~
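As one hedged variation (not from the thread), repartitioning on the
partition column before partitionBy is a common way to control how tasks map
to output partitions; only column1 and outputPath come from the snippet above:

import org.apache.spark.sql.functions.col

// Assumed Spark 2.x Dataset API: one shuffle so rows for each column1
// value are colocated before the partitioned write.
datasetA
  .repartition(col("column1"))
  .write
  .partitionBy("column1")
  .parquet(outputPath)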
Hello All
I am facing a FileNotFoundException for a shuffle index file when running a
job with large data. The same job runs fine with smaller datasets. These are my
cluster specifications -
No of nodes - 19
Total cores - 380
Memory per executor - 32G
Spark 1.6 (MapR version)
spark.shuffle.service.enabled
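For reference, a hedged sketch of enabling the external shuffle service from
application code (the property keys are standard Spark configs; pairing it
with dynamic allocation is an assumption about intent, and the service itself
must also be running on each node):

import org.apache.spark.{SparkConf, SparkContext}

// The external shuffle service keeps serving shuffle index/data files even
// if the executor that wrote them is lost, which can help with missing
// shuffle-file errors on large jobs.
val conf = new SparkConf()
  .setAppName("largeJob")
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.dynamicAllocation.enabled", "true")
val sc = new SparkContext(conf)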
; you will see it in explain.
>
> On Sat, Nov 26, 2016 at 10:51 AM, Swapnil Shinde wrote:
>
>> Hello
>> I am trying a broadcast join on dataframes, but it is still doing
>> SortMergeJoin. I even tried setting spark.sql.autoBroadcastJoinThreshold
>> hig
Hello
I am trying a broadcast join on dataframes, but it is still doing
SortMergeJoin. I even tried setting spark.sql.autoBroadcastJoinThreshold
higher, but still no luck.
Related piece of code -
val c = a.join(broadcast(b), "id")
On a side note, if I do SizeEstimator.estimate(b) and it
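A hedged sketch of forcing and verifying the broadcast (a and b are the
dataframes from above; the threshold value and the spark handle are
assumptions, and on Spark 1.x the config would go through sqlContext.setConf
instead):

import org.apache.spark.sql.functions.broadcast

// Raise the auto-broadcast threshold (bytes); the broadcast() hint below
// should force a broadcast regardless of the size estimate.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", (100L * 1024 * 1024).toString)
val c = a.join(broadcast(b), "id")
c.explain() // expect BroadcastHashJoin instead of SortMergeJoin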
Hello
I am trying to do an inner join with broadcastHint and am getting the
exception below.
I tried increasing "sqlContext.conf.autoBroadcastJoinThreshold", but still
no luck.
Code snippet -
val dpTargetUvOutput =
  pweCvfMUVDist.as("a").join(broadcast(sourceAssgined.as("b")), $"a.web_id"
    === $"b.source_id
of NFS workings should correct
> me if I am wrong.
>
>
> On Fri, Aug 28, 2015 at 1:12 AM, Swapnil Shinde
> wrote:
>
>> Thanks, Rishitesh!!
>> 1. I get that the driver doesn't need to be on the master, but there is a lot of
>> communication between the driver and the cluster. That
>
> On Thursday, August 27, 2015, Swapnil Shinde
> wrote:
>
>> Hello
>> I am new to the Spark world and started exploring recently in standalone
>> mode. It would be great if I could get clarification on the doubts below -
>>
>> 1. Driver locality - It is mentioned in the documen
Hello
I am new to the Spark world and started exploring recently in standalone mode.
It would be great if I could get clarification on the doubts below -
1. Driver locality - It is mentioned in the documentation that "client"
deploy mode is not good if the machine running "spark-submit" is not co-located
with the worker m