Great news... thank you very much!
On Thu, Nov 8, 2018, 5:19 PM Stavros Kontopoulos
<stavros.kontopou...@lightbend.com> wrote:
> Awesome!
>
> On Thu, Nov 8, 2018 at 9:36 PM, Jules Damji wrote:
>
>> Indeed!
>>
>> Sent from my iPhone
>> Pardon the dumb thumb typos :)
>>
>> On Nov 8, 2018, at 11:31
Hello
SBT's incremental compilation has been a huge plus for building Spark + Scala
applications in SBT for some time. It seems Maven can also support
incremental compilation with the Zinc server. Given that, I am interested
in the community's experience -
1. Spark documentation says SBT is being used
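For reference, a minimal build.sbt sketch for a Spark application (the
version numbers are assumptions, not from the thread; sbt's incremental
compilation works out of the box, e.g. via ~compile):

// Minimal build.sbt (versions are placeholders; match your cluster)
name := "spark-app"
scalaVersion := "2.11.12"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.8" % "provided"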
Hello
Has anyone used Spark to solve minimum-cost flow problems? I
am quite new to combinatorial optimization algorithms, so any help,
suggestions, or libraries would be very appreciated.
Thanks
Swapnil
Ping... can someone please confirm whether this is an issue or not?
-
Swapnil
On Thu, Aug 31, 2017 at 12:27 PM, Swapnil Shinde
wrote:
> Hello All
>
> I am observing some strange results with the aggregateByKey API, which is
> implemented with combineByKey. Not sure if this is by d
Hello All
I am observing some strange results with the aggregateByKey API, which is
implemented with combineByKey. Not sure if this is by design or a bug.
I created this toy example, but the same problem can be observed on large
datasets as well -
case class ABC(key: Int, c1: Int, c2: Int)
case class AB
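For illustration, a hedged, self-contained sketch of the aggregateByKey
pattern the toy example presumably exercises (everything beyond the ABC case
class is an assumption). One classic source of "strange" results is mutating
the zero value in place, since Spark reuses it within a partition:

import org.apache.spark.{SparkConf, SparkContext}

object AggregateByKeyDemo {
  case class ABC(key: Int, c1: Int, c2: Int)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("aggDemo").setMaster("local[*]"))
    val data = sc.parallelize(Seq(ABC(1, 2, 3), ABC(1, 4, 5), ABC(2, 6, 7)))
    // Sum c1 and c2 per key. Keep the zero value immutable: mutating a
    // shared mutable zero in seqOp is a common cause of wrong aggregates.
    val sums = data
      .map(r => (r.key, (r.c1, r.c2)))
      .aggregateByKey((0, 0))(
        (acc, v) => (acc._1 + v._1, acc._2 + v._2), // seqOp: within a partition
        (a, b) => (a._1 + b._1, a._2 + b._2)        // combOp: across partitions
      )
    sums.collect().foreach(println) // expect (1,(6,8)) and (2,(6,7))
    sc.stop()
  }
}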
Hello
I am using Spark 2.0.1 and saw that the CSV file format stores output with
the job UUID in the file name.
https://github.com/apache/spark/blob/v2.0.1/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVRelation.scala#L191
I want to avoid the CSV writer putting the job UUID in the name. Is there any property
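One hedged workaround (a sketch, not a documented property): write as usual,
then rename the part files with the Hadoop FileSystem API. The directory path
and naming scheme below are assumptions:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Rename part-* outputs to simple sequential names after the write.
val fs = FileSystem.get(new Configuration())
val outDir = new Path("/tmp/csv-output") // assumed output directory
fs.listStatus(outDir)
  .map(_.getPath)
  .filter(_.getName.startsWith("part-"))
  .zipWithIndex
  .foreach { case (src, i) => fs.rename(src, new Path(outDir, s"part-$i.csv")) }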
, 2017 at 10:47 PM, cht liu wrote:
> Do you have Spark's fault tolerance mechanism enabled? When an RDD is
> checkpointed, a separate job is started at the end of the main job to
> write the checkpoint data to the file system for highly available
> persistence.
>
> 2017-03-08 2:45 GMT+08:00 Swapnil Shin
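To make the checkpointing behavior described above concrete, a minimal
sketch (the checkpoint directory and app name are assumptions):

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("ckptDemo").setMaster("local[*]"))
sc.setCheckpointDir("/tmp/spark-checkpoints") // assumed path; use HDFS on a cluster
val rdd = sc.parallelize(1 to 100).map(_ * 2)
rdd.checkpoint() // marked now; written by a separate job at the next action
rdd.count()      // triggers the computation plus the checkpoint-writing job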
Hello all
I have a Spark job that reads Parquet data and partitions it based on one
of the columns. I made sure the partitions are equally distributed and not skewed.
My code looks like this -
datasetA.write.partitionBy("column1").parquet(outputPath)
Execution plan - [inline screenshot omitted]
All tasks(~
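As one hedged variation (not from the thread), repartitioning on the
partition column before partitionBy is a common way to control how tasks map
to output partitions; only column1 and outputPath come from the snippet above:

import org.apache.spark.sql.functions.col

// Assumed Spark 2.x Dataset API: one shuffle so rows for each column1
// value are colocated before the partitioned write.
datasetA
  .repartition(col("column1"))
  .write
  .partitionBy("column1")
  .parquet(outputPath)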
Hello All
I am facing a FileNotFoundException for a shuffle index file when running a
job with large data. The same job runs fine with smaller datasets. These are my
cluster specifications -
No of nodes - 19
Total cores - 380
Memory per executor - 32G
Spark 1.6 (MapR version)
spark.shuffle.service.enabled
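For reference, a hedged sketch of enabling the external shuffle service from
application code (the property keys are standard Spark configs; pairing it
with dynamic allocation is an assumption about intent, and the service itself
must also be running on each node):

import org.apache.spark.{SparkConf, SparkContext}

// The external shuffle service keeps serving shuffle index/data files even
// if the executor that wrote them is lost, which can help with missing
// shuffle-file errors on large jobs.
val conf = new SparkConf()
  .setAppName("largeJob")
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.dynamicAllocation.enabled", "true")
val sc = new SparkContext(conf)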
; you will see it in explain.
>
> On Sat, Nov 26, 2016 at 10:51 AM, Swapnil Shinde wrote:
>
>> Hello
>> I am trying a broadcast join on dataframes, but it is still doing
>> SortMergeJoin. I even tried setting spark.sql.autoBroadcastJoinThreshold
>> hig
Hello
I am trying a broadcast join on dataframes, but it is still doing
SortMergeJoin. I even tried setting spark.sql.autoBroadcastJoinThreshold
higher, but still no luck.
Related piece of code -
val c = a.join(broadcast(b), "id")
On a side note, if I do SizeEstimator.estimate(b) and it
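A hedged sketch of forcing and verifying the broadcast (a and b are the
dataframes from above; the threshold value and the spark handle are
assumptions, and on Spark 1.x the config would go through sqlContext.setConf
instead):

import org.apache.spark.sql.functions.broadcast

// Raise the auto-broadcast threshold (bytes); the broadcast() hint below
// should force a broadcast regardless of the size estimate.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", (100L * 1024 * 1024).toString)
val c = a.join(broadcast(b), "id")
c.explain() // expect BroadcastHashJoin instead of SortMergeJoin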
Hello
I am trying to do an inner join with broadcastHint and am getting the
exception below.
I tried increasing "sqlContext.conf.autoBroadcastJoinThreshold", but still
no luck.
Code snippet -
val dpTargetUvOutput =
  pweCvfMUVDist.as("a").join(broadcast(sourceAssgined.as("b")), $"a.web_id"
    === $"b.source_id
of NFS workings should correct
> me if I am wrong.
>
>
> On Fri, Aug 28, 2015 at 1:12 AM, Swapnil Shinde
> wrote:
>
>> Thanks, Rishitesh!!
>> 1. I get that the driver doesn't need to be on the master, but there is a lot of
>> communication between the driver and the cluster. That
>
> On Thursday, August 27, 2015, Swapnil Shinde
> wrote:
>
>> Hello
>> I am new to the Spark world and started exploring recently in standalone
>> mode. It would be great if I could get clarification on the doubts below -
>>
>> 1. Driver locality - It is mentioned in the documen
Hello
I am new to the Spark world and started exploring recently in standalone mode.
It would be great if I could get clarification on the doubts below -
1. Driver locality - It is mentioned in the documentation that "client"
deploy mode is not good if the machine running "spark-submit" is not co-located
with the worker m