Efficiently updating running sums only on new data

2022-10-11 Thread Greg Kopff
I'm new to Spark and would like some advice on how to approach a problem. I have a large dataset of dated observations. It also has columns that are running sums of some of the other columns. date | thing | foo | bar | foo_sum | bar_sum |
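A minimal sketch of the idea in the subject line, in plain Python rather than Spark so it stays self-contained. It assumes the running sums are kept per `thing` and ordered by `date`, which the truncated message does not confirm. The function name `extend_running_sums` and the `last_totals` mapping are hypothetical, introduced only for illustration. Instead of recomputing sums over the whole history, it starts from the latest stored totals and adds only the new rows.

```python
def extend_running_sums(last_totals, new_rows):
    """Extend per-thing running sums using only newly arrived rows.

    last_totals: {thing: (foo_sum, bar_sum)} from the latest stored row per thing
                 (hypothetical structure, assumed for this sketch).
    new_rows:    iterable of (date, thing, foo, bar) tuples not yet summed.
    Returns (rows_with_sums, updated_totals); last_totals is not modified.
    """
    totals = dict(last_totals)  # copy so the caller's mapping is left untouched
    out = []
    # Process each thing's rows in date order so the sums accumulate correctly.
    for date, thing, foo, bar in sorted(new_rows, key=lambda r: (r[1], r[0])):
        foo_sum, bar_sum = totals.get(thing, (0, 0))  # unseen things start at zero
        foo_sum += foo
        bar_sum += bar
        totals[thing] = (foo_sum, bar_sum)
        out.append((date, thing, foo, bar, foo_sum, bar_sum))
    return out, totals


if __name__ == "__main__":
    stored = {"a": (10, 100)}
    incoming = [("2022-10-02", "a", 2, 20), ("2022-10-01", "a", 1, 10)]
    rows, stored = extend_running_sums(stored, incoming)
    for row in rows:
        print(row)
```

In Spark, the same approach would presumably become a windowed sum over the new rows only, combined with the latest stored totals per `thing`. That mapping is an assumption and is not something stated in the thread.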

Re: Why the same INSERT OVERWRITE sql , final table file produced by spark sql is larger than hive sql?

2022-10-11 Thread Sadha Chilukoori
I have faced the same problem, where Hive and Spark ORC were both using Snappy compression (Hive 2.1, Spark 2.4.8). I'm curious to learn what the root cause could be. -S On Tue, Oct 11, 2022, 2:18 AM Chartist <13289341...@163.com> wrote: > > Hi,All > > I encountered a problem as the

Re: As a Scala newbie starting to work with Spark does it make more sense to learn Scala 2 or Scala 3?

2022-10-11 Thread Sean Owen
See the pom.xml file: https://github.com/apache/spark/blob/master/pom.xml#L3590 It is 2.13.8 at the moment; IIRC there was some Scala issue that prevented updating to 2.13.9. Search the issues/PRs. On Tue, Oct 11, 2022 at 6:11 PM Henrik Park wrote: > scala 2.13.9 was released. do you know which spark

Re: As a Scala newbie starting to work with Spark does it make more sense to learn Scala 2 or Scala 3?

2022-10-11 Thread Henrik Park
Scala 2.13.9 was released. Do you know which Spark version will have it built in? Thanks. Sean Owen wrote: I would imagine that Scala 2.12 support goes away, and Scala 3 support is added, for maybe Spark 4.0, and maybe that happens in a year or so.

Re: As a Scala newbie starting to work with Spark does it make more sense to learn Scala 2 or Scala 3?

2022-10-11 Thread Sean Owen
For Spark, the issue is maintaining simultaneous support for multiple Scala versions, which have historically been mutually incompatible across minor versions. Until Scala 2.12 support is reasonable to remove, it's hard to also support Scala 3, as it would mean maintaining three versions of code. I

Re: As a Scala newbie starting to work with Spark does it make more sense to learn Scala 2 or Scala 3?

2022-10-11 Thread Никита Романов
No one knows for sure except Apache, but I’d learn Scala 2 if I were you. Even if Spark one day migrates to Scala 3 (which is not a given), it’ll take a while for the industry to adjust. It even takes a while to move from Spark 2 to Spark 3 (Scala 2.11 to Scala 2.12). I don’t think your

Why the same INSERT OVERWRITE sql , final table file produced by spark sql is larger than hive sql?

2022-10-11 Thread Chartist
Hi all, I encountered the problem described in the e-mail subject. The details are as follows: SQL: insert overwrite table mytable partition(pt='20220518') select guid, user_new_id, sum_credit_score, sum_credit_score_change, platform_credit_score_change, bike_credit_score_change,