Re: [SparkSql] Casting of Predicate Literals

2020-08-19 Thread Bart Samwel
And how are we doing here on integer pushdowns? If someone does e.g. CAST(short_col AS LONG) < 1000, can we still push down "short_col < 1000" without the cast? On Tue, Aug 4, 2020 at 6:55 PM Russell Spitzer wrote: > Thanks! That's exactly what I was hoping for! Thanks for finding the Jira >

Re: Spark deletes all existing partitions in SaveMode.Overwrite - Expected behavior ?

2020-08-19 Thread golokeshpatra.patra
Adding this simple setting helped me overcome the issue: spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic"). My issue: in an S3 folder, I previously had data partitioned by ingestiontime. Now I wanted to reprocess this data and partition it by businessname &
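To show why that setting matters, here is a minimal pure-Python model of the two overwrite behaviours. It is not Spark code. The function name and the partition keys such as "p=a" are hypothetical. Under "static" mode, which is the default and matches the behaviour this thread describes, everything under the target path is replaced. Under "dynamic" mode, only the partitions present in the incoming data are replaced.

```python
def overwrite_partitions(existing: dict, incoming: dict, mode: str) -> dict:
    """Toy model of overwriting a partitioned table (hypothetical, not Spark's API).

    existing / incoming map a partition directory name to its rows.
    """
    if mode == "static":
        # No partition spec given: every existing partition is dropped
        # and only the incoming partitions remain.
        return dict(incoming)
    if mode == "dynamic":
        # Keep untouched partitions; replace only those present in incoming.
        result = dict(existing)
        result.update(incoming)
        return result
    raise ValueError(f"unknown mode: {mode}")


existing = {"p=a": [1], "p=b": [2]}
incoming = {"p=b": [3]}
print(overwrite_partitions(existing, incoming, "static"))
print(overwrite_partitions(existing, incoming, "dynamic"))
```

In this model, only the dynamic mode preserves partition "p=a", which was not part of the rewrite.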

Re: [SparkSql] Casting of Predicate Literals

2020-08-19 Thread Wenchen Fan
Currently we can't. This is something we should improve, by either pushing down the cast to the data source, or simplifying the predicates to eliminate the cast.
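The second option, simplifying the predicate to eliminate the cast, can be illustrated with Bart's CAST(short_col AS LONG) < 1000 example. Below is a minimal sketch over predicate strings, not Spark's optimizer. The function name and the string representation are hypothetical. It assumes a filter context, where a NULL result and FALSE both drop the row. The key observation is that a literal inside the narrow type's range can be compared without widening, while a literal outside that range makes the comparison constant for non-null values.

```python
# Value ranges of narrow integral types (two's complement).
INTEGRAL_RANGES = {
    "byte": (-2**7, 2**7 - 1),
    "short": (-2**15, 2**15 - 1),
    "int": (-2**31, 2**31 - 1),
}


def unwrap_cast_lt(column: str, column_type: str, literal: int) -> str:
    """Rewrite CAST(column AS LONG) < literal without the cast (hypothetical helper).

    Assumes filter semantics: NULL and FALSE both filter out the row.
    """
    lo, hi = INTEGRAL_RANGES[column_type]
    if literal > hi:
        # Every non-null value of the narrow type is below the literal.
        return f"{column} IS NOT NULL"
    if literal <= lo:
        # No value of the narrow type can be below the literal.
        return "FALSE"
    # The literal fits in the narrow type, so the comparison is equivalent
    # without widening the column; this form can be pushed down.
    return f"{column} < {literal}"


print(unwrap_cast_lt("short_col", "short", 1000))
```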

Allow average out of a Date

2020-08-19 Thread Driesprong, Fokko
Hi all, Personally, I'm a big fan of the .summary() function to compute statistics of a dataframe. I often use this for debugging pipelines, and check what the impact of the RDD is after changing code. I've noticed that not all datatypes are in this summary. Currently, there is a list
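One way to define an average for dates relies on the fact that Spark SQL represents DATE values internally as a count of days since 1970-01-01. You can average that count and convert the result back to a date. The sketch below is plain Python, not the summary() implementation. Rounding to the nearest day using Python's round(), which breaks ties to even, is an assumption. Other rounding rules are equally plausible.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)


def average_date(dates):
    """Mean of a list of dates, computed on days since the Unix epoch."""
    if not dates:
        return None  # mirrors SQL AVG over an empty input
    days = [(d - EPOCH).days for d in dates]
    mean_days = sum(days) / len(days)
    # Round the fractional day count back to a whole day (assumed rule).
    return EPOCH + timedelta(days=round(mean_days))


print(average_date([date(2020, 1, 1), date(2020, 1, 3)]))
```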

RE: [VOTE] Release Spark 2.4.7 (RC1)

2020-08-19 Thread Nicholas Marion
It appears all 3 issues slated for Spark 2.4.7 have been merged. Should we be looking at getting RC2 ready? Regards,

Re: Running K8s integration tests for changes in core?

2020-08-19 Thread shane knapp ☠
we'll be gated by the number of ubuntu workers w/minikube and docker, but it shouldn't be too bad as the full integration test takes ~45m, vs 4+ hrs for the regular PRB. i can enable this in about 1m of time if the consensus is for us to want this. On Wed, Aug 19, 2020 at 11:37 AM Holden Karau

Re: Running K8s integration tests for changes in core?

2020-08-19 Thread Holden Karau
Sounds good. In the meantime would folks committing things in core run the K8s PRB or run it locally? A second change this morning was committed that broke the K8s PR tests. On Tue, Aug 18, 2020 at 9:53 PM Prashant Sharma wrote: > +1, we should enable. > > On Wed, Aug 19, 2020 at 9:18 AM Holden

Re: [VOTE] Release Spark 2.4.7 (RC1)

2020-08-19 Thread Wenchen Fan
I think so. I don't see other bug reports for 2.4.

Re: SPIP: Catalog API for view metadata

2020-08-19 Thread Ryan Blue
I think it is a good idea to keep tables and views separate. The two main arguments I’ve heard for combining lookup into a single function are the ones brought up in this thread. First, an identifier in a catalog must be either a view or a table and should not collide. Second, a single lookup is
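The trade-off in the first argument can be made concrete with a toy catalog. It keeps separate table and view lookups but still rejects an identifier that would collide across the two namespaces. The class and method names are hypothetical and do not reflect the interface proposed in the SPIP. The resolve method shows the cost behind the second argument: resolving an unknown identifier needs up to two lookups.

```python
class IdentifierConflict(Exception):
    """Raised when an identifier is already used by a table or a view."""


class ToyCatalog:
    """Hypothetical catalog with separate table and view namespaces."""

    def __init__(self):
        self._tables = {}
        self._views = {}

    def _check_free(self, ident):
        # Tables and views share one identifier space: no collisions allowed.
        if ident in self._tables or ident in self._views:
            raise IdentifierConflict(ident)

    def create_table(self, ident, schema):
        self._check_free(ident)
        self._tables[ident] = schema

    def create_view(self, ident, sql_text):
        self._check_free(ident)
        self._views[ident] = sql_text

    def load_table(self, ident):
        return self._tables.get(ident)

    def load_view(self, ident):
        return self._views.get(ident)

    def resolve(self, ident):
        """Look up an identifier as a view first, then as a table."""
        view = self.load_view(ident)
        if view is not None:
            return ("view", view)
        table = self.load_table(ident)
        if table is not None:
            return ("table", table)
        return None
```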

Question about Expression Encoders

2020-08-19 Thread Mark Hamilton
Dear Spark Developers, In our team's Spark library we use ExpressionEncoders to help us automatically generate Spark SQL types from Scala case classes. https://github.com/Azure/mmlspark/blob/master/src/main/scala/com/microsoft/ml/spark/core/schema/SparkBindings.scala
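As a language-neutral illustration of deriving a SQL type from a class definition, the sketch below walks a Python dataclass and emits a Spark SQL DDL type string, recursing into nested dataclasses the way nested case classes become nested structs. It is only an analogy and is not how ExpressionEncoder works. The PY_TO_SQL mapping is an assumption, for example mapping Python int to bigint. The Point and Labeled classes are hypothetical examples.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import get_type_hints

# Assumed mapping from Python scalar types to Spark SQL type names.
PY_TO_SQL = {int: "bigint", float: "double", str: "string", bool: "boolean"}


def sql_type(tp) -> str:
    """Return a Spark SQL DDL type string for a scalar type or a dataclass."""
    if is_dataclass(tp):
        hints = get_type_hints(tp)
        cols = ",".join(f"{f.name}:{sql_type(hints[f.name])}" for f in fields(tp))
        return f"struct<{cols}>"
    return PY_TO_SQL[tp]


@dataclass
class Point:
    x: float
    y: float


@dataclass
class Labeled:
    label: str
    point: Point


print(sql_type(Labeled))
```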