Hi all,
I’ve been playing around with the Vector and Matrix UDTs in pyspark.ml and
I’ve found myself wanting more.
There is a minor issue: with Arrow serialization enabled, these types
don't serialize properly in Python UDF calls or in toPandas. There's
a natural representation for
Great.
If we can upgrade the Parquet dependency from 1.8.2 to 1.8.3 in Apache
Spark 2.3.1, let's upgrade the ORC dependency from 1.4.1 to 1.4.3 together.
Currently, the patch is merged only into the master branch. 1.4.1 has the
following issue.
https://issues.apache.org/jira/browse/SPARK-23340
Seems like this would make sense... we usually make maintenance releases
for bug fixes after a month anyway.
On Wed, Apr 11, 2018 at 12:52 PM, Henry Robinson wrote:
>
>
> On 11 April 2018 at 12:47, Ryan Blue wrote:
>
>> I think a 1.8.3 Parquet
On 11 April 2018 at 12:47, Ryan Blue wrote:
> I think a 1.8.3 Parquet release makes sense for the 2.3.x releases of
> Spark.
>
> To be clear though, this only affects Spark when reading data written by
> Impala, right? Or does Parquet CPP also produce data like this?
>
I think a 1.8.3 Parquet release makes sense for the 2.3.x releases of Spark.
To be clear though, this only affects Spark when reading data written by
Impala, right? Or does Parquet CPP also produce data like this?
On Wed, Apr 11, 2018 at 12:35 PM, Henry Robinson wrote:
> Hi
Hi all -
SPARK-23852 (where a query can silently give wrong results thanks to a
predicate pushdown bug in Parquet) is a fairly bad bug. In other projects
I've been involved with, we've issued maintenance releases for bugs of
this severity.
Since Spark 2.4.0 is probably a while away, I wanted
Hi,
I'm looking into the Parquet format support for the File source in
Structured Streaming.
The docs mention the use of the option 'mergeSchema' to merge the schemas
of the part files found.[1]
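For concreteness, here is a minimal sketch of what merging the schemas of part files amounts to, as I understand it. This is not Spark's implementation: the `Schema` alias, the `merge_schemas` helper, and the sample part-file schemas are all hypothetical, and the strict "same type or fail" rule is a simplifying assumption.

```python
from typing import Dict, Iterable

# Hypothetical simplified model of a schema: column name -> type name.
Schema = Dict[str, str]


def merge_schemas(schemas: Iterable[Schema]) -> Schema:
    """Union the columns of several part-file schemas, rejecting type conflicts."""
    merged: Schema = {}
    for schema in schemas:
        for column, dtype in schema.items():
            # Keep the first type seen for a column; later files must agree.
            known = merged.setdefault(column, dtype)
            if known != dtype:
                raise ValueError(
                    f"column {column!r} has conflicting types: {known} vs {dtype}"
                )
    return merged


# Two made-up part files written at different times with different columns.
part_1 = {"id": "long", "name": "string"}
part_2 = {"id": "long", "score": "double"}

merged = merge_schemas([part_1, part_2])
print(merged)
```

In a batch read the set of part files is fixed up front, so a union like this can be computed once before any data is read.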
What would be the practical use of that in a streaming context?
In its batch counterpart,
it was helpful,
Then, does the OS need to feel some pressure from the applications
requesting memory before it frees some of the memory cache?
Exactly under which circumstances does the OS free that memory to give it
to applications requesting it?
I mean, if the total memory is 16 GB and 10 GB are used for the OS cache,