I’ve made as well.
Thanks,
-Matt Cheah
aggregateByKey is also affected.
Thanks,
-Matt Cheah
it and need to apply the change to aggregate()? It seems
appropriate to target a fix for 1.3.0.
-Matt Cheah
From: Josh Rosen rosenvi...@gmail.com
Date: Wednesday, February 18, 2015 at 6:12 AM
To: Matt Cheah mch...@palantir.com
Cc: dev@spark.apache.org dev@spark.apache.org, Mingyu Kim
m...@palantir.com
regression.
I was wondering if anyone else observed this regression, and if so, if
anyone would have any idea what could possibly have caused it between Spark
1.0.2 and Spark 1.1.1?
Thanks,
-Matt Cheah
I actually tested Spark 1.2.0 with the code in the rdd.take() method
swapped out for what was in Spark 1.0.2. The run time was still slower,
which indicates to me something at work lower in the stack.
-Matt Cheah
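For reference, a minimal sketch of the kind of timing comparison described
above (the data size, partition count, and local master are illustrative
assumptions, not the original test setup):

    import org.apache.spark.{SparkConf, SparkContext}

    object TakeBench {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("take-bench").setMaster("local[4]"))
        val rdd = sc.parallelize(1 to 10000000, 100)
        val start = System.nanoTime()
        rdd.take(1000) // the method whose implementation was swapped between versions
        println(s"take(1000) took ${(System.nanoTime() - start) / 1e6} ms")
        sc.stop()
      }
    }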
On 2/18/15, 4:54 PM, Patrick Wendell pwend...@gmail.com wrote:
I believe
for the quick and accurate response!
-Matt Cheah
From: Aaron Davidson ilike...@gmail.com
Date: Wednesday, February 18, 2015 at 5:25 PM
To: Matt Cheah mch...@palantir.com
Cc: Patrick Wendell pwend...@gmail.com, dev@spark.apache.org
dev@spark.apache.org, Mingyu Kim m...@palantir.com, Sandor Van
Excellent! Where can I find the code, pull request, and Spark ticket where
this was introduced?
Thanks,
-Matt Cheah
From: Reynold Xin r...@databricks.com
Date: Monday, June 1, 2015 at 10:25 PM
To: Matt Cheah mch...@palantir.com
Cc: dev@spark.apache.org dev@spark.apache.org, Mingyu Kim
m...@palantir.com
output format, but it looks like
ParquetTableOperations.scala has hard-coded the output format to
AppendingParquetOutputFormat.
Also, I was wondering if it would be valuable to contribute writing Parquet
in partition directories as a PR.
Thanks,
-Matt Cheah
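For context, a minimal sketch of writing Parquet into partition directories
with the DataFrame API (assuming Spark 1.4+; the paths and column name are
hypothetical):

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc) // sc: an existing SparkContext
    val df = sqlContext.read.json("/data/events-json") // hypothetical input
    // Writes one subdirectory per distinct value of "date",
    // e.g. /data/events-parquet/date=2015-06-01/
    df.write.partitionBy("date").parquet("/data/events-parquet")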
with
map-side-combine set to false. Is it something specific to how PySpark can
potentially spill the individual groups to disk?
Thanks,
-Matt Cheah
P.S. Relevant Links:
https://issues.apache.org/jira/browse/SPARK-3074
https://github.com/apache/spark/pull/1977
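For readers following along, a minimal sketch of an aggregation with map-side
combine disabled (the data and combiner functions are illustrative):

    import org.apache.spark.HashPartitioner

    // sc: an existing SparkContext.
    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
    // With mapSideCombine = false, values are shuffled as-is and only
    // combined on the reduce side, as in groupByKey.
    val grouped = pairs.combineByKey(
      (v: Int) => List(v),                         // createCombiner
      (c: List[Int], v: Int) => v :: c,            // mergeValue
      (c1: List[Int], c2: List[Int]) => c1 ::: c2, // mergeCombiners
      new HashPartitioner(4),
      mapSideCombine = false)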
I was executing on Spark 1.4 so I didn’t notice the Tungsten option would
make spilling happen in 1.5. I’ll upgrade to 1.5 and see how that turns out.
Thanks!
From: Reynold Xin <r...@databricks.com>
Date: Monday, September 21, 2015 at 5:36 PM
To: Matt Cheah <mch...@palantir.com>
thought on this problem is. Did we
consciously think about the robustness implications when choosing to use an
in-memory hash map to compute the aggregation? Is this an inherent
limitation of the aggregation implementation in DataFrames?
Thanks,
-Matt Cheah
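For reference, a sketch of the kind of aggregation at issue and the 1.5-era
option mentioned above (the schema is hypothetical, and the flag name is an
assumption based on the 1.5 configuration surface):

    import org.apache.spark.sql.functions.count

    // df: an existing DataFrame with a high-cardinality "userId" column.
    // Each distinct key costs a hash-map entry per task, so with enough keys
    // the in-memory map, not the data set, becomes the limiting factor.
    val counts = df.groupBy("userId").agg(count("*"))

    // The Tungsten aggregation path (which can spill) in Spark 1.5:
    sqlContext.setConf("spark.sql.tungsten.enabled", "true")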
Hi everyone,
A very brief question out of curiosity: is there any particular reason why
we don’t publish the Spark assembly jar on the Maven repository?
Thanks,
-Matt Cheah
-1 because of SPARK-16181, which is a correctness regression from 1.6. Looks
like the patch is ready, though: https://github.com/apache/spark/pull/13884; it
would be ideal for this patch to make it into the release.
-Matt Cheah
From: Nick Pentreath <nick.pentre...@gmail.com>
be appreciated!
-Matt Cheah
From: Reynold Xin <r...@databricks.com>
Date: Sunday, February 7, 2016 at 11:11 PM
To: Matt Cheah <mch...@palantir.com>
Cc: "dev@spark.apache.org" <dev@spark.apache.org>, Mingyu Kim
<m...@palantir.com>
Subject: Re: Preserving
ing the amount of data that is shuffled?
2) If the planner takes advantage of co-partitioning, does the renaming of the
columns invalidate the partitioning of the grouped DataFrame? When I look at
the planner's conversion from logical.Project to the physical plan, I only see
it invoking child.mapPartitions without specifying the preservesPartitioning
flag.
Thanks,
-Matt Cheah
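For illustration, the RDD-level flag in question, sketched against the public
API rather than the planner code (data and partition count are illustrative):

    import org.apache.spark.HashPartitioner

    // sc: an existing SparkContext.
    val pairRdd = sc.parallelize(Seq(("a", 1), ("b", 2)))
      .partitionBy(new HashPartitioner(4))
    // Keys are untouched here, so declaring the partitioning preserved lets
    // the next keyed operation skip a shuffle.
    val mapped = pairRdd.mapPartitions(
      iter => iter.map { case (k, v) => (k, v + 1) },
      preservesPartitioning = true)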
driver’s memory, but my driver ran out of memory after I increased the
autoBroadcastJoinThreshold. YourKit is indicating that this logic is consuming
more memory than my driver can handle.
Thanks,
-Matt Cheah
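For context, the setting in question (the value below is illustrative; raising
it makes Spark collect and buffer larger tables in the driver for broadcast):

    // Tables below this size in bytes are collected to the driver and
    // broadcast to executors for joins; -1 disables broadcast joins.
    sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold",
      (100 * 1024 * 1024).toString)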
shuffle data? This is the scenario I’m running into most,
where my tasks fail because they try to reach the shuffle service instead of
trying to recompute the lost shuffle files.
Thanks,
-Matt Cheah
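For reference, a sketch of the configuration this scenario involves (values
illustrative):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      // Executors register shuffle files with an external server on each
      // node, so tasks fetch from the service rather than recomputing when
      // an executor goes away.
      .set("spark.shuffle.service.enabled", "true")
      .set("spark.dynamicAllocation.enabled", "true")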
that it is difficult to reason
about dataset size on disk vs. memory.
-Matt Cheah
On 3/2/16, 10:15 AM, "Davies Liu" <dav...@databricks.com> wrote:
>UnsafeHashedRelation and HashedRelation could also be used in Executor
>(for non-broadcast hash join), then the UnsafeRow could come from
at all. However, jersey-client looks relatively harmless since it does
not bundle in JAX-RS classes, nor does it appear to have anything weird in
its META-INF folder.
-Matt Cheah
On 5/9/16, 3:10 PM, "Marcelo Vanzin" <van...@cloudera.com> wrote:
>Hi Jesse,
>
>On Mo
managers.
A first draft of a proposal outlining a potential long-term plan around this
feature has been attached to the JIRA ticket. Any feedback and discussion would
be greatly appreciated.
Thanks,
-Matt Cheah
submission client, so that would need to be
pluggable as well.
More discussion on fully pluggable scheduler backends is at
https://issues.apache.org/jira/browse/SPARK-19700.
-Matt Cheah
From: Erik Erlandson <eerla...@redhat.com>
Date: Friday, August 18, 2017 at 8:34 AM
To
From: Anirudh Ramanathan <ramanath...@google.com.INVALID>
Date: Monday, January 8, 2018 at 9:48 AM
To: Felix Cheung <felixcheun...@hotmail.com>
Cc: 蒋星博 <jiangxb1...@gmail.com>, Marcelo Vanzin <van...@cloudera.com>, dev
<dev@spark.apache.org>, Matt Cheah <mch...@palantir.com>
I think we can allow for different images and default to them being the same.
I apologize if I missed that as being the original intention, though.
-Matt Cheah
On 1/8/18, 1:45 PM, "Marcelo Vanzin" <van...@cloudera.com> wrote:
On Mon, Jan 8, 2018 at 1:39 PM, Matt Cheah <
// Fixing Anirudh's email address
From: Matt Cheah
Sent: Monday, January 8, 2018 1:39:12 PM
To: Anirudh Ramanathan; Felix Cheung
Cc: 蒋星博; Marcelo Vanzin; dev; Timothy Chen
Subject: Re: Kubernetes backend and docker images
We would still want images to be able
anzin" <van...@cloudera.com> wrote:
On Wed, Jan 10, 2018 at 1:33 PM, Matt Cheah <mch...@palantir.com> wrote:
> If we use spark-submit in client mode from the driver container, how do
we handle needing to switch between a cluster-mode scheduler backend and a
client-mode
circumstances, versus client mode being allowed with a
specific flag. If we’re saying that we don’t support client mode, we should
bias towards making client mode as difficult as possible to access, i.e.
impossible with a standard Spark distribution.
-Matt Cheah
On 1/10/18, 1:24 PM, "Marcelo V
don’t need to
use spark-submit at all, meaning that the differences can more or less be
ignored, at least in this particular context.
-Matt Cheah
On 1/10/18, 8:40 AM, "Marcelo Vanzin" <van...@cloudera.com> wrote:
On a side note, while it's great that you guys have meeting
definitely a niche use case (I’m not sure how often pod presets are used
in practice), but it’s an example to illustrate why the separation of concerns
can be beneficial.
-Matt Cheah
On 1/10/18, 2:36 PM, "Marcelo Vanzin" <van...@cloudera.com> wrote:
On Wed, Jan 10, 2018 at 2
From: Yinan Li <liyinan...@gmail.com>
Date: Tuesday, January 9, 2018 at 7:16 PM
To: Nicholas Chammas <nicholas.cham...@gmail.com>
Cc: Anirudh Ramanathan <ramanath...@google.com.invalid>, Marcelo Vanzin
<van...@cloudera.com>, Matt Cheah <mch...@palantir.com>, Kimo
with the Spark community via the Spark mailing list and Spark JIRA
tickets. We’re specifically aiming to deprecate the fork and migrate all the
work done on the fork into the main line.
-Matt Cheah
From: Mark Hamstra <m...@clearstorydata.com>
Date: Monday, February 5, 2018 at 1:44 PM
To: Matt
/1XPLh3E2JJ7yeJSDLZWXh_lUcjZ1P0dy9QeUEyxIlfak/edit#
I hope that we can have a productive discussion and continue improving the
Kubernetes integration further.
Thanks,
-Matt Cheah
a lot
that needs to be discussed for this improvement, so we hope to get as much
input as possible before moving forward with a design.
Please feel free to leave comments and suggestions on the JIRA ticket or on the
discussion document.
Thank you!
-Matt Cheah
from more Spark
users. That experience would be greatly appreciated in the discussion.
-Matt Cheah
From: Yuanjian Li
Date: Friday, August 31, 2018 at 8:29 PM
To: Matt Cheah
Cc: Spark dev list
Subject: Re: [Feedback Requested] SPARK-25299: Using Distributed Storage for
Persisting Shuffle
The question is more generally what the advised best practice is for setting
CPU limits. It’s not immediately clear what a correct value is if one wants to
provide consistent, guaranteed execution performance while also not degrading
performance.
Re: Hadoop versioning – it seems reasonable enough for us to be publishing an
image per Hadoop version. We should essentially have image configuration parity
with what we publish as distributions on the Spark website.
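For concreteness, a hedged sketch of the knobs involved, assuming the
spark.kubernetes.* properties (the values are illustrative, not a
recommendation):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      // Kubernetes resource request: what the scheduler reserves per executor.
      .set("spark.kubernetes.executor.request.cores", "2")
      // Kubernetes resource limit: the throttling ceiling under discussion.
      .set("spark.kubernetes.executor.limit.cores", "4")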
Sometimes jars need to be swapped out entirely instead of being strictly
of all proposed breaking
changes / JIRA tickets? Perhaps we can include it in the JIRA tickets in a way
that can be filtered on somehow?
Thanks,
-Matt Cheah
From: Vinoo Ganesh
Date: Monday, November 12, 2018 at 2:48 PM
To: Reynold Xin
Cc: Xiao Li , Matei Zaharia ,
Ryan Blue , Mark Hamstra , dev
I just added the label to https://issues.apache.org/jira/browse/SPARK-25908.
Unsure if there are any others. I’ll look through the tickets and see if there
are any that are missing the label.
-Matt Cheah
From: Sean Owen
Date: Tuesday, November 13, 2018 at 12:09 PM
To: Matt Cheah
Cc
'release-notes' with a description of the change. The release itself
has a migration guide that's being updated as we go.
On Mon, Nov 12, 2018 at 5:49 PM Matt Cheah wrote:
>
> I wanted to clarify what categories of APIs are eligible to be broken in
Spark 3
Relying on kubectl exec may not be the best solution because clusters with
locked down security will not grant users permissions to execute arbitrary code
in pods. I can’t think of a great alternative right now but I wanted to bring
this to our attention for the time being.
-Matt Cheah
than Hive UDFs.
-Matt Cheah
From: Reynold Xin
Date: Friday, December 14, 2018 at 1:49 PM
To: "rb...@netflix.com"
Cc: Spark Dev List
Subject: Re: [DISCUSS] Function plugins
Having a way to register UDFs that are not using Hive APIs would be great!
On Fri, Dec 14,
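For context, a minimal sketch of the session-level, non-Hive registration path
that already exists (the catalog-level function plugins under discussion would
go further; the function name is hypothetical):

    // spark: an existing SparkSession. No Hive UDF/UDAF classes involved.
    spark.udf.register("plus_one", (x: Int) => x + 1)
    spark.sql("SELECT plus_one(41)").show()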
further discussion in this space.
You may comment in this e-mail thread or on the progress report
document.
Looking forward to hearing from you. Thanks,
-Matt Cheah
+1 for n-part namespace as proposed. Agree that a short SPIP would be
appropriate for this. Perhaps also a JIRA ticket?
-Matt Cheah
From: Felix Cheung
Date: Sunday, January 20, 2019 at 4:48 PM
To: "rb...@netflix.com" , Spark Dev List
Subject: Re: [DISCUSS] Identifiers
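For illustration, what an n-part identifier would look like under the proposal
(the catalog and namespace names are hypothetical):

    // catalog.namespace(.namespace...).table: the catalog plugin resolves the
    // middle parts, so namespaces can nest to arbitrary depth.
    spark.sql("SELECT * FROM prod_catalog.sales.emea.orders")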
+1 (non-binding)
Are identifiers and namespaces going to be rolled under one of those six points?
From: Ryan Blue
Reply-To: "rb...@netflix.com"
Date: Thursday, February 28, 2019 at 8:39 AM
To: Spark Dev List
Subject: [VOTE] Functional DataSourceV2 in Spark 3.0
I’d like to call a
+1 (non-binding)
From: Jamison Bennett
Date: Thursday, February 28, 2019 at 8:28 AM
To: Ryan Blue , Spark Dev List
Subject: Re: [VOTE] SPIP: Spark API for Table Metadata
+1 (non-binding)
Jamison Bennett
Cloudera Software Engineer
jamison.benn...@cloudera.com
515 Congress Ave, Suite
), what appeal is there for users to
upgrade to that latest version?
-Matt Cheah
On 2/28/19, 1:37 PM, "Mridul Muralidharan" wrote:
I am -1 on this vote for pretty much all the reasons that Mark mentioned.
A major version change gives us an opportunity to remove
is going to take to
implement and review.
-Matt Cheah
On 2/24/19, 3:05 PM, "Sean Owen" wrote:
Sure, I don't read anyone making these statements though? Let's assume
good intent, that "foo should happen" as "my opinion as a member of
the community, whi
Reynold made a note earlier about a proper Row API that isn’t InternalRow – is
that still on the table?
-Matt Cheah
From: Ryan Blue
Reply-To: "rb...@netflix.com"
Date: Tuesday, February 26, 2019 at 4:40 PM
To: Matt Cheah
Cc: Sean Owen , Wenchen Fan , Xiao Li
, Matei Zahar
Will that then require an API break down the line? Do we save that for Spark 4?
-Matt Cheah
From: Ryan Blue
Reply-To: "rb...@netflix.com"
Date: Tuesday, February 26, 2019 at 4:53 PM
To: Matt Cheah
Cc: Sean Owen , Wenchen Fan , Xiao Li
, Matei Zaharia , Spark Dev
List
point release. But #1 and #2 are
also the features that have remained open for the longest time and we really
need to move forward on these. Putting a target release for 3.0 will help in
that regard.
-Matt Cheah
From: Ryan Blue
Reply-To: "rb...@netflix.com"
Date: Thursday, F
, and then provide detailed instructions for how to build custom
Docker images (mostly just needing to make sure the custom image has the right
entry point).
-Matt Cheah
From: Rong Ou
Date: Friday, February 8, 2019 at 2:28 PM
To: "dev@spark.apache.org"
Subject: building docker images for
this to a voting phase and to begin proposing
our work against upstream Spark?
Thanks,
-Matt Cheah
From: "Yifei Huang (PD)"
Date: Monday, May 13, 2019 at 1:04 PM
To: Mridul Muralidharan
Cc: Bo Yang , Ilan Filonenko , Imran Rashid
, Justin Uang , Liang Tang
, Marcelo Vanzin , Mat
We opened a thread for voting yesterday, so please participate!
-Matt Cheah
From: Yue Li
Date: Thursday, June 13, 2019 at 7:22 PM
To: Saisai Shao , Imran Rashid
Cc: Matt Cheah , "Yifei Huang (PD)" ,
Mridul Muralidharan , Bo Yang , Ilan Filonenko
, Imran Rashid , Justin Uan
or not this proposal is agreeable to you.
Thanks!
-Matt Cheah
Wenchen Fan , Hyukjin Kwon ,
Russell Spitzer , Ryan Blue ,
Reynold Xin , Matt Cheah , Takeshi
Yamamuro , Spark dev list
Subject: Re: [Discuss] Follow ANSI SQL on table insertion
Hi all,
Let me explain a little bit about the proposal.
By default, we follow the store assignment rule
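For reference, a sketch of how such a rule surfaces to users, assuming a
policy switch along the lines of spark.sql.storeAssignmentPolicy from the
proposal (table names hypothetical):

    // Under ANSI store assignment, numeric narrowing like long -> int is
    // allowed at analysis time, while unreasonable conversions such as
    // string -> int are rejected when the INSERT is compiled.
    spark.conf.set("spark.sql.storeAssignmentPolicy", "ANSI")
    spark.sql("INSERT INTO target_table SELECT * FROM source_table")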
specific semantics we don’t support in the V2 API. For example,
one cannot commit multiple write operations in a single transaction right now.
That would require changes to the DDL and a pretty substantial change to the
design of Spark SQL more broadly.
-Matt Cheah
From: Shiv Prashant
the Spark 3 preview release specifically on SPARK-25299.
-Matt Cheah
From: Xiao Li
Date: Tuesday, September 17, 2019 at 12:00 AM
To: Erik Erlandson
Cc: Sean Owen , dev
Subject: Re: Thoughts on Spark 3 release, or a preview release
https://issues.apache.org/jira/browse/SPARK-28264
+1 as both a contributor and a user.
From: John Zhuge
Date: Thursday, September 12, 2019 at 4:15 PM
To: Jungtaek Lim
Cc: Jean Georges Perrin , Hyukjin Kwon ,
Dongjoon Hyun , dev
Subject: Re: Thoughts on Spark 3 release, or a preview release
+1 Like the idea as a user and a DSv2
Sorry, I meant the current behavior for V2, which fails the query compilation if
the cast is not safe.
Agreed that a separate discussion about overflow might be warranted. I’m
surprised we don’t throw an error now, but it may well make sense to do so.
-Matt Cheah
From: Reynold Xin
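To make the overflow point concrete, a small sketch (default, non-ANSI
behavior as of this thread):

    // spark: an existing SparkSession. Int arithmetic wraps silently rather
    // than raising an error: 2147483647 + 1 evaluates to -2147483648.
    spark.sql("SELECT 2147483647 + 1").show()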
, or perhaps the behavior can be flagged
by the destination writer at write time.
-Matt Cheah
From: Hyukjin Kwon
Date: Monday, July 29, 2019 at 11:33 PM
To: Wenchen Fan
Cc: Russell Spitzer , Takeshi Yamamuro
, Gengliang Wang , Ryan
Blue , Spark dev list
Subject: Re: [Discuss] Follow ANSI SQL
There might be some help from the staging table catalog as well.
-Matt Cheah
From: Wenchen Fan
Date: Monday, August 5, 2019 at 7:40 PM
To: Shiv Prashant Sood
Cc: Ryan Blue , Jungtaek Lim , Spark Dev
List
Subject: Re: DataSourceV2 : Transactional Write support
I agree with the temp
potential viable options, so I’m looking forward to
engaging with dialogue moving forward.
Thanks!
-Matt Cheah