Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/9386#issuecomment-153156750
If I recall, we specifically decided against a conditional in the BSP
function at that point because the branching might cause hotspots. If that's
still a concern
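The branch-avoidance being discussed can be illustrated outside of GraphX. A minimal plain-Python sketch (hypothetical names, not the actual Pregel vertex program): instead of an `if vertex == source` conditional inside the per-vertex update, fold the condition into data by precomputing a reset vector, so every vertex runs identical arithmetic.

```python
def update_ranks(ranks, incoming, reset, alpha=0.15):
    """One branch-free BSP-style superstep: rank = alpha*reset + (1-alpha)*incoming.

    `reset` is 1.0 at the personalization source and 0.0 elsewhere, so the
    question "is this the source vertex?" becomes plain multiplication
    instead of a branch in the hot loop.
    """
    return [alpha * r + (1.0 - alpha) * m for r, m in zip(reset, incoming)]

# Vertex 0 is the source: reset = [1.0, 0.0, 0.0]
ranks = update_ranks([0.0, 0.0, 0.0], [0.2, 0.5, 0.3], [1.0, 0.0, 0.0])
```

Whether the branch actually hurts in JVM-compiled GraphX code is exactly the open question in the thread; this only shows the shape of the branch-free formulation.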
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/9386#issuecomment-153207454
Yes, I agree, it shouldn't add overhead.
Sent from my iPhone
> On Nov 2, 2015, at 4:35 PM, DB Tsai <notificati...@github.com> wrote:
Github user dwmclary closed the pull request at:
https://github.com/apache/spark/pull/6919
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/6919#issuecomment-128134014
Closed.
On Wed, Aug 5, 2015 at 12:49 PM, Reynold Xin notificati...@github.com
wrote:
@dwmclary https://github.com/dwmclary do you mind closing
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/6919#issuecomment-119633169
No problem -- just wanted to make sure it was on your radar.
On Wed, Jul 8, 2015 at 12:55 AM, Reynold Xin notificati...@github.com
wrote:
Sorry
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/6919#issuecomment-118956763
ping @rxin ?
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/6919#issuecomment-117236371
@davies any review comments?
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/6919#issuecomment-114857450
So, I'm wondering if the Scala-specific method actually needs to be
re-implemented, or if it would be cleaner to just call mutable.copyToArray and
pass it to the agnostic
GitHub user dwmclary opened a pull request:
https://github.com/apache/spark/pull/6919
Spark 7998 freq item api
Here's a better frequent item API which provides a DataFrame with each
ArrayBuffer expanded into a column. There's surely some improvement that could
be done here, but I
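The idea in the PR description, expanding each per-column buffer of frequent items into its own column, can be sketched locally. A hedged plain-Python illustration (dict-based, hypothetical naming scheme; not the actual Spark API):

```python
def expand_freq_items(freq):
    """Expand {column: [frequent items]} into one flat record with one
    output field per (column, index) pair -- mirroring the idea of turning
    each ArrayBuffer of frequent items into its own DataFrame column."""
    row = {}
    for col, items in sorted(freq.items()):
        for i, item in enumerate(items):
            row[f"{col}_freqItem_{i}"] = item
    return row

expanded = expand_freq_items({"a": [1, 99], "b": ["x"]})
```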
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/4774#issuecomment-98199444
Thanks Joey, I appreciate it. I can see your concern w/r/t the branching.
If I can get some HW and time, I'll see if I notice a performance regression
with the change
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/4774#issuecomment-96816965
@jegonzal does this algorithm look correct to you?
Github user dwmclary closed the pull request at:
https://github.com/apache/spark/pull/5066
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/4774#issuecomment-95256308
@jegonzal does this algorithm look correct to you?
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/4774#issuecomment-94025941
OK, I'll update w/r/t the comments today. I'd appreciate it if someone
took a glance at the algorithm; it's as specified in the referred paper,
but another set
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/4774#issuecomment-93562171
Is this going to get merged at some point?
On Tue, Mar 31, 2015 at 10:51 AM, Yusup notificati...@github.com wrote:
+1
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/4774#issuecomment-83669688
Good to merge?
GitHub user dwmclary opened a pull request:
https://github.com/apache/spark/pull/5066
Spark 6359 expose imain binding
As per the associated JIRA ticket: in 1.2, some projects (e.g. Apache
Zeppelin) rely on the ILoop exposing its IMain object for the purpose of
binding UI variables
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/4774#issuecomment-81829391
OK, that should fix the binary incompatibility on the vertexProgram.
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/4774#issuecomment-78824623
OK, got 'em.
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/4774#issuecomment-78804717
Whitespace removed.
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/4774#issuecomment-78766347
OK, that should be a reasonable solution. Thanks for the advice @rxin.
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/4774#issuecomment-78614976
I certainly agree that binary compatibility matters. I think it's mainly a
question of which is more desirable: fewer repeated LOC or binary
compatibility
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/4774#issuecomment-78594489
Does anyone have a comment on this MiMa failure? The fact that
PageRankSuite passes illustrates that it's source compatible.
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/4774#issuecomment-77436963
OK, thanks Sean, that was my reading of it too.
On Thu, Mar 5, 2015 at 11:32 AM, Sean Owen notificati...@github.com wrote:
I think this change actually
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/4774#issuecomment-77433149
I'm not really sure what to do about this MiMa error. Suggestions?
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/4774#issuecomment-77398055
test this please
GitHub user dwmclary opened a pull request:
https://github.com/apache/spark/pull/4774
Spark-5854 personalized page rank
Here's a modification to PageRank which does personalized PageRank. The
approach is basically similar to that outlined by Bahmani et al. from 2010
(http
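The personalization in this PR directs all teleport probability back to a single source vertex rather than spreading it uniformly. A minimal power-iteration sketch over a plain adjacency list (illustrative only: not the GraphX implementation, and simplified relative to the random-walk formulation of Bahmani et al.):

```python
def personalized_pagerank(adj, source, alpha=0.15, iters=50):
    """Power iteration where all teleport (and dangling) mass restarts at
    `source` instead of being spread uniformly over all vertices."""
    n = len(adj)
    ranks = [1.0 / n] * n
    for _ in range(iters):
        nxt = [0.0] * n
        for u, out in enumerate(adj):
            if out:
                share = ranks[u] / len(out)
                for v in out:
                    nxt[v] += (1.0 - alpha) * share
            else:  # dangling vertex: its mass also restarts at the source
                nxt[source] += (1.0 - alpha) * ranks[u]
        nxt[source] += alpha  # all teleportation goes to the source
        ranks = nxt
    return ranks

# Tiny 3-node cycle 0 -> 1 -> 2 -> 0, personalized on vertex 0
pr = personalized_pagerank([[1], [2], [0]], source=0)
```

Total rank mass stays at 1.0, and vertices closer (in walk distance) to the source end up with more mass.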
Github user dwmclary closed the pull request at:
https://github.com/apache/spark/pull/4421
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/4421#issuecomment-73783731
I've been thinking of it as equivalent to a CREATE TABLE, in which case I
think it's dialect-specific. Perhaps ANSI and pgSQL allow it, but, for
example, Oracle
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/4421#issuecomment-73770007
OK, I've updated this to use as a reference. One thing we may want to take
from this PR is that toDataFrame and createDataFrame absolutely need to check
reserved words
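The reserved-word validation suggested here (and the getReservedWords idea mentioned later in the thread) can be sketched simply. A hedged illustration; the word list below is a small illustrative subset, not Spark's actual parser keyword list:

```python
# Illustrative subset only -- not Spark's actual reserved-word list.
RESERVED_WORDS = {"SELECT", "FROM", "WHERE", "GROUP", "ORDER", "BY", "AS"}

def check_column_names(names):
    """Reject column names that collide with SQL reserved words, the kind
    of validation proposed for toDataFrame/createDataFrame inputs."""
    bad = [n for n in names if n.upper() in RESERVED_WORDS]
    if bad:
        raise ValueError(f"Reserved words used as column names: {bad}")
    return names
```

Quoting/backtick-escaped identifiers (the `` `SELECT` `` case debated below) would need a separate path that this sketch deliberately omits.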
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/4421#issuecomment-73771478
So, we'll allow a column named SELECT regardless of whether it's been
called out as `SELECT`? It just seems to me that it invites a lot of
potentially erroneous
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/4421#issuecomment-73623236
Reynold,
It is similar, but I think the distinction here is that toDataFrame
appears to require that old names (and a schema) exist. Or, at least
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/4421#issuecomment-73626452
Ah, yes, I see that now.
Python doesn't seem to have a toDataFrame, so maybe the logical thing to do
here is to just do a new PR with a Python implementation
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/4421#issuecomment-73626542
Or, I guess I can just do it in this PR if you don't mind it changing a
bunch.
On Mon, Feb 9, 2015 at 5:18 PM, Dan McClary dan.mccl...@gmail.com wrote
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/4421#issuecomment-73632890
Sounds like a plan -- I'll do it on top of #4479.
Thought: I've added a getReservedWords private method to SQLContext.scala.
I feel like leaving
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/4421#issuecomment-73325875
Updated to keep reserved words in the JVM.
GitHub user dwmclary opened a pull request:
https://github.com/apache/spark/pull/4421
Spark-2789: Apply names to RDD to create DataFrame
This seemed like a reasonably useful function to add to SparkSQL. However,
unlike the [JIRA](https://issues.apache.org/jira/browse/SPARK-2789
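The core of SPARK-2789, pairing a list of column names with an RDD of tuples, can be shown without Spark. A minimal local sketch (hypothetical helper name, plain lists standing in for an RDD):

```python
def apply_names(rows, names):
    """Zip a column-name list onto tuple rows -- the essence of turning a
    plain RDD of tuples into a named, DataFrame-like structure."""
    if any(len(r) != len(names) for r in rows):
        raise ValueError("row length does not match number of names")
    return [dict(zip(names, r)) for r in rows]

records = apply_names([(1, "a"), (2, "b")], ["id", "label"])
```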
Github user dwmclary commented on a diff in the pull request:
https://github.com/apache/spark/pull/4421#discussion_r24253601
--- Diff: python/pyspark/sql.py ---
@@ -1469,6 +1470,44 @@ def applySchema(self, rdd, schema):
df = self._ssql_ctx.applySchemaToPythonRDD
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/3213#issuecomment-63884730
Michael; thanks for being willing to pick up the final changes!
I'm happy to get a chance to contribute again. Hopefully the next PR won't
require so much
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/3213#issuecomment-63716980
Is this good to merge?
Github user dwmclary commented on a diff in the pull request:
https://github.com/apache/spark/pull/3213#discussion_r20610021
--- Diff: python/pyspark/sql.py ---
@@ -1870,6 +1870,10 @@ def limit(self, num):
rdd =
self._jschema_rdd.baseSchemaRDD().limit(num
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/3213#issuecomment-63744673
This may be an intermittent diff; it's not in the code path modified in
this PR.
On Wed, Nov 19, 2014 at 4:03 PM, UCB AMPLab notificati...@github.com
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/3213#issuecomment-63748417
Ugh, yeah, just wasn't paying attention. Fixed now.
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/3213#issuecomment-63376043
Thanks -- I'll clean up the style issues straight away. I'm glad to see
this getting close to finished.
As for additional tests, I'd been thinking along
Github user dwmclary commented on a diff in the pull request:
https://github.com/apache/spark/pull/3213#discussion_r20466740
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SchemaRDD.scala ---
@@ -131,6 +134,80 @@ class SchemaRDD(
*/
lazy val schema
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/3213#issuecomment-63386849
I'm going to go with JSONSuite. I don't think it's big enough to warrant a
whole suite. I'm putting rowToJSON in JsonRDD right after asRow.
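The per-record job of a rowToJSON helper, serializing one row against its schema's field names, can be sketched in plain Python (illustrative only; the actual code discussed here is Scala in JsonRDD):

```python
import json

def row_to_json(row, field_names):
    """Serialize one row (a tuple of values) against its schema's field
    names into a single JSON document, one line per record."""
    return json.dumps(dict(zip(field_names, row)), sort_keys=True)

line = row_to_json((1, "spark", None), ["id", "name", "note"])
```

The real implementation additionally has to recurse into complex types (arrays, maps, structs), which is the part debated earlier in this thread.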
Github user dwmclary commented on a diff in the pull request:
https://github.com/apache/spark/pull/3213#discussion_r20478747
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
---
@@ -591,6 +591,30 @@ class SQLQuerySuite extends QueryTest
Github user dwmclary commented on a diff in the pull request:
https://github.com/apache/spark/pull/3213#discussion_r20479917
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/json/JsonSuite.scala
---
@@ -779,4 +780,52 @@ class JsonSuite extends QueryTest {
Seq(null
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/3213#issuecomment-63419169
OK, pulled in the bulk of the tests for primitive and complex types from
other parts of JsonSuite. I think we're pretty heavily exercising the code
at this point
Github user dwmclary commented on a diff in the pull request:
https://github.com/apache/spark/pull/3213#discussion_r20410937
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SchemaRDD.scala ---
@@ -131,6 +134,69 @@ class SchemaRDD(
*/
lazy val schema
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/3213#issuecomment-63194531
@davies -- that's much cleaner; thanks! I think unicode should be default,
but optional for the deserializer so I added that to the method.
@yhuai https
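The default-but-optional unicode behavior described here can be sketched as a toggle on the decode step. A hedged minimal illustration (hypothetical function; not pyspark's actual deserializer code):

```python
def deserialize(payload: bytes, use_unicode: bool = True):
    """Decode raw bytes to text by default, but let callers opt out and
    receive raw bytes -- unicode on by default, optional for the caller."""
    return payload.decode("utf-8") if use_unicode else payload
```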
Github user dwmclary commented on a diff in the pull request:
https://github.com/apache/spark/pull/3213#discussion_r20406513
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SchemaRDD.scala ---
@@ -131,6 +134,68 @@ class SchemaRDD(
*/
lazy val schema
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/3213#issuecomment-63145286
Happy to help; these changes should be quick.
- Sure, the wrapper for pyspark makes more sense; I hadn't considered
that we'd be shipping the objects
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/3213#issuecomment-63152791
I pushed up a Jackson version, which cuts down the size quite a bit. At
present we're not handling complex types, correct?
What I'm a bit stuck
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/3213#issuecomment-63157839
Yin,
Thanks for jumping in. I'll run some complex types through ObjectMapper
and see how it compares to JsonFactory. I figure object creation overhead
GitHub user dwmclary opened a pull request:
https://github.com/apache/spark/pull/3213
SPARK-4228 SchemaRDD to JSON
Here's a simple fix for SchemaRDD to JSON.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/dwmclary/spark SPARK
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/122#issuecomment-37976438
@ScrapCodes This is updated to pick up the changes from 1246.