Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/11290#issuecomment-186951853
This looks good to me. @insidedctm thanks for reviving the PR and @srowen
thanks for taking a look at this! My only minor concern is that it will change
the results
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/10550#issuecomment-168628357
@davies and @JoshRosen I have finished a working prototype that passes the
tests. I would be interested in your thoughts.
---
If your project is set up
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/10550#issuecomment-168850430
@davies thanks for taking a look! I will open a JIRA issue later today.
With respect to the disk-based design, I had considered it, but it has a few
limitations
Github user jegonzal closed the pull request at:
https://github.com/apache/spark/pull/10550
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/10550#issuecomment-168568011
retest this please
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/10550#issuecomment-168566477
retest this please
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/10550#issuecomment-168497312
retest this please
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/10550#issuecomment-168368506
retest this please
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/10550#issuecomment-168362426
@davies and @JoshRosen let me know what you think of this design.
GitHub user jegonzal opened a pull request:
https://github.com/apache/spark/pull/10550
Adding zipPartitions to PySpark
The following working WIP adds support for `zipPartitions` to PySpark.
This is accomplished by modifying the PySpark `worker` (in both daemon and
non-daemon mode
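For context, the semantics this PR is after can be sketched with plain Python iterators. The function name and signature below are assumptions modeled on Scala's `RDD.zipPartitions`; the actual PySpark API in the (never-merged) PR may differ:

```python
def zip_partitions(partitions_a, partitions_b, f):
    """Pair up corresponding partitions of two datasets and apply f.

    Mirrors the semantics of Scala's RDD.zipPartitions: both inputs
    must have the same number of partitions, and f receives one
    iterator per input partition.
    """
    if len(partitions_a) != len(partitions_b):
        raise ValueError("both inputs must have the same number of partitions")
    return [list(f(iter(pa), iter(pb)))
            for pa, pb in zip(partitions_a, partitions_b)]

# Example: element-wise sums within each pair of partitions
a = [[1, 2], [3, 4]]
b = [[10, 20], [30, 40]]
result = zip_partitions(a, b, lambda xs, ys: (x + y for x, y in zip(xs, ys)))
# result == [[11, 22], [33, 44]]
```

In Spark the partitions would be distributed and `f` shipped to the workers, which is why the PR has to modify the PySpark `worker`; the sketch above only illustrates the per-partition pairing.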
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/9386#issuecomment-153207973
This is actually a pretty serious error since it could lead to mass being
accumulated on unreachable sub-graphs. The performance implications of the
above branch
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/5142#issuecomment-137170613
@srowen GraphX is still active; we have just been pretty busy with some
other changes. Let me see what needs to be done with this PR.
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/7354#issuecomment-120998075
I will make the suggested changes now.
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/7354#issuecomment-121000162
I have merged upstream changes and added back the requested paragraph
blocks (correctly).
Github user jegonzal commented on a diff in the pull request:
https://github.com/apache/spark/pull/7354#discussion_r34419761
--- Diff:
launcher/src/main/java/org/apache/spark/launcher/SparkLauncher.java ---
@@ -25,9 +25,9 @@
import static
GitHub user jegonzal opened a pull request:
https://github.com/apache/spark/pull/7354
[SPARK-9001] Fixing errors in javadocs that lead to failed build/sbt doc
These are minor corrections in the documentation of several classes that
are preventing:
```bash
build/sbt
Github user jegonzal closed the pull request at:
https://github.com/apache/spark/pull/1228
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/1228#issuecomment-103186580
I think we have covered most of this code in later tests (PR #1217) and the
remaining tests need to be substantially updated which I can do in a later PR.
I am going
Github user jegonzal commented on a diff in the pull request:
https://github.com/apache/spark/pull/4774#discussion_r29521247
--- Diff: graphx/src/main/scala/org/apache/spark/graphx/lib/PageRank.scala
---
@@ -103,8 +132,14 @@ object PageRank extends Logging
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/4774#issuecomment-98196331
Overall this looks great! I apologize for the delayed response. I am
going to go ahead and merge this now and then we can tune the performance in a
later pull
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/5403#issuecomment-97975070
This PR could have important performance implications for algorithms in
GraphX and MLlib (e.g., ALS) which introduce relatively lightweight shuffle
stages at each
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/1128#issuecomment-75451894
Great! I agree with this proposal as well. I apologize for letting it sit
so long.
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/2495#issuecomment-70925718
Great! What else needs to be done? There was some discussion about how
this might change the semantics of the triangle count function. Is this
still true
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/1297#issuecomment-69504481
We should really address this stack overflow issue. Is there a JIRA we can
promote?
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/1297#issuecomment-69531109
Hmm, we really need to elevate this to a full issue. I have run into the
stack overflow in MLlib (ALS) as well.
GitHub user jegonzal opened a pull request:
https://github.com/apache/spark/pull/3472
Removing confusing TripletFields
After additional discussion with @rxin, I think having all the possible
`TripletField` options is confusing. This pull request reduces the triplet
fields
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/3472#issuecomment-64520673
@ankurdave, what do you think?
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/3472#issuecomment-64520711
This is consistent with the current discussion in the graphx programming
guide and so it is unlikely users have started using the more obscure
combinations that were
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/3359#issuecomment-63743607
Sounds good. I can fix it now if you want.
Joey
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/3359#issuecomment-63597028
@rxin and @ankurdave, take a look when you get a chance.
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/1228#issuecomment-63597173
@ankurdave and @rxin can we merge this now?
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/1217#issuecomment-63597212
@ankurdave should I try and update this with your latest changes or do you
want to create a new one?
Github user jegonzal commented on a diff in the pull request:
https://github.com/apache/spark/pull/3359#discussion_r20559596
--- Diff: project/SparkBuild.scala ---
@@ -328,7 +328,7 @@ object Unidoc {
unidocProjectFilter in(ScalaUnidoc, unidoc) :=
inAnyProject
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/3303#issuecomment-63267156
Looks good to me.
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/1217#issuecomment-62917449
@ankurdave is this already covered in your latest PR?
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/2495#issuecomment-62917547
@ankurdave take a look at this when you get a chance.
Github user jegonzal commented on a diff in the pull request:
https://github.com/apache/spark/pull/3100#discussion_r20243545
--- Diff: graphx/src/main/scala/org/apache/spark/graphx/TripletFields.java
---
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user jegonzal commented on a diff in the pull request:
https://github.com/apache/spark/pull/3100#discussion_r20257677
--- Diff: graphx/src/main/scala/org/apache/spark/graphx/TripletFields.java
---
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user jegonzal closed the pull request at:
https://github.com/apache/spark/pull/2815
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/3099#issuecomment-61922586
The model serving work would really benefit from being able to evaluate
models without requiring a Spark context, especially since we are shooting for
10s of millisecond
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/3099#issuecomment-61934417
@jkbradley Right now we are planning to serve linear combinations of models
derived from MLlib (currently latent factor models, naive bayes, and decision
trees
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/2815#issuecomment-61349221
I added the `TripletFields` enum and updated all the dependent files. I
can't deprecate the old API since they have the same function signature up to
default arguments
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/2815#issuecomment-61349425
At this point I could also imagine actually having a separate function
closure for each version.
```scala
mapTriplets(f: Edge = ED2)
mapTriplets(f
GitHub user jegonzal opened a pull request:
https://github.com/apache/spark/pull/2996
[SPARK-4130][MLlib] Fixing libSVM parser bug with extra whitespace
This simple patch filters out extra whitespace entries.
You can merge this pull request into a Git repository by running
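The fix described here is conceptually simple: when splitting a libSVM line on spaces, discard the empty tokens produced by runs of whitespace. A rough Python illustration of that idea (not the actual Scala patch from the PR):

```python
def parse_libsvm_line(line):
    """Parse one libSVM-format line, tolerating extra whitespace.

    Returns (label, [(index, value), ...]). Empty tokens produced by
    repeated spaces are filtered out before the index:value parsing,
    which would otherwise fail on the empty strings.
    """
    tokens = [t for t in line.strip().split(" ") if t]  # drop empty entries
    label = float(tokens[0])
    features = []
    for t in tokens[1:]:
        idx, val = t.split(":")
        features.append((int(idx), float(val)))
    return label, features

# A line with doubled spaces parses the same as a clean one
label, feats = parse_libsvm_line("1.0  2:0.5   4:1.25")
# label == 1.0, feats == [(2, 0.5), (4, 1.25)]
```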
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/2996#issuecomment-61026000
Not sure why it failed the test. Is this an issue with the testing
framework?
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/2996#issuecomment-61026298
The following implementation seems a bit more efficient but is needlessly
complicated.
```scala
// Count the number of empty values
GitHub user jegonzal opened a pull request:
https://github.com/apache/spark/pull/3006
[SPARK-4142][GraphX] Default numEdgePartitions
Changing the default number of edge partitions to match Spark parallelism.
You can merge this pull request into a Git repository by running
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/2815#issuecomment-61026843
What is the status on this patch? I would like to merge it soon so that
the python GraphX API can support these additional flags.
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/2495#issuecomment-61026881
What is the status on this patch?
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/1217#issuecomment-61027554
This is still work in progress and we need to discuss these API changes.
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/1228#issuecomment-61029490
ok to test
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/1228#issuecomment-61029482
This should now be addressed in the latest master and does not depend on PR
#1217
GitHub user jegonzal opened a pull request:
https://github.com/apache/spark/pull/2815
Remove Bytecode Inspection for Join Elimination
Removing bytecode inspection from triplet operations and introducing
explicit join elimination flags. The explicit flags make the join elimination
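The design trade-off in this PR: rather than inspecting a closure's bytecode to infer which triplet fields it touches, the caller declares them, and the system can then skip the vertex joins for unused fields. A toy Python sketch of the explicit-flags idea; the names below are illustrative, not the GraphX API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TripletFields:
    """Caller-declared flags for which triplet fields the UDF reads."""
    use_src: bool = True
    use_dst: bool = True

def map_triplets(edges, src_attr, dst_attr, f, fields=TripletFields()):
    """Apply f to each (src, dst, attr) edge triplet, materializing a
    vertex attribute only when the caller's flags say f needs it (the
    join elimination the PR makes explicit)."""
    out = []
    for u, v, e in edges:
        sa = src_attr[u] if fields.use_src else None  # src join skipped otherwise
        da = dst_attr[v] if fields.use_dst else None  # dst join skipped otherwise
        out.append(f(sa, da, e))
    return out

# f only needs the destination attribute, so declare that explicitly;
# the src_attr table is never consulted.
edges = [(0, 1, 5.0), (1, 2, 2.0)]
dst_rank = {0: 0.1, 1: 0.2, 2: 0.7}
msgs = map_triplets(edges, {}, dst_rank, lambda s, d, e: e * d,
                    fields=TripletFields(use_src=False))
# msgs == [1.0, 1.4]
```

In distributed GraphX the eliminated "join" is a shuffle of vertex attributes to edge partitions, so the flags buy real savings; the sketch only shows the control flow.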
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/2815#issuecomment-59263992
@ankurdave and @rxin I have not updated the applications to use the new
explicit flags. I will do that in this PR pending approval for the API changes.
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/2815#issuecomment-59275910
Jenkins, test this please.
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/2439#issuecomment-56438311
This looks good to me.
GitHub user jegonzal opened a pull request:
https://github.com/apache/spark/pull/2495
[SPARK-3650] Fix TriangleCount handling of reverse edges
This PR causes the TriangleCount algorithm to remove self-edges, direct
edges from low-id to high-id (canonical direction), and then remove
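The preprocessing this PR describes can be sketched outside GraphX: drop self-edges, orient each remaining edge from the lower vertex id to the higher one, and deduplicate. A hedged Python illustration of that canonicalization step (not the Scala implementation):

```python
def canonicalize_edges(edges):
    """Canonicalize an edge list for triangle counting.

    Removes self-edges, directs each edge from the lower vertex id
    to the higher one (the canonical direction), and deduplicates,
    so that an edge and its reverse collapse to a single entry.
    """
    canonical = set()
    for u, v in edges:
        if u == v:
            continue  # self-edges cannot participate in a triangle
        canonical.add((min(u, v), max(u, v)))
    return sorted(canonical)

# (2, 1) and (1, 2) collapse to one edge; the self-edge (3, 3) is dropped
edges = [(1, 2), (2, 1), (3, 3), (2, 3), (1, 3)]
print(canonicalize_edges(edges))  # [(1, 2), (1, 3), (2, 3)]
```

Without this step a reverse edge would be double-counted, which is the semantics issue discussed in the PR thread.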
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/2168#issuecomment-53760885
The code changes look good to me (and were badly needed). Thanks for fixing
it!
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/1228#issuecomment-53776044
Yes. This is an extension of the unit tests to catch a class of bugs
addressed in PR #1217 (which has not been merged). I believe @ankurdave was
working on a merge
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/1217#issuecomment-47193665
I spent some time verifying the math behind the PageRank (in particular
starting values) to ensure that the delta formulation behaves identically to
the static
Github user jegonzal commented on a diff in the pull request:
https://github.com/apache/spark/pull/1217#discussion_r14227560
--- Diff: graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala ---
@@ -158,4 +169,125 @@ object Pregel extends Logging {
g
} // end
Github user jegonzal commented on a diff in the pull request:
https://github.com/apache/spark/pull/1217#discussion_r14227573
--- Diff: graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala ---
@@ -158,4 +169,125 @@ object Pregel extends Logging {
g
} // end
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/1228#issuecomment-47200276
@ankurdave thanks for pointing out this bug!
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/1217#issuecomment-47204112
@ankurdave and @rxin there is an issue with the current API. The
`sendMessage` function pulls the active field out of the vertex value here:
https://github.com
GitHub user jegonzal opened a pull request:
https://github.com/apache/spark/pull/1217
Introducing an Improved Pregel API
The initial Pregel API coupled voting to halt with message reception. In
this revision, the vertex program receives a `PregelContext` which enables the
user
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/1217#issuecomment-47167314
@ankurdave unfortunately to fully accept this change we will need to break
compatibility with the current Pregel API. I cannot seem to overload the apply
method
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/720#issuecomment-42757213
Good point! I moved the benchmark into the examples folder. Is there a
standard format for command line args in the example applications?
GitHub user jegonzal opened a pull request:
https://github.com/apache/spark/pull/720
Synthetic GraphX Benchmark
This PR accomplishes two things:
1. It introduces a Synthetic Benchmark application that generates an
arbitrarily large log-normal graph and executes either
GitHub user jegonzal opened a pull request:
https://github.com/apache/spark/pull/719
Enable repartitioning of graph over different number of partitions
It is currently very difficult to repartition a graph over a different
number of partitions. This PR adds an additional
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/497#issuecomment-42618546
I went through this PR with Ankur and it looks good to me. There are a few
minor changes but those can be moved to a second PR.
GitHub user jegonzal opened a pull request:
https://github.com/apache/spark/pull/742
SPARK-1786: Reopening PR 724
Addressing issue in MimaBuild.scala.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jegonzal/spark
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/742#issuecomment-42868913
@ankurdave and @pwendell I am reopening PR 724 to address the issue
with MimaBuild. I believe I made the required changes, but how can I verify?
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/724#issuecomment-42787343
I would like to get it into 1.0 if possible. Otherwise, we could run into
issues if the user persists graphs to disk or straggler mitigation is used.
@ankurdave do you
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/709#issuecomment-42703154
@rxin and @ankurdave take a look at this minor change when you get a
chance. I would like to get it into the next release if possible.
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/724#issuecomment-42793347
My only concern is that I would prefer that things work slowly rather than
fail. With reference tracking disabled it is not possible to serialize
user-defined types from the spark
GitHub user jegonzal opened a pull request:
https://github.com/apache/spark/pull/499
SPARK-1577: Enabling reference tracking by default in GraphX
KryoRegistrator.
We had originally disabled reference tracking by default; however, this now
seems to create serious issues in the spark
Github user jegonzal commented on a diff in the pull request:
https://github.com/apache/spark/pull/10#discussion_r11915905
--- Diff:
graphx/src/main/scala/org/apache/spark/graphx/lib/ShortestPaths.scala ---
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software
Github user jegonzal commented on a diff in the pull request:
https://github.com/apache/spark/pull/10#discussion_r11916394
--- Diff:
graphx/src/main/scala/org/apache/spark/graphx/lib/ShortestPaths.scala ---
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software
Github user jegonzal commented on a diff in the pull request:
https://github.com/apache/spark/pull/10#discussion_r11916531
--- Diff:
graphx/src/main/scala/org/apache/spark/graphx/lib/ShortestPaths.scala ---
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software
Github user jegonzal commented on a diff in the pull request:
https://github.com/apache/spark/pull/10#discussion_r11916619
--- Diff:
graphx/src/main/scala/org/apache/spark/graphx/lib/ShortestPaths.scala ---
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software
Github user jegonzal commented on the pull request:
https://github.com/apache/spark/pull/10#issuecomment-41199189
This code looks good to me. All my comments are with respect to potential
performance issues.