Github user yanboliang closed the pull request at:
https://github.com/apache/spark/pull/4527
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/4527#issuecomment-87971697
Thanks - sorry for not having looked at this earlier. Do you see any
performance gains with this change? My understanding is that JSON is already
very slow, and thus the
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4527#issuecomment-74410038
[Test build #27513 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27513/consoleFull)
for PR 4527 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4527#issuecomment-74410073
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4527#issuecomment-74410071
[Test build #27513 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27513/consoleFull)
for PR 4527 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4527#issuecomment-74410504
[Test build #27514 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27514/consoleFull)
for PR 4527 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4527#issuecomment-74410182
[Test build #27514 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27514/consoleFull)
for PR 4527 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4527#issuecomment-74421311
[Test build #27522 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27522/consoleFull)
for PR 4527 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4527#issuecomment-74421514
[Test build #27524 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27524/consoleFull)
for PR 4527 at commit
Github user yanbohappy commented on the pull request:
https://github.com/apache/spark/pull/4527#issuecomment-74424959
cc @liancheng @rxin @yhuai
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4527#issuecomment-74424392
[Test build #27524 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27524/consoleFull)
for PR 4527 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4527#issuecomment-74424393
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4527#issuecomment-74424550
[Test build #27522 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27522/consoleFull)
for PR 4527 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4527#issuecomment-74424555
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user yanbohappy commented on the pull request:
https://github.com/apache/spark/pull/4527#issuecomment-74424908
This improvement is very similar to #758, so I ran a similar
performance test.
The benchmark suggests this optimization made the optimized version about
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4527#issuecomment-74410507
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
Github user yanbohappy commented on the pull request:
https://github.com/apache/spark/pull/4527#issuecomment-74388397
@yhuai
This improvement is very similar to #758, so I have leveraged the
performance test there.
The benchmark suggests this optimization made the optimized
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4527#issuecomment-74059744
[Test build #27351 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27351/consoleFull)
for PR 4527 at commit
Github user yanbohappy commented on a diff in the pull request:
https://github.com/apache/spark/pull/4527#discussion_r24574169
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/json/JsonRDD.scala ---
@@ -39,7 +39,19 @@ private[sql] object JsonRDD extends Logging {
Github user yanbohappy commented on the pull request:
https://github.com/apache/spark/pull/4527#issuecomment-74059428
@chenghao-intel @yhuai
Thank you for your advice; it's very useful.
We can now use mutable rows for both top-level records and inner
structures.
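
A minimal sketch of that idea, not the code in this PR: one reusable row
buffer is allocated per struct level, and each parsed record overwrites it
in place. `SketchMutableRow`, `SketchType`, `Leaf`, `Struct`, and `RowBuffer`
are hypothetical names used only for illustration; they are simplified
stand-ins, not Spark's `SpecificMutableRow` or its schema types, and parsed
records are assumed to arrive as `Map[String, Any]`.

```scala
// Hypothetical, simplified stand-in for a mutable row (not Spark's SpecificMutableRow).
final class SketchMutableRow(val values: Array[Any]) {
  def update(i: Int, v: Any): Unit = values(i) = v
  def apply(i: Int): Any = values(i)
}

// Hypothetical minimal schema: a field is either a leaf value or a nested struct.
sealed trait SketchType
case object Leaf extends SketchType
final case class Struct(fields: Vector[(String, SketchType)]) extends SketchType

object MutableRowSketch {
  // One buffer per struct level, allocated once and reused for every record.
  final class RowBuffer(val schema: Struct) {
    val row = new SketchMutableRow(new Array[Any](schema.fields.length))

    // Child buffers for nested struct fields, keyed by field position.
    val children: Map[Int, RowBuffer] = schema.fields.zipWithIndex.collect {
      case ((_, s: Struct), i) => i -> new RowBuffer(s)
    }.toMap

    // Overwrite the buffer in place with one parsed record; missing keys become null.
    def fill(record: Map[String, Any]): SketchMutableRow = {
      schema.fields.zipWithIndex.foreach { case ((name, tpe), i) =>
        (tpe, record.get(name)) match {
          case (_: Struct, Some(m: Map[_, _])) =>
            // Nested struct: reuse the child buffer instead of allocating a new row.
            row(i) = children(i).fill(m.asInstanceOf[Map[String, Any]])
          case (_, Some(v)) => row(i) = v
          case (_, None)    => row(i) = null
        }
      }
      row
    }
  }
}
```

Because `fill` returns the same row object on every call, a caller that
needs to retain a record has to copy it before the next call; that is the
usual trade-off of reusing mutable rows to avoid per-record allocation.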
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4527#issuecomment-74066899
[Test build #27351 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27351/consoleFull)
for PR 4527 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4527#issuecomment-74066906
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user yhuai commented on the pull request:
https://github.com/apache/spark/pull/4527#issuecomment-73925301
Also, can you add performance numbers?
Github user yhuai commented on the pull request:
https://github.com/apache/spark/pull/4527#issuecomment-73924548
Thank you for working on it.
It seems `new SpecificMutableRow(schema.fields.map(_.dataType))` cannot handle
nested structures. I think we need to use the schema to
Github user yhuai commented on the pull request:
https://github.com/apache/spark/pull/4527#issuecomment-73927673
Oh, `enforceCorrectType` will take care of inner structures by calling
`asRow`.
It would be great if we could use mutable rows for inner structures as well.
Github user chenghao-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/4527#discussion_r24503655
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/json/JsonRDD.scala ---
@@ -39,7 +39,19 @@ private[sql] object JsonRDD extends Logging {
Github user chenghao-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/4527#discussion_r24504239
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/json/JsonRDD.scala ---
@@ -39,7 +39,19 @@ private[sql] object JsonRDD extends Logging {
Github user chenghao-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/4527#discussion_r24503309
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/json/JsonRDD.scala ---
@@ -39,7 +39,19 @@ private[sql] object JsonRDD extends Logging {
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4527#issuecomment-73864173
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4527#issuecomment-73864166
[Test build #27288 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27288/consoleFull)
for PR 4527 at commit