Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17956
Thank you everybody sincerely.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/17956
thanks, merging to master/2.2!
I think this change is pretty safe, we can discuss 2 things later:
1. if we want to support more special strings like `Inf`
2. if we want to make it
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17956
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17956
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76894/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17956
**[Test build #76894 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76894/testReport)**
for PR 17956 at commit
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17956
@gatorsmile, @cloud-fan and @viirya, could you take another look please? I
tried to get rid of all the behaviour changes existing in both previous PRs but
only leave the change to avoid the
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17956
Unfortunately, we already support
```
"NaN"
"-Infinity"
"Infinity"
```
Now, this PR targets primarily to avoid unnecessary conversion try
primarily.
---
If
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/17956
Note, we do not support the following cases
```
def floatRecords: Dataset[String] =
spark.createDataset(spark.sparkContext.parallelize(
"""{"f": "18.00"}""" ::
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/17956
We are handling the Json data sources here. Are the following inputs widely
used?
```
{"a": "+INF"}
{"a": "INF"}
{"a": "-INF"}
{"a": "NaN"}
{"a": "+NaN"}
{"a": "-NaN"}
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17956
**[Test build #76894 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76894/testReport)**
for PR 17956 at commit
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17956
Given the existing logic below:
```
if (lowerCaseValue.equals("nan") ||
lowerCaseValue.equals("infinity") ||
lowerCaseValue.equals("-infinity") ||
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17956
Yea, Let's focus on the topic. For the cases below:
```
{"a": NaN}
{"a": Infinity}
{"a": +Infinity}
{"a": -Infinity}
```
They are related with
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/17956
BTW, if we want to support the JSON strings like `{"a": "-Infinity"}` as
the FLOAT type, I think we also should support the float data in a string.
Below is an example.
```Scala
def
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/17956
Based on my previous work experience, I am very conservative to introduce
any behavior change unless we are sure this is the right thing to do. I do not
think this is a high priority PR. I can
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17956
1. It is not good to hold off PRs. If a PR looks good and coherent, I think
we could merge.
2. "How many JSON writers write a float "INF" as a string?", if it is your
worry, I will
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/17956
Let me share my understanding.
First, we do not have a deadline for any non-urgent fixes. This PR is an
example. How many JSON writers write a float "INF" as a string?
Being a
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17956
WDYT? I think it'd be faster to get this merged.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17956
Hey @gatorsmile, how about picking up the commits here and opening a PR?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/17956
We really need to be careful when making the code changes. If we support
such a feature, we need to support all the data types. This is unexpected to
users.
---
If your project is set up for
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17956
Yea, I am aware of it because we support special floating cases in string
and I think we'd expect other doubles could be in string?
---
If your project is set up for it, you can reply to this
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/17956
This PR also introduces the following behavior changes.
```Scala
def floatRecords: Dataset[String] =
spark.createDataset(spark.sparkContext.parallelize(
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17956
`allowNonNumericNumbers` option is undocumented and incomplete for now -
https://github.com/apache/spark/pull/9724#r44883828
> allowNonNumericNumbers is undocumented for now, since I
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/17956
Before this PR, the option `allowNonNumericNumbers` affects the results of
the following four cases:
```Json
{"a": NaN}
{"a": Infinity}
{"a": +Infinity}
{"a": -Infinity}
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17956
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76858/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17956
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17956
**[Test build #76858 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76858/testReport)**
for PR 17956 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17956
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17956
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76857/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17956
**[Test build #76857 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76857/testReport)**
for PR 17956 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17956
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76854/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17956
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17956
**[Test build #76854 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76854/testReport)**
for PR 17956 at commit
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17956
(Ah, it was about Jackson's `JsonParser.Feature.ALLOW_NON_NUMERIC_NUMBERS`
... I see.)
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/17956
> Do we need to follow allowNonNumericNumbers when parsing """{"a":
"-Infinity"}"""? cc @cloud-fan @viirya
`allowNonNumericNumbers` supports it and Scala supports it too. It seems to
me
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17956
I didn't understand what ^ means. Could you elaborate please?
`allowNonNumericNumbers` option here for now is incomplete up to my knowledge.
---
If your project is set up for it, you can reply
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/17956
Do we need to follow `allowNonNumericNumbers` when parsing ` """{"a":
"-Infinity"}"""`? cc @cloud-fan @viirya
---
If your project is set up for it, you can reply to this email and have your
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/17956
uh... That is different... This is String. That is Float. nvm. They are
different issues.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17956
Yes, thanks.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/17956
If we do not upgrade `Jackson`, is that possible we can do it by checking
the flag `allowNonNumericNumbers` before [doing the
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17956
We should upgrade Jackson accordingly to support that option correctly.
This looks reverted due to dependency problem -
https://github.com/apache/spark/pull/9759#issuecomment-222759764. It
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/17956
Could we update the JIRA, PR title, and PR description?
Also check the JSON flag `allowNonNumericNumbers` when doing the
conversion?
---
If your project is set up for it, you can
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17956
**[Test build #76858 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76858/testReport)**
for PR 17956 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17956
**[Test build #76857 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76857/testReport)**
for PR 17956 at commit
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17956
@gatorsmail, I added some more cases I could identify based on
http://docs.oracle.com/javase/7/docs/api/java/lang/Double.html#valueOf(java.lang.String)
(this looks calling
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17956
**[Test build #76854 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76854/testReport)**
for PR 17956 at commit
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17956
retest this please
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17956
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76851/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17956
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/17956
This PR description is misleading. This PR is actually a bug fix, I think
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17956
**[Test build #76851 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76851/testReport)**
for PR 17956 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17956
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17956
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76846/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17956
**[Test build #76846 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76846/testReport)**
for PR 17956 at commit
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/17956
LGTM except for a minor comment.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17956
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76841/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17956
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17956
**[Test build #76841 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76841/testReport)**
for PR 17956 at commit
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/17956
LGTM
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17956
**[Test build #76846 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76846/testReport)**
for PR 17956 at commit
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/17956
cc @NathanHowell, @cloud-fan and @viirya.
(I just want to note this will not change any input/output but just the
exception type and avoid additional conversion try.)
---
If your
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17956
**[Test build #76841 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76841/testReport)**
for PR 17956 at commit
61 matches
Mail list logo