GitHub user petervandenabeele commented on the pull request:
https://github.com/apache/spark/pull/3517#issuecomment-65846515
More problematic (and sorry I had not seen that before) ... there already
_is_ an example file named `people.txt` with a different format:
```
$ spark git:(pv-docs-note-on-jsonFile-format/01) cat examples/src/main/resources/people.txt
Michael, 29
Andy, 30
Justin, 19
```
In that case, I could rename the example JSON file to `people.jsons`. It is
a weird name, but it's _reasonably_ accurate (following the `xs` plural
naming pattern from Scala, since the file is like a list of JSON objects).
I would then indeed also need to change the name in all other locations
where a reference to `people.json` is made (confirming the list mentioned by
@marmbrus):
```
$ spark git:(pv-docs-note-on-jsonFile-format/01) grep -r 'people\.json' * | grep -v Binary | grep -v _site
examples/src/main/java/org/apache/spark/examples/sql/JavaSparkSQL.java:    String path = "examples/src/main/resources/people.json";
examples/src/main/python/sql.py:    path = os.path.join(os.environ['SPARK_HOME'], "examples/src/main/resources/people.json")
```
On a more fundamental note, as an outsider, I would have found it more in
line with the "principle of least astonishment" (POLA) if the input to this
function were a standard, valid JSON file formatted as an array of hashes
with an identical "schema", e.g.:
```
[
{"name": "Tom",
"character":"cat"},
{"name":"Jerry",
"character":"mouse"}
]
```
This would have allowed us to simply import data generated from any other
language with `array.to_json`.
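Until then, data that is already in that array form can be converted in a small preprocessing step. A minimal sketch in Python, assuming the loader expects one self-contained JSON object per line; the helper name `array_to_json_lines` is hypothetical and not part of Spark:

```python
import json


def array_to_json_lines(array_text):
    """Convert a JSON array of objects into one JSON object per line.

    Hypothetical helper for illustration; assumes the target loader
    reads one self-contained JSON object per line.
    """
    records = json.loads(array_text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # json.dumps emits no newlines by default, so each record stays on one line.
    return "\n".join(json.dumps(record) for record in records)


array_text = """
[
  {"name": "Tom", "character": "cat"},
  {"name": "Jerry", "character": "mouse"}
]
"""

print(array_to_json_lines(array_text))
```

The output could then be written to a file and passed to the loader in place of the original array file.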
I hear the proposal from @marmbrus to also improve the error message (that
would also have helped us understand the issue more quickly), but I
would suggest putting that in a different JIRA issue (it needs some real
programming and testing work).
I look forward to directions on how to best fix at least the documentation
to avoid this confusion for others.
Thanks.