subject:"\[jira\] \[Commented\] \(SPARK\-18352\) Parse normal, multi\-line JSON files \(not just JSON Lines\)"

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

2017-03-01 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890907#comment-15890907
 ] 

Apache Spark commented on SPARK-18352:
--

User 'felixcheung' has created a pull request for this issue:
https://github.com/apache/spark/pull/17128

> Parse normal, multi-line JSON files (not just JSON Lines)
> -
>
> Key: SPARK-18352
> URL: https://issues.apache.org/jira/browse/SPARK-18352
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Nathan Howell
>  Labels: releasenotes
> Fix For: 2.2.0
>
>
> Spark currently can only parse JSON files that are JSON lines, i.e. each 
> record has an entire line and records are separated by new line. In reality, 
> a lot of users want to use Spark to parse actual JSON files, and are 
> surprised to learn that it doesn't do that.
> We can introduce a new mode (wholeJsonFile?) in which we don't split the 
> files, and rather stream through them to parse the JSON files.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

2016-12-22 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15771689#comment-15771689
 ] 

Apache Spark commented on SPARK-18352:
--

User 'NathanHowell' has created a pull request for this issue:
https://github.com/apache/spark/pull/16386

> Parse normal, multi-line JSON files (not just JSON Lines)
> -
>
> Key: SPARK-18352
> URL: https://issues.apache.org/jira/browse/SPARK-18352
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Reynold Xin
>  Labels: releasenotes
>
> Spark currently can only parse JSON files that are JSON lines, i.e. each 
> record has an entire line and records are separated by new line. In reality, 
> a lot of users want to use Spark to parse actual JSON files, and are 
> surprised to learn that it doesn't do that.
> We can introduce a new mode (wholeJsonFile?) in which we don't split the 
> files, and rather stream through them to parse the JSON files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

2016-11-29 Thread Josh Rosen (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15706213#comment-15706213
 ] 

Josh Rosen commented on SPARK-18352:


Yeah, I'll update my patch to roll back my JSON changes so it shouldn't 
conflict.

> Parse normal, multi-line JSON files (not just JSON Lines)
> -
>
> Key: SPARK-18352
> URL: https://issues.apache.org/jira/browse/SPARK-18352
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Reynold Xin
>  Labels: releasenotes
>
> Spark currently can only parse JSON files that are JSON lines, i.e. each 
> record has an entire line and records are separated by new line. In reality, 
> a lot of users want to use Spark to parse actual JSON files, and are 
> surprised to learn that it doesn't do that.
> We can introduce a new mode (wholeJsonFile?) in which we don't split the 
> files, and rather stream through them to parse the JSON files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

2016-11-29 Thread Reynold Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15705944#comment-15705944
 ] 

Reynold Xin commented on SPARK-18352:
-

I've asked [~joshrosen] to do that only for the text format, and not json.


> Parse normal, multi-line JSON files (not just JSON Lines)
> -
>
> Key: SPARK-18352
> URL: https://issues.apache.org/jira/browse/SPARK-18352
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Reynold Xin
>  Labels: releasenotes
>
> Spark currently can only parse JSON files that are JSON lines, i.e. each 
> record has an entire line and records are separated by new line. In reality, 
> a lot of users want to use Spark to parse actual JSON files, and are 
> surprised to learn that it doesn't do that.
> We can introduce a new mode (wholeJsonFile?) in which we don't split the 
> files, and rather stream through them to parse the JSON files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

2016-11-29 Thread Nathan Howell (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15705940#comment-15705940
 ] 

Nathan Howell commented on SPARK-18352:
---

Got hung up on some other stuff, haven't been able to get back to adding tests 
yet. WIP code is up here: 
https://github.com/NathanHowell/spark/commits/SPARK-18352

Question though. https://github.com/apache/spark/pull/15813 touches a bunch of 
areas I was also working on. Do you think this patch will land soon? Should I 
rework mine on top?

> Parse normal, multi-line JSON files (not just JSON Lines)
> -
>
> Key: SPARK-18352
> URL: https://issues.apache.org/jira/browse/SPARK-18352
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Reynold Xin
>  Labels: releasenotes
>
> Spark currently can only parse JSON files that are JSON lines, i.e. each 
> record has an entire line and records are separated by new line. In reality, 
> a lot of users want to use Spark to parse actual JSON files, and are 
> surprised to learn that it doesn't do that.
> We can introduce a new mode (wholeJsonFile?) in which we don't split the 
> files, and rather stream through them to parse the JSON files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

2016-11-17 Thread Nathan Howell (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675966#comment-15675966
 ] 

Nathan Howell commented on SPARK-18352:
---

Sounds good to me. I have an implementation that's passing basic tests but 
needs to be cleaned up a bit. I'll get a pull request up in the next few days.

> Parse normal, multi-line JSON files (not just JSON Lines)
> -
>
> Key: SPARK-18352
> URL: https://issues.apache.org/jira/browse/SPARK-18352
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Reynold Xin
>
> Spark currently can only parse JSON files that are JSON lines, i.e. each 
> record has an entire line and records are separated by new line. In reality, 
> a lot of users want to use Spark to parse actual JSON files, and are 
> surprised to learn that it doesn't do that.
> We can introduce a new mode (wholeJsonFile?) in which we don't split the 
> files, and rather stream through them to parse the JSON files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

2016-11-17 Thread Reynold Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675517#comment-15675517
 ] 

Reynold Xin commented on SPARK-18352:
-

Actually just talked to [~marmbrus] and now I understand more how JSON reader 
works.

I'd say we always turn the top level array into multiple records, and then have 
only one option: wholeFile. This same option can be used in json and text.

> Parse normal, multi-line JSON files (not just JSON Lines)
> -
>
> Key: SPARK-18352
> URL: https://issues.apache.org/jira/browse/SPARK-18352
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Reynold Xin
>
> Spark currently can only parse JSON files that are JSON lines, i.e. each 
> record has an entire line and records are separated by new line. In reality, 
> a lot of users want to use Spark to parse actual JSON files, and are 
> surprised to learn that it doesn't do that.
> We can introduce a new mode (wholeJsonFile?) in which we don't split the 
> files, and rather stream through them to parse the JSON files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

2016-11-17 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675466#comment-15675466
 ] 

Hyukjin Kwon commented on SPARK-18352:
--

Ah, you meant producing each row while parsing the whole text in iteration. I 
see.

> Parse normal, multi-line JSON files (not just JSON Lines)
> -
>
> Key: SPARK-18352
> URL: https://issues.apache.org/jira/browse/SPARK-18352
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Reynold Xin
>
> Spark currently can only parse JSON files that are JSON lines, i.e. each 
> record has an entire line and records are separated by new line. In reality, 
> a lot of users want to use Spark to parse actual JSON files, and are 
> surprised to learn that it doesn't do that.
> We can introduce a new mode (wholeJsonFile?) in which we don't split the 
> files, and rather stream through them to parse the JSON files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

2016-11-17 Thread Reynold Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675446#comment-15675446
 ] 

Reynold Xin commented on SPARK-18352:
-

No that's not sufficient. It doesn't do streaming.


> Parse normal, multi-line JSON files (not just JSON Lines)
> -
>
> Key: SPARK-18352
> URL: https://issues.apache.org/jira/browse/SPARK-18352
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Reynold Xin
>
> Spark currently can only parse JSON files that are JSON lines, i.e. each 
> record has an entire line and records are separated by new line. In reality, 
> a lot of users want to use Spark to parse actual JSON files, and are 
> surprised to learn that it doesn't do that.
> We can introduce a new mode (wholeJsonFile?) in which we don't split the 
> files, and rather stream through them to parse the JSON files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

2016-11-17 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675441#comment-15675441
 ] 

Hyukjin Kwon commented on SPARK-18352:
--

Hi [~rxin], I think it seems this can be simply done after 
https://github.com/apache/spark/pull/14151 and 
https://github.com/apache/spark/pull/15813 are merged. I guess we could just 
add another option in `JSONOptions` which sets `wholetext` internally. Would 
this be what you think in your mind already? If so, I can work on this if 
anyone is not supposed to do this. (I am fine if anyone is assigned to this 
internally).

> Parse normal, multi-line JSON files (not just JSON Lines)
> -
>
> Key: SPARK-18352
> URL: https://issues.apache.org/jira/browse/SPARK-18352
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Reynold Xin
>
> Spark currently can only parse JSON files that are JSON lines, i.e. each 
> record has an entire line and records are separated by new line. In reality, 
> a lot of users want to use Spark to parse actual JSON files, and are 
> surprised to learn that it doesn't do that.
> We can introduce a new mode (wholeJsonFile?) in which we don't split the 
> files, and rather stream through them to parse the JSON files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

2016-11-17 Thread Reynold Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675437#comment-15675437
 ] 

Reynold Xin commented on SPARK-18352:
-

I guess maybe it should be a user-configurable option? Otherwise Spark on its 
own don't have enough information to disambiguate.


> Parse normal, multi-line JSON files (not just JSON Lines)
> -
>
> Key: SPARK-18352
> URL: https://issues.apache.org/jira/browse/SPARK-18352
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Reynold Xin
>
> Spark currently can only parse JSON files that are JSON lines, i.e. each 
> record has an entire line and records are separated by new line. In reality, 
> a lot of users want to use Spark to parse actual JSON files, and are 
> surprised to learn that it doesn't do that.
> We can introduce a new mode (wholeJsonFile?) in which we don't split the 
> files, and rather stream through them to parse the JSON files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

2016-11-17 Thread Nathan Howell (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675421#comment-15675421
 ] 

Nathan Howell commented on SPARK-18352:
---

Do you have any ideas how to support this? {{DataFrameReader.schema}} currently 
takes a {{StructType}} and the existing row level json reader flattens arrays 
out to support this restriction.

> Parse normal, multi-line JSON files (not just JSON Lines)
> -
>
> Key: SPARK-18352
> URL: https://issues.apache.org/jira/browse/SPARK-18352
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Reynold Xin
>
> Spark currently can only parse JSON files that are JSON lines, i.e. each 
> record has an entire line and records are separated by new line. In reality, 
> a lot of users want to use Spark to parse actual JSON files, and are 
> surprised to learn that it doesn't do that.
> We can introduce a new mode (wholeJsonFile?) in which we don't split the 
> files, and rather stream through them to parse the JSON files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

2016-11-17 Thread Reynold Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675405#comment-15675405
 ] 

Reynold Xin commented on SPARK-18352:
-

Are these actually record delimiters? If the top level structure is an array, 
would we want to parse a single file as multiple records?


> Parse normal, multi-line JSON files (not just JSON Lines)
> -
>
> Key: SPARK-18352
> URL: https://issues.apache.org/jira/browse/SPARK-18352
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Reynold Xin
>
> Spark currently can only parse JSON files that are JSON lines, i.e. each 
> record has an entire line and records are separated by new line. In reality, 
> a lot of users want to use Spark to parse actual JSON files, and are 
> surprised to learn that it doesn't do that.
> We can introduce a new mode (wholeJsonFile?) in which we don't split the 
> files, and rather stream through them to parse the JSON files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

2016-11-17 Thread Nathan Howell (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675386#comment-15675386
 ] 

Nathan Howell commented on SPARK-18352:
---

Any opinions on configuring this with an option instead of a creating a new 
data source? It looks fairly straightforward to support this as an option. E.g.:

{code}
// parse one json value per line
// this would be the default behavior, for backwards compatibility
spark.read.option("recordDelimiter", "line").json(???)

// parse one json value per file
spark.read.option("recordDelimiter", "file").json(???)
{code}

The refactoring work would be the same in either case, but it would require 
less plumbing for Python/Java/etc to enable this with an option.

As an aside... it also is straightforward to extend this to support {{Text}} 
and {{UTF8String}} values directly, avoiding a string conversion of the entire 
column prior to parsing.

> Parse normal, multi-line JSON files (not just JSON Lines)
> -
>
> Key: SPARK-18352
> URL: https://issues.apache.org/jira/browse/SPARK-18352
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Reynold Xin
>
> Spark currently can only parse JSON files that are JSON lines, i.e. each 
> record has an entire line and records are separated by new line. In reality, 
> a lot of users want to use Spark to parse actual JSON files, and are 
> surprised to learn that it doesn't do that.
> We can introduce a new mode (wholeJsonFile?) in which we don't split the 
> files, and rather stream through them to parse the JSON files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

2016-11-08 Thread Reynold Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15650025#comment-15650025
 ] 

Reynold Xin commented on SPARK-18352:
-

Again, this has nothing to do with streaming. It should just be an option (e.g. 
multilineJson, or wholeFile) for JSON.


> Parse normal, multi-line JSON files (not just JSON Lines)
> -
>
> Key: SPARK-18352
> URL: https://issues.apache.org/jira/browse/SPARK-18352
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Reynold Xin
>
> Spark currently can only parse JSON files that are JSON lines, i.e. each 
> record has an entire line and records are separated by new line. In reality, 
> a lot of users want to use Spark to parse actual JSON files, and are 
> surprised to learn that it doesn't do that.
> We can introduce a new mode (wholeJsonFile?) in which we don't split the 
> files, and rather stream through them to parse the JSON files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

2016-11-08 Thread Thomas Sebastian (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649982#comment-15649982
 ] 

Thomas Sebastian commented on SPARK-18352:
--

Hi Reynold,
So, do you mean that stream API need not be used,and  there should be a new API 
which can read multiple json files?

-Thomas

> Parse normal, multi-line JSON files (not just JSON Lines)
> -
>
> Key: SPARK-18352
> URL: https://issues.apache.org/jira/browse/SPARK-18352
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Reynold Xin
>
> Spark currently can only parse JSON files that are JSON lines, i.e. each 
> record has an entire line and records are separated by new line. In reality, 
> a lot of users want to use Spark to parse actual JSON files, and are 
> surprised to learn that it doesn't do that.
> We can introduce a new mode (wholeJsonFile?) in which we don't split the 
> files, and rather stream through them to parse the JSON files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

2016-11-08 Thread Reynold Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649835#comment-15649835
 ] 

Reynold Xin commented on SPARK-18352:
-

There is already a readStream.json.

"Stream" here means not having to read the entire file in memory at once, but 
rather just "stream through" it, i.e. parse as we scan.


> Parse normal, multi-line JSON files (not just JSON Lines)
> -
>
> Key: SPARK-18352
> URL: https://issues.apache.org/jira/browse/SPARK-18352
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Reynold Xin
>
> Spark currently can only parse JSON files that are JSON lines, i.e. each 
> record has an entire line and records are separated by new line. In reality, 
> a lot of users want to use Spark to parse actual JSON files, and are 
> surprised to learn that it doesn't do that.
> We can introduce a new mode (wholeJsonFile?) in which we don't split the 
> files, and rather stream through them to parse the JSON files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

2016-11-08 Thread Jayadevan M (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649803#comment-15649803
 ] 

Jayadevan M commented on SPARK-18352:
-

[~rxin] Are you looking a new api like spark.readStream.json(path) similor to 
spark.read.json(path) ?

> Parse normal, multi-line JSON files (not just JSON Lines)
> -
>
> Key: SPARK-18352
> URL: https://issues.apache.org/jira/browse/SPARK-18352
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Reynold Xin
>
> Spark currently can only parse JSON files that are JSON lines, i.e. each 
> record has an entire line and records are separated by new line. In reality, 
> a lot of users want to use Spark to parse actual JSON files, and are 
> surprised to learn that it doesn't do that.
> We can introduce a new mode (wholeJsonFile?) in which we don't split the 
> files, and rather stream through them to parse the JSON files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

18 matches

Site Navigation

Mail list logo

Footer information