Thank you Peter for the clarification.

Regards,
Rajesh

On Tue, Dec 16, 2014 at 12:42 AM, Michael Armbrust <mich...@databricks.com> wrote:
>
> Underneath the covers, jsonFile uses TextInputFormat, which will split
> files correctly based on newlines. Thus, there is no fixed maximum size
> for a JSON object (other than the fact that it must fit into memory on the
> executors).
>
> On Mon, Dec 15, 2014 at 7:22 AM, Madabhattula Rajesh Kumar <
> mrajaf...@gmail.com> wrote:
>>
>> Hi Peter,
>>
>> Thank you for the clarification.
>>
>> Now we need to store each JSON object on one line. Is there any limit
>> on the length of a JSON object, so that the JSON object will not run
>> onto the next line?
>>
>> What will happen if the JSON object is huge? Will it be stored on a
>> single line in HDFS?
>>
>> What will happen if the JSON object contains a BLOB/CLOB value? Is the
>> entire JSON object stored on a single line in HDFS?
>>
>> What will happen if a JSON object exceeds the HDFS block size, for
>> example if a single JSON object is split across two different worker
>> nodes? In this case, how will Spark read the JSON object?
>>
>> Could you please clarify the above questions?
>>
>> Regards,
>> Rajesh
>>
>>
>> On Mon, Dec 15, 2014 at 6:52 PM, Peter Vandenabeele <
>> pe...@vandenabeele.com> wrote:
>>>
>>> On Sat, Dec 13, 2014 at 5:43 PM, Helena Edelson <
>>> helena.edel...@datastax.com> wrote:
>>>
>>>> One solution can be found here:
>>>> https://spark.apache.org/docs/1.1.0/sql-programming-guide.html#json-datasets
>>>>
>>>
>>> As far as I understand, the people.json file is not really a proper JSON
>>> file, but a file documented as:
>>>
>>> "... JSON files where each line of the files is a JSON object.".
>>>
>>> This means that it is a file with multiple lines, but each line needs to
>>> have a fully self-contained JSON object
>>> (initially confusing, this will not parse a standard multi-line JSON
>>> file).
>>> We are working to clarify this in
>>> https://github.com/apache/spark/pull/3517
>>>
>>> HTH,
>>>
>>> Peter
>>>
>>>
>>>> - Helena
>>>> @helenaedelson
>>>>
>>>> On Dec 13, 2014, at 11:18 AM, Madabhattula Rajesh Kumar <
>>>> mrajaf...@gmail.com> wrote:
>>>>
>>>> Hi Team,
>>>>
>>>> I have a large JSON file in Hadoop. Could you please let me know
>>>>
>>>> 1. How to read the JSON file
>>>> 2. How to parse the JSON file
>>>>
>>>> Please share any example program based on Scala
>>>>
>>>> Regards,
>>>> Rajesh
>>>>
>>>
>>>
>>> --
>>> Peter Vandenabeele
>>> http://www.allthingsdata.io
>>> http://www.linkedin.com/in/petervandenabeele
>>> https://twitter.com/peter_v
>>> gsm: +32-478-27.40.69
>>> e-mail: pe...@vandenabeele.com
>>> skype: peter_v_be
>>>
>>
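
For anyone landing on this thread later, here is a minimal sketch of the "one JSON object per line" layout discussed above. It is plain Python using only the standard json module, not Spark code and not how jsonFile is implemented. The helper names (to_single_line, parse_json_lines) and the sample records are made up for illustration. The point it tries to show: a compactly serialized JSON object never contains a literal line break, because newlines inside string values are escaped, so an object of any size stays on one line, whereas a pretty-printed multi-line object cannot be parsed line by line.

```python
import json


def to_single_line(obj):
    """Serialize obj as compact JSON; newlines inside strings are escaped,
    so the result never contains a literal line break."""
    return json.dumps(obj, separators=(",", ":"))


def parse_json_lines(text):
    """Parse text in which every non-empty line is one complete JSON value."""
    records = []
    for line_number, line in enumerate(text.split("\n"), start=1):
        if not line.strip():
            continue  # ignore blank lines, e.g. after a trailing newline
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as err:
            raise ValueError(
                f"line {line_number} is not a complete JSON value: {err}"
            ) from err
    return records


# Hypothetical sample data, one self-contained object per line.
line_delimited = "\n".join(
    to_single_line(record)
    for record in [
        {"name": "Michael"},
        {"name": "Andy", "age": 30},
        {"name": "Justin", "age": 19, "note": "first line\nsecond line"},
    ]
)
records = parse_json_lines(line_delimited)

# A pretty-printed ("standard multi-line") object fails when read line by line.
pretty_printed = json.dumps({"name": "Andy", "age": 30}, indent=2)
try:
    parse_json_lines(pretty_printed)
    multi_line_ok = True
except ValueError:
    multi_line_ok = False
```

So if the data currently sits in a pretty-printed file, one option is to re-serialize each object compactly, as to_single_line does, before writing it out one object per line.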