Re: JSON Input files

2014-12-15 Thread Madabhattula Rajesh Kumar
Thank you Peter for the clarification.

Regards,
Rajesh

On Tue, Dec 16, 2014 at 12:42 AM, Michael Armbrust wrote:
> Underneath the covers, jsonFile uses TextInputFormat, which will split
> files correctly based on new lines. Thus, there is no fixed maximum size
> for a json object (other than

Re: JSON Input files

2014-12-15 Thread Michael Armbrust
Underneath the covers, jsonFile uses TextInputFormat, which will split files correctly based on new lines. Thus, there is no fixed maximum size for a json object (other than the fact that it must fit into memory on the executors).

On Mon, Dec 15, 2014 at 7:22 AM, Madabhattula Rajesh Kumar < mraja
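[Editorial note] Michael's point can be illustrated without a cluster: TextInputFormat yields one record per newline, so jsonFile can only parse a JSON object that sits entirely on one line. A minimal pure-Scala sketch (no Spark required; the brace-balance check is a necessary, not sufficient, condition for a line to be a standalone object):

```scala
object NewlineSplitCheck {
  // A line can only be a standalone JSON object if its braces balance;
  // this mimics the situation after TextInputFormat splits on newlines.
  // (Necessary, not sufficient: it ignores braces inside string values.)
  def selfContained(line: String): Boolean = {
    val depth = line.foldLeft(0) {
      case (d, '{') => d + 1
      case (d, '}') => d - 1
      case (d, _)   => d
    }
    depth == 0 && line.contains('{')
  }

  def allSelfContained(text: String): Boolean =
    text.split("\n").forall(selfContained)

  // One object per line: every split line passes the check.
  val oneLine = "{\"NAME\":\"Device 1\"}\n{\"NAME\":\"Device 2\"}"
  // Pretty-printed object: splitting leaves unbalanced fragments.
  val multiLine = "{\n\"NAME\":\"Device 1\"\n}"
}
```

`allSelfContained(oneLine)` holds, while the pretty-printed form fails the check on every fragment, which is exactly why jsonFile cannot read multi-line objects.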

Re: JSON Input files

2014-12-15 Thread Madabhattula Rajesh Kumar
Hi Peter,

Thank you for the clarification. So now we need to store each JSON object on a single line. Is there any limit on the length of a JSON object, so that the object will not spill onto the next line? What will happen if a JSON object is very large? Will it still be stored on a single line in HDFS? What will h

Re: JSON Input files

2014-12-15 Thread Peter Vandenabeele
On Sat, Dec 13, 2014 at 5:43 PM, Helena Edelson wrote:
> One solution can be found here:
> https://spark.apache.org/docs/1.1.0/sql-programming-guide.html#json-datasets

As far as I understand, the people.json file is not really a proper json file, but a file documented as: "... JSON files w

Re: JSON Input files

2014-12-15 Thread Madabhattula Rajesh Kumar
Hi Helena and All,

I have found one example that reads a "multi-line json file" into an RDD using https://github.com/alexholmes/json-mapreduce.

val data = sc.newAPIHadoopFile(
    filepath,
    classOf[MultiLineJsonInputFormat],
    classOf[LongWritable],
    classOf[Text],
    conf
).map(p => (p._1.get,
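[Editorial note] A hedged sketch of how the truncated snippet above might continue, based on the json-mapreduce project's README. The import paths and the `setInputJsonMember` configuration call are reproduced from memory of that project and should be verified against the library itself; "NAME" as the record-start member is only an illustrative guess for the device records in this thread.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
import com.alexholmes.json.mapreduce.MultiLineJsonInputFormat

val conf = new Configuration()
// Tell the input format which JSON member marks the start of each record
// ("NAME" is an illustrative guess, not from the thread).
MultiLineJsonInputFormat.setInputJsonMember(conf, "NAME")

val data = sc.newAPIHadoopFile(
  filepath,
  classOf[MultiLineJsonInputFormat],
  classOf[LongWritable],
  classOf[Text],
  conf
).map(p => (p._1.get, p._2.toString)) // (byte offset, raw JSON text)
```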

Re: JSON Input files

2014-12-14 Thread Madabhattula Rajesh Kumar
Thank you, Yanbo.

Regards,
Rajesh

On Sun, Dec 14, 2014 at 3:15 PM, Yanbo wrote:
> Pay attention to your JSON file, try to change it like following.
> Each record represent as a JSON string.
>
> {"NAME" : "Device 1",
> "GROUP" : "1",
> "SITE" : "qqq",
> "DIRECTION" : "East"

Re: JSON Input files

2014-12-14 Thread Yanbo
Pay attention to your JSON file: try to change it like the following, so that each record is represented as a single-line JSON string.

{"NAME" : "Device 1", "GROUP" : "1", "SITE" : "qqq", "DIRECTION" : "East"}
{"NAME" : "Device 2", "GROUP" : "2", "SITE" : "sss", "DIRECTION" : "
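[Editorial note] Yanbo's reformatting advice can be automated. A small pure-Scala sketch (an editorial illustration, not from the thread) that collapses pretty-printed top-level objects onto one line each, under the assumption that no braces occur inside string values:

```scala
object OneObjectPerLine {
  // Collapse a pretty-printed stream of top-level JSON objects so each
  // object occupies exactly one line, as line-oriented readers expect.
  // Assumes '{' and '}' never appear inside string values.
  def compact(prettyJson: String): String = {
    val sb = new StringBuilder
    var depth = 0
    for (c <- prettyJson) c match {
      case '{' =>
        depth += 1; sb += c
      case '}' =>
        depth -= 1; sb += c
        if (depth == 0) sb += '\n' // end of one top-level record
      case '\n' | '\r' =>
        () // drop the original line breaks
      case other =>
        sb += other
    }
    sb.toString
  }
}
```

Running `compact` over a pretty-printed version of the two device records above yields one `{...}` record per line, which the newline-based splitting discussed elsewhere in this thread then handles correctly.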

Re: JSON Input files

2014-12-14 Thread Madabhattula Rajesh Kumar
Hi Helena and All,

I have the below example JSON file format. My use case is to read the "NAME" field. When I execute the query, I get the following exception:

"Exception in thread "main" org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved attributes: 'NAME, tree:Project ['NAME] Subquery device"

Re: JSON Input files

2014-12-13 Thread Helena Edelson
One solution can be found here: https://spark.apache.org/docs/1.1.0/sql-programming-guide.html#json-datasets

- Helena
@helenaedelson

On Dec 13, 2014, at 11:18 AM, Madabhattula Rajesh Kumar wrote:
> Hi Team,
>
> I have a large JSON file in Hadoop. Could you please let me know
>
> 1. How to

JSON Input files

2014-12-13 Thread Madabhattula Rajesh Kumar
Hi Team,

I have a large JSON file in Hadoop. Could you please let me know:

1. How to read the JSON file
2. How to parse the JSON file

Please share an example program based on Scala.

Regards,
Rajesh
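[Editorial note] A sketch of the kind of program being asked for, using the Spark 1.x SQL API that the replies in this thread point to (SQLContext.jsonFile). It assumes the file already stores one JSON object per line, as the thread later establishes; the HDFS path is a placeholder and "NAME" matches the field used elsewhere in the thread. Requires a Spark 1.x environment to run.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object ReadJsonExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ReadJsonExample"))
    val sqlContext = new SQLContext(sc)

    // 1. Read: jsonFile scans the dataset and infers a schema
    //    (placeholder HDFS path).
    val devices = sqlContext.jsonFile("hdfs:///path/to/devices.json")

    // 2. Parse/query: register the result as a table and use SQL.
    devices.registerTempTable("devices")
    sqlContext.sql("SELECT NAME FROM devices").collect().foreach(println)

    sc.stop()
  }
}
```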