> Then you have to preprocess it, or write your own implementation to
> handle the record delimiter for your JSON data case. But good luck with
> that: there is no perfect generic solution for any kind of JSON data you
> want to handle.
>
> Yong
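To make the "preprocess it" option concrete, here is a minimal non-Spark sketch (the function name and sample data are mine, purely illustrative): read the pretty-printed JSON array as one string and rewrite it as one compact object per line, the line-delimited format Spark's JSON reader handles directly.

```python
import json

def to_json_lines(multi_line_json: str) -> str:
    """Convert a JSON array document (possibly pretty-printed across
    many lines) into line-delimited JSON: one compact object per line."""
    records = json.loads(multi_line_json)  # parse the whole document once
    return "\n".join(json.dumps(r, separators=(",", ":")) for r in records)

pretty = """[
  {"fields": {"premise_name": "aaa"}},
  {"fields": {"premise_name": "bbb"}}
]"""
print(to_json_lines(pretty))
```

Of course, this assumes the whole document fits in memory during the preprocessing step, which is exactly the limitation being discussed for very large files.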
From: ljia...@gmail.com
Date: Thu, 7 Jul 2016 11:57:26 -0500
Subject: Re: Processing json document
To: gurwls...@gmail.com
CC: jornfra...@gmail.com; user@spark.apache.org
Hi, there,
Thank you all for your input. @Hyukjin, as a matter of fact, I had read
the blog link you posted before asking the question on the forum. As you
pointed out, the link uses wholeTextFiles(), which is bad in my case,
because my JSON file can be as large as 20G+ and OOM might occur.
The link uses the wholeTextFiles() API, which treats each file as a single record.
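In other words, with wholeTextFiles() each (filename, content) pair arrives as one record, and you parse the entire content string yourself. A rough non-Spark sketch of that per-record parsing step (the helper name is mine, not a Spark API):

```python
import json

def parse_whole_file(content: str):
    """Parse an entire file's text as one JSON document and return a
    list of records, whether the document is one object or an array."""
    doc = json.loads(content)
    return doc if isinstance(doc, list) else [doc]

# One "record" as wholeTextFiles() would hand it over: the full file text.
content = '{\n  "fields": {\n    "premise_name": "ccc"\n  }\n}'
print(parse_whole_file(content))
```

The catch is exactly what is raised below: the full file content is materialized as one string in one JVM, so a 20G+ file risks OOM.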
2016-07-07 15:42 GMT+09:00 Jörn Franke:
This does not necessarily need to be the case: if you look at the Hadoop
FileInputFormat architecture, you can even split large multi-line JSONs
without issues. I would need to have a look at it, but one large file does
not mean one executor, independent of the underlying format.
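The splitting Jörn describes works because JSON record boundaries can be found by scanning the byte stream, which is what a custom Hadoop FileInputFormat/RecordReader would do. A toy sketch of just the boundary-finding idea (brace-depth counting that ignores braces inside string literals; handling boundaries that straddle HDFS block splits is considerably more involved):

```python
def split_top_level_objects(text: str):
    """Split a stream of concatenated top-level JSON objects into one
    string per object by tracking brace depth outside string literals."""
    records, depth, start = [], 0, None
    in_string = escaped = False
    for i, ch in enumerate(text):
        if in_string:
            if escaped:
                escaped = False      # previous char was a backslash
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False    # closing quote of the literal
        elif ch == '"':
            in_string = True
        elif ch == "{":
            if depth == 0:
                start = i            # a new top-level record begins
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                records.append(text[start : i + 1])
    return records

stream = '{"id": 1,\n "v": "a}b"}\n{"id": 2}'
print(split_top_level_objects(stream))
```

Note the `"a}b"` value: a naive delimiter split would break on the brace inside the string, which is why depth tracking must be string-aware.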
There is a good link for this here:
http://searchdatascience.com/spark-adventures-1-processing-multi-line-json-files
If there are a lot of small files, then it would work pretty okay in a
distributed manner, but I am worried about the case of a single large file.
In that case, this would only work on a single executor.
Do you want id1, id2, id3 to be processed similarly?
The Java code I use is:
df = df.withColumn(K.NAME, df.col("fields.premise_name"));
The original structure is something like {"fields":{"premise_name":"ccc"}}.
Hope it helps.
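For comparison, here is what that nested-field lookup does, stripped of Spark: parse the object and walk the dotted path "fields.premise_name" through the nested structure. The helper below is a plain-Python illustration of mine, not a Spark API:

```python
import json

def get_path(obj: dict, dotted_path: str):
    """Walk a dotted path like 'fields.premise_name' through nested dicts."""
    for key in dotted_path.split("."):
        obj = obj[key]
    return obj

row = json.loads('{"fields": {"premise_name": "ccc"}}')
print(get_path(row, "fields.premise_name"))
```

In Spark, df.col("fields.premise_name") performs the same traversal on a struct column, which is why the dotted name works there.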
> On Jul 7, 2016, at 1:48 AM, Lan Jiang wrote:
Hi, there,
Spark has provided JSON document processing features for a long time. In
most examples I see, each line is a JSON object in the sample file. That is
the easiest case. But how can we process a JSON document that does not
conform to this standard format (one JSON object per line)? Here is