Re: pyspark inferSchema

2014-08-05 Thread Brad Miller

I've followed up in a thread more directly related to jsonRDD and jsonFile, but it seems like after building from the current master I'm still having some problems with nested dictionaries. http://apache-spark-user-list.1001560.n3.nabble.com/trouble-with-jsonRDD-and-jsonFile-in-pyspark-tp11461p115

Re: pyspark inferSchema

2014-08-05 Thread Yin Huai

Yes, 2376 has been fixed in master. Can you give it a try? Also, for inferSchema, because Python is dynamically typed, I agree with Davies that we should provide a way to scan a subset (or all) of the dataset to figure out the proper schema. We will take a look at it. Thanks, Yin On Tue, Aug 5, 2014 at 12
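To make the idea of scanning a subset (or all) of the dataset concrete, here is a minimal pure-Python sketch. It is not the pyspark implementation: the function name `infer_schema`, its `sample` fraction, and its `seed` are hypothetical, chosen only to illustrate why looking at more than one record matters when fields are missing or None in some rows.

```python
import random


def infer_schema(records, sample=1.0, seed=0):
    """Map each field name to the set of Python type names seen while scanning.

    `sample` is a hypothetical fraction in (0, 1]; 1.0 scans every record.
    """
    if not 0.0 < sample <= 1.0:
        raise ValueError("sample must be in (0, 1]")
    rng = random.Random(seed)
    schema = {}
    for i, record in enumerate(records):
        # Always scan the first record so non-empty input never yields an empty schema.
        if i > 0 and rng.random() >= sample:
            continue
        for field, value in record.items():
            schema.setdefault(field, set()).add(type(value).__name__)
    return schema


rows = [{"a": 1}, {"a": 2, "b": "x"}, {"a": None, "c": 3.5}]
full = infer_schema(rows)            # scans every record
first_only = infer_schema(rows[:1])  # what a single-record scan would see
```

Scanning only the first record misses the fields `b` and `c` and the fact that `a` can be None; a smaller `sample` trades that completeness for speed.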

Re: pyspark inferSchema

2014-08-05 Thread Brad Miller

Assuming updating to master fixes the bug I was experiencing with jsonRDD and jsonFile, then pushing "sample" to master will probably not be necessary. We believe that the link below was the bug I experienced, and I've been told it is fixed in master. https://issues.apache.org/jira/browse/SPARK-2

Re: pyspark inferSchema

2014-08-05 Thread Davies Liu

This "sample" argument of inferSchema is still not in master; I will try to add it if it makes sense. On Tue, Aug 5, 2014 at 12:14 PM, Brad Miller wrote: > Hi Davies, > > Thanks for the response and tips. Is the "sample" argument to inferSchema > available in the 1.0.1 release of pyspark? I'm no

Re: pyspark inferSchema

2014-08-05 Thread Brad Miller

Hi Davies, Thanks for the response and tips. Is the "sample" argument to inferSchema available in the 1.0.1 release of pyspark? I'm not sure (based on the documentation linked below) that it is. http://spark.apache.org/docs/latest/api/python/pyspark.sql.SQLContext-class.html#inferSchema It soun

Re: pyspark inferSchema

2014-08-05 Thread Brad Miller

Got it. Thanks! On Tue, Aug 5, 2014 at 11:53 AM, Nicholas Chammas < nicholas.cham...@gmail.com> wrote: > Notice the difference in the schema. Are you running the 1.0.1 release, >> or a more bleeding-edge version from the repository? > > Yep, my bad. I’m running off master at commit > 184048f80

Re: pyspark inferSchema

2014-08-05 Thread Davies Liu

On Tue, Aug 5, 2014 at 11:01 AM, Nicholas Chammas wrote: > I was just about to ask about this. > > Currently, there are two methods, sqlContext.jsonFile() and > sqlContext.jsonRDD(), that work on JSON text and infer a schema that covers > the whole data set. > > For example: > > from pyspark.sql i

Re: pyspark inferSchema

2014-08-05 Thread Nicholas Chammas

> Notice the difference in the schema. Are you running the 1.0.1 release, or
> a more bleeding-edge version from the repository?

Yep, my bad. I’m running off master at commit 184048f80b6fa160c89d5bb47b937a0a89534a95.

Nick

Re: pyspark inferSchema

2014-08-05 Thread Brad Miller

Hi Nick, Thanks for the great response. I actually already investigated jsonRDD and jsonFile, although I did not realize they provide more complete schema inference. I did, however, have other problems with jsonRDD and jsonFile, which I will describe in a separate thread with an appropriate subj

Re: pyspark inferSchema

2014-08-05 Thread Nicholas Chammas

I was just about to ask about this. Currently, there are two methods, sqlContext.jsonFile() and sqlContext.jsonRDD(), that work on JSON text and infer a schema that covers the whole data set. For example: from pyspark.sql import SQLContext sqlContext = SQLContext(sc) >>> a = sqlContext.jsonRDD(s
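As a rough illustration of what "a schema that covers the whole data set" means for JSON text, the sketch below folds every line into one schema, flattening nested dictionaries into dotted paths. It uses only the standard library and does not reproduce how jsonRDD or jsonFile work internally; the helper names `field_types` and `infer_json_schema` and the sample lines are made up for this example.

```python
import json


def field_types(value, prefix=""):
    """Yield (dotted_path, type_name) pairs for one parsed JSON value."""
    if isinstance(value, dict):
        for key, sub in value.items():
            path = f"{prefix}.{key}" if prefix else key
            yield from field_types(sub, path)
    else:
        yield prefix, type(value).__name__


def infer_json_schema(lines):
    """Union the field paths and type names seen across all JSON lines."""
    schema = {}
    for line in lines:
        for path, type_name in field_types(json.loads(line)):
            schema.setdefault(path, set()).add(type_name)
    return schema


lines = [
    '{"foo": "bar", "baz": {"n": 1}}',
    '{"foo": "boom", "baz": {"m": [1, 2]}}',
]
schema = infer_json_schema(lines)
```

Here the nested fields `baz.n` and `baz.m` each appear in only one record, yet both end up in the merged schema because every line is scanned.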
