I've followed up in a thread more directly related to jsonRDD and jsonFile,
but it seems like after building from the current master I'm still having
some problems with nested dictionaries.
http://apache-spark-user-list.1001560.n3.nabble.com/trouble-with-jsonRDD-and-jsonFile-in-pyspark-tp11461p115
Yes, 2376 has been fixed in master. Can you give it a try?
Also, for inferSchema, because Python is dynamically typed, I agree with
Davies that we should provide a way to scan a subset (or the entirety) of the
dataset to figure out the proper schema. We will take a look at it.
Thanks,
Yin
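
The idea Yin describes, scanning some or all of the records and unioning what each one contributes, can be sketched in plain Python. This is only an illustration of the approach; `infer_type`, `merge_schemas`, `infer_schema`, and the `limit` parameter are invented names, not Spark APIs:

```python
def infer_type(value):
    # Nested dictionaries (JSON objects) become nested schemas.
    if isinstance(value, dict):
        return {k: infer_type(v) for k, v in value.items()}
    return type(value).__name__

def merge_schemas(a, b):
    # Union the fields seen so far; nested schemas merge recursively.
    if isinstance(a, dict) and isinstance(b, dict):
        merged = dict(a)
        for key, value in b.items():
            merged[key] = merge_schemas(merged[key], value) if key in merged else value
        return merged
    return a  # naive: on a type conflict, keep the first type seen

def infer_schema(records, limit=None):
    # Scan a subset (the first `limit` records) or, with limit=None, the
    # entire dataset.
    schema = {}
    for record in records[:limit]:
        schema = merge_schemas(schema, infer_type(record))
    return schema
```

Scanning only a subset trades accuracy for speed: a field that first appears after `limit` records would simply be missing from the inferred schema.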
On Tue, Aug 5, 2014 at 12
Assuming updating to master fixes the bug I was experiencing with jsonRDD
and jsonFile, then pushing "sample" to master will probably not be
necessary.
We believe that the link below was the bug I experienced, and I've been
told it is fixed in master.
https://issues.apache.org/jira/browse/SPARK-2
This "sample" argument of inferSchema is still not in master; I will try to
add it if it makes sense.
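
Since the "sample" argument did not exist at this point, here is a hypothetical sketch of what sampling before inference might look like. The function name, the `sample_ratio` parameter, and the flat `{field: type-name}` result are all invented for illustration:

```python
import random

def infer_schema_sampled(records, sample_ratio=1.0, seed=42):
    # Hypothetical sketch: keep each record with probability sample_ratio,
    # then infer a flat {field: type-name} schema from the sample only.
    rng = random.Random(seed)
    sample = [r for r in records if rng.random() < sample_ratio]
    schema = {}
    for record in sample:
        for key, value in record.items():
            schema.setdefault(key, type(value).__name__)
    return schema
```

With `sample_ratio=1.0` every record is scanned, so the result is the same as a full pass; smaller ratios speed up inference at the risk of missing rare fields.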
On Tue, Aug 5, 2014 at 12:14 PM, Brad Miller wrote:
Hi Davies,
Thanks for the response and tips. Is the "sample" argument to inferSchema
available in the 1.0.1 release of pyspark? I'm not sure (based on the
documentation linked below) that it is.
http://spark.apache.org/docs/latest/api/python/pyspark.sql.SQLContext-class.html#inferSchema
It soun
Got it. Thanks!
On Tue, Aug 5, 2014 at 11:53 AM, Nicholas Chammas <
nicholas.cham...@gmail.com> wrote:
> Notice the difference in the schema. Are you running the 1.0.1 release,
> or a more bleeding-edge version from the repository?
>
> Yep, my bad. I’m running off master at commit
> 184048f80
On Tue, Aug 5, 2014 at 11:01 AM, Nicholas Chammas wrote:
> I was just about to ask about this.
>
> Currently, there are two methods, sqlContext.jsonFile() and
> sqlContext.jsonRDD(), that work on JSON text and infer a schema that covers
> the whole data set.
>
> For example:
>
> from pyspark.sql i
> Notice the difference in the schema. Are you running the 1.0.1 release, or
> a more bleeding-edge version from the repository?
Yep, my bad. I’m running off master at commit
184048f80b6fa160c89d5bb47b937a0a89534a95.
Nick
Hi Nick,
Thanks for the great response.
I actually already investigated jsonRDD and jsonFile, although I did not
realize they provide more complete schema inference. I did, however, have
other problems with jsonRDD and jsonFile, which I will now describe in a
separate thread with an appropriate subj
I was just about to ask about this.
Currently, there are two methods, sqlContext.jsonFile() and
sqlContext.jsonRDD(), that work on JSON text and infer a schema that covers
the whole data set.
For example:
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
>>> a = sqlContext.jsonRDD(s
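
The point the truncated example was building toward can be shown without Spark at all. Below is a plain-Python illustration of why a schema must cover the whole data set: individual JSON records may carry different nested fields, so only the union over every record is complete. The record contents are invented for illustration:

```python
import json

# Two JSON records with different nested fields.
lines = [
    '{"user": {"id": 1}}',
    '{"user": {"name": "brad"}, "active": true}',
]

def field_paths(obj, prefix=""):
    # Flatten nested dicts into dotted field paths, e.g. "user.id".
    paths = set()
    for key, value in obj.items():
        if isinstance(value, dict):
            paths |= field_paths(value, prefix + key + ".")
        else:
            paths.add(prefix + key)
    return paths

# Union the fields contributed by every record.
schema = set()
for line in lines:
    schema |= field_paths(json.loads(line))

print(sorted(schema))  # ['active', 'user.id', 'user.name']
```

Inferring from only the first record would have missed `user.name` and `active`, which is exactly why jsonRDD and jsonFile look at the entire data set.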