You can force the data to be loaded as a sparse map, assuming the key/value types are consistent. Here is an example: <https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1023043053387187/1863598192220754/2840265927289860/latest.html>.
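A minimal sketch of the idea in Scala (the field names "id" and "features" and the path are hypothetical; adjust them to your JSON layout): pass an explicit schema whose sparse part is a MapType, so Spark skips the full-pass schema inference that would otherwise turn each of your ~3100 keys into its own column.

    import org.apache.spark.sql.types._

    // Hypothetical layout: each record carries an "id" plus a "features"
    // object holding the sparse key/value pairs.
    val schema = StructType(Seq(
      StructField("id", StringType),
      StructField("features", MapType(StringType, DoubleType))
    ))

    // Supplying the schema up front avoids the inference pass over all
    // rows and keeps the sparse keys inside a single map column.
    val df = sqlContext.read.schema(schema).json("/path/to/data.json")
    df.printSchema()

Whether this is fast enough will depend on how your records are shaped, but collapsing the sparse keys into one map column avoids building a 3100-column schema entirely.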
On Wed, Mar 30, 2016 at 8:17 AM, Yavuz Nuzumlalı <manuya...@gmail.com> wrote:
> Hi all,
>
> I'm trying to read data inside a JSON file using the
> `SQLContext.read.json()` method.
>
> However, the read operation does not finish. My data has dimensions
> 290000x3100, but it's actually really sparse, so if there is a way to
> read the JSON directly into a sparse DataFrame, that would work
> perfectly for me.
>
> What are the alternatives for reading such data into Spark?
>
> P.S.: When I try to load only the first 50000 rows, the read operation
> completes in ~2 minutes.