I think whether this is an issue depends a lot on how you use Zeppelin and
which tools you need to integrate with. Sadly, Excel is still around as a
data-processing tool, and many people I introduce to Zeppelin are quite
proficient with it, hence the desire to export to CSV trivially. The mere
presence of the "download CSV" button leads them to expect it to work for
reasonably sized data (i.e. up to around 10^6 rows).

I do prefer Ruslan's idea, but I think Zeppelin should include something
similar out of the box. The key requirement is that the data must not
travel through the notebook interface; instead it should be written to a
temporary folder and then served via a download link. The downside of this
approach is that you would ideally want the operation to be
interpreter-agnostic, in which case every interpreter would need to offer
an interface for collecting the data into a temporary folder local to
Zeppelin.
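
To make that concrete, here is a rough sketch of what the "collect into a
folder local to Zeppelin" step could look like from the Spark interpreter,
assuming the driver runs on the Zeppelin host (client mode); `df` and all
paths are made up for illustration:

    // Stream the result set to a temp file on the Zeppelin host instead
    // of pushing it through the notebook frontend. Naive CSV output:
    // quoting/escaping of field values is omitted for brevity.
    import java.io.PrintWriter
    import java.nio.file.Files

    val exportDir = Files.createTempDirectory("zeppelin-export-")
    val out = new PrintWriter(exportDir.resolve("result.csv").toFile)
    try {
      out.println(df.columns.mkString(","))           // header row
      df.toLocalIterator().forEachRemaining { row =>  // avoids collect()
        out.println(row.mkString(","))
      }
    } finally out.close()
    // Zeppelin could then expose exportDir/result.csv behind a
    // download link rather than inlining the rows into the page.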

Nonetheless, to turn Zeppelin into the serve-it-all solution it could be,
I do believe that "fixing" the CSV export is important. I'd definitely
vote for a Jira advancing this issue.
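
For context on what needs fixing: the mechanism Kevin describes below
boils down to something like the following (the real code lives in the
JavaScript front end; this sketch only illustrates the size blow-up):

    // The current export effectively serializes the whole table and
    // base64-encodes it into one giant in-memory data URI, inflating it
    // by ~4/3 and hitting browser URL-length limits long before 10^6 rows.
    import java.nio.charset.StandardCharsets
    import java.util.Base64

    val csv = "id,name\n1,alice\n2,bob\n"   // imagine 10^6 rows here
    val dataUri = "data:text/csv;base64," +
      Base64.getEncoder.encodeToString(csv.getBytes(StandardCharsets.UTF_8))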

On Tue, May 2, 2017 at 9:33 PM, Kevin Niemann <kevin.niem...@gmail.com>
wrote:

> We came across this issue as well. Zeppelin's CSV export uses the data
> URI scheme, which base64-encodes all the rows into a single string.
> Chrome seems to crash with more than a few thousand rows, but Firefox
> has handled over 100k for me. However, the Zeppelin notebook itself
> becomes slow at that point. I would also like better support for
> exporting a large set of rows; perhaps another tool is better suited?
>
> On Tue, May 2, 2017 at 10:00 AM, Ruslan Dautkhanov <dautkha...@gmail.com>
> wrote:
>
>> Good idea to introduce a way in Zeppelin to download full datasets
>> without actually visualizing them.
>>
>> Not sure if this helps, but we taught our users to run
>>
>>     %sh hadoop fs -getmerge /hadoop/path/dir/ /some/nfs/mount/
>>
>> for large files (they sometimes have to download datasets with millions
>> of records). They run Zeppelin on edge nodes that have NFS mounts to a
>> drop zone.
>>
>> P.S. Hue has a limit too, 100k rows by default:
>> https://github.com/cloudera/hue/blob/release-3.12.0/desktop/conf.dist/hue.ini#L905
>> Not sure how well it scales up.
>>
>>
>>
>> --
>> Ruslan Dautkhanov
>>
>> On Tue, May 2, 2017 at 10:41 AM, Paul Brenner <pbren...@placeiq.com>
>> wrote:
>>
>>> There are limits to how much data the download-to-CSV button will
>>> download (1.5 MB? 3,500 rows?), which limit Zeppelin's usefulness for
>>> our BI teams. This limit comes up far before we run into issues with
>>> showing too many rows of data in Zeppelin.
>>>
>>> Unfortunately (fortunately?) Hue is the other tool the BI team has been
>>> using, and there they have no problem downloading much larger datasets
>>> to CSV. This is definitely not a requirement I've ever run into in the
>>> way I use Zeppelin, since I would just use Spark to write the data out.
>>> However, the BI team is not allowed to run Spark jobs (they use Hive
>>> via JDBC), so that download-to-CSV button is pretty important to them.
>>>
>>> Would it be possible to significantly increase the limit? Even better,
>>> would it be possible to download more data than is shown? I assume this
>>> is the type of thing I would need to open a ticket for, but I wanted to
>>> ask here first.
>>>
>>> Paul Brenner
>>> DATA SCIENTIST
>>> (217) 390-3033
>>
>>
>
