GitHub user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/2287#issuecomment-54636076
  
    Hi @wardviaene,
    
    Do you have an example program that reproduces this bug?  We should 
probably add it as a regression test (see `python/pyspark/tests.py` for 
examples of how to do this).
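
    I don't have the failing program in front of me, but a regression test 
along these lines is roughly what I have in mind.  This is just a sketch: the 
test class name, the large-file trigger, and the 2 MB size are my assumptions, 
and the `self.sc` setup only mirrors the style of the existing tests:

    ```python
    # Hedged sketch of a possible regression test for python/pyspark/tests.py.
    # Assumption: the bug is triggered by pickling an open file object that is
    # captured in a closure and is larger than the old max_transmit_data cap.
    import os
    import tempfile
    import unittest

    from pyspark import SparkContext


    class LargeFilePicklingTests(unittest.TestCase):
        def setUp(self):
            self.sc = SparkContext("local", "large-file-pickling-test")

        def tearDown(self):
            self.sc.stop()

        def test_pickling_large_file_object(self):
            path = tempfile.mktemp()
            with open(path, "w") as f:
                f.write("x" * (2 * 1024 * 1024))
            try:
                handle = open(path)
                # The lambda captures the open handle, so cloudpickle has to
                # serialize the file object itself.
                result = self.sc.parallelize([0]).map(
                    lambda _: len(handle.read())).first()
                self.assertEqual(2 * 1024 * 1024, result)
            finally:
                os.remove(path)
    ```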
    
    (For other reviewers: you can browse SerializingAdapter's code at 
http://pydoc.net/Python/cloud/2.7.0/cloud.transport.adapter/.)  It looks like 
this code is designed to handle the pickling of file() objects.  The Dill 
developers have recently been discussing how to pickle file handles: 
https://github.com/uqfoundation/dill/issues/57
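
    The general strategy in that code is to slurp the file's contents at 
pickling time and rebuild an equivalent in-memory handle at unpickling time.  
A minimal sketch of that idea (the function names here are illustrative, not 
the actual cloud/cloudpickle internals):

    ```python
    # Read-and-rebuild approach to pickling a file handle.  rebuild_file and
    # reduce_file are illustrative names, not real cloudpickle functions.
    from io import StringIO


    def rebuild_file(name, contents, pos):
        # Unpickling side: stand in for the original handle with an
        # in-memory buffer holding the same contents and position.
        f = StringIO(contents)
        f.seek(pos)
        f.name = name  # StringIO instances accept a plain .name attribute
        return f


    def reduce_file(f):
        # Pickling side: remember the position, read everything, then
        # restore the position so pickling has no visible side effect.
        pos = f.tell()
        f.seek(0)
        contents = f.read()
        f.seek(pos)
        return rebuild_file, (f.name, contents, pos)
    ```

    A pickler would register `reduce_file` as the reducer for file objects; 
`max_transmit_data` is the cap that the real code applies to the contents read 
in that step.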
    
    It looks like `SerializingAdapter.max_transmit_data` acts as an upper limit 
on the size of the closures that PiCloud would send to their service.  Unlike 
PiCloud, we don't impose a hard limit on closure sizes (we warn about large 
closures, but that check happens inside the JVM).  Therefore, I wonder if we 
should just remove this limit and allow the whole file to be read, rather than 
adding an obscure configuration option.
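
    Concretely, the change I'm picturing in `cloudpickle.py`'s file-saving 
path would look something like this (the "before" is paraphrased from memory, 
not quoted from the actual source):

    ```python
    # Before (paraphrased): read in chunks and give up once the accumulated
    # size passes the PiCloud cap:
    #
    #   while True:
    #       block = f.read(65536)
    #       if not block:
    #           break
    #       if len(contents) + len(block) > SerializingAdapter.max_transmit_data:
    #           raise pickle.PicklingError("file too large to transmit")
    #       contents += block
    #
    # After: no cap; just read the whole file.
    contents = f.read()
    ```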

