pavel0fadeev commented on PR #48037:
URL: https://github.com/apache/spark/pull/48037#issuecomment-2362343583

   @juliuszsompolski Regarding the issue with your environment on the **master**
branch:
   I checked, and it looks like I hit the same problem when I run the
`"pyspark.ml.torch.tests.test_data_loader"` tests on my Mac with Python 3.11.
   The main error message appears to be `"AttributeError: Can't pickle local
object '_SparkPartitionTorchDataset._get_field_converter.<locals>.converter'"`,
and there is an article about this issue:
https://medium.com/devopss-hole/python-multiprocessing-pickle-issue-e2d35ccf96a9.
In short, the **multiprocessing** library runs into this in some circumstances
when the "spawn" multiprocessing_context is used, which is the default on Mac
and Windows: "spawn" has to pickle the objects it hands to worker processes,
and pickle cannot serialize locally defined functions such as the `converter`
closure named in the traceback. All Unix systems except macOS use "fork" as the
default multiprocessing context, which may explain why we see different
behaviour for this test on some local environments and on GitHub runners.
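   As a minimal, Spark-free illustration of what I think is going on (the names
below are made up for the example, not taken from the Spark code): with "spawn"
the process arguments are pickled before the child starts, and pickle cannot
serialize a function defined inside another function, which matches the error
above.
```python
# Made-up example, not Spark code: a nested ("local") function fails to pickle
# under the "spawn" start method, which is the default on macOS and Windows.
import multiprocessing as mp


def make_converter():
    def converter(x):  # local function, like the converter in the traceback
        return float(x)
    return converter


def worker(fn, value, queue):
    queue.put(fn(value))


if __name__ == "__main__":
    ctx = mp.get_context("spawn")
    queue = ctx.Queue()
    p = ctx.Process(target=worker, args=(make_converter(), "1", queue))
    # With "spawn" this raises in the parent process:
    #   AttributeError: Can't pickle local object 'make_converter.<locals>.converter'
    # With ctx = mp.get_context("fork") the same code runs fine.
    p.start()
    p.join()
```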
   The **torch** library uses **multiprocessing** in its DataLoader. I tried
changing the multiprocessing_context to "fork", which got me past the initial
exception, but I then hit a different one. Next I tried setting num_workers=0
for the DataLoader to avoid multiprocessing entirely in this test:
   
https://github.com/apache/spark/blob/04455797bfb3631b13b41cfa5d2604db3bf8acc2/python/pyspark/ml/torch/tests/test_data_loader.py#L70
   I changed it to
   `data_loader = _get_spark_partition_data_loader(num_samples, batch_size, num_workers=0)`
   and after that the test completed successfully.
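   For reference, here is a rough standalone sketch of what those two knobs
look like on a plain torch DataLoader (the toy dataset is made up; only the
num_workers and multiprocessing_context arguments are the ones relevant to the
test):
```python
# Stand-alone sketch with a toy dataset, not the Spark test's dataset.
import torch
from torch.utils.data import DataLoader, Dataset


class ToyDataset(Dataset):
    def __len__(self):
        return 8

    def __getitem__(self, idx):
        return torch.tensor([float(idx)])


dataset = ToyDataset()

# Workaround that made the test pass for me: no worker processes, so nothing
# has to be pickled for multiprocessing.
loader_no_workers = DataLoader(dataset, batch_size=4, num_workers=0)

# The other thing I tried: keep workers but force the "fork" start method.
# This got past the pickling error on my machine but then failed differently.
loader_fork = DataLoader(
    dataset, batch_size=4, num_workers=2, multiprocessing_context="fork"
)

for batch in loader_no_workers:
    print(batch)
```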
   But I don't know how to change the environment to get rid of this issue 
without changing the code.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

