Hi All,

When I run the program shown below, I receive the error shown below. I am running the current version of branch-0.9 from GitHub. Note that I do not receive the error when I replace "2 ** 29" with "2 ** X", where X < 29. More interestingly, I do not receive the error when X = 30, and when X > 30 the code either crashes with "Memory Error" or "Py4JNetworkError: An error occurred while trying to connect to the Java server".
I am aware that there are some bugs (https://spark-project.atlassian.net/browse/SPARK-1065) related to memory consumption with pyspark and broadcasting, but the behavior with X = 29 seemed different and I was wondering if anybody had any insight.

-Brad

*Program*

from pyspark import SparkContext
SparkContext.setSystemProperty('spark.executor.memory', '25g')
sc = SparkContext('spark://crosby.research.intel-research.net:7077', 'FeatureExtraction')
meg_512 = range((2 ** 29) / 8)
tmp_broad = sc.broadcast(meg_512)

*Error*

---------------------------------------------------------------------------
Py4JError                                 Traceback (most recent call last)
<ipython-input-1-db8033dee301> in <module>()
      3 sc = SparkContext('spark://crosby.research.intel-research.net:7077', 'FeatureExtraction')
      4 meg_1024 = range((2 ** 29) / 8)
----> 5 tmp_broad = sc.broadcast(meg_1024)

/home/spark/spark-branch-0.9/python/pyspark/context.py in broadcast(self, value)
    277         pickleSer = PickleSerializer()
    278         pickled = pickleSer.dumps(value)
--> 279         jbroadcast = self._jsc.broadcast(bytearray(pickled))
    280         return Broadcast(jbroadcast.id(), value, jbroadcast,
    281                          self._pickled_broadcast_vars)

/home/spark/spark-branch-0.9/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
    535         answer = self.gateway_client.send_command(command)
    536         return_value = get_return_value(answer, self.gateway_client,
--> 537                 self.target_id, self.name)
    538
    539         for temp_arg in temp_args:

/home/spark/spark-branch-0.9/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    302                 raise Py4JError(
    303                     'An error occurred while calling {0}{1}{2}. Trace:\n{3}\n'.
--> 304                     format(target_id, '.', name, value))
    305             else:
    306                 raise Py4JError(

Py4JError: An error occurred while calling o7.broadcast.
Trace:
java.lang.NegativeArraySizeException
    at py4j.Base64.decode(Base64.java:292)
    at py4j.Protocol.getBytes(Protocol.java:167)
    at py4j.Protocol.getObject(Protocol.java:276)
    at py4j.commands.AbstractCommand.getArguments(AbstractCommand.java:81)
    at py4j.commands.CallCommand.execute(CallCommand.java:77)
    at py4j.GatewayConnection.run(GatewayConnection.java:207)
    at java.lang.Thread.run(Thread.java:701)
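
Since the trace ends in py4j.Base64.decode and the broadcast call passes bytearray(pickled), one thing that may be worth checking is how large the pickled payload and its base64 form become compared with the signed 32-bit limit of a Java int. The following is only a rough sketch for measuring that on a small sample, without a cluster. The helper names (payload_sizes, fits_in_int32) are hypothetical, pickle protocol 2 is an assumption about what PickleSerializer uses, and I have not confirmed that size overflow is actually the cause of the error above.

```python
import base64
import pickle

INT32_MAX = 2 ** 31 - 1  # largest value a signed 32-bit int (Java's int) can hold


def payload_sizes(value, protocol=2):
    """Return (pickled_bytes, base64_chars) for value.

    protocol=2 is an assumption, not something confirmed from the Spark source.
    """
    pickled = pickle.dumps(value, protocol)
    # Padded base64 emits 4 characters for every 3 input bytes, rounded up.
    encoded_chars = 4 * ((len(pickled) + 2) // 3)
    return len(pickled), encoded_chars


def fits_in_int32(n):
    """True if n can be stored in a signed 32-bit int without overflow."""
    return 0 <= n <= INT32_MAX


# Measure a small sample, then scale linearly to the size used in the program.
# The scaling is only approximate because small ints pickle more compactly.
sample_count = 100000
full_count = (2 ** 29) // 8
pickled_bytes, encoded_chars = payload_sizes(list(range(sample_count)))
scale = full_count / float(sample_count)

print("sample pickled bytes:", pickled_bytes)
print("sample base64 chars :", encoded_chars)
print("projected base64 chars:", int(encoded_chars * scale))
print("projection fits in int32:", fits_in_int32(int(encoded_chars * scale)))
```

Changing full_count to (2 ** X) // 8 for other values of X gives a quick way to compare the projected sizes against the cases that did and did not fail.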