It seems the problem was that we have an actor that picks up the stream (as a receiver) and sends it off to another actor that does the actual streaming. If the message is a string it works OK; if it is an array (or list) it just dies.
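To make the two message shapes concrete, here is a minimal sketch in plain Python, with no Spark or actor API involved. The function names, the sample lines, and the use of a UUID as the unique id are assumptions for illustration only; the sketch shows the string-first ordering and does not reproduce or explain the failure.

```python
import uuid
from typing import Iterable, Iterator, List


def parse_line(line: str) -> List[str]:
    """Split one comma-separated line and append a unique id (hypothetical layout)."""
    fields = line.rstrip("\n").split(",")
    fields.append(uuid.uuid4().hex)  # assumed id scheme, not the one from our workflow
    return fields


def forward_raw(lines: Iterable[str]) -> Iterator[str]:
    """Receiver side: pass each line along untouched, as a single string."""
    for line in lines:
        yield line


def parse_downstream(raw: Iterable[str]) -> Iterator[List[str]]:
    """Consumer side: parse lazily, one record at a time."""
    for line in raw:
        yield parse_line(line)


sample = ["a,b,c\n", "d,e,f\n"]  # made-up data
rows = list(parse_downstream(forward_raw(sample)))
```

In this ordering the receiver side only ever handles one string per record, and the list of fields is built at the consumer.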
Not sure why, as I cannot see any difference in terms of overhead between a string and an array.

On Fri, Mar 27, 2015 at 3:20 PM, Tamas Jambor <jambo...@gmail.com> wrote:

> It is just a comma separated file, about 10 columns wide which we append
> with a unique id and a few additional values.
>
> On Fri, Mar 27, 2015 at 2:43 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> jamborta:
>> Please also describe the format of your csv files.
>>
>> Cheers
>>
>> On Fri, Mar 27, 2015 at 6:42 AM, DW @ Gmail <deanwamp...@gmail.com>
>> wrote:
>>
>>> Show us the code. This shouldn't happen for the simple process you
>>> described.
>>>
>>> Sent from my rotary phone.
>>>
>>>
>>> > On Mar 27, 2015, at 5:47 AM, jamborta <jambo...@gmail.com> wrote:
>>> >
>>> > Hi all,
>>> >
>>> > We have a workflow that pulls in data from csv files. The original setup
>>> > of the workflow was to parse the data as it comes in (turn it into an
>>> > array), then store it. This resulted in out of memory errors with larger
>>> > files (as a result of increased GC?).
>>> >
>>> > It turns out that if the data gets stored as a string first, then parsed,
>>> > the issue does not occur.
>>> >
>>> > Why is that?
>>> >
>>> > Thanks,
>>> >
>>> > --
>>> > View this message in context:
>>> > http://apache-spark-user-list.1001560.n3.nabble.com/Spark-streaming-tp22255.html
>>> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>> >
>>> > ---------------------------------------------------------------------
>>> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> > For additional commands, e-mail: user-h...@spark.apache.org