It seems the problem is in how our actors hand off the data: one actor picks
up the stream (as a receiver) and sends each message off to another actor
that does the actual streaming. If the message is a string it works fine; if
it is an array (or list), it just dies.

Not sure why, as I cannot see any difference in terms of overhead between a
string and an array.
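
A minimal sketch of the string-first variant, in plain Python rather than the
actual Spark/actor code (which is not shown in this thread). All names here
(receive, process, parse_line, extra_values) are hypothetical, the two
generator functions merely stand in for the two actors, and the naive
split(",") assumes no quoted commas in the file:

```python
import uuid


def parse_line(line, extra_values=()):
    """Split one raw CSV line and append a unique id plus extra values."""
    fields = line.rstrip("\n").split(",")  # naive: no quoted-comma handling
    return fields + [uuid.uuid4().hex] + list(extra_values)


def receive(lines):
    """Stand-in for the receiving actor: forwards raw strings untouched."""
    for line in lines:
        yield line


def process(messages, extra_values=()):
    """Stand-in for the downstream actor: parses only after the hand-off."""
    for msg in messages:
        yield parse_line(msg, extra_values)


rows = list(process(receive(["a,b,c\n", "d,e,f\n"]), extra_values=["batch-1"]))
```

The point of the shape is only that the message crossing the boundary between
the two stages is a single string, and the array is built on the far side.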

On Fri, Mar 27, 2015 at 3:20 PM, Tamas Jambor <jambo...@gmail.com> wrote:

> It is just a comma separated file, about 10 columns wide which we append
> with a unique id and a few additional values.
>
> On Fri, Mar 27, 2015 at 2:43 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> jamborta :
>> Please also describe the format of your csv files.
>>
>> Cheers
>>
>> On Fri, Mar 27, 2015 at 6:42 AM, DW @ Gmail <deanwamp...@gmail.com> wrote:
>>
>>> Show us the code. This shouldn't happen for the simple process you
>>> described.
>>>
>>> Sent from my rotary phone.
>>>
>>>
>>> > On Mar 27, 2015, at 5:47 AM, jamborta <jambo...@gmail.com> wrote:
>>> >
>>> > Hi all,
>>> >
>>> > We have a workflow that pulls in data from csv files. The original setup
>>> > of the workflow was to parse the data as it comes in (turn it into an
>>> > array), then store it. This resulted in out-of-memory errors with larger
>>> > files (as a result of increased GC?).
>>> >
>>> > It turns out that if the data gets stored as a string first, then parsed,
>>> > the issue does not occur.
>>> >
>>> > Why is that?
>>> >
>>> > Thanks,
>>> >
>>> >
>>> >
>>> > --
>>> > View this message in context:
>>> > http://apache-spark-user-list.1001560.n3.nabble.com/Spark-streaming-tp22255.html
>>> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>> >
>>> > ---------------------------------------------------------------------
>>> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> > For additional commands, e-mail: user-h...@spark.apache.org
>>> >
>>>
>>
>
