Hi Davies, I upgraded to 1.3.0 and I'm still getting Out of Memory. I ran the same code as before; do I need to make any changes?
On Wed, Mar 25, 2015 at 4:00 PM, Davies Liu <dav...@databricks.com> wrote:
> With batchSize = 1, I think it will become even worse.
>
> I'd suggest going with 1.3 to get a taste of the new DataFrame API.
>
> On Wed, Mar 25, 2015 at 11:49 AM, Eduardo Cusa
> <eduardo.c...@usmediaconsulting.com> wrote:
> > Hi Davies, I'm running 1.1.0.
> >
> > Now I'm following this thread, which recommends using the batchSize parameter = 1:
> >
> > http://apache-spark-user-list.1001560.n3.nabble.com/pySpark-memory-usage-td3022.html
> >
> > If this does not work I will install 1.2.1 or 1.3.
> >
> > Regards
> >
> > On Wed, Mar 25, 2015 at 3:39 PM, Davies Liu <dav...@databricks.com> wrote:
> >>
> >> What's the version of Spark you are running?
> >>
> >> There is a bug in the SQL Python API [1]; it's fixed in 1.2.1 and 1.3.
> >>
> >> [1] https://issues.apache.org/jira/browse/SPARK-6055
> >>
> >> On Wed, Mar 25, 2015 at 10:33 AM, Eduardo Cusa
> >> <eduardo.c...@usmediaconsulting.com> wrote:
> >> > Hi guys, I'm running the following function with spark-submit and the OS
> >> > is killing my process:
> >> >
> >> > def getRdd(self, date, provider):
> >> >     path = 's3n://' + AWS_BUCKET + '/' + date + '/*.log.gz'
> >> >     log2 = self.sqlContext.jsonFile(path)
> >> >     log2.registerTempTable('log_test')
> >> >     log2.cache()
> >>
> >> You only visit the table once, so cache does not help here.
> >>
> >> >     out = self.sqlContext.sql("SELECT user, tax FROM log_test WHERE provider = '"
> >> >                               + provider + "' AND country <> ''") \
> >> >         .map(lambda row: (row.user, row.tax))
> >> >     print "out1"
> >> >     return map(lambda (x, y): (x, list(y)),
> >> >                sorted(out.groupByKey(2000).collect()))
> >>
> >> 100 partitions (or fewer) will be enough for a 2 GB dataset.
> >>
> >> > The input dataset has 57 zip files (2 GB).
> >> >
> >> > The same process with a smaller dataset completed successfully.
> >> >
> >> > Any ideas to debug this are welcome.
> >> >
> >> > Regards,
> >> > Eduardo
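
For reference, a minimal sketch of what getRdd could look like on the Spark 1.3 DataFrame API that Davies suggests above. The column names (user, tax, provider, country) and AWS_BUCKET are taken from the original code; the bucket value, app name, and function name get_user_taxes are placeholders, not anything from the thread:

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    AWS_BUCKET = 'my-log-bucket'   # placeholder; use the real bucket name

    sc = SparkContext(appName="log-aggregation-sketch")
    sqlContext = SQLContext(sc)

    def get_user_taxes(date, provider):
        # Read one day of gzipped JSON logs straight into a DataFrame.
        path = 's3n://' + AWS_BUCKET + '/' + date + '/*.log.gz'
        logs = sqlContext.jsonFile(path)

        # Filter and project with DataFrame operations instead of building a
        # SQL string; no temp table or cache() is needed for a single pass.
        pairs = (logs
                 .filter(logs.provider == provider)
                 .filter(logs.country != '')
                 .select(logs.user, logs.tax)
                 .rdd
                 .map(lambda row: (row.user, row.tax)))

        # Per the advice above, ~100 partitions is plenty for a 2 GB input.
        return sorted((user, list(taxes))
                      for user, taxes in pairs.groupByKey(100).collect())

Note that collect() still pulls every group back to the driver, so if the grouped result itself is large, the driver can run out of memory regardless of which API is used.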