That's a max of 3.3K single-character strings. Even with the Java overhead,
that shouldn't be more than a meg, right?
None of these should make it out of young gen, assuming the list "cats"
doesn't stick around outside the UDF.
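
One thing worth checking before blaming allocation: the loop quoted below may
never terminate. If a '[' has no ']' after it, endInd is -1, startInd becomes
-1, and String.indexOf treats a negative fromIndex as 0, so the scan restarts
at the first '[' and keeps adding to cats until the heap is gone. A minimal
sketch of that, assuming startInd starts at 0 and cats is a List<String>
(neither is shown in the snippet); the class and method names are made up for
illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not code from the UDF in this thread.
final class BracketCategories {

    // Same control flow as the quoted loop, but it counts the adds instead of
    // storing them and gives up after `cap` adds so a runaway is observable.
    // Assumption: startInd is initialised to 0 before the loop.
    static int addsBeforeCap(String someLog, int cap) {
        int startInd = 0;
        int added = 0;
        while (added < cap && (startInd = someLog.indexOf('[', startInd)) > 0) {
            int endInd = someLog.indexOf(']', startInd);
            if (endInd > 0) {
                added++; // stands in for cats.add(category)
            }
            // endInd is -1 when no ']' follows; indexOf then searches from 0 again.
            startInd = endInd;
        }
        return added;
    }

    // One possible terminating variant: stop at an unmatched '[' and always
    // move past the ']' that was just consumed.
    static List<String> extract(String someLog) {
        List<String> cats = new ArrayList<>();
        int startInd = 0;
        while ((startInd = someLog.indexOf('[', startInd)) >= 0) {
            int endInd = someLog.indexOf(']', startInd);
            if (endInd < 0) {
                break; // unmatched '[': nothing more to capture
            }
            cats.add(someLog.substring(startInd, endInd + 1));
            startInd = endInd + 1;
        }
        return cats;
    }
}
```

With a cap of 1000, addsBeforeCap("x[a]y[b", 1000) runs all the way to the cap,
while the well-formed "x[a]y[b]z" stops after 2 adds. Note that the original
condition "> 0" also makes the loop exit immediately when the log starts with
'['; the sketch uses ">= 0" instead.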

On Thu, Feb 24, 2011 at 3:49 PM, Aniket Mokashi <[email protected]> wrote:

> Hi Jai,
>
> Thanks for your email. I suspect that it's the Strings-in-a-tight-loop
> reason, as you suggested. I have a loop in my UDF that does the following.
>
> while((startInd = someLog.indexOf('[',startInd)) > 0) {
>     endInd = someLog.indexOf(']', startInd);
>     if(endInd > 0) {
>         category = someLog.substring(startInd, endInd+1);
>         cats.add(category);
>     }
>     startInd = endInd;
> }
>
> My jobs are failing in both local and MR mode. The UDF works fine for a
> smaller input (a few lines). Also, I checked that the length of someLog
> doesn't exceed 10000.
>
> Thanks,
> Aniket
>
>
> On Thu, February 24, 2011 3:58 am, Jai Krishna wrote:
> > Sharing the code would be useful, as mentioned. Also of help would be the
> > heap settings that the JVM had.
> >
> > However, off the top of my head, one common situation (esp. in text
> > processing/tokenizing) is instantiating Strings in a tight loop.
> >
> > Besides, you could also exercise your UDF in a local JVM and take a heap
> > dump / profile it. If your heap is less than 512M, you could use basic
> > profiling via hprof/hat (see
> > http://java.sun.com/developer/technicalArticles/Programming/HPROF.html).
> >
> >
> > Thanks,
> > Jai
> >
> >
> >
> > On 2/24/11 9:26 AM, "Dmitriy Ryaboy" <[email protected]> wrote:
> >
> >
> > Aniket, share the code?
> > It really depends on how you create them.
> >
> >
> > -D
> >
> >
> > On Wed, Feb 23, 2011 at 7:49 PM, Aniket Mokashi
> > <[email protected]> wrote:
> >
> >
> >> I've written a simple UDF that parses a chararray (which looks like
> >> ...[a].....[b]...[a]...) to capture the stuff inside brackets and return
> >> it as a String such as a=2;b=1; and so on. The input chararrays are
> >> rarely more than 1000 characters and never more than 100000 (I've added
> >> log.warn in my UDF to ensure this). But I still see a Java heap error
> >> while running this UDF (even in local mode, the job simply fails). My
> >> assumption is that the maps and lists I use locally will be collected
> >> by the GC. Am I missing something?
> >>
> >> Thanks,
> >> Aniket
> >>
> >>
> >>
> >
> >
>
>
>
