Re: ORC NPE while writing stats

2015-09-03 Thread Prasanth Jayachandran
> On Sep 2, 2015, at 10:57 PM, David Capwell wrote:
>
> So, very quickly looked at the JIRA and I had the following question;
> if you have a pool per thread rather than global, then assuming 50%
> heap will cause writer to OOM with multiple threads, which is
> different

Re: ORC NPE while writing stats

2015-09-03 Thread David Capwell
Thanks, that should help moving forward.

On Sep 3, 2015 10:38 AM, "Prasanth Jayachandran" <pjayachand...@hortonworks.com> wrote:
>
> > On Sep 2, 2015, at 10:57 PM, David Capwell wrote:
> >
> > So, very quickly looked at the JIRA and I had the following question;
> > if you

Re: ORC NPE while writing stats

2015-09-02 Thread David Capwell
Also, the data put in are primitives, structs (list), and arrays (list); we don't use any of the boxed writables (like text).

On Sep 2, 2015 12:57 PM, "David Capwell" wrote:
> We have multiple threads writing, but each thread works on one file, so
> orc writer is only touched

Re: ORC NPE while writing stats

2015-09-02 Thread Prasanth Jayachandran
Memory manager is made thread local
https://issues.apache.org/jira/browse/HIVE-10191

Can you try the patch from HIVE-10191 and see if that helps?

On Sep 2, 2015, at 8:58 PM, David Capwell wrote:
I'll try that out and see if it goes away (not

Re: ORC NPE while writing stats

2015-09-02 Thread David Capwell
So, very quickly looked at the JIRA and I had the following question; if you have a pool per thread rather than global, then assuming 50% heap will cause writer to OOM with multiple threads, which is different than older (0.14) ORC, correct?

Re: ORC NPE while writing stats

2015-09-02 Thread David Capwell
Walking the MemoryManager, and I have a few questions:

# statements

Every time you create a writer for a given thread (assuming the thread local version), you just update MemoryManager with the stripe size. The scale is just %heap / (#writer * stripe (assuming equal stripe size)). Periodically
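The scale formula quoted above can be written out as a small calculation. This is only a sketch of the arithmetic as described in the message, not Hive's MemoryManager code: the class and method names are hypothetical, all writers are assumed to use the same stripe size, and capping the result at 1.0 is an assumption.

```java
/** Hypothetical helper illustrating the scale arithmetic; not part of Hive. */
final class MemoryScaleSketch {

    /**
     * @param heapFraction share of the heap reserved for all writers (e.g. 0.5)
     * @param maxHeapBytes total heap size in bytes
     * @param writers      number of writers registered with the manager
     * @param stripeBytes  stripe size requested by each writer (assumed equal)
     * @return factor by which each writer's stripe budget is scaled
     */
    static double scale(double heapFraction, long maxHeapBytes,
                        int writers, long stripeBytes) {
        if (writers <= 0) {
            return 1.0; // nothing registered, nothing to scale down
        }
        double pool = heapFraction * maxHeapBytes;
        double requested = (double) writers * stripeBytes;
        // Assumption: writers are only ever scaled down, never up.
        return Math.min(1.0, pool / requested);
    }

    public static void main(String[] args) {
        long heap = 4L << 30;    // 4 GiB heap
        long stripe = 64L << 20; // 64 MiB stripes
        System.out.println(scale(0.5, heap, 10, stripe)); // prints 1.0
        System.out.println(scale(0.5, heap, 64, stripe)); // prints 0.5
    }
}
```

With 10 writers the requested 640 MiB fits in the 2 GiB pool, so nothing is scaled; with 64 writers the request is twice the pool, so each writer gets half of its stripe size.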

Re: ORC NPE while writing stats

2015-09-02 Thread David Capwell
I'll try that out and see if it goes away (not seen this in the past 24 hours, no code change). Doing this now means that I can't share the memory, so will probably go with a thread local and allocate fixed sizes to the pool per thread (50% heap / 50 threads). Will most likely be a while before I can
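The fixed per-thread budget mentioned above (50% heap / 50 threads) works out as follows. This is a sketch of the arithmetic only; the class and method names are hypothetical, and how the resulting size would be handed to a per-thread memory manager is not shown because the thread does not spell it out.

```java
/** Hypothetical helper for sizing a fixed per-thread pool; not part of Hive. */
final class PerThreadPoolSketch {

    /**
     * Splits a fraction of the heap evenly across writer threads, so the sum
     * of all per-thread pools never exceeds that fraction of the heap.
     */
    static long perThreadPoolBytes(long maxHeapBytes, double heapFraction, int threads) {
        if (threads <= 0) {
            throw new IllegalArgumentException("threads must be positive");
        }
        long sharedBudget = (long) (maxHeapBytes * heapFraction);
        return sharedBudget / threads; // integer division rounds down
    }

    public static void main(String[] args) {
        long heap = Runtime.getRuntime().maxMemory();
        long perThread = perThreadPoolBytes(heap, 0.5, 50);
        System.out.println("heap bytes: " + heap);
        System.out.println("pool per thread (bytes): " + perThread);
    }
}
```

Dividing one shared budget this way avoids the concern raised elsewhere in the thread, where each thread-local pool independently assuming 50% of the heap would overcommit memory once several threads are writing.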

Re: ORC NPE while writing stats

2015-09-02 Thread David Capwell
Thanks for the jira, will see if that works for us.

On Sep 2, 2015 7:11 PM, "Prasanth Jayachandran" <pjayachand...@hortonworks.com> wrote:
> Memory manager is made thread local
> https://issues.apache.org/jira/browse/HIVE-10191
>
> Can you try the patch from HIVE-10191 and see if that helps?
>
>

Re: ORC NPE while writing stats

2015-09-02 Thread David Capwell
Also, if I am walking this correctly, writer.addRow(struct) may trigger my current thread to flush all the state for other writers running in different threads. This state isn't updated under the same lock, so my thread won't see the same state, which would explain the NPE. Another issue is that
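The cross-thread flush described above can be reproduced in miniature. The sketch below is not Hive's code: SharedManagerSketch, WriterSketch and CrossThreadFlushDemo are hypothetical stand-ins that only record which thread ends up running another writer's callback when a manager is shared.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a memory manager shared by all writers. */
final class SharedManagerSketch {
    interface Callback { void checkMemory(); }

    private final List<Callback> callbacks = new ArrayList<>();

    synchronized void register(Callback c) { callbacks.add(c); }

    /** Invoked by whichever writer just added a row; notifies every writer. */
    synchronized void rowAdded() {
        for (Callback c : callbacks) {
            c.checkMemory();
        }
    }
}

/** Hypothetical writer that records the thread that ran its callback. */
final class WriterSketch implements SharedManagerSketch.Callback {
    final List<String> flushedBy = new ArrayList<>();
    private final SharedManagerSketch manager;

    WriterSketch(SharedManagerSketch manager) {
        this.manager = manager;
        manager.register(this);
    }

    void addRow() { manager.rowAdded(); }

    @Override
    public void checkMemory() {
        // In a real writer this is where a stripe flush could touch internal state.
        flushedBy.add(Thread.currentThread().getName());
    }
}

final class CrossThreadFlushDemo {
    /** Returns the names of the threads that ran writer B's callback. */
    static List<String> run() {
        SharedManagerSketch manager = new SharedManagerSketch();
        WriterSketch[] writerB = new WriterSketch[1];
        Thread b = new Thread(() -> writerB[0] = new WriterSketch(manager), "writer-B");
        Thread a = new Thread(() -> new WriterSketch(manager).addRow(), "writer-A");
        try {
            b.start();
            b.join(); // writer B is created and registered on thread writer-B
            a.start();
            a.join(); // writer A adds a row on thread writer-A
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return writerB[0].flushedBy;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [writer-A]
    }
}
```

Writer B's callback runs on writer-A's thread. In this sketch that is harmless because the only state touched is guarded by the manager's monitor and read after join(); a writer whose own fields are modified without a lock shared with its owning thread has no such guarantee.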

Re: ORC NPE while writing stats

2015-09-02 Thread Owen O'Malley
I don't see how it would get there. That implies that minimum was null, but the count was non-zero.

The ColumnStatisticsImpl$StringStatisticsImpl.serialize looks like:

    @Override
    OrcProto.ColumnStatistics.Builder serialize() {
      OrcProto.ColumnStatistics.Builder result = super.serialize();

Re: ORC NPE while writing stats

2015-09-02 Thread Owen O'Malley
(Dropping dev) Well, that explains the non-determinism, because the MemoryManager will be shared across threads and thus the stripes will get flushed at effectively random times. Can you try giving each writer a unique MemoryManager? You'll need to put a class into the

ORC NPE while writing stats

2015-09-01 Thread David Capwell
We are writing ORC files in our application for Hive to consume. Given enough time, we have noticed that writing causes an NPE when working with a string column's stats. Not sure what's causing it on our side yet, since replaying the same data is just fine; it seems more like this just happens over