> On Sep 2, 2015, at 10:57 PM, David Capwell wrote:
>
> So, very quickly looked at the JIRA and I had the following question;
> if you have a pool per thread rather than global, then assuming 50%
> heap will cause the writer to OOM with multiple threads, which is
> different than older (0.14) ORC, correct?
Thanks, that should help moving forward
On Sep 3, 2015 10:38 AM, "Prasanth Jayachandran" <
pjayachand...@hortonworks.com> wrote:
Also, the data put in are primitives, structs (lists), and arrays (lists);
we don't use any of the boxed writables (like Text).
On Sep 2, 2015 12:57 PM, "David Capwell" wrote:
> We have multiple threads writing, but each thread works on one file, so
> the ORC writer is only touched by one thread
Memory manager is made thread local
https://issues.apache.org/jira/browse/HIVE-10191
Can you try the patch from HIVE-10191 and see if that helps?
On Sep 2, 2015, at 8:58 PM, David Capwell wrote:
So, very quickly looked at the JIRA and I had the following question;
if you have a pool per thread rather than global, then assuming 50%
heap will cause writer to OOM with multiple threads, which is
different than older (0.14) ORC, correct?
Walking through the MemoryManager, I have a few questions:

Every time you create a writer for a given thread (assuming the
thread-local version), you just update the MemoryManager with the stripe
size. The scale is just %heap / (#writers * stripe size) (assuming equal
stripe sizes). Periodically
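The scale computation described above can be sketched in a few lines. This is a hedged, single-threaded illustration; `MemoryManagerSketch`, `addWriter`, and `getScale` are illustrative names, not Hive's actual MemoryManager API:

```java
// Minimal sketch of the scale computation described above:
// each new writer registers its stripe size, and the scale is the
// memory pool divided by the total allocation once oversubscribed.
public class MemoryManagerSketch {
    private final long totalMemoryPool;   // e.g. 50% of max heap
    private long totalAllocation = 0;     // sum of registered stripe sizes

    public MemoryManagerSketch(long poolBytes) {
        this.totalMemoryPool = poolBytes;
    }

    // Each new writer registers its configured stripe size.
    public void addWriter(long stripeSize) {
        totalAllocation += stripeSize;
    }

    // scale ~= %heap / (#writers * stripeSize) when all stripes are
    // equal; writers multiply their memory budget by this factor.
    public double getScale() {
        return totalAllocation <= totalMemoryPool
                ? 1.0
                : (double) totalMemoryPool / totalAllocation;
    }
}
```

With a 1000-byte pool, a second equally sized writer halves every writer's budget, which is the behavior the question is probing.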
I'll try that out and see if it goes away (not seen this in the past 24
hours, no code change).
Doing this now means that I can't share the memory, so I will probably go
with a thread-local and allocate a fixed size to the pool per thread (50%
heap / 50 threads). Will most likely be a while before I can
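The fixed split described above (50% of heap across 50 writer threads) works out to a one-line calculation. The thread count and heap fraction are the numbers from the message; the class and method names are illustrative:

```java
// Illustrative sizing for a fixed per-thread pool: 50% of the heap
// divided evenly across 50 writer threads (numbers from the message).
public final class PerThreadPoolSizing {
    static final int WRITER_THREADS = 50;
    static final double HEAP_FRACTION = 0.5;

    static long poolPerThread(long maxHeapBytes) {
        return (long) (maxHeapBytes * HEAP_FRACTION) / WRITER_THREADS;
    }
}
```

For a 1 GB heap this gives each thread a fixed 10 MB budget, independent of what the other threads are doing.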
Thanks for the jira, will see if that works for us.
On Sep 2, 2015 7:11 PM, "Prasanth Jayachandran" <
pjayachand...@hortonworks.com> wrote:
> Memory manager is made thread local
> https://issues.apache.org/jira/browse/HIVE-10191
>
> Can you try the patch from HIVE-10191 and see if that helps?
>
>
Also, if I am walking this correctly, writer.addRow(struct) may trigger my
current thread to flush all the state for other writers running in
different threads. That state isn't updated under the same lock, so my
thread won't see a consistent view, which would explain the NPE. Another
issue is that
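The hazard described above can be sketched without any actual concurrency: a shared manager calls back into every registered writer from whichever thread crosses the row threshold, so one thread ends up flushing state owned by another, with no lock protecting that writer's internals. All names here are illustrative, not Hive's API:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the cross-writer flush path: the shared manager invokes
// every registered writer's flush check on the *caller's* thread,
// with nothing coordinating access to each writer's internal state.
class SharedManagerSketch {
    interface Flushable { void checkAndFlush(); }

    private final List<Flushable> writers = new ArrayList<>();

    void register(Flushable w) { writers.add(w); }

    // Invoked from whichever writer thread happens to hit the threshold.
    void notifyWriters() {
        for (Flushable w : writers) {
            w.checkAndFlush();  // may flush a writer owned by another thread
        }
    }
}
```

A single notifyWriters() call reaches every writer, which is why one thread's addRow can flush a writer it does not own.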
I don't see how it would get there. That implies that minimum was null, but
the count was non-zero.

ColumnStatisticsImpl$StringStatisticsImpl.serialize looks like:

  @Override
  OrcProto.ColumnStatistics.Builder serialize() {
    OrcProto.ColumnStatistics.Builder result = super.serialize();
(Dropping dev)
Well, that explains the non-determinism, because the MemoryManager will be
shared across threads and thus the stripes will get flushed at effectively
random times.
Can you try giving each writer a unique MemoryManager? You'll need to put a
class into the
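A generic thread-local pattern along those lines might look like the following sketch. `Manager` here is a placeholder for whatever per-writer bookkeeping class ends up in that package; it is not Hive's MemoryManager:

```java
// Thread-local sketch: each writer thread lazily gets its own manager,
// so a flush triggered on one thread can never touch another thread's
// state. "Manager" is a placeholder, not Hive's class.
public final class PerThreadManager {
    static class Manager {
        long allocated;   // per-thread bookkeeping only
    }

    static final ThreadLocal<Manager> MANAGER =
            ThreadLocal.withInitial(Manager::new);
}
```

Each thread that calls MANAGER.get() sees its own stable instance, which removes the cross-thread flush path entirely.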
We are writing ORC files in our application for Hive to consume. Given
enough time, we have noticed that writing causes an NPE when working with a
string column's stats. Not sure what's causing it on our side yet, since
replaying the same data is just fine; it seems more like this just happens
over time.