To answer the other part of your question: yes. By default, each
fragment writes into its own set of files, so you could be looking at up
to (# unique values) * (number of fragments) files being created. There
is an option to shuffle the data before writing, so that each partition
value is written by only one writer:

alter session set `store.partition.hash_distribute` = true

That should reduce the number of files.
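
To put rough numbers on it: with your 1216 unique dates, and assuming
around 30 writer fragments (a guess based on the 30 files you got without
PARTITION BY), the worst case is 1216 * 30 = 36,480 files, which is in
line with the 20K+ you saw. With hash distribution on, the count should
land much closer to one set of files per date. A minimal sketch of the
whole sequence is below; the table, view and column names are
placeholders I made up, so substitute your own:

```sql
-- Placeholder names (not from your setup): dfs.tmp.`events_by_day`,
-- dfs.tmp.`events_view`, and the partition column event_date.

-- Write Parquet (this is the default output format, set here for clarity).
ALTER SESSION SET `store.format` = 'parquet';

-- Shuffle rows by partition value before writing, so each value
-- is handled by a single writer.
ALTER SESSION SET `store.partition.hash_distribute` = true;

-- Partitioned CTAS; the partition column must appear in the SELECT list.
CREATE TABLE dfs.tmp.`events_by_day`
PARTITION BY (event_date)
AS SELECT * FROM dfs.tmp.`events_view`;
```

The shuffle adds an exchange to the plan, so the write may take somewhat
longer in return for far fewer files.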

On Thu, Oct 8, 2015 at 11:12 AM, John Omernik <[email protected]> wrote:

> That helped my memory leak! Thanks all!
>
> On Thu, Oct 8, 2015 at 10:59 AM, John Omernik <[email protected]> wrote:
>
> > Sweet! I'll go check it out. Thanks!
> >
> > On Thu, Oct 8, 2015 at 10:53 AM, Paul Ilechko <[email protected]>
> > wrote:
> >
> >> Yes, Drill 1.2 is available at package.mapr.com as of yesterday
> >>
> >> On Thu, Oct 8, 2015 at 11:48 AM, John Omernik <[email protected]> wrote:
> >>
> >> > MapR have a package yet? :) When I compiled Drill with the MapR Profile
> >> > myself, I couldn't get MapR Tables working, so I reverted back to Drill 1.1
> >> > as packaged by MapR.
> >> >
> >> >
> >> >
> >> > On Thu, Oct 8, 2015 at 10:42 AM, Abdel Hakim Deneche <
> >> > [email protected]>
> >> > wrote:
> >> >
> >> > > We fixed a similar issue as part of Drill 1.2. Can you give it a try to see
> >> > > if your problem is effectively resolved?
> >> > >
> >> > > Thanks
> >> > >
> >> > > On Thu, Oct 8, 2015 at 8:33 AM, John Omernik <[email protected]>
> >> wrote:
> >> > >
> >> > > > I am on the MapR Packaged version of 1.1.  Do you still need the
> >> > > > sys.version?
> >> > > >
> >> > > > On Thu, Oct 8, 2015 at 10:13 AM, Abdel Hakim Deneche <
> >> > > > [email protected]>
> >> > > > wrote:
> >> > > >
> >> > > > > Hey John,
> >> > > > >
> >> > > > > The error you are seeing is a memory leak. Drill's allocator found that
> >> > > > > about 1MB of allocated memory wasn't released at the end of the fragment's
> >> > > > > execution.
> >> > > > >
> >> > > > > What version of Drill are you using? Can you share the result of:
> >> > > > >
> >> > > > > select * from sys.version;
> >> > > > >
> >> > > > > Thanks
> >> > > > >
> >> > > > > On Thu, Oct 8, 2015 at 7:35 AM, John Omernik <[email protected]>
> >> > wrote:
> >> > > > >
> >> > > > > > I am trying to complete a test case on some data. I took a schema and used
> >> > > > > > log-synth (thanks Ted) to create a fairly wide table (89 columns). I then
> >> > > > > > output my data as CSV files and created a Drill view; so far so good.
> >> > > > > >
> >> > > > > > One of the columns is a "date" column (YYYY-MM-DD format) with 1216
> >> > > > > > unique values. To me this would be like 4-ish years of daily partitioned
> >> > > > > > data in Hive, so I tried to create my data partitioned on that field.
> >> > > > > >
> >> > > > > > If I create a Parquet table based on that, eventually things hork on me and
> >> > > > > > I get the error below. If I don't use the PARTITION BY clause, it creates
> >> > > > > > the table just fine with 30 files.
> >> > > > > >
> >> > > > > > Looking in the folder where it was supposed to create the PARTITIONED table,
> >> > > > > > it has over 20K files in there. Is this expected? Would we expect #Partitions
> >> > > > > > * #Fragment files? Could this be what the error is trying to tell me? I
> >> > > > > > guess I am just lost on what the error means, and what I should/could
> >> > > > > > expect on something like this. Is this a bug or expected?
> >> > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > > Error:
> >> > > > > >
> >> > > > > > java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR:
> >> > > > > > IllegalStateException: Failure while closing accountor. Expected private
> >> > > > > > and shared pools to be set to initial values. However, one or more were
> >> > > > > > not. Stats are
> >> > > > > >
> >> > > > > > zone init allocated delta
> >> > > > > >
> >> > > > > > private 1000000 1000000 0
> >> > > > > >
> >> > > > > > shared 9999000000 9997806954 1193046.
> >> > > > > >
> >> > > > > >
> >> > > > > > Fragment 1:25
> >> > > > > >
> >> > > > > >
> >> > > > > > [Error Id: cad06490-f93e-4744-a9ec-d27cd06bc0a1 on
> >> > > > > > hadoopmapr1.mydata.com:31010]
> >> > > > > >
> >> > > > > > at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
> >> > > > > >
> >> > > > > > at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
> >> > > > > >
> >> > > > > > at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
> >> > > > > >
> >> > > > > > at sqlline.SqlLine.print(SqlLine.java:1583)
> >> > > > > >
> >> > > > > > at sqlline.Commands.execute(Commands.java:852)
> >> > > > > >
> >> > > > > > at sqlline.Commands.sql(Commands.java:751)
> >> > > > > >
> >> > > > > > at sqlline.SqlLine.dispatch(SqlLine.java:738)
> >> > > > > >
> >> > > > > > at sqlline.SqlLine.begin(SqlLine.java:612)
> >> > > > > >
> >> > > > > > at sqlline.SqlLine.start(SqlLine.java:366)
> >> > > > > >
> >> > > > > > at sqlline.SqlLine.main(SqlLine.java:259)
> >> > > > > >
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > --
> >> > > > >
> >> > > > > Abdelhakim Deneche
> >> > > > >
> >> > > > > Software Engineer
> >> > > > >
> >> > > > >   <http://www.mapr.com/>
> >> > > > >
> >> > > > >
> >> > > > > Now Available - Free Hadoop On-Demand Training
> >> > > > > <
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> > >
> >> > >
> >>
> >>
> >> --
> >> ----------------------------------
> >> Paul Ilechko
> >> Senior Systems Engineer
> >> MapR Technologies
> >> 908 331 2207
> >>
> >
> >
>
