In answer to the other part of your question: yes. By default, each fragment writes into its own set of files, so you could be looking at (# unique values) * (number of fragments) files being created. There is an option to shuffle the data before writing, so that each partition value is written by only one writer:
alter session set `store.partition.hash_distribute` = true

That should reduce the number of files.

On Thu, Oct 8, 2015 at 11:12 AM, John Omernik <[email protected]> wrote:

> That helped my memory leak! Thanks all!
>
> On Thu, Oct 8, 2015 at 10:59 AM, John Omernik <[email protected]> wrote:
>
> > Sweet! I'll go check it out. Thanks!
> >
> > On Thu, Oct 8, 2015 at 10:53 AM, Paul Ilechko <[email protected]> wrote:
> >
> >> Yes, Drill 1.2 is available at package.mapr.com as of yesterday
> >>
> >> On Thu, Oct 8, 2015 at 11:48 AM, John Omernik <[email protected]> wrote:
> >>
> >> > MapR have a package yet? :) When I compiled Drill with the MapR Profile
> >> > myself, I couldn't get MapR Tables working, so I reverted back to Drill 1.1
> >> > as packaged by MapR.
> >> >
> >> > On Thu, Oct 8, 2015 at 10:42 AM, Abdel Hakim Deneche <[email protected]> wrote:
> >> >
> >> > > We fixed a similar issue as part of Drill 1.2. Can you give it a try to
> >> > > see if your problem is effectively resolved ?
> >> > >
> >> > > Thanks
> >> > >
> >> > > On Thu, Oct 8, 2015 at 8:33 AM, John Omernik <[email protected]> wrote:
> >> > >
> >> > > > I am on the MapR Packaged version of 1.1. Do you still need the
> >> > > > sys.version?
> >> > > >
> >> > > > On Thu, Oct 8, 2015 at 10:13 AM, Abdel Hakim Deneche <[email protected]> wrote:
> >> > > >
> >> > > > > Hey John,
> >> > > > >
> >> > > > > The error you are seeing is a memory leak. Drill's allocator found
> >> > > > > that about 1MB of allocated memory wasn't released at the end of the
> >> > > > > fragment's execution.
> >> > > > >
> >> > > > > What version of Drill are you using ? can you share the result of:
> >> > > > >
> >> > > > > select * from sys.version;
> >> > > > >
> >> > > > > Thanks
> >> > > > >
> >> > > > > On Thu, Oct 8, 2015 at 7:35 AM, John Omernik <[email protected]> wrote:
> >> > > > >
> >> > > > > > I am trying to complete a test case on some data. I took a schema
> >> > > > > > and used log-synth (thanks Ted) to create fairly wide table. (89
> >> > > > > > columns). I then outputted my data as csv files, and created a
> >> > > > > > drill view, so far so good.
> >> > > > > >
> >> > > > > > One of the columns is a "date" column, (YYYY-MM-DD) format and has
> >> > > > > > 1216 unique values. To me this would be like a 4 ish years of daily
> >> > > > > > partitioned data in hive, so tried to create my data partitioning
> >> > > > > > on that field.
> >> > > > > >
> >> > > > > > If I create a Parquet table based on that, eventually things hork
> >> > > > > > on me and I get the error below. If I don't use the PARTITION BY
> >> > > > > > clause, it creates the table just fine with 30 files.
> >> > > > > >
> >> > > > > > Looking in the folder it was supposed to create the PARTITIONED
> >> > > > > > table, it has over 20K files in there. Is this expected? Would we
> >> > > > > > expect #Partitions * #Fragment files? Could this be what the error
> >> > > > > > is trying to tell me? I guess I am just lost on what the error
> >> > > > > > means, and what I should/could expect on something like this. Is
> >> > > > > > this a bug or expected?
> >> > > > > >
> >> > > > > > Error:
> >> > > > > >
> >> > > > > > java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR:
> >> > > > > > IllegalStateException: Failure while closing accountor. Expected
> >> > > > > > private and shared pools to be set to initial values. However, one
> >> > > > > > or more were not. Stats are
> >> > > > > >
> >> > > > > > zone     init        allocated   delta
> >> > > > > > private  1000000     1000000     0
> >> > > > > > shared   9999000000  9997806954  1193046.
> >> > > > > >
> >> > > > > > Fragment 1:25
> >> > > > > >
> >> > > > > > [Error Id: cad06490-f93e-4744-a9ec-d27cd06bc0a1 on
> >> > > > > > hadoopmapr1.mydata.com:31010]
> >> > > > > >
> >> > > > > > at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
> >> > > > > > at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
> >> > > > > > at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
> >> > > > > > at sqlline.SqlLine.print(SqlLine.java:1583)
> >> > > > > > at sqlline.Commands.execute(Commands.java:852)
> >> > > > > > at sqlline.Commands.sql(Commands.java:751)
> >> > > > > > at sqlline.SqlLine.dispatch(SqlLine.java:738)
> >> > > > > > at sqlline.SqlLine.begin(SqlLine.java:612)
> >> > > > > > at sqlline.SqlLine.start(SqlLine.java:366)
> >> > > > > > at sqlline.SqlLine.main(SqlLine.java:259)
> >> > > > >
> >> > > > > --
> >> > > > > Abdelhakim Deneche
> >> > > > > Software Engineer
> >> > > > > <http://www.mapr.com/>
> >> > > > >
> >> > > > > Now Available - Free Hadoop On-Demand Training
> >> > > > > <http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>
> >>
> >> --
> >> ----------------------------------
> >> Paul Ilechko
> >> Senior Systems Engineer
> >> MapR Technologies
> >> 908 331 2207
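
For reference, here is roughly how the `store.partition.hash_distribute` option from the top of this message could be combined with a partitioned CTAS. This is only a sketch under assumptions: the `dfs.tmp` workspace, the view name `events_view`, the table name `events_by_date`, and the column name `event_date` are hypothetical placeholders and do not come from the thread; substitute your own view and date column.

```sql
-- Write the new table as Parquet (session-level setting).
ALTER SESSION SET `store.format` = 'parquet';

-- Shuffle rows by partition value before writing, so each value
-- is handled by a single writer instead of by every fragment.
ALTER SESSION SET `store.partition.hash_distribute` = true;

-- Hypothetical names: dfs.tmp workspace, events_view, event_date.
-- The partition column must be part of the selected columns.
CREATE TABLE dfs.tmp.`events_by_date`
PARTITION BY (event_date)
AS SELECT * FROM dfs.tmp.`events_view`;
```

With the 1216 unique date values mentioned in the thread, the file count should then land much closer to one file per value than to (# unique values) * (number of fragments), although the exact number may still vary with the data and file sizes.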
