Sorry, I reported it badly. It's 8192M. Thanks,
David.

On Feb 18, 2014, at 18:37, "Stephen Sprague" <sprag...@gmail.com> wrote:

> oh. I just noticed the -Xmx value you reported.
>
> there's no M or G after that number?? I'd like to see -Xmx8192M or
> -Xmx8G. That *is* very important.
>
> thanks,
> Stephen.
>
> On Tue, Feb 18, 2014 at 9:22 AM, Stephen Sprague <sprag...@gmail.com> wrote:
>
>> thanks.
>>
>> re #1. we need to find that HiveServer2 process. For all I know, the one
>> you reported is HiveServer1 (which works). Chances are they use the same
>> -Xmx value, but we really shouldn't make any assumptions.
>>
>> try wide format on the ps command (e.g. ps -efw | grep -i hiveserver2)
>>
>> re #2. okay. So that tells us it's not the number of columns blowing the
>> heap but rather the combination of rows + columns. There's no way it
>> stores the full result set on the heap even under normal circumstances,
>> so my guess is there's an internal number of rows it buffers, sort of
>> like how unix buffers stdout. How and where that's set is out of my
>> league. However, maybe you can get around it by upping your heap size
>> again, if you have the available memory of course.
>>
>> On Tue, Feb 18, 2014 at 8:39 AM, David Gayou <david.ga...@kxen.com> wrote:
>>
>>> 1. I have no process with hiveserver2 ...
>>>
>>> "ps -ef | grep -i hive" returns a pretty long command with a -Xmx8192,
>>> and that's the value set in hive-env.sh
>>>
>>> 2. The "select * from table limit 1", or even limit 100, works correctly.
>>>
>>> David.
>>>
>>> On Tue, Feb 18, 2014 at 4:16 PM, Stephen Sprague <sprag...@gmail.com> wrote:
>>>
>>>> He lives on after all! and thanks for the continued feedback.
>>>>
>>>> We need the answers to these questions using HS2:
>>>>
>>>> 1. what is the output of "ps -ef | grep -i hiveserver2" on your
>>>> system? in particular, what is the value of -Xmx?
>>>>
>>>> 2. does "select * from table limit 1" work?
>>>>
>>>> Thanks,
>>>> Stephen.
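Since the heap value lives in hive-env.sh, one way to pin it down is a fragment along these lines. This is a sketch, not a confirmed recipe from this thread: HADOOP_HEAPSIZE (in MB) is what the stock hive launcher passes to the JVM, and the $SERVICE test mirrors the pattern in hive-env.sh.template so the CLI keeps a smaller heap — check your own distribution's template before copying it.

```shell
# hive-env.sh fragment (sketch). Assumes the launcher honors HADOOP_HEAPSIZE
# and sets $SERVICE per service, as in hive-env.sh.template -- verify locally.
if [ "$SERVICE" = "hiveserver2" ]; then
  export HADOOP_HEAPSIZE=8192   # 8 GB; restart hiveserver2 after changing this
fi
```

After a restart, `ps -efw | grep -i hiveserver2` should show the matching `-Xmx8192m` on the command line.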
>>>>
>>>> On Tue, Feb 18, 2014 at 6:32 AM, David Gayou <david.ga...@kxen.com> wrote:
>>>>
>>>>> I'm so sorry, I wrote an answer and forgot to send it....
>>>>> And I haven't been able to work on this for a few days.
>>>>>
>>>>> So far:
>>>>>
>>>>> I have a table with 15k columns and 50k rows.
>>>>>
>>>>> I do not see any changes if I change the storage.
>>>>>
>>>>> *Hive 0.12.0*
>>>>>
>>>>> My test query is "select * from bigtable"
>>>>>
>>>>> If I use the hive CLI, it works fine.
>>>>>
>>>>> If I use hiveserver1 + ODBC: it works fine.
>>>>>
>>>>> If I use hiveserver2 + ODBC or hiveserver2 + beeline, I get this
>>>>> java exception:
>>>>>
>>>>> 2014-02-18 13:22:22,571 ERROR thrift.ProcessFunction
>>>>> (ProcessFunction.java:process(41)) - Internal error processing FetchResults
>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>>         at java.util.Arrays.copyOf(Arrays.java:2734)
>>>>>         at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
>>>>>         at java.util.ArrayList.add(ArrayList.java:351)
>>>>>         at org.apache.hive.service.cli.thrift.TRow.addToColVals(TRow.java:160)
>>>>>         at org.apache.hive.service.cli.RowBasedSet.addRow(RowBasedSet.java:60)
>>>>>         at org.apache.hive.service.cli.RowBasedSet.addRow(RowBasedSet.java:32)
>>>>>         at org.apache.hive.service.cli.operation.SQLOperation.prepareFromRow(SQLOperation.java:270)
>>>>>         at org.apache.hive.service.cli.operation.SQLOperation.decode(SQLOperation.java:262)
>>>>>         at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:246)
>>>>>         at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:171)
>>>>>         at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:438)
>>>>>         at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:346)
>>>>>         at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:407)
>>>>>         at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1373)
>>>>>
>>>>> *From the SVN trunk*: (for HIVE-3746)
>>>>>
>>>>> With the maven change, most of the documentation and wiki are out of date.
>>>>> Compiling from trunk was not that easy, and I may have missed some
>>>>> steps, but:
>>>>>
>>>>> It has the same behavior. It works in the CLI and hiveserver1.
>>>>> It fails with hiveserver2.
>>>>>
>>>>> Regards
>>>>>
>>>>> David Gayou
>>>>>
>>>>> On Thu, Feb 13, 2014 at 3:11 AM, Navis류승우 <navis....@nexr.com> wrote:
>>>>>
>>>>>> With HIVE-3746, which will be included in hive-0.13, HiveServer2
>>>>>> takes less memory than before.
>>>>>>
>>>>>> Could you try it with the version in trunk?
>>>>>>
>>>>>> 2014-02-13 10:49 GMT+09:00 Stephen Sprague <sprag...@gmail.com>:
>>>>>>
>>>>>>> question to the original poster. closure appreciated!
>>>>>>>
>>>>>>> On Fri, Jan 31, 2014 at 12:22 PM, Stephen Sprague <sprag...@gmail.com> wrote:
>>>>>>>
>>>>>>>> thanks Ed. And on a separate tack, let's look at HiveServer2.
>>>>>>>>
>>>>>>>> @OP>
>>>>>>>>
>>>>>>>> *I've tried to look around on how I can change the thrift heap size
>>>>>>>> but haven't found anything.*
>>>>>>>>
>>>>>>>> looking at my hiveserver2 I find this:
>>>>>>>>
>>>>>>>> $ ps -ef | grep -i hiveserver2
>>>>>>>> dwr   9824 20479  0 12:11 pts/1   00:00:00 grep -i hiveserver2
>>>>>>>> dwr  28410     1  0 00:05 ?       00:01:04 /usr/lib/jvm/java-6-sun/jre/bin/java
>>>>>>>> *-Xmx256m* -Dhadoop.log.dir=/usr/lib/hadoop/logs -Dhadoop.log.file=hadoop.log
>>>>>>>> -Dhadoop.home.dir=/usr/lib/hadoop -Dhadoop.id.str=
>>>>>>>> -Dhadoop.root.logger=INFO,console -Djava.library.path=/usr/lib/hadoop/lib/native
>>>>>>>> -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true
>>>>>>>> -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar
>>>>>>>> /usr/lib/hive/lib/hive-service-0.12.0.jar
>>>>>>>> org.apache.hive.service.server.HiveServer2
>>>>>>>>
>>>>>>>> questions:
>>>>>>>>
>>>>>>>> 1. what is the output of "ps -ef | grep -i hiveserver2" on your
>>>>>>>> system? in particular, what is the value of -Xmx?
>>>>>>>>
>>>>>>>> 2. can you restart your hiveserver with -Xmx1g? or some value
>>>>>>>> that makes sense for your system?
>>>>>>>>
>>>>>>>> Lots of questions now. we await your answers! :)
>>>>>>>>
>>>>>>>> On Fri, Jan 31, 2014 at 11:51 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Final table compression should not affect the deserialized size
>>>>>>>>> of the data over the wire.
>>>>>>>>>
>>>>>>>>> On Fri, Jan 31, 2014 at 2:49 PM, Stephen Sprague <sprag...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Excellent progress David. So. The most important thing we learned
>>>>>>>>>> here is that it works (!) when running hive in local mode, and
>>>>>>>>>> that this error is a limitation in HiveServer2. That's important.
>>>>>>>>>>
>>>>>>>>>> so: textfile storage handler, and having issues converting it to
>>>>>>>>>> ORC. hmmm.
>>>>>>>>>>
>>>>>>>>>> follow-ups.
>>>>>>>>>>
>>>>>>>>>> 1. what is your query that fails?
>>>>>>>>>>
>>>>>>>>>> 2. can you add a "limit 1" to the end of your query and tell us
>>>>>>>>>> if that works? this'll tell us if it's column- or row-bound.
>>>>>>>>>>
>>>>>>>>>> 3.
>>>>>>>>>> bonus points. run these in local mode:
>>>>>>>>>> > set hive.exec.compress.output=true;
>>>>>>>>>> > set mapred.output.compression.type=BLOCK;
>>>>>>>>>> > set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
>>>>>>>>>> > create table blah stored as ORC as select * from <your table>;   # I'm curious if this'll work.
>>>>>>>>>> > show create table blah;   # send output back if the previous step worked.
>>>>>>>>>>
>>>>>>>>>> 4. extra bonus. change ORC to SEQUENCEFILE in #3 and see if that
>>>>>>>>>> works any differently.
>>>>>>>>>>
>>>>>>>>>> I'm wondering if compression would have any effect on the size of
>>>>>>>>>> the internal ArrayList the thrift server uses.
>>>>>>>>>>
>>>>>>>>>> On Fri, Jan 31, 2014 at 9:21 AM, David Gayou <david.ga...@kxen.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Ok, so here is some news:
>>>>>>>>>>>
>>>>>>>>>>> I tried to boost HADOOP_HEAPSIZE to 8192,
>>>>>>>>>>> and I also set mapred.child.java.opts to 512M.
>>>>>>>>>>>
>>>>>>>>>>> It doesn't seem to have any effect.
>>>>>>>>>>> ------
>>>>>>>>>>>
>>>>>>>>>>> I tried it using an ODBC driver => fails after a few minutes.
>>>>>>>>>>> Using a local JDBC client (beeline) => runs forever without any error.
>>>>>>>>>>>
>>>>>>>>>>> Both through hiveserver2.
>>>>>>>>>>>
>>>>>>>>>>> If I use local mode: it works! (but that's not really what I
>>>>>>>>>>> need, as I don't really know how to access it with my software)
>>>>>>>>>>>
>>>>>>>>>>> ------
>>>>>>>>>>> I use a text file as storage.
>>>>>>>>>>> I tried to use ORC, but I can't populate it with a LOAD DATA
>>>>>>>>>>> (it returns a file format error).
>>>>>>>>>>>
>>>>>>>>>>> Using an "ALTER TABLE orange_large_train_3 SET FILEFORMAT ORC"
>>>>>>>>>>> after populating the table, I get a file format error on select.
>>>>>>>>>>>
>>>>>>>>>>> ------
>>>>>>>>>>>
>>>>>>>>>>> @Edward:
>>>>>>>>>>>
>>>>>>>>>>> I've tried to look around on how I can change the thrift heap
>>>>>>>>>>> size but haven't found anything.
>>>>>>>>>>> Same thing for my client (haven't found how to change the heap size).
>>>>>>>>>>>
>>>>>>>>>>> My use case really requires as many columns as possible.
>>>>>>>>>>>
>>>>>>>>>>> Thanks a lot for your help
>>>>>>>>>>>
>>>>>>>>>>> Regards
>>>>>>>>>>>
>>>>>>>>>>> David
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jan 31, 2014 at 1:12 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Ok, here are the problem(s). Thrift has frame size limits;
>>>>>>>>>>>> thrift has to buffer rows into memory.
>>>>>>>>>>>>
>>>>>>>>>>>> Hive's thrift server has a heap size; it needs to be big in this case.
>>>>>>>>>>>>
>>>>>>>>>>>> Your client needs a big heap size as well.
>>>>>>>>>>>>
>>>>>>>>>>>> The way to do this query, if it is possible at all, may be turning
>>>>>>>>>>>> the row lateral, potentially by treating it as a list; that will
>>>>>>>>>>>> make queries on it awkward.
>>>>>>>>>>>>
>>>>>>>>>>>> Good luck
>>>>>>>>>>>>
>>>>>>>>>>>> On Thursday, January 30, 2014, Stephen Sprague <sprag...@gmail.com> wrote:
>>>>>>>>>>>> > oh. thinking some more about this, I forgot to ask some other
>>>>>>>>>>>> basic questions.
>>>>>>>>>>>> >
>>>>>>>>>>>> > a) what storage format are you using for the table (text,
>>>>>>>>>>>> sequence, rcfile, orc or custom)? "show create table <table>"
>>>>>>>>>>>> would yield that.
>>>>>>>>>>>> >
>>>>>>>>>>>> > b) what command is causing the stack trace?
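The LOAD DATA and ALTER TABLE ... SET FILEFORMAT failures above fit together: neither rewrites the underlying files, so the text data stays text while the metadata says ORC. A CREATE TABLE ... AS SELECT does rewrite the data. A conversion script along these lines may work better (the table name is taken from the thread; the `_orc` suffix is made up for illustration):

```shell
# Sketch: write a small HiveQL script that converts the text-backed table
# to ORC by rewriting the data with CTAS, rather than flipping metadata.
cat > convert_to_orc.hql <<'EOF'
CREATE TABLE orange_large_train_3_orc
  STORED AS ORC
  AS SELECT * FROM orange_large_train_3;
EOF
# then run it through the CLI (local mode), e.g.:  hive -f convert_to_orc.hql
```

Running it in local mode sidesteps the HiveServer2 fetch path that is blowing the heap.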
>>>>>>>>>>>> >
>>>>>>>>>>>> > my thinking here is rcfile and orc are column based (I think),
>>>>>>>>>>>> and if you don't select all the columns, that could very well limit
>>>>>>>>>>>> the size of the "row" being returned and hence the size of the
>>>>>>>>>>>> internal ArrayList. OTOH, if you're using "select *", um, you have
>>>>>>>>>>>> my sympathies. :)
>>>>>>>>>>>> >
>>>>>>>>>>>> > On Thu, Jan 30, 2014 at 11:33 AM, Stephen Sprague <sprag...@gmail.com> wrote:
>>>>>>>>>>>> >
>>>>>>>>>>>> > thanks for the information. Up-to-date hive. Cluster on the
>>>>>>>>>>>> smallish side. And, well, it sure looks like a memory issue :)
>>>>>>>>>>>> rather than an inherent hive limitation, that is.
>>>>>>>>>>>> >
>>>>>>>>>>>> > So. I can only speak as a user (i.e. not a hive developer), but
>>>>>>>>>>>> what I'd be interested in knowing next is: this is via running hive
>>>>>>>>>>>> in local mode, correct? (i.e. not through hiveserver1/2). And it
>>>>>>>>>>>> looks like it boinks on array processing, which I assume to be
>>>>>>>>>>>> internal code arrays and not hive data arrays - your 15K columns
>>>>>>>>>>>> are all scalar/simple types, correct? It's clearly fetching results
>>>>>>>>>>>> and looks to be trying to store them in a java array - and not just
>>>>>>>>>>>> one row but a *set* of rows (ArrayList).
>>>>>>>>>>>> >
>>>>>>>>>>>> > two things to try.
>>>>>>>>>>>> >
>>>>>>>>>>>> > 1. boost the heap-size. try 8192. And I don't know if
>>>>>>>>>>>> HADOOP_HEAPSIZE is the controller of that. I would have hoped it
>>>>>>>>>>>> was called something like "HIVE_HEAPSIZE". :) Anyway, it can't hurt
>>>>>>>>>>>> to try.
>>>>>>>>>>>> >
>>>>>>>>>>>> > 2. trim down the number of columns and see where the breaking
>>>>>>>>>>>> point is. is it 10K? is it 5K? The idea is to confirm it's _the
>>>>>>>>>>>> number of columns_ that is causing the memory to blow and not some
>>>>>>>>>>>> other artifact unbeknownst to us.
>>>>>>>>>>>> >
>>>>>>>>>>>> > 3. Google around the Hive namespace for something that might
>>>>>>>>>>>> limit or otherwise control the number of rows stored at once in
>>>>>>>>>>>> Hive's internal buffer. I'll snoop around too.
>>>>>>>>>>>> >
>>>>>>>>>>>> > That's all I've got for now, and maybe we'll get lucky and
>>>>>>>>>>>> someone on this list will know something or other about this. :)
>>>>>>>>>>>> >
>>>>>>>>>>>> > cheers,
>>>>>>>>>>>> > Stephen.
>>>>>>>>>>>> >
>>>>>>>>>>>> > On Thu, Jan 30, 2014 at 2:32 AM, David Gayou <david.ga...@kxen.com> wrote:
>>>>>>>>>>>> >
>>>>>>>>>>>> > We are using Hive 0.12.0, but it doesn't work better on
>>>>>>>>>>>> hive 0.11.0 or hive 0.10.0.
>>>>>>>>>>>> > Our hadoop version is 1.1.2.
>>>>>>>>>>>> > Our cluster is 1 master + 4 slaves, each with 1 dual-core Xeon CPU
>>>>>>>>>>>> (with hyperthreading, so 4 cores per machine) + 16GB RAM.
>>>>>>>>>>>> >
>>>>>>>>>>>> > The error message I get is:
>>>>>>>>>>>> >
>>>>>>>>>>>> > 2014-01-29 12:41:09,086 ERROR thrift.ProcessFunction
>>>>>>>>>>>> (ProcessFunction.java:process(41)) - Internal error processing FetchResults
>>>>>>>>>>>> > java.lang.OutOfMemoryError: Java heap space
>>>>>>>>>>>> >         at java.util.Arrays.copyOf(Arrays.java:2734)
>>>>>>>>>>>> >         at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
>>>>>>>>>>>> >         at java.util.ArrayList.add(ArrayList.java:351)
>>>>>>>>>>>> >         at org.apache.hive.service.cli.Row.<init>(Row.java:47)
>>>>>>>>>>>> >         at org.apache.hive.service.cli.RowSet.addRow(RowSet.java:61)
>>>>>>>>>>>> >         at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:235)
>>>>>>>>>>>> >         at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:170)
>>>>>>>>>>>> >         at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:417)
>>>>>>>>>>>> >         at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:306)
>>>>>>>>>>>> >         at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:386)
>>>>>>>>>>>> >         at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1373)
>>>>>>>>>>>> >         at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1358)
>>>>>>>>>>>> >         at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>>>>>>>>>>>> >         at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>>>>>>>>>>>> >         at org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:58)
>>>>>>>>>>>> >         at org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:55)
>>>>>>>>>>>> >         at java.security.AccessCont
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Sorry, this was sent from mobile. Will do less grammar and spell
>>>>>>>>>>>> check than usual.
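Edward's point about thrift buffering rows in memory is easy to sanity-check with back-of-envelope arithmetic. Both figures below are assumptions for illustration only: the per-cell overhead is a guessed JVM object cost, and the rows-per-fetch count is hypothetical (the thread never establishes the real buffer size).

```shell
# Rough size of one buffered fetch batch in HiveServer2.
cols=15000            # column count from the thread
rows_buffered=1000    # hypothetical rows buffered per thrift fetch
bytes_per_cell=16     # assumed JVM overhead per column value object
echo "$(( cols * rows_buffered * bytes_per_cell / 1024 / 1024 )) MB"   # → 228 MB
```

Even with these conservative guesses, a single batch approaches the `-Xmx256m` default seen in Stephen's ps output, which is consistent with the OOM appearing in `RowSet.addRow` during FetchResults.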