Re: Zeppelin out of memory issue - (GC overhead limit exceeded)
Thanks Jianfeng, but I am still not able to solve the issue. I have set it to 4g, but still no luck. Can you please explain how I can set the SPARK_DRIVER_MEMORY property? Also, I have read that the "GC overhead limit exceeded" error occurs when the heap memory is insufficient, so how can I increase the heap memory? Please correct me if I am wrong, as I am still trying to learn these things.

Regards,
Rushikesh Raut

On Sun, Mar 26, 2017 at 4:25 PM, Jianfeng (Jeff) Zhang <jzh...@hortonworks.com> wrote:

> This is a bug in Zeppelin: spark.driver.memory won't take effect, because as of
> now it isn't passed to Spark through the --conf parameter. See
> https://issues.apache.org/jira/browse/ZEPPELIN-1263
> The workaround is to specify SPARK_DRIVER_MEMORY on the interpreter setting
> page.
>
> Best Regards,
> Jeff Zhang
>
> From: RUSHIKESH RAUT <rushikeshraut...@gmail.com>
> Reply-To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
> Date: Sunday, March 26, 2017 at 5:03 PM
> To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
> Subject: Re: Zeppelin out of memory issue - (GC overhead limit exceeded)
>
> ZEPPELIN_INTP_JAVA_OPTS
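For anyone finding this thread later: the workaround Jeff mentions can be applied either on the interpreter setting page (add a property named SPARK_DRIVER_MEMORY) or via the Zeppelin environment file. A minimal sketch, assuming a Linux/macOS install and a 4 GB driver (the size is an example, not a recommendation):

```shell
# In conf/zeppelin-env.sh -- read when the Spark interpreter process
# is launched. (On Windows, the equivalent in zeppelin-env.cmd would
# be:  set SPARK_DRIVER_MEMORY=4g)
export SPARK_DRIVER_MEMORY=4g
```

After changing this, restart the Spark interpreter (or Zeppelin itself) so the driver JVM is relaunched with the larger heap.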
Re: Zeppelin out of memory issue - (GC overhead limit exceeded)
Yes, I know it is inevitable if the data is large. I want to know how I can increase the interpreter memory to handle large data.

Thanks,
Rushikesh Raut

On Mar 26, 2017 8:56 AM, "Jianfeng (Jeff) Zhang" <jzh...@hortonworks.com> wrote:

> How large is your data? This problem is inevitable if your data is too
> large. You can try to use a Spark DataFrame if that works for you.
>
> Best Regards,
> Jeff Zhang
>
> From: RUSHIKESH RAUT <rushikeshraut...@gmail.com>
> Reply-To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
> Date: Saturday, March 25, 2017 at 5:06 PM
> To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
> Subject: Zeppelin out of memory issue - (GC overhead limit exceeded)
>
> Hi everyone,
>
> I am trying to load some data from a Hive table into my notebook and then
> convert this DataFrame into an R data frame using the spark.r interpreter. This
> works perfectly for a small amount of data, but if the data size is increased
> it gives me the error:
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> I have tried increasing ZEPPELIN_MEM and ZEPPELIN_INTP_MEM in
> the zeppelin-env.cmd file, but I am still facing this issue. I have used the
> following configuration:
>
> set ZEPPELIN_MEM="-Xms4096m -Xmx4096m -XX:MaxPermSize=2048m"
> set ZEPPELIN_INTP_MEM="-Xmx4096m -Xms4096m -XX:MaxPermSize=2048m"
>
> I am sure this much memory should be sufficient for my data, but I am still
> getting the same error. Any guidance will be much appreciated.
>
> Thanks,
> Rushikesh Raut
Zeppelin out of memory issue - (GC overhead limit exceeded)
Hi everyone,

I am trying to load some data from a Hive table into my notebook and then convert this DataFrame into an R data frame using the spark.r interpreter. This works perfectly for a small amount of data, but if the data size is increased it gives me the error:

java.lang.OutOfMemoryError: GC overhead limit exceeded

I have tried increasing ZEPPELIN_MEM and ZEPPELIN_INTP_MEM in the zeppelin-env.cmd file, but I am still facing this issue. I have used the following configuration:

set ZEPPELIN_MEM="-Xms4096m -Xmx4096m -XX:MaxPermSize=2048m"
set ZEPPELIN_INTP_MEM="-Xmx4096m -Xms4096m -XX:MaxPermSize=2048m"

I am sure this much memory should be sufficient for my data, but I am still getting the same error. Any guidance will be much appreciated.

Thanks,
Rushikesh Raut
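A side note on the flags above, in case it helps: on Java 8 the permanent generation was removed from the JVM, so -XX:MaxPermSize is ignored (the JVM prints a warning) and -XX:MaxMetaspaceSize is the closest equivalent. A sketch of the same zeppelin-env.cmd settings for a Java 8 JVM (values are illustrative only, and note that cmd's `set` includes literal quotes in the value, which is a common gotcha):

```shell
:: zeppelin-env.cmd sketch for a Java 8 JVM (illustrative sizes).
:: -Xms/-Xmx size the heap; on Java 8, MaxPermSize has no effect,
:: so MaxMetaspaceSize caps class-metadata space instead.
set ZEPPELIN_MEM=-Xms4096m -Xmx4096m -XX:MaxMetaspaceSize=512m
set ZEPPELIN_INTP_MEM=-Xms4096m -Xmx4096m -XX:MaxMetaspaceSize=512m
```
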
Unable to import data from multiple hive tables in single notebook
Hi All,

I am facing the following issue while running SparkR code in Zeppelin. I am trying to load data from Hive tables and then perform some operations. The following is the code of the two paragraphs present in my notebook:

1)
input_file <- sql(sqlContext, "select * from table1")
data(input_file)
head(input_file)

Output --> I am getting the proper data of table1.

2)
input_file1 <- sql(sqlContext, "select * from table2")
data(input_file1)
head(input_file1)

Output --> I am getting the following error:

Error: is.character(x) is not TRUE
Error in head(input_file1): object 'input_file1' not found

Both tables are present in Hive. The only difference I am observing is the variable name. So is importing data from multiple tables not allowed in Zeppelin? There is also one other issue: if I use the same variable name, the code does not give an error, but it also doesn't change the data; it still shows the table1 data. Can anyone explain what I might be doing wrong, or what the error exactly means? I have also checked the logs but couldn't find anything.

Thanks,
Rushikesh
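One possible explanation, offered as a guess rather than a confirmed diagnosis: base R's `data()` loads datasets bundled with packages by *name*, so calling it on a SparkDataFrame is not needed and may be where the `is.character(x) is not TRUE` error comes from. A minimal SparkR sketch, assuming a Zeppelin %spark.r paragraph where `sqlContext` is predefined and `table1`/`table2` are the Hive tables from the post:

```r
# sql() already returns a SparkDataFrame -- no data() call is needed.
input_file  <- sql(sqlContext, "SELECT * FROM table1")
input_file1 <- sql(sqlContext, "SELECT * FROM table2")

# head() works directly on SparkDataFrames.
head(input_file)
head(input_file1)

# To use ordinary R functions on the rows, pull them to the driver
# (beware of memory for large tables):
local_df <- collect(input_file1)
```
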
Re: Re: Zeppelin unable to respond after some time
On Fri, Feb 17, 2017 at 3:23 PM "xyun...@simuwell.com" <xyun...@simuwell.com> wrote:

> The problem could be not only the resources, but the session. If you run a
> chunk of Spark code, you should see a running application in the
> Spark UI; but if your code shuts the session down after the job is finished,
> then on the Spark UI you will see the job is finished. Within Zeppelin,
> each job starts the Spark session only once (different interpreter modes
> can be set if you want notebooks to share the session or not); if you
> closed it, it will never be restarted. The only way to get the same
> code to work again is to restart the interpreter or restart Zeppelin. I'm not
> sure if I explained it clearly, but I hope it helps.
>
> From: Paul Brenner <pbren...@placeiq.com>
> Date: 2017-02-17 12:14
> To: users <users@zeppelin.apache.org>
> Subject: Re: Re: Zeppelin unable to respond after some time
>
> I've definitely had this problem with jobs that don't take all the
> resources on the cluster. Also, my experience matches what others have
> reported: just restarting Zeppelin and re-running the stuck paragraph solves
> the issue.
>
> I've also experienced this problem with for loops. Some for loops that
> write to disk but absolutely don't have any variables increasing
> in size will hang in Zeppelin. If I run the exact same code in the Scala
> REPL it goes through without a problem.
>
> Paul Brenner
> DATA SCIENTIST
> (217) 390-3033
>
> On Fri, Feb 17, 2017 at 2:12 PM "xyun...@simuwell.com" <xyun...@simuwell.com> wrote:
>
> I have solved a similar issue before. You should check the Spark UI, and
> probably you will see your single job is taking all the resources.
> Therefore further jobs submitted to the same cluster will just hang
> there. When you restart Zeppelin, the old job is killed and all the
> resources it took will be released.
Re: Zeppelin unable to respond after some time
Yes, it happens with R and Spark code frequently.

On Feb 17, 2017 3:25 PM, "小野圭二" <onoke...@gmail.com> wrote:

Yes, almost every time. There are not any special operations; I just run the tutorial demos. From my feeling, it happens in the R demo frequently.

2017-02-17 18:50 GMT+09:00 Jeff Zhang <zjf...@gmail.com>:

> Is it easy to reproduce it?
>
> 小野圭二 <onoke...@gmail.com> wrote on Friday, Feb 17, 2017 at 5:47 PM:
>
>> I am facing the same issue now.
>>
>> 2017-02-17 18:25 GMT+09:00 RUSHIKESH RAUT <rushikeshraut...@gmail.com>:
>>
>> Hi all,
>>
>> I am facing an issue while using Zeppelin. I am trying to load some
>> data (not that big) into Zeppelin and then build some visualizations on
>> it. The problem is that when I try to run the code the first time, it works,
>> but after some time the same code doesn't work. It remains in the running state
>> in the GUI, but no logs are generated in the Zeppelin logs. Also, all further
>> tasks hang in the pending state.
>> As soon as I restart Zeppelin, it works. So I am guessing it's some
>> memory issue. I have read that Zeppelin stores data in memory, so it is
>> possible that it runs out of memory after some time.
>> How do I debug this issue? How much is the default memory that Zeppelin
>> takes at start? Also, is there any way I can run Zeppelin with a
>> specified amount of memory, so that I can start the process with more memory?
>> Because it doesn't make sense to restart Zeppelin every half hour.
>>
>> Thanks,
>> Rushikesh Raut