Re: Zeppelin out of memory issue - (GC overhead limit exceeded)

2017-03-26 Thread RUSHIKESH RAUT
Thanks, Jianfeng,

But I am still not able to solve the issue. I have set it to 4g, but still
no luck. Can you please explain how I can set the SPARK_DRIVER_MEMORY
property?
Also, I have read that the GC overhead limit exceeded error occurs when the
heap memory is insufficient. So how can I increase the heap memory? Please
correct me if I am wrong, as I am still trying to learn these things.
Regards,
Rushikesh Raut
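Rushikesh's reading of the error is essentially right. For HotSpot's parallel collector (the Java 8 default on server-class machines), Oracle's GC tuning guide describes the rule as: if more than 98% of total time is spent in garbage collection and less than 2% of the heap is recovered, an `OutOfMemoryError` is thrown. The Python function below is only a toy restatement of that rule; `gc_overhead_exceeded` and its inputs are made up for illustration and are not JVM code. It shows why a heap that is almost entirely live data trips the limit: collections run constantly but free almost nothing, so the fix is a larger maximum heap (`-Xmx`) for the JVM that actually holds the data, or holding less data in it.

```python
def gc_overhead_exceeded(total_time_s: float, gc_time_s: float,
                         heap_bytes: int, recovered_bytes: int) -> bool:
    """Toy restatement of HotSpot's documented GC overhead limit check.

    Thresholds: more than 98% of time in GC AND less than 2% of the heap
    recovered. This is an illustration, not how the JVM measures it.
    """
    gc_share = gc_time_s / total_time_s              # fraction of time spent in GC
    recovered_share = recovered_bytes / heap_bytes   # fraction of heap freed by GC
    return gc_share > 0.98 and recovered_share < 0.02


GIB = 1024 ** 3

# Heap nearly full of live rows: GC runs almost all the time and frees ~1%.
print(gc_overhead_exceeded(100.0, 99.0, 4 * GIB, int(0.01 * 4 * GIB)))  # True

# Same GC time, but each collection frees half the heap: the limit is not hit.
print(gc_overhead_exceeded(100.0, 99.0, 4 * GIB, 2 * GIB))  # False
```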

On Sun, Mar 26, 2017 at 4:25 PM, Jianfeng (Jeff) Zhang <
jzh...@hortonworks.com> wrote:

>
> This is a bug in Zeppelin: spark.driver.memory won't take effect, because
> for now it isn't passed to Spark through the --conf parameter. See
> https://issues.apache.org/jira/browse/ZEPPELIN-1263
> The workaround is to specify SPARK_DRIVER_MEMORY on the interpreter
> setting page.
>
>
>
> Best regards,
> Jeff Zhang
>
>
> From: RUSHIKESH RAUT <rushikeshraut...@gmail.com>
> Reply-To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
> Date: Sunday, March 26, 2017 at 5:03 PM
> To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
> Subject: Re: Zeppelin out of memory issue - (GC overhead limit exceeded)
>
> ZEPPELIN_INTP_JAVA_OPTS
>
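Jeff's explanation turns on when the driver heap is fixed. In Spark's client mode the driver JVM is already running by the time notebook code executes, so `spark.driver.memory` only has an effect if it reaches the `spark-submit` command line, either as `--driver-memory` or as `--conf spark.driver.memory=...`. The Python sketch below is a hypothetical model of a launcher turning interpreter properties into `spark-submit` arguments; `build_spark_submit_args` and its property-to-flag mapping are illustrative assumptions, not Zeppelin's actual launcher code, and the application jar and class arguments are omitted. It shows the shape of the fix: the value has to end up on the command line, which, per Jeff, is what ZEPPELIN-1263 tracks.

```python
def build_spark_submit_args(props: dict[str, str]) -> list[str]:
    """Hypothetical launcher: map interpreter properties to spark-submit flags.

    Driver memory must be known before the driver JVM starts, so it is
    emitted as --driver-memory; other spark.* properties become --conf pairs.
    """
    args = ["spark-submit"]
    # Accept either the env-style name (Jeff's workaround) or the Spark property.
    driver_mem = props.get("SPARK_DRIVER_MEMORY") or props.get("spark.driver.memory")
    if driver_mem:
        args += ["--driver-memory", driver_mem]
    for key in sorted(props):
        if key.startswith("spark.") and key != "spark.driver.memory":
            args += ["--conf", f"{key}={props[key]}"]
    return args


print(build_spark_submit_args({"spark.driver.memory": "4g",
                               "spark.executor.memory": "2g"}))
# ['spark-submit', '--driver-memory', '4g', '--conf', 'spark.executor.memory=2g']
```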


Re: Zeppelin out of memory issue - (GC overhead limit exceeded)

2017-03-25 Thread RUSHIKESH RAUT
Yes, I know it is inevitable if the data is large. I want to know how to
increase the interpreter memory to handle large data.

Thanks,
Rushikesh Raut

On Mar 26, 2017 8:56 AM, "Jianfeng (Jeff) Zhang" <jzh...@hortonworks.com>
wrote:

>
> How large is your data? This problem is inevitable if your data is too
> large; you can try to use a Spark DataFrame if that works for you.
>
>
>
>
>
> Best regards,
> Jeff Zhang
>
>
> From: RUSHIKESH RAUT <rushikeshraut...@gmail.com>
> Reply-To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
> Date: Saturday, March 25, 2017 at 5:06 PM
> To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
> Subject: Zeppelin out of memory issue - (GC overhead limit exceeded)
>


Zeppelin out of memory issue - (GC overhead limit exceeded)

2017-03-25 Thread RUSHIKESH RAUT
Hi everyone,

I am trying to load some data from a Hive table into my notebook and then
convert this DataFrame into an R data frame using the spark.r interpreter.
This works perfectly for small amounts of data, but when the data grows I
get this error:

java.lang.OutOfMemoryError: GC overhead limit exceeded

I have tried increasing ZEPPELIN_MEM and ZEPPELIN_INTP_MEM in the
zeppelin-env.cmd file, but I am still facing this issue. I have used the
following configuration:

set ZEPPELIN_MEM="-Xms4096m -Xmx4096m -XX:MaxPermSize=2048m"
set ZEPPELIN_INTP_MEM="-Xmx4096m -Xms4096m -XX:MaxPermSize=2048m"

I am sure that this much memory should be sufficient for my data, but I am
still getting the same error. Any guidance will be much appreciated.

Thanks,
Rushikesh Raut
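One way to test the "this much memory should be sufficient" assumption is a back-of-envelope count of the raw cell data that converting to an R data frame pulls onto the driver. The sketch below is hypothetical: `raw_collect_bytes` and `fits_in_heap` are illustrative helpers, `bytes_per_cell` is a caller-supplied assumption, and the 50% headroom is an arbitrary margin rather than any Spark or JVM rule. It deliberately ignores JVM object headers, serialization buffers, and the R-side copy, all of which make the real footprint larger, so it gives a lower bound only. Note also that this data has to fit in the Spark driver's heap, which is not the JVM that ZEPPELIN_MEM sizes; Jeff's reply in this thread covers setting driver memory.

```python
def raw_collect_bytes(rows: int, cols: int, bytes_per_cell: int) -> int:
    """Lower bound on the payload pulled to the driver by a full collect.

    bytes_per_cell is an assumption supplied by the caller (e.g. 8 for a
    double); real memory use is higher because of per-object overhead.
    """
    return rows * cols * bytes_per_cell


def fits_in_heap(rows: int, cols: int, bytes_per_cell: int,
                 heap_bytes: int, headroom: float = 0.5) -> bool:
    """True if the raw payload stays under `headroom` of the heap.

    headroom=0.5 is an arbitrary illustrative margin, not a Spark or JVM rule.
    """
    return raw_collect_bytes(rows, cols, bytes_per_cell) <= headroom * heap_bytes


GIB = 1024 ** 3
# Hypothetical table: 50 million rows x 20 numeric columns at 8 bytes each.
print(raw_collect_bytes(50_000_000, 20, 8) / GIB)  # about 7.45 GiB of raw payload
print(fits_in_heap(50_000_000, 20, 8, 4 * GIB))    # False for a 4 GiB heap
```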


Unable to import data from multiple hive tables in single notebook

2017-02-26 Thread RUSHIKESH RAUT
Hi All,

I am facing the following issue while running SparkR code in Zeppelin. I am
trying to load data from Hive tables and then perform some operations.
The following is the code of the two paragraphs in my notebook:
1) input_file <- sql(sqlContext, "select * from table1")
   data(input_file)
   head(input_file)

*Output* ---> I am getting the proper data of table1

2) input_file1 <- sql(sqlContext, "select * from table2")
   data(input_file1)
   head(input_file1)

*Output* ---> I am getting the following error:

Error: is.character(x) is not TRUE
Error in head(input_file1): object 'input_file1' not found

Both tables are present in Hive. The only difference that I am observing
is the variable name. So is importing data from multiple tables not
allowed in Zeppelin?

There is also one other issue: if I use the same variable name, the code
does not give an error, but it also doesn't change the data. It still shows
table1's data.

Can anyone explain what I might be doing wrong or what the error exactly
means? I have also checked the logs but couldn't find anything.

Thanks,
Rushikesh


Re: Re: Zeppelin unable to respond after some time

2017-02-17 Thread RUSHIKESH RAUT
>
> On Fri, Feb 17, 2017 at 3:23 PM "xyun...@simuwell.com" wrote:
>
> The problem could be not only the resources but also the session. If you
> run a chunk of Spark code, you should see a running application in the
> Spark UI, but if your code shuts it down after the job is finished, the
> Spark UI will show the job as finished. Within Zeppelin, the Spark session
> is started only once (the interpreter mode can be set to control whether
> notebooks share the session or not); if you close it, Zeppelin will never
> restart it. The only way to get the same code to work again is to restart
> the interpreter or restart Zeppelin. I'm not sure if I have explained this
> clearly, but I hope it helps.
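The session behaviour described in the message above can be modelled with a toy Python class. `ToyInterpreter` is entirely hypothetical and only restates that message's claim: the session is created once, so if notebook code stops it (the string `"spark.stop()"` stands in for such a call), later paragraphs fail until the interpreter itself is restarted. If that description holds, the practical takeaway is to avoid stopping the shared Spark session inside a Zeppelin paragraph.

```python
class ToyInterpreter:
    """Toy model (not Zeppelin code): one session per interpreter process."""

    def __init__(self) -> None:
        self.session_alive = True  # session is created once, at startup

    def run_paragraph(self, code: str) -> str:
        if not self.session_alive:
            raise RuntimeError("session stopped; restart the interpreter")
        if code.strip() == "spark.stop()":
            self.session_alive = False  # user code closed the shared session
            return "stopped"
        return f"ran: {code}"

    def restart(self) -> None:
        self.session_alive = True  # only a restart brings a fresh session


interp = ToyInterpreter()
print(interp.run_paragraph("df.count()"))    # ran: df.count()
print(interp.run_paragraph("spark.stop()"))  # stopped
try:
    interp.run_paragraph("df.count()")
except RuntimeError as err:
    print(err)                               # session stopped; restart the interpreter
interp.restart()
print(interp.run_paragraph("df.count()"))    # ran: df.count()
```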
>
>
>
> From: Paul Brenner <pbren...@placeiq.com>
>
> *Date:* 2017-02-17 12:14
> *To:* users <users@zeppelin.apache.org>
> *Subject:* Re: Re: Zeppelin unable to respond after some time
> I've definitely had this problem with jobs that don't take all the
> resources on the cluster. Also, my experience matches what others have
> reported: just restarting Zeppelin and re-running the stuck paragraph
> solves the issue.
>
> I've also experienced this problem with for loops. Some for loops that
> write to disk, but absolutely don't have any variables that grow in size,
> will hang in Zeppelin. If I run the exact same code in the Scala REPL, it
> goes through without problems.
>
>
>
> Paul Brenner
> DATA SCIENTIST
> (217) 390-3033
>
> On Fri, Feb 17, 2017 at 2:12 PM "xyun...@simuwell.com" wrote:
>
> I have solved a similar issue before. You should check the Spark UI; you
> will probably see that a single job is taking all the resources.
> Therefore further jobs submitted to the same cluster will just hang
> there. When you restart Zeppelin, the old job is killed and all the
> resources it took will be relea

Re: Zeppelin unable to respond after some time

2017-02-17 Thread RUSHIKESH RAUT
Yes, it happens frequently with R and Spark code.

On Feb 17, 2017 3:25 PM, "小野圭二" <onoke...@gmail.com> wrote:

Yes, almost every time.
There are no special operations; I just run the tutorial demos.
From my experience, it happens frequently in the R demo.

2017-02-17 18:50 GMT+09:00 Jeff Zhang <zjf...@gmail.com>:

>
> Is it easy to reproduce it ?
>
> 小野圭二 <onoke...@gmail.com> wrote on Fri, Feb 17, 2017 at 5:47 PM:
>
>> I am facing the same issue now.
>>
>> 2017-02-17 18:25 GMT+09:00 RUSHIKESH RAUT <rushikeshraut...@gmail.com>:
>>
>> Hi all,
>>
>> I am facing an issue while using Zeppelin. I am trying to load some
>> data (not that big) into Zeppelin and then build some visualizations on
>> it. The problem is that when I run the code the first time, it works,
>> but after some time the same code doesn't work. It remains in the running
>> state in the GUI, but no logs are generated in the Zeppelin logs. Also,
>> all further tasks hang in the pending state.
>> As soon as I restart Zeppelin, it works. So I am guessing it's some
>> memory issue. I have read that Zeppelin stores the data in memory, so it
>> is possible that it runs out of memory after some time.
>> How do I debug this issue? How much memory does Zeppelin take by default
>> at start? Also, is there any way to run Zeppelin with a specified amount
>> of memory, so that I can start the process with more memory? It doesn't
>> make sense to restart Zeppelin every half hour.
>>
>> Thanks,
>> Rushikesh Raut
>>
>>
>>