plot multiple locations points in map

2017-02-22 Thread Alec Lee
Hi all

My pyspark code generates a list of locations (longitude, latitude) that I want to 
plot on a map embedded in the notebook. Is there any China map API that I can call? 
Sample code would be preferred. 

thanks

AL
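
(Not from the original thread: a minimal sketch of one way to do this, assuming the 
folium package is installed on the Zeppelin server and the (longitude, latitude) pairs 
have been collected to the driver; the points, center, and zoom level are placeholders.)

%pyspark
import folium

# Placeholder points in (longitude, latitude) order, e.g. Beijing and Shanghai.
locations = [(116.40, 39.90), (121.47, 31.23)]

# Center roughly on China; folium's default OpenStreetMap tiles cover China,
# and a domestic tile provider could be swapped in via the `tiles` argument.
m = folium.Map(location=[35.0, 105.0], zoom_start=4)
for lon, lat in locations:
    folium.Marker(location=[lat, lon]).add_to(m)  # folium expects [lat, lon]

# One common way to render HTML output from a Zeppelin paragraph.
print("%html " + m._repr_html_())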

Re: [DISCUSS] Admin feature

2017-02-22 Thread Alec Lee
We have multiple users in our organization sharing the same anonymous account, which 
will potentially cause problems, so we hope to have a true authenticated mode. 

thanks

AL
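
(For reference, not from this thread: a minimal sketch of what moving off the shared 
anonymous account can look like in Zeppelin's conf/shiro.ini; the usernames, passwords, 
and roles are placeholders, and an LDAP/ActiveDirectory realm could be used instead.)

[users]
# placeholder local accounts: username = password, role
alec = changeme1, admin
analyst1 = changeme2, user

[roles]
admin = *
user = *

[urls]
# allow the version endpoint anonymously, require login for everything else
/api/version = anon
/** = authc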
> On Feb 22, 2017, at 9:14 PM, Jongyoul Lee  wrote:
> 
> Hi folks,
> 
> Recently, I've heard of some new feature requests that assume an admin 
> account or a similar role. But Apache Zeppelin doesn't have any admin feature, 
> such as hiding/showing menus and settings. I want to know how the community 
> thinks about that kind of feature.
> 
> My first concern is that we have to consider two modes: anonymous and 
> authenticated.
> 
> Feel free to start the discussion on pros and cons.
> 
> Regards,
> Jongyoul
> 
> -- 
> 이종열, Jongyoul Lee, 李宗烈
> http://madeng.net 



Re: notebook interpreter restart

2017-01-25 Thread Alec Lee
Thanks a lot, we missed that setting; we will run the experiments right away. :)

best


AL

> On Jan 25, 2017, at 6:17 PM, Paul Brenner  wrote:
> 
> 
> Did you try setting your interpreter to “Isolated” mode? Is it currently in 
> “shared” mode? 
> 
> If you haven’t played with this setting before then: 
> 1. Open the interpreters page
> 2. Find your interpreter and click the edit button in the top right corner
> 3. Beneath the word “option” at the top left there should be a drop down that 
> says “shared”, “scoped”, or “isolated”.
> 4. Set that drop down to isolated and then scroll down and click save.
> 
> By doing that, we get a separate YARN application for each notebook running 
> on the same interpreter.
> 
> 
> On Wed, Jan 25, 2017 at 9:12 PM Alec Lee wrote:
> Hello, Paul 
> 
> Thank you so much for the prompt reply. Our understanding is that each interpreter 
> takes one application on YARN, so if multiple notebooks share the same interpreter 
> they all run through the same YARN application. 
> 
> Regarding what you mentioned (“… which will kill any other YARN application 
> associated with that interpreter.”), we never seem to see more than one YARN 
> application in the YARN UI. In our case, I use one interpreter but have written many 
> notebooks, yet no matter how many notebooks run Spark jobs, I only see one YARN 
> application. So I am curious how to make a single interpreter be associated 
> with more than one YARN application.
> 
> 
> Thanks
> 
> 
> AL
> 
> 
> 
> 
> 
>> On Jan 25, 2017, at 5:39 PM, Paul Brenner wrote:
>> 
>> 
>> Alec,
>> 
>> The way we use zeppelin at our company is to set our interpreters to 
>> “isolated”. That way each notebook gets its own application on YARN. 
>> 
>> This mostly works well. The one downside is that if you stop a notebook 
>> (e.g. by calling sys.exit in a cell of a spark notebook) it does stop the 
>> YARN application most of the time but you can’t restart that notebook until 
>> you have restarted the interpreter… which will kill any other YARN 
>> application associated with that interpreter.
>> 
>> So our full setup is that we give each user an interpreter (which is good 
>> because we can set each user’s interpreter to have their username in 
>> spark.yarn.queue) and set each user’s interpreter to isolated.
>> 
>> Honestly I still don’t understand what scoped does… maybe that would work as well?

Re: notebook interpreter restart

2017-01-25 Thread Alec Lee
Hello, Paul 

Thank you so much for the prompt reply. Our understanding is that each interpreter takes 
one application on YARN, so if multiple notebooks share the same interpreter they all 
run through the same YARN application. 

Regarding what you mentioned (“… which will kill any other YARN application associated 
with that interpreter.”), we never seem to see more than one YARN application in the 
YARN UI. In our case, I use one interpreter but have written many notebooks, yet no 
matter how many notebooks run Spark jobs, I only see one YARN application. So I am 
curious how to make a single interpreter be associated with more than one YARN 
application.


Thanks


AL





> On Jan 25, 2017, at 5:39 PM, Paul Brenner  wrote:
> 
> 
> Alec,
> 
> The way we use zeppelin at our company is to set our interpreters to 
> “isolated”. That way each notebook gets its own application on YARN. 
> 
> This mostly works well. The one downside is that if you stop a notebook (e.g. 
> by calling sys.exit in a cell of a spark notebook) it does stop the YARN 
> application most of the time but you can’t restart that notebook until you 
> have restarted the interpreter… which will kill any other YARN application 
> associated with that interpreter.
> 
> So our full setup is that we give each user an interpreter (which is good 
> because we can set each user’s interpreter to have their username in 
> spark.yarn.queue) and set each user’s interpreter to isolated.
> 
> Honestly I still don’t understand what scoped does… maybe that would work as 
> well?
> 
> 
> On Wed, Jan 25, 2017 at 8:20 PM Alec Lee wrote:
> Hi, all 
> 
> 
> Currently we are exploring the features of Zeppelin; in our setup we are 
> using YARN to manage Spark jobs. From our experiments, we conclude 
> that one interpreter corresponds to one application in the YARN cluster, which 
> means all the notebooks in Zeppelin that use the same interpreter go through a single 
> application in YARN. We also found that if our code shuts down the application in 
> YARN, then any notebook fails to run after that point, with an error like 
> “can’t call a stopped spark context …”. The only solution we have found is to 
> restart the interpreter. How can we get around this without restarting the 
> interpreter? 
> 
> 
> 
> Thanks 
> 
> 
> AL
> 
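
(A hedged illustration of the per-user setup Paul describes above, not an exact 
configuration: one Spark interpreter per user, set to isolated mode, with that user's 
name in the spark.yarn.queue property. The interpreter and queue names are placeholders.)

Interpreter "spark_alec" (option: isolated)
    spark.yarn.queue = alec

Interpreter "spark_bob" (option: isolated)
    spark.yarn.queue = bob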



notebook interpreter restart

2017-01-25 Thread Alec Lee
Hi, all


Currently we are exploring the features of Zeppelin; in our setup we are using 
YARN to manage Spark jobs. From our experiments, we conclude that one 
interpreter corresponds to one application in the YARN cluster, which means all 
the notebooks in Zeppelin that use the same interpreter go through a single application 
in YARN. We also found that if our code shuts down the application in YARN, then any 
notebook fails to run after that point, with an error like “can’t call a stopped 
spark context …”. The only solution we have found is to restart the interpreter. 
How can we get around this without restarting the interpreter?



Thanks


AL

pyspark can't run through

2017-01-05 Thread Alec Lee
Hello, all

I recently came across a good tool, Zeppelin, and it is easy to use. But I am having 
some trouble making pyspark work on my server. The code below used to work fine, 
but for no apparent reason it now throws errors like "permission denied".  

Code
%pyspark
import pandas
## http://blog.csdn.net/lsshlsw/article/details/53768756
## possible error, fix as above
PlateText = sc.textFile("hdfs://node9:54310/user/hadoop/tmp/plate_A216E6.csv")
header = PlateText.first()
#print header
#print PlateText.count()


Error msg ***
Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-7122867288384299951.py", line 267, in 
    raise Exception(traceback.format_exc())
Exception: Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-7122867288384299951.py", line 265, in 
    exec(code)
  File "", line 3, in 
  File "/opt/spark-2.0.2-bin-hadoop2.7/python/pyspark/rdd.py", line 1328, in first
    rs = self.take(1)
  File "/opt/spark-2.0.2-bin-hadoop2.7/python/pyspark/rdd.py", line 1310, in take
    res = self.context.runJob(self, takeUpToNumLeft, p)
  File "/opt/spark-2.0.2-bin-hadoop2.7/python/pyspark/context.py", line 933, in runJob
    port = self._jvm.PythonRDD.runJob(self._jsc.sc(), mappedRDD._jrdd, partitions)
  File "/opt/spark-2.0.2-bin-hadoop2.7/python/lib/py4j-0.10.3-src.zip/py4j/java_gateway.py", line 1133, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/opt/spark-2.0.2-bin-hadoop2.7/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/opt/spark-2.0.2-bin-hadoop2.7/python/lib/py4j-0.10.3-src.zip/py4j/protocol.py", line 319, in get_return_value
    format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, 192.168.168.29): java.io.IOException: Cannot run program "/opt/spark-2.0.2-bin-hadoop2.7/python": error=13, Permission denied
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
    at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:163)
    at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:89)
    at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:65)
    at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:114)
    at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:128)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    at org.apache.spark.scheduler.Task.run(Task.scala:86)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: error=13, Permission denied
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.(UNIXProcess.java:248)
    at java.lang.ProcessImpl.start(ProcessImpl.java:134)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
    ... 14 more
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1454)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1442)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1441)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1441)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:811)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1667)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1622)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1611)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
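
(Not part of the original thread: the executor-side "Cannot run program 
"/opt/spark-2.0.2-bin-hadoop2.7/python": error=13" suggests the Python worker path 
points at Spark's python directory rather than at a python executable, so one thing 
commonly checked is where PYSPARK_PYTHON points. A hedged sketch, with placeholder paths:)

# conf/zeppelin-env.sh on the Zeppelin host (path is a placeholder)
export PYSPARK_PYTHON=/usr/bin/python

# and, so that the YARN executors launch the same binary, in the spark
# interpreter properties:
#   spark.executorEnv.PYSPARK_PYTHON = /usr/bin/python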