Mailing list

2016-08-12 Thread Inam Ur Rehman
UNSUBSCRIBE


Hive Exception

2016-07-22 Thread Inam Ur Rehman
Hi all,
I am really stuck here. I know this has been asked before, but none of the
suggested fixes work for me. I am using the Anaconda 3.5 distribution and I
have built spark-1.6.2 twice. The first time I built it with Hive and JDBC
support using this command:
*mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver
-DskipTests clean package*
and it gives the Hive exception. The second time I built it with:
*./make-distribution.sh --name custom-spark --tgz -Psparkr -Phadoop-2.4
-Phive -Phive-thriftserver -Pyarn*
and it also gives me the exception.
I have also tried the pre-built version spark-1.6.1-bin-hadoop2.6, but the
exception remains the same.
Things I have tried to solve this:
1) placed hive-site.xml in the spark\conf folder (it was not there before)
2) set SPARK_HIVE=true
3) ran sbt assembly
but the problem is still there.

Here is the full error:
You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt
assembly
---
Py4JJavaError Traceback (most recent call last)
 in ()
  3
  4 binary_map = {'Yes':1.0, 'No':0.0, 'True':1.0, 'False':0.0}
> 5 toNum = UserDefinedFunction(lambda k: binary_map[k], DoubleType())
  6
  7 CV_data = CV_data.drop('State').drop('Area code')
             .drop('Total day charge').drop('Total eve charge')
             .drop('Total night charge').drop('Total intl charge')
             .withColumn('Churn', toNum(CV_data['Churn']))
             .withColumn('International plan', toNum(CV_data['International plan']))
             .withColumn('Voice mail plan', toNum(CV_data['Voice mail plan'])).cache()

C:\Users\InAm-Ur-Rehman\Sparkkk\spark-1.6.2\python\pyspark\sql\functions.py
in __init__(self, func, returnType, name)
   1556 self.returnType = returnType
   1557 self._broadcast = None
-> 1558 self._judf = self._create_judf(name)
   1559
   1560 def _create_judf(self, name):

C:\Users\InAm-Ur-Rehman\Sparkkk\spark-1.6.2\python\pyspark\sql\functions.py
in _create_judf(self, name)
   1567 pickled_command, broadcast_vars, env, includes =
_prepare_for_python_RDD(sc, command, self)
   1568 ctx = SQLContext.getOrCreate(sc)
-> 1569 jdt = ctx._ssql_ctx.parseDataType(self.returnType.json())
   1570 if name is None:
   1571 name = f.__name__ if hasattr(f, '__name__') else
f.__class__.__name__

C:\Users\InAm-Ur-Rehman\Sparkkk\spark-1.6.2\python\pyspark\sql\context.py
in _ssql_ctx(self)
681 try:
682 if not hasattr(self, '_scala_HiveContext'):
--> 683 self._scala_HiveContext = self._get_hive_ctx()
684 return self._scala_HiveContext
685 except Py4JError as e:

C:\Users\InAm-Ur-Rehman\Sparkkk\spark-1.6.2\python\pyspark\sql\context.py
in _get_hive_ctx(self)
690
691 def _get_hive_ctx(self):
--> 692 return self._jvm.HiveContext(self._jsc.sc())
693
694 def refreshTable(self, tableName):

C:\Users\InAm-Ur-Rehman\Sparkkk\spark-1.6.2\python\lib\py4j-0.9-src.zip\py4j\java_gateway.py
in __call__(self, *args)
   1062 answer = self._gateway_client.send_command(command)
   1063 return_value = get_return_value(
-> 1064 answer, self._gateway_client, None, self._fqn)
   1065
   1066 for temp_arg in temp_args:

C:\Users\InAm-Ur-Rehman\Sparkkk\spark-1.6.2\python\pyspark\sql\utils.py in
deco(*a, **kw)
 43 def deco(*a, **kw):
 44 try:
---> 45 return f(*a, **kw)
 46 except py4j.protocol.Py4JJavaError as e:
 47 s = e.java_exception.toString()

C:\Users\InAm-Ur-Rehman\Sparkkk\spark-1.6.2\python\lib\py4j-0.9-src.zip\py4j\protocol.py
in get_return_value(answer, gateway_client, target_id, name)
306 raise Py4JJavaError(
307 "An error occurred while calling {0}{1}{2}.\n".
--> 308 format(target_id, ".", name), value)
309 else:
310 raise Py4JError(

Py4JJavaError: An error occurred while calling
None.org.apache.spark.sql.hive.HiveContext.
: java.lang.RuntimeException: java.lang.NullPointerException
at
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at
org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:204)
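
For reference, a minimal sketch of the same flow on a build that does include
Hive support. It assumes a Spark 1.6 build made with -Phive -Phive-thriftserver,
a PySpark shell with an existing SparkContext named sc, and the CV_data
DataFrame from the traceback above; everything else is illustrative.

    # Sketch only: constructing a HiveContext is exactly where the
    # "You must build Spark with Hive" error surfaces; on a Hive-enabled
    # build it succeeds and the UDF below can be applied as intended.
    from pyspark.sql import HiveContext
    from pyspark.sql.functions import UserDefinedFunction
    from pyspark.sql.types import DoubleType

    sqlContext = HiveContext(sc)

    binary_map = {'Yes': 1.0, 'No': 0.0, 'True': 1.0, 'False': 0.0}
    toNum = UserDefinedFunction(lambda k: binary_map[k], DoubleType())

    CV_data = (CV_data
               .drop('State').drop('Area code')
               .drop('Total day charge').drop('Total eve charge')
               .drop('Total night charge').drop('Total intl charge')
               .withColumn('Churn', toNum(CV_data['Churn']))
               .withColumn('International plan', toNum(CV_data['International plan']))
               .withColumn('Voice mail plan', toNum(CV_data['Voice mail plan']))
               .cache())

If HiveContext(sc) itself raises the error above, the binaries actually being
run do not contain the Hive classes, regardless of what was built elsewhere;
a common cause is launching the notebook against a different SPARK_HOME than
the freshly built distribution.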


Re: Error in running JavaALSExample example from spark examples

2016-07-22 Thread Inam Ur Rehman
Hello guys, I know this is irrelevant to this topic, but I have been looking
desperately for a solution. I am facing an exception:
http://apache-spark-user-list.1001560.n3.nabble.com/how-to-resolve-you-must-build-spark-with-hive-exception-td27390.html

Please help me. I could not find any solution.

On Fri, Jul 22, 2016 at 10:43 PM, VG  wrote:

> Using 2.0.0-preview with Maven,
> so all dependencies should be correct, I guess.
>
> <dependency>
>   <groupId>org.apache.spark</groupId>
>   <artifactId>spark-core_2.11</artifactId>
>   <version>2.0.0-preview</version>
>   <scope>provided</scope>
> </dependency>
>
> I see in maven dependencies that this brings in
> scala-reflect-2.11.4
> scala-compiler-2.11.0
>
> and so on
>
>
>
> On Fri, Jul 22, 2016 at 11:04 PM, Aaron Ilovici 
> wrote:
>
>> What version of Spark/Scala are you running?
>>
>>
>>
>> -Aaron
>>
>
>


Re: running jupyter notebook server Re: spark and plot data

2016-07-22 Thread Inam Ur Rehman
Hello guys, I know this is irrelevant to this topic, but I have been looking
desperately for a solution. I am facing an exception:
http://apache-spark-user-list.1001560.n3.nabble.com/how-to-resolve-you-must-build-spark-with-hive-exception-td27390.html

Please help me. I could not find any solution.

On Fri, Jul 22, 2016 at 10:07 PM, Andy Davidson <
a...@santacruzintegration.com> wrote:

> Hi Pseudo
>
> I do not know much about Zeppelin. What languages are you using?
>
> I have been doing my data exploration and graphing using Python, mostly
> because early on Spark had good support for Python. It's easy to collect()
> data as a local pandas object. I think at this point R should work well;
> you should be able to easily collect() your data as an R data frame. I have
> not tried RStudio.
>
> I typically run the Jupyter notebook server in my data center. I find the
> notebooks really nice. I typically use matplotlib to generate my graphs.
> There are a lot of graphing packages.
>
> *Attached is the script I use to start the notebook server*. This script
> and process work, but it is a little hacky. You call it as follows:
>
>
> #
> # on a machine in your cluster
> #
> $ cd dirWithNotebooks
>
> # all the logs will be in startIPythonNotebook.sh.out
> # nohup allows you to log in start your notebook server and log out.
> $ nohup startIPythonNotebook.sh > startIPythonNotebook.sh.out &
>
> #
> # on you local machine
> #
>
> # because of firewalls I need to open an ssh tunnel
> $ ssh -o ServerAliveInterval=120 -N -f -L localhost:8889:localhost:7000
> myCluster
>
> # connect to the notebook server using the browser of your choice
>
> http://localhost:8889
>
>
>
> #
> # If you need to stop your notebook server you may need to kill the server
> # there is probably a cleaner way to do this
> # $ ps -el | head -1; ps -efl | grep python
> #
>
> http://jupyter.org/
>
>
> P.S. Jupyter is in the process of being released. The new JupyterLab
> alpha was just announced; it looks really sweet.
>
>
>
> From: pseudo oduesp 
> Date: Friday, July 22, 2016 at 2:08 AM
> To: Andrew Davidson 
> Subject: Re: spark and plot data
>
> Hi Andy,
> Thanks for the reply.
> What I meant is that it is just hard to switch each time between the local
> and the distributed way of working. For example, Zeppelin gives an easy way
> to interact with data, but it is hard to configure on a huge cluster with a
> lot of nodes. In my case I have a cluster with 69 nodes and I process a
> huge volume of data with PySpark, which is great, but when I want to plot
> some charts it is hard work to make them.
>
> I sample or aggregate my results. For example, if I use the random forest
> algorithm in machine learning, I want to retrieve the most important
> features, but with the version already installed on our cluster (1.5.0) I
> can't get this.
>
> Do you have any solution?
>
> Thanks
>
> 2016-07-21 18:44 GMT+02:00 Andy Davidson :
>
>> Hi Pseudo
>>
>> Plotting, graphing, data visualization, report generation are common
>> needs in scientific and enterprise computing.
>>
>> Can you tell me more about your use case? What is it about the current
>> process/workflow that you think could be improved by pushing plotting (I
>> assume you mean plotting and graphing) into Spark?
>>
>>
>> In my personal work all the graphing is done in the driver, on summary
>> stats calculated using Spark. So for me, using standard Python libs has not
>> been a problem.
>>
>> Andy
>>
>> From: pseudo oduesp 
>> Date: Thursday, July 21, 2016 at 8:30 AM
>> To: "user @spark" 
>> Subject: spark and plot data
>>
>> Hi,
>> I know Spark is an engine for computing over large data sets; I work
>> with PySpark and it is a very wonderful machine.
>>
>> My question: we don't have tools for plotting data, so each time we have
>> to switch and go back to Python to use plotting tools.
>> But when you have a large result, such as a scatter plot or a ROC curve,
>> you can't use collect() to take the data.
>>
>> Does someone have a proposal for plotting?
>>
>> thanks
>>
>>
>
>
>
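
A minimal sketch of the pattern Andy describes above: aggregate or sample in
Spark, collect only the small result to the driver as a pandas DataFrame, and
plot it locally with matplotlib. The SQLContext name sqlContext, the DataFrame
df, and its columns 'label' and 'score' are assumptions for illustration.

    import matplotlib.pyplot as plt

    # Aggregate in Spark so only a small summary reaches the driver.
    summary = df.groupBy('label').avg('score')

    # toPandas() collects the (small) result as a local pandas DataFrame.
    pdf = summary.toPandas()
    pdf.plot(kind='bar', x='label', y='avg(score)')
    plt.savefig('summary.png')

    # For scatter plots or ROC curves, sample instead of collecting everything.
    sample_pdf = df.sample(False, 0.01).toPandas()

The same idea works from a Jupyter notebook running against the cluster: the
heavy lifting stays distributed, and only the plotted summary is local.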


Re: ml models distribution

2016-07-22 Thread Inam Ur Rehman
Hello guys, I know this is irrelevant to this topic, but I have been looking
desperately for a solution. I am facing an exception:
http://apache-spark-user-list.1001560.n3.nabble.com/how-to-resolve-you-must-build-spark-with-hive-exception-td27390.html

Please help me. I could not find any solution.

On Fri, Jul 22, 2016 at 6:12 PM, Sean Owen  wrote:

> No there isn't anything in particular, beyond the various bits of
> serialization support that write out something to put in your storage
> to begin with. What you do with it after reading and before writing is
> up to your app, on purpose.
>
> If you mean you're producing data outside the model that your model
> uses, your model data might be produced by an RDD operation, and saved
> that way. There it's no different than anything else you do with RDDs.
>
> What part are you looking to automate beyond those things? That's most of
> it.
>
> On Fri, Jul 22, 2016 at 2:04 PM, Sergio Fernández 
> wrote:
> > Hi Sean,
> >
> > On Fri, Jul 22, 2016 at 12:52 PM, Sean Owen  wrote:
> >>
> >> If you mean, how do you distribute a new model in your application,
> >> then there's no magic to it. Just reference the new model in the
> >> functions you're executing in your driver.
> >>
> >> If you implemented some other manual way of deploying model info, just
> >> do that again. There's no special thing to know.
> >
> >
> > Well, because some models are huge, we typically bundle the logic
> > (pipeline/application) and the models separately. Normally we use a shared
> > store (e.g., HDFS) or coordinated distribution of the models. But I wanted
> > to know if there is any infrastructure in Spark that specifically
> > addresses such a need.
> >
> > Thanks.
> >
> > Cheers,
> >
> > P.S.: Sorry Jacek, with "ml" I meant "machine learning". I thought it was
> > a fairly widespread acronym. Sorry for the possible confusion.
> >
> >
> > --
> > Sergio Fernández
> > Partner Technology Manager
> > Redlink GmbH
> > m: +43 6602747925
> > e: sergio.fernan...@redlink.co
> > w: http://redlink.co
>
>
>
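
A minimal sketch of the shared-store approach Sergio mentions, using the
save/load support that MLlib models have had since Spark 1.4. The HDFS path,
the tiny training set, and the choice of LogisticRegressionModel are
assumptions for illustration.

    from pyspark.mllib.classification import (LogisticRegressionWithLBFGS,
                                              LogisticRegressionModel)
    from pyspark.mllib.regression import LabeledPoint

    # Tiny illustrative training set; a real job would build this from its data.
    training_rdd = sc.parallelize([
        LabeledPoint(0.0, [0.0, 1.0]),
        LabeledPoint(1.0, [1.0, 0.0]),
    ])

    # Training job: fit the model and persist it to shared storage (e.g. HDFS).
    model = LogisticRegressionWithLBFGS.train(training_rdd)
    model.save(sc, "hdfs:///models/example/v1")

    # Application/serving job (possibly a different Spark application):
    same_model = LogisticRegressionModel.load(sc, "hdfs:///models/example/v1")

This keeps the pipeline code and the model artifacts decoupled, as described
above, without any Spark-specific distribution infrastructure beyond a path
that both jobs can read.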


Re: MLlib, Java, and DataFrame

2016-07-22 Thread Inam Ur Rehman
Hello guys, I know this is irrelevant to this topic, but I have been looking
desperately for a solution. I am facing an exception:
http://apache-spark-user-list.1001560.n3.nabble.com/how-to-resolve-you-must-build-spark-with-hive-exception-td27390.html

Please help me. I could not find any solution.

On Fri, Jul 22, 2016 at 5:50 PM, Jean Georges Perrin  wrote:

> Thanks Marco - I like the idea of sticking with DataFrames ;)
>
>
> On Jul 22, 2016, at 7:07 AM, Marco Mistroni  wrote:
>
> Hello Jean,
> You can take your current DataFrame and send it to MLlib (I was doing that
> because I didn't know the ml package), but the process is a little bit
> cumbersome:
>
>
> 1. go from the DataFrame to an RDD of LabeledPoint
> 2. run your ML model
>
> I'd suggest you stick to DataFrame + the ml package :)
>
> hth
>
>
>
> On Fri, Jul 22, 2016 at 4:41 AM, Jean Georges Perrin  wrote:
>
>> Hi,
>>
>> I am looking for some really super basic examples of MLlib (like a linear
>> regression over a list of values) in Java. I have found a few, but I only
>> saw them using JavaRDD... and not DataFrame.
>>
>> I was kind of hoping to take my current DataFrame and send it to MLlib.
>> Am I too optimistic? Do you know/have any example like that?
>>
>> Thanks!
>>
>> jg
>>
>>
>> Jean Georges Perrin
>> j...@jgp.net / @jgperrin
>>
>>
>>
>>
>>
>
>
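
A minimal sketch of the DataFrame-plus-ml-package route Marco recommends,
shown in PySpark rather than Java for brevity (an equivalent Pipeline-style
API exists in Java). It assumes Spark 1.6 with an existing SQLContext named
sqlContext; the toy data and column names are illustrative, and on Spark 2.x
the Vectors import moves to pyspark.ml.linalg.

    from pyspark.ml.regression import LinearRegression
    from pyspark.mllib.linalg import Vectors

    # The ml package works directly on a DataFrame with 'label'/'features'
    # columns, so no DataFrame -> RDD[LabeledPoint] conversion is needed.
    train_df = sqlContext.createDataFrame([
        (1.0, Vectors.dense(1.0)),
        (2.0, Vectors.dense(2.0)),
        (3.0, Vectors.dense(3.0)),
    ], ["label", "features"])

    lr = LinearRegression(maxIter=10)
    model = lr.fit(train_df)
    print(model.coefficients, model.intercept)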


Re: Create dataframe column from list

2016-07-22 Thread Inam Ur Rehman
Hello guys, I know this is irrelevant to this topic, but I have been looking
desperately for a solution. I am facing an exception:
http://apache-spark-user-list.1001560.n3.nabble.com/how-to-resolve-you-must-build-spark-with-hive-exception-td27390.html

Please help me. I could not find any solution.

On Fri, Jul 22, 2016 at 5:26 PM, Ashutosh Kumar 
wrote:

>
> http://stackoverflow.com/questions/36382052/converting-list-to-column-in-spark
>
>
> On Fri, Jul 22, 2016 at 5:15 PM, Divya Gehlot 
> wrote:
>
>> Hi,
>> Can somebody help me with creating a DataFrame column from a Scala list?
>> I would really appreciate the help.
>>
>> Thanks ,
>> Divya
>>
>
>
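
One common way to do what Divya asks (the Stack Overflow link above discusses
the same problem), sketched in PySpark rather than Scala: give the existing
DataFrame a row index, build a second DataFrame from the local list with the
same index, and join on it. The sqlContext name, the example data, and the
column names are all assumptions for illustration.

    from pyspark.sql import Row

    df = sqlContext.createDataFrame([("a",), ("b",), ("c",)], ["letter"])
    values = [10, 20, 30]   # one entry per row of df, in the same order

    # Index the existing rows, and build a DataFrame from the local list.
    indexed_df = df.rdd.zipWithIndex().map(
        lambda row_idx: Row(idx=row_idx[1], **row_idx[0].asDict())).toDF()
    values_df = sqlContext.createDataFrame(
        [Row(idx=i, new_col=v) for i, v in enumerate(values)])

    # Join on the generated index and drop it again.
    result = indexed_df.join(values_df, on="idx").drop("idx")

This only lines up correctly when the list really has exactly one value per
row, in the DataFrame's current order.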