Printing ML pipeline model in Python

2016-03-14 Thread VISHNU SUBRAMANIAN
Hi All, I am using Spark 1.6 with PySpark. I am trying to build a RandomForest classifier model using an ML pipeline in Python. When I print the model I get the value below. RandomForestClassificationModel (uid=rfc_be9d4f681b92) with 10 trees When I use the MLlib RandomForest model wit
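
For reference, a minimal Scala sketch of one way to get the full tree description from the fitted ensemble; trainingData is a placeholder DataFrame with "label" and "features" columns, and toDebugString is assumed to be available on the Scala model in this Spark version:

    import org.apache.spark.ml.classification.RandomForestClassifier

    // Fit a 10-tree forest on a DataFrame with "label" and "features" columns
    val rf = new RandomForestClassifier()
      .setLabelCol("label")
      .setFeaturesCol("features")
      .setNumTrees(10)
    val model = rf.fit(trainingData)

    // Prints every tree, much like the MLlib RandomForestModel output
    println(model.toDebugString)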

Re: how to convert millisecond time to SQL timestamp

2016-02-01 Thread VISHNU SUBRAMANIAN
Hi, if you need a DataFrame-specific solution, you can try the below: df.select(from_unixtime(col("max(utcTimestamp)")/1000)) On Tue, 2 Feb 2016 at 09:44 Ted Yu wrote: > See related thread on using Joda DateTime: > http://search-hadoop.com/m/q3RTtSfi342nveex1&subj=RE+NPE+ > when+using+Joda+D
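
For context, a small self-contained Scala sketch of the same conversion, assuming a spark-shell session (sc and sqlContext already defined) and a hypothetical millisecond epoch column named ts:

    import sqlContext.implicits._
    import org.apache.spark.sql.functions.{col, from_unixtime}

    val df = Seq(Tuple1(1454371200000L), Tuple1(1454374800000L)).toDF("ts")

    // from_unixtime expects seconds, so divide the millisecond value by 1000
    df.select(from_unixtime(col("ts") / 1000).alias("sqlTimestamp")).show(false)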

Re: How to accelerate reading json file?

2016-01-05 Thread VISHNU SUBRAMANIAN
Hi, you can try this: sqlContext.read.format("json").option("samplingRatio","0.1").load("path") If it still takes time, feel free to experiment with the samplingRatio. Thanks, Vishnu On Wed, Jan 6, 2016 at 12:43 PM, Gavin Yue wrote: > I am trying to read json files following the example: >
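
samplingRatio controls what fraction of the input is scanned during schema inference; supplying an explicit schema skips the inference pass entirely. A sketch of both options, assuming a spark-shell session and a hypothetical /data/events.json:

    import org.apache.spark.sql.types.{StructType, StructField, StringType, LongType}

    // Option 1: infer the schema from only 10% of the records
    val sampled = sqlContext.read
      .format("json")
      .option("samplingRatio", "0.1")
      .load("/data/events.json")

    // Option 2: skip inference entirely by supplying the schema up front
    val schema = StructType(Array(
      StructField("id", LongType, true),
      StructField("name", StringType, true)))
    val typed = sqlContext.read.schema(schema).json("/data/events.json")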

Re: custom schema in spark throwing error

2015-12-21 Thread VISHNU SUBRAMANIAN
Try this val customSchema = StructType(Array( StructField("year", IntegerType, true), StructField("make", StringType, true), StructField("model", StringType, true) )) On Mon, Dec 21, 2015 at 8:26 AM, Divya Gehlot wrote: > >1. scala> import org.apache.spark.sql.hive.HiveContext >2. impor
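
A sketch of applying such a schema when loading data, assuming a spark-shell session with the spark-csv package on the classpath and a hypothetical cars.csv:

    import org.apache.spark.sql.types.{StructType, StructField, IntegerType, StringType}

    val customSchema = StructType(Array(
      StructField("year", IntegerType, true),
      StructField("make", StringType, true),
      StructField("model", StringType, true)))

    // Pass the schema to the reader instead of letting it infer one
    val cars = sqlContext.read
      .format("com.databricks.spark.csv")
      .schema(customSchema)
      .option("header", "true")
      .load("cars.csv")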

How VectorIndexer works in Spark ML pipelines

2015-10-15 Thread VISHNU SUBRAMANIAN
Hi All, I am trying to use the VectorIndexer (feature extraction) technique available in the Spark ML Pipelines. I ran the example in the documentation. val featureIndexer = new VectorIndexer() .setInputCol("features") .setOutputCol("indexedFeatures") .setMaxCategories(4) .fit(data)
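
For reference, a sketch of that documented usage in full, assuming a spark-shell session and the sample libsvm file that ships with Spark:

    import sqlContext.implicits._
    import org.apache.spark.ml.feature.VectorIndexer
    import org.apache.spark.mllib.util.MLUtils

    val data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt").toDF()

    // Features with at most 4 distinct values are treated as categorical and
    // re-indexed; all other features are left as continuous.
    val featureIndexer = new VectorIndexer()
      .setInputCol("features")
      .setOutputCol("indexedFeatures")
      .setMaxCategories(4)
      .fit(data)

    val indexed = featureIndexer.transform(data)
    indexed.show(5)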

Re: UDF in spark

2015-07-08 Thread VISHNU SUBRAMANIAN
uery.I need to > run the mentioned block again to use the UDF. > Is there is any way to maintain UDF in sqlContext permanently? > > Thanks, > Vinod > > On Wed, Jul 8, 2015 at 7:16 AM, VISHNU SUBRAMANIAN < > johnfedrickena...@gmail.com> wrote: > >> Hi,

Re: UDF in spark

2015-07-08 Thread VISHNU SUBRAMANIAN
Hi, sqlContext.udf.register("udfname", functionname _) Example: def square(x:Int):Int = { x * x } Register the UDF as below: sqlContext.udf.register("square", square _) Thanks, Vishnu On Wed, Jul 8, 2015 at 2:23 PM, vinod kumar wrote: > Hi Everyone, > > I am new to Spark. May I know how to define
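
Put together, a minimal end-to-end sketch, assuming a spark-shell session; the table name numbers is hypothetical:

    import sqlContext.implicits._

    val numbers = sc.parallelize(1 to 5).map(n => Tuple1(n)).toDF("n")
    numbers.registerTempTable("numbers")

    // Define a Scala function and register it so SQL can call it by name
    def square(x: Int): Int = x * x
    sqlContext.udf.register("square", square _)

    sqlContext.sql("SELECT n, square(n) AS n_squared FROM numbers").show()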

Re: used cores are less than total no. of cores

2015-02-24 Thread VISHNU SUBRAMANIAN
Try adding --total-executor-cores 5, where 5 is the number of cores. Thanks, Vishnu On Wed, Feb 25, 2015 at 11:52 AM, Somnath Pandeya < somnath_pand...@infosys.com> wrote: > Hi All, > > > > I am running a simple word count example of spark (standalone cluster) , > In the UI it is showing > > F
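
The same cap can also be set from inside the application; a sketch assuming a standalone cluster, where spark.cores.max is the property that --total-executor-cores sets at submit time:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("word-count")
      .set("spark.cores.max", "5")   // total cores the app may take across the cluster

    val sc = new SparkContext(conf)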

Re: Running Example Spark Program

2015-02-22 Thread VISHNU SUBRAMANIAN
Try restarting your Spark cluster: ./sbin/stop-all.sh ./sbin/start-all.sh Thanks, Vishnu On Sun, Feb 22, 2015 at 7:30 PM, Surendran Duraisamy < 2013ht12...@wilp.bits-pilani.ac.in> wrote: > Hello All, > > I am new to Apache Spark, I am trying to run JavaKMeans.java from Spark > Examples in my U

Re: getting the cluster elements from kmeans run

2015-02-11 Thread VISHNU SUBRAMANIAN
You can use model.predict(point), which returns the cluster assignment for that point, and map each element to it: rdd.map(x => (x, model.predict(x))) Thanks, Vishnu On Wed, Feb 11, 2015 at 11:06 PM, Harini Srinivasan wrote: > Hi, > > Is there a way to get the elements of each cluster after running kmean
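
A small self-contained sketch of that idea, assuming a spark-shell session; it trains a two-cluster model and pairs each point with its assigned cluster index:

    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.Vectors

    val points = sc.parallelize(Seq(
      Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.1),
      Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 9.1)))

    val model = KMeans.train(points, 2, 20)   // k = 2 clusters, 20 iterations

    // (point, clusterIndex) pairs; group by the index to list each cluster's members
    val assignments = points.map(p => (p, model.predict(p)))
    assignments.collect().foreach(println)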

Re: Question related to Spark SQL

2015-02-11 Thread VISHNU SUBRAMANIAN
apache.spark.sql.hive.api.java.HiveContext(sc);// Queries are expressed > in HiveQL.Row[] results = sqlContext.sql(sqlClause).collect(); > > > Is my understanding right? > > Regards, > Ashish > > On Wed, Feb 11, 2015 at 4:42 PM, VISHNU SUBRAMANIAN < > johnfedrickena...@gmail

Re: Re: How can I read this avro file using spark & scala?

2015-02-11 Thread VISHNU SUBRAMANIAN
Check this link: https://github.com/databricks/spark-avro (the home page for the spark-avro project). Thanks, Vishnu On Wed, Feb 11, 2015 at 10:19 PM, Todd wrote: > Databricks provides a sample code on its website...but i can't find it for > now. > > > > > > > At 2015-02-12 00:43:07, "captainfranz" wro
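
A sketch of reading Avro with that package, assuming spark-shell is launched with the spark-avro package (e.g. --packages com.databricks:spark-avro_2.10:2.0.1) and a hypothetical episodes.avro file:

    import com.databricks.spark.avro._

    // The implicit from the package adds .avro(...) to DataFrameReader
    val df = sqlContext.read.avro("episodes.avro")
    df.printSchema()

    // The generic data-source form works as well
    val same = sqlContext.read.format("com.databricks.spark.avro").load("episodes.avro")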

Re: Hive/Hbase for low latency

2015-02-11 Thread VISHNU SUBRAMANIAN
Hi Siddharth, it depends on what exactly you are trying to solve, but the connectivity between Cassandra and Spark is good. Thanks, Vishnu On Wed, Feb 11, 2015 at 7:47 PM, Siddharth Ubale < siddharth.ub...@syncoms.com> wrote: > Hi , > > > > I

Re: Question related to Spark SQL

2015-02-11 Thread VISHNU SUBRAMANIAN
Hi Ashish, in order to answer your question, I assume that you are planning to process data and cache it in memory. If you are using the Thrift server that comes with Spark, then you can query on top of it, and multiple applications can use the cached data as internally all the requests go to t
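
A sketch of the caching side of that setup, assuming a spark-shell session; the events table and its contents are hypothetical:

    import sqlContext.implicits._

    val events = sc.parallelize(Seq((1, "click"), (2, "view"), (3, "click"))).toDF("id", "action")
    events.registerTempTable("events")

    // Materialise the table in memory; later SQL against the same context hits the cache
    sqlContext.cacheTable("events")
    sqlContext.sql("SELECT action, count(*) AS cnt FROM events GROUP BY action").show()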

Re: NaiveBayes classifier causes ShuffleDependency class cast exception

2015-02-06 Thread VISHNU SUBRAMANIAN
Can you try creating just a single SparkContext and then running your code? If you want to use it for streaming, pass the same SparkContext object instead of the conf. Note: instead of just replying to me, please use reply-all so that the post is visible to the community. That way you can expect im
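
A sketch of the suggested setup, with one SparkContext shared by the batch code and the StreamingContext rather than two contexts built from the same conf:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("shared-context").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Pass the existing SparkContext, not the conf, so only one context is ever created
    val ssc = new StreamingContext(sc, Seconds(5))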

Re: Shuffle Dependency Casting error

2015-02-05 Thread VISHNU SUBRAMANIAN
Hi, could you share the code snippet? Thanks, Vishnu On Thu, Feb 5, 2015 at 11:22 PM, aanilpala wrote: > Hi, I am working on a text mining project and I want to use > NaiveBayesClassifier of MLlib to classify some stream items. So, I have two > Spark contexts one of which is a streaming contex

Re: Java Kafka Word Count Issue

2015-02-02 Thread VISHNU SUBRAMANIAN
You can use updateStateByKey() to perform the above operation. On Mon, Feb 2, 2015 at 4:29 PM, Jadhav Shweta wrote: > > Hi Sean, > > Kafka Producer is working fine. > This is related to Spark. > > How can i configure spark so that it will make sure to remember count from > the beginning. > > If
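
A sketch of a running word count with updateStateByKey, assuming a socket source on localhost:9999 (hypothetical); stateful operations need a checkpoint directory:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val sc = new SparkContext(new SparkConf().setAppName("stateful-wordcount").setMaster("local[2]"))
    val ssc = new StreamingContext(sc, Seconds(5))
    ssc.checkpoint("/tmp/wordcount-checkpoint")

    // Add this batch's counts to the running total kept per word
    val updateCount: (Seq[Int], Option[Int]) => Option[Int] =
      (newValues, state) => Some(newValues.sum + state.getOrElse(0))

    val counts = ssc.socketTextStream("localhost", 9999)
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .updateStateByKey(updateCount)

    counts.print()
    ssc.start()
    ssc.awaitTermination()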

Re: Failed to save RDD as text file to local file system

2015-01-08 Thread VISHNU SUBRAMANIAN
Looks like it is trying to save the file to HDFS. Check whether you have set any Hadoop path in your configuration. On Fri, Jan 9, 2015 at 12:14 PM, Raghavendra Pandey < raghavendra.pan...@gmail.com> wrote: > Can you check permissions etc as I am able to run > r.saveAsTextFile("file:///home/cloudera/tmp/out
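
For contrast, a short sketch showing how the URI scheme picks the filesystem; without a scheme the path resolves against fs.defaultFS, which on a Hadoop-configured machine is usually HDFS:

    val r = sc.parallelize(1 to 10)

    // Explicit file:// forces the local filesystem (on a cluster, each executor writes its partitions locally)
    r.saveAsTextFile("file:///home/cloudera/tmp/out1")

    // Explicit hdfs:// writes to HDFS regardless of the default
    // r.saveAsTextFile("hdfs:///tmp/out1")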