Re: Spark Streaming withWatermark

2018-02-06 Thread Vishnu Viswanath
…06 11:59:00 <-- this message should not be counted, right?
> however, in my test this one is still counted
>
> On Tue, Feb 6, 2018 at 2:05 PM, Vishnu Viswanath <vishnu.viswanat...@gmail.com> wrote:
>> Yes, that is correct.
>>
>> On Tue, Feb 6, 2…

Re: Spark Streaming withWatermark

2018-02-06 Thread Vishnu Viswanath
Yes, that is correct.

On Tue, Feb 6, 2018 at 4:56 PM, Jiewen Shao <fifistorm...@gmail.com> wrote:
> Vishnu, thanks for the reply. So "event time" and "window end time" have nothing to do with the current
> system timestamp; the watermark moves with the higher value…

Re: Spark Streaming withWatermark

2018-02-06 Thread Vishnu Viswanath
…html#watermark

Thanks, Vishnu

On Tue, Feb 6, 2018 at 12:11 PM, Jiewen Shao <fifistorm...@gmail.com> wrote:
> Sample code:
>
> Let's say Xyz is a POJO with a field called timestamp.
>
> Regarding the code withWatermark("timestamp", "20 seconds"):
>
> I…
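
A minimal runnable sketch (for spark-shell) of the semantics this thread settles on: a record is dropped only once its event time falls behind max(event time seen) minus the 20-second delay, not behind the wall clock. The built-in "rate" test source and the window size are assumptions for illustration.

import org.apache.spark.sql.functions.window
import spark.implicits._

val events = spark.readStream
  .format("rate")                      // emits (timestamp: Timestamp, value: Long)
  .option("rowsPerSecond", "5")
  .load()

val counts = events
  .withWatermark("timestamp", "20 seconds")       // as in the thread
  .groupBy(window($"timestamp", "10 seconds"))    // tumbling event-time windows
  .count()

counts.writeStream.outputMode("update").format("console").start()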

Re: Apache Spark - Spark Structured Streaming - Watermark usage

2018-01-31 Thread Vishnu Viswanath
…is applicable for aggregations only. If you have only a map function and don't want to process late records, you could filter on the event-time field, but I guess you will have to compare it against the processing time, since there is no API for the user to access the watermark. -Vishnu

On Fri, Jan 26…
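
A hedged sketch of that workaround: without an aggregation the watermark never drops rows, so late records are filtered manually against processing time. The DataFrame df, the column name eventTime, and the 20-second threshold are assumptions.

import org.apache.spark.sql.functions.{col, current_timestamp, expr}

// keep only records whose event time is within 20 seconds of the processing clock
val onTime = df.filter(col("eventTime") >= current_timestamp() - expr("INTERVAL 20 SECONDS"))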

Handling skewed data

2017-04-17 Thread Vishnu Viswanath
Hello All, does anyone know if the skew-handling code mentioned in this talk (https://www.youtube.com/watch?v=bhYV0JOPd9Y) was added to Spark? If so, where can I look for more info: a JIRA? A pull request? Thanks in advance. Regards, Vishnu Viswanath.

Printing MLpipeline model in Python.

2016-03-14 Thread VISHNU SUBRAMANIAN
…(feature 3 > 1741.0) If (feature 47 <= 0.0) Predict: 1.0 Else (feature 47 > 0.0) … How can I achieve the same thing using an ML pipelines model? Thanks in advance. Vishnu
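
A Scala sketch (the thread asks about Python; the PySpark calls are analogous) of one way to print the same if/else structure from a fitted pipeline: pull the tree stage out of the PipelineModel and call toDebugString. The tiny dataset is an assumption.

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.{DecisionTreeClassificationModel, DecisionTreeClassifier}
import org.apache.spark.ml.feature.VectorAssembler
import sqlContext.implicits._

val df = Seq((0.0, 0.0), (1.0, 0.0), (10.0, 1.0), (11.0, 1.0)).toDF("f", "label")
val assembler = new VectorAssembler().setInputCols(Array("f")).setOutputCol("features")
val dt = new DecisionTreeClassifier()

val model = new Pipeline().setStages(Array(assembler, dt)).fit(df)
val tree = model.stages(1).asInstanceOf[DecisionTreeClassificationModel]
println(tree.toDebugString)   // prints "If (feature 0 <= ...) Predict: ..." as above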

Re: Installing Spark on Mac

2016-03-04 Thread Vishnu Viswanath
Installing Spark on a Mac is similar to installing it on Linux. I use a Mac and have written a blog post on how to install Spark; here is the link: http://vishnuviswanath.com/spark_start.html. Hope this helps.

On Fri, Mar 4, 2016 at 2:29 PM, Simon Hafner wrote:
> I'd try…

Re: Question about MEMORY_AND_DISK persistence

2016-02-28 Thread Vishnu Viswanath
Thank you Ashwin.

On Sun, Feb 28, 2016 at 7:19 PM, Ashwin Giridharan <ashwin.fo...@gmail.com> wrote:
> Hi Vishnu,
> A partition will be either fully in memory or fully on disk.
> -Ashwin
> On Feb 28, 2016 15:09, "Vishnu Viswanath" <vishnu.viswanat...@gmail.co…

Question about MEMORY_AND_DISK persistence

2016-02-28 Thread Vishnu Viswanath
…or will the whole second partition be kept on disk? Regards, Vishnu
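
A small sketch of the behavior this thread pins down: with MEMORY_AND_DISK each partition is stored as a unit, either in memory or, when it does not fit, spilled entirely to disk, never half and half. Sizes and partition count are assumptions; sc is the usual shell SparkContext.

import org.apache.spark.storage.StorageLevel

val rdd = sc.parallelize(1 to 1000000, 4)    // 4 partitions
rdd.persist(StorageLevel.MEMORY_AND_DISK)    // a partition that can't fit in memory goes to disk whole
rdd.count()                                  // materializes (and caches) the partitions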

Question on RDD caching

2016-02-04 Thread Vishnu Viswanath
…will store the partition in memory. 4. Therefore, each node can have partitions of different RDDs in its cache. Can someone please tell me if I am correct? Thanks and Regards, Vishnu Viswanath

Re: how to convert millisecond time to SQL timestamp

2016-02-01 Thread VISHNU SUBRAMANIAN
Hi, if you need a DataFrame-specific solution, you can try the below:

df.select(from_unixtime(col("max(utcTimestamp)")/1000))

On Tue, 2 Feb 2016 at 09:44 Ted Yu wrote:
> See related thread on using Joda DateTime:
> http://search-hadoop.com/m/q3RTtSfi342nveex1=RE+NPE+ …
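
A self-contained sketch of that conversion: from_unixtime expects seconds, hence the division by 1000 to go from epoch milliseconds to a SQL timestamp. The column name and sample value are assumptions.

import org.apache.spark.sql.functions.{col, from_unixtime}
import sqlContext.implicits._

val df = Seq(1454284800000L).toDF("utcTimestamp")    // epoch millis
df.select(from_unixtime(col("utcTimestamp") / 1000).alias("ts")).show()
// -> 2016-02-01 00:00:00, rendered in the session time zone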

Re: How to accelerate reading json file?

2016-01-05 Thread VISHNU SUBRAMANIAN
Hi, you can try this:

sqlContext.read.format("json").option("samplingRatio", "0.1").load("path")

If it still takes time, feel free to experiment with the samplingRatio. Thanks, Vishnu

On Wed, Jan 6, 2016 at 12:43 PM, Gavin Yue <yue.yuany...@gmail.co…
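
A hedged companion to the samplingRatio tip: supplying an explicit schema skips JSON schema inference altogether, which is usually the bigger win. The field names are assumptions.

import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

val schema = StructType(Array(
  StructField("id", LongType, true),
  StructField("name", StringType, true)))

val df = sqlContext.read.format("json").schema(schema).load("path")   // no inference pass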

Re: custom schema in spark throwing error

2015-12-21 Thread VISHNU SUBRAMANIAN
Try this:

val customSchema = StructType(Array(
  StructField("year", IntegerType, true),
  StructField("make", StringType, true),
  StructField("model", StringType, true)
))

On Mon, Dec 21, 2015 at 8:26 AM, Divya Gehlot wrote:
> 1. scala> import…
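
The snippet above also needs these imports in scope (most likely truncated out of the archived message):

import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}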

Re: Spark ML Random Forest output.

2015-12-04 Thread Vishnu Viswanath
…(label to index) and (index to label), and use this to get back your original label. Maybe there is a better way to do this. Regards, Vishnu

On Fri, Dec 4, 2015 at 4:56 PM, Eugene Morozov <evgeny.a.moro...@gmail.com> wrote:
> Hello,
> I've got an input dataset of handwritten digits a…
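
A sketch of a built-in way to keep those two maps: IndexToString reverses a fitted StringIndexer using the labels it stored. The trainDF and predictions DataFrames and the column names are assumptions.

import org.apache.spark.ml.feature.{IndexToString, StringIndexer}

val indexerModel = new StringIndexer()
  .setInputCol("label").setOutputCol("labelIndex").fit(trainDF)

val converter = new IndexToString()
  .setInputCol("prediction")
  .setOutputCol("predictedLabel")
  .setLabels(indexerModel.labels)            // index -> original label

val withLabels = converter.transform(predictions)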

Re: General question on using StringIndexer in SparkML

2015-12-02 Thread Vishnu Viswanath
Thank you Yanbo. It looks like this is available only in version 1.6. Can you tell me how/when I can download version 1.6? Thanks and Regards, Vishnu Viswanath

On Wed, Dec 2, 2015 at 4:37 AM, Yanbo Liang <yblia...@gmail.com> wrote:
> You can set "handleInvalid" to "s…
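
The truncated value Yanbo quotes is evidently "skip", the option StringIndexer gained in Spark 1.6. A sketch under that assumption (DataFrame and column names are also assumptions):

import org.apache.spark.ml.feature.StringIndexer

val model = new StringIndexer()
  .setInputCol("category")
  .setOutputCol("categoryIndex")
  .setHandleInvalid("skip")      // drop rows whose label was never seen at fit() time
  .fit(trainDF)

val indexedTest = model.transform(testDF)   // unseen labels are filtered out instead of failing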

Re: General question on using StringIndexer in SparkML

2015-12-02 Thread Vishnu Viswanath
Thank you.

On Wed, Dec 2, 2015 at 8:12 PM, Yanbo Liang <yblia...@gmail.com> wrote:
> You can get 1.6.0-RC1 from
> http://people.apache.org/~pwendell/spark-releases/spark-v1.6.0-rc1-bin/
> currently, but it's not the final release version.
>
> 2015-12-02 23:57 GMT+0…

Re: General question on using StringIndexer in SparkML

2015-12-01 Thread Vishnu Viswanath
…is: for the column on which I am doing StringIndexing, the test data has values that were not present in the train data. Since fit() is done only on the train data, the indexing fails. Can you suggest what can be done in this situation? Thanks,

On Mon, Nov 30, 2015 at 12:32 AM, Vishnu Viswanath…

Re: General question on using StringIndexer in SparkML

2015-11-29 Thread Vishnu Viswanath
…and do the prediction. Thanks and Regards, Vishnu Viswanath

On Sun, Nov 29, 2015 at 8:14 AM, Yanbo Liang <yblia...@gmail.com> wrote:
> Hi Vishnu,
> The string-to-index map is generated at the model training step and
> used at the model prediction step.
> It means that the s…
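
A sketch of the point Yanbo is making: fit the StringIndexer once, on the training data, and reuse the same fitted model at prediction time so the string-to-index map stays stable. DataFrame and column names are assumptions.

import org.apache.spark.ml.feature.StringIndexer

val indexerModel = new StringIndexer()
  .setInputCol("category").setOutputCol("categoryIndex")
  .fit(trainDF)                                        // the map is frozen here

val trainIndexed = indexerModel.transform(trainDF)
val testIndexed  = indexerModel.transform(testDF)      // same mapping applied to test data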

Re: General question on using StringIndexer in SparkML

2015-11-29 Thread Vishnu Viswanath
…this section of the Spark ML doc:
> http://spark.apache.org/docs/latest/ml-guide.html#how-it-works
>
> On Mon, Nov 30, 2015 at 12:52 AM, Vishnu Viswanath <vishnu.viswanat...@gmail.com> wrote:
>> Thanks for the reply Yanbo.
>> I understand that the mod…

General question on using StringIndexer in SparkML

2015-11-28 Thread Vishnu Viswanath
…and A. So the StringIndexer will assign indices as C = 0.0, B = 1.0, A = 2.0. These indices are different from the ones we used for modeling, so won't this give me a wrong prediction if I use StringIndexer? -- Thanks and Regards, Vishnu Viswanath, www.vishnuviswanath.com

Adding new column to Dataframe

2015-11-25 Thread Vishnu Viswanath
n("line2",df2("line")) org.apache.spark.sql.AnalysisException: resolved attribute(s) line#2330 missing from line#2326 in operator !Project [line#2326,line#2330 AS line2#2331]; ​ Thanks and Regards, Vishnu Viswanath *www.vishnuviswanath.com <http://www.vishnuviswanath.com>*

Re: Adding new column to Dataframe

2015-11-25 Thread Vishnu Viswanath
…Yes, that's why I thought of adding a row number to both DataFrames and joining them on the row number. Is there a better way of doing this? Both DataFrames will always have the same number of rows, but they are not related by any column on which to join. Thanks and Regards, Vishnu Viswanath

On Wed, Nov 25, 201…

Re: Adding new column to Dataframe

2015-11-25 Thread Vishnu Viswanath
…Ted Yu <yuzhih...@gmail.com> wrote:
Vishnu:
> rowNumber (deprecated, replaced with row_number) is a window function.
> * Window function: returns a sequential number starting at 1 within a
> window partition.
> * @group window_funcs
> * @since 1.6.0…
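
A hedged sketch of the row-number-and-join idea from this thread, using RDD.zipWithIndex, which, unlike the row_number window function quoted above, needs no window partition. It assumes two same-length DataFrames df1 and df2 that each carry a "line" column.

import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

def withRowIndex(df: DataFrame): DataFrame = {
  val rows = df.rdd.zipWithIndex.map { case (row, idx) => Row.fromSeq(row.toSeq :+ idx) }
  val schema = StructType(df.schema.fields :+ StructField("rowIdx", LongType, false))
  df.sqlContext.createDataFrame(rows, schema)
}

val joined = withRowIndex(df1)
  .join(withRowIndex(df2).withColumnRenamed("line", "line2"), "rowIdx")
  .drop("rowIdx")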

how to use DataFrame.na.fill based on condition

2015-11-23 Thread Vishnu Viswanath
Hi, can someone tell me if there is a way to use the fill method in DataFrameNaFunctions based on some condition, e.g.:

df.na.fill("value1", "column1", "condition1")
df.na.fill("value2", "column1", "condition2")

I want to fill nulls in column1 with values, either value1 or value2,…

Re: how to use DataFrame.na.fill based on condition

2015-11-23 Thread Vishnu Viswanath
Thanks for the reply, Davies. I think replace replaces one value with another value, but what I want to do is fill in the null values of a column (I don't have a to_replace here). Regards, Vishnu

On Mon, Nov 23, 2015 at 1:37 PM, Davies Liu <dav...@databricks.com> wrote:
> DataFram…
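
na.fill takes no condition argument, so a hedged alternative to the calls sketched in the question: choose the fill value per row with when/otherwise and keep existing non-null values via coalesce. The condition column "column2" and the values are assumptions.

import org.apache.spark.sql.functions.{coalesce, col, lit, when}

val filled = df.withColumn("column1",
  coalesce(
    col("column1"),                                          // keep existing non-nulls
    when(col("column2") === "condition1", lit("value1"))
      .otherwise(lit("value2"))))                            // fill nulls conditionally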

How VectorIndexer works in Spark ML pipelines

2015-10-15 Thread VISHNU SUBRAMANIAN
…2.0,252.0,253.0,35.0,73.0,252.0,252.0,253.0,35.0,31.0,211.0,252.0,253.0,35.0])] I can't understand what is happening. I tried with simple data sets also, but got a similar result. Please help. Thanks, Vishnu
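
A sketch of what VectorIndexer actually does, which may explain the surprising output: any feature with at most maxCategories distinct values is treated as categorical and its raw values are replaced by category indices; the rest pass through unchanged. The input DataFrame df of vectors is an assumption.

import org.apache.spark.ml.feature.VectorIndexer

val indexer = new VectorIndexer()
  .setInputCol("features")
  .setOutputCol("indexed")
  .setMaxCategories(10)    // features with more distinct values stay continuous

val indexedDF = indexer.fit(df).transform(df)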

Re: UDF in spark

2015-07-08 Thread VISHNU SUBRAMANIAN
Hi Vinod, yes. If you want to use a Scala or Python function, you need the block of code; only Hive UDFs are available permanently. Thanks, Vishnu

On Wed, Jul 8, 2015 at 5:17 PM, vinod kumar <vinodsachin...@gmail.com> wrote:
> Thanks Vishnu. When I restart the service, the UDF is not accessible…

Re: UDF in spark

2015-07-08 Thread VISHNU SUBRAMANIAN
Hi,

sqlContext.udf.register("udfname", functionname _)

Example:

def square(x: Int): Int = x * x

Register the UDF as below:

sqlContext.udf.register("square", square _)

Thanks, Vishnu

On Wed, Jul 8, 2015 at 2:23 PM, vinod kumar <vinodsachin...@gmail.com> wrote:
> Hi Everyone, I am new to Spark. May I…
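
A self-contained sketch of that registration, callable from SQL (the toy table is an assumption):

import sqlContext.implicits._

def square(x: Int): Int = x * x
sqlContext.udf.register("square", square _)

val nums = Seq(1, 2, 3).toDF("x")
nums.registerTempTable("nums")
sqlContext.sql("SELECT x, square(x) AS sq FROM nums").show()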

Re: used cores are less then total no. of core

2015-02-24 Thread VISHNU SUBRAMANIAN
Try adding --total-executor-cores 5, where 5 is the number of cores. Thanks, Vishnu

On Wed, Feb 25, 2015 at 11:52 AM, Somnath Pandeya <somnath_pand...@infosys.com> wrote:
> Hi All, I am running a simple word-count example on Spark (standalone cluster), and in the UI it is showing, for each…

Re: Running Example Spark Program

2015-02-22 Thread VISHNU SUBRAMANIAN
Try restarting your Spark cluster:

./sbin/stop-all.sh
./sbin/start-all.sh

Thanks, Vishnu

On Sun, Feb 22, 2015 at 7:30 PM, Surendran Duraisamy <2013ht12...@wilp.bits-pilani.ac.in> wrote:
> Hello All, I am new to Apache Spark. I am trying to run JavaKMeans.java from the Spark examples on my Ubuntu…

Re: Hive/Hbase for low latency

2015-02-11 Thread VISHNU SUBRAMANIAN
Hi Siddharth, it depends on what exactly you are trying to solve, but the connectivity between Cassandra and Spark is good. Thanks, Vishnu

On Wed, Feb 11, 2015 at 7:47 PM, Siddharth Ubale <siddharth.ub...@syncoms.com> wrote:
> Hi, I am new…

Re: Question related to Spark SQL

2015-02-11 Thread VISHNU SUBRAMANIAN
…thrift server. Thanks, Vishnu

On Wed, Feb 11, 2015 at 10:31 PM, Ashish Mukherjee <ashish.mukher...@gmail.com> wrote:
> Thanks for your reply, Vishnu. I assume you are suggesting I build Hive tables, cache them in memory, and query on top of that for fast, real-time querying. Perhaps I should…

Re: Re: How can I read this avro file using spark scala?

2015-02-11 Thread VISHNU SUBRAMANIAN
Check this link: https://github.com/databricks/spark-avro (the home page of the spark-avro project). Thanks, Vishnu

On Wed, Feb 11, 2015 at 10:19 PM, Todd <bit1...@163.com> wrote:
> Databricks provides sample code on its website, but I can't find it right now.
> At 2015-02-12 00:43:07, captainfranz…
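
A hedged one-liner using the package linked above (requires the com.databricks:spark-avro artifact on the classpath; the path is an assumption):

val df = sqlContext.read.format("com.databricks.spark.avro").load("/path/to/data.avro")
df.printSchema()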

Re: getting the cluster elements from kmeans run

2015-02-11 Thread VISHNU SUBRAMANIAN
You can use model.predict(point); that gives you the index of the cluster the point belongs to, which you can pair with the point:

rdd.map(x => (x, model.predict(x)))

Thanks, Vishnu

On Wed, Feb 11, 2015 at 11:06 PM, Harini Srinivasan <har...@us.ibm.com> wrote:
> Hi, is there a way to get the elements of each cluster after…
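
A runnable sketch of that suggestion, grouping points by their predicted cluster; the toy points are assumptions and sc is an existing SparkContext:

import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

val points = sc.parallelize(Seq(
  Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.1),
  Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 9.1)))

val model = KMeans.train(points, k = 2, maxIterations = 20)
points.map(p => (model.predict(p), p)).groupByKey().collect().foreach {
  case (cluster, members) => println(s"cluster $cluster: ${members.mkString(", ")}")
}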

Re: NaiveBayes classifier causes ShuffleDependency class cast exception

2015-02-06 Thread VISHNU SUBRAMANIAN
Can you try creating just a single SparkContext and then running your code? If you want to use it for streaming, pass the same SparkContext object instead of a conf. Note: instead of replying only to me, please use reply-all so the post is visible to the community; that way you can expect…
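
A sketch of that advice: build the StreamingContext from the one existing SparkContext rather than creating a second context from a conf. App name and batch interval are assumptions.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

val sc  = new SparkContext(new SparkConf().setAppName("app").setMaster("local[2]"))
val ssc = new StreamingContext(sc, Seconds(1))   // reuses the same SparkContext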

Re: Shuffle Dependency Casting error

2015-02-05 Thread VISHNU SUBRAMANIAN
Hi, could you share the code snippet? Thanks, Vishnu

On Thu, Feb 5, 2015 at 11:22 PM, aanilpala <aanilp...@gmail.com> wrote:
> Hi, I am working on a text mining project and I want to use the NaiveBayesClassifier of MLlib to classify some stream items. So I have two Spark contexts, one of which…

Re: Java Kafka Word Count Issue

2015-02-02 Thread VISHNU SUBRAMANIAN
You can use updateStateByKey() to perform the above operation.

On Mon, Feb 2, 2015 at 4:29 PM, Jadhav Shweta <jadhav.shw...@tcs.com> wrote:
> Hi Sean, the Kafka producer is working fine; this is related to Spark. How can I configure Spark so that it will remember the count from the…
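
A minimal sketch of updateStateByKey keeping a running word count across batches. The socket source, port, batch interval, and checkpoint path are assumptions; sc is an existing SparkContext.

import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sc, Seconds(5))
ssc.checkpoint("/tmp/wc-checkpoint")               // stateful operations require checkpointing

val counts = ssc.socketTextStream("localhost", 9999)
  .flatMap(_.split(" "))
  .map((_, 1))
  .updateStateByKey[Int]((newValues: Seq[Int], state: Option[Int]) =>
    Some(newValues.sum + state.getOrElse(0)))      // carry the count across batches

counts.print()
ssc.start()
ssc.awaitTermination()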

Re: Failed to save RDD as text file to local file system

2015-01-08 Thread VISHNU SUBRAMANIAN
It looks like it is trying to save the file in HDFS. Check if you have set any Hadoop path in your system.

On Fri, Jan 9, 2015 at 12:14 PM, Raghavendra Pandey <raghavendra.pan...@gmail.com> wrote:
> Can you check permissions etc.? I am able to run r.saveAsTextFile("file:///home/cloudera/tmp/out1")…

how to do incremental model updates using spark streaming and mllib

2014-12-25 Thread vishnu
…an incrementally updating model, how do I do it? Thanks, Vishnu

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/how-to-do-incremental-model-updates-using-spark-streaming-and-mllib-tp20862.html
Sent from the Apache Spark User List mailing list archive at Nabble.com
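
A hedged sketch of one built-in answer to this question: MLlib's streaming learners (here StreamingLinearRegressionWithSGD; StreamingKMeans works the same way for clustering) update the model on every new batch. The input path, batch interval, and feature dimension are assumptions; sc is an existing SparkContext.

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, StreamingLinearRegressionWithSGD}
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sc, Seconds(10))
val training = ssc.textFileStream("/tmp/train").map(LabeledPoint.parse)

val model = new StreamingLinearRegressionWithSGD()
  .setInitialWeights(Vectors.zeros(3))    // must match the feature dimension

model.trainOn(training)                   // weights are updated incrementally per batch
ssc.start()
ssc.awaitTermination()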