Re: Save DataFrame to HBase

2016-04-27 Thread Benjamin Kim
Hi Ted, Do you know when the release will be? I also see some documentation for usage of the hbase-spark module at the hbase website. But, I don’t see an example on how to save data. There is only one for reading/querying data. Will this be added when the final version does get released? Thank

Re: Save DataFrame to HBase

2016-04-27 Thread Benjamin Kim
Daniel, If you can get the code snippet, that would be great! I’ve been trying to get it to work for me as well. The examples on the Phoenix website do not work for me. If you are willing, can you also include your setup for making Phoenix work with Spark? Thanks, Ben > On Apr 27, 2016, at 11

spark-ts

2016-04-27 Thread Bhupendra Mishra
Guys, please help me with the following question on the Spark-TS library: You’ve just acquired a new dataset showing the purchases of stock from market resellers during the day over a ten-month period. You’ve looked at the daily data and have decided that you can model this using a time series analysis. Y

AuthorizationException while exposing via JDBC client (beeline)

2016-04-27 Thread ram kumar
Hi, I wrote a Spark job which registers a temp table, and when I expose it via beeline (JDBC client): $ ./bin/beeline beeline> !connect jdbc:hive2://IP:10003 -n ram -p 0: jdbc:hive2://IP> show tables; +-+--+-

Re: what should I do when spark ut hang?

2016-04-27 Thread Ted Yu
Did you have a chance to take jstack when VersionsSuite was running ? You can use the following command to run the test: sbt/sbt "test-only org.apache.spark.sql.hive.client.VersionsSuite" On Wed, Apr 27, 2016 at 9:01 PM, Demon King wrote: > Hi, all: >I compile spark-1.6.1 in redhat 5.

insert into a partition table take a long time

2016-04-27 Thread 谭成灶
Hello Sir/Madam, I want to insert into a partition table using dynamic partitioning (about 300G, destination table created in ORC format), but the stage "get_partition_with_auth" takes a long time, even though I have set hive.exec.dynamic.partition=true hive.exec.dynamic.partition.mode="nonstrict"

what should I do when spark ut hang?

2016-04-27 Thread Demon King
Hi, all: I am compiling spark-1.6.1 on Red Hat 5.7 (I have installed spark-1.6.0-cdh5.7.0, hive-1.1.0+cdh5.7.0 and hadoop-2.6.0+cdh5.7.0 on this machine). My compile cmd is: build/mvn --force -Psparkr -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.7.0 -Phive -Phive-thriftserver when I u

Re: Compute

2016-04-27 Thread Karl Higley
You're right that there's some duplicate distance computations happening in the implementation I mentioned. I ran into the kinds of issues you describe, and I ended up accepting the duplicate computational work in exchange for significantly reduced memory usage. I couldn't figure out a way to avoid

Re: Cant join same dataframe twice ?

2016-04-27 Thread Divya Gehlot
When working with DataFrames and using explain to debug, I observed that Spark gives a different tag number to the same DataFrame columns. Like in this case: val df1 = df2.join(df3,"Column1") The line below then throws an error about missing columns: val df4 = df1.join(df3,"Column2") For instance, df2 has 2 columns,
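
One common workaround, sketched below with hypothetical frames standing in for df2/df3 (column names follow the question): rename or re-alias the columns of the re-used frame before the second join, so the analyzer never sees two attributes with the same name and lineage.

    import sqlContext.implicits._   // sqlContext as provided by spark-shell in 1.x

    // Hypothetical stand-ins for the frames in the question.
    val df2 = Seq((1, "x"), (2, "y")).toDF("Column1", "Column2")
    val df3 = Seq((1, "x"), (2, "z")).toDF("Column1", "Column2")

    val df1 = df2.join(df3.select("Column1"), "Column1")

    // Give the re-used frame fresh column names before joining it again.
    val df3b = df3.toDF("Column1_b", "Column2_b")
    val df4  = df1.join(df3b, df1("Column2") === df3b("Column2_b"))
    df4.show()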

Re: Compute

2016-04-27 Thread nguyen duc tuan
I have seen this implementation before. The problem here is that if, after several hashes, a pair of points appears K times in a bucket (with respect to K hashes), the distance is computed K times, and in total the data to be shuffled grows up to K times. So it reduces to my problem. I'm trying

Re: Compute

2016-04-27 Thread Karl Higley
One idea is to avoid materializing the pairs of points before computing the distances between them. You could do that using the LSH signatures by building (Signature, (Int, Vector)) tuples, grouping by signature, and then iterating pairwise over the resulting lists of points to compute the distance
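
A minimal sketch of that approach, with the hash and distance functions left as placeholders (nothing here is the poster's actual code): bucket points by signature and compute distances pairwise inside each bucket instead of materializing candidate pairs first.

    import org.apache.spark.mllib.linalg.Vector
    import org.apache.spark.rdd.RDD

    // points: (id, vector); hashSignature and distance are placeholders for the LSH pieces.
    def candidateDistances(points: RDD[(Int, Vector)],
                           hashSignature: Vector => String,
                           distance: (Vector, Vector) => Double): RDD[((Int, Int), Double)] = {
      points
        .map { case (id, v) => (hashSignature(v), (id, v)) }   // (signature, (id, vector))
        .groupByKey()                                          // bucket by signature
        .flatMap { case (_, bucket) =>
          val pts = bucket.toArray
          for {
            i <- pts.indices.iterator
            j <- (i + 1 until pts.length).iterator
          } yield ((pts(i)._1, pts(j)._1), distance(pts(i)._2, pts(j)._2))
        }
    }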

Compute

2016-04-27 Thread nguyen duc tuan
Hi all, Currently, I'm working on implementing LSH on Spark. It leads to the following problem: I have an RDD[(Int, Int)] that stores all pairs of ids of vectors whose distance needs to be computed, and another RDD[(Int, Vector)] that stores all vectors with their ids. Can anyone suggest an efficient way to compute
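
One straightforward (if shuffle-heavy) sketch under the stated types, with the distance metric left as a placeholder: join the pair RDD against the vector RDD once per side, then apply the metric.

    import org.apache.spark.mllib.linalg.Vector
    import org.apache.spark.rdd.RDD

    // pairs: ids of vectors to compare; vectors: (id, vector); distance is a placeholder.
    def pairDistances(pairs: RDD[(Int, Int)],
                      vectors: RDD[(Int, Vector)],
                      distance: (Vector, Vector) => Double): RDD[((Int, Int), Double)] = {
      pairs
        .join(vectors)                                         // (idA, (idB, vecA))
        .map { case (idA, (idB, vecA)) => (idB, (idA, vecA)) }
        .join(vectors)                                         // (idB, ((idA, vecA), vecB))
        .map { case (idB, ((idA, vecA), vecB)) => ((idA, idB), distance(vecA, vecB)) }
    }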

Fwd: Spark support for Complex Event Processing (CEP)

2016-04-27 Thread Michael Segel
Doh! Wrong email account again! > Begin forwarded message: > > From: Michael Segel > Subject: Re: Spark support for Complex Event Processing (CEP) > Date: April 27, 2016 at 7:16:55 PM CDT > To: Mich Talebzadeh > Cc: Esa Heikkinen , "user@spark" > > > Uhm… > I think you need to clarify a

Spark executor crashes when the tasks are cancelled

2016-04-27 Thread Kiran Chitturi
Hi, We are seeing this issue with Spark 1.6.1. The executor is exiting when one of the running tasks is cancelled. The executor log shows the error below before crashing. 16/04/27 16:34:13 ERROR SparkUncaughtExceptionHandler: [Container in > shutdown] Uncaught exception in thread Thread[Execu

Re: Spark support for Complex Event Processing (CEP)

2016-04-27 Thread Mich Talebzadeh
A couple of things. There is no such thing as Continuous Data Streaming, just as there is no such thing as Continuous Availability. There is such a thing as Discrete Data Streaming and High Availability, but they only reduce the finite unavailability to a minimum. In terms of business needs, 5 SIGMA is good eno

Re: EOFException while reading from HDFS

2016-04-27 Thread Bibudh Lahiri
Hi, I installed Hadoop 2.6.0 today on one of the machines (172.26.49.156), got HDFS running on it (both Namenode and Datanode on the same machine) and copied the files to HDFS. However, from the same machine, when I try to load the same CSV with the following statement: sqlContext.read.format(

Re: Spark support for Complex Event Processing (CEP)

2016-04-27 Thread Esa Heikkinen
Hi, Thanks for the answer. I have developed a log file analyzer for an RTPIS (Real Time Passenger Information System), where buses drive their lines and the system tries to estimate the arrival times at the bus stops. There are many different log files (and events), and the analysis situation can be

Re: Transformation question

2016-04-27 Thread Mathieu Longtin
I would make a DataFrame (or DataSet) out of the RDD and use SQL join. On Wed, Apr 27, 2016 at 2:50 PM Eduardo wrote: > Is there a way to write a transformation that for each entry of an RDD > uses certain other values of another RDD? As an example, imagine you have an > RDD of entries to predict a
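
A rough sketch of that suggestion for Spark 1.6, with made-up case classes and data standing in for the two RDDs from the question (sc is the SparkContext from spark-shell):

    import org.apache.spark.sql.SQLContext

    // Hypothetical shapes: entries to score and historical records sharing a key.
    case class Entry(key: String, feature: Double)
    case class History(key: String, label: Double)

    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val entriesDF = sc.parallelize(Seq(Entry("a", 1.0), Entry("b", 2.0))).toDF()
    val historyDF = sc.parallelize(Seq(History("a", 0.0), History("a", 1.0))).toDF()

    // Each entry paired with the historical rows that share its key.
    val joined = entriesDF.join(historyDF, "key")
    joined.show()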

Re: How can I bucketize / group a DataFrame from parquet files?

2016-04-27 Thread Michael Armbrust
Unfortunately, I don't think there is an easy way to do this in 1.6. In Spark 2.0 we will make DataFrame = Dataset[Row], so this should work out of the box. On Mon, Apr 25, 2016 at 11:08 PM, Brandon White wrote: > I am creating a dataFrame from parquet files. The schema is based on the > parque

Re: getting ClassCastException when calling UDF

2016-04-27 Thread Paras sachdeva
Hi, Can you try the below? We are registering using org.apache.spark.sql.functions.udf: def myUDF(wgts: Int, amnt: Float) = (wgts * amnt) / 100.asInstanceOf[Float] val myUdf = udf(myUDF(_: Int, _: Float)) > Now you can invoke the function directly in Spark SQL or outside. Thanks, Paras Sachdeva On Wed, Ap
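
A cleaned-up sketch of that suggestion against Spark 1.5/1.6, assuming the df1/RATE/AMOUNT names from the original question; functions.udf wraps the lambda so it can be applied directly in withColumn.

    import org.apache.spark.sql.functions.{udf, col}

    // Same computation as the question; 100f keeps the division in Float.
    val calWghts = udf { (wgts: Int, amnt: Float) => (wgts * amnt) / 100f }
    val df2 = df1.withColumn("WEIGHTED_AMOUNT", calWghts(col("RATE"), col("AMOUNT")))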

Transformation question

2016-04-27 Thread Eduardo
Is there a way to write a transformation that, for each entry of an RDD, uses certain other values of another RDD? As an example, imagine you have an RDD of entries for which to predict a certain label. In a second RDD, you have historical data. So for each entry in the first RDD, you want to find similar entries

Re: Save DataFrame to HBase

2016-04-27 Thread Paras sachdeva
Hi Daniel, Would you possibly be able to share the snippet of code you have used? Thank you. On Wed, Apr 27, 2016 at 3:13 PM, Daniel Haviv < daniel.ha...@veracity-group.com> wrote: > Hi Benjamin, > Yes it should work. > > Let me know if you need further assistance I might be able to get the co

Error running spark-sql-perf version 0.3.2 against Spark 1.6

2016-04-27 Thread Michael Slavitch
Hello; I'm trying to run spark-sql-perf version 0.3.2 (hash cb0347b) against Spark 1.6. I get the following when running ./bin/run --benchmark DatsetPerformance: Exception in thread "main" java.lang.ClassNotFoundException: com.databricks.spark.sql.perf.DatsetPerformance Even though the cl

Dataframe fails for large resultsize

2016-04-27 Thread Buntu Dev
I have 14GB of parquet data, and when I try to apply ORDER BY using Spark SQL and save the first 1M rows, it keeps failing with "Connection reset by peer: socket write error" on the executors. I've allocated about 10g to both the driver and the executors, along with setting maxResultSize to 10g, but

Re: What is the default value of rebalance.backoff.ms in Spark Kafka Direct?

2016-04-27 Thread Cody Koeninger
Seems like it'd be better to look into the Kafka side of things to determine why you're losing leaders frequently, as opposed to trying to put a bandaid on it. On Wed, Apr 27, 2016 at 11:49 AM, SRK wrote: > Hi, > > We seem to be getting a lot of LeaderLostExceptions and our source Stream is > wor

unsubscribe

2016-04-27 Thread Varanasi, Venkata

What is the default value of rebalance.backoff.ms in Spark Kafka Direct?

2016-04-27 Thread SRK
Hi, We seem to be getting a lot of LeaderLostExceptions and our source Stream is working with a default value of rebalance.backoff.ms which is 2000. I was thinking of increasing this value to 5000. Any suggestions on this? Thanks! -- View this message in context: http://apache-spark-user-list.
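
If the goal is just to try a larger backoff, a hedged sketch of where that property would be passed with the Spark 1.6 / Kafka 0.8 direct API (broker list, topic and ssc are placeholders); note the direct API does no consumer-group rebalancing, so, as the reply above suggests, this setting may not address the underlying leader loss.

    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils

    // Placeholders: broker list, topic, and an existing StreamingContext ssc.
    val kafkaParams = Map[String, String](
      "metadata.broker.list" -> "broker1:9092,broker2:9092",
      "rebalance.backoff.ms" -> "5000"   // raised from the 2000 ms default
    )
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("my_topic"))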

Re: Streaming K-means not printing predictions

2016-04-27 Thread Ashutosh Kumar
The problem seems to be that streamingContext.textFileStream(path) is not reading the file at all. It does not throw any exception either. I tried some tricks given in mailing lists, like copying the file to the specified directory after the start of the program, touching the file to change its timestamp, etc., but no luck. Th

Re: Cant join same dataframe twice ?

2016-04-27 Thread Ted Yu
I wonder if Spark can provide better support for this case. The following schema is not user friendly (shown previously): StructField(b,IntegerType,false), StructField(b,IntegerType,false) Except for 'select *', there is no way for the user to query either of the two fields. On Tue, Apr 26, 2016 at 10

Spark Metrics for Ganglia

2016-04-27 Thread Khalid Latif
I built Apache Spark on Ubuntu 14.04 LTS with the following command: mvn -Pspark-ganglia-lgpl -Pyarn -Phadoop-2.4 -Dhadoop.version=2.5.0 -DskipTests clean package Build was successful. Then, the following modifications were made. 1. Included "SPARK_LOCAL_IP=127.0.0.1" in the file $SPARK_HOME
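
For reference, a sketch of the Ganglia sink section that typically goes into $SPARK_HOME/conf/metrics.properties once the spark-ganglia-lgpl profile is built in; host, port and mode below are placeholders that have to match the local gmond setup.

    *.sink.ganglia.class=org.apache.spark.metrics.sink.GangliaSink
    *.sink.ganglia.host=127.0.0.1
    *.sink.ganglia.port=8649
    *.sink.ganglia.period=10
    *.sink.ganglia.unit=seconds
    *.sink.ganglia.mode=multicast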

Re: executor delay in Spark

2016-04-27 Thread Mike Hynes
Hi Raghava, I'm terribly sorry about the end of my last email; that garbled sentence was garbled because it wasn't meant to exist; I wrote it on my phone, realized I wouldn't realistically have time to look into another set of logs deeply enough, and then mistook myself for having deleted it. Agai

unsubscribe

2016-04-27 Thread Burger, Robert

unsubscribe

2016-04-27 Thread Harjit Singh

Fwd: Spark support for Complex Event Processing (CEP)

2016-04-27 Thread Michael Segel
Sorry sent from wrong email address. > Begin forwarded message: > > From: Michael Segel > Subject: Re: Spark support for Complex Event Processing (CEP) > Date: April 27, 2016 at 7:51:14 AM CDT > To: Mich Talebzadeh > Cc: Esa Heikkinen , "user @spark" > > > Spark and CEP? It depends… > > O

Re: Spark support for Complex Event Processing (CEP)

2016-04-27 Thread Mich Talebzadeh
please see my other reply Dr Mich Talebzadeh On 27 April 2016 at 10:40, Esa

Re: Spark 1.6.1 throws error: Did not find registered driver with class oracle.jdbc.OracleDriver

2016-04-27 Thread Jeff Zhang
Could you check the executor log to find the full stack trace? On Tue, Apr 26, 2016 at 12:30 AM, Mich Talebzadeh wrote: > Hi, > > This JDBC connection was working fine in Spark 1.5.2 > > val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc) > val sqlContext = new HiveContext(sc) >
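
As an aside, a minimal sketch of the 1.6 JDBC reader with the driver class given explicitly, which is one common way to make sure the right driver is registered on the executors; URL, table and credentials are placeholders, and the ojdbc jar still has to reach both driver and executor classpaths (e.g. via --jars).

    val df = sqlContext.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//db-host:1521/ORCL")   // placeholder URL
      .option("driver", "oracle.jdbc.OracleDriver")
      .option("dbtable", "SCOTT.EMP")                           // placeholder table
      .option("user", "scott")
      .option("password", "tiger")
      .load()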

Re: error: value toDS is not a member of Seq[Int] SQL

2016-04-27 Thread Sachin Aggarwal
For me it works without making any other change; try importing sqlContext.implicits._. Otherwise verify whether you are able to run other functions, or whether you have some issue with your setup. (spark-shell welcome banner)
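
A minimal spark-shell sketch of that fix for Spark 1.6 (sqlContext is already provided by the shell):

    // The implicits import is what brings toDS() into scope for local Seqs.
    import sqlContext.implicits._
    val ds = Seq(1, 2, 3).toDS()
    ds.map(_ + 1).collect()   // Array(2, 3, 4)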

Re: user Digest 27 Apr 2016 09:42:14 -0000 Issue 6581

2016-04-27 Thread Mich Talebzadeh
Hi Esa, I am trying to use Spark Streaming for CEP and it is looking promising. General (from my blog): In a nutshell, CEP involves the continuous processing and analysis of high-volume, high-speed data streams from inside

Re: error: value toDS is not a member of Seq[Int] SQL

2016-04-27 Thread shengshanzhang
1.6.1 > On 27 Apr 2016, at 6:28 PM, Sachin Aggarwal wrote: > > what is ur spark version? > > On Wed, Apr 27, 2016 at 3:12 PM, shengshanzhang > wrote: > Hi, > > On the Spark website, there is code as follows showing how to create > datasets. > > > However when

Re: error: value toDS is not a member of Seq[Int] SQL

2016-04-27 Thread Ted Yu
Did you do the import as the first comment shows ? > On Apr 27, 2016, at 2:42 AM, shengshanzhang wrote: > > Hi, > > On spark website, there is code as follows showing how to create > datasets. > > > However when i input this line into spark-shell,there comes a Error, > and who

Re: error: value toDS is not a member of Seq[Int] SQL

2016-04-27 Thread Sachin Aggarwal
what is ur spark version? On Wed, Apr 27, 2016 at 3:12 PM, shengshanzhang wrote: > Hi, > > On spark website, there is code as follows showing how to create datasets. > > > However when i input this line into spark-shell,there comes a Error, and > who can tell me Why and how to fix this? > > scal

Re: user Digest 27 Apr 2016 09:42:14 -0000 Issue 6581

2016-04-27 Thread Mario Ds Briggs
Wikipedia defines the goal of CEP as 'respond to them (events) as quickly as possible'. So I think there is a direct link to 'streaming' when we say CEP. However, pattern matching at its core applies even to historical data. thanks Mario - Message from Esa Heikkinen on Wed, 27 Apr 2016

Re: Save DataFrame to HBase

2016-04-27 Thread Daniel Haviv
Hi Benjamin, Yes it should work. Let me know if you need further assistance; I might be able to get the code I've used for that project. Thank you. Daniel > On 24 Apr 2016, at 17:35, Benjamin Kim wrote: > > Hi Daniel, > > How did you get the Phoenix plugin to work? I have CDH 5.5.2 installed
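
For what it is worth, a hedged sketch of how the phoenix-spark plugin is commonly used to save a DataFrame; the table name and ZooKeeper quorum are placeholders, the target Phoenix table must already exist, and the phoenix-spark jar has to be on the classpath.

    import org.apache.spark.sql.SaveMode

    df.write
      .format("org.apache.phoenix.spark")
      .mode(SaveMode.Overwrite)
      .options(Map("table" -> "OUTPUT_TABLE", "zkUrl" -> "zk-host:2181"))
      .save()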

error: value toDS is not a member of Seq[Int] SQL

2016-04-27 Thread shengshanzhang
Hi, On the Spark website, there is code as follows showing how to create datasets. However, when I input this line into spark-shell, an error occurs; can anyone tell me why and how to fix this? scala> val ds = Seq(1, 2, 3).toDS() :35: error: value toDS is not a member of Seq[I

Re: Spark support for Complex Event Processing (CEP)

2016-04-27 Thread Esa Heikkinen
Hi, I have followed with interest the discussion about CEP and Spark. It is quite close to my research, which is complex analysis of log files and "history" data (not actually of real-time streams). I have a few questions: 1) Is CEP only for (real-time) stream data and not for "history" d

Re: n

2016-04-27 Thread shengshanzhang
Your build.sbt seems a little complex. Thank you a lot. The example on the official Spark website explains how to use Spark SQL from spark-shell; there are no instructions about how to write a self-contained application. For a learner who is not familiar with Scala or Ja

Re: n

2016-04-27 Thread shengshanzhang
Thanks a lot. I added a spark-sql dependency in build.sbt as the red line shows. name := "Simple Project" version := "1.0" scalaVersion := "2.10.5" libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.1" libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.6.1" > On 27 Apr 2016, at

Re: n

2016-04-27 Thread ramesh reddy
The Spark SQL jar has to be added as a dependency in build.sbt. On Wednesday, 27 April 2016 1:57 PM, shengshanzhang wrote: Hello: my code is as follows: --- import org.apache.spark.{SparkConf, SparkContext} import

Re: n

2016-04-27 Thread Marco Mistroni
Hi please share your build.sbt here's mine for reference (using Spark 1.6.1 + scala 2.10) (pls ignore extra stuff i have added for assembly and logging) // Set the project name to the string 'My Project' name := "SparkExamples" // The := method used in Name and Version is one of two fundamental

n

2016-04-27 Thread shengshanzhang
Hello : my code is as follows: --- import org.apache.spark.{SparkConf, SparkContext} import org.apache.spark.sql.SQLContext case class Record(key: Int, value: String) object RDDRelation { def main(args: Arr

getting ClassCastException when calling UDF

2016-04-27 Thread Divya Gehlot
Hi, I am using Spark 1.5.2 and defined the UDF below: import org.apache.spark.sql.functions.udf > val myUdf = (wgts: Int, amnt: Float) => { > (wgts*amnt)/100.asInstanceOf[Float] > } > val df2 = df1.withColumn("WEIGHTED_AMOUNT", callUDF(udfcalWghts, FloatType, col("RATE"), col("AMOUNT"))) In my sc

Re: removing header from csv file

2016-04-27 Thread Marco Mistroni
If you are using the Scala API you can do myRdd.zipWithIndex().filter(_._2 > 0).map(_._1). Maybe a little bit complicated, but it will do the trick. As for spark-csv, you will get back a DataFrame which you can convert back to an RDD. Hth Marco On 27 Apr 2016 6:59 am, "nihed mbarek" wrote: > You can add a filter wi
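
A slightly fuller sketch of that trick, plus the mapPartitionsWithIndex variant that avoids the extra job zipWithIndex triggers; the file path is a placeholder.

    val raw = sc.textFile("/path/to/data.csv")   // placeholder path

    // Option 1: index every line, then drop index 0 (the header).
    val noHeader1 = raw.zipWithIndex().filter { case (_, idx) => idx > 0 }.map(_._1)

    // Option 2: drop only the first line of the first partition.
    val noHeader2 = raw.mapPartitionsWithIndex { (part, iter) =>
      if (part == 0) iter.drop(1) else iter
    }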

Re: removing header from csv file

2016-04-27 Thread Nachiketa
Why "without sqlcontext" ? Could you please describe what is it that you are trying to accomplish ? Thanks. Regards, Nachiketa On Wed, Apr 27, 2016 at 10:54 AM, Ashutosh Kumar wrote: > I see there is a library spark-csv which can be used for removing header > and processing of csv files. But i