[jira] [Commented] (SPARK-8556) Beeline script throws ClassNotFoundException
[ https://issues.apache.org/jira/browse/SPARK-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15895898#comment-15895898 ]

Arvind Surve commented on SPARK-8556:
-------------------------------------

Hi Cheng,

Would you mind sharing the configuration issue you had?

-Arvind

> Beeline script throws ClassNotFoundException
> --------------------------------------------
>
>                 Key: SPARK-8556
>                 URL: https://issues.apache.org/jira/browse/SPARK-8556
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>            Reporter: Cheng Lian
>            Priority: Blocker
>
> 1.5.0-SNAPSHOT, commit 1dfb0f7b2aed5ee6d07543fdeac8ff7c777b63b9
> Build Spark with:
> {noformat}
> $ ./build/sbt -Phive -Phive-thriftserver -Phadoop-1 -Dhadoop.version=1.2.1
> {noformat}
> Start HiveThriftServer2 with:
> {noformat}
> $ ./sbin/start-thriftserver.sh
> {noformat}
> Run Beeline and quit immediately:
> {noformat}
> $ ./bin/beeline -u jdbc:hive2://localhost:1
> Connecting to jdbc:hive2://localhost:1
> org/apache/hive/service/cli/thrift/TCLIService$Iface
> Beeline version 1.5.0-SNAPSHOT by Apache Hive
> 0: jdbc:hive2://localhost:1> Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hive/service/cli/thrift/TCLIService$Iface
>         at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
>         at java.sql.DriverManager.getConnection(DriverManager.java:664)
>         at java.sql.DriverManager.getConnection(DriverManager.java:208)
>         at org.apache.hive.beeline.DatabaseConnection.connect(DatabaseConnection.java:145)
>         at org.apache.hive.beeline.DatabaseConnection.getConnection(DatabaseConnection.java:186)
>         at org.apache.hive.beeline.Commands.close(Commands.java:802)
>         at org.apache.hive.beeline.Commands.closeall(Commands.java:784)
>         at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:673)
>         at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:368)
>         at org.apache.hive.beeline.BeeLine.main(BeeLine.java:351)
> Caused by: java.lang.ClassNotFoundException: org.apache.hive.service.cli.thrift.TCLIService$Iface
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>         ... 10 more
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
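
The trace above bottoms out in a ClassNotFoundException for a Hive service class, so one quick way to narrow things down is to check whether that class is visible on a given classpath at all. The following is only a minimal diagnostic sketch, not part of Spark or Beeline: the object name ClasspathCheck and the helper isClassLoadable are hypothetical, and it assumes it is launched with the same classpath as the failing Beeline process.

```scala
// Hypothetical diagnostic helper; not part of Spark, Hive, or Beeline.
object ClasspathCheck {

  /** Returns true if `className` can be loaded (without being initialized)
    * by the class loader that loaded this object. */
  def isClassLoadable(className: String): Boolean =
    try {
      Class.forName(className, false, getClass.getClassLoader)
      true
    } catch {
      // Missing class, or a class whose own dependencies are missing.
      case _: ClassNotFoundException | _: NoClassDefFoundError => false
    }

  def main(args: Array[String]): Unit = {
    // Default to the class named in the stack trace of this issue.
    val names =
      if (args.nonEmpty) args.toSeq
      else Seq("org.apache.hive.service.cli.thrift.TCLIService$Iface")
    names.foreach { n =>
      val status = if (isClassLoadable(n)) "found" else "MISSING"
      println(s"$n -> $status")
    }
  }
}
```

Running it once with the Beeline classpath and once with the Thrift server classpath would show whether the class is absent only on the client side.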
[jira] [Commented] (SPARK-10645) Bivariate Statistics: Spearman's Correlation support as UDAF
[ https://issues.apache.org/jira/browse/SPARK-10645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964068#comment-14964068 ]

Arvind Surve commented on SPARK-10645:
--------------------------------------

Spearman's correlation coefficient (SpCoeff) does not fit into the UDAF model, because the rank has to be calculated for every column independently.
I have written a stand-alone method, outlined below, that evaluates SpCoeff holistically.
The method takes two arrays, representing two columns, and returns SpCoeff. (It can be changed to take two RDDs as input parameters.)
It could be added to org.apache.spark.sql.execution.stat.StatFunctions.scala, with the corr() method invoking it for the "spearman" method.

Please provide feedback on this approach and we can go from there.

    import org.apache.spark.SparkContext

    // Calculates Spearman's rank correlation coefficient.
    // Reference: https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient
    def computeSpearmanCorrCoeff(sc: SparkContext, data1: Array[Int], data2: Array[Int]): Double = {
      val rddData1 = sc.parallelize(data1)
      val rddData2 = sc.parallelize(data2)

      // Calculate the rank of each distinct value in the first vector.
      // Tied values get the average of their 1-based sorted positions.
      val rddData1Rank = rddData1
        .zipWithIndex()
        .sortByKey()
        .zipWithIndex()
        .map { case ((value, _), pos) => (value, (pos + 1.0, 1.0)) }
        .reduceByKey { case (a, b) => ((a._1 * a._2 + b._1 * b._2) / (a._2 + b._2), a._2 + b._2) }
        .map { case (value, (rank, _)) => (value, rank) }

      // Calculate the rank of each distinct value in the second vector.
      val rddData2Rank = rddData2
        .zipWithIndex()
        .sortByKey()
        .zipWithIndex()
        .map { case ((value, _), pos) => (value, (pos + 1.0, 1.0)) }
        .reduceByKey { case (a, b) => ((a._1 * a._2 + b._1 * b._2) / (a._2 + b._2), a._2 + b._2) }
        .map { case (value, (rank, _)) => (value, rank) }

      // Sum of squared rank differences between corresponding elements of the two vectors.
      val sumSqRankDiff = rddData1.zip(rddData2)
        .join(rddData1Rank).map { case (v1, (v2, rank1)) => (v2, (v1, rank1)) }
        .join(rddData2Rank).map { case (_, ((_, rank1), rank2)) => (rank2 - rank1) * (rank2 - rank1) }
        .sum()

      // Length of the vector: the number of elements, not the number of distinct values.
      val dataLen = rddData1.count().toDouble

      // Spearman's rank correlation coefficient.
      1 - (6 * sumSqRankDiff) / (dataLen * (dataLen * dataLen - 1))
    }

-Arvind Surve

> Bivariate Statistics: Spearman's Correlation support as UDAF
> ------------------------------------------------------------
>
>                 Key: SPARK-10645
>                 URL: https://issues.apache.org/jira/browse/SPARK-10645
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML, SQL
>            Reporter: Jihong MA
>
> Spearman's rank correlation coefficient : https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient
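
One point that may be worth checking in the approach above: the closed form 1 - 6 * sum(d^2) / (n * (n^2 - 1)) is exact only when neither column contains ties. With ties, the general definition is the Pearson correlation of the average ranks. The following is a minimal sketch of that variant on plain Scala collections (no Spark), intended only as a small reference to compare a distributed implementation against. The names SpearmanSketch, averageRanks, pearson and spearman are illustrative and are not existing Spark APIs.

```scala
// Illustrative, non-distributed reference; names are hypothetical, not Spark APIs.
object SpearmanSketch {

  /** 1-based average ranks: tied values share the mean of their sorted positions. */
  def averageRanks(xs: Seq[Double]): Seq[Double] = {
    // (value, originalIndex) pairs in ascending value order.
    val sorted = xs.zipWithIndex.sortBy(_._1).toVector
    val ranks = new Array[Double](xs.length)
    var i = 0
    while (i < sorted.length) {
      // Extend j to the end of the run of values equal to sorted(i).
      var j = i
      while (j + 1 < sorted.length && sorted(j + 1)._1 == sorted(i)._1) j += 1
      val avg = (i + j) / 2.0 + 1.0 // mean of 1-based positions i+1 .. j+1
      (i to j).foreach(k => ranks(sorted(k)._2) = avg)
      i = j + 1
    }
    ranks.toSeq
  }

  /** Pearson correlation of two equally long, non-constant sequences. */
  def pearson(xs: Seq[Double], ys: Seq[Double]): Double = {
    require(xs.length == ys.length && xs.nonEmpty, "inputs must be non-empty and equally long")
    val mx = xs.sum / xs.length
    val my = ys.sum / ys.length
    val cov = xs.zip(ys).map { case (x, y) => (x - mx) * (y - my) }.sum
    val vx = xs.map(x => (x - mx) * (x - mx)).sum
    val vy = ys.map(y => (y - my) * (y - my)).sum
    cov / math.sqrt(vx * vy)
  }

  /** Spearman's coefficient as the Pearson correlation of the average ranks. */
  def spearman(xs: Seq[Double], ys: Seq[Double]): Double =
    pearson(averageRanks(xs), averageRanks(ys))
}
```

When there are no ties, this agrees with the closed-form expression, so the two can be cross-checked on small inputs.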