[jira] [Commented] (SPARK-8556) Beeline script throws ClassNotFoundException

2017-03-04 Thread Arvind Surve (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15895898#comment-15895898
 ] 

Arvind Surve commented on SPARK-8556:
-

Hi Cheng,

Would you mind sharing the configuration issue you had?

-Arvind

> Beeline script throws ClassNotFoundException
> 
>
> Key: SPARK-8556
> URL: https://issues.apache.org/jira/browse/SPARK-8556
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Cheng Lian
>Priority: Blocker
>
> 1.5.0-SNAPSHOT, commit 1dfb0f7b2aed5ee6d07543fdeac8ff7c777b63b9
> Build Spark with:
> {noformat}
> $ ./build/sbt -Phive -Phive-thriftserver -Phadoop-1 -Dhadoop.version=1.2.1
> {noformat}
> Start HiveThriftServer2 with:
> {noformat}
> $ ./sbin/start-thriftserver.sh
> {noformat}
> Run Beeline and quit immediately:
> {noformat}
> $ ./bin/beeline -u jdbc:hive2://localhost:1
> Connecting to jdbc:hive2://localhost:1
> org/apache/hive/service/cli/thrift/TCLIService$Iface
> Beeline version 1.5.0-SNAPSHOT by Apache Hive
> 0: jdbc:hive2://localhost:1> Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hive/service/cli/thrift/TCLIService$Iface
> at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
> at java.sql.DriverManager.getConnection(DriverManager.java:664)
> at java.sql.DriverManager.getConnection(DriverManager.java:208)
> at org.apache.hive.beeline.DatabaseConnection.connect(DatabaseConnection.java:145)
> at org.apache.hive.beeline.DatabaseConnection.getConnection(DatabaseConnection.java:186)
> at org.apache.hive.beeline.Commands.close(Commands.java:802)
> at org.apache.hive.beeline.Commands.closeall(Commands.java:784)
> at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:673)
> at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:368)
> at org.apache.hive.beeline.BeeLine.main(BeeLine.java:351)
> Caused by: java.lang.ClassNotFoundException: org.apache.hive.service.cli.thrift.TCLIService$Iface
> at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ... 10 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10645) Bivariate Statistics: Spearman's Correlation support as UDAF

2015-10-19 Thread Arvind Surve (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964068#comment-14964068
 ] 

Arvind Surve commented on SPARK-10645:
--

Spearman's correlation coefficient (SpCoeff) does not fit into the UDAF model, 
as rank needs to be calculated for every column independently.
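
To illustrate the point, here is a minimal sketch in plain Scala (no Spark; the object name RankIsHolistic and the helper ranks are hypothetical, and the helper assumes no tied values). It shows that ranks computed on separate chunks of a column differ from ranks computed over the whole column, so per-chunk results cannot simply be merged the way a UDAF merges partial aggregates:

```scala
object RankIsHolistic {
  // 1-based rank of each element within the given sequence.
  // Assumption for this sketch: the values are distinct (no ties).
  def ranks(xs: Seq[Int]): Seq[Int] = {
    val sorted = xs.sorted
    xs.map(x => sorted.indexOf(x) + 1)
  }

  def main(args: Array[String]): Unit = {
    val column = Seq(50, 10, 40, 20, 30)
    val (left, right) = column.splitAt(2)

    // Ranks over the whole column versus ranks computed per chunk and concatenated.
    val global = ranks(column)
    val perChunk = ranks(left) ++ ranks(right)

    println(s"global ranks:    $global")
    println(s"per-chunk ranks: $perChunk")
  }
}
```

The rank of a value depends on every other value in the column, which is why the approach below sorts each column as a whole.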

I have written a stand-alone method that computes SpCoeff holistically; it is 
outlined below. The method takes two arrays, one per column (it could be 
changed to take two RDDs instead), and returns SpCoeff. It could be added to 
org.apache.spark.sql.execution.stat.StatFunctions.scala and invoked from 
corr() when the method is "spearman".

Please provide feedback on this approach, and we can go from there.

  import org.apache.spark.SparkContext
  import org.apache.spark.rdd.RDD

  // Calculates Spearman's rank correlation coefficient for two columns of equal length.
  // Reference: https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient
  def computeSpearmanCorrCoeff(sc: SparkContext, data1: Array[Int], data2: Array[Int]): Double = {

    val rddData1 = sc.parallelize(data1)
    val rddData2 = sc.parallelize(data2)

    // Map each distinct value to its 1-based rank; tied values share the average of their ranks.
    def rankByValue(data: RDD[Int]): RDD[(Int, Double)] =
      data.zipWithIndex()
        .sortByKey()
        .zipWithIndex()
        .map { case ((value, _), pos) => (value, (pos + 1.0, 1.0)) }
        .reduceByKey { case ((r1, n1), (r2, n2)) => ((r1 * n1 + r2 * n2) / (n1 + n2), n1 + n2) }
        .mapValues { case (rank, _) => rank }

    val rddData1Rank = rankByValue(rddData1)
    val rddData2Rank = rankByValue(rddData2)

    // Sum of squared rank differences between corresponding elements of the two vectors.
    val sumSqRankDiff = rddData1.zip(rddData2)
      .join(rddData1Rank).map { case (_, (v2, rank1)) => (v2, rank1) }
      .join(rddData2Rank).map { case (_, (rank1, rank2)) => (rank2 - rank1) * (rank2 - rank1) }
      .sum()

    // Length of the vectors. Count the input itself: the rank RDDs hold one entry per distinct value.
    val dataLen = rddData1.count().toDouble

    // Spearman's rank correlation coefficient (this closed form is exact only when there are no ties).
    1.0 - (6.0 * sumSqRankDiff) / (dataLen * (dataLen * dataLen - 1.0))
  }
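
For sanity-checking the method above on small inputs, a local reference version may be useful. The following is only a sketch in plain Scala collections, not part of the proposal: the object name SpearmanLocal and both helper names are hypothetical. It uses the same average-rank treatment of ties and the same closed-form formula, which is exact only when neither column contains ties.

```scala
object SpearmanLocal {
  // Average 1-based rank of each element; tied values share the mean of their positions.
  def averageRanks(xs: Seq[Int]): Seq[Double] = {
    val rankOf: Map[Int, Double] = xs.sorted.zipWithIndex
      .groupBy { case (value, _) => value }
      .map { case (value, group) => value -> group.map(_._2 + 1.0).sum / group.size }
    xs.map(rankOf)
  }

  // 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the per-element rank difference.
  def spearman(xs: Seq[Int], ys: Seq[Int]): Double = {
    require(xs.size == ys.size && xs.size > 1, "columns must have equal length > 1")
    val n = xs.size.toDouble
    val sumSqDiff = averageRanks(xs).zip(averageRanks(ys))
      .map { case (rx, ry) => (rx - ry) * (rx - ry) }
      .sum
    1.0 - 6.0 * sumSqDiff / (n * (n * n - 1.0))
  }

  def main(args: Array[String]): Unit = {
    println(spearman(Seq(1, 2, 3, 4, 5), Seq(10, 20, 30, 40, 50))) // same ordering
    println(spearman(Seq(1, 2, 3, 4, 5), Seq(50, 40, 30, 20, 10))) // reversed ordering
  }
}
```

Two columns in the same order give a coefficient of 1, and two in exactly reversed order give -1, so these make convenient first test cases for the distributed version.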


-Arvind Surve

> Bivariate Statistics: Spearman's Correlation support as UDAF
> 
>
> Key: SPARK-10645
> URL: https://issues.apache.org/jira/browse/SPARK-10645
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, SQL
>Reporter: Jihong MA
>
> Spearman's rank correlation coefficient : 
> https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
