I know quite a lot about machine learning, but I'm new to Scala and Spark. I'm
stuck on the Spark API, so please advise.

I have a txt file where each line has this format:

#label \t #query (a string of words, delimited by spaces)

1  wireless amazon kindle
2  apple iPhone 5
1  kindle fire 8G
2  apple iPad
The first field is the label, the second field is the query string. My plan is
to split each line into label and features, transform the string into a sparse
vector using the built-in Word2Vec (I assume it uses bag of words to build a
dictionary first), and then train a classifier with SVMWithSGD.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.feature.Word2Vec
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.classification.SVMWithSGD

object QueryClassification {

  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Query Classification").setMaster("local")
    val sc = new SparkContext(conf)
    val input = sc.textFile("spark_data.txt")

    val word2vec = new Word2Vec()

    val parsedData = input.map { line =>
      val parts = line.split("\t")

      // How do I write the code here? I need to parse parts(1) into a
      // feature vector properly and then apply word2vec after the map.
      LabeledPoint(parts(0).toDouble, ????)
    }

    // * is the item I got from parsing parts(1) above
    word2vec.fit(*)

    val numIterations = 20
    val model = SVMWithSGD.train(parsedData, numIterations)
  }
}
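For what it's worth, the tab/space splitting itself I can already do in plain Scala (no Spark involved); it's the vectorization after this step that I'm unsure about. The `ParseSketch` object and `parseLine` name below are just my own illustration, not anything from the Spark API:

```scala
// Plain-Scala sketch of the per-line parsing (no Spark needed):
// split each line on the tab into label and query, then tokenize
// the query on single spaces.
object ParseSketch {
  def parseLine(line: String): (Double, Seq[String]) = {
    val parts = line.split("\t")
    val label = parts(0).toDouble           // first field: the label
    val tokens = parts(1).split(" ").toSeq  // second field: query words
    (label, tokens)
  }

  def main(args: Array[String]): Unit = {
    println(parseLine("1\twireless amazon kindle"))
  }
}
```

The open question is what to put between these token sequences and the `LabeledPoint` feature vector that `SVMWithSGD.train` expects.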
Thanks a lot
