Re: Start python script with SparkLauncher

2015-11-11 Thread Andrejs
Thanks Ted, that helped me. It turned out that I had formatted the server name incorrectly: I had to add spark:// in front of the server name. Cheers, Andrejs On 11/11/15 14:26, Ted Yu wrote: Please take a look at launcher/src/test/java/org/apache/spark/launcher/SparkLauncherSuite.java to see how

Start python script with SparkLauncher

2015-11-11 Thread Andrejs
spark-1.4.1-bin-hadoop2.6")
  .setAppResource("/home/user/MyCode/forSpark/wordcount.py")
  .addPyFile("/home/andabe/MyCode/forSpark/wordcount.py")
  .setMaster("myServerName")
  .setAppName("pytho2word")
  .launch();
println("finishing")
spark.waitFor();
println("finished")

Any help is appreciated. Cheers, Andrejs
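As the reply in this thread reports, the launcher call fails because a bare host name is passed to setMaster, whereas a standalone master is addressed with a spark://host:port URL. The following is only a minimal sketch of building such a URL: the object and method names are hypothetical, and the default port 7077 is an assumption (it is the usual standalone default, but the actual cluster may use a different one).

```scala
object MasterUrl {
  // Hypothetical helper (not part of Spark): turns a bare host name into a
  // standalone master URL of the form spark://host:port.
  // The default port 7077 is an assumption; use the port your master listens on.
  def standalone(host: String, port: Int = 7077): String =
    if (host.startsWith("spark://")) host // already a master URL, leave untouched
    else s"spark://$host:$port"

  def main(args: Array[String]): Unit = {
    // "myServerName" is the placeholder host name from the message above.
    val master = standalone("myServerName")
    // This string is what would be passed to setMaster(...) on the launcher.
    println(master)
  }
}
```

The resulting string would replace "myServerName" in the setMaster call; everything else in the launcher chain stays as posted.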

Re: When querying ElasticSearch, score is 0

2015-04-18 Thread Andrejs Abele
Thank you for the information. Cheers, Andrejs On 04/18/2015 10:23 AM, Nick Pentreath wrote: > ES-hadoop uses a scan & scroll search to efficiently retrieve large > result sets. Scores are not tracked in a scan and sorting is not > supported hence 0 scores. > > http://www.

When querying ElasticSearch, score is 0

2015-04-16 Thread Andrejs Abele
hip, Butler County, Ohio, in the United States. It is located about ten miles southwest of Hamilton on Howards Creek, a tributary of the Great Miami River in section 28 of R1ET3N of the Congress Lands. It is three miles west of Shandon and two miles south of Okeana.", _metadata -> Map(_index -> dbpedia, _type -> docs, _id -> AUy5aQs7895C6HE5GmG4, _score -> 0.0)) As you can see _score is 0. Would appreciate any help, Cheers, Andrejs

save as JSON objects

2014-11-04 Thread Andrejs Abele
Hi, Can someone please suggest the best way to output Spark data as a JSON file (a file where each line is a JSON object)? Cheers, Andrejs

Re: how idf is calculated

2014-10-31 Thread Andrejs Abele
I found my problem. Based on the TF-IDF article on Wikipedia, I assumed that log base 10 is used, but as I found in this discussion <https://groups.google.com/forum/#!topic/scala-language/K5tbYSYqQc8>, in Scala it is actually ln (natural logarithm). Regards, Andrejs On Thu, Oct 30, 2014 at 10:49 PM,
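To illustrate the difference described above, here is a small sketch comparing the two logarithm bases. It assumes the smoothed IDF form log((m + 1) / (df + 1)) documented for MLlib, where m is the number of documents and df is the number of documents containing the term; the object name and the sample counts are made up for illustration.

```scala
object IdfLogBase {
  // Smoothed IDF, assuming the form log((m + 1) / (df + 1)).
  // The logarithm is passed in so both bases can be compared side by side.
  def idf(numDocs: Long, docFreq: Long, log: Double => Double): Double =
    log((numDocs + 1.0) / (docFreq + 1.0))

  def main(args: Array[String]): Unit = {
    // Hypothetical counts chosen so that the ratio (m + 1) / (df + 1) is exactly 10.
    val numDocs = 99L
    val docFreq = 9L

    val natural = idf(numDocs, docFreq, math.log)   // scala.math.log is the natural logarithm
    val base10  = idf(numDocs, docFreq, math.log10) // what a base-10 reading would give

    println(s"natural log: $natural, log base 10: $base10")
  }
}
```

With these counts the two results differ by a factor of ln(10), roughly 2.3, which is the kind of mismatch that appears when hand-computed base-10 values are compared against library output.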

how idf is calculated

2014-10-30 Thread Andrejs Abele
f were calculated Best regards, Andrejs

Getting vector values

2014-10-30 Thread Andrejs Abele
Hi, I'm new to MLlib and Spark. I'm trying to use tf-idf and use those values for term ranking. I'm getting the tf values in vector format, but how can I get the values of the vector? val sc = new SparkContext(conf) val documents: RDD[Seq[String]] = sc.textFile("/home/andr