Re: best spark spatial lib?

2017-10-10 Thread Ram Sriharsha
Why can't you do this in Magellan? Can you post a sample query that you are trying to run that has spatial and logical operators combined? Maybe I am not understanding the issue properly. Ram On Tue, Oct 10, 2017 at 2:21 AM, Imran Rajjad wrote: > I need to have a location

Re: cannot cast to double from spark row

2017-09-14 Thread Ram Sriharsha
> 1. row.getAs[Double](Constants.Datapoint.Latitude) > 2. row.getAs[String](Constants.Datapoint.Latitude).toDouble > I don't want to use row.getDouble(0), as the position of the column in the file keeps changing. > Thanks, Asmath -- Ram Sriharsha Product Manager, Apache
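In plain-Python terms, the name-based lookup this thread recommends looks like the sketch below. A dict stands in for a Spark Row; the record and field names are made up for illustration.

```python
# Plain-Python sketch of name-based field access: a dict stands in for a
# Spark Row whose values arrive as strings. Looking a field up by name
# (rather than by position, as row.getDouble(0) does) means a change in
# column order cannot silently break the cast.
row = {"city": "SF", "latitude": "37.7749"}  # hypothetical record

lat = float(row["latitude"])  # analogous to getAs[String](...).toDouble
print(lat)  # 37.7749
```

The explicit cast also surfaces malformed input immediately instead of producing a silent wrong value.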

Re: MLlib OneVsRest causing intermittent exceptions

2016-01-26 Thread Ram Sriharsha
> contain instances for every target label. In such cases, an > ArrayIndexOutOfBoundsException is generated. > I've tried to reproduce the problem in a simple SBT project here: > https://github.com/junglebarry/Sp

Re: MLlib OneVsRest causing intermittent exceptions

2016-01-26 Thread Ram Sriharsha
to be thrown in the case where the training dataset is missing the rare class. Could you reproduce this in a simple snippet of code that we can quickly test on the shell? On Tue, Jan 26, 2016 at 3:02 PM, Ram Sriharsha <sriharsha@gmail.com> wrote: > Hey David, Yeah absolutely! Feel free to crea
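A toy illustration of the failure mode this thread discusses (not Spark's actual code): when a random split drops the rarest class, any structure sized from the labels seen in the training split is too small for the full label range.

```python
# Toy sketch: class 2 exists in the full dataset but never appears in this
# training split, so an array sized from max(label)+1 of the split
# under-allocates, and indexing by the missing label fails, much like the
# ArrayIndexOutOfBoundsException reported above.
train_labels = [0, 1, 0, 1]             # class 2 was lost in the split
counts = [0] * (max(train_labels) + 1)  # allocated for 2 classes, not 3

try:
    counts[2] += 1                      # label 2 from the full dataset
    failed = False
except IndexError:
    failed = True
print(failed)  # True
```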

Re: MLlib OneVsRest causing intermittent exceptions

2016-01-26 Thread Ram Sriharsha
ception with random split > <https://gist.github.com/junglebarry/6073aa474d89f3322063>. Only > exceptions in 2/3 of cases, due to randomness. > If these look good as test cases, I'll take a look at filing JIRAs and > getting patches tomorrow morning. It's late here! > Than

Re: MLlib OneVsRest causing intermittent exceptions

2016-01-26 Thread Ram Sriharsha
this... but if it cannot for some reason, we can have a check in OneVsRest that doesn't train that classifier On Tue, Jan 26, 2016 at 4:33 PM, Ram Sriharsha <sriharsha@gmail.com> wrote: > Hey David > > In your scenario, OneVsRest is training a classifier for 1 vs not 1... and > the inp

Re: MLlib OneVsRest causing intermittent exceptions

2016-01-25 Thread Ram Sriharsha
I'm happy to look into patching the code, but I first wanted to confirm > that the problem was real, and that I wasn't somehow misunderstanding how I > should be using OneVsRest. > > Any guidance would be appreciated - I'm new to the list. > > Many thanks, > David > -- Ram Srihar

Re: XML Parsing

2015-07-19 Thread Ram Sriharsha
You would need to write an XmlInputFormat that can split XML into records based on start/end tags. Mahout has an XmlInputFormat implementation you should be able to import: https://github.com/apache/mahout/blob/master/integration/src/main/java/org/apache/mahout/text/wikipedia/XmlInputFormat.java
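The core idea behind such an input format can be sketched in a few lines of plain Python: carve a stream into records delimited by configured start/end tags so each record can be parsed on its own. (The real XmlInputFormat does this across HDFS splits; the tag name here is made up.)

```python
# Toy sketch of tag-delimited record extraction: everything between a
# <page> start tag and its matching </page> end tag becomes one record,
# and bytes outside the tags are ignored.
import re

stream = "<page>one</page>\nnoise\n<page>two</page>"
records = re.findall(r"<page>.*?</page>", stream, flags=re.S)
print(records)  # ['<page>one</page>', '<page>two</page>']
```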

Re: Examples of flatMap in dataFrame

2015-06-08 Thread Ram Sriharsha
Hi, you are looking for the explode method (in the DataFrame API starting with 1.3, I believe): https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala#L1002 Ram On Sun, Jun 7, 2015 at 9:22 PM, Dimp Bhat dimp201...@gmail.com wrote: Hi, I'm trying to write
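A sketch of the explode usage the reply points to, in today's PySpark spelling (assumes a running SparkSession; the column names are made up):

```python
# explode turns a row with an array column into one output row per
# element, which covers the flatMap-style use case on DataFrames.
# Sketch only: assumes pyspark is installed and a session can start.
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, ["a", "b"]), (2, ["c"])], ["id", "tags"])
df.select("id", explode("tags").alias("tag")).show()
# three rows: (1, a), (1, b), (2, c)
```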

Re: Embedding your own transformer in Spark.ml Pipeline

2015-06-02 Thread Ram Sriharsha
Hi, we are in the process of adding examples for feature transformations (https://issues.apache.org/jira/browse/SPARK-7546) and this should be available shortly on Spark master. In the meantime, the best place to start would be to look at how the Tokenizer works here:
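A hedged sketch of a minimal custom transformer, written here in PySpark and loosely patterned on how Tokenizer is structured. The class and column names are hypothetical, and the ML Param plumbing is omitted for brevity; real transformers should use the shared Param mixins so they serialize and copy correctly.

```python
# Minimal custom transformer sketch: subclass Transformer and implement
# _transform, returning a DataFrame with the new output column appended.
# Assumes pyspark is installed; exact APIs vary across Spark versions.
from pyspark.ml import Transformer
from pyspark.sql.functions import col, lower

class Lowercaser(Transformer):
    def __init__(self, inputCol, outputCol):
        super().__init__()
        self.inputCol = inputCol
        self.outputCol = outputCol

    def _transform(self, df):
        # lowercase the input string column into the output column
        return df.withColumn(self.outputCol, lower(col(self.inputCol)))
```

Because it subclasses Transformer, an instance can be dropped into a Pipeline's stages list alongside built-in stages.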

Re: Doubts about SparkSQL

2015-05-23 Thread Ram Sriharsha
Yes, it does. You can try out the following example (the People dataset that comes with Spark). There is an inner query that filters on age and an outer query that filters on name. The physical plan applies a single composite filter on name and age, as you can see below: sqlContext.sql(select *
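The truncated example above can be sketched as follows in modern PySpark (assumes a SparkSession and a registered people table with name and age columns, as in the Spark examples; the exact predicates are made up):

```python
# Sketch of a nested-filter query: an inner query filters on age, the
# outer one on name. explain() lets you inspect the physical plan, where
# the optimizer collapses both filters into one composite predicate.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.sql("""
    SELECT name, age
    FROM (SELECT * FROM people WHERE age > 21) adults
    WHERE name LIKE 'M%'
""").explain()
```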

Re: Query a Dataframe in rdd.map()

2015-05-21 Thread Ram Sriharsha
21, 2015 at 10:54 AM, Ram Sriharsha sriharsha@gmail.com wrote: Your original code snippet seems incomplete and there isn't enough information to figure out what problem you actually ran into. From your original code snippet, there is an rdd variable which is well defined and a df variable

Re: DataFrame Column Alias problem

2015-05-21 Thread Ram Sriharsha
df.groupBy($"col1").agg(count($"col1").as("c")).show On Thu, May 21, 2015 at 3:09 AM, SLiZn Liu sliznmail...@gmail.com wrote: Hi Spark Users Group, I'm doing groupBy operations on my DataFrame df as follows, to get a count for each value of col1: df.groupBy("col1").agg("col1" -> "count").show // I
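The same aliasing in PySpark terms, as a sketch (assumes a SparkSession; the sample data is made up): name the aggregate column explicitly rather than keeping the generated count(col1) header.

```python
# Alias the count column as "c" so downstream code can reference it by a
# stable name instead of the auto-generated count(col1).
from pyspark.sql import SparkSession
from pyspark.sql.functions import count

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("x",), ("x",), ("y",)], ["col1"])
df.groupBy("col1").agg(count("col1").alias("c")).show()
# the aggregate column header appears as "c"
```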

Re: Query a Dataframe in rdd.map()

2015-05-21 Thread Ram Sriharsha
Your original code snippet seems incomplete and there isn't enough information to figure out what problem you actually ran into. From your original code snippet, there is an rdd variable which is well defined and a df variable that is not defined in the snippet of code you sent. One way to make

Re: Decision tree: categorical variables

2015-05-19 Thread Ram Sriharsha
Hi Keerthi, as Xiangrui mentioned in the reply, the categorical variables are assumed to be encoded as integers between 0 and k - 1, where k is the parameter you are passing in the category info map. So you will need to handle this during parsing (your columns 3 and 6 need to be converted into ints
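The parsing step described above can be sketched in plain Python (the raw values here are made up): build a value-to-index map so each categorical value lands in 0..k-1, matching the encoding the category-info map expects.

```python
# Encode raw categorical values as integers 0..k-1 before handing the
# feature to the decision tree. Sorting makes the mapping deterministic.
raw = ["red", "green", "red", "blue"]
index = {v: i for i, v in enumerate(sorted(set(raw)))}  # k = 3 categories
encoded = [index[v] for v in raw]
print(index)    # {'blue': 0, 'green': 1, 'red': 2}
print(encoded)  # [2, 1, 2, 0]
```

The same `index` map must be reused at prediction time so unseen rows are encoded consistently.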

Re: InferredSchema Example in Spark-SQL

2015-05-17 Thread Ram Sriharsha
(), not .toRD() *From:* Ram Sriharsha [mailto:sriharsha@gmail.com] *Sent:* Monday, May 18, 2015 8:31 AM *To:* Rajdeep Dua *Cc:* user *Subject:* Re: InferredSchema Example in Spark-SQL you mean toDF() ? (toDF converts the RDD to a DataFrame, in this case inferring schema from the case

Re: InferredSchema Example in Spark-SQL

2015-05-17 Thread Ram Sriharsha
you mean toDF()? (toDF converts the RDD to a DataFrame, in this case inferring the schema from the case class) On Sun, May 17, 2015 at 5:07 PM, Rajdeep Dua rajdeep@gmail.com wrote: Hi All, Was trying the Inferred Schema spark example
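A PySpark counterpart of the Scala toDF() being discussed, as a sketch (assumes pyspark is installed; field names are made up): the schema is inferred from Row objects rather than a case class.

```python
# Build an RDD of Rows and convert it to a DataFrame; toDF infers the
# schema (name: string, age: long) from the Row fields, analogous to
# Scala's toDF on an RDD of case classes.
from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.getOrCreate()
rdd = spark.sparkContext.parallelize([Row(name="Alice", age=30),
                                      Row(name="Bob", age=25)])
df = rdd.toDF()
df.printSchema()
```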

Re: Using sc.HadoopConfiguration in Python

2015-05-14 Thread Ram Sriharsha
, but _jsc does not have anything to pass hadoop configs. Can you illustrate your answer a bit more? TIA... On Wed, May 13, 2015 at 12:08 AM, Ram Sriharsha sriharsha@gmail.com wrote: yes, the SparkContext in the Python API has a reference to the JavaSparkContext (jsc) https://spark.apache.org

Re: Using sc.HadoopConfiguration in Python

2015-05-12 Thread Ram Sriharsha
Yes, the SparkContext in the Python API has a reference to the JavaSparkContext (jsc), https://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.SparkContext, through which you can access the hadoop configuration. On Tue, May 12, 2015 at 6:39 AM, ayan guha guha.a...@gmail.com wrote: Hi
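A sketch of what the reply describes: reaching the Hadoop configuration from PySpark through the wrapped JavaSparkContext. Note that `_jsc` is an internal handle, so this is version-dependent, and the config key/value below are hypothetical.

```python
# Access the Hadoop Configuration object held by the JVM-side
# JavaSparkContext and set a property on it from Python.
from pyspark import SparkContext

sc = SparkContext.getOrCreate()
hadoop_conf = sc._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3a.endpoint", "s3.example.com")
```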