Re: Tensor Flow

2016-12-12 Thread tog
TensorFrames is a project from Databricks (https://github.com/databricks/tensorframes). No commits for a couple of months, though. Does anyone have insight into the status of the project? On Mon, 12 Dec 2016 at 19:31 Meeraj Kunnumpurath wrote: > Apologies. Okay, I will have a look at Tensor Fra

Re: Apache Groovy and Spark

2015-11-18 Thread tog
s and the java7 > invoke-dynamic JARs things are better. I'm still unsure I'd use it in > production, and, given Spark's focus on Scala and Python, I'd pick one of > those two > > > On 18 Nov 2015, at 20:35, tog wrote: > > Hi > > I started playing wit

Apache Groovy and Spark

2015-11-18 Thread tog
Hi I started playing with both Apache projects and quickly got this exception. Can anyone give me a hint about the problem so that I can dig further? It seems to be a problem with Spark loading some of the Groovy classes ... Any idea? Thanks Guillaume tog GroovySpark $ $GROOVY_HOME/bin

Spark+Groovy: java.lang.ClassNotFoundException: org.apache.spark.rpc.akka.AkkaRpcEnvFactory

2015-11-18 Thread tog
Hello I am trying to use Spark from Groovy. When using the Grab feature, which is supposed to download dependencies, I am facing a ClassNotFoundException: @Grab(group='org.apache.spark', module='spark-core_2.10', version='1.5.2') I am trying to look at the jars that might be pulled by spark-core_2.10. I dow

Re: converting categorical values in csv file to numerical values

2015-11-05 Thread tog
hich exactly > does this quite efficiently and can scale too. Hence, looking for a > solution using this technique. > > > regards > Bala > > > On 5 November 2015 at 18:50, tog > wrote: > >> Hi Bala >> >> Can't you do a simple dictionary a

Re: converting categorical values in csv file to numerical values

2015-11-05 Thread tog
Hi Bala Can't you do a simple dictionary and map those values to numbers? Cheers Guillaume On 5 November 2015 at 09:54, Balachandar R.A. wrote: > Hi > > I am new to Spark MLlib and machine learning. I have a csv file that > consists of around 100 thousand rows and 20 columns. Of these 20 co
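A minimal sketch of the dictionary approach suggested above, assuming a CSV whose fourth column (index 3, invented for illustration, as are the file names) holds the categorical label; the mapping is built once from the distinct values and broadcast to the executors:

    import org.apache.spark.{SparkConf, SparkContext}

    object CategoricalToIndex {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("categorical-to-index"))

        // Hypothetical input: CSV rows whose fourth column holds a category label.
        val rows = sc.textFile("data.csv").map(_.split(","))

        // Build the label -> index dictionary once on the driver, then broadcast it.
        val labels = rows.map(_(3)).distinct().collect().zipWithIndex.toMap
        val dict = sc.broadcast(labels)

        // Replace the categorical column with its numeric index.
        val encoded = rows.map { cols =>
          cols.updated(3, dict.value(cols(3)).toString).mkString(",")
        }
        encoded.saveAsTextFile("data-encoded")
      }
    }

Building the dictionary from distinct() guarantees every label seen in the data gets an index; for the MLlib route discussed in the thread, the same idea is packaged as StringIndexer in spark.ml.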

Re: Question abt serialization

2015-07-28 Thread tog
out line 27) > > println "Count of spark: " + file.filter({s -> s.contains('spark')}). > count() > > Thanks > Best Regards > > On Sun, Jul 26, 2015 at 12:43 AM, tog wrote: > >> Hi >> >> I have been using Spark for quite some time us

Question abt serialization

2015-07-25 Thread tog
am not doing correctly here. Thanks tog Groovy4Spark $ groovy GroovySparkWordcount.groovy class org.apache.spark.api.java.JavaRDD true true Caught: org.apache.spark.SparkException: Task not serializable org.apache.spark.SparkException: Task not serializable at org.apache.spark.util.ClosureC
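The usual cause of "Task not serializable", in Groovy and Scala alike, is that the closure handed to filter captures its enclosing object, which Spark must then ship to the executors. A minimal Scala sketch of the pitfall and the standard fix (the class, field, and file names are invented for illustration):

    import org.apache.spark.SparkContext

    class WordCounter(sc: SparkContext) { // note: not Serializable

      val keyword = "spark"

      // Fails: referencing the field `keyword` makes the closure capture
      // `this`, so Spark throws "Task not serializable".
      def countBad(): Long =
        sc.textFile("README.md").filter(s => s.contains(keyword)).count()

      // Works: copying the field into a local val means the closure only
      // captures a String, which is serializable.
      def countGood(): Long = {
        val kw = keyword
        sc.textFile("README.md").filter(s => s.contains(kw)).count()
      }
    }

In Groovy the same applies to script-level variables referenced inside a closure; Closure.dehydrate(), which detaches the closure's owner, delegate, and thisObject, is one way to strip non-serializable context before handing the closure to Spark.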

Re: sliding

2015-07-02 Thread tog
d, e), 2), ((d, e, > f), 3)] > > After filter: [((a,b,c), 0), ((d, e, f), 3)], which is what I'm assuming > you want (non-overlapping buckets)? You can then do something like > .map(func(_._1)) to apply func (e.g. min, max, mean) to the 3-tuples. > > On Thu, Jul 2, 2015

Re: sliding

2015-07-02 Thread tog
ul 2, 2015 at 2:33 PM, tog wrote: > >> Was complaining about the Seq ... >> >> Moved it to >> val eventsfiltered = events.sliding(3).map(s => Event(s(0).time, >> (s(0).x+s(1).x+s(2).x)/3.0, (s(0).vztot+s(1).vztot+s(2).vztot)/3.0)) >> >> and that is wo

Re: sliding

2015-07-02 Thread tog
the time series. On 2 July 2015 at 18:25, Feynman Liang wrote: > What's the error you are getting? > > On Thu, Jul 2, 2015 at 9:37 AM, tog wrote: > >> Hi >> >> Sorry for this Scala/Spark newbie question. I am creating an RDD which >> represents large tim

sliding

2015-07-02 Thread tog
Hi Sorry for this Scala/Spark newbie question. I am creating an RDD which represents large time series this way: val data = sc.textFile("somefile.csv") case class Event( time: Double, x: Double, vztot: Double ) val events = data.filter(s => !s.startsWith("GMT")).map{s
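Putting the pieces of this thread back together, a sketch of the whole pipeline; the CSV column order and parsing are assumed, the header prefix "GMT" is taken from the snippet above, and sliding here is the one from MLlib's RDDFunctions (presumably what the original code imported):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.rdd.RDDFunctions._ // brings sliding() into scope

    case class Event(time: Double, x: Double, vztot: Double)

    object SlidingAverage {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("sliding-average"))
        val data = sc.textFile("somefile.csv")

        // Skip the header line and parse the three columns (order assumed).
        val events = data.filter(s => !s.startsWith("GMT")).map { s =>
          val c = s.split(",")
          Event(c(0).toDouble, c(1).toDouble, c(2).toDouble)
        }

        // sliding(3) yields overlapping windows as Array[Event]; average x and
        // vztot over each window, keeping the first timestamp, as in the fix above.
        val smoothed = events.sliding(3).map { w =>
          Event(w(0).time, w.map(_.x).sum / 3.0, w.map(_.vztot).sum / 3.0)
        }
        smoothed.take(5).foreach(println)
      }
    }

For the non-overlapping buckets discussed in the replies, filtering the windows on their index modulo 3 (as sketched earlier in the thread) slots in between sliding and map.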

Re: Time series data

2015-06-29 Thread tog
Hi Have you tested the Cloudera project (https://github.com/cloudera/spark-timeseries)? Let me know how you progress on that route, as I am also interested in that topic. Cheers On 26 June 2015 at 14:07, Caio Cesar Trucolo wrote: > Hi everyone! > > I am working with multiple time series

Re: spark and binary files

2015-05-11 Thread tog
D > partitions. You may want to take a deeper look at > SparkContext.newAPIHadoopRDD to load your data. > > On Sat, May 9, 2015 at 4:48 PM, tog > wrote: >> Hi >> >> I have an application that currently runs using MR. It starts by >> ext

spark and binary files

2015-05-08 Thread tog
Hi I have an application that currently runs using MR. It starts by extracting information from a proprietary binary file that is copied to HDFS. The application then creates business objects from the information extracted from the binary files. Later those objects are used for further pro
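One way to get this off the ground without writing a custom InputFormat is SparkContext.binaryFiles, which yields one (path, stream) record per file; the newAPIHadoopRDD route from the reply above is the better fit when individual files are large enough to need splitting. A sketch, with the path invented and everything format-specific left as a stub, since the binary layout is proprietary:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.input.PortableDataStream

    // Placeholder for the business object built from the proprietary format.
    case class Record(id: Long, payload: Array[Byte])

    object BinaryIngest {
      // Stub: the real decoding logic for the proprietary format goes here.
      def parse(bytes: Array[Byte]): Seq[Record] = ???

      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("binary-ingest"))

        // One RDD record per file: (path, stream). Suitable when each file
        // fits in an executor's memory; use newAPIHadoopRDD with a custom
        // InputFormat for splittable or very large files.
        val files = sc.binaryFiles("hdfs:///data/binary/*")
        val records = files.flatMap { case (_, stream: PortableDataStream) =>
          parse(stream.toArray())
        }
        println(records.count())
      }
    }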

parallelism on binary file

2015-05-08 Thread tog
Hi I have an application that currently runs using MR. It starts by extracting information from a proprietary binary file that is copied to HDFS. The application then creates business objects from the information extracted from the binary files. Later those objects are used for further pro