saving or visualizing PCA

2015-03-18 Thread roni
Hi, I am generating PCA using Spark, but I don't know how to save it to disk or visualize it. Can someone give me some pointers, please? Thanks -Roni

FetchFailedException: Adjusted frame length exceeds 2147483647: 12716268407 - discarded

2015-03-19 Thread roni
I get 2 types of error: org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0, and FetchFailedException: Adjusted frame length exceeds 2147483647: 12716268407 - discarded. Spark keeps retrying to submit the code and keeps getting this error. My file on

distcp problems on ec2 standalone spark cluster

2015-03-09 Thread roni
I got past the cluster-not-started problem by adding yarn as the value of mapreduce.framework.name. But when I try to distcp using a URI of the form s3://path to my bucket, I get an invalid path error even though the bucket exists. If I use s3n:// it just hangs. Did anyone else face anything like

spark-ec2 script problems

2015-03-05 Thread roni
) at org.apache.hadoop.tools.DistCp.main(DistCp.java:374) I tried running start-all.sh, start-dfs.sh and start-yarn.sh. What should I do? Thanks -roni

Re: distcp on ec2 standalone spark cluster

2015-03-07 Thread roni
Did you get this to work? I got past the cluster-not-started problem. I am now having a problem where distcp with an s3 URI reports an incorrect folder path and s3n:// hangs. Stuck for 2 days :( Thanks -R

Re: Setting up Spark with YARN on EC2 cluster

2015-03-10 Thread roni
Hi Harika, did you find a solution for this? I want to use YARN, but the spark-ec2 script does not support it. Thanks -Roni -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Setting-up-Spark-with-YARN-on-EC2-cluster-tp21818p21991.html

Re: diffrence in PCA of MLib vs H2o in R

2015-03-24 Thread roni
. The means inherently don't matter either way in this computation. On Tue, Mar 24, 2015 at 6:13 AM, roni roni.epi...@gmail.com wrote: I am trying to compute PCA using computePrincipalComponents. I also computed PCA using h2o in R and R's prcomp. The answers I get from H2o and R's

upgrade from spark 1.2.1 to 1.3 on EC2 cluster and problems

2015-03-25 Thread roni
I have an EC2 cluster created using Spark version 1.2.1, and I have an SBT project. Now I want to upgrade to Spark 1.3 and use the new features. Below are the issues. Sorry for the long post. Appreciate your help. Thanks -Roni Question - Do I have to create a new cluster using Spark 1.3? Here

Re: upgrade from spark 1.2.1 to 1.3 on EC2 cluster and problems

2015-03-25 Thread roni
...@gmail.com wrote: What version of Spark do the other dependencies rely on (ADAM and H2O)? That could be it. Or try sbt clean compile. On Wed, Mar 25, 2015 at 5:58 PM, roni roni.epi...@gmail.com wrote: I have an EC2 cluster created using

Re: upgrade from spark 1.2.1 to 1.3 on EC2 cluster and problems

2015-03-25 Thread roni
= bedFile.map(_.split(",")).map(a => (a(0), a(1).trim().toInt)) val joinRDD = bedPair.join(filtered) Any idea what's going on? I have data on the EC2 cluster, so I am avoiding creating a new cluster and am just upgrading and changing the code to use 1.3 and Spark SQL. Thanks Roni On Wed, Mar 25

Re: upgrade from spark 1.2.1 to 1.3 on EC2 cluster and problems

2015-03-25 Thread roni
On Wed, Mar 25, 2015 at 12:09 PM, roni roni.epi...@gmail.com wrote: Thanks Dean and Nick. So, I removed the ADAM and H2o from my SBT

Re: upgrade from spark 1.2.1 to 1.3 on EC2 cluster and problems

2015-03-25 Thread roni
at org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative(Native Method) Is there no way to upgrade without creating a new cluster? Thanks Roni On Wed, Mar 25, 2015 at 1:18 PM, Dean Wampler deanwamp...@gmail.com wrote: Yes, that's the problem. The RDD class exists in both binary

Re: Cannot run spark-shell command not found.

2015-03-30 Thread roni
I think you must have downloaded the Spark source code gz file. It is a little confusing. You also have to select the Hadoop version, and the actual tgz file will have both the Spark version and the Hadoop version in its name. -R On Mon, Mar 30, 2015 at 10:34 AM, vance46 wang2...@purdue.edu wrote: Hi all, I'm a

joining multiple parquet files

2015-03-31 Thread roni
) WHERE (DATE(TableC.date)=date(now())) I can do a two-file join like this: val joinedVal = g1.join(g2, g1.col("kmer") === g2.col("kmer")) But I am trying to find common kmer strings from 4 files. Thanks Roni
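
One way to think about the four-file case is as the same pairwise match folded over a list of inputs. The following is only a minimal sketch in plain Python, not Spark code: the function name `common_kmers` and the in-memory lists standing in for the kmer column of each file are hypothetical. It shows the fold itself; with DataFrames the same idea would apply the two-file join repeatedly.

```python
from functools import reduce


def common_kmers(tables):
    """Return the kmer strings present in every input collection.

    `tables` is a list of iterables of kmer strings (hypothetical
    stand-ins for the kmer column of each file).
    """
    kmer_sets = [set(table) for table in tables]
    if not kmer_sets:
        return set()
    # Fold the pairwise intersection over all inputs, left to right.
    return reduce(lambda left, right: left & right, kmer_sets)


# Four small example inputs (made-up data for illustration only).
g1 = ["AAC", "ACG", "CGT"]
g2 = ["ACG", "CGT", "GTA"]
g3 = ["CGT", "ACG"]
g4 = ["ACG", "TTT", "CGT"]
shared = common_kmers([g1, g2, g3, g4])
```

Because each step only narrows the running result, the order of the inputs does not change the final set of shared kmers.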

spark.sql.Row manipulation

2015-03-31 Thread roni
I have 2 Parquet files with a format of e.g. name, age, town. I read them and then join them to get all the names which are in both towns. The resultant dataset is res4: Array[org.apache.spark.sql.Row] = Array([name1, age1, town1,name2,age2,town2]) Name 1 and name 2 are same as I am joining

Re: Resource manager UI for Spark applications

2015-03-03 Thread roni
with internal IP address. Even if I replace that address with the public IP, it still does not work. What kind of setup changes are needed for that? Thanks -roni On Tue, Mar 3, 2015 at 8:45 AM, Rohini joshi roni.epi...@gmail.com wrote: Hi, I have 2 questions - 1. I was trying to use

Re: Resource manager UI for Spark applications

2015-03-03 Thread roni
container_1386639398517_0007_01_19 Cheers On Tue, Mar 3, 2015 at 9:50 AM, roni roni.epi...@gmail.com wrote: Hi Ted, I used s3://support.elasticmapreduce/spark/install-spark to install spark on my EMR cluster. It is 1.2.0. When I click on the link for history or logs it takes me to http://ip-172-31-43

Re: Resource manager UI for Spark applications

2015-03-03 Thread roni
Ah!! I think I know what you mean. My job was just in the accepted stage for a long time as it was running a huge file. But now that it is in the running stage, I can see it. I can see it at port 9046 though instead of 4040. But I can see it. Thanks -roni On Tue, Mar 3, 2015 at 1:19 PM, Zhan Zhang

Re: issue Running Spark Job on Yarn Cluster

2015-03-04 Thread roni
Look at the logs: yarn logs --applicationId <applicationId> That should give the error. On Wed, Mar 4, 2015 at 9:21 AM, sachin Singh sachin.sha...@gmail.com wrote: Not yet. Please let me know if you found a solution. Regards Sachin On 4 Mar 2015 21:45, mael2210 [via Apache Spark User List]

diffrence in PCA of MLib vs H2o in R

2015-03-24 Thread roni
that the settings for MLlib PCA are the same as I am using for H2O or prcomp. Thanks Roni

Storing data in MySQL from spark hive tables

2015-05-20 Thread roni
Hi, I am trying to set up the Hive metastore and MySQL DB connection. I have a Spark cluster and I ran some programs and I have data stored in some Hive tables. Now I want to store this data in MySQL so that it is available for further processing. I set up the hive-site.xml file. <?xml

Re: which database for gene alignment data ?

2015-06-08 Thread roni
with the other .bed files. The data is huge. .bed files can range from 0.5 GB to 5 GB (or more). I was thinking of using Cassandra, but I am not sure if the overlapping queries can be supported and will be fast enough. Thanks for the help -Roni On Sat, Jun 6, 2015 at 7:03 AM, Ted Yu yuzhih...@gmail.com

which database for gene alignment data ?

2015-06-06 Thread roni
I want to use Spark to read compressed .bed files containing gene sequencing alignment data. I want to store the bed file data in a DB and then use external gene expression data to find overlaps etc. Which database is best for this? Thanks -Roni

Re: which database for gene alignment data ?

2015-06-09 Thread roni
to save something in an external database, so that we can re-use the saved data in multiple ways by multiple people. Any suggestions on the DB selection or keeping data centralized for use by multiple distinct groups? Thanks -Roni On Mon, Jun 8, 2015 at 12:47 PM, Frank Austin Nothaft fnoth

Re: Is anyone using Amazon EC2? (second attempt!)

2015-05-29 Thread roni
Hi, any update on this? I am not sure if the issue I am seeing is related. I have 8 slaves, and when I created the cluster I specified an EBS volume of 100G. On EC2 I see 8 volumes created, each attached to the corresponding slave. But when I try to copy data onto it, it complains that

Re: Do I really need to build Spark for Hive/Thrift Server support?

2015-08-10 Thread roni
Hi All, Any explanation for this? As Reece said, I can do operations with Hive, but val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc) gives an error. I have already created a Spark EC2 cluster with the spark-ec2 script. How can I build it again? Thanks _Roni On Tue, Jul 28, 2015

No suitable driver found for jdbc:mysql://

2015-07-22 Thread roni
.compute.amazonaws.com:7077 --class saveBedToDB target/scala-2.10/adam-project_2.10-1.0.jar What else can I do? Thanks -Roni

Upgrade spark cluster to latest version

2015-11-03 Thread roni
Hi Spark experts, This may be a very naive question, but can you please point me to the proper way to upgrade the Spark version on an existing cluster? Thanks Roni > Hi, > I have a current cluster running spark 1.4 and want to upgrade to latest > version. > How can I do it without cr

connecting to remote spark and reading files on HDFS or s3 in sparkR

2015-09-10 Thread roni
I have Spark installed on an EC2 cluster. Can I connect to it from my local SparkR in RStudio? If yes, how? Can I read files which I have saved as Parquet files on HDFS or S3 in SparkR? If yes, how? Thanks -Roni

reading files on HDFS /s3 in sparkR -failing

2015-09-10 Thread roni
read file on S3, I get: java.io.IOException: No FileSystem for scheme: s3 Thanks in advance. -Roni

Re: connecting to remote spark and reading files on HDFS or s3 in sparkR

2015-09-14 Thread roni
r). > > Thanks > Best Regards > > On Thu, Sep 10, 2015 at 11:20 PM, roni <roni.epi...@gmail.com> wrote: > >> I have spark installed on a EC2 cluster. Can I connect to that from my >> local sparkR in RStudio? if yes , how ? >> >> Can I read files which I h

Re: cannot coerce class "data.frame" to a DataFrame - with spark R

2016-02-19 Thread roni
rt them > into data.frame? > > > _ > From: roni <roni.epi...@gmail.com> > Sent: Thursday, February 18, 2016 4:55 PM > Subject: cannot coerce class "data.frame" to a DataFrame - with spark R > To: <user@spark.apache.org> > > > > Hi , > I am tryin

Re: sparkR issues ?

2016-03-15 Thread roni
e if we can change the implementation of as.data.frame() > in SparkR to avoid such covering. > > > > *From:* Alex Kozlov [mailto:ale...@gmail.com] > *Sent:* Tuesday, March 15, 2016 2:59 PM > *To:* roni <roni.epi...@gmail.com> > *Cc:* user@spark.apache.org > *Subject:* Re: sparkR i

Re: sparkR issues ?

2016-03-15 Thread roni
collision. SparkR defines its > own DataFrame class which shadows what seems to be your own definition. > > Is DataFrame something you define? Can you rename it? > > On Mon, Mar 14, 2016 at 10:44 PM, roni <roni.epi...@gmail.com> wrote: > >> Hi, >> I am working

sparkR issues ?

2016-03-14 Thread roni
Hi, I am working with bioinformatics and trying to convert some scripts to SparkR to fit into other Spark jobs. I tried a simple example from a bioinformatics library, and as soon as I start the SparkR environment it does not work. Code as follows - countData <- matrix(1:100,ncol=4) condition <-

bisecting kmeans model tree

2016-04-21 Thread roni
Hi, I want to get the bisecting kmeans tree structure to show a dendrogram on the heatmap I am generating based on the hierarchical clustering of data. How do I get that using MLlib? Thanks -Roni

bisecting kmeans tree

2016-04-20 Thread roni
Hi, I want to get the bisecting kmeans tree structure to show on the heatmap I am generating based on the hierarchical clustering of data. How do I get that using MLlib? Thanks -R

Re: bisecting kmeans model tree

2016-07-12 Thread roni
Hi Spark, MLlib experts, can anyone shed light on this? Thanks _R On Thu, Apr 21, 2016 at 12:46 PM, roni <roni.epi...@gmail.com> wrote: > Hi , > I want to get the bisecting kmeans tree structure to show a dendrogram on > the heatmap I am generating based on the hierarch

MLIB and R results do not match for SVD

2016-08-16 Thread roni
Hi All, Some time back I asked a question about PCA results not matching between R and MLlib. It was suggested that I use svd.v instead of PCA to match the uncentered PCA. But the results of MLlib and R for SVD do not match. (I can understand the numbers not matching exactly) but the

support vector regression in spark

2016-12-01 Thread roni
Hi All, I want to know how I can do support vector regression in Spark. Thanks R

SVM regression in Spark

2016-11-29 Thread roni
ow can I do this in spark? Thanks in advance Roni

Re: SVM regression in Spark

2016-11-30 Thread roni
Hi Spark experts, Can anyone help with doing SVR (support vector machine regression) in Spark? Thanks R On Tue, Nov 29, 2016 at 6:50 PM, roni <roni.epi...@gmail.com> wrote: > Hi All, > I am trying to change my R code to spark. I am using SVM regression in R > . It seems like spa

Re: calculate diff of value and median in a group

2017-07-14 Thread roni
I was using this function percentile_approx on 100GB of compressed data and it just hangs there. Any pointers? On Wed, Mar 22, 2017 at 6:09 PM, ayan guha wrote: > For median, use percentile_approx with 0.5 (50th percentile is the median) > > On Thu, Mar 23, 2017 at 11:01
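
The computation under discussion has two passes: find the median of each group, then subtract it from every value in that group. The following is only a minimal sketch of that logic in plain Python, not Spark code: the function name and the example rows are hypothetical, and it uses an exact in-memory median from the standard library rather than the approximate percentile_approx mentioned in the thread.

```python
from collections import defaultdict
from statistics import median


def diff_from_group_median(rows):
    """For (group, value) pairs, return (group, value, value - group median)."""
    rows = list(rows)  # the input is traversed twice below

    # Pass 1: collect the values of each group and take the exact median.
    values_by_group = defaultdict(list)
    for group, value in rows:
        values_by_group[group].append(value)
    medians = {g: median(vs) for g, vs in values_by_group.items()}

    # Pass 2: attach the difference from the group's median to every row.
    return [(g, v, v - medians[g]) for g, v in rows]


# Made-up example data for illustration only.
example = [("a", 1), ("a", 3), ("a", 10), ("b", 4), ("b", 6)]
result = diff_from_group_median(example)
```

This version holds every group in memory, so it only illustrates the arithmetic; it does not address the scale problem described in the message above.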