Hi,
I am generating PCA using Spark, but I don't know how to save the result
to disk or visualize it.
Can someone give me some pointers, please?
Thanks
-Roni
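For the saving part, one option is the following sketch. It assumes `rows` is an existing RDD[Vector] of the feature data (the name and the output path are placeholders, and the number of components, 10, is just an example); the principal components come back as a small local matrix, so they can be written out as plain text for plotting in R or Python:

```scala
import org.apache.spark.mllib.linalg.distributed.RowMatrix

// `rows` is assumed to be an existing RDD[Vector] of your feature data
val mat = new RowMatrix(rows)
val pc  = mat.computePrincipalComponents(10) // local Matrix: nFeatures x 10

// The result is a local matrix, so serialize it as comma-separated text,
// one matrix row per line, for plotting in external tools.
val lines = for (i <- 0 until pc.numRows)
  yield (0 until pc.numCols).map(j => pc(i, j)).mkString(",")
sc.parallelize(lines).saveAsTextFile("hdfs:///tmp/pca-components")
```

The saved file can then be loaded into R or matplotlib for a scatter plot of the first two components.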
I get two types of errors:
org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output
location for shuffle 0, and
FetchFailedException: Adjusted frame length exceeds 2147483647: 12716268407
- discarded
Spark keeps retrying the stage and keeps getting this error.
My file on
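The "Adjusted frame length exceeds 2147483647" message usually means a single shuffle block grew past the 2 GB frame limit; the common workaround is to use more partitions so each shuffle block stays small. A minimal sketch, where `bigRdd` and the partition count of 2000 are assumed example values to tune for your data:

```scala
// Repartitioning before the shuffle-heavy step spreads the data over
// more, smaller shuffle blocks, keeping each under the 2 GB frame limit.
val repartitioned = bigRdd.repartition(2000)

// For DataFrame / Spark SQL joins, raise the shuffle partition count instead:
sqlContext.setConf("spark.sql.shuffle.partitions", "2000")
```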
I got past the cluster-not-started problem by setting
mapreduce.framework.name to yarn.
But when I try distcp, if I use a URI with s3:// pointing to my bucket, I
get an invalid-path error even though the bucket exists.
If I use s3n:// it just hangs.
Did anyone else face anything like this?
)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:374)
I tried doing start-all.sh, start-dfs.sh and start-yarn.sh.
What should I do?
Thanks
-roni
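One thing worth checking for the s3n:// hang mentioned above: s3n needs AWS credentials configured (fs.s3n.awsAccessKeyId / fs.s3n.awsSecretAccessKey in core-site.xml, or passed on the command line). A sketch, with the key values and bucket/path as placeholders:

```shell
# Credentials can be supplied inline with -D if they are not in core-site.xml;
# the source path and bucket name here are placeholders.
hadoop distcp \
  -Dfs.s3n.awsAccessKeyId=AKIA_PLACEHOLDER \
  -Dfs.s3n.awsSecretAccessKey=SECRET_PLACEHOLDER \
  hdfs:///data s3n://my-bucket/data
```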
Did you get this to work?
I got past the cluster-not-started problem.
I am having a problem where distcp with an s3 URI says incorrect folder path and
s3n:// hangs.
Stuck for 2 days :(
Thanks
-R
Hi Harika,
Did you get any solution for this?
I want to use YARN, but the spark-ec2 script does not support it.
Thanks
-Roni
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Setting-up-Spark-with-YARN-on-EC2-cluster-tp21818p21991.html
. The means inherently don't matter either way in
this computation.
On Tue, Mar 24, 2015 at 6:13 AM, roni roni.epi...@gmail.com wrote:
I am trying to compute PCA using computePrincipalComponents.
I also computed PCA using H2O in R and R's prcomp. The answers I get
from
H2o and R's
I have an EC2 cluster created using Spark version 1.2.1,
and I have an SBT project.
Now I want to upgrade to Spark 1.3 and use the new features.
Below are the issues.
Sorry for the long post.
Appreciate your help.
Thanks
-Roni
Question - Do I have to create a new cluster using Spark 1.3?
Here
...@gmail.com
wrote:
What version of Spark do the other dependencies rely on (ADAM and H2O)?
That could be it.
Or try sbt clean compile.
On Wed, Mar 25, 2015 at 5:58 PM, roni roni.epi...@gmail.com wrote:
I have an EC2 cluster created using
val bedPair = bedFile.map(_.split(",")).map(a => (a(0), a(1).trim.toInt))
val joinRDD = bedPair.join(filtered)
Any idea what's going on?
I have data on the EC2 so I am avoiding creating a new cluster, but just
upgrading and changing the code to use 1.3 and Spark SQL.
Thanks
Roni
On Wed, Mar 25
On Wed, Mar 25, 2015 at 12:09 PM, roni roni.epi...@gmail.com wrote:
Thanks Dean and Nick.
So, I removed the ADAM and H2O from my SBT
at
org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative(Native
Method)
Is there no way to upgrade without creating a new cluster?
Thanks
Roni
On Wed, Mar 25, 2015 at 1:18 PM, Dean Wampler deanwamp...@gmail.com wrote:
Yes, that's the problem. The RDD class exists in both binary
I think you must have downloaded the Spark source code gz file.
It is a little confusing. You have to select the Hadoop version also, and
the actual tgz file will have the Spark version and Hadoop version in it.
-R
On Mon, Mar 30, 2015 at 10:34 AM, vance46 wang2...@purdue.edu wrote:
Hi all,
I'm a
)
WHERE (DATE(TableC.date)=date(now()))
I can do a two-file join like - val joinedVal =
g1.join(g2, g1.col("kmer") === g2.col("kmer"))
But I am trying to find common kmer strings from 4 files.
Thanks
Roni
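One way to extend the two-file join above to four files is simply to chain inner joins on the kmer column. A sketch, assuming g1 through g4 are DataFrames that each have a "kmer" column (g3 and g4 are hypothetical names for the other two files):

```scala
// Chained inner joins keep only the kmers present in all four DataFrames.
val common = g1.join(g2, g1.col("kmer") === g2.col("kmer"))
               .join(g3, g1.col("kmer") === g3.col("kmer"))
               .join(g4, g1.col("kmer") === g4.col("kmer"))
               .select(g1.col("kmer"))
```

Selecting only g1's kmer column at the end avoids carrying four duplicate copies of it in the result.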
I have 2 parquet files with format e.g. name, age, town
I read them and then join them to get all the names which are in both
towns.
The resultant dataset is
res4: Array[org.apache.spark.sql.Row] = Array([name1, age1,
town1,name2,age2,town2])
name1 and name2 are the same as I am joining
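To avoid the duplicated columns in the joined rows, one option is to select the wanted columns from one side after the join. A sketch, where df1 and df2 stand in for the two parquet DataFrames:

```scala
// Join on name, then keep only df1's columns so the result has a single
// name/age/town triple instead of both sides' duplicates.
val joined = df1.join(df2, df1.col("name") === df2.col("name"))
                .select(df1.col("name"), df1.col("age"), df1.col("town"))
```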
with the internal IP address. Even if I replace that address
with the public IP, it still does not work. What kind of setup changes
are needed for that?
Thanks
-roni
On Tue, Mar 3, 2015 at 8:45 AM, Rohini joshi roni.epi...@gmail.com
wrote:
Hi ,
I have 2 questions -
1. I was trying to use
container_1386639398517_0007_01_19
Cheers
On Tue, Mar 3, 2015 at 9:50 AM, roni roni.epi...@gmail.com wrote:
Hi Ted,
I used s3://support.elasticmapreduce/spark/install-spark to install
spark on my EMR cluster. It is 1.2.0.
When I click on the link for history or logs it takes me to
http://ip-172-31-43
ah!! I think I know what you mean. My job was just in the accepted stage for
a long time as it was running a huge file.
But now that it is in the running stage, I can see it. I can see it at port
9046 though instead of 4040. But I can see it.
Thanks
-roni
On Tue, Mar 3, 2015 at 1:19 PM, Zhan Zhang
Look at the logs:
yarn logs -applicationId <applicationId>
That should give the error.
On Wed, Mar 4, 2015 at 9:21 AM, sachin Singh sachin.sha...@gmail.com
wrote:
Not yet.
Please let me know if you find a solution.
Regards
Sachin
On 4 Mar 2015 21:45, mael2210 [via Apache Spark User List]
that the settings for MLlib PCA are the same as I am using for
H2O or prcomp.
Thanks
Roni
Hi,
I am trying to set up the Hive metastore and MySQL DB connection.
I have a Spark cluster, I ran some programs, and I have data stored in
some Hive tables.
Now I want to store this data in MySQL so that it is available for
further processing.
I set up the hive-site.xml file.
?xml
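For reference, a MySQL-backed metastore configuration in hive-site.xml typically looks like the following sketch (host, database, user and password are placeholders to replace with your own):

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://metastore-host:3306/metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hivepass</value>
  </property>
</configuration>
```

The MySQL JDBC driver jar also has to be on the classpath for this to work.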
with the other .bed files.
The data is huge: .bed files can range from 0.5 GB to 5 GB (or more).
I was thinking of using Cassandra, but am not sure if the overlapping
queries can be supported and will be fast enough.
Thanks for the help
-Roni
On Sat, Jun 6, 2015 at 7:03 AM, Ted Yu yuzhih...@gmail.com
I want to use Spark for reading compressed .bed files containing gene
sequencing alignment data.
I want to store the bed file data in a DB and then use external gene expression
data to find overlaps etc. Which database is best for it?
Thanks
-Roni
to save something in an external database, so that we can re-use
the saved data in multiple ways by multiple people.
Any suggestions on the DB selection or keeping data centralized for use by
multiple distinct groups?
Thanks
-Roni
On Mon, Jun 8, 2015 at 12:47 PM, Frank Austin Nothaft fnoth
Hi ,
Any update on this?
I am not sure if the issue I am seeing is related.
I have 8 slaves, and when I created the cluster I specified an EBS volume of
100G.
I see on EC2 8 volumes created and each attached to the corresponding slave.
But when I try to copy data onto it, it complains that
Hi All,
Any explanation for this?
As Reece said, I can do operations with Hive, but -
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc) -- gives an error.
I have already created a Spark EC2 cluster with the spark-ec2 script. How can
I build it again?
Thanks
-Roni
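A HiveContext failing like this often means the Spark build on the cluster was compiled without Hive support. For Spark 1.x, a rebuild with the Hive profiles enabled can be sketched as follows (run from the Spark source directory; the Hadoop version flags are examples and should match your cluster):

```shell
# Build a distribution tarball with Hive support enabled; the hadoop
# version shown is an example, substitute the one your cluster runs.
./make-distribution.sh --tgz -Phive -Phive-thriftserver \
    -Phadoop-2.4 -Dhadoop.version=2.4.0
```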
On Tue, Jul 28, 2015
ec2-52-25-191-999.us-west-2.compute.amazonaws.com:7077 --class
saveBedToDB target/scala-2.10/adam-project_2.10-1.0.jar
What else can I do?
Thanks
-Roni
Hi Spark experts,
This may be a very naive question, but can you please point me to the proper
way to upgrade the Spark version on an existing cluster.
Thanks
Roni
> Hi,
> I have a current cluster running Spark 1.4 and want to upgrade to the latest
> version.
> How can I do it without cr
I have Spark installed on an EC2 cluster. Can I connect to that from my
local sparkR in RStudio? If yes, how?
Can I read files which I have saved as parquet files on HDFS or S3 in
sparkR? If yes, how?
Thanks
-Roni
read file on s3 , I get - java.io.IOException: No FileSystem for
scheme: s3
Thanks in advance.
-Roni
r).
>
> Thanks
> Best Regards
>
> On Thu, Sep 10, 2015 at 11:20 PM, roni <roni.epi...@gmail.com> wrote:
>
>> I have spark installed on a EC2 cluster. Can I connect to that from my
>> local sparkR in RStudio? if yes , how ?
>>
>> Can I read files which I h
rt them
> into data.frame?
>
>
> _
> From: roni <roni.epi...@gmail.com>
> Sent: Thursday, February 18, 2016 4:55 PM
> Subject: cannot coerce class "data.frame" to a DataFrame - with spark R
> To: <user@spark.apache.org>
>
>
>
> Hi ,
> I am tryin
e if we can change the implementation of as.data.frame()
> in SparkR to avoid such covering.
>
>
>
> *From:* Alex Kozlov [mailto:ale...@gmail.com]
> *Sent:* Tuesday, March 15, 2016 2:59 PM
> *To:* roni <roni.epi...@gmail.com>
> *Cc:* user@spark.apache.org
> *Subject:* Re: sparkR i
collision. SparkR defines its
> own DataFrame class which shadows what seems to be your own definition.
>
> Is DataFrame something you define? Can you rename it?
>
> On Mon, Mar 14, 2016 at 10:44 PM, roni <roni.epi...@gmail.com> wrote:
>
>> Hi,
>> I am working
Hi,
I am working with bioinformatics and trying to convert some scripts to
sparkR to fit into other Spark jobs.
I tried a simple example from a bioinformatics lib, and as soon as I start the
sparkR environment it does not work.
Code as follows -
countData <- matrix(1:100,ncol=4)
condition <-
Hi,
I want to get the bisecting k-means tree structure to show a dendrogram on
the heatmap I am generating based on the hierarchical clustering of data.
How do I get that using MLlib?
Thanks
-Roni
Hi,
I want to get the bisecting k-means tree structure to show on the heatmap I
am generating based on the hierarchical clustering of data.
How do I get that using MLlib?
Thanks
-R
Hi Spark/MLlib experts,
Anyone who can shine a light on this?
Thanks
-R
On Thu, Apr 21, 2016 at 12:46 PM, roni <roni.epi...@gmail.com> wrote:
> Hi,
> I want to get the bisecting k-means tree structure to show a dendrogram on
> the heatmap I am generating based on the hierarch
Hi All,
Some time back I had asked a question about PCA results not matching
between R and MLlib. It was suggested I use svd.V instead of PCA to match
the uncentered PCA.
But the results of MLlib and R for SVD do not match (I can understand the
numbers not matching exactly), but the
Hi All,
I want to know how I can do support vector regression in Spark.
Thanks
R
How can I do this in Spark?
Thanks in advance
Roni
Hi Spark experts,
Can anyone help with doing SVR (support vector machine regression) in
Spark?
Thanks
R
On Tue, Nov 29, 2016 at 6:50 PM, roni <roni.epi...@gmail.com> wrote:
> Hi All,
> I am trying to change my R code to Spark. I am using SVM regression in R.
> It seems like spa
I was using the function percentile_approx on 100 GB of compressed data
and it just hangs there. Any pointers?
On Wed, Mar 22, 2017 at 6:09 PM, ayan guha wrote:
> For median, use percentile_approx with 0.5 (50th percentile is the median)
>
> On Thu, Mar 23, 2017 at 11:01
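The percentile_approx suggestion above can be sketched as follows (the DataFrame and column names are assumed; the optional third argument controls accuracy, and lowering it trades precision for speed and memory on very large inputs, which may help with the hang):

```scala
// Median via percentile_approx; the accuracy argument (here 100) is an
// example value - smaller means faster and less memory, but less precise.
val median = df.selectExpr("percentile_approx(value, 0.5, 100) AS median")
```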