Thanks for sharing this, Brandon! Looks like a great architecture for people to
build on.
Matei
On August 15, 2014 at 2:07:06 PM, Brandon Amos (a...@adobe.com) wrote:
Hi Spark community,
At Adobe Research, we're happy to open source a prototype
technology called Spindle we've been
Hi,
Do you know what YARN scheduler you're using and what version of YARN? It
seems like this would be caused by YarnClient.getQueueInfo returning null,
though, from browsing the YARN code, I'm not sure how this could happen.
-Sandy
On Fri, Aug 15, 2014 at 11:23 AM, Andrew Or
On closer look, it seems like this can occur if the queue doesn't exist.
Filed https://issues.apache.org/jira/browse/SPARK-3082.
-Sandy
On Sat, Aug 16, 2014 at 12:49 AM, Sandy Ryza sandy.r...@cloudera.com
wrote:
Hi,
Do you know what YARN scheduler you're using and what version of YARN? It
If you mean you want to overwrite the file in place while you're
reading it, no, you can't do that with HDFS. That would be dicey on any
file system. If you just want to append to the file, yes, HDFS supports
appends. I am pretty certain Spark does not have a concept that maps
to appending, though I
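For reference, a minimal sketch (not from this thread) of what an HDFS
append looks like through Hadoop's FileSystem API; the path is made up,
and the cluster must have appends enabled or the call throws:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(new Configuration())
// fs.append fails with an exception if appends are disabled on the cluster
val out = fs.append(new Path("/data/events.log")) // hypothetical path
out.write("one more record\n".getBytes("UTF-8"))
out.close()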
Quite a good question. I assume you know the size of the cluster going in;
then you can essentially try to partition the data into some multiple of
that and use a RangePartitioner to split the data roughly equally. By
default, partitions are created based on the number of blocks on the
filesystem, hence the
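As a rough sketch of that idea (the key type and the partition multiple
below are illustrative assumptions, not from the original message):

import org.apache.spark.RangePartitioner

val pairs = sc.parallelize(1 to 1000000).map(k => (k, k * 2))
// e.g. 4 partitions per core on a hypothetical 16-core cluster
val numPartitions = 4 * 16
val balanced = pairs.partitionBy(new RangePartitioner(numPartitions, pairs))

RangePartitioner samples the keys to pick range boundaries, so the
resulting partitions come out roughly equal in size even for skewed keys.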
Hey Brandon,
Thank you for sharing this.
What is the relationship of this project to the spark-ec2 tool that comes
with Spark? Does it provide a superset of the functionality of spark-ec2?
Nick
On Wednesday, August 13, 2014, bdamosa...@adobe.com wrote:
Hi Spark community,
We're excited about Spark
+1 for such a document.
Eric Friedman
On Aug 15, 2014, at 1:10 PM, Kevin Markey kevin.mar...@oracle.com wrote:
Sandy and others:
Is there a single source of YARN/Hadoop properties that should be set or
reset for running Spark on YARN?
We've sort of stumbled through one property
I followed this thread
http://apache-spark-user-list.1001560.n3.nabble.com/YARN-issues-with-resourcemanager-scheduler-address-td5201.html#a5258
to set SPARK_YARN_USER_ENV so that HADOOP_CONF_DIR is on the classpath:
export SPARK_YARN_USER_ENV=CLASSPATH=$HADOOP_CONF_DIR
and used the following command to share conf
There's really nothing special besides including that jar on your classpath.
You just do selects, inserts, etc. as you normally would.
The same instructions here apply
https://cwiki.apache.org/confluence/display/Hive/Parquet
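A hedged sketch of what that looks like from a HiveContext, using the
Spark 1.0-era hql API; the table name and schema are made up, and the
SerDe/format class names follow the pre-Hive-0.13 instructions on that
wiki page:

import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
hiveContext.hql("""
  CREATE TABLE IF NOT EXISTS events (id INT, name STRING)
  ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
  STORED AS
    INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
    OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'
""")
// Plain selects and inserts from here on, as with any Hive table.
hiveContext.hql("SELECT id, name FROM events").collect().foreach(println)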
Thanks for your help.
I'm also running into the same issue and am blocked here. Were any of you
able to get past it? I tried using both ephemeral and persistent HDFS and
get the same issue either way.
Hi to all, sorry for not being fully on topic, but I have 2 quick questions
about Parquet tables registered in Hive/Spark:
1) where are the created tables stored?
2) If I have multiple HiveContexts (one per application) using the same
Parquet table, is there any problem with inserting concurrently
Hi,
Maybe this helps you. For the speed layer, I think something like complex
event processing, which is (to some extent) supported by Spark Streaming,
can make sense. You process the events as they come in and store them
afterwards. The Spark Streaming web page gives a nice example: trend
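As a minimal, hedged sketch of that pattern (the socket source and window
sizes are illustrative assumptions):

import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._

val ssc = new StreamingContext(sc, Seconds(10))
val events = ssc.socketTextStream("localhost", 9999) // assumed event source
// Count events per key over a sliding one-minute window, updated every 10s.
val counts = events.map(e => (e, 1))
  .reduceByKeyAndWindow(_ + _, Seconds(60), Seconds(10))
counts.print() // the "speed layer" view; the raw stream can be stored separately
ssc.start()
ssc.awaitTermination()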
If you're using HiveContext then all metadata is in the Hive metastore as
defined in hive-site.xml.
Concurrent writes should be fine as long as you're using a concurrent metastore
db.
From: Flavio Pompermaier <pomperma...@okkam.it>
Sent: 8/16/2014 1:26 PM
Hi to all, sorry for not being fully on topic, but I have 2 quick questions
about Parquet tables registered in Hive/Spark:
Using HiveQL to CREATE TABLE will add a table to the metastore / warehouse
exactly as it would in Hive. Registering is a purely temporary operation
that lives with the
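A small sketch of the distinction, with made-up names and the Spark
1.0-era API:

import org.apache.spark.sql.hive.HiveContext

case class Record(key: Int, value: String)

val hiveContext = new HiveContext(sc)
import hiveContext._ // createSchemaRDD: implicit RDD-to-SchemaRDD conversion

// Persisted: lands in the Hive metastore / warehouse like any Hive table.
hiveContext.hql("CREATE TABLE IF NOT EXISTS persisted (key INT, value STRING)")

// Temporary: visible only to this context, gone when the application ends.
sc.parallelize(Seq(Record(1, "a"), Record(2, "b"))).registerAsTable("temporary")
hiveContext.hql("SELECT * FROM temporary").collect().foreach(println)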
Hi All,
I was doing a groupBy, and apparently some keys were very frequent, making
the serializer fail with a buffer overflow exception. I did not need the
groupBy, so I switched to combineByKey in this case, but I would like to
know how to increase the Kryo buffer sizes to avoid this error. I hope there is
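For reference, a hedged sketch of raising it via SparkConf;
spark.kryoserializer.buffer.mb is the Spark 1.x name for the setting (it
defaulted to 2 MB and has been renamed in later versions):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("kryo-buffer-example") // illustrative app name
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryoserializer.buffer.mb", "64") // up from the 2 MB default
val sc = new SparkContext(conf)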
I want to look at porting a Hadoop problem to Spark. Eventually I want to
run on a Hadoop 2.0 cluster, but while I am learning and porting I want to
run small problems on my Windows box.
I installed Scala and sbt.
I downloaded Spark, and in the Spark directory I can say
mvn -Phadoop-0.23
Hi Brandon,
Looks very cool... will try it out for ad-hoc analysis of our datasets and
provide more feedback...
Could you please give a bit more details about the differences between the
Spindle architecture and the Hue + Spark integration (Python stack) and the
Ooyala Jobserver?
Does Spindle allow
nevermind folks!!!
On Sat, Aug 16, 2014 at 2:22 PM, Chengi Liu chengi.liu...@gmail.com wrote:
Hi,
I have data like following:
1,2,3,4
1,2,3,4
5,6,2,1
and so on..
I would like to create a new RDD as follows:
(0,0,1)
(0,1,2)
(0,2,3)
(0,3,4)
(1,0,1)
.. and so on..
How do I do
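(For the record, one way to do this, sketched with zipWithIndex and
assuming the rows arrive as comma-separated strings:)

// Turn each row into (rowIndex, columnIndex, value) triples.
val data = sc.parallelize(Seq("1,2,3,4", "1,2,3,4", "5,6,2,1"))
val triples = data.zipWithIndex.flatMap { case (line, row) =>
  line.split(",").zipWithIndex.map { case (value, col) =>
    (row, col, value.toInt)
  }
}
triples.collect().foreach(println) // (0,0,1), (0,1,2), (0,2,3), ...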
I have some RDDs stored as S3-backed sequence files sharded into 1000
parts. The startup time is pretty long (tens of minutes). It's
communicating with S3, but I don't know what it's doing. Is it just
fetching the metadata from S3 for each part? Is there a way to pipeline
this with the
Hi,
I have built spark-1.0.0 on Windows using Java 7/8 and have been able to
run several examples. Here are my notes on how to build from source and
run examples in the Spark shell:
http://ml-nlp-ir.blogspot.com/2014/04/building-spark-on-windows-and-cloudera.html
Regards,
Manu
On Sat, Aug
Hi DB,
Thanks for your reply. I saw the slides on SlideShare, and I am studying
them. But one link on the page, which is
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16579/consoleFull
reports ERROR 404 NOT FOUND.
I am also trying to run on Windows and will post once I am able to launch.
My guess is that "by hand" probably means manually forming the java
command, i.e. the classpath and Java options, and then appending the right
class name for the worker or master.
The Spark scripts follow a hierarchy: start-master or
Hi,
I am just playing around with the code in Spark.
I am printing out some statements in the code given in Spark so as to see
how it looks.
Every time I change/add something to the code I have to run the command
*SPARK_HADOOP_VERSION=2.3.0 sbt/sbt assembly*,
which is tiresome at times.
Is