Hi all,
I have a few questions on how Spark is integrated with Mesos - any
details, or pointers to a design document / relevant source, will be much
appreciated.
I'm aware of this description,
https://github.com/apache/spark/blob/master/docs/running-on-mesos.md
But it's pretty high-level as
There is a PageRank algorithm in the lib package of graphx. And you can
find an example to invoke it in SynthBenchmark.scala in
org.apache.spark.examples.graphx.
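A minimal sketch of invoking that lib-package PageRank (the path and constants are placeholders, and an existing SparkContext `sc` is assumed):

```scala
import org.apache.spark.graphx.GraphLoader
import org.apache.spark.graphx.lib.PageRank

// Load an edge-list file into a Graph, as SynthBenchmark-style examples do.
val graph = GraphLoader.edgeListFile(sc, "hdfs:///data/edges.txt")

// Either call the lib implementation directly with a fixed iteration count...
val ranks = PageRank.run(graph, numIter = 10).vertices

// ...or use the GraphOps convenience method with a convergence tolerance.
val ranks2 = graph.pageRank(0.0001).vertices
```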
2015-05-03 16:52 GMT+08:00 Praveen Kumar Muthuswamy muthusamy...@gmail.com:
Hi All,
I am looking to run LDA for topic modeling and
OK to file a JIRA to scrape out a few Java 6-specific things in the
code? and/or close issues about working with Java 6 if they're not
going to be resolved for 1.4?
I suppose this means the master builds and PR builder in Jenkins
should simply continue to use Java 7 then.
On Tue, May 5, 2015 at
Glad that worked out for you. I updated the post on my github to hopefully
clarify the issue.
On Tue, May 5, 2015 at 9:36 AM, Mark Stewart mark.stew...@tapjoy.com
wrote:
In case anyone else was having similar issues, the reordering and dropping
of the reduceByKey solved the issues we were
In case anyone else was having similar issues, the reordering and dropping
of the reduceByKey solved the issues we were having. Thank you kindly, Mr.
Koeninger.
On Thu, Apr 30, 2015 at 3:06 PM, Cody Koeninger c...@koeninger.org wrote:
In fact, you're using the 2-arg form of reduceByKey to
OK I sent an email.
On Tue, May 5, 2015 at 2:47 PM, shane knapp skn...@berkeley.edu wrote:
+1 to an announce to user and dev. java6 is so old and sad.
On Tue, May 5, 2015 at 2:24 PM, Tom Graves tgraves...@yahoo.com wrote:
+1. I haven't seen major objections here so I would say send
Hi all,
We will drop support for Java 6 starting with Spark 1.5, tentatively scheduled to
be released in Sep 2015. Spark 1.4, scheduled to be released in June 2015,
will be the last minor release that supports Java 6. That is to say:
Spark 1.4.x (~ Jun 2015): will work with Java 6, 7, 8.
Spark 1.5+ (~
Speaking as a user of Spark on Mesos:
Yes, each app appears as a separate framework on the Mesos master.
In fine-grained mode the number of executors goes up and down, vs. staying
fixed in coarse-grained mode.
I would not run fine-grained mode on a large cluster as it can potentially
spin up a lot of
alright, this is happening again w/this worker and i will be taking it
offline for further investigation. i'm OOO for the rest of the day, but
will check in again later this evening.
On Tue, May 5, 2015 at 9:33 AM, shane knapp skn...@berkeley.edu wrote:
ok, i reset the maven cache on
hmm, still happening. looking deeper.
On Tue, May 5, 2015 at 8:54 AM, shane knapp skn...@berkeley.edu wrote:
taking a look now.
On Tue, May 5, 2015 at 3:23 AM, Patrick Wendell pwend...@gmail.com
wrote:
For unknown reasons, pull requests on Jenkins worker 3 have been
failing with an
Scan sharing can indeed be a useful optimization in Spark, because you
amortize not only the time spent scanning over the data, but also the time
spent in task launch and scheduling overheads.
Here's a trivial example in Scala. I'm not aware of a place in Spark SQL
where this is used - I'd imagine
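The snippet itself is cut off in the archive; a sketch of the kind of scan sharing described (my illustration, assuming an existing SparkContext `sc` and a placeholder input path):

```scala
// Materialize one scan of the input, then let both jobs reuse it.
val input = sc.textFile("hdfs:///data/input.txt").cache()

// Both actions run over the cached partitions instead of re-reading the file,
// so the scan cost (and much of the task-launch overhead) is paid once.
val lineCount = input.count()
val errorCount = input.filter(_.contains("ERROR")).count()
```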
ok, i reset the maven cache on amp-jenkins-worker-03 and some stuff is
currently building and not failing... i'll keep a close eye on this for
now.
On Tue, May 5, 2015 at 9:15 AM, shane knapp skn...@berkeley.edu wrote:
hmm, still happening. looking deeper.
On Tue, May 5, 2015 at 8:54 AM,
Hi everyone,
I have two Spark jobs inside a Spark Application, which read from the same
input file.
They are executed in 2 threads.
Right now, I cache the input file into memory before executing these two
jobs.
Are there other ways to share the same input with just one read?
I know
Looks like there is a case in TableReader.scala where Hive.get() is being
called without already setting it via Hive.get(hiveconf). I am running in
yarn-client mode (compiled with -Phive-provided and with hive-0.13.1a).
Basically this means the broadcasted hiveconf is not getting used and the
taking a look now.
On Tue, May 5, 2015 at 3:23 AM, Patrick Wendell pwend...@gmail.com wrote:
For unknown reasons, pull requests on Jenkins worker 3 have been
failing with an exception[1]. After trying to fix this by clearing the
ivy and maven caches on the node, I've given up and simply
FWIW, CSV has the same problem that defeats naive partitioning.
Consider the following RFC 4180 compliant record:
1,2,"
all,of,these,are,just,one,field
",4,5
Now, it's probably a terrible idea to give a file system awareness of
actual file types, but couldn't HDFS handle this
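A quote-tracking sketch of why this is detectable (my illustration, not an HDFS or Spark API; it assumes the record's long field is quoted, as RFC 4180 requires for embedded commas and line breaks): a physical line ends inside a record exactly when it leaves an odd number of double quotes open, and doubled quotes (`""`) toggle twice, so escapes net out correctly.

```scala
object CsvQuoteScan {
  // Returns true if, after scanning `line`, we are inside a quoted field.
  // `startedInside` carries the state over from the previous physical line.
  def endsInsideQuotedField(line: String, startedInside: Boolean): Boolean = {
    var inside = startedInside
    for (c <- line) if (c == '"') inside = !inside
    inside
  }

  def main(args: Array[String]): Unit = {
    // The three physical lines of the example record, quoted per RFC 4180.
    val lines = Seq("1,2,\"", "all,of,these,are,just,one,field", "\",4,5")
    var inside = false
    for (l <- lines) {
      inside = endsInsideQuotedField(l, inside)
      println(s"$l -> open quote after line: $inside")
    }
  }
}
```

A splitter that starts a partition mid-file cannot know `startedInside` locally, which is exactly why naive byte-offset partitioning breaks.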
@reynold, I'll raise a JIRA today. @oliver, let's discuss on the ticket?
I suspect the algorithm is going to be a bit fiddly and would definitely benefit
from multiple heads. If possible, I think we should handle pathological cases
like {":":":",{"{":"}"}} correctly, rather than bailing out.
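To illustrate the fiddliness (my own sketch, not the algorithm proposed for the ticket): naively counting braces to find record boundaries breaks as soon as a brace appears inside a string, so any splitter has to track quote and escape state.

```scala
object JsonDepth {
  // Brace depth after scanning `s`, ignoring braces inside JSON strings.
  def depthAfter(s: String): Int = {
    var depth = 0
    var inString = false
    var escaped = false
    for (c <- s) {
      if (escaped) escaped = false            // previous char was a backslash
      else if (inString) c match {
        case '\\' => escaped = true
        case '"'  => inString = false
        case _    =>                           // braces in strings don't count
      } else c match {
        case '"' => inString = true
        case '{' => depth += 1
        case '}' => depth -= 1
        case _   =>
      }
    }
    depth
  }

  def main(args: Array[String]): Unit = {
    // 0 with quote tracking; naive brace counting would report -1.
    println(depthAfter("""{"a":"}"}"""))
  }
}
```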
I know these methods, but I need to create events using the timestamps in
the data tuples; that is, every time a new tuple is generated it uses the
timestamp in a CSV file. This will be useful to simulate the data rate
over time, just like real sensor data.
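One way to sketch that replay (names and structure are mine, not a Spark Streaming API): turn the timestamp column into inter-arrival delays, then sleep between emitting rows.

```scala
object Replay {
  // Given event timestamps (ms) in file order, compute the delay to wait
  // before emitting each subsequent row, clamped at zero for out-of-order data.
  def replayDelaysMs(ts: Seq[Long]): Seq[Long] =
    ts.zip(ts.drop(1)).map { case (a, b) => math.max(0L, b - a) }

  def main(args: Array[String]): Unit = {
    val stamps = Seq(1000L, 1500L, 1500L, 3000L)
    println(replayDelaysMs(stamps))   // List(500, 0, 1500)
    // In a generator thread, one would then do:
    //   delays.foreach { d => Thread.sleep(d); emit(nextRow) }
    // and push nextRow into the stream (e.g. a custom receiver or a queue).
  }
}
```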
On Fri, May 1, 2015 at 2:52 PM, Juan
+1 in favor of dropping Java 1.6 support.
+1 in favor of doing a wide ANNOUNCE to the user and dev groups declaring
which version of Spark (sounds like 1.5) will drop support, and when Spark
1.5 will release (if it isn't already posted somewhere).
On 5/5/15, 3:08 AM, Patrick Wendell
Since that's an internal class used only for unit testing, what would the
benefit be?
On Tue, May 5, 2015 at 3:19 PM, BenFradet benjamin.fra...@gmail.com wrote:
Hi,
Since we're now supporting Kafka 0.8.2.1
https://github.com/apache/spark/pull/4537 , and that there is a new
Producer API
Hi,
Since we're now supporting Kafka 0.8.2.1
https://github.com/apache/spark/pull/4537 , and that there is a new
Producer API https://kafka.apache.org/documentation.html#producerapi
with this version, I was wondering if we should convert to this new API in
KafkaTestUtils
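For reference, the new (0.8.2) Producer API the conversion would target looks roughly like this (broker address and topic are placeholders; this needs a running broker, so it is a sketch rather than something KafkaTestUtils would use verbatim):

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

val producer = new KafkaProducer[String, String](props)
// send() is asynchronous and returns a java.util.concurrent.Future;
// calling .get() blocks, which is presumably what the unit tests would do.
producer.send(new ProducerRecord[String, String]("test-topic", "key", "value")).get()
producer.close()
```

The `.get()` is the "turn async into blocking" point raised later in this thread.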
+1
testing is super important, it'll be good to give recognition for it.
On Mon, May 4, 2015 at 5:46 PM, Patrick Wendell pwend...@gmail.com wrote:
Hey All,
Community testing during the QA window is an important part of the
release cycle in Spark. It helps us deliver higher quality releases
Is there any NP-complete problem that Spark or Spark Streaming engineers
needed to address efficiently during design or implementation in order to
achieve satisfactory performance? Can you mention an example of such a
problem and the implementation choice that was made?
Hi Gidon,
1. Yes, each Spark application is wrapped in a new Mesos framework.
2. In fine grained mode, what happens is that Spark scheduler
specifies a custom Mesos executor per slave, and each Mesos task is a
Spark executor that will be launched by the Mesos executor. It's hard
to determine
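The two modes can be sketched as a spark-defaults.conf fragment (hostnames are placeholders; `spark.mesos.coarse` is the actual switch in 1.x):

```properties
# Fine-grained (the 1.x default): each Spark task maps to a Mesos task,
# so executor resource usage grows and shrinks with load.
spark.master         mesos://zk://zk1:2181/mesos
spark.mesos.coarse   false

# Coarse-grained: one long-lived Mesos task per slave holds the executor,
# avoiding per-task launch overhead at the cost of pinned resources.
# spark.mesos.coarse true
```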
Kathy,
Thank you. I have CCd the project so they can resolve this.
On Tuesday, 5 May 2015, Kathy Wilson kwil...@tableau.com wrote:
There’s a typo in the title under Integrated on the Spark SQL web page (
https://spark.apache.org/sql/):
Seemlessly mix SQL queries with Spark programs.
Even if it's only used for testing and the examples, why not move ahead of
the deprecation and gain some performance along the way?
Plus, regarding the examples, I think it's good practice to use the
recommended API and not the legacy one.
Regarding performance, keep in mind we'd probably have to turn all those
async calls into blocking calls for the unit tests
On Tue, May 5, 2015 at 3:44 PM, BenFradet benjamin.fra...@gmail.com wrote:
Even if it's only used for testing and the examples, why not move ahead of
the deprecation and
I'm probably the only Eclipse user here, but it seems I have the best
workflow :) At least for me things work as they should: once I imported
projects in the workspace I can build and run/debug tests from the IDE. I
only go to sbt when I need to re-create projects or I want to run the full
test
If there is broad consensus here to drop Java 1.6 in Spark 1.5, should
we do an ANNOUNCE to user and dev?
On Mon, May 4, 2015 at 7:24 PM, shane knapp skn...@berkeley.edu wrote:
sgtm
On Mon, May 4, 2015 at 11:23 AM, Patrick Wendell pwend...@gmail.com wrote:
If we just set JAVA_HOME in
For unknown reasons, pull requests on Jenkins worker 3 have been
failing with an exception[1]. After trying to fix this by clearing the
ivy and maven caches on the node, I've given up and simply blacklisted
that worker.
[error] oro#oro;2.0.8!oro.jar origin location must be absolute:
Hi,
Besides caching, is it possible for an RDD to have multiple child RDDs? Then I
could read the input once and produce multiple outputs for the multiple jobs
that share the input.
On May 5, 2015 6:24 PM, Evan R. Sparks evan.spa...@gmail.com wrote:
Scan sharing can indeed be a useful optimization in spark,
Yes that might be true, I will have to test that.
+1. I haven't seen major objections here so I would say send announcement and
see if any users have objections
Tom
On Tuesday, May 5, 2015 5:09 AM, Patrick Wendell pwend...@gmail.com
wrote:
If there is broad consensus here to drop Java 1.6 in Spark 1.5, should
we do an ANNOUNCE to
+1 to an announce to user and dev. java6 is so old and sad.
On Tue, May 5, 2015 at 2:24 PM, Tom Graves tgraves...@yahoo.com wrote:
+1. I haven't seen major objections here so I would say send announcement
and see if any users have objections
Tom
On Tuesday, May 5, 2015 5:09 AM,
I've raised the JSON-related ticket at
https://issues.apache.org/jira/browse/SPARK-7366.
@Ewan I think it would be great to support multiline CSV records too.
The motivation is very similar but my instinct is that little/nothing
of the implementation could be usefully shared, so it's better as a