date:20150505

Spark/Mesos

2015-05-05 Thread Gidon Gershinsky

Hi all, I have a few questions on how Spark is integrated with Mesos - any details, or pointers to a design document / relevant source, will be much appreciated. I'm aware of this description, https://github.com/apache/spark/blob/master/docs/running-on-mesos.md but it's pretty high-level as

Re: LDA and PageRank Using GraphX

2015-05-05 Thread 吴明瑜

There is a PageRank algorithm in the lib package of graphx. And you can find an example to invoke it in SynthBenchmark.scala in org.apache.spark.examples.graphx. 2015-05-03 16:52 GMT+08:00 Praveen Kumar Muthuswamy muthusamy...@gmail.com : Hi All, I am looking to run LDA for topic modeling and

Re: [ANNOUNCE] Ending Java 6 support in Spark 1.5 (Sep 2015)

2015-05-05 Thread Sean Owen

OK to file a JIRA to scrape out a few Java 6-specific things in the code? and/or close issues about working with Java 6 if they're not going to be resolved for 1.4? I suppose this means the master builds and PR builder in Jenkins should simply continue to use Java 7 then. On Tue, May 5, 2015 at

Re: practical usage of the new exactly-once supporting DirectKafkaInputDStream

2015-05-05 Thread Cody Koeninger

Glad that worked out for you. I updated the post on my github to hopefully clarify the issue. On Tue, May 5, 2015 at 9:36 AM, Mark Stewart mark.stew...@tapjoy.com wrote: In case anyone else was having similar issues, the reordering and dropping of the reduceByKey solved the issues we were

Re: practical usage of the new exactly-once supporting DirectKafkaInputDStream

2015-05-05 Thread Mark Stewart

In case anyone else was having similar issues, the reordering and dropping of the reduceByKey solved the issues we were having. Thank you kindly, Mr. Koeninger. On Thu, Apr 30, 2015 at 3:06 PM, Cody Koeninger c...@koeninger.org wrote: In fact, you're using the 2 arg form of reduce by key to

Re: [discuss] ending support for Java 6?

2015-05-05 Thread Reynold Xin

OK I sent an email. On Tue, May 5, 2015 at 2:47 PM, shane knapp skn...@berkeley.edu wrote: +1 to an announce to user and dev. java6 is so old and sad. On Tue, May 5, 2015 at 2:24 PM, Tom Graves tgraves...@yahoo.com wrote: +1. I haven't seen major objections here so I would say send

[ANNOUNCE] Ending Java 6 support in Spark 1.5 (Sep 2015)

2015-05-05 Thread Reynold Xin

Hi all, We will drop support for Java 6 starting with Spark 1.5, tentatively scheduled to be released in Sep 2015. Spark 1.4, scheduled to be released in June 2015, will be the last minor release that supports Java 6. That is to say: Spark 1.4.x (~ Jun 2015): will work with Java 6, 7, 8. Spark 1.5+ (~

Re: Spark/Mesos

2015-05-05 Thread Hector Yee

Speaking as a user of Spark on Mesos: yes, it appears that each app shows up as a separate framework on the Mesos master. In fine-grained mode the number of executors goes up and down, vs. fixed in coarse. I would not run fine-grained mode on a large cluster as it can potentially spin up a lot of

Re: Pull request builder errors (taking Jenkins worker 3 offline)

2015-05-05 Thread shane knapp

alright, this is happening again w/this worker and i will be taking it offline for further investigation. i'm OOO for the rest of the day, but will check in again later this evening. On Tue, May 5, 2015 at 9:33 AM, shane knapp skn...@berkeley.edu wrote: ok, i reset the maven cache on

Re: Pull request builder errors (taking Jenkins worker 3 offline)

2015-05-05 Thread shane knapp

hmm, still happening. looking deeper. On Tue, May 5, 2015 at 8:54 AM, shane knapp skn...@berkeley.edu wrote: taking a look now. On Tue, May 5, 2015 at 3:23 AM, Patrick Wendell pwend...@gmail.com wrote: For unknown reasons, pull requests on Jenkins worker 3 have been failing with an

Re: Scan Sharing in Spark

2015-05-05 Thread Evan R. Sparks

Scan sharing can indeed be a useful optimization in spark, because you amortize not only the time spent scanning over the data, but also time spent in task launch and scheduling overheads. Here's a trivial example in scala. I'm not aware of a place in SparkSQL where this is used - I'd imagine
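Evan's Scala example is cut off in this listing. As a separate illustration of the idea only, here is a minimal plain-Python sketch (not Spark code, and not the example from the message) in which several "jobs" update their own state during one pass over the input, so the scan cost is paid once. The names `shared_scan` and `jobs` are hypothetical.

```python
def shared_scan(lines, jobs):
    """Run every job's per-record update during a single pass over `lines`.

    `jobs` maps a job name to (initial_state, update_fn), where
    update_fn(state, line) returns the new state for that job.
    """
    states = {name: init for name, (init, _) in jobs.items()}
    for line in lines:  # the input is read exactly once
        for name, (_, update) in jobs.items():
            states[name] = update(states[name], line)
    return states


# Two independent aggregations that would otherwise each scan the input.
jobs = {
    "line_count": (0, lambda acc, line: acc + 1),
    "total_chars": (0, lambda acc, line: acc + len(line)),
}

# iter(...) can only be consumed once, which shows a single pass suffices.
result = shared_scan(iter(["a", "bb", "ccc"]), jobs)
```

In Spark terms this corresponds roughly to folding several aggregations into one job over the same data, rather than launching one job per aggregation.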

Re: Pull request builder errors (taking Jenkins worker 3 offline)

2015-05-05 Thread shane knapp

ok, i reset the maven cache on amp-jenkins-worker-03 and some stuff is currently building and not failing... i'll keep a close eye on this for now. On Tue, May 5, 2015 at 9:15 AM, shane knapp skn...@berkeley.edu wrote: hmm, still happening. looking deeper. On Tue, May 5, 2015 at 8:54 AM,

Scan Sharing in Spark

2015-05-05 Thread Quang-Nhat HOANG-XUAN

Hi everyone, I have two Spark jobs inside a Spark Application, which read from the same input file. They are executed in 2 threads. Right now, I cache the input file into memory before executing these two jobs. Are there other ways to share the same input with only one read? I know

Hive.get() called without HiveConf being already set on a yarn executor

2015-05-05 Thread Manku Timma

Looks like there is a case in TableReader.scala where Hive.get() is being called without already setting it via Hive.get(hiveconf). I am running in yarn-client mode (compiled with -Phive-provided and with hive-0.13.1a). Basically this means the broadcasted hiveconf is not getting used and the

Re: Pull request builder errors (taking Jenkins worker 3 offline)

2015-05-05 Thread shane knapp

taking a look now. On Tue, May 5, 2015 at 3:23 AM, Patrick Wendell pwend...@gmail.com wrote: For unknown reasons, pull requests on Jenkins worker 3 have been failing with an exception[1]. After trying to fix this by clearing the ivy and maven caches on the node, I've given up and simply

Re: Multi-Line JSON in SparkSQL

2015-05-05 Thread Ewan Higgs

FWIW, CSV has the same problem that renders it immune to naive partitioning. Consider the following RFC 4180 compliant record: 1,2, all,of,these,are,just,one,field ,4,5 Now, it's probably a terrible idea to give a file system awareness of actual file types, but couldn't HDFS handle this
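The record in the message above lost its quoting and line breaks in this listing, so the following is a hypothetical record of the same flavour, not a reconstruction. It is a small sketch of why splitting on newlines breaks RFC 4180 data: a quoted field may itself contain line breaks, so a single record can span several physical lines.

```python
import csv
import io

# Hypothetical record: the third field is quoted and contains two newlines.
data = '1,2,"all,of,\nthese,are,\njust,one,field",4,5\n'

# Naive partitioning on line boundaries sees three "records".
naive_lines = data.splitlines()

# A CSV-aware reader sees one record with five fields.
# newline='' keeps the embedded line breaks untouched for the csv module.
rows = list(csv.reader(io.StringIO(data, newline='')))
```

Here `naive_lines` has three entries while `rows` has one, which is the mismatch a byte-offset split would run into.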

Re: Multi-Line JSON in SparkSQL

2015-05-05 Thread Joe Halliwell

@reynold, I’ll raise a JIRA today. @oliver, let’s discuss on the ticket? I suspect the algorithm is going to be a bit fiddly and would definitely benefit from multiple heads. If possible, I think we should handle pathological cases like {“:”:”:”,{”{”:”}”}} correctly, rather than bailing out.
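To make the "fiddly" part concrete, here is a minimal sketch (an assumption about one possible approach, not the algorithm discussed on the ticket) of finding top-level object boundaries in a stream of multi-line JSON. Braces and colons inside string literals must be ignored, which means tracking string state and backslash escapes. The function name `split_top_level_objects` is hypothetical.

```python
import json


def split_top_level_objects(text):
    """Return the substrings of `text` that are top-level {...} objects."""
    objects = []
    depth = 0
    in_string = False
    escaped = False
    start = None
    for i, ch in enumerate(text):
        if in_string:
            if escaped:
                escaped = False      # this character was escaped; skip it
            elif ch == '\\':
                escaped = True       # next character is escaped
            elif ch == '"':
                in_string = False    # closing quote
            continue
        if ch == '"':
            in_string = True
        elif ch == '{':
            if depth == 0:
                start = i
            depth += 1
        elif ch == '}' and depth > 0:
            depth -= 1
            if depth == 0:
                objects.append(text[start:i + 1])
    return objects


# Two records; braces and an escaped quote appear inside string values.
text = '{"{":"}",\n "a": 1}\n{"b": "\\"}"}'
records = [json.loads(chunk) for chunk in split_top_level_objects(text)]
```

This scanner assumes it starts at a record boundary; the harder problem for a splittable input format is resynchronising from an arbitrary byte offset, which this sketch does not attempt.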

Re: Event generator for SPARK-Streaming from csv

2015-05-05 Thread anshu shukla

I know these methods, but I need to create events using the timestamps in the data tuples, meaning that each new tuple is generated at the time given by its timestamp in a CSV file. This will be useful to simulate the data rate over time, just like real sensor data. On Fri, May 1, 2015 at 2:52 PM, Juan
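A minimal sketch of the replay idea described above, in plain Python rather than as a Spark Streaming receiver. It assumes, hypothetically, that the first CSV column holds a numeric timestamp in seconds; `replay_delays`, `replay`, `ts_index` and `speedup` are illustrative names, not part of any Spark API.

```python
import csv
import io
import time


def replay_delays(rows, ts_index=0, speedup=1.0):
    """Yield (delay_seconds, row): how long to wait before emitting each row."""
    previous = None
    for row in rows:
        ts = float(row[ts_index])
        # First row is emitted immediately; later rows wait for the gap
        # between consecutive timestamps (never a negative amount).
        delay = 0.0 if previous is None else max(0.0, (ts - previous) / speedup)
        previous = ts
        yield delay, row


def replay(rows, emit, sleep=time.sleep, **kwargs):
    """Emit each row after sleeping for its inter-arrival gap."""
    for delay, row in replay_delays(rows, **kwargs):
        sleep(delay)
        emit(row)


sample = "0,a\n2,b\n5,c\n"
delays = [d for d, _ in replay_delays(csv.reader(io.StringIO(sample)))]
```

Passing `speedup=10.0` would replay the same file ten times faster; `emit` could write to a socket or queue that a streaming job reads from.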

Re: [discuss] ending support for Java 6?

2015-05-05 Thread York, Brennon

+1 in favor of dropping Java 1.6 support. +1 in favor of doing a wide ANNOUNCE to the user and dev groups declaring which version of Spark (sounds like 1.5) will drop support and when (if it isn't already posted somewhere) Spark 1.5 will release. On 5/5/15, 3:08 AM, Patrick Wendell

Re: New Kafka producer API

2015-05-05 Thread Cody Koeninger

Since that's an internal class used only for unit testing, what would the benefit be? On Tue, May 5, 2015 at 3:19 PM, BenFradet benjamin.fra...@gmail.com wrote: Hi, Since we're now supporting Kafka 0.8.2.1 https://github.com/apache/spark/pull/4537 , and that there is a new Producer API

New Kafka producer API

2015-05-05 Thread BenFradet

Hi, Since we're now supporting Kafka 0.8.2.1 https://github.com/apache/spark/pull/4537 , and that there is a new Producer API https://kafka.apache.org/documentation.html#producerapi with this version, I was wondering if we should convert to this new API in KafkaTestUtils

Re: Thanking Test Partners

2015-05-05 Thread Imran Rashid

+1 testing is super important, it'll be good to give recognition for it. On Mon, May 4, 2015 at 5:46 PM, Patrick Wendell pwend...@gmail.com wrote: Hey All, Community testing during the QA window is an important part of the release cycle in Spark. It helps us deliver higher quality releases

NP-Complete Design Choices in Spark Implementation

2015-05-05 Thread Junior

Is there any NP-complete problem that Spark or Spark Streaming engineers needed to address efficiently during design or implementation in order to achieve satisfactory performance? Can you mention an example of such a problem and the implementation choice made? -- View this message in

Re: Spark/Mesos

2015-05-05 Thread Timothy Chen

Hi Gidon, 1. Yes, each Spark application is wrapped in a new Mesos framework. 2. In fine grained mode, what happens is that Spark scheduler specifies a custom Mesos executor per slave, and each Mesos task is a Spark executor that will be launched by the Mesos executor. It's hard to determine

Re: Typo on Spark SQL web page

2015-05-05 Thread Tony Stevenson

Kathy, Thank you. I have CCd the project so they can resolve this. On Tuesday, 5 May 2015, Kathy Wilson kwil...@tableau.com wrote: There’s a typo in the title under Integrated on the Spark SQL web page ( https://spark.apache.org/sql/): Seemlessly mix SQL queries with Spark programs.

Re: New Kafka producer API

2015-05-05 Thread BenFradet

Even if it's only used for testing and the examples, why not move ahead of the deprecation and gain some performance along the way? Plus, regarding the examples, I think it's good practice to use the recommended API and not the legacy one. -- View this message in context:

Re: New Kafka producer API

2015-05-05 Thread Cody Koeninger

Regarding performance, keep in mind we'd probably have to turn all those async calls into blocking calls for the unit tests On Tue, May 5, 2015 at 3:44 PM, BenFradet benjamin.fra...@gmail.com wrote: Even if it's only used for testing and the examples, why not move ahead of the deprecation and
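The point about blocking can be illustrated generically. The sketch below uses Python's `concurrent.futures` as a stand-in for an asynchronous producer (it is not the Kafka API, and `send_async` / `send_blocking` are hypothetical names): a test that needs each message delivered before it continues has to wait on every future, which gives up the batching benefit of the async call.

```python
from concurrent.futures import ThreadPoolExecutor


def send_async(executor, log, message):
    """Stand-in for an async send: returns a future immediately."""
    return executor.submit(log.append, message)


def send_blocking(executor, log, message, timeout=5.0):
    """Wait for the send to complete before returning, as a test would need."""
    send_async(executor, log, message).result(timeout=timeout)


log = []
with ThreadPoolExecutor(max_workers=1) as pool:
    for m in ["a", "b", "c"]:
        send_blocking(pool, log, m)
    delivered = list(log)  # every message is visible once each call returns
```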

Re: Speeding up Spark build during development

2015-05-05 Thread Iulian Dragoș

I'm probably the only Eclipse user here, but it seems I have the best workflow :) At least for me things work as they should: once I imported projects in the workspace I can build and run/debug tests from the IDE. I only go to sbt when I need to re-create projects or I want to run the full test

Re: [discuss] ending support for Java 6?

2015-05-05 Thread Patrick Wendell

If there is broad consensus here to drop Java 1.6 in Spark 1.5, should we do an ANNOUNCE to user and dev? On Mon, May 4, 2015 at 7:24 PM, shane knapp skn...@berkeley.edu wrote: sgtm On Mon, May 4, 2015 at 11:23 AM, Patrick Wendell pwend...@gmail.com wrote: If we just set JAVA_HOME in

Pull request builder errors (taking Jenkins worker 3 offline)

2015-05-05 Thread Patrick Wendell

For unknown reasons, pull requests on Jenkins worker 3 have been failing with an exception[1]. After trying to fix this by clearing the ivy and maven caches on the node, I've given up and simply blacklisted that worker. [error] oro#oro;2.0.8!oro.jar origin location must be absolute:

Re: Scan Sharing in Spark

2015-05-05 Thread Quang-Nhat HOANG-XUAN

Hi, Besides caching, is it possible for an RDD to have multiple child RDDs? Then I could read the input once and produce multiple outputs for multiple jobs which share the input. On May 5, 2015 6:24 PM, Evan R. Sparks evan.spa...@gmail.com wrote: Scan sharing can indeed be a useful optimization in spark,

Re: New Kafka producer API

2015-05-05 Thread BenFradet

Yes that might be true, I will have to test that. -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/New-Kafka-producer-API-tp12050p12058.html Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

Re: [discuss] ending support for Java 6?

2015-05-05 Thread Tom Graves

+1. I haven't seen major objections here so I would say send announcement and see if any users have objections Tom On Tuesday, May 5, 2015 5:09 AM, Patrick Wendell pwend...@gmail.com wrote: If there is broad consensus here to drop Java 1.6 in Spark 1.5, should we do an ANNOUNCE to

Re: [discuss] ending support for Java 6?

2015-05-05 Thread shane knapp

+1 to an announce to user and dev. java6 is so old and sad. On Tue, May 5, 2015 at 2:24 PM, Tom Graves tgraves...@yahoo.com wrote: +1. I haven't seen major objections here so I would say send announcement and see if any users have objections Tom On Tuesday, May 5, 2015 5:09 AM,

Re: Multi-Line JSON in SparkSQL

2015-05-05 Thread Joe Halliwell

I've raised the JSON-related ticket at https://issues.apache.org/jira/browse/SPARK-7366. @Ewan I think it would be great to support multiline CSV records too. The motivation is very similar but my instinct is that little/nothing of the implementation could be usefully shared, so it's better as a

35 matches
