Re: Integrating D3 with Spark
Hey Ousterhout, I found it amazing. Before this I used my own D3.js files, which subscribed to the Redis pub-sub database where the output tuples were being published. So that approach already added latency for pushing data to Redis, although it was very small. Once again, thanks.

On Sun, Apr 12, 2015 at 10:06 PM, Kay Ousterhout kayousterh...@gmail.com wrote:

Hi Pradyumn,

Take a look at this pull request, which does something similar:
https://github.com/apache/spark/pull/2342/files

You can put JavaScript in script tags in Scala. That code takes a nice approach of putting most of the JavaScript in a new file, and then just calling into it from the HTML generated by the Scala files.

-Kay

On Apr 11, 2015, at 4:54 PM, shroffpradyumn shroffprady...@berkeley.edu wrote:

I'm working on adding a data graph to the Spark jobs page (rendered by StagePage.scala) to help users analyze the different job phases visually. I've already made a mockup using dummy data and D3.js, but I'm having some difficulty integrating my JavaScript code with Spark's Scala code. Essentially, I'm not sure how to access the validTasks variable in StagePage.scala from within my JavaScript code so that I can use it with D3.js to render the data graph. Any help would be greatly appreciated!

--
Thanks & Regards,
Anshu Shukla
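For anyone reading this thread later, here is a minimal sketch (not code from the pull request above) of the pattern Kay describes: the Scala page emits one script tag that loads a static D3 file and a second that hands it the data. The task-graph.js file, its renderTaskGraph entry point, the /static/ path, and the (launchTime, duration) shape of the data are all hypothetical stand-ins for your own code:

    import scala.xml.{NodeSeq, Unparsed}

    // Render a container div, load the (hypothetical) D3 file, and pass the
    // task data to it as a JSON array literal.
    def taskGraphContent(validTasks: Seq[(Long, Long)]): NodeSeq = {
      val json = validTasks
        .map { case (launchTime, duration) =>
          s"""{"launchTime":$launchTime,"duration":$duration}"""
        }
        .mkString("[", ",", "]")
      <div id="task-graph"></div> ++
      <script src="/static/task-graph.js"></script> ++
      <script type="text/javascript">{Unparsed(s"renderTaskGraph($json);")}</script>
    }

Keeping the D3 logic in its own static file and only injecting the data keeps the Scala side small, which is also the approach the linked pull request takes.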
Re: Parquet File Binary column statistics error when reuse byte[] among rows
Thanks for reporting this! Would you mind opening JIRA tickets for both Spark and Parquet? I'm not sure whether Parquet declares anywhere that the user mustn't reuse byte arrays when using the binary type. If it does, then it's a Spark bug. Either way, this should be fixed.

Cheng

On 4/12/15 1:50 PM, Yijie Shen wrote:

Hi,

Suppose I create a dataRDD which extends RDD[Row], where each row is a GenericMutableRow(Array(Int, Array[Byte])). The same Array[Byte] object is reused across rows but holds different content each time. When I convert it to a DataFrame and save it as a Parquet file, the file's row-group statistics (max/min) for the Binary column come out wrong.

Here is the reason: in Parquet, BinaryStatistics keeps max/min as parquet.io.api.Binary references, and Spark SQL generates each Binary backed by the same Array[Byte] passed in from the row:

    max: Binary -> ByteArrayBackedBinary -> Array[Byte]

Therefore, each time Parquet updates the row group's statistics, max and min still refer to the same Array[Byte], whose content has changed in the meantime. When Parquet writes the statistics to the file, the last row's content is saved as both max and min.

It seems like a Parquet bug, because it's Parquet's responsibility to update statistics correctly, but I'm not quite sure. Should I report it as a bug in the Parquet JIRA?

The Spark JIRA is https://issues.apache.org/jira/browse/SPARK-6859
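For readers of the archive, here is a minimal sketch of the aliasing pattern described above, plus the defensive copy that works around it until the statistics bug is fixed. fillPayload is a hypothetical stand-in for whatever produces each row's bytes, and plain Row is used in place of GenericMutableRow:

    import org.apache.spark.sql.Row

    // Hypothetical content generator: overwrites the buffer in place.
    def fillPayload(buf: Array[Byte], i: Int): Unit =
      java.util.Arrays.fill(buf, i.toByte)

    val buffer = new Array[Byte](8) // one buffer shared by every row

    // Problematic pattern: every row references the same array, so the
    // max/min Binary statistics end up aliasing the last row's contents.
    val badRows = (0 until 1000).map { i =>
      fillPayload(buffer, i)
      Row(i, buffer)
    }

    // Workaround: give each row its own copy of the bytes.
    val safeRows = (0 until 1000).map { i =>
      fillPayload(buffer, i)
      Row(i, buffer.clone())
    }

The copy costs one allocation per row, but it breaks the aliasing between the rows and whatever Binary references Parquet retains while computing statistics.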
Re: wait time between start master and start slaves
Oh, good point. So I guess I should be able to query the master via code like this before any slaves are started.

On Sat, Apr 11, 2015 at 7:52 PM Ted Yu yuzhih...@gmail.com wrote:

From SparkUI.scala:

    def getUIPort(conf: SparkConf): Int = {
      conf.getInt("spark.ui.port", SparkUI.DEFAULT_PORT)
    }

Better to retrieve the effective UI port before probing.

Cheers

On Sat, Apr 11, 2015 at 2:38 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote:

So basically, to tell if the master is ready to accept slaves, just poll http://master-node:4040 for an HTTP 200 response?

On Sat, Apr 11, 2015 at 2:42 PM Shivaram Venkataraman shiva...@eecs.berkeley.edu wrote:

Yeah, from what I remember it was set defensively. I don't know of a good way to check if the master is up, though. I guess we could poll the Master web UI and see if we get a 200/OK response.

Shivaram

On Fri, Apr 10, 2015 at 8:24 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote:

Check this out (from spark-ec2):
https://github.com/mesos/spark-ec2/blob/f0a48be1bb5aaeef508619a46065648beb8f1d92/spark-standalone/setup.sh#L26-L33

    # Start Master
    $BIN_FOLDER/start-master.sh

    # Pause
    sleep 20

    # Start Workers
    $BIN_FOLDER/start-slaves.sh

I know this was probably done defensively, but is there a more direct way to know when the master is ready?

Nick
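For the archive, a minimal sketch of the polling approach discussed above. The port is an assumption you should check against your deployment: the standalone master's web UI defaults to 8080, while SparkUI.DEFAULT_PORT (4040) is the per-application UI, so retrieve the effective port as Ted suggests before probing:

    import java.net.{HttpURLConnection, URL}

    // Poll a web UI until it answers HTTP 200, or give up after `retries` attempts.
    def waitForUi(host: String, port: Int,
                  retries: Int = 30, delayMs: Long = 1000L): Boolean =
      (1 to retries).exists { _ =>
        val up =
          try {
            val conn = new URL(s"http://$host:$port/").openConnection()
              .asInstanceOf[HttpURLConnection]
            conn.setConnectTimeout(2000)
            conn.setReadTimeout(2000)
            conn.getResponseCode == 200
          } catch {
            case _: java.io.IOException => false
          }
        if (!up) Thread.sleep(delayMs)
        up
      }

    // e.g. block until the standalone master's web UI is reachable:
    // require(waitForUi("master-node", 8080), "master never came up")

This would replace the fixed sleep 20 with a bounded wait that proceeds as soon as the master's UI starts answering.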
Re: [VOTE] Release Apache Spark 1.3.1 (RC3)
+1

On Fri, Apr 10, 2015 at 11:05 PM, Patrick Wendell pwend...@gmail.com wrote:

Please vote on releasing the following candidate as Apache Spark version 1.3.1!

The tag to be voted on is v1.3.1-rc2 (commit 3e83913):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=3e8391327ba586eaf54447043bd526d919043a44

The list of fixes present in this release can be found at:
http://bit.ly/1C2nVPY

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-1.3.1-rc3/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1088/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-1.3.1-rc3-docs/

The patches on top of RC2 are:

[SPARK-6851] [SQL] Create new instance for each converted parquet relation
[SPARK-5969] [PySpark] Fix descending pyspark.rdd.sortByKey.
[SPARK-6343] Doc driver-worker network reqs
[SPARK-6767] [SQL] Fixed Query DSL error in spark sql Readme
[SPARK-6781] [SQL] use sqlContext in python shell
[SPARK-6753] Clone SparkConf in ShuffleSuite tests
[SPARK-6506] [PySpark] Do not try to retrieve SPARK_HOME when not needed...

Please vote on releasing this package as Apache Spark 1.3.1! The vote is open until Tuesday, April 14, at 07:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.3.1
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/