Re: Integrating D3 with Spark

2015-04-12 Thread anshu shukla
Hey Ousterhout,

I found it amazing. Before this I used to use my own D3.js files that
subscribed to the Redis pub-sub database where the output tuples were
being published. So it was already adding latency to push the data to
Redis, although it was very small.
Once again, thanks.

On Sun, Apr 12, 2015 at 10:06 PM, Kay Ousterhout kayousterh...@gmail.com
wrote:

 Hi Pradyumn,

 Take a look at this pull request, which does something similar:
 https://github.com/apache/spark/pull/2342/files

 You can put JavaScript in script tags in Scala. That code takes a nice
 approach of putting most of the JavaScript in a new file, and then just
 calling into it from the HTML generated by the Scala files.
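 For example, a rough sketch of that pattern (sketch only: stage-graph.js and
 drawTaskGraph are names made up here, not what that PR actually uses):

   import scala.xml.{Node, Unparsed}

   // Hypothetical sketch: serialize the Scala-side task data to JSON and hand it
   // to a function defined in a separate D3 script served from /static.
   def taskGraph(taskDurationsMs: Seq[Long]): Seq[Node] = {
     val json = taskDurationsMs.mkString("[", ",", "]")   // plain numbers, so this is valid JSON
     <div id="task-graph"></div> ++
     <script src="/static/stage-graph.js"></script> ++    // new file holding the D3 code
     <script>{Unparsed(s"drawTaskGraph($json);")}</script> // drawTaskGraph is defined in that file
   }

 The Scala side only emits the data and a one-line call; all of the D3 logic
 stays in the separate JavaScript file, which keeps stagePage.scala readable.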

 -Kay

  On Apr 11, 2015, at 4:54 PM, shroffpradyumn shroffprady...@berkeley.edu
 wrote:
 
  I'm working on adding a data graph to the Spark jobs page (rendered by
  stagePage.scala) to help users analyze the different job phases visually.
 
  I've already made a mockup using dummy data and D3.js, but I'm having some
  difficulty integrating my JavaScript code with Spark's Scala code.
  Essentially, I'm not sure how I can access the validTasks variable in
  stagePage.scala from within my JavaScript code so that I can use it along
  with D3.js to render the data graph.
 
  Any help would be greatly appreciated!
 
 
 
 




-- 
Thanks & Regards,
Anshu Shukla


Re: Parquet File Binary column statistics error when reuse byte[] among rows

2015-04-12 Thread Cheng Lian
Thanks for reporting this! Would you mind opening JIRA tickets for both
Spark and Parquet?

I'm not sure whether Parquet declares anywhere that users mustn't reuse
byte arrays with the binary type. If it does, then it's a Spark bug.
Either way, this should be fixed.


Cheng

On 4/12/15 1:50 PM, Yijie Shen wrote:

Hi,

Suppose I create a dataRDD which extends RDD[Row], where each row is a
GenericMutableRow(Array(Int, Array[Byte])). The same Array[Byte] object is
reused across rows but holds different content each time. When I convert it
to a DataFrame and save it as a Parquet file, the row group statistics
(max & min) of the Binary column in that file are wrong.

Here is the reason: in Parquet, BinaryStatistics keeps max & min as
parquet.io.api.Binary references, and Spark SQL generates a new Binary
backed by the same Array[Byte] passed in from the row:

  max: Binary -> ByteArrayBackedBinary -> Array[Byte]

Therefore, each time Parquet updates the row group's statistics, max & min
still refer to the same Array[Byte], whose content has changed since the
last update. When Parquet finally writes the statistics to the file, the
last row's content is saved as both max & min.
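
For example, roughly this kind of writer code reproduces it (sketch only: it
uses the public Row API instead of GenericMutableRow, and assumes an existing
sc and sqlContext on Spark 1.3):

  import org.apache.spark.sql.Row
  import org.apache.spark.sql.types._

  val schema = StructType(Seq(
    StructField("id", IntegerType),
    StructField("payload", BinaryType)))

  val buffer = new Array[Byte](8)              // one object, reused for every row
  val rows = sc.parallelize(0 until 100, 1).map { i =>
    java.util.Arrays.fill(buffer, i.toByte)    // new content, same Array[Byte]
    Row(i, buffer)
  }
  sqlContext.createDataFrame(rows, schema).saveAsParquetFile("/tmp/binary-reuse")
  // The row group's Binary max/min keep a reference to `buffer`, so the bytes
  // of the last row written end up recorded as both max and min.

A per-row copy (e.g. Row(i, buffer.clone())) works around it, but of course
that shouldn't be necessary.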



It seems to be a Parquet bug, since it's Parquet's responsibility to
update statistics correctly, but I'm not quite sure. Should I report it
as a bug in the Parquet JIRA?


The Spark JIRA issue is https://issues.apache.org/jira/browse/SPARK-6859







Re: wait time between start master and start slaves

2015-04-12 Thread Nicholas Chammas
Oh, good point. So I guess I should be able to query the master via code
like this before any slaves are started.
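
Something along these lines, maybe (untested sketch; the retry count and
timeouts are arbitrary, and port should come from the conf as Ted suggests
below):

  import java.net.{HttpURLConnection, URL}

  // Poll the master web UI until it answers HTTP 200, or give up.
  def masterUiIsUp(host: String, port: Int,
                   attempts: Int = 30, waitMs: Long = 1000): Boolean =
    (1 to attempts).exists { _ =>
      val up =
        try {
          val conn = new URL(s"http://$host:$port/").openConnection()
            .asInstanceOf[HttpURLConnection]
          conn.setConnectTimeout(2000)
          conn.setReadTimeout(2000)
          val ok = conn.getResponseCode == 200
          conn.disconnect()
          ok
        } catch { case _: java.io.IOException => false }
      if (!up) Thread.sleep(waitMs)
      up
    }

Then start-slaves.sh could run as soon as this returns true instead of after a
fixed 20-second sleep.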

On Sat, Apr 11, 2015 at 7:52 PM Ted Yu yuzhih...@gmail.com wrote:

 From SparkUI.scala:

   def getUIPort(conf: SparkConf): Int = {
     conf.getInt("spark.ui.port", SparkUI.DEFAULT_PORT)
   }

 Better to retrieve the effective UI port before probing.

 Cheers

 On Sat, Apr 11, 2015 at 2:38 PM, Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:

 So basically, to tell if the master is ready to accept slaves, just poll
 http://master-node:4040 for an HTTP 200 response?

 On Sat, Apr 11, 2015 at 2:42 PM Shivaram Venkataraman 
 shiva...@eecs.berkeley.edu wrote:

   Yeah, from what I remember it was set defensively. I don't know of a good
   way to check whether the master is up, though. I guess we could poll the
   Master Web UI and see if we get a 200/OK response.
 
  Shivaram
 
  On Fri, Apr 10, 2015 at 8:24 PM, Nicholas Chammas 
  nicholas.cham...@gmail.com wrote:
 
   Check this out (from spark-ec2):
   https://github.com/mesos/spark-ec2/blob/f0a48be1bb5aaeef508619a46065648beb8f1d92/spark-standalone/setup.sh#L26-L33

   # Start Master
   $BIN_FOLDER/start-master.sh

   # Pause
   sleep 20

   # Start Workers
   $BIN_FOLDER/start-slaves.sh
 
   I know this was probably done defensively, but is there a more direct way
   to know when the master is ready?
 
   Nick
 
 





Re: [VOTE] Release Apache Spark 1.3.1 (RC3)

2015-04-12 Thread Mark Hamstra
+1

On Fri, Apr 10, 2015 at 11:05 PM, Patrick Wendell pwend...@gmail.com
wrote:

 Please vote on releasing the following candidate as Apache Spark version
 1.3.1!

 The tag to be voted on is v1.3.1-rc2 (commit 3e83913):

 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=3e8391327ba586eaf54447043bd526d919043a44

 The list of fixes present in this release can be found at:
 http://bit.ly/1C2nVPY

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-1.3.1-rc3/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1088/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-1.3.1-rc3-docs/

 The patches on top of RC2 are:
 [SPARK-6851] [SQL] Create new instance for each converted parquet relation
 [SPARK-5969] [PySpark] Fix descending pyspark.rdd.sortByKey.
 [SPARK-6343] Doc driver-worker network reqs
 [SPARK-6767] [SQL] Fixed Query DSL error in spark sql Readme
 [SPARK-6781] [SQL] use sqlContext in python shell
 [SPARK-6753] Clone SparkConf in ShuffleSuite tests
 [SPARK-6506] [PySpark] Do not try to retrieve SPARK_HOME when not needed...

 Please vote on releasing this package as Apache Spark 1.3.1!

 The vote is open until Tuesday, April 14, at 07:00 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.3.1
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/
