PySpark API signature mismatch compared to Scala

2016-03-10 Thread Li Ming Tsai
Hi, Looking at 1.6.0, I see at least the following mismatch: 1. DStream mapWithState, thought declared experimental is not found in Python 2. Dstream updateStateByKey, missing initialRDD, partitioner Is Python API falling behind? Thanks!

[Streaming] Batch interval and bulk export

2016-03-09 Thread Li Ming Tsai
Hi, I am doing a few basic operation like map -> reduceByKey -> filter, which is very similar to world count and I'm saving the result where the count > threshold. Currently the batch window is every 10s, but I would like to save the results to redshift at a lower frequency instead of every

[Kinesis] multiple KinesisRecordProcessor threads.

2016-03-04 Thread Li Ming Tsai
Hi, @chris @tdas Referring to the latest integration documentation, it states the following: "A single Kinesis input DStream can read from multiple shards of a Kinesis stream by creating multiple KinesisRecordProcessor threads." But looking at the API and the example, each time we call

Re: off-heap certain operations

2016-02-16 Thread Li Ming Tsai
Hi Sean, > Personally, I would leave this off. Is this not production ready, thus we should disable it? Thanks, Liming From: Sean Owen Sent: Saturday, February 13, 2016 2:18 AM To: Ovidiu-Cristian MARCU Cc: Ted Yu; Sea;

Re: Slowness in Kmeans calculating fastSquaredDistance

2016-02-09 Thread Li Ming Tsai
Liming From: Li Ming Tsai <mailingl...@ltsai.com> Sent: Sunday, February 7, 2016 10:03 AM To: user@spark.apache.org Subject: Re: Slowness in Kmeans calculating fastSquaredDistance Hi, I did more investigation and found out that BLAS.scala is calling the nativ

Turning on logging for internal Spark logs

2016-02-09 Thread Li Ming Tsai
Hi, I have the default conf/log4j.properties: log4j.rootCategory=INFO, console log4j.appender.console=org.apache.log4j.ConsoleAppender log4j.appender.console.target=System.err log4j.appender.console.layout=org.apache.log4j.PatternLayout

Re: Slowness in Kmeans calculating fastSquaredDistance

2016-02-06 Thread Li Ming Tsai
/mllib/linalg/BLAS.scala#L126 private def dot(x: DenseVector, y: DenseVector): Double = { val n = x.size f2jBLAS.ddot(n, x.values, 1, y.values, 1) } Maybe Xiangrui can comment on this? From: Li Ming Tsai <mailingl...@ltsai.com> Sent:

Add Singapore meetup

2016-02-04 Thread Li Ming Tsai
Hi, Realised that Singapore has not been added. Please add http://www.meetup.com/Spark-Singapore/ Thanks!

Slowness in Kmeans calculating fastSquaredDistance

2016-02-04 Thread Li Ming Tsai
Hi, I'm using INTEL MKL on Spark 1.6.0 which I built myself with the -Pnetlib-lgpl flag. I am using spark local[4] mode and I run it like this: # export LD_LIBRARY_PATH=/opt/intel/lib/intel64:/opt/intel/mkl/lib/intel64 # bin/spark-shell ... I have also added the following to