Re: [VOTE] Apache Spark 2.1.0 (RC2)

2016-12-08 Thread Prashant Sharma
I am getting a 404 for the link https://repository.apache.org/content/repositories/orgapachespark-1217. --Prashant On Fri, Dec 9, 2016 at 10:43 AM, Michael Allman wrote: > I believe https://github.com/apache/spark/pull/16122 needs to be included > in Spark 2.1. It's a simple

Re: Issue in using DenseVector in RowMatrix, error could be due to ml and mllib package changes

2016-12-08 Thread Nick Pentreath
Yes, most likely because HashingTF returns ml vectors, while you need mllib vectors for RowMatrix. I'd recommend using the vector conversion utils (I think in mllib.linalg.Vectors, but I'm on mobile right now so can't recall exactly). There are util methods for converting single vectors as well as
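A minimal sketch of the single-vector conversion mentioned above, assuming Spark 2.x with spark-mllib on the classpath. The object name and the sample vector are illustrative and not taken from the thread. `Vectors.fromML` in `org.apache.spark.mllib.linalg` converts an ml vector to an mllib vector, and `asML` converts back.

```scala
import org.apache.spark.ml.linalg.{Vector => MLVector, Vectors => MLVectors}
import org.apache.spark.mllib.linalg.{Vector => MLlibVector, Vectors => MLlibVectors}

// Illustrative object name; not part of the original thread.
object VectorConversionSketch {
  def main(args: Array[String]): Unit = {
    // An ml-package vector, the kind spark.ml transformers such as HashingTF emit.
    val mlVec: MLVector = MLVectors.sparse(4, Array(0, 2), Array(1.0, 3.0))

    // Convert to the mllib type that RowMatrix expects.
    val mllibVec: MLlibVector = MLlibVectors.fromML(mlVec)

    // Convert back to the ml type if later pipeline stages need it.
    val roundTrip: MLVector = mllibVec.asML

    println(mllibVec)
    println(roundTrip == mlVec)
  }
}
```

For whole DataFrame columns rather than single vectors, `org.apache.spark.mllib.util.MLUtils.convertVectorColumnsFromML` is probably the closer fit.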

Re: [VOTE] Apache Spark 2.1.0 (RC2)

2016-12-08 Thread Michael Allman
I believe https://github.com/apache/spark/pull/16122 needs to be included in Spark 2.1. It's a simple bug fix to some functionality that is introduced in 2.1. Unfortunately, it's been manually verified only. There's no unit test that covers it, and

Re: how can I set the log configuration file for spark history server ?

2016-12-08 Thread Don Drake
You can update $SPARK_HOME/conf/spark-env.sh by setting the environment variable SPARK_HISTORY_OPTS. See http://spark.apache.org/docs/latest/monitoring.html#spark-configuration-options for options (spark.history.fs.logDirectory) you can set. There is log rotation built in (by time, not size) to the

how can I set the log configuration file for spark history server ?

2016-12-08 Thread John Fang
./start-history-server.sh starting org.apache.spark.deploy.history.HistoryServer, logging to /home/admin/koala/data/versions/0/SPARK/2.0.2/spark-2.0.2-bin-hadoop2.6/logs/spark-admin-org.apache.spark.deploy.history.HistoryServer-1-v069166214.sqa.zmf.out Then the history server prints all of its logs to the

Issue in using DenseVector in RowMatrix, error could be due to ml and mllib package changes

2016-12-08 Thread satyajit vegesna
Hi All, please find the code below.
import org.apache.spark.ml.feature.{HashingTF, IDF}
import org.apache.spark.ml.linalg.SparseVector
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.sql.SparkSession
import org.apache.spark.{SparkConf, SparkContext}
/**
 * Created by satyajit

Fwd: Question about SPARK-11374 (skip.header.line.count)

2016-12-08 Thread Dongjoon Hyun
+dev I forgot to add @user. Dongjoon. -- Forwarded message - From: Dongjoon Hyun Date: Thu, Dec 8, 2016 at 16:00 Subject: Question about SPARK-11374 (skip.header.line.count) To: Hi, All. Could you give me your opinion? There

Question about SPARK-11374 (skip.header.line.count)

2016-12-08 Thread Dongjoon Hyun
Hi, All. Could you give me your opinion? There is an old Spark issue, SPARK-11374, about removing header lines from text files. Currently, Spark supports removing CSV header lines in the following way. ``` scala> spark.read.option("header","true").csv("/data").show +---+---+ | c1| c2| +---+---+

Re: [VOTE] Apache Spark 2.1.0 (RC2)

2016-12-08 Thread Shivaram Venkataraman
+0 I am not sure how much of a problem this is, but the pip packaging seems to have changed the size of the hadoop-2.7 artifact. As you can see in http://people.apache.org/~pwendell/spark-releases/spark-2.1.0-rc2-bin/, the Hadoop 2.7 build is 359M, almost double the size of the other Hadoop

Re: Reduce memory usage of UnsafeInMemorySorter

2016-12-08 Thread Kazuaki Ishizaki
The line I pointed out would work correctly, because the type of this division is double and d2i correctly handles overflow cases. Kazuaki Ishizaki From: Nicholas Chammas To: Kazuaki Ishizaki/Japan/IBM@IBMJP, Reynold Xin Cc:
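The overflow behaviour referred to above can be illustrated with a small sketch. The line under discussion is not shown in this snippet, so the helper name `narrow` and the sample values are hypothetical. The sketch only demonstrates the JVM's double-to-int narrowing, which saturates out-of-range values instead of wrapping around.

```scala
// Illustrative object; it does not reproduce the Spark code under discussion.
object D2iSketch {
  // Hypothetical helper: on the JVM, Double.toInt is the d2i narrowing conversion.
  def narrow(x: Double): Int = x.toInt

  def main(args: Array[String]): Unit = {
    // In-range values are truncated toward zero.
    println(narrow(7.0 / 2))
    // Values beyond the Int range saturate rather than wrap.
    println(narrow(1e20) == Int.MaxValue)
    println(narrow(-1e20) == Int.MinValue)
    // NaN converts to 0.
    println(narrow(Double.NaN) == 0)
  }
}
```

Integer arithmetic behaves differently: an Int computation that exceeds the range wraps around silently, which is why the type of the division matters here.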

Re: Publishing of the Spectral LDA model on Spark Packages

2016-12-08 Thread François Garillot
This is very cool! Thanks a lot for making this more accessible! Best, -- FG On Wed, Dec 7, 2016 at 11:46 PM Jencir Lee wrote: > Hello, > > We just published the Spectral LDA model on Spark Packages. It’s an > alternative approach to the LDA modelling based on tensor

Re: [VOTE] Apache Spark 2.1.0 (RC1)

2016-12-08 Thread Reynold Xin
This vote is closed in favor of rc2. On Mon, Nov 28, 2016 at 5:25 PM, Reynold Xin wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.1.0. The vote is open until Thursday, December 1, 2016 at 18:00 UTC and > passes if a majority of at

[Spark SQL]: How does Spark HiveThriftServer handle idle sessions ?

2016-12-08 Thread Moriarty
In org.apache.spark.sql.hive.thriftserver.SparkSQLSessionManager.init(), SparkSQLSessionManager inits “backgroundOperationPool” by creating a ThreadPool directly instead of invoking super.createBackgroundOperationPool(). This results in the idle-session-check-thread in

Re: modifications to ALS.scala

2016-12-08 Thread Georg Heiler
You can write some code, e.g. a custom estimator or transformer, in Spark's namespace. http://stackoverflow.com/a/40785438/2587904 might help you get started. Be aware that private (e.g. Spark-internal) APIs may change from release to release. You definitely will require spark