Why do we get 0 when the key is null?

2016-09-15 Thread WangJianfei
This function is in Partitioner (HashPartitioner.getPartition):

    def getPartition(key: Any): Int = key match {
      case null => 0
      // case None => 0
      case _ => Utils.nonNegativeMod(key.hashCode, numPartitions)
    }
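
A minimal sketch of the behaviour being asked about (assuming an existing SparkContext named sc): every pair whose key is null lands in partition 0 of a HashPartitioner, because getPartition short-circuits before calling hashCode (which would throw a NullPointerException on null).

    import org.apache.spark.HashPartitioner

    // Pairs with a mix of null and non-null keys; sc is an existing SparkContext.
    val pairs = sc.parallelize(Seq((null: String, 1), ("a", 2), ("b", 3)))

    // Repartition by key with 4 partitions and record which partition each key landed in.
    val placement = pairs
      .partitionBy(new HashPartitioner(4))
      .mapPartitionsWithIndex((idx, it) => it.map { case (k, _) => (k, idx) })
      .collect()

    placement.foreach(println)  // the null key is always reported in partition 0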

What's the use of RangePartitioner.hashCode

2016-09-15 Thread WangJianfei
Who can give me an example of the use of RangePartitioner.hashCode? Thank you!
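
One concrete place where Partitioner equality (and therefore a consistent hashCode) matters is Spark deciding whether two RDDs are already co-partitioned, e.g. so that a join or cogroup can skip a shuffle. A rough illustration (assuming an existing SparkContext named sc; both partitioners are built from the same RDD, so their range bounds should match):

    import org.apache.spark.RangePartitioner

    val pairs = sc.parallelize(1 to 1000).map(i => (i, i))

    // hashCode is overridden together with equals: two RangePartitioners with the same
    // number of partitions and the same range bounds compare equal, and equal partitioners
    // must report the same hashCode so they can be used interchangeably (e.g. as map keys,
    // or when Spark compares rdd1.partitioner with rdd2.partitioner to avoid a shuffle).
    val p1 = new RangePartitioner(4, pairs)
    val p2 = new RangePartitioner(4, pairs)

    println(p1 == p2)                     // expected true for identical bounds
    println(p1.hashCode == p2.hashCode)   // must then also be true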

Re: Why do we get 0 when the key is null?

2016-09-15 Thread WangJianfei
When the key is not in the RDD, I can still get a value; I just feel that is a little strange.

What's the meaning when the number of partitions is zero?

2016-09-15 Thread WangJianfei
The source code is:

    class HashPartitioner(partitions: Int) extends Partitioner {
      require(partitions >= 0, s"Number of partitions ($partitions) cannot be negative.")

The source requires partitions >= 0, but I don't understand why it makes sense when partitions is 0.

Re: What's the meaning when the number of partitions is zero?

2016-09-16 Thread WangJianfei
If so, we will get an exception when numPartitions is 0:

    def getPartition(key: Any): Int = key match {
      case null => 0
      // case None => 0
      case _ => Utils.nonNegativeMod(key.hashCode, numPartitions)
    }
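
A small local sketch of what a zero-partition HashPartitioner actually does (no cluster needed): the constructor's require passes, a null key still maps to 0, and any non-null key hits a modulo by zero.

    import org.apache.spark.HashPartitioner

    val p = new HashPartitioner(0)     // require(partitions >= 0) is satisfied

    println(p.getPartition(null))      // 0: the null case never touches numPartitions

    // Any non-null key reaches key.hashCode % numPartitions, i.e. a division by zero,
    // which is expected to throw java.lang.ArithmeticException: / by zero.
    println(p.getPartition("a"))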

Doubt about ExternalSorter.spillMemoryIteratorToDisk

2016-09-16 Thread WangJianfei
We can see that when the number of written objects equals serializerBatchSize, flush() is called. But what happens if the bytes written exceed the default buffer size before that point? Will flush() be called automatically in that situation? (See private[this] def spillMemoryIteratorToDisk in ExternalSorter.)
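
For the byte-level question, the buffering below the serializer is handled by an ordinary buffered stream, and a buffered stream does not need an explicit flush() to keep writing: it spills its internal buffer to the underlying file whenever the buffer fills. A plain-Java/Scala illustration of that behaviour (deliberately not the Spark-internal writer API):

    import java.io.{BufferedOutputStream, FileOutputStream}

    // Use a tiny 16-byte buffer so it fills up immediately.
    val out = new BufferedOutputStream(new FileOutputStream("/tmp/buffer-demo.bin"), 16)

    // Writing far more than 16 bytes loses nothing and needs no manual flush():
    // the buffered stream writes its full buffer through to the file automatically.
    out.write(Array.fill[Byte](1000)(1))

    // flush() only pushes out whatever is still sitting in the (partially filled) buffer.
    out.flush()
    out.close()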

Re: Fwd: Question regarding merging two RDDs

2016-09-17 Thread WangJianfei
Maybe you can use a DataFrame, with the header file as the schema.
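
A rough sketch of that suggestion (Spark 2.x, with hypothetical HDFS paths and a simple comma-separated layout; all columns are read as strings for brevity):

    import org.apache.spark.sql.{Row, SparkSession}
    import org.apache.spark.sql.types.{StringType, StructField, StructType}

    val spark = SparkSession.builder().appName("merge-with-header").getOrCreate()
    val sc = spark.sparkContext

    // Build a schema from the single header line of the header file.
    val headerLine = sc.textFile("hdfs:///path/to/header.csv").first()
    val schema = StructType(headerLine.split(",").map(name => StructField(name, StringType, nullable = true)))

    // Apply that schema to the data file(s) to get a DataFrame.
    val rows = sc.textFile("hdfs:///path/to/data.csv").map(line => Row.fromSeq(line.split(",").toSeq))
    val df = spark.createDataFrame(rows, schema)
    df.printSchema()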

Re: java.lang.NoClassDefFoundError, is this a bug?

2016-09-17 Thread WangJianfei
Do you run this in YARN mode, or something else?

Re: java.lang.NoClassDefFoundError, is this a bug?

2016-09-17 Thread WangJianfei
If I remove this, it's OK:

    abstract class A[T : Encoder] {}
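
For context on what that declaration pulls in (this is just the desugaring, not necessarily the root cause of the NoClassDefFoundError in this thread): a context bound [T : Encoder] is sugar for an implicit Encoder[T] constructor parameter, so the class, and every concrete subclass, depends on Spark SQL's Encoder being resolvable at class-loading time and an Encoder[T] being available implicitly.

    import org.apache.spark.sql.Encoder

    // abstract class A[T : Encoder] {}  desugars to roughly:
    abstract class A[T](implicit val encoder: Encoder[T])

    // A concrete subclass must supply an Encoder[T] implicitly, e.g.:
    // import spark.implicits._
    // class B extends A[Int]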

Re: What's the use of RangePartitioner.hashCode

2016-09-21 Thread WangJianfei
Thank you very much, sir! But what I want to know is whether hashCode overflow will cause trouble. Thank you!
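
Overflow in hashCode just wraps around to some (possibly negative) Int; the partitioner only needs that result mapped back into [0, numPartitions), which is what Utils.nonNegativeMod is for. A standalone re-implementation of the idea, for illustration:

    // Roughly what Utils.nonNegativeMod does: Java's % can return a negative value
    // for a negative operand, so shift such results back into the non-negative range.
    def nonNegativeMod(x: Int, mod: Int): Int = {
      val rawMod = x % mod
      rawMod + (if (rawMod < 0) mod else 0)
    }

    println(nonNegativeMod(-7, 4))            // 1, not -3
    println(nonNegativeMod(Int.MinValue, 4))  // 0, still a valid partition index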

Re: What's the use of RangePartitioner.hashCode

2016-09-24 Thread WangJianfei
Thank you!

Broadcast big dataset

2016-09-28 Thread WangJianfei
Hi devs, in my application I broadcast a dataset (about 500 MB) to the executors (100+), and I got a Java heap error. Log excerpt:

    ... Jmartad-7219.hadoop.jd.local:53591 (size: 4.0 MB, free: 3.3 GB)
    16/09/28 15:56:48 INFO BlockManagerInfo: Added broadcast_9_piece19 in memory on BJHC-Jmartad-9012.hadoop.jd.local:53197

Re: Broadcast big dataset

2016-09-28 Thread WangJianfei
First, thank you very much! My executor memory is also 4 GB, but my Spark version is 1.5. Could the Spark version be causing the trouble?
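
Independent of the version question, a ~500 MB broadcast has to fit, deserialized, in the driver and in every executor on top of whatever else they hold, so the first knob is usually just more heap. Illustrative settings only (the values are made up for the example; spark.driver.memory must effectively be set before the driver JVM starts, e.g. via spark-submit):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("broadcast-big-dataset")
      .set("spark.executor.memory", "8g")   // room for the broadcast plus normal execution/cache
      .set("spark.driver.memory", "8g")     // the driver holds the full value before broadcasting it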

Why must the JSON file used by sparkSession.read.json contain one valid JSON object per line?

2016-10-15 Thread WangJianfei
Hi devs: I have a doubt about the design of spark.read.json. Why is the expected input not a standard JSON file? Who can tell me the internal reason? Any advice is appreciated.

Re: Why must the JSON file used by sparkSession.read.json contain one valid JSON object per line?

2016-10-16 Thread WangJianfei
Thank you very much! I will have a look at your link.

Re: Why must the JSON file used by sparkSession.read.json contain one valid JSON object per line?

2016-10-16 Thread WangJianfei
Thank you! But I think it's user-unfriendly to process a standard JSON file with a DataFrame. Should we provide a new overloaded method to do this?
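
One reason for the one-object-per-line format is that such files can be split into records without parsing them, which keeps the input splittable across partitions. For a standard (multi-line, pretty-printed) JSON file, a common workaround in the 1.x/2.0 era was to read whole files and hand the strings to the JSON reader, which only requires each string, not each line, to be a complete document. A sketch with a hypothetical path (note that each file is read entirely into memory):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("multiline-json").getOrCreate()

    // wholeTextFiles yields (path, fileContent) pairs, so each element is one complete,
    // possibly multi-line JSON document.
    val jsonStrings = spark.sparkContext.wholeTextFiles("hdfs:///path/to/json-dir").map(_._2)

    // read.json on an RDD[String] parses each string as one JSON record.
    val df = spark.read.json(jsonStrings)
    df.printSchema()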

If we run sc.textFile(path, xxx) many times, will the elements be the same in each partition?

2016-11-10 Thread WangJianfei
Hi devs: If I run sc.textFile(path, xxx) many times, will the elements be the same (same elements, same order) in each partition? My experiments show that they are the same, but they may not cover all the cases. Thank you!
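
A quick way to check this empirically (hypothetical path, and assuming an existing SparkContext named sc; this only exercises one file and one configuration, so it is evidence, not a proof of determinism in general):

    // Read the same file twice with the same minPartitions and compare every
    // partition's contents, index by index.
    def partitionContents(path: String, minPartitions: Int): Array[(Int, List[String])] =
      sc.textFile(path, minPartitions)
        .mapPartitionsWithIndex((idx, it) => Iterator((idx, it.toList)))
        .collect()
        .sortBy(_._1)

    val first  = partitionContents("hdfs:///path/to/file.txt", 8)
    val second = partitionContents("hdfs:///path/to/file.txt", 8)
    println(first.sameElements(second))   // true when both reads produce identical splits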

Reduce the memory usage if we do a sample first in GradientBoostedTrees if subsamplingRate < 1.0

2016-11-11 Thread WangJianfei
When we train the model, we use the data with subsamplingRate, so if subsamplingRate < 1.0 we could do a sample first to reduce the memory usage. See the code below in GradientBoostedTrees.boost():

    while (m < numIterations && !doneLearning) {
      // Update data with pseudo-residuals (residual errors)

Does the design of Spark consider Scala parallel collections?

2016-11-12 Thread WangJianfei
Hi devs: According to the Scala docs, Scala has parallel collections, and in my experiments parallel collections can accelerate operations such as map. So I want to know: does Spark use Scala parallel collections, and will Spark consider them? Thank you!
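
The two mechanisms operate at different levels: .par parallel collections spread work over the threads of a single JVM, while an RDD distributes work over the executors of a cluster (each task itself running single-threaded by default). A minimal comparison, assuming an existing SparkContext named sc:

    // Scala parallel collection: multi-threaded within one JVM.
    val localSum = (1 to 1000000).par.map(x => x.toLong * 2).sum

    // RDD: the same computation distributed across executor processes.
    val distributedSum = sc.parallelize(1 to 1000000).map(x => x.toLong * 2).sum()

    println(s"$localSum $distributedSum")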

Re: Reduce the memory usage if we do a sample first in GradientBoostedTrees if subsamplingRate < 1.0

2016-11-15 Thread WangJianfei
Sent: 2016-11-16 (Wednesday) 03:54
To: "WangJianfei"
Subject: Re: Reduce the memory usage if we do a sample first in GradientBoostedTrees if subsamplingRate < 1.0

Thanks for the suggestion. That would be faster, but less accurate in most cases. It's generally bet

Re: Reduce the memory usage if we do a sample first in GradientBoostedTrees if subsamplingRate < 1.0

2016-11-15 Thread WangJianfei
With predError.zip(input) we get the RDD data, so we could just do a sample on predError or input; but if we do that, we can't use zip (the number of elements must be the same in each partition). Thank you!

Question about spark.mllib.GradientDescent

2016-11-29 Thread WangJianfei
Hi devs: I think it's unnecessary to use c1._1 += c2._1 in the combOp operation; I think it's the same if we use c1._1 + c2._1. See the code below, in GradientDescent.scala:

    val (gradientSum, lossSum, miniBatchSize) = data.sample(false, miniBatchFraction, 42 + i)
      .treeAggregate((BDV.zeros[D
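
The two forms produce the same numbers; the difference is that += adds into the existing breeze vector's backing array, while + allocates a brand-new vector on every merge. Since combOp runs once per merged partial result on a gradient vector that can be large, the in-place version avoids a lot of transient allocation. A standalone sketch of just that difference (breeze is already an MLlib dependency):

    import breeze.linalg.{DenseVector => BDV}

    val acc   = BDV.zeros[Double](5)
    val delta = BDV.ones[Double](5)

    val freshCopy = acc + delta    // allocates a new DenseVector to hold the result
    acc += delta                   // writes the sum into acc's existing array, no new allocation

    // Same values either way; only the allocation behaviour differs.
    println(freshCopy.toArray.sameElements(acc.toArray))   // true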

Why don't we implement some adaptive learning rate methods, such as Adadelta and Adam?

2016-11-30 Thread WangJianfei
Hi devs: Normally, adaptive learning rate methods can converge faster than standard SGD, so why don't we implement them? See this link for more details: http://sebastianruder.com/optimizing-gradient-descent/index.html#adadelta
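
For reference, a minimal single-parameter sketch of the Adam update rule described on that page (plain Scala, not MLlib code; the hyperparameters are the commonly quoted defaults):

    // Minimize f(x) = (x - 3)^2; the gradient is 2 * (x - 3).
    var x = 0.0
    var m = 0.0                                   // first-moment (mean) estimate
    var v = 0.0                                   // second-moment (uncentered variance) estimate
    val (alpha, beta1, beta2, eps) = (0.1, 0.9, 0.999, 1e-8)

    for (t <- 1 to 500) {
      val g = 2.0 * (x - 3.0)
      m = beta1 * m + (1 - beta1) * g
      v = beta2 * v + (1 - beta2) * g * g
      val mHat = m / (1 - math.pow(beta1, t))     // bias correction for the warm-up phase
      val vHat = v / (1 - math.pow(beta2, t))
      x -= alpha * mHat / (math.sqrt(vHat) + eps)
    }
    println(x)                                    // converges close to 3.0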

Re: Why don't we implement some adaptive learning rate methods, such as Adadelta and Adam?

2016-11-30 Thread WangJianfei
Yes, thank you. I know this implementation is very simple, but I want to know why Spark MLlib doesn't implement it.