Can anyone give me an example of the use of RangePartitioner.hashCode? Thank you!
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/What-s-the-use-of-RangePartitioner-hashCode-tp18953.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
This function is in Partitioner.scala:

def getPartition(key: Any): Int = key match {
  case null => 0
  // case None => 0
  case _ => Utils.nonNegativeMod(key.hashCode, numPartitions)
}
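To see how that mapping behaves, here is a hedged, self-contained sketch. Note that nonNegativeMod is reimplemented locally for illustration; the real one lives in org.apache.spark.util.Utils, and this demo object is not Spark code:

```scala
object GetPartitionDemo {
  // Same idea as Utils.nonNegativeMod: in Scala/Java, a % b can be
  // negative when a is negative, so shift the result back into [0, b).
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val raw = x % mod
    raw + (if (raw < 0) mod else 0)
  }

  def getPartition(key: Any, numPartitions: Int): Int = key match {
    case null => 0 // null has no hashCode to call, so it is pinned to partition 0
    case _    => nonNegativeMod(key.hashCode, numPartitions)
  }

  def main(args: Array[String]): Unit = {
    println(getPartition(null, 4))    // 0
    println(getPartition(-7, 4))      // 1: an Int's hashCode is the value itself, and -7 % 4 = -3, shifted to 1
    println(getPartition("spark", 4)) // some value in 0..3, whatever "spark".hashCode mod 4 is
  }
}
```

So any key, whether or not it ever appeared in the RDD, deterministically maps to some partition; the partitioner is a pure function of the key.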
When the key is not in the RDD, I still get a value; that just feels a little strange to me.
thank you!
If I remove this abstract class A[T : Encoder] {}, it's OK!
Maybe you can use a DataFrame, with the header file as the schema.
Do you run this in YARN mode or something else?
We can see that when the number of written objects equals serializerBatchSize, flush() will be called. But what happens if the objects written exceed the default buffer size? In that situation, will flush() be called automatically?
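On the second question: at the JVM level, a buffered output stream writes through to the underlying sink on its own once its internal buffer fills, independent of any explicit flush(); the serializerBatchSize flush exists to delimit serialization batches, not to prevent overflow. A small self-contained illustration of java.io.BufferedOutputStream behavior (the 8-byte buffer size is chosen just for the demo, this is not Spark's writer):

```scala
import java.io.{BufferedOutputStream, ByteArrayOutputStream}

object BufferSpillDemo {
  /** Returns bytes visible in the underlying stream: (before spill, after spill, after flush). */
  def run(): (Int, Int, Int) = {
    val underlying = new ByteArrayOutputStream()
    val buffered = new BufferedOutputStream(underlying, 8) // tiny buffer for the demo

    buffered.write(Array.fill[Byte](4)(1))
    val beforeSpill = underlying.size() // 4 bytes still sit in the buffer, nothing written through yet

    buffered.write(Array.fill[Byte](20)(2))
    val afterSpill = underlying.size() // write exceeded the buffer, so data reached the sink without flush()

    buffered.flush()
    (beforeSpill, afterSpill, underlying.size())
  }

  def main(args: Array[String]): Unit = println(run())
}
```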
If so, we will get an exception when numPartitions is 0.
def getPartition(key: Any): Int = key match {
  case null => 0
  // case None => 0
  case _ => Utils.nonNegativeMod(key.hashCode, numPartitions)
}
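That's right for non-null keys: the modulo inside nonNegativeMod divides by numPartitions, so 0 partitions raises an ArithmeticException, while the null branch never touches numPartitions at all. A standalone check (nonNegativeMod copied here for illustration; not the actual Spark object):

```scala
object ZeroPartitionsDemo {
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val raw = x % mod
    raw + (if (raw < 0) mod else 0)
  }

  def getPartition(key: Any, numPartitions: Int): Int = key match {
    case null => 0 // returns 0 without ever dividing by numPartitions
    case _    => nonNegativeMod(key.hashCode, numPartitions)
  }

  def main(args: Array[String]): Unit = {
    println(getPartition(null, 0)) // 0: no exception for the null key
    try {
      getPartition("a", 0)
    } catch {
      case e: ArithmeticException => println(s"non-null key: $e") // / by zero
    }
  }
}
```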
First, thank you very much!
My executor memory is also 4 GB, but my Spark version is 1.5. Could the Spark version be causing the trouble?
Hi Devs,
In my application, I just broadcast a dataset (about 500 MB) to the executors (100+), and I got a Java heap error:

16/09/28 15:56:48 INFO BlockManagerInfo: Added broadcast_9_piece19 in memory on Jmartad-7219.hadoop.jd.local:53591 (size: 4.0 MB, free: 3.3 GB)
When we train the model, we use the data with a subsample rate, so if subsamplingRate < 1.0 we can do a sample first to reduce the memory usage. See the code below in GradientBoostedTrees.boost():

while (m < numIterations && !doneLearning) {
  // Update data with pseudo-residuals (residual error)
Hi Devs:
If I run sc.textFile(path, xxx) many times, will the elements be the same (same elements, same order) in each partition?
My experiments show that they are, but that may not cover all cases. Thank you!
With predError.zip(input) we get an RDD, so we could just do a sample on predError or input; but then we can't use zip (the number of elements must be the same in each partition). Thank you!
<ml-node+s1001551n19899...@n3.nabble.com>
Sent: Wednesday, November 16, 2016, 3:54 AM
To: "WangJianfei" <wangjianfe...@otcaix.iscas.ac.cn>
Subject: Re: Reduce the memory usage if we do a sample first in GradientBoostedTrees if subsamplingRate < 1.0
Thanks for the suggestion.
Hi devs:
According to the Scala docs, Scala has parallel collections, and in my experiments parallel collections can indeed accelerate operations such as map. So I want to know: has Spark used Scala's parallel collections, and would Spark ever consider that?
Thank you!
But I think it's user-unfriendly to process a standard JSON file with DataFrame. Do we need to provide a new overloaded method to do this?
Thank you very much! I will have a look at your link.
Hi devs:
I think it's unnecessary to use c1._1 += c2._1 in the combOp operation; I think it's the same if we use c1._1 + c2._1. See the code below, in GradientDescent.scala:

val (gradientSum, lossSum, miniBatchSize) = data.sample(false, miniBatchFraction, 42 + i)
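The two forms give the same numbers, but one likely reason for += is that it mutates the accumulator in place instead of allocating a fresh vector on every merge, which matters when the gradient has many dimensions and combOp runs once per partition. A plain-Array sketch of the two styles (breeze is not needed to make the point; these helper names are mine, not Spark's):

```scala
object CombOpDemo {
  // Allocating style, like c1._1 + c2._1: builds a brand-new array per merge.
  def addAlloc(c1: Array[Double], c2: Array[Double]): Array[Double] =
    c1.zip(c2).map { case (a, b) => a + b }

  // In-place style, like breeze's +=: reuses c1's storage.
  def addInPlace(c1: Array[Double], c2: Array[Double]): Array[Double] = {
    var i = 0
    while (i < c1.length) { c1(i) += c2(i); i += 1 }
    c1
  }

  def main(args: Array[String]): Unit = {
    val a = Array(1.0, 2.0)
    val b = Array(3.0, 4.0)
    println(addAlloc(a, b).toList)   // List(4.0, 6.0), and a is untouched
    println(addInPlace(a, b).toList) // List(4.0, 6.0), but a itself now holds the sum
    println(addInPlace(a, b) eq a)   // true: no new array was allocated
  }
}
```

In aggregate/treeAggregate, mutating and returning the first accumulator is explicitly allowed, so the in-place form is a safe allocation saving rather than a semantic difference.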
Hi devs:
Normally, adaptive learning-rate methods converge faster than standard SGD, so why don't we implement them?
See this link for more details:
http://sebastianruder.com/optimizing-gradient-descent/index.html#adadelta
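For context, the per-coordinate update behind these methods is small. Here is a hedged, self-contained AdaGrad sketch (one of the methods surveyed at that link) minimizing the toy objective f(w) = w^2; this is an illustration of the update rule, not Spark code:

```scala
object AdaGradDemo {
  // Minimize f(w) = w*w (gradient 2w) with the AdaGrad scaling:
  //   G += g^2 ;  w -= eta / sqrt(G + eps) * g
  // so the effective step size shrinks as squared gradients accumulate.
  def adagrad(w0: Double, eta: Double, steps: Int): Double = {
    val eps = 1e-8
    var w = w0
    var g2Sum = 0.0
    for (_ <- 1 to steps) {
      val g = 2 * w
      g2Sum += g * g
      w -= eta / math.sqrt(g2Sum + eps) * g
    }
    w
  }

  def main(args: Array[String]): Unit = {
    // Converges toward the minimum at 0 without hand-tuning a decay schedule.
    println(adagrad(w0 = 5.0, eta = 1.0, steps = 200))
  }
}
```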