I didn't follow all of this thread, but if you want to have exactly one
bucket-output file per RDD partition, you have to repartition (shuffle)
your data on the bucket key.
If you don't repartition (shuffle), you may have records with different
bucket keys in the same RDD partition, leading to more than one
bucket-output file per partition.
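The effect of such a shuffle can be sketched in plain Scala without Spark (partitionFor mirrors the logic of Spark's HashPartitioner; the record data is made up for illustration):

```scala
object PartitionSketch {
  // Non-negative modulus of the key's hashCode, as a hash partitioner computes it.
  def partitionFor(key: Any, numPartitions: Int): Int = {
    val mod = key.hashCode % numPartitions
    if (mod < 0) mod + numPartitions else mod
  }

  // Group records by their target partition, as a shuffle on the bucket key would:
  // afterwards every partition holds only the keys that hash to it.
  def shuffle[K, V](records: Seq[(K, V)], numPartitions: Int): Map[Int, Seq[(K, V)]] =
    records.groupBy { case (k, _) => partitionFor(k, numPartitions) }
}
```

Without this grouping step, the two records with key 1 below could sit in different partitions (or share one with key 2), which is exactly the multiple-files-per-partition situation described above.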
Your data has only two keys, and essentially all values are assigned to
only one of them. There is no better way to distribute the keys than
the one Spark executes.
What you have to do is use different keys to sort and range-partition
on. Try to invoke sortBy() on a non-pair RDD: it takes a function that
extracts the sort key from each record.
Ask yourself how to access the third element in an array in Scala.
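Both hints can be sketched in plain Scala (no Spark needed; the sample lines and names are made up, but on an RDD the same logic would be `rdd.map(_.split(",")).filter(...)`):

```scala
object FilterDemo {
  // Hypothetical sample lines shaped like the data in the question below.
  val lines = Seq(
    "74,20160905-133143,98.11218069128827594148",
    "75,20160905-133144,42.0"
  )

  // Split on commas; arr(2) is the third element, since Scala arrays are 0-indexed.
  def keepOver(threshold: Double, rows: Seq[String]): Seq[String] =
    rows.filter { line => line.split(",")(2).toDouble > threshold }
}
```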
On 05.09.2016 at 16:14, Ashok Kumar wrote:
Hi,
I want to filter them by value. This is what is in the array:
74,20160905-133143,98.11218069128827594148
I want to filter for anything > 50.0 in the third column.
Thanks
Have you followed this?
http://spark.apache.org/docs/latest/spark-standalone.html
It sounds more like your master is not connected to any executor. Hence,
no resources are available.
On 04.09.16 at 05:34, kant kodali wrote:
I don't think my driver program which is running on my local machine
> compile group: 'org.apache.spark', name: 'spark-streaming_2.10', version: '2.0.0'
> On the executor side I don't know what jars are being used, but I have
> installed using this zip file: spark-2.0.0-bin-hadoop2.7.tgz
There is an InvalidClassException complaining about non-matching
serialVersionUIDs. Shouldn't that be caused by different jars on executors
and driver?
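Yes, mismatched jars are the usual cause: Java serialization refuses to deserialize a class whose serialVersionUID differs from the one it was serialized with. A minimal sketch of the mechanism (class and field names are made up; pinning the UID with @SerialVersionUID keeps compatible builds interoperable as long as the fields still match):

```scala
import java.io._

// A case class is Serializable by default; the annotation pins its UID.
@SerialVersionUID(1L)
case class Event(id: Int)

object SerDemo {
  // Serialize to a byte array and read it back, as Spark does when
  // shipping closures and records between driver and executors.
  def roundTrip(e: Event): Event = {
    val bos = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(bos)
    oos.writeObject(e)
    oos.close()
    val ois = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray))
    ois.readObject().asInstanceOf[Event]
  }
}
```

If the reading side had the same class compiled with a different UID, readObject would throw exactly the InvalidClassException described above.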
On 03.09.2016 at 1:04 PM, Tal Grynbaum wrote:
> My guess is that you're running out of memory somewhere. Try to increase
> the driver memory.
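One way to raise it is on the spark-submit command line or in the defaults file (the 4g value and the class/jar names are illustrative):

```
spark-submit --driver-memory 4g --class com.example.Main app.jar

# or persistently, in conf/spark-defaults.conf:
spark.driver.memory 4g
```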
ntation available in 2.0.0.
I would highly appreciate some feedback to my thoughts and questions
On 31.08.2016 at 14:45, Fridtjof Sander wrote:
Hi Spark users,
I'm currently investigating spark's bucketing and partitioning
capabilities and I have some questions:
Let /T/ be a table that is bucketed and sorted by /T.id/ and partitioned
by /T.date/. Before persisting, /T/ has been repartitioned by /T.id/ to
get only one file per bucket.
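That layout can be sketched with the DataFrameWriter API (an API sketch only, not runnable without a SparkSession; the table name t and columns id and date come from the description above, numBuckets is illustrative):

```scala
df.repartition(numBuckets, col("id"))   // one shuffle so each bucket's rows co-locate
  .write
  .partitionBy("date")                  // one directory per T.date value
  .bucketBy(numBuckets, "id")           // hash-bucket the files by T.id
  .sortBy("id")                         // sort rows within each bucket file
  .saveAsTable("t")                     // bucketBy requires saveAsTable
```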
    val ir = new IsotonicRegression().setIsotonic(true)
    val model = ir.fit(dataset)
    val predictions = model.transform(dataset)
      .select("prediction").rdd
      .map { case Row(pred) => pred }
      .collect()
    assert(predictions === Array(1, 2, 2, 2, 6, 16.5, 16.5, 17, 18))
Thanks
Yanbo
2016-07-11 6:14 GMT-0
Hi Swaroop,
from my understanding, Isotonic Regression is currently limited to data
with one feature plus weight and label. Also, the entire data set is
required to fit into the memory of a single machine.
I did some work on the latter issue but discontinued the project,
because I felt no one really needed it.