If you are on Spark 1.3, use repartitionAndSortWithinPartitions followed by mapPartitions.
In 1.4, window functions will be supported, it seems.
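A minimal sketch of what that suggestion could look like, assuming a key-value RDD keyed by product (the data shape, partitioner choice, and variable names are my assumptions, not from the thread):

```scala
import org.apache.spark.{SparkConf, SparkContext, HashPartitioner}

// Hypothetical sketch for Spark 1.3, where window functions are not yet
// available: sort within each partition, then number rows per partition.
val sc = new SparkContext(new SparkConf().setAppName("ordered-index"))
val records = sc.parallelize(Seq(("p1", 30.0), ("p1", 10.0), ("p2", 5.0)))

val indexed = records
  .repartitionAndSortWithinPartitions(new HashPartitioner(4))
  .mapPartitions { iter =>
    // zipWithIndex restarts at 0 in every partition, yielding an
    // ordered index local to that partition
    iter.zipWithIndex.map { case ((product, value), idx) =>
      (product, value, idx)
    }
  }
```

Note that a HashPartitioner may place several products in the same partition, so this indexes per partition, not strictly per product; a secondary sort or per-key grouping would be needed for a strict per-product index.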
On 1 Jun 2015 04:10, Ricardo Almeida ricardo.alme...@actnowib.com wrote:
That's great and how would you create an ordered index by partition (by
product in this
Hi Shushant,
Spark currently makes no effort to request executors based on data locality
(although it does try to schedule tasks within executors based on data
locality). We're working on adding this capability at SPARK-4352
https://issues.apache.org/jira/browse/SPARK-4352.
-Sandy
On Sun, May
Each time you run a Spark SQL query we will create new RDDs that load the
data and thus you should see the newest results. There is one caveat:
formats that use the native Data Source API (parquet, ORC (in Spark 1.4),
JSON (in Spark 1.5)) cache file metadata to speed up interactive querying.
To
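If the cached metadata is the problem, one way to pick up newly written files is to refresh the table explicitly before querying. A hedged sketch against the Spark 1.3/1.4 HiveContext API (the table name here is made up for illustration):

```scala
import org.apache.spark.sql.hive.HiveContext

// Hypothetical sketch: drop the cached file metadata for a Data Source
// table so the next query re-lists the underlying files.
val sqlContext = new HiveContext(sc)
sqlContext.refreshTable("my_parquet_table") // invalidate cached metadata
sqlContext.sql("SELECT COUNT(*) FROM my_parquet_table").show()
```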
Hi
We are using Spark 1.3.1.
Avro-chill (tomorrow I will check if it's important) - we register Avro classes
from Java.
Avro 1.7.6
On May 31, 2015 22:37, Josh Rosen rosenvi...@gmail.com wrote:
Which Spark version are you using? I'd like to understand whether this
change could be caused by recent Kryo
Can you file a JIRA with the detailed steps to reproduce the problem?
On Fri, May 29, 2015 at 2:59 AM, Alex Nakos ana...@gmail.com wrote:
Hi-
I’ve just built the latest Spark RC from source (1.4.0 RC3) and can
confirm that the Spark shell is still NOT working properly on 2.11. No
classes in
Hi-
Yup, I’ve already done so here:
https://issues.apache.org/jira/browse/SPARK-7944
Please let me know if this requires any more information - more than happy
to provide whatever I can.
Thanks
Alex
On Sun, May 31, 2015 at 8:45 AM, Tathagata Das t...@databricks.com wrote:
Can you file a JIRA
Alternatively, I will give a talk about LOR and LIR with elastic-net
implementation, and the interpretation of those models, at Spark Summit.
https://spark-summit.org/2015/events/large-scale-lasso-and-elastic-net-regularized-generalized-linear-models/
You may attend or watch online.
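For reference, a hedged sketch of how elastic-net regularized logistic regression is configured in the Spark 1.4 spark.ml API (the training DataFrame `trainingDF` is an assumption and would need "label" and "features" columns):

```scala
import org.apache.spark.ml.classification.LogisticRegression

// Hypothetical sketch: logistic regression with elastic-net regularization.
// regParam sets the overall regularization strength; elasticNetParam mixes
// the penalties: 0.0 = pure L2 (ridge), 1.0 = pure L1 (lasso).
val lor = new LogisticRegression()
  .setMaxIter(100)
  .setRegParam(0.01)
  .setElasticNetParam(0.5)
val model = lor.fit(trainingDF) // trainingDF is assumed to exist
```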
Sincerely,
DB
Hello,
Since RDDs are created from data from Hive tables or HDFS, how do we ensure
they are invalidated when the source data is updated?
Regards,
Ashish
There is no mechanism for keeping an RDD up to date with a changing source.
However, you could set up a stream that watches for changes to the directory and
processes the new files, or use the Hive integration in Spark SQL to run Hive
queries directly. (However, old query results will still grow
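The directory-watching approach could be sketched with Spark Streaming's file stream; this is a minimal example, and the batch interval and HDFS path are assumptions:

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical sketch: watch an HDFS directory and process only files that
// appear after the stream starts, instead of trying to keep an existing RDD
// in sync with changing source data.
val ssc = new StreamingContext(sc, Seconds(60))
val newFiles = ssc.textFileStream("hdfs:///data/incoming") // path is an assumption

newFiles.foreachRDD { rdd =>
  // reprocess or merge the newly arrived records here
  println(s"new records in this batch: ${rdd.count()}")
}

ssc.start()
ssc.awaitTermination()
```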
I want to understand how Spark takes care of data localisation in cluster
mode when run on YARN.
1. The driver program asks the ResourceManager for executors. Does it tell YARN's
RM to check the HDFS blocks of the input data and then allocate executors accordingly?
And executors remain fixed throughout application or