Hi,
Could you share the code with foreachPartition?
Jacek
11.03.2016 7:33 PM "Darin McBeath" wrote:
>
>
> I can verify this by looking at the log file for the workers.
>
> Since I output logging statements in the object called by the
> foreachPartition, I can see the
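A minimal sketch of the pattern being described, assuming an init/logging
object implemented as a VoidFunction over String partitions (the class name
and log message are illustrative, not the actual code):

import java.util.Iterator;
import org.apache.log4j.Logger;
import org.apache.spark.api.java.function.VoidFunction;

public class PartitionLogger implements VoidFunction<Iterator<String>> {
  private static final Logger log = Logger.getLogger(PartitionLogger.class);

  @Override
  public void call(Iterator<String> rows) {
    int count = 0;
    while (rows.hasNext()) { rows.next(); count++; }
    // This statement lands in the executor (worker) logs, not on the driver.
    log.info("processed a partition with " + count + " rows");
  }
}

Anything logged inside call() shows up in the worker logs, which is what
makes the verification described above possible.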
Hi Tristan,
Mind sharing the relevant code? I'd like to learn how you use a
Transformer to do this. Thanks!
Jacek
11.03.2016 7:07 PM "Tristan Nixon" wrote:
> I have a similar situation in an app of mine. I implemented a custom ML
> Transformer that wraps the Jackson
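A rough sketch of what such a Transformer could look like on Spark 1.6,
assuming an input string column named "json" and an extracted field named
"title" (all names here are assumptions; the actual implementation may
differ considerably):

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.spark.ml.Transformer;
import org.apache.spark.ml.param.ParamMap;
import org.apache.spark.ml.util.Identifiable$;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.functions;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class JsonFieldTransformer extends Transformer {
  private final String uid = Identifiable$.MODULE$.randomUID("jsonField");

  @Override
  public String uid() { return uid; }

  @Override
  public DataFrame transform(DataFrame df) {
    df.sqlContext().udf().register("extractTitle", new UDF1<String, String>() {
      @Override
      public String call(String json) throws Exception {
        // Jackson does the parsing; a real implementation would reuse the mapper.
        return new ObjectMapper().readTree(json).path("title").asText();
      }
    }, DataTypes.StringType);
    return df.withColumn("title", functions.callUDF("extractTitle", df.col("json")));
  }

  @Override
  public StructType transformSchema(StructType schema) {
    return schema.add("title", DataTypes.StringType);
  }

  @Override
  public Transformer copy(ParamMap extra) {
    return defaultCopy(extra);
  }
}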
Just a guess...flatMap?
Jacek
11.03.2016 7:46 PM "Stefan Panayotov" wrote:
> Hi,
>
> I have a problem that requires me to go through the rows in a DataFrame
> (or possibly through rows in a JSON file) and conditionally add rows
> depending on a value in one of the
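If flatMap is the way to go, a sketch against a DataFrame df (assuming _3
is an int column; the condition and the derived row are only placeholders):

import java.util.ArrayList;
import java.util.List;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;

// df and sqlContext are assumed to already exist.
JavaRDD<Row> expanded = df.javaRDD().flatMap(new FlatMapFunction<Row, Row>() {
  @Override
  public Iterable<Row> call(Row row) {
    List<Row> out = new ArrayList<Row>();
    out.add(row);                            // always keep the original row
    if (row.getInt(2) > 0) {                 // placeholder condition on _3
      out.add(RowFactory.create(row.get(0), row.get(1), 0)); // derived row
    }
    return out;
  }
});
DataFrame result = sqlContext.createDataFrame(expanded, df.schema());

Each input row can thus produce the original row plus any number of
conditionally added rows.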
Thanks Ted.
I haven't explicitly specified Scala (I tried different versions in pom.xml
as well).
For what it is worth, this is what I get when I run a Maven dependency
tree. I wonder if the 2.11.2 coming from scala-reflect matters:
[INFO] | | \- org.scala-lang:scalap:jar:2.11.0:compile
Thanks Jacek. The POM is below (currently set to Spark 1.6.1, but I started
out with 1.6.0 with the same problem).
<project xmlns="http://maven.apache.org/POM/4.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
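For comparison, a minimal fragment of the alignment Ted asks about: every
Scala-dependent artifact has to sit on one Scala binary version (the
versions here are only examples):

<properties>
  <scala.version>2.11.7</scala.version>
</properties>
<dependencies>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>${scala.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>1.6.1</version>
  </dependency>
</dependencies>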
OK, some more information (and presumably a workaround).
When I initially read in my file, I use the following code.
JavaRDD<String> keyFileRDD = sc.textFile(keyFile);
Looking at the UI, this file has 2 partitions (both on the same executor).
I then repartition this RDD (to 16).
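In code, the two usual variants look like this (a sketch; the second avoids
the shuffle by asking textFile for a minimum partition count up front):

JavaRDD<String> keyFileRDD = sc.textFile(keyFile).repartition(16); // shuffle
JavaRDD<String> keyFileRDD2 = sc.textFile(keyFile, 16);            // no shuffle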
Thanks for the pointer to those indexers; those are some good examples, and
a good way to go for the trainer and any scoring done in Spark. I will
definitely have to deal with scoring in non-Spark systems, though.
I think I will need to scale up beyond what single-node liblinear can
practically
Experts, I need your help.
I'm using Spark 1.4.1, and when I set hive.metastore.metadb.dir
programmatically for a HiveContext (i.e. for a local metastore, the default
Derby metastore_db), the metastore_db still gets created in the same path
as user.dir.
Can you guys provide some insights?
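One thing worth checking (a sketch, and only an assumption that it covers
this case): for a local Derby metastore the location is usually driven by
the JDO connection URL rather than a metastore dir property, and it has to
be set before anything touches the metastore:

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.hive.HiveContext;

// jsc is an existing JavaSparkContext; the path below is a placeholder.
HiveContext hiveContext = new HiveContext(jsc.sc());
hiveContext.setConf("javax.jdo.option.ConnectionURL",
    "jdbc:derby:;databaseName=/path/to/metastore_db;create=true");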
Looks like a Scala version mismatch.
Are you using 2.11 everywhere ?
On Fri, Mar 11, 2016 at 10:33 AM, vasu20 wrote:
> Hi
>
> Any help appreciated on this. I am trying to write a Spark program using
> IntelliJ. I get a runtime error as soon as new SparkConf() is called from
Hi,
I have a problem that requires me to go through the rows in a DataFrame (or
possibly through rows in a JSON file) and conditionally add rows depending on a
value in one of the columns in each existing row. So, for example if I have:
+---+---+---+
| _1| _2| _3|
+---+---+---+
Hi,
Why do you use Maven rather than sbt for Scala?
Can you show the entire pom.xml and the command to execute the app?
Jacek
11.03.2016 7:33 PM "vasu20" wrote:
> Hi
>
> Any help appreciated on this. I am trying to write a Spark program using
> IntelliJ. I get a run time
This is an interesting discussion.
I have had some success running GraphX on large graphs with more than a
billion edges, using clusters of various sizes up to 64 machines. However,
the performance drops when I double the cluster size to 128 machines of
r3.xlarge. Does anyone have
Hi
Any help appreciated on this. I am trying to write a Spark program using
IntelliJ. I get a runtime error as soon as new SparkConf() is called from
main. The top few lines of the exception are pasted below.
These are the versions:
Spark jar: spark-assembly-1.6.0-hadoop2.6.0.jar
AFAIK we haven't actually broken 2.6 compatibility yet for PySpark itself,
since Jenkins is still testing that configuration.
I think the problem that you're seeing is that dev/run-tests /
dev/run-tests-jenkins only work against Python 2.7+ right now. However,
./python/run-tests should be able to
My driver code has the following:
// Init S3 (workers) so we can read the assets
partKeyFileRDD.foreachPartition(new SimpleStorageServiceInit(arg1, arg2, arg3));
// Get the assets. Create a key pair where the key is the asset id and the
// value is the record.
JavaPairRDD seqFileRDD =
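For Jacek's question above, a plausible shape for SimpleStorageServiceInit
(a guess at the structure only; SimpleStorageService.init is a hypothetical
helper, and the argument types are assumptions):

import java.util.Iterator;
import org.apache.log4j.Logger;
import org.apache.spark.api.java.function.VoidFunction;

public class SimpleStorageServiceInit implements VoidFunction<Iterator<String>> {
  private static final Logger log =
      Logger.getLogger(SimpleStorageServiceInit.class);
  private final String arg1;
  private final String arg2;
  private final String arg3;

  public SimpleStorageServiceInit(String arg1, String arg2, String arg3) {
    this.arg1 = arg1;
    this.arg2 = arg2;
    this.arg3 = arg3;
  }

  @Override
  public void call(Iterator<String> partition) {
    // Set up the S3 client once per partition, on the worker.
    SimpleStorageService.init(arg1, arg2, arg3);
    log.info("S3 initialized for this partition");
  }
}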
Right now I know of three different ways to pass property parameters to
the Spark Context (a sketch of option A follows the list). They are:
- A) Inside a SparkConf object just before creating the Spark Context
- B) During job submission (e.g. --conf spark.driver.memory=2g)
- C) By using a specific property file
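A sketch of option A (the app name and property are placeholders):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

SparkConf conf = new SparkConf()
    .setAppName("example")
    .set("spark.executor.memory", "2g"); // any spark.* property can go here
JavaSparkContext sc = new JavaSparkContext(conf);

One caveat: in client mode spark.driver.memory is only honored via B or C,
because by the time SparkConf runs, the driver JVM has already started with
its heap size fixed.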
Hi,
I have a multi-node cluster, and my Spark jobs depend on a native
library (.so files) and some jar files.
Can someone please explain the best ways to distribute these dependent
files across the nodes?
Right now I copy the dependent files to all nodes using Chef.
Regards
Prateek
--
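Besides copying files out-of-band with Chef, spark-submit itself can
distribute dependencies. A sketch (the class name and paths are
placeholders): --jars ships the jars and puts them on the executor
classpath, --files copies the .so into each executor's working directory,
and spark.executor.extraLibraryPath=. lets the JVM's library loader find
it there:

spark-submit \
  --class com.example.MyJob \
  --jars /deps/dep1.jar,/deps/dep2.jar \
  --files /deps/libnative.so \
  --conf spark.executor.extraLibraryPath=. \
  my-app.jar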
Thanks Josh. I am able to run with Python 2.7 explicitly by specifying
--python-executables=python2.7. By default it checks only for Python 2.6.
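Presumably the full command, for the archives:

./python/run-tests --python-executables=python2.7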
Thanks
Gayathri
On Fri, Mar 11, 2016 at 10:35 AM, Josh Rosen
wrote:
> AFAIK we haven't actually broken 2.6 compatibility