True, although a number of other little issues make me, personally,
not want to continue down this road:
- There are already a lot of build profiles to try to cover Hadoop versions
- I don't think it's quite right to have vendor-specific builds in
Spark to begin with
- We should be moving to only
Hi all,
Related to https://issues.apache.org/jira/browse/SPARK-3039, the default CDH4
build, which is built with “mvn -Dhadoop.version=2.0.0-mr1-cdh4.2.0 -DskipTests
clean package”, pulls in avro-mapred hadoop1, as opposed to avro-mapred
hadoop2. This ends up in the same error as mentioned in
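In case it is useful to anyone hitting this from their own application build, here is a rough sketch of pinning the hadoop2 flavour of avro-mapred explicitly in sbt. This is only one possible workaround from the application side, and the Avro version shown is just an example, not necessarily what the Spark build uses:

// build.sbt (sketch): depend on the hadoop2 classifier of avro-mapred explicitly
libraryDependencies += "org.apache.avro" % "avro-mapred" % "1.7.6" classifier "hadoop2"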
Hi,
I am interested in a Spark OSGI bundle.
While checking the maven repository I found out that it is still not being
implemented.
Can we see an OSGI bundle being released soon? Is it in the Spark Project
roadmap?
Rgds
--
Niranda
Hi all,
I'm running Spark 1.2.0, in standalone mode, on different cluster and
server sizes. All of my data is cached in memory.
Basically I have a mass of data, about 8 GB, with about 37k columns, and
I'm running different configs of a BinaryLogisticRegressionBFGS.
When I set Spark to run on 9
It sounds like your computation just isn't CPU-bound, right? Or maybe
only some stages are. It's not clear what work you are doing
beyond the core LR.
Stages don't wait on each other unless one depends on the other. You'd
have to clarify what you mean by running stages in parallel, like what
Yes, that makes sense, but it doesn't make the jobs CPU-bound. What is
the bottleneck: the model building or other stages? I would think you
can get the model building to be CPU-bound, unless you have chopped it
up into really small partitions. I think it's best to look further
into what stages are
Hi Sean,
I'm trying to increase CPU usage by running logistic regression on
different datasets in parallel. They shouldn't depend on each other.
I train several logistic regression models from different column
combinations of a main dataset. I processed the combinations in a ParArray
in an
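Roughly, what I do looks like the minimal sketch below. The names are illustrative: each entry of `datasets` stands for a cached RDD of LabeledPoint built from one column combination, and I'm using the MLlib LBFGS-based API for the example.

import org.apache.spark.mllib.classification.{LogisticRegressionModel, LogisticRegressionWithLBFGS}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Train one binary LR model per column combination. Mapping over the
// ParArray submits the underlying Spark jobs from several threads, so
// the scheduler can run them concurrently instead of one after another.
def trainAll(datasets: Array[RDD[LabeledPoint]]): Array[LogisticRegressionModel] =
  datasets.par.map { data =>
    new LogisticRegressionWithLBFGS().setNumClasses(2).run(data)
  }.toArray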
For the second question, we do plan to support Hive 0.14, possibly in
Spark 1.4.0.
For the first question:
1. In Spark 1.2.0, the Parquet support code doesn’t support timestamp
type, so you can’t.
2. In Spark 1.3.0, timestamp support was added, also Spark SQL uses its
own Parquet support
1. In Spark 1.3.0, timestamp support was added, also Spark SQL uses
its own Parquet support to handle both read path and write path when
dealing with Parquet tables declared in Hive metastore, as long as you’re
not writing to a partitioned table. So yes, you can.
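For concreteness, a minimal sketch of the 1.3.0 case (non-partitioned Parquet table in the Hive metastore, written through HiveContext). The table and column names, and the staging_events source table, are made up for illustration:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("parquet-timestamp-sketch"))
val hiveContext = new HiveContext(sc)

// Non-partitioned Parquet table declared in the Hive metastore, with a timestamp column.
hiveContext.sql("CREATE TABLE IF NOT EXISTS events (id INT, ts TIMESTAMP) STORED AS PARQUET")

// On 1.3.0 this write goes through Spark SQL's own Parquet support.
hiveContext.sql("INSERT INTO TABLE events SELECT id, ts FROM staging_events")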
Ah, I had
Trying to run pyspark on yarn in client mode with the basic wordcount example, I see
the following error when doing the collect:
Error from python worker: /usr/bin/python: No module named sql
PYTHONPATH was:
For the old Parquet path (available in 1.2.1), I made a few changes to allow
reading/writing to a table partitioned on a timestamp type column:
https://github.com/apache/spark/pull/4469
On Fri, Feb 20, 2015 at 8:28 PM, The Watcher watche...@gmail.com wrote:
1. In Spark 1.3.0,
In the Spark SQL 1.2 Programming Guide, we can generate the schema based on
a schema string via
val schema =
  StructType(
    schemaString.split(" ").map(fieldName =>
      StructField(fieldName, StringType, true)))
But when running this on Spark 1.3.0 (RC1), I get the error:
val schema =
Oh, I just realized that I never imported all of sql._. My bad!
On Fri Feb 20 2015 at 7:51:32 AM Denny Lee denny.g@gmail.com wrote:
In the Spark SQL 1.2 Programming Guide, we can generate the schema based
on a schema string via
val schema =
  StructType(
    schemaString.split(
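For anyone else who trips over this, a sketch of the snippet with the imports in place. I believe that on 1.3.0 the type classes live under org.apache.spark.sql.types rather than org.apache.spark.sql; the column names here are just an example:

import org.apache.spark.sql.types.{StringType, StructField, StructType}

// One nullable string column per whitespace-separated field name.
val schemaString = "name age city"
val schema =
  StructType(
    schemaString.split(" ").map(fieldName =>
      StructField(fieldName, StringType, nullable = true)))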
Hi Sean,
Does it mean that Spark is not encouraged to be embedded in other products?
On Fri, Feb 20, 2015 at 3:29 PM, Sean Owen so...@cloudera.com wrote:
I don't think an OSGI bundle makes sense for Spark. It's part JAR,
part lifecycle manager. Spark has its own lifecycle management and is
No, you usually run Spark apps via the spark-submit script, and the
Spark machinery is already deployed on a cluster. Although it's
possible to embed the driver and get it working that way, it's not
supported.
On Fri, Feb 20, 2015 at 4:48 PM, Niranda Perera
niranda.per...@gmail.com wrote:
Hi
Thanks for the explanation.
To be clear, I meant to speak for any hadoop 2 releases before 2.2, which
have profiles in Spark. I referred to CDH4, since that's the only Hadoop
2.0/2.1 version Spark ships a prebuilt package for.
I understand the hesitation of making a code change if Spark doesn't