Hey Sean,
Right now we don't publish every 2.11 binary to avoid combinatorial
explosion of the number of build artifacts we publish (there are other
parameters such as whether hive is included, etc). We can revisit this
in future feature releases, but .1 releases like this are reserved for
bug fixes.
Sounds good, that makes sense.
Cheers,
Sean
On Jan 27, 2015, at 11:35 AM, Patrick Wendell pwend...@gmail.com wrote:
Okay - we've resolved all issues with the signatures and keys.
However, I'll leave the current vote open for a bit to solicit
additional feedback.
On Tue, Jan 27, 2015 at 10:43 AM, Sean McNamara
sean.mcnam...@webtrends.com wrote:
Koert,
As Mark said, I have already refactored the API so that nothing in catalyst
is exposed (and users won't need it anyway). Data types and the Row
interface are both outside the catalyst package, in org.apache.spark.sql.
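Reynold's point about the package layout can be sketched with a small stand-in (stubbed classes, not the real Spark API): user code programs against a public Row surface and never imports catalyst internals.

```scala
object RowDemo {
  // Stub mirroring the access pattern of the public org.apache.spark.sql.Row;
  // nothing here reaches into the catalyst package.
  case class Row(values: Any*) {
    def get(i: Int): Any = values(i)
    def length: Int = values.length
  }

  def main(args: Array[String]): Unit = {
    val r = Row("alice", 29)
    assert(r.get(0) == "alice" && r.length == 2)
    println(r.get(1))
  }
}
```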
On Tue, Jan 27, 2015 at 9:08 AM, Koert Kuipers ko...@tresata.com wrote:
Reynold,
But with the type alias we will have the same problem, right?
If the methods don't receive SchemaRDD anymore, we will have to change
our code to migrate from SchemaRDD to DataFrame, unless we have an implicit
conversion between DataFrame and SchemaRDD.
2015-01-27 17:18 GMT-02:00 Reynold Xin
Dirceu,
That is not possible because one cannot overload return types.
SQLContext.parquetFile (and many other methods) needs to return some type,
and that type cannot be both SchemaRDD and DataFrame.
In 1.3, we will create a type alias for DataFrame called SchemaRDD to not
break source compatibility.
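A minimal, self-contained sketch of the alias approach (stubbed DataFrame, not the actual Spark class) shows both halves of the point: return-type-only overloading is impossible, but a type alias lets old signatures keep compiling:

```scala
object AliasDemo {
  class DataFrame(val numRows: Int)
  // The old name becomes another name for the same type. Overloading
  // parquetFile to return both types would not compile, but with an
  // alias there is only one type to return.
  type SchemaRDD = DataFrame

  // A method still written against the old name accepts the new type.
  def count(rdd: SchemaRDD): Int = rdd.numRows

  def main(args: Array[String]): Unit = {
    val df = new DataFrame(3)
    assert(count(df) == 3)
    println("count = " + count(df))
  }
}
```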
Yes - the key issue is just due to me creating new keys this time
around. Anyways let's take another stab at this. In the mean time,
please don't hesitate to test the release itself.
- Patrick
That's great. Guess I was looking at a somewhat stale master branch...
On Tue, Jan 27, 2015 at 2:19 PM, Reynold Xin r...@databricks.com wrote:
It has been pretty evident for some time that's what it is, hasn't it?
Yes that's a better name IMO.
On Mon, Jan 26, 2015 at 2:18 PM, Reynold Xin r...@databricks.com wrote:
Hi,
We are considering renaming SchemaRDD to DataFrame in 1.3, and wanted to
get the community's opinion.
The context
Thanks, Andrew. That's great material.
On Mon, Jan 26, 2015 at 10:23 PM, Andrew Ash and...@andrewash.com wrote:
In addition to the references you have at the end of the presentation,
there's a great set of practical examples based on the learnings from Qt
posted here:
The type alias means your methods can specify either type and they will work.
It's just another name for the same type. But Scaladocs and such will show
DataFrame as the type.
Matei
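Matei's point can be checked mechanically; in a stubbed sketch (not the real classes), the compiler will even certify the two names as one type:

```scala
object TypeAliasCheck {
  class DataFrame
  type SchemaRDD = DataFrame // as planned for 1.3

  def main(args: Array[String]): Unit = {
    // Compiles only because the two names denote the same type.
    implicitly[SchemaRDD =:= DataFrame]
    val a: SchemaRDD = new DataFrame // assignment works in either direction
    val b: DataFrame = a
    assert(a eq b) // same object, same type, two names
    println("SchemaRDD and DataFrame are the same type")
  }
}
```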
On Jan 27, 2015, at 12:10 PM, Dirceu Semighini Filho
dirceu.semigh...@gmail.com wrote:
Reynold,
But with
+1
1. Compiled OSX 10.10 (Yosemite) OK Total time: 12:55 min
mvn clean package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4
-Dhadoop.version=2.6.0 -Phive -DskipTests
2. Tested pyspark, MLlib - running as well as compared results with 1.1.x /
1.2.0
2.1. statistics OK
2.2. Linear/Ridge/Lasso Regression
+1
Tested on Mac OS X
On Tue, Jan 27, 2015 at 12:35 PM, Krishna Sankar ksanka...@gmail.com
wrote:
Hi Patrick:
I would love to help with reviewing in any way I can. I'm fairly new here. Can
you help with a pointer to get me started?
Thanks
From: Patrick Wendell pwend...@gmail.com
To: dev@spark.apache.org dev@spark.apache.org
Sent: Tuesday, January 27, 2015 3:56 PM
Subject: Friendly
Hey All,
Just a reminder: as always around release time, we have a very large
volume of patches showing up near the deadline.
One thing that can help us maximize the number of patches we get in is
to have community involvement in performing code reviews. And in
particular, doing a thorough review
You certainly do not need to build Spark as root. It might clumsily
overcome a permissions problem in your local env, but it probably causes
other problems.
On Jan 27, 2015 11:18 AM, angel__ angel.alvarez.pas...@gmail.com wrote:
I had that problem when I tried to build Spark 1.2. I don't exactly know what
is causing it, but I guess it might have something to do with user
permissions.
I could finally fix this by building Spark as root user (now I'm dealing
with another problem, but ...that's another story...)
I'm +1 on this, although a little worried about unknowingly introducing
SparkSQL dependencies every time someone wants to use this. It would be
great if the interface can be abstract and the implementation (in this
case, SparkSQL backend) could be swapped out.
One alternative suggestion on the
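One way to read the abstraction being asked for (all names here are hypothetical, purely a sketch): a small trait the caller depends on, with the SparkSQL-backed implementation hidden behind it and swappable:

```scala
// Hypothetical sketch of a swappable backend; none of these names are real Spark API.
trait SchemaBackend {
  def describe(fields: Seq[String]): String
}

// A trivial implementation; a SparkSQL-backed one could be dropped in
// without callers picking up a SparkSQL dependency.
class SimpleBackend extends SchemaBackend {
  def describe(fields: Seq[String]): String =
    fields.mkString("struct<", ",", ">")
}

object BackendDemo {
  def main(args: Array[String]): Unit = {
    val backend: SchemaBackend = new SimpleBackend
    val s = backend.describe(Seq("name:string", "age:int"))
    assert(s == "struct<name:string,age:int>")
    println(s)
  }
}
```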
I think there are several signing / hash issues that should be fixed
before this release.
Hashes:
http://issues.apache.org/jira/browse/SPARK-5308
https://github.com/apache/spark/pull/4161
The hashes here are correct, but have two issues:
As noted in the JIRA, the format of the hash file is
I am running into this issue as well, when storing large Arrays as the
value in a key-value pair and then doing a reduceByKey. Can one of the
experts please comment if it would make sense to add an operation to add
values in place, like accumulators do - this would essentially merge the
vectors for a
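What the poster is asking for resembles aggregateByKey with a mutating merge. Here is a sketch of the two merge functions on plain Scala collections (no Spark, so it runs standalone), where seqOp reuses the accumulator's buffer instead of allocating a new array per record:

```scala
object InPlaceMerge {
  // Add `v` into `acc` element-wise, mutating acc's buffer in place.
  def seqOp(acc: Array[Double], v: Array[Double]): Array[Double] = {
    var i = 0
    while (i < acc.length) { acc(i) += v(i); i += 1 }
    acc
  }

  // Merging two accumulators is the same element-wise addition.
  def combOp(a: Array[Double], b: Array[Double]): Array[Double] = seqOp(a, b)

  def main(args: Array[String]): Unit = {
    // Stand-in for the values of one key in a reduceByKey/aggregateByKey.
    val records = Seq(Array(1.0, 2.0), Array(3.0, 4.0), Array(5.0, 6.0))
    val merged = records.foldLeft(Array(0.0, 0.0))(seqOp)
    assert(merged.sameElements(Array(9.0, 12.0)))
    println(merged.mkString(","))
  }
}
```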
In master, Reynold has already taken care of moving Row
into org.apache.spark.sql; so, even though the implementation of Row (and
GenericRow et al.) is in Catalyst (which is more optimizer than parser),
that needn't be of concern to users of the API in its most recent state.
On Tue, Jan 27, 2015
I personally have no preference DataFrame vs. DataTable, but only wish to lay
out the history and etymology simply because I'm into that sort of thing.
Frame comes from Marvin Minsky's 1970's AI construct: slots and the data
that go in them. The S programming language (precursor to R) adopted
Got it. Ignore the SHA512 issue since these aren't somehow expected by
a policy or Maven to be in a certain format. Just wondered if the
difference was intended.
The Maven way of generating the SHA1 hashes is to set this on the
install plugin, AFAIK, although I'm not sure if the intent was to hash