Sorry to revive an old thread, but I just ran into this issue myself. It
is likely that you do not have the assembly jar built, or that you have
SPARK_HOME set incorrectly (it does not need to be set).
Michael
On Thu, Feb 27, 2014 at 8:13 AM, Nan Zhu zhunanmcg...@gmail.com wrote:
Hi, all
Hi Everyone,
I'm very excited about merging this new feature into Spark! We have a lot
of cool things in the pipeline, including: porting Shark's in-memory
columnar format to Spark SQL, code-generation for expression evaluation and
improved support for complex types in parquet.
I would love to
It will be great if there are any examples or use cases to look at?
There are examples in the Spark documentation. Patrick posted an updated
copy here so people can see them before 1.0 is released:
http://people.apache.org/~pwendell/catalyst-docs/sql-programming-guide.html
Does this feature
Hey Everyone,
Here is a pretty major (but source compatible) change we are considering
making to the RDD API for 1.0. Java and Python APIs would remain the same,
but users of Scala would likely need to use fewer casts. This would be
especially true for libraries whose functions take RDDs as
From my experience, covariance often becomes a pain when dealing with
serialization/deserialization (I've experienced a few cases while
developing play-json datomisca).
Moreover, if you have implicits, variance often becomes a headache...
This is exactly the kind of feedback I was hoping
Hi Pascal,
Thanks for the input. I think we are going to be okay here since, as Koert
said, the current serializers use runtime type information. We could also
keep a ClassTag around for the original type when the RDD was created.
Good things to be aware of though.
Michael
On Sat, Mar 22,
, so we're really curious as far as
the architectural direction.
-Evan
On Fri, Mar 21, 2014 at 11:09 AM, Michael Armbrust
mich...@databricks.com wrote:
It will be great if there are any examples or use cases to look at?
There are examples in the Spark documentation. Patrick posted
Just a quick note to everyone that Patrick and I are playing around with
Travis CI on the Spark github repository. For now, Travis does not run all
of the test cases, so it will only be turned on experimentally. Long term it
looks like Travis might give better integration with github, so we are
Is the migration from Jenkins to Travis finished?
It is not finished and really at this point it is only something we are
considering, not something that will happen for sure. We turned it on in
addition to Jenkins so that we could start finding issues exactly like the
ones you described
There is a JIRA for one of the flakey tests here:
https://issues.apache.org/jira/browse/SPARK-1409
On Mon, Apr 7, 2014 at 11:32 AM, Patrick Wendell pwend...@gmail.com wrote:
TD - do you know what is going on here?
I looked into this a bit and at least a few of these that use
Thread.sleep()
Hi Marcelo,
Thanks for bringing this up here, as this has been a topic of debate
recently. Some thoughts below.
... all of these suffer from the fact that the log message needs to be built
even
though it might not be used.
This is not true of the current implementation (and this is actually
BTW...
You can do calculations in string interpolation:
s"Time: ${timeMillis / 1000}"
Or use format strings.
f"Float with two decimal places: $floatValue%.2f"
More info:
http://docs.scala-lang.org/overviews/core/string-interpolation.html
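A minimal, self-contained sketch of both interpolators (the variable names
here are just for illustration):

object InterpolationExample extends App {
  val timeMillis: Long = 123456L
  val floatValue: Float = 3.14159f

  // s-interpolator: arbitrary expressions are allowed inside ${...}
  println(s"Time: ${timeMillis / 1000}")

  // f-interpolator: a printf-style format specifier follows the variable
  println(f"Float with two decimal places: $floatValue%.2f")
}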
On Thu, Apr 10, 2014 at 5:46 PM, Michael Armbrust mich
The Spark REPL is slightly modified from the normal Scala REPL to prevent
work from being done twice when closures are deserialized on the workers.
I'm not sure exactly why this causes your problem, but it's probably worth
filing a JIRA about it.
Here is another issues with classes defined in the
-1
We found a regression in the way configuration is passed to executors.
https://issues.apache.org/jira/browse/SPARK-1864
https://github.com/apache/spark/pull/808
Michael
On Fri, May 16, 2014 at 3:57 PM, Mark Hamstra m...@clearstorydata.comwrote:
+1
On Fri, May 16, 2014 at 2:16 AM,
Thanks for reporting this!
https://issues.apache.org/jira/browse/SPARK-1964
https://github.com/apache/spark/pull/913
If you could test out that PR and see if it fixes your problems I'd really
appreciate it!
Michael
On Thu, May 29, 2014 at 9:09 AM, Andrew Ash and...@andrewash.com wrote:
I
Yes, you'll need to download the code from that PR and reassemble Spark
(sbt/sbt assembly).
On Thu, May 29, 2014 at 10:02 AM, dataginjaninja
rickett.stepha...@gmail.com wrote:
Michael,
Will I have to rebuild after adding the change? Thanks
You should be able to get away with only doing it locally. This bug is
happening during analysis which only occurs on the driver.
On Thu, May 29, 2014 at 10:17 AM, dataginjaninja
rickett.stepha...@gmail.com wrote:
Darn, I was hoping just to sneak it in that file. I am not the only person
Awesome, thanks for testing!
On Thu, Jun 5, 2014 at 1:30 PM, dataginjaninja rickett.stepha...@gmail.com
wrote:
I can confirm that the patch fixed my issue. :-)
-
Cheers,
Stephanie
I assume you are adding tests, because that is the only time you should
see that message.
That error could mean a couple of things:
1) The query is invalid and hive threw an exception
2) Your Hive setup is bad.
Regarding #2, you need to have the source for Hive 0.12.0 available and
built as
+1
I tested sql/hive functionality.
On Sat, Jul 5, 2014 at 9:30 AM, Mark Hamstra m...@clearstorydata.com
wrote:
+1
On Fri, Jul 4, 2014 at 12:40 PM, Patrick Wendell pwend...@gmail.com
wrote:
I'll start the voting with a +1 - ran tests on the release candidate
and ran some basic
Hey Ian,
Thanks for bringing these up! Responses in-line:
Just wondering if right now spark sql is expected to be thread safe on
master?
doing a simple hadoop file -> RDD -> schema RDD -> write parquet
will fail in reflection code if I run these in a thread pool.
You are probably hitting
Yeah, sadly this dependency was introduced when someone consolidated the
logging infrastructure. However, the dependency should be very small and
thus easy to remove, and I would like catalyst to be usable outside of
Spark. A pull request to make this possible would be welcome.
Ideally, we'd
I just wanted to send out a quick note about a change in the handling of
strings when loading / storing data using parquet and Spark SQL. Before,
Spark SQL did not support binary data in Parquet, so all binary blobs were
implicitly treated as Strings. 9fe693
Thanks for reporting back. I was pretty confused trying to reproduce the
error :)
On Thu, Jul 24, 2014 at 1:09 PM, Stephen Boesch java...@gmail.com wrote:
OK I did find my error. The missing step:
mvn install
I should have republished (mvn install) all of the other modules.
The mvn
That query is looking at Fix Version not Target Version. The fact that
the first one is still open is only because the bug is not resolved in
master. It is fixed in 1.0.2. The second one is partially fixed in 1.0.2,
but is not worth blocking the release for.
On Fri, Jul 25, 2014 at 4:23 PM,
How recent is this? We've already reverted this patch once due to failing
tests. It would be helpful to include a link to the failed build. If it's
failing again we'll have to revert again.
On Sun, Jul 27, 2014 at 5:26 PM, Nan Zhu zhunanmcg...@gmail.com wrote:
Hi, all
It seems that the JDBC
A few things:
- When we upgrade to Hive 0.13.0, Patrick will likely republish the
hive-exec jar just as we did for 0.12.0
- Since we have to tie into some pretty low level APIs it is unsurprising
that the code doesn't just compile out of the box against 0.13.0
- ScalaReflection is for
It seems that the HiveCompatibilitySuite needs a Hadoop and Hive
environment, am I right?
Relative path in absolute URI:
file:${system:test.tmp.dir}/tmp_showcrt1
You should only need Hadoop and Hive if you are creating new tests that we
need to compute the answers for. Existing tests
Could you make a PR as described here:
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
On Fri, Aug 8, 2014 at 1:57 PM, Zhan Zhang zhaz...@gmail.com wrote:
Sorry, forgot to upload files. I have never posted before :) hive.diff
whether it's OK to make a PR now, because the hive-0.13 version
is not compatible with hive-0.12 and here I used org.apache.hive.
On 2014/7/29 8:22, Michael Armbrust wrote:
A few things:
- When we upgrade to Hive 0.13.0, Patrick will likely republish the
hive-exec jar just as we did for 0.12.0
- dev list
+ user list
You should be able to query Spark SQL using JDBC, starting with the 1.1
release. There is some documentation in the repo
https://github.com/apache/spark/blob/master/docs/sql-programming-guide.md#running-the-thrift-jdbc-server,
and we'll update the official docs once the
Any initial proposal or design about the caching to Tachyon that you
can share so far?
Caching parquet files in tachyon with saveAsParquetFile and then reading
them with parquetFile should already work. You can use SQL on these tables
by using registerTempTable.
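As a rough sketch of that flow (the tachyon:// path, table name, and the
existing SchemaRDD `data` are all just placeholders):

// Write the data out as parquet into Tachyon
data.saveAsParquetFile("tachyon://master:19998/warehouse/events.parquet")

// Read it back and register it so SQL can be run against it
val cached = sqlContext.parquetFile("tachyon://master:19998/warehouse/events.parquet")
cached.registerTempTable("events")
sqlContext.sql("SELECT count(*) FROM events").collect()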
Some of the general parquet
It seems like there are two things here:
- Co-locating blocks with the same keys to avoid network transfer.
- Leveraging partitioning information to avoid a shuffle when data is
already partitioned correctly (even if those partitions aren't yet on the
same machine).
The former seems more
+1
On Tue, Sep 2, 2014 at 5:18 PM, Matei Zaharia matei.zaha...@gmail.com
wrote:
+1
Tested on Mac OS X.
Matei
On September 2, 2014 at 5:03:19 PM, Kan Zhang (kzh...@apache.org) wrote:
+1
Verified PySpark InputFormat/OutputFormat examples.
On Tue, Sep 2, 2014 at 4:10 PM, Reynold Xin
+1
On Wed, Sep 3, 2014 at 12:29 AM, Reynold Xin r...@databricks.com wrote:
+1
Tested locally on Mac OS X with local-cluster mode.
On Wed, Sep 3, 2014 at 12:24 AM, Patrick Wendell pwend...@gmail.com
wrote:
I'll kick it off with a +1
On Wed, Sep 3, 2014 at 12:24 AM, Patrick
Feel free to submit a PR to add a log4j.properties file to
sql/catalyst/src/test/resources similar to what we do in core/hive.
On Sat, Sep 6, 2014 at 2:50 PM, Sean Owen so...@cloudera.com wrote:
This is just a line logging that one test succeeded, right? I don't find
that noise. Recently I
On Tue, Sep 9, 2014 at 10:17 AM, Cody Koeninger c...@koeninger.org wrote:
Is there a reason in general not to push projections and predicates down
into the individual ParquetTableScans in a union?
This would be a great case to add to ColumnPruning. Would be awesome if
you could open a JIRA
Thanks!
On Tue, Sep 9, 2014 at 11:07 AM, Cody Koeninger c...@koeninger.org wrote:
Opened
https://issues.apache.org/jira/browse/SPARK-3462
I'll take a look at ColumnPruning and see what I can do
On Tue, Sep 9, 2014 at 12:46 PM, Michael Armbrust mich...@databricks.com
wrote:
On Tue, Sep
kind of surprised this was not run into before. Do people not
segregate their data by day/week in the HDFS directory structure?
On Tue, Sep 9, 2014 at 2:08 PM, Michael Armbrust mich...@databricks.com
wrote:
Thanks!
On Tue, Sep 9, 2014 at 11:07 AM, Cody Koeninger c...@koeninger.org
wrote
/d1 is a directory, not a
parquet partition
sqlContext.parquetFile("/foo")
// works, but has the noted lack of pushdown
sqlContext.parquetFile("/foo/d1").unionAll(sqlContext.parquetFile("/foo/d2"))
Is there another alternative?
On Tue, Sep 9, 2014 at 1:29 PM, Michael Armbrust mich
On Tue, Sep 9, 2014 at 3:02 PM, Michael Armbrust mich...@databricks.com
wrote:
What Patrick said is correct. Two other points:
- In the 1.2 release we are hoping to beef up the support for working
with partitioned parquet independent of the metastore.
- You can actually do operations
chance of adding it to the 1.1.1
point release, assuming there ends up being one?
On Wed, Sep 10, 2014 at 11:39 AM, Michael Armbrust mich...@databricks.com
wrote:
Hey Cody,
Thanks for doing this! Will look at your PR later today.
Michael
On Wed, Sep 10, 2014 at 9:31 AM, Cody Koeninger c
- dev
Is it possible that you are constructing more than one HiveContext in a
single JVM? Due to global state in Hive code this is not allowed.
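One way to stay within that constraint, as a rough sketch (the helper object
is hypothetical, not something in Spark):

import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

// Keep exactly one HiveContext per JVM and hand it out everywhere
object SharedHiveContext {
  @volatile private var instance: HiveContext = _

  def getOrCreate(sc: SparkContext): HiveContext = {
    if (instance == null) synchronized {
      if (instance == null) instance = new HiveContext(sc)
    }
    instance
  }
}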
Michael
On Wed, Sep 17, 2014 at 7:21 PM, Cheng, Hao hao.ch...@intel.com wrote:
Hi, Du
I am not sure what you mean “triggers the HiveContext to
Hi Cody,
There are currently no concrete plans for adding buckets to Spark SQL, but
that's mostly due to lack of resources / demand for this feature. Adding
full support is probably a fair amount of work since you'd have to make
changes throughout parsing/optimization/execution. That said, there
I actually submitted a patch to do this yesterday:
https://github.com/apache/spark/pull/2493
Can you tell us more about your configuration? In particular how much
memory/cores do the executors have and what does the schema of your data
look like?
On Tue, Sep 23, 2014 at 7:39 AM, Cody Koeninger
Views are not supported yet. It's not currently on the near-term roadmap,
but that can change if there is sufficient demand or someone in the
community is interested in implementing them. I do not think it would be
very hard.
Michael
On Sun, Sep 28, 2014 at 11:59 AM, Du Li
The hard part here is updating the existing code base... which is going to
create merge conflicts with like all of the open PRs...
On Wed, Oct 1, 2014 at 6:13 PM, Nicholas Chammas nicholas.cham...@gmail.com
wrote:
Ah, since there appears to be a built-in rule for end-of-line whitespace,
Hi Cody,
Assuming you are talking about 'safe' changes to the schema (i.e. existing
column names are never reused with incompatible types), this is something
I'd love to support. Perhaps you can describe more what sorts of changes
you are making, and if simple merging of the schemas would be
Thanks for the input. We purposefully made sure that the config option did
not make it into a release as it is not something that we are willing to
support long term. That said, we'll try to make this easier in the future
either through hints or better support for statistics.
In this particular
Yes, the foreign sources work is only about exposing a stable set of APIs
for external libraries to link against (to avoid the spark assembly
becoming a dependency mess). The code path these APIs use will be the same
as that for datasources included in the core spark sql library.
Michael
On
Also, in general for SQL-only changes it is sufficient to run sbt/sbt
catalyst/test sql/test hive/test. The hive/test part takes the
longest, so I usually leave that out until just before submitting unless my
changes are hive specific.
On Thu, Oct 9, 2014 at 11:40 AM, Nicholas Chammas
You can't change the parquet schema without re-encoding the data, as you need to
recalculate the footer index data. You can manually do what SPARK-3851
https://issues.apache.org/jira/browse/SPARK-3851 is going to do today
however.
Consider two schemas:
Old Schema: (a: Int, b: String)
New Schema,
dev to bcc.
Thanks for reaching out, Ozgun. Let's discuss if there were any missing
optimizations off list. We'll make sure to report back or add any findings
to the tuning guide.
On Mon, Nov 3, 2014 at 3:01 PM, ozgun oz...@citusdata.com wrote:
Hey Patrick,
It's Ozgun from Citus Data. We'd
+1 (binding)
On Wed, Nov 5, 2014 at 5:33 PM, Matei Zaharia matei.zaha...@gmail.com
wrote:
BTW, my own vote is obviously +1 (binding).
Matei
On Nov 5, 2014, at 5:31 PM, Matei Zaharia matei.zaha...@gmail.com
wrote:
Hi all,
I wanted to share a discussion we've been having on the PMC
However, I haven't seen it be as
high as the 100ms Michael quoted (maybe this was for jobs with tasks that
have much larger objects that take a long time to deserialize?).
I was thinking more about the average end-to-end latency for launching a
query that has 100s of partitions. It's also
Hey Sean,
Thanks for pointing this out. Looks like a bad test where we should be
doing Set comparison instead of Array.
Michael
On Thu, Nov 13, 2014 at 2:05 AM, Sean Owen so...@cloudera.com wrote:
LICENSE and NOTICE are fine. Signature and checksum is fine. I
unzipped and built the plain
I'm going to have to disagree here. If you are building a release
distribution or integrating with legacy systems then maven is probably the
correct choice. However, most of the core developers that I know use sbt,
and I think it's a better choice for exploration and development overall.
That
* I moved from sbt to maven in June specifically due to Andrew Or's
describing mvn as the default build tool. Developers should keep in mind
that Jenkins uses mvn, so we need to run mvn before submitting PRs - even
if sbt were used for day-to-day dev work
To be clear, I think that the PR
, Nov 29, 2014 at 12:57 AM, Michael Armbrust mich...@databricks.com
wrote:
You probably don't need to create a new kind of SchemaRDD. Instead I'd
suggest taking a look at the data sources API that we are adding in Spark
1.2. There is not a ton of documentation, but the test cases show how
In Hive 13 (which is the default for Spark 1.2), parquet is included and
thus we no longer include the Hive parquet bundle. You can now use the
included
ParquetSerDe: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
If you want to compile Spark 1.2 with Hive 12 instead you can pass
Here's a fix: https://github.com/apache/spark/pull/3586
On Wed, Dec 3, 2014 at 11:05 AM, Michael Armbrust mich...@databricks.com
wrote:
Thanks for reporting. As a workaround you should be able to SET
spark.sql.hive.convertMetastoreParquet=false, but I'm going to try to fix
this before
The command ran fine for me on master. Note that Hive does print an
exception in the logs, but that exception does not propagate to user code.
On Thu, Dec 4, 2014 at 11:31 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Hi,
I got an exception saying Hive: NoSuchObjectException(message:table
Thanks for reporting. This looks like a regression related to:
https://github.com/apache/spark/pull/2570
I've filed it here: https://issues.apache.org/jira/browse/SPARK-4769
On Fri, Dec 5, 2014 at 12:03 PM, kb kend...@hotmail.com wrote:
I am having trouble getting create table as select or
This is by hive's design. From the Hive documentation:
The column change command will only modify Hive's metadata, and will not
modify data. Users should make sure the actual data layout of the
table/partition conforms with the metadata definition.
On Sat, Dec 6, 2014 at 8:28 PM, Jianshi
Message-
From: Michael Armbrust [mailto:mich...@databricks.com]
Sent: Saturday, December 6, 2014 4:51 AM
To: kb
Cc: d...@spark.incubator.apache.org; Cheng Hao
Subject: Re: CREATE TABLE AS SELECT does not work with temp tables in 1.2.0
Thanks for reporting. This looks like a regression
is to use a subquery to add a bunch of column
aliases. I'll try it later.
Thanks,
Jianshi
On Tue, Dec 9, 2014 at 3:34 AM, Michael Armbrust mich...@databricks.com
wrote:
This is by hive's design. From the Hive documentation:
The column change command will only modify Hive's metadata
As the Scala doc for applySchema says, "It is important to make sure that
the structure of every [[Row]] of the provided RDD matches the provided
schema. Otherwise, there will be runtime exceptions." We don't check, as
doing runtime reflection on all of the data would be very expensive. You
will
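For completeness, a minimal sketch of applySchema with Rows that line up with
the schema (column names and types are invented, the type imports follow the
later org.apache.spark.sql.types layout, and sc/sqlContext are assumed to
already exist):

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// The schema says (String, Int), so every Row must contain exactly that
val schema = StructType(Seq(
  StructField("name", StringType, nullable = true),
  StructField("age", IntegerType, nullable = true)))

val rows = sc.parallelize(Seq(Row("alice", 30), Row("bob", 25)))

// No checking happens here; a mismatched Row only fails at query time
val people = sqlContext.applySchema(rows, schema)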
The modified version of hive can be found here:
https://github.com/pwendell/hive
On Thu, Dec 11, 2014 at 5:47 PM, Yi Tian tianyi.asiai...@gmail.com wrote:
Hi, all
We found some bugs in hive-0.12, but we could not wait for the hive community
to fix them.
We want to fix these bugs in our lab and
I agree and this is something that we have discussed in the past.
Essentially I think instead of creating a RelationProvider that returns a
single table, we'll have something like an external catalog that can return
multiple base relations.
On Sun, Dec 21, 2014 at 6:43 PM, Venkata ramana
for timestamp type support. For decimal type, I think we only
support decimals that fit in a long.
Thanks,
Daoyuan
-Original Message-
From: Alessandro Baretta [mailto:alexbare...@gmail.com]
Sent: Saturday, December 27, 2014 2:47 PM
To: dev@spark.apache.org; Michael Armbrust
Subject
of strategies are basically embodied in
SparkStrategies.scala...is there a design doc/roadmap/JIRA issue detailing
what strategies exist and which are planned?
Thanks,
Nick
On Jan 22, 2015, at 7:45 PM, Michael Armbrust mich...@databricks.com
wrote:
Here is the initial design document
There was work being done at Berkeley on prototyping support for Succinct
in Spark SQL. Rachit might have more information.
On Thu, Jan 22, 2015 at 7:04 AM, Dean Wampler deanwamp...@gmail.com wrote:
Interesting. I was wondering recently if anyone has explored working with
compressed data
+1 to adding such an optimization to parquet. The bytes are tagged
specially as UTF8 in the parquet schema so it seems like it would be
possible to add this.
On Fri, Jan 16, 2015 at 8:17 AM, Mick Davies michael.belldav...@gmail.com
wrote:
Hi,
It seems that a reasonably large proportion of
Here is the initial design document for catalyst :
https://docs.google.com/document/d/1Hc_Ehtr0G8SQUg69cmViZsMi55_Kf3tISD9GPGU5M1Y/edit
Strategies (many of which are in SparkStrategies.scala) are the part that
creates the physical operators from a catalyst logical plan. These
operators have
I'd suggest marking the HiveContext as @transient since it's not valid to
use it on the slaves anyway.
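For example, a minimal sketch of what that annotation looks like (the class
itself is hypothetical):

import org.apache.spark.sql.hive.HiveContext

// @transient keeps the HiveContext out of serialized closures; it is only
// meant to be used on the driver anyway.
class QueryRunner(@transient val hiveContext: HiveContext) extends Serializable {
  def run(query: String): Long = hiveContext.sql(query).count()
}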
On Mon, Feb 16, 2015 at 4:27 AM, Haopu Wang hw...@qilinsoft.com wrote:
While investigating this issue (at the end of this email), I took a
look at HiveContext's code and found this change
...@databricks.com wrote:
Michael - it is already transient. This should probably be considered a bug
in the Scala compiler, but we can easily work around it by removing the use
of destructuring binding.
On Mon, Feb 16, 2015 at 10:41 AM, Michael Armbrust mich...@databricks.com
wrote:
I'd suggest
P.S.: For some reason replacing import sqlContext.createSchemaRDD with
import sqlContext.implicits._ doesn't do the implicit conversions.
registerTempTable
gives syntax error. I will dig deeper tomorrow. Has anyone seen this ?
We will write up a whole migration guide before the final
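In the meantime, a rough sketch of the 1.3-style pattern (the case class and
table name are just examples):

case class Person(name: String, age: Int)

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._

// In 1.3, RDDs of case classes are converted explicitly with toDF()
val people = sc.parallelize(Seq(Person("alice", 30))).toDF()
people.registerTempTable("people")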
1) is SKEWED BY honored? If so, has anyone run into directories not being
created?
It is not.
2) if it is not honored, does it matter ? Hive introduced this feature to
better handle joins where tables had a skewed distribution on keys joined
on so that the single mapper handling one of
In particular the performance tricks are in SpecificMutableRow.
On Wed, Jan 28, 2015 at 5:49 PM, Evan Chan velvia.git...@gmail.com wrote:
Yeah, it's null. I was worried you couldn't represent it in Row
because of primitive types like Int (unless you box the Int, which
would be a performance
It's not completely transparent, but you can do something like the following
today:
CACHE TABLE hotData AS SELECT columns, I, care, about FROM fullTable
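From a program the same statement can go through the SQL interface; a small
sketch reusing the placeholder names above:

// Cache only the projected columns; later queries read the cached data
sqlContext.sql("CACHE TABLE hotData AS SELECT columns, I, care, about FROM fullTable")
sqlContext.sql("SELECT count(*) FROM hotData").collect()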
On Sun, Feb 1, 2015 at 3:03 AM, Mick Davies michael.belldav...@gmail.com
wrote:
I have been working a lot recently with denormalised tables
FYI: https://issues.apache.org/jira/browse/INFRA-9259
Thanks for reporting. This was a result of a change to our DDL parser that
resulted in types becoming reserved words. I've filed a JIRA and will
investigate if this is something we can fix.
https://issues.apache.org/jira/browse/SPARK-6250
On Tue, Mar 10, 2015 at 1:51 PM, Nitay Joffe
Two other criteria that I use when deciding what to backport:
- Is it a regression from a previous minor release? I'm much more likely
to backport fixes in this case, as I'd love for most people to stay up to
date.
- How scary is the change? I think the primary goal is stability of the
#4 with a preference for CamelCaseEnums
On Wed, Mar 4, 2015 at 5:29 PM, Joseph Bradley jos...@databricks.com
wrote:
another vote for #4
People are already used to adding () in Java.
On Wed, Mar 4, 2015 at 5:14 PM, Stephen Boesch java...@gmail.com wrote:
#4 but with MemoryOnly (more
On Sun, Feb 22, 2015 at 11:20 PM, Mark Hamstra m...@clearstorydata.com
wrote:
So what are we expecting of Hive 0.12.0 builds with this RC? I know not
every combination of Hadoop and Hive versions, etc., can be supported, but
even an example build from the Building Spark page isn't looking too
Already done :)
https://github.com/apache/spark/commit/2e8c6ca47df14681c1110f0736234ce76a3eca9b
On Fri, Apr 24, 2015 at 2:37 PM, Reynold Xin r...@databricks.com wrote:
Can you elaborate what you mean by that? (what's already available in
Python?)
On Fri, Apr 24, 2015 at 2:24 PM, Shuai
Unfortunately, I think the SQLParser is not thread-safe. I would recommend
using HiveQL.
On Thu, Apr 30, 2015 at 4:07 AM, Wangfei (X) wangf...@huawei.com wrote:
Actually this is a SQL parse exception; are you sure your SQL is right?
Sent from my iPhone
On Apr 30, 2015, at 18:50, Haopu Wang
Hey Marcelo,
Thanks for the heads up! I'm currently in the process of refactoring all
of this (to separate the metadata connection from the execution side) and
as part of this I'm making the initialization of the session not lazy. It
would be great to hear if this also works for your internal
I am working on it. Here is the (very rough) version:
https://github.com/apache/spark/compare/apache:master...marmbrus:multiHiveVersions
On Mon, Apr 27, 2015 at 1:03 PM, Punyashloka Biswal punya.bis...@gmail.com
wrote:
Thanks Marcelo and Patrick - I don't know how I missed that ticket in my
I'd happily merge a PR that changes the distinct implementation to be more
like Spark core, assuming it includes benchmarks that show better
performance for both the fits-in-memory case and the too-big-for-memory
case.
On Thu, May 7, 2015 at 2:23 AM, Olivier Girardot
FWIW... My Spark SQL development workflow is usually to run build/sbt
sparkShell or build/sbt 'sql/test-only testSuiteName'. These commands
start in as little as 30s on my laptop, automatically figure out which
subprojects need to be rebuilt, and don't require the expensive assembly
creation.
-1 (binding)
We were just alerted to a pretty serious regression since 1.3.0 (
https://issues.apache.org/jira/browse/SPARK-6851). Should have a fix
shortly.
Michael
On Fri, Apr 10, 2015 at 6:10 AM, Corey Nolet cjno...@gmail.com wrote:
+1 (non-binding)
- Verified signatures
- built on Mac
Overall this seems like a reasonable proposal to me. Here are a few
thoughts:
- There is some debugging utility to the ruleName, so we would probably
want to at least make that an argument to the rule function.
- We also have had rules that operate on SparkPlan, though since there is
only one
Can you file a JIRA please?
On Tue, Jun 23, 2015 at 1:42 AM, StanZhai m...@zhaishidan.cn wrote:
Hi all,
After upgrading the cluster from Spark 1.3.1 to 1.4.0 (rc4), I encountered the
following exception when using concat with a UDF in a where clause:
I'd suggest looking at the avro data source as an example implementation:
https://github.com/databricks/spark-avro
I also gave a talk a while ago: https://www.youtube.com/watch?v=GQSNJAzxOr8
Hi,
You can connect to it by JDBC as described in
1. Custom aggregators that do map-side combine.
This is something I'm hoping to add in Spark 1.5
2. UDFs with more than 22 arguments which is not supported by ScalaUdf,
and to avoid wrapping a Java function interface in one of 22 different
Scala function interfaces depending on the number
Through the DataFrame API, users should never see UTF8String.
Expression (and any class in the catalyst package) is considered internal
and so uses the internal representation of various types. Which type we
use here is not stable across releases.
Is there a reason you aren't defining a UDF
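For reference, a minimal sketch of registering a UDF through the public API
instead (the function, UDF name, and table are made up):

// A simple UDF registered through the stable API; no internal types involved
sqlContext.udf.register("strLen", (s: String) => if (s == null) 0 else s.length)
sqlContext.sql("SELECT strLen(name) FROM people").show()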
It's no longer valid to start more than one instance of HiveContext in a
single JVM, as one of the goals of this refactoring was to allow connection
to more than one metastore from a single context.
For tests I suggest you use TestHive as we do in our unit tests. It has a
reset() method you can
I think this is likely something that we'll want to do during the code
generation phase. Though it's probably not the lowest-hanging fruit at this
point.
On Sun, May 31, 2015 at 5:02 AM, Reynold Xin r...@databricks.com wrote:
I think you are looking for
This was a change that was made to match a wrong answer coming from older
versions of Hive. Unfortunately I think it's too late to fix this in the
1.4 branch (as I'd like to avoid changing answers at all in point
releases), but in Spark 1.5 we revert to the correct behavior.