Re: [vote] Apache Spark 3.0 RC3

2020-06-07 Thread Yin Huai
Hello everyone, I am wondering if it makes more sense to not count Saturday and Sunday. I doubt that any serious testing work was done during this past weekend. Can we only count business days in the voting process? Thanks, Yin On Sun, Jun 7, 2020 at 3:24 PM Denny Lee wrote: > +1

Re: moving the spark jenkins job builder repo from dbricks --> spark

2018-10-17 Thread Yin Huai
Shane, Thank you for initiating this work! Can we do an audit of Jenkins users and trim down the list? Also, for packaging jobs, those branch snapshot jobs are active (for example, https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/ for publishing

Re: python tests related to pandas are skipped in jenkins

2018-01-31 Thread Yin Huai
I created https://issues.apache.org/jira/browse/SPARK-23292 for this issue. On Wed, Jan 31, 2018 at 8:17 PM, Yin Huai <yh...@databricks.com> wrote: > btw, seems we also have the same skipping logic for pyarrow. But, I have > not looked into if tests related to pyarrow

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-31 Thread Yin Huai
Seems we are not running tests related to pandas in pyspark tests (see my email "python tests related to pandas are skipped in jenkins"). I think we should fix this test issue and make sure all tests are good before cutting RC3. On Wed, Jan 31, 2018 at 10:12 AM, Sameer Agarwal

Re: python tests related to pandas are skipped in jenkins

2018-01-31 Thread Yin Huai
btw, seems we also have the same skipping logic for pyarrow. But, I have not looked into if tests related to pyarrow are skipped or not. On Wed, Jan 31, 2018 at 8:15 PM, Yin Huai <yh...@databricks.com> wrote: > Hello, > > I was running python tests and found that py

python tests related to pandas are skipped in jenkins

2018-01-31 Thread Yin Huai
Hello, I was running python tests and found that pyspark.sql.tests.GroupbyAggPandasUDFTests.test_unsupported_types does not run with Python 2 because the test uses

Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-09-11 Thread Yin Huai
+1 On Mon, Sep 11, 2017 at 5:47 PM, Sameer Agarwal wrote: > +1 (non-binding) > > On Thu, Sep 7, 2017 at 9:10 PM, Bryan Cutler wrote: > >> +1 (non-binding) for the goals and non-goals of this SPIP. I think it's >> fine to work out the minor details of

Re: [VOTE] Apache Spark 2.2.0 (RC6)

2017-07-06 Thread Yin Huai
+1 On Thu, Jul 6, 2017 at 8:40 PM, Hyukjin Kwon wrote: > +1 > > 2017-07-07 6:41 GMT+09:00 Reynold Xin : > >> +1 >> >> >> On Fri, Jun 30, 2017 at 6:44 PM, Michael Armbrust > > wrote: >> >>> Please vote on releasing the following

Re: [ANNOUNCE] Announcing Apache Spark 2.1.0

2016-12-29 Thread Yin Huai
> > Jacek > > On 29 Dec 2016 5:03 p.m., "Yin Huai" <yh...@databricks.com> wrote: > >> Hi all, >> >> Apache Spark 2.1.0 is the second release of the Spark 2.x line. This release >> makes significant strides in the production readiness of Structured

[ANNOUNCE] Announcing Apache Spark 2.1.0

2016-12-29 Thread Yin Huai
Hi all, Apache Spark 2.1.0 is the second release of the Spark 2.x line. This release makes significant strides in the production readiness of Structured Streaming, with added support for event time watermarks

Re: [VOTE] Apache Spark 2.1.0 (RC2)

2016-12-12 Thread Yin Huai
-1 I hit https://issues.apache.org/jira/browse/SPARK-18816, which prevents the executor page from showing the log links if an application does not have executors initially. On Mon, Dec 12, 2016 at 3:02 PM, Marcelo Vanzin wrote: > Actually this is not a simple pom change. The

Re: [VOTE] Release Apache Spark 2.0.2 (RC3)

2016-11-09 Thread Yin Huai
+1 On Wed, Nov 9, 2016 at 1:14 PM, Yin Huai <yh...@databricks.com> wrote: > +! > > On Wed, Nov 9, 2016 at 1:02 PM, Denny Lee <denny.g@gmail.com> wrote: > >> +1 (non binding) >> >> >> >> On Tue, Nov 8, 2016 at 10:14 PM vaquar khan <

Re: [VOTE] Release Apache Spark 2.0.2 (RC3)

2016-11-09 Thread Yin Huai
+! On Wed, Nov 9, 2016 at 1:02 PM, Denny Lee wrote: > +1 (non binding) > > > > On Tue, Nov 8, 2016 at 10:14 PM vaquar khan wrote: > >> *+1 (non binding)* >> >> On Tue, Nov 8, 2016 at 10:21 PM, Weiqing Yang >> wrote: >> >>

Re: [VOTE] Release Apache Spark 2.0.2 (RC2)

2016-11-04 Thread Yin Huai
+1 On Tue, Nov 1, 2016 at 9:51 PM, Reynold Xin wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.0.2. The vote is open until Fri, Nov 4, 2016 at 22:00 PDT and passes if a > majority of at least 3 +1 PMC votes are cast. > > [ ] +1 Release

Re: [VOTE] Release Apache Spark 1.6.3 (RC2)

2016-11-03 Thread Yin Huai
+1 On Thu, Nov 3, 2016 at 12:57 PM, Herman van Hövell tot Westerflier < hvanhov...@databricks.com> wrote: > +1 > > On Thu, Nov 3, 2016 at 6:58 PM, Michael Armbrust > wrote: > >> +1 >> >> On Wed, Nov 2, 2016 at 5:40 PM, Reynold Xin wrote: >> >>>

Re: [VOTE] Release Apache Spark 2.0.1 (RC4)

2016-09-29 Thread Yin Huai
+1 On Thu, Sep 29, 2016 at 4:07 PM, Luciano Resende wrote: > +1 (non-binding) > > On Wed, Sep 28, 2016 at 7:14 PM, Reynold Xin wrote: > >> Please vote on releasing the following candidate as Apache Spark version >> 2.0.1. The vote is open until Sat,

Re: [VOTE] Release Apache Spark 2.0.1 (RC3)

2016-09-25 Thread Yin Huai
+1 On Sun, Sep 25, 2016 at 11:40 AM, Dongjoon Hyun wrote: > +1 (non binding) > > RC3 is compiled and tested on the following two systems, too. All tests > passed. > > * CentOS 7.2 / Oracle JDK 1.8.0_77 / R 3.3.1 >with -Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive

Re: [master] ERROR RetryingHMSHandler: AlreadyExistsException(message:Database default already exists)

2016-08-17 Thread Yin Huai
Yea. Please create a jira. Thanks! On Tue, Aug 16, 2016 at 11:06 PM, Jacek Laskowski <ja...@japila.pl> wrote: > On Tue, Aug 16, 2016 at 10:51 PM, Yin Huai <yh...@databricks.com> wrote: > > > Do you want to try it? > > Yes, indeed! I'd be more than happy. Guid

Re: [master] ERROR RetryingHMSHandler: AlreadyExistsException(message:Database default already exists)

2016-08-16 Thread Yin Huai
Hi Jacek, We will try to create the default database if it does not exist. Hive actually relies on that AlreadyExistsException to determine whether a db already exists, and it ignores the error to implement the logic of "CREATE DATABASE IF NOT EXISTS". So, that message does not mean anything bad happened.
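
A minimal Scala sketch of the pattern described above, with a hypothetical client handle (the real call sites live inside Hive's and Spark SQL's internals):

    try {
      client.createDatabase(defaultDb)
    } catch {
      // Hive implements "CREATE DATABASE IF NOT EXISTS" by treating this
      // exception as "the db is already there", so it is safe to ignore.
      case _: AlreadyExistsException => ()
    }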

Re: [VOTE] Release Apache Spark 2.0.0 (RC1)

2016-06-23 Thread Yin Huai
-1 because of https://issues.apache.org/jira/browse/SPARK-16121. This jira was resolved after 2.0.0-RC1 was cut. Without the fix, Spark SQL effectively only uses the driver to list files when loading datasets and the driver-side file listing is very slow for datasets having many files and

Re: Inconsistent joinWith behavior?

2016-06-20 Thread Yin Huai
Hello Richard, Looks like the Dataset is Dataset[(Int, Int)]. I guess that for the case of "ds.joinWith(other, expr, Outer).map({ case (t, u) => (Option(t), Option(u)) })", we are trying to use null to create a "(Int, Int)" and it somehow ended up as a tuple2 holding default values. Can you create a

Re: [vote] Apache Spark 2.0.0-preview release (rc1)

2016-05-19 Thread Yin Huai
+1 On Wed, May 18, 2016 at 10:49 AM, Reynold Xin wrote: > Hi Ovidiu-Cristian, > > The best source of truth is to change the filter with target version to > 2.1.0. Not a lot of tickets have been targeted yet, but I'd imagine as we > get closer to the 2.0 release, more will be

Re: HiveContext.refreshTable() missing in spark 2.0

2016-05-17 Thread Yin Huai
Hi Yang, I think it was deleted accidentally while we were working on the API migration. We will add it back ( https://issues.apache.org/jira/browse/SPARK-15367). Thanks, Yin On Fri, May 13, 2016 at 2:47 AM, 汪洋 wrote: > Hi all, > > I notice that HiveContext used to have

Re: [VOTE] Release Apache Spark 1.6.1 (RC1)

2016-03-08 Thread Yin Huai
+1 On Mon, Mar 7, 2016 at 12:39 PM, Reynold Xin wrote: > +1 (binding) > > > On Sun, Mar 6, 2016 at 12:08 PM, Egor Pahomov > wrote: > >> +1 >> >> Spark ODBC server is fine, SQL is fine. >> >> 2016-03-03 12:09 GMT-08:00 Yin Yang :

Re: spark hivethriftserver problem on 1.5.0 -> 1.6.0 upgrade

2016-01-26 Thread Yin Huai
Can you post more logs, especially lines around "Initializing execution hive ..." (this is for an internally used fake metastore, which is Derby) and "Initializing HiveMetastoreConnection version ..." (this is for the real metastore; it should be your remote one)? Also, those temp tables are stored

Re: [VOTE] Release Apache Spark 1.6.0 (RC4)

2015-12-22 Thread Yin Huai
+1 On Tue, Dec 22, 2015 at 8:10 PM, Denny Lee wrote: > +1 > > On Tue, Dec 22, 2015 at 7:05 PM Aaron Davidson wrote: > >> +1 >> >> On Tue, Dec 22, 2015 at 7:01 PM, Josh Rosen >> wrote: >> >>> +1 >>> >>> On Tue, Dec 22, 2015

Re: [Spark SQL] SQLContext getOrCreate incorrect behaviour

2015-12-20 Thread Yin Huai
Hi Jerry, Looks like https://issues.apache.org/jira/browse/SPARK-11739 is for the issue you described. It has been fixed in 1.6. With this change, when you call SQLContext.getOrCreate(sc2), we will first check if sc has been stopped. If so, we will create a new SQLContext using sc2. Thanks, Yin
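
A short sketch of the fixed behavior, assuming the Spark 1.6 API and that sc2 is created only after sc is stopped:

    import org.apache.spark.sql.SQLContext

    val sqlContext1 = SQLContext.getOrCreate(sc)  // cached instance, bound to sc
    sc.stop()
    // After SPARK-11739, getOrCreate sees that the cached context's
    // SparkContext is stopped and builds a fresh SQLContext on sc2.
    val sqlContext2 = SQLContext.getOrCreate(sc2)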

Re: Spark 1.6 - Hive remote metastore not working

2015-12-16 Thread Yin Huai
Oh, I see. In your log, I guess you can find a line like "Initializing execution hive, version". The line you showed is actually associated with the execution hive, which is a fake metastore used by Spark SQL internally. Logs related to the real metastore (the metastore storing table metadata and

Re: Spark 1.6 - Hive remote metastore not working

2015-12-16 Thread Yin Huai
I see

    15/12/16 00:06:13 INFO metastore: Trying to connect to metastore with URI thrift://remoteNode:9083
    15/12/16 00:06:14 INFO metastore: Connected to metastore.

Looks like you were connected to your remote metastore. On Tue, Dec 15, 2015 at 3:31 PM, syepes wrote: > Hello,

Re: [VOTE] Release Apache Spark 1.6.0 (RC3)

2015-12-16 Thread Yin Huai
+1 On Wed, Dec 16, 2015 at 7:19 PM, Patrick Wendell wrote: > +1 > > On Wed, Dec 16, 2015 at 6:15 PM, Ted Yu wrote: > >> Ran test suite (minus docker-integration-tests) >> All passed >> >> +1 >> >> [INFO] Spark Project External ZeroMQ

Re: [build system] brief downtime right now

2015-12-14 Thread Yin Huai
Hi Shane, Seems Spark's lint-r started to fail from https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/Spark-Master-SBT/4260/AMPLAB_JENKINS_BUILD_PROFILE=hadoop1.0,label=spark-test/console. Is it related to the upgrade work of R? Thanks, Yin On Mon, Dec 14, 2015 at

Re: [VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-12 Thread Yin Huai
+1 Critical and blocker issues of SQL have been addressed. On Sat, Dec 12, 2015 at 9:39 AM, Michael Armbrust wrote: > I'll kick off the voting with a +1. > > On Sat, Dec 12, 2015 at 9:39 AM, Michael Armbrust > wrote: > >> Please vote on

Re: [VOTE] Release Apache Spark 1.6.0 (RC1)

2015-12-06 Thread Yin Huai
-1. Two blocker bugs have been found after this RC. https://issues.apache.org/jira/browse/SPARK-12089 can cause data corruption when an external sorter spills data. https://issues.apache.org/jira/browse/SPARK-12155 can prevent tasks from acquiring memory even when the executor indeed can allocate

Re: IntelliJ license for committers?

2015-12-02 Thread Yin Huai
I think they can renew your license. In https://www.jetbrains.com/buy/opensource/?product=idea, you can find "Update Open Source License". On Wed, Dec 2, 2015 at 7:47 AM, Sean Owen wrote: > I'm aware that IntelliJ has (at least in the past) made licenses > available to

Re: Seems jenkins is down (or very slow)?

2015-11-13 Thread Yin Huai
It was generally slow, but after 5 or 10 minutes it was all good. On Fri, Nov 13, 2015 at 9:16 AM, shane knapp <skn...@berkeley.edu> wrote: > were you hitting any particular URL when you noticed this, or was it > generally slow? > > On Thu, Nov 12, 2015 at 6:21 PM, Yin Huai <

Re: Seems jenkins is down (or very slow)?

2015-11-12 Thread Yin Huai
Seems it is back. On Thu, Nov 12, 2015 at 6:21 PM, Yin Huai <yh...@databricks.com> wrote: > Hi guys, > > Seems Jenkins is down or very slow. Does anyone else experience this, or is it > just me? > > Thanks, > > Yin >

Seems jenkins is down (or very slow)?

2015-11-12 Thread Yin Huai
Hi guys, Seems Jenkins is down or very slow. Does anyone else experience this, or is it just me? Thanks, Yin

Re: Dataframe nested schema inference from Json without type conflicts

2015-10-05 Thread Yin Huai
> Exactly, that's a much better way to put it. > > Thanks, > > Ewan >

Re: Dataframe nested schema inference from Json without type conflicts

2015-10-01 Thread Yin Huai
Hi Ewan, For your use case, you only need the schema inference to pick up the structure of your data (basically, you want Spark SQL to infer the types of complex values like arrays and structs but keep the type of primitive values as strings), right? Thanks, Yin On Thu, Oct 1, 2015 at 2:27 PM,
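
As a hypothetical illustration of that mode: records such as

    {"user": {"id": 42, "tags": ["a", "b"]}}
    {"user": {"id": "n/a", "tags": ["c"]}}

would be inferred as struct<user: struct<id: string, tags: array<string>>>, keeping the nested structure while typing every primitive leaf as a string, so conflicting primitives like 42 vs "n/a" no longer clash.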

Re: [VOTE] Release Apache Spark 1.5.1 (RC1)

2015-09-27 Thread Yin Huai
+1 Tested 1.5.1 SQL blockers. On Sat, Sep 26, 2015 at 1:36 PM, robineast wrote: > +1 > > > build/mvn clean package -DskipTests -Pyarn -Phadoop-2.6 > OK > Basic graph tests > Load graph using edgeListFile...SUCCESS > Run PageRank...SUCCESS > Minimum Spanning Tree

Re: Spark SQL DataFrame 1.5.0 is extremely slow for take(1) or head() or first()

2015-09-21 Thread Yin Huai
/browse/SPARK-10731>). >> >> Best Regards, >> >> Jerry >> >> >> On Mon, Sep 21, 2015 at 1:01 PM, Yin Huai <yh...@databricks.com> wrote: >> >>> btw, does 1.4 has the same problem? >>> >>> On Mon, Sep 21, 2015 at 10:01

Re: Spark SQL DataFrame 1.5.0 is extremely slow for take(1) or head() or first()

2015-09-21 Thread Yin Huai
Seems 1.4 has the same issue. On Mon, Sep 21, 2015 at 10:01 AM, Yin Huai <yh...@databricks.com> wrote: > btw, does 1.4 has the same problem? > > On Mon, Sep 21, 2015 at 10:01 AM, Yin Huai <yh...@databricks.com> wrote: > >> Hi Jerry, >> >> Looks like it

Re: Spark SQL DataFrame 1.5.0 is extremely slow for take(1) or head() or first()

2015-09-21 Thread Yin Huai
btw, does 1.4 has the same problem? On Mon, Sep 21, 2015 at 10:01 AM, Yin Huai <yh...@databricks.com> wrote: > Hi Jerry, > > Looks like it is a Python-specific issue. Can you create a JIRA? > > Thanks, > > Yin > > On Mon, Sep 21, 2015 at 8:56 AM, Jerry Lam <

Re: Spark SQL DataFrame 1.5.0 is extremely slow for take(1) or head() or first()

2015-09-21 Thread Yin Huai
Hi Jerry, Looks like it is a Python-specific issue. Can you create a JIRA? Thanks, Yin On Mon, Sep 21, 2015 at 8:56 AM, Jerry Lam wrote: > Hi Spark Developers, > > I just ran some very simple operations on a dataset. I was surprised by the > execution plan of take(1),

Re: HyperLogLogUDT

2015-09-13 Thread Yin Huai
> — > Sent from Mailbox <https://www.dropbox.com/mailbox> > > On Sun, Sep 13, 2015 at 12:09 AM, Yin Huai <yh...@databricks.com> wrote: >> Hi Nick, >> >> The buffer exposed to the UDAF interface is just a view of the underlying buffer >> (this under

Re: HyperLogLogUDT

2015-09-12 Thread Yin Huai
Hi Nick, The buffer exposed to the UDAF interface is just a view of the underlying buffer (this underlying buffer is shared by different aggregate functions, and every function takes one or multiple slots). If you need a UDAF, extending UserDefinedAggregationFunction is the preferred approach.
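
A minimal sketch against the Spark 1.5 API (where the class is org.apache.spark.sql.expressions.UserDefinedAggregateFunction); the toy long-sum below is illustrative, not code from this thread:

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
    import org.apache.spark.sql.types._

    // Each UDAF sees only its own declared buffer slots, never the shared
    // underlying buffer mentioned above.
    object LongSum extends UserDefinedAggregateFunction {
      def inputSchema: StructType = StructType(StructField("value", LongType) :: Nil)
      def bufferSchema: StructType = StructType(StructField("sum", LongType) :: Nil)
      def dataType: DataType = LongType
      def deterministic: Boolean = true
      def initialize(buffer: MutableAggregationBuffer): Unit = buffer(0) = 0L
      def update(buffer: MutableAggregationBuffer, input: Row): Unit =
        if (!input.isNullAt(0)) buffer(0) = buffer.getLong(0) + input.getLong(0)
      def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit =
        buffer1(0) = buffer1.getLong(0) + buffer2.getLong(0)
      def evaluate(buffer: Row): Any = buffer.getLong(0)
    }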

Re: [SparkSQL]Could not alter table in Spark 1.5 use HiveContext

2015-09-10 Thread Yin Huai
Yes, Spark 1.5 uses Hive 1.2's metastore client by default. You can change it by putting the following settings in your Spark conf: spark.sql.hive.metastore.version = 0.13.1 and spark.sql.hive.metastore.jars = maven (or the path of your Hive 0.13 jars and Hadoop jars). For spark.sql.hive.metastore.jars,
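
The same settings, expressed as a SparkConf sketch ("maven" tells Spark to download the Hive 0.13.1 jars; the alternative is a classpath string pointing at your own Hive 0.13 and Hadoop jars):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.sql.hive.metastore.version", "0.13.1")
      // or a colon-separated path to your Hive 0.13 jars and Hadoop jars
      .set("spark.sql.hive.metastore.jars", "maven")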

Re: [VOTE] Release Apache Spark 1.5.0 (RC3)

2015-09-04 Thread Yin Huai
Hi Krishna, Can you share your code to reproduce the memory allocation issue? Thanks, Yin On Fri, Sep 4, 2015 at 8:00 AM, Krishna Sankar wrote: > Thanks Tom. Interestingly it happened between RC2 and RC3. > Now my vote is +1/2 unless the memory error is known and has a

Re: [VOTE] Release Apache Spark 1.5.0 (RC2)

2015-08-28 Thread Yin Huai
-1 Found a problem with reading partitioned tables. Right now, we may create a SQL project/filter operator for every partition. When we have thousands of partitions, there will be a huge number of SQLMetrics (accumulators), which causes high memory pressure on the driver and then takes down the

Re: SQLContext.read.json(path) throws java.io.IOException

2015-08-26 Thread Yin Huai
The JSON support in Spark SQL handles a file with one JSON object per line or one JSON array of objects per line. What is the format of your file? Does it only contain a single line? On Wed, Aug 26, 2015 at 6:47 AM, gsvic victora...@gmail.com wrote: Hi, I have the following issue. I am trying to
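
For illustration, a file in the supported one-object-per-line layout looks like this (hypothetical records):

    {"id": 1, "name": "alice"}
    {"id": 2, "name": "bob"}

A single pretty-printed, multi-line JSON document would not parse this way, since each input line is treated as one record.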

Re: [VOTE] Release Apache Spark 1.4.1

2015-06-29 Thread Yin Huai
+1. I tested those SQL blocker bugs on my laptop and they have been fixed. On Mon, Jun 29, 2015 at 6:51 AM, Sean Owen so...@cloudera.com wrote: +1 sigs, license, etc check out. All tests pass for me in the Hadoop 2.6 + Hive configuration on Ubuntu. (I still get those pesky cosmetic UDF test

Re: Hive 0.12 support in 1.4.0 ?

2015-06-22 Thread Yin Huai
Hi Tom, In Spark 1.4, we have decoupled the support for Hive's metastore from the other parts (parser, Hive UDFs, and Hive SerDes). The execution engine of Spark SQL in 1.4 will always use Hive 0.13.1. For the metastore connection part, you can connect to either Hive 0.12 or 0.13.1's metastore. We

Re: Spark-sql(yarn-client) java.lang.NoClassDefFoundError: org/apache/spark/deploy/yarn/ExecutorLauncher

2015-06-18 Thread Yin Huai
Is it the full stack trace? On Thu, Jun 18, 2015 at 6:39 AM, Sea 261810...@qq.com wrote: Hi, all: I want to run spark sql on yarn(yarn-client), but ... I already set spark.yarn.jar and spark.jars in conf/spark-defaults.conf. ./bin/spark-sql -f game.sql --executor-memory 2g --num-executors

Re: [VOTE] Release Apache Spark 1.4.0 (RC4)

2015-06-05 Thread Yin Huai
Sean, Can you add -Phive -Phive-thriftserver and try those Hive tests? Thanks, Yin On Fri, Jun 5, 2015 at 5:19 AM, Sean Owen so...@cloudera.com wrote: Everything checks out again, and the tests pass for me on Ubuntu + Java 7 with '-Pyarn -Phadoop-2.6', except that I always get

Re: [VOTE] Release Apache Spark 1.4.0 (RC3)

2015-06-01 Thread Yin Huai
Hi Peter, Based on your error message, it seems you were not using RC3. For the error thrown at HiveContext's line 206, we have changed the message to this one https://github.com/apache/spark/blob/v1.4.0-rc3/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala#L205-207 just before

Re: ClosureCleaner slowing down Spark SQL queries

2015-05-29 Thread Yin Huai
For Spark SQL internal operations, probably we can just create MapPartitionsRDD directly (like https://github.com/apache/spark/commit/5287eec5a6948c0c6e0baaebf35f512324c0679a ). On Fri, May 29, 2015 at 11:04 AM, Josh Rosen rosenvi...@gmail.com wrote: Hey, want to file a JIRA for this? This

Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-28 Thread Yin Huai
Justin, If you are creating multiple HiveContexts in tests, you need to assign a temporary metastore location for every HiveContext (like what we do at here https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala#L527-L543). Otherwise, they
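
A hedged sketch of the idea, assuming a 1.4-era HiveContext (the properties the linked HiveContext code sets are the authoritative version; the path handling here is hypothetical):

    import java.nio.file.Files

    // Give each test HiveContext its own embedded Derby metastore so that
    // concurrent contexts do not collide on the default ./metastore_db.
    val tempDir = Files.createTempDirectory("test-metastore").toFile
    hiveContext.setConf("javax.jdo.option.ConnectionURL",
      s"jdbc:derby:;databaseName=${tempDir.getCanonicalPath}/metastore_db;create=true")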

[Spark SQL] Generating new golden answer files for HiveComparisonTest

2015-04-25 Thread Yin Huai
Spark SQL developers, If you are trying to add new tests based on HiveComparisonTest and want to generate golden answer files with Hive 0.13.1, unfortunately, the setup work is quite different from that for Hive 0.12. We have updated the SQL readme to include the new instructions for Hive 0.13.1. You

Re: dataframe can not find fields after loading from hive

2015-04-19 Thread Yin Huai
Hi Cesar, Can you try 1.3.1 ( https://spark.apache.org/releases/spark-release-1-3-1.html) and see if it still shows the error? Thanks, Yin On Fri, Apr 17, 2015 at 1:58 PM, Reynold Xin r...@databricks.com wrote: This is strange. cc the dev list since it might be a bug. On Thu, Apr 16,

Re: Spark SQL ExternalSorter not stopped

2015-03-20 Thread Yin Huai
Hi Michael, Thanks for reporting it. Yes, it is a bug. I have created https://issues.apache.org/jira/browse/SPARK-6437 to track it. Thanks, Yin On Thu, Mar 19, 2015 at 10:51 AM, Michael Allman mich...@videoamp.com wrote: I've examined the experimental support for ExternalSorter in Spark SQL,

Re: Spark 1.3 SQL Type Parser Changes?

2015-03-10 Thread Yin Huai
Hi Nitay, Can you try using backticks to quote the column name? Like org.apache.spark.sql.hive.HiveMetastoreTypes.toDataType("struct<`int`:bigint>")? Thanks, Yin On Tue, Mar 10, 2015 at 2:43 PM, Michael Armbrust mich...@databricks.com wrote: Thanks for reporting. This was a result of a change

Re: org.apache.spark.sql.sources.DDLException: Unsupported dataType: [1.1] failure: ``varchar'' expected but identifier char found in spark-sql

2015-02-17 Thread Yin Huai
Hi Qiuzhuang, Right now, char is not supported in DDL. Can you try varchar or string? Thanks, Yin On Mon, Feb 16, 2015 at 10:39 PM, Qiuzhuang Lian qiuzhuang.l...@gmail.com wrote: Hi, I am not sure this has been reported already or not, I run into this error under spark-sql shell as build

Re: Join implementation in SparkSQL

2015-01-16 Thread Yin Huai
Hi Alex, Can you attach the output of sql("explain extended <your query>").collect.foreach(println)? Thanks, Yin On Fri, Jan 16, 2015 at 1:54 PM, Alessandro Baretta alexbare...@gmail.com wrote: Reynold, The source file you are directing me to is a little too terse for me to understand what

Re: scala.MatchError on SparkSQL when creating ArrayType of StructType

2014-12-08 Thread Yin Huai
Seems you hit https://issues.apache.org/jira/browse/SPARK-4245. It was fixed in 1.2. Thanks, Yin On Wed, Dec 3, 2014 at 11:50 AM, invkrh inv...@gmail.com wrote: Hi, I am using SparkSQL on 1.1.0 branch. The following code leads to a scala.MatchError at

Get attempt number in a closure

2014-10-20 Thread Yin Huai
Hello, Is there any way to get the attempt number in a closure? Seems TaskContext.attemptId actually returns the taskId of a task (see this https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L181 and this

Re: Get attempt number in a closure

2014-10-20 Thread Yin Huai
think part of the problem is that we don't actually have the attempt id on the executors. If we do, that's great. If not, we'd need to propagate that over. On Mon, Oct 20, 2014 at 7:17 AM, Yin Huai huaiyin@gmail.com wrote: Hello, Is there any way to get the attempt number in a closure

Re: Get attempt number in a closure

2014-10-20 Thread Yin Huai
Reynold? -Kay On Mon, Oct 20, 2014 at 1:29 PM, Patrick Wendell pwend...@gmail.com wrote: There is a deeper issue here which is AFAIK we don't even store a notion of attempt inside of Spark, we just use a new taskId with the same index. On Mon, Oct 20, 2014 at 12:38 PM, Yin Huai huaiyin

Re: Spark SQL Query and join different data sources.

2014-09-02 Thread Yin Huai
Actually, with HiveContext, you can join Hive tables with registered temporary tables. On Fri, Aug 22, 2014 at 9:07 PM, chutium teng@gmail.com wrote: oops, thanks Yan, you are right, i got: scala> sqlContext.sql("select * from a join b").take(10) java.lang.RuntimeException: Table Not Found: