Re: Tungsten gives unexpected results when selecting null elements in array

2015-12-21 Thread PierreB
For info, this is the generated code: GeneratedExpressionCode( cursor8 = 16; convertedStruct6.pointTo(buffer7, Platform.BYTE_ARRAY_OFFSET, 1, cursor8); /* input[0, ArrayType(StringType,true)][0] */ /* input[0, ArrayType(StringType,true)] */ boolean isNull2

Re: Tungsten gives unexpected results when selecting null elements in array

2015-12-21 Thread PierreB
I believe the problem is that the generated code does not check whether the selected item in the array is null. Naïvely, I think changing this line would solve this:
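The missing check can be illustrated outside of Catalyst codegen. Below is a minimal sketch in plain Python (not the actual generated Java; the function name and the `(value, isNull)` convention are illustrative) of what a safe element extraction should do:

```python
def get_array_element(arr, ordinal):
    """Toy model of a safe generated projection: test both that the
    array itself is null and that the selected element is null, instead
    of blindly reading the element's bytes."""
    if arr is None:
        return None, True           # whole array is null
    element = arr[ordinal]
    if element is None:
        return None, True           # the check the report says is missing
    return element, False

# Selecting a null element should report isNull=True, not garbage:
print(get_array_element(["a", None, "c"], 1))  # (None, True)
```

Without the second `if`, the element's offset would be dereferenced even when the null bit is set, which matches the unexpected results described in the thread.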

Re: [Spark SQL] SQLContext getOrCreate incorrect behaviour

2015-12-21 Thread Zhan Zhang
This looks to me like a very unusual use case. You stop the SparkContext and start another one; I don't think that is well supported. Once the SparkContext is stopped, all of its resources are supposed to be released. Is there any mandatory reason you have to stop and restart another SparkContext?

Re: Expression/LogicalPlan dichotomy in Spark SQL Catalyst

2015-12-21 Thread Michael Armbrust
> > Why was the choice made in Catalyst to make LogicalPlan/QueryPlan and > Expression separate subclasses of TreeNode, instead of e.g. also make > QueryPlan inherit from Expression? > I think this is a pretty common way to model things (glancing at postgres it looks similar). Expression and

pyspark streaming 1.6 mapWithState?

2015-12-21 Thread Renyi Xiong
Hi TD, I noticed mapWithState was available in spark 1.6. Is there any plan to enable it in pyspark as well? thanks, Renyi.
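For readers unfamiliar with the feature being requested: `mapWithState` maintains per-key state across streaming micro-batches. The semantics can be sketched as a toy model in plain Python (this is not the pyspark API, which did not exist at the time; all names here are illustrative):

```python
def map_with_state(batch, state, update_fn):
    """Toy model of mapWithState semantics: for each (key, value) in a
    micro-batch, update_fn sees the value plus the key's previous state
    and returns (output_record, new_state)."""
    outputs = []
    for key, value in batch:
        out, state[key] = update_fn(key, value, state.get(key))
        outputs.append(out)
    return outputs

# Running word count across two micro-batches:
def count(key, value, prev):
    total = (prev or 0) + value
    return (key, total), total

state = {}
map_with_state([("a", 1), ("b", 1)], state, count)
print(map_with_state([("a", 2)], state, count))  # [('a', 3)]
```

The key point is that `state` outlives any single batch, which is what distinguishes `mapWithState` from a stateless `map` over each batch.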

Expression/LogicalPlan dichotomy in Spark SQL Catalyst

2015-12-21 Thread Roland Reumerman
[Note: this question has been moved from the Conversation in [SPARK-4226][SQL]Add subquery (not) in/exists support #9055 to the dev mailing list.] We've added our own In/Exists - plus Subquery in Select - support to a partial fork of Spark SQL Catalyst (which we use in transformations from
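The dichotomy being discussed is that Catalyst makes `Expression` and `LogicalPlan`/`QueryPlan` sibling subclasses of `TreeNode` rather than putting one under the other. A toy sketch of that shape (Python, with the hierarchy heavily simplified; Catalyst's actual classes are Scala and far richer):

```python
class TreeNode:
    """Shared tree machinery: both plans and expressions are trees and
    can be transformed generically."""
    def __init__(self, *children):
        self.children = list(children)

    def transform(self, fn):
        node = fn(self)
        node.children = [c.transform(fn) for c in node.children]
        return node

class Expression(TreeNode):
    """Evaluates to a value within a single row."""

class LogicalPlan(TreeNode):
    """Produces a relation; it *holds* Expressions but is not one."""

# Both sides share TreeNode, but neither is a subtype of the other:
plan = LogicalPlan(Expression())
```

Under this design, a subquery-in-expression feature (as in SPARK-4226) needs an explicit bridge node, because a `LogicalPlan` cannot appear directly where an `Expression` is expected.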

Re: Tungsten gives unexpected results when selecting null elements in array

2015-12-21 Thread Reynold Xin
Thanks for the email. Do you mind creating a JIRA ticket and reply with a link to the ticket? On Mon, Dec 21, 2015 at 1:12 PM, PierreB < pierre.borckm...@realimpactanalytics.com> wrote: > I believe the problem is that the generated code does not check if the > selected item in the array is null.

Re: [VOTE] Release Apache Spark 1.6.0 (RC3)

2015-12-21 Thread Michael Armbrust
It's come to my attention that there have been several bug fixes merged since RC3: - SPARK-12404 - Fix serialization error for Datasets with Timestamps/Arrays/Decimal - SPARK-12218 - Fix incorrect pushdown of filters to parquet - SPARK-12395 - Fix join columns of outer join for DataFrame

Re: [Spark SQL] SQLContext getOrCreate incorrect behaviour

2015-12-21 Thread Jerry Lam
Hi Zhan, I'm illustrating the issue via a simple example. However, it is not difficult to imagine use cases that need this behaviour. For example, you may want to release all of Spark's resources when it has not been used for longer than an hour in a job server or similar long-running web service. Unless you can prevent

Re: A proposal for Spark 2.0

2015-12-21 Thread Reynold Xin
FYI I updated the master branch's Spark version to 2.0.0-SNAPSHOT. On Tue, Nov 10, 2015 at 3:10 PM, Reynold Xin wrote: > I’m starting a new thread since the other one got intermixed with feature > requests. Please refrain from making feature request in this thread. Not >

Re: A proposal for Spark 2.0

2015-12-21 Thread Reynold Xin
I'm not sure if we need special API support for GPUs. You can already use GPUs on individual executor nodes to build your own applications. If we want to leverage GPUs out of the box, I don't think the solution is to provide GPU specific APIs. Rather, we should just switch the underlying execution

Re: A proposal for Spark 2.0

2015-12-21 Thread Allen Zhang
Thanks for your quick response. OK, I will start a new thread with my thoughts. Thanks, Allen On 2015-12-22 15:19:49, "Reynold Xin" wrote: I'm not sure if we need special API support for GPUs. You can already use GPUs on individual executor nodes to build your own

Re: A proposal for Spark 2.0

2015-12-21 Thread Allen Zhang
plus dev. On 2015-12-22 15:15:59, "Allen Zhang" wrote: Hi Reynold, Any new API support for GPU computing in our new 2.0 version? -Allen On 2015-12-22 14:12:50, "Reynold Xin" wrote: FYI I updated the master branch's Spark version to

Re: [Spark SQL] SQLContext getOrCreate incorrect behaviour

2015-12-21 Thread Ted Yu
In Jerry's example, the first SparkContext, sc, has been stopped. So there would be only one SparkContext running at any given moment. Cheers On Mon, Dec 21, 2015 at 8:23 AM, Chester @work wrote: > Jerry > I thought you should not create more than one SparkContext

Re: [Spark SQL] SQLContext getOrCreate incorrect behaviour

2015-12-21 Thread Chester @work
Jerry, I thought you should not create more than one SparkContext within one JVM, ... Chester Sent from my iPhone > On Dec 20, 2015, at 2:59 PM, Jerry Lam wrote: > > Hi Spark developers, > > I found that SQLContext.getOrCreate(sc: SparkContext) does not behave >