[jira] [Updated] (SPARK-12659) NPE when spill in CartesianProduct

2016-01-06 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu updated SPARK-12659:
---
Affects Version/s: (was: 1.6.0)
   2.0.0

> NPE when spill in CartesianProduct
> --
>
> Key: SPARK-12659
> URL: https://issues.apache.org/jira/browse/SPARK-12659
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Davies Liu
>Assignee: Davies Liu
> Fix For: 2.0.0
>
>
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter$SortComparator.compare(UnsafeInMemorySorter.java:54)
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter$SortComparator.compare(UnsafeInMemorySorter.java:37)
>   at 
> org.apache.spark.util.collection.TimSort.countRunAndMakeAscending(TimSort.java:270)
>   at org.apache.spark.util.collection.TimSort.sort(TimSort.java:142)
>   at org.apache.spark.util.collection.Sorter.sort(Sorter.scala:37)
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.getSortedIterator(UnsafeInMemorySorter.java:231)
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:187)
> {code}






[jira] [Assigned] (SPARK-12542) Support intersect/except in Hive SQL

2016-01-06 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu reassigned SPARK-12542:
--

Assignee: Davies Liu  (was: Xiao Li)

> Support intersect/except in Hive SQL
> 
>
> Key: SPARK-12542
> URL: https://issues.apache.org/jira/browse/SPARK-12542
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Davies Liu
>







[jira] [Updated] (SPARK-12542) Support intersect/except in Hive SQL

2016-01-06 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu updated SPARK-12542:
---
Summary: Support intersect/except in Hive SQL  (was: Support 
union/intersect/except in Hive SQL)

> Support intersect/except in Hive SQL
> 
>
> Key: SPARK-12542
> URL: https://issues.apache.org/jira/browse/SPARK-12542
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Xiao Li
>







[jira] [Commented] (SPARK-12662) Add a local sort operator to DataFrame used by randomSplit

2016-01-06 Thread Davies Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086056#comment-15086056
 ] 

Davies Liu commented on SPARK-12662:


Another way to make the DataFrame deterministic is to materialize it via cache(), as sketched below.
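
For illustration, a minimal PySpark sketch of that approach (assuming the repro's {{test}} table): cache and materialize the DataFrame before randomSplit, so both splits see the same underlying rows.

{code}
df = sqlContext.sql("select distinct ID from test")
df.cache()
df.count()  # force materialization so the cached contents are fixed
a, b = df.randomSplit([0.333, 0.667], 1234)
# with the input materialized, the two splits should no longer overlap
assert a.intersect(b).count() == 0
{code}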

> Add a local sort operator to DataFrame used by randomSplit
> --
>
> Key: SPARK-12662
> URL: https://issues.apache.org/jira/browse/SPARK-12662
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, SQL
>Reporter: Yin Huai
>Assignee: Sameer Agarwal
>
> With {{./bin/spark-shell --master=local-cluster[2,1,2014]}}, the following 
> code will produce overlapping rows in the two DFs returned by randomSplit. 
> {code}
> sqlContext.sql("drop table if exists test")
> val x = sc.parallelize(1 to 210)
> case class R(ID : Int)
> sqlContext.createDataFrame(x.map 
> {R(_)}).write.format("json").saveAsTable("test")
> var df = sql("select distinct ID from test")
> var Array(a, b) = df.randomSplit(Array(0.333, 0.667), 1234L)
> a.registerTempTable("a")
> b.registerTempTable("b")
> val intersectDF = a.intersect(b)
> intersectDF.show
> {code}
> The reason is that {{sql("select distinct ID from test")}} does not guarantee 
> the ordering of rows within a partition. It would be good to add a local sort 
> operator to make row ordering within a partition deterministic.






[jira] [Resolved] (SPARK-12681) Split IdentifiersParser.g to avoid single huge java source

2016-01-06 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-12681.

   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 10624
[https://github.com/apache/spark/pull/10624]

> Split IdentifiersParser.g to avoid single huge java source
> --
>
> Key: SPARK-12681
> URL: https://issues.apache.org/jira/browse/SPARK-12681
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Davies Liu
> Fix For: 2.0.0
>
>
> {code}
> [error] 
> ~spark/sql/hive/target/scala-2.10/src_managed/main/org/apache/spark/sql/parser/SparkSqlParser_IdentifiersParser.java:11056:
>   ERROR: too long
> [error] static final String[] DFA5_transitionS = {
> [error]   ^
> {code}






[jira] [Created] (SPARK-12681) Split IdentifiersParser.g to avoid single huge java source

2016-01-06 Thread Davies Liu (JIRA)
Davies Liu created SPARK-12681:
--

 Summary: Split IdentifiersParser.g to avoid single huge java source
 Key: SPARK-12681
 URL: https://issues.apache.org/jira/browse/SPARK-12681
 Project: Spark
  Issue Type: Sub-task
Reporter: Davies Liu



{code}
[error] 
~spark/sql/hive/target/scala-2.10/src_managed/main/org/apache/spark/sql/parser/SparkSqlParser_IdentifiersParser.java:11056:
  ERROR: too long
[error] static final String[] DFA5_transitionS = {
[error]   ^
{code}






[jira] [Assigned] (SPARK-12681) Split IdentifiersParser.g to avoid single huge java source

2016-01-06 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu reassigned SPARK-12681:
--

Assignee: Davies Liu

> Split IdentifiersParser.g to avoid single huge java source
> --
>
> Key: SPARK-12681
> URL: https://issues.apache.org/jira/browse/SPARK-12681
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Davies Liu
>
> {code}
> [error] 
> ~spark/sql/hive/target/scala-2.10/src_managed/main/org/apache/spark/sql/parser/SparkSqlParser_IdentifiersParser.java:11056:
>   ERROR: too long
> [error] static final String[] DFA5_transitionS = {
> [error]   ^
> {code}






[jira] [Created] (SPARK-12659) NPE when spill in CartesianProduct

2016-01-05 Thread Davies Liu (JIRA)
Davies Liu created SPARK-12659:
--

 Summary: NPE when spill in CartesianProduct
 Key: SPARK-12659
 URL: https://issues.apache.org/jira/browse/SPARK-12659
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.6.0
Reporter: Davies Liu
Assignee: Davies Liu


{code}
java.lang.NullPointerException
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter$SortComparator.compare(UnsafeInMemorySorter.java:54)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter$SortComparator.compare(UnsafeInMemorySorter.java:37)
at 
org.apache.spark.util.collection.TimSort.countRunAndMakeAscending(TimSort.java:270)
at org.apache.spark.util.collection.TimSort.sort(TimSort.java:142)
at org.apache.spark.util.collection.Sorter.sort(Sorter.scala:37)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.getSortedIterator(UnsafeInMemorySorter.java:231)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:187)
{code}






[jira] [Resolved] (SPARK-12511) streaming driver with checkpointing unable to finalize leading to OOM

2016-01-05 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-12511.

   Resolution: Fixed
Fix Version/s: 1.6.1
   2.0.0

Issue resolved by pull request 10514
[https://github.com/apache/spark/pull/10514]

> streaming driver with checkpointing unable to finalize leading to OOM
> -
>
> Key: SPARK-12511
> URL: https://issues.apache.org/jira/browse/SPARK-12511
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Streaming
>Affects Versions: 1.5.2, 1.6.0
> Environment: pyspark 1.5.2
> yarn 2.6.0
> python 2.6
> centos 6.5
> openjdk 1.8.0
>Reporter: Antony Mayi
>Assignee: Shixiong Zhu
>Priority: Critical
> Fix For: 2.0.0, 1.6.1
>
> Attachments: bug.py, finalizer-classes.png, finalizer-pending.png, 
> finalizer-spark_assembly.png
>
>
> A Spark streaming application configured with checkpointing fills the 
> driver's heap with ZipFileInputStream instances, as a result of 
> spark-assembly.jar (and potentially others, for example snappy-java.jar) 
> getting repeatedly referenced (loaded?). The Java Finalizer can't finalize 
> these ZipFileInputStream instances, and they eventually take all the heap, 
> leading the driver to an OOM crash.
> h2. Steps to reproduce:
> * Submit attached [^bug.py] to spark
> * Leave it running and monitor the driver java process heap
> ** with heap dump you will primarily see growing instances of byte array data 
> (here cumulated zip payload of the jar refs):
> {noformat}
>  num     #instances         #bytes  class name
> ----------------------------------------------
>    1:          32653       32735296  [B
>    2:          48000        5135816  [C
>    3:             41        1344144  [Lscala.concurrent.forkjoin.ForkJoinTask;
>    4:          11362        1261816  java.lang.Class
>    5:          47054        1129296  java.lang.String
>    6:          25460        1018400  java.lang.ref.Finalizer
>    7:           9802         789400  [Ljava.lang.Object;
> {noformat}
> ** with visualvm you can see:
> *** increasing number of objects pending for finalization
> !finalizer-pending.png!
> *** increasing number of ZipFileInputStream instances related to the 
> spark-assembly.jar, referenced by Finalizer
> !finalizer-spark_assembly.png!
> * Depending on the heap size and running time, this will lead to a driver OOM 
> crash
> h2. Comments
> * The attached [^bug.py] is a lightweight proof of the problem. In production 
> the effect is quite rapid: in a few hours it eats gigabytes of heap and kills 
> the app.
> * If the same [^bug.py] is run without checkpointing, there is no issue 
> whatsoever.
> * It is not clear whether this is PySpark-specific.
> * In [^bug.py] I am using the socketTextStream input, but the problem seems to 
> be independent of the input type (in production I see the same problem with 
> the Kafka direct stream, and have seen it even with textFileStream).
> * It happens even if the input stream doesn't produce any data.






[jira] [Resolved] (SPARK-12617) socket descriptor leak killing streaming app

2016-01-05 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-12617.

   Resolution: Fixed
Fix Version/s: 1.6.1
   1.5.3
   2.0.0

Issue resolved by pull request 10579
[https://github.com/apache/spark/pull/10579]

> socket descriptor leak killing streaming app
> 
>
> Key: SPARK-12617
> URL: https://issues.apache.org/jira/browse/SPARK-12617
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Streaming
>Affects Versions: 1.5.2
> Environment: pyspark (python 2.6)
>Reporter: Antony Mayi
>Assignee: Shixiong Zhu
>Priority: Critical
> Fix For: 2.0.0, 1.5.3, 1.6.1
>
> Attachments: bug.py
>
>
> There is a socket descriptor leak in a PySpark streaming app when configured 
> with a batch interval of more than 30 seconds. This is due to the default 
> timeout in the py4j JavaGateway, which (half-)closes the CallbackConnection 
> after 30 seconds of inactivity and creates a new one next time. Those 
> connections don't get closed on the Python CallbackServer side and keep 
> piling up until they eventually block new connections.
> h2. Steps to reproduce:
> * Submit attached [^bug.py] to spark
> * Watch {{/tmp/bug.log}} to see the increasing total number of py4j callback 
> connections, of which none are ever closed
> {code}
> [BUG] py4j callback server port: 51282
> [BUG] py4j CB 0/0 closed
> ...
> [BUG] py4j CB 0/123 closed
> {code}
> * You can confirm this by running lsof on the PySpark driver process:
> {code}
> $ sudo lsof -p 39770 | grep CLOSE_WAIT | grep :51282
> python2.6 39770  das   94u  IPv4 138824906  0t0   TCP 
> localhost.localdomain:51282->localhost.localdomain:60419 (CLOSE_WAIT)
> python2.6 39770  das   95u  IPv4 138867747  0t0   TCP 
> localhost.localdomain:51282->localhost.localdomain:60745 (CLOSE_WAIT)
> python2.6 39770  das   96u  IPv4 138831829  0t0   TCP 
> localhost.localdomain:51282->localhost.localdomain:32849 (CLOSE_WAIT)
> python2.6 39770  das   97u  IPv4 138890524  0t0   TCP 
> localhost.localdomain:51282->localhost.localdomain:33184 (CLOSE_WAIT)
> python2.6 39770  das   98u  IPv4 138860190  0t0   TCP 
> localhost.localdomain:51282->localhost.localdomain:33512 (CLOSE_WAIT)
> python2.6 39770  das   99u  IPv4 138860439  0t0   TCP 
> localhost.localdomain:51282->localhost.localdomain:33854 (CLOSE_WAIT)
> ...
> {code}
> * If you leave it running long enough, the CallbackServer will eventually 
> become unable to accept new connections from the gateway and the app will 
> crash:
> {code}
> 16/01/02 05:12:07 ERROR scheduler.JobScheduler: Error generating jobs for 
> time 145171140 ms
> py4j.Py4JException: Error while obtaining a new communication channel
> ...
> Caused by: java.net.ConnectException: Connection timed out
> at java.net.PlainSocketImpl.socketConnect(Native Method)
> at 
> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
> at 
> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
> at 
> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
> at java.net.Socket.connect(Socket.java:589)
> at java.net.Socket.connect(Socket.java:538)
> at java.net.Socket.<init>(Socket.java:434)
> at java.net.Socket.<init>(Socket.java:244)
> at py4j.CallbackConnection.start(CallbackConnection.java:104)
> {code}






[jira] [Resolved] (SPARK-12636) Expose API on UnsafeRowRecordReader to just run on files

2016-01-05 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-12636.

   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 10581
[https://github.com/apache/spark/pull/10581]

> Expose API on UnsafeRowRecordReader to just run on files
> 
>
> Key: SPARK-12636
> URL: https://issues.apache.org/jira/browse/SPARK-12636
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Nong Li
>Assignee: Nong Li
> Fix For: 2.0.0
>
>
> This is beneficial from a code testability point of view, to be able to 
> exercise individual components, and it also makes benchmarking easy. It would 
> allow reading data without needing to create all the associated Hadoop input 
> split, etc., components.






[jira] [Created] (SPARK-12661) Drop Python 2.6 support in PySpark

2016-01-05 Thread Davies Liu (JIRA)
Davies Liu created SPARK-12661:
--

 Summary: Drop Python 2.6 support in PySpark
 Key: SPARK-12661
 URL: https://issues.apache.org/jira/browse/SPARK-12661
 Project: Spark
  Issue Type: Task
Reporter: Davies Liu


1. Stop testing with Python 2.6.
2. Remove the code paths for Python 2.6.

See the discussion: 
https://www.mail-archive.com/user@spark.apache.org/msg43423.html






[jira] [Resolved] (SPARK-12470) Incorrect calculation of row size in o.a.s.sql.catalyst.expressions.codegen.GenerateUnsafeRowJoiner

2016-01-04 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-12470.

   Resolution: Fixed
Fix Version/s: 1.6.1
   2.0.0

Issue resolved by pull request 10421
[https://github.com/apache/spark/pull/10421]

> Incorrect calculation of row size in 
> o.a.s.sql.catalyst.expressions.codegen.GenerateUnsafeRowJoiner
> ---
>
> Key: SPARK-12470
> URL: https://issues.apache.org/jira/browse/SPARK-12470
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.2
>Reporter: Pete Robbins
>Priority: Minor
> Fix For: 2.0.0, 1.6.1
>
>
> While looking into https://issues.apache.org/jira/browse/SPARK-12319 I 
> noticed that the row size is incorrectly calculated.
> The "sizeReduction" value is calculated in words:
> // The number of words we can reduce when we concat two rows together.
> // The only reduction comes from merging the bitset portion of the two rows, 
> // saving 1 word.
> val sizeReduction = bitset1Words + bitset2Words - outputBitsetWords
> but then it is subtracted from the size of the row in bytes:
> |out.pointTo(buf, ${schema1.size + schema2.size}, sizeInBytes - $sizeReduction);
>
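> To make the unit mismatch concrete, a small hedged illustration (numbers 
> hypothetical): {{sizeReduction}} counts 8-byte words, so subtracting it 
> directly from a byte count shrinks the row by too little.
> {code}
> # sizeReduction is in 8-byte words, but sizeInBytes is in bytes
> size_reduction_words = 1   # one word saved by merging the bitsets
> size_in_bytes = 64
> wrong = size_in_bytes - size_reduction_words      # 63: only 1 byte removed
> right = size_in_bytes - size_reduction_words * 8  # 56: the full word removed
> {code}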






[jira] [Resolved] (SPARK-12541) Support rollup/cube in SQL query

2016-01-04 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-12541.

   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 10522
[https://github.com/apache/spark/pull/10522]

> Support rollup/cube in SQL query
> 
>
> Key: SPARK-12541
> URL: https://issues.apache.org/jira/browse/SPARK-12541
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Apache Spark
> Fix For: 2.0.0
>
>
> We have a DataFrame API for rollup/cube, but do not support them in the SQL 
> parser (either SQLContext or HiveContext).
> PS: the Hive parser only supports `group by a, b, c WITH cube/rollup`, as 
> sketched below.
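> A hedged sketch (table and column names hypothetical) of the existing 
> DataFrame API next to the Hive-style SQL this issue targets:
> {code}
> # existing DataFrame API (PySpark)
> df.rollup("a", "b").agg({"c": "sum"}).show()
> df.cube("a", "b").agg({"c": "sum"}).show()
> 
> # SQL form to support; the Hive parser only accepts the trailing form
> sqlContext.sql("SELECT a, b, sum(c) FROM t GROUP BY a, b WITH ROLLUP")
> {code}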






[jira] [Commented] (SPARK-12624) When schema is specified, we should treat undeclared fields as null (in Python)

2016-01-04 Thread Davies Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15081932#comment-15081932
 ] 

Davies Liu commented on SPARK-12624:


[~rxin] The RDD or list could be just tuples; how do we know which column is 
missing, the last one(s)?

> When schema is specified, we should treat undeclared fields as null (in 
> Python)
> ---
>
> Key: SPARK-12624
> URL: https://issues.apache.org/jira/browse/SPARK-12624
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Reporter: Reynold Xin
>Priority: Critical
>
> See https://github.com/apache/spark/pull/10564
> Basically, that test case should pass even without the above fix, simply 
> assuming b is null (see the sketch below).
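> A minimal PySpark sketch of the desired behavior under one interpretation 
> (treating *trailing* undeclared fields as null; schema hypothetical):
> {code}
> from pyspark.sql.types import StructType, StructField, IntegerType, StringType
> 
> schema = StructType([StructField("a", IntegerType(), True),
>                      StructField("b", StringType(), True)])
> # the row declares only field "a"; desired result: Row(a=1, b=None)
> df = sqlContext.createDataFrame([(1,)], schema)
> {code}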






[jira] [Resolved] (SPARK-12289) Support UnsafeRow in TakeOrderedAndProject/Limit

2016-01-04 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-12289.

   Resolution: Fixed
 Assignee: Davies Liu  (was: Apache Spark)
Fix Version/s: 2.0.0

> Support UnsafeRow in TakeOrderedAndProject/Limit
> 
>
> Key: SPARK-12289
> URL: https://issues.apache.org/jira/browse/SPARK-12289
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Davies Liu
> Fix For: 2.0.0
>
>







[jira] [Resolved] (SPARK-12292) Support UnsafeRow in Generate

2016-01-04 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-12292.

   Resolution: Fixed
 Assignee: Davies Liu
Fix Version/s: 2.0.0

> Support UnsafeRow in Generate
> -
>
> Key: SPARK-12292
> URL: https://issues.apache.org/jira/browse/SPARK-12292
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Davies Liu
> Fix For: 2.0.0
>
>







[jira] [Resolved] (SPARK-12293) Support UnsafeRow in LocalTableScan

2016-01-04 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-12293.

   Resolution: Fixed
 Assignee: Davies Liu  (was: Liang-Chi Hsieh)
Fix Version/s: 2.0.0

> Support UnsafeRow in LocalTableScan
> ---
>
> Key: SPARK-12293
> URL: https://issues.apache.org/jira/browse/SPARK-12293
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Davies Liu
> Fix For: 2.0.0
>
>







[jira] [Resolved] (SPARK-12585) The numFields of UnsafeRow should not changed by pointTo()

2015-12-30 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-12585.

   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 10528
[https://github.com/apache/spark/pull/10528]

> The numFields of UnsafeRow should not changed by pointTo()
> --
>
> Key: SPARK-12585
> URL: https://issues.apache.org/jira/browse/SPARK-12585
> Project: Spark
>  Issue Type: Improvement
>Reporter: Davies Liu
>Assignee: Apache Spark
> Fix For: 2.0.0
>
>
> Right now, numFields is passed in via pointTo(), and bitSetWidthInBytes is 
> then recalculated, making pointTo() a little heavy.
> It should be part of the constructor of UnsafeRow, as sketched below.
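> A hedged Python analogy of the proposed change (the real class is Java; the 
> width formula follows UnsafeRow's bitset layout):
> {code}
> def bitset_width_in_bytes(num_fields):
>     # one null-tracking bit per field, rounded up to whole 8-byte words
>     return ((num_fields + 63) // 64) * 8
> 
> class UnsafeRow:
>     def __init__(self, num_fields):
>         # invariants computed once, in the constructor
>         self.num_fields = num_fields
>         self.bitset_width = bitset_width_in_bytes(num_fields)
> 
>     def point_to(self, buf, size_in_bytes):
>         # re-pointing stays lightweight: no per-call recomputation
>         self.buf, self.size_in_bytes = buf, size_in_bytes
> {code}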






[jira] [Created] (SPARK-12585) The numFields of UnsafeRow should not changed by pointTo()

2015-12-30 Thread Davies Liu (JIRA)
Davies Liu created SPARK-12585:
--

 Summary: The numFields of UnsafeRow should not changed by pointTo()
 Key: SPARK-12585
 URL: https://issues.apache.org/jira/browse/SPARK-12585
 Project: Spark
  Issue Type: Improvement
Reporter: Davies Liu


Right now, numFields is passed in via pointTo(), and bitSetWidthInBytes is then 
recalculated, making pointTo() a little heavy.






[jira] [Updated] (SPARK-12585) The numFields of UnsafeRow should not changed by pointTo()

2015-12-30 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu updated SPARK-12585:
---
Description: 
Right now, numFields will be passed in by pointTo(), then bitSetWidthInBytes is 
calculated, making pointTo() a little bit heavy.

It should be part of constructor of UnsafeRow.

  was:Right now, numFields will be passed in by pointTo(), then 
bitSetWidthInBytes is calculated, making pointTo() a little bit heavy.


> The numFields of UnsafeRow should not changed by pointTo()
> --
>
> Key: SPARK-12585
> URL: https://issues.apache.org/jira/browse/SPARK-12585
> Project: Spark
>  Issue Type: Improvement
>Reporter: Davies Liu
>
> Right now, numFields is passed in via pointTo(), and bitSetWidthInBytes is 
> then recalculated, making pointTo() a little heavy.
> It should be part of the constructor of UnsafeRow.






[jira] [Assigned] (SPARK-12585) The numFields of UnsafeRow should not changed by pointTo()

2015-12-30 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu reassigned SPARK-12585:
--

Assignee: Davies Liu

> The numFields of UnsafeRow should not changed by pointTo()
> --
>
> Key: SPARK-12585
> URL: https://issues.apache.org/jira/browse/SPARK-12585
> Project: Spark
>  Issue Type: Improvement
>Reporter: Davies Liu
>Assignee: Davies Liu
>
> Right now, numFields is passed in via pointTo(), and bitSetWidthInBytes is 
> then recalculated, making pointTo() a little heavy.
> It should be part of the constructor of UnsafeRow.






[jira] [Updated] (SPARK-12300) Fix schema inference on local collections

2015-12-30 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu updated SPARK-12300:
---
Fix Version/s: (was: 1.6.0)
   1.6.1

> Fix schema inference on local collections
> -
>
> Key: SPARK-12300
> URL: https://issues.apache.org/jira/browse/SPARK-12300
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Reporter: holdenk
>Priority: Minor
> Fix For: 1.6.1, 2.0.0
>
>
> Current schema inference for local Python collections halts as soon as there 
> are no NullTypes left. This is different from when we specify a sampling 
> ratio of 1.0 on a distributed collection, and it can result in incomplete 
> schema information.
> Repro:
> {code}
> input = [{"a": 1}, {"b": "coffee"}]
> df = sqlContext.createDataFrame(input)
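> # expected: a schema containing both fields a and b; with inference halting
> # early, field b can be missed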
> print df.schema
> {code}
> Discovered while looking at SPARK-2870






[jira] [Resolved] (SPARK-12300) Fix schema inference on local collections

2015-12-30 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-12300.

   Resolution: Fixed
Fix Version/s: 1.6.0
   2.0.0

Issue resolved by pull request 10275
[https://github.com/apache/spark/pull/10275]

> Fix schema inference on local collections
> -
>
> Key: SPARK-12300
> URL: https://issues.apache.org/jira/browse/SPARK-12300
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Reporter: holdenk
>Priority: Minor
> Fix For: 2.0.0, 1.6.0
>
>
> Current schema inference for local Python collections halts as soon as there 
> are no NullTypes left. This is different from when we specify a sampling 
> ratio of 1.0 on a distributed collection, and it can result in incomplete 
> schema information.
> Repro:
> {code}
> input = [{"a": 1}, {"b": "coffee"}]
> df = sqlContext.createDataFrame(input)
> print df.schema
> {code}
> Discovered while looking at SPARK-2870






[jira] [Commented] (SPARK-12542) Support union/intersect/except in Hive SQL

2015-12-29 Thread Davies Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15074368#comment-15074368
 ] 

Davies Liu commented on SPARK-12542:


[~viirya] [~xiaol] That's true; I updated the title. Once SPARK-12362 is 
resolved, you can start working on it.

> Support union/intersect/except in Hive SQL
> --
>
> Key: SPARK-12542
> URL: https://issues.apache.org/jira/browse/SPARK-12542
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Xiao Li
>







[jira] [Updated] (SPARK-12542) Support union/intersect/except in Hive SQL

2015-12-29 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu updated SPARK-12542:
---
Assignee: Xiao Li

> Support union/intersect/except in Hive SQL
> --
>
> Key: SPARK-12542
> URL: https://issues.apache.org/jira/browse/SPARK-12542
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Xiao Li
>







[jira] [Updated] (SPARK-12542) Support union/intersect/except in Hive SQL

2015-12-29 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu updated SPARK-12542:
---
Summary: Support union/intersect/except in Hive SQL  (was: Support 
intersect/except in SQL)

> Support union/intersect/except in Hive SQL
> --
>
> Key: SPARK-12542
> URL: https://issues.apache.org/jira/browse/SPARK-12542
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Davies Liu
>
> Also `UNION ALL` in the Hive SQL parser






[jira] [Updated] (SPARK-12542) Support union/intersect/except in Hive SQL

2015-12-29 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu updated SPARK-12542:
---
Description: (was: Also `UNION ALL`in Hive SQL parser)

> Support union/intersect/except in Hive SQL
> --
>
> Key: SPARK-12542
> URL: https://issues.apache.org/jira/browse/SPARK-12542
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Davies Liu
>







[jira] [Commented] (SPARK-12385) Push projection into Join

2015-12-29 Thread Davies Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15074383#comment-15074383
 ] 

Davies Liu commented on SPARK-12385:


[~smilegator] Thanks for working on this. Could you take Aggregation as an 
example? Join could have an optional resultExpression (which need not equal 
left.output ++ right.output), so the expressions of a Projection could be 
pushed into it.

> Push projection into Join
> -
>
> Key: SPARK-12385
> URL: https://issues.apache.org/jira/browse/SPARK-12385
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Davies Liu
>
> We usually have a Join followed by a projection to prune some columns, but 
> Join already has a result projection to produce UnsafeRow; we should combine 
> them.






[jira] [Commented] (SPARK-12544) Support window functions in SQLContext

2015-12-29 Thread Davies Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15074369#comment-15074369
 ] 

Davies Liu commented on SPARK-12544:


[~hvanhovell] Right now, the SQL parser can't parse a clause like `rank() 
over (partition by xxx)`, but the Hive parser can. Once we replace the Spark 
SQL parser with the Hive one, we can close this JIRA.
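
For example, a query of this shape (table and column names hypothetical) 
currently parses only under HiveContext:

{code}
sqlContext.sql("""
    SELECT name, dept,
           rank() OVER (PARTITION BY dept ORDER BY salary DESC) AS rk
    FROM employees
""")
{code}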

> Support window functions in SQLContext
> --
>
> Key: SPARK-12544
> URL: https://issues.apache.org/jira/browse/SPARK-12544
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Davies Liu
>







[jira] [Assigned] (SPARK-12541) Support rollup/cube in SQL query

2015-12-29 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu reassigned SPARK-12541:
--

Assignee: Davies Liu

> Support rollup/cube in SQL query
> 
>
> Key: SPARK-12541
> URL: https://issues.apache.org/jira/browse/SPARK-12541
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Davies Liu
>
> We have a DataFrame API for rollup/cube, but do not support them in the SQL 
> parser (either SQLContext or HiveContext).
> PS: the Hive parser only supports `group by a, b, c WITH cube/rollup`






[jira] [Updated] (SPARK-12541) Support rollup/cube in SQL query

2015-12-29 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu updated SPARK-12541:
---
Description: We have DataFrame API for rollup/cube, but do not support them 
in SQL parser (both SQLContext and HiveContext).  (was: We have DataFrame API 
for rollup/cube/pivot, but do not support them in SQL parser (both SQLContext 
and HiveContext).)

> Support rollup/cube in SQL query
> 
>
> Key: SPARK-12541
> URL: https://issues.apache.org/jira/browse/SPARK-12541
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Davies Liu
>
> We have a DataFrame API for rollup/cube, but do not support them in the SQL 
> parser (either SQLContext or HiveContext).






[jira] [Updated] (SPARK-12541) Support rollup/cube in SQL query

2015-12-29 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu updated SPARK-12541:
---
Description: 
We have DataFrame API for rollup/cube, but do not support them in SQL parser 
(both SQLContext and HiveContext).

PS: Hive parser only support `group by a, b,c WITH cube/rollup`

  was:We have DataFrame API for rollup/cube, but do not support them in SQL 
parser (both SQLContext and HiveContext).


> Support rollup/cube in SQL query
> 
>
> Key: SPARK-12541
> URL: https://issues.apache.org/jira/browse/SPARK-12541
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Davies Liu
>
> We have a DataFrame API for rollup/cube, but do not support them in the SQL 
> parser (either SQLContext or HiveContext).
> PS: the Hive parser only supports `group by a, b, c WITH cube/rollup`






[jira] [Updated] (SPARK-12541) Support rollup/cube in SQL query

2015-12-29 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu updated SPARK-12541:
---
Summary: Support rollup/cube in SQL query  (was: Support rollup/cube/povit 
in SQL query)

> Support rollup/cube in SQL query
> 
>
> Key: SPARK-12541
> URL: https://issues.apache.org/jira/browse/SPARK-12541
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Davies Liu
>
> We have a DataFrame API for rollup/cube/pivot, but do not support them in 
> the SQL parser (either SQLContext or HiveContext).






[jira] [Commented] (SPARK-11437) createDataFrame shouldn't .take() when provided schema

2015-12-29 Thread Davies Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15074726#comment-15074726
 ] 

Davies Liu commented on SPARK-11437:


[~maver1ck] Maybe we should provide an API to verify the schema (called 
manually by the user)?

> createDataFrame shouldn't .take() when provided schema
> --
>
> Key: SPARK-11437
> URL: https://issues.apache.org/jira/browse/SPARK-11437
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Reporter: Jason White
>Assignee: Jason White
> Fix For: 1.6.0
>
>
> When creating a DataFrame from an RDD in PySpark, `createDataFrame` calls 
> `.take(10)` to verify the first 10 rows of the RDD match the provided schema. 
> Similar to https://issues.apache.org/jira/browse/SPARK-8070, but that issue 
> affected cases where a schema was not provided.
> Verifying the first 10 rows is of limited utility and causes the DAG to be 
> executed non-lazily. If necessary, I believe this verification should be done 
> lazily on all rows. However, since the caller is providing a schema to 
> follow, I think it's acceptable to simply fail if the schema is incorrect.
> https://github.com/apache/spark/blob/master/python/pyspark/sql/context.py#L321-L325
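> A short sketch of the case in question (schema and data hypothetical): even 
> with an explicit schema, createDataFrame currently runs an eager job to 
> check the first rows.
> {code}
> from pyspark.sql.types import StructType, StructField, IntegerType, StringType
> 
> schema = StructType([StructField("id", IntegerType(), True),
>                      StructField("name", StringType(), True)])
> rdd = sc.parallelize([(1, "a"), (2, "b")])
> # today this triggers .take(10) on rdd to verify rows against the schema;
> # the proposal is to skip (or defer) that eager check when a schema is given
> df = sqlContext.createDataFrame(rdd, schema)
> {code}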






[jira] [Closed] (SPARK-12291) Support UnsafeRow in BroadcastLeftSemiJoinHash

2015-12-29 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu closed SPARK-12291.
--
Resolution: Not A Problem

> Support UnsafeRow in BroadcastLeftSemiJoinHash
> --
>
> Key: SPARK-12291
> URL: https://issues.apache.org/jira/browse/SPARK-12291
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Davies Liu
>







[jira] [Closed] (SPARK-11855) Catalyst breaks backwards compatibility in branch-1.6

2015-12-29 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu closed SPARK-11855.
--
Resolution: Won't Fix

> Catalyst breaks backwards compatibility in branch-1.6
> -
>
> Key: SPARK-11855
> URL: https://issues.apache.org/jira/browse/SPARK-11855
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Santiago M. Mola
>Priority: Critical
>
> There are a number of APIs broken in Catalyst 1.6.0. I'm trying to compile 
> the main cases:
> *UnresolvedRelation*'s constructor has been changed from taking a Seq to a 
> TableIdentifier. A deprecated constructor taking a Seq would be needed for 
> backwards compatibility.
> {code}
>  case class UnresolvedRelation(
> -tableIdentifier: Seq[String],
> +tableIdentifier: TableIdentifier,
>  alias: Option[String] = None) extends LeafNode {
> {code}
> It is similar with *UnresolvedStar*:
> {code}
> -case class UnresolvedStar(table: Option[String]) extends Star with 
> Unevaluable {
> +case class UnresolvedStar(target: Option[Seq[String]]) extends Star with 
> Unevaluable {
> {code}
> *Catalog* also had a lot of signatures changed (because of TableIdentifier). 
> Providing the older methods as deprecated seems viable here too.
> Spark 1.5 already broke backwards compatibility for part of the Catalyst API 
> with respect to 1.4. I understand there are good reasons in some cases, but 
> we should try to minimize backwards compatibility breakage in 1.x, especially 
> now that 2.x is on the horizon and there will soon be an opportunity to 
> remove deprecated stuff.






[jira] [Assigned] (SPARK-12540) Support all TPCDS queries

2015-12-29 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu reassigned SPARK-12540:
--

Assignee: Davies Liu

> Support all TPCDS queries
> -
>
> Key: SPARK-12540
> URL: https://issues.apache.org/jira/browse/SPARK-12540
> Project: Spark
>  Issue Type: Epic
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Davies Liu
>
> Spark SQL 1.6 can run 55 of the 99 TPC-DS queries; the goal is to support 
> all of them.






[jira] [Commented] (SPARK-12291) Support UnsafeRow in BroadcastLeftSemiJoinHash

2015-12-29 Thread Davies Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15074729#comment-15074729
 ] 

Davies Liu commented on SPARK-12291:


Thanks, I will close this one.

> Support UnsafeRow in BroadcastLeftSemiJoinHash
> --
>
> Key: SPARK-12291
> URL: https://issues.apache.org/jira/browse/SPARK-12291
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Davies Liu
>







[jira] [Created] (SPARK-12541) Support rollup/cube/povit in SQL query

2015-12-28 Thread Davies Liu (JIRA)
Davies Liu created SPARK-12541:
--

 Summary: Support rollup/cube/povit in SQL query
 Key: SPARK-12541
 URL: https://issues.apache.org/jira/browse/SPARK-12541
 Project: Spark
  Issue Type: New Feature
Reporter: Davies Liu


We have a DataFrame API for rollup/cube/pivot, but do not support them in the 
SQL parser (either SQLContext or HiveContext).






[jira] [Created] (SPARK-12544) Support window functions in SQLContext

2015-12-28 Thread Davies Liu (JIRA)
Davies Liu created SPARK-12544:
--

 Summary: Support window functions in SQLContext
 Key: SPARK-12544
 URL: https://issues.apache.org/jira/browse/SPARK-12544
 Project: Spark
  Issue Type: New Feature
Reporter: Davies Liu









[jira] [Created] (SPARK-12545) Support exists condition

2015-12-28 Thread Davies Liu (JIRA)
Davies Liu created SPARK-12545:
--

 Summary: Support exists condition
 Key: SPARK-12545
 URL: https://issues.apache.org/jira/browse/SPARK-12545
 Project: Spark
  Issue Type: New Feature
Reporter: Davies Liu









[jira] [Created] (SPARK-12540) Support all TPCDS queries

2015-12-28 Thread Davies Liu (JIRA)
Davies Liu created SPARK-12540:
--

 Summary: Support all TPCDS queries
 Key: SPARK-12540
 URL: https://issues.apache.org/jira/browse/SPARK-12540
 Project: Spark
  Issue Type: Epic
  Components: SQL
Reporter: Davies Liu


Spark SQL 1.6 can run 55 of the 99 TPC-DS queries; the goal is to support all 
of them.






[jira] [Created] (SPARK-12543) Support subquery in select/where/having

2015-12-28 Thread Davies Liu (JIRA)
Davies Liu created SPARK-12543:
--

 Summary: Support subquery in select/where/having
 Key: SPARK-12543
 URL: https://issues.apache.org/jira/browse/SPARK-12543
 Project: Spark
  Issue Type: New Feature
Reporter: Davies Liu









[jira] [Created] (SPARK-12542) Support intersect/except in SQL

2015-12-28 Thread Davies Liu (JIRA)
Davies Liu created SPARK-12542:
--

 Summary: Support intersect/except in SQL
 Key: SPARK-12542
 URL: https://issues.apache.org/jira/browse/SPARK-12542
 Project: Spark
  Issue Type: New Feature
Reporter: Davies Liu


Also `UNION ALL` in the Hive SQL parser






[jira] [Updated] (SPARK-12511) streaming driver with checkpointing unable to finalize leading to OOM

2015-12-28 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu updated SPARK-12511:
---
Assignee: Shixiong Zhu

> streaming driver with checkpointing unable to finalize leading to OOM
> -
>
> Key: SPARK-12511
> URL: https://issues.apache.org/jira/browse/SPARK-12511
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Streaming
>Affects Versions: 1.5.2
> Environment: pyspark 1.5.2
> yarn 2.6.0
> python 2.6
> centos 6.5
> openjdk 1.8.0
>Reporter: Antony Mayi
>Assignee: Shixiong Zhu
>Priority: Critical
> Attachments: bug.py, finalizer-classes.png, finalizer-pending.png, 
> finalizer-spark_assembly.png
>
>
> A Spark streaming application configured with checkpointing fills the 
> driver's heap with ZipFileInputStream instances, as a result of 
> spark-assembly.jar (and potentially others, for example snappy-java.jar) 
> getting repeatedly referenced (loaded?). The Java Finalizer can't finalize 
> these ZipFileInputStream instances, and they eventually take all the heap, 
> leading the driver to an OOM crash.
> h2. Steps to reproduce:
> * Submit attached [^bug.py] to spark
> * Leave it running and monitor the driver java process heap
> ** with heap dump you will primarily see growing instances of byte array data 
> (here cumulated zip payload of the jar refs):
> {noformat}
>  num     #instances         #bytes  class name
> ----------------------------------------------
>    1:          32653       32735296  [B
>    2:          48000        5135816  [C
>    3:             41        1344144  [Lscala.concurrent.forkjoin.ForkJoinTask;
>    4:          11362        1261816  java.lang.Class
>    5:          47054        1129296  java.lang.String
>    6:          25460        1018400  java.lang.ref.Finalizer
>    7:           9802         789400  [Ljava.lang.Object;
> {noformat}
> ** with visualvm you can see:
> *** increasing number of objects pending for finalization
> !finalizer-pending.png!
> *** increasing number of ZipFileInputStream instances related to the 
> spark-assembly.jar, referenced by Finalizer
> !finalizer-spark_assembly.png!
> * Depending on the heap size and running time, this will lead to a driver OOM 
> crash
> h2. Comments
> * The attached [^bug.py] is a lightweight proof of the problem. In production 
> the effect is quite rapid: in a few hours it eats gigabytes of heap and kills 
> the app.
> * If the same [^bug.py] is run without checkpointing, there is no issue 
> whatsoever.
> * It is not clear whether this is PySpark-specific.
> * In [^bug.py] I am using the socketTextStream input, but the problem seems to 
> be independent of the input type (in production I see the same problem with 
> the Kafka direct stream, and have seen it even with textFileStream).
> * It happens even if the input stream doesn't produce any data.






[jira] [Resolved] (SPARK-12520) Python API dataframe join returns wrong results on outer join

2015-12-27 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-12520.

   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 10477
[https://github.com/apache/spark/pull/10477]

> Python API dataframe join returns wrong results on outer join
> -
>
> Key: SPARK-12520
> URL: https://issues.apache.org/jira/browse/SPARK-12520
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 1.4.1
>Reporter: Aravind  B
> Fix For: 2.0.0
>
>
> Consider the following dataframes:
> """
> left_table:
> +------------+------------+------+--------------+
> |head_id_left|tail_id_left|weight|joining_column|
> +------------+------------+------+--------------+
> |           1|           2|     1|           1~2|
> +------------+------------+------+--------------+
> right_table:
> +-------------+-------------+--------------+
> |head_id_right|tail_id_right|joining_column|
> +-------------+-------------+--------------+
> +-------------+-------------+--------------+
> """
> The following code returns an empty dataframe:
> """
> joined_table = left_table.join(right_table, "joining_column", "outer")
> """
> joined_table has zero rows. 
> However:
> """
> joined_table = left_table.join(right_table, left_table.joining_column == 
> right_table.joining_column, "outer")
> """
> returns the correct answer with one row.






[jira] [Updated] (SPARK-12520) Python API dataframe join returns wrong results on outer join

2015-12-27 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu updated SPARK-12520:
---
Assignee: Xiao Li

> Python API dataframe join returns wrong results on outer join
> -
>
> Key: SPARK-12520
> URL: https://issues.apache.org/jira/browse/SPARK-12520
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 1.4.1
>Reporter: Aravind  B
>Assignee: Xiao Li
> Fix For: 1.6.0, 2.0.0
>
>
> Consider the following dataframes:
> """
> left_table:
> +------------+------------+------+--------------+
> |head_id_left|tail_id_left|weight|joining_column|
> +------------+------------+------+--------------+
> |           1|           2|     1|           1~2|
> +------------+------------+------+--------------+
> right_table:
> +-------------+-------------+--------------+
> |head_id_right|tail_id_right|joining_column|
> +-------------+-------------+--------------+
> +-------------+-------------+--------------+
> """
> The following code returns an empty dataframe:
> """
> joined_table = left_table.join(right_table, "joining_column", "outer")
> """
> joined_table has zero rows. 
> However:
> """
> joined_table = left_table.join(right_table, left_table.joining_column == 
> right_table.joining_column, "outer")
> """
> returns the correct answer with one row.






[jira] [Updated] (SPARK-12520) Python API dataframe join returns wrong results on outer join

2015-12-27 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu updated SPARK-12520:
---
Fix Version/s: 1.6.0

> Python API dataframe join returns wrong results on outer join
> -
>
> Key: SPARK-12520
> URL: https://issues.apache.org/jira/browse/SPARK-12520
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 1.4.1
>Reporter: Aravind  B
> Fix For: 1.6.0, 2.0.0
>
>
> Consider the following dataframes:
> """
> left_table:
> +------------+------------+------+--------------+
> |head_id_left|tail_id_left|weight|joining_column|
> +------------+------------+------+--------------+
> |           1|           2|     1|           1~2|
> +------------+------------+------+--------------+
> right_table:
> +-------------+-------------+--------------+
> |head_id_right|tail_id_right|joining_column|
> +-------------+-------------+--------------+
> +-------------+-------------+--------------+
> """
> The following code returns an empty dataframe:
> """
> joined_table = left_table.join(right_table, "joining_column", "outer")
> """
> joined_table has zero rows. 
> However:
> """
> joined_table = left_table.join(right_table, left_table.joining_column == 
> right_table.joining_column, "outer")
> """
> returns the correct answer with one row.






[jira] [Updated] (SPARK-12520) Python API dataframe join returns wrong results on outer join

2015-12-27 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu updated SPARK-12520:
---
Fix Version/s: 1.5.3

> Python API dataframe join returns wrong results on outer join
> -
>
> Key: SPARK-12520
> URL: https://issues.apache.org/jira/browse/SPARK-12520
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 1.4.1
>Reporter: Aravind  B
>Assignee: Xiao Li
> Fix For: 1.5.3, 1.6.0, 2.0.0
>
>
> Consider the following dataframes:
> """
> left_table:
> +------------+------------+------+--------------+
> |head_id_left|tail_id_left|weight|joining_column|
> +------------+------------+------+--------------+
> |           1|           2|     1|           1~2|
> +------------+------------+------+--------------+
> right_table:
> +-------------+-------------+--------------+
> |head_id_right|tail_id_right|joining_column|
> +-------------+-------------+--------------+
> +-------------+-------------+--------------+
> """
> The following code returns an empty dataframe:
> """
> joined_table = left_table.join(right_table, "joining_column", "outer")
> """
> joined_table has zero rows. 
> However:
> """
> joined_table = left_table.join(right_table, left_table.joining_column == 
> right_table.joining_column, "outer")
> """
> returns the correct answer with one row.






[jira] [Created] (SPARK-12472) OOM when sort a table and save as parquet

2015-12-21 Thread Davies Liu (JIRA)
Davies Liu created SPARK-12472:
--

 Summary: OOM when sort a table and save as parquet
 Key: SPARK-12472
 URL: https://issues.apache.org/jira/browse/SPARK-12472
 Project: Spark
  Issue Type: Bug
Reporter: Davies Liu


{code}
t = sqlContext.table('store_sales')
t.unionAll(t).coalesce(2).sortWithinPartitions(t[0]).write.partitionBy('ss_sold_date_sk').parquet("/tmp/ttt")
{code}

{code}
15/12/21 14:35:52 WARN TaskSetManager: Lost task 1.0 in stage 25.0 (TID 96, 
192.168.0.143): java.lang.OutOfMemoryError: Java heap space
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeSortDataFormat.allocate(UnsafeSortDataFormat.java:86)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeSortDataFormat.allocate(UnsafeSortDataFormat.java:32)
at 
org.apache.spark.util.collection.TimSort$SortState.ensureCapacity(TimSort.java:951)
at 
org.apache.spark.util.collection.TimSort$SortState.mergeLo(TimSort.java:699)
at 
org.apache.spark.util.collection.TimSort$SortState.mergeAt(TimSort.java:525)
at 
org.apache.spark.util.collection.TimSort$SortState.mergeCollapse(TimSort.java:453)
at 
org.apache.spark.util.collection.TimSort$SortState.access$200(TimSort.java:325)
at org.apache.spark.util.collection.TimSort.sort(TimSort.java:153)
at org.apache.spark.util.collection.Sorter.sort(Sorter.scala:37)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.getSortedIterator(UnsafeInMemorySorter.java:226)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:187)
at 
org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:170)
at 
org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:244)
at 
org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:112)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:327)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:342)
at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:91)
at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:168)
at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:90)
at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:64)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:728)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:728)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-12388) Change default compressor to LZ4

2015-12-21 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-12388.

   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 10342
[https://github.com/apache/spark/pull/10342]

> Change default compressor to LZ4
> 
>
> Key: SPARK-12388
> URL: https://issues.apache.org/jira/browse/SPARK-12388
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Davies Liu
>  Labels: releasenotes
> Fix For: 2.0.0
>
>
> According to the benchmark [1], LZ4-java could be 80% (or 30%) faster than 
> Snappy.
> After changing the compressor to LZ4, I saw 20% improvement on end-to-end 
> time for a TPCDS query (Q4).
> [1] https://github.com/ning/jvm-compressor-benchmark/wiki



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12472) OOM when sorting a table and saving as Parquet

2015-12-21 Thread Davies Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15067413#comment-15067413
 ] 

Davies Liu commented on SPARK-12472:


This can be worked around by decreasing the memory used by both Spark and 
Parquet, via spark.memory.fraction (for example, 0.4) and 
parquet.memory.pool.ratio (for example, 0.3, in core-site.xml).
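
A minimal sketch of that workaround (the values are examples, not tuned):

{code}
# a minimal sketch of the workaround described above (example values, not tuned)
from pyspark import SparkConf, SparkContext

conf = SparkConf().set("spark.memory.fraction", "0.4")  # shrink Spark's share of the heap
sc = SparkContext(conf=conf)

# parquet.memory.pool.ratio is a Hadoop-side property; set it in core-site.xml:
# <property>
#   <name>parquet.memory.pool.ratio</name>
#   <value>0.3</value>
# </property>
{code}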

> OOM when sorting a table and saving as Parquet
> -
>
> Key: SPARK-12472
> URL: https://issues.apache.org/jira/browse/SPARK-12472
> Project: Spark
>  Issue Type: Bug
>Reporter: Davies Liu
>
> {code}
> t = sqlContext.table('store_sales')
> t.unionAll(t).coalesce(2).sortWithinPartitions(t[0]).write.partitionBy('ss_sold_date_sk').parquet("/tmp/ttt")
> {code}
> {code}
> 15/12/21 14:35:52 WARN TaskSetManager: Lost task 1.0 in stage 25.0 (TID 96, 
> 192.168.0.143): java.lang.OutOfMemoryError: Java heap space
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeSortDataFormat.allocate(UnsafeSortDataFormat.java:86)
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeSortDataFormat.allocate(UnsafeSortDataFormat.java:32)
>   at 
> org.apache.spark.util.collection.TimSort$SortState.ensureCapacity(TimSort.java:951)
>   at 
> org.apache.spark.util.collection.TimSort$SortState.mergeLo(TimSort.java:699)
>   at 
> org.apache.spark.util.collection.TimSort$SortState.mergeAt(TimSort.java:525)
>   at 
> org.apache.spark.util.collection.TimSort$SortState.mergeCollapse(TimSort.java:453)
>   at 
> org.apache.spark.util.collection.TimSort$SortState.access$200(TimSort.java:325)
>   at org.apache.spark.util.collection.TimSort.sort(TimSort.java:153)
>   at org.apache.spark.util.collection.Sorter.sort(Sorter.scala:37)
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.getSortedIterator(UnsafeInMemorySorter.java:226)
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:187)
>   at 
> org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:170)
>   at 
> org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:244)
>   at 
> org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:112)
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:327)
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:342)
>   at 
> org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:91)
>   at 
> org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:168)
>   at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:90)
>   at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:64)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:728)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:728)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>   at org.apache.spark.scheduler.Task.run(Task.scala:88)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-12054) Consider nullable in codegen

2015-12-18 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-12054.

   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 10333
[https://github.com/apache/spark/pull/10333]

> Consider nullable in codegen
> 
>
> Key: SPARK-12054
> URL: https://issues.apache.org/jira/browse/SPARK-12054
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Davies Liu
> Fix For: 2.0.0
>
>
> Currently, we always check the nullability of the results of expressions; we 
> could skip that if the expression is not nullable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-12341) The "comment" field of DESCRIBE result set should be nullable

2015-12-18 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-12341.

   Resolution: Fixed
Fix Version/s: 2.0.0

https://github.com/apache/spark/pull/10333

> The "comment" field of DESCRIBE result set should be nullable
> -
>
> Key: SPARK-12341
> URL: https://issues.apache.org/jira/browse/SPARK-12341
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.0.2, 1.1.1, 1.2.2, 1.3.1, 1.4.1, 1.5.2, 1.6.0, 2.0.0
>Reporter: Cheng Lian
>Assignee: Davies Liu
>Priority: Minor
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-12342) Corr (Pearson correlation) should be nullable

2015-12-18 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-12342.

   Resolution: Fixed
Fix Version/s: 2.0.0

https://github.com/apache/spark/pull/10333

> Corr (Pearson correlation) should be nullable
> -
>
> Key: SPARK-12342
> URL: https://issues.apache.org/jira/browse/SPARK-12342
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.6.0, 2.0.0
>Reporter: Cheng Lian
>Assignee: Davies Liu
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-12336) Outer join using multiple columns results in wrong nullability

2015-12-18 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-12336.

   Resolution: Fixed
Fix Version/s: 2.0.0

https://github.com/apache/spark/pull/10333

> Outer join using multiple columns results in wrong nullability
> --
>
> Key: SPARK-12336
> URL: https://issues.apache.org/jira/browse/SPARK-12336
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.4.1, 1.5.2, 1.6.0, 2.0.0
>Reporter: Cheng Lian
>Assignee: Davies Liu
> Fix For: 2.0.0
>
>
> When joining two DataFrames using multiple columns, a temporary inner join is 
> used to compute the join output. Then the real join operator is created and 
> projected. However, the final projection list is based on the inner join 
> rather than the real join operator. When the real join operator is an outer 
> join, the nullability of the final projection can be wrong, since an outer 
> join may alter the nullability of its child plan(s).
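> A hypothetical PySpark illustration of the scenario (the DataFrames and column 
> names are made up):
> {code}
> joined = a.join(b, ["k1", "k2"], "outer")
> joined.printSchema()  # non-key columns from both sides should now be nullable
> {code}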



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12342) Corr (Pearson correlation) should be nullable

2015-12-18 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu reassigned SPARK-12342:
--

Assignee: Davies Liu  (was: Cheng Lian)

> Corr (Pearson correlation) should be nullable
> -
>
> Key: SPARK-12342
> URL: https://issues.apache.org/jira/browse/SPARK-12342
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.6.0, 2.0.0
>Reporter: Cheng Lian
>Assignee: Davies Liu
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12341) The "comment" field of DESCRIBE result set should be nullable

2015-12-18 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu reassigned SPARK-12341:
--

Assignee: Davies Liu  (was: Apache Spark)

> The "comment" field of DESCRIBE result set should be nullable
> -
>
> Key: SPARK-12341
> URL: https://issues.apache.org/jira/browse/SPARK-12341
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.0.2, 1.1.1, 1.2.2, 1.3.1, 1.4.1, 1.5.2, 1.6.0, 2.0.0
>Reporter: Cheng Lian
>Assignee: Davies Liu
>Priority: Minor
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12335) CentralMomentAgg should be nullable

2015-12-18 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu reassigned SPARK-12335:
--

Assignee: Davies Liu  (was: Apache Spark)

> CentralMomentAgg should be nullable
> ---
>
> Key: SPARK-12335
> URL: https://issues.apache.org/jira/browse/SPARK-12335
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.6.0, 2.0.0
>Reporter: Cheng Lian
>Assignee: Davies Liu
>
> According to the {{getStatistics}} method overridden in all its subclasses, 
> {{CentralMomentAgg}} should be nullable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-12335) CentralMomentAgg should be nullable

2015-12-18 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-12335.

   Resolution: Fixed
Fix Version/s: 2.0.0

https://github.com/apache/spark/pull/10333

> CentralMomentAgg should be nullable
> ---
>
> Key: SPARK-12335
> URL: https://issues.apache.org/jira/browse/SPARK-12335
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.6.0, 2.0.0
>Reporter: Cheng Lian
>Assignee: Davies Liu
> Fix For: 2.0.0
>
>
> According to the {{getStatistics}} method overridden in all its subclasses, 
> {{CentralMomentAgg}} should be nullable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12336) Outer join using multiple columns results in wrong nullability

2015-12-18 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu reassigned SPARK-12336:
--

Assignee: Davies Liu  (was: Cheng Lian)

> Outer join using multiple columns results in wrong nullability
> --
>
> Key: SPARK-12336
> URL: https://issues.apache.org/jira/browse/SPARK-12336
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.4.1, 1.5.2, 1.6.0, 2.0.0
>Reporter: Cheng Lian
>Assignee: Davies Liu
> Fix For: 2.0.0
>
>
> When joining two DataFrames using multiple columns, a temporary inner join is 
> used to compute the join output. Then the real join operator is created and 
> projected. However, the final projection list is based on the inner join 
> rather than the real join operator. When the real join operator is an outer 
> join, the nullability of the final projection can be wrong, since an outer 
> join may alter the nullability of its child plan(s).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-12091) [PySpark] Removal of the JAVA-specific deserialized storage levels

2015-12-18 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-12091.

   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 10092
[https://github.com/apache/spark/pull/10092]

> [PySpark] Removal of the JAVA-specific deserialized storage levels
> --
>
> Key: SPARK-12091
> URL: https://issues.apache.org/jira/browse/SPARK-12091
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.6.0
>Reporter: Xiao Li
> Fix For: 2.0.0
>
>
> Since the data is always serialized on the Python side, the Java-specific 
> deserialized storage levels, such as MEMORY_ONLY, are not provided.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12395) Result of DataFrame.join(usingColumns) could be wrong for outer join

2015-12-17 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu reassigned SPARK-12395:
--

Assignee: Davies Liu

> Result of DataFrame.join(usingColumns) could be wrong for outer join
> 
>
> Key: SPARK-12395
> URL: https://issues.apache.org/jira/browse/SPARK-12395
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Davies Liu
>Assignee: Davies Liu
>Priority: Blocker
>
> For API DataFrame.join(right, usingColumns, joinType), if the joinType is 
> right_outer or full_outer, the resulting join column could be wrong (null).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12402) Memory leak in broadcast hash join

2015-12-17 Thread Davies Liu (JIRA)
Davies Liu created SPARK-12402:
--

 Summary: Memory leak in broadcast hash join
 Key: SPARK-12402
 URL: https://issues.apache.org/jira/browse/SPARK-12402
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Davies Liu


The broadcasted HashRelation is not destroyed after the query finishes (and 
also can't be reused).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-12395) Result of DataFrame.join(usingColumns) could be wrong for outer join

2015-12-17 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-12395.

   Resolution: Fixed
Fix Version/s: 1.6.1
   2.0.0

Issue resolved by pull request 10353
[https://github.com/apache/spark/pull/10353]

> Result of DataFrame.join(usingColumns) could be wrong for outer join
> 
>
> Key: SPARK-12395
> URL: https://issues.apache.org/jira/browse/SPARK-12395
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Davies Liu
>Assignee: Davies Liu
>Priority: Blocker
> Fix For: 2.0.0, 1.6.1
>
>
> For API DataFrame.join(right, usingColumns, joinType), if the joinType is 
> right_outer or full_outer, the resulting join column could be wrong (null).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12402) Memory leak in pyspark

2015-12-17 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu updated SPARK-12402:
---
Description: After running some SQL queries in PySpark, the DataFrames are 
still referenced by the py4j Gateway; they are only freed after calling 
`gc.collect()` in Python.  (was: The broadcasted HashRelation is not destroyed 
after the query finishes (and also can't be reused).)
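
A minimal sketch of the mitigation described above (assuming a SQLContext named 
sqlContext, as in the other examples here):

{code}
import gc

df = sqlContext.range(1000)  # any DataFrame created through the py4j Gateway
df.count()
del df
gc.collect()  # drops the lingering py4j references so the JVM side can be freed
{code}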

> Memory leak in pyspark
> --
>
> Key: SPARK-12402
> URL: https://issues.apache.org/jira/browse/SPARK-12402
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Davies Liu
>
> After running some SQL queries in PySpark, the DataFrames are still referenced 
> by the py4j Gateway; they are only freed after calling `gc.collect()` in 
> Python.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12402) Memory leak in pyspark

2015-12-17 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu updated SPARK-12402:
---
Summary: Memory leak in pyspark  (was: Memory leak in broadcast hash join)

> Memory leak in pyspark
> --
>
> Key: SPARK-12402
> URL: https://issues.apache.org/jira/browse/SPARK-12402
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Davies Liu
>
> The broadcasted HashRelation is not destroyed after the query finishes (and 
> also can't be reused).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-8745) Remove GenerateProjection

2015-12-16 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-8745.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 10316
[https://github.com/apache/spark/pull/10316]

> Remove GenerateProjection
> -
>
> Key: SPARK-8745
> URL: https://issues.apache.org/jira/browse/SPARK-8745
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Davies Liu
> Fix For: 2.0.0
>
>
> Based on discussion offline with [~marmbrus], we should remove 
> GenerateProjection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-12380) MLLib should use existing SQLContext instead of creating a new one

2015-12-16 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-12380.

   Resolution: Fixed
Fix Version/s: 1.6.1
   2.0.0

Issue resolved by pull request 10338
[https://github.com/apache/spark/pull/10338]

> MLLib should use existing SQLContext instead of creating a new one
> ---
>
> Key: SPARK-12380
> URL: https://issues.apache.org/jira/browse/SPARK-12380
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib, PySpark
>Reporter: Davies Liu
>Assignee: Davies Liu
> Fix For: 2.0.0, 1.6.1
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12380) MLLib should use existing SQLContext instead of creating a new one

2015-12-16 Thread Davies Liu (JIRA)
Davies Liu created SPARK-12380:
--

 Summary: MLLib should use existing SQLContext instead of creating a 
new one
 Key: SPARK-12380
 URL: https://issues.apache.org/jira/browse/SPARK-12380
 Project: Spark
  Issue Type: Bug
Reporter: Davies Liu
Assignee: Davies Liu






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12385) Push projection into Join

2015-12-16 Thread Davies Liu (JIRA)
Davies Liu created SPARK-12385:
--

 Summary: Push projection into Join
 Key: SPARK-12385
 URL: https://issues.apache.org/jira/browse/SPARK-12385
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Davies Liu


We usually have a Join followed by a projection to prune some columns, but Join 
already has a result projection to produce UnsafeRow; we should combine them 
together.
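
A hypothetical illustration of the pattern (the DataFrames and columns are made 
up):

{code}
# today this plans as a Join followed by a separate Project; the idea is to
# fold the pruning projection into the Join's own result projection
pruned = a.join(b, a["id"] == b["id"]).select(a["id"], b["value"])
{code}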



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12395) Result of DataFrame.join(usingColumns) could be wrong for outer join

2015-12-16 Thread Davies Liu (JIRA)
Davies Liu created SPARK-12395:
--

 Summary: Result of DataFrame.join(usingColumns) could be wrong for 
outer join
 Key: SPARK-12395
 URL: https://issues.apache.org/jira/browse/SPARK-12395
 Project: Spark
  Issue Type: Bug
Reporter: Davies Liu


For API DataFrame.join(right, usingColumns, joinType), if the joinType is 
right_outer or full_outer, the resulting join column could be wrong (null).
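
A hypothetical PySpark sketch of the affected call (table and column names are 
made up):

{code}
joined = left.join(right, ["k"], "right_outer")  # join(right, usingColumns, joinType)
joined.select("k").show()  # per this report, the "k" join column can come back null
{code}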



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12395) Result of DataFrame.join(usingColumns) could be wrong for outer join

2015-12-16 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu updated SPARK-12395:
---
Priority: Critical  (was: Major)

> Result of DataFrame.join(usingColumns) could be wrong for outer join
> 
>
> Key: SPARK-12395
> URL: https://issues.apache.org/jira/browse/SPARK-12395
> Project: Spark
>  Issue Type: Bug
>Reporter: Davies Liu
>Priority: Critical
>
> For API DataFrame.join(right, usingColumns, joinType), if the joinType is 
> right_outer or full_outer, the resulting join column could be wrong (null).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12395) Result of DataFrame.join(usingColumns) could be wrong for outer join

2015-12-16 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu updated SPARK-12395:
---
Priority: Blocker  (was: Critical)

> Result of DataFrame.join(usingColumns) could be wrong for outer join
> 
>
> Key: SPARK-12395
> URL: https://issues.apache.org/jira/browse/SPARK-12395
> Project: Spark
>  Issue Type: Bug
>Reporter: Davies Liu
>Priority: Blocker
>
> For API DataFrame.join(right, usingColumns, joinType), if the joinType is 
> right_outer or full_outer, the resulting join column could be wrong (null).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12388) Change default compressor to LZ4

2015-12-16 Thread Davies Liu (JIRA)
Davies Liu created SPARK-12388:
--

 Summary: Change default compressor to LZ4
 Key: SPARK-12388
 URL: https://issues.apache.org/jira/browse/SPARK-12388
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Davies Liu


According to the benchmark [1], LZ4-java could be 80% (or 30%) faster than Snappy.

After changing the compressor to LZ4, I saw 25% improvement on end-to-end time 
for a TPCDS query (Q4).

[1] https://github.com/ning/jvm-compressor-benchmark/wiki
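
A minimal sketch of opting in, assuming the compressor here is the one selected 
by spark.io.compression.codec:

{code}
from pyspark import SparkConf

conf = SparkConf().set("spark.io.compression.codec", "lz4")  # previous default: "snappy"
{code}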



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12388) Change default compressor to LZ4

2015-12-16 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu updated SPARK-12388:
---
Description: 
According to the benchmark [1], LZ4-java could be 80% (or 30%) faster than Snappy.

After changing the compressor to LZ4, I saw 20% improvement on end-to-end time 
for a TPCDS query (Q4).

[1] https://github.com/ning/jvm-compressor-benchmark/wiki

  was:
According to the benchmark [1], LZ4-java could be 80% (or 30%) faster than Snappy.

After changing the compressor to LZ4, I saw 25% improvement on end-to-end time 
for a TPCDS query (Q4).

[1] https://github.com/ning/jvm-compressor-benchmark/wiki


> Change default compressor to LZ4
> 
>
> Key: SPARK-12388
> URL: https://issues.apache.org/jira/browse/SPARK-12388
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Davies Liu
>
> According to the benchmark [1], LZ4-java could be 80% (or 30%) faster than 
> Snappy.
> After changing the compressor to LZ4, I saw 20% improvement on end-to-end 
> time for a TPCDS query (Q4).
> [1] https://github.com/ning/jvm-compressor-benchmark/wiki



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12395) Result of DataFrame.join(usingColumns) could be wrong for outer join

2015-12-16 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu updated SPARK-12395:
---
Affects Version/s: 1.6.0

> Result of DataFrame.join(usingColumns) could be wrong for outer join
> 
>
> Key: SPARK-12395
> URL: https://issues.apache.org/jira/browse/SPARK-12395
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Davies Liu
>Priority: Blocker
>
> For API DataFrame.join(right, usingColumns, joinType), if the joinType is 
> right_outer or full_outer, the resulting join column could be wrong (null).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12395) Result of DataFrame.join(usingColumns) could be wrong for outer join

2015-12-16 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu updated SPARK-12395:
---
Priority: Blocker  (was: Critical)

> Result of DataFrame.join(usingColumns) could be wrong for outer join
> 
>
> Key: SPARK-12395
> URL: https://issues.apache.org/jira/browse/SPARK-12395
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Davies Liu
>Priority: Blocker
>
> For API DataFrame.join(right, usingColumns, joinType), if the joinType is 
> right_outer or full_outer, the resulting join column could be wrong (null).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12395) Result of DataFrame.join(usingColumns) could be wrong for outer join

2015-12-16 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu updated SPARK-12395:
---
Component/s: SQL

> Result of DataFrame.join(usingColumns) could be wrong for outer join
> 
>
> Key: SPARK-12395
> URL: https://issues.apache.org/jira/browse/SPARK-12395
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Davies Liu
>Priority: Blocker
>
> For API DataFrame.join(right, usingColumns, joinType), if the joinType is 
> right_outer or full_outer, the resulting join column could be wrong (null).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12395) Result of DataFrame.join(usingColumns) could be wrong for outer join

2015-12-16 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu updated SPARK-12395:
---
Priority: Critical  (was: Blocker)

> Result of DataFrame.join(usingColumns) could be wrong for outer join
> 
>
> Key: SPARK-12395
> URL: https://issues.apache.org/jira/browse/SPARK-12395
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Davies Liu
>Priority: Critical
>
> For API DataFrame.join(right, usingColumns, joinType), if the joinType is 
> right_outer or full_outer, the resulting join column could be wrong (null).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12288) Support UnsafeRow in Coalesce/Except/Intersect

2015-12-16 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu updated SPARK-12288:
---
Assignee: Xiao Li

> Support UnsafeRow in Coalesce/Except/Intersect
> --
>
> Key: SPARK-12288
> URL: https://issues.apache.org/jira/browse/SPARK-12288
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Xiao Li
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12293) Support UnsafeRow in LocalTableScan

2015-12-16 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu updated SPARK-12293:
---
Assignee: Liang-Chi Hsieh  (was: Apache Spark)

> Support UnsafeRow in LocalTableScan
> ---
>
> Key: SPARK-12293
> URL: https://issues.apache.org/jira/browse/SPARK-12293
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Liang-Chi Hsieh
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12179) Spark SQL gets different results with the same code

2015-12-16 Thread Davies Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060213#comment-15060213
 ] 

Davies Liu commented on SPARK-12179:


I think this UDF is not thread safe: rowNum and comparedColumn will be updated 
by multiple threads.
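
A hypothetical sketch of the kind of stateful UDF being described (the actual 
rowNum/comparedColumn code is not in this report, so the shape is a guess):

{code}
rowNum = 0  # shared mutable state captured by the UDF

def next_row_num(value):
    global rowNum
    rowNum += 1  # racy: concurrent tasks can interleave these updates
    return rowNum
{code}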

> Spark SQL gets different results with the same code
> -
>
> Key: SPARK-12179
> URL: https://issues.apache.org/jira/browse/SPARK-12179
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 1.3.0, 1.3.1, 1.3.2, 1.4.0, 1.4.1, 1.4.2, 1.5.0, 1.5.1, 
> 1.5.2, 1.5.3
> Environment: hadoop version: 2.5.0-cdh5.3.2
> spark version: 1.5.3
> run mode: yarn-client
>Reporter: Tao Li
>Priority: Critical
>
> I run the SQL in yarn-client mode, but get a different result each time.
> As you can see in the example, I get a different shuffle write with the same 
> shuffle read in two jobs running the same code.
> Some of my Spark apps run well, but some always hit this problem, and I have 
> hit it on Spark 1.3, 1.4 and 1.5.
> Can you give me some suggestions about the possible causes, or how I can 
> figure out the problem?
> 1. First Run
> Details for Stage 9 (Attempt 0)
> Total Time Across All Tasks: 5.8 min
> Shuffle Read: 24.4 MB / 205399
> Shuffle Write: 6.8 MB / 54934
> 2. Second Run
> Details for Stage 9 (Attempt 0)
> Total Time Across All Tasks: 5.6 min
> Shuffle Read: 24.4 MB / 205399
> Shuffle Write: 6.8 MB / 54905



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12179) Spark SQL gets different results with the same code

2015-12-16 Thread Davies Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060217#comment-15060217
 ] 

Davies Liu commented on SPARK-12179:


Which version of Spark are you using?

> Spark SQL gets different results with the same code
> -
>
> Key: SPARK-12179
> URL: https://issues.apache.org/jira/browse/SPARK-12179
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 1.3.0, 1.3.1, 1.3.2, 1.4.0, 1.4.1, 1.4.2, 1.5.0, 1.5.1, 
> 1.5.2, 1.5.3
> Environment: hadoop version: 2.5.0-cdh5.3.2
> spark version: 1.5.3
> run mode: yarn-client
>Reporter: Tao Li
>Priority: Critical
>
> I run the SQL in yarn-client mode, but get a different result each time.
> As you can see in the example, I get a different shuffle write with the same 
> shuffle read in two jobs running the same code.
> Some of my Spark apps run well, but some always hit this problem, and I have 
> hit it on Spark 1.3, 1.4 and 1.5.
> Can you give me some suggestions about the possible causes, or how I can 
> figure out the problem?
> 1. First Run
> Details for Stage 9 (Attempt 0)
> Total Time Across All Tasks: 5.8 min
> Shuffle Read: 24.4 MB / 205399
> Shuffle Write: 6.8 MB / 54934
> 2. Second Run
> Details for Stage 9 (Attempt 0)
> Total Time Across All Tasks: 5.6 min
> Shuffle Read: 24.4 MB / 205399
> Shuffle Write: 6.8 MB / 54905



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-12179) Spark SQL gets different results with the same code

2015-12-16 Thread Davies Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060217#comment-15060217
 ] 

Davies Liu edited comment on SPARK-12179 at 12/16/15 4:04 PM:
--

Which version of Spark are you using? Can you try the latest 1.5 branch or the 1.6 RC?


was (Author: davies):
Which version of Spark are you using?

> Spark SQL gets different results with the same code
> -
>
> Key: SPARK-12179
> URL: https://issues.apache.org/jira/browse/SPARK-12179
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 1.3.0, 1.3.1, 1.3.2, 1.4.0, 1.4.1, 1.4.2, 1.5.0, 1.5.1, 
> 1.5.2, 1.5.3
> Environment: hadoop version: 2.5.0-cdh5.3.2
> spark version: 1.5.3
> run mode: yarn-client
>Reporter: Tao Li
>Priority: Critical
>
> I run the SQL in yarn-client mode, but get a different result each time.
> As you can see in the example, I get a different shuffle write with the same 
> shuffle read in two jobs running the same code.
> Some of my Spark apps run well, but some always hit this problem, and I have 
> hit it on Spark 1.3, 1.4 and 1.5.
> Can you give me some suggestions about the possible causes, or how I can 
> figure out the problem?
> 1. First Run
> Details for Stage 9 (Attempt 0)
> Total Time Across All Tasks: 5.8 min
> Shuffle Read: 24.4 MB / 205399
> Shuffle Write: 6.8 MB / 54934
> 2. Second Run
> Details for Stage 9 (Attempt 0)
> Total Time Across All Tasks: 5.6 min
> Shuffle Read: 24.4 MB / 205399
> Shuffle Write: 6.8 MB / 54905



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-8745) Remove GenerateProjection

2015-12-15 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu reassigned SPARK-8745:
-

Assignee: Davies Liu

> Remove GenerateProjection
> -
>
> Key: SPARK-8745
> URL: https://issues.apache.org/jira/browse/SPARK-8745
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Davies Liu
>
> Based on discussion offline with [~marmbrus], we should remove 
> GenerateProjection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-12288) Support UnsafeRow in Coalesce/Except/Intersect

2015-12-14 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-12288.

   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 10285
[https://github.com/apache/spark/pull/10285]

> Support UnsafeRow in Coalesce/Except/Intersect
> --
>
> Key: SPARK-12288
> URL: https://issues.apache.org/jira/browse/SPARK-12288
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Davies Liu
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-12016) Loaded word2vec model can't use findSynonyms to get words

2015-12-14 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-12016.

   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 10100
[https://github.com/apache/spark/pull/10100]

> Loaded word2vec model can't use findSynonyms to get words 
> 
>
> Key: SPARK-12016
> URL: https://issues.apache.org/jira/browse/SPARK-12016
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.5.2
> Environment: ubuntu 14.04
>Reporter: yuangang.liu
> Fix For: 2.0.0
>
>
> I use word2vec.fit to train a Word2VecModel and then save the model to the 
> file system. When I load the model from the file system, I find I can use 
> transform('a') to get a vector, but I can't use findSynonyms('a', 2) to get 
> any words.
> I use the following code to test word2vec:
> from pyspark import SparkContext
> from pyspark.mllib.feature import Word2Vec, Word2VecModel
> import os, tempfile
> from shutil import rmtree
> if __name__ == '__main__':
>     sc = SparkContext('local', 'test')
>     sentence = "a b " * 100 + "a c " * 10
>     localDoc = [sentence, sentence]
>     doc = sc.parallelize(localDoc).map(lambda line: line.split(" "))
>     model = Word2Vec().setVectorSize(10).setSeed(42).fit(doc)
>     syms = model.findSynonyms("a", 2)
>     print [s[0] for s in syms]
>     path = tempfile.mkdtemp()
>     model.save(sc, path)
>     sameModel = Word2VecModel.load(sc, path)
>     print model.transform("a") == sameModel.transform("a")
>     syms = sameModel.findSynonyms("a", 2)
>     print [s[0] for s in syms]
>     try:
>         rmtree(path)
>     except OSError:
>         pass
> I got "[u'b', u'c']" from the first print, then "True" and "[u'__class__']".
> I don't know how to get 'b' or 'c' with sameModel.findSynonyms("a", 2).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12284) Output UnsafeRow from window function

2015-12-11 Thread Davies Liu (JIRA)
Davies Liu created SPARK-12284:
--

 Summary: Output UnsafeRow from window function
 Key: SPARK-12284
 URL: https://issues.apache.org/jira/browse/SPARK-12284
 Project: Spark
  Issue Type: Improvement
Reporter: Davies Liu






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12286) Support UnsafeRow in all SparkPlan (if possible)

2015-12-11 Thread Davies Liu (JIRA)
Davies Liu created SPARK-12286:
--

 Summary: Support UnsafeRow in all SparkPlan (if possible)
 Key: SPARK-12286
 URL: https://issues.apache.org/jira/browse/SPARK-12286
 Project: Spark
  Issue Type: Epic
  Components: SQL
Reporter: Davies Liu


There are still some SparkPlans that do not support UnsafeRow (or do not 
support it well).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12289) Support UnsafeRow in TakeOrderedAndProject/Limit

2015-12-11 Thread Davies Liu (JIRA)
Davies Liu created SPARK-12289:
--

 Summary: Support UnsafeRow in TakeOrderedAndProject/Limit
 Key: SPARK-12289
 URL: https://issues.apache.org/jira/browse/SPARK-12289
 Project: Spark
  Issue Type: Improvement
Reporter: Davies Liu






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12290) Change the default value in SparkPlan

2015-12-11 Thread Davies Liu (JIRA)
Davies Liu created SPARK-12290:
--

 Summary: Change the default value in SparkPlan
 Key: SPARK-12290
 URL: https://issues.apache.org/jira/browse/SPARK-12290
 Project: Spark
  Issue Type: Improvement
Reporter: Davies Liu


supportUnsafeRows = true
supportSafeRows = false
outputUnsafeRows = true



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12283) Use UnsafeRow as the buffer in SortBasedAggregation to avoid Unsafe/Safe conversion

2015-12-11 Thread Davies Liu (JIRA)
Davies Liu created SPARK-12283:
--

 Summary: Use UnsafeRow as the buffer in SortBasedAggregation to 
avoid Unsafe/Safe conversion
 Key: SPARK-12283
 URL: https://issues.apache.org/jira/browse/SPARK-12283
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Davies Liu


SortBasedAggregation uses GenericMutableRow as the aggregation buffer and also 
requires that the input not be UnsafeRow, because we can't compare/evaluate 
UnsafeRow and GenericInternalRow at the same time. TungstenSort outputs 
UnsafeRow, so multiple Safe/Unsafe projections will be inserted between them.

If we can make sure that all mutation happens in ascending order, an UnsafeRow 
buffer could be used to update variable-length objects (String, Binary, Struct, 
etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12286) Support UnsafeRow in all SparkPlan (if possible)

2015-12-11 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu reassigned SPARK-12286:
--

Assignee: Davies Liu

> Support UnsafeRow in all SparkPlan (if possible)
> 
>
> Key: SPARK-12286
> URL: https://issues.apache.org/jira/browse/SPARK-12286
> Project: Spark
>  Issue Type: Epic
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Davies Liu
>
> There are still some SparkPlans that do not support UnsafeRow (or do not 
> support it well).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12287) Support UnsafeRow in MapPartitions/MapGroups/CoGroup

2015-12-11 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu updated SPARK-12287:
---
Issue Type: Improvement  (was: Epic)

> Support UnsafeRow in MapPartitions/MapGroups/CoGroup
> 
>
> Key: SPARK-12287
> URL: https://issues.apache.org/jira/browse/SPARK-12287
> Project: Spark
>  Issue Type: Improvement
>Reporter: Davies Liu
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12288) Support UnsafeRow in Coalesce/Except/Intersect

2015-12-11 Thread Davies Liu (JIRA)
Davies Liu created SPARK-12288:
--

 Summary: Support UnsafeRow in Coalesce/Except/Intersect
 Key: SPARK-12288
 URL: https://issues.apache.org/jira/browse/SPARK-12288
 Project: Spark
  Issue Type: Improvement
Reporter: Davies Liu






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12291) Support UnsafeRow in BroadcastLeftSemiJoinHash

2015-12-11 Thread Davies Liu (JIRA)
Davies Liu created SPARK-12291:
--

 Summary: Support UnsafeRow in BroadcastLeftSemiJoinHash
 Key: SPARK-12291
 URL: https://issues.apache.org/jira/browse/SPARK-12291
 Project: Spark
  Issue Type: Improvement
Reporter: Davies Liu






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12293) Support UnsafeRow in LocalTableScan

2015-12-11 Thread Davies Liu (JIRA)
Davies Liu created SPARK-12293:
--

 Summary: Support UnsafeRow in LocalTableScan
 Key: SPARK-12293
 URL: https://issues.apache.org/jira/browse/SPARK-12293
 Project: Spark
  Issue Type: Improvement
Reporter: Davies Liu






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12294) Support UnsafeRow in HiveTableScan

2015-12-11 Thread Davies Liu (JIRA)
Davies Liu created SPARK-12294:
--

 Summary: Support UnsafeRow in HiveTableScan
 Key: SPARK-12294
 URL: https://issues.apache.org/jira/browse/SPARK-12294
 Project: Spark
  Issue Type: Improvement
Reporter: Davies Liu






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


