[jira] [Commented] (SPARK-13701) MLlib ALS fails on arm64 (java.lang.UnsatisfiedLinkError: org.jblas.NativeBlas.dgemm))

2016-03-06 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15182214#comment-15182214
 ] 

Santiago M. Mola commented on SPARK-13701:
--

Installed gfortran. Now it fails on NNLSSuite, while ALSSuite succeeds.

{code}
[info] NNLSSuite:
[info] Exception encountered when attempting to run a suite with class name: 
org.apache.spark.mllib.optimization.NNLSSuite *** ABORTED *** (68 milliseconds)
[info]   java.lang.UnsatisfiedLinkError: 
org.jblas.NativeBlas.dgemm(CCIIID[DII[DIID[DII)V
[info]   at org.jblas.NativeBlas.dgemm(Native Method)
[info]   at org.jblas.SimpleBlas.gemm(SimpleBlas.java:247)
[info]   at org.jblas.DoubleMatrix.mmuli(DoubleMatrix.java:1781)
[info]   at org.jblas.DoubleMatrix.mmul(DoubleMatrix.java:3138)
[info]   at 
org.apache.spark.mllib.optimization.NNLSSuite.genOnesData(NNLSSuite.scala:33)
[info]   at 
org.apache.spark.mllib.optimization.NNLSSuite$$anonfun$2$$anonfun$apply$mcV$sp$1.apply$mcVI$sp(NNLSSuite.scala:56)
[info]   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:166)
[info]   at 
org.apache.spark.mllib.optimization.NNLSSuite$$anonfun$2.apply$mcV$sp(NNLSSuite.scala:55)
[info]   at 
org.apache.spark.mllib.optimization.NNLSSuite$$anonfun$2.apply(NNLSSuite.scala:45)
[info]   at 
org.apache.spark.mllib.optimization.NNLSSuite$$anonfun$2.apply(NNLSSuite.scala:45)
{code}
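For reference, a minimal standalone reproduction outside the test suite (a sketch assuming jblas is on the classpath; on aarch64 without a matching jblas native library and libgfortran it fails with the same UnsatisfiedLinkError as above):

{code}
import org.jblas.DoubleMatrix

object JblasCheck {
  def main(args: Array[String]): Unit = {
    val a = new DoubleMatrix(2, 2, 1.0, 2.0, 3.0, 4.0)
    val b = new DoubleMatrix(2, 2, 5.0, 6.0, 7.0, 8.0)
    // mmul goes through NativeBlas.dgemm, the native call that aborts the suites above.
    println(a.mmul(b))
  }
}
{code}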

> MLlib ALS fails on arm64 (java.lang.UnsatisfiedLinkError: 
> org.jblas.NativeBlas.dgemm))
> --
>
> Key: SPARK-13701
> URL: https://issues.apache.org/jira/browse/SPARK-13701
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
> Environment: Ubuntu 14.04 on aarch64
>Reporter: Santiago M. Mola
>Priority: Minor
>  Labels: arm64, porting
>
> jblas fails on arm64.
> {code}
> ALSSuite:
> Exception encountered when attempting to run a suite with class name: 
> org.apache.spark.mllib.recommendation.ALSSuite *** ABORTED *** (112 
> milliseconds)
>   java.lang.UnsatisfiedLinkError: 
> org.jblas.NativeBlas.dgemm(CCIIID[DII[DIID[DII)V
>   at org.jblas.NativeBlas.dgemm(Native Method)
>   at org.jblas.SimpleBlas.gemm(SimpleBlas.java:247)
>   at org.jblas.DoubleMatrix.mmuli(DoubleMatrix.java:1781)
>   at org.jblas.DoubleMatrix.mmul(DoubleMatrix.java:3138)
>   at 
> org.apache.spark.mllib.recommendation.ALSSuite$.generateRatings(ALSSuite.scala:74)
> {code}






[jira] [Commented] (SPARK-13701) MLlib ALS fails on arm64 (java.lang.UnsatisfiedLinkError: org.jblas.NativeBlas.dgemm))

2016-03-05 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15181921#comment-15181921
 ] 

Santiago M. Mola commented on SPARK-13701:
--

This is probably just gfortran not being installed? I'll test as soon as 
possible.

> MLlib ALS fails on arm64 (java.lang.UnsatisfiedLinkError: 
> org.jblas.NativeBlas.dgemm))
> --
>
> Key: SPARK-13701
> URL: https://issues.apache.org/jira/browse/SPARK-13701
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
> Environment: Ubuntu 14.04 on aarch64
>Reporter: Santiago M. Mola
>Priority: Minor
>  Labels: arm64, porting
>
> jblas fails on arm64.
> {code}
> ALSSuite:
> Exception encountered when attempting to run a suite with class name: 
> org.apache.spark.mllib.recommendation.ALSSuite *** ABORTED *** (112 
> milliseconds)
>   java.lang.UnsatisfiedLinkError: 
> org.jblas.NativeBlas.dgemm(CCIIID[DII[DIID[DII)V
>   at org.jblas.NativeBlas.dgemm(Native Method)
>   at org.jblas.SimpleBlas.gemm(SimpleBlas.java:247)
>   at org.jblas.DoubleMatrix.mmuli(DoubleMatrix.java:1781)
>   at org.jblas.DoubleMatrix.mmul(DoubleMatrix.java:3138)
>   at 
> org.apache.spark.mllib.recommendation.ALSSuite$.generateRatings(ALSSuite.scala:74)
> {code}






[jira] [Created] (SPARK-13701) MLlib ALS fails on arm64 (java.lang.UnsatisfiedLinkError: org.jblas.NativeBlas.dgemm))

2016-03-05 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-13701:


 Summary: MLlib ALS fails on arm64 (java.lang.UnsatisfiedLinkError: 
org.jblas.NativeBlas.dgemm))
 Key: SPARK-13701
 URL: https://issues.apache.org/jira/browse/SPARK-13701
 Project: Spark
  Issue Type: Bug
  Components: MLlib
 Environment: Ubuntu 14.04 on aarch64
Reporter: Santiago M. Mola
Priority: Minor


jblas fails on arm64.

{code}
ALSSuite:
Exception encountered when attempting to run a suite with class name: 
org.apache.spark.mllib.recommendation.ALSSuite *** ABORTED *** (112 
milliseconds)
  java.lang.UnsatisfiedLinkError: 
org.jblas.NativeBlas.dgemm(CCIIID[DII[DIID[DII)V
  at org.jblas.NativeBlas.dgemm(Native Method)
  at org.jblas.SimpleBlas.gemm(SimpleBlas.java:247)
  at org.jblas.DoubleMatrix.mmuli(DoubleMatrix.java:1781)
  at org.jblas.DoubleMatrix.mmul(DoubleMatrix.java:3138)
  at 
org.apache.spark.mllib.recommendation.ALSSuite$.generateRatings(ALSSuite.scala:74)
{code}







[jira] [Commented] (SPARK-13690) UnsafeShuffleWriterSuite fails on arm64 (SnappyError, no native library is found)

2016-03-04 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15181362#comment-15181362
 ] 

Santiago M. Mola commented on SPARK-13690:
--

snappy-java does not have any fallback, but snappy itself seems to work correctly 
on arm64. I submitted a PR to snappy-java, so a future version should include 
support. This issue will have to wait until that version is out.

I don't expect active support for arm64, but given the latest developments in 
arm64 servers, I'm interested in experimenting with it. It seems I'm not the 
first one to think about it: http://www.sparkonarm.com/ ;-)
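For reference, a minimal check that triggers the same native-library load (a sketch assuming snappy-java on the classpath; the calls below are standard snappy-java API):

{code}
import org.xerial.snappy.Snappy

object SnappyCheck {
  def main(args: Array[String]): Unit = {
    // Forces loading of the snappy-java native library; on aarch64, with versions
    // that ship no aarch64 binary, this throws SnappyError (FAILED_TO_LOAD_NATIVE_LIBRARY).
    val compressed = Snappy.compress("hello snappy".getBytes("UTF-8"))
    println(Snappy.uncompressString(compressed))
  }
}
{code}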

> UnsafeShuffleWriterSuite fails on arm64 (SnappyError, no native library is 
> found)
> -
>
> Key: SPARK-13690
> URL: https://issues.apache.org/jira/browse/SPARK-13690
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
> Environment: $ java -version
> java version "1.8.0_73"
> Java(TM) SE Runtime Environment (build 1.8.0_73-b02)
> Java HotSpot(TM) 64-Bit Server VM (build 25.73-b02, mixed mode)
> $ uname -a
> Linux spark-on-arm 4.2.0-55598-g45f70e3 #5 SMP Tue Feb 2 10:14:08 CET 2016 
> aarch64 aarch64 aarch64 GNU/Linux
>Reporter: Santiago M. Mola
>Priority: Minor
>  Labels: arm64, porting
>
> UnsafeShuffleWriterSuite fails because of missing Snappy native library on 
> arm64.
> {code}
> Tests run: 19, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 6.437 sec 
> <<< FAILURE! - in org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite
> mergeSpillsWithFileStreamAndSnappy(org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite)
>   Time elapsed: 0.072 sec  <<< ERROR!
> java.lang.reflect.InvocationTargetException
>   at 
> org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.testMergingSpills(UnsafeShuffleWriterSuite.java:337)
>   at 
> org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.mergeSpillsWithFileStreamAndSnappy(UnsafeShuffleWriterSuite.java:389)
> Caused by: java.lang.IllegalArgumentException: org.xerial.snappy.SnappyError: 
> [FAILED_TO_LOAD_NATIVE_LIBRARY] no native library is found for os.name=Linux 
> and os.arch=aarch64
>   at 
> org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.testMergingSpills(UnsafeShuffleWriterSuite.java:337)
>   at 
> org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.mergeSpillsWithFileStreamAndSnappy(UnsafeShuffleWriterSuite.java:389)
> Caused by: org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] no 
> native library is found for os.name=Linux and os.arch=aarch64
>   at 
> org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.testMergingSpills(UnsafeShuffleWriterSuite.java:337)
>   at 
> org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.mergeSpillsWithFileStreamAndSnappy(UnsafeShuffleWriterSuite.java:389)
> mergeSpillsWithTransferToAndSnappy(org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite)
>   Time elapsed: 0.041 sec  <<< ERROR!
> java.lang.reflect.InvocationTargetException
>   at 
> org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.testMergingSpills(UnsafeShuffleWriterSuite.java:337)
>   at 
> org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.mergeSpillsWithTransferToAndSnappy(UnsafeShuffleWriterSuite.java:384)
> Caused by: java.lang.IllegalArgumentException: 
> java.lang.NoClassDefFoundError: Could not initialize class 
> org.xerial.snappy.Snappy
>   at 
> org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.testMergingSpills(UnsafeShuffleWriterSuite.java:337)
>   at 
> org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.mergeSpillsWithTransferToAndSnappy(UnsafeShuffleWriterSuite.java:384)
> Caused by: java.lang.NoClassDefFoundError: Could not initialize class 
> org.xerial.snappy.Snappy
>   at 
> org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.testMergingSpills(UnsafeShuffleWriterSuite.java:337)
>   at 
> org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.mergeSpillsWithTransferToAndSnappy(UnsafeShuffleWriterSuite.java:384)
> Running org.apache.spark.JavaAPISuite
> Tests run: 90, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 52.526 sec - 
> in org.apache.spark.JavaAPISuite
> Running org.apache.spark.unsafe.map.BytesToBytesMapOnHeapSuite
> Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 9.761 sec - 
> in org.apache.spark.unsafe.map.BytesToBytesMapOnHeapSuite
> Running org.apache.spark.unsafe.map.BytesToBytesMapOffHeapSuite
> Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 8.967 sec - 
> in org.apache.spark.unsafe.map.BytesToBytesMapOffHeapSuite
> Running org.apache.spark.api.java.OptionalSuite
> Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.003 sec - 
> in org.apache.spark.api.java.OptionalSuite
> Results :

[jira] [Created] (SPARK-13690) UnsafeShuffleWriterSuite fails on arm64 (SnappyError, no native library is found)

2016-03-04 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-13690:


 Summary: UnsafeShuffleWriterSuite fails on arm64 (SnappyError, no 
native library is found)
 Key: SPARK-13690
 URL: https://issues.apache.org/jira/browse/SPARK-13690
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.0.0
 Environment: $ java -version
java version "1.8.0_73"
Java(TM) SE Runtime Environment (build 1.8.0_73-b02)
Java HotSpot(TM) 64-Bit Server VM (build 25.73-b02, mixed mode)

$ uname -a
Linux spark-on-arm 4.2.0-55598-g45f70e3 #5 SMP Tue Feb 2 10:14:08 CET 2016 
aarch64 aarch64 aarch64 GNU/Linux
Reporter: Santiago M. Mola
Priority: Minor


UnsafeShuffleWriterSuite fails because of missing Snappy native library on 
arm64.


{code}
Tests run: 19, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 6.437 sec <<< 
FAILURE! - in org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite
mergeSpillsWithFileStreamAndSnappy(org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite)
  Time elapsed: 0.072 sec  <<< ERROR!
java.lang.reflect.InvocationTargetException
at 
org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.testMergingSpills(UnsafeShuffleWriterSuite.java:337)
at 
org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.mergeSpillsWithFileStreamAndSnappy(UnsafeShuffleWriterSuite.java:389)
Caused by: java.lang.IllegalArgumentException: org.xerial.snappy.SnappyError: 
[FAILED_TO_LOAD_NATIVE_LIBRARY] no native library is found for os.name=Linux 
and os.arch=aarch64
at 
org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.testMergingSpills(UnsafeShuffleWriterSuite.java:337)
at 
org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.mergeSpillsWithFileStreamAndSnappy(UnsafeShuffleWriterSuite.java:389)
Caused by: org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] no 
native library is found for os.name=Linux and os.arch=aarch64
at 
org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.testMergingSpills(UnsafeShuffleWriterSuite.java:337)
at 
org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.mergeSpillsWithFileStreamAndSnappy(UnsafeShuffleWriterSuite.java:389)

mergeSpillsWithTransferToAndSnappy(org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite)
  Time elapsed: 0.041 sec  <<< ERROR!
java.lang.reflect.InvocationTargetException
at 
org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.testMergingSpills(UnsafeShuffleWriterSuite.java:337)
at 
org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.mergeSpillsWithTransferToAndSnappy(UnsafeShuffleWriterSuite.java:384)
Caused by: java.lang.IllegalArgumentException: java.lang.NoClassDefFoundError: 
Could not initialize class org.xerial.snappy.Snappy
at 
org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.testMergingSpills(UnsafeShuffleWriterSuite.java:337)
at 
org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.mergeSpillsWithTransferToAndSnappy(UnsafeShuffleWriterSuite.java:384)
Caused by: java.lang.NoClassDefFoundError: Could not initialize class 
org.xerial.snappy.Snappy
at 
org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.testMergingSpills(UnsafeShuffleWriterSuite.java:337)
at 
org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.mergeSpillsWithTransferToAndSnappy(UnsafeShuffleWriterSuite.java:384)

Running org.apache.spark.JavaAPISuite
Tests run: 90, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 52.526 sec - 
in org.apache.spark.JavaAPISuite
Running org.apache.spark.unsafe.map.BytesToBytesMapOnHeapSuite
Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 9.761 sec - in 
org.apache.spark.unsafe.map.BytesToBytesMapOnHeapSuite
Running org.apache.spark.unsafe.map.BytesToBytesMapOffHeapSuite
Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 8.967 sec - in 
org.apache.spark.unsafe.map.BytesToBytesMapOffHeapSuite
Running org.apache.spark.api.java.OptionalSuite
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.003 sec - in 
org.apache.spark.api.java.OptionalSuite

Results :

Tests in error: 
  
UnsafeShuffleWriterSuite.mergeSpillsWithFileStreamAndSnappy:389->testMergingSpills:337
 » InvocationTarget
  
UnsafeShuffleWriterSuite.mergeSpillsWithTransferToAndSnappy:384->testMergingSpills:337
 » InvocationTarget
{code}






[jira] [Commented] (SPARK-12449) Pushing down arbitrary logical plans to data sources

2016-01-12 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15093533#comment-15093533
 ] 

Santiago M. Mola commented on SPARK-12449:
--

Implementing this interface or an equivalent one would help standardize a lot of 
the advanced features that data sources have been implementing for some time. And 
while doing so, it would prevent them from creating their own SQLContext variants 
or patching the running SQLContext at runtime (using extraStrategies; a minimal 
sketch of that injection pattern follows the list of connectors below).

Here's a list of data sources that are currently using this approach. It would 
also be good to take them into account for this JIRA. The proposed interface and 
strategy should probably support all of these use cases. Some of them also use 
their own catalog implementation, but that should be left for a separate JIRA.

*spark-sql-on-hbase*

Already mentioned by [~yzhou2001]. They use HBaseContext with extraStrategies to 
inject HBaseStrategies, which perform aggregation push-down:
https://github.com/Huawei-Spark/Spark-SQL-on-HBase/blob/master/src/main/scala/org/apache/spark/sql/hbase/execution/HBaseStrategies.scala

*memsql-spark-connector*

They offer either their own SQLContext or runtime injection of their 
MemSQL-specific push-down strategy.
They match Catalyst's LogicalPlan in the same way we're proposing, pushing 
down filters, projects, aggregates, limits, sorts and joins:
https://github.com/memsql/memsql-spark-connector/blob/master/connectorLib/src/main/scala/com/memsql/spark/pushdown/MemSQLPushdownStrategy.scala

*spark-iqmulus*

Strategy injected to push down counts and some aggregates:

https://github.com/IGNF/spark-iqmulus/blob/master/src/main/scala/fr/ign/spark/iqmulus/ExtraStrategies.scala

*druid-olap*

They use SparkPlanner, Strategy and LogicalPlan APIs to do extensive push down. 
Their API usage could be limited to LogicalPlan only if this JIRA is 
implemented:

https://github.com/SparklineData/spark-druid-olap/blob/master/src/main/scala/org/apache/spark/sql/sources/druid/

*magellan* _(probably out of scope)_

Does its own BroadcastJoin, although it seems to me that this usage would be 
out of scope for us.

https://github.com/harsha2010/magellan/blob/master/src/main/scala/magellan/execution/MagellanStrategies.scala
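For reference, a minimal sketch of the extraStrategies injection pattern mentioned above (Spark 1.x Strategy and experimental.extraStrategies APIs; the matching logic and names are illustrative, not taken from any of the projects listed):

{code}
import org.apache.spark.sql.Strategy
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.execution.SparkPlan

object ExamplePushdownStrategy extends Strategy {
  def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    // Recognize subtrees the external source can evaluate and return a physical
    // operator that delegates to it; the real matching logic would go here.
    case _ => Nil // fall through to Spark's built-in strategies
  }
}

// Injected at runtime, as the connectors above do:
// sqlContext.experimental.extraStrategies = ExamplePushdownStrategy :: Nil
{code}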

> Pushing down arbitrary logical plans to data sources
> 
>
> Key: SPARK-12449
> URL: https://issues.apache.org/jira/browse/SPARK-12449
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Stephan Kessler
> Attachments: pushingDownLogicalPlans.pdf
>
>
> With the help of the DataSource API we can pull data from external sources 
> for processing. Implementing interfaces such as {{PrunedFilteredScan}} allows 
> to push down filters and projects pruning unnecessary fields and rows 
> directly in the data source.
> However, data sources such as SQL Engines are capable of doing even more 
> preprocessing, e.g., evaluating aggregates. This is beneficial because it 
> would reduce the amount of data transferred from the source to Spark. The 
> existing interfaces do not allow such kind of processing in the source.
> We would propose to add a new interface {{CatalystSource}} that allows to 
> defer the processing of arbitrary logical plans to the data source. We have 
> already shown the details at the Spark Summit 2015 Europe 
> [https://spark-summit.org/eu-2015/events/the-pushdown-of-everything/]
> I will add a design document explaining details. 






[jira] [Commented] (SPARK-12449) Pushing down arbitrary logical plans to data sources

2015-12-23 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070062#comment-15070062
 ] 

Santiago M. Mola commented on SPARK-12449:
--

The physical plan would not be consumed by data sources, only the logical plan. 

An alternative approach would be to use a different representation to pass the 
logical plan to the data source. If the relational algebra from Apache Calcite 
is stable enough, it could be used as the logical plan representation for this 
interface. 

> Pushing down arbitrary logical plans to data sources
> 
>
> Key: SPARK-12449
> URL: https://issues.apache.org/jira/browse/SPARK-12449
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Stephan Kessler
> Attachments: pushingDownLogicalPlans.pdf
>
>
> With the help of the DataSource API we can pull data from external sources 
> for processing. Implementing interfaces such as {{PrunedFilteredScan}} allows 
> to push down filters and projects pruning unnecessary fields and rows 
> directly in the data source.
> However, data sources such as SQL Engines are capable of doing even more 
> preprocessing, e.g., evaluating aggregates. This is beneficial because it 
> would reduce the amount of data transferred from the source to Spark. The 
> existing interfaces do not allow such kind of processing in the source.
> We would propose to add a new interface {{CatalystSource}} that allows to 
> defer the processing of arbitrary logical plans to the data source. We have 
> already shown the details at the Spark Summit 2015 Europe 
> [https://spark-summit.org/eu-2015/events/the-pushdown-of-everything/]
> I will add a design document explaining details. 






[jira] [Commented] (SPARK-12449) Pushing down arbitrary logical plans to data sources

2015-12-23 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070100#comment-15070100
 ] 

Santiago M. Mola commented on SPARK-12449:
--

Well, at least with the implementation presented at the Spark Summit, only the 
logical plan is required. The physical plan is handled only by the planner 
strategy, which would be internal to Spark.

The strategy has all the logic required to split partial ops and push down only 
one part.

> Pushing down arbitrary logical plans to data sources
> 
>
> Key: SPARK-12449
> URL: https://issues.apache.org/jira/browse/SPARK-12449
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Stephan Kessler
> Attachments: pushingDownLogicalPlans.pdf
>
>
> With the help of the DataSource API we can pull data from external sources 
> for processing. Implementing interfaces such as {{PrunedFilteredScan}} allows 
> to push down filters and projects pruning unnecessary fields and rows 
> directly in the data source.
> However, data sources such as SQL Engines are capable of doing even more 
> preprocessing, e.g., evaluating aggregates. This is beneficial because it 
> would reduce the amount of data transferred from the source to Spark. The 
> existing interfaces do not allow such kind of processing in the source.
> We would propose to add a new interface {{CatalystSource}} that allows to 
> defer the processing of arbitrary logical plans to the data source. We have 
> already shown the details at the Spark Summit 2015 Europe 
> [https://spark-summit.org/eu-2015/events/the-pushdown-of-everything/]
> I will add a design document explaining details. 






[jira] [Commented] (SPARK-11855) Catalyst breaks backwards compatibility in branch-1.6

2015-12-22 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15068396#comment-15068396
 ] 

Santiago M. Mola commented on SPARK-11855:
--

I will not have time to finish this before the 1.6 release. Feel free to close the 
issue, since it won't apply after the release.

> Catalyst breaks backwards compatibility in branch-1.6
> -
>
> Key: SPARK-11855
> URL: https://issues.apache.org/jira/browse/SPARK-11855
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Santiago M. Mola
>Priority: Critical
>
> There are a number of APIs broken in catalyst 1.6.0. I'm trying to compile most 
> cases:
> *UnresolvedRelation*'s constructor has been changed from taking a Seq to a 
> TableIdentifier. A deprecated constructor taking Seq would be needed to be 
> backwards compatible.
> {code}
>  case class UnresolvedRelation(
> -tableIdentifier: Seq[String],
> +tableIdentifier: TableIdentifier,
>  alias: Option[String] = None) extends LeafNode {
> {code}
> It is similar with *UnresolvedStar*:
> {code}
> -case class UnresolvedStar(table: Option[String]) extends Star with 
> Unevaluable {
> +case class UnresolvedStar(target: Option[Seq[String]]) extends Star with 
> Unevaluable {
> {code}
> *Catalog* did get a lot of signatures changed too (because of 
> TableIdentifier). Providing the older methods as deprecated also seems viable 
> here.
> Spark 1.5 already broke backwards compatibility of part of catalyst API with 
> respect to 1.4. I understand there are good reasons for some cases, but we 
> should try to minimize backwards compatibility breakages for 1.x. Especially 
> now that 2.x is on the horizon and there will be a near opportunity to remove 
> deprecated stuff.






[jira] [Commented] (SPARK-12449) Pushing down arbitrary logical plans to data sources

2015-12-21 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066655#comment-15066655
 ] 

Santiago M. Mola commented on SPARK-12449:
--

At Stratio we are interested in this kind of interface too, both for SQL and 
NoSQL data sources (e.g. MongoDB).

> Pushing down arbitrary logical plans to data sources
> 
>
> Key: SPARK-12449
> URL: https://issues.apache.org/jira/browse/SPARK-12449
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Stephan Kessler
> Attachments: pushingDownLogicalPlans.pdf
>
>
> With the help of the DataSource API we can pull data from external sources 
> for processing. Implementing interfaces such as {{PrunedFilteredScan}} allows 
> to push down filters and projects pruning unnecessary fields and rows 
> directly in the data source.
> However, data sources such as SQL Engines are capable of doing even more 
> preprocessing, e.g., evaluating aggregates. This is beneficial because it 
> would reduce the amount of data transferred from the source to Spark. The 
> existing interfaces do not allow such kind of processing in the source.
> We would propose to add a new interface {{CatalystSource}} that allows to 
> defer the processing of arbitrary logical plans to the data source. We have 
> already shown the details at the Spark Summit 2015 Europe 
> [https://spark-summit.org/eu-2015/events/the-pushdown-of-everything/]
> I will add a design document explaining details. 






[jira] [Commented] (SPARK-11855) Catalyst breaks backwards compatibility in branch-1.6

2015-11-19 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014193#comment-15014193
 ] 

Santiago M. Mola commented on SPARK-11855:
--

Thanks Michael. Sounds reasonable. I'll prepare a PR reducing the 
incompatibilities where it can be done in a non-invasive way. 

> Catalyst breaks backwards compatibility in branch-1.6
> -
>
> Key: SPARK-11855
> URL: https://issues.apache.org/jira/browse/SPARK-11855
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Santiago M. Mola
>Priority: Critical
>
> There are a number of APIs broken in catalyst 1.6.0. I'm trying to compile most 
> cases:
> *UnresolvedRelation*'s constructor has been changed from taking a Seq to a 
> TableIdentifier. A deprecated constructor taking Seq would be needed to be 
> backwards compatible.
> {code}
>  case class UnresolvedRelation(
> -tableIdentifier: Seq[String],
> +tableIdentifier: TableIdentifier,
>  alias: Option[String] = None) extends LeafNode {
> {code}
> It is similar with *UnresolvedStar*:
> {code}
> -case class UnresolvedStar(table: Option[String]) extends Star with 
> Unevaluable {
> +case class UnresolvedStar(target: Option[Seq[String]]) extends Star with 
> Unevaluable {
> {code}
> *Catalog* did get a lot of signatures changed too (because of 
> TableIdentifier). Providing the older methods as deprecated also seems viable 
> here.
> Spark 1.5 already broke backwards compatibility of part of catalyst API with 
> respect to 1.4. I understand there are good reasons for some cases, but we 
> should try to minimize backwards compatibility breakages for 1.x. Especially 
> now that 2.x is on the horizon and there will be a near opportunity to remove 
> deprecated stuff.






[jira] [Created] (SPARK-11855) UnresolvedRelation constructor is not backwards compatible in branch-1.6

2015-11-19 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-11855:


 Summary: UnresolvedRelation constructor is not backwards 
compatible in branch-1.6
 Key: SPARK-11855
 URL: https://issues.apache.org/jira/browse/SPARK-11855
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.6.0
Reporter: Santiago M. Mola


UnresolvedRelation's constructor has been changed from taking a Seq to a 
TableIdentifier. A deprecated constructor taking Seq would be needed to be 
backwards compatible.

{code}
 case class UnresolvedRelation(
-tableIdentifier: Seq[String],
+tableIdentifier: TableIdentifier,
 alias: Option[String] = None) extends LeafNode {
{code}
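A minimal sketch of what such a deprecated bridge constructor could look like (a hypothetical shim shown only to illustrate the suggestion, assuming 1.6's TableIdentifier(table, database) shape; not actual Spark code):

{code}
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.catalyst.plans.logical.LeafNode

// Keep the new TableIdentifier-based primary constructor and add a deprecated
// auxiliary constructor accepting the pre-1.6 Seq[String] form.
case class UnresolvedRelation(
    tableIdentifier: TableIdentifier,
    alias: Option[String] = None) extends LeafNode {

  // Assumes a non-empty Seq, as the old constructor did.
  @deprecated("Use TableIdentifier instead of Seq[String]", "1.6.0")
  def this(tableIdentifier: Seq[String], alias: Option[String]) =
    this(
      TableIdentifier(tableIdentifier.last, tableIdentifier.dropRight(1).lastOption),
      alias)

  override def output: Seq[Attribute] = Nil
  override lazy val resolved = false
}
{code}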






[jira] [Updated] (SPARK-11855) UnresolvedRelation/UnresolvedStar constructors are not backwards compatible in branch-1.6

2015-11-19 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-11855:
-
Description: 
UnresolvedRelation's constructor has been changed from taking a Seq to a 
TableIdentifier. A deprecated constructor taking Seq would be needed to be 
backwards compatible.

{code}
 case class UnresolvedRelation(
-tableIdentifier: Seq[String],
+tableIdentifier: TableIdentifier,
 alias: Option[String] = None) extends LeafNode {
{code}

It is similar with UnresolvedStar:

{code}
-case class UnresolvedStar(table: Option[String]) extends Star with Unevaluable 
{
+case class UnresolvedStar(target: Option[Seq[String]]) extends Star with 
Unevaluable {
{code}

  was:
UnresolvedRelation's constructor has been changed from taking a Seq to a 
TableIdentifier. A deprecated constructor taking Seq would be needed to be 
backwards compatible.

{code}
 case class UnresolvedRelation(
-tableIdentifier: Seq[String],
+tableIdentifier: TableIdentifier,
 alias: Option[String] = None) extends LeafNode {
{code}


> UnresolvedRelation/UnresolvedStar constructors are not backwards compatible 
> in branch-1.6
> -
>
> Key: SPARK-11855
> URL: https://issues.apache.org/jira/browse/SPARK-11855
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Santiago M. Mola
>
> UnresolvedRelation's constructor has been changed from taking a Seq to a 
> TableIdentifier. A deprecated constructor taking Seq would be needed to be 
> backwards compatible.
> {code}
>  case class UnresolvedRelation(
> -tableIdentifier: Seq[String],
> +tableIdentifier: TableIdentifier,
>  alias: Option[String] = None) extends LeafNode {
> {code}
> It is similar with UnresolvedStar:
> {code}
> -case class UnresolvedStar(table: Option[String]) extends Star with 
> Unevaluable {
> +case class UnresolvedStar(target: Option[Seq[String]]) extends Star with 
> Unevaluable {
> {code}






[jira] [Updated] (SPARK-11855) UnresolvedRelation/UnresolvedStar constructors are not backwards compatible in branch-1.6

2015-11-19 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-11855:
-
Summary: UnresolvedRelation/UnresolvedStar constructors are not backwards 
compatible in branch-1.6  (was: UnresolvedRelation constructor is not backwards 
compatible in branch-1.6)

> UnresolvedRelation/UnresolvedStar constructors are not backwards compatible 
> in branch-1.6
> -
>
> Key: SPARK-11855
> URL: https://issues.apache.org/jira/browse/SPARK-11855
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Santiago M. Mola
>
> UnresolvedRelation's constructor has been changed from taking a Seq to a 
> TableIdentifier. A deprecated constructor taking Seq would be needed to be 
> backwards compatible.
> {code}
>  case class UnresolvedRelation(
> -tableIdentifier: Seq[String],
> +tableIdentifier: TableIdentifier,
>  alias: Option[String] = None) extends LeafNode {
> {code}






[jira] [Updated] (SPARK-11855) Catalyst breaks backwards compatibility in branch-1.6

2015-11-19 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-11855:
-
Summary: Catalyst breaks backwards compatibility in branch-1.6  (was: 
UnresolvedRelation/UnresolvedStar constructors are not backwards compatible in 
branch-1.6)

> Catalyst breaks backwards compatibility in branch-1.6
> -
>
> Key: SPARK-11855
> URL: https://issues.apache.org/jira/browse/SPARK-11855
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Santiago M. Mola
>
> UnresolvedRelation's constructor has been changed from taking a Seq to a 
> TableIdentifier. A deprecated constructor taking Seq would be needed to be 
> backwards compatible.
> {code}
>  case class UnresolvedRelation(
> -tableIdentifier: Seq[String],
> +tableIdentifier: TableIdentifier,
>  alias: Option[String] = None) extends LeafNode {
> {code}
> It is similar with UnresolvedStar:
> {code}
> -case class UnresolvedStar(table: Option[String]) extends Star with 
> Unevaluable {
> +case class UnresolvedStar(target: Option[Seq[String]]) extends Star with 
> Unevaluable {
> {code}






[jira] [Updated] (SPARK-11855) Catalyst breaks backwards compatibility in branch-1.6

2015-11-19 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-11855:
-
   Priority: Critical  (was: Major)
Description: 
There are a number of APIs broken in catalyst 1.6.0. I'm trying to compile most 
cases:

*UnresolvedRelation*'s constructor has been changed from taking a Seq to a 
TableIdentifier. A deprecated constructor taking Seq would be needed to be 
backwards compatible.

{code}
 case class UnresolvedRelation(
-tableIdentifier: Seq[String],
+tableIdentifier: TableIdentifier,
 alias: Option[String] = None) extends LeafNode {
{code}

It is similar with *UnresolvedStar*:

{code}
-case class UnresolvedStar(table: Option[String]) extends Star with Unevaluable 
{
+case class UnresolvedStar(target: Option[Seq[String]]) extends Star with 
Unevaluable {
{code}

Spark 1.5 already broke backwards compatibility of part of catalyst API with 
respect to 1.4. I understand there are good reasons for some cases, but we 
should try to minimize backwards compatibility breakages for 1.x. Especially now 
that 2.x is on the horizon and there will be a near opportunity to remove 
deprecated stuff.

  was:
UnresolvedRelation's constructor has been changed from taking a Seq to a 
TableIdentifier. A deprecated constructor taking Seq would be needed to be 
backwards compatible.

{code}
 case class UnresolvedRelation(
-tableIdentifier: Seq[String],
+tableIdentifier: TableIdentifier,
 alias: Option[String] = None) extends LeafNode {
{code}

It is similar with UnresolvedStar:

{code}
-case class UnresolvedStar(table: Option[String]) extends Star with Unevaluable 
{
+case class UnresolvedStar(target: Option[Seq[String]]) extends Star with 
Unevaluable {
{code}


> Catalyst breaks backwards compatibility in branch-1.6
> -
>
> Key: SPARK-11855
> URL: https://issues.apache.org/jira/browse/SPARK-11855
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Santiago M. Mola
>Priority: Critical
>
> There are a number of APIs broken in catalyst 1.6.0. I'm trying to compile most 
> cases:
> *UnresolvedRelation*'s constructor has been changed from taking a Seq to a 
> TableIdentifier. A deprecated constructor taking Seq would be needed to be 
> backwards compatible.
> {code}
>  case class UnresolvedRelation(
> -tableIdentifier: Seq[String],
> +tableIdentifier: TableIdentifier,
>  alias: Option[String] = None) extends LeafNode {
> {code}
> It is similar with *UnresolvedStar*:
> {code}
> -case class UnresolvedStar(table: Option[String]) extends Star with 
> Unevaluable {
> +case class UnresolvedStar(target: Option[Seq[String]]) extends Star with 
> Unevaluable {
> {code}
> Spark 1.5 already broke backwards compatibility of part of catalyst API with 
> respect to 1.4. I understand there are good reasons for some cases, but we 
> should try to minimize backwards compatibility breakages for 1.x. Especially 
> now that 2.x is on the horizon and there will be a near opportunity to remove 
> deprecated stuff.






[jira] [Commented] (SPARK-11855) Catalyst breaks backwards compatibility in branch-1.6

2015-11-19 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013914#comment-15013914
 ] 

Santiago M. Mola commented on SPARK-11855:
--

They have public visibility and no @DeveloperApi or @Experimental annotations, 
so I always assumed they were public API. I have been working with catalyst on a 
day-to-day basis for almost a year now. I understand that catalyst might not offer 
the same kind of backwards compatibility as spark-core, but it would be good to 
avoid breaking backwards compatibility, especially in cases where it is easy to 
do (which are most of the cases I encounter).

I think part of the solution is also marking some parts as @Experimental. For 
example, the UnsafeArrayData interface changed wildly, and it's probably not 
viable to maintain backwards compatibility there, but it should be marked as 
@Experimental if more breakage is expected before 2.0.
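For illustration, applying the existing annotation is all that would be needed (sketch only; the class below is a stand-in, not real catalyst code):

{code}
import org.apache.spark.annotation.Experimental

// Stand-in class for illustration: flagging it tells users that binary
// compatibility may break between releases.
@Experimental
class StillEvolvingCatalystApi {
  def version: String = "unstable"
}
{code}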

> Catalyst breaks backwards compatibility in branch-1.6
> -
>
> Key: SPARK-11855
> URL: https://issues.apache.org/jira/browse/SPARK-11855
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Santiago M. Mola
>Priority: Critical
>
> There are a number of APIs broken in catalyst 1.6.0. I'm trying to compile most 
> cases:
> *UnresolvedRelation*'s constructor has been changed from taking a Seq to a 
> TableIdentifier. A deprecated constructor taking Seq would be needed to be 
> backwards compatible.
> {code}
>  case class UnresolvedRelation(
> -tableIdentifier: Seq[String],
> +tableIdentifier: TableIdentifier,
>  alias: Option[String] = None) extends LeafNode {
> {code}
> It is similar with *UnresolvedStar*:
> {code}
> -case class UnresolvedStar(table: Option[String]) extends Star with 
> Unevaluable {
> +case class UnresolvedStar(target: Option[Seq[String]]) extends Star with 
> Unevaluable {
> {code}
> Spark 1.5 already broke backwards compatibility of part of catalyst API with 
> respect to 1.4. I understand there are good reasons for some cases, but we 
> should try to minimize backwards compatibility breakages for 1.x. Especially 
> now that 2.x is on the horizon and there will be a near opportunity to remove 
> deprecated stuff.






[jira] [Updated] (SPARK-11855) Catalyst breaks backwards compatibility in branch-1.6

2015-11-19 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-11855:
-
Description: 
There are a number of APIs broken in catalyst 1.6.0. I'm trying to compile most 
cases:

*UnresolvedRelation*'s constructor has been changed from taking a Seq to a 
TableIdentifier. A deprecated constructor taking Seq would be needed to be 
backwards compatible.

{code}
 case class UnresolvedRelation(
-tableIdentifier: Seq[String],
+tableIdentifier: TableIdentifier,
 alias: Option[String] = None) extends LeafNode {
{code}

It is similar with *UnresolvedStar*:

{code}
-case class UnresolvedStar(table: Option[String]) extends Star with Unevaluable 
{
+case class UnresolvedStar(target: Option[Seq[String]]) extends Star with 
Unevaluable {
{code}

*Catalog* did get a lot of signatures changed too (because of TableIdentifier). 
Providing the older methods as deprecated also seems viable here.

Spark 1.5 already broke backwards compatibility of part of catalyst API with 
respect to 1.4. I understand there are good reasons for some cases, but we 
should try to minimize backwards compatibility breakages for 1.x. Especially now 
that 2.x is on the horizon and there will be a near opportunity to remove 
deprecated stuff.

  was:
There's a number of APIs broken in catalyst 1.6.0. I'm trying to compile most 
cases:

*UnresolvedRelation*'s constructor has been changed from taking a Seq to a 
TableIdentifier. A deprecated constructor taking Seq would be needed to be 
backwards compatible.

{code}
 case class UnresolvedRelation(
-tableIdentifier: Seq[String],
+tableIdentifier: TableIdentifier,
 alias: Option[String] = None) extends LeafNode {
{code}

It is similar with *UnresolvedStar*:

{code}
-case class UnresolvedStar(table: Option[String]) extends Star with Unevaluable 
{
+case class UnresolvedStar(target: Option[Seq[String]]) extends Star with 
Unevaluable {
{code}

Spark 1.5 already broke backwards compatibility of part of catalyst API with 
respect to 1.4. I understand there are good reasons for some cases, but we 
should try to minimize backwards compatibility breakages for 1.x. Specially now 
that 2.x is on the horizon and there will be a near opportunity to remove 
deprecated stuff.


> Catalyst breaks backwards compatibility in branch-1.6
> -
>
> Key: SPARK-11855
> URL: https://issues.apache.org/jira/browse/SPARK-11855
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Santiago M. Mola
>Priority: Critical
>
> There are a number of APIs broken in catalyst 1.6.0. I'm trying to compile most 
> cases:
> *UnresolvedRelation*'s constructor has been changed from taking a Seq to a 
> TableIdentifier. A deprecated constructor taking Seq would be needed to be 
> backwards compatible.
> {code}
>  case class UnresolvedRelation(
> -tableIdentifier: Seq[String],
> +tableIdentifier: TableIdentifier,
>  alias: Option[String] = None) extends LeafNode {
> {code}
> It is similar with *UnresolvedStar*:
> {code}
> -case class UnresolvedStar(table: Option[String]) extends Star with 
> Unevaluable {
> +case class UnresolvedStar(target: Option[Seq[String]]) extends Star with 
> Unevaluable {
> {code}
> *Catalog* did get a lot of signatures changed too (because of 
> TableIdentifier). Providing the older methods as deprecated also seems viable 
> here.
> Spark 1.5 already broke backwards compatibility of part of catalyst API with 
> respect to 1.4. I understand there are good reasons for some cases, but we 
> should try to minimize backwards compatibility breakages for 1.x. Especially 
> now that 2.x is on the horizon and there will be a near opportunity to remove 
> deprecated stuff.






[jira] [Created] (SPARK-11780) Provide type aliases in org.apache.spark.sql.types for backwards compatibility

2015-11-17 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-11780:


 Summary: Provide type aliases in org.apache.spark.sql.types for 
backwards compatibility
 Key: SPARK-11780
 URL: https://issues.apache.org/jira/browse/SPARK-11780
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.6.0
Reporter: Santiago M. Mola


With SPARK-11273, ArrayData, MapData and others were moved from  
org.apache.spark.sql.types to org.apache.spark.sql.catalyst.util.

Since this is a backward incompatible change, it would be good to provide type 
aliases from the old package (deprecated) to the new one.

For example:
{code}
package object types {
   @deprecated
   type ArrayData = org.apache.spark.sql.catalyst.util.ArrayData
}
{code}
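A slightly fuller sketch covering the classes the description mentions (a sketch of amending the existing org.apache.spark.sql.types package object; the deprecation messages and version strings are illustrative):

{code}
package org.apache.spark.sql

package object types {
  @deprecated("Moved to org.apache.spark.sql.catalyst.util", "1.6.0")
  type ArrayData = org.apache.spark.sql.catalyst.util.ArrayData

  @deprecated("Moved to org.apache.spark.sql.catalyst.util", "1.6.0")
  type MapData = org.apache.spark.sql.catalyst.util.MapData
}
{code}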






[jira] [Updated] (SPARK-11186) Caseness inconsistency between SQLContext and HiveContext

2015-10-19 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-11186:
-
Description: 
Default catalog behaviour for caseness is different in {{SQLContext}} and 
{{HiveContext}}.

{code}
  test("Catalog caseness (SQL)") {
val sqlc = new SQLContext(sc)
val relationName = "MyTable"
sqlc.catalog.registerTable(relationName :: Nil, LogicalRelation(new 
BaseRelation {
  override def sqlContext: SQLContext = sqlc
  override def schema: StructType = StructType(Nil)
}))
val tables = sqlc.tableNames()
assert(tables.contains(relationName))
  }

  test("Catalog caseness (Hive)") {
val sqlc = new HiveContext(sc)
val relationName = "MyTable"
sqlc.catalog.registerTable(relationName :: Nil, LogicalRelation(new 
BaseRelation {
  override def sqlContext: SQLContext = sqlc
  override def schema: StructType = StructType(Nil)
}))
val tables = sqlc.tableNames()
assert(tables.contains(relationName))
  }
{code}

Looking at {{HiveContext#SQLSession}}, I see this is the intended behaviour. 
But the reason this is needed seems undocumented (both in the manual and in 
the source code comments).

  was:
Default catalog behaviour for caseness is different in {{SQLContext}} and 
{{HiveContext}}.

{code}
  test("Catalog caseness (SQL)") {
val sqlc = new SQLContext(sc)
val relationName = "MyTable"
sqlc.catalog.registerTable(relationName :: Nil, LogicalRelation(new 
BaseRelation {
  override def sqlContext: SQLContext = sqlc
  override def schema: StructType = StructType(Nil)
}))
val tables = sqlc.tableNames()
assert(tables.contains(relationName))
  }

  test("Catalog caseness (Hive)") {
val sqlc = new HiveContext(sc)
val relationName = "MyTable"
sqlc.catalog.registerTable(relationName :: Nil, LogicalRelation(new 
BaseRelation {
  override def sqlContext: SQLContext = sqlc
  override def schema: StructType = StructType(Nil)
}))
val tables = sqlc.tableNames()
assert(tables.contains(relationName))
  }
{/code}

Looking at {{HiveContext#SQLSession}}, I see this is the intended behaviour. 
But the reason that this is needed seems undocumented (both in the manual or in 
the source code comments).


> Caseness inconsistency between SQLContext and HiveContext
> -
>
> Key: SPARK-11186
> URL: https://issues.apache.org/jira/browse/SPARK-11186
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
>Reporter: Santiago M. Mola
>Priority: Minor
>
> Default catalog behaviour for caseness is different in {{SQLContext}} and 
> {{HiveContext}}.
> {code}
>   test("Catalog caseness (SQL)") {
> val sqlc = new SQLContext(sc)
> val relationName = "MyTable"
> sqlc.catalog.registerTable(relationName :: Nil, LogicalRelation(new 
> BaseRelation {
>   override def sqlContext: SQLContext = sqlc
>   override def schema: StructType = StructType(Nil)
> }))
> val tables = sqlc.tableNames()
> assert(tables.contains(relationName))
>   }
>   test("Catalog caseness (Hive)") {
> val sqlc = new HiveContext(sc)
> val relationName = "MyTable"
> sqlc.catalog.registerTable(relationName :: Nil, LogicalRelation(new 
> BaseRelation {
>   override def sqlContext: SQLContext = sqlc
>   override def schema: StructType = StructType(Nil)
> }))
> val tables = sqlc.tableNames()
> assert(tables.contains(relationName))
>   }
> {code}
> Looking at {{HiveContext#SQLSession}}, I see this is the intended behaviour. 
> But the reason this is needed seems undocumented (both in the manual and 
> in the source code comments).






[jira] [Commented] (SPARK-7275) Make LogicalRelation public

2015-10-01 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939660#comment-14939660
 ] 

Santiago M. Mola commented on SPARK-7275:
-

LogicalRelation was moved to execution.datasources in Spark 1.5, but it's still 
private[sql]. Can we make it public now?

> Make LogicalRelation public
> ---
>
> Key: SPARK-7275
> URL: https://issues.apache.org/jira/browse/SPARK-7275
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Santiago M. Mola
>Priority: Minor
>
> It seems LogicalRelation is the only part of the LogicalPlan that is not 
> public. This makes it harder to work with full logical plans from third party 
> packages.






[jira] [Commented] (SPARK-8377) Identifiers caseness information should be available at any time

2015-08-25 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14710855#comment-14710855
 ] 

Santiago M. Mola commented on SPARK-8377:
-

Right. However, there is no distinction between an identifier that was quoted 
by the user and one that was not. So the user intent is lost. If we see "a", we 
don't know if the user wanted strictly "a" or a case-insensitive "a". So if we 
have a column "a" and a column "A", which one should we match?

 Identifiers caseness information should be available at any time
 

 Key: SPARK-8377
 URL: https://issues.apache.org/jira/browse/SPARK-8377
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Santiago M. Mola

 Currently, we have the option of having a case sensitive catalog or not. A 
 case insensitive catalog just lowercases all identifiers. However, when 
 pushing down to a data source, we lose the information about if an identifier 
 should be case insensitive or strictly lowercase.
 Ideally, we would be able to distinguish a case insensitive identifier from a 
 case sensitive one.






[jira] [Updated] (SPARK-9307) Logging: Make it either stable or private[spark]

2015-07-24 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-9307:

Description: 
org.apache.spark.Logging is a public class that is quite easy to include from 
any IDE, assuming it's safe to use because it's part of the public API.

However, its Javadoc states:

{code}
  NOTE: DO NOT USE this class outside of Spark. It is intended as an internal 
utility.
  This will likely be changed or removed in future releases.
{code}

It would be safer to either make a commitment for the backwards-compatibility 
of this class, or make it private[spark].

  was:
org.apache.spark.Logging is a public class that is quite easy to include from 
any IDE, assuming it's safe to use because it's part of the public API.

However, its Javadoc states:

{code}
  NOTE: DO NOT USE this class outside of Spark. It is intended as an internal 
utility.
  This will likely be changed or removed in future releases.
{/code}

It would be safer to either make a commitment for the backwards-compatibility 
of this class, or make it private[spark].


 Logging: Make it either stable or private[spark]
 

 Key: SPARK-9307
 URL: https://issues.apache.org/jira/browse/SPARK-9307
 Project: Spark
  Issue Type: Improvement
Reporter: Santiago M. Mola
Priority: Minor

 org.apache.spark.Logging is a public class that is quite easy to include from 
 any IDE, assuming it's safe to use because it's part of the public API.
 However, its Javadoc states:
 {code}
   NOTE: DO NOT USE this class outside of Spark. It is intended as an internal 
 utility.
   This will likely be changed or removed in future releases.
 {code}
 It would be safer to either make a commitment for the backwards-compatibility 
 of this class, or make it private[spark].






[jira] [Created] (SPARK-9307) Logging: Make it either stable or private[spark]

2015-07-24 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-9307:
---

 Summary: Logging: Make it either stable or private[spark]
 Key: SPARK-9307
 URL: https://issues.apache.org/jira/browse/SPARK-9307
 Project: Spark
  Issue Type: Improvement
Reporter: Santiago M. Mola
Priority: Minor


org.apache.spark.Logging is a public class that is quite easy to include from 
any IDE, assuming it's safe to use because it's part of the public API.

However, its Javadoc states:

{code}
  NOTE: DO NOT USE this class outside of Spark. It is intended as an internal 
utility.
  This will likely be changed or removed in future releases.
{/code}

It would be safer to either make a commitment for the backwards-compatibility 
of this class, or make it private[spark].






[jira] [Commented] (SPARK-6981) [SQL] SparkPlanner and QueryExecution should be factored out from SQLContext

2015-07-06 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14614645#comment-14614645
 ] 

Santiago M. Mola commented on SPARK-6981:
-

Any progress on this?

 [SQL] SparkPlanner and QueryExecution should be factored out from SQLContext
 

 Key: SPARK-6981
 URL: https://issues.apache.org/jira/browse/SPARK-6981
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.0, 1.4.0
Reporter: Edoardo Vacchi
Priority: Minor

 In order to simplify extensibility with new strategies from third-parties, it 
 should be better to factor SparkPlanner and QueryExecution in their own 
 classes. Dependent types add additional, unnecessary complexity; besides, 
 HiveContext would benefit from this change as well.






[jira] [Commented] (SPARK-8636) CaseKeyWhen has incorrect NULL handling

2015-07-06 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14614627#comment-14614627
 ] 

Santiago M. Mola commented on SPARK-8636:
-

[~davies] NULL values are grouped together when using a GROUP BY clause.

See 
https://en.wikipedia.org/wiki/Null_%28SQL%29#When_two_nulls_are_equal:_grouping.2C_sorting.2C_and_some_set_operations

{quote}
Because SQL:2003 defines all Null markers as being unequal to one another, a 
special definition was required in order to group Nulls together when 
performing certain operations. SQL defines any two values that are equal to 
one another, or any two Nulls, as not distinct. This definition of not 
distinct allows SQL to group and sort Nulls when the GROUP BY clause (and other 
keywords that perform grouping) are used.
{quote}
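As a plain-Scala analogy of that "not distinct" rule (illustrative only, not Spark code; None stands in for NULL), grouping collects all NULL-like values into one bucket even though NULL = NULL is never true as a comparison:

{code}
// Both None rows land in a single group, mirroring how GROUP BY
// treats all NULLs as "not distinct".
val rows = Seq(Some(1), None, Some(1), None)
val groups = rows.groupBy(identity).mapValues(_.size)
// groups == Map(Some(1) -> 2, None -> 2)
{code}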

 CaseKeyWhen has incorrect NULL handling
 ---

 Key: SPARK-8636
 URL: https://issues.apache.org/jira/browse/SPARK-8636
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.0
Reporter: Santiago M. Mola
  Labels: starter

 CaseKeyWhen implementation in Spark uses the following equals implementation:
 {code}
 private def equalNullSafe(l: Any, r: Any) = {
   if (l == null && r == null) {
     true
   } else if (l == null || r == null) {
     false
   } else {
     l == r
   }
 }
 {code}
 This is not correct: in SQL, NULL is never equal to NULL (nor is it unequal; the 
 comparison is simply unknown). Consequently, a NULL value in a CASE WHEN 
 expression should never match.
 For example, you can execute this in MySQL:
 {code}
 SELECT CASE NULL WHEN NULL THEN 'NULL MATCHES' ELSE 'NULL DOES NOT MATCH' END 
 FROM DUAL;
 {code}
 And the result will be 'NULL DOES NOT MATCH'.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8636) CaseKeyWhen has incorrect NULL handling

2015-06-29 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605557#comment-14605557
 ] 

Santiago M. Mola commented on SPARK-8636:
-

[~davies], [~animeshbaranawal] In SQL, NULL is never equal to NULL. Any 
comparison to NULL is UNKNOWN. Most SQL implementations represent UNKNOWN as 
NULL, too.
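A small three-valued-logic sketch of the point above (plain Scala, not Spark code; None models NULL/UNKNOWN):

{code}
// Any equality involving NULL yields UNKNOWN rather than true or false.
def sqlEq(l: Option[Int], r: Option[Int]): Option[Boolean] =
  for (lv <- l; rv <- r) yield lv == rv

sqlEq(Some(1), Some(1)) // Some(true)
sqlEq(None, Some(1))    // None, i.e. UNKNOWN
sqlEq(None, None)       // None, i.e. UNKNOWN -- not true
{code}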

 CaseKeyWhen has incorrect NULL handling
 ---

 Key: SPARK-8636
 URL: https://issues.apache.org/jira/browse/SPARK-8636
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.0
Reporter: Santiago M. Mola
  Labels: starter

 CaseKeyWhen implementation in Spark uses the following equals implementation:
 {code}
 private def equalNullSafe(l: Any, r: Any) = {
   if (l == null && r == null) {
     true
   } else if (l == null || r == null) {
     false
   } else {
     l == r
   }
 }
 {code}
 This is not correct: in SQL, NULL is never equal to NULL (nor is it unequal; the 
 comparison is simply unknown). Consequently, a NULL value in a CASE WHEN 
 expression should never match.
 For example, you can execute this in MySQL:
 {code}
 SELECT CASE NULL WHEN NULL THEN 'NULL MATCHES' ELSE 'NULL DOES NOT MATCH' END 
 FROM DUAL;
 {code}
 And the result will be 'NULL DOES NOT MATCH'.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8636) CaseKeyWhen has incorrect NULL handling

2015-06-26 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14602520#comment-14602520
 ] 

Santiago M. Mola commented on SPARK-8636:
-

[~animeshbaranawal] Yes, I think so.

 CaseKeyWhen has incorrect NULL handling
 ---

 Key: SPARK-8636
 URL: https://issues.apache.org/jira/browse/SPARK-8636
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.0
Reporter: Santiago M. Mola
  Labels: starter

 CaseKeyWhen implementation in Spark uses the following equals implementation:
 {code}
 private def equalNullSafe(l: Any, r: Any) = {
   if (l == null && r == null) {
     true
   } else if (l == null || r == null) {
     false
   } else {
     l == r
   }
 }
 {code}
 This is not correct: in SQL, NULL is never equal to NULL (nor is it unequal; the 
 comparison is simply unknown). Consequently, a NULL value in a CASE WHEN 
 expression should never match.
 For example, you can execute this in MySQL:
 {code}
 SELECT CASE NULL WHEN NULL THEN 'NULL MATCHES' ELSE 'NULL DOES NOT MATCH' END 
 FROM DUAL;
 {code}
 And the result will be 'NULL DOES NOT MATCH'.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-8654) Analysis exception when using NULL IN (...): invalid cast

2015-06-26 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-8654:
---

 Summary: Analysis exception when using NULL IN (...): invalid 
cast
 Key: SPARK-8654
 URL: https://issues.apache.org/jira/browse/SPARK-8654
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Santiago M. Mola
Priority: Minor


The following query throws an analysis exception:

{code}
SELECT * FROM t WHERE NULL NOT IN (1, 2, 3);
{code}

The exception is:

{code}
org.apache.spark.sql.AnalysisException: invalid cast from int to null;
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:42)
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:66)
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:52)
{code}

Here is a test that can be added to AnalysisSuite to check the issue:

{code}
  test("SPARK-8654 regression test") {
    val plan = Project(Alias(In(Literal(null), Seq(Literal(1), Literal(2))), "a")() :: Nil,
      LocalRelation()
    )
    caseInsensitiveAnalyze(plan)
  }
{code}

Note that this kind of query is a corner case, but it is still valid SQL. An 
expression such as NULL IN (...) or NULL NOT IN (...) always gives NULL as 
a result, even if the list contains NULL. So it is safe to translate these 
expressions to Literal(null) during analysis.
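A sketch of the standard three-valued IN semantics backing that claim (plain Scala, not Spark code; None models NULL):

{code}
def sqlIn(value: Option[Int], list: Seq[Option[Int]]): Option[Boolean] =
  value match {
    case None => None // NULL IN (...) is always NULL, whatever the list holds
    case Some(v) =>
      if (list.contains(Some(v))) Some(true)
      else if (list.contains(None)) None // no match, but a NULL in the list -> NULL
      else Some(false)
  }

sqlIn(None, Seq(Some(1), Some(2), Some(3))) // None, so the whole predicate folds to NULL
{code}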



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6064) Checking data types when resolving types

2015-06-26 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14602755#comment-14602755
 ] 

Santiago M. Mola commented on SPARK-6064:
-

This issue might have been superseded by 
https://issues.apache.org/jira/browse/SPARK-7562

 Checking data types when resolving types
 

 Key: SPARK-6064
 URL: https://issues.apache.org/jira/browse/SPARK-6064
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.0
Reporter: Kai Zeng

 In catalyst/expressions/arithmetic.scala and 
 catalyst/expressions/predicates.scala, many arithmetic/predicate expressions 
 require their operands to be of a certain numeric type. 
 This type checking should be done when we are resolving the expressions.
 See this PR:
 https://github.com/apache/spark/pull/4685



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8628) Race condition in AbstractSparkSQLParser.parse

2015-06-25 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-8628:

Description: 
SPARK-5009 introduced the following code in AbstractSparkSQLParser:

{code}
def parse(input: String): LogicalPlan = {
  // Initialize the Keywords.
  lexical.initialize(reservedWords)
  phrase(start)(new lexical.Scanner(input)) match {
    case Success(plan, _) => plan
    case failureOrError => sys.error(failureOrError.toString)
  }
}
{code}

The corresponding initialize method in SqlLexical is not thread-safe:

{code}
/* This is a work around to support the lazy setting */
def initialize(keywords: Seq[String]): Unit = {
  reserved.clear()
  reserved ++= keywords
}
{code}

I'm hitting this when parsing multiple SQL queries concurrently. When one query 
parsing starts, it empties the reserved keyword list, then a race condition 
occurs and other queries fail to parse because they recognize keywords as 
identifiers.

  was:
SPARK-5009 introduced the following code:

def parse(input: String): LogicalPlan = {
// Initialize the Keywords.
lexical.initialize(reservedWords)
phrase(start)(new lexical.Scanner(input)) match {
  case Success(plan, _) => plan
  case failureOrError => sys.error(failureOrError.toString)
}
  }

The corresponding initialize method in SqlLexical is not thread-safe:

  /* This is a work around to support the lazy setting */
  def initialize(keywords: Seq[String]): Unit = {
reserved.clear()
reserved ++= keywords
  }

I'm hitting this when parsing multiple SQL queries concurrently. When one query 
parsing starts, it empties the reserved keyword list, then a race-condition 
occurs and other queries fail to parse because they recognize keywords as 
identifiers.


 Race condition in AbstractSparkSQLParser.parse
 --

 Key: SPARK-8628
 URL: https://issues.apache.org/jira/browse/SPARK-8628
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0, 1.3.1, 1.4.0
Reporter: Santiago M. Mola
Priority: Critical
  Labels: regression

 SPARK-5009 introduced the following code in AbstractSparkSQLParser:
 {code}
 def parse(input: String): LogicalPlan = {
 // Initialize the Keywords.
 lexical.initialize(reservedWords)
 phrase(start)(new lexical.Scanner(input)) match {
   case Success(plan, _) => plan
   case failureOrError => sys.error(failureOrError.toString)
 }
   }
 {code}
 The corresponding initialize method in SqlLexical is not thread-safe:
 {code}
   /* This is a work around to support the lazy setting */
   def initialize(keywords: Seq[String]): Unit = {
 reserved.clear()
 reserved ++= keywords
   }
 {code}
 I'm hitting this when parsing multiple SQL queries concurrently. When one 
 query parsing starts, it empties the reserved keyword list, then a 
 race-condition occurs and other queries fail to parse because they recognize 
 keywords as identifiers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8628) Race condition in AbstractSparkSQLParser.parse

2015-06-25 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601012#comment-14601012
 ] 

Santiago M. Mola commented on SPARK-8628:
-

Here is an example of failure with Spark 1.4.0:

{code}
[1.152] failure: ``union'' expected but identifier OR found

SELECT CASE a+1 WHEN b THEN 111 WHEN c THEN 222 WHEN d THEN 333 WHEN e THEN 444 
ELSE 555 END, a-b, a FROM t1 WHERE e+d BETWEEN a+b-10 AND c+130 OR ab OR de

   ^
java.lang.RuntimeException: [1.152] failure: ``union'' expected but identifier 
OR found

SELECT CASE a+1 WHEN b THEN 111 WHEN c THEN 222 WHEN d THEN 333 WHEN e THEN 444 
ELSE 555 END, a-b, a FROM t1 WHERE e+d BETWEEN a+b-10 AND c+130 OR ab OR de

   ^
at scala.sys.package$.error(package.scala:27)
{code}

 Race condition in AbstractSparkSQLParser.parse
 --

 Key: SPARK-8628
 URL: https://issues.apache.org/jira/browse/SPARK-8628
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0, 1.3.1, 1.4.0
Reporter: Santiago M. Mola
Priority: Critical
  Labels: regression

 SPARK-5009 introduced the following code:
 def parse(input: String): LogicalPlan = {
 // Initialize the Keywords.
 lexical.initialize(reservedWords)
 phrase(start)(new lexical.Scanner(input)) match {
   case Success(plan, _) => plan
   case failureOrError => sys.error(failureOrError.toString)
 }
   }
 The corresponding initialize method in SqlLexical is not thread-safe:
   /* This is a work around to support the lazy setting */
   def initialize(keywords: Seq[String]): Unit = {
 reserved.clear()
 reserved ++= keywords
   }
 I'm hitting this when parsing multiple SQL queries concurrently. When one 
 query parsing starts, it empties the reserved keyword list, then a 
 race-condition occurs and other queries fail to parse because they recognize 
 keywords as identifiers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-8628) Race condition in AbstractSparkSQLParser.parse

2015-06-25 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-8628:
---

 Summary: Race condition in AbstractSparkSQLParser.parse
 Key: SPARK-8628
 URL: https://issues.apache.org/jira/browse/SPARK-8628
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.0, 1.3.1, 1.3.0
Reporter: Santiago M. Mola
Priority: Critical


SPARK-5009 introduced the following code:

def parse(input: String): LogicalPlan = {
  // Initialize the Keywords.
  lexical.initialize(reservedWords)
  phrase(start)(new lexical.Scanner(input)) match {
    case Success(plan, _) => plan
    case failureOrError => sys.error(failureOrError.toString)
  }
}

The corresponding initialize method in SqlLexical is not thread-safe:

  /* This is a work around to support the lazy setting */
  def initialize(keywords: Seq[String]): Unit = {
    reserved.clear()
    reserved ++= keywords
  }

I'm hitting this when parsing multiple SQL queries concurrently. When one query 
parsing starts, it empties the reserved keyword list, then a race condition 
occurs and other queries fail to parse because they recognize keywords as 
identifiers.
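
For illustration, a self-contained sketch of the hazard and of one possible fix (names are illustrative, not Spark's): a shared keyword set that is cleared and refilled on every parse can be emptied by one thread while another is still scanning, whereas populating it once under a lock avoids the race.

{code}
import scala.collection.mutable

object KeywordRegistry {
  private val reserved = mutable.HashSet.empty[String]

  // Unsafe pattern, mirroring the report: clear + refill on every parse call.
  def initializeUnsafe(keywords: Seq[String]): Unit = {
    reserved.clear()
    reserved ++= keywords
  }

  // One possible fix: populate once, guarded by a lock, and never clear.
  def initializeOnce(keywords: Seq[String]): Unit = synchronized {
    if (reserved.isEmpty) reserved ++= keywords
  }

  def isKeyword(word: String): Boolean = synchronized(reserved.contains(word))
}
{code}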



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-8636) CaseKeyWhen has incorrect NULL handling

2015-06-25 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-8636:
---

 Summary: CaseKeyWhen has incorrect NULL handling
 Key: SPARK-8636
 URL: https://issues.apache.org/jira/browse/SPARK-8636
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.0
Reporter: Santiago M. Mola


CaseKeyWhen implementation in Spark uses the following equals implementation:

{code}
private def equalNullSafe(l: Any, r: Any) = {
  if (l == null && r == null) {
    true
  } else if (l == null || r == null) {
    false
  } else {
    l == r
  }
}
{code}

This is not correct: in SQL, NULL is never equal to NULL (nor is it unequal; the comparison is simply unknown). Consequently, a NULL value in a CASE WHEN expression should never match.

For example, you can execute this in MySQL:

{code}
SELECT CASE NULL WHEN NULL THEN 'NULL MATCHES' ELSE 'NULL DOES NOT MATCH' END 
FROM DUAL;
{code}

And the result will be 'NULL DOES NOT MATCH'.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6666) org.apache.spark.sql.jdbc.JDBCRDD does not escape/quote column names

2015-06-15 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14585883#comment-14585883
 ] 

Santiago M. Mola commented on SPARK-6666:
-

I opened SPARK-8377 to track the general case, since I have this problem with 
other data sources, not just JDBC.
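
For illustration, a minimal sketch of the kind of quoting being asked for (not the actual JDBCRDD code; the escaping rule here is an assumption and a real fix would follow each database's dialect):

{code}
// Wrap each projected column in double quotes, doubling embedded quotes,
// so mixed-case and special-character names survive the generated SQL.
def quotedColumnList(columns: Seq[String]): String =
  if (columns.isEmpty) "1"
  else columns.map(c => "\"" + c.replace("\"", "\"\"") + "\"").mkString(", ")

quotedColumnList(Seq("Symbol", "Earnings/Share", "52 week low"))
// "Symbol", "Earnings/Share", "52 week low"
{code}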

 org.apache.spark.sql.jdbc.JDBCRDD  does not escape/quote column names
 -

 Key: SPARK-6666
 URL: https://issues.apache.org/jira/browse/SPARK-6666
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
 Environment:  
Reporter: John Ferguson
Priority: Critical

 Is there a way to have JDBC DataFrames use quoted/escaped column names?  
 Right now, it looks like it sees the names correctly in the schema created 
 but does not escape them in the SQL it creates when they are not compliant:
 org.apache.spark.sql.jdbc.JDBCRDD
 
 private val columnList: String = {
   val sb = new StringBuilder()
   columns.foreach(x => sb.append(",").append(x))
   if (sb.length == 0) "1" else sb.substring(1)
 }
 If you see value in this, I would take a shot at adding the quoting 
 (escaping) of column names here. If you don't do it, some drivers, like 
 postgresql's, will simply lowercase all names when parsing the query. As you 
 can see in the TL;DR below, that means they won't match the schema I am given.
 TL;DR:
  
 I am able to connect to a Postgres database in the shell (with driver 
 referenced):
 val jdbcDf = 
   sqlContext.jdbc("jdbc:postgresql://localhost/sparkdemo?user=dbuser", "sp500")
 In fact when I run:
 jdbcDf.registerTempTable("sp500")
 val avgEPSNamed = sqlContext.sql("SELECT AVG(`Earnings/Share`) as AvgCPI 
 FROM sp500")
 and
 val avgEPSProg = jsonDf.agg(avg(jsonDf.col("Earnings/Share")))
 The values come back as expected. However, if I try:
 jdbcDf.show
 Or if I try
 
 val all = sqlContext.sql("SELECT * FROM sp500")
 all.show
 I get errors about column names not being found. In fact, the error includes 
 a mention of column names all lower cased. For now I will change my schema 
 to be more restrictive. Right now it is, per a Stack Overflow poster, not 
 ANSI compliant because it does things that are only allowed with quoted 
 identifiers in pgsql, MySQL and SQLServer. BTW, our users are giving us tables 
 like this... because various tools they already use support non-compliant 
 names. In fact, this is mild compared to what we've had to support.
 Currently the schema in question uses mixed case, quoted names with special 
 characters and spaces:
 CREATE TABLE sp500
 (
 "Symbol" text,
 "Name" text,
 "Sector" text,
 "Price" double precision,
 "Dividend Yield" double precision,
 "Price/Earnings" double precision,
 "Earnings/Share" double precision,
 "Book Value" double precision,
 "52 week low" double precision,
 "52 week high" double precision,
 "Market Cap" double precision,
 "EBITDA" double precision,
 "Price/Sales" double precision,
 "Price/Book" double precision,
 "SEC Filings" text
 ) 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-8377) Identifiers caseness information should be available at any time

2015-06-15 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-8377:
---

 Summary: Identifiers caseness information should be available at 
any time
 Key: SPARK-8377
 URL: https://issues.apache.org/jira/browse/SPARK-8377
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Santiago M. Mola


Currently, we have the option of a case sensitive catalog or not. A case 
insensitive catalog just lowercases all identifiers. However, when pushing down 
to a data source, we lose the information about whether an identifier should be 
treated as case insensitive or as strictly lowercase.

Ideally, we would be able to distinguish a case insensitive identifier from a 
case sensitive one.
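
One hypothetical shape for carrying that information through analysis (none of these names exist in Spark; purely a sketch): keep the original spelling plus a caseness flag instead of lowercasing eagerly.

{code}
case class Identifier(name: String, caseSensitive: Boolean) {
  // Case-sensitive identifiers must match exactly; case-insensitive ones match
  // under case folding, and a data source can still see the original spelling.
  def matches(other: String): Boolean =
    if (caseSensitive) name == other else name.equalsIgnoreCase(other)
}

Identifier("MyTable", caseSensitive = false).matches("MYTABLE") // true
Identifier("MyTable", caseSensitive = true).matches("MYTABLE")  // false
{code}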



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-8370) Add API for data sources to register databases

2015-06-15 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-8370:
---

 Summary: Add API for data sources to register databases
 Key: SPARK-8370
 URL: https://issues.apache.org/jira/browse/SPARK-8370
 Project: Spark
  Issue Type: New Feature
Reporter: Santiago M. Mola


This API would allow registering a database with a data source instead of just 
a table. Registering a data source database would register all of its tables and 
keep the catalog updated. The catalog could delegate lookups of tables to the 
data source for a database registered with this API.
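
A hypothetical sketch of what such an API could look like (these names and signatures do not exist in Spark; the table type is reduced to a placeholder so the sketch stays self-contained):

{code}
// A data source that owns a whole database exposes its tables, and the
// catalog delegates per-table lookups to it.
trait DatabaseRelationProvider {
  def databaseName: String
  def tableNames(): Seq[String]
  def lookupTable(tableName: String): Option[AnyRef] // would be a LogicalPlan in Spark proper
}
{code}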



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8370) Add API for data sources to register databases

2015-06-15 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-8370:

Component/s: SQL

 Add API for data sources to register databases
 --

 Key: SPARK-8370
 URL: https://issues.apache.org/jira/browse/SPARK-8370
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Reporter: Santiago M. Mola

 This API would allow registering a database with a data source instead of 
 just a table. Registering a data source database would register all of its 
 tables and keep the catalog updated. The catalog could delegate lookups of 
 tables to the data source for a database registered with this API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7727) Avoid inner classes in RuleExecutor

2015-05-26 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14558758#comment-14558758
 ] 

Santiago M. Mola commented on SPARK-7727:
-

[~chenghao] I think that is a good idea. Analyzer could be converted into a 
trait, moving current Analyzer to DefaultAnalyzer. It is probably a good idea 
to use a separate JIRA and pull request for that though.

 Avoid inner classes in RuleExecutor
 ---

 Key: SPARK-7727
 URL: https://issues.apache.org/jira/browse/SPARK-7727
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.1
Reporter: Santiago M. Mola
  Labels: easyfix, starter

 In RuleExecutor, the following classes and objects are defined as inner 
 classes or objects: Strategy, Once, FixedPoint, Batch.
 This does not seem to accomplish anything in this case, but makes 
 extensibility harder. For example, if I want to define a new Optimizer that 
 uses all batches from the DefaultOptimizer plus some more, I would do 
 something like:
 {code}
 new Optimizer {
 override protected val batches: Seq[Batch] =
   DefaultOptimizer.batches ++ myBatches
  }
 {code}
 But this will give a typing error because batches in DefaultOptimizer are of 
 type DefaultOptimizer#Batch while myBatches are this#Batch.
 Workarounds include either copying the list of batches from DefaultOptimizer 
 or using a method like this:
 {code}
 private def transformBatchType(b: DefaultOptimizer.Batch): Batch = {
   val strategy = b.strategy.maxIterations match {
 case 1 => Once
 case n => FixedPoint(n)
   }
   Batch(b.name, strategy, b.rules)
 }
 {code}
 However, making these classes outer would solve the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-7727) Avoid inner classes in RuleExecutor

2015-05-26 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-7727:

Comment: was deleted

(was: [~evacchi] I'm sorry I opened this duplicate for: 
https://issues.apache.org/jira/browse/SPARK-7823

Not sure which one to mark as duplicate since both have pull requests.)

 Avoid inner classes in RuleExecutor
 ---

 Key: SPARK-7727
 URL: https://issues.apache.org/jira/browse/SPARK-7727
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.1
Reporter: Santiago M. Mola
  Labels: easyfix, starter

 In RuleExecutor, the following classes and objects are defined as inner 
 classes or objects: Strategy, Once, FixedPoint, Batch.
 This does not seem to accomplish anything in this case, but makes 
 extensibility harder. For example, if I want to define a new Optimizer that 
 uses all batches from the DefaultOptimizer plus some more, I would do 
 something like:
 {code}
 new Optimizer {
 override protected val batches: Seq[Batch] =
   DefaultOptimizer.batches ++ myBatches
  }
 {code}
 But this will give a typing error because batches in DefaultOptimizer are of 
 type DefaultOptimizer#Batch while myBatches are this#Batch.
 Workarounds include either copying the list of batches from DefaultOptimizer 
 or using a method like this:
 {code}
 private def transformBatchType(b: DefaultOptimizer.Batch): Batch = {
   val strategy = b.strategy.maxIterations match {
 case 1 => Once
 case n => FixedPoint(n)
   }
   Batch(b.name, strategy, b.rules)
 }
 {code}
 However, making these classes outer would solve the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-7823) [SQL] Batch, FixedPoint, Strategy should not be inner classes of class RuleExecutor

2015-05-26 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola resolved SPARK-7823.
-
Resolution: Duplicate

This is a duplicate of https://issues.apache.org/jira/browse/SPARK-7727

 [SQL] Batch, FixedPoint, Strategy should not be inner classes of class 
 RuleExecutor
 ---

 Key: SPARK-7823
 URL: https://issues.apache.org/jira/browse/SPARK-7823
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.1, 1.4.0
Reporter: Edoardo Vacchi
Priority: Minor

 Batch, FixedPoint, Strategy and Once are defined within the class 
 RuleExecutor[TreeType]. This makes it unnecessarily complicated to reuse batches 
 of rules within custom optimizers. E.g.:
 {code:java}
 object DefaultOptimizer extends Optimizer {
   override val batches = /* batches defined here */
 }
 object MyCustomOptimizer extends Optimizer {
   override val batches = 
 Batch("my custom batch", ...) ::
 DefaultOptimizer.batches
 }
 {code}
 MyCustomOptimizer won't compile, because DefaultOptimizer.batches has type 
 Seq[DefaultOptimizer.this.Batch]. 
 Solution: Batch, FixedPoint, etc. should be moved *outside* the 
 RuleExecutor[T] class body, either in a companion object or right in the 
 `rules` package.
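
For illustration, a simplified sketch of that solution (rule types reduced to plain functions to keep the sketch self-contained; not Spark's actual definitions): once these classes live in a companion object they have a single type, so batches from different optimizers compose freely.

{code}
object RuleExecutor {
  sealed trait Strategy { def maxIterations: Int }
  case object Once extends Strategy { val maxIterations = 1 }
  case class FixedPoint(maxIterations: Int) extends Strategy
  // Rules are simplified here to endofunctions on the tree type.
  case class Batch[TreeType](name: String, strategy: Strategy, rules: Seq[TreeType => TreeType])
}

// With a single Batch type, DefaultOptimizer.batches ++ myBatches type-checks,
// so a custom optimizer can prepend or append batches without copying them.
{code}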



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3846) KryoException when doing joins in SparkSQL

2015-05-26 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14558808#comment-14558808
 ] 

Santiago M. Mola commented on SPARK-3846:
-

[~huangjs]  Would you mind adding a test case here (an example of data and 
exact code used to produce the exception)?

 KryoException when doing joins in SparkSQL 
 ---

 Key: SPARK-3846
 URL: https://issues.apache.org/jira/browse/SPARK-3846
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.0, 1.2.0
Reporter: Jianshi Huang

 The error is reproducible when I join two tables manually. The error message 
 is as follows.
 org.apache.spark.SparkException: Job aborted due to stage failure: Task 645 
 in stage 3.0 failed 4 times, most recent failure: Lost task 645.3 in stage 
 3.0 (TID 3802, ...): com.esotericsoftware.kryo.KryoException:
 Unable to find class: 
 __wrapper$1$18e31777385a452ba0bc030e899bf5d1.__wrapper$1$18e31777385a452ba0bc030e899bf5d1$SpecificRow$1
 
 com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
 
 com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
 com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
 com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:721)
 com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:42)
 com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:34)
 com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
 
 org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:133)
 
 org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:133)
 org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
 scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
 
 org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
 
 org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
 scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 
 org.apache.spark.sql.execution.HashJoin$$anon$1.hasNext(joins.scala:101)
 scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 
 org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$8.apply(GeneratedAggregate.scala:198)
 
 org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$8.apply(GeneratedAggregate.scala:165)
 org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:599)
 org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:599)
 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 org.apache.spark.scheduler.Task.run(Task.scala:56)
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 java.lang.Thread.run(Thread.java:724)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3846) KryoException when doing joins in SparkSQL

2015-05-26 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-3846:

Priority: Blocker  (was: Major)

 KryoException when doing joins in SparkSQL 
 ---

 Key: SPARK-3846
 URL: https://issues.apache.org/jira/browse/SPARK-3846
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.0, 1.2.0
Reporter: Jianshi Huang
Priority: Blocker

 The error is reproducible when I join two tables manually. The error message 
 is as follows.
 org.apache.spark.SparkException: Job aborted due to stage failure: Task 645 
 in stage 3.0 failed 4 times, most recent failure: Lost task 645.3 in stage 
 3.0 (TID 3802, ...): com.esotericsoftware.kryo.KryoException:
 Unable to find class: 
 __wrapper$1$18e31777385a452ba0bc030e899bf5d1.__wrapper$1$18e31777385a452ba0bc030e899bf5d1$SpecificRow$1
 
 com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
 
 com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
 com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
 com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:721)
 com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:42)
 com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:34)
 com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
 
 org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:133)
 
 org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:133)
 org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
 scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
 
 org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
 
 org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
 scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 
 org.apache.spark.sql.execution.HashJoin$$anon$1.hasNext(joins.scala:101)
 scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 
 org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$8.apply(GeneratedAggregate.scala:198)
 
 org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$8.apply(GeneratedAggregate.scala:165)
 org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:599)
 org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:599)
 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 org.apache.spark.scheduler.Task.run(Task.scala:56)
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 java.lang.Thread.run(Thread.java:724)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3846) [SQL] Serialization exception (Kryo) on joins when enabling codegen

2015-05-26 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-3846:

Summary: [SQL] Serialization exception (Kryo) on joins when enabling 
codegen   (was: [SQL] Serialization exception (Kryo and Java) on joins when 
enabling codegen )

 [SQL] Serialization exception (Kryo) on joins when enabling codegen 
 

 Key: SPARK-3846
 URL: https://issues.apache.org/jira/browse/SPARK-3846
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.0, 1.2.0
Reporter: Jianshi Huang
Priority: Blocker

 The error is reproducible when I join two tables manually. The error message 
 is as follows.
 org.apache.spark.SparkException: Job aborted due to stage failure: Task 645 
 in stage 3.0 failed 4 times, most recent failure: Lost task 645.3 in stage 
 3.0 (TID 3802, ...): com.esotericsoftware.kryo.KryoException:
 Unable to find class: 
 __wrapper$1$18e31777385a452ba0bc030e899bf5d1.__wrapper$1$18e31777385a452ba0bc030e899bf5d1$SpecificRow$1
 
 com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
 
 com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
 com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
 com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:721)
 com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:42)
 com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:34)
 com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
 
 org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:133)
 
 org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:133)
 org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
 scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
 
 org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
 
 org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
 scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 
 org.apache.spark.sql.execution.HashJoin$$anon$1.hasNext(joins.scala:101)
 scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 
 org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$8.apply(GeneratedAggregate.scala:198)
 
 org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$8.apply(GeneratedAggregate.scala:165)
 org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:599)
 org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:599)
 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 org.apache.spark.scheduler.Task.run(Task.scala:56)
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 java.lang.Thread.run(Thread.java:724)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7727) Avoid inner classes in RuleExecutor

2015-05-26 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14558777#comment-14558777
 ] 

Santiago M. Mola commented on SPARK-7727:
-

[~evacchi] I'm sorry I opened this duplicate for: 
https://issues.apache.org/jira/browse/SPARK-7823

Not sure which one to mark as duplicate since both have pull requests.

 Avoid inner classes in RuleExecutor
 ---

 Key: SPARK-7727
 URL: https://issues.apache.org/jira/browse/SPARK-7727
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.1
Reporter: Santiago M. Mola
  Labels: easyfix, starter

 In RuleExecutor, the following classes and objects are defined as inner 
 classes or objects: Strategy, Once, FixedPoint, Batch.
 This does not seem to accomplish anything in this case, but makes 
 extensibility harder. For example, if I want to define a new Optimizer that 
 uses all batches from the DefaultOptimizer plus some more, I would do 
 something like:
 {code}
 new Optimizer {
 override protected val batches: Seq[Batch] =
   DefaultOptimizer.batches ++ myBatches
  }
 {code}
 But this will give a typing error because batches in DefaultOptimizer are of 
 type DefaultOptimizer#Batch while myBatches are this#Batch.
 Workarounds include either copying the list of batches from DefaultOptimizer 
 or using a method like this:
 {code}
 private def transformBatchType(b: DefaultOptimizer.Batch): Batch = {
   val strategy = b.strategy.maxIterations match {
 case 1 => Once
 case n => FixedPoint(n)
   }
   Batch(b.name, strategy, b.rules)
 }
 {code}
 However, making these classes outer would solve the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5707) Enabling spark.sql.codegen throws ClassNotFound exception

2015-05-26 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14558823#comment-14558823
 ] 

Santiago M. Mola commented on SPARK-5707:
-

This is probably a duplicate of SPARK-3846.

 Enabling spark.sql.codegen throws ClassNotFound exception
 -

 Key: SPARK-5707
 URL: https://issues.apache.org/jira/browse/SPARK-5707
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.2.0, 1.3.1
 Environment: yarn-client mode, spark.sql.codegen=true
Reporter: Yi Yao
Assignee: Ram Sriharsha
Priority: Blocker

 Exception thrown:
 {noformat}
 org.apache.spark.SparkException: Job aborted due to stage failure: Task 13 in 
 stage 133.0 failed 4 times, most recent failure: Lost task 13.3 in stage 
 133.0 (TID 3066, cdh52-node2): java.io.IOException: 
 com.esotericsoftware.kryo.KryoException: Unable to find class: 
 __wrapper$1$81257352e1c844aebf09cb84fe9e7459.__wrapper$1$81257352e1c844aebf09cb84fe9e7459$SpecificRow$1
 Serialization trace:
 hashTable (org.apache.spark.sql.execution.joins.UniqueKeyHashedRelation)
 at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1011)
 at 
 org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
 at 
 org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
 at 
 org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
 at 
 org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
 at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
 at 
 org.apache.spark.sql.execution.joins.BroadcastHashJoin$$anonfun$3.apply(BroadcastHashJoin.scala:62)
 at 
 org.apache.spark.sql.execution.joins.BroadcastHashJoin$$anonfun$3.apply(BroadcastHashJoin.scala:61)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:601)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:601)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
 at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
 at org.apache.spark.rdd.CartesianRDD.compute(CartesianRDD.scala:75)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
 at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
 at org.apache.spark.rdd.CartesianRDD.compute(CartesianRDD.scala:75)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
 at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
 at org.apache.spark.scheduler.Task.run(Task.scala:56)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 

[jira] [Updated] (SPARK-3846) [SQL] Serialization exception (Kryo and Java) on joins when enabling codegen

2015-05-26 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-3846:

Summary: [SQL] Serialization exception (Kryo and Java) on joins when 
enabling codegen   (was: KryoException when doing joins in SparkSQL )

 [SQL] Serialization exception (Kryo and Java) on joins when enabling codegen 
 -

 Key: SPARK-3846
 URL: https://issues.apache.org/jira/browse/SPARK-3846
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.0, 1.2.0
Reporter: Jianshi Huang
Priority: Blocker

 The error is reproducible when I join two tables manually. The error message 
 is as follows.
 org.apache.spark.SparkException: Job aborted due to stage failure: Task 645 
 in stage 3.0 failed 4 times, most recent failure: Lost task 645.3 in stage 
 3.0 (TID 3802, ...): com.esotericsoftware.kryo.KryoException:
 Unable to find class: 
 __wrapper$1$18e31777385a452ba0bc030e899bf5d1.__wrapper$1$18e31777385a452ba0bc030e899bf5d1$SpecificRow$1
 
 com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
 
 com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
 com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
 com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:721)
 com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:42)
 com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:34)
 com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
 
 org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:133)
 
 org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:133)
 org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
 scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
 
 org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
 
 org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
 scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 
 org.apache.spark.sql.execution.HashJoin$$anon$1.hasNext(joins.scala:101)
 scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 
 org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$8.apply(GeneratedAggregate.scala:198)
 
 org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$8.apply(GeneratedAggregate.scala:165)
 org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:599)
 org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:599)
 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 org.apache.spark.scheduler.Task.run(Task.scala:56)
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 java.lang.Thread.run(Thread.java:724)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7012) Add support for NOT NULL modifier for column definitions on DDLParser

2015-05-26 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14559097#comment-14559097
 ] 

Santiago M. Mola commented on SPARK-7012:
-

[~6133d] SQLContext parses DDL statements (such as CREATE TEMPORARY TABLE) with 
an independent parser called DDLParser:
https://github.com/apache/spark/blob/f38e619c41d242143c916373f2a44ec674679f19/sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala#L87

The parsing of the columns for the schema is done in DDLParser.column:
https://github.com/apache/spark/blob/f38e619c41d242143c916373f2a44ec674679f19/sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala#L176
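
For illustration, a self-contained sketch of the kind of grammar change involved (simplified types and keywords, not DDLParser's actual code): the column rule accepts an optional NOT NULL suffix and maps it to nullable = false.

{code}
import scala.util.parsing.combinator.syntactical.StandardTokenParsers

object ColumnDefParser extends StandardTokenParsers {
  lexical.reserved ++= Set("INTEGER", "STRING", "NOT", "NULL")

  case class Column(name: String, dataType: String, nullable: Boolean)

  // ident TYPE [NOT NULL]
  lazy val column: Parser[Column] =
    ident ~ ("INTEGER" | "STRING") ~ opt("NOT" ~ "NULL") ^^ {
      case name ~ tpe ~ notNull => Column(name, tpe, nullable = notNull.isEmpty)
    }

  def parse(input: String): Column =
    phrase(column)(new lexical.Scanner(input)) match {
      case Success(col, _) => col
      case failure => sys.error(failure.toString)
    }
}

// ColumnDefParser.parse("field INTEGER NOT NULL") == Column("field", "INTEGER", false)
{code}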

 Add support for NOT NULL modifier for column definitions on DDLParser
 -

 Key: SPARK-7012
 URL: https://issues.apache.org/jira/browse/SPARK-7012
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.0
Reporter: Santiago M. Mola
Priority: Minor
  Labels: easyfix

 Add support for NOT NULL modifier for column definitions on DDLParser. This 
 would add support for the following syntax:
 CREATE TEMPORARY TABLE (field INTEGER NOT NULL) ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3815) LPAD function does not work in where predicate

2015-05-26 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14559331#comment-14559331
 ] 

Santiago M. Mola commented on SPARK-3815:
-

[~yanakad] Is this still present in more recent versions? If yes, could you 
provide a minimal test case (query + data)?

 LPAD function does not work in where predicate
 --

 Key: SPARK-3815
 URL: https://issues.apache.org/jira/browse/SPARK-3815
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.0
Reporter: Yana Kadiyska
Priority: Minor

 select customer_id from mytable where 
 pkey=concat_ws('-',LPAD('077',4,'0'),'2014-07') LIMIT 2
 produces:
 14/10/03 14:51:35 ERROR server.SparkSQLOperationManager: Error executing 
 query:
 org.apache.spark.SparkException: Task not serializable
 at 
 org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
 at 
 org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
 at org.apache.spark.SparkContext.clean(SparkContext.scala:1242)
 at org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:597)
 at 
 org.apache.spark.sql.execution.Limit.execute(basicOperators.scala:146)
 at 
 org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd$lzycompute(HiveContext.scala:360)
 at 
 org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd(HiveContext.scala:360)
 at 
 org.apache.spark.sql.hive.thriftserver.server.SparkSQLOperationManager$$anon$1.run(SparkSQLOperationManager.scala:185)
 at 
 org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:193)
 at 
 org.apache.hive.service.cli.session.HiveSessionImpl.executeStatement(HiveSessionImpl.java:175)
 at 
 org.apache.hive.service.cli.CLIService.executeStatement(CLIService.java:150)
 at 
 org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:207)
 at 
 org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1133)
 at 
 org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1118)
 at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
 at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
 at 
 org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:58)
 at 
 org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:55)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
 at 
 org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:526)
 at 
 org.apache.hive.service.auth.TUGIContainingProcessor.process(TUGIContainingProcessor.java:55)
 at 
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: java.io.NotSerializableException: java.lang.reflect.Constructor
 at 
 java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
 at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
 at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
 at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
 at 
 java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
 at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
 at 
 java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173)
 at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
 at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
 at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
 at 
 java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
 at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
 at scala.collection.immutable.$colon$colon.writeObject(List.scala:379)
 at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 The following work fine:
 select concat_ws('-', LPAD(cast(112717 % 1024 AS 

[jira] [Commented] (SPARK-4867) UDF clean up

2015-05-26 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14559423#comment-14559423
 ] 

Santiago M. Mola commented on SPARK-4867:
-

Maybe this issue can be split in smaller tasks? A lot of built-in functions can 
be removed from the parser quite easily by registering them in the 
FunctionRegistry. I am doing this with a lot of fixed-arity functions.

I'm using some helper functions to create FunctionBuilders for Expression for 
use with the FunctionRegistry. The main helper looks like this:

{code}
def expression[T <: Expression](arity: Int)(implicit tag: ClassTag[T]): ExpressionBuilder = {
  val argTypes = (1 to arity).map(x => classOf[Expression])
  val constructor = tag.runtimeClass.getDeclaredConstructor(argTypes: _*)
  (expressions: Seq[Expression]) => {
    if (expressions.size != arity) {
      throw new IllegalArgumentException(
        s"Invalid number of arguments: ${expressions.size} (must be equal to $arity)")
    }
    constructor.newInstance(expressions: _*).asInstanceOf[Expression]
  }
}
{code}

and can be used like this:

{code}
functionRegistry.registerFunction("MY_FUNCTION", expression[MyFunction])
{code}

If this approach looks like what is needed, I can extend it to use expressions 
with a variable number of parameters. Also, with some syntactic sugar we can 
provide a function that works this way:

{code}
functionRegistry.registerFunction[MyFunction]
// Register the builder produced by expression[MyFunction] under the name 
// "MY_FUNCTION", using a camelCase -> underscore-separated conversion.
{code}

How does this sound?
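
For what it's worth, a hypothetical sketch of that sugar (it reuses the expression[T] helper above; registerFunction's exact signature and the name-conversion rule are assumptions, not existing Spark API):

{code}
// Builds on the expression[T] helper above; derives the SQL name from the class name.
def registerExpression[T <: Expression](arity: Int)(implicit tag: ClassTag[T]): Unit = {
  // MyFunction -> MY_FUNCTION
  val sqlName = tag.runtimeClass.getSimpleName
    .replaceAll("([a-z0-9])([A-Z])", "$1_$2")
    .toUpperCase
  functionRegistry.registerFunction(sqlName, expression[T](arity))
}

// registerExpression[MyFunction](arity = 2) would register it as "MY_FUNCTION".
{code}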


 UDF clean up
 

 Key: SPARK-4867
 URL: https://issues.apache.org/jira/browse/SPARK-4867
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Michael Armbrust
Priority: Blocker

 Right now our support and internal implementation of many functions has a few 
 issues.  Specifically:
  - UDFs don't know their input types and thus don't do type coercion.
  - We hard code a bunch of built in functions into the parser.  This is bad 
 because in SQL it creates new reserved words for things that aren't actually 
 keywords.  Also it means that for each function we need to add support to 
 both SQLContext and HiveContext separately.
 For this JIRA I propose we do the following:
  - Change the interfaces for registerFunction and ScalaUdf to include types 
 for the input arguments as well as the output type.
  - Add a rule to analysis that does type coercion for UDFs.
  - Add a parse rule for functions to SQLParser.
  - Rewrite all the UDFs that are currently hacked into the various parsers 
 using this new functionality.
 Depending on how big this refactoring becomes we could split parts 1 and 2 from 
 part 3 above.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4678) A SQL query with subquery fails with TreeNodeException

2015-05-26 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14559335#comment-14559335
 ] 

Santiago M. Mola commented on SPARK-4678:
-

[~ozawa] Does this happen in more recent versions?

 A SQL query with subquery fails with TreeNodeException
 --

 Key: SPARK-4678
 URL: https://issues.apache.org/jira/browse/SPARK-4678
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.1
Reporter: Tsuyoshi Ozawa

 {code}
 spark-sql> create external table if NOT EXISTS randomText100GB(text string) location 'hdfs:///user/ozawa/randomText100GB';
 spark-sql> CREATE TABLE wordcount AS
   SELECT word, count(1) AS count
   FROM (SELECT 
 EXPLODE(SPLIT(LCASE(REGEXP_REPLACE(text,'[\\p{Punct},\\p{Cntrl}]','')),' '))
   AS word FROM randomText100GB) words
   GROUP BY word;
 org.apache.spark.SparkException: Job aborted due to stage failure: Task 9 in 
 stage 1.0 failed 4 times, most recent failure: Lost task 9.3 in stage 1.0 
 (TID 25, hadoop-slave2.c.gcp-samples.internal): 
 org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding 
 attribute, tree: word#5
 
 org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47)
 
 org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:43)
 
 org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:42)
 
 org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
 
 org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:156)
 
 org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:42)
 
 org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:52)
 
 org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:52)
 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 
 scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
 scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
 scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
 scala.collection.AbstractTraversable.map(Traversable.scala:105)
 
 org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.init(Projection.scala:52)
 
 org.apache.spark.sql.execution.SparkPlan$$anonfun$newMutableProjection$1.apply(SparkPlan.scala:106)
 
 org.apache.spark.sql.execution.SparkPlan$$anonfun$newMutableProjection$1.apply(SparkPlan.scala:106)
 
 org.apache.spark.sql.execution.Project$$anonfun$1.apply(basicOperators.scala:43)
 
 org.apache.spark.sql.execution.Project$$anonfun$1.apply(basicOperators.scala:42)
 org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 org.apache.spark.scheduler.Task.run(Task.scala:54)
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 java.lang.Thread.run(Thread.java:745)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-7724) Add support for Intersect and Except in Catalyst DSL

2015-05-21 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola reopened SPARK-7724:
-

Thanks. Here's a PR.

 Add support for Intersect and Except in Catalyst DSL
 

 Key: SPARK-7724
 URL: https://issues.apache.org/jira/browse/SPARK-7724
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.1
Reporter: Santiago M. Mola
Priority: Trivial
  Labels: easyfix, starter

 Catalyst DSL to create logical plans supports most of the current plan, but 
 it is missing Except and Intersect. See LogicalPlanFunctions:
 https://github.com/apache/spark/blob/6008ec14ed6491d0a854bb50548c46f2f9709269/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala#L248



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5305) Using a field in a WHERE clause that is not in the schema does not throw an exception.

2015-05-21 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555070#comment-14555070
 ] 

Santiago M. Mola commented on SPARK-5305:
-

[~sonixbp] What version were you using? Do you still experience this problem? 
It does not seem possible with recent versions.

 Using a field in a WHERE clause that is not in the schema does not throw an 
 exception.
 --

 Key: SPARK-5305
 URL: https://issues.apache.org/jira/browse/SPARK-5305
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Corey J. Nolet

 Given a schema:
 key1 = String
 key2 = Integer
 The following sql statement doesn't seem to throw an exception:
 SELECT * FROM myTable WHERE doesntExist = 'val1'



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-5755) remove unnecessary Add for unary plus sign

2015-05-21 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola resolved SPARK-5755.
-
   Resolution: Fixed
Fix Version/s: 1.3.0

 remove unnecessary Add for unary plus sign 
 ---

 Key: SPARK-5755
 URL: https://issues.apache.org/jira/browse/SPARK-5755
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Adrian Wang
Priority: Minor
 Fix For: 1.3.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5755) remove unnecessary Add for unary plus sign (HiveQL)

2015-05-21 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-5755:

Summary: remove unnecessary Add for unary plus sign (HiveQL)  (was: remove 
unnecessary Add for unary plus sign )

 remove unnecessary Add for unary plus sign (HiveQL)
 ---

 Key: SPARK-5755
 URL: https://issues.apache.org/jira/browse/SPARK-5755
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Adrian Wang
Priority: Minor
 Fix For: 1.3.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7754) [SQL] Use PartialFunction literals instead of objects in Catalyst

2015-05-21 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554935#comment-14554935
 ] 

Santiago M. Mola commented on SPARK-7754:
-

Not all rules use transform. Some use transformUp and others use 
transformAllExpressions. Maybe this rule API could be extended to cover these 
cases.
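
For illustration, a hedged sketch of how such helpers could look for the other traversals (the names ruleUp and ruleAllExpressions are made up for this sketch, not existing API):

{code}
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// Sketch only: mirrors the proposed `rule` helper for transformUp- and
// transformAllExpressions-based rules.
object ruleUp {
  def apply(pf: PartialFunction[LogicalPlan, LogicalPlan]): Rule[LogicalPlan] =
    new Rule[LogicalPlan] {
      def apply(plan: LogicalPlan): LogicalPlan = plan transformUp pf
    }
}

object ruleAllExpressions {
  def apply(pf: PartialFunction[Expression, Expression]): Rule[LogicalPlan] =
    new Rule[LogicalPlan] {
      def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions pf
    }
}
{code}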

 [SQL] Use PartialFunction literals instead of objects in Catalyst
 -

 Key: SPARK-7754
 URL: https://issues.apache.org/jira/browse/SPARK-7754
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Edoardo Vacchi
Priority: Minor

 Catalyst rules extend two distinct rule types: {{Rule[LogicalPlan]}} and 
 {{Strategy}} (which is an alias for {{GenericStrategy[SparkPlan]}}).
 The distinction is fairly subtle: in the end, both rule types are supposed to 
 define a method {{apply(plan: LogicalPlan)}}
 (where LogicalPlan is either Logical- or Spark-) which returns a transformed 
 plan (or a sequence thereof, in the case
 of Strategy).
 Ceremonies asides, the body of such method is always of the kind:
 {code:java}
  def apply(plan: PlanType) = plan match pf
 {code}
 where `pf` would be some `PartialFunction` of the PlanType:
 {code:java}
   val pf = {
     case ... => ...
   }
 {code}
 This is JIRA is a proposal to introduce utility methods to
   a) reduce the boilerplate to define rewrite rules
   b) turning them back into what they essentially represent: function types.
 These changes would be backwards compatible, and would greatly help in 
 understanding what the code does. Current use of objects is redundant and 
 possibly confusing.
 *{{Rule[LogicalPlan]}}*
 a) Introduce the utility object
 {code:java}
   object rule {
     def apply(pf: PartialFunction[LogicalPlan, LogicalPlan]): Rule[LogicalPlan] =
       new Rule[LogicalPlan] {
         def apply(plan: LogicalPlan): LogicalPlan = plan transform pf
       }

     def named(name: String)(pf: PartialFunction[LogicalPlan, LogicalPlan]): Rule[LogicalPlan] =
       new Rule[LogicalPlan] {
         override val ruleName = name
         def apply(plan: LogicalPlan): LogicalPlan = plan transform pf
       }
   }
 {code}
 b) progressively replace the boilerplate-y object definitions; e.g.
 {code:java}
 object MyRewriteRule extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
     case ... => ...
   }
 }
 {code}
 with
 {code:java}
 // define a Rule[LogicalPlan]
 val MyRewriteRule = rule {
   case ... => ...
 }
 {code}
 and/or :
 {code:java}
 // define a named Rule[LogicalPlan]
 val MyRewriteRule = rule.named("My rewrite rule") {
   case ... => ...
 }
 {code}
 *Strategies*
 A similar solution could be applied to shorten the code for
 Strategies, which are total functions
 only because they are all supposed to manage the default case,
 possibly returning `Nil`. In this case
 we might introduce the following utility:
 {code:java}
 object strategy {
   /**
* Generate a Strategy from a PartialFunction[LogicalPlan, SparkPlan].
* The partial function must therefore return *one single* SparkPlan for 
 each case.
* The method will automatically wrap them in a [[Seq]].
* Unhandled cases will automatically return Seq.empty
*/
   def apply(pf: PartialFunction[LogicalPlan, SparkPlan]): Strategy =
 new Strategy {
   def apply(plan: LogicalPlan): Seq[SparkPlan] =
 if (pf.isDefinedAt(plan)) Seq(pf.apply(plan)) else Seq.empty
 }
   /**
* Generate a Strategy from a PartialFunction[ LogicalPlan, Seq[SparkPlan] 
 ].
* The partial function must therefore return a Seq[SparkPlan] for each 
 case.
* Unhandled cases will automatically return Seq.empty
*/
  def seq(pf: PartialFunction[LogicalPlan, Seq[SparkPlan]]): Strategy =
 new Strategy {
   def apply(plan: LogicalPlan): Seq[SparkPlan] =
 if (pf.isDefinedAt(plan)) pf.apply(plan) else Seq.empty[SparkPlan]
 }
 }
 {code}
 Usage:
 {code:java}
 val mystrategy = strategy { case ... => ... }
 val seqstrategy = strategy.seq { case ... => ... }
 {code}
 *Further possible improvements:*
 Making the utility methods `implicit`, thereby
 further reducing the rewrite rules to:
 {code:java}
 // define a PartialFunction[LogicalPlan, LogicalPlan]
 // the implicit would convert it into a Rule[LogicalPlan] at the use sites
 val MyRewriteRule = {
   case ... => ...
 }
 {code}
 *Caveats*
 Because of the way objects are initialized vs. vals, it might be necessary
 to reorder instructions so that vals are actually initialized before they are 
 used.
 E.g.:
 {code:java}
 class MyOptimizer extends Optimizer {
   override val batches: Seq[Batch] =
   ...
  Batch("Other rules", FixedPoint(100),
 

[jira] [Comment Edited] (SPARK-7724) Add support for Intersect and Except in Catalyst DSL

2015-05-20 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14553059#comment-14553059
 ] 

Santiago M. Mola edited comment on SPARK-7724 at 5/20/15 8:36 PM:
--

DataFrame is beyond the scope here. I do use the catalyst DSL quite intensively 
for writing test cases, so I thought that a trivial patch to complete the API 
would make sense. I can continue using Intersect and Except classes directly 
though.


was (Author: smolav):
DataFrame is beyond the scope here. I do use the catalyst DSL quite intensively 
for writing test cases, so I thought that a trivial patch to complete the API 
would make sense. I can continue using Intersect and Except clases directly 
though.

 Add support for Intersect and Except in Catalyst DSL
 

 Key: SPARK-7724
 URL: https://issues.apache.org/jira/browse/SPARK-7724
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.1
Reporter: Santiago M. Mola
Priority: Trivial
  Labels: easyfix, starter

 Catalyst DSL to create logical plans supports most of the current plan, but 
 it is missing Except and Intersect. See LogicalPlanFunctions:
 https://github.com/apache/spark/blob/6008ec14ed6491d0a854bb50548c46f2f9709269/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala#L248



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7724) Add support for Intersect and Except in Catalyst DSL

2015-05-20 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14553059#comment-14553059
 ] 

Santiago M. Mola commented on SPARK-7724:
-

DataFrame is beyond the scope here. I do use the catalyst DSL quite intensively 
for writing test cases, so I thought that a trivial patch to complete the API 
would make sense. I can continue using Intersect and Except classes directly 
though.

 Add support for Intersect and Except in Catalyst DSL
 

 Key: SPARK-7724
 URL: https://issues.apache.org/jira/browse/SPARK-7724
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.1
Reporter: Santiago M. Mola
Priority: Trivial
  Labels: easyfix, starter

 Catalyst DSL to create logical plans supports most of the current plan, but 
 it is missing Except and Intersect. See LogicalPlanFunctions:
 https://github.com/apache/spark/blob/6008ec14ed6491d0a854bb50548c46f2f9709269/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala#L248



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-7724) Add support for Intersect and Except in Catalyst DSL

2015-05-19 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-7724:
---

 Summary: Add support for Intersect and Except in Catalyst DSL
 Key: SPARK-7724
 URL: https://issues.apache.org/jira/browse/SPARK-7724
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.1
Reporter: Santiago M. Mola
Priority: Trivial


Catalyst DSL to create logical plans supports most of the current plan, but it 
is missing Except and Intersect. See LogicalPlanFunctions:

https://github.com/apache/spark/blob/6008ec14ed6491d0a854bb50548c46f2f9709269/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala#L248



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-7727) Avoid inner classes in RuleExecutor

2015-05-19 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-7727:
---

 Summary: Avoid inner classes in RuleExecutor
 Key: SPARK-7727
 URL: https://issues.apache.org/jira/browse/SPARK-7727
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.1
Reporter: Santiago M. Mola


In RuleExecutor, the following classes and objects are defined as inner classes 
or objects: Strategy, Once, FixedPoint, Batch.

This does not seem to accomplish anything in this case, but makes extensibility 
harder. For example, if I want to define a new Optimizer that uses all batches 
from the DefaultOptimizer plus some more, I would do something like:

{code}
new Optimizer {
  override protected val batches: Seq[Batch] =
    DefaultOptimizer.batches ++ myBatches
}
{code}

But this will give a typing error because batches in DefaultOptimizer are of 
type DefaultOptimizer#Batch while myBatches are this#Batch.

Workarounds include either copying the list of batches from DefaultOptimizer or 
using a method like this:

{code}
private def transformBatchType(b: DefaultOptimizer.Batch): Batch = {
  val strategy = b.strategy.maxIterations match {
    case 1 => Once
    case n => FixedPoint(n)
  }
  Batch(b.name, strategy, b.rules: _*)
}
{code}

However, making these classes outer would solve the problem.
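
For illustration, one hedged sketch of what "making these classes outer" could look like, moving the types to a standalone object and parameterizing Batch on the tree type; this is a proposal sketch, not the current RuleExecutor API:

{code}
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.catalyst.trees.TreeNode

// Sketch only: with Batch/Once/FixedPoint defined outside the RuleExecutor
// class, DefaultOptimizer#Batch and MyOptimizer#Batch would be the same type,
// so batches could be shared without the transformBatchType workaround.
object RuleExecutorTypes {
  abstract class Strategy { def maxIterations: Int }
  case object Once extends Strategy { val maxIterations = 1 }
  case class FixedPoint(maxIterations: Int) extends Strategy
  case class Batch[TreeType <: TreeNode[TreeType]](
      name: String, strategy: Strategy, rules: Rule[TreeType]*)
}
{code}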



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7275) Make LogicalRelation public

2015-05-17 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14547362#comment-14547362
 ] 

Santiago M. Mola commented on SPARK-7275:
-

[~rxin] What are your thoughts on this?

 Make LogicalRelation public
 ---

 Key: SPARK-7275
 URL: https://issues.apache.org/jira/browse/SPARK-7275
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Santiago M. Mola
Priority: Minor

 It seems LogicalRelation is the only part of the LogicalPlan that is not 
 public. This makes it harder to work with full logical plans from third party 
 packages.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6743) Join with empty projection on one side produces invalid results

2015-05-14 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543602#comment-14543602
 ] 

Santiago M. Mola commented on SPARK-6743:
-

This problem only happens for cached relations. Here is the root of the problem:

{code}
/* Fails. Got: Array(Row("A1"), Row("A2")) */
assertResult(Array(Row(), Row()))(
  InMemoryColumnarTableScan(Nil, Nil,
    sqlc.table("tab0").queryExecution.sparkPlan.asInstanceOf[InMemoryColumnarTableScan].relation)
    .execute().collect()
)
{code}

InMemoryColumnarTableScan returns the narrowest column when no attributes are 
requested:

{code}
  // Find the ordinals and data types of the requested columns.  If none are requested, use the
  // narrowest (the field with minimum default element size).
  val (requestedColumnIndices, requestedColumnDataTypes) = if (attributes.isEmpty) {
    val (narrowestOrdinal, narrowestDataType) =
      relation.output.zipWithIndex.map { case (a, ordinal) =>
        ordinal -> a.dataType
      } minBy { case (_, dataType) =>
        ColumnType(dataType).defaultSize
      }
    Seq(narrowestOrdinal) -> Seq(narrowestDataType)
  } else {
    attributes.map { a =>
      relation.output.indexWhere(_.exprId == a.exprId) -> a.dataType
    }.unzip
  }
{code}

It seems this is what leads to incorrect results.

 Join with empty projection on one side produces invalid results
 ---

 Key: SPARK-6743
 URL: https://issues.apache.org/jira/browse/SPARK-6743
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
Reporter: Santiago M. Mola
Priority: Critical

 {code:java}
 val sqlContext = new SQLContext(sc)
 val tab0 = sc.parallelize(Seq(
   (83,0,38),
   (26,0,79),
   (43,81,24)
 ))
 sqlContext.registerDataFrameAsTable(sqlContext.createDataFrame(tab0), "tab0")
 sqlContext.cacheTable("tab0")
 val df1 = sqlContext.sql("SELECT tab0._2, cor0._2 FROM tab0, tab0 cor0 GROUP BY tab0._2, cor0._2")
 val result1 = df1.collect()
 val df2 = sqlContext.sql("SELECT cor0._2 FROM tab0, tab0 cor0 GROUP BY cor0._2")
 val result2 = df2.collect()
 val df3 = sqlContext.sql("SELECT cor0._2 FROM tab0 cor0 GROUP BY cor0._2")
 val result3 = df3.collect()
 {code}
 Given the previous code, result2 equals to Row(43), Row(83), Row(26), which 
 is wrong. These results correspond to cor0._1, instead of cor0._2. Correct 
 results would be Row(0), Row(81), which are ok for the third query. The first 
 query also produces valid results, and the only difference is that the left 
 side of the join is not empty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7012) Add support for NOT NULL modifier for column definitions on DDLParser

2015-05-14 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543403#comment-14543403
 ] 

Santiago M. Mola commented on SPARK-7012:
-

In Spark SQL, every expression can be nullable or not (i.e. values can be null 
or not). All Spark SQL and Catalyst internals support specifying this.

See, for example, StructField, which is the relevant class for schemas:
https://github.com/apache/spark/blob/2d6612cc8b98f767d73c4d15e4065bf3d6c12ea7/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructField.scala#L31

Or AttributeReference:
https://github.com/apache/spark/blob/c1080b6fddb22d84694da2453e46a03fbc041576/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala#L166

However, when creating a temporary table through a SQL statement (CREATE 
TEMPORARY TABLE), there is no way of specifying if a column is nullable or not 
(it will be always nullable by default).

Standard SQL supports a constraint called NOT NULL to specify that a column 
is not nullable. See:
http://www.w3schools.com/sql/sql_notnull.asp

In order to implement this, the parser for CREATE TEMPORARY TABLE, that is, 
DDLParser, should be modified to allow NOT NULL and set nullable = false 
accordingly in StructField. See:
https://github.com/apache/spark/blob/0595b6de8f1da04baceda082553c2aa1aa2cb006/sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala#L176
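
For illustration, a standalone sketch of the intended grammar change (this is not the actual DDLParser code; the parser and production names here are made up):

{code}
import scala.util.parsing.combinator.JavaTokenParsers
import org.apache.spark.sql.types._

// Sketch only: a column definition with an optional NOT NULL constraint that
// maps to StructField(..., nullable = false).
object ColumnDefSketch extends JavaTokenParsers {
  private lazy val dataType: Parser[DataType] =
    "(?i)INTEGER|INT".r ^^^ IntegerType | "(?i)STRING".r ^^^ StringType

  lazy val column: Parser[StructField] =
    ident ~ dataType ~ opt("(?i)NOT\\s+NULL".r) ^^ {
      case name ~ typ ~ notNull => StructField(name, typ, nullable = notNull.isEmpty)
    }
}

// e.g. ColumnDefSketch.parseAll(ColumnDefSketch.column, "field INTEGER NOT NULL")
// yields StructField("field", IntegerType, nullable = false).
{code}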


 Add support for NOT NULL modifier for column definitions on DDLParser
 -

 Key: SPARK-7012
 URL: https://issues.apache.org/jira/browse/SPARK-7012
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.0
Reporter: Santiago M. Mola
Priority: Minor
  Labels: easyfix

 Add support for NOT NULL modifier for column definitions on DDLParser. This 
 would add support for the following syntax:
 CREATE TEMPORARY TABLE (field INTEGER NOT NULL) ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6743) Join with empty projection on one side produces invalid results

2015-05-14 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543435#comment-14543435
 ] 

Santiago M. Mola commented on SPARK-6743:
-

Sorry, my first example was not very clear. Here is a more precise one:

{code}
val sqlc = new SQLContext(sc)

val tab0 = sc.parallelize(Seq(
  Tuple1("A1"),
  Tuple1("A2")
))
sqlc.registerDataFrameAsTable(sqlc.createDataFrame(tab0), "tab0")
sqlc.cacheTable("tab0")

val tab1 = sc.parallelize(Seq(
  Tuple1("B1"),
  Tuple1("B2")
))
sqlc.registerDataFrameAsTable(sqlc.createDataFrame(tab1), "tab1")
sqlc.cacheTable("tab1")

/* Succeeds */
val result1 = sqlc.sql("SELECT tab0._1, tab1._1 FROM tab0, tab1 GROUP BY tab0._1, tab1._1 ORDER BY tab0._1, tab1._1").collect()
assertResult(Array(Row("A1", "B1"), Row("A1", "B2"), Row("A2", "B1"), Row("A2", "B2")))(result1)

/* Fails. Got: Array([A1], [A2]) */
val result2 = sqlc.sql("SELECT tab1._1 FROM tab0, tab1 GROUP BY tab1._1 ORDER BY tab1._1").collect()
assertResult(Array(Row("B1"), Row("B2")))(result2)

 Join with empty projection on one side produces invalid results
 ---

 Key: SPARK-6743
 URL: https://issues.apache.org/jira/browse/SPARK-6743
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
Reporter: Santiago M. Mola
Priority: Critical

 {code:java}
 val sqlContext = new SQLContext(sc)
 val tab0 = sc.parallelize(Seq(
   (83,0,38),
   (26,0,79),
   (43,81,24)
 ))
 sqlContext.registerDataFrameAsTable(sqlContext.createDataFrame(tab0), "tab0")
 sqlContext.cacheTable("tab0")
 val df1 = sqlContext.sql("SELECT tab0._2, cor0._2 FROM tab0, tab0 cor0 GROUP BY tab0._2, cor0._2")
 val result1 = df1.collect()
 val df2 = sqlContext.sql("SELECT cor0._2 FROM tab0, tab0 cor0 GROUP BY cor0._2")
 val result2 = df2.collect()
 val df3 = sqlContext.sql("SELECT cor0._2 FROM tab0 cor0 GROUP BY cor0._2")
 val result3 = df3.collect()
 {code}
 Given the previous code, result2 equals to Row(43), Row(83), Row(26), which 
 is wrong. These results correspond to cor0._1, instead of cor0._2. Correct 
 results would be Row(0), Row(81), which are ok for the third query. The first 
 query also produces valid results, and the only difference is that the left 
 side of the join is not empty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4758) Make metastore_db in-memory for HiveContext

2015-05-13 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541865#comment-14541865
 ] 

Santiago M. Mola commented on SPARK-4758:
-

This could also make testing more convenient. Is there any progress on this?

 Make metastore_db in-memory for HiveContext
 ---

 Key: SPARK-4758
 URL: https://issues.apache.org/jira/browse/SPARK-4758
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.2.0, 1.3.0
Reporter: Jianshi Huang
Priority: Minor

 HiveContext by default will create a local folder metastore_db.
 This is not very user friendly as the metastore_db will be locked by 
 HiveContext and thus will block multiple Spark process to start from the same 
 directory.
 I would propose adding a default hive-site.xml in conf/ with the following 
 content.
 <configuration>
   <property>
     <name>javax.jdo.option.ConnectionURL</name>
     <value>jdbc:derby:memory:databaseName=metastore_db;create=true</value>
   </property>
   <property>
     <name>javax.jdo.option.ConnectionDriverName</name>
     <value>org.apache.derby.jdbc.EmbeddedDriver</value>
   </property>
   <property>
     <name>hive.metastore.warehouse.dir</name>
     <value>file://${user.dir}/hive/warehouse</value>
   </property>
 </configuration>
 jdbc:derby:memory:databaseName=metastore_db;create=true Will make sure the 
 embedded derby database is created in-memory.
 Jianshi



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-7566) HiveContext.analyzer cannot be overriden

2015-05-12 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-7566:
---

 Summary: HiveContext.analyzer cannot be overriden
 Key: SPARK-7566
 URL: https://issues.apache.org/jira/browse/SPARK-7566
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.1
Reporter: Santiago M. Mola


Trying to override HiveContext.analyzer will give the following compilation 
error:

{code}
Error:(51, 36) overriding lazy value analyzer in class HiveContext of type 
org.apache.spark.sql.catalyst.analysis.Analyzer{val extendedResolutionRules: 
List[org.apache.spark.sql.catalyst.rules.Rule[org.apache.spark.sql.catalyst.plans.logical.LogicalPlan]]};
 lazy value analyzer has incompatible type
  override protected[sql] lazy val analyzer: Analyzer = {
   ^
{code}

That is because the type changed inadvertently when the explicit return type 
declaration was omitted.
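
For context, a sketch of the kind of fix: an explicit type annotation on the declaration in HiveContext itself, so the member type stays Analyzer rather than the inferred structural refinement (the constructor arguments and rule list below are illustrative, not the exact Spark source):

{code}
// Sketch only, inside HiveContext: the explicit ": Analyzer" annotation is the
// point; without it the inferred type is a structural refinement that
// subclasses cannot conform to when overriding.
override protected[sql] lazy val analyzer: Analyzer =
  new Analyzer(catalog, functionRegistry, caseSensitive = false) {
    override val extendedResolutionRules =
      catalog.ParquetConversions :: Nil // illustrative; not the full rule list
  }
{code}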



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7088) [REGRESSION] Spark 1.3.1 breaks analysis of third-party logical plans

2015-05-11 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538734#comment-14538734
 ] 

Santiago M. Mola commented on SPARK-7088:
-

Any thoughts on this?

 [REGRESSION] Spark 1.3.1 breaks analysis of third-party logical plans
 -

 Key: SPARK-7088
 URL: https://issues.apache.org/jira/browse/SPARK-7088
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.1
Reporter: Santiago M. Mola
Priority: Critical
  Labels: regression

 We're using some custom logical plans. We are now migrating from Spark 1.3.0 
 to 1.3.1 and found a few incompatible API changes. All of them seem to be in 
 internal code, so we understand that. But now the ResolveReferences rule, 
 that used to work with third-party logical plans just does not work, without 
 any possible workaround that I'm aware other than just copying 
 ResolveReferences rule and using it with our own fix.
 The change in question is this section of code:
 {code}
 }.headOption.getOrElse { // Only handle first case, others will be fixed on the next pass.
   sys.error(
     s"""
       |Failure when resolving conflicting references in Join:
       |$plan
       |
       |Conflicting attributes: ${conflictingAttributes.mkString(",")}
     """.stripMargin)
 }
 {code}
 Which causes the following error on analysis:
 {code}
 Failure when resolving conflicting references in Join:
 'Project ['l.name,'r.name,'FUNC1('l.node,'r.node) AS 
 c2#37,'FUNC2('l.node,'r.node) AS c3#38,'FUNC3('r.node,'l.node) AS c4#39]
  'Join Inner, None
   Subquery l
Subquery h
 Project [name#12,node#36]
  CustomPlan H, u, (p#13L = s#14L), [ord#15 ASC], IS NULL p#13L, node#36
   Subquery v
Subquery h_src
 LogicalRDD [name#12,p#13L,s#14L,ord#15], MapPartitionsRDD[1] at 
 mapPartitions at ExistingRDD.scala:37
   Subquery r
Subquery h
 Project [name#40,node#36]
  CustomPlan H, u, (p#41L = s#42L), [ord#43 ASC], IS NULL pred#41L, node#36
   Subquery v
Subquery h_src
 LogicalRDD [name#40,p#41L,s#42L,ord#43], MapPartitionsRDD[1] at 
 mapPartitions at ExistingRDD.scala:37
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6743) Join with empty projection on one side produces invalid results

2015-05-11 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538737#comment-14538737
 ] 

Santiago M. Mola commented on SPARK-6743:
-

Any thoughts on this?

 Join with empty projection on one side produces invalid results
 ---

 Key: SPARK-6743
 URL: https://issues.apache.org/jira/browse/SPARK-6743
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
Reporter: Santiago M. Mola
Priority: Critical

 {code:java}
 val sqlContext = new SQLContext(sc)
 val tab0 = sc.parallelize(Seq(
   (83,0,38),
   (26,0,79),
   (43,81,24)
 ))
 sqlContext.registerDataFrameAsTable(sqlContext.createDataFrame(tab0), "tab0")
 sqlContext.cacheTable("tab0")
 val df1 = sqlContext.sql("SELECT tab0._2, cor0._2 FROM tab0, tab0 cor0 GROUP BY tab0._2, cor0._2")
 val result1 = df1.collect()
 val df2 = sqlContext.sql("SELECT cor0._2 FROM tab0, tab0 cor0 GROUP BY cor0._2")
 val result2 = df2.collect()
 val df3 = sqlContext.sql("SELECT cor0._2 FROM tab0 cor0 GROUP BY cor0._2")
 val result3 = df3.collect()
 {code}
 Given the previous code, result2 equals to Row(43), Row(83), Row(26), which 
 is wrong. These results correspond to cor0._1, instead of cor0._2. Correct 
 results would be Row(0), Row(81), which are ok for the third query. The first 
 query also produces valid results, and the only difference is that the left 
 side of the join is not empty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7275) Make LogicalRelation public

2015-05-07 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532159#comment-14532159
 ] 

Santiago M. Mola commented on SPARK-7275:
-

[~gweidner] I work on a project that extends Spark SQL with a richer data 
sources API. One of such extensions is the ability to push down a subtree of 
the logical plan in full to a data source. Data sources implementing this API 
must be able to inspect the LogicalPlan they're given, and that includes 
matching LogicalRelation. If a data source is in its own Java package (i.e. not 
org.apache.spark.sql) which is the usual case, it will not be able to match a 
LogicalRelation out of the box. Currently, I implemented a workaround by adding 
a public extractor, IsLogicalRelation, in the org.apache.spark.sql package that 
proxies LogicalRelation to outside packages... which is, of course, an ugly 
hack.

Note that LogicalRelation is the only element of the logical plan which is not 
public.
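
For context, such a proxy extractor looks roughly like this (a sketch assuming the Spark 1.3.x location org.apache.spark.sql.sources.LogicalRelation; IsLogicalRelation is the workaround object mentioned above, not Spark API):

{code}
package org.apache.spark.sql

import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.sources.{BaseRelation, LogicalRelation}

// Workaround sketch: lives in org.apache.spark.sql so it can see the
// private[sql] LogicalRelation, and re-exposes it to outside packages.
object IsLogicalRelation {
  def unapply(plan: LogicalPlan): Option[BaseRelation] = plan match {
    case LogicalRelation(relation) => Some(relation)
    case _ => None
  }
}
{code}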

 Make LogicalRelation public
 ---

 Key: SPARK-7275
 URL: https://issues.apache.org/jira/browse/SPARK-7275
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Santiago M. Mola
Priority: Minor

 It seems LogicalRelation is the only part of the LogicalPlan that is not 
 public. This makes it harder to work with full logical plans from third party 
 packages.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-7275) Make LogicalRelation public

2015-04-30 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-7275:
---

 Summary: Make LogicalRelation public
 Key: SPARK-7275
 URL: https://issues.apache.org/jira/browse/SPARK-7275
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Santiago M. Mola
Priority: Minor


It seems LogicalRelation is the only part of the LogicalPlan that is not 
public. This makes it harder to work with full logical plans from third party 
packages.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7088) [REGRESSION] Spark 1.3.1 breaks analysis of third-party logical plans

2015-04-23 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-7088:

Description: 
We're using some custom logical plans. We are now migrating from Spark 1.3.0 to 
1.3.1 and found a few incompatible API changes. All of them seem to be in 
internal code, so we understand that. But now the ResolveReferences rule, that 
used to work with third-party logical plans just does not work, without any 
possible workaround that I'm aware other than just copying ResolveReferences 
rule and using it with our own fix.

The change in question is this section of code:

}.headOption.getOrElse { // Only handle first case, others will be fixed on the next pass.
  sys.error(
    s"""
      |Failure when resolving conflicting references in Join:
      |$plan
      |
      |Conflicting attributes: ${conflictingAttributes.mkString(",")}
    """.stripMargin)
}


Which causes the following error on analysis:

Failure when resolving conflicting references in Join:
'Project ['l.name,'r.name,'IS_DESCENDANT('l.node,'r.node) AS 
c2#37,'IS_DESCENDANT_OR_SELF('l.node,'r.node) AS 
c3#38,'IS_PARENT('r.node,'l.node) AS c4#39]
 'Join Inner, None
  Subquery l
   Subquery h
Project [name#12,node#36]
 CustomPlan H, u, (p#13L = s#14L), [ord#15 ASC], IS NULL p#13L, node#36
  Subquery v
   Subquery h_src
LogicalRDD [name#12,p#13L,s#14L,ord#15], MapPartitionsRDD[1] at 
mapPartitions at ExistingRDD.scala:37
  Subquery r
   Subquery h
Project [name#40,node#36]
 CustomPlan H, u, (p#41L = s#42L), [ord#43 ASC], IS NULL pred#41L, node#36
  Subquery v
   Subquery h_src
LogicalRDD [name#40,p#41L,s#42L,ord#43], MapPartitionsRDD[1] at 
mapPartitions at ExistingRDD.scala:37


  was:
We're using some custom logical plans. We are now migrating from Spark 1.3.0 to 
1.3.1 and found a few incompatible API changes. All of them seem to be in 
internal code, so we understand at. But now the ResolveReferences rule, that 
used to work with third-party logical plans just does not work, without any 
possible workaround that I'm aware other than just copying ResolveReferences 
rule and using it with our own fix.

The change in question is this section of code:

}.headOption.getOrElse { // Only handle first case, others will be fixed on the next pass.
  sys.error(
    s"""
      |Failure when resolving conflicting references in Join:
      |$plan
      |
      |Conflicting attributes: ${conflictingAttributes.mkString(",")}
    """.stripMargin)
}


Which causes the following error on analysis:

Failure when resolving conflicting references in Join:
'Project ['l.name,'r.name,'IS_DESCENDANT('l.node,'r.node) AS 
c2#37,'IS_DESCENDANT_OR_SELF('l.node,'r.node) AS 
c3#38,'IS_PARENT('r.node,'l.node) AS c4#39]
 'Join Inner, None
  Subquery l
   Subquery h
Project [name#12,node#36]
 CustomPlan H, u, (p#13L = s#14L), [ord#15 ASC], IS NULL p#13L, node#36
  Subquery v
   Subquery h_src
LogicalRDD [name#12,p#13L,s#14L,ord#15], MapPartitionsRDD[1] at 
mapPartitions at ExistingRDD.scala:37
  Subquery r
   Subquery h
Project [name#40,node#36]
 CustomPlan H, u, (p#41L = s#42L), [ord#43 ASC], IS NULL pred#41L, node#36
  Subquery v
   Subquery h_src
LogicalRDD [name#40,p#41L,s#42L,ord#43], MapPartitionsRDD[1] at 
mapPartitions at ExistingRDD.scala:37



 [REGRESSION] Spark 1.3.1 breaks analysis of third-party logical plans
 -

 Key: SPARK-7088
 URL: https://issues.apache.org/jira/browse/SPARK-7088
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.1
Reporter: Santiago M. Mola
Priority: Critical

 We're using some custom logical plans. We are now migrating from Spark 1.3.0 
 to 1.3.1 and found a few incompatible API changes. All of them seem to be in 
 internal code, so we understand that. But now the ResolveReferences rule, 
 that used to work with third-party logical plans just does not work, without 
 any possible workaround that I'm aware other than just copying 
 ResolveReferences rule and using it with our own fix.
 The change in question is this section of code:
  }.headOption.getOrElse { // Only handle first case, others will be fixed on the next pass.
    sys.error(
      s"""
        |Failure when resolving conflicting references in Join:
        |$plan
        |
        |Conflicting attributes: ${conflictingAttributes.mkString(",")}
      """.stripMargin)
 }
 Which causes the following error on analysis:
 Failure when resolving conflicting references in Join:
 'Project 

[jira] [Created] (SPARK-7088) [REGRESSION] Spark 1.3.1 breaks analysis of third-party logical plans

2015-04-23 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-7088:
---

 Summary: [REGRESSION] Spark 1.3.1 breaks analysis of third-party 
logical plans
 Key: SPARK-7088
 URL: https://issues.apache.org/jira/browse/SPARK-7088
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.1
Reporter: Santiago M. Mola
Priority: Critical


We're using some custom logical plans. We are now migrating from Spark 1.3.0 to 
1.3.1 and found a few incompatible API changes. All of them seem to be in 
internal code, so we understand that. But now the ResolveReferences rule, that 
used to work with third-party logical plans just does not work, without any 
possible workaround that I'm aware other than just copying ResolveReferences 
rule and using it with our own fix.

The change in question is this section of code:

}.headOption.getOrElse { // Only handle first case, others will be fixed on the next pass.
  sys.error(
    s"""
      |Failure when resolving conflicting references in Join:
      |$plan
      |
      |Conflicting attributes: ${conflictingAttributes.mkString(",")}
    """.stripMargin)
}


Which causes the following error on analysis:

Failure when resolving conflicting references in Join:
'Project ['l.name,'r.name,'IS_DESCENDANT('l.node,'r.node) AS 
c2#37,'IS_DESCENDANT_OR_SELF('l.node,'r.node) AS 
c3#38,'IS_PARENT('r.node,'l.node) AS c4#39]
 'Join Inner, None
  Subquery l
   Subquery h
Project [name#12,node#36]
 CustomPlan H, u, (p#13L = s#14L), [ord#15 ASC], IS NULL p#13L, node#36
  Subquery v
   Subquery h_src
LogicalRDD [name#12,p#13L,s#14L,ord#15], MapPartitionsRDD[1] at 
mapPartitions at ExistingRDD.scala:37
  Subquery r
   Subquery h
Project [name#40,node#36]
 CustomPlan H, u, (p#41L = s#42L), [ord#43 ASC], IS NULL pred#41L, node#36
  Subquery v
   Subquery h_src
LogicalRDD [name#40,p#41L,s#42L,ord#43], MapPartitionsRDD[1] at 
mapPartitions at ExistingRDD.scala:37




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7088) [REGRESSION] Spark 1.3.1 breaks analysis of third-party logical plans

2015-04-23 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-7088:

Description: 
We're using some custom logical plans. We are now migrating from Spark 1.3.0 to 
1.3.1 and found a few incompatible API changes. All of them seem to be in 
internal code, so we understand that. But now the ResolveReferences rule, that 
used to work with third-party logical plans just does not work, without any 
possible workaround that I'm aware other than just copying ResolveReferences 
rule and using it with our own fix.

The change in question is this section of code:
{code}
}.headOption.getOrElse { // Only handle first case, others will be fixed on the next pass.
  sys.error(
    s"""
      |Failure when resolving conflicting references in Join:
      |$plan
      |
      |Conflicting attributes: ${conflictingAttributes.mkString(",")}
    """.stripMargin)
}
{code}

Which causes the following error on analysis:

{code}
Failure when resolving conflicting references in Join:
'Project ['l.name,'r.name,'IS_DESCENDANT('l.node,'r.node) AS 
c2#37,'IS_DESCENDANT_OR_SELF('l.node,'r.node) AS 
c3#38,'IS_PARENT('r.node,'l.node) AS c4#39]
 'Join Inner, None
  Subquery l
   Subquery h
Project [name#12,node#36]
 CustomPlan H, u, (p#13L = s#14L), [ord#15 ASC], IS NULL p#13L, node#36
  Subquery v
   Subquery h_src
LogicalRDD [name#12,p#13L,s#14L,ord#15], MapPartitionsRDD[1] at 
mapPartitions at ExistingRDD.scala:37
  Subquery r
   Subquery h
Project [name#40,node#36]
 CustomPlan H, u, (p#41L = s#42L), [ord#43 ASC], IS NULL pred#41L, node#36
  Subquery v
   Subquery h_src
LogicalRDD [name#40,p#41L,s#42L,ord#43], MapPartitionsRDD[1] at 
mapPartitions at ExistingRDD.scala:37
{code}

  was:
We're using some custom logical plans. We are now migrating from Spark 1.3.0 to 
1.3.1 and found a few incompatible API changes. All of them seem to be in 
internal code, so we understand that. But now the ResolveReferences rule, that 
used to work with third-party logical plans just does not work, without any 
possible workaround that I'm aware other than just copying ResolveReferences 
rule and using it with our own fix.

The change in question is this section of code:

}.headOption.getOrElse { // Only handle first case, others will be fixed on the next pass.
  sys.error(
    s"""
      |Failure when resolving conflicting references in Join:
      |$plan
      |
      |Conflicting attributes: ${conflictingAttributes.mkString(",")}
    """.stripMargin)
}


Which causes the following error on analysis:

Failure when resolving conflicting references in Join:
'Project ['l.name,'r.name,'IS_DESCENDANT('l.node,'r.node) AS 
c2#37,'IS_DESCENDANT_OR_SELF('l.node,'r.node) AS 
c3#38,'IS_PARENT('r.node,'l.node) AS c4#39]
 'Join Inner, None
  Subquery l
   Subquery h
Project [name#12,node#36]
 CustomPlan H, u, (p#13L = s#14L), [ord#15 ASC], IS NULL p#13L, node#36
  Subquery v
   Subquery h_src
LogicalRDD [name#12,p#13L,s#14L,ord#15], MapPartitionsRDD[1] at 
mapPartitions at ExistingRDD.scala:37
  Subquery r
   Subquery h
Project [name#40,node#36]
 CustomPlan H, u, (p#41L = s#42L), [ord#43 ASC], IS NULL pred#41L, node#36
  Subquery v
   Subquery h_src
LogicalRDD [name#40,p#41L,s#42L,ord#43], MapPartitionsRDD[1] at 
mapPartitions at ExistingRDD.scala:37



 [REGRESSION] Spark 1.3.1 breaks analysis of third-party logical plans
 -

 Key: SPARK-7088
 URL: https://issues.apache.org/jira/browse/SPARK-7088
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.1
Reporter: Santiago M. Mola
Priority: Critical

 We're using some custom logical plans. We are now migrating from Spark 1.3.0 
 to 1.3.1 and found a few incompatible API changes. All of them seem to be in 
 internal code, so we understand that. But now the ResolveReferences rule, 
 that used to work with third-party logical plans just does not work, without 
 any possible workaround that I'm aware other than just copying 
 ResolveReferences rule and using it with our own fix.
 The change in question is this section of code:
 {code}
  }.headOption.getOrElse { // Only handle first case, others will be fixed on the next pass.
    sys.error(
      s"""
        |Failure when resolving conflicting references in Join:
        |$plan
        |
        |Conflicting attributes: ${conflictingAttributes.mkString(",")}
      """.stripMargin)
 }
 {code}
 Which causes the following error on analysis:
 {code}
 Failure when resolving 

[jira] [Updated] (SPARK-7088) [REGRESSION] Spark 1.3.1 breaks analysis of third-party logical plans

2015-04-23 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-7088:

Labels: regression  (was: )

 [REGRESSION] Spark 1.3.1 breaks analysis of third-party logical plans
 -

 Key: SPARK-7088
 URL: https://issues.apache.org/jira/browse/SPARK-7088
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.1
Reporter: Santiago M. Mola
Priority: Critical
  Labels: regression

 We're using some custom logical plans. We are now migrating from Spark 1.3.0 
 to 1.3.1 and found a few incompatible API changes. All of them seem to be in 
 internal code, so we understand that. But now the ResolveReferences rule, 
 that used to work with third-party logical plans just does not work, without 
 any possible workaround that I'm aware other than just copying 
 ResolveReferences rule and using it with our own fix.
 The change in question is this section of code:
 {code}
  }.headOption.getOrElse { // Only handle first case, others will be fixed on the next pass.
    sys.error(
      s"""
        |Failure when resolving conflicting references in Join:
        |$plan
        |
        |Conflicting attributes: ${conflictingAttributes.mkString(",")}
      """.stripMargin)
 }
 {code}
 Which causes the following error on analysis:
 {code}
 Failure when resolving conflicting references in Join:
 'Project ['l.name,'r.name,'FUNC1('l.node,'r.node) AS 
 c2#37,'FUNC2('l.node,'r.node) AS c3#38,'FUNC3('r.node,'l.node) AS c4#39]
  'Join Inner, None
   Subquery l
Subquery h
 Project [name#12,node#36]
  CustomPlan H, u, (p#13L = s#14L), [ord#15 ASC], IS NULL p#13L, node#36
   Subquery v
Subquery h_src
 LogicalRDD [name#12,p#13L,s#14L,ord#15], MapPartitionsRDD[1] at 
 mapPartitions at ExistingRDD.scala:37
   Subquery r
Subquery h
 Project [name#40,node#36]
  CustomPlan H, u, (p#41L = s#42L), [ord#43 ASC], IS NULL pred#41L, node#36
   Subquery v
Subquery h_src
 LogicalRDD [name#40,p#41L,s#42L,ord#43], MapPartitionsRDD[1] at 
 mapPartitions at ExistingRDD.scala:37
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7088) [REGRESSION] Spark 1.3.1 breaks analysis of third-party logical plans

2015-04-23 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-7088:

Description: 
We're using some custom logical plans. We are now migrating from Spark 1.3.0 to 
1.3.1 and found a few incompatible API changes. All of them seem to be in 
internal code, so we understand that. But now the ResolveReferences rule, that 
used to work with third-party logical plans just does not work, without any 
possible workaround that I'm aware other than just copying ResolveReferences 
rule and using it with our own fix.

The change in question is this section of code:
{code}
}.headOption.getOrElse { // Only handle first case, others will be fixed on the next pass.
  sys.error(
    s"""
      |Failure when resolving conflicting references in Join:
      |$plan
      |
      |Conflicting attributes: ${conflictingAttributes.mkString(",")}
    """.stripMargin)
}
{code}

Which causes the following error on analysis:

{code}
Failure when resolving conflicting references in Join:
'Project ['l.name,'r.name,'FUNC1('l.node,'r.node) AS 
c2#37,'FUNC2('l.node,'r.node) AS c3#38,'FUNC3('r.node,'l.node) AS c4#39]
 'Join Inner, None
  Subquery l
   Subquery h
Project [name#12,node#36]
 CustomPlan H, u, (p#13L = s#14L), [ord#15 ASC], IS NULL p#13L, node#36
  Subquery v
   Subquery h_src
LogicalRDD [name#12,p#13L,s#14L,ord#15], MapPartitionsRDD[1] at 
mapPartitions at ExistingRDD.scala:37
  Subquery r
   Subquery h
Project [name#40,node#36]
 CustomPlan H, u, (p#41L = s#42L), [ord#43 ASC], IS NULL pred#41L, node#36
  Subquery v
   Subquery h_src
LogicalRDD [name#40,p#41L,s#42L,ord#43], MapPartitionsRDD[1] at 
mapPartitions at ExistingRDD.scala:37
{code}

  was:
We're using some custom logical plans. We are now migrating from Spark 1.3.0 to 
1.3.1 and found a few incompatible API changes. All of them seem to be in 
internal code, so we understand that. But now the ResolveReferences rule, that 
used to work with third-party logical plans just does not work, without any 
possible workaround that I'm aware other than just copying ResolveReferences 
rule and using it with our own fix.

The change in question is this section of code:
{code}
}.headOption.getOrElse { // Only handle first case, others will be fixed on the next pass.
  sys.error(
    s"""
      |Failure when resolving conflicting references in Join:
      |$plan
      |
      |Conflicting attributes: ${conflictingAttributes.mkString(",")}
    """.stripMargin)
}
{code}

Which causes the following error on analysis:

{code}
Failure when resolving conflicting references in Join:
'Project ['l.name,'r.name,'IS_DESCENDANT('l.node,'r.node) AS 
c2#37,'IS_DESCENDANT_OR_SELF('l.node,'r.node) AS 
c3#38,'IS_PARENT('r.node,'l.node) AS c4#39]
 'Join Inner, None
  Subquery l
   Subquery h
Project [name#12,node#36]
 CustomPlan H, u, (p#13L = s#14L), [ord#15 ASC], IS NULL p#13L, node#36
  Subquery v
   Subquery h_src
LogicalRDD [name#12,p#13L,s#14L,ord#15], MapPartitionsRDD[1] at 
mapPartitions at ExistingRDD.scala:37
  Subquery r
   Subquery h
Project [name#40,node#36]
 CustomPlan H, u, (p#41L = s#42L), [ord#43 ASC], IS NULL pred#41L, node#36
  Subquery v
   Subquery h_src
LogicalRDD [name#40,p#41L,s#42L,ord#43], MapPartitionsRDD[1] at 
mapPartitions at ExistingRDD.scala:37
{code}


 [REGRESSION] Spark 1.3.1 breaks analysis of third-party logical plans
 -

 Key: SPARK-7088
 URL: https://issues.apache.org/jira/browse/SPARK-7088
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.1
Reporter: Santiago M. Mola
Priority: Critical

 We're using some custom logical plans. We are now migrating from Spark 1.3.0 
 to 1.3.1 and found a few incompatible API changes. All of them seem to be in 
 internal code, so we understand that. But now the ResolveReferences rule, 
 that used to work with third-party logical plans just does not work, without 
 any possible workaround that I'm aware other than just copying 
 ResolveReferences rule and using it with our own fix.
 The change in question is this section of code:
 {code}
  }.headOption.getOrElse { // Only handle first case, others will be fixed on the next pass.
    sys.error(
      s"""
        |Failure when resolving conflicting references in Join:
        |$plan
        |
        |Conflicting attributes: ${conflictingAttributes.mkString(",")}
      """.stripMargin)
 }
 {code}
 Which causes the following error on analysis:
 {code}
 Failure when resolving conflicting 

[jira] [Created] (SPARK-7034) Support escaped double quotes on data source options

2015-04-21 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-7034:
---

 Summary: Support escaped double quotes on data source options
 Key: SPARK-7034
 URL: https://issues.apache.org/jira/browse/SPARK-7034
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Santiago M. Mola
Priority: Minor


Currently, this is not supported:

CREATE TEMPORARY TABLE t
USING my.data.source
OPTIONS (
  myFancyOption "with \"escaped\" double quotes"
);

it will produce a parsing error.
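
For illustration, a standalone sketch of a string literal that accepts escaped double quotes (illustrative only; not the actual DDLParser production):

{code}
import scala.util.parsing.combinator.RegexParsers

// Sketch only: a double-quoted option value that allows \" and \\ inside,
// unescaping the content on the way out.
object OptionValueSketch extends RegexParsers {
  override val skipWhitespace = false

  lazy val stringLit: Parser[String] =
    "\"" ~> """(\\.|[^"\\])*""".r <~ "\"" ^^ { raw =>
      raw.replace("\\\"", "\"").replace("\\\\", "\\")
    }
}

// e.g. parsing the OPTIONS value  "with \"escaped\" double quotes"  with
// OptionValueSketch.parseAll(OptionValueSketch.stringLit, ...) yields:
//   with "escaped" double quotes
{code}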



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-7012) Add support for NOT NULL modifier for column definitions on DDLParser

2015-04-20 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-7012:
---

 Summary: Add support for NOT NULL modifier for column definitions 
on DDLParser
 Key: SPARK-7012
 URL: https://issues.apache.org/jira/browse/SPARK-7012
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.0
Reporter: Santiago M. Mola
Priority: Minor


Add support for NOT NULL modifier for column definitions on DDLParser. This 
would add support for the following syntax:

CREATE TEMPORARY TABLE (field INTEGER NOT NULL) ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-6874) Add support for SQL:2003 array type declaration syntax

2015-04-12 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-6874:
---

 Summary: Add support for SQL:2003 array type declaration syntax
 Key: SPARK-6874
 URL: https://issues.apache.org/jira/browse/SPARK-6874
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.0
Reporter: Santiago M. Mola
Priority: Minor


As of SQL:2003, arrays are standard SQL types. However, the declaration syntax 
differs from Spark's CQL-like syntax. Examples of standard syntax:

BIGINT ARRAY
BIGINT ARRAY[100]
BIGINT ARRAY[100] ARRAY[200]

It would be great to have support for the standard syntax here.

Some additional details that this addition should include, IMO:
- Forbid mixed syntax such as ARRAY<INT> ARRAY[100]
- Ignore the maximum capacity (ARRAY[N]) but allow it to be specified. This 
seems to be what others (e.g. PostgreSQL) are doing.
ARRAY<BIGINT> ARRAY[100]
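
A sketch contrasting the current declaration style with the proposed standard 
one (the data source and paths are made up; the second statement is the one 
this issue asks for and does not parse today):

{code}
// Current style (parses today):
sqlContext.sql(
  """CREATE TEMPORARY TABLE a (xs ARRAY<BIGINT>)
    |USING org.apache.spark.sql.json OPTIONS (path 'a.json')""".stripMargin)

// Proposed SQL:2003 style (subject of this issue):
sqlContext.sql(
  """CREATE TEMPORARY TABLE b (xs BIGINT ARRAY)
    |USING org.apache.spark.sql.json OPTIONS (path 'b.json')""".stripMargin)
{code}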



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-6863) Formatted list broken on Hive compatibility section of SQL programming guide

2015-04-11 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-6863:
---

 Summary: Formatted list broken on Hive compatibility section of 
SQL programming guide
 Key: SPARK-6863
 URL: https://issues.apache.org/jira/browse/SPARK-6863
 Project: Spark
  Issue Type: Documentation
  Components: Documentation
Affects Versions: 1.3.0
Reporter: Santiago M. Mola
Priority: Trivial


The formatted list in the Hive compatibility section of the SQL programming guide 
is broken: it does not render as a list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-6740) SQL operator and condition precedence is not honoured

2015-04-07 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-6740:
---

 Summary: SQL operator and condition precedence is not honoured
 Key: SPARK-6740
 URL: https://issues.apache.org/jira/browse/SPARK-6740
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
Reporter: Santiago M. Mola


The following query from the SQL Logic Test suite fails to parse:

SELECT DISTINCT * FROM t1 AS cor0 WHERE NOT ( - _2 + - 39 ) IS NULL

while the following (equivalent) does parse correctly:

SELECT DISTINCT * FROM t1 AS cor0 WHERE NOT (( - _2 + - 39 ) IS NULL)

SQLite, MySQL and Oracle (and probably most SQL implementations) define IS with 
higher precedence than NOT, so the first query is valid and well-defined.
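
To make the expected precedence concrete, here is a minimal, self-contained 
parser sketch (plain scala-parser-combinators, not Spark's SqlParser; the toy 
grammar and names are made up for illustration) in which IS NULL binds tighter 
than NOT, so NOT x IS NULL parses as NOT (x IS NULL) without parentheses:

{code}
import scala.util.parsing.combinator.JavaTokenParsers

// Toy grammar, not Spark's: IS NULL is applied at a tighter level than NOT.
object PrecedenceSketch extends JavaTokenParsers {
  sealed trait Expr
  case class Col(name: String) extends Expr
  case class IsNull(e: Expr) extends Expr
  case class Not(e: Expr) extends Expr

  def primary: Parser[Expr] = ident ^^ Col | "(" ~> condition <~ ")"
  def isNull: Parser[Expr] = primary ~ opt("IS" ~ "NULL") ^^ {
    case e ~ Some(_) => IsNull(e)
    case e ~ None    => e
  }
  def condition: Parser[Expr] = opt("NOT") ~ isNull ^^ {
    case Some(_) ~ e => Not(e)
    case None ~ e    => e
  }

  def main(args: Array[String]): Unit = {
    println(parseAll(condition, "NOT x IS NULL")) // parsed: Not(IsNull(Col(x)))
  }
}
{code}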




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-6741) Add support for SELECT ALL syntax

2015-04-07 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-6741:
---

 Summary: Add support for SELECT ALL syntax
 Key: SPARK-6741
 URL: https://issues.apache.org/jira/browse/SPARK-6741
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.0
Reporter: Santiago M. Mola
Priority: Minor


Support SELECT ALL syntax (equivalent to SELECT, without DISTINCT).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6743) Join with empty projection on one side produces invalid results

2015-04-07 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-6743:

Priority: Critical  (was: Major)

 Join with empty projection on one side produces invalid results
 ---

 Key: SPARK-6743
 URL: https://issues.apache.org/jira/browse/SPARK-6743
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
Reporter: Santiago M. Mola
Priority: Critical

 {code:java}
 val sqlContext = new SQLContext(sc)
 val tab0 = sc.parallelize(Seq(
   (83,0,38),
   (26,0,79),
   (43,81,24)
 ))
 sqlContext.registerDataFrameAsTable(sqlContext.createDataFrame(tab0), "tab0")
 sqlContext.cacheTable("tab0")
 val df1 = sqlContext.sql("SELECT tab0._2, cor0._2 FROM tab0, tab0 cor0 GROUP BY tab0._2, cor0._2")
 val result1 = df1.collect()
 val df2 = sqlContext.sql("SELECT cor0._2 FROM tab0, tab0 cor0 GROUP BY cor0._2")
 val result2 = df2.collect()
 val df3 = sqlContext.sql("SELECT cor0._2 FROM tab0 cor0 GROUP BY cor0._2")
 val result3 = df3.collect()
 {code}
 Given the previous code, result2 equals Row(43), Row(83), Row(26), which is 
 wrong: these results correspond to cor0._1 instead of cor0._2. The correct 
 results would be Row(0), Row(81), which is what the third query returns. The 
 first query also produces valid results; the only difference is that the left 
 side of the join is not empty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-6743) Join with empty projection on one side produces invalid results

2015-04-07 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-6743:
---

 Summary: Join with empty projection on one side produces invalid 
results
 Key: SPARK-6743
 URL: https://issues.apache.org/jira/browse/SPARK-6743
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
Reporter: Santiago M. Mola


{code:java}
val sqlContext = new SQLContext(sc)
val tab0 = sc.parallelize(Seq(
  (83,0,38),
  (26,0,79),
  (43,81,24)
))
sqlContext.registerDataFrameAsTable(sqlContext.createDataFrame(tab0), "tab0")
sqlContext.cacheTable("tab0")
val df1 = sqlContext.sql("SELECT tab0._2, cor0._2 FROM tab0, tab0 cor0 GROUP BY tab0._2, cor0._2")
val result1 = df1.collect()
val df2 = sqlContext.sql("SELECT cor0._2 FROM tab0, tab0 cor0 GROUP BY cor0._2")
val result2 = df2.collect()
val df3 = sqlContext.sql("SELECT cor0._2 FROM tab0 cor0 GROUP BY cor0._2")
val result3 = df3.collect()
{code}

Given the previous code, result2 equals Row(43), Row(83), Row(26), which is 
wrong: these results correspond to cor0._1 instead of cor0._2. The correct 
results would be Row(0), Row(81), which is what the third query returns. The 
first query also produces valid results; the only difference is that the left 
side of the join is not empty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-6744) Add support for CROSS JOIN syntax

2015-04-07 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-6744:
---

 Summary: Add support for CROSS JOIN syntax
 Key: SPARK-6744
 URL: https://issues.apache.org/jira/browse/SPARK-6744
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.0
Reporter: Santiago M. Mola
Priority: Minor


Add support for the standard CROSS JOIN syntax.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-6611) Add support for INTEGER as synonym of INT to DDLParser

2015-03-30 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-6611:
---

 Summary: Add support for INTEGER as synonym of INT to DDLParser
 Key: SPARK-6611
 URL: https://issues.apache.org/jira/browse/SPARK-6611
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.0
Reporter: Santiago M. Mola
Priority: Minor


Add support for INTEGER as synonym of INT to DDLParser.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6320) Adding new query plan strategy to SQLContext

2015-03-19 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368683#comment-14368683
 ] 

Santiago M. Mola commented on SPARK-6320:
-

[~marmbrus] We could change strategies so that they take a SparkPlanner in 
their constructor. This should provide enough flexibility for [~H.Youssef]]'s 
use case and might improve code organization of the core strategies in the 
future.
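
To make the idea concrete, here is a minimal self-contained sketch of the 
constructor-injection approach; the types below are toy stand-ins, not Spark's 
catalyst classes, introduced only to illustrate how a strategy handed its 
planner can delegate child planning (the role of {{planLater}}):

{code}
// Toy types for illustration only; none of these are Spark classes.
trait LogicalNode
trait PhysicalNode
case class PlannedLeaf(label: String) extends PhysicalNode

abstract class Strategy {
  def apply(plan: LogicalNode): Seq[PhysicalNode]
}

// The planner builds its strategies by passing itself to them.
class Planner(mkStrategies: Planner => Seq[Strategy]) {
  private lazy val strategies: Seq[Strategy] = mkStrategies(this)
  // The equivalent of planLater: ask the registered strategies to plan a node.
  def plan(logical: LogicalNode): PhysicalNode =
    strategies.iterator.flatMap(_(logical)).next()
}

// A third-party strategy that can recurse into the planner it was constructed
// with, without having to subclass the planner or its context.
class MyStrategy(planner: Planner) extends Strategy {
  override def apply(plan: LogicalNode): Seq[PhysicalNode] =
    Seq(PlannedLeaf("handled by MyStrategy"))
}

object Demo extends App {
  val planner = new Planner(p => Seq(new MyStrategy(p)))
  println(planner.plan(new LogicalNode {})) // PlannedLeaf(handled by MyStrategy)
}
{code}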

 Adding new query plan strategy to SQLContext
 

 Key: SPARK-6320
 URL: https://issues.apache.org/jira/browse/SPARK-6320
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
Reporter: Youssef Hatem
Priority: Minor

 Hi,
 I would like to add a new strategy to {{SQLContext}}. To do this I created a 
 new class which extends {{Strategy}}. In my new class I need to call the 
 {{planLater}} function. However, this method is defined in {{SparkPlanner}} 
 (which itself inherits it from {{QueryPlanner}}).
 To my knowledge, the only way to make {{planLater}} visible to my new strategy 
 is to define the strategy inside another class that extends {{SparkPlanner}} 
 and thus inherits {{planLater}}; as a result, I have to extend {{SQLContext}} 
 so that I can override the {{planner}} field with the new {{Planner}} class I 
 created.
 This seems like a design problem, because adding a new strategy appears to 
 require extending {{SQLContext}} (unless I am doing it wrong and there is a 
 better way to do it).
 Thanks a lot,
 Youssef



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-6320) Adding new query plan strategy to SQLContext

2015-03-19 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368683#comment-14368683
 ] 

Santiago M. Mola edited comment on SPARK-6320 at 3/19/15 8:12 AM:
--

[~marmbrus] We could change strategies so that they take a SparkPlanner in 
their constructor. This should provide enough flexibility for [~H.Youssef]'s 
use case and might improve code organization of the core strategies in the 
future.


was (Author: smolav):
[~marmbrus] We could change strategies so that they take a SparkPlanner in 
their constructor. This should provide enough flexibility for [~H.Youssef]]'s 
use case and might improve code organization of the core strategies in the 
future.

 Adding new query plan strategy to SQLContext
 

 Key: SPARK-6320
 URL: https://issues.apache.org/jira/browse/SPARK-6320
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
Reporter: Youssef Hatem
Priority: Minor

 Hi,
 I would like to add a new strategy to {{SQLContext}}. To do this I created a 
 new class which extends {{Strategy}}. In my new class I need to call the 
 {{planLater}} function. However, this method is defined in {{SparkPlanner}} 
 (which itself inherits it from {{QueryPlanner}}).
 To my knowledge, the only way to make {{planLater}} visible to my new strategy 
 is to define the strategy inside another class that extends {{SparkPlanner}} 
 and thus inherits {{planLater}}; as a result, I have to extend {{SQLContext}} 
 so that I can override the {{planner}} field with the new {{Planner}} class I 
 created.
 This seems like a design problem, because adding a new strategy appears to 
 require extending {{SQLContext}} (unless I am doing it wrong and there is a 
 better way to do it).
 Thanks a lot,
 Youssef



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6410) Build error on Windows: polymorphic expression cannot be instantiated to expected type

2015-03-19 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-6410:

Description: 
$ bash build/sbt -Phadoop-2.3 assembly
[...]
[error] 
C:\Users\\dev\repos\spark\sql\catalyst\src\main\scala\org\apache\spark\sql\catalyst\dsl\package.scala:314:
 polymorphic expression cannot be instantiated to expected type;
[error]  found   : [T(in method 
apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)]
[error]  required: 
org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method 
functionToUdfBuilder)]
[error]   implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, T]): 
ScalaUdfBuilder[T] = ScalaUdfBuilder(func)
[error] 
^
[...]

$ uname -a
CYGWIN_NT-6.3 WDFN30003681A 1.7.35(0.287/5/3) 2015-03-04 12:09 x86_64 Cygwin

$ java -version
java version 1.7.0_75
Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)

$ scala -version
Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL

$ build/zinc-0.3.5.3/bin/zinc -version
zinc (scala incremental compiler) 0.3.5.3

  was:
$ bash build/sbt -Phadoop-2.3 assembly
[...]
[error] 
C:\Users\\dev\repos\spark\sql\catalyst\src\main\scala\org\apache\spark\sql\catalyst\dsl\package.scala:314:
 polymorphic expression cannot be instantiated to expected type;
[error]  found   : [T(in method 
apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)]
[error]  required: 
org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method 
functionToUdfBuilder)]
[error]   implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, T]): 
ScalaUdfBuilder[T] = ScalaUdfBuilder(func)
[error] 
^
[...]



 Build error on Windows: polymorphic expression cannot be instantiated to 
 expected type
 --

 Key: SPARK-6410
 URL: https://issues.apache.org/jira/browse/SPARK-6410
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.0
 Environment: 
Reporter: Santiago M. Mola
  Labels: build-failure

 $ bash build/sbt -Phadoop-2.3 assembly
 [...]
 [error] 
 C:\Users\\dev\repos\spark\sql\catalyst\src\main\scala\org\apache\spark\sql\catalyst\dsl\package.scala:314:
  polymorphic expression cannot be instantiated to expected type;
 [error]  found   : [T(in method 
 apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)]
 [error]  required: 
 org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method 
 functionToUdfBuilder)]
 [error]   implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, 
 T]): ScalaUdfBuilder[T] = ScalaUdfBuilder(func)
 [error]   
   ^
 [...]
 $ uname -a
 CYGWIN_NT-6.3 WDFN30003681A 1.7.35(0.287/5/3) 2015-03-04 12:09 x86_64 Cygwin
 $ java -version
 java version 1.7.0_75
 Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
 Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)
 $ scala -version
 Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL
 $ build/zinc-0.3.5.3/bin/zinc -version
 zinc (scala incremental compiler) 0.3.5.3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6410) Build error on Windows: polymorphic expression cannot be instantiated to expected type

2015-03-19 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-6410:

Component/s: SQL

 Build error on Windows: polymorphic expression cannot be instantiated to 
 expected type
 --

 Key: SPARK-6410
 URL: https://issues.apache.org/jira/browse/SPARK-6410
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.0
 Environment: 
Reporter: Santiago M. Mola
  Labels: build-failure

 $ bash build/sbt -Phadoop-2.3 assembly
 [...]
 [error] 
 C:\Users\\dev\repos\spark\sql\catalyst\src\main\scala\org\apache\spark\sql\catalyst\dsl\package.scala:314:
  polymorphic expression cannot be instantiated to expected type;
 [error]  found   : [T(in method 
 apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)]
 [error]  required: 
 org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method 
 functionToUdfBuilder)]
 [error]   implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, 
 T]): ScalaUdfBuilder[T] = ScalaUdfBuilder(func)
 [error]   
   ^
 [...]
 $ uname -a
 CYGWIN_NT-6.3 WDFN30003681A 1.7.35(0.287/5/3) 2015-03-04 12:09 x86_64 Cygwin
 $ java -version
 java version 1.7.0_75
 Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
 Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)
 $ scala -version
 Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL
 $ build/zinc-0.3.5.3/bin/zinc -version
 zinc (scala incremental compiler) 0.3.5.3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-6410) Build error on Windows: polymorphic expression cannot be instantiated to expected type

2015-03-19 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-6410:
---

 Summary: Build error on Windows: polymorphic expression cannot be 
instantiated to expected type
 Key: SPARK-6410
 URL: https://issues.apache.org/jira/browse/SPARK-6410
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.4.0
 Environment: $ uname -a
CYGWIN_NT-6.3 WDFN30003681A 1.7.35(0.287/5/3) 2015-03-04 12:09 x86_64 Cygwin

$ java -version
java version 1.7.0_75
Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)

$ scala -version
Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL

$ build/zinc-0.3.5.3/bin/zinc -version
zinc (scala incremental compiler) 0.3.5.3

Reporter: Santiago M. Mola


$ bash build/sbt -Phadoop-2.3 assembly
[...]
[error] 
C:\Users\\dev\repos\spark\sql\catalyst\src\main\scala\org\apache\spark\sql\catalyst\dsl\package.scala:314:
 polymorphic expression cannot be instantiated to expected type;
[error]  found   : [T(in method 
apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)]
[error]  required: 
org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method 
functionToUdfBuilder)]
[error]   implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, T]): 
ScalaUdfBuilder[T] = ScalaUdfBuilder(func)
[error] 
^
[...]




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6410) Build error on Windows: polymorphic expression cannot be instantiated to expected type

2015-03-19 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-6410:

Description: 
$ bash build/sbt -Phadoop-2.3 assembly
[...]
[error] 
C:\Users\\dev\repos\spark\sql\catalyst\src\main\scala\org\apache\spark\sql\catalyst\dsl\package.scala:314:
 polymorphic expression cannot be instantiated to expected type;
[error]  found   : [T(in method 
apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)]
[error]  required: 
org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method 
functionToUdfBuilder)]
[error]   implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, T]): 
ScalaUdfBuilder[T] = ScalaUdfBuilder(func)
[error] 
^
[...]

$ uname -a
CYGWIN_NT-6.3  1.7.35(0.287/5/3) 2015-03-04 12:09 x86_64 Cygwin

$ java -version
java version 1.7.0_75
Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)

$ scala -version
Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL

$ build/zinc-0.3.5.3/bin/zinc -version
zinc (scala incremental compiler) 0.3.5.3

  was:
$ bash build/sbt -Phadoop-2.3 assembly
[...]
[error] 
C:\Users\\dev\repos\spark\sql\catalyst\src\main\scala\org\apache\spark\sql\catalyst\dsl\package.scala:314:
 polymorphic expression cannot be instantiated to expected type;
[error]  found   : [T(in method 
apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)]
[error]  required: 
org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method 
functionToUdfBuilder)]
[error]   implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, T]): 
ScalaUdfBuilder[T] = ScalaUdfBuilder(func)
[error] 
^
[...]

$ uname -a
CYGWIN_NT-6.3 WDFN30003681A 1.7.35(0.287/5/3) 2015-03-04 12:09 x86_64 Cygwin

$ java -version
java version 1.7.0_75
Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)

$ scala -version
Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL

$ build/zinc-0.3.5.3/bin/zinc -version
zinc (scala incremental compiler) 0.3.5.3


 Build error on Windows: polymorphic expression cannot be instantiated to 
 expected type
 --

 Key: SPARK-6410
 URL: https://issues.apache.org/jira/browse/SPARK-6410
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.0
 Environment: 
Reporter: Santiago M. Mola
  Labels: build-failure
 Attachments: output.log


 $ bash build/sbt -Phadoop-2.3 assembly
 [...]
 [error] 
 C:\Users\\dev\repos\spark\sql\catalyst\src\main\scala\org\apache\spark\sql\catalyst\dsl\package.scala:314:
  polymorphic expression cannot be instantiated to expected type;
 [error]  found   : [T(in method 
 apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)]
 [error]  required: 
 org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method 
 functionToUdfBuilder)]
 [error]   implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, 
 T]): ScalaUdfBuilder[T] = ScalaUdfBuilder(func)
 [error]   
   ^
 [...]
 $ uname -a
 CYGWIN_NT-6.3  1.7.35(0.287/5/3) 2015-03-04 12:09 x86_64 Cygwin
 $ java -version
 java version 1.7.0_75
 Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
 Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)
 $ scala -version
 Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL
 $ build/zinc-0.3.5.3/bin/zinc -version
 zinc (scala incremental compiler) 0.3.5.3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6410) Build error on Windows: polymorphic expression cannot be instantiated to expected type

2015-03-19 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-6410:

Attachment: output.log

Full error log.

 Build error on Windows: polymorphic expression cannot be instantiated to 
 expected type
 --

 Key: SPARK-6410
 URL: https://issues.apache.org/jira/browse/SPARK-6410
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.0
 Environment: 
Reporter: Santiago M. Mola
  Labels: build-failure
 Attachments: output.log


 $ bash build/sbt -Phadoop-2.3 assembly
 [...]
 [error] 
 C:\Users\\dev\repos\spark\sql\catalyst\src\main\scala\org\apache\spark\sql\catalyst\dsl\package.scala:314:
  polymorphic expression cannot be instantiated to expected type;
 [error]  found   : [T(in method 
 apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)]
 [error]  required: 
 org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method 
 functionToUdfBuilder)]
 [error]   implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, 
 T]): ScalaUdfBuilder[T] = ScalaUdfBuilder(func)
 [error]   
   ^
 [...]
 $ uname -a
 CYGWIN_NT-6.3 WDFN30003681A 1.7.35(0.287/5/3) 2015-03-04 12:09 x86_64 Cygwin
 $ java -version
 java version 1.7.0_75
 Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
 Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)
 $ scala -version
 Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL
 $ build/zinc-0.3.5.3/bin/zinc -version
 zinc (scala incremental compiler) 0.3.5.3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-6410) Build error on Windows: polymorphic expression cannot be instantiated to expected type

2015-03-19 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola resolved SPARK-6410.
-
Resolution: Not a Problem

This has something to do with the incremental compiler. I got it working after 
running sbt clean.

 Build error on Windows: polymorphic expression cannot be instantiated to 
 expected type
 --

 Key: SPARK-6410
 URL: https://issues.apache.org/jira/browse/SPARK-6410
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.0
 Environment: 
Reporter: Santiago M. Mola
  Labels: build-failure
 Attachments: output.log


 $ bash build/sbt -Phadoop-2.3 assembly
 [...]
 [error] 
 C:\Users\\dev\repos\spark\sql\catalyst\src\main\scala\org\apache\spark\sql\catalyst\dsl\package.scala:314:
  polymorphic expression cannot be instantiated to expected type;
 [error]  found   : [T(in method 
 apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)]
 [error]  required: 
 org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method 
 functionToUdfBuilder)]
 [error]   implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, 
 T]): ScalaUdfBuilder[T] = ScalaUdfBuilder(func)
 [error]   
   ^
 [...]
 $ uname -a
 CYGWIN_NT-6.3  1.7.35(0.287/5/3) 2015-03-04 12:09 x86_64 Cygwin
 $ java -version
 java version 1.7.0_75
 Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
 Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)
 $ scala -version
 Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL
 $ build/zinc-0.3.5.3/bin/zinc -version
 zinc (scala incremental compiler) 0.3.5.3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6397) Check the missingInput simply

2015-03-18 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367437#comment-14367437
 ] 

Santiago M. Mola commented on SPARK-6397:
-

I think a proper title would be: "Override QueryPlan.missingInput when necessary 
and rely on it in CheckAnalysis."
And a proper description: "Currently, some LogicalPlans do not override 
missingInput, but they should. The lack of proper missingInput implementations 
then leaks into CheckAnalysis."

(I'm about to create a pull request that fixes this problem in some more places)



 Check the missingInput simply
 -

 Key: SPARK-6397
 URL: https://issues.apache.org/jira/browse/SPARK-6397
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Yadong Qi
Priority: Minor





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-4799) Spark should not rely on local host being resolvable on every node

2014-12-09 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-4799:
---

 Summary: Spark should not rely on local host being resolvable on 
every node
 Key: SPARK-4799
 URL: https://issues.apache.org/jira/browse/SPARK-4799
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.1.0
 Environment: Tested a Spark+Mesos cluster on top of Docker to 
reproduce the issue.
Reporter: Santiago M. Mola


Spark fails when a node hostname is not resolvable by other nodes.

See an example trace:

{code}
14/12/09 17:02:41 ERROR SendingConnection: Error connecting to 
27e434cf36ac:35093
java.nio.channels.UnresolvedAddressException
at sun.nio.ch.Net.checkAddress(Net.java:127)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:644)
at 
org.apache.spark.network.SendingConnection.connect(Connection.scala:299)
at 
org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:278)
at 
org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:139)
{code}

The relevant code is here:
https://github.com/apache/spark/blob/bcb5cdad614d4fce43725dfec3ce88172d2f8c11/core/src/main/scala/org/apache/spark/network/nio/ConnectionManager.scala#L170

{code}
val id = new ConnectionManagerId(Utils.localHostName, serverChannel.socket.getLocalPort)
{code}

This piece of code should use the host IP via Utils.localIpAddress, or a method 
that acknowledges user settings (e.g. SPARK_LOCAL_IP). Since I cannot think of a 
use case for using the hostname here, I'm creating a PR with the former solution, 
but if you think the latter is better, I'm willing to create a new PR with a more 
elaborate fix.
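
For illustration, a small self-contained snippet reproducing the failure mode 
outside Spark (the hostname is the one from the trace above):

{code}
import java.net.InetSocketAddress

// If the peer's hostname does not resolve locally, the socket address stays
// unresolved, and NIO's SocketChannel.connect then throws
// UnresolvedAddressException, which is exactly what the trace above shows.
val addr = new InetSocketAddress("27e434cf36ac", 35093)
println(addr.isUnresolved) // true on any host that cannot resolve that name
{code}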



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1977) mutable.BitSet in ALS not serializable with KryoSerializer

2014-06-09 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14021797#comment-14021797
 ] 

Santiago M. Mola commented on SPARK-1977:
-

Xiangrui Meng, I can't reproduce it at the moment. It takes quite a big dataset 
to reproduce and my machines are busy. But I'm pretty sure the stacktrace is 
exactly the same as the one posted by Neville Li. My bet is that this will be 
fixed by the next Twitter Chill release: 
https://github.com/twitter/chill/commit/b47512c2c75b94b7c5945985306fa303576bf90d
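
For reference, a minimal sketch of the manual workaround mentioned in the issue 
below (registering mutable.BitSet yourself); the class and object names are made 
up, and this assumes the standard spark.kryo.registrator mechanism:

{code}
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoRegistrator

// Registrator that adds mutable.BitSet on top of whatever else the job registers.
class BitSetRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(classOf[scala.collection.mutable.BitSet])
  }
}

object KryoConfExample {
  def conf: SparkConf = new SparkConf()
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .set("spark.kryo.registrator", classOf[BitSetRegistrator].getName)
}
{code}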

 mutable.BitSet in ALS not serializable with KryoSerializer
 --

 Key: SPARK-1977
 URL: https://issues.apache.org/jira/browse/SPARK-1977
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Affects Versions: 1.0.0
Reporter: Neville Li
Priority: Minor

 OutLinkBlock in ALS.scala has an Array[mutable.BitSet] member.
 KryoSerializer uses AllScalaRegistrar from Twitter chill but it doesn't 
 register mutable.BitSet.
 Right now we have to register mutable.BitSet manually. A proper fix would be 
 using immutable.BitSet in ALS or register mutable.BitSet in upstream chill.
 {code}
 Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
 Task 1724.0:9 failed 4 times, most recent failure: Exception failure in TID 
 68548 on host lon4-hadoopslave-b232.lon4.spotify.net: 
 com.esotericsoftware.kryo.KryoException: java.lang.ArrayStoreException: 
 scala.collection.mutable.HashSet
 Serialization trace:
 shouldSend (org.apache.spark.mllib.recommendation.OutLinkBlock)
 
 com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:626)
 
 com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
 com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
 com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:43)
 com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:34)
 com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
 
 org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:115)
 
 org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:125)
 org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
 
 org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
 
 org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:155)
 
 org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:154)
 
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
 scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
 org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:154)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 
 org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:77)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
 org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
 org.apache.spark.scheduler.Task.run(Task.scala:51)
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 java.lang.Thread.run(Thread.java:662)
 Driver stacktrace:
   at 
 org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1033)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1017)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1015)
   at 
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
   at 
 org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1015)
   at 
 

[jira] [Comment Edited] (SPARK-1977) mutable.BitSet in ALS not serializable with KryoSerializer

2014-06-06 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14019655#comment-14019655
 ] 

Santiago M. Mola edited comment on SPARK-1977 at 6/6/14 7:55 AM:
-

I can reproduce this depending on the size of the dataset:

{noformat}
spark-submit mllib-movielens-evaluation-assembly-1.0.jar --master 
spark://mllib1:7077
--class com.example.MovieLensALS --rank 10 --numIterations 20 --lambda 1.0 
--kryo
hdfs:/movielens/oversampled.dat
{noformat}

The exception will not be thrown for small datasets. It will successfully run 
with MovieLens 100k and 10M. However, when I run it on a 100M dataset, the 
exception will be thrown.

My MovieLensALS is mostly the same as the one shipped with Spark. I just added 
cross-validation. Rating is registered in Kryo just as in the stock example.

{noformat}
# cat RELEASE 
Spark 1.0.0 built for Hadoop 2.2.0
{noformat}




was (Author: smolav):
I can reproduce this depending on the size of the dataset:

{noformat}
spark-submit mllib-movielens-evaluation-assembly-1.0.jar --master 
spark://mllib1:7077
--class com.example.MovieLensALS --rank 10 --numIterations 20 --lambda 1.0 
--kryo
hdfs:/movielens/oversampled.dat
{noformat}

The exeption will not be triggered for small datasets. It will successfully run 
with MovieLens 100k and 10M. However, when I run it on a 100M dataset, the 
exception will be triggered.

My MovieLensALS is mostly the same as the one shipped with Spark. I just added 
cross-validation. Rating is registered in Kryo just as in the stock example.

{noformat}
# cat RELEASE 
Spark 1.0.0 built for Hadoop 2.2.0
{noformat}



 mutable.BitSet in ALS not serializable with KryoSerializer
 --

 Key: SPARK-1977
 URL: https://issues.apache.org/jira/browse/SPARK-1977
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Affects Versions: 1.0.0
Reporter: Neville Li
Priority: Minor

 OutLinkBlock in ALS.scala has an Array[mutable.BitSet] member.
 KryoSerializer uses AllScalaRegistrar from Twitter chill but it doesn't 
 register mutable.BitSet.
 Right now we have to register mutable.BitSet manually. A proper fix would be 
 using immutable.BitSet in ALS or register mutable.BitSet in upstream chill.
 {code}
 Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
 Task 1724.0:9 failed 4 times, most recent failure: Exception failure in TID 
 68548 on host lon4-hadoopslave-b232.lon4.spotify.net: 
 com.esotericsoftware.kryo.KryoException: java.lang.ArrayStoreException: 
 scala.collection.mutable.HashSet
 Serialization trace:
 shouldSend (org.apache.spark.mllib.recommendation.OutLinkBlock)
 
 com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:626)
 
 com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
 com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
 com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:43)
 com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:34)
 com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
 
 org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:115)
 
 org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:125)
 org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
 
 org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
 
 org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:155)
 
 org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:154)
 
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
 scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
 org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:154)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 
 org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:77)