[jira] [Updated] (SPARK-4648) Support Coalesce in Spark SQL.

2014-11-29 Thread Ravindra Pesala (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravindra Pesala updated SPARK-4648:
---
Description: 
Support the Coalesce function in Spark SQL.
Support type widening in the Coalesce function.
Also replace the Coalesce UDF in Spark Hive with the native Coalesce function, 
since it is more memory efficient and faster.

  was:Support Coalesce function in Spark SQL


 Support Coalesce in Spark SQL.
 --

 Key: SPARK-4648
 URL: https://issues.apache.org/jira/browse/SPARK-4648
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.0
Reporter: Ravindra Pesala

 Support the Coalesce function in Spark SQL.
 Support type widening in the Coalesce function.
 Also replace the Coalesce UDF in Spark Hive with the native Coalesce function, 
 since it is more memory efficient and faster.
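 
 A hypothetical illustration of the requested behaviour (not from the JIRA itself; it assumes a 
 registered table src): COALESCE returns its first non-null argument, and type widening promotes 
 the INT literal so the mixed INT/DOUBLE arguments share one result type.
 {code}
 // expected: Row(1.0) -- the INT value 1 widened to DOUBLE
 val widened = sql("SELECT COALESCE(NULL, 1, 2.5) FROM src LIMIT 1").collect()
 {code}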






[jira] [Updated] (SPARK-4648) Support COALESCE function in Spark SQL and HiveQL

2014-11-29 Thread Ravindra Pesala (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravindra Pesala updated SPARK-4648:
---
Summary: Support COALESCE function in Spark SQL and HiveQL  (was: Support 
Coalesce in Spark SQL.)

 Support COALESCE function in Spark SQL and HiveQL
 -

 Key: SPARK-4648
 URL: https://issues.apache.org/jira/browse/SPARK-4648
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.0
Reporter: Ravindra Pesala

 Support the Coalesce function in Spark SQL.
 Support type widening in the Coalesce function.
 Also replace the Coalesce UDF in Spark Hive with the native Coalesce function, 
 since it is more memory efficient and faster.






[jira] [Created] (SPARK-4658) Code documentation issue in DDL of datasource

2014-11-29 Thread Ravindra Pesala (JIRA)
Ravindra Pesala created SPARK-4658:
--

 Summary: Code documentation issue in DDL of datasource
 Key: SPARK-4658
 URL: https://issues.apache.org/jira/browse/SPARK-4658
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.2.0
Reporter: Ravindra Pesala
Priority: Minor


The syntax mentioned for creating a table for a datasource in the ddl.scala file 
is documented with the wrong syntax:
{code}
  /**
   * CREATE FOREIGN TEMPORARY TABLE avroTable
   * USING org.apache.spark.sql.avro
   * OPTIONS (path ../hive/src/test/resources/data/files/episodes.avro)
   */
{code}

but the correct syntax is 
{code}
 /**
   * CREATE TEMPORARY TABLE avroTable
   * USING org.apache.spark.sql.avro
   * OPTIONS (path ../hive/src/test/resources/data/files/episodes.avro)
   */
{code}

The wrong syntax is also documented in newParquet.scala:
{code}
`CREATE TABLE ... USING org.apache.spark.sql.parquet`.  
{code} 
 but the correct syntax is 
{code}
`CREATE TEMPORARY TABLE ... USING org.apache.spark.sql.parquet`.
{code}






[jira] [Created] (SPARK-4648) Use available Coalesce function in HiveQL instead of using HiveUDF. And support Coalesce in Spark SQL.

2014-11-28 Thread Ravindra Pesala (JIRA)
Ravindra Pesala created SPARK-4648:
--

 Summary: Use available Coalesce function in HiveQL instead of 
using HiveUDF. And support Coalesce in Spark SQL.
 Key: SPARK-4648
 URL: https://issues.apache.org/jira/browse/SPARK-4648
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.0
Reporter: Ravindra Pesala


Currently HiveQL uses a Hive UDF for Coalesce. Hive UDFs are usually memory 
intensive. Since a Coalesce function is already available in Spark, we can make 
use of it.
Also support the Coalesce function in Spark SQL.






[jira] [Updated] (SPARK-4648) Support Coalesce in Spark SQL.

2014-11-28 Thread Ravindra Pesala (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravindra Pesala updated SPARK-4648:
---
Summary: Support Coalesce in Spark SQL.  (was: Use available Coalesce 
function in HiveQL instead of using HiveUDF. And support Coalesce in Spark SQL.)

 Support Coalesce in Spark SQL.
 --

 Key: SPARK-4648
 URL: https://issues.apache.org/jira/browse/SPARK-4648
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.0
Reporter: Ravindra Pesala

 Currently HiveQL uses a Hive UDF for Coalesce. Hive UDFs are usually memory 
 intensive. Since a Coalesce function is already available in Spark, we can make 
 use of it.
 Also support the Coalesce function in Spark SQL.






[jira] [Updated] (SPARK-4648) Support Coalesce in Spark SQL.

2014-11-28 Thread Ravindra Pesala (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravindra Pesala updated SPARK-4648:
---
Description: Support Coalesce function in Spark SQL  (was: Currently HiveQL 
uses Hive UDF function for Coalesce. Usually using hive udfs are memory 
intensive. Since Coalesce function is already available in Spark , we can make 
use of it. 
And also support Coalesce function in Spark SQL)

 Support Coalesce in Spark SQL.
 --

 Key: SPARK-4648
 URL: https://issues.apache.org/jira/browse/SPARK-4648
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.0
Reporter: Ravindra Pesala

 Support Coalesce function in Spark SQL






[jira] [Created] (SPARK-4650) Supporting multi column support in count(distinct c1,c2..) in Spark SQL

2014-11-28 Thread Ravindra Pesala (JIRA)
Ravindra Pesala created SPARK-4650:
--

 Summary: Supporting multi column support in count(distinct 
c1,c2..) in Spark SQL
 Key: SPARK-4650
 URL: https://issues.apache.org/jira/browse/SPARK-4650
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.0
Reporter: Ravindra Pesala


Support multiple columns inside count(distinct c1,c2..), which currently does 
not work in Spark SQL.
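
A hypothetical example of the syntax this issue asks for (it assumes a registered table t with 
columns c1 and c2): counting distinct (c1, c2) pairs rather than a single column.
{code}
// currently fails in Spark SQL; should work once multi-column countDistinct is supported
val pairCount = sql("SELECT COUNT(DISTINCT c1, c2) FROM t").collect()
{code}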






[jira] [Updated] (SPARK-4650) Supporting multi column support in countDistinct function like count(distinct c1,c2..) in Spark SQL

2014-11-28 Thread Ravindra Pesala (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravindra Pesala updated SPARK-4650:
---
Summary: Supporting multi column support in countDistinct function like 
count(distinct c1,c2..) in Spark SQL  (was: Supporting multi column support in 
count(distinct c1,c2..) in Spark SQL)

 Supporting multi column support in countDistinct function like count(distinct 
 c1,c2..) in Spark SQL
 ---

 Key: SPARK-4650
 URL: https://issues.apache.org/jira/browse/SPARK-4650
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.0
Reporter: Ravindra Pesala

 Support multiple columns inside count(distinct c1,c2..), which currently does 
 not work in Spark SQL.






[jira] [Created] (SPARK-4513) Support relational operator '<=>' in Spark SQL

2014-11-20 Thread Ravindra Pesala (JIRA)
Ravindra Pesala created SPARK-4513:
--

 Summary: Support relational operator '<=>' in Spark SQL
 Key: SPARK-4513
 URL: https://issues.apache.org/jira/browse/SPARK-4513
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Ravindra Pesala


The relational operator '<=>' is not working in Spark SQL. The same works in 
Spark HiveQL.
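
A hypothetical illustration of the operator (it assumes a registered table t with nullable 
columns a and b): '<=>' is null-safe equality, so NULL <=> NULL evaluates to true, whereas 
a = b yields NULL when either side is NULL.
{code}
// works in Spark HiveQL today; this issue asks for the same support in the Spark SQL parser
val matched = sql("SELECT * FROM t WHERE a <=> b").collect()
{code}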






[jira] [Updated] (SPARK-4513) Support relational operator '<=>' in Spark SQL

2014-11-20 Thread Ravindra Pesala (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravindra Pesala updated SPARK-4513:
---
Component/s: SQL

 Support relational operator '<=>' in Spark SQL
 --

 Key: SPARK-4513
 URL: https://issues.apache.org/jira/browse/SPARK-4513
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.0
Reporter: Ravindra Pesala

 The relational operator '<=>' is not working in Spark SQL. The same works in 
 Spark HiveQL.






[jira] [Commented] (SPARK-4226) SparkSQL - Add support for subqueries in predicates

2014-11-05 Thread Ravindra Pesala (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14199596#comment-14199596
 ] 

Ravindra Pesala commented on SPARK-4226:


Currently there is no support for subquery expressions as predicates. I am 
working on it.

 SparkSQL - Add support for subqueries in predicates
 ---

 Key: SPARK-4226
 URL: https://issues.apache.org/jira/browse/SPARK-4226
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.2.0
 Environment: Spark 1.2 snapshot
Reporter: Terry Siu

 I have a test table defined in Hive as follows:
 CREATE TABLE sparkbug (
   id INT,
   event STRING
 ) STORED AS PARQUET;
 and insert some sample data with ids 1, 2, 3.
 In a Spark shell, I then create a HiveContext and then execute the following 
 HQL to test out subquery predicates:
 val hc = new HiveContext(sc)
 hc.hql("select customerid from sparkbug where customerid in (select 
 customerid from sparkbug where customerid in (2,3))")
 I get the following error:
 java.lang.RuntimeException: Unsupported language features in query: select 
 customerid from sparkbug where customerid in (select customerid from sparkbug 
 where customerid in (2,3))
 TOK_QUERY
   TOK_FROM
 TOK_TABREF
   TOK_TABNAME
 sparkbug
   TOK_INSERT
 TOK_DESTINATION
   TOK_DIR
 TOK_TMP_FILE
 TOK_SELECT
   TOK_SELEXPR
 TOK_TABLE_OR_COL
   customerid
 TOK_WHERE
   TOK_SUBQUERY_EXPR
 TOK_SUBQUERY_OP
   in
 TOK_QUERY
   TOK_FROM
 TOK_TABREF
   TOK_TABNAME
 sparkbug
   TOK_INSERT
 TOK_DESTINATION
   TOK_DIR
 TOK_TMP_FILE
 TOK_SELECT
   TOK_SELEXPR
 TOK_TABLE_OR_COL
   customerid
 TOK_WHERE
   TOK_FUNCTION
 in
 TOK_TABLE_OR_COL
   customerid
 2
 3
 TOK_TABLE_OR_COL
   customerid
 scala.NotImplementedError: No parse rules for ASTNode type: 817, text: 
 TOK_SUBQUERY_EXPR :
 TOK_SUBQUERY_EXPR
   TOK_SUBQUERY_OP
 in
   TOK_QUERY
 TOK_FROM
   TOK_TABREF
 TOK_TABNAME
   sparkbug
 TOK_INSERT
   TOK_DESTINATION
 TOK_DIR
   TOK_TMP_FILE
   TOK_SELECT
 TOK_SELEXPR
   TOK_TABLE_OR_COL
 customerid
   TOK_WHERE
 TOK_FUNCTION
   in
   TOK_TABLE_OR_COL
 customerid
   2
   3
   TOK_TABLE_OR_COL
 customerid
  +
  
 org.apache.spark.sql.hive.HiveQl$.nodeToExpr(HiveQl.scala:1098)
 
 at scala.sys.package$.error(package.scala:27)
 at org.apache.spark.sql.hive.HiveQl$.createPlan(HiveQl.scala:252)
 at 
 org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:50)
 at 
 org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:49)
 at 
 scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
 This thread
 http://apache-spark-user-list.1001560.n3.nabble.com/Subquery-in-having-clause-Spark-1-1-0-td17401.html
 also brings up the lack of subquery support in SparkSQL. It would be nice to have 
 subquery predicate support in a near-future release (1.3, maybe?).
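 
 A hypothetical workaround, not part of the JIRA: while IN-subquery predicates are unsupported, 
 the same filter can be expressed as a LEFT SEMI JOIN, which HiveQL does accept.
 {code}
 hc.hql("""SELECT a.customerid FROM sparkbug a
           LEFT SEMI JOIN (SELECT customerid FROM sparkbug WHERE customerid IN (2, 3)) b
           ON a.customerid = b.customerid""").collect()
 {code}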






[jira] [Created] (SPARK-4207) Query which has syntax like 'not like' is not working in Spark SQL

2014-11-03 Thread Ravindra Pesala (JIRA)
Ravindra Pesala created SPARK-4207:
--

 Summary: Query which has syntax like 'not like' is not working in 
Spark SQL
 Key: SPARK-4207
 URL: https://issues.apache.org/jira/browse/SPARK-4207
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.0
Reporter: Ravindra Pesala


Queries that use 'not like' do not work in Spark SQL. The same works in Spark 
HiveQL.

{code}
sql("SELECT * FROM records where value not like 'val%'")
{code}

The above query fails with the below exception:

{code}
Exception in thread main java.lang.RuntimeException: [1.39] failure: ``IN'' 
expected but `like' found

SELECT * FROM records where value not like 'val%'
  ^
at scala.sys.package$.error(package.scala:27)
at 
org.apache.spark.sql.catalyst.AbstractSparkSQLParser.apply(SparkSQLParser.scala:33)
at org.apache.spark.sql.SQLContext$$anonfun$1.apply(SQLContext.scala:75)
at org.apache.spark.sql.SQLContext$$anonfun$1.apply(SQLContext.scala:75)
at 
org.apache.spark.sql.catalyst.SparkSQLParser$$anonfun$org$apache$spark$sql$catalyst$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:186)

{code}






[jira] [Created] (SPARK-4154) Query does not work if it has not between in Spark SQL and HQL

2014-10-30 Thread Ravindra Pesala (JIRA)
Ravindra Pesala created SPARK-4154:
--

 Summary: Query does not work if it has not between  in Spark SQL 
and HQL
 Key: SPARK-4154
 URL: https://issues.apache.org/jira/browse/SPARK-4154
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.0
Reporter: Ravindra Pesala


If the query contains 'not between', it does not work.
{code}
SELECT * FROM src where key not between 10 and 20
{code}

It gives the following error

{code}
Exception in thread main java.lang.RuntimeException: 
Unsupported language features in query: SELECT * FROM src where key not between 
10 and 20
TOK_QUERY
  TOK_FROM
TOK_TABREF
  TOK_TABNAME
src
  TOK_INSERT
TOK_DESTINATION
  TOK_DIR
TOK_TMP_FILE
TOK_SELECT
  TOK_SELEXPR
TOK_ALLCOLREF
TOK_WHERE
  TOK_FUNCTION
between
KW_TRUE
TOK_TABLE_OR_COL
  key
10
20

scala.NotImplementedError: No parse rules for ASTNode type: 256, text: KW_TRUE :
KW_TRUE
 +
 
org.apache.spark.sql.hive.HiveQl$.nodeToExpr(HiveQl.scala:1088)

at scala.sys.package$.error(package.scala:27)
at org.apache.spark.sql.hive.HiveQl$.createPlan(HiveQl.scala:251)
at 
org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:50)
at 
org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:49)
at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:135)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
at 
scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
{code}






[jira] [Commented] (SPARK-4131) Support Writing data into the filesystem from queries

2014-10-29 Thread Ravindra Pesala (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188147#comment-14188147
 ] 

Ravindra Pesala commented on SPARK-4131:


I will work on this issue.

 Support Writing data into the filesystem from queries
 ---

 Key: SPARK-4131
 URL: https://issues.apache.org/jira/browse/SPARK-4131
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 1.1.0
Reporter: XiaoJing wang
Priority: Critical
 Fix For: 1.3.0

   Original Estimate: 0.05h
  Remaining Estimate: 0.05h

 Writing data into the filesystem from queries is not supported in SparkSQL.
 e.g.:
 {code}
 insert overwrite LOCAL DIRECTORY '/data1/wangxj/sql_spark' select * from page_views;
 {code}
 output:
 {code}
 java.lang.RuntimeException: 
 Unsupported language features in query: insert overwrite LOCAL DIRECTORY  
 '/data1/wangxj/sql_spark' select * from page_views
 TOK_QUERY
   TOK_FROM
 TOK_TABREF
   TOK_TABNAME
 page_views
   TOK_INSERT
 TOK_DESTINATION
   TOK_LOCAL_DIR
 '/data1/wangxj/sql_spark'
 TOK_SELECT
   TOK_SELEXPR
  TOK_ALLCOLREF
 {code}





[jira] [Created] (SPARK-4120) Join of multiple tables with syntax like SELECT .. FROM T1,T2,T3.. does not work in SparkSQL

2014-10-28 Thread Ravindra Pesala (JIRA)
Ravindra Pesala created SPARK-4120:
--

 Summary: Join of multiple tables with syntax like SELECT .. FROM 
T1,T2,T3.. does not work in SparkSQL
 Key: SPARK-4120
 URL: https://issues.apache.org/jira/browse/SPARK-4120
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Ravindra Pesala
 Fix For: 1.2.0


Queries with more than 2 tables do not work. 
{code}
sql("SELECT * FROM records1 as a,records2 as b,records3 as c where a.key=b.key 
and a.key=c.key")
{code}

The above query gives following exception.
{code}
Exception in thread main java.lang.RuntimeException: [1.40] failure: 
``UNION'' expected but `,' found

SELECT * FROM records1 as a,records2 as b,records3 as c where a.key=b.key and 
a.key=c.key
   ^
at scala.sys.package$.error(package.scala:27)
at 
org.apache.spark.sql.catalyst.AbstractSparkSQLParser.apply(SparkSQLParser.scala:33)
at org.apache.spark.sql.SQLContext$$anonfun$1.apply(SQLContext.scala:75)

{code}






[jira] [Commented] (SPARK-3814) Bitwise & does not work in Hive

2014-10-16 Thread Ravindra Pesala (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14173526#comment-14173526
 ] 

Ravindra Pesala commented on SPARK-3814:


Added support for Bitwise AND(&), OR(|), XOR(^), NOT(~) in this PR and 
updated the title of this defect.
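
Hypothetical queries exercising the added operators (assuming a registered table mytable with an 
integer column bit_field):
{code}
sql("SELECT bit_field & 1, bit_field | 2, bit_field ^ 3, ~bit_field FROM mytable").collect()
{code}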

 Bitwise & does not work in Hive
 

 Key: SPARK-3814
 URL: https://issues.apache.org/jira/browse/SPARK-3814
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.0
Reporter: Yana Kadiyska
Assignee: Ravindra Pesala
Priority: Minor

 Error: java.lang.RuntimeException: 
 Unsupported language features in query: select (case when bit_field & 1=1 
 then r_end - r_start else NULL end) from mytable where pkey='0178-2014-07' 
 LIMIT 2
 TOK_QUERY
   TOK_FROM
 TOK_TABREF
   TOK_TABNAME
mytable 
   TOK_INSERT
 TOK_DESTINATION
   TOK_DIR
 TOK_TMP_FILE
 TOK_SELECT
   TOK_SELEXPR
 TOK_FUNCTION
   when
   =
 
   TOK_TABLE_OR_COL
 bit_field
   1
 1
   -
 TOK_TABLE_OR_COL
   r_end
 TOK_TABLE_OR_COL
   r_start
   TOK_NULL
 TOK_WHERE
   =
 TOK_TABLE_OR_COL
   pkey
 '0178-2014-07'
 TOK_LIMIT
   2
 SQLState:  null
 ErrorCode: 0






[jira] [Updated] (SPARK-3814) Support for Bitwise AND(&), OR(|), XOR(^), NOT(~) in Spark HQL and SQL

2014-10-16 Thread Ravindra Pesala (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravindra Pesala updated SPARK-3814:
---
Summary: Support for Bitwise AND(&), OR(|), XOR(^), NOT(~) in Spark HQL and 
SQL  (was: Bitwise & does not work in Hive)

 Support for Bitwise AND(&), OR(|), XOR(^), NOT(~) in Spark HQL and SQL
 --

 Key: SPARK-3814
 URL: https://issues.apache.org/jira/browse/SPARK-3814
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.0
Reporter: Yana Kadiyska
Assignee: Ravindra Pesala
Priority: Minor

 Error: java.lang.RuntimeException: 
 Unsupported language features in query: select (case when bit_field & 1=1 
 then r_end - r_start else NULL end) from mytable where pkey='0178-2014-07' 
 LIMIT 2
 TOK_QUERY
   TOK_FROM
 TOK_TABREF
   TOK_TABNAME
mytable 
   TOK_INSERT
 TOK_DESTINATION
   TOK_DIR
 TOK_TMP_FILE
 TOK_SELECT
   TOK_SELEXPR
 TOK_FUNCTION
   when
   =
 
   TOK_TABLE_OR_COL
 bit_field
   1
 1
   -
 TOK_TABLE_OR_COL
   r_end
 TOK_TABLE_OR_COL
   r_start
   TOK_NULL
 TOK_WHERE
   =
 TOK_TABLE_OR_COL
   pkey
 '0178-2014-07'
 TOK_LIMIT
   2
 SQLState:  null
 ErrorCode: 0






[jira] [Commented] (SPARK-3880) HBase as data source to SparkSQL

2014-10-10 Thread Ravindra Pesala (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166838#comment-14166838
 ] 

Ravindra Pesala commented on SPARK-3880:


There is already some work going on in the direction of adding foreign data 
sources to Spark SQL: https://github.com/apache/spark/pull/2475. So I guess 
HBase is also a foreign data source and it should fit into this design. 
Adding a new project/context for each datasource may be cumbersome to maintain. 
Can we improve on the current PR to add DDL support?
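
A hypothetical sketch of how an HBase source could ride on that foreign data source DDL (the 
package name and option key below are made up for illustration only):
{code}
sql("""CREATE TEMPORARY TABLE hbaseTable
       USING org.apache.spark.sql.hbase
       OPTIONS (hbaseTableName 'my_hbase_table')""")
{code}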

 HBase as data source to SparkSQL
 

 Key: SPARK-3880
 URL: https://issues.apache.org/jira/browse/SPARK-3880
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Reporter: Yan
Assignee: Yan
 Attachments: HBaseOnSpark.docx









[jira] [Commented] (SPARK-3823) Spark Hive SQL readColumn is not reset each time for a new query

2014-10-10 Thread Ravindra Pesala (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14167239#comment-14167239
 ] 

Ravindra Pesala commented on SPARK-3823:


It seems this issue is a duplicate of 
https://issues.apache.org/jira/browse/SPARK-3559

 Spark Hive SQL readColumn is not reset each time for a new query
 

 Key: SPARK-3823
 URL: https://issues.apache.org/jira/browse/SPARK-3823
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.0
Reporter: Alex Liu

 After a few queries run in the same hiveContext, the 
 hive.io.file.readcolumn.ids and hive.io.file.readcolumn.names values 
 accumulate entries from previously run queries.
 For example, running the following queries:
 {code}
 hql("use sql_integration_ks")
 val container = hql("select * from double_table as aa JOIN boolean_table as 
 bb on aa.type_id = bb.type_id")
 container.collect().foreach(println)
 val container = hql("select * from ascii_table ORDER BY type_id")
 container.collect().foreach(println)
 val container = hql("select shippers.shippername, COUNT(orders.orderid) AS 
 numorders FROM orders LEFT JOIN shippers ON 
 orders.shipperid=shippers.shipperid GROUP BY shippername")
 container.collect().foreach(println)
 val container = hql("select * from ascii_table where type_id < 126")
 container.collect().length
 {code}
 The read column ids for the last query are [2, 0, 3, 1]
 read column names are : 
 type_id,value,type_id,value,type_id,value,orderid,shipperid,shipper name, 
 shipperid
 The source code is at 
 https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScan.scala#L80
 hiveContext has a shared hiveconf which adds readColumns for each query. It 
 should be reset each time for a new hive query, or the duplicate readColumn 
 ids should be removed.
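 
 A minimal sketch of the reset described above (a hypothetical helper, not the actual Spark 
 code): clear the accumulated read-column entries on the shared HiveConf before each new query.
 {code}
 import org.apache.hadoop.hive.conf.HiveConf
 
 def resetReadColumns(hiveConf: HiveConf): Unit = {
   // drop the column ids/names accumulated by previously run queries
   hiveConf.set("hive.io.file.readcolumn.ids", "")
   hiveConf.set("hive.io.file.readcolumn.names", "")
 }
 {code}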






[jira] [Commented] (SPARK-3834) Backticks not correctly handled in subquery aliases

2014-10-09 Thread Ravindra Pesala (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14164857#comment-14164857
 ] 

Ravindra Pesala commented on SPARK-3834:


Ok [~marmbrus], I will work on it.

 Backticks not correctly handled in subquery aliases
 ---

 Key: SPARK-3834
 URL: https://issues.apache.org/jira/browse/SPARK-3834
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Michael Armbrust
Assignee: Ravindra Pesala
Priority: Blocker

 [~ravi.pesala]  assigning to you since you fixed the last problem here.  Let 
 me know if you don't have time to work on this or if you have any questions.






[jira] [Commented] (SPARK-3814) Bitwise & does not work in Hive

2014-10-09 Thread Ravindra Pesala (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14165790#comment-14165790
 ] 

Ravindra Pesala commented on SPARK-3814:


https://github.com/apache/spark/pull/2736

 Bitwise & does not work in Hive
 

 Key: SPARK-3814
 URL: https://issues.apache.org/jira/browse/SPARK-3814
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.0
Reporter: Yana Kadiyska
Priority: Minor

 Error: java.lang.RuntimeException: 
 Unsupported language features in query: select (case when bit_field & 1=1 
 then r_end - r_start else NULL end) from mytable where pkey='0178-2014-07' 
 LIMIT 2
 TOK_QUERY
   TOK_FROM
 TOK_TABREF
   TOK_TABNAME
mytable 
   TOK_INSERT
 TOK_DESTINATION
   TOK_DIR
 TOK_TMP_FILE
 TOK_SELECT
   TOK_SELEXPR
 TOK_FUNCTION
   when
   =
 
   TOK_TABLE_OR_COL
 bit_field
   1
 1
   -
 TOK_TABLE_OR_COL
   r_end
 TOK_TABLE_OR_COL
   r_start
   TOK_NULL
 TOK_WHERE
   =
 TOK_TABLE_OR_COL
   pkey
 '0178-2014-07'
 TOK_LIMIT
   2
 SQLState:  null
 ErrorCode: 0






[jira] [Commented] (SPARK-3834) Backticks not correctly handled in subquery aliases

2014-10-09 Thread Ravindra Pesala (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14165800#comment-14165800
 ] 

Ravindra Pesala commented on SPARK-3834:


https://github.com/apache/spark/pull/2737

 Backticks not correctly handled in subquery aliases
 ---

 Key: SPARK-3834
 URL: https://issues.apache.org/jira/browse/SPARK-3834
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Michael Armbrust
Assignee: Ravindra Pesala
Priority: Blocker

 [~ravi.pesala]  assigning to you since you fixed the last problem here.  Let 
 me know if you don't have time to work on this or if you have any questions.






[jira] [Created] (SPARK-3813) Support case when conditional functions in Spark SQL

2014-10-06 Thread Ravindra Pesala (JIRA)
Ravindra Pesala created SPARK-3813:
--

 Summary: Support case when conditional functions in Spark SQL
 Key: SPARK-3813
 URL: https://issues.apache.org/jira/browse/SPARK-3813
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.1.0
Reporter: Ravindra Pesala
 Fix For: 1.2.0


SQL queries which have the following conditional functions are not supported in 
Spark SQL.
{code}
CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END
CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END
{code}
The same functions can work in Spark HiveQL.
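
Hypothetical queries showing the two CASE forms (assuming a registered table records with an 
integer column key); today these parse only through the HiveQL path.
{code}
sql("SELECT CASE key WHEN 93 THEN 'matched' ELSE 'other' END FROM records").collect()
sql("SELECT CASE WHEN key > 90 THEN 'high' ELSE 'low' END FROM records").collect()
{code}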









[jira] [Commented] (SPARK-3813) Support case when conditional functions in Spark SQL

2014-10-06 Thread Ravindra Pesala (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14160368#comment-14160368
 ] 

Ravindra Pesala commented on SPARK-3813:


The below code gives the exception.
{code}
import sqlContext._
val rdd = sc.parallelize((1 to 100).map(i => Record(i, s"val_$i")))
rdd.registerTempTable("records")
println("Result of SELECT *:")
sql("SELECT case key when '93' then 'ravi' else key end FROM 
records").collect()
{code}

{code}
  java.lang.RuntimeException: [1.17] failure: ``UNION'' expected but identifier 
when found

SELECT case key when 93 then 0 else 1 end FROM records
^
at scala.sys.package$.error(package.scala:27)
at org.apache.spark.sql.catalyst.SqlParser.apply(SqlParser.scala:60)
at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:74)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:267)
{code}


 Support case when conditional functions in Spark SQL
 --

 Key: SPARK-3813
 URL: https://issues.apache.org/jira/browse/SPARK-3813
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.1.0
Reporter: Ravindra Pesala
 Fix For: 1.2.0


 SQL queries which have the following conditional functions are not supported 
 in Spark SQL.
 {code}
 CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END
 CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END
 {code}
 The same functions can work in Spark HiveQL.






[jira] [Commented] (SPARK-3814) Bitwise & does not work in Hive

2014-10-06 Thread Ravindra Pesala (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161472#comment-14161472
 ] 

Ravindra Pesala commented on SPARK-3814:


Currently there is no support for Bitwise & in Spark HiveQL or Spark SQL.

I am working on this issue.

We also need to support Bitwise |, Bitwise ^, and Bitwise ~. I will add 
separate jiras for these operations and work on them.
Thank you.

 Bitwise & does not work in Hive
 

 Key: SPARK-3814
 URL: https://issues.apache.org/jira/browse/SPARK-3814
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.0
Reporter: Yana Kadiyska
Priority: Minor

 Error: java.lang.RuntimeException: 
 Unsupported language features in query: select (case when bit_field & 1=1 
 then r_end - r_start else NULL end) from mytable where pkey='0178-2014-07' 
 LIMIT 2
 TOK_QUERY
   TOK_FROM
 TOK_TABREF
   TOK_TABNAME
mytable 
   TOK_INSERT
 TOK_DESTINATION
   TOK_DIR
 TOK_TMP_FILE
 TOK_SELECT
   TOK_SELEXPR
 TOK_FUNCTION
   when
   =
 
   TOK_TABLE_OR_COL
 bit_field
   1
 1
   -
 TOK_TABLE_OR_COL
   r_end
 TOK_TABLE_OR_COL
   r_start
   TOK_NULL
 TOK_WHERE
   =
 TOK_TABLE_OR_COL
   pkey
 '0178-2014-07'
 TOK_LIMIT
   2
 SQLState:  null
 ErrorCode: 0






[jira] [Updated] (SPARK-3100) Spark RDD partitions are not running in the workers as per locality information given by each partition.

2014-09-30 Thread Ravindra Pesala (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravindra Pesala updated SPARK-3100:
---
Description: 
I created a simple custom RDD (SampleRDD.scala) and created 4 splits for 4 
workers.
When I run this RDD in a Spark standalone cluster with 4 workers (even the master 
machine has one worker), it runs all partitions on one node only, even though I 
have given locality preferences in my SampleRDD program. 

*Sample Code*
{code}
class SamplePartition(rddId: Int, val idx: Int, val tableSplit: Seq[String])
  extends Partition {
  override def hashCode(): Int = 41 * (41 + rddId) + idx
  override val index: Int = idx
}

class SampleRDD[K, V](
    sc: SparkContext, keyClass: KeyVal[K, V])
  extends RDD[(K, V)](sc, Nil)
  with Logging {

  override def getPartitions: Array[Partition] = {
    val hosts = Array("master", "slave1", "slave2", "slave3")
    val result = new Array[Partition](4)
    for (i <- 0 until result.length) {
      result(i) = new SamplePartition(id, i, Array(hosts(i)))
    }
    result
  }

  override def compute(theSplit: Partition, context: TaskContext) = {
    val iter = new Iterator[(K, V)] {
      val split = theSplit.asInstanceOf[SamplePartition]
      logInfo("Executed task for the split " + split.tableSplit)

      // Register an on-task-completion callback to close the input stream.
      context.addOnCompleteCallback(() => close())
      var havePair = false
      var finished = false

      override def hasNext: Boolean = {
        if (!finished && !havePair) {
          finished = !false
          havePair = !finished
        }
        !finished
      }

      override def next(): (K, V) = {
        if (!hasNext) {
          throw new java.util.NoSuchElementException("End of stream")
        }
        havePair = false
        val key = new Key()
        val value = new Value()
        keyClass.getKey(key, value)
      }

      private def close() {
        try {
          // reader.close()
        } catch {
          case e: Exception => logWarning("Exception in RecordReader.close()", e)
        }
      }
    }
    iter
  }

  override def getPreferredLocations(split: Partition): Seq[String] = {
    val theSplit = split.asInstanceOf[SamplePartition]
    val s = theSplit.tableSplit.filter(_ != "localhost")
    logInfo("Host Name : " + s(0))
    s
  }
}

trait KeyVal[K, V] extends Serializable {
  def getKey(key: Key, value: Value): (K, V)
}

class KeyValImpl extends KeyVal[Key, Value] {
  override def getKey(key: Key, value: Value) = (key, value)
}

case class Key()
case class Value()

object SampleRDD {
  def main(args: Array[String]): Unit = {
    val d = SparkContext.jarOfClass(this.getClass)
    val ar = new Array[String](d.size)
    var i = 0
    d.foreach { p =>
      ar(i) = p
      i = i + 1
    }
    val sc = new SparkContext("spark://master:7077", "SampleSpark",
      "/opt/spark-1.0.0-rc3/", ar)
    val rdd = new SampleRDD(sc, new KeyValImpl())
    rdd.collect
  }
}
{code}
Following is the log it shows.

INFO  18-08 16:38:33,382 - Executor updated: app-20140818163833-0005/0 is now 
RUNNING
INFO  18-08 16:38:33,382 - Executor updated: app-20140818163833-0005/2 is now 
RUNNING
INFO  18-08 16:38:33,383 - Executor updated: app-20140818163833-0005/1 is now 
RUNNING
INFO  18-08 16:38:33,385 - Executor updated: app-20140818163833-0005/3 is now 
RUNNING
INFO  18-08 16:38:34,976 - Registered executor: Actor 
akka.tcp://sparkExecutor@master:47563/user/Executor#-398354094 with ID 0
INFO  18-08 16:38:34,984 - Starting task 0.0:0 as TID 0 on executor 0: *master 
(PROCESS_LOCAL)*
INFO  18-08 16:38:34,989 - Serialized task 0.0:0 as 1261 bytes in 3 ms
INFO  18-08 16:38:34,992 - Starting task 0.0:1 as TID 1 on executor 0: *master 
(PROCESS_LOCAL)*
INFO  18-08 16:38:34,993 - Serialized task 0.0:1 as 1261 bytes in 0 ms
INFO  18-08 16:38:34,993 - Starting task 0.0:2 as TID 2 on executor 0: *master 
(PROCESS_LOCAL)*
INFO  18-08 16:38:34,993 - Serialized task 0.0:2 as 1261 bytes in 0 ms
INFO  18-08 16:38:34,994 - Starting task 0.0:3 as TID 3 on executor 0: *master 
(PROCESS_LOCAL)*
INFO  18-08 16:38:34,994 - Serialized task 0.0:3 as 1261 bytes in 0 ms
INFO  18-08 16:38:35,174 - Registering block manager master:42125 with 294.4 MB 
RAM
INFO  18-08 16:38:35,296 - Registered executor: Actor 
akka.tcp://sparkExecutor@slave1:31726/user/Executor#492173410 with ID 2
INFO  18-08 16:38:35,302 - Registered executor: Actor 
akka.tcp://sparkExecutor@slave2:25769/user/Executor#1762839887 with ID 1
INFO  18-08 16:38:35,317 - Registered executor: Actor 
akka.tcp://sparkExecutor@slave3:51032/user/Executor#981476000 with ID 3





  was:
I created a simple custom RDD (SampleRDD.scala)and created 4 splits for 4 
workers.
When I run this RDD in Spark standalone cluster with 4 workers(even master 
machine has one worker), it 

[jira] [Updated] (SPARK-3100) Spark RDD partitions are not running in the workers as per locality information given by each partition.

2014-09-30 Thread Ravindra Pesala (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravindra Pesala updated SPARK-3100:
---
Description: 
I created a simple custom RDD (SampleRDD.scala) and created 4 splits for 4 
workers.
When I run this RDD in a Spark standalone cluster with 4 workers (even the master 
machine has one worker), it runs all partitions on one node only, even though I 
have given locality preferences in my SampleRDD program. 

*Sample Code*
{code}
class SamplePartition(rddId: Int, val idx: Int, val tableSplit: Seq[String])
  extends Partition {
  override def hashCode(): Int = 41 * (41 + rddId) + idx
  override val index: Int = idx
}

class SampleRDD[K, V](
    sc: SparkContext, keyClass: KeyVal[K, V])
  extends RDD[(K, V)](sc, Nil)
  with Logging {

  override def getPartitions: Array[Partition] = {
    val hosts = Array("master", "slave1", "slave2", "slave3")
    val result = new Array[Partition](4)
    for (i <- 0 until result.length) {
      result(i) = new SamplePartition(id, i, Array(hosts(i)))
    }
    result
  }

  override def compute(theSplit: Partition, context: TaskContext) = {
    val iter = new Iterator[(K, V)] {
      val split = theSplit.asInstanceOf[SamplePartition]
      logInfo("Executed task for the split " + split.tableSplit)

      // Register an on-task-completion callback to close the input stream.
      context.addOnCompleteCallback(() => close())
      var havePair = false
      var finished = false

      override def hasNext: Boolean = {
        if (!finished && !havePair) {
          finished = !false
          havePair = !finished
        }
        !finished
      }

      override def next(): (K, V) = {
        if (!hasNext) {
          throw new java.util.NoSuchElementException("End of stream")
        }
        havePair = false
        val key = new Key()
        val value = new Value()
        keyClass.getKey(key, value)
      }

      private def close() {
        try {
          // reader.close()
        } catch {
          case e: Exception => logWarning("Exception in RecordReader.close()", e)
        }
      }
    }
    iter
  }

  override def getPreferredLocations(split: Partition): Seq[String] = {
    val theSplit = split.asInstanceOf[SamplePartition]
    val s = theSplit.tableSplit.filter(_ != "localhost")
    logInfo("Host Name : " + s(0))
    s
  }
}

trait KeyVal[K, V] extends Serializable {
  def getKey(key: Key, value: Value): (K, V)
}

class KeyValImpl extends KeyVal[Key, Value] {
  override def getKey(key: Key, value: Value) = (key, value)
}

case class Key()
case class Value()

object SampleRDD {
  def main(args: Array[String]): Unit = {
    val d = SparkContext.jarOfClass(this.getClass)
    val ar = new Array[String](d.size)
    var i = 0
    d.foreach { p =>
      ar(i) = p
      i = i + 1
    }
    val sc = new SparkContext("spark://master:7077", "SampleSpark",
      "/opt/spark-1.0.0-rc3/", ar)
    val rdd = new SampleRDD(sc, new KeyValImpl())
    rdd.collect
  }
}
{code}
Following is the log it shows.
{code}
INFO  18-08 16:38:33,382 - Executor updated: app-20140818163833-0005/0 is now 
RUNNING
INFO  18-08 16:38:33,382 - Executor updated: app-20140818163833-0005/2 is now 
RUNNING
INFO  18-08 16:38:33,383 - Executor updated: app-20140818163833-0005/1 is now 
RUNNING
INFO  18-08 16:38:33,385 - Executor updated: app-20140818163833-0005/3 is now 
RUNNING
INFO  18-08 16:38:34,976 - Registered executor: Actor 
akka.tcp://sparkExecutor@master:47563/user/Executor#-398354094 with ID 0
INFO  18-08 16:38:34,984 - Starting task 0.0:0 as TID 0 on executor 0: master 
(PROCESS_LOCAL)
INFO  18-08 16:38:34,989 - Serialized task 0.0:0 as 1261 bytes in 3 ms
INFO  18-08 16:38:34,992 - Starting task 0.0:1 as TID 1 on executor 0: master 
(PROCESS_LOCAL)
INFO  18-08 16:38:34,993 - Serialized task 0.0:1 as 1261 bytes in 0 ms
INFO  18-08 16:38:34,993 - Starting task 0.0:2 as TID 2 on executor 0: master 
(PROCESS_LOCAL)
INFO  18-08 16:38:34,993 - Serialized task 0.0:2 as 1261 bytes in 0 ms
INFO  18-08 16:38:34,994 - Starting task 0.0:3 as TID 3 on executor 0: master 
(PROCESS_LOCAL)
INFO  18-08 16:38:34,994 - Serialized task 0.0:3 as 1261 bytes in 0 ms
INFO  18-08 16:38:35,174 - Registering block manager master:42125 with 294.4 MB 
RAM
INFO  18-08 16:38:35,296 - Registered executor: Actor 
akka.tcp://sparkExecutor@slave1:31726/user/Executor#492173410 with ID 2
INFO  18-08 16:38:35,302 - Registered executor: Actor 
akka.tcp://sparkExecutor@slave2:25769/user/Executor#1762839887 with ID 1
INFO  18-08 16:38:35,317 - Registered executor: Actor 
akka.tcp://sparkExecutor@slave3:51032/user/Executor#981476000 with ID 3
{code}

Here all the tasks are assigned to the master only, even though I have mentioned 
the locality preferences.



  was:
I created a simple custom RDD (SampleRDD.scala)and created 4 splits for 4 

[jira] [Comment Edited] (SPARK-3100) Spark RDD partitions are not running in the workers as per locality information given by each partition.

2014-09-30 Thread Ravindra Pesala (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100664#comment-14100664
 ] 

Ravindra Pesala edited comment on SPARK-3100 at 9/30/14 7:47 AM:
-

It seems there is a synchronization issue between task allocation and the 
registering of executors. As per the above log, I observed that the tasks were 
all allocated to the single executor that had registered. Only after these tasks 
were allocated to that executor did the remaining executors start registering. 
So this may be a synchronization issue.

I added a 5 second sleep in my driver, and then it started working properly.
{code}
  val sc = new SparkContext("spark://master:7077", "SampleSpark",
    "/opt/spark-1.0.0-rc3/", ar)
  Thread.sleep(5000)
  val rdd = new SampleRDD(sc, new KeyValImpl())
  rdd.collect
{code}

Now the tasks are assigned to all the nodes

{code}
INFO  18-08 19:44:36,301 - Registered executor: 
Actor[akka.tcp://sparkExecutor@master:32457/user/Executor#-966982652] with ID 0
INFO  18-08 19:44:36,505 - Registering block manager master:59964 with 294.4 MB 
RAM
INFO  18-08 19:44:36,578 - Registered executor: 
Actor[akka.tcp://sparkExecutor@slave1:11653/user/Executor#909834981] with ID 1
INFO  18-08 19:44:36,591 - Registered executor: 
Actor[akka.tcp://sparkExecutor@slave2:59220/user/Executor#301495226] with ID 3
INFO  18-08 19:44:36,643 - Registered executor: 
Actor[akka.tcp://sparkExecutor@slave3:11232/user/Executor#-1118376183] with ID 2
INFO  18-08 19:44:36,804 - Registering block manager slave1:14596 with 294.4 MB 
RAM
INFO  18-08 19:44:36,809 - Registering block manager slave2:10418 with 294.4 MB 
RAM
INFO  18-08 19:44:36,871 - Registering block manager slave3:45973 with 294.4 MB 
RAM
INFO  18-08 19:44:39,507 - Starting job: collect at SampleRDD.scala:142
INFO  18-08 19:44:39,520 - Got job 0 (collect at SampleRDD.scala:142) with 4 
output partitions (allowLocal=false)
INFO  18-08 19:44:39,521 - Final stage: Stage 0(collect at SampleRDD.scala:142)
INFO  18-08 19:44:39,521 - Parents of final stage: List()
INFO  18-08 19:44:39,526 - Missing parents: List()
INFO  18-08 19:44:39,532 - Submitting Stage 0 (SampleRDD[0] at RDD at 
SampleRDD.scala:28), which has no missing parents
INFO  18-08 19:44:39,537 - Host Name : master
INFO  18-08 19:44:39,539 - Host Name : slave1
INFO  18-08 19:44:39,539 - Host Name : slave2
INFO  18-08 19:44:39,540 - Host Name : slave3
INFO  18-08 19:44:39,563 - Submitting 4 missing tasks from Stage 0 
(SampleRDD[0] at RDD at SampleRDD.scala:28)
INFO  18-08 19:44:39,564 - Adding task set 0.0 with 4 tasks
INFO  18-08 19:44:39,579 - Starting task 0.0:2 as TID 0 on executor 3: *slave2 
(NODE_LOCAL)*
INFO  18-08 19:44:39,583 - Serialized task 0.0:2 as 1261 bytes in 2 ms
INFO  18-08 19:44:39,585 - Starting task 0.0:0 as TID 1 on executor 0: *master 
(NODE_LOCAL)*
INFO  18-08 19:44:39,585 - Serialized task 0.0:0 as 1261 bytes in 0 ms
INFO  18-08 19:44:39,586 - Starting task 0.0:1 as TID 2 on executor 1: *slave1 
(NODE_LOCAL)*
INFO  18-08 19:44:39,586 - Serialized task 0.0:1 as 1261 bytes in 0 ms
INFO  18-08 19:44:39,587 - Starting task 0.0:3 as TID 3 on executor 2: *slave3 
(NODE_LOCAL)*
INFO  18-08 19:44:39,587 - Serialized task 0.0:3 as 1261 bytes in 0 ms
{code}

Is this expected behavior? Please comment on it.


was (Author: ravipesala):
It seems there is an issue in synchronization of task allocation and 
registering of executors. As per the above log I observed that task allocations 
are done with single registered executor. After these tasks are started 
,remaining executors are started registering. So this is synchronization issue.

I have added sleep in my driver for 5 seconds then it started working properly.

  val sc = new SparkContext(spark://master:7077, SampleSpark, 
/opt/spark-1.0.0-rc3/,ar)
   *Thread.sleep(5000)*
   val rdd = new SampleRDD(sc,new KeyValImpl());
   rdd.collect;

INFO  18-08 19:44:36,301 - Registered executor: 
Actor[akka.tcp://sparkExecutor@master:32457/user/Executor#-966982652] with ID 0
INFO  18-08 19:44:36,505 - Registering block manager master:59964 with 294.4 MB 
RAM
INFO  18-08 19:44:36,578 - Registered executor: 
Actor[akka.tcp://sparkExecutor@slave1:11653/user/Executor#909834981] with ID 1
INFO  18-08 19:44:36,591 - Registered executor: 
Actor[akka.tcp://sparkExecutor@slave2:59220/user/Executor#301495226] with ID 3
INFO  18-08 19:44:36,643 - Registered executor: 
Actor[akka.tcp://sparkExecutor@slave3:11232/user/Executor#-1118376183] with ID 2
INFO  18-08 19:44:36,804 - Registering block manager slave1:14596 with 294.4 MB 
RAM
INFO  18-08 19:44:36,809 - Registering block manager slave2:10418 with 294.4 MB 
RAM
INFO  18-08 19:44:36,871 - Registering block manager slave3:45973 with 294.4 MB 
RAM
INFO  18-08 19:44:39,507 - Starting job: collect at SampleRDD.scala:142
INFO  18-08 19:44:39,520 - 

[jira] [Commented] (SPARK-3708) Backticks aren't handled correctly is aliases

2014-09-29 Thread Ravindra Pesala (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14152321#comment-14152321
 ] 

Ravindra Pesala commented on SPARK-3708:


I guess you are referring to HiveContext here, as there is no support for 
backticks in SqlContext. I will work on this issue. Thank you.

 Backticks aren't handled correctly is aliases
 -

 Key: SPARK-3708
 URL: https://issues.apache.org/jira/browse/SPARK-3708
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.0
Reporter: Michael Armbrust

 Here's a failing test case:
 {code}
 sql("SELECT k FROM (SELECT `key` AS `k` FROM src) a")
 {code}






[jira] [Commented] (SPARK-3654) Implement all extended HiveQL statements/commands with a separate parser combinator

2014-09-29 Thread Ravindra Pesala (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14152789#comment-14152789
 ] 

Ravindra Pesala commented on SPARK-3654:


https://github.com/apache/spark/pull/2590

 Implement all extended HiveQL statements/commands with a separate parser 
 combinator
 ---

 Key: SPARK-3654
 URL: https://issues.apache.org/jira/browse/SPARK-3654
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.0
Reporter: Cheng Lian

 Statements and commands like {{SET}}, {{CACHE TABLE}} and {{ADD JAR}} etc. 
 are currently parsed in a quite hacky way, like this:
 {code}
 if (sql.trim.toLowerCase.startsWith("cache table")) {
   sql.trim.toLowerCase.startsWith("cache table") match {
 ...
   }
 }
 {code}
 It would be much better to add an extra parser combinator that parses these 
 syntax extensions first, and then fallback to the normal Hive parser.
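 
 A minimal sketch of that approach (class and command names here are illustrative, not the 
 actual Spark implementation): try a small combinator parser for the extensions first, and fall 
 back to the Hive parser when the statement is not one of them.
 {code}
 import scala.util.parsing.combinator.RegexParsers
 
 object ExtensionParser extends RegexParsers {
   sealed trait Command
   case class CacheTable(name: String) extends Command
   case class AddJar(path: String) extends Command
 
   def cacheTable: Parser[Command] = "(?i)CACHE\\s+TABLE".r ~> "\\S+".r ^^ CacheTable
   def addJar: Parser[Command] = "(?i)ADD\\s+JAR".r ~> ".+".r ^^ AddJar
   def extension: Parser[Command] = cacheTable | addJar
 
   // extensions are tried first; anything else goes to the normal Hive parser
   def parse(sql: String, hiveFallback: String => Command): Command =
     parseAll(extension, sql).getOrElse(hiveFallback(sql))
 }
 {code}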






[jira] [Commented] (SPARK-3371) Spark SQL: Renaming a function expression with group by gives error

2014-09-23 Thread Ravindra Pesala (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144514#comment-14144514
 ] 

Ravindra Pesala commented on SPARK-3371:


It seems like an issue. By default the SQL parser creates aliases for the 
functions in grouping expressions using generated alias names. So if the user 
gives alias names to the functions inside the projection, they do not match the 
generated alias names of the grouping expressions. 

I am working on this issue and will create the PR today.

 Spark SQL: Renaming a function expression with group by gives error
 ---

 Key: SPARK-3371
 URL: https://issues.apache.org/jira/browse/SPARK-3371
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.0
Reporter: Pei-Lun Lee

 {code}
 val sqlContext = new org.apache.spark.sql.SQLContext(sc)
 val rdd = sc.parallelize(List("""{"foo":"bar"}"""))
 sqlContext.jsonRDD(rdd).registerAsTable("t1")
 sqlContext.registerFunction("len", (s: String) => s.length)
 sqlContext.sql("select len(foo) as a, count(1) from t1 group by 
 len(foo)").collect()
 {code}
 running above code in spark-shell gives the following error
 {noformat}
 14/09/03 17:20:13 ERROR Executor: Exception in task 2.0 in stage 3.0 (TID 214)
 org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding 
 attribute, tree: foo#0
   at 
 org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47)
   at 
 org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:43)
   at 
 org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:42)
   at 
 org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
   at 
 org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4$$anonfun$apply$2.apply(TreeNode.scala:201)
   at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at scala.collection.immutable.List.foreach(List.scala:318)
   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
   at 
 org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:199)
   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
   at 
 scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
   at 
 scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
   at 
 scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
   at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
   at scala.collection.AbstractIterator.to(Iterator.scala:1157)
   at 
 scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
   at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
   at 
 scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
   at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
   at 
 org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:212)
   at 
 org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:168)
   at 
 org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:183)
   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
 {noformat}
 removing the "as a" in the query causes no error






[jira] [Commented] (SPARK-3371) Spark SQL: Renaming a function expression with group by gives error

2014-09-23 Thread Ravindra Pesala (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145258#comment-14145258
 ] 

Ravindra Pesala commented on SPARK-3371:


https://github.com/apache/spark/pull/2511

 Spark SQL: Renaming a function expression with group by gives error
 ---

 Key: SPARK-3371
 URL: https://issues.apache.org/jira/browse/SPARK-3371
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.0
Reporter: Pei-Lun Lee

 {code}
 val sqlContext = new org.apache.spark.sql.SQLContext(sc)
 val rdd = sc.parallelize(List("""{"foo":"bar"}"""))
 sqlContext.jsonRDD(rdd).registerAsTable("t1")
 sqlContext.registerFunction("len", (s: String) => s.length)
 sqlContext.sql("select len(foo) as a, count(1) from t1 group by 
 len(foo)").collect()
 {code}
 running above code in spark-shell gives the following error
 {noformat}
 14/09/03 17:20:13 ERROR Executor: Exception in task 2.0 in stage 3.0 (TID 214)
 org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding 
 attribute, tree: foo#0
   at 
 org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47)
   at 
 org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:43)
   at 
 org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:42)
   at 
 org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
   at 
 org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4$$anonfun$apply$2.apply(TreeNode.scala:201)
   at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at scala.collection.immutable.List.foreach(List.scala:318)
   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
   at 
 org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:199)
   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
   at 
 scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
   at 
 scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
   at 
 scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
   at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
   at scala.collection.AbstractIterator.to(Iterator.scala:1157)
   at 
 scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
   at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
   at 
 scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
   at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
   at 
 org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:212)
   at 
 org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:168)
   at 
 org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:183)
   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
 {noformat}
 Removing the alias (as a) from the query causes no error.
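 For example, this variant runs without error:
 {code}
 sqlContext.sql("select len(foo), count(1) from t1 group by len(foo)").collect()
 {code}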



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3298) [SQL] registerAsTable / registerTempTable overwrites old tables

2014-09-19 Thread Ravindra Pesala (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14140124#comment-14140124
 ] 

Ravindra Pesala commented on SPARK-3298:


I think we should add an API like *SqlContext.isTableExists(tableName)* to 
check whether a table is already registered; a user could then check for the 
table before registering it.
The current API *SqlContext.table(tableName)* throws an exception if the table 
is not present, so it cannot be used directly for this purpose. Please comment.
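
For example, a rough sketch of both options (the proposed isTableExists does not 
exist yet; tableExists below is only an illustrative stopgap built on the current 
behaviour):
{code}
import scala.util.Try
import org.apache.spark.sql.SQLContext

// Stopgap with today's API: treat "table(name) throws" as "table does not exist".
def tableExists(sqlContext: SQLContext, name: String): Boolean =
  Try(sqlContext.table(name)).isSuccess

// With the proposed API the guard would read directly, e.g.
//   if (!sqlContext.isTableExists("t1")) newRdd.registerTempTable("t1")
{code}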


 [SQL] registerAsTable / registerTempTable overwrites old tables
 ---

 Key: SPARK-3298
 URL: https://issues.apache.org/jira/browse/SPARK-3298
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.0.2
Reporter: Evan Chan
Priority: Minor
  Labels: newbie

 At least in Spark 1.0.2, calling registerAsTable("a") when "a" had been 
 registered before does not cause an error.  However, there is no way to 
 access the old table, even though it may be cached and taking up space.
 How about at least throwing an error?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3536) SELECT on empty parquet table throws exception

2014-09-19 Thread Ravindra Pesala (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14140145#comment-14140145
 ] 

Ravindra Pesala commented on SPARK-3536:


Parquet returns null metadata when an empty Parquet file is queried while 
calculating splits. Adding a null check and returning empty splits solves the 
issue.
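
A minimal sketch of that guard, with illustrative names only (the real change 
belongs in FilteringParquetRowInputFormat.getSplits in ParquetTableOperations.scala):
{code}
import java.util.{ArrayList => JArrayList, List => JList}

// If Parquet hands back null metadata for an empty table, report zero splits
// instead of dereferencing it. `blocks` stands in for the per-file metadata
// used when building splits; the name is illustrative.
def splitsFor[T](blocks: JList[T]): JList[T] =
  if (blocks == null) new JArrayList[T]() else blocks
{code}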

 SELECT on empty parquet table throws exception
 --

 Key: SPARK-3536
 URL: https://issues.apache.org/jira/browse/SPARK-3536
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Michael Armbrust
  Labels: starter

 Reported by [~matei].  Reproduce as follows:
 {code}
 scala> case class Data(i: Int)
 defined class Data
 scala> createParquetFile[Data]("testParquet")
 scala> parquetFile("testParquet").count()
 14/09/15 14:34:17 WARN scheduler.DAGScheduler: Creating new stage failed due 
 to exception - job: 0
 java.lang.NullPointerException
   at 
 org.apache.spark.sql.parquet.FilteringParquetRowInputFormat.getSplits(ParquetTableOperations.scala:438)
   at 
 parquet.hadoop.ParquetInputFormat.getSplits(ParquetInputFormat.java:344)
   at 
 org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:95)
   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3536) SELECT on empty parquet table throws exception

2014-09-19 Thread Ravindra Pesala (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14140269#comment-14140269
 ] 

Ravindra Pesala commented on SPARK-3536:


[~isaias.barroso] I submitted the PR 4 hours ago, but I am not sure why it has 
not yet been linked to the JIRA.

 SELECT on empty parquet table throws exception
 --

 Key: SPARK-3536
 URL: https://issues.apache.org/jira/browse/SPARK-3536
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Michael Armbrust
  Labels: starter

 Reported by [~matei].  Reproduce as follows:
 {code}
 scala> case class Data(i: Int)
 defined class Data
 scala> createParquetFile[Data]("testParquet")
 scala> parquetFile("testParquet").count()
 14/09/15 14:34:17 WARN scheduler.DAGScheduler: Creating new stage failed due 
 to exception - job: 0
 java.lang.NullPointerException
   at 
 org.apache.spark.sql.parquet.FilteringParquetRowInputFormat.getSplits(ParquetTableOperations.scala:438)
   at 
 parquet.hadoop.ParquetInputFormat.getSplits(ParquetInputFormat.java:344)
   at 
 org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:95)
   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2594) Add CACHE TABLE name AS SELECT ...

2014-09-16 Thread Ravindra Pesala (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135303#comment-14135303
 ] 

Ravindra Pesala commented on SPARK-2594:


[~marmbrus] There is some confusion over eager versus lazy caching for CACHE 
TABLE name AS SELECT. The PR currently implements eager caching, but 
[~liancheng] commented on the PR that SQLContext.cacheTable and CACHE TABLE 
name are both lazy. Making all three eager also seems acceptable, but that 
would break backward compatibility (or at least the existing performance 
assumptions of the caching functions/statements).

Please comment on it.
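
For reference, a hedged sketch of the three caching paths under discussion (table 
names are illustrative; whether the new statement should be eager or lazy is 
exactly the open question):
{code}
// Existing API: marks the table for caching, materialized on first use (lazy).
sqlContext.cacheTable("logs")

// Existing SQL statement: currently the same lazy semantics as cacheTable.
sqlContext.sql("CACHE TABLE logs")

// Proposed in this issue: register the query result as `errors` and cache it.
sqlContext.sql("CACHE TABLE errors AS SELECT * FROM logs WHERE level = 'ERROR'")
{code}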

 Add CACHE TABLE name AS SELECT ...
 

 Key: SPARK-2594
 URL: https://issues.apache.org/jira/browse/SPARK-2594
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Reporter: Michael Armbrust
Priority: Critical





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2594) Add CACHE TABLE name AS SELECT ...

2014-09-01 Thread Ravindra Pesala (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14117464#comment-14117464
 ] 

Ravindra Pesala commented on SPARK-2594:


Thank you Michael. These are the tasks I am planning in order to support this 
feature:
1. Change SqlParser to support the new CACHE TABLE name AS SELECT syntax (a toy 
sketch of such a grammar rule follows below). We can also change HiveQl to 
support the syntax for Hive.
2. Add a new strategy 'AddCacheTable' to 'SparkStrategies'.
3. In the new 'AddCacheTable' strategy, register the tableName with the plan 
and cache it under that tableName.

Please review it and comment.
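
For example, a self-contained toy sketch of the kind of grammar rule step 1 
describes, written against Scala's parser combinators rather than Spark's real 
SqlParser (all names here are illustrative):
{code}
import scala.util.parsing.combinator.syntactical.StandardTokenParsers

object CacheTableSyntax extends StandardTokenParsers {
  lexical.reserved ++= Seq("CACHE", "TABLE", "AS")
  lexical.delimiters ++= Seq("*", ",")

  // CACHE TABLE <name> AS <rest of the statement, captured as raw tokens>
  def cacheTableAsSelect: Parser[(String, List[String])] =
    "CACHE" ~> "TABLE" ~> ident ~ ("AS" ~> rep1(ident | numericLit | stringLit | "*" | ",")) ^^ {
      case name ~ query => (name, query)
    }

  def parse(sql: String) = phrase(cacheTableAsSelect)(new lexical.Scanner(sql))
}

// CacheTableSyntax.parse("CACHE TABLE errors AS SELECT * FROM logs")
{code}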

 Add CACHE TABLE name AS SELECT ...
 

 Key: SPARK-2594
 URL: https://issues.apache.org/jira/browse/SPARK-2594
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Reporter: Michael Armbrust
Priority: Critical





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2594) Add CACHE TABLE name AS SELECT ...

2014-08-27 Thread Ravindra Pesala (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14113342#comment-14113342
 ] 

Ravindra Pesala commented on SPARK-2594:


Please assign this to me.

 Add CACHE TABLE name AS SELECT ...
 

 Key: SPARK-2594
 URL: https://issues.apache.org/jira/browse/SPARK-2594
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Reporter: Michael Armbrust
Priority: Critical





--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2693) Support for UDAF Hive Aggregates like PERCENTILE

2014-08-26 Thread Ravindra Pesala (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110707#comment-14110707
 ] 

Ravindra Pesala commented on SPARK-2693:


UDAF is deprecated in Hive, though a few functions may still be implemented 
using this interface. We can support it in Spark for backward compatibility.
As you mentioned, supporting UDAF in Spark requires writing a wrapper.
*Please assign it to me.*

 Support for UDAF Hive Aggregates like PERCENTILE
 

 Key: SPARK-2693
 URL: https://issues.apache.org/jira/browse/SPARK-2693
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Michael Armbrust

 {code}
 SELECT MIN(field1), MAX(field2), AVG(field3), PERCENTILE(field4), 
 year,month,day FROM  raw_data_table  GROUP BY year, month, day
 MIN, MAX and AVG functions work fine for me, but with PERCENTILE, I get an 
 error as shown below.
 Exception in thread "main" java.lang.RuntimeException: No handler for udf 
 class org.apache.hadoop.hive.ql.udf.UDAFPercentile
 at scala.sys.package$.error(package.scala:27)
 at 
 org.apache.spark.sql.hive.HiveFunctionRegistry$.lookupFunction(hiveUdfs.scala:69)
 at 
 org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$4$$anonfun$applyOrElse$3.applyOrElse(Analyzer.scala:115)
 at 
 org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$4$$anonfun$applyOrElse$3.applyOrElse(Analyzer.scala:113)
 at 
 org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
 {code}
 This aggregate extends UDAF, which we don't yet have a wrapper for.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-2693) Support for UDAF Hive Aggregates like PERCENTILE

2014-08-26 Thread Ravindra Pesala (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110707#comment-14110707
 ] 

Ravindra Pesala edited comment on SPARK-2693 at 8/26/14 2:05 PM:
-

UDAF is deprecated in Hive, though a few functions may still be implemented 
using this interface. We can support it in Spark for backward compatibility.
As you mentioned, supporting UDAF in Spark requires writing a wrapper.

Please assign it to me.


was (Author: ravipesala):
UDAF is deprecated in Hive, though a few functions may still be implemented 
using this interface. We can support it in Spark for backward compatibility.
As you mentioned, supporting UDAF in Spark requires writing a wrapper.
*Please assign it to me.*

 Support for UDAF Hive Aggregates like PERCENTILE
 

 Key: SPARK-2693
 URL: https://issues.apache.org/jira/browse/SPARK-2693
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Michael Armbrust

 {code}
 SELECT MIN(field1), MAX(field2), AVG(field3), PERCENTILE(field4), 
 year,month,day FROM  raw_data_table  GROUP BY year, month, day
 MIN, MAX and AVG functions work fine for me, but with PERCENTILE, I get an 
 error as shown below.
 Exception in thread "main" java.lang.RuntimeException: No handler for udf 
 class org.apache.hadoop.hive.ql.udf.UDAFPercentile
 at scala.sys.package$.error(package.scala:27)
 at 
 org.apache.spark.sql.hive.HiveFunctionRegistry$.lookupFunction(hiveUdfs.scala:69)
 at 
 org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$4$$anonfun$applyOrElse$3.applyOrElse(Analyzer.scala:115)
 at 
 org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$4$$anonfun$applyOrElse$3.applyOrElse(Analyzer.scala:113)
 at 
 org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
 {code}
 This aggregate extends UDAF, which we don't yet have a wrapper for.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3100) Spark RDD partitions are not running in the workers as per locality information given by each partition.

2014-08-18 Thread Ravindra Pesala (JIRA)
Ravindra Pesala created SPARK-3100:
--

 Summary: Spark RDD partitions are not running in the workers as 
per locality information given by each partition.
 Key: SPARK-3100
 URL: https://issues.apache.org/jira/browse/SPARK-3100
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.0
 Environment: Running in Spark Standalone Cluster
Reporter: Ravindra Pesala


I created a simple custom RDD (SampleRDD.scala) with 4 splits for 4 workers.
When I run this RDD in a Spark standalone cluster with 4 workers (the master 
machine also runs a worker), all partitions run on one node only, even though I 
have given locality preferences in my SampleRDD program.

*Sample Code*

import org.apache.spark.{Logging, Partition, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// A partition that simply remembers the hosts it should prefer to run on.
class SamplePartition(rddId: Int, val idx: Int, val tableSplit: Seq[String])
  extends Partition {
  override def hashCode(): Int = 41 * (41 + rddId) + idx
  override val index: Int = idx
}


class SampleRDD[K, V](
    sc: SparkContext, keyClass: KeyVal[K, V])
  extends RDD[(K, V)](sc, Nil)
  with Logging {

  // One partition per worker host.
  override def getPartitions: Array[Partition] = {
    val hosts = Array("master", "slave1", "slave2", "slave3")
    val result = new Array[Partition](4)
    for (i <- 0 until result.length) {
      result(i) = new SamplePartition(id, i, Array(hosts(i)))
    }
    result
  }

  override def compute(theSplit: Partition, context: TaskContext) = {
    val iter = new Iterator[(K, V)] {
      val split = theSplit.asInstanceOf[SamplePartition]
      logInfo("Executed task for the split " + split.tableSplit)

      // Register an on-task-completion callback to close the input stream.
      context.addOnCompleteCallback(() => close())
      var havePair = false
      var finished = false
      override def hasNext: Boolean = {
        if (!finished && !havePair) {
          // This sample has no real data source; a real RDD would advance its reader here.
          finished = true
          havePair = !finished
        }
        !finished
      }
      override def next(): (K, V) = {
        if (!hasNext) {
          throw new java.util.NoSuchElementException("End of stream")
        }
        havePair = false
        val key = new Key()
        val value = new Value()
        keyClass.getKey(key, value)
      }
      private def close() {
        try {
          // reader.close()
        } catch {
          case e: Exception => logWarning("Exception in RecordReader.close()", e)
        }
      }
    }
    iter
  }

  // Preferred locations come straight from the partition's host list.
  override def getPreferredLocations(split: Partition): Seq[String] = {
    val theSplit = split.asInstanceOf[SamplePartition]
    val s = theSplit.tableSplit.filter(_ != "localhost")
    logInfo("Host Name : " + s(0))
    s
  }
}

trait KeyVal[K, V] extends Serializable {
  def getKey(key: Key, value: Value): (K, V)
}
class KeyValImpl extends KeyVal[Key, Value] {
  override def getKey(key: Key, value: Value) = (key, value)
}
case class Key()
case class Value()

object SampleRDD {
  def main(args: Array[String]): Unit = {
    val d = SparkContext.jarOfClass(this.getClass)
    val ar = new Array[String](d.size)
    var i = 0
    d.foreach { p =>
      ar(i) = p
      i = i + 1
    }
    val sc = new SparkContext("spark://master:7077", "SampleSpark",
      "/opt/spark-1.0.0-rc3/", ar)
    val rdd = new SampleRDD(sc, new KeyValImpl())
    rdd.collect
  }
}

Following is the log output:

INFO  18-08 16:38:33,382 - Executor updated: app-20140818163833-0005/0 is now 
RUNNING
INFO  18-08 16:38:33,382 - Executor updated: app-20140818163833-0005/2 is now 
RUNNING
INFO  18-08 16:38:33,383 - Executor updated: app-20140818163833-0005/1 is now 
RUNNING
INFO  18-08 16:38:33,385 - Executor updated: app-20140818163833-0005/3 is now 
RUNNING
INFO  18-08 16:38:34,976 - Registered executor: Actor 
akka.tcp://sparkExecutor@master:47563/user/Executor#-398354094 with ID 0
INFO  18-08 16:38:34,984 - Starting task 0.0:0 as TID 0 on executor 0: *master 
(PROCESS_LOCAL)*
INFO  18-08 16:38:34,989 - Serialized task 0.0:0 as 1261 bytes in 3 ms
INFO  18-08 16:38:34,992 - Starting task 0.0:1 as TID 1 on executor 0: *master 
(PROCESS_LOCAL)*
INFO  18-08 16:38:34,993 - Serialized task 0.0:1 as 1261 bytes in 0 ms
INFO  18-08 16:38:34,993 - Starting task 0.0:2 as TID 2 on executor 0: *master 
(PROCESS_LOCAL)*
INFO  18-08 16:38:34,993 - Serialized task 0.0:2 as 1261 bytes in 0 ms
INFO  18-08 16:38:34,994 - Starting task 0.0:3 as TID 3 on executor 0: *master 
(PROCESS_LOCAL)*
INFO  18-08 16:38:34,994 - Serialized task 0.0:3 as 1261 bytes in 0 ms
INFO  18-08 16:38:35,174 - Registering block manager master:42125 with 294.4 MB 
RAM
INFO  18-08 16:38:35,296 - Registered executor: Actor 
akka.tcp://sparkExecutor@slave1:31726/user/Executor#492173410 with ID 2
INFO  18-08 16:38:35,302 - Registered executor: Actor 
akka.tcp://sparkExecutor@slave2:25769/user/Executor#1762839887 with ID 1
INFO  18-08 16:38:35,317 - 

[jira] [Commented] (SPARK-3100) Spark RDD partitions are not running in the workers as per locality information given by each partition.

2014-08-18 Thread Ravindra Pesala (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100664#comment-14100664
 ] 

Ravindra Pesala commented on SPARK-3100:


It seems there is a synchronization issue between task allocation and executor 
registration. From the above log I observed that all tasks are allocated while 
only a single executor has registered; the remaining executors only register 
after those tasks have already started.

I added a 5-second sleep in my driver and then it started working properly.

   val sc = new SparkContext("spark://master:7077", "SampleSpark",
     "/opt/spark-1.0.0-rc3/", ar)
   *Thread.sleep(5000)*
   val rdd = new SampleRDD(sc, new KeyValImpl());
   rdd.collect;

INFO  18-08 19:44:36,301 - Registered executor: 
Actor[akka.tcp://sparkExecutor@master:32457/user/Executor#-966982652] with ID 0
INFO  18-08 19:44:36,505 - Registering block manager master:59964 with 294.4 MB 
RAM
INFO  18-08 19:44:36,578 - Registered executor: 
Actor[akka.tcp://sparkExecutor@slave1:11653/user/Executor#909834981] with ID 1
INFO  18-08 19:44:36,591 - Registered executor: 
Actor[akka.tcp://sparkExecutor@slave2:59220/user/Executor#301495226] with ID 3
INFO  18-08 19:44:36,643 - Registered executor: 
Actor[akka.tcp://sparkExecutor@slave3:11232/user/Executor#-1118376183] with ID 2
INFO  18-08 19:44:36,804 - Registering block manager slave1:14596 with 294.4 MB 
RAM
INFO  18-08 19:44:36,809 - Registering block manager slave2:10418 with 294.4 MB 
RAM
INFO  18-08 19:44:36,871 - Registering block manager slave3:45973 with 294.4 MB 
RAM
INFO  18-08 19:44:39,507 - Starting job: collect at SampleRDD.scala:142
INFO  18-08 19:44:39,520 - Got job 0 (collect at SampleRDD.scala:142) with 4 
output partitions (allowLocal=false)
INFO  18-08 19:44:39,521 - Final stage: Stage 0(collect at SampleRDD.scala:142)
INFO  18-08 19:44:39,521 - Parents of final stage: List()
INFO  18-08 19:44:39,526 - Missing parents: List()
INFO  18-08 19:44:39,532 - Submitting Stage 0 (SampleRDD[0] at RDD at 
SampleRDD.scala:28), which has no missing parents
INFO  18-08 19:44:39,537 - Host Name : master
INFO  18-08 19:44:39,539 - Host Name : slave1
INFO  18-08 19:44:39,539 - Host Name : slave2
INFO  18-08 19:44:39,540 - Host Name : slave3
INFO  18-08 19:44:39,563 - Submitting 4 missing tasks from Stage 0 
(SampleRDD[0] at RDD at SampleRDD.scala:28)
INFO  18-08 19:44:39,564 - Adding task set 0.0 with 4 tasks
INFO  18-08 19:44:39,579 - Starting task 0.0:2 as TID 0 on executor 3: *slave2 
(NODE_LOCAL)*
INFO  18-08 19:44:39,583 - Serialized task 0.0:2 as 1261 bytes in 2 ms
INFO  18-08 19:44:39,585 - Starting task 0.0:0 as TID 1 on executor 0: *master 
(NODE_LOCAL)*
INFO  18-08 19:44:39,585 - Serialized task 0.0:0 as 1261 bytes in 0 ms
INFO  18-08 19:44:39,586 - Starting task 0.0:1 as TID 2 on executor 1: *slave1 
(NODE_LOCAL)*
INFO  18-08 19:44:39,586 - Serialized task 0.0:1 as 1261 bytes in 0 ms
INFO  18-08 19:44:39,587 - Starting task 0.0:3 as TID 3 on executor 2: *slave3 
(NODE_LOCAL)*
INFO  18-08 19:44:39,587 - Serialized task 0.0:3 as 1261 bytes in 0 ms


 Spark RDD partitions are not running in the workers as per locality 
 information given by each partition.
 

 Key: SPARK-3100
 URL: https://issues.apache.org/jira/browse/SPARK-3100
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.0
 Environment: Running in Spark Standalone Cluster
Reporter: Ravindra Pesala

 I created a simple custom RDD (SampleRDD.scala) with 4 splits for 4 workers.
 When I run this RDD in a Spark standalone cluster with 4 workers (the master 
 machine also runs a worker), all partitions run on one node only, even though 
 I have given locality preferences in my SampleRDD program. 
 *Sample Code*
 class SamplePartition(rddId: Int, val idx: Int, val tableSplit: Seq[String])
   extends Partition {
   override def hashCode(): Int = 41 * (41 + rddId) + idx
   override val index: Int = idx
 }
 class SampleRDD[K, V](
     sc: SparkContext, keyClass: KeyVal[K, V])
   extends RDD[(K, V)](sc, Nil)
   with Logging {
   override def getPartitions: Array[Partition] = {
     val hosts = Array("master", "slave1", "slave2", "slave3")
     val result = new Array[Partition](4)
     for (i <- 0 until result.length) {
       result(i) = new SamplePartition(id, i, Array(hosts(i)))
     }
     result
   }

   override def compute(theSplit: Partition, context: TaskContext) = {
     val iter = new Iterator[(K, V)] {
       val split = theSplit.asInstanceOf[SamplePartition]
       logInfo("Executed task for the split " + split.tableSplit)

       // Register an on-task-completion callback to close the input stream.