ISDATE Function

2015-11-16 Thread Ravisankar Mani
Hi Everyone,


 In MS SQL Server, the "ISDATE()" function is used to check whether a column
value is a valid date. Is there any way to check whether a column value is a
date in Spark SQL?
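One possible workaround (a sketch only, assuming Spark SQL's CAST returns NULL for strings it cannot parse as dates; the table and column names are hypothetical):

```sql
-- Hypothetical table/column names. CAST(... AS DATE) yields NULL when the
-- string cannot be parsed, so IS NOT NULL approximates MS SQL's ISDATE().
SELECT some_column,
       CASE WHEN CAST(some_column AS DATE) IS NOT NULL THEN 1 ELSE 0 END AS is_date
FROM some_db.some_table;
```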


Regards,
Ravi


pagination spark sql

2015-10-12 Thread Ravisankar Mani
Hi everyone,

Can you please share an optimized query for pagination in Spark SQL?


In MS SQL Server, the "OFFSET" clause is supported for selecting a specific
range of rows.

Please find the following query:

SELECT BusinessEntityID, [FirstName], [LastName], [JobTitle]
FROM HumanResources.vEmployee
ORDER BY BusinessEntityID
OFFSET 10 ROWS
FETCH NEXT 10 ROWS ONLY

Does Spark SQL support the OFFSET method? Kindly share the useful details.
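One workaround (a sketch only; the table and column names are taken from the MS SQL example and may not match a real Spark table, and this assumes window-function support, available in Spark SQL 1.4+ with a HiveContext) is to number rows with a window function and filter on the row number:

```sql
-- Page 2 with a page size of 10: rows 11-20.
-- A deterministic ORDER BY is required for stable pagination.
SELECT BusinessEntityID, FirstName, LastName, JobTitle
FROM (
  SELECT BusinessEntityID, FirstName, LastName, JobTitle,
         ROW_NUMBER() OVER (ORDER BY BusinessEntityID) AS rn
  FROM vEmployee
) t
WHERE rn > 10 AND rn <= 20;
```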

Regards,
Ravi


Having Clause with variation and stddev

2015-08-21 Thread Ravisankar Mani
Hi,


An exception is thrown when using a HAVING clause with VARIANCE or STDDEV. It
works perfectly with other aggregate functions (like SUM, COUNT, MIN, MAX).

SELECT SUM(1) AS `sum_number_of_records_ok` FROM
`some_db`.`some_table` `some_table`
GROUP BY 1 HAVING (STDDEV(1) <> 0)

SELECT SUM(1) AS `sum_number_of_records_ok` FROM
`some_db`.`some_table` `some_table`
GROUP BY 1 HAVING (VARIANCE(1) <> 0)

Could you please share any other way of writing this kind of query?
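A possible workaround (a sketch only, assuming the stripped comparison was `<> 0`; not verified against Spark 1.x) is to compute the aggregate in a subquery and filter with an outer WHERE clause instead of HAVING:

```sql
-- Move STDDEV out of HAVING: aggregate first, then filter the result.
SELECT sum_number_of_records_ok
FROM (
  SELECT SUM(1)    AS sum_number_of_records_ok,
         STDDEV(1) AS stddev_ok
  FROM some_db.some_table
  GROUP BY 1
) t
WHERE stddev_ok <> 0;
```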

Regards,

Ravi


Exception in spark

2015-08-14 Thread Ravisankar Mani
Hi all,

  I got an exception,
“org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call
to dataType on unresolved object”, when using certain WHERE-clause queries.
I am using Spark 1.4.0; the same query works perfectly in Hive.

Please refer to the following query; I have applied a date/time cast in the
WHERE clause.

Could you please help resolve this exception?

SELECT
from_unixtime(unix_timestamp(`leavedetails`.`created`,'yyyyMMdd'),'yyyy/MM/dd')
AS `leavedetails_created` FROM `default`.`leavedetails` AS leavedetails
WHERE ((
(from_unixtime(unix_timestamp(`leavedetails`.`created`,'yyyyMMdd'),'yyyy/MM/dd')
IN ('2015/01/01','2015/01/02')) AND (NOT(`leavedetails`.`created` IS
NULL)))) GROUP BY
from_unixtime(unix_timestamp(`leavedetails`.`created`,'yyyyMMdd'),'yyyy/MM/dd')



Regards.
Ravi


Re: Exception in spark

2015-08-11 Thread Ravisankar Mani
Hi Josh

Please ignore the stack trace in the last mail. Kindly refer to the exception details below.


{org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call
to dataType on unresolved object, tree: 'Sheet1.Teams}


Regards,
Ravi


Re: Exception in spark

2015-08-11 Thread Ravisankar Mani
Hi Rosan,

Thanks for your response. Kindly refer to the following query and stack trace.
I have checked the same query in Hive, and it works perfectly. If I remove
the IN condition in the WHERE clause, it works in Spark.


SELECT If(ISNOTNULL(SUM(`Sheet1`.`Runs`)),SUM(`Sheet1`.`Runs`),0) AS
`Sheet1_Runs` ,`Sheet1`.`Teams` AS `Sheet1_Teams` FROM
default.Dashboard_Sheet1_20150812010131201 AS `Sheet1` INNER JOIN (SELECT
`Sheet1`.`Teams` AS
`Teams_0`,If(ISNOTNULL(COUNT(`Sheet1`.`Teams`)),COUNT(`Sheet1`.`Teams`),0)
AS `Measure_0` FROM default.Dashboard_Sheet1_20150812010131201 AS `Sheet1`
WHERE ((`Sheet1`.`Teams` IN ('Chennai Super Kings')) AND
(`Sheet1`.`Teams` <> '' )) AND (NOT(`Sheet1`.`Teams` IS NULL)) GROUP BY
`Sheet1`.`Teams`  ORDER BY `Measure_0` DESC LIMIT 5 ) `T_0` ON (
`Sheet1`.`Teams` =`T_0`.`Teams_0` ) WHERE ((`Sheet1`.`Teams` IN ('Chennai
Super Kings')) AND (`Sheet1`.`Teams` <> '' )) AND (NOT(`Sheet1`.`Teams` IS
NULL)) GROUP BY `Sheet1`.`Teams`



   at Syncfusion.ThriftHive.Base.HqlCommand.ExecuteReader()
   at
Syncfusion.Dashboard.Base.Data.HiveQueryBuilder.ExecuteReaderQuery(String
query, String connectionString) in
f:\Back_To_Svn\source\base\dashboard.base\src\Data\HiveServer.cs:line 409
   at Syncfusion.Dashboard.Base.Data.ServerDataProvider.GetTable(String
tableName, List`1 schemaInfoCollection, List`1
originalSchemaInfoCollection, List`1 initialFilterList, List`1
viewerFilterList, Boolean isSelectQueryForServerModeFilterPopup, Boolean
isNestedFilter) in
f:\Back_To_Svn\source\base\dashboard.base\src\Data\Data.cs:line 536
   at
Syncfusion.Dashboard.Base.Engine.RelationalEngine.GetDataFromMainSource(List`1
schemaInfos, List`1 viewerFilterList) in
f:\Back_To_Svn\source\base\dashboard.base\src\Engine\RelationDataEngine.cs:line
902
   at
DashboardService.DashboardService.ApplyFilterServerMode(RelationalEngine
engine, String ReportName, Boolean Drilled, Boolean useDefaultProperties,
Boolean isParamsColumn, SchemaInfo paramsSchema) in
f:\Back_To_Svn\source\js\dashboardsamples\wcf\Common\DataProcessor.cs:line
1030
   at DashboardService.DashboardService.ApplyFilter(RelationalEngine
engine, String ReportName, Boolean Drilled, Boolean useDefaultProperties,
Boolean isParamsColumn, SchemaInfo paramsSchema) in
f:\Back_To_Svn\source\js\dashboardsamples\wcf\Common\DataProcessor.cs:line
1047
   at DashboardService.DashboardService.GetProcessedControlData(List`1
elements, RelationalEngine dataEngine, DashboardItem control,
RelationalReport currentReport) in
f:\Back_To_Svn\source\js\dashboardsamples\wcf\Common\DataProcessor.cs:line
205
   at DashboardService.DashboardService.GetGridControlData(DashboardItem
controlObj, List`1 columnList, List`1 gridSchemaInfos, Dictionary`2
barData, Dictionary`2 formattedColumns, List`1 colourSaturationColumns)
in
f:\Back_To_Svn\source\js\dashboardsamples\wcf\Controls\DashboardGrid.cs:line
115
   at DashboardService.DashboardService.GetGridData(DashboardItem
controlObj, Boolean isLoadGrid) in
f:\Back_To_Svn\source\js\dashboardsamples\wcf\Controls\DashboardGrid.cs:line
47
   at DashboardService.DashboardService.ControlConfigHelper(DashboardItem
control, Boolean IsLoad, DataFetcherGrid fn) in
f:\Back_To_Svn\source\js\dashboardsamples\wcf\DashboardService.svc.cs:line
1269


Regards,
Ravi


On Tue, Aug 11, 2015 at 11:53 PM, Josh Rosen rosenvi...@gmail.com wrote:

 Can you share a query or stack trace? More information would make this
 question easier to answer.

 On Tue, Aug 11, 2015 at 8:50 PM, Ravisankar Mani rrav...@gmail.com
 wrote:

 Hi all,

   We got an exception like
 “org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call
 to dataType on unresolved object” when using some where condition queries.
 I am using 1.4.0 version spark. Is this exception resolved in latest spark?


 Regards,
 Ravi





Exception in spark

2015-08-11 Thread Ravisankar Mani
Hi all,

  We got an exception,
“org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call
to dataType on unresolved object”, when using certain WHERE-clause queries.
I am using Spark 1.4.0. Is this exception resolved in the latest Spark?


Regards,
Ravi


Hive Query(Top N)

2015-07-20 Thread Ravisankar Mani
Hi everyone

I have tried to achieve hierarchical (index-based) top-N selection using a
Spark query. It takes more time when I execute the following query:

Select SUM(`adventurepersoncontacts`.`contactid`)  AS
`adventurepersoncontacts_contactid` ,
`adventurepersoncontacts`.`fullname` AS `adventurepersoncontacts_fullname`
FROM  `default`.`adventurepersoncontacts` AS adventurepersoncontacts
 JOIN (
 SELECT
  `F_0`.`fullname_0_0` AS `fullname_0_0`,
  ROW_NUMBER() OVER( ORDER BY `F_0`.`Measure_0` DESC)  AS `R_N_0`
  FROM(
   SELECT
`adventurepersoncontacts`.`fullname` AS `fullname_0_0`,
SUM(adventurepersoncontacts.contactid) AS `Measure_0`
FROM `default`.`adventurepersoncontacts` AS adventurepersoncontacts
GROUP BY `adventurepersoncontacts`.`fullname`
   )`F_0`
) `T_0` on ((`adventurepersoncontacts`.`fullname` = `T_0`.`fullname_0_0` )
AND (`T_0`.`R_N_0` <= 5)) GROUP BY `adventurepersoncontacts`.`fullname`

In the query above, I assign a row index within each group according to the
aggregation. Row-number calculation over an aggregate, like
ROW_NUMBER() OVER (ORDER BY SUM(column) DESC), is not directly supported in
Spark, so I used a subquery like:

 SELECT
  `F_0`.`fullname_0_0` AS `fullname_0_0`,
  ROW_NUMBER() OVER( ORDER BY `F_0`.`Measure_0` DESC)  AS `R_N_0`
  FROM(
   SELECT
`adventurepersoncontacts`.`fullname` AS `fullname_0_0`,
SUM(adventurepersoncontacts.contactid) AS `Measure_0`
FROM `default`.`adventurepersoncontacts` AS adventurepersoncontacts
GROUP BY `adventurepersoncontacts`.`fullname`
   )`F_0`
) `T_0`

Execution slows down in this form. If I remove the main GROUP BY (i.e. the
outer aggregation), it executes very fast.
Refer to the following query; I include it only to show that execution is
slow when the main GROUP BY is present.


 Select  `adventurepersoncontacts`.`contactid`  AS
`adventurepersoncontacts_contactid` ,
`adventurepersoncontacts`.`fullname` AS `adventurepersoncontacts_fullname`
FROM  `default`.`adventurepersoncontacts` AS adventurepersoncontacts
 JOIN (
 SELECT
  `F_0`.`fullname_0_0` AS `fullname_0_0`,
  ROW_NUMBER() OVER( ORDER BY `F_0`.`Measure_0` DESC)  AS `R_N_0`
  FROM(
   SELECT
`adventurepersoncontacts`.`fullname` AS `fullname_0_0`,
SUM(adventurepersoncontacts.contactid) AS `Measure_0`
FROM `default`.`adventurepersoncontacts` AS adventurepersoncontacts
GROUP BY `adventurepersoncontacts`.`fullname`
   )`F_0`
) `T_0` on ((`adventurepersoncontacts`.`fullname` = `T_0`.`fullname_0_0` )
AND (`T_0`.`R_N_0` <= 5))


I have found another way to build this hierarchical query.

I created a temp table from the inner subquery (refer to the previous
query):

 CREATE external table IF NOT EXISTS  temp_table  AS SELECT
SUM(adventurepersoncontacts.contactid) as contactid,
adventurepersoncontacts.fullname as fullname FROM
`default`.`adventurepersoncontacts` AS adventurepersoncontacts GROUP BY
`adventurepersoncontacts`.`fullname`;



SELECT  SUM(`adventurepersoncontacts`.`contactid`) AS
`adventurepersoncontacts_contactid` ,`adventurepersoncontacts`.`fullname`
AS `adventurepersoncontacts_fullname` FROM
`default`.`adventurepersoncontacts` AS adventurepersoncontacts JOIN (SELECT
`fullname` AS `fullname_0_0`,ROW_NUMBER() over( order by `contactid` desc)
AS `R_N_0` FROM default.temp_table) `T_0` on
((`adventurepersoncontacts`.`fullname` = `T_0`.`fullname_0_0` ) AND
(`T_0`.`R_N_0` <= 2)) GROUP BY `adventurepersoncontacts`.`fullname`;

This is an alternative, but execution is still delayed by the
CREATE TABLE ... AS SELECT step.


Please suggest any other optimized way to achieve this requirement.
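One alternative worth trying (a sketch only, not benchmarked): for a plain top-N, the qualifying row numbers are the contiguous range 1..N, so the ROW_NUMBER() window can be replaced by ORDER BY ... LIMIT in the inner query, avoiding the window function entirely:

```sql
-- Top 5 fullnames by SUM(contactid), then aggregate only the matching rows.
SELECT SUM(a.contactid) AS adventurepersoncontacts_contactid,
       a.fullname       AS adventurepersoncontacts_fullname
FROM `default`.`adventurepersoncontacts` a
JOIN (
  SELECT fullname, SUM(contactid) AS measure_0
  FROM `default`.`adventurepersoncontacts`
  GROUP BY fullname
  ORDER BY measure_0 DESC
  LIMIT 5
) t ON a.fullname = t.fullname
GROUP BY a.fullname;
```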


Regards,
Ravi


Udf's in spark

2015-07-14 Thread Ravisankar Mani
Hi Everyone,

As mentioned in the Spark SQL programming guide, Spark SQL supports Hive UDFs.
I have built the UDFs in the Hive metastore. They work perfectly over a Hive
connection, but they do not work in Spark (java.lang.RuntimeException:
Couldn't find function DATE_FORMAT).

Could you please explain how to use these Hive UDFs in Spark?
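For what it's worth, one approach (a sketch only; the JAR path and UDF class name below are hypothetical placeholders, and this assumes queries are issued through a HiveContext) is to register the function explicitly in the Spark session before using it:

```sql
-- Make the UDF jar visible to Spark's Hive session, then register the function.
-- '/path/to/my-udfs.jar' and 'com.example.udf.DateFormat' are placeholders.
ADD JAR /path/to/my-udfs.jar;
CREATE TEMPORARY FUNCTION date_format AS 'com.example.udf.DateFormat';

SELECT date_format(created, 'yyyy/MM/dd') FROM leavedetails;
```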


Regards,

Ravisankar M R


Spark performance

2015-07-10 Thread Ravisankar Mani
Hi everyone,

I have planned to move from MS SQL Server to Spark. I am using around 50,000
to 1 lakh (100,000) records, and Spark's performance is slow compared to
MS SQL Server.

Which is the better database (Spark or SQL Server) for storing and
retrieving around 50,000 to 100,000 records?

regards,
Ravi


Performance slow

2015-07-09 Thread Ravisankar Mani
Hi everyone,

Queries take more time when I use (GROUP BY + ORDER BY) or
(GROUP BY + CAST + ORDER BY) in the same query. Kindly refer to the
following query.

Could you please provide any solution for this performance issue?



SELECT If(ISNOTNULL(SUM(CAST(adventurepersoncontacts.contactid AS
decimal(38,6)))),SUM(CAST(adventurepersoncontacts.contactid AS
decimal(38,6))),0) AS adventurepersoncontacts_contactid,
adventurepersoncontacts.fullname AS adventurepersoncontacts_fullname FROM
default.adventurepersoncontacts AS adventurepersoncontacts ORDER BY
adventurepersoncontacts.fullname ASC

Regards,
Ravisankar M R


Day of year

2015-07-08 Thread Ravisankar Mani
Hi everyone,

I can't get 'day of year' when using a Spark SQL query. Can you suggest any
way to compute the day of the year?
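One workaround that may help (a sketch, assuming the Hive date functions datediff, year, and concat are available, as they are in HiveQL; `date_col` and `some_table` are placeholder names):

```sql
-- Day of year = days since Jan 1 of the same year, plus one.
-- For '2015-07-08' this gives 189.
SELECT datediff(date_col, concat(year(date_col), '-01-01')) + 1 AS day_of_year
FROM some_table;
```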

Regards,
Ravi


Spark query

2015-07-08 Thread Ravisankar Mani
Hi everyone,

I can't get 'day of year' when using a Spark query. Can you suggest any way
to compute the day of the year?

Regards,
Ravi


Spark SQL - Setting YARN Classpath for primordial class loader

2015-06-26 Thread Kumaran Mani
Hi,

The response in the thread below, which makes yarn-client mode work by adding
the JDBC driver JAR to spark.{driver,executor}.extraClassPath, works fine:

http://mail-archives.us.apache.org/mod_mbox/spark-user/201504.mbox/%3CCAAOnQ7vHeBwDU2_EYeMuQLyVZ77+N_jDGuinxOB=sff2lkc...@mail.gmail.com%3E

But in yarn-cluster mode I am still getting the error below:

ERROR yarn.ApplicationMaster: User class threw exception:
oracle.jdbc.OracleDriver



Is there anything more that needs to be done?


Help with Spark SQL Hash Distribution

2015-05-04 Thread Mani
I am trying to distribute a table using a particular column, which is the key 
that I'll be using to perform join operations on the table. Is it possible to 
do this with Spark SQL?
I checked the partitionBy() method for RDDs, but I am not sure how to specify 
which column is the key. Can anyone give an example?

Thanks
Mani
Graduate Student, Department of Computer Science
Virginia Tech


Question on Spark SQL performance of Range Queries on Large Datasets

2015-04-27 Thread Mani
Hi,

I am a graduate student at Virginia Tech (USA) pursuing my Masters in 
Computer Science. I've been researching parallel and distributed databases 
and their performance on range queries involving simple joins and GROUP BY 
on large datasets. As part of my research, I evaluated the query performance 
of Spark SQL on the data set that I have. It would be really great if you 
could confirm the numbers that I am getting from Spark SQL. The following 
is the type of query I am running:

Table 1 - 22,000,483 records
Table 2 - 10,173,311 records

Query : SELECT b.x, count(b.y) FROM Table1 a, Table2 b WHERE a.y=b.y AND 
a.z=‘' GROUP BY b.x ORDER BY b.x

Total Running Time:
4 Worker Nodes: 177.68s
8 Worker Nodes: 186.72s

I am using Apache Spark 1.3.0 with the default configuration. Is this query 
running time reasonable? Is the run time increased by the non-availability 
of indexes? Can you please clarify?

Thanks
Mani
Graduate Student, Department of Computer Science
Virginia Tech