Re: Facing issue with floor function in spark SQL query

2016-03-04 Thread Mich Talebzadeh
Spark SQL has both FLOOR and CEILING functions:

spark-sql> select FLOOR(11.95), CEILING(11.95);
11.0    12.0
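For reference, these match the semantics of the standard java.lang.Math methods (a minimal check using only the JDK, independent of Spark):

```java
public class FloorCeilDemo {
    public static void main(String[] args) {
        // Math.floor rounds toward negative infinity; Math.ceil toward positive infinity.
        System.out.println(Math.floor(11.95)); // 11.0
        System.out.println(Math.ceil(11.95));  // 12.0
    }
}
```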



Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 4 March 2016 at 12:35, Ajay Chander  wrote:

> Hi Ashok,
>
> Try using HiveContext instead of SQLContext. I suspect SQLContext does not
> have that functionality. Let me know if it works.
>
> Thanks,
> Ajay
>
>
> On Friday, March 4, 2016, ashokkumar rajendran <
> ashokkumar.rajend...@gmail.com> wrote:
>
>> Hi Ayan,
>>
>> Thanks for the response. I am using a SQL query (not the DataFrame API).
>> Could you please explain how I should import this SQL function for use in
>> the query? Simply importing the class in my driver code does not help here.
>>
>> Many functions that I need are already there in sql.functions, so I do
>> not want to rewrite them.
>>
>> Regards
>> Ashok
>>
>> On Fri, Mar 4, 2016 at 3:52 PM, ayan guha  wrote:
>>
>>> Most likely you are missing an import of org.apache.spark.sql.functions.
>>>
>>> In any case, you can write your own floor function and use it as a UDF.
>>>
>>> On Fri, Mar 4, 2016 at 7:34 PM, ashokkumar rajendran <
>>> ashokkumar.rajend...@gmail.com> wrote:
>>>
 Hi,

 I load a JSON file that has a timestamp (as a long, in milliseconds) and
 several other attributes. I would like to group the records into 5-minute
 buckets and store each bucket as a separate file.

 I am facing a couple of problems here:
 1. Using the floor function in the select clause (to bucket by 5 minutes)
 gives me an error saying "java.util.NoSuchElementException: key not found:
 floor". How do I use the floor function in the select clause? I see that a
 floor method is available in the org.apache.spark.sql.functions class, but
 I am not sure why it is not working here.
 2. Can I use the same function in the GROUP BY clause?
 3. How do I store the groups as separate files after grouping them?

 String logPath = "my-json.gz";
 DataFrame logdf = sqlContext.read().json(logPath);
 logdf.registerTempTable("logs");
 DataFrame bucketLogs = sqlContext.sql("Select `user.timestamp`
 as rawTimeStamp, `user.requestId` as requestId,
 floor(`user.timestamp`/72000) as timeBucket FROM logs");
 bucketLogs.toJSON().saveAsTextFile("target_file");

 Regards
 Ashok

>>>
>>>
>>>
>>> --
>>> Best Regards,
>>> Ayan Guha
>>>
>>
>>


Re: Facing issue with floor function in spark SQL query

2016-03-04 Thread Ajay Chander
Hi Ashok,

Try using HiveContext instead of SQLContext. I suspect SQLContext does not
have that functionality. Let me know if it works.

Thanks,
Ajay

On Friday, March 4, 2016, ashokkumar rajendran <
ashokkumar.rajend...@gmail.com> wrote:

> Hi Ayan,
>
> Thanks for the response. I am using a SQL query (not the DataFrame API).
> Could you please explain how I should import this SQL function for use in
> the query? Simply importing the class in my driver code does not help here.
>
> Many functions that I need are already there in sql.functions, so I do
> not want to rewrite them.
>
> Regards
> Ashok
>
> On Fri, Mar 4, 2016 at 3:52 PM, ayan guha wrote:
>
>> Most likely you are missing an import of org.apache.spark.sql.functions.
>>
>> In any case, you can write your own floor function and use it as a UDF.
>>
>> On Fri, Mar 4, 2016 at 7:34 PM, ashokkumar rajendran <
>> ashokkumar.rajend...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I load a JSON file that has a timestamp (as a long, in milliseconds) and
>>> several other attributes. I would like to group the records into 5-minute
>>> buckets and store each bucket as a separate file.
>>>
>>> I am facing a couple of problems here:
>>> 1. Using the floor function in the select clause (to bucket by 5 minutes)
>>> gives me an error saying "java.util.NoSuchElementException: key not found:
>>> floor". How do I use the floor function in the select clause? I see that a
>>> floor method is available in the org.apache.spark.sql.functions class, but
>>> I am not sure why it is not working here.
>>> 2. Can I use the same function in the GROUP BY clause?
>>> 3. How do I store the groups as separate files after grouping them?
>>>
>>> String logPath = "my-json.gz";
>>> DataFrame logdf = sqlContext.read().json(logPath);
>>> logdf.registerTempTable("logs");
>>> DataFrame bucketLogs = sqlContext.sql("Select `user.timestamp`
>>> as rawTimeStamp, `user.requestId` as requestId,
>>> floor(`user.timestamp`/72000) as timeBucket FROM logs");
>>> bucketLogs.toJSON().saveAsTextFile("target_file");
>>>
>>> Regards
>>> Ashok
>>>
>>
>>
>>
>> --
>> Best Regards,
>> Ayan Guha
>>
>
>


Re: Facing issue with floor function in spark SQL query

2016-03-04 Thread ashokkumar rajendran
Hi Ayan,

Thanks for the response. I am using a SQL query (not the DataFrame API).
Could you please explain how I should import this SQL function for use in
the query? Simply importing the class in my driver code does not help here.

Many functions that I need are already there in sql.functions, so I do
not want to rewrite them.

Regards
Ashok

On Fri, Mar 4, 2016 at 3:52 PM, ayan guha  wrote:

> Most likely you are missing an import of org.apache.spark.sql.functions.
>
> In any case, you can write your own floor function and use it as a UDF.
>
> On Fri, Mar 4, 2016 at 7:34 PM, ashokkumar rajendran <
> ashokkumar.rajend...@gmail.com> wrote:
>
>> Hi,
>>
>> I load a JSON file that has a timestamp (as a long, in milliseconds) and
>> several other attributes. I would like to group the records into 5-minute
>> buckets and store each bucket as a separate file.
>>
>> I am facing a couple of problems here:
>> 1. Using the floor function in the select clause (to bucket by 5 minutes)
>> gives me an error saying "java.util.NoSuchElementException: key not found:
>> floor". How do I use the floor function in the select clause? I see that a
>> floor method is available in the org.apache.spark.sql.functions class, but
>> I am not sure why it is not working here.
>> 2. Can I use the same function in the GROUP BY clause?
>> 3. How do I store the groups as separate files after grouping them?
>>
>> String logPath = "my-json.gz";
>> DataFrame logdf = sqlContext.read().json(logPath);
>> logdf.registerTempTable("logs");
>> DataFrame bucketLogs = sqlContext.sql("Select `user.timestamp` as
>> rawTimeStamp, `user.requestId` as requestId,
>> floor(`user.timestamp`/72000) as timeBucket FROM logs");
>> bucketLogs.toJSON().saveAsTextFile("target_file");
>>
>> Regards
>> Ashok
>>
>
>
>
> --
> Best Regards,
> Ayan Guha
>


Re: Facing issue with floor function in spark SQL query

2016-03-04 Thread ayan guha
Most likely you are missing an import of org.apache.spark.sql.functions.

In any case, you can write your own floor function and use it as a UDF.
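The floor logic itself is a one-liner; here is a minimal sketch of a function body one could register as a UDF. The class and method names (`FloorUdf`, `myFloor`) are made up for illustration, and the block uses only the JDK; the commented-out registration call follows the Spark 1.x Java API:

```java
public class FloorUdf {
    // The logic to expose as a UDF: floor of a double, returned as a long.
    public static long myFloor(double x) {
        return (long) Math.floor(x);
    }

    public static void main(String[] args) {
        System.out.println(myFloor(11.95)); // 11
        System.out.println(myFloor(-0.5));  // -1

        // With Spark on the classpath, registration would look roughly like:
        // sqlContext.udf().register("myFloor",
        //     (org.apache.spark.sql.api.java.UDF1<Double, Long>) FloorUdf::myFloor,
        //     org.apache.spark.sql.types.DataTypes.LongType);
        // after which "myFloor(...)" is usable directly in SQL strings.
    }
}
```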

On Fri, Mar 4, 2016 at 7:34 PM, ashokkumar rajendran <
ashokkumar.rajend...@gmail.com> wrote:

> Hi,
>
> I load a JSON file that has a timestamp (as a long, in milliseconds) and
> several other attributes. I would like to group the records into 5-minute
> buckets and store each bucket as a separate file.
>
> I am facing a couple of problems here:
> 1. Using the floor function in the select clause (to bucket by 5 minutes)
> gives me an error saying "java.util.NoSuchElementException: key not found:
> floor". How do I use the floor function in the select clause? I see that a
> floor method is available in the org.apache.spark.sql.functions class, but
> I am not sure why it is not working here.
> 2. Can I use the same function in the GROUP BY clause?
> 3. How do I store the groups as separate files after grouping them?
>
> String logPath = "my-json.gz";
> DataFrame logdf = sqlContext.read().json(logPath);
> logdf.registerTempTable("logs");
> DataFrame bucketLogs = sqlContext.sql("Select `user.timestamp` as
> rawTimeStamp, `user.requestId` as requestId,
> floor(`user.timestamp`/72000) as timeBucket FROM logs");
> bucketLogs.toJSON().saveAsTextFile("target_file");
>
> Regards
> Ashok
>
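A side note on the bucketing arithmetic: with timestamps in milliseconds, a 5-minute bucket spans 5 * 60 * 1000 = 300000 ms, so the divisor 72000 in the posted query would not produce 5-minute buckets. A minimal JDK-only sketch of the intended computation (the class name and sample timestamp are illustrative):

```java
public class BucketDemo {
    static final long FIVE_MINUTES_MS = 5L * 60 * 1000; // 300000 ms

    // Bucket index for a millisecond timestamp: floor(ts / 300000).
    // Math.floorDiv rounds toward negative infinity, matching SQL floor().
    public static long bucketOf(long tsMillis) {
        return Math.floorDiv(tsMillis, FIVE_MINUTES_MS);
    }

    public static void main(String[] args) {
        // Timestamps less than 5 minutes apart land in the same bucket:
        System.out.println(bucketOf(0L));      // 0
        System.out.println(bucketOf(299999L)); // 0
        System.out.println(bucketOf(300000L)); // 1
    }
}
```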



-- 
Best Regards,
Ayan Guha