ordering of rows in dataframe

2023-12-05 Thread Som Lima
I want to maintain the order of the rows in a DataFrame in PySpark. Is
there any way to achieve this for the function below, where the row ID
gives a number to each row? Currently, the function results in the rows
of the DataFrame being rearranged.

from pyspark.sql import Window
from pyspark.sql.functions import lit, row_number

def createRowIdColumn(new_column, position, start_value):
    row_count = df.count()
    row_ids = spark.range(int(start_value), int(start_value) + row_count, 1).toDF(new_column)
    window = Window.orderBy(lit(1))
    df_row_ids = row_ids.withColumn("row_num", row_number().over(window) - 1)
    df_with_row_num = df.withColumn("row_num", row_number().over(window) - 1)

    if position == "Last Column":
        result = df_with_row_num.join(df_row_ids, on="row_num").drop("row_num")
    else:
        result = df_row_ids.join(df_with_row_num, on="row_num").drop("row_num")

    return result.orderBy(new_column)

Please let me know if there is a way to achieve this requirement.
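
One order-preserving alternative is to derive the row id from zipWithIndex on the
underlying RDD, which numbers rows in their current order instead of imposing a
Window sort. This is only a rough, untested sketch of that idea in Scala (the
PySpark equivalent would go through df.rdd.zipWithIndex()); the hypothetical
helper below only covers appending the id as the last column:

import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// Sketch: append a row-id column without re-sorting the data.
// zipWithIndex numbers rows in their existing partition order, so the
// current ordering of df is preserved; startValue is added as an offset.
def createRowIdColumn(df: DataFrame, newColumn: String, startValue: Long): DataFrame = {
  val indexed = df.rdd.zipWithIndex.map { case (row, idx) =>
    Row.fromSeq(row.toSeq :+ (idx + startValue))
  }
  val schema = StructType(df.schema.fields :+ StructField(newColumn, LongType, nullable = false))
  df.sparkSession.createDataFrame(indexed, schema)
}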


Re: Lightbend Scala professional training & certification

2020-04-29 Thread Som Lima
I think I am going to focus on Spring Boot and Apache Camel.

I'll do Apache Spark in the background.

So see you.

I am going to unsubscribe here.




On Wed, 29 Apr 2020, 13:58 Som Lima,  wrote:

> The end value is important  for me.
>
> I think certification in commercial framework is most valuable for me.
>
> What is free is the access to the framework. That is invaluable.
>
> In the past Access to commercial frameworks was not possible .
> One could only get it if a company sent you.
>
> This involved one week intense training courses. costing in the order of
> USD $3000. That is still the case. With Spring for example you can train
> freely and sit the certification for  minimal fee ($200) in the event a
> company doesn't send you on a one week USD $3000 course spring course.
>
> ACCESS : Being able to download and install a commercial framework  is
> priceless.
>
> I think  frameworks like Spark and accompanying mathematical  concepts
> cannot be learned in one week but highly achievable  for proficient
> commercial development in two months. Resources for learning relevant maths
> are abundant. Actually relevant maths are encapsulated in  APIs.  That is
> Bad news for masters and phD mathematicians.
>
> During one such one week course my colleague , an american outspoken type
> said .  he was right.  As it happened the framework fell short due to
> concepts I knew from commercial software development experience.
>
>
> I would say certification in Java is only worth 1/10  to that of
> certification in a java framework like J2ee , spring , Spark.
> Even certification as a DBA is more valuable than Language Certification
> because simply you can show you can operate equipment like a database.
>
> With language certification you can build an IDE. Same as  people who buy
> a lathe the first thing they do is use it to build another Lathe.
>
> On Wed, 29 Apr 2020, 13:09 Mich Talebzadeh, 
> wrote:
>
>> I don't think that will be free!
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>>
>> On Wed, 29 Apr 2020 at 13:01, Som Lima  wrote:
>>
>>> Is there a databricks or other professional certification for Apache
>>> Spark  ?
>>>
>>>
>>> On Wed, 29 Apr 2020, 11:29 Mich Talebzadeh, 
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Has anyone had experience of taking training courses  with Lightbend
>>>> training <https://www.lightbend.com/services/training>on Scala
>>>>
>>>> I believe they are offering free Scala courses and certifications.
>>>>
>>>> Thanks,
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>>
>>>>
>>>> LinkedIn * 
>>>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>
>>>>
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>>
>>>>
>>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>>> any loss, damage or destruction of data or any other property which may
>>>> arise from relying on this email's technical content is explicitly
>>>> disclaimed. The author will in no case be liable for any monetary damages
>>>> arising from such loss, damage or destruction.
>>>>
>>>>
>>>>
>>>


Re: Lightbend Scala professional training & certification

2020-04-29 Thread Som Lima
The end value is important for me.

I think certification in a commercial framework is most valuable for me.

What is free is the access to the framework. That is invaluable.

In the past, access to commercial frameworks was not possible.
One could only get it if a company sent you.

That meant one-week intensive training courses costing on the order of
USD $3000, and that is still the case. With Spring, for example, you can train
freely and sit the certification for a minimal fee ($200) in the event a
company doesn't send you on a one-week USD $3000 Spring course.

ACCESS: being able to download and install a commercial framework is
priceless.

I think frameworks like Spark and the accompanying mathematical concepts
cannot be learned in one week, but proficiency for commercial development is
highly achievable in two months. Resources for learning the relevant maths
are abundant. Actually, the relevant maths are encapsulated in APIs. That is
bad news for masters and PhD mathematicians.

During one such one-week course my colleague, an outspoken American type,
said as much, and he was right. As it happened, the framework fell short due to
concepts I knew from commercial software development experience.


I would say certification in Java is only worth 1/10 of
certification in a Java framework like J2EE, Spring or Spark.
Even certification as a DBA is more valuable than language certification,
simply because you can show you can operate equipment like a database.

With language certification you can build an IDE. It is the same as people who
buy a lathe: the first thing they do is use it to build another lathe.

On Wed, 29 Apr 2020, 13:09 Mich Talebzadeh, 
wrote:

> I don't think that will be free!
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Wed, 29 Apr 2020 at 13:01, Som Lima  wrote:
>
>> Is there a databricks or other professional certification for Apache
>> Spark  ?
>>
>>
>> On Wed, 29 Apr 2020, 11:29 Mich Talebzadeh, 
>> wrote:
>>
>>> Hi,
>>>
>>> Has anyone had experience of taking training courses  with Lightbend
>>> training <https://www.lightbend.com/services/training>on Scala
>>>
>>> I believe they are offering free Scala courses and certifications.
>>>
>>> Thanks,
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn * 
>>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary damages
>>> arising from such loss, damage or destruction.
>>>
>>>
>>>
>>


Re: Lightbend Scala professional training & certification

2020-04-29 Thread Som Lima
Is there a databricks or other professional certification for Apache Spark
?


On Wed, 29 Apr 2020, 11:29 Mich Talebzadeh, 
wrote:

> Hi,
>
> Has anyone had experience of taking training courses  with Lightbend
> training on Scala
>
> I believe they are offering free Scala courses and certifications.
>
> Thanks,
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>


Re: Converting a date to milliseconds with time zone in Scala Eclipse IDE

2020-04-29 Thread Som Lima
Also, you may be surprised to learn that I started programming in Scala
just yesterday. I was really pleased I had a challenge to solve rather than
copying example programmes, which can be boring.

Judging from the answers received, I think some may find this information useful.

I used a Scala-specific IDE I got from http://scala-ide.org.

If you do use it, there is a bug I worked around to make it work:
I added -vm to the eclipse.ini file, and then on the NEXT line you put the
path to JDK 8. Other JDK versions can also cause other errors.

Eclipse.ini

-startup
plugins/org.eclipse.equinox.launcher_1.4.0.v20161219-1356.jar
--launcher.library
plugins/org.eclipse.equinox.launcher.gtk.linux.x86_64_1.1.500.v20170531-1133
-Xmx256m
-Xms200m

-XX:MaxPermSize=384m
-vm
/path/to/java/jdk8u242-b08/bin


On Tue, 28 Apr 2020, 22:22 Mich Talebzadeh, 
wrote:

> Hi,
>
> Thank you all,
>
> I am just thinking of passing that date   06/04/2020 12:03:43  and
> getting the correct format from the module. In effect
>
> This date format  yyyy-MM-dd'T'HH:mm:ss.SZ as pattern
>
> in other words rather than new Date()  pass "06/04/2020 12:03:43" as string
>
> REgards,
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Tue, 28 Apr 2020 at 21:31, Som Lima  wrote:
>
>> import java.time._
>> import java.util.Date
>> import java.text.SimpleDateFormat
>> import java.util.Locale
>> import java.util.SimpleTimeZone
>>
>> object CalendarDemo extends App {
>>
>> println("Calendar Demo")
>>  val pattern = "E dd M yyyy HH:mm:ss.SSSZ";
>> val simpleDateFormat = (new SimpleDateFormat(pattern, new
>> Locale("en", "UK")));
>> val date = simpleDateFormat.format(new Date());
>> System.out.println(date);
>>
>> val pattern2 = "dd yyyy MM HH:mm:ss.SZ";
>> val simpleDateFormat2 = (new SimpleDateFormat(pattern2, new
>> Locale("en", "UK")));
>> val date2 = simpleDateFormat2.format(new Date());
>> System.out.println(date2);
>>
>> /* *
>> Pattern Syntax
>>
>> You can use the following symbols in your formatting pattern:
>> G Era designator (before christ, after christ)
>> y Year (e.g. 12 or 2012). Use either yy or yyyy.
>> M Month in year. Number of M's determine length of format (e.g. MM, MMM
>> or M)
>> d Day in month. Number of d's determine length of format (e.g. d or dd)
>> h Hour of day, 1-12 (AM / PM) (normally hh)
>> H Hour of day, 0-23 (normally HH)
>> m Minute in hour, 0-59 (normally mm)
>> s Second in minute, 0-59 (normally ss)
>> S Millisecond in second, 0-999 (normally SSS)
>> E Day in week (e.g Monday, Tuesday etc.)
>> D Day in year (1-366)
>> F Day of week in month (e.g. 1st Thursday of December)
>> w Week in year (1-53)
>> W Week in month (0-5)
>> a AM / PM marker
>> k Hour in day (1-24, unlike HH's 0-23)
>> K Hour in day, AM / PM (0-11)
>> z Time Zone
>> ' Escape for text delimiter
>> ' Single quote
>> **/
>>
>> }
>>
>>
>> On Tue, 28 Apr 2020, 19:18 Edgardo Szrajber, 
>> wrote:
>>
>>> Hi
>>> please check combining unix_timestamp and from_unixtime,
>>> Something like:
>>> from_unixtime(unix_timestamp( "06-04-2020 12:03:43"),"yyyy-MM-dd'T'HH:mm:ss
>>> Z")
>>>
>>> please note that I just wrote without any validation.
>>>
>>> In any case, you might want to check the documentation of both functions
>>> to check all valid formats. Also note that this functions are universal
>>> (not only in Spark, Hive) so you have a huge amount of documentation
>>> available.
>>>
>>> Bentzi
>>>
>>>
>>> On Tuesday, April 28, 2020, 08:32:18 PM GMT+3, Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>>
>>> Unfortunately that did not work.
>>>
>>> any other suggestions?
>>>

Re: Filtering on multiple columns in spark

2020-04-29 Thread Som Lima
From your email, the obvious issue seems to be that
10 is an Int because it is not surrounded in quotes "";
10 should be "10".

Although I can't imagine a telephone number of only 10, because that is what
you are trying to program.


In *Scala*, you can check *if *two operands *are equal* ( == ) or *not* (
!= ) *and* it returns true *if* the condition *is* met, false *if not* (
else ). By itself, ! *is *called the Logical *NOT* Operator.
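
For the Column-level comparison itself, the "not equal" operator on Spark's Scala
Column API is =!= (the older !== form is deprecated). A rough, unvalidated sketch
of the filter along those lines, reusing the newDF and column name from your code:

import org.apache.spark.sql.functions.{col, length, substring}
import org.apache.spark.sql.types.StringType

// Sketch only: =!= keeps both sides of || as Column expressions,
// so the comparisons against 10 and "7" are done column-wise.
val rejectedDF = newDF
  .withColumn("target_mobile_no", col("target_mobile_no").cast(StringType))
  .filter(length(col("target_mobile_no")) =!= 10 ||
    substring(col("target_mobile_no"), 1, 1) =!= "7")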

On Wed, 29 Apr 2020, 08:45 Mich Talebzadeh, 
wrote:

> Hi,
>
>
>
> Trying to filter a dataframe with multiple conditions using OR "||" as
> below
>
>
>
>   val rejectedDF = newDF.withColumn("target_mobile_no",
> col("target_mobile_no").cast(StringType)).
>
>filter(length(col("target_mobile_no")) !== 10 ||
> substring(col("target_mobile_no"),1,1) !== "7")
>
>
>
> This throws this error
>
>
>
> res12: org.apache.spark.sql.DataFrame = []
>
> :49: error: value || is not a member of Int
>
>   filter(length(col("target_mobile_no")) !== 10 ||
> substring(col("target_mobile_no"),1,1) !== "7")
>
>
>
> Try another way
>
>
>
> val rejectedDF = newDF.withColumn("target_mobile_no",
> col("target_mobile_no").cast(StringType)).
>
>filter(length(col("target_mobile_no")) !=== 10 ||
> substring(col("target_mobile_no"),1,1) !=== "7")
>
>   rejectedDF.createOrReplaceTempView("tmp")
>
>
>
> Tried few options but I am still getting this error
>
>
>
> :49: error: value !=== is not a member of
> org.apache.spark.sql.Column
>
>   filter(length(col("target_mobile_no")) !=== 10
> || substring(col("target_mobile_no"),1,1) !=== "7")
>
>  ^
>
> :49: error: value || is not a member of Int
>
>   filter(length(col("target_mobile_no")) !=== 10
> || substring(col("target_mobile_no"),1,1) !=== "7")
>
>
>
> I can create a dataframe for each filter but that does not look efficient
> to me?
>
>
>
> Thanks
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>


Re: Converting a date to milliseconds with time zone in Scala with fixed date str

2020-04-28 Thread Som Lima
import java.time._
import java.util.Date
import java.text.SimpleDateFormat
import java.util.Locale
import java.util.SimpleTimeZone
import org.joda.time
import org.joda.time.DateTime

object CalendarDemo extends App {

println("Calendar Demo")
val pattern = "E dd M yyyy HH:mm:ss.SSSZ";
val simpleDateFormat = (new SimpleDateFormat(pattern, new Locale("en",
"UK")));
val date = simpleDateFormat.format(new Date());
System.out.println(date);


val pattern2 = "dd yyyy MM HH:mm:ss.SZ";
val simpleDateFormat2 = (new SimpleDateFormat(pattern2, new
Locale("en", "UK")));
val date2 = simpleDateFormat2.format(new Date());

System.out.println(date2);


val fixedStr = "2020-06-04T12:03:43";
val dt = new DateTime(fixedStr);
val jdkDate = dt.toDate();

val pattern3 = "dd yyyy MM HH:mm:ss.SZ";
val simpleDateFormat3 = (new SimpleDateFormat(pattern2, new
Locale("en", "UK")));
val date3 = simpleDateFormat3.format(jdkDate);
System.out.println(date3);




/* *
Pattern Syntax

You can use the following symbols in your formatting pattern:
G Era designator (before christ, after christ)
y Year (e.g. 12 or 2012). Use either yy or yyyy.
M Month in year. Number of M's determine length of format (e.g. MM, MMM or
M)
d Day in month. Number of d's determine length of format (e.g. d or dd)
h Hour of day, 1-12 (AM / PM) (normally hh)
H Hour of day, 0-23 (normally HH)
m Minute in hour, 0-59 (normally mm)
s Second in minute, 0-59 (normally ss)
S Millisecond in second, 0-999 (normally SSS)
E Day in week (e.g Monday, Tuesday etc.)
D Day in year (1-366)
F Day of week in month (e.g. 1st Thursday of December)
w Week in year (1-53)
W Week in month (0-5)
a AM / PM marker
k Hour in day (1-24, unlike HH's 0-23)
K Hour in day, AM / PM (0-11)
z Time Zone
' Escape for text delimiter
' Single quote
**/

}
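
If the aim is to pass the original "06/04/2020 12:03:43" string itself rather than
a pre-converted ISO string, here is a rough java.time sketch of the same idea
(assuming the input is day/month order and should be treated as GMT, as described
in the thread; it has not been validated):

import java.time.LocalDateTime
import java.time.ZoneOffset
import java.time.format.DateTimeFormatter

object ParseDemo extends App {
  // Sketch only: swap the input pattern to "MM/dd/yyyy HH:mm:ss" if the date is month-first.
  val inFmt  = DateTimeFormatter.ofPattern("dd/MM/yyyy HH:mm:ss")
  val outFmt = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSZ")

  val parsed = LocalDateTime.parse("06/04/2020 12:03:43", inFmt).atOffset(ZoneOffset.UTC)
  println(outFmt.format(parsed))   // e.g. 2020-04-06T12:03:43.000+0000
}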

On Tue, 28 Apr 2020, 22:22 Mich Talebzadeh, 
wrote:

> Hi,
>
> Thank you all,
>
> I am just thinking of passing that date   06/04/2020 12:03:43  and
> getting the correct format from the module. In effect
>
> This date format  yyyy-MM-dd'T'HH:mm:ss.SZ as pattern
>
> in other words rather than new Date()  pass "06/04/2020 12:03:43" as string
>
> REgards,
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Tue, 28 Apr 2020 at 21:31, Som Lima  wrote:
>
>> import java.time._
>> import java.util.Date
>> import java.text.SimpleDateFormat
>> import java.util.Locale
>> import java.util.SimpleTimeZone
>>
>> object CalendarDemo extends App {
>>
>> println("Calendar Demo")
>>  val pattern = "E dd M yyyy HH:mm:ss.SSSZ";
>> val simpleDateFormat = (new SimpleDateFormat(pattern, new
>> Locale("en", "UK")));
>> val date = simpleDateFormat.format(new Date());
>> System.out.println(date);
>>
>> val pattern2 = "dd yyyy MM HH:mm:ss.SZ";
>> val simpleDateFormat2 = (new SimpleDateFormat(pattern2, new
>> Locale("en", "UK")));
>> val date2 = simpleDateFormat2.format(new Date());
>> System.out.println(date2);
>>
>> /* *
>> Pattern Syntax
>>
>> You can use the following symbols in your formatting pattern:
>> G Era designator (before christ, after christ)
>> y Year (e.g. 12 or 2012). Use either yy or yyyy.
>> M Month in year. Number of M's determine length of format (e.g. MM, MMM
>> or M)
>> d Day in month. Number of d's determine length of format (e.g. d or dd)
>> h Hour of day, 1-12 (AM / PM) (normally hh)
>> H Hour of day, 0-23 (normally HH)
>> m Minute in hour, 0-59 (normally mm)
>> s Second in minute, 0-59 (normally ss)
>> S Millisecond in second, 0-999 (normally SSS)
>> E Day in week (e.g Monday, Tuesday etc.)
>> D Day in year (1-366)
>> F Day of week in month (e.g. 1st Thursday of December)
>> w Week in year (1-53)
>> W Week in month (0-5)
>> a AM / PM marker
>> k Hour in day (1-24, unlike HH's 0-23)

Re: Converting a date to milliseconds with time zone in Scala

2020-04-28 Thread Som Lima
import java.time._
import java.util.Date
import java.text.SimpleDateFormat
import java.util.Locale
import java.util.SimpleTimeZone

object CalendarDemo extends App {

println("Calendar Demo")
 val pattern = "E dd M yyyy HH:mm:ss.SSSZ";
val simpleDateFormat = (new SimpleDateFormat(pattern, new Locale("en",
"UK")));
val date = simpleDateFormat.format(new Date());
System.out.println(date);

val pattern2 = "dd yyyy MM HH:mm:ss.SZ";
val simpleDateFormat2 = (new SimpleDateFormat(pattern2, new
Locale("en", "UK")));
val date2 = simpleDateFormat2.format(new Date());
System.out.println(date2);

/* *
Pattern Syntax

You can use the following symbols in your formatting pattern:
G Era designator (before christ, after christ)
y Year (e.g. 12 or 2012). Use either yy or yyyy.
M Month in year. Number of M's determine length of format (e.g. MM, MMM or
M)
d Day in month. Number of d's determine length of format (e.g. d or dd)
h Hour of day, 1-12 (AM / PM) (normally hh)
H Hour of day, 0-23 (normally HH)
m Minute in hour, 0-59 (normally mm)
s Second in minute, 0-59 (normally ss)
S Millisecond in second, 0-999 (normally SSS)
E Day in week (e.g Monday, Tuesday etc.)
D Day in year (1-366)
F Day of week in month (e.g. 1st Thursday of December)
w Week in year (1-53)
W Week in month (0-5)
a AM / PM marker
k Hour in day (1-24, unlike HH's 0-23)
K Hour in day, AM / PM (0-11)
z Time Zone
' Escape for text delimiter
' Single quote
**/

}


On Tue, 28 Apr 2020, 19:18 Edgardo Szrajber, 
wrote:

> Hi
> please check combining unix_timestamp and from_unixtime,
> Something like:
> from_unixtime(unix_timestamp( "06-04-2020 12:03:43"),"yyyy-MM-dd'T'HH:mm:ss
> Z")
>
> please note that I just wrote without any validation.
>
> In any case, you might want to check the documentation of both functions
> to check all valid formats. Also note that this functions are universal
> (not only in Spark, Hive) so you have a huge amount of documentation
> available.
>
> Bentzi
>
>
> On Tuesday, April 28, 2020, 08:32:18 PM GMT+3, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>
> Unfortunately that did not work.
>
> any other suggestions?
>
> thanks
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Tue, 28 Apr 2020 at 17:41, Mich Talebzadeh 
> wrote:
>
> Thanks Neeraj, I'll check it out. !
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Tue, 28 Apr 2020 at 17:26, neeraj bhadani 
> wrote:
>
> Hi Mich,
> You can try Spark DateTime function here and see if that helps.
>
>
> https://medium.com/expedia-group-tech/deep-dive-into-apache-spark-datetime-functions-b66de737950a
>
> Regards,
> Neeraj
>
> On Tue, Apr 28, 2020 at 5:15 PM Mich Talebzadeh 
> wrote:
>
> Hi,
>
> I have a date in format like 06/04/2020 12:03:43 and we want it to be
> displayed as follows:
>
> yyyy-MM-dd'T'HH:mm:ss.SZ format
>
> So the input date is  GMT date time just we do not receive the
> information with it
>
> The output should have timezone information
>
>
> Appreciate any ideas.
>
>
> Thanks,
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>


Re: Converting a date to milliseconds with time zone in Scala

2020-04-28 Thread Som Lima
import java.time._
import java.util.Date
import java.text.SimpleDateFormat
import java.util.Locale
import java.util.SimpleTimeZone

object CalendarDemo extends App {

println("Calendar Demo")
 val pattern = "E dd M yyyy HH:mm:ss.SSSZ";
val simpleDateFormat = (new SimpleDateFormat(pattern, new Locale("en",
"UK")));
val date = simpleDateFormat.format(new Date());
System.out.println(date);

val pattern2 = "dd yyyy MM HH:mm:ss.SZ";
val simpleDateFormat2 = (new SimpleDateFormat(pattern2, new
Locale("en", "UK")));
val date2 = simpleDateFormat2.format(new Date());

On Tue, 28 Apr 2020, 19:18 Edgardo Szrajber, 
wrote:

> Hi
> please check combining unix_timestamp and from_unixtime,
> Something like:
> from_unixtime(unix_timestamp( "06-04-2020 12:03:43"),"yyyy-MM-dd'T'HH:mm:ss
> Z")
>
> please note that I just wrote without any validation.
>
> In any case, you might want to check the documentation of both functions
> to check all valid formats. Also note that this functions are universal
> (not only in Spark, Hive) so you have a huge amount of documentation
> available.
>
> Bentzi
>
>
> On Tuesday, April 28, 2020, 08:32:18 PM GMT+3, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>
> Unfortunately that did not work.
>
> any other suggestions?
>
> thanks
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Tue, 28 Apr 2020 at 17:41, Mich Talebzadeh 
> wrote:
>
> Thanks Neeraj, I'll check it out. !
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Tue, 28 Apr 2020 at 17:26, neeraj bhadani 
> wrote:
>
> Hi Mich,
> You can try Spark DateTime function here and see if that helps.
>
>
> https://medium.com/expedia-group-tech/deep-dive-into-apache-spark-datetime-functions-b66de737950a
>
> Regards,
> Neeraj
>
> On Tue, Apr 28, 2020 at 5:15 PM Mich Talebzadeh 
> wrote:
>
> Hi,
>
> I have a date in format like 06/04/2020 12:03:43 and we want it to be
> displayed as follows:
>
> yyyy-MM-dd'T'HH:mm:ss.SZ format
>
> So the input date is  GMT date time just we do not receive the
> information with it
>
> The output should have timezone information
>
>
> Appreciate any ideas.
>
>
> Thanks,
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>


Re: Copyright Infringement

2020-04-25 Thread Som Lima
The statement in the book makes sense now. It is based on the premise
detailed in paragraph 1):

*unless you are reproducing a significant portion of the code. *





On Sat, 25 Apr 2020, 17:11 Holden Karau,  wrote:

> I’m one of the authors.
>
> I think you’ve misunderstood the licenses here, but I am not a lawyer.
> This is not legal advice but my understanding is:
>
> 1) Spark is Apache licensed so code your make using Spark doesn’t need to
> be open source (it’s not GPL)
>
> 2) If you want to use examples from the book and you aren’t using a
> substantial portion of the code from the book go for it. If you are using a
> substantial portion of the code please talk to O’Reilly (the publisher) for
> permission.
> If you look at the book’s example repo you can find information about the
> license the individual examples are available under, most are Apache
> licensed but some components examples are GPL licensed.
>
> I hope this helps and your able to use the examples in the book to get
> your job done and thanks for reading the book.
>
> On Sat, Apr 25, 2020 at 8:48 AM Som Lima  wrote:
>
>> The text is very clear on the issue of copyright infringement. Ask
>> permission or you are committing an unlawful act.
>>
>> The words "significant portion" has not been quantified.
>>
>> So I have nothing to ask of the authors except may be to quantify.
>> Quantification is a secondary issue.
>>
>> My reading of the text is that it applies to any spark user and not just
>> me personally.
>>
>> The authors need to make clear to all spark users whether copyright
>> infringement was their intent or not.
>>
>> The authors need to make clear to all spark users why should any
>> development team share their Use Case in order  avoid  falling on the
>> wrong side
>> of copyright infringement claims.
>>
>> I understand  you are also  a named author of a book on Apache usage.
>>
>> Perhaps you can share with us from your expertise  the need or your
>> motivation  for the addendum to the Apache Spark online usage documents.
>>
>> Let me rephrase my question.
>>
>> Does any Spark User feel as I do this text is a violation of Apache
>> foundation's  free licence agreement  ?
>>
>>
>>
>> On Sat, 25 Apr 2020, 16:18 Sean Owen,  wrote:
>>
>>> You'll want to ask the authors directly ; the book is not produced by
>>> the project itself, so can't answer here.
>>>
>>> On Sat, Apr 25, 2020, 8:42 AM Som Lima  wrote:
>>>
>>>> At the risk of being removed from the emailing I would like a
>>>> clarification because I do not want  to commit an unlawful act.
>>>> Can you please clarify if I would be infringing copyright due to this
>>>> text.
>>>> *Book:  High Performance Spark *
>>>> *authors: holden Karau Rachel Warren.*
>>>> *page xii:*
>>>>
>>>> * This book is here to help you get your job done ... If for example
>>>> code is offered with this book, you may use it in your programs and
>>>> documentation. You do not need to contact us for permission unless your are
>>>> reproducing significant portion of the code. *
>>>>
>>> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>


Re: Copyright Infringement

2020-04-25 Thread Som Lima
The text is very clear on the issue of copyright infringement: ask
permission or you are committing an unlawful act.

The words "significant portion" have not been quantified.

So I have nothing to ask of the authors, except maybe to quantify.
Quantification is a secondary issue.

My reading of the text is that it applies to any Spark user and not just me
personally.

The authors need to make clear to all Spark users whether copyright
infringement was their intent or not.

The authors need to make clear to all Spark users why any development team
should share their use case in order to avoid falling on the wrong side
of copyright infringement claims.

I understand you are also a named author of a book on Apache usage.

Perhaps you can share with us, from your expertise, the need or your
motivation for the addendum to the Apache Spark online usage documents.

Let me rephrase my question.

Does any Spark user feel, as I do, that this text is a violation of the Apache
Foundation's free licence agreement?



On Sat, 25 Apr 2020, 16:18 Sean Owen,  wrote:

> You'll want to ask the authors directly ; the book is not produced by the
> project itself, so can't answer here.
>
> On Sat, Apr 25, 2020, 8:42 AM Som Lima  wrote:
>
>> At the risk of being removed from the emailing I would like a
>> clarification because I do not want  to commit an unlawful act.
>> Can you please clarify if I would be infringing copyright due to this
>> text.
>> *Book:  High Performance Spark *
>> *authors: holden Karau Rachel Warren.*
>> *page xii:*
>>
>> * This book is here to help you get your job done ... If for example code
>> is offered with this book, you may use it in your programs and
>> documentation. You do not need to contact us for permission unless your are
>> reproducing significant portion of the code. *
>>
>


Copyright Infringement

2020-04-25 Thread Som Lima
At the risk of being removed from the mailing list, I would like a clarification
because I do not want to commit an unlawful act.
Can you please clarify whether I would be infringing copyright due to this text?
*Book:  High Performance Spark *
*Authors: Holden Karau, Rachel Warren*
*Page xii:*

* This book is here to help you get your job done ... If for example code
is offered with this book, you may use it in your programs and
documentation. You do not need to contact us for permission unless you are
reproducing a significant portion of the code. *


Re: IDE suitable for Spark : Monitoring & Debugging Spark Jobs

2020-04-07 Thread Som Lima
The definitive guide
Chapter 18:
Monitoring and Debugging

"This chapter covers the key details you need to monitor and debug your
Spark Applications.  To do this , we will walk through the spark UI with an
example query designed to help you understand how to trace your  own jobs
through the executions life cycle. The example we'll look at will also help
you to understand  how to debug your jobs and where errors are likely to
occur."








On Tue, 7 Apr 2020, 18:28 Pat Ferrel,  wrote:

> IntelliJ Scala works well when debugging master=local. Has anyone used it
> for remote/cluster debugging? I’ve heard it is possible...
>
>
> From: Luiz Camargo  
> Reply: Luiz Camargo  
> Date: April 7, 2020 at 10:26:35 AM
> To: Dennis Suhari 
> 
> Cc: yeikel valdes  ,
> zahidr1...@gmail.com  ,
> user@spark.apache.org  
> Subject:  Re: IDE suitable for Spark
>
> I have used IntelliJ Spark/Scala with the sbt tool
>
> On Tue, Apr 7, 2020 at 1:18 PM Dennis Suhari 
> wrote:
>
>> We are using Pycharm resp. R Studio with Spark libraries to submit Spark
>> Jobs.
>>
>> Von meinem iPhone gesendet
>>
>> Am 07.04.2020 um 18:10 schrieb yeikel valdes :
>>
>> 
>>
>> Zeppelin is not an IDE but a notebook.  It is helpful to experiment but
>> it is missing a lot of the features that we expect from an IDE.
>>
>> Thanks for sharing though.
>>
>>  On Tue, 07 Apr 2020 04:45:33 -0400 * zahidr1...@gmail.com
>>  * wrote 
>>
>> When I first logged on I asked if there was a suitable IDE for Spark.
>> I did get a couple of responses.
>> *Thanks.*
>>
>> I did actually find one which is suitable IDE for spark.
>> That is  *Apache Zeppelin.*
>>
>> One of many reasons it is suitable for Apache Spark is.
>> The  *up and running Stage* which involves typing *bin/zeppelin-daemon.sh
>> start*
>> Go to browser and type *http://localhost:8080 *
>> That's it!
>>
>> Then to
>> * Hit the ground running*
>> There are also ready to go Apache Spark examples
>> showing off the type of functionality one will be using in real life
>> production.
>>
>> Zeppelin comes with  embedded Apache Spark  and scala as default
>> interpreter with 20 + interpreters.
>> I have gone on to discover there are a number of other advantages for
>> real time production
>> environment with Zeppelin offered up by other Apache Products.
>>
>> Backbutton.co.uk
>> ¯\_(ツ)_/¯
>> ♡۶Java♡۶RMI ♡۶
>> Make Use Method {MUM}
>> makeuse.org
>> 
>>
>>
>>
>
> --
>
>
> Prof. Luiz Camargo
> Educador - Computação
>
>
>


Re: Serialization or internal functions?

2020-04-07 Thread Som Lima
While the SparkSession is running, go to localhost:4040.

Select Stages from the menu.

Select the job you are interested in.

You can select additional metrics,
including the DAG visualisation.





On Tue, 7 Apr 2020, 17:14 yeikel valdes,  wrote:

> Thanks for your input Soma , but I am actually looking to understand the
> differences and not only on the performance.
>
>  On Sun, 05 Apr 2020 02:21:07 -0400 * somplastic...@gmail.com
>  * wrote 
>
> If you want to  measure optimisation in terms of time taken , then here is
> an idea  :)
>
>
> public class MyClass {
> public static void main(String args[])
> throws InterruptedException
> {
>   long start  =  System.currentTimeMillis();
>
> // replace with your add column code
> // enough data to measure
>Thread.sleep(5000);
>
>  long end  = System.currentTimeMillis();
>
>int timeTaken = 0;
>   timeTaken = (int) (end  - start );
>
>   System.out.println("Time taken  " + timeTaken) ;
> }
> }
>
> On Sat, 4 Apr 2020, 19:07 ,  wrote:
>
> Dear Community,
>
>
>
> Recently, I had to solve the following problem “for every entry of a
> Dataset[String], concat a constant value” , and to solve it, I used
> built-in functions :
>
>
>
> val data = Seq("A","b","c").toDS
>
>
>
> scala> data.withColumn("valueconcat",concat(col(data.columns.head),lit("
> "),lit("concat"))).select("valueconcat").explain()
>
> == Physical Plan ==
>
> LocalTableScan [valueconcat#161]
>
>
>
> As an alternative , a much simpler version of the program is to use map,
> but it adds a serialization step that does not seem to be present for the
> version above :
>
>
>
> scala> data.map(e=> s"$e concat").explain
>
> == Physical Plan ==
>
> *(1) SerializeFromObject [staticinvoke(class
> org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0,
> java.lang.String, true], true, false) AS value#92]
>
> +- *(1) MapElements , obj#91: java.lang.String
>
>+- *(1) DeserializeToObject value#12.toString, obj#90: java.lang.String
>
>   +- LocalTableScan [value#12]
>
>
>
> Is this over-optimization or is this the right way to go?
>
>
>
> As a follow up , is there any better API to get the one and only column
> available in a DataSet[String] when using built-in functions?
> “col(data.columns.head)” works but it is not ideal.
>
>
>
> Thanks!
>
>
>
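
On the follow-up in the quoted question above about addressing the single column
of a Dataset[String]: that column is named "value" (it shows up as value#12 in the
explain output above), so col("value") can stand in for col(data.columns.head).
A small, unvalidated sketch, assuming a spark-shell session where
spark.implicits._ is already in scope:

import org.apache.spark.sql.functions.{col, concat, lit}

// Sketch only: a Dataset[String] built with toDS has a single column named "value".
val data = Seq("A", "b", "c").toDS()
data.select(concat(col("value"), lit(" "), lit("concat")).as("valueconcat")).explain()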


Re: Scala version compatibility

2020-04-06 Thread Som Lima
Those who followed best practices in software development
would start with a clean environment, i.e. an installation of the
operating system, then install the development tools, keeping a record of
version numbers, so that at deployment time unforeseen errors are
avoided by duplicating the development environment, test environment and final
client runtime environment.

That is now the purpose of Docker.

So if you want to run the same source code or byte code in different environments,
then you should run it. If you can programme and ask this question, then you
should know how to.








On Mon, 6 Apr 2020, 20:50 Andrew Melo,  wrote:

> Hello all,
>
> I'm aware that Scala is not binary compatible between revisions. I have
> some Java code whose only Scala dependency is the transitive dependency
> through Spark. This code calls a Spark API which returns a Seq, which
> I then convert into a List with
> JavaConverters.seqAsJavaListConverter. Will this usage cause binary
> incompatibility if the jar is compiled in one Scala version and executed in
> another?
>
> I tried grokking
> https://docs.scala-lang.org/overviews/core/binary-compatibility-of-scala-releases.html,
> and wasn't quite able to make heads or tails of this particular case.
>
> Thanks!
> Andrew
>
>
>


Re: HDFS file hdfs://127.0.0.1:9000/hdfs/spark/examples/README.txt

2020-04-06 Thread Som Lima
OK, try this one instead (link below).

It has both an EXIT, which we know is rude and abusive instead of
graceful structured programming, and also includes half-hearted user input
validation.

Do you think millions of Spark users download and test these programmes and
repeat this rude programming behaviour?

I don't think they have any coding rules like the safety-critical software
industry,
but they do have strict emailing rules.

Do you think email rules are far more important than programming rules and
guidelines?


https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/clickstream/PageViewStream.scala



On Mon, 6 Apr 2020, 07:04 jane thorpe,  wrote:

> Hi Som ,
>
> Did you know that simple demo program of reading characters from file
> didn't work ?
> Who wrote that simple hello world type little program ?
>
> jane thorpe
> janethor...@aol.com
>
>
> -Original Message-
> From: jane thorpe 
> To: somplasticllc ; user 
> Sent: Fri, 3 Apr 2020 2:44
> Subject: Re: HDFS file hdfs://
> 127.0.0.1:9000/hdfs/spark/examples/README.txt
>
>
> Thanks darling
>
> I tried this and worked
>
> hdfs getconf -confKey fs.defaultFS
> hdfs://localhost:9000
>
>
> scala> :paste
> // Entering paste mode (ctrl-D to finish)
>
> val textFile = sc.textFile("hdfs://
> 127.0.0.1:9000/hdfs/spark/examples/README.txt")
> val counts = textFile.flatMap(line => line.split(" "))
>  .map(word => (word, 1))
>  .reduceByKey(_ + _)
> counts.saveAsTextFile("hdfs://
> 127.0.0.1:9000/hdfs/spark/examples/README7.out")
>
> // Exiting paste mode, now interpreting.
>
> textFile: org.apache.spark.rdd.RDD[String] = hdfs://
> 127.0.0.1:9000/hdfs/spark/examples/README.txt MapPartitionsRDD[91] at
> textFile at :27
> counts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[94] at
> reduceByKey at :30
>
> scala> :quit
>
>
> jane thorpe
> janethor...@aol.com
>
>
> -Original Message-
> From: Som Lima 
> CC: user 
> Sent: Tue, 31 Mar 2020 23:06
> Subject: Re: HDFS file
>
> Hi Jane
>
> Try this example
>
>
> https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/HdfsWordCount.scala
>
>
> Som
>
> On Tue, 31 Mar 2020, 21:34 jane thorpe, 
> wrote:
>
> hi,
>
> Are there setup instructions on the website for
> spark-3.0.0-preview2-bin-hadoop2.7
> I can run same program for hdfs format
>
> val textFile = sc.textFile("hdfs://...")
> val counts = textFile.flatMap(line => line.split(" "))
>  .map(word => (word, 1))
>  .reduceByKey(_ + _)
> counts.saveAsTextFile("hdfs://...")
>
>
>
> val textFile = sc.textFile("/data/README.md")
> val counts = textFile.flatMap(line => line.split(" "))
>  .map(word => (word, 1))
>  .reduceByKey(_ + _)
> counts.saveAsTextFile("/data/wordcount")
>
> textFile: org.apache.spark.rdd.RDD[String] = /data/README.md
> MapPartitionsRDD[23] at textFile at :28
>
> counts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[26] at 
> reduceByKey at :31
>
> br
> Jane
>
>


Re: Serialization or internal functions?

2020-04-05 Thread Som Lima
If you want to measure optimisation in terms of time taken, then here is
an idea :)


public class MyClass {
public static void main(String args[])
throws InterruptedException
{
  long start  =  System.currentTimeMillis();

// replace with your add column code
// enough data to measure
   Thread.sleep(5000);

 long end  = System.currentTimeMillis();

   int timeTaken = 0;
  timeTaken = (int) (end  - start );

  System.out.println("Time taken  " + timeTaken) ;
}
}
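
The same idea in Scala, wrapped around an actual Spark action so that the lazy
plan really executes inside the timed block (a sketch only, reusing the data,
concat, col and lit names from the quoted question below; the timed helper is
hypothetical):

import org.apache.spark.sql.functions.{col, concat, lit}

// Sketch: time an arbitrary block; the count() action forces the plan to run,
// otherwise only the (cheap) planning step would be measured.
def timed[T](label: String)(block: => T): T = {
  val start = System.currentTimeMillis()
  val result = block
  println(s"$label took ${System.currentTimeMillis() - start} ms")
  result
}

timed("withColumn + concat") {
  data.withColumn("valueconcat",
    concat(col(data.columns.head), lit(" "), lit("concat"))).count()
}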

On Sat, 4 Apr 2020, 19:07 ,  wrote:

> Dear Community,
>
>
>
> Recently, I had to solve the following problem “for every entry of a
> Dataset[String], concat a constant value” , and to solve it, I used
> built-in functions :
>
>
>
> val data = Seq("A","b","c").toDS
>
>
>
> scala> data.withColumn("valueconcat",concat(col(data.columns.head),lit("
> "),lit("concat"))).select("valueconcat").explain()
>
> == Physical Plan ==
>
> LocalTableScan [valueconcat#161]
>
>
>
> As an alternative , a much simpler version of the program is to use map,
> but it adds a serialization step that does not seem to be present for the
> version above :
>
>
>
> scala> data.map(e=> s"$e concat").explain
>
> == Physical Plan ==
>
> *(1) SerializeFromObject [staticinvoke(class
> org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0,
> java.lang.String, true], true, false) AS value#92]
>
> +- *(1) MapElements , obj#91: java.lang.String
>
>+- *(1) DeserializeToObject value#12.toString, obj#90: java.lang.String
>
>   +- LocalTableScan [value#12]
>
>
>
> Is this over-optimization or is this the right way to go?
>
>
>
> As a follow up , is there any better API to get the one and only column
> available in a DataSet[String] when using built-in functions?
> “col(data.columns.head)” works but it is not ideal.
>
>
>
> Thanks!
>


Re: HDFS file

2020-03-31 Thread Som Lima
Hi Jane

Try this example

https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/HdfsWordCount.scala


Som

On Tue, 31 Mar 2020, 21:34 jane thorpe,  wrote:

> hi,
>
> Are there setup instructions on the website for
> spark-3.0.0-preview2-bin-hadoop2.7
> I can run same program for hdfs format
>
> val textFile = sc.textFile("hdfs://...")
> val counts = textFile.flatMap(line => line.split(" "))
>  .map(word => (word, 1))
>  .reduceByKey(_ + _)
> counts.saveAsTextFile("hdfs://...")
>
>
>
> val textFile = sc.textFile("/data/README.md")
> val counts = textFile.flatMap(line => line.split(" "))
>  .map(word => (word, 1))
>  .reduceByKey(_ + _)
> counts.saveAsTextFile("/data/wordcount")
>
> textFile: org.apache.spark.rdd.RDD[String] = /data/README.md
> MapPartitionsRDD[23] at textFile at :28
>
> counts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[26] at 
> reduceByKey at :31
>
> br
> Jane
>