Re: Kotlin Spark API

2020-07-14 Thread Anwar AliKhan
Is Kotlin another new language? GRADY BOOCH: The United States Department of Defense (DoD) is perhaps the largest user of computers in the world. By the mid-1970s, software development for its systems had reached crisis proportions: projects were often late, over budget, and they often failed to

Re: Issue in parallelization of CNN model using spark

2020-07-14 Thread Anwar AliKhan
s exact same mathematical notations, examples, etc., so it is a smooth transition from that course. On Tue, 14 Jul 2020, 15:52 Sean Owen, wrote: > It is still copyrighted material, no matter its state of editing. Yes, > you should not be sharing this on the internet. > > On Tue, Jul 1

Re: Issue in parallelization of CNN model using spark

2020-07-14 Thread Anwar AliKhan
book is not freely available. > > I own it and it's wonderful, Mr. Géron deserves to benefit from it. > > On Mon, Jul 13, 2020 at 9:59 PM Anwar AliKhan > wrote: > >> link to a free book which may be useful. >> >> Hands-On Machine Learning with Scikit-Lea

Re: Issue in parallelization of CNN model using spark

2020-07-13 Thread Anwar AliKhan
link to a free book which may be useful. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems by Aurélien Géron https://bit.ly/2zxueGt On 13 Jul 2020, 15:18 Sean Owen, wrote: > There is a multilayer perceptron imple

Re: Issue in parallelization of CNN model using spark

2020-07-13 Thread Anwar AliKhan
This is very useful for me, leading on from week 4 of the Andrew Ng course. On Mon, 13 Jul 2020, 15:18 Sean Owen, wrote: > There is a multilayer perceptron implementation in Spark ML, but > that's not what you're looking for. > To parallelize model training developed using standard libraries like

Re: Blog : Apache Spark Window Functions

2020-07-13 Thread Anwar AliKhan
ur octave app on Apache Spark. You can use Apache Spark on a standalone machine while you prototype, then, with one line of code, change to distributed parallelism across cluster(s) of PCs. On Fri, 10 Jul 2020, 04:50 Anwar AliKhan, wrote: > My opinion would be go here. > >

Re: Blog : Apache Spark Window Functions

2020-07-09 Thread Anwar AliKhan
My opinion would be to go here. https://www.coursera.org/courses?query=machine%20learning%20andrew%20ng Machine Learning by Andrew Ng. After three weeks you will have more valuable skills than most engineers in Silicon Valley in the USA. I am past week 3. 😎🤐 He does go 90 miles per hour. I wish s

Re: When is a Bigint a long and when is a long a long

2020-06-28 Thread Anwar AliKhan
>>> >>> spark.range(10).map(_.toLong).reduce(_+_) >>> >>> If you collect(), you still have an Array[java.lang.Long]. But Scala >>> implicits and conversions make .reduce(_+_) work fine on that; there >>> is no "Java-friendly" overloa

Re: When is a Bigint a long and when is a long a long

2020-06-27 Thread Anwar AliKhan
OK Thanks On Sat, 27 Jun 2020, 17:36 Sean Owen, wrote: > It does not return a DataFrame. It returns Dataset[Long]. > You do not need to collect(). See my email. > > On Sat, Jun 27, 2020, 11:33 AM Anwar AliKhan > wrote: > >> So the range function actually returns BigInt

Re: When is a Bigint a long and when is a long a long

2020-06-27 Thread Anwar AliKhan
_+_) work fine on that; there > is no "Java-friendly" overload in the way. > > Normally all of this just works and you can ignore these differences. > This is a good example of a corner case in which it's inconvenient, > because of the old Java-friendly overloads. This is

When is a Bigint a long and when is a long a long

2020-06-27 Thread Anwar AliKhan
As you know I have been puzzling over this issue: how come spark.range(100).reduce(_+_) worked in earlier Spark versions but not with the most recent versions? Well, when you first create a dataset, by default the column "id" datatype is [BigInt]. It is a bit like a coin Long on one s
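The boxed-versus-primitive distinction this thread circles around can be sketched in plain JDK code, with no Spark required. This is an illustrative analogy, not Spark's implementation: spark.range produces java.lang.Long values (shown as bigint in the schema), while Scala's _ + _ wants the primitive scala.Long, and .map(_.toLong) bridges the two.

```java
import java.util.stream.LongStream;
import java.util.stream.Stream;

// Plain-JDK sketch (no Spark needed) of boxed vs primitive longs.
public class BoxedLongDemo {
    public static void main(String[] args) {
        // Boxed java.lang.Long stream: reduce goes through Long::sum,
        // which unboxes, adds, and re-boxes on every element.
        Long boxedSum = Stream.iterate(0L, n -> n + 1).limit(100)
                              .reduce(0L, Long::sum);

        // Primitive long stream: the analogue of first mapping to a
        // primitive with .map(_.toLong) and then reducing.
        long primitiveSum = LongStream.range(0, 100).sum();

        System.out.println(boxedSum);      // 4950
        System.out.println(primitiveSum);  // 4950
    }
}
```

Both paths sum 0 through 99; the difference is only in which overload can accept the combining function, which is exactly the corner case the thread hit.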

Re: Where are all the jars gone ?

2020-06-25 Thread Anwar AliKhan
lang.Long, java.lang.Long) => java.lang.Long)java.lang.Long cannot be applied to ((java.lang.Long, java.lang.Long) => scala.Long) spark.range(1,101).reduce(_+_) <http://www.backbutton.co.uk/> On Wed, 24 Jun 2020, 19:54 Anwar AliKhan, wrote: > > I am using the method describe on t

Suggested Amendment to ./dev/make-distribution.sh

2020-06-25 Thread Anwar AliKhan
normal 😴 expectation, especially if a project has been going for 10 years. 😤😷 A message to say "these packages are needed but not installed; please wait while packages are being installed" would be helpful to the user experience. 🤗 On Wed, 24 Jun 2020, 16:21 Anwar AliKhan, wrote: > THA

Re: Where are all the jars gone ?

2020-06-24 Thread Anwar AliKhan
g snapshot versions, which is a > different > beast entirely > <https://maven.apache.org/guides/getting-started/index.html#What_is_a_SNAPSHOT_version> > . > > On Wed, Jun 24, 2020 at 10:39 AM Anwar AliKhan > wrote: > >> THANKS >> >> >> It ap

Re: Where are all the jars gone ?

2020-06-24 Thread Anwar AliKhan
sually in > the .m2 directory. > > Hope this helps. > > -ND > On 6/23/20 3:21 PM, Anwar AliKhan wrote: > > Hi, > > I prefer to do most of my projects in Python and for that I use Jupyter. > I have been downloading the compiled version of spark. > > I do not

Re: Error: Vignette re-building failed. Execution halted

2020-06-24 Thread Anwar AliKhan
ww.backbutton.co.uk/> On Wed, 24 Jun 2020, 11:07 Hyukjin Kwon, wrote: > Looks like you haven't installed the 'e1071' package. > > On Wed, 24 Jun 2020 at 18:49, Anwar AliKhan wrote: > >> ./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr >>

Error: Vignette re-building failed. Execution halted

2020-06-24 Thread Anwar AliKhan
./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Phive -Phive-thriftserver -Pmesos -Pyarn -Pkubernetes minor error: the SparkR test failed. I don't use R, so it doesn't affect me. *** installing help indices ** building package indices ** install
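The vignette failure reported here came down to a missing R package, identified in the follow-up reply as 'e1071'. A hedged sketch of installing it from the command line; knitr, rmarkdown, and testthat are assumptions about what SparkR vignette building typically also needs, not confirmed by the thread:

```shell
# 'e1071' is the package the thread confirms was missing; the rest of the
# list is an assumption about common vignette-building dependencies.
Rscript -e 'install.packages(c("e1071", "knitr", "rmarkdown", "testthat"), repos = "https://cloud.r-project.org")'
```

After the packages install, re-running the same ./dev/make-distribution.sh command should get past the vignette re-building step.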

Found jars in /assembly/target/scala-2.12/jars

2020-06-23 Thread Anwar AliKhan

Where are all the jars gone ?

2020-06-23 Thread Anwar AliKhan
Hi, I prefer to do most of my projects in Python and for that I use Jupyter. I have been downloading the compiled version of Spark. I do not normally like the source code version because the build process makes me nervous. You know, with lines of stuff scrolling up the screen. What am I going

Re: Hey good looking toPandas () error stack

2020-06-21 Thread Anwar AliKhan
'Unsupported class file major > version 55' > > I see posts about the Java version being used. Are you sure your configs > are right? > > https://stackoverflow.com/questions/53583199/pyspark-error-unsupported-class-file-major-version > > On Sat, Jun 20, 2020 at 6:17
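The "class file major version" in this error maps linearly to Java releases (major version = Java release + 44), so 55 means the classes were compiled for Java 11 while the running JVM is older. A quick sketch of the arithmetic, plus the usual diagnostic steps (the JDK path shown is a common Linux location, an assumption, not universal):

```shell
# Class file major version = Java release + 44:
echo $((55 - 44))   # 11 -> the JDK the classes were compiled for
echo $((52 - 44))   # 8  -> what a Spark 2.x-era JVM typically expects

# On the affected machine you would then check which JVM pyspark picks up:
# java -version
# export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64   # hypothetical path
```

If the two disagree, pointing JAVA_HOME at a matching JDK (or rebuilding against the older one) is the standard fix.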

Re: Hey good looking toPandas () error stack

2020-06-20 Thread Anwar AliKhan
; 79 raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace) 80 raise 81 return deco IllegalArgumentException: 'Unsupported class file major version 55' On Fri, 19 Jun 2020, 08:06 Stephen Boesch, wrote: > afaik It has b

Re: Hey good looking toPandas ()

2020-06-19 Thread Anwar AliKhan
; On Thu, 18 Jun 2020 at 23:56, Anwar AliKhan > wrote: > >> I first ran the command >> df.show() >> >> For sanity check of my dataFrame. >> >> I wasn't impressed with the display. >> >> I then ran >> df.toPandas() in Jupyter Noteb

Hey good looking toPandas ()

2020-06-18 Thread Anwar AliKhan
I first ran the command df.show() for a sanity check of my DataFrame. I wasn't impressed with the display. I then ran df.toPandas() in a Jupyter Notebook. Now the display is really good looking. Is toPandas() a new function which became available in Spark 3.0?
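To the question asked here: toPandas() long predates Spark 3.0; it collects the whole distributed DataFrame to the driver, and the "good looking" display comes from pandas' rich HTML repr in Jupyter, not from Spark itself. A Spark-free sketch of that effect using pandas alone (the data is made up for illustration):

```python
import pandas as pd

# Stand-in for the result of df.toPandas(): a plain pandas DataFrame.
# Jupyter renders this as an HTML table, which is the display difference
# observed versus df.show()'s ASCII output.
pdf = pd.DataFrame({"id": range(5), "squared": [i * i for i in range(5)]})

print(pdf.shape)             # (5, 2)
print(pdf["squared"].sum())  # 0 + 1 + 4 + 9 + 16 = 30
```

One caveat worth knowing: toPandas() pulls every row onto the driver, so it is only safe on DataFrames small enough to fit in driver memory.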

Add python library

2020-06-06 Thread Anwar AliKhan
" > Have you looked into this article? https://medium.com/@SSKahani/pyspark-applications-dependencies-99415e0df987 " This is weird! I was hanging out here https://machinelearningmastery.com/start-here/ when I came across this post. The weird part is I was just wondering how I can take one of
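The linked article is about shipping Python dependencies to executors; the built-in mechanism for that is spark-submit's --py-files flag. A minimal sketch, where the package and script names are hypothetical placeholders:

```shell
# Bundle a local package and ship it to every executor alongside the job.
# "mypkg" and "app.py" are hypothetical names for illustration.
zip -r deps.zip mypkg/
spark-submit --py-files deps.zip app.py
```

The zip is added to the PYTHONPATH on each executor, so the job can import mypkg as if it were installed locally.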

Re: Spark dataframe hdfs vs s3

2020-05-30 Thread Anwar AliKhan
Optimisation of Spark applications Apache Spark is an in-memory data processing tool widely used in companies to deal with Big Data issues. Running a Spark application in production requires user-defined resources. This article presents several Spark

Re: [pyspark 2.3+] Dedupe records

2020-05-30 Thread Anwar AliKhan
What is meant by "DataFrames are RDDs under the cover"? What is meant by deduplication? Please send your bio-data, history, and past commercial projects. The Wali Ahad agreed to release 300 million USD for a new machine learning research project to centralize government facilities to find a better way to of

Re: Spark Security

2020-05-29 Thread Anwar AliKhan
What is the size of your .tsv file, sir? What is the size of your local hard drive, sir? Regards, Wali Ahaad On Fri, 29 May 2020, 16:21, wrote: > Hello, > > I plan to load in a local .tsv file from my hard drive using sparklyr (an > R package). I have figured out how to do this alread