Hi Mich,

Your assumptions 1 to 3 are all correct (nitpick: they're method *calls*, the method being the part before the parentheses, but I assume that's what you meant). The last one is also a method call, but with syntactic sugar on top: `foreach(println)` boils down to `foreach(line => println(line))`.
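A minimal sketch of the same chain in plain Scala, with each call named. It assumes a local `Seq[String]` with a few hypothetical sample lines standing in for the RDD `errlog`, so it runs without Spark. Because the `collect` on Scala's own collections takes a partial function, `toArray` is used here as a stand-in for the RDD's `collect()`, which brings the results back to the driver as an `Array[String]`.

```scala
object CallChainSketch {
  // Hypothetical sample lines standing in for the contents of the RDD `errlog`.
  val errlog: Seq[String] = Seq(
    "PROGNAME=$(basename $0 | sed -e s/.ksh//)",
    "exec sp_spaceused",
    "ERROR=0"
  )

  // Call 1: filter(...) keeps only the lines that contain the substring "sed".
  // contains is a plain substring test, so "sp_spaceused" matches too
  // ("used" contains "sed").
  val matches: Seq[String] = errlog.filter(line => line.contains("sed"))

  // Call 2: stand-in for RDD.collect(); materialises the matches as an Array[String].
  val collected: Array[String] = matches.toArray

  def main(args: Array[String]): Unit = {
    // Call 3: foreach(println) is shorthand for the explicit lambda below;
    // both print every element on its own line (so the matches appear twice here).
    collected.foreach(println)
    collected.foreach(line => println(line))
  }
}
```

Printing each element this way shows every match in full, whereas the REPL's summary of a large `Array[String]` result is cut off.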
On an unrelated side note, I would suggest adding a period between method calls: it makes things easier to read and is actually required in certain circumstances. Specifically, I would add a period before collect() and foreach(), i.e. `errlog.filter(line => line.contains("sed")).collect().foreach(println)`.

Best,
--Jakob

On Wed, Feb 10, 2016 at 2:35 PM, Mich Talebzadeh <mich.talebza...@cloudtechnologypartners.co.uk> wrote:
>
> Hi Chandeep
>
> Many thanks for your help
>
> In the line below
>
> errlog.filter(line => line.contains("sed"))collect()foreach(println)
>
> Can you please clarify the components with the correct naming as I am new to
> Scala
>
> errlog --> is the RDD?
> filter(line => line.contains("sed")) is a method
> collect() is another method ?
> foreach (println) ?
>
> Thanks
>
> On 10/02/2016 21:28, Chandeep Singh wrote:
>
> Hi Mich,
>
> If you would like to print everything to the console you could -
> errlog.filter(line => line.contains("sed"))collect()foreach(println)
>
> or you could always save to a file using any of the saveAs methods.
>
> Thanks,
> Chandeep
>
> On Wed, Feb 10, 2016 at 8:14 PM,
> <mich.talebza...@cloudtechnologypartners.co.uk> wrote:
>>
>> Hi,
>>
>> I have a bunch of files stored in hdfs /unit_files directory in total 319
>> files
>>
>> scala> val errlog = sc.textFile("/unix_files/*.ksh")
>>
>> scala> errlog.filter(line => line.contains("sed"))count()
>> res104: Long = 1113
>>
>> So it returns 1113 instances the word "sed"
>>
>> If I want to see the collection I can do
>>
>> scala> errlog.filter(line => line.contains("sed"))collect()
>>
>> res105: Array[String] = Array(" DSQUERY=${1} ;
>> DBNAME=${2} ; ERROR=0 ; PROGNAME=$(basename $0 | sed -e s/.ksh//)", # .
>> in environment based on argument for script., " exec sp_spaceused", "
>> exec sp_spaceused", PROGNAME=$(basename $0 | sed -e s/.ksh//), "
>> BACKUPSERVER=$5 # Server that is used to load the transaction dump",
>> " BACKUPSERVER=$5 # Server that is used to load the
>> transaction dump", " BACKUPSERVER=$5 # Server that is used to
>> load the transaction dump", " cat $TMPDIR/${DBNAME}_trandump.sql | sed
>> s/${DSQUERY}/${REMOTESERVER}/ > $TMPDIR/${DBNAME}_trandump.tmpsql", cat
>> $TMPDIR/${DBNAME}_tran_transfer.sql | sed s/${DSQUERY}/${REMOTESERVER}/ >
>> $TMPDIR/${DBNAME}_tran_transfer.tmpsql, PROGNAME=$(basename $0 | sed -e
>> s/.ksh//), " B...
>> scala>
>>
>> Now is there anyway I can retrieve all these instances or perhaps they are
>> all wrapped up and I only see few lines?
>>
>> Thanks,
>>
>> Mich
>
> --
>
> Dr Mich Talebzadeh
>
> LinkedIn
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com