Many thanks, Jakob.

So it basically boils down to the version below, with a period before each
method call as you suggested, which looks clearer:

val errlog = sc.textFile("/unix_files/*.ksh")
errlog.filter(line => line.contains("sed")).collect().foreach(line => println(line))

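For anyone reading along without a Spark shell to hand, here is a minimal sketch of the same chain on an ordinary Scala collection. It assumes a plain `Seq[String]` as a stand-in for the RDD (so there is no `collect()` step), and the object name and sample lines are made up for illustration. It only aims to show that the dotted form reads the same way and that `foreach(println)` and `foreach(line => println(line))` do the same thing.

```scala
object MethodChainSketch {
  // Made-up sample lines standing in for the contents of the .ksh files.
  // Note that contains("sed") matches substrings, so "sp_spaceused"
  // qualifies too (because of "used").
  val errlog: Seq[String] = Seq(
    "PROGNAME=$(basename $0 | sed -e s/.ksh//)",
    "exec sp_spaceused",
    "echo done"
  )

  // One period before every method call; the lambda is written out in full.
  val matches: Seq[String] = errlog.filter(line => line.contains("sed"))

  // The same filter using Scala's placeholder syntax for the lambda.
  val matchesShort: Seq[String] = errlog.filter(_.contains("sed"))

  def main(args: Array[String]): Unit = {
    // Both statements print the same lines: passing the method name
    // `println` is shorthand for `line => println(line)`.
    matches.foreach(line => println(line))
    matches.foreach(println)
  }
}
```

With a real RDD the only difference should be the extra `.collect()` before `.foreach(...)`, as in the line above.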
Regards,
Mich
On 10/02/2016 23:21, Jakob Odersky wrote:
> Hi Mich,
> your assumptions 1 to 3 are all correct (nitpick: they're method
> *calls*, the methods being the part before the parentheses, but I
> assume that's what you meant). The last one is also a method call but
> uses syntactic sugar on top: `foreach(println)` boils down to
> `foreach(line => println(line))`.
>
> On an unrelated side-note, I would suggest you add a period between
> every method call, it makes things easier to read and is actually
> required in certain circumstances. Specifically I would add a period
> before collect() and foreach().
>
> best,
> --Jakob
>
> On Wed, Feb 10, 2016 at 2:35 PM, Mich Talebzadeh
> <[email protected]> wrote:
>> Hi Chandeep
>>
>> Many thanks for your help
>>
>> In the line below
>>
>> errlog.filter(line => line.contains("sed"))collect()foreach(println)
>>
>> Can you please clarify the components with the correct naming as I am
>> new to Scala
>>
>> errlog --> is the RDD?
>> filter(line => line.contains("sed")) is a method
>> collect() is another method ?
>> foreach (println) ?
>>
>> Thanks
>>
>> On 10/02/2016 21:28, Chandeep Singh wrote:
>>> Hi Mich,
>>>
>>> If you would like to print everything to the console you could -
>>> errlog.filter(line => line.contains("sed"))collect()foreach(println)
>>> or you could always save to a file using any of the saveAs methods.
>>>
>>> Thanks,
>>> Chandeep
>>>
>>> On Wed, Feb 10, 2016 at 8:14 PM, <[email protected]> wrote:
>>>> Hi,
>>>>
>>>> I have a bunch of files stored in hdfs /unit_files directory in
>>>> total 319 files
>>>>
>>>> scala> val errlog = sc.textFile("/unix_files/*.ksh")
>>>>
>>>> scala> errlog.filter(line => line.contains("sed"))count()
>>>> res104: Long = 1113
>>>>
>>>> So it returns 1113 instances the word "sed"
>>>>
>>>> If I want to see the collection I can do
>>>>
>>>> scala> errlog.filter(line => line.contains("sed"))collect()
>>>> res105: Array[String] = Array(" DSQUERY=${1} ; DBNAME=${2} ; ERROR=0 ;
>>>> PROGNAME=$(basename $0 | sed -e s/.ksh//)",
>>>> # . in environment based on argument for script.,
>>>> " exec sp_spaceused", " exec sp_spaceused",
>>>> PROGNAME=$(basename $0 | sed -e s/.ksh//),
>>>> " BACKUPSERVER=$5 # Server that is used to load the transaction dump",
>>>> " BACKUPSERVER=$5 # Server that is used to load the transaction dump",
>>>> " BACKUPSERVER=$5 # Server that is used to load the transaction dump",
>>>> " cat $TMPDIR/${DBNAME}_trandump.sql | sed
>>>> s/${DSQUERY}/${REMOTESERVER}/ > $TMPDIR/${DBNAME}_trandump.tmpsql",
>>>> cat $TMPDIR/${DBNAME}_tran_transfer.sql | sed
>>>> s/${DSQUERY}/${REMOTESERVER}/ > $TMPDIR/${DBNAME}_tran_transfer.tmpsql,
>>>> PROGNAME=$(basename $0 | sed -e s/.ksh//), " B...
>>>> scala>
>>>>
>>>> Now is there anyway I can retrieve all these instances or perhaps
>>>> they are all wrapped up and I only see few lines?
>>>>
>>>> Thanks,
>>>>
>>>> Mich
>>>>
>>>> --
>>>> Dr Mich Talebzadeh
>>>> LinkedIn
>>>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw [1]
>>>> http://talebzadehmich.wordpress.com [2]
>>>>
>>>> NOTE: The information in this email is proprietary and confidential.
>>>> This message is for the designated recipient only, if you are not the
>>>> intended recipient, you should destroy it immediately. Any information
>>>> in this message shall not be understood as given or endorsed by Cloud
>>>> Technology Partners Ltd, its subsidiaries or their employees, unless
>>>> expressly so stated. It is the responsibility of the recipient to
>>>> ensure that this email is virus free, therefore neither Cloud
>>>> Technology partners Ltd, its subsidiaries nor their employees accept
>>>> any responsibility.

--
Dr Mich Talebzadeh
LinkedIn
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com
Links:
------
[1]
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
[2] http://talebzadehmich.wordpress.com