Hi Mich,

If you would like to print everything to the console, you could use:

errlog.filter(line => line.contains("sed")).collect().foreach(println)

or you could always save to a file using any of the saveAs methods.
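
As a rough sketch of both options, assuming you are in spark-shell (so `sc` already exists) and reusing the path from your message: the output directory /tmp/sed_lines and the sample size of 20 are made-up values for illustration, not anything from your setup.

```scala
// Sketch for spark-shell, where `sc` (the SparkContext) is predefined.
val errlog = sc.textFile("/unix_files/*.ksh")
val sedLines = errlog.filter(line => line.contains("sed"))

// Print a bounded sample on the driver rather than collecting every match.
// The sample size 20 is an arbitrary choice for illustration.
sedLines.take(20).foreach(println)

// Write all matching lines out as text. The target path is hypothetical
// and must not already exist; Spark creates it as a directory of part files.
sedLines.saveAsTextFile("/tmp/sed_lines")
```

The REPL shortens long results with "...", which is why the Array looked cut off; printing line by line or saving to a file avoids that.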

Thanks,
Chandeep

On Wed, Feb 10, 2016 at 8:14 PM, <
mich.talebza...@cloudtechnologypartners.co.uk> wrote:

>
>
> Hi,
>
> I have a bunch of files stored in the HDFS /unit_files directory, 319 files
> in total.
> scala> val errlog = sc.textFile("/unix_files/*.ksh")
>
> scala>  errlog.filter(line => line.contains("sed"))count()
> res104: Long = 1113
> So it returns 1113 lines containing the word "sed".
>
> If I want to see the collection I can do
>
>
> *scala>  errlog.filter(line => line.contains("sed"))collect()*
>
> res105: Array[String] = Array("                         DSQUERY=${1} ; 
> DBNAME=${2} ; ERROR=0 ; PROGNAME=$(basename $0 | sed -e s/.ksh//)", #    . in 
> environment based on argument for script., "       exec sp_spaceused", "      
>   exec sp_spaceused", PROGNAME=$(basename $0 | sed -e s/.ksh//), "        
> BACKUPSERVER=$5        # Server that is used to load the transaction dump", " 
>        BACKUPSERVER=$5         # Server that is used to load the transaction 
> dump", "        BACKUPSERVER=$5         # Server that is used to load the 
> transaction dump", "    cat $TMPDIR/${DBNAME}_trandump.sql | sed 
> s/${DSQUERY}/${REMOTESERVER}/ > $TMPDIR/${DBNAME}_trandump.tmpsql", cat 
> $TMPDIR/${DBNAME}_tran_transfer.sql | sed s/${DSQUERY}/${REMOTESERVER}/ > 
> $TMPDIR/${DBNAME}_tran_transfer.tmpsql, PROGNAME=$(basename $0 | sed -e 
> s/.ksh//), "        B...
> scala>
>
>
> Now, is there any way I can retrieve all these instances, or are they
> all wrapped up so that I only see a few lines?
>
> Thanks,
>
> Mich
>
>
