Hi,
the RDD class does not have an exists() method (in the Scala API), but
the functionality you need seems easy to reproduce with the existing methods:

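val containsNMatchingElements =
  data.filter(qualifying_function).take(n).length >= n

Note: take(n) is an action that returns a local Array of at most n
elements, so the check uses length rather than count(). I have not
measured how much the intermediate take(n) helps, but the idea is to cap
the number of elements before counting, because we are not interested in
the full count. As far as I know, take(n) scans partitions incrementally
and stops once it has collected n elements.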
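
A minimal sketch of the n-matches check, in case it helps to try it out locally. The object name TakeNSketch, the helper containsNMatching, the local[2] master, and the sample data are all assumptions made for illustration; none of them are part of the Spark API.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object TakeNSketch {
  // Hypothetical helper: true if at least n elements of data satisfy p.
  // take(n) returns a local Array with at most n elements, so we compare its length.
  def containsNMatching[T](data: RDD[T], n: Int)(p: T => Boolean): Boolean =
    data.filter(p).take(n).length >= n

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a quick test run.
    val conf = new SparkConf().setAppName("take-n-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val data = sc.parallelize(1 to 100)
      // There are exactly 10 multiples of 10 in 1..100.
      println(containsNMatching(data, 3)(_ % 10 == 0))  // prints "true"
      println(containsNMatching(data, 11)(_ % 10 == 0)) // prints "false"
    } finally {
      sc.stop()
    }
  }
}
```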

If you need to check specifically whether there is at least one matching
occurrence, it is probably preferable to use isEmpty() instead of
count() and check whether the result is false:

val contains1MatchingElement = !(data.filter(qualifying_function).isEmpty())
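
Applied to the word-in-a-text-file question from the original mail, the isEmpty() variant might look roughly like the sketch below. The object name WordExistsSketch, the helper containsWord, and the in-memory sample lines (used here instead of a real file) are assumptions for illustration, as is splitting on whitespace to obtain words.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object WordExistsSketch {
  // Hypothetical helper: true if any whitespace-separated token equals word.
  def containsWord(lines: RDD[String], word: String): Boolean =
    !lines.flatMap(_.split("\\s+")).filter(_ == word).isEmpty()

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a quick test run.
    val conf = new SparkConf().setAppName("word-exists-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Stand-in for sc.textFile(...) so the sketch needs no input file.
      val lines = sc.parallelize(Seq("to be or not to be", "that is the question"))
      println(containsWord(lines, "question")) // prints "true"
      println(containsWord(lines, "spark"))    // prints "false"
    } finally {
      sc.stop()
    }
  }
}
```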

Best,
Carsten



On 31.07.2015 at 11:11, Sandeep Giri wrote:
> Dear Spark Dev Community,
> 
> I am wondering if there is already a function to solve my problem. If
> not, then should I work on this?
> 
> Say you just want to check if a word exists in a huge text file. I could
> not find better ways than those mentioned here
> <http://www.knowbigdata.com/blog/interview-questions-apache-spark-part-2#q6>. 
> 
> So, I propose adding a function called exists to RDD with
> the following signature:
> 
> # Returns true if n elements exist that satisfy our criteria.
> # The qualifying function would receive the element and its index and
> # return true or false.
> def exists(qualifying_function, n):
>      ....
> 
> 
> Regards,
> Sandeep Giri,
> 

-- 
Carsten Schnober
Doctoral Researcher
Ubiquitous Knowledge Processing (UKP) Lab
FB 20 / Computer Science Department
Technische Universität Darmstadt
Hochschulstr. 10, D-64289 Darmstadt, Germany
phone [+49] (0)6151 16-6227, fax -5455, room S2/02/B111
schno...@ukp.informatik.tu-darmstadt.de
www.ukp.tu-darmstadt.de

Web Research at TU Darmstadt (WeRC): www.werc.tu-darmstadt.de
GRK 1994: Adaptive Preparation of Information from Heterogeneous Sources
(AIPHES): www.aiphes.tu-darmstadt.de
PhD program: Knowledge Discovery in Scientific Literature (KDSL)
www.kdsl.tu-darmstadt.de
