Re: Spark SQL: filter if column substring does not contain a string

Ted Yu Sun, 15 Nov 2015 02:09:35 -0800

Please take a look at test_column_operators in python/pyspark/sql/tests.py

FYI
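
For the [:20] part, here is a minimal sketch of one way to build the condition as a string, since filter() already accepts a SQL-style string as in the quoted message below. The helper name substring_filter_expr is hypothetical (it is not part of pyspark), and the sketch assumes the SQL parser in your Spark version accepts substr(col, start, len) with a 1-based start; that has not been checked against 1.3.0. Also note that _ is a single-character wildcard in LIKE, so '%page_row%' is slightly looser than Python's `in`.

```python
def substring_filter_expr(column, needle, length, negate=False):
    """Build a SQL-style condition: the first `length` chars of `column` contain `needle`.

    Hypothetical helper for illustration only. SQL substr() is 1-based,
    so substr(col, 1, 20) corresponds to Python's col[:20].
    """
    if "'" in needle:
        # Keep the sketch simple: refuse input that would end the SQL string literal.
        raise ValueError("needle must not contain a single quote")
    op = "NOT LIKE" if negate else "LIKE"
    return "substr({0}, 1, {1}) {2} '%{3}%'".format(column, int(length), op, needle)


# Condition for "keep rows with 'page_row' in the first 20 characters of request".
keep_expr = substring_filter_expr("request", "page_row", 20)

# Condition for the "does not contain" case from the subject line.
drop_expr = substring_filter_expr("request", "page_row", 20, negate=True)
```

Something like site_logs.filter(keep_expr) would then apply it. With the Column API the same idea might be written as site_logs.filter(site_logs.request.substr(1, 20).contains('page_row')), assuming Column.substr and Column.contains exist in your version; the test mentioned above is the place to confirm which operators are available.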


On Sat, Nov 14, 2015 at 11:49 PM, YaoPau <[email protected]> wrote:

> I'm using pyspark 1.3.0, and struggling with what should be simple.
> Basically, I'd like to run this:
>
> site_logs.filter(lambda r: 'page_row' in r.request[:20])
>
> meaning that I want to keep rows that have 'page_row' in the first 20
> characters of the request column.  The following is the closest I've come
> up with:
>
> pages = site_logs.filter("request like '%page_row%'")
>
> but that's missing the [:20] part.  If I instead try the .like function
> from the Column API:
>
> birf.filter(birf.request.like('bi_page')).take(5)
>
> I get... Py4JJavaError: An error occurred while calling o71.filter.
> : org.apache.spark.sql.AnalysisException: resolved attributes request missing
> from user_agent,status_code,log_year,bytes,log_month,request,referrer
>
>
> What is the code to run this filter, and what are some recommended ways to
> learn the Spark SQL syntax?
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-filter-if-column-substring-does-not-contain-a-string-tp25385.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
