I'm using PySpark 1.3.0 and struggling with what should be simple.
Basically, I'd like to run this:

site_logs.filter(lambda r: 'page_row' in r.request[:20])

meaning that I want to keep rows that have 'page_row' in the first 20
characters of the request column.  The following is the closest I've come up
with:

pages = site_logs.filter("request like '%page_row%'")

but that's missing the [:20] part.  If I instead try the .like function from
the Column API:

birf.filter(birf.request.like('bi_page')).take(5)

I get:

Py4JJavaError: An error occurred while calling o71.filter.
: org.apache.spark.sql.AnalysisException: resolved attributes request
missing from user_agent,status_code,log_year,bytes,log_month,request,referrer


What is the code to run this filter, and what are some recommended ways to
learn the Spark SQL syntax?
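For what it's worth, in the 1.3 Column API there is a substr(startPos, length) method, so one candidate (unverified on my end) would be something like site_logs.filter(site_logs.request.substr(1, 20).like('%page_row%')). The predicate I'm after is equivalent to this plain-Python check (sample rows are made up for illustration):

```python
# Hypothetical sample rows standing in for the site_logs DataFrame
logs = [
    {"request": "GET /page_row/123 HTTP/1.1"},       # 'page_row' within first 20 chars
    {"request": "GET /something/long/page_row"},     # 'page_row' starts at position 20
]

def keep(row):
    # Keep rows where 'page_row' appears in the first 20 characters of request
    return "page_row" in row["request"][:20]

kept = [r for r in logs if keep(r)]
print(len(kept))  # only the first row qualifies
```

The substr-plus-like version should express the same thing, assuming like('%page_row%') is applied to the truncated column rather than the full string.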



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-filter-if-column-substring-does-not-contain-a-string-tp25385.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
