I'm using pyspark 1.3.0, and struggling with what should be simple. Basically, I'd like to run this:
site_logs.filter(lambda r: 'page_row' in r.request[:20])

meaning that I want to keep rows that have 'page_row' in the first 20 characters of the request column.

The closest I've come up with is:

pages = site_logs.filter("request like '%page_row%'")

but that's missing the [:20] part. If I instead try the .like function from the Column API:

birf.filter(birf.request.like('bi_page')).take(5)

I get:

Py4JJavaError: An error occurred while calling o71.filter.
: org.apache.spark.sql.AnalysisException: resolved attributes request missing from user_agent,status_code,log_year,bytes,log_month,request,referrer

What is the code to run this filter, and what are some recommended ways to learn the Spark SQL syntax?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-filter-if-column-substring-does-not-contain-a-string-tp25385.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
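For reference, the predicate described above can be sketched in plain Python on sample request strings; the commented-out DataFrame line at the end is only an assumption about the Column API (Column.substr and Column.like), not a tested answer for this Spark version:

```python
# Plain-Python illustration of the predicate the post describes:
# keep rows whose first 20 request characters contain 'page_row'.
requests = [
    "GET /page_row/home HTTP/1.1",
    "GET /other/path HTTP/1.1",
    "GET /long/prefix/here/page_row HTTP/1.1",  # match is past char 20
]

kept = [r for r in requests if 'page_row' in r[:20]]
print(kept)  # only the first request survives the [:20] check

# Assumed DataFrame equivalent (unverified here): Column.substr is
# 1-based and takes (start, length); Column.like expects SQL wildcards,
# which would explain why .like('bi_page') with no '%' matched nothing.
# pages = site_logs.filter(site_logs.request.substr(1, 20).like('%page_row%'))
```

Note that the bare string filter `"request like '%page_row%'"` searches the whole column, which is why it cannot express the [:20] restriction on its own.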