FYI, please take a look at test_column_operators in python/pyspark/sql/tests.py.
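
For the filter asked about below, Column operators can probably be chained to get the [:20] behaviour. This is only a minimal sketch, not tested against 1.3.0: it assumes the DataFrame has a string column named request and that your PySpark version's Column exposes substr and contains. The helper names prefix_contains and filter_prefix_contains are made up for illustration.

```python
def prefix_contains(text, token, width=20):
    """Plain-Python reference for the predicate:
    is `token` found within the first `width` characters of `text`?"""
    return text is not None and token in text[:width]


def filter_prefix_contains(df, token="page_row", width=20):
    """Keep rows of `df` whose `request` column has `token`
    in its first `width` characters.

    Assumes `df` is a PySpark DataFrame with a string column `request`.
    Column.substr is 1-based, so substr(1, width) plays the role of
    Python's [:width].
    """
    return df.filter(df["request"].substr(1, width).contains(token))
```

With the DataFrame from the question this would be called as pages = filter_prefix_contains(site_logs). Building the condition from a column of the same DataFrame that is being filtered keeps the column reference and the DataFrame consistent.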

On Sat, Nov 14, 2015 at 11:49 PM, YaoPau <[email protected]> wrote:

> I'm using pyspark 1.3.0, and struggling with what should be simple.
> Basically, I'd like to run this:
>
> site_logs.filter(lambda r: 'page_row' in r.request[:20])
>
> meaning that I want to keep rows that have 'page_row' in the first 20
> characters of the request column. The following is the closest I've come
> up with:
>
> pages = site_logs.filter("request like '%page_row%'")
>
> but that's missing the [:20] part. If I instead try the .like function
> from the Column API:
>
> birf.filter(birf.request.like('bi_page')).take(5)
>
> I get... Py4JJavaError: An error occurred while calling o71.filter.
> : org.apache.spark.sql.AnalysisException: resolved attributes request
> missing from
> user_agent,status_code,log_year,bytes,log_month,request,referrer
>
> What is the code to run this filter, and what are some recommended ways to
> learn the Spark SQL syntax?
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-filter-if-column-substring-does-not-contain-a-string-tp25385.html
