Re: may I need a join here?

2022-01-24 Thread Gary Liu
You can use a left anti join instead. isin accepts a list of values, not a
column from another DataFrame.

On Mon, Jan 24, 2022 at 01:38 Bitfox wrote:

> >>> df.show(3)
> +----+-----+
> |word|count|
> +----+-----+
> |  on|    1|
> | dec|    1|
> |2020|    1|
> +----+-----+
> only showing top 3 rows
>
> >>> df2.show(3)
> +--------+-----+
> |stopword|count|
> +--------+-----+
> |    able|    1|
> |   about|    1|
> |   above|    1|
> +--------+-----+
> only showing top 3 rows
>
> >>> df3 = df.filter(~col("word").isin(df2.stopword))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/opt/spark/python/pyspark/sql/dataframe.py", line 1733, in filter
>     jdf = self._jdf.filter(condition._jc)
>   File "/opt/spark/python/lib/py4j-0.10.9.2-src.zip/py4j/java_gateway.py", line 1310, in __call__
>   File "/opt/spark/python/pyspark/sql/utils.py", line 117, in deco
>     raise converted from None
> pyspark.sql.utils.AnalysisException: Resolved attribute(s) stopword#4 missing from word#0,count#1L in operator !Filter NOT word#0 IN (stopword#4).;
> !Filter NOT word#0 IN (stopword#4)
> +- LogicalRDD [word#0, count#1L], false
>
> The filter method doesn't work here.
> Maybe I need to join the two DataFrames?
> What's the syntax for this?
>
> Thank you and regards,
> Bitfox
>
-- 
Gary Liu


may I need a join here?

2022-01-23 Thread Bitfox
>>> df.show(3)
+----+-----+
|word|count|
+----+-----+
|  on|    1|
| dec|    1|
|2020|    1|
+----+-----+
only showing top 3 rows

>>> df2.show(3)
+--------+-----+
|stopword|count|
+--------+-----+
|    able|    1|
|   about|    1|
|   above|    1|
+--------+-----+
only showing top 3 rows

>>> df3 = df.filter(~col("word").isin(df2.stopword))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/spark/python/pyspark/sql/dataframe.py", line 1733, in filter
    jdf = self._jdf.filter(condition._jc)
  File "/opt/spark/python/lib/py4j-0.10.9.2-src.zip/py4j/java_gateway.py", line 1310, in __call__
  File "/opt/spark/python/pyspark/sql/utils.py", line 117, in deco
    raise converted from None
pyspark.sql.utils.AnalysisException: Resolved attribute(s) stopword#4 missing from word#0,count#1L in operator !Filter NOT word#0 IN (stopword#4).;
!Filter NOT word#0 IN (stopword#4)
+- LogicalRDD [word#0, count#1L], false

The filter method doesn't work here.
Maybe I need to join the two DataFrames?
What's the syntax for this?

Thank you and regards,
Bitfox