Re: [Pyspark, SQL] Very slow IN operator

2017-04-06 Thread Fred Reiss
If you just want to emulate pushing down a join, you can just wrap the IN list query in a JDBCRelation directly:

    scala> val r_df = spark.read.format("jdbc").option("url", "jdbc:h2:/tmp/testdb").option("dbtable", "R").load()
    r_df: org.apache.spark.sql.DataFrame = [A: int]

    scala> r_df.show
    +
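For readers following the thread in Python, a rough PySpark sketch of the same idea is below. The H2 URL, the table name R, and the column A come from the Scala snippet above; the larger DataFrame (big_df) and its join key are made-up placeholders, not details from the original mail.

    # Minimal sketch, assuming the lookup values live in table R(A INT) of an
    # H2 database at /tmp/testdb, and that big_df is the DataFrame we would
    # otherwise filter with a huge IN list.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Reading the lookup table through JDBC wraps it in a JDBCRelation.
    r_df = (spark.read.format("jdbc")
            .option("url", "jdbc:h2:/tmp/testdb")
            .option("dbtable", "R")
            .load())

    # Hypothetical larger DataFrame to filter.
    big_df = spark.range(0, 1000000)

    # A join against the JDBC-backed DataFrame replaces the IN list.
    big_df.join(r_df, big_df["id"] == r_df["A"]).count()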

Re: [Pyspark, SQL] Very slow IN operator

2017-04-06 Thread Maciej Bryński
2017-04-06 4:00 GMT+02:00 Michael Segel:
> Just out of curiosity, what would happen if you put your 10K values into a temp table and then did a join against it?

The answer is predicate pushdown. In my case I'm using this kind of query on a JDBC table, and the IN predicate is executed on the DB in less
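To make the pushdown point concrete, a small illustrative sketch follows; the JDBC URL, table, and column names are placeholder assumptions, not details from the original mail.

    # Minimal sketch of an IN/isin filter on a JDBC-backed DataFrame; Spark's
    # JDBC source can push such filters down so the database evaluates them.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    jdbc_df = (spark.read.format("jdbc")
               .option("url", "jdbc:postgresql://dbhost:5432/testdb")  # placeholder URL
               .option("dbtable", "events")                            # placeholder table
               .load())

    values = list(range(10000))

    # With pushdown, the WHERE ... IN (...) clause runs on the database and
    # only the matching rows are shipped back to Spark.
    jdbc_df.filter(F.col("id").isin(values)).count()

    # Calling .explain() on the filtered DataFrame should list the condition
    # under PushedFilters in the physical plan.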

Re: [Pyspark, SQL] Very slow IN operator

2017-04-05 Thread Michael Segel
Just out of curiosity, what would happen if you put your 10K values into a temp table and then did a join against it?

> On Apr 5, 2017, at 4:30 PM, Maciej Bryński wrote:
>
> Hi,
> I'm trying to run queries with many values in the IN operator.
>
> The result is that for more than 10K values the IN op
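A rough PySpark sketch of that suggestion (the value list, view names, and the id column are illustrative assumptions):

    # Minimal sketch of the "temp table + join" idea: put the 10K values in
    # their own DataFrame / temp view and join instead of building an IN list.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    df = spark.range(0, 1000000)          # stand-in for the real data
    values_df = spark.createDataFrame([(v,) for v in range(10000)], ["id"])

    df.createOrReplaceTempView("data")
    values_df.createOrReplaceTempView("lookup_values")

    spark.sql("SELECT d.* FROM data d JOIN lookup_values v ON d.id = v.id").count()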

Re: [Pyspark, SQL] Very slow IN operator

2017-04-05 Thread Garren Staubli

[Pyspark, SQL] Very slow IN operator

2017-04-05 Thread Maciej Bryński
Hi,
I'm trying to run queries with many values in the IN operator. The result is that for more than 10K values the IN operator gets slower. For example, this code runs for about 20 seconds:

    df = spark.range(0, 10, 1, 1)
    df.where('id in ({})'.format(','.join(map(str, range(10))))).count()

Any
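The code in the archived preview is cut off, so a runnable reconstruction of the benchmark is sketched below; the 10,000-value count is an assumption based on the "10K values" mentioned in the text, not the figure from the original mail.

    # Minimal sketch reproducing the slow IN-list query. The value count
    # (10,000) is assumed from the surrounding text; the original number is
    # truncated in the archive.
    import time
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    df = spark.range(0, 10000, 1, 1)              # start, end, step, numPartitions
    in_list = ','.join(map(str, range(10000)))    # "0,1,2,...,9999"

    start = time.time()
    df.where('id in ({})'.format(in_list)).count()
    print('IN-list query took {:.1f}s'.format(time.time() - start))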