Hello Arjen. Thank you for replying.
I'll try OR query on my environment. Thanks. 2016年1月4日(月) 23:03 Arjen Nienhuis <a.g.nienh...@gmail.com>: > > On Jan 4, 2016 09:45, "Hiroyuki Sato" <hiroys...@gmail.com> wrote: > > > > Hello Arjen > > > > Thank you for replying. > > > > 2016年1月4日(月) 16:49 Arjen Nienhuis <a.g.nienh...@gmail.com>: > >> > >> > >> On Dec 28, 2015 00:55, "Hiroyuki Sato" <hiroys...@gmail.com> wrote: > >> > > >> > Hello Andreas and Tom > >> > > >> > Thank you for replying. > >> > > >> > Sorry, I re-created my questions. I was mis-pasted query log on > previous question. > >> > (@~ operator is PGroonga extension (http://pgroonga.github.io)) > >> > Please ignore it. > >> > > >> > Best regards. > >> > > >> > 1, Problem. > >> > (1) Following query is exteme slow. (478sec) > >> > SELECT > >> > u.url > >> > FROM > >> > url_lists4 u, > >> > keywords4 k > >> > WHERE > >> > u.url like k.url > >> > AND > >> > k.name = 'esc_url'; > >> > > >> > > >> > (2) grep -f kwd.txt sample.txt (exec time under 1sec) > >> > > >> > >> These are not the same 'query'. Grep will match any of the patterns for > each url while postgres finds all matching combinations. You need to use > '... OR ... OR ...' or 'ANY()' in postgres to get the same result. > > > > I thought the following query are same meaning. > > Could you point me same examples about two differences? > > If one URL matches two patterns, grep will return the match only once. In > your SQL query it will match multiple times and will be in the result > multiple times. > > You can test this by putting both 'http' and 'yahoo' in the wordlist. > > > > > (1) u.url like k.url > > (2) u.url like 'k.url 1', or u.url like 'k.url2' ... > > > >> > 2, Questions > >> > > >> > (1) Is it possible to improve this query like the command ``grep -f > keyword data``? > >> > >> I get the best results by using OR of all the different patterns: > >> > >> SELECT url FROM url_lists4 > >> WHERE > >> url LIKE 'http://ak.yahoo.co.jp/xwv/%' > >> OR url LIKE 'http://ao.yahoo.co.jp/wdl/%' > >> OR ... > > > > > > I'll try it. > > > > BTW Do you know how many OR can I use ? > > I have 5000 URLs. > > There is no real limit. I tried with your test file (5000 patterns). I got > 6 seconds planning time and 12 seconds execution time. > > (I also tried making one big regular expression with url1|url2|... but > that did fail) > > > > >> In theory you could use: > >> > >> ... WHERE url LIKE ANY(ARRAY(SELECT url FROM keywords4 k WHERE k.name > = 'esc_url')); > >> > >> but that's very slow. > > > > It's very interesting. It seems same query. > > I'll check EXPLAIN > > > > > >> > >> > (2) What kind of Index should I create on url_lists table? > >> > >> Both btree text_pattern_ops and gin trigram on the URL. > >> > >> > > >> > 3, Environment > >> > OS: CentOS7 > >> > PostgreSQL 9.4 > >> > > >> > 4, sample source > >> > https://github.com/hiroyuki-sato/postgres_like_test > >> > > >> > > >> > 5, Create table > >> > > >> > drop table if exists url_lists4; > >> > create table url_lists4 ( > >> > id int not null primary key, > >> > url text not null > >> > ); > >> > create index ix_url_url_lists4 on url_lists4(url); > >> > > >> > drop table if exists keywords4; > >> > create table keywords4 ( > >> > id int not null primary key, > >> > name varchar(40) not null, > >> > url text not null > >> > ); > >> > > >> > create index ix_url_keywords4 on keywords4(url); > >> > create index ix_name_keywords4 on keywords4(name); > >> > > >> > > >> > \copy url_lists4(id,url) from 'sample.txt' with delimiter ','; > >> > \copy keywords4(id,name,url) from 'keyword.txt' with delimiter > ','; > >> > > >> > vacuum url_lists4; > >> > vacuum keywords4; > >> > analyze url_lists4; > >> > analyze keywords4; > >> > > >> > 6, Query > >> > > >> > EXPLAIN SELECT > >> > u.url > >> > FROM > >> > url_lists4 u, > >> > keywords4 k > >> > WHERE > >> > u.url like k.url > >> > AND > >> > k.name = 'esc_url'; > >> > > >> > EXPLAIN ANALYZE SELECT > >> > u.url > >> > FROM > >> > url_lists4 u, > >> > keywords4 k > >> > WHERE > >> > u.url like k.url > >> > AND > >> > k.name = 'esc_url'; > >> > > >> > > >> > SELECT > >> > u.url > >> > FROM > >> > url_lists4 u, > >> > keywords4 k > >> > WHERE > >> > u.url like k.url > >> > AND > >> > k.name = 'esc_url'; > >> > > >> > 7, EXPLAIN > >> > > >> > QUERY PLAN > > >> > > ----------------------------------------------------------------------------- > >> > Nested Loop (cost=0.00..37510799.00 rows=12500000 width=57) > >> > Join Filter: (u.url ~~ k.url) > >> > -> Seq Scan on url_lists4 u (cost=0.00..10682.00 rows=500000 > width=57) > >> > -> Materialize (cost=0.00..129.50 rows=5000 width=28) > >> > -> Seq Scan on keywords4 k (cost=0.00..104.50 > rows=5000 width=28) > >> > Filter: ((name)::text = 'esc_url'::text) > >> > (6 rows) > >> > > >> > 8, EXPLAIN ANALYZE > >> > QUERY > PLAN > >> > > ----------------------------------------------------------------------------------------------------------------------------- > >> > Nested Loop (cost=0.00..37510799.00 rows=12500000 width=57) > (actual time=6011.642..478011.117 rows=4850 loops=1) > >> > Join Filter: (u.url ~~ k.url) > >> > Rows Removed by Join Filter: 2499995150 > >> > -> Seq Scan on url_lists4 u (cost=0.00..10682.00 rows=500000 > width=57) (actual time=0.034..192.646 rows=500000 loops=1) > >> > -> Materialize (cost=0.00..129.50 rows=5000 width=28) > (actual time=0.000..0.261 rows=5000 loops=500000) > >> > -> Seq Scan on keywords4 k (cost=0.00..104.50 > rows=5000 width=28) (actual time=0.021..1.705 rows=5000 loops=1) > >> > Filter: ((name)::text = 'esc_url'::text) > >> > Planning time: 0.061 ms > >> > Execution time: 478011.773 ms > >> > (9 rows) > >> > > >> > > >> > 2015年12月28日(月) 3:39 Tom Lane <t...@sss.pgh.pa.us>: > >> >> > >> >> Andreas Kretschmer <andr...@a-kretschmer.de> writes: > >> >> >> Tom Lane <t...@sss.pgh.pa.us> hat am 27. Dezember 2015 um 19:11 > geschrieben: > >> >> >> What in the world is this @~ operator? And what sort of index are > >> >> >> you using now, that can accept it? Are the rowcount estimates in > >> >> >> the EXPLAIN output accurate? (If they are, it's hardly surprising > >> >> >> that the query takes a long time.) > >> >> > >> >> > in a privat mail he called an other operator: ~~. I think, the @~ > is an error. > >> >> > >> >> Well, ~~ isn't directly indexable by btree indexes either, so there's > >> >> still something wrong with either the EXPLAIN output or the claimed > >> >> index definitions. > >> >> > >> >> regards, tom lane > > > > > > Best regards. > > >