[GENERAL] Import large data set into a table and resolve duplicates?

2015-02-16 Thread Eugene Dzhurinsky
Hello!

I have a huge dictionary table with series data generated by a third-party
service. The table consists of 2 columns

- id : serial, primary key
- series : varchar, not null, indexed

From time to time I need to apply a patch to the dictionary, the patch file
consists of series data, one per line.

Now I need to import the patch into the database, and produce another file as
- if the passed series field exists in the database, then return ID:series
- otherwise insert a new row to the table and generate new ID and return 
ID:series
for each row in the source file.

So the new file will contain both ID and series data, separated by tab or
something.

While reading and writing the data is not a question (I simply described the
whole task just in case), I wonder what is the most efficient way of importing
such data into a table, keeping in mind that

- the dictionary table already consists of ~200K records
- the patch could be ~1-50K of records long

Thanks!

-- 
Eugene N Dzhurinsky




Re: [GENERAL] Import large data set into a table and resolve duplicates?

2015-02-16 Thread Francisco Olarte
On Sun, Feb 15, 2015 at 7:11 PM, Eugene Dzhurinsky jdeve...@gmail.com
wrote:

 On Sun, Feb 15, 2015 at 06:48:36PM +0100, Francisco Olarte wrote:
  One thing that strikes me is you are either at the beginning of your usage
  of this or you have A LOT of already present lines in the patch ( I mean,
  the patch has one fourth of the lines of the dictionary ). Which is the case?

 Right now I'm at the very beginning stage, yes, and I expect a cache miss
 ratio for the dictionary of at least 70%, which eventually will drop
 down to 5-10%.


Well, that gives one important data point ( the other being the expected
median/average size of your series entries ).

And this means the size of the problem is completely different, so the
solutions should be too. I mean, at a 70% miss rate with a 50k-line patch
you are expecting 35k insertions, and you still expect 2.5k-5k insertions
once the miss ratio drops. Not knowing any other number, I'll urge you to
DISCARD any of my proposed solutions ( i.e., if you have 20 runs at 35k
followed by 100 runs at 5k you will end up with a
200+20*35+100*5 = 200+700+500 = 1400k row table receiving 5k insertions from
a 50k file, which is a completely different problem than the original one ).
Not having any expected problem size, I will not propose any more.


  When doing bulk operations ( like this, with a record count of 25% of the
  whole file ) indexing is nearly always slower than sorting and merging. ...


 So you suggest taking this off Postgres?


As stated above, I do not presently suggest anything, as I'm unable to do so
without further knowledge of the problem ( which, apart from the great
unknown of the problem size, would also require knowing how the dictionary is
going to be used, among other things ).

 That's interesting. Simply put,
 I'll do a dump of the dictionary, sorted by series, to some file. Then sort
 the patch file by series. Then merge the dictionary (left) and the patch
 (right) files. And during the merge, if a (right) line doesn't have a
 corresponding (left) line, then put a nextval expression for the sequence
 as the ID parameter. Then truncate the existing dictionary table and COPY
 the data from the merged file into it.


I would not truncate the dictionary, just put insert ... returning statements
in the data file. Anyway, I would not presently suggest doing that; with the
data I have, its performance is a great unknown. With the original problem
size it can be made highly performant, and I know batch processes ( sort,
merge, etc. ) easily beat "interactive" ones ( index lookups etc. )
throughput-wise ( and so total-time-wise ) at some sizes, but you need to
know the problem.

Also, when doing this, it depends on who updates the table and how. If only
the patches do it, you can keep the table on the server plus a sorted copy of
it outside, and use the sorted copy plus the patch to prepare a set of
insert-returnings, which you then process and use ( their return values ) to
generate a new sorted copy. This is normally very fast, it can be made
robust, and, most important, it does not hit the server too much: you use the
sorted copy plus the patch to generate a 'to_insert' file, then go to the
server, copy that into a to_insert temp table, insert it into dict with
RETURNING, capturing the result ( with IDs ) in an 'inserted' temp table,
then copy out that table, and then with the copy out ( which you made already
sorted, or just sort again ) you patch the dictionary copy. If you ever think
you've lost sync you just copy out the dictionary again and sort it. This is
batch-processing 101, but you need knowledge of the problem; with the current
data, I cannot recommend anything.

I've been doing that sort of thing for decades, and it works quite well, but
I normally only bother doing it for "big" tasks, meaning multi-hour at least.
The thing is, indexes seem quite fast as they give you a row very quickly,
while the sort spends hours preparing; but after that the sorted rows go in
really fast, much faster than with the index, and the batch job finishes
first. As I say, it depends on the problem. If you have a 5-minute runtime
doing straight updates/inserts ( not unreasonable for 200k/50k ) to be done
daily, it does not make sense to make it more complex to cut the runtime to
15 secs. OTOH, if you have a weekly 20h process which you can cut to 6h to
fit it into a nightly window, it does ( and I'm not exaggerating: in the
early 90s changing indexes to sorting and batching allowed me to cut a 4.5
DAY process to 8 h. It did not seem to do anything for the first 7 hours, but
then it went really fast ).
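
For reference, a rough psql sketch of the server-side half of that flow.
Table and file names are illustrative, assuming a dictionary table called
dict with columns id serial and series varchar:

-- to_insert.txt was prepared client-side from the sorted copy plus the patch
CREATE TEMP TABLE to_insert (series varchar NOT NULL);
\copy to_insert (series) from 'to_insert.txt'

-- insert the new rows and capture the generated ids via RETURNING
CREATE TEMP TABLE inserted (id integer, series varchar);
WITH ins AS (
    INSERT INTO dict (series)
    SELECT series FROM to_insert
    RETURNING id, series
)
INSERT INTO inserted SELECT id, series FROM ins;

-- copy the new id/series pairs back out, sorted, to patch the client-side
-- sorted copy of the dictionary
\copy (SELECT id, series FROM inserted ORDER BY series) to 'inserted.txt'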

Anyway, design it according to your knowledge of the problem.

Regards.
Francisco Olarte.





 Is that what you meant?

 Thank you!

 --
 Eugene Dzhurinsky



Re: [GENERAL] Import large data set into a table and resolve duplicates?

2015-02-15 Thread John McKown
On Sat, Feb 14, 2015 at 3:54 PM, Eugene Dzhurinsky jdeve...@gmail.com
wrote:

 Hello!

 I have a huge dictionary table with series data generated by a third-party
 service. The table consists of 2 columns

 - id : serial, primary key
 - series : varchar, not null, indexed

 From time to time I need to apply a patch to the dictionary, the patch
 file
 consists of series data, one per line.

 Now I need to import the patch into the database, and produce another file
 as
 - if the passed series field exists in the database, then return
 ID:series
 - otherwise insert a new row to the table and generate new ID and return
 ID:series
 for each row in the source file.

 So the new file will contain both ID and series data, separated by tab or
 something.

 While reading and writing the data is not a question (I simply described
 the
 whole task just in case), I wonder what is the most efficient way of
 importing
 such data into a table, keeping in mind that

 - the dictionary table already consists of ~200K records
 - the patch could be ~1-50K of records long
 - records could not be removed from the dictionary, only added if not exist

 Thanks!

 --
 Eugene Dzhurinsky


I was hoping that you'd get a good reply from someone else. But it is the
weekend. So I will _try_ to explain what I would try. You can see if it is
of any use to you. Sometimes my ideas are really good. And other times they
are, well let's be nice and say not as good as other times [grin/]. I
can't test the below because I don't have any data. But hopefully it will
be close and even of some help to you.

The first thing that I would do is put the input patch data into its own
table. Perhaps a temporary table, or even a permanent one.

DROP TABLE IF EXISTS patch_data;
CREATE TABLE patch_data (
   input_record_number SERIAL,
   already_exists BOOLEAN DEFAULT FALSE,
   id INTEGER,
   series TEXT NOT NULL );

I don't know if the order of the records in the output file needs to match the
order of the records in the input file. If not, then you don't need the
input_record_number field. Otherwise, it is used to maintain the order
of the input. At this point, you can load the input file with an SQL
command similar to:

COPY patch_data (series) FROM 'input-file.txt';
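
Side note: COPY ... FROM 'file' reads a path on the database server itself. If
the patch file lives on the client machine, psql's client-side \copy does the
same load; a sketch with the same illustrative file name:

\copy patch_data (series) from 'input-file.txt'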

Now update the patch_data from the existing dictionary table to see which
series data already exists.

UPDATE patch_data SET already_exists = EXISTS (SELECT 1 FROM dictionary WHERE
dictionary.series = patch_data.series);

At this point, the table patch_data has been updated such that if the
series data in it already exists, the already_exists column is now TRUE
instead of the initial FALSE. This means that we need to insert all the
series data in patch_data which does not exist in dictionary ( i.e.
already_exists is FALSE in patch_data) into dictionary.

INSERT INTO dictionary(series) SELECT series FROM patch_data WHERE
already_exists = FALSE;

The above should insert the series records which don't exist in
dictionary into it and generate an id column for it from your SERIAL
definition. Now we need to find out the id values for each of the
series values and return them.

UPDATE patch_data SET id=((SELECT id FROM dictionary WHERE
dictionary.series = patch_data.series));

At this point, if I have not messed up, every series value in
patch_data should have the proper id value from dictionary. So the
table patch_data should now have all that we need in it. So we just use
it to make our output file. I don't know your output requirements, but
I'd just do something like:

SELECT id, e'\t',series FROM patch_data ORDER BY input_record_number;
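
To write that straight to a tab-separated file from psql, one option is \copy
with a query ( the output file name here is just illustrative; COPY's default
text format already separates columns with a tab ):

\copy (SELECT id, series FROM patch_data ORDER BY input_record_number) to 'patch-output.txt'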

Again, if the output order does not need to be the same as the input order,
then all the stuff with the input_record_number column can be eliminated.
Just to be a good citizen:

DROP TABLE patch_data;

at this point. Unless, you think you might need it for some reason. Who
knows, you might even want to archive it as some sort of audit information.


-- 
He's about as useful as a wax frying pan.

10 to the 12th power microphones = 1 Megaphone

Maranatha! 
John McKown


Re: [GENERAL] Import large data set into a table and resolve duplicates?

2015-02-15 Thread Eugene Dzhurinsky
On Sun, Feb 15, 2015 at 10:00:50AM -0600, John McKown wrote:
 UPDATE patch_data SET already_exists=((SELECT TRUE FROM dictionary WHERE
 dictionary.series = patch_data.series));

Since the dictionary already has an index on the series, it seems that
patch_data doesn't need to have any index here.

 At this point, the table patch_data has been updated such that if the
 series data in it already exists, the already_exists column is now TRUE
 instead of the initial FALSE. This means that we need to insert all the
 series data in patch_data which does not exist in dictionary ( i.e.
 already_exists is FALSE in patch_data) into dictionary.
 
 INSERT INTO dictionary(series) SELECT series FROM patch_data WHERE
 already_exists = FALSE;

At this point patch_data needs to get an index on already_exists = false,
which seems to be cheap.
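
For concreteness, that could be written as a partial index, e.g. something
like ( index name is just illustrative ):

CREATE INDEX patch_data_new_idx ON patch_data (series) WHERE already_exists = FALSE;

Whether the planner actually uses it for a 1-50K row staging table is worth
checking with EXPLAIN; a sequential scan may well win at that size.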

 UPDATE patch_data SET id=((SELECT id FROM dictionary WHERE
 dictionary.series = patch_data.series));

No index needed here except the existing one on dictionary.

That looks really promising, thank you John! I need only one index on the
patch_data table, and I will re-use the existing index on the dictionary.

Thanks again!

-- 
Eugene Dzhurinsky




Fwd: [GENERAL] Import large data set into a table and resolve duplicates?

2015-02-15 Thread Francisco Olarte
Forwarding to the list, as I forgot ( as usual, think I did the same with
my original message ). Sorry for the duplicates.
-- Forwarded message --
From: Francisco Olarte fola...@peoplecall.com
Date: Sun, Feb 15, 2015 at 6:48 PM
Subject: Re: [GENERAL] Import large data set into a table and resolve
duplicates?
To: Eugene Dzhurinsky jdeve...@gmail.com


Hi Eugene:

On Sun, Feb 15, 2015 at 6:29 PM, Eugene Dzhurinsky jdeve...@gmail.com
wrote:

 On Sun, Feb 15, 2015 at 01:06:02PM +0100, Francisco Olarte wrote:
  You state below 200k rows, 50k lines per patch. That is not huge unless the
  series are really big, is it?

 series data is between 100 and 4096 chars


Well, then it depends on the average length. With a 1k average you'll have
200Mb for the series. Not having stats on your setup, that is easily
manageable by today's standard machines.

One thing that strikes me is you are either at the beginning of your usage
of this or you have A LOT of already present lines in the patch ( I mean,
the patch has one fourth of the lines of the dictionary ). Which is the case?

 That sounds cool, but I'm a bit worried about the performance of a lookup over
 the series column and the time to create an index for the temp table on the
 series column. But perhaps it's better to try this and, if performance goes
 really bad, then do some optimizations, like partitioning etc.


Who said to create an index on the temp tables? They are small. And anyway,
if I were doing this with files I'd NEVER do it by indexing; I'd sort, merge
and go on ( this is why I suggested temp tables: Postgres does sorts and
merges really well if it needs to, and the optimizer has other tricks
available ).

When doing bulk operations ( like this, with a record count of 25% of the
whole file ) indexing is nearly always slower than sorting and merging. As
I said, I do not know the distribution of your data size, but assuming the
worst case ( a 200k x 4k entry dict, which is 800Mb of data, and 50k x 4k
patches, which is 200Mb of data ), you can do it with text files in 5 minutes
by sorting and merging in 8Mb of RAM ( I've done similar things; in that time
the disks were much slower, and I've even done this kind of thing with
half-inch tapes and it wasn't that slow ). You just sort the dictionary by
series, sort the patch by series, read both comparing keys and write a result
file and a new dictionary, and, as the new dictionary is already sorted, you
do not need to sort it the next time. It was the usual thing to do for
updating accounts in the tape days.

And, as they always say: measure, then optimize. I would try it ( use some
little tables, say a 20-entry dict and 5-line patches, to debug the
queries ), then measure ( if you don't mind hogging the server, or have a
test one, launch the whole thing and measure the times; you may get it done
in a reasonable time and be surprised, sometimes simple algorithms are
fast ).
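
One concrete way to do that measuring, once the patch is loaded into a
patches_in temp table as in the earlier steps, is simply:

EXPLAIN ANALYZE SELECT d.id, p.series
FROM patches_in p JOIN dictionary d ON d.series = p.series;

which shows both the chosen plan and the actual time for the lookup half of
the job.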

The only other thing I would try is maybe doing it without temp tables,
something like:

select series, id from dictionary join patches_in using (series)
UNION ALL ( faster than UNION, as you know there will be no duplicates, so do
not bother checking for them )
insert into dictionary (series)
  select series from patches_in where series not in (select series from dictionary)
  returning series, id

( you may need to experiment a little with the queries and use some WITH
expressions, but you see the logic ).
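
A hedged sketch of the WITH form that takes, with the same table names ( the
first SELECT only sees pre-existing rows, because the effects of a
data-modifying CTE are not visible to the rest of the statement ):

WITH missing AS (
    SELECT series FROM patches_in
    WHERE series NOT IN (SELECT series FROM dictionary)
), added AS (
    INSERT INTO dictionary (series)
    SELECT series FROM missing
    RETURNING id, series
)
SELECT d.id, p.series
FROM patches_in p JOIN dictionary d ON d.series = p.series
UNION ALL
SELECT id, series FROM added;

Paul Jungwirth's message in this thread works out a fuller version of the
same pattern.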

Francisco Olarte.









 Thank you!

 --
 Eugene Dzhurinsky



Re: [GENERAL] Import large data set into a table and resolve duplicates?

2015-02-15 Thread Francisco Olarte
Hi Eugene:

On Sun, Feb 15, 2015 at 6:36 PM, Eugene Dzhurinsky jdeve...@gmail.com
wrote:

 ​...​



 Since the dictionary already has an index on the series, it seems that
 patch_data doesn't need to have any index here.

 At this point patch_data needs to get an index on already_exists = false,
 which seems to be cheap.


As I told you before, do not focus on the indexes too much. When you do
bulk updates like this they tend to be much slower than a proper sort.

The reason is locality of reference. When you do things with sorts you
do two or three nicely ordered passes over the data, using full pages. When
you use indexes you spend a lot of time parsing index structures and
switching between read-index, read-data, index, data, ... ( they are cached,
but you have to switch to them anyway ). Also, with your kind of data,
indexes on series are going to be big, so less cache is available for data.


As I said before, it depends on your data. Anyway, with current machines,
what I'd do with this problem these days would be to just write a program
( in Perl, which seems adequate for this ), copy the dictionary into client
memory and just read the patch, spitting out the result file and inserting
the needed lines along the way. It seems it should fit in 1Gb without
problems, which is not much by today's standards.

Regards.
Francisco Olarte.


Re: [GENERAL] Import large data set into a table and resolve duplicates?

2015-02-15 Thread John McKown
OOPS, I forgot to mention in the SELECT generating the output file that the
e'\t' generates a tab character. You likely already know this, but I like
to make my posts as self-contained as possible.

On Sun, Feb 15, 2015 at 10:00 AM, John McKown john.archie.mck...@gmail.com
wrote:

 On Sat, Feb 14, 2015 at 3:54 PM, Eugene Dzhurinsky jdeve...@gmail.com
 wrote:

 Hello!

 I have a huge dictionary table with series data generated by a third-party
 service. The table consists of 2 columns

 - id : serial, primary key
 - series : varchar, not null, indexed

 From time to time I need to apply a patch to the dictionary, the patch
 file
 consists of series data, one per line.

 Now I need to import the patch into the database, and produce another
 file as
 - if the passed series field exists in the database, then return
 ID:series
 - otherwise insert a new row to the table and generate new ID and return
 ID:series
 for each row in the source file.

 So the new file will contain both ID and series data, separated by tab or
 something.

 While reading and writing the data is not a question (I simply described
 the
 whole task just in case), I wonder what is the most efficient way of
 importing
 such data into a table, keeping in mind that

 - the dictionary table already consists of ~200K records
 - the patch could be ~1-50K of records long
 - records could not be removed from the dictionary, only added if not
 exist

 Thanks!

 --
 Eugene Dzhurinsky


 ​I was hoping that you'd get a good reply from someone else. But it is the
 weekend. So I will _try_ to explain what I would try. You can see if it is
 of any use to you. Sometimes my ideas are really good. And other times they
 are, well let's be nice and say not as good as other times [grin/]. I
 can't test the below because I don't have any data. But hopefully it will
 be close and even of some help to you.

 The first thing that I would do is put the input patch data into its own
 table. Perhaps a temporary table, or even a permanent one.

 DROP TABLE IF EXISTS patch_data;
 CREATE TABLE patch_data (
input_record_number SERIAL,
 already_exists BOOLEAN DEFAULT FALSE,
 id INTEGER,
 series TEXT NOT NULL );

 I don't know if the order of the records in the output file needs to match
 the order of the records in the input file. If not, then you don't need the
 input_record_number field. Otherwise, it is used to maintain the order
 of the input. At this point, you can load the input file with an SQL
 command similar to:

 COPY patch_data (series) FROM 'input-file.txt';

 Now update the patch_data from the existing dictionary table to see which
 series data already exists.

 UPDATE patch_data SET already_exists = EXISTS (SELECT 1 FROM dictionary WHERE
 dictionary.series = patch_data.series);

 At this point, the table patch_data has been updated such that if the
 series data in it already exists, the already_exists column is now TRUE
 instead of the initial FALSE. This means that we need to insert all the
 series data in patch_data which does not exist in dictionary ( i.e.
 already_exists is FALSE in patch_data) into dictionary.

 INSERT INTO dictionary(series) SELECT series FROM patch_data WHERE
 already_exists = FALSE;

 The above should insert the series records which don't exist in
 dictionary into it and generate an id column for it from your SERIAL
 definition. Now we need to find out the id values for each of the
 series values and return them.

 UPDATE patch_data SET id=((SELECT id FROM dictionary WHERE
 dictionary.series = patch_data.series));

 At this point, if I have not messed up, every series value in
 patch_data should have the proper id value from dictionary. So the
 table patch_data should now have all that we need in it. So we just use
 it to make our output file. I don't know your output requirements, but
 I'd just do something like:

 SELECT id, e'\t',series FROM patch_data ORDER BY input_record_number;

 Again, if the output order does not need to be the same as the input
 order, then all the stuff with the input_record_number column can be
 eliminated. Just to be a good citizen:

 DROP TABLE patch_data;

 at this point. Unless, you think you might need it for some reason. Who
 knows, you might even want to archive it as some sort of audit information.


 --
 He's about as useful as a wax frying pan.

 10 to the 12th power microphones = 1 Megaphone

 Maranatha! 
 John McKown




-- 
He's about as useful as a wax frying pan.

10 to the 12th power microphones = 1 Megaphone

Maranatha! 
John McKown


Re: [GENERAL] Import large data set into a table and resolve duplicates?

2015-02-15 Thread Eugene Dzhurinsky
On Sun, Feb 15, 2015 at 01:06:02PM +0100, Francisco Olarte wrote:
 You state below 200k rows, 50k lines per patch. That is not huge unless the
 series are really big, is it?

series data is between 100 and 4096 chars

 1.- Get the patches into a ( temp ) table, using something like \copy; call
 this patches_in.
 2.- create (temp) table existing_out as select series, id from dictionary
 join patches_in using (series);
 3.- delete from patches_in where series in (select series from
 existing_out);
 4.- create (temp) table new_out as insert into dictionary (series) select
 patches_in.series from patches_in returning series, id
 5.- Copy existing_out and new_out out.
 6.- Clean up the temps.

That sounds cool, but I'm a bit worried about the performance of a lookup over
the series column and the time to create an index for the temp table on the
series column. But perhaps it's better to try this and, if performance goes
really bad, then do some optimizations, like partitioning etc.

Thank you!

-- 
Eugene Dzhurinsky




Re: [GENERAL] Import large data set into a table and resolve duplicates?

2015-02-15 Thread Eugene Dzhurinsky
On Sun, Feb 15, 2015 at 06:48:36PM +0100, Francisco Olarte wrote:
 One thing that strikes me is you are either at the beginning of your usage
 of this or you have A LOT of already present lines in the patch ( I mean,
 the patch has one fourth of the lines of the dictionary ). Which is the case?

Right now I'm at the very beginning stage, yes, and I expect a cache miss
ratio for the dictionary of at least 70%, which eventually will drop
down to 5-10%.

 When doing bulk operations ( like this, with a record count of 25% of the
 whole file ) indexing is nearly always slower than sorting and merging. As
 I said, I do not know the distribution of your data size, but assuming the
 worst case ( a 200k x 4k entry dict, which is 800Mb of data, and 50k x 4k
 patches, which is 200Mb of data ), you can do it with text files in 5 minutes
 by sorting and merging in 8Mb of RAM ( I've done similar things; in that time
 the disks were much slower, and I've even done this kind of thing with
 half-inch tapes and it wasn't that slow ). You just sort the dictionary by
 series, sort the patch by series, read both comparing keys and write a result
 file and a new dictionary, and, as the new dictionary is already sorted, you
 do not need to sort it the next time. It was the usual thing to do for
 updating accounts in the tape days.

So you suggest taking this off Postgres? That's interesting. Simply put,
I'll do a dump of the dictionary, sorted by series, to some file. Then sort
the patch file by series. Then merge the dictionary (left) and the patch
(right) files. And during the merge, if a (right) line doesn't have a
corresponding (left) line, then put a nextval expression for the sequence as
the ID parameter. Then truncate the existing dictionary table and COPY the
data from the merged file into it.

Is that what you meant?

Thank you!

-- 
Eugene Dzhurinsky




Re: [GENERAL] Import large data set into a table and resolve duplicates?

2015-02-15 Thread Paul A Jungwirth
Hi Eugene,

 Now I need to import the patch into the database, and produce another file as
 - if the passed series field exists in the database, then return ID:series
 - otherwise insert a new row to the table and generate new ID and return 
 ID:series
 for each row in the source file.

I think Francisco's approach is good, and I agree that ~200k rows is
hardly anything. My approach is similar but uses CTEs to combine a lot
of Francisco's queries into one. I still have a separate COPY command
though. (It'd be great if you could COPY into a CTE, but I guess
COPYing into a temporary table is pretty close.) Anyway, when I run
this on my machine, the import finishes in a few seconds:

# Makefile
database=dictionary
port=5432
words=/usr/share/dict/american-english
SHELL=/bin/bash

initial.txt:
  for i in {1..3}; do \
cat ${words} | while read line; do \
  echo $$i $$line; \
done; \
  done > initial.txt

tables: initial.txt
  sudo su postgres -c 'psql -p ${port} ${database} -f tables.sql < initial.txt'

a.txt:
  for i in {1,4}; do \
cat ${words} | while read line; do \
  echo $$i $$line; \
done; \
  done > a.txt

b.txt:
  for i in {4,5}; do \
cat ${words} | while read line; do \
  echo $$i $$line; \
done; \
  done > b.txt

a: a.txt
  sudo su postgres -c 'psql -p ${port} ${database} -f import.sql < a.txt'

b: b.txt
  sudo su postgres -c 'psql -p ${port} ${database} -f import.sql < b.txt'

clean:
  rm -f initial.txt a.txt b.txt

.PHONY: tables a b clean


# tables.sql
DROP TABLE IF EXISTS dictionary;
CREATE TABLE dictionary (id SERIAL PRIMARY KEY, series VARCHAR NOT NULL);
\copy dictionary (series) from pstdin
CREATE UNIQUE INDEX idx_series ON dictionary (series);


# import.sql
CREATE TEMPORARY TABLE staging (
  series VARCHAR NOT NULL
);
\copy staging (series) from pstdin
CREATE INDEX idx_staging_series ON staging (series);

WITH already AS (
  SELECT  id, staging.series
  FROMstaging
  LEFT OUTER JOIN dictionary
  ON  dictionary.series = staging.series
),
adding as (
  INSERT INTO dictionary
  (series)
  SELECT  series::text
  FROMalready
  WHERE   id IS NULL
  RETURNING id, series
)
SELECT  id, series
FROMadding
UNION
SELECT  id, series
FROMalready WHERE id IS NOT NULL
;


Good luck!
Paul


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Import large data set into a table and resolve duplicates?

2015-02-14 Thread Eugene Dzhurinsky
Hello!

I have a huge dictionary table with series data generated by a third-party
service. The table consists of 2 columns

- id : serial, primary key
- series : varchar, not null, indexed

From time to time I need to apply a patch to the dictionary, the patch file
consists of series data, one per line.

Now I need to import the patch into the database, and produce another file as
- if the passed series field exists in the database, then return ID:series
- otherwise insert a new row to the table and generate new ID and return 
ID:series
for each row in the source file.

So the new file will contain both ID and series data, separated by tab or
something.

While reading and writing the data is not a question (I simply described the
whole task just in case), I wonder what is the most efficient way of importing
such data into a table, keeping in mind that

- the dictionary table already consists of ~200K records
- the patch could be ~1-50K of records long
- records could not be removed from the dictionary, only added if not exist

Thanks!

-- 
Eugene Dzhurinsky

