Re: missing rows after using performance.py

2015-09-08 Thread James Taylor
Hi James, It looks like you'll currently get an error log message if a row cannot be imported (usually because the data is not compatible with the schema). For psql.py, this would be in the client-side log, and the messages would look like this: LOG.error("Error

Re: missing rows after using performance.py

2015-09-08 Thread James Heather
Thanks. I've discovered that the cause is even simpler: with 100M rows, you get collisions in the primary key in the CSV file. An experiment (capturing the CSV file and counting the rows with a unique primary key) reveals that the number of unique primary keys is about 500 short of the full
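The experiment described above can be sketched as a short script. This is a hypothetical reconstruction, not the poster's actual code: it assumes the primary key is the first CSV column (`pk_col=0`), which may not match the layout performance.py actually generates.

```python
import csv

def count_unique_pks(path, pk_col=0):
    """Count total rows and distinct primary keys in a CSV file.

    pk_col is the zero-based index of the assumed primary-key column.
    Returns (total_rows, unique_pks); total_rows - unique_pks is the
    number of duplicate keys, which Phoenix upserts silently collapse.
    """
    seen = set()
    total = 0
    with open(path, newline="") as f:
        for row in csv.reader(f):
            total += 1
            seen.add(row[pk_col])
    return total, len(seen)
```

If upserts deduplicate on the primary key, the row count in the Phoenix table should equal `unique_pks`, which would account for the shortfall without any rows being lost in transit.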

Re: missing rows after using performance.py

2015-09-08 Thread Mujtaba Chohan
Thanks James. Filed https://issues.apache.org/jira/browse/PHOENIX-2240. On Tue, Sep 8, 2015 at 12:38 PM, James Heather wrote: > Thanks. > > I've discovered that the cause is even simpler. With 100M rows, you get > collisions in the primary key in the CSV file. An

missing rows after using performance.py

2015-09-08 Thread James Heather
I've had another go running the performance.py script to upsert 100,000,000 rows into a Phoenix table, and again I've ended up with around 500 rows missing. Can anyone explain this, or reproduce it? It is rather concerning: I'm reluctant to use Phoenix if I'm not sure whether rows will be