[jira] Commented: (DERBY-4477) Selecting / projecting a column whose value is represented by a stream more than once fails

2010-07-22 Thread Kristian Waagan (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891038#action_12891038
 ] 

Kristian Waagan commented on DERBY-4477:


Sounds good to me. Thanks, Kathey.

> Selecting / projecting a column whose value is represented by a stream more 
> than once fails
> ---
>
> Key: DERBY-4477
> URL: https://issues.apache.org/jira/browse/DERBY-4477
> Project: Derby
> Issue Type: Bug
> Components: SQL, Store
> Affects Versions: 10.3.3.0, 10.4.2.0, 10.5.3.0
> Reporter: Kristian Waagan
> Assignee: Dag H. Wanvik
> Fix For: 10.5.3.1, 10.6.1.0
>
> Attachments: derby-4477-0a-prototype.diff, derby-4477-lowmem-2.diff, 
> derby-4477-lowmem-2.stat, derby-4477-lowmem-followup.diff, 
> derby-4477-lowmem-followup.stat, derby-4477-lowmem.diff, 
> derby-4477-lowmem.stat, derby-4477-partial-2.diff, derby-4477-partial-2.stat, 
> derby-4477-partial.diff, derby-4477-partial.stat, derby-4477-useCloning.diff, 
> derby-4477-useCloning.stat
>
>
> Selecting / projecting a column whose value is represented as a stream more 
> than once crashes Derby, e.g.:
> ResultSet rs = stmt.executeQuery(
>     "SELECT clobValue AS clobOne, clobValue AS clobTwo FROM mytable");
> rs.next(); // position the cursor on the first row
> rs.getString(1);
> rs.getString(2);
> After having looked at the class of bugs having to do with reuse of stream 
> data types, I now have a possible fix. It fixes DERBY-3645, DERBY-3646 and 
> DERBY-2349 (there may be more Jiras).
> The core of the fix is cloning certain DVDs being selected/projected in 
> multiple columns. There are two types of cloning:
>  A) materializing clone
>  B) stream clone
> (A) can be implemented already, (B) requires code to clone a stream without 
> materializing it. Note that the streams I'm talking about are streams 
> originating from the store.
> Testing revealed the following:
>  - the cost of the checks performed to figure out if cloning is required 
> seems acceptable (negligible?)
>  - in some cases (A) has better performance than (B) because the raw data 
> only has to be decoded once
>  - stream clones are preferred when the data value is above a certain size 
> for several reasons:
>     * avoids potential out-of-memory errors (and in case of a server 
>       environment, it lowers the memory pressure)
>     * avoids decoding the whole value if the JDBC streaming APIs are used to 
>       access only parts of the value
>     * avoids decoding overall in cases where the value isn't accessed by the 
>       client / user
>       (this statement conflicts with the performance observation above)
> We don't always know the size of a value, and since the fix code deals with 
> all kinds of data types, it is slightly more costly to try to obtain the size.
> What do people think about the following goal statement?
> Goals:
> - Phase 1
>  1) No crashes or wrong results due to stream reuse when executing duplicate 
> column selections (minus goal 4)
>  2) Minimal performance degradation for non-duplicate column selections
>  3) Only a minor performance degradation for duplicate [[LONG] VAR]CHAR [FOR 
> BIT DATA] column selections
> - Phase 2
>  4) No out-of-memory exceptions during execution of duplicate column 
> selections of BLOB/CLOB
>  5) Optimize BLOB/CLOB cloning
> I think phase 1 can proceed by reviewing and discussing the prototype patch. 
> Phase 2 requires more discussion and work (see DERBY-3650).
> A note about the bug behavior facts:
> Since this issue is the underlying cause for several other reported issues, I 
> have decided to be liberal when setting the bug behavior facts. Depending on 
> where the duplicate column selection is used, it can cause crashes, wrong 
> results, and data corruption.
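
The failure mode described in the quoted issue can be illustrated outside 
Derby. The following is a minimal sketch using only java.io; the class name 
StreamReuseDemo and the helper drain() are made up for this illustration and 
are not Derby code. Two "columns" share one underlying stream, so whichever is 
read second finds the data already consumed. Inside Derby the consequence was 
reported as crashes, wrong results or data corruption rather than an empty 
value, so this only shows the underlying aliasing problem.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Illustration only: plain java.io, not Derby code. */
class StreamReuseDemo {

    /** Reads the stream to its end and decodes the bytes as UTF-8. */
    static String drain(InputStream in) throws IOException {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // One underlying stream stands in for the stored CLOB value.
        InputStream stored = new ByteArrayInputStream(
                "clob data".getBytes(StandardCharsets.UTF_8));

        // Both "columns" alias the same stream object (assumption: this
        // mimics a duplicate column selection without any cloning).
        InputStream clobOne = stored;
        InputStream clobTwo = stored;

        // The first read consumes the data; the second read finds nothing.
        System.out.println("clobOne = '" + drain(clobOne) + "'");
        System.out.println("clobTwo = '" + drain(clobTwo) + "'");
    }
}
```

Giving each column its own copy of the value, or its own independent stream, 
removes the aliasing; that is the idea behind the cloning described above.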

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (DERBY-4477) Selecting / projecting a column whose value is represented by a stream more than once fails

2010-07-21 Thread Kathey Marsden (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890929#action_12890929
 ] 

Kathey Marsden commented on DERBY-4477:
---

I looked at the changes that were backported for DERBY-3645 and DERBY-3646, as 
well as the partial fix for this issue. I think they are OK to leave in 10.5, 
as the behavior has improved and they seem to accomplish the phase 1 goals of 
this fix:

- Phase 1
 1) No crashes or wrong results due to stream reuse when executing duplicate 
column selections (minus goal 4)
 2) Minimal performance degradation for non-duplicate column selections
 3) Only a minor performance degradation for duplicate [[LONG] VAR]CHAR [FOR 
BIT DATA] column selections

But not the phase 2 goals of:
- Phase 2
 4) No out-of-memory exceptions during execution of duplicate column selections 
of BLOB/CLOB
 5) Optimize BLOB/CLOB cloning


So I think they are good fixes for 10.5. I fixed up the javadoc warnings for 
the references to the lowmem tests with revision 966440 in 10.5.




[jira] Commented: (DERBY-4477) Selecting / projecting a column whose value is represented by a stream more than once fails

2010-07-14 Thread Kathey Marsden (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888441#action_12888441
 ] 

Kathey Marsden commented on DERBY-4477:
---

Thank you Kristian for the clarification. I am not sure where I got off track 
and ended up linking these issues. I must have pulled up the wrong email the 
day Lily and I made the links. 

I will remove the tags for now and take a closer look at whether DERBY-3645 and 
DERBY-3646 should be backed out of 10.5. Although they merged cleanly and 
passed regression tests, the severed fix may be problematic.






[jira] Commented: (DERBY-4477) Selecting / projecting a column whose value is represented by a stream more than once fails

2010-07-14 Thread Kristian Waagan (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888242#action_12888242
 ] 

Kristian Waagan commented on DERBY-4477:


You can't backport 908563 without also backporting DERBY-4520, which includes 
rather significant changes.
I was a bit surprised to see these issues being backported, since the comment 
on the '10.5 backporting' thread on derby-dev suggested they wouldn't.
DERBY-3645 and DERBY-3646 will also work in 10.5 as it stands now, as long as 
there is enough memory for the materialized LOB.

I think the easiest thing to do here is to simply remove the @see tags, but I 
haven't looked in detail at the state of the code in 10.5.




[jira] Commented: (DERBY-4477) Selecting / projecting a column whose value is represented by a stream more than once fails

2010-07-13 Thread Kathey Marsden (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888064#action_12888064
 ] 

Kathey Marsden commented on DERBY-4477:
---

Myrna pointed out that backporting the change introduced the following javadoc 
warnings:

  [javadoc] D:\svnnightlies\v10_5\src\opensource\java\testing\org\apache\derbyTesting\functionTests\tests\jdbcapi\BLOBTest.java:398: warning - Tag @see: can't find testDerby4477_3645_3646_Repro_lowmem in org.apache.derbyTesting.functionTests.tests.memory.BlobMemTest
  [javadoc] D:\svnnightlies\v10_5\src\opensource\java\testing\org\apache\derbyTesting\functionTests\tests\jdbcapi\BLOBTest.java:398: warning - Tag @see: can't find testDerby4477_3645_3646_Repro_lowmem_clob in org.apache.derbyTesting.functionTests.tests.memory.ClobMemTest

Based on Kristian's earlier comment, I am unclear whether I should also 
backport revision 908563 along with the lowmem tests, and hopefully in doing so 
resolve the difference between trunk and 10.5, or just fix up the javadoc so 
that it does not refer to these methods, which are not currently in 10.5.


[jira] Commented: (DERBY-4477) Selecting / projecting a column whose value is represented by a stream more than once fails

2010-07-13 Thread Kristian Waagan (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12887679#action_12887679
 ] 

Kristian Waagan commented on DERBY-4477:


Backport went in with revision 963326.

I just want to point out that the way the problematic scenario is handled in 
10.5 and 10.6/trunk is different.
10.5 will materialize the values ('((StreamStorable)dvd).loadStream()'), 
whereas 10.6/trunk will clone the source streams if possible 
('dvd.cloneValue(false)'). This will probably not cause much trouble, as LOBs 
would have to be projected more than once to trigger an OOME (i.e. 'select 
myclob as clob1, myclob as clob2 ...').
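
To make the difference between the two approaches concrete, here is a rough 
sketch in plain java.io. All names in it (CloneStrategyDemo, StoredValue, 
materialize, cloneStream, shouldStream) are hypothetical and chosen for 
illustration; it does not call Derby's loadStream() or cloneValue(), and it 
assumes the stored data can simply be reopened. The size threshold and the 
rule that an unknown length (-1) means "stream" are likewise assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.function.Supplier;

/** Illustration only: hypothetical types, not Derby's data value API. */
class CloneStrategyDemo {

    /** Hypothetical stand-in for a value whose data lives in the store. */
    static final class StoredValue {
        private final Supplier<InputStream> opener; // reopens the stored data

        StoredValue(Supplier<InputStream> opener) {
            this.opener = opener;
        }

        InputStream open() {
            return opener.get();
        }
    }

    /** Materializing clone: read the whole value into memory once. */
    static byte[] materialize(StoredValue v) throws IOException {
        try (InputStream in = v.open()) {
            return in.readAllBytes();
        }
    }

    /** Stream clone: hand out an independent stream; nothing is buffered. */
    static InputStream cloneStream(StoredValue v) {
        return v.open();
    }

    /** Assumed policy: stream when the length is unknown (-1) or large. */
    static boolean shouldStream(long knownLength, long threshold) {
        return knownLength < 0 || knownLength > threshold;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3, 4};
        StoredValue v = new StoredValue(() -> new ByteArrayInputStream(data));

        InputStream first = cloneStream(v);
        InputStream second = cloneStream(v);
        first.readAllBytes(); // draining one clone...
        System.out.println(second.available()); // ...does not affect the other

        System.out.println(materialize(v).length);
        System.out.println(shouldStream(-1, 32 * 1024));
    }
}
```

The materializing variant costs memory proportional to the value, which is 
where the OOME risk for large LOBs comes from; the stream variant avoids that 
but has to read the stored data again for each clone.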




[jira] Commented: (DERBY-4477) Selecting / projecting a column whose value is represented by a stream more than once fails

2010-02-19 Thread Kristian Waagan (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835711#action_12835711
 ] 

Kristian Waagan commented on DERBY-4477:


Dag, this issue is resolved as "Not A Problem". Is this correct?
Also, no fix version has been set.

> Selecting / projecting a column whose value is represented by a stream more 
> than once fails
> ---
>
> Key: DERBY-4477
> URL: https://issues.apache.org/jira/browse/DERBY-4477
> Project: Derby
> Issue Type: Bug
> Components: Store
> Affects Versions: 10.3.3.0, 10.4.2.0, 10.5.3.0
> Reporter: Kristian Waagan
> Assignee: Dag H. Wanvik
> Attachments: derby-4477-0a-prototype.diff, derby-4477-lowmem-2.diff, 
> derby-4477-lowmem-2.stat, derby-4477-lowmem-followup.diff, 
> derby-4477-lowmem-followup.stat, derby-4477-lowmem.diff, 
> derby-4477-lowmem.stat, derby-4477-partial-2.diff, derby-4477-partial-2.stat, 
> derby-4477-partial.diff, derby-4477-partial.stat, derby-4477-useCloning.diff, 
> derby-4477-useCloning.stat



[jira] Commented: (DERBY-4477) Selecting / projecting a column whose value is represented by a stream more than once fails

2010-02-10 Thread Kristian Waagan (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832018#action_12832018
 ] 

Kristian Waagan commented on DERBY-4477:


+1 to commit patch 'derby-4477-useCloning'

Note that the "length-materialization optimization" (code that was commented 
out earlier, and removed by the latest patch) hasn't been implemented yet. 
However, I think it is better placed inside the various cloneValue-methods and 
hope to get to it at a later time.

> Selecting / projecting a column whose value is represented by a stream more 
> than once fails
> ---
>
> Key: DERBY-4477
> URL: https://issues.apache.org/jira/browse/DERBY-4477
> Project: Derby
>  Issue Type: Bug
>  Components: Store
>Affects Versions: 10.3.3.0, 10.4.2.0, 10.5.3.0
>Reporter: Kristian Waagan
>Assignee: Dag H. Wanvik
> Attachments: derby-4477-0a-prototype.diff, derby-4477-lowmem-2.diff, 
> derby-4477-lowmem-2.stat, derby-4477-lowmem-followup.diff, 
> derby-4477-lowmem-followup.stat, derby-4477-lowmem.diff, 
> derby-4477-lowmem.stat, derby-4477-partial-2.diff, derby-4477-partial-2.stat, 
> derby-4477-partial.diff, derby-4477-partial.stat, derby-4477-useCloning.diff, 
> derby-4477-useCloning.stat
>
>



[jira] Commented: (DERBY-4477) Selecting / projecting a column whose value is represented by a stream more than once fails

2010-02-02 Thread Dag H. Wanvik (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828620#action_12828620
 ] 

Dag H. Wanvik commented on DERBY-4477:
--

Committed derby-4477-lowmem-followup as svn 905621; regressions ran OK.


> Selecting / projecting a column whose value is represented by a stream more 
> than once fails
> ---
>
> Key: DERBY-4477
> URL: https://issues.apache.org/jira/browse/DERBY-4477
> Project: Derby
>  Issue Type: Bug
>  Components: Store
>Affects Versions: 10.3.3.0, 10.4.2.0, 10.5.3.0
>Reporter: Kristian Waagan
>Assignee: Kristian Waagan
> Attachments: derby-4477-0a-prototype.diff, derby-4477-lowmem-2.diff, 
> derby-4477-lowmem-2.stat, derby-4477-lowmem-followup.diff, 
> derby-4477-lowmem-followup.stat, derby-4477-lowmem.diff, 
> derby-4477-lowmem.stat, derby-4477-partial-2.diff, derby-4477-partial-2.stat, 
> derby-4477-partial.diff, derby-4477-partial.stat
>
>



[jira] Commented: (DERBY-4477) Selecting / projecting a column whose value is represented by a stream more than once fails

2010-02-01 Thread Dag H. Wanvik (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828302#action_12828302
 ] 

Dag H. Wanvik commented on DERBY-4477:
--

The reason is that the client driver has a finalizer for the class 
EncodedInputStream which, when run, also closes the Reader passed in. This was 
introduced in 10.2; the embedded driver does not close the Reader when done 
with it. I filed DERBY-4531 for this difference in behavior. I'll change 
ClobMemTest to side-step this.
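
As an illustration only, here is a minimal Java sketch of one way a caller 
could shield a Reader from code that closes it. It is not necessarily how 
ClobMemTest was changed. NonClosingReader, consumeAndClose and isUsable are 
hypothetical names, and a plain StringReader stands in for 
LoopingAlphabetReader; the finalizer is replaced by an explicit close() so the 
effect is deterministic.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class NonClosingReaderDemo {

    /** Hypothetical wrapper: forwards reads but ignores close(). */
    public static final class NonClosingReader extends FilterReader {
        public NonClosingReader(Reader in) {
            super(in);
        }

        @Override
        public void close() {
            // Deliberately left empty: the wrapped reader stays open.
        }
    }

    /** Stand-in for driver code that closes whatever Reader it is handed. */
    public static void consumeAndClose(Reader r) throws IOException {
        while (r.read() != -1) {
            // Drain the reader.
        }
        r.close();
    }

    /** Returns false if the reader reports that it has been closed. */
    public static boolean isUsable(Reader r) {
        try {
            r.ready(); // StringReader throws IOException once closed
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Reader direct = new StringReader("abc");
        consumeAndClose(direct);
        System.out.println("passed directly, usable: " + isUsable(direct));

        Reader shielded = new StringReader("abc");
        consumeAndClose(new NonClosingReader(shielded));
        System.out.println("passed via wrapper, usable: " + isUsable(shielded));
    }
}
```

With the wrapper, the test keeps ownership of the underlying Reader no matter 
when the driver (or its finalizer) decides to close the stream it was given.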

> Selecting / projecting a column whose value is represented by a stream more 
> than once fails
> ---
>
> Key: DERBY-4477
> URL: https://issues.apache.org/jira/browse/DERBY-4477
> Project: Derby
>  Issue Type: Bug
>  Components: Store
>Affects Versions: 10.3.3.0, 10.4.2.0, 10.5.3.0
>Reporter: Kristian Waagan
>Assignee: Kristian Waagan
> Attachments: derby-4477-0a-prototype.diff, derby-4477-lowmem-2.diff, 
> derby-4477-lowmem-2.stat, derby-4477-lowmem.diff, derby-4477-lowmem.stat, 
> derby-4477-partial-2.diff, derby-4477-partial-2.stat, 
> derby-4477-partial.diff, derby-4477-partial.stat
>
>



[jira] Commented: (DERBY-4477) Selecting / projecting a column whose value is represented by a stream more than once fails

2010-02-01 Thread Dag H. Wanvik (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828110#action_12828110
 ] 

Dag H. Wanvik commented on DERBY-4477:
--

Hmm, could it be a finalizer for the prepared statement that closes the stream? 
I'll look into it.


> Selecting / projecting a column whose value is represented by a stream more 
> than once fails
> ---
>
> Key: DERBY-4477
> URL: https://issues.apache.org/jira/browse/DERBY-4477
> Project: Derby
>  Issue Type: Bug
>  Components: Store
>Affects Versions: 10.3.3.0, 10.4.2.0, 10.5.3.0
>Reporter: Kristian Waagan
>Assignee: Kristian Waagan
> Attachments: derby-4477-0a-prototype.diff, derby-4477-lowmem-2.diff, 
> derby-4477-lowmem-2.stat, derby-4477-lowmem.diff, derby-4477-lowmem.stat, 
> derby-4477-partial-2.diff, derby-4477-partial-2.stat, 
> derby-4477-partial.diff, derby-4477-partial.stat
>
>



[jira] Commented: (DERBY-4477) Selecting / projecting a column whose value is represented by a stream more than once fails

2010-02-01 Thread Knut Anders Hatlen (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828061#action_12828061
 ] 

Knut Anders Hatlen commented on DERBY-4477:
---

ClobMemTest.testDerby4477_3645_3646_Repro_lowmem_clob is failing intermittently 
in the nightly regression tests. See here:

http://dbtg.foundry.sun.com/derby/test/Daily/jvm1.6/testing/testlog/vista/904556-suitesAll_diff.txt
http://dbtg.foundry.sun.com/derby/test/Daily/jvm1.6/testing/testlog/vista-64/904812-suitesAll_diff.txt
http://dbtg.foundry.sun.com/derby/test/Daily/jvm1.6/testing/testlog/solN+1/904812-suitesAll_diff.txt

There are three different stack traces, but they all report that the 
LoopingAlphabetReader is closed, which is kind of odd since 
LoopingAlphabetReader.reopen() is called right before the failing line in all 
three cases.

> Selecting / projecting a column whose value is represented by a stream more 
> than once fails
> ---
>
> Key: DERBY-4477
> URL: https://issues.apache.org/jira/browse/DERBY-4477
> Project: Derby
>  Issue Type: Bug
>  Components: Store
>Affects Versions: 10.3.3.0, 10.4.2.0, 10.5.3.0
>Reporter: Kristian Waagan
>Assignee: Kristian Waagan
> Attachments: derby-4477-0a-prototype.diff, derby-4477-lowmem-2.diff, 
> derby-4477-lowmem-2.stat, derby-4477-lowmem.diff, derby-4477-lowmem.stat, 
> derby-4477-partial-2.diff, derby-4477-partial-2.stat, 
> derby-4477-partial.diff, derby-4477-partial.stat
>
>



[jira] Commented: (DERBY-4477) Selecting / projecting a column whose value is represented by a stream more than once fails

2010-01-29 Thread Dag H. Wanvik (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806376#action_12806376
 ] 

Dag H. Wanvik commented on DERBY-4477:
--

Committed patch derby-4477-lowmem-2 as svn 904538. There are now two committed 
patches on this issue, both of which should be revisited once cloning of 
streams is properly handled (not materialized when large).

> Selecting / projecting a column whose value is represented by a stream more 
> than once fails
> ---
>
> Key: DERBY-4477
> URL: https://issues.apache.org/jira/browse/DERBY-4477
> Project: Derby
>  Issue Type: Bug
>  Components: Store
>Affects Versions: 10.3.3.0, 10.4.2.0, 10.5.3.0
>Reporter: Kristian Waagan
>Assignee: Kristian Waagan
> Attachments: derby-4477-0a-prototype.diff, derby-4477-lowmem-2.diff, 
> derby-4477-lowmem-2.stat, derby-4477-lowmem.diff, derby-4477-lowmem.stat, 
> derby-4477-partial-2.diff, derby-4477-partial-2.stat, 
> derby-4477-partial.diff, derby-4477-partial.stat
>
>



[jira] Commented: (DERBY-4477) Selecting / projecting a column whose value is represented by a stream more than once fails

2010-01-29 Thread Dag H. Wanvik (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806341#action_12806341
 ] 

Dag H. Wanvik commented on DERBY-4477:
--

Thanks for looking at the patch, Knut. I'll substitute the existing assert 
methods in a new rev.


> Selecting / projecting a column whose value is represented by a stream more 
> than once fails
> ---
>
> Key: DERBY-4477
> URL: https://issues.apache.org/jira/browse/DERBY-4477
> Project: Derby
>  Issue Type: Bug
>  Components: Store
>Affects Versions: 10.3.3.0, 10.4.2.0, 10.5.3.0
>Reporter: Kristian Waagan
>Assignee: Kristian Waagan
> Attachments: derby-4477-0a-prototype.diff, derby-4477-lowmem.diff, 
> derby-4477-lowmem.stat, derby-4477-partial-2.diff, derby-4477-partial-2.stat, 
> derby-4477-partial.diff, derby-4477-partial.stat
>
>



[jira] Commented: (DERBY-4477) Selecting / projecting a column whose value is represented by a stream more than once fails

2010-01-29 Thread Knut Anders Hatlen (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806326#action_12806326
 ] 

Knut Anders Hatlen commented on DERBY-4477:
---

BaseTestCase already implements assertEquals() methods specialized for 
InputStreams and Readers, so I think the new methods assertEqualStreams() and 
assertEqualReaders() could be removed.
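
For readers unfamiliar with such helpers, the following is a minimal Java 
sketch of what a content comparison of two Readers amounts to. It is not the 
BaseTestCase implementation; ReaderCompare and sameContent are hypothetical 
names used only for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ReaderCompare {

    /** Returns true if both readers yield exactly the same character sequence. */
    public static boolean sameContent(Reader r1, Reader r2) throws IOException {
        while (true) {
            int c1 = r1.read();
            int c2 = r2.read();
            if (c1 != c2) {
                return false; // differing character, or one reader ended early
            }
            if (c1 == -1) {
                return true;  // both readers reached end-of-stream together
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(sameContent(new StringReader("abc"), new StringReader("abc")));
        System.out.println(sameContent(new StringReader("abc"), new StringReader("abd")));
        System.out.println(sameContent(new StringReader("abc"), new StringReader("ab")));
    }
}
```

A real test helper would typically also report the position of the first 
mismatch and wrap the readers in buffers for large values.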

> Selecting / projecting a column whose value is represented by a stream more 
> than once fails
> ---
>
> Key: DERBY-4477
> URL: https://issues.apache.org/jira/browse/DERBY-4477
> Project: Derby
>  Issue Type: Bug
>  Components: Store
>Affects Versions: 10.3.3.0, 10.4.2.0, 10.5.3.0
>Reporter: Kristian Waagan
>Assignee: Kristian Waagan
> Attachments: derby-4477-0a-prototype.diff, derby-4477-lowmem.diff, 
> derby-4477-lowmem.stat, derby-4477-partial-2.diff, derby-4477-partial-2.stat, 
> derby-4477-partial.diff, derby-4477-partial.stat
>
>



[jira] Commented: (DERBY-4477) Selecting / projecting a column whose value is represented by a stream more than once fails

2010-01-25 Thread Dag H. Wanvik (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804834#action_12804834
 ] 

Dag H. Wanvik commented on DERBY-4477:
--

Trying again to make HashTableResultSet fail in a similar way, I was able to 
make a stream column appear duplicated in HashJoin/HashTableResultSet, e.g. 
with the following query:

select t1.b, t1.c, t2.b, t2.c from
 (select id, v, v, (select count(*) from sys.systables) w from mytab) t1(a,b,c,d),
 (select id, v, v, (select count(*) from sys.systables) w from mytab) t2(a,b,c,d)
where t1.a=t2.a

Here v is a BLOB similar to the one in the DERBY-3646 repro. However, there is 
a ProjectRestrictResultSet under the HashTableResultSet, which takes care of 
the cloning after svn 902857 (before that patch, the query fails).
If I remove the subquery from the select list, flattening is allowed, and I see 
a HashExistsJoin/HashScanResultSet instead.

I am not yet entirely convinced that a HashTableResultSet can't appear without 
an underlying ProjectRestrictResultSet, but that may be the case.

> Selecting / projecting a column whose value is represented by a stream more 
> than once fails
> ---
>
> Key: DERBY-4477
> URL: https://issues.apache.org/jira/browse/DERBY-4477
> Project: Derby
>  Issue Type: Bug
>  Components: Store
>Affects Versions: 10.3.3.0, 10.4.2.0, 10.5.3.0
>Reporter: Kristian Waagan
>Assignee: Kristian Waagan
> Attachments: derby-4477-0a-prototype.diff, derby-4477-partial-2.diff, 
> derby-4477-partial-2.stat, derby-4477-partial.diff, derby-4477-partial.stat
>
>



[jira] Commented: (DERBY-4477) Selecting / projecting a column whose value is represented by a stream more than once fails

2010-01-25 Thread Dag H. Wanvik (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804589#action_12804589
 ] 

Dag H. Wanvik commented on DERBY-4477:
--

Committed patch derby-4477-partial-2 as svn 902857.

> Selecting / projecting a column whose value is represented by a stream more 
> than once fails
> ---
>
> Key: DERBY-4477
> URL: https://issues.apache.org/jira/browse/DERBY-4477
> Project: Derby
>  Issue Type: Bug
>  Components: Store
>Affects Versions: 10.3.3.0, 10.4.2.0, 10.5.3.0
>Reporter: Kristian Waagan
>Assignee: Kristian Waagan
> Attachments: derby-4477-0a-prototype.diff, derby-4477-partial-2.diff, 
> derby-4477-partial-2.stat, derby-4477-partial.diff, derby-4477-partial.stat
>
>



[jira] Commented: (DERBY-4477) Selecting / projecting a column whose value is represented by a stream more than once fails

2010-01-25 Thread Knut Anders Hatlen (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804471#action_12804471
 ] 

Knut Anders Hatlen commented on DERBY-4477:
---

I don't think the scenario I suggested will cause any problems. If the data 
type is VARCHAR, the value appears to be materialized in memory even in the 
case of overflow. If one of the long data types is used, DISTINCT queries are 
not allowed.

> Selecting / projecting a column whose value is represented by a stream more 
> than once fails
> ---
>
> Key: DERBY-4477
> URL: https://issues.apache.org/jira/browse/DERBY-4477
> Project: Derby
>  Issue Type: Bug
>  Components: Store
>Affects Versions: 10.3.3.0, 10.4.2.0, 10.5.3.0
>Reporter: Kristian Waagan
>Assignee: Kristian Waagan
> Attachments: derby-4477-0a-prototype.diff, derby-4477-partial-2.diff, 
> derby-4477-partial-2.stat, derby-4477-partial.diff, derby-4477-partial.stat
>
>



[jira] Commented: (DERBY-4477) Selecting / projecting a column whose value is represented by a stream more than once fails

2010-01-22 Thread Dag H. Wanvik (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803788#action_12803788
 ] 

Dag H. Wanvik commented on DERBY-4477:
--

Thanks, Knut! Yes, I was concerned with the similar code in HashTableResultSet; 
LOBs can't be involved in a DISTINCT, so the remaining suspects would be the 
long character types. I am going to do some experiments to establish whether 
this can be an issue. Thanks for the suggested scenario! Probably the safest 
thing would be to just include the cloning logic in HashTableResultSet as 
well.

> Selecting / projecting a column whose value is represented by a stream more 
> than once fails
> ---
>
> Key: DERBY-4477
> URL: https://issues.apache.org/jira/browse/DERBY-4477
> Project: Derby
>  Issue Type: Bug
>  Components: Store
>Affects Versions: 10.3.3.0, 10.4.2.0, 10.5.3.0
>Reporter: Kristian Waagan
>Assignee: Kristian Waagan
> Attachments: derby-4477-0a-prototype.diff, derby-4477-partial.diff, 
> derby-4477-partial.stat
>
>



[jira] Commented: (DERBY-4477) Selecting / projecting a column whose value is represented by a stream more than once fails

2010-01-22 Thread Knut Anders Hatlen (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803746#action_12803746
 ] 

Knut Anders Hatlen commented on DERBY-4477:
---

I would expect the performance impact of these checks to be negligible, so +1 
from me to keeping the code simple.

The approach looks good to me. It's not obvious after the first quick look that 
it's a complete solution (that is, whether it's enough to do this duplication 
check in PRN), but I don't have any evidence suggesting it's not. The first 
thing that comes to mind is what if the duplication happens in HashTableNode 
(the other caller of mapSourceColumns()). Could it be that that code path will 
be triggered by this (untested) case

  - set page size to 4k
  - create table t (x varchar(32000))
  - insert into t values (..string longer than 4k..)
  - select distinct x,x from t

?
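
The untested case above could be tried from JDBC roughly as follows. This is 
only a sketch: the class and method names are made up for illustration, and 
setting the page size through SYSCS_UTIL.SYSCS_SET_DATABASE_PROPERTY with 
derby.storage.pageSize is an assumption about how "set page size to 4k" would 
be done. It needs an open connection to a scratch Derby database.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Arrays;

/** Hypothetical driver for the scenario; not part of any patch. */
final class DistinctDuplicateScenario {

    /** Builds a string of the given length (use > 4096 to overflow a 4k page). */
    static String longValue(int length) {
        char[] chars = new char[length];
        Arrays.fill(chars, 'a');
        return new String(chars);
    }

    /** Runs the four steps on an open connection; returns the rows read. */
    static int run(Connection conn, int valueLength) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            // Assumption: the page size is set before the table is created.
            stmt.execute("CALL SYSCS_UTIL.SYSCS_SET_DATABASE_PROPERTY("
                    + "'derby.storage.pageSize', '4096')");
            stmt.execute("CREATE TABLE t (x VARCHAR(32000))");
        }
        try (PreparedStatement ps =
                 conn.prepareStatement("INSERT INTO t VALUES (?)")) {
            ps.setString(1, longValue(valueLength));
            ps.executeUpdate();
        }
        int rows = 0;
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT DISTINCT x, x FROM t")) {
            while (rs.next()) {
                // Read both copies of the column and compare them.
                if (!rs.getString(1).equals(rs.getString(2))) {
                    throw new IllegalStateException("duplicate columns differ");
                }
                rows++;
            }
        }
        return rows;
    }
}
```

Calling run(conn, 5000) on a fresh database would exercise the DISTINCT path 
with a value longer than one page.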

> Selecting / projecting a column whose value is represented by a stream more 
> than once fails
> ---
>
> Key: DERBY-4477
> URL: https://issues.apache.org/jira/browse/DERBY-4477
> Project: Derby
>  Issue Type: Bug
>  Components: Store
>Affects Versions: 10.3.3.0, 10.4.2.0, 10.5.3.0
>Reporter: Kristian Waagan
>Assignee: Kristian Waagan
> Attachments: derby-4477-0a-prototype.diff, derby-4477-partial.diff, 
> derby-4477-partial.stat
>
>



[jira] Commented: (DERBY-4477) Selecting / projecting a column whose value is represented by a stream more than once fails

2010-01-22 Thread Dag H. Wanvik (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803736#action_12803736
 ] 

Dag H. Wanvik commented on DERBY-4477:
--

Thanks for the comments, Kristian, I'll include those in my next rev of the 
patch. I don't think always creating a column map matters for performance here, 
since the object is allocated at compilation time. The upside is slightly 
simpler code at execution time (one test instead of two), although testing for 
a member in a boolean array is perhaps slightly more expensive than checking 
for a null pointer. If you like, I can reintroduce that logic.
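
To make the two variants concrete, here is a minimal sketch of a clone map 
built once from the source column numbers. It is not the code in the patch: 
the class and method names are hypothetical, and it marks every repeated 
source column, whereas the real check would presumably only care about 
stream-valued columns. The first occurrence of a column is left unmarked.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper illustrating the clone map; not Derby's actual code. */
final class CloneMapSketch {

    /**
     * sourceColumns[i] is the source column feeding result column i.
     * Entry i of the returned map is true when that source column was
     * already used by an earlier result column, i.e. a clone is needed.
     */
    static boolean[] buildCloneMap(int[] sourceColumns) {
        boolean[] cloneMap = new boolean[sourceColumns.length];
        Set<Integer> seen = new HashSet<>();
        for (int i = 0; i < sourceColumns.length; i++) {
            // Set.add returns false if the column number was seen before.
            cloneMap[i] = !seen.add(sourceColumns[i]);
        }
        return cloneMap;
    }

    /** Variant that returns null when no column needs cloning. */
    static boolean[] buildCloneMapOrNull(int[] sourceColumns) {
        boolean[] cloneMap = buildCloneMap(sourceColumns);
        for (boolean needsClone : cloneMap) {
            if (needsClone) {
                return cloneMap;
            }
        }
        return null;
    }
}
```

With the first variant the execution-time code tests only cloneMap[i]; with 
the second it tests cloneMap != null first and then cloneMap[i].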

> Selecting / projecting a column whose value is represented by a stream more 
> than once fails
> ---
>
> Key: DERBY-4477
> URL: https://issues.apache.org/jira/browse/DERBY-4477
> Project: Derby
>  Issue Type: Bug
>  Components: Store
>Affects Versions: 10.3.3.0, 10.4.2.0, 10.5.3.0
>Reporter: Kristian Waagan
>Assignee: Kristian Waagan
> Attachments: derby-4477-0a-prototype.diff, derby-4477-partial.diff, 
> derby-4477-partial.stat
>
>



[jira] Commented: (DERBY-4477) Selecting / projecting a column whose value is represented by a stream more than once fails

2010-01-22 Thread Kristian Waagan (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803724#action_12803724
 ] 

Kristian Waagan commented on DERBY-4477:


Hi Dag,

The current patch looks good to me. A few nits and one question:
 - typo in BLOBTest.testDerby4477Repro JavaDoc
 - typo in PRRS: "columsn"
 - typo in RCL.mapSourceColumns JavaDoc: "column" -> "columns"
 - if you make PRRS.cloneMap final and return null if there are no columns to 
clone (and add the check for cloneMap != null), do you think that will matter 
performance-wise?

I think it's safe to not clone the first occurrence, but I don't know how much 
it matters. My assumption is that the clone code will only be activated for a 
very small percentage of queries. I would be more worried about not incurring 
extra cost in the normal case, where each column is referenced only once.

> Selecting / projecting a column whose value is represented by a stream more 
> than once fails
> ---
>
> Key: DERBY-4477
> URL: https://issues.apache.org/jira/browse/DERBY-4477
> Project: Derby
>  Issue Type: Bug
>  Components: Store
>Affects Versions: 10.3.3.0, 10.4.2.0, 10.5.3.0
>Reporter: Kristian Waagan
>Assignee: Kristian Waagan
> Attachments: derby-4477-0a-prototype.diff, derby-4477-partial.diff, 
> derby-4477-partial.stat
>
>



[jira] Commented: (DERBY-4477) Selecting / projecting a column whose value is represented by a stream more than once fails

2010-01-22 Thread Dag H. Wanvik (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803695#action_12803695
 ] 

Dag H. Wanvik commented on DERBY-4477:
--

Regressions passed. I will file a new JIRA for our internal streams' failure 
to heed close, add the corrected repro from DERBY-3646 to the test as well, 
and spin a new rev of my patch.

> Selecting / projecting a column whose value is represented by a stream more 
> than once fails
> ---
>
> Key: DERBY-4477
> URL: https://issues.apache.org/jira/browse/DERBY-4477
> Project: Derby
>  Issue Type: Bug
>  Components: Store
>Affects Versions: 10.3.3.0, 10.4.2.0, 10.5.3.0
>Reporter: Kristian Waagan
>Assignee: Kristian Waagan
> Attachments: derby-4477-0a-prototype.diff, derby-4477-partial.diff, 
> derby-4477-partial.stat
>
>



[jira] Commented: (DERBY-4477) Selecting / projecting a column whose value is represented by a stream more than once fails

2010-01-21 Thread Dag H. Wanvik (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803553#action_12803553
 ] 

Dag H. Wanvik commented on DERBY-4477:
--

BinaryToRawStream inherits close() from FilterInputStream, which merely closes 
the passed-in InputStream. In our case this is a FormatIdInputStream, which in 
turn wraps an OverflowInputStream, which takes no action on close. This 
explains why no error was seen initially for the DERBY-3646 repro. This is 
unfortunate, and I think we should override the close method somewhere, maybe 
in BinaryToRawStream, so that reading after close is not allowed.
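
A minimal sketch of the kind of override suggested above, assuming a 
FilterInputStream subclass that remembers it has been closed. The class name 
CloseAwareStream is hypothetical and this is not the actual BinaryToRawStream 
code; it only illustrates rejecting reads after close when the wrapped stream 
ignores close().

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical wrapper (not Derby code): rejects reads once close() was called,
// even if the wrapped stream itself takes no action on close.
final class CloseAwareStream extends FilterInputStream {
    private boolean closed;

    CloseAwareStream(InputStream in) {
        super(in);
    }

    private void checkOpen() throws IOException {
        if (closed) {
            throw new IOException("stream is closed");
        }
    }

    @Override
    public int read() throws IOException {
        checkOpen();
        return super.read();
    }

    // FilterInputStream.read(byte[]) delegates to this method, so it is covered too.
    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        checkOpen();
        return super.read(b, off, len);
    }

    @Override
    public void close() throws IOException {
        closed = true;
        super.close();
    }
}
```

The sketch guards only the read methods; skip() and available() would need 
the same check if they should fail after close as well.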


> Selecting / projecting a column whose value is represented by a stream more 
> than once fails
> ---
>
> Key: DERBY-4477
> URL: https://issues.apache.org/jira/browse/DERBY-4477
> Project: Derby
>  Issue Type: Bug
>  Components: Store
>Affects Versions: 10.3.3.0, 10.4.2.0, 10.5.3.0
>Reporter: Kristian Waagan
>Assignee: Kristian Waagan
> Attachments: derby-4477-0a-prototype.diff, derby-4477-partial.diff, 
> derby-4477-partial.stat
>
>
> Selecting / projecting a column whose value is represented as a stream more 
> than once crashes Derby, i.e.:
> ResultSet rs = stmt.executeQuery("SELECT clobValue AS clobOne, clobValue AS 
> clobTwo FROM mytable");
> rs.getString(1);
> rs.getString(2);
> After having looked at the class of bugs having to do with reuse of stream 
> data types, I now have a possible fix. It fixes DERBY-3645, DERBY-3646 and 
> DERBY-2349 (there may be more Jiras).
> The core of the fix is cloning certain DVDs being selected/projected in 
> multiple columns. There are two types of cloning:
>  A) materializing clone
>  B) stream clone
> (A) can be implemented already, (B) requires code to clone a stream without 
> materializing it. Note that the streams I'm talking about are streams 
> originating from the store.
> Testing revealed the following:
>  - the cost of the checks performed to figure out if cloning is required 
> seems acceptable (negligible?)
>  - in some cases (A) has better performance than (B) because the raw data 
> only has to be decoded once
>  - stream clones are preferred when the data value is above a certain size 
> for several reasons:
> * avoids potential out-of-memory errors (and in case of a server 
> environment, it lowers the memory pressure)
> * avoids decoding the whole value if the JDBC streaming APIs are used to 
> access only parts of the value
> * avoids decoding overall in cases where the value isn't accessed by the 
> client / user
>(this statement conflicts with the performance observation above)
> We don't always know the size of a value, and since the fix code deals with 
> all kinds of data types, it is slightly more costly to try to obtain the size.
> What do people think about the following goal statement?
> Goals:
> - Phase 1
>  1) No crashes or wrong results due to stream reuse when executing duplicate 
> column selections (minus goal 4)
>  2) Minimal performance degradation for non-duplicate column selections
>  3) Only a minor performance degradation for duplicate [[LONG] VAR]CHAR [FOR 
> BIT DATA] column selections
> - Phase 2
>  4) No out-of-memory exceptions during execution of duplicate column 
> selections of BLOB/CLOB
>  5) Optimize BLOB/CLOB cloning
> I think phase 1 can proceed by reviewing and discussing the prototype patch. 
> Phase 2 requires more discussion and work (see DERBY-3650).
> A note about the bug behavior facts:
> Since this issue is the underlying cause for several other reported issues, I 
> have decided to be liberal when setting the bug behavior facts. Depending on 
> where the duplicate column selection is used, it can cause both crashes, 
> wrong results and data corruption.




[jira] Commented: (DERBY-4477) Selecting / projecting a column whose value is represented by a stream more than once fails

2010-01-21 Thread Dag H. Wanvik (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803533#action_12803533
 ] 

Dag H. Wanvik commented on DERBY-4477:
--

The repro for DERBY-3646 fails because it is coded incorrectly: per JDBC, the 
stream must be digested (fully read) before a new get* method is called, cf. 
http://java.sun.com/j2se/1.5.0/docs/api/java/sql/ResultSet.html#getBinaryStream(int).

When I fix that error in the repro, both derby-4477-partial and 
derby-4477-0a-prototype (with the 64K limit) pass. Above the limit, 
derby-4477-0a-prototype does not materialize the value but uses the new 
copyForRead method instead. This eventually returns a wrapped stream, 
BinaryToRawStream, which extends java.io.FilterInputStream. Strangely, 
FilterInputStream does not give an error on read even after it has been closed 
(by EmbedResultSet#closeCurrentStream). That is why the repro passed with the 
original derby-4477-0a-prototype: the wrong usage in the repro was not caught.
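
To illustrate the JDBC rule mentioned above, here is a small sketch of reading 
a stream to the end before moving on to the next column. The helper class 
StreamDrainer is hypothetical and not part of the repro; the ResultSet usage 
is shown only as a comment because it needs a database to run.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: reads a stream to the end so that the next getter
// call on the ResultSet cannot invalidate data that is still unread.
final class StreamDrainer {
    static byte[] drain(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }
}

// Intended use against a JDBC ResultSet (not run here):
//   byte[] first  = StreamDrainer.drain(rs.getBinaryStream(1));
//   byte[] second = StreamDrainer.drain(rs.getBinaryStream(2));
```

Note that draining into a byte array materializes the whole value in memory, 
so for large values the caller would process each chunk inside the loop 
instead.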






[jira] Commented: (DERBY-4477) Selecting / projecting a column whose value is represented by a stream more than once fails

2010-01-21 Thread Dag H. Wanvik (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803511#action_12803511
 ] 

Dag H. Wanvik commented on DERBY-4477:
--

I notice that the test repro for DERBY-3646 still fails with my version of the 
patch. The derby-4477-0a-prototype patch makes that repro pass, but only 
because the limit for materialization is set low: increase it from 32K to 64K, 
and that repro fails with that patch as well, so there seems to be more to be 
done here? Or maybe I misunderstood something in the prototype patch... In my 
version I marked for cloning all occurrences which had duplicates in the RCL 
(not just occurrences 2..n as the prototype patch does), but that does not seem 
to make a difference.
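
The two marking strategies compared above can be sketched as follows. This is 
only an illustration: the class DuplicateColumnMarker is hypothetical, and the 
result column list is modelled as a plain list of column names rather than 
Derby's real RCL structures.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration: the result column list is reduced to column names.
final class DuplicateColumnMarker {

    // Marks every occurrence of a column that appears more than once.
    static boolean[] markAllOccurrences(List<String> columns) {
        Map<String, Integer> counts = new HashMap<>();
        for (String c : columns) {
            counts.merge(c, 1, Integer::sum);
        }
        boolean[] marks = new boolean[columns.size()];
        for (int i = 0; i < marks.length; i++) {
            marks[i] = counts.get(columns.get(i)) > 1;
        }
        return marks;
    }

    // Marks only occurrences 2..n, leaving the first occurrence unmarked.
    static boolean[] markLaterOccurrences(List<String> columns) {
        Set<String> seen = new HashSet<>();
        boolean[] marks = new boolean[columns.size()];
        for (int i = 0; i < marks.length; i++) {
            // Set.add returns false when the name was already seen.
            marks[i] = !seen.add(columns.get(i));
        }
        return marks;
    }
}
```

For a list such as clobValue, id, clobValue, the first method marks both 
clobValue entries while the second marks only the last one.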





[jira] Commented: (DERBY-4477) Selecting / projecting a column whose value is represented by a stream more than once fails

2009-12-15 Thread Kristian Waagan (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790733#action_12790733
 ] 

Kristian Waagan commented on DERBY-4477:


Removed unused imports with revision 890789.

> Selecting / projecting a column whose value is represented by a stream more 
> than once fails
> ---
>
> Key: DERBY-4477
> URL: https://issues.apache.org/jira/browse/DERBY-4477
> Project: Derby
>  Issue Type: Bug
>  Components: Store
>Affects Versions: 10.3.3.0, 10.4.2.0, 10.5.3.0
>Reporter: Kristian Waagan
>Assignee: Kristian Waagan
>
