[jira] [Updated] (CASSANDRA-6640) Improve custom 2i performance and abstraction
[ https://issues.apache.org/jira/browse/CASSANDRA-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Miguel Angel Fernandez Diaz updated CASSANDRA-6640:
---
Attachment: 6640v2.diff

This is the new patch with the modifications suggested by Sam Tunnicliffe. It was created against the current trunk (2.1-SNAPSHOT).

Improve custom 2i performance and abstraction
---
Key: CASSANDRA-6640
URL: https://issues.apache.org/jira/browse/CASSANDRA-6640
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Miguel Angel Fernandez Diaz
Assignee: Miguel Angel Fernandez Diaz
Labels: patch, performance
Fix For: 2.1
Attachments: 6640.diff, 6640v2.diff

With the current implementation, the update method of SecondaryIndexManager always deletes the old cell and inserts the new one. This happens because we assume that the value of the old cell is needed to locate the entry being updated in a custom secondary index implementation. However, depending on the implementation, an insert plus a delete can perform much worse than a single update. Moreover, if a custom secondary index doesn't use inverted indexes, the old cell information is not needed and the key information is enough.

Therefore, a good solution would be to make the update method more abstract: the update method of PerColumnSecondaryIndex would also receive the old cell, and from that point the implementation could decide whether to carry out the delete+insert operation or just an update. I attach a patch that implements this solution.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (CASSANDRA-6640) Improve custom 2i performance and abstraction
Miguel Angel Fernandez Diaz created CASSANDRA-6640:
---
Summary: Improve custom 2i performance and abstraction
Key: CASSANDRA-6640
URL: https://issues.apache.org/jira/browse/CASSANDRA-6640
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Miguel Angel Fernandez Diaz
Fix For: 2.1

With the current implementation, the update method of SecondaryIndexManager always deletes the old cell and inserts the new one. This happens because we assume that the value of the old cell is needed to locate the entry being updated in a custom secondary index implementation. However, depending on the implementation, an insert plus a delete can perform much worse than a single update. Moreover, if a custom secondary index doesn't use inverted indexes, the old cell information is not needed and the key information is enough.

Therefore, a good solution would be to make the update method more abstract: the update method of PerColumnSecondaryIndex would also receive the old cell, and from that point the implementation could decide whether to carry out the delete+insert operation or just an update. I attach a patch that implements this solution.
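
A minimal sketch of the idea described above, not the attached patch. All type names below (`Cell`, `PerColumnIndexSketch`, `InvertedIndexSketch`, `KeyAddressedIndexSketch`) are hypothetical stand-ins rather than Cassandra's actual classes. The point is only that once update receives both the old and the new cell, each implementation can choose between delete+insert and a single in-place update.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a cell: a column name plus its value.
final class Cell {
    final String name;
    final String value;
    Cell(String name, String value) { this.name = name; this.value = value; }
}

// Hypothetical per-column index contract. The default update keeps the
// delete+insert behaviour; implementations may override it.
abstract class PerColumnIndexSketch {
    final List<String> ops = new ArrayList<>(); // records operations, for illustration only

    abstract void insert(String rowKey, Cell cell);
    abstract void delete(String rowKey, Cell cell);

    void update(String rowKey, Cell oldCell, Cell newCell) {
        delete(rowKey, oldCell);
        insert(rowKey, newCell);
    }
}

// Inverted index: entries are located by the indexed value, so the old value
// is needed to remove the stale entry. The inherited delete+insert fits.
final class InvertedIndexSketch extends PerColumnIndexSketch {
    final Map<String, List<String>> rowsByValue = new HashMap<>();

    @Override void insert(String rowKey, Cell cell) {
        rowsByValue.computeIfAbsent(cell.value, v -> new ArrayList<>()).add(rowKey);
        ops.add("insert");
    }

    @Override void delete(String rowKey, Cell cell) {
        List<String> rows = rowsByValue.get(cell.value);
        if (rows != null && rows.remove(rowKey) && rows.isEmpty()) {
            rowsByValue.remove(cell.value);
        }
        ops.add("delete");
    }
}

// Index addressed by row key and column name: the key alone locates the
// entry, so a single overwrite replaces the delete+insert pair.
final class KeyAddressedIndexSketch extends PerColumnIndexSketch {
    final Map<String, String> valueByKey = new HashMap<>();

    @Override void insert(String rowKey, Cell cell) {
        valueByKey.put(rowKey + ":" + cell.name, cell.value);
        ops.add("insert");
    }

    @Override void delete(String rowKey, Cell cell) {
        valueByKey.remove(rowKey + ":" + cell.name);
        ops.add("delete");
    }

    @Override void update(String rowKey, Cell oldCell, Cell newCell) {
        // oldCell is available but not needed here.
        valueByKey.put(rowKey + ":" + newCell.name, newCell.value);
        ops.add("update");
    }
}
```

With this shape, the caller always passes both cells to update, and the cost of an update is decided by the index implementation rather than by the manager.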
[jira] [Updated] (CASSANDRA-6640) Improve custom 2i performance and abstraction
[ https://issues.apache.org/jira/browse/CASSANDRA-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Miguel Angel Fernandez Diaz updated CASSANDRA-6640:
---
Attachment: 6640.diff

The solution takes into account the issue 5540

Improve custom 2i performance and abstraction
---
Key: CASSANDRA-6640
URL: https://issues.apache.org/jira/browse/CASSANDRA-6640
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Miguel Angel Fernandez Diaz
Labels: patch, performance
Fix For: 2.1
Attachments: 6640.diff
[jira] [Issue Comment Deleted] (CASSANDRA-6640) Improve custom 2i performance and abstraction
[ https://issues.apache.org/jira/browse/CASSANDRA-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Miguel Angel Fernandez Diaz updated CASSANDRA-6640:
---
Comment: was deleted

(was: The solution takes into account the issue 5540)

Improve custom 2i performance and abstraction
---
Key: CASSANDRA-6640
URL: https://issues.apache.org/jira/browse/CASSANDRA-6640
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Miguel Angel Fernandez Diaz
Labels: patch, performance
Fix For: 2.1
Attachments: 6640.diff
[jira] [Commented] (CASSANDRA-6497) Iterable CqlPagingRecordReader
[ https://issues.apache.org/jira/browse/CASSANDRA-6497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872115#comment-13872115 ]

Miguel Angel Fernandez Diaz commented on CASSANDRA-6497:
---
I'm also interested in this issue. Is there a reviewer assigned at the moment?

Iterable CqlPagingRecordReader
---
Key: CASSANDRA-6497
URL: https://issues.apache.org/jira/browse/CASSANDRA-6497
Project: Cassandra
Issue Type: Improvement
Components: Hadoop
Reporter: Luca Rosellini
Fix For: 2.1
Attachments: iterable-CqlPagingRecordReader.diff

The current CqlPagingRecordReader implementation provides a non-standard way of iterating over the underlying {{rowIterator}}. It would be nice to have an Iterable CqlPagingRecordReader like the one proposed in the attached diff.
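
To illustrate the request, here is a rough sketch of what an iterable reader could look like. It is not the attached diff: `RowReaderSketch` and its in-memory rows are hypothetical, standing in for CqlPagingRecordReader and its underlying rowIterator.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical reader: rows are modelled as column-name -> value maps.
final class RowReaderSketch implements Iterable<Map<String, String>> {
    private final List<Map<String, String>> rows; // stands in for the paged results

    RowReaderSketch(List<Map<String, String>> rows) { this.rows = rows; }

    // Exposing the standard Iterable contract lets callers use for-each
    // instead of a reader-specific iteration protocol.
    @Override public Iterator<Map<String, String>> iterator() {
        final Iterator<Map<String, String>> rowIterator = rows.iterator();
        return new Iterator<Map<String, String>>() {
            @Override public boolean hasNext() { return rowIterator.hasNext(); }
            @Override public Map<String, String> next() { return rowIterator.next(); }
            @Override public void remove() { throw new UnsupportedOperationException("read-only"); }
        };
    }
}
```

A caller could then write `for (Map<String, String> row : reader) { ... }` without knowing how the rows are paged.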
[jira] [Commented] (CASSANDRA-6498) Null pointer exception in custom secondary indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-6498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872304#comment-13872304 ]

Miguel Angel Fernandez Diaz commented on CASSANDRA-6498:
---
+1. I've been running some tests with this patch and there were no errors.

Null pointer exception in custom secondary indexes
---
Key: CASSANDRA-6498
URL: https://issues.apache.org/jira/browse/CASSANDRA-6498
Project: Cassandra
Issue Type: Bug
Reporter: Andrés de la Peña
Assignee: Miguel Angel Fernandez Diaz
Labels: 2i, secondaryIndex, secondary_index
Attachments: CASSANDRA-6498.patch

StorageProxy#estimateResultRowsPerRange raises a null pointer exception when using a custom 2i implementation that does not use a column family as underlying storage:
{code}
resultRowsPerRange = highestSelectivityIndex.getIndexCfs().getMeanColumns();
{code}
According to the documentation, the method SecondaryIndex#getIndexCfs should return null when no column family is used.
[jira] [Updated] (CASSANDRA-6498) Null pointer exception in custom secondary indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-6498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Miguel Angel Fernandez Diaz updated CASSANDRA-6498:
---
Attachment: CASSANDRA-6498.patch

In order to avoid this null pointer exception, we shouldn't assume that highestSelectivityIndex (which is a SecondaryIndex) has an IndexCfs, because that depends on the implementation type. Therefore, a nice way to solve this issue would be to include an abstract method in the SecondaryIndex class, add an implementation of the method where we really know there is an IndexCfs, and otherwise delegate the implementation of this method to those who are creating a custom 2i. I submit a patch that implements this solution.

Null pointer exception in custom secondary indexes
---
Key: CASSANDRA-6498
URL: https://issues.apache.org/jira/browse/CASSANDRA-6498
Project: Cassandra
Issue Type: Bug
Reporter: Andrés de la Peña
Labels: 2i, secondaryIndex, secondary_index
Attachments: CASSANDRA-6498.patch

StorageProxy#estimateResultRowsPerRange raises a null pointer exception when using a custom 2i implementation that does not use a column family as underlying storage:
{code}
resultRowsPerRange = highestSelectivityIndex.getIndexCfs().getMeanColumns();
{code}
According to the documentation, the method SecondaryIndex#getIndexCfs should return null when no column family is used.
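
The approach described in this update can be sketched as follows. This is an illustration under assumptions, not the attached patch: the class names and the method name `estimateResultRows` are hypothetical, and `meanColumns` stands in for whatever the backing column family reports.

```java
// Hypothetical base class: callers ask the index for an estimate instead of
// dereferencing getIndexCfs(), which may legitimately be null.
abstract class SecondaryIndexSketch {
    abstract long estimateResultRows();
}

// Index backed by a column family: the estimate comes from the backing store.
final class CfsBackedIndexSketch extends SecondaryIndexSketch {
    private final long meanColumns; // assumed to be read from the index column family

    CfsBackedIndexSketch(long meanColumns) { this.meanColumns = meanColumns; }

    @Override long estimateResultRows() { return meanColumns; }
}

// Custom index with no backing column family: it supplies its own estimate,
// so no null column family is ever touched.
final class CustomIndexSketch extends SecondaryIndexSketch {
    @Override long estimateResultRows() { return 1L; } // placeholder value for the sketch
}
```

The caller then uses `highestSelectivityIndex.estimateResultRows()` regardless of how the index stores its data.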