[jira] [Updated] (CASSANDRA-6640) Improve custom 2i performance and abstraction

2014-02-02 Thread Miguel Angel Fernandez Diaz (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miguel Angel Fernandez Diaz updated CASSANDRA-6640:
---

Attachment: 6640v2.diff

This is the new patch with the modifications suggested by Sam Tunnicliffe. It 
was created against the current trunk (2.1-SNAPSHOT).

 Improve custom 2i performance and abstraction
 -

 Key: CASSANDRA-6640
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6640
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Miguel Angel Fernandez Diaz
Assignee: Miguel Angel Fernandez Diaz
  Labels: patch, performance
 Fix For: 2.1

 Attachments: 6640.diff, 6640v2.diff


 With the current implementation, the update method of SecondaryIndexManager 
 forces an insert and a delete of a cell. That happens because we assume that 
 we need the value of the old cell in order to locate the cell that we are 
 updating in our custom secondary index implementation. 
 However, depending on the implementation, an insert plus a delete operation 
 could perform much worse than a simple update. Moreover, if our custom 
 secondary index doesn't use inverted indexes, we don't really need the old 
 cell information; the key information is enough. 
 Therefore, a good solution would be to make the update method more abstract. 
 The update method of PerColumnSecondaryIndex would also receive the old cell 
 information, and from that point we could decide whether to carry out the 
 delete+insert operation or just an update operation.
 I attach a patch that implements this solution.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (CASSANDRA-6640) Improve custom 2i performance and abstraction

2014-01-30 Thread Miguel Angel Fernandez Diaz (JIRA)
Miguel Angel Fernandez Diaz created CASSANDRA-6640:
--

 Summary: Improve custom 2i performance and abstraction
 Key: CASSANDRA-6640
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6640
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Miguel Angel Fernandez Diaz
 Fix For: 2.1


With the current implementation, the update method of SecondaryIndexManager 
forces an insert and a delete of a cell. That happens because we assume that we 
need the value of the old cell in order to locate the cell that we are updating 
in our custom secondary index implementation. 

However, depending on the implementation, an insert plus a delete operation 
could perform much worse than a simple update. Moreover, if our custom 
secondary index doesn't use inverted indexes, we don't really need the old cell 
information; the key information is enough. 

Therefore, a good solution would be to make the update method more abstract. 
The update method of PerColumnSecondaryIndex would also receive the old cell 
information, and from that point we could decide whether to carry out the 
delete+insert operation or just an update operation.

I attach a patch that implements this solution.
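
As a rough illustration of the idea (not the attached patch), the sketch below passes both the old and the new cell to the index and lets each implementation choose its own strategy. All types here (Cell, PerColumnIndex, InvertedIndex, KeyedIndex) are hypothetical, simplified stand-ins and do not match Cassandra's real classes or signatures.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-ins; NOT Cassandra's real classes or signatures.
final class IndexUpdateSketch {

    /** A cell reduced to a column name and a value. */
    static final class Cell {
        final String name;
        final String value;
        Cell(String name, String value) { this.name = name; this.value = value; }
    }

    /** The index receives both cells and decides how to apply the change. */
    interface PerColumnIndex {
        void insert(String rowKey, Cell cell);
        void delete(String rowKey, Cell cell);
        void update(String rowKey, Cell oldCell, Cell newCell);
    }

    /** Inverted index (value -> row keys): the old value is needed to find the stale entry. */
    static final class InvertedIndex implements PerColumnIndex {
        final Map<String, Set<String>> keysByValue = new HashMap<>();

        public void insert(String rowKey, Cell cell) {
            keysByValue.computeIfAbsent(cell.value, v -> new HashSet<>()).add(rowKey);
        }

        public void delete(String rowKey, Cell cell) {
            Set<String> keys = keysByValue.get(cell.value);
            if (keys != null) {
                keys.remove(rowKey);
                if (keys.isEmpty()) keysByValue.remove(cell.value);
            }
        }

        public void update(String rowKey, Cell oldCell, Cell newCell) {
            // This layout has no in-place update, so fall back to delete+insert.
            delete(rowKey, oldCell);
            insert(rowKey, newCell);
        }
    }

    /** Key-based index (row key -> value): the row key alone locates the entry. */
    static final class KeyedIndex implements PerColumnIndex {
        final Map<String, String> valueByKey = new HashMap<>();
        int writes = 0; // counts index operations, to compare the two strategies

        public void insert(String rowKey, Cell cell) {
            valueByKey.put(rowKey, cell.value);
            writes++;
        }

        public void delete(String rowKey, Cell cell) {
            valueByKey.remove(rowKey);
            writes++;
        }

        public void update(String rowKey, Cell oldCell, Cell newCell) {
            // A single overwrite is enough; oldCell is not needed here.
            insert(rowKey, newCell);
        }
    }

    public static void main(String[] args) {
        InvertedIndex inverted = new InvertedIndex();
        inverted.insert("k1", new Cell("c", "a"));
        inverted.update("k1", new Cell("c", "a"), new Cell("c", "b"));
        System.out.println(inverted.keysByValue);

        KeyedIndex keyed = new KeyedIndex();
        keyed.insert("k1", new Cell("c", "a"));
        keyed.update("k1", new Cell("c", "a"), new Cell("c", "b"));
        System.out.println(keyed.valueByKey + " after " + keyed.writes + " writes");
    }
}
```

In this sketch the manager would call update once per changed cell, and only the inverted layout pays for the extra delete.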





[jira] [Updated] (CASSANDRA-6640) Improve custom 2i performance and abstraction

2014-01-30 Thread Miguel Angel Fernandez Diaz (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miguel Angel Fernandez Diaz updated CASSANDRA-6640:
---

Attachment: 6640.diff

The solution takes into account the issue 5540

 Improve custom 2i performance and abstraction
 -

 Key: CASSANDRA-6640
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6640
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Miguel Angel Fernandez Diaz
  Labels: patch, performance
 Fix For: 2.1

 Attachments: 6640.diff




[jira] [Issue Comment Deleted] (CASSANDRA-6640) Improve custom 2i performance and abstraction

2014-01-30 Thread Miguel Angel Fernandez Diaz (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miguel Angel Fernandez Diaz updated CASSANDRA-6640:
---

Comment: was deleted

(was: The solution takes into account the issue 5540)

 Improve custom 2i performance and abstraction
 -

 Key: CASSANDRA-6640
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6640
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Miguel Angel Fernandez Diaz
  Labels: patch, performance
 Fix For: 2.1

 Attachments: 6640.diff




[jira] [Commented] (CASSANDRA-6497) Iterable CqlPagingRecordReader

2014-01-15 Thread Miguel Angel Fernandez Diaz (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872115#comment-13872115
 ] 

Miguel Angel Fernandez Diaz commented on CASSANDRA-6497:


I'm also interested in this issue. Is a reviewer assigned at the moment?

 Iterable CqlPagingRecordReader
 --

 Key: CASSANDRA-6497
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6497
 Project: Cassandra
  Issue Type: Improvement
  Components: Hadoop
Reporter: Luca Rosellini
 Fix For: 2.1

 Attachments: iterable-CqlPagingRecordReader.diff


 The current CqlPagingRecordReader implementation provides a non-standard way 
 of iterating over the underlying {{rowIterator}}. It would be nice to have an 
 Iterable CqlPagingRecordReader like the one proposed in the attached diff.
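
 For illustration only (this is not the attached diff), a pull-style reader can be adapted to Iterable roughly as sketched below. PullReader is a hypothetical interface loosely modeled on the nextKeyValue()/getCurrentValue() style of Hadoop record readers, and asIterable and readerOver are assumed helper names.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch; PullReader is NOT the real CqlPagingRecordReader API.
final class IterableReaderSketch {

    /** Minimal pull-style reader: advance first, then read the current value. */
    interface PullReader<V> {
        boolean nextKeyValue();
        V getCurrentValue();
    }

    /**
     * Wraps a pull-style reader in an Iterable so it works with for-each.
     * The result is single-pass: every iterator shares the same underlying reader.
     */
    static <V> Iterable<V> asIterable(final PullReader<V> reader) {
        return () -> new Iterator<V>() {
            private boolean advanced = false;   // has the reader been moved for the pending element?
            private boolean hasCurrent = false; // result of the last nextKeyValue() call

            public boolean hasNext() {
                if (!advanced) {
                    hasCurrent = reader.nextKeyValue();
                    advanced = true;
                }
                return hasCurrent;
            }

            public V next() {
                if (!hasNext()) throw new NoSuchElementException();
                advanced = false; // force a new advance on the next hasNext()
                return reader.getCurrentValue();
            }
        };
    }

    /** List-backed reader used only to exercise the adapter. */
    static <V> PullReader<V> readerOver(List<V> rows) {
        final Iterator<V> it = rows.iterator();
        return new PullReader<V>() {
            private V current;

            public boolean nextKeyValue() {
                if (!it.hasNext()) return false;
                current = it.next();
                return true;
            }

            public V getCurrentValue() { return current; }
        };
    }

    public static void main(String[] args) {
        for (String row : asIterable(readerOver(Arrays.asList("r1", "r2", "r3")))) {
            System.out.println(row);
        }
    }
}
```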





[jira] [Commented] (CASSANDRA-6498) Null pointer exception in custom secondary indexes

2014-01-15 Thread Miguel Angel Fernandez Diaz (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872304#comment-13872304
 ] 

Miguel Angel Fernandez Diaz commented on CASSANDRA-6498:


+1

I've been running some tests with this patch and there were no errors.

 Null pointer exception in custom secondary indexes
 --

 Key: CASSANDRA-6498
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6498
 Project: Cassandra
  Issue Type: Bug
Reporter: Andrés de la Peña
Assignee: Miguel Angel Fernandez Diaz
  Labels: 2i, secondaryIndex, secondary_index
 Attachments: CASSANDRA-6498.patch


 StorageProxy#estimateResultRowsPerRange raises a null pointer exception when 
 using a custom 2i implementation that does not use a column family as 
 underlying storage:
 {code}
 resultRowsPerRange = highestSelectivityIndex.getIndexCfs().getMeanColumns();
 {code}
 According to the documentation, the method SecondaryIndex#getIndexCfs should 
 return null when no column family is used.





[jira] [Updated] (CASSANDRA-6498) Null pointer exception in custom secondary indexes

2014-01-14 Thread Miguel Angel Fernandez Diaz (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miguel Angel Fernandez Diaz updated CASSANDRA-6498:
---

Attachment: CASSANDRA-6498.patch

In order to avoid this null pointer exception, we shouldn't assume that 
highestSelectivityIndex (which is a SecondaryIndex) has an IndexCfs, because 
that depends on the implementation type.
 
Therefore, a nice way to solve this issue would be to include an abstract 
method in the SecondaryIndex class, add an implementation of the method where 
we really know there is an IndexCfs, and otherwise delegate the implementation 
of this method to those who are creating a custom 2i.

I submit a patch that implements this solution.
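
A minimal sketch of this approach, under stated assumptions: the classes below are simplified stand-ins rather than Cassandra's real ones, and the method name estimateResultRows is hypothetical and may not match the attached patch.

```java
// Hypothetical, simplified stand-ins; NOT Cassandra's real classes or signatures.
final class ResultRowEstimateSketch {

    /** Stand-in for the column family store that backs a built-in index. */
    static final class IndexCfs {
        private final int meanColumns;
        IndexCfs(int meanColumns) { this.meanColumns = meanColumns; }
        int getMeanColumns() { return meanColumns; }
    }

    abstract static class SecondaryIndex {
        /** May return null when no column family backs the index. */
        abstract IndexCfs getIndexCfs();

        /** Hypothetical abstract method: each implementation supplies its own estimate. */
        abstract long estimateResultRows();
    }

    /** Index backed by a column family: it knows the IndexCfs exists. */
    static final class CfsBackedIndex extends SecondaryIndex {
        private final IndexCfs cfs;
        CfsBackedIndex(IndexCfs cfs) { this.cfs = cfs; }
        IndexCfs getIndexCfs() { return cfs; }
        long estimateResultRows() { return cfs.getMeanColumns(); }
    }

    /** Custom 2i with no backing column family: it provides its own estimate. */
    static final class CustomIndex extends SecondaryIndex {
        private final long estimate;
        CustomIndex(long estimate) { this.estimate = estimate; }
        IndexCfs getIndexCfs() { return null; }
        long estimateResultRows() { return estimate; }
    }

    /** The caller no longer dereferences getIndexCfs(), so a null store is harmless. */
    static long estimateResultRowsPerRange(SecondaryIndex highestSelectivityIndex) {
        return highestSelectivityIndex.estimateResultRows();
    }

    public static void main(String[] args) {
        System.out.println(estimateResultRowsPerRange(new CfsBackedIndex(new IndexCfs(25))));
        System.out.println(estimateResultRowsPerRange(new CustomIndex(100)));
    }
}
```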

 Null pointer exception in custom secondary indexes
 --

 Key: CASSANDRA-6498
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6498
 Project: Cassandra
  Issue Type: Bug
Reporter: Andrés de la Peña
  Labels: 2i, secondaryIndex, secondary_index
 Attachments: CASSANDRA-6498.patch

