[jira] [Commented] (CASSANDRA-8358) Bundled tools shouldn't be using Thrift API

2015-05-06 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14530150#comment-14530150
 ] 

Aleksey Yeschenko commented on CASSANDRA-8358:
--

{{is_dense}} can be null, in which case we need to determine denseness from the 
columns, and that currently isn't happening - one of the broken things. I've 
committed it, though, just to make your branch not fail.

Will fix it properly later.
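
For anyone following along, a rough, purely illustrative sketch of what "determining denseness from the columns" could look like - the enum, helper name and count comparison below are hypothetical, not the actual CFMetaData logic:

{code}
// Illustrative only - not the real CFMetaData code. Roughly: any REGULAR column
// definition means the table is sparse; with none, denseness hinges on whether
// the clustering columns consume every comparator component.
import java.util.Collection;

final class DensenessSketch
{
    enum Kind { PARTITION_KEY, CLUSTERING, REGULAR, COMPACT_VALUE }

    static final class ColumnDef
    {
        final Kind kind;
        ColumnDef(Kind kind) { this.kind = kind; }
    }

    static boolean inferIsDense(Collection<ColumnDef> defs, int comparatorComponents)
    {
        int clustering = 0;
        for (ColumnDef def : defs)
        {
            if (def.kind == Kind.REGULAR)
                return false;            // declared regular columns => sparse
            if (def.kind == Kind.CLUSTERING)
                clustering++;
        }
        // Hypothetical heuristic: dense when clustering columns fill the comparator.
        return clustering == comparatorComponents;
    }
}
{code}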

Thanks.

 Bundled tools shouldn't be using Thrift API
 ---

 Key: CASSANDRA-8358
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8358
 Project: Cassandra
  Issue Type: Improvement
Reporter: Aleksey Yeschenko
Assignee: Philip Thompson
 Fix For: 3.0

 Attachments: 8358-fix.patch


 In 2.1, we switched cqlsh to the python-driver.
 In 3.0, we got rid of cassandra-cli.
 Yet there is still code that's using legacy Thrift API. We want to convert it 
 all to use the java-driver instead.
 1. BulkLoader uses Thrift to query the schema tables. It should be using 
 java-driver metadata APIs directly instead.
 2. o.a.c.hadoop.cql3.CqlRecordWriter is using Thrift
 3. o.a.c.hadoop.ColumnFamilyRecordReader is using Thrift
 4. o.a.c.hadoop.AbstractCassandraStorage is using Thrift
 5. o.a.c.hadoop.pig.CqlStorage is using Thrift
 Some of the things listed above use Thrift to get the list of partition key 
 columns or clustering columns. Those should be converted to use the Metadata 
 API of the java-driver.
 Somewhat related to that, we also have badly ported code from Thrift in 
 o.a.c.hadoop.cql3.CqlRecordReader (see fetchKeys()) that manually fetches 
 columns from schema tables instead of properly using the driver's Metadata 
 API.
 We need all of it fixed. One exception, for now, is 
 o.a.c.hadoop.AbstractColumnFamilyInputFormat - it's using Thrift for its 
 describe_splits_ex() call that cannot be currently replaced by any 
 java-driver call (?).
 Once this is done, we can stop starting Thrift RPC port by default in 
 cassandra.yaml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8358) Bundled tools shouldn't be using Thrift API

2015-05-05 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529704#comment-14529704
 ] 

Stefania commented on CASSANDRA-8358:
-

At least on my CASSANDRA-7066 branch just rebased from trunk today, 
*test_basic_snapshot_and_restore* in *snapshot_test.py* got broken by a missing 
is_dense in the query used by NativeSSTableLoaderClient.fetchTablesMetadata().

This diff fixes the test:

{code}
diff --git a/src/java/org/apache/cassandra/utils/NativeSSTableLoaderClient.java b/src/java/org/apache/cassandra/utils/NativeSSTableLoaderClient.java
index 1ef686c..5b46700 100644
--- a/src/java/org/apache/cassandra/utils/NativeSSTableLoaderClient.java
+++ b/src/java/org/apache/cassandra/utils/NativeSSTableLoaderClient.java
@@ -100,7 +100,7 @@ public class NativeSSTableLoaderClient extends SSTableLoader.Client
     {
         Map<String, CFMetaData> tables = new HashMap<>();
 
-        String query = String.format("SELECT columnfamily_name, cf_id, type, comparator, subcomparator FROM %s.%s WHERE keyspace_name = '%s'",
+        String query = String.format("SELECT columnfamily_name, cf_id, type, comparator, subcomparator, is_dense FROM %s.%s WHERE keyspace_name = '%s'",
                                      SystemKeyspace.NAME,
                                      LegacySchemaTables.COLUMNFAMILIES,
                                      keyspace);
{code}

 Bundled tools shouldn't be using Thrift API
 ---

 Key: CASSANDRA-8358
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8358
 Project: Cassandra
  Issue Type: Improvement
Reporter: Aleksey Yeschenko
Assignee: Philip Thompson
 Fix For: 3.0

 Attachments: 8358-fix.patch


 In 2.1, we switched cqlsh to the python-driver.
 In 3.0, we got rid of cassandra-cli.
 Yet there is still code that's using legacy Thrift API. We want to convert it 
 all to use the java-driver instead.
 1. BulkLoader uses Thrift to query the schema tables. It should be using 
 java-driver metadata APIs directly instead.
 2. o.a.c.hadoop.cql3.CqlRecordWriter is using Thrift
 3. o.a.c.hadoop.ColumnFamilyRecordReader is using Thrift
 4. o.a.c.hadoop.AbstractCassandraStorage is using Thrift
 5. o.a.c.hadoop.pig.CqlStorage is using Thrift
 Some of the things listed above use Thrift to get the list of partition key 
 columns or clustering columns. Those should be converted to use the Metadata 
 API of the java-driver.
 Somewhat related to that, we also have badly ported code from Thrift in 
 o.a.c.hadoop.cql3.CqlRecordReader (see fetchKeys()) that manually fetches 
 columns from schema tables instead of properly using the driver's Metadata 
 API.
 We need all of it fixed. One exception, for now, is 
 o.a.c.hadoop.AbstractColumnFamilyInputFormat - it's using Thrift for its 
 describe_splits_ex() call that cannot be currently replaced by any 
 java-driver call (?).
 Once this is done, we can stop starting Thrift RPC port by default in 
 cassandra.yaml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8358) Bundled tools shouldn't be using Thrift API

2015-05-05 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529257#comment-14529257
 ] 

Aleksey Yeschenko commented on CASSANDRA-8358:
--

Committed to trunk as {{f698cc228452e847e3ad46bd8178549cf8171767}}. Thanks.

 Bundled tools shouldn't be using Thrift API
 ---

 Key: CASSANDRA-8358
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8358
 Project: Cassandra
  Issue Type: Improvement
Reporter: Aleksey Yeschenko
Assignee: Philip Thompson
 Fix For: 3.x


 In 2.1, we switched cqlsh to the python-driver.
 In 3.0, we got rid of cassandra-cli.
 Yet there is still code that's using legacy Thrift API. We want to convert it 
 all to use the java-driver instead.
 1. BulkLoader uses Thrift to query the schema tables. It should be using 
 java-driver metadata APIs directly instead.
 2. o.a.c.hadoop.cql3.CqlRecordWriter is using Thrift
 3. o.a.c.hadoop.ColumnFamilyRecordReader is using Thrift
 4. o.a.c.hadoop.AbstractCassandraStorage is using Thrift
 5. o.a.c.hadoop.pig.CqlStorage is using Thrift
 Some of the things listed above use Thrift to get the list of partition key 
 columns or clustering columns. Those should be converted to use the Metadata 
 API of the java-driver.
 Somewhat related to that, we also have badly ported code from Thrift in 
 o.a.c.hadoop.cql3.CqlRecordReader (see fetchKeys()) that manually fetches 
 columns from schema tables instead of properly using the driver's Metadata 
 API.
 We need all of it fixed. One exception, for now, is 
 o.a.c.hadoop.AbstractColumnFamilyInputFormat - it's using Thrift for its 
 describe_splits_ex() call that cannot be currently replaced by any 
 java-driver call (?).
 Once this is done, we can stop starting Thrift RPC port by default in 
 cassandra.yaml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8358) Bundled tools shouldn't be using Thrift API

2015-05-05 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529262#comment-14529262
 ] 

Aleksey Yeschenko commented on CASSANDRA-8358:
--

I'm sure we've broken a bunch of things with the patch. But there really is no 
unit or functional test coverage for any of the hadoop stuff - just a little bit 
for pig.

We'd need a new way to specify custom authenticator handlers now (think 
Kerberos). We might have broken sstableloader a bit, and hadoop things a lot.
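
For reference, a hedged sketch of the java-driver hook such a mechanism would presumably plug into - {{Cluster.Builder.withAuthProvider()}}; the contact point and credentials below are placeholders:

{code}
// Sketch only: PlainTextAuthProvider ships with the java-driver; a Kerberos or
// other custom provider would be supplied through the same withAuthProvider(...) hook.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PlainTextAuthProvider;

public final class AuthProviderSketch
{
    public static Cluster build()
    {
        return Cluster.builder()
                      .addContactPoint("127.0.0.1")   // placeholder contact point
                      .withAuthProvider(new PlainTextAuthProvider("user", "password"))
                      .build();
    }
}
{code}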

This part of the codebase has always been properly bad, and, as I said, there 
really is no coverage. [~pkolaczk] Can you post-review, or volunteer someone to 
post-review, the changes made here? Not for style - for breakage alone. I heard 
you might have some extra tests somewhere, too.

We'll address any brokenness in follow-up tickets.

Thanks.

 Bundled tools shouldn't be using Thrift API
 ---

 Key: CASSANDRA-8358
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8358
 Project: Cassandra
  Issue Type: Improvement
Reporter: Aleksey Yeschenko
Assignee: Philip Thompson
 Fix For: 3.0


 In 2.1, we switched cqlsh to the python-driver.
 In 3.0, we got rid of cassandra-cli.
 Yet there is still code that's using legacy Thrift API. We want to convert it 
 all to use the java-driver instead.
 1. BulkLoader uses Thrift to query the schema tables. It should be using 
 java-driver metadata APIs directly instead.
 2. o.a.c.hadoop.cql3.CqlRecordWriter is using Thrift
 3. o.a.c.hadoop.ColumnFamilyRecordReader is using Thrift
 4. o.a.c.hadoop.AbstractCassandraStorage is using Thrift
 5. o.a.c.hadoop.pig.CqlStorage is using Thrift
 Some of the things listed above use Thrift to get the list of partition key 
 columns or clustering columns. Those should be converted to use the Metadata 
 API of the java-driver.
 Somewhat related to that, we also have badly ported code from Thrift in 
 o.a.c.hadoop.cql3.CqlRecordReader (see fetchKeys()) that manually fetches 
 columns from schema tables instead of properly using the driver's Metadata 
 API.
 We need all of it fixed. One exception, for now, is 
 o.a.c.hadoop.AbstractColumnFamilyInputFormat - it's using Thrift for its 
 describe_splits_ex() call that cannot be currently replaced by any 
 java-driver call (?).
 Once this is done, we can stop starting Thrift RPC port by default in 
 cassandra.yaml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8358) Bundled tools shouldn't be using Thrift API

2015-05-04 Thread Philip Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527184#comment-14527184
 ] 

Philip Thompson commented on CASSANDRA-8358:


I've got a branch at https://github.com/ptnapoleon/cassandra/tree/8358 that 
implements the two changes you wanted, [~iamaleksey]. All of pig-test passes.

 Bundled tools shouldn't be using Thrift API
 ---

 Key: CASSANDRA-8358
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8358
 Project: Cassandra
  Issue Type: Improvement
Reporter: Aleksey Yeschenko
Assignee: Philip Thompson
 Fix For: 3.x


 In 2.1, we switched cqlsh to the python-driver.
 In 3.0, we got rid of cassandra-cli.
 Yet there is still code that's using legacy Thrift API. We want to convert it 
 all to use the java-driver instead.
 1. BulkLoader uses Thrift to query the schema tables. It should be using 
 java-driver metadata APIs directly instead.
 2. o.a.c.hadoop.cql3.CqlRecordWriter is using Thrift
 3. o.a.c.hadoop.ColumnFamilyRecordReader is using Thrift
 4. o.a.c.hadoop.AbstractCassandraStorage is using Thrift
 5. o.a.c.hadoop.pig.CqlStorage is using Thrift
 Some of the things listed above use Thrift to get the list of partition key 
 columns or clustering columns. Those should be converted to use the Metadata 
 API of the java-driver.
 Somewhat related to that, we also have badly ported code from Thrift in 
 o.a.c.hadoop.cql3.CqlRecordReader (see fetchKeys()) that manually fetches 
 columns from schema tables instead of properly using the driver's Metadata 
 API.
 We need all of it fixed. One exception, for now, is 
 o.a.c.hadoop.AbstractColumnFamilyInputFormat - it's using Thrift for its 
 describe_splits_ex() call that cannot be currently replaced by any 
 java-driver call (?).
 Once this is done, we can stop starting Thrift RPC port by default in 
 cassandra.yaml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8358) Bundled tools shouldn't be using Thrift API

2015-04-20 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504083#comment-14504083
 ] 

Aleksey Yeschenko commented on CASSANDRA-8358:
--

Force-pushed another updated (and squashed) version to the same branch - 
https://github.com/iamaleksey/cassandra/commits/8358. It adds some more cleanup 
on top of Philip's, in particular around the SSTableLoader.Client 
implementations, but it's still far from clean - because of the original code's 
dirtiness.

Things that need fixing:
- NativeSSTableLoaderClient must support connecting over SSL. This is a 
regression - the original code did support this.
- The NSSTLC TableMetadata to CFMetaData conversion code is broken. I think we should, 
for now, do the ugly thing and reimplement what sstableloader was doing: SELECT 
the relevant rows from the schema tables manually, then do the equivalent of the 
{{ThriftConversion.fromThriftCqlRow()}} call, now unused (see the sketch below).
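
A minimal sketch of that manual path, assuming a java-driver {{Session}} and the pre-3.0 legacy schema tables; the exact column lists and the reconstruction step are placeholders, not the eventual implementation:

{code}
// Hypothetical sketch: pull raw schema rows with the java-driver and feed them
// into CFMetaData reconstruction, instead of going through TableMetadata.
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

final class LegacySchemaFetchSketch
{
    static void fetchTable(Session session, String keyspace, String table)
    {
        String cfQuery = String.format(
            "SELECT cf_id, type, comparator, subcomparator, is_dense " +
            "FROM system.schema_columnfamilies WHERE keyspace_name = '%s' AND columnfamily_name = '%s'",
            keyspace, table);
        Row cfRow = session.execute(cfQuery).one();
        // cfRow would seed the table-level properties (comparator, is_dense, ...).

        String colQuery = String.format(
            "SELECT column_name, type, validator, component_index " +
            "FROM system.schema_columns WHERE keyspace_name = '%s' AND columnfamily_name = '%s'",
            keyspace, table);
        for (Row columnRow : session.execute(colQuery))
        {
            // Each column row would then be turned into a column definition, i.e.
            // the equivalent of the now-unused ThriftConversion.fromThriftCqlRow() step.
        }
    }
}
{code}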

 Bundled tools shouldn't be using Thrift API
 ---

 Key: CASSANDRA-8358
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8358
 Project: Cassandra
  Issue Type: Improvement
Reporter: Aleksey Yeschenko
Assignee: Philip Thompson
 Fix For: 3.0


 In 2.1, we switched cqlsh to the python-driver.
 In 3.0, we got rid of cassandra-cli.
 Yet there is still code that's using legacy Thrift API. We want to convert it 
 all to use the java-driver instead.
 1. BulkLoader uses Thrift to query the schema tables. It should be using 
 java-driver metadata APIs directly instead.
 2. o.a.c.hadoop.cql3.CqlRecordWriter is using Thrift
 3. o.a.c.hadoop.ColumnFamilyRecordReader is using Thrift
 4. o.a.c.hadoop.AbstractCassandraStorage is using Thrift
 5. o.a.c.hadoop.pig.CqlStorage is using Thrift
 Some of the things listed above use Thrift to get the list of partition key 
 columns or clustering columns. Those should be converted to use the Metadata 
 API of the java-driver.
 Somewhat related to that, we also have badly ported code from Thrift in 
 o.a.c.hadoop.cql3.CqlRecordReader (see fetchKeys()) that manually fetches 
 columns from schema tables instead of properly using the driver's Metadata 
 API.
 We need all of it fixed. One exception, for now, is 
 o.a.c.hadoop.AbstractColumnFamilyInputFormat - it's using Thrift for its 
 describe_splits_ex() call that cannot be currently replaced by any 
 java-driver call (?).
 Once this is done, we can stop starting Thrift RPC port by default in 
 cassandra.yaml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8358) Bundled tools shouldn't be using Thrift API

2015-04-01 Thread Philip Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391420#comment-14391420
 ] 

Philip Thompson commented on CASSANDRA-8358:


[~aleksey], here is 
https://github.com/ptnapoleon/cassandra/tree/cassandra-8358-final to reflect 
your requested changes. The only class hierarchy I did not collapse was 
AbstractColumnFamilyInputFormat, though I did remove all the Thrift-only code 
from the abstract class. I believe I removed all the column family naming in 
method names. Everything that should be deprecated is. All of the pig tests and 
unit tests still pass.

 Bundled tools shouldn't be using Thrift API
 ---

 Key: CASSANDRA-8358
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8358
 Project: Cassandra
  Issue Type: Improvement
Reporter: Aleksey Yeschenko
Assignee: Philip Thompson
 Fix For: 3.0


 In 2.1, we switched cqlsh to the python-driver.
 In 3.0, we got rid of cassandra-cli.
 Yet there is still code that's using legacy Thrift API. We want to convert it 
 all to use the java-driver instead.
 1. BulkLoader uses Thrift to query the schema tables. It should be using 
 java-driver metadata APIs directly instead.
 2. o.a.c.hadoop.cql3.CqlRecordWriter is using Thrift
 3. o.a.c.hadoop.ColumnFamilyRecordReader is using Thrift
 4. o.a.c.hadoop.AbstractCassandraStorage is using Thrift
 5. o.a.c.hadoop.pig.CqlStorage is using Thrift
 Some of the things listed above use Thrift to get the list of partition key 
 columns or clustering columns. Those should be converted to use the Metadata 
 API of the java-driver.
 Somewhat related to that, we also have badly ported code from Thrift in 
 o.a.c.hadoop.cql3.CqlRecordReader (see fetchKeys()) that manually fetches 
 columns from schema tables instead of properly using the driver's Metadata 
 API.
 We need all of it fixed. One exception, for now, is 
 o.a.c.hadoop.AbstractColumnFamilyInputFormat - it's using Thrift for its 
 describe_splits_ex() call that cannot be currently replaced by any 
 java-driver call (?).
 Once this is done, we can stop starting Thrift RPC port by default in 
 cassandra.yaml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8358) Bundled tools shouldn't be using Thrift API

2015-03-27 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14384715#comment-14384715
 ] 

Aleksey Yeschenko commented on CASSANDRA-8358:
--

Pushed a squashed version based on latest trunk to 
https://github.com/iamaleksey/cassandra/commits/8358 with no changes.

So far things mostly look good. I'd like to do a few more cosmetic things, 
however.

1. {{AbstractColumnFamilyRecordWriter}} is a small class, and both the 
(deprecated) {{ColumnFamilyRecordWriter}} and {{CqlRecordWriter}} extend it. It 
also has Thrift-specific logic. So I would prefer the abstract class to go away 
entirely, with the shared bits duplicated, if needed, in 
{{ColumnFamilyRecordWriter}} and {{CqlRecordWriter}}.
2. Same for {{AbstractColumnFamilyOutputFormat}}
3. Same for {{AbstractColumnFamilyInputFormat}}. At the very least it shouldn't 
include Thrift-only functionality ({{createAuthenticatedClient}}), at most I'd 
like to get rid of the abstract class and have {{ColumnFamilyInputFormat}} and 
{{CqlInputFormat}} duplicate the shared bits.
4. Same for {{AbstractBulkRecordWriter}} - more than half the class is 
Thrift code. Plus, shouldn't the old {{BulkRecordWriter}} be {{@Deprecated}} too?
5. Same for {{AbstractBulkOutputFormat}} and deprecation of 
{{BulkOutputFormat}} itself (right now both its methods are deprecated 
individually)

With all the  {{*ColumnFamily*}} versions getting deprecated in this version, 
removing them in 3.later would be as simple as rm-ing the non-CQL classes.

Would also be nice to get rid of column family naming everywhere in the Cql* 
classes, in favor of Table* - in method names and class names.
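
To illustrate the deprecation end of this (illustrative only, not the actual patch): once an abstract parent like {{AbstractColumnFamilyRecordWriter}} is folded away, the legacy class carries a single class-level {{@Deprecated}} and keeps the formerly-shared bits inline:

{code}
// Illustrative only: the legacy Thrift-facing writer is deprecated wholesale,
// while CqlRecordWriter keeps its own copy of the formerly-shared setup.
package org.apache.cassandra.hadoop;

import org.apache.hadoop.conf.Configuration;

@Deprecated // whole class, rather than deprecating methods one by one
public class ColumnFamilyRecordWriter
{
    protected final Configuration conf;

    protected ColumnFamilyRecordWriter(Configuration conf)
    {
        this.conf = conf; // shared bits previously inherited from the abstract parent
    }
}
{code}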

 Bundled tools shouldn't be using Thrift API
 ---

 Key: CASSANDRA-8358
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8358
 Project: Cassandra
  Issue Type: Improvement
Reporter: Aleksey Yeschenko
Assignee: Philip Thompson
 Fix For: 3.0


 In 2.1, we switched cqlsh to the python-driver.
 In 3.0, we got rid of cassandra-cli.
 Yet there is still code that's using legacy Thrift API. We want to convert it 
 all to use the java-driver instead.
 1. BulkLoader uses Thrift to query the schema tables. It should be using 
 java-driver metadata APIs directly instead.
 2. o.a.c.hadoop.cql3.CqlRecordWriter is using Thrift
 3. o.a.c.hadoop.ColumnFamilyRecordReader is using Thrift
 4. o.a.c.hadoop.AbstractCassandraStorage is using Thrift
 5. o.a.c.hadoop.pig.CqlStorage is using Thrift
 Some of the things listed above use Thrift to get the list of partition key 
 columns or clustering columns. Those should be converted to use the Metadata 
 API of the java-driver.
 Somewhat related to that, we also have badly ported code from Thrift in 
 o.a.c.hadoop.cql3.CqlRecordReader (see fetchKeys()) that manually fetches 
 columns from schema tables instead of properly using the driver's Metadata 
 API.
 We need all of it fixed. One exception, for now, is 
 o.a.c.hadoop.AbstractColumnFamilyInputFormat - it's using Thrift for its 
 describe_splits_ex() call that cannot be currently replaced by any 
 java-driver call (?).
 Once this is done, we can stop starting Thrift RPC port by default in 
 cassandra.yaml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8358) Bundled tools shouldn't be using Thrift API

2015-03-17 Thread Alex Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365885#comment-14365885
 ] 

Alex Liu commented on CASSANDRA-8358:
-

When can this ticket be committed to the 2.1 branch?

 Bundled tools shouldn't be using Thrift API
 ---

 Key: CASSANDRA-8358
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8358
 Project: Cassandra
  Issue Type: Improvement
Reporter: Aleksey Yeschenko
Assignee: Philip Thompson
 Fix For: 3.0


 In 2.1, we switched cqlsh to the python-driver.
 In 3.0, we got rid of cassandra-cli.
 Yet there is still code that's using legacy Thrift API. We want to convert it 
 all to use the java-driver instead.
 1. BulkLoader uses Thrift to query the schema tables. It should be using 
 java-driver metadata APIs directly instead.
 2. o.a.c.hadoop.cql3.CqlRecordWriter is using Thrift
 3. o.a.c.hadoop.ColumnFamilyRecordReader is using Thrift
 4. o.a.c.hadoop.AbstractCassandraStorage is using Thrift
 5. o.a.c.hadoop.pig.CqlStorage is using Thrift
 Some of the things listed above use Thrift to get the list of partition key 
 columns or clustering columns. Those should be converted to use the Metadata 
 API of the java-driver.
 Somewhat related to that, we also have badly ported code from Thrift in 
 o.a.c.hadoop.cql3.CqlRecordReader (see fetchKeys()) that manually fetches 
 columns from schema tables instead of properly using the driver's Metadata 
 API.
 We need all of it fixed. One exception, for now, is 
 o.a.c.hadoop.AbstractColumnFamilyInputFormat - it's using Thrift for its 
 describe_splits_ex() call that cannot be currently replaced by any 
 java-driver call (?).
 Once this is done, we can stop starting Thrift RPC port by default in 
 cassandra.yaml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8358) Bundled tools shouldn't be using Thrift API

2015-03-17 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365890#comment-14365890
 ] 

Aleksey Yeschenko commented on CASSANDRA-8358:
--

This ticket is headed into 3.0, not 2.1. Will be committed as soon as I review.

 Bundled tools shouldn't be using Thrift API
 ---

 Key: CASSANDRA-8358
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8358
 Project: Cassandra
  Issue Type: Improvement
Reporter: Aleksey Yeschenko
Assignee: Philip Thompson
 Fix For: 3.0


 In 2.1, we switched cqlsh to the python-driver.
 In 3.0, we got rid of cassandra-cli.
 Yet there is still code that's using legacy Thrift API. We want to convert it 
 all to use the java-driver instead.
 1. BulkLoader uses Thrift to query the schema tables. It should be using 
 java-driver metadata APIs directly instead.
 2. o.a.c.hadoop.cql3.CqlRecordWriter is using Thrift
 3. o.a.c.hadoop.ColumnFamilyRecordReader is using Thrift
 4. o.a.c.hadoop.AbstractCassandraStorage is using Thrift
 5. o.a.c.hadoop.pig.CqlStorage is using Thrift
 Some of the things listed above use Thrift to get the list of partition key 
 columns or clustering columns. Those should be converted to use the Metadata 
 API of the java-driver.
 Somewhat related to that, we also have badly ported code from Thrift in 
 o.a.c.hadoop.cql3.CqlRecordReader (see fetchKeys()) that manually fetches 
 columns from schema tables instead of properly using the driver's Metadata 
 API.
 We need all of it fixed. One exception, for now, is 
 o.a.c.hadoop.AbstractColumnFamilyInputFormat - it's using Thrift for its 
 describe_splits_ex() call that cannot be currently replaced by any 
 java-driver call (?).
 Once this is done, we can stop starting Thrift RPC port by default in 
 cassandra.yaml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8358) Bundled tools shouldn't be using Thrift API

2015-03-16 Thread Philip Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14363668#comment-14363668
 ] 

Philip Thompson commented on CASSANDRA-8358:


http://cassci.datastax.com/job/scratch_pt_pigtest/lastCompletedBuild/testReport/

 Bundled tools shouldn't be using Thrift API
 ---

 Key: CASSANDRA-8358
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8358
 Project: Cassandra
  Issue Type: Improvement
Reporter: Aleksey Yeschenko
Assignee: Philip Thompson
 Fix For: 3.0


 In 2.1, we switched cqlsh to the python-driver.
 In 3.0, we got rid of cassandra-cli.
 Yet there is still code that's using legacy Thrift API. We want to convert it 
 all to use the java-driver instead.
 1. BulkLoader uses Thrift to query the schema tables. It should be using 
 java-driver metadata APIs directly instead.
 2. o.a.c.hadoop.cql3.CqlRecordWriter is using Thrift
 3. o.a.c.hadoop.ColumnFamilyRecordReader is using Thrift
 4. o.a.c.hadoop.AbstractCassandraStorage is using Thrift
 5. o.a.c.hadoop.pig.CqlStorage is using Thrift
 Some of the things listed above use Thrift to get the list of partition key 
 columns or clustering columns. Those should be converted to use the Metadata 
 API of the java-driver.
 Somewhat related to that, we also have badly ported code from Thrift in 
 o.a.c.hadoop.cql3.CqlRecordReader (see fetchKeys()) that manually fetches 
 columns from schema tables instead of properly using the driver's Metadata 
 API.
 We need all of it fixed. One exception, for now, is 
 o.a.c.hadoop.AbstractColumnFamilyInputFormat - it's using Thrift for its 
 describe_splits_ex() call that cannot be currently replaced by any 
 java-driver call (?).
 Once this is done, we can stop starting Thrift RPC port by default in 
 cassandra.yaml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8358) Bundled tools shouldn't be using Thrift API

2015-03-05 Thread Philip Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14349620#comment-14349620
 ] 

Philip Thompson commented on CASSANDRA-8358:


Waiting on https://datastax-oss.atlassian.net/browse/JAVA-681, which should be 
done tomorrow.

 Bundled tools shouldn't be using Thrift API
 ---

 Key: CASSANDRA-8358
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8358
 Project: Cassandra
  Issue Type: Improvement
Reporter: Aleksey Yeschenko
Assignee: Philip Thompson
 Fix For: 3.0


 In 2.1, we switched cqlsh to the python-driver.
 In 3.0, we got rid of cassandra-cli.
 Yet there is still code that's using legacy Thrift API. We want to convert it 
 all to use the java-driver instead.
 1. BulkLoader uses Thrift to query the schema tables. It should be using 
 java-driver metadata APIs directly instead.
 2. o.a.c.hadoop.cql3.CqlRecordWriter is using Thrift
 3. o.a.c.hadoop.ColumnFamilyRecordReader is using Thrift
 4. o.a.c.hadoop.AbstractCassandraStorage is using Thrift
 5. o.a.c.hadoop.pig.CqlStorage is using Thrift
 Some of the things listed above use Thrift to get the list of partition key 
 columns or clustering columns. Those should be converted to use the Metadata 
 API of the java-driver.
 Somewhat related to that, we also have badly ported code from Thrift in 
 o.a.c.hadoop.cql3.CqlRecordReader (see fetchKeys()) that manually fetches 
 columns from schema tables instead of properly using the driver's Metadata 
 API.
 We need all of it fixed. One exception, for now, is 
 o.a.c.hadoop.AbstractColumnFamilyInputFormat - it's using Thrift for its 
 describe_splits_ex() call that cannot be currently replaced by any 
 java-driver call (?).
 Once this is done, we can stop starting Thrift RPC port by default in 
 cassandra.yaml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8358) Bundled tools shouldn't be using Thrift API

2015-02-02 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14302767#comment-14302767
 ] 

Brandon Williams commented on CASSANDRA-8358:
-

Yes, wait.

 Bundled tools shouldn't be using Thrift API
 ---

 Key: CASSANDRA-8358
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8358
 Project: Cassandra
  Issue Type: Improvement
Reporter: Aleksey Yeschenko
Assignee: Philip Thompson
 Fix For: 3.0


 In 2.1, we switched cqlsh to the python-driver.
 In 3.0, we got rid of cassandra-cli.
 Yet there is still code that's using legacy Thrift API. We want to convert it 
 all to use the java-driver instead.
 1. BulkLoader uses Thrift to query the schema tables. It should be using 
 java-driver metadata APIs directly instead.
 2. o.a.c.hadoop.cql3.CqlRecordWriter is using Thrift
 3. o.a.c.hadoop.ColumnFamilyRecordReader is using Thrift
 4. o.a.c.hadoop.AbstractCassandraStorage is using Thrift
 5. o.a.c.hadoop.pig.CqlStorage is using Thrift
 Some of the things listed above use Thrift to get the list of partition key 
 columns or clustering columns. Those should be converted to use the Metadata 
 API of the java-driver.
 Somewhat related to that, we also have badly ported code from Thrift in 
 o.a.c.hadoop.cql3.CqlRecordReader (see fetchKeys()) that manually fetches 
 columns from schema tables instead of properly using the driver's Metadata 
 API.
 We need all of it fixed. One exception, for now, is 
 o.a.c.hadoop.AbstractColumnFamilyInputFormat - it's using Thrift for its 
 describe_splits_ex() call that cannot be currently replaced by any 
 java-driver call (?).
 Once this is done, we can stop starting Thrift RPC port by default in 
 cassandra.yaml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8358) Bundled tools shouldn't be using Thrift API

2015-02-02 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14302762#comment-14302762
 ] 

Brandon Williams commented on CASSANDRA-8358:
-

bq. As stated earlier, there are probably additional changes needed to 
PigTestBase and AbstractCassandraStorage, but they belong in their own tickets.

That's fine, but then this ticket should depend upon them, because pig-test is 
the best and easiest way we have to make sure this works, and I'd rather make 
sure it works before committing than fix broken stuff again later if pig-test 
fails.

 Bundled tools shouldn't be using Thrift API
 ---

 Key: CASSANDRA-8358
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8358
 Project: Cassandra
  Issue Type: Improvement
Reporter: Aleksey Yeschenko
Assignee: Philip Thompson
 Fix For: 3.0


 In 2.1, we switched cqlsh to the python-driver.
 In 3.0, we got rid of cassandra-cli.
 Yet there is still code that's using legacy Thrift API. We want to convert it 
 all to use the java-driver instead.
 1. BulkLoader uses Thrift to query the schema tables. It should be using 
 java-driver metadata APIs directly instead.
 2. o.a.c.hadoop.cql3.CqlRecordWriter is using Thrift
 3. o.a.c.hadoop.ColumnFamilyRecordReader is using Thrift
 4. o.a.c.hadoop.AbstractCassandraStorage is using Thrift
 5. o.a.c.hadoop.pig.CqlStorage is using Thrift
 Some of the things listed above use Thrift to get the list of partition key 
 columns or clustering columns. Those should be converted to use the Metadata 
 API of the java-driver.
 Somewhat related to that, we also have badly ported code from Thrift in 
 o.a.c.hadoop.cql3.CqlRecordReader (see fetchKeys()) that manually fetches 
 columns from schema tables instead of properly using the driver's Metadata 
 API.
 We need all of it fixed. One exception, for now, is 
 o.a.c.hadoop.AbstractColumnFamilyInputFormat - it's using Thrift for its 
 describe_splits_ex() call that cannot be currently replaced by any 
 java-driver call (?).
 Once this is done, we can stop starting Thrift RPC port by default in 
 cassandra.yaml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8358) Bundled tools shouldn't be using Thrift API

2015-02-02 Thread Philip Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14302766#comment-14302766
 ] 

Philip Thompson commented on CASSANDRA-8358:


What should be done about the fact that this depends upon version 2.1.5 of the 
java driver, which is not yet released? I assume the easiest thing to do is 
wait for that release, then bundle the appropriate jar. Currently mvn install 
needs to be run against a snapshot jar in order for this to build.

 Bundled tools shouldn't be using Thrift API
 ---

 Key: CASSANDRA-8358
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8358
 Project: Cassandra
  Issue Type: Improvement
Reporter: Aleksey Yeschenko
Assignee: Philip Thompson
 Fix For: 3.0


 In 2.1, we switched cqlsh to the python-driver.
 In 3.0, we got rid of cassandra-cli.
 Yet there is still code that's using legacy Thrift API. We want to convert it 
 all to use the java-driver instead.
 1. BulkLoader uses Thrift to query the schema tables. It should be using 
 java-driver metadata APIs directly instead.
 2. o.a.c.hadoop.cql3.CqlRecordWriter is using Thrift
 3. o.a.c.hadoop.ColumnFamilyRecordReader is using Thrift
 4. o.a.c.hadoop.AbstractCassandraStorage is using Thrift
 5. o.a.c.hadoop.pig.CqlStorage is using Thrift
 Some of the things listed above use Thrift to get the list of partition key 
 columns or clustering columns. Those should be converted to use the Metadata 
 API of the java-driver.
 Somewhat related to that, we also have badly ported code from Thrift in 
 o.a.c.hadoop.cql3.CqlRecordReader (see fetchKeys()) that manually fetches 
 columns from schema tables instead of properly using the driver's Metadata 
 API.
 We need all of it fixed. One exception, for now, is 
 o.a.c.hadoop.AbstractColumnFamilyInputFormat - it's using Thrift for its 
 describe_splits_ex() call that cannot be currently replaced by any 
 java-driver call (?).
 Once this is done, we can stop starting Thrift RPC port by default in 
 cassandra.yaml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8358) Bundled tools shouldn't be using Thrift API

2015-01-28 Thread Philip Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296200#comment-14296200
 ] 

Philip Thompson commented on CASSANDRA-8358:


Here is my current branch: https://github.com/ptnapoleon/cassandra/compare/8358
Sorry about the WIP changes pushed to BulkLoader; ignore those for now. I have 
recently received a JAR of a tentative 2.1.5 of the driver containing JAVA-312, 
so I can finish work on this now.

I was having an issue where the Thread calling the java driver's connect() was 
being interrupted, which was causing the connect() to fail. Currently I check 
for Thread.interrupted() and retry if that is the reason for the failure. I am 
not sure how to prevent the interruption in the first place.
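
For illustration, a minimal sketch of that retry-on-interrupt workaround, assuming a driver {{Cluster}} instance; the method and its shape are hypothetical, not the actual branch:

{code}
// Sketch only: retry Cluster.connect() when the failure was caused by the calling
// thread having been interrupted; anything else is rethrown immediately.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

final class ConnectRetrySketch
{
    static Session connectWithRetry(Cluster cluster, String keyspace, int maxAttempts)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                return cluster.connect(keyspace);
            }
            catch (RuntimeException e)
            {
                // Thread.interrupted() also clears the flag, so a retry can proceed;
                // failures not caused by an interrupt are rethrown immediately.
                if (!Thread.interrupted() || attempt == maxAttempts)
                    throw e;
            }
        }
        throw new IllegalStateException("maxAttempts must be positive");
    }
}
{code}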

Currently when running pig-test, only one test that uses CqlNativeStorage is 
failing, and that is testCqlNativeStorageCollectionColumnTable. 
This is due to the following problem:
{code}
java.lang.IllegalArgumentException
    at java.nio.Buffer.limit(Buffer.java:267)
    at org.apache.cassandra.utils.ByteBufferUtil.readBytes(ByteBufferUtil.java:552)
    at org.apache.cassandra.utils.ByteBufferUtil.readBytesWithShortLength(ByteBufferUtil.java:561)
    at org.apache.cassandra.serializers.CollectionSerializer.readValue(CollectionSerializer.java:118)
    at org.apache.cassandra.serializers.MapSerializer.deserializeForNativeProtocol(MapSerializer.java:100)
    at org.apache.cassandra.cql3.Maps$Value.fromSerialized(Maps.java:164)
    at org.apache.cassandra.cql3.Maps$Marker.bind(Maps.java:273)
    at org.apache.cassandra.cql3.Maps$Marker.bind(Maps.java:262)
    at org.apache.cassandra.cql3.Maps$Putter.doPut(Maps.java:355)
    at org.apache.cassandra.cql3.Maps$Setter.execute(Maps.java:292)
    at org.apache.cassandra.cql3.statements.UpdateStatement.addUpdateForKey(UpdateStatement.java:98)
    at org.apache.cassandra.cql3.statements.ModificationStatement.getMutations(ModificationStatement.java:655)
    at org.apache.cassandra.cql3.statements.ModificationStatement.executeWithoutCondition(ModificationStatement.java:487)
    at org.apache.cassandra.cql3.statements.ModificationStatement.execute(ModificationStatement.java:473)
    at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:233)
    at org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:443)
    at org.apache.cassandra.transport.messages.ExecuteMessage.execute(ExecuteMessage.java:134)
    at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:439)
    at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:335)
    at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
    at io.netty.channel.AbstractChannelHandlerContext.access$700(AbstractChannelHandlerContext.java:32)
    at io.netty.channel.AbstractChannelHandlerContext$8.run(AbstractChannelHandlerContext.java:324)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164)
    at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105)
    at java.lang.Thread.run(Thread.java:745)
{code}
The error comes from this code in CollectionSerializer.readValue:
{code}
    public static ByteBuffer readValue(ByteBuffer input, int version)
    {
        if (version >= Server.VERSION_3)
        {
            int size = input.getInt();
            if (size < 0)
                return null;

            return ByteBufferUtil.readBytes(input, size);
        }
        else
        {
            return ByteBufferUtil.readBytesWithShortLength(input);
        }
    }
{code}
The value of {{size}} from {{input.getInt()}} is an integer in the millions for one of 
the map values. I am still figuring out what differs from cassandra-2.1, 
where the test passes without my changes, but the ByteBuffer itself doesn't 
appear to be different.

In CqlConfigHelper, should I be creating an OUTPUT_* property for each INPUT_* 
property?

PigTestBase should be switched over to using the java driver, but I would 
rather handle that in a separate ticket.
AbstractCassandraStorage may be deprecated for 3.0, but it is not working at 
all with the current schema parsing queries. That also belongs in a separate 
ticket.

 Bundled tools shouldn't be using Thrift API
 ---

 Key: CASSANDRA-8358
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8358
 Project: Cassandra
  Issue Type: Improvement
  

[jira] [Commented] (CASSANDRA-8358) Bundled tools shouldn't be using Thrift API

2015-01-21 Thread Philip Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286608#comment-14286608
 ] 

Philip Thompson commented on CASSANDRA-8358:


I am now just waiting on CASSANDRA-8622 to verify I have not broken pig-test, 
and JAVA-312 for the changes to BulkLoader. The patch is otherwise complete.

 Bundled tools shouldn't be using Thrift API
 ---

 Key: CASSANDRA-8358
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8358
 Project: Cassandra
  Issue Type: Improvement
Reporter: Aleksey Yeschenko
Assignee: Philip Thompson
 Fix For: 3.0


 In 2.1, we switched cqlsh to the python-driver.
 In 3.0, we got rid of cassandra-cli.
 Yet there is still code that's using legacy Thrift API. We want to convert it 
 all to use the java-driver instead.
 1. BulkLoader uses Thrift to query the schema tables. It should be using 
 java-driver metadata APIs directly instead.
 2. o.a.c.hadoop.cql3.CqlRecordWriter is using Thrift
 3. o.a.c.hadoop.ColumnFamilyRecordReader is using Thrift
 4. o.a.c.hadoop.AbstractCassandraStorage is using Thrift
 5. o.a.c.hadoop.pig.CqlStorage is using Thrift
 Some of the things listed above use Thrift to get the list of partition key 
 columns or clustering columns. Those should be converted to use the Metadata 
 API of the java-driver.
 Somewhat related to that, we also have badly ported code from Thrift in 
 o.a.c.hadoop.cql3.CqlRecordReader (see fetchKeys()) that manually fetches 
 columns from schema tables instead of properly using the driver's Metadata 
 API.
 We need all of it fixed. One exception, for now, is 
 o.a.c.hadoop.AbstractColumnFamilyInputFormat - it's using Thrift for its 
 describe_splits_ex() call that cannot be currently replaced by any 
 java-driver call (?).
 Once this is done, we can stop starting Thrift RPC port by default in 
 cassandra.yaml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8358) Bundled tools shouldn't be using Thrift API

2015-01-20 Thread Philip Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284636#comment-14284636
 ] 

Philip Thompson commented on CASSANDRA-8358:


Second progress update:
1. JAVA-312 is still being finalized but should be done soon. At that point I 
will begin on BulkLoader.
2. I am able to run pig locally against tables with a subset of data types. 
Still waiting on CASSANDRA-8622 to ensure I am not breaking any of the pig 
tests. Once everything is working locally, I will take Brandon's advice and run 
Pig against a real cluster. 

 Bundled tools shouldn't be using Thrift API
 ---

 Key: CASSANDRA-8358
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8358
 Project: Cassandra
  Issue Type: Improvement
Reporter: Aleksey Yeschenko
Assignee: Philip Thompson
 Fix For: 3.0


 In 2.1, we switched cqlsh to the python-driver.
 In 3.0, we got rid of cassandra-cli.
 Yet there is still code that's using legacy Thrift API. We want to convert it 
 all to use the java-driver instead.
 1. BulkLoader uses Thrift to query the schema tables. It should be using 
 java-driver metadata APIs directly instead.
 2. o.a.c.hadoop.cql3.CqlRecordWriter is using Thrift
 3. o.a.c.hadoop.ColumnFamilyRecordReader is using Thrift
 4. o.a.c.hadoop.AbstractCassandraStorage is using Thrift
 5. o.a.c.hadoop.pig.CqlStorage is using Thrift
 Some of the things listed above use Thrift to get the list of partition key 
 columns or clustering columns. Those should be converted to use the Metadata 
 API of the java-driver.
 Somewhat related to that, we also have badly ported code from Thrift in 
 o.a.c.hadoop.cql3.CqlRecordReader (see fetchKeys()) that manually fetches 
 columns from schema tables instead of properly using the driver's Metadata 
 API.
 We need all of it fixed. One exception, for now, is 
 o.a.c.hadoop.AbstractColumnFamilyInputFormat - it's using Thrift for its 
 describe_splits_ex() call that cannot be currently replaced by any 
 java-driver call (?).
 Once this is done, we can stop starting Thrift RPC port by default in 
 cassandra.yaml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8358) Bundled tools shouldn't be using Thrift API

2015-01-20 Thread Philip Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284647#comment-14284647
 ] 

Philip Thompson commented on CASSANDRA-8358:


It seems that Pig is unhappy with all of the collection types. I see exceptions 
like this:
{code}
18:31:00.105 [Thread-4] WARN  o.a.hadoop.mapred.LocalJobRunner - job_local_0001
java.lang.RuntimeException: Unexpected data type java.util.ArrayList found in stream. Note only standard Pig type is supported when you output from UDF/LoadFunc
    at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:596) ~[pig-0.12.1.jar:na]
    at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:462) ~[pig-0.12.1.jar:na]
    at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135) ~[pig-0.12.1.jar:na]
    at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:650) ~[pig-0.12.1.jar:na]
    at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:470) ~[pig-0.12.1.jar:na]
    at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:462) ~[pig-0.12.1.jar:na]
    at org.apache.pig.impl.io.InterRecordWriter.write(InterRecordWriter.java:73) ~[pig-0.12.1.jar:na]
    at org.apache.pig.impl.io.InterStorage.putNext(InterStorage.java:88) ~[pig-0.12.1.jar:na]
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139) ~[pig-0.12.1.jar:na]
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98) ~[pig-0.12.1.jar:na]
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:639) ~[hadoop-core-1.0.3.jar:na]
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80) ~[hadoop-core-1.0.3.jar:na]
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48) ~[pig-0.12.1.jar:na]
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:263) ~[pig-0.12.1.jar:na]
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64) ~[pig-0.12.1.jar:na]
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) ~[hadoop-core-1.0.3.jar:na]
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) ~[hadoop-core-1.0.3.jar:na]
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) ~[hadoop-core-1.0.3.jar:na]
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212) ~[hadoop-core-1.0.3.jar:na]
{code}


 Bundled tools shouldn't be using Thrift API
 ---

 Key: CASSANDRA-8358
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8358
 Project: Cassandra
  Issue Type: Improvement
Reporter: Aleksey Yeschenko
Assignee: Philip Thompson
 Fix For: 3.0


 In 2.1, we switched cqlsh to the python-driver.
 In 3.0, we got rid of cassandra-cli.
 Yet there is still code that's using legacy Thrift API. We want to convert it 
 all to use the java-driver instead.
 1. BulkLoader uses Thrift to query the schema tables. It should be using 
 java-driver metadata APIs directly instead.
 2. o.a.c.hadoop.cql3.CqlRecordWriter is using Thrift
 3. o.a.c.hadoop.ColumnFamilyRecordReader is using Thrift
 4. o.a.c.hadoop.AbstractCassandraStorage is using Thrift
 5. o.a.c.hadoop.pig.CqlStorage is using Thrift
 Some of the things listed above use Thrift to get the list of partition key 
 columns or clustering columns. Those should be converted to use the Metadata 
 API of the java-driver.
 Somewhat related to that, we also have badly ported code from Thrift in 
 o.a.c.hadoop.cql3.CqlRecordReader (see fetchKeys()) that manually fetches 
 columns from schema tables instead of properly using the driver's Metadata 
 API.
 We need all of it fixed. One exception, for now, is 
 o.a.c.hadoop.AbstractColumnFamilyInputFormat - it's using Thrift for its 
 describe_splits_ex() call that cannot be currently replaced by any 
 java-driver call (?).
 Once this is done, we can stop starting Thrift RPC port by default in 
 cassandra.yaml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8358) Bundled tools shouldn't be using Thrift API

2015-01-14 Thread Philip Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277874#comment-14277874
 ] 

Philip Thompson commented on CASSANDRA-8358:


Progress Update:
1. Completion of work on BulkLoader is blocked by 
https://datastax-oss.atlassian.net/browse/JAVA-312
2. I have an initial draft for both o.a.c.h.cql3.CqlRecordWriter and 
o.a.c.h.cql3.CqlRecordReader. pig-test is completely broken on trunk right now, 
so I haven't had a good opportunity to test them.
3. I am not touching o.a.c.h.ColumnFamily* on [~jjordan]'s recommendation.
4. o.a.c.h.pig.CqlNativeStorage extends CqlStorage, which extends 
AbstractCassandraStorage. CassandraStorage also extends 
AbstractCassandraStorage. I will remove Thrift from CqlNativeStorage. Should I 
also remove Thrift from CqlStorage, or just deprecate it? It seems to me that I 
will need to break the link between CqlNativeStorage and CqlStorage, or between 
CqlStorage and AbstractCassandraStorage, in order to remove Thrift without 
affecting CassandraStorage. What would be best here?

 Bundled tools shouldn't be using Thrift API
 ---

 Key: CASSANDRA-8358
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8358
 Project: Cassandra
  Issue Type: Improvement
Reporter: Aleksey Yeschenko
Assignee: Philip Thompson
 Fix For: 3.0


 In 2.1, we switched cqlsh to the python-driver.
 In 3.0, we got rid of cassandra-cli.
 Yet there is still code that's using legacy Thrift API. We want to convert it 
 all to use the java-driver instead.
 1. BulkLoader uses Thrift to query the schema tables. It should be using 
 java-driver metadata APIs directly instead.
 2. o.a.c.hadoop.cql3.CqlRecordWriter is using Thrift
 3. o.a.c.hadoop.ColumnFamilyRecordReader is using Thrift
 4. o.a.c.hadoop.AbstractCassandraStorage is using Thrift
 5. o.a.c.hadoop.pig.CqlStorage is using Thrift
 Some of the things listed above use Thrift to get the list of partition key 
 columns or clustering columns. Those should be converted to use the Metadata 
 API of the java-driver.
 Somewhat related to that, we also have badly ported code from Thrift in 
 o.a.c.hadoop.cql3.CqlRecordReader (see fetchKeys()) that manually fetches 
 columns from schema tables instead of properly using the driver's Metadata 
 API.
 We need all of it fixed. One exception, for now, is 
 o.a.c.hadoop.AbstractColumnFamilyInputFormat - it's using Thrift for its 
 describe_splits_ex() call that cannot be currently replaced by any 
 java-driver call (?).
 Once this is done, we can stop starting Thrift RPC port by default in 
 cassandra.yaml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8358) Bundled tools shouldn't be using Thrift API

2015-01-14 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277886#comment-14277886
 ] 

Brandon Williams commented on CASSANDRA-8358:
-

As for 4, CqlStorage is just a dummy wrapper for CqlNativeStorage after 
CASSANDRA-8599.  I'm going to remove it for 3.0.

 Bundled tools shouldn't be using Thrift API
 ---

 Key: CASSANDRA-8358
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8358
 Project: Cassandra
  Issue Type: Improvement
Reporter: Aleksey Yeschenko
Assignee: Philip Thompson
 Fix For: 3.0


 In 2.1, we switched cqlsh to the python-driver.
 In 3.0, we got rid of cassandra-cli.
 Yet there is still code that's using legacy Thrift API. We want to convert it 
 all to use the java-driver instead.
 1. BulkLoader uses Thrift to query the schema tables. It should be using 
 java-driver metadata APIs directly instead.
 2. o.a.c.hadoop.cql3.CqlRecordWriter is using Thrift
 3. o.a.c.hadoop.ColumnFamilyRecordReader is using Thrift
 4. o.a.c.hadoop.AbstractCassandraStorage is using Thrift
 5. o.a.c.hadoop.pig.CqlStorage is using Thrift
 Some of the things listed above use Thrift to get the list of partition key 
 columns or clustering columns. Those should be converted to use the Metadata 
 API of the java-driver.
 Somewhat related to that, we also have badly ported code from Thrift in 
 o.a.c.hadoop.cql3.CqlRecordReader (see fetchKeys()) that manually fetches 
 columns from schema tables instead of properly using the driver's Metadata 
 API.
 We need all of it fixed. One exception, for now, is 
 o.a.c.hadoop.AbstractColumnFamilyInputFormat - it's using Thrift for its 
 describe_splits_ex() call that cannot be currently replaced by any 
 java-driver call (?).
 Once this is done, we can stop starting Thrift RPC port by default in 
 cassandra.yaml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8358) Bundled tools shouldn't be using Thrift API

2014-11-21 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14221110#comment-14221110
 ] 

Jeremiah Jordan commented on CASSANDRA-8358:


o.a.c.h.ColumnFamily* stuff is replaced by o.a.c.h.cql3.Cql*. We should either 
just drop those, or leave them alone and people can turn on Thrift if they 
still need to use them.

o.a.c.h.pig.CqlStorage and o.a.c.h.pig.CassandraStorage are replaced by 
o.a.c.h.pig.CqlNativeStorage.  Same thing, either drop or leave alone.

Maybe just mark all that stuff deprecated and leave it alone for now.

I think the main task here is to make sure o.a.c.h.cql3.Cql* and 
o.a.c.h.pig.CqlNativeStorage all use the metadata APIs and don't have any 
Thrift calls.
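
For reference, a hedged sketch of the driver-side calls this maps onto; the keyspace and table names are placeholders:

{code}
// Sketch: partition key and clustering columns straight from the java-driver
// Metadata API - no Thrift, no manual schema-table queries.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ColumnMetadata;
import com.datastax.driver.core.TableMetadata;

final class MetadataApiSketch
{
    static void printKeyColumns(Cluster cluster)
    {
        TableMetadata table = cluster.getMetadata()
                                     .getKeyspace("ks")    // placeholder keyspace
                                     .getTable("tbl");     // placeholder table
        for (ColumnMetadata column : table.getPartitionKey())
            System.out.println("partition key: " + column.getName());
        for (ColumnMetadata column : table.getClusteringColumns())
            System.out.println("clustering: " + column.getName());
    }
}
{code}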

We need CASSANDRA-7688 or similar to be able to replace the describe_splits_ex 
call.  So we have to leave that in for now, but should be able to clean 
everything else up.

 Bundled tools shouldn't be using Thrift API
 ---

 Key: CASSANDRA-8358
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8358
 Project: Cassandra
  Issue Type: Improvement
Reporter: Aleksey Yeschenko
Assignee: Philip Thompson
 Fix For: 3.0


 In 2.1, we switched cqlsh to the python-driver.
 In 3.0, we got rid of cassandra-cli.
 Yet there is still code that's using legacy Thrift API. We want to convert it 
 all to use the java-driver instead.
 1. BulkLoader uses Thrift to query the schema tables. It should be using 
 java-driver metadata APIs directly instead.
 2. o.a.c.hadoop.cql3.CqlRecordWriter is using Thrift
 3. o.a.c.hadoop.ColumnFamilyRecordReader is using Thrift
 4. o.a.c.hadoop.AbstractCassandraStorage is using Thrift
 5. o.a.c.hadoop.pig.CqlStorage is using Thrift
 Some of the things listed above use Thrift to get the list of partition key 
 columns or clustering columns. Those should be converted to use the Metadata 
 API of the java-driver.
 Somewhat related to that, we also have badly ported code from Thrift in 
 o.a.c.hadoop.cql3.CqlRecordReader (see fetchKeys()) that manually fetches 
 columns from schema tables instead of properly using the driver's Metadata 
 API.
 We need all of it fixed. One exception, for now, is 
 o.a.c.hadoop.AbstractColumnFamilyInputFormat - it's using Thrift for its 
 describe_splits_ex() call that cannot be currently replaced by any 
 java-driver call (?).
 Once this is done, we can stop starting Thrift RPC port by default in 
 cassandra.yaml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)