[jira] [Commented] (CASSANDRA-7827) Work around for output name restriction when using MultipleOutputs with CqlBulkOutputFormat

2014-12-01 Thread JIRA

[ https://issues.apache.org/jira/browse/CASSANDRA-7827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14229809#comment-14229809 ]

Piotr Kołaczkowski commented on CASSANDRA-7827:
---

+1

 Work around for output name restriction when using MultipleOutputs with 
 CqlBulkOutputFormat
 ---

 Key: CASSANDRA-7827
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7827
 Project: Cassandra
  Issue Type: Improvement
  Components: Hadoop
Reporter: Paul Pak
Assignee: Paul Pak
Priority: Minor
  Labels: cql3, hadoop
 Attachments: trunk-7827-v1.txt


 When using MultipleOutputs with CqlBulkOutputFormat, the names of the column 
 families being written to are restricted to alphanumeric characters only, 
 because of the check in MultipleOutputs.checkNamedOutputName(). This will 
 provide a way to alias any column family name to a MultipleOutputs-compatible 
 output name, so that column family names won't be artificially restricted 
 when using MultipleOutputs.
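The aliasing idea in the description can be illustrated with a small, self-contained sketch. It keeps alias-to-column-family mappings in a plain Map standing in for the Hadoop Configuration. The class name AliasSketch, the key prefix cqlbulkoutputformat.alias. and both methods are hypothetical; they are not the API added by the attached patch.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: maps MultipleOutputs-compatible aliases to real column
 * family names. A plain Map stands in for Hadoop's Configuration.
 */
public class AliasSketch {
    // Hypothetical property prefix; not the key used by CqlBulkOutputFormat.
    public static final String ALIAS_PREFIX = "cqlbulkoutputformat.alias.";

    /** Records that the output name {@code alias} should write to {@code columnFamily}. */
    public static void setAlias(Map<String, String> conf, String alias, String columnFamily) {
        conf.put(ALIAS_PREFIX + alias, columnFamily);
    }

    /**
     * Resolves an output name to a column family: if an alias was registered,
     * return its target, otherwise treat the name as the column family itself.
     */
    public static String resolve(Map<String, String> conf, String outputName) {
        return conf.getOrDefault(ALIAS_PREFIX + outputName, outputName);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        setAlias(conf, "myCfAlias", "my_cf");
        System.out.println(resolve(conf, "myCfAlias")); // my_cf
        System.out.println(resolve(conf, "othercf"));   // othercf
    }
}
```

Falling back to the output name itself keeps existing jobs working when no alias is registered.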



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7827) Work around for output name restriction when using MultipleOutputs with CqlBulkOutputFormat

2014-10-07 Thread JIRA

[ https://issues.apache.org/jira/browse/CASSANDRA-7827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161746#comment-14161746 ]

Piotr Kołaczkowski commented on CASSANDRA-7827:
---

Why is the MultipleOutputs name format restricted?


[jira] [Commented] (CASSANDRA-7827) Work around for output name restriction when using MultipleOutputs with CqlBulkOutputFormat

2014-10-07 Thread Paul Pak (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-7827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162014#comment-14162014 ]

Paul Pak commented on CASSANDRA-7827:
-

[~pkolaczk] I'm not exactly sure why. Perhaps it's because the namedOutput is 
potentially used as part of the HDFS file path, and there are 
restrictions/conventions around that? Either way, 
MultipleOutputs.checkNamedOutputName() calls MultipleOutputs.checkTokenName(), 
which restricts the name to the characters [A-Za-z0-9].
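Hadoop's own source is not reproduced here. The following standalone Java sketch only re-implements the rule stated above (the name must be non-empty and use only [A-Za-z0-9]) so its effect on a name like my_cf is easy to check. The class and method names are hypothetical, and where the real check throws an exception, this sketch simply returns false.

```java
/**
 * Standalone re-implementation of the rule described above: a MultipleOutputs
 * named output must be non-empty and contain only [A-Za-z0-9].
 * This mirrors the behaviour, not Hadoop's actual source.
 */
public class NamedOutputCheck {
    public static boolean isValidOutputName(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        for (char ch : name.toCharArray()) {
            boolean asciiAlnum = (ch >= 'A' && ch <= 'Z')
                    || (ch >= 'a' && ch <= 'z')
                    || (ch >= '0' && ch <= '9');
            if (!asciiAlnum) {
                return false; // e.g. the '_' in "my_cf" is rejected
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidOutputName("myCfAlias")); // true
        System.out.println(isValidOutputName("my_cf"));     // false
    }
}
```

The explicit ASCII ranges are used instead of Character.isLetterOrDigit, which would also accept non-ASCII letters and digits.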


[jira] [Commented] (CASSANDRA-7827) Work around for output name restriction when using MultipleOutputs with CqlBulkOutputFormat

2014-08-26 Thread Paul Pak (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-7827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110861#comment-14110861 ]

Paul Pak commented on CASSANDRA-7827:
-

Usage example:
{code}
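import java.util.List;
import org.apache.cassandra.hadoop.cql3.CqlBulkOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// conf, job, multipleOutputs and byteBufferList come from the surrounding
// job setup and reducer.
String cf = "my_cf";        // underscores are not valid in MultipleOutputs output names
String alias = "myCfAlias"; // letters and digits only, so MultipleOutputs accepts it

// set properties for CqlBulkOutputFormat as usual, keyed by the real column family name
CqlBulkOutputFormat.setColumnFamilySchema(conf, cf, "CREATE TABLE my_cf ...");
CqlBulkOutputFormat.setColumnFamilyInsertStatement(conf, cf, "INSERT INTO my_cf ...");
// set the alias
CqlBulkOutputFormat.setColumnFamilyAlias(conf, alias, cf);
// interactions with MultipleOutputs should be done using the alias
MultipleOutputs.addNamedOutput(job, alias, CqlBulkOutputFormat.class,
        Object.class, List.class);

// ...

// again, interactions with MultipleOutputs should be done using the alias, so...
multipleOutputs.write(alias, null, byteBufferList);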

{code}
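In the usage example the caller picks the alias by hand. If there are many column families, one option is to derive each alias by dropping every character outside [A-Za-z0-9] and refusing collisions, since names such as my_cf and mycf would otherwise map to the same output. The class AliasDeriver and its methods are hypothetical helpers, not part of CqlBulkOutputFormat.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper: derives a MultipleOutputs-compatible alias for each
 * column family by keeping only [A-Za-z0-9], and rejects collisions.
 */
public class AliasDeriver {
    /** Keeps only ASCII letters and digits, e.g. "my_cf" becomes "mycf". */
    public static String deriveAlias(String columnFamily) {
        StringBuilder sb = new StringBuilder();
        for (char ch : columnFamily.toCharArray()) {
            if ((ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z') || (ch >= '0' && ch <= '9')) {
                sb.append(ch);
            }
        }
        if (sb.length() == 0) {
            throw new IllegalArgumentException("No usable characters in: " + columnFamily);
        }
        return sb.toString();
    }

    /** Returns alias to column family, failing if two names collapse to the same alias. */
    public static Map<String, String> deriveAliases(List<String> columnFamilies) {
        Map<String, String> aliases = new LinkedHashMap<>();
        for (String cf : columnFamilies) {
            String alias = deriveAlias(cf);
            String previous = aliases.putIfAbsent(alias, cf);
            if (previous != null && !previous.equals(cf)) {
                throw new IllegalArgumentException(
                        "Alias '" + alias + "' is shared by " + previous + " and " + cf);
            }
        }
        return aliases;
    }

    public static void main(String[] args) {
        System.out.println(deriveAliases(Arrays.asList("my_cf", "events_2014")));
        // {mycf=my_cf, events2014=events_2014}
    }
}
```

Each derived alias/column family pair would then be registered with setColumnFamilyAlias(conf, alias, cf) and used with MultipleOutputs exactly as in the example.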

--
This message was sent by Atlassian JIRA
(v6.2#6252)