[jira] [Commented] (CASSANDRA-7827) Work around for output name restriction when using MultipleOutputs with CqlBulkOutputFormat
[ https://issues.apache.org/jira/browse/CASSANDRA-7827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14229809#comment-14229809 ]

Piotr Kołaczkowski commented on CASSANDRA-7827:
-----------------------------------------------

+1

Work around for output name restriction when using MultipleOutputs with CqlBulkOutputFormat
-------------------------------------------------------------------------------------------

Key: CASSANDRA-7827
URL: https://issues.apache.org/jira/browse/CASSANDRA-7827
Project: Cassandra
Issue Type: Improvement
Components: Hadoop
Reporter: Paul Pak
Assignee: Paul Pak
Priority: Minor
Labels: cql3, hadoop
Attachments: trunk-7827-v1.txt

When using MultipleOutputs with CqlBulkOutputFormat, the column family names to output to are restricted to only alphanumeric characters due to the logic found in MultipleOutputs.checkNamedOutputName(). This will provide a way to alias any column family name to a MultipleOutputs compatible output name, so that column family names won't be artificially restricted when using MultipleOutputs.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7827) Work around for output name restriction when using MultipleOutputs with CqlBulkOutputFormat
[ https://issues.apache.org/jira/browse/CASSANDRA-7827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161746#comment-14161746 ]

Piotr Kołaczkowski commented on CASSANDRA-7827:
-----------------------------------------------

Why is MultipleOutputs name format restricted?
[jira] [Commented] (CASSANDRA-7827) Work around for output name restriction when using MultipleOutputs with CqlBulkOutputFormat
[ https://issues.apache.org/jira/browse/CASSANDRA-7827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162014#comment-14162014 ]

Paul Pak commented on CASSANDRA-7827:
-------------------------------------

[~pkolaczk] I'm not exactly sure why. Perhaps it's because the namedOutput name is potentially used as part of the HDFS file path and there are restrictions/conventions around that? Either way, MultipleOutputs.checkNamedOutputName() -> MultipleOutputs.checkTokenName() restricts the name to only [A-Za-z0-9].
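To make the restriction discussed above concrete, here is a minimal sketch of the character rule. It only mirrors the [A-Za-z0-9] rule described in this thread; it is not Hadoop's source, and the class name NamedOutputCheck and method isValidOutputName are hypothetical names chosen for illustration.

```java
import java.util.regex.Pattern;

/** Hypothetical stand-in for the name check described above; not Hadoop code. */
final class NamedOutputCheck {
    // One or more ASCII letters or digits, nothing else.
    private static final Pattern ALLOWED = Pattern.compile("[A-Za-z0-9]+");

    private NamedOutputCheck() {}

    /** Returns true only if the name is non-empty and purely alphanumeric. */
    static boolean isValidOutputName(String name) {
        return name != null && ALLOWED.matcher(name).matches();
    }

    public static void main(String[] args) {
        // An underscore is outside the allowed set, so this prints false.
        System.out.println(isValidOutputName("my_cf"));
        // Letters only, so this prints true.
        System.out.println(isValidOutputName("myCfAlias"));
    }
}
```

Under this rule a typical column family name such as my_cf fails the check, which is why the ticket proposes an alias rather than renaming the table.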
[jira] [Commented] (CASSANDRA-7827) Work around for output name restriction when using MultipleOutputs with CqlBulkOutputFormat
[ https://issues.apache.org/jira/browse/CASSANDRA-7827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110861#comment-14110861 ]

Paul Pak commented on CASSANDRA-7827:
-------------------------------------

Usage example:
{code}
String cf = "my_cf"; // underscores are not valid for MultipleOutputs output names
String alias = "myCfAlias";

// set properties for CqlBulkOutputFormat as usual
CqlBulkOutputFormat.setColumnFamilySchema(conf, cf, "CREATE TABLE my_cf ...");
CqlBulkOutputFormat.setColumnFamilyInsertStatement(conf, cf, "INSERT INTO my_cf...");

// set the alias
CqlBulkOutputFormat.setColumnFamilyAlias(conf, alias, cf);

// interactions with MultipleOutputs should be done using the alias
MultipleOutputs.addNamedOutput(job, alias, CqlBulkOutputFormat.class, Object.class, List.class);
...
// again, interactions with MultipleOutputs should be done using the alias, so...
multipleOutputs.write(alias, null, byteBufferList);
{code}

--
This message was sent by Atlassian JIRA (v6.2#6252)
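The usage example above picks the alias by hand. For jobs with many tables, one possible approach is to derive an alias from the column family name. The sketch below is an assumption made for illustration, not part of the attached patch: AliasSketch and toAlias are hypothetical names, and the camel-casing rule is just one way to produce a name made only of [A-Za-z0-9].

```java
/** Hypothetical helper, not part of the patch: derives an alphanumeric alias. */
final class AliasSketch {
    private AliasSketch() {}

    /**
     * Drops every character outside [A-Za-z0-9] and upper-cases the
     * character that follows a dropped one, e.g. my_cf becomes myCf.
     */
    static String toAlias(String columnFamily) {
        StringBuilder sb = new StringBuilder();
        boolean upperNext = false;
        for (int i = 0; i < columnFamily.length(); i++) {
            char ch = columnFamily.charAt(i);
            boolean allowed = (ch >= 'A' && ch <= 'Z')
                    || (ch >= 'a' && ch <= 'z')
                    || (ch >= '0' && ch <= '9');
            if (!allowed) {
                // Skip the character; capitalize the next one unless nothing
                // has been emitted yet (leading separators are just dropped).
                upperNext = sb.length() > 0;
                continue;
            }
            sb.append(upperNext ? Character.toUpperCase(ch) : ch);
            upperNext = false;
        }
        if (sb.length() == 0) {
            throw new IllegalArgumentException(
                    "No alphanumeric characters in: " + columnFamily);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toAlias("my_cf")); // prints myCf
    }
}
```

Distinct column family names can map to the same alias under this scheme (my_cf and myCf both yield myCf), so a real job would need to check the generated aliases for uniqueness before registering them.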