[jira] [Comment Edited] (CASSANDRA-15141) Faster token ownership calculation for NetworkTopologyStrategy
[ https://issues.apache.org/jira/browse/CASSANDRA-15141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16980297#comment-16980297 ] Alex Liu edited comment on CASSANDRA-15141 at 11/22/19 4:28 PM: How does {{[getAddressReplicas()|https://github.com/apache/cassandra/blob/7df67eff2d66dba4bed2b4f6aeabf05144d9b057/src/java/org/apache/cassandra/service/StorageService.java#L3002] block heartbeat propagation?}} was (Author: alexliu68): How does {{[getAddressReplicas()|https://github.com/apache/cassandra/blob/7df67eff2d66dba4bed2b4f6aeabf05144d9b057/src/java/org/apache/cassandra/service/StorageService.java#L3002] blocks heartbeat propagation?}} > Faster token ownership calculation for NetworkTopologyStrategy > -- > > Key: CASSANDRA-15141 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15141 > Project: Cassandra > Issue Type: Improvement > Components: Cluster/Gossip, Cluster/Membership >Reporter: Jay Zhuang >Assignee: Jay Zhuang >Priority: Normal > > This function > [{{getAddressReplicas()}}|https://github.com/apache/cassandra/blob/7df67eff2d66dba4bed2b4f6aeabf05144d9b057/src/java/org/apache/cassandra/service/StorageService.java#L3002] > during removenode and decommission is slow for large vnode cluster with > NetworkTopologyStrategy. As it needs to build whole replications map for > every token range. > In one of our cluster (> 1k nodes), it takes about 20 seconds for each > NetworkTopologyStrategy keyspace, so the total time to process a removenode > message takes at least 80 seconds (20 * 4: 3 system keyspaces, 1 user > keyspace). It blocks the heartbeat propagation and causes false down node. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15141) Faster token ownership calculation for NetworkTopologyStrategy
[ https://issues.apache.org/jira/browse/CASSANDRA-15141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16980297#comment-16980297 ] Alex Liu commented on CASSANDRA-15141: -- How does {{[getAddressReplicas()|https://github.com/apache/cassandra/blob/7df67eff2d66dba4bed2b4f6aeabf05144d9b057/src/java/org/apache/cassandra/service/StorageService.java#L3002] block heartbeat propagation?}} > Faster token ownership calculation for NetworkTopologyStrategy > -- > > Key: CASSANDRA-15141 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15141 > Project: Cassandra > Issue Type: Improvement > Components: Cluster/Gossip, Cluster/Membership >Reporter: Jay Zhuang >Assignee: Jay Zhuang >Priority: Normal > > This function > [{{getAddressReplicas()}}|https://github.com/apache/cassandra/blob/7df67eff2d66dba4bed2b4f6aeabf05144d9b057/src/java/org/apache/cassandra/service/StorageService.java#L3002] > during removenode and decommission is slow for large vnode cluster with > NetworkTopologyStrategy. As it needs to build whole replications map for > every token range. > In one of our cluster (> 1k nodes), it takes about 20 seconds for each > NetworkTopologyStrategy keyspace, so the total time to process a removenode > message takes at least 80 seconds (20 * 4: 3 system keyspaces, 1 user > keyspace). It blocks the heartbeat propagation and causes false down node. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
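The slowdown reported in CASSANDRA-15141 comes from recomputing the replica set once per token range. A minimal, self-contained sketch of that per-range loop (all names here are illustrative stand-ins, not Cassandra's actual StorageService or NetworkTopologyStrategy code):

```java
import java.util.*;

// Hypothetical sketch: getAddressReplicas() effectively recomputes replicas
// for every token range, so with T tokens the work is roughly T times the
// cost of a single replica lookup (here modeled as a short ring walk).
public class ReplicaMapSketch {
    static Map<Integer, List<Integer>> buildAddressRanges(List<Integer> ring, int rf) {
        Map<Integer, List<Integer>> addressToRanges = new HashMap<>();
        for (int i = 0; i < ring.size(); i++) {      // one pass per token range...
            for (int r = 0; r < rf; r++) {           // ...walking the ring for rf replicas
                int owner = ring.get((i + r) % ring.size());
                addressToRanges.computeIfAbsent(owner, k -> new ArrayList<>()).add(i);
            }
        }
        return addressToRanges;                      // T * rf steps in total
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> m = buildAddressRanges(Arrays.asList(10, 20, 30, 40), 3);
        System.out.println(m.size());
    }
}
```

In this toy model the cost is only T·RF ring steps; the real NetworkTopologyStrategy calculation additionally walks racks and datacenters per token, which is why a >1k-node vnode cluster can spend ~20 seconds per keyspace.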
[jira] [Commented] (CASSANDRA-11553) hadoop.cql3.CqlRecordWriter does not close cluster on reconnect
[ https://issues.apache.org/jira/browse/CASSANDRA-11553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237398#comment-15237398 ] Alex Liu commented on CASSANDRA-11553: -- +1. Closing the cluster automatically closes all of its sessions, so the explicit session close is redundant. > hadoop.cql3.CqlRecordWriter does not close cluster on reconnect > --- > > Key: CASSANDRA-11553 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11553 > Project: Cassandra > Issue Type: Bug >Reporter: Artem Aliev >Assignee: Artem Aliev > Fix For: 2.2.6, 3.5, 3.6, 3.0.6 > > Attachments: CASSANDRA-11553-2.2.txt > > > CASSANDRA-10058 add session and cluster close to all places in hadoop except > one place on reconnection. > The writer uses one connection per new cluster, so I added cluster.close() > call to sessionClose() method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
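The point above, that closing a driver Cluster cascades to its Sessions, can be demonstrated with minimal stand-in classes (these are not the DataStax driver types, just a sketch of the ownership relationship):

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in sketch (not the real driver API): the Cluster owns every Session
// it creates, so cluster.close() cascades and an extra session.close()
// beforehand is redundant.
public class CloseCascadeDemo {
    static class Session implements AutoCloseable {
        boolean closed;
        public void close() { closed = true; }
    }

    static class Cluster implements AutoCloseable {
        final List<Session> sessions = new ArrayList<>();
        Session connect() { Session s = new Session(); sessions.add(s); return s; }
        public void close() { for (Session s : sessions) s.close(); }
    }

    public static void main(String[] args) {
        Cluster cluster = new Cluster();
        Session session = cluster.connect();
        cluster.close();                   // cascades to the session
        System.out.println(session.closed);
    }
}
```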
[jira] [Commented] (CASSANDRA-11243) Memory LEAK CqlInputFormat
[ https://issues.apache.org/jira/browse/CASSANDRA-11243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15172084#comment-15172084 ] Alex Liu commented on CASSANDRA-11243: -- A DSE test on Cassandra 3.0.3 doesn't show this issue; it could be something introduced after 3.0.3. > Memory LEAK CqlInputFormat > -- > > Key: CASSANDRA-11243 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11243 > Project: Cassandra > Issue Type: Bug > Environment: Ubuntu 14.04.04 LTS > Hadoop 2.7 > Cassandra 3.3 >Reporter: Matteo Zuccon > > Error: "util.ResourceLeakDetector: LEAK: You are creating too many > HashedWheelTimer instances. HashedWheelTimer is a shared resource that must > be reused across the JVM,so that only a few instances are created" > Using CqlInputFormat.Class as input format for an Hadoop Mapreduce program > (on distributed Hadoop Cluster) gives a memory leak error. > Version of the library used: > > org.apache.cassandra > cassandra-all > 3.3 > > The same jar is working on a single node Hadoop configuration, the memory > leak error show up in the cluster hadoop configuration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11001) Hadoop integration is incompatible with Cassandra Driver 3.0.0
[ https://issues.apache.org/jira/browse/CASSANDRA-11001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15094292#comment-15094292 ] Alex Liu commented on CASSANDRA-11001: -- It should be OK. > Hadoop integration is incompatible with Cassandra Driver 3.0.0 > -- > > Key: CASSANDRA-11001 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11001 > Project: Cassandra > Issue Type: Bug >Reporter: Jacek Lewandowski >Assignee: Jacek Lewandowski > > When using Hadoop input format with SSL and Cassandra Driver 3.0.0-beta1, we > hit the following exception: > {noformat} > Exception in thread "main" java.lang.NoSuchFieldError: > DEFAULT_SSL_CIPHER_SUITES > at > org.apache.cassandra.hadoop.cql3.CqlConfigHelper.getSSLOptions(CqlConfigHelper.java:548) > at > org.apache.cassandra.hadoop.cql3.CqlConfigHelper.getCluster(CqlConfigHelper.java:315) > at > org.apache.cassandra.hadoop.cql3.CqlConfigHelper.getInputCluster(CqlConfigHelper.java:298) > at > org.apache.cassandra.hadoop.cql3.CqlInputFormat.getSplits(CqlInputFormat.java:131) > {noformat} > Should this be fixed with reflection so that Hadoop input/output formats are > compatible with both old and new driver? > [~jjordan], [~alexliu68] ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10837) Cluster/session should be closed in Cassandra Hadoop Input/Output classes
[ https://issues.apache.org/jira/browse/CASSANDRA-10837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060238#comment-15060238 ] Alex Liu commented on CASSANDRA-10837: -- +1. The patch can't be applied in my testing environment because it targets a slightly different branch. The patches look good to me; some simple testing should be enough to verify them. > Cluster/session should be closed in Cassandra Hadoop Input/Output classes > - > > Key: CASSANDRA-10837 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10837 > Project: Cassandra > Issue Type: Bug > Components: CQL >Reporter: Alex Liu >Assignee: Alex Liu > Fix For: 3.0.x > > Attachments: 10837-3.0-branch.txt, 10837-3.0-v4.txt, > 10837-v2-3.0-branch.txt, 10837-v3-3.0-branch.txt > > > See a lot of following warnings during Hadoop job running > {code} > ERROR 11:37:45 LEAK: You are creating too many HashedWheelTimer instances. > HashedWheelTimer is a shared resource that must be reused across the JVM,so > that only a few instances are created. > {code} > Each cluster/session needs be closed and a shared HashedWheelTimer may reduce > the resource leakage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-10837) Cluster/session should be closed in Cassandra Hadoop Input/Output classes
[ https://issues.apache.org/jira/browse/CASSANDRA-10837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058870#comment-15058870 ] Alex Liu edited comment on CASSANDRA-10837 at 12/15/15 9:32 PM: The changes to CqlRecordWriter seem wrong to me. The Cluster and the Session instances were both properly managed by the try-with-resources statement. In the patch version only the Cluster instance is closed. https://github.com/datastax/java-driver/blob/2.1/driver-core/src/main/java/com/datastax/driver/core/Cluster.java Cluster doesn't implement the AutoCloseable interface, so it has to be closed manually. Closing the cluster will close all session objects associated with it. Making the Cluster an instance variable in NativeRingCache will trigger an error when the write method is called, as the Cluster has been closed at the end of the constructor. = NativeRingCache uses the cluster to get the metadata in the constructor while the cluster object is still open. Once the initialization in the constructor is done, the cluster object is not used by NativeRingCache anymore. In CqlInputFormat, could you use try-with-resources for both the Cluster and Session instances? I think it is best to do things properly by closing both of them. Similarly, closing the cluster object auto-closes all session objects. was (Author: alexliu68): The changes to CqlRecordWriter seems wrong to me. The Cluster and the Session instances were both properly managed by the try-with-resources statement. In the patch version only the Cluster instance is closed. https://github.com/datastax/java-driver/blob/2.1/driver-core/src/main/java/com/datastax/driver/core/Cluster.java Cluster doesn't implements AutoClosable interface, so it has to be closes manually. Closing cluster will close all session objects associated with it. Making the Cluster an instance variable in NativeRingCache will trigger an error when the write method will be called, as the Cluster has been closed at the end of the constructor. 
= NativeRingCache uses the cluster to get the metadata in the constructor when cluster object is still open. once the initialization of constructor is done, the cluster object is not used by NativeRingCache anymore. In CqlInputFormat could you use try-with-resources for both Cluster and Session instances. I think it is best to do things properly by closing both of them. Similarly closing cluster object auto-close all sessions objects. > Cluster/session should be closed in Cassandra Hadoop Input/Output classes > - > > Key: CASSANDRA-10837 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10837 > Project: Cassandra > Issue Type: Bug > Components: CQL >Reporter: Alex Liu >Assignee: Alex Liu > Fix For: 3.0.x > > Attachments: 10837-3.0-branch.txt, 10837-v2-3.0-branch.txt > > > See a lot of following warnings during Hadoop job running > {code} > ERROR 11:37:45 LEAK: You are creating too many HashedWheelTimer instances. > HashedWheelTimer is a shared resource that must be reused across the JVM,so > that only a few instances are created. > {code} > Each cluster/session needs be closed and a shared HashedWheelTimer may reduce > the resource leakage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
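The try-with-resources pattern requested in the review above has a useful built-in guarantee: resources declared later close first (session before cluster), and both close even if the body throws. A sketch with stand-in classes (not the driver API):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the reviewed pattern: declaring both resources in one
// try-with-resources statement closes them in reverse declaration order,
// so the session is closed before the cluster that created it.
public class TryWithResourcesOrder {
    static final List<String> closedOrder = new ArrayList<>();

    static class Resource implements AutoCloseable {
        final String name;
        Resource(String name) { this.name = name; }
        public void close() { closedOrder.add(name); }
    }

    public static void main(String[] args) {
        try (Resource cluster = new Resource("cluster");
             Resource session = new Resource("session")) {
            // use the session here; both resources are closed on exit,
            // even if this block throws
        }
        System.out.println(closedOrder); // later-declared resources close first
    }
}
```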
[jira] [Commented] (CASSANDRA-10837) Cluster/session should be closed in Cassandra Hadoop Input/Output classes
[ https://issues.apache.org/jira/browse/CASSANDRA-10837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058870#comment-15058870 ] Alex Liu commented on CASSANDRA-10837: -- The changes to CqlRecordWriter seem wrong to me. The Cluster and the Session instances were both properly managed by the try-with-resources statement. In the patch version only the Cluster instance is closed. https://github.com/datastax/java-driver/blob/2.1/driver-core/src/main/java/com/datastax/driver/core/Cluster.java Cluster doesn't implement the AutoCloseable interface, so it has to be closed manually. Closing the cluster will close all session objects associated with it. Making the Cluster an instance variable in NativeRingCache will trigger an error when the write method is called, as the Cluster has been closed at the end of the constructor. = NativeRingCache uses the cluster to get the metadata in the constructor while the cluster object is still open. Once the initialization in the constructor is done, the cluster object is not used by NativeRingCache anymore. In CqlInputFormat, could you use try-with-resources for both the Cluster and Session instances? I think it is best to do things properly by closing both of them. Similarly, closing the cluster object auto-closes all session objects. > Cluster/session should be closed in Cassandra Hadoop Input/Output classes > - > > Key: CASSANDRA-10837 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10837 > Project: Cassandra > Issue Type: Bug > Components: CQL >Reporter: Alex Liu >Assignee: Alex Liu > Fix For: 3.0.x > > Attachments: 10837-3.0-branch.txt, 10837-v2-3.0-branch.txt > > > See a lot of following warnings during Hadoop job running > {code} > ERROR 11:37:45 LEAK: You are creating too many HashedWheelTimer instances. > HashedWheelTimer is a shared resource that must be reused across the JVM,so > that only a few instances are created. 
> {code} > Each cluster/session needs be closed and a shared HashedWheelTimer may reduce > the resource leakage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10837) Cluster/session should be closed in Cassandra Hadoop Input/Output classes
[ https://issues.apache.org/jira/browse/CASSANDRA-10837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059195#comment-15059195 ] Alex Liu commented on CASSANDRA-10837: -- Attached a v3 patch that uses try-with-resources and passes the cluster to NativeRingCache, to minimize creating new cluster objects > Cluster/session should be closed in Cassandra Hadoop Input/Output classes > - > > Key: CASSANDRA-10837 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10837 > Project: Cassandra > Issue Type: Bug > Components: CQL >Reporter: Alex Liu >Assignee: Alex Liu > Fix For: 3.0.x > > Attachments: 10837-3.0-branch.txt, 10837-v2-3.0-branch.txt, > 10837-v3-3.0-branch.txt > > > See a lot of following warnings during Hadoop job running > {code} > ERROR 11:37:45 LEAK: You are creating too many HashedWheelTimer instances. > HashedWheelTimer is a shared resource that must be reused across the JVM,so > that only a few instances are created. > {code} > Each cluster/session needs be closed and a shared HashedWheelTimer may reduce > the resource leakage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10837) Cluster/session should be closed in Cassandra Hadoop Input/Output classes
[ https://issues.apache.org/jira/browse/CASSANDRA-10837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-10837: - Attachment: 10837-v3-3.0-branch.txt > Cluster/session should be closed in Cassandra Hadoop Input/Output classes > - > > Key: CASSANDRA-10837 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10837 > Project: Cassandra > Issue Type: Bug > Components: CQL >Reporter: Alex Liu >Assignee: Alex Liu > Fix For: 3.0.x > > Attachments: 10837-3.0-branch.txt, 10837-v2-3.0-branch.txt, > 10837-v3-3.0-branch.txt > > > See a lot of following warnings during Hadoop job running > {code} > ERROR 11:37:45 LEAK: You are creating too many HashedWheelTimer instances. > HashedWheelTimer is a shared resource that must be reused across the JVM,so > that only a few instances are created. > {code} > Each cluster/session needs be closed and a shared HashedWheelTimer may reduce > the resource leakage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10837) Cluster/session should be closed in Cassandra Hadoop Input/Output classes
[ https://issues.apache.org/jira/browse/CASSANDRA-10837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059162#comment-15059162 ] Alex Liu commented on CASSANDRA-10837: -- Summary of the changes for the next patch: 1. Keep using try-with-resources for the cluster and session (no need to close them manually). 2. Pass the cluster to NativeRingCache (don't create too many new cluster objects). > Cluster/session should be closed in Cassandra Hadoop Input/Output classes > - > > Key: CASSANDRA-10837 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10837 > Project: Cassandra > Issue Type: Bug > Components: CQL >Reporter: Alex Liu >Assignee: Alex Liu > Fix For: 3.0.x > > Attachments: 10837-3.0-branch.txt, 10837-v2-3.0-branch.txt > > > See a lot of following warnings during Hadoop job running > {code} > ERROR 11:37:45 LEAK: You are creating too many HashedWheelTimer instances. > HashedWheelTimer is a shared resource that must be reused across the JVM,so > that only a few instances are created. > {code} > Each cluster/session needs be closed and a shared HashedWheelTimer may reduce > the resource leakage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10837) Cluster/session should be closed in Cassandra Hadoop Input/Output classes
[ https://issues.apache.org/jira/browse/CASSANDRA-10837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-10837: - Attachment: 10837-3.0-branch.txt > Cluster/session should be closed in Cassandra Hadoop Input/Output classes > - > > Key: CASSANDRA-10837 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10837 > Project: Cassandra > Issue Type: Bug >Reporter: Alex Liu >Assignee: Alex Liu > Attachments: 10837-3.0-branch.txt > > > See a lot of following warnings during Hadoop job running > {code} > ERROR 11:37:45 LEAK: You are creating too many HashedWheelTimer instances. > HashedWheelTimer is a shared resource that must be reused across the JVM,so > that only a few instances are created. > {code} > Each cluster/session needs be closed and a shared HashedWheelTimer may reduce > the resource leakage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10837) Cluster/session should be closed in Cassandra Hadoop Input/Output classes
[ https://issues.apache.org/jira/browse/CASSANDRA-10837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-10837: - Attachment: (was: 10837-3.0-branch.txt) > Cluster/session should be closed in Cassandra Hadoop Input/Output classes > - > > Key: CASSANDRA-10837 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10837 > Project: Cassandra > Issue Type: Bug >Reporter: Alex Liu >Assignee: Alex Liu > > See a lot of following warnings during Hadoop job running > {code} > ERROR 11:37:45 LEAK: You are creating too many HashedWheelTimer instances. > HashedWheelTimer is a shared resource that must be reused across the JVM,so > that only a few instances are created. > {code} > Each cluster/session needs be closed and a shared HashedWheelTimer may reduce > the resource leakage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10837) Cluster/session should be closed in Cassandra Hadoop Input/Output classes
[ https://issues.apache.org/jira/browse/CASSANDRA-10837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-10837: - Attachment: 10837-v2-3.0-branch.txt > Cluster/session should be closed in Cassandra Hadoop Input/Output classes > - > > Key: CASSANDRA-10837 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10837 > Project: Cassandra > Issue Type: Bug >Reporter: Alex Liu >Assignee: Alex Liu > Attachments: 10837-3.0-branch.txt, 10837-v2-3.0-branch.txt > > > See a lot of following warnings during Hadoop job running > {code} > ERROR 11:37:45 LEAK: You are creating too many HashedWheelTimer instances. > HashedWheelTimer is a shared resource that must be reused across the JVM,so > that only a few instances are created. > {code} > Each cluster/session needs be closed and a shared HashedWheelTimer may reduce > the resource leakage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10837) Cluster/session should be closed in Cassandra Hadoop Input/Output classes
[ https://issues.apache.org/jira/browse/CASSANDRA-10837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-10837: - Attachment: 10837-3.0-branch.txt > Cluster/session should be closed in Cassandra Hadoop Input/Output classes > - > > Key: CASSANDRA-10837 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10837 > Project: Cassandra > Issue Type: Bug >Reporter: Alex Liu >Assignee: Alex Liu > Attachments: 10837-3.0-branch.txt > > > See a lot of following warnings during Hadoop job running > {code} > ERROR 11:37:45 LEAK: You are creating too many HashedWheelTimer instances. > HashedWheelTimer is a shared resource that must be reused across the JVM,so > that only a few instances are created. > {code} > Each cluster/session needs be closed and a shared HashedWheelTimer may reduce > the resource leakage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-10837) Cluster/session should be closed in Cassandra Hadoop Input/Output classes
Alex Liu created CASSANDRA-10837: Summary: Cluster/session should be closed in Cassandra Hadoop Input/Output classes Key: CASSANDRA-10837 URL: https://issues.apache.org/jira/browse/CASSANDRA-10837 Project: Cassandra Issue Type: Bug Reporter: Alex Liu Assignee: Alex Liu See a lot of following warnings during Hadoop job running {code} ERROR 11:37:45 LEAK: You are creating too many HashedWheelTimer instances. HashedWheelTimer is a shared resource that must be reused across the JVM,so that only a few instances are created. {code} Each cluster/session needs be closed and a shared HashedWheelTimer may reduce the resource leakage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (CASSANDRA-10806) sstableloader can't handle upper case keyspace
[ https://issues.apache.org/jira/browse/CASSANDRA-10806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu reassigned CASSANDRA-10806: Assignee: Alex Liu > sstableloader can't handle upper case keyspace > -- > > Key: CASSANDRA-10806 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10806 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: Alex Liu >Assignee: Alex Liu >Priority: Minor > Attachments: CASSANDRA-10806-3.0-branch.txt > > > sstableloader can't handle upper case keyspace. The following shows the > endpoint is missing > {code} > cassandra/bin/sstableloader > /var/folders/zz/zyxvpxvq6csfxvn_n0/T/bulk-write-to-Test1-Words-a9343a5f-62f3-4901-a9c8-ab7dc42a458e/Test1/Words-5 > -d 127.0.0.1 > objc[7818]: Class JavaLaunchHelper is implemented in both > /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/bin/java and > /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/jre/lib/libinstrument.dylib. > One of the two will be used. Which one is undefined. > Established connection to initial hosts > Opening sstables and calculating sections to stream > Streaming relevant part of > /var/folders/zz/zyxvpxvq6csfxvn_n0/T/bulk-write-to-Test1-Words-a9343a5f-62f3-4901-a9c8-ab7dc42a458e/Test1/Words-5/ma-1-big-Data.db > to [] > Summary statistics: > Connections per host:: 1 > Total files transferred: : 0 > Total bytes transferred: : 0 > Total duration (ms): : 923 > Average transfer rate (MB/s): : 0 > Peak transfer rate (MB/s):: 0 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10806) sstableloader can't handle upper case keyspace
[ https://issues.apache.org/jira/browse/CASSANDRA-10806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-10806: - Attachment: CASSANDRA-10806-3.0-branch.txt > sstableloader can't handle upper case keyspace > -- > > Key: CASSANDRA-10806 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10806 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: Alex Liu >Priority: Minor > Attachments: CASSANDRA-10806-3.0-branch.txt > > > sstableloader can't handle upper case keyspace. The following shows the > endpoint is missing > {code} > cassandra/bin/sstableloader > /var/folders/zz/zyxvpxvq6csfxvn_n0/T/bulk-write-to-Test1-Words-a9343a5f-62f3-4901-a9c8-ab7dc42a458e/Test1/Words-5 > -d 127.0.0.1 > objc[7818]: Class JavaLaunchHelper is implemented in both > /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/bin/java and > /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/jre/lib/libinstrument.dylib. > One of the two will be used. Which one is undefined. > Established connection to initial hosts > Opening sstables and calculating sections to stream > Streaming relevant part of > /var/folders/zz/zyxvpxvq6csfxvn_n0/T/bulk-write-to-Test1-Words-a9343a5f-62f3-4901-a9c8-ab7dc42a458e/Test1/Words-5/ma-1-big-Data.db > to [] > Summary statistics: > Connections per host:: 1 > Total files transferred: : 0 > Total bytes transferred: : 0 > Total duration (ms): : 923 > Average transfer rate (MB/s): : 0 > Peak transfer rate (MB/s):: 0 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10806) sstableloader can't handle upper case keyspace
[ https://issues.apache.org/jira/browse/CASSANDRA-10806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-10806: - Summary: sstableloader can't handle upper case keyspace (was: sstableloader can) > sstableloader can't handle upper case keyspace > -- > > Key: CASSANDRA-10806 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10806 > Project: Cassandra > Issue Type: Bug >Reporter: Alex Liu >Priority: Minor > > sstableloader can't handle upper case keyspace. The following shows the > endpoint is missing > {code} > cassandra/bin/sstableloader > /var/folders/zz/zyxvpxvq6csfxvn_n0/T/bulk-write-to-Test1-Words-a9343a5f-62f3-4901-a9c8-ab7dc42a458e/Test1/Words-5 > -d 127.0.0.1 > objc[7818]: Class JavaLaunchHelper is implemented in both > /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/bin/java and > /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/jre/lib/libinstrument.dylib. > One of the two will be used. Which one is undefined. > Established connection to initial hosts > Opening sstables and calculating sections to stream > Streaming relevant part of > /var/folders/zz/zyxvpxvq6csfxvn_n0/T/bulk-write-to-Test1-Words-a9343a5f-62f3-4901-a9c8-ab7dc42a458e/Test1/Words-5/ma-1-big-Data.db > to [] > Summary statistics: > Connections per host:: 1 > Total files transferred: : 0 > Total bytes transferred: : 0 > Total duration (ms): : 923 > Average transfer rate (MB/s): : 0 > Peak transfer rate (MB/s):: 0 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10806) sstableloader can't handle upper case keyspace
[ https://issues.apache.org/jira/browse/CASSANDRA-10806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036549#comment-15036549 ] Alex Liu commented on CASSANDRA-10806: -- keyspace should be quoted at https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/utils/NativeSSTableLoaderClient.java#L78 > sstableloader can't handle upper case keyspace > -- > > Key: CASSANDRA-10806 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10806 > Project: Cassandra > Issue Type: Bug >Reporter: Alex Liu >Priority: Minor > > sstableloader can't handle upper case keyspace. The following shows the > endpoint is missing > {code} > cassandra/bin/sstableloader > /var/folders/zz/zyxvpxvq6csfxvn_n0/T/bulk-write-to-Test1-Words-a9343a5f-62f3-4901-a9c8-ab7dc42a458e/Test1/Words-5 > -d 127.0.0.1 > objc[7818]: Class JavaLaunchHelper is implemented in both > /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/bin/java and > /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/jre/lib/libinstrument.dylib. > One of the two will be used. Which one is undefined. > Established connection to initial hosts > Opening sstables and calculating sections to stream > Streaming relevant part of > /var/folders/zz/zyxvpxvq6csfxvn_n0/T/bulk-write-to-Test1-Words-a9343a5f-62f3-4901-a9c8-ab7dc42a458e/Test1/Words-5/ma-1-big-Data.db > to [] > Summary statistics: > Connections per host:: 1 > Total files transferred: : 0 > Total bytes transferred: : 0 > Total duration (ms): : 923 > Average transfer rate (MB/s): : 0 > Peak transfer rate (MB/s):: 0 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
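The quoting fix suggested above matters because CQL folds unquoted identifiers to lower case, so a mixed-case keyspace such as Test1 only round-trips if it is double-quoted when spliced into a query string. A hypothetical helper sketching that fix (illustrative names, not the actual NativeSSTableLoaderClient code):

```java
// Hypothetical sketch of the suggested fix: wrap the identifier in double
// quotes, doubling any embedded quotes, so mixed-case names survive CQL's
// lower-case folding of unquoted identifiers.
public class CqlIdentifier {
    static String quote(String identifier) {
        return '"' + identifier.replace("\"", "\"\"") + '"';
    }

    public static void main(String[] args) {
        String query = "SELECT * FROM " + quote("Test1") + "." + quote("Words");
        System.out.println(query); // keyspace and table keep their case
    }
}
```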
[jira] [Updated] (CASSANDRA-10806) sstableloader can't handle upper case keyspace
[ https://issues.apache.org/jira/browse/CASSANDRA-10806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-10806: - Component/s: Tools > sstableloader can't handle upper case keyspace > -- > > Key: CASSANDRA-10806 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10806 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: Alex Liu >Priority: Minor > > sstableloader can't handle upper case keyspace. The following shows the > endpoint is missing > {code} > cassandra/bin/sstableloader > /var/folders/zz/zyxvpxvq6csfxvn_n0/T/bulk-write-to-Test1-Words-a9343a5f-62f3-4901-a9c8-ab7dc42a458e/Test1/Words-5 > -d 127.0.0.1 > objc[7818]: Class JavaLaunchHelper is implemented in both > /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/bin/java and > /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/jre/lib/libinstrument.dylib. > One of the two will be used. Which one is undefined. > Established connection to initial hosts > Opening sstables and calculating sections to stream > Streaming relevant part of > /var/folders/zz/zyxvpxvq6csfxvn_n0/T/bulk-write-to-Test1-Words-a9343a5f-62f3-4901-a9c8-ab7dc42a458e/Test1/Words-5/ma-1-big-Data.db > to [] > Summary statistics: > Connections per host:: 1 > Total files transferred: : 0 > Total bytes transferred: : 0 > Total duration (ms): : 923 > Average transfer rate (MB/s): : 0 > Peak transfer rate (MB/s):: 0 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-10806) sstableloader can
Alex Liu created CASSANDRA-10806: Summary: sstableloader can Key: CASSANDRA-10806 URL: https://issues.apache.org/jira/browse/CASSANDRA-10806 Project: Cassandra Issue Type: Bug Reporter: Alex Liu Priority: Minor sstableloader can't handle upper case keyspace. The following shows the endpoint is missing {code} cassandra/bin/sstableloader /var/folders/zz/zyxvpxvq6csfxvn_n0/T/bulk-write-to-Test1-Words-a9343a5f-62f3-4901-a9c8-ab7dc42a458e/Test1/Words-5 -d 127.0.0.1 objc[7818]: Class JavaLaunchHelper is implemented in both /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/bin/java and /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/jre/lib/libinstrument.dylib. One of the two will be used. Which one is undefined. Established connection to initial hosts Opening sstables and calculating sections to stream Streaming relevant part of /var/folders/zz/zyxvpxvq6csfxvn_n0/T/bulk-write-to-Test1-Words-a9343a5f-62f3-4901-a9c8-ab7dc42a458e/Test1/Words-5/ma-1-big-Data.db to [] Summary statistics: Connections per host:: 1 Total files transferred: : 0 Total bytes transferred: : 0 Total duration (ms): : 923 Average transfer rate (MB/s): : 0 Peak transfer rate (MB/s):: 0 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10751) "Pool is shutdown" error when running Hadoop jobs on Yarn
[ https://issues.apache.org/jira/browse/CASSANDRA-10751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15025825#comment-15025825 ] Alex Liu commented on CASSANDRA-10751: -- Cool. Can you check the latest C* 2.2.x and C* 3.x? If possible, submit a patch. > "Pool is shutdown" error when running Hadoop jobs on Yarn > - > > Key: CASSANDRA-10751 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10751 > Project: Cassandra > Issue Type: Bug > Environment: Hadoop 2.7.1 (HDP 2.3.2) > Cassandra 2.1.11 >Reporter: Cyril Scetbon >Assignee: Alex Liu > Attachments: output.log > > > Trying to execute a Hadoop job on Yarn, I get errors from Cassandra's > internal code. It seems that connections are shut down, but we can't understand > why. > Here is an extract of the errors. I also attach a file with the complete debug > logs. > {code} > 15/11/22 20:05:54 [main]: DEBUG core.RequestHandler: Error querying > node006.internal.net/192.168.12.22:9042, trying next host (error is: > com.datastax.driver.core.ConnectionException: > [node006.internal.net/192.168.12.22:9042] Pool is shutdown) > Failed with exception java.io.IOException:java.io.IOException: > com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) > tried for query failed (tried: node006.internal.net/192.168.12.22:9042 > (com.datastax.driver.core.ConnectionException: > [node006.internal.net/192.168.12.22:9042] Pool is shutdown)) > 15/11/22 20:05:54 [main]: ERROR CliDriver: Failed with exception > java.io.IOException:java.io.IOException: > com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) > tried for query failed (tried: node006.internal.net/192.168.12.22:9042 > (com.datastax.driver.core.ConnectionException: > [node006.internal.net/192.168.12.22:9042] Pool is shutdown)) > java.io.IOException: java.io.IOException: > com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) > tried for query failed (tried: node006.internal.net/192.168.12.22:9042 > 
(com.datastax.driver.core.ConnectionException: > [node006.internal.net/192.168.12.22:9042] Pool is shutdown)) > at > org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:508) > at > org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:415) > at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:140) > at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1672) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) > at > org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736) > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at org.apache.hadoop.util.RunJar.run(RunJar.java:221) > at org.apache.hadoop.util.RunJar.main(RunJar.java:136) > Caused by: java.io.IOException: > com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) > tried for query failed (tried: node006.internal.net/192.168.12.22:9042 > (com.datastax.driver.core.ConnectionException: > [node006.internal.net/192.168.12.22:9042] Pool is shutdown)) > at > org.apache.hadoop.hive.cassandra.input.cql.HiveCqlInputFormat.getRecordReader(HiveCqlInputFormat.java:132) > at > org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:674) > at > org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:324) > at > org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:446) > ... 
15 more > Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All > host(s) tried for query failed (tried: > node006.internal.net/192.168.12.22:9042 > (com.datastax.driver.core.ConnectionException: > [node006.internal.net/192.168.12.22:9042] Pool is shutdown)) > at > com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:84) > at > com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37) > at > com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:214) > at >
[jira] [Commented] (CASSANDRA-10751) "Pool is shutdown" error when running Hadoop jobs on Yarn
[ https://issues.apache.org/jira/browse/CASSANDRA-10751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15024873#comment-15024873 ] Alex Liu commented on CASSANDRA-10751: -- Can you get the full stack trace by changing the log level of Host.STATES to TRACE? > "Pool is shutdown" error when running Hadoop jobs on Yarn > - > > Key: CASSANDRA-10751 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10751 > Project: Cassandra > Issue Type: Bug > Environment: Hadoop 2.7.1 (HDP 2.3.2) > Cassandra 2.1.11 >Reporter: Cyril Scetbon >Assignee: Alex Liu > Attachments: output.log
[jira] [Commented] (CASSANDRA-10751) "Pool is shutdown" error when running Hadoop jobs on Yarn
[ https://issues.apache.org/jira/browse/CASSANDRA-10751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023480#comment-15023480 ] Alex Liu commented on CASSANDRA-10751: -- The error indicates that the C* node can't handle the load (maybe there are too many splits). Have you tried the latest C* 2.2.x or C* 3.x? > "Pool is shutdown" error when running Hadoop jobs on Yarn > - > > Key: CASSANDRA-10751 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10751 > Project: Cassandra > Issue Type: Bug > Environment: Hadoop 2.7.1 (HDP 2.3.2) > Cassandra 2.1.11 >Reporter: Cyril Scetbon >Assignee: Alex Liu > Attachments: output.log
[jira] [Commented] (CASSANDRA-10640) hadoop splits are calculated wrong
[ https://issues.apache.org/jira/browse/CASSANDRA-10640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14998855#comment-14998855 ] Alex Liu commented on CASSANDRA-10640: -- +1 > hadoop splits are calculated wrong > -- > > Key: CASSANDRA-10640 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10640 > Project: Cassandra > Issue Type: Bug >Reporter: Alex Liu >Assignee: Aleksey Yeschenko > Fix For: 2.2.x > > Attachments: 10640.txt > > > A typo at line > https://github.com/apache/cassandra/blob/cassandra-2.2/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyInputFormat.java#L216 > where getEnd should be used -- This message was sent by Atlassian JIRA (v6.3.4#6332)
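As a hedged illustration of the class of typo fixed here (toy code, not the actual AbstractColumnFamilyInputFormat): when subsplitting a token range into Hadoop splits, the final subsplit must be capped by the parent range's end token; accidentally reusing the start accessor where the end belongs silently mis-sizes the splits.

```java
// Toy split calculator over a half-open token interval [start, end).
// The cap on the last chunk is where a getStart()/getEnd() mix-up
// would bite: capping with `start` instead of `end` drops data.
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {
    // Splits [start, end) into chunks of at most `size` tokens.
    static List<long[]> split(long start, long end, long size) {
        List<long[]> splits = new ArrayList<>();
        for (long s = start; s < end; s += size) {
            // Cap at the range's END token (the fix), not its start.
            splits.add(new long[] { s, Math.min(s + size, end) });
        }
        return splits;
    }

    public static void main(String[] args) {
        for (long[] s : split(0, 10, 4)) {
            System.out.println(s[0] + ".." + s[1]); // 0..4, 4..8, 8..10
        }
    }
}
```

Together the chunks cover the whole range exactly once, which is the invariant the one-line patch restores.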
[jira] [Commented] (CASSANDRA-10648) Native protocol is dead after running some Hive queries
[ https://issues.apache.org/jira/browse/CASSANDRA-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14992236#comment-14992236 ] Alex Liu commented on CASSANDRA-10648: -- C* 2.1.x works fine. I shut down all other programs, and the C* node's native protocol stays dead even with no client running against it. > Native protocol is dead after running some Hive queries > --- > > Key: CASSANDRA-10648 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10648 > Project: Cassandra > Issue Type: Bug >Reporter: Alex Liu > Fix For: 2.2.x > > Attachments: 10_day_loss.q, deadlock.txt, deadlock1.txt > > > When test on DSE portfolio demo, which basically creates a few C* tables and > inserts some data into the tables, then run some Hive queries on the tables. > I attach the Hive queries > After some queries are done, C* node is dead on native port, cqlsh can't > login any more. > Some thread dumps are attached. Too many threads are in waiting mode and > system is not responding. > It's tested on C* 2.2.3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-10648) Native protocol is dead after running some Hive queries
Alex Liu created CASSANDRA-10648: Summary: Native protocol is dead after running some Hive queries Key: CASSANDRA-10648 URL: https://issues.apache.org/jira/browse/CASSANDRA-10648 Project: Cassandra Issue Type: Bug Reporter: Alex Liu Fix For: 2.2.x I'm testing the DSE portfolio demo, which basically creates a few C* tables, inserts some data into the tables, and then runs some Hive queries on them. I attach the Hive queries. After some queries are done, the C* node is dead on the native port and cqlsh can't log in any more. Some thread dumps are attached. Too many threads are in waiting mode and the system is not responding. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10648) Native protocol is dead after running some Hive queries
[ https://issues.apache.org/jira/browse/CASSANDRA-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-10648: - Attachment: deadlock1.txt deadlock.txt > Native protocol is dead after running some Hive queries > --- > > Key: CASSANDRA-10648 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10648 > Project: Cassandra > Issue Type: Bug >Reporter: Alex Liu > Fix For: 2.2.x > > Attachments: deadlock.txt, deadlock1.txt > > > When test on DSE portfolio demo, which basically creates a few C* tables and > inserts some data into the tables, then run some Hive queries on the tables. > I attach the Hive queries > After some queries are done, C* node is dead on native port, cqlsh can't > login any more. > Some thread dumps are attached. Too many threads are in waiting mode and > system is not responding. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10648) Native protocol is dead after running some Hive queries
[ https://issues.apache.org/jira/browse/CASSANDRA-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-10648: - Attachment: 10_day_loss.q > Native protocol is dead after running some Hive queries > --- > > Key: CASSANDRA-10648 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10648 > Project: Cassandra > Issue Type: Bug >Reporter: Alex Liu > Fix For: 2.2.x > > Attachments: 10_day_loss.q, deadlock.txt, deadlock1.txt > > > When test on DSE portfolio demo, which basically creates a few C* tables and > inserts some data into the tables, then run some Hive queries on the tables. > I attach the Hive queries > After some queries are done, C* node is dead on native port, cqlsh can't > login any more. > Some thread dumps are attached. Too many threads are in waiting mode and > system is not responding. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10648) Native protocol is dead after running some Hive queries
[ https://issues.apache.org/jira/browse/CASSANDRA-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14988286#comment-14988286 ] Alex Liu commented on CASSANDRA-10648: -- set hive.auto.convert.join=false; in Hive to disable MapJoin > Native protocol is dead after running some Hive queries > --- > > Key: CASSANDRA-10648 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10648 > Project: Cassandra > Issue Type: Bug >Reporter: Alex Liu > Fix For: 2.2.x > > Attachments: 10_day_loss.q, deadlock.txt, deadlock1.txt > > > When test on DSE portfolio demo, which basically creates a few C* tables and > inserts some data into the tables, then run some Hive queries on the tables. > I attach the Hive queries > After some queries are done, C* node is dead on native port, cqlsh can't > login any more. > Some thread dumps are attached. Too many threads are in waiting mode and > system is not responding. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10648) Native protocol is dead after running some Hive queries
[ https://issues.apache.org/jira/browse/CASSANDRA-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-10648: - Description: When test on DSE portfolio demo, which basically creates a few C* tables and inserts some data into the tables, then run some Hive queries on the tables. I attach the Hive queries After some queries are done, C* node is dead on native port, cqlsh can't login any more. Some thread dumps are attached. Too many threads are in waiting mode and system is not responding. It's tested on C* 2.2.3 was: When test on DSE portfolio demo, which basically creates a few C* tables and inserts some data into the tables, then run some Hive queries on the tables. I attach the Hive queries After some queries are done, C* node is dead on native port, cqlsh can't login any more. Some thread dumps are attached. Too many threads are in waiting mode and system is not responding. > Native protocol is dead after running some Hive queries > --- > > Key: CASSANDRA-10648 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10648 > Project: Cassandra > Issue Type: Bug >Reporter: Alex Liu > Fix For: 2.2.x > > Attachments: 10_day_loss.q, deadlock.txt, deadlock1.txt > > > When test on DSE portfolio demo, which basically creates a few C* tables and > inserts some data into the tables, then run some Hive queries on the tables. > I attach the Hive queries > After some queries are done, C* node is dead on native port, cqlsh can't > login any more. > Some thread dumps are attached. Too many threads are in waiting mode and > system is not responding. > It's tested on C* 2.2.3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-10640) hadoop splits are calculated wrong
Alex Liu created CASSANDRA-10640: Summary: hadoop splits are calculated wrong Key: CASSANDRA-10640 URL: https://issues.apache.org/jira/browse/CASSANDRA-10640 Project: Cassandra Issue Type: Bug Reporter: Alex Liu Fix For: 2.2.x A typo at line https://github.com/apache/cassandra/blob/cassandra-2.2/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyInputFormat.java#L216 where getEnd should be used -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-7410) Pig support for BulkOutputFormat as a parameter in url
[ https://issues.apache.org/jira/browse/CASSANDRA-7410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-7410: Attachment: CASSANDRA-7410-v1-2.2.txt Attached the patch on the cassandra-2.2 branch. It also fixes some issues with setting the partitioner and with a zero split count. NativeSSTableLoaderClient has a bug loading ByteOrderedPartitioner sstables; it works fine with Murmur3Partitioner. > Pig support for BulkOutputFormat as a parameter in url > -- > > Key: CASSANDRA-7410 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7410 > Project: Cassandra > Issue Type: Improvement > Components: Hadoop >Reporter: Alex Liu >Assignee: Alex Liu >Priority: Minor > Fix For: 2.1.x, 2.2.x > > Attachments: 7410-2.0-branch.txt, 7410-2.1-branch.txt, > 7410-v2-2.0-branch.txt, 7410-v3-2.0-branch.txt, CASSANDRA-7410-v1-2.2.txt, > CASSANDRA-7410-v2-2.1-branch.txt, CASSANDRA-7410-v3-2.1-branch.txt, > CASSANDRA-7410-v4-2.0-branch.txt, CASSANDRA-7410-v4-2.1-branch.txt, > CASSANDRA-7410-v5-2.0-branch.txt > > > Add BulkOutputFormat support in Pig url -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-10058) Close Java driver Client object in Hadoop and Pig classes
Alex Liu created CASSANDRA-10058: Summary: Close Java driver Client object in Hadoop and Pig classes Key: CASSANDRA-10058 URL: https://issues.apache.org/jira/browse/CASSANDRA-10058 Project: Cassandra Issue Type: Bug Components: Hadoop Reporter: Alex Liu Assignee: Alex Liu I found that some Hadoop and Pig code in Cassandra doesn't close the Client object, that's the cause for the following errors in java driver 2.2.0-rc1. {code} ERROR 11:37:45 LEAK: You are creating too many HashedWheelTimer instances. HashedWheelTimer is a shared resource that must be reused across the JVM,so that only a few instances are created. {code} We should close the Client objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
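A minimal sketch of the fix pattern (DriverClient here is a stand-in class, not the real java-driver API): make the client's lifecycle explicit and close it deterministically, for example with try-with-resources, so the driver can release shared resources such as its HashedWheelTimer instead of leaking one per job.

```java
// Sketch of deterministic client shutdown. DriverClient is a hypothetical
// stand-in for the driver's closeable client object.
public class ClientLifecycle {
    static class DriverClient implements AutoCloseable {
        boolean closed = false;
        void execute(String cql) { /* send the query */ }
        @Override
        public void close() { closed = true; /* release timers, connections */ }
    }

    static DriverClient lastClient; // kept visible so the demo can inspect it

    static void runJob() {
        // try-with-resources guarantees close() runs even if the job throws,
        // which is what the Hadoop/Pig code paths were missing.
        try (DriverClient client = new DriverClient()) {
            lastClient = client;
            client.execute("SELECT * FROM ks.tbl");
        }
    }

    public static void main(String[] args) {
        runJob();
        System.out.println(lastClient.closed); // true
    }
}
```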
[jira] [Updated] (CASSANDRA-10058) Close Java driver Client object in Hadoop and Pig classes
[ https://issues.apache.org/jira/browse/CASSANDRA-10058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-10058: - Attachment: CASSANDRA-10058-2.2.txt Close Java driver Client object in Hadoop and Pig classes - Key: CASSANDRA-10058 URL: https://issues.apache.org/jira/browse/CASSANDRA-10058 Project: Cassandra Issue Type: Bug Components: Hadoop Reporter: Alex Liu Assignee: Alex Liu Fix For: 2.2.x Attachments: CASSANDRA-10058-2.2.txt I found that some Hadoop and Pig code in Cassandra doesn't close the Client object, that's the cause for the following errors in java driver 2.2.0-rc1. {code} ERROR 11:37:45 LEAK: You are creating too many HashedWheelTimer instances. HashedWheelTimer is a shared resource that must be reused across the JVM,so that only a few instances are created. {code} We should close the Client objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8576) Primary Key Pushdown For Hadoop
[ https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-8576: Attachment: CASSANDRA-8576-v1-2.2-branch.txt Sorry guys, I missed the above comments. I attach the patch on cassandra-2.2 branch Primary Key Pushdown For Hadoop --- Key: CASSANDRA-8576 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Russell Alexander Spitzer Assignee: Alex Liu Fix For: 2.2.x Attachments: 8576-2.1-branch.txt, 8576-trunk.txt, CASSANDRA-8576-v1-2.2-branch.txt, CASSANDRA-8576-v2-2.1-branch.txt, CASSANDRA-8576-v3-2.1-branch.txt I've heard reports from several users that they would like to have predicate pushdown functionality for hadoop (Hive in particular) based services. Example usecase Table with wide partitions, one per customer Application team has HQL they would like to run on a single customer Currently time to complete scales with number of customers since Input Format can't pushdown primary key predicate Current implementation requires a full table scan (since it can't recognize that a single partition was specified) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop
[ https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538044#comment-14538044 ] Alex Liu commented on CASSANDRA-8576: - It's not much different, but I will use your changes :) Primary Key Pushdown For Hadoop --- Key: CASSANDRA-8576 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Russell Alexander Spitzer Assignee: Alex Liu Fix For: 2.1.x Attachments: 8576-2.1-branch.txt, 8576-trunk.txt, CASSANDRA-8576-v2-2.1-branch.txt I've heard reports from several users that they would like to have predicate pushdown functionality for hadoop (Hive in particular) based services. Example usecase Table with wide partitions, one per customer Application team has HQL they would like to run on a single customer Currently time to complete scales with number of customers since Input Format can't pushdown primary key predicate Current implementation requires a full table scan (since it can't recognize that a single partition was specified) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8576) Primary Key Pushdown For Hadoop
[ https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-8576: Attachment: CASSANDRA-8576-v3-2.1-branch.txt Primary Key Pushdown For Hadoop --- Key: CASSANDRA-8576 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Russell Alexander Spitzer Assignee: Alex Liu Fix For: 2.1.x Attachments: 8576-2.1-branch.txt, 8576-trunk.txt, CASSANDRA-8576-v2-2.1-branch.txt, CASSANDRA-8576-v3-2.1-branch.txt I've heard reports from several users that they would like to have predicate pushdown functionality for hadoop (Hive in particular) based services. Example usecase Table with wide partitions, one per customer Application team has HQL they would like to run on a single customer Currently time to complete scales with number of customers since Input Format can't pushdown primary key predicate Current implementation requires a full table scan (since it can't recognize that a single partition was specified) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop
[ https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520084#comment-14520084 ] Alex Liu commented on CASSANDRA-8576: - Which branch should this go into? Is it still going into 2.1.5, or another release? Primary Key Pushdown For Hadoop --- Key: CASSANDRA-8576 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Russell Alexander Spitzer Assignee: Alex Liu Fix For: 2.1.x Attachments: 8576-2.1-branch.txt, 8576-trunk.txt I've heard reports from several users that they would like to have predicate pushdown functionality for hadoop (Hive in particular) based services. Example usecase Table with wide partitions, one per customer Application team has HQL they would like to run on a single customer Currently time to complete scales with number of customers since Input Format can't pushdown primary key predicate Current implementation requires a full table scan (since it can't recognize that a single partition was specified) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8576) Primary Key Pushdown For Hadoop
[ https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-8576: Attachment: CASSANDRA-8576-v2-2.1-branch.txt v2 is attached which addresses the comments. Primary Key Pushdown For Hadoop --- Key: CASSANDRA-8576 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Russell Alexander Spitzer Assignee: Alex Liu Fix For: 2.1.x Attachments: 8576-2.1-branch.txt, 8576-trunk.txt, CASSANDRA-8576-v2-2.1-branch.txt I've heard reports from several users that they would like to have predicate pushdown functionality for hadoop (Hive in particular) based services. Example usecase Table with wide partitions, one per customer Application team has HQL they would like to run on a single customer Currently time to complete scales with number of customers since Input Format can't pushdown primary key predicate Current implementation requires a full table scan (since it can't recognize that a single partition was specified) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop
[ https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520395#comment-14520395 ] Alex Liu commented on CASSANDRA-8576: - if token == null, containToken is false. All other comments will be addressed in the new patch Primary Key Pushdown For Hadoop --- Key: CASSANDRA-8576 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Russell Alexander Spitzer Assignee: Alex Liu Fix For: 2.1.x Attachments: 8576-2.1-branch.txt, 8576-trunk.txt I've heard reports from several users that they would like to have predicate pushdown functionality for hadoop (Hive in particular) based services. Example usecase Table with wide partitions, one per customer Application team has HQL they would like to run on a single customer Currently time to complete scales with number of customers since Input Format can't pushdown primary key predicate Current implementation requires a full table scan (since it can't recognize that a single partition was specified) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
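A hedged sketch of the null guard described in the comment above (identifier names are illustrative, not the patch's actual code): the token predicate is only pushed down when a token is actually present, so a null token short-circuits to false rather than throwing.

```java
// Illustrative null guard for token-based predicate pushdown.
// Range bounds follow a half-open [rangeStart, rangeEnd) convention here.
public class TokenGuard {
    static boolean containsToken(Long token, long rangeStart, long rangeEnd) {
        // If token == null, containsToken is false: nothing to push down.
        return token != null && token >= rangeStart && token < rangeEnd;
    }

    public static void main(String[] args) {
        System.out.println(containsToken(null, 0, 100)); // false
        System.out.println(containsToken(42L, 0, 100));  // true
    }
}
```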
[jira] [Updated] (CASSANDRA-7410) Pig support for BulkOutputFormat as a parameter in url
[ https://issues.apache.org/jira/browse/CASSANDRA-7410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-7410: Attachment: CASSANDRA-7410-v5-2.0-branch.txt CASSANDRA-7410-v4-2.1-branch.txt Patches addressing comments are attached. Pig support for BulkOutputFormat as a parameter in url -- Key: CASSANDRA-7410 URL: https://issues.apache.org/jira/browse/CASSANDRA-7410 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Alex Liu Assignee: Alex Liu Priority: Minor Fix For: 2.0.15 Attachments: 7410-2.0-branch.txt, 7410-2.1-branch.txt, 7410-v2-2.0-branch.txt, 7410-v3-2.0-branch.txt, CASSANDRA-7410-v2-2.1-branch.txt, CASSANDRA-7410-v3-2.1-branch.txt, CASSANDRA-7410-v4-2.0-branch.txt, CASSANDRA-7410-v4-2.1-branch.txt, CASSANDRA-7410-v5-2.0-branch.txt Add BulkOutputFormat support in Pig url -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7410) Pig support for BulkOutputFormat as a parameter in url
[ https://issues.apache.org/jira/browse/CASSANDRA-7410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498218#comment-14498218 ] Alex Liu commented on CASSANDRA-7410: - Any reason for not using super.setStoreLocation()? -- We will remove the CqlStorage class soon, so try not to couple with it any more. Pig support for BulkOutputFormat as a parameter in url -- Key: CASSANDRA-7410 URL: https://issues.apache.org/jira/browse/CASSANDRA-7410 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Alex Liu Assignee: Alex Liu Priority: Minor Fix For: 2.0.15 Attachments: 7410-2.0-branch.txt, 7410-2.1-branch.txt, 7410-v2-2.0-branch.txt, 7410-v3-2.0-branch.txt, CASSANDRA-7410-v2-2.1-branch.txt Add BulkOutputFormat support in Pig url -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-7410) Pig support for BulkOutputFormat as a parameter in url
[ https://issues.apache.org/jira/browse/CASSANDRA-7410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-7410: Attachment: CASSANDRA-7410-v4-2.0-branch.txt CASSANDRA-7410-v3-2.1-branch.txt V3 on the 2.0 branch and v4 on the 2.1 branch are attached; they address the comments and update the code to the latest branch. Pig support for BulkOutputFormat as a parameter in url -- Key: CASSANDRA-7410 URL: https://issues.apache.org/jira/browse/CASSANDRA-7410 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Alex Liu Assignee: Alex Liu Priority: Minor Fix For: 2.0.15 Attachments: 7410-2.0-branch.txt, 7410-2.1-branch.txt, 7410-v2-2.0-branch.txt, 7410-v3-2.0-branch.txt, CASSANDRA-7410-v2-2.1-branch.txt, CASSANDRA-7410-v3-2.1-branch.txt, CASSANDRA-7410-v4-2.0-branch.txt Add BulkOutputFormat support in Pig url -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7410) Pig support for BulkOutputFormat as a parameter in url
[ https://issues.apache.org/jira/browse/CASSANDRA-7410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498768#comment-14498768 ] Alex Liu commented on CASSANDRA-7410: - This change is no longer needed, because I ported the 2.1 patch back to 2.0, which uses BulkLoader.ExternalClient. Pig support for BulkOutputFormat as a parameter in url -- Key: CASSANDRA-7410 URL: https://issues.apache.org/jira/browse/CASSANDRA-7410 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Alex Liu Assignee: Alex Liu Priority: Minor Fix For: 2.0.15 Attachments: 7410-2.0-branch.txt, 7410-2.1-branch.txt, 7410-v2-2.0-branch.txt, 7410-v3-2.0-branch.txt, CASSANDRA-7410-v2-2.1-branch.txt Add BulkOutputFormat support in Pig url -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6348) TimeoutException throws if Cql query allows data filtering and index is too big and it can't find the data in base CF after filtering
[ https://issues.apache.org/jira/browse/CASSANDRA-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496805#comment-14496805 ] Alex Liu commented on CASSANDRA-6348: - Hive has a setting to enable pushdown; by default it's disabled. Users can enable it if the table has only one indexed column. TimeoutException throws if Cql query allows data filtering and index is too big and it can't find the data in base CF after filtering -- Key: CASSANDRA-6348 URL: https://issues.apache.org/jira/browse/CASSANDRA-6348 Project: Cassandra Issue Type: Bug Components: Core Reporter: Alex Liu Assignee: Alex Liu Attachments: 6348.txt If the index row is too big and filtering can't find a matching CQL row in the base CF, it keeps scanning the index row and retrieving the base CF until the index row is scanned completely, which may take too long, and the thrift server returns a TimeoutException. This is one of the reasons why we shouldn't index a column if the index is too big. Multiple-index merging can resolve the case where there are only EQUAL clauses (CASSANDRA-6048 addresses it). If the query has non-EQUAL clauses, we still need to do data filtering, which might lead to a timeout exception. We can either disable those kinds of queries or WARN the user that data filtering might lead to a timeout exception or OOM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
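The pattern the description warns about can be illustrated with a hypothetical CQL schema and query — the table, column, and index names below are invented for illustration and do not come from the ticket:

```sql
-- Hypothetical example of the pattern described above: 'city' is the
-- indexed column (its index row may grow very large), and the non-EQUAL
-- clause on 'age' forces row-by-row filtering of every base-CF row
-- fetched via the index -- exactly the scan that can run until timeout.
CREATE TABLE users (id uuid PRIMARY KEY, city text, age int);
CREATE INDEX users_city_idx ON users (city);

SELECT * FROM users WHERE city = 'NYC' AND age > 30 ALLOW FILTERING;
```

If no row in the base CF actually satisfies both clauses, the whole index row for 'NYC' still gets walked before the query gives up.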
[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop
[ https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14493117#comment-14493117 ] Alex Liu commented on CASSANDRA-8576: - It's been this way from the very beginning. Internally, URL decoding is used. I don't think there's an easy way around it here. Primary Key Pushdown For Hadoop --- Key: CASSANDRA-8576 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Russell Alexander Spitzer Assignee: Alex Liu Fix For: 2.1.5 Attachments: 8576-2.1-branch.txt, 8576-trunk.txt I've heard reports from several users that they would like to have predicate pushdown functionality for hadoop (Hive in particular) based services. Example usecase Table with wide partitions, one per customer Application team has HQL they would like to run on a single customer Currently time to complete scales with number of customers since Input Format can't pushdown primary key predicate Current implementation requires a full table scan (since it can't recognize that a single partition was specified) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop
[ https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14493122#comment-14493122 ] Alex Liu commented on CASSANDRA-8576: - Someone from Product Management should be able to answer it. Primary Key Pushdown For Hadoop --- Key: CASSANDRA-8576 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Russell Alexander Spitzer Assignee: Alex Liu Fix For: 2.1.5 Attachments: 8576-2.1-branch.txt, 8576-trunk.txt I've heard reports from several users that they would like to have predicate pushdown functionality for hadoop (Hive in particular) based services. Example usecase Table with wide partitions, one per customer Application team has HQL they would like to run on a single customer Currently time to complete scales with number of customers since Input Format can't pushdown primary key predicate Current implementation requires a full table scan (since it can't recognize that a single partition was specified) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9074) Hadoop Cassandra CqlInputFormat pagination - not reading all input rows
[ https://issues.apache.org/jira/browse/CASSANDRA-9074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393027#comment-14393027 ] Alex Liu commented on CASSANDRA-9074: - Can you provide details on how to reproduce the issue (table schema, data, Hadoop query, etc.), so we can reproduce and debug it? Does it error out in a one-node cluster? Hadoop Cassandra CqlInputFormat pagination - not reading all input rows --- Key: CASSANDRA-9074 URL: https://issues.apache.org/jira/browse/CASSANDRA-9074 Project: Cassandra Issue Type: Bug Components: Hadoop Environment: Cassandra 2.0.11, Hadoop 1.0.4, Datastax java cassandra-driver-core 2.1.4 Reporter: fuggy_yama Assignee: Alex Liu Priority: Minor Fix For: 2.0.15 I have a 7-node Cassandra (v2.0.11) cluster and a table with 10k rows. I run a hadoop job (datanodes reside on cassandra nodes of course) that reads data from that table and I see that only 7k rows are read into the map phase. I checked the CqlInputFormat source code and noticed that a CQL query is built to select node-local data and that a LIMIT clause is also added (1k default). So the 7k read rows can be explained: 7 nodes * 1k limit = 7k rows read total. The limit can be changed using CqlConfigHelper: CqlConfigHelper.setInputCQLPageRowSize(job.getConfiguration(), 1000); Please help me with the questions below: Is this desired behavior? Why does CqlInputFormat not page through the rest of the rows? Is it a bug, or should I just increase the InputCQLPageRowSize value? What if I want to read all data in the table and do not know the row count? What if the number of rows I need to read per cassandra node is very large - in other words, how do I avoid OOM when setting InputCQLPageRowSize very large to handle all data? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
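The arithmetic the reporter describes can be sketched in a few lines of Java — a toy illustration of the reported (pre-fix) behavior, not Cassandra code; it assumes rows are spread evenly across the nodes' splits:

```java
// Toy model of the behavior described above: each node-local split is
// read with a LIMIT clause (1k by default), so a job sees at most
// nodes * limit rows. Assumes rows are spread evenly across nodes.
public class PageLimitSketch {
    static long rowsReadWithLimit(long totalRows, int nodes, int pageRowLimit) {
        // Each of the `nodes` splits is capped at `pageRowLimit` rows.
        return Math.min(totalRows, (long) nodes * pageRowLimit);
    }

    public static void main(String[] args) {
        // The reporter's numbers: 7 nodes * 1k limit = 7k of 10k rows.
        System.out.println(rowsReadWithLimit(10_000, 7, 1_000)); // 7000
        // Raising the limit (e.g. via CqlConfigHelper.setInputCQLPageRowSize,
        // as the reporter notes) lifts the cap so all rows reach the map phase.
        System.out.println(rowsReadWithLimit(10_000, 7, 2_000)); // 10000
    }
}
```

This also makes the OOM question concrete: lifting the cap by raising the per-split limit trades completeness against per-task memory, which is why paging (rather than one huge LIMIT) is the eventual fix.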
[jira] [Issue Comment Deleted] (CASSANDRA-9074) Hadoop Cassandra CqlInputFormat pagination - not reading all input rows
[ https://issues.apache.org/jira/browse/CASSANDRA-9074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-9074: Comment: was deleted (was: Can you provide detail how to reproduce the issue like. Table schema, data and Hadoop query ... etc, so we can reproduce it and debug it. Does it error out in a one node cluster?) Hadoop Cassandra CqlInputFormat pagination - not reading all input rows --- Key: CASSANDRA-9074 URL: https://issues.apache.org/jira/browse/CASSANDRA-9074 Project: Cassandra Issue Type: Bug Components: Hadoop Environment: Cassandra 2.0.11, Hadoop 1.0.4, Datastax java cassandra-driver-core 2.1.4 Reporter: fuggy_yama Assignee: Alex Liu Priority: Minor Fix For: 2.0.15 I have a 7-node Cassandra (v2.0.11) cluster and a table with 10k rows. I run a hadoop job (datanodes reside on cassandra nodes of course) that reads data from that table and I see that only 7k rows is read to map phase. I checked CqlInputFormat source code and noticed that a CQL query is build to select node-local date and also LIMIT clause is added (1k default). So that 7k read rows can be explained: 7 nodes * 1k limit = 7k rows read total The limit can be changed using CqlConfigHelper: CqlConfigHelper.setInputCQLPageRowSize(job.getConfiguration(), 1000); Please help me with questions below: Is this a desired behavior? Why CqlInputFormat does not page through the rest of rows? Is it a bug or should I just increase the InputCQLPageRowSize value? What if I want to read all data in table and do not know the row count? What if the amount of rows I need to read per cassandra node is very large - in other words how to avoid OOM when setting InputCQLPageRowSize very large to handle all data? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop
[ https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391765#comment-14391765 ] Alex Liu commented on CASSANDRA-8576: - pending on CASSANDRA-8358 Primary Key Pushdown For Hadoop --- Key: CASSANDRA-8576 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Russell Alexander Spitzer Assignee: Alex Liu Fix For: 2.1.5 Attachments: 8576-2.1-branch.txt, 8576-trunk.txt I've heard reports from several users that they would like to have predicate pushdown functionality for hadoop (Hive in particular) based services. Example usecase Table with wide partitions, one per customer Application team has HQL they would like to run on a single customer Currently time to complete scales with number of customers since Input Format can't pushdown primary key predicate Current implementation requires a full table scan (since it can't recognize that a single partition was specified) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9074) Hadoop Cassandra CqlInputFormat pagination - not reading all input rows
[ https://issues.apache.org/jira/browse/CASSANDRA-9074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388706#comment-14388706 ] Alex Liu commented on CASSANDRA-9074: - yes, please test it on the latest branch. Hadoop Cassandra CqlInputFormat pagination - not reading all input rows --- Key: CASSANDRA-9074 URL: https://issues.apache.org/jira/browse/CASSANDRA-9074 Project: Cassandra Issue Type: Bug Components: Hadoop Environment: Cassandra 2.0.11, Hadoop 1.0.4, Datastax java cassandra-driver-core 2.1.4 Reporter: fuggy_yama Assignee: Alex Liu Priority: Minor Fix For: 2.0.14 I have a 7-node Cassandra (v2.0.11) cluster and a table with 10k rows. I run a hadoop job (datanodes reside on cassandra nodes of course) that reads data from that table and I see that only 7k rows is read to map phase. I checked CqlInputFormat source code and noticed that a CQL query is build to select node-local date and also LIMIT clause is added (1k default). So that 7k read rows can be explained: 7 nodes * 1k limit = 7k rows read total The limit can be changed using CqlConfigHelper: CqlConfigHelper.setInputCQLPageRowSize(job.getConfiguration(), 1000); Please help me with questions below: Is this a desired behavior? Why CqlInputFormat does not page through the rest of rows? Is it a bug or should I just increase the InputCQLPageRowSize value? What if I want to read all data in table and do not know the row count? What if the amount of rows I need to read per cassandra node is very large - in other words how to avoid OOM when setting InputCQLPageRowSize very large to handle all data? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9074) Hadoop Cassandra CqlInputFormat pagination - not reading all input rows
[ https://issues.apache.org/jira/browse/CASSANDRA-9074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14387392#comment-14387392 ] Alex Liu commented on CASSANDRA-9074: - Please try the latest 2.1.x or 2.0.x branch, it should have been fixed. Hadoop Cassandra CqlInputFormat pagination - not reading all input rows --- Key: CASSANDRA-9074 URL: https://issues.apache.org/jira/browse/CASSANDRA-9074 Project: Cassandra Issue Type: Bug Components: Hadoop Environment: Cassandra 2.0.11, Hadoop 1.0.4, Datastax java cassandra-driver-core 2.1.4 Reporter: fuggy_yama Assignee: Alex Liu Priority: Minor Fix For: 2.0.14 I have a 7-node Cassandra (v2.0.11) cluster and a table with 10k rows. I run a hadoop job (datanodes reside on cassandra nodes of course) that reads data from that table and I see that only 7k rows is read to map phase. I checked CqlInputFormat source code and noticed that a CQL query is build to select node-local date and also LIMIT clause is added (1k default). So that 7k read rows can be explained: 7 nodes * 1k limit = 7k rows read total The limit can be changed using CqlConfigHelper: CqlConfigHelper.setInputCQLPageRowSize(job.getConfiguration(), 1000); Please help me with questions below: Is this a desired behavior? Why CqlInputFormat does not page through the rest of rows? Is it a bug or should I just increase the InputCQLPageRowSize value? What if I want to read all data in table and do not know the row count? What if the amount of rows I need to read per cassandra node is very large - in other words how to avoid OOM when setting InputCQLPageRowSize very large to handle all data? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9074) Hadoop Cassandra CqlInputFormat pagination - not reading all input rows
[ https://issues.apache.org/jira/browse/CASSANDRA-9074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14387608#comment-14387608 ] Alex Liu commented on CASSANDRA-9074: - CASSANDRA-8166 Hadoop Cassandra CqlInputFormat pagination - not reading all input rows --- Key: CASSANDRA-9074 URL: https://issues.apache.org/jira/browse/CASSANDRA-9074 Project: Cassandra Issue Type: Bug Components: Hadoop Environment: Cassandra 2.0.11, Hadoop 1.0.4, Datastax java cassandra-driver-core 2.1.4 Reporter: fuggy_yama Assignee: Alex Liu Priority: Minor Fix For: 2.0.14 I have a 7-node Cassandra (v2.0.11) cluster and a table with 10k rows. I run a hadoop job (datanodes reside on cassandra nodes of course) that reads data from that table and I see that only 7k rows is read to map phase. I checked CqlInputFormat source code and noticed that a CQL query is build to select node-local date and also LIMIT clause is added (1k default). So that 7k read rows can be explained: 7 nodes * 1k limit = 7k rows read total The limit can be changed using CqlConfigHelper: CqlConfigHelper.setInputCQLPageRowSize(job.getConfiguration(), 1000); Please help me with questions below: Is this a desired behavior? Why CqlInputFormat does not page through the rest of rows? Is it a bug or should I just increase the InputCQLPageRowSize value? What if I want to read all data in table and do not know the row count? What if the amount of rows I need to read per cassandra node is very large - in other words how to avoid OOM when setting InputCQLPageRowSize very large to handle all data? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6432) Calculate estimated Cql row count per token range
[ https://issues.apache.org/jira/browse/CASSANDRA-6432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368292#comment-14368292 ] Alex Liu commented on CASSANDRA-6432: - This ticket is for the count of rows returned by the Java driver per token range. Is CASSANDRA-7688 for that? Calculate estimated Cql row count per token range - Key: CASSANDRA-6432 URL: https://issues.apache.org/jira/browse/CASSANDRA-6432 Project: Cassandra Issue Type: Bug Components: Hadoop Reporter: Alex Liu Fix For: 2.0.14 CASSANDRA-6311 uses the client side to calculate the actual CF row count for the hadoop job. We need to fix it by using the CQL row count, which needs an estimated CQL row count per token range. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8358) Bundled tools shouldn't be using Thrift API
[ https://issues.apache.org/jira/browse/CASSANDRA-8358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365885#comment-14365885 ] Alex Liu commented on CASSANDRA-8358: - When can this ticket be committed to the 2.1 branch? Bundled tools shouldn't be using Thrift API --- Key: CASSANDRA-8358 URL: https://issues.apache.org/jira/browse/CASSANDRA-8358 Project: Cassandra Issue Type: Improvement Reporter: Aleksey Yeschenko Assignee: Philip Thompson Fix For: 3.0 In 2.1, we switched cqlsh to the python-driver. In 3.0, we got rid of cassandra-cli. Yet there is still code that's using legacy Thrift API. We want to convert it all to use the java-driver instead. 1. BulkLoader uses Thrift to query the schema tables. It should be using java-driver metadata APIs directly instead. 2. o.a.c.hadoop.cql3.CqlRecordWriter is using Thrift 3. o.a.c.hadoop.ColumnFamilyRecordReader is using Thrift 4. o.a.c.hadoop.AbstractCassandraStorage is using Thrift 5. o.a.c.hadoop.pig.CqlStorage is using Thrift Some of the things listed above use Thrift to get the list of partition key columns or clustering columns. Those should be converted to use the Metadata API of the java-driver. Somewhat related to that, we also have badly ported code from Thrift in o.a.c.hadoop.cql3.CqlRecordReader (see fetchKeys()) that manually fetches columns from schema tables instead of properly using the driver's Metadata API. We need all of it fixed. One exception, for now, is o.a.c.hadoop.AbstractColumnFamilyInputFormat - it's using Thrift for its describe_splits_ex() call that cannot be currently replaced by any java-driver call (?). Once this is done, we can stop starting Thrift RPC port by default in cassandra.yaml. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (CASSANDRA-7410) Pig support for BulkOutputFormat as a parameter in url
[ https://issues.apache.org/jira/browse/CASSANDRA-7410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-7410: Comment: was deleted (was: Waiting for CASSANDRA-8358) Pig support for BulkOutputFormat as a parameter in url -- Key: CASSANDRA-7410 URL: https://issues.apache.org/jira/browse/CASSANDRA-7410 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Alex Liu Assignee: Alex Liu Priority: Minor Fix For: 2.0.14 Attachments: 7410-2.0-branch.txt, 7410-2.1-branch.txt, 7410-v2-2.0-branch.txt, 7410-v3-2.0-branch.txt, CASSANDRA-7410-v2-2.1-branch.txt Add BulkOutputFormat support in Pig url -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7410) Pig support for BulkOutputFormat as a parameter in url
[ https://issues.apache.org/jira/browse/CASSANDRA-7410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365910#comment-14365910 ] Alex Liu commented on CASSANDRA-7410: - [~brandon.williams] Do you have time to review it? Pig support for BulkOutputFormat as a parameter in url -- Key: CASSANDRA-7410 URL: https://issues.apache.org/jira/browse/CASSANDRA-7410 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Alex Liu Assignee: Alex Liu Priority: Minor Fix For: 2.0.14 Attachments: 7410-2.0-branch.txt, 7410-2.1-branch.txt, 7410-v2-2.0-branch.txt, 7410-v3-2.0-branch.txt, CASSANDRA-7410-v2-2.1-branch.txt Add BulkOutputFormat support in Pig url -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7410) Pig support for BulkOutputFormat as a parameter in url
[ https://issues.apache.org/jira/browse/CASSANDRA-7410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365884#comment-14365884 ] Alex Liu commented on CASSANDRA-7410: - Waiting for CASSANDRA-8358 Pig support for BulkOutputFormat as a parameter in url -- Key: CASSANDRA-7410 URL: https://issues.apache.org/jira/browse/CASSANDRA-7410 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Alex Liu Assignee: Alex Liu Priority: Minor Fix For: 2.0.14 Attachments: 7410-2.0-branch.txt, 7410-2.1-branch.txt, 7410-v2-2.0-branch.txt, 7410-v3-2.0-branch.txt, CASSANDRA-7410-v2-2.1-branch.txt Add BulkOutputFormat support in Pig url -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8924) Streaming failures during bulkloading data using CqlBulkOutputFormat
[ https://issues.apache.org/jira/browse/CASSANDRA-8924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14355963#comment-14355963 ] Alex Liu commented on CASSANDRA-8924: - It's fixed in CASSANDRA-7410 patch, but it's waiting for pig test fix. Streaming failures during bulkloading data using CqlBulkOutputFormat Key: CASSANDRA-8924 URL: https://issues.apache.org/jira/browse/CASSANDRA-8924 Project: Cassandra Issue Type: Bug Components: Core, Hadoop Environment: 1 node, Ubuntu 14.04 , vnodes enabled, Oracle JVM, Java 1.7.0_51, Cassandra 2.1.3 Reporter: Aby Kuruvilla Labels: BulkLoading, CqlBulkOutputFormat, Hadoop, Streaming Attachments: 0001-use-cql-to-fetch-CFMetaData.patch I am trying to use the CqlBulkOutputFormat in a Hadoop job to bulk load data into Cassandra. On running the Hadoop job, I can see that the SSTable files do get generated but fails to stream the data out. I get the same exception when I try with Cassandra node on localhost as well as a remote Cassandra node. Also I get this exception on C* versions 2.1.1, 2.1.2 and 2.1.3. Relevant portion of logs and stack trace {noformat} 09:20:23.207 [Thread-6] WARN org.apache.cassandra.utils.CLibrary - JNA link failure, one or more native method will be unavailable. 
09:20:23.208 [Thread-6] DEBUG org.apache.cassandra.utils.CLibrary - JNA link failure details: Error looking up function 'posix_fadvise': dlsym(0x7fff6ab8a5e0, posix_fadvise): symbol not found 09:20:23.504 [Thread-6] DEBUG o.apache.cassandra.io.util.FileUtils - Renaming /var/folders/bb/c4416mx95xsbb11jx5g5zq15ddhhh4/T/dev/participant-262ce044-0a2d-48f4-9baa-ad4d626e743a/dev-participant-tmp-ka-1-Filter.db to /var/folders/bb/c4416mx95xsbb11jx5g5zq15ddhhh4/T/dev/participant-262ce044-0a2d-48f4-9baa-ad4d626e743a/dev-participant-ka-1-Filter.db 09:20:23.505 [Thread-6] DEBUG o.apache.cassandra.io.util.FileUtils - Renaming /var/folders/bb/c4416mx95xsbb11jx5g5zq15ddhhh4/T/dev/participant-262ce044-0a2d-48f4-9baa-ad4d626e743a/dev-participant-tmp-ka-1-Digest.sha1 to /var/folders/bb/c4416mx95xsbb11jx5g5zq15ddhhh4/T/dev/participant-262ce044-0a2d-48f4-9baa-ad4d626e743a/dev-participant-ka-1-Digest.sha1 09:20:23.505 [Thread-6] DEBUG o.apache.cassandra.io.util.FileUtils - Renaming /var/folders/bb/c4416mx95xsbb11jx5g5zq15ddhhh4/T/dev/participant-262ce044-0a2d-48f4-9baa-ad4d626e743a/dev-participant-tmp-ka-1-Statistics.db to /var/folders/bb/c4416mx95xsbb11jx5g5zq15ddhhh4/T/dev/participant-262ce044-0a2d-48f4-9baa-ad4d626e743a/dev-participant-ka-1-Statistics.db 09:20:23.505 [Thread-6] DEBUG o.apache.cassandra.io.util.FileUtils - Renaming /var/folders/bb/c4416mx95xsbb11jx5g5zq15ddhhh4/T/dev/participant-262ce044-0a2d-48f4-9baa-ad4d626e743a/dev-participant-tmp-ka-1-Index.db to /var/folders/bb/c4416mx95xsbb11jx5g5zq15ddhhh4/T/dev/participant-262ce044-0a2d-48f4-9baa-ad4d626e743a/dev-participant-ka-1-Index.db 09:20:23.505 [Thread-6] DEBUG o.apache.cassandra.io.util.FileUtils - Renaming /var/folders/bb/c4416mx95xsbb11jx5g5zq15ddhhh4/T/dev/participant-262ce044-0a2d-48f4-9baa-ad4d626e743a/dev-participant-tmp-ka-1-CompressionInfo.db to /var/folders/bb/c4416mx95xsbb11jx5g5zq15ddhhh4/T/dev/participant-262ce044-0a2d-48f4-9baa-ad4d626e743a/dev-participant-ka-1-CompressionInfo.db 09:20:23.506 [Thread-6] 
DEBUG o.apache.cassandra.io.util.FileUtils - Renaming /var/folders/bb/c4416mx95xsbb11jx5g5zq15ddhhh4/T/dev/participant-262ce044-0a2d-48f4-9baa-ad4d626e743a/dev-participant-tmp-ka-1-TOC.txt to /var/folders/bb/c4416mx95xsbb11jx5g5zq15ddhhh4/T/dev/participant-262ce044-0a2d-48f4-9baa-ad4d626e743a/dev-participant-ka-1-TOC.txt 09:20:23.506 [Thread-6] DEBUG o.apache.cassandra.io.util.FileUtils - Renaming /var/folders/bb/c4416mx95xsbb11jx5g5zq15ddhhh4/T/dev/participant-262ce044-0a2d-48f4-9baa-ad4d626e743a/dev-participant-tmp-ka-1-Data.db to /var/folders/bb/c4416mx95xsbb11jx5g5zq15ddhhh4/T/dev/participant-262ce044-0a2d-48f4-9baa-ad4d626e743a/dev-participant-ka-1-Data.db 09:20:23.727 [Thread-2] DEBUG o.a.c.i.s.m.MetadataSerializer - Load metadata for /var/folders/bb/c4416mx95xsbb11jx5g5zq15ddhhh4/T/dev/participant-262ce044-0a2d-48f4-9baa-ad4d626e743a/dev-participant-ka-1 09:20:23.729 [Thread-2] INFO o.a.c.io.sstable.SSTableReader - Opening /var/folders/bb/c4416mx95xsbb11jx5g5zq15ddhhh4/T/dev/participant-262ce044-0a2d-48f4-9baa-ad4d626e743a/dev-participant-ka-1 (617874 bytes) 09:20:23.780 [Thread-2] INFO o.a.c.streaming.StreamResultFuture - [Stream #98ba8730-c279-11e4-b8e9-55374d280508] Executing streaming plan for Bulk Load 09:20:23.781 [StreamConnectionEstablisher:1] INFO o.a.c.streaming.StreamSession - [Stream #98ba8730-c279-11e4-b8e9-55374d280508] Starting streaming to
[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop
[ https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14349527#comment-14349527 ] Alex Liu commented on CASSANDRA-8576: - The Pig test on trunk fails; Philip Thompson is fixing it. I attached the patch on trunk, but we need to merge it with Philip Thompson's fix. Primary Key Pushdown For Hadoop --- Key: CASSANDRA-8576 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Russell Alexander Spitzer Assignee: Alex Liu Fix For: 2.1.4 Attachments: 8576-2.1-branch.txt I've heard reports from several users that they would like to have predicate pushdown functionality for hadoop (Hive in particular) based services. Example usecase Table with wide partitions, one per customer Application team has HQL they would like to run on a single customer Currently time to complete scales with number of customers since Input Format can't pushdown primary key predicate Current implementation requires a full table scan (since it can't recognize that a single partition was specified) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8576) Primary Key Pushdown For Hadoop
[ https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-8576: Attachment: 8576-trunk.txt Primary Key Pushdown For Hadoop --- Key: CASSANDRA-8576 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Russell Alexander Spitzer Assignee: Alex Liu Fix For: 2.1.4 Attachments: 8576-2.1-branch.txt, 8576-trunk.txt I've heard reports from several users that they would like to have predicate pushdown functionality for hadoop (Hive in particular) based services. Example usecase Table with wide partitions, one per customer Application team has HQL they would like to run on a single customer Currently time to complete scales with number of customers since Input Format can't pushdown primary key predicate Current implementation requires a full table scan (since it can't recognize that a single partition was specified) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-7410) Pig support for BulkOutputFormat as a parameter in url
[ https://issues.apache.org/jira/browse/CASSANDRA-7410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-7410: Attachment: CASSANDRA-7410-v2-2.1-branch.txt V2 version fixes the streaming error Pig support for BulkOutputFormat as a parameter in url -- Key: CASSANDRA-7410 URL: https://issues.apache.org/jira/browse/CASSANDRA-7410 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Alex Liu Assignee: Alex Liu Priority: Minor Fix For: 2.0.13 Attachments: 7410-2.0-branch.txt, 7410-2.1-branch.txt, 7410-v2-2.0-branch.txt, 7410-v3-2.0-branch.txt, CASSANDRA-7410-v2-2.1-branch.txt Add BulkOutputFormat support in Pig url -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6091) Better Vnode support in hadoop/pig
[ https://issues.apache.org/jira/browse/CASSANDRA-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14349029#comment-14349029 ] Alex Liu commented on CASSANDRA-6091: - Yes, please add cassandra-2.1 as well. If we can't cleanly merge it into trunk, we need another one for trunk. Better Vnode support in hadoop/pig -- Key: CASSANDRA-6091 URL: https://issues.apache.org/jira/browse/CASSANDRA-6091 Project: Cassandra Issue Type: Bug Components: Hadoop Reporter: Alex Liu Assignee: mck Attachments: cassandra-2.0-6091.txt CASSANDRA-6084 shows there are some issues when running a hadoop/pig job if vnodes are enabled. Also, the hadoop performance of vnode-enabled nodes is bad because there are so many splits. The idea is to combine vnode splits into big pseudo-splits so it works as if vnodes were disabled for the hadoop/pig job -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6091) Better Vnode support in hadoop/pig
[ https://issues.apache.org/jira/browse/CASSANDRA-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347173#comment-14347173 ] Alex Liu commented on CASSANDRA-6091: - LGTM, +1 Better Vnode support in hadoop/pig -- Key: CASSANDRA-6091 URL: https://issues.apache.org/jira/browse/CASSANDRA-6091 Project: Cassandra Issue Type: Bug Components: Hadoop Reporter: Alex Liu Assignee: mck Attachments: cassandra-2.0-6091.txt CASSANDRA-6084 shows there are some issues when running a hadoop/pig job if vnodes are enabled. Also, the hadoop performance of vnode-enabled nodes is bad because there are so many splits. The idea is to combine vnode splits into big pseudo-splits so it works as if vnodes were disabled for the hadoop/pig job -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7410) Pig support for BulkOutputFormat as a parameter in url
[ https://issues.apache.org/jira/browse/CASSANDRA-7410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347505#comment-14347505 ] Alex Liu commented on CASSANDRA-7410: - {{CqlBulkRecordWriter}} comes with its own {{ExternalClient}}, which takes the wrong approach to generating the CFMetaData (and the cfId). We can fix the issue by using the approach found in {{BulkLoader.ExternalClient}}, which fetches the CFMetaData from the cluster using CQL. Pig support for BulkOutputFormat as a parameter in url -- Key: CASSANDRA-7410 URL: https://issues.apache.org/jira/browse/CASSANDRA-7410 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Alex Liu Assignee: Alex Liu Priority: Minor Fix For: 2.0.13 Attachments: 7410-2.0-branch.txt, 7410-2.1-branch.txt, 7410-v2-2.0-branch.txt, 7410-v3-2.0-branch.txt Add BulkOutputFormat support in Pig url -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-8576) Primary Key Pushdown For Hadoop
[ https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277925#comment-14277925 ] Alex Liu edited comment on CASSANDRA-8576 at 2/20/15 5:04 PM: -- A v1 patch is attached; it only supports full-partition-key EQ queries. To test it with, e.g., a table whose partition columns are key1 and key2: {code} set the where clause key1 = 'key1' and key2 = 111 and column1=100 in the pig url as input_cql=select%20*%20from%20compositekeytable%20where%20key1%20%3D%20%27key1%27%20and%20key2%20%3D%20111%20and%20column1%3D100 {code} was (Author: alexliu68): v1 patch is attached to only support full partition key EQ queries. To test it, e.g. table with key1, and key2 partition columns {code} set where clause as key1 = 'key1' and key2 = 111 and column1=100 in pig url {code} Primary Key Pushdown For Hadoop --- Key: CASSANDRA-8576 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Russell Alexander Spitzer Assignee: Alex Liu Fix For: 2.1.4 Attachments: 8576-2.1-branch.txt I've heard reports from several users that they would like to have predicate pushdown functionality for hadoop (Hive in particular) based services. Example usecase Table with wide partitions, one per customer Application team has HQL they would like to run on a single customer Currently time to complete scales with number of customers since Input Format can't pushdown primary key predicate Current implementation requires a full table scan (since it can't recognize that a single partition was specified) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
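As an aside, the percent-encoded {{input_cql}} value in the comment above can be produced mechanically rather than by hand. This is a hedged illustration, not part of the patch; any standard URL-encoding routine works, shown here with Python's {{urllib.parse}}:

```python
from urllib.parse import quote, unquote

# The where-clause query from the comment above, in plain CQL.
query = ("select * from compositekeytable where key1 = 'key1' "
         "and key2 = 111 and column1=100")

# Percent-encode for use as the input_cql parameter of the pig url.
# safe='*' leaves '*' literal (as in the example) while escaping
# spaces (%20), '=' (%3D), and quotes (%27).
input_cql = quote(query, safe='*')
print(input_cql)

# Decoding recovers the original CQL, so the loader sees the raw query.
assert unquote(input_cql) == query
```

This reproduces the exact {{input_cql=...}} string shown in the comment, which makes it easy to build pig urls for other tables and predicates.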
[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop
[ https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329142#comment-14329142 ] Alex Liu commented on CASSANDRA-8576: - [~brandon.williams] Do you have time to review this ticket? Primary Key Pushdown For Hadoop --- Key: CASSANDRA-8576 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Russell Alexander Spitzer Assignee: Alex Liu Fix For: 2.1.4 Attachments: 8576-2.1-branch.txt I've heard reports from several users that they would like to have predicate pushdown functionality for hadoop (Hive in particular) based services. Example usecase Table with wide partitions, one per customer Application team has HQL they would like to run on a single customer Currently time to complete scales with number of customers since Input Format can't pushdown primary key predicate Current implementation requires a full table scan (since it can't recognize that a single partition was specified) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-8576) Primary Key Pushdown For Hadoop
[ https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277925#comment-14277925 ] Alex Liu edited comment on CASSANDRA-8576 at 2/20/15 5:03 PM: -- A v1 patch is attached; it only supports full-partition-key EQ queries. To test it with, e.g., a table whose partition columns are key1 and key2: {code} set the where clause key1 = 'key1' and key2 = 111 and column1=100 in the pig url {code} was (Author: alexliu68): v1 patch is attached to only support full partition key EQ queries. Primary Key Pushdown For Hadoop --- Key: CASSANDRA-8576 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Russell Alexander Spitzer Assignee: Alex Liu Fix For: 2.1.4 Attachments: 8576-2.1-branch.txt I've heard reports from several users that they would like to have predicate pushdown functionality for hadoop (Hive in particular) based services. Example usecase Table with wide partitions, one per customer Application team has HQL they would like to run on a single customer Currently time to complete scales with number of customers since Input Format can't pushdown primary key predicate Current implementation requires a full table scan (since it can't recognize that a single partition was specified) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8774) BulkOutputFormat never completes if streaming have errors
[ https://issues.apache.org/jira/browse/CASSANDRA-8774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14318684#comment-14318684 ] Alex Liu commented on CASSANDRA-8774: - Why not handle ExecutorException errors as a whole if there are no other none-critical root causes? BulkOutputFormat never completes if streaming have errors - Key: CASSANDRA-8774 URL: https://issues.apache.org/jira/browse/CASSANDRA-8774 Project: Cassandra Issue Type: Bug Components: Hadoop Reporter: Erik Forsberg Fix For: 2.0.13 Attachments: 0001-CASSANDRA-8774-Handle-StreamException-when-bulkloadi.patch With BulkoutputFormat in Cassandra 1.2.18, if any streaming errors occured, the hadoop task would fail. This doesn't seem to happen with 2.0.12. I have a hadoop map task that use BulkoutputFormat. If one of the cassandra nodes I'm writing to is down, I'm getting the following syslog output from the map task: {noformat} 2015-02-10 10:54:15,162 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library 2015-02-10 10:54:15,601 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId= 2015-02-10 10:54:15,901 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0 2015-02-10 10:54:15,907 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4984451e 2015-02-10 10:54:16,110 INFO org.apache.hadoop.mapred.MapTask: Processing split: hdfs://hdpmt01.osp-hadoop.osa:9000/user/jenkins/syst/5ef13_osp/tvstore/sumcombinations/hourly/2015021002/per_period-5ba2faa4b1e4aa21fa163e82bc46-sumcombinations/0/data/part-00047:0+462 2015-02-10 10:54:16,739 INFO org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded initialized native-zlib library 2015-02-10 10:54:16,740 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor 2015-02-10 10:54:16,741 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new 
decompressor 2015-02-10 10:54:16,741 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor 2015-02-10 10:54:16,741 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor 2015-02-10 10:54:16,927 ERROR org.apache.cassandra.cql3.QueryProcessor: Unable to initialize MemoryMeter (jamm not specified as javaagent). This means Cassandra will be unable to measure object sizes accurately and may consequently OOM. 2015-02-10 10:54:17,780 INFO org.apache.cassandra.utils.CLibrary: JNA not found. Native methods will be disabled. 2015-02-10 10:54:19,446 INFO org.apache.cassandra.io.sstable.SSTableReader: Opening /opera/log1/hadoop/mapred/local/taskTracker/jenkins/jobcache/job_201502041226_13903/attempt_201502041226_13903_m_00_0/work/tmp/syst5ef13osp/Data_hourly/syst5ef13osp-Data_hourly-jb-1 (1018 bytes) 2015-02-10 10:54:20,713 INFO org.apache.cassandra.streaming.StreamResultFuture: [Stream #29f27cd0-b113-11e4-a465-91cc09fc46f1] Executing streaming plan for Bulk Load 2015-02-10 10:54:20,713 INFO org.apache.cassandra.streaming.StreamResultFuture: [Stream #29f27cd0-b113-11e4-a465-91cc09fc46f1] Beginning stream session with /ipv6:prefix:1:441:0:0:0:7 2015-02-10 10:54:20,714 INFO org.apache.cassandra.streaming.StreamResultFuture: [Stream #29f27cd0-b113-11e4-a465-91cc09fc46f1] Beginning stream session with /ipv6:prefix:1:441:0:0:0:8 2015-02-10 10:54:20,715 INFO org.apache.cassandra.streaming.StreamSession: [Stream #29f27cd0-b113-11e4-a465-91cc09fc46f1] Starting streaming to /ipv6:prefix:1:441:0:0:0:7 2015-02-10 10:54:20,730 INFO org.apache.cassandra.streaming.StreamResultFuture: [Stream #29f27cd0-b113-11e4-a465-91cc09fc46f1] Beginning stream session with /ipv6:prefix:1:441:0:0:0:4 2015-02-10 10:54:20,750 INFO org.apache.cassandra.streaming.StreamResultFuture: [Stream #29f27cd0-b113-11e4-a465-91cc09fc46f1] Beginning stream session with /ipv6:prefix:1:441:0:0:0:3 2015-02-10 10:54:20,731 INFO org.apache.cassandra.streaming.StreamSession: [Stream 
#29f27cd0-b113-11e4-a465-91cc09fc46f1] Starting streaming to /ipv6:prefix:1:441:0:0:0:8 2015-02-10 10:54:20,750 INFO org.apache.cassandra.streaming.StreamSession: [Stream #29f27cd0-b113-11e4-a465-91cc09fc46f1] Starting streaming to /ipv6:prefix:1:441:0:0:0:4 2015-02-10 10:54:20,770 INFO org.apache.cassandra.streaming.StreamResultFuture: [Stream #29f27cd0-b113-11e4-a465-91cc09fc46f1] Beginning stream session with /ipv6:prefix:1:441:0:0:0:6 2015-02-10 10:54:20,778 INFO org.apache.cassandra.streaming.StreamResultFuture: [Stream #29f27cd0-b113-11e4-a465-91cc09fc46f1] Beginning stream session with /ipv6:prefix:1:441:0:0:0:5 2015-02-10 10:54:20,786 INFO
[jira] [Updated] (CASSANDRA-8609) Remove dependency of hadoop on internals (Cell/CellName)
[ https://issues.apache.org/jira/browse/CASSANDRA-8609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-8609: Assignee: Philip Thompson (was: Alex Liu) Remove dependency of hadoop on internals (Cell/CellName) - Key: CASSANDRA-8609 URL: https://issues.apache.org/jira/browse/CASSANDRA-8609 Project: Cassandra Issue Type: Bug Reporter: Sylvain Lebresne Assignee: Philip Thompson Fix For: 3.0 Attachments: CASSANDRA-8609-3.0-branch.txt For some reason most of the Hadoop code (ColumnFamilyRecordReader, CqlStorage, ...) uses the {{Cell}} and {{CellName}} classes. That dependency is entirely artificial: all this code is really client code that communicates with Cassandra over the thrift/native protocol, and there is thus no reason for it to use internal classes. In fact, those classes are used in a very crude way, as a {{Pair<ByteBuffer, ByteBuffer>}} really. But this dependency is really painful when we make changes to the internals. Further, every time we do so, I believe we break some of those APIs due to the change. This has been painful for CASSANDRA-5417 and is now painful for CASSANDRA-8099. While I somewhat hacked around it in CASSANDRA-5417, this was a mistake and we should have removed the dependency back then. So let's do that now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8609) Remove dependency of hadoop on internals (Cell/CellName)
[ https://issues.apache.org/jira/browse/CASSANDRA-8609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-8609: Attachment: CASSANDRA-8609-3.0-branch.txt Remove dependency of hadoop on internals (Cell/CellName) - Key: CASSANDRA-8609 URL: https://issues.apache.org/jira/browse/CASSANDRA-8609 Project: Cassandra Issue Type: Bug Reporter: Sylvain Lebresne Assignee: Alex Liu Fix For: 3.0 Attachments: CASSANDRA-8609-3.0-branch.txt For some reason most of the Hadoop code (ColumnFamilyRecordReader, CqlStorage, ...) uses the {{Cell}} and {{CellName}} classes. That dependency is entirely artificial: all this code is really client code that communicates with Cassandra over the thrift/native protocol, and there is thus no reason for it to use internal classes. In fact, those classes are used in a very crude way, as a {{Pair<ByteBuffer, ByteBuffer>}} really. But this dependency is really painful when we make changes to the internals. Further, every time we do so, I believe we break some of those APIs due to the change. This has been painful for CASSANDRA-5417 and is now painful for CASSANDRA-8099. While I somewhat hacked around it in CASSANDRA-5417, this was a mistake and we should have removed the dependency back then. So let's do that now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8609) Remove dependency of hadoop on internals (Cell/CellName)
[ https://issues.apache.org/jira/browse/CASSANDRA-8609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14319020#comment-14319020 ] Alex Liu commented on CASSANDRA-8609: - I ran pig-test on trunk and found some failing test cases; I am fixing those in this ticket as well. Remove dependency of hadoop on internals (Cell/CellName) - Key: CASSANDRA-8609 URL: https://issues.apache.org/jira/browse/CASSANDRA-8609 Project: Cassandra Issue Type: Bug Reporter: Sylvain Lebresne Assignee: Alex Liu Fix For: 3.0 For some reason most of the Hadoop code (ColumnFamilyRecordReader, CqlStorage, ...) uses the {{Cell}} and {{CellName}} classes. That dependency is entirely artificial: all this code is really client code that communicates with Cassandra over the thrift/native protocol, and there is thus no reason for it to use internal classes. In fact, those classes are used in a very crude way, as a {{Pair<ByteBuffer, ByteBuffer>}} really. But this dependency is really painful when we make changes to the internals. Further, every time we do so, I believe we break some of those APIs due to the change. This has been painful for CASSANDRA-5417 and is now painful for CASSANDRA-8099. While I somewhat hacked around it in CASSANDRA-5417, this was a mistake and we should have removed the dependency back then. So let's do that now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8609) Remove dependency of hadoop on internals (Cell/CellName)
[ https://issues.apache.org/jira/browse/CASSANDRA-8609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14319176#comment-14319176 ] Alex Liu commented on CASSANDRA-8609: - All pig tests fail, and CASSANDRA-8358 is addressing the issue. I am attaching my patch against cassandra-3.0 as a reference for [~philipthompson]. He will take over this ticket and address it in CASSANDRA-8358. Remove dependency of hadoop on internals (Cell/CellName) - Key: CASSANDRA-8609 URL: https://issues.apache.org/jira/browse/CASSANDRA-8609 Project: Cassandra Issue Type: Bug Reporter: Sylvain Lebresne Assignee: Alex Liu Fix For: 3.0 For some reason most of the Hadoop code (ColumnFamilyRecordReader, CqlStorage, ...) uses the {{Cell}} and {{CellName}} classes. That dependency is entirely artificial: all this code is really client code that communicates with Cassandra over the thrift/native protocol, and there is thus no reason for it to use internal classes. In fact, those classes are used in a very crude way, as a {{Pair<ByteBuffer, ByteBuffer>}} really. But this dependency is really painful when we make changes to the internals. Further, every time we do so, I believe we break some of those APIs due to the change. This has been painful for CASSANDRA-5417 and is now painful for CASSANDRA-8099. While I somewhat hacked around it in CASSANDRA-5417, this was a mistake and we should have removed the dependency back then. So let's do that now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8609) Remove dependency of hadoop on internals (Cell/CellName)
[ https://issues.apache.org/jira/browse/CASSANDRA-8609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14318501#comment-14318501 ] Alex Liu commented on CASSANDRA-8609: - Sorry, I missed this ticket. I will work on it today or tomorrow to get it done. Does this ticket only remove Cell and CellName from the Hadoop-related classes? Remove dependency of hadoop on internals (Cell/CellName) - Key: CASSANDRA-8609 URL: https://issues.apache.org/jira/browse/CASSANDRA-8609 Project: Cassandra Issue Type: Bug Reporter: Sylvain Lebresne Assignee: Alex Liu Fix For: 3.0 For some reason most of the Hadoop code (ColumnFamilyRecordReader, CqlStorage, ...) uses the {{Cell}} and {{CellName}} classes. That dependency is entirely artificial: all this code is really client code that communicates with Cassandra over the thrift/native protocol, and there is thus no reason for it to use internal classes. In fact, those classes are used in a very crude way, as a {{Pair<ByteBuffer, ByteBuffer>}} really. But this dependency is really painful when we make changes to the internals. Further, every time we do so, I believe we break some of those APIs due to the change. This has been painful for CASSANDRA-5417 and is now painful for CASSANDRA-8099. While I somewhat hacked around it in CASSANDRA-5417, this was a mistake and we should have removed the dependency back then. So let's do that now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6091) Better Vnode support in hadoop/pig
[ https://issues.apache.org/jira/browse/CASSANDRA-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316667#comment-14316667 ] Alex Liu commented on CASSANDRA-6091: - We need to at least check the total estimated rows of the multiple token ranges per split, instead of just the number of token ranges per node. Better Vnode support in hadoop/pig -- Key: CASSANDRA-6091 URL: https://issues.apache.org/jira/browse/CASSANDRA-6091 Project: Cassandra Issue Type: Bug Components: Hadoop Reporter: Alex Liu Assignee: Alex Liu CASSANDRA-6084 shows there are some issues when running a hadoop/pig job if vnodes are enabled. Also, the hadoop performance of vnode-enabled nodes is bad because there are so many splits. The idea is to combine vnode splits into big pseudo-splits so it works as if vnodes were disabled for the hadoop/pig job -- This message was sent by Atlassian JIRA (v6.3.4#6332)
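The split-sizing idea in the comment above can be sketched as follows. This is an illustrative outline only, not code from the CASSANDRA-6091 patch; the function name and tuple layout are hypothetical:

```python
# Greedily pack one node's vnode token ranges into combined splits,
# capping each split by total *estimated rows* across its ranges
# rather than by a fixed number of ranges per node.
def combine_by_estimated_rows(ranges, max_rows_per_split):
    """ranges: list of (start_token, end_token, estimated_rows) tuples."""
    splits, current, current_rows = [], [], 0
    for rng in ranges:
        rows = rng[2]
        # Start a new split when adding this range would exceed the cap.
        if current and current_rows + rows > max_rows_per_split:
            splits.append(current)
            current, current_rows = [], 0
        current.append(rng)
        current_rows += rows
    if current:
        splits.append(current)
    return splits
```

For example, ranges with estimated rows 500, 700, 300, 600 and a cap of 1000 produce three splits (500 / 700+300 / 600), so each Hadoop task sees a bounded amount of data regardless of how many vnode ranges it covers.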
[jira] [Comment Edited] (CASSANDRA-6091) Better Vnode support in hadoop/pig
[ https://issues.apache.org/jira/browse/CASSANDRA-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14317223#comment-14317223 ] Alex Liu edited comment on CASSANDRA-6091 at 2/11/15 11:44 PM: --- One more improvement is combining adjacent token ranges into one range. But it will create some small corner ranges, so the trade-off is not that good, as the number of adjacent token ranges is going to drop very quickly with the size of the cluster. (quoted from Piotr) was (Author: alexliu68): One more improvement is combining adjacent token ranges into one range. But it will create some small corner ranges, so the trade off is not that good as as the number of adjacent token ranges is going to drop very quickly with the size of the cluster. Better Vnode support in hadoop/pig -- Key: CASSANDRA-6091 URL: https://issues.apache.org/jira/browse/CASSANDRA-6091 Project: Cassandra Issue Type: Bug Components: Hadoop Reporter: Alex Liu Assignee: Alex Liu CASSANDRA-6084 shows there are some issues when running a hadoop/pig job if vnodes are enabled. Also, the hadoop performance of vnode-enabled nodes is bad because there are so many splits. The idea is to combine vnode splits into big pseudo-splits so it works as if vnodes were disabled for the hadoop/pig job -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6091) Better Vnode support in hadoop/pig
[ https://issues.apache.org/jira/browse/CASSANDRA-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14317223#comment-14317223 ] Alex Liu commented on CASSANDRA-6091: - One more improvement is combining adjacent token ranges into one range. But it will create some small corner ranges, so the trade-off is not that good, as the number of adjacent token ranges is going to drop very quickly with the size of the cluster. Better Vnode support in hadoop/pig -- Key: CASSANDRA-6091 URL: https://issues.apache.org/jira/browse/CASSANDRA-6091 Project: Cassandra Issue Type: Bug Components: Hadoop Reporter: Alex Liu Assignee: Alex Liu CASSANDRA-6084 shows there are some issues when running a hadoop/pig job if vnodes are enabled. Also, the hadoop performance of vnode-enabled nodes is bad because there are so many splits. The idea is to combine vnode splits into big pseudo-splits so it works as if vnodes were disabled for the hadoop/pig job -- This message was sent by Atlassian JIRA (v6.3.4#6332)
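The "combining adjacent token ranges" improvement discussed in the comment above amounts to coalescing ranges that touch on the ring. A hedged sketch of the idea (illustrative only, not from the ticket; tokens simplified to plain integers with no wrap-around):

```python
# Coalesce token ranges that are adjacent on the ring, i.e. where one
# range's end token equals the next range's start token, into a single
# wider range so fewer splits are produced.
def merge_adjacent(ranges):
    """ranges: list of (start_token, end_token) tuples sorted by start."""
    if not ranges:
        return []
    merged = [ranges[0]]
    for start, end in ranges[1:]:
        last_start, last_end = merged[-1]
        if start == last_end:            # contiguous: extend previous range
            merged[-1] = (last_start, end)
        else:                            # gap: keep as a separate range
            merged.append((start, end))
    return merged
```

As the comment notes, the payoff shrinks with cluster size: with more nodes, a given node's vnode ranges are rarely adjacent on the ring, so few pairs merge.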
[jira] [Updated] (CASSANDRA-7410) Pig support for BulkOutputFormat as a parameter in url
[ https://issues.apache.org/jira/browse/CASSANDRA-7410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-7410: Attachment: 7410-2.1-branch.txt patch on cassandra-2.1 is attached. It can be used to debug the streaming error as shown above. Pig support for BulkOutputFormat as a parameter in url -- Key: CASSANDRA-7410 URL: https://issues.apache.org/jira/browse/CASSANDRA-7410 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Alex Liu Assignee: Alex Liu Priority: Minor Fix For: 2.0.13 Attachments: 7410-2.0-branch.txt, 7410-2.1-branch.txt, 7410-v2-2.0-branch.txt, 7410-v3-2.0-branch.txt Add BulkOutputFormat support in Pig url -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7410) Pig support for BulkOutputFormat as a parameter in url
[ https://issues.apache.org/jira/browse/CASSANDRA-7410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284918#comment-14284918 ] Alex Liu commented on CASSANDRA-7410: - I got following sstable loading error after bulk writing. {code} DEBUG [Thread-193] 2015-01-20 16:43:53,751 Renaming /var/folders/92/cw97kmh10dxf6kj0b4ly1mbhgn/T/cql3ks/test_bulk-46af6a92-6c4c-44be-9b14-d817f7d63174/cql3ks-test_bulk-tmp-ka-1-TOC.txt to /var/folders/92/cw97kmh10dxf6kj0b4ly1mbhgn/T/cql3ks/test_bulk-46af6a92-6c4c-44be-9b14-d817f7d63174/cql3ks-test_bulk-ka-1-TOC.txt DEBUG [Thread-193] 2015-01-20 16:43:53,751 Renaming /var/folders/92/cw97kmh10dxf6kj0b4ly1mbhgn/T/cql3ks/test_bulk-46af6a92-6c4c-44be-9b14-d817f7d63174/cql3ks-test_bulk-tmp-ka-1-Statistics.db to /var/folders/92/cw97kmh10dxf6kj0b4ly1mbhgn/T/cql3ks/test_bulk-46af6a92-6c4c-44be-9b14-d817f7d63174/cql3ks-test_bulk-ka-1-Statistics.db DEBUG [Thread-193] 2015-01-20 16:43:53,752 Renaming /var/folders/92/cw97kmh10dxf6kj0b4ly1mbhgn/T/cql3ks/test_bulk-46af6a92-6c4c-44be-9b14-d817f7d63174/cql3ks-test_bulk-tmp-ka-1-Digest.sha1 to /var/folders/92/cw97kmh10dxf6kj0b4ly1mbhgn/T/cql3ks/test_bulk-46af6a92-6c4c-44be-9b14-d817f7d63174/cql3ks-test_bulk-ka-1-Digest.sha1 DEBUG [Thread-193] 2015-01-20 16:43:53,752 Renaming /var/folders/92/cw97kmh10dxf6kj0b4ly1mbhgn/T/cql3ks/test_bulk-46af6a92-6c4c-44be-9b14-d817f7d63174/cql3ks-test_bulk-tmp-ka-1-Filter.db to /var/folders/92/cw97kmh10dxf6kj0b4ly1mbhgn/T/cql3ks/test_bulk-46af6a92-6c4c-44be-9b14-d817f7d63174/cql3ks-test_bulk-ka-1-Filter.db DEBUG [Thread-193] 2015-01-20 16:43:53,752 Renaming /var/folders/92/cw97kmh10dxf6kj0b4ly1mbhgn/T/cql3ks/test_bulk-46af6a92-6c4c-44be-9b14-d817f7d63174/cql3ks-test_bulk-tmp-ka-1-Index.db to /var/folders/92/cw97kmh10dxf6kj0b4ly1mbhgn/T/cql3ks/test_bulk-46af6a92-6c4c-44be-9b14-d817f7d63174/cql3ks-test_bulk-ka-1-Index.db DEBUG [Thread-193] 2015-01-20 16:43:53,752 Renaming 
/var/folders/92/cw97kmh10dxf6kj0b4ly1mbhgn/T/cql3ks/test_bulk-46af6a92-6c4c-44be-9b14-d817f7d63174/cql3ks-test_bulk-tmp-ka-1-CompressionInfo.db to /var/folders/92/cw97kmh10dxf6kj0b4ly1mbhgn/T/cql3ks/test_bulk-46af6a92-6c4c-44be-9b14-d817f7d63174/cql3ks-test_bulk-ka-1-CompressionInfo.db DEBUG [Thread-193] 2015-01-20 16:43:53,752 Renaming /var/folders/92/cw97kmh10dxf6kj0b4ly1mbhgn/T/cql3ks/test_bulk-46af6a92-6c4c-44be-9b14-d817f7d63174/cql3ks-test_bulk-tmp-ka-1-Data.db to /var/folders/92/cw97kmh10dxf6kj0b4ly1mbhgn/T/cql3ks/test_bulk-46af6a92-6c4c-44be-9b14-d817f7d63174/cql3ks-test_bulk-ka-1-Data.db DEBUG [Thrift:9] 2015-01-20 16:43:53,758 computing ranges for 8e56ee4c6ef2c35b3d97c6abeefe8b92 DEBUG [Thread-191] 2015-01-20 16:43:53,786 Load metadata for /var/folders/92/cw97kmh10dxf6kj0b4ly1mbhgn/T/cql3ks/test_bulk-46af6a92-6c4c-44be-9b14-d817f7d63174/cql3ks-test_bulk-ka-1 INFO [Thread-191] 2015-01-20 16:43:53,786 Opening /var/folders/92/cw97kmh10dxf6kj0b4ly1mbhgn/T/cql3ks/test_bulk-46af6a92-6c4c-44be-9b14-d817f7d63174/cql3ks-test_bulk-ka-1 (52 bytes) INFO [Thread-191] 2015-01-20 16:43:53,802 [Stream #931af290-a106-11e4-ae05-d11cec192498] Executing streaming plan for Bulk Load INFO [StreamConnectionEstablisher:1] 2015-01-20 16:43:53,802 [Stream #931af290-a106-11e4-ae05-d11cec192498] Starting streaming to /127.0.0.1 DEBUG [StreamConnectionEstablisher:1] 2015-01-20 16:43:53,802 [Stream #931af290-a106-11e4-ae05-d11cec192498] Sending stream init for incoming stream DEBUG [StreamConnectionEstablisher:1] 2015-01-20 16:43:53,805 [Stream #931af290-a106-11e4-ae05-d11cec192498] Sending stream init for outgoing stream DEBUG [ACCEPT-/127.0.0.1] 2015-01-20 16:43:53,805 Connection version 2 from /127.0.0.1 DEBUG [ACCEPT-/127.0.0.1] 2015-01-20 16:43:53,806 Connection version 2 from /127.0.0.1 INFO [STREAM-INIT-/127.0.0.1:52052] 2015-01-20 16:43:53,806 [Stream #931af290-a106-11e4-ae05-d11cec192498 ID#0] Creating new streaming plan for Bulk Load DEBUG [STREAM-OUT-/127.0.0.1] 2015-01-20 
16:43:53,806 [Stream #931af290-a106-11e4-ae05-d11cec192498] Sending Prepare (0 requests, 1 files} INFO [StreamConnectionEstablisher:1] 2015-01-20 16:43:53,807 [Stream #931af290-a106-11e4-ae05-d11cec192498 ID#0] Prepare completed. Receiving 0 files(0 bytes), sending 1 files(52 bytes) INFO [STREAM-INIT-/127.0.0.1:52052] 2015-01-20 16:43:53,807 [Stream #931af290-a106-11e4-ae05-d11cec192498, ID#0] Received streaming plan for Bulk Load INFO [STREAM-INIT-/127.0.0.1:52053] 2015-01-20 16:43:53,808 [Stream #931af290-a106-11e4-ae05-d11cec192498, ID#0] Received streaming plan for Bulk Load DEBUG [STREAM-IN-/127.0.0.1] 2015-01-20 16:43:53,808 [Stream #931af290-a106-11e4-ae05-d11cec192498] Received Prepare (0 requests, 1 files} INFO [STREAM-IN-/127.0.0.1] 2015-01-20 16:43:53,808 [Stream #931af290-a106-11e4-ae05-d11cec192498 ID#0] Prepare completed. Receiving 1 files(52 bytes), sending 0 files(0
[jira] [Issue Comment Deleted] (CASSANDRA-8577) Values of set types not loading correctly into Pig
[ https://issues.apache.org/jira/browse/CASSANDRA-8577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-8577: Comment: was deleted (was: duplicate of CASSANDRA-8577) Values of set types not loading correctly into Pig -- Key: CASSANDRA-8577 URL: https://issues.apache.org/jira/browse/CASSANDRA-8577 Project: Cassandra Issue Type: Bug Reporter: Oksana Danylyshyn Assignee: Artem Aliev Fix For: 2.1.3 Attachments: cassandra-2.1-8577.txt Values of set types are not loading correctly from Cassandra (cql3 table, Native protocol v3) into Pig using CqlNativeStorage. When using Cassandra version 2.1.0 only empty values are loaded, and for newer versions (2.1.1 and 2.1.2) the following error is received: org.apache.cassandra.serializers.MarshalException: Unexpected extraneous bytes after set value at org.apache.cassandra.serializers.SetSerializer.deserializeForNativeProtocol(SetSerializer.java:94) Steps to reproduce: {code}cqlsh:socialdata> CREATE TABLE test ( key varchar PRIMARY KEY, tags set<varchar> ); cqlsh:socialdata> insert into test (key, tags) values ('key', {'Running', 'onestep4red', 'running'}); cqlsh:socialdata> select * from test; key | tags -----+--------------------------------------- key | {'Running', 'onestep4red', 'running'} (1 rows){code} With version 2.1.0: {code}grunt> data = load 'cql://socialdata/test' using org.apache.cassandra.hadoop.pig.CqlNativeStorage(); grunt> dump data; (key,()){code} With version 2.1.2: {code}grunt> data = load 'cql://socialdata/test' using org.apache.cassandra.hadoop.pig.CqlNativeStorage(); grunt> dump data; org.apache.cassandra.serializers.MarshalException: Unexpected extraneous bytes after set value at org.apache.cassandra.serializers.SetSerializer.deserializeForNativeProtocol(SetSerializer.java:94) at org.apache.cassandra.serializers.SetSerializer.deserializeForNativeProtocol(SetSerializer.java:27) at org.apache.cassandra.hadoop.pig.AbstractCassandraStorage.cassandraToObj(AbstractCassandraStorage.java:796) at org.apache.cassandra.hadoop.pig.CqlStorage.cqlColumnToObj(CqlStorage.java:195) at org.apache.cassandra.hadoop.pig.CqlNativeStorage.getNext(CqlNativeStorage.java:106) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532) at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212){code} Expected result: {code}(key,(Running,onestep4red,running)){code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
org.apache.cassandra.hadoop.pig.CqlStorage.cqlColumnToObj(CqlStorage.java:195) at org.apache.cassandra.hadoop.pig.CqlNativeStorage.getNext(CqlNativeStorage.java:106) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532) at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212){code} Expected result: {code}(key,(Running,onestep4red,running)){code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-8577) Values of set types not loading correctly into Pig
[ https://issues.apache.org/jira/browse/CASSANDRA-8577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277952#comment-14277952 ] Alex Liu edited comment on CASSANDRA-8577 at 1/15/15 12:09 AM:
---
duplicate of CASSANDRA-8577

was (Author: alexliu68): duplicate of CASSANDRA-8577

> Values of set types not loading correctly into Pig
> --------------------------------------------------
>
>                 Key: CASSANDRA-8577
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8577
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Oksana Danylyshyn
>            Assignee: Artem Aliev
>             Fix For: 2.1.3
>
>         Attachments: cassandra-2.1-8577.txt
>
> Values of set types are not loading correctly from Cassandra (cql3 table, Native protocol v3) into Pig using CqlNativeStorage.
> When using Cassandra version 2.1.0 only empty values are loaded, and for newer versions (2.1.1 and 2.1.2) the following error is received:
> org.apache.cassandra.serializers.MarshalException: Unexpected extraneous bytes after set value
> at org.apache.cassandra.serializers.SetSerializer.deserializeForNativeProtocol(SetSerializer.java:94)
> Steps to reproduce:
> {code}
> cqlsh:socialdata> CREATE TABLE test ( key varchar PRIMARY KEY, tags set<varchar> );
> cqlsh:socialdata> insert into test (key, tags) values ('key', {'Running', 'onestep4red', 'running'});
> cqlsh:socialdata> select * from test;
>  key | tags
> -----+---------------------------------------
>  key | {'Running', 'onestep4red', 'running'}
> (1 rows)
> {code}
> With version 2.1.0:
> {code}
> grunt> data = load 'cql://socialdata/test' using org.apache.cassandra.hadoop.pig.CqlNativeStorage();
> grunt> dump data;
> (key,())
> {code}
> With version 2.1.2:
> {code}
> grunt> data = load 'cql://socialdata/test' using org.apache.cassandra.hadoop.pig.CqlNativeStorage();
> grunt> dump data;
> org.apache.cassandra.serializers.MarshalException: Unexpected extraneous bytes after set value
> at org.apache.cassandra.serializers.SetSerializer.deserializeForNativeProtocol(SetSerializer.java:94)
> at org.apache.cassandra.serializers.SetSerializer.deserializeForNativeProtocol(SetSerializer.java:27)
> at org.apache.cassandra.hadoop.pig.AbstractCassandraStorage.cassandraToObj(AbstractCassandraStorage.java:796)
> at org.apache.cassandra.hadoop.pig.CqlStorage.cqlColumnToObj(CqlStorage.java:195)
> at org.apache.cassandra.hadoop.pig.CqlNativeStorage.getNext(CqlNativeStorage.java:106)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
> at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
> at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> {code}
> Expected result:
> {code}
> (key,(Running,onestep4red,running))
> {code}
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
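The failure mode above is consistent with a length-width mismatch between native protocol versions: v3 moved collection sizes and element lengths from 2-byte unsigned shorts to 4-byte ints, so a reader that still assumes the old widths stops early and leaves unread bytes behind. A minimal self-contained sketch of that mismatch (plain Java, illustrative only, not Cassandra's actual SetSerializer):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.*;

public class SetWidthMismatch {
    // Encode a set of strings using v3-style 4-byte count and element lengths.
    static ByteBuffer encodeV3(Set<String> values) {
        int size = 4;
        List<byte[]> encoded = new ArrayList<>();
        for (String v : values) {
            byte[] b = v.getBytes(StandardCharsets.UTF_8);
            encoded.add(b);
            size += 4 + b.length;
        }
        ByteBuffer out = ByteBuffer.allocate(size);
        out.putInt(encoded.size());          // v3: count is an int
        for (byte[] b : encoded) {
            out.putInt(b.length);            // v3: each length is an int
            out.put(b);
        }
        out.flip();
        return out;
    }

    // Decode with v2-style 2-byte widths; returns true if bytes remain after
    // the declared elements were consumed -- the "extraneous bytes" situation.
    static boolean v2DecodeLeavesExtraneousBytes(ByteBuffer in) {
        ByteBuffer b = in.duplicate();
        int n = b.getShort() & 0xFFFF;       // v2: count is an unsigned short
        for (int i = 0; i < n; i++) {
            int len = b.getShort() & 0xFFFF; // v2: each length is a short
            b.position(b.position() + len);
        }
        return b.hasRemaining();
    }
}
```

Reading the first two bytes of the 4-byte count `3` yields `0`, so the v2-style reader consumes nothing and all the element bytes are left over.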
[jira] [Updated] (CASSANDRA-8576) Primary Key Pushdown For Hadoop
[ https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-8576:
Attachment: 8576-2.1-branch.txt

v1 patch is attached; it only supports full-partition-key EQ queries.

> Primary Key Pushdown For Hadoop
> -------------------------------
>
>                 Key: CASSANDRA-8576
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Hadoop
>            Reporter: Russell Alexander Spitzer
>         Attachments: 8576-2.1-branch.txt
>
> I've heard reports from several users that they would like to have predicate pushdown functionality for Hadoop-based (Hive in particular) services.
> Example use case:
> * Table with wide partitions, one per customer
> * Application team has HQL they would like to run on a single customer
> * Currently, time to complete scales with the number of customers, since the Input Format can't push down the primary key predicate
> * Current implementation requires a full table scan (since it can't recognize that a single partition was specified)
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8577) Values of set types not loading correctly into Pig
[ https://issues.apache.org/jira/browse/CASSANDRA-8577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277952#comment-14277952 ] Alex Liu commented on CASSANDRA-8577:
---
duplicate of CASSANDRA-8577

> Values of set types not loading correctly into Pig
> (full ticket description quoted in the [Comment Edited] message above)
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop
[ https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276140#comment-14276140 ] Alex Liu commented on CASSANDRA-8576:
---
Should it work only for EQ predicates? Should it also include IN predicates?

> Primary Key Pushdown For Hadoop
> (full ticket description quoted in the [Updated] message above)
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
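To make the EQ-versus-IN question concrete, here is a hypothetical sketch (plain Java; the helper names are illustrative, not the attached patch's API) of the two predicate shapes: a full-partition-key EQ restricts the job to a single partition, while an IN on the partition key is equivalent to a batch of EQ queries, one per listed value.

```java
import java.util.*;

public class PushdownSketch {
    // Build a CQL WHERE clause from an EQ predicate on every partition key column.
    static String eqWhere(LinkedHashMap<String, String> keyEq) {
        StringJoiner sj = new StringJoiner(" AND ");
        for (Map.Entry<String, String> e : keyEq.entrySet())
            sj.add(e.getKey() + " = '" + e.getValue() + "'");
        return "WHERE " + sj;
    }

    // An IN predicate on the partition key expands to one EQ query per value.
    static List<String> expandIn(String column, List<String> values) {
        List<String> queries = new ArrayList<>();
        for (String v : values)
            queries.add("WHERE " + column + " = '" + v + "'");
        return queries;
    }
}
```

Either shape lets the input format target specific partitions instead of scanning every split, which is exactly the scaling problem the ticket describes.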
[jira] [Commented] (CASSANDRA-8599) Refactor or fix CqlStorage
[ https://issues.apache.org/jira/browse/CASSANDRA-8599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274134#comment-14274134 ] Alex Liu commented on CASSANDRA-8599:
---
I am removing CS and refactoring CNS.

> Refactor or fix CqlStorage
> --------------------------
>
>                 Key: CASSANDRA-8599
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8599
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>            Reporter: Brandon Williams
>            Assignee: Alex Liu
>             Fix For: 2.1.3
>
> In CASSANDRA-8541, for 2.1, I ultimately replaced the non-existent CPIF references with CIF, since CNS was broken otherwise. But this means CS no longer works, since it's not a simple drop-in replacement (though having CNS work is better than having them both broken by a class that doesn't exist). We can't just deprecate and remove CS either, because CNS extends it. We either need to fix CS to work with CIF, or we need to refactor CNS so that we can just remove CS.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8599) Refactor or fix CqlStorage
[ https://issues.apache.org/jira/browse/CASSANDRA-8599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-8599:
Attachment: 8166_2.1_branch.txt

v1 patch on the 2.1 branch is attached; it makes CqlStorage a wrapper over CqlNativeStorage to retain backward compatibility.

> Refactor or fix CqlStorage
> (full ticket description quoted in the first CASSANDRA-8599 message above)
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
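A minimal stand-in sketch of that wrapping approach (the class names echo the ticket, but these bodies are illustrative placeholders, not the attached patch): the deprecated CqlStorage becomes a thin shell over CqlNativeStorage, so existing Pig scripts keep working while emitting a warning on first use.

```java
public class WrapSketch {
    // Placeholder standing in for the real CqlNativeStorage LoadFunc.
    static class CqlNativeStorage {
        String location;
        void setLocation(String location) { this.location = location; }
    }

    /** @deprecated use {@link CqlNativeStorage} directly. */
    @Deprecated
    static class CqlStorage extends CqlNativeStorage {
        static boolean warned = false;

        @Override
        void setLocation(String location) {
            if (!warned) {
                // Warn once, then delegate everything to the new storage class.
                System.err.println("CqlStorage is deprecated; use CqlNativeStorage");
                warned = true;
            }
            super.setLocation(location);
        }
    }
}
```

Inverting the inheritance this way (CS extending CNS, rather than CNS extending CS as before) is what lets CS be removed later without breaking CNS.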
[jira] [Updated] (CASSANDRA-8599) Refactor or fix CqlStorage
[ https://issues.apache.org/jira/browse/CASSANDRA-8599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-8599:
Attachment: 8599-2.1-branch.txt

> Refactor or fix CqlStorage
> (full ticket description quoted in the first CASSANDRA-8599 message above)
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8599) Refactor or fix CqlStorage
[ https://issues.apache.org/jira/browse/CASSANDRA-8599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-8599:
Attachment: (was: 8166_2.1_branch.txt)

> Refactor or fix CqlStorage
> (full ticket description quoted in the first CASSANDRA-8599 message above)
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8599) Refactor or fix CqlStorage
[ https://issues.apache.org/jira/browse/CASSANDRA-8599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-8599:
Attachment: (was: 8599-2.1-branch.txt)

> Refactor or fix CqlStorage
> (full ticket description quoted in the first CASSANDRA-8599 message above)
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8599) Refactor or fix CqlStorage
[ https://issues.apache.org/jira/browse/CASSANDRA-8599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-8599:
Attachment: 8599-2.1-branch.txt

> Refactor or fix CqlStorage
> (full ticket description quoted in the first CASSANDRA-8599 message above)
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8599) Refactor or fix CqlStorage
[ https://issues.apache.org/jira/browse/CASSANDRA-8599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Liu updated CASSANDRA-8599:
Attachment: 8599-v2-2.1-branch.txt

> Refactor or fix CqlStorage
> (full ticket description quoted in the first CASSANDRA-8599 message above)
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8599) Refactor or fix CqlStorage
[ https://issues.apache.org/jira/browse/CASSANDRA-8599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274370#comment-14274370 ] Alex Liu commented on CASSANDRA-8599:
---
V2 is attached; it adds a deprecation warning message and combines imports.

> Refactor or fix CqlStorage
> (full ticket description quoted in the first CASSANDRA-8599 message above)
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8541) References to non-existent/deprecated CqlPagingInputFormat in code
[ https://issues.apache.org/jira/browse/CASSANDRA-8541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266858#comment-14266858 ] Alex Liu commented on CASSANDRA-8541:
---
Please roll back the change to CqlStorage, which supports using CqlPagingInputFormat; CqlNativeStorage uses CqlInputFormat.

> References to non-existent/deprecated CqlPagingInputFormat in code
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-8541
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8541
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>            Reporter: Rekha Joshi
>            Assignee: Rekha Joshi
>              Labels: hadoop
>             Fix For: 2.0.12
>
>         Attachments: CASSANDRA-8541.txt
>
> On Mac 10.9.5, Java 1.7, latest cassandra trunk: references to the non-existent/deprecated CqlPagingInputFormat remain in the code. As per Changes.txt/7570, both CqlPagingInputFormat and CqlPagingRecordReader were removed, but lingering references remain in WordCount, CqlStorage, ..
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8486) Can't authenticate using CqlRecordReader
[ https://issues.apache.org/jira/browse/CASSANDRA-8486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247212#comment-14247212 ] Alex Liu commented on CASSANDRA-8486:
---
Try setting it as:
{code}
ConfigHelper.setInputKeyspaceUserNameAndPassword(conf, username, password);
CqlConfigHelper.setInputNativeAuthProvider(conf, PlainTextAuthProvider.class.getName());
{code}

> Can't authenticate using CqlRecordReader
> ----------------------------------------
>
>                 Key: CASSANDRA-8486
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8486
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>            Reporter: Cyril Scetbon
>            Assignee: Alex Liu
>
> Using CqlPagingRecordReader, it was possible to use authentication to connect to the Cassandra cluster, but now that we only have CqlRecordReader we can't anymore. We should put [this code|https://github.com/apache/cassandra/blob/cassandra-2.0.9/src/java/org/apache/cassandra/hadoop/cql3/CqlPagingRecordReader.java#L140-L153] back in CqlRecordReader.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
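For context, a fuller job-setup fragment around those two calls might look like the following (a sketch assuming Cassandra 2.1's Hadoop helper classes; the address, keyspace, table, and credentials are placeholders):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.hadoop.cql3.CqlConfigHelper;
import com.datastax.driver.core.PlainTextAuthProvider;

// 'job' is your org.apache.hadoop.mapreduce.Job instance.
Configuration conf = job.getConfiguration();
ConfigHelper.setInputInitialAddress(conf, "127.0.0.1");
ConfigHelper.setInputColumnFamily(conf, "mykeyspace", "mytable");
// Thrift-level credentials, still consulted by parts of the input format:
ConfigHelper.setInputKeyspaceUserNameAndPassword(conf, "user", "pass");
// Auth provider class for the native-protocol connection used by CqlRecordReader:
CqlConfigHelper.setInputNativeAuthProvider(conf, PlainTextAuthProvider.class.getName());
```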
[jira] [Commented] (CASSANDRA-7083) Authentication Support for CqlRecordWriter
[ https://issues.apache.org/jira/browse/CASSANDRA-7083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200579#comment-14200579 ] Alex Liu commented on CASSANDRA-7083:
---
+1

> Authentication Support for CqlRecordWriter
> ------------------------------------------
>
>                 Key: CASSANDRA-7083
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7083
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>            Reporter: Henning Kropp
>            Assignee: Brandon Williams
>              Labels: authentication, pig
>             Fix For: 2.0.12
>
>         Attachments: auth_cql.patch
>
> The {{CqlRecordWriter}} seems not to support authentication. When the keyspace in Cassandra is set to use authentication, our Pig job fails with the following when credentials are provided in the URI ('cql://username:password...):
> {code}
> java.lang.RuntimeException: InvalidRequestException(why:You have not logged in)
> at org.apache.cassandra.hadoop.cql3.CqlRecordWriter.<init>(CqlRecordWriter.java:123)
> at org.apache.cassandra.hadoop.cql3.CqlRecordWriter.<init>(CqlRecordWriter.java:90)
> at org.apache.cassandra.hadoop.cql3.CqlOutputFormat.getRecordWriter(CqlOutputFormat.java:76)
> at org.apache.cassandra.hadoop.cql3.CqlOutputFormat.getRecordWriter(CqlOutputFormat.java:57)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getRecordWriter(PigOutputFormat.java:84)
> at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:553)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
> at org.apache.hadoop.mapred.Child.main(Child.java:170)
> Caused by: InvalidRequestException(why:You have not logged in)
> at org.apache.cassandra.thrift.Cassandra$execute_cql3_query_result.read(Cassandra.java:38677)
> at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
> at org.apache.cassandra.thrift.Cassandra$Client.recv_execute_cql3_query(Cassandra.java:1597)
> at org.apache.cassandra.thrift.Cassandra$Client.execute_cql3_query(Cassandra.java:1582)
> at org.apache.cassandra.hadoop.cql3.CqlRecordWriter.retrievePartitionKeyValidator(CqlRecordWriter.java:332)
> at org.apache.cassandra.hadoop.cql3.CqlRecordWriter.<init>(CqlRecordWriter.java:108)
> ... 7 more
> {code}
> If the credentials are supplied not in the URI but only in the {{JobConf}}, the exception is:
> {code}
> Output Location Validation Failed for: 'cql://...' More info to follow:
> InvalidRequestException(why:You have not logged in)
> at org.apache.pig.newplan.logical.rules.InputOutputFileValidator$
> {code}
> This led to the finding that authentication is correctly supplied for {{CqlStorage}} but not for the {{CqlRecordWriter}}. Maybe it would make sense to put the authentication part into {{ConfigHelper.getClientFromAddressList()}}? Then in {{CqlStorage}} the username and password in the conf would need to be set from the URI. If so, the {{ConfigHelper}} has all the information to authenticate and already returns the client.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8166) Not all data is loaded to Pig using CqlNativeStorage
[ https://issues.apache.org/jira/browse/CASSANDRA-8166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181550#comment-14181550 ] Alex Liu commented on CASSANDRA-8166:
---
yes

> Not all data is loaded to Pig using CqlNativeStorage
> ----------------------------------------------------
>
>                 Key: CASSANDRA-8166
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8166
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>            Reporter: Oksana Danylyshyn
>            Assignee: Alex Liu
>         Attachments: 8166_2.1_branch.txt, pig_header, pig_schema, sorted.zip
>
> Not all the data from the Cassandra table is loaded into Pig using the CqlNativeStorage function.
> Steps to reproduce:
> cql3 create table statement:
> {code}
> CREATE TABLE time_bucket_step (
>   key varchar,
>   object_id varchar,
>   value varchar,
>   PRIMARY KEY (key, object_id)
> );
> {code}
> Loading and saving data to Cassandra (the sorted file is in the attachment):
> {code}
> time_bucket_step = load 'sorted' using PigStorage('\t') as (key:chararray, object_id:chararray, value:chararray);
> records = foreach time_bucket_step generate TOTUPLE(TOTUPLE('key', key),TOTUPLE('object_id', object_id)), TOTUPLE(value);
> store records into 'cql://socialdata/time_bucket_step?output_query=UPDATE+socialdata.time_bucket_step+set+value+%3D+%3F' using org.apache.cassandra.hadoop.pig.CqlNativeStorage();
> {code}
> Results:
> Input(s): Successfully read 139026 records (5817 bytes) from: hdfs://.../sorted
> Output(s): Successfully stored 139026 records in: cql://socialdata/time_bucket_step?output_query=UPDATE+socialdata.time_bucket_step+set+value+%3D+%3F
> Loading data from Cassandra (note that not all data are read):
> {code}
> time_bucket_step_cass = load 'cql://socialdata/time_bucket_step' using org.apache.cassandra.hadoop.pig.CqlNativeStorage();
> store time_bucket_step_cass into 'time_bucket_step_cass' using PigStorage('\t','-schema');
> {code}
> Results:
> Input(s): Successfully read 80727 records (20068 bytes) from: cql://socialdata/time_bucket_step
> Output(s): Successfully stored 80727 records (2098178 bytes) in: hdfs:///time_bucket_step_cass
> Actual: only 80727 of 139026 records were loaded
> Expected: All data should be loaded
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8166) Not all data is loaded to Pig using CqlNativeStorage
[ https://issues.apache.org/jira/browse/CASSANDRA-8166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180459#comment-14180459 ] Alex Liu commented on CASSANDRA-8166:
---
Did you check cqlsh for the result count?
{code}
select count(*) from socialdata.time_bucket_step;
{code}

> Not all data is loaded to Pig using CqlNativeStorage
> (full ticket description quoted in the other CASSANDRA-8166 message above; at the time of this comment the only attachment was sorted.zip, and the load statement read {{PigStorage('\t','-schema')}})
--
This message was sent by Atlassian JIRA (v6.3.4#6332)