[jira] [Created] (ARROW-17255) Support JSON logical type in Arrow
Pradeep Gollakota created ARROW-17255:
--------------------------------------

Summary: Support JSON logical type in Arrow
Key: ARROW-17255
URL: https://issues.apache.org/jira/browse/ARROW-17255
Project: Apache Arrow
Issue Type: Improvement
Components: Archery
Reporter: Pradeep Gollakota

As a BigQuery developer, I would like the Arrow libraries to support the JSON logical type. This would enable us to use the JSON type in the Arrow format of our ReadAPI. It would also enable us to use the JSON type to export data from BigQuery to Parquet.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
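For context on the Parquet half of the request: Parquet already defines a JSON logical type (a UTF-8 BINARY annotated as JSON), which is what a BigQuery-to-Parquet export would target; the Arrow-side equivalent is what this issue asks for. A minimal sketch of declaring such a column with the parquet-java schema builder (newer parquet-java API; illustrative only, the field names are made up):

{code:java}
import org.apache.parquet.schema.LogicalTypeAnnotation;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName;
import org.apache.parquet.schema.Types;

public class JsonColumnExample {
    public static void main(String[] args) {
        // One nullable column, physically BINARY, annotated as JSON so that
        // readers know to interpret the bytes as UTF-8 JSON text.
        MessageType schema = Types.buildMessage()
            .optional(PrimitiveTypeName.BINARY)
                .as(LogicalTypeAnnotation.jsonType())
                .named("payload")
            .named("export_record");
        System.out.println(schema);
    }
}
{code}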
[jira] [Created] (PARQUET-869) Min/Max record counts for block size checks are not configurable
Pradeep Gollakota created PARQUET-869:
--------------------------------------

Summary: Min/Max record counts for block size checks are not configurable
Key: PARQUET-869
URL: https://issues.apache.org/jira/browse/PARQUET-869
Project: Parquet
Issue Type: Improvement
Reporter: Pradeep Gollakota

While the min/max record counts for the page size check are configurable via the ParquetOutputFormat.MIN_ROW_COUNT_FOR_PAGE_SIZE_CHECK and ParquetOutputFormat.MAX_ROW_COUNT_FOR_PAGE_SIZE_CHECK configs, and via ParquetProperties directly, the min/max record counts for the block size check are hard-coded inside InternalParquetRecordWriter. These two settings should also be configurable.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
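For reference, the page-size analogue that already works today: the existing bounds can be set on the Hadoop Configuration. A sketch, using the parquet-hadoop constants named above (the numeric values are arbitrary examples):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.parquet.hadoop.ParquetOutputFormat;

public class PageSizeCheckConfig {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // The page size check bounds are already configurable...
        conf.setInt(ParquetOutputFormat.MIN_ROW_COUNT_FOR_PAGE_SIZE_CHECK, 200);
        conf.setInt(ParquetOutputFormat.MAX_ROW_COUNT_FOR_PAGE_SIZE_CHECK, 20000);
        // ...but there are no equivalent keys for the block (row group) size
        // check: those counts are hard-coded in InternalParquetRecordWriter,
        // which is exactly what this issue proposes to change.
    }
}
{code}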
[jira] [Commented] (KAFKA-1884) New Producer blocks forever for Invalid topic names
[ https://issues.apache.org/jira/browse/KAFKA-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310241#comment-14310241 ]

Pradeep Gollakota commented on KAFKA-1884:
-------------------------------------------

[~guozhang] That's what I figured at first. But the odd behavior is that the exception storm is happening on the server even after the producer has been shut down (and the broker restarted). Not sure why that would be the case.

New Producer blocks forever for Invalid topic names
---------------------------------------------------
Key: KAFKA-1884
URL: https://issues.apache.org/jira/browse/KAFKA-1884
Project: Kafka
Issue Type: Bug
Components: producer
Affects Versions: 0.8.2
Reporter: Manikumar Reddy
Fix For: 0.8.3

The new producer blocks forever for invalid topic names.

Producer logs:

DEBUG [2015-01-20 12:46:13,406] NetworkClient: maybeUpdateMetadata(): Trying to send metadata request to node -1
DEBUG [2015-01-20 12:46:13,406] NetworkClient: maybeUpdateMetadata(): Sending metadata request ClientRequest(expectResponse=true, payload=null, request=RequestSend(header={api_key=3,api_version=0,correlation_id=50845,client_id=my-producer}, body={topics=[TOPIC=]})) to node -1
TRACE [2015-01-20 12:46:13,416] NetworkClient: handleMetadataResponse(): Ignoring empty metadata response with correlation id 50845.
DEBUG [2015-01-20 12:46:13,417] NetworkClient: maybeUpdateMetadata(): Trying to send metadata request to node -1
DEBUG [2015-01-20 12:46:13,417] NetworkClient: maybeUpdateMetadata(): Sending metadata request ClientRequest(expectResponse=true, payload=null, request=RequestSend(header={api_key=3,api_version=0,correlation_id=50846,client_id=my-producer}, body={topics=[TOPIC=]})) to node -1
TRACE [2015-01-20 12:46:13,417] NetworkClient: handleMetadataResponse(): Ignoring empty metadata response with correlation id 50846.
DEBUG [2015-01-20 12:46:13,417] NetworkClient: maybeUpdateMetadata(): Trying to send metadata request to node -1
DEBUG [2015-01-20 12:46:13,418] NetworkClient: maybeUpdateMetadata(): Sending metadata request ClientRequest(expectResponse=true, payload=null, request=RequestSend(header={api_key=3,api_version=0,correlation_id=50847,client_id=my-producer}, body={topics=[TOPIC=]})) to node -1
TRACE [2015-01-20 12:46:13,418] NetworkClient: handleMetadataResponse(): Ignoring empty metadata response with correlation id 50847.

Broker logs:

[2015-01-20 12:46:14,074] ERROR [KafkaApi-0] error when handling request Name: TopicMetadataRequest; Version: 0; CorrelationId: 51020; ClientId: my-producer; Topics: TOPIC= (kafka.server.KafkaApis)
kafka.common.InvalidTopicException: topic name TOPIC= is illegal, contains a character other than ASCII alphanumerics, '.', '_' and '-'
    at kafka.common.Topic$.validate(Topic.scala:42)
    at kafka.admin.AdminUtils$.createOrUpdateTopicPartitionAssignmentPathInZK(AdminUtils.scala:186)
    at kafka.admin.AdminUtils$.createTopic(AdminUtils.scala:177)
    at kafka.server.KafkaApis$$anonfun$5.apply(KafkaApis.scala:367)
    at kafka.server.KafkaApis$$anonfun$5.apply(KafkaApis.scala:350)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.immutable.Set$Set1.foreach(Set.scala:74)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractSet.scala$collection$SetLike$$super$map(Set.scala:47)
    at scala.collection.SetLike$class.map(SetLike.scala:93)
    at scala.collection.AbstractSet.map(Set.scala:47)
    at kafka.server.KafkaApis.getTopicMetadata(KafkaApis.scala:350)
    at kafka.server.KafkaApis.handleTopicMetadataRequest(KafkaApis.scala:389)
    at kafka.server.KafkaApis.handle(KafkaApis.scala:57)
    at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:59)
    at java.lang.Thread.run(Thread.java:722)

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
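For anyone trying to reproduce: a minimal sketch of a client that can trigger the hang (the broker address is an assumption; the topic name "TOPIC=" is the one from the logs above, which fails broker-side validation, so the metadata the producer waits on never arrives and send() blocks):

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class InvalidTopicRepro {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

        KafkaProducer<byte[], byte[]> producer = new KafkaProducer<byte[], byte[]>(props);
        // "TOPIC=" contains '=', which broker-side validation rejects, so the
        // producer never obtains metadata and this call blocks forever.
        producer.send(new ProducerRecord<byte[], byte[]>("TOPIC=", "hello".getBytes()));
        producer.close();
    }
}
{code}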
[jira] [Commented] (KAFKA-1884) New Producer blocks forever for Invalid topic names
[ https://issues.apache.org/jira/browse/KAFKA-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310548#comment-14310548 ]

Pradeep Gollakota commented on KAFKA-1884:
-------------------------------------------

I guess that makes sense... I'll confirm.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1884) New Producer blocks forever for Invalid topic names
[ https://issues.apache.org/jira/browse/KAFKA-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308670#comment-14308670 ]

Pradeep Gollakota commented on KAFKA-1884:
-------------------------------------------

What makes behavior #2 above even more odd: I stopped the server, deleted the znodes, deleted the Kafka log dir, and restarted the server, and the same behavior is still seen. O.o

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1884) New Producer blocks forever for Invalid topic names
[ https://issues.apache.org/jira/browse/KAFKA-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308539#comment-14308539 ]

Pradeep Gollakota commented on KAFKA-1884:
-------------------------------------------

I'd like to work on this; please assign it to me. I've been able to reproduce the issue, and I noticed two other oddities:

1. The server-side error above repeats hundreds of times a second (each repeat increments the CorrelationId). This seems to indicate some type of retry logic.
2. If I kill the server, kill the client, and restart the server, the error continues to repeat. This seems to indicate that this request may be persisted somewhere.

I have a good grasp of where to start looking for the problem, though I have no idea why the above two are occurring.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AMBARI-5707) Replace Ganglia with high performant and pluggable Metrics System
[ https://issues.apache.org/jira/browse/AMBARI-5707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017422#comment-14017422 ]

Pradeep Gollakota commented on AMBARI-5707:
--------------------------------------------

I too agree that it may not be the best idea for Ambari to rebuild components of the stack. However, I would like to see a pluggable architecture for metrics. For example, we use Datadog heavily, so it would be great if Ambari could plug into our existing metrics infrastructure and pull graphs directly from Datadog.

Replace Ganglia with high performant and pluggable Metrics System
-----------------------------------------------------------------
Key: AMBARI-5707
URL: https://issues.apache.org/jira/browse/AMBARI-5707
Project: Ambari
Issue Type: New Feature
Components: agent, controller
Affects Versions: 1.6.0
Reporter: Siddharth Wagle
Assignee: Siddharth Wagle
Priority: Critical
Attachments: MetricsSystemArch.png

Ambari Metrics System
- Ability to collect metrics from Hadoop and other Stack services
- Ability to retain metrics at a high precision for a configurable time period (say 5 days)
- Ability to automatically purge metrics after the retention period
- At collection time, provide a clear integration point for external systems (such as TSDB)
- At purge time, provide a clear integration point for metrics retention by an external system
- Should provide default options for external metrics retention (say "HDFS")
- Provide tools/utilities for analyzing metrics in the retention system (say "Hive schema, Pig scripts, etc." that can be used with the default retention store "HDFS")

System Requirements
- Must be portable and platform independent
- Must not conflict with any existing metrics system (such as Ganglia)
- Must not conflict with existing SNMP infra
- Must not run as root
- Must have an HA story (no SPOF)

Usage
- Ability to obtain metrics from the Ambari REST API (point in time and temporal)
- Ability to view metric graphs in Ambari Web (currently fixed)
- Ability to configure custom metric graphs in Ambari Web (currently, we have metric graphs "fixed" into the UI)
- Need to improve metric graph "navigation" in Ambari Web (currently, metric graphs do not allow navigation at arbitrary timeframes, but only at Ganglia aggregation intervals)
- Ability to "view cluster" at a point in time (i.e. see all metrics at that point)
- Ability to define metrics (and how + where to obtain them) in Stack Definitions

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (KAFKA-1226) Rack-Aware replica assignment option
[ https://issues.apache.org/jira/browse/KAFKA-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895123#comment-13895123 ]

Pradeep Gollakota commented on KAFKA-1226:
-------------------------------------------

[~jvanremoortere] Can you either add the patch as an attachment or send it to me privately? I'm also very interested in this feature.

Rack-Aware replica assignment option
------------------------------------
Key: KAFKA-1226
URL: https://issues.apache.org/jira/browse/KAFKA-1226
Project: Kafka
Issue Type: Improvement
Components: replication
Affects Versions: 0.8.0, 0.8.1
Reporter: Joris Van Remoortere
Assignee: Neha Narkhede
Fix For: 0.8.0

Adding a rack-id to the Kafka config. This rack-id can be used during replica assignment by using the max-rack-replication argument in the admin scripts (create topic, etc.). By default the original replication assignment algorithm is used because max-rack-replication defaults to -1. max-rack-replication -1 is not honored if you are doing manual replica assignment (preferred). If this looks good I can add some test cases specific to the rack-aware assignment. I can also port this to trunk. We are currently running 0.8.0 in production and need this, so I wrote the patch against that.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (KAFKA-1226) Rack-Aware replica assignment option
[ https://issues.apache.org/jira/browse/KAFKA-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895138#comment-13895138 ]

Pradeep Gollakota commented on KAFKA-1226:
-------------------------------------------

[~jvanremoortere] Sweet! Thanks. Can we mark this closed as a duplicate please?

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-1175) Hierarchical Topics
[ https://issues.apache.org/jira/browse/KAFKA-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Gollakota updated KAFKA-1175:
--------------------------------------
Issue Type: New Feature (was: Bug)

Hierarchical Topics
-------------------
Key: KAFKA-1175
URL: https://issues.apache.org/jira/browse/KAFKA-1175
Project: Kafka
Issue Type: New Feature
Reporter: Pradeep Gollakota

Allow for creation of hierarchical topics so that related topics can be grouped together.

--
This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (KAFKA-1175) Hierarchical Topics
[ https://issues.apache.org/jira/browse/KAFKA-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843861#comment-13843861 ]

Pradeep Gollakota commented on KAFKA-1175:
-------------------------------------------

I'm very interested in this feature. I created this JIRA to track progress and to start a dialog so we can discuss this further. I would love to work on this if no one else is working on it. I would also love to discuss how this would relate to/affect securing Kafka [KAFKA-1176].

--
This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (KAFKA-1175) Hierarchical Topics
[ https://issues.apache.org/jira/browse/KAFKA-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844002#comment-13844002 ]

Pradeep Gollakota commented on KAFKA-1175:
-------------------------------------------

In the proposal, [~jkreps] talks about a couple of use cases at LinkedIn:
# Group a set of topics that are related by application area (ads, search, etc.)
# Group a set of topics based on usage paradigm (tracking, metrics, etc.)

At Lithium, we have an extension of those use cases. We have multiple products that are deployed for different customers, so we need to partition by customer_id and product_id. We may have hierarchical topics of the following nature:
# /p1/google/tracking/search/click_events
# /p1/google/tracking/community/message_create_events
# /p1/linkedin/tracking/search/click_events
# /p1/linkedin/tracking/community/message_create_events
# /p2/apache/tracking/core/user_ident_event

--
This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (PIG-3453) Implement a Storm backend to Pig
[ https://issues.apache.org/jira/browse/PIG-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813136#comment-13813136 ]

Pradeep Gollakota commented on PIG-3453:
-----------------------------------------

[~thedatachef] Wow... this is a great start! Thanks so much for hacking this out.

[~cheolsoo] I agree that this work should be committed back to a branch on Apache. I foresee a lot more contributions and collaboration on this, so it would be easier to coordinate via Apache as opposed to the git mirror.

[~dvryaboy] I have been strongly considering writing this DSL to Summingbird instead of Trident/Storm, though I'm still weighing whether there would be any implications to doing so. By writing to Summingbird we would get both real-time and hybrid-mode execution, which in my mind is a huge win. At Lithium Technologies, we have been considering using Summingbird for hybrid-mode execution. The question in my mind is: do we want to use Summingbird if all we want is a real-time engine (i.e. Storm)? We can change the scope of this JIRA to write a Summingbird backend, or we can open another JIRA to implement a Summingbird POC and then see where that gets us.

Implement a Storm backend to Pig
--------------------------------
Key: PIG-3453
URL: https://issues.apache.org/jira/browse/PIG-3453
Project: Pig
Issue Type: New Feature
Affects Versions: 0.13.0
Reporter: Pradeep Gollakota
Assignee: Jacob Perkins
Labels: storm
Fix For: 0.13.0
Attachments: storm-integration.patch

There is a lot of interest around implementing a Storm backend to Pig for stream processing. The proposal and initial discussions can be found at https://cwiki.apache.org/confluence/display/PIG/Pig+on+Storm+Proposal

--
This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (PIG-3453) Implement a Storm backend to Pig
[ https://issues.apache.org/jira/browse/PIG-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13764812#comment-13764812 ]

Pradeep Gollakota commented on PIG-3453:
-----------------------------------------

I personally don't have a concrete use case for this yet. In terms of using a system that can work both in warehousing and in real time, I have been looking at Summingbird (recently open-sourced). I think the word count example is a good place to start, as it's the canonical example. However, I'd like to have a more complicated example as well, so I'm writing a TF-IDF implementation in Pig and in Trident; perhaps this can be the step-2 PoC after word count. I'd also like to cut the more complex operations, like nested foreach statements, from the initial PoC, since I'm not sure yet how we'd solve them.

[~thedatachef] I started a new job last week and I'm not sure how this task would fit into the roadmap of my new company yet. I'd love to work on this if I have time, and you're more than welcome to work on this as well. Thanks for all your great comments, input, and enthusiasm.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (PIG-3453) Implement a Storm backend to Pig
Pradeep Gollakota created PIG-3453:
-----------------------------------

Summary: Implement a Storm backend to Pig
Key: PIG-3453
URL: https://issues.apache.org/jira/browse/PIG-3453
Project: Pig
Issue Type: New Feature
Reporter: Pradeep Gollakota

There is a lot of interest around implementing a Storm backend to Pig for stream processing. The proposal and initial discussions can be found at https://cwiki.apache.org/confluence/display/PIG/Pig+on+Storm+Proposal

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3391) Issue with DataType- Long conversion in New AvroStorage()
[ https://issues.apache.org/jira/browse/PIG-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717697#comment-13717697 ]

Pradeep Gollakota commented on PIG-3391:
-----------------------------------------

I have a couple of quick questions:
1. Should the summary read "DateTime - Long" instead of "DataType - Long"?
2. What do you mean by datetime-long in the description?
3. Could you provide a link to the JIRA that you're talking about?

If the answer to #2 is epoch time, then I believe you are correct that timezone information will be lost; epoch time is always UTC.

Issue with DataType- Long conversion in New AvroStorage()
---------------------------------------------------------
Key: PIG-3391
URL: https://issues.apache.org/jira/browse/PIG-3391
Project: Pig
Issue Type: Improvement
Reporter: Anup Ahire

Shouldn't we lose the timezone information if we convert datetime to long? After going through the JIRA for datetime, it appears that datetime-long wasn't supported to avoid timezone information loss. Thanks!!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
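To make the timezone-loss point concrete, a minimal Joda-Time sketch (Pig's datetime type is Joda-based; the date and zone IDs are arbitrary examples):

{code:java}
import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;

public class EpochZoneLoss {
    public static void main(String[] args) {
        // Two representations of the same instant in different zones...
        DateTime la = new DateTime(2013, 7, 24, 12, 0, DateTimeZone.forID("America/Los_Angeles"));
        DateTime kolkata = la.withZone(DateTimeZone.forID("Asia/Kolkata"));
        // ...map to the identical long (epoch millis are always UTC):
        System.out.println(la.getMillis() == kolkata.getMillis()); // true
        // So a long alone cannot tell you which zone the value was
        // originally written in; that information is lost.
    }
}
{code}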
[jira] [Commented] (PIG-2495) Using merge JOIN from a HBaseStorage produces an error
[ https://issues.apache.org/jira/browse/PIG-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717915#comment-13717915 ]

Pradeep Gollakota commented on PIG-2495:
-----------------------------------------

Hi Kevin, I have a very minor request for your patch. When throwing the RuntimeException, could you also include the class information for the given type? This could potentially be useful for debugging purposes.

Using merge JOIN from a HBaseStorage produces an error
------------------------------------------------------
Key: PIG-2495
URL: https://issues.apache.org/jira/browse/PIG-2495
Project: Pig
Issue Type: Bug
Affects Versions: 0.9.1, 0.9.2
Environment: HBase 0.90.3, Hadoop 0.20-append
Reporter: Kevin Lion
Assignee: Kevin Lion
Fix For: 0.12
Attachments: PIG-2495.patch

To increase the performance of my computation, I would like to use a merge join between two tables, but it produces an error. Here is the script:

{noformat}
start_sessions = LOAD 'hbase://startSession.bea00.dev.ubithere.com' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:infoid meta:imei meta:timestamp', '-loadKey') AS (sid:chararray, infoid:chararray, imei:chararray, start:long);
end_sessions = LOAD 'hbase://endSession.bea00.dev.ubithere.com' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:timestamp meta:locid', '-loadKey') AS (sid:chararray, end:long, locid:chararray);
sessions = JOIN start_sessions BY sid, end_sessions BY sid USING 'merge';
STORE sessions INTO 'sessionsTest' USING PigStorage ('*');
{noformat}

Here is the result of this script:

{noformat}
2012-01-30 16:12:43,920 [main] INFO org.apache.pig.Main - Logging error messages to: /root/pig_1327939963919.log
2012-01-30 16:12:44,025 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://lxc233:9000
2012-01-30 16:12:44,102 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: lxc233:9001
2012-01-30 16:12:44,760 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: MERGE_JION
2012-01-30 16:12:44,923 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-01-30 16:12:44,982 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 2
2012-01-30 16:12:44,982 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 2
2012-01-30 16:12:45,001 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-01-30 16:12:45,006 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-01-30 16:12:45,039 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:zookeeper.version=3.3.2-1031432, built on 11/05/2010 05:32 GMT
2012-01-30 16:12:45,039 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:host.name=lxc233.machine.com
2012-01-30 16:12:45,039 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.version=1.6.0_22
2012-01-30 16:12:45,039 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.vendor=Sun Microsystems Inc.
2012-01-30 16:12:45,039 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.home=/usr/lib/jvm/java-6-sun-1.6.0.22/jre
2012-01-30 16:12:45,039 [main] INFO org.apache.zookeeper.ZooKeeper - Client
{noformat}
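To make the review request concrete, a hypothetical sketch (the method, the variable names, and the Comparable stand-in for the expected type are illustrative, not taken from the actual patch):

{code:java}
public class TypeCheckExample {
    // Hypothetical helper illustrating the request: report the actual class
    // of the unexpected value, not just the fact that it was unexpected.
    static void checkKeyType(Object value) {
        if (!(value instanceof Comparable)) {
            throw new RuntimeException("Unsupported merge-join key type: "
                + (value == null ? "null" : value.getClass().getName()));
        }
    }
}
{code}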
[jira] [Commented] (HBASE-3732) New configuration option for client-side compression
[ https://issues.apache.org/jira/browse/HBASE-3732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13711038#comment-13711038 ]

Pradeep Gollakota commented on HBASE-3732:
-------------------------------------------

Yes it does. I misread his comment the first time. I also found HBASE-5355, which addresses exactly my use case. Thanks.

New configuration option for client-side compression
-----------------------------------------------------
Key: HBASE-3732
URL: https://issues.apache.org/jira/browse/HBASE-3732
Project: HBase
Issue Type: New Feature
Reporter: Jean-Daniel Cryans
Attachments: compressed_streams.jar

We have a case here where we have to store very fat cells (arrays of integers), which can amount to hundreds of KBs, that we need to read often, concurrently, and possibly keep in cache. Compressing the values on the client using java.util.zip's Deflater before sending them to HBase proved to be, in our case, almost an order of magnitude faster. The reasons are evident: less data sent to HBase, the memstore contains compressed data, the block cache contains compressed data too, etc. I was thinking that it might be something useful to add to a family schema, so that Put/Result do the conversion for you. The actual compression algo should also be configurable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3732) New configuration option for client-side compression
[ https://issues.apache.org/jira/browse/HBASE-3732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13710623#comment-13710623 ]

Pradeep Gollakota commented on HBASE-3732:
-------------------------------------------

I'd like to reopen discussion on this ticket. I have a slightly different use case that I'm considering for client-side compression (sorry if this isn't the right forum for this question). I have a scenario where the clients are in a different network topology than the HBase cluster, and the bandwidth between the clients and the cluster is limited. Since the client buffers writes, is there any mechanism in place for compressing the over-the-wire transfers?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
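Until something like this lands, the manual approach from the issue description looks roughly as follows (a sketch of client-side Deflater compression; the Put call in the trailing comment reflects the 0.9x-era HBase API and is illustrative only):

{code:java}
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

public class ClientSideCompression {
    // Deflate a value on the client before handing it to Put, as the
    // issue description suggests.
    static byte[] compress(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream(Math.max(32, raw.length / 4));
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }
    // Usage, shown as a comment only:
    // put.add(family, qualifier, compress(value));
}
{code}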
[jira] [Commented] (ACCUMULO-391) Multi-table Accumulo input format
[ https://issues.apache.org/jira/browse/ACCUMULO-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676043#comment-13676043 ]

Pradeep Gollakota commented on ACCUMULO-391:
---------------------------------------------

This would be a great addition. We have just started working with Pig (with Accumulo) at my company. The first thing we noticed is that in a lot of situations where we are joining data from one Accumulo table to data from another, we have to first dump the data from both tables to HDFS (perhaps using PigStorage), load the data back, and then join it. This is because the scan information is encoded in the job configuration, so when Pig uses the MultiInputFormat to scan both tables in the same job, only one table ends up getting exported from Accumulo. If this is completed, we could use the MultiTableInputFormat instead of Accumulo(Row)InputFormat to optimize our Pig scripts. Any thoughts on when this would be included?

Multi-table Accumulo input format
---------------------------------
Key: ACCUMULO-391
URL: https://issues.apache.org/jira/browse/ACCUMULO-391
Project: Accumulo
Issue Type: New Feature
Reporter: John Vines
Assignee: William Slacum
Priority: Minor
Labels: mapreduce
Attachments: multi-table-if.patch, new-multitable-if.patch

Just realized we had no MR input method which supports multiple tables for an input format. I would see it making the table the mapper's key and making the Key/Value a tuple, or alternatively having the Table/Key be the key tuple and sticking with Values being the value.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
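For a sense of what job setup could look like once this lands, a sketch (the class and method names follow the attached patches and what eventually shipped in 1.6, per the Fix For field below; the table names are made up, and connector/auth configuration is omitted):

{code:java}
import java.util.HashMap;
import java.util.Map;
import org.apache.accumulo.core.client.mapreduce.AccumuloMultiTableInputFormat;
import org.apache.accumulo.core.client.mapreduce.InputTableConfig;
import org.apache.hadoop.mapreduce.Job;

public class MultiTableJobSetup {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance();
        // One InputTableConfig per table; ranges, fetched columns, and
        // iterators can be configured per table.
        Map<String, InputTableConfig> configs = new HashMap<String, InputTableConfig>();
        configs.put("sessions", new InputTableConfig());   // table names made up
        configs.put("locations", new InputTableConfig());
        AccumuloMultiTableInputFormat.setInputTableConfigs(job, configs);
        job.setInputFormatClass(AccumuloMultiTableInputFormat.class);
    }
}
{code}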
[jira] [Commented] (ACCUMULO-391) Multi-table Accumulo input format
[ https://issues.apache.org/jira/browse/ACCUMULO-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676290#comment-13676290 ]

Pradeep Gollakota commented on ACCUMULO-391:
---------------------------------------------

I'm also available to help with this task if needed.

Multi-table Accumulo input format
---------------------------------
Key: ACCUMULO-391
Fix For: 1.6.0

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (JENA-402) Move etc/*.rules to src/main/resources/etc/*.rules
[ https://issues.apache.org/jira/browse/JENA-402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Gollakota updated JENA-402:
------------------------------------
Attachment: JENA-402-1.patch

Moved *.rules from jena-core/etc/ to jena-core/src/main/resources/etc and removed the reference to etc from the POM. Please review.

Move etc/*.rules to src/main/resources/etc/*.rules
--------------------------------------------------
Key: JENA-402
URL: https://issues.apache.org/jira/browse/JENA-402
Project: Apache Jena
Issue Type: Improvement
Reporter: Andy Seaborne
Priority: Minor
Attachments: JENA-402-1.patch

If we move the rules files to src/main/resources, we can drop the specific mention of etc in the POM. This will also make the use of Eclipse-linked projects work because, at the moment, there are kludges to make the tests work.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (JENA-402) Move etc/*.rules to src/main/resources/etc/*.rules
[ https://issues.apache.org/jira/browse/JENA-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13585287#comment-13585287 ]

Pradeep Gollakota commented on JENA-402:
-----------------------------------------

This appears to be complete. Should this be closed?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-285) Release Giraph-0.2
[ https://issues.apache.org/jira/browse/GIRAPH-285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582405#comment-13582405 ]

Pradeep Gollakota commented on GIRAPH-285:
-------------------------------------------

Any progress on this, guys? The 0.2 version of Giraph is MUCH better to work with, and it would be great to have a stable release soon (especially considering 0.1 was released last February).

Release Giraph-0.2
------------------
Key: GIRAPH-285
URL: https://issues.apache.org/jira/browse/GIRAPH-285
Project: Giraph
Issue Type: Task
Reporter: Avery Ching
Assignee: Avery Ching

I think it's time to do this. Trunk is moving fast and we need to provide something for users that is fixed. Giraph has already progressed a lot from 0.1. Jakob, can you please share your notes on releasing 0.1? I'd really appreciate having them as a way to get started.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (JENA-228) Limiting query output centrally
[ https://issues.apache.org/jira/browse/JENA-228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Gollakota updated JENA-228:
------------------------------------
Attachment: JENA-228-1.patch

Submitting an initial patch. I chose to intercept the query in QueryEngineBase after it has been compiled to the algebra but before optimization. Added a few unit test cases to validate that the outermost Op is always an OpSlice (not sure of the performance implications).

Limiting query output centrally
-------------------------------
Key: JENA-228
URL: https://issues.apache.org/jira/browse/JENA-228
Project: Apache Jena
Issue Type: New Feature
Components: ARQ, Fuseki
Affects Versions: ARQ 2.9.0, Fuseki 0.2.1
Reporter: Giuseppe Sollazzo
Attachments: JENA-228-1.patch

I was wondering whether there will be some way of limiting output in Fuseki. Basically, I'd like to be able to enforce limits on the number of results returned by the system. As an example, think about a numrows in SQL.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
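The core of the approach, as a standalone sketch (package names from the ARQ of that era; the max-rows value stands in for an assumed server-wide configuration setting):

{code:java}
import com.hp.hpl.jena.query.Query;
import com.hp.hpl.jena.query.QueryFactory;
import com.hp.hpl.jena.sparql.algebra.Algebra;
import com.hp.hpl.jena.sparql.algebra.Op;
import com.hp.hpl.jena.sparql.algebra.op.OpSlice;

public class CentralLimitSketch {
    public static void main(String[] args) {
        long maxRows = 1000L; // assumed configured cap
        Query query = QueryFactory.create("SELECT * WHERE { ?s ?p ?o }");
        Op op = Algebra.compile(query);
        // Wrap the compiled algebra in a slice: no offset, length = the cap.
        // The patch does this in QueryEngineBase, before optimization.
        op = new OpSlice(op, 0L, maxRows);
        System.out.println(op);
    }
}
{code}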
[jira] [Commented] (JENA-228) Limiting query output centrally
[ https://issues.apache.org/jira/browse/JENA-228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13539448#comment-13539448 ]

Pradeep Gollakota commented on JENA-228:
-----------------------------------------

I'd like to start working on this if I may, but I have no idea where to start; any guidance would be appreciated.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (AVRO-575) AvroOutputFormat doesn't work for map-only jobs if only the map output schema has been set
[ https://issues.apache.org/jira/browse/AVRO-575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13539460#comment-13539460 ]

Pradeep Gollakota commented on AVRO-575:
-----------------------------------------

This JIRA seems to be OBE (overtaken by events). The attached patch is no longer applicable, and AvroOutputFormat already handles map-only jobs.

AvroOutputFormat doesn't work for map-only jobs if only the map output schema has been set
------------------------------------------------------------------------------------------
Key: AVRO-575
URL: https://issues.apache.org/jira/browse/AVRO-575
Project: Avro
Issue Type: Bug
Components: java
Reporter: Tom White
Attachments: AVRO-575.patch

AvroOutputFormat should use AvroJob.MAP_OUTPUT_SCHEMA for map-only jobs if AvroJob.OUTPUT_SCHEMA has not been set.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (AVRO-1180) Broken links on Code Review Checklist page on confluence
Pradeep Gollakota created AVRO-1180:
------------------------------------

Summary: Broken links on Code Review Checklist page on confluence
Key: AVRO-1180
URL: https://issues.apache.org/jira/browse/AVRO-1180
Project: Avro
Issue Type: Task
Reporter: Pradeep Gollakota
Priority: Trivial

The [Code Review Checklist|https://cwiki.apache.org/confluence/display/AVRO/Code+Review+Checklist] has two broken links.

The link referencing Sun's code conventions points to http://java.sun.com/docs/codeconv/
This should be updated to (I'm guessing) http://www.oracle.com/technetwork/java/javase/documentation/codeconvtoc-136057.html

The link referencing Log4j's Level class points to http://logging.apache.org/log4j/docs/api/org/apache/log4j/Level.html
This should be updated to https://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (ACCUMULO-736) Add Column Pagination Filter
Pradeep Gollakota created ACCUMULO-736:
---------------------------------------

Summary: Add Column Pagination Filter
Key: ACCUMULO-736
URL: https://issues.apache.org/jira/browse/ACCUMULO-736
Project: Accumulo
Issue Type: Bug
Components: client
Reporter: Pradeep Gollakota
Assignee: Billie Rinaldi

Client applications may need to perform pagination of data depending on the number of columns returned. This would be more efficient if the database itself handled the pagination. Similar to https://issues.apache.org/jira/browse/HBASE-2438

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (ACCUMULO-736) Add Column Pagination Filter
[ https://issues.apache.org/jira/browse/ACCUMULO-736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Gollakota updated ACCUMULO-736:
----------------------------------------
Issue Type: Wish (was: Bug)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ACCUMULO-736) Add Column Pagination Filter
[ https://issues.apache.org/jira/browse/ACCUMULO-736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440033#comment-13440033 ]

Pradeep Gollakota commented on ACCUMULO-736:
---------------------------------------------

I myself have extremely limited knowledge of the HBase API; I provided the link as a way of including relevant discussions. The reason I'm requesting this feature is network optimization. Please correct me if my understanding of the Accumulo API is incorrect: the Scanner returns the data in KV pairs via a Java Iterator, but the data itself is returned from the server to the Scanner in batches (of size 1000 by default). So, if I'm looking for columns (n, n+k) from a row, the only way the client can filter the correct range is by retrieving n+k KV pairs; for large values of n, this can cause a lot of network overhead. If we can page the data server-side and return only the relevant data over the network, it would be far more efficient. My initial attempt at this problem would probably be an Iterator/Filter. However, if this can become part of the Scanner API, it would be more natural to work with.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
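A rough sketch of the Iterator/Filter approach mentioned above (the offset/limit values are hard-coded placeholders; real code would read them from iterator options in init() and implement deepCopy(), both omitted here):

{code:java}
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.iterators.Filter;
import org.apache.hadoop.io.Text;

// Keep only columns [offset, offset + limit) of each row, server-side,
// so the skipped KV pairs never cross the network.
public class ColumnPaginationFilter extends Filter {
    private int offset = 10;  // placeholder values for illustration
    private int limit = 10;
    private Text currentRow = null;
    private int index = 0;

    @Override
    public boolean accept(Key k, Value v) {
        if (currentRow == null || !k.getRow().equals(currentRow)) {
            currentRow = new Text(k.getRow()); // new row: reset the counter
            index = 0;
        }
        int i = index++;
        return i >= offset && i < offset + limit;
    }
}
{code}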