[jira] [Created] (ARROW-17255) Support JSON logical type in Arrow

2022-07-29 Thread Pradeep Gollakota (Jira)
Pradeep Gollakota created ARROW-17255:
-

 Summary: Support JSON logical type in Arrow
 Key: ARROW-17255
 URL: https://issues.apache.org/jira/browse/ARROW-17255
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Archery
Reporter: Pradeep Gollakota


As a BigQuery developer, I would like the Arrow libraries to support the JSON 
logical type. This would enable us to use the JSON type in the Arrow format of 
our ReadAPI, and to use the JSON type when exporting data from BigQuery to 
Parquet.
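
For reference, Parquet already models JSON as a logical type annotation over 
BINARY, which is what a BigQuery-to-Parquet export would target; a minimal 
parquet-mr sketch (assuming parquet-column 1.11+) of such a column:

{noformat}
import org.apache.parquet.schema.LogicalTypeAnnotation;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName;
import org.apache.parquet.schema.Types;

public class JsonColumnExample {
    // A BINARY column annotated as JSON -- the Parquet-side counterpart of the
    // Arrow logical type requested here.
    static MessageType jsonSchema() {
        return Types.buildMessage()
            .required(PrimitiveTypeName.BINARY).as(LogicalTypeAnnotation.jsonType())
            .named("payload")
            .named("record");
    }
}
{noformat}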



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (PARQUET-869) Min/Max record counts for block size checks are not configurable

2017-02-07 Thread Pradeep Gollakota (JIRA)
Pradeep Gollakota created PARQUET-869:
-

 Summary: Min/Max record counts for block size checks are not 
configurable
 Key: PARQUET-869
 URL: https://issues.apache.org/jira/browse/PARQUET-869
 Project: Parquet
  Issue Type: Improvement
Reporter: Pradeep Gollakota


While the min/max record counts for the page size check are configurable via 
the ParquetOutputFormat.MIN_ROW_COUNT_FOR_PAGE_SIZE_CHECK and 
ParquetOutputFormat.MAX_ROW_COUNT_FOR_PAGE_SIZE_CHECK configs (and via 
ParquetProperties directly), the min/max record counts for the block size check 
are hard-coded inside InternalParquetRecordWriter.

These two settings should also be configurable.
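
A sketch of how the new knobs might look, mirroring the existing 
page-size-check properties (the block-size key names below are hypothetical 
until a patch lands):

{noformat}
import org.apache.hadoop.conf.Configuration;

public class BlockSizeCheckConfig {
    static Configuration withBlockSizeCheckBounds() {
        Configuration conf = new Configuration();
        // Hypothetical keys, by analogy with parquet.page.size.row.check.{min,max}:
        conf.setInt("parquet.block.size.row.check.min", 50);   // check at least every 50 records
        conf.setInt("parquet.block.size.row.check.max", 5000); // ... at most every 5000 records
        return conf;
    }
}
{noformat}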



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-1884) New Producer blocks forever for Invalid topic names

2015-02-06 Thread Pradeep Gollakota (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310241#comment-14310241
 ] 

Pradeep Gollakota commented on KAFKA-1884:
--

[~guozhang] That's what I figured at first. But the odd behavior is that the 
exception storm is happening on the server even after the producer has been 
shut down (and the broker restarted). Not sure why that would be the case.
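
For anyone else chasing this, a minimal sketch of the client-side hang against 
the 0.8.2 new producer (broker address assumed):

{noformat}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class InvalidTopicRepro {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

        Producer<byte[], byte[]> producer = new KafkaProducer<>(props);
        // Metadata for an illegal topic name is never obtained, so this send
        // blocks forever instead of failing fast.
        producer.send(new ProducerRecord<>("TOPIC=", "test".getBytes()));
    }
}
{noformat}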

 New Producer blocks forever for Invalid topic names
 ---

 Key: KAFKA-1884
 URL: https://issues.apache.org/jira/browse/KAFKA-1884
 Project: Kafka
  Issue Type: Bug
  Components: producer 
Affects Versions: 0.8.2
Reporter: Manikumar Reddy
 Fix For: 0.8.3


 New producer blocks forever for invalid topic names
 Producer logs:
 DEBUG [2015-01-20 12:46:13,406] NetworkClient: maybeUpdateMetadata(): Trying 
 to send metadata request to node -1
 DEBUG [2015-01-20 12:46:13,406] NetworkClient: maybeUpdateMetadata(): Sending 
 metadata request ClientRequest(expectResponse=true, payload=null, 
 request=RequestSend(header={api_key=3,api_version=0,correlation_id=50845,client_id=my-producer},
  body={topics=[TOPIC=]})) to node -1
 TRACE [2015-01-20 12:46:13,416] NetworkClient: handleMetadataResponse(): 
 Ignoring empty metadata response with correlation id 50845.
 DEBUG [2015-01-20 12:46:13,417] NetworkClient: maybeUpdateMetadata(): Trying 
 to send metadata request to node -1
 DEBUG [2015-01-20 12:46:13,417] NetworkClient: maybeUpdateMetadata(): Sending 
 metadata request ClientRequest(expectResponse=true, payload=null, 
 request=RequestSend(header={api_key=3,api_version=0,correlation_id=50846,client_id=my-producer},
  body={topics=[TOPIC=]})) to node -1
 TRACE [2015-01-20 12:46:13,417] NetworkClient: handleMetadataResponse(): 
 Ignoring empty metadata response with correlation id 50846.
 DEBUG [2015-01-20 12:46:13,417] NetworkClient: maybeUpdateMetadata(): Trying 
 to send metadata request to node -1
 DEBUG [2015-01-20 12:46:13,418] NetworkClient: maybeUpdateMetadata(): Sending 
 metadata request ClientRequest(expectResponse=true, payload=null, 
 request=RequestSend(header={api_key=3,api_version=0,correlation_id=50847,client_id=my-producer},
  body={topics=[TOPIC=]})) to node -1
 TRACE [2015-01-20 12:46:13,418] NetworkClient: handleMetadataResponse(): 
 Ignoring empty metadata response with correlation id 50847.
 Broker logs:
 [2015-01-20 12:46:14,074] ERROR [KafkaApi-0] error when handling request 
 Name: TopicMetadataRequest; Version: 0; CorrelationId: 51020; ClientId: 
 my-producer; Topics: TOPIC= (kafka.server.KafkaApis)
 kafka.common.InvalidTopicException: topic name TOPIC= is illegal, contains a 
 character other than ASCII alphanumerics, '.', '_' and '-'
   at kafka.common.Topic$.validate(Topic.scala:42)
   at 
 kafka.admin.AdminUtils$.createOrUpdateTopicPartitionAssignmentPathInZK(AdminUtils.scala:186)
   at kafka.admin.AdminUtils$.createTopic(AdminUtils.scala:177)
   at kafka.server.KafkaApis$$anonfun$5.apply(KafkaApis.scala:367)
   at kafka.server.KafkaApis$$anonfun$5.apply(KafkaApis.scala:350)
   at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at scala.collection.immutable.Set$Set1.foreach(Set.scala:74)
   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
   at 
 scala.collection.AbstractSet.scala$collection$SetLike$$super$map(Set.scala:47)
   at scala.collection.SetLike$class.map(SetLike.scala:93)
   at scala.collection.AbstractSet.map(Set.scala:47)
   at kafka.server.KafkaApis.getTopicMetadata(KafkaApis.scala:350)
   at 
 kafka.server.KafkaApis.handleTopicMetadataRequest(KafkaApis.scala:389)
   at kafka.server.KafkaApis.handle(KafkaApis.scala:57)
   at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:59)
   at java.lang.Thread.run(Thread.java:722)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1884) New Producer blocks forever for Invalid topic names

2015-02-06 Thread Pradeep Gollakota (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310548#comment-14310548
 ] 

Pradeep Gollakota commented on KAFKA-1884:
--

I guess that makes sense... I'll confirm.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1884) New Producer blocks forever for Invalid topic names

2015-02-05 Thread Pradeep Gollakota (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308670#comment-14308670
 ] 

Pradeep Gollakota commented on KAFKA-1884:
--

What makes the behavior in #2 above even more odd: I stopped the server, 
deleted the znodes, deleted the Kafka log dir, and restarted the server, and 
the same behavior is seen.

O.o




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1884) New Producer blocks forever for Invalid topic names

2015-02-05 Thread Pradeep Gollakota (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308539#comment-14308539
 ] 

Pradeep Gollakota commented on KAFKA-1884:
--

I'd like to work on this. Please assign it to me.

I've been able to reproduce the issue. I also noticed another oddity about 
this, though.

1. The server-side error above is repeated hundreds of times a second (each 
repeat increments the CorrelationId). This seems to indicate some type of retry 
logic.
2. If I kill the server, kill the client, and start the server, the error 
continues to repeat. This seems to indicate that this request may be persisted 
somewhere.

I have a good grasp of where to start looking for the problem, though I have no 
idea why the above two behaviors are occurring.
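
In the meantime, a client-side guard that mirrors the broker's check would at 
least fail fast (character set taken from the stack trace quoted earlier in 
the thread; sketch only, not the actual fix):

{noformat}
import java.util.regex.Pattern;

public final class TopicNames {
    // Legal characters per kafka.common.Topic.validate: ASCII alphanumerics,
    // '.', '_' and '-'. (Length and "." / ".." rules omitted in this sketch.)
    private static final Pattern LEGAL = Pattern.compile("[a-zA-Z0-9._-]+");

    public static void validate(String topic) {
        if (!LEGAL.matcher(topic).matches()) {
            throw new IllegalArgumentException("Illegal topic name: " + topic);
        }
    }
}
{noformat}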




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AMBARI-5707) Replace Ganglia with high performant and pluggable Metrics System

2014-06-04 Thread Pradeep Gollakota (JIRA)

[ 
https://issues.apache.org/jira/browse/AMBARI-5707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017422#comment-14017422
 ] 

Pradeep Gollakota commented on AMBARI-5707:
---

I too agree that it may not be the best idea for Ambari to rebuild components 
of the stack. However, I would like to see a pluggable architecture for 
Metrics. For example, we use Datadog heavily, so it would be great if Ambari 
could plug into our existing metrics infrastructure and pull graphs directly 
from Datadog.
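
To make the ask concrete, something like the following provider interface 
(entirely hypothetical, not an existing Ambari API) would let Ambari pull 
graphs from Ganglia, Datadog, or anything else:

{noformat}
import java.util.Map;

// Hypothetical plug-in point: one implementation per metrics backend
// (ganglia, datadog, opentsdb, ...), selected via configuration.
public interface MetricsProvider {
    String name();

    // Timestamp (ms) -> value for one host/metric over a time window,
    // backing both point-in-time and temporal queries.
    Map<Long, Double> fetch(String host, String metric, long startMs, long endMs);
}
{noformat}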

 Replace Ganglia with high performant and pluggable Metrics System
 -

 Key: AMBARI-5707
 URL: https://issues.apache.org/jira/browse/AMBARI-5707
 Project: Ambari
  Issue Type: New Feature
  Components: agent, controller
Affects Versions: 1.6.0
Reporter: Siddharth Wagle
Assignee: Siddharth Wagle
Priority: Critical
 Attachments: MetricsSystemArch.png


 Ambari Metrics System
 - Ability to collect metrics from Hadoop and other Stack services
 - Ability to retain metrics at a high precision for a configurable time 
 period (say 5 days)
 - Ability to automatically purge metrics after retention period
 - At collection time, provide clear integration point for external system 
 (such as TSDB)
 - At purge time, provide clear integration point for metrics retention by 
 external system
 - Should provide default options for external metrics retention (say “HDFS”)
 - Provide tools / utilities for analyzing metrics in retention system (say 
 “Hive schema, Pig scripts, etc” that can be used with the default retention 
 store “HDFS”)
 System Requirements
 - Must be portable and platform independent
 - Must not conflict with any existing metrics system (such as Ganglia)
 - Must not conflict with existing SNMP infra
 - Must not run as root
 - Must have HA story (no SPOF)
 Usage
 - Ability to obtain metrics from Ambari REST API (point in time and temporal)
 - Ability to view metric graphs in Ambari Web (currently, fixed)
 - Ability to configure custom metric graphs in Ambari Web (currently, we have 
 metric graphs “fixed” into the UI)
 - Need to improve metric graph “navigation” in Ambari Web (currently, metric 
 graphs do not allow navigation at arbitrary timeframes, but only at ganglia 
 aggregation intervals) 
 - Ability to “view cluster” at point in time (i.e. see all metrics at that 
 point)
 - Ability to define metrics (and how + where to obtain) in Stack Definitions



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1226) Rack-Aware replica assignment option

2014-02-07 Thread Pradeep Gollakota (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895123#comment-13895123
 ] 

Pradeep Gollakota commented on KAFKA-1226:
--

[~jvanremoortere] Can you either add the patch as an attachment or send it to 
me privately? I'm also very interested in this feature.

 Rack-Aware replica assignment option
 

 Key: KAFKA-1226
 URL: https://issues.apache.org/jira/browse/KAFKA-1226
 Project: Kafka
  Issue Type: Improvement
  Components: replication
Affects Versions: 0.8.0, 0.8.1
Reporter: Joris Van Remoortere
Assignee: Neha Narkhede
 Fix For: 0.8.0


 Adding a rack-id to the Kafka config. This rack-id can be used during replica 
 assignment by using the max-rack-replication argument in the admin scripts 
 (create topic, etc.). By default the original replication assignment 
 algorithm is used because max-rack-replication defaults to -1. 
 max-rack-replication > -1 is not honored if you are doing manual replica 
 assignment (preferred). 
 If this looks good I can add some test cases specific to the rack-aware 
 assignment. 
 I can also port this to trunk. We are currently running 0.8.0 in production 
 and need this, so I wrote the patch against that.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (KAFKA-1226) Rack-Aware replica assignment option

2014-02-07 Thread Pradeep Gollakota (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895138#comment-13895138
 ] 

Pradeep Gollakota commented on KAFKA-1226:
--

[~jvanremoortere] Sweet! Thanks.

Can we mark this closed as a duplicate please?




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (KAFKA-1175) Hierarchical Topics

2013-12-09 Thread Pradeep Gollakota (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Gollakota updated KAFKA-1175:
-

Issue Type: New Feature  (was: Bug)

 Hierarchical Topics
 ---

 Key: KAFKA-1175
 URL: https://issues.apache.org/jira/browse/KAFKA-1175
 Project: Kafka
  Issue Type: New Feature
Reporter: Pradeep Gollakota

 Allow for creation of hierarchical topics so that related topics can be 
 grouped together.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (KAFKA-1175) Hierarchical Topics

2013-12-09 Thread Pradeep Gollakota (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843861#comment-13843861
 ] 

Pradeep Gollakota commented on KAFKA-1175:
--

I'm very interested in this feature. I created this JIRA to track progress and 
to start a dialog so we can discuss this further. I would love to work on this 
if no one else is working on it. I would also love to discuss how this would 
relate to / affect securing Kafka [KAFKA-1176].

 Hierarchical Topics
 ---

 Key: KAFKA-1175
 URL: https://issues.apache.org/jira/browse/KAFKA-1175
 Project: Kafka
  Issue Type: New Feature
Reporter: Pradeep Gollakota

 Allow for creation of hierarchical topics so that related topics can be 
 grouped together.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (KAFKA-1175) Hierarchical Topics

2013-12-09 Thread Pradeep Gollakota (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844002#comment-13844002
 ] 

Pradeep Gollakota commented on KAFKA-1175:
--

In the proposal, [~jkreps] talks about a couple of use cases at LinkedIn:

# Group a set of topics that are related by application area (ads, search, 
etc.)
# Group a set of topics based on usage paradigm (tracking, metrics, etc.)

At Lithium, we have an extension of those use cases. We have multiple products 
that are deployed for different customers, so we need to partition by 
customer_id and product_id. We may end up with hierarchical topics of the 
following nature (a consumer-side sketch follows the list):
# /p1/google/tracking/search/click_events
# /p1/google/tracking/community/message_create_events
# /p1/linkedin/tracking/search/click_events
# /p1/linkedin/tracking/community/message_create_events
# /p2/apache/tracking/core/user_ident_event
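
Until real hierarchy exists, we approximate the grouping with a naming 
convention plus wildcard consumption; a sketch against the 0.8 high-level 
consumer (dots in place of '/', which is illegal in topic names):

{noformat}
import java.util.List;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.KafkaStream;
import kafka.consumer.Whitelist;
import kafka.javaapi.consumer.ConsumerConnector;

public class SubtreeConsumer {
    static void consumeSubtree(ConsumerConfig config) {
        ConsumerConnector consumer = Consumer.createJavaConsumerConnector(config);
        // One stream over every topic under the "p1.google.tracking" subtree.
        List<KafkaStream<byte[], byte[]>> streams =
            consumer.createMessageStreamsByFilter(new Whitelist("p1\\.google\\.tracking\\..*"), 1);
    }
}
{noformat}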

 Hierarchical Topics
 ---

 Key: KAFKA-1175
 URL: https://issues.apache.org/jira/browse/KAFKA-1175
 Project: Kafka
  Issue Type: New Feature
Reporter: Pradeep Gollakota

 Allow for creation of hierarchical topics so that related topics can be 
 grouped together.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (PIG-3453) Implement a Storm backend to Pig

2013-11-04 Thread Pradeep Gollakota (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813136#comment-13813136
 ] 

Pradeep Gollakota commented on PIG-3453:


[~thedatachef] Wow... this is a great start! Thanks so much for hacking this 
out.

[~cheolsoo] I agree that this work should be committed back to a branch at 
Apache. I foresee a lot more contributions and collaboration on this, so it 
would be easier to coordinate via Apache as opposed to the git mirror.

[~dvryaboy] I have been strongly considering writing this DSL to Summingbird 
instead of Trident/Storm, though I am still weighing the implications of doing 
so. By writing to Summingbird we would get both a real-time mode and a hybrid 
mode of execution, which in my mind is a huge win. At Lithium Technologies, we 
have been considering using Summingbird for hybrid-mode execution. The question 
in my mind is: do we want to use Summingbird if all we want is a real-time 
engine (i.e., Storm)? We can change the scope of this JIRA to write a 
Summingbird backend, or we can open another JIRA to implement a Summingbird 
POC and then see where that gets us.

 Implement a Storm backend to Pig
 

 Key: PIG-3453
 URL: https://issues.apache.org/jira/browse/PIG-3453
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.13.0
Reporter: Pradeep Gollakota
Assignee: Jacob Perkins
  Labels: storm
 Fix For: 0.13.0

 Attachments: storm-integration.patch


 There is a lot of interest around implementing a Storm backend to Pig for 
 streaming processing. The proposal and initial discussions can be found at 
 https://cwiki.apache.org/confluence/display/PIG/Pig+on+Storm+Proposal



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (PIG-3453) Implement a Storm backend to Pig

2013-09-11 Thread Pradeep Gollakota (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13764812#comment-13764812
 ] 

Pradeep Gollakota commented on PIG-3453:


I personally don't have a concrete use case for this yet. In terms of using a 
system that can work both in warehousing and in real-time, I have been looking 
at Summingbird (recently open-sourced). I think the word count example is a 
good place to start, as it's the canonical example. However, I'd like to have a 
more complicated example as well, so I'm writing a TF-IDF implementation in Pig 
and in Trident; perhaps that can be the step-2 PoC after word count. I'd also 
like to cut some of the more complex operations, like nested foreach 
statements, from the initial PoC, as I'm not sure yet how we'd solve them.

[~thedatachef] I started a new job last week and I'm not sure how this task 
would fit into the roadmap of my new company yet. I'd love to work on this if 
I have time, and you're more than welcome to work on it as well. Thanks for 
all your great comments, input, and enthusiasm.

 Implement a Storm backend to Pig
 

 Key: PIG-3453
 URL: https://issues.apache.org/jira/browse/PIG-3453
 Project: Pig
  Issue Type: New Feature
Reporter: Pradeep Gollakota
  Labels: storm

 There is a lot of interest around implementing a Storm backend to Pig for 
 streaming processing. The proposal and initial discussions can be found at 
 https://cwiki.apache.org/confluence/display/PIG/Pig+on+Storm+Proposal

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (PIG-3453) Implement a Storm backend to Pig

2013-09-06 Thread Pradeep Gollakota (JIRA)
Pradeep Gollakota created PIG-3453:
--

 Summary: Implement a Storm backend to Pig
 Key: PIG-3453
 URL: https://issues.apache.org/jira/browse/PIG-3453
 Project: Pig
  Issue Type: New Feature
Reporter: Pradeep Gollakota


There is a lot of interest around implementing a Storm backend to Pig for 
streaming processing. The proposal and initial discussions can be found at 
https://cwiki.apache.org/confluence/display/PIG/Pig+on+Storm+Proposal

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3391) Issue with DataType -> Long conversion in New AvroStorage()

2013-07-23 Thread Pradeep Gollakota (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717697#comment-13717697
 ] 

Pradeep Gollakota commented on PIG-3391:


I have a couple of quick questions:

1. Should the summary read DateTime -> Long instead of DataType -> Long?
2. What do you mean by datetime -> long in the description?
3. Could you provide a link to the JIRA that you're talking about?

If the answer to #2 is epoch time, then I believe you are correct that timezone 
information will be lost; epoch time is always UTC.
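
A quick Joda-Time illustration of the loss (Pig's datetime type is 
Joda-backed):

{noformat}
import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;

public class ZoneLossExample {
    public static void main(String[] args) {
        DateTime paris = new DateTime(2013, 7, 23, 12, 0, DateTimeZone.forID("Europe/Paris"));
        long millis = paris.getMillis();            // epoch millis: the instant survives, the zone does not
        DateTime roundTrip = new DateTime(millis);  // comes back in the JVM default zone,
                                                    // not necessarily Europe/Paris
        System.out.println(paris + " -> " + millis + " -> " + roundTrip);
    }
}
{noformat}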

 Issue with DataType -> Long conversion in New AvroStorage()
 --

 Key: PIG-3391
 URL: https://issues.apache.org/jira/browse/PIG-3391
 Project: Pig
  Issue Type: Improvement
Reporter: Anup Ahire

 Shouldn't we lose the timezone information if we convert datetime to long?
 After going through the JIRA for datetime, it appears that datetime -> long 
 wasn't supported, to avoid timezone information loss.
 Thanks !!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-2495) Using merge JOIN from a HBaseStorage produces an error

2013-07-23 Thread Pradeep Gollakota (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717915#comment-13717915
 ] 

Pradeep Gollakota commented on PIG-2495:


Hi Kevin,

I have a very minor request for your patch: when throwing the 
RuntimeException, could you also include the class information for the given 
type? This could be useful for debugging.
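
Something along these lines is all I mean (message text illustrative):

{noformat}
// Instead of:  throw new RuntimeException("Unexpected data type " + type);
throw new RuntimeException("Unexpected data type " + type + " of class "
    + value.getClass().getName() + " found in merge join");
{noformat}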

 Using merge JOIN from a HBaseStorage produces an error
 --

 Key: PIG-2495
 URL: https://issues.apache.org/jira/browse/PIG-2495
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.9.1, 0.9.2
 Environment: HBase 0.90.3, Hadoop 0.20-append
Reporter: Kevin Lion
Assignee: Kevin Lion
 Fix For: 0.12

 Attachments: PIG-2495.patch


 To increase the performance of my computation, I would like to use a merge 
 join between two tables, but it produces an error.
 Here is the script:
 {noformat}
 start_sessions = LOAD 'hbase://startSession.bea00.dev.ubithere.com' USING 
 org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:infoid meta:imei 
 meta:timestamp', '-loadKey') AS (sid:chararray, infoid:chararray, 
 imei:chararray, start:long);
 end_sessions = LOAD 'hbase://endSession.bea00.dev.ubithere.com' USING 
 org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:timestamp meta:locid', 
 '-loadKey') AS (sid:chararray, end:long, locid:chararray);
 sessions = JOIN start_sessions BY sid, end_sessions BY sid USING 'merge';
 STORE sessions INTO 'sessionsTest' USING PigStorage ('*');
 {noformat} 
 Here is the result of this script:
 {noformat}
 2012-01-30 16:12:43,920 [main] INFO  org.apache.pig.Main - Logging error 
 messages to: /root/pig_1327939963919.log
 2012-01-30 16:12:44,025 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
 to hadoop file system at: hdfs://lxc233:9000
 2012-01-30 16:12:44,102 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
 to map-reduce job tracker at: lxc233:9001
 2012-01-30 16:12:44,760 [main] INFO  
 org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: 
 MERGE_JION
 2012-01-30 16:12:44,923 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - 
 File concatenation threshold: 100 optimistic? false
 2012-01-30 16:12:44,982 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
  - MR plan size before optimization: 2
 2012-01-30 16:12:44,982 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
  - MR plan size after optimization: 2
 2012-01-30 16:12:45,001 [main] INFO  
 org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to 
 the job
 2012-01-30 16:12:45,006 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
  - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
 2012-01-30 16:12:45,039 [main] INFO  org.apache.zookeeper.ZooKeeper - Client 
 environment:zookeeper.version=3.3.2-1031432, built on 11/05/2010 05:32 GMT
 2012-01-30 16:12:45,039 [main] INFO  org.apache.zookeeper.ZooKeeper - Client 
 environment:host.name=lxc233.machine.com
 2012-01-30 16:12:45,039 [main] INFO  org.apache.zookeeper.ZooKeeper - Client 
 environment:java.version=1.6.0_22
 2012-01-30 16:12:45,039 [main] INFO  org.apache.zookeeper.ZooKeeper - Client 
 environment:java.vendor=Sun Microsystems Inc.
 2012-01-30 16:12:45,039 [main] INFO  org.apache.zookeeper.ZooKeeper - Client 
 environment:java.home=/usr/lib/jvm/java-6-sun-1.6.0.22/jre
 2012-01-30 16:12:45,039 [main] INFO  org.apache.zookeeper.ZooKeeper - Client 
 

[jira] [Commented] (HBASE-3732) New configuration option for client-side compression

2013-07-17 Thread Pradeep Gollakota (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13711038#comment-13711038
 ] 

Pradeep Gollakota commented on HBASE-3732:
--

Yes, it does. I misread his comment the first time. I also found HBASE-5355, 
which is exactly the thing that addresses my use case.

Thanks.

 New configuration option for client-side compression
 

 Key: HBASE-3732
 URL: https://issues.apache.org/jira/browse/HBASE-3732
 Project: HBase
  Issue Type: New Feature
Reporter: Jean-Daniel Cryans
 Attachments: compressed_streams.jar


 We have a case here where we have to store very fat cells (arrays of 
 integers) which can amount into the hundreds of KBs that we need to read 
 often, concurrently, and possibly keep in cache. Compressing the values on 
 the client using java.util.zip's Deflater before sending them to HBase proved 
 to be in our case almost an order of magnitude faster.
 The reasons are evident: less data sent to HBase, the memstore contains 
 compressed data, the block cache contains compressed data too, etc.
 I was thinking that it might be something useful to add to a family schema, 
 so that Put/Result do the conversion for you. The actual compression algo 
 should also be configurable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3732) New configuration option for client-side compression

2013-07-16 Thread Pradeep Gollakota (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13710623#comment-13710623
 ] 

Pradeep Gollakota commented on HBASE-3732:
--

I'd like to reopen discussion on this ticket. I have a slightly different use 
case in mind for client-side compression (sorry if this isn't the right forum 
for this question).

I have a scenario where the clients are in a different network topology than 
the HBase cluster, and the bandwidth between the clients and the cluster is 
limited. Since the client buffers writes, is there any mechanism in place for 
compressing the over-the-wire transfers?
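
For what it's worth, the kind of client-side deflate described above, run over 
each value before the Put (plain java.util.zip; sketch):

{noformat}
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

public class ValueCompressor {
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream(raw.length);
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }
}
{noformat}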


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ACCUMULO-391) Multi-table Accumulo input format

2013-06-05 Thread Pradeep Gollakota (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676043#comment-13676043
 ] 

Pradeep Gollakota commented on ACCUMULO-391:


This would be a great addition.

We have just started working with Pig (with Accumulo) at my company. The first 
thing we noticed is that in a lot of situations, where we are joining data 
from one Accumulo table to data from another, we have to first dump the data 
from both tables to HDFS (perhaps using PigStorage), load it back, and then 
join it. This is because the scan information is encoded in the job 
configuration, so when Pig uses the MultiInputFormat to scan both tables in 
the same job, only one table ends up getting exported from Accumulo.

If this is completed, we could use the MultiTableInputFormat instead of 
Accumulo(Row)InputFormat to optimize our Pig scripts.

Any thoughts on when this would be included?
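
For illustration, the mapper shape proposed in the issue might look like this 
(hypothetical, since no such input format exists yet):

{noformat}
import java.io.IOException;
import java.util.Map;

import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical: the table name is the mapper key and the Key/Value pair is
// the value, so a single job can branch per source table.
public class TwoTableJoinMapper extends Mapper<Text, Map.Entry<Key, Value>, Text, Text> {
    @Override
    protected void map(Text table, Map.Entry<Key, Value> kv, Context ctx)
            throws IOException, InterruptedException {
        // Tag each record with its source table so a reducer can join them.
        ctx.write(new Text(kv.getKey().getRow()), new Text(table + ":" + kv.getValue()));
    }
}
{noformat}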

 Multi-table Accumulo input format
 -

 Key: ACCUMULO-391
 URL: https://issues.apache.org/jira/browse/ACCUMULO-391
 Project: Accumulo
  Issue Type: New Feature
Reporter: John Vines
Assignee: William Slacum
Priority: Minor
  Labels: mapreduce,
 Attachments: multi-table-if.patch, new-multitable-if.patch


 Just realized we had no MR input method which supports multiple Tables for an 
 input format. I would see it making the table the mapper's key and making the 
 Key/Value a tuple, or alternatively have the Table/Key be the key tuple and 
 stick with Values being the value.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ACCUMULO-391) Multi-table Accumulo input format

2013-06-05 Thread Pradeep Gollakota (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676290#comment-13676290
 ] 

Pradeep Gollakota commented on ACCUMULO-391:


I'm also available to help with this task if needed.

 Multi-table Accumulo input format
 -

 Key: ACCUMULO-391
 URL: https://issues.apache.org/jira/browse/ACCUMULO-391
 Project: Accumulo
  Issue Type: New Feature
Reporter: John Vines
Assignee: William Slacum
Priority: Minor
  Labels: mapreduce,
 Fix For: 1.6.0

 Attachments: multi-table-if.patch, new-multitable-if.patch


 Just realized we had no MR input method which supports multiple Tables for an 
 input format. I would see it making the table the mapper's key and making the 
 Key/Value a tuple, or alternatively have the Table/Key be the key tuple and 
 stick with Values being the value.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (JENA-402) Move etc/*.rules to src/main/resources/etc/*.rules

2013-02-24 Thread Pradeep Gollakota (JIRA)

 [ 
https://issues.apache.org/jira/browse/JENA-402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Gollakota updated JENA-402:
---

Attachment: JENA-402-1.patch

Moved *.rules from jena-core/etc/ to jena-core/src/main/resources/etc and 
removed the reference to etc from the POM.

Please review.
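
With the files on the classpath, the rules resolve like this instead of via a 
relative etc/ path (class and file names illustrative):

{noformat}
import java.io.InputStream;

// Resolves from jena-core's jar/classes, independent of the working directory.
InputStream rules = SomeReasonerClass.class.getResourceAsStream("/etc/owl-fb.rules");
{noformat}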

 Move etc/*.rules to src/main/resources/etc/*.rules
 --

 Key: JENA-402
 URL: https://issues.apache.org/jira/browse/JENA-402
 Project: Apache Jena
  Issue Type: Improvement
Reporter: Andy Seaborne
Priority: Minor
 Attachments: JENA-402-1.patch


 If we move the rules files to src/main/resources, we can drop the specific 
 mention of etc in POM.
 This will also make use of Eclipse-linked projects work because at the 
 moment, there are kludges to make the tests work.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (JENA-402) Move etc/*.rules to src/main/resources/etc/*.rules

2013-02-23 Thread Pradeep Gollakota (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13585287#comment-13585287
 ] 

Pradeep Gollakota commented on JENA-402:


This appears to be complete. Should this be closed?

 Move etc/*.rules to src/main/resources/etc/*.rules
 --

 Key: JENA-402
 URL: https://issues.apache.org/jira/browse/JENA-402
 Project: Apache Jena
  Issue Type: Improvement
Reporter: Andy Seaborne
Priority: Minor

 If we move the rules files to src/main/resources, we can drop the specific 
 mention of etc in POM.
 This will also make use of Eclipse-linked projects work because at the 
 moment, there are kludges to make the tests work.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (GIRAPH-285) Release Giraph-0.2

2013-02-20 Thread Pradeep Gollakota (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582405#comment-13582405
 ] 

Pradeep Gollakota commented on GIRAPH-285:
--

Any progress on this, guys? The 0.2 version of Giraph is MUCH better to work 
with, and it would be great to have a stable released version soon (especially 
considering 0.1 was released last February).

 Release Giraph-0.2
 --

 Key: GIRAPH-285
 URL: https://issues.apache.org/jira/browse/GIRAPH-285
 Project: Giraph
  Issue Type: Task
Reporter: Avery Ching
Assignee: Avery Ching

 I think it's time to do this.  Trunk is moving fast and we need to provide 
 something for users that is fixed.  Giraph has already progressed a lot from 
 0.1.  Jakob, can you please share your notes on releasing 0.1?  I'd really 
 appreciate having them as a way to get started.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (JENA-228) Limiting query output centrally

2013-01-27 Thread Pradeep Gollakota (JIRA)

 [ 
https://issues.apache.org/jira/browse/JENA-228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Gollakota updated JENA-228:
---

Attachment: JENA-228-1.patch

Submitting an initial patch. I chose to intercept the query in QueryEngineBase 
after it has been compiled to the algebra but before optimization. Added a few 
unit test cases to validate that the outermost Op is always an OpSlice (not 
sure of the performance implications).
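
Concretely, the interception amounts to the following (sketch against the ARQ 
algebra API; serverLimit stands for the configured cap):

{noformat}
import com.hp.hpl.jena.query.Query;
import com.hp.hpl.jena.sparql.algebra.Algebra;
import com.hp.hpl.jena.sparql.algebra.Op;
import com.hp.hpl.jena.sparql.algebra.op.OpSlice;

public class SliceWrapper {
    static Op wrapWithLimit(Query query, long serverLimit) {
        Op op = Algebra.compile(query);          // query -> algebra
        return new OpSlice(op, 0L, serverLimit); // outermost op is now a slice
    }
}
{noformat}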

 Limiting query output centrally
 ---

 Key: JENA-228
 URL: https://issues.apache.org/jira/browse/JENA-228
 Project: Apache Jena
  Issue Type: New Feature
  Components: ARQ, Fuseki
Affects Versions: ARQ 2.9.0, Fuseki 0.2.1
Reporter: Giuseppe Sollazzo
 Attachments: JENA-228-1.patch


 I was wondering whether there will be some way of limiting output in fuseki. 
 Basically, I'd like to be able to enforce limits on the number of results 
 returned by the system.
 As an example, think about a numrows in sql.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (JENA-228) Limiting query output centrally

2012-12-25 Thread Pradeep Gollakota (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13539448#comment-13539448
 ] 

Pradeep Gollakota commented on JENA-228:


I'd like to start working on this if I may, but I have no idea where to start; 
any guidance would be appreciated.

 Limiting query output centrally
 ---

 Key: JENA-228
 URL: https://issues.apache.org/jira/browse/JENA-228
 Project: Apache Jena
  Issue Type: New Feature
  Components: ARQ, Fuseki
Affects Versions: ARQ 2.9.0, Fuseki 0.2.1
Reporter: Giuseppe Sollazzo

 I was wondering whether there will be some way of limiting output in fuseki. 
 Basically, I'd like to be able to enforce limits on the number of results 
 returned by the system.
 As an example, think about a numrows in sql.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (AVRO-575) AvroOutputFormat doesn't work for map-only jobs if only the map output schema has been set

2012-12-25 Thread Pradeep Gollakota (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13539460#comment-13539460
 ] 

Pradeep Gollakota commented on AVRO-575:


This JIRA seems to be OBE (overtaken by events). The attached patch no longer 
applies, and AvroOutputFormat already handles map-only jobs.

 AvroOutputFormat doesn't work for map-only jobs if only the map output schema 
 has been set
 --

 Key: AVRO-575
 URL: https://issues.apache.org/jira/browse/AVRO-575
 Project: Avro
  Issue Type: Bug
  Components: java
Reporter: Tom White
 Attachments: AVRO-575.patch


 AvroOutputFormat should use AvroJob.MAP_OUTPUT_SCHEMA for map-only jobs if 
 AvroJob.OUTPUT_SCHEMA has not been set.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (AVRO-1180) Broken links on Code Review Checklist page on confluence

2012-10-18 Thread Pradeep Gollakota (JIRA)
Pradeep Gollakota created AVRO-1180:
---

 Summary: Broken links on Code Review Checklist page on confluence
 Key: AVRO-1180
 URL: https://issues.apache.org/jira/browse/AVRO-1180
 Project: Avro
  Issue Type: Task
Reporter: Pradeep Gollakota
Priority: Trivial


The [Code Review 
Checklist|https://cwiki.apache.org/confluence/display/AVRO/Code+Review+Checklist]
 has two broken links.

The link referencing Sun's code conventions points to 
http://java.sun.com/docs/codeconv/
This link should be updated to (I'm guessing) 
http://www.oracle.com/technetwork/java/javase/documentation/codeconvtoc-136057.html

The link referencing Log4j Levels points to 
http://logging.apache.org/log4j/docs/api/org/apache/log4j/Level.html
This should be updated to 
https://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (ACCUMULO-736) Add Column Pagination Filter

2012-08-22 Thread Pradeep Gollakota (JIRA)
Pradeep Gollakota created ACCUMULO-736:
--

 Summary: Add Column Pagination Filter
 Key: ACCUMULO-736
 URL: https://issues.apache.org/jira/browse/ACCUMULO-736
 Project: Accumulo
  Issue Type: Bug
  Components: client
Reporter: Pradeep Gollakota
Assignee: Billie Rinaldi


Client applications may need to paginate data depending on the number of 
columns returned. This would be more efficient if the database itself handled 
the pagination.

Similar to https://issues.apache.org/jira/browse/HBASE-2438

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (ACCUMULO-736) Add Column Pagination Filter

2012-08-22 Thread Pradeep Gollakota (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Gollakota updated ACCUMULO-736:
---

Issue Type: Wish  (was: Bug)

 Add Column Pagination Filter
 

 Key: ACCUMULO-736
 URL: https://issues.apache.org/jira/browse/ACCUMULO-736
 Project: Accumulo
  Issue Type: Wish
  Components: client
Reporter: Pradeep Gollakota
Assignee: Billie Rinaldi

 Client applications may need to paginate data depending on the number of 
 columns returned. This would be more efficient if the database itself 
 handled the pagination.
 Similar to https://issues.apache.org/jira/browse/HBASE-2438

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ACCUMULO-736) Add Column Pagination Filter

2012-08-22 Thread Pradeep Gollakota (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440033#comment-13440033
 ] 

Pradeep Gollakota commented on ACCUMULO-736:


I myself have extremely limited knowledge of the HBase API; I provided the 
link as a way of including relevant discussions.

The reason I'm requesting this feature is network optimization. Please correct 
me if my understanding of the Accumulo API is not correct: the Scanner returns 
data in KV pairs via a Java Iterator, but the data itself is returned from the 
server to the Scanner in batches (of size 1000 by default). So, if I'm looking 
for columns (n, n+k) from a row, the only way the client can filter the 
correct range is by retrieving n+k KV pairs. For large values of n, this can 
cause a lot of network overhead. If we can page the data server-side and 
return only the relevant data over the network, it would be much more 
efficient.

My initial attempt at this problem would probably be an Iterator/Filter, 
though if this became part of the Scanner API it would be more natural to work 
with; a rough sketch follows.
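
A first cut at the Filter route (option parsing and seek/re-seek handling 
omitted; sketch only):

{noformat}
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.iterators.Filter;

public class ColumnPaginationFilter extends Filter {
    private int offset = 0;  // first column index to return (would come from iterator options)
    private int limit = 10;  // max columns per row (would come from iterator options)

    private Key lastKey = null;
    private int seen = 0;

    @Override
    public boolean accept(Key k, Value v) {
        // Reset the per-row column counter whenever a new row starts.
        if (lastKey == null || !k.getRow().equals(lastKey.getRow())) {
            seen = 0;
        }
        lastKey = k;
        int index = seen++;
        return index >= offset && index < offset + limit;
    }
}
{noformat}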

 Add Column Pagination Filter
 

 Key: ACCUMULO-736
 URL: https://issues.apache.org/jira/browse/ACCUMULO-736
 Project: Accumulo
  Issue Type: Wish
  Components: client
Reporter: Pradeep Gollakota
Assignee: Billie Rinaldi

 Client applications may need to paginate data depending on the number of 
 columns returned. This would be more efficient if the database itself 
 handled the pagination.
 Similar to https://issues.apache.org/jira/browse/HBASE-2438

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira