[jira] Updated: (PIG-1104) [zebra] Provide streaming support in Zebra.

2009-12-06 Thread Chao Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Wang updated PIG-1104:
---

Status: Open  (was: Patch Available)

 [zebra] Provide streaming support in Zebra.
 ---

 Key: PIG-1104
 URL: https://issues.apache.org/jira/browse/PIG-1104
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.4.0
Reporter: Chao Wang
Assignee: Chao Wang
 Fix For: 0.6.0, 0.7.0

 Attachments: PIG-1104.patch


 Hadoop streaming is very popular among Hadoop users. The main attraction is 
 its simplicity: a user can write the application logic in any language 
 and process large amounts of data using the Hadoop framework. As more people 
 start to use Zebra to store their data, we expect users will want to run 
 Hadoop streaming scripts to process Zebra tables easily. 
 The following is a simple example of using Hadoop streaming to access 
 Zebra data. It loads data from the table foo using Zebra's TableInputFormat 
 and then writes the data to output using the default TextOutputFormat. 
 $ hadoop jar hadoop-streaming.jar -D mapred.reduce.tasks=0 \
     -input foo -output output -mapper 'cat' \
     -inputformat org.apache.hadoop.zebra.mapred.TableInputFormat 
 In more detail, Zebra uses Pig's DefaultTuple implementation of Tuple for its 
 records. Currently, when Zebra's TableInputFormat is used for input, the user 
 script sees each line as  key_if_any\tTuple.toString() . We plan to 
 generate a CSV representation of our Pig tuples instead. To this end, we 
 plan to do the following: 
 1) Derive a subclass ZebraTuple from Pig's DefaultTuple class and override 
 its toString() method to present the data in CSV format. 
 2) On the Zebra side, change the tuple factory to create ZebraTuple 
 objects instead of DefaultTuple objects. 
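 The two steps above can be sketched roughly as follows. This is only an illustration, not the contents of the attached patch: BaseTuple is a hypothetical stand-in for Pig's DefaultTuple so that the example compiles on its own, and CsvTuple and CsvTupleFactory are made-up names playing the roles of ZebraTuple and the Zebra tuple factory. The quoting rules (double quotes around fields containing a comma, quote, or line break; null written as an empty field) are an assumption, since the description does not say how such values should be rendered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for Pig's DefaultTuple (not the real Pig class).
class BaseTuple {
    protected final List<Object> fields;

    BaseTuple(List<Object> fields) {
        this.fields = new ArrayList<>(fields);
    }

    // Illustrative default rendering: parenthesised, comma-separated fields.
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(fields.get(i));
        }
        return sb.append(')').toString();
    }
}

// Step 1: a subclass that overrides toString() to emit one CSV record.
class CsvTuple extends BaseTuple {
    CsvTuple(List<Object> fields) {
        super(fields);
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(escape(fields.get(i)));
        }
        return sb.toString();
    }

    // Assumed quoting rules: null becomes an empty field; a field containing
    // a comma, double quote, or line break is wrapped in double quotes, with
    // embedded double quotes doubled.
    static String escape(Object value) {
        if (value == null) return "";
        String s = value.toString();
        boolean needsQuotes = s.contains(",") || s.contains("\"")
                || s.contains("\n") || s.contains("\r");
        if (!needsQuotes) return s;
        return "\"" + s.replace("\"", "\"\"") + "\"";
    }
}

// Step 2: the factory hands out the CSV-aware subclass instead of the base class.
class CsvTupleFactory {
    BaseTuple newTuple(List<Object> fields) {
        return new CsvTuple(fields);
    }
}

class CsvTupleSketch {
    public static void main(String[] args) {
        BaseTuple t = new CsvTupleFactory()
                .newTuple(Arrays.asList("a,b", 42, null, "say \"hi\""));
        // The streaming script would see this text after the key and a tab.
        System.out.println(t);
    }
}
```

 Because only toString() changes, code that already works with the base tuple type would not need to know which subclass the factory returned.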
 Note that we can only support streaming on the input side, that is, the 
 ability to use streaming to read data from Zebra tables. On the output side, 
 streaming support is not feasible: the streaming mapper or reducer only emits 
 Text\tText, so the output collector has no way of knowing how to convert this 
 to (BytesWritable,Tuple).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1104) [zebra] Provide streaming support in Zebra.

2009-12-06 Thread Chao Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Wang updated PIG-1104:
---

Status: Patch Available  (was: Open)

 [zebra] Provide streaming support in Zebra.
 ---

 Key: PIG-1104
 URL: https://issues.apache.org/jira/browse/PIG-1104
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.4.0
Reporter: Chao Wang
Assignee: Chao Wang
 Fix For: 0.6.0, 0.7.0

 Attachments: PIG-1104.patch





[jira] Updated: (PIG-1104) [zebra] Provide streaming support in Zebra.

2009-12-04 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1104:
--

Status: Patch Available  (was: Open)

 [zebra] Provide streaming support in Zebra.
 ---

 Key: PIG-1104
 URL: https://issues.apache.org/jira/browse/PIG-1104
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.4.0
Reporter: Chao Wang
Assignee: Chao Wang
 Fix For: 0.6.0, 0.7.0

 Attachments: PIG1104.patch





[jira] Updated: (PIG-1104) [zebra] Provide streaming support in Zebra.

2009-12-04 Thread Chao Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Wang updated PIG-1104:
---

Attachment: PIG-1104.patch

 [zebra] Provide streaming support in Zebra.
 ---

 Key: PIG-1104
 URL: https://issues.apache.org/jira/browse/PIG-1104
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.4.0
Reporter: Chao Wang
Assignee: Chao Wang
 Fix For: 0.6.0, 0.7.0

 Attachments: PIG-1104.patch, PIG1104.patch





[jira] Updated: (PIG-1104) [zebra] Provide streaming support in Zebra.

2009-12-04 Thread Chao Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Wang updated PIG-1104:
---

Attachment: (was: PIG1104.patch)

 [zebra] Provide streaming support in Zebra.
 ---

 Key: PIG-1104
 URL: https://issues.apache.org/jira/browse/PIG-1104
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.4.0
Reporter: Chao Wang
Assignee: Chao Wang
 Fix For: 0.6.0, 0.7.0

 Attachments: PIG-1104.patch





[jira] Updated: (PIG-1104) [zebra] Provide streaming support in Zebra.

2009-12-04 Thread Chao Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Wang updated PIG-1104:
---

Status: Open  (was: Patch Available)

 [zebra] Provide streaming support in Zebra.
 ---

 Key: PIG-1104
 URL: https://issues.apache.org/jira/browse/PIG-1104
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.4.0
Reporter: Chao Wang
Assignee: Chao Wang
 Fix For: 0.6.0, 0.7.0

 Attachments: PIG-1104.patch





[jira] Updated: (PIG-1104) [zebra] Provide streaming support in Zebra.

2009-12-03 Thread Chao Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Wang updated PIG-1104:
---

Fix Version/s: 0.7.0

 [zebra] Provide streaming support in Zebra.
 ---

 Key: PIG-1104
 URL: https://issues.apache.org/jira/browse/PIG-1104
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.4.0
Reporter: Chao Wang
Assignee: Chao Wang
 Fix For: 0.6.0, 0.7.0





[jira] Updated: (PIG-1104) [zebra] Provide streaming support in Zebra.

2009-12-03 Thread Chao Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Wang updated PIG-1104:
---

Attachment: PIG1104.patch

 [zebra] Provide streaming support in Zebra.
 ---

 Key: PIG-1104
 URL: https://issues.apache.org/jira/browse/PIG-1104
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.4.0
Reporter: Chao Wang
Assignee: Chao Wang
 Fix For: 0.6.0, 0.7.0

 Attachments: PIG1104.patch

