[jira] Updated: (PIG-1351) [Zebra] No type check when we write to the basic table

2010-04-15 Thread Chao Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Wang updated PIG-1351:
---

Status: Open  (was: Patch Available)

 [Zebra] No type check when we write to the basic table
 --

 Key: PIG-1351
 URL: https://issues.apache.org/jira/browse/PIG-1351
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Chao Wang
Assignee: Chao Wang
 Fix For: 0.8.0

 Attachments: PIG-1351.patch, PIG-1351.patch


 In Zebra, no type checking is performed when writing to a basic table.
 For example, given the schema f1:int, f2:string, we can still write the
 tuple (abc, 123) without any error, which is clearly undesirable.
 To address this, we have decided to perform a limited amount of type
 checking in Zebra: each writer checks only the first row it writes.
 This serves purely as a sanity check for cases where users specify the
 output schema incorrectly. We do NOT perform rigorous type checking on
 all rows, because of performance concerns.
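
 The first-row check described above could look roughly like the sketch
 below. This is an illustration only, not the code in the attached patch:
 the class FirstRowCheckingWriter, the ColumnType enum and the use of plain
 Java lists for rows are all assumptions made for the example, and treating
 nulls as valid in any column is likewise an assumption.

```java
import java.util.List;

/** Hypothetical sketch; these names are not Zebra's actual classes. */
final class FirstRowCheckingWriter {
    /** Simplified column types, covering only the f1:int, f2:string example. */
    enum ColumnType { INT, STRING }

    private final List<ColumnType> schema;
    private boolean firstRowChecked = false;
    private int rowsWritten = 0;

    FirstRowCheckingWriter(List<ColumnType> schema) {
        this.schema = schema;
    }

    /** Validates only the first row this writer sees; later rows are not checked. */
    void write(List<Object> row) {
        if (!firstRowChecked) {
            checkTypes(row);          // throws on mismatch, so the flag stays false
            firstRowChecked = true;
        }
        rowsWritten++;                // stand-in for the real insert
    }

    int rowsWritten() {
        return rowsWritten;
    }

    private void checkTypes(List<Object> row) {
        if (row.size() != schema.size()) {
            throw new IllegalArgumentException(
                "expected " + schema.size() + " columns, got " + row.size());
        }
        for (int i = 0; i < row.size(); i++) {
            Object value = row.get(i);
            if (value == null) {
                continue;             // assumption: nulls are allowed in any column
            }
            boolean ok = schema.get(i) == ColumnType.INT
                ? value instanceof Integer
                : value instanceof String;
            if (!ok) {
                throw new IllegalArgumentException(
                    "column " + i + " expects " + schema.get(i)
                    + " but got " + value.getClass().getSimpleName());
            }
        }
    }
}
```

 With this shape, the cost of the check is paid once per writer, which is
 the trade-off the description makes: a swapped schema is caught early, but
 a bad row later in the stream is not.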

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (PIG-1351) [Zebra] No type check when we write to the basic table

2010-04-15 Thread Chao Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Wang updated PIG-1351:
---

Attachment: (was: PIG-1351.patch)

 [Zebra] No type check when we write to the basic table
 --

 Key: PIG-1351
 URL: https://issues.apache.org/jira/browse/PIG-1351
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Chao Wang
Assignee: Chao Wang
 Fix For: 0.8.0

 Attachments: PIG-1351.patch


 In Zebra, no type checking is performed when writing to a basic table.
 For example, given the schema f1:int, f2:string, we can still write the
 tuple (abc, 123) without any error, which is clearly undesirable.
 To address this, we have decided to perform a limited amount of type
 checking in Zebra: each writer checks only the first row it writes.
 This serves purely as a sanity check for cases where users specify the
 output schema incorrectly. We do NOT perform rigorous type checking on
 all rows, because of performance concerns.





[jira] Updated: (PIG-1351) [Zebra] No type check when we write to the basic table

2010-04-15 Thread Chao Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Wang updated PIG-1351:
---

Attachment: (was: PIG-1351.patch)

 [Zebra] No type check when we write to the basic table
 --

 Key: PIG-1351
 URL: https://issues.apache.org/jira/browse/PIG-1351
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Chao Wang
Assignee: Chao Wang
 Fix For: 0.8.0

 Attachments: PIG-1351.patch


 In Zebra, no type checking is performed when writing to a basic table.
 For example, given the schema f1:int, f2:string, we can still write the
 tuple (abc, 123) without any error, which is clearly undesirable.
 To address this, we have decided to perform a limited amount of type
 checking in Zebra: each writer checks only the first row it writes.
 This serves purely as a sanity check for cases where users specify the
 output schema incorrectly. We do NOT perform rigorous type checking on
 all rows, because of performance concerns.





[jira] Updated: (PIG-1351) [Zebra] No type check when we write to the basic table

2010-04-15 Thread Chao Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Wang updated PIG-1351:
---

Attachment: PIG-1351.patch

 [Zebra] No type check when we write to the basic table
 --

 Key: PIG-1351
 URL: https://issues.apache.org/jira/browse/PIG-1351
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Chao Wang
Assignee: Chao Wang
 Fix For: 0.8.0

 Attachments: PIG-1351.patch


 In Zebra, no type checking is performed when writing to a basic table.
 For example, given the schema f1:int, f2:string, we can still write the
 tuple (abc, 123) without any error, which is clearly undesirable.
 To address this, we have decided to perform a limited amount of type
 checking in Zebra: each writer checks only the first row it writes.
 This serves purely as a sanity check for cases where users specify the
 output schema incorrectly. We do NOT perform rigorous type checking on
 all rows, because of performance concerns.





[jira] Updated: (PIG-1375) [Zebra] To support writing multiple Zebra tables through Pig

2010-04-16 Thread Chao Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Wang updated PIG-1375:
---

Attachment: PIG-1375.patch

 [Zebra] To support writing multiple Zebra tables through Pig
 

 Key: PIG-1375
 URL: https://issues.apache.org/jira/browse/PIG-1375
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.8.0
Reporter: Chao Wang
Assignee: Chao Wang
 Fix For: 0.8.0

 Attachments: PIG-1375.patch, PIG-1375.patch


 In Zebra, we already support multiple outputs for map/reduce, but we do
 not support this feature when users use Zebra through Pig.
 This jira addresses that gap: we plan to support writing to multiple
 output tables through Pig as well.
 We propose to support the following Pig store statements with multiple
 outputs:
 /* if partition class arguments are needed */
 store relation into 'loc1,loc2,loc3' using
 org.apache.hadoop.zebra.pig.TableStorer('storagehint_string',
 'complete name of your custom partition class',
 'some arguments to partition class');
 /* if no partition class arguments are needed */
 store relation into 'loc1,loc2,loc3' using
 org.apache.hadoop.zebra.pig.TableStorer('storagehint_string',
 'complete name of your custom partition class');
 Note that users specify up to three arguments: the storage hint string,
 the complete name of the partition class, and the partition class
 arguments string.
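
 To illustrate how a comma-separated location string and a partition
 class fit together, here is a minimal sketch. It is not Zebra's API: the
 class MultiOutputSketch, the Partitioner interface and both method names
 are hypothetical, and record keys are reduced to plain strings.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not the TableStorer implementation. */
final class MultiOutputSketch {
    /** Stand-in for a user-supplied partition class. */
    interface Partitioner {
        /** Returns the index of the output table that should receive the record. */
        int outputIndex(String key, int numOutputs);
    }

    /** Splits a location string such as 'loc1,loc2,loc3' into individual paths. */
    static List<String> parseLocations(String locations) {
        List<String> paths = new ArrayList<>();
        for (String part : locations.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                paths.add(trimmed);
            }
        }
        return paths;
    }

    /** Chooses the output location for one record by asking the partitioner. */
    static String route(String key, List<String> locations, Partitioner partitioner) {
        int index = partitioner.outputIndex(key, locations.size());
        if (index < 0 || index >= locations.size()) {
            throw new IllegalStateException("partitioner returned invalid index " + index);
        }
        return locations.get(index);
    }
}
```

 The range check in route() reflects one design question the proposal
 leaves open in this description: what should happen when a custom
 partition class returns an index outside the list of locations. The sketch
 simply fails fast.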





[jira] Updated: (PIG-1375) [Zebra] To support writing multiple Zebra tables through Pig

2010-04-16 Thread Chao Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Wang updated PIG-1375:
---

   Status: Patch Available  (was: Open)
Affects Version/s: 0.7.0
   (was: 0.8.0)

 [Zebra] To support writing multiple Zebra tables through Pig
 

 Key: PIG-1375
 URL: https://issues.apache.org/jira/browse/PIG-1375
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.7.0
Reporter: Chao Wang
Assignee: Chao Wang
 Fix For: 0.8.0

 Attachments: PIG-1375.patch, PIG-1375.patch


 In Zebra, we already support multiple outputs for map/reduce, but we do
 not support this feature when users use Zebra through Pig.
 This jira addresses that gap: we plan to support writing to multiple
 output tables through Pig as well.
 We propose to support the following Pig store statements with multiple
 outputs:
 /* if partition class arguments are needed */
 store relation into 'loc1,loc2,loc3' using
 org.apache.hadoop.zebra.pig.TableStorer('storagehint_string',
 'complete name of your custom partition class',
 'some arguments to partition class');
 /* if no partition class arguments are needed */
 store relation into 'loc1,loc2,loc3' using
 org.apache.hadoop.zebra.pig.TableStorer('storagehint_string',
 'complete name of your custom partition class');
 Note that users specify up to three arguments: the storage hint string,
 the complete name of the partition class, and the partition class
 arguments string.





[jira] Updated: (PIG-1375) [Zebra] To support writing multiple Zebra tables through Pig

2010-04-19 Thread Chao Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Wang updated PIG-1375:
---

Attachment: PIG-1375.patch

Thanks to Xuefu for the feedback.

Updated the patch to incorporate comments 2 and 4.
For comment 1) The indentation change is incidental; it makes some files
(those touched by this feature) follow Zebra's indentation policy of
two-space width.
For comment 3) The flag idea needs to be justified by further performance
profiling. The check here should be trivial compared with other operations
such as generateKey() and insert().


 

 [Zebra] To support writing multiple Zebra tables through Pig
 

 Key: PIG-1375
 URL: https://issues.apache.org/jira/browse/PIG-1375
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.7.0
Reporter: Chao Wang
Assignee: Chao Wang
 Fix For: 0.8.0

 Attachments: PIG-1375.patch, PIG-1375.patch, PIG-1375.patch


 In Zebra, we already support multiple outputs for map/reduce, but we do
 not support this feature when users use Zebra through Pig.
 This jira addresses that gap: we plan to support writing to multiple
 output tables through Pig as well.
 We propose to support the following Pig store statements with multiple
 outputs:
 /* if partition class arguments are needed */
 store relation into 'loc1,loc2,loc3' using
 org.apache.hadoop.zebra.pig.TableStorer('storagehint_string',
 'complete name of your custom partition class',
 'some arguments to partition class');
 /* if no partition class arguments are needed */
 store relation into 'loc1,loc2,loc3' using
 org.apache.hadoop.zebra.pig.TableStorer('storagehint_string',
 'complete name of your custom partition class');
 Note that users specify up to three arguments: the storage hint string,
 the complete name of the partition class, and the partition class
 arguments string.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1342) [Zebra] Avoid making unnecessary name node calls for writes in Zebra

2010-04-21 Thread Chao Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Wang updated PIG-1342:
---

Status: Patch Available  (was: Open)

 [Zebra] Avoid making unnecessary name node calls for writes in Zebra
 

 Key: PIG-1342
 URL: https://issues.apache.org/jira/browse/PIG-1342
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.6.0, 0.7.0
Reporter: Chao Wang
Assignee: Chao Wang
 Fix For: 0.8.0

 Attachments: PIG-1342.patch, PIG-1342.patch


 Currently, table- and column-group-level metadata is extracted from the job
 configuration object and written to HDFS within checkOutputSpec().
 Later, the back-end writers open these files to read the metadata they need
 for writing. This puts extra load on the name node, since every writer must
 make name node calls to open the files.
 We propose the following approach to this problem:
 Back-end writers extract the metadata directly from the job configuration
 object, rather than making name node calls and going to HDFS to fetch it.
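
 The idea of the proposal can be sketched as follows. This is only an
 illustration: a plain java.util.Map stands in for the Hadoop job
 configuration object, and the class ConfMetadataSketch, the key name
 SCHEMA_KEY and both method names are hypothetical, not taken from the
 patch.

```java
import java.util.Map;

/** Hypothetical sketch; a Map stands in for the job configuration object. */
final class ConfMetadataSketch {
    /** Made-up configuration key used only in this example. */
    static final String SCHEMA_KEY = "sketch.zebra.output.schema";

    /** Front end: store the table metadata in the job configuration once. */
    static void publishSchema(Map<String, String> jobConf, String schema) {
        jobConf.put(SCHEMA_KEY, schema);
    }

    /**
     * Back end: each writer reads the metadata from the configuration it was
     * handed, so no file has to be opened to obtain it.
     */
    static String loadSchema(Map<String, String> jobConf) {
        String schema = jobConf.get(SCHEMA_KEY);
        if (schema == null) {
            throw new IllegalStateException("schema missing from job configuration");
        }
        return schema;
    }
}
```

 Since the job configuration is already shipped to every task, reading
 from it adds no per-writer file open, which is where the saving on name
 node calls would come from.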




[jira] Updated: (PIG-1342) [Zebra] Avoid making unnecessary name node calls for writes in Zebra

2010-04-21 Thread Chao Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Wang updated PIG-1342:
---

Attachment: PIG-1342.patch

 [Zebra] Avoid making unnecessary name node calls for writes in Zebra
 

 Key: PIG-1342
 URL: https://issues.apache.org/jira/browse/PIG-1342
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.6.0, 0.7.0
Reporter: Chao Wang
Assignee: Chao Wang
 Fix For: 0.8.0

 Attachments: PIG-1342.patch, PIG-1342.patch


 Currently, table- and column-group-level metadata is extracted from the job
 configuration object and written to HDFS within checkOutputSpec().
 Later, the back-end writers open these files to read the metadata they need
 for writing. This puts extra load on the name node, since every writer must
 make name node calls to open the files.
 We propose the following approach to this problem:
 Back-end writers extract the metadata directly from the job configuration
 object, rather than making name node calls and going to HDFS to fetch it.




[jira] Commented: (PIG-1342) [Zebra] Avoid making unnecessary name node calls for writes in Zebra

2010-04-21 Thread Chao Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859523#action_12859523
 ] 

Chao Wang commented on PIG-1342:


Rebased the patch against the latest trunk.

 [Zebra] Avoid making unnecessary name node calls for writes in Zebra
 

 Key: PIG-1342
 URL: https://issues.apache.org/jira/browse/PIG-1342
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.6.0, 0.7.0
Reporter: Chao Wang
Assignee: Chao Wang
 Fix For: 0.8.0

 Attachments: PIG-1342.patch, PIG-1342.patch


 Currently, table- and column-group-level metadata is extracted from the job
 configuration object and written to HDFS within checkOutputSpec().
 Later, the back-end writers open these files to read the metadata they need
 for writing. This puts extra load on the name node, since every writer must
 make name node calls to open the files.
 We propose the following approach to this problem:
 Back-end writers extract the metadata directly from the job configuration
 object, rather than making name node calls and going to HDFS to fetch it.




[jira] Updated: (PIG-1342) [Zebra] Avoid making unnecessary name node calls for writes in Zebra

2010-04-22 Thread Chao Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Wang updated PIG-1342:
---

Status: Open  (was: Patch Available)

 [Zebra] Avoid making unnecessary name node calls for writes in Zebra
 

 Key: PIG-1342
 URL: https://issues.apache.org/jira/browse/PIG-1342
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.6.0, 0.7.0
Reporter: Chao Wang
Assignee: Chao Wang
 Fix For: 0.8.0

 Attachments: PIG-1342.patch, PIG-1342.patch


 Currently, table- and column-group-level metadata is extracted from the job
 configuration object and written to HDFS within checkOutputSpec().
 Later, the back-end writers open these files to read the metadata they need
 for writing. This puts extra load on the name node, since every writer must
 make name node calls to open the files.
 We propose the following approach to this problem:
 Back-end writers extract the metadata directly from the job configuration
 object, rather than making name node calls and going to HDFS to fetch it.




[jira] Updated: (PIG-1342) [Zebra] Avoid making unnecessary name node calls for writes in Zebra

2010-04-22 Thread Chao Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Wang updated PIG-1342:
---

Status: Patch Available  (was: Open)

From the test result log, it looks like the test case TestFinish failed.

I manually ran this test case against the Pig trunk + patch, and it passed.
It seems to be an environment issue, so I am resubmitting the patch.

 [Zebra] Avoid making unnecessary name node calls for writes in Zebra
 

 Key: PIG-1342
 URL: https://issues.apache.org/jira/browse/PIG-1342
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.6.0, 0.7.0
Reporter: Chao Wang
Assignee: Chao Wang
 Fix For: 0.8.0

 Attachments: PIG-1342.patch, PIG-1342.patch


 Currently, table- and column-group-level metadata is extracted from the job
 configuration object and written to HDFS within checkOutputSpec().
 Later, the back-end writers open these files to read the metadata they need
 for writing. This puts extra load on the name node, since every writer must
 make name node calls to open the files.
 We propose the following approach to this problem:
 Back-end writers extract the metadata directly from the job configuration
 object, rather than making name node calls and going to HDFS to fetch it.



