[ 
https://issues.apache.org/jira/browse/PIG-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763439#action_12763439
 ] 

Hong Tang commented on PIG-992:
-------------------------------

Comments:
- In many places, both types.ParseException and schema.ParseException are 
thrown. Do you really want both?
- In the following
{noformat}
+public enum ColumnType implements Writable {
{noformat}
Is the Writable interface actually used? You have rather odd pattern of 
asymmetric readFields and write:
{noformat}
+  @Override
+  public void readFields(DataInput in) throws IOException {
+    // no op, instantiated by the caller
+  }
+
+  @Override
+  public void write(DataOutput out) throws IOException {
+    Utils.writeString(out, name);
+  }
{noformat}
- In the following code
{noformat}
+  public static class ColumnSchema {
+    public String name;
+    public ColumnType type;
+    public Schema schema;
+    public int index; // field index in schema
{noformat}
Exposing fields as all-public seems like a bad idea.
- Is there a specific usage case to allow schema to be mutable at any time? 
(minor nit: the comment says add a field, but the code seems to add a column to 
the schema).
{noformat}
+  /**
+   * add a field
+   */
+  public void add(ColumnSchema f) throws ParseException
+  {
+    add(f, false);
+  }
{noformat}
- Why Schema.equals(Object) is not implemented on top of the static version of 
the method (or vice versa)?
- In Schema.readFields(), the Version string from the input is not checked for 
compatibility.
- In the following
{noformat}
+  private void init(String[] columnNames) throws ParseException {
+    // the arg must be of type or they will be treated as the default type
+    // TODO: verify column names don't contain COLUMN_DELIMITER
{noformat}
It seems that the TODO should not involve too much work and please consider not 
deferring it later.
- Need more detailed documentation on the spec of the parameter for 
Schema.getColumnSchema(String name)
{noformat}
+  /**
+   * Get a column's schema
+   */
+  public ColumnSchema getColumnSchema(String name) throws ParseException
+  {
{noformat}
- Schema.getColumnSchemaOnParsedName and Schema.getColumnSchema seems to be 
copy/paste code.
- Schema.getColumnSchema(ParsedName pn) has side effect of modifying the 
parameter pn. The javadoc reads cryptic to me.
- There are many classes generated by JavaCC. It is probably better not 
including them in the patch (and put the generated source under build/src).

Other minor issues:
- Typically contrib projects should use the version string as the parent 
project.
- Style: there are some very long lines.
 - There are a few white space changes. That should be avoided if possible.
- In the following
{noformat}
+    } catch (org.apache.hadoop.zebra.schema.ParseException e) {
+      throw new AssertionError("Invalid Projection: "+e.getMessage());
{noformat}
consider change AssertionError to IllegalArgumentException.
- In the following:
{noformat}
+  /*
+   * helper class to parse a column name string one section at a time and find 
the required
+   * type for the parsed part.
+   */
+  public static class ParsedName {
+    public String mName;
+    int mKeyOffset; // the offset where the keysstring starts
+    public ColumnType mDT = ColumnType.ANY; // parent's type
{noformat}
The description seems to indicate that this should not be a public class. I 
tried to understand the body of the class and do not feel that it serves a 
general purpose.
- The following seems like useless assignment:
{noformat}
+  private long mVersion = schemaVersion;
{noformat}
- {noformat}
  /**
+   * Normalize the schema string.
+   * 
+   * @param value
+   *          the input string representation of the schema.
+   * @return the normalized string representation.
+   */
+  public static String normalize(String value) {
+    String result = new String();
+
+    if (value == null || value.trim().isEmpty())
+      return result;
+
+    StringBuilder sb = new StringBuilder();
+    String[] parts = value.trim().split(COLUMN_DELIMITER);
+    for (int nx = 0; nx < parts.length; nx++) {
+      if (nx > 0) sb.append(COLUMN_DELIMITER);
+      sb.append(parts[nx].trim());
+    }
+    return sb.toString();
+  }

{noformat}
There is a wasted value.trim().
- In Schema.equals(Object), instead of comparing class equality, using 
"instanceof" is typically better.
- Use StringBuilder instead in the following code:
{noformat}
+    String merged = new String();
+    for (int i = 0; i < columnNames.length; i++) {
+      if (i > 0) merged += ",";
+      merged += columnNames[i];
+    }
{noformat}
- There are a few indentation problems.

> [zebra] Separate Schema-related files into a "Schema" package
> -------------------------------------------------------------
>
>                 Key: PIG-992
>                 URL: https://issues.apache.org/jira/browse/PIG-992
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.4.0
>            Reporter: Yan Zhou
>            Assignee: Yan Zhou
>            Priority: Minor
>             Fix For: 0.6.0
>
>         Attachments: SchemaPackageChange.patch, SchemaPackageChange.patch
>
>
> The hope is to facilitate future sharing of the Schema codes between 
> different modules and/or products. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to