[jira] [Updated] (PIG-5377) Move supportsParallelWriteToStoreLocation from StoreFunc to StoreFuncInterface

2019-02-18 Thread Kevin J. Price (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin J. Price updated PIG-5377:

Attachment: PIG-5377-2.patch

> Move supportsParallelWriteToStoreLocation from StoreFunc to StoreFuncInterface
> ------------------------------------------------------------------------------
>
> Key: PIG-5377
> URL: https://issues.apache.org/jira/browse/PIG-5377
> Project: Pig
>  Issue Type: Improvement
>  Components: internal-udfs, piggybank
>Reporter: Kevin J. Price
>Assignee: Kevin J. Price
>Priority: Minor
> Attachments: PIG-5377-2.patch, PIG-5377.patch
>
>
> Now that we're running on JDK8 and can have default implementations in 
> interfaces, we can move supportsParallelWriteToStoreLocation() to the 
> StoreFuncInterface interface and properly set it on the supported built-in 
> functions rather than having a static list.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PIG-5377) Move supportsParallelWriteToStoreLocation from StoreFunc to StoreFuncInterface

2019-01-14 Thread Kevin J. Price (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin J. Price updated PIG-5377:

Attachment: PIG-5377.patch
Status: Patch Available  (was: Open)

> Move supportsParallelWriteToStoreLocation from StoreFunc to StoreFuncInterface
> ------------------------------------------------------------------------------
>
> Key: PIG-5377
> URL: https://issues.apache.org/jira/browse/PIG-5377
> Project: Pig
>  Issue Type: Improvement
>  Components: internal-udfs, piggybank
>Reporter: Kevin J. Price
>Assignee: Kevin J. Price
>Priority: Minor
> Attachments: PIG-5377.patch
>
>
> Now that we're running on JDK8 and can have default implementations in 
> interfaces, we can move supportsParallelWriteToStoreLocation() to the 
> StoreFuncInterface interface and properly set it on the supported built-in 
> functions rather than having a static list.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (PIG-5377) Move supportsParallelWriteToStoreLocation from StoreFunc to StoreFuncInterface

2019-01-14 Thread Kevin J. Price (JIRA)
Kevin J. Price created PIG-5377:
--------------------------------

 Summary: Move supportsParallelWriteToStoreLocation from StoreFunc to StoreFuncInterface
 Key: PIG-5377
 URL: https://issues.apache.org/jira/browse/PIG-5377
 Project: Pig
  Issue Type: Improvement
  Components: internal-udfs, piggybank
Reporter: Kevin J. Price
Assignee: Kevin J. Price


Now that we're running on JDK8 and can have default implementations in 
interfaces, we can move supportsParallelWriteToStoreLocation() to the 
StoreFuncInterface interface and properly set it on the supported built-in 
functions rather than having a static list.
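
As a quick illustration of the JDK8 default-method approach described above, here is a 
pared-down sketch; the type names below are placeholders (the real StoreFuncInterface 
declares many more methods), and the boolean return type is an assumption rather than 
Pig's actual signature.

{code}
// Pared-down sketch only, not the PIG-5377 patch itself.
interface StoreFuncInterfaceSketch {

    // JDK8 default implementation: conservatively report that parallel
    // writes to the same store location are not supported unless a
    // storer explicitly opts in.
    default boolean supportsParallelWriteToStoreLocation() {
        return false;
    }
}

// A built-in storer that is known to be safe can then opt in itself,
// instead of being looked up in a static list.
class ParallelSafeStorerSketch implements StoreFuncInterfaceSketch {
    @Override
    public boolean supportsParallelWriteToStoreLocation() {
        return true;
    }
}
{code}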



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PIG-4608) FOREACH ... UPDATE

2018-01-16 Thread Kevin J. Price (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16327736#comment-16327736
 ] 

Kevin J. Price commented on PIG-4608:
-------------------------------------

This is Yahoo-centric, but would it be possible to grep our logs for existing 
pig jobs and see how many of them have keyword conflicts with 'update', 
'delete', 'drop', etc? I'm indifferent on 'delete' versus 'drop', but it'd be 
interesting to know which one would impact fewer existing scripts.

As for the 'update val AS col' versus 'update col BY val', I think the former 
looks less confusing to a current pig user. 'BY' only gets used currently for 
key ordering in groups and joins, whereas 'AS' is already used for value 
assignment. I agree that there's a difference between 'GENERATE val AS col' and 
'UPDATE val AS col', but it's a fairly philosophical difference from the user's 
perspective. In both cases, they want col to have the value val after the 
statement, so having the same syntax makes sense.

> FOREACH ... UPDATE
> ------------------
>
> Key: PIG-4608
> URL: https://issues.apache.org/jira/browse/PIG-4608
> Project: Pig
>  Issue Type: New Feature
>Reporter: Haley Thrapp
>Priority: Major
>
> I would like to propose a new command in Pig, FOREACH...UPDATE.
> Syntactically, it would look much like FOREACH … GENERATE.
> Example:
> Input data:
> (1,2,3)
> (2,3,4)
> (3,4,5)
> -- Load the data
> three_numbers = LOAD 'input_data'
> USING PigStorage()
> AS (f1:int, f2:int, f3:int);
> -- Sum up the row
> updated = FOREACH three_numbers UPDATE
> 5 as f1,
> f1+f2 as new_sum
> ;
> Dump updated;
> (5,2,3,3)
> (5,3,4,5)
> (5,4,5,7)
> Fields to update must be specified by alias. Any fields in the UPDATE that do 
> not match an existing field will be appended to the end of the tuple.
> This command is particularly desirable in scripts that deal with a large 
> number of fields (in the 20-200 range). Often, we need to only make 
> modifications to a few fields. The FOREACH ... UPDATE statement allows the 
> developer to focus on the actual logical changes instead of having to list 
> all of the fields that are also being passed through.
> My team has prototyped this with changes to FOREACH ... GENERATE. We believe 
> this can be done with changes to the parser and the creation of a new 
> LOUpdate. No physical plan changes should be needed because we will leverage 
> what LOGenerate does.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PIG-3000) Optimize nested foreach

2016-03-21 Thread Kevin J. Price (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15204921#comment-15204921
 ] 

Kevin J. Price commented on PIG-3000:
-------------------------------------

Did this patch just get dropped? This is still a serious problem.

> Optimize nested foreach
> -----------------------
>
> Key: PIG-3000
> URL: https://issues.apache.org/jira/browse/PIG-3000
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.10.0
>Reporter: Richard Ding
>Assignee: Mona Chitnis
> Attachments: PIG-3000-6.patch, unit_tests.patch
>
>
> In this Pig script:
> {code}
> A = load 'data' as (a:chararray);
> B = foreach A { c = UPPER(a); generate ((c eq 'TEST') ? 1 : 0), ((c eq 'DEV') 
> ? 1 : 0); }
> {code}
> The Eval function UPPER is called twice for each record.
> This should be optimized so that UPPER is called only once for each record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-4608) FOREACH ... UPDATE

2015-06-22 Thread Kevin J. Price (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596462#comment-14596462
 ] 

Kevin J. Price commented on PIG-4608:
-------------------------------------

Several of us actually discussed this at some length, and didn't think it was 
worth differentiating between modified columns and appended columns in the 
command. Two ideas we had:
# A token, like you have, indicating that the remaining fields are being added. 
We were considering using an 'ADD' keyword. As in:
{code}
updated = FOREACH three_numbers UPDATE 3 AS f3, 6 AS f6 ADD f1+f2 AS new_sum;
{code}
# Separate statements for 'strict' versus 'non-strict' mode. e.g., for updating 
without appending you would use
{code}
updated = FOREACH three_numbers UPDATE_STRICT 3 AS f3, 6 AS f6;
{code}
and for updating with appending, you could use
{code}
updated = FOREACH three_numbers UPDATE 3 AS f3, 6 AS f6, f1+f2 AS new_sum;
{code}

However, our overall view from writing pig scripts is that chances are very few 
people would ever want to use the strict mode, nor did we see much value in 
having the extra token (ADD or ...) separating out appended columns. From a 
programming viewpoint, it just makes more logical sense to us to view it as an 
implicit update or add construct.

 FOREACH ... UPDATE
 ------------------

 Key: PIG-4608
 URL: https://issues.apache.org/jira/browse/PIG-4608
 Project: Pig
  Issue Type: New Feature
Reporter: Haley Thrapp

 I would like to propose a new command in Pig, FOREACH...UPDATE.
 Syntactically, it would look much like FOREACH … GENERATE.
 Example:
 Input data:
 (1,2,3)
 (2,3,4)
 (3,4,5)
 -- Load the data
 three_numbers = LOAD 'input_data'
 USING PigStorage()
 AS (f1:int, f2:int, f3:int);
 -- Sum up the row
 updated = FOREACH three_numbers UPDATE
 5 as f1,
 f1+f2 as new_sum
 ;
 Dump updated;
 (5,2,3,3)
 (5,3,4,5)
 (5,4,5,7)
 Fields to update must be specified by alias. Any fields in the UPDATE that do 
 not match an existing field will be appended to the end of the tuple.
 This command is particularly desirable in scripts that deal with a large 
 number of fields (in the 20-200 range). Often, we need to only make 
 modifications to a few fields. The FOREACH ... UPDATE statement allows the 
 developer to focus on the actual logical changes instead of having to list 
 all of the fields that are also being passed through.
 My team has prototyped this with changes to FOREACH ... GENERATE. We believe 
 this can be done with changes to the parser and the creation of a new 
 LOUpdate. No physical plan changes should be needed because we will leverage 
 what LOGenerate does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-4433) Loading bigdecimal in nested tuple does not work

2015-02-25 Thread Kevin J. Price (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336632#comment-14336632
 ] 

Kevin J. Price commented on PIG-4433:
-------------------------------------

Thanks, Daniel! Will do.

 Loading bigdecimal in nested tuple does not work
 

 Key: PIG-4433
 URL: https://issues.apache.org/jira/browse/PIG-4433
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.14.0, 0.14.1, 0.15.0
Reporter: Kevin J. Price
Assignee: Kevin J. Price
 Fix For: 0.15.0

 Attachments: PIG-4433-1.patch


 The parsing of BigDecimal data types in a nested tuple, as implemented by 
 Utf8StorageConverter.java, does not work. There's a break; missing from a 
 switch statement.
 Code example that demonstrates the problem:
 === input.txt ===
 (17,1234567890.0987654321)
 === pig_script ===:
 inp = LOAD 'input.txt' AS (foo:tuple(bar:long, baz:bigdecimal));
 STORE inp INTO 'output';
 === output ===
 (17,)
 With patch, the output becomes the expected:
 (17,1234567890.0987654321)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-4433) Loading bigdecimal in nested tuple does not work

2015-02-24 Thread Kevin J. Price (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335375#comment-14335375
 ] 

Kevin J. Price commented on PIG-4433:
-------------------------------------

Pull request created on github: https://github.com/apache/pig/pull/16

 Loading bigdecimal in nested tuple does not work
 

 Key: PIG-4433
 URL: https://issues.apache.org/jira/browse/PIG-4433
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.14.0, 0.14.1, 0.15.0
Reporter: Kevin J. Price
 Fix For: 0.14.1, 0.15.0


 The parsing of BigDecimal data types in a nested tuple, as implemented by 
 Utf8StorageConverter.java, does not work. There's a break; missing from a 
 switch statement.
 Code example that demonstrates the problem:
 === input.txt ===
 (17,1234567890.0987654321)
 === pig_script ===:
 inp = LOAD 'input.txt' AS (foo:tuple(bar:long, baz:bigdecimal));
 STORE inp INTO 'output';
 === output ===
 (17,)
 With patch, the output becomes the expected:
 (17,1234567890.0987654321)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PIG-4433) Loading bigdecimal in nested tuple does not work

2015-02-24 Thread Kevin J. Price (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin J. Price updated PIG-4433:

Status: Patch Available  (was: Open)

diff --git a/src/org/apache/pig/builtin/Utf8StorageConverter.java b/src/org/apache/pig/builtin/Utf8StorageConverter.java
index 814c746..1b905e2 100644
--- a/src/org/apache/pig/builtin/Utf8StorageConverter.java
+++ b/src/org/apache/pig/builtin/Utf8StorageConverter.java
@@ -315,6 +315,7 @@ public class Utf8StorageConverter implements LoadStoreCaster {
             break;
         case DataType.BIGDECIMAL:
             field = bytesToBigDecimal(b);
+            break;
         case DataType.DATETIME:
             field = bytesToDateTime(b);
             break;
diff --git a/test/org/apache/pig/builtin/TestUtf8StorageConverter.java b/test/org/apache/pig/builtin/TestUtf8StorageConverter.java
new file mode 100644
index 0000000..8cc9e55
--- /dev/null
+++ b/test/org/apache/pig/builtin/TestUtf8StorageConverter.java
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.pig.builtin;
+
+import static org.junit.Assert.assertEquals;
+
+import java.math.BigDecimal;
+import java.math.BigInteger;
+
+import org.apache.pig.ResourceSchema;
+import org.apache.pig.ResourceSchema.ResourceFieldSchema;
+import org.apache.pig.data.DataByteArray;
+import org.apache.pig.data.Tuple;
+import org.apache.pig.impl.logicalLayer.schema.Schema.FieldSchema;
+import org.apache.pig.impl.util.Utils;
+import org.joda.time.DateTime;
+import org.joda.time.DateTimeZone;
+import org.junit.Test;
+
+public class TestUtf8StorageConverter {
+
+    @Test
+    /* Test that the simple data types convert properly in a tuple context */
+    public void testSimpleTypes() throws Exception {
+        Utf8StorageConverter converter = new Utf8StorageConverter();
+        String schemaString = "a:int, b:long, c:float, d:double, e:chararray, f:bytearray, g:boolean, h:biginteger, i:bigdecimal, j:datetime";
+        String dataString = "(1,2,3.0,4.0,five,6,true,12345678901234567890,1234567890.0987654321,2007-04-05T14:30Z)";
+
+        ResourceSchema.ResourceFieldSchema rfs = new ResourceFieldSchema(new FieldSchema("schema", Utils.getSchemaFromString(schemaString)));
+        Tuple result = converter.bytesToTuple(dataString.getBytes(), rfs);
+        assertEquals(10, result.size());
+        assertEquals(new Integer(1), result.get(0));
+        assertEquals(new Long(2L), result.get(1));
+        assertEquals(new Float(3.0f), result.get(2));
+        assertEquals(new Double(4.0), result.get(3));
+        assertEquals("five", result.get(4));
+        assertEquals(new DataByteArray(new byte[] { (byte) '6' }), result.get(5));
+        assertEquals(new Boolean(true), result.get(6));
+        assertEquals(new BigInteger("12345678901234567890"), result.get(7));
+        assertEquals(new BigDecimal("1234567890.0987654321"), result.get(8));
+        assertEquals(new DateTime("2007-04-05T14:30Z", DateTimeZone.UTC), result.get(9));
+    }
+}


 Loading bigdecimal in nested tuple does not work
 

 Key: PIG-4433
 URL: https://issues.apache.org/jira/browse/PIG-4433
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.14.0, 0.14.1, 0.15.0
Reporter: Kevin J. Price
 Fix For: 0.14.1, 0.15.0


 The parsing of BigDecimal data types in a nested tuple, as implemented by 
 Utf8StorageConverter.java, does not work. There's a break; missing from a 
 switch statement.
 Code example that demonstrates the problem:
 === input.txt ===
 (17,1234567890.0987654321)
 === pig_script ===:
 inp = LOAD 'input.txt' AS (foo:tuple(bar:long, baz:bigdecimal));
 STORE inp INTO 'output';
 === output ===
 (17,)
 With patch, the output becomes the expected:
 (17,1234567890.0987654321)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (PIG-4433) Loading bigdecimal in nested tuple does not work

2015-02-24 Thread Kevin J. Price (JIRA)
Kevin J. Price created PIG-4433:
--------------------------------

 Summary: Loading bigdecimal in nested tuple does not work
 Key: PIG-4433
 URL: https://issues.apache.org/jira/browse/PIG-4433
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.14.0, 0.14.1, 0.15.0
Reporter: Kevin J. Price
 Fix For: 0.14.1, 0.15.0


The parsing of BigDecimal data types in a nested tuple, as implemented by 
Utf8StorageConverter.java, does not work. There's a break; missing from a 
switch statement.

Code example that demonstrates the problem:

=== input.txt ===
(17,1234567890.0987654321)

=== pig_script ===:
inp = LOAD 'input.txt' AS (foo:tuple(bar:long, baz:bigdecimal));
STORE inp INTO 'output';

=== output ===
(17,)


With patch, the output becomes the expected:
(17,1234567890.0987654321)
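
The one-line fix appears in the diff attached earlier in this thread; purely as an 
illustration of the switch fall-through it corrects, here is a self-contained sketch 
with hypothetical names (this is not the Utf8StorageConverter code itself).

{code}
import java.math.BigDecimal;

// Hypothetical stand-in for the conversion switch: without a break after the
// BIGDECIMAL case, execution falls through and the parsed value is clobbered.
public class FallThroughDemo {
    enum FieldType { BIGDECIMAL, DATETIME }

    static Object convert(FieldType type, String raw, boolean withBreak) {
        Object field = null;
        switch (type) {
            case BIGDECIMAL:
                field = new BigDecimal(raw);
                if (withBreak) {
                    break;        // the statement the patch adds
                }
                // no break: falls through into the next case
            case DATETIME:
                field = null;     // stands in for the datetime conversion
                break;            // overwriting the BigDecimal
        }
        return field;
    }

    public static void main(String[] args) {
        String raw = "1234567890.0987654321";
        System.out.println(convert(FieldType.BIGDECIMAL, raw, false)); // null, i.e. "(17,)"
        System.out.println(convert(FieldType.BIGDECIMAL, raw, true));  // 1234567890.0987654321
    }
}
{code}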




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-2046) Properties defined through 'SET' are not passed through to fs commands

2011-05-09 Thread Kevin J. Price (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030769#comment-13030769
 ] 

Kevin J. Price commented on PIG-2046:
-------------------------------------

Odd.  It definitely works correctly if you set up a 
pig-cluster-hadoop-site.xml file in a conf directory and include it on the 
class path using -cp.  That's the workaround I'm using right now.

 Properties defined through 'SET' are not passed through to fs commands
 ----------------------------------------------------------------------

 Key: PIG-2046
 URL: https://issues.apache.org/jira/browse/PIG-2046
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0, 0.9.0
Reporter: Vivek Padmanabhan

 The properties which are set through 'SET' commands are not passed through to 
 FS commands.
 Ex:
 SET dfs.umaskmode '026'
 fs -touchz umasktest/file0
 It looks like the SET commands are processed by GruntParser after the FsShell 
 creation happens with the current set of properties. Hence whatever properties 
 are defined in SET will not be reflected for fs commands executed in the script.
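
The ordering problem described above (FsShell capturing properties before GruntParser 
runs the SET) can be pictured with a self-contained sketch; the class below is a 
hypothetical stand-in, not Pig's actual GruntParser/FsShell code.

{code}
import java.util.Properties;

// Hypothetical illustration of a shell that snapshots its configuration at
// construction time, so a later SET-style change is never seen by it.
public class SnapshotOrderingDemo {

    static class Shell {
        private final Properties conf = new Properties();

        Shell(Properties initial) {
            conf.putAll(initial);                  // snapshot taken here
        }

        String get(String key) {
            return conf.getProperty(key, "<default>");
        }
    }

    public static void main(String[] args) {
        Properties scriptProps = new Properties();

        // The shell is created first, with the properties known at that point...
        Shell shell = new Shell(scriptProps);

        // ...and only then does the script's SET statement take effect.
        scriptProps.setProperty("dfs.umaskmode", "026");

        // The shell never sees the later value, matching the reported behaviour.
        System.out.println(shell.get("dfs.umaskmode"));    // prints <default>
    }
}
{code}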

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira