[jira] [Commented] (PIG-5305) Enable yarn-client mode execution of tests in Spark (1) mode

2017-09-22 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16176186#comment-16176186
 ] 

Adam Szita commented on PIG-5305:
-

[~kellyzly] do you think this is ready for commit now?

> Enable yarn-client mode execution of tests in Spark (1) mode
> 
>
> Key: PIG-5305
> URL: https://issues.apache.org/jira/browse/PIG-5305
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Adam Szita
>Assignee: Adam Szita
> Attachments: PIG-5305.0.patch, PIG-5305.1.patch, PIG-5305.2.patch
>
>
> See parent jira (PIG-5305) for problem description



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PIG-5271) StackOverflowError when compiling in Tez mode (with union and replicated join)

2017-09-22 Thread Nandor Kollar (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16176067#comment-16176067
 ] 

Nandor Kollar commented on PIG-5271:


[~knoguchi] I think you forgot to commit TEZC-Union-22.gld from this patch! 
Could you please commit this file too?

> StackOverflowError when compiling in Tez mode (with union and replicated join)
> --
>
> Key: PIG-5271
> URL: https://issues.apache.org/jira/browse/PIG-5271
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
> Fix For: 0.18.0
>
> Attachments: pig-5271-v01.patch, pig-5271-v02.patch, 
> pig-5271-v03.patch
>
>
> Sample script
> {code}
> a4 = LOAD 'studentnulltab10k' as (name, age:int, gpa:float);
> a4_1 = filter a4 by gpa is null or gpa >= 3.9;
> a4_2 = filter a4 by gpa < 1;
> b4 = union a4_1, a4_2;
> b4_1 = filter b4 by age < 30;
> b4_2 = foreach b4 generate name, age, FLOOR(gpa) as gpa;
> c4 = load 'voternulltab10k' as (name, age, registration, contributions);
> d4 = join b4_2 by name, c4 by name using 'replicated';
> e4 = foreach d4 generate b4_2::name as name, b4_2::age as age, gpa, registration, contributions;
> f4 = order e4 by name, age DESC;
> store f4 into 'tmp_table_4' ;
> a5_1 = filter a4 by gpa is null or gpa <= 3.9;
> a5_2 = filter a4 by gpa < 2;
> b5 = union a5_1, a5_2;
> d5 = join c4 by name, b5 by name using 'replicated';
> store d5 into 'tmp_table_5' ;
> {code}
> This script fails to compile with StackOverflowError.
> {noformat}
> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)
> Pig Stack Trace
> ---
> ERROR 2998: Unhandled internal error. null
> java.lang.StackOverflowError
> at java.lang.reflect.Constructor.newInstance(Constructor.java:415)
> at java.lang.Class.newInstance(Class.java:442)
> at org.apache.pig.impl.util.Utils.mergeCollection(Utils.java:490)
> at org.apache.pig.impl.plan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:101)
> at org.apache.pig.impl.plan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:105)
> at org.apache.pig.impl.plan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:105)
> at org.apache.pig.impl.plan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:105)
> at org.apache.pig.impl.plan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:105)
> at org.apache.pig.impl.plan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:105)
> ...
> {noformat}





Jenkins build is back to stable : Pig-trunk-commit #2540

2017-09-22 Thread Apache Jenkins Server
See 




[jira] [Updated] (PIG-5306) REGEX_EXTRACT() logs every line that doesn't match

2017-09-22 Thread Satish Subhashrao Saley (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Satish Subhashrao Saley updated PIG-5306:
-
Status: Patch Available  (was: Open)

> REGEX_EXTRACT() logs every line that doesn't match
> --
>
> Key: PIG-5306
> URL: https://issues.apache.org/jira/browse/PIG-5306
> Project: Pig
>  Issue Type: Bug
>Reporter: Satish Subhashrao Saley
>Assignee: Satish Subhashrao Saley
>Priority: Minor
> Attachments: PIG-5306-1.patch
>
>
> Pig logs a warning message for every call that doesn't match a 
> capture group. The documentation only says this case returns NULL. From a 
> developer standpoint, the messages are unlikely to be useful.
> https://github.com/apache/pig/blob/trunk/src/org/apache/pig/builtin/REGEX_EXTRACT.java#L107





[jira] [Updated] (PIG-5306) REGEX_EXTRACT() logs every line that doesn't match

2017-09-22 Thread Satish Subhashrao Saley (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Satish Subhashrao Saley updated PIG-5306:
-
Attachment: PIG-5306-1.patch

> REGEX_EXTRACT() logs every line that doesn't match
> --
>
> Key: PIG-5306
> URL: https://issues.apache.org/jira/browse/PIG-5306
> Project: Pig
>  Issue Type: Bug
>Reporter: Satish Subhashrao Saley
>Assignee: Satish Subhashrao Saley
>Priority: Minor
> Attachments: PIG-5306-1.patch
>
>
> Pig logs a warning message for every call that doesn't match a 
> capture group. The documentation only says this case returns NULL. From a 
> developer standpoint, the messages are unlikely to be useful.
> https://github.com/apache/pig/blob/trunk/src/org/apache/pig/builtin/REGEX_EXTRACT.java#L107





[jira] [Created] (PIG-5306) REGEX_EXTRACT() logs every line that doesn't match

2017-09-22 Thread Satish Subhashrao Saley (JIRA)
Satish Subhashrao Saley created PIG-5306:


 Summary: REGEX_EXTRACT() logs every line that doesn't match
 Key: PIG-5306
 URL: https://issues.apache.org/jira/browse/PIG-5306
 Project: Pig
  Issue Type: Bug
Reporter: Satish Subhashrao Saley
Priority: Minor



Pig logs a warning message for every call that doesn't match a capture 
group. The documentation only says this case returns NULL. From a developer 
standpoint, the messages are unlikely to be useful.

https://github.com/apache/pig/blob/trunk/src/org/apache/pig/builtin/REGEX_EXTRACT.java#L107
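
To illustrate the behaviour being reported, here is a minimal sketch written in Python rather than Pig's Java source. The names regex_extract, report and miss_counter are hypothetical, and counting misses to log one summary line is only an assumed alternative to per-call warnings; none of this is taken from PIG-5306-1.patch.

```python
import logging
import re
from collections import Counter

log = logging.getLogger("regex_extract_sketch")


def regex_extract(text, pattern, index, miss_counter):
    """Return capture group `index`, or None when there is no usable match.

    A miss only increments a counter; nothing is logged per call.
    """
    if text is None:
        return None
    match = re.search(pattern, text)
    if match is None or index < 0 or index > match.re.groups:
        miss_counter["no_match"] += 1
        return None
    return match.group(index)


def report(miss_counter):
    """Emit a single summary line for the whole run (hypothetical helper)."""
    if miss_counter["no_match"]:
        log.warning("regex_extract: %d inputs did not match",
                    miss_counter["no_match"])


misses = Counter()
rows = ["192.168.1.5:8020", "no port here", "10.0.0.1:50070"]
ports = [regex_extract(r, r"(.*):(\d+)", 2, misses) for r in rows]
report(misses)
```

The non-matching row still yields a null result, as the documentation describes, but the log receives one line per run instead of one line per row.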





[jira] [Assigned] (PIG-5306) REGEX_EXTRACT() logs every line that doesn't match

2017-09-22 Thread Satish Subhashrao Saley (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Satish Subhashrao Saley reassigned PIG-5306:


Assignee: Satish Subhashrao Saley

> REGEX_EXTRACT() logs every line that doesn't match
> --
>
> Key: PIG-5306
> URL: https://issues.apache.org/jira/browse/PIG-5306
> Project: Pig
>  Issue Type: Bug
>Reporter: Satish Subhashrao Saley
>Assignee: Satish Subhashrao Saley
>Priority: Minor
>
> Pig logs a warning message for every call that doesn't match a 
> capture group. The documentation only says this case returns NULL. From a 
> developer standpoint, the messages are unlikely to be useful.
> https://github.com/apache/pig/blob/trunk/src/org/apache/pig/builtin/REGEX_EXTRACT.java#L107





[jira] [Commented] (PIG-5271) StackOverflowError when compiling in Tez mode (with union and replicated join)

2017-09-22 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16176391#comment-16176391
 ] 

Koji Noguchi commented on PIG-5271:
---

bq. Koji Noguchi I think you forgot to commit TEZC-Union-22.gld from this 
patch! Could you please commit this file too?

Ouch.  It was from my own patch and I still missed adding this file.  Thanks 
for pointing it out.  Added now.

> StackOverflowError when compiling in Tez mode (with union and replicated join)
> --
>
> Key: PIG-5271
> URL: https://issues.apache.org/jira/browse/PIG-5271
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
> Fix For: 0.18.0
>
> Attachments: pig-5271-v01.patch, pig-5271-v02.patch, 
> pig-5271-v03.patch
>
>
> Sample script
> {code}
> a4 = LOAD 'studentnulltab10k' as (name, age:int, gpa:float);
> a4_1 = filter a4 by gpa is null or gpa >= 3.9;
> a4_2 = filter a4 by gpa < 1;
> b4 = union a4_1, a4_2;
> b4_1 = filter b4 by age < 30;
> b4_2 = foreach b4 generate name, age, FLOOR(gpa) as gpa;
> c4 = load 'voternulltab10k' as (name, age, registration, contributions);
> d4 = join b4_2 by name, c4 by name using 'replicated';
> e4 = foreach d4 generate b4_2::name as name, b4_2::age as age, gpa, registration, contributions;
> f4 = order e4 by name, age DESC;
> store f4 into 'tmp_table_4' ;
> a5_1 = filter a4 by gpa is null or gpa <= 3.9;
> a5_2 = filter a4 by gpa < 2;
> b5 = union a5_1, a5_2;
> d5 = join c4 by name, b5 by name using 'replicated';
> store d5 into 'tmp_table_5' ;
> {code}
> This script fails to compile with StackOverflowError.
> {noformat}
> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)
> Pig Stack Trace
> ---
> ERROR 2998: Unhandled internal error. null
> java.lang.StackOverflowError
> at java.lang.reflect.Constructor.newInstance(Constructor.java:415)
> at java.lang.Class.newInstance(Class.java:442)
> at org.apache.pig.impl.util.Utils.mergeCollection(Utils.java:490)
> at org.apache.pig.impl.plan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:101)
> at org.apache.pig.impl.plan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:105)
> at org.apache.pig.impl.plan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:105)
> at org.apache.pig.impl.plan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:105)
> at org.apache.pig.impl.plan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:105)
> at org.apache.pig.impl.plan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:105)
> ...
> {noformat}





[jira] Subscription: PIG patch available

2017-09-22 Thread jira
Issue Subscription
Filter: PIG patch available (36 issues)

Subscriber: pigdaily

Key         Summary
PIG-5305    Enable yarn-client mode execution of tests in Spark (1) mode
            https://issues.apache.org/jira/browse/PIG-5305
PIG-5302    Remove HttpClient dependency
            https://issues.apache.org/jira/browse/PIG-5302
PIG-5300    hashCode for Bag needs to be order independent
            https://issues.apache.org/jira/browse/PIG-5300
PIG-5273    _SUCCESS file should be created at the end of the job
            https://issues.apache.org/jira/browse/PIG-5273
PIG-5267    Review of org.apache.pig.impl.io.BufferedPositionedInputStream
            https://issues.apache.org/jira/browse/PIG-5267
PIG-5256    Bytecode generation for POFilter and POForeach
            https://issues.apache.org/jira/browse/PIG-5256
PIG-5191    Pig HBase 2.0.0 support
            https://issues.apache.org/jira/browse/PIG-5191
PIG-5160    SchemaTupleFrontend.java is not thread safe, cause PigServer thrown NPE in multithread env
            https://issues.apache.org/jira/browse/PIG-5160
PIG-5115    Builtin AvroStorage generates incorrect avro schema when the same pig field name appears in the alias
            https://issues.apache.org/jira/browse/PIG-5115
PIG-5106    Optimize when mapreduce.input.fileinputformat.input.dir.recursive set to true
            https://issues.apache.org/jira/browse/PIG-5106
PIG-5081    Can not run pig on spark source code distribution
            https://issues.apache.org/jira/browse/PIG-5081
PIG-5080    Support store alias as spark table
            https://issues.apache.org/jira/browse/PIG-5080
PIG-5057    IndexOutOfBoundsException when pig reducer processOnePackageOutput
            https://issues.apache.org/jira/browse/PIG-5057
PIG-5029    Optimize sort case when data is skewed
            https://issues.apache.org/jira/browse/PIG-5029
PIG-4926    Modify the content of start.xml for spark mode
            https://issues.apache.org/jira/browse/PIG-4926
PIG-4913    Reduce jython function initiation during compilation
            https://issues.apache.org/jira/browse/PIG-4913
PIG-4849    pig on tez will cause tez-ui to crash,because the content from timeline server is too long.
            https://issues.apache.org/jira/browse/PIG-4849
PIG-4750    REPLACE_MULTI should compile Pattern once and reuse it
            https://issues.apache.org/jira/browse/PIG-4750
PIG-4684    Exception should be changed to warning when job diagnostics cannot be fetched
            https://issues.apache.org/jira/browse/PIG-4684
PIG-4656    Improve String serialization and comparator performance in BinInterSedes
            https://issues.apache.org/jira/browse/PIG-4656
PIG-4598    Allow user defined plan optimizer rules
            https://issues.apache.org/jira/browse/PIG-4598
PIG-4551    Partition filter is not pushed down in case of SPLIT
            https://issues.apache.org/jira/browse/PIG-4551
PIG-4539    New PigUnit
            https://issues.apache.org/jira/browse/PIG-4539
PIG-4515    org.apache.pig.builtin.Distinct throws ClassCastException
            https://issues.apache.org/jira/browse/PIG-4515
PIG-4323    PackageConverter hanging in Spark
            https://issues.apache.org/jira/browse/PIG-4323
PIG-4313    StackOverflowError in LIMIT operation on Spark
            https://issues.apache.org/jira/browse/PIG-4313
PIG-4251    Pig on Storm
            https://issues.apache.org/jira/browse/PIG-4251
PIG-4002    Disable combiner when map-side aggregation is used
            https://issues.apache.org/jira/browse/PIG-4002
PIG-3952    PigStorage accepts '-tagSplit' to return full split information
            https://issues.apache.org/jira/browse/PIG-3952
PIG-3911    Define unique fields with @OutputSchema
            https://issues.apache.org/jira/browse/PIG-3911
PIG-3877    Getting Geo Latitude/Longitude from Address Lines
            https://issues.apache.org/jira/browse/PIG-3877
PIG-3873    Geo distance calculation using Haversine
            https://issues.apache.org/jira/browse/PIG-3873
PIG-3864    ToDate(userstring, format, timezone) computes DateTime with strange handling of Daylight Saving Time with location based timezones
            https://issues.apache.org/jira/browse/PIG-3864
PIG-3668    COR built-in function when atleast one of the coefficient values is NaN
            https://issues.apache.org/jira/browse/PIG-3668
PIG-3587    add functionality for rolling over dates
            https://issues.apache.org/jira/browse/PIG-3587
PIG-1804    Alow Jython function to implement Algebraic and/or Accumulator interfaces
            https://issues.apache.org/jira/browse/PIG-1804

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=16328=12322384


[jira] [Updated] (PIG-5272) BagToTuple Output Schema

2017-09-22 Thread Joshua Juen (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joshua Juen updated PIG-5272:
-
Attachment: BagToTupleSchema.patch

> BagToTuple Output Schema
> 
>
> Key: PIG-5272
> URL: https://issues.apache.org/jira/browse/PIG-5272
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.17.0
>Reporter: Joshua Juen
>Priority: Minor
> Fix For: 0.18.0
>
> Attachments: BagToTupleSchema.patch
>
>
> The output schema from BagToTuple is nonsensical, which causes problems when
> the tuple is used later in the same script.
> For example: given a bag { data:chararray }, calling BagToTuple yields the
> schema ( data:chararray ).
> But this makes no sense: if the above bag contains the entries {data1, data2,
> data3}, the output tuple from BagToTuple will be
> (data1:chararray, data2:chararray, data3:chararray) != (data:chararray), the
> declared output schema from the UDF.
> Unfortunately, the schema of the tuple cannot be known during the initial
> validation phase. Thus, I believe the output schema from the UDF should be
> modified to be type tuple without the number of fields being fixed to the
> number of columns in the input bag.
> With the current schema, the elements in the tuple cannot be accessed in the
> script after calling BagToTuple without getting an incompatible type error.
> We have modified the UDF in our internal UDF jars to work around the issue.
> Let me know if this sounds reasonable and I can generate the patch.
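
The mismatch described above can be reproduced with a small model. This is a sketch in Python, not the UDF's Java code: bag_to_tuple and declared_schema are hypothetical stand-ins, with a bag modelled as a list of tuples and the declared schema as a tuple of field names.

```python
def bag_to_tuple(bag):
    """Flatten a bag (modelled as a list of tuples) into one tuple, field by field."""
    out = []
    for t in bag:
        out.extend(t)
    return tuple(out)


# Schema as declared up front: one chararray field, mirroring the bag's inner schema.
declared_schema = ("data",)

# Contents only known at run time: three one-field tuples.
bag = [("data1",), ("data2",), ("data3",)]
result = bag_to_tuple(bag)

# The arity of the result follows the bag's size, not the declared schema.
arity_matches = len(result) == len(declared_schema)
```

Here result has three fields while the declared schema has one, so arity_matches is False. A bag with a different number of entries would give yet another arity, which is why no fixed field list can be right at validation time.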





[jira] [Updated] (PIG-5272) BagToTuple Output Schema

2017-09-22 Thread Joshua Juen (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joshua Juen updated PIG-5272:
-
Flags: Patch
   Patch Info: Patch Available
Affects Version/s: 0.17.0
Fix Version/s: 0.18.0
  Summary: BagToTuple Output Schema  (was: BagToString Output 
Schema)

> BagToTuple Output Schema
> 
>
> Key: PIG-5272
> URL: https://issues.apache.org/jira/browse/PIG-5272
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.17.0
>Reporter: Joshua Juen
>Priority: Minor
> Fix For: 0.18.0
>
> Attachments: BagToTupleSchema.patch
>
>
> The output schema from BagToTuple is nonsensical, which causes problems when
> the tuple is used later in the same script.
> For example: given a bag { data:chararray }, calling BagToTuple yields the
> schema ( data:chararray ).
> But this makes no sense: if the above bag contains the entries {data1, data2,
> data3}, the output tuple from BagToTuple will be
> (data1:chararray, data2:chararray, data3:chararray) != (data:chararray), the
> declared output schema from the UDF.
> Unfortunately, the schema of the tuple cannot be known during the initial
> validation phase. Thus, I believe the output schema from the UDF should be
> modified to be type tuple without the number of fields being fixed to the
> number of columns in the input bag.
> With the current schema, the elements in the tuple cannot be accessed in the
> script after calling BagToTuple without getting an incompatible type error.
> We have modified the UDF in our internal UDF jars to work around the issue.
> Let me know if this sounds reasonable and I can generate the patch.





[jira] [Updated] (PIG-5272) BagToTuple Output Schema

2017-09-22 Thread Joshua Juen (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joshua Juen updated PIG-5272:
-
  Labels: patch  (was: )
Release Note: Removed Incorrect Schema Definition from BagToTuple
  Status: Patch Available  (was: Open)

> BagToTuple Output Schema
> 
>
> Key: PIG-5272
> URL: https://issues.apache.org/jira/browse/PIG-5272
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.17.0
>Reporter: Joshua Juen
>Priority: Minor
>  Labels: patch
> Fix For: 0.18.0
>
> Attachments: BagToTupleSchema.patch
>
>
> The output schema from BagToTuple is nonsensical, which causes problems when
> the tuple is used later in the same script.
> For example: given a bag { data:chararray }, calling BagToTuple yields the
> schema ( data:chararray ).
> But this makes no sense: if the above bag contains the entries {data1, data2,
> data3}, the output tuple from BagToTuple will be
> (data1:chararray, data2:chararray, data3:chararray) != (data:chararray), the
> declared output schema from the UDF.
> Unfortunately, the schema of the tuple cannot be known during the initial
> validation phase. Thus, I believe the output schema from the UDF should be
> modified to be type tuple without the number of fields being fixed to the
> number of columns in the input bag.
> With the current schema, the elements in the tuple cannot be accessed in the
> script after calling BagToTuple without getting an incompatible type error.
> We have modified the UDF in our internal UDF jars to work around the issue.
> Let me know if this sounds reasonable and I can generate the patch.





Re: Review Request 62422: PIG-4120 Broadcast the index file in case of POMergeCoGroup and POMergeJoin

2017-09-22 Thread Satish Saley

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62422/
---

(Updated Sept. 22, 2017, 10:19 a.m.)


Review request for pig.


Changes
---

added POMergeCogroupTez, addressed Rohini's comments


Bugs: PIG-4120
https://issues.apache.org/jira/browse/PIG-4120


Repository: pig-git


Description
---

PIG-4120 Broadcast the index file in case of POMergeCoGroup and POMergeJoin


Diffs (updated)
-

  
src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeCogroup.java
 f18d47a34 
  
src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeCogroupTez.java
 PRE-CREATION 
  
src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeJoin.java
 815a32586 
  
src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeJoinTez.java
 PRE-CREATION 
  src/org/apache/pig/backend/hadoop/executionengine/tez/plan/TezCompiler.java 
79739e98a 
  src/org/apache/pig/impl/builtin/DefaultIndexableLoader.java a4688e499 
  src/org/apache/pig/impl/builtin/TezIndexableLoader.java PRE-CREATION 
  test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-MergeCogroup-1.gld 
PRE-CREATION 
  test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-MergeJoin-1.gld 
PRE-CREATION 
  test/org/apache/pig/tez/TestTezCompiler.java f99d6f39c 


Diff: https://reviews.apache.org/r/62422/diff/2/

Changes: https://reviews.apache.org/r/62422/diff/1-2/


Testing
---


Thanks,

Satish Saley



[jira] [Updated] (PIG-4120) Broadcast the index file in case of POMergeCoGroup and POMergeJoin

2017-09-22 Thread Satish Subhashrao Saley (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Satish Subhashrao Saley updated PIG-4120:
-
Status: Patch Available  (was: Open)

> Broadcast the index file in case of POMergeCoGroup and POMergeJoin
> --
>
> Key: PIG-4120
> URL: https://issues.apache.org/jira/browse/PIG-4120
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Reporter: Rohini Palaniswamy
>Assignee: Satish Subhashrao Saley
> Fix For: 0.18.0
>
> Attachments: PIG-4120-1.patch, PIG-4120-2.patch
>
>
> Currently merge join and merge cogroup use two DAGs: the first DAG creates 
> the index file in HDFS and the second DAG does the merge join. Similar to 
> replicated join, we can broadcast the index file, cache it, and use it in 
> merge join and merge cogroup. This will give better performance and also 
> eliminate the need for the second DAG.
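
As a rough picture of what the index is for, the following Python sketch models a sorted right-hand input cut into splits, with an index holding the first join key of each split. It is a simplified in-memory model under stated assumptions, not the code in the attached patches: build_index, find_start_split and lookup are hypothetical names, and "broadcasting" is reduced here to holding the index in memory on the probing side.

```python
from bisect import bisect_left


def build_index(sorted_splits):
    """Record (first_key, split_position) for every non-empty split."""
    return [(split[0][0], i) for i, split in enumerate(sorted_splits) if split]


def find_start_split(index, key):
    """Pick the last split whose first key is strictly smaller than `key`.

    Starting one split early keeps duplicates that straddle a split boundary.
    """
    if not index:
        return 0
    first_keys = [k for k, _ in index]
    pos = bisect_left(first_keys, key) - 1
    return index[max(pos, 0)][1]


def lookup(sorted_splits, index, key):
    """Scan forward from the chosen split and collect values for `key`."""
    matches = []
    for split in sorted_splits[find_start_split(index, key):]:
        for k, value in split:
            if k > key:
                return matches
            if k == key:
                matches.append(value)
    return matches


# Right-hand input, already sorted by key and cut into three splits.
splits = [[(1, "a"), (3, "b")], [(3, "c"), (5, "d")], [(7, "e"), (9, "f")]]
index = build_index(splits)
hits_for_3 = lookup(splits, index, 3)
hits_for_8 = lookup(splits, index, 8)
```

In this model the index is small relative to the data, which is what makes caching it next to the probing side plausible.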





[jira] [Commented] (PIG-4120) Broadcast the index file in case of POMergeCoGroup and POMergeJoin

2017-09-22 Thread Satish Subhashrao Saley (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16176757#comment-16176757
 ] 

Satish Subhashrao Saley commented on PIG-4120:
--

Updated patch in review board.

> Broadcast the index file in case of POMergeCoGroup and POMergeJoin
> --
>
> Key: PIG-4120
> URL: https://issues.apache.org/jira/browse/PIG-4120
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Reporter: Rohini Palaniswamy
>Assignee: Satish Subhashrao Saley
> Fix For: 0.18.0
>
> Attachments: PIG-4120-1.patch, PIG-4120-2.patch
>
>
> Currently merge join and merge cogroup use two DAGs: the first DAG creates 
> the index file in HDFS and the second DAG does the merge join. Similar to 
> replicated join, we can broadcast the index file, cache it, and use it in 
> merge join and merge cogroup. This will give better performance and also 
> eliminate the need for the second DAG.





[jira] [Updated] (PIG-4120) Broadcast the index file in case of POMergeCoGroup and POMergeJoin

2017-09-22 Thread Satish Subhashrao Saley (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Satish Subhashrao Saley updated PIG-4120:
-
Attachment: PIG-4120-2.patch

> Broadcast the index file in case of POMergeCoGroup and POMergeJoin
> --
>
> Key: PIG-4120
> URL: https://issues.apache.org/jira/browse/PIG-4120
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Reporter: Rohini Palaniswamy
>Assignee: Satish Subhashrao Saley
> Fix For: 0.18.0
>
> Attachments: PIG-4120-1.patch, PIG-4120-2.patch
>
>
> Currently merge join and merge cogroup use two DAGs: the first DAG creates 
> the index file in HDFS and the second DAG does the merge join. Similar to 
> replicated join, we can broadcast the index file, cache it, and use it in 
> merge join and merge cogroup. This will give better performance and also 
> eliminate the need for the second DAG.


