[jira] [Commented] (CRUNCH-179) Add a properly typed union() method to PCollection and PTable

2013-03-06 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13595644#comment-13595644 ] Gabriel Reid commented on CRUNCH-179: - Sounds like a very good plan, those compiler

[jira] [Updated] (CRUNCH-180) Crunch archetype uses two different versions of commons-codec

2013-03-12 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated CRUNCH-180: Attachment: CRUNCH-180.patch Trivial patch. Any objections to this? Crunch

[jira] [Resolved] (CRUNCH-191) Object reuse bug in o.a.c.lib.Distinct

2013-04-05 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid resolved CRUNCH-191. - Resolution: Fixed Fix Version/s: 0.6.0 Assignee: Gabriel Reid Pushed to master

[jira] [Commented] (CRUNCH-192) Document and enforce the semantics around reducer-based Iterables

2013-04-08 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13625414#comment-13625414 ] Gabriel Reid commented on CRUNCH-192: - Ok, good point, I'll add it in to the in-memory

[jira] [Commented] (CRUNCH-192) Document and enforce the semantics around reducer-based Iterables

2013-04-11 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628928#comment-13628928 ] Gabriel Reid commented on CRUNCH-192: - Yep, good point, will do.

[jira] [Updated] (CRUNCH-192) Document and enforce the semantics around reducer-based Iterables

2013-04-11 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated CRUNCH-192: Attachment: CRUNCH-192.patch.v3 Updated patch that moves the SingleUseIterable to o.a.c.impl.

[jira] [Resolved] (CRUNCH-192) Document and enforce the semantics around reducer-based Iterables

2013-04-11 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid resolved CRUNCH-192. - Resolution: Fixed Fix Version/s: 0.6.0 Assignee: Gabriel Reid Pushed to master

[jira] [Commented] (CRUNCH-129) Cache the Iterable values for each key when a groupByKey op has multiple children

2013-04-11 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628972#comment-13628972 ] Gabriel Reid commented on CRUNCH-129: - [~joshwills] are these both (i.e. the title and

[jira] [Updated] (CRUNCH-211) Add one-to-many join functionality

2013-06-02 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated CRUNCH-211: Attachment: CRUNCH-211.patch Updated patch with full classnames removed -- thanks for catching

[jira] [Commented] (CRUNCH-91) Enable custom output file naming

2013-06-07 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13677892#comment-13677892 ] Gabriel Reid commented on CRUNCH-91: [~rem120] No, I haven't looked at that in quite a

[jira] [Created] (CRUNCH-213) Add sharded join functionality

2013-06-07 Thread Gabriel Reid (JIRA)
Gabriel Reid created CRUNCH-213: --- Summary: Add sharded join functionality Key: CRUNCH-213 URL: https://issues.apache.org/jira/browse/CRUNCH-213 Project: Crunch Issue Type: New Feature

[jira] [Updated] (CRUNCH-213) Add sharded join functionality

2013-06-07 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated CRUNCH-213: Attachment: CRUNCH-213.patch Patch to introduce sharded joins. The join code is also pretty

[jira] [Commented] (CRUNCH-213) Add sharded join functionality

2013-06-07 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13678450#comment-13678450 ] Gabriel Reid commented on CRUNCH-213: - Ummm...thanks? [uncomfortable silence]

[jira] [Commented] (CRUNCH-213) Add sharded join functionality

2013-06-07 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13678458#comment-13678458 ] Gabriel Reid commented on CRUNCH-213: - Hmm, good question. Well, it works on hadoop-2

[jira] [Commented] (CRUNCH-213) Add sharded join functionality

2013-06-08 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13678696#comment-13678696 ] Gabriel Reid commented on CRUNCH-213: - I've run the integration tests with both

[jira] [Updated] (CRUNCH-213) Add sharded join functionality

2013-06-08 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated CRUNCH-213: Attachment: CRUNCH-213.patch Updated patch that provides the TaskAttemptID in MemCollection for

[jira] [Commented] (CRUNCH-212) Need target wrapper for HFileOuptutFormat

2013-06-08 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13678850#comment-13678850 ] Gabriel Reid commented on CRUNCH-212: - That's correct about the total ordering --

[jira] [Updated] (CRUNCH-215) Add BloomFilterJoinStrategy

2013-06-09 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated CRUNCH-215: Attachment: CRUNCH-215.patch Patch to introduce BloomFilterJoinStrategy Add

[jira] [Commented] (CRUNCH-205) Remove superfluous directory from source distribution

2013-07-18 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712078#comment-13712078 ] Gabriel Reid commented on CRUNCH-205: - [~mafr] good point, this test does seem to come

[jira] [Commented] (CRUNCH-174) Add support for join3 and cogroup3

2013-07-18 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712091#comment-13712091 ] Gabriel Reid commented on CRUNCH-174: - Yep, exactly -- I just realized the same thing

[jira] [Commented] (CRUNCH-240) Make DefaultJoinStrategy.join(PTable, PTable, JoinFn) public

2013-07-19 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713377#comment-13713377 ] Gabriel Reid commented on CRUNCH-240: - Yep, definitely seems fine to me. The reason I

[jira] [Updated] (CRUNCH-174) Add support for cogroup3

2013-07-25 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated CRUNCH-174: Description: This seemed like a nice starter JIRA: it would be great to have the three (and even

[jira] [Commented] (CRUNCH-212) Need target wrapper for HFileOuptutFormat

2013-07-31 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725616#comment-13725616 ] Gabriel Reid commented on CRUNCH-212: - Cool, just took it for a mini test-run in local

[jira] [Commented] (CRUNCH-216) Transpose arguments in MapsideJoinStrategy.join

2013-08-19 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13743580#comment-13743580 ] Gabriel Reid commented on CRUNCH-216: - Yeah, this didn't make it into the 0.7.0, so I

[jira] [Commented] (CRUNCH-256) SequentialFileNamingScheme should cache the # of files in the target directory after the first read

2013-08-23 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748343#comment-13748343 ] Gabriel Reid commented on CRUNCH-256: - The one potential issue that I can see with

[jira] [Commented] (CRUNCH-256) SequentialFileNamingScheme should cache the # of files in the target directory after the first read

2013-08-23 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748719#comment-13748719 ] Gabriel Reid commented on CRUNCH-256: - Good point, that sounds like the best way to

[jira] [Commented] (CRUNCH-269) Allow clients to disable deep copies on intermediate DoFn outputs

2013-09-21 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773828#comment-13773828 ] Gabriel Reid commented on CRUNCH-269: - Any idea on the actual slowdown caused by the

[jira] [Commented] (CRUNCH-264) Writing to TextFileTarget map side does not show up in plan

2013-09-29 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781461#comment-13781461 ] Gabriel Reid commented on CRUNCH-264: - +1 My only comment would be that the

[jira] [Commented] (CRUNCH-278) Improvements to MapsideJoin code

2013-10-10 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791928#comment-13791928 ] Gabriel Reid commented on CRUNCH-278: - Do you see the ReadableSourceBundle as always

[jira] [Commented] (CRUNCH-279) Allow DoFn.process to throw exceptions

2013-10-11 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792452#comment-13792452 ] Gabriel Reid commented on CRUNCH-279: - Yeah, I think that either throwing a

[jira] [Commented] (CRUNCH-278) Improvements to MapsideJoin code

2013-10-12 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793427#comment-13793427 ] Gabriel Reid commented on CRUNCH-278: - Another idea I just had for having an option in

[jira] [Commented] (CRUNCH-278) Improvements to MapsideJoin code

2013-10-14 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793967#comment-13793967 ] Gabriel Reid commented on CRUNCH-278: - I think that there's still something that I

[jira] [Commented] (CRUNCH-278) Improvements to MapsideJoin code

2013-10-15 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13794939#comment-13794939 ] Gabriel Reid commented on CRUNCH-278: - Yeah, I think that that could work for the more

[jira] [Updated] (CRUNCH-283) Add additional job dependency info to the job's dotfile

2013-10-17 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated CRUNCH-283: Attachment: CRUNCH-283.patch +1, that always bugged me that the side-inputs (i.e. MapsideJoin)

[jira] [Commented] (CRUNCH-284) Optimize for minimal disk i/o rather than the number of stages?

2013-10-21 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13800561#comment-13800561 ] Gabriel Reid commented on CRUNCH-284: - +1 to the idea of having a setting to

[jira] [Commented] (CRUNCH-286) ability to specify a different function for combiner reducer

2013-10-25 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805695#comment-13805695 ] Gabriel Reid commented on CRUNCH-286: - I was just thinking about this one again, and

[jira] [Commented] (CRUNCH-286) ability to specify a different function for combiner reducer

2013-10-25 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805713#comment-13805713 ] Gabriel Reid commented on CRUNCH-286: - I don't think it's really necessary to make the

[jira] [Commented] (CRUNCH-286) ability to specify a different function for combiner reducer

2013-10-25 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13806002#comment-13806002 ] Gabriel Reid commented on CRUNCH-286: - The contract for a Combiner is that it can be

[jira] [Commented] (CRUNCH-288) Make it easier to identify the failing job in a long pipeline

2013-10-30 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809584#comment-13809584 ] Gabriel Reid commented on CRUNCH-288: - +1 Make it easier to identify the failing job

[jira] [Created] (CRUNCH-289) Can't materialize PCollection of Avro SpecificRecords

2013-10-30 Thread Gabriel Reid (JIRA)
Gabriel Reid created CRUNCH-289: --- Summary: Can't materialize PCollection of Avro SpecificRecords Key: CRUNCH-289 URL: https://issues.apache.org/jira/browse/CRUNCH-289 Project: Crunch Issue

[jira] [Updated] (CRUNCH-289) Can't materialize PCollection of Avro SpecificRecords

2013-10-30 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated CRUNCH-289: Attachment: CRUNCH-289.patch Patch to resolve the issue -- passes all integration tests, and I've

[jira] [Commented] (CRUNCH-286) ability to specify a different function for combiner reducer

2013-10-30 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809630#comment-13809630 ] Gabriel Reid commented on CRUNCH-286: - How are you feeling on this one [~jwills]? I

[jira] [Commented] (CRUNCH-286) ability to specify a different function for combiner reducer

2013-10-31 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13810215#comment-13810215 ] Gabriel Reid commented on CRUNCH-286: - Very weird with that missing comment -- I've

[jira] [Resolved] (CRUNCH-291) Add a toString method on CrunchInputSplit

2013-10-31 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid resolved CRUNCH-291. - Resolution: Fixed Pushed to master Add a toString method on CrunchInputSplit

[jira] [Updated] (CRUNCH-291) Add a toString method on CrunchInputSplit

2013-10-31 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated CRUNCH-291: Fix Version/s: 0.8.0 Assignee: Gabriel Reid Issue Type: Improvement (was: Bug)

[jira] [Resolved] (CRUNCH-289) Can't materialize PCollection of Avro SpecificRecords

2013-10-31 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid resolved CRUNCH-289. - Resolution: Fixed Fix Version/s: 0.8.0 Assignee: Gabriel Reid Pushed to master

[jira] [Resolved] (CRUNCH-286) ability to specify a different function for combiner reducer

2013-11-01 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid resolved CRUNCH-286. - Resolution: Fixed Fix Version/s: 0.8.0 Pushed to master ability to specify a different

[jira] [Updated] (CRUNCH-294) Cost-based job planning

2013-11-16 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated CRUNCH-294: Attachment: jobplan-lopsided.png jobplan-large_s2_s3.png

[jira] [Commented] (CRUNCH-294) Cost-based job planning

2013-11-17 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13824943#comment-13824943 ] Gabriel Reid commented on CRUNCH-294: - Yes, that all sounds very right to me, and I

[jira] [Commented] (CRUNCH-296) Support new distributed execution engines (e.g., Spark)

2013-11-18 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13825429#comment-13825429 ] Gabriel Reid commented on CRUNCH-296: - Looks and sounds very interesting -- I'm

[jira] [Commented] (CRUNCH-294) Cost-based job planning

2013-11-20 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827755#comment-13827755 ] Gabriel Reid commented on CRUNCH-294: - The logic for choosing splits sounds good to

[jira] [Commented] (CRUNCH-294) Cost-based job planning

2013-11-20 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13828134#comment-13828134 ] Gabriel Reid commented on CRUNCH-294: - Yeah, that sounds like an excellent idea. Right

[jira] [Commented] (CRUNCH-294) Cost-based job planning

2013-11-20 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13828180#comment-13828180 ] Gabriel Reid commented on CRUNCH-294: - Smallest materialized point makes the most

[jira] [Commented] (CRUNCH-299) Support predicate pushdown for Parquet sources

2013-11-22 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13830127#comment-13830127 ] Gabriel Reid commented on CRUNCH-299: - FWIW, option A (giving a ColumnRecordFilter to

[jira] [Commented] (CRUNCH-296) Support new distributed execution engines (e.g., Spark)

2013-11-23 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13830767#comment-13830767 ] Gabriel Reid commented on CRUNCH-296: - Cool! About the sorting issue, I think that

[jira] [Commented] (CRUNCH-296) Support new distributed execution engines (e.g., Spark)

2013-11-24 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13830914#comment-13830914 ] Gabriel Reid commented on CRUNCH-296: - Yeah, that sounds right. I'm thinking that this

[jira] [Commented] (CRUNCH-173) Make WritableTypeFamily more compact for composite types

2013-11-25 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13831774#comment-13831774 ] Gabriel Reid commented on CRUNCH-173: - [~stepinto] If you can add it here, it would be

[jira] [Commented] (CRUNCH-296) Support new distributed execution engines (e.g., Spark)

2013-12-10 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844284#comment-13844284 ] Gabriel Reid commented on CRUNCH-296: - Looks good to me in general, although I'm

[jira] [Commented] (CRUNCH-316) Data Corruption when DatumWriter.write() throws MapBufferTooSmallException when called by SafeAvroSerialization

2014-01-01 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13859848#comment-13859848 ] Gabriel Reid commented on CRUNCH-316: - I don't think that creating a new

[jira] [Updated] (CRUNCH-323) Add a section on getDetachedValue to the user guide

2014-01-19 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated CRUNCH-323: Attachment: getDetachedValue.patch Initial version of a section on the motivation and usage of

[jira] [Created] (CRUNCH-323) Add a section on getDetachedValue to the user guide

2014-01-19 Thread Gabriel Reid (JIRA)
Gabriel Reid created CRUNCH-323: --- Summary: Add a section on getDetachedValue to the user guide Key: CRUNCH-323 URL: https://issues.apache.org/jira/browse/CRUNCH-323 Project: Crunch Issue Type:

[jira] [Updated] (CRUNCH-324) Sample.reservoirSample method name is spelled incorrectly

2014-01-19 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated CRUNCH-324: Attachment: CRUNCH-324.patch Trivial patch, in case we do want to make the change. If we don't

[jira] [Commented] (CRUNCH-326) Add section on unit testing to user guide

2014-01-20 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876246#comment-13876246 ] Gabriel Reid commented on CRUNCH-326: - Thanks for the tip about the tabs in markdown,

[jira] [Updated] (CRUNCH-328) Add support for ExtensionRegistry with PTypes.protos

2014-01-21 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated CRUNCH-328: Issue Type: Improvement (was: Bug) Add support for ExtensionRegistry with PTypes.protos

[jira] [Commented] (CRUNCH-328) Add support for ExtensionRegistry with PTypes.protos

2014-01-21 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878347#comment-13878347 ] Gabriel Reid commented on CRUNCH-328: - +1 Add support for ExtensionRegistry with

[jira] [Commented] (CRUNCH-329) Re-add type info to TupleWritable to make fields sort correctly

2014-01-23 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13879886#comment-13879886 ] Gabriel Reid commented on CRUNCH-329: - The general working of the patch looks good to

[jira] [Comment Edited] (CRUNCH-329) Re-add type info to TupleWritable to make fields sort correctly

2014-01-23 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880377#comment-13880377 ] Gabriel Reid edited comment on CRUNCH-329 at 1/23/14 9:24 PM: --

[jira] [Commented] (CRUNCH-329) Re-add type info to TupleWritable to make fields sort correctly

2014-01-23 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880377#comment-13880377 ] Gabriel Reid commented on CRUNCH-329: - {blockquote} what if we made the writable codes

[jira] [Commented] (CRUNCH-330) Use of multiple output counters can be disabled in configuration.

2014-01-23 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880380#comment-13880380 ] Gabriel Reid commented on CRUNCH-330: - +1 Use of multiple output counters can be

[jira] [Commented] (CRUNCH-329) Re-add type info to TupleWritable to make fields sort correctly

2014-01-23 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880443#comment-13880443 ] Gabriel Reid commented on CRUNCH-329: - I think if we would want to throw out the

[jira] [Commented] (CRUNCH-329) Re-add type info to TupleWritable to make fields sort correctly

2014-01-24 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880828#comment-13880828 ] Gabriel Reid commented on CRUNCH-329: - I've been thinking about this one some more,

[jira] [Commented] (CRUNCH-329) Re-add type info to TupleWritable to make fields sort correctly

2014-01-24 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881094#comment-13881094 ] Gabriel Reid commented on CRUNCH-329: - I was thinking that this could be done in

[jira] [Commented] (CRUNCH-332) InMemory impl should call DoFn.configure

2014-01-24 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881132#comment-13881132 ] Gabriel Reid commented on CRUNCH-332: - +1 InMemory impl should call DoFn.configure

[jira] [Commented] (CRUNCH-329) Re-add type info to TupleWritable to make fields sort correctly

2014-02-03 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889602#comment-13889602 ] Gabriel Reid commented on CRUNCH-329: - As far as I see, it's only the custom (i.e.

[jira] [Commented] (CRUNCH-329) Re-add type info to TupleWritable to make fields sort correctly

2014-02-03 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889660#comment-13889660 ] Gabriel Reid commented on CRUNCH-329: - After a look at the code I see what you mean. I

[jira] [Commented] (CRUNCH-329) Re-add type info to TupleWritable to make fields sort correctly

2014-02-03 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889695#comment-13889695 ] Gabriel Reid commented on CRUNCH-329: - Yeah, that definitely sounds better than having

[jira] [Commented] (CRUNCH-338) TupleDeepCopier throws java.lang.ClassCastException: java.util.ArrayList cannot be cast to org.apache.avro.generic.IndexedRecord

2014-02-04 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1389#comment-1389 ] Gabriel Reid commented on CRUNCH-338: - I'm having a hard time reproducing this -- any

[jira] [Commented] (CRUNCH-338) TupleDeepCopier throws java.lang.ClassCastException: java.util.ArrayList cannot be cast to org.apache.avro.generic.IndexedRecord

2014-02-05 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891900#comment-13891900 ] Gabriel Reid commented on CRUNCH-338: - Thanks for the stack trace. I'm still having a

[jira] [Updated] (CRUNCH-338) TupleDeepCopier throws java.lang.ClassCastException: java.util.ArrayList cannot be cast to org.apache.avro.generic.IndexedRecord

2014-02-06 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated CRUNCH-338: Attachment: CRUNCH-338.patch Patch to correctly set the output PType when running Cogroup with a

[jira] [Resolved] (CRUNCH-338) TupleDeepCopier throws java.lang.ClassCastException: java.util.ArrayList cannot be cast to org.apache.avro.generic.IndexedRecord

2014-02-07 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid resolved CRUNCH-338. - Resolution: Fixed Fix Version/s: 0.8.3 0.10.0 Assignee:

[jira] [Commented] (CRUNCH-341) Move test resources used across multiple modules to crunch-test

2014-02-11 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898891#comment-13898891 ] Gabriel Reid commented on CRUNCH-341: - +1 Awesome, all the copies of those files

[jira] [Commented] (CRUNCH-341) Move test resources used across multiple modules to crunch-test

2014-02-12 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899128#comment-13899128 ] Gabriel Reid commented on CRUNCH-341: - I think the rebranding of CRUNCH-188 as you

[jira] [Commented] (CRUNCH-329) Re-add type info to TupleWritable to make fields sort correctly

2014-02-14 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901921#comment-13901921 ] Gabriel Reid commented on CRUNCH-329: - +1, looks good to me, and I like how you got

[jira] [Created] (CRUNCH-344) Full file glob syntax does not work correctly with Crunch

2014-02-15 Thread Gabriel Reid (JIRA)
Gabriel Reid created CRUNCH-344: --- Summary: Full file glob syntax does not work correctly with Crunch Key: CRUNCH-344 URL: https://issues.apache.org/jira/browse/CRUNCH-344 Project: Crunch Issue

[jira] [Updated] (CRUNCH-344) Full file glob syntax does not work correctly with Crunch

2014-02-15 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated CRUNCH-344: Attachment: CRUNCH-344.patch Patch to resolve the issue. URL encoding is used to serialize path

[jira] [Commented] (CRUNCH-216) Transpose arguments in MapsideJoinStrategy.join

2014-02-15 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902503#comment-13902503 ] Gabriel Reid commented on CRUNCH-216: - I'm still regularly kicking myself for

[jira] [Resolved] (CRUNCH-344) Full file glob syntax does not work correctly with Crunch

2014-02-15 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid resolved CRUNCH-344. - Resolution: Fixed Fix Version/s: 0.8.3 0.10.0 Pushed to master and 0.8

[jira] [Commented] (CRUNCH-345) Force materialization of PCollections prior to multi reduce sorts

2014-02-15 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13902550#comment-13902550 ] Gabriel Reid commented on CRUNCH-345: - The HFile-loading tests have the same issue, so

[jira] [Updated] (CRUNCH-216) Transpose arguments in MapsideJoinStrategy.join

2014-02-15 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated CRUNCH-216: Attachment: CRUNCH-216b.patch About reversing both left and right outer joins, only one outer join

[jira] [Commented] (CRUNCH-341) Move test resources used across multiple modules to crunch-test

2014-02-16 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13902745#comment-13902745 ] Gabriel Reid commented on CRUNCH-341: - The intention of pushing things to the 0.8

[jira] [Commented] (CRUNCH-346) Don't deep-copy immutable Writable PTypes

2014-02-16 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13902765#comment-13902765 ] Gabriel Reid commented on CRUNCH-346: - Ok, I'll put together a patch for this. About

[jira] [Resolved] (CRUNCH-216) Transpose arguments in MapsideJoinStrategy.join

2014-02-18 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid resolved CRUNCH-216. - Resolution: Fixed Fix Version/s: 0.10.0 Assignee: Gabriel Reid Pushed to master

[jira] [Commented] (CRUNCH-350) Non-serializable BloomFilter field in BloomFilterJoinStrategy should be marked transient

2014-02-21 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13908080#comment-13908080 ] Gabriel Reid commented on CRUNCH-350: - [~jwills] Any idea how this bug was getting

[jira] [Resolved] (CRUNCH-346) Don't deep-copy immutable Writable PTypes

2014-02-22 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid resolved CRUNCH-346. - Resolution: Fixed Fix Version/s: 0.8.3 0.10.0 Pushed to master and 0.8

[jira] [Commented] (CRUNCH-351) Improve performance of Shard#shard on large records

2014-02-22 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13909543#comment-13909543 ] Gabriel Reid commented on CRUNCH-351: - Looks to me like this will indeed work a lot

[jira] [Commented] (CRUNCH-351) Improve performance of Shard#shard on large records

2014-02-23 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13909754#comment-13909754 ] Gabriel Reid commented on CRUNCH-351: - {quote} I think a constant random seed is

[jira] [Commented] (CRUNCH-351) Improve performance of Shard#shard on large records

2014-02-24 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910448#comment-13910448 ] Gabriel Reid commented on CRUNCH-351: - {quote} I think I have missed something here. I

[jira] [Updated] (CRUNCH-357) Allow AvroMode overrides to be less global

2014-02-26 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated CRUNCH-357: Attachment: CRUNCH-357_immutable.patch Sounds like a good plan. It looks like AvroMode can

[jira] [Commented] (CRUNCH-360) GenericData.Record avro records without schema namespace gets implicit namespace crunch

2014-02-27 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13914466#comment-13914466 ] Gabriel Reid commented on CRUNCH-360: - This seems to be a bug in Avro (see AVRO-1295)

[jira] [Commented] (CRUNCH-360) GenericData.Record avro records without schema namespace gets implicit namespace crunch

2014-02-27 Thread Gabriel Reid (JIRA)
[ https://issues.apache.org/jira/browse/CRUNCH-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13914615#comment-13914615 ] Gabriel Reid commented on CRUNCH-360: - To my knowledge, Crunch doesn't explicitly set
