[Bug 63371] Wikipedia Zero job for 2014-03-01 failed on Hadoop with java.io.IOException: stored gzip size doesn't match decompressed size

2014-04-02 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=63371

christ...@quelltextlich.at changed:

   What       | Removed | Added
   -----------|---------|---------
   Status     | NEW     | RESOLVED
   Resolution | ---     | FIXED

--- Comment #5 from christ...@quelltextlich.at ---
I recomputed the data for a few days using the new pig.jar, and the output
matched the data we received from the old jar.

Logs did not show any peculiarities with the new pig.jar.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


2014-04-02 Thread bugzilla-daemon

--- Comment #6 from Toby Negrin tneg...@wikimedia.org ---
Thanks Christian -- nice work.

-Toby



2014-04-01 Thread bugzilla-daemon

--- Comment #1 from Bingle bingle-ad...@wikimedia.org ---
Prioritization and scheduling of this bug is tracked on Mingle card
https://wikimedia.mingle.thoughtworks.com/projects/analytics/cards/cards/1505



2014-04-01 Thread bugzilla-daemon

--- Comment #2 from christ...@quelltextlich.at ---
Rerunning the job gave the same result, so it's probably not a transient
failure.



2014-04-01 Thread bugzilla-daemon

--- Comment #3 from christ...@quelltextlich.at ---
Mhmm ... the uncompressed zero files for today are, for the first time,
> 2^32 bytes. Trimming each file to below 2^32 bytes makes things
work again.

So our big data tooling cannot handle data larger than a 32-bit size field?

And it's 1st April ... epic :-D
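The 2^32 boundary is baked into the gzip format itself: per RFC 1952, the ISIZE field in the trailer stores the uncompressed length modulo 2^32, so any decompressor that compares ISIZE against the actual decompressed length will report a mismatch once a file reaches 4 GiB. A small Python sketch of where the wrap comes from (the 5 GiB figure is illustrative, not a measurement from this job):

```python
import gzip
import io
import struct

# Compress a small payload in memory and inspect the gzip trailer.
data = b"x" * 1000
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as f:
    f.write(data)

# The trailer ends with CRC32 then ISIZE, each 4 bytes little-endian;
# ISIZE is the uncompressed size modulo 2^32 (RFC 1952).
isize = struct.unpack("<I", buf.getvalue()[-4:])[0]
print(isize)  # 1000 -- matches, since the payload is well under 4 GiB

# For a hypothetical 5 GiB uncompressed file, ISIZE would wrap to 1 GiB,
# so a strict "stored size == decompressed size" check must fail.
print((5 * 2**30) % 2**32)  # 1073741824
```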



2014-04-01 Thread bugzilla-daemon

--- Comment #4 from christ...@quelltextlich.at ---
The upstream bug seems to be
  https://issues.apache.org/jira/browse/HADOOP-8900

That fix is included in Hadoop 1.2.0, but the Pig snapshot version we used
up to now for Wikipedia Zero bundles a Hadoop < 1.2.0.

Rebuilding the current Pig head from sources also pulls in a Hadoop < 1.2.0.

Cloudera picks up the upstream fix with CDH 4.2.0. However, the CDH
4.2.0 pig jar from

  https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/pig/pig/0.10.0-cdh4.2.0/pig-0.10.0-cdh4.2.0.jar

does not include dependencies and fails with

  Exception in thread "main" java.lang.NoClassDefFoundError:
  jline/ConsoleReaderInputStream
  at java.lang.Class.getDeclaredMethods0(Native Method)
  [...]

Adding all dependencies by hand would be heavy lifting.

However, Cloudera's archive at

  http://archive-primary.cloudera.com/cdh4/cdh/4/pig-0.10.0-cdh4.2.0.tar.gz

holds the full source tree after the build has completed. So in that archive

  pig-0.10.0-cdh4.2.0.jar

is the jar with full dependencies that can be used to run pig in local
mode without having to extend the classpath by hand.

Using that jar, the carrier file could be generated again.

I'll do some more tests tomorrow to make sure the switch in the Pig
version used does not affect the numbers.
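For reference, the workaround described above can be sketched as the following shell steps (the script name zero_carrier.pig is a placeholder for the actual Wikipedia Zero Pig script, and `java -jar` assumes the bundled jar's manifest declares Pig's main class, which the tarball jar was observed to do):

```shell
# Fetch Cloudera's CDH 4.2.0 Pig tarball; its bundled jar ships with all
# dependencies, unlike the bare jar from the Maven-style repository.
wget http://archive-primary.cloudera.com/cdh4/cdh/4/pig-0.10.0-cdh4.2.0.tar.gz
tar -xzf pig-0.10.0-cdh4.2.0.tar.gz

# Run a Pig script in local mode without assembling a classpath by hand.
java -jar pig-0.10.0-cdh4.2.0/pig-0.10.0-cdh4.2.0.jar -x local zero_carrier.pig
```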
