[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...

2014-07-15 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1415 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enab

[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...

2014-07-15 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-49114762 Ok I'm merging this one. Thanks guys.

[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...

2014-07-15 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-49100573 FYI filed JIRA: https://issues.apache.org/jira/browse/SPARK-2496 Compression streams should write its codec info to the stream

[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...

2014-07-15 Thread vanzin
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-49100370 Only the codec names are stored in the event logs; no other information is currently recorded. But this change isn't really breaking anything in that area. (And, by defaul

[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...

2014-07-15 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-49099980 Yea - stability seems much more important than a small performance gain

[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...

2014-07-15 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-49090832 @rxin IIRC at one point we changed this before and it caused a performance regression for our perf suite so we reverted it. At the time I think we were running on smalle

[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...

2014-07-15 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-49006312 Can't comment on Tachyon since we don't use it and have no experience with it, unfortunately. I am fine with the rest of this change.

[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...

2014-07-15 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-49005883 Yea the test failure isn't related. If there is no objection, I'm going to merge this tomorrow. I will file a jira ticket so we can prepend compression codec inform

[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...

2014-07-15 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-49005818 Ah yes, the block size is only used at compression time and is inferred from the stream during decompression. Then the class name alone should be sufficient.

[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...

2014-07-15 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-49005728 Weird, those test failures are unrelated to this change.

[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...

2014-07-15 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-49001592 QA results for PR 1415:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes
For more information see test output: https://amplab.c

[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...

2014-07-15 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-48998038 We should create a JIRA so compression streams use the first few bytes to track the compression codec and various settings it needs (for lzf/snappy/lz4, there isn't any).
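
Editor's illustration of the idea in the comment above (using the first few bytes of a compressed stream to record which codec wrote it). This is only a minimal sketch, not Spark's implementation: the one-byte IDs, the names `write_block` and `read_block`, and the use of Python's standard-library codecs as stand-ins for lzf/snappy/lz4 are all assumptions made for the example.

```python
import bz2
import lzma
import zlib

# Hypothetical one-byte codec IDs. The stdlib codecs below are stand-ins
# for lzf/snappy/lz4, which are not part of the Python standard library.
CODEC_IDS = {"zlib": 1, "bz2": 2, "lzma": 3}
CODECS = {
    1: (zlib.compress, zlib.decompress),
    2: (bz2.compress, bz2.decompress),
    3: (lzma.compress, lzma.decompress),
}


def write_block(data: bytes, codec: str) -> bytes:
    """Compress data and prepend a one-byte header naming the codec."""
    codec_id = CODEC_IDS[codec]
    compress, _ = CODECS[codec_id]
    return bytes([codec_id]) + compress(data)


def read_block(blob: bytes) -> bytes:
    """Read the header byte, pick the matching codec, and decompress."""
    if not blob:
        raise ValueError("empty block")
    codec_id = blob[0]
    if codec_id not in CODECS:
        raise ValueError(f"unknown codec id {codec_id}")
    _, decompress = CODECS[codec_id]
    return decompress(blob[1:])
```

With a header like this, a reader does not need to know the writer's configured default, so changing the default codec would not strand data written under the old one.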

[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...

2014-07-14 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-48996763 @andrewor14 do we also log the block size, etc. of the codec used? If yes, then at least for event data we should be fine. IIRC we use the codec to compress

[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...

2014-07-14 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-48996256 Yes, we log the codec used in a separate file so we don't lock ourselves out of our old event logs. This change seems fine.

[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...

2014-07-14 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-48996149 I looked into the event logger code and it appears that the codec change should be fine. It figures out the codec for old data automatically anyway.