[GitHub] spark pull request: [SPARK-1888] enhance MEMORY_AND_DISK mode by d...

2014-09-11 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/791#issuecomment-55356385 First, `ensureFreeSpace` (I renamed it to `findToBeDroppedBlocks`) doesn't always return true. If it can't find enough to-be-dropped blocks to free space, it will return

[GitHub] spark pull request: [SPARK-1888] enhance MEMORY_AND_DISK mode by d...

2014-09-11 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/791#issuecomment-55356449 It seems a big change has been made to the memory store; I will digest it and update my PR. --- If your project is set up for it, you can reply to this email and have your reply

[GitHub] spark pull request: [SPARK-1777] Prevent OOMs from single partitio...

2014-09-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/1165#discussion_r17469980 --- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala --- @@ -141,6 +193,93 @@ private class MemoryStore(blockManager: BlockManager

[GitHub] spark pull request: [SPARK-1777] Prevent OOMs from single partitio...

2014-09-14 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/1165#discussion_r17518799 --- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala --- @@ -141,6 +193,93 @@ private class MemoryStore(blockManager: BlockManager

[GitHub] spark pull request: [SPARK-1777] Prevent OOMs from single partitio...

2014-09-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/1165#discussion_r17581688 --- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala --- @@ -141,6 +193,93 @@ private class MemoryStore(blockManager: BlockManager

[GitHub] spark pull request: [SPARK-2096][SQL] support dot notation on arbi...

2014-09-15 Thread cloud-fan
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/2405 [SPARK-2096][SQL] support dot notation on arbitrarily nested array of struct The rule is simple: If you want `a.b` to work, then `a` must be some level of nested array of struct (level 0 means just

[GitHub] spark pull request: [SPARK-2096][SQL] support dot notation on arbi...

2014-09-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/2405#discussion_r17583846 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypes.scala --- @@ -68,36 +72,96 @@ case class GetItem(child

[GitHub] spark pull request: [SPARK-2096][SQL] support dot notation on arbi...

2014-09-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/2405#discussion_r17583870 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypes.scala --- @@ -68,36 +72,96 @@ case class GetItem(child

[GitHub] spark pull request: [SPARK-1777] Prevent OOMs from single partitio...

2014-09-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/1165#discussion_r17584609 --- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala --- @@ -118,21 +118,29 @@ private[spark] class CacheManager(blockManager: BlockManager

[GitHub] spark pull request: [SPARK-1777] Prevent OOMs from single partitio...

2014-09-16 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/1165#discussion_r17584855 --- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala --- @@ -118,21 +118,29 @@ private[spark] class CacheManager(blockManager: BlockManager

[GitHub] spark pull request: [SPARK-2096][SQL] support dot notation on arbi...

2014-09-16 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/2405#issuecomment-55840951 Hmmm, I didn't create the class `NonASCIICharacterChecker`... This fix also works for hql, but I'm not sure where to put the test case, any ideas?

[GitHub] spark pull request: [SPARK-1777] Prevent OOMs from single partitio...

2014-09-16 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/1165#discussion_r17645275 --- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala --- @@ -118,21 +118,29 @@ private[spark] class CacheManager(blockManager: BlockManager

[GitHub] spark pull request: [SPARK-1777] Prevent OOMs from single partitio...

2014-09-16 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/1165#issuecomment-55850975 Hi @andrewor14, I thought about your fix, focusing on when to release unrollMemory. If we unroll a partition successfully, currently we release unrollMemory

[GitHub] spark pull request: [SPARK-1777] Prevent OOMs from single partitio...

2014-09-16 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/1165#discussion_r17646960 --- Diff: core/src/main/scala/org/apache/spark/util/collection/SizeTrackingVector.scala --- @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-2096][SQL] support dot notation on arbi...

2014-09-17 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/2405#issuecomment-55987610 @yhuai It's hard to define the semantics of f1.f11 f2.f22 as they are arbitrarily nested arrays. What if the array sizes are not equal? What if the nested level

[GitHub] spark pull request: [SPARK-1888] enhance MEMORY_AND_DISK mode by d...

2014-09-19 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/791#issuecomment-56164031 Hi @liyezhang556520 , thanks for pointing this out! I have updated my PR, please review @andrewor14

[GitHub] spark pull request: [SPARK-1888] enhance MEMORY_AND_DISK mode by d...

2014-09-22 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/791#issuecomment-56334684 @liyezhang556520 Thanks for your comments. 1) Yes, the logic is not the same as the original intention. I have updated my PR to fix this. 2) the original logic

[GitHub] spark pull request: [SPARK-3414][SQL] Replace LowerCaseSchema with...

2014-09-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/2382#discussion_r18017841 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -96,21 +103,69 @@ abstract class LogicalPlan

[GitHub] spark pull request: [SQL] Correctly check case sensitivity in GetF...

2014-09-26 Thread cloud-fan
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/2543 [SQL] Correctly check case sensitivity in GetField This PR is a follow-up to https://github.com/apache/spark/pull/2382 . It fixes a bug when resolving something like `a.b[0].c.d`, https

[GitHub] spark pull request: [SQL] Correctly check case sensitivity in GetF...

2014-09-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/2543#discussion_r18078324 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -123,11 +123,7 @@ abstract class LogicalPlan

[GitHub] spark pull request: [SQL] Correctly check case sensitivity in GetF...

2014-09-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/2543#discussion_r18078436 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -118,6 +119,19 @@ class Analyzer(catalog: Catalog

[GitHub] spark pull request: [SPARK-3414][SQL] Replace LowerCaseSchema with...

2014-09-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/2382#discussion_r18078535 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -96,21 +103,69 @@ abstract class LogicalPlan

[GitHub] spark pull request: [SPARK-3688][SQL]LogicalPlan can't resolve col...

2014-09-26 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/2542#issuecomment-56940392 Nice catch! I think the problem is: for a table(a string, b string), when we run select a.b from test a join test b, we have 2 options to resolve `a.b`. One is table

[GitHub] spark pull request: [SPARK-3688][SQL]LogicalPlan can't resolve col...

2014-09-26 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/2542#issuecomment-56952680 @tianyi Let me raise an example. For `table: { a: { c: String }, b: String }`, if we run `select a.b from test a join test b`, your PR will still give 2 options

[GitHub] spark pull request: [SPARK-3698][SQL] Correctly check case sensiti...

2014-09-27 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/2543#discussion_r18126983 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypes.scala --- @@ -73,31 +75,35 @@ case class GetItem(child

[GitHub] spark pull request: [SPARK-3698][SQL] Correctly check case sensiti...

2014-09-27 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/2543#discussion_r18127040 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala --- @@ -366,7 +366,7 @@ class SqlParser extends StandardTokenParsers

[GitHub] spark pull request: [SPARK-3698][SQL] Correctly check case sensiti...

2014-09-27 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/2543#discussion_r18127069 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypes.scala --- @@ -73,31 +75,35 @@ case class GetItem(child

[GitHub] spark pull request: [SPARK-3688][SQL]LogicalPlan can't resolve col...

2014-09-27 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/2542#issuecomment-57073139 @liancheng Hmm... I don't have a hive environment for testing... CREATE TABLE t1(x INT); CREATE TABLE t2(a STRUCT<x: INT>, k INT); SELECT a.x FROM

[GitHub] spark pull request: [SPARK-3688][SQL]LogicalPlan can't resolve col...

2014-09-27 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/2542#issuecomment-57074476 @tianyi CREATE TABLE t1(x INT); CREATE TABLE t2(a STRUCT<x: INT>, k INT); SELECT a.x FROM t1 a JOIN t2 b ON a.x = b.k; But hive can

[GitHub] spark pull request: [SPARK-3698][SQL] Correctly check case sensiti...

2014-09-28 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/2543#issuecomment-57110457 Hi @marmbrus, I have updated my PR according to your comments. Would you mind reviewing it again?

[GitHub] spark pull request: [SPARK-3720][SQL]initial support ORC in spark ...

2014-09-29 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/2576#issuecomment-57261596 Have you considered cooperating with https://github.com/apache/spark/pull/2475?

[GitHub] spark pull request: [SPARK-2096][SQL] support dot notation on arbi...

2014-10-02 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/2405#issuecomment-57647672 I think we can just handle one level of nested array to fix SPARK-2096. What about adding a rule to use another type of `GetField` to handle array of struct? So that we

[GitHub] spark pull request: [SPARK-3698][SQL] Correctly check case sensiti...

2014-10-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/2543#discussion_r18564336 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -118,6 +120,19 @@ class Analyzer(catalog: Catalog

[GitHub] spark pull request: [SPARK-3688][SQL]LogicalPlan can't resolve col...

2014-10-08 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/2542#issuecomment-58338302 I tried this on hive: CREATE TABLE t1(x INT); CREATE TABLE t2(a STRUCT<x: INT>, k INT); SELECT a.x FROM t1 a JOIN t2 b; And hive can

[GitHub] spark pull request: [SPARK-3959][SPARK-3960][SQL] SqlParser fails ...

2014-10-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/2816#discussion_r18933714 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala --- @@ -300,11 +300,15 @@ class SqlParser extends

[GitHub] spark pull request: [SPARK-3959][SPARK-3960][SQL] SqlParser fails ...

2014-10-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/2816#discussion_r18934453 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala --- @@ -323,13 +327,38 @@ class SqlParser extends

[GitHub] spark pull request: [SPARK-3959][SPARK-3960][SQL] SqlParser fails ...

2014-10-15 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/2816#issuecomment-59304268 Hi @sarutak, nice catch on this problem! I have a small suggestion for the code: protected lazy val positiveNumericLiteral: Parser[Literal

[GitHub] spark pull request: [SPARK-3959][SPARK-3960][SQL] SqlParser fails ...

2014-10-16 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/2816#discussion_r19001459 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala --- @@ -301,33 +301,75 @@ class SqlParser extends

[GitHub] spark pull request: [SPARK-3959][SPARK-3960][SQL] SqlParser fails ...

2014-10-16 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/2816#discussion_r19001491 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala --- @@ -301,33 +301,75 @@ class SqlParser extends

[GitHub] spark pull request: [SPARK-3959][SPARK-3960][SQL] SqlParser fails ...

2014-10-16 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/2816#discussion_r19001551 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala --- @@ -301,33 +301,75 @@ class SqlParser extends

[GitHub] spark pull request: [SPARK-3698][SQL] Correctly check case sensiti...

2014-10-19 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/2543#issuecomment-59674040 Ping @marmbrus @liancheng I have finished the code locally, if you vote for `UnresolvedGetField`, I can push the code immediately.

[GitHub] spark pull request: [SPARK-4052][SQL] Use scala.collection.Map for...

2014-10-22 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/2899#issuecomment-60180896 Just curious about why Scala links `Seq` to `scala.collection.Seq` by default, but links `Map` to `scala.collection.immutable.Map` by default.
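
The practical effect of the `Map` default can be sketched as below. This is a minimal illustration, not code from the pull request; the object and method names are hypothetical. A parameter typed with the unqualified `Map` only accepts immutable maps, while one typed `scala.collection.Map` accepts mutable and immutable maps alike.

```scala
import scala.collection.mutable

object MapAliases {
  // The unqualified `Map` is an alias for scala.collection.immutable.Map.
  def sizeOfDefault(m: Map[String, Int]): Int = m.size

  // scala.collection.Map is the common supertype of mutable and immutable maps.
  def sizeOfGeneral(m: scala.collection.Map[String, Int]): Int = m.size

  def main(args: Array[String]): Unit = {
    val fixed = Map("a" -> 1, "b" -> 2)
    val growing = mutable.Map("a" -> 1)

    println(sizeOfDefault(fixed))
    println(sizeOfGeneral(fixed))
    println(sizeOfGeneral(growing))
    // sizeOfDefault(growing) would not compile: a mutable.Map is not an immutable.Map.
  }
}
```

The `Seq` alias described in the comment applies to the Scala versions current at the time; from Scala 2.13 onward the unqualified `Seq` refers to `scala.collection.immutable.Seq`.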

[GitHub] spark pull request: [SPARK-3698][SQL] Correctly check case sensiti...

2014-10-28 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/2543#issuecomment-60737418 Hi @marmbrus @liancheng, I think it's better to calculate the `ordinal` of `GetField` in the analysis phase, and I have updated the code to introduce

[GitHub] spark pull request: [SPARK-2044] Pluggable interface for shuffles

2014-06-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/1009#discussion_r13688708 --- Diff: core/src/main/scala/org/apache/spark/rdd/ShuffledRDD.scala --- @@ -42,10 +42,11 @@ class ShuffledRDD[K, V, P <: Product2[K, V] : ClassTag

[GitHub] spark pull request: [SPARK-1888] enhance MEMORY_AND_DISK mode by d...

2014-06-19 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/791#issuecomment-46532970 did a manual merge :)

[GitHub] spark pull request: SPARK-1316. Remove use of Commons IO

2014-06-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/1173#discussion_r14062333 --- Diff: core/src/main/scala/org/apache/spark/util/logging/RollingFileAppender.scala --- @@ -83,7 +83,7 @@ private[spark] class RollingFileAppender

[GitHub] spark pull request: Update SQLConf.scala

2014-07-01 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/1272#issuecomment-47624200 With the new `synchronized`s you added in usage, I don't think we need `ConcurrentHashMap` any more. Maybe just a simple `HashMap` is enough.
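
The point about `synchronized` making a concurrent map redundant can be sketched as follows. This is an illustrative sketch under the assumption that every read and write goes through the same lock; `GuardedSettings` and its methods are hypothetical names, not the `SQLConf` code under review.

```scala
import scala.collection.mutable

object GuardedSettings {
  // A plain, non-thread-safe map; safety comes only from the lock below.
  private val settings = mutable.HashMap.empty[String, String]

  // Every access synchronizes on the same monitor, so the map itself
  // does not need to be a ConcurrentHashMap.
  def set(key: String, value: String): Unit = settings.synchronized {
    settings(key) = value
  }

  def get(key: String): Option[String] = settings.synchronized {
    settings.get(key)
  }

  def main(args: Array[String]): Unit = {
    set("spark.sql.shuffle.partitions", "200")
    println(get("spark.sql.shuffle.partitions"))
    println(get("missing.key"))
  }
}
```

The trade-off is that a single unsynchronized access anywhere breaks the guarantee, whereas a concurrent map stays safe per operation even without the external lock.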

[GitHub] spark pull request: Update SQLConf.scala

2014-07-01 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/1272#discussion_r14438619 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala --- @@ -64,20 +63,17 @@ trait SQLConf { } def get(key: String

[GitHub] spark pull request: Update SQLConf.scala

2014-07-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/1272#discussion_r14545788 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala --- @@ -64,20 +64,17 @@ trait SQLConf { } def get(key: String

[GitHub] spark pull request: Update SQLConf.scala

2014-07-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/1272#discussion_r1454 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala --- @@ -64,20 +64,17 @@ trait SQLConf { } def get(key: String

[GitHub] spark pull request: [SPARK-1912] Lazily initialize buffers for loc...

2014-09-01 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/2179#issuecomment-54034960 It's much neater and simpler :+1:

[GitHub] spark pull request: SPARK-2096 Correctly parse dot notations

2014-09-01 Thread cloud-fan
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/2230 SPARK-2096 Correctly parse dot notations First let me write down the current `projections` grammar of spark sql: expression: orExpression orExpression

[GitHub] spark pull request: [SPARK-2096][SQL] Correctly parse dot notation...

2014-09-02 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/2230#issuecomment-54122147 sorry for the code style, fixed! Test again please

[GitHub] spark pull request: [SPARK-2096][SQL] Correctly parse dot notation...

2014-09-03 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/2230#issuecomment-54269552 @marmbrus Sorry for missing the `distinct`. Since we parse the dot in `SqlParser` now, the only possible formats of `name` passed into `LogicalPlan.resolve` are ident

[GitHub] spark pull request: [SPARK-2096][SQL] Correctly parse dot notation...

2014-09-05 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/2230#issuecomment-54601247 @marmbrus It seems the hive parser will pass something like a.b.c... to `LogicalPlan`, so I have to roll back (and I changed `dotExpressionHeader` to `ident . ident {. ident

[GitHub] spark pull request: [SPARK-2096][SQL] Correctly parse dot notation...

2014-09-05 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/2230#issuecomment-54601682 I'm not sure how to modify `lazy val resolved` in `GetField` since it handles not only StructType now. Currently I just removed the type check. What do you think

[GitHub] spark pull request: [SPARK-2096][SQL] Correctly parse dot notation...

2014-09-09 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/2230#issuecomment-54937332 Actually hive doesn't support using dot notation to access fields of nested array, even one level. Anyway, I will put this support in another PR to keep this PR simple

[GitHub] spark pull request: [SPARK-2096][SQL] Correctly parse dot notation...

2014-09-09 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/2230#issuecomment-54942044 The failed test case seems to be a regression test for a new fix. I have rebased to include the new fix. Test again please.

[GitHub] spark pull request: [SPARK-2096][SQL] Correctly parse dot notation...

2014-09-09 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/2230#issuecomment-55068488 rebase done, test again please.

[GitHub] spark pull request: [SPARK-1888] enhance MEMORY_AND_DISK mode by d...

2014-11-10 Thread cloud-fan
Github user cloud-fan closed the pull request at: https://github.com/apache/spark/pull/791

[GitHub] spark pull request: SPARK-1297 Upgrade HBase dependency to 0.98

2014-11-16 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/3115#issuecomment-63264274 The `pom.xml` in `example` should be a good example to demonstrate how to declare hbase dependency when writing spark application for spark users. However

[GitHub] spark pull request: use Iterator#size in RDD#count

2014-05-12 Thread cloud-fan
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/736 use Iterator#size in RDD#count In RDD#count, we used a while loop to get the size of the Iterator because Iterator#size used a for loop, which was slightly slower in that version of Scala
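
The two ways of counting that this pull request compares can be sketched as follows. This is a minimal illustration rather than the code in `RDD#count`; the helper name `countWithWhile` is hypothetical.

```scala
object IteratorCount {
  // Hand-written loop: advance the iterator and bump a counter.
  def countWithWhile[T](iter: Iterator[T]): Long = {
    var n = 0L
    while (iter.hasNext) {
      n += 1L
      iter.next()
    }
    n
  }

  def main(args: Array[String]): Unit = {
    // Both approaches consume the iterator, so each one needs a fresh instance.
    val viaLoop = countWithWhile(Iterator.range(0, 1000))
    val viaSize = Iterator.range(0, 1000).size
    println(s"while loop: $viaLoop, Iterator#size: $viaSize")
  }
}
```

Note that `Iterator#size` returns an `Int` while the loop above accumulates a `Long`, which matters once a partition holds more elements than an `Int` can represent.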

[GitHub] spark pull request: use Iterator#size in RDD#count

2014-05-13 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/736#issuecomment-42955215 @rxin I'm sorry I didn't get a link for that, but I didn't find any discussion about a performance issue with Iterator#size either. I just checked the source code

[GitHub] spark pull request: use Iterator#size in RDD#count

2014-05-13 Thread cloud-fan
Github user cloud-fan closed the pull request at: https://github.com/apache/spark/pull/736

[GitHub] spark pull request: use Iterator#size in RDD#count

2014-05-13 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/736#issuecomment-43039510 I wrote a simple benchmark to test performance, Iterator#size really sucks... Sorry for my mistake, I'll close this pull request :(

[GitHub] spark pull request: improve performance of MemoryStore#tryToPut by...

2014-05-16 Thread cloud-fan
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/791 improve performance of MemoryStore#tryToPut by eliminating unnecessary lock It's inefficient to drop memory blocks to disk inside a synchronized block as IO is slow. As the TODO says, we just need

[GitHub] spark pull request: improve performance of MemoryStore#tryToPut by...

2014-05-17 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/791#issuecomment-43411524 This is thread safe. `tryToPut` calls `ensureFreeSpace` in a synchronized block, so only one thread can run `ensureFreeSpace` at a time, which means each

[GitHub] spark pull request: improve performance of MemoryStore#tryToPut by...

2014-05-19 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/791#issuecomment-43484083 As far as I know, reasons for task failure may be: an exception during task execution, executor loss and relaunch, or stage cancellation by the user. But I'm not sure if I

[GitHub] spark pull request: [SPARK-1888] enhance MEMORY_AND_DISK mode by d...

2014-05-19 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/791#issuecomment-43581928 @mridulm @tdas I have created a JIRA for this: https://issues.apache.org/jira/browse/SPARK-1888

[GitHub] spark pull request: [SPARK-1888] enhance MEMORY_AND_DISK mode by d...

2014-05-20 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/791#issuecomment-43611475 As we know, the memory store is used to add, read, and remove blocks. Reading and removing are quite simple, so let's focus on adding. Adding may trigger dropping action

[GitHub] spark pull request: [SPARK-1888] enhance MEMORY_AND_DISK mode by d...

2014-05-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/791#discussion_r12878553 --- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala --- @@ -243,10 +250,13 @@ private class MemoryStore(blockManager: BlockManager

[GitHub] spark pull request: [SPARK-1888] enhance MEMORY_AND_DISK mode by d...

2014-05-20 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/791#issuecomment-43706083 @mridulm Thanks very much for your comment! I think a big difference is: earlier code called BlockManager#dropFromMemory within putLock, but now we call it in parallel

[GitHub] spark pull request: [SPARK-1888] enhance MEMORY_AND_DISK mode by d...

2014-05-21 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/791#issuecomment-43733545 `ensureFreeSpace` has 2 jobs. 1) iterate entries and select blocks to be dropped. 2) if to-be-dropped blocks can free enough space, mark them as dropping and return
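
The idea discussed in this thread, choosing and marking blocks under the lock while leaving the slow disk write outside it, can be sketched roughly as below. This is a simplified model and not Spark's `MemoryStore`: `Entry`, `selectVictims`, the fixed `capacity`, and the `Option` return type are all assumptions made for the illustration, and block ids are assumed to be unique.

```scala
import scala.collection.mutable

object DropOutsideLock {
  // Hypothetical stand-in for a cached block; not Spark's real entry type.
  final case class Entry(id: String, size: Long)

  private val putLock = new Object
  private val entries = mutable.LinkedHashMap.empty[String, Entry]
  private var used = 0L
  val capacity = 100L

  // Assumes each id is inserted at most once.
  def put(e: Entry): Unit = putLock.synchronized {
    entries(e.id) = e
    used += e.size
  }

  // Under the lock: pick victims in insertion order and, only if they free
  // enough space, remove them from the map so no other thread picks them again.
  def selectVictims(needed: Long): Option[Seq[Entry]] = putLock.synchronized {
    val free = capacity - used
    if (needed <= free) {
      Some(Seq.empty[Entry])
    } else {
      val picked = mutable.ArrayBuffer.empty[Entry]
      var reclaimed = 0L
      val it = entries.valuesIterator
      while (free + reclaimed < needed && it.hasNext) {
        val e = it.next()
        picked += e
        reclaimed += e.size
      }
      if (free + reclaimed >= needed) {
        picked.foreach { e => entries.remove(e.id); used -= e.size }
        Some(picked.toList)
      } else {
        None // not enough droppable blocks; nothing is marked
      }
    }
  }

  def main(args: Array[String]): Unit = {
    put(Entry("a", 60L))
    put(Entry("b", 30L))
    // The slow part (writing victims to disk) would run here, outside putLock.
    selectVictims(50L).foreach(_.foreach(e => println(s"would drop ${e.id}")))
  }
}
```

Because the victims are removed from the shared map before the lock is released, two threads cannot select the same block even though the subsequent IO runs concurrently.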

[GitHub] spark pull request: [SPARK-1888] enhance MEMORY_AND_DISK mode by d...

2014-05-21 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/791#issuecomment-43738827 @mridulm I checked the code of BlockManager#doPut. val putBlockInfo = { val tinfo = new BlockInfo(level, tellMaster) // Do atomically

[GitHub] spark pull request: [SPARK-1888] enhance MEMORY_AND_DISK mode by d...

2014-05-21 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/791#issuecomment-43752165 @tdas you missed an important thing. `tryToPut` calls `ensureFreeSpace` within the putLock, so one thread has to wait for another thread to finish both selecting and marking

[GitHub] spark pull request: [SPARK-1888] enhance MEMORY_AND_DISK mode by d...

2014-05-21 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/791#issuecomment-43778259 @tdas @mridulm what about moving the `putLock.synchronized` into `ensureFreeSpace` and letting `tryToPut` call `ensureFreeSpace` directly? I think it will be more

[GitHub] spark pull request: [SPARK-1888] enhance MEMORY_AND_DISK mode by d...

2014-05-21 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/791#issuecomment-43842578 @tdas I think we shouldn't synchronize on this. When one thread is running `ensureFreeSpace`, others should not get into `ensureFreeSpace`, but should be able to add

[GitHub] spark pull request: [SPARK-1888] enhance MEMORY_AND_DISK mode by d...

2014-05-21 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/791#issuecomment-43842861 @mridulm I checked all callers of MemoryStore#putValues and putBytes via the IDE; it shows that only BlockManager calls them, with the block info synchronized. So maybe we

[GitHub] spark pull request: [SPARK-1888] enhance MEMORY_AND_DISK mode by d...

2014-05-21 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/791#issuecomment-43848168 @mridulm @tdas I have moved `putLock.synchronized` into `ensureFreeSpace` and renamed this method to `getToBeDroppedBlocks`. And I also updated the scaladoc to explain

[GitHub] spark pull request: [SPARK-1912] fix compress memory issue during ...

2014-05-23 Thread cloud-fan
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/860 [SPARK-1912] fix compress memory issue during reduce When we need to read a compressed block, we will first create a compression stream instance (LZF or Snappy) and use it to wrap that block
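
The lazy-initialization approach that this thread converges on can be sketched as follows. This is an illustrative wrapper, not the code merged into `BlockManager`; `LazyInitIterator` is a hypothetical name, and a plain iterator stands in for the compression stream. The expensive resource is created only when the first element is requested, so wrappers that have not been read yet hold no buffers.

```scala
object LazyIterators {
  // `create` is a by-name argument: it is evaluated once, on first use.
  final class LazyInitIterator[T](create: => Iterator[T]) extends Iterator[T] {
    private lazy val underlying: Iterator[T] = create
    override def hasNext: Boolean = underlying.hasNext
    override def next(): T = underlying.next()
  }

  def main(args: Array[String]): Unit = {
    var opened = 0
    // Building many wrappers up front does not open any underlying stream.
    val blocks = (1 to 3).map { i =>
      new LazyInitIterator[Int]({ opened += 1; Iterator(i, i * 10) })
    }
    println(s"opened before reading: $opened")
    val total = blocks.iterator.flatten.sum
    println(s"sum: $total, opened after reading: $opened")
  }
}
```

A `lazy val` adds a small synchronization cost on access, which is the overhead questioned later in the thread; whether it matters depends on how often the wrapper is created and read.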

[GitHub] spark pull request: [SPARK-1912] fix compress memory issue during ...

2014-05-25 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/860#issuecomment-44157148 @rxin Thanks for your advice! I have added the comment and override, and please take a look to see if I missed something. Thanks!

[GitHub] spark pull request: [SPARK-1912] fix compress memory issue during ...

2014-05-29 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/860#issuecomment-44503345 @mateiz That's a good idea! I have moved the lazy iterator into `BlockManager.dataDeserialize`. Thanks for your comments!

[GitHub] spark pull request: [SPARK-1912] fix compress memory issue during ...

2014-06-02 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/860#discussion_r13317607 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -1015,8 +1015,26 @@ private[spark] class BlockManager( bytes

[GitHub] spark pull request: [SPARK-1912] fix compress memory issue during ...

2014-06-02 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/860#issuecomment-44915823 @mateiz Does lazy val have performance overhead? I agree lazy val can make the code clearer here, but dataDeserialize can be called many times if there are lots

[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-06-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/369#discussion_r13376621 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -423,6 +423,18 @@ abstract class RDD[T: ClassTag]( def ++(other: RDD[T]): RDD[T

[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-06-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/369#discussion_r13422474 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -423,6 +423,18 @@ abstract class RDD[T: ClassTag]( def ++(other: RDD[T]): RDD[T

[GitHub] spark pull request: [SPARK-2096][SQL] support dot notation on arbi...

2014-12-09 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/2405#issuecomment-66282781 This PR is blocked by https://github.com/apache/spark/pull/2543. I'll update the code tomorrow and make it work :)

[GitHub] spark pull request: [SPARK-2096][SQL] support dot notation on arra...

2014-12-11 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/2405#issuecomment-66593009 Hi @marmbrus @liancheng, I have updated this PR to support `GetField` on one level of array of struct for now. As I mentioned in https://github.com/apache/spark/pull

[GitHub] spark pull request: [SPARK-2096][SQL] support dot notation on arra...

2014-12-17 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/2405#issuecomment-67432312 Hi @marmbrus, the main reason I want to introduce `UnResolvedGetField` is this: for something like `a.b[0].c.d`, we first parse it to `GetField(GetField(GetItem

[GitHub] spark pull request: [SPARK-4959] [SQL] Attributes are case sensiti...

2014-12-25 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/3796#issuecomment-68121673 Looks like using `AttributeMap` can avoid creating many `AttributeEquals` during map building and searching. Did I miss something here? I'm not so familiar

[GitHub] spark pull request: [SPARK-4945] [SQL] Add overwrite option suppor...

2014-12-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/3780#discussion_r22278028 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala --- @@ -199,11 +199,15 @@ private[sql] abstract class

[GitHub] spark pull request: [SPARK-5259][CORE]Make sure mapStage.pendingta...

2015-01-26 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/4055#issuecomment-71460355 @suyanNone Thanks for explaining the re-submit! What's the Chinese name of HarryZhang? We don't use English names in the lab...

[GitHub] spark pull request: Create SparkAPSP.scala

2015-02-04 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/4362#issuecomment-72813069 Have you checked the GraphX example https://spark.apache.org/docs/latest/graphx-programming-guide.html#pregel-api? It implemented single source shortest path

[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...

2015-02-02 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/4325#discussion_r23987594 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DateUtils.scala --- @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-5278][SQL] complete the check of ambigu...

2015-02-03 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/4068#issuecomment-72613023 Hi @yhuai, I have updated this PR, introducing `UnresolvedGetField` to fix this issue. Do you have time to review it? Thanks!

[GitHub] spark pull request: [HOT FIX] import sparkContext.implicits._ to p...

2015-02-05 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/4391#issuecomment-73012360 ping @rxin

[GitHub] spark pull request: [HOT FIX] import sparkContext.implicits._ to p...

2015-02-05 Thread cloud-fan
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/4391 [HOT FIX] import sparkContext.implicits._ to pass the compile minor You can merge this pull request into a Git repository by running: $ git pull https://github.com/cloud-fan/spark hot

[GitHub] spark pull request: [HOT FIX] import sparkContext.implicits._ to p...

2015-02-05 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/4391#issuecomment-73013698 OK, closing. :)

[GitHub] spark pull request: [HOT FIX] import sparkContext.implicits._ to p...

2015-02-05 Thread cloud-fan
Github user cloud-fan closed the pull request at: https://github.com/apache/spark/pull/4391

[GitHub] spark pull request: [SPARK-2096][SQL] support dot notation on arra...

2015-02-08 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/2405#issuecomment-73356946 Hi @marmbrus, since https://github.com/apache/spark/pull/4068 is merged, it's much simpler to implement this now. Do you have time to review it? Thanks!
