[GitHub] spark issue #18430: [SPARK-21223] Change fileToAppInfo in FsHistoryProvider ...

2017-06-30 Thread zenglinxi0615
Github user zenglinxi0615 commented on the issue:

https://github.com/apache/spark/pull/18430
  
test please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18430: [SPARK-21223]:Thread-safety issue in FsHistoryProvider

2017-06-29 Thread zenglinxi0615
Github user zenglinxi0615 commented on the issue:

https://github.com/apache/spark/pull/18430
  
Sorry, it's a typo; I meant the related JIRA: SPARK-21078.





[GitHub] spark issue #18430: [SPARK-21223]:Thread-safety issue in FsHistoryProvider

2017-06-29 Thread zenglinxi0615
Github user zenglinxi0615 commented on the issue:

https://github.com/apache/spark/pull/18430
  
@srowen thanks for your suggestions again! Should I also address the problem 
of SPARK-13988 in this PR?





[GitHub] spark issue #18430: [SPARK-21223]:Thread-safety issue in FsHistoryProvider

2017-06-28 Thread zenglinxi0615
Github user zenglinxi0615 commented on the issue:

https://github.com/apache/spark/pull/18430
  
@jerryshao actually, this threading issue causes an infinite loop when we 
restart the history server and replay the event logs of Spark apps. You can see 
the jstack log in the attachments of SPARK-21223.





[GitHub] spark issue #18430: [SPARK-21223]:Thread-safety issue in FsHistoryProvider

2017-06-28 Thread zenglinxi0615
Github user zenglinxi0615 commented on the issue:

https://github.com/apache/spark/pull/18430
  
@jerryshao do you mean that after fileToAppInfo.get(entry.getPath()) returns 
a value, other threads may add or change the entry for entry.getPath(), which 
causes an inconsistency?
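The inconsistency in question is the classic check-then-act pattern. A minimal Java sketch (hypothetical map contents; not the actual FsHistoryProvider code) of how a value read from a ConcurrentHashMap can be stale by the time it is used:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CheckThenAct {
    public static void main(String[] args) {
        // Hypothetical stand-in for fileToAppInfo: event-log path -> file size.
        Map<String, Long> fileToAppInfo = new ConcurrentHashMap<>();
        fileToAppInfo.put("/logs/app-1", 100L);

        // Check-then-act: the value read here can be stale by the time it is
        // used, because another thread may update the entry in between.
        long prevFileSize = fileToAppInfo.getOrDefault("/logs/app-1", 0L);
        fileToAppInfo.put("/logs/app-1", 200L); // simulate a concurrent writer

        // prevFileSize still holds the old value: ConcurrentHashMap makes each
        // single operation atomic, but not a get-then-use sequence.
        System.out.println(prevFileSize);                     // 100
        System.out.println(fileToAppInfo.get("/logs/app-1")); // 200
    }
}
```

For the filter in FsHistoryProvider this staleness is tolerable, since a too-small prevFileSize only means the log is scanned again.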





[GitHub] spark pull request #18430: [SPARK-21223]:Thread-safety issue in FsHistoryPro...

2017-06-27 Thread zenglinxi0615
Github user zenglinxi0615 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18430#discussion_r124271386
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
@@ -321,7 +322,7 @@ private[history] class FsHistoryProvider(conf: 
SparkConf, clock: Clock)
   // scan for modified applications, replay and merge them
   val logInfos: Seq[FileStatus] = statusList
 .filter { entry =>
-  val prevFileSize = 
fileToAppInfo.get(entry.getPath()).map{_.fileSize}.getOrElse(0L)
+  val prevFileSize = 
fileToAppInfo.asScala.get(entry.getPath()).map{_.fileSize}.getOrElse(0L)
--- End diff --

@srowen thanks for your suggestions; I have made some modifications. Could 
you take a look when you have time?





[GitHub] spark pull request #18430: [SPARK-21223]:Thread-safety issue in FsHistoryPro...

2017-06-27 Thread zenglinxi0615
GitHub user zenglinxi0615 opened a pull request:

https://github.com/apache/spark/pull/18430

[SPARK-21223]:Thread-safety issue in FsHistoryProvider

## What changes were proposed in this pull request?
Fix the thread-safety issue in FsHistoryProvider.
Currently, the Spark History Server uses a HashMap named fileToAppInfo in the 
class FsHistoryProvider to store the mapping from event-log path to attempt info.
When a thread pool is used to replay the log files in the list and merge the 
list of old applications with the new ones, multiple threads may update 
fileToAppInfo at the same time, which may cause thread-safety issues.
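The race described above can be sketched in plain Java (names are illustrative; the real field is fileToAppInfo in FsHistoryProvider.scala). With a plain java.util.HashMap, concurrent puts from pool threads can corrupt the map or lose entries; a java.util.concurrent.ConcurrentHashMap makes each put atomic:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ReplayPool {
    public static void main(String[] args) throws InterruptedException {
        // Thread-safe replacement for the plain HashMap: each put is atomic,
        // so replay threads can publish entries without corrupting the map.
        Map<String, Long> fileToAppInfo = new ConcurrentHashMap<>();

        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 1000; i++) {
            final String path = "/eventlogs/app-" + i;
            pool.submit(() -> fileToAppInfo.put(path, 0L)); // one entry per replayed log
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);

        System.out.println(fileToAppInfo.size()); // 1000
    }
}
```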

## How was this patch tested?



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zenglinxi0615/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18430.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18430


commit d2b3c960012403fcc9be6fbd33f74f395d879f9d
Author: 曾林西 
Date:   2017-06-27T07:29:44Z

[SPARK-21223]:Thread-safety issue in FsHistoryProvider







[GitHub] spark pull request #14085: [SPARK-16408][SQL] SparkSQL Added file get Except...

2017-06-18 Thread zenglinxi0615
Github user zenglinxi0615 commented on a diff in the pull request:

https://github.com/apache/spark/pull/14085#discussion_r122620464
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/commands.scala ---
@@ -113,8 +113,9 @@ case class AddFile(path: String) extends 
RunnableCommand {
 
   override def run(sqlContext: SQLContext): Seq[Row] = {
 val hiveContext = sqlContext.asInstanceOf[HiveContext]
+val recursive = 
sqlContext.sparkContext.getConf.getBoolean("spark.input.dir.recursive", false)
--- End diff --

I was wondering if we could call
sparkSession.sparkContext.addFile(path, true)
in the AddFileCommand function, since recursively adding a directory is a 
common need in ETL.
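For context, SparkContext.addFile(path, recursive) with recursive = true adds every file under a directory. A rough, self-contained Java illustration of the recursive enumeration such a call implies (temp-dir names are made up for the demo; this is not Spark code):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class RecursiveWalk {
    // Collect every regular file under dir, the way a recursive "add file"
    // must enumerate a directory tree.
    static List<Path> listFiles(Path dir) throws IOException {
        try (Stream<Path> s = Files.walk(dir)) {
            return s.filter(Files::isRegularFile).collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("addfile-demo");
        Files.createFile(dir.resolve("a.txt"));
        Path sub = Files.createDirectory(dir.resolve("sub"));
        Files.createFile(sub.resolve("b.txt"));

        // Both a.txt and sub/b.txt are found.
        System.out.println(listFiles(dir).size()); // 2
    }
}
```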





[GitHub] spark pull request #17550: [SPARK-20240][SQL] SparkSQL support limitations o...

2017-04-11 Thread zenglinxi0615
Github user zenglinxi0615 closed the pull request at:

https://github.com/apache/spark/pull/17550





[GitHub] spark issue #17550: [SPARK-20240][SQL] SparkSQL support limitations of max d...

2017-04-11 Thread zenglinxi0615
Github user zenglinxi0615 commented on the issue:

https://github.com/apache/spark/pull/17550
  
OK, going to close this PR and open a new one against the master branch.





[GitHub] spark pull request #17550: [SPARK-20240][SQL] SparkSQL support limitations o...

2017-04-06 Thread zenglinxi0615
GitHub user zenglinxi0615 opened a pull request:

https://github.com/apache/spark/pull/17550

[SPARK-20240][SQL] SparkSQL support limitations of max dynamic partitions when inserting hive table

## What changes were proposed in this pull request?
Support limiting the maximum number of dynamic partitions when inserting into 
a Hive table, via hive.exec.max.dynamic.partitions.pernode.
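The proposed guard can be sketched as a simple count check. This is an illustrative Java stand-in (hypothetical helper name; the real limit would come from the Hive setting hive.exec.max.dynamic.partitions.pernode):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PartitionLimit {
    // Count the distinct partition values an insert would create and fail
    // fast if they exceed the configured per-node maximum.
    static void checkMaxDynamicPartitions(List<String> partitionValues, int maxPerNode) {
        Set<String> distinct = new HashSet<>(partitionValues);
        if (distinct.size() > maxPerNode) {
            throw new IllegalStateException(
                "Dynamic partition count " + distinct.size()
                    + " exceeds limit " + maxPerNode);
        }
    }

    public static void main(String[] args) {
        checkMaxDynamicPartitions(List.of("2017-04-05", "2017-04-06"), 100); // within limit
        try {
            checkMaxDynamicPartitions(List.of("a", "b", "c"), 2);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // limit exceeded
        }
    }
}
```

Failing fast here avoids creating thousands of tiny partitions before Hive's own limit is hit at the metastore.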

## How was this patch tested?
manual test


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zenglinxi0615/spark SPARK-20240

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17550.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17550


commit adc91b958ec7aeeab0eaf1663de15ebbcb83da0d
Author: 曾林西 
Date:   2017-04-06T11:00:47Z

[SPARK-20240][SQL] SparkSQL support limitations of max dynamic partitions 
when inserting hive table







[GitHub] spark issue #14686: [SPARK-16253][SQL] make spark sql compatible with hive s...

2016-08-31 Thread zenglinxi0615
Github user zenglinxi0615 commented on the issue:

https://github.com/apache/spark/pull/14686
  
Sorry for the long silence.
Yes, you are right: if you can change the SQL from using '/temp/test.py' to 
using 'python /temp/test.py', there's no need to change the Spark source code.
However, this patch is for the case where there are already many Hive SQL 
scripts using '/temp/test.py'; it would cost too much time to modify them all, 
so we want Spark SQL to be compatible with Hive SQL that uses a Python script 
transform like 'xxx.py'.





[GitHub] spark issue #14686: [SPARK-16253][SQL] make spark sql compatible with hive s...

2016-08-18 Thread zenglinxi0615
Github user zenglinxi0615 commented on the issue:

https://github.com/apache/spark/pull/14686
  
Have you tried it on Spark 1.6.2?





[GitHub] spark pull request #14686: [SPARK-16253][SQL] make spark sql compatible with...

2016-08-17 Thread zenglinxi0615
GitHub user zenglinxi0615 opened a pull request:

https://github.com/apache/spark/pull/14686

[SPARK-16253][SQL] make spark sql compatible with hive sql that using…

## What changes were proposed in this pull request?

Make Spark SQL compatible with Hive SQL that uses a Python script transform, 
e.g. 'xxx.py'.


## How was this patch tested?

manual tests


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zenglinxi0615/spark v1.6.2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14686.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14686


commit 29df40dab9963e0dbce4119bdd872a86ff670af9
Author: 曾林西 
Date:   2016-06-28T12:37:05Z

[SPARK-16253][SQL] make spark sql compatible with hive sql that using 
python script transform like using 'xxx.py'







[GitHub] spark pull request #14085: [SPARK-16408][SQL] SparkSQL Added file get Except...

2016-07-07 Thread zenglinxi0615
Github user zenglinxi0615 commented on a diff in the pull request:

https://github.com/apache/spark/pull/14085#discussion_r69865365
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/commands.scala ---
@@ -113,8 +113,9 @@ case class AddFile(path: String) extends 
RunnableCommand {
 
   override def run(sqlContext: SQLContext): Seq[Row] = {
 val hiveContext = sqlContext.asInstanceOf[HiveContext]
+val recursive = 
sqlContext.sparkContext.getConf.getBoolean("spark.input.dir.recursive", false)
--- End diff --

And by the way, I have tried:
val recursive = hiveContext.getConf("spark.input.dir.recursive", "false")
but this only works in Spark SQL by executing SET spark.input.dir.recursive=true 
before ADD FILE; we can't set the value with --conf spark.input.dir.recursive=true. 
This makes it difficult to move some Hive SQL directly to Spark SQL.





[GitHub] spark pull request #14085: [SPARK-16408][SQL] SparkSQL Added file get Except...

2016-07-07 Thread zenglinxi0615
Github user zenglinxi0615 commented on a diff in the pull request:

https://github.com/apache/spark/pull/14085#discussion_r69864435
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/commands.scala ---
@@ -113,8 +113,9 @@ case class AddFile(path: String) extends 
RunnableCommand {
 
   override def run(sqlContext: SQLContext): Seq[Row] = {
 val hiveContext = sqlContext.asInstanceOf[HiveContext]
+val recursive = 
sqlContext.sparkContext.getConf.getBoolean("spark.input.dir.recursive", false)
--- End diff --

I'm pretty sure it's supported by the SQL dialect in Spark SQL.
Regarding "the name of this property is too generic, and I don't think it is 
something that is set globally": do you think we should use another name? And 
should the default value be true?





[GitHub] spark pull request #14085: [SPARK-16408][SQL] SparkSQL Added file get Except...

2016-07-07 Thread zenglinxi0615
GitHub user zenglinxi0615 opened a pull request:

https://github.com/apache/spark/pull/14085

[SPARK-16408][SQL] SparkSQL Added file get Exception: is a directory …

## What changes were proposed in this pull request?
This PR adds a parameter (spark.input.dir.recursive) to control the value of 
recursive in SparkContext#addFile, so we can support the "add file 
hdfs://dir/path" command in Spark SQL.

## How was this patch tested?
Manual tests: set --conf spark.input.dir.recursive=true and run
spark-sql -e "add file hdfs://dir/path".

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zenglinxi0615/spark SPARK-16408

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14085.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14085


commit d2e05c155e4e52dfda177a21615de7743a2c5917
Author: 曾林西 
Date:   2016-07-07T06:20:19Z

[SPARK-16408][SQL] SparkSQL Added file get Exception: is a directory and 
recursive is not turned on



