[GitHub] [spark] sarutak opened a new pull request #29619: [SPARK-32772][SQL] Reduce log messages for spark-sql CLI

GitBox Tue, 01 Sep 2020 12:47:57 -0700


sarutak opened a new pull request #29619:
URL: https://github.com/apache/spark/pull/29619



   ### What changes were proposed in this pull request?
   This PR reduces the log messages printed by the spark-sql CLI, in the same way as for spark-shell and the pyspark CLI.
   
   ### Why are the changes needed?
   When we launch the spark-sql CLI, so many log messages are shown that it is sometimes difficult to find the result of a query.
   ```
   spark-sql> SELECT now();
   20/09/02 00:11:45 INFO CodeGenerator: Code generated in 10.121625 ms
   20/09/02 00:11:45 INFO SparkContext: Starting job: main at NativeMethodAccessorImpl.java:0
   20/09/02 00:11:45 INFO DAGScheduler: Got job 0 (main at NativeMethodAccessorImpl.java:0) with 1 output partitions
   20/09/02 00:11:45 INFO DAGScheduler: Final stage: ResultStage 0 (main at NativeMethodAccessorImpl.java:0)
   20/09/02 00:11:45 INFO DAGScheduler: Parents of final stage: List()
   20/09/02 00:11:45 INFO DAGScheduler: Missing parents: List()
   20/09/02 00:11:45 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[2] at main at NativeMethodAccessorImpl.java:0), which has no missing parents
   20/09/02 00:11:45 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 6.3 KiB, free 366.3 MiB)
   20/09/02 00:11:45 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 3.2 KiB, free 366.3 MiB)
   20/09/02 00:11:45 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.204:42615 (size: 3.2 KiB, free: 366.3 MiB)
   20/09/02 00:11:45 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1348
   20/09/02 00:11:45 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[2] at main at NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions Vector(0))
   20/09/02 00:11:45 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks resource profile 0
   20/09/02 00:11:45 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0) (192.168.1.204, executor driver, partition 0, PROCESS_LOCAL, 7561 bytes) taskResourceAssignments Map()
   20/09/02 00:11:45 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
   20/09/02 00:11:45 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1446 bytes result sent to driver
   20/09/02 00:11:45 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 238 ms on 192.168.1.204 (executor driver) (1/1)
   20/09/02 00:11:45 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
   20/09/02 00:11:45 INFO DAGScheduler: ResultStage 0 (main at NativeMethodAccessorImpl.java:0) finished in 0.343 s
   20/09/02 00:11:45 INFO DAGScheduler: Job 0 is finished. Cancelling potential speculative or zombie tasks for this job
   20/09/02 00:11:45 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage finished
   20/09/02 00:11:45 INFO DAGScheduler: Job 0 finished: main at NativeMethodAccessorImpl.java:0, took 0.377489 s
   2020-09-02 00:11:45.07
   Time taken: 0.704 seconds, Fetched 1 row(s)
   20/09/02 00:11:45 INFO SparkSQLCLIDriver: Time taken: 0.704 seconds, Fetched 1 row(s)
   ```
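
To make the problem concrete, here is a minimal Python sketch of what a WARN threshold does to console output like the sample above. It is only an illustration and not how Spark implements the change: Spark adjusts logger levels, it does not post-filter text. The helper name `filter_console` and the shortened sample lines are assumptions made for this example, and the sketch assumes one log event per line.

```python
import re

# Log levels in increasing severity, as they appear in the console output.
LEVELS = {"TRACE": 0, "DEBUG": 1, "INFO": 2, "WARN": 3, "ERROR": 4, "FATAL": 5}

# Matches the start of a line such as "20/09/02 00:11:45 INFO DAGScheduler: ...".
LOG_LINE = re.compile(r"^\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} (\w+) ")


def filter_console(lines, threshold="WARN"):
    """Keep non-log lines and log lines at or above `threshold` (hypothetical helper)."""
    minimum = LEVELS[threshold]
    kept = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        # Unknown level names are kept rather than guessed at.
        if match and LEVELS.get(match.group(1), minimum) < minimum:
            continue  # a log line below the threshold: drop it
        kept.append(line)
    return kept


# A shortened copy of the console output quoted in this description.
console = [
    "spark-sql> SELECT now();",
    "20/09/02 00:11:45 INFO CodeGenerator: Code generated in 10.121625 ms",
    "20/09/02 00:11:45 INFO DAGScheduler: Parents of final stage: List()",
    "2020-09-02 00:11:45.07",
    "Time taken: 0.704 seconds, Fetched 1 row(s)",
    "20/09/02 00:11:45 INFO SparkSQLCLIDriver: Time taken: 0.704 seconds, Fetched 1 row(s)",
]

for line in filter_console(console):
    print(line)
```

With the default threshold, only the prompt, the query result, and the "Time taken" line remain, which is the kind of output this PR aims for.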
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. The spark-sql CLI now prints fewer log messages, as follows.
   ```
   20/09/02 00:34:51 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
   Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
   Setting default log level to "WARN".
   To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
   20/09/02 00:34:53 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout does not exist
   20/09/02 00:34:53 WARN HiveConf: HiveConf of name hive.stats.retries.wait does not exist
   20/09/02 00:34:55 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 2.3.0
   20/09/02 00:34:55 WARN ObjectStore: setMetaStoreSchemaVersion called but recording version is disabled: version = 2.3.0, comment = Set by MetaStore [email protected]
   Spark master: local[*], Application Id: local-1598974492822
   spark-sql> SELECT now();
   2020-09-02 00:35:05.258
   Time taken: 2.299 seconds, Fetched 1 row(s)
   ```
   
   ### How was this patch tested?
   Launched the spark-sql CLI and confirmed that the log messages are reduced, as shown in the output pasted above.




