Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-09 Thread Nan Zhu
just curious what happened on google’s spark operator? On Thu, Nov 9, 2023 at 19:12 Ilan Filonenko wrote: > +1 > > On Thu, Nov 9, 2023 at 7:43 PM Ryan Blue wrote: > >> +1 >> >> On Thu, Nov 9, 2023 at 4:23 PM Hussein Awala wrote: >> >>> +1 for creating an official Kubernetes operator for

[jira] [Created] (SPARK-44517) first operator should respect the nullability of child expression as well as ignoreNulls option

2023-07-23 Thread Nan Zhu (Jira)
Nan Zhu created SPARK-44517: --- Summary: first operator should respect the nullability of child expression as well as ignoreNulls option Key: SPARK-44517 URL: https://issues.apache.org/jira/browse/SPARK-44517

Re: [PROPOSAL] Preparing first Apache Iceberg Summit

2023-07-19 Thread Nan Zhu
Hello! Glad to help here and also present our use case on iceberg Thanks! Nan On Wed, Jul 19, 2023 at 3:00 PM Jay Dave wrote: > Hello JB: > > I am interested and help in whatever I can. > > Thanks > JD > -- > *From:* Brian Olsen > *Sent:* Wednesday, July 19, 2023

Re: ASF policy violation and Scala version issues

2023-06-07 Thread Nan Zhu
for EMR, I think they show 3.1.2-amazon in Spark UI, no? On Wed, Jun 7, 2023 at 11:30 Grisha Weintraub wrote: > Hi, > > I am not taking sides here, but just for fairness, I think it should be > noted that AWS EMR does exactly the same thing. > We choose the EMR version (e.g., 6.4.0) and it

Re: C++/Rust SDK sync

2023-04-11 Thread Nan Zhu
Thanks! yeah, I'd like to join the meeting, April 19 works for me best! On Tue, Apr 11, 2023 at 1:41 PM Samrose Ahmed wrote: > I'd love to join this discussion, all those times work for me. > > On Tue, Apr 11, 2023 at 9:35 AM Ryan Blue wrote: > >> The 19th works best for me. >> >> On Mon, Apr

[jira] [Commented] (SEDONA-211) Enforce release managers to use JDK 8

2022-12-22 Thread Nan Zhu (Jira)
[ https://issues.apache.org/jira/browse/SEDONA-211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17651443#comment-17651443 ] Nan Zhu commented on SEDONA-211: [~jiayu] Nan from SafeGraph here, we were hit by this in Sedona 1.2

Re: [VOTE] Release Apache Iceberg 0.14.0 RC1

2022-07-13 Thread Nan Zhu
> > On Tue, Jul 12, 2022 at 7:19 PM Nan Zhu wrote: > >> can we consider having https://github.com/apache/iceberg/pull/5083 in >> before releasing a new iceberg version with Spark 3.3 support? >> >> the issue being addressed there is a breaking change from iceberg +

Re: [VOTE] Release Apache Iceberg 0.14.0 RC1

2022-07-12 Thread Nan Zhu
can we consider having https://github.com/apache/iceberg/pull/5083 in before releasing a new iceberg version with Spark 3.3 support? the issue being addressed there is a breaking change from iceberg + spark 3.1 to iceberg + spark 3.2 and above...and it essentially blocks us from upgrading to

Re: 【Feature】Request support for c++ sdk

2022-06-22 Thread Nan Zhu
+1 for using rust as the backbone for new language bindings On Sun, Jun 12, 2022 at 23:52 OpenInx wrote: > Thanks Kyle for sharing your context. > > Recently, I also spent some time practicing my Rust skills. Generally, > I'm +1 for adding Rust SDK support for native language. > > > On Mon,

Re: [VOTE] Release Apache Iceberg 0.13.0 RC1

2022-01-26 Thread Nan Zhu
tested with internal cases with some focus on a few changes we made in this version, looks good from my perspective +1 On Tue, Jan 25, 2022 at 4:08 PM Kyle Bendickson wrote: > Thank you, Jack! > > Quick announcement when testing: *the runtime jars / artifacts for Spark > & Flink have changed

[jira] [Created] (SPARK-33940) allow configuring the max column name length in csv writer

2020-12-29 Thread Nan Zhu (Jira)
Nan Zhu created SPARK-33940: --- Summary: allow configuring the max column name length in csv writer Key: SPARK-33940 URL: https://issues.apache.org/jira/browse/SPARK-33940 Project: Spark Issue Type

[jira] [Commented] (SPARK-32351) Partially pushed partition filters are not explained

2020-10-19 Thread Nan Zhu (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217238#comment-17217238 ] Nan Zhu commented on SPARK-32351: - [~hyukjin.kwon] nit: could you reassign this to me? [~codingcat

Re: Spark 2.4.5 release for Parquet and Avro dependency updates?

2019-11-22 Thread Nan Zhu
I am not sure if it is a good practice to have breaking changes in dependencies for maintenance releases On Fri, Nov 22, 2019 at 8:56 AM Michael Heuer wrote: > Hello, > > Avro 1.8.2 to 1.9.1 is a binary incompatible update, and it appears that > Parquet 1.10.1 to 1.11 will be a

[jira] [Resolved] (SPARK-26862) assertion failed in ParquetRowConverter

2019-02-12 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu resolved SPARK-26862. - Resolution: Invalid > assertion failed in ParquetRowConver

[jira] [Commented] (SPARK-26862) assertion failed in ParquetRowConverter

2019-02-12 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766355#comment-16766355 ] Nan Zhu commented on SPARK-26862: - [~srowen] I don't think so, as the same parquet files can be accessed

[jira] [Updated] (SPARK-26862) assertion failed in ParquetRowConverter

2019-02-12 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu updated SPARK-26862: Description: When I run the following query over an internal table (A and B are typed in string, C

Re: Time to cut an Apache 2.4.1 release?

2019-02-12 Thread Nan Zhu
just filed a JIRA at https://issues.apache.org/jira/browse/SPARK-26862; this issue only happens in 2.4.0 but not in 2.3.2. could anyone help look into that? On Tue, Feb 12, 2019 at 10:41 AM DB Tsai wrote: > Great. I'll prepare the release for voting. Thanks! > > DB Tsai | Siri Open

[jira] [Commented] (SPARK-26862) assertion failed in ParquetRowConverter

2019-02-12 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766327#comment-16766327 ] Nan Zhu commented on SPARK-26862: - [~felixcheung] > assertion failed in ParquetRowConver

[jira] [Created] (SPARK-26862) assertion failed in ParquetRowConverter

2019-02-12 Thread Nan Zhu (JIRA)
Nan Zhu created SPARK-26862: --- Summary: assertion failed in ParquetRowConverter Key: SPARK-26862 URL: https://issues.apache.org/jira/browse/SPARK-26862 Project: Spark Issue Type: Bug

[jira] [Resolved] (SPARK-24797) Analyzer should respect spark.sql.hive.convertMetastoreOrc/Parquet when build the data source table

2018-07-13 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu resolved SPARK-24797. - Resolution: Won't Fix > Analyzer should respect spark.sql.hive.convertMetastoreOrc/Parquet when bu

[jira] [Created] (SPARK-24797) Analyzer should respect spark.sql.hive.convertMetastoreOrc/Parquet when build the data source table

2018-07-12 Thread Nan Zhu (JIRA)
Nan Zhu created SPARK-24797: --- Summary: Analyzer should respect spark.sql.hive.convertMetastoreOrc/Parquet when build the data source table Key: SPARK-24797 URL: https://issues.apache.org/jira/browse/SPARK-24797

Re: Integrating ML/DL frameworks with Spark

2018-05-08 Thread Nan Zhu
.how I skipped the last part On Tue, May 8, 2018 at 11:16 AM, Reynold Xin <r...@databricks.com> wrote: > Yes, Nan, totally agree. To be on the same page, that's exactly what I > wrote wasn't it? > > On Tue, May 8, 2018 at 11:14 AM Nan Zhu <zhunanmcg...@gmail.com

Re: Integrating ML/DL frameworks with Spark

2018-05-08 Thread Nan Zhu
besides that, one of the things which is needed by multiple frameworks is to schedule tasks in a single wave i.e. if some frameworks like xgboost/mxnet requires 50 parallel workers, Spark is desired to provide a capability to ensure that either we run 50 tasks at once, or we should quit the

Re: Extend MXNET distributed training with MPI AllReduce

2018-03-26 Thread Nan Zhu
Hi, Patric It's pretty nice work! A question: how would the future code structure look when putting this allreduce module in as a submodule? Will we have two communication submodules? Is there any plan to give a unified abstraction for communication so that a single communication submodule

Re: [VOTE] Change Scala namespace from dmlc to org.apache

2018-03-13 Thread Nan Zhu
> would be to either (1) provide a facade from the new package to the > old > package or (2) keep two copies of the scala code temporarily along > with two > copies of the JNI entry points. In both of these cases we could set up > @deprecated on all public call

Re: [VOTE] Change Scala namespace from dmlc to org.apache

2018-03-13 Thread Nan Zhu
re Chris: I do not have any good idea about this. On Tue, Mar 13, 2018 at 8:13 AM, Chris Olivier <cjolivie...@gmail.com> wrote: > is it possible to somehow alias a namespace in scala > in order to maintain backwards compatibility? > > On Tue, Mar 13, 2018 at 7:21 AM

Re: [VOTE] Change Scala namespace from dmlc to org.apache

2018-03-13 Thread Nan Zhu
> > modification technical vote. > > > > > >> > > > > > > > > >> > > > > > > > > >> > > > On Mon, Mar 12, 2018 at 9:56 AM, Marco de Abreu < > > > > > >> > > > marc

Re: [VOTE] Disconnect all non-C API's from mxnet versioning

2018-03-13 Thread Nan Zhu
> >> in > > >> > > time, > > >> > > >> but our users would be losing trust due to unexpected failures > > >> during > > >> > > >> upgrades. > > >> > > >> > > >> > > >> -M

Re: [VOTE] Disconnect all non-C API's from mxnet versioning

2018-03-12 Thread Nan Zhu
how about release cycle? On Mon, Mar 12, 2018 at 9:37 AM, Yuan Tang wrote: > +1 > > On Mon, Mar 12, 2018 at 12:35 PM, Marco de Abreu < > marco.g.ab...@googlemail.com> wrote: > > > +1 > > > > Tianqi Chen schrieb am Mo., 12. März 2018, > >

Re: [VOTE] Change Scala namespace from dmlc to org.apache

2018-03-12 Thread Nan Zhu
I think we'd specify it will change in the next version (1.2)? On Mon, Mar 12, 2018 at 9:26 AM, Chris Olivier wrote: > This vote is for the code-change of altering the Scala API namespace from > dmlc to org.apache. > > > Vote will conclude on Thursday, 5pm PDT. > > Thank

Re: Publishing Scala Package/namespace change

2018-03-09 Thread Nan Zhu
I think last time we postpone it because the release is a minor version but actually such a change is actually affordable for a jump from 1.1 - 1.2 -1 on separate versions (not following apache rules) On Fri, Mar 9, 2018 at 2:38 PM, Chris Olivier wrote: > IMHO, I don't

Re: [RESULT][VOTE] tracking code changes with JIRA by associating pull requests

2018-03-08 Thread Nan Zhu
l.com> wrote: > > >> > > >> The PR template is designed for that and its poor adoption is causing > > the > > >> same issue of missing information in PRs. My concern of using JIRA is > > that > > >> more overhead would deter contribution

[jira] [Updated] (MXNET-62) improve the quality of Spark integration

2018-03-08 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/MXNET-62?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu updated MXNET-62: - Component/s: Scala API > improve the quality of Spark integrat

[jira] [Updated] (MXNET-62) improve the quality of Spark integration

2018-03-08 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/MXNET-62?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu updated MXNET-62: - Labels: spark (was: ) > improve the quality of Spark integrat

[jira] [Assigned] (MXNET-62) improve the quality of Spark integration

2018-03-08 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/MXNET-62?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu reassigned MXNET-62: Assignee: Nan Zhu > improve the quality of Spark integrat

[jira] [Created] (MXNET-62) improve the quality of Spark integration

2018-03-08 Thread Nan Zhu (JIRA)
Nan Zhu created MXNET-62: Summary: improve the quality of Spark integration Key: MXNET-62 URL: https://issues.apache.org/jira/browse/MXNET-62 Project: Apache MXNet Issue Type: Improvement

Re: [RESULT][VOTE] tracking code changes with JIRA by associating pull requests

2018-03-08 Thread Nan Zhu
+1 on both suggestions a bit concern is on the quality of JIRA which is created automatically I can see a lot of PRs are not described comprehensively, if we just post what in description to JIRA, it's error-propagating but the quality of JIRA is a big topic worth more discussions On Thu,

Re: [RESULT][VOTE] tracking code changes with JIRA by associating pull requests

2018-03-06 Thread Nan Zhu
I think the right approach here is to start another vote on terminate the starting process of using JIRA, since we have passed this vote On Tue, Mar 6, 2018 at 9:13 PM, Eric Xie wrote: > -1 > > JIRA is ancient and arcane. This adds unnecessary overhead. > > On 2018/03/03

broken UI in 2.3?

2018-03-05 Thread Nan Zhu
Hi, all I am experiencing some issues in UI when using 2.3 when I clicked executor/storage tab, I got the following exception java.lang.NullPointerException at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388) at

Re: [VOTE] tracking code changes with JIRA by associating pull requests

2018-02-27 Thread Nan Zhu
> Agreed. I feel its unnecessary too - let's leave the [MODULE] out. > > > > > -Marco > > > > On Tue, Feb 27, 2018 at 11:34 PM, Yuan Tang <terrytangy...@gmail.com> > > wrote: > > > > > +1 > > > > > > On Tue, Feb 27,

Re: [VOTE] tracking code changes with JIRA by associating pull requests

2018-02-27 Thread Nan Zhu
Marthi <smar...@apache.org> wrote: > On Tue, Feb 27, 2018 at 10:50 PM, Nan Zhu <zhunanmcg...@gmail.com> wrote: > > > Thanks, Suneel! > > > > the vote still remains sense on its major points > > > > " > > 1. most of PRs should be titled

Re: [VOTE] Spark 2.3.0 (RC5)

2018-02-26 Thread Nan Zhu
+1 (non-binding), tested with internal workloads and benchmarks On Mon, Feb 26, 2018 at 12:09 PM, Michael Armbrust wrote: > +1 all our pipelines have been running the RC for several days now. > > On Mon, Feb 26, 2018 at 10:33 AM, Dongjoon Hyun

Re: [VOTE] Release MXNet version 1.1.0.RC1

2018-02-10 Thread Nan Zhu
tested with scala-package's building and unit test +1 On Sat, Feb 10, 2018 at 10:49 AM, YiZhi Liu wrote: > Let's make the voting end at 11:10 p.m., Tuesday, February. 13th. > > Thanks for reminding. > > Best, > Yizhi > > 2018-02-10 9:54 GMT-08:00 Marco de Abreu

Re: [VOTE] When in Doubt, Wait 24 Hours Before Merging

2018-02-01 Thread Nan Zhu
+1, but do not understand why we merged PRs which was not completely approved? On Thu, Feb 1, 2018 at 4:20 PM, Sheng Zha wrote: > Hi, > > In order to avoid having miscommunication and unaligned expectation, I'd > like to propose a lazy vote on a new rule for merging pull

Re: Release plan - MXNET 1.0.1

2018-01-25 Thread Nan Zhu
ch and build from there. Happy to > help with this if needed. > > On Thu, Jan 25, 2018 at 6:19 AM, Nan Zhu <zhunanmcg...@gmail.com> wrote: > > > +1 and suggest consolidating all maintenance releases under the same > > major.minor version into a single branch > >

Re: Release plan - MXNET 1.0.1

2018-01-24 Thread Nan Zhu
+1 and suggest consolidating all maintenance releases under the same major.minor version into a single branch On Wed, Jan 24, 2018 at 9:06 PM, Meghna Baijal wrote: > I agree. If the release candidate is being cut from the master branch, it > should be considered a

Re: Request For comments: MXNet Scala Inference API

2018-01-22 Thread Nan Zhu
google.com/document/d/13EVnCtQ5d0wCnHWTsZT0jNzHdQfv2 > PR9AnX1YNlOhm0/edit?usp=sharing. > You can leave your comments there and I will move the final version of the > document again to the wiki. > > Thanks, Naveen > > > On Mon, Jan 22, 2018 at 6:48 PM, Nan Zhu <zhunanmcg...@gma

Re: Palantir replease under org.apache.spark?

2018-01-09 Thread Nan Zhu
nvm On Tue, Jan 9, 2018 at 9:42 AM, Nan Zhu <zhunanmcg...@gmail.com> wrote: > Hi, all > > Out of curious, I just found a bunch of Palantir release under > org.apache.spark in maven central (https://mvnrepository.com/ > artifact/org.apache.spark/spark-core_2.11)? > >

Palantir replease under org.apache.spark?

2018-01-09 Thread Nan Zhu
Hi, all Out of curious, I just found a bunch of Palantir release under org.apache.spark in maven central ( https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11)? Is it on purpose? Best, Nan

Re: Increase indentation limit from 100 to 120 characters

2018-01-05 Thread Nan Zhu
are doing something special BTW, considering monitor-relevant concern, http://scalameta.org/scalafmt/ tells that 100 is good enough even for a 30'' wide monitor On Fri, Jan 5, 2018 at 11:10 AM, Chris Olivier <cjolivie...@gmail.com> wrote: > Why -1? > > On Fri, Jan 5, 2018 at 11

Re: Increase indentation limit from 100 to 120 characters

2018-01-05 Thread Nan Zhu
-1 for scala part On Fri, Jan 5, 2018 at 9:48 AM, Marco de Abreu wrote: > +1 > > Am 05.01.2018 5:49 nachm. schrieb "Chris Olivier" : > > +1 > > On Fri, Jan 5, 2018 at 8:00 AM, Pedro Larroy > > wrote: > > > Hi

Re: Refactoring MXNet scala code to use "org.apache.mxnet"

2018-01-04 Thread Nan Zhu
anywhere to mark it as a breaking change in the latest version? On Thu, Jan 4, 2018 at 2:16 PM, Roshani Nagmote wrote: > Hello all, > > I am working on publishing mxnet-scala release to maven repository and as a > part of that, I will also be refactoring mxnet-scala

[jira] [Commented] (SPARK-22599) Avoid extra reading for cached table

2017-12-24 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16303022#comment-16303022 ] Nan Zhu commented on SPARK-22599: - [~rajesh.balamohan] no, it means that SPARK-22599 and master

[jira] [Commented] (SPARK-22765) Create a new executor allocation scheme based on that of MR

2017-12-19 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16297453#comment-16297453 ] Nan Zhu commented on SPARK-22765: - I took a look at the code, one of the possibilities is as following

[jira] [Commented] (SPARK-22765) Create a new executor allocation scheme based on that of MR

2017-12-18 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16295958#comment-16295958 ] Nan Zhu commented on SPARK-22765: - [~xuefuz] Regarding this, "The symptom is that newly allo

[jira] [Commented] (SPARK-21656) spark dynamic allocation should not idle timeout executors when there are enough tasks to run on them

2017-12-18 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16295282#comment-16295282 ] Nan Zhu commented on SPARK-21656: - NOTE: the issue fixed by https://github.com/apache/spark/pull/18874

[jira] [Created] (SPARK-22790) add a configurable factor to describe HadoopFsRelation's size

2017-12-14 Thread Nan Zhu (JIRA)
Nan Zhu created SPARK-22790: --- Summary: add a configurable factor to describe HadoopFsRelation's size Key: SPARK-22790 URL: https://issues.apache.org/jira/browse/SPARK-22790 Project: Spark Issue

[jira] [Commented] (SPARK-22790) add a configurable factor to describe HadoopFsRelation's size

2017-12-14 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16291985#comment-16291985 ] Nan Zhu commented on SPARK-22790: - created per discussion in https://github.com/apache/spark/pull/19864

[jira] [Commented] (SPARK-22680) SparkSQL scan all partitions when the specified partitions are not exists in parquet formatted table

2017-12-07 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16282248#comment-16282248 ] Nan Zhu commented on SPARK-22680: - how you observed that spark scans all partitions? I tried to reproduce

Re: request for reviewing PR in ps-lite

2017-12-05 Thread Nan Zhu
> > > On Sat, Dec 2, 2017 at 10:04 AM, CodingCat <coding...@apache.org> > wrote: > > > > > > > ping > > > > > > > > On Fri, Dec 1, 2017 at 12:18 AM, Nan Zhu <zhunanmcg...@gmail.com> > > wrote: > > > > > > > >> Hi, a

[jira] [Created] (SPARK-22673) InMemoryRelation should utilize on-disk table stats whenever possible

2017-12-01 Thread Nan Zhu (JIRA)
Nan Zhu created SPARK-22673: --- Summary: InMemoryRelation should utilize on-disk table stats whenever possible Key: SPARK-22673 URL: https://issues.apache.org/jira/browse/SPARK-22673 Project: Spark

Request for review of SPARK-22599

2017-11-29 Thread Nan Zhu
Hi, all When we do perf test for Spark, we found that enabling table cache does not bring the expected speedup comparing to cloud-storage + parquet in many scenarios. We identified that the performance cost is brought by the fact that the current InMemoryRelation/InMemorytTableScanExec will

[jira] [Updated] (SPARK-22599) Avoid extra reading for cached table

2017-11-29 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu updated SPARK-22599: Description: In the current implementation of Spark, InMemoryTableExec read all data in a cached table

[jira] [Updated] (SPARK-22599) Avoid extra reading for cached table

2017-11-23 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu updated SPARK-22599: Description: In the current implementation of Spark, InMemoryTableExec read all data in a cached table

[jira] [Created] (SPARK-22599) Avoid extra reading for cached table

2017-11-23 Thread Nan Zhu (JIRA)
Nan Zhu created SPARK-22599: --- Summary: Avoid extra reading for cached table Key: SPARK-22599 URL: https://issues.apache.org/jira/browse/SPARK-22599 Project: Spark Issue Type: Improvement

[jira] [Created] (LIVY-410) support rate throttling in livy

2017-10-05 Thread Nan Zhu (JIRA)
Nan Zhu created LIVY-410: Summary: support rate throttling in livy Key: LIVY-410 URL: https://issues.apache.org/jira/browse/LIVY-410 Project: Livy Issue Type: Improvement Reporter: Nan

Re: What's everyone working on?

2017-09-26 Thread Nan Zhu
Subject: Re: What's everyone working on? Hi Nan Zhu, Thanks for the update. Curious to know what part of mxnet-spark are you working on? I am also evaluating the integration of MXNet with Spark, planning to start with PySpark and also looking into spark-deep learning-pipelines <https://github

Re: MXNet: Run PR builds on Apache Jenkins only after the commit is reviewed

2017-09-11 Thread Nan Zhu
+1 and recommend Jenkins-GitHub plugin with which committers/(accounts with assigned permissions) can trigger Jenkins build with "Test this please, Jenkins!" "Retest this please, Jenkins!" And accounts in the list can trigger build automatically when submitting a PR

Re: run Jenkins in a newer version of OS

2017-08-24 Thread Nan Zhu
I see, thanks, Mu! On Wed, Aug 23, 2017 at 9:18 AM, Mu Li <muli@gmail.com> wrote: > We can (should) upgrade the docker image used in CI from ubuntu 14.04 to > ubuntu 16.04 > > On Wed, Aug 23, 2017 at 7:01 AM, Nan Zhu <zhunanmcg...@gmail.com> wrote: > > >

run Jenkins in a newer version of OS

2017-08-23 Thread Nan Zhu
Hi, all I just noticed that our Jenkins is running on Ubuntu 14.04, which does not even contain JDK8 in its default repo I am trying to do https://github.com/apache/incubator-mxnet/pull/7574/files to use some JDK8 APIs JDK 9 has been the topic, we might want to at least move on to JDK 8 Is it

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-22 Thread Nan Zhu
based on this result, I think we should follow the bulk operation pattern Shall we move forward with the PR from Paypal? Best, Nan On Mon, Aug 21, 2017 at 12:21 PM, Meisam Fathi wrote: > Bottom line up front: > 1. The cost of calling 1 individual REST calls is

Re: Java API for MXNet

2017-08-16 Thread Nan Zhu
have a well defined Java API you could look at the work I have > done by then and see how it can be plugged in or what can be learnt > from it. > > Jörn > > On Wed, Aug 16, 2017 at 9:05 PM, Nan Zhu <zhunanmcg...@gmail.com> wrote: > > +1 for Sandeep's suggestion > >

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Nan Zhu
at 12:36 PM, Marcelo Vanzin <van...@cloudera.com> wrote: > On Wed, Aug 16, 2017 at 12:27 PM, Nan Zhu <zhunanmcg...@gmail.com> wrote: > > I am using your words *current*. What's the definition of "current" in > > livy? I think that's all application which stil

Re: Java API for MXNet

2017-08-16 Thread Nan Zhu
me it looks like that the C API is very stable and used by all/most > >> > other APIs. If we have a Java API - accessing the C API via JavaCPP - > >> > then we should end up with a pretty stable solution and a lot the code > >> > that is duplicated with th

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Nan Zhu
com> wrote: > On Wed, Aug 16, 2017 at 11:17 AM, Nan Zhu <zhunanmcg...@gmail.com> wrote: > > Looks like non-REST API also contains this https://hadoop.apache. > > org/docs/r2.7.0/api/src-html/org/apache/hadoop/yarn/client/ > > api/YarnClient.html#line.225 > > > >

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Nan Zhu
yes, it is going to be Akka if moving forward (at least not going to introduce an actor framework to livy) On Wed, Aug 16, 2017 at 11:24 AM, Meisam Fathi wrote: > That is true, but I was under the impression that this will be implemented > with Akka (maybe because it is

Re: Java API for MXNet

2017-08-16 Thread Nan Zhu
> >> > Even if we don't use JavaCPP, the JNI layer should be easy to get into > >> > a state where both can share it, the current Scala JNI layers LibInfo > >> > classes could be converted to Java classes and would in most cases > >> > require only minor

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Nan Zhu
are throwing some imaginations without any values Please go with direct discussion On Wed, Aug 16, 2017 at 9:11 AM, Marcelo Vanzin <van...@cloudera.com> wrote: > On Wed, Aug 16, 2017 at 9:06 AM, Nan Zhu <zhunanmcg...@gmail.com> wrote: > >> I'm not really sure what you

Re: Java API for MXNet

2017-08-16 Thread Nan Zhu
are it, the current Scala JNI layers LibInfo > > classes could be converted to Java classes and would in most cases > > require only minor changes in the Scala code. > > > > Jörn > > > > [1] https://github.com/apache/mahout/tree/master/viennacl/src/main > > >

Re: Java API for MXNet

2017-08-16 Thread Nan Zhu
e converted to Java classes and would in most cases > require only minor changes in the Scala code. > > Jörn > > [1] https://github.com/apache/mahout/tree/master/viennacl/src/main > > On Wed, Aug 16, 2017 at 5:30 PM, Nan Zhu <zhunanmcg...@gmail.com> wrote: > >

Re: Java API for MXNet

2017-08-16 Thread Nan Zhu
I agree with Yizhi My major concern is the duplicate implementations, which are usually one of the major sources of bugs, especially with two languages which are naturally interactive (OK, Calling Scala from Java might need some more efforts). It is just like we provide C++ & C APIs of MxNet in

resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-14 Thread Nan Zhu
Hi, all In HDInsight, we (Microsoft) use Livy as the Spark job submission service. We keep seeing the customers fall into the problem when they submit many concurrent applications to the system, or recover livy from a state with many concurrent applications By looking at the code and the

Re: how CI system works in MxNet?

2017-08-13 Thread Nan Zhu
ra/browse/INFRA-14840 > > On Sun, Aug 13, 2017 at 9:21 AM, Nan Zhu <zhunanmcg...@gmail.com> wrote: > > > no, I mean the CI system in our github repo.. > > > > On Sun, Aug 13, 2017 at 8:06 AM, shiwen hu <yajiedes...@gmail.com> > wrote: > > > >

Re: how CI system works in MxNet?

2017-08-13 Thread Nan Zhu
last build of a residual file. Try make clean. first > > > > 2017-08-13 22:35 GMT+08:00 Nan Zhu <zhunanmcg...@gmail.com>: > > > >> Hi, all > >> > >> I just noticed something which raises this question > >> > >> Yesterday afternoon

how CI system works in MxNet?

2017-08-13 Thread Nan Zhu
Hi, all I just noticed something which raises this question Yesterday afternoon, I checked https://github.com/apache/incubator-mxnet/commits/master and the first commit passed all tests However, when I checked again this morning, the test result was changed to fail... (one of my PRs also

Re: red flags in IntelliJ when importing from pom.xml

2017-08-06 Thread Nan Zhu
> plugin. Currently I don't have a good solution to handle it. > > On Mon, Aug 7, 2017 at 8:05 AM, Nan Zhu <zhunanmcg...@gmail.com> wrote: > > > Hi, all > > > > I just start looking at the source code of the project, it looks like > when > > importing src fro

[jira] [Closed] (SPARK-21197) Tricky use case makes dead application struggle for a long duration

2017-06-24 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu closed SPARK-21197. --- Resolution: Won't Fix > Tricky use case makes dead application struggle for a long durat

[jira] [Commented] (SPARK-21197) Tricky use case makes dead application struggle for a long duration

2017-06-24 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062162#comment-16062162 ] Nan Zhu commented on SPARK-21197: - yeah, after rethinking about the solution, I think daemon thread would

[jira] [Updated] (SPARK-21197) Tricky use case makes dead application struggle for a long duration

2017-06-23 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu updated SPARK-21197: Summary: Tricky use case makes dead application struggle for a long duration (was: Tricky use cases makes

[jira] [Created] (SPARK-21197) Tricky use cases makes dead application struggle for a long duration

2017-06-23 Thread Nan Zhu (JIRA)
Nan Zhu created SPARK-21197: --- Summary: Tricky use cases makes dead application struggle for a long duration Key: SPARK-21197 URL: https://issues.apache.org/jira/browse/SPARK-21197 Project: Spark

[jira] [Commented] (SPARK-20928) Continuous Processing Mode for Structured Streaming

2017-06-03 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036157#comment-16036157 ] Nan Zhu commented on SPARK-20928: - if I understand correctly the tasks will be "long-term"

[jira] [Commented] (SPARK-20928) Continuous Processing Mode for Structured Streaming

2017-05-30 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030379#comment-16030379 ] Nan Zhu commented on SPARK-20928: - Hi, is there any description on what does it mean? > Continu

[jira] [Commented] (SPARK-4921) TaskSetManager mistakenly returns PROCESS_LOCAL for NO_PREF tasks

2017-05-23 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16021417#comment-16021417 ] Nan Zhu commented on SPARK-4921: I forgot most of details...but the final conclusion was that "it's a

[jira] [Commented] (SPARK-20811) GBT Classifier failed with mysterious StackOverflowError

2017-05-19 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16018246#comment-16018246 ] Nan Zhu commented on SPARK-20811: - thanks, let me try it > GBT Classifier failed with mysteri

[jira] [Created] (SPARK-20811) GBT Classifier failed with mysterious StackOverflowException

2017-05-19 Thread Nan Zhu (JIRA)
Nan Zhu created SPARK-20811: --- Summary: GBT Classifier failed with mysterious StackOverflowException Key: SPARK-20811 URL: https://issues.apache.org/jira/browse/SPARK-20811 Project: Spark Issue

[jira] [Updated] (SPARK-20811) GBT Classifier failed with mysterious StackOverflowError

2017-05-19 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu updated SPARK-20811: Summary: GBT Classifier failed with mysterious StackOverflowError (was: GBT Classifier failed

Re: --jars does not take remote jar?

2017-05-02 Thread Nan Zhu
I see. Thanks! On Tue, May 2, 2017 at 9:12 AM, Marcelo Vanzin <van...@cloudera.com> wrote: > On Tue, May 2, 2017 at 9:07 AM, Nan Zhu <zhunanmcg...@gmail.com> wrote: > > I have no easy way to pass jar path to those forked Spark > > applications? (except that I down

Re: --jars does not take remote jar?

2017-05-02 Thread Nan Zhu
ay 2, 2017 at 8:43 AM, Nan Zhu <zhunanmcg...@gmail.com> wrote: > > Hi, all > > > > For some reason, I tried to pass in a HDFS path to the --jars option in > > spark-submit > > > > According to the document, > > http://spark.apache.org/docs/latest/submi
